It is actually all quite logical. Without Lingvo, FineReader might never have existed. It all began with a large and ambitious software suite called Lingvo Systems. With it, a person could scan a text in one language, run it through the program, and get a translation: a rough one, admittedly, but good enough to understand the meaning.
Lingvo Systems combined four programs: a character recognizer, a spell checker, and a translator from third-party companies, plus our own Lingvo dictionary. Recognition was the weakest link: the program needed lengthy training on each font, and even then the quality left much to be desired. It had to encounter at least several samples of each letter and needed a hint every time. Gradually it "began to see" and understood more and more characters. That was how training worked. But as soon as the font, or even just its size, changed, everything had to be repeated from the beginning.
It should be said that back then, in the early 1990s, organizations that had spun off from various research institutes were already starting to develop OCR (optical character recognition) systems. The technology was in real demand: high-quality recognition was needed not only by us for Lingvo Systems but by the market as a whole. We had a choice: wait until someone else made a great program, or develop our own.
We decided not to wait. Of course, the task looked far from trivial: entire research institutes were working on character recognition, and we had no such experience. But we were young and ambitious and believed we could handle any task, so we took up the development of a high-quality program with enthusiasm.
We started building the program in November 1992 and planned to finish by May 1993. The lack of a good recognition program was seriously hurting sales, and competitors were not asleep, so we had to hurry. Realizing that it was impossible to develop the entire technology from scratch in that time, we bought some groundwork from a young scientist who had been working on a similar program at home in his spare time, with no specific goal, purely out of personal interest in the subject.
His technology was far from ready for commercial use, and we put a great deal of effort into making the program produce useful results. Exploratory research is one thing; a working product is another. The code had originally been written for MS-DOS, and we had to port everything to Windows. In addition, the technology supported only one very simple image format (uncompressed BMP), whereas a commercial product had to support all the major formats of the time, or at least TIFF. But in those days TIFF was far from settled, and everyone wrote it however they liked: with alignment (of various kinds) or without, in normal or inverted (negative) form. In short, we had to tinker a lot, and for a long time there were still TIFF files that caused reading problems.
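To give a sense of the kind of variation a TIFF reader has to cope with, here is a minimal Python sketch. It is not code from FineReader, and it covers only two well-known degrees of freedom in the TIFF header and first image directory: the byte order ("II" or "MM") and the PhotometricInterpretation tag, which says whether zero means white or black (the "negative" case). The function names `read_tiff_basics` and `make_minimal_tiff` are hypothetical, and the generated test data is only a header plus one directory entry, not a complete image.

```python
import struct

PHOTOMETRIC_TAG = 262   # PhotometricInterpretation tag in baseline TIFF
TYPE_SHORT = 3          # 16-bit unsigned field type
PHOTOMETRIC_NAMES = {0: "WhiteIsZero", 1: "BlackIsZero"}


def read_tiff_basics(data: bytes) -> dict:
    """Report byte order and photometric interpretation of the first IFD."""
    mark = data[:2]
    if mark == b"II":
        endian, order_name = "<", "little-endian"
    elif mark == b"MM":
        endian, order_name = ">", "big-endian"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    # Bytes 2..7: magic number 42 and offset of the first image directory.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    photometric = None
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == PHOTOMETRIC_TAG and field_type == TYPE_SHORT and count == 1:
            # A single SHORT is stored in the first two bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            photometric = PHOTOMETRIC_NAMES.get(value, str(value))
    return {"byte_order": order_name, "photometric": photometric}


def make_minimal_tiff(endian: str, photometric: int) -> bytes:
    """Build a header and a one-entry IFD for demonstration (not a full image)."""
    mark = b"II" if endian == "<" else b"MM"
    header = mark + struct.pack(endian + "HI", 42, 8)
    entry = struct.pack(endian + "HHIHH", PHOTOMETRIC_TAG, TYPE_SHORT, 1, photometric, 0)
    ifd = struct.pack(endian + "H", 1) + entry + struct.pack(endian + "I", 0)
    return header + ifd


if __name__ == "__main__":
    print(read_tiff_basics(make_minimal_tiff("<", 0)))
    print(read_tiff_basics(make_minimal_tiff(">", 1)))
```

Even this toy reader has to branch on byte order before it can trust a single number in the file. A real reader of that era also had to deal with compression schemes, strip layouts, and row padding, which is where most of the tinkering would go.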
And most importantly, the system had almost no ready-made character descriptions, and there were no tools at all for creating them. What served as the tool was a set of large text files in which generalized character outlines were drawn in pseudographics. They had to be edited and improved directly in these files, in an ordinary text editor. At some point David personally sat down to prepare the data for the system, after several women hired for the job had given it up because it was too hard. The tool was very inconvenient, and the work was large, difficult, and extremely tedious. One had to go through huge text files for hours, looking for things and correcting them while studying the results of test runs. The work seemed endless, and progress came in small steps. It took strong nerves to cope with it. For two months, with no days off, David spent 12 to 14 hours a day getting the recognition database into shape.
In parallel, we began digging into the subject area. We talked to specialists and met Alexander Lvovich Shamis, an outstanding scientist who worked on practical and theoretical problems of artificial intelligence and developed applied technologies in machine perception (Alexander Lvovich still works as a scientific consultant at ABBYY). By the time FineReader 1.0 was released, we already knew what the next version had to be. If you ask why all the good things we had come up with did not make it into the first version, our answer is that the first version had to be done quickly. The company needed money: without the first version we would not have had enough money to develop the next ones. The next version was significantly better than the first, not just a head above it but many heads above. It made far fewer mistakes, handled difficult cases much better, preserved formatting much better, and had record-setting accuracy for its time.
Of course, we approached development thoughtfully and pictured what an ideal program should look like. We had two advantages at once: font independence and multilingual support.
Multilingual support is simple to explain: many technical texts, even those written in Russian, obviously contain a great many words and terms in Latin script, most often in English. Yet for some reason nobody thought of this at the time, and the first recognition systems understood only one language. We deliberately included support for both Russian and English so that the program could process such texts well. Here we were helped by having Vladimir Selegey on the team, who had considerable experience developing spell-checking tools for different languages. In general, dictionary support has been a strength of our recognition technology ever since.
Font independence (being "omnifont") means that the program does not need to be tuned to recognize each new font; that is, it recognizes characters of practically any size and shape. Our FineReader was the first omnifont program to support Cyrillic. Today we are used to the idea that if a program fails to recognize a font, the font must be very complex or fancy, but back then training was required even for ordinary book fonts. One step to the left or right, and the program could not read even a font it basically knew, for example if the size was different or the image quality was worse.
This is what the box of the first FineReader looked like:
Immediately after the release there was huge interest in the program. Demand was high, and the programs that existed before FineReader did not satisfy it. We were lucky: we were in the right place at the right time.
The first version of FineReader came out in a run of 500 copies. In the first month we sold more than a hundred copies, an epoch-making number for those days! Even sales of Lingvo, which was already very popular at the time and cost several times less, rarely reached 100 copies a month.
Of course, a lot of work still lay ahead to bring the program to the highest level. Incidentally, competition with one Russian company helped us here. In the heat of that rivalry we created a product that turned out to be better than many foreign counterparts.
The release of the second version of FineReader came with another interesting story. FineReader 2.0 was a 32-bit application. We planned its release for spring 1995, intending to be ready just in time for the release of Windows 95 (Microsoft had earlier announced that the new version of Windows would appear in April). The new Windows compared favorably with the old one, and we understood that people would upgrade right away and our sales would take off. At the same time we had a "fallback airfield" in the form of Win32s, an add-on to 16-bit Windows 3.1x that made it possible to run specially adapted 32-bit applications on it. But then two problems arose at once: Microsoft postponed the release of Windows 95 to August, and Win32s version 1.2 turned out to have a bug in Unicode support because of which Russian letters in the interface were not displayed. We urgently needed to contact Microsoft, which at the time was next to impossible: it was the largest monopolist in the software market, on which almost everything in the industry depended, and expecting it to respond to the needs of a small company in faraway Russia, with a market that was tiny by Microsoft's standards, would have been madness.
But a miracle happened: the same problem surfaced at Autodesk, which was a strategic partner of Microsoft. As a result, we and Autodesk were merged into a single support case and assigned a dedicated manager, who started corresponding with us. In the end we managed to agree that the bug would be fixed in version 1.3, which, admittedly, came out at the same time as Windows 95. Before that, we found a workaround: the resulting version did not work correctly under Windows 95, but for the time being it worked under Windows 3.1x.
This is what FineReader 1.3 looked like:
All in all, our risky venture of releasing a 32-bit boxed product cost us a lot of nerves. 16-bit Windows was still widespread, and Win32s was not known for its stability. I remember how we spent nearly a week hunting some terrible bug deep inside Win32s using a kernel debugger over a COM port, in command-line mode. We found the problem (something was working incorrectly in the system memory allocator) and were able to come up with a workaround. But the new FineReader shone on Windows 95 as a native application, and 32-bit mode was very important for an OCR program because it allowed much more efficient work with large amounts of data in memory, which is typical of recognition tasks. This gave us a head start over competitors for many years and largely determined our success in the market for licensing recognition technology.
And here is FineReader 2.0:
The program was installed from four floppy disks:
Of course, you are waiting for screenshots. Here is what the FineReader 3.0 interface looked like:
FineReader became a landmark product for us. With it we entered the international market. Today the program is used by more than 20 million people around the world. And the text recognition technology at the core of FineReader is licensed by the world's largest companies: Microsoft, Samsung, Fujitsu, Panasonic, and many others.
Back then, 22 years ago, we could not have imagined where it would all lead. Today we understand that we were able to achieve such an impressive result thanks to:
• Hard and persistent work. Yes, with the last of our strength, but with enormous drive (remember the 12 to 14 hours a day with no days off).
• The ability to find and create competitive advantages: multilingual support and font independence.
• And courage. We now understand how important it is not to be afraid of obstacles along the way.
This article is a translation of the original post at habrahabr.ru/post/273219/