But if you thought that our "affair with Tolstoy" ended there, you were mistaken: having digitized the writer's texts, we began to study them with the ABBYY Compreno information extraction technology, so that such rich material would not go to waste. Read on to find out what "Tolstoy text mining" gave us and where the results are now used.
The main goal of the All Tolstoy in One Click project was to make Tolstoy's work truly public property, so that every text that came from his pen would be available in one click anywhere on Earth. That, by the way, is what the author himself bequeathed, having renounced all rights to his texts during his lifetime (yes, anonymous, Lev Tolstoy knew about copyleft and open data long before this Internet of yours and Richard Stallman).
However, the ability to load a book in a convenient format onto an e-reader or tablet is not the only benefit of digitization. Tolstoy's texts can now be not only read but also "measured", that is, studied with various quantitative methods using the full arsenal of automatic text processing tools (AOT in Russian, also known as NLP). If you have all of a writer's texts in electronic form, even one or two well-crafted search queries can yield curious data that in other times would have cost a literary scholar weeks or months of persistent work. And if you also have an advanced natural language analysis technology, you have a chance of making a serious philological discovery (even without being a philologist). Below I will tell you what we managed to measure and learn, but first a few words about who works on automatic processing of literary texts, why they do it, and what interesting results can come of it.
1 year, 8 months ago
ABBYY FineReader is a text recognition program that many people in Russia have known since their student days. This year FineReader turns 22; it is slightly younger than our Lingvo dictionary. How did it happen that young programmers from BIT Software (as ABBYY was called at the time) took up text recognition alongside the dictionary? And what helped "Fine" become one of the most recognized programs on the market?
Actually, it is all quite logical. Without Lingvo, FineReader might never have existed. Everything began with a large-scale and ambitious suite called Lingvo Systems. With it, a person could scan a text in one language, run it through the program, and get a translation: a rough one, but good enough to understand the meaning.
1 year, 8 months ago
Hi! Last time we described how ABBYY Compreno, our technology for understanding and analyzing natural-language texts, is built. Many people ask us how long a technology can stay in development and where, at last, the Compreno-based products are. As promised, today's post is about the products and the business problems they already solve.
Our technology can serve as the basis for a range of solutions for different types of tasks. But our focus is the corporate market: companies that need to extract significant information from large volumes of data quickly. This direction is promising for us both in terms of customer demand for such technologies and in terms of the fastest return on our investment in the technology.
We should note right away that Compreno-based solutions are application or technology modules that can be embedded in other solutions, adding new capabilities to them.
It may surprise some of you, but text like what you see in the picture (it is Burmese) can be recognized too. Some time ago an amusing comic about the differences between Asian languages was making the rounds on the Internet, but it is too indecent to publish on a corporate blog :) Read on to learn why we needed to recognize Burmese and what problems we ran into along the way.
1 year, 8 months ago
As you know, ABBYY is developing Compreno, a natural language analysis technology. The system currently works with English and Russian and is actively used in many projects. However, the technology was conceived as multilingual from the start, so we also devote a lot of attention to "teaching" it other languages. An analogy with people applies here: after you learn one foreign language, the others come more easily. In particular, we are now adding German to the technology and, in parallel, exploring the market to see whether there is interest in this direction. We should add a caveat right away: there is no talk yet of products supporting German; we are at the very beginning of the road.
Today we want to tell you that we have decided to take part in the Habrahabr project "Help for Startups". The idea is that IT startups with income of no more than 20 million rubles a year that pass our selection will get free access to our recognition technologies, and the winners of the project will also get marketing support. Read all the details here; applications are accepted until November 22, 2015.
I hope you found yesterday's post about the ABBYY Compreno information extraction system interesting. In it we covered the system architecture, the semantic-syntactic parser and its role, and, most importantly, information objects.
In the second part of the article you will learn how the information extraction engine works.
My name is Ilya Bulgakov, and I am a programmer in the information extraction department at ABBYY. In a series of two posts I will tell you our main secret: how Information Extraction technology works in ABBYY Compreno.
Earlier my colleague Danya Skorinkin (DSkorinkin) described the system from an ontology engineer's point of view, touching on the following subjects:
This time we will dig deeper into the ABBYY Compreno technology: we will talk about the system architecture as a whole, the basic principles of its operation, and the information extraction algorithm!
Today we will tell you how one of our clients, Aviasales, uses the service. It is one of the largest airline ticket search engines in the world (the company operates in foreign markets under the name JetRadar); about 10 million people use it each month, and the number of search sessions per day approaches one million. Like any self-respecting service, Aviasales released mobile apps for searching for and buying air tickets long ago.
1 year, 10 months ago
Many beginner developers face the need to show some existing software development experience. Even if the vacancy is titled "programmer without experience", it is clear even without Captain Obvious that preference will go to the applicant who has shown reasonably coherent code that more or less confirms skills useful for the job.
What to do? Show a solution to a standard assignment from early coursework, or a degree project? Perhaps that is not enough? Perhaps write some impressive program and gain development experience at the same time? For example… what do people usually write to gain and show experience, a calculator or a compiler? And what if you lack the strength to finish?
There is one almost universal answer to these questions and doubts: contributing to open-source projects. You get the opportunity to solve real problems of widely varying difficulty, and the result will not only work for hundreds, thousands, or millions of people around the world but can also be shown to others. This opportunity is mentioned from time to time, but usually nobody dwells on it. This time, we will.