2 years, 10 months ago
Hello, Habr! We recently received an order from "News" to research public opinion about the film "Star Wars: The Force Awakens", which premiered on December 17. To do this, we decided to analyze the sentiment of the Russian-language segment of Twitter across several relevant hashtags. The result was expected from us within 3 days (and at the end of the year, no less!), so we needed a very fast method. We found several similar online services (among them sentiment140 and tweet_viz), but it turned out that they do not work with Russian and, for some reason, analyze only a small percentage of tweets. The AlchemyAPI service would have helped, but its limit of 1000 requests a day did not suit us either. So we decided to build our own sentiment analyzer, with blackjack and all the rest, by creating a simple recurrent neural network with memory. The results of our research were used in an article that "News" published on January 3.
In this article I will say a little about such networks and introduce a couple of cool tools for home experiments that let even school students build neural networks of any complexity in a few lines of code. Read on below the cut.
2 years, 10 months ago
At the end of November, Yandex announced an updated weather service called "Yandex.Meteum". Allegedly, the new software and its algorithms can calculate a forecast accurate down to an individual building. As an amateur meteorologist, I could not help being interested in the new product. I have always respected Yandex, even despite the unsuccessful relaunch of Kinopoisk, but after studying the announcement published on Habrahabr in detail, I found a number of inconsistencies and logical errors in it. I then decided to test the accuracy of the new service against other weather resources, namely my own website "Weather 45" (a weather forecast for Kurgan) and Foreca (the underlying resource from which Yandex takes its data).
Below I rely on the extended announcement published on Habrahabr. Let's examine the inconsistencies and logical inaccuracies I found in it.
2 years, 10 months ago
Today machines can effortlessly "string two words together" (1, 2), but they cannot yet reliably hold a conversation on general topics. Tomorrow, however, you will be asking them to put together a proper summary and to pick the best chess club near your home for your children. Want to understand in more detail how researchers from Facebook, Google, and others are working in this direction? Come and listen to them.
2 years, 10 months ago
I want to share my experience with a well-known machine learning competition on Kaggle. It is positioned as a competition for beginners, and I had no practical experience in this area. I knew a little theory, but had hardly dealt with real data and had not worked closely with Python. In the end, after spending a couple of evenings around New Year's Eve, I scored 0.80383 (the top quarter of the leaderboard).
In short, this article is for those who have yet to begin, from someone who already has.
On December 26 our FlyElephant team will take part in the Hub AI&BigData meetup, devoted to big data and artificial intelligence. The event will take place in Odessa and will begin at 11:00. An online broadcast will be organized for everyone who cannot attend in person.
2 years, 11 months ago
Hi, Habr! This article is about a not particularly pleasant aspect of machine learning: hyperparameter optimization. Two weeks ago the vw-hyperopt.py module, which can find good hyperparameter configurations for Vowpal Wabbit models in high-dimensional spaces, was merged into the well-known and very useful Vowpal Wabbit project. The module was developed at DCA (Data-Centric Alliance).
To search for good configurations, vw-hyperopt uses algorithms from the Python library Hyperopt and can optimize hyperparameters adaptively using the Tree-Structured Parzen Estimators (TPE) method. This lets it find a better optimum than simple grid search with the same number of iterations.
This article will interest everyone who works with Vowpal Wabbit, especially those who were annoyed that the source code offers no way to tune the models' numerous knobs, and who either tuned them by hand or coded the optimization themselves.
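To make the "same number of iterations" comparison concrete, here is a minimal standard-library sketch. It does not use Vowpal Wabbit or Hyperopt, and it does not implement TPE: it only contrasts a fixed grid with plain random sampling under an equal evaluation budget. The objective `toy_loss`, its two parameters, and their ranges are hypothetical stand-ins for a real validation loss; an adaptive method such as TPE would replace the uniform sampling step with proposals informed by earlier trials.

```python
import itertools
import random


def toy_loss(log_lr, l2):
    # Hypothetical validation loss with its minimum at log_lr = -2.3, l2 = 0.07.
    return (log_lr + 2.3) ** 2 + 10.0 * (l2 - 0.07) ** 2


def grid_search(points_per_axis):
    # Evaluate every combination on an evenly spaced grid over both ranges.
    steps = range(points_per_axis)
    log_lrs = [-5.0 + 5.0 * i / (points_per_axis - 1) for i in steps]
    l2s = [1.0 * i / (points_per_axis - 1) for i in steps]
    trials = [(toy_loss(a, b), a, b) for a, b in itertools.product(log_lrs, l2s)]
    return min(trials), len(trials)


def random_search(n_trials, seed=0):
    # Sample each parameter uniformly from the same ranges the grid covers.
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        a = rng.uniform(-5.0, 0.0)
        b = rng.uniform(0.0, 1.0)
        trials.append((toy_loss(a, b), a, b))
    return min(trials), len(trials)


# Give both strategies the same evaluation budget (5 x 5 = 25 trials).
grid_best, grid_evals = grid_search(5)
rand_best, rand_evals = random_search(grid_evals)
print("grid  :", grid_best, "evaluations:", grid_evals)
print("random:", rand_best, "evaluations:", rand_evals)
```

The grid can only ever land on its predefined points, so its best result is limited by the grid spacing, while sampling-based methods can place trials anywhere in the range. Which one wins on a given run depends on the objective and the seed.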
Machine learning is about finding hidden patterns in data. The growing interest in this topic in the IT community stems from the exceptional results it has delivered. Speech recognition, recognition of scanned documents, search engines: all of these are built using machine learning. In this article I will describe a current project at our company: how to apply machine learning methods to improve DBMS performance. The first part of the article examines the existing mechanism of the PostgreSQL planner; the second part discusses how it can be improved using machine learning.
2 years, 11 months ago
Windows is so evil that consumes extra energy to make the things running.
The XGBoost library is making noise at every machine learning competition and helps win prizes. However, getting this package working for Python under Windows is not so simple.
The installation process is poorly described on GitHub and only slightly better on the Kaggle forum. So I will try to describe it step by step in more detail. I hope this saves novice users a lot of time.
So-called machine learning never ceases to amaze, yet for mathematicians the reason for its success is still not entirely clear.
A few years ago, at a dinner I had been invited to, the distinguished differential geometer Eugenio Calabi volunteered to let me in on the subtleties of a rather ironic theory about the difference between pure and applied mathematicians. When they hit a dead end in their research, pure mathematicians often narrow the problem, trying to get around the obstacle that way. Their colleagues in applied mathematics, by contrast, conclude that the situation shows the need to study more mathematics in order to build more effective tools.
I have always liked this view; it makes clear that applied mathematicians will always find use for the new concepts and structures that keep emerging in fundamental mathematics. Today, with the study of "big data" on the agenda (data sets too large or complex to be understood with traditional data-processing methods alone), this tendency is as relevant as ever.
This post is about the quality of the air we breathe. It is generally assumed that the air in big cities is unhealthy. That is understandable: you have traffic, factories, and who knows what else. All of this keeps the residents of a metropolis in constant worry about an "adverse ecological situation".