We thank Kirill Malev of the Merku company for preparing this article. Kirill has been applying machine learning in practice to data of various sizes for more than three years. At the company he works on customer churn prediction and natural language processing, paying close attention to commercializing the results. He holds a master's degree from the University of Bologna and NGTU.
Today we will show how to use the Azure cloud platform in practice to solve machine learning problems, using the popular task of predicting which Titanic passengers survived as the example.
We all remember the well-known picture about the owl, so every step in this article is commented in detail. If any step is unclear, you can ask questions in the comments.
1 year, 10 months ago
Mobile operators charge strikingly different prices for mobile Internet in different regions, but many of these plans either work across all of Russia right away or require an additional option, which can still be cheaper than using the region's own plans.
Sometimes I have to rely exclusively on mobile Internet for a week, and in another region at that. That week uses about 3 gigabytes of traffic, although usually a gigabyte a month is enough for me.
I would like a single SIM card, for both trips and everyday use, with the cheapest traffic, but which operator and which plan or package should I choose? That is what I tried to find out. As you can see, the SIM card will be used outside the home region all the time, so this comparison does not claim to be complete: I considered only the options that interested me. Note that calls and SMS did not interest me at all: I do not make calls, and nobody calls me.
Let's begin with Megafon's Internet packages. Unfortunately, all Megafon packages work only in the home region, except Moscow, and the "All Russia" option costs a hefty 10 rubles per day, so I ruled it out immediately. The basic plans are also rather expensive, so I did not consider them.
1 year, 10 months ago
Analysts sometimes need to answer questions such as "how many sites use WordPress and how many use Ghost", "what is the coverage of Google Analytics compared with Metrica", or "how often does site X link to site Y". The most honest way to answer them is to walk through every page on the Internet and count. The idea is not as crazy as it may seem. The Commoncrawl project publishes a fresh dump of the Internet every month as gzip archives with a total size of about 30 TB. The data is stored on S3, so MapReduce from Amazon is usually used to process it. There are plenty of instructions on how to do that. At the current dollar exchange rate, however, this approach has become somewhat expensive. I would like to share a way to roughly halve the cost of the computation.
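To make the "walk through the pages and count" idea concrete, here is a minimal sketch of the counting step only. It is not the method from the article: it assumes the HTML of each page is already available as a string, and it guesses the engine from the `generator` meta tag, which many sites omit or fake. The function names `detect_cms` and `count_cms` are hypothetical, and the regular expression assumes the `name` attribute comes before `content`.

```python
import re
from collections import Counter

# Matches <meta name="generator" content="...">.
# Assumption: the name attribute appears before the content attribute.
GENERATOR_RE = re.compile(
    r'<meta[^>]*name=["\']generator["\'][^>]*content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def detect_cms(html):
    """Guess a lowercase engine name from the generator meta tag, or return None."""
    match = GENERATOR_RE.search(html)
    if match is None:
        return None
    parts = match.group(1).split()
    if not parts:
        return None
    # "WordPress 4.2.2" -> "wordpress"
    return parts[0].lower()


def count_cms(pages):
    """Count detected engine names over an iterable of HTML documents."""
    counts = Counter()
    for html in pages:
        name = detect_cms(html)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample_pages = [
        '<html><head><meta name="generator" content="WordPress 4.2.2"></head></html>',
        '<meta name="generator" content="Ghost 0.6">',
        "<html><body>no generator tag here</body></html>",
    ]
    print(count_cms(sample_pages))
```

In a real run the same per-page function would be applied to the records read from the crawl archives, with the per-worker counters summed at the end.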
Contrary to its name and the first impression most people get, "Big Data" is not simply "big data", and it does not even cover every data set with unlimited (or constantly updated and growing) data.
In fact, "Big Data" is first of all the approaches, tools and methods for processing that data, which in turn is most often unstructured, diverse and heterogeneous.
And most importantly, "Big Data" is a new section of the HighLoad++ program in 2015, first proposed, by the way, at a meeting of speakers. The first isolated talks appeared in previous years:
Last time we helped run a hackathon on open data, at which several interesting services were in fact conceived and implemented. Now we hasten to announce that a very large-scale all-Russian data analysis event is starting. We will try to help the Russian Government Analytical Centre and the Open Government make this event interesting and engaging. Last time we almost managed it. Clearly, the level of such events for data analysis specialists is far from what we write about and what we work on. However, we believe it is better to try once more to improve the situation than to do nothing.
1 year, 11 months ago
The extensive growth in the amount of unstructured data (tweets, posts, comments, photos and video) generated by humanity is both a fantastic opportunity and a headache for many old and new industries.
As promised, I continue publishing walkthroughs of the problems I solved while working with the folks from MLClass.ru. This time we will look at principal component analysis, using the well-known Digit Recognizer digit recognition problem from the Kaggle platform as the example. The article will be useful to beginners who are just starting to study data analysis. By the way, it is not too late to sign up for the Applied Data Analysis course and get a chance to level up in this field as quickly as possible.