Developers Club geek daily blog

Predicting the survival of Titanic passengers with Azure Machine Learning

1 year, 8 months ago
Many thanks to Kirill Malev from the Merku company for preparing this article. Kirill has spent more than three years applying machine learning in practice to data sets of different sizes. At the company he works on customer churn prediction and natural language processing, paying close attention to commercializing the results. He completed master's programs at the University of Bologna and NGTU.

Today we will show how to use the Azure cloud platform in practice to solve machine learning problems, using the popular task of predicting which Titanic passengers survived as an example.

We all remember the well-known picture about the owl, so every step in this article is commented in detail. If any step is unclear, you can ask questions in the comments.




Comparison of mobile Internet tariffs in different regions

1 year, 8 months ago
Mobile operators have strikingly different Internet tariffs in different regions, but many of these tariffs either work across all of Russia right away or require an additional option, which can still be cheaper than using the tariffs of the local region.

Sometimes I have to rely exclusively on mobile Internet for a week, and in another region at that. During such a week I use about 3 gigabytes of traffic, although a gigabyte a month is usually enough for me.

I wanted a single SIM card for both trips and daily use, with the cheapest possible traffic, but which operator and which tariff or package should I choose? That is what I tried to find out. As you can see, the SIM card will be used outside its home region all the time, so this comparison does not claim to be complete: I considered only the options that interested me. Note that calls and SMS did not interest me at all, since I do not make calls and nobody calls me.

Megafon


Let's begin with Megafon's Internet packages. Unfortunately, all Megafon packages work only in the home region, except Moscow, and the "All Russia" option costs a hefty 10 rubles per day, so I ruled it out immediately. The basic tariffs are also rather expensive, so I did not consider them.




How to grep the Internet

1 year, 8 months ago
Analysts sometimes need to answer questions such as "how many sites use WordPress, and how many use Ghost?", "what is the coverage of Google Analytics, and what is that of Metrica?", or "how often does site X link to site Y?". The most honest way to answer them is to walk through every page on the Internet and count. This idea is not as crazy as it may seem. The Common Crawl project publishes a fresh dump of the Internet every month as gzip archives with a total size of about 30 TB. The data is stored on S3, so Amazon's MapReduce is usually used to process it, and there are plenty of instructions on how to do that. But at the current dollar exchange rate this approach has become rather expensive. I would like to share a way to cut the cost of the computation roughly in half.
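
To make the "walk through the pages and count" idea concrete, here is a minimal sketch in Python. It is not the method from the article: it assumes, purely for illustration, that a site's engine can be guessed from the generator meta tag in its HTML, and it works on a few hard-coded sample pages instead of real Common Crawl archives. The names detect_cms, count_cms, and SAMPLE_PAGES are hypothetical.

```python
import re
from collections import Counter

# Assumption: the engine announces itself in <meta name="generator" content="...">
# and the name attribute comes before content. Real pages vary much more.
GENERATOR_RE = re.compile(
    r'<meta[^>]*name=["\']generator["\'][^>]*content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def detect_cms(html):
    """Return 'wordpress', 'ghost', or None based on the generator meta tag."""
    match = GENERATOR_RE.search(html)
    if match is None:
        return None
    generator = match.group(1).lower()
    for cms in ("wordpress", "ghost"):
        if generator.startswith(cms):
            return cms
    return None


def count_cms(pages):
    """Count how many of the given HTML documents use each detected engine."""
    counts = Counter()
    for html in pages:
        cms = detect_cms(html)
        if cms is not None:
            counts[cms] += 1
    return counts


# Hypothetical sample pages standing in for records read from a crawl dump.
SAMPLE_PAGES = [
    '<html><head><meta name="generator" content="WordPress 4.2.2" /></head></html>',
    '<html><head><meta name="generator" content="Ghost 0.6" /></head></html>',
    "<html><head><META NAME='generator' CONTENT='WordPress 4.1'></head></html>",
    "<html><head><title>No generator tag</title></head></html>",
]

if __name__ == "__main__":
    for cms, count in count_cms(SAMPLE_PAGES).most_common():
        print(cms, count)
```

On a real dump the same counting function would be applied to each page extracted from the archives, and the per-archive counters would be summed at the end, which is what makes the task fit the MapReduce model.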



Visualization of static and dynamic networks in R, part 5

1 year, 8 months ago
In the first part:
  • network visualization: why and how?
  • visualization parameters
  • best practices: aesthetics and performance
  • data formats and preparation
  • a description of the data sets used in the examples
  • getting started with igraph

In the second part: colors and fonts in R plots.

In the third part: parameters of graphs, vertices, and edges.

In the fourth part: network layouts.

In this part: highlighting properties of the network, vertices, edges, and paths.



Big Data and Machine Learning? Come to HighLoad++

1 year, 9 months ago


Contrary to the name and the first impression most people get, "Big Data" is not simply "big data", nor does it cover every data set of unlimited (or constantly updated and growing) size.

In fact, "Big Data" is first of all the approaches, tools, and methods for processing such data, which in turn is most often unstructured and heterogeneous.

And most importantly, "Big Data" is a new section of the 2015 HighLoad++ program, first proposed, by the way, at a speakers' meeting. The first isolated talks appeared in previous years:




All-Russian competition "Open Data"

1 year, 9 months ago
Hi, Habr!



Last time we helped run a hackathon on open data, at which several interesting services were in fact conceived and implemented. Now we are glad to announce the start of a very large-scale all-Russian data analysis event. We will try to help the Russian Government Analytical Centre and the Open Government make this event interesting and engaging. Last time we almost managed it. Clearly, for data analysis specialists the level of such events is far from what we write about and work on. However, we believe it is better to try once more to improve the situation than to do nothing.



A thousand and one blisters. Searching for overpriced drugs

1 year, 9 months ago



How many tweets does it take to learn your personality?

1 year, 9 months ago
The extensive growth in the amount of unstructured data (tweets, posts, comments, photos, and video) generated by humanity brings both fantastic opportunities and a headache for many old and new industries.

The other day we already reported figures on the number of messages humanity produces per day. Clearly, billions of statements demand entirely different solutions and technologies. "Old" approaches (horror: only 3-5 years have passed, and they are already old) and the people developing them are fighting for a place in the sun. But …


As a classic example, we present a translation of a recent piece from the IBM Watson division:



How I took part in Sberbank's customer churn prediction competition

1 year, 9 months ago



Analysis of the Digit Recognizer problem from the Kaggle competition

1 year, 9 months ago
Hi, Habr!



As promised, I continue publishing analyses of the problems I solved while working with the folks from MLClass.ru. This time we will look at principal component analysis, using the well-known Digit Recognizer digit recognition problem from the Kaggle platform as an example. The article will be useful to beginners who are just starting to study data analysis. By the way, it is not too late to sign up for the Applied Data Analysis course and get the chance to level up in this field as quickly as possible.
