Developers Club geek daily blog

Evolution of data structures in Yandex.Metrica

1 year, 2 months ago
Today Yandex.Metrica is not only a web analytics system but also AppMetrica, an analytics system for applications. The input to Metrica is a stream of data: events that take place on websites or in applications. Our task is to process this data and present it in a form suitable for analysis.



But processing the data is not the problem. The problem is how, and in what form, to store the results of processing so that they are convenient to work with. During development we had to completely change our approach to organizing data storage several times. We began with MyISAM tables, then used LSM trees, and eventually arrived at a column-oriented database. In this article I want to explain what forced us to do so.

Yandex.Metrica has been running since 2008, more than seven years. Each change of approach to data storage happened because a given solution worked too poorly: it had too little performance headroom, was insufficiently reliable and caused many operational problems, used too many computing resources, or simply did not let us implement what we wanted.

Read more »


Data Festival at the Museum of Moscow, or how Big Data helps us live and work

1 year, 2 months ago


Hi Habr,

If you have long wondered how Big Data is applied in different areas of business, science and public administration, and wanted to hear about it from the people who actually do it, then welcome to the Data Festival, which will take place on December 19 at the SMIT high technology exhibition in the Museum of Moscow.

Over several hours of the Festival's business program, leading industry experts from Yandex, the Beeline "School of Data", Data-Centric Alliance, Avito, the state unitary enterprise "NI and PI of the General Plan of Moscow" and the Higher School of Economics National Research University will tell exhibition guests about the prospects for data analysis over the next several years.

Read more »


How to get into the president's dacha at five o'clock in the morning

1 year, 2 months ago
This post is about how an ordinary hack turned into pangs of conscience and mental anguish. There will be little source code and more photos and analysis. So, a certain Vasya works as a "bad guy". Vasya has fallen so far that he makes his living by searching for and analyzing information whose access was compromised through incompetent maintenance, sloppiness, or savings on service personnel.


Read more »


Studying the graph-oriented Neo4j DBMS using the Wordnet lexical database as an example

1 year, 2 months ago
Neo4j is a NoSQL database designed for storing graphs. The product's highlight is its declarative query language, Cypher.

Cypher borrowed keywords such as WHERE and ORDER BY from SQL, and syntax from languages as different as Python, Haskell and SPARQL. The result is a language that lets you write graph queries in a visual form resembling ASCII art. For example, I would represent the title of this article as the graph (Neo4j)-[we study]->(Wordnet). And that is almost a ready database query!
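To make the ASCII-art idea concrete, here is a minimal sketch in Python that only assembles a Cypher MATCH query as a string for one directed relationship. The function name match_query and the labels Article, STUDIES and Database are hypothetical, chosen for illustration and not taken from the article; nothing is sent to a database.

```python
def match_query(start_label: str, rel_type: str, end_label: str, limit: int = 10) -> str:
    """Build a Cypher query matching (start)-[rel]->(end) patterns.

    Nodes are written in parentheses and the relationship in square
    brackets between dashes; the arrow head gives the direction.
    All labels passed in are illustrative assumptions.
    """
    return (
        f"MATCH (a:{start_label})-[r:{rel_type}]->(b:{end_label}) "
        f"RETURN a, r, b LIMIT {limit}"
    )


if __name__ == "__main__":
    # Hypothetical labels, used only to show the shape of the query.
    print(match_query("Article", "STUDIES", "Database", limit=5))
```

The string produced has the same visual shape as the pattern in the text: two nodes in parentheses joined by an arrow that carries the relationship.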


Read more »


Testing the six handshakes theory

1 year, 2 months ago

Read more »


Using Google Analytics in games

1 year, 2 months ago
While developing the game SUPERVERSE we needed a way to track how players interact with the game and to collect data about their hardware, display resolution, operating system and so on. This data would be useful not only during debugging but would also help us study how users behave in the game.


Read more »


AeroState: monitoring and forecasting air quality in Moscow (and beyond)

1 year, 2 months ago
Hi, Habr!

This post is about the quality of the air we breathe. It is commonly believed that the air in big cities is generally unhealthy. That is understandable: there is traffic, there are factories, and who knows what else. All this keeps residents of the metropolis permanently worried about "the unfavorable environmental situation".


However, things are slightly more complicated.

Read more »


The robust beauty of improper models

1 year, 2 months ago
There is no Titanic in the header picture: it sank
- Could you build us a statistical model?
- With pleasure. May I look at your historical data?
- We have no data yet. But we need the model anyway.

A familiar dialogue, isn't it? From here, events can unfold in two ways:

A. "Then come back when you have data." This option is trivial and will not be considered.
B. "Tell me which factors you think are most important." The rest of the article is about this.

Below the cut: what improper models are, why their beauty is robust, and what it costs. All of it is illustrated with the long-suffering data set on the survival of Titanic passengers.

Read more »


A news item sent us on a journey: a superfast, energy-efficient optical coprocessor for big data

1 year, 2 months ago


Last week news broke on Phys.org: the startup LightOn proposed an alternative to central processors (CPU) and graphics processors (GPU) for big data analysis tasks. The group of authors is based at Pierre and Marie Curie University, the Sorbonne and all the other right places in France. The solution is based on optical analog data processing "at the speed of light". Sounds interesting. Since the press release contained no scientific or technical details, we had to look for information in patent databases and on university websites. The results of the investigation are below the cut.

Read more »


Data Science Skills

1 year, 2 months ago


We continue our series of analytical studies of the demand for skills in the labor market. This time, thanks to Pavel Surmenk of sharky, we will look at a new profession: Data Scientist.

In recent years the term Data Science has been gaining popularity. Much is written about it, and it is discussed at conferences. Some companies even hire people for a position with the resonant title of Data Scientist. What is Data Science? And who are Data Scientists?

Read more »