The robust beauty of improper models

There is no Titanic in the lead image: it sank
- Could you build a statistical model for us?
- With pleasure. May I look at your historical data?
- We have no data yet. But we need the model anyway.

A familiar dialog, isn't it? From here, events can unfold in two ways:

A. "Then come back when the data appears." This option is trivial and will not be considered.
B. "Tell us which factors you think matter most." The rest of the article is about this option.

Below the cut: what improper models are, why their beauty is robust, and what it costs. All of it is illustrated with the long-suffering data set on the survival of Titanic passengers.
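One common reading of "improper model" is a linear score whose weights are not fitted to data: each factor is standardized and given a weight of +1 or -1 by expert judgment. The sketch below assumes that reading. The rows are made-up toy values (not real Titanic records), and the signs and the function names are hypothetical choices for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical toy rows, NOT real Titanic records:
# (is_female, passenger_class, age)
passengers = [
    (1, 1, 29.0),
    (0, 3, 40.0),
    (1, 3, 8.0),
    (0, 1, 55.0),
    (0, 2, 21.0),
]

# Assumed expert opinion: +1 if a larger value is believed to raise
# the chance of survival, -1 if it is believed to lower it.
signs = (+1, -1, -1)


def standardize(column):
    """Rescale a column to zero mean and unit (population) variance."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


def unit_weight_scores(rows, signs):
    """Sum the standardized factors with weights of +1 or -1 (no fitting)."""
    columns = [standardize(col) for col in zip(*rows)]
    return [sum(sign * z for sign, z in zip(signs, zs)) for zs in zip(*columns)]


scores = unit_weight_scores(passengers, signs)
# Passenger indices ordered from highest to lowest score.
ranking = sorted(range(len(passengers)), key=lambda i: scores[i], reverse=True)
```

Note that no outcome column is needed to compute the scores, which is what makes this kind of model usable before any historical data exists.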

The news called us to the road: an ultrafast, energy-efficient optical coprocessor for big data

Last week the news broke: the startup LightOn proposed an alternative to central processors (CPUs) and graphics processors (GPUs) for big data analysis tasks. The group of authors is based at Pierre and Marie Curie University, the Sorbonne, and all the other right places in France. The solution relies on optical analog data processing "at the speed of light". Sounds interesting. Since the press release contained no scientific or technical details, I had to look for information in patent databases and on university websites. The results of the investigation are below the cut.

Under the hood of Redis: the hash table (part 1)

If you know why running 'hset mySey foo bar' costs at least 296 bytes of RAM, why Instagram's engineers do not use string keys, why it is always worth changing hash-max-ziplist-entries/hash-max-ziplist-value, and why the data type underlying the hash is also part of list, sorted set and set, then do not read on. For everyone else, I will try to explain. Understanding how hash tables are built and how they work in Redis is crucial when writing systems where saving memory matters.

What this article covers: what overhead Redis incurs to store a key, what ziplist and dict are, when and why they are used, and how much memory they take. When a hash is stored in a ziplist and when in a dict, and what that gives us. Which tips from fashionable articles on Redis optimization you should not take seriously, and why.
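The choice between ziplist and dict can be pictured with a simplified model: a hash stays in the compact ziplist form while it has no more than hash-max-ziplist-entries fields and no field or value is longer than hash-max-ziplist-value bytes. The sketch below is not Redis code. The function name is hypothetical, and the default limits of 512 and 64 are assumed from the stock redis.conf.

```python
def hash_encoding(fields, max_entries=512, max_value=64):
    """Predict the encoding of a Redis hash under a simplified model.

    fields      -- dict mapping field names to values (both str)
    max_entries -- stands in for hash-max-ziplist-entries (assumed default)
    max_value   -- stands in for hash-max-ziplist-value (assumed default)
    """
    if len(fields) > max_entries:
        return "hashtable"
    for field, value in fields.items():
        # Redis compares byte lengths, so encode before measuring.
        if len(field.encode()) > max_value or len(value.encode()) > max_value:
            return "hashtable"
    return "ziplist"


small = hash_encoding({"foo": "bar"})
long_value = hash_encoding({"foo": "x" * 65})
many_fields = hash_encoding({str(i): "v" for i in range(513)})
```

On a live server the actual encoding of a key can be checked with the OBJECT ENCODING command. A real server also never converts a hash back from dict to ziplist, which this model ignores.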

Data Science Skills

We continue our series of analytical studies of the demand for skills in the labor market. This time, thanks to Pavel Surmenk of sharky, we will look at a new profession: Data Scientist.

In recent years the term Data Science has been gaining popularity. It is written about a lot and discussed at conferences. Some companies even hire people for a position with the impressive title of Data Scientist. What is Data Science? And who are Data Scientists?

A check digit by Damm's method

A check digit is often added to identifiers that people may write down or pass on with errors, so that those errors can be detected later.

Examples include the last digit of a credit card number, the ninth digit of the VIN of cars sold in the USA, and the last digit of an ISBN.

Damm's check digit algorithm is fairly new and therefore little known. It was published in 2004.

The algorithm detects all single-digit errors and all single transpositions of adjacent digits. It is much simpler than Verhoeff's algorithm, which is comparable in capability, and it does not require special characters (such as the X in a 10-digit ISBN).
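A minimal sketch of the Damm scheme follows. It assumes the order-10 operation table that is commonly published for the algorithm; the article itself may use a different table, and the function names are illustrative. The running value starts at 0, each digit selects a column in the row given by the running value, and the final running value is the check digit. A number with its check digit appended always ends at 0.

```python
# Commonly published Damm operation table (assumed here, not taken
# from the article): a quasigroup of order 10 with a zero diagonal.
DAMM_TABLE = (
    (0, 3, 1, 7, 5, 9, 8, 6, 4, 2),
    (7, 0, 9, 2, 1, 5, 4, 8, 6, 3),
    (4, 2, 0, 6, 8, 7, 1, 3, 5, 9),
    (1, 7, 5, 0, 9, 8, 3, 4, 2, 6),
    (6, 1, 2, 3, 0, 4, 5, 9, 7, 8),
    (3, 6, 7, 4, 2, 0, 9, 5, 8, 1),
    (5, 8, 6, 9, 7, 2, 0, 1, 3, 4),
    (8, 9, 4, 5, 3, 6, 2, 0, 1, 7),
    (9, 4, 3, 8, 6, 1, 7, 2, 0, 5),
    (2, 5, 8, 1, 4, 3, 6, 7, 9, 0),
)


def check_digit(number: str) -> int:
    """Return the Damm check digit for a string of decimal digits."""
    interim = 0
    for ch in number:
        interim = DAMM_TABLE[interim][int(ch)]
    return interim


def is_valid(number_with_check: str) -> bool:
    """A number that already includes its check digit must reduce to 0."""
    return check_digit(number_with_check) == 0


digit = check_digit("572")      # 4
ok = is_valid("5724")           # True
swapped = is_valid("5274")      # False: adjacent digits transposed
mistyped = is_valid("5734")     # False: one digit wrong
```

Because the diagonal of the table is zero, validation needs no separate step: the same loop that computes the check digit also verifies it.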

sin 1° on a calculator

A Casio calculator
An important clarification: the calculator is an ordinary one, without a sin button. The kind you find in an accounting office or at a market stall.

Below the cut: three different solutions from different eras, from ancient Samarkand to the USA of the Cold War.
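To make the task concrete, here is one possible approach that needs only the four arithmetic operations: a truncated Taylor series for the sine. It is a swapped-in illustration and not necessarily one of the three solutions the article describes. The function name and the number of terms are arbitrary choices, and the value of pi has to be typed in by hand, as it would be on such a calculator.

```python
# pi typed in by hand: the calculator has no pi key either.
PI = 3.141592653589793


def sin_small(degrees, terms=4):
    """Approximate sin of a small angle using only +, -, * and /.

    Sums the first `terms` terms of x - x^3/3! + x^5/5! - ...
    where x is the angle in radians.
    """
    x = degrees * PI / 180
    term = x
    total = x
    for k in range(1, terms):
        # Each term is the previous one times -x^2 / ((2k)(2k+1)).
        term = -term * x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


approx = sin_small(1)   # roughly 0.0174524
```

For an angle as small as 1° the series converges very quickly, so even two terms already give about nine correct decimal places.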

Announcement of online courses from Technopark, Technosphere and Tekhnotrek on Stepic

Good news for everyone who has no opportunity to study at Technopark, Technosphere or Tekhnotrek: courses from these projects are now available as online courses on the Stepic platform! Enrollment is open today for five disciplines:

More courses will be added over time.

Why are we doing this? The reason is simple: not everyone can join our projects, since only students of three Moscow universities can take part. Thanks to online learning, many other talented students will also be able to gain knowledge that is so useful to beginning IT specialists. In the online courses, students will be able to watch the content that interests them and complete practical assignments to check what they have learned. They will also be able to communicate with each other, discuss assignments and ask questions online. A certificate will be issued on successful completion of a course. And now, more about the available disciplines.

Practical aspects of automatic generation of unique texts for SEO

The scariest horror story for anyone who wants to post computer-written content on their websites is search engine sanctions. We too were once frightened that a site with non-unique and/or generated texts would be indexed poorly or would be banned outright. At the same time, nobody could tell us the exact requirements for the texts. In general, the topic of unique content and its role in website promotion resembles occult knowledge. Every new "specialist" promises to reveal the terrible truth on the next page, but the truth is never revealed, and many forum discussions boil down to the claim that Yandex, say, will recognize generated content by magic. Not in those words, but that is the gist.

Since customers recently came to us with the task of creating product descriptions for their website, we decided to study this question in more detail. What algorithms exist for detecting automatically written texts, what properties must a text have so as not to be recognized as web spam, and what tools can generate it?

Writing a tactical number game for Android

When I first took up programming (3 months ago), I quickly realized that it is best to start working on projects right away. You cannot sit over books or courses all day long, but once you start making something of your own, you easily stay up developing from morning till morning.

This article is a small tutorial on how to make a logic game with a bot. The game will look like this:

* I will describe the rules in detail once more in the section about the AI.

I loosely divide the readers of this article into three groups.
  1. You started programming a few hours ago.
    It will be hard for you. It is better to first complete a short introductory course on Android development and get comfortable with two-dimensional arrays and interfaces. Then download the project from GitHub. The comments and this article will help you understand how everything works.
  2. You can already program, but you cannot call yourself experienced yet.
    It will be interesting for you because you will be able to make your own game very quickly. I have done the dirty work of creating the game logic and the UI component, and I leave the creative part to you. You can make another game mode (2 on 2, online, etc.), change the bot's algorithms, create levels, and so on.
  3. You are experienced.
    You may find it interesting to think about the AI: writing it is not as easy as it seems at first glance. I would also be very glad to receive your comments on the code. I am sure I did not do everything optimally.

Summarization system for three languages

I want to tell you about a service I developed for summarizing news texts in English, Russian and German.

Automatic summarization systems (SAR) are a fairly specific topic and will mostly interest those who study natural language processing. Still, an ideally performing summarizer could become a useful assistant in areas where one needs to cope with information overload and quickly decide which information deserves further consideration.

