2 years, 5 months ago
In the last few years I have taken a strong interest in pricing in Russian online stores. Every time a store announces a big discount, doubt creeps in: is the discount really that big? Was the crossed-out price ever the actual price? The sharp swings in the dollar exchange rate at the end of 2014 added fuel to the fire. I badly wanted to find out how prices really depend on the dollar rate. In the end I decided to settle these questions by collecting price-change history from Russian online stores. Under the cut: the results of that work plus several interesting patterns.
2 years, 5 months ago
Standard plan of any hackathon ↓
This weekend Microsoft is hosting a machine learning hackathon. Participants will have two days to go properly short on sleep and make the world a better place.
This article moves at the same brisk pace at which, I believe, the hackathon itself will pass for most participants. No filler (though if you are not familiar with Azure ML, it is better to read the "filler" or some introductory material first, otherwise things will be unclear), no long definitions, and no long introductions like this one from here on: only what you need to win the hackathon.
IBM's cognitive system Watson can now discuss people's various problems no worse than an expert. A team of specialists from the Georgia Institute of Technology (USA), together with IBM representatives, spent six months training the cognitive system to understand the world around it and to find solutions to some important problems.
Each of the six teams working on the project prepared 200 different questions. To be able to hold a dialogue, IBM Watson had to study several subjects that were new to it. For example, the cognitive system studied several hundred biology articles from the Biologue article repository. After Watson was trained, the teams began asking it questions about architecture, telecommunications and computing hardware. Watson had to formulate a definite answer using the material it had studied.
2 years, 6 months ago
If you know why the simple string 'strings' in Redis takes up 56 bytes of RAM, this article will probably not interest you. For everyone else, I will try to explain what strings in Redis are and why a developer using this database should understand how they are structured and how they work. This knowledge is especially important if you are trying to calculate your application's actual memory consumption, or plan to build high-load statistics or data-accounting systems. Or, as often happens, you urgently need to understand why your Redis instance has suddenly started consuming an unexpectedly large amount of memory.
2 years, 6 months ago
The Art Of Analytics project from Teradata looks rather unusual. The idea of the project is to explain big data research to a wide audience in the form of artistic images. Want to see what fraud detection in banks, terrorist threats, or a comparison of single malt whiskies looks like? Under the cut are some of the 20 studies presented as pictures.
In this article I want to describe some capabilities of Power Query, a free and extremely useful, but still little-known add-in for MS Excel.
Power Query lets you pull data from a wide variety of sources (such as csv, xls, json, text files, folders of such files, a wide range of databases, and various APIs such as Facebook opengraph, Google Analytics, Yandex.Metrica, CallTouch and many more), build repeatable sequences of processing steps for that data, and load the results into Excel tables or the data model.
Under the cut you will find the details of all these splendid capabilities.
My name is Gleb Morozov; we already know each other from my previous articles. By popular request, I continue to describe my experience of taking part in the MLClass.ru educational projects (by the way, for anyone who has not managed it yet, there is still time to get the materials from past courses; it is probably the shortest and most practical data analysis course imaginable).
This work describes my attempt to build a model that predicts which Titanic passengers survived. The main goal is to practice using Data Science tools for data analysis and for presenting research results, so this article will be very, very long. The focus is on exploratory analysis (exploratory research) and on creating and selecting predictors (feature engineering). The model is built for the Kaggle competition Titanic: Machine Learning from Disaster. I will use the R language in this work.
2 years, 6 months ago
The fourth DataTalks took place on October 10. This time the topic of the meetup was predictive analytics, and we would like to share videos of the talks with the community.
Why predictive analytics? It makes it possible to predict future events, such as customer behavior or the results of actions taken, on the basis of historical data. With it, a business can make optimal decisions and take into account forecasts of its customers' future actions and wishes.
Under the cut you will find recordings of the talks:
How to answer the question "What Will Be?": practical advice / Andrey Yarmola, Data Science Team Lead at Wargaming
The necessary minimum of tools for building a recommender system / Alexey Dyomin, Java Server Side Developer at InData Labs
What predictive analytics is and who needs it / Nadezhda Ruchanova, deputy director of the representative office of OOO "SAP SNG", and Mikhail Avetisov, leading predictive analytics expert at OOO "SAP SNG"
Building a data warehouse on the Hadoop platform / Igor Nakhvat, Data Integration Engineer at Wargaming
Using predictive analytics to manage the value of the customer base / Maxims of Brain, Director of CRM at Wargaming
A new version of HP Vertica was released at the end of October. The development team continued the fine tradition of naming releases after BigData construction machinery and gave the new version the code name Excavator.
Having studied the new features of this version, I think the name is well chosen: everything needed for working with big data is already implemented in HP Vertica, and now what exists has to be balanced and improved, that is, dug into :)
I will briefly go over the changes that are most significant from my point of view.
Licensing policy has changed
In the new version, the algorithms for calculating the data size counted against the license have changed:
For tabular data, the 1-byte delimiter for numeric and date/time fields is no longer counted;
For data in the flex zone, the licensed size is counted as 1/10 of the size of the loaded JSON.
Thus, after upgrading to the new version, the license size occupied by your warehouse will decrease, which will be especially noticeable on large data warehouses occupying tens or hundreds of terabytes.
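To make the arithmetic concrete, here is a minimal sketch of the two rules listed above. It is an illustration only, not Vertica's actual audit algorithm: the function name, its parameters and the sample numbers are all hypothetical, and it assumes that the old calculation counted exactly one delimiter byte per numeric or date/time value.

```python
def licensed_bytes(tabular_raw_bytes, numeric_datetime_values, flex_json_bytes):
    """Rough estimate of licensed size under the two rules described above.

    Hypothetical helper for illustration; not part of any Vertica API.
    tabular_raw_bytes       -- raw size of tabular data including delimiters
    numeric_datetime_values -- number of numeric and date/time values stored
    flex_json_bytes         -- size of the JSON loaded into the flex zone
    """
    # Assumption: each numeric or date/time value used to add 1 delimiter
    # byte, and that byte is no longer counted.
    tabular = tabular_raw_bytes - numeric_datetime_values
    # Flex-zone data is counted as one tenth of the loaded JSON size.
    flex = flex_json_bytes // 10
    return tabular + flex


# Made-up numbers: 1000 bytes of tabular data holding 100 numeric or
# date/time values, plus 500 bytes of JSON in the flex zone.
estimate = licensed_bytes(1000, 100, 500)
print(estimate)
```

With these made-up inputs the tabular part shrinks from 1000 to 900 bytes and the flex part counts as 50 bytes, so the savings grow with the number of numeric and date/time values and with the share of flex data.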
Official support for RHEL 7 and CentOS 7 added
It is now possible to deploy a Vertica cluster on a more modern Linux OS, which I think should please system administrators.
Database catalog storage optimized
The storage format of the data catalog in Vertica had stayed the same for many versions. Given the growth not only of the data in databases but also of the number of objects in them and the number of nodes in clusters, it no longer met the efficiency requirements of high-load data warehouses. The new version optimizes the catalog to reduce its size, which has improved the speed of its synchronization between nodes and of working with it when executing queries.
After the hackathon, as usually happens, we were not satisfied with what we had achieved and kept working. We had in our hands data that previously probably only Ministry of Education staff had access to: SFE results and olympiad wins for 2014-2015 for 90% of Moscow schools. For 55% of schools we managed to collect USE data for 2015. We downloaded all the accounts of Moscow school students on VKontakte and looked at which higher education institutions they list in their profiles after graduating.
Naturally, it was interesting to study such a dataset. First, some trivial things that people in education probably know well:
USE scores in humanities subjects are higher than in technical ones. History is an exception;