3 years ago
We continue the story of the development methodologies used for Big Data at MegaFon (the first part of the article is here). Every day brings new tasks that require new solutions, so our development practices are constantly being improved as well.
3 years ago
On October 17 the annual brutal Highload Dev Conf conference took place. More than 300 hardened developers interested in high-load projects and Big Data took part.
3 years ago
Task No. 1 for a retailer is to understand who exactly is making purchases in the store, to study customer behavior, to identify typical patterns, and to use this knowledge to influence the quantity and quality of purchases.
There are two ways to approach this:
analysis of data from loyalty programs and other ways of studying customers and their behavior;
analysis of data on purchases and transactions.
To paraphrase the second approach: which goods did the customer put in the basket?
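The "what is in the basket" question can be illustrated by counting how often pairs of goods appear on the same receipt. This is only a minimal sketch in Python, not the method from the article: the function name `pair_support` and the sample receipts are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return {(item_a, item_b): share of baskets that contain both items}."""
    total = len(baskets)
    if total == 0:
        return {}
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket,
        # always in the same alphabetical order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: count / total for pair, count in pair_counts.items()}


# Hypothetical data: each inner list is one receipt.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "tea"],
    ["bread", "milk", "tea"],
]
support = pair_support(baskets)
```

Pairs with a high share are candidates for a closer look, for example joint promotions or shelf placement. A real analysis would also need confidence and lift figures and far more data.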
3 years ago
Translator's note: In our blog we write about technology. Interesting insights can come not only from analyzing companies' infrastructure projects but also from studying how modern IT specialists work. Today we present an adapted translation of an article by Philip Guo, associate professor of computer science at the University of Rochester, about who "data scientists" are, how they work, and what difficulties they face.
While writing my Ph.D. dissertation, I developed a number of tools for people who write programs to extract valuable information from data. Millions of professionals in fields such as science, engineering, business, finance, political science, and journalism, along with numerous students and hobbyist programmers, work with such programs every day.
In 2012, shortly after I finished my dissertation, the term "data science" gradually began to spread. Some specialists in the field call the profession of "data scientist" "one of the most attractive of the 21st century". In addition, university administrations are investing heavily in creating data science institutes.
I now understand that data scientists are the main target audience for the tools I developed during my dissertation work. However, this job title was not in common use while I was still in graduate school, so I did not refer to it specifically in my work.
What do data scientists do, and what difficulties do they face?
3 years ago
When we talk about open data, it is important to remember that it is impossible without data existing in the first place. As someone who analyzes government data in the field of public finance, I, together with the whole team of our Goszatrata project, regularly try to convince the agencies responsible for public policy in this area that open data should be available and as convenient to work with as possible.
In many ways this is also the key to the success of public projects: finding "fuel" in the form of data on which the project can be built, and finding "fuel" in the form of funding that allows the project to appear and be maintained. For example, the Goszatrata project, in which we analyze government contract data, is supported by the Committee of Civil Initiatives (http://komitetgi.ru). Goszatrata is one of the few non-commercial technology projects of KGI, and one of the few non-commercial technology projects in Russia in general.
I want to touch on several important topics at once, and I ask you to treat each of them as a question.
[@tsafin: Turing Award winner Michael Stonebraker needs no introduction. He and his students at Berkeley and MIT created what feels like most of the relational and non-relational databases of the last couple of decades. Ingres and Postgres, C-Store and Vertica, H-Store and VoltDB are only a small part of the projects and companies that Michael and his students influenced directly, and there are also many forks and derivatives …
So when he criticizes something, whether NoSQL or Hadoop, the industry should at least listen, and better yet try to change.
His point of view on Hadoop, set out in articles from 2012 and 2014, seemed interesting to me, and it was interesting to trace how the view of a "classic" developed over such a short period.
The first article, "Possible Hadoop Trajectories", was published in "Communications of the ACM" (http://cacm.acm.org/blogs/blog-cacm/149074-possible-hadoop-trajectories/fulltext). Stonebraker wrote it in May 2012 together with Jeremy Kepner, who at that time worked as a senior technical staff member at MIT and as a researcher in the MIT Mathematics Department and the MIT Computer Science and AI Lab. This co-authored article seems more brash and passionate than the second one, which he wrote alone two years later (and, frankly, the first article is IMHO written in the better style). Still, I am publishing them together, since the context has changed considerably over the past few years, and it would be unfair to the Hadoop/HDFS ecosystem to leave that unnoticed.
3 years ago
Many companies today need analytics systems, but the high cost and excessive complexity of such software in most cases force them to abandon the idea of building their own analytical system in favor of plain, familiar Excel. There are also the additional costs of training employees, maintaining expensive data storage systems, and so on. This is where open source solutions can come to the rescue. There are not many of them, but some are very capable, and RapidMiner is one of those.
3 years ago
The time has come to share our experience of organizing the development process in the fashionable field of "Big Data". In the telecommunications industry, considerable hopes for new niches, products, and, accordingly, revenue are pinned on Big Data. However, many telecom companies prefer to buy ready-made Big Data solutions rather than develop their own expertise. Since 2013 MegaFon has taken a different path, relying on a team of strong Big Data specialists capable of effectively solving very difficult problems.
3 years ago
Once upon a time the shop owner, who was also the salesperson, could easily remember every product in the assortment. He could describe the features and history of each one, knew exactly how well it performed and how it sold, and when to order more …
As retail develops, managing the flow of goods requires different approaches. Sales accounting and analytics systems and assortment management systems supplement the experience of the employees of a store or retail chain.
Serious decisions, such as removing a product from the assortment, are not made lightly. Both the category manager and the store manager need justification for such actions.
That is why a single type of analysis is not enough. A combination of several types is used (in other words, cross-analysis).
In this article we use the "Confectionery" product group as an example to look at the basic approaches to organizing cross-analysis. We will also find out who is to blame for Raffaello being a product with unstable sales.
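One common way to combine two analyses is an ABC/XYZ cross-classification: ABC ranks products by their share of revenue, XYZ by how stable their sales are. The teaser does not say which analyses the article combines, so the sketch below is an assumption. The thresholds (80% and 95% cumulative revenue, coefficient of variation 0.10 and 0.25), the function names, and the sample figures are all illustrative.

```python
from statistics import mean, pstdev


def abc_classes(revenue, a_limit=0.80, b_limit=0.95):
    """Classify products by cumulative revenue share, largest first."""
    total = sum(revenue.values())  # assumed to be positive
    classes, running = {}, 0.0
    ranked = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)
    for name, value in ranked:
        running += value / total
        if running <= a_limit:
            classes[name] = "A"
        elif running <= b_limit:
            classes[name] = "B"
        else:
            classes[name] = "C"
    return classes


def xyz_class(sales, x_limit=0.10, y_limit=0.25):
    """Classify one product by the coefficient of variation of its sales."""
    cv = pstdev(sales) / mean(sales)
    if cv <= x_limit:
        return "X"  # stable sales
    if cv <= y_limit:
        return "Y"  # moderately variable sales
    return "Z"      # unstable sales


def cross_analysis(sales_by_product):
    """Combine both classes into one label per product, e.g. 'AX'."""
    revenue = {name: sum(s) for name, s in sales_by_product.items()}
    abc = abc_classes(revenue)
    return {name: abc[name] + xyz_class(s)
            for name, s in sales_by_product.items()}


# Hypothetical monthly revenue per product.
sales_by_product = {
    "chocolate bar": [100, 102, 98, 100],
    "boxed candy": [10, 90, 20, 80],
    "wafers": [12, 18, 14, 16],
    "lollipops": [5, 5, 5, 5],
}
labels = cross_analysis(sales_by_product)
```

In this made-up data "boxed candy" lands in the Z column: it brings in noticeable revenue, but its sales swing widely from month to month, which is the kind of product a category manager would want to examine before making a decision.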