3 years, 1 month ago
This article was prepared by Dmitry Ovcharenko, architect in the Applied Financial Systems Department at Jet Infosystems.
Yes, there will be unification! That was the decision made while designing the integration architecture that connects CRM with other external systems through a bus built on Oracle Service Bus. Besides online integration based on web services, the bus accepts files arriving into the system and calls web services on the CRM side that were developed specifically for each type of incoming data.
A file contains a set of records, and a separate service call to CRM is required for each one, so the file is processed in a loop over its records. Each service call takes up to 5 seconds, which is a lot, but it was quite enough to meet the stated requirements. On the CRM side, the web service call handler first checks the record for duplicates, then executes the required business logic and creates the record in the database.
But surprises can strike at the most unexpected moments. On production data volumes, duplicates started appearing in the CRM database. We found out that the source could, for some reason, resend a large file (which the file proxy service would then pick up and place into the Stage folder). And the lag between the duplicate-creating web service calls is so small that at the moment of the second call the data from the first one has not yet been committed, so the duplicate check on the CRM side does not have time to fire.
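The race can be reproduced in miniature. Below is a minimal sketch in plain Python with the standard sqlite3 module (the table name and schema are made up, not taken from the actual CRM) showing why a read-then-insert duplicate check fails when two calls interleave, and how a database-level uniqueness constraint closes the window:

```python
import sqlite3

# The duplicate check (SELECT) and the insert (INSERT) are not atomic:
# two near-simultaneous service calls can both pass the check before
# either insert is committed. Schema and ids here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_records (ext_id TEXT)")  # no constraint

def is_duplicate(conn, ext_id):
    row = conn.execute(
        "SELECT 1 FROM crm_records WHERE ext_id = ?", (ext_id,)
    ).fetchone()
    return row is not None

# Interleaving: both calls run the check before either one inserts.
first_ok = not is_duplicate(conn, "42")
second_ok = not is_duplicate(conn, "42")   # also passes: nothing stored yet
if first_ok:
    conn.execute("INSERT INTO crm_records (ext_id) VALUES ('42')")
if second_ok:
    conn.execute("INSERT INTO crm_records (ext_id) VALUES ('42')")  # double

count = conn.execute("SELECT COUNT(*) FROM crm_records").fetchone()[0]
print(count)  # 2 -- the duplicate described above

# One possible fix: enforce uniqueness in the database itself, so the
# second insert cannot succeed regardless of application-level timing.
conn2 = sqlite3.connect(":memory:")
conn2.execute("CREATE TABLE crm_records (ext_id TEXT PRIMARY KEY)")
conn2.execute("INSERT OR IGNORE INTO crm_records VALUES ('42')")
conn2.execute("INSERT OR IGNORE INTO crm_records VALUES ('42')")
count2 = conn2.execute("SELECT COUNT(*) FROM crm_records").fetchone()[0]
print(count2)  # 1
```

The same idea applies to any store that supports unique constraints: moving the uniqueness guarantee from application code into the database makes it immune to the commit lag between calls.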
3 years, 1 month ago
When you work with data a lot, you constantly need to build charts and perform various transformations on tables. It is important to learn to do this quickly and with minimal mental strain. The point is that data analysis largely consists of inventing and testing hypotheses. Inventing is, of course, more interesting than testing, but you have to do both. Good tools in trained hands help you spend the minimum amount of time and intellectual energy on the technical work.
I have tried many tools: Excel, Python+Matplotlib, R+ggplot, Python+ggplot, and settled on the Python+Pandas+Seaborn stack. I have already solved many problems with it and would like to share my observations.
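As a taste of that workflow, here is a minimal sketch of a typical Pandas table transformation; the dataset is made up for illustration, and the commented-out line shows how the result would go straight into a Seaborn chart:

```python
import pandas as pd

# A long table of observations (invented data).
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "city":  ["Moscow", "SPb", "Moscow", "SPb"],
    "sales": [100, 80, 120, 90],
})

# One line reshapes the long table into a month x city summary.
pivot = df.pivot_table(index="month", columns="city", values="sales")
print(pivot)

# With Seaborn installed, the chart is a one-liner as well (not run here):
# import seaborn as sns; sns.barplot(data=df, x="month", y="sales", hue="city")
```

This read-reshape-plot loop is short enough to rerun dozens of times while testing hypotheses, which is exactly the point.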
As promised, I am continuing the series of articles describing my experience after training in Data Science with the guys from MLClass.ru (by the way, if you haven't yet, I recommend signing up). This time, using the Digit Recognizer problem as an example, we will study how the size of the training set affects the quality of a machine learning algorithm. This is one of the very first and most important questions that arise when building a predictive model.
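The experiment (accuracy as a function of training-set size) can be sketched in a few lines of scikit-learn. Here sklearn's small built-in digits dataset stands in for Kaggle's Digit Recognizer data, and the subset sizes and classifier are arbitrary choices for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Fixed test set; train on growing subsets of the training data and
# measure how test accuracy changes with training-set size.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

accs = []
for n in (50, 200, 800):  # growing training subsets
    clf = KNeighborsClassifier().fit(X_tr[:n], y_tr[:n])
    accs.append(clf.score(X_te, y_te))

print(accs)  # accuracy typically grows with more training data
```

Plotting `accs` against `n` gives the learning curve: it usually rises steeply at first and then flattens, which tells you when collecting more data stops paying off.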
3 years, 1 month ago
As you all probably already know, a dump of the AshleyMadison databases was recently published. I decided not to miss the opportunity to analyze real data from a dating platform. Let's try to predict a client's solvency from characteristics such as age, height, weight, habits, and so on.
Welcome! The main difference between HighLoad++, the conference for developers of high-load systems, and many others is the absence of hidden agendas. There is no person or organization standing behind us that would impose its own rules of the game or, say, use the event for headhunting.
For many years now, HighLoad++ has remained an event that developers organize for other developers.
Nine years ago we adopted a set of strict rules that we try to follow rigorously. We will not list them all here (the time for that will come) and will name only the main ones.
My name is Gleb. I have worked in retail analytics for a long time and now apply machine learning in that field. Not long ago I met the guys from MLClass.ru, who in a very short time seriously leveled me up in Data Science. Thanks to them, within literally a month I began actively submitting on Kaggle. This series of publications will therefore describe my experience of studying Data Science: all the mistakes I made, as well as the valuable advice the guys gave me. Today I will talk about my experience of participating in The Analytics Edge (Spring 2015) competition. This is my first article, so don't judge too harshly =)
3 years, 1 month ago
I often ride a bicycle and a motorcycle, so the question "will there be rain?" concerns me rather often. As it turned out, the Central Aerological Observatory regularly uploads images from its meteorological radars to its website. To make them truly usable, two things were missing: the ability to zoom in on the map and a way to see how the clouds have moved over the last hour. Add these two features, and you get a genuinely useful tool:
Bad news: Roshydromet has forbidden TsAO to publish the data in real time, so it is now available with a 24-hour delay. Upvote this, and perhaps we will manage to get up-to-date data back in some form. There is even a petition about it, "Restore open access to DMRL (weather radar) images", and letters have already been written to Roshydromet.
This time we will talk about how we apply Apache Spark in our practice, and about the tool that lets us build remarketing audiences.
It is thanks to this tool that, having once glanced at a jigsaw, you will keep seeing it in every corner of the Internet until the end of your days. This is also where we took our first knocks working with Apache Spark.
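The core of the remarketing-audience logic can be illustrated in a few lines. The sketch below uses plain Python sets for clarity (the pipeline described in the article does the equivalent at scale with Apache Spark); the event log, user ids, and product name are invented for the example:

```python
# A tiny clickstream: (user, event, product). All values are made up.
events = [
    ("u1", "view",     "jigsaw"),
    ("u2", "view",     "jigsaw"),
    ("u2", "purchase", "jigsaw"),
    ("u3", "view",     "drill"),
]

viewed    = {u for u, e, p in events if e == "view" and p == "jigsaw"}
purchased = {u for u, e, p in events if e == "purchase" and p == "jigsaw"}

# The remarketing audience: users who saw the jigsaw but never bought it,
# i.e. the ones who will keep seeing it "in every corner of the Internet".
audience = viewed - purchased
print(sorted(audience))
```

In Spark the same set difference would typically be expressed as a join or subtract over distributed datasets, but the logic is identical.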
You already know us from our previous articles on data analysis. Now the time has come to talk about one very practical problem we have learned to solve, namely: figuring out who actually shapes our opinions on the VKontakte social network. Under the cut are plenty of unusual results and some interesting mathematics.
"Big data" is much closer to you, and much more powerful, than it seems. Despite the abundance of events on the subject, very few people, just between us, really command it. And to squeeze benefit and money out of information, you need to understand it very well, down to the subtleties.
The technology of extracting value from big data divides roughly into two very different layers: engineering and algorithmic. In the first, the still rather raw software stack is evolving so violently that, to put it simply, developers are losing their minds: you have to understand the tools of "good old" Hadoop with HDFS, actively use Hive, Impala, Presto, Vertica and the rest, and, to keep up with the competition, master with jeweler's precision the secrets of Apache Spark, crafted in fine, laconic Scala.