Developers Club geek daily blog

The data scientist profession: an overview and the main challenges data scientists face

1 year, 2 months ago
Translator's note: In our blog we write about technology. Interesting insights come not only from analyzing companies' infrastructure projects but also from studying how modern IT specialists work. Today we present an adapted translation of an article by Philip Guo, an associate professor of computer science at the University of Rochester, about who data scientists are, how they work, and what challenges they face.



While writing my Ph.D. dissertation, I developed a number of tools for people who write programs that extract insights from data. Millions of professionals in fields such as science, engineering, business, finance, political science, and journalism, along with many students and hobbyist programmers, work with such programs every day.

In 2012, soon after I finished my dissertation, the term "data science" began to gain traction. Some commentators have called the data scientist "one of the most attractive jobs of the 21st century". University leadership, meanwhile, is investing heavily in creating data science institutes.

I now realize that data scientists are the main target audience for the tools I built during my dissertation work. However, this role was not yet in demand when I was in graduate school, so I did not address it specifically in my work.

What do data scientists do, and what challenges do they face?

Read more »


Visualization of static and dynamic networks in R, part 7 (final)

1 year, 2 months ago
In the first part:
  • network visualization: why? how?
  • visualization parameters
  • best practices: aesthetics and performance
  • data formats and preparation
  • description of the datasets used in the examples
  • getting started with igraph

In the second part: colors and fonts in R plots.

In the third part: graph, vertex, and edge parameters.

In the fourth part: network layouts.

In the fifth part: highlighting properties of the network, vertices, edges, and paths.

In the sixth part: interactive network visualization and other ways to represent a network.

In this part: animated network visualization and the evolution of a network over time.

Read more »


Introduction to RapidMiner

1 year, 2 months ago
Many companies today need analytics systems, but the high cost and excessive complexity of such software often force them to abandon the idea of building their own analytics system in favor of good old Excel. On top of that come the costs of training employees, maintaining expensive data storage systems, and so on. This is where open-source solutions can come to the rescue. There are not many of them, but some are very worthy, and RapidMiner is one of them.

Read more »


Analyze this: how to get more value out of client logs

1 year, 2 months ago
It is well known that the golden rule of dealing with clients is "don't pester them": no ads, no news, no careful inquiries about what they like in your software and what they don't. The same applies to technical support: the fewer calls, emails, and remote sessions you need to collect all the necessary data, the better, both for the company, which saves a bit of money, and for the client, who saves a bit of time, not to mention both parties' nerves. This naturally raises a question: could we extract additional insights from the data we already have, without bothering clients with yet more mailings and surveys?

In this article I will describe one of the methods we use at Parallels.

Read more »


Extracting information from a URL, Slack- and Twitter-style

1 year, 3 months ago
Many people use Slack and Twitter and have seen the link previews these services show when a URL is posted.

How does this work, and how can you build something similar?
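One common way such previews are built (an assumption here, not necessarily the approach taken in the full article) is to fetch the page and read its Open Graph `<meta property="og:...">` tags, falling back to the `<title>` element when no `og:title` is present. The minimal Python sketch below uses only the standard library's `html.parser` and works on an HTML string, leaving the HTTP fetch out; the class `PreviewParser`, the function `extract_preview`, and the returned field names are hypothetical.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects Open Graph <meta property="og:..."> tags and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}          # e.g. {"title": ..., "image": ...}
        self.title = ""       # text of the <title> element, used as a fallback
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Some sites put og:* keys in "name" instead of "property".
            key = attrs.get("property") or attrs.get("name") or ""
            content = attrs.get("content")
            if key.startswith("og:") and content:
                # Keep the first value seen for each key.
                self.og.setdefault(key[3:], content)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html):
    """Return a small preview dict built from an HTML document string."""
    parser = PreviewParser()
    parser.feed(html)
    return {
        "title": parser.og.get("title") or parser.title.strip(),
        "description": parser.og.get("description", ""),
        "image": parser.og.get("image", ""),
    }
```

In practice you would first download the page (for example with `urllib.request.urlopen`) and might also read `twitter:*` card tags or an oEmbed endpoint; those steps are left out of this sketch.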

Read more »


Visualization of static and dynamic networks in R, part 6

1 year, 3 months ago
In the first part:
  • network visualization: why? how?
  • visualization parameters
  • best practices: aesthetics and performance
  • data formats and preparation
  • description of the datasets used in the examples
  • getting started with igraph

In the second part: colors and fonts in R plots.

In the third part: graph, vertex, and edge parameters.

In the fourth part: network layouts.

In the fifth part: highlighting properties of the network, vertices, edges, and paths.

In this part: interactive network visualization and other ways to represent a network.

Read more »


Mastering the Data Science specialization on Coursera: personal experience (part 2)

1 year, 3 months ago


We are publishing the second part of a post by Vladimir Podolsky (vpodolskiy), an analyst in the IBS education department, who has completed the Data Science specialization on Coursera. The specialization consists of 9 Coursera courses from Johns Hopkins University plus a capstone project, and completing them successfully earns a certificate.

In the first part: about the Data Science specialization in general. Courses: Data analysis tools (R programming); Data preprocessing; Documenting data processing.

Part 2

Read more »


Moscow schools: how we took part in the second open data hackathon

1 year, 3 months ago

Read more »


Big Data for Business hackathon: launch your tech startup

1 year, 3 months ago

We invite developers, analysts, marketers, designers, product managers, and business angels to the Big Data for Business hackathon, a two-day team competition to develop software products that solve business problems through data analysis. The hackathon will take place on November 18-19 at the Kazan IT Park. Sponsors: EMC and Brocade. Partners: Textocat, DGL, Provectus, and the Kazan IT Park Business Incubator. Prize fund: 150,000 rubles.

By taking part in the Big Data for Business hackathon, you will be able to:

  • find a team of like-minded people,
  • come up with a great business idea, implement it, and refine it with leading experts,
  • gain recognition,
  • win valuable prizes,
  • gain experience in the technology sphere and learn the principles of product packaging,
  • take the first step toward a startup based on data analysis technologies,
  • meet promising product teams in the Big Data field.

Below we describe the key features of our event.

Read more »


Mastering the Data Science specialization on Coursera: personal experience (part 1)

1 year, 3 months ago


Vladimir Podolsky (vpodolskiy), an analyst in the IBS education department, recently completed the Data Science specialization on Coursera. The specialization consists of 9 Coursera courses from Johns Hopkins University plus a capstone project, and completing them successfully earns a certificate. He wrote a detailed post about his studies for our blog on Habr, and for convenience we have split it into 2 parts. We should add that Vladimir also became the editor of the project to translate the Data Science specialization into Russian, which IBS and ABBYY LS launched in the spring.

Part 1. About the Data Science specialization in general. Courses: Data analysis tools (R programming); Data preprocessing; Documenting data processing.

Hi, Habr!


My 7-month marathon through the Data Science specialization on Coursera ended not long ago. The organizational side of the specialization is described in detail here. In this post I will share my impressions of the course content. I hope that after reading it, everyone will be able to decide for themselves whether it is worth spending time learning data analytics.

Read more »