As an example, consider the account @Russia_Direct. It is a small publication that covers events in Russia for English-speaking readers: something like Russia Today, but with deeper, more academic material.
It currently has about 4,000 followers: students, journalists, and university lecturers.
On average, its followers follow 500 accounts each. There is a strange spike around 2000: an abnormally large number of people follow roughly 2000 accounts. Perhaps they are bots? If you know, write in the comments.
Let's look at whom these people follow besides RD. Twitter has absolutely draconian limits on API calls: the list of accounts a user follows can be requested only once per minute, so 4000 users take 4000 minutes. No matter: we wait about half a week (4000 minutes is roughly 2.8 days) and get a list of 4,000,000 accounts. We sort it by the share of each account's audience that follows RD, look at the top, and see something interesting:
- For example, of the 497 followers of @NewBooksRussia, 122 follow RD. This means the audiences of the two accounts are very similar. Roughly speaking, the other 375 followers of @NewBooksRussia can be considered potential followers of RD.
- In first place, of course, is RD itself. Why is the share of RD followers who follow RD not 100%? For a number of technical reasons. Some profiles are protected, so their following lists cannot be retrieved. Sometimes the API responds with strange errors.
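The collection and counting steps described above could be sketched roughly as follows. This is not the author's code: `fetch_following` is a hypothetical callable standing in for whatever API client is used, and `follower_counts` (the total follower count of each account) is assumed to be obtained separately. The sketch only shows the once-per-minute pacing and the calculation of the share n / N.

```python
import time
from collections import Counter


def collect_following(user_ids, fetch_following, min_interval=60.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Fetch each user's following list, at most one call per min_interval seconds.

    fetch_following is a hypothetical callable: user_id -> iterable of accounts.
    """
    result = {}
    last_call = None
    for user_id in user_ids:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        try:
            result[user_id] = set(fetch_following(user_id))
        except Exception:
            # Protected profiles and odd API errors: skip this user.
            continue
    return result


def overlap_shares(following_by_user, follower_counts):
    """For each followed account return (n, N, n / N).

    n is how many of the sampled users follow the account,
    N is the account's total follower count (supplied by the caller).
    """
    n_by_account = Counter()
    for followed in following_by_user.values():
        n_by_account.update(followed)
    shares = {}
    for account, n in n_by_account.items():
        total = follower_counts.get(account)
        if total:  # skip accounts with unknown or zero follower count
            shares[account] = (n, total, n / total)
    return shares
```

Users whose lists cannot be fetched are simply dropped, which is one reason the share for RD itself comes out below 100%.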
It should be mentioned here that you cannot simply sort by the share of followers: the top would be filled with accounts that have only 1 follower, for which the share following RD is automatically 100%. Small accounts could be thrown out, but it is unclear at what threshold. Is 50 followers too few, or enough? In such cases I prefer to sort not by the share $n/N$ but by the indicator $\frac{n}{N} - 3\sqrt{\frac{n(N - n + 1)}{N^3}}$, where $n$ is the number of an account's followers who follow RD and $N$ is its total number of followers. The reasoning behind it can be found, for example, in Probabilistic Programming and Bayesian Methods for Hackers.
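To see how the indicator penalizes tiny accounts, here is a minimal sketch of it in Python. The function name `lower_bound_score` is made up for illustration; the formula is the one given in the text, with n the number of followers who also follow RD and N the total number of followers.

```python
import math


def lower_bound_score(n, N):
    """Share n/N minus three standard-error-like terms.

    n: followers of the account who also follow the target account
    N: total followers of the account (must be positive)
    """
    if N <= 0:
        raise ValueError("N must be positive")
    return n / N - 3 * math.sqrt(n * (N - n + 1) / N ** 3)


# An account with a single follower who follows RD has a share of 100%,
# but its score is 1 - 3 = -2, so it sinks to the bottom of the ranking.
tiny = lower_bound_score(1, 1)

# @NewBooksRussia from the example above: 122 of 497 followers.
similar = lower_bound_score(122, 497)
```

With these numbers the larger account ranks well above the one-follower account, even though its raw share is lower.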
We now have a list of accounts similar to RD. The list is useful in itself: you can see who the competitors are and what they write about. Collecting potential followers is now simple: download the follower lists of the similar accounts and look for users who appear in several of them. For example, a certain Romanian official follows 5 accounts similar to RD but does not follow RD.
And there are many such users.
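The search for users who show up in several follower lists could look roughly like this. It is a sketch under assumptions: the follower lists are already downloaded into a dictionary, and the names `potential_followers` and `min_hits` are invented for the example.

```python
from collections import Counter


def potential_followers(followers_by_account, target_followers, min_hits=2):
    """Return (user, hits) pairs for users who follow at least min_hits
    similar accounts but do not follow the target account.

    followers_by_account: dict mapping a similar account to its followers
    target_followers: followers of the target account (RD in the article)
    """
    hits = Counter()
    for followers in followers_by_account.values():
        hits.update(set(followers))  # count each user once per account
    already_following = set(target_followers)
    candidates = [
        (user, count)
        for user, count in hits.items()
        if count >= min_hits and user not in already_following
    ]
    # Most hits first; ties broken by name for a stable order.
    candidates.sort(key=lambda pair: (-pair[1], str(pair[0])))
    return candidates
```

Raising `min_hits` trades a shorter list for candidates whose interests overlap with RD more strongly.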
You may need to separate the accounts of people from the accounts of organizations. As a first approximation, if the avatar shows a face, the account belongs to a person. Very simple code using OpenCV copes with this task quite well.
You can then study these people, send them messages and tweets, follow them, and target them with advertising campaigns.
This article is a translation of the original post at habrahabr.ru/post/273531/