1 year, 2 months ago
It is no secret that many software developers, in open source and elsewhere, wish to remain anonymous for various reasons. A group of researchers recently published a paper describing methods for de-anonymizing programmers by their coding style through analysis of source code. The authors claim to have reached an average identification accuracy of 94%.

By building abstract syntax trees from the parsed source text, they were able to extract stable distinguishing features of coding style that are hard to hide even deliberately. Using machine learning and a set of heuristics, they achieved impressive accuracy in attributing authorship among a sample of 1,600 Google Code Jam programmers.
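To make the idea of syntax-tree features more concrete, here is a minimal sketch. It is not the researchers' method: it assumes Python's standard `ast` module as a stand-in parser, and the function name `ast_style_features` and the two sample snippets are hypothetical. It only shows how two programs that do the same job can produce different tree statistics.

```python
import ast
from collections import Counter
from typing import Dict


def ast_style_features(source: str) -> Dict[str, float]:
    """Return normalized frequencies of AST node types and parent>child pairs.

    Illustrative only: the feature set is an assumption, not the one
    used in the research described above.
    """
    tree = ast.parse(source)
    counts: Counter = Counter()
    for parent in ast.walk(tree):
        parent_name = type(parent).__name__
        counts[parent_name] += 1  # how often each node type appears
        for child in ast.iter_child_nodes(parent):
            # parent>child pairs capture a little of the tree's shape
            counts[f"{parent_name}>{type(child).__name__}"] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Two hypothetical snippets that compute the same sum in different styles.
LOOP_STYLE = "total = 0\nfor x in values:\n    total += x\n"
BUILTIN_STYLE = "total = sum(values)\n"

if __name__ == "__main__":
    loop_features = ast_style_features(LOOP_STYLE)
    builtin_features = ast_style_features(BUILTIN_STYLE)
    # Features present in one style but not the other.
    print(sorted(set(loop_features) - set(builtin_features)))
```

The explicit loop produces `For` and `AugAssign` nodes, while the built-in call produces a `Call` node, so the two feature dictionaries differ even though the result of the code is the same.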

De-anonymization of the programmer is possible not only through the source code, but also through the compiled binary file

In the new work, the researchers showed that de-anonymization is also possible by analyzing already compiled binaries when no source code is available (video of the talk). This time they took the source code of 600 Google Code Jam participants, compiled it into executables, and analyzed the result. Because the contest tasks were the same for everyone, the files differed mainly in programming style rather than in algorithm. Initially the binaries were built with compiler optimizations disabled and no source obfuscation applied. According to the authors, however, some distinguishing features survive even when these concealment techniques are used, with de-anonymization accuracy dropping to 65%.

Using disassembly and decompilation, and applying the same abstract syntax trees, the researchers analyze the control flow graph, extract distinguishing coding features, and train a classifier on the resulting feature vectors.
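The last step, training a classifier on feature vectors, can be sketched as follows. The article does not say which classifier was used, so this example swaps in a simple nearest-centroid classifier with cosine similarity. The author labels and the feature names (`loop`, `call`, `branch`) are invented for illustration and are not data from the research.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples: List[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """Average each author's vectors into one centroid per author."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for author, vector in samples:
        counts[author] += 1
        for key, value in vector.items():
            sums[author][key] += value
    return {
        author: {key: value / counts[author] for key, value in total.items()}
        for author, total in sums.items()
    }


def predict(centroids: Dict[str, Vector], vector: Vector) -> str:
    """Return the author whose centroid is most similar to the vector."""
    return max(centroids, key=lambda author: cosine(centroids[author], vector))


# Hypothetical training data: (author, feature vector) pairs.
TRAINING = [
    ("alice", {"loop": 0.6, "call": 0.1, "branch": 0.3}),
    ("alice", {"loop": 0.5, "call": 0.2, "branch": 0.3}),
    ("bob", {"loop": 0.1, "call": 0.7, "branch": 0.2}),
    ("bob", {"loop": 0.2, "call": 0.6, "branch": 0.2}),
]

if __name__ == "__main__":
    centroids = train(TRAINING)
    unknown = {"loop": 0.55, "call": 0.1, "branch": 0.35}
    print(predict(centroids, unknown))
```

With real data the vectors would have far more dimensions, and a stronger model would likely be needed. The overall shape (vectors labeled by author go in, a predicted author for an unseen vector comes out) stays the same.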

Interestingly, more experienced programmers turned out to be much easier to de-anonymize than their less experienced colleagues, because they have a more pronounced and individual programming style.

The authors are confident that similar methods will someday reveal the real authors of projects such as Bitcoin and TrueCrypt, as well as of well-known malware.

This article is a translation of the original post at habrahabr.ru/post/274533/