1 year, 7 months ago
"War and Peace": the test of time

For four December days in a row, about 1,300 people from 30 cities spent 60 hours reading "War and Peace" aloud. It was an unprecedented multimedia project by VGTRK in which Leo Tolstoy's novel was read from the first line to the last. The project is grand in scale and a contender for the Guinness Book of Records.

Alongside the literary marathon, a series of interactive infographics was released under the auspices of the analytical community Tolstoy Digital. Each of the four infographics analyzes the novel from a different angle: human relationships, places, time, history, objects, and culture in general.

Under the cut: fragments from the novel, a little code, and my thoughts on turning data into infographics, using the event timeline as an example.

"At that moment it made no difference to him who was standing over him or what was being said about him; he was only glad that people had stopped beside him, and wished only that these people would help him and bring him back to life, which seemed to him so beautiful now that he understood it so differently"

Volume I, Part 3, Chapter 19

The essence of an infographic is to answer questions. An infographic can answer a single question (What? Where? When? How?) or combine several of them. And sometimes an infographic is simply beautiful, with no measurement of entropy required.

What we can be sure of is that infographics cannot exist without data. Without data, an infographic loses its meaning. Let's characterize data in the context of infographics: usually it is numbers, texts, and the relationships between them, both one-to-one and one-to-many.

Numbers are beautiful in themselves, and if they carry some meaning they become two, three, even ten times more beautiful for those who have deciphered it.

Texts, unlike numbers, can be meaningless, but sometimes, as with "War and Peace", they turn out to be exceptional. Such a text can bring events two hundred years old back to life right before your eyes, especially when more than a thousand people read it aloud.

Relationships between data are the hardest and most intricate part of building an infographic. It is hard to get it right on the first try while keeping both the infographic and the selected relationships readable. First attempts cover the screen with a tangled web of multicolored lines coming from who knows where and going who knows where.

"The sovereign said plainly that the Council and the Senate are estates of the realm; he said that government must rest not on arbitrary will but on firm principles. The sovereign said that the finances must be reformed and the accounts made public"

Volume II, Part 3, Chapter 18

If you have the data in hand, that is already half the job. And what is the secret of the second half? :) It is how quickly we manage to understand the data!

What does that take? Of course you have to dig into the data, look for patterns until you are blue in the face, and try to reduce them to an idealized scheme. But you can make the work easier for yourself and your colleagues by sticking to the following principles.

Involve programmers at the design and data-analysis stage


Even if you are building a static infographic, a programmer can confirm or refute a hypothesis about the data faster.

Never edit source data by hand


It hurts productivity. Let me say right away to those who think data does not change: it changes more often than you think :) So let's agree that the infographic accepts the data exactly as it comes from the editors. This gives the editors more room to think, instead of squeezing them into the narrow columns of a machine-oriented data set. If there is a lot of data and preparing it on the fly takes too long, normalize it in advance. To begin with, though, this step can live inside the infographic itself and later be split out into a separate data-preparation module. Here is how we followed this advice while developing the infographic:

The point of the timeline is to show how events in the novel relate to the real course of history. Each chapter is assigned a time span or a specific date, whenever the text of the novel makes that possible. All this information sits in one column of the source table. That column caused the first difficulties: the format looked chaotic but, as it turned out, followed certain rules. A short analysis revealed about 12 patterns. We write handlers and normalize the column to a uniform format:

convertDate: function (date, year) {
        // clean up the string that is expected to hold a date
        date = util.clearValue(date);

        // the cell may say there is no date ('нет' means "none") or be empty
        if (['нет', ''].indexOf(date) !== -1) {
            return null;
        }

        // the date may be in the ordinary format, e.g. 28.01.1812
        if (this.tests.testDate(date)) {
            return this.getDate(date);
        }

        // drop filler words that do not affect date recognition
        // ('с' = from, 'по' = to, 'и' = and, 'за' = for, 'романа' = of the novel)
        var dateParts = _.without(date
            .split(/[\s-]/g), 'с', 'по', 'и', 'за', 'романа');
        var patternId = '';

        // look up a pattern for each substring and build the pattern id;
        // the final id looks like "42423", unrecognized substrings yield 9s
        for (var i = 0; i < dateParts.length; i++) {
            patternId += this.getPatternId(dateParts[i]);
        }

        // if the pattern is known, return a date or a date range
        if (this.patterns[patternId]) {
            return this.patterns[patternId](dateParts, year);
        }

        // no pattern matched: report "no date" explicitly instead of undefined
        return null;
}
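The snippet above relies on `getPatternId` and a `patterns` table that the article does not show. Below is a minimal sketch of how such a dispatch might work. Everything in it is an assumption made for illustration: the digit codes (1 = day number, 2 = month name, 3 = four-digit year, 9 = unrecognized), the English month names, the two handlers, and the `parseLooseDate` wrapper. The project's real codes and its roughly 12 patterns are likely different.

```javascript
// Hypothetical token classes; the real project's codes are not shown in the article.
var MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
    'july', 'august', 'september', 'october', 'november', 'december'];

function getPatternId(token) {
    if (/^\d{1,2}$/.test(token)) { return '1'; }                    // day of month
    if (MONTHS.indexOf(token.toLowerCase()) !== -1) { return '2'; } // month name
    if (/^\d{4}$/.test(token)) { return '3'; }                      // four-digit year
    return '9';                                                     // unknown token
}

// Each key is a sequence of token classes; each handler builds a [start, end] range.
var patterns = {
    // "12 june": a single day in the given year
    '12': function (parts, year) {
        var day = new Date(year, MONTHS.indexOf(parts[1].toLowerCase()), Number(parts[0]));
        return [day, day];
    },
    // "june": the whole month in the given year
    '2': function (parts, year) {
        var month = MONTHS.indexOf(parts[0].toLowerCase());
        // day 0 of the next month is the last day of this month
        return [new Date(year, month, 1), new Date(year, month + 1, 0)];
    }
};

function parseLooseDate(text, year) {
    var parts = text.trim().split(/[\s-]+/);
    var id = parts.map(getPatternId).join('');
    return patterns[id] ? patterns[id](parts, year) : null;
}
```

The appeal of this design is that supporting a new spelling of a date means adding one entry to the table rather than another branch to a growing conditional.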

The chapters that correspond to a historical period are listed in another column. It looks chaotic too, but follows a certain logic:

getChapters: function (chapters) {
        // the cell may say there are no chapters ('нет' means "none") or be empty
        if (['нет', ''].indexOf(this.clearValue(chapters)) !== -1) {
            return [];
        }

        return _.chain(this.clearValue(chapters, true)
            // strip a meaningless trailing dot or comma
            .replace(/[\.\,]$/g, '')
            // both dots and commas act as separators
            .split(/[\.\,]/g))
            .map(chapter => {
                // ranges are written with a hyphen
                var bounds = chapter.split('-');

                if (bounds.length === 1) {
                    return bounds[0];
                }

                // expand a range such as "3-5" into ['3', '4', '5']
                return _.invoke(_.range(Number(bounds[0]), Number(bounds[1]) + 1), 'toString');
            })
            // flatten the nested arrays by one level
            .flatten()
            .uniq()
            .value()
        ;
}
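To see what this kind of parser produces, here is a dependency-free approximation. It is a sketch, not the project's code: `parseChapters` is a hypothetical name, the project's `clearValue` helper is replaced by a plain `trim()`, and non-numeric entries are silently skipped, which may differ from the original behavior.

```javascript
// A dependency-free approximation of the chapter parser; parseChapters is a
// hypothetical name, and the project's clearValue helper is replaced by trim().
function parseChapters(cell) {
    var value = String(cell).trim();
    if (value === '' || value === 'нет') {   // 'нет' is Russian for "none"
        return [];
    }
    var result = [];
    value
        .replace(/[.,]$/, '')       // drop a trailing dot or comma
        .split(/[.,]/)              // both dots and commas separate entries
        .forEach(function (entry) {
            if (entry.trim() === '') { return; }   // ignore empty entries
            var bounds = entry.trim().split('-');
            var from = Number(bounds[0]);
            var to = Number(bounds[bounds.length - 1]);
            // a single chapter is treated as a range of length one
            for (var n = from; n <= to; n++) {
                var chapter = String(n);
                if (result.indexOf(chapter) === -1) {   // keep entries unique
                    result.push(chapter);
                }
            }
        });
    return result;
}

console.log(parseChapters('1-3, 5.7,'));
```

Chapters are kept as strings here, as in the original snippet, so they can be compared directly with identifiers read from the table.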

Cache everything you can when the infographic starts


Avoid runtime calculations and you will spare users' machines unnecessary load. Clean the data in advance. Create whatever derived data the visual collections need. If the visual collections have to be accessed again after the HTML is created, cache all the necessary elements in the data structure.

// cache the pixel positions and a reference to the svg element in each year's datum
yearTimelineView.selectAll('.year')
        .data(yearsData)
        .enter()
        .append('g')
        .each(function(d) {
            d.yearNext = d.year + 1;
            d.startYearY = timeScale(new Date(d.year, 0));
            d.endYearY = timeScale(new Date(d.yearNext, 0));
            d.localYearY = (d.endYearY - d.startYearY) / 2;
            d.yearView = d3.select(this);
        })
;

Prepare layered data


You get a separate, convenient data structure for each visual collection. Do not be afraid to duplicate data; sometimes it is vital for keeping the project transparent.

// this many different data sets were needed to build the timeline
var timelineData,
    filteredTimelineData,
    historyFilteredTimelineData,
    yearsData,
    dataByType,
    dataLinks,
    dataUrls,
    dataChapters;

Turn data inside out


Manipulate data with libraries: underscore, lodash, and the functions built into d3. Always aim for the format that is most comfortable to work with. The most vivid examples of reshaping data in this infographic:

The source data comes as 4 tables. That is not the best format for building two dependent timelines (history and the novel). Since the infographic is about the book after all, we made the novel the main timeline and gave history a supporting role. The main table describes three kinds of events: those only in the novel, those only in history, and linked events. A linked event is described by two rows of data, and the link is recorded in the related_book_id field of the historical row. Because the book was chosen as the main timeline, the table had to be turned inside out into two dependent lists connected through the related_book_id field.
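A rough sketch of that reshaping follows. Only the `related_book_id` field comes from the article. The `type` values, the `id` field, the `relatedBook` and `relatedHistory` properties, and the `splitTimelines` name are all assumptions made for illustration.

```javascript
// Rows are assumed to carry a `type` of 'book' or 'history'; only the
// related_book_id field is taken from the article, the rest is hypothetical.
// Note: the function annotates the row objects it receives.
function splitTimelines(rows) {
    var bookEvents = rows.filter(function (row) { return row.type === 'book'; });
    var historyEvents = rows.filter(function (row) { return row.type === 'history'; });

    // Index book events by id so that each historical row can find its pair.
    var bookById = Object.create(null);
    bookEvents.forEach(function (event) {
        bookById[event.id] = event;
        event.relatedHistory = [];
    });

    // Link in both directions: history -> book and book -> history.
    historyEvents.forEach(function (event) {
        var book = bookById[event.related_book_id];
        event.relatedBook = book || null;
        if (book) { book.relatedHistory.push(event); }
    });

    return { book: bookEvents, history: historyEvents };
}
```

With links stored in both directions, the main (book) timeline can draw its connectors without searching the historical list again.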

All the infographics are full of links to the novel, so that readers can jump straight to the book and continue reading from the selected quote. The book, in turn, links back to the infographics. Matching a link from the infographic to the right fragment of the book was not easy. Since the quote was the only key for building a link, we had to use fuzzy string matching. Exact matching found only 30% of the quotes.
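The article does not say which fuzzy matching algorithm was used. As one possible approach, the sketch below normalizes both strings and compares them by Levenshtein distance. The function names, the normalization rules, and the idea of a similarity threshold are all assumptions, and the sketch presumes the book has already been split into fragments of roughly quote length.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
    var prev = [];
    for (var j = 0; j <= b.length; j++) { prev[j] = j; }
    for (var i = 1; i <= a.length; i++) {
        var curr = [i];
        for (var k = 1; k <= b.length; k++) {
            var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
            curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Similarity in [0, 1]; case, punctuation and extra spaces are ignored.
function similarity(a, b) {
    var normalize = function (s) {
        return s.toLowerCase()
            .replace(/[^\p{L}\p{N}\s]/gu, '')   // keep letters, digits, whitespace
            .replace(/\s+/g, ' ')
            .trim();
    };
    var x = normalize(a);
    var y = normalize(b);
    var longest = Math.max(x.length, y.length);
    return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Returns the index of the best matching fragment, or -1 below the threshold.
function findFragment(quote, fragments, threshold) {
    var best = -1;
    var bestScore = threshold;
    fragments.forEach(function (fragment, index) {
        var score = similarity(quote, fragment);
        if (score >= bestScore) { best = index; bestScore = score; }
    });
    return best;
}
```

A threshold around 0.8 is a plausible starting point, but the right value depends on how much the editors' quotes differ from the book text and has to be tuned on real data.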

Cover data with tests


Ranges, checks that links between lists exist, correct ordering within the logic of the infographic, formats: try to cover every case. And, coming back to teamwork, it is better to do this once, in parallel with the main development, and sleep peacefully. No change in the data will slip past you, be sure of that.

// vpc is a unique identifier: volume, part, chapter
// identifiers are normalized to a single format throughout the project
console.log(v.id, 'invalid vpc_id format:', v.vpc_id);

// check that the links in linked events point at existing items
console.log(historyEvent.id, 'no such id among quotes:', historyEvent.related_book_id);
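The two log lines above hint at the checks but not at their conditions. A minimal sketch of such a validation pass might look like this. The `vpc_id` format is not given in the article, so the "volume.part.chapter" pattern (for example "1.3.19") is an assumption, as are the `validateData` name and the shape of the input lists.

```javascript
// Assumed format "volume.part.chapter", e.g. "1.3.19"; the real format is not shown.
var VPC_ID = /^\d+\.\d+\.\d+$/;

// Returns a list of human-readable problems instead of logging them,
// so the caller can print them or fail a build.
function validateData(quotes, historyEvents) {
    var problems = [];
    var quoteIds = Object.create(null);

    quotes.forEach(function (quote) {
        quoteIds[quote.id] = true;
        if (!VPC_ID.test(String(quote.vpc_id))) {
            problems.push(quote.id + ': invalid vpc_id format: ' + quote.vpc_id);
        }
    });

    // Every link from a historical event must point at an existing quote;
    // events without a link are fine.
    historyEvents.forEach(function (event) {
        var target = event.related_book_id;
        if (target != null && target !== '' && !quoteIds[target]) {
            problems.push(event.id + ': no such id among quotes: ' + target);
        }
    });

    return problems;
}
```

Running a pass like this every time the editors deliver a new version of the tables is what makes it safe to never edit the source data by hand.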

Finish a prototype in the first 10 hours


It will show where your understanding of the data diverges most from reality. Here is what it gave us:

We found out early that the grid matching the novel to history was too bulky: about 350 time marks, a good share of which fall within the short span of 1812. The idea of an analogy with a ruled notebook was a good idea and nothing more. We had to drop that implementation. Instead we kept a grid of linked events, which partly does the same job. This is how it looked with the grid:

[Image: early version of the timeline with the full grid]

We also had to drop rendering chapters as ranges; the picture became completely unreadable. In some cases it was unclear which color a link should be drawn in, the color of war or the color of peace. And the information columns held too many quotes at once. One of the early screenshots:

[Image: one of the early screenshots]

The events of the novel run from 1805 to 1820, but the coverage is uneven. We decided to scale each year by its number of events. The algorithm is as follows: 35% of the timeline height is divided equally among all years, and the remaining 65% is divided among the years that have events, in proportion to the number of events.
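The 35/65 split can be written down in a few lines. This is a sketch of the rule as described, not the project's code: `yearHeights`, its parameters, and the shape of `eventCounts` (a map from year to number of events) are assumptions.

```javascript
// Splits the timeline height between years: `equalShare` of it evenly,
// the rest in proportion to the number of events in each year.
function yearHeights(eventCounts, totalHeight, equalShare) {
    var years = Object.keys(eventCounts);
    var totalEvents = years.reduce(function (sum, year) {
        return sum + eventCounts[year];
    }, 0);
    var base = totalHeight * equalShare / years.length;   // same for every year
    var rest = totalHeight * (1 - equalShare);            // shared by event count

    var heights = {};
    years.forEach(function (year) {
        var share = totalEvents > 0 ? eventCounts[year] / totalEvents : 0;
        heights[year] = base + rest * share;
    });
    return heights;
}

// Four years, 1000px tall timeline, 35% split evenly.
var heights = yearHeights({ 1805: 1, 1806: 0, 1807: 3, 1808: 0 }, 1000, 0.35);
```

In this example every year gets a base of 87.5px, and the remaining 650px goes to 1805 and 1807 in a 1:3 ratio, so empty years stay visible while busy years get room for their events.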

Many historical events happened in quick succession. To avoid a jumble, event points are shifted slightly down from their original positions whenever points overlap.

Historical events that fall at the start of a year crept onto the separator lines; as in the previous point, we shift them slightly down from their original position.
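The article does not show how the shifting is done. One simple way to get this effect, assuming each point has a vertical pixel position and a minimum gap is required between neighbours, is a single top-to-bottom pass. `spreadPoints` and `minGap` are hypothetical names.

```javascript
// Pushes points down so that neighbours are at least `minGap` pixels apart.
// `ys` are the original vertical positions; the result keeps the input order.
function spreadPoints(ys, minGap) {
    // visit the points from top to bottom without reordering the input
    var order = ys.map(function (y, index) { return index; })
        .sort(function (a, b) { return ys[a] - ys[b]; });
    var result = ys.slice();
    var lastY = -Infinity;
    order.forEach(function (index) {
        // never move a point up, only down and only as far as needed
        var y = Math.max(ys[index], lastY + minGap);
        result[index] = y;
        lastY = y;
    });
    return result;
}

console.log(spreadPoints([10, 12, 13, 40], 5));
```

The same pass can handle the separator lines if their positions are treated as occupied points. A long cluster can push its last points noticeably away from their true dates, which is the price of this greedy approach.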

This is what the final version of the infographic looks like:

[Image: final version of the infographic]

"So empty was Moscow when Napoleon, weary, uneasy and frowning, paced back and forth by the Kamer-Kollezhsky rampart, awaiting that outward but, in his view, indispensable observance of the proprieties: a deputation. In various corners of Moscow people still stirred aimlessly, keeping up old habits and not understanding what they were doing."

Volume III, Part 3, Chapter 19


Besides practices and methodology there is another important point: tools. You do not need to be a genius to understand how d3 works. To start, let's drill three basic features of the library until they become automatic:

  • The enter – update – exit pattern. Learn it like a mantra. The more often you use it, the better: it is a sign that your data is well prepared.
  • Scales. The second most popular feature of the framework.
  • Path helpers. It is hard to imagine even one d3 project without them.
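To make the first item of the list above less abstract, here is a plain JavaScript sketch of the bookkeeping behind a keyed data join. It is not d3's implementation and does not touch the DOM; `joinByKey` is a hypothetical helper that only shows which items would land in the enter, update, and exit groups.

```javascript
// Splits new data against already rendered items into the three groups
// of a keyed join: enter (new), update (kept), exit (gone).
function joinByKey(rendered, data, key) {
    var renderedKeys = new Set(rendered.map(function (d) { return key(d); }));
    var dataKeys = new Set(data.map(function (d) { return key(d); }));

    return {
        enter: data.filter(function (d) { return !renderedKeys.has(key(d)); }),
        update: data.filter(function (d) { return renderedKeys.has(key(d)); }),
        exit: rendered.filter(function (d) { return !dataKeys.has(key(d)); })
    };
}

var joined = joinByKey(
    [{ year: 1805 }, { year: 1806 }],           // what is on screen now
    [{ year: 1806 }, { year: 1812 }],           // what the new data says
    function (d) { return d.year; }
);
// joined.enter holds 1812, joined.update holds 1806, joined.exit holds 1805
```

In d3 the same three groups come out of a selection's data join, with elements appended for the enter group and removed for the exit group.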

"It was frosty and clear. Above the dirty, dim streets, above the black roofs, stretched a dark, starry sky. Only when looking at the sky did Pierre not feel the insulting baseness of everything earthly compared with the height at which his soul now stood"

Volume II, Part 5, Chapter 22

The finishing touch was preparing the project for production. I used webpack as the build system. Its transparent, flat set of rules lets you forget about the routine side of the task entirely. A clean starter template for a webpack project is available here (it is used in the infographic).

Epilogue


"There are only two virtues: activity and intelligence"

Volume I, Part 1, Chapter 22

Every battle begins in our heads. Strive for peace in your own head, and peace will settle around you.

I would like to thank my former colleagues from the RIA infographics studio and all the participants of the literary marathon for this event.

Hm, here is an interesting thought: what about an infographic on how "War and Peace" was read?!

Source code of the timeline
Webpack template
The "War and Peace" infographic project
The timeline on its own is here

This article is a translation of the original post at habrahabr.ru/post/273327/