
1 year, 9 months ago
Many systems and phenomena can be represented as networks, that is, as a set of objects and the links between them. A network is not only an abstraction but also a visual tool for presenting data. A network plot can show the importance of each object and the weight of each link, highlight key groups of elements, and emphasize the links between them. The main task of visualization is to present key information about the properties of a system or phenomenon in the form that is easiest to perceive. Ideally, the analysis of a system and the visualization of its results can be done within a single tool. R, with its extensive set of packages, makes this possible.

Introduction: visualization of networks


The main thing to settle when designing a network visualization is the goal it should achieve. Which structural properties do we want to highlight?
Visualization of static and dynamic networks on R, part 1

Network maps are not the only tool for visualizing graphs. In some cases other ways of representing a network, or even simple charts of its key properties, are preferable.

Network maps, like other visualization formats, have several key settings that influence the end result. The main ones are color, size, shape, and relative position.

Modern graph layouts are optimized for both performance and aesthetics. In particular, they try to minimize overlaps and edge crossings and to keep edge lengths uniform across the graph.

Format of data, size and preparation


In this tutorial we will work mainly with two small data sets. Both contain information about mass media. One is a network of hyperlinks and mentions among news sources. The other is a network of links between media outlets and their consumers. Although the example data are small, many of the visualization ideas shown here carry over to medium and large networks. For the same reason we will rarely use some visual properties, such as the shape of the vertex symbols: they are almost impossible to tell apart in big graphs. Moreover, when displaying very large networks it can make sense to hide the edges altogether and concentrate on identifying and displaying groups of vertices. Generally speaking, the size of the networks that can be visualized in R is limited mainly by the RAM of your machine. It is worth stressing, though, that in many cases drawing a big network as a giant hairball is much less useful than charts of the key properties of the graph.
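As a rough illustration of the last point, the sketch below summarizes a large graph with a degree histogram and a few numbers instead of drawing it. The random graph and its parameters are made up for the example and are not part of the tutorial's data.

```r
library(igraph)

# Hypothetical stand-in for a "big" network: a random graph with 2000 vertices.
set.seed(42)
big <- sample_gnp(n = 2000, p = 0.005)

# A chart of a key property (the degree distribution) instead of a network map.
deg <- degree(big)
hist(deg, breaks = 30, main = "Degree distribution",
     xlab = "Degree", ylab = "Number of vertices")

# A few summary numbers that are often more informative than a hairball plot.
c(vertices = vcount(big), edges = ecount(big), density = edge_density(big))
```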

This tutorial uses several key packages that need to be installed before continuing. A few other libraries will be mentioned, but they are optional and can be skipped. The main libraries are igraph (maintained by Gábor Csárdi and Tamás Nepusz), sna and network (maintained by Carter Butts and the Statnet team), and ndtv (maintained by Skye Bender-deMoll).
install.packages("igraph")
install.packages("network")
install.packages("sna")
install.packages("ndtv")


Data set 1: edge list


The first data set we will work with consists of two files, "Dataset1-Media-Example-NODES.csv" and "Dataset1-Media-Example-EDGES.csv" (both are available for download).
nodes <- read.csv("Dataset1-Media-Example-NODES.csv", header=T, as.is=T)
links <- read.csv("Dataset1-Media-Example-EDGES.csv", header=T, as.is=T)

Let's examine the data:
head(nodes)
head(links)
nrow(nodes); length(unique(nodes$id))
nrow(links); nrow(unique(links[,c("from", "to")]))

Note that there are more edges than unique "from"-"to" combinations. This means the data contain cases where two vertices are connected by more than one link. We will collapse all edges of the same type between the same two nodes, summing their weights with the aggregate() function, grouping by "from", "to" and "type":
links <- aggregate(links[,3], links[,-3], sum)
links <- links[order(links$from, links$to),]
colnames(links)[4] <- "weight"
rownames(links) <- NULL
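
To see what this aggregation does, here is a minimal sketch on a made-up edge list. The column layout (from, to, weight, type) is assumed to match the tutorial's file, and the values are invented for illustration.

```r
# Hypothetical toy edge list: s01 -> s02 appears twice with the same type.
toy <- data.frame(from   = c("s01", "s01", "s01", "s02"),
                  to     = c("s02", "s02", "s03", "s03"),
                  weight = c(10, 5, 3, 7),
                  type   = c("hyperlink", "hyperlink", "mention", "hyperlink"),
                  stringsAsFactors = FALSE)

# Sum column 3 (weight) within each combination of the remaining columns.
agg <- aggregate(toy[, 3], toy[, -3], sum)
agg <- agg[order(agg$from, agg$to), ]
colnames(agg)[4] <- "weight"   # aggregate() names the summed column "x"
rownames(agg) <- NULL
agg
```

The two s01 -> s02 hyperlink rows collapse into one row whose weight is their sum, while the total weight of the edge list stays the same.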


Data set 2: matrix


nodes2 <- read.csv("Dataset2-Media-User-Example-NODES.csv", header=T, as.is=T)
links2 <- read.csv("Dataset2-Media-User-Example-EDGES.csv", header=T, row.names=1)

Let's examine the data:
head(nodes2)
head(links2)

We can verify that links2 is an adjacency matrix for a two-mode (bipartite) network:
links2 <- as.matrix(links2)
dim(links2)
dim(nodes2)
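
The dimension check can also be written as explicit tests. The sketch below uses a made-up matrix and node table and assumes that rows are media outlets, columns are users, and the node table lists both sets in an id column.

```r
# Hypothetical toy data: 2 media outlets (rows) and 3 users (columns).
toy_links <- matrix(c(1, 0, 1,
                      0, 1, 1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("m1", "m2"), c("u1", "u2", "u3")))
toy_nodes <- data.frame(id = c("m1", "m2", "u1", "u2", "u3"),
                        stringsAsFactors = FALSE)

# In a two-mode matrix rows and columns are different node sets,
# so together they should account for every node in the node table.
is_two_mode <- nrow(toy_links) + ncol(toy_links) == nrow(toy_nodes)
all_listed  <- all(c(rownames(toy_links), colnames(toy_links)) %in% toy_nodes$id)
c(is_two_mode = is_two_mode, all_listed = all_listed)
```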


Visualization of networks: the first steps with igraph


Let's begin by converting the raw data into an igraph network. For this we use igraph's graph.data.frame function (named graph_from_data_frame in current igraph releases), which takes two data frames as input: d and vertices.
  • d describes the network edges. Its first two columns contain the identifiers of the source and target vertex of each edge. The following columns hold edge attributes (weight, type, label, and so on).
  • vertices starts with a column of vertex identifiers. All following columns are interpreted as vertex attributes.

library(igraph)

net <- graph.data.frame(links, nodes, directed=T)
net

## IGRAPH DNW- 17 49 -- 
## + attr: name (v/c), media (v/c), media.type (v/n), type.label
##   (v/c), audience.size (v/n), type (e/c), weight (e/n)

The description of an igraph object starts with up to four letters:
  1. D or U: a directed or undirected graph, respectively.
  2. N: a named graph (the vertices have a name attribute).
  3. W: a weighted graph (the edges have a weight attribute).
  4. B: a bipartite, or two-mode, graph (the vertices have a type attribute).

The two numbers that follow (17 49) give the number of vertices and edges in the graph. The description also lists the vertex and edge attributes, for example:
  • (g/c): a graph-level character attribute
  • (v/c): a vertex-level character attribute
  • (e/n): an edge-level numeric attribute
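
The flags and counts in that summary line can also be queried directly. The following is a small sketch on a made-up three-vertex graph; the data frames and their values are hypothetical, and it uses the current function name graph_from_data_frame.

```r
library(igraph)

# Hypothetical toy data in the same shape as the tutorial's inputs.
toy_nodes <- data.frame(id    = c("a", "b", "c"),
                        media = c("Alpha", "Beta", "Gamma"),
                        stringsAsFactors = FALSE)
toy_links <- data.frame(from   = c("a", "a", "b"),
                        to     = c("b", "c", "c"),
                        weight = c(2, 1, 3),
                        stringsAsFactors = FALSE)

g <- graph_from_data_frame(d = toy_links, vertices = toy_nodes, directed = TRUE)

# The letters D, N, W, B correspond to these checks.
is_directed(g)    # D: directed graph
is_named(g)       # N: vertices have a "name" attribute
is_weighted(g)    # W: edges have a "weight" attribute
is_bipartite(g)   # B: vertices have a "type" attribute (absent here)

# The two numbers in the summary line.
vcount(g)
ecount(g)

# Attribute names at the vertex and edge level.
vertex_attr_names(g)
edge_attr_names(g)
```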

It is also easy to access the vertices, the edges, and their attributes:
E(net)       # The edges of the "net" object
V(net)       # The vertices of the "net" object
E(net)$type  # Edge attribute "type"
V(net)$media # Vertex attribute "media"

# The network matrix can also be manipulated directly:
net[1,]
net[5,7]

Now that we have an igraph network, we can make a first attempt at plotting it.
plot(net) # not a pretty picture!
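
If an explicit matrix or table is needed rather than bracket indexing, igraph can export one. This is a sketch on a made-up graph; the vertex names and weights are hypothetical.

```r
library(igraph)

# Hypothetical toy graph with weighted, directed edges.
g <- graph_from_data_frame(
  data.frame(from   = c("a", "a", "b"),
             to     = c("b", "c", "c"),
             weight = c(2, 1, 3),
             stringsAsFactors = FALSE),
  directed = TRUE)

# Dense adjacency matrix whose cells hold the edge weights.
adj <- as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
adj

# The edge list, with attributes, back as a data frame.
edge_table <- igraph::as_data_frame(g, what = "edges")
edge_table
```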


The result is not very pretty. Let's start improving the picture by removing the loops from the graph.
net <- simplify(net, remove.multiple = F, remove.loops = T) 

Note that simplify could also be used to collapse multiple edges into one and sum their weights, with a command such as simplify(net, edge.attr.comb=list(weight="sum","ignore")). The problem is that this would merge edges without regard to their type (in our data, "hyperlinks" and "mentions").
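
The effect of edge.attr.comb is easiest to see on a tiny graph. The sketch below uses made-up edges, including a loop and two parallel edges of different types; it is an illustration, not the tutorial's data.

```r
library(igraph)

# Hypothetical toy edges: a -> b appears twice (different types), c -> c is a loop.
toy_links <- data.frame(from   = c("a", "a", "b", "c"),
                        to     = c("b", "b", "c", "c"),
                        weight = c(1, 2, 3, 1),
                        type   = c("hyperlink", "mention", "hyperlink", "hyperlink"),
                        stringsAsFactors = FALSE)
g <- graph_from_data_frame(toy_links, directed = TRUE)

# Merge parallel edges, summing "weight" and dropping every other edge attribute.
g_simple <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE,
                     edge.attr.comb = list(weight = "sum", "ignore"))

ecount(g_simple)           # parallel edges merged, loop removed
E(g_simple)$weight         # summed weights
edge_attr_names(g_simple)  # "type" is gone, which is the drawback noted above
```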

Let's also reduce the arrow size and remove the labels (by setting them to NA):
plot(net, edge.arrow.size=.4,vertex.label=NA)


Part 2 covers fonts and colors in R plots.

This article is a translation of the original post at habrahabr.ru/post/262079/