Network data is everywhere.
From roads and supply chains to biological pathways and the internet, any items that share a common relationship can form a network. Node-link diagrams are an intuitive and commonly used visual representation of networks.
Node-link diagrams were created long before bar, line, or pie charts, with simple examples dating back to the early 14th century. However, only recently have node-link diagrams been used not only for illustration, but also as an effective tool for exploring the underlying dynamics of complex networks.
An early example of using visualization for modern network analysis comes from the psychosociologist Jacob L. Moreno. Moreno used node-link diagrams to analyze and illustrate various social structures. He published a node-link diagram in a 1933 New York Times article, depicting the web of friendships in an elementary school. Using this diagram, Moreno was able to identify the students who linked groups together, and the ones who didn’t really fit into any social group.
Today, node-link diagrams are routinely used to visualize relationships in many areas, such as mathematics, biology, investigative analysis, and business. In fact, given the dramatic rise of social networking platforms in the past several years, people are better equipped for understanding network dynamics in a visual manner.
So what are some of the design possibilities for node-link diagrams?
Just what is network data? A graph is the technical name for the underlying structure of a network. Graphs are composed of nodes (vertices) and the links between them (edges). Optionally, each edge can have a weight, which is a numeric value describing the strength of the relationship between the two nodes it connects. Furthermore, each edge can be directed or undirected, meaning that the relationship is either one-way or mutual, respectively. Given this structure, there are several visual elements that can be tweaked when constructing node-link diagrams.
Node shape. Though typically represented as circles, nodes can be images or any other shape, including pie charts. (As a general rule, consider using color for known classes and shape for any further divisions.)
Node color. The color of a node usually represents some known classification. For instance, a network of voting records would use blue for Democrats and red for Republicans.
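Node size. Size can also be used to represent quantitative relationships between nodes. However, map the value to a node’s area rather than its radius: scaling the radius linearly makes the area grow quadratically, which exaggerates differences. Also ensure that your nodes do not become so large as to hide edges or other nodes.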
Edge direction. A common way of representing direction in node-link diagrams is with an arrow. Alternatively, tapered edges can actually produce better results, since the direction can be seen at any point on the edge rather than just at the ends.
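Edge color. Edge color usually represents the type of relationship. Occasionally, a color scale can be used to represent edge weights directly: a sequential scale for weights that run from low to high, or a diverging scale when weights fall on either side of a meaningful midpoint.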
Edge size. Edge size is also commonly used to represent edge weights. Keep in mind that tapered edges will also affect the size of an edge.
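To make the pieces above concrete, here is a minimal Python sketch of a weighted graph plus two of the visual mappings. It reads the node size advice as "make the circle's area, not its radius, proportional to the value". The class names, the sample friendship data, and the pixel ranges are all hypothetical choices for illustration, not part of any particular toolkit.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    weight: float = 1.0      # optional strength of the relationship
    directed: bool = False   # True for one-way, False for mutual


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, weight=1.0, directed=False):
        self.nodes.update((source, target))
        self.edges.append(Edge(source, target, weight, directed))

    def degree(self, node):
        # Number of edges touching the node, ignoring direction.
        return sum(node in (e.source, e.target) for e in self.edges)


def node_radius(value, max_value, max_radius=20.0):
    """Radius chosen so that the circle's AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def edge_width(weight, max_weight, min_width=0.5, max_width=6.0):
    """Linear map from weight to stroke width, keeping light edges visible."""
    return min_width + (max_width - min_width) * (weight / max_weight)


# Hypothetical sample data: a tiny friendship network.
g = Graph()
g.add_edge("Ann", "Bob", weight=3.0)
g.add_edge("Ann", "Cy", weight=1.0, directed=True)
g.add_edge("Ann", "Dee", weight=2.0)
g.add_edge("Bob", "Cy", weight=1.0)

# Size each node by its number of connections.
max_degree = max(g.degree(n) for n in g.nodes)
radii = {n: node_radius(g.degree(n), max_degree) for n in sorted(g.nodes)}
```

With the square root in place, a node with three times as many connections covers three times the area, instead of nine times the area as it would if the radius were scaled directly.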
Once the visual elements for the nodes and links are determined, the network must be arranged in some way. Methods for arranging networks in a given space are called network layouts. Network layouts are typically constructed with the help of a layout algorithm. However, sometimes manual layouts can produce fantastic results that automatic methods simply cannot achieve.
Network layout algorithms usually try to optimize based on criteria like “minimize edge crossings” and “minimize the distance between similar nodes”. There are several ways to define node similarity, but in layout algorithms it is usually based on how the nodes are connected, such as whether they are linked directly or how many neighbors they have in common.
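As a rough illustration of the first criterion, the sketch below counts edge crossings for a given set of node positions, assuming edges are drawn as straight lines. The function names and the two sample layouts are made up for this example; real layout tools use more sophisticated scoring.

```python
from itertools import combinations


def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching or collinear cases are ignored)."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def count_crossings(positions, edges):
    """Count pairs of straight-line edges that cross in the given layout."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue  # edges sharing a node meet at that node, which is not a crossing
        if segments_cross(positions[u1], positions[v1],
                          positions[u2], positions[v2]):
            crossings += 1
    return crossings


# Four nodes, every pair connected.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c"), ("b", "d")]

# Layout 1: nodes on the corners of a square, so the two diagonals cross.
square = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

# Layout 2: node d moved inside the triangle formed by a, b, and c.
nested = {"a": (0, 0), "b": (2, 0), "c": (1, 2), "d": (1, 0.5)}

square_crossings = count_crossings(square, edges)
nested_crossings = count_crossings(nested, edges)
```

The same network scores differently under the two layouts, which is exactly the kind of difference a layout algorithm tries to exploit.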
One of the most common and widely available layouts is the force-directed layout. There are many variations of force-directed layouts, but the essential idea is that edges act as springs that pull connected nodes together, nodes repel one another like charged particles, and a sort of physics simulation is run to allow the nodes to adjust their positions based on the forces acting on them. For example, see this figure of a wiki using force-directed layout:
A number of underlying hierarchies, clusters, and isolated nodes are clearly shown. Other popular layouts include radial, balloon, and hierarchical methods. As an example, consider this comparison of balloon (left) and radial (right) layouts (via IBM):
Many of these layout algorithms are implemented in free and open-source tools for network visualization.
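For readers who want to see the force-directed idea in code, here is a minimal sketch that loosely follows the well-known Fruchterman-Reingold scheme: every pair of nodes pushes apart, every edge pulls its endpoints together, and the allowed movement shrinks over time so the layout settles. The constants (spacing `k`, starting step, cooling rate) and the sample graph are assumptions picked for illustration, and production tools add many refinements this sketch leaves out.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} after a simple spring/repulsion simulation."""
    rng = random.Random(seed)            # seeded so runs are repeatable
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))      # target spacing (an assumed constant)
    step = 0.1                           # largest move allowed per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes pushes apart with force k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push

        # Each edge pulls its endpoints together with force distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull

        # Move each node along its net force, capped at the current step size.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, step)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move

        step *= 0.98                     # cool down so the layout settles

    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical sample: two triangles joined by a single bridge edge (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
layout = force_directed_layout(nodes, edges)
```

Plotting `layout` with any drawing library should show the two triangles as separate clusters joined by the bridge, since connected nodes are pulled close while unconnected ones drift apart.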
How to cure network hairballs
Node-link diagrams are very susceptible to clutter as the number of links and nodes increases. Consider this figure of protein interactions (via Wikipedia):
Diagrams like this are called network “hairballs” for obvious reasons. Cluttered diagrams make it difficult or impossible to make discoveries in the data. Fortunately, there are several ways of addressing network clutter.
One option is to tweak the visual attributes mentioned above. Are the nodes or edges too large? Could the nodes or edges be made semi-transparent to make overlaps less obstructive?
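When experimenting with transparency, it helps to know how quickly overlapping strokes darken. The sketch below assumes standard "over" alpha blending of same-colored strokes, where each layer hides a fraction `alpha` of what is beneath it. The function names are invented for this example.

```python
def stacked_opacity(alpha, layers):
    """Combined opacity of `layers` identical strokes drawn on top of each other."""
    return 1.0 - (1.0 - alpha) ** layers


def alpha_for_target(target_opacity, layers):
    """Per-stroke alpha so that `layers` overlapping strokes reach target_opacity."""
    return 1.0 - (1.0 - target_opacity) ** (1.0 / layers)


# Two overlapping edges at 50% opacity already cover 75% of the background.
two_layers = stacked_opacity(0.5, 2)

# Alpha to use if roughly ten edges overlap and 90% coverage is the goal.
suggested_alpha = alpha_for_target(0.9, 10)
```

In a dense hairball where dozens of edges overlap, this suggests starting with a much lower opacity than intuition might propose.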
Another option is changing the layout algorithm. Try the different layout options in the tool you’re using. Coders may even consider modifying a layout algorithm to suit their needs.
One last technique worth mentioning is edge bundling, which is a layout algorithm for the edges themselves. Put simply, edge bundling algorithms pull similar edges together, kind of like merging roads into a highway. Consider the edge bundling in this network of U.S. air travel paths (via Danny Holten):
The results of edge bundling are impressive. Unfortunately, many edge bundling algorithms have not yet made their way from research papers into readily available toolkits. But you can rest assured that researchers are continually working not only to improve existing techniques for network layouts, but also to fundamentally rethink how we visualize and analyze the networks that surround us.