Book review: Network Graph Analysis and visualization with Gephi by Ken Cherven

This review was first published at ScraperWiki.

I generally follow the rule that if I haven’t got anything nice to say about something then I shouldn’t say anything at all. Network Graph Analysis and visualization with Gephi by Ken Cherven challenges this principle.

Gephi is a system for producing network visualisations, and as such it doesn’t have a great many competitors. Fans of Unix will have used Graphviz for this purpose in the past, but Gephi offers greater flexibility in a more user-friendly package. Graph theory and network analysis have been growing in importance over the past few years, in part because of developments in the analysis of various complex systems using network science. As a physical scientist I’ve been aware of this trend, and it clearly also holds in the social sciences. Furthermore, network information is now much more widely available from social media such as Twitter and Facebook.

I’ve used Gephi a few times in the past, and to be honest there has been an air of desperate button clicking to my activities. That’s to say I felt Gephi could provide the desired output but I could only achieve it by accident. I have an old-fashioned enthusiasm for books, even for learning about modern technology. Hence Network Graph Analysis and visualization with Gephi – the only book I could find with Gephi in the title. There is substantial online material to support Gephi, but I hoped that this book would give me a better insight into how Gephi worked and some wider understanding of graph theory and network analysis.

On the positive side I now have a good understanding of the superficial side of the interface, a feel for how a more expert user thinks about Gephi, and some tricks to try.

I discovered from Network Graph Analysis that the “Overview” view in Gephi is what you might call “Draft”, a place to prepare visualisations which allows detailed interaction. And the “Preview” view is what you might call “Production”, a place where you make a final, beautiful version of your visualisations.

The workflow for Gephi is to import data and then build a visualisation using one of a wide range of layout algorithms. For example, force-based layouts assume varying forces between nodes, and an arrangement of nodes is then calculated by carrying out a pseudo-physical simulation. These algorithms can take a while to converge, and may get trapped in local minima. The effect of these layout algorithms is to reveal some features of the network. For example, the force layouts can reveal clusters of nodes which might also be discovered by a more conventional statistical clustering algorithm. The concentric layout allows a clearer visualisation of hierarchy in a network.
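
To make the idea of a force-based layout concrete, here is a minimal Python sketch of the general technique. It is not Gephi’s code and not taken from the book: the function name `force_layout`, the spring and repulsion formulas (loosely in the style of the Fruchterman-Reingold algorithm), the constants and the toy graph of two triangles joined by a single edge are all my own assumptions, chosen only for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} from a simple spring-and-repulsion simulation.

    Illustrative sketch only; the force laws and constants are assumptions.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred edge length (assumed heuristic)
    step = 0.1  # largest move allowed per iteration; shrinks over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Nodes joined by an edge attract each other, like springs.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force

        # Move each node along its net force, capped at the current step.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, step)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
        step *= 0.98  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Toy graph (hypothetical): two triangles joined by the single edge c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
layout = force_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Connected nodes are pulled together while all nodes push each other apart, so the two triangles should end up as separate clumps, which is the cluster-revealing effect described above. A different random seed gives a different but usually equivalent arrangement, and on larger graphs a sketch like this can settle into one of the local minima mentioned earlier.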

It’s clear that the plugin ecosystem is important to the more experienced user of Gephi. Plugins provide layout algorithms, data helpers, new import and export functionality, analysis and so forth. You can explore them in the Gephi marketplace.

Cherven recommends a fairly small, apparently well-chosen set of references to online resources and books. The Visual Complexity website looks fabulous. You can read the author’s complete, pre-publication draft of Networks, Crowds and Markets: Reasoning about a highly connected world by David Easley and Jon Kleinberg here. It looks good but it’s nearly 800 pages! I’ve opted for the rather shorter Graph Theory and Complex Networks: An Introduction by Maarten van Steen.

On the less positive side, this is an exceedingly short book. I read it in a couple of 40-minute train journeys. It’s padded with detailed descriptions of how to install Gephi and plugins, including lots of screenshots. The coverage is superficial, so whilst features may be introduced, the explanation often tails off into “…and you can explore this feature on your own”.

Network Graph Analysis is disappointing: it does bring a little enlightenment to a new user of Gephi, but not very much. A better book would have provided an introduction to network and graph analysis, with Gephi as the tool to provide practical experience and examples, in the manner that Data Mining does for Weka and Natural Language Processing with Python does for the nltk library.

This book may be suitable for someone who is thinking about using Gephi and isn’t very confident about getting started. The best alternative that I’ve found is the online material on GitHub (here).