Digital Tools for Research

This guide provides information about digital tools that can be useful for research data management and analysis.

Network Analysis

A graph, or network, is a data model consisting of nodes (vertices) and the connections between them, called edges (links, arcs). Networks provide clear and intuitive visual representations of relationships: nodes typically represent individual entities (such as people, organizations, or objects), and edges represent the connections or interactions between them. Networks appear in many contexts, including social, technological, and biological systems.
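To make the definition concrete, here is a minimal sketch of a small undirected graph stored as a list of nodes and a list of edges, then converted into an adjacency structure. The names and connections are invented for illustration only.

```python
# Hypothetical example data: four people and who knows whom.
nodes = ["Ann", "Bob", "Cy", "Dee"]
edges = [("Ann", "Bob"), ("Ann", "Cy"), ("Bob", "Cy"), ("Cy", "Dee")]


def build_adjacency(nodes, edges):
    """Return a dict mapping each node to the set of its neighbours (undirected)."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(nodes, edges)
for node in nodes:
    print(node, "->", sorted(adjacency[node]))
```

The adjacency dictionary is only one possible representation; it is convenient here because looking up a node's neighbours is a single dictionary access.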

Communities

Nodes in graphs can be grouped into communities. A community is a dense subgraph where all (or almost all) nodes are interconnected.

 

Image source: TDS Archive

Network Analysis

Social network analysis (SNA), or simply network analysis (NA), is a research method used to understand and visualise how networks (graphs) function, and to identify the most important nodes within them. It involves analysing the connections between entities, as well as the characteristics of the entities themselves.

Image source: VisibleNetworksLabs

Graphs in Real Life

Looking at the London Underground, stations can be viewed as nodes, and tracks connecting them serve as edges. When calculating travel time, you are working with a weighted graph, where each track segment between two stations is assigned a time value in minutes.

Image source: Transport for London
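The travel-time idea can be sketched with a small weighted graph and Dijkstra's shortest-path algorithm. The station names and minute values below are made up for illustration and are not real Transport for London data; the function names are likewise hypothetical.

```python
import heapq

# Hypothetical travel times in minutes between made-up stations.
track_minutes = {
    ("A", "B"): 2,
    ("B", "C"): 3,
    ("A", "C"): 7,
    ("C", "D"): 1,
}


def build_weighted_adjacency(track_minutes):
    """Turn {(station, station): minutes} into {station: {neighbour: minutes}}."""
    adjacency = {}
    for (a, b), minutes in track_minutes.items():
        adjacency.setdefault(a, {})[b] = minutes
        adjacency.setdefault(b, {})[a] = minutes
    return adjacency


def fastest_time(adjacency, start, goal):
    """Dijkstra's algorithm: smallest total weight from start to goal, or None."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        elapsed, station = heapq.heappop(queue)
        if station == goal:
            return elapsed
        if elapsed > best.get(station, float("inf")):
            continue  # stale queue entry; a faster route was already found
        for neighbour, minutes in adjacency.get(station, {}).items():
            candidate = elapsed + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None


network = build_weighted_adjacency(track_minutes)
print(fastest_time(network, "A", "D"))  # 6: A-B-C-D beats the direct A-C link
```

Note that the route with the fewest stops (A to C directly, then D) is not the fastest one here, which is exactly the difference that edge weights make.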

And this is what the Internet looks like, represented as a graph of IP addresses.

Image source: Wikimedia

Image source: HelloAlgo

Bipartite

There are two subsets of nodes, and every edge connects a node from subset A to a node from subset B.

Image source: Tutorial Horizon & Wikimedia
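One way to check whether a graph is bipartite is to try to assign every node to one of two sides so that no edge stays within a side. The sketch below does this with a breadth-first search; the researcher and paper labels are hypothetical sample data.

```python
from collections import deque


def bipartite_sets(adjacency):
    """Split nodes into subsets A and B so that every edge runs between them.

    Returns (A, B), or None if the graph is not bipartite.
    """
    side = {}
    for start in adjacency:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in side:
                    side[neighbour] = 1 - side[node]
                    queue.append(neighbour)
                elif side[neighbour] == side[node]:
                    return None  # an edge joins two nodes on the same side
    subset_a = {n for n, s in side.items() if s == 0}
    subset_b = {n for n, s in side.items() if s == 1}
    return subset_a, subset_b


# Hypothetical example: researchers (r*) linked to the papers (p*) they wrote.
authorship = {
    "r1": {"p1", "p2"},
    "r2": {"p2"},
    "p1": {"r1"},
    "p2": {"r1", "r2"},
}
# A triangle cannot be split into two sides.
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}

print(bipartite_sets(authorship))
print(bipartite_sets(triangle))
```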

Complete

Every node is connected to every other node, but there are no self-loops.

Image source: Wikimedia
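A complete graph on n nodes has exactly n(n - 1)/2 edges, one for each pair of distinct nodes. The following sketch generates those edges and checks whether a given edge list is complete; the function names are illustrative.

```python
from itertools import combinations


def complete_graph_edges(nodes):
    """All unordered pairs of distinct nodes, i.e. the edges of a complete graph."""
    return list(combinations(nodes, 2))


def is_complete(nodes, edges):
    """True if every pair of distinct nodes is joined and there are no self-loops."""
    if any(a == b for a, b in edges):
        return False
    present = {frozenset(edge) for edge in edges}
    required = {frozenset(pair) for pair in combinations(nodes, 2)}
    return present == required


print(len(complete_graph_edges(range(5))))  # 10, i.e. 5 * 4 / 2
print(is_complete(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))
print(is_complete(["a", "b", "c"], [("a", "b"), ("b", "c")]))
```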

Small world

Each node has few direct neighbours, but its neighbours are very likely to be connected to each other and most nodes can be reached from any other node in just a few steps.

Image source: ResearchGate

A metric is a quantifiable measure used to describe and compare models, processes and performance, e.g. customer retention rate or the number of users visiting your website. When comparing multiple items, the outcome of the comparison depends on the metric used. For example, when comparing the research output of different academics, using the number of publications as a metric would rank the researcher with the most published articles highest. However, if you use citation count as a metric, the researcher whose work is most frequently cited may be considered more influential, even if they have fewer publications.

In graph theory and network analysis, the following metrics are commonly used.

Degree is the number of connections a node has.

Weighted degree is the sum of the weights of the edges attached to a node; in an unweighted graph it is the same as the degree.
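As a rough illustration, the sketch below counts both quantities from a weighted edge list. It assumes the common convention in which weighted degree is the sum of the weights of a node's edges; the edge list and weights are invented.

```python
# Hypothetical weighted edges: (node, node, weight), e.g. number of joint appearances.
weighted_edges = [("a", "b", 3), ("a", "c", 1), ("b", "c", 2), ("c", "d", 5)]


def degree_and_weighted_degree(weighted_edges):
    """Return two dicts: number of edges per node, and summed edge weight per node."""
    degree = {}
    weighted_degree = {}
    for a, b, weight in weighted_edges:
        for node in (a, b):
            degree[node] = degree.get(node, 0) + 1
            weighted_degree[node] = weighted_degree.get(node, 0) + weight
    return degree, weighted_degree


degree, weighted_degree = degree_and_weighted_degree(weighted_edges)
print(degree)           # {'a': 2, 'b': 2, 'c': 3, 'd': 1}
print(weighted_degree)  # {'a': 4, 'b': 5, 'c': 8, 'd': 5}
```

Node d has only one connection, yet its weighted degree equals that of node b, because its single edge is a heavy one.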

There are several ways to measure the importance of a node.

  • Degree centrality: the more connections a node has, the more important it is (A).
  • Closeness centrality: the more central a node is (i.e., the shorter the paths from it to all other nodes), the more important it is (B).
  • Betweenness centrality: the more often a node lies on the shortest path between two other nodes, the more important it is (C).
  • Eigenvector centrality (eigencentrality): "the more friends your friends have, the more important you are" (D).

Image source: Claudio Rocchini / Wikimedia.
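The first two measures in the list are simple enough to sketch in plain Python. The version below assumes an undirected, unweighted, connected graph and uses one common normalisation (dividing by n - 1); other normalisations exist. Betweenness and eigenvector centrality need more machinery and are left out of this sketch.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of steps from source to each reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def degree_centrality(adjacency):
    """Share of the other n - 1 nodes that each node is directly connected to."""
    n = len(adjacency)
    return {node: len(neigh) / (n - 1) for node, neigh in adjacency.items()}


def closeness_centrality(adjacency):
    """(n - 1) divided by the sum of shortest-path lengths; assumes a connected graph."""
    n = len(adjacency)
    result = {}
    for node in adjacency:
        total = sum(shortest_path_lengths(adjacency, node).values())
        result[node] = (n - 1) / total if total > 0 else 0.0
    return result


# Hypothetical path graph: a - b - c - d
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(degree_centrality(path))
print(closeness_centrality(path))  # a and d: 0.5, b and c: 0.75
```

In this path graph the two middle nodes score higher on both measures, which matches the intuition that they are better connected and closer to everyone else.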

The assortativity coefficient measures with whom the "important" (high-degree) nodes are connected: if they tend to be connected to other "important" nodes, the coefficient is high (close to 1); if they tend to be connected to "unimportant" nodes, it is low (close to -1).
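One common way to compute this is degree assortativity: the Pearson correlation between the degrees found at the two ends of every edge. The sketch below takes that interpretation as an assumption and works on an undirected graph given as an adjacency dictionary.

```python
from math import sqrt


def degree_assortativity(adjacency):
    """Pearson correlation between the degrees at the two ends of each edge."""
    xs, ys = [], []
    for node, neighbours in adjacency.items():
        for neighbour in neighbours:
            # Each undirected edge is visited once in each direction.
            xs.append(len(adjacency[node]))
            ys.append(len(adjacency[neighbour]))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when every node has the same degree
    return cov / sqrt(var_x * var_y)


# Hypothetical star graph: one hub connected to three leaves.
star = {"hub": {"l1", "l2", "l3"}, "l1": {"hub"}, "l2": {"hub"}, "l3": {"hub"}}
print(degree_assortativity(star))  # -1.0: the hub connects only to low-degree nodes
```

A star is the extreme case of a network in which the important node is connected only to unimportant ones, so the coefficient reaches its minimum.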

Clustering coefficient is the degree of interaction between a node's immediate neighbors, i.e., the probability that the node's closest neighbors are not only connected to it but also to each other.
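For a single node, this can be computed as the number of connections that actually exist among its neighbours divided by the number that could exist. A minimal sketch, using an invented four-person graph:

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of pairs of the node's neighbours that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to connect
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)


# Hypothetical graph: Ann, Bob and Cy all know each other; Dee knows only Cy.
friends = {
    "Ann": {"Bob", "Cy"},
    "Bob": {"Ann", "Cy"},
    "Cy": {"Ann", "Bob", "Dee"},
    "Dee": {"Cy"},
}
for person in friends:
    print(person, local_clustering(friends, person))
```

Ann's two neighbours know each other, so her coefficient is 1; only one of the three possible pairs among Cy's neighbours is connected, so his is 1/3.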

Density is the ratio of the number of edges to the maximum possible number of edges. Communities tend to have a high clustering coefficient and high density.
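For an undirected graph without self-loops or repeated edges, the maximum possible number of edges is n(n - 1)/2, so density can be sketched as follows (the function name is illustrative):

```python
def density(node_count, edge_count):
    """Edges present divided by the n * (n - 1) / 2 edges that could exist."""
    if node_count < 2:
        return 0.0  # no edges are possible with fewer than two nodes
    return edge_count / (node_count * (node_count - 1) / 2)


print(density(4, 4))  # 4 of 6 possible edges
print(density(4, 6))  # 1.0: a complete graph on four nodes
```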

Modularity measures how much denser the connections within a group are compared to the connections between groups. This metric is used to partition the graph into communities.
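A widely used formulation scores a given partition as the sum, over communities, of the share of edges inside the community minus the share expected if edges were placed at random with the same node degrees. The sketch below assumes that formulation for an undirected, unweighted graph; it scores a partition but does not search for the best one.

```python
def modularity(edges, communities):
    """Sum over communities of (internal edges / m) - (degree sum / 2m) ** 2."""
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


# Hypothetical graph: two triangles joined by a single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(modularity(edges, [{1, 2, 3}, {4, 5, 6}]))  # about 0.357
print(modularity(edges, [{1, 2, 3, 4, 5, 6}]))    # 0.0
```

Splitting the graph along the single bridging edge scores higher than treating everything as one group, which is why community-detection methods look for partitions that maximise this value.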

Graphs are usually stored in text files (.gml, .csv) or in XML files (.graphml, .gexf), where all the nodes, edges, and their attributes – for example, the name of a node or the weight of an edge – are listed. This is what .gml, .gexf and .csv files look like, respectively.

gml

A coappearance network of characters in the novel Les Misérables by Victor Hugo (lesmis.gml).

gexf

The same graph in GEXF format (lesmis.gexf).

csv

A co-occurrence network of the characters in Game of Thrones (./game_of_thrones/book1.csv).
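As an illustration of how a CSV edge list can be read, the sketch below parses a tiny, invented file with Python's built-in csv module. The column names (source, target, weight) are assumptions for this example and are not necessarily those used in book1.csv.

```python
import csv
import io

# Hypothetical edge list: one edge per row, with an optional weight attribute.
csv_text = """source,target,weight
Ann,Bob,3
Ann,Cy,1
Cy,Dee,5
"""


def read_edge_list(handle):
    """Read rows of (source, target, weight) from an open CSV file or file-like object."""
    reader = csv.DictReader(handle)
    return [(row["source"], row["target"], float(row["weight"])) for row in reader]


edge_list = read_edge_list(io.StringIO(csv_text))
for source, target, weight in edge_list:
    print(source, target, weight)
```

To read a file on disk instead, the same function can be given a handle opened with open(path, newline="", encoding="utf-8").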

Further Reading

Network Data Sources