Methodology

The initial report from the project can be found here. The network approach allows us to focus primarily on the interactions within any given system, whether that is a world of letters, journeys, or something else. The letters exchanged in this period are more than just networks of correspondence; people were bound together through community, print, and dialogue. Exchanging letters in a period of chaotic and often violent warfare was even more complex. Context is everything in understanding people, groups, identity, and behavior in the early modern world, so understanding the relationships is paramount to visualizing the data. Behavior, relationships, and identity changed over the course of the Revolution (1688-92), and Networking the Revolution seeks to trace those changes and to connect historical networks with visual interpretation by building a new database and visual representation of the interconnected nature of communication during the Revolution. In the last 25 years, there has been a noticeable shift from historical records lying piecemeal or fragmented in archive boxes to a sea of digitized images and texts in the form of PDFs, JPEGs, CSVs, or PNGs, and a number of manuscripts are now available online. Our goals in this project were to:

  • Visualize the world of communication from a corpus of letters.
  • Understand communication patterns during conflict and regime change.
  • Parse group dynamics and information distribution from a chaotic period of transition.
  • Democratize the pursuit of knowledge and expand the potential for research in this area.

This project is a marriage of early modern Scottish (and, in fact, British imperial) history and computational data science. It brings together methods of network science, prosopography, and traditional early modern political history surrounding communication. The visualizations tell a story of connectivity over time. We chose this particular conflict because the sources do a very good job of capturing the difficulties in the reconstruction and administration of Scottish governance during such a chaotic period.

Phase one required turning qualitative data into quantitative data. We started by recording all of the pertinent data from the PDF versions of the documents. We then entered this data into a spreadsheet using the categories Id, Sender, Receiver, Location from, Location to, Latitude and Longitude, Type, and Date. Later we converted this data into two tables: Nodes, which included node type, node ID, description, and loyalty; and Edges, which included from type, from name, from location, edge type, to type, to name, to location, date, and weight. This allowed us to parse the networking data and have it ready for exploration. The data was then cleaned using OpenRefine to split latitude and longitude, and keywords, into separate columns; we also made sure there were no blank cells or duplicates. After creating a master spreadsheet, we set about creating separate sheets for different visualizations, including people, places, keywords, nodes, and edges (relationships). The large dataset of letters allowed for exploratory data analysis and for trying different digital tools to identify the best representation of the relationships presented in the papers.

One of the main tools we ended up using was the programming language Python, which has a large number of libraries that extend the capabilities of the language and allow for complex visualizations of the network data. One of the most prominent libraries used was networkx, which allowed the creation of network graphs along with the application of the Girvan-Newman algorithm to detect communities within the network. This algorithm works by repeatedly removing the edge with the highest betweenness, that is, the edge lying on the largest number of shortest paths between pairs of nodes, until the network breaks apart into communities. The nodes are then given corresponding colors to highlight their community, making groups in the network easier to identify.
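
The step from the letter spreadsheet to node and edge tables can be illustrated with a minimal sketch. It is not the project's own code: the sample rows, the helper names (`split_lat_lon`, `build_nodes`, `build_edges`), and the choice to use the number of letters between two people as the edge weight are all assumptions made for illustration. The description and loyalty fields are left out because they depend on historical judgment rather than on the spreadsheet alone.

```python
from collections import Counter

# Hypothetical sample rows using the spreadsheet categories named above.
# Names, places, coordinates, and dates are placeholders, not project data.
letters = [
    {"Id": 1, "Sender": "Person A", "Receiver": "Person B",
     "Location from": "Place X", "Location to": "Place Y",
     "Latitude and Longitude": "10.5, -20.25", "Type": "letter", "Date": "1689-01-01"},
    {"Id": 2, "Sender": "Person A", "Receiver": "Person B",
     "Location from": "Place X", "Location to": "Place Y",
     "Latitude and Longitude": "10.5, -20.25", "Type": "letter", "Date": "1689-02-01"},
    {"Id": 3, "Sender": "Person B", "Receiver": "Person C",
     "Location from": "Place Y", "Location to": "Place Z",
     "Latitude and Longitude": "11.0, -21.5", "Type": "letter", "Date": "1689-03-01"},
]


def split_lat_lon(value):
    """Split a combined 'lat, lon' string into two floats."""
    lat, lon = (part.strip() for part in value.split(","))
    return float(lat), float(lon)


def build_nodes(rows):
    """Collect one node record per distinct person, in first-seen order."""
    nodes = {}
    for row in rows:
        for name in (row["Sender"], row["Receiver"]):
            if name not in nodes:
                nodes[name] = {"node_id": len(nodes) + 1,
                               "node_type": "person",
                               "name": name}
    return list(nodes.values())


def build_edges(rows):
    """Aggregate letters into sender -> receiver edges.

    Assumption: weight is the number of letters sent along that pair.
    Aggregating this way drops the per-letter date column.
    """
    counts = Counter((row["Sender"], row["Receiver"]) for row in rows)
    return [{"from_name": sender, "to_name": receiver,
             "edge_type": "letter", "weight": weight}
            for (sender, receiver), weight in counts.items()]


nodes = build_nodes(letters)
edges = build_edges(letters)
coordinates = [split_lat_lon(row["Latitude and Longitude"]) for row in letters]

for edge in edges:
    print(edge["from_name"], "->", edge["to_name"], "weight", edge["weight"])
```

Keeping nodes and edges as separate tables mirrors the sheet layout described above and makes them easy to export as CSV files for other tools.
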
Betweenness, the measure on which the Girvan-Newman algorithm relies, is also important for understanding the network graph: a node with higher betweenness centrality has more control over the network, because more information passes through that node. The libraries were implemented and the visuals created in Jupyter Notebook, an open-source tool for interactive computing. In addition, we experimented with tools such as Leaflet, Flourish, and Gephi for further analysis of the letters.
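
The community detection and centrality ideas described above can be tried on a small example. The sketch below uses networkx on a hypothetical six-person network (two tight groups joined by a single bridge); the names, weights, and color palette are invented for illustration and are not taken from the project's notebooks.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Hypothetical weighted edges (from, to, weight); placeholder names only.
edge_list = [
    ("A", "B", 2), ("A", "C", 1), ("B", "C", 1),  # first tight group
    ("D", "E", 1), ("D", "F", 2), ("E", "F", 1),  # second tight group
    ("C", "D", 1),                                # single bridge between groups
]

G = nx.Graph()
G.add_weighted_edges_from(edge_list)

# girvan_newman returns an iterator of successively finer partitions.
# The first item is the split reached after removing the
# highest-betweenness edge(s) until the graph falls into two parts.
first_split = next(girvan_newman(G))
communities = [sorted(group) for group in first_split]

# Assign one color per community (the palette is an arbitrary choice).
palette = ["tab:blue", "tab:orange", "tab:green"]
node_color = {
    node: palette[index % len(palette)]
    for index, group in enumerate(communities)
    for node in group
}

# Betweenness for nodes and for edges. These default calls ignore the
# stored weights and count shortest paths by number of hops.
node_betweenness = nx.betweenness_centrality(G)
edge_betweenness = nx.edge_betweenness_centrality(G)

print("communities:", communities)
print("most central edge:", max(edge_betweenness, key=edge_betweenness.get))
for node, score in sorted(node_betweenness.items()):
    print(node, round(score, 3), node_color[node])
```

In this toy network the bridge between the two groups carries every shortest path that crosses from one side to the other, so it is removed first, and the two people at its ends score highest on node betweenness. If matplotlib is installed, the color mapping can be passed to a drawing call such as `nx.draw(G, node_color=[node_color[n] for n in G.nodes])`.
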