MapReader: Software and Principles for Computational Map Studies
Katherine McDonough, with Ruth Ahnert, Kaspar Beelen, Kasra Hosseini, Jon Lawrence, Valeria Vitale, Kalle Westerling, Daniel Wilson and Rosie Wood[1]
1. Introduction
Spatial data is a capacious category: it can derive from maps, catalogue data and other metadata, or fieldwork observations. Complex information about space and place is also embedded in texts, and thanks to developments in linguistics this data can now be unlocked at scale.[2] But as debates in the Digital Humanities (DH) have long recognised, scale represents both an opportunity and a challenge: a core objective of the spatial humanities has been to address the geospatial information overload faced by many fields, made all the more apparent as ever larger swathes of collections found in libraries, archives, and other heritage sites are scanned and made available as digital collections.[3]
The Living with Machines (LwM) project was concerned with various kinds of spatial data, as can be seen in Chapter 3. In this chapter, however, we are concerned with maps as a primary source and the opportunities that big digitised map collections present for developing new computational workflows and epistemologies. The field of computational map studies - named for the first time in another of our recent publications[4] - opens up the possibility of examining the richness of the surviving cartographic archive, but to do so requires a deeply interdisciplinary approach that combines the scholarly traditions of map interpretation with a new mindset geared to thinking across maps, at the scale of entire collections, or series. In computational literary studies, by comparison, we might think about the way that questions and approaches have been scaled up to deal with large-scale collections of texts.[5] In the same way, computational map studies need to develop new methods appropriate to the distinct challenges of analysing whole collections of maps.
This chapter introduces a new tool that we developed for undertaking such work: MapReader. Our purpose here is not simply to describe this tool, and its affordances, which we have done elsewhere,[6] but also, and perhaps more importantly, to reflect on the way that developing this methodological pipeline establishes a new epistemological pipeline for the historical researcher – one that reshapes the way that we conceptualise our research, develop our research questions, gather our evidence, and build our arguments. MapReader breaks down a method for working with maps at scale into steps that analyse abstract, but meaningful units – ‘patches’ and ‘maptext’ – to facilitate creative thinking about historical landscapes and cartographic practice that can provide different and complementary views on the past to non-computational analyses.
MapReader can be seen as a methodological response to two dimensions of the nineteenth-century information deluge we discuss in the book’s introduction. Following its creation in 1801, at the height of the Napoleonic Wars, the British Ordnance Survey (OS) oversaw the surveying and re-surveying of Britain and Ireland at multiple scales across the nineteenth century. The first 1” to 1 mile series, surveyed between 1801 and 1869, was published as 91 separate sheets, but later, more detailed series ran to tens of thousands of sheets. Millions of maps were printed and sold, although it was only after the First World War that the OS began to design and market maps as the essential tool for the outdoor enthusiast.[7] Crucially for us, and thanks to the resource-intensive work undertaken by the National Library of Scotland (NLS) to scan, catalogue, georeference, and ‘stitch’ together the georeferenced sheets (i.e. create ‘tiles’) of several series and editions of large-scale OS mapping, anyone can now browse these collections online, flying over the landscape like a bird. But despite the scale of the digital collections now available – over 15,000 sheets for each 6” to 1 mile edition or 50,000 sheets for the more detailed 25” to 1 mile editions – the user is constrained to navigate this information by searching for predetermined places of interest and looking at those maps sheet by sheet. Like scholars of books and newspapers before us, we sought to address the challenge of how to analyse and interpret thousands of sources simultaneously using critical methods rooted in the humanities. If addressed carefully, analysing thousands of maps computationally could reframe our understanding of relationships between local, regional, and national spatial patterns. In short, cracking this challenge would enable a new era of historical research where map content can motivate new cultural, intellectual, environmental, social, and political history questions.
There is, of course, an established history of digital approaches to interpreting historical maps,[8] but since the development of deep learning and advances in computer vision (CV), the spatial humanities have radically changed. In the 2000s and 2010s, PhD students interested in using maps in coordination with digital tools would have been learning how to use software largely to build Geographic Information Systems (GIS), a limited approach to structuring and organising geospatial data that was not always well-suited to either historical sources or the spatial questions humanities scholars wanted to ask of them. A GIS (built with GIS software like ArcGIS or QGIS) is at its core a database. Two main data types can be stored: vector data (as points, lines and polygons) and raster data (continuous data with attributes stored for each pixel, for example, about land use classes). Visually, this data is presented as layers in the user interface, and various analyses can be used to edit, combine, transform, or export selections of the data.[9] Historical maps can be loaded as images, georeferenced, and then selected features of interest can be digitised manually, by tracing features like roads, for example, as lines. Scanned maps are also often simply background images, used as an alternative to a Google, Mapbox, or OSM base map.
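As a minimal, package-agnostic illustration of these two data types – the geometries, coordinates, and land-use codes below are invented for the example – vector features are discrete geometries carrying attributes, while a raster is a grid in which every cell stores a value:

```python
import numpy as np
from shapely.geometry import Point, LineString, Polygon

# Vector data: discrete geometries (points, lines, polygons) with attributes.
station = Point(-3.19, 55.95)                                   # a point of interest
road = LineString([(-3.20, 55.94), (-3.18, 55.96)])             # a traced line feature
parcel = Polygon([(-3.21, 55.94), (-3.19, 55.94),
                  (-3.19, 55.96), (-3.21, 55.96)])              # an enclosed area

# Raster data: a continuous grid in which every cell (pixel) stores a value,
# here invented land-use class codes (0 = water, 1 = built, 2 = field).
land_use = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [2, 2, 2, 2],
])
print(type(road).__name__, land_use.shape)
```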
However, LwM work has been part of a wider push in the spatial humanities to reorient the field around open source technologies, reproducible methods, and more capacious uses of maps as primary sources in digital spatial history in particular.[10] Within this wider community of practice, MapReader arose from a specific set of aims, which coalesced with the spirit of the LwM project. We sought to create an iterative, reflective, collaborative, open, and –in terms of its development– agile research environment for analysing a range of digitised, historical sources. We took the project’s ‘radical collaboration’ mindset seriously as the opportunity for working with maps from the NLS took shape.[11] This was not about historians bringing a problem to computer or data scientists: working with maps as data was an opportunity that the historians, DH scholars and data scientists on the project co-designed. Thus from the outset, MapReader was a unique proposition sitting between the data science community’s aim to produce generalisable code, the DH penchant for experimentation, and the historian’s aspiration to work outside the existing parameters defined by GIS. Furthermore, as a result of the spin-off project, Machines Reading Maps, we were also able to address a major gap in historical map data creation: maptext. Combining access to both visual and text information on maps represents a major shift towards conveying the multimodal reality of cartographic sources at scale.[12]
In its current form, MapReader offers historians an entry point to asking questions about maps, and creating data needed to answer those questions. Two main tasks provide pathways to labelling visual and text information independently and together: ‘patch classification’ and ‘text spotting’. Patches are at the heart of MapReader’s approach to data. They are created by imposing a grid on top of a map sheet image – the size of the squares forming the grid can be defined by the user – and defining each grid square, or patch, as the new unit of analysis. The patch becomes the visual frame of reference that is first labelled by the researcher in the process of creating training data (annotation), and then classified automatically by the trained CV model. Labels are user defined, and developing these iteratively during the annotation stage is an essential part of the conceptual shift encouraged by MapReader. As we explain further below, labelling patches is a creative activity that every historian will (and should) approach differently. Its very ethos contrasts with the reductive approach to digitising ‘what is on the map’ based on a) predetermined classes (mimicking features identified in map legends) and b) pixel-level precision (which we contend is unnecessary for most historical analyses). By freeing ourselves from the manual digitisation of micro-level data, we not only accelerate the research process, but also open ourselves to new opportunities to define, through an iterative process, the attributes of a map that can be meaningfully interrogated. MapReader thereby offers a new epistemological space for thinking about maps as primary sources, outside of the rules imposed by software and non-humanities disciplines.
It is that epistemological shift, and the opportunities it opens for those who wish to work with maps in the digital research environment, that we focus on in this chapter. Elsewhere, we have presented the technical details for how MapReader functions and reflected on MapReader’s interventions in the DH and the broader spatial disciplines.[13] MapReader offers a paradigm shift in the spatial humanities, allowing users to ‘see’ maps at scale in ways that adapt and refine scholarship on map understanding. MapReader is not a tool divorced from the archival reality of map collections or the topics of interest to historians across many fields, from social and political, to intellectual and environmental history, and more. If computational approaches to historical maps in non-humanities disciplines are usually geared towards the quantification of the built or natural landscape, MapReader firmly promotes a qualitative and critical view of the landscape presented on maps.
2. An epistemological shift in humanities software design
In 2025, MapReader is part of an emerging ecosystem of ‘Open Maps’ tools, standards, and data – a term coined by Vincent Baptist and Jules Schoonman.[14] But how did we get here? Like most tools, MapReader did not appear out of thin air. But it did represent one of the first sustained efforts to link up machine learning, CV, digitised map collections, and humanities inquiry. Most previous and ongoing humanities work using CV – with the exception of library science-driven research on document processing and optical character recognition – has focused on photography, art, or other audio/visual archives.[15]
Work exploring maps with CV before MapReader has largely been undertaken by two separate communities, with different norms and aims. The first are geographic information and environmental scientists treating maps ‘as data’, i.e. as objective spatial observations, usually to combine these with other geospatial data for research on environmental or geological resource discovery and protection, or demographic research.[16] The second community working with CV and maps have been humanities scholars. Early on, this community was seeking to build Historical GIS (HGIS) of neighborhoods, cities, or regions.[17] This kind of work led in turn to the development of ‘mirror worlds’ of historic cities like Amsterdam, Venice or Paris, mostly using semantic segmentation methods.[18] To radically oversimplify: by and large, early HGIS work isolated specific features (cartographically speaking), while humanities-driven segmentation efforts have sought to understand the entire spatial footprint of the map (often across a large series).[19]
In evaluating these communities of practice on LwM, we concluded that both overlook a key lesson from the history of cartography: the need to create digital data from map content in a way that does not suggest such content is an objective representation of a historical landscape. A critical approach to historical cartography – that is, critiquing the mapping process in its historical and technical contexts – is now deeply embedded in how scholars engage with maps in an analogue setting.[20] We wondered whether it would be possible to translate these practices into a digital research environment. Could there be an information processing method that was both automatic and critical?
A critical approach to maps as data was therefore the first fork in the road, setting what would become MapReader apart from other previous and concurrent computational approaches to maps. If discussions about the complexities of working with digital facsimiles are familiar across many kinds of media, maps face a particular challenge. If the scanned image of a map is the digital facsimile, and the physical sheet is the original, how do we refer to the landscape the map portrays? When we create datasets based on map content – commonly, but problematically, called ‘extracted features’ – what is the relationship of these features to the historical landscape? Where other scholars compare facsimiles and ‘originals’, map users often conflate derived ‘extracted’ data with the reality, or the ‘ground truth’ of a past landscape. Extracted features calcify into facts on the ground extraordinarily quickly in the absence of an approach that puts the image of the map into conversation with what is being datafied. This proximity between the map image and the new digitised data – the patch – is one of MapReader’s key features. It sidesteps reifying derived data as truth statements on a map by rejecting the standard forms of geospatial data, and offers the patch format as an intentionally unfamiliar transformation of scanned maps.
The next parting of ways from other computational approaches to maps came from a decision not to build an HGIS database and user interface. An obvious choice may have been to create a geospatial database of nineteenth-century Britain based on various layers of features identified on Ordnance Survey maps. Avoiding this route was, in part, a practical response to the ongoing overheads of maintenance for such outputs, and the well-documented challenges posed when the funding and expertise for such work disappears after a project ends.[21] But there was also a positive side to this, in that we wanted to open ourselves up to a more creative, experimental approach to the information that we could see on the maps, and in a way that foregrounded the analytical process. We were inspired by the Linked Open Data approaches from Pelagios, and other projects using more flexible, non-relational data structures. While relational databases store data in multiple related tables, flat files store data in a single table. By choosing to store the outputs from MapReader as a flat file, we realised we could enable users to decide for themselves how they wish to relate their data (e.g. extracted features) to other historical geospatial data or knowledge. Therefore, a primary benefit of MapReader’s flat data structure is that it can handle varying or evolving data schemas, making it highly flexible – well-suited for initial data collection, qualitative data, or situations where the structure of the data is not yet well-defined or is expected to change rapidly.
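To make this concrete, the snippet below sketches what a flat, single-table export of patch-level results might look like in pandas; the column names and values are illustrative assumptions rather than MapReader’s exact schema.

```python
import pandas as pd

# A flat file: one row per patch, every attribute in a single table, no joins required.
# Column names and values here are illustrative, not MapReader's exact schema.
patches = pd.DataFrame([
    {"patch_id": "patch-0-0-100-100#sheet_42.png", "parent_map": "sheet_42.png",
     "published_date": 1896, "min_x": 0, "min_y": 0, "max_x": 100, "max_y": 100,
     "predicted_label": "railspace", "confidence": 0.97},
    {"patch_id": "patch-100-0-200-100#sheet_42.png", "parent_map": "sheet_42.png",
     "published_date": 1896, "min_x": 100, "min_y": 0, "max_x": 200, "max_y": 100,
     "predicted_label": "no_railspace", "confidence": 0.88},
])
patches.to_csv("patch_predictions.csv", index=False)

# Because the schema is just columns in a table, adding a new attribute later
# (say, a second label set) means appending a column, not redesigning a database.
```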
Building on these first two decisions, defining the ‘patch’ as the unit of analysis was a major shift. It introduced a form of coarse-graining that went against the highly-precise data granularity of prior digital approaches to map analysis, but still enabled a CV task that could infer the content on maps. But how precisely-located do historians really need map content to be? What kinds of research questions lend themselves to more or less precise data requirements?
These questions had not yet been asked, despite the fact that it is more computationally expensive to create more precise data, and it was not clear that such precision was useful in all cases. Previously, the assumption had been that, for cartographic documents, machine learning was simply a tool for automating whatever common tasks a GIS software user could perform. This attitude was so deep-seated that most GIScience research of the early 2000s and 2010s never considered alternatives, and the consequence was a reification and reuse of methods that create datasets that suggest locational accuracy and conform to vector data types (points, lines, polygons). Encouraging scholars to use machine learning tools as if they were merely reproducing the work that a human might do to trace features manually into a GIS is a disservice: it blurs the distinction between what a human might see and what a machine can see, but even more problematically, it prevents the human user from looking at the map with fresh eyes, and considering potential new ways of labelling map content. The scholar coming to maps as primary sources is not a ‘map user’; they have different needs when it comes to deciphering maps, and MapReader is an attempt to offer an intellectual aid for this critical, evaluative process.
By defining the patch as the essential data unit, MapReader takes away the impulse to delineate features, and instead emphasises looking at the information contained in the patch and developing a label that is conceptually useful for a given question, and visually depicted in the patch. A patch is represented as a raster image – that is, a digital image in which the cell values relate to the pixel colours of the original image – of a square region of a map. We turn the constraint of a defined raster space into an opportunity to engage with the map as a historical object proposing a particular vision of the landscape at a given point in time. The patch is a nod to the essential contributions of twentieth-century history of cartography scholars like Matthew Edney who have reshaped historians’ understanding of mapping practice and thus maps themselves.[22] Just as the map is an argument about the landscape, so too is a scholar’s label of a patch.
Finally, we made an early decision to limit MapReader’s interface (and indeed, other LwM tools) to Jupyter notebooks and for all code to be open-source from the start. Jupyter notebooks are a place for storing and sharing computer code in an environment where the user can freely combine human-readable narrative with computer-readable code.[23] But as the word ‘notebook’ suggests, it is a working space rather than a finished user-oriented product. Choosing this interface meant that we could focus on the sequence of methodologies rather than the complexities of user experience. This decision also reflected our aim to position MapReader as a tool that helps humanities scholars experience CV and machine learning in action. Rather than dropping an image into a frame on a website and seeing the output, MapReader requires users to take the journey through the code to load, annotate, train/fine-tune, and infer labels for patches in their set of maps. Looping through these steps not only helps them refine their knowledge of the maps’ visual style, and thus also revise their questions and patch labels; it also serves a pedagogical goal of learning through repetitive action. Scholars not previously familiar with CV or machine learning – or even Python – come away from MapReader with a significant boost to their skills.
Because all of the code is open, it is easy for users to peek ‘under the hood’ and for colleagues to get involved in maintenance and development, even if it is just reporting bugs. MapReader workshops thus introduce participants not only to MapReader as a research tool, but also to key digital research infrastructure and concepts like versioning, code and data repositories, licensing, and open web and geospatial frameworks and standards. All of these will set them up as informed digital scholars and librarians, for example. Borrowing many best practices in open and reproducible research from The Turing Way,[24] we set up MapReader with an open license, ensured versions of the code could be tracked, created mechanisms for acknowledging contributions to the project, and developed a guide for future contributors. Our commitment to openness – not only via the transparency of the method within the notebook interface, but also via our open code and abundant documentation – is, unfortunately, still uncommon for research software in the humanities.[25] MapReader is thus a contribution both to computational map studies and also to DH more broadly, acting as an example of reusable research software that has expanded from its initial use case on Living with Machines to a growing, international community of users and contributors.[26]
2.1 Process and Pipeline
So, what is it like to use MapReader? How exactly do users adopt a critical, humanities mindset from the start and what does this look like throughout the entire experience? It can be useful to think about the MapReader Jupyter notebooks as the formal ‘experimental’ part of a research process that has preliminary and follow-up steps. Just because we are adapting map understanding to computational methods does not mean we are leaving behind all scholarly traditions for the study of historical cartography: but we do need to ‘digitise’ some of these if we are working with hundreds or thousands of maps instead of only a few.
First, as explored in Chapter 1: ‘The Environmental Scan as a method for digital source criticism at scale’, users should spend some time getting to know their maps, their collection history, and the digital assets that are (or are not) available. Ideally, while doing this, they are also establishing a working relationship with the librarian or other person curating this collection. For example, on LwM, it was essential to work closely with Chris Fleet, the map curator at the NLS, who has shepherded through the metadata creation, digitisation, and georeferencing of the NLS’s Ordnance Survey collections. Fleet, on behalf of the NLS, was able to provide access to sheet-level metadata and imagery that was much higher quality than some other, older digitised versions of OS maps available from a national data service in the UK – as well as expertise on the maps themselves. Once we had the metadata, we were able to begin exploring the relationship between what had been scanned, what had been georeferenced, and what we wanted to research. Figures 2.1, 2.2 and 2.3, for example, are demonstrations of the metadata visualisation tools developed by Olivia Vane for LwM.[27] They were essential to building up our knowledge of this complex digitised corpus, and informed work to digitise additional sheets that the NLS had not been able to scan. On LwM, we made the decision to work with NLS sheets that were already georeferenced (i.e. the pixels of the scanned images have been translated to geospatial coordinates after manually placing ground control points at the four corners of each sheet, and shared via tileservers as XYZ tiles[28]), but it is equally possible to work with a collection of maps that has not yet been georeferenced. With maps and metadata in hand, it is possible to tie them to a rough research question that can be iteratively boiled down to a concept, or phenomenon, that has a visual signal found in the maps.
Once these preliminary steps have been completed, running MapReader in a Jupyter notebook usually has seven steps (see Figure 2.4).[30]
- MapReader accepts map images as input in a few formats. If you are using maps that are available via a tileserver (like the NLS OS maps, or some composite maps from the David Rumsey collection), then you begin with the ‘download’ module. This process uses required fields in the metadata to define the geographical extents and zoom levels for the (physical) sheets you want to download as separate images from the ‘stitched-together’ tiles.
- Alternatively, if you are loading maps from online collections using the International Image Interoperability Framework (IIIF),[31] or locally from your local computer storage (or other storage), you begin with the ‘load’ module. Loading maps and metadata is the first basic step in setting up MapReader.
- Next, you ‘patchify’ your maps. This transforms your ‘parent’ image (of an entire map sheet) into patches. After the selection of your maps, choosing the size of the patch is the next intellectual decision you make. This is also the first place where the flexibility of MapReader comes into play: it is easy and relatively quick (depending on the size of your corpus) to ‘re-patchify’ if you decide you want to work with a different size. (It can be handy to experiment with only a small sample of sheets initially to determine the best patch size for your research, so that changing patch sizes is quicker and more efficient. A minimal sketch of what patchifying involves appears after this list.)
- Once you have patches, you set up your annotation task. Annotating patches is the process of assigning a label to a sample of patches. In Figure 2.5, the buttons at the top left contain the two user-selected labels, while the ‘back’ and ‘next’ buttons facilitate correcting errors or skipping edge cases. The annotation interface allows users to resize the image, order patches according to a range of different metadata, and also choose whether to include ‘context’ (e.g. a buffer of one extra patch around the patch in question). These labelled patches are then used for training, validation, and evaluation data. Your choice of labels is the third significant intellectual choice: mapping the concepts of interest in your research question onto the visual information on the map is not always easy. You can choose a binary yes/no-style classification task (e.g. Label 1 might be “Concept A”, and Label 2 might be “Not Concept A”), or a multi-class task (Concept A, Concept B, Concept A & B, Not Concept A or B). You must annotate enough patches to have sufficient examples of each label to be able to divide the patches into three sets (or ‘splits’) for fine-tuning, testing, and validating your model.
- Next, using the training split of your annotated patches, you will fine-tune an image classification model. Poor results often mean that additional annotations are required.
- Once you are happy with the performance of your model, you can use it to predict, or infer, the label for every patch in your set of maps. Inference results include confidence scores, so you can review low-confidence inferences to see if your model requires more examples of certain kinds of labels.
- At the end of the prediction module, you can export your dataset as structured data (CSV, JSON). The structured data exports always contain the pixel and/or geospatial coordinates of the patch boundaries, or you can choose to export only the coordinates of the patch centroid (i.e. the geometric centre point of a patch, see Figure 2.6).
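As a minimal sketch of the patchify step referred to above – a generic illustration rather than MapReader’s own implementation, with placeholder file names and an illustrative patch ID format – the following cuts a scanned sheet into square patches and records their pixel coordinates:

```python
from pathlib import Path
from PIL import Image

def patchify(sheet_path: str, patch_size: int = 100, out_dir: str = "patches"):
    """Cut a scanned map sheet into square patches of patch_size x patch_size pixels."""
    Path(out_dir).mkdir(exist_ok=True)
    sheet = Image.open(sheet_path)
    width, height = sheet.size
    records = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Pixel bounding box of this patch within the parent sheet.
            box = (left, top, min(left + patch_size, width), min(top + patch_size, height))
            patch = sheet.crop(box)
            patch_id = f"patch-{box[0]}-{box[1]}-{box[2]}-{box[3]}#{Path(sheet_path).name}"
            patch.save(Path(out_dir) / f"{patch_id}.png")
            records.append({"patch_id": patch_id, "parent": Path(sheet_path).name, "bbox": box})
    return records

# Usage (placeholder file name): patchify("sheet_42.png", patch_size=100)
```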
Across all of these steps, a number of new files are created, either images or dataframes. If you downloaded data from a tileserver or IIIF, you will have local images and metadata for your ‘parent’ map sheets stored in file directories of your choice. Likewise, patch images and metadata are stored in a separate directory, each with a unique ID. The patch-level metadata file can contain key elements of the parent map metadata, such as the publication date, the URL of the original resource, or the map title. Once manual annotations have been made, an annotation metadata file is created that associates a patch’s unique ID with an annotation label. Finally, after fine-tuning one or more models, you can store the details of your saved models for future use (or to share via a platform like Hugging Face, as we have). The inferred data created by the model is your final output: this is effectively the patch metadata dataframe with new columns for the inferred label and the model’s confidence (out of 100%).[32]
Once you have exported your inference results, it is possible to perform any number of post-processing steps for evaluation and analysis. For example, you could calculate the surface area of your patches across the nation, and compare how your concept changes over time. Or, as we have done, you can explore what text intersects with certain groups of patches.[33]
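As a hedged sketch of that kind of post-processing – assuming the flat CSV export illustrated earlier, with invented column names such as published_date – one might aggregate inferred labels by publication year to trace a concept’s footprint over time:

```python
import pandas as pd

# Assumes the illustrative flat export sketched earlier; column names are not fixed.
patches = pd.read_csv("patch_predictions.csv")

# Keep only confident predictions, then count patches per label per publication year.
confident = patches[patches["confidence"] >= 0.8]
by_year = (
    confident.groupby(["published_date", "predicted_label"])
    .size()
    .unstack(fill_value=0)
)

# If each patch covers roughly 100 m x 100 m on the ground, a count of patches
# converts directly to hectares (1 patch is about 1 ha), giving a surface-area estimate.
print(by_year)
```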
3. Software communities and expanding the pipeline
Beyond all of the domain-specific considerations described above, MapReader was created by an interdisciplinary and experimental team who seized the opportunity to build a piece of research software that could in fact address the needs of many scholars confronting CV for the first time, not just people exploring maps. The SciVision project at the Turing Institute was the first to see the potential of applying MapReader to new imagery: in this case, inferring phenotype data in whole plant images (of Brassica napus).[34] MapReader’s railspace models were fine-tuned to identify branches, leaves, flowers, flower buds, and pods, with robust results. Evangeline Corcoran and her co-authors highlighted MapReader’s ‘versatility…for automated analysis of images from a vast variety of scientific and humanities disciplines, but also in terms of allowing great flexibility in the modelling approaches the MapReader framework can facilitate.’[35] In short, the critical framework for using patches as a consistent, flexible, and conceptually-agnostic unit of analysis turns out to be appealing in a range of disciplinary contexts.
Since 2024, MapReader also incorporates a second CV task: text spotting. Text spotting started, and is still usually implemented, as a composite task, combining a detection step and a recognition step, using two different models. The first model detects the bounding box around a text object on the image; the second model recognises the characters of the text within the box. At the outset of LwM, we were immediately drawn to the possibility of analysing maptext, both on its own, and in relation to other visual map content. We were aware of the complexities around ‘OCR for maps’, and experimented with existing tools like Strabo.[36] Choosing eventually to focus on the visual ‘patch’ classification element within LwM, McDonough won joint NEH & AHRC funding for the Machines Reading Maps (MRM) project, which she co-led with Strabo-developer Yao-Yi Chiang and Deborah Holmes-Wong, where we set up the foundations for working with maptext as research data and for improving discovery in online map collections.[37] Working with partners at the Library of Congress, the British Library, and the National Library of Scotland, the transatlantic MRM team not only created several new open-access datasets of maptext using the newly developed mapKurator pipeline and models, but also developed tools and methods for manually annotating maptext (with a bespoke version of the Recogito annotation tool).[38] MRM culminated with a further collaboration with the David Rumsey Historical Map Collection at Stanford Libraries, processing the text on 57,000 georeferenced maps and making this text searchable and editable in the davidrumsey.com catalogue. Finally, at the conclusion of MRM, we implemented a new text spotting task in MapReader, to make this more accessible to users who were not able to access the earlier MRM text spotting tool. The text spotting pipeline also works from patches, but reconstructs words that sit across patch boundaries in a post-processing step (see Figure 2.7). With both text spotting and the patch classification pipelines, MapReader offers the only open-source, reproducible, multimodal approach to computational map studies.
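The idea of reconstructing words that straddle patch boundaries can be illustrated in simplified form: translate each detection’s bounding box from patch coordinates into parent-sheet coordinates, then merge boxes that touch or overlap. This is a conceptual sketch with toy values, not MapReader’s own post-processing code.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Toy detections: (patch x-offset, patch y-offset, local bbox, recognised text fragment).
detections = [
    (0,   0, (60, 40, 100, 60), "Bir"),       # right edge of one patch
    (100, 0, (0, 40, 45, 60),   "mingham"),   # left edge of the neighbouring patch
]

# 1. Shift each local box into the coordinate space of the parent sheet.
shifted = [
    (box(ox + x0, oy + y0, ox + x1, oy + y1), text)
    for ox, oy, (x0, y0, x1, y1), text in detections
]

# 2. Merge boxes that touch or overlap across the former patch boundary,
#    and concatenate their text fragments (assumed already in reading order).
merged_geom = unary_union([g for g, _ in shifted])
merged_text = "".join(text for _, text in shifted)
print(merged_text, merged_geom.bounds)  # "Birmingham" with a single bounding box
```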
A number of teams are using MapReader’s text spotting capabilities to find and analyse text on early modern and modern maps in Dutch, French, English, and other languages.[39] Our own work is focused on examining how ordering, structuring, or otherwise organising maptext can shed light on both cartographic practice, and historical built and natural environments. Taking the next logical step from the ‘Beyond the Tracks’ chapter work to investigate the social impacts of railway infrastructure in British communities (Chapter 3), we have used text intersecting with railspace patches to differentiate between different types of railspace.[40] As one of the first studies that uses maptext not simply to explore the history of place naming, this work sets a precedent for a truly multimodal and historical exploration of map content.
MapReader emerged from the rich collaborative environment cultivated in LwM, and we continue to embrace interdisciplinary collaboration as a key to sustaining MapReader longer term. During the Data/Culture project which followed on from LwM at the Turing (2023-25), MapReader was selected as a featured research software package around which a programme of community building activities was designed. From community calls to a series of in-person and online workshops, this period allowed us to teach new colleagues how to use MapReader, and to learn about their interests in computational map studies. Workshops at the Turing in London, as well as in Lancaster (UK), Paris, Richmond (Virginia), and at DH2025 have thus both expanded our user community and shaped recent development. Our focus in 2024-25 on implementing IIIF resources as input, for example, has emerged from frequent requests from scholars and librarians for this feature, as well as the emerging ecosystem of open-source infrastructure for maps and geospatial data with IIIF at its heart. Thanks to more recent support from the Software Sustainability Institute, for example, McDonough has been able to collaborate with the Allmaps team to link up their open-source, browser-based georeferencing tools for IIIF map collections with MapReader.
Collaborating on software is a different kind of work from the collaboration that most humanities scholars are familiar with, for example, when co-writing an article, or putting together a conference. For the humanities-trained colleagues on the MapReader team, every stage required learning about best practices in open, reproducible software development. For our data science and software engineering colleagues, there was likewise continuous learning about library and archive collection curation and digitisation biases, and meaningful concepts and research questions. A frequent refrain across LwM really resonated for MapReader in particular: never assume that you and your colleague ‘know’ something in the same way, or that you have the same assumptions about what ‘significant’ claims look like. From its initial design, to its basic functionality to answer an initial research question, to maintenance considerations and training opportunities, we were always learning from each other, and the nature of this collaboration means that MapReader reflects the ambitions both of the humanities – to implement methods that respect our sources and allow us to make meaningful arguments – and of software engineering – to build tools that can be reused and maintained easily by many.
Even as it drew on expertise from many fields, MapReader was deeply embedded in humanistic epistemologies and research processes. Laying out the path of MapReader to date, we have focused on how building software like MapReader is actually a humanities endeavour. But what can our approach to reading and viewing maps do in historical research? How is this different from other ways of looking at maps and why does that matter?
4. Using MapReader: an epistemological shift for historical research
Patches offer a new frame through which to view historical research questions. On LwM we were interested in the evolution of rail infrastructure and the way that it changed the lived environment in Britain in the nineteenth century (which led ultimately to the work outlined in Chapter 3. ‘Beyond the Tracks: Re-connecting people, places and stations in the history of late-Victorian railways’). GIS-based approaches invite us to conceive of railways as a network of lines.[41] By comparison, the framework of the patch allowed us to focus on railway infrastructure embedded within the landscape. For others, patches offer a non-traditional way of defining the unit of analysis of map content. At scale - across thousands of maps - these two types of data usefully constrain the viewer, and offer an opportunity to focus on original interpretation of what the map depicts. The patch, in particular, is a tool that by its very design pushes the user to see the map as a historian. Digitising map contents using traditional GIS tools steers users to adopt the tools of cartography to interpret maps. MapReader is an opportunity to break out of this methodological box.
Working with one map is not the same as working with 10,000 maps. But we do need to think about how to translate the lessons from the history of cartography to working with maps as data at scale. MapReader is just one of these ways of working with maps at scale, perhaps lighting a way through the mists of machine learning. Thinking not just about the expanse of the sheet, but about the patch as the point of departure is a fundamental shift in ‘map reading.’ This entails a different set of intellectual priors and outcomes, just as the computational pipeline shifts the material conditions of map reading from the maps themselves to input from a tileserver and outputs of CSV or JSON files. In so doing, it precipitates a significant epistemological shift, which we can perhaps more fully grasp if we think of it as an intellectual pipeline that corresponds to the technical pipeline that constitutes the MapReader tool.
In the twentieth century, two schools of thought emerged around map understanding. First, geographers – usually writing as self-declared cartographers, or scholars of modern cartography – had been concerned with best practices in making maps, which was often linked to the question of how effectively to use a map. Philip Muehrcke began to unpick the assumption that map making and map use practices were mirrors of one another when he wrote that ‘map use is not the simple reverse of map making.’[42] He knew that students and colleagues needed a guide to use a map, not just make one. The second school of thought comprised Muehrcke’s Geography colleagues at the University of Wisconsin and their students. This group are often lauded as the first historical geographers to clear the next hurdle: from seeing maps as containers of useful, objective information, to examining them as historical documents with a variety of biases.[43] These historical geographers, writing as scholars of maps, developed a practice of interpreting maps not as users, but as historians.
In parallel, we must think of the impact of the automation of machine cartography and the emergence of digital cartography. Following the move from command-line tools to desktop applications in the 1990s, and then to platform-based tools in the early 2000s, mapping software like ArcGIS and QGIS quickly became ubiquitous in the academic and commercial world, where it was used not simply to make maps, but principally to analyse spatial data. However, the way that this software was designed by and for cartographers necessarily limits the possibilities of interpretation – essentially undoing the work that Muehrcke’s colleagues had done to tease apart cartographic utility and post hoc interpretation. This is particularly strange when you consider that making maps, using maps (in everyday life or work), and understanding maps (from the past) are all highly distinct practices. Yet scholars working on those very different processes in digital settings are still almost all using the same software. In other words, since the emergence of digital mapping, the capacity for doing critical humanities research using historical maps has been conceptually restricted to the ways that cartographic design wants users to engage with maps.
MapReader is designed especially to address the need to understand maps. It develops an approach to step outside the bounds of the data structures and tools that link map design and spatial analysis, offering an alternative to the software that was designed to make maps and analyse contemporary spatial data. MapReader enables users to regard the map as an argument, or a proposition, as we have become accustomed to do in non-computational map interpretation,[44] and to avoid the pitfalls of computational map processing that treats the map as a mirror of the landscape. For MapReader, patches and text are the alternatives to seeing ‘features’ as the components that make up a map.[45] Instead of expecting to find discrete features, patches open up a space (literally) to reconceptualise what we see on the map, and what we call it. Expecting objectivity from a map suggests that there are objects contained within that can be found and extracted. By changing our expectations and asserting a critical stance vis-à-vis the map, we no longer assume that a map can be teased apart into wholly distinct objects or features, nor that what we are doing is removing data from its context (as suggested by the dominant language of data ‘extraction’).[46]
Working with patches as a unit of analysis changes the way one asks and refines research questions, reconfigures what phenomena are of interest on a map, and shapes interpretation. To understand how this works, we might compare the epistemological process to discussions in computational literary studies in ‘algorithmic criticism’. The term was coined by Stephen Ramsay, who writes that ‘if text analysis is to participate in literary critical endeavor in some manner beyond fact-checking, it must endeavor to assist the critic in the unfolding of interpretative possibilities’.[47] Replace ‘text’ with ‘map’, ‘literary critical’ with ‘historical’, and ‘critic’ with ‘historian’, and we have a good basis for what MapReader does with patches. MapReader truly enables reading of patches, and therefore interpretation, because we are not pre-defining the object that is present (on the map, but also, by extension, in the historical world), nor are we assuming that a patch can only be classed in one way. This ‘deformation’ of the map is what sets MapReader apart from other map data processing pipelines that extract vector data or segment sheets into standardised classes. A patch is an open space; its label can be whatever suits the scholar, as long as there is a relationship between that concept and the visual information within the bounds of the patch. There are as many ways to read a map as there are questions about maps. And, as Ramsay argues, we can embrace the ‘rigid, inexorable, uncompromising logic of algorithmic transformation as the constraint under which critical vision may flourish.’[48]
Furthermore, for MapReader, modelling maps as sets of patches is a way to avoid the error of assuming that the map represents the historical landscape. Mark Algee-Hewitt makes a similar pitch for literary studies when he suggests that we must ‘let go of the idea that we know our artifacts well.’[49] Instead, as he argues, ‘by complicating our understanding of a phenomenon like authorship through a computational model, we give ourselves the opportunity to revise our understanding of the underlying phenomena outside the constraints of habituation and practice such that we can radically alter our understanding of the object itself.’[50] In terms of maps, the opportunity for revision comes with letting go of the idea of discrete map features as the model for the digital artifact, or indeed as a digital placeholder for anything ‘real’. With MapReader the transformation of the map into patches creates an unexpected object, one that we are not trained to analyse. The positive consequence of this is that it reconfigures what it means to look at a map.
This new shape, or object of scrutiny, will not meet the needs of every historian, every map, or every question. MapReader should be viewed as a critical intervention offering an alternative to the assumptions inherent to the design of GIS software and industry-standard formats; but, it can also operate as a complement to these where appropriate. Our work in Chapter 3 is a case in point: much of the downstream analysis of the ‘converged’ railspace and street-level census data used QGIS as well as Pythonic geospatial libraries like geopandas. MapReader offers an experimental space in which to test, in an iterative, and visual-first environment, just how much precision one needs to address the issues of interest. This is possible through choosing the size of the patch (in pixels or metres), as well as the labels.
We can explore these principles and practices through the example of LwM’s railspace concept. In previous digital explorations of the history of rail, the focus has been on mapping and reusing vector data for railway tracks, often to answer socio-economic history questions.[51] In contrast, our question was about the lived experience of rail at the height of its activity in Britain. Historians had been aware, through neighborhood, town, or city-based case studies, that the arrival of rail had increased class-based social segregation in British communities. But was this a uniform process nationally, across urban, suburban, and rural areas? And what of the distinction between living near a station versus living in close proximity to major railway works? In short, we sought to investigate what it was like to live near (or far from) rail both as a mode of transport (e.g. measured by distance from a passenger station) and also as a major infrastructural presence throughout the landscape. We arrived at this question as a group of social and cultural historians interested in the interplay between politics, the built environment, and technology, after intensively studying which OS maps were accessible and recognising what seemed like a timely opportunity to connect historical microcensus data to questions about residential class segregation. MapReader’s patches reinforce that we are not extracting tracks from maps as vector data, but locating the footprint of railway infrastructure among its surroundings, which we called railspace.
Railspace is only visible as patches, not in line data. The concept emerged from the constraints of this digital transformation of the map sheet, and thus our research question was refined as we adapted to working with patches. Unlike vector data, these 100 metre by 100 metre bounding boxes around segments of the map retain a connection to the map image, and repeatedly remind the viewer that rail infrastructure sits within a complex built and natural environment. Railspace becomes the set of patches that an image classification model can identify as sharing certain signals, as identified during the annotation of training data. This training data was generated by the historians on the project annotating patches in which they observed features they identified as ‘rail’, but the way that the model works – by ‘noticing’ patterns across large-scale data – means that it learns not only how rail is represented visually on maps, but also the wider contexts in which it habitually appears. The method thereby returns attention to context in two senses, both on the part of the model, but also on the part of the scholar, who is encouraged by the nature of the annotation task to scan the entire patch, taking in that wider context. As such, this highly computational pipeline in fact enables a way of looking at maps that is more similar to the ways historians traditionally looked at maps, before GIS reduced objects to lines and points.
Far from being an unthinking mechanical activity, therefore, the patch annotation task is essential to the intellectual process of concept definition. This is supported by the iterative nature of the annotation module. By this we mean that it is possible, indeed encouraged, to use this step of the pipeline to experiment with the name of your label, the guidelines you set for what the visual requirements are for a given label, whether or not your set of labels is sufficient for your needs, and, ultimately, whether your research questions also need revising. For scholars still getting to know their maps well, annotation offers a chance to look, patch by patch, across the corpus. In this way, annotation eventually consolidates the scholar’s conceptualisation of their labels, while also identifying edge cases (the patches that one struggles to label definitively), which may be historically significant. For LwM, annotating railspace patches first collectively and then individually highlighted another affordance of this approach: annotations with a team can be a productive way to debate conceptual definitions, while one person’s annotations reflect their unique interpretation of labels. Options for labeling practices can thus introduce opportunities for establishing shared understandings of novel concepts, or enable wholly personal ‘views’.[52] This is a major contribution: MapReader enables fast, reproducible, bespoke classifications of map content through patches, facilitating historical arguments that may be (as historical claims often are) attached to one scholar, or to many.
As a tool, MapReader facilitates refining questions, concepts, and modeling big historical data in a way that challenges received notions of how to interpret maps. Like the scholarly trip to the archive, time spent with maps in MapReader is part of an active research process. It is also a methodological nudge for scholars to be critical users of digital map collections. We know not to use a physical map as an objective representation, and MapReader’s design offers guardrails to avoid this, and enable new readings.
But how does the output of the pipeline correspond to what a historian might regard as an outcome? MapReader patches whose labels are inferred by a model are, like their parent maps, a proposition based on the labeling decisions of the user. In other words, outputs are not the answer. Just like the notes a historian might take about the documents they are reading, MapReader’s outputs constitute an interpretative layer, or ‘screen’ superimposed on the map.[53] They also resemble a historian's notes in the way they move the scholar one step closer to a domain-specific claim. The inferred label orients the scholar’s attention towards that claim, but the distance between the (potentially) millions of labelled patches and an argument that is recognisable to historians, digital or not, remains to be covered.[54] For our railspace work, we refined our understanding of the presence of 30.5 million railspace patches by calculating their proximity to hundreds of thousands of residential streets in England, Wales, and Scotland: this gave us something to measure that linked people to the spatial footprint of rail infrastructure. From there, we could analyse different groups based on railspace proximity, census attributes, and prior knowledge from earlier historiography. Crafting this bridge from nationwide patch data to fictional and real-life examples of the experience of living on specific streets heavily impacted by railspace thus combined our domain expertise with the model outputs.
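A minimal sketch of that kind of proximity calculation, assuming two illustrative input files (railspace patch centroids and street centrelines, both projected to a metric CRS such as the British National Grid) and hypothetical column names, might look like this in geopandas:

```python
import geopandas as gpd
import pandas as pd

# Illustrative inputs: file names, columns, and CRS choices are assumptions.
streets = gpd.read_file("streets.gpkg").to_crs(epsg=27700)                # street centrelines
railspace = gpd.read_file("railspace_centroids.gpkg").to_crs(epsg=27700)  # patch centroids

# Distance (in metres, given the projected CRS) from each street to its nearest railspace patch.
joined = gpd.sjoin_nearest(streets, railspace, distance_col="dist_to_railspace_m")

# Band the streets by proximity for comparison with census attributes downstream.
joined["band"] = pd.cut(
    joined["dist_to_railspace_m"],
    bins=[0, 100, 500, 2000, float("inf")],
    labels=["adjacent", "near", "moderate", "far"],
    include_lowest=True,
)
print(joined.groupby("band", observed=True).size())
```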
Because they are inferences, these automatically labelled patches are imperfect. Machines make mistakes, but these are based on patterns in the data as seen by the model.[55] While our railspace experiments had extremely strong performance when evaluated against expert-annotated patches (F1 scores between 95% and 99% across the different labels), some errors will nevertheless persist.[56] Another way that MapReader is an iterative process is that model fine-tuning will often surface systematic false positives or false negatives. It was essential to view outputs and scan the regions for patterns in these errors. For railspace, false positives included hatched rock outcroppings along coasts and in the Highlands, as well as drainage ditches in East Anglia. False negatives were usually in hyper-urban areas with lots of other infrastructure, urban trams (which we had decided to exclude from our railspace category), and rail tunnels (which we accepted as non-railspace in the end). These discoveries necessitate further annotation, either random or targeted. On the other hand, these mistakes can be fruitful, pointing to cartographic practices that the user might not yet have noticed, or to underappreciated variations in the concept of interest.
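For readers less familiar with these metrics, a small illustrative check of inferred labels against a held-back set of expert annotations – hypothetical values, using scikit-learn – might look like this:

```python
from sklearn.metrics import f1_score, confusion_matrix

# Hypothetical held-back expert annotations vs. model inferences for the same patches.
expert   = ["railspace", "no_railspace", "railspace", "no_railspace", "no_railspace"]
inferred = ["railspace", "no_railspace", "no_railspace", "no_railspace", "railspace"]

# Per-label F1 scores: the kind of metric used to report model performance above.
print(f1_score(expert, inferred, average=None, labels=["railspace", "no_railspace"]))

# The confusion matrix separates false positives from false negatives,
# which is where systematic errors (rock outcrops, drainage ditches) show up.
print(confusion_matrix(expert, inferred, labels=["railspace", "no_railspace"]))
```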
Could we have made the same arguments about residential segregation without the railspace dataset? No: first, railspace emerged from our engagement with MapReader; second, there is no other way we could simultaneously explore the hyper-local, street-level, city, county, and national data. The continuity of OS data is unique in providing extremely close-up views of a rapidly changing landscape, but we would never have been able to undertake a (multi-)national-scale project without the speed and efficiency of MapReader. Where the historian chooses to take things from here is a separate matter. As Chapter 3 shows, working with nationwide map data surveyed at a large scale allows us to observe spatial patterns at multiple levels. We can make macro-level arguments (e.g. observations at the level of the nation about how people of different socioeconomic classes were situated in relation to the amenities and disamenities of rail), meso-level arguments (e.g. observing regional trends or deviations), as well as using inferences to direct us to localised case studies and further deep archival work. As such, the distant view enabled by MapReader is highly supportive of close reading and the traditional archival practices of historical research.
5. Paths forward
While MapReader has been around now for a few years, in many ways it still feels like the beginning. Wood has been working hard to make the code sustainable; Westerling ramped up efforts on a range of documentation; Wood and Lilley have been implementing MapReader in a range of large-scale computing environments in the UK;[57] and Wilson, Beelen, and McDonough have been learning to work with maptext as research data, and to compare change over time across OS editions. The workshops and community calls organised by McDonough and Wood have cultivated a broad, interdisciplinary group of users and even some early contributors to our code and documentation, such as David Alexander (from the Peak District National Park Authority). Collaborations with peer open-source software for browser-based georeferencing (Allmaps) and manual annotation (in collaboration with Rainer Simon), partnerships with a range of map libraries, and emerging digital research infrastructure support for humanities software in the UK are all shaping MapReader’s future roadmap.[58] All this goes to show that building sustainable software is a community endeavour.
Historical maps are important, not just for historians and geographers, but for anyone who seeks to understand the built and natural environments of the past. MapReader helps people use these digitised collections responsibly and creatively. While we have showcased the LwM work with OS maps here, projects around the world are using MapReader on other series maps (such as the Sanborn fire insurance maps in the US) and non-serial collections that are much more heterogeneous in style (for example, the GLOBALISE project’s analysis of all Dutch East India Company maps).[59] MapReader remains extremely well-suited to large-scale topographical or urban maps which were printed as sets of hundreds or thousands of sheets representing continuous or quasi-continuous land. But others have experimented with MapReader on very small collections of maps, making use of the pipeline only up to the point of annotation, and simply annotating all patches in their dataset. This additional flexibility means that MapReader is not only an AI tool: it can be adapted for wholly manual annotation tasks where patches are a useful framework. These capabilities will open doors for historical map interpretation across the disciplines.
We believe that this is an important contribution to the wider developments starting to emerge in computational map studies. The team at EPFL working on the Venice Time Machine and other map collections, the Mapping Inequality team’s work with text on Sanborn maps, and the organisers of the MapText competition (as part of ICDAR’s annual set of competitions to benchmark and improve models for specific scientific tasks) are all making fascinating contributions to methods and data and are enhancing the role of maps in historical and cross-disciplinary research.[60]
Thinking more broadly about the epistemological underpinnings of MapReader, it would be shortsighted to ignore the further possibilities of using its pipeline with non-historical map imagery. Within the MapReader team, Lilley has experimented with patchifying and annotating LiDAR imagery (derived from laser scanning that models the Earth’s surface in 3D), and we are now exploring a range of other remote sensing data to compare change over time in environmental features, such as likely historical hedgerow locations in England. As we learned from the collaboration with SciVision at the Turing Institute, MapReader applications are not always even geospatial, and this openness to experimental applications in biology, for example, has helped us to better understand what models ‘see’, and how there can be a two-way dialogue between the natural sciences and the humanities.
MapReader began life as an experimental opportunity on LwM, and this experimental mindset has continued to shape our ethos since the end of that project. Directed by McDonough, the team’s aspirations are not just to provide open-source, reproducible code from our own LwM research, but to demonstrate and train others to shift how they think about what we can do with big image data in the humanities. Embroiled in a host of ethical, environmental, and intellectual debates, AI writ large is a daunting challenge of our age. MapReader applies non-generative classification, detection, and recognition models, and we have intentionally limited MapReader to these tasks for the time being. We could not do this work without the assistance of inference, but for both environmental and intellectual reasons we resist the temptation to always apply the latest state-of-the-art generative AI methods. MapReader’s patch classification task, in particular, may be positively naïve from a computer vision perspective, but this simplicity is its virtue. By relying only on the human annotation of patches for training, rather than on labels derived from generative AI, we are able to understand, refine and test our classifications - and thereby begin to open the black box. Algee-Hewitt has written of the computational humanities as a chance to ‘use the methods and strategies of computation to put pressure on our received theoretical and historical objects and [to] uncover alternative ways of thinking.’[61] Patch classification does just this. Detecting maptext also (perhaps unexpectedly) uncovers an entire world of largely ignored cartographic detail. Both tasks require inference to be applied at scale, but in doing so we are indeed using AI to see our sources anew, and to centre historical expertise in data creation and interpretation.
To accomplish all of this, we depended on the radical collaboration at the heart of LwM.[62] Building MapReader, using it for novel research, and teaching others how to use it has been an exercise in reconceiving what it means to be a historian, a digital humanist, or a data scientist. Writing GitHub tickets, drafting and reviewing code, annotating patches, emailing librarians, meeting on Zoom to debate label guidelines, meeting again on Zoom to talk about development priorities, co-writing articles: there is no better evidence than these shared activities of just how much MapReader is the product of the intensive exchange of ideas over years. It may just be a pile of code, JSON files, and GeoTIFFs, but MapReader has shaped our scholarly and professional trajectories for the better, and we hope it will continue to have a positive impact in DH, History, and beyond for many years to come.
Notes
McDonough planned and wrote the chapter. Ahnert collaborated on chapter scoping, writing and editing; Lawrence contributed to editing; and all authors contributed to MapReader’s conceptualisation, shared reflections used throughout the chapter, and reviewed the final version. After the first author, names are alphabetical. ↑
Key early examples of work applying these methods include Joanna E. Taylor and Ian N. Gregory, Deep Mapping the Literary Lake District: A Geographical Text Analysis (Lewisburg: Bucknell University Press, 2022); Patricia Murrieta-Flores and Ian N. Gregory, ‘Further Frontiers in GIS: Extending Spatial Analysis to Textual Sources in Archaeology,’ Open Archaeology 1, no. 1 (2015): 166-75; Jim Clifford, Beatrice Alex, Colin M. Coates, Ewan Klein, and Andrew Watson, ‘Geoparsing History: Locating Commodities in Ten Million Pages of Nineteenth-Century Sources,’ Historical Methods: A Journal of Quantitative and Interdisciplinary History 49, no. 3 (2016): 115–31; Catherine Porter, Paul Atkinson, and Ian N. Gregory, ‘Space and Time in 100 Million Words: Health and Disease in a Nineteenth-Century Newspaper,’ International Journal of Humanities and Arts Computing 12, no. 2 (2018): 196–216. ↑
Like scale, the lack of methods, infrastructure, underlying data, and models for non-English and low-resource languages also impedes applications in many fields. Earlier work by the LwM team sought to extend geoparsing (largely only applied to English text corpora) to early modern French texts. See Katherine McDonough, Ludovic Moncla, and Matje van de Camp, ‘Named Entity Recognition goes to Old Regime France,’ International Journal of Geographical Information Science 33, no. 12 (2019): 2498–522. ↑
Katherine McDonough and Valeria Vitale, ‘Introduction’, in ‘Maps and Machines: Recent Perspectives on Humanities Research Using AI and Maps’, ed. Katherine McDonough and Valeria Vitale, forum, Imago Mundi 76, no. 2 (2024): 282. ↑
Scholars have addressed this in terms of text, of course, but also for the page as an image. For the latter, see James Dobson and Scott Sanders, ‘Distant Approaches to the Printed Page’, Digital Studies / Le Champ Numérique 12, no. 1 (2022); Andrew Piper, Chad Wellmon, and Mohamed Cheriet, ‘The Page Image: Towards a Visual History of Digital Documents,’ Book History 23, no. 1 (2020): 365–97. ↑
Kasra Hosseini, Katherine McDonough, Daniel van Strien, Olivia Vane, and Daniel C.S. Wilson, ‘Maps of a Nation? The Digitized Ordnance Survey for New Historical Research’, Journal of Victorian Culture 26, no. 2 (2021): 284–299; Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, and Katherine McDonough, ‘MapReader: a computer vision pipeline for the semantic exploration of maps at scale’, Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities (2022), pp. 8–19; Rosie Wood, Kasra Hosseini, Kalle Westerling, Andrew Smith, Kaspar Beelen, Daniel C.S. Wilson, and Katherine McDonough, ‘MapReader: Open software for the visual analysis of maps,’ Journal of Open Source Software 9, no. 101 (2024): 6434. ↑
Richard Oliver, The Ordnance Survey in the nineteenth century: Maps, money, and the growth of government (London: The Charles Close Society, 2014), p. 505. ↑
For earlier reviews of such work see Anne Kelly Knowles, ‘Emerging Trends in Historical GIS,’ Historical Geography 33 (2005): 7-13. ↑
For classic early examples of using GIS for historical research (published by the company that sells ArcGIS), see Placing History: How Maps, Spatial Data, and GIS Are Changing Historical Scholarship, ed. Anne Kelly Knowles (Redlands: ESRI, 2008). ↑
See, for example, Valeria Vitale, Pau de Soto, Rainer Simon, Elton Barker, Leif Isaksen, and Rebecca Kahn, ‘Pelagios – Connecting Histories of Place. Part I: Methods and Tools’, International Journal of Humanities and Arts Computing 15, nos. 1–2 (2021): 5–32; Rombert Stapel and Ivo Zandhuis, ‘Linked Data for modelling and replicating the knowledge production process in data-driven humanities research’, Digital Scholarship in the Humanities 40, supp. 1 (2025): i100–i107; and Niamh NicGhabhann Coleman, Zenobie Garrett, and Frances Kane, ‘“That’s a Powerful Map”: Shared Authority, Public Engagement, and the Archives of the First Ordnance Survey of Ireland’, Journal of Historical Geography 88 (June 2025): 65–73. ↑
For further discussion of the project’s collaborative practice, see Ruth Ahnert, Emma Griffin, Mia Ridge, and Giorgia Tolfo, Collaborative Historical Research in the Age of Big Data: Lessons from an Interdisciplinary Project (Cambridge: Cambridge University Press, 2023). ↑
On the shift to multimodal analysis in DH see Thomas Smits, Melvin Wevers, ‘A multimodal turn in Digital Humanities: Using contrastive machine learning models to explore, enrich, and analyze digital visual historical collections,’ Digital Scholarship in the Humanities 38, no. 3 (2023): 1267-1280; and Taylor Arnold and Lauren Tilton, Distant Viewing (Cambridge, MA: MIT Press, 2023). ↑
Katherine McDonough, ‘Maps as Data,’ in Computational Humanities, ed. by Lauren Tilton, David Mimno, and Jessica Marie Johnson (Minneapolis: Minnesota University Press, 2024), 99-126. ↑
See, for example, Matthew Lincoln, Julia Corrin, Emily Davis, and Scott B. Weingart, ‘CAMPI: computer-aided metadata generation for photo archives initiative’ (2020); Taylor Arnold and Lauren Tilton, ‘Distant Viewing: Analyzing Large Visual Corpora,’ Digital Scholarship in the Humanities 34, Supplement 1 (2019): i3–16. More recently: Leonardo Impett and Fabian Offert, ‘There Is a Digital Art History,’ Visual Resources 38, no. 2 (2022): 186–209 and Arnold and Tilton, Distant Viewing (2023). As an exception, see the wide-ranging studies in Kevin Kee and Timothy Compeau, Seeing the Past with Computers: Experiments with Augmented Reality and Computer Vision for History (University of Michigan Press, 2019). ↑
Yao-Yi Chiang and Craig A. Knoblock, ‘Recognizing Text in Raster Maps,’ Geoinformatica 19, no. 1 (2015): 1–27; Johannes H. Uhl, Stefan Leyk, Yao-Yi Chiang, and Craig A. Knoblock, ‘Towards the automated large-scale reconstruction of past road networks from historical maps,’ Computers, environment and urban systems 94 (2022): 101794; Johannes H. Uhl, Stefan Leyk, Yao-Yi Chiang, Weiwei Duan, and Craig A. Knoblock, ‘Automated extraction of human settlement patterns from historical topographic map series using weakly supervised convolutional neural networks,’ IEEE Access 8 (2019): 6978-6996; Yao-Yi Chiang, Weiwei Duan, Stefan Leyk, Johannes H. Uhl, and Craig A. Knoblock, Using historical maps in scientific studies: Applications, Challenges, and Best Practices (Springer, 2020). ↑
On HGIS scholarship in general see Jordi Martí-Henneberg, ‘Geographical Information Systems and the Study of History,’ The Journal of Interdisciplinary History 42, no. 1 (2011): 1–13; and Anne Kelly Knowles, ‘Historical Geographic Information Systems and Social Science History,’ Social Science History 40, no. 4 (2016): 741–50. ↑
Isabella di Lenardo and Frédéric Kaplan, ‘Venice Time Machine: Recreating the density of the past’, Digital Humanities Conference, Sydney, 2015 (Abstract); Frédéric Kaplan and Isabella di Lenardo, ‘Building a Mirror World for Venice,’ in The Aura in the Age of Digital Materiality: Rethinking Preservation in the Shadow of an Uncertain Future (Factum Foundation, 2020), pp. 197-201; Frédéric Kaplan and Isabella di Lenardo, ‘The Advent of the 4D Mirror World’, Urban Planning 5, no. 2 (2020): 307–10, https://doi.org/10.17645/up.v5i2.3133; Beatrice Vaienti, Rémi Petitpierre, Isabella di Lenardo, and Frédéric Kaplan, ‘Machine-Learning-Enhanced Procedural Modeling for 4D Historical Cities Reconstruction’, Remote Sensing 15, no. 13 (2023): 3352. For Time Machine projects in general, see https://www.timemachine.eu/. For an additional, and recent, approach for Paris, see the ANR project SoDuCo (‘Social Dynamics in Urban Context: Open tools, models, and data – Paris and its suburbs, 1789-1950’), https://soduco.geohistoricaldata.org/. SoDuCo published their datasets openly (https://nakala.fr/collection/10.34847/nkl.abe0gxah): for example, ‘Annuaires Historiques Parisiens, 1798-1914. Extraction Structurée Et Géolocalisée à l'Adresse Des Listes Nominatives Par Ordre Alphabétique Et Par Activité Dans Les Volumes Numérisés’. NAKALA - https://nakala.fr (Huma-Num - CNRS), 2023. https://doi.org/10.34847/NKL.98EEM49T. See also https://github.com/soduco. ↑
Sofia Ares Oliveira, Isabella di Lenardo, Bastien Tourenc, and Frédéric Kaplan, ‘A Deep Learning Approach to Cadastral Computing’ (Presentation, ADHO Digital Humanities Conference, 2019); Rémi Petitpierre, Frédéric Kaplan, and Isabella di Lenardo, ‘Generic Semantic Segmentation of Historical Maps’, CHR 2021: Computational Humanities Research Conference Proceedings (November 17, 2021): 228-248; Rémi Petitpierre and Paul Guhennec, ‘Effective Annotation for the Automatic Vectorization of Cadastral Maps’, Digital Scholarship in the Humanities 38, no. 3 (2023): 1227–37. ↑
Matthew H. Edney, ‘Academic Cartography, Internal Map History, and the Critical Study of Mapping Processes,’ Imago Mundi 66, no. sup1 (2014): 83–106; Matthew H. Edney, ‘JB Harley (1932-1991): Questioning Maps, Questioning Cartography, Questioning Cartographers’, Cartography and Geographic Information Systems 19, no. 3 (1992): 175–78. ↑
On these challenges and possible solutions, see ‘The Endings Project’ and its resources, https://endings.uvic.ca/about.html. ↑
Matthew H. Edney, Cartography: The Ideal and Its History (Chicago: University of Chicago Press, 2019); see also in general J.B. Harley’s work, collected in The New Nature of Maps: Essays in the History of Cartography (Baltimore: Johns Hopkins University Press, 2002). ↑
For an accessible introduction to Jupyter notebooks, see Quinn Dombrowski, Tassie Gniady, and David Kloster, ‘Introduction to Jupyter Notebooks’, The Programming Historian (2019; updated 2025), https://doi.org/10.46430/phen0087. ↑
The Turing Way Community, The Turing Way: A handbook for reproducible, ethical and collaborative research, Version 1.2.3 (2025), https://doi.org/10.5281/zenodo.15213042. ↑
Zotero, Tropy, Omeka, and other software now maintained by Digital Scholar (https://digitalscholar.org/) are key examples of open-source humanities software. Sean Takats, ‘Zotero’ (2006). We also looked to work like Johanna Drucker, ‘Sustainability and Complexity: Knowledge and Authority in the Digital Humanities,’ Digital Scholarship in the Humanities 36, Supplement 2 (2021): ii86–94 and Giles Bergel, Pip Willcox, Guyda Armstrong, James Baker, Arianna Ciula, Nicholas Cole, Julianne Nyhan, et al., ‘Sustaining Digital Humanities in the UK,’ Zenodo, September 25, 2020. ↑
Our Contributors’ Code of Conduct is available at https://github.com/maps-as-data/MapReader/blob/main/CODE_OF_CONDUCT.md. The MapReader team is listed at https://github.com/maps-as-data/MapReader/blob/main/contributors.md. The entire community congregates on our Slack Workspace. Join us at https://join.slack.com/t/mapreader-workspace/shared_invite/zt-390jka9tt-yeQLfDbE6nP_8jiQb34iug. ↑
Olivia Vane, ‘Macromap: An interactive “small multiples” visualisation for historical map collections’, last modified November 21, 2021, https://observablehq.com/@oliviafvane/macromap; Olivia Vane, ‘Animated Map Chronologies’ [collection], last modified April 11, 2023, https://observablehq.com/@oliviafvane/animated-map-chronologies?collection=@oliviafvane/map-animations-to-publish, and specifically the ‘OS 6”-to-the-mile, Great Britain: 2nd Ed’ notebook. ↑
Chris Fleet, ‘Creating, managing, and maximising the potential of large online georeferenced map layers’, e-Perimetron 14, no. 3 (2019); Chris Fleet, ‘Understanding user needs: a case study from the National Library of Scotland’, Digital Preservation Coalition Technology Watch Guidance Note (2022). ↑
Full acknowledgements of underlying images and metadata are available at https://observablehq.com/@oliviafvane/animated-map-chronologies?collection=@oliviafvane/map-animations-to-publish. ↑
For a more detailed account of these steps, see Documentation and Tutorials at: https://mapreader.readthedocs.io/en/latest/. ↑
The International Image Interoperability Framework (IIIF) is a community-driven set of international open standards that enables cultural institutions to make their digital image and audio-visual resources more accessible and usable online. It uses standardized Application Programming Interfaces (APIs) and data formats for creating interoperable viewing experiences. By providing a common digital language, IIIF allows users to seamlessly access, view, and work with digital objects from different collections worldwide. The IIIF Maps Community group is working specifically on improving features for presenting, annotating, and reusing digitised map collections as IIIF resources: https://iiif.io/community/groups/maps/. Contributions like the Georeference extension are enabling a range of new tools. See, for example, Martijn Meijers and Jules Schoonman, ‘Mapping the Edge: A Novel Approach to Georeferencing Historical Map Series,’ e-Perimetron 20, no. 1 (2025), pp. 1-51. ↑
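As a brief illustration of the Image API at work (a minimal sketch only; the base URL below is a hypothetical placeholder rather than a real collection endpoint), a client can request an image description and then a derivative of the image itself using the standard IIIF URL pattern:

```python
import requests

# Hypothetical IIIF Image API base URL for a single digitised map sheet;
# any IIIF-compliant endpoint follows the same pattern.
IMAGE_BASE = "https://iiif.example.org/iiif/map-sheet-001"

# The info.json document describes the image (dimensions, tile sizes, profile).
info = requests.get(f"{IMAGE_BASE}/info.json").json()
print(info["width"], info["height"])

# Images are requested as {base}/{region}/{size}/{rotation}/{quality}.{format}:
# here, the full image scaled to 512 pixels wide, unrotated, default quality.
response = requests.get(f"{IMAGE_BASE}/full/512,/0/default.jpg")
with open("map-sheet-001.jpg", "wb") as f:
    f.write(response.content)
```

Because every compliant server answers the same URL pattern, the same few lines work across institutions, which is what makes digitised map collections interoperable at scale.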
Details on how predicted labels are saved are available in the MapReader documentation: https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/4-classify/infer.html#add-predictions-to-metadata-and-save. ↑
Katherine McDonough, Kaspar Beelen, Daniel C.S. Wilson, and Rosie Wood, ‘Reading Maps at a Distance: Texts on Maps as New Historical Data’, Imago Mundi 76, no. 2 (2024), pp. 296–307. ↑
See https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2025.1443882/full and https://github.com/alan-turing-institute/scivision. ↑
Evangeline Corcoran, Kasra Hosseini, Laura Siles, Smita Kurup, and Sebastian Ahnert, ‘Automated dynamic phenotyping of whole oilseed rape (Brassica napus) plants from images collected under controlled conditions,’ Frontiers in Plant Science 16 (2025): 1443882. ↑
Daniel Wilson, “Finding Words in Maps,” Living with Machines (blog), July 30, 2019, https://livingwithmachines.ac.uk/finding-words-in-maps/; Olivia Vane, ‘Finding Words in Maps part 2: seeing the results’, Living with Machines (blog), August 22, 2019, https://livingwithmachines.ac.uk/finding-words-in-maps-part-2-seeing-the-results/. Our code from these experiments can be found at https://github.com/jamespjh/strabo-text-recognition-deep-learning and is forked from Spatial-Computing/Strabo-Text-Recognition-Deep-Learning, C++, February 14, 2018; USC Spatial Computing Lab, released June 10, 2023, https://github.com/spatial-computing/strabo-text-recognition-deep-learning. Strabo is based on underlying research from Yao-Yi Chiang and Craig A. Knoblock, ‘Recognizing Text in Raster Maps,’ Geoinformatica 19, no. 1 (2015): 1–27. ↑
‘Machines Reading Maps: Finding and Understanding Text on Maps’ (AH/V009400/1), https://gtr.ukri.org/projects?ref=AH%2FV009400%2F1. ↑
MRM data and metadata for the Rumsey collection are openly available at https://purl.stanford.edu/rc349kh8402. Individual deposits include: Yao-Yi Chiang, Katherine McDonough, David Rumsey, and Zekun Li, ‘David Rumsey Map Collection Text on Maps Data [Version 1],’ (Stanford: SDR, 2023), https://doi.org/10.25740/vn901vj0926; Yao-Yi Chiang, Katherine McDonough, David Rumsey, and Zekun Li, ‘David Rumsey Map Collection Text on Maps Data [Version 2],’ (Stanford: SDR, 2023), https://doi.org/10.25740/wc461hp2261. Metadata for this subset of the collection is also available in the same SDR collection: ‘David Rumsey Map Collection: Georeferenced Maps Metadata,’ (Stanford: SDR, 2023), https://doi.org/10.25740/ss311gz1992. The underlying research for mapKurator has been published as Yijun Lin and Yao-Yi Chiang, ‘Hyper-local deformable transformers for text spotting on historical maps,’ in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2024), pp. 5387-5397. On annotating maptext, see Valeria Vitale, ‘Map Text Annotation Guidelines’ for Recogito, available at: https://github.com/machines-reading-maps/Tutorials-Newsletters/wiki/Map-Text-Annotation-Guidelines. ↑
Work from two projects in progress has recently been published: Leon van Wissen, Manjusha Kuruppath, Lodewijk Petram, ‘Unlocking the Research Potential of Early Modern Dutch Maps’, European Journal of Geography 16, no. 1 (2025): s12–17 and Robert K. Nelson, ‘Mapping Environmental Inequalities: Using MapReader to Uncover Mid-Century Industrial Burdens’, Imago Mundi 76, no. 2 (2024): 284–90. ↑
McDonough et al, ‘Reading Maps at a Distance’. ↑
The CAMPOP project had previously deposited static snapshots of railway tracks as vector data for 1851, 1861, and 1881 on the UK Data Service. See Chapter 3, note 42 for details. At the time of writing, these are no longer available. For a reference to this data and its origins, see Dan Bogart, Xuesheng You, Eduard J. Alvarez-Palau, Max Satchell, and Leigh Shaw-Taylor, ‘Railways, divergence, and structural change in 19th century England and Wales’, Journal of Urban Economics 128 (2022): 103390. ↑
Phillip C. Muehrcke, Map Use: Reading, Analysis and Interpretation, 2nd ed. (Madison, WI: JP Publications, 1986), p. vii. ↑
See previously cited works by Edney and Harley, but also the History of Cartography project at the University of Wisconsin-Madison: J.B. Harley and David Woodward, eds., The History of Cartography, 6 vols., (Chicago: University of Chicago Press, 1987-2019). ↑
Denis Wood, with John Fels and John Krygier, Rethinking the Power of Maps (New York: The Guilford Press, 2010), p. 39. ↑
See McDonough et al, ‘Reading Maps at a Distance’. ↑
For further discussion, see McDonough, ‘Maps as Data’. ↑
Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism (Champaign, IL: University of Illinois Press, 2011), p. 10. ↑
Ramsay, Reading Machines, p. 32. ↑
Mark Algee-Hewitt, ‘Computing Criticism: Humanities Concepts and Digital Methods,’ in Computational Humanities, pp. 18-44 (p. 35). ↑
Algee-Hewitt, ‘Computing Criticism,’ p. 35. ↑
See for example, Robert Schwartz, Ian Gregory, and Thomas Thevenin, ‘Spatial History: Railways, Uneven Development, and Population Change in France and Great Britain, 1850–1914,’ Journal of Interdisciplinary History 42 (2011): 53-88. ↑
We have found that collective annotation is useful for establishing basic guidelines when the downstream analysis of MapReader output will involve more than one scholar. However, it was better for one person to annotate the final training dataset, to ensure that there were no conflicting versions of each label arising from even minor inter-annotator disagreement. ↑
Hannah Ringler adopts the term ‘interpretive screen’ when discussing Geoffrey Rockwell and Stéfan Sinclair, Hermeneutica: Computer-Assisted Interpretation in the Humanities (Cambridge, MA: MIT Press, 2016). Ringler, ‘Computation and Hermeneutics: Why We Still Need Interpretation to Be by (Computational) Humanists,’ in Computational Humanities, pp. 3-17 (p. 6). ↑
One might look for inspiration on next steps to the process that Jo Guldi outlines in her article ‘Critical search: A procedure for guided reading in large-scale textual corpora’, Journal of Cultural Analytics 3, no. 1 (2018). ↑
We have found that these image classification models do not always ‘pay attention’ to the areas of a patch that a human annotator might assume are most important. Using occlusion analysis and confidence scores side by side, we can study a model’s decisions. ↑
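By way of illustration (a generic sketch, not MapReader’s own implementation), occlusion analysis can be approximated by sliding a blank square across a patch and recording how far the model’s confidence in its original prediction falls when each area is hidden. The function below assumes a trained PyTorch classifier and a single normalised patch tensor:

```python
import torch
import torch.nn.functional as F

def occlusion_map(model, patch, occluder=16, stride=8, fill=0.5):
    """patch: tensor of shape (3, H, W), values in [0, 1]."""
    model.eval()
    with torch.no_grad():
        # Baseline prediction and confidence on the unoccluded patch.
        probs = F.softmax(model(patch.unsqueeze(0)), dim=1)
        pred = probs.argmax(dim=1).item()
        base_conf = probs[0, pred].item()

        _, H, W = patch.shape
        heatmap = torch.zeros((H - occluder) // stride + 1,
                              (W - occluder) // stride + 1)
        for i, y in enumerate(range(0, H - occluder + 1, stride)):
            for j, x in enumerate(range(0, W - occluder + 1, stride)):
                occluded = patch.clone()
                # Hide one square region and re-run the model.
                occluded[:, y:y + occluder, x:x + occluder] = fill
                p = F.softmax(model(occluded.unsqueeze(0)), dim=1)[0, pred].item()
                heatmap[i, j] = base_conf - p  # large drop = region mattered
    return pred, base_conf, heatmap
```

Regions where the confidence drop is largest are those the model relied on most, which can then be compared against what a human annotator considered salient in the same patch.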
Hosseini et al, ‘MapReader’, p. 6. F1 scores capture the harmonic mean of precision and recall, calculated here as 2 × the number of ‘True Positive’ patches divided by (2 × ‘True Positive’ + ‘False Positive’ + ‘False Negative’) patches. ↑
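In standard notation, writing TP, FP, and FN for true positive, false positive, and false negative patches, this is:

```latex
F_1 = \frac{2\,TP}{2\,TP + FP + FN}
    = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
```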
Using MapReader in HPC environments fulfils the prophecy of Quinn Dombrowski, Tassie Gniady, David Kloster, Megan Meredith-Lobay, Jeffrey Tharsen, and Lee Zickel, in their chapter ‘Voices from the Server Room: Humanists in High-Performance Computing,’ in Computational Humanities, p. 248. ↑
https://allmaps.org/; https://annotorious.github.io/; and Eamonn Bell, Karina Rodriguez Echavarria, and Jeyan Thiyagalingam, ‘CCP-AHC Roadmap Open Draft’, Zenodo, September 11, 2025, https://doi.org/10.5281/zenodo.17099176. ↑
Van Wissen et al, ‘Unlocking the Research Potential of Early Modern Dutch Maps’. ↑
Among other works from the EPFL team, see Beatrice Vaienti, Isabella di Lenardo, and Frédéric Kaplan, ‘Exploring Cartographic Genealogies through Deformation Analysis: Case Studies on Ancient Maps and Synthetic Data’, Cartography and Geographic Information Science 52, no. 5 (2025): 557–77; on Sanborn maps see Nelson, ‘Mapping Environmental Inequalities’; on the MapText competitions see Joseph Chazalon, ‘ICDAR 2024 Competition on Historical Map Text Detection, Recognition, and Linking’, International Conference on Document Analysis and Recognition (ICDAR), Athens, Greece, September 4, 2024; Yijun Lin, Solenn Tual, Zekun Li, Leeje Jang, Yao-Yi Chiang, et al., ‘ICDAR 2025 Competition on Historical Map Text Detection, Recognition, and Linking’, International Conference on Document Analysis and Recognition (ICDAR), September 2025, Wuhan, China, pp. 568–585. ↑
Algee-Hewitt, ‘Computing Criticism’, p. 36. ↑
Ahnert et al, Collaborative Historical Research in the Age of Big Data. ↑