Introduction: Working with Machines

Ruth Ahnert, Emma Griffin, and Jon Lawrence

This chapter was co-conceived and co-written by all authors.

In the final year of our five-year, multi-disciplinary project ‘Living with Machines’, OpenAI launched ChatGPT, its open access pre-trained, generative language model. AI had arrived in the consciousness of the general public. Within months, the world had become gripped by the question of whether machines were about to make us all redundant, and, in more doom-laden scenarios, whether humanity itself was about to be overthrown by a new generation of super-intelligent machines.[1] There is nothing new about a technological innovation generating wild speculation about the future of society, including radical predictions of a future where no one would need to work (in the optimists’ telling), or (in more apocalyptic versions), a future where most people would cease to have either purpose or any reliable source of remuneration. Similar things had been said in the nineteenth century about the rise of machines to replace hand labour, and in the twentieth century about first mass production methods, and, later in the century, about the rollout of robots and other automated manufacturing processes. In the 1970s and 1980s, social scientists debated the coming post-employment society that would be ushered in by automation and the off-shoring of industry.[2] They were right to anticipate the devastating social consequences that would be wrought by deindustrialisation, especially in political systems wedded to the idea that markets would provide their own solutions, but worklessness did not become the new norm.[3] We cannot know what changes the AI revolution will bring. They may yet prove to be more radical than any preceding technological transformations, but it is important to remember that much of the hyperbole about AI has been driven by people within the industry who have every reason, both professionally and commercially, to play up the transformative power of their products.[4]

The impact of these technological changes on society has been described as the ‘fourth industrial revolution’.[5] That label makes clear the connection of our current economic moment to the coming of the machine age in the nineteenth century, which came to be known as the ‘industrial revolution’. But it is not simply the terminology of ‘industrial revolution’ that links now and then. There is, in addition, a continuity in the cultural anxiety that accompanies technological change. If we turn from today’s anxiety about AI bots to nineteenth-century commentary about the steam engine, we encounter a familiar set of fears about the dangers of new technology. William Blake’s evocative lines about the ‘dark Satanic Mills’; or William Wordsworth’s critique of the ‘outrage done to nature’ by the growth of urban industry;[6] or the industrial novels of Benjamin Disraeli, Elizabeth Gaskell, Charles Dickens, and from later in the century, George Gissing,[7] all provide echoes of the concern that technological progress carries a high social cost. Furthermore, the specific anxiety was often expressed that mechanisation – through some combination of deskilling, downward pressure on wages, and unemployment – undermined the position of those who work for a living. As John Stuart Mill wrote, society’s mechanical inventions had done no more than ‘enable a greater proportion to live the same life of drudgery and imprisonment’.[8] For Victorian commentators, the conclusion was that technological progress upset the social order and posed a particular threat for those who worked for a living.

It was not simply nineteenth-century contemporaries who voiced concern about the impact of mechanisation on Victorian culture and society. In fact, generations of historians have also puzzled over this problem. Indeed, it would be hard to exaggerate the extent of scholarly interest in the social consequences of that suite of economic changes we label ‘industrialisation’. As the academic discipline of history took shape in the early twentieth century, writers returned frequently to the theme of mechanisation and its (deleterious) impact on the labouring poor.[9] The emergence of a left-leaning social history in the post-war decades picked up this historical problem and fed into one of the most influential history books of the twentieth century, E.P. Thompson’s The Making of the English Working Class.[10] As social history bifurcated into economic history on the one hand and cultural history on the other in the following decades, the impact of the industrial revolution on living standards remained a bread-and-butter topic in British history and a focus for research by a variety of different methods and means – both quantitative[11] and qualitative.[12] As such, our own work fits into a rich vein of scholarly thinking about the social adaptation required by the coming of the machine age.

This book, however, is not concerned simply with tracing the parallels between these two historical moments of technological change. Rather, it seeks to bring them into conversation in much more active ways. What follows is a computationally-intensive approach to the study of British society and culture in the era of industrialisation, bringing the technological innovations of AI and data science to the study of our past. The project underpinning the research for this volume is entitled Living with Machines. Our project shares its name with a pamphlet published in 1933 by the sociologist, William Ogburn, in which he advanced his theory of ‘cultural lag’: the contention that inventions outstrip societies’ ability to understand and evaluate them. This is certainly the case for nineteenth-century Britain. One consequence of industrialisation in Great Britain was an explosion in the creation and collection of data. Initiatives such as the decennial population census (established 1801) and the Ordnance Survey (also 1801, though with earlier precedents), had their roots in the British state’s response to war and the threat of invasion (it’s called the ‘ordnance’ survey for a reason).[13] Over time, both evolved to become central planks of the emerging information state as they collected more and more detailed information about people and places.[14] They became a way of knowing, and potentially controlling, an increasingly complex and fluid civil society.

At the same time, an ‘information society’ emerged in tandem with this process, driven above all by the explosion in newspaper publishing across the nineteenth century – itself the result of technological innovations reinforced by the elimination of punitive government taxes and rising living standards. In 1851 there were 365 newspapers in Britain, with London representing more than a third of the total; by the late 1880s that figure stood at over 2,000 separate titles, including 1,366 provincial newspapers.[15] In short, industrialisation produced a more complex society, one busily generating ever more records and documents about itself. By the century’s end, industrial Britain had created vastly more information about itself than individuals could realistically read, analyse and fully understand even in multiple lifetimes. This is the paradox that Ogburn identified, in which an increase of information goes hand in hand with a decrease of understanding.

We are now in a position to begin actively engaging with this mass of information. Large swathes of the information curated by Victorian society and state have been digitised in recent decades, including household census returns from 1851, thousands of sheets of Ordnance Survey maps, and billions of words from Victorian newspapers. Indeed, these digitised sources represent the bedrock of the present study. By shifting these documentary sources into machine-readable formats, libraries and archives have created the opportunity to know the past anew by mobilising the power of data science. The rapid pace of advancement in the field of AI and data-driven fields of research means that there is a genuine opportunity to ask a whole new set of research questions, which could in turn be answered on a more empirically grounded and potentially more representative basis.

Despite the clear promise presented by the parallel availability of data and technology, so far only modest inroads have been made in this space. The primary obstacle, to date, has been the mismatch between the strengths and skills of the traditionally trained historian and those required to analyse and understand data at scale. Although the concept of digital history had entered the historians’ lexicon by the early 2000s, it remains the case that few historians today have the training to work at scale with the huge collections that the Victorians created. This means there remains much to learn about the human, social, and cultural consequences of this historical moment.

One answer to that obstacle is collaboration. The Living with Machines project brought together an unusually large and diverse team – data scientists, software engineers, historians, digital humanists, curators, library professionals, computational linguists, literary critics, and an urban geographer – with the aim of undertaking research that at once offered a data-driven approach to history and a human-centred approach to data science. As such the project was an experiment in a new research paradigm. Very literally, this project was about humanists living with machines: working with computers, with methods from machine learning, and with colleagues from computational backgrounds. But there are few models available for how such interdisciplinary research should be undertaken, so an important part of the project was doing the work of establishing a meaningful and productive exchange between the different disciplines and professional contexts. This kind of exchange does not just happen, but requires a proactive approach precisely because of the way that it brings together individuals from different research and professional cultures with different expectations about how to work, and how to disseminate research findings. In order that this labour not be rendered invisible, the project sought to share its experiences and recommendations in a short book, Collaborative Historical Research in the Age of Big Data: Lessons from an Interdisciplinary Project.[16] One of the key principles that the authors draw attention to in those pages is that none of the represented groups or sectors should be in service of the others. Rather, we sought to create something that was greater than the disciplinary parts, building on the expertise, skills, and experiences of the whole team.

The diverse set of skills means that the project has been able to make interventions of various kinds, many of which might be unfamiliar to historians. It has developed means to make the digitised collections of historical documents research-ready by processing them into more accessible formats; by releasing new contextualised datasets so that users can better understand the contours of what has been digitised; and by curating new derived or sample datasets from behind paywalls to allow others to reproduce and build on our results. We have developed new tools and methods for, amongst many other things: computationally ‘reading’ maps and identifying features such as rail and buildings (or anything else with a distinctive visual form); extracting place names (toponyms) from text and geolocating them; linking people across census returns; linking multiple datasets by location – such as maps, census returns, geolocated streets and stations – allowing for multidimensional analysis of changes to communities; and analysing the ways that language evolved to describe new technological realities, specifically developing new frameworks for annotating word meaning in historical texts, and new algorithms for identifying changes in word meaning over time and across space. These contributions can be conceptualised as the foundations on which this book is built. The rigour with which they have been developed and tested means that we have prepared secure foundations for the historical interrogations that build upon them. The book that follows acts as a capstone: in showing what is now possible thanks to the data preparation, methods and tool development, we wish to drive credit back to the undergirding acts of labour that are required to tell structurally sound digital histories. But the edifice that we have built is also just a model of what it is possible to build in the future from those foundations; we imagine larger, more ambitious structures. Each chapter represents the first foray into the opportunities opened up by this work, and we lay out blueprints for how others might continue to build on these foundations.

The prehistory of digital history

Ours is by no means the first attempt to do data-driven history, and as such our project fits within a more established research paradigm: that of digital humanities. While digital humanities (DH) and its precursor, humanities computing, seemed like fringe activities for many years, they are now institutionally recognised with whole departments dedicated to such work, as well as undergraduate and graduate programmes, centres and institutes in many higher education institutions. Willard McCarty and others have sketched the longer history of humanities computing back to the now well-worn origin story of Josephine Miles and Roberto Busa’s collaboration with IBM on an exhaustive concordance of the writings of St. Thomas Aquinas, the Index Thomisticus, in the 1940s.[17] However, there was a step-change in the potential of computational approaches to the humanities around the turn of the millennium when many western nations embarked on ambitious programmes to digitise their national heritage collections, and computers and the internet entered the mainstream.

Admittedly, the task of defining DH has become something of a joke within the community as a result of the breadth and diversity of source materials and approaches that are grouped within its capacious reach. Whole volumes have been dedicated to the task of defining the field and assessing its status.[18] This task of definition may grow increasingly difficult as different constituencies within the broad DH community have opted for labels such as ‘computational humanities’, and ‘cultural analytics’, each pushing different agendas with regard to technical innovation as opposed to accessibility within the humanities. Yet if digital humanities as a disciplinary field is diffuse and so protean that definition is difficult, the reverse seems to be the case with respect to digital history. Historians have generally been slower to embrace digital methods than other humanities disciplines, and the consequence is a field that is narrow, tightly focused, and highly specialist.

That said, some historians did embrace the first flush of enthusiasm for digital humanities. The 2010s saw the appearance of history articles that addressed the ‘digital turn’[19], touched on the ‘Promise of Digital History’[20], and encouraged historians to ‘confront the digital’[21], and the field has produced a steady stream of high quality research ever since. Yet this work has not succeeded in introducing change in the ways in which the broader historical community goes about its work. Digital history has developed along specialist and highly technical tracks, largely published in its own specialist journals.[22] Jo Guldi writes that digital history, and specifically text mining for historians, ‘took a long time to develop as a field, although it is now maturing at great strides. As of the past three years, digital history can boast three journals; and there are dozens of journal articles in the field of history. In 2020, Luke Blaxill became the first author to publish a historical monograph—The War of Words—that used text mining as its principal methodology’.[23] The development of digital history thus forms a marked contrast to the field of history more generally, where analogue styles of research, based on the close reading of texts (broadly defined), are vastly more prevalent than digital approaches, and the monograph and the lone scholar remain the dominant model. This situation was recently lamented by Luke Blaxill himself in a thoughtful intervention calling on historians wedded to traditional methods to engage more directly with the radical challenge posed by digital methods.[24]

Indeed, it is not simply that the field of digital history today looks rather different from its digital humanities counterpart. It arguably has a different origin story as well. After all, history as a discipline has a strong tradition of quantitative work. One thinks, for example, of the new field of ‘cliometrics’, which emerged in the late 1950s and early 1960s, quickly gaining traction within the American Economic History Association and its influential publication, the Journal of Economic History. At around the same time, the Cambridge Group for the History of Population and Social Structure, founded in 1964 by E. A. Wrigley and Peter Laslett, pioneered the application of quantitative methods to historical demography in the UK. Both cliometrics and historical demography produced work that might now retrospectively be written into the prehistory of digital history. The Population Group’s work on parish register data, for example, was founded upon efforts to transcribe the registers and convert them into a machine-readable format so that they could be analysed at scale by researchers. Furthermore, cliometrics and the Cambridge Group formed just two elements of the much wider field of economic and quantitative social history, which in the second half of the twentieth century developed institutional expression in standalone university departments running their own degree programmes, as well as multiple learned societies and journals.

But there have also been pull effects dragging quantitative historians away from history as a discipline. The issues are both reputational and disciplinary. Especially in North America, cliometrics suffered from the controversy surrounding the claims about American slavery advanced by Robert Fogel and Stanley Engerman in Time on the Cross (1974).[25] They argued not only that slavery had been an efficient economic system in the context of southern plantation cropping, but also, on flimsier quantitative evidence, that slaves themselves had generally been well-treated as part of the economic bargain to maximise productivity. Using numbers to minimise the moral evil of human slavery brought fierce criticism not just for the authors, but for the entire field of cliometrics.[26] But more pragmatically, as econometric methods became increasingly mathematical, many of its practitioners gravitated towards Economics faculties where they essentially conducted research in applied economics focused on testing economic models on historical datasets.[27] In turn, historical demographers have felt the disciplinary pull of historical geography and the applied social sciences (where demography is an important sub-discipline). Indeed, for many years the Cambridge Population Group was based in the Department of Geography rather than History (although it is now co-located). Since the 1990s, university restructuring has seen the absorption of most free-standing departments of economic and social history into larger disciplinary units of History or even combined humanities. Learned societies and specialist journals continue to fly the flag for quantitative methods in history, but these developments, plus the emergence of cultural history as a major field of scholarship since the 1990s, have weakened the influence of quantitative methods among historians just as the possibilities of new digital methods have come into view.[28]

In their 2014 History Manifesto, Jo Guldi and David Armitage called for a return to quantitative methods and longue durée history in response to the massive increase in the availability of data at scale.[29] Their call, however, was met with a surprisingly fierce backlash. This probably owed less to the disciplinary clashes over cliometrics in the 1970s than to disciplinary suspicions that the tone of their manifesto echoed the grand claims for history as a social science that had been fashionable before the cultural turn of the 1990s. The great champions of historical sociology, such as Barrington Moore, Charles Tilly and Theda Skocpol, generally focussed on the transformation of state and societal systems, rather than on quantitative methods, but their broad claims about historical processes have nonetheless often seemed at odds with the richly contextualised, humanistic and source-led methods of historians. Refighting old disciplinary battles over Armitage and Guldi’s bold intervention too easily obscured their larger message: that history stood on the brink of new, transformative possibilities thanks to the power of digital methods, if those methods were well-used.

Much of the scepticism about digital history has stemmed from historians’ suspicion that advocates of quantitative methods and big data have disregarded the need for painstaking source criticism. All too often, it has been easy to throw the old computer science adage of ‘garbage in, garbage out’ at headline-grabbing examples of what purports to be data-driven history. But there is a distinction to be made between humanities scholars engaging with data-driven methods, and the kinds of sources and questions typically studied by historians and literary critics being ‘scooped up by quantitative disciplines’.[30] An example of the latter is the now infamous 2011 ‘culturomics’ article written by a team led by evolutionary biologists in cooperation with Google, which analysed millions of digitised books and introduced the Ngram viewer to the world.[31] Published in the high-profile journal Science, this article used the Google Books corpus to trace correlations between word usage frequency and known cultural trends (e.g. the usage frequency of the word ‘slavery’ in the corpus of books rose steadily across the nineteenth century peaking first during the American Civil War (1861-65) and again a century later during the civil rights movement (1955-68)). It also provided insights into the evolution of language (e.g. the rate of regularisation of verbs). However, the problems of the Google Books corpus for nuanced linguistic analysis are by now well documented, including the impact of optical character recognition (OCR) errors, the over-representation of scientific literature, messy metadata, the equal weight assigned to each book regardless of its literary or commercial impact, and the compounded bias of aggregated source libraries.[32]

Despite greater understanding of the pitfalls of this dataset it continues to be used in data-driven work. A 2019 article in Nature Human Behaviour used sentiment analysis on this corpus to plot historical well-being over more than two centuries in the United Kingdom, United States, Germany and Italy.[33] Perhaps seeking to address the well-rehearsed critiques, the authors ‘corroborate’ their results against alternative indices derived from independent corpora including the ‘Find My Past’ data from the British Library’s ‘British Newspaper Archive’, which we have also used on this project (although, as we discuss in Chapter 1, our aim has been to explore newspapers’ hidden biases, rather than assume that problems of context and representativeness can simply be ignored if the corpus is large enough). The authors concluded that the data collected from books and newspapers is a reliable guide to the national mood because the plots correlated with key historic moments such as the American Civil War, the Wall Street Crash and Britain’s ‘Winter of Discontent’ (1979). But bold claims, such as that the British were happiest in 1880, necessarily ignore key issues including the gradual democratisation of print culture in Victorian Britain and the emergence of a more sensationalist style of ‘new journalism’ at exactly this moment. Such claims need an interrogation that the authors cannot give. Instead, they offer the caveat that ‘Caution is needed when considering any long-run socioeconomic data. In all cases, there is a need for what historians call a “close read” of the historical data.’[34] But if a historian had been on the research team they would likely have done more than offer the data a “close read”; they may have problematized the team’s basic research paradigm, in the process probably undercutting the ‘big find’ status that has surrounded the reporting of this sort of data history project.[35]

The effect of such work is to rip digitised documents from their historical context. This, then, is not digital history. Echoing the rallying cry of Guldi in her recent book The Dangerous Art of Text Mining, we believe that there is a need to clarify and codify the distinction between good and bad digital methods in their application to historical data, and that historians themselves must lead the way, rather than letting data scientists loose on this material in the absence of domain expertise and contextual knowledge.[36] One of the ways that we have sought to do this on Living with Machines was to develop new critical techniques and tools for contextualising and evaluating data at scale. This is the central goal of the digital ‘environmental scan’ method we developed for understanding hidden biases in large corpora of digitised newspapers.[37] But, as we discuss in Chapter 1, the same logic can and must be applied across digital methods in history. We would contend that the arguments for working in this way are not merely relevant to the needs and professional standards of historians trained to be attuned to the complexities, incompleteness, and biases of their sources; they are also just good practice in data science, although still rarely undertaken in practice.

The balance we seek to strike here between working at scale and attending to historical contingency is related to the larger issue of how we talk about the payoff of digital approaches to history. Those working in this space for any time will know the strange expectations upon us. Colleagues are at once sceptical that computational approaches could tell them anything they didn’t already know, and scathing that a new method has not delivered a revolutionary new finding. The latter demand is perhaps due in part to the blustering early promises made for the potential of digital methods to transform knowledge, and deliver startling new insights. But it is also due to the incursion of methods from scientific fields which tend to approach knowledge in a way that Wilhelm Windelband termed nomothetic, that is, having a tendency towards methods and approaches designed to derive generalisable laws to explain categories of phenomena. It is why historians have such problems with findings like those about 1880 being the year of peak ‘happiness’ in Britain. By contrast, scholarship in the arts and humanities is more comfortable with what Windelband calls the idiographic method, the tendency to specify what is contingent and particular. Contributions to the field of knowledge in history tend, on the whole, to be incremental, detailed, grounded, and alert to local inconsistencies, ambiguities, and subjective experience.

In the collaborative research that we undertook for this book our process sought to bring together the idiographic and nomothetic in an iterative and complementary research process. Our aspiration is that digital approaches can allow us to generate statistical – or what some proponents of the digital humanities call ‘distant’ – overviews of our historical documents and datasets; overviews which allow us both to determine the general trends and patterns within our data, and to attend to the individual data points (determined by highly specific historical contexts) that make up the big picture. By keeping both worldviews in sight, and shifting between them as the research process unfolded, we have sought to define a particular research paradigm.

Victorian Data

That research paradigm, however, is itself shaped by the historical records at our disposal in the digital age. As already indicated, the Victorians produced data on a scale beyond anything that can be read and analysed using the traditional historian’s analogue methods. In the last few decades massive investments have been put into the digitisation of these records, which provide both new opportunities and new challenges for researchers. At the heart of this project are three key collections: newspapers, the population census, and the Ordnance Survey maps, although these are surrounded by a constellation of other smaller datasets that we have acquired, digitised and shared (these are discussed more fully in the chapters that follow). This diversity of sources is key to our project; it offers a variety of perspectives and, crucially, novel ways of connecting these into a multi-dimensional picture of the past. Exploiting this diversity, however, requires an understanding of the provenance of digital data. After all, digital assets are not the same as the historical records themselves, either in their contours and coverage, or in their affordances; and, in turn, neither is a simple, unmediated window into past lives.

Provincial newspapers have long been recognised as a valuable and important historical source. In contrast to the national – in reality metropolitan – press, local and regional newspapers put down deep roots in local society.[38] They contain numerous valuable micro-genres that speak to the events in people’s lives, from news items and reports, to family notices, letters from local communities, and advertisements of the products that people might have had in their homes. Newspaper circulations grew enormously across the nineteenth century. Furthermore, as the content of newspapers was widely disseminated through reading rooms, clubs and through public readings, it was distributed more widely than circulation figures imply. That said, newspaper content was always slanted towards paying customers, and thanks to the influence of advertising often towards the worldview of the better off. Although in many respects provincial newspapers represent a uniquely democratic source, this democracy had its limits. Such newspapers have long been mined by historians for the insights they can offer into all aspects of everyday life at the local scale, but their great value as a digital source is that they can also be used to aggregate such evidence at the regional or national scale.

The BL’s collection contains newspapers from 1603 to the present day, from both Britain and further afield. There are over 600,000 bound volumes of newspapers (occupying 32 kilometres, or 20 miles, of shelving) and over 300,000 reels of microfilm (occupying a further 13 kilometres, or 8 miles, of shelving). The collection comprises 65 million pages that contain something like ten billion words. The size of this collection means that although digitisation of these newspapers began over two decades ago, it is still an ongoing process. The data we are working with derives from a range of different projects to digitise the BL’s newspapers. The largest is the British Newspaper Archive (BNA), which is a collaboration between the British Library and the genealogy website FindMyPast (FMP), which offers access to the digitised content mainly through a subscription service. In a typical week in June 2023, 23,182 new pages were added to this database.[39] The project is working with a version of this dataset that comprises 68 million pages with 150 billion words from c.1,600 newspaper titles spanning 1780-1920. We are the first project to have been provided a digital copy of this important dataset.[40]

The FindMyPast data also subsumes important earlier digitisation efforts resulting from the British Library’s prior collaborations first with JISC (formerly the Joint Information Systems Committee) and then with digital publisher Gale.[41] In addition, our project brought in new digitisations of BL holdings undertaken by the Library’s Heritage Made Digital (HMD) project, including a significant tranche funded directly by the project. HMD’s newspaper digitisation programme has been designed specifically to prioritise the digitisation of rare and vulnerable titles deemed to be of historical importance. In turn, our own digitisation selection prioritised longer-running titles from industrial districts, especially those known to have sought a working-class readership.[42] But despite the mammoth scale of newspaper digitisation, it is important to remember that far more newspapers remain undigitised, probably more than 80 per cent, some of which are too poorly preserved to be digitised – and other historic newspapers have not survived at all. Moreover, digitisation has neither progressed in a completely systematic nor a completely random way. Rather, successive waves of digitisation have each been shaped by the priorities of the organisations that led them. JISC and Gale both focused on research users (principally working via search portals). They prioritised quality provincial newspapers with long print runs; newspapers that could make a plausible claim to have been serious newspapers of record. By contrast, FindMyPast’s target audience is the general public, or rather that segment of it interested in family and local history. Their selection criteria are not a matter of public record, but they will certainly have been different.

The digital ‘environmental scan’ is our answer to the problem of how to know the relationship between what has, and has not been, digitised, and the potential implications of this for digital research. It is a tool that enables researchers to conduct source criticism at scale. It does this by deploying information derived from contemporary reference works to help us better understand the composition of the big bag of words that is any large text corpus. For newspapers, we use information extracted from the Mitchell’s Press Directories, established in 1846 and published annually from 1856 (these were digitised and structured within the project, for details see Chapter 1). We use the information derived from the digitised Press Directories to map the approximate profile of the newspaper publishing environment between the 1840s and 1920 (our end point), and then compare this profile with that of different newspaper corpora (JISC, Gale, BNA etc.) to understand how selection decisions have influenced the representativeness of the different digital samples. The environmental scan is thus a tool for understanding bias in digital sources, but we are careful not to argue that there is some hypothetical ‘ideal’ sample that would eliminate all bias. Firstly, although we have no robust information about newspaper circulation after 1856, it is evident that not all newspapers were equal in their importance, influence or even their ability to represent facets of contemporary public discourse. JISC and Gale had good reason to favour some newspapers over others when spending scarce resources. Secondly, newspapers as a whole were (and are) an imperfect reflection of contemporary language precisely because they were mediated by commercial imperatives that gave some voices, and worldviews, more prominence than others. The environmental scan simply makes it possible for the first time to map these inclusions and exclusions more precisely.
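The comparison at the heart of the environmental scan can be illustrated with a minimal sketch. It is not the project's own code: the record fields (`title`, `politics`), the toy data and the function names are all hypothetical, chosen only to show how the profile of a reference population (titles listed in a press directory) might be set against the profile of a digitised sample.

```python
from collections import Counter


def category_shares(records, key):
    """Return the proportion of records falling under each value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def representation(directory, digitised_titles, key):
    """Compare the digitised sample with the full directory.

    Returns {category: (share_in_directory, share_in_sample)}.
    """
    sample = [r for r in directory if r["title"] in digitised_titles]
    population_shares = category_shares(directory, key)
    sample_shares = category_shares(sample, key) if sample else {}
    return {
        category: (share, sample_shares.get(category, 0.0))
        for category, share in population_shares.items()
    }


# Invented toy records standing in for structured directory entries.
directory = [
    {"title": "A", "politics": "liberal"},
    {"title": "B", "politics": "liberal"},
    {"title": "C", "politics": "conservative"},
    {"title": "D", "politics": "neutral"},
]
digitised = {"A", "B", "C"}  # hypothetical set of digitised titles

profile = representation(directory, digitised, "politics")
for category, (in_directory, in_sample) in sorted(profile.items()):
    print(f"{category}: directory {in_directory:.2f}, sample {in_sample:.2f}")
```

In this toy case the ‘neutral’ titles make up a quarter of the directory but are absent from the digitised sample, which is the kind of exclusion the scan is designed to surface. A fuller treatment would also need to handle titles that change name, merge or cease publication over time.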

Alongside the newspaper archive, the project also works with the census. The census began in 1801, but was initially little more than a basic headcount. The first full household census was completed in 1841, and included information about the occupation, place of birth and relationship to the head of household of eighteen million people living in Britain at that time. Sadly, the original household returns from 1841 were destroyed, and we only have aggregated information about the population, but from 1851 the decennial census returns survive as raw data as well as in aggregated form. Census returns from across the country were transcribed by hand by government officials and bound into the Census Enumerator Books (CEBs), which were subsequently stored at the Public Record Office, now the National Archives. The provenance of the census is clearly very different from that of the newspapers. Whereas the latter were produced, for commercial reasons, by numerous different individuals within society, the census was funded and produced by the Victorian state with a view to understanding the size and structure of its citizenry and economy. As a decennial snapshot of households (both their demographic composition and their economic underpinning), the census has long been exploited by academic researchers. It is of interest to our project primarily for the detail it gathered about occupations, which in turn provides a window on to new technologies invented in the nineteenth century and the ways in which they proliferated into, and impacted, the lives of ordinary workers.

It was not questions of an academic nature, however, that underpinned the original digitisation of the census. By the twentieth century, the census had come to the attention of a growing community of family historians and so genealogy companies, including (once again) FindMyPast, took an early interest in transcribing the original CEBs for the lucrative family history market. These transcriptions, produced with the family historian in mind, were not suitable for academic research. However, a subsequent ESRC-funded project at the University of Essex cleaned, standardised and coded FMP’s transcriptions to produce a structured dataset of the kind required by researchers. It is known as Integrated Census Microdata (I-CeM).[43] I-CeM contains the records of over 180 million individuals enumerated in British censuses between 1851 and 1911, with over 100 variables for each (I-CeM lacks data for England and Wales in 1871 and Scotland in 1911). The I-CeM dataset enables academic research on the British census in a way that FMP’s original digitisation did not, and already forms the basis of a large and growing literature in the economic history tradition exploring aspects of family, demography, occupations and migration in nineteenth-century Britain.[44]

While the census provides us a point of entry into the living and working conditions of the people who experienced the industrial revolution, understanding the impact of industrialisation on the environment they inhabited is harder to capture. Our closest proxy, perhaps, is the series of Ordnance Survey maps that were begun during the Napoleonic wars. Once again, what gets mapped, and how it is represented, are political issues. Maps, like newspapers and population censuses, are human documents, socially produced with specific ends in mind. Equally, however, maps can be interrogated with questions about the physical environment and its transformation in mind. They provide a powerful visual record of the natural and built environment, and - potentially - of the changes reshaping that landscape, given that places were re-surveyed over time for revised and new editions of map series at different scales. Maps are of course being scanned by the thousands at cultural heritage institutions around the world. Our project has drawn heavily on the work of the National Library of Scotland (NLS). Since 2014, the NLS has been able to scan over 20,000 sheet maps per annum, and now has over 250,000 maps available online. The NLS have made those images freely viewable online, and the associated geo-data available for non-commercial use according to the principles of open and reproducible research. We supplemented the gaps in the NLS’s digitised OS map coverage with new digitisation. Our work to date has chiefly focused on the second edition County Series six-inch to one mile maps (1888-1914), but in the spirit of the environmental scan our early work on the maps sought to understand how the surveying process played out over time, and the impact that temporal process has on the way that we ask questions of such a dataset.[45]

The project therefore brings together multiple very large collections that present different forms of data: textual, tabular, and visual. However, despite the size, breadth and complexity of the datasets, it is important to pause a moment to recognise the narrow world from which they are mostly drawn, and the limitations they therefore present. By their nature these sources tend to accentuate the domestic, white mother-country facets of the story of Britain’s industrialisation, largely obscuring the centrality of empire and global trade to Britain’s emergence as an economic and imperial superpower in this period. Entanglement with the empire ran through all aspects of daily life in nineteenth-century Britain, from the mass consumption of imported tea and sugar, to the reliance of so many of its industries on imperial raw materials and markets, as well as the trading advantages secured through British naval power. Both Ordnance Survey maps and the decennial census were confined within the geographical and political boundaries of the nation state; Ireland was included, the empire overseas was not (though as core technologies of state power both featured as tools of colonial rule).[46]

These sources still register traces of the wider story, but they are fleeting and partial. For example, the census recorded ‘place of birth’, including when people were born overseas; maps captured place names that encode local connections with empire and global trade, as well as recording the physical infrastructure that made that trade possible; and some newspapers offered reports not just of world events, but also of other international developments that might affect the local economy (such as accounts of harvests, shipping, mineral discoveries, or overseas infrastructure projects). Newspapers can also tell us about people’s daily engagement with commodities sourced from the empire. However, the ways in which such traces are refracted through the interests and the default worldview of imperial power mean that historians need to read ‘against the grain’ of such documents. As such, most research in this space has relied on painstaking archival work, supplemented by clues drawn from local and colonial newspapers, to tell alternative, de-centred stories of industrialisation. This is how Jenny Bulstrode has reconstructed the central role that enslaved Jamaican ironworkers played in developing the innovative manufacturing techniques that transformed Britain’s wrought iron industry and helped drive its industrial revolution.[47]

Of course, the work needed to uncover alternative histories capable of decentering the metropole need not be separate from computational research. Indeed, the archival work described above can build up new digital resources over time, even if they are on a different order of magnitude. And digital methods can be developed to scale up critical modes of close reading (focusing on discovery of rare or occluded phenomena, rather than large-scale patterns). However, the emphasis of this current project on what is possible already with these large and canonical datasets means that this volume often reproduces the dominant domestic narrative. Nevertheless, we do seek in the following pages to identify how our tools and methods could be extended to other collections and materials, and how we might be able to further develop computational approaches to find those subtle signals that can help us to capture an alternate worldview.

A new research paradigm?

The foregoing summary of our data begins to suggest the breadth of skills required of a team seeking to tackle such sources, both in terms of the domain expertise in the specific historical documents, but also the kinds of computational approaches needed to leverage such data types at scale. For example, we sought out computational linguists for analysing our large-scale text corpora, researchers able to work with regular expressions for the census, and colleagues conversant with GIS and computer vision for the maps. However, assembling diverse skills does not necessarily lead to interdisciplinary research. The risk remains that people will retreat into their received ways of working. This leads to multidisciplinary, rather than interdisciplinary, work: it may draw on knowledge from different disciplines but research and its outcomes ultimately stay within traditional boundaries. By contrast, the decision to work in a fully interdisciplinary fashion has ramifications both for the research process and its outcomes - as well as to whom such outcomes speak. We sought to find collaborative agendas which drew expertise from multiple fields in order to develop research questions and methods for answering these. In this way we challenged each member of our team to work within a new and unfamiliar research paradigm.

The combination of skills brought together by the authors of this book, and the broader team that sits behind it, was enabled by the vision of two institutions. The Living with Machines project began in 2017 as a conversation between the British Library (BL) and The Alan Turing Institute based on the clear opportunities their co-location on the BL’s main St Pancras site posed for collaboration. The Turing Institute is the UK’s national institute for data science and artificial intelligence, with strong support from government research funders and the university sector. Together, the two institutions provided a complementary set of offerings: the BL has digitised millions of pages from its collections and has amassed unparalleled expertise in both the curation and dissemination of cultural heritage, while the Turing has the expertise to harness cultural heritage data to answer research questions at scale. Nonetheless, the collaboration was experimental in nature, since neither institution had previously attempted a cross-disciplinary collaboration so radical in nature.

The project was designed with both general and specific aims. The under-girding aim of the project was to develop new computational techniques to marshal the UK's rich historical collections so that new research questions could be posed. This aim was intentionally broad: the project was intended to explore the research rendered possible by more than two decades of digitisation and investment in data creation. The digitised collections of the long nineteenth century were proposed as the test bed for this more general ambition, due in part to the BL’s long-term investment in the digitisation of its newspapers. Our more specific aim, identified by the historians involved in the development of the project, was to provide new perspectives on the effects of the coming of the machine age on the lives of ordinary people. However, history is only one of the fields in which we hoped to intervene. This collaboration was not designed in the service of history alone, or indeed any of the constituent disciplines. We wanted to develop generalisable tools, code, and infrastructure that could be adapted for, and help inspire, future interdisciplinary research projects. Crucially, we aimed not only to feed through developments from the computational sciences into humanities research applications, but to reverse this trend by generating new methods from a humanities project that made interventions within the fields of data science and other computational research.

One of the ways that Living with Machines is marked apart from digital projects from as little as five years ago is its emphasis on analysis. The previous generation of projects were more often focused around data collection, and/or the curation of data into queryable databases. Our ambition was to use already created datasets to drive the cutting edge of analysis.[48] The larger team were distributed across a set of broad research agendas, which were launched initially through project Labs, which were assembled to stimulate experimentation: on the language of mechanisation, on changes that can be mapped in time and space, on the bias and representation of our sources, on community engagement, and on tools and infrastructure development. Each Lab sought to assemble a team with members drawn from different disciplines and professional backgrounds, and their first task was to design an initial research task (what we termed the ‘minimum research outcome’) that spoke back to the range of represented disciplines.[49] We wanted work that answered a concrete historical research question, that provided an opportunity to explore the question with methods from data science, and that identified a specific data set that it could leverage. Outcomes from this work could be shared as code, software or pipelines for preparing and analysing data, shared in code repositories for the benefit of others working with cultural heritage data; as new datasets (or old datasets rendered newly research-ready) published on repositories and described within data papers; as articles reporting new methods or findings, which might speak to communities in the fields of computational linguistics, GIS, computer vision or digital history, amongst others; as well as historical research outcomes taking the more traditional form of articles and books; or as all of the above.

This may sound abstract. This is partly because our research proceeded differently for each of the sub-teams: the group tasked with analysing how the language used to describe the mechanisation of work and life evolved over the nineteenth century worked in very different ways from the sub-team that developed methods to quantitatively describe the biases of digitised newspaper collections. Nevertheless, to grasp what interdisciplinary collaboration might look like in action, it is perhaps useful to describe the specific research process underpinning one of our chapters: Chapter 3, ‘Beyond the Tracks’. This chapter reconstructs the physical impact of rail infrastructure on the lived environment, both urban and rural, in order to explore residential patterns that could better illuminate Victorian perceptions of the contrasting benefits and disbenefits of living close to rail. Who sought to live close to a railway station, and to what extent did wealth allow some to maximise the network amenity benefits of the rail system whilst minimising their exposure to the noise, pollution and noxious trades often associated with proximity to dense rail infrastructure? To answer this question we needed to know the footprint of rail and its surrounding infrastructure on the ground (data we derived from maps), the precise location of passenger railway stations, and where people from different backgrounds lived at the level of the street rather than larger administrative areas (census enumerators’ data, cross referred with datasets that could geolocate this data at the street level).

The analyses for this chapter relied heavily on the contribution of post-doctoral researchers with extensive prior experience using I-CeM, the computerised dataset of Victorian and Edwardian census enumerators’ returns described above. Joshua Rhodes played the lead role on the chapter, developing a new method for geo-linking census street addresses using only open-source spatial sources (GB1900 and OS Open Roads).[50] Locating individuals and households from the census (along with all the encoded socio-economic data about them in I-CeM) more precisely on the ground is a major feat of data curation in its own right, and can unlock numerous different research questions beyond this specific application.

The ‘Beyond the Tracks’ chapter took the form it did because other post-doctoral researchers and data scientists on the project had already put a great deal of work into developing research tools designed to facilitate new forms of spatial research. Historians Katherine McDonough and Daniel Wilson worked with Kaspar Beelen and Daniel van Strien, a digital humanist and curator respectively, in collaboration with Kasra Hosseini, a Research Software Engineer, to develop an innovative computer vision tool called MapReader, which harnesses machine learning to automatically identify distinctive features such as buildings and railway infrastructure on digitised ordnance survey maps (after manual training).[51] It was this research that allowed us to identify the physical footprint of the rail network (‘railspace’) across the whole country, and therefore to measure the density of rail infrastructure at the micro-level. This is very different from more conventional GIS (Geographical Information System) methods that focus exclusively on the network-amenity dimension of railways by reconstructing them as vectors connecting points on a map, rather than as physical entities transforming people’s lived environments in very uneven ways depending on the density of rail and its impact on residential space. Separately, Mariona Coll Ardanuy, Giorgia Tolfo and others worked to convert an important account of historic passenger railway stations, Michael Quick’s Railway Passenger Stations in Great Britain: a Chronology,[52] into a machine readable format, which formed the basis of our new dataset, StopsGB: Structured Timeline of Passenger Stations in Great Britain.[53]

The ‘Beyond the Tracks’ chapter was therefore conceived as a conscious attempt to bring this collection of individuals, skills, and new tools into dialogue with each other to ask how far late-Victorian residential patterns reflected the influence of rail as both amenity and disamenity. Did workers in certain occupational groups tend to live close to passenger stations? Were indicators of wealth such as live-in servant-keeping and occupation associated with a disinclination to live close to very dense rail infrastructure? Who did live closest to dense rail, and were these workers in the poorest groups, or perhaps those associated with trades that tended to be located close by dense rail? For a long time this research was known as the ‘convergence experiment’; a term which in many ways embodies the ethos of the project, and the way in which it sought to bring together not just very different types of sources, but also researchers with very different skill sets. Some of the key findings from this experiment are discussed in chapter 3.

This language of convergence is also important for thinking about what it means to credit all the labour and expertise that goes into developing such research, with all its varied inputs and dependencies. Our model of authorship in this book follows our policy on the project, to credit all parts of the workflow in our publications through authorship or citation of preceding outputs.[54] If no preceding publications exist (e.g. on a tool or method development), everyone involved in the workflow is cited as an author as long as a substantive input has been made. In that spirit we have adapted our own version of the CRediT Taxonomy (a high-level taxonomy, including fourteen roles, that can be used to represent the roles typically played by contributors to scientific scholarly output), tailored to the kinds of labour undertaken in our project. We seek to credit conceptualisation, methodology, implementation, reproducibility, interpretation and analysis, data curation, software, visualisation, writing, or the labour of care undertaken through actively overseeing and managing that part of the project (where it has materially helped the publication).[55] This taxonomy informs the contributor statements we write to accompany our formal publications, but we have also previously experimented with a ‘film-credit’ cover page for versions of our outputs deposited in repositories.[56] We seek to be similarly generous in the way we credit the labour behind our datasets and code repositories.

This context is important to understand when encountering the author list and contents page for this current book. As we discuss further in our authorship and credit statement, while the contents page has some of the appearance of an edited collection, the chapters are the product of these converging experiments. The long author lists show the breadth of expertise brought to bear on each chapter; and the recurrence of contributor names across multiple chapters shows the ways that different interests create threads of connection between the chapters. Those threads are evidence of collaboration across the project as well as within more discrete experiments. As such this volume might more accurately be defined as a multi-authored book than an edited collection.

Writing the Industrial Revolution

As can be seen, there are multiple contexts - intellectual, disciplinary, practical - for this book and its constituent chapters. Those discussed above have most bearing on the practice of historians working with data, computational methods, and collaborators from computational disciplines - in other words historians living with machines. But our primary intellectual question is how ordinary people in the long nineteenth century learned to live with machines, which places our work within the more specific context of the accrued historiography surrounding the so-called ‘Industrial Revolution’. What do historians mean when they talk of Britain undergoing the first ‘Industrial Revolution’, and what have been the main areas of debate over its timing, causes and consequences? It is in our contribution to this ongoing scholarship that we can assess whether the ways of working that we have championed in this book, and our wider project, can truly extend our understanding of the changes wrought by the coming of the first machine age.

In the anglophone world, the popularisation of the concept of the ‘industrial revolution’ as a distinct epoch of social and economic transformation, beginning in Britain and subsequently spreading out across the world, is usually attributed to the economic historian Arnold Toynbee (1852-1883). Toynbee had made a stir at Oxford with his lectures on Britain’s industrial transformation, and, when he died suddenly, aged only 30, his friends and colleagues published his notes and papers as Lectures on the Industrial Revolution in England.[57] Toynbee wasn’t the first person to embrace the ‘revolution’ metaphor to dramatise industrialisation; broadly the same idea already had wide currency in Europe (not least through the greater awareness of the writings of Marx and Engels), but in Britain it only became an established feature of public discourse from the 1880s.[58] Toynbee intended the term to denote a very different type of ‘revolution’ to that advocated by Marx and Engels. He presented the ‘industrial revolution’ as the inter-connected, economic corollary of the political revolution that popular Victorian ‘Whig’ historians had long associated with the restoration of Parliamentary sovereignty in the (supposedly) peaceful revolution of 1688. It was, in short, a key component of Britain’s distinctive, non-revolutionary route to social transformation that should be contrasted with the violent, disorderly model of France in 1789.[59]

But no sooner had the expression begun to gain a degree of popular currency, than the academic community began to question the existence of this supposed ‘industrial revolution’. In the 1930s and 1940s a new generation of scholars began asking whether the industrial revolution was really so ‘revolutionary’, or indeed so located in ‘industry’, as Toynbee had confidently declared, and throughout the first half of the twentieth century the emphasis was firmly on the gradual rather than revolutionary nature of nineteenth-century economic change.[60] Of course, the writing of history rarely stands still. No sooner had scholars agreed that industrialisation proceeded in a gradual and piecemeal fashion, than interpretative trends changed once again, and the fast-paced ‘revolution’ that Toynbee had postulated seemed to be back in vogue once more. The economic theorist Walt Rostow’s 1960 book The Stages of Economic Growth provided a hugely influential model of industrialisation that formed a sharp contrast to the more gradualist models that had held sway in the inter-war period. Here was an altogether punchier story, with the industrial revolution marking a watershed in not just British but in world history. For Rostow, industrialisation was the key to economic ‘take off’, and the promise of sustained economic growth.[61] But in little time, gradualism gained traction once more, at least for the British story, thanks to Nick Crafts, who in the 1980s recalculated economic growth rates for the period 1750-1830 and concluded that growth was much slower than Rostow’s concept of rapid take-off permitted, sparking yet another round of revisionism concerning the timing, pace and nature of British industrialisation.[62] Given the paucity and incompleteness of economic data for this period, no definitive answer is possible, but it is important to recognise that even those most pessimistic about Britain’s economic growth do not argue that little changed between the mid-eighteenth and mid-nineteenth centuries; that there was no ‘industrial revolution’. On the contrary, their arguments are about why Britain experienced less dramatic improvements in productivity and gross domestic production than countries that industrialised later, not whether its economy and society underwent radical restructuring across these decades.[63]

For all the controversy surrounding the term the ‘industrial revolution’, the concept has proved remarkably resilient, and for many decades there has been widespread agreement between academic historians that something happened in Britain in the years 1760-1860 that was so fundamental in nature that some form of special term is not only justified, but needed. Where there has been less agreement is over the question of why Britain emerged as the first industrial nation. For much of the twentieth century, the answer seemed to lie in technology. Toynbee had made much of the role of inventors in shaping Britain’s precocious industrialisation, and this emphasis on key inventions remains a staple of popular histories of the Industrial Revolution. Hargreaves’ spinning jenny, Arkwright’s spinning throstle and carding engine, Crompton’s mule, Cartwright’s power loom, and James Watt’s steam-engine are regularly cited as the cause of Britain's elevation to its unique (though ultimately temporary) status as the world’s industrial powerhouse.[64] In turn, this emphasis on inventive genius led historians to focus on those elements of British culture, education, and political and legal frameworks that had supposedly predisposed its people to ingenious and entrepreneurial activity.[65]

As the title of our book, Living with Machines, indicates, we concur that the development and proliferation of new forms of technology in the nineteenth century is an important historical event. But the complex forces that drove industrialisation should not be reduced to a simple narrative of technological determinism. Inevitably, new technologies played an important role in transforming Britain into an industrial society, but the world had already seen the emergence of some ground-breaking new technologies – gunpowder, mechanical clocks, the printing press. None of these had ushered in an industrial revolution. Indeed, even Britain's leading sector – the textile industry – had mechanised long before the eighteenth century. Spinning had been partially mechanised by the invention of the spinning wheel in the fourteenth century; treadle looms were even older. Manufacturers had long sought to invent and improve machinery in order to find more efficient ways of creating goods that could be sold more cheaply and used more widely. These inventions served to enrich manufacturers and nations, but they had never previously amounted to anything that could be properly described as an ‘industrial revolution’. The question remains, what changed in the eighteenth century?

Alongside technology, historians have drawn attention to many other possible causes of Britain’s industrial revolution. Some point to the importance of agriculture in freeing labourers to work in industry;[66] of coal in powering the new industrial juggernaut and breaking the demographic constraints of an organic economy rooted in nature;[67] of the colonies in providing Britain with raw materials;[68] of slavery in providing the capital needed by industrialists;[69] of domestic industry and the labour provided by women and children;[70] and of rising demand for new goods.[71] Indeed, throughout the twentieth century, historians continued to argue about every aspect of the industrial revolution, producing in the end a complex, multi-layered debate with very little agreement.

At the same time as some scholars argued over the origins and principal causes of the industrial revolution, so others argued over its consequences for people and, especially more recently, the environment. The question of the social impact of industrialisation was baked in from the time of Toynbee in the late nineteenth century.[72] Toynbee himself was a social reformer seeking to temper laissez faire liberalism through an expanded role for an ameliorative state. Other early writers brought a similar social agenda to their historical enquiries about the social impact of industrialisation, building on a literary tradition that had long presented industrialisation as a source of immiseration.[73] In the twentieth century, economic historians such as T.S. Ashton pushed back against this thesis, insisting that real wages rose substantially for the majority of the population during the industrial revolution.[74] The ‘standard of living’ debate increasingly focused on when real household incomes began to rise for the bulk of the population, and whether such monetary gains could be argued to have brought a better quality of life for the urban, industrial population. Some had no doubt not only that industrialisation brought higher real wages, but that workers viewed such wages as providing adequate compensation for the ‘urban disamenities’ associated with industrialisation such as pollution, disease and, for many, reduced life expectancy.[75] Critics point out that adult male wage rates cannot tell the whole story, and that other data, such as per capita consumption trends, point to little or no improvement in working-class living standards before 1850.[76]

Again, there is no easy way to resolve these controversies. Much hinges on the question of how workers themselves experienced the transition from agricultural to industrial production, and from rural to urban life. We know that Britain’s emerging industries were highly regionalised, and that, in turn, wage rates and economic fortunes varied radically across the country. Bold statements about average living standards necessarily flatten this regional complexity, and ignore the extent to which old and new ways of living with machines existed side-by-side even in the same localities.[77] Social historians have sought to humanise the story that can be told through economic and demographic statistics, by turning to the accounts that working people themselves generated about the experience of industrialisation. E.P. Thompson’s seminal book The Making of the English Working Class draws on a wide range of documentary and statistical sources, but at its heart is a sustained engagement with working-class radical print culture as an expression of the world-view of those oppressed and enslaved by the new industrial system.[78] Working-class autobiography has also proved a rich source for historians keen to restore working-class subjectivity to the history of the Industrial Revolution. The shadow of Thompson’s study loomed large over early work in this vein, which tended to foreground radical politics and the ‘birth and early struggles of the working class’.[79]

More recently, politics and class-making have been decentred in favour of a greater emphasis on how industrialisation altered the fabric of working-class life and culture. Regenia Gagnier has focused specifically on shifting class-based models of selfhood in industrialising Britain, while Emma Griffin has sought to recover how the social changes associated with industrialisation could be understood as a form of personal liberation by many born into dependent poverty in the countryside.[80] Such accounts undoubtedly help to flesh out what industrialisation meant for the people who lived through it, but inevitably they too can be charged with flattening the diversity of human experience by privileging some voices over others. Did the working-class writers of radical pamphlets and newspapers really share the world-view of the men and women they claimed to speak for? Similarly, does reconstructing the subjectivity of a few hundred autobiographers mostly writing late in life allow us to make definitive statements about how most people experienced the Industrial Revolution? Sadly, there are no easy answers when our goal is to reconstruct the everyday experiences of people who have mostly left no written record beyond the official markers of birth, marriage and death.

Nor do we claim that Living with Machines can provide ready-made answers to issues which historians have argued over since the nineteenth century. Our claim is more modest, namely that the tools and methods developed here have the potential to open up new perspectives on the radical social transformations that everyone agrees remade British society between 1750 and 1900. We need to proceed with caution, constantly interrogating the provenance and partiality of the sources available for digital research. But as the chapters that follow demonstrate, there is now enormous potential to combine analogue and digital research methods to unlock facets of the everyday experience of industrialisation. Close and distant reading techniques can now be combined to recover minority voices currently buried in the billions of words of digitised newsprint. Interconnecting historic maps, street-level census data and other sources of social and economic information, as we do in Chapter 3, can help us reconstruct everyday experience in all its complexity whilst still identifying meaningful, large-scale patterns. In short, Living with Machines is a first step on an important journey to bring history and computational methods together in new ways in order to help make sense of the past.

  1. For instance, ‘Goldman Sachs Report: ChatGPT Could Impact 300 million Jobs’, Open Data Science, 5 April 2023 at: https://odsc.medium.com/goldman-sachs-report-chatgpt-could-impact-300-million-jobs-4f375b30e606; ‘Pause Giant AI Experiments: an Open Letter’, futureoflife.org, 22 March 2023 at: https://futureoflife.org/open-letter/pause-giant-ai-experiments/. ↑

  2. J.I. Gershuny and R.E. Pahl, ‘Work outside employment’, New Universities Quarterly, 34, 1 (1979): 120-35, and Idem., ‘Britain in the decade of the three economies’, New Society, 3 Jan. 1980, pp. 7-9. ↑

  3. Jefferson Cowie, Stayin’ Alive: the 1970s and the Last Days of the Working Class (New York, 2010); Sherry Lee Linkon, The Half-Life of Deindustrialization: Working-Class Writing About Economic Restructuring (Ann Arbor, MI, 2018); Lutz Raphael, Beyond Coal and Steel: a Social History of Western Europe after the Boom, trans. Kate Tranter (Cambridge, 2023). ↑

  4. For a sober analysis of the moment see Emily M. Bender, ‘Policy makers: Please don’t fall for the distractions of #AIhype’, Medium, https://medium.com/@emilymenonbender/policy-makers-please-dont-fall-for-the-distractions-of-aihype-e03fa80ddbf1. ↑

  5. Klaus Schwab, ‘The Fourth Industrial Revolution: What It Means, How to Respond’, World Economic Forum, 14 Jan. 2016 at: https://www.weforum.org/agenda/2016/01/the-fourth-industrial-revolution-what-it-means-and-how-to-respond/. ↑

  6. William Wordsworth, ‘Outrage done to Nature’, from The Excursion (1814); William Blake, ‘And did those feet in ancient time’, from Milton: a Poem (1804-8). ↑

  7. Benjamin Disraeli, Sybil, Or the Two Nations (London, 1845); Elizabeth Gaskell, Mary Barton (1848); Charles Dickens, Hard Times (London, 1854); George Gissing, Demos (London, 1886). ↑

  8. John Stuart Mill, Principles of Political Economy, ed. W. J. Ashley (London, 1909), p.751. For the later nineteenth century, see: J. A. Hobson, ‘The Influence of Machinery upon Employment’, Political Science Quarterly 8/1 (1893), 97–123; J. Shield Nicholson, The Effects of Machinery on Wages (London, 1892). ↑

  9. John H. Clapham, An Economic History of Modern Britain: The Early Railway Age, 1820-1850 (Cambridge, 1926); J.L. Hammond and Barbara Hammond, The Skilled Labourer (London, 1919). ↑

  10. E. P. Thompson, The Making of the English Working Class (Harmondsworth, 1976). ↑

  11. N. F. R. Crafts, British Economic Growth during the Industrial Revolution (Oxford, 1985). ↑

  12. Herbert L. Sussman, Victorians and the Machine: The Literary Response to Technology (Cambridge, MA, 1968); B. Rieger, Technology and the Culture of Modernity in Britain and Germany, 1890-1945 (Cambridge, 2005); Tamara Ketabgian, The Lives of Machines: The Industrial Imaginary in Victorian Literature and Culture (Ann Arbor, MI, 2011); Wolfgang Schivelbusch, The Railway Journey: The Industrialization of Time and Space in the Nineteenth Century (Berkeley, CA, 2014). ↑

  13. James Vernon, Distant Strangers: How Britain Became Modern (Berkeley, CA, 2014). ↑

  14. Mary Poovey, A History of the Modern Fact: Problems of Knowledge in the Sciences of Wealth and Society (Chicago, 1998); Patrick Joyce, The State of Freedom: a Social History of the British State since 1800 (Cambridge, 2013). For a more sceptical assessment of the knowing state see Edward Higgs, ‘The Rise of the Information State: the Development of Central State Surveillance of the Citizen in England, 1500-2000’, Journal of Historical Sociology, 14/2 (2001), 175-97. ↑

  15. Alan J. Lee, The Origins of the Popular Press, 1855-1914 (London, 1976), 290-1. ↑

  16. Ruth Ahnert, Emma Griffin, Mia Ridge, and Giorgia Tolfo, Collaborative Historical Research in the Age of Big Data: Lessons from an Interdisciplinary Project (Cambridge, 2023). ↑

  17. Willard McCarty, ‘Humanities Computing’, in Encyclopedia of Library and Information Science, 2nd edn., ed. Miriam Drake (New York, 2003), pp. 1224-35. On Miles and Busa, see also Rachel Sagner Buurma and L. Heffernan, ‘Search and replace: Josephine Miles and the origins of distant reading’, Modernism/modernity 3/1 (2018); Melissa Terras and Julianne Nyhan, ‘Father Busa’s female punch card operatives’, Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein (Minneapolis, 2016), pp. 60–65; Julianne Nyhan and Marco Passarotti, ed. One Origin of Digital Humanities: Fr Roberto Busa in his own Words (Cham, 2019); and Julianne Nyhan, Hidden and Devalued Labour in the Digital Humanities: On the Index Thomisticus Project 1954-67 (London, 2022). ↑

  18. See for example, Melissa Terras, Julianne Nyhan, and Edward Vanhoutte, eds., Defining Digital Humanities: A Reader (London, 2013); and Matthew K. Gold and Lauren F. Klein, eds., Debates in the Digital Humanities (Minneapolis, 2019). ↑

  19. See for example Jen Boyle, ‘Treading the digital turn: mediated form and historical meaning’, Journal for Early Modern Cultural Studies, 13/4 (2013) 79-90; and Laura Estill, Diane K. Jakacki and Michael Ullyot, Early Modern Studies after the Digital Turn (Toronto, 2016). ↑

  20. Daniel J. Cohen et al., ‘Interchange: The promise of digital history’, Journal of American History, 95/2 (2008), 452-491. ↑

  21. Tim Hitchcock, ‘Confronting the Digital: Or How Academic History Writing Lost the Plot’, Cultural and Social History, 10/1 (2013) 9-23. ↑

  22. One thinks of: Journal of Digital History; and Digital Humanities. ↑

  23. Jo Guldi, ‘The Trouble with Text Mining and Why Some Projects Take a Long Time, and Future Projects Might Take Less Time’, ‘Historical Research in the Digital Age’, Part Six [27 June 2023], Royal Historical Society Blogpost series, 14 March 2023, citing Luke Blaxill, The War of Words: The Language of British Elections, 1880–1914 (Woodbridge, 2020). The three journals are Current Research in Digital History, The Journal of History, Culture and Modernity, and The Journal of Digital History. See also Jo Guldi, The Dangerous Art of Text Mining: A Methodology for Digital History (Cambridge, 2023), p. 10. ↑

  24. Luke Blaxill, ‘Why do historians ignore digital analysis? bring on the Luddites’, Political Quarterly, 94/2 (2023): 279-89. ↑

  25. Robert Fogel and Stanley Engerman, Time on the Cross: the Economics of American Slavery (Boston, 1974). ↑

  26. Francesco Boldizzoni, The Poverty of Clio: Resurrecting Economic History (Princeton, 2011) 15–17. ↑

  27. Claude Diebolt and Michael Haupert, ‘Clio’s Contributions to Economics and History’, Revue d’économie Politique, 126/5 (2016), 971–89; Robert Skidelsky, What’s Wrong with Economics?: A Primer for the Perplexed (New Haven, 2020). ↑

  28. Naomi Lamoreaux, ‘The Future of Economic History Must Be Interdisciplinary’, Journal of Economic History, 75/4 (2015), 1251–57; Jari Eloranta, et al. ‘Towards Big Data: Digitising Economic and Business History’ in Digital Histories: Emergent Approaches within the New Digital History, ed. Mats Fridlund et al. (Helsinki, 2020), pp. 45–68. ↑

  29. Jo Guldi and David Armitage, The History Manifesto (Cambridge, 2014). ↑

  30. Ted Underwood, ‘Dear humanists: fear not the digital revolution advances in computing will benefit traditional scholarship — not compete with it’, The Chronicle of Higher Education, March 27, 2019 <https://www.chronicle.com/article/dear-humanists-fear-not-the-digital-revolution/> ↑

  31. Jean-Baptiste Michel et al., ‘Quantitative analysis of culture using millions of digitised books’, Science, 331 (2011), 176–82. ↑

  32. E. A. Pechenick, C. M. Danforth, and P. S. Dodds, ‘Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution’ PLoS ONE 10/10 (2015). ↑

  33. Thomas T. Hills et al., ‘Historical analysis of national subjective wellbeing using millions of digitized books’, Nature Human Behaviour, 3/12 (2019), 1271–5. ↑

  34. Ibid., 1274. ↑

  35. ‘Scientists pinpoint the year that Britons were happiest’, Independent, 15 Oct. 2019; ‘Victorian times were happiest, study of national mood finds’, The Times, 14 Oct. 2019. ↑

  36. See Guldi, Dangerous Art of Text Mining. ↑

  37. Kaspar Beelen et al, ‘Bias and representativeness in digitized newspaper collections: Introducing the environmental scan’, Digital Scholarship in the Humanities, 38/1 (2023), 1–22, https://doi.org/10.1093/llc/fqac037; ES2***. ↑

  38. Andrew Hobbs, A Fleet Street in Every Town: The Provincial Press in England, 1855-1900 (Cambridge, 2018). ↑

  39. Statistics true as of 21 June 2023. ↑

  40. Although some prior publications used this data, analysis was done on the FindMyPast servers, and the researchers were not given direct access to the data assets. See ADD CITATION ↑

  41. JISC report @ http://www.jisc.ac.uk/whatwedo/programmes/digitisation/bln.aspx [23 June 2023]; Paul Fyfe, ‘An archaeology of Victorian newspapers’, Victorian Periodicals Review, 49/4 (2016), 546-77. ↑

  42. Giorgia Tolfo et al, ‘Hunting for Treasure: Living with Machines and the British Library Collection’ in Digitised Newspapers? A New Eldorado for Historians? Reflections on Tools, Methods and Epistemology, ed. Andreas Fickers, Valérie Schafer, Sean Takats and Gerben Zaagsma (Oldenburg, 2023), https://doi.org/10.1515/9783110729214-002. ↑

  43. I-CeM has been available since 2014, updated in 2022, and with an upcoming release in 2023 to include 1921 census data. K. Schürer and E. Higgs, Integrated Census Microdata (I-CeM), 1851-1911. [data collection]. UK Data Service. SN: 7481, 2022, DOI: 10.5255/UKDA-SN-7481-2; K. Schürer and E. Higgs, Integrated Census Microdata (I-CeM) Names and Addresses, 1851-1911: Special Licence Access. [data collection]. 2nd Edition. UK Data Service. SN: 7856, 2022, DOI: 10.5255/UKDA-SN-7856-2. ↑

  44. See, for example: Leigh Shaw-Taylor and E. A. Wrigley, ‘Occupational structure and population change’, in The Cambridge Economic History of Modern Britain. Volume 1: 1700-1870, ed. Roderick Floud, Jane Humphries and Paul A. Johnson (Cambridge, 2014), pp. 53-88; Leigh Shaw-Taylor, ‘Diverse experiences: The geography of adult female employment in England and the 1851 census’, in Women’s Work in Industrial England: Regional and Local Perspectives, ed. Nigel Goose (Hatfield, 2007), 29-50; K. Schürer et al., ‘Household and family structure in England and Wales, 1851–1911’, Continuity and Change, 33/3 (2018), 365-411; Kevin Schürer and Joe Day, ‘Migration to London and the development of the north–south divide, 1851–1911’, Social History, 44/1 (2019) 26-56. ↑

  45. See Olivia Vane’s Observable notebook for Macromap <https://observablehq.com/@oliviafvane/macromap>. This work formed the basis of this article: ‘How intrepid Victorian surveyors mapped the length and breadth of Britain’, The Economist, <https://www.economist.com/interactive/britain/2023/04/06/how-intrepid-victorian-surveyors-mapped-the-length-and-breadth-of-britain> ↑

  46. A. J. Christopher, ‘The quest for a census of the British Empire c.1840-1940’, Journal of Historical Geography, 34/2 (2008), 268-285; Alexander Kent et al (eds.), Mapping Empires: Colonial Cartographies of Land and Sea: 7th International Symposium of the ICA Commission on the History of Cartography, 2018, (New York, 2020). ↑

  47. Jenny Bulstrode, ‘Black metallurgists and the making of the Industrial Revolution’, History and Technology (2023), advanced access, DOI: 10.1080/07341512.2023.2220991. ↑

  48. But even where historical data has already been digitised, it must be first acquired and then ‘wrangled’ (converted into a usable form) before analysis can begin. This starts with the foundational labour of obtaining data, including contractual work around rights management and data sharing agreements. In the UK, the lack of public funds to support the digitisation of our shared cultural heritage has led institutions to partner with commercial companies to digitise their assets. This in turn creates commercial and legal sensitivities that impose technical barriers to working in an exploratory and iterative fashion with digitised data. Both of these tasks require special skill sets and are complex and time-consuming. We have talked more practically about these challenges in, Ahnert et al., Collaborative Historical Research in the Age of Big Data, chapter 2, and for that reason do not intend to revisit them here. For the present, we will simply repeat that despite two decades of investment, both public and private, in the digitisation of the content of our national libraries and archives, our experience has been that these assets are difficult to access, rarely research-ready, and surrounded by a mass of restrictions that have implications for reproducibility and the sharing of outputs. ↑

  49. On the MRO see Giorgia Tolfo et al., ‘The Minimum Research Outcome: a mechanism for generating and managing projects in labs’, in Digital Humanities and Laboratories: Perspectives on Knowledge, Infrastructure and Culture, ed. Urszula Pawlicka-Deger and Christopher Thomson (Abingdon, 2023). ↑

  50. See https://www.visionofbritain.org.uk/data/, and https://www.ordnancesurvey.co.uk/products/os-open-roads. ↑

  51. The code for this tool is available at https://github.com/Living-with-machines/MapReader. See also Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, and Katherine McDonough, ‘MapReader: a computer vision pipeline for the semantic exploration of maps at scale’, Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities (2022), 8–19, https://doi.org/10.1145/3557919.3565812; and Kasra Hosseini, Katherine McDonough, Daniel van Strien, Olivia Vane, Daniel C.S. Wilson, ‘Maps of a Nation? The Digitized Ordnance Survey for New Historical Research’, Journal of Victorian Culture, 26:2 (2021), pp. 284–299, https://doi.org/10.1093/jvcult/vcab009. ↑

  52. https://rchs.org.uk/railway-passenger-stations-in-great-britain-a-chronology/. ↑

  53. https://github.com/Living-with-machines/station-to-station. ↑

  54. For further discussion see Ahnert et al, Collaborative Historical Research, chapter 4. ↑

  55. https://credit.niso.org/. ↑

  56. For an example of this, see the Arxiv version of one of our earliest publications: https://arxiv.org/pdf/2005.11140.pdf, discussed in a blogpost by Federico Nanni, ‘Highlighting Authors’ Contributions’, https://livingwithmachines.ac.uk/highlighting-authors-contributions-and-interdisciplinary-collaborations-in-living-with-machines/. ↑

  57. Arnold Toynbee, Lectures on the Industrial Revolution in England; Popular Addresses, Notes and Other Fragments together with a short memoir by B. Jowett (London, 1884). ↑

  58. A simple Google Ngram search confined to British English sources records an increase in occurrences from c.1880, with a continuous increase in the frequency of usage from 1901 to 1944. ↑

  59. Alon Kadish, ‘Arnold Toynbee (1852-1883)’ OEDO, [2004]. ↑

  60. Joseph Schumpeter, Business Cycles i. (New York, 1939; repr. Philadelphia, 1989), pp.67-8; Clapham, Economic History, 143-5; E. M. Carus-Wilson, ‘An industrial revolution of the thirteenth century’, Economic History Review, 11 (1941), 39-60; J. U. Nef, The Rise of the British Coal Industry, i. (London, 1932), pp.165-89; Idem., ‘The progress of technology and the growth of large-scale industry in Great Britain, 1540-1640’, Economic History Review, 5/1 (1934), p.24. ↑

  61. Walt Whitman Rostow, The Stages of Economic Growth: A Non-Communist Manifesto (Cambridge, 1960). See also: Phyllis Deane, The First Industrial Revolution (Cambridge, 1967) pp.117; Phyllis Deane and W. A. Cole, British Economic Growth, 1688-1959: Trends and Structure (Cambridge, 1962). ↑

  62. N. F. R. Crafts, British Economic Growth during the Industrial Revolution (Oxford, 1985). ↑

  63. For a useful summary of these arguments see, M.J. Daunton, Progress and Poverty: An Economic and Social History of Britain 1700-1850 (Oxford, 1995), 125-36. ↑

  64. For instance, Historyhit.com, https://www.historyhit.com/key-figures-of-the-british-industrial-revolution/; schoolshistory.org.uk, https://schoolshistory.org.uk/topics/british-history/industrial-revolution/inventors-and-inventions/. ↑

  65. For an example of such an approach, see Eric Jones, The European Miracle: Environments, Economies and Geopolitics in the History of Europe and Asia. 3rd ed. (Cambridge, 2003). ↑

  66. Patrick O’Brien, ‘Path dependency, or why Britain became an industrialized and urbanized economy long before France’, Economic History Review, 49/2 (1996), 213-49; R.C. Allen, The British Industrial Revolution in Global Perspective (Cambridge, 2009), chap. 3. ↑

  67. E. A. Wrigley, Continuity, Chance and Change: The Character of the Industrial Revolution in England (Cambridge, 1988). ↑

  68. Ronald Findlay and Kevin H. O’Rourke, Power and Plenty: Trade, War, and the World Economy in the Second Millennium (Princeton, 2008). ↑

  69. Eric Williams, Capitalism and Slavery (London, 1944; 2nd edn., 1964); Joseph E. Inikori, Africans and the Industrial Revolution in England: a Study in International Trade and Economic Development (Cambridge, 2002); Sven Beckert, Empire of Cotton: A Global History (London, 2014). ↑

  70. Maxine Berg, ‘What difference did women’s work make to the Industrial Revolution?’, History Workshop Journal, 35/1 (1993), 22–44. ↑

  71. J. De Vries, The Industrious Revolution: Consumer Behaviour and the Household Economy (Cambridge, 2008); Maxine Berg, Luxury and Pleasure in Eighteenth-Century Britain (Oxford, 2005). ↑

  72. Toynbee, Lectures. ↑

  73. Sidney Webb and Beatrice Webb, Industrial Democracy. 2 vols. (London, 1898); J.L. Hammond and Barbara Hammond, The Skilled Labourer (London, 1919). ↑

  74. T.S. Ashton, The Industrial Revolution (Oxford, 1948). ↑

  75. J.G. Williamson, ‘Was the Industrial Revolution worth it? Disamenities and death in nineteenth-century British towns’, Explorations in Economic History, 19/3 (1982): 221-45. ↑

  76. Sara Horrell and Jane Humphries, ‘Old questions, new data, and alternative perspectives: Families’ living standards in the Industrial Revolution’, Journal of Economic History, 52/4 (1992): 849-80; Joel Mokyr, ‘Is there still life in the pessimist case? Consumption during the industrial revolution’, Journal of Economic History, 48/1 (1988): 69-92. ↑

  77. Pat Hudson, ed. Regions and Industries: a Perspective on the Industrial Revolution in Britain (Cambridge, 1989); Maxine Berg and Pat Hudson, ‘Rehabilitating the industrial revolution’, Economic History Review, 45 (1992): 24-50. ↑

  78. Thompson, English Working Class. ↑

  79. David Vincent, Bread, Knowledge and Freedom: A Study of Nineteenth-Century Working Class Autobiography (London, 1981), p. 3. ↑

  80. Regenia Gagnier, Subjectivities: a History of Self-representation in Britain, 1832-1920 (Oxford, 1991); Emma Griffin, Liberty’s Dawn: a People’s History of the Industrial Revolution (New Haven, 2013). See also: Jamie L. Bronstein, The Happiness of the British Working Class (Stanford, 2023). ↑
