Exploring Digital Cultural Heritage: Chapter 2 Access

Chapter 2 Access

Opening up and accessing digital cultural heritage collections

Paul Koerbin issues a challenge for everyone concerned with the preservation of digital cultural heritage: ‘Without access preservation is little more than a costly and meaningless storage burden’ (Koerbin 2017, 195). This is, of course, a deliberate oversimplification; there are many kinds of cultural heritage – digital and otherwise – that cannot or should not be made accessible to everyone. Karolina Prażmowska, for example, notes that the digitisation of and access to Indigenous Peoples’ Traditional Cultural Expressions ‘can significantly increase opportunities for cultural appropriation and commodification of [their] cultural heritage’ (Prażmowska 2020, 120). Archives spend considerable time trying to check for and remove sensitive or protected information from the digital materials that are deposited with or acquired by them, a process which becomes ever more difficult at scale (see, for example, The National Archives of the UK 2016). Without such sensitivity review, much archival material cannot responsibly be opened up for public access, or at least not without long periods of initial closure. But Koerbin does remind us of the important connection between the digital and expectations of access. The possibility of access to cultural heritage is greatly expanded when it takes digital form, and it is no longer necessary to visit an object, manuscript or collection in person to experience something of it. Certain kinds of born-digital cultural heritage, for example non-subscription websites, may always have been open and accessible to a global audience (although paradoxically they may become closed or subject to restricted access when archived) (Winters 2020).

Narratives around openness and access – of open access to knowledge – have long accompanied the growth of the digital. The formal concept of open access has been part of the research landscape for more than twenty years, with the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities serving as an important milestone in 2003. The language used is again strikingly wide ranging:

Establishing open access as a worthwhile procedure ideally requires the active commitment of each and every individual producer of scientific knowledge and holder of cultural heritage. Open access contributions include original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia material. (Max-Planck-Gesellschaft 2003)

The Declaration is explicitly not only concerned with research produced in universities but also engages with cultural heritage and ‘source materials’ from the outset. Subsequently, discussions of open access, science and knowledge tended to be dominated by protocols around the sharing of the traditional outputs of scholarly research, for example journal articles. More recently, however, there has been a shift in attention to data, and it is here that digital cultural heritage once again comes into focus. Data means many things to many people, but for researchers and practitioners in the arts and humanities, it is primarily constituted from cultural heritage materials. An important concept here is the idea of collections as data, ‘which raises the question of what it might mean to treat digitised and born digital collections as data rather than simple surrogates of physical objects or static representations of digital experience’ (Padilla 2018, 296). The datafication of cultural heritage becomes possible when it is digitised or when its original form is digital.

Technological advancements towards opening up access

A key inflection point in relation to library and archival material was the advancement in technology that allowed projects and institutions not just to image documents and manuscripts but to make them machine readable through the application of Optical Character Recognition (OCR). A technology that is now routinely incorporated into smartphones transformed research using newspapers, printed books and pamphlets, the typed records of administration, and many other printed forms of expression. The title of an edited volume published in 2023 – Digitised Newspapers: A New Eldorado for Historians? – is redolent of the excitement that this has engendered for a very well-established and well-studied form of cultural heritage (Bunout, Ehrmann and Clavert 2023). A second inflection point is the recent maturity of Handwritten Text Recognition (HTR) software, which is opening up handwritten letters, books and manuscripts to analysis in a similar way. The widespread adoption of the Transkribus platform brings the benefits of datafication to the digitised cultural heritage of the medieval and early modern periods in particular.1

One of the earliest initiatives to experiment with HTR for historical document images was the tranScriptorium project, funded under the European Union’s Seventh Framework Programme. It ran for three years from 1 January 2013 and tested its methods on two English-language corpora: the transcribed papers of the philosopher Jeremy Bentham (1748–1832), held at University College London; and the publicly available subset of Eighteenth Century Collections Online (Tanha, Romero and de Does 2013). The Bentham Papers were a key source as they had been the subject of a ground-breaking crowdsourcing project, Transcribe Bentham, in 2010–11. Crowdsourcing, described as ‘a form of digitally-enabled participation that promises deeper, more engaged relationships with the public via meaningful tasks with cultural heritage collections’ (Ridge et al. 2021), has been a key mechanism for opening up access not just to cultural heritage materials themselves but to the contexts within which they sit, the processes that lead to their publication and the narratives that are developed about them. Citizen scientists can transcribe, describe, annotate, enhance and even generate new cultural heritage data, for example by contributing their own memories and knowledge to collections. Crowdsourcing also serves to make cultural heritage more accessible even to those who do not participate directly in citizen science activities – or may not even know that their experience has been mediated by ‘collective wisdom’ (Ridge et al. 2021). 
The ‘Tag along with Adler’ project, for example, engaged volunteers (via the Zooniverse platform) to add their own tags to images from the collections at the Adler Planetarium in Chicago.2 The aim was to capture the kinds of terms that non-specialists might search for rather than the more formalised language used in metadata created by museum staff and cataloguers, thereby making individual items in collections more discoverable – or discoverable in different ways (BrodeFrank 2024). Crowdsourcing is just one element of a more participatory culture that has developed in relation to cultural heritage, one which values different forms of knowledge and experience. Many aspects of this participatory culture have been enabled by digital technologies, and in particular the web and social media (see, for example, Giaccardi 2012).
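The discoverability argument above can be illustrated with a minimal sketch. It assumes a toy search index in which volunteer tags are simply merged with cataloguers' terms; the item identifiers, terms and function names are all hypothetical and are not drawn from the Adler project or the Zooniverse platform.

```python
from collections import defaultdict

def build_index(records):
    """Map each lower-cased word to the set of item ids it describes."""
    index = defaultdict(set)
    for item_id, terms in records.items():
        for term in terms:
            for word in term.lower().split():
                index[word].add(item_id)
    return index

def merge(catalogue, tags):
    """Combine cataloguers' terms and volunteer tags for every item."""
    merged = {}
    for item_id in catalogue.keys() | tags.keys():
        merged[item_id] = list(catalogue.get(item_id, [])) + list(tags.get(item_id, []))
    return merged

def search(index, query):
    """Return the sorted ids of items indexed under a single query word."""
    return sorted(index.get(query.lower(), set()))

# Hypothetical formal metadata created by museum staff.
catalogue_terms = {
    "item-1": ["armillary sphere", "brass"],
    "item-2": ["refracting telescope"],
}
# Hypothetical everyday terms contributed by volunteers.
volunteer_tags = {
    "item-1": ["globe", "rings", "gold"],
    "item-2": ["telescope", "tripod"],
}

catalogue_index = build_index(catalogue_terms)
combined_index = build_index(merge(catalogue_terms, volunteer_tags))

print(search(catalogue_index, "globe"))  # no match in the formal metadata
print(search(combined_index, "globe"))   # found once volunteer tags are included
```

In this sketch a non-specialist query such as 'globe' only reaches the armillary sphere once the volunteer vocabulary sits alongside the catalogue record, which is the effect the project aimed for.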

The application of computer vision to digitised and born-digital cultural heritage materials seems likely to be the next technological means of transforming access, particularly to the object- and image-based collections held in museums and galleries (Liarokapis et al. 2020). The potential for enhanced search and linkage of industrial heritage collections, for example, has been explored by researchers in the Congruence Engine project, led by the UK’s Science Museum Group.3 The Heritage Weaver prototype aimed to draw connections within and between collections relying not on metadata or descriptive text but on machine-identified visual features and tropes (Kitcher et al. 2026). Brown (2024) notes the potential of computer vision to ‘contribute meaningfully to object-based research … serving as a visual prosthetic that enables researchers to make feature comparisons across large bodies of work or that reveals qualities invisible to the human eye’ (34).

Responsible and ethical open access

Access of this kind, and the research and understanding that it enables, relies on a culture of openness and sharing within the cultural heritage and academic sectors. Over the past decade, the Open Culture or OpenGLAM movement has emerged, encompassing cultural heritage organisations (GLAMs) that aim to provide ‘ethical open access to cultural heritage’ (OpenGLAM, n.d.-a). OpenGLAM brings some of the concepts and values of the broader open access movement to the cultural heritage sector and there were early attempts to position the activity of mass digitisation as increasing access to cultural and heritage content in line with the objectives of the open access movement (Terras 2015). The OpenGLAM movement was first spearheaded in Europe in the early 2010s by institutions like the Rijksmuseum, which aimed to make its online collections freely accessible. Over the past decade, and with contributions from members of the Creative Commons community, the Wikimedia Foundation and the Open Knowledge Foundation, OpenGLAM efforts have grown significantly, with numerous institutions and individuals across the globe providing frameworks, pilot studies and use-cases in order to define policies, approaches and practices for opening up access to their digital collections. The OpenGLAM community drafted the OpenGLAM Principles (OpenGLAM, n.d.-b) and at the time of writing is co-developing a ‘Declaration on Open Access for Cultural Heritage’ to guide more equitable practices around open access in digital cultural heritage.4 In order to operationalise all of this work, over the last decade, GLAM Labs (under the various names they have adopted) have been set up within many cultural heritage institutions as ‘in-house’ drivers for the implementation of enhanced access to and reuse of cultural heritage data. They operate at the intersection of digital cultural heritage, research, innovation, technology and creativity (Mahey et al. 2019) and have been vital in driving experimentation and collaboration. They have seeded ideas and indicated important directions of travel, but their work has all too rarely come to influence business-as-usual within under-resourced institutions (Winters et al. 2022).

The process of making cultural heritage data publicly accessible often has a significant technological element, including the need to ensure data is machine-readable and structured in a way that allows it to be linked with other datasets. A concept closely linked to open data in cultural heritage is that of the ‘FAIR principles’ (Findable, Accessible, Interoperable, Reusable), originally introduced in 2016 to provide guidelines to the scientific community for research data management and stewardship (Wilkinson et al. 2016). The cultural heritage community has widely embraced the FAIR principles, as they offer a valuable framework for assessing and digitally publishing cultural heritage data in order to improve discovery, ensure sustainable access and promote better sharing and reuse (Koster and Woutersen-Windhouwer 2018). Given that the FAIR principles refer to data, metadata and infrastructure (GO FAIR, n.d.), three types of entities that are heavily used within digital cultural heritage, compliance with them has become a baseline requirement in digital cultural heritage projects and infrastructures. It has also, however, opened up a field of enquiry concerned with the ongoing technical challenges of how to implement FAIR data sustainably (Hermon and Niccolucci 2021).
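As a rough illustration of what assessing a record against the FAIR headings might involve, the following sketch checks a metadata record for a handful of fields. The mapping of fields to headings is an assumption made for this example only: the published FAIR principles are considerably more detailed, and the field names (`identifier`, `access_url`, `metadata_schema`, `licence`, `provenance`) are hypothetical rather than taken from any standard.

```python
# Assumed, simplified mapping from FAIR headings to record fields.
FAIR_CHECKS = {
    "Findable": ("identifier", "title"),
    "Accessible": ("access_url",),
    "Interoperable": ("metadata_schema",),
    "Reusable": ("licence", "provenance"),
}

def fair_gaps(record):
    """Return, per FAIR heading, the fields that are missing or empty."""
    gaps = {}
    for heading, fields in FAIR_CHECKS.items():
        missing = [field for field in fields if not record.get(field)]
        if missing:
            gaps[heading] = missing
    return gaps

# Hypothetical record for a digitised letter; the licence is blank
# and no provenance statement has been supplied.
record = {
    "identifier": "https://example.org/id/object/123",
    "title": "Letter, 1790",
    "access_url": "https://example.org/objects/123.json",
    "metadata_schema": "Dublin Core",
    "licence": "",
}

print(fair_gaps(record))
```

A checklist of this kind can only flag absent fields. Whether an identifier is genuinely persistent, or a licence genuinely permits reuse, still requires human judgement.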

If openness is broadly to be welcomed as a precondition for greater access, the CARE Principles for Indigenous Data Governance remind us that it cannot always be the default: ‘greater data sharing alone creates a tension for Indigenous Peoples who are also asserting greater control over the application and use of Indigenous data and Indigenous Knowledge for collective benefit’ (Carroll et al. 2020, Abstract). The CARE Principles – Collective Benefit, Authority to Control, Responsibility and Ethics – are designed to ensure that Indigenous communities are able to assert greater control over how Indigenous data held by cultural heritage institutions and other bodies is used and accessed (Carroll et al. 2020; 2021). While the FAIR principles offer a data-centric approach, the CARE principles are people- and purpose-oriented, ‘reflecting the crucial role of data in advancing innovation, governance, and self-determination among Indigenous people’ (Carroll et al. 2020, Abstract). The goal is that stewards and other users of Indigenous data will ‘be FAIR and CARE’, embracing both sets of principles in all aspects of their data work (Carroll et al. 2021). Along the same lines, the Principles of Open Scholarly Infrastructure (POSI) underline the importance of open infrastructure. POSI offers a set of sixteen principles across three themes (Governance, Sustainability, Insurance) by which open scholarly infrastructure organisations and initiatives that support the research community can be run and sustained.5

Access for a fee

In contrast to the growing culture of responsible and ethical openness that is being promoted within the cultural heritage and knowledge sectors, much of the world’s digital cultural heritage is controlled by commercial entities. This commercial interest first became apparent in relation to the mass digitisation that began in the mid-1990s. A Jisc report published a decade or so after the first investments in digitising cultural heritage at scale noted that, in the UK alone, ‘a conservative estimate suggests £130 million of public money has been spent on the creation of digital content since the mid-1990s’ (Jisc 2005, 2). But it was not just public money that was being spent; commercial publishers often partnered with GLAMs to digitise and make available their collections. Thylstrup notes that this ‘integration of commercial platforms into the otherwise primarily public institutional set-up of cultural memory … produced a reconfiguration of the political landscape of cultural memory’ (Thylstrup 2019, 5). The consequences have been profound and connect to all four of the values that lie at the heart of this book. There is, of course, a trade-off between availability of any kind, however restricted, and open access. Nor are the financial barriers to access uniform. Public interest in genealogy and family history has driven significant digitisation in the UK and elsewhere and companies like Ancestry and FindMyPast have enabled access to vast historical collections for people who do not have the benefit of institutional support for research.6 At the time of writing, FindMyPast advertises that it hosts ‘more than 10 billion genealogy records, including the 1921 census’ – all yours for £24.49 a month (or £169.99 a year). The commodification of digital cultural heritage is apparent here not just in the subscription but in the reference to ‘genealogy records’ rather than collections or primary sources (we will return to this in Chapter 3, ‘Use and Reuse’). 
A different kind of partnership is apparent in the relationship between GLAMs and digital publishers, whose target market is not family historians but higher education institutions. Companies like Gale (part of Cengage Learning) and ProQuest invested significantly in the digitisation of important historical collections, with a business model that relies on high levels of (costly) institutional subscription. They are, and should be expected to act like, businesses, but locking cultural heritage behind a paywall does inevitably constrain access. It also determines the kinds of cultural heritage that will be digitised, that is, the material for which there is a market.

Commercial imperatives are also in play for born-digital cultural heritage, with potentially more serious consequences for access – and particularly for continuity of access. Material posted on the web and social media is an important form of cultural heritage. There are numerous institutions and initiatives concerned with the archiving and preservation of the web, whether globally (the Internet Archive’s Wayback Machine) or nationally (for example, national libraries such as the Bibliothèque nationale de France and the British Library), but platform-based social media content is far more of a challenge.7 The value of this material, and of persistent access to it, is increasingly recognised (see, for example, Rees 2021; Wallenius 2022; Schafer and Pailler 2024), but there are numerous, sometimes insuperable, barriers to access for both institutions and individuals. A complex web of legal deposit regulation, platform terms of use, data protection legislation and technical hurdles faces those wishing to archive and make available social media (Cannelli 2024). Social media platforms are not, of course, archives and access to their content can be curtailed for commercial or other reasons at little or no notice. The closure of once-important services, and the subsequent loss of unique cultural heritage, is a fact of twenty-first-century life. Nowhere was this more starkly illustrated than by the removal of free access to the Application Programming Interface (API) for the social media platform that was then called Twitter (now X). The relative ease of access to Twitter data had allowed the development of a lively ecosystem of tools and services built on top of the API and ensured that researchers and archiving institutions could download content and analyse it at scale. From 9 February 2023 this option was removed, to be replaced by a paid-for model out of the reach of all but the most well-financed research institutions. 
A report in the Guardian newspaper published two days before the switch-off noted that it was ‘yet another example of the perils of semi-public platforms being controlled by individuals. And an example of the impact that removing or revoking access to a relatively unrecognised backbone of the internet can have on everyday users’ (Stokel-Walker 2023).

Restricting access

There are numerous reasons why some cultural heritage collections should not be made openly accessible: open access might harm the communities represented in the collections or negatively impact their interests (for example, social media data collections documenting social protest under authoritarian regimes or activist groups); it might be ethically and/or culturally inappropriate to have certain assets openly accessible to all (for example, for some community and Indigenous archives); there might be ethical limitations arising from concerns around sensitive information, personal data, privacy and confidentiality; or contractual obligations to donors or creators might impose conditions (for example, commercial restrictions or embargoes). Sometimes effective closure of digital collections is the appropriate and ethical path to take, but in other cases cultural heritage institutions can explore ways to make the works they hold more accessible by considering options such as providing clear warnings or contextual explanations, implementing access restrictions, or anonymising or pseudonymising certain types of personal data. In all such instances, the agreement of those with rights or interests in such collections should be sought.

An important example of collection objects for which open access should be questioned is provided by digital surrogates of human remains in GLAM online collections and repositories. Kahn and Simon, in their recent study on this subject, revealed that there are ‘no direct limits to access’ to various types of digital surrogates of human remains in the collections of digital cultural heritage institutions, besides a couple of warning messages such as a ‘statement of intent regarding culturally sensitive items’ in the case of the Wellcome Collection in London (Kahn and Simon 2023, 213). However, as they rightly note, ‘just because something can be shared, does not automatically mean that it should be’ (Kahn and Simon 2023, 222) and critical and responsible digitisation, ingestion and publication practices need to be in place for similar cases. Blanket policies are often unsuitable for dealing with unique objects with a highly specific context, which are better considered on a case-by-case basis.

Infrastructuring access

Sometimes, a first step to openness involves not necessarily providing access to the collections themselves but making available metadata and other forms of derived information. An important recent initiative in the UK, which has the potential to transform access to, if not always (re)use of, digital cultural heritage, is the Museum Data Service (MDS).8 Formally launched in 2024, the MDS is a collaboration between Art UK, the Collections Trust and the University of Leicester, with support from Bloomberg Philanthropies and the Arts and Humanities Research Council (AHRC). Like many initiatives before it, it is seeking to solve the challenge of offering search and discovery options across a range of different institutions, with differing approaches to metadata and cataloguing. Projects like Europeana have attempted to do this by developing a network of aggregators, for example Archives Portal Europe, the European Film Gateway and the Digital Repository of Ireland, whose content can be searched through a common interface. The Digital Public Library of America adopted a similar approach, and there are other national and regional examples. The MDS is different in a number of ways: it creates collection level summaries that can be used to gain an overview of the national position; it focuses specifically on data, which it aims to make FAIR; and it is concerned only with museums rather than seeking to range across the GLAM sector. Its ambition is to provide ‘the digital standpipe to let decades’ worth of knowledge flow and grow’ (Museum Data Service n.d.). This kind of approach has also been adopted by curators and archivists working with collections where access is relatively limited because of legislative restriction rather than because of the nature of the content. Material collected under legal deposit legislation, for example, is limited in many countries to access on-site in library reading rooms rather than online. 
Absent the ability to consult the full UK Web Archive online, the next best thing is access to derived data, including seed lists of the websites crawled for special collections, format profiles (for example, .html, .pdf etc.) and links between archived pages.9
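To give a sense of what derived data of this kind looks like, the sketch below computes a simple format profile and a host-to-host link summary from a few crawl records. The record layout (`url`, `mime`, `links`) and the example addresses are invented for illustration; they are not the structure of the UK Web Archive's published datasets.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical crawl records: one entry per archived resource.
crawl_records = [
    {
        "url": "https://example.org/",
        "mime": "text/html",
        "links": ["https://example.org/report.pdf", "https://example.com/"],
    },
    {"url": "https://example.org/report.pdf", "mime": "application/pdf", "links": []},
    {"url": "https://example.com/", "mime": "text/html", "links": ["https://example.org/"]},
]

def format_profile(records):
    """Count archived resources by media type."""
    return Counter(record["mime"] for record in records)

def host_links(records):
    """Count links between different hosts, ignoring links within one host."""
    edges = Counter()
    for record in records:
        source = urlparse(record["url"]).netloc
        for target_url in record["links"]:
            target = urlparse(target_url).netloc
            if target != source:
                edges[(source, target)] += 1
    return edges

print(format_profile(crawl_records))
print(host_links(crawl_records))
```

Neither summary exposes the archived content itself, which is why derived data of this type can often be shared when the underlying pages cannot.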

In November 2024, the UK’s Department for Science, Innovation and Technology confirmed that it is working on the development of a National Data Library of public sector data which sets an ambitious goal to create a new landscape for digital research and cultural heritage.10 The aim is to bring together existing research programmes that help deliver centralised, secure, data-driven public services to collate and provide access to high-quality data for researchers to explore. There are already existing UK initiatives that have transformed public sector datasets into valuable research assets, such as Administrative Data Research UK, the Integrated Data Service and Health Data Research UK.11 These services provide a model for offering safe and secure access to data – albeit mostly to accredited specialist researchers rather than to everyone – and present a strong foundation on which to build a potential National Data Library for the benefit of all.

As part of the preparatory work for the development of a UK National Data Library that would make public sector datasets more accessible to researchers and enable future science to thrive, in December 2024 the Wellcome Trust and the Economic and Social Research Council (ESRC) opened a call for the submission of ideas in relation to technical visions and architectures.12 In January 2025, the UK Government published its AI Opportunities Action Plan, which establishes digital and cultural heritage as a space of technological innovation and discusses in more detail the development of a proposed UK National Data Library. In the Action Plan, cultural heritage organisations are also referred to as bodies holding valuable cultural datasets that could prove to be important for the unlocking of public data assets to enable innovation, research and the generation of value:

Establish a copyright-cleared British media asset training data set, which can be licensed internationally at scale. This could be done through partnering with bodies that hold valuable cultural data like the National Archives, Natural History Museum, British Library and the BBC to develop a commercial proposition for sharing their data to advance AI.13

Access during crisis

We could not finish this chapter without acknowledging the impact of the COVID-19 pandemic on access to GLAM collections, both digital and analogue. From March 2020, cultural heritage institutions around the world began to close their doors to both the public and the majority of their staff. To differing extents, depending on regional and national lockdown arrangements, GLAM collections were physically inaccessible for periods of months over the following two years. Digital access to collections became increasingly important, whether through visiting GLAM websites and image databases, attending webinars and online talks or engaging with increasingly innovative institutional and personal social media channels. Much has already been written about this extraordinary period. Ginzarly and Srour acknowledge that ‘As daily-life practices moved online, the COVID-19 crisis was a catalyst for sharing heritage content online’. Focusing on the sharing of content online by means of hashtags allowed for the conceptualisation of ‘digitally mediated heritage practices … as a process of heritage value co-creation’ (Ginzarly and Srour 2022, 3). Burke and colleagues highlight some of the most striking examples of the vibrant co-creative practices that emerged in the museum sector, including, for example, the Getty Museum Challenge, which encouraged social media users to recreate paintings from Getty collections in their own homes, and the launch of #MuseumsUnlocked (Burke, Jørgensen and Jørgensen 2020).

Research is now beginning to engage with the medium-term impact of the necessarily ad-hoc digital innovation that characterised so much of the activity in the cultural sector during the period when audiences were excluded from traditional modes of physical engagement with collections. Samaroudi and colleagues, for example, note that

through the COVID-19 pandemic the sector has identified audiences and needs with which memory institutions want to engage through digital resources and mechanisms: these include anti-racism activists, audiences characterised through their social condition (lonely, bored) rather than their identity or interests, and those for whom digital may not be an easy or obvious means of communication. (Samaroudi, Echavarria and Perry 2020, 21–22)

It remains to be seen whether insights of this kind will fundamentally alter sector approaches to working with digital cultural heritage, notably in terms of providing access to those who have long been excluded from traditional physical spaces, whether because of disability, location, income or other factors. In terms of what constitutes cultural heritage, Zuanni argues that ‘digital content about an object becomes part of this same object’s biography, in a complex balance of relationships between original and reproduction, documentation and engagement, reference to the physical counterpart and newborn-digital object’ (Zuanni 2023, 696). Here the digital and the analogue are entwined as new forms of cultural heritage emerge and interact.

Notes

  1. Transkribus, https://www.transkribus.org/ [accessed 30 January 2025].

  2. Tag along with Adler, https://www.zooniverse.org/projects/webster-institute/tag-along-with-adler [accessed 30 January 2025].

  3. Congruence Engine was one of five Discovery projects funded by the Arts and Humanities Research Council as part of the Towards a National Collection programme, https://www.sciencemuseumgroup.org.uk/projects/the-congruence-engine [accessed 30 January 2025].

  4. OpenGLAM, https://openglam.pubpub.org/ [accessed 30 January 2025].

  5. The Principles of Open Scholarly Infrastructure, https://openscholarlyinfrastructure.org/ [accessed 15 January 2025].

  6. Ancestry, https://www.ancestry.co.uk/ [accessed 30 August 2024]; FindMyPast, https://www.findmypast.co.uk/ [accessed 24 November 2025].

  7. Wayback Machine, https://web.archive.org/ [accessed 4 September 2024].

  8. Museum Data Service, https://museumdata.uk/ [accessed 1 August 2025].

  9. Data derived from the UK Web Archive is available via the Shared Research Repository, https://bl.iro.bl.uk/collections/d09fbc16-7a76-49db-a45f-16a99c30ae3e?locale=en [accessed 17 December 2024].

  10. UKAuthority, https://www.ukauthority.com/articles/dsit-confirms-work-on-national-data-library [accessed 5 November 2024].

  11. Administrative Data Research UK, https://www.adruk.org/; Integrated Data Service, https://integrateddataservice.gov.uk/; Health Data Research UK, https://www.hdruk.ac.uk/ [accessed 1 August 2025].

  12. Wellcome, https://wellcome.org/what-we-do/our-work/uk-data-library [accessed 2 January 2025].

  13. Department for Science, Innovation and Technology, ‘AI Opportunities Action Plan: Unlocking Data Assets in the Public and Private Sector’, https://www.gov.uk/government/publications/ai-opportunities-action-plan/ai-opportunities-action-plan [accessed 21 January 2025].

Copyright © Anna-Maria Sichani, Jane Winters and Crown copyright, 2026. Re-used under the terms of the Open Government Licence v3.0.