Chapter 3 Use and reuse
Copyright and licensing
The Berlin Declaration on Open Access (2003) tightly linked ideas of openness to the reality of use and reuse. For a work to be considered truly open access, the authors and/or owners of any form of scholarly or cultural output should ‘grant to all users a free, irrevocable, worldwide, right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship’ (Max-Planck-Gesellschaft 2003). This leaves no room for doubt that open access involves the purposeful ceding of control over cultural heritage data to unknown publics. Fiona Cameron argues further that restrictions on use affect not only what happens to cultural heritage today but also what happens to it in the future: ‘copyright, the desire to own and protect data by heritage institutions, acts as an anti-preservationist strategy’ (Cameron 2021, 46).
But questions of copyright, use and reuse remain highly contested. Copyright is one of the most common forms of intellectual property right present in digital cultural heritage. Its terms differ between countries, but it always has a common aim: to enable creators to benefit from and control the distribution of their work by preventing its unauthorised reuse. Unpacking copyright complexities, and the issues that arise from providing access to and use of digital cultural heritage assets, requires a combination of expertise from cultural heritage professionals, legislators, lawyers and copyright experts. A range of practice exists within the cultural heritage sector, some arising from policy decisions within collecting/archiving institutions and some the result of external legal frameworks. In the UK, for example, the Libraries and Archives Copyright Alliance (LACA) is the main body lobbying for fair practices in copyright on behalf of library, information and archive professionals and users of GLAM institutions. It advocates for a fair and balanced copyright framework that represents the rights of copyright holders while placing equal value on users’ liberties. In 2014, the Association of European Research Libraries (LIBER) set up a working group on copyright and legal questions, which consists of librarians, lawyers, academics and communications professionals who monitor current European law and react to proposed changes on behalf of libraries, archives, researchers and students. The working group helps cultural heritage professionals engaged in research to acquire the training they need to understand the complex copyright landscape for cultural heritage assets.
Since 2010 the International Council on Archives (ICA) has been represented at the World Intellectual Property Organization’s (WIPO) Standing Committee on Copyright and Related Rights (SCCR), where it carries out copyright advocacy work in support of the archival mission. Working together with the International Federation of Library Associations and Institutions (IFLA) and the International Council of Museums (ICOM), the ICA seeks ‘a binding international treaty setting out basic copyright exceptions that would enable libraries, archives, and museums around the globe to fulfil their mission to preserve their holdings and make them available for use’ (Dryden 2024).
Copyright, however, is not the only factor in determining the permitted uses and reuse of cultural heritage materials. In the UK and many other countries, the exploitation of some forms of born-digital material is restricted by the terms of legal deposit legislation. It is not, for example, possible to cut and paste text from an archived web page held in the UK Web Archive at the British Library, or indeed to take a screenshot or use any other form of image capture (Milligan 2015). In other cases, restrictions arise from the economic imperatives at work within the cultural heritage sector. Some institutions choose to impose restrictions on the use and reuse of collection images so that income can be generated from their sale, either via a third-party image library or an in-house platform. However, as Patricia Huang notes, while ‘At the beginning of [the] information revolution, museums were understood to have high hopes for the revenue that digital image licensing (DIL) services might generate … DIL services in museums have yet to report significant profits’. Consequently, ‘a growing number of art history museums have chosen to encourage free or CC-licensed distribution of their images’ (Huang 2020, 220).
The ‘CC’ referenced by Huang is Creative Commons, an international non-profit organisation that seeks to empower ‘individuals and communities around the world by equipping them with technical, legal, and policy solutions to enable sharing of knowledge and culture in the public interest’ (Creative Commons n.d.-c). Creative Commons, and specifically the suite of licences that it has produced to enable different forms of sharing and reuse, has become an essential part of the scholarly and cultural heritage open-data landscape. At the time of writing, there are six CC licences available, ranging from CC BY, which simply requires an acknowledgement of authorship, to CC BY-NC-ND, which allows attributed sharing but prevents commercial use and the creation of any form of derivative work. Most permissive of all, and reflecting the requirements of the Berlin Declaration, is the CC0 public domain dedication tool, which allows authors, creators and owners to waive any interests in their works so that they can be freely reused and remixed (Creative Commons n.d.-b). The Walters Art Museum in Baltimore, Maryland, was an early adopter of CC licensing for its collection images, removing copyright restrictions from more than 10,000 images in 2011 (Walters Art Museum 2011).1 Other, larger institutions followed: the Rijksmuseum in Amsterdam, for example, allows the ‘use of digital reproductions of public domain objects made available … without permission being required. For commercial purposes too’ (Rijksmuseum n.d.). Creative Commons itself developed a ‘search portal’ that allows users to explore content that ‘you can share, use, and remix’ (Creative Commons n.d.-a). Several online image search services, including Google Images and Wikimedia Commons, offer an option to filter results by copyright and/or licensing status, so that users can be reasonably confident that they are allowed to use the images they find for research and creative practice.
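Because the CC licences are built from a small set of combinable elements, a licence filter of the kind offered by image search services can be modelled quite simply. The Python sketch below is a hypothetical, simplified illustration, not Creative Commons’ own tooling and no substitute for the legal code: each licence is reduced to its elements, BY (attribution), SA (share-alike), NC (non-commercial) and ND (no derivatives), and an intended reuse is checked against them. The function names and sample records are invented.

```python
# Simplified model of the six CC licences plus the CC0 dedication.
# Illustration only: it ignores licence versions and the full legal code.
# Elements: BY = attribution, SA = share-alike, NC = non-commercial,
# ND = no derivative works.
LICENCES = {
    "CC0": frozenset(),  # public domain dedication, not strictly a licence
    "CC BY": frozenset({"BY"}),
    "CC BY-SA": frozenset({"BY", "SA"}),
    "CC BY-NC": frozenset({"BY", "NC"}),
    "CC BY-NC-SA": frozenset({"BY", "NC", "SA"}),
    "CC BY-ND": frozenset({"BY", "ND"}),
    "CC BY-NC-ND": frozenset({"BY", "NC", "ND"}),
}


def permits(licence: str, commercial: bool, derivative: bool) -> bool:
    """True if the licence allows the intended reuse (attribution aside)."""
    elements = LICENCES[licence]
    if commercial and "NC" in elements:
        return False
    if derivative and "ND" in elements:
        return False
    return True


def filter_reusable(items, commercial: bool, derivative: bool):
    """Keep items whose licence allows the intended reuse, like a search filter."""
    return [it for it in items if permits(it["licence"], commercial, derivative)]


if __name__ == "__main__":
    # Hypothetical search results.
    images = [
        {"title": "Portrait A", "licence": "CC0"},
        {"title": "Map B", "licence": "CC BY-NC-ND"},
        {"title": "Textile C", "licence": "CC BY-SA"},
    ]
    # A commercial remix, such as a fashion design, rules out NC and ND.
    for it in filter_reusable(images, commercial=True, derivative=True):
        print(it["title"])  # prints "Portrait A", then "Textile C"
```

Share-alike does not block a remix in this model; it obliges the remix to carry the same licence, a condition the sketch deliberately leaves out.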
Any barrier placed in the way of reuse, for example the insistence on those elements of a Creative Commons licence that prohibit the creation of derivative works or commercial use, constrains how users can interact with digital cultural heritage beyond simply looking at or reading it. Such restrictions would prohibit innovation of the kind seen in the collaboration between the British Library and the British Fashion Council, which recognises the work of student fashion designers who draw inspiration from the Library’s online collections. In 2021, for example, the Public Award was given to Chiara Lamon, whose designs used images from the British Library’s collections that ‘included the work of photographer Ethienne Jules, images from geology and dance and a comparison of a multiple collar garment from Viktor and Rolf’s Autumn/Winter 2003 collection paired with an early portrait of a man in a collared shirt’ (British Library n.d.). This kind of imaginative repurposing and remixing of digital cultural heritage cannot easily be anticipated, but it can be closed off by the use of restrictive licensing.
Licences, and in particular open licences, ‘are the key mechanism for ensuring that works can be used and reused legally’ (Hamilton and Saunderson 2017, 3). Licensing and the open movement developed partly as a reaction to intellectual property rights. Open licences such as the Creative Commons licences, and open source licences such as the MIT License and GPLv3, are tools created to remove legal barriers: they make it easier to share copyrighted content and to communicate reuse conditions simply, thereby encouraging the maximum public reuse of cultural heritage data. Beyond licensing, the reuse of copyrighted GLAM content is further enabled in the UK by the Copyright, Designs and Patents Act 1988;2 in the EU by the Directive on Copyright in the Digital Single Market (hereafter EU Directive),3 which introduced text and data mining exceptions for the purpose of scientific research; and in the USA by the ‘fair use’ doctrine.
Navigating grey areas of reuse
Although licences succeed in preventing copyrighted digital cultural heritage from being locked down, some copyrighted content cannot be handled within this framework, resulting in what has become known in the digital cultural heritage community as the ‘twentieth-century black hole’. This neat phrase refers to the very small quantity of twentieth-century material that is available for reuse because of the difficulty of clearing, or even determining, rights (Boyle 2009). Two categories of material continue to cause headaches for cultural heritage professionals in terms of their reuse status. The first is out-of-commerce works (OOCWs), that is, works that are protected by copyright but not available commercially, or that have never been and/or were never intended to be available commercially (for example unpublished works, grey literature, amateur photography and certain expressions of traditional culture). The second is orphan works, for which the rightsholder is not known or cannot be found (Martinez and Terras 2019). There have been several recent legislative attempts to address these issues systematically, especially through the adoption of the EU Directive noted previously, which introduces a legal framework to support cultural heritage institutions in the digitisation and cross-border dissemination of OOCWs. The Directive was followed by the launch in 2021 of the European Union Intellectual Property Office (EUIPO) Out-of-Commerce Works Portal, where heritage institutions and other organisations can share information about out-of-commerce works to ensure that they are accessible to the public.4 The EUIPO also maintains the Orphan Works Database, which provides information about orphan works contained in the collections of publicly accessible cultural heritage institutions.5
Not surprisingly, for born-digital cultural heritage there are significant uncertainties and grey areas in relation to intellectual property rights, permissions, privacy and licences that further hinder its use and reuse. While she is primarily concerned with the relationship between copyright and the preservation of born-digital materials, the issues identified by Katherine Fisher similarly affect end users: researchers and practitioners are faced with ‘shifting definitions of ownership, unclear distinctions between published and unpublished content, digital rights management laws and technologies, and the layered copyrights that can exist in complex digital objects and their dependencies’ (Fisher 2021, 238).
Copyright brings challenges for the reuse of digital cultural heritage content even by its absence. In theory, once copyright protection expires, a creative work automatically falls into the public domain and anyone can reuse it for any purpose without obtaining permission. In practice, cultural heritage institutions still claim copyright over faithful digitised and born-digital surrogates of public domain works (Wallace and Euler 2020; Wallace 2022a). Although a number of legislative interventions aim to ensure that public domain works remain in the public domain once digitised, for example Article 14 of the EU Directive, which aims to prevent public domain ‘works of visual art’ from being subjected to new copyright claims, ‘we are still talking about copyright because the overwhelming majority of cultural institutions assert copyright in surrogates despite its unsound legal basis’ (Wallace 2022b, 329).
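In the simplest case, the moment at which copyright protection expires is a matter of arithmetic. The Python sketch below encodes only the general UK/EU rule for literary and artistic works, the author’s life plus 70 years counted to the end of the calendar year; the function names are hypothetical, and the many special cases that make real rights assessment difficult are deliberately left out.

```python
from datetime import date
from typing import Optional

# Hypothetical, heavily simplified rule: under the general UK/EU term, a
# literary or artistic work is protected for the author's life plus 70
# years, counted to the end of the calendar year. Real assessments must
# also handle unpublished and anonymous works, joint authorship, Crown
# copyright, sound recordings, related rights and national variations.
TERM_YEARS = 70


def public_domain_from(death_year: int) -> date:
    """First day the work is out of copyright under the general rule."""
    return date(death_year + TERM_YEARS + 1, 1, 1)


def rights_status(death_year: Optional[int], today: date) -> str:
    """Classify a work as 'public domain', 'in copyright' or 'undetermined'."""
    if death_year is None:
        # Unknown author or death date: the term cannot be computed, which
        # is part of what makes orphan works so hard to clear.
        return "undetermined"
    if today >= public_domain_from(death_year):
        return "public domain"
    return "in copyright"


if __name__ == "__main__":
    today = date(2025, 1, 1)
    print(public_domain_from(1954))    # 2025-01-01
    print(rights_status(1954, today))  # public domain
    print(rights_status(1960, today))  # in copyright
    print(rights_status(None, today))  # undetermined
```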
Reusing cultural heritage collections as data
Navigating rights, restrictions and other legal obstacles is just one piece of the complex matrix that impacts wider reuse of digital and born-digital cultural heritage collections and the ability to scale up innovative research and creative work around them. Licensing and the OpenGLAM movement, along with the wealth of digitised and born-digital cultural heritage collections that have resulted from the large-scale digitisation efforts of the last decades, have allowed researchers and cultural heritage professionals to conceptualise ‘cultural heritage data as humanities research data’ (Tasovac, Chambers and Tóth-Czifra 2020, 1), as well as to start thinking and working ‘at scale’ with digital cultural heritage data. As Daniel Wilson argues, ‘“scale” has become a zeitgeist, in particular for the digital humanities, increasingly coupled to the field of data science, its methods, thought-style and knowledge claims’ (Wilson 2022). It is often in GLAM labs within cultural heritage organisations that we have seen this interdisciplinary and experimental work flourish in recent years: researchers, cultural heritage professionals and data scientists collaboratively working towards the development and reuse of extensive collections of in-copyright and/or licensed cultural heritage materials to carry out various types of data-driven humanities research (for example, text mining, data mining, data visualisation, mapping, image analysis, audio analysis, network analysis, machine learning) (Candela et al. 2020).
To address this recent computational turn in cultural heritage, the Collections-as-Data movement has, since 2016, focused on and advocated for the responsible development and computational use of digitised and born-digital cultural heritage collections by making these collections available as data that is ‘amenable to computation’ (Padilla et al. 2019, 20). Central to this effort have been the Always Already Computational project (Padilla et al. 2019) and its Mellon-funded successor, Collections as Data: Part to Whole (Padilla et al. 2023). The Collections-as-Data movement and community of practice have blossomed internationally, advocating for the importance of presenting cultural heritage collections in open, reusable formats as machine-readable data through the Santa Barbara Statement on Collections as Data (2019) and the Vancouver Statement on Collections as Data (2023).6 Collections-as-Data also promotes and encourages diverse forms of collaboration among the stakeholders and areas of work that need to be brought together to develop and support the reuse of collections as data responsibly, including reference services, documentation, repository development, collections management, digitisation, web archiving, outreach and preservation. The movement also highlights concerns about ethical stewardship, social and technical interoperability, transparent documentation, commitments to preservation and responsible operations (Padilla 2019; Padilla et al. 2023).
Aligned with Collections-as-Data’s mission to advance the responsible development and computational reuse of digital cultural heritage collections, a series of initiatives have highlighted the importance of producing datasheets (Alkemade et al. 2023) and data envelopes (Luthra and Eskevich 2024). Especially in light of developments in AI and the need for high-quality, well-documented GLAM datasets for machine learning processes, these datasheets and data envelopes are designed to provide context for, and transparent information about, the provenance, purposes, composition, collection processes, recommended uses and societal biases reflected in cultural heritage datasets. Developing, providing and maintaining digital cultural heritage collections as data is not an easy task for cultural heritage institutions, and to support adoption in a ‘business-as-usual’ mode, a number of checklists and guidelines have been developed to help them break down the tasks, steps and requirements for such an endeavour (Candela et al. 2023; Lee 2023).
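A datasheet is, at heart, a structured record whose fields prompt the dataset’s publisher to document it. The Python sketch below is a hypothetical illustration of that idea: its field names loosely follow the themes listed above (provenance, purpose, composition, collection process, recommended uses, biases) and are not the published template of Alkemade et al. or the data envelopes of Luthra and Eskevich. A simple completeness check shows how such a record could support the checklists mentioned in the text.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Datasheet:
    """Hypothetical minimal datasheet for a GLAM dataset.

    Field names loosely follow the themes listed in the text; they are
    not the published template of Alkemade et al. (2023) or the data
    envelopes of Luthra and Eskevich (2024).
    """
    title: str
    provenance: str = ""
    purpose: str = ""
    composition: str = ""
    collection_process: str = ""
    recommended_uses: str = ""
    known_biases: str = ""

    def missing(self) -> List[str]:
        """Names of fields still left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


if __name__ == "__main__":
    sheet = Datasheet(
        title="Digitised chapbooks (hypothetical sample)",
        provenance="Scanned from the library's printed collections",
        composition="Page images with OCR text, one folder per volume",
    )
    print(sheet.missing())
    # ['purpose', 'collection_process', 'recommended_uses', 'known_biases']
```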
Technical frameworks
Earlier technical approaches to offering GLAM datasets for reuse include the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), links to zipped collections, and static collection directories accessible via GitHub, such as those of The New York Public Library,7 The Cooper Hewitt Museum8 and The Tate Collection.9 Over the last ten years, the most common way for museums and other GLAMs to release their collection data openly has been via an application programming interface (API). An API is a defined interface through which software programs communicate, making data accessible and transmissible in a machine-readable and dynamic way. From the perspective of a cultural heritage institution, an API allows users to request data from inside the institution and have it delivered to them in a usable form, freely available under an open licence such as Creative Commons (CC) or a CC-like licence.
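OAI-PMH works through plain HTTP requests that carry a verb parameter and return XML pages, which are chained together by a resumption token. The Python sketch below, using only the standard library, builds a ListRecords request and parses a response. The namespaces and parameter names are those defined by OAI-PMH 2.0 and simple Dublin Core (oai_dc); the base URL, identifier and sample response are invented for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; later pages send only the resumption token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text: str) -> Tuple[List[dict], Optional[str]]:
    """Return (records, resumption token or None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = [
        {
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
        }
        for rec in root.iterfind(".//oai:record", NS)
    ]
    # An empty or absent resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


# Invented response in the shape OAI-PMH defines.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_records(SAMPLE))
```

A harvester would repeat the request with each returned token until parse_records reports None, which is how OAI-PMH splits large collections into manageable pages.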
Although API technology has become widely used in the GLAM world in general, a recent audit showed some resistance to, or delay in, the adoption of APIs among GLAM institutions, particularly in the UK. Almost half (49 per cent) of respondents to a 2022 survey said that their institution did not have an API; only 21 per cent said that their institution had an API that allows others to make use of its online collections; and a further 16 per cent said that the introduction of an API was pending (Gosling et al. 2022, 3). Moreover, existing open APIs have low levels of usage, and some either struggle to perform or stall completely when handling the higher-volume queries required by collaborative research projects (Gosling et al. 2022, 5). There is also very little research into how people use collections data from cultural heritage institutions through APIs (Villaespesa et al. 2021), which limits our capacity to assess the quality of, and interest in reusing, the available data, and to encourage straightforward engagement with APIs by users with a range of technical expertise. For example, the Museums of the City of Paris/Paris Musées make more than 100,000 open-access works available via their GraphQL API, but users must go through an additional layer of authentication to acquire the API key before they can access its content (in JSON files).10 This adds friction to the process and discourages serendipitous, as opposed to targeted, exploration. Similarly, the technical documentation and development for both the V&A Collections API11 and the Wellcome Collection API12 are designed for and addressed to developers as end users, as their domain names indicate. This suggests a high barrier to entry and is likely to discourage general research use.
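One practical response to APIs that stall under higher-volume queries is to harvest patiently: request one page at a time, pause between requests and retry transient failures. The Python sketch below illustrates that pattern under a stated assumption, namely a hypothetical fetch function that returns a dictionary with a list of records and a has_next flag. Real GLAM APIs (including those named above) each define their own pagination, parameters and key handling, which the sketch does not attempt to reproduce.

```python
import time
from typing import Callable, Dict, Iterator

# Hypothetical page fetcher: given a page number, returns a dict such as
# {"records": [...], "has_next": True}. This response shape is assumed;
# real GLAM APIs each define their own paging, parameters and keys.
FetchPage = Callable[[int], Dict]


def harvest(fetch_page: FetchPage, delay: float = 1.0,
            max_retries: int = 3) -> Iterator[Dict]:
    """Yield records page by page, pausing between requests and retrying errors."""
    page = 1
    while True:
        for attempt in range(max_retries):
            try:
                data = fetch_page(page)
                break
            except OSError:  # includes URLError and timeouts
                time.sleep(delay * 2 ** attempt)  # exponential back-off
        else:
            raise RuntimeError(f"page {page} failed after {max_retries} attempts")
        yield from data["records"]
        if not data.get("has_next"):
            return
        page += 1
        time.sleep(delay)  # pace requests to spare the institution's server


if __name__ == "__main__":
    # An in-memory stand-in for a real API, for demonstration only.
    pages = {1: {"records": [{"id": 1}, {"id": 2}], "has_next": True},
             2: {"records": [{"id": 3}], "has_next": False}}
    print([r["id"] for r in harvest(pages.__getitem__, delay=0)])  # [1, 2, 3]
```

Injecting the fetch function keeps authentication, such as an API key header, in one place and makes the pacing logic testable without a live service.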
Aimed at users who benefit from practical, accessible examples rather than exhaustive technical specifications, Jupyter Notebooks are becoming a key tool for GLAM institutions to introduce users to accessing and reusing their datasets (Candela et al. 2023). The Jupyter Notebook is a web application, often used as a learning and teaching environment, which allows users to create easily shared, interactive computational narratives that mix live code, results and text. The value of Jupyter Notebooks for GLAM services and collections data has been demonstrated in recent years by the GLAM Workbench (Sherratt 2021), which focuses mainly on the GLAM sector in Australia and New Zealand but also encompasses material from the UK Web and UK Government Web Archives. These ‘workbenches’ not only make GLAM data more accessible by using Jupyter Notebooks to analyse and reuse it, but also make these processes highly reproducible, even by users without coding skills. Since September 2019, the National Library of Scotland’s Digital Scholarship Service has released collections as data, including digitised collections (text and images), metadata collections, map data and organisational data, via its data-delivery platform, the Data Foundry,13 and it uses Jupyter Notebooks to support these releases. Based on the principles of openness, transparency, reproducibility and practicality, the Data Foundry is designed to be easy to access and use, providing ‘no-nonsense’ data with clear rights information, straightforward downloads, dataset trials and plain-text-only options. The Jupyter Notebooks were created initially as ‘a COVID-19 response’ in order ‘to give all library users an opportunity to explore the Library’s collections as data, even if they have never programmed or conducted data analysis’ (Ames and Havens 2022, 52).
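To give a sense of the kind of first step such notebooks walk users through, the Python cell below counts word frequencies in a plain-text file. It is an illustrative sketch, not code taken from the Data Foundry or the GLAM Workbench: the sample text, the file name in the comment and the tiny stopword list are all hypothetical stand-ins.

```python
import re
from collections import Counter
from typing import Set

# Stand-in for a plain-text file from a collections-as-data release; in a
# notebook it might be loaded with open("volume.txt", encoding="utf-8").read()
# (the file name is hypothetical).
SAMPLE_TEXT = """The ship sailed from Leith. The ship returned to Leith in spring,
and the harbour was busy with ships."""

# A tiny illustrative stopword list; real analyses use larger ones.
STOPWORDS = {"the", "from", "to", "in", "and", "was", "with"}


def word_frequencies(text: str, stopwords: Set[str]) -> Counter:
    """Lower-case the text, extract alphabetic words and count non-stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)


freqs = word_frequencies(SAMPLE_TEXT, STOPWORDS)
print(freqs.most_common(3))  # [('ship', 2), ('leith', 2), ('sailed', 1)]
```

Because no stemming is applied, ‘ship’ and ‘ships’ are counted separately; a notebook aimed at beginners would typically pause at exactly this kind of decision and explain it.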
Documentation and standards
Reuse can take many forms and the concept itself is multifaceted. The editorial note that accompanies a special issue of the International Journal on Digital Libraries on ‘FAIR data and cultural heritage’ focuses on the final letter of the acronym, suggesting that ‘to reuse data in cultural heritage it is necessary to expand the “R” facet of the FAIR principles at least into R3: Re-usable, Relevant and Reliable’ (Hermon and Niccolucci 2021, 251). However, preparing digital cultural heritage data to be reused is a resource-intensive process for cultural heritage institutions and stakeholders, requiring time, expertise and robust infrastructure. The quest for data interoperability has been one of the major challenges in digital cultural heritage since its early days and a vast array of standards and technical solutions have been explored and developed over the years to support and encourage cultural heritage data interchange and reuse among different audiences (Ioannides, Georgopoulos and Scherer 2005).
Digital cultural heritage standards are like a ‘shared grammar’, establishing a common way of structuring, understanding and managing information; they take many forms and can be applied to a wide variety of assets or concepts. Among the most popular and widely used standards in the digital cultural heritage sector is the CIDOC Conceptual Reference Model (CRM), a knowledge representation model ‘for describing the implicit and explicit concepts and relationships used in cultural heritage documentation’ (CIDOC CRM n.d.). CIDOC CRM provides a common and extensible semantic framework for evidence-based cultural heritage information integration and reuse. Persistent identifiers (PIDs), meanwhile, are globally unique and long-lasting references to potentially any sort of digital entity. Their adoption, in the form of Archival Resource Keys (ARKs)14 or Digital Object Identifiers (DOIs),15 sits at the foundation of the FAIR principles, making cultural heritage data findable and interoperable and encouraging reuse. Being able to uniquely identify digital or born-digital cultural heritage objects supports their discovery, curation and reuse: you cannot provide persistent or even consistent access to an item in order to reuse it if you do not know what it is. The importance of persistent identifiers is best understood by considering what happens when digital cultural heritage collections do not have a PID strategy: broken links, link rot and dying data, which affect not only the user experience but also all forms of data (re)use.
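Part of what makes PIDs durable is that they are resolved through global services rather than tied to one website. The Python sketch below normalises DOIs and ARKs, whether written as bare identifiers or embedded in URLs, into links at the public resolvers doi.org and n2t.net. The regular expressions are deliberate simplifications of the DOI and ARK syntaxes, the function name is hypothetical, and the identifiers in the usage example are samples only.

```python
import re

# Simplified patterns for illustration; the DOI and ARK specifications
# permit more variation than these regular expressions accept.
DOI_RE = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)
ARK_RE = re.compile(r"^(?:https?://[^/]+/)?(ark:/?[0-9a-z]+/\S+)$", re.IGNORECASE)


def to_resolver_url(pid: str) -> str:
    """Normalise a DOI or ARK into a URL at a global resolver."""
    pid = pid.strip()
    match = DOI_RE.match(pid)
    if match:
        return "https://doi.org/" + match.group(1)
    match = ARK_RE.match(pid)
    if match:
        return "https://n2t.net/" + match.group(1)
    raise ValueError(f"not a recognised DOI or ARK: {pid!r}")


if __name__ == "__main__":
    for pid in ("doi:10.1000/182", "https://doi.org/10.1000/182",
                "ark:/13030/tf5p30086k"):
        print(to_resolver_url(pid))
```

Storing the normalised form, rather than whichever URL a record happened to be cited with, is one small way of guarding against the link rot described above.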
In the area of collections management, metadata standards such as the Metadata Encoding and Transmission Standard (METS), maintained by the Library of Congress in the US, or Spectrum, the UK museum collections management standard that is also used around the world, are designed to describe and structure digital cultural heritage content in standardised human- and machine-readable ways and thereby enable its interchange and reuse. Finally, the International Image Interoperability Framework (IIIF) is a set of standards for interoperable functionality in digital image repositories, allowing users to choose different viewers and tools to interact with cultural heritage content.16 IIIF uses interoperability and the fabric of the web to open up new possibilities for image-based resources, while reducing long-term maintenance and technological lock-in.
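Much of IIIF’s interoperability comes from a predictable URL pattern: any compliant viewer can request a region of an image at a given size, rotation and quality simply by composing a URL. The Python sketch below follows the request pattern of the IIIF Image API 3.0 (where “max” denotes the largest available size). The image server base URL and identifier are hypothetical, and the helper names are invented for illustration.

```python
from urllib.parse import quote

# IIIF Image API 3.0 URI pattern:
#   {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
QUALITIES = {"default", "color", "gray", "bitonal"}


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an Image API request; the identifier is percent-encoded, '/' included."""
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality: {quality}")
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image information document (info.json) that viewers read first."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://iiif.example.org/image"  # hypothetical image server
    print(iiif_info_url(base, "ms-123/f1r"))
    # A greyscale detail of the top-left quarter, scaled to fit 400 x 400 px:
    print(iiif_image_url(base, "ms-123/f1r", region="pct:0,0,50,50",
                         size="!400,400", quality="gray"))
```

Because the pattern is shared, the same URL could be consumed by any IIIF-aware viewer or analysis script, which is the source of the reduced lock-in described above.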
Standards can break data out of ‘silos’ and enable the interchange and reuse of digital cultural heritage data, but it is Linked Data and semantic web technologies that allow the benefits of reusing and interconnecting digital cultural heritage to be fully realised at scale. An extension of the World Wide Web, the semantic web has as its main purpose the interconnection of data through a set of tools and techniques that structure and relate information on the web so that it can be shared, discovered, integrated and reused efficiently by both humans and machines. Cultural heritage data is highly heterogeneous, multilingual, semantically rich and distributed. To achieve interoperable creation, publication and reuse of such rich and varied content, information must be available in a standardised, searchable format through the application of the principles and technologies of Linked Open Data (LOD) and the semantic web (Jones and Seikel 2016; Bikakis et al. 2021).
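In Linked Data, descriptions are expressed as subject–predicate–object triples whose terms are web identifiers, so records from different institutions can point at one another. The Python sketch below describes a hypothetical object, its production event and its maker using CIDOC CRM classes and properties (version 7 names such as E22_Human-Made_Object), links the maker to a second, equally hypothetical institution’s record, and serialises the result as N-Triples. The serialiser is hand-rolled to stay dependency-free; in practice a library such as rdflib would normally be used. All example.org and example.net URIs are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain string literal, as opposed to an IRI."""
    value: str


Term = Union[str, Literal]  # a plain str is treated as an IRI

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "https://collection.example.org/"  # hypothetical institution


def term(t: Term) -> str:
    """Serialise one term in N-Triples syntax."""
    if isinstance(t, Literal):
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{t}>"


def to_ntriples(triples: List[Tuple[Term, Term, Term]]) -> str:
    """One line per triple, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# An object, its production event and its maker, described with CIDOC CRM;
# the sameAs link points at a second, hypothetical institution's record.
triples = [
    (EX + "object/42", RDF_TYPE, CRM + "E22_Human-Made_Object"),
    (EX + "object/42", RDFS_LABEL, Literal("Embroidered sampler")),
    (EX + "object/42", CRM + "P108i_was_produced_by", EX + "production/42"),
    (EX + "production/42", RDF_TYPE, CRM + "E12_Production"),
    (EX + "production/42", CRM + "P14_carried_out_by", EX + "person/7"),
    (EX + "person/7", RDF_TYPE, CRM + "E21_Person"),
    (EX + "person/7", OWL_SAME_AS, "https://archive.example.net/agent/ab12"),
]

if __name__ == "__main__":
    print(to_ntriples(triples))
```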
Embracing a Linked Open cultural heritage data approach makes it possible to connect data from different institutions, enabling better interoperability between collections and better opportunities for researchers and developers to use that data. A popular approach is to use Wikidata, a form of community-curated LOD, as a node for linking different cultural heritage datasets. Llyfrgell Genedlaethol Cymru/the National Library of Wales has pioneered the use of Wikidata to explore the benefits of LOD, from improving access to collections to data and metadata enrichment. In early 2016 a Wikipedian in residence, Jason Evans, and a Wikidata visiting scholar, Simon Cobb, converted a large body of library collection data into Wikidata: free, open, linked data that anyone can access, interpret and visualise (Evans and Cobb 2016). Using Wikidata to connect common data elements such as people and places has helped to create rich bilingual data which anyone with an internet connection and a computer or other digital device can freely reuse. This example highlights the benefits of adopting open common data standards to streamline services, improve discoverability within datasets and open up opportunities for new collaborations and for the creative reuse that can come from sharing such rich data without restrictions. Building on this work, the Wikipedian established the Semantic Name Authority Repository Cymru (SNARC)17 in order to provide
a central hub for name authority records relating to Wales and in the Welsh language … Based on Wikidata, but using a simplified and customised ontology, the data is presented as Linked Open Data and this makes it easier for us to define relationships between entities in our collections. All the data is bilingual and available on an Open Licence … Our goal is to grow the dataset using data from Welsh cultural organisations, connecting our heritage in one central hub. It will also act as an important bridge between 3rd party linked open data from Wikidata and other sources, and data curated by GLAM professionals around the world. (SNARC n.d.)
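Data held in Wikidata can be queried by anyone through its public SPARQL endpoint. The Python sketch below builds a query for items whose collection property (P195) points at a given Wikidata item, asks for labels in Welsh with English as a fallback, and flattens results in the standard SPARQL 1.1 JSON results format. The collection identifier used in the example is a hypothetical placeholder to be replaced with a real item, the function names are invented, and the network request itself is omitted; Wikimedia’s User-Agent policy asks automated clients to identify themselves when they do make requests.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def collection_query(collection_qid: str, limit: int = 10) -> str:
    """Items whose 'collection' (P195) is the given item, labelled in Welsh, then English."""
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P195 wd:{collection_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "cy,en". }}
}}
LIMIT {limit}"""


def query_url(sparql: str) -> str:
    """GET URL asking the Wikidata Query Service for JSON results."""
    return f"{ENDPOINT}?{urlencode({'query': sparql, 'format': 'json'})}"


def parse_bindings(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one dict of values per row."""
    data = json.loads(results_json)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]


if __name__ == "__main__":
    # "Q_COLLECTION" is a hypothetical placeholder, not a real Wikidata item.
    print(query_url(collection_query("Q_COLLECTION")))
```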
Using Wikidata shows that it is perfectly possible to work with linked open cultural heritage data without developing costly, proprietary, independent platforms that require advanced expertise in knowledge representation and semantic technologies, as well as resources for ongoing management, storage, hosting and continuous development. On the other hand, semantic web technologies also lie at the heart of many large-scale infrastructural investments in the area of cultural heritage, such as the Europeana aggregator and the newly established European Collaborative Cloud for Cultural Heritage (ECCCH), developed by ECHOES (European Cloud for Heritage OpEn Science).18 The latter project, funded by the European Commission and UK Research and Innovation (UKRI), brings together fragmented communities in the cultural heritage field into a new community around the digital commons. It remains to be seen where the idea for a UK National Data Library, discussed earlier, falls between these two poles.
Skills and training
To fully harness the potential of the recent computational turn in cultural heritage, a range of training initiatives and open educational resources have been developed to build capacity among professionals working within and alongside the cultural heritage sector as well as to support researchers in reusing digital cultural heritage datasets and collections. For example, The Programming Historian recently collaborated with The National Archives, UK and Jisc to produce a special series of open educational resources focused on computational analysis of large-scale digital cultural heritage collections.19 Similarly, Library Carpentry develops and delivers workshops to equip librarians and other information professionals in the GLAM sector with essential computational skills.20 Other notable training opportunities include initiatives like the Europeana Academy,21 the Cambridge Cultural Heritage Data School,22 the DARIAH-Campus courses focused on digital cultural heritage23 and Towards Digital Collections: Resources for Galleries, Libraries, Archives and Museums,24 produced as part of the Towards a National Collection programme.
The increasing demand for advanced data literacy within the cultural heritage sector, and the consequent training requirements, have gradually led to the establishment of digital cultural heritage as a recognised academic discipline at university level. For instance, since 2018 the University of Edinburgh has hosted a chair in digital cultural heritage, supported by initiatives like the Digital Cultural Heritage Research Network (DCHRN) and a dedicated Digital Cultural Heritage cluster,25 and it has recently introduced an MSc in Cultural Heritage Futures.26 Similarly, the University of Glasgow has embraced this field with its Arts and Humanities Partnership Catalyst for Digital Cultural Heritage.27 Additional programmes, such as the MSc in Digital Heritage at the University of York,28 further demonstrate the integration of digital perspectives into the academic study of cultural heritage.
Restricting reuse
There are often, however, good reasons for restricting the uses to which digital cultural heritage can be put, and for the nuanced application of licensing to whole or partial collections. A recent challenge to forward-looking cultural heritage organisations that have previously allowed various kinds of use and reuse of their data has been posed by the Large Language Models (LLMs) that underpin Generative AI. LLMs and the companies behind them are hungry for data to train their models, and cultural heritage organisations, like many other bodies and organisations that post content online, are sources of high-quality data. Many LLMs have been trained on openly available cultural heritage datasets or creative works from various knowledge sectors, often without permission or credit. However, the ongoing legal uncertainty surrounding AI model training has made institutions increasingly hesitant or unwilling to share their content, reversing progress made over the past two decades towards greater openness and accessibility of their data. While the regulation of copyright and licensing for AI training data and outputs is still evolving – various countries are exploring approaches such as text and data mining exceptions, transparency measures and technical standards – GLAM institutions and stakeholders are responding in various ways to address the ongoing legal grey area, ranging from limiting access to their collections to implementing innovative strategies for monitoring and preventing unauthorised use of their datasets.
The National Library of the Netherlands, for example, has restricted access to its online collections for ‘commercial parties who crawl digital resources on websites on a large scale for training models’, using a combination of ‘technical measures’ and changes to its terms of use. At the same time, it remains committed to encouraging ‘academic research based on our collections as much as possible. We guarantee that this reuse shall not be hindered by our measures against AI companies’ (National Library of the Netherlands n.d.). This is a difficult balance to strike for GLAMs that have spent more than a decade building a strong culture of openness around their digital collections. The ethical and legal problems that the use of Generative AI may pose in this space (for example, copyright, data protection and ethical sensitivities) are still emerging, and they add a new layer of complexity to the discussion about openness and the use and reuse of digital cultural heritage collections. Tools such as Nightshade, developed at the University of Chicago, offer a more drastic way of preventing the unauthorised exploitation of artistic works. Nightshade is ‘a tool that turns any image into a data sample that is unsuitable for model training. More precisely, Nightshade transforms images into “poison” samples, so that models training on them without consent will see their models learn unpredictable behaviors that deviate from expected norms’ (The Nightshade Team n.d.). This is a highly contested and rapidly changing area of research and practice, and public trust may be eroded not only in AI but also in the institutions that may wish to share their data and collaborate with technology companies.
The growing focus on use and reuse in relation to digital cultural heritage – by cultural heritage institutions themselves, by researchers and practitioners, and by a range of other users and stakeholders – speaks to the enormous potential value of this new form of heritage material and indicates a growing willingness to embrace openness, transparency and even a relative loss of control over some forms of collections data. Standards, documentation, new tools and technologies, and training and skills are part of the solution for encouraging new and innovative uses of digital cultural heritage, but they also hint at some of the challenges to be faced. Capacity building and effective resourcing are essential if progress is to be sustained and if further imaginative reuse is to be encouraged and supported where appropriate.
Notes
1 The licence chosen was the very first iteration of the CC0 Universal Deed, https://creativecommons.org/publicdomain/zero/1.0/deed.en [accessed 4 September 2024].
2 Copyright, Designs and Patents Act 1988, https://www.legislation.gov.uk/ukpga/1988/48/section/29A [accessed 30 January 2025].
3 Directive on Copyright in the Digital Single Market, https://eur-lex.europa.eu/eli/dir/2019/790/oj [accessed 30 January 2025].
4 EUIPO Out-Of-Commerce Works Portal, https://euipo.europa.eu/out-of-commerce/ [accessed 30 January 2025].
5 EUIPO Orphan Works Database, https://euipo.europa.eu/orphanworks/ [accessed 30 January 2025].
6 The Santa Barbara Statement on Collections as Data, https://collectionsasdata.github.io/statement/; the Vancouver Statement on Collections as Data, https://zenodo.org/records/8342171 [accessed 20 January 2025].
7 New York Public Library, https://github.com/NYPL-publicdomain/data-and-utilities [accessed 30 January 2025].
8 The Cooper Hewitt Museum, https://github.com/cooperhewitt/collection [accessed 30 January 2025].
9 The Tate Collection, https://github.com/tategallery/collection [accessed 30 January 2025].
10 Museums of the City of Paris/Paris Musées, https://apicollections.parismusees.paris.fr/ [accessed 30 January 2025].
11 V&A Collections, https://developers.vam.ac.uk/ [accessed 30 January 2025].
12 Wellcome Collection, https://developers.wellcomecollection.org/ [accessed 30 January 2025].
13 Data Foundry, National Library of Scotland, https://data.nls.uk/ [accessed 30 January 2025].
14 Archival Resource Keys, https://arks.org/ [accessed 30 January 2025].
15 Digital Object Identifiers, https://www.doi.org/ [accessed 30 January 2025].
16 International Image Interoperability Framework, https://iiif.io/ [accessed 30 January 2025].
17 Semantic Name Authority Repository Cymru, https://snarc-llgc.wikibase.cloud/wiki/Main_Page [accessed 30 January 2025].
18 European Collaborative Cloud for Cultural Heritage, https://www.echoes-eccch.eu/ [accessed 30 January 2025].
19 The Programming Historian, https://programminghistorian.org/en/jisc-tna-partnership [accessed 30 January 2025].
20 Library Carpentry, https://librarycarpentry.org/ [accessed 30 January 2025].
21 Europeana Academy, https://pro.europeana.eu/page/europeana-academy [accessed 30 January 2025].
22 Cambridge Cultural Heritage Data School, https://www.cdh.cam.ac.uk/events/39077/ [accessed 30 January 2025].
23 DARIAH-Campus, https://www.dariah.eu/2024/11/01/dariah-campus-courses-on-digital-cultural-heritage-a-path-through-cultural-heritage-data-data-modelling-and-europeana-apis/ [accessed 30 January 2025].
24 Towards Digital Collections, https://www.towardsdigitalcollections.org/ [accessed 30 January 2025].
25 Digital Cultural Heritage Research Network, https://dchrn.de.ed.ac.uk/ [accessed 30 January 2025].
26 Cultural Heritage Futures, https://efi.ed.ac.uk/programmes/cultural-heritage-futures [accessed 30 January 2025].
27 University of Glasgow College of Arts and Humanities, https://www.gla.ac.uk/colleges/arts/knowledge-exchange/catalyst/ [accessed 30 January 2025].
28 Digital Heritage (MSc), https://www.york.ac.uk/study/postgraduate-taught/courses/msc-digital-heritage/ [accessed 30 January 2025].