A Matter of Trust



3. Data, information and records: exploring definitions and relationships

Geoffrey Yeo interviewed by James Lowry1

The achievement and measurement of the Sustainable Development Goals (SDGs) depend on the availability of trustworthy data from a variety of sources. Records, especially records created by government agencies, are often identified as one of the most important sources from which such data can be derived. Over centuries, the records and archives management profession has developed approaches to maintaining, controlling and contextualising records, which can help users assess the trustworthiness of the records and perhaps also the quality of the information that can be gained from them. With so much vested in the SDGs, it has become increasingly important to interrogate these terms – records, information and data – to achieve a better understanding of how they are interrelated. In this interview, James Lowry asks Geoffrey Yeo, author of Records, Information and Data,2 to analyse the distinctions and relationships among these concepts.

JL: I’d like to ask you about some of the themes you discuss in your book and their relevance to the challenges of measuring global progress towards the SDGs. In the book, you explore a number of topics that may help us to understand what is meant when people speak about ‘information’, ‘data’ and ‘records’, and how they affect the indicators used to measure progress towards the SDGs.
GY: I certainly hope that my book will make a useful contribution. It doesn’t specifically address the SDGs, but it looks more generally at records, information and data, the different meanings that people have attached to these terms, and the different ways in which their relationships have been interpreted and understood.
I had a number of aims in writing the book, but one key aim was to examine the growing tendency among records professionals (records managers and archivists) to try to explain records in terms of information. Philosopher John Searle has described information as ‘one of the most confused and ill-defined notions in ... intellectual life’.3 Why, then, does it have such a high profile in contemporary discourse? Why have records professionals attributed such importance to it in recent years? These questions turned out to have many ramifications, and investigating the connections – real or supposed – between records and information turned out to have many more.
The approach I took is rather different from the approach followed by most writers about records management and information management. The book looks at many varied ideas about the meaning or meanings of ‘information’ and explores many aspects of what it calls ‘the role of recordkeeping in an information culture’. It argues that concepts of information, although currently fashionable, don’t provide an adequate foundation for understanding how records are made or how they operate. Information is not what records are, nor is it what they contain, but it is, perhaps, what we may hope to gain from using them intelligently. Information should be associated with the use of records rather than with their creation.
JL: It is generally accepted that data play important roles in assessing and achieving the SDGs, and they have often been considered to have close relationships to records and also to information. Where do data fit into your analysis?
GY: Originally, I hadn’t intended to say much about data. But, as my work on the book progressed, I found that I couldn’t adequately explore the concepts of records and information, and the associations between them, without considering data and the burgeoning worlds of data science and data management. Relations between records and data became an important theme of the book. As you know, it’s a topic on which there are widely varying opinions.
I think a lot of difficulty arises because of uncertainty about what we mean when we speak of ‘data’. Does the term refer to anything that is, or can be, stored on a computer? Or only to digital materials that are in some way meaningful? Or only to materials that exist in structured formats (as, for example, in databases)? Or is ‘data’ a wider term, embracing a range of non-digital as well as digital resources? Each of these views has its advocates, and those who adhere to any one view often take it for granted, not recognising that other people may understand the term ‘data’ very differently. When people talk about the importance of data for achieving – or measuring progress towards – the SDGs, they don’t always take the trouble to explain what they mean by ‘data’.
JL: Other contributors to this book will have much more to say about connections between data, records and the SDGs; I’d like to focus our conversation on records and their relationship (or relationships) to data. You have characterised one view of this relationship as sequential, where records are made prior to the creation of data.4 You see an association between this view and environments where digital systems employed structured data, and records were made and kept in paper form.
GY: This is not my own view, but it’s a view that often surfaces in the literature. It sees records as ‘source documents’ from which data are extracted or derived. For those who support this view, records arise from the conduct of organisational business; data entry clerks then examine the records, identify appropriate content, format or code it, and input it as data into ‘structured’ database systems. The data in these systems are then used for a range of purposes including administrative and financial control, strategic planning and decision-making: purposes beyond those that led to the original creation of the records. For example, employment data can be derived from records of staff appointments, agricultural data extracted from land surveys, or environmental data aggregated from records of impact assessments.
Sometimes, a chain of processes is involved. Coders or data entry clerks identify relevant details in what are supposedly ‘unstructured’ records and enter them into a ‘structured’ database. Or this task may be automated, perhaps using some kind of recognition technology. Either way, this initial ‘data entry’ is a preliminary to further processing. When the initial entry is complete, a computer program takes the structured textual data and uses them to create processed or ‘computational’ data, or statistics of various kinds. More complex routes involving multiple stages of further processing are also possible.
From this point of view, the quality of the data depends on the quality of the original records from which they are drawn. The reliability, accuracy and trustworthiness of the records determine the reliability, accuracy and trustworthiness of the data and statistics derived from them. Poorly kept records, it is argued, result in inaccurate, incomplete or unverifiable data, which can lead to organisations wasting resources attempting to process or analyse data that are of poor quality. Worse, governments and donor agencies can be misled into making ill-informed decisions with potentially damaging consequences. Skewed findings, misguided policy initiatives and misplaced funding can all have devastating effects on people’s lives. Open data projects may also rely on data derived from poorly kept records; and so citizens may unwittingly be provided with data that are untrustworthy. Data can be collected from interviews, experiments, surveys, measurements or calculations as well as from records, but when data are extracted from records, the ability to trace the data back to the records from which they are derived is an important issue. Records can also serve to document the procedures by which data are collected and the processing methods that are applied to data, and further serious difficulties can arise if records serving these purposes are not made and kept to appropriate standards.
These views have been expressed in many reports issued by the International Records Management Trust (IRMT) over the past 20 years. They were first articulated by Piers Cain and Anne Thurston in the late 1990s, at a time when the first automated systems were being acquired in low-income countries. Donor agencies were actively encouraging governments in these countries to adopt automated systems, particularly (at that time) for personnel and payroll data. Automation was often seen as the solution to the inadequacies of existing paper systems where records had been poorly maintained and the information that could be obtained from them was frequently incomplete or unreliable. At the same time, however, the existing paper records were seen as ‘the primary sources of the data needed for input into the automated system’.5 Automation didn’t solve the problem of reliability; it simply transferred the problem from a paper-based to a digital environment. The IRMT affirmed that the answer lay in effective records management controls, which would support and ensure not only the systematic creation and survival of the records that were needed, but also their orderliness, trustworthiness and continuing accessibility.
I don’t want to suggest that these issues have become outdated or unimportant, but they are characteristic of an era when records chiefly took the form of paper files and data were associated only – or very largely – with structured automated systems; databases often had to be populated from paper sources. Today, in wealthier countries – and increasingly also in many less wealthy ones – paper files are becoming obsolete, records are being maintained in digital rather than paper form, and the world of recordkeeping looks very different.
Recently, we’ve heard much about ‘datafication’, which may also lead us to rethink our approach to these issues. The term ‘datafication’ became popular after it was used in a 2013 book by Viktor Mayer-Schönberger and Kenneth Cukier.6 To ‘datafy a phenomenon’, in the words of these writers, is to put it in a format that allows it to be tabulated and analysed. Elsewhere, Cukier and Mayer-Schönberger tell us that ‘datafication is … taking all aspects of life and turning them into data’.7 More specifically, it seems, it is about transforming resources so that they can be analysed in depth using new computational and analytical techniques from the realms of big data and artificial intelligence. It has often been noted that, by using these techniques to detect and analyse themes, patterns and relationships in digital materials, we will be able to open up innovative modes of discovery and investigation.
Other commentators have picked up these ideas, and I think that datafication can now be understood in at least two senses: it can be interpreted as a practical imperative to create resources in, or convert them into, datafied forms; more conceptually, it implies an intellectual reframing of all digital objects as data amenable to computation. These changes are having a major impact, not only on governments and businesses, but also on the world of scholarship. Academic writings now often refer to the ‘datafication of the humanities’ or the arrival of ‘computational social science’.8
These developments certainly have relevance for the records discipline. Some records professionals, and some computing experts, have begun to look at digital records – and digitised versions of analogue records – from the specific perspective of data science, and see them as candidates for participation in computationally-based data analysis projects. Advocates of datafication argue that reconceptualising records as data – or perhaps transforming records into data – is moving us to ‘a world in which … the whole record can be mined and analysed’.9 They have also argued that, if this transition is to succeed, digitisation processes for paper and other analogue records will have to be restructured to generate computer-processable data rather than mere digital images. At the same time, we are told, creators of born-digital records should be persuaded to use analytics-friendly formats.
As yet, these changes have had little impact in most low-income countries, where the pace of technological development has been less rapid. At present, many low-income countries are still heavily dependent on paper records; even as digital applications are being introduced to support the current work of government agencies, these countries often have little capacity to manage records in digital form. Indeed, in many of these countries, there is no recognition that digital records are records or that recordkeeping principles should be applied to them. Issues such as these remain of primary concern today. In these circumstances, the so-called ‘datafication’ of records may appear to be a topic of interest only to wealthier countries and may seem to have little immediate relevance in poorer areas of the world.
Nevertheless, the changes that are under way in wealthier countries suggest that the need to translate data from records into structured databases is becoming outmoded in new environments where records are largely digital and analytical tools can be applied directly to them. Although at present the older models of using paper records as a source for digital data entry and of converting ‘unstructured’ records to ‘structured’ data still have validity in many low-income countries, in the future we can expect them to be superseded everywhere as new skills are developed and newer technologies become more widely available around the world. As notions of datafication become more widespread, it will become more apparent that we need no longer see records and data as two distinct kinds of entity; instead, datafication suggests that records themselves can be interpreted as data that we can mine, analyse, reuse and repurpose.
JL: In my work with the international development community I’ve noticed that the concept of ‘records’ is often seen as quaint or irrelevant. Given that advocacy for recordkeeping requires us to speak the language of stakeholders, users, budget holders, etc., should we abandon the language of records in favour of the language of data?
GY: Despite changes in technology, the challenges that Cain and Thurston identified 20 years ago haven’t disappeared. It remains the case that, when one set of resources (let’s call it ‘A’) is analysed or processed to create another resource (‘B’), the utility of B always depends on the qualities of A, as well as on the processing methods employed. Regardless of whether we want to label A and B as ‘data’ or ‘records’, the old adage ‘garbage in, garbage out’ still applies. And irrespective of whether we choose to speak of ‘data’ or ‘records’, the issues of ill-informed decision-making, misguided policies, misplaced funding and failed attempts at open government still arise, with all their consequences for the lives of citizens in lower-income countries. Similarly, in the context of the SDGs, we will find it impossible to measure whether the SDGs have been achieved if the resources for assessing their achievement are unavailable, inadequate or unreliable. In addressing these challenges, we undoubtedly need to find a language or languages to help us articulate our understandings and communicate our concerns and proposed solutions to other stakeholders. There seems to be a case for abandoning the kinds of distinction between records and data we have made in the past and seeing whether we can achieve more practical success if we frame our approaches in a different way.
Where communication with others is concerned, it may sometimes be appropriate, or more effective, to talk about data; talking about records does not always seem to have the same resonance. As I wrote in my book:

the language that now carries weight … in the corridors of power is the language of data and information, and many records professionals … feel a political imperative to adopt this language when they seek to convince resource allocators or government policy-makers that they can contribute to the 21st-century digital landscape.10

But this approach has its own difficulties and drawbacks. One difficulty is that some of the people with whom we speak are likely to assume that data are always created and maintained digitally, and that analogue records have altogether fallen out of the picture, which is certainly not the case. Another difficulty with speaking of records as ‘data’ is the widespread notion that data are simply ‘raw facts’ or ‘sources of truth’ and are wholly or largely independent of social and contextual influences. As I noted in my book, when we look at a database or dataset:

no one seems to be making statements; no one is affirming that they can vouch for the data; the apparent absence of signs of authorship gives the impression that the data are uncontroversial and objective.11

But, of course, data aren’t autonomous, independently valid or context-free. They are always conditioned by the practices used to generate them and the circumstances that led to their production. Data are rarely as uncomplicated as they seem. The ‘facts’ they present are propositions about the world or about actions or events: propositions stated by humans, or by computing devices programmed by humans, in particular contexts. And this means they can never be exempt from social constraints or from possibilities of error, ambiguity or bias.
So when I said that there seems to be a case for abandoning the kinds of distinction between records and data that we’ve made in the past, I wasn’t trying to suggest that records managers and archivists should forsake the concept of ‘records’ in favour of speaking and thinking only about ‘data’. On the contrary: I wanted to suggest that, rather than simply reconceptualising records as data, records managers and archivists – at least in their own professional discourse – might usefully be encouraged to understand data as records. Viewing records as data opens the way to employing powerful analytic tools, which will enable new modes of future investigation and research; but viewing data as records reminds us that data are shaped by their cultural contexts and that effective use and comprehension of them will only be possible if knowledge of their contexts is safeguarded.
Placing emphasis on records rather than data also reminds us that records do much more than communicate facts, or supposed facts, about the world. They also allow us to express ideas, opinions, emotions and predictions; to pose questions, issue orders, make promises or establish rights and responsibilities. In coordinating human behaviour and social relations, they are part of the way we conduct business and live our lives. In the digital world as much as in the analogue, records are more than ‘data’; they are instruments through which social actions are achieved.
Alan Bell12 has written about the dangers that can arise when archivists and records managers choose to speak about information rather than records – particularly the dangers that they may be led to forget, disregard or even deny the importance of what David Bearman13 called ‘recordness’ – and it seems to me that the same dangers will arise if we are over-enthusiastic in adopting the vocabulary of data. Perhaps we can or should use this vocabulary when we think it is politically necessary, while remaining aware that it doesn’t offer us a solid base for reflective professional thinking about records and their keeping. But our professional leaders and professional associations also need to promote a campaign to overcome the idea that records are a ‘quaint’ legacy of an archaic paper-based world, and to reaffirm the continuing importance of records in the 21st century, both as instruments of current social action and as bulwarks that support our knowledge and understanding of past events.
JL: Could we agree with the many data scientists, analysts, journalists and others who see paper records as data?
GY: I think you are right when you say that analysts and commentators from many different backgrounds see paper records as data, but I’m not sure that they would all approach this question in the same way. Once again, much depends on what people mean when they speak of ‘data’. Those who perceive data as essentially structured materials – the kinds of materials we typically find in relational database systems – should have little difficulty in recognising that, before the advent of digital technology, similar materials were created using paper records: ledgers, registers, card indexes and the like. It seems to me that, if we accept this premise, it requires no great conceptual leap to understand these paper artefacts as data, or at least to understand that they hold data.
Individuals who use records for purposes of academic scholarship might approach your question in another way. Many scholars – particularly in the field of history, but also in other disciplines – have long had a perception of records as data that can support their research. They use the word ‘data’ to refer to the materials they can employ to unravel a problem and reach conclusions to their investigations. Historians and other scholars who use the word ‘data’ in this sense do not seek to limit its scope to digital resources. Nor, I think, do they see data as resources that are necessarily or primarily in structured form. Some historians may see records as the only ‘data’ they need; others may say that records – whether digital or analogue – are simply one of many different kinds of data they employ in their research.
Despite the appropriation of the term ‘data’ by the computing industry during the past half-century, my own view is that it can still be applied to paper as much as to digital materials. But I also think that this is an area where we need to proceed cautiously. As I said, one of the risks that records professionals run in adopting the language of data is that other stakeholders in the workplace may assume that data are always digital. The broader scholarly view that ‘data’ can embrace many different media is not always acknowledged outside academic circles. Any expectation that everyone in government, in business or in the international development sphere will recognise analogue records as data is likely to give rise to misunderstandings and failures in communication.
JL: I think that data are the building blocks of records, whether the records are paper or digital. Does this differ from the ‘sequential’ view of records and data, or do you think they are much the same?
GY: The ‘building blocks’ view of records and data is not the same as the ‘sequential’ view, although both views have had distinguished advocates. Whether you accept the ‘building blocks’ view will partly depend on what you think we mean by data, and on the levels of granularity at which you believe data exist.
If you agree with the computer scientists who tell us that a single bit or byte is ‘the smallest unit of data a computer can handle’,[14] then yes, you can reasonably claim that low-level data such as bits or bytes are not themselves records, but are ‘building blocks’ from which a digital record can be constructed. Similarly, in the paper world, a single pen-stroke, a single letter of the alphabet or a single word might perhaps be construed as ‘data’, but I think they cannot so easily be construed as records; pen-strokes, alphabetical and numerical characters and words are not records but building blocks of records.
But if you think that ‘data’ must refer to something less granular than a single bit, byte, character or word – if you think that data must be capable of conveying a greater degree of meaning – matters become a little more complicated. One widely held view of the term ‘data’ is that it generally refers to structured statements such as ‘President: Joe Bloggs’ or ‘number of widgets in stock: 39’. From a records perspective, statements of this kind can be seen in different ways. If we want, we can certainly see them as building blocks of records, but each of them can also be seen as a complete record in itself. If we are seeking a record and find an entry of this kind in a database, we are not obliged to look for further components; we have found a record of an assertion that Bloggs is the president, or an assertion that 39 widgets are in stock.
From a computer science perspective, too, entries like these can be seen in different ways. ‘Number of widgets: 39’ can be seen as data at one level, but it can also be seen as a building block of a larger set of data at a higher level. I think we should accept that data and records may both exist at different levels of granularity. Lower-level data can be (and often are) used to construct larger aggregations of data; lower-level records can be (and often are) used to construct larger aggregations of records. And perhaps this is little more than two ways of saying the same thing.
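The point about levels of granularity can be illustrated with a minimal Python sketch. It is an illustration only: the names `Entry`, `as_bytes` and `aggregate` are hypothetical, and the two entries simply reuse the examples given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One structured statement, such as 'President: Joe Bloggs'."""
    label: str
    value: object


def as_bytes(entry: Entry) -> bytes:
    """Lower level: the bytes from which the entry is built."""
    return f"{entry.label}: {entry.value}".encode("utf-8")


def aggregate(entries):
    """Higher level: several entries combined into a larger set."""
    return {e.label: e.value for e in entries}


# The same statement can be viewed at three levels of granularity.
stock = Entry("number of widgets in stock", 39)
president = Entry("President", "Joe Bloggs")

low_level = as_bytes(stock)               # bytes: building blocks
single = stock                            # one entry: a complete assertion
register = aggregate([stock, president])  # a larger aggregation
```

Nothing in the sketch decides which level counts as ‘the data’ or ‘the record’; it only shows that the same content can be read at each level.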
JL: I have argued that machines such as autonomous cars and autonomous weapons are record-making devices if they receive, transmit or store data with even basic contextual metadata, but this is rooted in the notion that records are data with metadata and structure. Some of the SDGs will depend on sensor data, both to be achieved and to be measured. How should we recognise records in devices and systems?
GY: I’m sure you are right that the data captured by so-called ‘smart’ or autonomous devices are records. They are records of the functioning of the device and of its sensing of the environment in which it operates. And, given the potential for these devices to act in ways that could have major consequences for human lives, it seems vital that such data should be recognised as records that may need to be retrieved and interpreted in future, and should be preserved and managed accordingly. As with so many initiatives in computer technology, there is a serious risk that the recordkeeping requirements will not be recognised by the developers of these devices or will only be recognised at a stage in their development when it is too late to implement them satisfactorily.
The only point where I might disagree with you concerns the need for separate contextual metadata. Distinctions between data and metadata aren’t always clear-cut in data-centric environments; what person X thinks of as metadata may be perceived by person Y simply as further data that the device has captured. I’d argue that the data captured by these devices are records even if their metadata aren’t separately identified. And that metadata, whether separately identified or not, are also records.
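A minimal Python sketch may help to show why the boundary between data and metadata depends on the viewer. Everything in it is an assumption made for illustration: the field names, the values and the helper `split_view` are hypothetical and do not describe any real device.

```python
# One reading captured by a hypothetical sensing device.
reading = {
    "device_id": "vehicle-17",              # hypothetical identifier
    "captured_at": "2020-01-01T12:00:00Z",  # hypothetical timestamp
    "sensor": "lidar",
    "distance_m": 4.2,
}


def split_view(reading, metadata_keys):
    """Partition one reading into (data, metadata) as a given viewer sees it."""
    metadata = {k: v for k, v in reading.items() if k in metadata_keys}
    data = {k: v for k, v in reading.items() if k not in metadata_keys}
    return data, metadata


# Person X treats the identifying and temporal fields as metadata.
x_data, x_meta = split_view(reading, {"device_id", "captured_at", "sensor"})

# Person Y treats every captured field simply as data.
y_data, y_meta = split_view(reading, set())
```

Both views are drawn from the same captured reading, and neither changes what the device actually stored; only the labelling differs.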
JL: You have written that ‘record-making is always … bound to contexts of social action’.[15] It could be argued that although data collection takes place in social contexts, it is not necessarily bound to those contexts in the same way as it is for records, since bonds of this kind would require a persistent relationship to contextual metadata. You also wrote that the creation and transmission of records are ‘not a matter of information, but a matter of social action’.[16] Can this also be said about data? Could we summarise this line of thinking by saying that a record is data with metadata?
GY: Rather than arguing that records are data, I prefer to explore the idea that data are records. Some of our colleagues have claimed that only some data are records,[17] but I’m increasingly inclined to the view that all data, if they persist in a stable form beyond their moment of creation, have record characteristics. In my book, I proposed a number of arguments in favour of seeing persistent data as records. Data are not context-free but arise from particular acts of statement-making or recording that take place at particular moments. Over time, they are also likely to be subject to interventions from their custodians or users, interventions that add to the richness of their contexts. Even if our knowledge of those contexts is imperfect or has been lost, the data are still shaped by the contexts in which they have been created and stored; the bond doesn’t disappear simply because we have little or no knowledge of it.
What about the need for metadata? Well, some people say that ‘if there are no metadata, it’s not a record’; I think this may be what you are suggesting? Of course, contextual metadata are beneficial, because they help reduce the risk of total loss of contextual knowledge. And let’s not forget that there are many other kinds of metadata that serve other equally useful purposes. However, I don’t think it’s as simple as that. Replying to your previous question, I said that distinctions between data and metadata aren’t always clear-cut. In data-centric environments, it’s not always necessary to identify metadata as a separate category; data in which assertions are made about context can be very useful even if they don’t sit in a little box labelled ‘metadata’. We can still have records even when their metadata aren’t separately identified; we don’t need to find the little box labelled ‘contextual metadata’ in order to know whether we are looking at a record.
Now I’d like to go further and suggest that we can encounter records even when assertions about their context seem wholly absent. A good example might be the 11th-century survey that we know as Domesday Book. Today, of course, it is surrounded by vast quantities of metadata; the book and its contexts have been described on countless occasions. But when it was compiled in 1086 it must have stood alone in glorious isolation, with no metadata and no written contextualisation of any kind. Its contexts were well-known to its users, and no one felt it necessary to inscribe them; arguments about the need for contextual metadata didn’t arise in 11th-century England. Advocates of the mantra ‘if there are no metadata, it’s not a record’ presumably have to believe that Domesday Book wasn’t a record until somebody catalogued it, many years later. But I’m sure that, like me, you will find this absurd. The status of Domesday Book as a record – as one of the most valued records that survives from the Middle Ages – has nothing to do with its metadata. Of course, present-day records systems require metadata if they are to function effectively; metadata are far more necessary in a 21st-century era of record abundance than in an 11th-century era of record scarcity. Users of records can be seriously handicapped when metadata are missing or inadequate for their needs. Nevertheless, while the presence of metadata is always a very good thing, it doth not a record make. Records are records even when the metadata we seek are lacking.
JL: If data can be combined into records, and if configurations of these data can constitute evidence, do we need to revisit legal theory as a foundation for defining records? It seems that courts often consider many forms of data, information and records to be evidential, whether or not they meet archival standards of trustworthiness.
GY: As I’ve said, I’m not sure that it’s entirely helpful to talk about data being combined into records. There are other, and, I think, more fruitful, ways of looking at the relationships between data and records. But the question of whether legal theory provides a foundation for defining records is a separate issue; it doesn’t depend on our understandings of data and their combination.
In the past, certainly, there has been a long tradition of seeing records in legal terms. When the Public Record Office was set up in London in 1838, its remit was limited to the records of courts of law; the writings of the administrative departments of government weren’t formally deemed to be ‘records’, and the Office’s responsibilities weren’t extended to administrative writings until the 1850s. In England, the idea that records emanate only from law courts dates back to the early Middle Ages, when ‘record’ was a formal oral recollection of court proceedings. When oral methods of recalling judicial business were superseded by writing, the word ‘record’ was applied to their written successors, and definitions of ‘record’ that confined the record to legal settings persisted down to the 19th century. As early as the 17th century, however, the word ‘record’ was being used more widely outside the legal world; over time, it became increasingly common for people to speak of the ‘records’ of any institutional body, and – more recently – of the ‘records’ of families and individuals. I’m sure that, today, almost no one in England would be likely to restrict the word to the records of the law courts.
In continental European countries with systems of civil law, legal traditions are very different. In these countries, the word ‘record’ is largely unknown; lawyers, diplomatists and archivists in civil-law countries have traditionally spoken of ‘documents’, and the evidential function of documents has been analysed in jurisprudence and embedded in law over many centuries. In recent years, some records professionals have chosen to equate the civil-law ‘archival document’ with the English word ‘record’,[18] although I think it’s open to question whether this equation is fully correct. Of course, in English-speaking countries, the common law also recognises that records (whether emanating from law courts or from other places) have evidential aspects. Indeed, the word ‘evidence’ belongs to the English common-law tradition; civil lawyers have generally preferred to use words such as ‘proof’ and ‘authentication’. But the common-law view of the evidential aspects of records is perhaps less rigorous and systematic than the view you find in the traditions of continental Europe.
You ask how far I think legal theory might still provide a foundation for defining records. The first point I’ve tried to make is that we aren’t dealing with a single legal theory here. Civil-law ideas about ‘documents’ are different from common-law ideas about ‘records’; and there may be other legal traditions, such as sharia law, that could or should be taken into account. My second point would be to sound a note of caution about the idea of ‘defining records’: we can offer definitions that help us examine a range of different perceptions and understandings of records, but I don’t believe that we will ever be able to construct an incontrovertible statement of ‘what a record is’.
Having said this, I think it is important to acknowledge that legal theories have been a major force shaping people’s understandings of records in the past; and the ways in which we understand records today can’t be wholly independent of the understandings we have inherited from earlier generations. And, of course, legal aspects of record-making and recordkeeping still influence our work today. We can see this, for example, in the work of national standards bodies on records’ legal admissibility and evidential weight. But today we recognise, or should recognise, that the role of records is not limited to the provision of evidence. We also recognise that the evidential role of records isn’t confined to legal circles: auditors, journalists, historians and many other users may see records wholly or partly in evidential terms. The legal aspect of making and keeping records is certainly a part of the mix, but it is not the only part – and not even the most significant part, in my view.
JL: You and others have argued that the evidential paradigm should not dominate recordkeeping theory. You have said that, in addition to evidence, records can offer other benefits including memory and senses of individual and communal identity. What are the important qualities of records if we start from a position where records are testimony of the personal or cultural?
GY: I didn’t mean to suggest that evidence is unimportant. If they had no means of uncovering evidence, institutions that seek to promote justice and accountability would be unable to function or would find their functioning severely impaired; individuals would often be unable to assert their rights against powerful vested interests. On some occasions, human witnesses can supply evidence when it is needed; on other occasions, especially when human witnesses are unavailable or untrustworthy, institutions or individuals rely on records to obtain the evidence they require.
But all the benefits – I call them ‘affordances’[19] – that records offer can be important to those who rely on them. Consider, for example, the role of records in supporting memory. Human memory, we know, is fallible, and many people depend on records to redress its failings. Some people may claim that they live only for the present or the future and that memories of the past are unimportant to them, but others affirm that their lives would be empty and meaningless without such memories. Information, too, is an affordance of records, and different people and different cultures around the world will assign different values to affordances such as evidence, information and memory. Some will find affordances that others don’t recognise.
I was intrigued by your use of the word ‘testimony’ in the last part of this question. It’s a word that records professionals don’t use as often as one might expect. I like it because it carries resonances of people who say ‘I can tell you about it because I was there. I saw what happened with my own eyes’. Creation of records implies direct participation in, or first-hand knowledge of, an action or event. I concede that it’s possible to find examples of records created by people who don’t have such immediate knowledge: the official record of a birth, for instance, is made by a registrar who was not present when the baby was born, but who relies on statements made by others with first-hand knowledge of the birth. However, I’d argue that, for most people, a key aspect of what we think of as records is that their creators participated in the actions or events they represent or were able to call on first-hand knowledge of them.
An account of events written by someone without first-hand knowledge (such as a school textbook on medieval history) can be valuable in its own right, but we don’t usually think of it as a ‘record’ (or ‘testimony’) of the events that the author has written about. We could perhaps say that one of the ‘important qualities’ of records – I’d prefer to say one of the qualities that people tend to look for in records – is that they were created by someone closely connected to the matters they represent. Or by a mechanical device with a similarly close connection; the sensing devices you asked me about earlier offer a useful reminder that records in today’s world don’t have to be created by humans.
Nevertheless, I have a couple of reasons for being cautious about describing records as ‘testimony’. First, because the word ‘testimony’ is closely associated with ideas about ‘witnessing’, many people will naturally associate it with the role of witnesses in a court of law. Although this isn’t the only sense in which we can speak of ‘testimony’, I feel that the word can’t be wholly disengaged from ideas about evidence, and particularly legal evidence presented in court. Yet evidence, as I’ve said, is only one among many affordances that records offer. When I speak of the connection between records and actions or events, I choose not to describe records as ‘testimony’; I prefer to say that they are representations of actions or events, created by people who participate in, or have close knowledge of, the actions or events concerned. This terminology, I think, gives no primacy to evidence. Of course, no choice of terminology can be wholly neutral, but speaking of ‘representation’ seems less weighted in this regard than speaking of ‘testimony’.
Second, I think the word ‘testimony’ always seems to bear connotations of looking back to some action or event that took place in the past: an action or event that is separate or distinct from the ‘testimony’ that now tells us about it. From the perspective of users consulting records made at an earlier date, this is indeed what records do: they tell us about things that occurred in some other time or place. But at the moment of their creation, records don’t merely provide a retrospect on previous actions; the issuance of a record performs an action in itself. To create a representation is to perform an action, and we can also perform many other kinds of action – we can make statements, ask questions, give orders or enter into commitments – by creating and communicating representations of them. The creation of records always has a role in social action; it is always performative; and it is the performativity of records that gives them much of their authoritativeness and their power. ‘Testimony’ is a valuable concept, but I’d be reluctant to say that ‘records are testimony …’ is ‘a position we start from’.
JL: Given that the SDGs are targets for action in many countries with widely differing circumstances, do definitions of records need to be specific to the contexts of the records’ creation or use?
GY: I’m not sure that they need to be, but I think that in practice they very often are specific to those contexts. Whenever we construct definitions of records, we think of records in particular ways, and those ways of thinking are always conditioned by our own circumstances.
Consider, for example, how records are defined in the international standard for records management, ISO 15489: they are said to be ‘information created, received and maintained as evidence and as an asset by an organization or person, in pursuance of legal obligations or in the transaction of business’.[20] Ostensibly, this is an all-purpose definition that embraces personal as well as organisational records; although the standard is primarily for organisational use, the authors of the definition took care to state that records could be created, received and maintained by individual ‘persons’ as well as organisations. Nevertheless, their reference to the role of records ‘in the transaction of business’ might be thought to betray an organisational bias; their reference to keeping records ‘as an asset’ was undoubtedly influenced by contemporary ideas about the management of corporate ‘information assets’. The definition almost certainly would not have employed this terminology if it had been written by a keeper or user of personal records, or if it had been written at any time before the late 20th or 21st century. The definition is not universal, but was moulded by its authors’ circumstances, which led them to think of records in a particular way.
JL: Looking ahead, I think that recordkeeping – or archival science – will become a specialisation within data science, or within computer science more generally. Do you agree?
GY: No. Undoubtedly, the great majority of records in the foreseeable future will be created and maintained in digital form, and the practical tools we will use to maintain them will be designed using the techniques of computer science. The sheer volume of digital records will make it – is already making it – impossible for records managers and archivists to scale up their traditional manual methods of working, which will have to be replaced by automated processes. The use of computational techniques and artificial intelligence in areas such as description, preservation and access will become an essential part of working life for every records professional. But I don’t believe that archival science as a discipline will be subsumed into computer science. Archival science has concerns for the distinctive societal roles of records and archives, concerns that data science and computer science do not share.
Archival science also embraces – and must continue to embrace – the legacy of many centuries of records created using paper and other analogue media. Human needs for records antedate the invention of writing, and have endured for about 10,000 years across many shifts in technology; the interests of archival science are not confined to digital records, which are a product only of the last half-century. While the challenges and opportunities of digital technologies increasingly occupy the centre of the stage, I am confident that archival science will remain a distinct discipline concerned with understanding, evaluating and managing the records created in the past by non-digital means, as well as the records created digitally in the present and future.
JL: Although you clearly want to differentiate recordkeeping from data science and computer science, I sense that you are very reluctant to provide conclusive or universal definitions of terms such as ‘records’ and ‘data’.
GY: You’re right; this is not a task I would want to attempt. ‘Records’ and ‘data’ are words that can bear a wide variety of meanings and interpretations, both within and across disciplines, and I believe it would be inappropriate to try to impose a single definition of either term. In my book, I argued in favour of a way of looking at records as persistent representations of actions and events: this is a way of looking that I personally have found very helpful. Although I’d prefer not to label this view of records as a ‘definition’, many commentators – perhaps inevitably – have chosen to refer to it as ‘Yeo’s definition of records’. Regardless of how it is labelled, others will be welcome to use, or adapt, it if they find it beneficial to their own thinking, research or practice. But I certainly wouldn’t want to suggest that my way of looking at records is the only possible or only acceptable way; definitions of records remain a moving target.

In countries where the SDGs are objectives for strategic action, individuals and communities will undoubtedly have varied assumptions, ideas and beliefs about the scope of data and records, their interrelationships and their roles in sustainable development. Individual contributors to this volume come from many different disciplines and will also have different conceptual understandings of records and data. Yet I’m sure you’ll agree that collaborative working will be essential if we are to move forward on the issues and concerns expressed in their contributions. If the chapters of this book help different stakeholders to recognise and understand the diverse viewpoints of others with whom they seek to collaborate, they will play a very valuable part in cross-disciplinary communication and cooperation.

1. For biographies of Geoffrey Yeo and James Lowry, see the list of contributors at the beginning of this volume. See also Chapter 8 in this volume.

2. G. Yeo, Records, Information and Data: Exploring the Role of Record-Keeping in an Information Culture (London: Facet Publishing, 2018).

3. J.R. Searle, Making the Social World: The Structure of Human Civilization (Oxford: Oxford University Press, 2010), p. 71.

4. Yeo, Records, Information and Data, pp. 111–12.

5. P. Cain, ‘Automating personnel records for improved management of human resources: the experience of three African governments’, in R. Heeks (ed.), Reinventing Government in the Information Age (London: Routledge, 1999), pp. 135–55, at p. 146.

6. V. Mayer-Schönberger and K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work and Think (London: John Murray, 2013).

7. K.N. Cukier and V. Mayer-Schönberger, ‘The rise of big data: how it’s changing the way we think about the world’, Foreign Affairs, 92 (2013), http://www.foreignaffairs.com/articles/2013-04-03/rise-big-data.

8. T. Blanke and A. Prescott, ‘Dealing with big data’, in G. Griffin and M. Hayler (eds), Research Methods for Reading Digital Data in the Digital Humanities (Edinburgh: Edinburgh University Press, 2016), p. 190; R. Kitchin, ‘Big data, new epistemologies and paradigm shifts’, Big Data & Society, 1 (2014): 1–12, at p. 1.

9. S. Ranade, ‘Traces through time: A probabilistic approach to connected archival data’ (IEEE International Conference on Big Data (Big Data), Washington DC, 2016), https://doi.ieeecomputersociety.org/10.1109/BigData.2016.7840983, pp. 3260–3265.

10. Yeo, Records, Information and Data, p. 198.

11. Yeo, Records, Information and Data, p. 142.

12. A.R. Bell, ‘Participation vs principle: does technological change marginalize recordkeeping theory?’, in C. Brown (ed.), Archives and Recordkeeping: Theory into Practice (London: Facet Publishing, 2014).

13. D. Bearman, Electronic Evidence: Strategies for Managing Records in Contemporary Organizations (Pittsburgh: Archives and Museum Informatics, 1994), p. 133.

14. K.C. Laudon and J.P. Laudon, Management Information Systems: Managing the Digital Firm, 15th edn (Harlow: Pearson, 2018), p. 242.

15. Yeo, Records, Information and Data, p. 129.

16. Yeo, Records, Information and Data, p. 152.

17. See, e.g., K. Anderson, ‘The footprint and the stepping foot: archival records, evidence, and time’, Archival Science, 13 (2013): 349–71, at p. 363; D. Hofman, L. Duranti and E. How, ‘Trust in the balance: data protection laws as tools for privacy and security in the cloud’, Algorithms, 10 (2017): 1–11, at p. 3.

18. L. Duranti, Diplomatics: New Uses for an Old Science (Lanham: Scarecrow Press, 1998), p. 6.

19. For the concept of ‘affordance’, see O. Volkoff and D.M. Strong, ‘Affordance theory and how to use it in IS research’, in R.D. Galliers and M.-K. Stein (eds), The Routledge Companion to Management Information Systems (Abingdon: Routledge, 2018), pp. 232–45.

20. ISO 15489-1: 2016, Records Management. Part 1: Concepts and Principles, clause 3.14.

© authors 2020