8. Assuring authenticity in public sector data: a case study of the Kenya Open Data Initiative

James Lowry

Measuring the Sustainable Development Goals (SDGs), and ultimately the success of the whole SDG initiative, will depend on the availability of authentic public sector data. By the time these data reach policy-makers, either as baseline data or as comparative data indicating progress, they will have been assembled through one or more methods from sources as varied as paper or digital records, management information systems, people and scientific instruments. After collection, they will be curated, cleaned, analysed, augmented and remixed. They will be subjected to formulas in spreadsheets, algorithms in apps and the intervention of people in various roles with various priorities and agendas. They will be distributed, published, cited and (it is hoped) preserved through numerous channels, platforms and systems.

At every stage, the authenticity of these data is potentially at risk. If public sector information is going to be available and authentic, there must be a regulatory environment and information culture that supports openness. Since at least the 1960s, the international open government movement has been working towards laws, policies and standards that support the availability of public sector information, for instance through freedom of information laws. What it has not yet addressed is the need for technical and procedural controls to establish authenticity. This chapter argues that the principles and techniques developed over centuries in the field of recordkeeping to assure the authenticity of the records documenting decisions and actions can also be used to improve data quality, so that the information needed for implementing and monitoring the SDGs is not only available but authentic.

This chapter presents a case study of the Kenya Open Data Initiative (KODI). It examines the level of control presently in place for establishing and maintaining the authenticity of information released through KODI. While KODI is noteworthy as the first government open data portal in sub-Saharan Africa, it is studied here only as an example; problems identified with KODI data can be seen in open government datasets worldwide, to varying extents.1

This chapter begins by defining authenticity in information before providing an overview of KODI (http://www.opendata.go.ke). To identify issues relating to the authenticity of information released via KODI, the chapter then examines a KODI dataset relating to land use. This analysis is discussed in relation to Kenya’s provisions for managing land information. The chapter goes on to describe the processes of preparing, publishing and using the land dataset. It maps the lifecycle of the data and identifies strengths and weaknesses in the controls for protecting the data’s authenticity. Questions about authenticity raise questions about the data’s contribution to implementing and monitoring the SDGs.

Data authenticity

In records and archives management literature, an authentic record is one ‘that is what it purports to be and is free from tampering or corruption’.2 Authenticity depends on ‘integrity’, which is the ‘wholeness and soundness’ of a record, and on ‘identity’ – ‘the attributes of a record that uniquely characterize it and distinguish it from other records’.3 Over many centuries, records and archives professionals have developed principles and techniques for assuring information authenticity – from medieval chancery practices to technical standards for digital information management systems.

What these controls have in common is an emphasis on documenting the provenance and custodianship of records through metadata captured in auditable systems. For instance, the registry system used to control paperwork in the British empire required specific officers to capture specific information in registers and on file covers and flyleaves in such a way that traces of the registered record existed in multiple places and could not easily be erased or doctored. In this way, records became wrapped in metadata that described their management and their movement within and between government offices. The obligations on custodians and the system requirements were mutually reinforcing, so that custodians were encouraged to comply with and give effect to the system through its oversight mechanisms. These same principles can be seen in standards for digital records management systems, such as MoReq and ICA-Req,4 which require that metadata be captured to document all actions in relation to records.5
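To make the idea of an auditable metadata wrapper concrete, the sketch below shows, in Python, the kind of append-only action log that standards such as MoReq and ICA-Req expect a records system to capture. It is illustrative only: the class names, fields and example entries are mine and are not drawn from either standard or from any Kenyan system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class CustodyEvent:
    """One action performed on a record: who did what, when, in which office."""
    actor: str
    action: str          # e.g. "registered", "transferred", "digitised"
    office: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class RecordMetadata:
    """Identity of a record plus an append-only chain of custody events."""
    record_id: str
    title: str
    creator: str
    events: List[CustodyEvent] = field(default_factory=list)

    def log(self, actor: str, action: str, office: str) -> None:
        # Append-only: events are added, never edited or removed.
        self.events.append(CustodyEvent(actor, action, office))


# Hypothetical example: a land record registered, transferred, then digitised.
parcel = RecordMetadata("LR/NRB/1234", "Parcel transfer deed", "Ministry of Lands")
parcel.log("registry clerk", "registered", "Central Registry, Nairobi")
parcel.log("records officer", "transferred", "Lands Reform Unit")
parcel.log("digitisation team", "digitised", "Lands Reform Unit")
```

Because every action adds an entry rather than altering an existing one, the log plays the same role as the traces left in registers and on file covers: later custodians and auditors can reconstruct who handled the record, when and in what capacity.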

The Kenya Open Data Initiative

In July 2011, the KODI portal was launched by President Kibaki to provide public access to Kenyan public sector information.6 This was a landmark moment in the history of state secrecy and openness in Kenya. Nevertheless, KODI experienced a number of challenges in its first years of operation, as the World Wide Web Foundation’s Open Data Barometer (ODB) reports showed. The ODB ranks countries on three criteria:

• readiness: how prepared are governments for open data initiatives? What policies are in place?

• implementation: are governments putting their commitments into practice?

• impact: is open government data being used in ways that bring practical benefit?7

The ODB methodology draws on government self-assessment, peer-reviewed expert survey responses, detailed dataset assessments and secondary data.8 The ODB Regional Report for Africa (3rd edition, 2016) examined 21 sub-Saharan African countries, including Kenya.9 Key findings were:

1. Performance across the continent is relatively poor in comparison to leading countries in the Global South and globally. The report noted that Kenya ‘does not publish a single, fully open dataset – health, education and legislation data are open licensed but fall short of being fully open because the data are not available in bulk’.

2. A downward trend is common in the overall Barometer scores from 2013 to 2015. The assessment showed a drop of ten points in Kenya’s ODB score over this period.

3. Open data initiatives lack long-term commitment and resources, resulting in short-term gains that are unsustainable. ODB data show a drop in scores between 2014 and 2015 in all African countries except Nigeria and Cameroon. However, the report notes ‘While there is a net decline in the scores for Kenya and Mauritius between 2013 and 2015, there is a recovery in their overall scores between 2014 and 2015’.

4. ODB implementation scores are lower than readiness scores. The ODB’s comparison of readiness and implementation scores shows a consistent gap in the case of Kenya: both scores rise and fall in parallel, with no sign of the gap closing.

5. There is no stand-out performer in Africa. The report states that ‘Africa is the only region without a clear open data champion … In previous editions, Ghana and Kenya looked likely to assume this role, but the data show that the performance of these countries is erratic’. The ODB Global Report acknowledges that Kenya and Ghana were in ‘a holding pattern as they try to revamp their initiatives’.10

Within the ICT Authority, which is responsible for KODI, there is a good deal of enthusiasm for improving and expanding the open data programme.11 Members of the KODI team participate in the Data Science Africa research network and its annual conferences.12 The KODI team has a network of ‘fellows’ across the public sector who identify relevant datasets, and there is a Data Science Team that cleans data released by government agencies.13 Although the African Data Consensus had not yet been ratified by Kenya at the time of writing,14 the staff of the ICT Authority recognised the challenges it identified, which informed the priorities for KODI’s development.15

KODI staff are aware of problems with the data they receive for publication through the portal. They cite many of the same problems reported by civil society and the ODB, including the questionable accuracy and completeness of the data, the risk of introducing errors during the cleaning and curation of data, problems of timeliness and infrequent updates, and ongoing resistance to data release within the public sector because of the long-standing culture of secrecy in the Kenyan government. Several of the staff noted in an interview in September 2016 that ‘the quantity of data received does not seem to reflect the enactment of the FOI law’; though the law had only just received Presidential Assent, there was a sense that it should have resulted in an increase in proactive information release.16 At the time of our interview, KODI team members had not considered the issue of data authenticity and the significance of contextual information, but they noted that they ‘hardly ever get data with metadata’.17

Nevertheless, KODI staff have put in place a number of quality control measures. They take a snapshot of the portal with every new data upload. They keep copies of original datasets received from ministries, departments and agencies, so that sources can be checked in the event that the curated data are queried. In addition, they have created a ‘data release calendar’ that they use to schedule and monitor updates, and they have created templates for datasets. All of this encourages more complete and consistent data.18
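None of these measures is described by KODI as a formal integrity check, but they point towards one. A minimal sketch of the general technique, assuming hypothetical file locations for a dataset as received from a ministry and as published on the portal, would compare cryptographic checksums of the two copies:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical paths: the original file as received from a ministry and the
# curated version published on the portal.
received = Path("received/parcels_fertiliser_2005_6.csv")
published = Path("published/parcels_fertiliser_2005_6.csv")

if sha256_of(received) == sha256_of(published):
    print("Published dataset is byte-identical to the dataset as received.")
else:
    print("Published dataset differs from the original; the curation steps "
          "applied in between should be documented.")
```

Publishing such digests alongside each dataset would let users verify for themselves whether curation altered the file and, where it did, prompt them to ask for documentation of the changes.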

Land data

Examining a sample dataset from KODI should help to illustrate what controls and processes for authenticity are in place. In view of the significance of land in Kenyan political and economic life, a dataset relating to land use is examined here in relation to Kenya’s approach to land information management.

Land information management

Kenya, which was among the ‘first [independent African] countries to experience comprehensive land reform’,19 has faced problems in managing information about land and related resources. For instance, in October 2015, the Thomson Reuters Foundation reported, in relation to the Kenya Groundwater Mapping Programme:

One key problem is lack of data … According to the Kenya Water Industry Association, not one of the country’s several water regulation agencies, including the Water Resources Management Authority, has reliable data that captures the distribution, quantity and quality of available groundwater.20

Since the colonial period, Kenya’s land management system has functioned largely through the creation, transmission and exchange of paper records. Under the Registration of Titles Act (1982), a Central Registry was established in Nairobi and a Coastal Registry in Mombasa, for managing paper land registration records. Under the Registered Land Act (1989), a registry was established in every land registration district. The 2012 Registration Act aimed to rationalise and devolve the land registration process but, at the time of writing, regulations were still being developed. Recordkeeping in Kenya has faced numerous challenges, including a lack of cohesive policies, lack of compliance with procedures, ad hoc systems and lack of adequate staffing and other resources.21

In 2010 and 2011, during research into the readiness of Kenyan government recordkeeping for e-government and freedom of information, Justus Wamukoya and I found that what was then the Ministry of Lands (now the Ministry of Lands and Physical Planning) had experienced a period of recordkeeping reform.22 At the time of our study, its registries were well-functioning and monitored, and sanctions were imposed for infringements:

Records management is not audited, but when breaches of records management procedure are identified, they are investigated. At the time of the interview a member of the registry staff was on suspension for removing a file that he was not permitted to access.23

At that time, a digitisation project was underway, motivated by a sense that computerisation and digitisation would, among other benefits, reduce delays in work processes. These, according to one member of staff, accounted for the great majority of the written complaints that the ministry received.24 The digitisation project was led by the Land Management Systems Technical Working Group in the Lands Reform Unit, which oversaw the ministry’s target under the government-wide Vision 2030 strategy in relation to improving land title acquisition.25

Wamukoya and I found that while records staff were confident that the organisation of paper records was adequate for land title process improvements, members of the working group had identified significant gaps in the paper records being digitised. Digital surrogates were comprehensively and regularly backed up, but no digital preservation measures had been developed and no consideration had been given to the need to eventually transfer digital records to the Kenya National Archives and Documentation Service (KNADS). Moreover, there seemed to be no planning for moving from creating and scanning paper records to managing born-digital records.26 When I again visited the Ministry of Lands and Physical Planning in September 2016, paper records were still being digitised, including maps. The staff could conduct online searches for Nairobi, and the ministry was setting up a central database of land titles as part of the Kenya National Spatial Data Infrastructure, which was expected to expand the ministry’s capacity to conduct online searches.27

Today, recordkeeping practices in the ministry continue to be guided by the standard public service manual on records management, with no separate guidance on managing land title records.28 The Records Management Procedures Manual for the Public Service provides procedures for registering and managing records, with provisions for mail management, filing, indexing, cross-referencing, classification, file tracking, ‘bring-up’, storage, survey, appraisal and disposal (transfer to archives or destruction).29 It also discusses disaster management, capacity building and the institutional framework for recordkeeping, including ministerial responsibility for compliance. The ‘Security of Records’ chapter states that access to classified records should be on a ‘need to know’ basis, reflecting the long-standing civil service bias towards secrecy. However, it also notes that ‘Access to public records shall be provided within the existing legislative and regulatory framework’, which, since 2016, includes freedom of information legislation.30 The manual warns staff to ‘guard against a natural tendency to over classify documents’.31

There is also a chapter on digital records management that sets out responsibilities and provides guidance on naming conventions, media handling and storage. It empowers the National Archives (KNADS) to set procedures for digital records management and to authorise the destruction of digital records. It states that all public institutions are required to install the Integrated Records Management System (IRMS) developed and introduced by the Personnel Office in the Ministry of Public Service. Although the IRMS has not been audited against international standards such as ICA-Req or MoReq, it covers core records system functionalities relating to registration, file tracking, ‘bring up’ and reporting.32 It does not, however, address the need for a system to preserve born-digital records or digital surrogates. Kenya’s second Open Government Partnership National Action Plan included a commitment to establish ‘a central digital repository to provide lasting access to government records and data and all information of public interest’ by June 2018.33 However, this has not yet happened.

Examining the land dataset

Although authentic information is crucially important for the SDG initiative, data that are difficult to access or understand are effectively unusable for potential users, including government and civil society actors involved in SDG work. Kenyan open data specialists Leonida Mutuku and Christine Mahihu have reported that ‘data quality’ is a key issue for open data in Kenya. They have highlighted the low relevance of data to citizens, the irregularity of data updates or dataset releases and the questionable utility of the data (for instance, incomplete data and data that are poorly structured or formatted) as key issues.34 However, their study did not examine authenticity when considering data quality.

In the field of recordkeeping, authenticity rests on three prerequisites working in concert: metadata, documented provenance and custodianship, and auditable systems. To examine how these prerequisites are applied in producing and publishing data released through KODI, I studied the dataset ‘Proportion of Parcels Using Fertiliser 2006’. Not only is the dataset relevant to the contentious issue of land use, but the fact that it was accompanied by some contextual information suggested that it might be possible to determine whether the prerequisites for authenticity were addressed in its production and publication. The dataset was uploaded by a user called ‘Knoema’ in 2015, and the original data source was identified in its metadata as the Kenya National Bureau of Statistics.

I first looked at the dataset in October 2016. When I returned to KODI on 11 January 2017, the site presented a login interface (see Figure 8.1).

Figure 8.1. KODI interface on 11 January 2017

Creating an account and signing in brought me to an error message (Figure 8.2) indicating that permission was now required to access the data.

Figure 8.2. KODI sign-in error message

On querying the staff of the ICT Authority, I was directed to use a mirror site while the portal was migrated between service providers. A search of the interim platform for the title of the dataset on 16 January 2017 produced no results. A search for ‘parcels’ produced a short list of results that included a dataset called ‘Proportion of Parcels Using Fertiliser County Estimates 2005/6’. The metadata for the dataset indicated that the dataset was uploaded by ‘kodipublisher’ rather than ‘Knoema’, with a publication date of 20 December 2016, which was probably the date of migration to the interim platform. There was nothing to indicate that the dataset was first published in 2015.

The dataset, viewed as a CSV file, comprised seven columns:

A – object ID (a sequence of ascending numbers from 1 to 47)

B – ‘county_name’ (a list of county names)

C – ‘proportion_of_parcels_using_f’ (the same list of county names)

D – ‘proportion_of_parcels_using_1’

E – ‘proportion_of_parcels_using_2’

F – ‘proportion_of_parcels_using_3’

G – ‘proportion_of_parcels_using_4’

Columns D through F provide figures ranging from 0 to 0.94. Column G provides figures ranging from 0 to 428581.8. There are no data or metadata within the CSV file to help interpret these figures, such as the meaning of ‘using_1’, ‘using_2’, etc., and no formulas that provide a key to the relationship between the figures in columns D through F and the figures in column G. Columns D through F for Wajir County each have a value of 0, as does column G. Columns D through F for Mombasa County each have a value of 0, but column G has a value of 2425.2. This means that the content, context and structure of the dataset are insufficient for the average user to interpret the data.
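A reader who tries to work with the file programmatically hits the same wall. The sketch below, which assumes a local copy of the CSV and the column labels described above (both assumptions on my part, not a documented KODI interface), simply loads the file and confirms that nothing inside it explains the coded column names or their differing scales:

```python
import pandas as pd

# Hypothetical local copy of the dataset as downloaded from the portal;
# the file name is illustrative.
df = pd.read_csv("proportion_of_parcels_using_fertiliser_2005_6.csv")

# Nothing inside the file documents what the coded columns measure.
print(df.columns.tolist())
# e.g. [..., 'county_name', 'proportion_of_parcels_using_f',
#       'proportion_of_parcels_using_1', 'proportion_of_parcels_using_2',
#       'proportion_of_parcels_using_3', 'proportion_of_parcels_using_4']

# The two counties discussed above: every proportion column is 0 for Wajir,
# while Mombasa is 0 in columns D-F but 2425.2 in column G.
print(df[df["county_name"].isin(["Wajir", "Mombasa"])])
```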

Instead, with regard to this particular dataset, KODI does the work of interpretation for its users by visualising the data as a map. The portal includes three tabs: ‘Overview’, ‘Data’ and ‘Visualization’:35

• ‘Overview’ provides basic metadata about the dataset: title, publisher, last modified date (but no other dates related to events in the life of the data) and licensing information; it attributes the dataset to the Kenya National Bureau of Statistics

• the ‘Data’ tab produces a generic error message (‘There was an error’)

• the ‘Visualization’ tab allows users to generate a map of Kenya’s counties coloured to show the proportion of parcels using fertiliser. To the general user, this visualisation feature is necessary to understand the dataset.

The ‘Proportion of Parcels Using Fertiliser County Estimates 2005/6’ dataset lacks essential assurances of authenticity. In terms of metadata, the content of the CSV file itself lacks sufficient metadata to enable users to interpret the data. The fact that human interpretation of the CSV is not possible does not necessarily undermine an assumption of authenticity. However, since visualisation is necessary to understand the data, questions need to be asked about the mechanisms of visualisation. What formulas and algorithms are used to render the dataset as a map? The formulas and algorithms that render the data interpretable to users are a significant component of the system for managing the information and should be capable of being audited to support authenticity. However, there is no technical information available through KODI to help users understand the process that produces the map. This effectively constitutes a gap in the chain of custody.
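To illustrate the kind of processing that sits in this gap, the sketch below shows how a portal might plausibly render such a CSV as a county map. This is not KODI’s actual code; the boundary file, join key and plotted column are assumptions, and that is precisely the point: each of these choices shapes what the user sees, yet none is documented on the portal.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical inputs: county boundary polygons and the KODI CSV. The join key
# and the plotted column are assumptions based on the dataset described above.
counties = gpd.read_file("kenya_counties.geojson")
values = pd.read_csv("proportion_of_parcels_using_fertiliser_2005_6.csv")

# Joining on county names silently drops any county whose spelling differs
# between the two files.
joined = counties.merge(values, on="county_name", how="left")

# Which column is plotted, how missing counties are shown and which colour
# scale is used all change the picture the reader sees.
ax = joined.plot(
    column="proportion_of_parcels_using_1",
    cmap="Greens",
    missing_kwds={"color": "lightgrey"},
    legend=True,
)
ax.set_title("Proportion of parcels using fertiliser, 2005/6 (illustrative)")
plt.show()
```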

Of more significance for demonstrating authenticity, KODI offers little metadata about the file itself. Looked at in isolation, this dataset appears to have ‘identity’ (the attributes of a record that distinguish it from other records).36 This ‘identity’ is supplied by its metadata, in particular its unique title, author and dates. However, when compared with the dataset first viewed in October 2016, the discrepancy in publishers (Knoema and kodipublisher) and dates (uploaded 2015 versus last modified 20 December 2016, with no reference to an upload date) raises questions about the identity of the dataset. Is it the same dataset? If identity is called into question through inconsistent metadata, one of the two fundamental elements of authenticity (identity, the other being integrity) is absent.

In addition, the provenance of the dataset is obscured by the lack of metadata documenting the custodianship of the data from the point of collection. Working backwards, there is metadata about the publisher of the data (though, when the two versions of the dataset are considered together, there is some ambiguity about the publisher’s identity) and about the source of the dataset, which is given as the Kenya National Bureau of Statistics, but it is not possible to see the sources that the Bureau used.

In this dataset, there are two types of data:

•information about the boundaries of the land parcels, which depends on information generated by the land registration process conducted by the Ministry of Lands and Physical Planning

•estimates of fertiliser use, which are likely to come from the Ministry of Agriculture, Livestock and Fisheries or one of its agencies.

To be assured of the integrity of the dataset, it must be possible to know the sources of the data. In this case, assurance would require users to seek this information from the Bureau of Statistics. The recordkeeping system of the Ministry of Lands and Physical Planning functions well and conforms to the system set out in the manual, which, being government-wide, is also likely to guide the recordkeeping of the Ministry of Agriculture, Livestock and Fisheries.37 The recordkeeping system documents custodianship throughout the lifecycle of the ministries’ records, but KODI users have no way of knowing this. Therefore, there is a break in the documentation of the chain of custody between the creation or capture of the data by the ministries and the aggregation of the data by the Bureau of Statistics. Moreover, the user does not know what processes and controls the Bureau followed to prepare the data and to assign and document responsibility for the data.

The data pass through a series of systems, and these need to be auditable if authenticity is to be assured. From the point that it is created, and throughout its use within the ministry, land parcel information is managed through an auditable recordkeeping system. The data may then pass to the Ministry of Agriculture, Livestock and Fisheries, where they serve as a parameter for the ministry’s estimates of fertiliser use. There is no metadata that would help KODI users understand this. Even at the point of aggregation, when the data pass into the Bureau of Statistics’ systems, the transfer is not transparent to KODI users.

In preparation for publication, the data then pass into the custody of KODI and its systems. Unlike the previous transitions, this is documented in publicly available metadata. As outlined above, KODI staff follow defined measures for protecting the integrity of data, and, while these measures are not brought together in a formal system that allows each action and custodian to be audited, they would, theoretically, enable published datasets to be compared with datasets as received by KODI. In this way, there is a basic accountability mechanism in place for KODI’s data custodianship. However, there is no information about these controls on the KODI portal. Rather than assurances of integrity, KODI users have ambiguous information about the sources and treatment of the data.

Although ‘Proportion of Parcels Using Fertiliser County Estimates 2005/6’ may be authentic in the sense that it ‘is what it purports to be’, the user does not have the information needed to determine its authenticity. Partial metadata, opaque provenance, undocumented custody (particularly during aggregation) and the lack of information about the systems for its management, including its visualisation, introduce doubts about the data’s identity and integrity.

Conclusion

The development of open data in Kenya has not yet been linked to records management. Connecting the two would help facilitate trust in opened datasets and support their survival through time. At present, there are problems with the data being released through KODI, and the ability to rely on their authenticity remains limited.

This study has charted the lifecycle of a specific dataset and found that, in general, strengthening the recordkeeping practices of public sector bodies lays a foundation for assuring authenticity in public sector data. This would be enhanced by the publication of recordkeeping audit reports documenting processes from the beginning of the data lifecycle. Later in the lifecycle, when data are aggregated and then published on KODI, authenticity is called into question by partial metadata about data’s provenance and management. Leaving aside problems that may be unique to KODI’s transition between platforms, there are unanswered questions about the management of the data throughout their lifecycle.

Authenticity is at the greatest risk at the point that data are aggregated. The data aggregation practices of the Bureau of Statistics are not documented for the public. In this space between creation and publication, provenance is obscured and custody cannot be audited. After publication, most users will need to rely on KODI’s visualisation of the dataset in order to interpret the data, making KODI an essential component of the data management system. Accountability requires that information management systems be transparent and that custodianship be public. At present, KODI’s mechanisms for visualisation are not published. The chain of custody of the dataset appears to be broken in several places.

These issues all undermine assurances of the authenticity of the data. Not only is there a negative impact on government openness and community and commercial reuse, but there are repercussions for the ability to pursue the SDGs. Implementing and monitoring the goals depend on access to authentic data.

KODI is taking steps to build controls into its processes. It takes snapshots of the portal with every new upload, retains original datasets received from government and standardises templates for datasets. All of this contributes to building an audit trail of the accuracy and completeness of data and metadata. If these steps can be brought together, documented and published, KODI will have taken important steps towards assuring authenticity.

The lack of technical controls for information authenticity is not unique to Kenya – it is widespread, and arises from the disconnect between the communities of practice involved in open data on one hand and records management on the other.38 To improve data quality generally and data authenticity in particular, records management principles and know-how need to be incorporated into open government data curation. Records management has supported previous efforts to establish government openness (particularly freedom of information), by organising and making information available for release. The major contribution of records management to open data is, however, not in providing records for data mining but in offering techniques for improving data quality by making datasets more like records.

1. J. Lowry, ‘Addressing information asymmetry in the social contract: an archival-diplomatic approach to open government data curation’, unpublished PhD thesis, University College London (2019).

2. InterPARES 1 Authenticity Task Force, ‘Appendix 2: requirements for assessing and maintaining the authenticity of electronic records’, in InterPARES 1 Project, The Long-Term Preservation of Authentic Electronic Records: Findings of the InterPARES Project (InterPARES Project, 2002), pp. 1–2.

3. InterPARES 1 Authenticity Task Force, ‘Appendix 2’.

4. These are standards that set out functional requirements for digital records management systems. MoReq is the ‘Model Requirements for the Management of Electronic Records’, published in 2001 by the DLM Forum – a European network of government archives and information professionals – with funding from the European Commission. The current version of MoReq (MoReq2010) was published in 2011. ICA-Req is the International Council on Archives’ Principles and Functional Requirements for Records in Electronic Environments, published in 2008. In 2010, ICA-Req was adopted by the International Organization for Standardization as ISO 16175.

5. Lowry, ‘Addressing information asymmetry’.

6. T. Davies, Open Data Policies and Practice: An International Comparison (European Consortium for Political Research, 2014), p. 14, https://ecpr.eu/Filestore/PaperProposal/d591e267-cbee-4d5d-b699-7d0bda633e2e.pdf.

7. World Wide Web Foundation, Open Data Barometer: ODB Methodology – v1.0, 28 April 2015, p. 3.

8. World Wide Web Foundation, Open Data Barometer, p. 3.

9. World Wide Web Foundation, Open Data Barometer, Third Edition, Regional Report, Africa, May 2016, p. 6, http://opendatabarometer.org/doc/3rdEdition/ODB-3rdEdition-AfricaReport.pdf.

10. World Wide Web Foundation, Open Data Barometer, 3rd edn, p. 30.

11. Interview with Sifa Mawiyoo, open data specialist and GIS technologist, Kenya Open Data Initiative (KODI), ICT Authority, and Prestone Adie, data analyst, KODI, ICT Authority, Nairobi, 20 September 2016.

12. Data Science Africa, http://www.datascienceafrica.org/.

13. Interview with Sifa Mawiyoo, 20 September 2016.

14. Interview with Sandra Musoga, senior programs officer – Transparency, Article 19, Nairobi, Kenya, 16 September 2016.

15. Interview with Sifa Mawiyoo, 20 September 2016.

16. Interview with Sifa Mawiyoo, 20 September 2016.

17. Interview with Sifa Mawiyoo, 20 September 2016.

18. Interview with Sifa Mawiyoo, 20 September 2016.

19. J. Herbst, States and Power in Africa: Comparative Lessons in Authority and Control (Princeton: Princeton University Press, 2000), p. 185.

20. M. Waruru, ‘To arm against drought, Kenya maps its water resources’, Thomson Reuters Foundation, http://news.trust.org/item/20151030082053-5dgtn/.

21. Administrative histories of recordkeeping in sub-Saharan Africa are limited, but for an overview of the problems that are common across many of those countries, see J. Wamukoya, ‘Records management and governance in Africa in the digital age’ and N. Mnjama, ‘Anne Thurston and record-keeping reform in Commonwealth Africa’, in J. Lowry and J. Wamukoya (eds), Integrity in Government through Records Management (Farnham: Ashgate, 2014). The roots of these problems are to be found in the colonial period. Again, this history is under-researched, though the colonial origins of more recent recordkeeping problems are noted by M. Musembi, ‘Development of archive services in East Africa’, in Historical Development of Archival Services in Eastern and Southern Africa: Proceedings of the 9th Biennial General Conference (Mbabane; Rome: ESCARBICA, 1986), p. 116.

22. International Records Management Trust, Aligning Records Management with ICT, e-Government and Freedom of Information in East Africa, Kenya Country Report, p. 13, http://www.irmt.org/portfolio/managing-records-reliable-evidence-ict-e-government-freedom-information-east-africa-2010–2011.

23. International Records Management Trust, Aligning Records Management.

24. International Records Management Trust, Aligning Records Management.

25. International Records Management Trust, Aligning Records Management.

26. International Records Management Trust, Aligning Records Management, p. 5.

27. Interview with Edward Kosgei, head of lands administration, and Emily Ndungi, principal records management officer, Lands Department, Ministry of Lands and Physical Planning, Nairobi, Kenya, 19 September 2016.

28. Interview with Edward Kosgei and Emily Ndungi.

29. Republic of Kenya, Records Management Procedures Manual for the Public Service, Office of the Prime Minister, Ministry of State for Public Service, May 2010.

30. Republic of Kenya, Records Management Procedures Manual, p. 50.

31. Republic of Kenya, Records Management Procedures Manual, p. 51.

32. International Records Management Trust, Aligning Records Management, p. 9.

33. Republic of Kenya, OGP National Action Plan 2.

34. L. Mutuku and C. Mahihu, Open Data in Developing Countries (iHub, 2014), p. 47.

35. Kenya Open Data Initiative (KODI), http://www.opendata.go.ke.

36. InterPARES 2, Terminology Database, http://www.interpares.org/ip2/ip2_terminology_db.cfm.

37. International Records Management Trust, Aligning Records Management, pp. 12–13.

38. International Records Management Trust, Aligning Records Management.
