10. Preserving and using digitally encoded information as a foundation for achieving the Sustainable Development Goals
David Giaretta
The health, wealth and happiness of a great many people worldwide, now and in the future, will depend upon the accuracy, intercomparability and preservation of the Sustainable Development Goal (SDG) measurements, information that is of global significance. This chapter offers a view on how to manage the complexity of gathering comparative data to monitor progress towards the SDGs in the years leading to 2030 from the perspective of collecting, using and preserving digitally encoded information, in particular scientific data. The aim is to help improve the way that information relevant to the SDGs is collected and used, so that the conclusions and actions arising from it are based on information that can be regarded as authentic, the results can be trusted, and comparisons between nations can be made sensibly through time.
The extract that follows, quoted from Transforming Our World: The 2030 Agenda for Sustainable Development, Section 74,1 notes that follow-up and review processes at all levels will be guided by the following principles:
(f) They will build on existing platforms and processes, where these exist, avoid duplication and respond to national circumstances, capacities, needs and priorities. They will evolve over time, considering emerging issues and the development of new methodologies and will minimize the reporting burden on national administrations.
(g) They will be rigorous and based on evidence, informed by country-led evaluations and data which is high-quality, accessible, timely, reliable and disaggregated by income, sex, age, race, ethnicity, migration status, disability and geographic location and other characteristics relevant in national contexts.
(h) They will require enhanced capacity-building support for developing countries, including the strengthening of national data systems and evaluation programmes, particularly in African countries, least developed countries, small island developing States, landlocked developing countries and middle-income countries.
(i) They will benefit from the active support of the United Nations system and other multilateral institutions.
These statements make it clear that data are to be collected at the national and subnational levels for every nation and that there may be differences in the way they are collected and recorded. The data are to be disaggregated, but the exact method and level of disaggregation may differ. All these differences make comparisons between nations and regions difficult or even impossible, and the results may have political and financial ramifications. Additional levels of proof of the validity of the data may be necessary.
A vast amount of information is needed to monitor progress towards achieving the SDGs. More challengingly, this information is hugely varied, covering all aspects of life, including health, wealth, nutrition, education, industry, society and the natural environment. Moreover, the volume of information being collected is growing at an ever-increasing rate, and an ever-greater portion of this information is in digital form. Of particular concern here are sets of information, including data, statistics and records, that provide the foundation for measuring the achievement of the SDGs. This digital information is fragile; the bits can decay, or the information encoded in those bits can be lost, which is difficult to guard against.
The chapter looks first at the challenges of measuring the SDGs and at the ideal solutions from the perspective of the international standards with which the author has been involved. It then examines representative SDGs, considers some of the realities for measuring them and explores what is required to preserve them. Finally, it explores potential ways of reaching the ideal, in line with the principles set out above, given the difficult realities faced. The chapter should be of use to those responsible for collecting the information for measuring the SDGs, those responsible for the use of SDG data and those whose interest is in deriving lessons from combining information that is relevant to the SDGs.
Requirements for SDG data to be fit for purpose
If the SDG measurements are to guide decisions that could affect the lives of billions of people, the information used to measure them must be authentic and verifiable, with clear health warnings in terms of its applicability and accuracy. This section looks in general at what must be addressed to achieve these aims and outlines some of the complexities that must be considered.
Authenticity
If SDG results lead to unpalatable conclusions, the authenticity of the data on which those conclusions are based is likely to be questioned, as has happened in the case of climate change. Evidence to support authenticity must therefore be collected; it should be possible to verify all the information related to the SDGs as authentic.
To achieve this, the provenance of the information needs to be recorded: who created it, when it was created, how it was created and what has happened to it subsequently. The record should also include the procedures, methodologies and algorithms used to preserve it. The importance of authenticity and provenance can be illustrated by considering questions about evidence, which could easily be distorted, with far-reaching consequences for countries and individuals.2 Questions that should be asked include:
•are these the actual measurements made?
•were the measurements made correctly and in the location claimed?
•was the process used to reach the conclusion correct?
To answer these questions, it is important to be able to provide reliable evidence, for instance:
•the hash codes or the Transformational Information Properties (TIPs; discussed below) of the original measurements (so that those used can be checked against them)
•a verifiable record of how the measurements were made and by whom
•the documentation and software used (so that the whole process can be checked)
•the process used to reach the result and a record of how the process can be repeated and checked.
All these answers constitute the types of provenance that should be captured as part of SDG data.
For individual measurements, there are simple ways of proving that the values have not changed and are authentic, namely by using digests or hashes. Hashes are created by taking the bits that make up a file and applying an algorithm (or a set of procedures), such as: divide by a certain number, chop the sequence of bits into smaller pieces and multiply together in a certain way. This creates a sequence of digits and characters much shorter than the original file, known as the hash, which is like a fingerprint for that file. If the original hash for a file is kept safely, it becomes practically impossible to change the file without detection, because recalculating the hash of the ‘imposter’ will produce something that does not match the original hash.
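The fingerprinting idea can be sketched in a few lines of Python. This is only an illustration: it assumes SHA-256 as the hash algorithm (the chapter does not prescribe one), and the measurement records are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical measurement records, for illustration only.
original = b"2016, 7.4\n2017, 7.7\n"
altered = b"2016, 7.4\n2017, 7.8\n"  # a single digit has been changed

# The hash is recorded when the data are created and kept with the provenance.
stored_hash = fingerprint(original)

# Later verification: recompute the hash and compare it with the stored value.
print(fingerprint(original) == stored_hash)  # the untouched copy matches
print(fingerprint(altered) == stored_hash)   # the altered copy does not match
```

The check is only as trustworthy as the stored hash itself, which is why the hash belongs with the protected provenance record rather than alongside the file it describes.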
Between now and 2030 and beyond, it may be necessary to transform digital objects in order to preserve them, for example from MS Word to PDF (if the required version of MS Word is no longer supported), or from CSV to XML. In such cases the hash of the new object will not match the original hash, so a hash cannot be used and other evidence must be collected. The Open Archival Information System (OAIS)3 Reference Model introduced the concept of ‘Transformational Information Property’ (TIP) to capture evidence demonstrating that there is sufficient similarity between the original and the new objects. Transforming the file will almost certainly result in the loss of some information, and the TIPs chosen should ensure that what is lost is not important. The TIPs are an explicit statement of what aspects should not be lost. Examples could include the pagination of a document, which may be important for legal documents, or the colours in an image, or the numerical differences which are allowed when making changes to scientific data. The TIPs should be agreed early on, and when the transformation is carried out, they should be checked.
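As a minimal sketch of checking one such property, the Python below tests a hypothetical TIP for scientific data: the transformed values must be the same in number as the originals and each must differ by no more than an agreed amount. The function name, the data and the tolerances are assumptions made for illustration; they are not taken from the OAIS standard.

```python
def tip_numeric_tolerance(original, transformed, max_abs_diff):
    """Check a hypothetical TIP: same number of values, each within max_abs_diff."""
    if len(original) != len(transformed):
        return False
    return all(abs(a - b) <= max_abs_diff for a, b in zip(original, transformed))


# Hypothetical values before and after a transformation that keeps one decimal place.
original = [7.4123, 7.6981, 9.2049]
transformed = [round(value, 1) for value in original]

# The agreed tolerance decides whether the transformation is acceptable.
print(tip_numeric_tolerance(original, transformed, max_abs_diff=0.05))  # within tolerance
print(tip_numeric_tolerance(original, transformed, max_abs_diff=0.01))  # too much is lost
```

The result of such a check, together with the tolerance that was agreed, would itself become part of the provenance of the transformed object.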
There are other complexities that must be addressed when seeking to measure the SDGs, as described below.
Longitudinal studies
It is clear that several of the SDGs will take some time to achieve. For example:
SDG 2
End hunger, achieve food security and improved nutrition and promote sustainable agriculture.
SDG 10
Reduce inequality within and among countries.
SDG 17
Strengthen the means of implementation and revitalise the global partnership for sustainable development.
Monitoring progress between now and 2030 will require reliable means of measuring nutrition, inequality and implementation; the information will need to be captured consistently through the months and years. Each of these types of measurement can then be compared, month by month and year by year, to check that progress has been made in the desired way. This type of ‘longitudinal’ study is common to many disciplines.
Consider the following SDGs, for which it will be difficult to quantify and collect data that are immediately relevant:
SDG 4
Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all.
SDG 15
Protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, halt and reverse land degradation and halt biodiversity loss.
SDG 16
Promote peaceful and inclusive societies for sustainable development, provide access to justice for all and build effective, accountable and inclusive institutions at all levels.
The question is how these goals can be monitored, as they will require complex analyses combining information from many areas of activity. For instance, if something is done to promote peaceful and inclusive societies, how will it be possible to check whether societies have become more peaceful and inclusive? If societies have not become more peaceful and inclusive, should some other course of action be taken?
Many SDGs, namely 6, 7, 8, 9, 12 and 14, seek sustainability of certain things:
SDG 6
Ensure availability and sustainable management of water and sanitation for all.
SDG 7
Ensure access to affordable, reliable, sustainable and modern energy for all.
SDG 8
Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all.
SDG 9
Build resilient infrastructure, promote inclusive and sustainable industrialisation and foster innovation.
SDG 12
Ensure sustainable consumption and production patterns.
SDG 14
Conserve and use the oceans, seas and marine resources sustainably.
Evidence of sustainability must in some way involve a longitudinal study to show that what is being sustained is unchanged through time and indeed for a significant time beyond 2030. Longitudinal studies will have to be started well before positive results can be expected.
Combining data
Longitudinal studies can, in principle, be relatively simple, involving comparing raw measurements from one point in time to the next. However, it may be necessary to combine a number of measurements to achieve a meaningful response. For example, SDG 2 requires progressively improved land and soil quality. One way of measuring soil quality is to use the USDA measurement of soil quality which combines measurements of ‘soil respiration, infiltration, bulk density, electrical conductivity, pH, nitrates, aggregate stability, slake, earthworm, water quality and observations of soil structure’.4 Each of these is a separate measurement in itself, each with its own procedures and processes. All must be recorded alongside individual results.
The measurements are in general processed according to a specific algorithm, or some specific procedure such as ‘divide measurement of this quantity by the measurement of that quantity and then add the result to the measurement of a third quantity’. Alternative algorithms may be used by different sets of people to produce specific results, because the groups follow different theories or make different assumptions about how the measurements were made. The different algorithms will almost certainly produce different results.
When there are large datasets, the algorithm is normally encoded in software that takes in the various datasets and combines them. The results of such algorithms affect our everyday lives,5 and these algorithms can change.6 Therefore, when comparing results computed using numerous inputs reported by different countries, we should ensure that the algorithms used are the same, or, if this is not possible, that the algorithms are well-documented so that the methods used by different countries can be properly compared.
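The effect of using different algorithms on the same inputs can be sketched as follows. Both functions and the input values are hypothetical; the first follows the procedure quoted above, and the second is an invented alternative. The point of the sketch is that the name of the algorithm is stored with the result, so that later comparisons can tell like from unlike.

```python
def index_a(x, y, z):
    """Hypothetical algorithm A: divide the first quantity by the second, then add the third."""
    return x / y + z


def index_b(x, y, z):
    """Hypothetical algorithm B: add the first and third quantities, then divide by the second."""
    return (x + z) / y


# Hypothetical measurements of three quantities.
measurements = {"x": 12.0, "y": 4.0, "z": 2.0}

# Record which algorithm produced each value alongside the inputs used.
results = [
    {"algorithm": algorithm.__name__, "inputs": measurements, "value": algorithm(**measurements)}
    for algorithm in (index_a, index_b)
]

for record in results:
    print(record["algorithm"], record["value"])
```

With these inputs the two algorithms give 5.0 and 3.5, so two countries reporting only ‘the index’ would appear to differ even though their raw measurements are identical.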
If it is found to be useful to perform meta-analyses using SDG data from a number of countries, then it will be even more important to be sure that like is being combined with like.
Errors
Scientific measurements always have random and/or systematic errors. The reader will probably be familiar with a number of statistical methods and perhaps with the standard deviation of a set of measurements. But information about the SDGs is likely to require more detailed consideration because it will involve the combination of diverse sources of information, each of which will have some kind of error associated with it.
The aim in designing a good measurement is, first, to understand the potential sources of errors and, second, to minimise the errors wherever possible. It is also important to distinguish between various kinds of errors and to understand the difference between precision and accuracy. The word precision is related to the random error distribution associated with a particular experiment or even with a particular type of experiment. The word accuracy is related to the existence of systematic errors, such as differences between laboratories. For example, one could perform very precise but inaccurate timing with a high-quality pendulum clock that had the pendulum set at not quite the right length.7
Some scientific data can be checked by repeating the measurement, ideally by different people and at different places. Measurements of physical constants, such as the speed of light, can be repeated to reduce errors. Random errors can be reduced by statistical means, assuming the distribution of errors is normal. When lots of measurements are combined, the errors tend to cancel in a predictable way, reducing the overall error.
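The predictable way in which random errors cancel can be written down for the simplest case. Assuming $N$ independent measurements of the same quantity, each with the same standard deviation $\sigma$ (symbols introduced here for illustration), the uncertainty of their mean $\bar{x}$ is:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}
```

Under these assumptions, taking four times as many measurements halves the random error of the average. The formula says nothing about systematic errors, which are unaffected by repetition.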
However, this may not be the case for SDG-related data. To give a concrete example of the effect of errors, let us imagine that one measures a temperature which happens to be constant. A random error would be like rolling a fair die for each measurement and adding the number shown, minus 3.5 (the average roll of a fair die). One will then see the temperature jittering up and down, but on average the measured temperature will equal the actual temperature. On the other hand, if the die is weighted so that it always lands with the number 6 showing, every measurement will be 2.5 degrees higher than the actual temperature. The former are called random errors, while the latter are systematic errors.
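The die analogy can be simulated directly. In this sketch the die's average roll of 3.5 is subtracted from each roll so that the fair-die error averages to zero; that offset, the true temperature of 20 degrees and the number of readings are all assumptions of the sketch.

```python
import random
import statistics


def measure(true_temperature, n, loaded=False, seed=0):
    """Simulate n readings of a constant temperature with a die-roll error added."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    readings = []
    for _ in range(n):
        roll = 6 if loaded else rng.randint(1, 6)  # a loaded die always shows 6
        readings.append(true_temperature + roll - 3.5)
    return readings


fair = measure(20.0, 10_000)
loaded = measure(20.0, 10_000, loaded=True)

# Random error: individual readings scatter, but the average is close to 20.
print(statistics.mean(fair), statistics.stdev(fair))

# Systematic error: no scatter at all, yet every reading is 2.5 degrees too high.
print(statistics.mean(loaded), statistics.stdev(loaded))
```

Averaging more readings shrinks the first kind of error but leaves the second untouched, which is why systematic errors have to be found by changing the measurement set-up rather than by repetition.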
To give an SDG-related example, if the Gross Domestic Product of a country is measured but an important part of the economy is omitted, the error will be repeated each time the exercise is repeated. Systematic errors can be reduced or identified by repeating measurements with different set-ups, for example using different measurement techniques.
Combining measurements of different things, for example when determining soil quality, will combine and propagate the errors in the data. When two measurements are added together, each of which has an independent random error, the combined error can be estimated and will be larger than either separate error. If, on the other hand, both errors are systematic, then the combined error may be the sum of the errors, or the errors may cancel each other out.
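A minimal sketch of these two cases for a simple sum of measurements follows. It assumes the standard rule that independent random errors add in quadrature, which the chapter does not spell out, and it treats systematic errors as signed offsets; the function names and numbers are hypothetical.

```python
import math


def combine_random(*errors):
    """Independent random errors on a sum add in quadrature (root of the sum of squares)."""
    return math.sqrt(sum(e * e for e in errors))


def combine_systematic(*offsets):
    """Systematic offsets on a sum add with their signs, so they may reinforce or cancel."""
    return sum(offsets)


# Two independent random errors: the result exceeds either one but is less than their sum.
print(combine_random(3.0, 4.0))

# Two systematic offsets in the same direction reinforce each other.
print(combine_systematic(3.0, 4.0))

# Two systematic offsets in opposite directions cancel.
print(combine_systematic(3.0, -3.0))
```

Quantities combined by division or by a more complex algorithm need correspondingly more careful treatment, which is part of the significant work referred to in this section.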
Consider, for example, SDG 5: gender equality, which has nine targets and 14 indicators. It is reported that, too often, women are not identified separately in datasets,8 which introduces a systematic error into any conclusions drawn about gender equality. Even if there are no systematic errors, there may be random errors. It may be necessary to combine the errors in an appropriate way, and significant work could be required to decide how to do so, or even to estimate what the overall error in the results might be. These issues should be considered in looking at trends over the years covered by the SDGs.
Collecting and preserving data for SDGs
Having looked at the challenges and solutions from a general point of view, this chapter now examines representative SDGs and considers some of the realities for measuring them. The Global Indicator Framework for the Sustainable Development Goals and targets9 sets out 244 measures, which can be roughly categorised in diverse ways, as discussed below.
Semantic issues
The SDGs are measured in various ways, with particular issues related to different classes of measurements. Some of the issues are generic and concern numbers, especially where similar measurements are to be combined or compared. It is important to capture a description of how the measurement was taken.
Proportions
One hundred and eleven measures, or about 45 per cent, are proportions of one measure with respect to another, for instance, measure 1.4.1: proportion of population living in households with access to basic services. Proportions have the advantage of not involving units. At the same time, a proportion may be expressed as a percentage (a number between 0 and 100) or as a fraction (a number between 0 and 1). Each one needs to be specified, as does the way it is encoded.
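One way of making the encoding explicit is to store it with the value and convert everything to a single convention before comparing. The sketch below assumes two hypothetical encoding labels, "percentage" and "fraction", and rejects values that are out of range for the declared encoding.

```python
def to_fraction(value: float, encoding: str) -> float:
    """Convert a proportion to a fraction between 0 and 1, given its declared encoding."""
    if encoding == "percentage":
        fraction = value / 100.0
    elif encoding == "fraction":
        fraction = value
    else:
        raise ValueError(f"unknown encoding: {encoding!r}")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"{value} is out of range for encoding {encoding!r}")
    return fraction


# The same proportion reported in two different ways by two hypothetical countries.
print(to_fraction(45.5, "percentage"))
print(to_fraction(0.455, "fraction"))
```

A range check of this kind catches a percentage that has been labelled as a fraction, but it cannot catch a small percentage such as 0.5 per cent mislabelled as the fraction 0.5, so the declared encoding still has to be recorded correctly at source.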
Unclear metrics
About 54 measures, or 22 per cent, are very unclear. For example, 2.b.1: agricultural export subsidies and 2.c.1: food price anomalies, are non-specific in terms of units and even about what is to be measured.
Rates
Twenty-two of the measures, or 9 per cent, are expressed in terms of rates. Some are expressed fairly clearly, for example, 8.1.1: annual growth rate of real GDP per capita. However, 3.3.4: hepatitis B incidence per 100,000 population does not specify the time period, and it may be that the time period assumed in one country is different from that assumed in another.
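The effect of an unstated time period can be shown with a short calculation. The function and the figures below are hypothetical: the same count of cases yields different rates depending on the period over which it was collected, unless the period is recorded and used to normalise the result.

```python
def incidence_per_100k_per_year(cases: int, population: int, period_years: float) -> float:
    """Express a case count as a rate per 100,000 population per year."""
    return cases * 100_000 / population / period_years


# Hypothetical figures: 150 cases in a population of one million.
yearly = incidence_per_100k_per_year(150, 1_000_000, period_years=1.0)
half_yearly = incidence_per_100k_per_year(150, 1_000_000, period_years=0.5)

print(yearly)       # rate if the cases were counted over a full year
print(half_yearly)  # rate if the same count covers only six months
```

Without the period, the two reports would both read ‘15 per 100,000’ even though one of them corresponds to twice the annual incidence of the other.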
Number of countries
Nineteen metrics, or 8 per cent, are simply numbers. For example, 1.5.3: number of countries that adopt and implement national disaster risk reduction strategies in line with the Sendai Framework for Disaster Risk Reduction 2015–2030, which countries may interpret differently.
Money
Eleven of the measures, or 4.5 per cent, are expressed in terms of monetary values. For some, the currency is explicitly expressed in units of US dollars, but for others no currency is specified, making comparisons impossible. Even where a currency is specified, comparisons between countries, or within a single country over a range of years, are difficult because of currency variations and inflation.
Prevalence
Four of the measures, or less than 2 per cent, are expressed in terms of prevalence and are not always clear. For example, 2.1.1: prevalence of undernourishment, is not clear, whereas 2.1.2: prevalence of moderate or severe food insecurity in the population, based on the Food Insecurity Experience Scale (FIES), is more specific. However, it is still not clear what prevalence means here.
Structural issues
Information such as the proportions or amounts of money tends to be encoded in some sort of table, for instance:
2016, 7.4
2017, 7.7
2018, 9.2
This could be encoded as a simple text file or as a complex binary file. Alternatively, it could be in an XML file, such as Microsoft Excel uses, or else something like:
<year>2016</year><value>7.4</value>
<year>2017</year><value>7.7</value>
<year>2018</year><value>9.2</value>
There are of course many variations and many possible XML schemas. Alternatively, the data could be stored in a database in some internal format. Again, these variations make comparisons very difficult.
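To compare data held in different encodings, each encoding has to be read into a common structure first. The sketch below does this for the two encodings shown above, using Python's standard library. The enclosing `<series>` element is an assumption added so that the XML fragment is well-formed; the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = "2016, 7.4\n2017, 7.7\n2018, 9.2\n"

# A wrapping <series> element is assumed here so that the fragment can be parsed.
XML_TEXT = (
    "<series>"
    "<year>2016</year><value>7.4</value>"
    "<year>2017</year><value>7.7</value>"
    "<year>2018</year><value>9.2</value>"
    "</series>"
)


def records_from_csv(text):
    """Read (year, value) pairs from comma-separated text."""
    return [(int(year), float(value)) for year, value in csv.reader(io.StringIO(text))]


def records_from_xml(text):
    """Read (year, value) pairs from alternating <year> and <value> elements."""
    root = ET.fromstring(text)
    years = [int(element.text) for element in root.iter("year")]
    values = [float(element.text) for element in root.iter("value")]
    return list(zip(years, values))


# Once both are in the same structure, they can be compared directly.
print(records_from_csv(CSV_TEXT) == records_from_xml(XML_TEXT))
```

Each reader embodies knowledge about structure (which column or element is the year, which is the value) that is not present in the bits themselves and therefore has to be documented and preserved with the data.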
Virtual data
Whether they are in a spreadsheet or a database, the values of the data may not be defined explicitly anywhere, and this adds another level of complexity. The value may be calculated from other data values through a formula. For example, a value shown as a proportion may only exist in an Excel spreadsheet as a formula ‘=100*A1/B1’. The value of B1 itself may be calculated from other values. In the future, appropriate software may be available to access the data, as may be the case, for instance, for Excel. However, even then there is no guarantee that the formulae will be applied correctly.10 Lack of data in a spreadsheet cell may be indicated by a blank or by a zero or ‘999’, which can produce uncertain results.
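The problem of missing-value markers can be illustrated with a small sketch. It assumes that a blank cell and the value ‘999’ both mean ‘no data’ in a hypothetical dataset; whether a zero is a genuine measurement or another marker for missing data cannot be decided by code and has to be documented by whoever created the data.

```python
from typing import Optional

# Assumed markers for "no data" in this hypothetical dataset.
MISSING_MARKERS = {"", "999"}


def parse_cell(raw: str) -> Optional[float]:
    """Return the numeric value of a cell, or None if the cell marks missing data."""
    text = raw.strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)


def mean_of_present(cells):
    """Average only the cells that actually contain a measurement."""
    values = [value for value in map(parse_cell, cells) if value is not None]
    return sum(values) / len(values) if values else None


cells = ["7.4", "", "999", "7.7", "0"]
print([parse_cell(cell) for cell in cells])  # the zero is kept as a measurement here
print(mean_of_present(cells))
```

If ‘999’ were instead read as a number, it would dominate the average, which is the kind of uncertain result described above.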
Input data
Many of the measures discussed above are the result of a combination of other measurements, as illustrated earlier in relation to SDG 2, zero hunger. The USDA measurement of soil quality includes ‘soil respiration, infiltration, bulk density, electrical conductivity, pH, nitrates, aggregate stability, slake, earthworm, water quality and observations of soil structure’. Each of these measures requires a separate test in itself, each with its own procedures and processes, all of which should be recorded, as should the individual results. Even a simple proportion requires that the units of the two quantities are the same and are calculated year after year in the same way. For example:
1.a.3 Sum of total grants and non-debt-creating inflows directly allocated to poverty reduction programmes as a proportion of GDP.
The way that funds are identified as being ‘directly allocated to poverty reduction programmes’ needs to be expressed consistently if the measure is to be accurate.
The same point applies to many of the measures used.
Digital preservation and exploiting digital data
In addition to understanding how to manage the complexity of gathering comparative data as a basis for monitoring progress towards the SDGs, it is vital to think about and plan for digital preservation, at least over the timescales relevant for the SDGs. Whereas printed documents can be used for hundreds of years, digital data are different. The things we rely on to use data, such as technology, software and know-how, quickly evolve and change and even become unavailable.
Basic concepts in digital preservation
While there are many different factors influencing the use and longevity of digital information, such as software dependency or users’ knowledge, in order to evaluate preservation issues, it is necessary to understand how the information is actually recorded and how easy it is to distort what it means.
Consider the meaning of these bits:
01001110 01001101 01010001 01001101 01010000 01001010 00100000 00100000
They could mean, among other things:
Two IEEE 754 32-bit real numbers: 8.6116461 × 10⁸ and 1.35644119 × 10¹⁰
Two 32-bit integers (big-endian): 1313689933 and 1347035168
Eight 7-bit ASCII characters: ‘NMQMPJ’ followed by two spaces
In fact, in this instance it was the last of the three. The characters were my flight reference for a recent trip – quite important for me at the time, but not really of interest now. This illustrates the point that just keeping the bits is not enough.
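The three readings of the same bits can be reproduced with Python's standard `struct` module. The byte order is an assumption of this sketch (big-endian, with the bytes taken in the order shown); the chapter does not state it, and a different byte order would give different numbers from the same bits.

```python
import struct

bits = "01001110 01001101 01010001 01001101 01010000 01001010 00100000 00100000"

# Turn the eight groups of bits into eight bytes.
raw = bytes(int(group, 2) for group in bits.split())

# The same eight bytes read in three different ways.
as_floats = struct.unpack(">2f", raw)  # two IEEE 754 32-bit reals, big-endian
as_ints = struct.unpack(">2i", raw)    # two signed 32-bit integers, big-endian
as_text = raw.decode("ascii")          # eight ASCII characters

print(as_floats)
print(as_ints)
print(repr(as_text))
```

Nothing in the bytes says which of the three format strings is the right one; that knowledge is exactly what has to be kept alongside the bits.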
Types of digitally encoded information
There are many types of digital objects, including documents that can be rendered on a screen or on paper to be viewed by a human. Other types of digital objects that can be rendered are images, sounds or movies. Data, and in particular measurements relevant to the SDGs, are not normally simply rendered and viewed, and they are therefore referred to below as ‘non-rendered’.
Data, such as encoded numbers and text, are normally processed in a number of ways, and then the results of these processes are rendered and viewed, for example as a graph. The numbers and text could simply be printed and viewed, but often this is not really useful, especially if the dataset is very large. The information in rendered objects may be ‘combined’ in tens or hundreds of ways in the minds of individuals. Non-rendered objects (data), on the other hand, can be combined using computers with millions or billions of other pieces of data to be analysed and evaluated.
Evidence in the form of data, such as population statistics or economic indicators, must be collected to guide and monitor the SDG work. These data will be summarised in words or graphs, but only a limited number of analyses will be included. However, in many cases, where the measured progress could be challenged, datasets will need to be preserved and maintained so that they will be useable and capable of serving as evidence until at least 2030 and probably beyond.
Digital preservation
Digital preservation has been described as ‘interoperability with the future’. A fundamental aim of information preservation is to make it possible to trust and use it in the future. Preservation through time involves making sure that future users, who may have different technologies, formats, languages and understandings of words, can still use that information and can be confident that it has not been altered. Digital preservation also includes interoperability with the current time – preservation techniques should help to make information accessible and trustworthy right now. Use between communities can present the same challenges as use through time; the same techniques will be useful.
Today we need the capacity to use information from many sources, many disciplines, many people and many software applications. Coping with a large amount of data in a timely and repeatable manner requires that the data be digital and that each type of information should be encoded digitally. Since these measurements may provide the basis for making important decisions in the future, each should be preserved.
The widely accepted way to preserve digital information is to follow the Reference Model for an OAIS, or ISO 14721:2012. The standard was developed to facilitate broad consensus on the requirements for a repository capable of providing long-term, discipline-independent preservation of digital information, or digital archives. The group that developed the standard brought together people from space agencies, national archives and libraries, commercial organisations and many other domains with an interest in the long-term preservation of digitally encoded information. The standard defines a set of responsibilities that an OAIS archive must fulfil, making it possible to distinguish it from other uses of the term archive. It was also intended to support the development of additional digital preservation standards.
Since being adopted as an ISO standard, the OAIS Reference Model has been welcomed and widely adopted by virtually all types of digital preservation communities. Most modern digital preservation initiatives refer to the OAIS Reference Model standard, and it has also been widely used by organisations to inform their implementations of new or upgraded preservation systems. The term ‘open’ in OAIS is used to imply that this standard was developed in open forums. It does not imply that access to the archive is unrestricted.
OAIS defines a number of important concepts for successful digital preservation. One central concept is that Archival Information Packages (AIPs) should be created to capture all the metadata needed for preserving the data. An AIP includes representation information (semantic, structural and other types, such as the software that makes it possible to understand and use the data) as well as preservation description information, which includes:
•provenance, including the date it was created, why it was created and what happened to it subsequently
•location reference
•fixity (to ensure the information has not been altered, for example by calculating a hash)
•access rights and how they are controlled
•the way all this is linked together
•a description of the whole package.
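The components listed above can be pictured as a simple data structure. The sketch below is an illustration only, not the normative OAIS information model: the class names, field names, example values and the choice of SHA-256 for fixity are all assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescription:
    """Hypothetical container for preservation description information."""
    provenance: list[str]  # who created the data, when, why, and what happened since
    reference: str         # where the data can be located
    fixity_sha256: str     # hash recorded when the package was created
    access_rights: str     # who may use the data and under what controls


@dataclass
class ArchivalInformationPackage:
    """Hypothetical, much simplified picture of an AIP."""
    content: bytes                              # the data being preserved
    representation_information: dict[str, str]  # structure and semantics of the data
    pdi: PreservationDescription
    package_description: str

    def fixity_ok(self) -> bool:
        """Recompute the hash of the content and compare it with the recorded one."""
        return hashlib.sha256(self.content).hexdigest() == self.pdi.fixity_sha256


content = b"2016, 7.4\n2017, 7.7\n2018, 9.2\n"
package = ArchivalInformationPackage(
    content=content,
    representation_information={"structure": "CSV: year, value", "semantics": "value is a percentage"},
    pdi=PreservationDescription(
        provenance=["created by a hypothetical statistics office"],
        reference="hypothetical-identifier-001",
        fixity_sha256=hashlib.sha256(content).hexdigest(),
        access_rights="public",
    ),
    package_description="Hypothetical indicator series, 2016 to 2018",
)
print(package.fixity_ok())
```

Keeping these pieces in one package means that the data never travel without the information needed to interpret and trust them.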
Active data management plans
Once it is determined what metadata are needed for preservation in any given situation, it is important to capture these metadata as soon as the data are created rather than waiting to collect them at the end or attempting to collect them later. Here, two resources are valuable: the Project Management Body of Knowledge (PMBOK),11 the entire collection of processes, good practices, terminologies and guidelines accepted as standards within the worldwide project management industry; and the Data Management Body of Knowledge (DMBOK),12 a collection of processes and knowledge areas that are generally accepted as good practices within the data management discipline.
A new standard is now being developed that will bring together ideas from PMBOK, DMBOK and OAIS. The new standard, Information Preparation to Ensure Long Term Usability (IPELTU), is being prepared by the same international working group that wrote OAIS: the International Organization for Standardization (ISO) Technical Committee on Space Data and Information Transfer Systems. Drawing upon PMBOK and DMBOK, it breaks a project down into types of activities. At each stage, from conception to planning the collection of data and preserving the data, checklists are provided to illustrate additional information that will be needed to ensure the data can be used now and into the future.
For every activity there are areas of information that should be considered for collection. This makes it possible to draw up a table, with each element representing a particular activity and a corresponding area of information that should be considered for collection. IPELTU uses the term ‘collection groups’ for data creation and collection activities:
•initiating: the reason for creating the data and the initial definition of the data project
•planning: planning for the data creation and encoding
•executing: creating/collecting/encoding the data (at each point there may be deviations from the planned results, including instrument effects and unexpected influences)
•closing: completing the data creation/collection/encoding to satisfy the requirements of the project, phase or contractual obligations, and, at the end of the project, turning the information over to the long-term preservation organisation
•controlling: tracking, reviewing and orchestrating the progress and performance of the activities.
‘Additional information areas’ include:
•content information: content data object and representation information
•preservation description information (PDI)
•provenance information
•context information
•fixity information
•access rights information
•package description
•packaging information
•issues outside the OAIS Information Model: publications and related datasets.
The table that follows shows a sample of activities and can serve as a checklist for the data and metadata that need to be captured. This should help to ensure that everything necessary for preservation and future use is available. Each column corresponds to a ‘collection group’ and each row to an ‘additional information area’; each cell lists the information that should be collected for that combination. For example, for the ‘initiating’ collection group, the additional information that should be collected about the ‘data object’ consists of estimates of the volume of data to be produced and ideas about the potential value of the data.
Representation information should also be collected. In particular, this should include the standards that are expected to be used as well as the OAIS Information Model components, including provenance, access rights, fixity, reference and context information, and so on down the rows.
Table 10.1. Information that should be captured to support preservation and future use
Collection Group → | Initiating | Planning | Executing | Closing |
Additional Information Areas ↓ | ||||
Data object | •produce estimate of volume of data to be produced •develop ideas about the potential value of the data | •update additional information from the initiating step, based on more detailed plans •identify types of data, for instance raw or processed, that should be preserved •identify categories of data, for instance images, tables and any generic interfaces •identify quality constraints •plan the rate of data production •expand and add detail | •update additional information from planning phase based on what really happens | •finalise additional information from executing phase •create inventory of data produced that should be preserved •determine the volume that would require preservation •quality checks may be performed on the data by non-experts •define information properties that may be useful •check for (and create logs of) any missing data |
Representation information | •standards expected to be used •the OAIS Information Model | •update additional information from initiating phase on more detailed plans •review applicable standards •refine information model •choose data format •identify hardware and software dependencies •identify relationships between data items | •collect semantics of the data elements, e.g. data dictionaries and other semantics •collect format definitions and formal descriptions •create other data documentation •calibrate and test tools and system test data to be delivered | •finalise additional information from executing phase •finalise representation information networks to reasonable level •identify other software that may be used on the data •create suggestions for the designated community and the representation information needed |
Provenance | •record of origins of the project, e.g. in a Current Research Information System (CRIS) | •update additional information from initiating based on more detailed plans •define processing workflow, processing inputs and processing parameters •define system testing required •documents from system development milestones | •update additional information from planning based on what really happens •documentation about the hardware and software used to create the data, including a history of the changes in these over time •update documentation of processing workflow, processing inputs and processing parameters •record who was responsible for each stage of processing •record when each stage was performed •record any special hardware needed •record calibration •processing logs •record checking of fixity | •finalise additional information from executing •identify related data which may in the future be combined with these data |
This does not mean that each project must be broken down into only one initiating collection group, one planning collection group, and so on. Rather, a project may be carried out in multiple phases, and the process may be repeated in each phase. For example, in a longitudinal study, the aim may be to collect information in one country about one SDG over the whole period to 2030. This is, in principle, repeated year after year. One can then look at each year as a project phase. However, practical considerations may mean that there are changes from one year to another. Applying the IPELTU checklist is a reminder to capture necessary information (data as well as metadata) at each project phase.
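Applying the checklist phase by phase can be sketched in code. The following is a minimal illustration only, not part of the OAIS standard or of any tool described in this chapter: the names `COLLECTION_GROUPS`, `INFORMATION_AREAS`, `PhaseRecord` and `missing_items` are hypothetical, the information areas are limited to those named in the text above, and the phase label "2019" is an arbitrary example.

```python
from dataclasses import dataclass, field

# Collection groups (the columns of Table 10.1).
COLLECTION_GROUPS = ("initiating", "planning", "executing", "closing")

# Additional information areas (the rows). This is an illustrative subset
# taken from the text, not a complete list from the OAIS Information Model.
INFORMATION_AREAS = (
    "data object",
    "representation information",
    "provenance",
    "access rights",
    "fixity",
    "reference",
    "context",
)


@dataclass
class PhaseRecord:
    """What has been captured so far for one project phase, e.g. one year."""

    label: str
    captured: set = field(default_factory=set)

    def capture(self, group: str, area: str) -> None:
        """Mark one cell of the checklist as captured for this phase."""
        if group not in COLLECTION_GROUPS:
            raise ValueError(f"unknown collection group: {group}")
        if area not in INFORMATION_AREAS:
            raise ValueError(f"unknown information area: {area}")
        self.captured.add((group, area))

    def missing_items(self) -> list:
        """Return every (group, area) cell not yet captured, in table order."""
        return [
            (group, area)
            for group in COLLECTION_GROUPS
            for area in INFORMATION_AREAS
            if (group, area) not in self.captured
        ]


# One record per phase of a longitudinal study (hypothetical example).
year_2019 = PhaseRecord("2019")
year_2019.capture("initiating", "data object")
year_2019.capture("initiating", "representation information")
gaps = year_2019.missing_items()
```

Keeping one such record per phase makes it straightforward to see, at the close of each year, which cells of the checklist still need attention.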
Is it really being preserved? The importance of certification
Care of the information relating to measuring the SDGs is of global significance. Having collected all the information needed for preservation, it is important to ask where and how the information will be preserved and to ensure that it remains useable for as long as it is required (at least until 2030 and very possibly beyond), along with evidence to support claims of authenticity.
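One concrete form of such evidence is fixity information, which the OAIS Information Model lists among its components. The sketch below assumes that a SHA-256 digest recorded at the time of deposit is an acceptable fixity mechanism; the chapter does not prescribe one, and the function names `compute_fixity` and `verify_fixity` are hypothetical.

```python
import hashlib
from pathlib import Path


def compute_fixity(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded earlier."""
    return compute_fixity(path) == recorded_digest
```

A digest computed when the data are first deposited can be stored with the other metadata and re-checked at intervals; a mismatch indicates that the bits have changed, although a match alone does not show that the information remains understandable.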
ISO certification based on ISO 16363¹³ requires third-party verification that the information holdings are being preserved securely. Conducting audit and certification under ISO accreditation¹⁴ has real benefits because the process requires continuous improvements to the repository and regular checks to ensure that everyone involved has up-to-date skills, including the auditors, certification organisations and accreditation organisations – all are checked repeatedly and consistently. Moreover, cross-checks between countries help to guarantee consistency. One of the aims of ISO certification is to facilitate international trade in services and products by allowing measurements certified in one country to be accepted in other countries. In this way, many systems on which our health, wealth and happiness depend, such as medical and food products, can be audited and certified as following the correct procedures.
Getting to where we need to be
Having looked at the challenges and solutions from an ideal point of view and the realities of measuring information related to the SDGs, we can now explore potential ways of making the difficult realities approach the ideal. Data collection has already begun across many countries, and it may not be practical to make changes in order to approach the ideal. However, there are things that can be done to strengthen the way that data are collected. Notably, we can:
•ensure that all the metadata, as required by the OAIS AIP, is collected in order to fill in the gaps that should be identified in a data management plan
•ensure that the information is preserved. At the very least, an ISO 16363 audit will identify opportunities for improving the operations of a repository.
It is important to take immediate steps to clarify what data should be collected (including clarifying the units being measured and the specific measurements to be made). It is also important to improve the way data are collected (for example by identifying women separately in datasets). If a greater level of disaggregation becomes possible, and the data can be separated into a finer level of detail, then future measures can be intercompared in greater depth. However, even aggregated measures can be compared immediately.
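The relationship between disaggregated and aggregated measures can be illustrated with a short sketch. All figures below are invented purely for illustration, and the names `disaggregated` and `aggregate_by_year` are hypothetical; the point is only that disaggregated records can always be summed back to totals and so remain comparable with earlier, aggregate-only figures, whereas the reverse is not possible.

```python
from collections import defaultdict

# Hypothetical disaggregated counts: (year, sex) -> number of people
# counted under some indicator. The values are invented for illustration.
disaggregated = {
    (2020, "female"): 120,
    (2020, "male"): 100,
    (2021, "female"): 150,
    (2021, "male"): 105,
}


def aggregate_by_year(records: dict) -> dict:
    """Sum disaggregated counts to yearly totals."""
    totals = defaultdict(int)
    for (year, _group), count in records.items():
        totals[year] += count
    return dict(totals)


totals = aggregate_by_year(disaggregated)
```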
Having considered the guiding principles for the SDG initiative quoted at the beginning of this chapter, it is worth adding further commentary here:
(f) They will build on existing platforms and processes, where these exist, avoid duplication and respond to national circumstances, capacities, needs and priorities. They will evolve over time, considering emerging issues and the development of new methodologies, and will minimize the reporting burden on national administrations.
A collective effort should be made to draw up more detailed guidelines for the algorithms needed to process the information for each SDG so that the processes evolve in such a way that they converge and the results are compatible between countries.
(g) They will be rigorous and based on evidence, informed by country-led evaluations and data which is high-quality, accessible, timely, reliable and disaggregated by income, sex, age, race, ethnicity, migration status, disability and geographic location and other characteristics relevant in national contexts.
The guidelines should include enough detail to help data collectors capture information that serves as evidence of authenticity and, as far as possible, allows the errors in the results to be estimated. This would make it possible, for example, to say whether changes through time are real rather than due to random errors, and to re-process or re-purpose the information in the future.
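Deciding whether a change through time is real or merely the result of random error can be sketched as follows. This is a simplified illustration under stated assumptions, not a method prescribed by the SDG guidelines: it assumes the two measurements have independent errors expressed as standard errors, and the threshold `k = 2.0` is a conventional but arbitrary choice. The function name `change_is_significant` is hypothetical.

```python
import math


def change_is_significant(value_a, error_a, value_b, error_b, k=2.0):
    """Compare two measurements that carry independent standard errors.

    Returns True when the absolute difference exceeds k times the
    combined standard error, sqrt(error_a**2 + error_b**2).
    """
    combined_error = math.hypot(error_a, error_b)
    return abs(value_b - value_a) > k * combined_error
```

For instance, a rise from 10.0 to 15.0 with standard errors of 1.0 on each value gives a difference of 5.0 against a combined error of about 1.41, so the change would be treated as real; a rise from 10.0 to 12.0 with standard errors of 3.0 and 4.0 gives a combined error of 5.0, so it would not. Neither judgement is possible unless the error estimates are captured alongside the measurements.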
(h) They will require enhanced capacity-building support for developing countries, including the strengthening of national data systems and evaluation programmes, particularly in African countries, least developed countries, small island developing States, landlocked developing countries and middle-income countries.
If the concepts described in this chapter can guide this support, then the data systems in different countries can converge and training and common software systems can be shared.
(i) They will benefit from the active support of the United Nations system and other multilateral institutions.
Staff of the UN and other multilateral institutions will benefit from becoming familiar with the concepts in this chapter, which will provide a standards-based blueprint for capacity building and implementation.
Such activities will result in improvements in:
•existing platforms and processes to ensure validity, authenticity and intercomparability of the information gathered for the SDGs
•specification of the information to be gathered and the detail in which to gather it in order to improve the quality of the responses to each SDG
•ability to compare the results year on year
•ability to compare the results between countries.
Preservation requires resources, so it makes sense to share the costs (facilities, human resources) with other organisations. Sharing the techniques and knowledge needed to capture, encode and preserve information will be a valuable start. A more advanced step would be to use an ISO 16363 certified repository to preserve the information. Such a repository will need to be inspected by expert independent auditors and certified as being capable of preserving information.
Conclusion
This chapter has taken a pragmatic approach, treating the monitoring and measurement of SDG information as a large data project, in order to focus attention on key issues that need to be resolved if the goals are to be achieved. If UN and other multilateral institution staff can draw on the concepts set out here as part of the planning process for meeting the SDGs, there will be real benefits in terms of data management and the longevity of digital materials.
The chapter is directly relevant to the issues that the key SDG working groups responsible for monitoring developments and issues relating to the indicators and their metadata have worked on, namely: metadata exchange, geo-spatial information and interlinkages between the SDGs. Incorporating the contributions that ISO standards can make to improving data quality and building the framework for preservation will strengthen the whole complex web of sustainable development and solidify the efforts underway by the United Nations and its partners.
1Transforming Our World: The 2030 Agenda for Sustainable Development, http://www.un.org/ga/search/view_doc.asp?symbol=A/RES/70/1&Lang=E.
2Michael Grubb, ‘Climate researchers’ work is turned into fake news’, Scientific American, January 2018, http://www.scientificamerican.com/article/climate-researchers-rsquo-work-is-turned-into-fake-news/.
3The OAIS Reference Model is an ISO standard (ISO 14721), which forms the basis of essentially all work done in digital preservation. It can be downloaded from: https://public.ccsds.org/Pubs/650x0m2.pdf.
4USDA Soil Quality Test Kit, http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/health/assessment/?cid=nrcs142p2_053873.
5‘How algorithms rule the world’, 1 July 2013, http://www.theguardian.com/science/2013/jul/01/how-algorithms-rule-world-nsa.
6‘Why Facebook’s news feed is changing: and how it will affect you’, 12 January 2018, http://www.theguardian.com/technology/2018/jan/12/why-facebooks-news-feed-changing-how-will-affect-you.
7E.M. Pugh and G.H. Winslow, The Analysis of Physical Measurements (London: Addison-Wesley, 1966).
8‘Measuring the UN’s Sustainable Development Goals: an update’, September 2017, http://www.statslife.org.uk/news/3556-measuring-the-un-s-sustainable-development-goals-an-update.
9Annex of the resolution adopted by the General Assembly on 6 July 2017, Work of the Statistical Commission pertaining to the 2030 Agenda for Sustainable Development (A/RES/71/313), http://ggim.un.org/meetings/2017-4th_Mtg_IAEG-SDG-NY/documents/A_RES_71_313.pdf.
10See, for example, the report of errors in the spreadsheet formulae of Harvard’s Carmen Reinhart and Kenneth Rogoff who are two of the most respected and influential academic economists active today, at http://www.theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646.
11A Guide to the Project Management Body of Knowledge (PMBOK® Guide), 6th edn (2017), see https://www.pmi.org/pmbok-guide-standards/foundational/pmbok.
12The DAMA Guide to the Data Management Body of Knowledge (DMBOK Guide), 1st edition, 2009, http://www.dama.org/content/body-knowledge and DMBOK Version 2 see http://damadach.org/dmbok2-DMBOK-version-2/, final version available from http://www.amazon.co.uk/DAMA-DMBOK-Data-Management-Body-Knowledge/dp/1634622340.
13Audit and Certification of Trustworthy Digital Repositories, 2011, CCSDS 652.0-M-1 and ISO 16363:2012. Available from http://www.public.ccsds.org/Pubs/652x0m1.pdf.