
5

The presumption that computers are ‘reliable’

Stephen Mason

5.1 This chapter considers the common law presumption in the law of England and Wales that ‘In the absence of evidence to the contrary, the courts will presume that mechanical instruments were in order at the material time’. The Law Commission formulated this presumption in 1997.1 The concept of ‘judicial notice’2 is also considered in this chapter.

Stephen Mason, ‘The presumption that computers are “reliable” ’, in Stephen Mason and Daniel Seng (eds.), Electronic Evidence and Electronic Signatures (5th edn, University of London 2021) 126–235.

1The Law Commission, Evidence in Criminal Proceedings: Hearsay and Related Topics (Law Com No 245, 1997), 13.13. The Law Commission has an influence beyond the jurisdiction of England and Wales, for which see two cases from the Supreme Court of India, Anvar P.V. v P.K. Basheer [2014] INSC 658 (18 September 2014), where Kurian J said: ‘It is relevant to note that Section 69 of the Police and Criminal Evidence Act, 1984 (PACE) dealing with evidence on computer records in the United Kingdom was repealed by Section 60 of the Youth Justice and Criminal Evidence Act, 1999. Computer evidence hence must follow the common law rule, where a presumption exists that the computer producing the evidential output was recording properly at the material time. The presumption can be rebutted if evidence to the contrary is adduced’ (correct pagination not available in the pdf version). In Arjun Panditrao Khotkar v Kailash Kushanrao Gorantyal (2020 SCC OnLine SC 571) Ramasubramanian J outlined the discussions in the Law Commission paper, but failed to consider any of the recent scholarship on this topic.

2Halsbury’s Laws (5th edn, 2015) vol 12, paras 712–723.

5.2 The reasons given by the Law Commission for introducing this presumption make it clear that the words ‘mechanical instruments’ include computers and computer-like devices – even though computers and computer-like devices are not mechanical instruments. Judges have used the term ‘reliable’ in relation to computers, although not exclusively, and lawyers have sometimes avoided the word ‘reliable’ in favour of the word ‘robust’. The purpose of this chapter is to consider the introduction of a presumption that mechanical instruments generally are ‘in order’, ‘reliable’ or ‘working properly’, and to explain why the term ‘reliable’ is not accurate in relation to computers and computer-like devices, not least because computer scientists have now demonstrated that the words ‘reliable’ and ‘robust’, as used by lawyers and judges, do not bear the meaning attributed to them by the legal profession.1 It must be emphasized that the examples of the failure of computers and similar devices discussed in this chapter are provided to demonstrate the problems that occur; they represent neither the totality of illustrations that could be given, nor the volume of errors that have occurred or will occur in the future. It is suggested that judicial notice be taken of these examples, particularly because they contradict the presumption that computers are ‘reliable’.2

1Peter Bernard Ladkin, Bev Littlewood, Harold Thimbleby and Martyn Thomas CBE, ‘The Law Commission presumption concerning the dependability of computer evidence’ (2020) 17 Digital Evidence and Electronic Signature Law Review 1; Peter Bernard Ladkin, ‘Robustness of software’ (2020) 17 Digital Evidence and Electronic Signature Law Review 15; Michael Jackson, ‘An approach to the judicial evaluation of evidence from computers and computer systems’ (2021) 18 Digital Evidence and Electronic Signature Law Review 50.

2For instance, problems associated with the century date change (‘Y2K’) continue to afflict technology, and a similar problem will occur in 2038, because epoch time on Unix systems is traditionally stored as a signed 32-bit count of seconds since 1 January 1970, which will run out of capacity at 03:14:07 UTC on 19 January 2038, for which see Chris Stokel-Walker, ‘A lazy fix 20 years ago means the Y2K bug is taking down computers now’, NewScientist Technology, 7 January 2020, https://www.newscientist.com/article/2229238-a-lazy-fix-20-years-ago-means-the-y2k-bug-is-taking-down-computers-now/; Professor Martyn Thomas, ‘What really happened in Y2K?’, Gresham College lecture, 4 April 2017, https://www.gresham.ac.uk/lectures-and-events/what-really-happened-in-y2k.
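To make the arithmetic in the note above concrete: a signed 32-bit counter of seconds since 1 January 1970 reaches its maximum value, 2,147,483,647, after 24,855 days plus a further 3 hours, 14 minutes and 7 seconds, that is, at 03:14:07 on 19 January 2038. The following minimal sketch in C checks this; it is offered only as an illustration and is not drawn from any particular system.

    #include <stdio.h>
    #include <stdint.h>

    /* Illustration of the 'Year 2038' limit: a signed 32-bit time value
       counting seconds from 1 January 1970 cannot exceed 2^31 - 1. */
    int main(void)
    {
        int32_t max_t = INT32_MAX;               /* 2,147,483,647 seconds */
        int64_t days  = (int64_t)max_t / 86400;  /* whole days after the epoch */
        int64_t rem   = (int64_t)max_t % 86400;  /* seconds into the final day */

        printf("last representable second: %ld\n", (long)max_t);
        printf("= %lld days + %02lld:%02lld:%02lld after 1 January 1970\n",
               (long long)days, (long long)(rem / 3600),
               (long long)((rem % 3600) / 60), (long long)(rem % 60));
        /* prints 24855 days + 03:14:07, i.e. 19 January 2038 */
        return 0;
    }

Whether any particular device or system is affected by this limit depends on how it actually stores time, which is precisely the kind of fact that cannot simply be presumed.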

The purpose of a presumption

5.3 The aim of a presumption, which allocates the burden of proof, is to relieve a party of the need to prove every item of evidence adduced in legal proceedings, to reduce the need for evidence in relation to some issues and to save ‘the time and expense of proving the obvious’.1 In an appeal before the Supreme Court of South Australia, Travers J explained the rationale in the case of Barker v Fauser2 regarding the accuracy3 of the readings of a weighbridge:

It is rather a matter of the application of the ordinary principles of circumstantial evidence. In my opinion such instruments can merely provide prima-facie evidence in the sense indicated by May v. O’Sullivan [(1955) 92 CLR 654]. They do not transfer any onus of proof to one who disputes them, though they may, and often do, create a case to answer. Circumstantial evidence is something which is largely based upon our ordinary experience of life … It is merely an application of this principle to our ordinary experience in life which tells us of the general probability of the substantial correctness of watches, weighbridges and other such instruments. If they are instruments or machines of a type which we know to be in common use our experience tells us that this is suggestive of their substantial correctness. Experience also tells us that they are rarely completely accurate, but usually so substantially accurate that people go on using them, and that subject to a certain amount of allowance for some measure of incorrectness, they act upon them.4

1Holt v Auckland City Council [1980] 2 NZLR 124, per Richardson J at 128.

2(1962) SASR 176.

3The words ‘accuracy’, ‘precision’ and ‘correctness’ are often used interchangeably in the everyday sense, but they have different meanings in their technical use. I owe this observation to Professor Martin Newby.

4(1962) SASR 176 at 178–179.

5.4 This explanation sets out the rationale for the presumption that mechanical instruments were in order at the material time. However, it appears that this presumption exists on the basis of expediency. In admitting evidence from a mechanical instrument or similar device, judges have not justified the presumption on the basis of relevant scientific evidence, but have substituted for it concepts such as ‘common use’, ‘ordinary experience’ or ‘substantial correctness’.

5.5 Consider the accuracy of a watch. The fact that a watch has passed tests of accuracy at one moment in time does not preclude its mechanical parts from failing subsequently. In Cheatle v Considine, Travers J put the discussion of the accuracy of mechanical instruments into its overall context:

My view on the subject of such instruments is that reliance on them is basically an application of circumstantial evidence. The fact that people go on relying upon watches, speedometers, or even hearing aids, seems to be some circumstantial proof that all these things do provide some aid or assistance to those who use them, otherwise they would not go on using them. They are not necessarily accurate, and indeed, probably, most of such instruments on being properly tested would reveal some degree of inaccuracy. But I think in the absence of contrary evidence, they are to be regarded as some proof.1

1Cheatle v Considine [1965] SASR 281 at 282.

Presumptions and mechanical instruments

5.6 The presumption that scientific instruments work properly has a long history.1 For instance, scales benefit from the presumption.2 Timing devices also take advantage of the presumption. In Plancq v Marks,3 in an appeal against conviction for driving a motor car in excess of the speed limit of 20 mph, the evidence of the police officer was challenged. The stopwatch used by the police officer was produced in court. The appeal focused on the ground that the police officer gave opinion evidence as to the speed of the vehicle. This appeal was dismissed on the basis that the police officer was merely giving oral evidence of the actions of the stopwatch, which did not constitute the giving of opinion evidence. The real issue was whether the police officer was telling the truth.

1R. P. Groom-Johnson and G. F. L. Bridgman (eds), A Treatise on the Law of Evidence (12th edn, Sweet and Maxwell 1931), 167, in which the working accuracy of certain scientific instruments, such as watches, clocks, thermometers, aneroids and anemometers, among other ‘ingenious contrivances’, was recognized in the absence of evidence to the contrary.

2Giles v Dodds [1947] VLR 465, [1947] ArgusLawRp 53; (1947) 53 Argus LR 584.

3(1906) 94 LT NS 577.

5.7 Arguments that a watch used to prove that the defendant was speeding ought to be tested have been ignored,1 as in the case of Gorham v Brice.2 The Lord Chief Justice dismissed the appeal against conviction for driving a motor car in excess of the speed limit of 12 mph without considering the point. By contrast, the members of the Divisional Court in Melhuish v Morris3 allowed an appeal against a speeding conviction because the speedometer of the police vehicle had not been tested for accuracy.4 The court in Nicholas v Penny5 subsequently overturned this decision. Lord Goddard CJ commented:

The question in the present case is whether, if evidence is given that a mechanical device, such as a watch or speedometer – and I cannot see any difference in principle between a watch and a speedometer – recorded a particular time or a particular speed, which is the purpose of that instrument to record, that can by itself be prima facie evidence, on which the court can act, of that time or speed.6

1In communication with the author, Professor Lorenzo Strigini, Professor of Systems Engineering, School of Mathematics, Computer Science and Engineering, Department of Computer Science, City University of London, points out that, from an engineering point of view, testing that a watch is accurate enough now (which usually implies that it was accurate until now, unless it has been repaired) is an inexpensive enough exercise that not doing it seems a dereliction of duty.

2(1902) 18 TLR 424.

3[1938] 4 All ER 98, [1938] 10 WLUK 7; see also ‘Evidence in speed limit cases’, The Journal of Criminal Law (1937) 1(2) 181.

4Evidence that the accused did not exhibit the usual signs of being intoxicated can indicate that a machine is not working properly: R. v Crown Prosecution Service Ex p. Spurrier [1999] 7 WLUK 431, (2000) 164 JP 369, [2000] RTR 60, Times, 12 August 1999, [1999] CLY 883, also known as DPP v Spurrier. Police officers can conduct physical tests to ensure a speedometer is working accurately, for which see Mohammed Aslam Pervez v Procurator [2000] ScotHC 111.

5[1950] 2 KB 466, [1950] 2 All ER 89, 66 TLR (Pt. 1) 1122, [1950] 5 WLUK 20, (1950) 114 JP 335, 48 LGR 535, 21 ALR2d 1193, (1950) 94 SJ 437, [1947–51] CLY 9158, also known as Penny v Nicholas; 66 Law Quarterly Review (1950) 264, 441; in the South Australian case of Peterson v Holmes [1927] SASR 419, Piper J asked, at 421, ‘If [appears as “It” in the report, but this must be a mistake] the speedometer be tested by stop-watches and measured distances, what about the accuracy of the watches and the chain measure?’; ‘Proof of excessive speed’ (1950) XIV(4) The Journal of Criminal Law 360.

6[1950] 2 KB 466 at 473.

5.8 The judge went on to suggest that because the defendant was accused of exceeding the speed limit by 10 mph, it ‘would be a considerable error in the speedometer if it were as much out as that’.1 Such a comment was not intended, it is suggested, to create a presumption that such devices are reliable, especially as Lord Goddard CJ commented that ‘the justices need never accept any evidence if they do not believe it, or feel that for some reason they cannot accept it’.2 A similar issue arose in the case of H. Gould and Company Limited v Cameron,3 where the pressure in the tyres of a heavy motor vehicle was tested in July and found to be over the legal limit. The instrument used to test the tyre pressure had itself been tested in March of the previous year, and in August of the year following the reading. The defence argued that the instrument might have developed an error after being tested in March. It was known and accepted that, at certain pressures, the device would be in error by 1 lb over a range of tests between 70 lb and 100 lb. This error had been taken into account in this case. Northcroft J said:

In a case such as this, where of necessity, a mechanical device must be used to ascertain the pressure within the tyres, it is sufficient, I think, to show that the instrument is used correctly, and that, from its nature and history, it may reasonably be relied upon by the Court. The history of this instrument and the description of its use satisfies me that the learned Magistrate was justified in accepting it, as I do, as a reliable test on this occasion.4

1[1950] 2 KB 466 at 473.

2[1950] 2 KB 466 at 742. In R v Amyot (1968) 2 OR 626, Clare Co.Ct.J accepted the use of a stop-watch to measure the time a vehicle took to travel between marked points on a highway, where the police officer had personally checked the distance between the markings using a cyclometer and made the observations with the stop-watch in an aircraft.

3[1951] NZLR 314.

4[1951] NZLR 314 at 316 (40–45).

5.9 The observations of Shadbolt DCJ in the New South Wales case of Re Appeal of White,1 an appeal against a conviction for exceeding the speed limit, put the matter into perspective. He noted, at 430:

Courts have been generally loath to be wearied in seeking proof of some absolute measure or requiring it in cases such as this. It is not possible for every child to check his wooden ruler with the standard metre in Canberra nor every grocer his scales with the standard gram. Most of us accept the ruler’s accuracy and the weight of the grocer’s scales.

1(1987) 9 NSWLR 427.

5.10 It does not follow, however, that every measuring device is accurate.

Judicial formulations of the presumption that mechanical instruments are in order when used

Judicial notice

5.11 There are a number of reasons for the doctrine of judicial notice:1 to expedite the hearing of a case where obvious facts do not need proving; to promote uniformity in judicial decision making and to prevent the possibility of a decision which is demonstrably erroneous or false.2 Brett JA summed up the concept in R v Aspinall: ‘Judges are entitled and bound to take judicial notice of that which is the common knowledge of the great majority of mankind and of the greater majority of men of business.’3 In the High Court of Australia,4 Isaacs J emphasized the guiding principle of the doctrine:

The only guiding principle – apart from Statute – as to judicial notice which emerges from the various recorded cases, appears to be that wherever a fact is so generally known that every ordinary person may be reasonably presumed to be aware of it, the Court ‘notices’ it, either simpliciter if it is at once satisfied of the fact without more, or after such information or investigation as it considers reliable and necessary in order to eliminate any reasonable doubt.

The basic essential is that the fact is to be of a class that is so generally known as to give rise to the presumption that all persons are aware of it.5

1See Law Commission New Zealand, Evidence Law: Documentary Evidence and Judicial Notice: A Discussion Paper (Preliminary Paper No 22, 1994) Chapter IX for a nuanced consideration of the topic; Hodge M. Malek (ed), Phipson on Evidence (19th edn, Sweet & Maxwell 2018), chapter 3.

2Christopher Allen, ‘Case comment: judicial notice extended’ (1998) E & P 37, 39; David M. Paciocco, ‘Proof and progress: coping with the law of evidence in a technological age’ (2013) 11(2) Canadian Journal of Law and Technology 181, 188–189; Evidence (Interim) [1985] ALRC 26 [969]; Law Commission New Zealand, Evidence Law: Documentary Evidence and Judicial Notice: A Discussion Paper (Preliminary Paper No 22, 1994), [259].

3(1876) 3 QBD 48 at 61–62.

4Holland v Jones (1917) 23 CLR 149, [1917] VLR 392, 23 ALR 165, 1917 WL 15976, [1917] HCA 26.

5(1917) 23 CLR 149 at 153.

5.12 The practical approach was considered in Commonwealth Shipping Representative v Peninsular and Oriental Branch Service1 by Lord Sumner:

My Lords, to require that a judge should affect a cloistered aloofness from facts that every other man in Court is fully aware of, and should insist on having proof on oath of what, as a man of the world, he knows already better than any witness can tell him, is a rule that may easily become pedantic and futile.2

1[1923] AC 191, (1922) 13 Ll L Rep 455, [1922] 12 WLUK 85, also known as Peninsular & Oriental Branch Service v Commonwealth Shipping Representative.

2[1923] AC 191 at 211.

5.13 The doctrine of judicial notice is restricted to very clear knowledge,1 and it can be more severe in its effect than a presumption, as noted by Susan G. Drummond:

It is a manoeuvre that forecloses further evidence. The judge operates, in this case, as a virtually unlimited authority with limitations imposed only from within the legal hierarchy. Judicial notice can only be contested on appeal and invalidated if it can be demonstrated that the criteria for the application of judicial notice were not present (the fact was not notorious, the sources to establish the fact were not indisputable ...). As judicially noticed matters operate in the domain of fact, not law, they have no precedential value.2

1For discussions on the confusing treatment of this doctrine, see G. D. Nokes, ‘The limits of judicial notice’ (1958) 74 LQR 59 and Susan G. Drummond, ‘Judicial notice: the very texture of legal reasoning’ 15 No 1 Can JL & Soc’y 1.

2Drummond, ‘Judicial notice: the very texture of legal reasoning’, 4.

5.14 Given that this doctrine appears to have been extended to electronic evidence in Canada, Drummond’s observation illustrates the importance of ensuring that judges more fully understand the nature of the world in which they now live. Thorson JA discussed judicial notice in R. v Potts before the Ontario Supreme Court, Court of Appeal:

Judicial notice, it has been said, is the acceptance by a court or judicial tribunal, without the requirement of proof, of the truth of a particular fact or state of affairs that is of such general or common knowledge in the community that proof of it can be dispensed with.

…

Thus it has been held that, generally speaking, a court may properly take judicial notice of any fact or matter which is so generally known and accepted that it cannot reasonably be questioned, or any fact or matter which can readily be determined or verified by resort to sources whose accuracy cannot reasonably be questioned.1

11982 CarswellOnt 56, [1982] OJ No 3207, 134 DLR (3d) 227, 14 MVR 72, 26 CR (3d) 252, 36 OR (2d) 195, 66 CCC (2d) 219, 7 WCB 236, at [15].

5.15 In R. v Find,1 before the Supreme Court of Canada, McLachlin CJC, at [48], held that the threshold for judicial notice is strict:

Judicial notice dispenses with the need for proof of facts that are clearly uncontroversial or beyond reasonable dispute. Facts judicially noticed are not proved by evidence under oath. Nor are they tested by cross-examination. Therefore, the threshold for judicial notice is strict: a court may properly take judicial notice of facts that are either: (1) so notorious or generally accepted as not to be the subject of debate among reasonable persons; or (2) capable of immediate and accurate demonstration by resort to readily accessible sources of indisputable accuracy.

12001 CarswellOnt 1702, 2001 CarswellOnt 1703, 2001 SCC 32, [2001] 1 SCR 863, [2001] SCJ No 34, 146 OAC 236, 154 CCC (3d) 97, 199 DLR (4th) 193, 269 NR 149, 42 CR (5th) 1, 49 WCB (2d) 595, 82 CRR (2d) 247, JE 2001–1099, REJB 2001–24178.

5.16 The concept of ‘notorious’ is considered in Phipson:

the concept covers matters being so notorious or clearly established or susceptible of demonstration by reference to a readily obtainable and authoritative source that evidence of their existence is unnecessary. Some facts are so notorious or so well established to the knowledge of the court that they may be accepted without further enquiry.1

1Malek, Phipson on Evidence, para 3:02.

5.17 The judge can conduct her own research, and the United States Court of Appeals, Ninth Circuit reached conclusions about the automated functions of programs in this way, as in U.S. v Lizarraga-Tirado, where Kozinski CJ said:

Because there was no evidence at trial as to how the tack and its label were put on the satellite image, we must determine, if we can, whether the tack was computer generated or placed manually. Fortunately, we can take judicial notice of the fact that the tack was automatically generated by the Google Earth program. By looking to ‘sources whose accuracy cannot reasonably be questioned’ – here, the program – we can ‘accurately and readily determine [ ]‌’ that the tack was placed automatically. See Fed.R.Evid. 201(b). Specifically, we can access Google Earth and type in the GPS coordinates, and have done so, which results in an identical tack to the one shown on the satellite image admitted at trial.1

1789 F.3d 1107 (9th Cir. 2015), 1109. Although judges should be wary of reaching conclusions without adequate evidence, as in the case of 1475182 Ontario Inc. o/a Edges Contracting v Ghotbi, 2021 ONSC 3477 (CanLII), where Boswell J incorrectly determined, at [50], that the unique telephone number linked to a cellular telephone, taken together with the International Mobile Equipment Identity (IMEI) number ‘provide, in effect, a digital signature on every message sent by the user of that particular device.’

5.18 In justifying judicial notice, David M. Paciocco comments: ‘If a court could not rely on a notorious and incontrovertible material fact because it had not been proved, verdicts would not conform to reality. The repute of the administration of justice would be harmed.’1 Paciocco went on to illustrate his argument with the following example of how a brake on a motor vehicle operates:

For example when someone describes putting the brakes on in a car no-one offers expert testimony that the function of brakes is to slow or stop vehicles, that brakes are typically controlled by foot-pedals that are depressed in order to slow or stop the vehicle, or that brakes are depressed gently to come to a gradual stop and aggressively for an emergency stop.2

1Paciocco, ‘Proof and progress’, 188–189.

2Paciocco, ‘Proof and progress’, 189.

5.19 There is a distinction between the purpose of a brake on a motor vehicle (which is the fact in issue in the above illustration) and how the braking system operates (if the fact in issue is whether the brakes actually worked). In the example above, Paciocco made assumptions about how braking systems work and failed to understand the nature of the technology. Most braking systems in motor vehicles are controlled by a mix of electronic systems and software code (a fact so notorious that no citation ought to be required1). Described at a high functional level, the braking technology in modern vehicles places the brakes primarily under the control of electronics or software code. The failsafe fallback strategy for most modern brake systems is that, if the electronics or software code fails, the system reverts to a standard hydraulic brake system. It does not necessarily follow that the braking function is always performed correctly, or as normally expected, where the action is mediated by electronic systems. For instance, anti-lock braking systems (ABS), electronic stability control (ESC) and traction control are predicated on interactions between engine torque output and brake control on individual wheels, drawing on data from sensors such as accelerometers. This means that there may be a difference between the fact that a braking event took place and whether a braking event was requested, and vice versa.2 This example is far from the strict application of the doctrine noted in the Supreme Court of Canada by McLachlin CJC. If judicial notice is stretched this far, the question of whether justice is served by the doctrine must be carefully scrutinized.

1Notwithstanding that it is notorious that anti-lock brake systems are partly controlled by software code and electronic systems, the reader can obtain more information from the Society of Automotive Engineers International, the open access journal Intelligent Control and Automation and IEEE Transactions on Vehicular Technology.

2I owe this point to Dr Michael Ellims; see also the following, in which it is demonstrated that braking systems can be controlled by hacking into the motor vehicle computer system: Chris Valasek and Charlie Miller, Adventures in Automotive Networks and Control Units (Technical White Paper, 2014), http://www.ioactive.com/pdfs/IOActive_Adventures_in_Automotive_Networks_and_Control_Units.pdf; Dr Charlie Miller and Chris Valasek, Remote Exploitation of an Unaltered Passenger Vehicle (2015), http://illmatics.com/Remote%20Car%20Hacking.pdf; Roderick Currie, Developments in Car Hacking (SANS Institute, 2015), https://www.sans.org/reading-room/whitepapers/internet/developments-car-hacking-36607.
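By way of illustration only, the separation described in 5.19 between the driver’s request to brake and the braking that actually occurs can be sketched in a few lines of C. This is a deliberately simplified, hypothetical sketch: the names, the inputs and the single-cycle logic are invented for the purpose of illustration and do not reproduce any manufacturer’s code.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical sketch: the pedal request passes through an electronic
       controller that may modulate it (as ABS or traction control would),
       and falls back to direct hydraulic actuation only if that controller
       reports a fault. */

    struct brake_inputs {
        bool pedal_pressed;   /* the driver's request */
        bool controller_ok;   /* is the electronic path healthy? */
        bool wheel_slipping;  /* wheel-speed sensor input */
    };

    static bool brake_applied(struct brake_inputs in)
    {
        if (!in.controller_ok)
            return in.pedal_pressed;   /* failsafe: plain hydraulic fallback */
        if (in.pedal_pressed && in.wheel_slipping)
            return false;              /* ABS releases pressure for this cycle */
        return in.pedal_pressed;       /* normal electronically mediated braking */
    }

    int main(void)
    {
        struct brake_inputs in = { true, true, true };
        printf("braking requested: %d, applied this cycle: %d\n",
               (int)in.pedal_pressed, (int)brake_applied(in));
        return 0;
    }

Even in this toy version, a record that the driver requested braking and a record that the brakes were applied are produced by different parts of the system and need not agree, which is the point made in the paragraph above.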

A ‘notorious’ class

5.20 In the Victoria case of Crawley v Laidlaw,1 Lowe J considered, at 374, the basis upon which a presumption might apply – in this case regarding a scientific instrument:

I do not question that such a presumption is frequently and (in general) tacitly acted on by our Courts; but in my opinion it must appear from evidence before the Court, or from something which stands in place of evidence, e.g., judicial notice, that the instrument in question is a scientific instrument, before the presumption applies.

1(1930) VLR 370.

5.21 The prosecution sought to adduce evidence from two weighing machines called ‘loadometers’ to prove a motor truck was carrying a greater weight than that allowed by the regulations. The Police Magistrate who heard the case had dismissed it on the basis that there was no evidence to demonstrate the correctness of the instruments. On appeal, Lowe J concurred, holding that there was no evidence that the devices were scientific instruments, and there was no foundation for a presumption that the instruments worked properly. Emphasizing the need to establish a foundation for the presumption, Lowe J observed:

I do not doubt that in appropriate cases the Court will use its ‘general information and … knowledge of the common affairs of life which men of ordinary intelligence possess’ – Phipson on Evidence (6th ed.), p. 19 – and that of the nature of most, if not all, of the instruments mentioned in the paragraph cited from Taylor1 would require no evidence in order to raise the presumption relied on. I think, too, that the Court may, if it thinks it desirable, refer to appropriate standard works of reference in order to inform itself of matters of the kind mentioned of, which it may personally be unaware. But if, after such reference, the Court is still ignorant of the nature of the instrument in question, no help can be got from the presumption relied on. Apparently the learned magistrate did not know, and I myself do not know, what a loadometer is. I may guess from the derivation of the name what the instrument is, but my guess is not evidence.2

1Taylor on Evidence (10th edn), s 183, where the author wrote: ‘The working accuracy of scientific instruments is also presumed. For example, in the absence of evidence to the contrary, a jury would be advised to rely on the correctness of a watch or clock, which had been consulted to fix the time when a certain event happened; a thermometer would be regarded as a sufficiently safe indication of the heat of any liquid in which it had been immersed; a pedometer would afford prima facie evidence of the distance between two places which had been traversed by the wearer; and similar prima facie credit would be given to aneroids, anemometers, and other scientific instruments; and blood stains are every day detected by means of known chemical tests:’ (1930) VLR 370 at 373–374. This quote uses the term ‘correctness’; others seem to refer to ‘sufficient accuracy’. A measurement instrument for a continuous quantity has a degree of accuracy (how close the reading is to the real value) and a degree of precision (how tightly spaced the points are on its scale), but its reading will not usually be exactly ‘correct’. This may have a bearing on how digital devices are seen. A tiny amount of damage to the mechanical mechanism of a scale might cause it to be slightly off the exact reading of weight, but a tiny mistake in software may change the response to some specific inputs substantially. I owe this insight to Professor Strigini.

2(1930) VLR 370 at 374.
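Professor Strigini’s observation in the note above can be illustrated with a short, entirely hypothetical sketch in C: a worn mechanical scale tends to be slightly wrong for every reading, whereas a one-character slip in software may leave most readings untouched while misreporting particular inputs badly. The numbers and the fault below are invented for illustration only.

    #include <stdio.h>

    /* Hypothetical contrast between a mechanical defect and a software defect. */

    static double worn_scale(double true_kg)
    {
        return true_kg - 0.03;       /* small, systematic mechanical offset */
    }

    static double buggy_scale(double true_kg)
    {
        if (true_kg > 100.0)         /* intended threshold was 1000.0 */
            return true_kg / 10.0;   /* wrong branch: some readings wildly low */
        return true_kg;              /* most readings are exactly right */
    }

    int main(void)
    {
        double samples[] = { 5.0, 50.0, 150.0 };
        for (int i = 0; i < 3; i++)
            printf("true %6.1f kg   worn %6.2f kg   buggy %6.2f kg\n",
                   samples[i], worn_scale(samples[i]), buggy_scale(samples[i]));
        return 0;
    }

The mechanical error is small and is detectable by comparison against a known weight; the software error is invisible until the particular triggering input is presented.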

5.22 Herring CJ made comments similar to Lowe J’s in the Victoria case of Porter v Koladzeij.1 This case involved the review of the refusal of a Stipendiary Magistrate to admit evidence of an analogue device to measure the amount of alcohol in a sample of breath. The judge observed that certain instruments of a scientific or technical nature fell into a ‘notorious’ class of instruments that, by general experience, are known to be trustworthy.2 He placed a speedometer into this class. However, the evidence from the device to measure breath alcohol was rejected because it was not a standard device, and because the evidence given by the witness regarding the device was not adequate. The judge said that once breath analysis devices were used more often, they would become standard, and then judicial notice would be taken of their existence as scientific or technical instruments,3 although it was necessary to present relevant evidence to the court:

Where, however, the instrument in question does not fall within the notorious class, then his Honour made it clear that evidence must be given to establish that it is a scientific or technical instrument of such a kind, as may be expected to be trustworthy, before the presumption can be relied upon.4

1(1962) VR 75.

2Falling back on ‘general experience’ is dubious, because few people check the correctness of the instruments they might use. People routinely use imprecise instruments such as house thermometers and speedometers, and seldom have occasion to question the readings. Relying on the reading does not make the reading accurate.

3The Supreme Court in South Australia refused to take judicial notice of the accuracy of the breathalyser in 2012: Police v Bleeze [2012] SASCF 54 at [88] and [89].

4(1962) VR 75 at 78.

5.23 The failure to obtain such evidence can lead to scenarios such as that described by Thomas E. Workman below:

In Florida, one citizen was tested 13 times on one machine, by one officer, in one hour. These instances occur because in some situations, a machine that registers an error or multiple errors may finally produce a value that has the appearance of being a valid test. The Courts are usually unaware of the history of failures on the machine, and believe that the result is legitimate, when in fact [it] may not be.1

1Thomas E. Workman, Jr, ‘Massachusetts breath testing for alcohol: a computer science perspective’ (2008) 8 J High Tech L. 209, 217.

5.24 In this context,1 it is relevant to consider the decision of the Supreme Court of New Jersey in the United States, which ordered the software of a breath-testing device to be reviewed in detail in the case of State of New Jersey v Chun.2 In her judgment, Hoens J began by stating: ‘For decades, this Court has recognized that certain breath testing devices, commonly known as breathalyzers, are scientifically reliable and accurate instruments for determining blood alcohol concentration.’ This comment was based on the old technology. With the introduction of a new device, the Alcotest 7110 MKIII-C, which was selected by the department of the Attorney General, the court agreed to test the scientific validity of the machine. After extensive testing, the court concluded that the Alcotest, utilizing New Jersey Firmware version 3.11, ‘is generally scientifically reliable’, but modifications were required to enable its results to be admitted into legal proceedings.3 The testing of the software revealed the following issues, among others:4

1. That a mathematical algorithm that corrected for fuel-cell drift did not undermine the reliability of the results, but it was recommended that the machines be recalibrated every six months to ensure fuel cells are replaced regularly.

2. That a specific buffer overflow error should be corrected (a generic illustration of this class of defect is sketched below).

3. That a specific number of documents be produced for the purposes of foundation of evidence, as recommended by the court.

4. That the recommendations by the defendants’ experts for reorganizing and simplifying the source code be considered for implementation.

1These devices are also discussed, in the context of England and Wales, under the heading ‘The statutory presumption’ below.

2194 N.J. 54, 943 A.2d 114.

3943 A.2d 114 at 120.

4943 A.2d 114 at 134.
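The ‘buffer overflow’ mentioned in item 2 of the list above is a well-known class of programming defect. The Alcotest source code has not been published, so the following C fragment is a generic, hypothetical illustration of the class of error rather than the defect the court considered: writing one element past the end of an array is undefined behaviour and may silently corrupt whatever happens to sit next to it in memory.

    #include <stdio.h>

    /* Generic illustration of a buffer overflow: the loop bound is off by
       one, so the final write lands outside readings[]. The behaviour is
       undefined; in practice it often silently corrupts an adjacent value
       rather than producing any visible error. */
    int main(void)
    {
        double calibration = 1.00;     /* may sit adjacent to readings[] in memory */
        double readings[3];

        for (int i = 0; i <= 3; i++)   /* bug: the bound should be i < 3 */
            readings[i] = 0.08;

        printf("calibration factor is now %.2f\n", calibration);
        return 0;
    }

The evidential significance of such a defect is that it produces no error message: the device continues to report results that appear to be valid.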

5.25 The analysis of the source code indicated that there was a fault when a third breath sample was taken that could cause the reading to be incorrect, and the court saw fit to order a change in one of the formulae used in the software. Save that the extensive analysis of the device and the source code took some time and some expense, little of substance was found to be wrong with the machine. However, two significant points arise from this case: the first is that the software that controlled the device, written by a human, was defective, which in turn meant that the data relied upon for the truth of the statement was defective, thereby affecting the accuracy and truthfulness of the evidence; the second is that the court’s decision to intervene by ordering certain changes and modifications to be carried out, one of which was a change in a formula, meant that part of the evidence used against drivers in the future would be a set of instructions provided by the Supreme Court of New Jersey.1

1There is a considerable body of case law relating to challenges of breathalyser devices in the US. Some of the articles that discuss the position are (in addition to those already cited): Charles Short, ‘Guilt by machine: the problem of source code discovery in Florida DUI prosecutions’ (2009) 61 Fla L Rev 177; Cheyenne L. Palmer, ‘DUIs and apple pie: a survey of American jurisprudence in DUI prosecutions’ (2010) 13 UDC L Rev 407; Aurora J. Wilson, ‘Discovery of breathalyzer source code in DUI prosecutions’ (2011) 7 Wash JL Tech & Arts 121; Kathleen E. Watson, ‘COBRA data and the right to confront technology against you’ (2015) 42 N Ky L Rev 375.

5.26 However, it is not necessary to rely on a presumption that an instrument is accurate or reliable in lieu of other evidence that the data produced by the instrument is accurate.1 For instance, a satellite navigation system was the subject of discussion in Chiou Yaou Fa v Thomas Morris2 before the Supreme Court of the Northern Territory of Australia. In this case, the commander of the vessel established his position by using the satellite navigation system, radar and sextant. The court accepted the evidence that a variety of methods were used to establish the position at sea, including the expertise of qualified navigators. Even though the court heard their testimony as to the accuracy of the satellite navigation system, it concluded that it was not necessary to determine whether the satellite navigation system fell within the ‘notorious’ class, and therefore to rely upon it, and accepted the radar and sextant evidence in its place.3

1In R. v Ranger 2010 CarswellOnt 8572, 2010 ONCA 759, [2010] OJ No 4840, 91 WCB (2d) 271, the Ontario Court of Appeal held at [16]: ‘it is now notorious that cell phone users engaged in a cell phone call and travelling from point A to point B will find their cell phone signal passes from one cell phone tower to another at different locations along the route from point A to point B’, which led the court to consider that the trial judge did not err ‘in taking judicial notice that a particular cell phone was in a general location based on the tower that received the signal and that the path along which the cell phone was moving could be determined by reference to the cell phone towers that received the signal transmission in respect of particular calls’.

2[1987] NTSC 20; 46 NTR 1; 87 FLR 36; 27 A Crim R 342 (8 May 1987).

3Here are a selection of cases dealing with aerial photography, infra-red rays and images from satellites. International Court of Justice: Land and Maritime Boundary between Cameroon and Nigeria, ICJ Reports 1991, 31; Kasikili/Sedudu Island (Botswana/Namibia), ICJ Reports 1999, 1045; Maritime Delimitation and Territorial Questions between Qatar and Bahrain, ICJ Reports, 2001, Judgment (Merits), 16 March 2001; Survey of Recent Court Cases that Consider Remote Sensing Data as Evidence – Case Concerning Frontier Dispute, ICJ Reports 1986, 554. Permanent Court of Arbitration: Eritrea/Yemen, Award 9 October 1998; Award 17 December 1999. Australia: Witheyman v Simpson [2009] QCA 388; McKay v Doonan [2005] QDC 311; Maple Holdings Limited v State of Queensland [2001] QPEC 056. England and Wales: Associated British Ports v Hydro Soil Services NV [2006] EWHC 1187 (TCC), [2006] 6 WLUK 575. Singapore: Virtual Map (Singapore) v Singapore Land Authority [2008] SGHC 42. USA: St. Martin v Mobil Exploration & Producing U.S. Inc., 224 F.3d 402 (5th Cir. 2000), 31 Envtl. L. Rep. 20, 01155 Fed. R. Evid. Serv. 270 (aerial photography); Connecticut v Wright, 58 Conn.App. 136, 752 A.2d 1147 (Conn.App. 2000) (computer-generated engineering map); Wetsel-Oviatti Lumber Co. Inc., v United States, 40 Fed.Cl. 557 (1998) (aerial photography); United States v Kilgus, 571 F.2d 508 (9th Cir. 1978) (infra-red rays); Pittson Co. v Allianz Insurance Co., 905 F.Supp. 1279 (D.N.J. 1995) rev’d in part on other grounds, 124 F.3d 508 (3d Cir. 1997) (aerial photography); Ponca Tribe of Indians of Oklahoma v Continental Carbon Co., 2008 WL 7211981 (digital orthophoto); Gasser v United States, 14 Cl.Ct. 476 (1988) (aerial and satellite photographs); I & M Rail Link v Northstar Navigation, 21 F.Supp. 849 (N.D.Ill. 1998) (satellite photography); Wojciechowicz v United States, 576 F.Supp.2d 214 (D.Puerto Rico 2008) (satellite photography); Lisker v Knowles, 651 F.Supp.2d 1097 (C.D. Cal. 2009) (satellite photography); United States v Fullwood, 342 F.3d 409 (5th Cir. 2003) (satellite photography); Fry v King, 192 Ohio App.3d 692, 950 N.E.2d 229 (Ohio App. 2 Dist. 2011), 2011 WL 766583 (satellite photography); State v Reed, 2009 WL 2991548 (Google Earth evidence rejected); State of New Jersey in the Interests of J. B. A Minor, 2010 WL 3836755 (Google Earth evidence admitted); Swayden v Ricke, 242 P.3d 1281 (2010), 2010 WL 4977158 (Google Earth images and photographs from ‘trail cameras’); Banks v U.S., 94 Fed.Cl. 68 (2010) (satellite photography).

Common knowledge

5.27 Another justification for accepting that a mechanical instrument is in order when it is used is the assertion that it is a type of instrument that is commonly held to be – more often than not – in ‘working order’. In discussing mechanical instruments, it does not appear that lawyers or judges have ever concerned themselves with how the instrument has been maintained, or have considered the maintenance history of the instrument. In a case before the full court of the Supreme Court of Western Australia, Zappia v Webb,1 the question was whether an amphometer, used to determine the speed of a vehicle, could be considered an accepted scientific instrument. Jackson CJ discussed this as follows:

It is, however, common knowledge that amphometers have been widely used in this State for a number of years for the purpose of checking the speed of motor vehicles.2 As one drives through the country, it is common-place to see large notices by the side of the road warning motorists that amphometers are used in the district, and it is not at all uncommon to see a traffic inspector by the side of the road with his amphometer equipment set up. It is also, I believe, generally accepted in the community that an amphometer correctly set up and operated will give a reliable reading of speed, not necessarily precise, but sufficiently accurate for its purpose. There has not been, so far as I am aware, any general complaint about the use or efficiency of these machines, and there must be hundreds of speeding convictions each year resulting from their use.

It seems to me, therefore, that an amphometer is now a well known and accepted speed checking device and that judicial notice should be taken in this State of its use and effectiveness, in general terms.3

1(1974) WAR 15; (1973) 29 LGRA 438.

2Lie detectors are widely used and have been scientifically shown to be useless, for which see Shane O’Mara, Why Torture Doesn’t Work: The Neuroscience of Interrogation (Harvard University Press 2015), chapter 3 ‘Can we use technology to detect deception?’; George W. Maschke and Gino J. Scalabrini, The Lie Behind the Lie Detector (5th edn, AntiPolygraph.org 2018), https://antipolygraph.org/pubs.shtml.

3(1973) 29 LGRA 438 at 440–441.

5.28 The Chief Justice referred to the ‘common knowledge’ of the use of amphometers without citing any evidence to demonstrate that they were reliable. He also asserted that it was somehow generally accepted that the device would give a reliable reading of speed (without discussing whether the amphometer was calibrated, and if so, to what standard) and concluded that, because he was not aware of any complaints about the devices, they were to be considered accepted speed-checking devices.

5.29 In Castle v Cross,1 the prosecution relied on the presumption that mechanical instruments were in order when they were used. In the judgment, Stephen Brown LJ cited a passage from Cross on Evidence (1979)2 regarding this presumption:

A presumption which serves the same purpose of saving the time and expense of calling evidence as that served by the maxim omnia praesumuntur rite esse acta is the presumption that mechanical instruments were in order when they were used. In the absence of evidence to the contrary, the courts will presume that stopwatches and speedometers and traffic lights were in order at the material time; but the instrument must be one of a kind which it is common knowledge that they are more often than not in working order.3

1[1984] 1 WLR 1372, [1985] 1 All ER 87, [1984] 7 WLUK 180, [1985] RTR 62, [1984] Crim LR 682, (1984) 81 LSG 2596, (1984) 128 SJ 855, [1985] CLY 3048.

2Page 47 of the fifth edition.

3[1984] 1 WLR 1372 at 1376H–1377A.

5.30 The Latin tag omnia praesumuntur rite esse acta means ‘all acts are presumed to have been done rightly and regularly’ or ‘all things are presumed to have been done regularly and with due formality until the contrary is proved’. Such a presumption cannot operate in a vacuum, as indicated by Stephen Brown LJ’s preference for the above formulation in Cross on Evidence, which requires the basic fact – proof that the instrument is one of a kind of which it is common knowledge that they are more often than not in working order – to be established before the presumption can operate, unlike the formulation of the presumption in Phipson on Evidence, which did not adopt the basic fact.1

1[1984] 1 WLR 1372 at 1377.

5.31 In this case, counsel for the Crown argued that the device in question, a Lion Intoximeter 3000, was a sophisticated machine that depended in part on software code, but that this did not set it in a different class from other sophisticated mechanical devices and instruments. The presumption stood unchallenged, because the defence ‘argued forcefully that the potential for computer error renders the consideration of evidence stemming from a computer particularly sensitive and places it into a separate class in relation to its admissibility’.1 It is unclear from the judgment of Stephen Brown LJ whether His Lordship relied on the presumption in admitting the printout from the Lion Intoximeter 3000, because the central issue in this case appears to have been the admissibility of the printout as real evidence.

1[1984] 1 WLR 1372 at 1379D.

5.32 The case of Anderton v Waring1 also concerned the reading from a Lion Intoximeter 3000. In giving the judgment of the court, May LJ stated that the ‘Intoximeter ought to have been assumed by the justices to have been in good working order unless the contrary was proved’.2 Counsel for the prosecution cited from the fourth edition of Cross on Evidence:3 ‘In the absence of evidence to the contrary, the courts will presume that [mechanical instruments] were in order at the material time’.4 However, the barrister omitted to continue and cite the basic fact that ‘the instrument must be one of a kind as to which it is common knowledge that they are more often than not in working order’.5 This has to be a misapplication of the presumption, because a presumption cannot operate in a vacuum without the basic fact or facts. Moreover, the manufacturers of intoximeters (like the makers of almost all software) refuse to share their code, so there is no way to establish any such basic fact or facts – and the US cases (discussed above) illustrate that such devices are not reliable.

1[1985] 2 WLUK 274, [1986] RTR 74, (1985) 82 LSG 1417, Times, 11 March 1985, [1986] CLY 2883.

2[1986] RTR 74 at 80F.

3Page 47.

4[1986] RTR 74 at 79E.

5Cross on Evidence (6th edn, 1985), 28; Professor Tapper mentioned this omission in Colin Tapper, ‘Reform of the law of evidence in relation to the output from computers’ (1995) 3(1) Intl J L & Info Tech 79, 89.

5.33 A more recent reformulation of the presumption has been articulated by Kerr LCJ, as he then was, when he rejected the suggestion that the machine in question ought to be commonly known to be – more often than not – in working order. In Public Prosecution Service v McGowan,1 Kerr LCJ said:

In so far as the passage from Cross and Tapper suggests that for the presumption to operate it will always be necessary that the machine was commonly known to be more often than not in working order, we would not accept it. We consider that the presumption must be that machines such as a cash register are operating properly and in working order in the absence of evidence to the contrary. The presumption of the correct operation of equipment and proper setting is a common law presumption recognised by article 33(2) [Criminal Justice (Evidence) (Northern Ireland) Order 2004]. In the modern world the presumption of equipment being properly constructed and operating correctly must be strong.2

1[2008] NICA 13, [2009] NI 1.

2[2009] NI 1 at [20].

5.34 Kerr LCJ’s deviation from the formulation of the presumption, which requires proof of the basic fact, is unwarranted. Furthermore, Kerr LCJ’s formulation of the presumption without the basic fact leads to the extraordinarily broad assumption that all devices and machines are operating properly and in working order, an assumption for which His Lordship did not cite any relevant evidence in support. In particular, there was nothing in the judgment to indicate what he understood by ‘equipment’, or how the equipment was ‘properly constructed’, nor did he provide any evidence as to what he meant by ‘operating correctly’ or ‘proper setting’.1

1The assumption of correctness would be verified by recording the performance of the machine, just as when one constructs a quality control chart. I owe this observation to Professor Martin Newby.

Evidential foundations of the presumption

5.35 It is suggested that the correct articulation of the presumption for mechanical instruments is as follows:1

For a mechanical instrument (including stand-alone computers, computer-like devices and digital systems) to benefit from the evidential presumption that it was in working order at the material time, it is necessary for the party seeking to benefit from the presumption to adduce evidence of how the instrument in question works, together with change logs and release notices, changes to the device or system (software, physical and organizational), transaction and event logs, and sworn evidence that (i) the records disclosed are complete records of all the known defects in the device or system, and (ii) that members of staff with access to the device or system have not modified system data in the relevant period.

1For a more detailed set of recommendations, see Paul Marshall, James Christie, Peter Bernard Ladkin, Bev Littlewood, Stephen Mason, Martin Newby, Dr Jonathan Rogers, Harold Thimbleby and Martyn Thomas CBE, ‘Recommendations for the probity of computer evidence’ (2021) 18 Digital Evidence and Electronic Signature Law Review 18.

5.36 This formulation is consistent with Crawley v Laidlaw1 and Porter v Koladzeij2 in that if the presumption is to be recognized, it is necessary for the proponent to provide sufficient evidence – the basic fact – to merit the introduction of such a presumption. In this respect, it is pertinent to note the observation by Lord Griffiths in Cracknell v Willis3 that ‘“trial by machine” is an entirely novel concept and should be introduced with a degree of caution’.4 He went on to indicate that it would be unthinkable that somebody should be convicted by a machine that is not ‘reliable’, although he did not make it clear what he meant by ‘reliable’.

1(1930) VLR 370.

2(1962) VR 75.

3[1988] AC 450, [1987] 3 WLR 1082, [1987] 3 All ER 801, [1987] 11 WLUK 62, (1988) 86 Cr App R 196, [1988] RTR 1, (1987) 137 NLJ 1062, (1987) 131 SJ 1514, [1988] CLY 3122; work had already been undertaken before 1988: T. R. H. Sizer and A. Kelman (eds), Computer Generated Output as Admissible Evidence in Civil and Criminal Cases (Heyden & Son on behalf of the British Computer Society 1982); Alistair Kelman and Richard Sizer, The Computer in Court (Gower 1982).

4[1988] 1 AC 450 at 459.

5.37 Conversely, in DPP v McKeown (Sharon), DPP v Jones (Christopher)1 Lord Hoffmann voiced the opinion in 1997 that ‘It is notorious that one needs no expertise in electronics to be able to know whether a computer is working properly’.2 This comment, akin to the ‘aura of infallibility’,3 is an extreme view that is contradicted by the evidence, and did not bear a great deal of scrutiny at the time the comment was made. The observation by Lloyd LJ in R v Governor Ex p Osman (No 1), sub nom Osman (No 1), Re4 is of a similar nature:

Where a lengthy computer printout contains no internal evidence of malfunction, and is retained, e.g. by a bank or a stockbroker as part of its records, it may be legitimate to infer that the computer which made the record was functioning correctly.5

1[1997] 1 WLR 295, [1997] 1 All ER 737, [1997] 2 WLUK 386, [1997] 2 Cr App R 155 (HL), (1997) 161 JP 356, [1997] RTR 162, [1997] Crim LR 522, (1997) 161 JPN 482, (1997) 147 NLJ 289, Times, 21 February 1997, Independent, 7 March 1997, [1997] CLY 1093; note the comment by Harvey J in the New Zealand case of R v Good [2005] DCR 804 at 65 ‘that computers are not recently invented devices, are in wide use and are fundamentally reliable’.

2[1997] 1 All ER 737 at 743b.

3D. W. Elliott, ‘Mechanical aids to evidence’ [1958] Crim LR 5, 7.

4[1990] 1 WLR 277, [1989] 3 All ER 701, [1988] 3 WLUK 391, (1990) 90 Cr App R 281, [1988] Crim LR 611, (1990) 87(7) LSG 32, (1990) 134 SJ 458, Times, 13 April 1988, Independent, 15 April 1988, Guardian, 19 April 1988, Daily Telegraph, 21 April 1988, [1990] CLY 1175.

5[1990] 1 WLR 277 at 306H.

5.38 The judge did not indicate what evidence was before him to demonstrate that there was no ‘internal evidence of malfunction’. Just because a bank or a stockbroker will rely on computer data as part of its records, it does not follow that a judge should accept that such records are what a party asserts they are. Indeed, Professor Seng observed that such comments made by judges are ‘extravagant judicial statements … [that] are incomplete and are actually misleading because accurate computer output depends not just on the proper operation of computers, but also proper human use (or abuse) of computers’.1 There is a significant difference between functioning ‘correctly’ – meaning, as intended – and being correct, namely that the intentions of the programmers were correct and free of any errors.

1Daniel K. B. Seng, ‘Computer output as evidence’ [1997] SJLS 130, 167.

5.39 The ‘instrument in working order’ presumption relies on the assumption that transitions between ‘being in working order’ and ‘not being in working order’ are reasonably rare.1 In other words, the instrument cannot capriciously alternate between giving correct readings and incorrect readings, with arbitrary lengths of the sequences of correct and of incorrect readings. With software, such arbitrary sequences happen rapidly and often. Although there is generally a reason for these sequences – something in the exact values and timings of the sequences of inputs determines which outputs will be correct and which ones will be wrong, given the defects in the software – identifying the law that governs them and the software defects causing a problem may be impossibly time-consuming, even for well-equipped experts.

1This moves into the confusing area of inference, implication and causality. The arguments are based on a conditional probability. The court wants to know that the conditional probability that the device is working correctly is high enough given the evidence about its provenance and circumstances. Very few people understand conditional probability. I owe this observation to Professor Martin Newby.
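By way of illustration of the conditional probability mentioned in the footnote above, the following sketch applies Bayes’ theorem to the question of whether a device was in working order, given that it passed a routine check. Every figure, and the notion of a ‘routine check’, is an assumption invented for the example; none of it is drawn from the cases or devices discussed in this chapter.

```python
# A hypothetical Bayes calculation: all figures below are invented for
# illustration and do not come from any case or device discussed in the text.
# Question: how strongly does 'the device passed a routine check' support
# 'the device was in working order at the material time'?

p_working = 0.95               # assumed prior probability the device is in working order
p_pass_given_working = 0.99    # assumed probability a working device passes the check
p_pass_given_faulty = 0.30     # assumed probability a faulty device still passes

# Total probability of passing the check.
p_pass = (p_pass_given_working * p_working
          + p_pass_given_faulty * (1 - p_working))

# Bayes' theorem: probability the device was working, given that it passed.
p_working_given_pass = p_pass_given_working * p_working / p_pass

print(round(p_working_given_pass, 3))  # ~0.984 on these assumed figures
```

Even on these deliberately generous assumed figures the result falls short of certainty, and it falls further if the prior or the discriminating power of the check is less favourable.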

How judges assess the evidence of devices controlled by software

5.40 When discussing the admission of evidence from devices controlled by software code, judges do not distinguish between a single, highly specialist device that is self-contained and a linked network containing any number of devices each independently operating on its own set of software code. As noted above, when considering cases dealing with specialized devices such as breath-testing machines and blood-testing machines, judges have used nebulous terms in the absence of scientific analysis, such as ‘notoriety’, ‘common knowledge’ and ‘properly constructed’. There is little evidence to demonstrate that proper evidential foundations have been adduced to permit such presumptions to be admitted. In this regard, it is useful to consider, although not exclusively, the case law in Australia, where these devices have been subjected to stricter judicial analysis.

5.41 The South Australian case of Mehesz v Redman1 concerned the method of analysing a blood sample. At trial, the Special Magistrate categorized the blood sample-testing device as a scientific instrument in the category of ‘notorious’ instruments whose accuracy is presumed. On appeal, Zelling J rejected this on the basis that the device was not a mere calculator: the interpretation of the data was the result of its software program. There was no evidence to demonstrate that the machine was accurate or reliable. The appellant was tried a second time, convicted again and appealed to the Supreme Court once more. This appeal was referred to the full court.2 The main argument of counsel for the appellant related to the evidence tendered by the prosecution regarding the analysis of a blood sample, in that the evidence relied on the use of two instruments (a gas chromatograph and the ‘Auto-lab system 4B’ data analyser) whose accuracy had not been established. King CJ rejected the submission that the Auto-lab could not be relied upon because there was no evidence as to the ‘correctness’ of its software program. He said:

The courts do not require such evidence. If the instrument is so well known that its accuracy may be assumed as a matter of common experience, the Court is entitled to presume its accuracy without evidence.3

1(1979) 21 SASR 569.

2Mehesz v Redman (no 2) (1980) 26 SASR 244.

3(1980) 26 SASR 244 at 247.

5.42 Proof of the accuracy of a particular instrument will ‘ordinarily be proved by those who use and test it’, and the results obtained are acceptable in evidence ‘provided that the expert witness has himself formed an opinion that the methods used are apt to produce the correct result’.1 Notwithstanding the inability of the operator of a machine controlled by software code to demonstrate the accuracy or otherwise of the code that he does not control and has no ability to alter, this proviso is important. (White J also made a similar point.2) This means that the operator of such a machine ought to be able to assess when the machine produces results that are not expected, even if the operator is not able to establish why those results are wrong. If a machine produces results that are not anticipated, the operator is put on notice that the machine (and the software code) might not be reliable. In such circumstances, it will be necessary to have the machine tested before it is relied upon for future analysis.

1(1980) 26 SASR 244, King CJ at 248.

2(1980) 26 SASR 244 at 254.

5.43 Dealing with the submission that the prosecution failed to provide proper foundations for the Auto-lab analyser, White J set out the conditions that must be fulfilled before evidence will be admitted regarding the measurements of scientific instruments:

1. If the instrument falls within the class of instrument known as notorious scientific instruments, the court will take judicial notice of its capacity for accuracy, so that the operator merely proves that he handled it properly on the particular occasion.

2. If the instrument is not a notorious scientific instrument, its accuracy can be established by evidence: (a) that the instrument is within a class of instrument generally accepted by experts as accurate for its particular purpose; (b) that the instrument, if handled properly, does produce accurate results: ((a) and (b) must be established by expert testimony, that is, by experts with sufficient knowledge of that kind of instrument; and upon proof of (a) and (b), a latent presumption of accuracy arises which allows the court to infer accuracy on the particular occasion if it is proved) – (c) that the particular instrument was handled properly and read accurately by the operator on the particular occasion; ((c) can be established by a trained competent person familiar with the operation of the instrument, not necessarily the type of expert who proves (a) and (b)).

3. Where the actual accuracy of the measurement can be inferred from all of the proved circumstances, it is not necessary to rely upon the presumption arising from (a) and (b), proof of which is superfluous.1

1(1980) 26 SASR 244 at 251–252, original emphasis.

5.44 At the second trial, the prosecution called evidence from Professor Northcote, Chairman of the School of Mathematics and Computers at the Institute of Technology in South Australia, and an expert in mathematics, physics and computers. He gave evidence about the workings of the Auto-lab from his reading of the manufacturer’s manual and his understanding of the content of the manual. He was not able to read the software code, because the manufacturer had sealed the program against inspection, tampering and modification. Although Professor Northcote was not an expert in relation to the Auto-lab, the members of the Court of Appeal in the Supreme Court were of the opinion that both Professor Northcote and Mr Vozzo, who gave evidence at both trials, were sufficiently qualified to give evidence, even though neither witness had access to, nor any knowledge of, the software code. The Chief Justice also stated: ‘It is sufficient that the expert who uses it is able to say that it is an instrument which is accepted and used by competent persons as a reliable aid to the carrying out of the scientific procedures in question and that he so regards it.’1 He also prayed in aid the observations of Wigmore on Evidence2 to support this comment:

(2) Scientific instruments, formulas, etc. The use of scientific instruments, apparatus, formulas, and calculating-tables, involves to some extent a dependence on the statements of other persons, even of anonymous observers. Yet it is not feasible for the professional man to test every instrument himself; furthermore he finds that practically the standard methods are sufficiently to be trusted. Thus, the use of a vacuum-ray machine may give correct knowledge, though the user may neither have seen the object with his own eyes nor have made the calculations and adjustments on which the machine’s trustworthiness depends. The adequacy of knowledge thus gained is recognized for a variety of standard instruments.3

1(1980) 26 SASR 244 at 247.

2(3rd edn), Volume 2, paragraph 665a.

3(1980) 26 SASR 244 at 247, original emphasis.

5.45 In this case, the court emphasized that there was evidence other than the trustworthiness of the software code that enabled the evidence from the machine to be admitted as being accurate. White J set out the following analysis of the problem:

The only defect in the expert evidence of Dr. Northcote and Mr. Vozzo, if defect it be, was their lack of direct knowledge of the internal operations of the sealed instrument. They relied upon what the manufacturer said about its operation. The extreme position would be that only the expert actually supervising the manufacture of the instrument in the United States of America could prove (a) and (b). I do not think that the rules relating to expert evidence encourage that kind of extreme position. Quite apart from questions of expense and delay in the administration of justice, the Court is entitled to rely upon evidence of measurements made by instruments which reputable scientists accept as accurate, whether those scientists have direct knowledge of the reasons for the instrument’s accuracy or not, provided they have knowledge that the instrument’s measurements are accurate according to a known standard, or are accepted as accurate by reputable scientists.1

1(1980) 26 SASR 244 at 253. Most of these arguments fall short when applied to a large-scale system. In all of the examples where the subject of discussion is notionally a scientific instrument, there is always the possibility of treating it as a black box and testing its calibration with standard inputs, just like the weights and measures inspector turning up with a box of standard weights. These instruments essentially have a single input and output, and could be fully characterized experimentally. As soon as this very simple conceptual model does not fit, many other considerations come into play. Primarily that there is no longer the possibility of exhaustively examining all circumstances and factors determining behaviour. I owe this observation to Professor Martin Newby.

5.46 By implication, the court concluded that it would be an extreme position to require the reliability of a software-controlled device to be established in a court of law by analysing the software code – the very software code that controlled the device and provided the evidence. The court considered that evidence from the operator of the device was sufficient for the trial court to assess the accuracy of the evidence. The appeal was dismissed.

5.47 Given these comments, it is understandable that the court reached the conclusions it did in Mehesz v Redman (no 2). At issue was a self-contained device that was used by trained operators with suitable qualifications. If the readings from such a device were at any time not within the expected range, the suitably trained and qualified operators were expected to use their professional judgement to verify the reliability of the device before submitting the evidence for legal proceedings. In such a case, the court would not require the software code to be challenged.

5.48 The case of Bevan v The State of Western Australia1 illustrates the approach taken when considering the admission of evidence from computers and computer-like devices. One of the grounds of appeal in this case was the admissibility of mobile telephone data in the form of text messages downloaded by a computer software program. An investigating police officer carried out two separate downloading operations using two separate tools, Cellebrite and XRY. At the beginning of the trial, counsel for the accused objected to the text messages being received into evidence. The trial judge held that the text messages were admissible. Questions were raised as to the reliability of the software and of the officer’s correct use of it. The Court of Appeal concluded that the trial judge erred in law in admitting the text messages into evidence. This was because the officer did not explain the process of how he downloaded it in any detail at trial: it was the first time he had used the relevant software, and he did not have any formal training in its use. When considering the rebuttable presumption at common law as to the accuracy of ‘notorious’ scientific or technical instruments, Blaxell J said that ‘when evidence from a new type of scientific instrument or process is adduced for the first time, there must be proof of its reliability and accuracy’.2 He went on to say that:

When specific evidence of the accuracy of a new instrument is required, this need not come from the manufacturer. It is sufficient that the expert who uses it can say that it is an instrument which is accepted and used by competent persons as a reliable aid in the carrying out of the scientific procedure in question, and that he so regards it.3

1[2010] WASCA 101.

2[2010] WASCA 101 at [30].

3[2010] WASCA 101 at [31].

5.49 Blaxell J approved of the observations by White J1 in Mehesz v Redman (no 2) as noted above. He continued:

To the above principles I add the obvious comment that a court will not be satisfied that an instrument was ‘handled properly’ on a particular occasion, if it does not understand what was required of the operator for this to be so. Detailed evidence as to the workings of the instrument need not be given … However, it is necessary that there be sufficient evidence for the court to apprehend what it was that the operator had to do in order to ensure an accurate result.2

1Mehesz v Redman (no 2) (1980) 26 SASR 244 at [251]–[252].

2Bevan v The State of Western Australia [2010] WASCA 101 at [33].

5.50 In essence, Blaxell J is saying that if the user of a smartphone can give evidence to demonstrate that he can use the smartphone, it follows that he is sufficiently knowledgeable to give evidence indirectly that the software code that controls the device is ‘working properly’, ‘reliable’ or ‘accurate’. It is as if the software programs that form the device are irrelevant. Additionally, no attempt was made to define how software code can be determined to be ‘working properly’, ‘reliable’ or ‘accurate’.

5.51 In Bevan v The State of Western Australia, the Court of Appeal heard a second appeal in the same case after a re-trial.1 The same argument arose regarding the method of downloading the data from the mobile telephone. There was a trial within a trial concerning the evidence of Detective Tomlinson. (Buss J referred to him as a First Class Constable, and set out his qualifications.2) Counsel for the appellant conceded that the witness was qualified to operate the equipment used to perform the download, but argued that he was not qualified to give evidence about the accuracy of the download material and the reliability of the material itself. In cross-examination, Detective Tomlinson explained he did not hold a certificate in relation to the Cellebrite and XRY software packages, but that he had been shown how to use them on about ten occasions. The following exchange took place regarding how the software worked:

Q. Can you tell me how the Cellebrite package actually works.

A. I don’t understand the question.

Q. How does it work? Explain to me, a layman, who knows nothing about Cellebrite, how it works.

A. It extracts data from a telephone.

Q. How? How does it do that?

A. It uses software.

Q. And how does that software work?

A. I couldn’t tell you.

Q. What about the XRY?

A. The same.

Q. If you don’t know how it works, how can you say its [sic] reliable?

A. You’d have to ask the manufacturer.

Q. Okay. I’m asking you. How can you say its [sic] reliable.

A. I can’t.

Q. You can’t. And, in fact, on one occasion that you used it in relation to the Nokia, it was unsuccessful.

A. Yes, that’s right.3

1[2012] WASCA 153.

2[2012] WASCA 153 at [18]–[21] and [105].

3[2012] WASCA 153 at [20]; the last question and answer is at [106(g)].

5.52 In deciding to allow the evidence before the members of the jury, the trial judge said:

The workings of the instrument need not be given and it seems to me that in this case the notes of the experienced officer, the evidence that this software is regularly used by him establishes the level of accuracy and in his notes at the time that he was – successfully used the program seems to me to meet the tests ... He was a trained, experienced and competent operator and the software was operated properly and, in those circumstances, in this case I think this evidence is admissible and I will allow it to be given by the qualified expert.1

1[2012] WASCA 153 at [21].

5.53 Pullin and Mazza JJA agreed that the trial judge did not err in overruling the objection to the tendering of the text messages. In essence, because Detective Tomlinson was qualified as an expert, he could testify about the performance of the machines and the software. It was inferred that as an expert (in the opinion of the court), he considered the process to be accurate, and that because he had performed such actions previously, the actions undertaken on this particular occasion were properly performed – even though the user of the program would not know if it was giving inaccurate results.1 There was no requirement for the Detective to understand how the software worked, or whether there were any problems with the software he used.2 Pullin JA said: ‘His evidence provided sufficient assurance that the results produced by the machines were reliable and accurate, because he (a trained operator of the machines) observed them to be so.’3 But it does not follow that any operator of an electronic device will be able to detect if the device was malfunctioning in any way. As noted by Eric Van Buskirk and Vincent T. Liu:

There is a general tendency among courts to presume – without the benefit of meaningful assurance – that forensic software can be trusted to yield accurate digital evidence. As a judicial construct, this presumption is unjustified in that it is not tailored to separate accurate results from inaccurate ones.4

1As in the case of the death of Casey Marie Anthony in 2011, for which see Craig Wilson, ‘Digital evidence discrepancies – Casey Anthony trial, 11 July 2011’, http://www.digital-detective.net/digital-evidence-discrepancies-casey-anthony-trial/; Tony Pipitone, ‘Cops, prosecutors botched Casey Anthony evidence’, Clickorlando.com, 28 November 2012, http://www.clickorlando.com/news/cops-prosecutors-botched-casey-anthony-evidence; Jose Baez and Peter Golenbock, Presumed Guilty: Casey Anthony: The Inside Story (BenBella Books, updated edition 2013), 46, 180–183, 211, 346–348, 365, 368–371, 400, 426–428; Jeff Ashton and Lisa Pulitzer, Imperfect Justice: Prosecuting Casey Anthony (William Morrow 2011), 105, 239, 277, 291–292, 298, 315.

2[2012] WASCA 153; the rationale is set out at [66] and [67].

3[2012] WASCA 153 at [67].

4Eric Van Buskirk and Vincent T. Liu, ‘Digital evidence: challenging the presumption of reliability’ (2006) (1) Journal of Digital Forensic Practice 19, 20, original emphasis.

5.54 In the abstract of the paper, they suggest that there are two approaches to resolving the problem:

One is through the proper application of scientific jurisprudence to questions of digital evidence and the other is through some combination of certain broad market and social corrections.

5.55 The important question is:

If the device was malfunctioning, how would the operator know?

5.56 More significantly, the question should be:

How would the malfunction manifest itself, if at all, and in a form evident to the operator?

5.57 In addition, it is necessary to allow for human factors, such as whether an operator focusing on getting a job done has the cognitive capacity to notice errors. It is well known that errors cause ‘interference’, which makes them very hard to recall even if they were noticed – that is, noticing and interpreting the error requires a different sort of thinking from doing the main task, so it interferes with that task and makes it harder to do either properly.

5.58 In the minority, Buss J considered that none of the relevant basic facts and circumstances were proven. The judge considered the applicable legal principles in detail. He cited the relevant case law, and also extracts from The Science of Judicial Proof (3rd edn, 1937, para 111) by Professor Wigmore:

Professor Wigmore enunciated three fundamental propositions applicable to evidence based on the use of a mechanical or scientific instrument constructed on knowledge of scientific laws:

1. The type of apparatus purporting to be constructed on scientific principles must be accepted as dependable for the proposed purpose by the profession concerned in that branch of science or its related art. This can be evidenced by qualified expert testimony; or, if notorious, it will be judicially noticed by the judge without evidence.

2. The particular apparatus used by the witness must be one constructed according to an accepted type and must be in good condition for accurate work. This may be evidenced by a qualified expert.

3. The witness using the apparatus as the source of his testimony must be one qualified for its use by training and experience (§220).1

1[2012] WASCA 153 at [111]–[129], original emphasis.

5.59 The judge continued:

Wigmore on Evidence (Chadbourn Rev, Vol III, 1970) §795 states the requirements for the admissibility of evidence based on the use of scientific instruments, as follows:

What is needed, then, in order to justify testimony based on such instruments, is preliminary professional testimony: (1) to the trustworthiness of the process or instrument in general (when not otherwise settled by judicial notice); (2) to the correctness of the particular instrument; such testimony being usually available from one and the same qualified person.1

1[2012] WASCA 153 at [112], original emphasis.

5.60 And logically, as Professor Thimbleby has indicated,1 (3) the appropriateness and correctness of the use of the instrument as used in the particular case.

1In reviewing this chapter for the fifth edition, for which my thanks.

5.61 Buss J rejected the evidence of the constable, partly because he was not qualified to comment on the software and because the ‘machines/software’ were not so well known that their accuracy may be assumed as a matter of common experience.1 Evidence was required to demonstrate their accuracy. It followed that the State had to produce evidence from a suitably qualified expert of the trustworthiness of the machines and software in general, and of the correctness of the particular instruments for the purposes of downloading data from mobile telephones.2 Arguably, had the State produced sufficient evidence to convince a judge of the accuracy of the machines and software, it would not have been necessary to rely on the presumption. Notwithstanding this observation, the approach of Buss J is to be preferred. His brother judges appear to accept the astonishing conclusion that having no knowledge of how a device works is irrelevant to the assessment of the results of the analysis. In their approach, the work of software programmers is immaterial. Software code is not germane when determining causation. If this approach were accepted, decisions in legal proceedings would no longer be based on relevant evidence.

1This is a criterion that ignores how often people trust something that is untrustworthy simply because they are never tempted to challenge its results and scrutinize them with sufficient rigour to be able to tell whether they are correct.

2[2012] WASCA 153 at [132]–[139].

5.62 Contrast this decision to a similar set of facts discussed by the United States Court of Appeals, First Circuit in the case of U.S. v Chiaradio.1 The Federal Bureau of Investigation (FBI) used a software tool called LimeWire, a commercially available peer-to-peer file sharing program that enables users to transmit files to and from other members of the LimeWire network. The FBI adapted this software for the purposes of investigations into abusive images of children. It was called ‘enhanced peer-to-peer software’ or EP2P. The software adapted by the FBI differed from LimeWire in three principal respects: (1) the software permitted downloading from only one source at a time, thus ensuring that the entire file was available on the computer of the accused; (2) in the commercially available version, LimeWire responds to a search term by displaying the names of the available files, file types, and the file sharers’ IP addresses, whereas EP2P displays the same data together with the identity of the Internet Service Provider (ISP) and the city and state associated with the IP address sharing a particular file; and (3) EP2P was modified so that an agent could easily compare the hash value of an available file with the hash values of confirmed videos and abusive images of children.

1684 F.3d 265 (1st Cir. 2012).
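The third modification – checking the hash value of a downloaded file against a list of hash values of previously confirmed files – can be sketched in a few lines. The judgment does not disclose the algorithm or code the FBI used, so the choice of SHA-256, the data and the set of ‘known’ hash values below are assumptions made purely for illustration:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes.
    (Real files would normally be read and hashed in chunks.)"""
    return hashlib.sha256(data).hexdigest()

# Hypothetical set of hash values of previously confirmed files.
known_hashes = {sha256_hex(b'previously confirmed file contents')}

# A newly downloaded file, represented here as bytes for simplicity.
downloaded = b'previously confirmed file contents'

if sha256_hex(downloaded) in known_hashes:
    print('hash matches a confirmed file')
else:
    print('no match')
```

A match between cryptographic hash values is strong evidence that two files have identical content; the reliability of the overall process nonetheless depends on the software that computes, stores and compares those values.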

5.63 The defence requested discovery of the source code at an evidentiary hearing before the District Court. The application was refused. The purpose of the request was to determine whether the reliability of the technology could be credibly challenged; the defence argued that the inability to examine the source code prevented the accused from mounting such a challenge. The District Court denied the motion to compel discovery of the source code and the Appeal Court agreed with the District Court. Agent P. Michael Gordon testified that the software had no error rate; he demonstrated how the results of an investigation could be independently verified, and that the software had never yielded a false positive. The court considered that this alone provided sufficient evidence of the reliability of the tool. The defence also cited the lack of a peer review, but the Appeal Court indicated that the Daubert1 factors were not a definitive checklist, and there was a sound explanation for the absence of peer review:

The record shows that the source code is purposely kept secret because the government reasonably fears that traders of child pornography (a notoriously computer-literate group) otherwise would be able to use the source code to develop ways either to evade apprehension or to mislead the authorities. This circumstance satisfactorily explains the absence of any peer review.2

1Daubert v Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), 113 S.Ct. 2786.

2684 F.3d 265 (1st Cir. 2012) at 278.

5.64 The evidence in this example enabled the court to resist the discovery of the source code on the basis that the software was proven to be ‘reliable’ in respect of the specific purposes for which it had been developed, although it is not clear what evidence of its correctness, if any, was offered.1 That no errors were found (for example, there were no false positives) does not mean that there are none.

1See People v Collins, 49 Misc.3d 595, 15 N.Y.S.3d 564 (N.Y. Sup. Ct. 2015), 2015 N.Y. Slip Op. 25227 (evidence based on a forensic statistical tool (FST) excluded on the basis that the device was not generally accepted in the DNA scientific community); a number of judges have declined to follow this decision – one of the most negative is Schwartz J in People v Carter, 50 Misc.3d 1210(A), 36 N.Y.S.3d 48 (Table), 2016 WL 239708, 2016 N.Y. Slip Op. 50067(U) (determining that the defendant was not entitled to a Frye hearing because the FST is not new, novel or experimental), although note the order of Caproni J in United States v Johnson, Case No. 1:1-er-00565-VEC (S.D.N.Y. 7 June 2016) (order granting request for subpoena for disclosure of FST source code), https://www.courtlistener.com/recap/gov.uscourts.nysd.446412.57.0.pdf.

Mechanical instruments and computer-like devices

5.65 The discussion in this chapter focuses on software code that provides instructions. In the case of firmware, which is software that is incorporated into hardware, the absence of visible programs does not mean that software is absent: the commentary in this chapter applies equally to this form of implementation of software.

The nature of software errors

5.66 It can be said that a computer can be both ‘reliable’ (but not infallible) and yet perform functions without the authority or knowledge of the owner or software writer. This may occur when the code executes, because of a strange or unforeseen conjunction of inputs, in a way that neither the owner nor the writer had imagined. For instance, one Jonathan Moore designed and produced forged railway tickets that were accepted by ticket machines controlled by computers. It took a ticket inspector to notice subtle differences in the colour and material of the ticket, which led to Moore’s arrest and prosecution for forgery.1

1Tom Pugh, ‘IT expert sentenced for rail ticket forgery’, The Independent (London, 2 October 2009).

5.67 It is important to understand that programmers are aware of the limitations of their software, as famously articulated by Ken Thompson:

You can’t trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code.1

1Ken Thompson, ‘Reflections on trusting trust’ (1984) 27(8) Turing Award Lecture, Communications of the ACM 761; Donald MacKenzie, Mechanizing Proof: Computing, Risk and Trust (MIT Press 2004), 299, fn 1.

5.68 These comments are decidedly relevant, given that Thompson demonstrated how to create a C program fragment that would introduce Trojan horse code into another compiled C program by compromising the C compiler. Thomas Wadlow explained this process as follows:

For example, when compiling the program that accepts passwords for login, you could add code that would cause the [first] program to accept legitimate passwords or a special backdoor password known to the creator of the Trojan. This is a common strategy even today and is often detectable through source-code analysis.

Thompson went one step further. Since the C compiler is written in the C programming language, he used a similar technique to apply a Trojan to the C compiler source itself. When the C compiler is compiled, the resulting binary program could be used to compile other programs just as before; but when the program that accepts passwords for login is compiled with the new compiler from clean, uncompromised source code, the backdoor-password Trojan code is inserted into the binary, even though the original source code used was completely clean. Source-code analysis [of the login program] would not reveal the Trojan because it was lower in the tool chain than the login program.1

1Thomas Wadlow, ‘Who must you trust?’ (2014) 12(5) acmqueue Security 2.

5.69 The description could have continued. The Trojan, as described above, remains easily detected: there is one in the source code of the compiler and one in the object code of the compiler. In Thompson’s scheme, he went one step further: he modified the compiler so that, when compiling its own source, it inserts the Trojan-insertion logic into the resulting compiler binary. Now the source code Trojan in the compiler (which inserts the Trojan into the login) can be removed. Furthermore, as Thompson points out, all trace of the Trojan can now be removed from all source code. There is then no readable evidence of any Trojan attack.1

1My thanks to Professor Thimbleby for this point.
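A toy sketch may make the two stages of the attack easier to follow. It is emphatically not Thompson’s code: the ‘compiler’ below is a hypothetical Python function that rewrites source text before it is executed, and the login program, the backdoor password and all names are invented for the illustration.

```python
# A toy, runnable sketch of the idea described above - not Thompson's code.
# The 'compiler' is a hypothetical function that rewrites source text before
# it is executed; every name and value here is invented for illustration.

LOGIN_SOURCE = (
    "def check_password(user, password):\n"
    "    return password == STORED[user]\n"
)

BACKDOOR = "    if password == 'letmein':\n        return True  # silently inserted\n"

def trojaned_compile(source: str) -> str:
    # Stage 1: recognise the login program and insert a backdoor.
    if "def check_password(" in source:
        header, body = source.split("\n", 1)
        source = header + "\n" + BACKDOOR + body
    # Stage 2 (Thompson's further step, not implemented here): recognise the
    # compiler's own source and re-insert this whole transformation, so the
    # trojan survives recompilation of a completely clean compiler source.
    return source

STORED = {"alice": "correct horse"}
namespace = {"STORED": STORED}
exec(trojaned_compile(LOGIN_SOURCE), namespace)

print(namespace["check_password"]("alice", "correct horse"))  # True (legitimate password)
print(namespace["check_password"]("alice", "letmein"))        # True (backdoor absent from the clean source)
```

The point of the sketch is that the clean login source contains no trace of the backdoor; the compromise lives lower in the tool chain.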

5.70 Just because a person is in physical control of a computer or shop cash till,1 it does not follow that she will be aware whether it is working ‘reliably’, ‘properly’, ‘consistently’, ‘correctly’ or ‘dependably’.2 As indicated above, even the writer of the software will not be in such a luxurious position. It follows that this comment by Kerr LCJ was not correct:

In the modern world the presumption of equipment being properly constructed and operating correctly must be strong. It is a particularly strong presumption in the case of equipment within the control of the defendant who alone would know if there was evidence of incorrect operation or incorrect setting.3

1Stephen Castell, ‘Letter to the editor’ (1994) 10 Computer Law and Security Report 158 pointed out that the observation by Lord Griffiths that a till was a ‘computer … of the simplest kind’ was, even at the time, an assumption that did not reflect the truth: at 387D, R. v Shephard (Hilda) [1993] AC 380, [1993] 2 WLR 102, [1993] 1 All ER 225, [1992] 12 WLUK 273, (1993) 96 Cr App R 345, (1993) 157 JP 145, [1993] Crim LR 295, (1993) 143 NLJ 127, (1993) 137 SJLB 12, Times, 17 December 1992, Independent, 21 January 1993, [1993] CLY 636; Allison Nyssens, ‘The law of evidence: on-line with the computer age?’ (1993) 15(10) EIPR 360.

2The word ‘dependability’ denotes a global concept that subsumes the attributes of reliability, availability, safety, integrity and maintainability, while ‘reliability’ provides for continuity of correct service: Algirdas Avižienis, Jean-Claude Laprie and others, ‘Basic concepts and taxonomy of dependable and secure computing’ (2004) 1(1) IEEE Transactions on Dependable & Secure Computing 11, 13.

3Public Prosecution Service v McGowan [2009] NI 1 at [20]; it is acknowledged that many standards in the safety critical community require some element of proof in the tools they use, such as evidence that the supplier tracks and corrects defects, for instance.

5.71 That software code is imperfect and remains so may be illustrated by the comments of an early pioneer in computing, the late Professor Sir Maurice V. Wilkes FRS FREng:

By June 1949 people had begun to realize that it was not so easy to get a program right as had at one time appeared. I well remember when this realization first came on me with full force. The EDSAC was on the top floor of the building and the tape-punching and editing equipment one floor below on a gallery that ran round the room in which the differential analyzer was installed. I was trying to get working my first non-trivial program, which was one for the numerical integration of Airy’s differential equation. It was on one of my journeys between the EDSAC room and the punching equipment that ‘hesitating at the angles of the stairs’ the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs. Turing had evidently realized this too, for he spoke at the conference on ‘checking a large routine’.1

1Maurice V. Wilkes, Memories of a Computer Pioneer (MIT Press 1985), 145. For the EDSAC room, see https://en.wikipedia.org/wiki/EDSAC.

5.72 This observation has been repeated many times since.1 Programmer errors are caused by a mix of novelty (applying software to previously unsolved problems) and the difficulty of the tasks software is required to perform, including their magnitude and complexity.2 And the errors reach back: programmers use programming languages, and the languages are themselves subject to errors, so even if the programmers were somehow perfect and ensured that their own code contained no errors, other programmers further back in the chain will have left errors. This is why software has always been released in new versions: primarily to correct previously unknown (or ignored) errors.

1The reader might wish to begin with the following, which is only one of many articles by many eminent people: Les Hatton, ‘Characterising the diagnosis of software failure’ (2001) 18(4) IEEE Software 34.

2B. Littlewood and L. Strigini, ‘Software reliability and dependability: a roadmap’ in A. Finkelstein (ed.), The Future of Software Engineering, State of the Art Reports given at the 22nd International Conference on Software Engineering (ACM Press 2000), 177–188.

5.73 To address this problem, the approach of many of the existing software safety standards is to define requirements for and put constraints on the software development and assurance processes.1 Using the taxonomy of the provision of services, Algirdas Avižienis and colleagues have defined a ‘correct service’ as one where the service implements the system function. Its failure is an event that occurs when the service does not do what the function provides. This deviation is described as an ‘error’. For instance, if the function when using an ATM is to dispense the correct quantity of cash, and the ATM dispenses the correct amounts of cash, then there is a correct service, and the service is carried out in accordance with the function. If the amount of cash withdrawn from an ATM is greater or less than the amount keyed in, or no cash is provided, this is a service failure that can be an error or fault. The authors go on to say:

Since a service is a sequence of the system’s external states, a service failure means that at least one (or more) external state of the system deviates from the correct service state … In most cases, a fault first causes an error in the service state of a component that is a part of the internal state of the system and the external state is not immediately affected.

For this reason, the definition of an error is the part of the total state of the system that may lead to its subsequent service failure.2 It is important to note that many errors do not reach the system’s external state and cause a failure. A fault3 is active when it causes an error, otherwise it is dormant.4

1Professor John McDermid and Tim Kelly, ‘Software in safety critical systems: achievement and prediction’, 2(3) Nuclear Future 34; Peter Bernard Ladkin, ‘Duty of care and engineering functional-safety standards’ (2019) 16 Digital Evidence and Electronic Signature Law Review 51.

2Although this permits everything to be an error.

3The word ‘fault’ has not been defined or distinguished from error.

4Algirdas Avižienis and others, ‘Basic concepts and taxonomy of dependable and secure computing’, 13, original emphasis; for additional discussions on this topic, see John Rushby, ‘Critical system properties: survey and taxonomy’ (1994) 43(2) Reliability Engineering and System Safety 189, and MacKenzie, Mechanizing Proof, 337, fn 16.
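The vocabulary in the passage quoted above can be made concrete with a minimal sketch. The function, the figures and the deliberately planted mistake are all invented for the purpose of illustration:

```python
# Minimal sketch of the fault/error/failure vocabulary: the fault is the
# wrong comparison operator ('<' where '<=' was intended); it lies dormant
# until a particular input activates it, producing an error that reaches the
# external state as a service failure. All names and figures are invented.

NOTE_VALUE = 20  # each note is worth 20 pounds

def cash_to_dispense(requested: int, notes_available: int) -> int:
    # Fault: '<' should be '<='.
    if requested < notes_available * NOTE_VALUE:
        return requested      # correct service for most inputs
    return 0                  # withdrawal refused

print(cash_to_dispense(100, 10))  # 100 - fault dormant, correct service delivered
print(cash_to_dispense(200, 10))  # 0   - fault activated: the error becomes a visible failure
```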

5.74 For instance, an ATM might provide a receipt that £100 has been withdrawn, but not dispense the money. Given this set of facts, clearly a fault has occurred. One reason might be that the sensors or the software code (or both) in the machine failed to detect the lack of movement of cash. The bank might provide a printout of the machine’s internal functioning that shows the purported balance of cash held in the machine before the transaction, and again after it. This proves very little. In the New York case of Porter v Citibank, N.A.,1 a similar set of facts occurred. The customer used his card, but no money was dispensed. Employees of the bank testified that on average machines were out of balance once or twice a week. From the point of view of evidence, the information on the printout is restricted to a single transaction. For the bank to prove that the machine actually dispensed £100 (and therefore the customer is lying), it is necessary for the bank to balance the ATM and report the results for the material time. The overall balance might indicate that it had gone down by £100. But the report might be inaccurate. This is because of a number of associated variables (the list is not exhaustive): there are multiple layers of outsourcing, people cover up mistakes, and people rely on other people to be diligent in dual-control tasks (whatever they are). Equally, if the machine happens to overpay someone else by £100, the error will cancel out the previous error, and the end result might not be detected by human intervention either. Human cross-checks may suggest that everything appears correct, but the system is failing repeatedly. A further reason for the machine to be in error is that a third party may have successfully inserted code to bypass the software in the machine, leaving the thief to recover the cash after the customer left the scene.2

1123 Misc.2d 28, 472 N.Y.S.2d 582 (N.Y.City Civ.Ct. 1984).

2Stephen Mason, ‘Debit cards, ATMs and negligence of the bank and customer’, (2012) 27(3) Butterworths Journal of International Banking and Financial Law 163; Maryke Silalahi Nuth, ‘Unauthorized use of bank cards with or without the PIN: a lost case for the customer?’ (2012) 9 Digital Evidence and Electronic Signature Law Review 95; Stephen Mason, ‘Electronic banking and how courts approach the evidence’ (2013) 29(2) Computer Law and Security Review 144.
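The point about offsetting errors can be shown with a short worked example, using invented figures:

```python
# Hypothetical end-of-day reconciliation: two offsetting dispensing errors net
# to zero, so the balance check reveals nothing, even though two customers
# were given the wrong amounts. All figures are invented.

recorded = [100, 50, 100]   # amounts the ATM logged as dispensed (pounds)
actual   = [  0, 50, 200]   # amounts actually dispensed: one short-payment, one over-payment

discrepancy = sum(recorded) - sum(actual)
print(discrepancy)  # 0 - the reconciliation balances despite two faulty transactions
```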

5.75 For all these reasons and more, it is difficult to show that a computer is working ‘properly’, even for highly skilled professionals.1 Part of the problem is that computers fail in discontinuous ways (they cannot fail slightly), which is a characteristic of discrete complexity, unlike most mechanical devices.

1There is a technique called code verification, where code functionalities are verified as mathematical properties. But this process is time-consuming and limited, although it is faster than fixing a problem later. I owe this observation to Professor Seng.

Why software appears to fail

5.76 People across the world increasingly depend on computers and computer-like devices for mundane uses such as recording (cameras and recorders on mobile telephones), for critical uses such as lifesaving devices that control delicate medical equipment in hospitals and for important infrastructural uses such as systems for the supply of gas, electricity and fuel, underground trains,1 buses2 and financial software that assesses risk in financial products.

1The railway trains on the Jubilee Line of the London Underground were being replaced with new trains from 2011. Many of the new trains failed and left passengers stranded for hours because of software failures: Dick Murray, ‘Computer crash caused Jubilee Line “meltdown”’, Evening Standard (London, 9 November 2011) 11; this problem was also included in one of the series of six programmes by the BBC entitled The Tube that was broadcast during the spring of 2012. This is merely one example from across the world.

2A software problem meant that a new model of the London bus had to be run with its distinctive rear platform shut: ‘New Routemaster bus starts running on London roads’, BBC News, 27 February 2012, http://www.bbc.co.uk/news/uk-england-london-17173625.

5.77 In the light of the ubiquitous nature of software, it is important to be aware that software code can function as intended by the programmer, but it can also be the cause of failure. Alternatively, software code may fail to function in the way the programmer intended, or it might continue to function but undertake actions that the programmer did not originally intend or instruct the device to undertake. Problems can occur for a number of reasons, such as where software code has a mistake, or because of improper installation, or because the people hired to undertake the work were not sufficiently qualified.1 A range of consequences might follow, such as the failure of air traffic control systems2 and lost baggage from baggage handling systems in airports,3 preventing couples from obtaining mortgages because of incorrect records,4 dispensing more cash than is recorded via faulty software in ATMs,5 miscalculating assets in family cases,6 and causing injuries and deaths.7 The increasing complexity of software and interconnections act to exacerbate the problems that occur.8

1Robotic Vision Systems, Inc. v Cybo Systems, Inc., 17 F.Supp.2d 151 (E.D.N.Y. 1998).

2Leonard Lee, The Day the Phones Stopped: the Computer Crisis – The What and Why of It, and How We Can Beat It (Donald I. Fine, New York 1991), chapter 7; Independent Enquiry, NATS System Failure 12 December 2014 – Final Report (13 May 2015), paras ES7–ES10, https://www.caa.co.uk/WorkArea/DownloadAsset.aspx?id=4294974241.

3Michael Schloh, Analysis of the Denver International Airport Baggage System (Submitted: 16 February 1996 Advisor: Daniel Stearns) (Computer Science Department, School of Engineering, California Polytechnic State University 1996), http://www5.in.tum.de/~huckle/schloh_DIA.pdf; Paul Stephen Dempsey, Andrew R. Goetz and Joseph S. Szyliowicz, Denver International Airport: Lessons Learned (McGraw-Hill 1997); The Department of Homeland Security, Office of the Inspector General, Lessons Learned from the August 11, 2007, Network Outage at Los Angeles International Airport (Redacted) (OIG-08-58, May 2008); House of Commons Transport Committee, The Opening of Heathrow Terminal 5, Twelfth Report of Session 2007–08: Report, Together with Formal Minutes, Oral and Written Evidence (Ordered by The House of Commons to be printed 22 October 2008; HC 543, published on 3 November 2008).

4Nicole Blackmore, ‘Npower’s error cost us our mortgage’, The Daily Telegraph, Your Money (London, 10 May 2014) 1, 3.

5Tim Stewart, ‘Huge queues as Tesco cash machine gives customers “free money”’, London Evening Standard (18 August 2009), http://www.standard.co.uk/news/huge-queues-as-tesco-cash-machine-gives-customers-free-money-6702682.html; for other examples, see Stephen Mason, When Bank Systems Fail: Debit Cards, Credit Cards, ATMs, Mobile and Online Banking: Your Rights and What To Do When Things Go Wrong (2nd edn, PP Publishing 2014).

6Owen Bowcott, ‘Revealed: divorce software error hits thousands of settlements’, The Guardian (London, 17 December 2015).

7Donald MacKenzie, ‘Computer-related accidental death: an empirical exploration’ (1994) 21(4) Science and Public Policy 233.

8For instance, consider the widespread effect that the power outage in August 2019, partly because of software failures, had on England: Office of Rail and Road, Report Following Railway Power Disruption on 9th August 2019 (3 January 2020); Department for Business, Energy & Industrial Strategy, GB Power System Disruption on 9 August 2019, Energy Emergencies Executive Committee (E3C): Final Report (January 2020).

Classification of software errors

5.78 The word ‘bug’ is a term commonly used in the information technology industry to describe a variety of issues.1 When a technician uses this term, it can have a number of meanings.2 Professor Thomas offered his view at a lecture he gave in 2015:

Different researchers and authors may describe faults as ‘flaws’, ‘errors’, ‘defects’, ‘anomalies’ or ‘bugs’ but they will almost always mean functional faults, which cause the software to crash or to give the wrong results.3

1It must be emphasized that there are a number of definitions of technical terms, but they are not dealt with in any detail in this text. For an insight as to how ‘bugs’ are dealt with in a contract between commercial entities, see GB Gas Holdings Limited v Accenture (UK) Limited [2010] EWCA Civ 912, [2010] 11 WLUK 260, [2011] 1 Costs LO 64, [2011] CLY 269 and Kingsway Hall Hotel Ltd v Red Sky IT (Hounslow) Ltd [2010] EWHC 965 (TCC), [2010] 5 WLUK 106, (2010) 26 Const LJ 542, [2011] CLY 2777; in the software world, a ‘bug’ is also known as an undocumented feature, for which see David Lubar, ‘It’s Not a Bug, It’s a Feature!’ (Addison-Wesley Publishing Company 1995).

2The members of the team responsible for writing the following report did not use the term ‘bug’ when they meant ‘error’: Willis H. Ware (ed), Security Controls for Computer Systems: Report of Defense Science Board Task Force on Computer Security – RAND Report R-609-1 (Published for the Office of the Secretary of Defense) R-609-1, Reissued October 1979.

3‘Should we trust computers?’, lecture given at Gresham College, 20 October 2015, http://www.gresham.ac.uk/lectures-and-events/should-we-trust-computers.

5.79 Lay people, not without some justification, consider the term ‘bug’ to be a cloak that hides the correct meaning, namely that what is being described is an error, flaw, mistake, failure or fault in a software program or system.1 Drawing from the work of Professor Ladkin, it is possible to classify most software errors into the following non-exhaustive categories:2 human errors in coding and software development; software design or specification errors; unintended or unanticipated software interactions; input data flaws; and deliberate errors caused by operators or by hackers acting remotely.

1Causes of failure can also be categorized into human error, environment (including power outages or A/C failure), network failure, software failure and hardware failure: Bianca Schroeder and Garth A. Gibson, ‘A large-scale study of failures in high-performance computing systems’ (2010) 7(4) IEEE Transactions on Dependable and Secure Computing 338.

2Peter B. Ladkin, On Classification of Factors in Failures and Accidents (Report RVS-Occ-99-02), https://rvs-bi.de/publications/.

Human errors and biases in the software code

5.80 Notwithstanding the best software development tools that catch and identify coding errors, human errors in writing software code account for a large number of software errors. This problem will be exacerbated by the increasing size of the code that is written. An example of human error in software code is that of Mariner I, the spacecraft intended for Venus and launched on 22 July 1962. The software code indicated that the booster had failed, and the rocket was destroyed on command from the control centre. In fact, the rocket was behaving correctly and it was the computer system on the ground that was at fault, partly because of a defect in the software and partly because of a hardware failure. The error in the software arose because the person who wrote the software failed to include an overbar in the guidance equations.1

1Peter G. Neumann, Computer Related Risks (Addison-Wesley 1995), 26–27 (‘Here R denotes the radius; the dot indicates the first derivative – that is, the velocity; the bar indicates smoothed rather than raw data; and n is the increment. When a hardware fault occurred, the computer processed the track data incorrectly, leading to the erroneous termination of the launch’); see also the explanation by the National Aeronautics and Space Administration report NSSDC ID: MARIN1, http://nssdc.gsfc.nasa.gov/nmc/spacecraftDisplay.do?id=MARIN1; for more detail on computers and the space age and an analysis of accidents (including this example), see Paul E. Ceruzzi, Beyond the Limits: Flight Enters the Computer Age (MIT Press 1989).

5.81 Two further examples are the Clementine mission and the Ariane 5 failure. The Clementine mission was a joint project between the Strategic Defense Initiative Organization and NASA. After the spacecraft left lunar orbit, a malfunction in one of the onboard computers on 7 May 1994 caused a thruster to fire until it had used up all of its fuel, leaving the spacecraft spinning at about 80 rpm with no spin control. The spacecraft remained in geocentric orbit and continued to be used to test its components until the end of the mission.1 In the case of the Ariane 5 rocket failure in 1996, the disintegration of the rocket 40 seconds after launch was due to a software failure – because, in the words of Professor Les Hatton, ‘the programmers had arranged the code such that a 64-bit floating point number was shoe-horned into a 16-bit integer’.2 As pointed out by Professor Ladkin, ‘Code was reused from the Ariane 4 guidance system. The Ariane 4 has different flight characteristics in the first 30 seconds of flight and exception conditions were generated on both inertial guidance system (IGS) channels of the Ariane 5.’3

1Space Studies Board, National Research Council, Lessons Learned from the Clementine Mission (National Academy Press 1997), http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980041408.pdf.

2Les Hatton, ‘Ariane 5: A smashing success’ (1999) 1(2) Software Testing and Quality Engineering 14; Ariane 501 Inquiry Board report (4 June 1996), http://esamultimedia.esa.int/docs/esa-x-1819eng.pdf and https://www.ima.umn.edu/~arnold/disasters/ariane5rep.html; Charles C. Mann, ‘Why software is so bad’, (2002) Technology Review 38(b); Derek Partridge, The Seductive Computer: Why IT Systems Always Fail (Springer 2011), 99, fn 6.

3Peter B. Ladkin, The Ariane 5 Accident: A Programming Problem? (Article RVS-J-98-02), https://rvs-bi.de/publications/.
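
The nature of the Ariane 5 fault described in paragraph 5.81 can be illustrated with a short sketch. The fragment below is not the Ariane code (which was written in Ada); it is a hypothetical Python illustration, with an invented velocity value, of what happens when a 64-bit floating point number is forced into a 16-bit signed integer, which can only represent values between −32,768 and 32,767:

import struct

# A hypothetical horizontal velocity reading, chosen only because it exceeds
# the range of a 16-bit signed integer (-32768 to 32767).
horizontal_velocity = 70000.0

try:
    # 'h' asks for a 16-bit signed integer: the conversion cannot succeed.
    packed = struct.pack('h', int(horizontal_velocity))
except struct.error as error:
    print('conversion failed:', error)

In the Ariane 5 inertial reference system the analogous conversion raised an exception that was not handled, shutting down both guidance channels; the value in question had never exceeded the 16-bit range on Ariane 4, which is why the reused code had never failed before.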

5.82 Human bias has also begun to be more fully understood, especially when analysing systems marketed as artificial intelligence, usually developed with a form of machine learning.1 Hidden biases and flawed datasets are, in all probability, normal.2

1State v Loomis, 881 N.W.2d 749 (Wis. 2016), cert. denied, 137 S.Ct. 2290 (2017); Danielle Keats Citron, ‘Technological Due Process’ (2008) 85 Wash U L Rev 1249 – noting that the automated public benefits systems of Colorado, California and Texas mistranslated codified eligibility requirements and erroneously distributed or withheld public benefits; Kenneth A. Bamberger, ‘Technologies of compliance: risk and regulation in a digital age’ (2010) 88 Tex L Rev 669 – software designers have created compliance and risk management software with automation biases to favour corporate self-interest; Kathleen E. Watson, ‘Note, COBRA data and the right to confront technology against you’ (2015) 42 N Ky L Rev 375, 381; Susan Nevelow Mart, ‘The algorithm as a human artifact: implications for legal [re]search’ (2017) 109 Law Libr J 387; Christian Chessman, ‘A “source” of error: computer code, criminal defendants, and the constitution’ (2017) 105 Cal L Rev 179 – for corrections in this article, see Duncan A. Taylor, Jo-Anne Bright and John Buckleton, Commentary, ‘A “source” of error: computer code, criminal defendants, and the constitution’ (2017) 8 Frontiers in Genetics 1; Molly Griffard, ‘A bias-free predictive policing tool?: an evaluation of the NYPD’s Patternizr’ (2019) 47 Fordham Urb LJ 43; Aylin Caliskan, Joanna J. Bryson and Arvind Narayanan, ‘Semantics derived automatically from language corpora contain human-like biases’ (2017) 356(6334) Science 183; Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez and Kai-Wei Chang, ‘Men also like shopping: reducing gender bias amplification using corpus-level constraints’ in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics 2017), https://www.aclweb.org/anthology/D17-1323.pdf; Anupam Chander, ‘The racist algorithm?’ (2017) 115 Mich L Rev 2013; Virginia Eubanks, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (St Martin’s Press 2018); Caroline Criado Perez, Invisible Women: Exposing Data Bias in a World Designed for Men (Chatto & Windus 2019).

2In her book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Broadway Books 2016, 2017), Cathy O’Neil demonstrates that software applications are written by human beings (mostly men), with choices as to how the software code is written, often on the basis of prejudice, misunderstanding and bias. Software writers define their own reality and then use it to justify the results. In writing software code, programmers routinely lack data for human behaviour, which means they substitute data from dubious statistical correlations that discriminate and whose use might even be illegal. For details of the case law cited, see ‘Book Reports’ (2017) 14 Digital Evidence and Electronic Signature Law Review 95. For an early example of software that produced biased results because of the bias of the programmer, see Stella Lowry and Gordon Macpherson, ‘A blot on the profession’ (1988) 296 British Medical Journal 657, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2545288/; Anders Eklund, Thomas E. Nichols and Hans Knutsson, ‘Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates’ (2016) 113 Proc Natl Acad Sci 7900. For software bias that promotes male over female vocal artists, see Andres Ferraro, Xavier Serra and Christine Bauer, ‘Break the loop: gender imbalance in music recommenders’ in CHIIR ’21: Proceedings of the 2021 Conference on Human Information Interaction and Retrieval (Association for Computing Machinery 2021) 249, https://dl.acm.org/doi/pdf/10.1145/3406522.3446033.

Failure of specification

5.83 The problem might not be in the software code, but with the specification,1 as with the loss of the Mars Climate Orbiter spacecraft in 1999. On this occasion, the failure resulted from not using metric units in the coding of a ground software file. The thruster performance data used in the software application code entitled SM_FORCES (small forces) was expressed in imperial units (pound-force seconds) instead of the metric units (newton seconds) required by the specification.2 Roy Longbottom, Head of the Large Scientific Systems Branch of the Central Computer Agency, observed that:

When the software is first written and assembled, as for hardware, it usually undergoes a series of design quality assurance tests to ensure that the specification is met on facilities, performance and on physical source requirements. It is again fairly easy to check out the broad facilities provided but impossible to forecast and test for all possible modes of operation, combinations and sequences. One difference with hardware is that, the writing of comprehensive tests3 for the software is often regarded as an overhead, whereas for hardware, comprehensive tests are written as a natural process for identifying constructional defects on all new equipment and for overcoming long term reliability problems. So, when software is first delivered, it is almost certain that the design will not be quite correct or some coding errors will be present.4

1For an example of the failure of a properly structured agreement that included what the customer wanted from the software, see South West Water Services Ltd v International Computers Ltd [1999] 6 WLUK 427, [1999] BLR 420, [1999–2000] Info TLR 1, [1998–99] Info TLR 154, [1999] ITCLR 439, [2001] Lloyd’s Rep PN 353, [1999] Masons CLR 400, [2000] CLY 870. In Co-Operative Group (Cws) Ltd. (Formerly Co-Operative Wholesale Society Ltd.) v International Computers Ltd. [2003] EWHC 1 (TCC), [2003] 12 WLUK 646, [2004] Info TLR 25, (2004) 27(3) IPD 27023, (2004) 148 SJLB 112, Times, 19 January 2004, [2005] CLY 42, the case failed for lack of a contract, but the judge observed, at [260], that ‘the initial efforts of ICL to try to meet the requirements of CWS as to when software was required were frustrated by the failure of CWS to specify precisely what its requirements were’.

2Mars Climate Orbiter Mishap Investigation Board Phase I Report (10 November 1999), https://llis.nasa.gov/llis_lib/pdf/1009464main1_0641-mr.pdf.

3Because of the discontinuous nature of software, the notion of a ‘comprehensive test for software’ does not exist, even in the high-integrity market. Testing every possible sequence of every possible input is not feasible.

4Roy Longbottom, Computer System Reliability (Wiley 1980), 71. This book may have been published in 1980, but remains true in the twenty-first century. Note chapter 6 regarding faults.
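
A short sketch may help to show how small, in terms of code, the specification failure described in paragraph 5.83 was, and how large its effect. The figures below are invented and are used only to illustrate the unit mismatch; the point is that nothing in the code itself signals that a number is in the wrong units:

# Hypothetical thruster firing data, for illustration only.
total_impulse_lbf_s = 100.0          # value produced in pound-force seconds (imperial)
NEWTONS_PER_POUND_FORCE = 4.448      # conversion factor the ground software omitted

# What the trajectory software expected: newton seconds (metric).
expected_impulse_n_s = total_impulse_lbf_s * NEWTONS_PER_POUND_FORCE

# Using the unconverted number silently understates the thruster effect by a
# factor of about 4.45. The value is just 'a number', so neither the compiler
# nor the running program can detect that the units are wrong.
error_factor = expected_impulse_n_s / total_impulse_lbf_s
print(round(error_factor, 3))        # 4.448

The error was therefore invisible to the software: only a specification (or a review against the specification) stating the required units could have caught it.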

5.84 It is a pervasive characteristic of software code that the design will not be quite correct or that coding errors will be present, and there will be occasions when a fault cannot be replicated.1 In its early years, NASA considered software code to be of secondary importance. Although this view has changed over time, and a rigorous methodology has since been implemented to provide for the better control and development of software code, NASA has never produced error-free software code.2

1For which see National Transportation Safety Board, Pipeline Accident Report, Pipeline Rupture and Subsequent Fire in Bellingham, Washington June 10, 1999 (NTSB/PAR-02/02; PB2002-916502; Notation 7264A, Adopted 8 October 2002), 63, https://www.ntsb.gov/investigations/AccidentReports/Reports/PAR0202.pdf.

2Nancy G. Leveson, ‘Software and the challenge of flight control’ in Roger D. Launius, John Krige and James I. Craig (eds) Space Shuttle Legacy: How We Did It and What We Learned (American Institute of Aeronautics and Astronautics 2013).

Unintended software interactions

5.85 Software code might function correctly, as intended by the programmer, but the interactions between individual components of the software code can be the cause of failure, because the designers of the system fail to account for all potential interactions. This is because the number of possible defects in software relates not only to the components (lines of code), but also to the number of ways in which they interact – the number of interactions increases faster than the number of components, making large systems with many components proportionally harder to get right. As the work of Bianca Schroeder and Garth A. Gibson demonstrates, the more complex the system becomes, the more likely it is that different types of failure will occur,1 and the number of ways in which complexity can cause failure also increases.2 To put the problem into perspective, it is necessary to understand not the number of defects per device but the proportion of design decisions that contain defects.3 A typical design decision in software looks like this:

if some-condition-I-have-decided-when-I-designed-the-software

then

do something

otherwise

do something else

1Schroeder and Gibson, ‘A large-scale study of failures in high-performance computing systems’.

2For the same discussion in 1986, see Rudolph J. Peritz, ‘Computer data and reliability: a call for authentication of business records under the federal rules of evidence’ (1986) 80(4) Northwestern University Law Review 965, 990–999; Stephen Mason and Timothy S. Reiniger, ‘“Trust” between machines? Establishing identity between humans and software code, or whether you know it is a dog, and if so, which dog?’ (2015) 21(5) Computer and Telecommunications Law Review 135; for a specific case study, see Sivanesan Tulasidas, Ruth Mackay, Pascal Craw, Chris Hudson, Voula Gkatzidou and Wamadeva Balachandran, ‘Process of designing robust, dependable, safe and secure software for medical devices: point of care testing device as a case study’ (2013) 6 Journal of Software Engineering and Applications 1.

3Nobody is certain how many defects occur per line of code or per design decision, but for a good discussion, see McDermid and Kelly, ‘Software in safety critical systems’, 34.

5.86 As this simple example illustrates, each design decision creates at least two paths for the software to handle, and further design choices will have to be made within the ‘do something’ branch, as well as in the ‘do something else’ branch, as needed. One decision creates 2 possible paths; as the design develops these become 4, then 8, 16, 32, 64 and so on, increasing exponentially in complexity. Very quickly the number of paths goes beyond human comprehension. This demonstrates that in software, a very few decisions rapidly create something far more complex than humans can reliably analyse, and about which they can be confident they have made the right decisions in even a modest fraction of the possible cases.1 Since there are typically thousands of design decisions in the software for even relatively small products, there will be millions and millions of design choices, and hence it is easy to overlook hundreds of defects in the final products.2 An average defect level of one to five defects per thousand lines of code could translate into hundreds if not thousands of defects for devices that have several hundred thousand to a million or more lines of code.3 This is the typical size of most software that controls aircraft,4 motor vehicles and many other common systems. What affects the user is how often the software fails: one defect may cause failures frequently, while another may do so only very seldom.5

1I owe this example and analysis to Professor Harold Thimbleby.

2Hoang Pham, System Software Reliability (Springer 2000), 2; Clemente Izurieta and James M. Bieman, ‘How software designs decay: a pilot study of pattern evolution’, First International Symposium on Empirical Software Engineering and Measurement (ESEM, 2007) (Institute of Electrical and Electronics Engineers 2009); Clemente Izurieta and James M. Bieman, ‘A multiple case study of design pattern decay, grime, and rot in evolving software systems’ (2013) 21 Software Qual J 289; Duc Minh Le, Carlos Carrillo, Rafael Capilla and Nenad Medvidovic, ‘Relating architectural decay and sustainability of software systems’, in 13th Working IEEE/IFIP Conference on Software Architecture (WICSA 2016) (Institute of Electrical and Electronics Engineers 2016); National Institute of Statistical Sciences, Code Decay in Legacy Software Systems: Measurement, Models, and Statistical Strategies: ‘Over time, software code can lose quality and begin having errors and problems working properly. [Note: code does not lose quality on its own, but because programmers continue to alter code.] It is more difficult to keep changing the code and has become much more expensive as well. Eventually the hardware fails and there is no way to update or port the software to newer tools. Lucent Technologies, along with the National Science Foundation, hired NISS to look at a way to quantify, measure, predict and reverse or retard code decay.’ For the results, see https://www.niss.org/research/code-decay-legacy-software-systems-measurement-models-and-statistical-strategies.

3William Guttman, professor of economics and technology at Carnegie Mellon University, is of the view that the figure is nearer 30 errors per 1,000 lines of code on average: Alorie Gilbert, ‘Newsmaker: fixing the sorry state of software’, CNET News, 9 October 2002 (this item no longer seems to be available online); see also The Economic Impacts of Inadequate Infrastructure for Software Testing: Final Report (May 2002) prepared for National Institute of Standards and Technology by RTI Health, Social, and Economics Research, https://www.nist.gov/system/files/documents/director/planning/report02-3.pdf; Herb Krasner, The Cost of Poor Quality Software in the US: A 2018 Report (Consortium for IT Software Quality 2018), https://www.it-cisq.org/the-cost-of-poor-quality-software-in-the-us-a-2018-report/The-Cost-of-Poor-Quality-Software-in-the-US-2018-Report.pdf.

4On 2 June 1994, Chinook helicopter ZD 576 crashed on the Mull of Kintyre. The RAF Board of Inquiry held the pilots to be negligent. Some considered that the installation of a Full Authority Digital Engine Control (FADEC) system was to blame, as described in detail in RAF Justice (Computer Weekly), http://cdn.ttgtmedia.com/rms/computerweekly/DowntimePDF/pdf/rafjust.pdf; Tony Collins, ‘Chinook crash: critical internal memo on software flaws’, Computer Weekly, 4 June 2009, http://www.computerweekly.com/news/2240089594/Chinook-crash-critical-internal-memo-on-software-flaws; the decision of the RAF Board of Inquiry was subsequently reversed: The Mull of Kintyre Review (HC Paper 1348, 2011), https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/247259/1348.pdf.

5P. G. Bishop, ‘The variation of software survival time for different operational input profiles (or why you can wait a long time for a big bug to fail)’ in FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing (IEEE 1993), 98–107.
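
The arithmetic in paragraph 5.86 can be made concrete with a short sketch. Assuming, for illustration, that each two-way design decision simply doubles the number of possible paths through the program, a very small number of decisions already produces more paths than any development team could examine individually:

# Each two-way design decision doubles the number of possible execution paths.
for decisions in (1, 5, 10, 20, 30, 40):
    paths = 2 ** decisions
    print(f'{decisions:2d} decisions -> {paths:,} possible paths')

# 10 decisions already give 1,024 paths; 30 give over a billion; 40 give more
# than a trillion. Real programs contain thousands of such decisions, which is
# why exhaustive examination (or exhaustive testing) of every path is not
# feasible, and why some combinations will inevitably escape scrutiny.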

5.87 Consider, by way of example, the 2003 power outage that affected large portions of the Midwest and Northeast United States and Ontario, Canada. The outage affected an area with an estimated 50 million people and 61,800 megawatts of electric load, and power was not restored for four days in some parts of the United States. Parts of Ontario suffered blackouts for more than a week before full power was restored. The subsequent investigation indicated a number of failures, a significant one being the failure of a computerized energy management system, XA/21 EMS. This system failed to detect the tripping of electrical facilities. After weeks of testing and analysis, a software coding error was discovered. It was a subtle incarnation of a common programming error called a race condition,1 brought to light by a series of events and alarm conditions in the equipment being monitored. The race condition involved times measured in milliseconds. Mike Unum, manager of commercial solutions at GE Energy, explained the problem: ‘There was a couple of processes that were in contention for a common data structure, and through a software coding error in one of the application processes, they were both able to get write access to a data structure at the same time … And that corruption led to the alarm event application getting into an infinite loop and spinning.’2

1https://en.wikipedia.org/wiki/Race_condition.

2Kevin Poulsen, ‘Tracking the blackout bug’, SecurityFocus, 7 April 2004; US–Canada Power System Outage Task Force, Final Report on the August 14, 2003 Blackout in the United States and Canada: Causes and Recommendations (Merrimack Station AR-1165, April 2004), https://emp.lbl.gov/publications/final-report-august-14-2003-blackout.
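
The race condition described in paragraph 5.87 can be illustrated in miniature. The sketch below is not the GE code; it is a deliberately simplified Python illustration of two threads of execution sharing a data structure without any locking. The call to sleep(0) merely widens a timing window that already exists, which is precisely why such faults appear only intermittently and are so difficult to reproduce in testing:

import threading
import time

counter = 0   # a shared data structure, standing in for the alarm system's shared state

def unsafe_increment(iterations):
    global counter
    for _ in range(iterations):
        value = counter        # read the shared value without taking a lock
        time.sleep(0)          # yield to the other thread, widening the window
        counter = value + 1    # write back, possibly discarding the other thread's update

threads = [threading.Thread(target=unsafe_increment, args=(1000,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Two threads each adding 1,000 should produce 2,000, but most runs will print
# a smaller number because updates are silently lost. A correct version would
# hold a threading.Lock() around the read and the write.
print(counter)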

5.88 This issue is further magnified by what are called ‘legacy’ systems. For instance, the computer systems used by airlines are very complex. There are a number of reasons for this: airlines introduced computer systems in the 1950s; as airlines merge, or take over other airlines, they might combine or adopt the computer systems they have inherited; over time, as new functions are added, this process has created systems of great complexity. The banking sector has the same problem. Replacing such systems is not an easy decision, because it would take a considerable amount of money and time, and it is doubtful whether any IT firm has sufficient skills and knowledge to provide all the software needed for a complete replacement.1

1‘All systems stop: why big firms like Delta find it so hard to eliminate glitches from their IT systems’, The Economist, 13 August 2016 (from the print edition), https://www.economist.com/business/2016/08/13/all-systems-stop.

5.89 Consider a practical example. The display on a screen has a meaning, and if that meaning is not veridical, then an accident may result. Where the moon rising over the horizon causes a system to interpret it as a massive ICBM launch, semantic safety is violated: that is, the display (it might be a warning signal or something else) was not veridical. This problem has been linked to at least two occasions on which nuclear war is thought to have been averted only by human intervention, despite computer warnings of an imminent attack.1

1I owe this suggestion to Professor Peter Bernard Ladkin. For the incident where software code made it appear the Soviet Union had launched a nuclear missile assault on the USA, see MacKenzie, Mechanizing Proof: Computing, Risk, and Trust, 23–24 and Eric Schlosser, Command and Control (Penguin 2014), 253–254; for an incident where software code made it appear there was a missile attack by the USA against the Soviet Union, see Ron Rosenbaum, How the End Begins: The Road to a Nuclear World War III (Simon & Schuster 2011) 7, 225–226, 248; Pavel Aksenov, ‘Stanislav Petrov: the man who may have saved the world’, BBC News, 26 September 2013.

5.90 It should be observed that the increasing use of machine-learning systems complicates this issue, because the software code is instructed to make further decisions when running, which increases the complexity. In addition, the veridicality of machine-learning systems, such as neural nets, cannot be easily understood or verified.1 Machine learning (ML) systems can learn (correctly or incorrectly) after they have been programmed: the errors they can make will typically not have been subject to the sort of scrutiny we expect of standard non-ML software. In particular, ML systems are easy to fool. There is a whole field of ‘adversarial ML’, which studies how inputs or training data can be crafted to make ML systems learn or infer perverse things. One commonly quoted example is to spray STOP signs with innocuous-looking graffiti so that the sign recognition software used in cars reads the STOP sign as a 40 mph speed limit sign;2 a more recent example is to mislead the software in autopilots by inserting split-second images into roadside billboards.3

1I owe this point to Dr Michael Ellims and Professor Martyn Thomas, CBE, FREng.

2Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno and Dawn Song, ‘Robust physical-world attacks on deep learning models’, CVPR 2018, https://arxiv.org/abs/1707.08945.

3Ben Nassi, Yisroel Mirsky, Dudi Nassi, Raz Ben Netanel, Oleg Drokin and Yuval Elovici, ‘Phantom of the ADAS: securing advanced driver assistance systems from split-second phantom attacks’, https://ad447342-c927-414a-bbae-d287bde39ced.filesusr.com/ugd/a53494_04b5dd9e38d540bc863cc8fde2ebf916.pdf.

Input data flaws

5.91 There are also what are known as ‘input-data flaws’, meaning that the data entered into the machine was not correct, thus ensuring that the information coming out is also incorrect – colloquially known as ‘garbage-in-garbage-out’. In a well-designed system, the software should check, insofar as that is possible, that the input data is not wrong, corrupted or unexpected, and should flag suspect data with a warning, perhaps via the user interface. This is a common problem even in fairly simple systems such as databases, including in critical applications such as the medical field.
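
By way of illustration, a minimal defensive check of the kind described above might look like the following hypothetical Python sketch, in which a reading is rejected, rather than silently processed, when it is missing, malformed or outside a plausible range (the function name and the limits are invented for the example):

def parse_reading(raw_value, minimum=0.0, maximum=250.0):
    """Convert a raw input string into a number, refusing garbage rather than passing it on."""
    if raw_value is None or str(raw_value).strip() == '':
        raise ValueError('no value supplied')
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        raise ValueError(f'not a number: {raw_value!r}')
    if not (minimum <= value <= maximum):
        # An out-of-range value is flagged back to the user rather than stored.
        raise ValueError(f'value {value} outside plausible range {minimum}-{maximum}')
    return value

# parse_reading('72.5') returns 72.5; parse_reading('7O.5') (with the letter O)
# and parse_reading('9999') both raise an error that the user interface can report.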

Operational errors

5.92 Another manifestation of human error is operational error. Professor Leveson observed that it is ‘often very difficult to separate system design error from operator error: In highly automated systems, the operator is often at the mercy of the system design and operational procedures’.1 This observation applies to virtually every automated system that includes computers and software code, and the problem it describes has, indirectly, caused significant loss of life. For instance, ‘user interface errors’ have been blamed for several aviation accidents, where the pilot as the user did not do anything wrong, but did not know the correct way to do what she wanted to do. Even in situations where people are part of a controlled and trained user community, such as ambulance controllers or air traffic controllers, human error rates in many tasks are high enough to stress systems in ways that are unpredictable. Examples of such situations in high-stress industries are further explored in the rest of this chapter.

1Nancy G. Leveson, Engineering a Safer World: Systems Thinking Applied to Safety (MIT Press 2012), 39.

The development, maintenance and operation of software

5.93 As general-purpose computing systems have become more powerful and flexible, users have devised new uses for them in ways that the systems developers never envisaged. This, coupled with the increase in complexity and the speed at which computers work, especially in modern automated systems, means that developers can never completely anticipate how users will use their software, or how their software will interact with other systems and software. Even where the developers have tested their systems in the ways that most users use them (and possibly fail to test them against less conventional methods of use), they may subsequently need to issue upgrades that provide more functions, or updates to remedy any defects that have been found. In doing so, the developers will have modified the software and its operating conditions. Such changes will result in new modes of operation that have not been previously tested, causing the users to encounter defects they have not previously experienced. This problem is compounded when complex operations such as banking systems are connected to networks and can be attacked by hackers – internal or external to the organization – or just simply affected by unintentional actions of third parties, such as making errors during recovery from backups.

5.94 While it might appear that exhaustive testing could be the answer to these problems, it is impractical and does not necessarily work, and there is no workable theory that would constitute an adequate test. Professor Thomas notes that ‘The main way that software developers assure the quality of their work is by running tests, even though computer scientists have been saying for the past forty years that testing can never show that software is secure or correct’.1 For even relatively small systems, the number of possible test cases required for comprehensive testing is enormous. It is also not always certain whether or not the software has passed or failed the test, and it is necessary to repeat the tests after all software changes. Furthermore, a single test case can only expose a system to a very specific set of conditions and data values. The number of variations is, in practical terms, unbounded because a robust test must consider, among other things, different data values, the number of simultaneous jobs running, the system memory configuration, the hardware configuration, all of the connected devices or systems, the operators’ actions, user errors, data errors, device malfunctions, and so forth. However, just because testing is a complex affair does not mean that testing should not be carried out.2 This is especially so when people can be killed or injured,3 as in the case of the sudden unintended acceleration problems experienced by owners of some modern motor vehicles which operate with electronic control systems.4 Michael Barr, in giving evidence as the expert witness for the plaintiffs in the trial of Bookout v Toyota Motor Corporation, gave the following oral testimony:

[Toyota] didn’t [have] a formal safety process like the MIRSA, the big book. They don’t follow a recipe for making a safe system.

They also have the defect that they didn’t do peer reviews on the operating system code or the monitor CPU codes. And here, ultimately, it comes down to resources. Toyota did not put people and time behind checking up on the suppliers who were supplying this critical software [for their vehicle electronic control systems]. The operating system at the heart of this main CPU and this and second CPU that’s doing the monitoring.5

1Martyn Thomas, ‘Technology, security and politics’ (2016) 25(3) SCSC Newsletter 53.

2For which see Chris Elliott and Peter Deasley (eds), Creating Systems that Work: Principles of Engineering Systems for the 21st Century (The Royal Academy of Engineering 2007).

3Matt Parker, Humble Pi: A Comedy of Maths Errors (Allen Lane 2019) – the title is hardly fitting, given the author refers to a total of 1,517 deaths as a result of software errors, and four deaths relating to the results of a lottery (at 156).

4For safety critical systems, see B. Littlewood, I. Bainbridge and R. E. Bloomfield, The Use of Computers in Safety-Critical Applications (Health and Safety Commission 1998).

5No. CJ-2008-7969 (Reported by Karen Twyford, RPR): examination and cross examination of Michael Barr 14 October 2013, 80, http://www.safetyresearch.net/Library/Bookout_v_Toyota_Barr_REDACTED.pdf.

Developmental issues and software errors

5.95 In examining the nature of a software fault, even at a time when software was less complex than now, Professor Randell and his colleagues made the following astute observation:

A detected error is only a symptom of the fault that caused it, and does not necessarily identify the fault. Even where the relationship between the fault and the detected error appears obvious, it will be found that many other possible faults could have caused the same error to be detected.1

1B. Randell, P. Lee and P. C. Treleaven, ‘Reliability issues in computing system design’ (1978) 10(2) ACM Computing Surveys 126, 127; but as Professor Thimbleby has pointed out when reviewing this chapter, simple slips (including programming errors) do not stem from unmastered complexity; they just stem from random events.

5.96 Professor Randell also commented that ‘What is significant about software faults is, of course, that they must be algorithmic faults stemming from unmastered complexity in the system design’.1 This is a telling observation, in that the primary source of software errors lies in its development process. There are numerous issues in the development of software that will generate errors, including but not limited to the speed at which a developer is required to work to write proprietary software within the contractual time frame, the consistent failure within the industry to provide for suitable quality control procedures, the creation of a climate of fear to suppress concerns relating to errors and safety,2 and the lack of knowledge that programmers may have of the domain in which the software is to work (for instance, the programmer might be knowledgeable about mathematics, but have no knowledge of how acceleration systems work in motor vehicles3). In addition, it is extremely difficult to develop good software without well-designed and mature engineering processes, and impossible to do so consistently. Such processes involve the production of essential documents that enable effective communication between members of the development team and those who will accept, install, use and modify the software. The existence of such documents does not guarantee that the software is of high (or adequate) quality, but the absence or lack of rigorous quality control is a strong indication of poor-quality software.4

1Randell and others, ‘Reliability issues in computing system design’, 127.

2Nancy G. Leveson, ‘Technical and managerial factors in the NASA Challenger and Columbia losses: looking forward to the future’, in Daniel Lee Kleinman, Karen A. Cloud-Hansen, Christina Matta and Jo Handelsman (eds) Controversies in Science and Technology Volume 2: From Climate to Chromosomes (Mary Ann Liebert Press 2008); for a legal response to this problem, see Richard Warner and Robert H. Sloan, ‘Vulnerable software: product-risk norms and the problem of unauthorized access’ (2012) Journal of Law, Technology & Policy 45.

3Michael Ellims, ‘On wheels, nuts and software’, 9th Australian Workshop on Safety Related Programmable Systems (SCS ’04) in Brisbane, 2.1, http://crpit.com/abstracts/CRPITV47Ellims.html.

4I thank Professor Martyn Thomas CBE for these observations.

5.97 In addition, unrealistic estimates of how long it will take to write and test software also undermine accuracy,1 which means that those responsible for writing software code will not have the time or resources to be comprehensive in developing the software.2 It is also necessary to have a comprehensive design, subjected to peer review, before any coding begins. Often, the writing of lines of code remains the ready and easily quantifiable measure of progress, which means that writing code starts much too soon, and too little emphasis is placed on good design.

1This is just one of the problems. Frederick P. Brooks, The Mythical Man-Month Anniversary Edition (Addison Wesley Longman, Inc. 1995). For a comprehensive failure, see Slaughter and May, TSB Review: An Independent Review Following TSB’s Migration to a New IT Platform (October 2019), https://www.slaughterandmay.com/news/slaughter-and-may-s-independent-review-of-tsb-s-2018-migration-to-a-new-it-platform/.

2This is not a recent phenomenon. Even in 1976 it could be said that ‘debugging and testing often account for half the cost of a program’: Theodore A. Linden, ‘Operating system structures to support security and reliable software’ (1976) 8(4) ACM Computing Surveys 409, 410–411 (also available as a US Department of Commerce National Bureau of Standards Technical Note 919, https://csrc.nist.gov/csrc/media/publications/conference-paper/1998/10/08/proceedings-of-the-21st-nissc-1998/documents/early-cs-papers/lind76.pdf); and more recently, Robert N. Charette, ‘Why software fails’ (2005) 42(9) IEEE Spectrum 42, http://spectrum.ieee.org/computing/software/why-software-fails; Partridge, The Seductive Computer; W. Wayt Gibbs, ‘Software’s chronic crisis’, Scientific American, September 1994, 86.

5.98 This is not to say that all software programmers are incompetent or that they do not wish to undertake work of a high quality. In their writings about software errors, Algirdas Avižienis and colleagues define ‘human-made faults’1 as including faults of omission, and wrong actions that lead to faults of commission. ‘Human-made faults’ are, in turn, divided into malicious faults and ‘non-malicious’ or guileless faults. These faults can be introduced during the development of the system by a developer or during use by an external third party. Guileless faults can be classified as faults resulting from mistakes, and deliberate faults that are brought about because of poor decisions – usually caused when choices are made to accept having less of one thing in order to get more of something else, for instance to preserve acceptable performance, or because of economic considerations. Developers who commit such faults may unintentionally or deliberately violate an operating procedure with or without understanding the consequences of their action.2

1Avižienis and others, ‘Basic concepts and taxonomy of dependable and secure computing’, 15–18.

2For instance, management at Boeing was aware that the additional software added to the 737 MAX, if not correctly responded to by a pilot within 10 seconds, could lead to catastrophic results, for which see The House Committee on Transportation & Infrastructure, Final Committee Report: The Design, Development & Certification of the Boeing 737 MAX (September 2020), 231, https://transportation.house.gov/imo/media/doc/2020.09.15%20FINAL%20737%20MAX%20Report%20for%20Public%20Release.pdf.

5.99 A large part of the problem is that writing software is an exceedingly difficult and challenging activity, and the methods used by management to control quality are not necessarily the most effective available. However, writing software is now getting easier. Advanced development environments generate some code automatically, although writing software that performs complex functions and works well in all circumstances continues to be demanding. Many amateurs have had the experience of being able to build software that achieves impressive effects with very little effort. This may well lead them to believe that because they find it easy to program a simple video game or puzzle-solver (whose failures do not matter and will probably go unnoticed), or some simple program that seems reliable enough for their personal, everyday use, then building and completing complex software systems correctly must be just as easy.

5.100 A further barrier arises when an organization is collectively incompetent.1 This in turn means that inherent problems in software used in large organizations may not be identified for a long time. For instance, in 2003 Oates Healthcare began to use a new software product that was written for the company. At that time it was not known that the code written by the programmer was defective, in that it failed to calculate overtime for employees correctly. The problem was identified when a former employee took legal action against the company five years after the software was implemented. As a result of discovering this problem, the company had to undertake two exercises. First, the simple solution was to write new software code to permit the program to begin calculating overtime correctly from the point in time that the software was amended. Second, because the changes to the software were not capable of affecting the previous calculations, the previous records had to be recalculated manually, which is an admission of poor programming: it would not have been necessary to do this manually if the computer could have done it. Apparently there were over 10 million records that needed to be recalculated.2

1As in the example of the failure of the AAS system: Office of Inspector General, Audit Report: Advance Automation System: Federal Aviation Administration, Report Number: AV-1998-113 (15 April 1998), https://www.oig.dot.gov/sites/default/files/av1998113.pdf.

2Phil Simon, Why New Systems Fail: Theory and Practice Collide (AuthorHouse 2009), 7–9.

5.101 A software project can fail partly because of a combination of the failure of management, an unrealistic time frame to develop the software, and a failure to develop and test software properly. There are many examples of such failure, and more importantly, some failures do not come to light until after the project is complete.1

1Robert L. Glass, Software Runaways: Lessons Learned from Massive Software Project Failures (Prentice Hall PTR 1998), xiii–xiv; Lee, The Day the Phones Stopped; Nancy G. Leveson, ‘Role of software in spacecraft accidents’ (2004) 41(4) Journal of Spacecraft and Rockets 564.

Increasing the risk of errors through modification of software

5.102 Software typically goes through modification cycles, called updates or upgrades, to fix existing errors in the code or to enhance or improve functionality. One of the major causes of software failure is that, as software code is modified, each modification is capable of increasing the risk of failure. Some of the changes that are meant to fix errors may introduce new ones, resulting in a greater or smaller probability of failure. Where a vendor releases a significant number of new features or a major redesign, there is typically a sudden increase in the probability of failure, after which the risk falls again over time as subsequent updates resolve the errors that are discovered.

5.103 It is useful to observe that when safety-related software code is modified, there is usually documentation to explain how this risk has been reduced, although this is routine only in the case of dangerous failures, and not necessarily all failures. By way of example, consider the case of Saphena Computing Limited v Allied Collection Agencies Limited1 in which Mr Recorder Havery QC commented:

In the present case, on the other hand, once the software is fit for its purpose, it stays fit for its purpose. If by any chance a flaw is discovered showing that it is unfit for purpose (which is hardly likely after prolonged use)2 there is a remedy in damages against the supplier, if solvent, until the expiry of the period of limitation.3

1[1989] 5 WLUK 21, [1995] FSR 616, [1995] CLY 774.

2Professor Thomas has indicated that even in 1995 there was plenty of evidence that this was not correct.

3[1995] FSR 616 at 639.

5.104 The problem with this remark is that proprietary software code can be (and indeed often is) affected by updates, which means it does not necessarily stay ‘fit for purpose’. It can also be affected by updates in other code, for instance – and quite commonly – in updates to the operating system on which it runs. Flaws can become manifest at any time, and some flaws can remain for years, which means if they are detected by a malicious person or state agency, they can be manipulated for purposes other than that which the users intend. There is a more fundamental flaw in the statement that ‘it stays fit for its purpose’. If the software is used unchanged for a different purpose, which may be no more than the original purpose but applied to different data, it may still fail.

5.105 This is illustrated by the Heartbleed vulnerability.1 Cryptographic protocols are used to provide for the security and privacy of communications over the Internet, such as the World Wide Web, email, instant messaging and some virtual private networks. One current protocol is Transport Layer Security (TLS). To implement this protocol, a developer will use a cryptographic library. One such library, which is open source, is OpenSSL. In 2011, a doctoral student wrote the Heartbeat Extension for OpenSSL, and requested that his implementation be included in the protocol. One of the developers (there were four) reviewed the proposal, but failed to notice that the code was flawed. The code was included in the repository on 31 December 2011 under OpenSSL version 1.0.1. The defect allowed anyone on the Internet to read the memory of any system that used the flawed versions of the OpenSSL software. It was possible for a hacker using this flaw to steal user names and passwords, instant messages, emails and business documents. No trace would be left of the attack. The attack did not rely on access to privileged information or credentials such as usernames and passwords. Taking into account the length of exposure, the ease with which it could be exploited, the fact that an attack would not leave a trace and that it is estimated to have affected up to two-thirds of the Internet’s web servers, this weakness was taken very seriously. On 7 April 2014, the same day the Heartbleed vulnerability was publicly disclosed, a new version that applied a fix to the flaw was released.

1Zakir Durumeric, James Kasten, David Adrian, J. Alex Halderman, Michael Bailey, Frank Li, Nicholas Weaver, Johanna Amann, Jethro Beekman, Mathias Payer and Vern Paxson, ‘The matter of Heartbleed’, IMC ’14: Proceedings of the 2014 Conference on Internet Measurement Conference (Association for Computing Machinery, New York, United States, 2014), 475–488. A more important error was discovered in GNU Bash in September 2014, for which see ‘Bourne-again Shell (Bash) remote code execution vulnerability’ (original release date 24 September 2014; last revised 30 September 2014), https://www.us-cert.gov/ncas/current-activity/2014/09/24/Bourne-Again-Shell-Bash-Remote-Code-Execution-Vulnerability.
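
The essence of the Heartbleed flaw described in paragraph 5.105 can be shown schematically. The sketch below is not the OpenSSL code (which is written in C and operates on the memory of the server process); it is a hypothetical Python illustration of the underlying mistake, namely echoing back as many bytes as the requester claims to have sent, without checking that claim against the payload actually received:

# A toy 'memory' in which the four-byte heartbeat payload happens to sit next
# to unrelated secret material, as data does in a real server process.
memory = bytearray(b'ping' + b'|secret-key-material|user:password')

def heartbeat(claimed_length):
    # The flawed logic: trust the requester's own statement of how long the
    # payload is, and copy that many bytes back without any bounds check.
    return bytes(memory[:claimed_length])

print(heartbeat(4))    # an honest request returns b'ping'
print(heartbeat(64))   # a dishonest request also returns the adjacent secrets

# The fix was, in essence, a single check: refuse to reply if the claimed
# length is greater than the length of the payload actually received.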

5.106 Software can also be affected by changes in the environment, such as the operating system or other components, rather than any specific application, although it is necessary to distinguish between modification of software in situ and the reuse of software in an environment that is presumed to be similar. An example is the Ariane 5 incident, where a malfunction arose from a changed environment and assumptions that were poorly understood, rather than a defect in the original development. Where the software is modified in situ, the environment does not change; where software is reused in an environment that is presumed to be similar, the software has not changed, but the environment has. The result in either case is that there may be a mismatch where there was none before.

5.107 Generally speaking, programmers who modify someone else’s code often do not fully understand the software, and may also be less well trained than the people who originally wrote it. Software can (if appropriately designed) be relied upon to produce verifiably correct results, but to have such a degree of certainty, it is necessary to be assured that the operating conditions remain identical and that nothing else malfunctions. Peter G. Neumann has indicated that even though the utmost care and attention might be devoted to the design of a system, it may still have significant flaws.1 This was illustrated in a 1970 report edited by Willis H. Ware. The authors noted, under ‘Failure Prediction’ within section V System Characteristics, that:

In the present state of computer technology, it is impossible to completely anticipate, much less specify, all hardware failure modes, all software design errors or omissions, and, most seriously, all failure modes in which hardware malfunctions lead to software malfunctions. Existing commercial machines have only a minimum of redundancy and error-checking circuits, and thus for most military applications there may be unsatisfactory hardware facilities to assist in the control of hardware/software malfunctions. Furthermore, in the present state of knowledge, it is very difficult to predict the probability of failure of complex hardware and software configurations; thus, redundancy [is] an important design concept.2

1Neumann, Computer Related Risks, 4; see his text generally for this topic.

2Security Controls for Computer Systems: Report of Defense Science Board Task Force on Computer Security – RAND Report R-609-1, http://www.rand.org/pubs/reports/R609-1/index2.html.

5.108 The authors of the report went on to observe the following in Part C, Technical Recommendations:

(a) It is virtually impossible to verify that a large software system is completely free of errors and anomalies.

(b) The state of system design of large software systems is such that frequent changes to the system can be expected.

(c) Certification of a system is not a fully developed technique nor are its details thoroughly worked out.

(d) System failure modes are not thoroughly understood, cataloged, or protected against.

(e) Large hardware complexes cannot be absolutely guaranteed error-free.

Security vulnerabilities

5.109 Software vulnerabilities are software errors that are generally hidden from view. While they usually cause users no harm, they may be exploited by state security services, malicious hackers and professional thieves for various advantages, including theft of personal data (to sell on), control of vulnerable systems, drug smuggling,1 blackmail and other forms of financial gain. The market in selling packets of software code known as ‘exploits’ has become significant. Legitimate businesses may sell a vulnerability in software to businesses and government agencies, and hackers may sell a vulnerability to anyone who will buy it. These vulnerabilities, particularly those against which there are no pre-existing defences (the basis of so-called ‘zero-day exploits’),2 may be exploited, whether legally or illegally, for criminal investigation as well as for the purposes of cyber espionage, including the violation of confidentiality (stealing information), availability (denial of service for political intimidation or blackmail) and integrity (corrupting information to steal from banks or to make an embedded computer system cause accidents).

1Hackers Deployed to Facilitate Drugs Smuggling, Intelligence Notification 004-2013, June 2013, Europol Public Information, https://www.europol.europa.eu/publications-documents/cyber-bits-hackers-deployed-to-facilitate-drugs-smuggling.

2https://en.wikipedia.org/wiki/Zero-day_(computing).

5.110 To address these vulnerabilities, software vendors often, but not always, issue ‘security patches’ regularly, frequently on a monthly cycle (sometimes referred to as ‘software updates’ to conceal the nature of the update), in recognition of the failure of their software. Yet these may give rise to more problems. For instance, an important security weakness was discovered in relation to the distribution of software patches (a mechanism which, ironically, was put in place to address security weaknesses): attackers who receive a patch first can analyse it and compromise vulnerable hosts that have yet to receive it.1

1Two examples from many: David Brumley, Pongsin Poosankam, Dawn Song and Jiang Zheng, ‘Automatic patch-based exploit generation is possible: techniques and implications’, 2008 IEEE Symposium on Security and Privacy (sp 2008) (Oakland, IEEE 2008); Yan Wang, Chao Zhang, Xiaobo, Zixuan Zhao, Wenjie Li, Xiaorui Gong, Bingchang Liu, Kaixiang Chen, Wei Zou, ‘Revery: from proof-of-concept to exploitable’, CCS ’19: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (New York, Association for Computing Machinery 2019), 1689–1706.

5.111 Software security vulnerabilities are particularly pertinent to businesses and industries that operate or rely on digital security infrastructures. For these industries, there are other issues to consider. The first is whether the design of the security protocol is robust. An example of a failure in this category is found in banking systems,1 and although a design can be modified, at best it is only possible to take a provisional view in respect of this point, because designs constantly change and are therefore liable to failure. The second is whether the security protocol is implemented properly. For instance, when a number of ATMs around Cambridge in the UK were tested, it was found that nonce generation was predictable. A nonce is supposed to be a unique value in a protocol, a one-time ‘security code’, but some ATMs were using a small supply of tokens as nonces and reusing them in a predictable order, thereby compromising their security.2

1Steven J. Murdoch, Saar Drimer, Ross Anderson and Mike Bond, ‘Chip and PIN is broken’ in 31st IEEE Symposium on Security and Privacy (IEEE Computer Society 2010) 433–446, https://www.cl.cam.ac.uk/research/security/banking/nopin/oakland10chipbroken.pdf; Steven J. Murdoch, ‘Reliability of Chip & PIN evidence in banking disputes’ (2009) 6 Digital Evidence and Electronic Signature Law Review 98.

2Megan Geuss, ‘How a criminal ring defeated the secure chip-and-PIN credit cards’, arstechnica, 20 October 2015, http://arstechnica.com/tech-policy/2015/10/how-a-criminal-ring-defeated-the-secure-chip-and-pin-credit-cards/; Mike Bond, Omar Choudary, Steven J. Murdoch, Sergei Skorobogatov and Ross Anderson, ‘Chip and skim: cloning EMV cards with the pre-play attack’, paper presented to Cryptographic Hardware and Embedded System (CHES) 2012, Leuven, Belgium, September 2012, https://murdoch.is/papers/oakland14chipandskim.pdf; Houda Ferradi, Rémi Géraud, David Naccache and Assia Tria, When Organized Crime Applies Academic Results: A Forensic Analysis of an In-Card Listening Device, https://eprint.iacr.org/2015/963.pdf.
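
The difference between a proper nonce and the predictable tokens described in paragraph 5.111 can be sketched briefly. The fragment below is purely illustrative (it is not the ATM protocol); it contrasts a nonce drawn from a tiny, reused pool, which an attacker can record and replay, with a nonce generated freshly from a cryptographically secure source:

import os
import itertools

# Flawed approach (schematic): a small pool of tokens reused in a fixed order.
token_pool = itertools.cycle([b'\x00\x01', b'\x00\x02', b'\x00\x03'])

def predictable_nonce():
    return next(token_pool)   # repeats after three transactions and is trivially guessable

def proper_nonce():
    return os.urandom(16)     # 16 random bytes: fresh, and infeasible to predict or replay

# Three calls to predictable_nonce() exhaust the pool; the fourth transaction
# reuses the first value, so a recorded transaction can be replayed against it.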

5.112 Furthermore, security may be associated with safety. If there is a safety-related system with security vulnerabilities, it is possible for the safety functions in the system to be deliberately subverted, giving rise to a safety issue. For instance, the nuclear industry has developed a draft international standard for safety and security.1 The vital problem in this area, which nobody has solved, is that while updates of the safety functions in the code that controls nuclear reactors are slow, deliberate and highly analytical, updates for security purposes have to be rapid, to forestall anticipated attempts via zero-day exploits. These two modes of working are obviously incompatible.

1Caroline Baylon, with Roger Brunt and David Livingstone, Cyber Security at Civil Nuclear Facilities: Understanding the Risks, Chatham House Report (The Royal Institute of International Affairs, September 2015), https://www.chathamhouse.org/publication/cyber-security-civil-nuclear-facilities-understanding-risks.

5.113 It follows that software security vulnerabilities expose the software to manipulations without the authority or knowledge of the software vendor.1 Many of the vulnerabilities arise specifically from the errors in the original implementation. For instance, it might be possible for a person to control another owner’s computer as part of a botnet2 or enter the control system of an aircraft in flight via the in-flight entertainment system.3

1The Trojan horse problem was recognized very early on, for which see Linden, ‘Operating system structures to support security and reliable software’, 422–424.

2Sanjay Goel, Adnan Baykal and Damira Pon, ‘Botnets: the anatomy of a case’, (2005) 1(3) Journal of Information System Security 45.

3See Applicant for a Search Warrant in the case of Chris Roberts at the United States District Court for the Northern District Court of New York Case number 5:15-MJ-00154 (ATB) dated 17 April 2015, [18]–[19], http://www.wired.com/wp-content/uploads/2015/05/Chris-Roberts-Application-for-Search-Warrant.pdf; https://assets.documentcloud.org/documents/2082796/gov-uscourts-nynd-102002-1-0.pdf; Caleb Kennedy, ‘New threats to vehicle safety: how cybersecurity policy will shape the future of autonomous vehicles’ (2017) 23 Mich Telecomm & Tech L Rev 343.

5.114 At this point, the reader might consider that such problems can be solved fairly easily – by the introduction of anti-virus software (this is not to imply that all attacks occur through the use of malicious software). But it must be understood that the fundamental nature of most anti-virus software limits its effectiveness – and the anti-virus software itself might not be error-free. A sophisticated attacker will have access to all types of anti-virus software, and he will program round the detection mechanisms and test his code against the anti-virus systems to ensure it is not detected.1 Most anti-virus software is reactive, in that it searches for known threats. As such, anti-virus software is far from perfect. It can fail to stop some malicious software2 and should not be relied upon as the sole method of securing a computer.

1J. A. P. Marpaung, M. Sain and Hoon-Jae Lee, ‘Survey on malware evasion techniques: state of the art and challenges’, Advanced Communication Technology (ICACT), 2012 14th International Conference, PyeongChang, (Global IT Research Institute 2012), 744–749; Chandra Sekar Veerappan, Peter Loh Kok Keong, Zhaohui Tang and Forest Tan, ‘Taxonomy on malware evasion countermeasures techniques’, 2018 IEEE 4th World Forum on Internet of Things (WF-IoT) (Institute of Electrical and Electronics Engineers 2018).

2Daniel Bilar, ‘Known knowns, known unknowns and unknown unknowns: anti-virus issues, malicious software and internet attacks for non-technical audiences’ (2009) 6 Digital Evidence and Electronic Signature Law Review 123; in 2006, Graham Ingram, the general manager of the Australian Computer Emergency Response Team (AusCERT), told an audience in Sydney, Australia, that popular desktop antivirus applications do not work: Munir Kotadia, ‘Eighty percent of new malware defeats antivirus’ (19 June 2006), ZDNet Australia; Michael A. Caloyannides, ‘Digital evidence and reasonable doubt’ (2003) 1(6) IEEE Security and Privacy 89; Dmitry Silnov, ‘Features of virus detection mechanism in Microsoft security essentials (Microsoft forefront endpoint protection)’ (2013) 4(2) Journal of Information Security 124; also see the annual ‘X-Force Trend Statistics’ by IBM Internet Security Systems that reinforces the position on the failure of anti-virus software, https://www.ibm.com/security/data-breach/threat-intelligence; the reports produced by the Anti-Phishing Working Group (http://www.antiphishing.org/) illustrate the same problem; reports by AV-Comparatives.org appear to indicate that some of the best products are now very efficient, http://www.av-comparatives.org/; see also ‘Common vulnerabilities and exposures’, https://cve.mitre.org/.

5.115 It is a truth universally acknowledged that the majority of hackers concentrate on the most widely used software and on vulnerable applications that can be found by using Internet search engines. The development of the Stuxnet virus illustrates that governments are now probably responsible for some of the most effective viruses that are written, although organized criminals can be equally effective.1 Software need only contain a small number of defects to create vulnerabilities that hackers can exploit to their advantage. Jim Nindel-Edwards and Gerhard Steinke usefully sum up the position:

It would seem that after decades of software development there would be some assurance that software works as specified in the customer requirements. Is it that software vendors are unwilling to perform sufficient testing? Is it possible to test everything? Finding a certain number of bugs, doesn’t mean that the software has no more bugs. On the other hand, not finding any defects doesn’t mean there aren’t any defects in the software either. Perhaps there are known bugs, but the time and resources to fix these bugs and defects are often not provided and the software is released with known (but not publicly stated) bugs. Is it because there is a low expectation of quality? Is it even possible to get rid of all bugs, especially when we are integrating components from multiple sources and we are dependent on the software that was developed and tested by others?

Software quality assurance is a challenging task. There are many questions raised by software being released with defects. What are the ethical responsibilities of a software vendor releasing software with bugs, especially if it is system-critical software, but also when releasing non system-critical software?2

1Roderic Broadhurst, Peter Grabosky, Mamoun Alazab, Brigitte Bouhours and Steve Chon, ‘Organizations and cyber crime: an analysis of the nature of groups engaged in cyber crime’ (2014) 8(1) International Journal of Cyber Criminology 1, http://www.cybercrimejournal.com/broadhurstetalijcc2014vol8issue1.pdf; https://www.nationalcrimeagency.gov.uk/what-we-do/crime-threats/cyber-crime; https://www.unodc.org/e4j/en/cybercrime/module-13/key-issues/criminal-groups-engaging-in-cyber-organized-crime.html.

2Jim Nindel-Edwards and Gerhard Steinke, ‘Ethical issues in the software quality assurance function’ (2008) 8(1) Communications of the IIMA article 6, 53, 54.

Software testing

5.116 Most software organizations test their products extensively, including in the ways that they anticipate that their customers will use them. Indeed, most software has become so complex that, in a process called beta testing, software is provided to volunteers to test before it is sold as a product. It has also been suggested that the problems of the composition of components in large systems can be mitigated by programmers reusing components in ways that they know from experience tend to work,1 although this view is not generally accepted.2 However, there will continue to be malfunctions, because many problems in hardware, software and configuration are only exposed when the system runs under real workloads.3 A number of issues arise in this respect, including the use of tools to test software fault tolerance or robustness,4 the degree to which the testing accurately reflects the way users will actually use the software, the unconventional ways in which people may attempt to use the program, and testing how the software works when connecting and communicating with different software and hardware. It is well known that testing is an inadequate means of uncovering all errors, because there is never enough time to cover all the cases, as the illustrations mentioned in this chapter vividly show (a simple numerical illustration of this point follows the notes to this paragraph). Professor Thimbleby has indicated that the only solutions are:

(1) a very careful approach to reasoning about the requirements that lead to the decisions,

(2) a mathematically rigorous way to analyse the combinations of decisions,

(3) rigorous testing, primarily to uncover whether there were flaws in steps (1) and (2), including in the testing process itself, and

(4) external oversight to avoid mistakes in one’s reasoning – this includes processes such as code review by third parties.5

1C. A. R. Hoare, ‘How did software get so reliable without proof?’ in Marie-Claude Gaudel and Jim Woodcock (eds) Lecture Notes in Computer Science, vol 1051/1996 (Springer 1996), 1–17.

2Bev Littlewood and Lorenzo Strigini, ‘The risks of software’, Scientific American 267(5), (November 1992) 62–75, cited by Partridge, The Seductive Computer, 205, fn 15; B. Littlewood and L. Strigini, ‘Validation of ultra-high dependability – 20 years on’ (2011) 20(3) SCSC Newsletter, http://www.staff.city.ac.uk/~sm377/ls.papers/2011_limits_20yearsOn_SCSC/BL-LS-SCSSnewsletter2011_02_v04distrib.pdf.

3Schroeder and Gibson, ‘A large-scale study of failures in high-performance computing systems’, 343.

4Although the availability of such tools does not mean that developers use them to improve their systems, for which see John DeVale and Philip Koopman, ‘Robust software – no more excuses’ in Danielle C. Martin, editorial production, Proceedings International Conference on Dependable Systems and Networks (The Institute of Electrical and Electronics Engineers, Inc. 2002), 145–154; The Economic Impacts of Inadequate Infrastructure for Software Testing: Final Report (May 2002), Prepared for National Institute of Standards and Technology by RTI Health, Social, and Economics Research, https://www.nist.gov/system/files/documents/director/planning/report02-3.pdf.

5Personal communication with the author.
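The combinatorial point made in paragraph 5.116 can be illustrated numerically. The figures in the following sketch are assumptions chosen only for illustration (40 independent on/off settings, a single 32-bit input and a rate of one million automated tests per second); the arithmetic simply shows why exhaustive testing is not a realistic option and why testing can only ever sample the space of possible behaviours.

```python
# A minimal sketch of why exhaustive testing is rarely feasible.
# Suppose a (hypothetical) program has 40 independent on/off settings and
# accepts a single 32-bit integer input.

settings = 2 ** 40            # every combination of 40 boolean options
inputs = 2 ** 32              # every possible 32-bit input value
cases = settings * inputs     # every combination of settings and inputs

tests_per_second = 1_000_000  # an optimistic automated test rate (assumed)
seconds_per_year = 60 * 60 * 24 * 365

years = cases / (tests_per_second * seconds_per_year)
print(f"{cases:.3e} cases, roughly {years:.1e} years of continuous testing")
# About 4.7e21 cases, or roughly 1.5e8 years: testing can only sample the
# input space, which is why it shows the presence of bugs, not their absence.
```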

5.117 The problem with a presumption that a computer is ‘reliable’ is that, as systems become more complex, it has become progressively more challenging to test software in a way that reflects how users will actually use the product. This is because of the large number of functions that software is required to perform and the unpredictability of its users.1 Professor Partridge reiterates the point that ‘no significant computer program is completely understood’,2 and goes further by indicating that systems are now so complex that humans are no longer able to deal with the problems:

We might speculate further: if the nature of computer-system complexity really is new and peculiar, a system characteristic that has no parallel in the natural world, then our evolutionary history is unlikely to have equipped us to reason effectively with such systems. Our genetic programs may be totally lacking in mechanisms that can deal effectively with discrete complexity.3

1The rise in fraud that took advantage of the faults in software was rapidly increasing in the 1970s, for which see Linden, ‘Operating system structures to support security and reliable software’, 410.

2Derek Partridge, What Makes You Clever – the Puzzle of Intelligence (World Scientific 2014), 394 and 407 fn 22.

3Partridge, The Seductive Computer, 192.

5.118 This weakness is now recognized by some of the organizations that produce devices and software. Microsoft and Apple are among a number of companies that have adopted a ‘bug’ bounty programme to reward professionals who test and find errors in the software.1 The US Department of Defense has also taken this approach, as has Google in respect of cryptographic software libraries.2 Yet claims that software code and hardware products have been independently tested do not necessarily lead to the conclusion that they can be relied upon. In his ACM Turing Lecture of 1972, Professor Dijkstra said this of testing:

Today a usual technique is to make a program and then to test it. But: program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence.3

1Microsoft Bounty Program, https://www.microsoft.com/en-us/msrc/bounty?rtc=1; https://developer.apple.com/security-bounty/; Project Wycheproof, https://github.com/google/wycheproof; HackerOne, formed by Facebook, Microsoft and Google, https://www.hackerone.com/.

2DoD Vulnerability Disclosure Policy, available at https://hackerone.com/deptofdefense.

3https://www.cs.utexas.edu/~EWD/ewd03xx/EWD340.PDF; the lecture was published as an article: Edsger W. Dijkstra, ‘The humble programmer’, (1972) 15(10) Communications of the ACM 859.

5.119 In other words, good-quality testing might discover the failings of the developer, but is less capable of resolving the issues in the overall design of software: there are significant limits to testing.

Writing software that is free of faults

5.120 As Professor Thomas indicates, it is possible to design and develop software so that it is almost completely free of faults.1 Many applications are now built without the developer writing any code at all: the coding effort goes into building the tools that generate the code from parameters supplied by the developer – and this approach is premised on the assumption that the software tools that generate the code are themselves error free.

1‘Should we trust computers?’, lecture given at Gresham College on 20 October 2015, http://www.gresham.ac.uk/lectures-and-events/should-we-trust-computers.

Software standards

5.121 Where an organization produces safety critical software for aeroplanes, motor vehicles, air traffic control or power stations, it will be necessary to conform to the requirements of an international standard on the functional safety of programmable electronic systems.1 For instance, security in the banking sector relies on certification schemes such as FIPS 140, the Information Technology Security Evaluation Criteria (ITSEC) and the Common Criteria for Information Technology Security Evaluation. It should be noted that these schemes only focus on aspects of security, and not on overall functionality. It is possible to have an accredited product that implements its security functions well but its business functions badly.

1For a discussion, see the NuSAC Study Group on the Safety of Operational Computer Systems, The Use of Computers in Safety-Critical Applications (Her Majesty’s Stationery Office 1998).

5.122 The ITSEC scheme, which is no longer as active as it once was, assesses a product on the basis of a document prepared by the organization that wants the product to be evaluated. In general terms, the document submitted to ITSEC describes what the product is designed to do, the situation in which it is intended to operate, the risks the product is likely to encounter and the mechanism by which the product acts to protect against those risks. It is for ITSEC to determine whether the claims are substantiated. Only the risks identified by the applicant are tested. A product is given one of seven levels, from E0 (no formal assurance) to E6 (the highest level of confidence). The assessment and granting of a position on the E scale is a judgement that a certain level of confidence has been met; it is not a measure of the strength of the security in place. It is important to realize that the organization submitting the product for evaluation sets out the criteria by which it will be evaluated, and it may be that this organization will not have included the risks associated with the use of the product by the end user. The evaluation includes an assessment of the confidence to be placed in whether the security features are the correct ones and how effectively they work. This means that a security mechanism might be applied correctly, but it will not be effective unless it is appropriate for the purpose for which it has been designed. In this respect, it is necessary to know why a particular security function is necessary, what security is actually in place and how the security is provided. It does not follow that a product with a high E level will provide a high level of security.

5.123 The ‘Common Criteria for Information Technology Security Evaluation’ and the ‘Common Methodology for Information Security Evaluation’ comprise the technical basis for an international agreement called the Common Criteria Recognition Agreement. The manufacturer submits its product to an independent licensed laboratory for an assessment. The way a product is evaluated is similar to the way ITSEC undertakes such assessments. There are problems with this arrangement: it creates a conflict of interest; there are no known examples of the revocation of the licences of laboratories that conduct evaluations; both parties are able to subvert the process; and determining the name of the organization that conducted the evaluation might be impossible without an order for disclosure. In addition, claims will sometimes be made that a device has been certified when, in fact, it might only have been evaluated. Often, a bank will ask a judge to rely on the certification process without disclosing the relevant report. In the Norwegian case of Bernt Petter Jørgensen v DnB NOR Bank ASA, Journal number 04-016794TVI-TRON, Trondheim District Court, 24 September 2004,1 Assistant Judge Leif O. Østerbø, who tried the case, said at 121–122 (emphasis added):

It is assumed that the standard security systems that are used are effective. However, according to Jørgensen, no cases have been documented that demonstrate [that] the implementation of the systems are secure.

The court refers in this respect to the fact that banks are subject to supervision and operate a comprehensive internal control work, and the witness Haugstad’s explanation that both the standards and the practical implementation are revised thoroughly and regularly. In that regard, Haugestad explained that the systems are subject to annual audits. The Banks Control Center (BSK), in addition to the major international card companies, conducts such audits.

The court does not find that there is reason to accept that the banks’ security systems are in doubt. Although the implementation of a system necessarily involves opportunities for errors, the court cannot see that this involves significant practical risk for customers with cards.

1For a translation into English, see (2012) 9 Digital Evidence and Electronic Signature Law Review 117; Nuth, ‘Unauthorized use of bank cards with or without the PIN’.

5.124 If the purpose of a trial is to test the evidence, should a judge assume that the standard security systems used by the bank were effective in the absence of any evidence? Should a judge accept untested assurances that audits actually take place, not knowing whether such audits are conducted internally or by the Banks Control Center, whether the audits revealed problems that might affect the systems for ATMs and PINs, or whether the audits were conducted by people with appropriate qualifications? Given the lack of such evidence, can a judge properly conclude that there is no reason to suspect that the bank’s security systems might be at fault?1 For instance, it has been demonstrated that independent external examination continues to validate and approve devices and cryptographic software code that are open to failure and subversion.2

1Ken Lindup, ‘Technology and banking’ (2012) 9 Digital Evidence and Electronic Signature Law Review 91.

2Steven J. Murdoch, Mike Bond and Ross Anderson, ‘How certification systems fail: lessons from the Ware Report’ (2012) 10(6) IEEE Security & Privacy 40; Kim Zetter, ‘In legal first, data-breach suit targets auditor’, Wired, 16 February 2009 – the case mentioned in this article was Merrick Bank Corporation v Savvis, Inc., 2010 WL 148201 (for other references, see 2009 WL 2968844 (D.Ariz.) (Trial Motion, Memorandum and Affidavit) (5 June 2009); 2009 WL 4823623 (D.Ariz.) (Trial Motion, Memorandum and Affidavit) (7 July 2009); 2009 WL 4823624 (D.Ariz.) (Trial Motion, Memorandum and Affidavit)) – it is not clear what happened as a result of the legal action. It is probable that the case was settled after the court refused to dismiss the case. Another case, a class action, was initiated in the United States District Court Northern District of Illinois Eastern Division on 24 March 2014: Trustmark National Bank v Target Corporation, Case No 14-CV-2069, although it was reported that this action was subsequently withdrawn, for which see Jonathan Stempel, ‘Banks pull out of lawsuit vs Target, Trustwave over data breach’, Reuters, 1 April 2014.

5.125 Two observations are worthy of note: standards1 regarding aviation, space and medical devices are usually much more prescriptive than those used in other domains; and even within the aviation, space and medical industries a great deal of commercial software is developed against no formal process model at all. The relevant standard for medical devices is ‘ISO 13485:2003 Medical devices – Quality management systems – Requirements for regulatory purposes’ (now revised by ‘ISO 13485:2016 Medical devices – Quality management systems – Requirements for regulatory purposes’). This standard has historically placed much less focus on tracing the details of internal product structure than, for instance, DO-178B, Software Considerations in Airborne Systems and Equipment Certification, which is a guideline dealing with the safety of critical software to be used in certain airborne systems. Yet, although having software evaluated against standards is a laudable goal, it does not follow that, by conforming, errors are eliminated.2

1The use of standards is a topic of significant debate, because it is not always certain that they work to improve the quality of software. By way of example, see Patrick J. Graydon and C. Michael Holloway, Planning the Unplanned Experiment: Assessing the Efficacy of Standards for Safety Critical Software (NASA/TM - 2015 - 218804, September 2015), https://core.ac.uk/reader/42705578.

2Timothy J. Shimeall and Nancy G. Leveson, ‘An empirical comparison of software fault tolerance and fault elimination’ (1991) 17(2) Transactions on Software Engineering 173; P. B. Ladkin, ‘Opinion – taking software seriously’ (2005) 41(3) Journal of System Safety https://rvs-bi.de/publications/; Harold Thimbleby, Alexis Lewis and John Williams, ‘Making healthcare safer by understanding, designing and buying better IT’ (2015) 15(3) Clinical Medicine 258.

Summary

5.126 In summary, faults in software and errors relating to the design of software systems are exceedingly common.1 And while defects in hardware have been relatively rare,2 they are not unknown.3 Hardware is increasingly developed using high-level languages similar to those used for software. Furthermore, hardware is being released with firmware which may be reconfigured for other purposes. In addition, hardware faults can be introduced by the improper use or configuration of the software tools designed for developing hardware, which may themselves be error-prone. Like software errors, hardware errors can be exploited to cause security failures.4

1Richard Cook, ‘How complex systems fail’, Cognitive Technologies Laboratory, University of Chicago (January 2002), https://www.researchgate.net/publication/228797158_How_complex_systems_fail/link/5caf748a299bf120975f697e/download; L. Strigini, ‘Fault tolerance against design faults’, in Hassan B. Diab and Albert Y. Zomaya (eds) Dependable Computing Systems: Paradigms, Performance Issues, and Applications (John Wiley & Sons 2005), 213–241.

2Such as the Pentium FDIV or ‘floating point’ error (strictly speaking, this was a software fault), although Intel could not fix the error other than by issuing a replacement, https://en.wikipedia.org/wiki/Pentium_FDIV_bug. Professor Thomas R. Nicely was the first to publicize this fault: Partridge, The Seductive Computer, 98, fn 8.

3Most complex integrated circuits in wide use will have published lists of ‘errata’ – for example, Intel publishes regular updates online.

4For an example, see Apostolos P. Fournaris, Lidia Pocero Fraile and Odysseas Koufopavlou, ‘Exploiting hardware vulnerabilities to attack embedded system devices: a survey of potent microarchitectural attacks’ (2017) 6 electronics, 52; Lucian Cojocar, Kaveh Razavi, Cristiano Giuffrida and Herbert Bos, ‘Exploiting correcting codes: on the effectiveness of ECC memory against Rowhammer’, in 2019 IEEE Symposium on Security and Privacy (SP), Volume 1 (IEEE 2019), 279–295.

5.127 Every part of a program is different, and must be independently correct. In the case of physical machines, there are two important differences: things are almost always continuous, and after a time the system is back where it started. When they are not continuous, problems always occur. For example, a wheel turns, and once it has turned, it is (notwithstanding wear and tear) likely to be able to turn again. Each time it turns, it gets back to an indistinguishable state. This is called a symmetry. Symmetries are very general ideas. For example, if one moves a cup of coffee a foot to the left, it stays the same and works exactly as before. This is because the world we live in has translational symmetry – everything is the same if it is moved. Wheels have rotational symmetry, and so on. This means that almost all of the design decisions in mechanical devices ‘collapse’ because of symmetries, and there is not the exponential growth of cases that happens in software. On the other hand, no part of a software program is the same as any other part. Indeed, if it were, one would ask why it was so inefficiently designed. Thus there are no symmetries in software to amplify the ‘how it works’ thinking that so readily simplifies physical design. The other advantage of physical systems is that where a response is expected to be continuous, it can be verified. Where continuity holds, interpolation tells us what the behaviour for any input will be. Digital systems do not have this helpful property.

5.128 In particular, it might be obvious that the behaviour of a stopwatch used by a policeman is the ‘same’ as the behaviour of the ‘same’ stopwatch presented in court as evidence, or in the laboratory where it was tested. Thanks to symmetries, moving a watch from the roadside to the laboratory does not change it. There is no symmetry to justify software adduced in court behaving as it did anywhere else. Software is not constrained, as any physical device is, to work in the universe with all its symmetries. Software does not obey any of them, and thanks to human error (known and unknown) in its design, its behaviour cannot be taken for granted.1

1I owe this discussion to Professor Thimbleby.
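The absence of continuity can be illustrated with a deliberately contrived example. The function below is hypothetical and does not come from any real system; it simply shows why testing inputs either side of a value, and interpolating between them, tells us nothing about how software behaves at the untested point.

```python
# A contrived illustration of discrete (digital) complexity: the function is
# correct for every input except one, where a buried faulty branch misbehaves.

def scaled(value: int) -> int:
    """Intended behaviour: return value * 2 for every input."""
    if value == 4096:           # a single faulty branch (hypothetical)
        return value * 2 + 1    # wrong for exactly one input
    return value * 2

print(scaled(4095))  # 8190 - correct
print(scaled(4097))  # 8194 - correct
print(scaled(4096))  # 8193 - wrong, and not predictable from its neighbours

# A continuous physical system that behaved correctly at 4095 and 4097 could,
# by interpolation, be relied upon to behave correctly at 4096. Software cannot.
```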

5.129 Software will continue to be unreliable. By providing a general presumption of reliability to software, the law acts to reinforce the attitude of the software industry that the effects of poor-quality work remain the problem of the end user. In many circumstances, because the user can himself cause errors, the industry may seek to pin the blame on the user, further obfuscating the true origin and source of the errors.1 For these reasons, it is rare for a customer to take legal action against the software supplier, and rarer still for such an action to succeed.2

1The various pressures are illustrated in Hechler, ‘Lost in translation?’. David Hechler is the executive editor of Corporate Counsel magazine, and the American Society of Business Publication Editors awarded him the 2014 Stephen Barr Award for this article.

2For example, see the English cases of St Albans City and District Council v International Computers Limited [1996] 4 All ER 481, [1996] 7 WLUK 443, [1997–98] Info TLR 58, [1997] FSR 251, (1996) 15 Tr LR 444, [1998] Masons CLR Rep 98, (1997) 20(2) IPD 20020, Times, 14 August 1996, [1996] CLY 1218 and Kingsway Hall Hotel Ltd v Red Sky IT (Hounslow) Ltd [2010] EWHC 965 (TCC), [2010] 5 WLUK 106, (2010) 26 Const LJ 542, [2011] CLY 2777; Alison White, ‘Caveat vendor? A review of the Court of Appeal decision in St Albans City and District Council v International Computers Limited’, Commentary 1997 (3) The Journal of Information, Law and Technology JILT, https://warwick.ac.uk/fac/soc/law/elj/jilt/1997_3/white/#Salvage. Elizabeth MacDonald considered the position in contract, giving a number of examples in her article ‘Bugs and breaches’ (2005) 13(1) Intl J L & Info Tech 118. National Air Traffic Services initiated action against Electronic Data Systems Ltd, although the outcome is not certain. For an appeal against an application to amend the reply and defence to counterclaim, see Electronic Data Systems Ltd v National Air Traffic Services [2002] EWCA Civ 13, [2002] 1 WLUK 128 – Professor Ladkin indicated that the software development could fail, for which see Memorandum by Professor Peter B. Ladkin (ATC 20) submitted to the Select Committee on Environment, Transport and Regional Affairs Fourth Report (ordered by the House of Commons to be printed 27 March 1998), http://www.publications.parliament.uk/pa/cm199798/cmselect/cmenvtra/360-e/36082.htm.

5.130 This discussion apart, the central issue for lawyers is dealing with the presumption that a computer is working properly. The following summary of the problems of software by Professor Partridge helps to remind us of the landscape:

IT systems are everywhere, and will continue to infiltrate the lives of all of us.

We cannot easily check that an IT system is computing correctly.

IT systems all fail: sometimes immediately and spectacularly, sometimes unobtrusively just once in a while, and sometimes in any combination of these two extremes.

IT-system failures vary from production of blatantly incorrect results to failure to produce a desired result.

The interplay of a variety of causes means that all large IT systems are unmanageably complex.

IT-system complexity is discrete complexity rather than complexity based on continua.

If, by chance (combined with exemplary practice and much effort), an IT system is constructed with no possibility of failure behaviour, we can never know this.1

1Partridge, The Seductive Computer, 9.

5.131 This poses a question for lawyers, experts and the courts: how the reliability of software should be reviewed in a court of law.

Challenging ‘reliability’

5.132 When seeking to challenge the underlying software of a computer or computer-like device,1 lawyers frequently have great difficulty in overcoming the presumption that a machine is working properly, although general assertions about the failure of software code are often made without providing any foundation for the allegations. This problem is compounded when a party refuses to deliver up relevant evidence, usually citing confidentiality as the reason for the refusal, and relying, directly or indirectly,2 on the presumption that a computer is ‘reliable’. In such circumstances, it is difficult to convince a judge to order the disclosure of relevant data.

1Including machines controlled by software, often called ‘robots’. For examples of people killed by machines controlled by software, see the following illustrative articles: Stephen S. Wu, Summary of Selected Robotics Liability Cases (19 October 2010), https://ftp.documation.com:8443/references/ABA10a/PDfs/2_5.pdf; Woodrow Barfield, ‘Liability for autonomous and artificially intelligent robots’ (2018) 9 Paladyn, J Behav Robot 193; Emilie C. Schwarza, ‘Human vs. machine: a framework of responsibilities and duties of transnational corporations for respecting human rights in the use of artificial intelligence’ (2019) 58 Colum J Transnat’l L 232. Robert N. Williams appears to have been the first person to be killed by a robot machine that was the subject of legal proceedings: Williams v Litton Systems, Inc., 422 Mich. 796 (1985); Williams v Litton Systems, Inc., 164 Mich.App. 195, 416 N.W.2d 704 (1987); Williams v Unit Handling Systems Div. of Litton Systems, Inc., 433 Mich. 755, 449 N.W.2d 669 (1989), and in 1987 seven-year-old Barton Griffin, the driver’s grandson, became the first person to be killed because of a defect in a programmable read-only memory chip installed in a 2500 series Chevrolet pickup truck; the defect caused the vehicle to stall and another vehicle then struck the pickup truck, killing him: General Motors Corporation v Johnston, 592 So.2d 1054 (1992).

2The use of the word ‘robust’ is one such device, for which see Ladkin, ‘Robustness of software’.

5.133 Yet, paradoxically, it is a well-known fact in the industry that software could hardly be said to be ‘reliable’, as noted by Steyn J in Eurodynamic Systems Plc v General Automation Ltd:

The expert evidence convincingly showed that it is regarded as acceptable practice to supply computer programmes (including system software) that contain errors and bugs. The basis of the practice is that, pursuant to his support obligation (free or chargeable as the case may be), the supplier will correct errors and bugs that prevent the product from being properly used.1

1(6 September 1988, not reported), QBD, 1983 D 2804 at [5.a]; also see CL & P 1988, 5(2), 8.

5.134 Professor Matt Blaze reinforces this view:

It is a regrettable (and yet time-tested) paradox that our digital systems have largely become more vulnerable over time, even as almost every other aspect of the technology has (often wildly) improved.

…

Modern digital systems are so vulnerable for a simple reason: computer science does not yet know how to build complex, large-scale software that has reliably correct behaviour.1 This problem has been known, and has been a central focus of computing research, since the dawn of programmable computing. As new technology allows us to build larger and more complex systems (and to connect them together over the internet), the problem of software correctness becomes exponentially more difficult.2 [Footnote 2 is at this point, and is reproduced below.]

Footnote 2:

That is, the number of software defects in a system typically increases at a rate far greater than the amount of code added to it. So adding new features to a system that makes it twice as large generally has the effect of making [it] far more than twice as vulnerable. This is because each new software component or feature operates not just in isolation, but potentially interacts with everything else in the system, sometimes in unexpected ways that can be exploited. Therefore, smaller and simpler systems are almost always more secure and reliable, and best practices in security favor systems [that have] the most limited functionality possible.3

1It should be noted that computer scientists have invented many ways to achieve this, and some companies use these methods to prove mathematically that their systems cannot fail at runtime – but the software will be running on a computer with unreliable hardware, other firmware and software and user interfaces, which means that a program that is ‘right’ in itself can nevertheless, when interacting with the other components, lead to a lethal failure. Also, we need to be aware that what is being proved is not that the systems do what is desired, but that the systems meet a formal statement of the requirements. The original requirements cannot themselves be proved to be correct, nor can it be proved that the formal requirements meet the constraints of the real world. There are limits to what formal methods can do, and those limits are not widely acknowledged. B. Littlewood and L. Strigini, ‘Validation of ultrahigh dependability for software-based systems’ (1993) 36(11) Communications of the ACM 69, http://openaccess.city.ac.uk/1251/1/CACMnov93.pdf.

2It is not clear whether ‘exponentially’ means that the rate of growth is proportional to the amount present, or whether the word is used loosely to mean ‘growing rapidly’.

3Dr Matt Blaze, Testimony, ‘Encryption Technology and Potential US Policy Responses’ before the Subcommittee on Information Technology of the Committee on Oversight and Government Reform House of Representatives, 114 Congress 1st session, Wednesday, April 29, 2015 (Serial No. 114–143), https://www.govinfo.gov/content/pkg/CHRG-114hhrg25879/pdf/CHRG-114hhrg25879.pdf.
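One way to make the point in Blaze’s footnote concrete is to count potential pairwise interactions between components. The sketch below is a simplification: it ignores interactions involving three or more components (which grow faster still), and the component counts are arbitrary. It merely shows that the number of interactions to consider grows much faster than the number of components, whatever view one takes of the word ‘exponentially’ queried in footnote 2 above.

```python
# A minimal sketch of interaction growth: each new component potentially
# interacts with every existing component, so the number of possible
# pairwise interactions grows far faster than the code itself.

def pairwise_interactions(components: int) -> int:
    """Number of distinct pairs of components that might interact."""
    return components * (components - 1) // 2

for n in (10, 20, 40, 80):
    print(n, pairwise_interactions(n))
# 10 -> 45, 20 -> 190, 40 -> 780, 80 -> 3160: doubling the number of
# components roughly quadruples the pairs to consider, before any
# higher-order (three-way or more) interactions are counted.
```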

5.135 The late Professor Lawrence Bernstein and C. M. Yuhas also acknowledged this observation:

Software developers know that their systems can exhibit unexpected, strange behaviour, including crashes or hangs, when small operational differences are introduced.1 These may be the result of new data, execution of code in new sequences or exhaustion of some computer resource such as buffer space, memory, hash function overflow space or processor time.2

1This is a consequence of discrete complexity, or digital complexity.

2Lawrence Bernstein and C. M. Yuhas, ‘Design constraints that make software trustworthy’, IEEE Reliability Society 2008 Annual Technology Report, 3, https://rs.ieee.org/images/files/Publications/2008/2008-25.pdf; Ali Mili and Fairouz Tchier, Software Testing Concepts and Operations (John Wiley & Sons, Inc. 2015).

5.136 Finally, companies that write software code commonly include a contract term in the software licence that makes it clear that writers of software code are not perfect. Here is an example:

The Licensee acknowledges that software in general is not error free and agrees that the existence of such errors shall not constitute a breach of this Licence.

5.137 This section aims to provide a broad outline of the problems relating to computers and computer-like devices experienced by different industries, and to illustrate the importance of software and how there may be times when the output of a computer is not ‘reliable’ and therefore not to be trusted. Software code should be open to scrutiny, and should not necessarily share the benefit of a presumption of ‘reliability’ that is incapable of being effectively challenged.

5.138 One of the problems with understanding the role of the presumption is that people fail to distinguish software from computer systems. Computers are merely devices that are remarkable in that they can be turned to do many tasks rather than being limited to a single purpose. In order to perform a useful purpose, they must be instructed by software. A computer and its software together can be taken to form a system. No machine is ‘reliable’ or ‘unreliable’ in an absolute sense. Machines may be more or less reliable. The term ‘reliable’ in everyday use is an abbreviation of what in technical terms is ‘reliable enough for the intended purpose’. All machines have some probability of failing, so none is ‘reliable’ in the sense that one can rely on it without any doubt, while many are reliable enough (their probability of failing to perform correctly at any one use is small enough) to be worth using. The problem with using the word ‘reliable’, as though reliability were a binary quality, is that we risk taking it to mean ‘reliable enough’ without allowing for the fact that what is ‘enough’ depends on the use to which we put the machine, or rather its outputs. For instance, a machine may be reliable enough to be worthwhile in everyday use, and yet not reliable enough to use as evidence in a specific case. The speedometer in a motor car is reliable enough to use as an aid for driving at reasonable speed, because this level of reliability is sufficient for the purpose. In such circumstances, precision is not necessary. Compare this to instruments in an aircraft: the same level of reliability could be catastrophic. It is not a matter of whether or not the instrument is ‘reliable’, but of ‘how reliable’ it is.
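The point that ‘reliable’ is shorthand for ‘reliable enough for the intended purpose’ can be expressed numerically. The per-use failure probability in the sketch below is an invented figure used purely for illustration; the arithmetic shows how the same instrument can be acceptable for one purpose and wholly unacceptable for another.

```python
# A sketch of 'how reliable' rather than 'reliable or not'.
# The failure probability below is an invented figure for illustration only.

def prob_at_least_one_failure(p_per_use: float, uses: int) -> float:
    """Probability of at least one failure across a number of independent uses."""
    return 1.0 - (1.0 - p_per_use) ** uses

P_MISREAD = 1e-4   # assume a 1-in-10,000 chance of a misleading reading per use

# Over 1,000 journeys, a speedometer with this figure misleads at least once
# with probability of roughly 9.5 per cent - tolerable as an aid to driving.
print(prob_at_least_one_failure(P_MISREAD, 1_000))

# The identical figure applied to 1,000 flights of a safety-critical aircraft
# instrument would be wholly unacceptable: the device is unchanged, but it is
# no longer 'reliable enough' for the purpose to which it is being put.
```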

5.139 It follows that lay people are not aware of the inherent design faults, and trust their personal experience to reassure themselves that computers are ‘reliable’ machines. Yet lay users regularly experience problems with devices, which illustrates their failure to grasp that the ‘reliability’ of software code is impossible to guarantee.1

1David Harel, Computers Ltd. What They Really Can’t Do (Oxford University Press, 2003); see also Neumann, Computer Related Risks, and his website, which is continually updated: http://www.csl.sri.com/users/neumann/insiderisks.html; see also the list of software failures on the web site of Nachum Dershowitz, School of Computer Science, Tel Aviv University, http://www.cs.tau.ac.il/~nachumd/horror.html.

5.140 Lay people are not the only people to make this mistake. This is illustrated by the judicial claim that computers are ‘reliable’ because in current times their use is widespread. Villanueva JAD made just such an assertion without providing any evidence to sustain his claim that computers are ‘presumed reliable’ in the case of Hahnemann University Hospital v Dudnick:

Clearly, the climate of the use of computers in the mid-1990’s is substantially different from that of the 1970’s. In the 1970’s, computers were relatively new, were not universally used and had no established standard of reliability. Now, computers are universally used and accepted, have become part of everyday life and work and are presumed reliable.1

1292 N.J.Super. 11, 678 A.2d 266 (N.J.Super.A.D. 1996), 268. See Ivars Peterson, Fatal Defect: Chasing Killer Computer Bugs (Random House 1996) to demonstrate the opposite.

5.141 This observation by Villanueva JAD was made in the same year as the failure of the software that caused the Ariane 5 rocket to be destroyed shortly after take-off.

5.142 That computers are deemed to be ‘reliable’ because they are used more frequently now than when they were first developed is a poor substitute for a rigorous understanding of the nature of computers and their software.1 However, it is accepted that long-term use can be an important element of justified trust in a software system. This comes about because there might be a long history of valuable and seemingly error-free use, but also because the long-term user typically gets to know the idiosyncrasies of the system.

1The claim that software is ‘reliable’ has been comprehensively demonstrated to be incorrect: Ladkin and others, ‘The Law Commission presumption concerning the dependability of computer evidence’; Jackson, ‘An approach to the judicial evaluation of evidence from computers and computer systems’.

Aviation

5.143 Errors in aviation software can have disastrous, or near disastrous, consequences. They can be caused by something as simple as bad coding. By way of example, consider the F-22A Raptor advanced tactical fighter, which entered service with the US Air Force in 2005. In February 2007, 12 of these aircraft were flying from Hickam AFB in Hawaii to Kadena AB on Okinawa. All of the aircraft experienced simultaneous and total software failure in their navigational consoles when their longitude shifted from 180 degrees West to 180 degrees East as they crossed the international date line. The jets were accompanied by tanker planes, which meant the pilots in the tankers were able to guide the jets back to Hawaii. Major General Don Shepperd spoke about the problem on CNN on 24 February 2007. The relevant part of the transcript is set out below (a simplified sketch of this class of boundary error follows the transcript and the accompanying note):

Maj. Gen. Don Sheppard (ret.): … At the international date line, whoops, all systems dumped and when I say all systems, I mean all systems, their navigation, part of their communications, their fuel systems. They were – they could have been in real trouble. They were with their tankers. The tankers – they tried to reset their systems, couldn’t get them reset. The tankers brought them back to Hawaii. This could have been real serious. It certainly could have been real serious if the weather had been bad. It turned out OK. It was fixed in 48 hours. It was a computer glitch in the millions of lines of code, somebody made an error in a couple lines of the code and everything goes.

[...]

SHEPPERD: Absolutely. When you think of airplanes from the old days, with cables and that type of thing and direct connections between the sticks and the yolks [sic] and the controls, not that way anymore. Everything is by computer. When your computers go, your airplanes go. You have multiple systems. When they all dump at the same time, you can be in real trouble. Luckily this turned out OK.

John Roberts, CNN anchor: What would have happened General Shepperd if these brand-new $120 million F-22s had been going into battle?

SHEPPERD: You would have been in real trouble in the middle of combat. The good thing is that we found this out. Any time – before, you know, before we get into combat with an airplane like this. Any time you introduce a new airplane, you are going to find glitches and you are going to find things that go wrong. It happens in our civilian airliners. You just don’t hear much about it but these things absolutely happen. And luckily this time we found out about it before combat. We got it fixed with tiger teams in about 48 hours and the airplanes were flying again, completed their deployment. But this could have been real serious in combat.

ROBERTS: So basically you had these advanced air – not just superiority but air supremacy fighters that were in there, up there in the air, above the Pacific Ocean, not much more sophisticated than a little Cessna 152 only with a jet engine.

SHEPPERD: You got it. They are on a 12 to 15-hour flight from Hawaii to Okinawa, but all their systems dumped. They needed help. Had they gotten separated from their tankers or had the weather been bad, they had no attitude reference. They had no communications or navigation. They would have turned around and probably could have found the Hawaiian Islands. But if the weather had been bad on approach, there could have been real trouble. Again, you get refueling from your tankers. You don’t run – you don’t get yourself where you run out of fuel. You always have enough fuel and refueling nine, 10, 11, 12 times on a flight like this where you can get somewhere to land. But again, attitude reference and navigation are essential as is communication. In this case all of that was affected. It was a serious problem.1

1‘F-22 Squadron Shot Down by the International Date Line’, Defense Industry Daily, 1 March 2007, at http://www.defenseindustrydaily.com/f22-squadron-shot-down-by-the-international-date-line-03087/; Lewis Page, ‘US superfighter software glitch fixed’, The Register, 28 February 2007.
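The published accounts do not include the faulty code, so the following sketch is entirely hypothetical; it only illustrates the general class of boundary error described in the transcript, where longitude arithmetic that works everywhere else fails at the ±180 degree meridian.

```python
# A hypothetical sketch of a date-line (+/-180 degree) boundary error in
# longitude arithmetic. This is NOT the F-22 code, which has not been
# published; it only illustrates how a couple of lines can be correct for
# almost every input and still fail at an untested boundary.

def naive_lon_difference(current_lon: float, target_lon: float) -> float:
    """Signed east-west difference in degrees - wrong near the date line."""
    return target_lon - current_lon

def wrapped_lon_difference(current_lon: float, target_lon: float) -> float:
    """Wrap the difference into the range [-180, 180) degrees."""
    return (target_lon - current_lon + 180.0) % 360.0 - 180.0

# Flying eastwards from 179.9 degrees East to 179.9 degrees West is a small
# 0.2 degree hop across the date line, but the naive version reports a turn
# of nearly 360 degrees.
print(naive_lon_difference(179.9, -179.9))    # -359.8
print(wrapped_lon_difference(179.9, -179.9))  # approximately 0.2
```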

5.144 In practice, most commercially produced software will contain thousands of undetected defects.1

1For software defects generally, see Brooks, The Mythical Man-Month and a discussion by Professor Les Hatton substantiates the broad range quoted here: Some Notes on Software Failure (Addison-Wesley Professional 2001). See also Nindel-Edwards and Steinke, ‘Ethical issues in the software quality assurance function’, article 6.

5.145 In conventional flight control, the flight control commands from the cockpit are conveyed mechanically through steel cables or pushrods, often servo-assisted, to hydraulic actuators which then physically move the aerodynamic control surfaces on the wings and tailplane. In ‘fly-by-wire’, the flight control commands are converted to electrical signals transmitted by wires to the control surface actuators (in some cases in modern fly-by-wire aircraft the actuators may also be electric). Flight control is completely intermediated by software code, so a more accurate description would now be ‘fly-by-software-code’. Besides fly-by-wire, the autopilot and flight management systems of even conventionally controlled aircraft are software-based. The more reliable and functional the autopilot and flight management systems software have become, the more pilots have relied on them, even to the detriment of their piloting skills, as demonstrated by a number of accidents and ensuing loss of life. Accidents involving aircraft can exhibit a series of anomalous pilot–system interactions, and aviation regulations and investigators, with few exceptions, tend to assign the responsibility for the results of those interactions ultimately to the pilots.1 This is so even in circumstances where it is clear that the software code and the system design are so faulty that a human being is not able to respond correctly – or with sufficient speed. In the case of American Airlines Flight 965 near Cali, Colombia, on 20 December 1995,2 151 passengers and all of the cabin crew members died in the crash. In this case, a significant error occurred, as explained by Highsmith DJ:

American Airlines predicates its claims on Honeywell’s role as supplier of the Flight Management Computer (FMC) used on Flight 965 and Jeppesen’s role in furnishing the navigational database programmed into the FMC and the corresponding aviation charts. Without making any findings in this regard but simply reflecting the narrative contained in Judge Marcus’ summary judgment opinion, the Court notes that, on the approach to Cali, the pilots entered ‘R’ into the FMC, anticipating (based on the aviation charts) that this cipher corresponded to a beacon designated as ‘Rozo’. Instead, another beacon designated as ‘Romeo’ was activated. This resulted in a change of the aircraft’s heading to the east, over the Andes mountains. When the pilots became aware of the aircraft’s easterly swing, they turned back to the west, in the direction of the valley where the Cali airport is located. Sadly, since the aircraft had been descending during these directional changes, Flight 965 never made it back to the valley. It crashed into the side of a mountain.3

1Bill Palmer, Understanding Air France 447 (Print edition v1.05, 2013), 179 and Safety Alert for Operators, issued by the U.S. Department of Transportation, Federal Aviation Administration (SAFO 13002 1/4/13), https://www.faa.gov/other_visit/aviation_industry/airline_operators/airline_safety/safo/all_safos/media/2013/SAFO13002.pdf; Susan Carey, ‘American Airlines flight delays continue as pilot iPad app glitch is fixed’, Wall Street Journal, 29 April 2015, http://www.wsj.com/articles/american-airline-flight-delays-continue-as-pilot-ipad-app-glitch-is-fixed-1430335366; Alex Hern, ‘App fail on iPad grounds “a few dozen” American Airlines flights’, The Guardian, 29 April 2015, https://www.theguardian.com/technology/2015/apr/29/apple-ipad-fail-grounds-few-dozen-american-airline-flights.

2In Re Air Crash Near Cali, Colombia on December 20, 24 F.Supp.2d 1340 (1998).

3At 1342 (footnotes omitted).
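By way of illustration only, the sketch below models the class of mismatch described in the judgment: an abbreviated identifier shown on the paper chart resolves, in the database actually loaded, to a different beacon. The data and lookup logic are invented; they are not the Honeywell FMC or the Jeppesen database, whose internals are considerably more complex.

```python
# Invented waypoint database: identifier -> (name, latitude, longitude).
# The identifiers and coordinates are illustrative only.

waypoints = {
    "R":    ("ROMEO", 4.84, -74.08),   # in this invented data, a beacon near Bogota
    "ROZO": ("ROZO",  4.40, -76.38),   # the beacon the chart labelled 'R', near Cali
}

def resolve(entered: str):
    """Return whatever the loaded database holds for the identifier entered."""
    return waypoints.get(entered)

print(resolve("R"))     # ('ROMEO', ...) - not the beacon the crew expected from the chart
print(resolve("ROZO"))  # ('ROZO', ...)  - the intended waypoint needed its full name
```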

5.146 The critical importance of verifying the design of aviation software based on industry standards was noted in the Aviation Occurrence Investigation Final Report: In-flight upset 154 km west of Learmonth.1 In this case, a problem with the software controlling the aeroplane was the cause of the accident. In this investigation report, the authors cited text relating to software requirements from Software Considerations in Airborne Systems and Equipment Certification,2 produced by the Radio Technical Commission for Aeronautics:

DO-178A [now DO-178C] provided high-level guidance for the generation of software requirements, the verification that the resulting design met the requirements, and validation that the requirements were adequate. It also noted that for systems that performed certain critical and essential functions:

... it may not be possible to demonstrate an acceptably low level of software errors without the use of specific design techniques. These techniques, which may include monitoring, redundancy, functional partitioning or other concepts, will strongly influence the software development program, particularly the depth and quality of the verification and validation effort ...

NOTE: It is appreciated that, with the current state of knowledge, the software disciplines described in this document may not, in themselves, be sufficient to ensure that the overall system safety and reliability targets have been achieved. This is particularly true for certain critical systems such as full authority fly-by-wire. In such cases it is accepted that other measures, usually within the system, in addition to a high level of software discipline may be necessary to achieve these safety objectives and demonstrate that they have been met.3

1WA 7 October 2008 VH-QPA Airbus A330-303 (ATSB Transport Safety Report, AO-2008-070).

2(DO-178A, SC-152, issued on 22 March 1985 and up-dated regularly), http://www.rtca.org.

3At 2.3.5.

5.147 Perhaps it is not necessary to indicate that the Boeing 737 Max crashes that killed 346 people (189 on Lion Air Flight 610 and 157 on Ethiopian Airlines Flight 302), were, it appears, caused by a hardware–software interaction.1 What is pertinent is that the problems originated in design changes that were apparently small and presumed to be unlikely to make any significant difference to the system’s behaviour, and were intended to make the new system appear to the users like the old system.2

1Final Aircraft Accident Investigation Report, PT. Lion Mentari Airlines, Boeing 737-8 (MAX); PK-LQP 29 October 2018 (October 2019), http://knkt.dephub.go.id/knkt/ntsc_aviation/baru/2018%20-%20035%20-%20PK-LQP%20Final%20Report.pdf; Aircraft Accident Investigation Bureau Interim Report on Accident to the B737-8 (MAX) Registered ET-AVJ operated by Ethiopian Airlines on 10 March 2019 (AI-01/19 9 March 2020), https://reports.aviation-safety.net/2019/20190310-0_B38M_ET-AVJ_Interim.pdf; a number of internal Boeing documents about this have been released, with a significant number of derogatory comments about this issue made by employees: https://archive.org/details/boeingemailsocr; the US House Committee on Transportation & Infrastructure provides a list of resources dealing with their investigation at https://transportation.house.gov/committee-activity/boeing-737-max-investigation; Gregory Travis, ‘How the Boeing 737 Max disaster looks to a software developer’, IEEE Spectrum, 18 April 2019, https://spectrum.ieee.org/aerospace/aviation/how-the-boeing-737-max-disaster-looks-to-a-software-developer; Final Committee Report The Design, Development & Certification of the Boeing 737 MAX (The House Committee on Transportation & Infrastructure, September 2020), https://transportation.house.gov/imo/media/doc/2020.09.15%20FINAL%20737%20MAX%20Report%20for%20Public%20Release.pdf.

2Joint Authorities Technical Review, Boeing 737 MAX Flight Control System Observations, Findings, and Recommendations (11 October 2019), VI (item 4), XI (item 9), https://www.faa.gov/news/media/attachments/Final_JATR_Submittal_to_FAA_Oct_2019.pdf.

Financial products

5.148 In August 2006, the rating agency Moody’s gave constant proportion debt obligations (CPDOs) an AAA rating, implying that an investment in a CPDO was close to being free of risk.1 In comparison, a competing rating agency, Fitch, could not understand why such a high rating was given to such ‘investments’, because its own models put CPDOs at almost the grade of ‘junk’.2 It transpired that the software used by Moody’s for the purpose of rating CPDOs had a number of faults. A fault found in early 2007 meant that, once it was corrected, the model no longer produced the AAA rating and indicated a greater likelihood of default. The rating committee failed to disclose the error to investors or clients, and although the error was eventually corrected, other changes were made to the code to ensure the AAA rating continued to be assigned.3 A subsequent external investigation by the law firm Sullivan & Cromwell established that members of staff had engaged in conduct contrary to Moody’s Code of Professional Conduct.4 Moody’s subsequently received a ‘Wells Notice’5 from the Securities and Exchange Commission (SEC) on 18 March 2010.6 The Division of Enforcement of the SEC later issued a Report of Investigation into the matter.7 In a section of the Report, there was an examination of the attitude of the people responsible for dealing with the software error. It is revealing, and it merits setting out in full:

B. Rating Committee Conduct

MIS subsequently held several internal rating committee meetings in France and the United Kingdom to address the coding error. MIS corrected the coding error on February 12, 2007, but made no changes to the outstanding credit ratings for CPDO notes at that time. Internal e-mails show that committee members were concerned about the impact on MIS’s reputation if it revealed an error in the rating model. A January 24, 2007, e-mail from a rating committee member to the Team Managing Director chairing the committee stated:

In this particular case we seem to face an important reputation risk issue. To be fully honest this latter issue is so important that I would feel inclined at this stage to minimize ratings impact and accept unstressed parameters that are within possible ranges rather than even allow for the possibility of a hint that the model has a bug.

On April 27, 2007, after additional analysis, the rating committee voted not to downgrade the affected credit ratings for the CPDO notes. The committee members felt that because the CPDO notes were generally performing well there would be no ostensible justification for downgrading the credit ratings, absent announcing the coding error. In declining to downgrade the credit ratings, the committee considered the following inappropriate non-credit related factors: (i) that downgrades could negatively affect Moody’s reputation in light of ongoing negative media focus in Europe on Moody’s Joint Default Analysis; (ii) that downgrades could impact investors who relied on the original ratings; and (iii) the desire not to validate the criticisms of Moody’s ratings of CPDOs that had been made by a competitor and covered in the local media. The committee was comprised of senior level staff, including two Team Managing Directors, two Vice President-Senior Credit Officers, and a Vice President-Senior Analyst.

1For the broader picture, see Charles W. Calomiris and Stephen H. Haber, Fragile by Design: The Political Origins of Banking Crisis and Scarce Credit (Princeton University Press 2014), 266–269.

2The same scepticism was expressed by Richard Beales, Saskia Scholtes and Gillian Tett with Paul J. Davies, ‘Failing grades? Why regulators fear credit rating agencies may be out of their depth’ Financial Times, 17 May 2007, 13.

3This was revealed by Sam Jones, Gillian Tett and Paul J. Davies, ‘Moody’s error gave top ratings to debt products’ Financial Times, 20 May 2008.

4Sam Jones, ‘When junk was gold’ FT Weekend, 18/19 October 2008, 16–22.

5A ‘Wells Notice’ is a letter sent by a securities regulator to a prospective respondent, notifying him of the substance of charges that the regulator intends to bring against the respondent, and affording the respondent with the opportunity to submit a written statement to the ultimate decision maker.

6Phil Wahba, ‘UPDATE 2-Moody’s says got Wells Notice from SEC’, Reuters, 7 May 2010.

7Release No. 62802, 31 August 2010, https://www.sec.gov/litigation/investreport/34-62802.htm.

5.149 Because the rating committee met in France and the UK and not in the US, the SEC declined to take any further action, ‘[b]‌ecause of uncertainty regarding a jurisdictional nexus to the United States in this matter’.

5.150 Although the SEC declined to take action in this case, it did take action against AXA Rosenberg Group LLC, AXA Rosenberg Investment Management LLC and Barr Rosenberg Research Center LLC. In this instance, an employee discovered an error in the computer code of a quantitative investment model used to manage client portfolios. The employee brought the matter to the attention of senior management, but was told to keep quiet about the error and not to inform others about it. The error adversely affected 608 of 1,421 client portfolios managed by AXA Rosenberg Investment Management and caused US$216,806,864 in losses. Cease-and-desist proceedings were instituted and the respondents were jointly and severally ordered to pay a civil money penalty in the amount of US$25 million to the US Treasury.1

1The order is available at https://www.sec.gov/litigation/admin/2011/33-9181.pdf.

5.151 Another example that might be considered mundane is that of the software systems used by stockbrokers. Stockbrokers used to be regulated by the Financial Services Authority (FSA) (now by the Financial Conduct Authority), and were required to conduct their business in accordance with relevant legislation and the rules laid out by the FSA. Failure to follow the rules could lead the FSA to take disciplinary action against the firm. In the case of SAM Business Systems Limited v Hedley and Company (sued as a firm),1 the partners of Hedley used to handle their stockbroking business with a system known as ANTAR, but late in 1999 they decided it might not work after the century date change, so they decided to buy a new product from SAM, a small software company whose only product was an item of software known as InterSet. SAM claimed this product was a ready-made package of software modules written by SAM for use by stockbrokers and others (such as banks) dealing in stocks and shares to administer their systems. Hedley agreed to buy the new system, but immediately after it went live, serious problems were apparent, many of which were fixed, some speedily. (The word ‘fix’ is the telling word here: a local fix within a large and complex piece of software often generates problems elsewhere.) Hedley continued to use InterSet, but problems persisted. Eventually, they decided to find another product for their purposes. In his judgment, Judge Bowsher QC discussed the issue of defects in software:

The point has frequently been made during the trial that InterSet works well elsewhere (and I have received evidence from stockbrokers, Hoodless Brennan to that effect) and accordingly it is said, if it did not work for Hedley’s there must be something wrong with Hedley’s method of working. That line of argument has prompted me to ask, (a) if it is a tried and tested system, why when supplied to Hedley’s did it have admitted bugs? (b) what is the difference between a bug and a defect?2

1[2002] EWHC 2733 (TCC), [2003] 1 All ER (Comm) 465, [2002] 12 WLUK 550, [2003] Masons CLR 11, (2003) 147 SJLB 57, [2003] CLY 3616.

2[2002] EWHC 2733 (TCC) at [19].

5.152 The problems encountered with this software, which was claimed to have been written for the specific purpose for which it was supplied, merit setting out in full:

To complete the history, I must mention a document produced at my request as Exhibit C2. During the evidence of Mr. Whitehouse, I asked for a copy of a timesheet to which he had referred. That is a timesheet of ‘maintenance activity’ for which no charge was made. That document had not been disclosed until I asked for it. It is a document of 10 pages. I have not counted each item, but there are about 35 items on each of the first 9 pages and 16 on the last page. According to the claimants, the hours worked amount to 785.25. The period of time covered by the document is from 4 January 2000 to 7 February, 2001. The majority of those items appear to be efforts to fix defects. The fact that no charge was made suggests that all items fall into that category. I am not going to go through all of that document, but I will take one example. On 12 January, 2001, there is an entry, ‘Analysing the problems with Hedley contract report … problem actually with contract form not the report’. On 15 January a temporary fix was prepared. On 15, 16 and 17 January over 17 hours are recorded working on this problem. Then on 17 January there is another entry, ‘Attempting to find the reason for the intermittent bad contracts. Not found yet’. On 18 January, 2001, there is an entry, ‘Attempting to find the reason for the intermittent bad contracts. The reason appears to be conflicting requirements of procedures. Needs deeper understanding of form’. There were then further entries for modifications to put the problem right on 19, 23, 24, 25 and 26 January, 2001. More work was done on the same problem on 5, 7, and 9 February, 2001. On 5 February, 2001, changes were made, ‘To prevent contracts being saved where the values do not add up’. Through February, 2001 there was a series of calls to deal with a problem with split deals commission. In mid April, 2001 there was a problem with trial balances. It is quite clear from that document, produced only under pressure during the trial, as well as from all the other evidence to which I have referred, that InterSet as delivered to Hedley’s was never in satisfactory working order.1

1[2002] EWHC 2733 (TCC) at [128].

5.153 Two experts were appointed to give evidence in this case, and they signed an agreement which was, in fact, a schedule of defects alleged by Hedley with comments on each defect from SAM. This schedule of faults ran to 34 pages. Judge Bowsher QC offered some pertinent comments in relation to the attitude of the software supplier in this case:

SAM, like some others in the computer industry seem to be set in the mindset that when there is a ‘bug’ the customer must pay for putting it right. Bugs in computer programmes are still inevitable, but they are defects and it is the supplier who has the responsibility for putting them right at the supplier’s expense.1

1[2002] EWHC 2733 (TCC) at [165].

Motor vehicles

5.154 Software can be manipulated to give whatever reading the writer wishes. Because software is presumed to be ‘reliable’, software that deliberately gives false data is also presumed to be ‘reliable’. It is well known that traffic lights are now generally controlled by software code across a network, and the code can be written in such a way as to break the law. Stefano Arrighetti, an engineering student from Genoa, is reported to have developed the T-Redspeed traffic light system in Italy. The traffic lights were apparently programmed to remain on amber for less than the time set out in regulations before turning to red.1

1Peter Popham, ‘Smart traffic lights rigged to trap drivers’ The Independent (30 January 2009); Jacqui Cheng, ‘Italian red-light cameras rigged with shorter yellow lights’, Ars Technica (2 March 2009), https://arstechnica.com/tech-policy/2009/02/italian-red-light-cameras-rigged-with-shorter-yellow-lights/.
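
The mechanism alleged is simple enough to sketch. The following fragment is illustrative only and is not based on the T-Redspeed code, which has not been published; the parameter names and the figure used for the legal minimum are assumptions. The point is that the lawfulness of the installation can turn on a single configuration value, while the rest of the controller behaves identically.

```python
# Illustrative sketch only: a hypothetical signal controller in which the
# amber duration is a single configuration value. This is not the T-Redspeed
# code, which has not been published; names and figures are invented.

LEGAL_MINIMUM_AMBER_SECONDS = 4.0   # assumed regulatory minimum, for illustration only

def describe_cycle(green: float, amber: float, red: float) -> None:
    """Print one cycle of the signal. Nothing in the controller enforces the
    legal minimum: whoever sets the amber parameter decides whether the law
    is kept, and the rest of the system behaves identically either way."""
    compliant = amber >= LEGAL_MINIMUM_AMBER_SECONDS
    print(f"green {green}s, amber {amber}s, red {red}s -> "
          f"{'compliant' if compliant else 'amber below assumed legal minimum'}")

describe_cycle(green=30.0, amber=5.0, red=25.0)   # a lawful configuration
describe_cycle(green=30.0, amber=3.0, red=25.0)   # the same code, one number changed
```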

5.155 Incidents of ‘sudden unintended acceleration’, in which modern vehicles with electronic controls accelerate unexpectedly and without driver input, raise the issue of the reliability of complex electronic vehicle systems.1 Consider the prosecution of Ann Diggles, aged 82, who was found not guilty at Preston Crown Court (R v Ann Diggles T20157203 before Mr Justice Fraser) of causing death by dangerous driving and death by careless driving when her Nissan Qashqai hit and killed Julie Dean, aged 53, while Mrs Diggles was attempting to park.2 The prosecution’s case was that the driving of Mrs Diggles caused the accident. The prosecution relied on the evidence from the motor car manufacturer, as reported by the BBC:3

Takuma Nakamura, who is responsible for engine control systems development at Nissan, was asked by prosecutor Richard Archer: ‘Is it possible, in your opinion, for a malfunction in an electronic throttle to cause sudden acceleration of the vehicle?’

Mr Nakamura replied: ‘I think that’s impossible’.4

1For a general outline of the case law in the USA, see Maria N. Maccone, ‘Litigation concerning sudden unintended acceleration’ 132 Am Jur Trials 305 (Originally published in 2013) (December 2020 Update); see also Philip Koopman, ‘Practical experience report: automotive safety practices vs. accepted principles’ SAFECOMP 2018, http://safeautonomy.blogspot.com/2018/09/automotive-safety-practices-vs-accepted.html; Professor Koopman maintains a list of potentially deadly automotive software defects at https://betterembsw.blogspot.com/2018/09/potentially-deadly-automotive-software.html.

2‘Driver cleared over fatal Nissan Qashqai crash’, BBC News, 7 February 2017, http://www.bbc.co.uk/news/uk-england-lancashire-38897681; ‘Nissan cars “sped” without accelerator use, court hears’, BBC News, 6 February 2017, http://www.bbc.co.uk/news/uk-england-lancashire-38885809; ‘Driver who killed woman denies mistaking accelerator for brake’, BBC News, 2 February 2017, http://www.bbc.co.uk/news/uk-england-lancashire-38846896.

3We only have reports from the media to rely on.

4‘Nissan boss denies malfunction caused fatal crash’, BBC News, 31 January 2017, http://www.bbc.co.uk/news/uk-england-lancashire-38814890.

5.156 The expert witness for the defence was Dr Antony F. Anderson CEng FIEE. Dr Anderson pointed out the following:

A mechanical inspection of the vehicle was carried out. A Nissan garage, on the instruction of the police, downloaded diagnostic trouble codes. The police constable who witnessed the diagnostic testing took a screen shot with his camera that showed three trouble codes. Two of these were past codes of no significance, but one was a current U1000 trouble code. The U1000 code, as I understand it, signifies that there had been a CAN Bus malfunction lasting more than 2 seconds sometime in the ignition cycle during which the incident occurred. Mr Nakamura, the senior engineering manager from Nissan Japan, who was sent over to give evidence in the trial, implied that the trouble code was of no significance.1

1Email communication with the author.

5.157 In addition to the evidence from Dr Anderson, two other women came forward at a late stage in the trial to give evidence that they had had identical experiences. The evidence was that Mrs Diggles and the other two witnesses had their vehicles fully serviced in line with the manufacturer’s recommendations.1 The evidence put before the members of the jury is not readily available, so it can only be observed that deaths and injuries appear to occur as a result of software failure. It would be of interest to know how the police and prosecution assessed the evidence, including the complex interactions between the software code and the mechanical and electronic systems.

1Gabriella Swerling, ‘“Runaway car” driver cleared over road death’ The Times (8 February 2017), 8.

5.158 Another prominent example concerns Toyota and Lexus motor vehicles, in incidents some of which involved the deaths of drivers and their passengers.1 Michael Barr, in giving expert evidence for the plaintiffs in the case of Bookout v Toyota Motor Corporation,2 stated that:

A. The Toyota’s design actually they have an abysmal design, not just unreasonable in my view, but I use the word abysmal. This was actually the first chapter of my report I wrote because I couldn’t believe what I was seeing.

Toyota has a watchdog supervisor design that is incapable of ever detecting the death of a major task. That’s its whole job. It doesn’t do it. It’s not designed to do it.

It also, the thing it does in Toyota’s design is lookout for CPU overload, and it doesn’t even do that right. CPU overload is when there’s too much work in a burst, a period of time to do all the tasks. If that happens for too long, the car can become dangerous because tasks not getting to use the CPU is like temporarily tasks dying.

And in Toyota’s watchdog you can have any overload going up to one and a half seconds, which at 60 miles an hour I calculated is about the length of a football field, you have any vehicle malfunction for up to a football field in length that’s explained only because this watchdog design it [sic] bad, and because the processor is overloaded momentarily. And that should have been also a job of that watchdog supervisor. And that is one they tried to implement and they don’t do it well.

They also made a classic blunder, one that’s taught by professor like at Dr. Koopman3 to first year students in his imbedded systems class, which is, you don’t dedicate a hardware timer on the main CPU to periodically kick the hardware on the watchdog, because that will keep functioning even though vast portions of the software and the tasks are not rubbing because these interrupts are a higher priority than the tasks.

And so, that is a design that you – and I have spoken about that at many conferences, not doing it that way. And they do that.4

1There are other examples. The members of a jury concluded that a cruise control malfunctioned on a Ford Aerostar vehicle in Cole v Ford Motor Company, 136 Or.App. 45, 900 P.2d 1059 (1995); for another Ford Aerostar vehicle case, in which the members of a jury concluded that a cruise control malfunctioned, see Jarvis v Ford Motor Company, 283 F.3d 33, 51 Fed.R.Serv.3d 1310 (2d Cir. 2002). Examples of conflicting evidence that is, on its face, inadequate to determine causation include: Ford Motor Company v Stimpson, 115 So. 3d 401 (Fla. 5th DCA 2013); Belville v Ford Motor Company, 919 F.3d 224 (2019) upholding the summary judgment decision and exclusion of expert testimony of plaintiffs in Johnson v Ford Motor Company, 310 F.Supp.3d 699 (2018) (consumers failed to establish that unintended acceleration of their vehicles was the result of the manufacturer’s electronic throttle control system, granting summary judgment in favour of the defendant); Kesse v Ford Motor Company, 2020 WL 832363. See Buck v Ford Motor Company, 526 Fed.Appx. 603 (2013) where the plaintiff failed to produce adequate expert evidence and reliance on a report regarding unintended acceleration from the United Kingdom was not admitted into evidence.

2The trial was held in the District Court of Oklahoma County State of Oklahoma before the Hon Patricia G. Parrish, District Judge; see also In re Toyota Motor Corp. Unintended Acceleration Marketing, Sales Practices, and Products Liability Litigation, 978 F.Supp.2d 1053, 92 Fed. R. Evid. Serv. 714, Prod.Liab.Rep. (CCH) P 19,244 (summary judgment granted regarding the claim by the plaintiff of a manufacturing defect and negligence, denied motion for summary judgment as to the design defect claim and the failure to warn claim); transcript (not proofread) of the trial 14 October 2013 (Reported by Karen Twyford, RPR): examination and cross examination of Michael Barr, http://www.safetyresearch.net/Library/Bookout_v_Toyota_Barr_REDACTED.pdf.

3Dr Koopman is an Associate Professor at Carnegie Mellon University, Department of Electrical and Computer Engineering.

4Case No. CJ-2008-7969, at 70–71. Professor Philip Koopman also gave evidence in this case, and his assessment of the problem was similar to that of Mr Barr, for which see https://www.usna.edu/AcResearch/_files/documents/NASEC/2016/CYBER%20-%20Toyota%20Unintended%20Acceleration.pdf; https://users.ece.cmu.edu/~koopman/toyota/koopman-09-18-2014_toyota_slides.pdf.
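
The ‘classic blunder’ Mr Barr describes can be illustrated in outline. The following sketch is not Toyota’s code, which is not publicly available; the task names and the two servicing strategies are invented for the purpose of the comparison. It contrasts a watchdog that is kicked by a periodic timer regardless of whether the monitored tasks are alive with one that is kicked only when every monitored task has demonstrated progress.

```python
# Illustrative sketch, not Toyota's code: two ways of deciding whether to
# service ('kick') a watchdog. All names are invented.

class Task:
    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        self.alive = True

    def run_once(self):
        if self.alive:              # a dead task never increments its heartbeat
            self.heartbeat += 1

def kick_from_timer_interrupt():
    """The flawed pattern: a periodic timer interrupt kicks the watchdog
    unconditionally, so the watchdog is satisfied even if every task has died."""
    return True

def kick_from_task_heartbeats(tasks, last_seen):
    """A sounder pattern: the watchdog is kicked only if every monitored task
    has made progress since the last check."""
    ok = all(t.heartbeat > last_seen[t.name] for t in tasks)
    for t in tasks:
        last_seen[t.name] = t.heartbeat
    return ok

tasks = [Task("throttle_control"), Task("brake_monitor")]
last_seen = {t.name: 0 for t in tasks}

tasks[0].alive = False              # simulate the death of a major task
for t in tasks:
    t.run_once()

print("timer-kicked watchdog satisfied:    ", kick_from_timer_interrupt())                  # True: failure hidden
print("heartbeat-kicked watchdog satisfied:", kick_from_task_heartbeats(tasks, last_seen))  # False: failure detected
```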

5.159 Software in vehicles can also be manipulated to give the false assurance of regulatory compliance. In September 2015, the United States Environmental Protection Agency issued a notice of violation of the Clean Air Act to Volkswagen AG, Audi AG and Volkswagen Group of America, Inc.1 The notice alleged that four-cylinder Volkswagen and Audi diesel cars manufactured in the years 2009–2015 included software that circumvented the emissions standards for some air pollutants. The California Air Resources Board had issued a separate In-Use Compliance letter to Volkswagen,2 and the two agencies initiated investigations based on the allegations. A software algorithm on certain Volkswagen vehicles switched the full emissions controls on only when the car detected that it was undergoing official emissions testing.3 The effectiveness of the emission control devices was thus greatly reduced during normal driving. This meant that the motor vehicles met the emissions standards in the laboratory or testing station, but during normal operation they emitted nitrogen oxides, or NOx, at up to 40 times the standard. Over a one-year period of operation, the emission of this extra pollutant by Volkswagen vehicles was estimated to have resulted in 5 to 50 premature deaths.4 The Department of Justice subsequently filed a complaint for alleged violations of the Clean Air Act.5

1For details, see https://www.epa.gov/vw/learn-about-volkswagen-violations.

2Letter from the Air Resources Board to Volkswagen AG, Audi AG, and Volkswagen Group of America, Inc dated 18 September 2015 reference number IUC.2015-007 (this has been archived and is no longer available on the Internet).

3It has been identified as the EDC17 diesel ECU manufactured by Bosch, for which see Moritz Contag, Guo Li, Andre Pawlowski, Felix Domke, Kirill Levchenko, Thorsten Holz and Stefan Savage, ‘How they did it: an analysis of emission defeat devices in modern automobiles’, 2017 IEEE Symposium on Security and Privacy (Institute of Electrical and Electronics Engineers 2017), 231–250. The authors indicate they found strong evidence that the defeat device was created by Bosch and enabled by Volkswagen. They also observed that the same device was installed in the Fiat 500X.

4Lifang Hou, Kai Zhang, Moira A. Luthin and Andrea A. Baccarelli, ‘Public health impact and economic costs of Volkswagen’s lack of compliance with the United States’ emission standards’ (2016) 13(9) International Journal of Environmental Research and Public Health 891; Gregory J. Thompson, Daniel K. Carder, Marc C. Besch, Arvind Thiruvengadam and Hemanth K. Kappanna, Final Report: In-Use Emissions Testing of Light-Duty Diesel Vehicles in the United States (Center for Alternative Fuels, Engines & Emissions, Department of Mechanical & Aerospace Engineering, West Virginia University), 15 May 2014 http://www.eenews.net/assets/2015/09/21/document_cw_02.pdf.

5Press release: ‘United States files complaint against Volkswagen, Audi and Porsche for alleged Clean Air Act violations’, Monday, 4 January 2016, https://www.justice.gov/opa/pr/united-states-files-complaint-against-volkswagen-audi-and-porsche-alleged-clean-air-act, including a link to the original Complaint; an amended Complaint was submitted on 7 June 2016 and is available at https://www.epa.gov/sites/production/files/2016-10/documents/amendedvw-cp.pdf.
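
The essence of a ‘defeat device’ of this kind can be set out in a few lines. The sketch below is an illustration only, with invented signal names; the actual implementation analysed by Contag and others was considerably more elaborate, inferring the test cycle from vehicle speed, steering input and elapsed time. The point is that a single branch selects compliant behaviour only when the software believes a regulator is watching.

```python
# Illustrative sketch only, with invented names: the essence of a defeat
# device is a branch that selects full emissions control only when the
# vehicle believes it is being tested.

def looks_like_official_test(steering_angle_deg: float, speed_profile_matches_cycle: bool) -> bool:
    """Crude stand-in for test-cycle detection: on a dynamometer the driven
    wheels turn while the steering wheel barely moves and the speed trace
    follows a published test cycle."""
    return abs(steering_angle_deg) < 1.0 and speed_profile_matches_cycle

def select_emissions_mode(steering_angle_deg: float, speed_profile_matches_cycle: bool) -> str:
    if looks_like_official_test(steering_angle_deg, speed_profile_matches_cycle):
        return "full NOx control"      # compliant behaviour, shown only to the regulator
    return "reduced NOx control"       # behaviour on the road

print(select_emissions_mode(0.2, True))    # laboratory: 'full NOx control'
print(select_emissions_mode(15.0, False))  # normal driving: 'reduced NOx control'
```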

5.160 Manufacturers of motor vehicles are rapidly increasing the amount of software in vehicles, partly with the aim of producing autonomous vehicles.1 Semi-autonomous or fully autonomous vehicles will not provide the panacea that the industry constantly asserts. Vehicles controlled wholly or partially by software code will continue to cause accidents and to kill and injure people.2 Also, because the software in vehicles is open to attack, it is far from safe.3

1Autonomous motor vehicles have been involved in numerous accidents, mainly because of software failures, and a number of people have been killed and injured by motor vehicles in ‘autonomous’ mode. Here is a sample list of articles and websites: Francesca M. Favarò, Nazanin Nader, Sky O. Eurich, Michelle Tripp and Naresh Varadaraju, ‘Examining accident reports involving autonomous vehicles in California’, PLoS One, 2017;12(9):e0184952, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5607180/; Song Wang and Zhixia Li, ‘Exploring the mechanism of crashes with automated vehicles using statistical modeling approaches’, PLoS One 2019; 14(3): e0214550, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6438496/; Đorđe Petrović, Radomir Mijailović and Dalibor Pešić, ‘Traffic accidents with autonomous vehicles: type of collisions, manoeuvres and errors of conventional vehicles’ drivers’ (2020) 45 Transportation Research Procedia 161; for fatalities, see https://en.wikipedia.org/wiki/List_of_self-driving_car_fatalities (although this list does not correspond to the list of lives lost relating to Tesla motor cars, for which see: https://www.tesladeaths.com/); for Uber, see https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg.

2For instance, see NTSB, Preliminary Report, Highway HWY16FH018 (Josh Brown, Florida in Tesla Model S) https://www.ntsb.gov/investigations/AccidentReports/Pages/HWY16FH018-preliminary.aspx; NTSB, Preliminary Report, Highway HWY18MH010 (Uber car crash), https://www.ntsb.gov/investigations/AccidentReports/Reports/HWY18MH010-prelim.pdf.

3Andrea Palanca, Eric Evenchick, Federico Maggi and Stefano Zanero, ‘A stealth, selective, link-layer denial-of-service attack against automotive networks’ in Michalis Polychronakis and Michael Meier (eds) Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2017, Lecture Notes in Computer Science, vol 10327, Springer); Roger Kemp, ‘Autonomous vehicles – who will be liable for accidents?’ (2018) 15 Digital Evidence and Electronic Signature Law Review 33; Michael Ellims, ‘Brake systems: a mind of their own’ (2021) 18 Digital Evidence and Electronic Signature Law Review 27; ‘The braking system on Formula E cars is designed so that if the front brakes fail, the rear brake system is activated as a fail-safe. In this instance, an incorrect software parameter that meant the rear brake system didn't activate as intended and the fail-safe did not kick in. We have now corrected the software problem and demonstrated to the FIA’s satisfaction that the matter has been resolved. As a result, the FIA will permit all Mercedes-powered cars to race this evening’: Thomas Claburn, ‘Incorrect software parameter sends Formula E’s Edoardo Mortara to hospital: Brakes’ fail-safe system failed’, The Register, 1 March 2021, https://www.theregister.com/2021/03/01/formula_e_bug.

Emergency services

5.161 In 1992, the London Ambulance Service computer-aided dispatch system failed. A complex set of circumstances resulted in an effective failure of the dispatching system; these circumstances are set out in paragraph 1996 of the Report.1 Apparently ‘the computer system itself did not fail in a technical sense … However, much of the design had fatal flaws that would, and did, cumulatively lead to all of the symptoms of systems failure’.2 Among the contributing factors were ‘exception messages’ and ‘requests for attention’ which scrolled off the screen because of the large number of messages generated.3 There is also a suggestion that one member of staff was not using the system as expected,4 and the problems were compounded by ‘a genuine failure of crews to press the correct status button owing to the nature and pressure of certain incidents’.5 This was so even though the individuals who used the new system were from a skilled and trained pool of staff, namely ambulance crews and controllers. Other problems have occurred since.6

1Report of the Inquiry into the London Ambulance Service, South West Thames Regional Health Authority (1993) – a scanned version is available at http://www0.cs.ucl.ac.uk/staff/A.Finkelstein/las.html; P. Mellor, ‘CAD: Computer-aided disaster’ (1994) 1(2) High Integrity Systems Journal 101; Anthony Finkelstein and John Dowell, ‘A comedy of errors: the London Ambulance Service case study’ in Proceedings of the 8th International Workshop on Software Specification & Design IWSSD-8, (IEEE CS Press 1996), 2–4; Paul Beynon-Davies, ‘Information systems “failure” and risk assessment: the case of the London Ambulance Service computer and despatch system’ in G. Doukidis, B. Galliers, H. Krcmar and F. Land (eds) Proceedings of the 3rd European Conference on Information Systems, Athens, 1–3 June 1995, 1153–1170; Paul Beynon-Davies, ‘Human error and information systems failure: the case of the London Ambulance Service computer-aided despatch system project’ (1999) 11 Interacting with Computers 699; D. Dalcher, ‘Disaster in London: The LAS case study’ 1999 Engineering of Computer-Based Systems 41.

2Report of the Inquiry into the London Ambulance Service, para 1007(x).

3Report of the Inquiry into the London Ambulance Service, paras 4012(c) and 4023.

4Report of the Inquiry into the London Ambulance Service, para 4025.

5Report of the Inquiry into the London Ambulance Service, para 4009(b).

6Kelly Fiveash, ‘London Ambulance Service downed by upgrade cockup’, The Register (9 June 2011); Jon Ironmonger, ‘Ambulance system failure “might have led to patient death”’, BBC News (6 January 2017).
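
The detail of messages scrolling off the screen is worth illustrating, because it was a failure of the system as a whole rather than a crash of the code. The sketch below is a generic illustration, not the London Ambulance Service dispatch software; the screen size and message wording are assumptions. It shows how, once messages arrive faster than operators can deal with them, a display that shows only the most recent entries silently discards the earlier exception messages.

```python
# Generic illustration, not the LAS dispatch code: a display that shows only
# the most recent messages, so earlier exception messages are pushed out of
# view when the volume is high and are effectively lost to the operator.
from collections import deque

VISIBLE_LINES = 5                       # assumed size of the operator's screen
screen = deque(maxlen=VISIBLE_LINES)    # older entries fall off automatically

for i in range(1, 21):                  # a burst of 20 messages
    screen.append(f"exception message {i}")

print(list(screen))   # only messages 16-20 remain visible; 1-15 scrolled away unseen
```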

5.162 In 2014, a multistate outage of 911 calls in the United States of America occurred because of a preventable software coding error in a 911 Emergency Call Management Center automated system in Englewood, Colorado, operated by Intrado, a subsidiary of West Corporation. The error prevented non-PI-enabled long-distance assignments, which meant that calls could not be routed to the appropriate destination.1

1April 2014 Multistate 911 Outage: Cause and Impact Report and Recommendations (A Report of the Public Safety Homeland Security Bureau, Federal Communications Commission, October 2014, Public Safety Docket No. 14–72 PSHSB Case File Nos. 14-CCR-0001-0007), https://www.fcc.gov/document/april-2014-multistate-911-outage-report.

Medical

5.163 The widespread use of computer devices in the medical industry has also given rise to incidents where the reliability of devices and software has been called into question. The rules for approving medical devices leave a great deal of scope for software failure. A device does not need new approval if it is ‘substantially similar’ to an existing approved device. This allows errors to go unconsidered, or incremental changes to take the latest device far from the original design.1 Most of the ‘apps’ promoted on smartphones are either not licensed or inherit approval from an ‘equivalent’ device. Consider the ‘Babylon health app’ – a triage chatbot that is notoriously poor and not approved, notwithstanding claims that it would pass the examination taken by final year doctors – as noted by Dr Margaret McCartney:

Who’s in charge of ensuring that this app [NHS 111 powered by Babylon app] is safe and fit for purpose?

Knowing the staggering lack of publicly available robust testing that had accompanied the adult symptom checker app, I thought that perhaps Babylon might have done better with its paediatric one. What’s Babylon’s evidence? I don’t know, for it replied with, ‘we won’t be responding to your enquiry’. The binary nature of the chatbot means that one thing that doesn’t happen is history taking, in the medical sense (‘Shut up, your patient is telling you the diagnosis’). It has a series of yes/no questions and short multiple choices.

Who’s in charge of ensuring that this app is safe and fit for purpose? The Medicines and Healthcare Products Regulatory Agency (MHRA) has said that it will ask Babylon to change the way it refers to the app as being ‘certified as a medical device with the MHRA’. The MHRA says that, for class I devices such as this app, the manufacturer must register with the agency and self certify that the device meets the requirements of the regulations. The MHRA says that this process is purely administrative – the MHRA takes details of the types of devices manufactured, but it does not assess, certify, approve, or accredit devices as part of the CE (European Conformity) marking process.

Who else could act? The Care Quality Commission has inspected Babylon, but it made no mention of the reliability, or not, of the app that it uses to direct people to and from general practice consultations. The General Medical Council regulates individual doctors, not clinical devices.

We have many regulators but little proactivity, even for an app which – despite the small print warning us that it ‘does not constitute medical advice, diagnosis, or treatment’ – is being used as the front door into NHS care.2

1For a general introduction that should be compulsory reading for all incoming ministers of health, see Martyn Thomas and Harold Thimbleby, Computer Bugs in Hospitals: A New Killer (Gresham College, 6 February 2018), https://www.gresham.ac.uk/lectures-and-events/computer-bugs-in-hospitals-a-new-killer; Dolores R. Wallace and D. Richard Kuhn, ‘Failure modes in medical device software: an analysis of 15 years of recall data’ (2001) 8(4) International Journal of Reliability, Quality and Safety Engineering 351; Homa Alemzadeh, Ravishankar K. Iyer, Zbigniew Kalbarczyk and Jai Raman, ‘Analysis of safety-critical computer failures in medical devices’ (2013 July/August) IEEE Security & Privacy, 14; Homa Alemzadeh, Jaishankar Raman, Nancy Leveson, Zbigniew Kalbarczyk and Ravishankar K. Iyer, ‘Adverse events in robotic surgery: a retrospective study of 14 years of FDA data’ (2016) 11(4) PLoS ONE 1, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0151470; Alessia Ferrarese, Giada Pozzi, Felice Borghi, Alessandra Marano, Paola Delbon, Bruno Amato, Michele Santangelo, Claudio Buccelli, Massimo Niola, Valter Martino and Emanuele Capasso, ‘Malfunctions of robotic system in surgery: role and responsibility of surgeon in legal point of view’ (2016) 11(1) Open Med (Wars) 286, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5329842/.

2Margaret McCartney, ‘AI in medicine must be rigorously tested’, The BMJ, News and Views, 24 April 2018, https://www.bmj.com/content/361/bmj.k1752.

5.164 For instance, patients have been affected by an error in clinical IT software1 and by the failure to correct an error in a timely manner,2 and one study of a hospital computerized physician order entry system in the USA found that the system facilitated errors of the kind it was supposed to prevent, including an increased risk of prescribing errors. Twelve of the flaws identified were in the human–machine interface, reflecting machine rules that did not correspond to how work was organized or to the usual behaviour of those using the system.3 There is an increasing volume of articles on this topic,4 and it would appear that some, but not all, of the problems were due to software defects,5 but it is now very clear that software helps to kill people in hospitals.6

1Alex Matthews-King, ‘GPs told to review patients at risk as IT error miscalculates CV score in thousands’, Pulse Today, 11 May 2016, https://www.pulsetoday.co.uk/news/clinical-areas/prescribing/gps-told-to-review-patients-at-risk-as-it-error-miscalculates-cv-score-in-thousands/.

2Singh v Edwards Lifesciences Corp., 151 Wash. App. 137, 210 P.3d 337 (2009) where the manufacturer of a heart monitor was aware of and had developed a fix for the software bug as early as 1998, but made a calculated business decision not to issue a recall or warning to any customers. Monitors were patched only when sent in for repair, and so the one used during Singh’s operation had not been patched. The jury awarded Singh $31.75 million in compensatory damages plus an additional $8.35 million in punitive damages. The verdict was upheld on appeal.

3Ross Koppel, Joshua P. Metlay, Abigail Cohen, Brian Abaluck, A. Russell Localio, Stephen E. Kimmel and Brian L. Strom, ‘Role of computerized physician order entry systems in facilitating medication errors’ (2005) 293(10) Journal of the American Medical Association 1197.

4E. Alberdi, A. A. Povyakalo, L. Strigini and P. Ayton, ‘Computer aided detection: risks and benefits for radiologists’ decisions’ in E. Samei and E. Krupinski (eds) The Handbook of Medical Image Perception and Techniques (Cambridge University Press 2009), 320–332.

5Frances E. Zollers, Andrew McMullin, Sandra N. Hurd and Peter Shears, ‘No more soft landings for software: liability for defects in an industry that has come of age’ (2004) 21 Santa Clara High Tech. LJ 745; Sharona Hoffman and Andy Podgurski, ‘E-health hazards: provider liability and electronic health record systems’ (2009) 24 Berkeley Tech LJ 1523; Paul T. Lee, Frankie Thompson and Harold Thimbleby, ‘Analysis of infusion pump error logs and their significance for health care’ (2012) 21(8) British Journal of Nursing (Intravenous Supplement) S12; Hon. John M. Curran and Mark A. Berman, ‘Gremlins and glitches using electronic health records at trial’ (2013) 85(4) New York State Bar Journal 20; Courtney L. Davenport, ‘Dangers of electronic medical systems’, (2013) 49(5) Trial: The National Legal Newsmagazine 14; Timothy P. Blanchard and Margaret M. Manning, ‘Electronic medical record documentation: inherent risks and inordinate hazards’, in Alice G. Gosfield (ed.), Health Law Handbook (Thomson Reuters 2016), 246–297; Jayanti Bhandari Neupane, Ram P. Neupane, Yuheng Luo, Wesley Y. Yoshida, Rui Sun and Philip G. Williams, ‘Characterization of leptazolines A−D, polar oxazolines from the cyanobacterium leptolyngbya sp., reveals a glitch with the “Willoughby−Hoye” scripts for calculating NMR chemical shifts’ (2019) 21 Org Lett 8449, where the authors discuss a flaw in software that could lead to incorrect conclusions; Karam v Adirondack Neurosurgical Specialists, P.C., 93 A.D.3d 1260 (2012), 941 N.Y.S.2d 402, 2012 N.Y. Slip Op. 02182 (evidence pointed to error in software), motion for reargument or leave to appeal to the Court of Appeals denied, 96 A.D.3d 1513 (2012), 945 N.Y.S.2d 588, 2012 N.Y. Slip Op. 04645, motion for leave to appeal denied, 19 N.Y.3d 812 (2012), 976 N.E.2d 251, 951 N.Y.S.2d 722, 2012 N.Y. Slip Op. 83806.

6Yong Y. Han, Joseph A. Carcillo, Shekhar T. Venkataraman, Robert S. B. Clark, R. Scott Watson, Trung C. Nguyen, Hülya Bayir and Richard A. Orr, ‘Unexpected increased mortality after implementation of a commercially sold computerized physician order entry system’ (2005) 116(6) Pediatrics 1506; Harold Thimbleby, ‘Ignorance of interaction programming is killing people’, Interactions (September and October 2008), 52; Harold Thimbleby, Fix IT: How to Solve the Problems of Digital Healthcare (Oxford University Press 2021), open source at http://www.harold.thimbleby.net/rhbook/book.pdf.

The Post Office Horizon scandal

5.165 Between 2000 and 2019, the Post Office1 operated a computerized accounting and electronic point of sale IT system called Horizon. This system was installed in its branch Post Offices around the country. It was not long before sub-postmasters and sub-postmistresses (SPMs) began experiencing balancing errors that they could not explain. Post Office employees did not attempt to find out why the balancing errors were occurring; they merely required the SPMs to make up any shortfall from their own funds. The balancing errors ranged from small amounts to tens of thousands of pounds. Some SPMs would make up the shortfall, and some would not. The Post Office initiated a substantial number of prosecutions for theft and fraud (the Post Office itself is a prosecuting authority), relying on the presumption that computers are reliable.2

1Post Office Limited is a private limited company registered in England and Wales, company number 02154540, incorporated on 13 August 1987. The Secretary of State for Business, Energy and Industrial Strategy holds a special share, and the rights attached to that special share are enshrined within the Post Office Limited Articles of Association.

2The transcript of the trial of Regina v Seema Misra, T20090070, in the Crown Court at Guildford, Trial dates 11, 12, 13, 14, 15, 18, 19, 20, 21 October and 11 November 2010, His Honour Judge N. A. Stewart and a jury, was published in full in (2015) 12 Digital Evidence and Electronic Signature Law Review, Introduction 44, Documents Supplement; see also Tim McCormack, ‘The Post Office Horizon system and Seema Misra’ (2016) 13 Digital Evidence and Electronic Signature Law Review 133. In this case, the prosecuting barrister referred to the Horizon system being ‘robust’ – seemingly in an attempt to refer to the presumption that computers are reliable without actually committing to using the word ‘reliable’, for which see Ladkin, ‘Robustness of software’; for a discussion of the evidence the Post Office ought to have disclosed before trial, see James Christie, ‘The Post Office Horizon IT scandal and the presumption of the dependability of computer evidence’ (2020) 17 Digital Evidence and Electronic Signature Law Review 49. The disclosure of relevant digital data was a live issue in this case. The defence made a number of requests for further disclosure of the computer system. This was refused four times: first application before Mr Recorder Bruce, 10 March 2010 (Day 1 Monday 11 October 2010, 3C; Judge’s Ruling, Day 1 Monday 11 October 2010, 25, A–C); second application before HH Judge Critchlow, 7 May 2010 (Day 1 Monday 11 October 2010, 3G); third application before the trial judge (Day 1 Monday 11 October 2010, 15H–16H) and fourth application before the trial judge (Day 6, Monday 18 October 2010, 24H–25A) – on this precise point, see Hamilton v Post Office Ltd [2021] EWCA Crim 577 at [204].

5.166 In response to the failure of the Post Office to consider that SPMs were not defrauding or stealing from it, a group of ex-sub-postmasters and sub-postmistresses formed the Justice For Subpostmasters Alliance (JFSA)1 in 2009, having experienced significant problems with how Post Office Limited dealt with apparent shortfalls in their accounts after the introduction of the Horizon IT system in 2000.2 Following years of campaigning with the support of many MPs, in 2012 the Post Office appointed Second Sight Support Services Limited, a firm of independent forensic accountants, to investigate the claims being made about the Horizon system and the associated issues. On 8 July 2013, Second Sight published an Interim Report on its findings up to that date, which led to MPs raising questions with the Minister for Postal Affairs in the House of Commons on 9 July 2013.3 The Interim Report demonstrated that there were issues that required further investigation, and in August 2013 an Initial Complaint Review and Mediation Scheme was established to investigate individual cases. The Scheme was open to both serving and ex-sub-postmasters and sub-postmistresses who had concerns relating to Horizon, and offered them an opportunity to have their cases independently reviewed and raised directly with the Post Office. A Working Group, comprising representatives from Second Sight, the Post Office and the JFSA, was established with an independent chair. The Scheme closed to applicants on 18 November 2013. During the 12 weeks it was open, 150 applications were received. On 9 April 2015, the Post Office terminated the Scheme Working Group, along with the contracts with Second Sight and the independent chairman. The draft of the Second Sight Report Part Two was due to be released to the Working Group on 10 April 2015, but the action of the Post Office prevented this from taking place. The second part of the Second Sight Report (version 2) eventually appeared on a journalists’ website.

1The Justice For Subpostmasters Alliance, https://www.jfsa.org.uk/.

2The Post Office took civil action to recover monies on an account stated (for an explanation of ‘account stated’ see Marshall below) by one of its former sub-postmasters, Mr Castleton: Post Office Ltd v Castleton [2007] EWHC 5 (QB), [2007] 1 WLUK 381. For a comprehensive assessment of this judgment, illustrating the failure of the judge to accept that Mr Castleton was challenging the presumption that computers are reliable, see Paul Marshall, ‘The harm that judges do – misunderstanding computer evidence: Mr Castleton’s story’ (2020) 17 Digital Evidence and Electronic Signature Law Review 25. In Banks v Revenue & Customs [2014] UKFTT 465 (TC), [2014] 5 WLUK 335, in response to the appellant’s assertions that the online process for submitting tax forms was flawed, Revenue and Customs rejected the claim without providing any evidence, the members of the tribunal reporting, at [22], that ‘HMRC says that it interrogated its computer system, and found no faults’. In addition, the members of the tribunal stated, at [28], in the absence of any evidence to make such an assessment, that ‘It is equally difficult to envisage HMRC’s systems failing in such a rudimentary way’.

3Hansard, 9 July 2013, columns 198–209, https://publications.parliament.uk/pa/cm201314/cmhansrd/cm130709/debtext/130709-0002.htm#13070952000004.

5.167 In 2015, the law firm Freeths LLP agreed to represent those ex-sub-postmasters and sub-postmistresses who wanted to take part in any future legal action. Therium Group Holdings Limited funded the litigation.1 A Group Litigation Order was subsequently made on 22 March 2017 by Senior Master Fontaine, and approved by the President of the Queen’s Bench Division. Procedural issues before the first trial were dealt with in the first judgment, Bates v Post Office Ltd,2 and a second judgment dealt with a further application by the Post Office to strike out part of the claim in Bates v Post Office Ltd (No 2).3 It was anticipated that there would be four trials. In the event, only two trials have taken place.

1Therium Group Holdings Limited, https://www.therium.com/.

2[2017] EWHC 2844 (QB), [2017] 6 Costs LO 855, [2018] CLY 376.

3[2018] EWHC 2698 (QB), [2018] 10 WLUK 291.

5.168 The first trial concerned, in the main, the contractual position between the Post Office and the sub-postmasters and sub-postmistresses. The judgment is in Bates v Post Office Ltd (No. 3: Common Issues).1 In this judgment, the judge included a comprehensive introduction to the issues generally between the parties at [2]–[43]. Orders in respect of costs of the Common Issues trial were determined in Bates v Post Office Ltd (No. 5: Common Issues Costs).2 The second trial, dealing with the Horizon software, took place between 11 March 2019 and 22 July 2019. During the course of this trial, the Post Office issued an application that the judge recuse himself as Managing Judge in this group litigation, and stop the Horizon Issues trial so that it could be recommenced at some later date before a replacement Managing Judge. That application was refused, for which see Bates v Post Office Ltd (No. 4: Recusal Application).3 Permission to appeal was refused by the single Lord Justice on 9 May 2019.4 Between the end of the second trial and the judgment, the parties sought mediation. An agreement was reached on 11 December 2019.5 The judge handed down his judgment in the second trial on 16 December 2019 – a comprehensive judgment that clearly indicated that the Horizon system had a significant number of software errors, and that employees of Fujitsu were able to access the computers of SPMs remotely and change data without the SPM being aware of what was happening.6 When handing down his judgment, the judge indicated that he:

[had] very grave concerns regarding the veracity of evidence given by Fujitsu employees in other courts in previous proceedings about the known existence of bugs, errors and defects in the Horizon system. These previous proceedings include the High Court in at least one civil case brought by the Post Office against a sub-postmaster and the Crown Court in a greater number of criminal cases, also brought by the Post Office against a number of sub-postmasters and sub-postmistresses.

After careful consideration, I have therefore decided, in the interests of justice, to send the papers in the case to the Director of Public Prosecutions, Mr Max Hill QC, so he may consider whether the matter to which I refer should be the subject of any prosecution.7

1[2019] EWHC 606 (QB), [2019] 3 WLUK 260.

2[2019] EWHC 1373 (QB), [2019] 6 WLUK 80, [2019] Costs LR 857, [2019] CLY 431.

3[2019] EWHC 871 (QB), [2019] 4 WLUK 150.

4Bates v Post Office Ltd Case No: A1/2019/1387/PTA dated 22 November 2019. The approved judgment will be published in a future edition of the Digital Evidence and Electronic Signature Law Review.

5Confidential settlement deed (10 December 2019) between the claimants in the action Bates v Post Office Limited, Post Office Limited and Freeths LLP, https://www.onepostoffice.co.uk/media/47518/20191210-glo-confidential-settlement-deed-executed-version-redacted_-003.pdf.

6Bates v Post Office Ltd (No 6: Horizon Issues) Rev 1 [2019] EWHC 3408 (QB), [2019] 12 WLUK 208; during this trial, the lead counsel for the Post Office, Anthony de Garr Robinson QC, repeatedly referred to the ‘robustness’ of the Horizon system and also cited statistics that were incorrect. For an analysis, see the discussion in Parker, Humble Pi in (2019) 16 Book Reports, Digital Evidence and Electronic Signature Law Review 99–105.

7Approved Proceedings sent to the author, High Court of Justice, Queen’s Bench Division, No QB-2016-004710, 16 December 2019, to be published in the Digital Evidence and Electronic Signature Law Review in 2021.

5.169 In 2015, the Criminal Cases Review Commission (CCRC) began reviewing claims of wrongful prosecution for offences such as theft and false accounting, allegedly caused by problems with the Post Office’s Horizon IT system. In 2020, the CCRC referred 47 Post Office cases to the Court of Appeal on abuse of process grounds,1 and in October 2020 the government initiated a non-statutory inquiry into the Post Office’s Horizon IT dispute led by Sir Wyn Williams.2 The Court of Appeal Criminal Division heard the appeals of 42 appellants on 22, 23 and 24 March 2021 and handed down judgment on 23 April 2021, quashing the convictions of 39 of them.3 The court also reached a rare determination: that the prosecutions were an affront to the conscience of the court.4 In delivering the judgment of the court, Holroyde LJ noted that the Post Office constantly asserted that the Horizon system was ‘reliable’ (at [20] and [125]), ‘accurate and reliable’ (at [68]) or ‘robust and reliable’ (at [121]). He went on to say, at [137]:

By representing Horizon as reliable, and refusing to countenance any suggestion to the contrary, POL [Post Office Limited] effectively sought to reverse the burden of proof: it treated what was no more than a shortfall shown by an unreliable accounting system as an incontrovertible loss, and proceeded as if it were for the accused to prove that no such loss had occurred. Denied any disclosure of material capable of undermining the prosecution case, defendants were inevitably unable to discharge that improper burden.

1R. v Hamilton [2021] EWCA Crim 21, [2021] 1 WLUK 116, [2021] 1 Cr App R 17; ‘The CCRC refers eight more Post Office cases for appeal – bringing total to 47 so far’, 3 June 2020, https://ccrc.gov.uk/the-ccrc-refers-eight-more-post-office-cases-for-appeal-bringing-total-to-47-so-far/; ‘CCRC to refer 39 Post Office cases on abuse of process argument’, 26 March 2020, https://ccrc.gov.uk/ccrc-to-refer-39-post-office-cases-on-abuse-of-process-argument/; the Criminal Cases Review Commission’s process for review of convictions relating to the Post Office and Horizon accounting system (Number 2020-0040, 3 March 2020), House of Commons Library, https://commonslibrary.parliament.uk/research-briefings/cdp-2020-0040/; for Scotland, see Reevel Alderson, ‘Post Office scandal: Scottish probe into sub-postmasters’ convictions’, BBC Scotland, 30 September 2020, https://www.bbc.co.uk/news/uk-scotland-54339004.

2https://www.gov.uk/government/publications/post-office-horizon-it-inquiry-2020; https://www.gov.uk/government/publications/post-office-horizon-it-inquiry-2020/terms-of-reference. The government converted the Inquiry into a statutory inquiry under the Inquiries Act 2005 on 1 June 2021, Statement UIN HCWS40, https://questions-statements.parliament.uk/written-statements/detail/2021-05-19/hcws40.

3Hamilton v Post Office Ltd [2021] EWCA Crim 577, [2021] 4 WLUK 227.

4Hamilton v Post Office Ltd [2021] EWCA Crim 577 at [66].

5.170 Not only was it factually incorrect that the Horizon system was reliable, but the failure to disclose relevant information meant:

[the] defendants were inevitably unable to discharge that improper burden. As each prosecution proceeded to its successful conclusion the asserted reliability of Horizon was, on the face of it, reinforced. Defendants were prosecuted, convicted and sentenced on the basis that the Horizon data must be correct, and cash must therefore be missing, when in fact there could be no confidence as to that foundation.1

1Hamilton v Post Office Ltd [2021] EWCA Crim 577 at [137].

Banking

5.171 The presumption that computers are reliable is particularly relevant with regard to banking. Banks across the world have introduced very complex systems and networks to control the flow of transactions, many of which are no longer under the sole control of the banks themselves. That a bank benefits from the presumption that its computers and networks, including the computers and networks it relies upon but over which it has no direct control, were in order at the material time puts an impossible burden on the customer. If a customer in dispute with his bank wants to challenge this presumption, he will require significant knowledge of the computers, systems and networks operated by the bank, how they work and where the vulnerabilities might lie, including the results of relevant audits, both internal and external – a task well beyond the majority of customers, and beyond most lawyers without the benefit of expert advice, which is itself difficult to obtain.

5.172 Issues regarding the reliability of banking systems manifested themselves in the problems in the UK in June and July 2012 with RBS, NatWest and Ulster Bank.1 On 19 June 2012, an important item of software known as CA-7 was updated. This software controls the batch processing systems that deal with retail banking transactions. It is used to automate large sequences of batch mainframe work, usually referred to as ‘jobs’. The jobs take transactions from various sources, such as ATM withdrawals, automatic salary payments and the like, so that accounts are credited and debited with the correct amounts by the next morning. The software initiates jobs, and when one job is finished, the next is initiated. Accounts are processed overnight, when the mainframes are less busy, and the processing finishes by updating the master copy of the account in a system known as Caustic. It appears that the update made to CA-7 caused the batch jobs to run incorrectly or not to run at all for three nights. David Silverstone, delivery and solutions manager for NMQA, which provides automated testing software to a number of banks, is quoted to the effect that such problems can always be avoided if there is sufficient testing of the update before it is put into operational use.2 Michael Allen, director of IT service management at Compuware, is reported to have said:

The problem is that IT systems have become vastly more complex. Delivering an e-banking service could be reliant on 20 different IT systems. If even a small change is made to one of these systems, it can cause major problems for the whole banking service, which could be what’s happened at NatWest. Finding the root cause of the problem is probably something NatWest is struggling with because of the complexity of the IT systems in any bank.3

1For detailed information, the reader is directed to the Treasury Select Committee web page on the Parliament website.

2Charles Arthur, ‘How NatWest’s IT meltdown developed’, The Guardian, 25 June 2012.

3Anna Leach, ‘Natwest, RBS: When will bank glitch be fixed? Probably not today’, The Register, 22 June 2012.
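
The dependence of each overnight job on its predecessor explains why a single faulty update to the scheduling software had such widespread effects. The sketch below is a simplified illustration and is not the CA-7 product or the banks’ configuration; the job names are invented. Once one job in the chain fails, nothing downstream runs, and the master accounts are not updated by the morning.

```python
# Simplified illustration of chained overnight batch jobs; not CA-7 itself.
# Each job runs only if the previous one completed, so one failure stalls
# everything downstream, leaving accounts uncredited the next morning.

def run_job(name: str, should_fail: bool = False) -> bool:
    if should_fail:
        print(f"{name}: FAILED")
        return False
    print(f"{name}: completed")
    return True

overnight_schedule = [
    ("collect_atm_withdrawals", False),
    ("collect_salary_payments", True),    # the faulty step introduced by an update
    ("post_debits_and_credits", False),
    ("update_master_accounts", False),
]

for job_name, fails in overnight_schedule:
    if not run_job(job_name, fails):
        print("batch run abandoned; downstream jobs not started")
        break
```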

5.173 The complexity of the problem is highlighted in an article written by Hilary Osborne in The Guardian in 2014, in which the issues were explained:

‘The banks do have a problem, but it’s not a new problem, and it’s not an easy problem to fix, which is why it’s taking so long’, says David Bannister, editor of Banking Technology magazine. ‘In the old days these machines just had to run overnight in batch mode – it was like newspapers with just one edition – but now they have to deal with news that is being updated throughout the day. The users – us – are using internet banking, ATMs, we’re spending money online. The reconciliation between what is going on in the background is the hard part, and the gulf is widening all the time.’

Ben Wilson, associate director of financial services for techUK, says some of the ‘legacy systems’ at banks are 30–40 years old and were originally set up for branch banking, but ‘then they needed to be ATM-focussed, then there was online banking, then mobile banking’. He says: ‘Banks have bolted on these changes because it is cheaper and less risky than starting from scratch, but every time you bolt on a change it becomes more complex.’

As well as new banking channels, systems are also tinkered with whenever regulatory changes are made, and when a product is withdrawn or changed.

Jim McCall, managing director of the Unit, which works with banks and other companies on their mobile apps, says that while anyone now building a system from scratch would ‘abstract out as much as possible so [different elements] are not as reliant on each other’, the banks’ systems often resemble a house of cards. ‘If you make a change to a tiny bit of code on one thing it is like the butterfly flapping its wings far away and somewhere someone’s mobile app stops working’, he says.

To make things more complicated, says Colin Privett, UK managing director of software firm Cast, new functions are usually ‘written in different programming languages, on different machines, by different teams’. He adds: ‘This prevents a single person/team from ever fully understanding the entire structure of a system. That is why when things do go wrong it can often take hours, or even days, to fix as teams scramble to find out where the problem lies.’1

1Hilary Osborne, ‘Why do bank IT systems keep failing?’, The Guardian, 27 January 2014.

5.174 The effects of the CA-7 imbroglio were considerable. In some cases people were left homeless after the computer problems meant that house purchases fell through; others were stranded abroad, unable to obtain access to funds that should have been in their accounts; wages and direct debits were not paid; and it is reported that one person spent the weekend in prison because the computer failure meant his bail money was not processed.1 The problems continued into 2014.2 In December 2014, the Royal Bank of Scotland Plc, National Westminster Bank Plc and Ulster Bank Ltd faced a combined financial penalty of £42 million imposed by the Financial Conduct Authority for breaches of Principle 3 of the ‘principles for businesses’, forming part of ‘the principles of good regulation’, which requires a firm to take reasonable care to organize and control its affairs responsibly and effectively with adequate risk management systems,3 and the Prudential Regulation Authority imposed a financial penalty of £14 million on the same banks for their failure to meet their obligations to have adequate systems and controls to identify and manage their exposure to IT risks.4

1James Hall and Gordon Rayner, ‘RBS computer failure condemns man to spend weekend in the cells’, The Telegraph, 25 June 2012.

2Emma Dunkley, ‘RBS and NatWest to plough £1bn into digital upgrade’ Financial Times, 28–29 June 2014, 18.

3https://www.fca.org.uk/publication/final-notices/rbs-natwest-ulster-final-notice.pdf.

4https://www.bankofengland.co.uk/-/media/boe/files/prudential-regulation/enforcement-notice/en201114.pdf?la=en&hash=7483F66E5533498680F8C2CD9F34CE9C10FD5EA8.

5.175 The problem of complexity and the difficulty of understanding and maintaining another banking system were emphasized in the report by Deloitte into the failure of the Real-Time Gross Settlement (RTGS) system operated by the Bank of England in 2014.1 The report stated:

133. During the 18 years since RTGS was first launched, the incremental changes have resulted in an increase in complexity and a system which is now more difficult to understand and maintain. In particular, the LSM and MIRS changes introduced additional functionality with an associated increase in complexity.

134. In combination with the ageing development language used to program RTGS, the result is a system which is more complex to support, heavily reliant on the skills and experience of the team to support it, and more susceptible to errors which take longer to diagnose. Therefore there is an increased risk of functional or configuration changes causing errors and if or when the system does fail it may take longer to resolve the issue.

1Deloitte, Independent Review of RTGS Outage on 20 October 2014 (23 March 2015), https://www.bankofengland.co.uk/-/media/boe/files/report/2015/independent-review-of-rtgs-outage-on-20-october-2014.pdf.

5.176 In this case, there was a design defect. The defect was mentioned at paragraph 151 of the report, but it had been redacted to such an extent that there was no meaningful text. The only information available is that a process known as ‘Process A functionality’ was changed in April 2014 and tested in May 2014 in preparation for the anticipated transfer of CHAPS members, and a design defect was introduced at this stage. This was the cause of the failure.1

1Independent review of RTGS outage on 20 October 2014: Bank of England’s response, https://www.bankofengland.co.uk/-/media/boe/files/report/2015/independent-review-of-rtgs-outage-on-20-october-2014-boes-response.pdf.

5.177 Other examples include Deutsche Bank AG, where a coding error in 2013 caused Deutsche to reverse the buy/sell indicator for its CFD Equity Swaps, which meant that it reported them inaccurately to the Financial Conduct Authority (FCA). The FCA imposed a financial penalty of £4,718,800 on Deutsche for failing to provide accurate reports in accordance with the provisions of the Markets in Financial Instruments Directive.1 In 2014, the Co-operative Bank identified that statements on a number of loans had been issued three days late because of a software error. Under the provisions of s 6 of the Consumer Credit Act 2006, which inserted s 77A into the Consumer Credit Act 1974, it is necessary to provide an annual statement to each borrower under a fixed-sum credit agreement, setting out the amount borrowed, the money paid, the interest and the outstanding amount. If the creditor fails to provide the debtor with an annual statement, the creditor is not entitled to enforce the agreement during the period of the failure to comply, and the debtor is not liable to pay any interest during that period. The bank set aside £109.5 million to refund interest payments for this breach of the Act.2

1https://www.fca.org.uk/publication/final-notices/deutsche-bank-ag-2015.pdf; Directive 2004/39/EC of the European Parliament and of the Council of 21 April 2004 on markets in financial instruments amending Council Directives 85/611/EEC and 93/6/EEC and Directive 2000/12/EC of the European Parliament and of the Council and repealing Council Directive 93/22/EEC, OJ L 145, 30.4.2004, p.1.

2Adam Leyland and Beth Brooks, ‘The Co-operative Bank’s £400m costs bill caused by “programming error”’, The Grocer, 29 March 2014, at https://www.thegrocer.co.uk/the-co-operative-group/programming-error-to-blame-for-co-op-banks-400m-bill/356022.article; The Co-operative Bank plc, Annual Report and Accounts for 2013, 151 section 2(iv).

Interception of communications

5.178 In his half-yearly report of July 2015, the Interception of Communications Commissioner illustrated the effect that errors in software code had had on the interception of communications.1 Although the number of technical errors was low in comparison with the overall number of requests made, the effect of such errors on innocent parties was significant. Paragraph 5.28 indicated that eight out of ten errors made in resolving IP addresses to individuals related to investigations into the sexual exploitation of children, or to cases where serious concerns had been raised about the welfare of a child.2 The Commissioner commented, at paragraphs 5.29 and 5.37:

Regrettably when errors occur in relation to the resolution of IP addresses the consequences are particularly acute. An IP address is often the only line of enquiry in a child protection case (so called ‘single strand’ intelligence), and it may be difficult for the police to corroborate the information further before taking action. Any police action taken erroneously in such cases, such as the search of an individual’s house who is unconnected with the investigation or a delayed welfare check on an individual whose life is believed to be at risk, can have a devastating impact on the individuals concerned.

…

5.37 … The eight technical system errors led to four warrants being executed at premises unconnected with the investigations and in one of these instances an individual was arrested. In another case the error delayed a welfare check on a child believed to be in crisis. In one instance a person unconnected with the investigation was visited by police. The majority of these errors resulted in communications data being obtained in relation to individuals who were unconnected with those investigations.

1The Rt Hon Sir Anthony May, Half-yearly Report of the Interception of Communications Commissioner (July 2015, HC 308, SG/2015/105).

2There is no suggestion from these examples that it was in error. The report may mean errors in resolving IP addresses in criminal investigations.

5.179 In his Report, the Commissioner said that the Crown Prosecution Service used funds provided by the government to work with vendors and the Home Office to develop secure disclosure systems – and, although money has been spent on this work, technical problems nevertheless continue to arise.1 Following the disclosure of the technical errors, the Commissioner made a number of recommendations regarding technical system errors:

11 Ensure that the [Communication Service Provider] CSP secure disclosure systems are tested sufficiently prior to implementation and after significant updates or upgrades.

12 Ensure there is standardisation and as much consistency as possible in relation to the data entry requirements on the different CSP secure disclosure systems.

13 Requirement for [Single Point of Contact] SPoC to inform CSP immediately if an error is identified which might be the result of a technical system fault (even where the error has been classified as a recordable error).

14 Ensure that there are regular quality assurance audits of the CSP secure disclosure systems to identify any faults at an earlier stage.

15 Ensure that the CSPs and system vendors are aware of the potential significant consequences of system errors, that the public authorities are informed of any systems errors immediately and the errors are fixed at the earliest opportunity.2

1At para 5.53.

2At para 5.40.

5.180 Technical errors continue to be reported by successive Commissioners.1

1The Rt Hon Sir Stanley Burnton, Annual Report of the Interception of Communications Commissioner 2016 (December 2017, HC 297, SG/2017/77), Error Investigation numbers 22–27; The Rt Hon Lord Justice Fulford, Annual Report of the Investigatory Powers Commissioner 2017 (January 2019, HC 1780, SG/2019/8), Error Investigation numbers 2, 19, 20, 23, and 24; The Rt Hon Sir Brian Leveson, Annual Report of the Investigatory Powers Commissioner 2018 (March 2020, HC 67, SG/2020/8), Error Investigation numbers 16–22.

Most computer errors are either immediately detectable or result from input errors

5.181 Let us consider the proposition that most computer errors are either immediately detectable or result from errors in the data entered into the machines. The evidence is to the contrary. Mr Adams demonstrated, in one of the largest studies of its kind, that a third of the software faults in a large IBM study took at least 5,000 execution years to appear for the first time;1 Professor Les Hatton and Andy Roberts demonstrated that seismic programs developed by oil companies had been used for many years even though they were defective;2 and Nancy G. Leveson and Clark S. Turner demonstrated that between June 1985 and January 1987 the Therac-25 medical linear accelerator was involved in six massive radiation overdoses, causing deaths and serious injuries. The detailed investigations eventually indicated that the main cause of the accidents was software error. The lessons drawn by Nancy Leveson included the following: too much confidence was placed in the software; lay people assumed that software will not or cannot fail; and engineers ignored the software when analysing faults, because they assumed that the hardware, not the software, was at fault.3 In this respect, opinions have not changed since 1987.4 When investigating sudden unintended acceleration in some of its motor cars in the US, Toyota did not include software engineers in its investigations, and incorrectly ruled out software as the cause of the resulting deaths and injuries.5

1Edward N. Adams, ‘Optimizing preventive service of software products’ (1984) 28(1) IBM Journal of Research and Development 2.

2Les Hatton and Andy Roberts, ‘How accurate is scientific software?’ (1994) 20(10) IEEE Transactions on Software Engineering 785.

3‘An investigation of the Therac-25 accidents’ (1993) 26(7) Computer 18 (note the additional information in Nancy Leveson, Safeware: System Safety and Computers (Addison-Wesley 1995)); for descriptions of what some of the patients suffered, see Lee, The Day the Phones Stopped, chapter 1.

4Simon Oxenham, ‘Thousands of fMRI brain studies in doubt due to software flaws’, New Scientist, 18 July 2016, https://www.newscientist.com/article/2097734-thousands-of-fmri-brain-studies-in-doubt-due-to-software-flaws/; Eklund and others, ‘Cluster failure’.

5Transcript (not proofread) of Bookout v Toyota Motor Corporation Case No. CJ-2008-7969 (Reported by Karen Twyford, RPR): examination and cross examination of Michael Barr 14 October 2013, 76–77, http://www.safetyresearch.net/Library/Bookout_v_Toyota_Barr_REDACTED.pdf.

5.182 Uncovering faults in software-controlled devices used in medicine is now considered to be an important research area,1 and in November 2000, 28 patients at the National Cancer Institute in Panama were given massive overdoses of gamma rays, partly because of limitations of the computer program that guided the use of a radiation therapy machine. A number of patients died.2

1Kevin Fu, ‘Trustworthy medical device software’ (Appendix D, 97–118) in Theresa Wizemann (ed) Public Health Effectiveness of the FDA 510(k) Clearance Process: Measuring Postmarket Performance and Other Select Topics: Workshop Report (Food and Drug Administration 2011), https://www.ncbi.nlm.nih.gov/books/NBK209656/; see also Senate Hearing 112–92, United States Senate, Hearing on a Delicate Balance: FDA and the Reform of the Medical Device Approval Process, 13 April 2011, https://www.aging.senate.gov/hearings/a-delicate-balance-fda-and-the-reform-of-the-medical-device-approval-process.

2Deborah Gage and John McCormick, We Did Nothing Wrong: Case 109 A Dissection, https://edisciplinas.usp.br/pluginfile.php/31797/mod_resource/content/1/casoCancerPanama.pdf; International Atomic Energy Agency, Investigation of an Accidental Exposure of Radiotherapy Patients in Panama Report of a Team of Experts (26 May–1 June 2001), https://www-pub.iaea.org/mtcd/publications/pdf/pub1114_scr.pdf; Cari Borrás, ‘Overexposure of radiation therapy patients in Panama: problem recognition and follow-up measures’ (2006) 20(2/3) Rev Panam Salud Publica/Pan Am J Public Health 173.

5.183 The observations of Professor Leveson remain relevant: the Toyota recall exercise in late 2009 and early 2010 illustrates the point.1 The US Congressional Committee on Energy and Commerce heard evidence on the matter, and the National Highway Traffic Safety Administration and the National Aeronautics and Space Administration (NHTSA–NASA) conducted a study into the problem entitled ‘Study of unintended acceleration in Toyota vehicles’, a revised version of which was published on 15 April 2011.2 The study concluded that it was not proven that faulty software caused the problems, although it accepted that the failure to find software faults did not mean that no such faults occurred. The methods used to investigate the matter were challenged.3

1A number of motor manufacturers are facing similar legal actions. It was known that sudden acceleration occurred in the 1980s and 1990s, for which see James Castelli, Carl Nash, Clarence Ditlow and Michael Pecht, Sudden Acceleration: The Myth of Driver Error (University of Maryland, Calce EPSC Press 2003).

2Available at http://www.nasa.gov/topics/nasalife/features/nesc-toyota-study.html.

3For which see Michael Barr, ‘Firmware forensics: best practices in embedded software source code discovery’ (2011) 8 Digital Evidence and Electronic Signature Law Review 148. For an earlier article, see Joel Finch, ‘Toyota sudden acceleration: a case study of the National Highway Traffic Safety Administration recalls for change’ 22 Loy Consumer L Rev 472.

5.184 Civil proceedings were subsequently initiated by a number of people across the US. In Bookout v Toyota Motor Corporation,1 Michael Barr, an expert in embedded software, gave evidence for the plaintiff regarding the software code in the relevant motor vehicles. He was also cross-examined about aspects of the NHTSA Report, among other issues. His evidence demonstrated that there were a significant number of errors in the software (referred to as ‘bugs’ in the transcript):

Q. Did you find all the bugs in the software that you reviewed?

A. Absolutely not.

Q. Why not?

A. Because there is a lot of bugs, and all indications are that there are many more. We haven’t specifically gone out looking for bugs. The metrics, like the code complexity and a number of global variables, indicate the presence of large numbers of bugs. And just the overall style of the code is suggestive that there will be numerous more bugs that we haven’t found yet.2

1Case No. CJ-2008-7969. The trial was held in the District Court of Oklahoma, County State of Oklahoma before the Hon Patricia G. Parrish, District Judge.

2Transcript (not proofread) of the trial 14 October 2013 before the Hon Patricia G. Parrish, District Judge (Reported by Karen Twyford, RPR): examination and cross examination of Michael Barr, 47–48, http://www.safetyresearch.net/Library/Bookout_v_Toyota_Barr_REDACTED.pdf.

5.185 He also demonstrated that motor cars are now largely run by software. In fact, motor cars have more software code than aircraft, and are prone to software recalls.1 Drivers no longer have total control over their vehicles.2 For instance, it was explained how the driver is no longer in direct control of the throttle:

But the driver had always been directly in control of the air, which is directly related to how much power the engine has. When electronic throttle control comes in, you have software that is now responsible for all three of them at once. So you have a portion of the software, the job of which is to make the spark at the right time, inject the fuel at the right time and the right amount, and open the throttle a certain amount.

…

The software in electronic throttle control is responsible for all three things, which means if the software malfunctions, it has control of the engine and can take you for a ride. What is of particular importance is that there is another part of the software that is looking at the driver controls, looking at the accelerator pedal and cruise control -- it is looking at more than that, but that is a simplification, that is appropriate right now -- so there is a part of the software looking at what the accelerator pedal position is, is it down, is it up, how much down. Then that is translating that into a calculated throttle angle. And then another part of the software is performing the sparking and the throttle control.3

1Robert N. Charette, ‘This car runs on code’, IEEE Spectrum, 1 February 2009 http://spectrum.ieee.org/transportation/systems/this-car-runs-on-code; Jürgen Mössinger, ‘Software in automotive systems publication’ (2010) 27(2) IEEE Software 92; ‘Today’s car has the computing power of 20 modern PCs, features about 100 million lines of code, and processes up to 25 gigabytes of data per hour’: Connected Car, Automotive Value Chain Unbound (McKinsey & Company, September 2014), 11. https://www.sas.com/images/landingpage/docs/3_McKinsey_John_Newman_Connected_Car_Report.pdf; James Scoltock, ‘As vehicles become more reliant on software, the amount of code needed to run them is challenging OEMs and suppliers alike’ Eureka Magazine, 1 February 2018, https://www.eurekamagazine.co.uk/design-engineering-features/technology/as-vehicles-become-more-reliant-on-software-the-amount-of-code-needed-to-run-them-is-challenging-oems-and-suppliers-alike/168096/.

2For which see the prosecution of a driver in Switzerland driving a Tesla motor vehicle in ‘Traffic-Aware Cruise Control’ and ‘Autosteer’ mode: PEN 17 16 DIP, 30 May 2018, Regionalgericht Emmental-Oberaargau, Strafabteilung (Regional Court Emmental-Oberaargau, Criminal Division), translated by Thierry Burnens, (2020) 17 Digital Evidence and Electronic Signature Law Review 97.

3Transcript of the trial of 14 October 2013, 53.
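The division of labour described in this testimony can be pictured with a deliberately simplified sketch. Nothing in the fragment below is drawn from Toyota’s code or from the trial evidence; the function names, the cruise-control rule and the 80-degree throttle range are invented for illustration only. The point is simply that, with electronic throttle control, the pedal is an input to software that computes a throttle angle, and a separate part of the software actuates spark, fuel and throttle.

    from typing import Optional

    def requested_throttle_angle(pedal_position: float, cruise_target: Optional[float]) -> float:
        """Translate driver inputs (pedal travel 0.0-1.0) into a target throttle angle in degrees."""
        if cruise_target is not None and pedal_position < 0.05:
            return cruise_target                          # cruise control is driving the throttle
        pedal = max(0.0, min(1.0, pedal_position))        # clamp a noisy or faulty reading
        return pedal * 80.0                               # assumed 80-degree fully open angle

    def actuate_engine(throttle_angle: float) -> None:
        """The part of the software responsible for spark, fuel and throttle opening."""
        # A real controller would command hardware here; this sketch only reports its decision.
        print(f"open throttle to {throttle_angle:.1f} degrees")
        print("schedule spark and fuel injection for the resulting airflow")

    actuate_engine(requested_throttle_angle(pedal_position=0.25, cruise_target=None))

If a fault in either part of such software produces a throttle angle that does not correspond to the pedal position, the driver has no mechanical linkage to fall back on – which is the substance of Mr Barr’s point.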

5.186 Mr Barr established that the motor vehicle had errors in the throttle system:

A. So the first main conclusion is that the 2005 Camry electronic throttle control, the software is of unreasonable quality. It contains bugs, but that’s not the only reason it is of unreasonable quality. And it’s otherwise defective for a number of reasons. This includes bugs that when put together with the defects can cause unintended acceleration.

Q. As we go forward are you going to explain to us how those problems that you found will cause an unintended acceleration?

A. Yes.

Q. Then you mentioned the code quality metrics. What do you mean about that?

A. So the code complexity and the McCabe Code Complexity is one of the measures of that.1 And the code complexity for Toyota’s code is very high. There are a large number of functions that are overly complex. By the standard industry metrics some of them are untestable, meaning that it is so complicated a recipe that there is no way to develop a reliable test suite or test methodology to test all the possible things that can happen in it. Some of them are even so complex that they are what is called unmaintainable, which means that if you go in to fix a bug or to make a change, you’re likely to create a new bug in the process. Just because your car has the latest version of the firmware – that is what we call embedded software – doesn’t mean it is safer necessarily than the older one.2

1McCabe Code Complexity has no sound theoretical basis. It is a rule of thumb. I owe this point to Dr Michael Ellims.

2Transcript of the trial of 14 October 2013, 65–66.
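For readers unfamiliar with the metric mentioned in this passage, McCabe (cyclomatic) complexity is, in essence, a count of the independent paths through a piece of code: formally the control-flow graph measure E − N + 2P, and in practice often approximated as one plus the number of decision points. The fragment below is only an illustrative sketch of that rule of thumb; it is not how any production tool, still less any tool used in the litigation, computes the figure.

    import re

    # Decision-point keywords counted by this simplified approximation.
    DECISION_KEYWORDS = r"\b(if|elif|for|while|and|or|case|except)\b"

    def approximate_mccabe(source: str) -> int:
        """Return 1 + the number of decision points found in the source text."""
        return 1 + len(re.findall(DECISION_KEYWORDS, source))

    example_function = """
    def classify(reading):
        if reading is None:
            return "missing"
        elif reading < 0 or reading > 100:
            return "out of range"
        return "ok"
    """
    print(approximate_mccabe(example_function))   # prints 4: three decision points plus one

Higher figures indicate more paths that must be exercised in testing; the thresholds at which a function is labelled ‘untestable’ or ‘unmaintainable’ are, as the footnote above records, rules of thumb rather than the product of any underlying theory.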

5.187 Mr Barr stated his overall opinion in the following terms: ‘ultimately my conclusion is that this Toyota electronic throttle control system is a cause of [unintended acceleration] software malfunction in this electronic throttle module, can cause unintended acceleration.’1 The members of the jury found in favour of the plaintiffs, and awarded damages of US$1.5 million to each of the plaintiffs. The US Department of Justice subsequently concluded a criminal investigation into the Toyota Motor Company regarding the widespread incidents of unintended vehicle acceleration that caused panic among Toyota owners between 2009 and 2010. The investigation established that Toyota had intentionally concealed information and misled the public about the safety issues behind these recalls.2 It was alleged that Toyota made misleading public statements to consumers, gave inaccurate facts to Members of Congress and concealed from federal regulators the extent of the problems that some consumers encountered. For instance, Betsy Benjaminson, a translator working on Toyota documents, realized that what she was translating was highly significant:

She began working on Toyota litigation in 2010. Before then, she’d been ‘oblivious’ to the events in the U.S., she says. Slowly she began to notice ‘odd things’ in documents she saw in connection with her role as translator. Revised press releases sometimes obscured important details, she says. Emails among engineers ‘revealed facts that directly contradicted’ Toyota’s public statements.

Then it got worse. She read reports about runaway cars, including survivors’ accounts of crashes that killed their companions. She was deeply affected. A ‘tipping point’ came when she read a document the company had prepared based on complaints filed with NHTSA. ‘A summary of the injuries and deaths was attached’, she recalls, ‘and it was cynically titled “Souvenirs from NHTSA”.’ For her, that was it. ‘At that moment,’ she says, ‘I knew something was really wrong inside the company.’ 3

1Transcript of the trial of 14 October 2013, 67.

2The literature on this topic in general merits further analysis, but is beyond the scope of this chapter: Suzanne M. Kirchhoff and David Randall Peterman, Unintended Acceleration in Passenger Vehicles (Congressional Research Service 7-5700, R41205, 26 April 2010); R. Graham Esdale Jr and Timothy R. Fiedler, ‘Toyota’s deadly secrets’, 46-SEP Trial 16; Finch, ‘Toyota sudden acceleration’; Molly S. O’Neill, ‘Faulty cars or faulty drivers: the story of sudden acceleration and Ford Motor Company’ (undated and scanned images from an unidentified book), available at http://www.suddenacceleration.com/article-2/; Scott Elder and Travis Thompson, ‘Recent development in automobile consumer class actions’ 41-FALL Brief 44; Katherine Gardiner, ‘Recent developments in automobile law’ 47 Tort Trial & Ins Prac LJ 45; Joseph Gavin, ‘Crash test dummies: what drives automobile safety in the United States?’ (2012) 25 Loy Consumer L Rev 86; Maria N. Maccone, ‘Litigation concerning sudden unintended acceleration’ 132 Am Jur Trials 305; Qi Van Eikema Hommes, ‘Review and Assessment of the ISO 26262 Draft Road Vehicle – Functional Safety’ (SAE Technical Paper 2012-01-0025, 2012); David C. Vladeck, ‘Machines without principals: liability rules and artificial intelligence’ 89 Wash L Rev 117; Aaron Ezroj, ‘Product liability after unintended acceleration: how automotive litigation has evolved’ 26 Loy Consumer L Rev 470; Antony F. Anderson, ‘Intermittent electrical contact resistance as a contributory factor in the loss of automobile speed control functional integrity’ (2014) 2 IEEE Access 258; Antony F. Anderson, ‘Case study: NHTSA’s denial of Dr Raghavan’s petition to investigate sudden acceleration in Toyota vehicles fitted with electronic throttles’ (2016) 4 IEEE Access 1417.

3David Hechler refers to Betsy Benjaminson, a translator who illustrated the mismatch in evidence when she informed the US authorities: ‘Lost in translation?’, 79, http://www.asbpe.org/blog/2014/07/28/david-hechler-wins-asbpes-2014-stephen-barr-award-for-article-on-toyotas-fatal-acceleration-problems/; the Crown Prince was having troubles with his vehicle, which the manufacturer took pains to resolve: David McNeil, ‘Imperial Family’s car woes sparked Toyota whistleblower’, The Japan Times, 9 June 2013, http://www.japantimes.co.jp/news/2013/06/09/business/corporate-business/imperial-familys-car-woes-sparked-toyota-whistleblower/#.WJ14B-l4j8s.

5.188 In its settlement with the Department of Justice, Toyota admitted, in the Statement of Facts filed with the criminal information, that it had made such misleading statements, and that it did so to conceal the problems as part of its efforts to defend its brand. In consequence, Toyota paid a financial penalty of US$1.2 billion under the settlement.1

1http://www.justice.gov/usao-sdny/programs/victim-witness-services/united-states-v-toyota-corporation.

Challenging the authenticity of digital data – trial within a trial

5.189 Laying the evidentiary foundations for the authenticity of electronic evidence is discussed elsewhere in this text, but where the authenticity of evidence is put in issue by one of the parties, it is appropriate to deal with the challenge in a trial within a trial.1 This will be a rare occurrence, as Beldam J noted in R. v Wayte (William Guy):2

It may be that in very rare cases, there will have to be a trial within a trial on the issue of the admissibility … but on such an issue, where the party producing the document and arguing for its admissibility contends that it is genuine … the issue will invariably be left to the jury.3

1Rosemary Pattenden, ‘Pre-verdict judicial fact-finding in criminal trials with juries’ (2009) 29(1) Oxford Journal of Legal Studies 1.

2[1982] 3 WLUK 247, (1982) 76 Cr App R 110, CA, Times, 24 March 1982, [1983] CLY 659.

3(1982) 76 Cr App R 110 at 118.

5.190 In R. v Stevenson (Ronald), R. v Hulse (Barry), R. v Whitney (Raymond),1 Kilner Brown J was required to establish whether audio tapes were originals. After a lengthy and careful examination of the evidence held in a trial within a trial, it became clear that there was an opportunity for someone to have interfered with the original tape, and there was evidence that some interference might have taken place. Given the nature of the evidence before him, he said:

Once the original is impugned and sufficient details as to certain peculiarities in the proffered evidence have been examined in court, and once the situation is reached that it is likely that the proffered evidence is not the original, is not the primary and best evidence, that seems to me to create a situation in which, whether on reasonable doubt or whether on a prima facie basis, the judge is left with no alternative but to reject the evidence.2

1[1971] 1 WLR 1, [1971] 1 All ER 678, [1970] 10 WLUK 82, (1971) 55 Cr App R 171, (1971) 115 SJ 11, [1971] CLY 2264.

2[1971] 1 WLR 1 at 3G.

5.191 In the case of R v Robson (Bernard Jack), R v Harris (Gordon Frederick),1 the defence raised the issue of the admissibility of 13 tape recordings. The judge had to consider, in the absence of the members of the jury, whether the tapes were, on the face of it, authentic. Shaw J heard evidence in a trial within a trial from a number of witnesses who gave evidence of the history of the tapes, from the actual process of recording to the time they were produced in court. He also listened to four experts called on behalf of the defence, whose examination of the tapes led them to question their originality and authenticity. The prosecution called a separate witness in rebuttal. After hearing the evidence, Shaw J decided that the tape recordings were originals and authentic, commenting that:

My own view is that in considering that limited question [the primary issue of admissibility] the judge is required to do no more than to satisfy himself that a prima facie case of originality has been made out by evidence which defines and describes the provenance and history of the recording up to the moment of production in court.2

1[1972] 1 WLR 651, [1972] 2 All ER 699, [1972] 3 WLUK 89, (1972) 56 Cr App R 450, [1972] Crim LR 316, (1972) 116 SJ 313, [1972] CLY 642.

2[1972] 1 WLR 651 at 653H.

5.192 Professor Tapper expressed the view that this exercise should be conducted first by the judge, and that if, on the balance of probabilities, the judge determines that the evidence could go before the jury, it would then be necessary to cover the same ground again in the same way as any other question of fact that must be decided at trial.1 On the standard of proof to be applied by the judge, O’Connor LJ indicated that the criminal standard is to be used in the context of handwriting,2 and in the case of R v Minors (Craig), R v Harper (Giselle Gaile),3 Steyn J, as he then was, set out the opinion of the Court of Appeal on this matter in relation to a computer printout:

The course adopted by the judge in one of the two appeals before us prompts us to refer to the procedure which ought to be adopted in a case where there is a disputed issue as to the admissibility of a computer printout. It is clear that in such a case a judge ought to adopt the procedure of embarking on a trial within a trial.4

1Colin Tapper, Computer Law (4th edn, Longman 1989), 370; see also Rosemary Pattenden, ‘Authenticating “things” in English law: principles for adducing tangible evidence in common law jury trials’ (2008) 12 E & P 273 and ‘Pre-verdict judicial fact-finding in criminal trials with juries’ (2009) 29 Oxford Journal of Legal Studies 1; in the context of s 69 Police and Criminal Evidence Act 1984, Professor Smith commented that during a trial within a trial, if a document is tendered by the prosecution, the standard is beyond reasonable doubt, and if tendered by the defence, the standard is presumably on the balance of probabilities: R. v Shephard (Hilda) [1993] Crim LR 295, 296.

2R. v Ewing (Terence Patrick) [1983] QB 1039, [1983] 3 WLR 1, [1983] 2 All ER 645, [1983] 3 WLUK 125, (1983) 77 Cr App R 47, [1984] ECC 234, [1983] Crim LR 472, (1983) 127 SJ 390, Times, 15 March 1983, [1983] CLY 63.

3[1989] 1 WLR 441, [1989] 1 All ER 208, [1988] 12 WLUK 161, (1989) 89 Cr App R 102, [1989] Crim LR 360, (1989) 133 SJ 420, [1989] CLY 546.

4[1989] 1 WLR 441 at 448.

5.193 He went on to indicate that the judge should apply the ordinary standard of criminal proof in reaching a decision, and in the case of R. v Neville,1 the members of the Court of Appeal also noted that trial judges ‘should examine critically any suggestion that a prior computer malfunction has any relevance to the particular computer record tendered in evidence’.2 The decision of the Court of Appeal in R v Minors (Craig), R v Harper (Giselle Gaile) to require a judge to apply the ordinary standard of criminal proof when hearing evidence in a trial within a trial overrules the decision of Shaw J in R v Robson (Bernard Jack), R v Harris (Gordon Frederick) (in which he took the view that the standard was the balance of probabilities3), although there is much to commend the view of Shaw J when he suggested that the prosecution need do no more than set up a prima facie case in favour of the authenticity of the evidence:

It may be difficult if not impossible to draw the philosophical or theoretical boundary between matters going to admissibility and matters going properly to weight and cogency; but, as I have already said, it is simple enough to make a practical demarcation and set practical limits to an inquiry as to admissibility if the correct principle is that the prosecution are required to do no more than set up a prima facie case in favour of it. If they should do so, the questioned evidence remains subject to the more stringent test the jury must apply in the context of the whole case, namely, that they must be sure of the authenticity of that evidence before they take any account of its content.4

1[1990] 11 WLUK 143, [1991] Crim LR 288, [1991] CLY 623.

2[1991] Crim LR 288, 289.

3[1972] 1 WLR 651 at 656C; this standard was agreed by counsel on both sides at 653E.

4[1972] 1 WLR 651 at 655H–656A.

5.194 The standard that a judge must apply in determining the admissibility of a videotape was considered by Cameron JA in the Canadian case of R v Penney,1 before the Newfoundland and Labrador Court of Appeal in 2002. In this instance, the prosecution sought to adduce evidence of the killing of marine animals. The evidence comprised a video recording of the killing of a seal. The recording had been frequently switched on and off as the operator of the camera selected scenes to record. The recording was filmed in mini-digital format, transferred to Beta format and then to VHS format. Before the Crown took possession of the tape, it had been in the possession of a professional editing studio for several months, and there had been no attempt to secure the tape or to restrict access to it. The Crown called the camera operator and the owner of the company for whom the camera operator worked to give evidence during the trial within a trial. The trial judge concluded that the witnesses were not credible and had failed to tell the truth, and he therefore refused to admit the video recording in any format. The Crown appealed to the summary conviction appeal court, which allowed the appeal. A further appeal to the Newfoundland and Labrador Court of Appeal reversed the decision of the summary conviction appeal court, and the decision of the trial judge was restored. Cameron JA addressed the standard that a trial judge should apply in determining the admissibility of videotape evidence, indicating that:

The issue then is whether in making this finding the trial judge was usurping the role of the jury (or in this case the role of the judge at trial) or was properly carrying out the function of the judge on determination of the admissibility of hard evidence.2

1(2002) 163 CCC (3d) 329.

2(2002) 163 CCC (3d) 329 at [40].

5.195 He went on:

[43] In my view, this consideration is really a matter of weighing prejudice against probative value, in much the same way that a trial judge must examine many other kinds of evidence.

[44] It is the question of fairness and absence of any intention to mislead that is really at issue in this case. The trial judge on a voir dire must determine whether a videotape being offered in evidence has been edited in such a way as to distort the truth.

5.196 Reference was made to R v Nikolovski,1 which established that where a videotape has not been altered or changed, and where it depicts the scene of a crime, then it becomes admissible and relevant evidence.2 In R v Bulldog,3 the members of the Court of Appeal of Alberta considered this issue, and emphasized that ‘What matters with a recording, then, is not whether it was altered, but rather the degree of accuracy of its representation’.4 In R v Penney, the judge addressed the problem of the falsification of evidence by pointing out that the members of a jury ‘can be expected to have, if not experience with, knowledge of the possibilities for manipulating the content of photographs and videotapes’, and concluded that the ‘standard by which the trial judge is to determine the question is on the balance of probabilities’.5

1(1996) 111 CCC (3d) 403, [1996] 3 SCR 1197.

2In R. v Andalib-Goortani, 2014 ONSC 4690 (CanLII), the prosecution failed to establish the authenticity of a digital image obtained from the Internet: the metadata had been removed, and it was not possible to ascertain the provenance of the image.

32015 ABCA 251 (CanLII); 326 CCC (3d) 385; [2015] AJ No 813 (QL).

42015 ABCA 251 (CanLII) at [32].

5(2002) 163 CCC (3d) 329.

5.197 If the standard of proof of a trial within a trial is the criminal standard, it can be argued that the prosecution is required to prove its case twice: once to the trial judge and a second time before the members of the jury. Arguably, the duty of the trial judge is to sift the evidence sufficiently to establish whether it is to go before the members of the jury in cases where the authenticity of the evidence is questioned by the defence.

A protocol for challenging software in devices and systems

5.198 Should it become the norm for the defence to challenge the authenticity of evidence in digital form, it is suggested that consideration might be given to the development of a protocol to deal with such challenges:

(1) First, in criminal proceedings, the prosecution should be required to inform the trial judge and defence in advance that it intends to rely on the presumption.

(2) Where the prosecution demonstrates, with appropriate evidence, that reliance is warranted, it will be for the defence to warn the trial judge that it will question the use of the presumption, in particular the authenticity of identified aspects of the evidence, and to set out the grounds upon which the challenge is made.1

1To a certain extent this might be already happening, for which see Oriola Sallavaci, ‘Streamlined reporting of forensic evidence in England and Wales: is it the way forward?’ (2016) 20(3) E & P 235.

5.199 Such an approach would be entirely consistent with the trial management procedures set out in Part 3, rule 3.3(2)(c)(ii) of the Criminal Procedure Rules 2015 (as amended). If this first hurdle is overcome, then it will be for the trial judge to decide whether a trial within a trial is necessary, and if so, to set out the parameters, including the standard of proof, for which a ruling is required.

5.200 There is something missing in the suggestion noted above regarding criminal proceedings: there is no discussion regarding the sufficiency of the evidence the defence must adduce to persuade a judge to order appropriate disclosure.1 Professor Imwinkelried has also considered this problem,2 and has proposed a two-step process, the first part of which is:

Faced with competing legitimate interests, a trial judge must attempt to strike a rational balance. In this context, the judge could do so by proceeding in two steps. First, a judge should assign to the accused seeking discovery the burden of showing that the facts of the instant prosecution exceed, or are at the margins of, the validation range of the empirical studies relied on by the prosecution. More specifically, the defendant must convince the judge that the available studies do not adequately address the effect of a specified, material variable or condition present in the instant case. The most clear-cut case would be a fact situation in which none of the available studies relied on by the prosecution experts tested the application of the technique to fact situations involving the condition.3

1There are profound concerns relating to the disclosure (or discovery) of evidence in both civil and criminal proceedings. For the USA, see Matt Tusing, ‘Machine-generated evidence’, 43 No 1 The Reporter 13; Katherine Kwong, ‘The algorithm says you did it: the use of black box algorithms to analyze complex DNA evidence’ (2017) 31 Harv JL & Tech 275; Vera Eidelman, ‘The First Amendment case for public access to secret algorithms used in criminal trials’ (2018) 35 Ga St U L Rev 915; Sonia K. Katyal, ‘The paradox of source code secrecy’ (2019) 104 Cornell L Rev 1183; Rebecca Wexler, ‘Life, liberty, and trade secrets: intellectual property in the criminal justice system’ (2018) 70 Stan L Rev 1343, in which the author considers the history of the trade secret privilege, uncovering an interesting development where it was demonstrated that Wigmore was initially hostile to the privilege (at 1383), but his opinion later changed. He admitted in an aside that his brother had suffered loss relating to intellectual piracy (at 1385); Steven M. Bellovin, Matt Blaze, Susan Landau and Brian Owsley, ‘Seeking the source: criminal defendants’ constitutional right to source code’ (2021) 17(1) Ohio State Tech LJ 38.

2Edward J. Imwinkelried, ‘Computer source code: a source of the growing controversy over the reliability of automated forensic techniques’ (2017) 66 DePaul L Rev 97.

3Imwinkelried, ‘Computer source code’, 128.

5.201 Professor Imwinkelried then indicates, at 128, that ‘The judge should certainly not accept the ipse dixit assertion of the defense counsel that the omitted condition is material in the sense that its presence could affect the outcome of the test’. Providing the defence has met the burden of part one, the second part of the test provides as follows:

Even then the judge should not automatically require the manufacturer to furnish the defense with a printout or electronic version of the source code. Instead, the judge could give the manufacturer a choice to: either (1) allow the defense to test the application of the program to a fact situation including the material condition or variable omitted from the validation studies, or (2) provide the defense with the source code.

5.202 Professor Imwinkelried points out, at 129, that:

At the end of this first step, the judge is not licensing a fishing expedition of unlimited scope; rather, the judge is authorizing discovery designed to meet a discrete defense criticism of the state of the empirical record in order to determine whether the technique can be reliably applied to the facts in the pending case.

5.203 There are criticisms of this proposal. Professor Martyn Thomas has pointed out that an argument or proposal that depends on the outcome of testing to provide evidence of the correctness, or of the specific required reliability, of some software is almost always based on erroneous or unverified assumptions.1 At the very least, it should always be challenged by the following questions (a sketch of the arithmetic behind the first question is set out after this list):

(1) How many tests will be enough to satisfy the required threshold of confidence in the evidence?

(2) How and on what assumptions will the applicant arrive at that number of tests?

(3) What is the procedure to be used to test the software and to verify the test results?

(4) On what assumptions can this be achieved within a practical period of time?

1Email communication between the author and Professor Thomas CBE.
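Question (1) can be given some rough arithmetic, on the usual (and strong) assumptions that the tests are statistically independent and are drawn from the same distribution of inputs as real use. If the true probability of failure per use is p, then n failure-free tests occur with probability (1 − p)^n, so to support a claim that p is no worse than some target, with confidence C, one needs roughly n ≥ ln(1 − C)/ln(1 − p) failure-free tests. A minimal sketch, with invented figures:

    import math

    def tests_needed(p_target: float, confidence: float) -> int:
        """Failure-free tests needed to support 'p <= p_target' at the stated confidence."""
        return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_target))

    # To support a claim of at most 1 failure in 10,000 uses with 99 per cent confidence:
    print(tests_needed(1e-4, 0.99))    # about 46,050 tests, every one of which must pass

The force of Professor Thomas’s questions is that figures of this kind are rarely produced, and that the assumptions behind them (independence, representative inputs, a verified test procedure) are themselves open to challenge.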

5.204 The answers to these questions will either reveal a fundamental flaw or provide the basis for challenging the assumptions. In addition, it is not clear how ‘reliable’ a court requires a forensic test to be. If a forensic system was known to be right more than half the time and randomly wrong otherwise, the question is whether a single positive result will pass the on-the-balance-of-probabilities requirement for a civil case.
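The ‘right more than half the time’ point can be illustrated with a hedged calculation (the figures below are invented). The probability that the fact in issue is true, given a single positive result, depends not only on the accuracy of the system but also on how common that fact is among the cases the system examines, so a bare accuracy figure above 50 per cent does not by itself satisfy the civil standard:

    def posterior_given_positive(accuracy: float, prior: float) -> float:
        """P(fact is true | system reports it), treating errors as random in either direction."""
        true_positive = accuracy * prior
        false_positive = (1.0 - accuracy) * (1.0 - prior)
        return true_positive / (true_positive + false_positive)

    # System right 60 per cent of the time; fact in issue present in 1 case in 10:
    print(round(posterior_given_positive(0.6, 0.1), 2))   # 0.14 - well below the civil standard
    # The same system where the fact is present in half of the cases examined:
    print(round(posterior_given_positive(0.6, 0.5), 2))   # 0.6  - just above it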

5.205 As all judges are only too well aware, there is a danger that the trial judge may be seen to usurp the functions of the members of the jury in reaching preliminary decisions on authenticity when conducting a trial within a trial. Marshall J, in delivering the judgment of the Court of Appeal in the case of R. v Ali (Maqsud), R. v Hussain (Ashiq),1 indicated that conducting a trial within a trial should be a rare occurrence:

In the view of this court the cases must be rare where the judge is justified in undertaking his own investigation into the weight of the evidence, which, subject to proper directions from the judge, is really the province of the jury, but the court sees that there can be cases – but they must be rare – where the issues of admissibility and weight can overlay each other.2

1[1966] 1 QB 688, [1965] 3 WLR 229, [1965] 2 All ER 464, [1965] 4 WLUK 27, (1965) 49 Cr App R 230, (1965) 129 JP 396, (1965) 109 SJ 331, [1965] CLY 796.

2[1966] 1 QB 688 at 703C.

5.206 This restricted view was reinforced by the comments in R. v Stevenson (Ronald), R. v Hulse (Barry), R. v Whitney (Raymond)1 of Kilner Brown J:

as a general rule it seems to me to be highly undesirable, and indeed wrong for such an investigation to take place before the judge. If it is regarded as a general practice it would lead to the ludicrous situation that in every case where an accused person said that the prosecution evidence is fabricated the judge would be called upon to usurp the functions of the jury.2

1[1971] 1 WLR 1, [1971] 1 All ER 678, [1970] 10 WLUK 82, (1971) 55 Cr App R 171, (1971) 115 SJ 11, [1971] CLY 2264.

2[1971] 1 WLR 1 at 4E.

5.207 Where the matter of authentication is raised, the trial judge is required to decide whether to conduct a trial within a trial. Where the decision is made to hold a trial within a trial, it will be useful for the judge to set out the scope of the hearing. In R v Robson (Bernard Jack), R v Harris (Gordon Federick), Shaw J said that where such a hearing takes place, it should be defined narrowly.1 This must be right.

1[1972] 1 WLR 651 at 655H.

5.208 In respect of the costs of such an exercise, in R. v Saward (Steven Kevin), R. v Bower (Steven Kevin), R. v Harrison (Keith),1 the prosecution sought the admission of recordings of telephone conversations that were intercepted by the Dutch police and stored on a CD. The judge was invited to conduct a trial within a trial to determine whether or not the data recorded on the CD, transferred from a mainframe computer located in the Netherlands, was admissible in evidence as authentic, accurate and a reliable copy. The trial within a trial lasted for four days, and a number of witnesses, including British officers and a Dutch police officer, were called to give evidence. Lady Justice Hallett commented, at [44], on the costs of such an exercise:

Given the evidence available to the Crown we also have reservations about the profitability of the four day exercise of putting the Crown to strict proof of the exhibit. All of those involved in the conduct of criminal trials must be aware by now of the constraints upon resources and we are far from persuaded that this was a proper use of limited resources.

1[2005] EWCA Crim 3183, [2005] 11 WLUK 351.

5.209 The defence drew a number of errors in the CD recording to the attention of the trial judge, and it was only right that this issue should be considered.

5.210 When collecting electronic evidence, the investigator needs to pay careful attention to the process by which the evidence was obtained, and to demonstrate the provenance of the evidence. In R. v Skinner (Philip),1 the defence called into question evidence of screen images obtained by a police constable when conducting an investigation into indecent photographs of children. During the trial within a trial, the police officer gave evidence that he had a ‘source’ for the screen images. He admitted entering a website that he was not prepared to identify, and could only provide limited information about the provenance of the material he produced for the purposes of the investigation: namely, images that appeared on screen that were produced in the form of a printout. He refused to name or identify the website he had entered. It was held by the members of the Court of Appeal that the trial judge wrongly admitted the evidence. First, the members of the Court accepted that it was probable that the screen images were real evidence, because their content did not require any computer input, and likened the image to somebody switching on a television set. However, the printouts were not authenticated properly under the provisions of s 27 of the Criminal Justice Act 1988, and for that reason, the trial judge should not have admitted them. Second, there was no public interest immunity hearing to enable the judge to decide whether the prosecution need not disclose or need not give evidence as to the process by which the screen image reached the police officer, or in the absence of a proper explanation, how the screen image came to be on the police officer’s computer. It was conceded that a public interest immunity hearing should have been requested, and in such circumstances the trial judge was wrong to admit the evidence.

1[2005] EWCA Crim 1439, [2005] 5 WLUK 506, [2006] Crim LR 56.

Reintroduction of the common law presumption

5.211 The Law Commission proposed the repeal of s 69 of the Police and Criminal Evidence Act 1984 and a return to the common law presumption:

In the absence of evidence to the contrary, the courts will presume that mechanical instruments were in order at the material time.1

1Section 69 ceased to have any effect under s 60 of the Youth Justice and Criminal Evidence Act 1999, and s 69 was also repealed by Schedule 6; the Law Commission, Evidence in Criminal Proceedings: Hearsay and Related Topics, 13.13; Katie Quinn, ‘Computer evidence in criminal proceedings: farewell to the ill-fated s.69 of the Police and Criminal Evidence Act 1984’ (2001) 5(3) E & P 174; Amanda Hoey, ‘Analysis of the Police and Criminal Evidence Act, s.69 – computer generated evidence’ [1996] 1 Web JCLI.

5.212 The grounds for justification were set out in paragraphs 13.6–13.11, and are reproduced below with the references omitted:

The problems with the present law

13.6 In the consultation paper we came to the conclusion that the present law was unsatisfactory, for five reasons.

13.7 First, section 69 fails to address the major causes of inaccuracy in computer evidence. As Professor Tapper has pointed out, ‘most computer error is either immediately detectable or results from error in the data entered into the machine.’1

13.8 Secondly, advances in computer technology make it increasingly difficult to comply with section 69: it is becoming ‘increasingly impractical to examine (and therefore certify) all the intricacies of computer operation’. These problems existed even before networking became common.2

13.9 A third problem lies in the difficulties confronting the recipient of a computer produced document who wishes to tender it in evidence: the recipient may be in no position to satisfy the court about the operation of the computer. It may well be that the recipient’s opponent is better placed to do this.

13.10 Fourthly, it is illogical that section 69 applies where the document is tendered in evidence, but not where it is used by an expert in arriving at his or her conclusions, nor where a witness uses it to refresh his or her memory. If it is safe to admit evidence which relies on and incorporates the output from the computer, it is hard to see why that output should not itself be admissible; and conversely, if it is not safe to admit the output, it can hardly be safe for a witness to rely on it.

13.11 At the time of the publication of the consultation paper there was also a problem arising from the interpretation of section 69. It was held by the Divisional Court in McKeown v DPP that computer evidence is inadmissible if it cannot be proved that the computer was functioning properly – even though the malfunctioning of the computer had no effect on the accuracy of the material produced. Thus, in that case, computer evidence could not be relied on because there was a malfunction in the clock part of an Intoximeter machine, although it had no effect on the accuracy of the material part of the printout (the alcohol reading). On appeal, this interpretation has now been rejected by the House of Lords: only malfunctions that affect the way in which a computer processes, stores or retrieves the information used to generate the statement are relevant to section 69.

1Ladkin and others, ‘The Law Commission presumption concerning the dependability of computer evidence’, commenting on this citation by the Law Commission, observe at 3: ‘We were surprised to read Tapper’s suggestion that the Tapper Condition categorises “most computer error”, even allowing that he was writing in 1991. Reading the original paper, it seems to us as if Professor Tapper was not categorising “most computer error” in unqualified terms, but rather considering particular phenomena that are manifest in the use of one specific sort of IT system, namely systems commonly used for clerical work (maybe, more specifically, for legal-clerical work). The Tapper Condition does not appear to hold in general.’

2It may be the case that computer technology made it increasingly difficult to comply with the provisions of s 69, but this is not an argument to presume that mechanical instruments were in order at the material time. Professor Les Hatton, in his article ‘The chimera of software quality’ 103, stated that:

computer programs are fundamentally unquantifiable at the present stage of knowledge, and we must consider any proof based on them flawed until we can apply the same level of verification to a program as to a theorem.

Scientific papers are peer reviewed with a long-standing and highly successful system. The computer programs we use today to produce those results generally fly somewhere off the peer-review radar. Even worse, scientists will swap their programs uncritically, passing on the virus of undiscovered software faults.

That the peer review process is successful is debatable – the scientific community itself has raised concerns about the various biases that afflict the selection and review processes of scientific papers and their eventual publication.

5.213 Curiously, the authors of the report did not produce any evidence to establish whether it is generally true in the absence of contrary evidence that ‘mechanical instruments were in order at the material time’. There was no evidence to demonstrate that software code should benefit from this assertion. There was also no discussion of what is meant by ‘in order’. This is an important issue, bearing in mind that the presumption is a presumption without the requirement of proof of a basic fact.1 There was a great deal of technical material in the 1970s and 1980s to demonstrate that software errors might not be obvious. Indeed, in 1986 Professor Rudolph J. Peritz noted the following (footnotes omitted):

[to] grant greater credibility to computerized records … because they have not been touched by ‘the hand of man’ succumbs to two delusions. First, it is the hands and intellects of men and women that produce computers and the programs that guide them. To believe that the absence of direct physical contact means that records are untouched betrays a naive view of electronic data processing, one that ignores the centrality of humans to any computer system’s functioning. Second, trustworthiness is equated with electronic processing and opposed to human reckoning … It ignores, for example, the great dangers of traceless change and unauthorized access, as well as the benefits of having the proponent present evidence to prove systemic accuracy.

…

Throughout law’s intellectual history, scholars and jurists have sought methodological objectivity to justify legal decision making … The jurisprudential lure of computer technology is a perceived absence of discretion. Once designed, built, and programmed, the machinery objectively executes the will of its creators, and thus is perceived as trustworthy. But closer scrutiny reveals, at best, a paradox of complete submission and complete autonomy. A computer performs relentlessly just as we have designed and programmed it, and in so doing, it is entirely independent of us. Computerized records also are treated as trustworthy for a second reason—because the technology is perceived as error-free. Moreover, even on those exceptional occasions of technological failure, we believe, a computer will still inform us that an error has occurred. In sum, we have come to believe that unacknowledged error and subjectivity are not only undesirable, but also indigenous to the human domain.

But experience can teach us that such idealization of technology is a mirage that obfuscates the overlapping horizons of humans and computers, as well as their distinctive characteristics. In the human drama of litigation, better attention to the pragmatic jurisprudence of the Federal Rules of Evidence, as well as to the thoughtful practice recommended by the Manual for Complex Litigation, can help to dispel such harmful illusions. The concrete result of this attention will be the extension to the objecting party and to the court of a fair opportunity to evaluate the trustworthiness of all documents generated from computerized data.2

1Quinn, ‘Computer evidence in criminal proceedings’, 182.

2Rudolph J. Peritz, ‘Computer data and reliability: a call for authentication of business records under the federal rules of evidence’, 1001–1002; at the time of writing this article, Professor Peritz was a Visiting Associate Professor of Law at Benjamin N. Cardozo School of Law, and had worked with computers since 1962 as a programmer, operator, systems engineer and legal consultant. He was fully conversant with the errors regarding software code that occurred regularly.

5.214 In England and Wales, s 69 was subsequently repealed,1 and a similar reform was adopted with respect to evidence in electronic form in civil proceedings with the passing of the Civil Evidence Act 1995. It is suggested that the presumption, as set out above, that ‘mechanical instruments were in order at the material time’ remains far too crude an assumption to apply to computers. The authors of the Law Commission Report cite excellent reasons why the criminal law might be amended, but the proponents of the presumption should establish what they mean by the term ‘mechanical instruments were in order at the material time’ when referring to computers or computer-like devices. A fundamental problem is caused by the fact that software errors can be present (in large numbers) but not be observable in use until a specific situation is encountered.2 For example, the ‘Shellshock’ vulnerability (CVE-2014-6271)3 had lain dormant since 1989 in a program called Bash, which has been widely used in Unix systems for years.

1By s 60 of the Youth Justice and Criminal Evidence Act 1999.

2Stephen Castell, ‘Computers trusted, and found wanting’ (1993) 9(4) Computer Law and Security Report 155; Castell, ‘Letter to the editor’, 158 – the views expressed by Dr Castell, despite their age, remain valid; Student Comment, ‘A reconsideration of the admissibility of computer-generated evidence’ (1977) 126(1) University of Pennsylvania Law Review 425; George L. Paul, ‘Systems of evidence in the age of complexity’ (2014) 12(2) Ave Maria L Rev 173.

3https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-6271.
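The Shellshock defect mentioned in the preceding paragraph is a convenient illustration of dormancy: the faulty parsing path in Bash runs only when an environment variable has a particular shape, so the defect produced no visible misbehaviour for roughly a quarter of a century. The widely published test for the vulnerability sets such a variable and observes whether a newly started bash executes the trailing command. The fragment below simply wraps that test; it assumes a Unix-like system with bash installed in a standard location, and should only ever be run on one’s own machine:

    import subprocess

    # Environment variable whose value looks like an exported shell function followed by an
    # extra command. A patched bash ignores the trailing command; a vulnerable bash executes
    # it while importing the 'function' at start-up.
    crafted_env = {"PATH": "/usr/bin:/bin", "x": "() { :;}; echo VULNERABLE"}

    result = subprocess.run(
        ["bash", "-c", "echo bash started"],
        env=crafted_env,
        capture_output=True,
        text=True,
    )
    if "VULNERABLE" in result.stdout:
        print("this bash appears to be vulnerable to CVE-2014-6271")
    else:
        print("no Shellshock behaviour observed")

On a patched system the crafted command is never executed, which is precisely the difficulty for a presumption that the instrument is ‘in order’: for ordinary inputs the program behaved identically whether or not the defect was present.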

5.215 Various challenges have been made in criminal proceedings to the accuracy of speed measuring devices and breath analysis machines, and such devices continue to be the subject of challenge. They rarely suffer a catastrophic failure, but they do drift out of accuracy, which means that recalibration is necessary from time to time. This topic is not dealt with in any depth, because the aim of this chapter is to discuss the fragility of software code in particular, although the drift or wearing out of components can itself be a cause of software error if the software was never designed to cope with the changes that occur in such circumstances.1 With rare exceptions, such challenges have failed. For instance, in the case of Darby (Yvonne Beatrice) v DPP,2 the assertions of a police officer familiar with the use of such a device were held to be sufficient evidence to sustain the finding that the device was working correctly,3 although where the legislation requires the date and time at which a specimen was provided to be printed on the printout and the date is incorrect, the machine is not considered to be capable of being ‘reliable’.4 This is supported by the comments of Kourakis and Blue JJ of the Supreme Court of South Australia in Police v Bleeze, who stated that ‘an evidential basis for the presumption of accuracy of a scientific instrument, in a proper case, may be given by a person who, even though not a scientist with expertise in the machine’s technology, is properly trained in its operation’.5

1For the early history of case law, see ‘The breathalyser’ by A Magistrates’ Clerk, (1970) 34 The Journal of Criminal Law 206, and for a later analysis, see C. E. Bazell, ‘Challenging the breathalyser’ (1988) 52 Journal of Criminal Law 177 and F. G. Davies, ‘Challenging the accuracy of the breath-test device’ (1988) 52 Journal of Criminal Law 280; Ian R. Coyle, David Field and Graham A. Starmer, ‘An inconvenient truth: legal implications of errors in breath alcohol analysis arising from statistical uncertainty’ (2010) 42(2) Australian Journal of Forensic Sciences 101; for a discussion based on the USA, including an indication of the technical problems relating to fixed speed cameras, see Steven A. Glazer, ‘Those speed cameras are everywhere: automated speed monitoring law, enforcement, and physics in Maryland’ (2012) 7(1) Journal of Business & Technology Law 1.

2[1994] 10 WLUK 343, [1995] RTR 294, (1995) 159 JP 533 (DC), Times, 4 November 1994, [1994] CLY 674.

3Extensive tests have indicated that many pieces of software widely used in science and engineering are not as accurate as imagined (thus affecting the accuracy of the output), and whether a police officer who has no knowledge of software code is capable of determining such a complex point is debatable: Les Hatton, ‘The T experiments: errors in scientific software’ (1997) 4(2) IEEE Computational Science & Engineering 27.

4Slender v Boothby [1984] 11 WLUK 234, [1985] RTR 385, [1984] 149 JP 405, [1986] CLY 2951; ‘The paradox of the reliable device’ (1986) 50 Journal of Criminal Law 13–15.

5[2012] SASCF 54 at [89]; for an earlier case with evidence from three witnesses, see R v Ciantar; DPP v Ciantar [2006] VSCA 263.

5.216 In New Zealand, Harvey J summarized the position regarding evidence of mechanical or technological devices in R v Good, although no evidence was proffered to substantiate the assumptions built into the presumption:

(a) There is a presumption that mechanical instruments or technological devices function properly at the relevant time.

(b) Judicial notice will be taken of the output of a notorious or well-known technology. Evidence of the way in which it works to establish that it is based on sound scientific principles is not required.

(c) New or novel technologies will not receive judicial notice. Expert evidence is required to explain the operation of the technology and the scientific principles upon which it is based. Authority seems to suggest that problems have arisen when technologically based evidence has been adduced without undertaking the inquiry whether or not the technology is ‘notorious’ or requires expert evidence.

(d) There is no rule of law which says that the reliability of the device is a precondition to admissibility. In either situation set out in (a) or (b) above the evidence is admissible – it is for the fact finder to assess weight.

(e) In some cases the presumption of accuracy of a technological device will be created by statute. The manner in which the technology is operated may have an impact upon the weight to be attributed to its output.

(f) In some cases devices may, as a result of their own processes, create a record which is admissible. (R v Spiby (1990) 91 Cr App R 186, (1991) Crim LR 199).

(g) However, if there is human intervention in the performance of such processes either at the input, output or any intermediate stage, hearsay issues may arise, although in some cases exceptions to the hearsay rule may apply.

(h) Whether or not there is unfairness in the process of acquiring or dealing with the evidence is a recognized common law ground to test admissibility and may be available upon the facts of each case. That is a matter primarily of human behaviour and is not intrinsically part of the technology.1

1[2005] DCR 804 at [70].

5.217 The burden of demonstrating that computers can be presumed to work properly must rest with the proponent. The term ‘computers’ is used solely to reinforce the point that a computer or computer-like device is far more sophisticated than any purely mechanical machine, and such devices only work because a human being has written code to allow them to function. No evidence has been adduced to demonstrate the accuracy of such a presumption. One type of computer differs remarkably from another, and each will be controlled by software written by different people of varying degrees of competence to address problems of varying degrees of complexity and difficulty.1

1For a discussion of software and the complex issues that affect devices used by the medical profession, see Sylvia Kierkegaard and Patrick Kierkegaard, ‘Danger to public health: medical devices, toxicity, virus and fraud’ (2013) 29 Computer Law and Security Review 13; Steven Hanna, Rolf Rolles, Andrés Molina-Markham, Pongsin Poosankam, Kevin Fu and Dawn Song, ‘Take two software updates and see me in the morning: the case for software security evaluations of medical devices’ in Proceedings of the 2nd USENIX Conference on Health Security and Privacy (USENIX Association Berkeley, California, 2011).

5.218 Thus the assertion that all computers are presumed to be working properly (whatever this means) cannot be right. It is akin to saying that all motor cars, regardless of quality, are reliable – which they demonstrably are not (although it is acknowledged that most motor cars are generally reliable). In the view of George L. Paul, ‘Just because businesses rely on faulty computer programs does not necessarily mean that courts should follow suit’,1 although in People of the State of Colorado v Huhen, Vogt J considered that ‘computer business records have a greater level of trustworthiness than an individually generated computer document’2 without providing an authority, other than to quote from Colorado Evidentiary Foundations3 that ‘computers are so widely accepted and used that the proponent of computer evidence need not prove those two elements of the foundation’.

1George L. Paul, Foundations of Digital Evidence (American Bar Association 2008), 129; Gordon v Thorpe [1985] 10 WLUK 38, [1986] RTR 358, [1986] Crim LR 61, [1986] CLY 2950, where two experts gave evidence of the accuracy or otherwise of a Lion Intoximeter 3000.

253 P.3d 735 (Colo.App. 2002) at 737.

3Roxanne Bailin, Jim England, Pat Furman and Edward J. Imwinkelreid, Colorado Evidentiary Foundations (Michie 1997, with supplements), 736.

The statutory presumption

5.219 Mention might usefully be made of the powers conferred upon the Secretary of State by s 7(1)(a) of the Road Traffic Act 1988, by which a Minister may approve the use of breathalyser devices.1 The view of the courts is illustrated in Richardson v DPP,2 in which Stanley Burnton J noted that ‘The device so approved is assumed to be an effective and sufficiently accurate device for the purposes of section 5(1)(a), and that is the end of the matter’.3 The effect is to create a statutory presumption for breathalyser devices. He went on to indicate that if the device and software approved in 1998 had since changed, that was not relevant:

On the face of it, therefore, it would seem that a device which did not include the Intoximeter EC/IR Gas Delivery System, by way of example, or the software version of which was not UK5.23, but some significantly different version, would not be an approved device. It does not follow from that that every modification to an Intoximeter takes it out of the approval. Far from it. The alteration must be such, in my judgment, that the description in the schedule to the order no longer applies to it.4

1‘Approval of breath test device’ (1968) 32 The Journal of Criminal Law 255; ‘Trying times for breath testers’ (1969) 33 The Journal of Criminal Law 106; ‘Proof of approval of “Alcotest”’ (1969) 33 The Journal of Criminal Law 168; ‘Proof of approval by letter’ (1969) 33 The Journal of Criminal Law 204; ‘Judicial notice of Alcotest’ (1970) 34 The Journal of Criminal Law 107.

2[2003] EWHC 359 (Admin), [2003] 2 WLUK 596.

3[2003] EWHC 359 (Admin) at [6]‌.

4[2003] EWHC 359 (Admin) at [9]‌; identical comments were made by Robert Goff LJ in R v Skegness Magistrates’ Court, Ex parte Cardy [1985] RTR 49 at 61.

5.220 In Fearnley v Director of Public Prosecutions,1 Mr Justice Field observed that:

Whilst the defence statement purports to put the prosecution specifically to proof that the software was UK 5.23, this did not mean that the prosecution had specifically to prove this matter. This is because of the general presumption that flows from the fact that the machine was of a type that had been approved,2 this being a presumption which in my view is plainly consistent with Article 6 ECHR. Thus, it was for the appellant to adduce some evidence that the software was otherwise than the specified software before the prosecution came under a burden to prove the software. At no stage did the appellant raise or adduce such evidence and therefore he can have no substantial complaint that the prosecution were allowed to provide specific proof of the software through the engineer’s report.3

1[2005] EWHC 1393 (Admin), [2005] 6 WLUK 191, (2005) 169 JP 450, (2005) 169 JPN 735, Times, 6 July 2005, [2005] CLY 729.

2Illustrating a confusion between common law and statutory presumption.

3[2005] EWHC 1393 (Admin) at [34].

5.221 In Kemsley v DPP,1 Buxton LJ stated the opinion of the court on this matter:

The statutory presumption as to approval of a particular device was conclusive as to the correctness of that device. That point does not now appear in this case, and should not appear in any case in the future.2

1[2004] EWHC 278 (Admin), [2004] 2 WLUK 65, (2005) 169 JP 148, (2005) 169 JPN 239, [2005] CLY 874.

2[2004] EWHC 278 (Admin) at [11].

5.222 In DPP v Wood, DPP v McGillicuddy,1 Ouseley J indicated that if the breath test device is approved, it is therefore reliable: ‘There is a common law presumption that the breath test device, if type approved, is reliable.’2 Alternatively, where a device is weighted in favour of the accused, it is not an improper use of the device.3 The same position is held in cases relating to speed measuring devices,4 although if the road markings that are placed on the road to provide a scale for the digital device to measure speed are not the correct distance apart, the device will give a false reading.5 This approach might be appropriate, given that the accused can agree to have a sample of blood taken, and at the same time a copy sample of the blood is provided to the accused. Analysis of the blood is more accurate, and the blood sample can thus be analysed by the police and independently by a person on behalf of the accused.6 If this option is taken up by the accused, the evidence is more compelling, although consideration must be given to the deterioration of the blood sample.7 Lord Hughes offered a further rationale in Public Prosecution Service v McKee,8 where the appellants had their fingerprints taken at the police station using an electronic device called Livescan. A match was subsequently made, which the Crown relied upon at trial. Livescan devices were in general use in Northern Ireland from 2006 and throughout the period 2007–2009 when statutory type approval was required by article 61(8B) of the Police and Criminal Evidence (Northern Ireland) Order 1989,9 although approval was never granted. The appeal was dismissed. Of relevance in this context are the remarks by Lord Hughes:

The control fingerprints taken from the appellants in the police station were not snapshots. The impressions which their fingers provided could be reproduced at any time afterwards, and would be the same. The accuracy of the Livescan readings, if disputed, could readily be checked independently by the appellants providing more samples, whether by ink and paper or by any other means, for examination by an independent expert.10

1[2006] EWHC 32 (Admin), [2006] 1 WLUK 326, (2006) 170 JP 177, [2006] ACD 41, (2006) 170 JPN 273 (2006) 170 JPN 414, (2006) 156 NLJ 146, Times, 8 February 2006, [2006] CLY 951.

2[2006] EWHC 32 (Admin) at [2]‌; also noted by Mr Justice Cresswell at [43] in DPP v Brown (Andrew Earle), DPP v Teixeira (Jose) [2001] EWHC Admin 931, [2001] 11 WLUK 426, (2002) 166 JP 1, [2002] RTR 23, Times, 3 December 2001, [2002] CLY 733.

3Ashton v DPP [1995] 6 WLUK 298, (1996) 160 JP 336, [1998] RTR 45, Times, 14 July 1995, Independent, 10 July 1995, [1995] CLY 4416; for a discussion of other cases and the reverse burden of proof, see Ian Dennis, ‘Reverse onuses and the presumption of innocence: in search of principle’(2005) Dec Crim LR 901; David Hamer, ‘The presumption of innocence and reverse burdens: a balancing act’ (2007) 66(1) CLJ 142; P. M. Callow, ‘The drink-drive legislation and the breath-alcohol cases’ (2009) 10 Crim LR 707.

4Section 20 of the Road Traffic Offenders Act 1988 as amended; Griffiths v DPP [2007] EWHC 619 (Admin), [2007] 3 WLUK 572, [2007] RTR 44, [2007] CLY 3537.

5Bill Gardner, ‘Driver defeats speeding ticket with tape measure’, The Telegraph (London, 15 December 2014) https://www.telegraph.co.uk/news/uknews/road-and-rail-transport/11294579/Driver-defeats-speeding-ticket-with-tape-measure.html.

6Judges will not permit devices to be tested, and do not require the police to disclose details of the maintenance of machines. This leaves a defendant, when challenging the accuracy of a breath test device, the option of having their blood or urine tested, for which see Hughes v McConnell [1986] 1 All ER 268, [1985] 2 WLUK 235, [1985] RTR 244, [1985] CLY 3055, applying Snelson v Thompson [1984] 10 WLUK 254, [1985] RTR 220, [1985] CLY 3058.

7As noted by Mr Justice Newman at [8]‌ in Dhaliwal v DPP [2006] EWHC 1149 (Admin), [2006] 3 WLUK 459, also known as R. (on the application of Dhaliwal) v DPP; the position is similar in South Australia: Police v Bleeze [2012] SASCF 54, although the timing of the taking of the blood sample might be relevant, for which see Evans v Benson (1986) 46 SASR 317.

8[2013] UKSC 32, [2013] 1 WLR 1611, [2013] 3 All ER 365, [2013] NI 133, [2013] 5 WLUK 542, [2013] 2 Cr App R 17, [2014] Crim LR 77, Times, 18 June 2013, [2013] CLY 3289, also known as Public Prosecution Service of Northern Ireland v Elliott.

91989 No. 1341 (NI 12); article 61(8B) was repealed by the Policing and Crime Act 2009 (s 26), ss 112(1)(2), 116(6), Sch 7 para 128(2), Sch 8 Pt 13.

10[2013] UKSC 32 at [15].

5.223 Lord Hughes rejected the analogy between the Livescan device and speed guns and breathalysers. The latter devices record an event that cannot subsequently be remeasured. Unlike a breath test, the digital data comprising the impressions of the fingerprints were reproducible, and further tests could be carried out. For this reason, it is argued, it is appropriate to expect such devices to produce reliable evidence, which in turn implies that they should have been investigated and approved by the relevant authorities.

5.224 In essence, this is what the defendants tried to achieve in R v Skegness Magistrates’ Court, Ex parte Cardy.1 In the absence of a right to obtain what was then called discovery, solicitors for the accused sought to obtain relevant documents for the purpose of challenging the reliability of the Lion Intoximeter 3000 device by issuing witness summonses. Robert Goff LJ, as he then was, described the witness summonses as a means to obtain the discovery of documents, which was not permitted. Correct as this decision was, the judge commented on several occasions2 that, in the judgment of the court, the documents that the defendants sought to obtain were not likely to be of material relevance, but failed to give any reason as to why such a conclusion was reached, given that some of the records that were requested included details of the microprocessor program and the standard operating procedures, which were highly relevant. The judge also indicated3 that the court had been assured (it is not clear by whom) that the Home Office constantly monitored the device, and that if the devices were not reliable, the Secretary of State would not have approved their use.4 In effect, the court was presuming the ‘reliability’ of such devices because the Secretary of State had so provided.

1[1984] 12 WLUK 244, [1985] RTR 49, [1985] Crim LR 237, (1985) 82 LSG 929, [1985] CLY 3046; see also R v Coventry Magistrates’ Court Ex p. Perks [1984] 7 WLUK 215, [1985] RTR 74, [1985] CLY 3051.

2[1985] RTR 49 at 57F, 57J–K, 58B–C, 58J and 59A.

3[1985] RTR 49 at 60J.

4[1985] RTR 49 at 61F–G.

5.225 Where the defence is not given the opportunity to understand how such a device is constructed, and how new versions of software affect the accuracy of the device, defendants are not, it seems, permitted to obtain any evidence to challenge the ‘reliability’ or ‘accuracy’ of the machine. The failure to provide for the proper scrutiny of electronic evidence and the emphasis on relying on the assurances of the owner or user of the digital device means that the ‘reliability’ or ‘accuracy’ of these devices cannot be readily challenged in English courts.

Challenging the presumption

5.226 To sum up the thrust of this chapter, when considering the ‘reliability’ of computers, judges rarely take relevant expert advice or require lawyers appearing before them to cite the technical literature regarding the ‘reliability’ of computers. They reach their conclusions on this issue in the absence of relevant knowledge.1 In essence, judges conclude that because a system or device appears to do what is expected of it, notwithstanding the opponent’s challenge, they are satisfied that such systems or devices are ‘reliable’.2 In effect, the bench has incorrectly made the presumption into a legal presumption that reallocates the burden of proof to the party opposing the presumption. It is only if the party opposing the presumption succeeds that the relying party is required to discharge the legal burden in relation to the ‘reliability’ of the machine, and therefore the authenticity or integrity and the trustworthiness of the evidence.3

1For instance, see Bryan H. Choi, ‘Crashworthy code’ (2019) 94 Wash L Rev 39.

2By way of example, see the conclusion by Walsh J in Her Majesty the Queen v Dennis James Oland, 2015 NBQB 245.

3For a consideration of this point, see Daniel Seng and Stephen Mason, ‘Artificial intelligence and evidence’ (2021) 33 SAcLJ 241.

5.227 It is possible to challenge the authenticity of electronic evidence in a number of ways, although many reported cases appear to indicate that a lawyer will do so on what might appear to be somewhat slender grounds,1 and the judge will then have to determine whether to conduct a trial within a trial (if a criminal case) to receive evidence on the point. For instance, in R. v Coultas (Kiera),2 the accused was convicted of dangerous driving. Evidence from the defendant’s mobile telephone indicated that she was probably writing a text message when she collided with and killed the cyclist. Counsel for the defendant asserted, without any foundational evidence, that there was some fault in the network coverage that would demonstrate that the defendant was probably not writing a text message at the material time. Rix LJ accepted that if such an issue had been raised at an earlier stage in the proceedings, it would have been a matter for the Crown to cover, but there was nothing about this in the defence statement and the issue was not relevant at appeal.3 In The People v Lugashi,4 the defence argued that the prosecution had, in effect, to disprove the possibility of error before digital records of credit card fraud were admitted. Ortega J said that the ‘proposed test incorrectly presumes computer data to be unreliable’,5 which does not follow. However, the appeal on this point was dismissed on a number of grounds, one of which was that the appellant did not challenge the accuracy of the information recorded in the printout.

1Although a letter from the defence to the prosecution putting the validity of the information of a machine in issue is not sufficient in New Zealand: Police v Scott 30/5/97, HC Rotorua AP89/96 – a decision that must be right and probably would be followed in other jurisdictions.

2[2008] EWCA Crim 3261, [2008] 9 WLUK 352.

3[2008] EWCA Crim 3261 at [21].

4205 Cal.App.3d 632 – Ortega J reviewed relevant case law up to the date of this judgment, 27 October 1988.

5205 Cal.App.3d 632 at 640.

5.228 The problem for the lawyer making the challenge is that only the party in possession of the digital data has the ability to understand fully whether the computer or computers from which the evidence was extracted can be trusted. The authors of the Law Commission paper Evidence in Criminal Proceedings: Hearsay and Related Topics point out that a party might rely on evidence from a computer owned or controlled by a third party that is not a party to the proceedings. However, this should not prevent the party making the challenge from providing a suitable foundation to justify it. Reed and Angel indicate that there are two broad arguments that can be pursued:

1. Where the party adducing the evidence does so to prove the truth of the output, it may be that the other party will challenge the accuracy of the statement by proposing that the computer, or computer-like device, exhibited faults, errors or other forms of failure that might have affected the integrity and trustworthiness of the evidence, and thus its reliability. The reliability of the computer program that generated the record may be questioned. In addition, there might be a fault with the hardware.

2. The conduct of a third party (this phrase is meant to be construed widely to include any person who does not have the authority to alter how a computer or computer-like device operates, other than the way it is intended to operate) generated the faults, errors or other forms of failure that might have affected the integrity and trustworthiness of the evidence, and thus its reliability. For instance, this can include a claim that the records were altered, manipulated, or damaged between the time they were created and the time they appear in court as evidence, or the identity of the author may be in dispute: the person identified as being responsible for writing a document in the form of a word processing file may dispute they wrote the text, or it might be agreed that an act was carried out and recorded, but at issue could be whether the person alleged to have used their PIN, password or clicked the ‘I accept’ icon was the person that actually carried out the action.1

1Chris Reed and John Angel, The Law and Regulation of Information Technology (6th edn, Oxford University Press 2007), 596; the following analysis closely follows that of Reed and Angel, and the author is indebted to them.

5.229 The first argument was considered in the case of DPP v McKeown (Sharon), DPP v Jones (Christopher)1 over the inaccuracy of a clock in a Lion Intoximeter 30002 and whether the inaccuracy of the clock affected the facts relied upon as produced by the device, which was otherwise in working order. The court concluded that if there was a malfunction, it was only relevant if it affected the way in which the computer processed, stored or retrieved the information used to generate the statement tendered in evidence. This must be right. Regarding breathalyser cases, in Director of Public Prosecutions (DPP) v Manchester and Salford Magistrates’ Court3 Sir Brian Leveson P gave the judgment, and illustrated what the courts expected from the defence:

[54] … there must be a proper evidential basis for concluding that the material sought is reasonably capable of undermining the prosecution or of assisting the defence, or that it represents a reasonable line of enquiry to pursue.

55. … It is not enough to say that the defence case is that the amount drunk would not put the defendant over the limit or anywhere near it, and therefore the machine must be unreliable. What the evidence needed to do, in order to provide a basis for such a disclosure order was to address two critical features.

56. The first requirement is the basis for contending how the device might produce a printout which, on its face, demonstrated that it was operating in proper fashion, but which could generate a very significantly false positive reading, where, on the defence case, the true reading would have been well below the prosecution limit. The second requirement is to identify how the material which was sought could assist to demonstrate how that might have happened. Those are the two issues which arise and which the expert evidence in support of disclosure should address. Unless that evidence is provided, the disclosure is irrelevant.

58. … unless the disclosure application addresses the two questions which we have identified, this extensive disclosure would have to be given in every case in which a defendant alleged that his alcohol consumption had been too low to sustain a positive reading, and in effect proof of reliability would always be required and the presumption of accuracy would be displaced.

1[1997] 1 WLR 295, [1997] 1 All ER 737, [1997] 2 WLUK 386, [1997] 2 Cr App R 155 (HL), (1997) 161 JP 356, [1997] RTR 162, [1997] Crim LR 522, (1997) 161 JPN 482, (1997) 147 NLJ 289, Times, 21 February 1997, Independent, 7 March 1997, [1997] CLY 1093; Philip Plowden, ‘Garbage in, garbage out – the limits of s 69 of the PACE Act 1984’ (1997) 61 Journal of Criminal Law 310; for an earlier case where the defence challenged the accuracy of the Intoximeter printout, see Ashton v DPP, [1995] 6 WLUK 298, (1996) 160 JP 336, [1998] RTR 45, Times, 14 July 1995, Independent, 10 July 1995, [1995] CLY 4416; ‘Ashton v DPP’ (1996) 60 Journal of Criminal Law 350.

2The range of approved devices constantly alters, but the case law relating to older devices remains relevant. For a more detailed discussion, see the most up-to-date edition of Wilkinson’s Road Traffic Offences, Sweet & Maxwell.

3[2017] EWHC 3719 (Admin), [2019] WLR 2617, [2017] 7 WLUK 154 also known as DPP v Manchester and Salford Magistrates’ Court; see also DPP v Walsall Magistrates’ Court [2019] EWHC 3317 (Admin), [2019] 12 WLUK 61, [2020] RTR 14, [2020] Crim LR 335, [2020] ACD 21, [2020] 5 CL 43; Peter Hungerford-Welch, ‘Disclosure: DPP v Walsall Magistrates’ Court; DPP v Lincoln Magistrates’ Court QBD (DC): Lord Burnett LCJ and May J: 5 December 2019; [2019] EWHC 3317 (Admin)’ (2020) 4 Crim LR 335; for a speeding case regarding an approved measurement device, an LTi 20.20 Ultralyte 1000, with a Ranger system to make a video record of the use of the device and its results, see R (on the application of DPP) v Crown Court at Caernarfon [2019] EWHC 767 (Admin), [2019] 3 WLUK 830.

5.230 Where the evidential burden has been successfully raised to challenge an aspect of the digital data (whether it be its integrity or reliability),1 then the persuasive burden will be on the party denying any error to prove the computer (normally the software), computer-like device or computer system is not at fault, thus demonstrating its reliability, integrity and trustworthiness and therefore the authenticity of the evidence tendered. One test is to determine how many important or critical updates of the software were made available and downloaded before the material time, and whether, if such updates were downloaded, they had a detrimental effect on the subsequent operation of the software. Claimants face a considerable problem with ATM cases because so much can go wrong, and it can be difficult to raise sufficient evidence to shift the burden: an outsider or a bank employee might have subverted the system, or a part of the system, or a hardware device forming part of the ATM network (or a cloned card is used) in such a way that money is stolen from the account of an individual.2 In such circumstances, the electronic record adduced to prove the transaction may be perfectly reliable – what will be at issue is how the thief subverted the network to steal the money. In the case of Marac Financial Services Ltd v Stewart,3 Master Kennedy-Grant observed:

The use of computers for the recording of transactions on accounts such as the cash management account in this case is sufficiently well established for there to be a presumption of fact that such computers are accurate.4

1As in Young v Flint [1986] 4 WLUK 218, [1987] RTR 300, [1988] CLY 3120, where the defence wished to cross-examine the witness respecting modifications made to the device to determine whether the machine ceased to be an approved device.

2Ken Lindup, ‘Technology and banking’; Roger Porkess and Stephen Mason, ‘Looking at debit and credit card fraud’ (2012) 34(3) Teaching Statistics 87.

3[1993] 1 NZLR 86.

4[1993] 1 NZLR 86, [40]. Examples of where banks have not been found to be fully in control of their systems include Patty v Commonwealth Bank of Australia [2000] FCA 1072, Industrial Relations Court of Australia VI-2542 of 1996; United States of America v Bonallo, 858 F.2d 1427 (9th Cir. 1988); Kumar v Westpac Banking Corporation [2001] FJHC 159; Sefo v R [2004] TOSC 51; R v Clarke [2005] QCA 483.

5.231 Master Kennedy-Grant did not provide any evidence to substantiate this statement.

‘Working properly’

5.232 The Law Commission made comments about the presumption in Evidence in Criminal Proceedings: Hearsay and Related Topics at 13.14:

Where a party sought to rely on the presumption, it would not need to lead evidence that the computer was working properly on the occasion in question unless there was evidence that it may not have been – in which case the party would have to prove that it was (beyond reasonable doubt in the case of the prosecution, and on the balance of probabilities in the case of the defence).

5.233 Three significant problems occur with the judicial comments on this topic. First, there is no definition of what is meant by ‘working properly’. A computer might be working ‘properly’ but not in the way an owner expects, and a third party can instruct a computer to do things that the owner neither authorizes nor is aware of. Second, the reliability of the evidence generated by a computer will not always be apparent without recourse to establishing whether there is a fault in the software code.

5.234 The third problem is that the presumption asserts something positive. The opposing party is required to raise a doubt in the absence of relevant evidence from the program or programs that are relied upon. In criminal proceedings, this has the unfair effect of undermining the presumption of innocence – subverting any article 6 rights under the European Convention on Human Rights and the Human Rights Act 1998 that the accused might have – and in civil proceedings the party challenging the presumption must convince a judge to order the delivery of the relevant evidence, including software code, if the evidence is to be tested properly.

5.235 There is no authoritative judicial guidance in relation to the meaning of the words ‘reliable’, ‘in order’ or ‘working properly’ in the context of digital data. It is possible to refer to system reliability, interpreted broadly, as a measure of how a system matches the expectations of the user, but this view is problematic, because the expectations may be mistaken and can change arbitrarily, sometimes based on the user’s experience. A narrower definition relates reliability to the success with which a system provides the specified service.1 Professor Randell and colleagues illustrate the conundrum: ‘It is of course to be hoped that the reliance placed on a system will be commensurate with its reliability.’ Herein lies the rub: ‘Notions of reliance, therefore, can be as much bound up with psychological attitudes as with formal decisions regarding the requirement that a system is supposed to satisfy.’2 The authors continue:

In fact, the history of the development of computers has seen some fascinating interplay between reliance and reliability. The reliability of early computers caused relatively little reliance to be placed on the validity of their outputs, at least until appropriate checks had been performed. Even less reliance was placed on the continuity of their operation – lengthy and frequent periods of downtime were expected and tolerated. As reliability increased so did reliance, sometimes in fact outdistancing reliability so that additional efforts had to be made to reach previously unattained reliability levels. During this time computing systems were growing in size and functional capacity so that, although component reliability was being improved, the very complexity of systems was becoming a possible cause of unreliability, as well as a cause of misunderstandings between users and designers about system specification.3

1Randell and others, ‘Reliability issues in computing system design’, 123.

2Randell and others, ‘Reliability issues in computing system design’, 124.

3Randell and others, ‘Reliability issues in computing system design’, 124. That IT projects invariably cost more than estimated, overrun and sometimes fail to be implemented is a notorious fact. A citation (or citations) is not necessary.

5.236 In considering a number of examples of reliability issues, Professor Randell indicates that the design of software is inextricably intertwined with the other factors that are responsible for the failure of computer projects:1

reliability is a commodity whose provision involves costs, either direct, or arising from performance degradation. In theory, the design of any nontrivial computing system should involve careful calculations of trade-offs between reliability, performance, and cost. In practice the data and relationships which would be needed for such calculations in complex systems, are quite often unknown, particularly with regard to unreliability caused by residual design faults.2

1For a more detailed treatment of the causes of the failure of projects, see Glass, Software Runaways; Planning Report 02-3 The Economic Impacts of Inadequate Infrastructure for Software Testing, prepared by RTI for the National Institute of Standards & Technology (May 2002), https://www.nist.gov/system/files/documents/director/planning/report02-3.pdf; Charette, ‘Why software fails’, 42.

2Randell and others, ‘Reliability issues in computing system design’, 127.

5.237 Linden pointed out that reliability ‘means not freedom from errors and faults, but tolerance against them. Software need not be correct to be reliable’,1 and Denning indicated that although ‘reliability, in the sense of error tolerance, has long been sought in operating system software, it has always been difficult to achieve’.2 Responsible practice will often include processes such as the maintenance and review of defect records, and testing or requalification of an upgrade before it is distributed: these are some of the issues about which questions can legitimately be asked by a party seeking to challenge the presumption of ‘reliability’.

1Peter J. Denning, ‘Fault tolerant operating systems’ (1976) 8(4) ACM Computing Surveys 359, 361.

2Denning, ‘Fault tolerant operating systems’, 359.
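
Linden’s distinction between freedom from errors and tolerance of them can be illustrated with a deliberately simplified sketch; the function names, values and the defect below are invented for the purpose and do not represent any real system. The fault-tolerant wrapper masks a latent defect, so that in ordinary use the routine appears ‘reliable’ in the sense of error tolerance even though it is not correct.

    # Hypothetical illustration of 'tolerance against errors' rather than
    # 'freedom from errors': parse_reading contains a latent defect, but the
    # tolerant wrapper substitutes a default value, so callers never see the
    # failure and the defect leaves no visible trace in normal operation.

    def parse_reading(raw: str) -> float:
        value = float(raw)
        if value < 0:
            # Latent defect (assumed for illustration): legitimate negative
            # calibration offsets are wrongly rejected as errors.
            raise ValueError("negative reading rejected")
        return value

    def tolerant_parse(raw: str, default: float = 0.0) -> float:
        # Error tolerance: the fault is caught and masked, not removed.
        try:
            return parse_reading(raw)
        except ValueError:
            return default

    print(tolerant_parse("35.2"))   # 35.2 - behaves as expected
    print(tolerant_parse("-0.1"))   # 0.0 - the defect is masked, not absent

Records of defects and of the testing or requalification of upgrades, mentioned above, are the kind of material that might reveal how often such masking occurs in practice.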

Concluding remarks

5.238 It is proposed that the proponents of a presumption that computers and computer systems ‘were in order at the material time’ should state what is meant by such a proposition if it is to remain – if they are able to, given that the notion that computers are ‘reliable’ has finally been exposed as erroneous.1 In Holt v Auckland City Council, Richardson J observed the need to provide evidence to justify reliance:

The results depend on the manner in which it is programmed. And there is no basis on which the Court could take judicial notice of the manner in which this equipment was programmed and maintained. Evidence was necessary to justify reliance on the computer print out.2

1Ladkin and others, ‘The Law Commission presumption concerning the dependability of computer evidence’; Ladkin, ‘Robustness of software’; Jackson, ‘An approach to the judicial evaluation of evidence from computers and computer systems’.

2[1980] 2 NZLR 124 at 128 (35–40).

5.239 It does not appear that any thought has been given to demonstrating what the proposition means. The Law Commission specifically commented on the contrary argument made by David Ormerod, now Professor Ormerod, to their proposal to repeal s 69. Professor Ormerod ‘contended that the common law presumption of regularity may not extend to cases in which computer evidence is central’.1 This comment by Professor Ormerod must be right.

1Evidence in Criminal Proceedings: Hearsay and Related Topics at 13.16.

5.240 In Scott v Baker,1 Lord Parker CJ and his brother judges rejected the argument of the prosecution that there was a presumption that where an alcohol measuring device was used by the police, it therefore followed that the device was approved by the Secretary of State. The Law Commission agreed that this presumption must have been applicable to the Intoximeter cases, and yet noted that this had not been raised in previous cases. They then went on, at 13.17, to state (footnote omitted):

It should also be noted that Dillon was concerned not with the presumption regarding machines but with the presumption of the regularity of official action. This latter presumption was the analogy on which the presumption for machines was originally based; but it is not a particularly close analogy, and the two presumptions are now clearly distinct.

1[1969] 1 QB 659, [1968] 3 WLR 796, [1968] 2 All ER 993, [1968] 5 WLUK 42, (1968) 52 Cr App R 566, (1968) 132 JP 422, (1968) 112 SJ 425, [1968] CLY 3428; ‘Divisional court cases breath tests: approval of Device Scott v. Baker’ (1968) 32 The Journal of Criminal Law 151.

5.241 Professor Ormerod referred to Dillon1 for the point that the prosecution is not entitled to rely on a presumption to establish facts central to an offence, and it is essential for the prosecution to prove, on the facts of Dillon, the lawfulness of the prisoner’s detention by affirmative evidence.2 In his article, Professor Ormerod argued that where evidence in digital form is fundamental, such as in bank frauds, it will be necessary to require specific proof of reliability. This proposition must be correct: the presumption on its own cannot bear the weight of proof beyond reasonable doubt.

1Dillon v R [1982] AC 484, [1982] 2 WLR 538, [1982] 1 All ER 1017, [1982] 1 WLUK 749, (1982) 74 Cr App R 274, [1982] Crim LR 438, (1982) 126 SJ 117, [1982] CLY 547.

2David Ormerod, ‘Proposals for the admissibility of computer evidence’ (1995) 6(4) Computers and Law 24.

5.242 In the absence of evidence that such a presumption can possibly apply to such complex objects as computers and computer systems, it is suggested that any presumption that a computer or computer-like machine is working properly be guided by considerations as to how ‘correct operation’, ‘quality’, ‘reliability’ and ‘integrity’ can be incorporated within the evaluation of the presumption.1 It cannot be right to infer ‘reliability’ from reliance.

1The Model Law on Electronic Evidence (Commonwealth Secretariat 2017) https://thecommonwealth.org/sites/default/files/key_reform_pdfs/P15370_7_ROL_Model_Bill_Electronic_Evidence_0.pdf also refers to ‘reliability’ at 2: ‘The Group agreed that system reliability is the most sensible measurement’, and articles 7 and 7(a) provide for the presumption that the integrity of the electronic records system is working properly – as is normal with such pronouncements, no evidence was put forward to substantiate this assertion.

5.243 As it stands, the presumption places an evidential burden (in reality, as noted above, it is a legal burden) on the party opposing the presumption, as described by Tipping J: ‘The accused must be able to point to a sufficient evidential foundation for the suggestion that the device was unreliable in the relevant sense, before being entitled to have the point considered by the jury. If there is such a foundation, the Crown must establish reliability beyond reasonable doubt’,1 and careful consideration ought to be given to the hurdle a party must overcome in order to meet the evidential burden. In this respect, the defence was correct to challenge the evidence of the CD which contained the intercepted recordings in R. v Saward (Steven Kevin), R. v Bower (Steven Kevin), R. v Harrison (Keith),2 because, had the prosecution more thoroughly ensured the continuity of the evidence, it is possible the defence would not have had a legitimate objection. In Scott v Otago Regional Council, Heath J indicated that cross-examination on relevant points can be sufficient to put the point in issue, which must be right (although the cross-examination might more usefully have also considered questioning how many software updates were provided by the manufacturer of the product that corrected faults):

No evidence was offered about the reliability of the computer and software used to establish that they were ‘of a kind that ordinarily [do] what a party asserts [them] to have done’.3 Mr Reeves offered no evidence that he had used the programme successfully in the past and had found it to be working normally. Nor was there any independent evidence to explain how the computer programme worked and what it could reliably be expected to do. In a prosecution such as this, Mr Andersen’s cross-examination of Mr Reeves was sufficient to put the point in issue.4

1R v Livingstone [2001] 1 NZLR 167 at [13].

2[2005] EWCA Crim 3183, [2005] 11 WLUK 351.

3Where the basic fact of the presumption is not satisfied, the presumption fails.

4CRI 2008-412-17-20, High Court Dunedin, 3 November 2008, [2008] Your Environment 392; 31 TCL 48/8 at [33].

5.244 The Law Commission indirectly discussed ‘reliability’ at para 13.18, Evidence in Criminal Proceedings: Hearsay and Related Topics, but only by referring to the possibility of a ‘malfunction’. The entire discussion seems to be predicated upon machines used to test the amount of alcohol a person has consumed, rather than the very much broader range of computers and computer-like devices that are in common use:

Even where the presumption applies, it ceases to have any effect once evidence of malfunction has been adduced. The question is, what sort of evidence must the defence adduce, and how realistic is it to suppose that the defence will be able to adduce it without any knowledge of the working of the machine? On the one hand the concept of the evidential burden is a flexible one: a party cannot be required to produce more by way of evidence than one in his or her position could be expected to produce. It could therefore take very little for the presumption to be rebutted, if the party against whom the evidence was adduced could not be expected to produce more.

5.245 The comments by Lord Hoffmann in DPP v McKeown (Sharon), DPP v Jones (Christopher),1 in which he offered the opinion that ‘It is notorious that one needs no expertise in electronics to be able to know whether a computer is working properly’,2 can be considered an extreme view that will not be shared by computer experts – or indeed by lay people. His comment is not merely dangerous but also vacuous. It is like saying that you do not need to know the chemistry of ink to know whether writing works. This is not relevant, because you can still write nonsense, regardless of the chemical properties of the ink. It is noticeable that paragraph 432 of the Explanatory Notes to the Criminal Justice Act 2003 indicated that, in respect of testimony under s 129(1):

This section provides where a statement generated by a machine is based on information implanted into the machine by a human, the output of the device will only be admissible where it is proved that the information was accurate.

1[1997] 1 WLR 295, [1997] 1 All ER 737, [1997] 2 WLUK 386, [1997] 2 Cr App R 155 (HL), (1997) 161 JP 356, [1997] RTR 162, [1997] Crim LR 522, (1997) 161 JPN 482, (1997) 147 NLJ 289, Times, 21 February 1997, Independent, 7 March 1997, [1997] CLY 1093.

2[1997] 1 All ER 737 at 743b.

5.246 Here the emphasis is on the accuracy of the information supplied as an input to the computer, not on whether the computer was working consistently – or, to put it another way, whether the system was working in accordance with expectations, or whether the computer was able to return verifiably correct results. The problem is that Lord Hoffmann considered the issue from the opposite perspective: an assumption that the computer is working properly because of what the user can see, not what an unknown third party does not want them to see, or attempts to prevent anyone from seeing and understanding what else the computer is doing without the knowledge of the owner or user.
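
The distinction can be expressed in a short, purely hypothetical sketch; the function names, figures and the rounding defect are invented for illustration. In the first case the machine processes the data correctly but the human-supplied input is inaccurate, which is the concern addressed by s 129(1); in the second the input is accurate but a silent defect in the processing corrupts the output, which is the situation that an observer relying only on what the user can see would not detect.

    def record_transaction(amount_entered: float) -> float:
        # The machine simply stores what it is given: it is 'working properly'.
        return amount_entered

    # Case 1 - the concern of s 129(1): accurate processing, inaccurate input.
    print(record_transaction(1000.00))       # operator mis-keyed 1000.00 for 100.00

    def record_transaction_buggy(amount_entered: float) -> float:
        # Case 2 - accurate input, faulty processing: a hypothetical defect
        # rounds every amount to the nearest 100 before it is stored.
        return round(amount_entered, -2)

    print(record_transaction_buggy(149.99))  # prints 100.0 - silently wrong

Nothing in the second case is visible to a user of the system; only an examination of the code, or of records of its behaviour, would reveal the defect.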

5.247 As a matter of admissibility, it is presumed, rather than proved, that a computer, computer-like device or network (comprising many computers and modes of communication) was ‘in order’ at the material time – indeed, in England and Wales, s 129(2) of the Criminal Justice Act 2003 preserves the common law position:

129 Representations other than by a person

(1) Where a representation of any fact—

(a) is made otherwise than by a person, but

(b) depends for its accuracy on information supplied (directly or indirectly) by a person,

the representation is not admissible in criminal proceedings as evidence of the fact unless it is proved that the information was accurate.

(2) Subsection (1) does not affect the operation of the presumption that a mechanical device has been properly set or calibrated.1

1Law Commission, Evidence in Criminal Proceedings: Hearsay and Related Topics (Law Com no 245) (19 June 1997), 7.50.

5.248 That software is notorious for being the subject of defects leads to a somewhat uneasy state of affairs. It cannot be right to presume that a machine (in particular a computer, computer-like device or network) was ‘in order’ (whatever that means) or ‘reliable’ at the material time. The proponents of the presumption have not provided any evidence to demonstrate the accuracy of this assertion. Evidence in digital form is not immune from being affected by the faults in software written by human beings. The use of the words ‘operating properly’ illustrates the misconceptions described in this chapter.

5.249 The lack of any evidence to support the proposition is especially relevant in the light of the underlying rationale of evidence. In A Philosophy of Evidence Law: Justice in the Search for Truth,1 Professor Hock Lai Ho demonstrates that the finder of facts acts as a moral agent, and central to this is that the findings by a court must be justifiable, and meet the demands of rationality and ethics.2 When read in the light of the unique characteristics of evidence in digital form, the rationale of the evidential process takes on an even more relevant role. This is because the factors and subsequent analysis have an added poignancy when taking into account the complexity of electronic evidence: the potential volumes of evidence, the difficulty of finding evidence, persuading the judge to order additional searches or to order the disclosure of relevant digital data, the ease with which electronic evidence can be destroyed, the costs of such exercises, the lawyer’s lack of knowledge when dealing with this form of evidence and the presumption that computers are ‘reliable’ or ‘working properly’. In this respect, the inadequacy of the procedure leading to trial brought about by an incomplete understanding and application of the presumption may cause unfairness.

1Oxford University Press 2008.

2Note the article by Louis Kaplow, ‘Burden of proof’ (2012) 121 Yale LJ 738, in which the author considers how robust the evidence ought to be in order to assign liability when the objective is to maximize social welfare.

5.250 The question is whether the presumption is to remain in its misunderstood form as a legal presumption. The failure of its proponents to provide evidence that the presumption has any basis in fact is a strong indication that it does not merit being in place, and any argument in favour of the proposition ought to clearly indicate why banking systems, manufacturers of motor vehicles, aircraft and medical devices – to name but a few – should be rewarded by such a presumption. In addition, the innumerable examples of the failure of software outlined in this chapter, and other failures that are constantly brought to our attention by the media, as well as the failures we witness ourselves in our everyday lives, act to challenge why software code should benefit from such a presumption. This is particularly so when evidence in digital form is more likely to be open to challenge, as illustrated above.

5.251 In addition, considering that the presumption is only an evidential presumption, the bar for raising doubts about the reliability or otherwise of a computer, computer-like device or network must not be placed too high.1 For instance, in DPP v Wood, DPP v McGillicuddy Ouseley J indicated (in respect of the Intoximeter EC/IR):

The nature and degree of an alleged unreliability has to be such that it might be able to throw doubt on the excess in the reading to such an extent that the level of alcohol in the breath might have been below the level at which a prosecution would have been instituted.2

1Sergey Bratus, Ashlyn Lembree and Anna Shubina, ‘Software on the witness stand: what should it take for us to trust it?’ in Alessandro Acquisti, Sean Smith and Ahmad-Reza Sadeghi (eds) Trust and Trustworthy Computing: Proceedings of the Third International Conference, TRUST 2010, Berlin, Germany, 21–23 June 2010 (Springer-Verlag 2010), 396–416.

2[2006] EWHC 32 (Admin) at [36].

5.252 However, as indicated by Eric Van Buskirk and Vincent T. Liu:

The Presumption of Reliability is difficult to rebut. Unless specific evidence is offered to show that the particular code at issue has demonstrable defects that are directly relevant to the evidence being offered up for admission, most courts will faithfully maintain the Presumption of Reliability. But because most code is closed source and heavily guarded, a party cannot audit it to review its quality. At the same time, however, source code audits are perhaps the best single way to discover defects.

This difficulty gives rise to an important question: if a party cannot gain access to source code without evidence of a defect, but cannot get evidence of a defect without access to the source code, how is a party to rebut the Presumption? Rather than wrestle with, or even acknowledge, this conundrum, most courts simply presume that all code is reliable without sufficient analysis. (Footnotes omitted.) 1

1Van Buskirk and Liu, ‘Digital evidence: challenging the presumption of reliability’, 20.

5.253 This view is illustrated in the case of State of Florida v Bastos,1 an appeal before the District Court of Appeal of Florida, Third District, where Cope J held that source code for an Intoxilyzer 5000 breath test machine used in the defendants’ cases was not ‘material’ within the meaning of the provisions of the uniform law to secure the attendance of witnesses from within or outside a state in criminal proceedings. The judge went on to say:

However, we cannot accept the proposition that simply because a piece of testing equipment is used in a criminal case, it follows that the source code for its computer must be turned over. There would need to be a particularized showing demonstrating that observed discrepancies in the operation of the machine necessitate access to the source code. We are unable to see that any such evidence was brought forth in the evidentiary hearing below.2

1985 So.2d 37 (Fla.App. 3 Dist. 2008). In State of North Carolina v Marino, 747 S.E.2d 633 (N.C.App. 2013), the court refused to accept that either the decision of the Supreme Court in Crawford v Washington, 541 U.S. 36, 51, 124 S.Ct. 1354, 158 L.Ed.2d 177, 192 (2004), or the decision in Melendez–Diaz v Massachusetts, 557 U.S. 305, 310–11, 129 S.Ct. 2527, 174 L.Ed.2d 314, 321–22 (2009) stood for the proposition that a defendant had a right under the Sixth Amendment to examine the Intoximeter source code. But see In re Commissioner of Public Safety v Underdahl, 735 N.W.2d 706 (Minn. 2007) and State of Minnesota v Underdahl, 767 N.W.2d 677 (Minn. 2009), where it was held that an order that the Commissioner of Public Safety provide Mr Underdahl with an operational Intoxilyzer 5000EN instrument and the complete computer source code for the operation of the device was affirmed partly on the basis that the State had possession or control of computer source code for the purposes of discovery.

2985 So.2d 37 (Fla.App. 3 Dist. 2008) at 43.

5.254 The party contesting the presumption will rarely be in a position to offer significant evidence to substantiate any challenge1 because the party facing the challenge will generally (but not always) be in full control of the computer or computer systems that are the subject of the challenge.2 Offering an explanation that is not reinforced with any evidence will not be sufficient, for which see Burcham v Expedia, Inc.,3 and a theory that is ‘incredible’ should not require the court to consider the matter in any detail.4 The lack of evidence for raising doubts about the presumption is not helpful, for which see Public Prosecution Service v McGowan.5 From the perspective of criminal procedure, it must be right that the defence should give the prosecution advance notice that they intend to challenge the device, as suggested in R. v Crown Prosecution Service Ex p. Spurrier6 by Newman J:

As a matter of general rule, I can see no reason why the defence should not be taken to be required, of course on pain of paying the costs of an adjournment if that proves to be necessary, to give some notice in advance of the trial of the grounds upon which a claim that the device was defective will be advanced.7

1For an interesting discussion that includes the burden in the context of authentication, see Rudolph J. Peritz, ‘Computer data and reliability: a call for authentication of business records under the federal rules of evidence’, 965–1002.

2It is becoming increasingly common for organizations and individuals to rely on third parties to provide computing facilities through what is termed ‘cloud computing’ by the technical community; for a detailed explanation, see Stephen Mason and Esther George, ‘Digital evidence and “cloud” computing’ (2011) 27(5) Computer Law & Security Review 524.

32009 WL 586513.

4For which see Novak d/b/a PetsWarehouse.com v Tucows, Inc., 73 Fed. R. Evid. Serv. 331, 2007 WL 922306 affirmed Novak v Tucows, Inc., 330 Fed.Appx. 204, 2009 WL 1262947.

5[2008] NICA 13, [2009] NI 1.

6[1999] 7 WLUK 431, (2000) 164 JP 369, [2000] RTR 60, Times, 12 August 1999, [1999] CLY 883, also known as DPP v Spurrier.

7[2000] RTR 60, 68 item (6).

5.255 The evidence of relevant audits is also of significance, such as where John Rusnak forged trades in a Word document and an audit failed to indicate the forgery;1 and where Nick Leeson forged data that was not noticed by audits.2 The importance of audits was glaringly revealed in A and others (Human Fertilisation and Embryology Act 2008).3 Following Cobb J’s judgment in E (Assisted Reproduction: Parent), Re,4 the HFEA (Human Fertilisation and Embryology Authority) required all 109 licensed clinics to carry out an audit of their records. It transpired that 51 clinics (46 per cent) had discovered ‘anomalies’ in their records, including missing forms, forms completed or dated after treatment had begun, incorrectly completed, unsigned and not fully completed forms, forms with missing pages, and even forms completed by wrong persons.5 Sir James Munby, President of the Family Division, had this to say:

The picture thus revealed … is alarming and shocking. This is, for very good reason, a medical sector which is subject to detailed statutory regulation and the oversight of a statutory regulator – the HFEA. The lamentable shortcomings in one clinic identified by Cobb J, which now have to be considered in the light of the deeply troubling picture revealed by the HFEA audit and by the facts of the cases before me, are, or should be, matters of great public concern. The picture revealed is one of what I do not shrink from describing as widespread incompetence across the sector on a scale which must raise questions as to the adequacy if not of the HFEA’s regulation then of the extent of its regulatory powers.6

1Siobhán Creaton and Conor O’Clery, Panic at the Bank: How John Rusnak Lost AIB $691,000,000 (Gill & Macmillan 2002), 96–97.

2Nick Leeson with Edward Whitley, Rogue Trader (Sphere 2013), 117, 120–121, 239; see also Report of the Board of Banking Supervision Inquiry into the Circumstances of the Collapse of Barings (ordered by The House of Commons to be printed 18 July 1995) (HMSO 1995), chapters 9 and 10 and conclusions 13.4(b) and (c) at 232.

3[2015] EWHC 2602 (Fam), [2016] 1 WLR 1325, [2016] 1 All ER 273, [2015] 9 WLUK 234, [2017] 1 FLR 366, [2015] 3 FCR 555, (2015) 146 BMLR 123, [2015] Fam Law 1333, [2016] CLY 928.

4[2013] EWHC 1418 (Fam), [2013] 5 WLUK 682, [2013] 2 FLR 1357, [2013] 3 FCR 532, [2013] Fam Law 962, (2013) NLJ 163(7563) 19, [2014] CLY 1408, also known as AB v CD.

5A and others (Human Fertilisation And Embryology Act 2008) [2015] EWHC 2602 (Fam), Sir James Munby P at [7]‌.

6[2015] EWHC 2602 (Fam) at [8]‌.

5.256 In Bates v Post Office Ltd (No 6: Horizon Issues) Rev 1,1 evidence was finally adduced regarding, and witnesses cross-examined upon, a Management Letter dated 27 March 2011 from Ernst & Young, which, as set out at [393] of the judgment, provided the following revealing information about the Horizon system:

The main area we would encourage management focus on in the current year is improving the IT governance and control environment. Within the IT environment our audit work has again identified weaknesses mainly relating to the control environment operated by POL’s third party IT suppliers. Our key recommendations can be summarised into the following four areas:

Improve governance of outsourcing application management

Improve segregation of duties within the manage change process

Strengthen the change management process

Strengthen the review of privileged access.

1[2019] EWHC 3408 (QB), [2019] 12 WLUK 208.

5.257 Fraser J emphasized the last point: privileged access. It had been alleged for years that employees of Fujitsu had privileged access to the entire system, and could log into any computer connected to the Horizon system. The judge described the nature of the privileged access at [389]:1

This entry was from Andy Beardmore, Senior Software and Solution Design Architect Application Services. The experts are agreed that the APPSUP role would, effectively, permit anyone who had that permission to do almost anything on Horizon. It was available to 3rd line support at SSC, the level at which Mr Roll was employed by Fujitsu. This PEAK further substantiates the evidence of Mr Roll and is consistent with it. APPSUP was described by Mr Parker as ‘the more technically correct name for a type of privileged access to the BRDB’. It is a very powerful permission.

1See also [423] in the Technical Appendix to the judgment.

5.258 Employees and officers of the Post Office repeatedly denied that this was possible. Mr Richard Roll, previously employed by Fujitsu, gave evidence that employees of Fujitsu had such privileged access, and the eventual disclosure of the Ernst & Young audit served to corroborate his evidence.

5.259 The banking cases also illustrate the nature of the problem,1 as do the unintended acceleration cases. Crucially, in the US Bookout case, one of the high-profile unintended acceleration cases, Selna J ordered the disclosure of the software code.2 The explanation for this might lie in two significant, and rather fortuitous, factors. When Jean Bookout was driving her 2005 Toyota Camry, it suddenly accelerated. She responded by pulling the parking brake; as a result, the right rear tyre left a 100-foot skid mark and the left tyre a 50-foot skid mark. The vehicle continued to speed down a ramp, across the road, and came to rest with its nose in an embankment, injuring her and killing her passenger and best friend, Barbara Schwarz. Before she died, Schwarz called her husband and said ‘Jean couldn’t get her car stopped. The car ran away with us. There’s something wrong with the car.’3 Both the skid marks and the telephone call by Barbara Schwarz undermined any suggestion that the acceleration was due to a physical problem in the cabin of the vehicle.

1Gerwin Haybäck, ‘Civil law liability for unauthorized withdrawals at ATMs in Germany’ (2009) 6 Digital Evidence and Electronic Signature Law Review 57; Mason, ‘Debit cards, ATMs and negligence of the bank and customer’, 163; Nuth, ‘Unauthorized use of bank cards with or without the PIN’; Mason, ‘Electronic banking and how courts approach the evidence’, 144.

2United States of America, Central District of California, Case Protective Order In re: Toyota Motor Corp. Unintended Acceleration Marketing, Sales Practices and Products Liability Litigation, Case Number: 8:10ML2151 JVS (FMOx) (2018) 15 Digital Evidence and Electronic Signature Law Review 98.

3Antony Anderson, ‘Sudden acceleration, spaghetti software and trauma at the kitchen sink’ (2014) Expert Witness Journal (no pagination), http://blog.copernicustechnology.com/wp-content/uploads/2014/05/Uncommanded-Acceleration-article.pdf; ‘Sudden unintended acceleration redux: the unresolved issue’ (2009) 6(3) The Safety Record, http://www.safetyresearch.net/blog/articles/sudden-unintended-acceleration; given that the Bookout case demonstrated the claims of the plaintiff, the decision of Carr J in Buck v Ford Motor Company, 810 F.Supp.2d 815 (N.D.Ohio 2011) to exclude a number of important expert witnesses, while permitting the expert witness for Ford (an employee) to give evidence, is open to question.

5.260 As Professor Peritz pointed out in 1986:

Computers provide an illusory basis for shortcircuiting traditional legal processes because they cannot be isolated from the people that build and run them. They simply cannot guarantee error-free processing.1

1Peritz, ‘Computer data and reliability’, 1000; Lynda Crowley-Smith made the same point in ‘The Evidence Act 1995 (Cth): should computer data be presumed accurate?’ (1996) 22(1) Monash University Law Review 166.

5.261 This is why lawyers and members of the judiciary need to understand two significant issues about the world in which we now live, and our reliance on software code. First, the evidential presumption that software code is ‘reliable’ must be reconsidered – or at least more carefully understood. The rationale used by judges that software code is part of a ‘notorious’ class of machines, or that the operation of computers and other such devices is ‘common knowledge’, must be reversed. In his speech Science and Law: Contrasts and Cooperation before the Royal Society in London on 25 November 2015,1 Lord Neuberger said that ‘scientists and lawyers each search for and assess hard facts from which they can establish the truth’,2 yet lawyers and judges rely on ‘common sense’ when many ‘well-established principles are positively contrary to common sense’.3 Justifying the presumption by reference to loose notions of ‘notorious’ or ‘common knowledge’ in respect of software programs is irrational, and justice should not be based on concepts with no basis in logic or science. Lawyers and judges need to take account of this element of irrationality, which has been part of the law for far too long. To resolve the problem expeditiously, an appellate court could adjust the presumption by restricting it to mechanical instruments and to instruments for which statutory presumptions exist. Thereafter, if it is treated as an evidential presumption instead of a legal presumption, it will be for the proponent to prove the reliability (if the term ‘reliability’ is to be used) of the software. Evidence of reliability will not always be required, and no doubt suitable procedural mechanisms can be put in place to allow a party to require relevant evidence of reliability where it is challenged.

1https://www.supremecourt.uk/docs/speech-151124.pdf.

2Lord Neuberger, Science and Law: Contrasts and Cooperation, [9]‌.

3Lord Neuberger, Science and Law: Contrasts and Cooperation, [13].

5.262 Second, judges should understand the necessity of requiring the disclosure of software code and relevant audits of systems, and determine whether security standards, if applied, have been applied properly.1 This problem has been acknowledged by the European Court of Human Rights:

In the context of disclosure of evidence, complex issues may arise concerning the disclosure of electronic data, which may constitute a certain mass of information in [the] hands of the prosecution. In such a case, an important safeguard in the sifting process is to ensure that the defence is provided with an opportunity to be involved in the laying-down of the criteria for determining what might be relevant for disclosure.2

1Failures in banking systems used by millions of customers are demonstrated in Murdoch and others, ‘How certification systems fail’.

2Guide on Article 6 of the European Convention on Human Rights, Right to a Fair Trial (Criminal Limb) (31 August 2020), para 166.

5.263 A recent case in the State of New Jersey in the US illustrates that judges may conclude that disclosure is essential.1 In State of New Jersey v Pickett,2 the Superior Court of New Jersey, Appellate Division decided to permit the disclosure of the software code of a program called TrueAllele, described by Fasciale PJAD, for the court, as ‘software designed to address intricate interpretational challenges of testing low levels or complex mixtures of DNA’. The court agreed the criteria that a judge should consider when deciding whether to permit disclosure or discovery of software code, at [284]:

We hold that if the State chooses to utilize an expert who relies on novel probabilistic genotyping software to render DNA testimony, then defendant is entitled to access, under an appropriate protective order, to the software’s source code and supporting software development and related documentation—including that pertaining to testing, design, bug reporting, change logs, and program requirements—to challenge the reliability of the software and science underlying that expert’s testimony at a Frye hearing, provided defendant first satisfies the burden of demonstrating a particularized need for such discovery. To analyze whether that burden has been met, a trial judge should consider: (1) whether there is a rational basis for ordering a party to attempt to produce the information sought, including the extent to which proffered expert testimony supports the claim for disclosure; (2) the specificity of the information sought; (3) the available means of safeguarding the company’s intellectual property, such as issuance of a protective order; and (4) any other relevant factors unique to the facts of the case. Defendant demonstrated particularized need and satisfied his burden.

1See also People v Williams, 35 N.Y.3d 24 (2020), 147 N.E.3d 1131, 124 N.Y.S.3d 593, 2020 N.Y. Slip Op. 02123 (trial court abused its discretion in permitting the admission of low copy number DNA evidence without a Frye hearing (Frye v United States, 293 F. 1013 (D.C. Cir. 1923))).

2466 N.J.Super. 270, 246 A.3d 279, followed in United States v Ellis, Slip Copy, 2021 WL 1600711.

5.264 A practical two-stage approach has been proposed:

Stage 1

(i) As a matter of procedure, disclosure should be given of:

(a) Known bugs in the system that have been reported, and the actions taken in response. This should include the disclosure of known error logs,1 release notices,2 change logs3 and similar documents.

(b) The party’s information security standards and processes. This should extend to cover logical access controls4 (including emergency access), security vulnerability notifications5 and security patches.6

(c) Relevant audits of systems and the management of the installation to provide assurance that suitable standards and processes have been implemented and complied with.

(d) Evidence of reliably managed records of error reports and system changes, including evidence to demonstrate that basic precautions, such as digital signatures, have been implemented to detect and limit accidental or deliberate corruption (a simple illustration of such a precaution is sketched after the notes to this paragraph).

(ii) The disclosure set out above should be provided by a person authorised to do so by the party subject to the disclosure obligation. The party with the disclosure obligation should be required to undertake a reasonable and proportionate search for the documents and records in question. Disclosure should be supported by evidence confirming that a reasonable and proportionate search has been undertaken by a person with appropriate authority and knowledge, and that:

(a) The records disclosed are believed to be the records of the relevant standards, processes and audits, and of the known defects, security vulnerabilities, fixes and changes in the system.

(b) The party seeking to rely upon the evidence in question has taken reasonable steps to satisfy itself that access to the system is controlled in such a way that unauthorised and undetected amendment of system data, in a way that might affect the evidence in question, is prevented.7

(iii) The disclosure exercise should, where possible, be collaborative and co-operative between the parties, rather than adversarial. In particular:

(a) The parties should, if possible, seek to agree that the disclosed data is provided in a form that the party to whom the disclosure is made is able conveniently to read and use.

(b) The party challenging the reliability of the data relied upon should not be required to identify the particular issue to which the disclosure sought is alleged to go.

(iv) The documents under Stage 1 will be routinely kept and easily available for a bespoke system professionally developed and managed. The absence of such records will ordinarily suggest poor quality software/system management. For commercial-off-the-shelf software it should be enough to provide evidence of the particular version and release of the software and to disclose release documentation (usually publicly available from the supplier) for the relevant version and subsequent releases. (The latter will reveal errors in the version in question later found and corrected.) In either case, proportionate Stage 1 disclosure should not be onerous, and for a professionally managed system should be a straightforward exercise.

Stage 2

(i) If the limited disclosure under Stage 1 reveals any one or more of the following:

(a) a level of recorded defects or failures sufficiently high to provide grounds for questioning the reliability of the computer system from which the material is derived;

(b) that there exist records of specific defects or failures that provide grounds for questioning the evidence sought to be relied upon;

(c) that the party seeking to rely upon the evidence in question is not able to demonstrate that it has adequate control over the systems or data,

then the party seeking to rely upon the evidence produced by the computer system in question should be required to prove that none of the facts or matters identified under (a)–(c) above might affect the reliability of the material sought to be relied upon.

(ii) It is known that all large computer systems contain bugs, and that some of these may be ‘small’ bugs that reveal themselves rarely. This is true even for those systems that have been shown convincingly to be very reliable. It follows that, even in the case of such a reliable system, the court should have regard to the possibility that an apparent failure may be the consequence of a bug manifesting itself.8 Evidence of reliability is not evidence of the absence of software bugs. The court should consider what degree of doubt remains in the context of all the other available evidence.9

1Records of the errors that have been reported in a system and what action was taken. This should include evidence of testing after each system change to ensure that the same error has not been reintroduced.

2Documentation of the changes that have been made in each new release of the software, including identifying all the known errors that have been corrected.

3Records of every change that has been made to the software (containing information about what was changed, what was affected and what the results were, together with any resulting problems), including by whom, when and why it was done.

4Organizational processes and software controls that ensure data and systems can be read, changed, created and deleted only by people who have been properly authorized and identified.

5Notifications of a vulnerability in a software product that could allow unauthorized access to the system to compromise the integrity, availability or confidentiality of an organization’s systems or data.

6Software changes to correct security vulnerabilities, often made to software systems between releases of the software because an error has been detected that is too important to wait for a new system release to correct it.

7Remote access by a third party to Horizon branch terminals was a major issue in the Post Office Bates litigation. The fact that such access was possible was only conceded by the Post Office in January 2019. It had in fact been practised from shortly after the introduction of the Horizon system in 1999. Fraser J considered the issue to be of central importance, for which see Bates v The Post Office Ltd (No 6: Horizon Issues) Rev 1 [2019] EWHC 3408 (QB) at [990] and [991] and Hamilton v Post Office Ltd [2021] EWCA Crim 577 at [49]. Until 2010 no records were kept by Fujitsu of such actions.

8For which see Ladkin and others, ‘The Law Commission presumption concerning the dependability of computer evidence’; Jackson, ‘An approach to the judicial evaluation of evidence from computers and computer systems’.

9Paul Marshall, James Christie, Peter Bernard Ladkin, Bev Littlewood, Stephen Mason, Martin Newby, Dr Jonathan Rogers, Harold Thimbleby and Martyn Thomas CBE, ‘Recommendations for the probity of computer evidence’, 24–25; see also The Attorney General’s Guidelines on Disclosure for Investigators, Prosecutors and Defence Practitioners (2020) https://www.gov.uk/government/publications/attorney-generals-guidelines-on-disclosure-2020 (in force 31 December 2020), which are a step in the right direction in respect of electronic material, for which see paras 54–57, in which the overriding obligation to ensure a fair trial is stressed (para 55).
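By way of illustration only, and not as part of the published recommendations, the following short Python sketch shows one simple form of the ‘basic precaution’ referred to in Stage 1(i)(d): each record of an error or change log is chained to its predecessor by a hash and authenticated with a keyed hash (used here, for brevity, in place of a full public-key digital signature), so that any later, undocumented alteration of a stored record is detectable when the log is verified. The key, record fields and identifiers are hypothetical.

# Minimal sketch: tamper-evident record-keeping for an error/change log.
# Each record is chained to its predecessor by a hash and authenticated with
# a keyed hash (HMAC), standing in here for a full digital signature.

import hashlib
import hmac
import json

SECRET_KEY = b"example-key-held-by-the-record-keeper"  # hypothetical key


def append_record(log: list, entry: dict) -> None:
    """Append an entry, chaining it to the previous record and authenticating it."""
    prev_digest = log[-1]["digest"] if log else ""
    body = json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256((prev_digest + body).encode()).hexdigest()
    tag = hmac.new(SECRET_KEY, digest.encode(), hashlib.sha256).hexdigest()
    log.append({"entry": entry, "digest": digest, "tag": tag})


def verify_log(log: list) -> bool:
    """Recompute the chain; return False if any stored record has been altered."""
    prev_digest = ""
    for record in log:
        body = json.dumps(record["entry"], sort_keys=True)
        expected_digest = hashlib.sha256((prev_digest + body).encode()).hexdigest()
        expected_tag = hmac.new(SECRET_KEY, expected_digest.encode(), hashlib.sha256).hexdigest()
        if record["digest"] != expected_digest or not hmac.compare_digest(record["tag"], expected_tag):
            return False
        prev_digest = record["digest"]
    return True


log = []
append_record(log, {"id": "KEL-0001", "type": "known error", "summary": "balancing discrepancy on restart"})
append_record(log, {"id": "CHG-0042", "type": "change", "summary": "patch applied to branch database"})
print(verify_log(log))                            # True: records are internally consistent
log[0]["entry"]["summary"] = "no discrepancy"     # simulate a later, undocumented alteration
print(verify_log(log))                            # False: the alteration is detectable

A record kept in this way does not prove that the underlying report was accurate, but it does allow a party to demonstrate, as Stage 1(i)(d) contemplates, that the records relied upon have not been changed, accidentally or deliberately, since they were made.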

5.265 The purpose of these recommendations is to ensure that the judicial process more fully comprehends the evidential reality of software code and ‘digital systems’, and helps to preserve fairness in legal proceedings.1

1Colin Tapper, ‘Judicial attitudes, aptitudes and abilities in the field of high technology’ (1989) 15(3) and (4) Monash University Law Review 219, 228, where Professor Tapper considers the members of the House of Lords and Court of Appeal were unduly restrictive regarding the transient storage of a false password in R v Gold (Stephen William), R v Schifreen (Robert Jonathan) [1988] AC 1063, [1988] 2 WLR 984, [1988] 2 All ER 186, [1988] 4 WLUK 121, (1988) 87 Cr App R 257, (1988) 152 JP 445, [1988] Crim LR 437, (1988) 152 JPN 478, (1988) 85(19) LSG 38, (1988) 138 NLJ Rep 117, (1988) 132 SJ 624, [1988] CLY 787.
