Algorithms, in both the broad and narrow senses of the word, affect practically every aspect of our daily lives. Algorithms decide which advertisements are shown on media such as YouTube (and sometimes get it wrong, causing producers of videos to receive income from advertisements appearing alongside their material when the advertisers would never consciously approve it [1], often causing public anger (for example [2])); what prices we are quoted for airline flights [3]; and, in the USA, whether people get bail [4].

The House of Commons Select Committee on Science and Technology launched an inquiry into ‘algorithms in decision making’ on 28 February 2017. Both the IMA and BCS (the Chartered Institute for IT) submitted evidence with the writing process coordinated by the author. The vote on 19 April for an early General Election terminated the inquiry: all submissions can be viewed at http://tinyurl.com/Algorithms-enquiry. The aim of this article is to draw together the evidence and general points made in these submissions.

### 1 Background on algorithms

The word algorithm has acquired, particularly in American English but increasingly in British English too, a wider connotation than the OED definition: ‘a precisely defined set of mathematical or logical operations for the performance of a particular task’. For example, Merriam–Webster has a ‘set of rules a machine (and especially a computer) follows to achieve a particular goal’. Hence we drew various distinctions to help the Committee, and the reader.

#### 1.1 Proclaimed or inferred (from data)

A classic example of a proclaimed algorithm is Income Tax: if (for a given set of circumstances) your net income is X, you pay f(X) in income tax, where f is laid down in the Finance Act.
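A proclaimed algorithm of this kind can be written down directly as a piecewise function. The sketch below uses invented bands and rates purely for illustration; the real f is laid down in the Finance Act.

```python
def income_tax(net_income):
    """Illustrative piecewise tax function f(X).

    The bands and rates here are made up for illustration;
    the real function is defined by the Finance Act.
    """
    bands = [                   # (upper limit of band, marginal rate)
        (10_000, 0.0),          # untaxed allowance
        (40_000, 0.20),         # basic rate on the next slice
        (float("inf"), 0.40),   # higher rate above that
    ]
    tax, lower = 0.0, 0.0
    for upper, rate in bands:
        if net_income > lower:
            tax += (min(net_income, upper) - lower) * rate
        lower = upper
    return tax
```

The point is that f is fully specified in advance: anyone applying it to the same X must get the same f(X).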

A classic example of an inferred algorithm is the wind chill, which is an algorithm with a precise mathematical statement, but was experimentally determined by the multinational JAG/TI group of scientists.
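The JAG/TI wind chill index does have a precise closed form, but its coefficients were fitted to experimental data rather than derived from first principles. The 2001 formula (temperature in °C, wind speed in km/h measured at 10 m) can be stated directly:

```python
def wind_chill(temp_c, wind_kmh):
    """JAG/TI (2001) wind chill index.

    temp_c:   air temperature in degrees Celsius (intended for <= 10)
    wind_kmh: wind speed in km/h at 10 m height (intended for >= 4.8)
    """
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v
```

For example, at -10°C with a 20 km/h wind the index is about -18: a precise mathematical statement whose coefficients are nonetheless experimental.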

Much of the current trend in machine learning is for a computer, sometimes using a well-understood meta-algorithm, to produce an algorithm without a precise mathematical statement but based on a large amount of experimental data, generally referred to as training data. Most of these algorithms operate on data: often training or other background data (the database) as well as foreground data (the question). They are at most as good as the background data they operate with, and bad data can lead to disastrously wrong results, as when the London Ambulance Service did not know where the Velodrome was [5].

#### 1.2 Published, secretly understood, or not understood

Income tax, and wind chill, are published algorithms. Many companies operating in regulated industries, generally in finance, have secret algorithms for credit scoring, loan approval etc. These algorithms are part of their competitive edge. Yet, they need to be able to explain them to regulators, and to justify decisions if required.

An example might be a car insurance company, whose algorithm might include the step ‘To the base premium, we add a sum depending on the insured’s occupation, from this table’, where the table is computed based on past claims data. The table, the precise definition of occupations, and indeed whether it’s adding a sum, multiplying by a ratio, or both, are part of the insurer’s trade secrets. This example in insurance long pre-dates the use of computers, never mind machine learning algorithms, but the advent of technology has permitted much more analysis and the use of many more factors. Equally, it has enabled the use of much more precise calculations, rather than gut feel to populate these tables.
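That premium step can be sketched directly. Everything here is invented for illustration (the occupation categories, the table values, and the choice to both add a sum and multiply by a ratio); a real insurer's tables are exactly the trade secrets described above.

```python
# Hypothetical occupation tables, computed (in reality) from past claims data.
OCCUPATION_LOADING = {            # additive sums
    "teacher": 40.0,
    "chef": 95.0,
    "professional driver": 180.0,
}
OCCUPATION_FACTOR = {             # multiplicative ratios
    "teacher": 1.05,
    "chef": 1.20,
    "professional driver": 1.45,
}

def premium(base, occupation):
    """To the base premium, add a sum and apply a ratio, both read
    from tables: the tables, not the arithmetic, are the secret."""
    return (base + OCCUPATION_LOADING[occupation]) * OCCUPATION_FACTOR[occupation]
```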

Conversely, many of the algorithms produced by machine learning, notably those based on deep learning, are not understood at all.

#### 1.3 Determinative or advisory

The income tax algorithm determines the amount payable: it is not a suggestion to the tax inspector on how much to charge. University degree classifications have become more determinative over the years. A recent adoption of a determinative algorithm is the Duckworth–Lewis algorithm [6] in cricket. The algorithm is public, and understood by experts, but not by the general public, who just accept it.

Advisory algorithms produce a piece of advice to a human being who makes an ultimate decision. This may consist of evaluating several different scenarios, or may just be a simple answer. These answers may or may not (in practice, far too often do not) have some measure of confidence attached. Of course, an algorithm may in principle only be advisory but the human beings using it may in practice just rubber-stamp its advice, so in practice it’s determinative.

In the US case of Paul Zilly [4], defence and prosecution had agreed a plea bargain of a year in prison, but the judge looked at a recidivism score produced by a proprietary (and at least secret, probably not understood) algorithm, overturned the plea deal, and imposed a two-year sentence. In theory that was an advisory algorithm, but in practice it was being used to overrule the agreement which would have been rubber-stamped by the judge.

The General Data Protection Regulation (GDPR) Article 22.1 applies to a ‘decision based solely on automated processing’, i.e. a determinative algorithm. The relevant Recital (71) is less clear, and one might expect a lot of litigation where humans are rubber-stamping a decision that is in theory only advisory.1

#### 1.4 Continuous or discrete

Many algorithms are used that yield numerical results varying continuously with their input. So if you earn £10 more, you pay £2 (or £4 or …) in income tax, and £20 more is twice that (unless you cross a threshold) and so on. If the wind speed increases by 1 km/h, then the wind chill changes by a certain amount. Other algorithms, particularly machine learning classifier algorithms, give one of a small number of discrete results, e.g. degree classification. In many cases the answer is binary: bail or no bail; (referral for) melanoma or not.

For a continuous algorithm, a small error in the inputs should result in a small error in the answer, whereas for a discrete algorithm, even the smallest error in the inputs may result in a different answer. If we understand the algorithm, we can consider the question ‘how close are we to a boundary?’, and possibly act on this information. For example, a university department might adopt (and some do) a rule that if a student fails by 1% or less, the scripts should be checked again. But if all we have is a black box that outputs decisions, we cannot ask this question.
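The 'how close are we to a boundary?' question only makes sense when the threshold is visible. A minimal sketch, with a hypothetical pass mark of 40%:

```python
PASS_MARK = 40.0  # hypothetical threshold for a degree module

def classify(mark):
    """A discrete (binary) algorithm: the smallest change in the
    input can flip the answer at the boundary."""
    return "pass" if mark >= PASS_MARK else "fail"

def needs_recheck(mark, margin=1.0):
    """Flag scripts that fail by `margin` or less for re-marking.
    This is only possible because we can see the threshold; a black
    box that merely outputs 'fail' gives no distance-to-boundary."""
    return classify(mark) == "fail" and PASS_MARK - mark <= margin
```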

#### 1.5 Effect on people

It is a truism that every decision has consequences. The main concern is determinative (either de facto or de jure) algorithms making decisions that significantly affect people’s lives. Examples would include medical diagnosis, mortgage approvals and (in the USA) granting or otherwise of bail, and possibly sentencing. However, there are also effects on people’s prospects (being short-listed for jobs); finances, e.g. gender discrimination in insurance, which in theory is illegal, but can be perpetuated through use of gender-correlated data; and other important aspects.

#### 1.6 Data

Any inferred algorithm is based on data. In the case of a scientific experiment, where the algorithm is often called a Law, as in Boyle’s Law, one expects the experiment to be reproduced, and these repeat experiments to confirm the algorithm. In the case of other fields, the nature of the data is more variable. The term ground truth is used to refer to absolute facts, such as ‘this skin lesion is a melanoma’, or ‘this person did not offend while on bail’, whereas we use proxy data to refer to what is actually used as the training data, ‘did a dermatologist diagnose skin cancer’ or ‘did a bail judge refuse bail (on the grounds of likelihood of re-offending)’.

Clearly ground truth is better, but it may be too difficult, or even impossible, to obtain. The study in [7] (which deduces that dermatologists have a 64% accuracy rate, but a sensitivity of over 80%) was very rare in actually conducting biopsies on patients with a negative diagnosis by the dermatologist, so that we have ground truth. In many circumstances it may be impossible, e.g. we can’t obtain ‘did this person offend while on bail’, merely ‘was this person caught offending while on bail’. This may seem pedantic, but note that we can’t obtain the opposite datum ‘would this person (who was not bailed) have been caught offending while on bail had he been released’.

This use of background data to derive an algorithm has two important consequences. The first is that any biases in the background data will be perpetuated in the algorithm. The second is that we have no guarantee at all that the algorithm produced will remain valid outside the range of those data. This means that every such algorithm needs to be traceable back to the background data that produced it.

### 2 Uses of algorithms

Although well-known to many readers, we thought it important to draw Parliament’s attention to various existing uses.

#### 2.1 Queen Elizabeth aircraft carrier

The business case for the QE aircraft carrier drew heavily on the Cost Capability Trade Off Model, which uses a variety of modern algorithmic techniques such as Hybrid Model of Non-Linear Regression, Optimisation, Monte-Carlo Simulation and Design of Experiment in order to forecast the optimum performance within budgetary constraints. It is a mixture of proclaimed and experimental, is the company’s intellectual property, but was explained to the customer, and was advisory, exploring a variety of scenarios. See [8,9].

#### 2.2 Scheduling etc.

Linear Programming, and its generalisation Mixed-Integer Programming, are used, generally as determinative algorithms, in practically every sector [10, slide 18]. The specification of the algorithm is public, even though the details of what makes it fast (not what the answer is, which is defined by the specification) are trade secrets. It is also an area in which the algorithmic advances are even greater than hardware advances, to the point where you would be better off running today’s software on 1991 computers than vice versa [10, slide 37]. For example, a supermarket chain would use Mixed-Integer Programming to decide how many lorries to send from each warehouse to each (delivery run of) supermarket(s), carrying which goods, to meet the order plan. If it used an independent haulage firm, that firm would use Mixed Integer Programming to schedule the drivers and vehicles, and so on, and again this would be a determinative algorithm.
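A toy version of the supermarket's dispatch problem can show what such a model decides, although every number below is invented and real instances are solved with Mixed-Integer Programming solvers, not by enumeration as here:

```python
from itertools import product

# Invented instance: how many lorries does each warehouse send to each
# store, meeting demand at minimum cost, within capacity?
warehouses = {"W1": 3, "W2": 2}            # lorries available
stores = {"S1": 2, "S2": 2}                # lorry-loads demanded
cost = {("W1", "S1"): 4, ("W1", "S2"): 6,  # cost per lorry per route
        ("W2", "S1"): 5, ("W2", "S2"): 3}

routes = list(cost)
best = None
for counts in product(range(4), repeat=len(routes)):  # brute force
    plan = dict(zip(routes, counts))
    ok_supply = all(sum(n for (w, _), n in plan.items() if w == wh) <= cap
                    for wh, cap in warehouses.items())
    ok_demand = all(sum(n for (_, s), n in plan.items() if s == st) >= dem
                    for st, dem in stores.items())
    if ok_supply and ok_demand:
        c = sum(n * cost[r] for r, n in plan.items())
        if best is None or c < best[0]:
            best = (c, plan)
```

Here the optimum (cost 14: two lorries W1→S1, two W2→S2) is determinative: the plan is executed, not merely suggested.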

Nowhere does commerce use opaque machine-learning algorithms more than on the Internet itself: algorithms determine what search engines return, what advertisements are shown to human beings, what advertisements are shown next to which content (which has caused controversy recently [1,2]) and so on. It is probable that none of these activities is intentionally (on the part of the designers of the algorithms) biased, but in practice they are [11]. These algorithms are determinative.

### 3 Biases or discrimination?

The Science and Technology Committee asked, and it’s a very natural question, whether the use of algorithms can eliminate, introduce or amplify biases or discrimination, whether this can be detected in a transparent and accountable way, and what the implications of increased transparency are.

Firstly, it must be noted that biases or discrimination are human perceptions of a process, and what is, or is not, discrimination depends on society, and changes over time. Consider, for example, the recent ruling about gender bias in insurance, where what had been a common – and public – industry practice, was suddenly declared to be a bias. It’s also a very nebulous question when pushed to precise specification, as doing something algorithmically requires.

#### 3.1 Eliminate bias

If the algorithm is known and fully understood, it is possible to assert that it is no more biased than the data fed into it, and should produce the same output on identical input. New algorithms for combining panellists’ scores [12] could be used for many decision-making processes, e.g. the Research Excellence Framework, and would improve the openness of the decision-making. If the algorithm is not known and fully understood, it is only possible to eliminate those biases which are known about and checked for, and then only if the algorithm has the appropriate mathematical properties, e.g. linearity.

Most such machine learning algorithms are not understood at all: no human being can say ‘why’ the algorithm does what it does, nor can predict what it will do on data which are not the training data [13]. Even strong advocates of these admit this major weakness.

> Although Deep Neural Networks (DNN) have demonstrated tremendous effectiveness at a wide range of tasks, when they fail, they often fail spectacularly, producing unexplainable and incoherent results that can leave one to wonder what caused the DNN to make such decisions. The lack of transparency in the decision-making process of DNNs is a significant bottleneck in their widespread adoption in industry, such as healthcare, defence, cybersecurity etc., where the error tolerance is very low and the ability to interpret, understand, and trust decisions is critical [14].

We noted that the research in that paper, though good, is only looking at specific mistakes, and answering questions like ‘why did the DNN misclassify this Chihuahua as a Shih-Tzu?’, and not ‘how does the DNN recognise Chihuahuas?’.

#### 3.2 Introduce bias

It is certainly possible to introduce biases through an algorithmic process, just as it is possible to do so through a manual process. As pointed out [4], it is possible for such processes to be biased by considering things that human beings would refuse to consider publicly: the system used in Broward County Florida asks ‘Was one of your parents ever sent to jail or prison?’, while it’s hard to imagine a judge accepting the prosecution argument ‘the defendant deserves a harsher sentence because his father went to prison’.

Consider, again, insurance, more specifically car insurance. It is no longer legal to discriminate on the basis of gender. It is, currently, legal to discriminate on the basis of occupation, even though some occupations are predominantly occupied by one gender, giving rise to indirect discrimination [15].

#### 3.3 Amplify bias

If one can introduce a bias, one can certainly amplify it. But there is an effect (uncertainty bias) by which an unbiased algorithm can become biased by considering factors that are not uniformly distributed, and this bias can grow over time as an active learning algorithm learns and reinforces its bias [16].

#### 3.4 Detection of bias

Currently, detecting bias relies on human effort, sometimes itself aided by machine learning. We have quoted various instances of bias, but these are merely those that researchers have chosen to investigate. Direct bias, e.g. highly-paid jobs being offered to explicitly male searchers [11], can be obviated by search engines never taking gender into account (although this is highly improbable), but even so this would not eliminate indirect bias [15].

#### 3.5 Transparent and accountable

Many of the most efficient algorithms for certain tasks are not transparent or accountable. It is, with the current state of technology, impossible to understand how a typical neural net reaches its decisions. Though the state of technology may change, it is currently the case that only algorithms which can be reduced to a (possibly complex) formula, can be understood, and even then possibly only by experts.

#### 3.6 The implications of increased transparency

The background data should have been thoroughly anonymised (easier said than done). The foreground data should be subject to the usual data protection rules. There is a real challenge with active learning algorithms, where today’s foreground data becomes tomorrow’s background data. To the best of our knowledge, no satisfactory research has been carried out here.

An issue that has received very little attention (but see [17]) is the interaction between ‘the right to be forgotten’ and machine learning. If X exercises his right to be forgotten, does/should Y’s insurance premium change?

### 4 Examples of good practice

We were actually asked for examples of good practice, but couldn’t really find any in the machine learning area. We did want to warn about examples of bad practice.

#### 4.1 Correlation versus causation

Though this error is warned about in most statistics courses, confusing correlation, which is what algorithms, machine learning or otherwise, can detect, with causality, which is what people are generally interested in, is still common.

Consider [18, p. 16], the example of ‘Screen time [at 14 years] was associated with lower academic performance [at 16 years]’. Media reporting generally converted correlation into causality, as in ‘programmes aimed at reducing screen time could have important benefits for teenagers’ exam grades’, whereas increased screen time might be an early indicator of poor performance, and the investment should be in coaching/catching up.
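The trap can be demonstrated with an invented toy model: a confounder (call it 'prior attainment') drives screen time down and exam grades up, with no direct causal link from screen time to grades at all, yet the two end up strongly correlated.

```python
import random
import math

random.seed(0)

# Toy model (all parameters invented): one hidden confounder drives
# both observed variables; screen time has zero causal effect on grade.
n = 2000
prior = [random.gauss(0, 1) for _ in range(n)]
screen = [-0.8 * p + random.gauss(0, 0.6) for p in prior]   # confounded
grade = [0.8 * p + random.gauss(0, 0.6) for p in prior]     # confounded

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(screen, grade)  # strongly negative despite no causation
```

An intervention on screen time in this model would change grades not at all, which is exactly why 'programmes aimed at reducing screen time' do not follow from the correlation.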

#### 4.2 Training versus testing data

Many machine learning protocols require splitting the background data into training and testing data. However, it is bad practice to believe that any single such split is significant: at the very least, the process should be repeated several times, which is known as cross-validation. The state of the art is not yet able to give good guidelines for how many repetitions ‘several’ should be.
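A minimal sketch of repeated k-fold cross-validation, with the model-fitting and scoring functions left abstract (the names `train_fn` and `score_fn` are this sketch's own, not from any particular library):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, labels, train_fn, score_fn, k=5, repeats=3):
    """Repeated k-fold cross-validation: never trust a single split."""
    scores = []
    for r in range(repeats):                       # re-shuffle each repeat
        for test_idx in k_fold_indices(len(data), k, seed=r):
            test = set(test_idx)
            train = [i for i in range(len(data)) if i not in test]
            model = train_fn([data[i] for i in train],
                             [labels[i] for i in train])
            scores.append(score_fn(model,
                                   [data[i] for i in test],
                                   [labels[i] for i in test]))
    return sum(scores) / len(scores)               # mean held-out score
```

With a trivial majority-class 'model', for instance, the mean held-out accuracy converges on the majority class's frequency, and the spread across folds gives some idea of how much any one split can mislead.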

### 5 Recommendations

Both societies made several recommendations to Government, many of which will be obvious to readers. The full set of recommendations is in the submissions. Key ones were the following:

1. The Government, and all its agencies and subcontractors, including recruitment agencies, need to review as a matter of urgency the use of all automatic processing to ensure it:
(a)  Is only based on legitimate personal data of the individual;
(b)  Carries a precise description of any training data it was built from;
(c)  Has been tested for indirect discrimination;
(d)  Is capable of being explained in line with GDPR 15.1(h). In particular, buzz-phrases like Artificial Intelligence, deep learning algorithm and data-based algorithm should act as warning signs that the algorithm is in fact probably no more than unscientific reasoning by generalisation.
2. No data should be fed to an algorithm, determinative or advisory, that would not be acceptable in an equivalent manual process: see the Zilly case (Section 1.3 and [4]).
3. In machine learning algorithms, the background data contribute to the decision, so every such algorithm should be prominently labelled with the data that created it, both as a statement the lay person can understand (e.g. ‘based on London traffic data 1982–2002’) and question (‘but that was before the congestion charge’), and ultimately such that an expert can analyse it. This is an algorithmic consequence of the 15.1(h) right.

### 6 Conclusions

The role of the traditional algorithm, whether proclaimed or inferred, is great and growing (possibly not as fast as it should), whether in government or elsewhere.

However, there is a need to be cautious when endowing the algorithms produced by many forms of machine learning, which humans, even experts, do not understand, with determinative (either de facto or de jure) powers that affect people, individually or as society, in non-trivial ways.

#### Acknowledgment

The author is grateful to the members and supporting staff of the IMA Research Committee and the BCS Academy of Computing, notably Professors Crick, Grindrod and MacKay. This work was done while the author was a Fulbright CyberSecurity Scholar at NYU, and he is grateful to the NYU Privacy Research Group, especially Professor Strandberg, for their debates.

#### Notes

Post-submission note: The Alan Turing Institute submission (number 73) claims that there is no ‘Right of Explanation’, but that was not the consensus at the NYU Conference on Algorithms and Explanations (http://tinyurl.com/NYU-algorithms).

#### References

1. Sweney, M. and Hern, A. (2017) Google ad controversy: what the row is all about, The Guardian, 17 March 2017.
2. Layton, J. (2017) Anger over Youtube Burger Bar gang shooting boast videos, Birmingham Mail, 22 March 2017.
3. Malighetti, P., Paleari, S. and Redondi, R. (2009) Pricing strategies of low-cost airlines: The Ryanair case study, Journal of Air Transport Management, vol. 15, no. 4, pp.195–203.
4. Angwin, J., Larson, J., Mattu, S. and Kirchner, L. (2016) Machine Bias. There is software that is used across the country to predict future criminals. And it is biased against blacks, ProPublica, 23 May 2016.
5. Khomami, N. (2016) Cyclist died after three ambulances could not find Olympic velodrome, The Guardian, 2 June 2016.
6. Duckworth, F. and Lewis, A. (1998) A fair method for resetting the target in interrupted one-day cricket matches, J. Oper. Res. Soc., vol.49, pp.220–227.
7. Grin, C.M., Kopf, A.W., Welkovich, B., Bart, R.S. and Levenstein, M.J. (1990) Accuracy in the Clinical Diagnosis of Malignant Melanoma, Arch. Dermatol., vol.126, pp.763–766.
8. Chamberlain, N. and Pinker, E. (2016) The Cost Capability Trade Off Model (presentation at ORS 58). Available at: http://staff.bath.ac.uk/masjhd/Others/ChamberlainPinker2016a.pdf.
9. Matthews, V. (2016) Top UK scientist on ‘the beauty of maths’, The Telegraph, 30 September 2016.
10. Bixby, R.E. (2015) Computational Progress in Linear and Mixed Integer Programming. Presentation at ICIAM 2015.
11. Datta, A., Tschantz, M.C. and Datta, A. (2015) Automated Experiments on Ad Privacy Settings, Proceedings on Privacy Enhancing Technologies, vol. 1, pp. 92–112.
12. MacKay, R.S., Kenna, R., Low, R.J. and Parker, S. (2017) Calibration with confidence: A principled method for panel assessment, R. Soc. Open Sci. vol. 4, 160760.
13. Knight, W. (2017) The Dark Secret at the Heart of AI, MIT Technology Review, 11 April 2017.
14. Kumar, D., Wong, A. and Taylor, G.W. (2017) Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks, CoRR, abs/1704.04133.
15. McDonald, S. (2015) Indirect Gender Discrimination and the Test-Achats Ruling: An Examination of the UK Motor Insurance Market (presentation to Royal Economic Society, April 2015). Available at: http://tinyurl.com/McDonald-RES.
16. Goodman, B. and Flaxman, S. (2016) European Union regulations on algorithmic decision-making and a right to explanation, arXiv:1606.08813.
17. Malle, B., Kieseberg, P., Weippl, E. and Holzinger, A. (2016) The right to be forgotten: towards machine learning on perturbed knowledge bases, International Conference on Availability, Reliability, and Security, CD-ARES 2016, Springer, pp.251–266.
18. Grindrod, P. (2016) Beyond privacy and exposure: ethical issues within citizen-facing analytics, Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 374, no. 2083.

Reproduced from Mathematics Today, August 2017