The widely predicted large-scale adoption of autonomous AI systems
in the coming years will give rise to difficult legal and indeed philosophical
issues with which (outside of academe) we are yet fully to grapple. Obvious
questions arise – such as whether AI entities should be treated in law as the
agents of their creators or owners and where liability for the actions of AI
entities should fall – some of which do not admit obvious answers and may
ultimately need a legislative response.[1]
That is still for the future (albeit perhaps the not-at-all-distant
future). Meanwhile, AI in the form of machine learning algorithms is already
heavily deployed in numerous industries, and tools based on such algorithms are
used extensively in commercial life. Whilst the application-specific machine
learning algorithms in daily use today do not test legal boundaries in the way
that autonomous AI systems of the future might, the questions of liability to
which they give rise are nonetheless a hot topic of debate.[2] The
fundamental problem is succinctly stated by Thomas Burri of the University of
St. Gallen in his paper Machine Learning
and the Law: Five Theses:[3]
“The behaviour and actions of machine learning systems are not fully
foreseeable in all situations, even when the algorithm directing the learning
is known. A machine learning system’s behaviour is based on the patterns and
correlations it discovers in datasets. These patterns and correlations by
nature are not known in advance, else no learning would be required.”
This inability to know in advance what the output of a
machine learning algorithm will be, together with a degree of mystery that still
surrounds the concept of machine learning,[4] not only
gives rise to potential legal challenges, but is also invoked from time to time
by providers of professional services (including some in the legal profession) as
a justification for resisting the use of machine learning systems altogether.
The argument runs as follows: I do not understand how the machine works; I do
not know how it thinks; I do not know in advance what its output will be; and
even when I have its output, I do not understand how it got there; I cannot
properly rely on such a system to provide services – to do so would be an
abrogation of my contractual and professional duty.
In anticipation of the upcoming
SCL Annual Conference, which focuses on real-world problems arising from AI
and machine learning systems, this article explores (in an ungrounded and
wholly unscientific way) whether resisting machine learning systems on the
grounds articulated above is a manifestation of the sort of justified caution
that would be expected of a professional carrying out services with reasonable
care and skill,[5]
or whether it is the last cry of King Canute.
I will explore this issue initially through the example of
technology-assisted review or “predictive coding” (“TAR”) tools before making
more general observations. TAR is a useful example of the phenomenon of
machine learning in the present context because: (i) as a technology it will be
familiar to most readers of Computers
& Law magazine; (ii) slowly but surely it is achieving widespread adoption,
not least owing to encouragement from the judiciary; but nonetheless (iii) its
use gives rise to precisely the concerns articulated above; so (iv) it allows
me to ask the rhetorical question “if
TAR, why not every other form of machine learning?”!
Document review is a classic machine learning problem or,
more specifically, a classic supervised machine learning problem. Not all TAR systems
work in precisely the same way, but the general principle can be described as
follows: (i) expert human reviewers review a sample of documents from a large
document population and categorise them as (for example) disclosable/not
disclosable or privileged/not privileged; (ii) the machine learning algorithm
seeks to “understand” the criteria used by the human reviewers and uses its “understanding”
to apply the same categorisation criteria to the remainder of the document
population; (iii) over a series of iterations, expert human reviewers review a
sample of the algorithm’s categorisations, correcting them if necessary, and
the algorithm seeks to improve its performance; until (iv) eventually, the
human reviewers are having to overturn so few of the system’s categorisations
that the review is deemed successful and complete.
The software performing the miracle of TAR shares with many
other machine learning systems the characteristic that those using it will for
the most part have no idea how it is working (because it is conceptually quite
complex and the details of the underlying algorithms will likely be proprietary
secrets); and even with access to the source code it would likely be impossible
to gain any real insight into how any particular document is categorised in a
particular way (the Burri issue). Notwithstanding that, the software produces
high-quality results and, critically, results that are verifiably of high quality. In essence, the ability to verify
results is the reason why the technology can be trusted. To use TAR is not to
abrogate responsibility for a disclosure exercise to a machine whose inner
workings cannot be understood; it is merely to shift the focus of the effort
from the reviewing to the training and validation processes. Plainly it is
possible to use TAR negligently. However, solicitors who satisfy themselves, by means of a sufficiently thorough sampling exercise, that the software is yielding sufficiently
high-quality results can hardly be criticised merely because the (highly
efficient) process that they have undertaken is not susceptible to scrutiny
save in respect of its outputs. The validity of that assertion can be tested by
comparing the exercise with a typical “traditional” disclosure exercise
involving large teams of paralegals working hours on end reviewing document
after document: the multitude of human minds involved are no more susceptible
to scrutiny than the TAR software, and arguably less so.[6]
Underpinning typical TAR
processes are three implicit assumptions: (i) that there is a yardstick by
which the performance of the software can be judged; (ii) that the yardstick need not be measured exhaustively, provided that sufficient confidence can be gained in the system; and (iii) that it is possible to ascertain what a “sufficient” level
of confidence may be. These assumptions remain implicit from the perspective of
the user because the yardstick of success in a disclosure exercise is so
obvious that it generally goes without saying, and the latter two assumptions are addressed by the e-disclosure providers who design the procedures; those procedures are both part of the package and the subject of broad industry consensus.
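To give a flavour of what the second and third assumptions look like when made concrete, the fragment below estimates the yardstick from a random sample rather than an exhaustive re-review, and reports it with a confidence interval. The sample size, the number of errors found and the use of a Wilson score interval are my own illustrative assumptions rather than anything mandated by practice.

# An illustrative estimate of the "yardstick" from a sample rather than an
# exhaustive review. The figures and the choice of metric are invented.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a proportion estimated from a sample."""
    p = successes / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    margin = (z / (1 + z * z / n)) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - margin, centre + margin

# Suppose reviewers re-check a random sample of 400 documents that the system
# categorised as "not disclosable" and find 8 that should in fact be disclosed.
sample_size, missed = 400, 8
low, high = wilson_interval(sample_size - missed, sample_size)
print(f"Estimated proportion correctly discarded: {low:.1%} to {high:.1%} (95% confidence)")

The exercise says nothing about how the algorithm reached its categorisations; it says, to a stated level of confidence, how good those categorisations are.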
Making explicit those assumptions
is perhaps the key to the more general question of what might constitute
reasonable skill and care in the age of machine learning: identifying
appropriate yardsticks to measure the performance of a machine learning system,
assessing what level of confidence is needed in the results, and undertaking
validation processes sufficient to ensure that the required level of confidence
in the results has been met. The focus shifts from actually producing artefacts
to designing and implementing quality control processes; exercising reasonable
skill and care involves ensuring that the level of scrutiny to which the
outputs of a machine learning algorithm are subjected is commensurate with the
risks involved in the exercise and the ramifications of failure. That is what
we do with TAR,[7]
so why not with everything else?
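By way of a wholly invented illustration of what “scrutiny commensurate with risk” might look like, one can imagine a simple quality gate along the following lines, in which a higher-risk exercise demands both a better measured result and a larger validation sample. The tiers and thresholds are made up for the purposes of the example.

# A hypothetical "quality gate": the risk tiers, recall thresholds and sample
# sizes below are invented for illustration, not drawn from any authority.
REQUIREMENTS = {
    # risk tier: (minimum acceptable recall, minimum validation sample size)
    "low":      (0.70, 200),
    "moderate": (0.80, 500),
    "high":     (0.90, 1500),
}

def outputs_acceptable(risk_tier: str, measured_recall: float, sample_size: int) -> bool:
    """Accept the system's output only if both the measured performance and the
    validation effort are commensurate with the risk of the exercise."""
    min_recall, min_sample = REQUIREMENTS[risk_tier]
    return measured_recall >= min_recall and sample_size >= min_sample

# A high-risk exercise with a good-but-not-good-enough measured recall fails the gate.
print(outputs_acceptable("high", measured_recall=0.86, sample_size=2000))  # False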
Whilst this sort of thinking is
relatively unfamiliar to many of us in the legal profession, it is humdrum in
many other domains, most obviously in safety-critical industries where it finds
expression in the notion of acceptable risk. Although it will probably be some
time before my machine-generated pleadings are accompanied by a carefully
crafted safety case (being my sole human contribution to the process), when it
does happen I am happy to be held to the same standard of care as is required
of me today – but by reference to the skill and care which I have applied to my
safety case.
Disclaimer: This
article was the emergent product of a machine-learning algorithm. The author
provides no warranty as to its accuracy or completeness or its ability to stand
up in a court of law.
Matthew Lavy is a Barrister at 4 Pump Court, a trustee of
SCL and one of the panel at the SCL
Annual Conference session on ‘AI and machine learning – the real legal
issues’.
[1] For an interesting take on this, see Res Robotica! Liability and Driverless Vehicles, by Peter Lee and Sabrina Richards (https://www.scl.org/articles/3167-res-robotica-liability-and-driverless-vehicles).
[2] Other legal problems also arise, such as the practical implications of the right to be forgotten for machine learning. However, the focus of this article is liability.
[3] http://www.mlandthelaw.org/papers/burri.pdf
[4] Which I am reliably informed really is based on tractable mathematical concepts such as matrix transformation and not on black magic or any other dark arts.
[5] For the purposes of this article I use the phrase “reasonable care and skill” as shorthand for “to the requisite contractual standard and so as to avoid liability for professional negligence”.
[6] The typical manual review exercise is not subject to nearly the same level of quality control as a TAR exercise either. However, as the purpose of this article is not to extoll the virtues of TAR, I say no more about that, or the numerous studies evidencing the superiority of TAR over manual review.
[7] For a detailed and fascinating insight into the impact of process on the efficacy of TAR, see Chapter 9 of Tredennick’s TAR for Smart People (https://www.law.ufl.edu/_pdf/academics/centers/Catalyst_TAR_for_Smart_People.pdf).