William Hooper asks: What do lawyers need to know about the assurance of quality in software to contract for it effectively? How do litigators draw on this to prove or defend a claim? His view is that avoiding “system melt-down” seems wiser than dealing with it afterwards.
What is Software Testing?
Suppliers test systems to assess whether they do what they should do (functional testing) in a way that meets the customer’s need (non-functional testing). Testing is thus the principal approach used to assure quality. Consideration of it is useful both to transactional lawyers seeking to draft agreements that protect their clients’ interests and to contentious lawyers seeking to establish a claim.
If you have developed a spreadsheet and want to check whether it adds correctly, you may enter input data of 2 and 3, expecting the answer 5. If the actual result matches the expectation, you call it a “pass.” If not, it is a “defect.” A useful “test report” records the steps taken (by reference to the “test case”), the input data, the actual result, and the deviation that leads you to believe the component defective. In this way, when a developer is passed the defect for resolution, they may replicate the test as an early step in their triage, diagnosis, and fix.
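To make this concrete, here is a minimal sketch of how such a check might be automated, written in Python using its standard unittest library. The function add_cells is a hypothetical stand-in for the spreadsheet’s addition, not taken from any real system.

    import unittest

    def add_cells(a, b):
        # The component under test: a stand-in for the spreadsheet's addition.
        return a + b

    class TestAddition(unittest.TestCase):
        def test_add_two_and_three(self):
            # Test case: input data 2 and 3, expected result 5. A match is a
            # "pass"; a mismatch would be recorded as a "defect".
            self.assertEqual(add_cells(2, 3), 5)

    if __name__ == "__main__":
        unittest.main()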
Why Test?
The fundamental assumption is that if one looks for trouble before launching a product, one can address it before it harms anyone. Thus, the product is more likely to be satisfactory for users than if testing is inadequate.
Software engineers have long been aware that defects identified early in development can be fixed more cheaply than those found once the work has advanced. The reason is that delivery involves bringing many components together. When a defect is discovered early, just one component (the one being developed) is affected. When found later, many others have been closely crafted to fit the first, so each of these needs to be adapted and re-tested, first in isolation, then in combination. The impact is thus magnified, and the increase is not linear. If a fault is found only in live operation, the user population, support staff, documentation, and data for processed transactions may all be affected. There can also be commercial fall-out as compensation is paid or reputation is damaged. In this way, good testing is related to commercial success and profitability for both the developing organisation and the customer.
Risk and Testing
The aim of testing is to give reasonable assurance to those charged with developing and launching the system that it is ready for use and is likely to deliver greater benefit than harm.
This does not assure that the system is free of defects. No such guarantee can be given, so a residual level of risk remains. Managers decide whether testing has been sufficiently rigorous to reduce the risk of harm to an acceptable level. If they delay launch to conduct more testing, there can be competitive and commercial consequences. So there are trade-offs to be made.
The conscious assessment and containment of risk is at the heart of good test design. This is assisted by the test managers’ having a good understanding of the intended business context of use, so that they focus their efforts on what is most important. The place to look for this is an over-arching document describing the project’s approach to testing, often called the “test strategy.”
In the most egregious cases, a system may be launched with little or inadequate testing. The press, social media, customers, and regulators can be brutal in response.[1]
Some industries have developed sophisticated methods to address risks. Nuclear, aerospace and pharmaceuticals feature prominently. Such methods combine advanced management of the delivery process with considerations of risk and rigorous testing. West-Coast software developers have typically taken this on board, moderated by methods such as progressive deployment and real-time monitoring of early responses to detect, react to, and contain defects when they do occur.[2]
Types of Testing
There is a variety of types of testing with differing objectives. This results in each component being tested many times. When introducing a change, it is normal to repeat many of these. Types of testing that you may encounter include:
Functional
Unit – This is a set of tests normally performed by the person developing the component to validate that it performs the required function, such as the spreadsheet example above. One component may need to deliver several functions, each of which should have an associated test case. The unit is tested in isolation: anything else it relies upon to function is simulated by programs called “stubs” that deliver the result required from interfacing units and systems (see the sketch after this list).
System – A system normally consists of more than one unit. In system testing, all the units are gathered and tested together, rather than relying on stubs. So, this encompasses looking at the interaction between component units.
Integration – A major system may have multiple elements, some from other suppliers or already in place within the customer’s environment. So, a finance system may interact with payroll and HR systems. Integration testing is a technical validation of the interactions and data flows.
Regression – Sometimes a change made to fix one defect has unintended consequences, breaking another part of the system. Regression testing looks for such defects.
UAT – User Acceptance Testing is usually a late phase and is designed to address the question “is the system ready for business use?” It is not an exhaustive set of functional tests but is normally based on a few end-to-end scenarios.
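As an illustration of unit testing in isolation, the following sketch (Python, with invented names) tests a small pricing unit while a stub stands in for the exchange-rate service it depends on:

    import unittest
    from unittest.mock import Mock

    def price_in_gbp(amount_usd, rate_service):
        # The unit under test: converts a USD amount using a rate service.
        return amount_usd * rate_service.usd_to_gbp()

    class TestPricing(unittest.TestCase):
        def test_conversion_uses_current_rate(self):
            # The real rate service is not available during unit testing,
            # so a stub delivers the result required from the interfacing
            # system.
            stub_service = Mock()
            stub_service.usd_to_gbp.return_value = 0.8
            self.assertEqual(price_in_gbp(100, stub_service), 80.0)

    if __name__ == "__main__":
        unittest.main()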
Functional testing is normally required to assure that the system does do what it should (“positive testing”). It is wise also to check that it does not do what it should not (“negative testing”). So, if you expect an input to the earlier spreadsheet example to be a positive integer, and the entry is “-3” or “Friday”, what does the system do? A helpful error message suggesting what is required is a good reaction; crashing is less good; producing an irrational answer is worse.
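Continuing the spreadsheet example, a negative test might look like the following sketch (again Python, with invented names), checking that invalid inputs produce a helpful error rather than a crash or an irrational answer:

    import unittest

    def add_positive_integers(a, b):
        # Guard clause: reject anything that is not a positive integer,
        # with a message suggesting what is required.
        for value in (a, b):
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"Expected a positive integer, got {value!r}")
        return a + b

    class TestNegativeCases(unittest.TestCase):
        def test_rejects_negative_number(self):
            with self.assertRaises(ValueError):
                add_positive_integers(-3, 2)

        def test_rejects_text(self):
            with self.assertRaises(ValueError):
                add_positive_integers("Friday", 2)

    if __name__ == "__main__":
        unittest.main()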
A complex system is likely to support many processes. Each may have an expected path and various exceptional cases. Each should be tested to assure it works as expected. It is likely to be infeasible to test all combinations, hence the use of risk to prioritise which are selected.
Non-Functional
Security – The project’s security lead should have conducted a security risk assessment. This will assess the value of the system’s function and data, and consider vulnerabilities, the risks of attack, and the means by which attacks may occur. From that, counter-measures may be constructed and their efficacy tested. One commonly adopted type of security test is “penetration” or “pen” testing, in which a trusted person is hired to attempt to penetrate the system’s defences and to review its construction.
Performance – Non-functional requirements should be expressed as testable performance parameters. These can include elements such as response time, languages supported, support for disabled users, availability, and capacity. Each parameter will have its own test.
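A minimal sketch of one such test, assuming a hypothetical process_order transaction and a contractual response-time target of two seconds:

    import time

    RESPONSE_TIME_LIMIT_SECONDS = 2.0  # the testable performance parameter

    def process_order():
        # Hypothetical transaction under test; a stand-in for the real system.
        time.sleep(0.1)

    def test_response_time():
        # Time a representative transaction and compare against the target.
        start = time.perf_counter()
        process_order()
        elapsed = time.perf_counter() - start
        assert elapsed <= RESPONSE_TIME_LIMIT_SECONDS, (
            f"Response took {elapsed:.2f}s against a limit of "
            f"{RESPONSE_TIME_LIMIT_SECONDS}s")

In practice, performance testing also uses load-generation tools to apply representative transaction volumes; the principle of a testable target remains the same.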
User
Usability – Many systems need to operate effectively on a range of platforms such as mobile, PC, and tablet. It is wise to validate that the system works effectively for the intended users, that they find the flow of interaction understandable, and that it supports them in their “jobs to be done.”[3] Usability testing explores these aspects of the user experience.
Operational
Data Migration – If the new system is to take over from an existing one, there is likely to be data on historic transactions and assets that the new system will need access to. Assume that existing data has faults such as missing or corrupt fields. Permitted values may also differ between the old system and the new. Data migration testing runs alongside iterative cleansing and treatment of the data to prepare it for the new system, and validates that in-flight transactions can be handled (see the sketch after this list).
Deployment – Users may need material such as documentation and training to prepare them for the new system. The efficacy of such preparation should be assessed before roll-out.
Support – Operational Acceptance Testing (“OAT”) is conducted through the repeated review of checklists, with questions such as “do we have a set of knowledge articles prepared to support the service desk with known issues?” It validates that the support organisation is ready for the system’s launch.
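As promised above, here is a minimal sketch of a data migration check (Python, with invented field names), looking for the kinds of fault described: missing fields and values that the new system does not permit:

    REQUIRED_FIELDS = {"account_id", "amount", "status"}
    PERMITTED_STATUSES = {"open", "closed", "in-flight"}  # new system's values

    def validate_record(record):
        # Return a list of faults found in one migrated record.
        faults = []
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            faults.append(f"missing fields: {sorted(missing)}")
        status = record.get("status")
        if status is not None and status not in PERMITTED_STATUSES:
            faults.append(f"status {status!r} not permitted in new system")
        return faults

    # Example: legacy data with a missing field and a legacy-only status value.
    legacy_records = [
        {"account_id": "A1", "amount": 100, "status": "open"},
        {"account_id": "A2", "status": "PENDING"},
    ]
    for record in legacy_records:
        print(record["account_id"], validate_record(record) or "clean")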
Code Inspection
It used to be widespread practice to require developers to submit code for human review. Whilst this is still used, it is now normally applied by exception and on small samples once a developer has established competence. Automated tools have taken over the bulk of the work.
Static Test – Automated tools test the code for its ability to compile and for conformance to coding standards. The best organisations take this further, promoting good practices in areas such as readability and the declaration of classes. Some tools automatically correct non-conformances.
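By way of illustration, the Python fragment below shows the kind of non-conformance such tools report. Widely used linters (flake8 or pylint, for example) would flag the unused import and the naming and spacing breaches, while formatters such as black correct layout automatically:

    import os  # unused import: typically flagged by a linter

    def CalcTotal(x,y):  # non-standard name and missing space breach style rules
        return x+y

    def calc_total(price: float, tax: float) -> float:
        """Return the price including tax."""
        # Conformant: a readable snake_case name, type hints and a docstring.
        return price + tax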
The Limitations of Testing
Testing should be risk-based. It can give assurance within the scope of the risks considered, but it can say nothing of unimagined “black swan” combinations of behaviour and data.
The aim of testing is to establish “does the system behave as intended?” A frequent source of contention is that the intent governing the design of a test is the designer’s. This may differ from that of the user and of the commissioning customer, especially where requirements are inarticulately expressed. Most software testing is silent on the quality of the specification and of the associated design until user testing.[4] The modern use of iterative design brings this process forward to avoid unwelcome surprises later.
Managers may consciously or ignorantly limit the scope of test. Often, they do this to accelerate launch. Sometimes the bet pays off. Sometimes not.[5]
Test Systems, Data, Environments
Setting up systems that replicate the production environment, or a part of it, can be expensive in labour, hosting, and maintenance charges. This is less of an issue in these days of virtualised and containerised systems than it was when everything was physical. But it still has costs.
For a test to operate, it must have access to:
The system – at the appropriate release level for every component required (or stubs).
An environment – loaded with the system to be evaluated, all pre-requisites and data.
Test data – Getting hold of enough of the right data can be a real problem. The contract often defines this as a customer obligation, one that can be difficult to meet, causing delay. Then the customer’s security staff object to putting sensitive live customer data into an unsecured test environment.
Modern software engineering promotes “test driven development” (TDD). Under this approach a developer first writes the test cases, then develops the code to satisfy them. This puts testing at the heart of the development process. Automated testing assists greatly.
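A minimal sketch of the rhythm, in Python with invented names: the test is written first and fails; just enough code is then written to make it pass.

    import unittest

    # Step 1: write the test case first. Run it: it fails, because
    # apply_discount does not yet exist (or does nothing useful).
    class TestDiscount(unittest.TestCase):
        def test_ten_percent_discount(self):
            self.assertAlmostEqual(apply_discount(200.0, 0.10), 180.0)

    # Step 2: write just enough code to satisfy the test, then re-run.
    def apply_discount(price, rate):
        return price * (1 - rate)

    if __name__ == "__main__":
        unittest.main()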
So What? For Transactional Lawyers
Transactional lawyers are rightly reluctant to impose schedules defining detailed operational methods on the supplier. The tendering and selection process should instead have asked the supplier to describe what they will do and the ways in which they will assure quality. An informed advisor with operational experience of testing should review the submission and raise the right awkward clarification questions during negotiations. The combined method statement, questions, and responses can then be incorporated as a schedule to the agreement. I hope to assist the drafting lawyer by providing introductory context and understanding, to detect distracting waffle, and to focus on what matters to the client.
The diligence of the risk assessment heavily influences the level of assurance provided by testing, making risk an area to prioritise. It is also worth considering how the progress and outcome of testing will be reported. A key measure is test coverage: the proportion of planned test cases that have been executed.
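To illustrate with hypothetical figures: if 412 of 500 planned test cases have been executed, coverage stands at 412 / 500 = 82.4%; a good report will also say which risks the remaining 88 cases were designed to address.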
So What? For Contentious Lawyers
Should the quality of testing or the treatment of defects become the subject of dispute, the contentious lawyer will be working alongside an expert whom they need to instruct. There may be issues of breach and of tortious negligence, alongside consideration of associated loss. I hope that this article provides a guide to the areas of testing, and their relation to the case, that supports the lawyer in their management of the matter.
Once testing detects defects, those investigating the case will be interested in whether the rate of fix was consistent with the planned schedule. They will also look at whether defects accumulated in an uncontrolled manner or were simply and effectively despatched. Your expert should roll up their sleeves and mine the defect data for patterns that indicate systematic trends, bringing clarity on the issues to the court.
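By way of a sketch of the kind of analysis involved (Python, with an invented extract of a defect log): comparing the weekly rate at which defects are raised with the rate at which they are fixed shows whether the backlog was controlled or accumulating.

    from collections import Counter

    # Invented extract of a defect log: (defect id, week raised, week fixed).
    # None means the defect remained open at the end of the record.
    defects = [
        ("D1", 1, 2), ("D2", 1, 4), ("D3", 2, 3),
        ("D4", 2, None), ("D5", 3, None), ("D6", 4, None),
    ]

    raised = Counter(week for _, week, _ in defects)
    fixed = Counter(week for _, _, week in defects if week is not None)

    open_count = 0
    for week in range(1, 5):
        open_count += raised[week] - fixed[week]
        print(f"Week {week}: raised {raised[week]}, fixed {fixed[week]}, "
              f"open {open_count}")

    # A steadily rising 'open' count indicates uncontrolled accumulation.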
If experts differ, it is likely that the supplier’s expert will seek to give the impression that overall, quality was good despite obstacles erected by the customer. The supplier was heroic. The customer’s expert may bemoan the manifold and serious failings encountered across delivery and the accumulated defects that took months to resolve.
Conclusions
A good and diligent programme of testing gives useful assurance that software is likely to be dependable. It complements good design, resourcing, and delivery methods. Where testing is appropriate in coverage and diligence, strong assurance follows and decisions are sound. Where testing is unreliable, so are its results.
Good delivery organisations embrace thorough testing and weave it into their development plans. The poor postpone the day of reckoning. Is your head high, scanning for threats, or buried in the sand?

William Hooper acts as an expert witness in IT and Outsourcing disputes and a consultant in service delivery. He is a member of the Society of Computers and Law and a director of Oareborough Consulting. He may be reached on +44 7909 958274 or William@Oareborough.com
[1] https://www.bbc.co.uk/news/business-50471919
[2] Titus Winters, Tom Manshreck and Hyrum Wright, Software Engineering at Google (O’Reilly, 2020), pp. 301-303
[3] Clayton M. Christensen, Taddy Hall, Karen Dillon and David S. Duncan, “Know Your Customers’ ‘Jobs to Be Done’”, Harvard Business Review, September 2016, https://hbr.org/2016/09/know-your-customers-jobs-to-be-done
[4] https://oareborough.com/Insights/assessing-design-quality/
[5] https://www.fca.org.uk/news/press-releases/tsb-fined-48m-operational-resilience-failings