Extracting data from electronic documents has become a serious concern for those working in the legal profession and related IT services, owing to the perceived scale and difficulty of locating, extracting, de-duplicating and ultimately disclosing relevant and targeted information. In most organisations, the sheer amount of information stored electronically is overwhelming. Whilst any professional needing to locate targeted data will not want to overlook any potential evidence, the questions of which information sources are used and how the data is extracted continue to cause considerable controversy.
The level of complexity in identifying and extracting targeted data to support disclosure can vary and will inevitably depend on the way in which the data was stored and managed in the first instance. In typical situations data is held on desktops and servers initially and then backed up to tape regularly to ensure that a ‘work-in-progress’ type copy is retained for a defined period. This period is relatively short and, once it has expired, the tapes can be re-used. More importantly, however, the same hardware and software that is used for back-ups is also used to create archive copies of the data for completely different purposes. The archive copies are usually complete copies of the organisation’s data that are created to be stored in an off-site location. This has recently become a regulated process due to the introduction and enforcement of various legislation (Sarbanes-Oxley, Freedom of Information Act, Basel II and new Companies Act obligations etc) that is designed to facilitate greater accountability and protect the organisation, its supply chain and the general public against fraud and poor accountancy practices (as demonstrated with Enron, Parmalat and WorldCom etc).
So, whilst organisations have been given guidelines to assist them with storing and retaining information, there has yet to be a definitive and regulated process for locating and extracting that information, should it need to be disclosed for regulatory or internal compliance requirements. To address this problem (among others), a working party (under the Honourable Mr. Justice Cresswell) created a report to assist the legal sector in defining the scope for discovery of information held in electronic format and the difficulties that need to be overcome during the eDiscovery process (see Computers & Law, vol 15, issue 4).
Oversights in the Cresswell Report
First and foremost, the Cresswell Report seeks to elucidate the ambiguities in the Civil Procedure Rules, stating that the word “document”, as referred to in r 31.4, needs further clarification, as metadata (system information concerning the documents in question) could hold the key to an investigation in determining who modified which documents, on what date and the specific changes that were made. The Cresswell Report also discusses in some detail r 31.7, which describes a party’s obligation to make a “reasonable search” for relevant information.
The amount of data that needs to be examined, in addition to the cost of finding and then extracting it, needs to be carefully considered in any eDiscovery case. In order to address these issues, the report has drawn on examples from previous cases in the
Drawing on the previous definitions of back-up and archive data, it is evident that this differentiation is somewhat overlooked in the Cresswell Report. In practice the two are different, have distinct uses and can be used in different ways as part of the disclosure process. Section 3.13 refers to the use of back-up tapes in the discovery process and concludes that “such a search will only produce ‘snapshots’ of the data held in the computer at the selected dates and times.” Whilst this is accurate when referring to back-up tapes in isolation, it fails to mention that archive tapes can be used for full data restoration at multiple points in time. When this ability is compared with traditional desktop and server examinations, which can only offer ‘single point in time’ snapshots, the benefits of using tape in the eDiscovery process become evident.
This is not the only benefit of using tape. In section 3.3(3), the Cresswell Report discusses the way in which organisations use archive systems that “store data on magnetic tapes which contain documents from many different sources. Thus, any search for disclosure may involve trawling through thousands of irrelevant documents in order to comply with the party’s disclosure obligations”, which would suggest that this is a disadvantage of using tape. In actual fact, this is a distinct advantage, as the archive tape provides a single source for all types of file, across multiple locations and individuals, stored on a medium that makes alteration of individual files impossible. This in turn reduces the time, complexity and associated cost of eDiscovery and enables a wider-ranging search, involving a greater number of documents and a wider timeframe, that will still be regarded as “reasonable”. Further advantages are apparent when third-party professionals, like eMag Solutions, are employed to assist in the process, as they use sophisticated de-duplication techniques that increase the speed at which targeted data is identified.
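The article does not describe how de-duplication is actually performed. As a minimal sketch only, assuming duplicates are defined as files with byte-identical content, one common approach is to compare cryptographic digests of each restored file. The function names and the sample file paths below are hypothetical and purely illustrative.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def content_digest(data: bytes) -> str:
    """Return a SHA-256 digest identifying a file by its exact content."""
    return hashlib.sha256(data).hexdigest()


def deduplicate(
    files: Iterable[Tuple[str, bytes]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split restored files into unique originals and their exact copies.

    `files` yields (path, content) pairs. The first path seen for each
    distinct content is kept; later paths with the same content are
    recorded against it so that nothing is silently discarded.
    """
    first_seen: Dict[str, str] = {}        # digest -> first path with that content
    duplicates: Dict[str, List[str]] = {}  # kept path -> paths of identical copies
    for path, data in files:
        digest = content_digest(data)
        if digest in first_seen:
            duplicates.setdefault(first_seen[digest], []).append(path)
        else:
            first_seen[digest] = path
    return list(first_seen.values()), duplicates


# Hypothetical restored files: two identical copies and one distinct file.
restored = [
    ("a/contract.doc", b"contract text"),
    ("b/contract_copy.doc", b"contract text"),
    ("c/ledger.xls", b"ledger data"),
]
unique, copies = deduplicate(restored)
```

Exact-content hashing only catches identical copies; near-duplicates, such as the same e-mail with different headers, would need other techniques that are outside the scope of this sketch.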
In section 2 of the Cresswell Report, the issues of obsolescence and processing difficulty are raised in relation to “expensive-to-restore back-up media.” The report rightly makes the point that retained tapes might be in a format that is no longer readable because the drives used to create them have become obsolete. However, it fails to mention that this data can easily be retrieved with the help of a third-party specialist, at no extra cost, because data can be restored non-natively.
Section 2.18 of the Cresswell Report discusses compressed data held on back-up systems and states that it “can be difficult and costly to retrieve.” With the help of a specialist, compressed data represents no additional challenge in the extraction process and should not therefore incur any additional cost. Section 2.18(5) then goes on to discuss residual data, ie data that is left when documents are deleted from an active system.
The report again refers to this data as difficult and costly to retrieve, which in this instance is correct; however, it does not convey the full picture. Due to the nature of tape back-up systems, it is likely that a back-up or archive tape would have been made prior to the deletion of the particular documents in question, enabling full restoration of the files in their original state.
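The reasoning above amounts to choosing the most recent tape written before the deletion took place. The following is a minimal sketch of that selection, assuming the archive dates are known from a tape catalogue; the function name and the dates are hypothetical and used only for illustration.

```python
from datetime import date
from typing import List, Optional


def latest_archive_before(
    archive_dates: List[date], deleted_on: date
) -> Optional[date]:
    """Return the most recent archive date strictly before the deletion.

    Returns None when no tape predates the deletion, in which case the
    deleted file cannot be recovered from tape by this route.
    """
    earlier = [d for d in archive_dates if d < deleted_on]
    return max(earlier) if earlier else None


# Hypothetical month-end archive tapes and a deletion in mid-March.
archives = [date(2005, 1, 31), date(2005, 2, 28), date(2005, 3, 31)]
chosen = latest_archive_before(archives, date(2005, 3, 15))
```

The chosen tape only helps if the file already existed when that tape was written; a document created and deleted between two archive runs would not appear on either.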
Conclusion
Tape has a number of benefits that have been overlooked in the Cresswell Report:
data can be retrieved from a number of points in time, providing distinct improvements over the ‘single point in time’ provided by traditional desktop and server examinations
data can be sourced from points in time that precede any subsequent file deletion
tape provides a single source for all file types including e-mail, documents and financial reports, reducing the complexity and cost of the process
tape can be used to retrieve data from multiple sites, systems and individuals, again simplifying the process and removing cost
sophisticated forensic techniques mean that duplicate files can quickly be identified and removed, reducing the time and cost of examination
the extensive facilities that eMag maintain mean that perceived difficulties associated with processing tape, such as obsolete drives and compressed data, do not introduce complexity or additional cost.
Whilst the report has raised a number of valuable questions surrounding the eDiscovery process as a whole, the benefits of using tape back-ups should be clearly considered before the report is used to provide guidance across all situations that may arise in the future.