To Tell The Truth - Will the Real Electronic Data Processing Solution Please Stand Up?
Posted by Salvatore Mancuso at 11:47 AM
111 comments - Categories: Service Providers | Data Processing
How does one select an efficient and effective data processor? Should it be by price? Quality? Both? For me, a low-cost provider won’t cut it. But on the flip side, with so many options available, I find myself spending far too many restless nights struggling with the answer. What is the right set of criteria, including cost, of course, for choosing one solution over another? While I don’t long for the old days of reams and reams of paper, I do recall how much easier it was back then. A scanning shop with three shifts. The latest and greatest scanning software. Fast turnaround and the ability to handle 50 boxes in a day. Boy, those were some easy choices. How things have changed in this age of electronic data processing.
So, accepting that quality is a driving factor, how do we measure it for electronic data processing? Sure, we can rely on our old friend, Mr. Paper, and adapt the old criteria to the new: strong, around-the-clock project management; comprehensive processing and output solutions; and fast turnaround times for large volumes of data. However, those are only pieces of the puzzle, and it is frequently the unknown that will hurt you and, ultimately, your clients. This “e-Factor,” as we call it, is extremely difficult to value and even more difficult to identify. In choosing a data processor, it is critically important that the vendor be able to handle the unknowns, the “e-Factors.” Having data processors who can seamlessly handle the routine but also tackle new and unique problems is possibly the single most important criterion for selection.
Question every aspect of each system that claims to handle electronic data processing effectively. What are the advantages and disadvantages of processing data with a proprietary system rather than an off-the-shelf product? How does one tool’s extraction of metadata and text compare with another’s? Which hashing algorithm does the proposed solution use, and which fields does it feed into the hash? What about the treatment of foreign languages? Sure, Spanish is easy, but what about Bashkir or Kanuri? Does the system process Lotus Notes mailboxes without converting them? Can the system de-duplicate across a PST and a Lotus Notes mailbox? How about “white-on-white” text within an email? These are only some of the questions that require a trusted and tested response.
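To make the hashing questions concrete, here is a minimal Python sketch of field-based de-duplication across containers. It assumes each message, whether it came from a PST or a Lotus Notes mailbox, has already been mapped into a common record; the `EmailRecord` type, the field list, and the normalization rules are hypothetical illustrations, not any vendor’s actual method. It shows how the same message from two sources can hash identically once addresses and line endings are normalized, and how leaving the body out of the field list turns an edited message into a false duplicate.

```python
import hashlib
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EmailRecord:
    # Hypothetical normalized view of a message, whether it came from a PST or an NSF.
    sender: str
    recipients: tuple[str, ...]
    sent: str      # assumed already normalized to an ISO 8601 UTC timestamp
    subject: str
    body: str

# Hypothetical field list; real products choose (and should document) their own.
DEFAULT_FIELDS = ("sender", "recipients", "sent", "subject", "body")
ADDRESS_FIELDS = {"sender", "recipients"}

def _normalize(name: str, value) -> str:
    if name in ADDRESS_FIELDS:
        addresses = value if isinstance(value, tuple) else (value,)
        # Case and ordering of addresses should not make two copies look different.
        return ";".join(sorted(a.strip().lower() for a in addresses))
    # Collapse whitespace so CRLF vs LF (and similar container quirks) do not matter.
    return " ".join(value.split())

def dedup_hash(rec: EmailRecord, fields=DEFAULT_FIELDS) -> str:
    """SHA-256 over the selected, normalized fields, each tagged with its name."""
    h = hashlib.sha256()
    for name in fields:
        value = _normalize(name, getattr(rec, name))
        h.update(f"{name}\x00{value}\x00".encode("utf-8"))
    return h.hexdigest()

pst_copy = EmailRecord("Alice@Example.com", ("bob@example.com", "carol@example.com"),
                       "2009-12-22T17:05:00Z", "Q4 numbers", "See attached.\r\nThanks")
nsf_copy = EmailRecord("alice@example.com", ("carol@example.com", "bob@example.com"),
                       "2009-12-22T17:05:00Z", "Q4 numbers", "See attached.\nThanks")
edited = replace(pst_copy, body="See attached. Do not forward.")
header_only = ("sender", "recipients", "sent", "subject")

print(dedup_hash(pst_copy) == dedup_hash(nsf_copy))                         # True
print(dedup_hash(pst_copy) == dedup_hash(edited))                           # False
print(dedup_hash(pst_copy, header_only) == dedup_hash(edited, header_only))  # True: a false duplicate
```

The normalization rules are where two tools most often disagree, so they are worth asking a vendor about as directly as the hash algorithm itself.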
Let’s consider a few realistic scenarios to put these concerns in perspective. First, take a situation in which you are faced with processing emails that were generated in an Outlook 2003 environment, stored in an outsourced archiving system, and then exported in Outlook 2007 PST format for ease of delivery … or so we think. To your complete surprise, when the emails are processed and loaded into your review tool, the full text of several message bodies is truncated. And here’s the part that’s really interesting: the text is cut off at seemingly random points, and only searching is affected, because the native files and image renderings appear fine. The problem is caused by the embedded XML tags within the exported Outlook 2007 PSTs in combination with the text-extraction methodology your trusty data processor uses. The potential consequences, such as incomplete search results when a privilege search runs against only partially extracted text, are truly daunting, and this continues to be a significant problem.
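One way to catch this kind of truncation before it affects a privilege review is a spot check that compares what the review tool indexed against an independent extraction of the same message body. The Python sketch below assumes such a second extraction is available; the function names, the sample text, the search terms, and the 0.95 length ratio are hypothetical choices for illustration. It does not diagnose the XML-tag cause; it only flags documents whose searchable text is visibly short and lists the search terms that would be missed.

```python
import re

def _contains(text: str, term: str) -> bool:
    # Whole-word, case-insensitive match, roughly what a keyword search would do.
    return re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE) is not None

def search_gaps(reference_text: str, indexed_text: str, terms: list[str]) -> list[str]:
    """Terms found in an independent extraction but missing from the review index."""
    return [t for t in terms if _contains(reference_text, t) and not _contains(indexed_text, t)]

def looks_truncated(reference_text: str, indexed_text: str, min_ratio: float = 0.95) -> bool:
    """Flag a document whose indexed text is much shorter than the reference text."""
    if not reference_text:
        return False
    return len(indexed_text) / len(reference_text) < min_ratio

reference = "Please review the draft. Per our attorney, keep this privileged and confidential."
indexed = "Please review the draft. Per our"   # what the review tool actually holds
privilege_terms = ["attorney", "privileged", "draft"]

print(search_gaps(reference, indexed, privilege_terms))  # ['attorney', 'privileged']
print(looks_truncated(reference, indexed))               # True
```

A length ratio alone will miss truncations that happen to land near the end of a message, which is why the sketch also checks the specific terms the review depends on.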
Here’s another scenario. You are about to decide on the de-duplication strategy for a huge document review. De-duplication can be a great cost saver, but it can also be a great pain in the “hash.” Which fields does your electronic data processing solution use to identify duplicates? As long as there are plenty of them, it’s no big deal, you say? Wrong! Those fields are extremely important – especially in how they affect the hashing of email attachments. Not all electronic data processing systems generate hash values in the same way. For example, certain tools generate a hash value for the attachment’s file name only, while others generate a hash value for the entire attached file, including all of the text within it. This small difference is why two emails that look identical at face value can be treated as duplicates even though their attachments, despite sharing a file name, have different contents. And when those false duplicates fall out of the output set, good luck explaining it to the litigation team at best, or to the court at worst.
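The difference between the two attachment-hashing approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product computes its hashes: `family_hash`, `key_by_name`, and `key_by_content` are hypothetical names, SHA-256 is an arbitrary choice, and the byte strings stand in for real attachment files. Two emails with the same body and a same-named attachment collide when only the file name is hashed, but separate when the attachment’s contents are hashed.

```python
import hashlib

def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def key_by_name(name: str, content: bytes) -> str:
    # Only the attachment's file name contributes to the hash.
    return _sha256(name.encode("utf-8"))

def key_by_content(name: str, content: bytes) -> str:
    # The entire attached file contributes, so same-named files with different contents differ.
    return _sha256(content)

def family_hash(body: str, attachments: list[tuple[str, bytes]], attachment_key) -> str:
    """Hash an email body together with a key for each of its attachments."""
    h = hashlib.sha256(body.encode("utf-8"))
    for key in sorted(attachment_key(name, content) for name, content in attachments):
        h.update(key.encode("ascii"))
    return h.hexdigest()

body = "Draft contract attached."
email_a = [("contract.docx", b"Version 1: price is $10 per unit")]
email_b = [("contract.docx", b"Version 2: price is $12 per unit")]

print(family_hash(body, email_a, key_by_name) == family_hash(body, email_b, key_by_name))        # True: false duplicate
print(family_hash(body, email_a, key_by_content) == family_hash(body, email_b, key_by_content))  # False
```

Hashing the full attachment costs more processing time, but of the two approaches it is the only one that can tell these emails apart.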
In June of 2006, the members of my litigation services department assembled a data set reflecting some of the worst-case scenarios we’ve encountered and asked more than fifty service providers to participate in an evaluation process. Starting with this data set of problematic emails and attachments, we began testing the service providers and their electronic data processing solutions. Our objective was to identify and qualify proven technologies. It wasn’t a pass/fail test. Instead, we evaluated each service provider’s electronic data processing system against a set of known issues and then, based on an initial round of feedback, allowed them to adjust their protocols accordingly. The results were fascinating and unequivocally proved that no two data processing systems are the same. More importantly, the lessons learned have empowered my team to set appropriate expectations for our attorneys and to communicate our attorneys’ needs effectively to the service provider. These lessons have also proved useful for objectively addressing situations in which a data processing solution recommended to us by a client or co-counsel was not suited to our needs.
When it comes to electronic data processing, quality, not price, should be driving the e-discovery bus. In fact, quality in many cases will actually justify a higher cost for services. We always look to the following criteria when measuring quality:
- The results of our electronic data processing test
- Project management
- Flexibility of the system in handling new issues
- Consistent results – ensuring each project is handled the same way
- Regression testing of the system itself against all known issues
- A service provider’s recovery time
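The last two quality criteria, consistent results and regression testing against known issues, lend themselves to a simple harness: keep a library of past problem items with agreed-upon expected outputs, and rerun every version of the processing system against it. The Python sketch below is a hypothetical illustration of that idea; `KnownIssue`, `run_regression`, and both toy extractors are invented for the example, and the deliberately broken extractor is not a model of any real product. Each case is run twice so that inconsistent output counts as a failure alongside wrong output.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class KnownIssue:
    name: str       # short label for a problem seen in a past matter
    raw_input: str  # hypothetical stand-in for the problematic source item
    expected: str   # output the team has agreed is correct

def run_regression(process: Callable[[str], str], cases: list[KnownIssue]) -> list[str]:
    """Names of cases whose output is wrong or differs between two identical runs."""
    failures = []
    for case in cases:
        first, second = process(case.raw_input), process(case.raw_input)
        if first != case.expected or first != second:
            failures.append(case.name)
    return failures

def tag_stripping_extract(raw: str) -> str:
    # Toy extractor: removes markup tags and keeps the text between them.
    return re.sub(r"<[^>]+>", "", raw)

def truncating_extract(raw: str) -> str:
    # Deliberately broken toy: stops at the first tag, simulating a truncation defect.
    return raw.split("<", 1)[0]

cases = [
    KnownIssue("embedded-xml-in-body", "Keep this <x:note>privileged</x:note> please",
               "Keep this privileged please"),
    KnownIssue("plain-body", "Hello world", "Hello world"),
]

print(run_regression(tag_stripping_extract, cases))  # []
print(run_regression(truncating_extract, cases))     # ['embedded-xml-in-body']
```

The value of such a harness grows with the case library: every new “e-Factor” a team runs into becomes one more case that future versions of the system must keep passing.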
Regrettably, I am still looking for the Holy Grail of electronic data processing solutions, but at least I’m having an easier time sleeping.
Andy Wilson wrote on 12/22/09 12:05 PM
Nice post, Salvatore. I hope this lands on the desk of every attorney who ends up choosing eDiscovery vendors. Sadly and ironically, many still choose on a lowest-price-wins basis. The foresight to know better comes with experience and solid advice. Here's to hoping 2010 will be the year of both.