E-Discovery Law Blog

Mar 19 2010

Building Better Search Term Sets

Posted by Juilan Ackert at 11:17 AM
- Categories: Search tools & methodologies | Keyword Search


Generating a set of keywords to filter a document set is becoming more commonplace in the e-discovery lifecycle.  These keywords often include terms such as the product name in a product liability case or the drug name in a patent litigation, and they give counsel a more targeted approach to document review.  With volumes of ESI growing rapidly, keyword searches can be instrumental in meeting discovery deadlines.

However, keywords can seem like a blunt instrument at times, doing more harm than good.  Designing the keyword search methodology is therefore a critical component of the process.  An inadequate approach can leave the court in an uncomfortable position, as was the case in William A. Gross Constr. Assocs. v. Am. Mfrs. Mut. Ins. Co., 2009 U.S. Dist. LEXIS 22903 (S.D.N.Y. Mar. 19, 2009).  At issue in this matter was the production of a non-party's emails.  The non-party did not identify the nomenclature its relevant custodians used when referring to matter-related emails.  Because of this, the court had to create a keyword search methodology without appropriate background information from the non-party.  Magistrate Judge Andrew Peck indicated that

“…at a minimum [counsel] must carefully craft the appropriate keywords, with input from the ESI’s custodians as to the words and abbreviations they use, and the proposed methodology must be quality control tested to assure accuracy in retrieval and elimination of ‘false positives’.”  

What Peck left out of his analysis was the phrase ‘false negatives’, which is as important as, if not more important than, ‘false positives’.  Instead he uses the layperson’s phrase “accuracy in retrieval”, which is vaguer and open to several interpretations.  It’s true that requesting parties object to large document dumps (‘too many false positives’), but it’s equally true that they object to not getting enough relevant documents (‘too many false negatives’).
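To make the two error types concrete, here is a minimal sketch of how they are commonly quantified.  It assumes you already have a reviewed sample in which the truly responsive documents are known; the function name and the document IDs are hypothetical.  Precision falls as false positives rise, and recall falls as false negatives rise.

```python
def precision_recall(retrieved, responsive):
    """Return (precision, recall) for a keyword search.

    retrieved:  IDs of documents the search terms returned
    responsive: IDs of documents a reviewer coded as responsive
    """
    retrieved = set(retrieved)
    responsive = set(responsive)
    true_positives = retrieved & responsive

    # Precision: share of retrieved documents that are responsive.
    # Low precision means many false positives.
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0

    # Recall: share of responsive documents that were retrieved.
    # Low recall means many false negatives.
    recall = len(true_positives) / len(responsive) if responsive else 0.0

    return precision, recall


# Hypothetical sample: the search returned documents 1-4,
# but only documents 3-6 are actually responsive.
precision, recall = precision_recall([1, 2, 3, 4], [3, 4, 5, 6])
print(precision, recall)
```

In this made-up sample, half of what was retrieved is noise and half of the responsive documents were missed, so a test that looked only at false positives would tell half the story.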

Many search and review technologies available today provide functionality to fine-tune keyword searches, including complex Boolean connectors that identify terms within a specified proximity to each other and concept-based searches that return documents containing the keyword or terms synonymous with it.  Coupling the power of these technologies with an analysis of the keywords' effectiveness, both in removing ‘false positives’ and in reducing ‘false negatives’, provides a more robust keyword search methodology.
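As an illustration of a proximity connector, the sketch below checks whether two terms appear within a given number of words of each other.  It is not how any particular review platform implements the feature; the function name, the tokenization, and the way distance is counted are all assumptions, and commercial tools differ on each.

```python
import re


def within(text, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur within max_distance words.

    Distance is the difference in word positions, an assumption made
    for this sketch; real tools may count differently.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in positions_a
        for j in positions_b
    )


# Hypothetical email text
email = "The brake assembly showed a defect after testing"
print(within(email, "brake", "defect", 5))
print(within(email, "brake", "defect", 3))
```

Tightening or loosening the distance is one of the adjustments a search team can make between test rounds: a narrower window tends to cut false positives at the cost of more false negatives.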

Imagine throwing darts.  The dartboard is your document collection.  The bullseye is your set of responsive documents.  The darts represent your search terms.  An experienced dart thrower would not throw darts at a dartboard while wearing a blindfold.  They would use feedback (their eyes) on where the first dart landed, then adjust their aim (the search terms) with each subsequent throw to get the next dart closer to the mark.  This process of trial and error is key to improving from throw to throw.  Analyzing keyword search results plays the same role: it ensures that one does not approach search terms as a blindfolded dart toss.

Measurement and correction is an iterative process.  It should include a mechanism to evaluate the accuracy of the search terms.  There should be a way to measure the proportion of responsive documents relative to the amount of ‘noise’ returned by each keyword.  As with any process, you will benefit from having a goal in mind before you start: what measured degree of success are you seeking?  Failing to measure and improve the effectiveness of search terms runs the risk of another Peck opinion.
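One round of that measure-and-correct loop might look like the sketch below.  It assumes a small sample of documents that a reviewer has already coded as responsive or not, and it uses naive case-insensitive substring matching in place of a real search engine.  The function name, the sample documents, and the abbreviation "WX" are all hypothetical.

```python
def term_report(sample, terms):
    """Report hits and noise per term, plus recall for the whole term set.

    sample: list of (text, is_responsive) pairs coded by a reviewer
    terms:  list of keywords to test
    """
    report = {}
    found = set()  # indexes of responsive documents hit by any term
    total_responsive = sum(1 for _, responsive in sample if responsive)

    for term in terms:
        hits = [
            i for i, (text, _) in enumerate(sample)
            if term.lower() in text.lower()
        ]
        responsive_hits = [i for i in hits if sample[i][1]]
        found.update(responsive_hits)
        report[term] = {
            "hits": len(hits),
            "responsive": len(responsive_hits),
            "noise": len(hits) - len(responsive_hits),
        }

    recall = len(found) / total_responsive if total_responsive else 0.0
    return report, recall


# Hypothetical reviewed sample
sample = [
    ("Widget X overheated during testing", True),
    ("Lunch order for the widget team", False),
    ("WX unit recall discussed with engineering", True),
    ("Quarterly parking update", False),
]

# Round 1: the product name only
print(term_report(sample, ["widget"]))

# Round 2: add an abbreviation learned from the custodians
print(term_report(sample, ["widget", "wx"]))
```

In this made-up sample, the product name alone misses the email that uses the custodians' abbreviation.  Adding the abbreviation recovers it, which is the kind of custodian input and quality-control testing the Gross Construction opinion calls for.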

With Peck’s opinion, and other recent opinions, it’s clear that judges are providing guidelines that will allow search terms to be applied more frequently.  With other changes, such as Federal Rule of Evidence 502 on waiver, the legal system itself seems to facilitate, if not encourage, their use.

 

 



