Hidden Bias in Empirical Textualism

by Matthew Jennejohn, Samuel Nelson & D. Carolina Núñez

A new interpretive technique called “corpus linguistics” has exploded in use over the past five years, spreading from state supreme courts and federal courts of appeals to the U.S. Supreme Court. Corpus linguistics involves searching a large database of text, or corpus, to identify patterns in how a given term is used in context. Proponents argue that the method is more “empirical” than consulting dictionaries to determine a word’s public meaning, a touchstone of originalist approaches to legal interpretation.
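To make the idea of a corpus search concrete, the following is a minimal sketch of a keyword-in-context (concordance) lookup, the kind of query that surfaces how a term is used in surrounding text. It is an illustration under stated assumptions, not the method used in any court opinion or in this Article: the function name `concordance`, the `window` parameter, and the two-sentence toy corpus are all hypothetical, and the sentences are invented rather than drawn from COHA.

```python
import re

def concordance(documents, term, window=4):
    """Return keyword-in-context rows: (left context, matched word, right context)."""
    rows = []
    for text in documents:
        # Crude tokenizer (an assumption): runs of letters and apostrophes.
        tokens = re.findall(r"[A-Za-z']+", text)
        for i, tok in enumerate(tokens):
            if tok.lower() == term.lower():
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                rows.append((left, tok, right))
    return rows

# Toy stand-in corpus: invented sentences, not taken from COHA.
toy_corpus = [
    "The driver agreed to carry the goods across the river.",
    "She would carry a lantern whenever she walked at night.",
]

# Print each hit with a few words of context on either side.
for left, hit, right in concordance(toy_corpus, "carry", window=3):
    print(f"{left:>25} [{hit}] {right}")
```

A reader would then inspect the rows to judge which sense of the term predominates. A real corpus query would also need to handle inflected forms, part-of-speech filtering, and sampling by time period, none of which this sketch attempts.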

This Article identifies an important concern about the use of corpus linguistics in legal interpretation that courts and scholarship have overlooked: bias. Using new machine learning techniques that analyze bias in text, this Article provides empirical evidence that the thousands of documents in the Corpus of Historical American English (COHA), the leading corpus currently used in judicial opinions, reflect gender bias. Courts and scholars have not considered that the COHA is sexist, an oversight that raises the possibility that corpus linguistics methods could serve as a vehicle for infecting judicial opinions with longstanding prejudices in U.S. society.

In addition to raising this important new problem, this Article charts a course for dealing with it. It explains how hidden biases can be made transparent and introduces steps for “debiasing” corpora used in legal interpretation. More broadly, it shows how the methods introduced here can be used to study biases in all areas of the law, raising the prospect of a revolution in our understanding of how discriminatory biases affect legal decisionmaking.


Georgetown Law Journal
