The Suffolk Journal of Trial and Appellate Advocacy published Peeking Inside the Black Box: A Preliminary Survey of Technology Assisted Review (TAR) and Predictive Coding Algorithms for eDiscovery (21 Suffolk J. Trial & App. Advoc. 221, 221-286 (2016)). The article unpacks the often complex and arcane technical language of predictive coding, keyword search, and technology assisted review (TAR). Written by a lawyer and experienced technology professional, it explains how these technologies, new to the legal community, work and what their limitations and benefits are. Drawing on 300+ citations, the article imports academic literature from machine learning, natural language processing, information retrieval, and statistics to explain how predictive coding and machine learning work in an eDiscovery context.
Shannon Brown teaches eDiscovery technologies at Widener School of Law, experience that provides insight into the educational needs of the legal community. Brown is also the author of open source eDiscovery software (Prolorem eDi) used in law school classes and by members of the legal community.
The article's abstract reads:
This article fills a troubling gap in the legal literature related to e-Discovery software systems. Lawyers, law students, and law school professors have no concise resource for learning about or teaching about e-Discovery technologies such as technology assisted review (TAR), “predictive coding,” and older keyword search systems.
Peeking Inside the Black Box provides the legal community with a preliminary overview of some of the algorithms and methods used in keyword search, TAR, and “predictive coding” software. The article first illustrates the ethical duties and strategic or practical reasons for knowing how these technologies work. The objective is to reduce reliance on non-lawyer experts—who may misunderstand the legal implications of applying technical systems.
Before delving into the algorithms, the article then addresses how these computer algorithms translate human-readable documents into computer-understandable “language”—called preprocessing. Surprisingly, preprocessing has not been addressed in legal literature even though this step defines what the algorithms “see” and thus the potential effectiveness of the algorithm output.
The article then explains the critical distinction between keyword search systems and TAR or predictive coding systems. This distinction, hinted at in case law and articles, finally reveals the source of the Go Fish Problem—where lawyers blindly select keywords in hope of identifying relevant materials. However, the explanation requires a basic technical understanding of how keyword search algorithms fundamentally differ from TAR or predictive coding algorithms. Once understood, lawyers gain additional insights into when and how to deploy these tools in litigation.
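To make the abstract's two technical points concrete, the following is a minimal Python sketch, not code from the article. It assumes a deliberately simple preprocessing step (lowercasing, word tokenization, stop-word removal, term counting) and uses a small naive Bayes classifier as a stand-in for the kind of learning algorithm a predictive coding system might use. The function names, the stop-word list, and the sample documents are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on", "for", "was", "at"}

def preprocess(text):
    """Lowercase, split into word tokens, drop stop words, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def keyword_search(docs, keywords):
    """Return indices of documents that contain at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return [i for i, d in enumerate(docs) if wanted & set(preprocess(d))]

def train(labeled_docs):
    """Collect per-class term counts from (text, is_relevant) pairs."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in labeled_docs:
        counts[label].update(preprocess(text))
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab

def score(model, text):
    """Naive Bayes log-odds of relevance; positive means 'likely relevant'."""
    counts, n_docs, vocab = model
    log_odds = math.log(n_docs[True] / n_docs[False])
    totals = {label: sum(counts[label].values()) for label in (True, False)}
    for term, freq in preprocess(text).items():
        if term not in vocab:
            continue  # terms never seen in training carry no evidence
        # Add-one smoothing keeps unseen class/term pairs from zeroing out.
        p_rel = (counts[True][term] + 1) / (totals[True] + len(vocab))
        p_not = (counts[False][term] + 1) / (totals[False] + len(vocab))
        log_odds += freq * math.log(p_rel / p_not)
    return log_odds

# Hypothetical reviewer-coded examples (True = relevant).
training = [
    ("The shipment of defective brake pads was delayed", True),
    ("Engineers flagged the faulty brake assembly", True),
    ("Lunch menu for the office party on Friday", False),
    ("Parking garage closed for repairs this weekend", False),
]

# Hypothetical uncoded collection to be searched or classified.
collection = [
    "Brake pads failed inspection again",
    "Friday party moved to the parking garage",
    "Faulty assembly reported by engineers",
]

model = train(training)
keyword_hits = keyword_search(collection, ["brake"])
predicted = [i for i, d in enumerate(collection) if score(model, d) > 0]
```

In this toy collection, the keyword query for "brake" can only return documents that contain that exact term, so it misses the third document. The classifier, having learned from the coded examples that "faulty," "assembly," and "engineers" co-occur with relevant material, scores that document as likely relevant. Both approaches depend entirely on what `preprocess` lets through: a term that is dropped or split differently at that stage is invisible to everything downstream.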