InfoCoBuild

Machine Learning for Discovery in Legal Cases

Machine Learning for Discovery in Legal Cases by David D. Lewis - Machine Learning Summer School at Purdue, 2011. Changes in the Federal Rules of Civil Procedure in December 2006 led to an explosion in the amount of electronically stored information that needs to be found and turned over in civil litigation in the United States. Traditional manual review approaches (rooms full of low-paid lawyers and paralegals reading paper documents) have collapsed under this burden, spawning a multibillion-dollar electronic discovery (e-discovery) software and services industry. Information retrieval technology, particularly supervised machine learning for text classification, plays a pivotal role.

I will review the major technological and process challenges in e-discovery, the ways in which machine learning has been brought to bear on these challenges, and results from benchmarking efforts in this area (in particular the NIST TREC Legal Track). I will also outline a new theoretical framework for studying supervised learning algorithms, Finite Population Annotation (FPA). FPA was inspired by the technical and legal context of the e-discovery setting, but is arguably an appropriate model for a range of practical applications of active and transductive learning.

Machine Learning Summer School at Purdue, 2011
A Machine Learning Approach for Complex Information Retrieval Applications
A Short Course on Reinforcement Learning
Classic and Modern Data Clustering
Divide and Recombine for the Analysis of Big Data
Graphical Models for the Internet
Introduction to Machine Learning
Large-Scale Machine Learning and Stochastic Algorithms
Machine Learning for a Rainy Day
Machine Learning for Discovery in Legal Cases
Machine Learning for Statistical Genetics
Mining Heterogeneous Information Networks
Modeling Complex Social Networks
Optimization for Machine Learning
Privacy Issues with Machine Learning: Fears, Facts, and Opportunities
Survey of Boosting from an Optimization Perspective
The MASH Project