Sentence Boundary Detection in Adjudicatory Decisions in the United States
Sentence Boundary Detection in Adjudicatory Decisions in the United States, Jaromir Savelka, Vern R. Walker, Matthias Grabmair, and Kevin D. Ashley, in the journal “Traitement Automatique des Langues”, Volume 58, Number 2, pp. 21–45 (ATALA, 2017). The original manuscript is available on the website <www.atala.org>.
We report the results of an effort to enable computers to segment US adjudicatory decisions into sentences. We created a data set of 80 court decisions from four different domains. We show that legal decisions are more challenging for existing sentence boundary detection systems than non-legal texts. These systems rely on a number of assumptions that do not hold for legal texts, which impairs their performance. We show that a general statistical sequence labeling model is capable of learning the definition of a sentence boundary more efficiently. We trained a number of conditional random field models that outperform the traditional sentence boundary detection systems when applied to adjudicatory decisions.
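To make the sequence labeling framing concrete, the following minimal sketch shows one way sentence boundary detection could be cast as token labeling, the form of input a conditional random field consumes. The tokenizer, the B/I/E/O label scheme, the feature names, and the example text are all illustrative assumptions; the abstract does not specify the features or labels actually used in the paper.

```python
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def label_tokens(text, sentence_spans):
    """Label tokens B (sentence start), I (inside), E (sentence end), O (outside).

    A one-token sentence ends up labeled E, because E is assigned last.
    """
    tokens = tokenize(text)
    labels = ["O"] * len(tokens)
    for s_start, s_end in sentence_spans:
        inside = [i for i, (_, a, b) in enumerate(tokens)
                  if a >= s_start and b <= s_end]
        for i in inside:
            labels[i] = "I"
        if inside:
            labels[inside[0]] = "B"
            labels[inside[-1]] = "E"
    return tokens, labels


def token_features(tokens, i):
    """Assumed per-token features of the kind a CRF could be trained on."""
    tok = tokens[i][0]
    feats = {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "is_period": tok == ".",
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1][0].lower()
    if i + 1 < len(tokens):
        feats["next_lower"] = tokens[i + 1][0].lower()
        feats["next_is_title"] = tokens[i + 1][0].istitle()
    return feats


# Invented example: a statutory citation contains periods that end no sentence.
sentences = ["The claim is denied.", "See 38 U.S.C. 1110."]
text = " ".join(sentences)

# Recover the gold character span of each sentence within the joined text.
spans, pos = [], 0
for s in sentences:
    start = text.index(s, pos)
    spans.append((start, start + len(s)))
    pos = start + len(s)

tokens, labels = label_tokens(text, spans)
features = [token_features(tokens, i) for i in range(len(tokens))]

period_count = sum(1 for tok, _, _ in tokens if tok == ".")
boundary_count = labels.count("E")
print(period_count, boundary_count)
```

In this invented example only some of the period tokens carry the E label, which illustrates why a rule that treats every period as a boundary over-segments text containing citations. The `features` and `labels` lists have the per-token shape that a CRF toolkit would typically take as training input.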