Legal Language Modeling with Transformers
Lazar PERIC a, Stefan MIJIC a, Dominik STAMMBACH a and Elliott ASH a
a ETH Zurich
Abstract. We explore the use of deep learning algorithms to generate text in a
professional, technical domain: the judiciary. Building on previous work that has
focused on non-legal texts, we train auto-regressive transformer models to read
and write judicial opinions. We show that survey respondents with legal expertise
cannot distinguish genuine opinions from fake opinions generated by our models. However, a transformer-based classifier can distinguish machine- from human-generated legal text with high accuracy. These findings suggest how transformer
models can support legal practice.
Keywords. legal text generation, language modeling, transformer, human validation,
machine detection of generated text.
1. Introduction
The capacity to efficiently generate human-like text has rapidly increased with the
development of neural language models implemented through deep transformer architectures [1]. Besides breezing past the previous state-of-the-art baselines (e.g., perplexity on the Penn Treebank and other corpora [2]), auto-regressive models like GPT-2 have amazed both researchers and the public with their ability to generate believable and expressive text – for example, the now proverbial news article about scientists discovering unicorns in South America [3]. The even better coherence, originality, and fluency achieved by GPT-3 [2] promise additional gains as these models are further extended and refined.
In the legal domain especially, there is enormous potential for well-intentioned use of such language models. The law is the domain of natural language, and natural language models could be used for many tasks, such as the generation of contracts, briefs, and rulings. Many legal writing tasks are somewhat repetitive, and in these tasks especially, legal language models could save practitioners significant time and resources. Some early steps in this direction were taken by [4, 5], who use a character-level recurrent neural network to draft text in the style of international treaties.
While the promise of these language models is clear, concerns about potential misuse have led to extensive discussions, including disagreements about whether these models should be released to the public [3]. A complementary line of papers has explored automated detection of machine-generated text, mostly focusing on newspaper articles
[6, 7]. Correspondingly, other work has explored human responses to news articles generated by language models, finding that overall humans respond to machine-generated news text the same way they respond to human-generated news text [8].
We add to this literature by focusing on generating legal documents. Legal corpora
have two important differences from the generic or news-oriented corpora that have been
the focus of previous work. First, the documents tend to be much longer and to rely
on long-range connections within documents. This document length poses a problem
for the standard Transformer’s memory usage, which grows quadratically with the input
sequence size. One remedy to this problem is Transformer-XL [9], which allows (in
theory) access to arbitrarily long memory sequences using a recurrence mechanism and
relative positional encoding.
Second, legal language is technical. Similar to [8], we ask how well humans are able to distinguish machine- from human-generated documents. But the language used by judges is more technical, and less expressive, than that used by journalists. Understanding and writing effective legal documents requires many years of training, which could make the task difficult for language models. On the other hand, legal language is more formulaic and repetitive than generic language [10], so it might be easier to generate believably. Given that the benefits (and risks) of text generation in the law are at least as significant as those in journalism, more empirical work on this topic is needed.
This paper explores the use of language models for generating legal text. We use a
corpus of 50'000 judicial opinions from U.S. Circuit Courts, appeals courts which review the decisions made by lower courts. We train a Transformer-XL model from scratch and fine-tune a pre-trained GPT-2 model on this corpus. We use both models to machine-generate text conditioned on the start of judicial opinions not considered during training.
We evaluate the model outputs qualitatively and based on a set of language metrics. We validate our method via a survey in which human annotators tried to distinguish genuine from generated opinion text. Even respondents with legal training had difficulty distinguishing judge-generated from machine-generated texts. These results suggest that transformer language models can learn to generate consistent and coherent legal text.
Legal disputes are an adversarial process. In a world where language models are often used to support legal arguments, it would be useful to detect machine-generated text when used by an adversary. Following the approach for fake news detection in [6], we show that our machine-generated texts can be used to train a classifier that discriminates genuine from machine-generated texts with very high accuracy, seemingly outperforming humans on that task.
To summarize, our findings suggest that legal language models could be a promising
tool for legal practice. Such technology could be further enhanced and used to assist
practitioners in writing legal documents. In addition, the models could help differentiate
human- from machine-written legal text.
2. Data and Methods
2.1. Data
Our empirical setting is U.S. Circuit Courts, the intermediate appellate courts in the
federal court system. Circuit Court judges review the decisions of the District Courts,
deciding whether to affirm or reverse. The judges explain their decision by providing
a written opinion. Our corpus comprises 50’000 of these U.S. Circuit Court opinions,
uniformly sampled from the universe of opinions for the years 1890 through 2010.1 The
sample includes both lead (majority) opinions and addendum opinions (concurrences and
dissents).
We undertake minimal pre-processing, so that our generator can replicate the original
style of the texts. We do remove some metadata and XML markup but keep capitalization,
punctuation, etc. In particular, we preserve the special legal citation notation used by U.S.
courts.
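The paper does not reproduce its exact cleaning routine; a minimal sketch of this kind of markup stripping (function name and regular expressions are our own illustration) could look as follows:

```python
# Minimal sketch of the markup stripping described above; the exact
# pre-processing routine may differ. Tags and metadata are removed while
# casing, punctuation, and citation notation are left untouched.
import re

def clean_opinion(raw: str) -> str:
    text = re.sub(r"<[^>]+>", " ", raw)   # drop XML/HTML-style tags
    text = re.sub(r"[ \t]+", " ", text)   # collapse runs of spaces and tabs
    return text.strip()

sample = '<opinion type="majority"><p>See 42 U.S.C. 1983.</p></opinion>'
print(clean_opinion(sample))  # -> 'See 42 U.S.C. 1983.'
```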
The opinions are in general quite lengthy, containing an average of 2024 tokens (words) per opinion. Figure 1 shows the average opinion length over time as well as the number of documents per year. We see that average length gradually decreased from the 1890s, reaching a minimum in the 1970s. After that, the average length of these opinions has grown steadily until the present day. Notably, it was around 1970 that digital legal research databases came into use.
Figure 1. Average number of tokens in a document (top) and number of documents (bottom) per year.
1 The opinion texts can be obtained from CourtListener.
2.2. Language Model Objective
Our approach to representing legal documents is an auto-regressive language model.
We are given an unsupervised corpus $U$. The objective is to learn a set of parameters $\Theta$ to maximize

$$L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta), \qquad (1)$$

the probability of token $u_i$ appearing at position $i$ conditional on the $k$ tokens appearing before $u_i$.
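As an illustration, Equation (1) corresponds to the standard causal language modeling loss; the following sketch (model choice and text snippet are illustrative, not the training code used in the paper) scores a snippet with a Hugging Face causal language model:

```python
# Illustrative sketch of Equation (1): a causal LM scores each token given
# its preceding context; the model and snippet here are illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tokenizer("The judgment of the district court is affirmed.",
                return_tensors="pt").input_ids
with torch.no_grad():
    # With labels=ids, the returned loss is the mean of -log P(u_i | u_<i),
    # i.e. the negative of the per-token objective in Equation (1).
    loss = model(ids, labels=ids).loss
print(f"mean NLL {loss.item():.3f}, perplexity {torch.exp(loss).item():.1f}")
```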
We experimented with two transformer architectures for auto-regressive language
modeling. First, we trained a Transformer-XL model from scratch on our corpus. Second,
we fine-tuned an existing GPT-2 checkpoint on our legal corpus. The implementations
for each of these approaches are described in the following two subsections.
2.3. From-Scratch Transformer-XL Model
We train a Transformer-XL model from scratch. After applying a word tokenizer to the training corpus, we build a vocabulary of 528'476 different token types. Sentence boundaries (detected using spaCy [11]) are annotated as <eos>. If seed documents contain words that did not appear in the training corpus, those are replaced with a special <unk> token. Figure 2 shows an excerpt of the beginning of a new opinion after pre-processing.
[ * 1209 ] JERRE S. WILLIAMS , Circuit Judge : <eos> This appeal is lodged by
H. Roger Lawler , a debtor in bankruptcy . He claims that an award in bankruptcy of $
<unk> in attorneys ’ fees to Richard W. Horton and Vernon O. <unk> is too high .
Horton and <unk> are cross - appealing the amount of the award
Figure 2. Sample of the dataset.
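A sketch of this preparation step (our reconstruction; the original tokenization code is not shown in the paper) could look like:

```python
# Sketch of the corpus preparation described above (our reconstruction):
# word-level tokens, sentence boundaries marked with <eos>, and
# out-of-vocabulary words in seed documents mapped to <unk>.
import spacy

nlp = spacy.load("en_core_web_sm")  # any spaCy English pipeline works

def tokenize_opinion(text):
    tokens = []
    for sent in nlp(text).sents:
        tokens.extend(tok.text for tok in sent)
        tokens.append("<eos>")  # annotate the sentence boundary
    return tokens

def map_oov(tokens, vocab):
    # replace words that never appeared in the training corpus
    return [t if t in vocab else "<unk>" for t in tokens]
```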
The Transformer-XL model consists of 12 layers, each with 16 attention heads. The model dimension is set to 512, and the inner dimension of each fully-connected feedforward layer (following the attention layer) is set to 2048. These values were derived from the configuration of the Transformer-XL base model trained on the WikiText-103 dataset, which was used for text generation in [9]. This yields 312M trainable model parameters.
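A comparable configuration can be instantiated in the Hugging Face transformers library; the mapping of hyperparameters below is our own sketch, not the authors' training code:

```python
# One possible instantiation of the configuration above (ours, not the
# authors' code), using the Transformer-XL implementation in transformers.
from transformers import TransfoXLConfig, TransfoXLLMHeadModel

config = TransfoXLConfig(
    vocab_size=528476,  # word-level vocabulary built from the corpus
    n_layer=12,         # 12 transformer layers
    n_head=16,          # 16 attention heads per layer
    d_model=512,        # model (hidden) dimension
    d_head=32,          # so that n_head * d_head == d_model
    d_inner=2048,       # feedforward inner dimension
)
model = TransfoXLLMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```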
2.4. Fine-Tuned GPT-2 Model
Second, we consider a fine-tuned GPT-2 model. We started with a GPT-2 base checkpoint, downloaded from Hugging Face. The model has 124M parameters and was pre-trained on a general-domain corpus. We further fine-tuned the model on the language modeling task on the Circuit Court opinions, using a maximum sequence length of 256.
We used the aitextgen library with default settings to perform further training and text generation.
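In outline, the fine-tuning pipeline with aitextgen looks roughly as follows (the corpus file name is a placeholder, and settings beyond those reported in the paper are left at library defaults):

```python
# Rough outline of GPT-2 fine-tuning with aitextgen; the corpus file name
# is a placeholder and other settings remain at the library defaults.
from aitextgen import aitextgen

ai = aitextgen()  # defaults to the 124M-parameter GPT-2 base checkpoint
ai.train(
    "circuit_opinions.txt",  # hypothetical plain-text file of opinions
    line_by_line=False,
    batch_size=2,
    num_steps=36000,
)
ai.generate(n=1, prompt="Appellant urges that he acted diligently")
```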
2.5. Model Training
We trained the Transformer-XL model with a batch size of 16 and a sequence length of 128. After training the network for 24'000 steps, i.e., after the model had read 50M unique tokens, we achieve a perplexity of 36.85 on held-out data (perplexity indicates how well the language model predicts the held-out data; lower is better). We trained the GPT-2 model with a batch size of 2 for 36'000 steps, i.e., we provided 18.5M tokens during training. The fine-tuned GPT-2 model achieved a final perplexity of 28.40 on held-out data.
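Perplexity here is the exponential of the mean per-token negative log-likelihood on held-out text; a minimal sketch, assuming a helper nll() that returns the summed negative log-likelihood a trained model assigns to a document:

```python
# Perplexity as exp(mean token NLL); `nll` is an assumed helper returning
# the summed negative log-likelihood a trained model assigns to a document.
import math

def corpus_perplexity(documents, nll):
    total_nll = sum(nll(doc) for doc in documents)
    total_tokens = sum(len(doc) for doc in documents)
    return math.exp(total_nll / total_tokens)
```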
Figure 3 shows perplexity over time during training for the Transformer-XL architecture trained from scratch; it converges slowly. In future work, we would like to explore fitting models with more parameters on a larger number of opinions.
Figure 3. Perplexity during training.
2.6. Generation and Decoding
The next step is to generate text using our language models conditioned on a seed
text snippet. For Transformer-XL, we used the memory recurrence mechanism of the
architecture. The sequence length was set to 64 with a memory of 640 tokens. We apply
the decoding strategy from the official codebase2 . At each step during generation, we
sample from the k most probable tokens in a uniform manner. The default value for k is
set to 3.
For GPT-2, we predict the next token given the 255 tokens before it. We use the top-p (nucleus) decoding strategy with p = 0.9, as implemented in the aitextgen library3. That is, we sample from the smallest set of most probable next tokens that together cover at least 90% of the probability mass.
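The two decoding rules can be summarized with the following minimal implementations (our own sketches, for illustration):

```python
# Minimal sketches (ours) of the two decoding rules described above.
import numpy as np

def sample_top_k_uniform(probs, k=3):
    """Transformer-XL decoding: pick uniformly among the k likeliest tokens."""
    top_k = np.argsort(probs)[-k:]
    return int(np.random.choice(top_k))

def sample_top_p(probs, p=0.9):
    """GPT-2 decoding: sample from the smallest set of tokens whose
    combined probability mass is at least p, after renormalization."""
    order = np.argsort(probs)[::-1]            # tokens, most probable first
    cumulative = np.cumsum(probs[order])
    nucleus = order[: int(np.searchsorted(cumulative, p)) + 1]
    renormalized = probs[nucleus] / probs[nucleus].sum()
    return int(np.random.choice(nucleus, p=renormalized))
```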
For the experiments, we used our models to automatically generate text snippets conditioned on the actual starts of 1.8K opinions, randomly sampled from the corpus.
2 [Link]
3 [Link]
2.7. Open Source Demonstration
We produced a Google Colab notebook for generating legal texts using the fine-tuned GPT-2 model, available at [Link]1VjTUAUgXqZCsmDZYS2N098_2w46oIEED?usp=sharing. We have made the code for the Transformer-XL model publicly available at [Link]transformer-xl.
3. Evaluation
3.1. A Generated Example
We show an example of generated text in Figure 4. At the top (a), we see the prompt: the start of a Circuit Court opinion describing the appellant's request for an exception to a 90-day complaint filing period. In (b) we see the genuine completion. The third panel gives the generated completion from the from-scratch Transformer-XL model (c), the fourth from the vanilla GPT-2 model (d), and the fifth from the fine-tuned GPT-2 model (e).
(a) Prompt: Appellant urges that he acted diligently to present his claim to the court and
that this diligence should justify the one-day delay. Without reciting all of the details of
his efforts we note that the 90-day period for filing a complaint expired on May 14, 1973.
(b) Genuine Ending: On March 6, 1973, he had sought the services of an attorney to draft
a formal complaint for him. The complaint was prepared and delivered to him prior to the
filing deadline but the exact date is not clear from the record. Appellant’s Memorandum
is Support of Motion to Appeal would indicate this date as April 13, 1972.
(c) Transformer-XL: In this connection, the Nygren’s argument is without support. Appellant argues, for example, for a period not exceeding five days after filing an indictment on October 15. Appellee argues, for example for an offense involving his activities under §506, which was committed on October 22 and April 22.
(d) Vanilla GPT-2: Appellant was able to present his claim in the court in less than 10 days, which is not a trivial amount of time to provide for a review of a case in which the complaint was filed within the 90-day period. Moreover, it is undisputed that appellant did not have any prior knowledge of the nature of the allegations and the nature of the issues raised in the complaint.
(e) Fine-tuned GPT-2 on Legal Text: The Government’s failure to disclose the facts of
the case, and the failure of the court to address the underlying factual issues is not at issue
here. On November 19, 1973, respondent filed his civil suit in the District Court. The
District Court denied the claim for $ 100 million, arguing that the District Court had not
acted to protect his interest in the case, and that there was no constitutional or statutory
impediment to enforcing the suit.
Figure 4. Generated text example.
The genuine continuation (b) further explains the exact timeline of the events. Transformer-XL (c) discusses the argument of the appellant, with some discordant dates mentioned. Vanilla GPT-2 (d) states that the appellant was able to present the claim in less than 10 days, somewhat contradicting the initial issue of taking more than 90 days. Finally, the fine-tuned GPT-2 (e) digresses into a discussion of the government's failure to disclose the facts of the case. While there is suspicious content in each of the machine-generated snippets, the style is legalistic and facially convincing.
3.2. Readability
To provide some statistical comparison of the text generator models, we calculated
the Flesch–Kincaid readability score4 of the generated samples and the actual cases for
the 1.8K sample. As a non-legal comparison, we compute readability for the Harry Potter
corpus. A higher score indicates easier readability.
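These scores can be computed, for instance, with the textstat package (a tooling choice for illustration; the paper's exact setup may differ):

```python
# Flesch reading-ease scoring with the textstat package (illustrative
# tooling choice; higher scores mean easier-to-read text).
import textstat

snippet = ("Appellant urges that he acted diligently to present "
           "his claim to the court.")
print(textstat.flesch_reading_ease(snippet))
```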
Table 1. Readability scores.
Opinion Readability Score
Actual Circuit Opinions 37.37
Opinions Transformer-XL 40.92
Opinions GPT-2 37.47
Harry Potter 72.8
We show results in Table 1, which indicate some interesting variation across the real and generated corpora. We see that the documents generated by GPT-2 are as hard to read as the original Court opinions, while the ones generated by Transformer-XL have a higher readability score, indicating that they are slightly easier to read. This is in line with the overall lower perplexity obtained by our fine-tuned GPT-2 checkpoint, indicating that the GPT-2 checkpoint is closer to the legalese style of the opinions. All our generated text and the genuine opinions are rather technical and thus hard to read. Meanwhile, the average readability score for the Harry Potter books is almost twice as high as that of the Circuit opinions and our generated opinions.
3.3. Human Survey Evaluation
We designed a survey to measure the ability of native English speakers and legal professionals to distinguish generated legal extracts from authentic Court opinions. We showed the participants an authentic prompt from a legal text, the actual opinion, and the generated continuations produced by our Transformer-XL and GPT-2 models. The participants were then asked to rank the texts according to how authentic they believed the different snippets were. At the same time, they had to indicate how confident they were about their verdict. The ten prompts can be found in Appendix A5.
We recruited 54 survey participants in total. For the following statistics, we only include the responses of the 25 participants who correctly identified a trivial attention-test question with at least 70% confidence. More details regarding that example can be found in Appendix A.
The human annotators could not distinguish the human- from the machine-generated legal texts. Comparing genuine texts to GPT-2, the annotators were successful only 49% of the time (slightly worse than random). With Transformer-XL, the participants were slightly more accurate, picking the genuine text 53% of the time. In addition, the annotators indicated a generally low level of confidence in their guesses. The average indicated confidence was 57.6% for extracts produced by GPT-2 and 55.4% for extracts produced by Transformer-XL. The differences between GPT-2 and Transformer-XL are not statistically significant.
4 [Link]
5 Appendices are available at [Link]Modeling_with_Transformers.pdf?raw=1
If we focus on the subset of participants with legal expertise (at least three years of legal education, n = 6), the results are almost identical. They could not reliably identify the genuine follow-up text, and they were not confident in their guesses. These numbers (although from a small sample) suggest that legal experts and other participants are similarly uncertain about determining whether a legal opinion is machine-generated or written by humans.
4. Automated Detection of Machine-Generated Opinions
We have a balanced dataset of 3.6K examples (1.8K judge-generated vs. 1.8K machine-generated texts) for both the Transformer-XL and the GPT-2 model. We fine-tune RoBERTa (a pre-trained bidirectional transformer [12]) to classify which continuations are judge-generated (the genuine opinion) and which are machine-generated. In both settings, we train on half of the examples and evaluate on the other half.
4.1. Training Transformer Discriminators
We fine-tune RoBERTa with a learning rate of 2e-5 and a batch size of 16 for three epochs. We show results in Table 2. In the last three columns of Table 2, we show precision, recall, and F1 for the machine-generated class.
Table 2. Deep-learning detection of machine-generated texts.
Setting Accuracy (%) Pr (%) Rc (%) F1 (%)
GPT-2 generated texts (RoBERTa-GPT-2) 94.2 90.1 99.2 94.4
Transformer-XL generated texts (RoBERTa-T-XL) 97.5 97.7 97.2 97.5
RoBERTa-GPT-2 predicting Transformer-XL texts 76.7 85.6 64.3 73.4
RoBERTa-T-XL predicting GPT-2 texts 74.9 95.8 52.2 67.6
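A sketch of this discriminator fine-tuning (dataset construction and the two example texts are placeholders; only the hyperparameters reported above are taken from the paper) could look like:

```python
# Sketch of the RoBERTa discriminator fine-tuning with the reported
# hyperparameters; the example texts and dataset wrapper are placeholders.
import torch
from transformers import (RobertaForSequenceClassification,
                          RobertaTokenizerFast, Trainer, TrainingArguments)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)  # 0 = judge-generated, 1 = machine-generated

texts = ["The judgment of the district court is affirmed.",
         "The District Court denied the claim for $ 100 million."]
labels = [0, 1]

class OpinionDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

args = TrainingArguments(output_dir="discriminator", learning_rate=2e-5,
                         per_device_train_batch_size=16, num_train_epochs=3)
Trainer(model=model, args=args,
        train_dataset=OpinionDataset(texts, labels)).train()
```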
In the first two rows, we see that, in contrast to the humans (among them legal experts) considered in our survey, machine-learning models are very good at detecting machine-generated language. This is in line with observations made in [7], where BERT classifiers outperform humans on the task of automatically detecting machine-generated text. Remarkably, in the GPT-2 setting, we detect almost all machine-generated examples (recall over 99%) while maintaining a precision of over 90%. In the Transformer-XL case, we have fairly balanced precision and recall, both over 97%.
In the last two rows, we show results for two additional experiments, which are closer to a real-world setting in which one would not know what model was used to generate fake legal text. We use our discriminator trained on detecting GPT-2-produced samples to detect Transformer-XL samples, and vice versa. In both settings, the overall accuracy of the classifier drops significantly, but performance remains much better than human guessing. Interestingly, in both experiments, the discriminator achieves a high precision at the expense of a reduced recall.
It seems that, overall, our models are capable of detecting machine-generated texts with very high accuracy. The models are effective even though we provided a modest-sized training set and have not performed any hyperparameter tuning. Appendix B provides some descriptive analysis of the errors made by the discriminators.
5. Conclusion
In this paper, we explored the capabilities of Transformer-based decoders for generating U.S. Circuit Court opinions. We show examples of machine-generated text and validate the quality of our generated text via a survey. The results suggest that humans (among them legal experts) are in general uncertain about determining whether text is machine-generated or judge-generated. We additionally explore the capabilities of machine-learning models to predict whether a given snippet is machine-generated or an actual opinion. We find that machine-learning models are very good at that task, yielding well above 90% accuracy in different settings.
While the hurdle to generating authentic long texts is still high, the level of proficiency that our two Transformer models achieve on short texts is already high enough to make direct identification difficult, even for specialized legal experts.
This research could be extended in a number of ways. More work could be done to make these text generators useful for lawyers or judges seeking assistance in drafting legal documents. For example, one could do conditioned generation based on the direction of the decision – affirm or reverse. Thus language models could be used to explore the legal arguments for or against a given position.
It should be mentioned that legal language models, like language models generally, are vulnerable to biases prevalent in the training data. For example, our judicial-opinion models could be biased in terms of the ideology and topics used by Circuit Court judges [13]. These biases will reappear in text generated by our models, even if that is not intended. Measuring and adjusting for these biases is an important area for future work.
References
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[2] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language Models are Few-Shot Learners," CoRR, vol. abs/2005.14165, 2020.
[3] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language Models are Unsupervised Multitask Learners," OpenAI Blog, 2019.
[4] W. Alschner and D. Skougarevskiy, "Can Robots Write Treaties? Using Recurrent Neural Networks to Draft International Investment Agreements," in JURIX, ser. Frontiers in Artificial Intelligence and Applications, vol. 294, 2016, pp. 119–124.
[5] W. Alschner and D. Skougarevskiy, "Towards an automated production of legal texts using recurrent neural networks," in Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law, ICAIL, 2017, pp. 229–232.
[6] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi, "Defending Against Neural Fake News," in Advances in Neural Information Processing Systems 32, 2019, pp. 9054–9065.
[7] D. Ippolito, D. Duckworth, C. Callison-Burch, and D. Eck, "Automatic Detection of Generated Text is Easiest when Humans are Fooled," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 1808–1822.
[8] S. E. Kreps, M. McCain, and M. Brundage, "All the News that's Fit to Fabricate: AI-Generated Text as a Tool of Media Misinformation," Available at SSRN 3525002, 2020.
[9] Z. Dai, Z. Yang, Y. Yang, J. G. Carbonell, Q. V. Le, and R. Salakhutdinov, "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context," CoRR, vol. abs/1901.02860, 2019.
[10] D. Simonson, D. Broderick, and J. Herr, "The Extent of Repetition in Contract Language," in Proceedings of the Natural Legal Language Processing Workshop, 2019, pp. 21–30.
[11] M. Honnibal and M. Johnson, "An Improved Non-monotonic Transition System for Dependency Parsing," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1373–1378.
[12] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A Robustly Optimized BERT Pretraining Approach," arXiv preprint arXiv:1907.11692, 2019.
[13] C. I. Hausladen, M. H. Schubert, and E. Ash, "Text classification of ideological direction in judicial opinions," International Review of Law and Economics, vol. 62, p. 105903, 2020.