Human Centered AI for Indian Legal Text Analytics
Sudipto Ghosh1 , Devanshu Verma1 , Balaji Ganesan2 ,
Purnima Bindal1 , Vikas Kumar1 and Vasudha Bhatnagar1
1 Department of Computer Science, University of Delhi
2 IBM Research
{sudipto.mcs22, dverma, pbindal, vikas, vbhatnagar}@cs.du.ac.in,
[email protected]
Abstract

Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. The recent boom in generative AI has not translated to a proportionate rise in impactful legal applications, because of low trustworthiness and the scarcity of specialized datasets for training Large Language Models (LLMs). This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.
Figure 1: Human computer interaction can bring down information asymmetry in the justice delivery system

1 Introduction

Shneiderman [2022] defines Human-Centered AI (HCAI) as a collection of successful technologies that amplify, augment, empower, and enhance human performance.

In the evolving field of human-centered computing, the focus is increasingly on harnessing computing technologies that are not just advanced but also intuitively aligned with human experiences and needs. Within the realm of open-source large language models (LLMs), this perspective becomes crucial as we explore ways to blend the computational might of AI systems with the nuanced understanding and contextual judgment that humans bring to the table.

One particular area that can benefit from HCAI is justice delivery in our court systems. In many countries, the legal system is overwhelmed by a backlog of cases, especially in the lower judiciary. While there are legislations like speedy justice acts, the legal processes are inherently time consuming. AI can help automate legal analytics tasks using Legal Text Analytics (LTA) and speed up justice delivery.

Zaharia et al. [2024] call for a shift from models to compound AI systems. We propose creating a human centered, compound AI system comprising large language models to perform various LTA tasks, which deliberately and principally elicits human input.

Figure 1 shows a compound AI system, with possible points of interaction between human actors including law professionals and common citizens. Additionally, law students and researchers, petitioners, law activists, etc., also interact with AI-driven LTA services and platforms. Legal experts may use the LTA services for improved efficiency in legal research and help speed up justice delivery. Citizens who are not well versed in legalese can use the services to understand legal documents, do basic research for drafting petitions, and submit better responses to the judicial system after taking help from LTA services.

The motivation for a human centered, compound AI system involving models and humans is our observation that the recent boom in Generative AI has not translated to a proportionate rise in impactful applications. The reasons for this are low trustworthiness, lack of focus on the common citizens, and general unavailability of resources, which deteriorates in domains that directly touch human lives.

In this position paper, we identify a few prominent legal text analytics tasks undertaken by legal researchers and self-represented litigants, and comment on the performance of existing models on these tasks. We introduce a new dataset and describe how large language models, eliciting human inputs at all levels, can better serve the needs of the people.
Our paper is organised as follows. In Section 2, we survey related work in Legal Text Analytics. In Section 3, we introduce a new dataset for the human centered AI system that we propose. In Section 4, we discuss existing problems and how human input can help improve AI systems, and in Section 5, we briefly discuss our future work, a large language model to be built incorporating HCAI principles.

2 Related Work

In this section, we present a non-exhaustive survey of recent works in tasks relevant for legal text analytics.

Legal Knowledge Graph
Automatic Knowledge Base Construction (AKBC) has been popularized since the Knowledge Base Population track Ji et al. [2010] organized by TAC. Domain specific knowledge graphs (Abu-Salih [2021]) remain an ongoing research area. Dhani et al. [2021] and Jain et al. [2022] discuss creating legal knowledge graphs using judgements and related documents from Indian courts. The role of human annotations in knowledge graph construction is also a well researched area. Chiticariu et al. [2010] proposed a system to extract domain specific entities and relationships from documents. Vannur et al. [2021] discussed fairness in personal knowledge base construction. We can characterize all these methods as rule-based or rule-assisted knowledge base construction.

Question Answering on Indian Judgements
Although question answering is a well-studied problem Pramanik et al. [2021], QA systems in the legal domain are not commonly available. Automatic question answering systems not only provide consultancy to litigants, who are typically charting unfamiliar grounds of the legal domain, but are equally beneficial to legal professionals. Ganesan et al. [2020] presented a question answering system that leverages domain specific knowledge graphs.

Judgment Summarization
Automatic summarization of judgements, and preparation of headnotes (highlighting the point-of-law), help law professionals locate the discussion of a legal issue in lengthy judgements. Works in different countries have addressed this, including SALOMON Uyttendaele et al. [1998] in Belgium, LetSum Farzindar and Lapalme [2004] in Canada, and CaseSummarizer Polsley et al. [2016] in Australia. Yizhen et al. [2021] summarize contents of Chinese civil judgments. Kanapala et al. [2019] presented a survey of works in legal text summarization. OpenNyAI Kalamkar et al. [2022] released annotated Indian court judgments and models for tasks such as automatic structuring of judgments using rhetorical roles and extractive summarization.

Legal Datasets
Guha et al. [2023] introduced LegalBench, a benchmark for measuring legal reasoning in large language models. The Indian Legal Document Corpus published by Malik et al. [2021] contains 35,000 Indian court judgments and gold standard explanations for the Court Judgment Prediction and Explanation task. We introduce a dataset annotated for tasks like question answering, summarization and petition drafting, which help self-represented litigants to access justice.

Knowledge Infusion
Chalkidis et al. [2020] introduced the LegalBERT model, which continues to be used for tasks on legal data, including our experiments in this work. Paul et al. [2022] have introduced InLegalBERT, which is trained on Indian legal documents. Infusing knowledge into large language models has been discussed in several works. Two survey papers, by Wei et al. [2021] and Yang et al. [2021], present different methods to infuse knowledge into large language models. Islam et al. [2021] consume a knowledge graph for the entity generation task.

Agarwal et al. [2020] created a method to translate knowledge graph triples into sentences for enhancing LLM pre-training. Moiseev et al. [2022] and Agarwal et al. [2023] then directly integrated these triples into LLMs and T5 models, respectively, showing two effective paths for knowledge integration: via natural language or directly from triples. Vasisht et al. [2023] took a different approach by using contextual text for embedding knowledge into models. dos Santos et al. [2022] developed Knowledge Prompts for frequent Wikidata entities, refined to aid in triple prediction. Diao et al. [2023] use adapters for efficient knowledge infusion into large language models.

Retrieval Augmented Generation
The text-to-SQL field has seen significant interest with the application of large language models (LLMs) for generating queries. Our focus is on querying case related information for retrieval augmented generation with LLMs. CRUSH4SQL by Kothyari et al. [2023] employs a retrieval-based method where an LLM generates a simplified schema for query refinement. DIN-SQL by Pourreza and Rafiei [2023] uses a series of prompts to translate natural language into SQL, proving effective on benchmarks like BIRD Li et al. [2023] and Spider Yu et al. [2018], surpassing even fine-tuned models.

Resources and Tools
Services such as Rocket Lawyer and LegalZoom provide users with the means to create legal documents including pleas, wills, and contracts. DoNotPay guides users through various legal processes, offering support from contesting parking tickets to drafting legal documents. Similarly, platforms like Avvo and LawGuru allow individuals to seek legal advice by posing questions and receiving answers from experienced lawyers, offering invaluable insights for document preparation. Legal aid societies and non-profit organizations like Pro Bono Net provide legal assistance to those with limited resources.
Many court and government websites offer interactive forms and templates to assist self-represented litigants in creating their legal documents with ease. Legal education websites like Nolo expand access to legal information through extensive guides, DIY resources, books, and articles, making legal processes and document preparation more accessible to non-lawyers. Highlighting the intersection of AI and legal aid, Barale [2022] proposes an innovative approach to designing ethical human-AI reasoning support systems for decision-makers in specialized legal areas, such as refugee law, underscoring the potential of AI to augment human capabilities in the legal domain. Joshi et al. [2016] propose a method for legal cognitive assistance.
3 Dataset

In this section, we describe resources that we have created for enabling legal text analytics. These resources also include human annotations and mechanisms to interact with experts and lawyers. We build on some existing resources from the literature.
Legal Knowledge Graph
A legal knowledge graph can help students familiarize themselves with legal terms and concepts. Such knowledge graphs can also be used to infuse knowledge into, or fine tune, large language models (LLMs) to fill gaps where they may not have sufficient domain specific knowledge.

We build on Dhani et al. [2021] and Jain et al. [2022] to create a legal knowledge graph by scraping the web for court cases, judgements, laws and other cases cited from the judgements. In particular, we use court repositories and other public sources in the Indian court system. We further annotate these documents using manually curated dictionaries as described in Vannur et al. [2021]. We process the original documents using Stanza (Qi et al. [2020]) and extract entities and relations using SystemT (Chiticariu et al. [2010]). We also use ground truth labels for citations and similarity from IndianKanoon Sinha [2008] and Casemine Yadav [2013].

We represent the knowledge graph in triples format comprising subject, predicate and object. Table 1 shows the details of the created knowledge graph. This knowledge graph, along with other annotations, will be made publicly available by us.
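As a minimal sketch of this extraction step, the snippet below runs Stanza NER over a judgment paragraph and pairs the detected entities into (subject, predicate, object) triples using a hand-curated relation dictionary. The `RELATION_LEXICON` and the naive entity-pairing heuristic are illustrative placeholders; in our system the actual relation extraction is done with SystemT rules.

```python
import itertools
import stanza

# Hypothetical hand-curated dictionary mapping entity-type pairs to relation
# labels; the real system uses SystemT extraction rules instead of this heuristic.
RELATION_LEXICON = {
    ("PERSON", "ORG"): "affiliated_with",
    ("ORG", "LAW"): "cites",
    ("PERSON", "LAW"): "charged_under",
}

nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")

def extract_triples(paragraph: str):
    """Return (subject, predicate, object) triples from one judgment paragraph."""
    doc = nlp(paragraph)
    entities = [(ent.text, ent.type) for ent in doc.ents]
    triples = []
    # Naive illustration: link every entity pair whose types appear in the lexicon.
    for (s_text, s_type), (o_text, o_type) in itertools.permutations(entities, 2):
        predicate = RELATION_LEXICON.get((s_type, o_type))
        if predicate:
            triples.append((s_text, predicate, o_text))
    return triples
```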
Table 1: Details of the legal knowledge graph

Documents    2,286
Sentences    895,398
Triples      801,604
Entities     329,179
Relations    43
Question Answering
We present a legal question answering dataset for law students, which has been automatically constructed using the gpt-3.5-turbo model by OpenAI. We download 45 judgments from the Delhi High Court and extract 1,740 paragraphs containing meaningful legal text; paragraphs consisting mostly of quotations or references from other judgments, names of petitioners, respondents, judges, organizations, etc. were discarded. The average paragraph length is 158.91 words. These paragraphs serve as the context for generating the QA dataset.

We use the LangChain (Chase [2022]) framework to create a QA generation pipeline. To guide the generation process, we employ a few-shot prompting technique; the prompt template is shown in Figure 2. We use a publicly available short question-answering dataset for the Indian legal system to serve as examples. The dataset comprises 150 question-answer pairs, with questions pertaining to the Indian Constitution, judiciary, legislature, and various socio-political issues in India. For each curated context, we select 10 question-answer pairs using the maximum marginal relevance criterion, such that the selected questions are most similar to the context at hand while maintaining diversity among them. The selected QA pairs and the context are placed into the prompt (Figure 2) and the model is queried.

Figure 2: Prompt template for question-answers generation

The pipeline generates the response (questions and corresponding answers) as JSON objects. We analyse the length distribution of generated questions and plot it in Figure 3.

Figure 3: Distribution of question lengths in our Question Answering dataset
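A minimal sketch of this generation pipeline is shown below, using the classic LangChain few-shot API. The `seed_examples` and `contexts` variables stand in for the 150 public QA pairs and the curated paragraphs, and the prompt wording is our paraphrase; the exact template and JSON schema follow Figure 2 in our implementation.

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector
from langchain.vectorstores import FAISS

seed_examples = [  # 150 public QA pairs in the real pipeline; two shown here
    {"question": "What is a writ petition?",
     "answer": "A formal application to a court seeking a writ remedy."},
    {"question": "Which article guarantees equality before law?",
     "answer": "Article 14 of the Constitution of India."},
]
contexts = ["..."]  # curated judgment paragraphs

example_prompt = PromptTemplate(
    input_variables=["question", "answer"],
    template="Q: {question}\nA: {answer}",
)

# MMR picks 10 examples similar to the context while keeping them diverse.
selector = MaxMarginalRelevanceExampleSelector.from_examples(
    seed_examples, OpenAIEmbeddings(), FAISS, k=10
)

prompt = FewShotPromptTemplate(
    example_selector=selector,
    example_prompt=example_prompt,
    prefix="You are preparing study material for law students.",
    suffix=(
        "Using the context below, generate question-answer pairs as a JSON "
        'list of {{"question": ..., "answer": ...}} objects.\n'
        "Context: {context}"
    ),
    input_variables=["context"],
)

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)
for context in contexts:
    qa_json = llm.predict(prompt.format(context=context))
```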
Text2SQL
For the text2sql dataset, which has applications in retrieval augmented generation for petition drafting among other tasks, we extend the question answering dataset described above. We use ideas from Kothyari et al. [2023] to make the models hallucinate a database schema required for a query, and put these together into a sufficiently large dataset.
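The sketch below illustrates the schema hallucination step under the same assumptions as the previous snippet; the instruction text is our paraphrase of the CRUSH4SQL idea, not the exact prompt from Kothyari et al. [2023].

```python
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0)

def hallucinate_schema(question: str) -> str:
    """Ask the LLM to invent a minimal relational schema that could answer the question."""
    prompt = (
        "Propose a minimal relational database schema (CREATE TABLE statements) "
        "that would be sufficient to answer the following legal question. "
        "Do not write the query itself.\n"
        f"Question: {question}"
    )
    return llm.predict(prompt)

schema = hallucinate_schema(
    "On what date was the judgment in the cited case delivered?"
)
```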
Figure 4: Tasks in Legal Text Analytics

4 Tasks

In this section, we present several tasks in Legal Text Analytics that benefit from human interaction, either with additional annotations or feedback. We also discuss tasks like petition drafting, which is inherently a human activity that draws from people's lives. Each of these tasks has been discussed in the literature, where supervised methods have been proposed earlier. We believe annotating samples for training supervised models is not necessarily scalable for building human-centered AI. Instead, we need to leverage Large Language Models and tailor them to the needs of Legal Text Analytics. With this motivation, we have looked at several tasks, the supervised methods currently available for them, and how LLMs can compete with and outperform these supervised methods with human intervention. Figure 4 shows the legal tasks at different levels, from low-level NLP tasks to downstream tasks.
4.1 Case Similarity

Dhani et al. [2021] present a case similarity solution using Graph Neural Networks (GNNs) that can help law practitioners find similar cases, which could lead to early settlements, better case documents, and faster judgements. Following Vannur et al. [2021], the authors construct a legal knowledge graph using human curated dictionaries to extract entities and relations from documents. The judgements and acts used in that work are from IndianKanoon Sinha [2008], a search engine API, and Casemine Yadav [2013], a resource that provides similar judgements among other features. The authors report manually annotating each judgement as containing a law point or otherwise. They use these annotations as 27 features with one-hot encoding.

Like Dhani et al. [2021], we extract entities and relations from legal documents using methods described in Chiticariu et al. [2010], Ganesan et al. [2020] and Qi et al. [2020]. This case similarity dataset has 2,286 legal documents with citations from IndianKanoon Sinha [2008] and similar cases as recommended by Casemine (Yadav [2013]).

Compared to the vanilla RGCN Schlichtkrull et al. [2018] baseline, they report that the law points identified by legal experts, when used as handcrafted features in addition to the existing features, provide better results. They also compare their method with an RGCN version where the node features are encoded with LegalBERT Chalkidis et al. [2020].

We conjecture that this task can be performed with Large Language Models, by presenting similar and dissimilar cases and asking the model to predict whether two judgements are similar. We use the LLaMA-2 70B Chat model and do few-shot prompting, taking a few examples comprising pairs of document excerpts. We take 1,313 similar pairs from the dataset used by the authors in Dhani et al. [2021] and an equal number of random pairs. We then ask the LLM to state, in a one word response, whether a pair is similar. We prompt the language model with these 2,626 target pairs of document excerpts taken from the 958 unique judgments in the dataset.
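A minimal sketch of this evaluation loop is shown below, assuming `few_shot_block` holds the worked examples and `query_llama2_70b_chat` wraps whatever inference endpoint serves the model (both are placeholders):

```python
from sklearn.metrics import roc_auc_score

def build_prompt(few_shot_block: str, excerpt_a: str, excerpt_b: str) -> str:
    """Few-shot yes/no prompt for pairwise case similarity."""
    return (
        f"{few_shot_block}\n"
        "Are the following two judgment excerpts about similar cases? "
        "Answer with one word, Yes or No.\n"
        f"Excerpt 1: {excerpt_a}\nExcerpt 2: {excerpt_b}\nAnswer:"
    )

def predict_similarity(pairs, few_shot_block, query_llama2_70b_chat):
    """Map the model's one-word answer to a binary score per pair."""
    scores = []
    for excerpt_a, excerpt_b in pairs:
        answer = query_llama2_70b_chat(
            build_prompt(few_shot_block, excerpt_a, excerpt_b)
        )
        scores.append(1.0 if answer.strip().lower().startswith("yes") else 0.0)
    return scores

# labels: 1 for the 1,313 similar pairs, 0 for the random pairs.
# auc = roc_auc_score(labels, predict_similarity(pairs, few_shot_block, query_fn))
```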
Model                          Case Similarity
RGCN baseline                  0.513
RGCN + handcrafted features    0.556
RGCN + LegalBERT               0.550
LLaMA-2-70B-Chat               0.566

Table 2: Performance of different models using ROC-AUC scores

The case similarity output from the model also contains the reasoning behind the Yes-No response, with 37.62 tokens on average. The performance of the LLaMA-2 chat model (Touvron et al. [2023]) is compared with the supervised RGCN model on the Case Similarity task in Table 2. We report ROC-AUC scores, as in Dhani et al. [2021]. Their model with handcrafted features performed better than the vanilla version, with 0.556 ROC-AUC for case similarity. Encoding the features with LegalBERT did not seem to improve performance. LLaMA-2-70B-Chat yields a ROC-AUC score of 0.56626 on the same task.
4.2 Judgment Summarization

Judgement summarization is an important task in legal research. Since judgments typically run into tens of pages, stakeholders including law professionals, activists and the common public require the gist of the document to make sense of it. Generic and abstractive summarization of judgments is typically useful for common citizens and law students, who are not familiar with legal jargon and prefer summaries in a language that is understandable. Contrastingly, law professionals and legal researchers require aspect-based summaries that meet their personal information needs. Law professionals usually prefer inclusion of legal terms in the summary, thereby necessitating extractive summaries covering the research dimension that is important to them at that point of time. Thus both approaches to judgement summarization, abstractive and extractive, are essential and require considerable attention for adoption of AI-driven LTA services.

Recent advancements in LTA have led to the introduction of several state-of-the-art extractive judgment summarizers. Bhattacharya et al. [2021] propose DELSumm, an extractive summarizer that systematically infuses domain expertise, elucidating the essential information that should ideally be present in the summary of a judgment. Bindal et al. [2023] summarize landmark judgments based on the references in citing judgments. Recently, Dal Pont et al. [2023] used the state-of-the-art LLM GPT4 with prompt engineering to generate extractive summaries and report human evaluation results. Extractive summarizers are comprehensible to seasoned professionals, as legal judgments feature long sentences and intricate legal terminology.

Abstractive summarization of legal judgements has been practiced using several approaches. Legal-LED and Legal-Pegasus are fine-tuned versions of the pre-trained language models LED and Pegasus respectively. Both models are fine-tuned on publicly available legal data from the American judicial system. Shukla et al. [2022] and Bindal et al. [2023] report evaluation results for summarization of the same set of Indian legal judgements using semantics-based metrics and ROUGE scores. Dal Pont et al. [2023] report human evaluation of abstractive summaries for Italian legal documents using GPT3 and GPT4. Feijo and Moreira [2023] generate abstractive summaries by chunking the source text, training summarization models to generate independent versions of summaries, and applying an entailment module to mitigate hallucination. The method is evaluated on Brazilian Supreme Court Rulings using the ROUGE metric.

We delve deep into the results reported in Shukla et al. [2022] and Bindal et al. [2023] for Legal-LED and Legal-Pegasus on the IN-Ext dataset published by the former. The ROUGE scores and the two semantics-based metrics are reasonably good, and an end-user may not suspect any risk in believing the summaries. However, an expert analysis of a summary with the highest semantic similarity shows a number of problems in the summary.
The cutting-edge models for abstractive judgment summarization, which might have seen some text as part of their training data, tend to generate it verbatim in the summary, overlooking the context. The resulting inaccuracies, and alterations in proper nouns, locations and numbers, lead to significant deviations from the original content, which is unacceptable in judgment summaries. Dal Pont et al. [2023] explicitly caution that abstractive summarization may pose the risk of misleading readers by generating content absent in the original document. Even a subtle shift in the position of a single word can alter the meaning, as exemplified by the stark difference between "accepted an appeal that had been denied" and "denied an appeal that had been accepted" [Feijo and Moreira, 2023]. The superiority of extractive legal summaries over abstractive ones is well established in some recent studies [Shukla et al., 2022; Feijo and Moreira, 2023; Dal Pont et al., 2023; Bindal et al., 2023].

Despite its current caveats, curbing research and development of abstractive techniques for judgment summarization would be myopic. We argue that the abstractive approach has strong potential for making long judgments intelligible to a lay person. Ergo, development of reliable and trustworthy summarization methods that unscramble complex legal language is fundamental for democratizing legal knowledge and easing access to justice.

Judiciously integrating knowledge graphs, legal dictionaries, ontologies and other external knowledge sources with LLMs can not only alleviate the introduction of foreign entities and facts in the summary, but also unravel long and complex legal concepts in judgments. Recognizing the prowess of SOTA LLMs for generating confident, yet simple, language, we press for blending sanitation strategies for fact-hygiene with paraphrasing of legal concepts to generate lay-summaries of judgments. Human-centred abstractive summarization of legal judgments is vital for societal good and improved legal awareness in the public.

4.3 Petition Drafting

In petition drafting, the LLM should ask questions that the petitioner can answer; this additional information should strengthen the petition.

Petition drafting is a task that is inherently human centered, especially in the context of the Indian court system. Indian courts have the concept of Public Interest Litigations (PILs), using which any citizen can approach a court of law to seek relief on issues concerning the people. There are, of course, a much larger number of people approaching the courts seeking redressal of their grievances.

Enabling people or their lawyers to write well-written petitions can go a long way in getting them access to justice. Given the backlog and the volume of petitions disposed by courts in India and in many other countries, poorly written petitions can add significant cost to both individuals and society as a whole. Among other things, poorly written petitions could be those that leave out important pieces of information, are addressed to the wrong courts or authorities, or risk being dismissed as frivolous when in fact they are not. Figure 5 illustrates the format of a writ petition to be filed in the Supreme Court of India (SCI).

Figure 5: Format of a Writ petition to be filed in the Supreme Court of India. An LLM based solution for assisting self-representing litigants should elicit information to draft such a petition.
We propose using LLMs to identify missing information in a petition. This is qualitatively a much harder task than writing a petition, which focuses on writing style and presentation. Our task involves making LLMs identify missing information that should typically be present in the petition. This can be designed as a conversational question answering task. It is closely related to factuality related work in LLMs, since we do not want the model to ask trivial questions. The model needs to be able to identify salient information in a petition and prompt the user to furnish any missing information.

For example, in a petition about a missing person, which is a very sensitive but important judicial function, the petition is expected to provide the time when the person was last seen by a member of the public or a CCTV camera. While we expect this to be a multi-turn conversation similar to Trivedi et al. [2023], we currently focus on putting together a question answering dataset and evaluating our LLaMA-2 model on the question answering task.
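A minimal sketch of the single-turn version of this task is below; the checklist items and the `query_llm` wrapper are illustrative assumptions, not the schema of our dataset.

```python
# Illustrative checklist of fields a writ petition is expected to contain.
EXPECTED_FIELDS = [
    "name and address of the petitioner",
    "respondent authority being petitioned",
    "date and place of the events complained of",
    "relief sought from the court",
]

def draft_review_prompt(petition_text: str) -> str:
    """Ask the model to find salient gaps and pose non-trivial questions to the petitioner."""
    checklist = "\n".join(f"- {field}" for field in EXPECTED_FIELDS)
    return (
        "You are assisting a self-represented litigant. Read the draft petition "
        "below and list the important pieces of information that are missing, "
        "phrased as questions the petitioner can answer. Do not ask about "
        "anything already stated.\n"
        f"Expected information:\n{checklist}\n\nDraft petition:\n{petition_text}"
    )

# questions = query_llm(draft_review_prompt(draft))  # query_llm is a placeholder
```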
4.4 Question Answering

Question Answering is an important application to impart legal knowledge to students and answer questions of self-represented litigants in the court system. Considering that large language models are often evaluated on the question answering task, they can be used in this legal context too.

Vasisht et al. [2023] compare the performance of an InLegalBERT model trained on Indian judgments, and additionally fine-tuned on the corpus, with LegalBERT, which hasn't seen the corpus. We report their numbers for reference and compare the performance of a LLaMA-2 model on the task, with and without human inputs.

Model                      Hits@1 ↑  Hits@5 ↑  Hits@10 ↑
Legal-BERT-base-uncased    0.000     0.000     0.005
InLegalBERT-ft-corpus      0.005     0.005     0.015
InLegalBERT-ft-triples     0.225     0.350     0.395
LLaMA-2-34B-Instruct       0.520     0.556     0.617

Table 3: Comparison of model performance on relation/tail prediction, on a subset of the Legal KG triples
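Hits@k here is the fraction of test triples for which the gold relation or tail appears among the model's top-k candidates; a minimal sketch of the metric, assuming `ranked_candidates` lists each model's predictions in order:

```python
def hits_at_k(gold_items, ranked_candidates, k: int) -> float:
    """Fraction of queries whose gold item is among the top-k ranked predictions."""
    hits = sum(
        1 for gold, ranked in zip(gold_items, ranked_candidates)
        if gold in ranked[:k]
    )
    return hits / len(gold_items)

# Example: two tail-prediction queries with invented gold answers.
gold = ["Delhi High Court", "Article 21"]
preds = [["Supreme Court", "Delhi High Court"], ["Article 14", "Article 19"]]
print(hits_at_k(gold, preds, k=1))  # 0.0
print(hits_at_k(gold, preds, k=2))  # 0.5
```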
The dataset consists of 4,129 question answer pairs from our Indian court judgements dataset. This dataset can be used for different question answering tasks, including closed-book QA tasks, where a model is expected to answer the question without any further context or external knowledge. We believe models created to assist in petition drafting should be able to do well in this task. We expect such models to be pre-trained with legal documents or triples, or fine-tuned as appropriate. In-context learning, where the external knowledge is provided as triples or as text Vasisht et al. [2023], can also be considered.

Figure 6: Example Question and Answer in our Indian Court Judgements dataset.

However, we currently do not have models infused with legal judgements that can perform this closed-book question answering. We expect most LLMs to perform well on the reading comprehension task, but we do not believe that capability is particularly useful for the tasks in Legal Text Analytics. In petition drafting, we need the model to analyze the petition and ask questions to fill missing information, like the date on which a particular judgement was delivered.

We believe providing knowledge graph triples, or results from a SQL query, to be more promising approaches. Our question answering dataset described in Section B of the Supplementary Material gives examples of question answer pairs along with related knowledge graph triples that have been extracted from a judgement. We expect the user to be able to upload a judgement or point to a URL, after which we can generate these triples to be provided as context.
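A minimal sketch of this triples-as-context prompting, where the example triples and the `query_llm` wrapper are invented for illustration:

```python
def qa_with_triples(question: str, triples: list[tuple[str, str, str]]) -> str:
    """Build a prompt that grounds the answer in KG triples extracted from a judgement."""
    context = "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)
    return (
        "Answer the question using only the knowledge graph triples below.\n"
        f"Triples:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Invented triples; in our system they come from an uploaded judgement.
triples = [
    ("State v. Sharma", "decided_on", "2019-04-12"),
    ("State v. Sharma", "decided_by", "Delhi High Court"),
]
prompt = qa_with_triples("When was State v. Sharma decided?", triples)
# answer = query_llm(prompt)  # placeholder for the model call
```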
4.5 Text2SQL

In the context of legal text analytics, we consider Text2SQL to be predominantly useful for petition drafting and legal research, though other analytics use-cases exist. Generating relevant information about past cases can be helpful to cite judgements, as well as to establish facts in a written petition. For example, the date on which a particular judgement was delivered is not part of citation prediction datasets, whereas it is required while writing a petition. We propose a retrieval augmented generation (RAG) solution to this problem, where relevant facts of a case can be queried using a Text2SQL model.

However, the resources for enabling such a feature currently do not exist in the literature. We've created a relational database schema consisting of tables and the relationships between them, as provided in Section C of the supplementary material. Given the question answering dataset described in Section 4.4 and the database schema we have created, we generate text to SQL examples as shown in Figure 7.

Figure 7: Example SQL query to fill details of a citation in a Writ Petition
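A minimal sketch of the RAG loop follows, with an invented schema fragment and a hypothetical `text2sql` model wrapper; the actual schema is in Section C of the supplementary material.

```python
import sqlite3

# Invented fragment of a case database; the real schema is in the supplementary material.
SCHEMA = """
CREATE TABLE judgement (case_id TEXT PRIMARY KEY, title TEXT,
                        court TEXT, decided_on DATE);
"""

def fetch_citation_details(question: str, db_path: str, text2sql) -> list:
    """Translate a drafting question to SQL and run it against the case database."""
    sql = text2sql(question=question, schema=SCHEMA)  # hypothetical model wrapper
    # e.g. "SELECT decided_on FROM judgement WHERE title = 'State v. Sharma';"
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()
```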
5 InLegalLLaMA

In Section 4, we described the tasks in Legal Text Analytics that are unique to the legal domain in countries like India. Based on a survey of the literature, and our observations on the performance of LLMs, we propose a large language model infused with legal knowledge, together with a composite system that elicits knowledge from human users, as a better solution for the problems in this space.

Recent work like LegalBERT (Chalkidis et al. [2020]), CaseLawBERT (Zheng et al. [2021]) and JuriBERT (Douka et al. [2021]) shows the sustained interest of researchers in using language models for downstream legal tasks. However, these models are typically trained on European legal documents, which are structured by nature, and do not perform well in the Indian context, where courts do not follow standardized structures when publishing legal documents. Under these circumstances, existing PLMs do not work well out of the box and need additional training on local corpora. Works such as InLegalBERT and InCaseLawBERT (Paul et al. [2022]) involve training the base models on Indian legal documents and achieve reasonable performance on certain tasks like legal statute identification, semantic segmentation and court judgment prediction. In Table 2, we observe that open-domain LLMs have comparable performance to RGCN models. Much work still needs to be done to make LLMs useful in LTA tasks that need human expertise.

We plan to pre-train a LLaMA-2 foundation model (Touvron et al. [2023]) on Indian legal domain corpora and instruction-tune it for a selected set of tasks in the legal domain, using the concept-enhanced pre-training objective of entity-concept prediction proposed in Wang et al. [2024], which might help mitigate hallucination in domain tasks, where errors could have societal consequences. We aim to fine-tune the model using parameter-efficient fine-tuning methods on feasible domain tasks relevant in the Indian context, and compare its performance on multiple datasets in tandem with other state-of-the-art models, with and without fine-tuning on domain corpora.
Vasisht et al. [2023] propose infusing knowledge into LLMs as a general purpose way to improve model performance on documents. They use contextual text in lieu of knowledge graph triples. We propose to use a similar approach to add context to prompts when training the model on Indian legal documents. Using soft prompts and domain concepts in the training process, we will come up with a family of LLMs infused with knowledge about the Indian legal system.

6 Conclusion

Our observations of the existing works and applications in the domain of legal text analytics make it amply clear that we need to develop and deploy human-centered compound Legal AI systems like the one we have described in this position paper. By principally eliciting human input, AI systems can improve the performance of large language models (LLMs) and help our communities build impactful applications that enable common people to access justice in our court systems.
References

Bilal Abu-Salih. Domain-specific knowledge graphs: A survey. Journal of Network and Computer Applications, 185:103076, 2021.

Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training, 2020.

Ankush Agarwal, Sakharam Gawade, Sachin Channabasavarajendra, and Pushpak Bhattacharyya. There is no big brother or small brother: Knowledge infusion in language models for link prediction and question answering. arXiv preprint arXiv:2301.04013, 2023.

Claire Barale. Human-centered computing in legal NLP: An application to refugee status determination. In Proceedings of the Second Workshop on Bridging Human–Computer Interaction and Natural Language Processing, 2022.

Paheli Bhattacharya, Soham Poddar, Koustav Rudra, Kripabandhu Ghosh, and Saptarshi Ghosh. Incorporating domain knowledge for extractive summarization of legal case documents. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, 2021.

Purnima Bindal, Vikas Kumar, Vasudha Bhatnagar, Parikshet Sirohi, and Ashwini Siwal. Citation-based summarization of landmark judgments. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), 2023.

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. LEGAL-BERT: The muppets straight out of law school. In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2898–2904, Online, November 2020. Association for Computational Linguistics.

Harrison Chase. LangChain, October 2022.

Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, and Shivakumar Vaithyanathan. SystemT: An algebraic approach to declarative information extraction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 128–137, 2010.

Thiago Dal Pont, Federico Galli, Andrea Loreggia, Giuseppe Pisano, Riccardo Rovatti, and Giovanni Sartor. Legal summarisation through LLMs: The PRODIGIT project. arXiv e-prints, pages arXiv–2308, 2023.

Jaspreet Singh Dhani, Ruchika Bhatt, Balaji Ganesan, Parikshet Sirohi, and Vasudha Bhatnagar. Similar cases recommendation using legal knowledge graphs, 2021.

Shizhe Diao, Tianyang Xu, Ruijia Xu, Jiawei Wang, and Tong Zhang. Mixture-of-domain-adapters: Decoupling and injecting domain knowledge to pre-trained language models memories. arXiv preprint arXiv:2306.05406, 2023.

Cicero Nogueira dos Santos, Zhe Dong, Daniel Cer, John Nham, Siamak Shakeri, Jianmo Ni, and Yun-hsuan Sung. Knowledge prompts: Injecting world knowledge into language models through soft prompts, 2022.

Stella Douka, Hadi Abdine, Michalis Vazirgiannis, Rajaa El Hamdani, and David Restrepo Amariles. JuriBERT: A masked-language model adaptation for French legal text. In Proceedings of the Natural Legal Language Processing Workshop 2021, pages 95–101, 2021.

Atefeh Farzindar and Guy Lapalme. LetSum, a text summarization system in law field. In THE FACE OF TEXT conference (Computer Assisted Text Analysis in the Humanities), pages 27–36, 2004.

Diego de Vargas Feijo and Viviane P Moreira. Improving abstractive summarization of legal rulings through textual entailment. Artificial Intelligence and Law, 31(1), 2023.

Balaji Ganesan, Avirup Saha, Jaydeep Sen, Matheen Ahmed Pasha, Sumit Bhatia, and Arvind Agarwal. Anu question answering system. In ISWC 2020 Posters, Demos and Industry Tracks, volume 2721, pages 394–396, 2020.

Neel Guha, Julian Nyarko, Daniel Ho, Christopher Ré, Adam Chilton, Aditya K, et al. LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 44123–44279, 2023.

SK Mainul Islam, Abhinav Nagpal, Balaji Ganesan, and Pranay Kumar Lohia. Fair data generation using language models with hard constraints. CtrlGen Workshop, 2021.

Sarika Jain, Pooja Harde, Nandana Mihindukulasooriya, Sudipto Ghosh, Abhinav Dubey, and Ankush Bisht. Constructing a knowledge graph from Indian legal domain corpus. In TEXT2KG @ Extended Semantic Web Conference (ESWC 2022), CEUR Workshop Proceedings, volume 3184, pages 80–93, 2022.

Heng Ji, Ralph Grishman, Hoa Trang Dang, Kira Griffitt, and Joe Ellis. Overview of the TAC 2010 knowledge base population track. In Third Text Analysis Conference (TAC 2010), volume 3, 2010.

Karuna P Joshi, Aditi Gupta, Sudip Mittal, Claudia Pearce, Tim Finin, et al. ALDA: Cognitive assistant for legal document analytics, 2016.

Prathamesh Kalamkar, Aman Tiwari, Ashutosh Modi, et al. Corpus for automatic structuring of legal documents. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4420–4429, 2022.

Ambedkar Kanapala, Sukomal Pal, and Rajendra Pamula. Text summarization from legal documents: A survey. Artificial Intelligence Review, 51(3):371–402, 2019.

Mayank Kothyari, Dhruva Dhingra, Sunita Sarawagi, and Soumen Chakrabarti. CRUSH4SQL: Collective retrieval using schema hallucination for Text2SQL. arXiv preprint arXiv:2311.01173, 2023.

Jinyang Li, Binyuan Hui, Ge Qu, Binhua Li, Jiaxi Yang, Bowen Li, Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, et al. Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs. arXiv preprint arXiv:2305.03111, 2023.

Vijit Malik, Rishabh Sanjay, Shubham Kumar Nigam, Kripabandhu Ghosh, Shouvik Kumar Guha, Arnab Bhattacharya, and Ashutosh Modi. ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, pages 4046–4062, 2021.

Fedor Moiseev, Zhe Dong, Enrique Alfonseca, and Martin Jaggi. SKILL: Structured knowledge infusion for large language models. arXiv preprint arXiv:2205.08184, 2022.

Shounak Paul, Arpan Mandal, Pawan Goyal, and Saptarshi Ghosh. Pre-training transformers on Indian legal text. arXiv preprint arXiv:2209.06049, 2022.

Seth Polsley, Pooja Jhunjhunwala, and Ruihong Huang. CaseSummarizer: A system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pages 258–262, 2016.

Mohammadreza Pourreza and Davood Rafiei. DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction. arXiv preprint arXiv:2304.11015, 2023.

Soumajit Pramanik, Jesujoba Alabi, Rishiraj Saha Roy, and Gerhard Weikum. UNIQORN: Unified question answering over RDF knowledge graphs and natural language text. arXiv preprint arXiv:2108.08614, 2021.

Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D Manning. Stanza: A Python natural language processing toolkit for many human languages. In Association for Computational Linguistics (ACL) System Demonstrations, 2020.

Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, pages 593–607, 2018.

Ben Shneiderman. Human-Centered AI. Oxford University Press, 2022.

Abhay Shukla, Paheli Bhattacharya, Soham Poddar, Rajdeep Mukherjee, Kripabandhu Ghosh, Pawan Goyal, and Saptarshi Ghosh. Legal case document summarization: Extractive and abstractive methods and their evaluation. In The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022.

Sushant Sinha. IndianKanoon: Search engine for Indian law, 2008.

Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions, 2023.

Caroline Uyttendaele, Marie-Francine Moens, and Jos Dumortier. SALOMON: Automatic abstracting of legal cases for effective access to court decisions. Artificial Intelligence and Law, 6(1):59–79, 1998.

Lingraj S Vannur, Balaji Ganesan, Lokesh Nagalapatti, Hima Patel, and MN Tippeswamy. Data augmentation for fairness in personal knowledge base population. In Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2021 Workshops, volume 12705, pages 143–152, New Delhi, India, 2021.

Kinshuk Vasisht, Balaji Ganesan, Vikas Kumar, and Vasudha Bhatnagar. Infusing knowledge into large language models with contextual prompts, 2023.

Xintao Wang, Zhouhong Gu, Jiaqing Liang, Dakuan Lu, Yanghua Xiao, and Wei Wang. ConcEPT: Concept-enhanced pre-training for language models. arXiv preprint arXiv:2401.05669, 2024.

Xiaokai Wei, Shen Wang, Dejiao Zhang, Parminder Bhatia, and Andrew Arnold. Knowledge enhanced pretrained language models: A comprehensive survey. arXiv preprint arXiv:2110.08455, 2021.

Aniruddha Yadav. Casemine: A granular mapping of Indian case law, 2013.

Jian Yang, Gang Xiao, Yulong Shen, Wei Jiang, Xinyu Hu, Ying Zhang, and Jinghui Peng. A survey of knowledge enhanced pre-trained models. arXiv preprint arXiv:2110.00269, 2021.

Wang Yizhen, Ou Shiyan, and Chen Jinju. Automatic abstracting civil judgment documents with two-stage procedure. Data Analysis and Knowledge Discovery, 5(5):104–114, 2021.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun'ichi Tsujii, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, 2018.

Matei Zaharia, Omar Khattab, Lingjiao Chen, Jared Quincy Davis, Heather Miller, Chris Potts, James Zou, Michael Carbin, Jonathan Frankle, Naveen Rao, and Ali Ghodsi. The shift from models to compound AI systems. https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/, 2024.

Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, and Daniel E. Ho. When does pretraining help? Assessing self-supervised learning for law and the CaseHOLD dataset. In Proceedings of the 18th International Conference on Artificial Intelligence and Law. Association for Computing Machinery, 2021.