News
Artificial intelligence
GPT-4 wins chatbot lawyer contest
AI chatbots have gone head to head on legal reasoning, but none can match real lawyers yet
Jeremy Hsu
LANDMARK chatbot GPT-4 has beaten other artificial intelligence models on a test of legal reasoning, but it still falls short of the skills required by lawyers. Some early attempts to use AI chatbots in court have been disastrous, and this latest finding adds evidence that AI isn’t ready to handle real-world legal tasks.
Artificial intelligence is increasingly being used by lawyers (WESTOCK PRODUCTIONS/SHUTTERSTOCK)
Neel Guha at Stanford University in California worked with AI researchers and lawyers to design LegalBench, which evaluates how well AI chatbots can deal with six different types of legal reasoning. LegalBench includes 162 practical tasks that human lawyers must handle in everyday practice, such as correctly analysing legal documents and detecting different types of legal language.
That makes LegalBench more relevant to how lawyers work than seeing if an AI can memorise the information needed to pass a bar exam that lawyers take to qualify, says Guha. GPT-4 has already shown it can outperform the average bar exam test taker with a score of 75 per cent.
Guha and his colleagues used LegalBench to evaluate 20 commercial and open-source large language models, including OpenAI’s GPT-4 and GPT-3.5. GPT-4 generally performed the best with scores in the 70s and 80s out of 100. However, it performed poorly on recalling the specifics of legal rules, scoring just 59. GPT-3.5, Anthropic’s Claude and Meta’s Llama models generally trailed behind (arXiv, [Link]/kshx).
“I think the expectation is that a human legal professional trained for the task should get close to perfect on the majority of LegalBench tasks,” says Guha. “Unfortunately, we haven’t had the chance to evaluate a human legal professional on these tasks.”
Providing an independent, comparative ranking of AIs is likely to be useful for law firms and legal teams, says Tom Roberts, a partner at law firm Allen & Overy. Since February, his company has been testing how well OpenAI’s generative AI technology, customised by the company Harvey, can research legal topics and draft emails. The greatest challenges in adopting AI have to do with the costs, risk management and information security for proprietary data, but also how such technologies handle more complex legal reasoning, says Roberts.
Given the “significant limitations” of AI, the law firm has been careful to “place clear limits on the technology’s use and always have a human in the loop”, says Roberts.
AI chatbots may still give inaccurate answers when asked to provide information about specific cases, laws or regulations, says Guha. In one real-life example, a US judge fined lawyers for submitting materials based on non-existent legal cases that were made up by ChatGPT.
The use of AI chatbots in the legal profession also raises questions about unauthorised practice of law, copyright issues, legal malpractice and who can access such legal help, says Guha. ❚
Health
Westerners sleep later on weekends than people in Asia
WESTERNERS are more likely to go to bed earlier and have longer lie-ins at the weekend than people in Asia, which could have health impacts.
Michael Chee at the National University of Singapore and his colleagues analysed data from 220,000 people in 35 countries using sleep trackers between January 2021 and January 2022. Most of the countries were either Western or Asian and there were, on average, 242 nights of data for each wearer.
Overall, across all countries, people slept longer on the weekend than during the week and woke up 30 to 60 minutes later. But those in Asia tended to have shorter lie-ins than the typical Westerner.
People in India had the shortest weekend lie-ins, at some 3 minutes longer than their weekday sleep, and those in Finland had the longest, with about 27 extra minutes (Sleep Medicine, [Link]/kswp).
Chee says that, in many Asian countries, weekend days aren’t culturally the same as weekend days in the West. “They’re not really a rest day,” he says.
The researchers also found that people in Asia typically went to sleep later and slept less on weekdays than people in the US and Europe. On average, people in Asia slept about 45 minutes per day less than people in other parts of the world. “Work times are roughly the same across the world, so if you go to sleep later, you’re going to get less sleep,” says Chee.
There is no evidence that people in Asia need less sleep than those in the West, says Chee; instead, the results reveal how culture affects sleep practices.
He says that although the sleep data was collected during the covid-19 pandemic, the trends are likely to be similar today.
As disturbed sleep affects quality of life and long-term health, it will be crucial to explore the effects of these differences in sleep patterns, says Victoria Pak at Emory University in Atlanta, Georgia. ❚
Jason Arunn Murugesu
16 | New Scientist | 16 September 2023