GPT-4.5 Passes the Turing Test: Study

A UC San Diego study found that human participants frequently misidentified responses generated by OpenAI’s GPT‑4.5 along with Meta’s Llama‑3.1‑405B as coming from a human.
Illustration by Nalini Nirad

The University of California, San Diego, unveiled a research study on Tuesday that claims to provide the “first empirical evidence that any artificial system can pass a standard three-party Turing test”.


Alan Turing, a British mathematician and computer scientist, introduced the ‘imitation game’ in 1950, proposing that if an interrogator couldn’t distinguish between a machine and a human in a text conversation, the machine might possess human-like intelligence. In a three-party Turing test, an interrogator converses with both a human and a machine and then tries to identify which of the two is the human.

The research tested three AI models: OpenAI’s GPT-4.5, Meta’s Llama 3.1 405B, and OpenAI’s GPT-4o. Human participants engaged in five-minute test conversations with one human and one AI system using a split-screen interface. After each round, the interrogator selected the participant they believed was human. 

The AI models were evaluated under two conditions: a minimal instruction (NO-PERSONA) prompt and an enhanced PERSONA prompt that guided the AI to adopt a specific human-like demeanor. 

The results indicated that GPT-4.5 with the PERSONA prompt achieved a win rate of 73%, meaning interrogators picked it as the human in 73% of rounds, well above the 50% expected from guessing. Llama 3.1 405B with the PERSONA prompt attained a win rate of around 56%, whereas GPT-4o under NO-PERSONA conditions reached a win rate of only 21%.
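To make the win-rate figure concrete, here is a minimal Python sketch of how such a rate could be computed and compared against the 50% chance level. The function names and the figure of 100 rounds per condition are illustrative assumptions, not the study's actual sample sizes or analysis code; the exact binomial test is one common choice and may differ from the authors' method.

```python
from math import comb


def win_rate(ai_judged_human: int, total_rounds: int) -> float:
    """Fraction of rounds in which the interrogator picked the AI as the human."""
    if total_rounds <= 0:
        raise ValueError("total_rounds must be positive")
    return ai_judged_human / total_rounds


def binomial_p_value_vs_chance(wins: int, n: int) -> float:
    """Two-sided exact binomial p-value against a 50% chance rate.

    Under p = 0.5 the distribution is symmetric, so we sum the probability
    of every outcome at least as far from n / 2 as the observed count.
    """
    distance = abs(wins - n / 2)
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance
    )
    return favourable / 2 ** n


if __name__ == "__main__":
    # HYPOTHETICAL: 100 rounds per condition, chosen only so the counts
    # match the reported percentages. The real sample sizes differ.
    rounds = 100
    conditions = [
        ("GPT-4.5 (PERSONA)", 73),
        ("Llama 3.1 405B (PERSONA)", 56),
        ("GPT-4o (NO-PERSONA)", 21),
    ]
    for label, wins in conditions:
        rate = win_rate(wins, rounds)
        p = binomial_p_value_vs_chance(wins, rounds)
        print(f"{label}: win rate {rate:.0%}, p-value vs chance {p:.4g}")
```

A win rate near 50% means interrogators could not tell the two apart, a rate well above 50% means the model was picked as human more often than the real person, and a rate well below 50% means it was usually spotted.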

Interrogators primarily engaged in small talk, asking about daily activities and personal details in 61% of interactions. They also probed social and emotional aspects such as opinions, emotions, humour, and experiences in 50% of interactions.

“If interrogators are not able to reliably distinguish between a human and a machine, then the machine is said to have passed [the Turing test]. By this logic, both GPT-4.5 and Llama-3.1-405B pass the Turing Test when they are given prompts to adopt a human-like persona,” read a section of the research study. 

The authors stated that these systems might seamlessly supplement or even replace human labour in economic roles that rely on brief conversational exchanges. 

“More broadly, these systems could become indiscriminable substitutes for other social interactions, from conversations with strangers online to those with friends, colleagues, and even romantic companions,” the authors added. 

OpenAI released GPT-4.5 in February, and the model was mostly appreciated for its thoughtful and emotional responses. Ethan Mollick, a professor at The Wharton School, said on X, “It can write beautifully, is very creative, and is occasionally oddly lazy on complex projects.” He even joked that the model took a “lot more” classes in the humanities.


Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.