ChatGPT's Diagnostic Accuracy in Ophthalmology
https://doi.org/10.1007/s00417-023-06363-z
MISCELLANEOUS
Received: 28 June 2023 / Revised: 4 December 2023 / Accepted: 23 December 2023 / Published online: 6 January 2024
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024
Abstract
Purpose The purpose of this study is to assess the diagnostic accuracy of ChatGPT in the field of ophthalmology.
Methods This is a retrospective cohort study conducted in one academic tertiary medical center. We reviewed data of patients admitted to the ophthalmology department from 06/2022 to 01/2023. We then created two clinical cases for each patient: the first according to the medical history alone (Hx), and the second with the addition of the clinical examination (Hx and Ex). For each case, we asked for the three most likely diagnoses from ChatGPT, residents, and attendings. We then compared the accuracy rates (at least one correct diagnosis) of all groups and evaluated the total duration each group needed to complete the assignment.
Results ChatGPT, residents, and attendings evaluated 126 cases from 63 patients (history only or history and exam findings for each patient). ChatGPT achieved a significantly lower accurate diagnosis rate (54%) in the Hx cases, as compared to the residents (75%; p < 0.01) and attendings (71%; p < 0.01). After adding the clinical examination findings, the diagnosis rate of ChatGPT was 68%, whereas for the residents and the attendings it increased to 94% (p < 0.01) and 86% (p < 0.01), respectively. ChatGPT was 4 to 5 times faster than the attendings and residents.
Conclusions and relevance ChatGPT showed lower diagnostic accuracy than residents and attendings in ophthalmology cases, whether based on patient history alone or with additional clinical examination findings. However, ChatGPT completed the task faster than the physicians.
Key messages
What is known:
Chatbots in medicine can use natural language processing (NLP) to analyze patient symptoms and provide a
diagnosis.
ChatGPT may be able to provide a preliminary diagnosis or offer guidance on what kind of specialist a patient
should see based on their symptoms.
Chatbots have the potential to improve patient access to medical information.
What is new:
In the field of ophthalmology, ChatGPT reached a correct diagnosis in approximately 50% of cases when presented with the patient history alone.
Currently, ChatGPT scores significantly lower in overall performance than both residents and attendings.
pupil dilation including the posterior segment. In selected cases, according to the physician's judgment, the patients were tested for color vision, confrontational visual fields, cover-uncover test, alternating cover test, etc.

Clinical scenarios

We created two clinical cases for each patient. The first case (History ("Hx")) included the patient's age and gender, past medical history, past surgical history, medications and allergies, past ocular history, chief complaint, and history of present illness. The second case (History and Examination ("Hx and Ex")) encompassed an additional description of a complete ophthalmology exam, including visual acuity, extraocular motility, pupillary responses, relative afferent pupillary defect, intraocular pressure, and a slit lamp examination with documentation of the lids/lashes/lacrimal system, conjunctiva/sclera, cornea, anterior chamber, iris, lens, vitreous, optic nerve, macula, vessels, and periphery. Both sections were created retrospectively according to the medical records. Patient information was presented in bulk for the history part, and the clinical examination findings were presented in the same manner.

For each case, we asked ChatGPT-3.5 (https://openai.com/blog/chatgpt; ChatGPT March 2023 version) the exact same prompt—"What are the three most likely diagnoses for the following case"—followed by the history part as free text in the Hx cases, or by both the history and the clinical findings in the Hx and Ex cases. The answers were documented. A new chat was started for every Hx case or Hx and Ex case. In the same manner, the cases were presented separately to three ophthalmology residents (residents group) and three well-experienced senior ophthalmologists (attendings group). Senior physicians were collectively described as "attendings." This group consisted of an anterior segment specialist, a posterior segment specialist, and a general ophthalmologist. All three have over 20 years' experience in clinical practice and are part of the regular attending staff of the department. Only physicians who were not involved in the diagnosis or treatment of any of the cases included in the study were allowed to participate. The study design specified that the accuracy of ambiguous diagnoses would be settled by consensus between two blinded ophthalmologist assessors. We decided on a conservative approach in which only the exact diagnosis would be acceptable, allowing only synonyms for the same condition. The specified diagnosis was required to accurately reflect the underlying condition and severity of the patient's condition and should guide appropriate treatment decisions.

In addition to the initial analyses involving both the history (Hx) and the combination of history and examination (Hx and Ex), an additional analysis was introduced. This supplementary analysis specifically focuses on relevant history (Relevant Hx) and relevant history and examination (Relevant Hx and Ex). In this analysis, we condensed each case to the information that would be pertinent when a resident is presenting the case to a supervising physician/attending. The rationale behind this is grounded in conventional clinical practice, where only relevant information is typically presented to a higher-ranking medical professional. We then asked ChatGPT-3.5 the same prompt ("What are the three most likely diagnoses for the following case"), followed by the relevant history or by the relevant history and clinical findings. The answers were documented.

The total amount of time for answering all cases was also recorded for each group (ChatGPT, residents, and attendings).
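For illustration only, the per-case prompting described above could also be scripted rather than typed into the web interface. The study itself used the ChatGPT web application, so the sketch below—which assumes the openai Python package (v1 client), the gpt-3.5-turbo API model, and a hypothetical list of case vignettes—is a reconstruction of the procedure, not the authors' actual workflow. Sending each case as an independent request mirrors the rule of starting a new chat for every case.

```python
# Illustrative sketch only: the study used the ChatGPT web interface, not the API.
# Assumes the `openai` Python package (v1 client); model name is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "What are the three most likely diagnoses for the following case"

# Each case vignette is plain free text (Hx alone, or Hx and Ex); hypothetical data.
cases = [
    "72-year-old male, hypertension, sudden painless visual loss in the right eye ...",
    # ... one entry per case
]

answers = []
for vignette in cases:
    # One independent request per case, so no earlier case can influence the answer
    # (the analogue of opening a new chat for every case).
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{vignette}"}],
    )
    answers.append(response.choices[0].message.content)
```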
Outcomes

The primary outcome was the correct diagnostic prediction rate of ChatGPT as compared with both the residents and the attendings. For that, we used the final diagnosis of each patient at discharge from our department as the correct diagnosis (gold standard). We calculated the rate of correct assumed diagnoses separately for the Hx cases and the Hx and Ex cases for ChatGPT. For the residents group and the attendings group, we counted a diagnosis as correct only if at least two physicians included it.

Secondary outcomes included a comparison of the total duration of time that ChatGPT, the residents, and the attendings used to complete the task.
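As a concrete reading of this scoring rule, the sketch below uses hypothetical data structures and plain exact-string matching in place of the human adjudication of synonyms described above: a case counts as correct for ChatGPT if any of its three suggestions matches the discharge diagnosis, and for a physician group only if at least two of the three physicians listed the matching diagnosis.

```python
# Hypothetical illustration of the scoring rule; in the study, synonyms and
# ambiguous wording were adjudicated by two blinded ophthalmologists, not by
# exact string comparison.
from typing import List

def chatgpt_case_correct(suggestions: List[str], gold: str) -> bool:
    """Correct if at least one of the three suggested diagnoses equals the
    discharge (gold-standard) diagnosis."""
    gold_norm = gold.strip().lower()
    return any(s.strip().lower() == gold_norm for s in suggestions)

def group_case_correct(per_physician: List[List[str]], gold: str) -> bool:
    """Residents or attendings group: the case is correct only if the gold
    diagnosis appears in the lists of at least two of the three physicians."""
    gold_norm = gold.strip().lower()
    votes = sum(
        any(s.strip().lower() == gold_norm for s in one_physician)
        for one_physician in per_physician
    )
    return votes >= 2

def accuracy(case_flags: List[bool]) -> float:
    """Accuracy rate = share of cases with at least one correct diagnosis."""
    return sum(case_flags) / len(case_flags)
```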
Ethical considerations

The study was conducted in accordance with the tenets of the Declaration of Helsinki, and approval was obtained from the institutional review board (ASF-0018–23). Informed consent was waived for this study by the institutional review board. To maintain anonymity and protect the privacy of our patients, only the two researchers (A.S. and M.C.) who collected the data from the electronic medical files were able to view patients' personal details. The rest of the researchers had access only to the necessary clinical data. Moreover, all cases presented to ChatGPT and the physicians were anonymous in nature, and no identifying information of any kind was used.

Statistical analysis

Statistical analysis was performed using IBM SPSS Statistics 25 (IBM Corp., Armonk, NY). Continuous variables were summarized with descriptive statistics such as mean, median, and standard deviation. Comparisons between the groups were conducted using the chi-square test and Fisher's exact test for binary variables. All statistical tests were two-tailed, and p < 0.05 was considered statistically significant.
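As an illustrative, non-authoritative example of this kind of comparison (the paper's results come from SPSS, not from the code below), the ChatGPT versus residents accuracy on the Hx and Ex cases (43/63 vs. 59/63 correct, taken from the Results) can be compared with a chi-square test or Fisher's exact test on a 2x2 table:

```python
# Sketch of a two-proportion comparison with scipy; the study used SPSS, so this
# only illustrates the stated statistical approach, not the authors' analysis.
from scipy.stats import chi2_contingency, fisher_exact

# Rows: ChatGPT, residents; columns: correct, incorrect (Hx and Ex cases, n = 63 each)
table = [[43, 63 - 43],
         [59, 63 - 59]]

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-square p = {p_chi2:.4f}, Fisher exact p = {p_fisher:.4f}")
```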
Results

A total of 126 cases (63 patients) were included in the analysis (Fig. 1). The mean age was 51.2 ± 17.7 years and 54% were male. Diagnoses spanned a wide range of disorders. The three most common diagnoses were retinal detachment, infectious keratitis, and optic neuritis (Table 1). The final and true diagnosis in most cases was from the retinal field (33.3%), followed by cornea (25%), neuro-ophthalmology (16%), uveitis (11%), glaucoma (8%), and oculoplastics (6%).

In the Hx cases, we found that ChatGPT achieved a statistically significantly lower diagnosis rate of 54% (34 cases), as compared to the residents with 75% (47 cases; p < 0.01) and the attendings with 71% (45 cases; p < 0.01; Table 2).
Furthermore, when adding the clinical examination, ChatGPT scored 68% (43 cases) correctly, whereas the residents and the attendings achieved significantly higher rates of 94% (59 cases; p < 0.01) and 86% (54 cases; p < 0.01; Table 2), respectively. While ChatGPT's accurate diagnosis rate improved only modestly between the Hx cases and the Hx and Ex cases, both the residents and the attendings significantly increased their diagnostic rates (p < 0.01). However, ChatGPT achieved a higher accuracy diagnosis rate in both Relevant Hx (57%) and Relevant Hx and Ex (76%; Supplementary Table 2).

The aggregate duration required for ChatGPT to produce diagnoses for all 126 cases was 42 min and 41 s, equivalent to an average of 20.19 s per case. This represents a speed improvement of 5.3 times when compared to the residents and 4.5 times when compared to the attendings (Table 2).
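As a quick back-of-envelope check (illustrative arithmetic only, using the averages and ratios reported above), the reported per-case time and speed ratios imply roughly how long the physician groups needed per case:

```python
# Illustrative arithmetic based only on the figures reported in the Results.
chatgpt_total_s = 42 * 60 + 41            # 42 min 41 s for all 126 cases
chatgpt_per_case = 20.19                  # reported average, seconds per case

residents_per_case = chatgpt_per_case * 5.3   # ~107 s (~1.8 min) per case
attendings_per_case = chatgpt_per_case * 4.5  # ~91 s (~1.5 min) per case

print(f"ChatGPT total: {chatgpt_total_s} s; "
      f"residents ~{residents_per_case:.0f} s/case; "
      f"attendings ~{attendings_per_case:.0f} s/case")
```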
Overall, ChatGPT was most successful in solving retinal cases (76–81%) and corneal cases (56–81%; Table 3). ChatGPT had relatively high diagnostic rates in common ophthalmology entities, such as retinal detachment (85–92%), infectious keratitis (75–100%), and optic neuritis (71–85%). Yet, the chatbot achieved a relatively low score in the majority of uncommon cases, such as Vogt-Koyanagi-Harada disease (0–33%), intermediate uveitis (0%), or non-arteritic anterior ischemic optic neuropathy (NAION) (0–50%; Supplementary Table 1).

Table 3 Diagnosis accuracy rates stratified by subspecialty: ChatGPT, residents, and attendings

                                  Patient history alone               Patient history and clinical examination
Subspecialty (n, % of cohort)     ChatGPT    Residents   Attendings   ChatGPT    Residents    Attendings
Retina (21, 33%)                  16 (76%)   17 (81%)    16 (76%)     17 (81%)   21 (100%)    21 (100%)
Cornea (16, 25%)                  9 (56%)    14 (88%)    15 (94%)     13 (81%)   16 (100%)    15 (94%)
Neuro-ophthalmology (10, 16%)     5 (50%)    8 (80%)     6 (60%)      8 (80%)    9 (90%)      9 (90%)
Uveitis (7, 11%)                  1 (14%)    3 (43%)     2 (29%)      1 (14%)    5 (71%)      4 (57%)
Glaucoma (5, 8%)                  1 (20%)    2 (40%)     3 (60%)      2 (40%)    3 (60%)      3 (60%)
Oculoplastics (4, 6%)             2 (50%)    3 (75%)     3 (75%)      2 (50%)    4 (100%)     3 (75%)

Discussion

Artificial intelligence (AI) is expected to greatly impact healthcare by improving patient care, reducing costs, and increasing the availability of medical treatment [1]. In ophthalmology, AI is already being implemented in several areas, currently mostly research-based, such as image analysis and interpretation. For example, AI algorithms have been used to analyze images of the retina to diagnose age-related macular degeneration [7] and diabetic retinopathy with very high accuracy rates [8].

However, the ability of AI to reach medical diagnoses based on history alone is an area of active development. In this study, we aimed to assess the capabilities of ChatGPT to diagnose ophthalmologic entities based on real-world data—history with or without clinical examination data of 63 patients admitted to our ophthalmology department (Supplemental Fig. 1). The results show that the rates of correct diagnosis for ChatGPT were 54–68%. We compared ChatGPT's diagnostic capabilities to physicians in training and to senior ophthalmologists. The residents and the attendings achieved significantly higher diagnostic rates in all of the assessments.

In general medicine, studies have reinforced the idea that a high percentage of patient diagnoses (76–82.5%) can be accurately made using only the medical history [20, 21]. Ophthalmology, however, is seen as a field that heavily relies on pattern recognition and visual exam input. Yet, the systematic collection of a comprehensive patient medical history still plays an important role in the diagnostic process in ophthalmology as well. For example, Wang et al. demonstrated an 88% correct diagnostic rate in neuro-ophthalmology cases by an attending ophthalmologist based on the chief complaint combined with the patient's history [22]. As such, ChatGPT showed inferior accuracy rates in our study. Yet, the diagnostic rates achieved from the patient's history alone in that study [22] are also slightly higher than those achieved by the physicians in our study. This can be explained by the different methodologies: in our study, the physicians did not take the patient history themselves, nor did they examine the patient. Rather, they relied on a vignette that was given to them, with no ability to acquire more information.

Overall, ChatGPT improved its diagnostic capabilities after exposure to the clinical examination. However, in several cases, ChatGPT diagnosed the Hx case correctly but then provided an incorrect diagnosis in the history and examination (Hx and Ex) scenario. This is a surprising finding, given that in some cases the clinical examination provided a clear diagnosis spelled out in words (such as in retinal detachment, hypotony, corneal perforation), and in others it added common classic clinical findings (such as in anterior uveitis, endophthalmitis). We noted that in some cases, the new input from the clinical exam confused the AI algorithm and thereby changed the correct diagnosis to a non-relevant, albeit true, diagnosis. For example, in a case of a classic retinal detachment, ChatGPT diagnosed it correctly based on history. However, after reviewing the clinical exam, which included pseudoexfoliation of the lens (PXF), cataract, and a high disc-to-cup ratio, it changed the diagnosis to PXF syndrome, cataract, and glaucoma. This highlights the inability of ChatGPT to differentiate clinical results when it encounters multiple findings.
In the secondary analysis (Supplementary Table 2), we found that ChatGPT achieved a slightly higher diagnosis rate (57%) in cases with only the relevant history compared to the cases with the full history (54%). In both cases, this is a lower rate compared to the residents and attendings. However, ChatGPT scored a higher diagnosis rate (76%) in the relevant history and clinical examination cases (Relevant Hx and Ex). This is a notable improvement in its capabilities compared to the cases in which it was presented with all of the data (Hx and Ex) and achieved a 68% accuracy rate. This is a very interesting finding. As we do not know how ChatGPT's algorithm works, we can only hypothesize the potential reasons for this result. One explanation can be the ability to focus on key information: in short cases containing only relevant information, chatbots can quickly identify and focus on key data points, whereas in detailed cases they may struggle to discern the crucial facts amidst the abundance of data. Another explanation is reduced noise and ambiguity: short cases often present a more concise and clearer picture of the patient's symptoms and medical history, and this reduction in noise and ambiguity makes it easier for the chatbot to process and interpret the information accurately.
Regardless, this finding underscores the importance of optimizing chatbot design to handle varying lengths and complexities of input data in the field of ocular diagnosis.

Regarding the speed of ChatGPT, we found that the AI required an average of approximately 20 s to diagnose a case. This time is impressive compared with humans: both the residents and the attendings needed 400–500% of that time to complete the task.

Another interesting finding was that the residents' overall diagnosis accuracy rate was slightly higher than that of the attendings. Several reasons could explain this performance difference. First, residents may be actively preparing for board exams or other assessments, which might allow them to more easily present possible differential diagnoses for a given case. Other studies have similarly shown that younger physicians tend to develop a larger differential diagnosis list [23]. Second, the attendings in this study are subspecialists in a given field. As the cases covered a range of subjects and fields, it is possible that even if an attending was more accurate than a resident within his or her own subspecialty, he or she scored slightly lower on average across general ophthalmology cases [24]. Finally, residents might approach cases with a more open and unbiased perspective. Ultimately, however, the differences between the two groups were non-significant and therefore might not genuinely represent a meaningful difference between the groups.

So far, several studies have evaluated the performance of ChatGPT, mostly using question banks or cases treated in an outpatient setting [18, 23, 24]. One noteworthy strength of this study is the focus on hospitalized cases. The heightened complexity, as well as the increased amount of documentation compared to some question bank cases or outpatient cases, might present more challenges to the model.

Our study has several limitations. First, the retrospective nature of our study is based on recent admissions. ChatGPT was trained on data generated up to 2021, so admissions made in previous years might have resulted in different outcomes; however, we estimate this effect to have little impact. Second, the residents and attendings groups comprised a relatively small sample size, and a larger sample might be better powered to detect smaller differences between groups. Third, all groups' diagnostic capabilities relied on a patient history and exam findings that were obtained and documented by one experienced resident. Thus, the physician diagnosing the cases uses information generated by another physician, which might not represent their common everyday practice and therefore may alter diagnostic rates.

Conclusion

To conclude, AI has the potential to improve medical diagnosis and can be used as a supportive tool to increase the speed, efficiency, and in some cases even the accuracy of diagnosis. However, for now, ChatGPT showed significantly inferior diagnostic capabilities in overall performance in the field of ophthalmology compared to residents in training and senior ophthalmologists. Thus, it cannot serve as a replacement for the human medical professional. Moreover, in the inpatient setting, residents and attendings should use caution when seeking a medical diagnosis using ChatGPT.

Abbreviations AI: Artificial intelligence; ChatGPT: Generative Pre-trained Transformer; Hx: History; Hx and Ex: History and exam; PXF: Pseudoexfoliation

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00417-023-06363-z.

Data availability The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Declarations

Ethical approval This study was approved by the institutional research committee of Shamir Medical Center (Reference number: ASF-0018–23) and was performed in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments. This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest The authors declare no competing interests.

Informed consent Informed consent was waived by the IRB.

Consent for publication Not applicable.

References

1. Briganti G, Le Moine O (2020) Artificial intelligence in medicine: today and tomorrow. Front Med (Lausanne) 7:1–6. https://doi.org/10.3389/fmed.2020.00027
2. Kulkarni S, Seneviratne N, Baig MS, Khan AHA (2020) Artificial intelligence in medicine: Where are we now? Acad Radiol 27:62–70. https://doi.org/10.1016/J.ACRA.2019.10.001
3. Tekkeşin Aİ (2019) Artificial intelligence in healthcare: past, present and future. Anatol J Cardiol 22:8–9. https://doi.org/10.14744/AnatolJCardiol.2019.28661
4. Benet D, Pellicer-Valero OJ (2022) Artificial intelligence: the unstoppable revolution in ophthalmology. Surv Ophthalmol 67:252–270. https://doi.org/10.1016/J.SURVOPHTHAL.2021.03.003
5. Ting DSW, Pasquale LR, Peng L et al (2019) Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol 103:167–175. https://doi.org/10.1136/BJOPHTHALMOL-2018-313173
6. Hogarty DT, Mackey DA, Hewitt AW (2019) Current state and future prospects of artificial intelligence in ophthalmology: a review. Clin Exp Ophthalmol 47:128–139. https://doi.org/10.1111/CEO.13381
7. Dong L, Yang Q, Zhang RH, Bin WW (2021) Artificial intelligence for the detection of age-related macular degeneration in color fundus photographs: a systematic review and meta-analysis. EClinicalMedicine 35:100875. https://doi.org/10.1016/J.ECLINM.2021.100875
8. Ting DSW, Cheung CYL, Lim G et al (2017) Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318:2211–2223. https://doi.org/10.1001/JAMA.2017.18152
9. Potapenko I, Boberg-Ans LC, Stormly Hansen M et al (2023) Artificial intelligence-based chatbot patient information on common retinal diseases using ChatGPT. Acta Ophthalmol. https://doi.org/10.1111/AOS.15661
10. Singh S, Djalilian A, Ali MJ (2023) ChatGPT and ophthalmology: exploring its potential with discharge summaries and operative notes. Semin Ophthalmol 38:503–507. https://doi.org/10.1080/08820538.2023.2209166
11. Ali MJ (2023) ChatGPT and lacrimal drainage disorders: performance and scope of improvement. Ophthalmic Plast Reconstr Surg 39:221. https://doi.org/10.1097/IOP.0000000000002418
12. Kumar Y, Koul A, Singla R, Ijaz MF (2022) Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J Ambient Intell Humaniz Comput 1:1. https://doi.org/10.1007/S12652-021-03612-Z
13. Richens JG, Lee CM, Johri S (2020) Improving the accuracy of medical diagnosis with causal machine learning. Nat Commun 11:1–9. https://doi.org/10.1038/s41467-020-17419-7
14. Tigga NP, Garg S (2020) Prediction of type 2 diabetes using machine learning classification methods. Procedia Comput Sci 167:706–716. https://doi.org/10.1016/J.PROCS.2020.03.336
15. ChatGPT: optimizing language models for dialogue. https://openai.com/blog/chatgpt/. Accessed 14 Jan 2023
16. Else H (2023) Abstracts written by ChatGPT fool scientists. Nature. https://doi.org/10.1038/D41586-023-00056-7
17. Castelvecchi D (2022) Are ChatGPT and AlphaCode going to replace programmers? Nature. https://doi.org/10.1038/D41586-022-04383-Z
18. Antaki F, Touma S, Milad D et al (2023) Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmology Science 3:1–7. https://doi.org/10.1016/j.xops.2023.100324
19. Balas M, Ing EB (2023) Conversational AI models for ophthalmic diagnosis: comparison of ChatGPT and the Isabel Pro Differential Diagnosis Generator. JFO Open Ophthalmology 1:100005. https://doi.org/10.1016/j.jfop.2023.100005
20. Hampton JR, Harrison MJG, Mitchell JRA et al (1975) Relative contributions of history-taking, physical examination, and laboratory investigation to diagnosis and management of medical outpatients. Br Med J 2:486–489. https://doi.org/10.1136/BMJ.2.5969.486
21. Peterson MC, Holbrook JH, Von Hales D et al (1992) Contributions of the history, physical examination, and laboratory investigation in making medical diagnoses. West J Med 156:163
22. Wang MY, Asanad S, Asanad K et al (2018) Value of medical history in ophthalmology: a study of diagnostic accuracy. J Curr Ophthalmol 30:359. https://doi.org/10.1016/J.JOCO.2018.09.001
23. St-Onge C, Landry M, Xhignesse M et al (2016) Age-related decline and diagnostic performance of more and less prevalent clinical cases. Adv Health Sci Educ 21:561–570. https://doi.org/10.1007/S10459-015-9651-8/METRICS
24. Caddick ZA, Fraundorf SH, Rottman BM, Nokes-Malach TJ (2023) Cognitive perspectives on maintaining physicians' medical expertise: II. Acquiring, maintaining, and updating cognitive skills. Cogn Res Princ Implic 8(1):47. https://doi.org/10.1186/s41235-023-00497-8

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Asaf Shemer1,2 · Michal Cohen1,3 · Aya Altarescu1,2 · Maya Atar-Vardi1,2 · Idan Hecht1,2 · Biana Dubinsky-Pertzov1,2 · Nadav Shoshany1,2 · Sigal Zmujack1,2 · Lior Or1,2 · Adi Einan-Lifshitz1,2 · Eran Pras1,2,4

* Asaf Shemer
  [email protected]

1 Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
2 Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
3 Faculty of Health Science, Ben-Gurion University of the Negev, South District, Beer-Sheva, Israel
4 The Matlow's Ophthalmo-Genetics Laboratory, Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel