ChatGPT's Diagnostic Accuracy in Ophthalmology
https://doi.org/10.1007/s00417-023-06363-z
MISCELLANEOUS
Received: 28 June 2023 / Revised: 4 December 2023 / Accepted: 23 December 2023 / Published online: 6 January 2024
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024
Abstract
Purpose The purpose of this study is to assess the diagnostic accuracy of ChatGPT in the field of ophthalmology.
Methods This is a retrospective cohort study conducted in one academic tertiary medical center. We reviewed data of patients admitted to the ophthalmology department from 06/2022 to 01/2023. We then created two clinical cases for each patient: the first according to the medical history alone (Hx), and the second with the addition of the clinical examination (Hx and Ex). For each case, we asked for the three most likely diagnoses from ChatGPT, residents, and attendings. We then compared the accuracy rates (at least one correct diagnosis) of all groups and evaluated the total duration each group needed to complete the assignment.
Results ChatGPT, residents, and attendings evaluated 126 cases from 63 patients (history only or history and exam findings for each patient). ChatGPT achieved a significantly lower accurate diagnosis rate (54%) in the Hx cases, as compared to the residents (75%; p < 0.01) and attendings (71%; p < 0.01). After adding the clinical examination findings, the diagnosis rate of ChatGPT was 68%, whereas for the residents and the attendings it increased to 94% (p < 0.01) and 86% (p < 0.01), respectively. ChatGPT was 4 to 5 times faster than the attendings and residents.
Conclusions and relevance ChatGPT showed lower diagnostic accuracy than residents and attendings in ophthalmology cases, whether based on patient history alone or with additional clinical examination findings. However, ChatGPT completed the task faster than the physicians.
Key messages
What is known:
Chatbots in medicine can use natural language processing (NLP) to analyze patient symptoms and provide a
diagnosis.
ChatGPT may be able to provide a preliminary diagnosis or offer guidance on what kind of specialist a patient
should see based on their symptoms.
Chatbots have the potential to improve patient access to medical information.
What is new:
In the field of ophthalmology, ChatGPT reached a correct diagnosis in approximately 50% of cases when presented with the patient history alone.
Currently, ChatGPT scores significantly lower in overall performance than both residents and attendings.
pupil dilation including the posterior segment. In selected cases, according to the physician's judgment, the patients were tested for color vision, confrontational visual fields, cover-uncover test, alternating cover test, etc.

Clinical scenarios

We created two clinical cases for each patient. The first case (History ("Hx")) included the patient's age and gender, past medical history, past surgical history, medications and allergies, past ocular history, chief complaint, and history of present illness. The second case (History and Examination ("Hx and Ex")) encompassed an additional description of a complete ophthalmology exam, including visual acuity, extraocular motility, pupillary responses, relative afferent pupillary defect, intraocular pressure, and a slit lamp examination with documentation of the lids/lashes/lacrimal system, conjunctiva/sclera, cornea, anterior chamber, iris, lens, vitreous, optic nerve, macula, vessels, and periphery. Both sections were created retrospectively according to the medical records. Patient information was presented in bulk for the history part, and the clinical examination findings were presented in the same manner.

For each case, we asked ChatGPT-3.5 (https://openai.com/blog/chatgpt; ChatGPT March 2023 version) the exact same prompt—"What are the three most likely diagnoses for the following case"—followed by the history part as free text in the Hx cases, or by both the history and the clinical findings in the Hx and Ex cases. The answers were documented. A new chat was started for every Hx case or Hx and Ex case. In the same manner, the cases were presented separately to three ophthalmology residents (residents group) and three well-experienced senior ophthalmologists (attendings group). Senior physicians were collectively described as "attendings." This group consisted of an anterior segment specialist, a posterior segment specialist, and a general ophthalmologist. All three have over 20 years' experience in clinical practice and are part of the regular attending staff of the department. Only physicians who were not involved in the diagnosis or treatment of any of the cases included in the study were allowed to participate. The study design specified that the accuracy of ambiguous diagnoses would be settled by consensus between two blinded ophthalmologist assessors. We decided on a conservative approach in which only the exact diagnosis would be acceptable, allowing only synonyms for the same condition. The specified diagnosis was required to accurately reflect the underlying condition and severity of the patient's condition and should guide appropriate treatment decisions.

In addition to the initial analyses involving both the history (Hx) and the combination of history and examination (Hx and Ex), an additional analysis was introduced. This supplementary analysis specifically focuses on relevant history (Relevant Hx) and relevant history and examination (Relevant Hx and Ex). In this analysis, we condensed each case to the information that would be pertinent when a resident is presenting the case to a supervising physician/attending. The rationale behind this is grounded in conventional clinical practice, where only relevant information is typically presented to a higher-ranking medical professional. We then asked ChatGPT-3.5 the same prompt ("What are the three most likely diagnoses for the following case"), followed by the relevant history or by the relevant history and clinical findings. The answers were documented.

The total amount of time for answering all cases was also recorded for each group (ChatGPT, residents, and attendings).
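For illustration only, the per-case prompting described above could also be scripted rather than typed into the web interface. The study itself used the ChatGPT web application, so the sketch below—which assumes the openai Python package (v1 client), the gpt-3.5-turbo API model, and a hypothetical list of case vignettes—is a reconstruction of the procedure, not the authors' actual workflow. Sending each case as an independent request mirrors the rule of starting a new chat for every case.

```python
# Illustrative sketch only: the study used the ChatGPT web interface, not the API.
# Assumes the `openai` Python package (v1 client); model name is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "What are the three most likely diagnoses for the following case"

# Each case vignette is plain free text (Hx alone, or Hx and Ex); hypothetical data.
cases = [
    "72-year-old male, hypertension, sudden painless visual loss in the right eye ...",
    # ... one entry per case
]

answers = []
for vignette in cases:
    # One independent request per case, so no earlier case can influence the answer
    # (the analogue of opening a new chat for every case).
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{vignette}"}],
    )
    answers.append(response.choices[0].message.content)
```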
Outcomes

The primary outcome was the correct diagnostic prediction rate of ChatGPT as compared with both the residents and the attendings. For that, we used the final diagnosis of each patient at discharge from our department as the correct diagnosis (gold standard). We calculated the rate of correct assumed diagnoses separately for the Hx cases and the Hx and Ex cases for ChatGPT. For the residents group and the attendings group, we counted a diagnosis as correct only if at least two physicians included it.

Secondary outcomes included a comparison of the total duration of time that ChatGPT, the residents, and the attendings used to complete the task.
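As a concrete reading of this scoring rule, the sketch below uses hypothetical data structures and plain exact-string matching in place of the human adjudication of synonyms described above: a case counts as correct for ChatGPT if any of its three suggestions matches the discharge diagnosis, and for a physician group only if at least two of the three physicians listed the matching diagnosis.

```python
# Hypothetical illustration of the scoring rule; in the study, synonyms and
# ambiguous wording were adjudicated by two blinded ophthalmologists, not by
# exact string comparison.
from typing import List

def chatgpt_case_correct(suggestions: List[str], gold: str) -> bool:
    """Correct if at least one of the three suggested diagnoses equals the
    discharge (gold-standard) diagnosis."""
    gold_norm = gold.strip().lower()
    return any(s.strip().lower() == gold_norm for s in suggestions)

def group_case_correct(per_physician: List[List[str]], gold: str) -> bool:
    """Residents or attendings group: the case is correct only if the gold
    diagnosis appears in the lists of at least two of the three physicians."""
    gold_norm = gold.strip().lower()
    votes = sum(
        any(s.strip().lower() == gold_norm for s in one_physician)
        for one_physician in per_physician
    )
    return votes >= 2

def accuracy(case_flags: List[bool]) -> float:
    """Accuracy rate = share of cases with at least one correct diagnosis."""
    return sum(case_flags) / len(case_flags)
```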
Ethical considerations

The study was conducted in accordance with the tenets of the Declaration of Helsinki, and approval was obtained from the institutional review board (ASF-0018–23). Informed consent was waived for this study by the institutional review board. To maintain anonymity and protect the privacy of our patients, only the two researchers (A.S. and M.C.) who collected the data from the electronic medical files were able to view patients' personal details. The rest of the researchers had access only to the necessary clinical data. Moreover, all cases presented to ChatGPT and the physicians were anonymous in nature, and no identifying information of any kind was used.

Statistical analysis

Statistical analysis was performed using IBM SPSS Statistics 25 (IBM Corp., Armonk, NY). Continuous variables were summarized with descriptive statistics such as mean, median, and standard deviation. Comparisons between the groups were conducted using the chi-square test and Fisher's exact test for binary variables. All statistical tests were two-tailed, and p < 0.05 was considered statistically significant.
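As an illustrative, non-authoritative example of this kind of comparison (the paper's results come from SPSS, not from the code below), the ChatGPT versus residents accuracy on the Hx and Ex cases (43/63 vs. 59/63 correct, taken from the Results) can be compared with a chi-square test or Fisher's exact test on a 2x2 table:

```python
# Sketch of a two-proportion comparison with scipy; the study used SPSS, so this
# only illustrates the stated statistical approach, not the authors' analysis.
from scipy.stats import chi2_contingency, fisher_exact

# Rows: ChatGPT, residents; columns: correct, incorrect (Hx and Ex cases, n = 63 each)
table = [[43, 63 - 43],
         [59, 63 - 59]]

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-square p = {p_chi2:.4f}, Fisher exact p = {p_fisher:.4f}")
```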
Results

A total of 126 cases (63 patients) were included in the analysis (Fig. 1). The mean age was 51.2 ± 17.7 years and 54% were male. Diagnoses spanned a wide range of disorders. The three most common diagnoses were retinal detachment, infectious keratitis, and optic neuritis (Table 1). The final and true diagnosis in most cases was from the retinal field (33.3%), followed by cornea (25%), neuro-ophthalmology (16%), uveitis (11%), glaucoma (8%), and oculoplastics (6%).

In the Hx cases, we found that ChatGPT achieved a statistically significantly lower diagnosis rate of 54% (34 cases), as compared to the residents with 75% (47 cases; p < 0.01) and the attendings with 71% (45 cases; p < 0.01; Table 2).
Furthermore, when adding the clinical examination, ChatGPT scored 68% (43 cases) correctly, whereas the residents and the attendings achieved significantly higher rates of 94% (59 cases; p < 0.01) and 86% (54 cases; p < 0.01; Table 2), respectively. While ChatGPT's accurate diagnosis rate improved only modestly between the Hx cases and the Hx and Ex cases, both the residents and the attendings significantly increased their diagnostic rates (p < 0.01). However, ChatGPT achieved a higher accuracy diagnosis rate in both Relevant Hx (57%) and Relevant Hx and Ex (76%; Supplementary Table 2).

The aggregate duration required for ChatGPT to produce diagnoses for all 126 cases was 42 min and 41 s, equivalent to an average of 20.19 s per case. This represents a speed improvement of 5.3 times when compared to the residents and 4.5 times when compared to the attendings (Table 2).
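As a quick back-of-envelope check (illustrative arithmetic only, using the averages and ratios reported above), the reported per-case time and speed ratios imply roughly how long the physician groups needed per case:

```python
# Illustrative arithmetic based only on the figures reported in the Results.
chatgpt_total_s = 42 * 60 + 41            # 42 min 41 s for all 126 cases
chatgpt_per_case = 20.19                  # reported average, seconds per case

residents_per_case = chatgpt_per_case * 5.3   # ~107 s (~1.8 min) per case
attendings_per_case = chatgpt_per_case * 4.5  # ~91 s (~1.5 min) per case

print(f"ChatGPT total: {chatgpt_total_s} s; "
      f"residents ~{residents_per_case:.0f} s/case; "
      f"attendings ~{attendings_per_case:.0f} s/case")
```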
Overall, ChatGPT was most successful in solving retinal cases (76–81%) and corneal cases (56–81%; Table 3). ChatGPT had relatively high diagnostic rates in common ophthalmology entities, such as retinal detachment (85–92%), infectious keratitis (75–100%), and optic neuritis (71–85%). Yet, the chatbot achieved a relatively low score in the majority of uncommon cases, such as Vogt-Koyanagi-Harada disease (0–33%), intermediate uveitis (0%), or non-arteritic anterior ischemic optic neuropathy (NAION) (0–50%; Supplementary Table 1).

Table 3 Diagnosis accuracy rates stratified by subspecialty: ChatGPT, residents, and attendings

                                  Patient history alone               Patient history and clinical examination
Subspecialty (n, % of cohort)     ChatGPT    Residents   Attendings   ChatGPT    Residents    Attendings
Retina (21, 33%)                  16 (76%)   17 (81%)    16 (76%)     17 (81%)   21 (100%)    21 (100%)
Cornea (16, 25%)                  9 (56%)    14 (88%)    15 (94%)     13 (81%)   16 (100%)    15 (94%)
Neuro-ophthalmology (10, 16%)     5 (50%)    8 (80%)     6 (60%)      8 (80%)    9 (90%)      9 (90%)
Uveitis (7, 11%)                  1 (14%)    3 (43%)     2 (29%)      1 (14%)    5 (71%)      4 (57%)
Glaucoma (5, 8%)                  1 (20%)    2 (40%)     3 (60%)      2 (40%)    3 (60%)      3 (60%)
Oculoplastics (4, 6%)             2 (50%)    3 (75%)     3 (75%)      2 (50%)    4 (100%)     3 (75%)

Discussion

Artificial intelligence (AI) is expected to greatly impact healthcare by improving patient care, reducing costs, and increasing the availability of medical treatment [1]. In ophthalmology, AI is already being implemented in several areas, currently mostly research-based, such as image analysis and interpretation. For example, AI algorithms have been used to analyze images of the retina to diagnose age-related macular degeneration [7] and diabetic retinopathy with very high accuracy rates [8].

However, the ability of AI to reach medical diagnoses based on history alone is an area of active development. In this study, we aimed to assess the capabilities of ChatGPT to diagnose ophthalmologic entities based on real-world data—history with or without clinical examination data of 63 patients admitted to our ophthalmology department (Supplemental Fig. 1). The results show that the rates of correct diagnosis for ChatGPT were 54–68%. We compared ChatGPT's diagnostic capabilities to physicians in training and to senior ophthalmologists. The residents and the attendings achieved significantly higher diagnostic rates in all of the assessments.

In general medicine, studies have reinforced the idea that a high percentage of patient diagnoses (76–82.5%) can be accurately made using only the medical history [20, 21]. Ophthalmology, however, is seen as a field that heavily relies on pattern recognition and visual exam input. Yet, the systematic collection of a comprehensive patient medical history still plays an important role in the diagnostic process in ophthalmology as well. For example, Wang et al. demonstrated an 88% correct diagnostic rate in neuro-ophthalmology cases by an attending ophthalmologist based on the chief complaint combined with the patient's history [22]. As such, ChatGPT showed inferior accuracy rates in our study. Yet, the diagnostic rates achieved from the patient's history alone in that study [22] are also slightly higher than those achieved by the physicians in our study. This can be explained by the different methodologies: in our study, the physicians did not take the patient history themselves, nor did they examine the patient. Rather, they relied on a vignette that was given to them, with no ability to acquire more information.

Overall, ChatGPT improved its diagnostic capabilities after exposure to the clinical examination. However, in several cases, ChatGPT diagnosed the Hx case correctly but then provided an incorrect diagnosis in the history and examination (Hx and Ex) scenario. This is a surprising finding, given that in some cases the clinical examination provided a clear diagnosis spelled out in words (such as in retinal detachment, hypotony, corneal perforation), and in others it added common classic clinical findings (such as in anterior uveitis, endophthalmitis). We noted that in some cases, the new input from the clinical exam confused the AI algorithm and thereby changed the correct diagnosis to a non-relevant, albeit true, diagnosis. For example, in a case of a classic retinal detachment, ChatGPT diagnosed it correctly based on history. However, after reviewing the clinical exam, which included pseudoexfoliation of the lens (PXF), cataract, and a high disc-to-cup ratio, it changed the diagnosis to PXF syndrome, cataract, and glaucoma. This highlights the inability of ChatGPT to differentiate clinical results when it encounters multiple findings.
In the secondary analysis (Supplementary Table 2), we found that ChatGPT achieved a slightly higher diagnosis rate (57%) in cases with only the relevant history compared to the cases with the full history (54%). In both cases, this is a lower rate compared to the residents and attendings. However, ChatGPT scored a higher diagnosis rate (76%) in the relevant history and clinical examination cases (Relevant Hx and Ex). This is a notable improvement in its capabilities compared to the cases in which it was presented with all of the data (Hx and Ex) and achieved a 68% accuracy rate. This is a very interesting finding. As we do not know how ChatGPT's algorithm works, we can only hypothesize the potential reasons for this result. One explanation can be the ability to focus on key information: in short cases containing only relevant information, chatbots can quickly identify and focus on key data points, whereas in detailed cases they may struggle to discern the crucial facts amidst the abundance of data. Another explanation is reduced noise and ambiguity: short cases often present a more concise and clearer picture of the patient's symptoms and medical history, and this reduction in noise and ambiguity makes it easier for the chatbot to process and interpret the information accurately.
Regardless, this finding underscores the importance of optimizing chatbot design to handle varying lengths and complexities of input data in the field of ocular diagnosis.

Regarding the speed of ChatGPT, we found that the AI required an average of approximately 20 s to diagnose a case. This time is impressive compared with humans: both the residents and the attendings needed 400–500% of that time to complete the task.

Another interesting finding was that the residents' overall diagnosis accuracy rate was slightly higher than that of the attendings. Several reasons could explain this performance difference. First, residents may be actively preparing for board exams or other assessments, which might allow them to more easily present possible differential diagnoses for a given case. Other studies have similarly shown that younger physicians tend to develop a larger differential diagnosis list [23]. Second, the attendings in this study are subspecialists in a given field. As the cases covered a range of subjects and fields, it is possible that even if an attending was more accurate than a resident within his or her own subspecialty, he or she scored slightly lower on average across general ophthalmology cases [24]. Finally, residents might approach cases with a more open and unbiased perspective. Ultimately, however, the differences between the two groups were non-significant and therefore might not genuinely represent a meaningful difference between the groups.

So far, several studies have evaluated the performance of ChatGPT, mostly using question banks or cases treated in an outpatient setting [18, 23, 24]. One noteworthy strength of this study is the focus on hospitalized cases. The heightened complexity, as well as the increased amount of documentation compared to some question bank cases or outpatient cases, might present more challenges to the model.

Our study has several limitations. First, the retrospective nature of our study is based on recent admissions. ChatGPT was trained on data generated up to 2021, so admissions made in previous years might have resulted in different outcomes; however, we estimate this effect to have little impact. Second, the residents and attendings groups comprised a relatively small sample size, and a larger sample might be better powered to detect smaller differences between groups. Third, all groups' diagnostic capabilities relied on a patient history and exam findings that were obtained and documented by one experienced resident. Thus, the physician diagnosing the cases uses information generated by another physician, which might not represent their common everyday practice and therefore may alter diagnostic rates.

Conclusion

To conclude, AI has the potential to improve medical diagnosis and can be used as a supportive tool to increase the speed, efficiency, and in some cases even the accuracy of diagnosis. However, for now, ChatGPT showed significantly inferior diagnostic capabilities in overall performance in the field of ophthalmology compared to residents in training and senior ophthalmologists. Thus, it cannot serve as a replacement for the human medical professional. Moreover, in the inpatient setting, residents and attendings should use caution when seeking a medical diagnosis using ChatGPT.

Abbreviations AI: Artificial intelligence; ChatGPT: Generative Pre-trained Transformer; Hx: History; Hx and Ex: History and exam; PXF: Pseudoexfoliation

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00417-023-06363-z.

Data availability The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Declarations

Ethical approval This study was approved by the institutional research committee of Shamir Medical Center (Reference number: ASF-0018–23) and was performed in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments. This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest The authors declare no competing interests.

Informed consent Informed consent was waived by the IRB.

Consent for publication Not applicable.

References

1. Briganti G, Le Moine O (2020) Artificial intelligence in medicine: today and tomorrow. Front Med (Lausanne) 7:1–6. https://doi.org/10.3389/fmed.2020.00027
2. Kulkarni S, Seneviratne N, Baig MS, Khan AHA (2020) Artificial intelligence in medicine: Where are we now? Acad Radiol 27:62–70. https://doi.org/10.1016/J.ACRA.2019.10.001
3. Tekkeşin Aİ (2019) Artificial intelligence in healthcare: past, present and future. Anatol J Cardiol 22:8–9. https://doi.org/10.14744/AnatolJCardiol.2019.28661
4. Benet D, Pellicer-Valero OJ (2022) Artificial intelligence: the unstoppable revolution in ophthalmology. Surv Ophthalmol 67:252–270. https://doi.org/10.1016/J.SURVOPHTHAL.2021.03.003
5. Ting DSW, Pasquale LR, Peng L et al (2019) Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol 103:167–175. https://doi.org/10.1136/BJOPHTHALMOL-2018-313173
6. Hogarty DT, Mackey DA, Hewitt AW (2019) Current state and future prospects of artificial intelligence in ophthalmology: a review. Clin Exp Ophthalmol 47:128–139. https://doi.org/10.1111/CEO.13381
7. Dong L, Yang Q, Zhang RH, Bin WW (2021) Artificial intelligence for the detection of age-related macular degeneration in color fundus photographs: a systematic review and meta-analysis. EClinicalMedicine 35:100875. https://doi.org/10.1016/J.ECLINM.2021.100875
8. Ting DSW, Cheung CYL, Lim G et al (2017) Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318:2211–2223. https://doi.org/10.1001/JAMA.2017.18152
9. Potapenko I, Boberg-Ans LC, Stormly Hansen M et al (2023) Artificial intelligence-based chatbot patient information on common retinal diseases using ChatGPT. Acta Ophthalmol. https://doi.org/10.1111/AOS.15661
10. Singh S, Djalilian A, Ali MJ (2023) ChatGPT and ophthalmology: exploring its potential with discharge summaries and operative notes. Semin Ophthalmol 38:503–507. https://doi.org/10.1080/08820538.2023.2209166
11. Ali MJ (2023) ChatGPT and lacrimal drainage disorders: performance and scope of improvement. Ophthalmic Plast Reconstr Surg 39:221. https://doi.org/10.1097/IOP.0000000000002418
12. Kumar Y, Koul A, Singla R, Ijaz MF (2022) Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J Ambient Intell Humaniz Comput 1:1. https://doi.org/10.1007/S12652-021-03612-Z
13. Richens JG, Lee CM, Johri S (2020) Improving the accuracy of medical diagnosis with causal machine learning. Nat Commun 11:1–9. https://doi.org/10.1038/s41467-020-17419-7
14. Tigga NP, Garg S (2020) Prediction of type 2 diabetes using machine learning classification methods. Procedia Comput Sci 167:706–716. https://doi.org/10.1016/J.PROCS.2020.03.336
15. ChatGPT: optimizing language models for dialogue. https://openai.com/blog/chatgpt/. Accessed 14 Jan 2023
16. Else H (2023) Abstracts written by ChatGPT fool scientists. Nature. https://doi.org/10.1038/D41586-023-00056-7
17. Castelvecchi D (2022) Are ChatGPT and AlphaCode going to replace programmers? Nature. https://doi.org/10.1038/D41586-022-04383-Z
18. Antaki F, Touma S, Milad D et al (2023) Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmology Science 3:1–7. https://doi.org/10.1016/j.xops.2023.100324
19. Balas M, Ing EB (2023) Conversational AI models for ophthalmic diagnosis: comparison of ChatGPT and the Isabel Pro Differential Diagnosis Generator. JFO Open Ophthalmology 1:100005. https://doi.org/10.1016/j.jfop.2023.100005
20. Hampton JR, Harrison MJG, Mitchell JRA et al (1975) Relative contributions of history-taking, physical examination, and laboratory investigation to diagnosis and management of medical outpatients. Br Med J 2:486–489. https://doi.org/10.1136/BMJ.2.5969.486
21. Peterson MC, Holbrook JH, Von Hales D et al (1992) Contributions of the history, physical examination, and laboratory investigation in making medical diagnoses. West J Med 156:163
22. Wang MY, Asanad S, Asanad K et al (2018) Value of medical history in ophthalmology: a study of diagnostic accuracy. J Curr Ophthalmol 30:359. https://doi.org/10.1016/J.JOCO.2018.09.001
23. St-Onge C, Landry M, Xhignesse M et al (2016) Age-related decline and diagnostic performance of more and less prevalent clinical cases. Adv Health Sci Educ 21:561–570. https://doi.org/10.1007/S10459-015-9651-8/METRICS
24. Caddick ZA, Fraundorf SH, Rottman BM, Nokes-Malach TJ (2023) Cognitive perspectives on maintaining physicians' medical expertise: II. Acquiring, maintaining, and updating cognitive skills. Cogn Res Princ Implic 8(1):47. https://doi.org/10.1186/s41235-023-00497-8

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Asaf Shemer1,2 · Michal Cohen1,3 · Aya Altarescu1,2 · Maya Atar-Vardi1,2 · Idan Hecht1,2 · Biana Dubinsky-Pertzov1,2 · Nadav Shoshany1,2 · Sigal Zmujack1,2 · Lior Or1,2 · Adi Einan-Lifshitz1,2 · Eran Pras1,2,4

* Asaf Shemer
  [email protected]

1 Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
2 Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
3 Faculty of Health Science, Ben-Gurion University of the Negev, South District, Beer-Sheva, Israel
4 The Matlow's Ophthalmo-Genetics Laboratory, Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel