OBJECTIVES: To evaluate the performance of Claude 3.5 Sonnet, a novel multimodal large language model, in interpreting image-based ophthalmology case questions in different subspecialties and question formats.
METHODS: A total of 174 image-based ophthalmology questions from a comprehensive ophthalmology education platform were analyzed by Claude 3.5 Sonnet. Each question was presented in both multiple-choice and open-ended formats. Questions were categorized into six subspecialties: retina and uveitis; external eye and cornea; orbit and oculoplastics; neuro-ophthalmology; glaucoma and cataract; and strabismus, pediatric ophthalmology, and genetics. Performance was evaluated by two board-certified ophthalmologists.
RESULTS: Claude 3.5 Sonnet demonstrated an overall accuracy of 89.65% on multiple-choice questions and a comparable 87.93% on open-ended questions, with no statistically significant difference between formats (p=0.72). Performance varied slightly among subspecialties, with the highest accuracy in external eye and cornea cases (95.65% in both formats) and lower accuracy in strabismus, pediatric ophthalmology, and genetics (87.50% multiple-choice, 84.38% open-ended).
DISCUSSION AND CONCLUSION: Claude 3.5 Sonnet showed promising capabilities in interpreting image-based ophthalmology questions, with consistent performance across question formats. The model was strongest in the external eye and cornea subspecialty. These findings suggest potential applications in ophthalmology education and board exam preparation; however, validation of its performance in real-world clinical scenarios requires further evaluation.