GPT-3.5-turbo versus GPT-4.0 for medical education
Keywords: Education, medical; Generative artificial intelligence; Large language models

Abstract
Objectives: To compare the performances of GPT-3.5-turbo and GPT-4.0 for medical education.
Methods: The performances of GPT-3.5-turbo (via Poe) and GPT-4.0 (via Microsoft 365 Copilot) were compared in terms of medical terminology translation (283 medical terms from six specialties), situational judgment (30 situations in seven contexts), medical knowledge (80 multiple choice questions and 80 clinical scenario questions), medical studies (10 case scenarios), and clinical communication (10 case scenarios).
Results: GPT-4.0 outperformed GPT-3.5-turbo in accuracy on medical terminology translation (98.6% vs 91.5%), situational judgment (83.3% vs 63.3%), medical knowledge (93.8% vs 85.0% on multiple choice questions and 82.5% vs 72.5% on clinical scenario questions), medical studies (in three of 10 case scenarios), and clinical communication (in four of 10 case scenarios).
Conclusions: GPT-3.5-turbo and GPT-4.0 performed reasonably well on medical knowledge and medical studies, with GPT-4.0 performing slightly better on medical terminology translation, situational judgment, and clinical communication. These results are promising for the incorporation of large language models in medical education. Nonetheless, overreliance should be avoided, as responses can be inaccurate or irrelevant.
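For readers checking the arithmetic, each accuracy figure above is simply the proportion of correct responses out of the items tested in that task. The minimal Python sketch below illustrates the calculation; the count in the example is hypothetical and not taken from the study data.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy expressed as a percentage of correct responses."""
    return 100 * correct / total

# Hypothetical illustration: if GPT-4.0 translated 279 of the 283 medical
# terms correctly, its accuracy would be about 98.6%, matching the figure
# reported in the Results.
print(round(accuracy_pct(279, 283), 1))  # 98.6
```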