Evaluating the Performance of Large Language Models on Doctoral Accounting Exams: A Comparative Study of Six Generative AI Chatbots

Khademi, Sasan

doi:10.22034/jista.2026.568921.1077

Document Type: Original Article

Author

Sasan Khademi

Ph.D. in Accounting, Department, Faculty of Economics, Management and Social Sciences, Shiraz University, Shiraz, Iran

Abstract

The rapid advancement of large language models (LLMs) has drawn increasing attention from accounting education researchers to how these models perform on specialized questions and what that performance implies for learning and assessment. This study evaluates and compares the performance of six LLMs (ChatGPT, Gemini, Perplexity, Grok, DeepSeek, and Qwen) on the Iranian PhD Accounting Examination and assesses their potential as educational support tools. The dataset comprises 300 official multiple-choice questions from three subjects (Auditing, Management Accounting, and Accounting Theory) administered between 2021 and 2025. Each model's responses were coded dichotomously (correct/incorrect) and compared with two reference levels, 0.25 (chance-level performance) and 0.50 (minimum acceptable threshold), using one-sample proportion tests; 95% confidence intervals are reported for model accuracies. Cochran's Q test was used to compare performance across models. All models performed significantly above both reference levels. Although Gemini achieved the highest and Qwen the lowest correct-response rate, Cochran's Q revealed no statistically significant difference in overall performance among the models. The results are interpreted within an open-book scenario; given the potential for data leakage and the multiple-choice format of the questions, they should not be construed as evidence of deep conceptual understanding or independent reasoning. Overall, the findings suggest that LLMs, even without advanced tuning or specialized training, can produce correct responses to a substantial share of standard accounting examination questions and may serve as complementary tools in accounting education and assessment design.
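
The abstract names two procedures: a one-sample proportion test of each model's accuracy against a fixed reference level (0.25 or 0.50), reported with a 95% confidence interval, and Cochran's Q test for comparing several models scored on the same items. The Python sketch below illustrates both on made-up data. It is not the author's code: the one-sided alternative, the normal-approximation z statistic, and the Wilson score interval are assumptions, because the abstract does not specify them, and the function names and toy numbers are hypothetical. Cochran's Q is computed from its standard formula and referred to a chi-square distribution with k - 1 degrees of freedom, where k is the number of models.

```python
"""Illustrative sketch of the abstract's tests on hypothetical 0/1-coded data.

Assumptions (not stated in the article): one-sided normal-approximation
z-tests, Wilson score intervals, and made-up toy numbers.
"""
import math


def one_sample_proportion_test(correct, n, p0):
    """One-sided z-test of H0: p = p0 versus H1: p > p0 (normal approximation)."""
    p_hat = correct / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se0
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value


def wilson_ci(correct, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p_hat = correct / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X >= x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def cochrans_q(table):
    """Cochran's Q for an items-by-models table of 0/1 scores."""
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    grand = sum(row_totals)
    denominator = k * grand - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every item is all-correct or all-incorrect")
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand**2)
    q = numerator / denominator
    return q, chi2_sf(q, k - 1)


if __name__ == "__main__":
    # Hypothetical model: 200 of 300 items correct, tested against 0.25 and 0.50.
    for p0 in (0.25, 0.50):
        z, p = one_sample_proportion_test(200, 300, p0)
        print(f"p0 = {p0:.2f}: z = {z:.2f}, one-sided p = {p:.2e}")
    lo, hi = wilson_ci(200, 300)
    print(f"95% Wilson CI: ({lo:.3f}, {hi:.3f})")

    # Toy table: rows are items, columns are three hypothetical models.
    toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
    q, p_q = cochrans_q(toy)
    print(f"Cochran's Q = {q:.2f}, df = 2, p = {p_q:.3f}")
```

With six models, each row of the table would simply hold six 0/1 entries, giving 5 degrees of freedom. In practice, library routines such as SciPy's `binomtest` or statsmodels' `cochrans_q` could replace the hand-written versions; the closed-form chi-square tail is used here only to keep the sketch dependency-free.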

Keywords

Large Language Models

Accounting Education

Accounting PhD Entrance Examination

Performance Evaluation

Artificial Intelligence in Education

Subjects

Applications of Artificial Intelligence and Machine Learning in IT Auditing

Journal of Information System and Technology Auditing

Volume 1, Issue 2 - Serial Number 2
September 2026
Pages 57-91

Receive Date 26 November 2025
Revise Date 12 January 2026
Accept Date 18 February 2026
Publish Date 23 September 2026
