Journal of Information System and Technology Auditing

Evaluating the Performance of Large Language Models on Doctoral Accounting Exams: A Comparative Study of Six Generative AI Chatbots

Document Type: Original Article

Author
Ph.D. in Accounting, Department, Faculty of Economics, Management and Social Sciences, Shiraz University, Shiraz, Iran
Abstract
The rapid advancement of large language models (LLMs) has drawn increasing attention from accounting education researchers to their performance on specialized questions and potential implications for learning and assessment. This study aims to evaluate and compare the performance of six LLMs (ChatGPT, Gemini, Perplexity, Grok, DeepSeek, and Qwen) on the Iranian PhD Accounting Examination and to assess their potential as educational support tools. The dataset comprises 300 official multiple-choice questions from three subjects (Auditing, Management Accounting, and Accounting Theory) administered between 2021 and 2025. Responses generated by each model were coded dichotomously (correct/incorrect) and evaluated against two reference levels, 0.25 (random performance) and 0.50 (minimum acceptable threshold), using one-sample proportion tests, with 95% confidence intervals reported for model accuracies. Cochran’s Q test was employed to compare relative performance across models. Results indicated that all models performed significantly above both reference levels. Although Gemini achieved the highest and Qwen the lowest correct-response rates, Cochran’s Q revealed no statistically significant differences in overall performance. Importantly, results are interpreted within an open-book scenario, and given the potential for data leakage and the multiple-choice nature of the questions, findings should not be construed as evidence of deep conceptual understanding or independent reasoning. Overall, the findings suggest that LLMs, even without advanced tuning or specialized training, possess substantial capacity for producing correct responses in standard accounting examinations and may serve as complementary tools in accounting education and assessment design.
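The two statistical procedures named in the abstract can be illustrated with a minimal sketch. The abstract does not state whether the proportion tests were exact or approximate, one-sided or two-sided, or which interval method was used, so the code below assumes a one-sided normal-approximation z-test and a Wald interval. All counts and the small answer matrix are hypothetical placeholders, not the study's data, and the function names are illustrative.

```python
import math

def one_sample_proportion_test(correct, n, p0):
    """One-sided z-test of H1: p > p0, using the normal approximation (an assumption)."""
    p_hat = correct / n
    se0 = math.sqrt(p0 * (1 - p0) / n)            # standard error under the null
    z = (p_hat - p0) / se0
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail standard normal probability
    return z, p_value

def wald_ci_95(correct, n):
    """Approximate 95% Wald confidence interval for a proportion (an assumption)."""
    p_hat = correct / n
    half = 1.959964 * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def cochran_q(rows):
    """Cochran's Q statistic.

    rows: one list per question, holding a 0/1 score for each model.
    Under the null hypothesis Q is approximately chi-square with k - 1 degrees of freedom.
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    grand_total = sum(row_totals)
    denom = k * grand_total - sum(t * t for t in row_totals)
    if denom == 0:
        # Every question was answered identically by all models;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    numer = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    return numer / denom

# Hypothetical example: 180 correct answers out of 300 questions.
z_chance, p_chance = one_sample_proportion_test(180, 300, 0.25)
z_pass, p_pass = one_sample_proportion_test(180, 300, 0.50)
ci_low, ci_high = wald_ci_95(180, 300)

# Hypothetical 5-question, 3-model answer matrix (1 = correct, 0 = incorrect).
demo_rows = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
q_stat = cochran_q(demo_rows)

print(f"z vs 0.25: {z_chance:.3f}, p = {p_chance:.3g}")
print(f"z vs 0.50: {z_pass:.3f}, p = {p_pass:.3g}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
print(f"Cochran's Q: {q_stat:.3f}")
```

With six models the Q statistic would be compared against a chi-square distribution with 5 degrees of freedom, whose 0.05 critical value is about 11.07. A Q below that value would correspond to the kind of non-significant difference between models that the abstract reports.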

Volume 1, Issue 2 - Serial Number 2
September 2026
Pages 57-91

  • Receive Date 26 November 2025
  • Revise Date 12 January 2026
  • Accept Date 18 February 2026
  • Publish Date 23 September 2026