Evaluating the Performance of Large Language Models on the Accounting PhD Exam: A Comparative Study of Six Generative AI Chatbots

Khademi, Sasan

doi:10.22034/jista.2026.568921.1077

Article type: Research article

Author

Sasan Khademi

PhD graduate in Accounting, Department of Accounting, Faculty of Economics, Management and Social Sciences, Shiraz University, Shiraz, Iran

Abstract

The rapid advance of large language models (LLMs) has drawn researchers' attention to how these tools perform on specialized questions and to their potential implications for learning and assessment. This study evaluates and compares the performance of six LLMs (ChatGPT, Gemini, Perplexity, Grok, DeepSeek, and Qwen) in answering questions from the Iranian accounting PhD exam. The data comprise 300 official four-option multiple-choice questions from the accounting PhD exams of 1400 to 1404 (Iranian calendar) in three subjects: auditing, management accounting, and accounting theory. Each model's answers were coded as binary (correct/incorrect), and a one-sample proportion test was used to assess performance against two reference levels: 0.25 (chance performance) and 0.50 (an acceptable baseline level). Cochran's Q test was used to compare the models' relative performance. The results showed that all models performed significantly above both reference levels. Although Gemini recorded the highest and Qwen the lowest percentage of correct answers, Cochran's Q test showed no significant difference in the models' overall performance. However, the results are interpreted within an open-book operational scenario and, given the possibility of data leakage and the multiple-choice nature of the questions, should not be taken as evidence of deep conceptual understanding or independent reasoning by the models. Overall, the findings indicate that LLMs, even without advanced configuration or domain-specific training, show considerable ability to answer standard accounting exams correctly and can be considered as complementary tools for teaching and for designing assessment activities in accounting higher education.
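The two tests named in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that the abstract does not state: the proportion test is taken to be a one-sided z-test under the normal approximation (the study may have used an exact binomial test instead), the function names are hypothetical, and the toy data are invented purely for illustration and are not the study's results.

```python
import math


def proportion_z_test(successes, n, p0):
    """One-sided z-test (normal approximation) of H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)            # standard error under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # closed form for df = 1
    # Recurrence on the regularized upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1))
        a += 1
    return q


def cochran_q(rows):
    """Cochran's Q for a questions-by-models matrix of 0/1 outcomes."""
    k = len(rows[0])                             # number of models compared
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    grand_total = sum(row_totals)
    denom = k * grand_total - sum(t * t for t in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined when every question has the same outcome for all models")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2) / denom
    df = k - 1
    return q, df, chi2_sf(q, df)


# Hypothetical toy data: 6 questions (rows) answered by 3 models (columns).
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0], [0, 0, 0], [1, 1, 0]]
print(cochran_q(toy))

# Hypothetical count: 60 correct answers out of 100, tested against p0 = 0.50.
print(proportion_z_test(60, 100, 0.50))
```

In the study's setting the matrix would have 300 rows and 6 columns, giving 5 degrees of freedom for Q. Cochran's Q suits this design because all models answer the same questions, so the binary outcomes are matched rather than independent.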

Keywords

Large language models

Accounting education

Accounting PhD exam

Performance evaluation

Artificial intelligence in education

Subjects

Applications of artificial intelligence and machine learning in information technology auditing

Volume 1, Issue 2 - Serial Number 2
Mehr 1404
Pages 57-91

Received: 05 Azar 1404
Revised: 22 Dey 1404
Accepted: 29 Bahman 1404
Published: 01 Mehr 1404

Journal of Information System and Technology Audit (JISTA)
