Journal of Information System and Technology Auditing

Anomaly Detection in Information Technology Auditing Using Risk-Based Pseudo-Labels and the Random Forest Algorithm

Document Type: Original Article

Authors
1 Professor, Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
2 M.Sc. Graduate, Data Mining Laboratory, Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
3 M.Sc. Student, Data Mining Laboratory, Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
4 PhD Student, Data Mining Laboratory, Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
Abstract
With the increasing use of information systems and the growing volume and diversity of system data, information technology (IT) auditing faces new challenges in identifying abnormal and high-risk behaviors. Traditional audit methods, which rely mainly on manual inspections and static rules, have limited ability to detect complex, non-linear patterns in today's data. In this research, anomaly detection in IT auditing is modeled as a binary classification task, and a data-driven machine learning approach is proposed to identify and prioritize high-risk cases. In the proposed method, transaction, customer, and merchant data are integrated, and after structured preprocessing, audit-oriented features are extracted, including temporal patterns, cross-system discrepancy indicators, and deviations from normal customer behavior. The study uses the public "IEEE-CIS Fraud Detection" dataset, consisting of 1000 transactions with 25 features that combine raw transaction and customer data with these audit-oriented indicators. Because actual anomaly labels are limited, a pseudo-labeling mechanism based on audit rules and risk scoring is designed, and its output serves as the target variable for training the random forest model. The model outputs a probability score that allows transactions to be ranked and high-risk cases to be prioritized. Experimental results show that the proposed method achieves 97% accuracy, 85% precision, 93% recall, and an 89% F1-score on the test set, and can serve as an effective decision support tool for IT auditing.
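The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual audit rules, weights, or threshold, so everything named here (the `Transaction` fields, the three rules, the integer weights, and the threshold of 5) is a hypothetical stand-in chosen only to show the shape of the idea. The sketch covers labeling and a simple rule-score ranking; it does not train the random forest, whose probability output is what the study itself uses for ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    """Hypothetical record; field names are assumptions, not the paper's schema."""
    tx_id: str
    amount: float
    hour: int                    # hour of day, 0-23
    customer_mean_amount: float  # customer's typical transaction amount
    systems_disagree: bool       # cross-system discrepancy indicator


def risk_score(tx: Transaction) -> int:
    """Sum the integer weights of the (assumed) audit rules that fire."""
    score = 0
    if tx.hour < 6:                              # temporal pattern: off-hours activity
        score += 3
    if tx.systems_disagree:                      # cross-system discrepancy
        score += 4
    if tx.amount > 3 * tx.customer_mean_amount:  # deviation from normal behavior
        score += 3
    return score


def pseudo_label(tx: Transaction, threshold: int = 5) -> int:
    """Return 1 (high-risk) when the rule score reaches the assumed threshold."""
    return int(risk_score(tx) >= threshold)


def rank_by_risk(transactions: List[Transaction]) -> List[Transaction]:
    """Order transactions from highest to lowest rule score."""
    return sorted(transactions, key=risk_score, reverse=True)


if __name__ == "__main__":
    sample = [
        Transaction("a", 100.0, 14, 90.0, False),
        Transaction("b", 1000.0, 3, 100.0, True),
        Transaction("c", 50.0, 2, 60.0, False),
    ]
    for tx in rank_by_risk(sample):
        print(tx.tx_id, risk_score(tx), pseudo_label(tx))
```

In a setup like the one the abstract outlines, labels produced this way would become the target column for a classifier, and the classifier's predicted probability would replace the raw rule score as the ranking key.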

Volume 1, Issue 2 - Serial Number 2
September 2026
Pages 92-126

  • Receive Date 05 January 2026
  • Revise Date 25 February 2026
  • Accept Date 07 March 2026
  • Publish Date 23 September 2026