Optimizing Deep Q-Networks with fuzzy inference-based adaptive replay buffer management
Iranian Journal of Fuzzy Systems
Volume 22, Issue 4, October-November 2025, Pages 161-174. Full text PDF (1.49 MB)
Article type: Research Paper
DOI: 10.22111/ijfs.2025.9358
Authors
M. B. Dowlatshahi* 1; S. Beiranvand 2
1 Department of Computer Engineering, Lorestan University, Khorramabad, Iran
2 Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
Abstract
Deep reinforcement learning algorithms, such as Deep Q-Networks (DQN), require careful tuning of replay memory parameters. In standard DQN implementations, these parameters remain fixed, which conflicts with the dynamic nature of the learning process, where environmental conditions and reward stability change continuously. This mismatch often results in unstable learning or slow convergence. In this paper, we present a fuzzy logic-based system for adaptively adjusting three key replay memory parameters: memory size, the ratio of recent samples, and priority weight. The proposed fuzzy system evaluates the agent's state by monitoring reward variations and average training errors, and updates these parameters accordingly to maintain suitable values during training. To assess the effectiveness of the proposed approach, we compared it with conventional DQN and PER-DQN across three benchmark reinforcement learning environments: CartPole-v1, LunarLander-v2, and Taxi-v3. Experimental and statistical analyses demonstrate that our method improves average rewards, reduces training time, and enhances learning stability.
Keywords
Reinforcement learning; deep neural networks; replay memory; fuzzy logic; adaptive parameter tuning
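The adaptive scheme described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: the membership functions, rule base, and output ranges (`min_size`, `max_size`, the crisp output levels) are hypothetical placeholders. It shows the general pattern of fuzzifying two monitored signals (reward variation and mean training error) and defuzzifying into crisp values for the three replay parameters.

```python
def tri(x, a, b, c):
    """Triangular membership function supported on (a, c), peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

class FuzzyReplayTuner:
    """Illustrative fuzzy tuner for three replay-buffer parameters.

    Inputs (assumed normalized to [0, 1]): reward variation and mean
    training (TD) error. Outputs: memory size, recent-sample ratio,
    priority weight. All constants below are assumptions for the sketch.
    """

    def __init__(self, min_size=10_000, max_size=100_000):
        self.min_size, self.max_size = min_size, max_size

    def _level(self, x):
        # Fuzzify a signal into (low, medium, high) membership degrees.
        return (tri(x, -0.5, 0.0, 0.5),
                tri(x, 0.0, 0.5, 1.0),
                tri(x, 0.5, 1.0, 1.5))

    def update(self, reward_var, td_error):
        rv, te = self._level(reward_var), self._level(td_error)
        # Hypothetical rule base: instability (either signal high) asks for
        # a larger buffer, fewer recent samples, stronger prioritization;
        # stability (both signals low) asks for the opposite.
        instability = max(rv[2], te[2])
        stability = max(rv[0], te[0])
        mid = 1.0 - max(instability, stability)
        total = instability + stability + mid
        # Weighted-average defuzzification over crisp output levels.
        size_frac = (instability * 1.0 + mid * 0.5 + stability * 0.25) / total
        memory_size = int(self.min_size + size_frac * (self.max_size - self.min_size))
        recent_ratio = (instability * 0.1 + mid * 0.3 + stability * 0.5) / total
        priority_weight = (instability * 0.9 + mid * 0.6 + stability * 0.3) / total
        return memory_size, recent_ratio, priority_weight
```

During training, `update` would be called periodically with the current monitored statistics, and the replay buffer resized and re-weighted accordingly; the paper's actual rule base and parameter ranges may differ.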