Fuzzy Radial Basis Function Least Square Policy Iteration: A Novel Critic-Only Reinforcement Learning Framework

Iranian Journal of Fuzzy Systems
Volume 22, Issue 2 (May–July 2025), Pages 59-80; Full Text PDF (1.18 MB)
Article Type: Research Paper
DOI: 10.22111/ijfs.2025.48345.8503

Authors
Omid Mehrabi¹; Ahmad Fakharian*²; Mehdi Siahi¹; Amin Ramezani³
¹ Department of Electrical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
² Department of Electrical Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
³ Department of Electrical and Computer Engineering, Tarbiat Modares University (TMU), Tehran, Iran

Abstract
In this paper, a new critic-only reinforcement learning algorithm for control problems with continuous state spaces is proposed. Our approach, called Fuzzy-RBF Least Square Policy Iteration (FRLSPI), tunes the weight parameters of a fuzzy-RBF network (a hybrid model formed by combining a Takagi-Sugeno fuzzy rule inference system with an RBF network) online, and is obtained by combining Least Squares Policy Iteration (LSPI) with the fuzzy-RBF network as a function approximator. In FRLSPI, the basis functions defined by the fuzzy-RBF network resolve the challenge of choosing state-action basis functions in LSPI. We also provide theoretical results bounding the error between the optimal and the approximated Action Value Function (AVF) for FRLSPI. The proposed method has desirable features such as rigorous mathematical analysis, independence from a learning rate, and comparatively good convergence properties. Simulation studies on the mountain-car and acrobot control tasks demonstrate the applicability and performance of our learning framework. The overall results indicate that the proposed method can outperform previously known reinforcement learning algorithms.
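To make the abstract's core mechanism concrete, the sketch below implements plain LSPI (LSTD-Q) with normalized Gaussian RBF state-action features on a toy 1-D control task. This is an illustration only, not the paper's FRLSPI algorithm: the toy dynamics, the helper names `rbf_features` and `lspi`, and the center/width choices are all assumptions, and a true fuzzy-RBF network would derive the features from Takagi-Sugeno rule firing strengths rather than the bare normalized RBFs used here (the normalization step is what loosely mirrors the fuzzy inference).

```python
import numpy as np

def rbf_features(s, a, centers, width, n_actions):
    """Normalized Gaussian RBF features, one block of basis functions per action.

    The normalization (dividing by the sum of activations) loosely mirrors the
    firing-strength normalization of a Takagi-Sugeno fuzzy inference system.
    """
    phi = np.zeros(len(centers) * n_actions)
    act = np.exp(-((s - centers) ** 2) / (2.0 * width ** 2))
    phi[a * len(centers):(a + 1) * len(centers)] = act / act.sum()
    return phi

def lspi(samples, centers, width, n_actions, gamma=0.95, iters=20):
    """Least Squares Policy Iteration: repeat LSTD-Q evaluation of the greedy policy."""
    k = len(centers) * n_actions
    w = np.zeros(k)
    for _ in range(iters):
        A = 1e-6 * np.eye(k)          # small ridge term keeps A invertible
        b = np.zeros(k)
        for s, a, r, s2, done in samples:
            phi = rbf_features(s, a, centers, width, n_actions)
            if done:
                phi2 = np.zeros(k)    # terminal: no bootstrap target
            else:
                a2 = max(range(n_actions),
                         key=lambda u: rbf_features(s2, u, centers, width, n_actions) @ w)
                phi2 = rbf_features(s2, a2, centers, width, n_actions)
            A += np.outer(phi, phi - gamma * phi2)
            b += phi * r
        w_new = np.linalg.solve(A, b)
        if np.linalg.norm(w_new - w) < 1e-6:
            w = w_new
            break
        w = w_new
    return w

# Toy task (assumed for illustration): state in [0, 1], actions {left, right}
# move the state by 0.1; reaching s >= 0.9 gives reward 1 and terminates.
rng = np.random.default_rng(0)
samples = []
for _ in range(2000):
    s = rng.uniform(0.0, 1.0)
    a = int(rng.integers(0, 2))
    s2 = float(np.clip(s + (0.1 if a == 1 else -0.1), 0.0, 1.0))
    r = 1.0 if s2 >= 0.9 else 0.0
    samples.append((s, a, r, s2, s2 >= 0.9))

centers = np.linspace(0.0, 1.0, 9)
w = lspi(samples, centers, width=0.15, n_actions=2)

def greedy(s):
    return max(range(2), key=lambda u: rbf_features(s, u, centers, 0.15, 2) @ w)
```

Because moving right is optimal everywhere on this toy task, the learned greedy policy should select action 1 across the state space; the RBF features let a single weight vector represent the action-value function over the whole continuous state interval, which is the role the fuzzy-RBF network plays in FRLSPI.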
Keywords
Fuzzy reinforcement learning; fuzzy-RBF network; generalization; least square policy iteration