Building classification algorithm based on combining the improved principal component analysis and fuzzy relationship of Bayesian method
Iranian Journal of Fuzzy Systems
Volume 22, Issue 5, November and December 2025, Pages 49-68. Original article PDF (1.52 MB)
Article type: Research Paper
DOI: 10.22111/ijfs.2025.52578.9280
Authors
Hieu Huynh-Van 1; Tai Vovan* 2
1 Faculty of Applied Science, Ho Chi Minh City University of Technology (HCMUT)
2 College of Natural Science, Can Tho University, Vietnam
Abstract
This article develops a novel classification algorithm that integrates an improved Bayesian approach with Principal Component Analysis (PCA). First, PCA significantly reduces the dimensionality of the data while enhancing inter-class separability, thereby improving classification performance. Next, the prior probabilities are refined based on newly derived variables, established for both the training and test sets using a fuzzy clustering analysis technique. Finally, the Bayesian classification rule is constructed from the probability density functions of the classes, using the transformed variables obtained from PCA and the estimated prior probabilities. The algorithm is presented in detail, including procedural steps, an illustrative example, and implementation on both numerical and image data through an established RStudio procedure. A key contribution of this study lies in the theoretical enhancement of Bayes error analysis and the proof of convergence of the proposed algorithm. Empirical applications demonstrate that the proposed algorithm yields stable results and, on several numerical and image datasets, outperforms a number of existing approaches as assessed by statistical criteria.
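The abstract describes a three-step pipeline: a PCA transformation, prior probabilities refined through a fuzzy-clustering relationship, and a Bayes rule built on the class probability density functions. The paper's own implementation is an RStudio procedure that is not reproduced here; the Python sketch below is only a rough illustration of the general idea under stated assumptions. Scikit-learn's PCA and KernelDensity stand in for the paper's transformation and density estimates, fuzzy-c-means-style memberships (fuzzifier m = 2) to class centroids stand in for the fuzzy relationship used to refine the priors, and the Iris dataset is a placeholder input.

```python
# Minimal illustrative sketch (NOT the paper's established RStudio procedure):
#   step 1 - PCA for dimension reduction,
#   step 2 - priors refined from fuzzy-c-means-style soft memberships,
#   step 3 - Bayes rule built on kernel density estimates of each class.
# scikit-learn components and the Iris data are placeholder assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
classes = np.unique(y_tr)

# Step 1: fit PCA on the training set, project both sets onto 2 components.
pca = PCA(n_components=2).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

# Step 2: fuzzy-c-means-style memberships (m = 2) of the test points to the
# training-class centroids; averaging them gives data-driven prior estimates.
centroids = np.stack([Z_tr[y_tr == k].mean(axis=0) for k in classes])
dist = np.linalg.norm(Z_te[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
memberships = (1.0 / dist**2) / (1.0 / dist**2).sum(axis=1, keepdims=True)
priors = memberships.mean(axis=0)                    # one prior per class

# Step 3: Bayes rule: argmax over classes of log p(z | class) + log prior,
# with class-conditional densities from Gaussian kernel density estimates.
kdes = [KernelDensity(bandwidth=0.5).fit(Z_tr[y_tr == k]) for k in classes]
log_post = np.stack([kde.score_samples(Z_te) + np.log(p)
                     for kde, p in zip(kdes, priors)], axis=1)
y_hat = classes[log_post.argmax(axis=1)]
print("hold-out accuracy:", (y_hat == y_te).mean())
```

In this stand-in, the refined priors are the averaged soft memberships of the test points, so they adapt to the composition of the data being classified rather than being fixed by training-class frequencies, which reflects the motivation the abstract gives for refining the prior probabilities.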
Keywords
Bayesian classifier; fuzzy relationship; image analysis; posterior probability; principal component analysis