Allo-Self-RAG: Fuzzy aggregation of internal and external critique signals for improved Self-RAG evaluation

Hosseini, F.; Eftekhari, M.

doi:10.22111/ijfs.2026.9936


Iranian Journal of Fuzzy Systems
Volume 23, Issue 3, August and September 2026, Pages 141-163
Article type: Research Paper
Digital Object Identifier (DOI): 10.22111/ijfs.2026.9936
Authors
F. Hosseini¹; M. Eftekhari*¹,²
¹Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
²Visiting Researcher at Institute for Applied Computer Science (InfAI), Nature-Inspired Machine Intelligence, Dresden, Germany
Abstract
Retrieval-Augmented Generation (RAG) systems play a crucial role in grounding Large Language Models (LLMs) with external knowledge. However, existing architectures such as Self-RAG employ static linear aggregation of internal critique tokens, which requires manual tuning and inadequately models the non-linear interactions underlying retrieval and generation. Moreover, exclusive reliance on self-critique can introduce confirmation bias and hallucinations. To overcome these limitations, this work introduces Allo-Self-RAG, a neuro-symbolic framework that integrates fuzzy logic with RAG. From a dual-process-inspired perspective, Allo-Self-RAG is framed as a structured enhancement over the heuristic Self-RAG baseline: the standard Self-RAG pipeline is closer to System-1-like post-retrieval behavior, whereas Allo-Self-RAG introduces a more System-2-like evaluation layer through structured signal aggregation. A Fuzzy Inference System (FIS) adaptively fuses internal self-critique tokens with external allo-critique signals from an independent reranker, replacing static linear aggregation with rule-guided score integration and a rule-based revision mechanism for conflicting evidence. When this evaluation stage detects ambiguity or conflicting evidence among top-ranked candidates, the framework automatically invokes a synthesis step to reconcile contradictions and produce a more reliable consensus answer. Simulated Annealing (SA) is employed to optimize fuzzy membership functions automatically using a small calibration dataset, eliminating manual parameter tuning. Extensive experimental evaluation demonstrates that Allo-Self-RAG consistently outperforms the Self-RAG baseline, achieving 56.61% accuracy on PopQA (+1.45% improvement), 66.98% on ARC-Challenge (+1.03% improvement), and 67.51% on PubHealth (+1.01% improvement), showing reliable gains across retrieval-augmented question answering benchmarks.
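The fusion, conflict detection, and calibration steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the piecewise-linear "high" membership shape, the four rules and their consequent values, the zero-order Sugeno-style weighted average used for defuzzification, the top-two margin test that triggers synthesis, the squared-error loss, the cooling schedule, and the toy calibration triples. All function and variable names are hypothetical.

```python
import math
import random

def high(x, lo, hi):
    """Degree to which x is 'high': 0 below lo, 1 above hi, linear between."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))

# Hypothetical rule base: (self-critique is high?, allo-critique is high?, output).
RULES = [
    (True, True, 1.0),    # both signals agree the candidate is good
    (True, False, 0.4),   # conflict: only the generator trusts the candidate
    (False, True, 0.6),   # conflict: only the external reranker trusts it
    (False, False, 0.0),  # both signals agree the candidate is poor
]

def fuse(self_score, allo_score, params):
    """Fuse two scores in [0, 1] with min-AND rules and a weighted average."""
    s_lo, s_hi, a_lo, a_hi = params
    s = high(self_score, s_lo, s_hi)
    a = high(allo_score, a_lo, a_hi)
    num = den = 0.0
    for s_is_high, a_is_high, out in RULES:
        w = min(s if s_is_high else 1 - s, a if a_is_high else 1 - a)
        num += w * out
        den += w
    return num / den  # den >= 0.5 because 'low' and 'high' sum to 1 per input

def needs_synthesis(fused_scores, margin=0.05):
    """Flag ambiguity when the two best candidates are nearly tied."""
    top = sorted(fused_scores, reverse=True)[:2]
    return len(top) == 2 and top[0] - top[1] < margin

def valid(params):
    """Keep breakpoints inside [0, 1] with a minimum ramp width of 0.05."""
    s_lo, s_hi, a_lo, a_hi = params
    return (0.0 <= s_lo and s_hi <= 1.0 and s_hi - s_lo >= 0.05
            and 0.0 <= a_lo and a_hi <= 1.0 and a_hi - a_lo >= 0.05)

def loss(params, calib):
    """Mean squared error between fused scores and calibration labels."""
    return sum((fuse(s, a, params) - y) ** 2 for s, a, y in calib) / len(calib)

def anneal(calib, start, steps=500, t0=0.1, seed=0):
    """Simulated annealing over the four membership breakpoints."""
    rng = random.Random(seed)
    cur, cur_loss = start, loss(start, calib)
    best, best_loss = cur, cur_loss
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9  # linear cooling, never exactly zero
        cand = tuple(p + rng.gauss(0, 0.05) for p in cur)
        if not valid(cand):
            continue
        cand_loss = loss(cand, calib)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_loss < cur_loss or rng.random() < math.exp((cur_loss - cand_loss) / temp):
            cur, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = cur, cur_loss
    return best, best_loss

# Toy calibration set: (self score, allo score, 1.0 if the answer was correct).
START = (0.2, 0.8, 0.2, 0.8)
CALIB = [(0.9, 0.9, 1.0), (0.9, 0.1, 0.0), (0.1, 0.9, 1.0),
         (0.1, 0.1, 0.0), (0.6, 0.4, 0.0), (0.4, 0.7, 1.0)]

tuned, tuned_loss = anneal(CALIB, START)
print("tuned breakpoints:", tuned, "loss:", tuned_loss)
print("synthesis needed:", needs_synthesis([fuse(0.7, 0.3, tuned), fuse(0.3, 0.7, tuned)]))
```

In this sketch the rule table replaces a fixed weighted sum of critique scores, so disagreement between the two signals maps to its own output level rather than being averaged away, and the annealer adjusts only the membership breakpoints while the rules stay fixed.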
Keywords
Retrieval-augmented generation; fuzzy inference systems; simulated annealing; external evaluator; evidence synthesis; neuro-symbolic approach

