Abstract:
The degree of shale oil and gas enrichment in the Hongxing area is closely correlated with lithofacies characteristics, which serve as a core indicator for shale reservoir evaluation in the region and directly affect the efficiency and accuracy of exploration and development. Lithofacies identification in the area faces two challenges: labeled lithofacies samples are scarce, and the conventional logging responses of the shales are complex and overlapping. To address them, this study proposes a heterogeneous ensemble learning model that integrates three machine learning methods: Bayesian classification, a BP neural network, and random forest. First, a refined "3-endmember 4-component" lithofacies classification scheme is established from three mineral end-members (siliceous, argillaceous, calcareous) and total organic carbon (TOC) content, dividing the shales in the study area into five types: carbon-rich hybrid shales, carbon-rich siliceous shales, high-carbon hybrid shales, high-carbon siliceous shales, and medium-carbon calcareous shales. Next, a Stacking ensemble framework is adopted: K-fold cross-validation generates a lithofacies prediction probability matrix for each of the three base learners, and an adaptive probability fusion mechanism optimizes the weights assigned to them. This combines the adaptability of Bayesian classification to scarce samples, the nonlinear fitting capability of the BP neural network for complex features, and the stability of random forest, enabling direct identification and rule characterization of the target lithofacies from conventional logging curves.
Applied to shale lithofacies identification in Well HY1-4, a typical well in the Hongxing area, the model achieves an overall identification accuracy of 93.19%, an average improvement of 14.85% over the individual Bayesian classification, BP neural network, and random forest methods. It remains robust under extreme exploration scenarios, including scarce samples, significant data noise, and imbalanced class distribution, and cross-regional validation in Well FY10 of the Fuxing area demonstrates favorable spatial generalization ability.
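
As an illustration of the Stacking step described in the abstract, the following is a minimal sketch rather than the study's implementation. It assumes scikit-learn, uses `GaussianNB` as a stand-in for Bayesian classification and `MLPClassifier` for the BP neural network, and replaces the adaptive probability fusion mechanism (whose details are not given here) with a logistic-regression meta-learner. The data are synthetic five-class samples, not logging curves, and all function names and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_base_learners(seed=0):
    """Three heterogeneous base learners (stand-ins, see assumptions above)."""
    return {
        "bayes": GaussianNB(),
        "bp": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
        ),
        "rf": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def fit_stacking(X, y, n_splits=5, seed=0):
    """Build out-of-fold probability matrices, then fit a meta-learner on them."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    learners = make_base_learners(seed)
    # One (n_samples, n_classes) probability matrix per base learner; each row
    # is predicted by a model that never saw that sample during training.
    oof = [
        cross_val_predict(model, X, y, cv=cv, method="predict_proba")
        for model in learners.values()
    ]
    # Assumed fusion step: logistic regression learns how to weight the
    # stacked class probabilities of the three base learners.
    meta = LogisticRegression(max_iter=1000).fit(np.hstack(oof), y)
    # Refit every base learner on all training data for use at prediction time.
    for model in learners.values():
        model.fit(X, y)
    return learners, meta


def predict_stacking(learners, meta, X):
    """Fuse base-learner probabilities into a final class prediction."""
    probs = [model.predict_proba(X) for model in learners.values()]
    return meta.predict(np.hstack(probs))


# Synthetic stand-in data: 8 features playing the role of logging curves,
# 5 classes playing the role of the five lithofacies types.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=6, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
learners, meta = fit_stacking(X_tr, y_tr)
pred = predict_stacking(learners, meta, X_te)
print("held-out accuracy:", float((pred == y_te).mean()))
```

Training the meta-learner on out-of-fold probabilities, rather than on probabilities the base learners produce for their own training samples, keeps the fusion weights from rewarding a base learner that merely memorizes the training set.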