Abstract: Ground-level O3 is a major component of photochemical pollution and poses significant risks to human health and ecosystems. To accurately predict variations in O3 concentration in Shanghai, this study proposes a random forest model for O3 prediction optimized with the fuzzy C-means clustering algorithm, built on 2014 to 2020 monitoring data for six air pollutants and on weather forecast data for Shanghai. First, two clustering factors were selected using cross-correlation analysis. Then, O3 concentrations were grouped into three types using the fuzzy C-means clustering algorithm. Finally, a random forest model was established to predict O3 concentration, and its predictive performance before and after clustering was compared. The results show that the previous day's O3 and PM10 concentrations have the greatest influence on the O3 concentration of the forecast day, and that O3 concentration varies markedly by month. After fuzzy C-means clustering, the mean absolute error and root mean square error of the predicted O3_8h concentration decreased by 10.5% and 8.8%, respectively, and the coefficient of determination (R²) of the random forest model increased. Clustering therefore improves the accuracy of the random forest model's O3 predictions, indicating that the model has high practical value for forecasting O3 pollution in Shanghai.
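The clustering step above relies on fuzzy C-means (FCM), which alternates between updating cluster centers and soft memberships until the memberships stop changing. The following is a minimal, dependency-free sketch of standard FCM, not the authors' implementation. The fuzzifier m = 2, the tolerance, the deterministic initialization, all function names, and the toy O3 values are assumptions chosen for illustration; only the use of three clusters comes from the abstract. The study's two clustering factors and its random forest stage are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _memberships(points, centers, m: float) -> Matrix:
    """Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    c = len(centers)
    u = []
    for p in points:
        d = [_distance(p, v) for v in centers]
        zeros = [k for k, dk in enumerate(d) if dk == 0.0]
        if zeros:
            # Point sits exactly on a center: give it crisp membership there.
            u.append([1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)])
        else:
            u.append([1.0 / sum((d[k] / d[j]) ** exponent for j in range(c))
                      for k in range(c)])
    return u


def _centers(points, u: Matrix, m: float) -> Matrix:
    """Center update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                        for d in range(dim)])
    return centers


def fuzzy_c_means(points, c=3, m=2.0, tol=1e-6, max_iter=300) -> Tuple[Matrix, Matrix]:
    """Return (centers, membership matrix) for c fuzzy clusters."""
    if c < 2 or m <= 1.0:
        raise ValueError("need c >= 2 and fuzzifier m > 1")
    n = len(points)
    # Hypothetical deterministic init: c points spread evenly over the data
    # sorted by the first feature (the study's initialization is not stated).
    order = sorted(range(n), key=lambda i: points[i][0])
    centers = [list(points[order[round(j * (n - 1) / (c - 1))]]) for j in range(c)]
    u = _memberships(points, centers, m)
    for _ in range(max_iter):
        centers = _centers(points, u, m)
        new_u = _memberships(points, centers, m)
        # Stop once no membership value moves by more than tol.
        shift = max(abs(a - b) for r1, r2 in zip(new_u, u) for a, b in zip(r1, r2))
        u = new_u
        if shift < tol:
            break
    return centers, u


def hard_labels(u: Matrix) -> List[int]:
    """Assign each sample to the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Toy daily O3 values (hypothetical, not data from the study), one feature each.
o3 = [[40.0], [42.0], [45.0], [100.0], [105.0], [98.0], [180.0], [175.0], [185.0]]
centers, u = fuzzy_c_means(o3, c=3)
labels = hard_labels(u)
print(labels)
```

On this toy input the samples fall into a low, a medium, and a high O3 group. How the cluster labels then feed the random forest (for example, one regressor per cluster, or the label used as an extra feature) is not specified in the abstract, so either choice would be an assumption.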