Publications
Found 8 publication(s)
Richter, K. (2025): Machine Learning-supported visibility forecasting by combining station, Meteosat and reanalysis data. Philipps University of Marburg, master thesis.
- Keywords: | Radiation fog | fog horizontal visibility | Machine learning | Nowcasting | XGBoost |
Abstract:
Accurate forecasts of radiation fog are highly relevant because of its impact on traffic, aviation, and transportation. This study will explore the adaptation and enhancement of a previously developed Machine Learning-based nowcasting framework for radiation fog events. The objective is to explore the potential for spatial expansion and for improved model accuracy by applying the framework at three distinct weather station locations that experience radiation fog. Further, the effect of Numerical Weather Prediction (NWP) data as an additional source of predictor variables on model performance will be assessed. This will be performed through integration of datasets from German Weather Service (DWD) stations, Meteosat Second Generation (MSG) channel properties and regional reanalysis variables from the COSMO NWP model. Distinct model variants based on different dataset combinations (Station, MSG+COSMO, Station+MSG+COSMO, Visibility-Only) will be evaluated. Using the eXtreme Gradient Boosting (XGBoost) algorithm, the framework forecasts absolute visibility with a 60-minute lead time. A persistence model (PM) serves as benchmark. Performance will be assessed using scoring metrics (Accuracy, Correlation, Percentage bias, Mean Absolute Error) across the full visibility range and three visibility threshold bounds (2 km, 1.1 km, 0.4 km). Temporal accuracy will be determined by evaluating the time shifts of predicted fog formation and dissipation. XGBoost models mostly outperform the PM, with the Station+MSG+COSMO variant tending to perform best and the MSG+COSMO variant worst. Prediction difficulties arise in the 0.4 km threshold segment due to measurement resolution limitations and value imbalance of the visibility data. The model variants reliably predict fog event transitions, with the majority forecasted with deviations < 30 minutes and only a few events missed. A consistent tendency towards delayed prediction is observed. 
Variability in model performances across station locations suggests that small-scale environmental characteristics contribute to different model robustness at distinct sites. The results indicate strong potential for further spatial framework extension. COSMO variables partially contribute to improved model performance. The framework marks a solid foundation for future exploitation.
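The benchmark idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis code: the visibility values, the 10-minute sampling interval, and the sign convention of the percentage bias are assumptions made for illustration only.

```python
from statistics import fmean


def persistence_forecast(visibility, lead_steps):
    """Persistence model: the forecast for time t + lead_steps is the value observed at t."""
    return visibility[:-lead_steps]


def mean_absolute_error(observed, predicted):
    """Average absolute difference between forecast and observation."""
    return fmean(abs(o - p) for o, p in zip(observed, predicted))


def percentage_bias(observed, predicted):
    """Assumed convention: positive values mean the forecast overestimates on average."""
    return 100.0 * sum(p - o for o, p in zip(observed, predicted)) / sum(observed)


# Hypothetical visibility series in metres, one value every 10 minutes (assumption).
visibility = [5000, 4200, 3100, 1800, 900, 400, 300, 350, 800, 2000]
lead_steps = 6  # 6 x 10 min = 60-minute lead time

predicted = persistence_forecast(visibility, lead_steps)
observed = visibility[lead_steps:]  # observations aligned with the forecasts

print("MAE [m]:", mean_absolute_error(observed, predicted))
print("Percentage bias [%]:", percentage_bias(observed, predicted))
```

Any learned model would be scored with the same functions on the same aligned pairs, so that its skill can be read relative to the persistence benchmark.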
Grigusova, P.; Limberger, O.; Murkute, C.; Pucha, F.; González-Jaramillo, V.; Fries, A.; Windhorst, D.; Breuer, L.; de Paula, M.D.; Hickler, T.; Trachte, K. & Bendix, J. (2025): Radiation partitioning in a cloud-rich tropical mountain rain forest of the S-Ecuadorian Andes for use in plot-based land surface modelling. Dynamics of Atmospheres and Oceans 110, 101553.
- DOI: 10.1016/j.dynatmoce.2025.101553
- Keywords: | Machine learning | Diffuse radiation | Surface radiation balance | Land surface modeling | Tropical mountain rain forest |
Abstract:
Understanding the partitioning of downward shortwave radiation into direct and diffuse components is essential for modeling ecosystem energy fluxes. Accurate partitioning functions are critical for land surface models (LSMs) coupled with climate models, yet these functions often depend on regional cloud and aerosol conditions. While data for developing semi-empirical partitioning functions are abundant in mid-latitudes, their performance in tropical regions, particularly in the high Andes, remains poorly understood due to scarce ground-based measurements. This study analyzed a unique dataset of shortwave radiation components from a tropical mountain rainforest (MRF) in southern Ecuador, developing and testing a locally adapted partitioning function using Random Forest Regression. The model achieved high accuracy in predicting the percentage of diffuse radiation (%Dif; R2=0.95, RMSE = 5.33, MAE = 3.74) and absolute diffuse radiation (R2=0.99, RMSE = 5.30, MAE = 14). When applied to simulate upward shortwave radiation, the model outperformed commonly used partitioning functions achieving the lowest RMSE (8.62) and MAE (5.82) while matching the highest R2 (0.97). These results underscore the importance of regionally adapted radiation partitioning functions for improving LSM performance, particularly in complex tropical environments. The adapted LSM will be further utilized for studies on heat fluxes and carbon sequestration.
Schütz, M.; Schütz, A.; Bendix, J. & Thies, B. (2024): Improving classification-based nowcasting of radiation fog with machine learning based on filtered and preprocessed temporal data. Quarterly Journal of the Royal Meteorological Society 150(759), 577–596.
- DOI: 10.1002/qj.4619
- Keywords: | fog | Machine learning | Nowcasting | forecast |
Abstract:
Radiation fog nowcasting remains a complex yet critical task due to its substantial impact on traffic safety and economic activity. Current numerical weather prediction models are hindered by computational intensity and knowledge gaps regarding fog-influencing processes. Machine-Learning (ML) models, particularly those employing the eXtreme Gradient Boosting (XGB) algorithm, may offer a robust alternative, given their ability to learn directly from data, swiftly generate nowcasts, and manage non-linear interrelationships among fog variables. However, unlike recurrent neural networks XGB does not inherently process temporal data, which is crucial in fog formation and dissipation. This study proposes incorporating preprocessed temporal data into the model training and applying a weighted moving-average filter to regulate the substantial fluctuations typical in fog development. Using an ML training and evaluation scheme for time series data, we conducted an extensive bootstrapped comparison of the influence of different smoothing intensities and trend information timespans on the model performance on three levels: overall performance, fog formation and fog dissipation. The performance is checked against one benchmark and two baseline models. A significant performance improvement was noted for the station in Linden-Leihgestern (Germany), where the initial F1 score of 0.75 (prior to smoothing and trend information incorporation) was improved to 0.82 after applying the smoothing technique and further increased to 0.88 when trend information was incorporated. The forecasting periods ranged from 60 to 240 min into the future. This study offers novel insights into the interplay of data smoothing, temporal preprocessing, and ML in advancing radiation fog nowcasting.
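The two preprocessing ideas named in the abstract above, smoothing with a weighted moving average and adding trend information as extra predictors, can be sketched as follows. This is an illustrative sketch only: the weights, the trend span, the causal (backward-looking) window, and the sample values are assumptions, not the configuration used in the paper.

```python
def weighted_moving_average(values, weights):
    """Causal filter: each output uses only the current and earlier values.

    weights[-1] applies to the current value; shorter windows at the start
    of the series use only the most recent weights.
    """
    n = len(weights)
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - n + 1): i + 1]
        w = weights[-len(window):]
        smoothed.append(sum(v * k for v, k in zip(window, w)) / sum(w))
    return smoothed


def trend_feature(values, span):
    """Change over the previous `span` steps; None where the history is too short."""
    return [None if i < span else values[i] - values[i - span]
            for i in range(len(values))]


# Hypothetical, strongly fluctuating visibility readings in metres (assumption).
visibility = [1000, 400, 900, 300, 700, 200]

smoothed = weighted_moving_average(visibility, weights=[1, 2, 3])
trend = trend_feature(smoothed, span=2)

print(smoothed)
print(trend)
```

A tree-based model such as XGBoost sees each row independently, so features like `trend` are one way to hand it information about the recent past.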
Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2023-07-27). Pointwise Machine Learning Based Radiation Fog Nowcast with Station Data in Germany. Presented at 9th International Conference on Fog, Fog Collection, and Dew, Fort Collins, Colorado, USA.
- Keywords: | Radiation fog | station data | Machine learning | Nowcasting | XGBoost |
Abstract:
There are many uncertainties in radiation fog forecast. Continuous effort is being made to improve the forecast. A supplementary and increasingly popular approach to numerical weather forecast is the forecast with machine learning (ML) algorithms. While numerical weather forecast is based on mathematical models with partial differential equations, ML algorithms take a more heuristic approach. The latter strategy calls for three steps. Precise data preprocessing is the initial step. This implies that after preprocessing, the dataset must contain the forecast-relevant information in a way that the algorithm can learn from it. This is not a trivial step because it necessitates a thorough understanding of the fundamental principles underlying radiation fog. Even when the relevant information is contained in the data, it is not always evident, especially in severely unbalanced fog datasets. The best strategy to achieve a pleasing result may therefore not be to simply feed the algorithm all the data and variables that are available. So that the appropriate dynamics may be detected by the algorithm, the data and information should be adjusted accordingly. The second step is the data splitting into training, validation and test datasets. The ability to predict fog is driven by the temporally linked process that describes the ongoing change in atmospheric state but in order to guarantee constant independence between the training, validation and test dataset, the data splitting method must consider this temporally linked information between the individual datapoints. Otherwise, the algorithm’s forecast accuracy can be based on the temporally correlated information content of the individual data points. The third step is the interpretation of the model scores. When looking at the forecast score alone, it is a very abstract number that does not directly allow a statement about the forecast performance of the model. 
In order to evaluate the model performance, two baselines are of relevance: algorithm complexity and dataset complexity. A baseline for algorithm complexity justifies the chosen algorithm and also classifies the model performance. A baseline for dataset complexity also classifies the model performance and enables a better comparability of different datasets. Following these principles, our current objective is to improve the ML based fog forecast with XGBoost for a forecasting period up to four hours for the station in Linden-Leihgestern (Germany). The training and evaluation are based on the Expanding Window Approach (Vorndran et al. 2022) that considers the autocorrelation of a fog time series and maintains the temporal order during both training and evaluation. The evaluation is based on a score for each of the following categories: Overall performance, fog formation, and fog dissipation. The results are set in relation to different baselines to evaluate the performance and the dataset complexity. Building on this scheme, newly preprocessed data led to an improvement in the prediction of radiation fog for the station in Linden-Leihgestern. We will present the most recent findings from our research.
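The splitting principle described above, with training data always older than the data to be predicted, can be sketched as a generic expanding-window split. This is not the authors' implementation of the Expanding Window Approach; the function name, the fold sizing, and the `gap` parameter (samples dropped between training and test blocks to limit leakage through autocorrelation) are assumptions for illustration.

```python
def expanding_window_splits(n_samples, n_folds, gap=0):
    """Yield (train_indices, test_indices) in temporal order.

    The training block always starts at index 0 and grows with each fold;
    the test block follows it, separated by `gap` discarded samples.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        test_start = train_end + gap
        # The last fold absorbs any remainder at the end of the series.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield range(0, train_end), range(test_start, test_end)


splits = list(expanding_window_splits(n_samples=12, n_folds=3, gap=1))
for train, test in splits:
    print("train:", list(train), "test:", list(test))
```

Because no test index ever precedes a training index, the scores obtained this way approximate what an operational model would face.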
Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2022-09-16). The effect of filtering and preprocessed temporal information on a classification based machine learning model for radiation fog nowcasting. Presented at AK Klima, Würzburg.
- Keywords: | station data | Machine learning | Nowcasting | XGBoost |
Abstract:
The current goal of our research is to improve the machine learning (ML) based fog forecast for a forecasting period up to four hours for the station in Linden-Leihgestern. The prediction of radiation fog is still subject to large uncertainties. In particular, the precise prediction of fog formation and dissipation, i.e. the transitions, is very difficult. The high-frequency fluctuations of the variables in the formation and dissipation phases pose a particular challenge to ML models. These strong fluctuations make it difficult to extract the necessary information about the past, namely an increasing or decreasing trend. However, the temporal evolution in the past is decisive for the development of radiation fog. Thus, these dynamics must be prepared in such a way that they can be learned during model training. Therefore, different smoothing levels were tested using a Gaussian moving average filter. Furthermore, additional trend variables for model training were generated that carry information about the temporal evolution of previous data points. Training and evaluation have been carried out with the Expanding Window Approach (Vorndran et al. 2022) that has recently been accepted as a training and validation method for radiation fog prediction. Building on this scheme with the tree-based algorithm XGBoost, the newly preprocessed data led to an improvement in the prediction of radiation fog for the station in Linden-Leihgestern. The results from this research will be presented in the poster session. The study is funded by the DFG research project “FOG-ML FOrecasting radiation foG by combining station and satellite data using Machine Learning”.
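How different smoothing levels of a Gaussian moving average could be tested is sketched below. The abstract does not state the window length, the standard deviations, or whether the window is centred, so the centred window, the odd `window` size, the `sigma` values, and the sample series are all assumptions.

```python
import math


def gaussian_weights(window, sigma):
    """Normalised Gaussian weights for an odd window length."""
    centre = (window - 1) / 2
    raw = [math.exp(-0.5 * ((i - centre) / sigma) ** 2) for i in range(window)]
    total = sum(raw)
    return [r / total for r in raw]


def gaussian_moving_average(values, window, sigma):
    """Centred Gaussian smoothing; weights are renormalised at the series edges."""
    weights = gaussian_weights(window, sigma)
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        acc, norm = 0.0, 0.0
        for j, w in enumerate(weights):
            k = i + j - half
            if 0 <= k < len(values):
                acc += w * values[k]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


# Hypothetical series with one sharp spike (assumption).
series = [100.0, 100.0, 100.0, 900.0, 100.0, 100.0, 100.0]

# A larger sigma spreads the weights more evenly and smooths more strongly.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, [round(v, 1) for v in gaussian_moving_average(series, 5, sigma)])
```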
Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2021-11-05). Training and validation weaknesses in pointwise classification-based radiation fog forecast using machine learning algorithms. Presented at AK Klima, Passau.
- Keywords: | fog forecasting | station data | Machine learning | Decision Trees | Classification | XGBoost |
Abstract:
Fog forecasting still shows large inaccuracies in predicting fog formation, dissipation and duration. For several years, machine learning (ML) algorithms have increasingly been used in addition to numerical fog forecasts because of their computational speed and ability to learn non-linear interactions between the variables. Due to their black-box nature, precise and accurate training and evaluation are vital to prevent insufficient training or meaningless scores. Three main points important for fog prediction are explained in the following. 1. Fog forecasting datasets consist of autocorrelated variables. In most cases, there is an information leakage between the training and test data sets which are used to evaluate the model performance. This information leakage can have an impact on the performance scores because the stronger the information flow, the easier it is for the model to memorize. 2. Fog forecasting datasets have a temporal order. To be able to make statements about the performance of an operational model, this temporal order should already be simulated during model training and evaluation. This is because for an operational model, the training data points are always older than the data points to be predicted. Commonly used training methods neglect this fact. 3. Time series used for fog forecasting usually have a large imbalance between the frequency of the fog class and non-fog class. This imbalance can have an unfavorable interaction with the confusion-matrix-based meteorological scores that are widely used for evaluation. All of the aforementioned points, if not considered, can lead to an insufficient forecast without even being noticed. Therefore, the negative influence on the model score of two commonly used training methods that neglect the points named above will be shown using an XGBoost model and a logistic regression model. 
In comparison, a training and evaluation method was evaluated that maintains the temporal order and thus simulates the performance of an operational model. It will also be shown that common meteorological scores, since they are computed based on a confusion matrix, share a weakness when the data set is unbalanced: Persistence behavior remains undetected. The study is funded by the DFG research project “FOG-ML FOrecasting radiation foG by combining station and satellite data using Machine Learning”.
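The weakness noted above, that confusion-matrix scores on unbalanced data do not reveal persistence behavior, can be made concrete with a small sketch. The fog series is invented for illustration (1 = fog, 0 = no fog), and the one-step lead time is an assumption; the point is only that a forecast which copies the previous observation scores well overall while getting every transition wrong.

```python
def confusion_counts(observed, predicted):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical unbalanced series: two fog events in 60 time steps (assumption).
fog = [0] * 20 + [1] * 6 + [0] * 20 + [1] * 6 + [0] * 8

# Persistence with a one-step lead: the forecast equals the previous observation.
observed = fog[1:]
predicted = fog[:-1]

tp, fp, fn, tn = confusion_counts(observed, predicted)
accuracy = (tp + tn) / len(observed)
pod = tp / (tp + fn)  # probability of detection

# Score only the steps where fog forms or dissipates.
transition_steps = [t for t in range(1, len(fog)) if fog[t] != fog[t - 1]]
transitions_correct = sum(
    1 for t in transition_steps if predicted[t - 1] == observed[t - 1]
)

print("accuracy:", round(accuracy, 3), "POD:", round(pod, 3))
print("transitions predicted correctly:", transitions_correct, "of", len(transition_steps))
```

Evaluating formation and dissipation separately, as in the last lines of the sketch, exposes what the aggregate scores hide.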
Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2022): Current training and validation weaknesses in classification-based radiation fog nowcast using machine learning algorithms. Artificial Intelligence for the Earth Systems 1(2), e210006.
- DOI: 10.1175/AIES-D-21-0006.1
- Keywords: | fog forecasting | station data | Machine learning | Model evaluation | Decision Trees | Classification | Nowcasting | XGBoost |
Abstract:
Large inaccuracies still exist in predicting fog formation, dissipation, and duration. To address these deficiencies, machine learning (ML) algorithms are increasingly used in nowcasting in addition to numerical fog forecasts because of their computational speed and their ability to learn the nonlinear interactions between the variables. Although a powerful tool, ML models require precise training and thorough evaluation to prevent misinterpretation of the scores. In addition, a fog dataset’s temporal order and the autocorrelation of the variables must be considered. Therefore, classification-based, ML-related pitfalls in fog forecasting will be demonstrated in this study by using an XGBoost fog forecasting model. By also using two baseline models that simulate guessing and persistence behavior, we have established two independent evaluation thresholds allowing for a more assessable grading of the ML model’s performance. It will be shown that, despite high validation scores, the model could still fail in operational application. If persistence behavior is simulated, commonly used scores are insufficient to measure the performance. That will be demonstrated through a separate analysis of fog formation and dissipation, because these are crucial for a good fog forecast. We also show that commonly used blockwise and leave-many-out cross-validation methods might inflate the validation scores and are therefore less suitable than a temporally ordered expanding window split. The presented approach provides an evaluation score that closely mimics not only the performance on the training and test dataset but also the operational model’s fog forecasting abilities.
Pauli, E.; Andersen, H.; Bendix, J.; Cermak, J. & Egli, S. (2020): Determinants of fog and low stratus occurrence in continental central Europe – a quantitative satellite-based evaluation. Journal of Hydrology 591, 125451.
- DOI: 10.1016/j.jhydrol.2020.125451
- Keywords: | Europe | Fog | Low stratus | Machine learning | Land surface | Atmosphere-land surface interactions |
Abstract:
The formation and development of fog and low stratus clouds (FLS) depend on meteorological and land surface conditions and their interactions with each other. While analyses of temporal and spatial patterns of FLS in Europe exist, the interactions between FLS determinants underlying them have not yet been studied explicitly and quantitatively at a continental scale. In this study, a state-of-the-art machine learning technique is applied to model FLS occurrence over continental Europe, using meteorological and land surface parameters from geostationary satellite and reanalysis data. Spatially explicit model units are created to test for spatial and seasonal differences in model performance and FLS sensitivities to changes in predictors, and effects of different data preprocessing procedures are evaluated. The statistical models show good performance in predicting FLS occurrence during validation, with R2>0.9 especially in winter high pressure situations. The predictive skill of the models seems to be dependent on data availability, data preprocessing, time period, and geographic characteristics. It is shown that atmospheric proxies are more important determinants of FLS presence than surface characteristics; in particular, mean sea level pressure, near-surface wind speed and evapotranspiration are crucial, together with FLS occurrence on the previous day. Higher wind speeds, higher land surface temperatures and higher evapotranspiration tend to be negatively related to FLS. Spatial patterns of feature importance show the dominant influence of mean sea level pressure on FLS occurrence throughout the central European domain. When only high pressure situations are considered, wind speed (in the western study region) and evapotranspiration (in the eastern study region) gain importance, highlighting the influence of moisture advection on FLS occurrence in the western parts of the central European domain. 
This study shows that FLS occurrence can be accurately modeled using machine learning techniques in large spatial domains based on meteorological and land surface predictors. The statistical models used in this study provide a novel analysis tool for investigating empirical relationships in the FLS – land surface system and possibly infer processes.