Laboratory for Climatology and Remote Sensing

Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2023-07-27). Pointwise Machine Learning Based Radiation Fog Nowcast with Station Data in Germany. Presented at 9th International Conference on Fog, Fog Collection, and Dew, Fort Collins, Colorado, USA.

Abstract:

Radiation fog forecasts still carry many uncertainties, and continuous effort is being made to improve them. A supplementary and increasingly popular approach to numerical weather forecasting is forecasting with machine learning (ML) algorithms. While numerical weather forecasting is based on mathematical models built on partial differential equations, ML algorithms take a more heuristic approach. This approach calls for three steps.

The first step is precise data preprocessing. After preprocessing, the dataset must contain the forecast-relevant information in a form the algorithm can learn from. This is not trivial, because it requires a thorough understanding of the fundamental principles underlying radiation fog. Even when the relevant information is contained in the data, it is not always evident, especially in severely unbalanced fog datasets. Simply feeding the algorithm all available data and variables may therefore not be the best strategy. Instead, the data should be prepared so that the algorithm can detect the relevant dynamics.

The second step is splitting the data into training, validation and test datasets. The ability to predict fog stems from the temporally linked process that describes the ongoing change in atmospheric state. To keep the training, validation and test datasets independent, the splitting method must take this temporal link between individual data points into account. Otherwise, the algorithm's apparent forecast accuracy may rest on temporally correlated information shared between data points rather than on learned fog dynamics.

The third step is interpreting the model scores. Taken alone, a forecast score is an abstract number that does not directly allow a statement about the forecast performance of the model. To evaluate model performance, two baselines are relevant: algorithm complexity and dataset complexity. A baseline for algorithm complexity justifies the chosen algorithm and puts the model performance into context. A baseline for dataset complexity also puts the model performance into context and makes different datasets easier to compare.

Following these principles, our current objective is to improve the ML-based fog forecast with XGBoost for a forecasting period of up to four hours at the station in Linden-Leihgestern (Germany). Training and evaluation are based on the Expanding Window Approach (Vorndran et al. 2022), which accounts for the autocorrelation of a fog time series and maintains the temporal order during both training and evaluation. The evaluation uses one score for each of the following categories: overall performance, fog formation, and fog dissipation. The results are set in relation to different baselines to assess both the performance and the dataset complexity. Building on this scheme, newly preprocessed data led to an improvement in the prediction of radiation fog for the station in Linden-Leihgestern. We will present the most recent findings from our research.
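The idea of a split that keeps the temporal order can be illustrated with a minimal sketch. It is not the authors' implementation of the Expanding Window Approach; the function name `expanding_window_splits`, the equal-sized blocks and the `gap` parameter (samples skipped between training and test data to weaken autocorrelation) are assumptions made for illustration only.

```python
from typing import Iterator, Tuple


def expanding_window_splits(
    n_samples: int, n_folds: int, gap: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that respect temporal order.

    The series is cut into n_folds + 1 blocks. Fold k trains on all
    samples before block k and tests on block k, so training data is
    always older than test data. `gap` (an assumed, illustrative
    parameter) drops the first samples of each test block.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    block = n_samples // (n_folds + 1)
    if block <= gap:
        raise ValueError("gap must be smaller than the block length")
    for k in range(1, n_folds + 1):
        train_end = k * block
        test_start = train_end + gap
        # The last fold absorbs any remainder at the end of the series.
        test_end = (k + 1) * block if k < n_folds else n_samples
        yield range(0, train_end), range(test_start, test_end)


splits = list(expanding_window_splits(n_samples=100, n_folds=4, gap=3))
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

In each fold the training window grows while the test block moves forward in time, which mimics an operational setting in which a model only ever sees past data.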

Keywords: | Radiation fog | station data | Machine learning | Nowcasting | XGBoost |

Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2022-09-16). The effect of filtering and preprocessed temporal information on a classification based machine learning model for radiation fog nowcasting. Presented at AK Klima, Würzburg.

Abstract: The current goal of our ...

Keywords: | station data | Machine learning | Nowcasting | XGBoost |

Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2022): Current training and validation weaknesses in classification-based radiation fog nowcast using machine learning algorithms. Artificial Intelligence for the Earth Systems 1(2), e210006.

DOI: 10.1175/AIES-D-21-0006.1
Abstract: Large inaccuracies still...

Keywords: | fog forecasting | station data | Machine learning | Model evaluation | Decision Trees | Classification | Nowcasting | XGBoost |

Vorndran, M.; Schütz, A.; Bendix, J. & Thies, B. (2021-11-05). Training and validation weaknesses in pointwise classification-based radiation fog forecast using machine learning algorithms. Presented at AK Klima, Passau.

Abstract:

Fog forecasting still shows large inaccuracies in predicting fog formation, dissipation and duration. In recent years, machine learning (ML) algorithms have increasingly been used alongside numerical fog forecasts because of their computational speed and their ability to learn non-linear interactions between variables. Due to their black-box nature, careful training and evaluation are vital to prevent insufficient training or meaningless scores. Three points that are important for fog prediction are explained below.

1. Fog forecasting datasets consist of autocorrelated variables. In most cases, information leaks between the training and test datasets that are used to evaluate model performance. This leakage can affect the performance scores, because the stronger the information flow, the easier it is for the model to memorize.
2. Fog forecasting datasets have a temporal order. To allow statements about the performance of an operational model, this temporal order should already be simulated during model training and evaluation, because for an operational model the training data points are always older than the data points to be predicted. Commonly used training methods neglect this fact.
3. Time series used for fog forecasting usually show a large imbalance between the frequencies of the fog class and the non-fog class. This imbalance can interact unfavorably with the confusion-matrix-based meteorological scores that are widely used for evaluation.

If not considered, each of these points can lead to an insufficient forecast without being noticed. Therefore, the negative influence on the model score of two commonly used training methods that neglect these points will be shown using an XGBoost model and a logistic regression model. In comparison, a training and evaluation method was evaluated that maintains the temporal order and thus simulates the performance of an operational model. It will also be shown that common meteorological scores, since they are computed from a confusion matrix, share a weakness when the dataset is unbalanced: persistence behavior remains undetected. The study is funded by the DFG research project “FOG-ML FOrecasting radiation foG by combining station and satellite data using Machine Learning”.
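The weakness of confusion-matrix scores on unbalanced data can be made concrete with a small sketch. The series below is made up for illustration and is not station data, and the choice of POD, FAR and CSI as example scores is an assumption, since the abstract does not name specific scores. A pure persistence forecast (the next step equals the current step) is scored against the observations, and the transitions between fog and no fog are then counted separately.

```python
from typing import List, Tuple


def confusion(obs: List[int], pred: List[int]) -> Tuple[int, int, int, int]:
    """Count (hits, misses, false alarms, correct negatives); fog is class 1."""
    pairs = list(zip(obs, pred))
    hits = sum(1 for o, p in pairs if o == 1 and p == 1)
    misses = sum(1 for o, p in pairs if o == 1 and p == 0)
    false_alarms = sum(1 for o, p in pairs if o == 0 and p == 1)
    correct_neg = sum(1 for o, p in pairs if o == 0 and p == 0)
    return hits, misses, false_alarms, correct_neg


# Made-up, unbalanced toy series: 18 fog steps out of 100 (not real data).
series = [0] * 20 + [1] * 10 + [0] * 30 + [1] * 8 + [0] * 32

# Persistence forecast: the state at step t is predicted to equal step t - 1.
obs = series[1:]
previous = series[:-1]
pred = previous

hits, misses, false_alarms, correct_neg = confusion(obs, pred)
pod = hits / (hits + misses)                 # probability of detection
far = false_alarms / (hits + false_alarms)   # false alarm ratio
csi = hits / (hits + misses + false_alarms)  # critical success index

# Transitions: steps where the observed state differs from the previous one,
# i.e. fog formation or fog dissipation.
transition_steps = [i for i, (o, prev) in enumerate(zip(obs, previous)) if o != prev]
transitions_caught = sum(1 for i in transition_steps if pred[i] == obs[i])

print(f"POD={pod:.2f} FAR={far:.2f} CSI={csi:.2f}")
print(f"transitions caught: {transitions_caught} of {len(transition_steps)}")
```

On this toy series the persistence forecast reaches a CSI of 0.80 although it catches none of the four formation and dissipation events, which is the kind of behavior that an overall score alone does not reveal.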

Keywords: | fog forecasting | station data | Machine learning | Decision Trees | Classification | XGBoost |
