WP-1500.1 and WP-1500.2

The main objective of the two tasks was to enable the EMM infrastructure to carry out impact studies and end-to-end simulations of existing and future space missions for remote sensing of the Earth’s atmosphere. This objective was pursued through two activities: on the one hand, adequate high-performance computing resources were procured; on the other hand, relevant open software modules were installed, characterized and further developed to build a set of tools suitable for these studies and simulations.

Hardware

The hardware now available at the INO computing center of the CNR Research Area in Sesto Fiorentino (FI) includes:

  • Cluster for parallel computing (448 cores at 2.85 GHz, 2048 GB RAM)
  • Server optimized for sequential processing (96 cores at 3.6 GHz, 768 GB RAM)
  • Redundant storage system, 450 TB
  • Rack-internal network at 200 Gb/s and UPS
[Figures: INO computing facility, views 1 and 2]

In addition, for development and testing, CNR-IAC acquired a Dell R940 server with NVMe storage, four Intel Xeon Gold 6252N processors (24 cores each) and 1 TB of RAM.

Software

The open software installed and characterized includes several radiative transfer models, among which are those compared in the following sections.

In addition, within the infrastructure project, we developed tools for computing spectral fluxes (angular integration of spectral radiance) and tools for convolution with the instrument spectral response function (ISRF).

[Figure: simulated resolved radiances]

Comparison of Fast Radiative Transfer Models

The radiative transfer tools installed at the CNR-INO computing facility were characterized in terms of both accuracy and speed. Accuracy was assessed by comparing their spectral radiances or band-specific fluxes with analogous calculations performed with the slow but accurate KLIMA model.

[Figure: radiances generated by RTTOV and KLIMA]

The figures above show example intercomparisons between RTTOV and KLIMA and between σ-IASI/F2N and KLIMA, respectively.

The two codes were also analyzed from an algorithmic point of view; their characteristics are summarized in the following table.

| Characteristic | RTTOV | σ-IASI/F2N |
|---|---|---|
| Development | Large development team and community; already used in meteorological models | Small community and development team, almost entirely Italian |
| Algorithm (clear sky) | Fully parametric | Series expansion |
| Algorithm (cloudy sky) | Fully parametric | Chou and Tang physical models |
| Product | Instrument radiance | High-resolution radiance, and instrument radiance after convolution with the ISRF |
| Number of variable gases | Depends on the predictors; max. 7 for most instruments | Max. 12 |
| Accuracy | For FORUM: about 100% of ARA | For FORUM: about 40% of ARA |
| Insertion in a climatological or meteorological model | Easy | Only as an external executable |
| RT of single spectral channels | Yes | No |
| Numerical optimization | Already performed | Ample margin |
| Further optimization | Using the PC-RTTOV version | Ample margin |

The most important difference is the ability to compute the radiative transfer for single channels, which is essential in data assimilation. RTTOV can do this because the convolution with the ISRF is already built into its precomputed coefficient tables, whereas σ-IASI/F2N must perform the convolution explicitly and therefore needs the high-resolution spectrum over a whole neighborhood of each channel.
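To make the single-channel argument concrete, the sketch below shows what an explicit ISRF convolution involves: one channel radiance is a weighted average of high-resolution radiances over a neighborhood of the channel center. It is a minimal Python sketch under stated assumptions, not σ-IASI/F2N code: the Gaussian ISRF shape, the 0.5 cm⁻¹ FWHM, the truncation at ±4 FWHM, the 0.001 cm⁻¹ grid and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_isrf(dnu, fwhm):
    """Gaussian ISRF (hypothetical shape), evaluated at offsets dnu [cm^-1]."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * (dnu / sigma) ** 2)


def channel_radiance(nu, radiance, nu_center, fwhm=0.5, half_width_fwhm=4.0):
    """Convolve a high-resolution spectrum with the ISRF for one channel.

    Only grid points within +/- half_width_fwhm * fwhm of the channel center
    contribute, but this whole neighborhood of monochromatic radiances must be
    computed even if a single channel is requested.
    Returns (channel radiance, number of monochromatic points used).
    """
    mask = np.abs(nu - nu_center) <= half_width_fwhm * fwhm
    weights = gaussian_isrf(nu[mask] - nu_center, fwhm)
    value = np.sum(weights * radiance[mask]) / np.sum(weights)
    return float(value), int(mask.sum())


# Usage on a hypothetical 0.001 cm^-1 monochromatic grid
nu = np.arange(600.0, 700.0, 0.001)
radiance = 1.0 + 0.01 * (nu - 600.0)   # placeholder spectrum, arbitrary units
value, n_points = channel_radiance(nu, radiance, nu_center=650.0)
print(value, n_points)
```

With this hypothetical grid, a single channel already requires about 4000 monochromatic radiances, which is why a code that folds the ISRF into its coefficient tables can compute isolated channels much more cheaply.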

RTTOV can be further optimized by using the PC-RTTOV version due to Matricardi (https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.680). PC-RTTOV can reconstruct a full IASI spectrum (8461 spectral channels) with a linear operation from a limited number of principal-component scores, which are derived from a subset of channels. The figure below shows the reconstruction error over the whole spectrum for 50, 200 and 300 scores, compared with the IASI random noise.

[Figure: error pertaining to the whole spectral reconstruction]

The computational time of PC-RTTOV using 200 scores is about 1/10 of that of the full RTTOV radiative transfer.
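The principle behind score-based reconstruction can be illustrated with a minimal Python sketch, assuming a synthetic training set in place of real IASI spectra: a principal-component basis is computed from training spectra, a spectrum is reduced to a few scores, and it is rebuilt with a single linear operation. This is not PC-RTTOV itself; in PC-RTTOV the scores are obtained from a subset of channels, a step omitted here, and all names and sizes are hypothetical.

```python
import numpy as np


def fit_pc_basis(training_spectra, n_scores):
    """Mean spectrum and leading eigenvectors (rows) of a training set (rows = spectra)."""
    mean = training_spectra.mean(axis=0)
    # SVD of the centered training matrix: rows of vt are the PCA eigenvectors
    _, _, vt = np.linalg.svd(training_spectra - mean, full_matrices=False)
    return mean, vt[:n_scores]


def to_scores(spectrum, mean, eigvecs):
    """Project a spectrum onto the retained eigenvectors (its PC scores)."""
    return eigvecs @ (spectrum - mean)


def reconstruct(scores, mean, eigvecs):
    """Rebuild the full spectrum with one linear operation."""
    return mean + scores @ eigvecs


# Hypothetical synthetic spectra: 500 channels driven by 5 factors plus noise
rng = np.random.default_rng(0)
n_channels, n_factors, noise = 500, 5, 0.01
patterns = rng.normal(size=(n_factors, n_channels))


def synthetic_spectra(n):
    return rng.normal(size=(n, n_factors)) @ patterns + noise * rng.normal(size=(n, n_channels))


training = synthetic_spectra(200)
test_spectrum = synthetic_spectra(1)[0]

rms_error = {}
for k in (2, 5, 20):
    mean, eigvecs = fit_pc_basis(training, k)
    rebuilt = reconstruct(to_scores(test_spectrum, mean, eigvecs), mean, eigvecs)
    rms_error[k] = float(np.sqrt(np.mean((rebuilt - test_spectrum) ** 2)))
print(rms_error)
```

Once enough scores are retained to capture the underlying variability, the reconstruction error drops to the level of the random noise, which is the comparison made in the figure for real IASI spectra.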

Outgoing long-wave fluxes

The spectral radiances produced by KLIMA and σ-IASI/F2N were also integrated over angle (with a newly developed tool) to obtain the downwelling and upwelling energy fluxes in specific spectral bands. These fluxes were then compared with those computed by the RRTM and ecRad codes.
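The angular integration step can be sketched as follows, assuming azimuthal symmetry so that the hemispheric flux is F = 2π ∫₀¹ I(μ) μ dμ, with μ the cosine of the zenith angle, evaluated by Gauss-Legendre quadrature; the band flux is then a trapezoidal integral over wavenumber. The quadrature, the number of angles and the function names are assumptions for illustration; the tool developed in the project may use a different scheme.

```python
import numpy as np


def hemispheric_flux(radiance_of_mu, n_angles=4):
    """Hemispheric flux F = 2*pi * integral_0^1 I(mu) mu dmu (azimuthal symmetry assumed).

    radiance_of_mu(mu) must return the spectral radiance array for one value of
    mu = cos(zenith angle); the result has the same shape (flux per wavenumber).
    """
    x, w = np.polynomial.legendre.leggauss(n_angles)  # nodes and weights on [-1, 1]
    mu = 0.5 * (x + 1.0)                               # map nodes to [0, 1]
    w_mu = 0.5 * w                                     # rescale weights to [0, 1]
    radiances = np.array([radiance_of_mu(m) for m in mu])
    return 2.0 * np.pi * np.sum((w_mu * mu)[:, None] * radiances, axis=0)


def band_flux(wavenumber, spectral_flux, band):
    """Trapezoidal integral of a spectral flux over a band (lo, hi) in wavenumber."""
    lo, hi = band
    mask = (wavenumber >= lo) & (wavenumber <= hi)
    nu, f = wavenumber[mask], spectral_flux[mask]
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(nu)))


# Usage with an isotropic placeholder radiance on a coarse hypothetical grid
wavenumber = np.linspace(600.0, 700.0, 101)
spectral_flux = hemispheric_flux(lambda mu: np.ones_like(wavenumber))
print(spectral_flux[0], band_flux(wavenumber, spectral_flux, (600.0, 700.0)))
```

For an isotropic radiance I the sketch returns F = πI, a convenient sanity check for any angular integration tool.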

An example of the intercomparison between KLIMA- and RRTM-derived fluxes is shown below. While several orders of magnitude faster than KLIMA, RRTM provides total long-wave fluxes with an accuracy of the order of 1-2 W/m².

[Figure: UP-FLUX errors]

Speed intercomparisons

The following table summarizes the computing times of the radiative transfer codes that produce spectrally resolved radiances (KLIMA, σ-IASI/F2N and RTTOV).

| RTM code | Elapsed time | CPU time |
|---|---|---|
| KLIMA | ~1200 h | 43.2 × 10⁵ s |
| σ-IASI/F2N | 104 s | 45 s |
| RTTOV | 8.6 s | 7.6 s |
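The ratios between the CPU times in the table give the relative speed of the codes. The short Python calculation below only reuses the tabulated values, assuming the three timings refer to the same simulation case (the table does not state this explicitly), and also checks that ~1200 h corresponds to 43.2 × 10⁵ s.

```python
# CPU times taken from the speed-intercomparison table, in seconds
cpu_time_s = {"KLIMA": 43.2e5, "sigma-IASI/F2N": 45.0, "RTTOV": 7.6}

# ~1200 h of elapsed time for KLIMA, converted to seconds for comparison
klima_elapsed_s = 1200 * 3600


def speedup(slow, fast):
    """How many times less CPU time `fast` needs than `slow`."""
    return cpu_time_s[slow] / cpu_time_s[fast]


print(f"KLIMA elapsed: {klima_elapsed_s:.3g} s, CPU: {cpu_time_s['KLIMA']:.3g} s")
for slow, fast in [("KLIMA", "sigma-IASI/F2N"), ("KLIMA", "RTTOV"), ("sigma-IASI/F2N", "RTTOV")]:
    print(f"{slow} / {fast}: {speedup(slow, fast):.2g}")
```

In CPU time, KLIMA turns out to be roughly 10⁵ times slower than σ-IASI/F2N and about 6 × 10⁵ times slower than RTTOV, while RTTOV is about six times faster than σ-IASI/F2N.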

RRTM and ecRad compute only band-integrated and total fluxes and are usually embedded in global models, which call them billions of times within a single run. Designed for this use, these codes are far faster than both RTTOV and σ-IASI/F2N, although less accurate.

Applications and know-how

Using the characterized radiative transfer models described earlier, we can build the end-to-end (E2E) chain (from data acquisition to Level 2 geophysical products) for simulating passive atmospheric remote-sensing missions. E2E simulation makes it possible to characterize product quality as a function of the measurement characteristics, which makes it extremely useful for defining requirements in the early phases of a new mission.
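As a minimal illustration of how an E2E simulation links measurement characteristics to product quality, the Python sketch below runs a toy linear chain: true states drawn from a prior, a forward model, instrument noise, and an optimal-estimation retrieval, returning the RMS retrieval error. The linear forward model, the random Jacobian, the diagonal covariances and all names are hypothetical; a real chain would use one of the radiative transfer codes above and a full retrieval scheme.

```python
import numpy as np


def simulate_e2e(jacobian, noise_std, prior_std, n_trials=2000, seed=0):
    """Toy linear E2E chain returning the RMS retrieval error.

    true state (from prior) -> forward model y = K x -> add instrument noise
    -> optimal-estimation retrieval with zero prior mean.
    """
    rng = np.random.default_rng(seed)
    n_meas, n_state = jacobian.shape
    s_a_inv = np.eye(n_state) / prior_std**2       # inverse prior covariance
    s_e_inv = np.eye(n_meas) / noise_std**2        # inverse noise covariance
    # Retrieval gain G = (K^T Se^-1 K + Sa^-1)^-1 K^T Se^-1
    gain = np.linalg.solve(jacobian.T @ s_e_inv @ jacobian + s_a_inv, jacobian.T @ s_e_inv)
    x_true = rng.normal(0.0, prior_std, size=(n_trials, n_state))
    y = x_true @ jacobian.T + rng.normal(0.0, noise_std, size=(n_trials, n_meas))
    x_ret = y @ gain.T
    return float(np.sqrt(np.mean((x_ret - x_true) ** 2)))


# Hypothetical instrument: 50 channels sensing a 10-element state
jacobian = np.random.default_rng(1).normal(size=(50, 10))
rms_by_noise = {s: simulate_e2e(jacobian, s, prior_std=1.0) for s in (0.1, 0.5, 2.0)}
print(rms_by_noise)
```

Repeating the run for several noise levels shows how the retrieval error grows with instrument noise, which is the kind of trade-off used to set mission requirements.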

Conversely, given the accuracy, precision and geometrical specifications of a future or already operating mission, we can assess the information content of its measurements and the possibility of deriving new geophysical products.
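One standard way to quantify the information contained in the measurements is Rodgers' optimal-estimation framework, using the degrees of freedom for signal (DFS, the trace of the averaging kernel) and the Shannon information content. The Python sketch below assumes a linear or linearized forward model with Jacobian K and Gaussian noise and prior covariances S_e and S_a; it is one possible approach, not necessarily the one adopted in the project, and the function name is hypothetical.

```python
import numpy as np


def information_content(jacobian, noise_cov, prior_cov):
    """Degrees of freedom for signal and Shannon information content (bits).

    Linear(ized) optimal estimation: posterior covariance
    S = (K^T Se^-1 K + Sa^-1)^-1, averaging kernel A = S K^T Se^-1 K,
    DFS = trace(A), H = 0.5 * log2(det(Sa) / det(S)).
    """
    s_e_inv = np.linalg.inv(noise_cov)
    fisher = jacobian.T @ s_e_inv @ jacobian
    posterior_cov = np.linalg.inv(fisher + np.linalg.inv(prior_cov))
    dfs = float(np.trace(posterior_cov @ fisher))
    _, logdet_prior = np.linalg.slogdet(prior_cov)
    _, logdet_post = np.linalg.slogdet(posterior_cov)
    shannon_bits = 0.5 * (logdet_prior - logdet_post) / np.log(2.0)
    return dfs, float(shannon_bits)


dfs, bits = information_content(np.eye(3), np.eye(3), np.eye(3))
print(dfs, bits)
```

With K = I and unit noise and prior covariances, each of the three state elements is constrained half by the measurement and half by the prior, giving DFS = 1.5 and 1.5 bits of information.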

Artificial Intelligence methods

Due to the large volume of data expected from next-generation sensors, new machine learning (ML) approaches have been proposed. Although these approaches must rely on physical models during the training phase, the algorithms themselves, typically based on architectures such as neural networks, learn a functional relationship directly from previously generated results of the same or similar problems.
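The training idea can be illustrated with a minimal emulator sketch in Python: a cheap analytic function stands in for the physical model, is sampled to build a training set, and a one-hidden-layer neural network is fitted to it by gradient descent. The stand-in model, network size, learning rate and number of epochs are hypothetical choices; operational emulators use far larger inputs and outputs and dedicated ML frameworks.

```python
import numpy as np


def toy_physics_model(x):
    """Cheap stand-in (hypothetical) for an expensive physical model."""
    return np.sin(3.0 * x) + 0.5 * x**2


def train_mlp(x, y, n_hidden=32, lr=0.02, n_epochs=5000, seed=0):
    """Fit a one-hidden-layer tanh network to (x, y) by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 1.0, size=(x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(n_epochs):
        h = np.tanh(x @ w1 + b1)            # hidden activations
        err = h @ w2 + b2 - y               # prediction minus target
        losses.append(float(np.mean(err**2)))
        # Backpropagation of the mean-squared error
        g_out = 2.0 * err / err.size
        g_h = (g_out @ w2.T) * (1.0 - h**2)
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1, w2, b2), losses


def predict(params, x):
    """Evaluate the trained emulator."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


# Training set generated by the (stand-in) physical model
x_train = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y_train = toy_physics_model(x_train)
params, losses = train_mlp(x_train, y_train)
print(losses[0], losses[-1])
```

Once trained, evaluating predict() costs a few matrix products, independent of how expensive the original physical model was.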

In principle, ML methods can replace any full-physics procedure. However, in the field of remote sensing, research has mainly focused on three application areas:

  • Scene classification, such as distinguishing between clear and cloudy conditions, or estimating indices related to scene homogeneity and cloudiness.
  • The forward model, i.e. predicting a spectrum from a given atmospheric state (radiative transfer, RT).
  • The inverse problem, i.e. retrieving the atmospheric state from measurements (retrieval).

Scene classification is important because the presence of clouds requires the use of radiative transfer models that account for multiple scattering. Modern instruments are often complemented by ancillary sensors for analyzing the field of view, and some instruments are specifically designed for cloud detection and characterization.

A major challenge for ML approaches applied to the RT problem is the curse of dimensionality. An atmospheric state may be described by hundreds of parameters, while the corresponding spectral radiance can consist of thousands of channels. In full-physics models, correlations between spectral channels are inherently represented. In contrast, ML models must learn these correlations from the training data set, which is a demanding task.

For the retrieval problem, the main difficulty lies in the ill-posed nature of the inversion. Full-physics retrieval methods address this issue through regularization techniques, such as reducing dimensionality using Principal Component Analysis (PCA), or constraining solutions towards a climatological prior. ML approaches must incorporate similar strategies, either by learning appropriate regularization from the training dataset or by projecting both atmospheric states and radiances into lower-dimensional spaces (latent spaces) and learning relationships between these representations (e.g. latent twin approaches).
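A linear analogue of the latent-space strategy can be sketched in Python: PCA provides the encoders for both atmospheric states and radiances, and a ridge regression maps radiance latents to state latents. This is only a stand-in for the neural encoders used in latent twin approaches; the synthetic data, dimensions and function names are assumptions.

```python
import numpy as np


def pca_fit(data, n_components):
    """Mean and leading principal axes (rows) of a data matrix (rows = samples)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(data, mean, axes):
    return (data - mean) @ axes.T      # project onto the latent space


def decode(latent, mean, axes):
    return mean + latent @ axes        # map latent coordinates back


def fit_latent_map(z_rad, z_state, ridge=1e-6):
    """Ridge-regularized least-squares map from radiance latents to state latents."""
    a = z_rad.T @ z_rad + ridge * np.eye(z_rad.shape[1])
    return np.linalg.solve(a, z_rad.T @ z_state)


# Synthetic training set (hypothetical): 40-level states driven by 3 factors,
# radiances from a random linear forward model plus noise
rng = np.random.default_rng(0)
n_levels, n_channels, n_factors = 40, 300, 3
state_modes = rng.normal(size=(n_factors, n_levels))
forward = rng.normal(size=(n_channels, n_levels))


def make_pairs(n):
    states = rng.normal(size=(n, n_factors)) @ state_modes
    radiances = states @ forward.T + 0.01 * rng.normal(size=(n, n_channels))
    return states, radiances


x_train, y_train = make_pairs(500)
x_test, y_test = make_pairs(100)

state_mean, state_axes = pca_fit(x_train, n_factors)
rad_mean, rad_axes = pca_fit(y_train, n_factors)
latent_map = fit_latent_map(encode(y_train, rad_mean, rad_axes),
                            encode(x_train, state_mean, state_axes))

# Retrieval: radiance -> radiance latent -> state latent -> state
x_ret = decode(encode(y_test, rad_mean, rad_axes) @ latent_map, state_mean, state_axes)
rms_error = float(np.sqrt(np.mean((x_ret - x_test) ** 2)))
print(rms_error, x_test.std())
```

Because the synthetic states are driven by only three factors, three latent dimensions suffice here; with real data the latent sizes are tuning parameters, and the regularization (here the ridge term) helps control the ill-posedness.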

Although ML methods offer very fast inference once trained, their retrieval accuracy still generally falls short of that achieved by full-physics approaches. Ongoing research aims to improve their precision, potentially making them a viable alternative to more computationally expensive methods.

WP-1500.3 – Assimilation of future observation data into weather models

Scientific and infrastructural objectives:

New observations, particularly in spectral bands that have yet to be fully explored, enable us to characterise the atmosphere more accurately from a physical perspective and to study its dynamics and interactions within a complex and evolving climate system. The assimilation of these measurements into weather models is essential for reducing uncertainties in the initial conditions, which strongly influence the accuracy of forecasts. This is particularly relevant in a context of increasing climate variability, where extreme events are becoming increasingly frequent and difficult to predict.
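The reduction of initial-condition uncertainty can be seen in the simplest possible analysis step, a scalar best linear unbiased estimate (equivalently a one-variable Kalman filter update) combining a model background x_b with error variance σ_b² and an observation y with error variance σ_o²: x_a = x_b + K (y - x_b), with K = σ_b² / (σ_b² + σ_o²) and σ_a² = (1 - K) σ_b². The Python sketch assumes unbiased, mutually uncorrelated Gaussian errors and an observation of the model variable itself; NWP systems apply the same principle in very high dimension.

```python
def analysis_update(x_b, var_b, y, var_o):
    """Scalar BLUE / Kalman analysis of a background x_b and an observation y.

    Assumes unbiased, mutually uncorrelated Gaussian errors with variances
    var_b (background) and var_o (observation).
    """
    gain = var_b / (var_b + var_o)
    x_a = x_b + gain * (y - x_b)
    var_a = (1.0 - gain) * var_b
    return x_a, var_a


x_a, var_a = analysis_update(x_b=10.0, var_b=4.0, y=12.0, var_o=4.0)
print(x_a, var_a)
```

With equal background and observation variances of 4, the gain is 0.5 and the analysis error variance halves to 2: the analysis is always more certain than either source of information alone.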

The module provides advanced procedures and methodologies through the implementation of a High-Performance Computing (HPC) infrastructure designed to utilise new observational data, primarily from satellites, within Numerical Weather Prediction (NWP) models.

Position within the EMN component:

This module forms part of the infrastructure component, aimed at developing and implementing High-Performance Computing (HPC) capabilities to support atmospheric simulation for meteorological and climatic purposes. It falls within the scope of activities involving the integration of observations and numerical modelling, with particular reference to the assimilation of next-generation satellite data into numerical forecast models. The module enables OSSE (Observing System Simulation Experiment) testing for new observational platforms and supports the development and optimisation of Data Assimilation (DA) methodologies, for the efficient exploitation of large volumes of observational data and their integration into models. It provides operational capabilities for the management, processing and analysis of complex datasets, supporting high-resolution simulation cycles and the development of more accurate forecasting systems. In this context, the module also integrates advanced Artificial Intelligence (AI) techniques, aimed at improving assimilation processes and reducing computational costs.
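The logic of an OSSE can be illustrated on a toy one-dimensional problem: a known "nature run" provides the truth, synthetic observations are simulated from it, and analyses with and without a hypothetical new observing system are compared against the truth. The Python sketch below assumes a BLUE/3D-Var analysis with an exponential background-error covariance; grid size, correlation length, observation spacing, noise and function names are illustrative values, not settings of the EMN system.

```python
import numpy as np


def exp_covariance(n, variance, length):
    """Background-error covariance with exponential correlation on a 1-D grid."""
    idx = np.arange(n)
    return variance * np.exp(-np.abs(idx[:, None] - idx[None, :]) / length)


def analysis(x_b, y, h, b, r):
    """BLUE / 3D-Var analysis: x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b)."""
    innovation_cov = h @ b @ h.T + r
    return x_b + b @ h.T @ np.linalg.solve(innovation_cov, y - h @ x_b)


def osse_rmse(obs_idx, n=100, n_realizations=500, obs_std=0.5, seed=0):
    """Analysis RMSE against the nature-run truth for a given observing network."""
    rng = np.random.default_rng(seed)
    b = exp_covariance(n, 1.0, 5.0)
    chol = np.linalg.cholesky(b)                       # to sample background errors
    h = np.eye(n)[obs_idx]                             # observation operator: pick grid points
    r = obs_std**2 * np.eye(len(obs_idx))
    truth = np.sin(np.linspace(0.0, 4.0 * np.pi, n))   # hypothetical nature run
    mse = 0.0
    for _ in range(n_realizations):
        x_b = truth + chol @ rng.normal(size=n)                     # background with correlated error
        y = h @ truth + obs_std * rng.normal(size=len(obs_idx))     # simulated observations
        mse += np.mean((analysis(x_b, y, h, b, r) - truth) ** 2)
    return float(np.sqrt(mse / n_realizations))


control_idx = np.arange(0, 100, 10)                             # existing network
extended_idx = np.union1d(control_idx, np.arange(5, 100, 10))   # plus hypothetical new platform
rmse_control = osse_rmse(control_idx)
rmse_extended = osse_rmse(extended_idx)
print(rmse_control, rmse_extended)
```

Adding the hypothetical new observations between the existing ones lowers the analysis RMSE relative to the control network (the background RMSE is about 1 by construction), which is the type of impact an OSSE is designed to quantify before the real platform exists.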

Contribution of the module to the EMN infrastructure:

The module supports integration with other EMN modelling components, providing advanced data assimilation capabilities, particularly for studies on the impact of future observations, especially satellite data, on forecast models. It also supports the application of machine learning and Artificial Intelligence (AI) techniques for weather and climate modelling. Through the use of high-performance computing infrastructure, it enables local or global simulations across multiple spatial and temporal scales. It therefore represents a strategic hub for the integration of observations and advanced modelling, contributing to the improvement of weather forecasting systems, with significant implications for the understanding and prediction of environmental changes.

WP-1500.4

Development and implementation of an ensemble data assimilation system for limited area models