# (COMPLETED) Analyzing the Impacts of Non-Gaussian Errors in Gaussian Data Assimilation Systems

#### Project Period: September 1, 2012 to August 31, 2016

Principal Investigator(s): Steve Fletcher

Co-Principal Investigator(s): Andrew Jones (CSU/CIRA)

Graduate Students, Postdoctoral and Other Investigators: John Forsythe (CSU/CIRA) and Anton Kliewer

Sponsor(s): National Science Foundation (NSF)

This NSF-funded project investigated the impacts of non-Gaussian errors in a 1-dimensional variational (1DVAR) temperature and humidity retrieval system called the CIRA 1-Dimensional Optimal Estimator, or C1DOE for short. As well as testing a mixed lognormal-Gaussian formulation of the 1DVAR cost function against the standard Gaussian-fits-all formulation and the logarithmic transform formulation, the PI and Co-PI developed the theory needed to extend the mixed distribution approach from the full-field 3DVAR and 4DVAR formulations to incremental versions through the introduction of a geometric tangent linear approach. Given all these new formulations of non-Gaussian systems, we needed to address the question: how can we know in advance that a lognormal-based data assimilation system is required? To answer it, we developed a set of statistical tests, all of which had to be satisfied at a high confidence level, to ensure that we 1) did not make a false positive statement and 2) overcame the auto-correlation in the data sample.

Another important aspect of the work on this project was determining a new set of quality control (QC) measures so that observations with lognormally distributed errors would not be rejected by a Gaussian-based QC measure. We developed a new linearization that allowed the logarithm and the observation operator to be interchanged, so that the theory from the original buddy check system carried through to the lognormal case.

The final area of research undertaken on this grant came about through a peculiarity in the performance of the ideal case for the lognormal retrieval system. The first feature discovered about the lognormal-based 1DVAR in the one-variable case is that the three descriptive statistics, the mode (maximum likelihood state), the median (unbiased state) and the mean (minimum variance state), had *regions of optimality*. The region of optimality for each descriptive statistic depended on the values of the parameters in the retrieval, i.e. the background state, the background error covariance and the observational error covariance, as well as the measurement and representative errors. We found that if there were no observational errors, but a fixed error covariance and a fixed background error covariance, then there exist *optimal regions for the values of the background state* such that each descriptive statistic is optimal.
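To make the three descriptive statistics concrete, the following is a minimal sketch of how the mode, median and mean of a univariate lognormal distribution differ. It uses only the standard closed-form expressions for a lognormal variable; the function name `lognormal_statistics` and the parameters `mu` and `sigma` (the mean and standard deviation of the logarithm of the variable) are illustrative assumptions and are not taken from C1DOE or the project's code.

```python
import math


def lognormal_statistics(mu, sigma):
    """Return (mode, median, mean) of a lognormal distribution.

    mu and sigma are the mean and standard deviation of ln(x).
    These are hypothetical parameter names chosen for illustration.
    """
    mode = math.exp(mu - sigma ** 2)          # maximum likelihood state
    median = math.exp(mu)                     # unbiased state
    mean = math.exp(mu + 0.5 * sigma ** 2)    # minimum variance state
    return mode, median, mean


# Illustrative values only; not taken from the project.
mode, median, mean = lognormal_statistics(mu=0.0, sigma=0.5)
print(f"mode={mode:.4f} median={median:.4f} mean={mean:.4f}")
```

For any `sigma` greater than zero the three values are distinct and ordered mode < median < mean, which is why the choice of descriptive statistic matters in a lognormal retrieval while it does not in a Gaussian one, where all three coincide.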

For more information, please check the project summary PDF.