HadISD: interesting features of sub-daily climate data: Bug in the Buddy Checks

We have recently noticed that the checks using neighbouring stations in HadISD are not running as intended, and are setting no flags at all (see Fig. 1 and also e.g. v331_202309p_Buddy_check). It appears this has been the case since v202_2017p in 2018! Although the initial releases of version 2 did include buddy checks, adaptations to run on a new job management system introduced a bug where the data read in for each buddy station were identical to those of the target station being assessed. Unfortunately, we have only just picked this up.
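To illustrate why this bug sets no flags at all, here is a minimal sketch of a neighbour comparison. It is not the HadISD implementation: the function name `buddy_check`, the median-difference test, the threshold and the sample values are all assumptions made for illustration. The point is only that when every "neighbour" is a copy of the target, all differences are zero and nothing can exceed the threshold.

```python
from statistics import median


def buddy_check(target, neighbours, threshold=5.0, min_neighbours=3):
    """Flag target values that differ strongly from the neighbouring stations.

    Hypothetical, simplified test: an observation is flagged when the median
    absolute difference to the neighbours exceeds `threshold`.
    """
    if len(neighbours) < min_neighbours:
        # Too few neighbours to make a judgement: set no flags.
        return [False] * len(target)

    flags = []
    for i, value in enumerate(target):
        diffs = [abs(value - series[i]) for series in neighbours]
        flags.append(median(diffs) > threshold)
    return flags


# Invented temperatures; the third target value is a spike.
target = [10.0, 11.0, 35.0, 12.0]
neighbours = [
    [10.5, 11.2, 11.8, 12.1],
    [9.8, 10.9, 11.5, 11.9],
    [10.2, 11.4, 12.0, 12.3],
]

# Intended behaviour: the spike is flagged.
intended = buddy_check(target, neighbours)

# The bug: each "neighbour" read in is actually the target station itself.
copies = [list(target) for _ in neighbours]
buggy = buddy_check(target, copies)

print(intended)  # [False, False, True, False]
print(buggy)     # [False, False, False, False]
```

Because the differences are identically zero under the bug, no choice of threshold would rescue the test; only reading in the correct neighbour data does.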

This error affects the temperature, dew point and sea-level pressure variables, which use the buddy check to identify further spurious values. In Fig. 1 we show differences between v201_2016f and v202_2017f (chosen to keep changes to station counts to a minimum), which clearly demonstrate the effect of this error. Although the majority of stations would only have had a few observations (<0.1% of the total in their record) flagged by this test, the effect is pervasive across all continents.

https://www.metoffice.gov.uk/hadobs/hadisd/v201_2016f/images/All_fails_TOT_20170330.png

https://www.metoffice.gov.uk/hadobs/hadisd/v202_2017f/images/All_fails_TOT_20180314.png

Fig. 1: Flagging rates for temperature neighbour check, Top - v201_2016f, Bottom - v202_2017f

The neighbours are also used to help unset some flags that were (tentatively) set by earlier checks. If there are insufficient neighbours, no unsetting occurs. However, where there are enough neighbours, these contain data identical to the target station, so unflagging occurs because the observations from the neighbours appear to be a sufficiently good match.
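The erroneous unflagging can be sketched in the same spirit. Again, this is an illustrative assumption rather than the HadISD code: the function name `unset_flags`, the tolerance, the majority rule and the sample values are all hypothetical. It shows the three cases described above: too few neighbours (nothing changes), genuine neighbours (the flag survives) and neighbours that are copies of the target (the flag is wrongly unset).

```python
def unset_flags(target, flags, neighbours, tolerance=2.0, min_neighbours=3):
    """Unset tentative flags where most neighbours agree with the target.

    Hypothetical rule: a flag is removed when more than half of the
    neighbours lie within `tolerance` of the target value.
    """
    if len(neighbours) < min_neighbours:
        # Insufficient neighbours: no unsetting occurs.
        return list(flags)

    result = []
    for i, flagged in enumerate(flags):
        if flagged:
            agreeing = sum(
                1 for series in neighbours
                if abs(target[i] - series[i]) <= tolerance
            )
            if 2 * agreeing > len(neighbours):
                flagged = False  # neighbours support the value, so unset
        result.append(flagged)
    return result


# Invented data: an earlier check has tentatively flagged the spike.
target = [10.0, 11.0, 35.0, 12.0]
earlier_flags = [False, False, True, False]
neighbours = [
    [10.5, 11.2, 11.8, 12.1],
    [9.8, 10.9, 11.5, 11.9],
    [10.2, 11.4, 12.0, 12.3],
]

kept = unset_flags(target, earlier_flags, neighbours)

# The bug: neighbours identical to the target always "agree" with it.
copies = [list(target) for _ in neighbours]
wrongly_unset = unset_flags(target, earlier_flags, copies)

too_few = unset_flags(target, earlier_flags, copies[:2])

print(kept)           # [False, False, True, False]
print(wrongly_unset)  # [False, False, False, False]
print(too_few)        # [False, False, True, False]
```

This is why stations with enough neighbours lose flags under the bug, while stations with too few neighbours keep the flags set by the earlier checks.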

This affects the climatological (temperature & dew point), distributional gap (temperature, dew point & SLP), odd cluster (temperature, dew point & SLP but not wind speed) and dew point depression checks. The greatest reductions in the numbers of observations flagged are in the odd cluster and dew point depression checks (see Figs. 2 & 3), with lesser impacts in the climatological check and minor ones in the gap check.

https://www.metoffice.gov.uk/hadobs/hadisd/v201_2016f/images/All_fails_OCT_20170330.png

https://www.metoffice.gov.uk/hadobs/hadisd/v202_2017f/images/All_fails_OCT_20180314.png

Fig. 2: Flagging rates for temperature odd cluster check, Top - v201_2016f, Bottom - v202_2017f

https://www.metoffice.gov.uk/hadobs/hadisd/v201_2016f/images/All_fails_DPD_20170330.png

https://www.metoffice.gov.uk/hadobs/hadisd/v202_2017f/images/All_fails_DPD_20180314.png

Fig. 3: Flagging rates for dew point depression check, Top - v201_2016f, Bottom - v202_2017f

In terms of the impact on the dataset as a whole, the absence of the buddy checks along with the additional erroneous unflagging means that the data are not as clean and quality controlled as we had hoped (and have been stating). We extend heartfelt apologies to all users.

However, the only impact on the data is that some erroneous values which should be flagged are not. Although the set of automated QC tests applied to HadISD would never have been a perfect system, we're sorry that it has not been running as effectively as intended for the last few years. The QC suite was designed so that individual observations can be flagged by many different tests. Therefore, although some tests are not working as we had intended, in many cases erroneous observations will still be flagged by other tests. The overall flagging rates across all tests are very similar (Fig. 4), but depending on the application, those values which are currently retained in error may be important.
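A small sketch shows how overlapping tests limit the damage. The test names, the flag values and the helper `combine_flags` are invented for illustration and do not reflect how HadISD stores its flags; the sketch assumes an observation counts as flagged if any test flags it.

```python
def combine_flags(flags_by_test):
    """Return one flag per observation: True if any test flagged it."""
    series = list(flags_by_test.values())
    n_obs = len(series[0])
    return [any(flags[i] for flags in series) for i in range(n_obs)]


# Invented flags for ten observations from three hypothetical tests.
with_buddy = {
    "climatological": [False, False, True, False, False, False, False, False, False, False],
    "odd_cluster":    [False, False, True, False, False, True, False, False, False, False],
    "buddy":          [False, False, False, False, False, True, False, True, False, False],
}

# With the bug, the buddy check sets no flags at all.
without_buddy = dict(with_buddy, buddy=[False] * 10)

flagged_with = sum(combine_flags(with_buddy))
flagged_without = sum(combine_flags(without_buddy))

print(flagged_with)     # 3
print(flagged_without)  # 2
```

In this made-up case two of the three bad observations are still caught by other tests, but the one that only the buddy check would have found is retained.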

https://www.metoffice.gov.uk/hadobs/hadisd/v201_2016f/images/All_fails_ALL_Td_20170330.png

https://www.metoffice.gov.uk/hadobs/hadisd/v202_2017f/images/All_fails_ALL_Td_20180314.png

Fig. 4: Flagging rates for all dew point checks combined, Top - v201_2016f, Bottom - v202_2017f

As the dataset has been run with this error for a number of years (since 2018), we have decided for now to continue updates as they have been, i.e. without the buddy checks running, for consistency with previous releases. The pause to HadISD updates in early 2024 (see separate post) is a further reason for this approach.

Next steps

Given the issues with the buddy check described here and the forthcoming pause to HadISD updates, we intend to release two versions in early 2024:

v331_202312p, which follows on from the other versions, with the buddy checks not being applied
v340_2023f, where we will reinstate the buddy checks.

We hope this will give clarity and consistency to users of HadISD, and also enable us to investigate further how including the buddy checks (and the corrected unflagging steps) affects the data. Users can also ensure they pick a dataset version which is consistent with any other approaches they have used.


Wednesday, 11 October 2023
