We have just run the automated quality control (QC) for the latest monthly update, which includes data from June 2021. I'm sure you are all aware of the severe heat wave which affected the western part of North America in the last part of that month. British Columbia (Canada), Oregon and Washington (USA) experienced a number of days of exceptionally high temperatures, well over 40C in some cases.
Records measured by stations did not just fall, they were smashed, with new values set that were up to 5C higher than the previous records. A report by the World Weather Attribution project indicates that this event would have been virtually impossible without human-induced climate change. Given that this event was so much warmer than anything experienced in this region in the past, we decided to check how the automated QC handled these exceptional values.
Any QC procedure will always result in retaining some bad values (false negatives), and also erroneously removing some good ones (false positives). It is important, however, to minimise these as far as possible, and do "least harm". To this end, observations that have been flagged by the QC are removed from the main data stream, but are available in a separate data field in the netCDF files, should any user wish to re-insert them into the time series.
The town of Lytton (BC) recorded the highest temperatures during this event, but that station does not form part of HadISD. We looked for nearby stations, and show the temperatures in Figure 1 for Agassiz, which is south of Lytton and further down the Fraser River, towards Vancouver.
Figure 1. Temperature timeseries for Agassiz (BC, 711130-99999) for (a) all of 2021 and (b) the latter half of June 2021. Observations are in black with any flagged by the climatological QC test in red.
As you can clearly see, the highest values at the peak of the heatwave on the 27-29th June have all been removed, in this case by the Climatological Outlier check. At some level, this is unsurprising, given that the temperatures experienced surpassed anything in the previous record. Climatologically speaking they are exceptional values, and so without any other information to go on, could be dubious.
The Current QC
Of course we know that these are likely to be valid observations, and as HadISD has been designed to retain true extremes, some adjustments to the QC algorithms are necessary. Firstly, let's have a look at how the current test is identifying and flagging these values. For full details see the HadISD paper (Dunn et al, 2012).
The Climatological Outlier check works on a monthly basis, and calculates climatological values for each hour of the day for each month using the winsorized observations (Winsorizing is a process where all values exceeding a certain threshold [5% & 95% in this case] are replaced by these threshold values). Using these 24 climatological values, anomalies are calculated, and then scaled using their inter-quartile range.
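As an illustration of the steps just described, here is a minimal Python sketch, not the HadISD code itself. The function names (`winsorize`, `scaled_anomalies`) are hypothetical, and we assume for simplicity that the inter-quartile range is taken over all of the month's anomalies together.

```python
import numpy as np


def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Replace values beyond the given percentiles with the percentile values."""
    low, high = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, low, high)


def scaled_anomalies(temps, hours):
    """Scaled anomalies for one calendar month of observations (all years).

    temps : temperatures for a single calendar month across all years
    hours : hour of day (0-23) for each observation
    Assumption: the IQR is computed over all anomalies for the month.
    """
    temps = np.asarray(temps, dtype=float)
    hours = np.asarray(hours)
    anomalies = np.full(temps.shape, np.nan)
    for hour in range(24):
        at_hour = hours == hour
        if at_hour.any():
            # One climatological value per hour, from winsorized observations
            climatology = winsorize(temps[at_hour]).mean()
            anomalies[at_hour] = temps[at_hour] - climatology
    q25, q75 = np.nanpercentile(anomalies, [25, 75])
    iqr = q75 - q25
    if iqr == 0:
        raise ValueError("inter-quartile range is zero, cannot scale anomalies")
    return anomalies / iqr
```

Because the climatology is built from winsorized values, a handful of extreme observations barely move it, so those observations keep large anomalies and stand out in the scaled distribution.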
We have included a way to account for some of the effects of a shifting climate using a low-pass filter. However, this is only applied to complete years of data, and so has so far not been included in our monthly updates. The resulting distribution is fitted with a Gaussian, and we set our threshold at the point where this fitted Gaussian crosses the y=0.1 line, rounded up to the next whole degree.
Figure 2: the distribution of scaled anomalies for June from Agassiz (711130-99999), with the flagged ones highlighted in red. Note the logarithmic y-axis.
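To make the threshold calculation concrete, here is a rough sketch under stated assumptions. Instead of a least-squares fit to the histogram, it estimates the Gaussian from the sample mean and standard deviation, which is a simpler stand-in and not the fitting method used in HadISD. The function name `gaussian_threshold` and the bin width of 1 are also assumptions. For a Gaussian of height A and width sigma, the curve reaches y at a distance sigma * sqrt(2 ln(A / y)) from its centre.

```python
import numpy as np


def gaussian_threshold(scaled, bin_width=1.0, y_cross=0.1):
    """Distance from the centre at which a Gaussian describing the histogram
    counts drops to y_cross, rounded up to the next whole number.

    Assumption: the Gaussian is estimated from the sample mean and standard
    deviation rather than fitted to the binned counts.
    """
    scaled = np.asarray(scaled, dtype=float)
    scaled = scaled[np.isfinite(scaled)]
    sigma = scaled.std()
    # Peak height of a Gaussian that integrates to (number of obs * bin width)
    amplitude = scaled.size * bin_width / (sigma * np.sqrt(2.0 * np.pi))
    if amplitude <= y_cross:
        raise ValueError("curve never rises above y_cross")
    # Solve amplitude * exp(-d**2 / (2 * sigma**2)) = y_cross for d
    distance = sigma * np.sqrt(2.0 * np.log(amplitude / y_cross))
    return float(np.ceil(distance))
```

A pure Gaussian falls away very quickly, so for a distribution with a heavy upper tail, like the one in Figure 2, the threshold sits well inside the most extreme observed values.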
The test operates two levels of flagging, depending on whether an empty bin separates the values further from the centre than the threshold from the rest of the distribution. If there is a gap, then these are flagged, as shown in Figure 2. If there isn't an empty bin, and the bins are part of a contiguous distribution but are further from the mean than the threshold, then these are "tentatively" flagged (see Figure 10 in the HadISD paper). When running the neighbour checks, these tentative flags can be removed if sufficient neighbours indicate these are reasonable.
In the case of Agassiz, the observations were so extreme that this test has flagged them without the option of the neighbour check undoing this (Figure 2).
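The two levels of flagging can be sketched as follows. This is our reading of the logic described above, written for illustration only. The names (`classify_outliers`, `classify_side`, the flag constants) are hypothetical, and we assume the scaled anomalies are centred on zero and binned with a width of 1.

```python
import numpy as np

NO_FLAG, TENTATIVE, FLAGGED = 0, 1, 2


def classify_side(distances, threshold, bin_width):
    """Classify one side of the distribution, given distances from the centre."""
    flags = np.full(distances.shape, NO_FLAG, dtype=int)
    if distances.size == 0:
        return flags
    bins = np.floor(distances / bin_width).astype(int)
    counts = np.bincount(bins)
    empty = np.flatnonzero(counts == 0)
    # First empty bin walking outwards from the centre (none: past the last bin)
    first_gap = empty[0] if empty.size else counts.size
    beyond = distances > threshold
    # Beyond the threshold but still contiguous with the main distribution
    flags[beyond & (bins < first_gap)] = TENTATIVE
    # Beyond the threshold and separated from the main distribution by a gap
    flags[beyond & (bins > first_gap)] = FLAGGED
    return flags


def classify_outliers(scaled, threshold, bin_width=1.0):
    """Flag scaled anomalies; the two tails are treated independently."""
    scaled = np.asarray(scaled, dtype=float)
    flags = np.full(scaled.shape, NO_FLAG, dtype=int)
    for side in (scaled >= 0, scaled < 0):
        flags[side] = classify_side(np.abs(scaled[side]), threshold, bin_width)
    return flags
```

In this sketch only the `TENTATIVE` values would be passed to the neighbour check for possible reinstatement. The Agassiz heatwave values fall in the `FLAGGED` group, which is why they were removed outright.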
Amending the QC
There are a number of options as to what we could do to improve the actions of this automated QC. However, the important thing is to make sure that whatever we implement, there are as few knock-on effects in other regions and flags as possible. The intention is to improve this test in a robust, responsible way.
A number of options have so far come to mind, including:
Amend the low-pass filter to include data from the year in progress.
Amend the fitting function from a pure Gaussian, to one which allows skew or even kurtosis. As seen in Figure 2, the distribution has a high tail above the fitted Gaussian, and accounting for this will affect the threshold used. This approach is already used in a different check in the HadISD QC.
Use a rolling range to determine the years contributing to the climatologies used when creating anomalies, so values from 1931 are not contributing to 2021.
Amend the neighbour check so that spatially coherent anomalies result in flags being unset from a greater subset of QC tests.
All of these approaches will need to be tested with care to ensure that any updates do not result in detrimental performance of the QC suite elsewhere in time or space.
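To give a flavour of the rolling-range option, here is a small sketch of a climatology that only uses the years immediately preceding (and including) the year being assessed. The function name `rolling_climatology` and the 30-year window are assumptions for illustration, and a plain mean is used for brevity rather than the winsorized mean described earlier.

```python
import numpy as np


def rolling_climatology(temps, years, target_year, window=30):
    """Mean of observations from the `window` years ending at target_year.

    Assumption: a trailing window; a centred window would be another choice.
    """
    temps = np.asarray(temps, dtype=float)
    years = np.asarray(years)
    in_window = (years > target_year - window) & (years <= target_year)
    if not in_window.any():
        return float("nan")
    return float(temps[in_window].mean())
```

For a station with a warming trend, the rolling climatology for 2021 is warmer than one built from the whole record, so anomalies for recent extremes are smaller and less likely to cross the threshold.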
We will release this version of HadISD (v18.104.22.168106p) with a note that observations from this event have been erroneously flagged. As this is a preliminary version of HadISD, this is reasonable, and gives us time to implement a solution in "slow time". Watch this space for an update.
Dunn, R. J. H., Willett, K. M., Thorne, P. W., Woolley, E. V., Durre, I., Dai, A., Parker, D. E., and Vose, R. S.: HadISD: a quality-controlled global synoptic report database for selected variables at long-term stations from 1973–2011, Clim. Past, 8, 1649–1679, https://doi.org/10.5194/cp-8-1649-2012, 2012
[Edited 9-Jul-2021 10.50BST to add option of amending the neighbour check]