We have seen in parts 1 and 2 that the Climatological and Distribution checks have been adjusted because some values at some stations were likely being erroneously flagged during the heatwave over North America in June 2021.
In order to see how much of an effect these updated checks have had on the flagging rates, we look at a spatial distribution of the stations during the last days of June 2021 showing the temperatures at each station, with observations flagged with our unmodified QC appearing in green (Figure 1). As is clear, a large number of stations are flagged during this time.
|Figure 1. Temperatures at HadISD stations on 28-June-2021 at 00:00UTC (17:00PDT on 27th June). Flagged stations are shown in green, and non-reporting as transparent.|
We produced the same map after the modifications to the two QC tests, together with the appropriate adjustment to the buddy check to allow any tentative flags to be unset (Figure 2). Far fewer stations are flagged with these updated tests. However, a number are still flagged. These are flagged throughout the period of interest rather than only during the hottest part of the day, and so were likely the result of a test which flags an entire month. Spot checks on these stations showed that the flags come from the excessive variance test.
|Figure 2. As for Figure 1 but after updated QC tests.|
The excessive variance test looks at the distribution of the within-month variance, and identifies months with exceptionally low or exceptionally high variance. The scaling used for this test is the interquartile range, and months which have a variance more than 8 IQR from the average are flagged. As can be seen for the example of Osoyoos (712150-99999, latitude 49.033, longitude -119.433), the variance for June 2021 is much larger than any previously seen June (Figure 3).
|Figure 3. The Variance Check for Osoyoos (BC, 49.033, -119.433) showing all of June 2021 flagged.|
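To make the logic of the test concrete, here is a minimal sketch in Python of the rule as described above: scale each month's variance by the interquartile range and flag anything more than 8 IQR from the centre. This is not the HadISD code. The function name, the use of the median as the "average", and the example variances are all assumptions made for illustration.

```python
import statistics

IQR_THRESHOLD = 8.0  # fixed threshold described in the text


def flag_excessive_variance(monthly_variances, threshold=IQR_THRESHOLD):
    """Return one boolean per year: True where that year's within-month
    variance lies more than `threshold` IQRs from the central value.

    Assumption: the median is used as the central value.
    """
    centre = statistics.median(monthly_variances)
    q1, _, q3 = statistics.quantiles(monthly_variances, n=4)
    iqr = q3 - q1
    if iqr == 0:
        # No spread to scale by, so nothing can be judged exceptional.
        return [False] * len(monthly_variances)
    return [abs(v - centre) / iqr > threshold for v in monthly_variances]


# Hypothetical within-month temperature variances for June, one value per
# year, with the final year standing in for an exceptional heatwave month.
june_variances = [20.1, 22.4, 19.8, 21.0, 23.3, 20.7, 21.9, 22.8, 60.5]
flags = flag_excessive_variance(june_variances)
print(flags)
```

With these made-up numbers only the final year is flagged, and because the test works on a whole month's variance, every observation in that month would carry the flag.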
The variance check uses a fixed threshold of 8 IQR rather than thresholds generated from the properties of the distribution (as in the climatological and distributional checks). Updating this check to determine its thresholds from the distribution itself would be a larger change than the relatively small ones we have made so far. Also, in the example in Figure 3, a reasonable threshold determined from the distribution may still have excluded June 2021 (note that the y-axis is on a log scale), and we might struggle to be objective in making this change if we tailored it to this specific event, perhaps causing issues in other regions. In contrast, the changes in the climatological and distributional checks were easily motivated (and perhaps should have been spotted during development of the monthly updates).
As we noted in the HadISD papers, the automated QC is a balance between removing erroneous or dubious observations and retaining true extremes, and what we do not want to do is make changes with inadvertently large impacts elsewhere. Our plan for now is to note this as an issue for this test (and event), to be looked at in any next major update to HadISD. Any flagged data in HadISD is removed from the netCDF data fields, but remains available within the netCDF files should users wish to access it. If you have thoughts on this, please do get in touch or comment below.
The next update to HadISD (in October 2021) will show a version number increment to reflect these changes in the QC tests (188.8.131.52109p).
[Animations of three days of the heatwave showing the flagged stations before and after the QC test updates are available on the HadISD homepage].