A couple of years ago, James Goldie (UNSW) contacted me about an issue he had found in HadISD relating to the reporting resolution of temperature and humidity data for stations in Australia.
In HadISD, the data for these stations vary between whole-degree, half-degree and tenth-of-a-degree resolution, and switching between these resolutions can cause some interesting striations in derived quantities.
James has written up his work, with some cool animated plots, on his blog.
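To illustrate why mixed reporting resolutions produce striations, here is a minimal sketch (not James's or HadISD's code). It quantises temperature and dew point to a given resolution and counts how many distinct relative-humidity values can result. The Magnus formula and the temperature grid are illustrative assumptions; HadISD may derive humidity differently.

```python
import math


def magnus_rh(t_c: float, td_c: float) -> float:
    """Relative humidity (%) from air temperature and dew point (both deg C).

    Uses the Magnus approximation with coefficients 17.625 and 243.04 deg C;
    this formula is an illustrative assumption, not necessarily HadISD's.
    """
    a, b = 17.625, 243.04
    return 100.0 * math.exp(a * td_c / (b + td_c) - a * t_c / (b + t_c))


def quantise(x: float, resolution: float) -> float:
    """Round a value to a reporting resolution such as 1.0, 0.5 or 0.1 deg C."""
    return round(x / resolution) * resolution


def distinct_rh_values(resolution: float) -> int:
    """Count distinct RH values (rounded to 0.01 %) over a grid of
    T = 20-25 deg C and Td = 15-20 deg C, after quantising both inputs."""
    values = set()
    for i in range(101):                 # T from 20.00 to 25.00 in 0.05 steps
        t = quantise(20.0 + 0.05 * i, resolution)
        for j in range(101):             # Td from 15.00 to 20.00 in 0.05 steps
            td = quantise(15.0 + 0.05 * j, resolution)
            values.add(round(magnus_rh(t, td), 2))
    return len(values)


for res in (1.0, 0.5, 0.1):
    print(f"{res:>4} deg C resolution: {distinct_rh_values(res)} distinct RH values")
```

At whole-degree resolution the derived humidity can only take a few dozen values, so a record that switches between resolutions shows bands of allowed values rather than a continuous spread.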
News, updates and interesting features of the HadISD dataset
Tuesday, 22 August 2017
Wednesday, 7 June 2017
High windspeed values
Thanks to Phil Jones (UEA) and colleagues for pointing out this issue.
A number of stations have wind speeds of 88 m/s, which also stands out as a repeating value (see Figure 1).
These may be the result of a mistyped missing-data code in the original data. The station in Figure 1 also appears to have rounding or conversion problems; we have not yet had the chance to investigate this in detail.
The maximum wind speed used for the record check is 113.3 m/s (derived from the world maximum surface wind gust: https://wmo.asu.edu/content/world-maximum-surface-wind-gust), so this check does not exclude these values. Wind speeds are not passed through the distributional or frequent-value checks, because their distribution is not Gaussian and these tests have so far been written assuming a Gaussian shape. Nor is the spike check applied. Unfortunately, therefore, our QC suite is not (yet) able to identify these erroneous values.
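The gap described above can be sketched in a few lines of Python. The record limit of 113.3 m/s comes from the post; the `repeated_high_values` check, its `high_threshold` and its `min_count` are hypothetical illustrations of one way to catch a repeated spurious value such as 88 m/s, not part of the HadISD QC suite.

```python
from collections import Counter
from typing import Iterable, List, Optional

# Upper limit used by the HadISD record check, as quoted in the post (m/s).
RECORD_MAX_WIND = 113.3


def record_check(speeds: List[Optional[float]]) -> List[int]:
    """Indices of wind speeds exceeding the record limit (missing values skipped)."""
    return [i for i, s in enumerate(speeds) if s is not None and s > RECORD_MAX_WIND]


def repeated_high_values(speeds: Iterable[Optional[float]],
                         high_threshold: float = 50.0,
                         min_count: int = 3) -> List[float]:
    """Hypothetical check: high wind speeds that recur suspiciously often.

    high_threshold and min_count are illustrative assumptions, not HadISD
    settings. Counting exact repeats makes no assumption about the shape of
    the wind-speed distribution.
    """
    counts = Counter(s for s in speeds if s is not None and s >= high_threshold)
    return sorted(v for v, n in counts.items() if n >= min_count)


speeds = [3.1, 4.6, 88.0, 5.2, 88.0, None, 2.0, 88.0, 61.0]
print(record_check(speeds))          # -> [] (88 m/s passes the record check)
print(repeated_high_values(speeds))  # -> [88.0]
```

Because it only counts exact repeats, a check like this sidesteps the Gaussian assumption that currently keeps wind speeds out of the distributional and frequent-value tests; the single 61 m/s value is left alone.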
We do not currently have a solution to these issues; we would rather make users aware of them than implement a "quick fix" that causes problems elsewhere. We will look into this during the course of this year and hope to roll out improvements to the wind QC in the next update.
The stations which have been noted as affected by repeated high values are:
151080-99999
156150-99999
156270-99999
228370-99999
Other stations have also been noted to have one or a few high values.
Please do not hesitate to get in touch if you do spot any issues or would like more information on these.
Thursday, 9 February 2017
HadISD v2.0.1.2016p
We have just released HadISD version 2.0.1.2016p. All plots and files should now be on the website. Since the release of v2.0.0.2015p in September, there have been no updates to past years. The ISD raw data were downloaded on 19th January 2017 and processed over the following days.
The station selection was re-run, so the station list has been updated; this version contains 7877 stations. There have also been some minor changes to the quality control tests (affecting wind measurements), outlined below. A file listing which stations are new to HadISD and which are no longer included, relative to v2.0.0, is available.
In response to requests from users, this version passes the wind speed observations through the spike check and the wind direction observations through the repeated-value (streak) check.
The threshold values used to activate flagging in the spike check are calculated from the properties of the data themselves, using the distribution of differences between one observation and the next.
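A data-derived spike threshold can be sketched as follows. This is a simplified, single-point variant under stated assumptions: the threshold is taken as a multiple of the inter-quartile range (IQR) of the absolute first differences, and `multiplier` and `floor` are hypothetical parameters. The actual HadISD spike check (Dunn et al., 2012) may differ in detail.

```python
import statistics
from typing import List, Sequence


def spike_threshold(values: Sequence[float], multiplier: float = 6.0,
                    floor: float = 1.0) -> float:
    """Jump size above which a change is suspicious, derived from the data.

    Assumption: the threshold is `multiplier` times the IQR of the absolute
    first differences, with a hypothetical `floor` so a zero IQR does not
    flag every reversal.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    return max(multiplier * (q3 - q1), floor)


def find_spikes(values: Sequence[float], multiplier: float = 6.0) -> List[int]:
    """Indices of single-point spikes: a large jump followed by a large jump
    in the opposite direction, both exceeding the data-derived threshold."""
    thr = spike_threshold(values, multiplier)
    spikes = []
    for i in range(1, len(values) - 1):
        up = values[i] - values[i - 1]
        down = values[i + 1] - values[i]
        if abs(up) > thr and abs(down) > thr and up * down < 0:
            spikes.append(i)
    return spikes


speeds = [5.0, 5.2, 5.6, 5.0, 4.8, 5.0, 5.2, 5.6, 5.0, 4.8,
          40.0, 5.2, 5.6, 5.0, 4.8, 5.0, 5.2, 5.6, 5.0, 4.8]
print(find_spikes(speeds))  # -> [10]
```

Using the IQR rather than the standard deviation keeps the threshold from being inflated by the spike itself: here the two jumps of about 35 m/s barely move the quartiles, so the threshold stays near 2.4 m/s.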
For the streak check, although the parameters are calculated from the distribution of repeated values, they are only used to flag values if they are smaller (i.e. stricter) than the defaults used in HadISD versions 1.0.x. Calm periods are excluded when applying the streak check. The default values depend on the resolution of the wind direction and are given in the table below (see Table 4 in Dunn et al., 2012, for more information).
Resolution (degrees) | Repeated Streak (h) | Repeated Streak (d) | Repeated Hours | Repeated Days |
---|---|---|---|---|
90 | 120 | 28 | 28 | 10 |
45 | 96 | 28 | 28 | 10 |
22 | 72 | 21 | 21 | 7 |
10 | 48 | 14 | 14 | 7 |
1 | 24 | 7 | 14 | 5 |
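The streak logic described above can be sketched as follows, under several assumptions: the "Repeated Streak (h)" column is read as the minimum duration in hours of a run of identical directions that gets flagged; the data-derived limit (`data_limit_h`, a hypothetical parameter) is computed elsewhere; and calm (zero-speed) or missing observations are never flagged and end any run in progress. This is an illustration, not the HadISD implementation.

```python
from typing import List, Optional, Sequence

# "Repeated Streak (h)" defaults from the table above, keyed by the
# wind-direction reporting resolution in degrees.
DEFAULT_STREAK_HOURS = {90: 120, 45: 96, 22: 72, 10: 48, 1: 24}


def effective_streak_limit(resolution_deg: int,
                           data_limit_h: Optional[float] = None) -> float:
    """Use the data-derived limit only when it is stricter than the default."""
    default = DEFAULT_STREAK_HOURS[resolution_deg]
    return default if data_limit_h is None else min(default, data_limit_h)


def streak_flags(directions: Sequence[Optional[float]],
                 speeds: Sequence[float],
                 times_h: Sequence[float],
                 limit_h: float) -> List[bool]:
    """Flag runs of identical wind directions spanning at least limit_h hours.

    Calm (speed == 0) or missing observations are never assessed: they are
    left unflagged and end any run in progress. times_h must be sorted.
    """
    n = len(directions)
    flags = [False] * n
    start = None  # index where the current run began
    for i in range(n + 1):
        skip = i < n and (speeds[i] == 0 or directions[i] is None)
        if (start is not None and i < n and not skip
                and directions[i] == directions[start]):
            continue  # the run continues
        # The run start..i-1 has ended; flag it if it lasted long enough.
        if start is not None and times_h[i - 1] - times_h[start] >= limit_h:
            for k in range(start, i):
                flags[k] = True
        start = None if (i == n or skip) else i
    return flags


hours = list(range(30))
dirs = [180.0] * 26 + [90.0] * 4
spd = [5.0] * 30
flags = streak_flags(dirs, spd, hours, effective_streak_limit(1))
print(sum(flags))  # -> 26 (the 25-hour run of 180 degrees is flagged)
```

Taking the minimum of the default and the data-derived limit means the statistics can only tighten the check, never loosen it beyond the 1.0.x defaults.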
If you find anything strange, do let us know using the contact details on the HadISD website. Please note that stations known to have issues are documented on this blog and on the website.
The quality control code used in this version will be uploaded to the GitHub repository in the coming days.