For the last few years I have been adapting the Python code that compiles HadISD2 so that it can support monthly updates. To do this, I've adopted the following conventions and processing outline.
Quality Control Tests
In some of the quality control tests, the entire period of record is used, for example to determine the parameters of a distribution or to set threshold values. With monthly appends to the data, these parameters would change from month to month, so the threshold values would change with each release. I decided that the resulting flutter in whether observations were flagged or not would be undesirable to users, and so developed the tests as follows.
Where thresholds are set from the parameters of the observations themselves, they are only ever calculated from data up to the end of the last complete year (31st December at 2300). Adding extra months therefore has no impact until a run in January of the following year.
Some tests do not have thresholds set in this way, and so these are not impacted by the monthly append of new data.
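As a minimal sketch of the cut-off described above (not the actual HadISD code), the data used for threshold setting could be restricted like this. The function names `last_complete_year_end` and `threshold_sample` are hypothetical, and observations are assumed to be plain lists of timestamps and values.

```python
from datetime import datetime


def last_complete_year_end(run_time: datetime) -> datetime:
    """Return 2300 on 31st December of the last complete calendar year."""
    return datetime(run_time.year - 1, 12, 31, 23)


def threshold_sample(times, values, run_time):
    """Keep only the observations that may be used to set thresholds.

    Anything after the end of the last complete year is ignored, so
    monthly appends within a year leave the sample unchanged.
    """
    cutoff = last_complete_year_end(run_time)
    return [v for t, v in zip(times, values) if t <= cutoff]


times = [
    datetime(2017, 6, 1, 12),
    datetime(2018, 12, 31, 23),
    datetime(2019, 1, 1, 0),
    datetime(2019, 5, 1, 0),
]
values = [1.0, 2.0, 3.0, 4.0]

# Runs in March and November 2019 see the same threshold sample.
march = threshold_sample(times, values, datetime(2019, 3, 15))
november = threshold_sample(times, values, datetime(2019, 11, 15))
```

With this approach the sample only grows once a run takes place in January of the following year, which is when the thresholds would be recalculated.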
Update cycle and versioning schema
To retain a stable set of stations that make up HadISD, we have decided to recreate the station list only on an annual cycle. At the same time, all data in the "deep past" (i.e. prior to the most recent complete calendar year) can be updated. On monthly updates, only the current year will have any data updated, but because of the way that the ISD files are stored at NCEI, an entire year is downloaded, so the update in November could also include changes in February.
To allow users to clearly identify which version of HadISD they are using in any output, we are keeping the versioning scheme used for HadISD and HadISD2 (x.y.z.datelabel). However, the date stamp will be more important for the monthly updating dataset. As there will be significant changes to the code, variables and processing, we have decided to increment the overall version number by one - forming HadISD version 3. This is documented in a Met Office Hadley Centre Technical Note.
We have been running monthly updates for testing and internal purposes during the last few months of 2018. The first release using this new code will be the one in January 2019. This includes the addition of 2018 data over the previous release (v18.104.22.1687f), but no changes in the deep past, and no reselection of the stations. This update could be called v22.214.171.124812p - the preliminary update including data to the end of December. But as this will be the final monthly update for 2018's data, it will be released under v126.96.36.1998f.
In February, the update that includes January's data will also check for any updates in previous years (1931-2018). With a new station selection in this update, it will be released as v188.8.131.52901p. From March (v184.108.40.206902p) all the way to January 2020 (v220.127.116.119f), each update will overwrite all data from 2019. Then in February 2020, there is another update to the deep past and to the station list, which also includes January 2020 - resulting in v18.104.22.168001p (note the change in the date label as well as in the "z" label).
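The annual cycle described above can be summarised in a short sketch. This is only an illustration of the schedule, not the HadISD code: the function `update_plan` and its dictionary keys are hypothetical, and it assumes that a run in a given month adds data up to the end of the previous month.

```python
from datetime import date


def update_plan(run_date: date) -> dict:
    """Describe what a run on run_date would refresh (illustrative only)."""
    # Assumption: a run in month N adds data up to the end of month N-1.
    if run_date.month == 1:
        data_year, data_month = run_date.year - 1, 12
    else:
        data_year, data_month = run_date.year, run_date.month - 1

    return {
        "data_through": (data_year, data_month),
        # The January run is the final update for the previous year's data.
        "final_for_year": run_date.month == 1,
        # Only the February run revisits the deep past and the station list.
        "update_deep_past": run_date.month == 2,
        "reselect_stations": run_date.month == 2,
    }


january = update_plan(date(2019, 1, 10))
february = update_plan(date(2019, 2, 10))
november = update_plan(date(2019, 11, 5))
```

Under these assumptions the January run is the only one marked as final, and the February run is the only one that touches earlier years or the station selection.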
Other bits and bobs - precipitation and station level pressure
While implementing the changes to the HadISD code base, I decided that this was an opportunity to address the issue with the precipitation fields, outlined in the previous post. There are 4 precipitation fields in the ISD, each with an accumulation period, an accumulation amount and quality code. I've split these out into new fields in the netCDF files, one for each accumulation period. This should make it easier for users who wish to use the precipitation amounts. However, it is important to note that at this point, these data have NOT been quality controlled.
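To illustrate the idea of splitting the precipitation slots by accumulation period, here is a minimal sketch. It is not the HadISD implementation: the function `split_by_period`, the tuple layout and the use of `None` for missing values are all assumptions, and the set of periods is taken from whatever appears in the input rather than from a fixed list.

```python
def split_by_period(observations):
    """Regroup precipitation slots into one series per accumulation period.

    observations: one entry per time step, each a list of up to four
    (period_hours, amount, quality_code) tuples, as an assumed layout.
    Returns {period_hours: [amount or None for each time step]}.
    """
    periods = sorted({period for slots in observations for period, _, _ in slots})
    series = {period: [None] * len(observations) for period in periods}
    for i, slots in enumerate(observations):
        for period, amount, _quality in slots:
            series[period][i] = amount
    return series


obs = [
    [(1, 0.2, "1"), (6, 1.5, "1")],  # two slots filled at this time step
    [],                              # no precipitation reported
    [(6, 3.0, "5")],
]
by_period = split_by_period(obs)
```

The quality codes are dropped here for brevity; they could be carried along in a parallel dictionary built in the same loop.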
A user requested that station level pressure (as distinct from the sea-level pressure that is currently in HadISD) be included. We have done this, and have added another QC test to compare the station and sea-level pressures. If the difference between them is more than 4.5 median absolute deviations from the median difference, in either direction, then the station level pressure is flagged.
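The comparison test can be sketched as follows. This is a simplified illustration rather than the HadISD code: the function name `flag_station_pressure` is hypothetical, the inputs are assumed to be equal-length lists with no missing values, and the median absolute deviation is used unscaled.

```python
from statistics import median


def flag_station_pressure(station, sea_level, n_mad=4.5):
    """Flag station level pressures whose offset from sea-level pressure is unusual.

    Returns a list of booleans, True where the difference lies more than
    n_mad median absolute deviations from the median difference.
    """
    diffs = [stn - slp for stn, slp in zip(station, sea_level)]
    centre = median(diffs)
    # Unscaled median absolute deviation of the differences (assumption).
    mad = median([abs(d - centre) for d in diffs])
    return [abs(d - centre) > n_mad * mad for d in diffs]


station = [1000.0, 1001.0, 999.0, 1000.5, 950.0]
sea_level = [1010.0, 1011.0, 1009.2, 1010.4, 1010.0]
flags = flag_station_pressure(station, sea_level)
```

In this example only the last observation, where the station level pressure is about 60 hPa below the sea-level value, is flagged. Note that if the median absolute deviation is zero, every value that differs from the median at all would be flagged, so a real implementation would need to handle that case.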
A Met Office Hadley Centre Technical Note has been drafted and will shortly be available (also on the HadISD website). We encourage users to provide feedback on the monthly updates during 2019.