Tuesday, 7 October 2014

Extending HadISD: Station Selection

I have started to re-assess the station selection part of HadISD.  During the early stages of creating HadISD (in around 2008), the ISD database was interrogated to find stations which would be suitable for HadISD.  This process, outlined in the paper, resulted in the 6103 stations which form HadISDv1.0.x.

However, this station list has not been updated since then.  This means that we have not benefited from any new stations added to the ISD database in recent years.  The static station list may also be partly to blame for the dip in 2005 in the total number of stations with available data (see also HadISDH), as well as for the fall-off in the number of stations since 1990.
Fig. 1 - Number of stations which have data in any given year in HadISDv1.0.x.  These are all the individual input stations (including those merged to form composites), hence the peak exceeds 6103.  The dip at 2005 is visible, as well as the drop before 1973 (which set the start year of HadISDv1.0.x).
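Counting stations per year, as in Fig. 1, only requires knowing in which years each station ID has observations. The sketch below assumes a hypothetical in-memory structure (a dictionary mapping each station ID to its monthly observation counts, of the kind that could be built from isd-inventory.txt); it is an illustration, not the code used to produce the figure. Each input ID is counted separately, which is how merged composites can push the total above the number of final stations.

```python
from collections import defaultdict


def stations_per_year(inventory):
    """Count how many station IDs have at least one observation in each year.

    `inventory` is a hypothetical mapping:
        station_id -> {(year, month): number_of_observations}
    Every input ID is counted on its own, so stations that were later merged
    into a composite each contribute to the total.
    """
    ids_by_year = defaultdict(set)
    for station_id, monthly_counts in inventory.items():
        for (year, _month), n_obs in monthly_counts.items():
            if n_obs > 0:  # months with zero observations do not count
                ids_by_year[year].add(station_id)
    return {year: len(ids) for year, ids in sorted(ids_by_year.items())}
```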

At the same time as expanding the station selection, we also intend to extend HadISD so that data is available and quality controlled prior to 1973.  It is clear from Fig. 1 why 1973 was chosen as the start year of HadISDv1.0.x; however, this does result in a relatively short period of record.

Back to the beginning

So, we have gone back to the ISD database to produce a station listing dynamically, in a way that can be rerun with each major update of HadISD.  Using isd-history.txt, we extracted those stations which have valid latitudes, longitudes and elevations, and which also have at least 15 years between their start and end dates.  There are 29525 unique station IDs in the ISD database, and 14947 satisfy these criteria (these numbers will change slightly as the ISD database is continually updated).
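A minimal Python sketch of the isd-history.txt filter follows, assuming the file has already been parsed into records. The `HistoryRecord` type, the range checks used to define a "valid" location and the `-999.0` elevation sentinel are all assumptions for illustration; the actual missing-value encoding in isd-history.txt should be checked before using anything like this.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MIN_YEARS = 15  # minimum span between start and end dates, from the post

# Assumption: elevations at or below this value are treated as missing-value
# sentinels. Check the real encoding used in isd-history.txt.
ELEVATION_SENTINEL = -999.0


@dataclass
class HistoryRecord:
    station_id: str          # hypothetical: USAF and WBAN identifiers joined
    lat: Optional[float]     # None if the field was blank
    lon: Optional[float]
    elev: Optional[float]
    begin: date
    end: date


def has_valid_location(rec: HistoryRecord) -> bool:
    """Latitude, longitude and elevation are all present and plausible."""
    if rec.lat is None or rec.lon is None or rec.elev is None:
        return False
    if not (-90.0 <= rec.lat <= 90.0 and -180.0 <= rec.lon <= 180.0):
        return False
    return rec.elev > ELEVATION_SENTINEL


def span_years(rec: HistoryRecord) -> float:
    """Years between the start and end dates (gaps are not considered here)."""
    return (rec.end - rec.begin).days / 365.25


def select_from_history(records: List[HistoryRecord],
                        min_years: float = MIN_YEARS) -> List[str]:
    """IDs of stations with a valid location and a long enough date span."""
    return [r.station_id for r in records
            if has_valid_location(r) and span_years(r) >= min_years]
```

Note that this step looks only at the start and end dates, so a station with a long but very gappy record still passes; the inventory check handles gaps.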

The isd-inventory.txt file lists the number of observations in each month for each station.  We have used this to find those stations which report, on average, every 6 hours and which have observations in at least 15 years' worth of months (180 months, not necessarily consecutive, to account for stations with many gaps).  This returns 8694 stations worldwide.
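The inventory step can be sketched in the same way. Here "on average every 6 hours" is interpreted, as an assumption, as at least 4 observations per day averaged over all months that have any data; the 180 months need not be consecutive. The `{(year, month): count}` structure and the function name are hypothetical.

```python
import calendar

OBS_PER_DAY = 4    # one report every 6 hours on average (24 / 6)
MIN_MONTHS = 180   # 15 years' worth of months, gaps allowed


def passes_inventory(monthly_counts, obs_per_day=OBS_PER_DAY,
                     min_months=MIN_MONTHS):
    """Check one station's monthly observation counts.

    `monthly_counts` is a hypothetical mapping {(year, month): n_obs}.
    Assumption: the average reporting rate is taken over months with data only.
    """
    months_with_data = {ym: n for ym, n in monthly_counts.items() if n > 0}
    if len(months_with_data) < min_months:
        return False
    total_obs = sum(months_with_data.values())
    # calendar.monthrange returns (weekday of first day, number of days)
    total_days = sum(calendar.monthrange(y, m)[1] for (y, m) in months_with_data)
    return total_obs / total_days >= obs_per_day
```

Setting `obs_per_day=8` would correspond to the stricter 3-hourly requirement.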

Fig. 2 - The number of stations which have data using the initial version of the updated station selection code.  The drops in 1972 and 2005 are still visible, but the gentle drop-off from 1990 is less pronounced than in Fig. 1.
We have relaxed the reporting requirement to every 6 hours rather than every 3 to try to select more stations in regions where HadISD currently has few (e.g. central South America, Africa), while still maintaining a sub-daily resolution.

As can be seen in Fig. 2, there is still a large dip in 1972, and the drop in 2005 has not entirely disappeared.  Some of these dips may be ameliorated by merging stations together.  However, the drop-off after 1990 is less prominent, and there are more stations overall.

To merge or not to merge?

Because the ISD had many stations with short records, stations were merged when creating HadISDv1.0.x to produce ones with longer records.  This was done using a hierarchical table (see Table 1 in the paper) to identify potential candidates, followed by an in-depth and time-consuming manual process to narrow these down to the mergers actually used.  If the station selection is to be run with each major update, then selecting these merger candidates would have to be automated.

We could go back to the raw ISD listings and find stations which are merging candidates for the ~8500 initially selected.  By merging these in, some of the gaps in Fig. 2 could be filled (though this is not guaranteed).
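One simple way to automate a first pass would be to look for stations in the wider ISD listing that lie close to, and at a similar elevation to, each selected station. The sketch below does only that, using great-circle (haversine) distance; the `Station` type and both thresholds are hypothetical and are not the criteria in Table 1 of the paper. Proximity alone does not show that two IDs are the same station, so any candidates found this way would still need checking.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds for illustration only (not those in the paper).
MAX_SEPARATION_KM = 10.0
MAX_ELEV_DIFF_M = 50.0
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Station:
    station_id: str
    lat: float
    lon: float
    elev: float  # metres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def merge_candidates(selected: List[Station], pool: List[Station],
                     max_km: float = MAX_SEPARATION_KM,
                     max_dz: float = MAX_ELEV_DIFF_M) -> List[Tuple[str, str, float]]:
    """Pair each selected station with nearby pool stations at a similar elevation.

    Returns (selected_id, candidate_id, distance_km), nearest first for each
    selected station.
    """
    found = []
    for s in selected:
        for c in pool:
            if c.station_id == s.station_id:
                continue  # a station is not a merge candidate for itself
            d = haversine_km(s.lat, s.lon, c.lat, c.lon)
            if d <= max_km and abs(s.elev - c.elev) <= max_dz:
                found.append((s.station_id, c.station_id, d))
    return sorted(found, key=lambda t: (t[0], t[2]))
```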

As we have found over time, not all such mergers are correct, and therefore a number of options present themselves:

  1. We could choose not to merge at all, keeping all the ISD station IDs unique and being confident that creating HadISD has not degraded any of the data.
  2. We can merge only in cases where we have specific information (from a national met service, for example) about the identity of stations.  This could also be applied in cases where we have information indicating that a station record should be split.
  3. We can merge (and split) when we have specific information, but also run an automated procedure to identify candidate stations to merge.  An example algorithm has been produced for the International Surface Temperature Initiative databank v1.0 (see description in the paper).  However, this approach is very likely to introduce spurious mergers, no matter how careful we are with the algorithm.
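For the second option, the merging itself could be driven entirely by an externally supplied list of station pairs. The sketch below is a hypothetical illustration, not existing HadISD code: it assumes each station record is a dictionary of timestamped values and lets the primary station's observations take precedence wherever both stations report at the same time. Splitting records is not covered.

```python
def apply_merges(series, merge_list):
    """Combine station records according to an externally supplied merge list.

    series:     {station_id: {timestamp: value}}  (hypothetical structure)
    merge_list: [(primary_id, secondary_id), ...], e.g. from a national met service
    Stations not mentioned in the merge list are passed through unchanged,
    and the input dictionary is not modified.
    """
    merged = {sid: dict(obs) for sid, obs in series.items()}
    for primary, secondary in merge_list:
        if primary == secondary or primary not in merged or secondary not in merged:
            continue  # skip self-pairs and pairs referring to unknown IDs
        combined = merged.pop(secondary)
        combined.update(merged[primary])  # primary wins where both report
        merged[primary] = combined
    return merged
```

Because only listed pairs are merged, nothing is combined without supporting information, which is the main attraction of this option.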
We have not yet decided which route to follow, but are leaning towards the second, at least initially.

If you have any further suggestions or preferences, please leave a comment or get in touch.
