Wednesday 7 January 2015

Attempting to fix undocumented merges

As mentioned in an earlier post, we had found some issues in the Canadian stations which looked like undocumented station moves.  In discussions with Environment Canada, we were given a list of the Canadian WMO stations along with the dates of their changes.  There were 994 stations in their list.

We separated the stations out into different categories (the number of stations in each is given in parentheses): 

  • Single - stations which appeared in the list only once (529)
  • On/Off - stations which had an "active" and "inactive" status indicating the start and end dates of operation (47)
  • Good Station Moves - stations which showed a change in location, with dates showing the end of reporting at the previous location, and the start in the new location (216)
  • Overlap Moves - similar to good station moves, but the start of reporting in the new location occurs before the end of reporting at the old (15)
  • Possible Homogeneity issues - multiple dates at a single location indicating perhaps changes in instrumentation (92)
  • Questionable Moves - location changes with no dates given for the end of reporting at one location or the start at another (33)
  • Dates - cases where "active" and "inactive" statuses occurred at the same time, so the final status could not be determined (49)
  • Other - more complex sets of start and end dates that could not be categorised easily (13)

In the ISD, there are more than 1000 stations listed as being in Canada.  We selected those which were likely to correspond to the WMO stations (those which have IDs that match 71???0-99999).  This resulted in 934 stations which we could compare to the Environment Canada list.
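
As a rough illustration of the ID selection, here is a minimal Python sketch.  It assumes that each "?" in 71???0-99999 stands for any single character and that station IDs are held as plain strings; the function name and the example IDs are made up for illustration and are not taken from the ISD list.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the post; "?" is assumed to mean "any single character".
WMO_CANADA_PATTERN = "71???0-99999"


def is_candidate_wmo_id(station_id: str) -> bool:
    """Return True if the ID has the form 71???0-99999 (hypothetical helper)."""
    return fnmatchcase(station_id, WMO_CANADA_PATTERN)


# Made-up IDs, purely to show which shapes pass the filter.
example_ids = ["710170-99999", "712345-99999", "710170-12345", "720170-99999"]
matches = [s for s in example_ids if is_candidate_wmo_id(s)]
print(matches)
```

Only the first of the made-up IDs passes: the others fail on the sixth digit, the second half of the ID, or the leading 71.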

Stations in the Single, On/Off and Possible Homogeneity issues categories were retained in the candidate station list.  Those in the Questionable Moves, Dates, Overlap Moves and Other categories were rejected from the station list.
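
The retain/reject rule can be written down as a small lookup.  The sketch below is only an illustration: the category names and counts are those listed above, while the dictionary layout and the action labels are our own assumptions.  Note that the counts refer to the 994 stations in the Environment Canada list, so the tallies do not have to agree with the totals for the 934 ISD stations.

```python
# Counts per category, as listed above for the Environment Canada list.
CATEGORY_COUNTS = {
    "Single": 529,
    "On/Off": 47,
    "Good Station Moves": 216,
    "Overlap Moves": 15,
    "Possible Homogeneity issues": 92,
    "Questionable Moves": 33,
    "Dates": 49,
    "Other": 13,
}

# Action labels are hypothetical names chosen for this sketch.
ACTION = {
    "Single": "retain",
    "On/Off": "retain",
    "Possible Homogeneity issues": "retain",
    "Good Station Moves": "process_further",
    "Questionable Moves": "reject",
    "Dates": "reject",
    "Overlap Moves": "reject",
    "Other": "reject",
}

# Tally how many listed stations fall under each action.
totals = {}
for category, count in CATEGORY_COUNTS.items():
    action = ACTION[category]
    totals[action] = totals.get(action, 0) + count

print(totals)
print(sum(CATEGORY_COUNTS.values()))
```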

The 216 stations in the Good Station Moves category were processed further.  Using the station details in the ISD list, we extracted the period during which the station was at the listed location, as determined from the Environment Canada list.  Usually this was the most recent location.  The start and end times of the station were then adjusted so that only the period at the location given in the full ISD station list was used when further selecting stations.  In many cases this will result in the station not being selected for inclusion in HadISD.
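
One way to picture the adjustment of start and end times is as the overlap of two date ranges: the station's record in the ISD list and the period at the matching location from the Environment Canada list.  The sketch below assumes exactly that; it is not necessarily how the processing was implemented, and the function name and dates are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def clip_period(
    isd_start: date, isd_end: date, loc_start: date, loc_end: date
) -> Optional[Tuple[date, date]]:
    """Restrict the ISD record to the period spent at one location.

    Returns the overlapping (start, end) dates, or None if the record and
    the location period do not overlap at all.
    """
    start = max(isd_start, loc_start)
    end = min(isd_end, loc_end)
    if start > end:
        return None
    return start, end


# Hypothetical station: record from 1973, moved to its current site in 1995.
adjusted = clip_period(
    date(1973, 1, 1), date(2014, 12, 31),
    date(1995, 6, 1), date(2014, 12, 31),
)
print(adjusted)
```

Under this assumption the hypothetical station keeps only its record from June 1995 onwards, which shows how a shortened record could then fall foul of the later time-span criteria.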

Of the 934 Canadian stations we were able to assess, 797 were kept for processing by further selection criteria, 33 could not be tested and 104 were rejected. 

There are other stations located in Canada whose IDs do not match the WMO pattern, and which we therefore could not process.  These, along with the 33 which were not in the Environment Canada list, were retained in the station selection procedure as we have no information indicating that there are problems with them.

These changes result in 14762 stations being selected using the restrictions on latitude, longitude and time-spans, 8561 in the master-list and 8104 in the final merged list (of which 2045 have other stations merged into them).  This is a reduction from the 8207 stations which were in the previous selection, but hopefully fewer of these have serious inhomogeneities resulting from the undocumented station moves.
