Monday, 20 October 2014

Further thoughts on the Merging Problem

To follow on from the previous post, we've done some more thinking about the issue of merging stations correctly.  The three options from last time were:
  1. We can not merge at all, keep all the ISD station IDs as unique and be confident that by creating HadISD we have not degraded any of the data.
  2. We can merge only in cases where we have specific information (from a national met service, for example) as to the identity of stations.  This could also be applied in cases where we have information indicating that a split of a station record would be appropriate.
  3. We can merge (and split) when we have specific information, but also run an automated procedure to identify candidate stations to merge together.  An example algorithm has been produced by the International Surface Temperature Initiative databank v1.0 (see description in the paper).  This approach, though, is very likely to introduce some spurious mergers, however careful we are with the algorithm.
with our preference at the time leaning towards number 2.  

When we were thinking of option 3, what we had in mind was something similar to the merging process carried out for HadISD v1.0.0.  In that process the merging of short records was carried out before stations were selected, so that lots of short records, once merged, would be long enough to pass the selection criteria.  This cross-matching of all 29,000 stations in ISD to obtain a parent-set from which HadISD is drawn will result in some final stations being composed of many short segments.  The likelihood that some of these stations will be erroneously merged together is quite high given the automated nature of the build.  A subtly different alternative came to mind.

Instead, stations could be selected on the raw ISD record lengths and reporting intervals; this master-list could then be used to see which other stations in the ISD could be merged in to supplement these primary stations.  This will not increase the final station list (in fact it will decrease it, as some of the selected stations will be merged together), but it should improve the data coverage over time for the final set of merged stations.

Fig. 1: Flowchart showing envisaged station selection procedure with merging

Most of the merging process takes place after stations have been selected on the basis of their length of record and their reporting interval.  However, for specific countries it occurs beforehand, as we have extra and definitive information as to which stations should be merged or split.  At the moment there are also stations in the master list which will be selected to merge with other stations in the master list - hence the reduction to 8207.

Selecting stations to Merge

To select stations which are possible merging candidates we are so far using a very simple algorithm.  We test the horizontal and vertical separation of the stations and also the similarity of the station names.  The distances are mapped onto an exponential decay curve, which returns a value between 0 and 1 that we use as a probability.  This curve falls to 1/e by 25km for the horizontal separation, and by 100m for the vertical separation.  To calculate the similarity of the station names, we use the Jaccard Index (also used by ISTI), which again returns a value between 0 and 1.  These three probabilities are multiplied together, and stations where the final value is >0.5 are selected.
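As a rough illustration, the procedure above can be sketched as follows.  The exact tokenisation of station names used for the Jaccard Index is not specified in the post, so the word-set version below is an assumption:

```python
import math

def merge_probability(dist_km, elev_diff_m, name1, name2,
                      h_scale=25.0, v_scale=100.0):
    """Combined merge probability, as described above: two exponential
    decay curves (falling to 1/e at 25 km horizontally and 100 m
    vertically) multiplied by a Jaccard name-similarity index."""
    p_horiz = math.exp(-dist_km / h_scale)
    p_vert = math.exp(-abs(elev_diff_m) / v_scale)

    # Jaccard index on the sets of words in the station names
    # (assumption: the actual tokenisation may differ)
    s1, s2 = set(name1.upper().split()), set(name2.upper().split())
    p_name = len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

    return p_horiz * p_vert * p_name

# Two nearby stations with identical names are strong candidates:
p = merge_probability(2.0, 5.0, "CORK AIRPORT", "CORK AIRPORT")
# p = exp(-2/25) * exp(-5/100) * 1.0, which is about 0.88, i.e. > 0.5
```

Note how the multiplicative form means a complete mismatch in any one of the three measures (e.g. totally different names) vetoes the merge regardless of the other two.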

For an automated system there is no perfect result - there will always be false positives (stations that are distinct and should not be merged) and false negatives (stations that should be merged but are not selected).  An inspection of the resulting candidates for the UK (where we have a better idea of the suitability of merging these candidates) suggests that the values used above are reasonable, with no obvious false positives.  The algorithm and thresholds are not yet set in stone, so changes can still occur.

Currently we find that, within the primary station list, 478 stations are similar to others also in the list, reducing the station number to 8207 (including the changes from the German stations outlined below).  By cross-matching these 8207 stations against the complete ISD database, 2101 will contain data from other station IDs.

Fig. 2: The effect of the current merging system on the number of stations that report over time.  Improvements at the beginning and end of the record are clearly visible.
Fig. 2 shows the effect of the merging selection as it currently stands on the stations available in each year.  There are clear improvements in the number of merged stations available in the early part of the record (1935-1970) and also the last 10 years or so. 

Specific Countries - Germany & Canada

For some countries we have specific information about which stations to merge or split (and we hope to obtain more of these lists as time goes on).  Currently we have information about German stations, whose IDs start with 09 and 10 in the isd-history.txt file.  Here, the last four digits of the WMO ID are important, and some stations have had their records split so that 09abcd and 10abcd are the same station.  We therefore use the selection algorithm to check these station pairs specifically, and allow them to merge if they pass the same criteria as outlined above.
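Finding these 09abcd/10abcd pairs is straightforward; a minimal sketch, assuming six-digit WMO-based IDs (the function name and input format are hypothetical):

```python
def german_pair_candidates(station_ids):
    """Group German IDs starting 09 or 10 by the last four digits of
    the WMO ID, and return the pairs that share a suffix.  These pairs
    are then passed to the normal merge-selection criteria."""
    by_suffix = {}
    for sid in station_ids:
        if sid.startswith(("09", "10")):
            by_suffix.setdefault(sid[2:], []).append(sid)
    # keep only suffixes that appear under both prefixes
    return [tuple(sorted(v)) for v in by_suffix.values() if len(v) == 2]

german_pair_candidates(["091234", "101234", "095678", "037650"])
# returns the single candidate pair ("091234", "101234")
```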

Including this information prior to the station selection criteria results in 8685 stations being selected compared to 8667 before.

For Canada, things are a little more complicated.  There are only 1000 WMO-IDs available for Canada, and as a result, stations with different locations have ended up with the same IDs.  Thanks to Environment Canada, we have a list of the station moves.  In this case we want to split up records so that apparent false mergers are not included in HadISD.  We are still working on including this information in the station selection code.


These criteria and this procedure have not yet been finalised.  We may still revert to only merging/splitting stations where we have specific information.  If you have further suggestions or comments, please let us know.

Tuesday, 7 October 2014

Extending HadISD: Station Selection

I have started to re-assess the station selection part of HadISD.  During the early stages of creating HadISD (in around 2008), the ISD database was interrogated to find stations which would be suitable for HadISD.  This process, outlined in the paper, resulted in the 6103 stations which form HadISDv1.0.x.

However, this station list has not been updated since that point.  This means that we have not benefited from any new stations that have been added to the ISD database in recent years.  The static station list may also be partly to blame for the jump in 2005 in the total number of stations with available data (see also HadISDH) and also the fall-off in the number of stations since 1990.
Fig. 1 - Number of stations which have data in any given year in HadISDv1.0.x.  These are all the individual input stations (including those merged to form composites), hence the peak is more than 6103.  The dip at 2005 is visible, as well as the drop before 1973 (which set the start period of HadISD v1.0.x).

As well as updating the station selection, we also intend to extend HadISD so that data are available and quality controlled prior to 1973.  It is clear from Fig. 1 why the start year of HadISDv1.0.x was chosen as 1973, however this does result in a relatively short period of record. 

Back to the beginning

So, we have gone back to the ISD database to dynamically return a station listing which could be run with each major update of HadISD.  Using the isd-history.txt we extracted those stations which have valid latitudes, longitudes and elevations, and also those which had at least 15 years between their start and end dates.  There are 29525 unique station IDs in the ISD database, and 14947 satisfy these criteria (these numbers will change fractionally as the ISD database is continually updated).

The isd-inventory.txt file lists the number of observations in each month for each station.  We have used this to find those stations which report on average every 6 hours and which have observations in at least 15 years' worth of months (180 months), to account for stations with many gaps.  This returns 8694 stations worldwide.
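The two-stage selection could be sketched roughly as below, assuming the metadata have already been parsed out of isd-history.txt and isd-inventory.txt.  The coordinate-validity test and the way the 6-hourly average is computed are assumptions for illustration, not the HadISD code itself:

```python
import datetime

def passes_selection(lat, lon, elev, start, end, monthly_obs_counts):
    """Two-stage station selection sketch:
    1. valid coordinates and >= 15 years between start and end dates
       (from isd-history.txt);
    2. observations in at least 180 distinct months, reporting on
       average every 6 hours or better (from isd-inventory.txt)."""
    # reject missing/placeholder metadata (assumed placeholder ranges)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180 and -999 < elev < 9999):
        return False
    if (end - start).days < 15 * 365.25:
        return False
    months_with_data = sum(1 for n in monthly_obs_counts if n > 0)
    if months_with_data < 180:
        return False
    # 6-hourly reporting = at least 4 observations per day on average
    mean_per_day = sum(monthly_obs_counts) / (months_with_data * 30.0)
    return mean_per_day >= 4.0
```

A station reporting four times a day for 40 years passes; a station with perfect metadata but only a 10-year record does not.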

Fig. 2. The number of stations which have data using the initial version of the updated station selection code.  The drops in 1972 and 2005 are still visible, but the gentle drop off from 1990 is less pronounced when compared to Fig. 1
We have dropped the reporting interval to every 6 hours rather than every 3 to try and select more stations in those regions where currently HadISD does not have many (e.g. central South America, Africa) but still maintain a subdaily resolution.

As can be seen in Fig. 2, there is still a large dip in 1972, and the drop in 2005 has also not entirely disappeared.  Some of these dips may be ameliorated by merging stations together.   However the drop off post 1990 is less prominent, and there are more stations overall.

To merge or not to merge?

As the ISD had many stations with short records, stations were merged when creating HadISD v1.0.x to create ones with longer records.  This was done using a hierarchical table (see Table 1 in the paper) to identify potential candidates, followed by an in-depth and time-consuming manual process to reduce this to the mergers used.  If the station selection is to be run on each major update, then selecting these merger candidates would have to be automated.

We could go back to the raw ISD listings and find stations which are merging candidates with the ~8500 initially selected.  By merging these in, some of the gaps in Fig. 2 could be filled (but also possibly not).

As we have found over time, not all of these mergers are correct, and therefore a number of options present themselves:

  1. We can not merge at all, keep all the ISD station IDs as unique and be confident that by creating HadISD we have not degraded any of the data.
  2. We can merge only in cases where we have specific information (from a national met service, for example) as to the identity of stations.  This could also be applied in cases where we have information indicating that a split of a station record would be appropriate.
  3. We can merge (and split) when we have specific information, but also run an automated procedure to identify candidate stations to merge together.  An example algorithm has been produced by the International Surface Temperature Initiative databank v1.0 (see description in the paper).  This approach, though, is very likely to introduce spurious mergers, however careful we are with the algorithm.
We have not yet decided which route to follow, but are erring towards the second in the first instance.  

If you have any further suggestions or preferences, please leave a comment or get in touch.

Wednesday, 3 September 2014

Assessing the Homogeneity of HadISD

Apologies for the delay in writing this post, I have been distracted by working on the next version of our QC suite for the HadISD dataset.

Last month, our paper on the Pairwise Homogeneity Assessment of HadISD (Dunn et al, 2014) was published in Climate of the Past.  I have blogged about some of this work before here.

Pairwise Homogeneity Assessment

We used the Pairwise Homogenisation Algorithm (PHA) used for the US Historical Climate Network (USHCN) by Menne & Williams (2009).  This algorithm has the advantage of being able to run automatically for large networks of stations.  As there are ~6000 stations in HadISD, this was an important consideration when selecting which algorithm to use.  

The PHA has also been benchmarked for the USHCN stations (Williams et al, 2012) and also as part of an inter-algorithm comparison run as the COST-HOME project (Venema et al, 2012).  In the COST-HOME analysis, PHA was not the best performing, but was recommended as one of the algorithms to use when performing homogenisation on a monthly basis.  We would ideally like to be able to use a number of algorithms on the HadISD data to understand and be able to quantify (to some level) the uncertainties in the change point locations and adjustment magnitudes.  This will hopefully be available in future releases of HadISD.

As the paper is open access, I'm not going to go through the full details of the final methodology here.  However, if something is unclear in the paper, do leave a comment or get in touch.

Example Application - Global Land-Surface Temperatures

We decided from the outset that we would be unable to apply any of the adjustments found to the hourly data.  The adjustments were calculated using monthly averages, and so it is very likely that these are not going to be the correct values to use for the hourly data.  In fact, we may never be able to calculate appropriate adjustment magnitudes for this high time-resolution data as they are likely to depend on the cause of the inhomogeneity, the time of day and possibly even the weather type.  For example, the effect of moving a station will be different at different times of day (even with the same weather) as the sun hits the screen earlier or later; and if the weather changes, the new location may respond differently over the course of a day than the old.  Determining this from just the data itself (using no metadata as is the case for many stations) will be very hard.

So, what can be done with a list of change point dates and adjustment magnitudes?  Well, we thought that we could see what the "global" average land-surface temperature change from 1973-2013 is when taking stations with fewer and fewer or smaller and smaller inhomogeneities ("global" is in quotation marks because of the station distribution of HadISD - some regions are better sampled than others).   This would show (a) if there was an effect from excluding the more inhomogeneous stations, (b) what this effect was and (c) what other issues arise.

We were also inspired by the work of Callendar (1938, 1961), who used very few stations to estimate the global land-surface temperature and obtained results in agreement with the latest best estimates (see article by Ed Hawkins).  If Callendar's work obtained accurate results with few stations, then using subsets of the HadISD stations should do so too.
Fig. 1. All 6103 stations from HadISD (black), CRUTEM4 (red) and CRUTEM4 with matched coverage (blue).  Bottom panel shows the deviations (with the full CRUTEM4 divided by 10 for clarity)

When using all 6103 stations there is already very good agreement between HadISD and the matched CRUTEM4 (Jones et al, 2012) - see Fig. 1.  The HadISD timeseries line (black) is not easily distinguished from the matched CRUTEM4 (blue), and the linear trends match very well.  We use linear trends in this case to easily summarise the change over the 40 years of data.  For this study, we use CRUTEM4 as our known comparison field.  Therefore we compare both against a version where we match the coverage, which is the best possible result we could achieve with HadISD, and the full version, which gives information as to what we are missing as a result of the incomplete coverage that HadISD has.
Fig. 2.  As for Fig. 1, but only taking stations where the largest inhomogeneity is less than 1C.
Only retaining stations where the largest inhomogeneity is <1C (4300 stations) improves the agreement, both for the linear trends and for the deviations and root-mean-square errors (bottom panel).  Hence using the homogenisation assessment to select those stations which are relatively homogeneous does result in a better estimate.  However, the RMS against the full CRUTEM4 has increased from 0.13 to 0.17.
Fig.3.  As for Fig.1, but only taking stations where the largest inhomogeneity is less than 0.5C.
In Fig. 3 we were even more restrictive and here, although there was still a good match between HadISD and the matched CRUTEM4, the RMS has increased fractionally.  Similarly (not shown here, but in the paper) if we also restricted the number of change points that were detected in a station, the agreement also deteriorated, but not as rapidly as with the size of the inhomogeneity.
Fig. 4. As for Fig. 1, but only taking those stations in which no inhomogeneity was found or which could not be assessed.
Finally, in Fig. 4, we show the result when taking only those 1458 stations in which no inhomogeneity was found or which could not be assessed by PHA.  Here the agreement between HadISD and CRUTEM (both full and matched) is clearly deteriorating, although the linear trends do still agree within their uncertainties.  Also, the HadISD line does not emerge from the CRUTEM uncertainty envelope, even using these restrictions.

This indicates that eventually it is the coverage, or which exact underlying stations are included, that starts to dominate any error, rather than the station quality itself.  However, removing the most inhomogeneous stations does result in better agreement, and also greater confidence on the part of researchers that results obtained are accurate.  Hence there is a balance to be drawn, and this will depend on the problem being addressed and also, at some level, on individual preference.  We can also use the results to double-check our merging procedure, as some inhomogeneities are likely to arise from stations that should not have been merged together.


Callendar, G. (1938). "The artificial production of carbon dioxide and its influence on temperature", Quarterly Journal of the Royal Meteorological Society, 64 (275), 223-240 DOI: 10.1002/qj.49706427503

Callendar, G. (1961). "Temperature fluctuations and trends over the earth", Quarterly Journal of the Royal Meteorological Society, 87 (371), 1-12 DOI: 10.1002/qj.49708737102

Dunn, R et al. (2014) "Pairwise homogeneity assessment of HadISD", Climate of the Past, 10, 1501

Hawkins, E & Jones, P (2013) "On increasing global temperatures: 75 years after Callendar", Quarterly Journal of the Royal Meteorological Society, 139, DOI: 10.1002/qj.2178

Jones, P et al. (2012) "Hemispheric and large-scale land-surface air temperature variations: An extensive revision and an update to 2010", Journal of Geophysical Research: Atmospheres", 117

Menne, M & Williams, C (2009) "Homogenization of temperature series via pairwise comparisons", Journal of Climate, 22, 1700

Venema, V et al (2012) "Benchmarking homogenization algorithms for monthly data", Climate of the Past, 8, 89

Williams, C, Menne, M & Thorne, P (2012) "Benchmarking the performance of pairwise homogenization of surface temperatures in the United States", Journal of Geophysical Research: Atmospheres, 117

Monday, 9 June 2014

Cows, Milk and Heat Stress

We've just had a paper published in ERL on heat stress in UK dairy cattle and the effect this has on milk yields.  This was initially a short piece of work which at the time was the first application of HadISD to a specific project.  However the project grew and took longer than anticipated, and hence has only been published now.

(UK) dairy cattle and heat stress

Many animals are affected by high temperatures and humidities.  Think about a hot, humid day: personally, I wouldn't have much energy and would prefer just lazing in the shade of a tree.  If, for example, I went for a run, I would get pretty warm, which could cause heat-stroke if I kept going for too long.  And if it is humid overnight, I don't sleep as well as normal either, which puts my body under more stress.  Cattle aren't often seen going for runs, but if they cannot cool themselves, they also get heat-stressed.  This means they divert energy from growing and producing milk to cooling down, and so you can measure the effect of the warm temperatures in the milk yields - which is handy, as asking them how they feel is tricky!

We used a measure for heat stress called the "Temperature-Humidity Index" (THI), which combines the air temperature and the relative humidity.  Both of these are available in the 68 HadISD stations used in this study.  When the THI rises above 70, cattle experience heat stress, and if it rises over ~90, the heat stress is very severe and could be fatal.  We looked at the daily average THI; across the UK, for most of the stations, this threshold of THI > 70 is only crossed on one or two days per year (Fig. 1).
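Several formulations of the THI exist; one widely used version, taking temperature in degrees Celsius and relative humidity in percent, is sketched below.  The paper may use a slightly different variant, so treat this as illustrative only:

```python
def thi(temp_c, rel_humidity):
    """Temperature-Humidity Index: one common formulation combining
    air temperature (deg C) and relative humidity (%).  Values above
    70 indicate heat stress in cattle; above ~90, severe stress."""
    return (1.8 * temp_c + 32) - (0.55 - 0.0055 * rel_humidity) * (1.8 * temp_c - 26)

thi(25.0, 60.0)   # a warm, humid day: exceeds the threshold of 70
thi(15.0, 60.0)   # a typical UK day: well below the threshold
```

Note that at high temperatures higher humidity increases the THI, capturing the reduced ability of cattle to cool themselves by evaporation.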

Fig 1.  Top: The average number of days (over 1973-2012) with THI > 70 for the 68 UK stations.  Bottom: The number of days with THI > 70 in 2003.  Stations with no days where THI > 70 are shown with grey squares.  Note change in colour scale between the two panels.
The stations inside the M25 (the motorway around London) have more days, but we suspect that there are few cattle being raised that close to London.  Looking at the years of 2003 and 2006, which had very warm summers, there were 5-10 days where cattle could be heat-stressed (Fig. 1).  Using data provided by the Cattle Information Service we were able to look at data for milk yields on an animal-by-animal basis for a number of herds in the areas most likely to have been strongly affected by the high temperatures.

Factors affecting Milk Yields

There are a number of complicating factors when looking at milk yields from dairy cattle.  The amount of milk a cow produces depends on both the number of calves she has had, and also the number of days that have elapsed since her most recent calf was born (Fig 2.) 

Fig.2 Top: Yield vs days in milk, Bottom: Yield vs lactation number for herd 198 in Devon.  Blue points show individual measurements and the red ones show the average (over a 10 day bin or lactation number) with a 1-sigma uncertainty.

By looking at data on an animal-by-animal basis we were able to select cows in their first lactation, and use ranges of 50 days since the birth of their calf, to try and reduce the variation from these additional effects.  Fig. 3 shows the change in the milk yields for one herd over the entire record.  There is a clear decline in milk yields in 2006 for all ranges of days-in-milk: the yield drops from 30 litres to between 15 and 20 litres.  There are few cattle contributing prior to 2004, resulting in a noisy curve and no clear indication of an effect in 2003.  Combined with the effect of heat stress is also the reduction in pasture quality for grass-fed cattle, as the fields tend to dry out during hot weather.  However, the feed information was not available.

Fig. 3 Milk yields for one-calf cows in herd 4199 in Somerset.  The bands are the 1-sigma ranges for the three ranges of days-in-milk.  The dashed lines show the numbers of cows contributing to the mean milk yields.  The magenta line shows the Somerset average monthly temperature (Perry & Hollis, 2005).  The vertical dashed lines show the heatwaves of 2003 and 2006.

Climate Projections from UKCP09 

To study the effect of any future change in the climate on the number of days with high THI we used the projections from the UK Climate Projections 2009 assessment (UKCP09, Murphy et al 2009).  This is an 11-member ensemble of regional climate model runs on a 25 x 25 km grid.  The future climate is driven by a medium emissions scenario (A1B).  Fig. 4 shows the number of days in each grid box where the THI > 70 for the south-west region.  Currently, as in the observations, there are only a few days on average where the threshold is exceeded.  But by the end of the century, this could have risen to around 30 days per year.
Fig. 4. The number of days per grid box where the THI > 70, averaged over the south-west region.  Each of the 11 ensemble members are shown in the grey lines, with the median and interquartile range being given by the thick black line and blue envelope respectively.

This could have a large effect on the milk yields produced by cows, and so impact on the viability of herds and dairy farming in parts of the UK.  Although keeping cattle indoors can mitigate the effect of direct solar radiation, the humidity in barns has been found to be consistently higher than outdoors (Erbez et al, 2010).  Thought will have to be given in the future as to how best to keep cattle for their well-being and also to ensure that dairy farming remains viable in parts of the UK.


Erbez M, Falta D and Chládek G 2010 The relationship between temperature and humidity outside and inside the permanently open-sided cows’ barn Acta Universitatis Agriculturae et Siliviculturae Mendelianae Brunensis (Brno, Česká Republika), LVIII 91–6
Murphy J M et al 2009 UK Climate Projections Science Report: Climate Change Projections Met Office Hadley Centre, Exeter, UK

Tuesday, 29 April 2014

HadISD v1.0.2.2013f released

The latest version of HadISD has been made available on the hadobs website.  This version (v1.0.2.2013f) supersedes the preliminary version from earlier this year (v1.0.2.2013p).  There were further updates to the ISD source data for the year 2013 since the preliminary dataset was created in January, but no changes in earlier years.

Extra variables - the wind gust and the precipitation period - have been pulled through from the ISD source data in this release.  As these have not been quality controlled, there has been no further increment of the version number.  If you use these variables, be aware that they are provided as-is, with no guarantee as to their quality.

As always, if you find anything untoward in the data, please contact the dataset maintainers.

Monday, 14 April 2014

First steps in homogenising HadISD

Having done the second annual update to HadISD in January this year (to version, we have started the process of homogenising the dataset.  The issue of homogenising hourly data (applying the adjustments to the data) is something that has not yet been fully solved.  Monthly homogenisation has been used for a while now, and there has been at least one benchmarking study to assess the accuracy and precision of the different available algorithms (Venema et al 2012).  Solving the problem of daily homogenisation has been started by some groups, but my impression is that these have been for small, regional networks of stations and also involved a relatively large amount of manual intervention.  I am open to suggestions of algorithms and studies that I have missed.

In the light of these issues, rather than trying to solve the problem of automated, hourly homogenisation in one step, we shall start by releasing the homogeneous sub-periods for each station in the dataset.  However this means that the users will need to decide what to do with this information.  For example, each sub-period could be treated separately or stations with few/small breaks could be given a greater weighting than those with many/large breaks in any analysis.

As HadISD contains 6103 stations, we have had to use methods and scripts which allow for a completely automated system.  This has the advantage that the results are completely reproducible and objective, even if a system which includes some manual checking might be better in some situations. 

We have chosen to use the Pairwise Homogenisation Algorithm (PHA) used for the US Historical Climate Network (USHCN) by Menne & Williams (2009).  Kate Willett has already used this for her HadISDH dataset (Willett et al. 2013) and is using it for the extension to other humidity variables.  We could therefore be certain that this system would run on the data automatically and be quick enough to be of use.  Alternative systems were considered (e.g. SPLIDHOM/HOMER, ACMANT, MASH), but none of these were suitable, either because of the computer operating systems available or because of the level of manual intervention required.  This was a shame, as a comparison between two or more systems would have given some level of confidence in the breaks found.  Perhaps something for the future.

Networks and Averages

When starting the homogenisation process with PHA, we found that the results were sensitive to the station network used.  Small changes in the neighbour selection and also in the individual monthly values would mean that change points were or were not found in target stations.  Hence we initially decided to use four different networks, comprising stations with more than 30, 20, 10 and zero years of data (the final network contains all 6103 stations).  PHA was run on each of these networks separately.

Initially only monthly average data were used, calculated from daily averages.  These are calculated for all days which have more than four observations spread over at least a 12 hour time-span.  Monthly means were calculated for all months with at least 20 qualifying days.  However, Wijngaard et al. (2003) showed that change points were clearer when using the monthly average diurnal temperature range.  Also, monthly average maximum and minimum temperatures were used by Trewin (2013) when homogenising the Australian Climate Observations Reference Network - Surface Air Temperature (ACORN-SAT).  Using these measures could identify change points where the maximum or minimum temperatures change but the means remain unchanged.
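The averaging rules above can be sketched minimally as follows, assuming observations arrive as (hour, value) pairs; this is an illustration, not the HadISD code itself:

```python
def daily_mean(obs):
    """Daily mean from a list of (hour, value) observations.  Requires
    more than four observations spread over at least a 12-hour span,
    as described above; otherwise the day is rejected (None)."""
    if len(obs) <= 4:
        return None
    hours = [h for h, _ in obs]
    if max(hours) - min(hours) < 12:
        return None
    return sum(v for _, v in obs) / len(obs)

def monthly_mean(daily_means):
    """Monthly mean from daily means; requires at least 20 qualifying
    (non-None) days in the month."""
    valid = [d for d in daily_means if d is not None]
    if len(valid) < 20:
        return None
    return sum(valid) / len(valid)
```

The 12-hour-span requirement prevents, for example, five morning-only observations producing a "daily" mean biased towards the cooler part of the day.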

We initially tried to use all four different measures (mean, diurnal range, maxima and minima) as well as the four different network types, resulting in 16 PHA runs for each variable (temperature, dewpoint temperature, sea-level pressure and wind speeds).  Change points were merged if they occurred within one year of one another, and the average date was used.  The final set of change points were those which were identified in at least two of the 16 methods.
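The consolidation of change points across the 16 runs could be sketched as below, assuming dates are expressed as day numbers and using a simple greedy grouping (the actual implementation may differ):

```python
def consolidate(change_points, min_detections=2, window_days=365):
    """Merge change-point dates from multiple PHA runs: dates within
    one year of one another are grouped and replaced by their average
    date, and a group is only retained if it was found by at least
    `min_detections` runs."""
    merged = []
    for d in sorted(change_points):
        if merged and d - merged[-1][-1] <= window_days:
            merged[-1].append(d)
        else:
            merged.append([d])
    return [round(sum(g) / len(g)) for g in merged if len(g) >= min_detections]

consolidate([100, 200, 2000, 5000, 5100])
# groups {100, 200} and {5000, 5100} survive; the lone 2000 is dropped
```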

Although using all these methods and networks may compensate for the conservative nature of PHA (Venema et al. 2012), this approach was biased to selecting change points in stations with longer records.  If a change point is detected 50% of the time by any of the PHA runs, then it is likely to appear 8 times overall for a station with a long record, but only once for a station with a short record, and hence fail to meet the selection criteria in the short station, but be selected in the longer station.

We therefore reduced the complexity to use only the complete network of 6103 stations, and only the mean and diurnal range (temperature and dewpoint), mean and maximum (wind speeds) or mean only (SLP).  If change points were within one year of one another, they were merged.  Naturally this reduced the number of change points detected.  However, despite not applying adjustments to the data, we do not want to make the data worse, and this approach is less likely to include spurious change points in the final lists than if using the combination of all 16 methods. 

Final Methodology

The most important change points are those with the largest adjustment values, which should be detected in this simpler analysis as well as in the more complex one.  The change points with the smallest adjustment values are difficult to detect using any of the methods, resulting in the characteristic "missing middle" when showing the distribution of adjustments (Fig. 1).  Assuming that the Gaussian envelope is an accurate representation of all adjustments in HadISD, then PHA has identified most down to a limit of around 0.5C, with no strong bias in the distribution.
Fig. 1 Distribution of adjustment sizes for the monthly average temperatures.  The raw adjustments are shown in black, with a fitted Gaussian in red.  The difference between this Gaussian and the detected adjustments is shown by the blue histogram.

Unsurprisingly, the longer the station record, the more change points are detected, with on average 2.8 per station over 41 years (roughly one every 15 years), but four stations have 11 change points (Fig. 2). 

Fig. 2. The distribution of the number of change points with the length of the station record.
In due course, the change point dates and adjustment sizes (which have not been applied to the data) will be made available on the HadISD website.

The paper describing the final methodology in detail, and also the characteristics of the change points in the dewpoint temperature, sea-level pressure and wind speed observations is now under open review with Climate of the Past:


Menne, M. J. and Williams, C. N.: Homogenization of temperature series via pairwise comparisons, Journal of Climate, 22, 1700–1717, 2009.

Trewin, B.: A daily homogenized temperature data set for Australia, International Journal of Climatology, 33, 1510–1529, 2013.

Venema, V., Mestre, O., Aguilar, E., Auer, I., Guijarro, J. A., Domonkos, P., Vertacnik, G., Szentimrey, T., Stepanek, P., Zahradnicek, P., et al.: Benchmarking homogenization algorithms for monthly data, Climate of the Past, 8, 89–115, 2012.

Wijngaard, J., Klein Tank, A., and Koennen, G.: Homogeneity of 20th century European daily temperature and precipitation series, International Journal of Climatology, 23, 679–692, 2003.

Willett, K. M., et al.: HadISDH: an updateable land surface specific humidity product for climate monitoring, Climate of the Past, 9, 657–677, 2013.

Monday, 7 April 2014

Low windspeeds in Irish stations

Thanks to Clive Wilson (Met Office) for informing us that the wind speeds in the Irish stations between July 1996 and August 1998 are lower than the surrounding years.  An example is shown in Fig 1. for Dublin.

Fig. 1 - Wind speeds for Dublin (039650-99999).  The vertical lines are change points detected on a monthly basis using the PHA algorithm of Menne & Williams (2009).  There is a change in resolution in the middle of 1998 coinciding with the change point.

Of the 14 Irish stations in HadISD, 12 have continuous data across this period (039520-99999 Roches Point and 039700-99999 Claremorris have no or only sporadic data across this period).  Most of these periods are identified by the PHA homogenisation algorithm that we are in the process of applying to HadISD.  

The affected stations are:

039550 99999 CORK AIRPORT
039570 99999 ROSSLARE
039600 99999 KILKENNY
039620 99999 SHANNON AIRPORT
039650 99999 BIRR
039690 99999 DUBLIN AIRPORT
039710 99999 MULLINGAR
039740 99999 CLONES
039760 99999 BELMULLET
039800 99999 MALIN HEAD 

For the moment we advise users of HadISD to be cautious when using wind speed data for these stations over this period.  We are investigating the cause of this low period with the maintainers of the ISD at NCDC and will update this post when we have more information.
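One way for users to apply that caution is to treat wind-speed observations from the listed stations as missing over the affected window.  This is a hypothetical user-side precaution, not an official fix: the station IDs come from the list above, but the record layout (id, timestamp, wind speed) and the exact end of the window (we take July 1996 through August 1998 inclusive) are our own assumptions.

```python
from datetime import datetime

# Station IDs from the list above (ISD id-WBAN form, as used in HadISD).
AFFECTED = {"039550-99999", "039570-99999", "039600-99999", "039620-99999",
            "039650-99999", "039690-99999", "039710-99999", "039740-99999",
            "039760-99999", "039800-99999"}

# Suspect window: July 1996 to August 1998 inclusive (assumed end point).
START, END = datetime(1996, 7, 1), datetime(1998, 9, 1)

def mask_suspect_wind(station_id, timestamp, wind_speed):
    """Return None in place of a wind-speed value that falls in the
    suspect window for an affected Irish station; pass others through."""
    if station_id in AFFECTED and START <= timestamp < END:
        return None
    return wind_speed
```

Observations from unaffected stations, or outside the window, are returned unchanged.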

Monday, 27 January 2014


We are in the process of finalising the update to HadISD, version v1.0.2.2013p.  All plots and files should appear on the website later this week.  This update extends the coverage of the dataset to the end of 2013 (31 December at 2300 inclusive).  It remains a preliminary dataset as there could still be further updates to the ISD dataset in the next few months.  We hope to do a processing run for the final version some time around Easter (to create v1.0.2.2013f).

We decided not to run an update last year (to what would have been v1.0.1.2012f) as the maintainers of the ISD were doing some large updates to the raw files.  It would only make sense to do the update once the ISD was stable, which would have meant our update being released towards the end of the year.  However, we hope that this year we can stick to our planned update cycle.

The raw data were downloaded on 14th January 2014, and processed over the subsequent week.  There have been changes to all of the raw files in 2010, 2011 and 2012 as part of the ISD update process mentioned above.  We have made no substantial changes to the codes which do the conversion to NetCDF files or the Quality Control suite.  Hence the version number has only incremented by 0.0.1 and the year.

This version still contains 6103 stations, with 4071 passing the final filtering checks, down slightly from the 4206 in v1.0.1.2012p (see the HadISD paper, Section 6).  The patterns of flagging are very similar to v1.0.1.2012p.  However, if you find something strange, do let us know using the contact details on the HadISD website.  Please note the stations which are known to have issues, documented on this blog and on the website.

Percentage of data removed by the QC tests for Temperature in HadISD v1.0.2.2013p
Percentage of data removed by the QC tests for Dewpoint Temperature in HadISD v1.0.2.2013p

Percentage of data removed by the QC tests for SLP in HadISD v1.0.2.2013p.  SLP is not reported at all time stamps, and so with shorter records the amount removed can appear higher.

We hope to have time to do some more development work on HadISD during 2014, which will address these stations as well as other improvements we have in mind.  So, if there are any requests, do get in touch.