Wednesday 3 September 2014

Assessing the Homogeneity of HadISD

Apologies for the delay in writing this post; I have been distracted by working on the next version of our QC suite for the HadISD dataset.

Last month, our paper on the Pairwise Homogeneity Assessment of HadISD (Dunn et al, 2014) was published in Climate of the Past.  I have blogged about some of this work before here.

Pairwise Homogeneity Assessment

We used the Pairwise Homogenisation Algorithm (PHA) developed for the US Historical Climatology Network (USHCN) by Menne & Williams (2009).  This algorithm has the advantage of being able to run automatically for large networks of stations.  As there are ~6000 stations in HadISD, this was an important consideration when selecting which algorithm to use.

The PHA has been benchmarked both for the USHCN stations (Williams et al, 2012) and as part of an inter-algorithm comparison run as the COST-HOME project (Venema et al, 2012).  In the COST-HOME analysis, PHA was not the best performing algorithm, but it was recommended as one of the algorithms to use when homogenising monthly data.  Ideally we would like to run a number of algorithms on the HadISD data so that we can understand and quantify (to some level) the uncertainties in the change point locations and adjustment magnitudes.  This will hopefully be available in future releases of HadISD.

As the paper is open access, I'm not going to go through the full details of the final methodology here.  However, if something is unclear in the paper, do leave a comment or get in touch.

Example Application - Global Land-Surface Temperatures


We decided from the outset that we would be unable to apply any of the adjustments found to the hourly data.  The adjustments were calculated using monthly averages, and so it is very likely that these are not going to be the correct values to use for the hourly data.  In fact, we may never be able to calculate appropriate adjustment magnitudes for this high time-resolution data as they are likely to depend on the cause of the inhomogeneity, the time of day and possibly even the weather type.  For example, the effect of moving a station will be different at different times of day (even with the same weather) as the sun hits the screen earlier or later; and if the weather changes, the new location may respond differently over the course of a day than the old.  Determining this from just the data itself (using no metadata as is the case for many stations) will be very hard.
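
To illustrate why a single monthly adjustment cannot simply be applied to hourly values, here is a minimal sketch.  The hourly offsets are entirely hypothetical (a station move that warms daytime readings by 1 degree and leaves the night unchanged); they are not taken from HadISD or from the paper, and the function name is ours.

```python
from typing import List, Tuple


def residual_after_mean_adjustment(hourly_shift: List[float]) -> Tuple[float, List[float]]:
    """Return the mean shift and the error left at each hour of the day
    if that single mean value is used to correct every hour."""
    mean_shift = sum(hourly_shift) / len(hourly_shift)
    residuals = [shift - mean_shift for shift in hourly_shift]
    return mean_shift, residuals


# Hypothetical inhomogeneity: +1.0 between 06:00 and 17:59, 0.0 at night.
hourly_shift = [1.0 if 6 <= hour < 18 else 0.0 for hour in range(24)]

mean_shift, residuals = residual_after_mean_adjustment(hourly_shift)

# An average-based adjustment sees only the mean shift, so correcting
# every hour by that value leaves errors of opposite sign by day and night.
print("mean shift:", mean_shift)
print("largest remaining hourly error:", max(abs(r) for r in residuals))
```

In this toy case the averaged adjustment removes the mean shift but leaves half of it as an error at every hour, which is the kind of problem that led us not to adjust the hourly data.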

So, what can be done with a list of change point dates and adjustment magnitudes?  Well, we thought that we could see how the "global" average land-surface temperature change from 1973-2013 varies when taking stations with fewer and fewer or smaller and smaller inhomogeneities ("global" is in quotation marks because of the station distribution of HadISD - some regions are better sampled than others).  This would show (a) whether excluding the more inhomogeneous stations has an effect, (b) what this effect is and (c) what other issues arise.
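
The selection step described above can be sketched as follows.  This is only an illustration under stated assumptions: the dictionary layout, the station IDs, the adjustment values and the function name `select_stations` are all hypothetical, and this is not the code used for the paper.

```python
from typing import Dict, List, Optional


def select_stations(
    change_points: Dict[str, List[float]],
    max_magnitude: Optional[float] = None,
    max_count: Optional[int] = None,
) -> List[str]:
    """Return the IDs of stations that pass both limits.

    change_points maps a station ID to the adjustment magnitudes (deg C)
    of its detected change points; an empty list means none were found.
    """
    selected = []
    for station_id, adjustments in change_points.items():
        largest = max((abs(a) for a in adjustments), default=0.0)
        # Drop stations whose largest inhomogeneity reaches the limit.
        if max_magnitude is not None and largest >= max_magnitude:
            continue
        # Drop stations with more change points than allowed.
        if max_count is not None and len(adjustments) > max_count:
            continue
        selected.append(station_id)
    return sorted(selected)


# Hypothetical input, for illustration only.
example = {
    "station_a": [],
    "station_b": [0.3, -0.4],
    "station_c": [1.6],
    "station_d": [-0.8, 0.2, 0.1],
}

print(select_stations(example, max_magnitude=1.0))
print(select_stations(example, max_magnitude=0.5))
print(select_stations(example, max_count=0))
```

Tightening either limit shrinks the subset, so each choice trades station quality against coverage.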

We were also inspired by the work of Callendar (1938, 1961), who used very few stations to estimate the global land-surface temperature and obtained results in agreement with the latest best estimates (see article by Ed Hawkins).  If Callendar obtained accurate results with few stations, then using subsets of the HadISD stations should too.
Fig. 1. All 6103 stations from HadISD (black), CRUTEM4 (red) and CRUTEM4 with matched coverage (blue).  Bottom panel shows the deviations (with the full CRUTEM4 divided by 10 for clarity)

When using all 6103 stations there is already very good agreement between HadISD and the matched CRUTEM4 (Jones et al, 2012) - see Fig. 1.  The HadISD timeseries line (black) is not easily distinguished from the matched CRUTEM4 (blue), and the linear trends match very well.  We use linear trends in this case to easily summarise the change over the 40 years of data.  For this study, we use CRUTEM4 as our known comparison field.  We therefore compare HadISD against both a version of CRUTEM4 where we match the coverage, which is the best possible result we could achieve with HadISD, and the full version, which shows what we are missing as a result of HadISD's incomplete coverage.
Fig. 2.  As for Fig. 1, but only taking stations where the largest inhomogeneity is less than 1C.
Retaining only stations where the largest inhomogeneity is <1C (4300 stations) improves the agreement, both for the linear trends and for the deviations and root-mean-square (RMS) errors (bottom panel).  Hence using the homogenisation assessment to select those stations which are relatively homogeneous does result in a better estimate.  However, the RMS against the full CRUTEM4 has increased from 0.13 to 0.17.
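
For readers who want to reproduce this kind of comparison on their own series, here is a minimal sketch of the two summary statistics used above: an ordinary least-squares linear trend and the RMS difference between two timeseries.  The function names and the short example series are hypothetical, and the paper's own trend and uncertainty calculation may differ.

```python
import math
from typing import Sequence


def linear_trend(years: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope, in units of the values per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def rms_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square of the point-by-point differences between a and b."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical annual anomalies (deg C), for illustration only.
years = [1973, 1974, 1975, 1976]
series_a = [0.0, 0.1, 0.2, 0.3]
series_b = [0.1, 0.1, 0.3, 0.3]

print("trend of series_a per decade:", 10 * linear_trend(years, series_a))
print("RMS difference:", rms_difference(series_a, series_b))
```
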
Fig. 3.  As for Fig. 1, but only taking stations where the largest inhomogeneity is less than 0.5C.
In Fig. 3 we were even more restrictive.  Although there is still a good match between HadISD and the matched CRUTEM4, the RMS has increased fractionally.  Similarly (not shown here, but in the paper), if we also restrict the number of change points detected in a station, the agreement deteriorates, but not as rapidly as when restricting the size of the inhomogeneity.
Fig. 4. As for Fig. 1, but only taking those stations in which no inhomogeneity was found or which could not be assessed.
Finally, in Fig. 4, we show the result when taking only those 1458 stations in which no inhomogeneity was found or which could not be assessed by PHA.  Here the agreement between HadISD and CRUTEM4 (both full and matched) is clearly deteriorating, although the linear trends do still agree within their uncertainties.  Also, the HadISD line does not emerge from the CRUTEM4 uncertainty envelope, even using these restrictions.


This indicates that eventually it is the coverage, or exactly which underlying stations are included, that starts to dominate the error, rather than the station quality itself.  However, removing the most inhomogeneous stations does result in better agreement, and also gives researchers greater confidence that the results obtained are accurate.  Hence there is a balance to be drawn, and this will depend on the problem being addressed and also, at some level, on individual preference.  We can also use the results to double check our merging procedure, as some inhomogeneities are likely to arise from stations that should not have been merged together.

References


Callendar, G. (1938). "The artificial production of carbon dioxide and its influence on temperature", Quarterly Journal of the Royal Meteorological Society, 64 (275), 223-240 DOI: 10.1002/qj.49706427503

Callendar, G. (1961). "Temperature fluctuations and trends over the earth", Quarterly Journal of the Royal Meteorological Society, 87 (371), 1-12 DOI: 10.1002/qj.49708737102

Dunn, R et al. (2014) "Pairwise homogeneity assessment of HadISD", Climate of the Past, 10, 1501

Hawkins, E & Jones, P (2013) "On increasing global temperatures: 75 years after Callendar", Quarterly Journal of the Royal Meteorological Society, 139, DOI: 10.1002/qj.2178

Jones, P et al. (2012) "Hemispheric and large-scale land-surface air temperature variations: An extensive revision and an update to 2010", Journal of Geophysical Research: Atmospheres, 117

Menne, M & Williams, C (2009) "Homogenization of temperature series via pairwise comparisons", Journal of Climate, 22 (7), 1700
 
Venema, V et al. (2012) "Benchmarking homogenization algorithms for monthly data", Climate of the Past, 8, 89
 
Williams, C N, Menne, M L & Thorne, P W (2012) "Benchmarking the performance of pairwise homogenization of surface temperatures in the United States", Journal of Geophysical Research: Atmospheres, 117