Wednesday, 11 December 2019

Another minor version change

The HadISD code for versions 2 and 3 is written in Python (having migrated from the IDL code in version 1).  However, it has been using Python2.7, support for which is ending on January 1st 2020.  Therefore, I have updated the codebase to ensure that it is Python3 compatible.

During this process, it is possible that I will have missed some of the changes needed to ensure continuity across the versions (e.g. integer versus float division as default).  However I have found one bug which seems to have been present since the creation of the Python version in 2014.

The world records check compares observation values to the world records for each continent (WMO region) as held by the WMO.  Unfortunately in prior versions only the global values were being used, and so this check was not as powerful as it could have been.  For many cases, the observation values that exceeded regional records but not the global ones will have been picked up by other checks (e.g. climatological or distribution).

Below I show the images for the old and new versions of the test.  These are on very slightly different runs (one from v3.0.1.201910p and one from a test run of the new code, which also updated the station counts slightly).  As a fraction of the number of observations in each station, the increase in flagging is less than 0.1% (and in the stations I've checked numbers in the range of single observation to a few tens).
Fig 1: World record checks for temperature in v3.0.1.201910p (Python 2.7 live version)
Fig 2: World record checks for temperature in v3.1.0.201910p (test version of Python 3)
As a result of (a) the update to Python3, and (b) the bug-fix of the world record check, the version released in December (which includes data up to the end of November 2019) is 3.1.0.201911p (as opposed to 3.0.1.201911p).  No other changes have been made, and we expect to release the final version for 2019 updates in early January (3.1.0.2019f).