During this process, I may have missed some of the changes needed to ensure continuity between the versions (e.g. integer versus float division as the default). However, I have found one bug which seems to have been present since the Python version was created in 2014.
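As an illustration of the division change mentioned above, here is a minimal sketch of how the same expression behaves under Python 3. The variable names are made up for this example and are not taken from the actual code base.

```python
# Illustrative counts only (hypothetical names, not from the real code)
n_flagged = 3
n_obs = 4

# Python 3: "/" is always true division, so this gives 0.75.
# Under Python 2 the same expression on two ints truncated to 0.
fraction = n_flagged / n_obs

# "//" is floor division in both Python 2 and Python 3,
# so use it wherever the old integer behaviour is actually intended.
whole = n_flagged // n_obs

print(fraction, whole)
```

Any place where the Python 2 code relied on the truncating behaviour (for example when computing an array index) needs `//` in Python 3 to give the same result.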
The world records check compares observation values to the world records for each continent (WMO region) as held by the WMO. Unfortunately, in prior versions only the global values were being used, so this check was not as powerful as it could have been. In many cases, the observation values that exceeded regional records but not the global ones will have been picked up by other checks (e.g. climatological or distribution).
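To make the difference concrete, here is a rough sketch of a record check that looks up a regional limit rather than always using the global one. It is not the actual implementation: the function name, the region keys and the limit values are all placeholders chosen for illustration, and the numbers are not the real WMO records.

```python
# Placeholder upper temperature limits in degrees C.
# These are NOT the WMO record values, just numbers for illustration.
MAX_TEMPERATURE = {
    "global": 60.0,
    "europe": 50.0,
}


def world_record_flags(values, region="global"):
    """Return a list of booleans, True where a value exceeds the limit.

    Falls back to the global limit if the region is not known.
    """
    limit = MAX_TEMPERATURE.get(region, MAX_TEMPERATURE["global"])
    return [value > limit for value in values]


obs = [15.2, 48.9, 52.3]

# Behaviour equivalent to the old bug: only the global limit is applied
flags_global = world_record_flags(obs, "global")

# Intended behaviour: the limit for the station's region is applied
flags_regional = world_record_flags(obs, "europe")

print(flags_global)
print(flags_regional)
```

With these placeholder limits the last observation passes the global check but is flagged by the regional one, which is the kind of value the earlier versions missed.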
Below I show the images for the old and new versions of the test. These come from very slightly different runs (one from v3.0.1.201910p and one from a test run of the new code, which also updated the station counts slightly). As a fraction of the number of observations in each station, the increase in flagging is less than 0.1% (and in the stations I've checked, the numbers range from a single observation to a few tens).
Fig 1: World record checks for temperature in v3.0.1.201910p (Python 2.7 live version)
Fig 2: World record checks for temperature in v3.1.0.201910p (test version of Python 3)