Sunday, March 14, 2010

Slaying the Datum Shift Dragon

In the last few weeks, I believe I have made substantial improvements to the way datum shift information is derived from the EPSG dictionary for use in GDAL/OGR, PROJ.4 and related packages.

The EPSG database model supports having multiple datum shift options for a particular datum, such as "Potsdam Rauenberg 1950 DHDN" to get to WGS84. This reflects the reality that datum conversion is often an approximation for which there may be multiple reasonable solutions. Often different datum shift parameters are appropriate depending on the sub-region of the datum being used.

Many other coordinate system dictionary models and GIS systems do *not* provide for multiple datum shift options. For instance, the OGC Well Known Text representation of a coordinate system allows only one TOWGS84[] clause, indicating the preferred mapping to WGS84. The same is true of many software packages, including those based on PROJ.4 and its EPSG-based dictionary.

In the past, the code I had implemented for translating the EPSG database to a dictionary format carried datum shift parameters into the dictionary only if there was one, and only one, shift available in the EPSG database. I took this approach because it seemed very dangerous to arbitrarily select one of several possible datum shifts without any intrinsic knowledge of which was most appropriate. I was depending on users noticing that no datum shift was available in the default translation and taking this to mean that they had to do some research to establish what was most appropriate for them.

Predictably, the real result was massive confusion and complaints. At one time, if there was no datum shift available, PROJ.4 would just do a transformation based on the change of ellipsoid, which was often a very poor choice. So in PROJ 4.6.0 I altered the code to not attempt any datum shift transformation if either or both of the source and destination coordinate systems lacked datum shift information.

While this got rid of one family of errors, it also triggered lots of additional frustration and confusion. Part of this was just because there was a change of behavior. But it was also pushing people a bit harder to determine the appropriate datum shift and they did not like having to do this.

So, at last, I have taken the plunge and reworked the scripts used to translate the EPSG dictionary so that they attempt to pick one shift if there are several available. I follow a few heuristics in this effort.

1) I discard any datum shifts marked as deprecated.

2) I examine the supersession table to identify any datum shifts that have been superseded by newer forms, and ignore the superseded forms.

3) I try to pick the datum shift with the largest "area of use" region, under the assumption that it is likely to be the broad-use shift rather than one applicable only in a small area.

4) I examine the datum_shift_pref.csv file to see if there is a user-supplied preferred datum shift to use. If so, I use that.
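
To make the interplay of these four rules concrete, here is a minimal Python sketch of how they might combine. It assumes each candidate shift is a record carrying a deprecation flag and an area-of-use bounding box in degrees. The `DatumShift` class, its fields, and `pick_preferred()` are hypothetical illustrations, not the actual translation scripts or the EPSG table layout. "Largest area" is approximated here as the spherical area of the bounding box, and boxes crossing the antimeridian are not handled.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class DatumShift:
    # Hypothetical record: field names are illustrative, not the EPSG schema.
    code: int           # transformation code
    source_datum: int   # datum being shifted to WGS84
    deprecated: bool
    south: float        # area-of-use bounding box, in degrees
    north: float
    west: float
    east: float


def box_area(s: DatumShift) -> float:
    """Area of the bounding box on a unit sphere (steradians).

    Assumes west <= east, i.e. the box does not cross the antimeridian.
    """
    dlon = math.radians(s.east - s.west)
    return dlon * (math.sin(math.radians(s.north)) - math.sin(math.radians(s.south)))


def pick_preferred(datum: int,
                   shifts: List[DatumShift],
                   superseded: Set[int],
                   user_pref: Dict[int, int]) -> Optional[int]:
    """Return the code of the preferred shift for `datum`, or None."""
    mine = [s for s in shifts if s.source_datum == datum]

    # Rule 4: a user-supplied preference (as in datum_shift_pref.csv) wins outright.
    pref = user_pref.get(datum)
    if pref is not None and any(s.code == pref for s in mine):
        return pref

    # Rules 1 and 2: drop deprecated shifts and those superseded by newer ones.
    live = [s for s in mine if not s.deprecated and s.code not in superseded]
    if not live:
        return None

    # Rule 3: prefer the shift with the largest area of use.
    return max(live, key=box_area).code
```

Checking the user preference first is equivalent to applying it last, because it overrides whatever the other rules would choose. Honouring a preference that names a deprecated shift is an assumption of this sketch; the post does not say how that case is treated.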

The result of all this is a datum_shift.csv file which includes all the datum shifts, with one of them marked as preferred. That preferred version goes into the gcs.csv file for use with the associated geographic coordinate system.
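
The output step might look roughly like the following sketch. It writes every shift to a CSV file with a flag column and collects the flagged parameters for the GCS table. The column names, the three-parameter (dx, dy, dz) layout, and `write_shift_table()` are assumptions for illustration only. The real datum_shift.csv layout may differ, and many real shifts carry seven parameters rather than three.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def write_shift_table(shifts: List[dict],
                      preferred: Dict[int, int],
                      out: TextIO) -> Dict[int, Tuple[float, float, float]]:
    """Write every shift as a CSV row, flagging the preferred one per datum.

    `shifts` holds dicts with hypothetical keys: code, datum, dx, dy, dz.
    Returns datum -> (dx, dy, dz) of the preferred shift, i.e. the values a
    gcs.csv-style table would carry for that datum.
    """
    writer = csv.writer(out)
    writer.writerow(["shift_code", "datum_code", "dx", "dy", "dz", "preferred"])
    gcs_params = {}
    for s in shifts:
        is_pref = preferred.get(s["datum"]) == s["code"]
        writer.writerow([s["code"], s["datum"], s["dx"], s["dy"], s["dz"], int(is_pref)])
        if is_pref:
            gcs_params[s["datum"]] = (s["dx"], s["dy"], s["dz"])
    return gcs_params


if __name__ == "__main__":
    # Illustrative values only, not real datum shift parameters.
    demo = [
        {"code": 1, "datum": 100, "dx": 1.0, "dy": 2.0, "dz": 3.0},
        {"code": 2, "datum": 100, "dx": 4.0, "dy": 5.0, "dz": 6.0},
    ]
    buf = io.StringIO()
    print(write_shift_table(demo, {100: 2}, buf))
    print(buf.getvalue())
```

Keeping every shift in one table and deriving the GCS values from the flag means that changing the preferred choice only requires regenerating the GCS table, not re-running the selection heuristics.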

This seems to be working reasonably well, and I did a big pass through open tickets in GDAL, libgeotiff and PROJ.4 to find outstanding datum shift issues. I believe that the bulk of them are now resolved.

Currently GDAL and the PROJ.4 dictionary use only the preferred datum shift. But the intention of keeping the datum_shift.csv file, with all the possible shifts for any given GCS, is that savvy applications could let the user choose. I also hope that an easy mechanism will be added in the future so that users can alter the preferred setting in the datum_shift.csv file and then have the gcs.csv file regenerated. That remains to be done.

I'm also fixing a few other issues in the coordinate system realm, including support for axis orientation (i.e. South Orientated Transverse Mercator) and corrections to several other translations. I'd like to thank INGRES, who have supported this work as part of an effort to bring top-notch coordinate system support into the INGRES geospatial project. I'd also like to thank Jan Hartmann and Mikael Rittri, who assisted with suggested approaches and with verification of the results.