Thursday, March 28, 2013

Getting Started with GeoTools

TL;DR: Use Eclipse File->Import->Maven to build an Eclipse project for GeoTools.

The project I work on at Google (Earth Engine) uses the GeoTools Java library, primarily for coordinate system transformations.  We have been using it for a few years for this purpose and it mostly works ok though we have worked around a few quirks.  I think it is desirable for our team to be able to contribute at least some small fixes back upstream to GeoTools and so I've been struggling to get to the point where that can be done smoothly.

Contribution Agreements

The first challenge was that we needed permission to contribute.  Generally this is not a problem as Google is pretty open source friendly, but in order to get the GeoTools corporate contribution agreement signed we needed to clear it with our legal folks.  They balked at signing the original agreement which was ... ahem ... hand crafted ... during the GeoTools incubation process.  At this point I was prepared to back off on the effort as I didn't want to be the guy who raises a stink because his company's legal people were being difficult, as if that should make it the rest of the world's problem.  However, +Jody Garnett was concerned that others might run into similar problems and pushed through a GeoTools change request adopting a slight variation on the more widely accepted Apache contribution agreement.

The next step was internal, where my coworker +Eric Engle, who maintains our internal GeoTools tree, adjusted our use of GeoTools so we build from source and are in a position to patch our local version as we improve things.  I had tried to do this once last fall and failed to overcome some challenges related to special requirements of manifest documents for GeoTools.  Something to do with finding factories for things like EPSG databases and such.

MapProjection Tolerances

This put things back in my court.  I had a small internally reproducible problem with the spherical mercator projection not staying within the error bounds expected by GeoTools at some points on the globe.  After some debugging in Eclipse I came to the conclusion that:

  1. It is evil that MapProjection.transform() actually uses Java asserts to confirm that point transformations are invertible within a tolerance.  This code only triggers when assertions are enabled, and makes it impossible to run code with transformations outside a reasonable realm without triggering this error when assertions are enabled.
  2. The default getToleranceForAssertions() result is awfully tight - 1E-5 meters or 1/100th of a millimeter!
  3. For spherical projections it checks orthodromicDistance() which after stepping through the code I determined is very sensitive to rounding error for small distances.  Most locations near where I ran into a problem rounded to zero for distance in meters, while the slightly greater distance in radians at the problem point evaluated to something more like one centimeter, exceeding the error threshold. 
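
To make the third point concrete, here is a minimal sketch in plain Java with no GeoTools dependency.  It is illustrative only: it assumes the distance is computed with an acos-based formula (the spherical law of cosines), which may differ from what GeoTools actually does, and the class name, method names and sphere radius are all hypothetical choices.  The haversine version is included purely as a better-conditioned comparison, not as a proposed patch.

```java
public class SmallDistanceDemo {
    // WGS84 semi-major axis in meters, used here as the sphere radius (assumption).
    public static final double RADIUS = 6378137.0;

    /** Great-circle distance in meters via the spherical law of cosines (acos-based). */
    public static double lawOfCosines(double lon1, double lat1, double lon2, double lat2) {
        double y1 = Math.toRadians(lat1);
        double y2 = Math.toRadians(lat2);
        double dx = Math.toRadians(Math.abs(lon2 - lon1));
        double rho = Math.sin(y1) * Math.sin(y2)
                   + Math.cos(y1) * Math.cos(y2) * Math.cos(dx);
        // Clamp to the domain of acos; rounding can push rho slightly past 1.
        rho = Math.max(-1.0, Math.min(1.0, rho));
        return Math.acos(rho) * RADIUS;
    }

    /** Great-circle distance in meters via the haversine formula. */
    public static double haversine(double lon1, double lat1, double lon2, double lat2) {
        double y1 = Math.toRadians(lat1);
        double y2 = Math.toRadians(lat2);
        double sLat = Math.sin((y2 - y1) / 2);
        double sLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sLat * sLat + Math.cos(y1) * Math.cos(y2) * sLon * sLon;
        return 2 * Math.asin(Math.min(1.0, Math.sqrt(a))) * RADIUS;
    }

    public static void main(String[] args) {
        // Two points on the equator 1e-9 degrees apart (about a tenth of a millimeter).
        double step = 1e-9;
        System.out.println("law of cosines: " + lawOfCosines(0.0, 0.0, step, 0.0) + " m");
        System.out.println("haversine:      " + haversine(0.0, 0.0, step, 0.0) + " m");
    }
}
```

For points this close together the cosine term is so near 1 that the acos-based result either collapses to zero or jumps to a value well above a 1E-5 meter tolerance, while the haversine result tracks the true separation.  A round-trip check built on the first formula could therefore fail on rounding alone.
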
Well, those are the details, and I hope eventually to prepare a proper GeoTools bug report with a proposed patch.  In order to get to that I decided I needed to set up a "regular" GeoTools build that I could run tests against, and use to prepare a patch.  The GeoTools manual has great getting started information which I tried to follow.

Git Hate

GeoTools is hosted on github and I have a love/hate relationship with git.  Hmm, let's amend that.  I have a hate/hate relationship with git.  As I somewhat jestingly tell people, I haven't yet got a full decade out of subversion after learning it in the middle of last decade, so I start with a grudge about being forced to learn a new version control technology while the old one continues to work fine from my perspective.  But I also find git way more complicated than necessary.  Ultimately I determined that I needed to clone geotools on github so I would have a public repository that I could stage changes on and send pull requests from, since I can't issue pull requests for repositories living on my workstations at Google or at home.  Then I cloned that repository locally at work and eventually at home.  OK, I struggled a bit before concluding that is what was needed but I'm ok with that.

Next I came to the conclusion that there is no easy way from the github interface to refresh my github clone from the official git instance, and I had to set up "upstream" on my workstation clone with the command:
  git remote add upstream http://github.com/geotools/geotools.git

And now every time I want my geotools clone on github updated I need to issue the following commands in my local clone - basically pulling the changes down to my local repository and then pushing them back up to my own clone on github:
  git fetch upstream
  git merge upstream/master
  git push origin master

Blech.

I think things would be a bit different if I had write permission on the GeoTools git repository, as then I'd be able to make my changes in a branch in it and send pull requests from that branch.  But that would mean lots of branching, which I'm also not too fond of.

Maven

GeoTools uses Maven for building.  I installed the Maven Ubuntu package and seemed good to go.  All I should need to do is "mvn install" in the GeoTools root directory to build.

The default Java on my workstation is a (possibly googlized) OpenJDK 1.7.  I get the impression from Jody that GeoTools is targeted at JDK 1.6 and that most folks use the Oracle JDK, not OpenJDK.  I was able to revert to using OpenJDK 1.6 but was still getting lots of things missing (I wish I had kept better notes on this part).  The net result after searching around was that Maven has "profiles" instructing it how to find some classes in different environments, which seem to mostly affect the generation of Java docs.  After some messing around we found that because my OpenJDK returned "Google Inc." as the vendor something special was needed, and Jody was kind enough to incorporate this upstream.
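
Since the vendor string mattered here, a quick way to see what a given JVM reports is to print the standard system properties.  This is only a minimal sketch with a hypothetical class name; it assumes that `java.vendor` and `java.version` are the values the profile selection was keying on, which I have not verified against the GeoTools pom.xml.

```java
public class JavaVendorCheck {
    /** Returns a one-line summary of the JVM vendor and version system properties. */
    public static String describe() {
        String vendor = System.getProperty("java.vendor");
        String version = System.getProperty("java.version");
        return "java.vendor=" + vendor + ", java.version=" + version;
    }

    public static void main(String[] args) {
        // Run with the same java binary that Maven uses to compare environments.
        System.out.println(describe());
    }
}
```

Running this with the JDK at work and the one at home would show whether the two report different vendors.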

After this change I was able to complete a Maven build of GeoTools successfully, but it took hours and hours for reasons that were never clear and likely somehow Google specific.  Today I decided to try this again at home on my fairly stock Ubuntu 10.04 machine.  While not exactly fast, it was much faster and I could complete a "mvn install" on an already built tree without testing in a couple of minutes.  I will note this time I used a downloaded Maven instead of the one provided by Ubuntu, but trying that at work did not seem to help.

Eclipse

Next, I wanted to get to the point where I had an Eclipse project I could use to debug and improve GeoTools with.  I tried following the Eclipse Quick Start docs and learned a few things.


  1. Stock Eclipse 3.8 does not seem to include any of the Maven support even though there is apparently some in Eclipse 3.7.  I did find that downloading and installing the latest Eclipse (Juno - 4.something) gave me good Maven support.
  2. The described process produces a project where you can use GeoTools as jars, and even debug in, but not modify the GeoTools source.  This is ok for a use-only GeoTools user but not for someone hoping to fix things or even make temporary modifications during debugging.

I stumbled around quite a bit trying to get past this.  What I discovered is that the Eclipse File->Import->Maven->Existing Maven Projects path can be used to set up a project for one or more GeoTools modules.  Select the root directory of the GeoTools source tree and Eclipse will walk the pom.xml files (Maven "makefiles") to figure out the dependencies and where the source is.  You will need to select which modules to include.

I ran into complaints about Eclipse not understanding git-commit-id-plugin and something about jjtree.  I was able to work around this by not including the "cql" module which needed the jjtree stuff, and by hacking out the entire git-commit-id-plugin from the GeoTools root pom.xml file.  After that I also restricted myself to selecting the main "library" and I think "ogc" modules.

The result was an Eclipse project in my workspace for each GeoTools module, including the referencing module I was interested in.  I was able to go into the src/test tree and run the junit tests fairly easily and all succeeded except three calls to *.class.desiredAssertionStatus() which I commented out.  They did not seem terribly relevant. 
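
For context on those three calls: Class.desiredAssertionStatus() is a standard Java method that reports whether assertions would be enabled for a class, which in practice depends on whether the JVM was started with -ea.  The sketch below uses a hypothetical class name and is not taken from the GeoTools tests; it just shows the method next to the common assert-with-side-effect idiom.

```java
public class AssertionStatusDemo {
    /** Detects at run time whether assert statements in this class are active. */
    public static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment only executes when assertions are enabled (java -ea).
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("desiredAssertionStatus: "
                + AssertionStatusDemo.class.desiredAssertionStatus());
        System.out.println("assert statements run:  " + assertionsEnabled());
    }
}
```

Running it once as "java -ea AssertionStatusDemo" and once without -ea shows the difference.  A test that insists on one particular status will fail when the test runner uses the other, which is presumably what happened with my Eclipse run configuration.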

Conclusion

Well, I can't say that I have accomplished my first meaningful contribution to GeoTools yet, but I have at least gotten to the point where I can build and work on things.  Despite some great getting started docs (mostly by Jody I think) and direct support from Jody in IRC (#geotools on freenode) it was still a painful process.  This is partly due to my relative lack of familiarity with Maven, Github, and even Eclipse outside of the very narrow way I use it at Google.  But it also seems that GeoTools is a big project and there are lots of traps one can fall into with versions of the various tools (Eclipse, Maven, Java runtime, etc).  But hopefully people will see that if I can hack it, you likely can too.

4 comments:

  1. This somewhat parallels my learning experience. The tools can be pretty hard to learn; Maven especially still seems to have awful documentation for some reason. But once you learn the do's and don'ts, it's a productive setup.

    Except maybe Git, I haven't done much with it yet.

    The use of OpenJDK should make no difference to most modules, but as I recall some feature styling and Swing component tests fail without Sun's JDK. I can sweep those into bugs if Jody et al aren't already well-acquainted with its issues.

    As for the project in general, I submitted a few patches to gt-shapefile, years ago (paralleling the handling of null cells in OGR actually!) and they had a good "patches welcome, but include good tests" mindset.

    Which is awesome.

  2. it's a mindbender but on github you can actually create pull requests to your fork, eg. create a pull request from upstream master to a fork branch...
    eg. https://github.com/mprins/geotools/pull/5

  3. Darn that is a documentation fail - the tutorials are all about using GeoTools in your own project.

    A bit further down (under build) are instructions for loading things into eclipse (http://docs.geotools.org/latest/userguide/build/maven/eclipse.html).

    mvn eclipse:eclipse -Dall will generate the .project and .classpath files.

    Under your eclipse preferences you can define a CLASSPATH variable for M2_REPO (pointing to ~/.m2/repository)

    And then import as existing projects.

  4. `git pull` is just `git fetch && git merge` all in one. That should knock one extra command out of your workflow.

    Also, you only need to set the upstream master once in the configuration, then you can just issue `git push` and `git pull` (with no additional arguments) and you're golden. That'll work just like Subversion.

    The power of git lies in your ability to cheaply branch. That, in turn, means smaller units of work, tighter iterations, and more rapid development.
