Fun with Shapefiles, CRSs and GeoTools

Although I have been in the “GIS business” for years, I had never had to deal with shapefiles directly. Now it was time to investigate tools like QGIS and to hack together a simple reader for shp files. At least I thought it would be simple, but calling me a GIS expert afterwards would still be a ridiculous overstatement.

GeoTools fun

After a quick look I decided to go with GeoTools, as I knew it by name and I needed a tool in Java. Thanks to QGIS I quickly understood that in my case I had to deal with a list of lists of lines containing coordinates, but how do you read that via GeoTools? The internet provided several solutions, but I didn’t find a complete example for my case. As it turned out, I had to explicitly cast twice (!): first from “Feature” to “SimpleFeature” and then from “Geometry” to “MultiLineString”. I am not sure if this is really necessary, but it certainly makes learning a new API harder.

Now I had the initial code:

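// package names of the GeoTools/JTS versions from that time
import java.util.*;
import com.vividsolutions.jts.geom.MultiLineString;
import com.graphhopper.util.GPXEntry;
import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.FeatureSource;
import org.geotools.feature.FeatureCollection;
import org.geotools.feature.FeatureIterator;
import org.geotools.referencing.CRS;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.referencing.crs.CoordinateReferenceSystem;
import org.opengis.referencing.operation.MathTransform;

// 'file' points to the .shp file, 'targetCRS' is the CRS to convert into
Map<String, Object> connect = new HashMap<>();
// a File is not sufficient as a shapefile consists of multiple files
connect.put("url", file.toURI().toURL());
DataStore dataStore = DataStoreFinder.getDataStore(connect);
String[] typeNames = dataStore.getTypeNames();
String typeName = typeNames[0];
FeatureSource featureSource = dataStore.getFeatureSource(typeName);
CoordinateReferenceSystem sourceCRS = featureSource.getSchema().getCoordinateReferenceSystem();
FeatureCollection collection = featureSource.getFeatures();
// allow for some error due to different datums ('bursa wolf parameters required')
boolean lenient = true;
MathTransform transform = CRS.findMathTransform(sourceCRS, targetCRS, lenient);

List<List<GPXEntry>> lineList = new ArrayList<>();
try (FeatureIterator iterator = collection.features()) {
    while (iterator.hasNext()) {
        SimpleFeature feature = (SimpleFeature) iterator.next();
        MultiLineString mlString = (MultiLineString) feature.getDefaultGeometry();
        // ... convert the lines of mlString (using 'transform') and add them to lineList
    }
}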

How short and beautiful. But it did not compile, even though I was using the recommended “maven procedure”. GeoTools follows a somewhat unusual path: its artifacts are hosted in its own repositories rather than on Maven Central, so you have to define those repositories in your pom.xml. I only found a working solution with the snapshot versions, but this was sufficient for the time being.

CRS fun

At least it seemed to work then. But after quite some time I found out that the coordinates had a tiny offset, so something was wrong with the source or target coordinate reference system (CRS) or with the transformation itself. Again QGIS helped me here and determined the source CRS correctly. But GeoTools somehow got it wrong, and initially I thought it was GeoTools' fault.

But I quickly stumbled over another CRS issue: exactly the same CRS led to different results. In my case it was CRS.decode("EPSG:4326") vs. DefaultGeographicCRS.WGS84. They should be identical, but the results were different!? It turns out that the coordinate axes are swapped! GeoTools' fault? No! GeoTools even gave me the solution in its documentation:
“So if you see some data in “EPSG:4326” you have no idea if it is in x/y order or in y/x order”!
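The axis-order pitfall can be illustrated without GeoTools. The following is a minimal sketch, assuming you already know which order your input uses; AxisOrder, toLonLat and looksLikeValidLonLat are hypothetical names, not GeoTools API. Within GeoTools itself, CRS.decode("EPSG:4326", true) asks for longitude-first (x/y) axis order.

```java
import java.util.Arrays;

// Hypothetical helper, not part of GeoTools.
public class AxisOrderDemo {

    // The two conventions you may meet for "EPSG:4326" data.
    enum AxisOrder { LAT_LON, LON_LAT }

    // Returns {longitude, latitude}, i.e. x/y order, whatever the input order was.
    static double[] toLonLat(double first, double second, AxisOrder order) {
        return order == AxisOrder.LAT_LON
                ? new double[]{second, first}
                : new double[]{first, second};
    }

    // Cheap plausibility check: a latitude outside [-90, 90] reveals swapped axes,
    // but many swapped points (e.g. everything in Europe) still pass unnoticed.
    static boolean looksLikeValidLonLat(double[] lonLat) {
        return Math.abs(lonLat[0]) <= 180 && Math.abs(lonLat[1]) <= 90;
    }

    public static void main(String[] args) {
        // Berlin, given in latitude/longitude order
        double[] berlin = toLonLat(52.52, 13.40, AxisOrder.LAT_LON);
        System.out.println(Arrays.toString(berlin)); // [13.4, 52.52]
        System.out.println(looksLikeValidLonLat(berlin)); // true
    }
}
```

The plausibility check is only a safety net: the reliable fix is to state the axis order explicitly wherever the CRS is created.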

Deployment

Puh. Okay. I was ready for deployment and used my usual git and mvn assembly procedure to push everything to my server, but then I got exceptions about missing classes at runtime! Oh no, how can this happen when I use Maven?
As it turns out, GeoTools requires the maven shade plugin to bundle the CRS database properly, I think because it is loaded via a plugin architecture. And look: the whole jar is now nearly 12MB!

Conclusion

The GIS and Java worlds are called “enterprise” for a reason. I hope my findings help others. Find the fully working code here.

Units in OpenStreetMap

First of all, this is not a rant, nor am I a (regular) mapper, but I have some years of experience in reading, aka ‘interpreting’, OSM data. I invite mappers to read, understand and comment on this post (in this order ;)).

Learning and understanding a specific tag

When I learn about a new tag for GraphHopper, e.g. maxweight, the first thing I do is go to taginfo, look at some common use cases and implement them. Then I increase the parsing area to country-wide and add more parsing code here and there to ignore or include commonly used values, depending on whether they make sense. Then I do the same worldwide. What is left (see this gist) are some very infrequently used values; some make sense, like ‘15 US ton’, and some don’t, like ‘agriculture’. Now I need to decide whether to fix them, ignore them or include parsing code. In the case of the weight values I did see a reason to read values like ‘13000 lbs’ or the most frequent ones like ‘8000 (t(on)s)’, but not e.g. ‘13000 lb’ (10 times worldwide), which I simply fixed by converting them to SI units. Maybe I should have just added the ‘s’?
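The decision between parsing and ignoring can be sketched as a small normalizer that turns maxweight values into metric tonnes. This is a hypothetical helper, not GraphHopper's actual parser: the accepted spellings are assumptions based on the values mentioned above, ‘ton(s)’ is assumed to mean metric tonnes, and ‘lb’ is deliberately not accepted, mirroring the choice described in the text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not GraphHopper's real maxweight parser.
public class MaxWeightParser {

    private static final double KG_PER_LB = 0.45359237;       // international pound (exact)
    private static final double KG_PER_SHORT_TON = 907.18474; // US (short) ton = 2000 lb

    // a number, optional whitespace, then an optional unit made of letters, spaces or dots
    private static final Pattern VALUE = Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*([a-z .]*)");

    /** Returns the weight in metric tonnes, or null if the value should be ignored. */
    public static Double parseTonnes(String raw) {
        Matcher m = VALUE.matcher(raw.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            return null; // e.g. "agriculture": not a weight at all
        }
        double value = Double.parseDouble(m.group(1));
        switch (m.group(2).trim()) {
            case "":      // assumption: a plain number means tonnes
            case "t":
            case "ton":   // assumption: 'ton(s)' is read as metric tonnes
            case "tons":
            case "tonne":
            case "tonnes":
                return value;
            case "lbs":   // '13000 lb' is deliberately not accepted, see the text
                return value * KG_PER_LB / 1000;
            case "us ton":
            case "us tons":
            case "short ton":
            case "short tons":
                return value * KG_PER_SHORT_TON / 1000;
            default:
                return null; // unknown unit: better ignore than guess
        }
    }

    public static void main(String[] args) {
        for (String v : new String[]{"8000 t", "13000 lbs", "15 US ton", "agriculture"}) {
            System.out.println(v + " -> " + parseTonnes(v));
        }
    }
}
```

Returning null for anything unknown keeps the rare, odd values out of the routing data instead of guessing a unit for them.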

OpenStreetMap is a database

In OpenStreetMap the tagging schema is not always clear and varies from local community to local community. And it is a good thing that OSM is flexible. The question now is whether this difference should be reflected in the data itself, or whether a more concise database should be preferred and the local differences moved into the ‘view’, like the editors. I think:

OSM should prefer more concise data if possible and this gets more important as it grows.

Now back to my example of weight values.

SomeoneElse commented today on my non-automated change, where I converted ‘15 US tons’ (5 (!) occurrences worldwide) to 13.607 “SI” tons, saying that we should not make things more complex via SI units. But if you look at the US unit system with ‘US tons’, ‘short’ and ‘long tons’, ‘pounds’, ‘lbs’ etc., plus the various ‘weight’ possibilities listed in this good proposal, you can guess that this is already not that easy. So such an edit would probably be better done via an assisting editor that converts between weight units.

Popular OSM editors should make it possible to use local units but convert them into SI-based units when storing them.
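The editor-side idea can be sketched as a round trip: the mapper enters the value in the unit on the sign, the editor stores metric tonnes, and converts back for display. This is a hypothetical sketch; WeightField, WeightUnit, toStoredTag and toDisplay are illustrative names and no existing OSM editor API is implied. The conversion factors are the standard definitions (1 lb = 0.45359237 kg, US ton = 2000 lb, long ton = 2240 lb); rounding the stored value to three decimals (kilograms) is an assumption.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical editor-side helper; no existing OSM editor API is implied.
public class WeightField {

    /** Units a mapper may read off a sign, with their size in metric tonnes. */
    enum WeightUnit {
        TONNE(1.0),
        SHORT_TON(0.90718474),  // US ton, 2000 lb
        LONG_TON(1.0160469088), // imperial ton, 2240 lb
        POUND(0.00045359237);

        final double tonnes;

        WeightUnit(double tonnes) {
            this.tonnes = tonnes;
        }
    }

    /** Converts the value typed by the mapper into the tag value to store, in tonnes. */
    static String toStoredTag(double value, WeightUnit unit) {
        return BigDecimal.valueOf(value * unit.tonnes)
                .setScale(3, RoundingMode.HALF_UP) // assumption: kilogram precision is enough
                .stripTrailingZeros()
                .toPlainString();
    }

    /** Converts a stored tonnes value back into the mapper's preferred unit for display. */
    static double toDisplay(double storedTonnes, WeightUnit unit) {
        return storedTonnes / unit.tonnes;
    }

    public static void main(String[] args) {
        String stored = toStoredTag(15, WeightUnit.SHORT_TON);
        System.out.println("maxweight=" + stored); // maxweight=13.608
        System.out.println(toDisplay(Double.parseDouble(stored), WeightUnit.SHORT_TON) + " US tons");
    }
}
```

The mapper still works with the unit printed on the sign, while the database only ever contains one unit.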

In my OSM diary someone correctly says: “But ‘we map as it is’ includes units in a way, too. A limiting sign at a bridge does have a written or implied unit.”
I answered: Is mapping really the process all the way down to the database? I doubt that. Mapping means modelling the real situation with the tools we have. The tools will evolve, and so should the mapping process, making the database more concise and the mapping process less complex.

GPSies.com using the GraphHopper Directions API

Klaus, the founder of GPSies, contacted me nearly 2 years ago when GraphHopper was still in its infancy. GPSies was using Google Maps in its path planning tool, and as GPSies is free to use and they want to keep it that way, they did not want to buy into the business version of Google Maps, so they were seeking alternatives. At that time GraphHopper was already fast but could not scale to worldwide coverage, and Klaus provided the necessary hardware for my experiments. After a few months of tweaking and further months of testing and minor bug fixing, together we were able to replace the Google Maps API with a self-hosted GraphHopper on a GPSies server.

Other customers also often requested a hosted version of GraphHopper, and so the idea of the GraphHopper Directions API for business was born, with several benefits over existing routing APIs: it is based on OpenStreetMap data and offers high privacy standards, a permissive usage policy and worldwide coverage, even for bike routing.

Today we proudly announce that GPSies has switched to this architecture, making routing for GPSies more efficient and more up-to-date while still keeping the costs low. Especially the daily OpenStreetMap data updates and regular software updates will help GPSies keep on growing!

The Builder Pattern in Java With the Least Code Possible

Immutable objects are important to make your code more robust, especially in times of increasing parallelization. A builder pattern is used when some of the variables of an immutable class are required and some are optional; without it, this leads to a massive constructor explosion, at least in Java. Today I think I found an improved builder pattern which needs no attribute duplication in the builder class and no separate private constructor in the domain class.

Usual Constructors

Here is a normal immutable class with the various necessary constructors for only one optional field ‘age’:

public class Person {
  private final String name; // required
  private final int age;     // optional

  public Person(String name, int age) {
     this.name = name;
     this.age = age;
  }
  // final fields must be assigned in every constructor, so delegate
  public Person(String name) {
     this(name, 0);
  }
  public Person(int age) {
     this(null, age);
  }

  public String getName() {
     return this.name;
  }
  public int getAge() {
     return age;
  }
}

Builder Pattern 1.0

The builder pattern removes the need for various constructor combinations:

public class Person {
  private final String name; // required
  private final int age;     // optional
  private Person(PersonBuilder pb) {
     this.age = pb.age;
     this.name = pb.name;
  }

  public String getName() {
     return this.name;
  }
  public int getAge() {
     return age;
  }

  // nested, so that Person's private constructor and the builder's
  // private fields are accessible to each other
  public static class PersonBuilder {
    private String name;
    private int age;

    public PersonBuilder name(String name) {
       this.name = name;
       return this;
    }
    public PersonBuilder age(int age) {
       this.age = age;
       return this;
    }

    public Person create() {
       return new Person(this);
    }
  }
}

The usage is:

Person p = new Person.PersonBuilder().
   name("testing").
   age(20).
   create();

Builder Pattern 2.0

Now my builder pattern with less overhead. Of course, real-world classes usually have more than one optional field, which makes the savings more obvious. The Builder Pattern 2.0 uses a static nested subclass for the builder and package-private fields. As you can see, this solution is only ~5 lines longer than the original immutable object without the constructors, as it just moves the setters into a separate class:

public class Person {
  String name; // required
  int age;     // optional

  public String getName() {
     return this.name;
  }
  public int getAge() {
     return age;
  }

  public static class BEGIN extends Person {
    public BEGIN name(String name) {
      this.name = name;
      return this;
    }
    public BEGIN age(int age) {
      this.age = age;
      return this;
    }

    public Person END() {
      return this;
    }
  } // end of builder class
} // end of domain class

The usage is similar to the original builder pattern:

Person p = new Person.BEGIN().
   name("testing").
   age(20).
   END();

Keep in mind that, unlike builder pattern 1.0, this solution creates no extra object, which is also its ‘drawback’: END returns the builder itself, and therefore the END method is not thread-safe, unlike the create method. (You can fix that via this.clone() within END; not sure if you like that.) Also, I think for those cases you probably need something more like a factory. As noted in the comments, the builder class BEGIN should be renamed to Builder, and even better, you can add a public static method à la ‘static Builder start() { return new Builder(); }’, which lets you avoid the ‘new’ when using it.

Improvement: Builder Pattern 2.1

After the comments, and after using this in production, I observed drawbacks. For example, you don’t have to call the END method at all, as the subclass is also accepted wherever a Person is expected. And you could theoretically just downcast a Person object to its builder and change the variables again. The simplest solution is to use composition instead of inheritance, as we do with our AlgorithmOptions object at GraphHopper; this way we can also use private fields again.
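A minimal sketch of the composition variant, assuming the naming suggested in the comments (Builder, a static start() method) and a build() method in place of END. It is not the actual AlgorithmOptions code: the builder holds a Person instead of being one, so the fields are private again, a Person can no longer be downcast to its builder, and a Person only escapes via build(). After build() the builder refuses further changes, so the returned object cannot be modified through it.

```java
public class Person {
    private String name; // required
    private int age;     // optional

    // only the nested Builder can create instances
    private Person() {
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public static Builder start() {
        return new Builder();
    }

    public static class Builder {
        // the Person under construction; set to null once it has been handed out
        private Person person = new Person();

        private Person target() {
            if (person == null) {
                throw new IllegalStateException("build() was already called");
            }
            return person;
        }

        public Builder name(String name) {
            target().name = name;
            return this;
        }

        public Builder age(int age) {
            target().age = age;
            return this;
        }

        public Person build() {
            Person result = target();
            if (result.name == null) {
                throw new IllegalStateException("name is required");
            }
            person = null; // later setter calls can no longer change 'result'
            return result;
        }
    }
}
```

Usage then reads Person p = Person.start().name("testing").age(20).build(); and the required field is checked in one place.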

Conclusion

This new builder pattern is suited to methods with several arguments, some of them optional. You can move these arguments into a separate class and use this pattern to avoid code duplication, as I’ll probably do for some classes in GraphHopper. Those who love magic (unlike me) can also have a look at Project Lombok, as noted in the comments. Still, the best thing would be to have something like this directly in Java and be able to write:

Person p = new Person(name="testing", age=20);

GraphHopper Directions API Going Private Beta

Update: our Directions API is now in public beta.

Today we are proud to announce that our Directions API goes into private beta. Contact us to take part, get an API key and try our latest features.

The GraphHopper Directions API includes

  • The Routing API, a fast web service to calculate worldwide routes for walking, biking and driving.
  • The Matrix API: based on the Routing API, it calculates so-called distance matrices more efficiently.
  • The Geocoding API, a worldwide address search. It is still under heavy development and not yet production grade, although it gives good results in several European countries.
  • Daily OpenStreetMap updates
  • A mature service based on the open source routing engine GraphHopper. Read more about GraphHopper at opensource.com

See the Routing and Geocoding API in action at GraphHopper Maps!