Convert a Byte Array to an Integer in Java

In Java there are several ways to convert a byte array into an integer. Let’s assume little endian byte order for all examples. If you need big endian order you can use Integer.reverseBytes on the result.
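To make that assumption concrete, here is a small self-contained sketch (the class name EndianDemo is mine) showing the little endian value of {1, 2, 3, 4} and the big endian variant via Integer.reverseBytes:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        byte[] b = {1, 2, 3, 4};
        // little endian: the first byte is the least significant -> 0x04030201
        int le = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt();
        // big endian is just the byte-swapped value -> 0x01020304
        int be = Integer.reverseBytes(le);
        System.out.println(Integer.toHexString(le)); // 4030201
        System.out.println(Integer.toHexString(be)); // 1020304
    }
}
```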

1. ByteBuffer

byte[] byteArray = new byte[] {1, 2, 3, 4};
// ByteBuffer defaults to big endian, so set the order explicitly
int intValue = ByteBuffer.wrap(byteArray).order(ByteOrder.LITTLE_ENDIAN).getInt();
System.out.println(intValue);
System.out.println(Integer.toBinaryString(intValue));

If you do not already use a ByteBuffer this is probably not the most efficient solution, so let’s have a look at another option.

2. BigInteger

byte[] byteArray = new byte[] {1, 2, 3, 4};
// BigInteger interprets the array as big endian two's complement,
// so reverse the bytes to get the little endian value
int intValue = Integer.reverseBytes(new BigInteger(byteArray).intValue());
System.out.println(intValue);
System.out.println(Integer.toBinaryString(intValue));

But this method also has some overhead, as new objects are created. So let’s see how to use low level bit operations in Java to achieve the same without creating new objects.

3. Bit Operation

byte[] b = new byte[] {1, 2, 3, 4};
int intValue = (b[3] & 0xFF) << 24
| (b[2] & 0xFF) << 16
| (b[1] & 0xFF) << 8
| (b[0] & 0xFF);
System.out.println(intValue);
System.out.println(Integer.toBinaryString(intValue));

So basically we shift each byte to its correct little endian position and ensure that the byte is treated as unsigned. E.g. for byte b = -1 the implicit conversion to int would set the higher bits, and masking with 0xFF clears them, which ensures that the resulting int represents the same numerical value as the unsigned byte. This also means that the expression (b[3] & 0xFF) << 24 is equivalent to b[3] << 24, as the shift discards the higher bits anyway, but for the sake of clarity I avoided this optimization.

4. byteArrayViewVarHandle

Since Java 9 a lesser known but still efficient solution is to use a VarHandle obtained via MethodHandles.byteArrayViewVarHandle – you basically “cast” the bytes into an integer like you would do in C/C++. You only need to define a static variable INT:

static final VarHandle INT = MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

Then you can use INT like below:

byte[] b = new byte[] {1, 2, 3, 4};
int byteOffset = 0;
int intValue = (int) INT.get(b, byteOffset);
System.out.println(intValue);
System.out.println(Integer.toBinaryString(intValue));

The underlying implementation currently uses the lower level “Unsafe” method described in the next section.

5. Unsafe

To use Unsafe we need to obtain the instance:

static final Unsafe UNSAFE;
static {
    try {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        UNSAFE = (Unsafe) f.get(null);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
static final int BYTE_ARRAY_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);

Then you can use UNSAFE like below:

byte[] b = new byte[] {1, 2, 3, 4};
int byteOffset = 0;
// note: Unsafe.getInt reads in the platform's native byte order
int intValue = UNSAFE.getInt(b, byteOffset + BYTE_ARRAY_OFFSET);
System.out.println(intValue);
System.out.println(Integer.toBinaryString(intValue));

Project Valhalla makes Java memory efficient again

For a brief introduction into project Valhalla read the wiki at OpenJDK or watch e.g. this talk from Brian Goetz. Basically it changes the layout of data in memory and introduces the possibility to define compact collections of objects. Currently only arrays of primitive types like int[] or double[] are “compact”.

Long ago, in JDK 1.5, I was really excited about the new generics feature and tried the first prototype. But soon I understood that it was not really what I had expected: the JDK engineers had introduced a templating mechanism similar to C++ templates, but “only” on the surface (via type erasure) – the memory layout stayed the same. So I was a bit disappointed, and writing memory efficient Java software stayed hard.

Current Memory Layout

Let me describe the current memory layout with an example. If you put many instances of a Point class with two double values (e.g. latitude and longitude) into an array, you waste lots of memory: every entry in the array is a pointer to a separate Point instance instead of the array holding the double values directly, and additionally every Point instance needs some header information. See below for a picture with details. This is not only a waste of memory but also an unnecessary indirection, and for large arrays simply looping can mean touching many different memory areas. This especially hurts when many small instances are stored.

Inline type to the rescue

For several years (!) there has been work going on in OpenJDK to address this. It is a major undertaking, as they want to integrate this deeply and want even unmodified applications to benefit from it. From time to time I look at their progress – earlier it was called “value type”, since a few months it is “inline type”. I think they have reached a very interesting milestone that you can easily play with:

I was not able to convince IntelliJ to accept the ‘inline’ keyword despite configuring the JDK 14 early access build. Not sure if this requires modifications to the IDE. But Maven worked.

The Usual Point Example

As a first test I created the simple Point class

class Point { double lat; double lon; }

and I wanted to find out the memory usage. The solid but stupid way to do this is to set e.g. -Xmx500m and increase the point count until you get an OutOfMemoryError (Java heap space). The results are:

  • without anything special a point count of 14M is possible.
  • when adding the new “inline” keyword before “class Point” it was possible to increase the count to 32.5M!
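The measurement behind these numbers can be sketched like this (a hedged reconstruction, not the original benchmark; class and method names are mine). Run it with -Xmx500m and raise the count until the OutOfMemoryError appears:

```java
public class PointCountTest {
    static class Point { double lat; double lon; } // add 'inline' with a Valhalla build

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        Point[] points = new Point[count];
        for (int i = 0; i < count; i++) {
            points[i] = new Point(); // each iteration allocates one Point on the heap
        }
        System.out.println("allocated " + points.length + " points");
    }
}
```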

You can also use this inlined Point class with generics like ArrayList<Point>, but you need a so-called “indirect projection”: ArrayList<Point?>. I.e. it allows backward compatibility, but you’ll lose the memory efficiency, at least at the moment, as IMO ArrayList uses Object[] and not E[].

Memory Usage Now And Then

The limit of 32.5M points is explainable via
32,500,000 * 16 / 1024.0 / 1024.0 ≈ 496 MB, i.e. every point instance uses the expected 16 bytes.

The 14M limit means approx 37 bytes per point and is not that easy to explain. The first piece you’ll need is:

In a modern 64-bit JDK, an object has a 12-byte header, padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. For 32-bit JVMs, the overhead is 8 bytes, padded to a multiple of 4 bytes.

Taken from this Stackoverflow answer

Additionally a reference can use between 4 and 8 bytes depending on the -Xmx setting; read more about “compressed ordinary object pointers (oops)“.

This leads to the following equation for the example: 4 bytes for the reference, 12 for the header, 16 for the two doubles and 4 bytes of padding to fill the object up to 32 bytes (a multiple of 8 bytes), i.e. 36 bytes per point.

So without project Valhalla you currently waste over 55%:


The memory waste of the current memory layout can be even worse if the object is smaller. A Point with two double values for coordinates on earth is a bit too precise, and two float values (4 bytes each) are sufficient. An “inlined” point instance then needs just 8 bytes. Without “inline” you need 28 bytes (4+12+2*4+4), which means you waste more than 70%.

Other Valhalla Features

Another implemented feature concerns the == operator. Try the following unit test in a current JVM:

assertTrue(new Point(11, 12) == new Point(11, 12));
assertTrue(new Point(12, 12) != new Point(11, 12));

And you’ll notice it fails. With project Valhalla this passes and you do not even have to implement an equals method!

At the moment, as far as I know, there is no direct primitive support like ArrayList<int>.

Also “inline”-types do not support declaring an explicit super class, but you can use composition. For example:

inline class Point3D { double ele; Point point; }

My Lenovo On-site Warranty Extension

Three years ago I blogged about the Thinkpad T460 that I had newly bought. And I was very pleased with it – until the mainboard broke a few weeks ago, just when the 3 year warranty would have been over. Luckily I had bought the 5 year warranty extension with on-site support.

The CPU or something froze the laptop after only a few minutes of working with it. The display was still on, but neither the keyboard nor the touchpad responded. Sometimes the CPU fan became active a few minutes after this, but not at maximum level. The only option was to shut down the laptop.

This was no Linux compatibility issue. I had not updated anything, so it happened out of nowhere. I disabled WiFi and Bluetooth, looked into the kernel logs to confirm that there was no kernel panic, and even freshly installed Ubuntu 18.04 – just to get the same problems. Furthermore I updated to the latest BIOS version, without success.

Day 1, 20.05.2019 (counting working days only)

After these results I called the hotline on Monday, and they replied that I should run the extensive diagnostics that come with the BIOS. So I did, and it froze occasionally, also during the CPU stress test. This took me at least 2 hours, as I wanted to be precise and helpful with my answer and provided details, e.g. that I could even make the freeze reproducible by unplugging the power cable**, or that sometimes it ran through the CPU stress test only to freeze later during the very long running “memory test”. Also, often the laptop did not even start for minutes after these freezes.

Day 2

Nothing happened and I had no time to call them again as sometimes you have to work 😉 and improve the fallback laptop.

Day 3

At 11am I still had no response, although the warranty says a technician will come to fix it “usually the next working day” (üblicherweise am nächsten Werktag). So I called again. “Funnily”, the support email from day 1 contained a broken support telephone number for Germany. The real number has just one zero after “22”, i.e. the correct support number is:

+49 201 22099 888

They confirmed that this seems to be a hardware problem and promised to send me a new mainboard via express courier, and also a technician the next day (to be safe I confirmed my mobile number and address). Ok, IMO not as fast as they advertise, but acceptable for me, as I have an old laptop with which I can at least continue the important work.

Day 4

At 10am the UPS package with the parts arrived. Why don’t they send it to the technician?***

In the afternoon I called them once again to understand why the technician had not come. Roughly 1h later the technician called me and repaired my laptop 🙂 and it seems to work. I hope the “refurbished” sticker on the mainboard box is not a bad sign.

Day 5

Unfortunately the function key does not work anymore (or maybe it never did with the new mainboard). I’m pretty sure this is a hardware issue: first of all, the Fn key is something the firmware controls, and when I swap the Ctrl and Fn keys, the Ctrl key properly triggers e.g. a brighter display. I tried an older BIOS and the most recent BIOS, but the Fn key still does not work.

I called the hotline and they will send me a new keyboard. I’m unsure why this should fix my issue, as the keyboard worked properly before the mainboard switch, but who knows.

Day 6

The keyboard arrived.

Day 7

The technician replaced the keyboard. The function key is still not working, under his Windows as well as under my Ubuntu. He argued that it could still be a driver issue. I argued that it worked properly with the “freezing” old mainboard (on Ubuntu).

Day 8-10

No feedback from Lenovo regarding what to do now about this Fn key problem. This time I did not call them and just waited for them to act.

Day 11

Something will arrive in the next two days, they wrote via email.

Day 12

A new mainboard arrived! The technician came one hour later and installed it. The great thing is: everything is working now – finally 🙂

Conclusion

The experience was not like the advertised “expected the next business day” and could be improved. The most important improvement would be to stop forcing the customer to call the hotline over and over again to speed things up (by days): where are the parts? Where is the technician?

I had to work around a fully dysfunctional laptop only for the first 4 days. (A non-working Fn key is not that bad.)

So, out of 10 stars I would give 7. It isn’t that good, and not enough information is passed on to the customer, but it seems that at least they care that issues are fully fixed, and 4 days was kind of acceptable for me. And if there is an issue, they likely pay more than you paid for the warranty.

All in all I invested roughly one day into calling them, preparing the fallback hardware, writing this blog post and finding out what was actually wrong. I probably invested too much time into making sure that it was not my fault.

**One strange thing remains: the freeze is still reproducible when plugging or unplugging the power cable while the extensive BIOS test (CPU stress test) is running. So maybe this is unrelated to my issue.

***After a chat with the technician he said that last year the parts were shipped to the technicians, which indeed made more sense to him as well. But on second thought, shipping to the customer could make (a bit of) sense if the technician is expected to be on the road anyway: he can go directly to the customer without going back to pick up the parts.

Fun with Shapefiles, CRSs and GeoTools

Although I’ve been in the “GIS business” for years, I had never had to deal with shapefiles directly. Now it was time to investigate tools like QGIS and hack together a simple reader for shp files. At least I thought it was simple – calling me a GIS expert afterwards would be a ridiculous understatement.

GeoTools fun

A quick look and I decided to go with GeoTools, as I knew it by name and I needed a tool in Java. Thanks to QGIS I quickly understood that in my case I had to deal with a list of a list of lines containing coordinates, but how to read that via GeoTools? The internet provided several solutions, but I didn’t find complete examples for my case. As it turned out, I had to cast explicitly 2 times (!): first from “Feature” to “SimpleFeature” and then from “Geometry” to “MultiLineString”. Not sure if this is really necessary; at least it makes learning a new API very hard.

Now I had the initial code:

Map<String, Object> connect = new HashMap<>();
// a File is not sufficient as a shapefile consists of multiple files
connect.put("url", file.toURI().toURL());
DataStore dataStore = DataStoreFinder.getDataStore(connect);
String[] typeNames = dataStore.getTypeNames();
String typeName = typeNames[0];
FeatureSource featureSource = dataStore.getFeatureSource(typeName);
CoordinateReferenceSystem sourceCRS = featureSource.getSchema().getCoordinateReferenceSystem();
// targetCRS is defined elsewhere, e.g. DefaultGeographicCRS.WGS84
FeatureCollection collection = featureSource.getFeatures();
// allow for some error due to different datums ('bursa wolf parameters required')
boolean lenient = true;
MathTransform transform = CRS.findMathTransform(sourceCRS, targetCRS, lenient);

List<List<GPXEntry>> lineList = new ArrayList<>();
try (FeatureIterator iterator = collection.features()) {
    while (iterator.hasNext()) {
        SimpleFeature feature = (SimpleFeature) iterator.next();
        MultiLineString mlString = (MultiLineString) feature.getDefaultGeometry();
        ...
    }
}

How short and beautiful. But: it did not compile, and that although I was using the recommended “maven procedure”. GeoTools follows the somewhat unusual path of requiring you to define its repositories in your pom.xml – I only found a working setup with the snapshot versions, but this was sufficient for the time being.

CRS fun

At least it seemed to work then. But after a further lengthy time I found out that the coordinates had just a tiny offset, so something was wrong with the source or target coordinate reference system (CRS) or with the transformation itself. Again QGIS helped me here and determined the source CRS correctly. But GeoTools got it somehow wrong, and initially I thought it was GeoTools’ fault.

But I quickly stumbled over another CRS issue where exactly the same CRSs led to different results. In my case it was CRS.decode(“EPSG:4326”) vs. DefaultGeographicCRS.WGS84 – they should be identical, but the results were different!? It turns out that the coordinate axes are swapped! GeoTools’ fault? No! GeoTools even gave me the explanation in its documentation:
“So if you see some data in “EPSG:4326” you have no idea if it is in x/y order or in y/x order”!
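If I read the GeoTools documentation correctly, the two-argument decode lets you force longitude/latitude (x/y) axis order explicitly – a hedged fragment, so check the behavior of your GeoTools version:

```java
// force x/y (longitude first) axis order instead of the EPSG default
CoordinateReferenceSystem crs = CRS.decode("EPSG:4326", true);
```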

Deployment

Phew. Okay. I was ready for deployment and used my usual git and mvn assembly procedure to push stuff to my server, but then I got exceptions at runtime about missing classes! Oh no – how can this be when I use Maven?
As it turns out, GeoTools requires the maven shade plugin to properly bundle the database for correct CRS transformations, which it loads via a plugin architecture I think. And look: the whole jar is now nearly 12MB!
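For reference, the shade plugin setup boils down to merging the META-INF/services files so GeoTools’ factory registrations survive shading (a sketch of the relevant pom.xml part, version and other configuration omitted):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- merges META-INF/services entries from all dependency jars -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
</plugin>
```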

Conclusion

The GIS and Java world are called “enterprise” for a reason. I hope I can help others with my findings. Find the fully working code here.

Units in OpenStreetMap

First of all, this is not a rant, nor am I a (regular) mapper, but I have some years of experience reading aka ‘interpreting’ OSM data. I invite mappers to read, understand and comment on this post (in this order ;)).

Learning and understanding a specific tag

When I learn about a new tag for GraphHopper, e.g. maxweight, the first thing I do is go to taginfo, look at some common use cases and implement them. Then I increase the parsing area to country-wide and add more parsing code here and there to ignore or include commonly used values, whether they make sense or not. Then I go worldwide doing the same. What is left, see this gist, are some very infrequently used values; some make sense, like ‘15 US ton‘, and some don’t, like ‘agriculture‘. Now I need to decide whether to fix them, ignore them or add parsing code. In the case of the weight values I saw a reason to include reading values like ‘13000 lbs’ or the most frequent ones like ‘8000 (t(on)s)’, but not e.g. ‘13000 lb’ (10 occurrences worldwide), which I just fixed and converted to the SI unit – maybe I should have just added the ‘s’?
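To illustrate the kind of parsing code involved, here is a hedged sketch (not GraphHopper’s actual parser; class and method names are mine) that normalizes a few common maxweight spellings to metric tons:

```java
public class MaxWeightParser {
    // returns the weight in metric tons; handles only a few common spellings
    static double parseTons(String value) {
        value = value.trim().toLowerCase();
        if (value.endsWith("lbs")) {
            double lbs = Double.parseDouble(value.substring(0, value.length() - 3).trim());
            return lbs * 0.45359237 / 1000.0; // pounds -> kilograms -> tons
        }
        if (value.endsWith("t"))
            return Double.parseDouble(value.substring(0, value.length() - 1).trim());
        return Double.parseDouble(value); // plain number: assume tons
    }

    public static void main(String[] args) {
        System.out.println(parseTons("13000 lbs")); // roughly 5.9 tons
        System.out.println(parseTons("3.5 t"));
        System.out.println(parseTons("8"));
    }
}
```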

OpenStreetMap is a database

In OpenStreetMap the tagging schema is not always clear and varies from local community to local community. And it is a good thing that OSM is flexible. The question is whether this difference should be reflected in the data itself, or whether a more concise database should be preferred, with the local differences moved into the ‘view’, i.e. the editors. I think:

OSM should prefer more concise data if possible and this gets more important as it grows.

Now back to my example of weight values.

SomeoneElse commented today on my non-automated change, where I converted ’15 US tons’ (a worldwide occurrence of 5!) to 13.607 “SI” tons, that we should not make it more complex via SI units. But if you look at the US unit system with ‘US tons’, ‘short’ and ‘long tons’, ‘pounds’, ‘lbs’ etc., plus the various ‘weight’ possibilities listed in this good proposal, you can guess that it is already not that easy. So such an edit would probably be better done via an assisting editor which converts between weight units.

Popular OSM editors should make it possible to use local units but convert them into some SI-based when stored.

On my OSM diary someone correctly said: but “we map as it is” includes units in a way too – a limiting sign at a bridge does have a written or implied unit.
I answered: is mapping really the process down to the database? I doubt that. Mapping means modelling the real situation with the tools we have. The tools will evolve, and so should the mapping process, making the database more concise and the mapping process less complex.

GPSies.com using the GraphHopper Directions API

Klaus, the founder of GPSies, contacted me nearly 2 years ago when GraphHopper was still in its infancy. GPSies was using Google Maps in its path planning tool, and as GPSies is free to use and wants to stay free, they did not want to buy into the business version of Google Maps and were seeking alternatives. At that time GraphHopper was already fast but could not scale to world wide coverage, and Klaus provided the necessary hardware to me for experimentation. After a few months of tweaking and further months of testing and minor bug fixing, together we were able to replace the Google Maps API with a self-hosted GraphHopper on a GPSies server.

Other customers also often requested a hosted version of GraphHopper, and so the idea of the GraphHopper Directions API for business was born – with several benefits over existing routing APIs, like being based on OpenStreetMap data, high privacy standards, a permissive usage policy and world wide coverage even for bike routing.

Today we proudly announce that GPSies has switched to this architecture, making routing for GPSies more efficient and more up-to-date while still keeping costs low. Especially the daily OpenStreetMap data updates and regular software updates will keep GPSies growing!

The Builder Pattern in Java With the Least Code Possible

Immutable objects are important to make your code more robust, especially in days of increasing parallelization. A builder pattern is used when some of the variables of an immutable class are required and some are optional, which would otherwise lead to a massive constructor explosion, at least in Java. Today I think I found an improved builder pattern with no attribute duplication in the builder class and no separate private constructor in the domain class.

Usual Constructors

Here is a normal immutable class with the various necessary constructors for only one optional field ‘age’:

public class Person {
  private final String name; // required
  private final int age;     // optional

  public Person(String name, int age) {
     this.name = name;
     this.age = age;
  }
  public Person(String name) {
     this(name, 0); // final fields must always be assigned
  }
  public Person(int age) {
     this(null, age);
  }

  public String getName() {
     return this.name;
  }
  public int getAge() {
     return age;
  }
}

Builder Pattern 1.0

The builder pattern removes the need of various constructor combinations:

public class Person {
  private final String name; // required
  private final int age;     // optional
  private Person(PersonBuilder pb) {
     this.age = pb.age;
     this.name = pb.name;
  }

  public String getName() {
     return this.name;
  }
  public int getAge() {
     return age;
  }
}

public class PersonBuilder {
  private String name;
  private int age;

  public PersonBuilder name(String name) {
     this.name = name;
     return this;
  }
  public PersonBuilder age(int age) {
     this.age = age;
     return this;
  }

  public Person create() {
     return new Person(this);
  }
}

The usage is:

Person p = new PersonBuilder().
   name("testing").
   age(20).
   create();

Builder Pattern 2.0

Now my builder pattern with less overhead. Of course in real world examples you won’t have only one optional field, making the savings more obvious. The builder pattern 2.0 uses a static nested subclass for the builder and (package) protected fields, which can no longer be final. As you can see, this solution is only ~5 lines more than the original immutable object without the constructors, as it just moves the setters into a separate class:

public class Person {
  String name; // required
  int age;     // optional

  public String getName() {
     return this.name;
  }
  public int getAge() {
     return age;
  }

  public static class BEGIN extends Person {
    public BEGIN name(String name) {
      this.name = name;
      return this;
    }
    public BEGIN age(int age) {
      this.age = age;
      return this;
    }

    public Person END() {
      return this;
    }
  } // end of builder class
} // end of domain class

The usage is similar to the original builder pattern:

Person p = new Person.BEGIN().
   name("testing").
   age(20).
   END();

Keep in mind that this solution has the ‘drawback’ of avoiding the extra builder object of pattern 1.0: because END returns this, it is not thread-safe, unlike the create method. (You could fix that via this.clone() within END – not sure if you’d like that.) Also I think for those cases you probably need something more like a factory. As noted in the comments, the builder class BEGIN should be renamed to Builder, and then, even better, a public static method like ‘Builder start() { return new Builder(); }’ should be created, so you can avoid the ‘new’ when using it.

Improvement: Builder Pattern 2.1

After the comments and having this implemented in production I observed drawbacks: e.g. you don’t have to call the END method at all, as the subclass is accepted wherever a Person is expected, and you could theoretically just downcast a Person object to its builder and change the variables again. The simplest solution is to use composition instead of inheritance, like we do with our AlgorithmOptions object at GraphHopper – this way we can also use private fields again.
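A minimal sketch of that composition variant (illustrative names, not GraphHopper’s actual AlgorithmOptions code): the builder owns the mutable fields, the domain class copies them once and stays truly immutable.

```java
public class Person {
    private final String name; // required
    private final int age;     // optional

    private Person(Builder b) {
        this.name = b.name;
        this.age = b.age;
    }
    public String getName() { return name; }
    public int getAge() { return age; }

    public static Builder start() { return new Builder(); }

    public static class Builder {
        private String name;
        private int age;

        public Builder name(String name) { this.name = name; return this; }
        public Builder age(int age) { this.age = age; return this; }
        public Person build() { return new Person(this); }
    }

    public static void main(String[] args) {
        Person p = Person.start().name("testing").age(20).build();
        System.out.println(p.getName() + " " + p.getAge()); // testing 20
    }
}
```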

Conclusion

This new builder pattern is suited when a method has several arguments, some of them optional. You can move these arguments into a separate class and use this pattern to avoid code duplication, like I’ll probably do for some classes in GraphHopper. Everyone in love with magic (unlike me) can also have a look at project Lombok, as noted in the comments. Still, the best thing would be to have something like this directly in Java and be able to write:

Person p = new Person(name="testing", age=20);

GraphHopper Directions API Going Private Beta

Update: our Directions API is public beta now.

Today we are proud to announce that our Directions API goes into private beta. Contact us and take part to get an API key and try our latest features.

The GraphHopper Directions API includes

  • The Routing API, a fast web service to calculate world wide routes for walking, biking and car.
  • The Matrix API: based on the Routing API, it calculates so-called distance matrices more efficiently.
  • The Geocoding API, a world wide address search. Still under heavy development and not yet production grade although with good results in several European countries.
  • Daily OpenStreetMap updates
  • A mature service based on the open source routing engine GraphHopper. Read more about GraphHopper at opensource.com

See the Routing and Geocoding API in action at GraphHopper Maps!