Tricks to Speed up Neighbor Searches of Quadtrees. #geo #spatial #java

In Java land there are at least two quadtree implementations which are not yet optimal, so I though I’ll post some possibilities to tune them. Some of those possibilities are already implemented in my GraphHopper project.

Quadtree

What is a quadtree? Wikipedia says: “A quadtree is a tree data structure in which each internal node has exactly four children.” And then you need some leaf nodes to actually store some data – e.g. points and associated values.

A quadtree is often used for fast neighbor searches of spatial data like points or lines. And a quadtree with points could work like the following: fill up a leaf node until its full (e.g. 8 entries), then create a branch node with 4 pointers (north-west, north-east, south-west, south-east) and decide where the leaf node entries should go depending of its location, in this process it could be necessary to create new branch nodes if all entries are too much clustered.

Now a simple neighbor search can be implemented recursively. Starting from the root node:

  • If the current node is a leaf node check if the point is in the search area. If that is the case add it to the results
  • If it is branch node check if one of the 4 sub-areas intersects with the search area. If a sub-node intersects then use that as current node and call this method recursively.

Trick 1 – Normalize the Distance

Searches are normally done with a point and a radius. To check if the current area of the quadrant intersects with the search area you need to calculate the distance using the Haversine formula. But you don’t need to calculate it every time in its entire complexity. E.g. you can avoid the following calculation:

R * 2 * Math.asin(Math.sqrt(a));

This is ok, if you have already normalized the radius of the search area via:

double tmp = Math.sin(dist / 2 / R);
return tmp * tmp;

Trick 2 – Use Smart Intersect Methods

The intersect method should fail fast. E.g. when you use again a circle as search area you should calculate only once the bounding box and check intersection of this with the quadrant area before applying the heavy intersect calculation with the Haversine formula:

public boolean intersect(BBox b) {
    // test top intersect
    if (lat > b.maxLat) {
        if (lon < b.minLon)
            return normDist(b.maxLat, b.minLon)  b.maxLon)
            return normDist(b.maxLat, b.maxLon)  0;
    }

    // test bottom intersect
    if (lat < b.minLat) {
        if (lon < b.minLon)
            return normDist(b.minLat, b.minLon)  b.maxLon)
            return normDist(b.minLat, b.maxLon)  0;
    }

    // test middle intersect
    if (lon < b.minLon)         return bbox.maxLon - b.minLon > 0;
    if (lon > b.maxLon)
        return b.maxLon - bbox.minLon > 0;
    return true;
}

Also be very sure you defined your bounding box properly once and for all. I’m using: minLon, maxLon followed by minLat which is south(!) and maxLat. Equally to EX_GeographicBoundingBox in the ISO 19115 standard see this discussion.

Trick 3 – Use Contains() and not only Intersect()

Now a less obvious trick. You could completely avoid intersect calls of quadrant areas which lay entirely in the search area. For this it is necessary to calculate fast if the search area fully contains the quadrant area or not. E.g. the method for a boudning box containing another bounding box would be:

class BBox {
  public boolean contains(BBox b) {
    return maxLat >= b.maxLat && minLat = b.maxLon && minLon   }
  ...
}

Similar for BBox.contains(Circle), Circle.contains(BBox) and Circle.contains(Circle).

Trick 4 – Sort In Time and not just Adding

Normally you want only 10 or less nearest neighbors and not all neighbours in a fixed distance. Especially for search engines like Lucene this should be favoured. For that you should use a priority queue to handle the ordering of the result. But not only of the leaf nodes – also when deciding which branch node should be processed next! See the paper Ranking in spatial databases for more information, where also a method for incremental neighbor search is described. This would make paging through the results very efficient.

Trick 5 – Use Branch Nodes with more than Four Children

Instead of branch nodes with 4 children you could use a less memory efficient but faster arrangement: use branch nodes with 16 child nodes. Or you could even decide on demand which branch node you should use – this can lead to partially big branch arrays where the quadtree is complete – making searching very efficient as then the sub-quadtree is a linear quadtree (see below).

Tricks for linear QuadTrees

Trick 6 – Avoid costly intersect methods

This only applies if you quadtree is a linear one ie. one where you can access the values by spatial keys (aka locational code). You’ll need to compute the spatial key of all four corners of your search area bounding box. Than compute the common bit-prefix of the points and start with that bit key instead from zero to traverse the quadtree. More details are available in the paper Speeding Up Construction of Quadtrees for Spatial Indexing  p.12.

Trick 7 – Bulk processing for linear Quadtrees

If you know that a quadrant is completely contained in a search area you can not only avoid further intersection calls, but also you can completely avoid branching and loop from quadrants bottom-left point to top-right. E.g. your are at key=1011 and you know the current quadrant node is contained in the search area. Then you can loop from the index “1011 000000..” to “1011 111111..”

Trick 8 – Store some Bits from the Point in Branch Nodes

For linear quadtrees you already encoded the point into spatial keys. Now you can use a radix tree to store the data memory efficient: some bits of the spatial key can be stored directly in the branch nodes. I’ve create also a different approach of a memory efficient spatial storage but as it turns out it is not that easy to handle when you need neighbor searches.

Conclusion

As you can see a lot time can go into tuning a quadtree. But at least the first tricks I mentioned should be used as they are easy and fast to apply and will make your quadtree significant faster.

About these ads