WKB4J: Gotchas

Performance

It is hard to quantify the performance gain of using WKB4J instead of parsing the WKT format. Sometimes WKB4J is three times faster, sometimes it's more like a hundred times faster (yes, that's a factor of 100); it depends on the dataset. Obviously it also depends on the efficiency of the native WKT parser. I'm planning to investigate this issue. Nevertheless, I can say that WKB4J will scale better than any WKT parser because WKB4J parses binary instead of text.

SRIDs

The WKB format specifies that the number of geometries contained in a geometry is stored in an unsigned 4-byte integer, meaning that it takes values from [0, 2^32 - 1]. Meanwhile, Java doesn't have an unsigned int type, so in Java an int can take any value from [-1 * 2^31, 2^31 - 1]. Longs can handle these unsigned values, but they take twice as much space. So basically, if your shapes contain more than 2^31 - 1 values, the size will show up as negative and the driver will complain bitterly.

Furthermore, an array can't contain more than 2^31-1 values, and most factories are built with arrays.
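
As an illustration of the signed/unsigned mismatch described above, here is a minimal sketch of reading a 4-byte count as a long and checking it against the array limit. The class and method names (WkbCount, readUnsignedCount, toArrayLength) are hypothetical and are not part of WKB4J; the sketch only assumes the count is available in a java.nio.ByteBuffer.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of WKB4J.
class WkbCount {

    /** Reads the next 4 bytes of the buffer as an unsigned count, widened to a long. */
    static long readUnsignedCount(ByteBuffer buf) {
        // getInt() yields a signed int; masking with 0xFFFFFFFFL keeps
        // the same 32 bits but interprets them as a non-negative long.
        return buf.getInt() & 0xFFFFFFFFL;
    }

    /** Converts the count to an array length, rejecting counts a Java array cannot hold. */
    static int toArrayLength(long count) {
        if (count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Count " + count + " exceeds the maximum Java array length");
        }
        return (int) count;
    }
}
```

With this approach a count above 2^31 - 1 is reported as an explicit error instead of silently appearing as a negative size.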

Performance

Make sure that the SQL query actually uses indexes; if not, performance really suffers.

Endianness

In Java, everything is done in the big-endian format, regardless of the endianness of the underlying platform. The JVM basically handles everything itself. Of course, in PostgreSQL and PostGIS everything is done using the endianness of the platform. This isn't a problem for us because PostGIS provides a way to choose the endianness of the output of the selection query: the endian switch of the AsBinary function. 'XDR' selects the big-endian format, so all queries retrieving binary data from PostGIS should use something like this: "SELECT AsBinary(the_geom,'XDR') FROM ..." A nice page on the problem of endianness in Java, including some GPL code to read native data in the Little-Endian format.
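
For readers who want to see what the byte order means on the Java side, here is a minimal sketch that inspects the first byte of a WKB value (0 for XDR/big-endian, 1 for NDR/little-endian, as defined by the WKB format) and configures a java.nio.ByteBuffer accordingly. This is not how WKB4J itself works, and the class name WkbByteOrder is hypothetical; requesting 'XDR' in the query, as described above, avoids the need for it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of WKB4J.
class WkbByteOrder {

    /**
     * Wraps a WKB byte array, consumes the leading byte-order flag and
     * returns a buffer positioned just after it, set to the matching order.
     */
    static ByteBuffer wrap(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        byte flag = buf.get();
        if (flag == 0) {
            buf.order(ByteOrder.BIG_ENDIAN);      // XDR
        } else if (flag == 1) {
            buf.order(ByteOrder.LITTLE_ENDIAN);   // NDR
        } else {
            throw new IllegalArgumentException("Unknown WKB byte-order flag: " + flag);
        }
        return buf;
    }
}
```

After wrap returns, calls such as getInt() and getDouble() on the buffer decode the remaining fields in the byte order announced by the data.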

Precision

For some reason, the values returned through the WKB format and the WKT format do not match exactly. I'm thinking that the problem lies in data conversion inside PostGIS. The data is stored in a specific format in the database. It is inserted through the WKT format. When extracted through the WKT format, inserted and returned values do match. However, when extracted through the WKB format, the value is slightly different. Postgres and Java are both supposed to implement the IEEE 754 standard, but I guess that there are a few glitches left.
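
Until the cause is understood, one practical way to compare coordinates obtained through the two formats is to use a tolerance rather than exact equality. The following is a minimal sketch under that assumption; the class name CoordinateCompare and the choice of a relative tolerance are illustrative, not something WKB4J provides or prescribes.

```java
// Hypothetical helper for illustration only; not part of WKB4J.
class CoordinateCompare {

    /**
     * Returns true if a and b are equal, or differ by no more than
     * relTol times the larger of their magnitudes.
     */
    static boolean nearlyEqual(double a, double b, double relTol) {
        if (a == b) {
            return true; // covers exact matches, including both values being zero
        }
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }
}
```

A suitable tolerance depends on the data; a value such as 1e-12 is only a starting point for double-precision coordinates.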