Raster file formats for JavaScript mapping
Some time ago, I made some docs about drawing raster data with d3js.
All the examples used GeoTIFF files to get the data, but there are many other possibilities. As an exercise, I've created some examples that use the same dataset stored in different formats, each read with a different strategy.
Table of contents
- The data
- GeoTIFF
- NetCDF
- JSON
- JSON with encoded data
- Binary data
- LZW compressed binary data
- Performance comparison
- What to do with all this binary data?
- Links
The data
All the examples use the data from this block. You can see how I got the data here. I have taken only the first layer (msl pressure) to make the examples simpler:
gdal_translate -b 1 vardah.tiff vardah_new.tiff
You can download the original vardah.tiff file here.
GeoTIFF
As in the original example, GeoTIFF can be used to get the raster data. It has many advantages: it's the most widespread format, it can be compressed, and it can be opened directly with any GIS program, such as QGIS.
To read it, use the geotiff.js library.
Compression
The latest versions of the library read compressed images directly. Compression can reduce the size a lot, especially with the Deflate option. Parsing takes longer when the image is compressed, but the time is still acceptable.
To create a compressed GeoTIFF file, use the GDAL creation options:
gdal_translate -of GTiff -co COMPRESS=DEFLATE vardah.tiff vardah2.tiff
gdal_translate -of GTiff -co COMPRESS=LZW vardah.tiff vardah2.tiff
gdal_translate -of GTiff -co COMPRESS=PACKBITS vardah.tiff vardah2.tiff
Other compression options are not supported by the geotiff.js library (at least in the version used here).
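To get a rough idea of how much Deflate will save before converting a file, one can deflate the raw float32 bytes with Python's zlib module, which implements the same Deflate algorithm. This is only an estimate under stated assumptions: GeoTIFF compresses each strip or tile separately, GDAL may use a different compression level, and the field below is a hypothetical, very smooth one. The helper name deflate_ratio is illustrative:

```python
import struct
import zlib


def deflate_ratio(values):
    """Rough estimate of how much Deflate shrinks a float32 raster.

    Packs the values as little-endian float32 (the raw pixel layout) and
    compresses the whole buffer at once. A GeoTIFF compresses strip by strip
    or tile by tile, so the real file size will differ.
    """
    raw = struct.pack("<%df" % len(values), *values)
    packed = zlib.compress(raw, 9)
    # Deflate is lossless: decompressing must give back the exact bytes.
    assert zlib.decompress(packed) == raw
    return len(packed) / len(raw)


# Hypothetical smooth field: every row repeats the same slowly changing values.
nx, ny = 100, 60
field = [1000.0 + (i // 10) * 0.5 for j in range(ny) for i in range(nx)]
print(round(deflate_ratio(field), 3))
```

Real data is noisier than this toy field, so expect a ratio closer to the GeoTIFF sizes in the performance comparison.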
Another thing to take into account is the metadata. The geotransform is stored in a rather unusual way (see the tie point and pixel scale in the example), and the GDAL metadata is kept in a special "GDAL" tag, which is hard to find from JavaScript, although it is easy to reach when using Python+GDAL.
HTML example
<!DOCTYPE html>
<html>
<head>
<script src='geotiff.min.js'></script>
</head>
<body>
<script>
var urlpath = "vardah.tiff";
var oReq = new XMLHttpRequest();
oReq.open("GET", urlpath, true);
oReq.responseType = "arraybuffer"; // geotiff.js needs the raw bytes
oReq.onload = function(oEvent) {
  var t0 = performance.now();
  // Synchronous API of the geotiff.js version used here; newer releases
  // are Promise-based (GeoTIFF.fromArrayBuffer)
  var tiff = GeoTIFF.parse(this.response);
  var image = tiff.getImage();
  var data = image.readRasters()[0]; // first (and only) band
  var tiepoint = image.getTiePoints()[0];
  var pixelScale = image.getFileDirectory().ModelPixelScale;
  var t1 = performance.now();
  console.log("Decoding took " + (t1 - t0) + " milliseconds.");
};
oReq.send(); // start the request
</script>
</body>
</html>
- Note that the request's responseType must be set to arraybuffer
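The tie point and pixel scale read in the example are enough to place every pixel, assuming a north-up image with no rotation. The tie point links a raster position (i, j) to a model position (x, y), and the pixel scale gives the pixel size. The sketch below turns them into a GDAL-style geotransform. The function names and the numbers are hypothetical, not the real Vardah extent:

```python
def geotransform_from_tiepoint(tiepoint, pixel_scale):
    """Build a GDAL-style geotransform from a GeoTIFF tie point and pixel scale.

    tiepoint: (i, j, x, y), raster position (i, j) is tied to model (x, y).
    pixel_scale: (sx, sy), pixel size; sy is positive even though rows go south.
    Assumes a north-up image without rotation.
    Returns (origin_x, sx, 0, origin_y, 0, -sy), the order GDAL uses.
    """
    i, j, x, y = tiepoint
    sx, sy = pixel_scale
    origin_x = x - i * sx
    origin_y = y + j * sy
    return (origin_x, sx, 0.0, origin_y, 0.0, -sy)


def pixel_to_coords(gt, col, row):
    """Model coordinates of the upper-left corner of pixel (col, row)."""
    return (gt[0] + col * gt[1] + row * gt[2],
            gt[3] + col * gt[4] + row * gt[5])


gt = geotransform_from_tiepoint((0, 0, 70.0, 25.0), (0.25, 0.25))
print(gt)                         # (70.0, 0.25, 0.0, 25.0, 0.0, -0.25)
print(pixel_to_coords(gt, 4, 2))  # (71.0, 24.5)
```

Add half a pixel to col and row to get the pixel centre instead of its corner.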
NetCDF
NetCDF is a popular format for meteorological data. The format is quite simple and very flexible. As with GeoTIFF, GDAL can write NetCDF files in its own particular layout, and there is a JavaScript library, netcdfjs, that reads the format; it's fast and not very big. Files created with GDAL can also be opened with QGIS.
To create a NetCDF file from a GeoTIFF, just run:
gdal_translate -of netCDF -b 1 vardah.tiff vardah.nc
The output variable will be named Band1, which is not very nice, since the actual name of the band is stored in another field, not in the one used to retrieve the data.
HTML example
<!DOCTYPE html>
<html>
<head>
<script src='netcdfjs.js'></script>
</head>
<body>
<script>
var urlpath = "vardah.nc";
var reader;
var oReq = new XMLHttpRequest();
oReq.open("GET", urlpath, true);
oReq.responseType = "blob";
oReq.onload = function(oEvent) {
  var blob = oReq.response;
  var reader_url = new FileReader();
  reader_url.onload = function(e) {
    var t0 = performance.now();
    reader = new netcdfjs(this.result); // this.result is an ArrayBuffer
    var dataValues = reader.getDataVariable('Band1');
    var t1 = performance.now();
    console.log("Decoding took " + (t1 - t0) + " milliseconds.");
  };
  // Asynchronous: the result arrives in reader_url.onload, not as a return value
  reader_url.readAsArrayBuffer(blob);
};
oReq.send(); // start the request
</script>
</body>
</html>
- The variables lat and lon give the geographical coordinates of the grid, which is a good feature
- Some metadata is stored in different variables and fields. Take a look at the library API to see them, but:
  - Printing reader.variables will output a set of objects with the projection information, longitudes and latitudes
  - reader.dimensions stores the matrix size
  - reader.globalAttributes stores other metadata, such as the creation date, GDAL information, etc.
- Note that the request's responseType must be set to blob
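If netcdfjs refuses a file, it's worth checking which NetCDF flavour was written. As far as I know, netcdfjs reads the NetCDF 3 binary format (classic and 64-bit offset) but not NetCDF-4, which is built on HDF5; GDAL writes the classic format unless another one is requested with its FORMAT creation option. The first bytes of the file tell them apart. The function name netcdf_flavour is just an illustration:

```python
def netcdf_flavour(header: bytes) -> str:
    """Guess the NetCDF flavour from the first bytes of a file."""
    # NetCDF 3 files start with "CDF" followed by a version byte.
    if header[:3] == b"CDF" and len(header) >= 4:
        version = header[3]
        if version == 1:
            return "classic"
        if version == 2:
            return "64-bit offset"
        if version == 5:
            return "64-bit data (CDF-5)"
    # NetCDF-4 files are HDF5 files and carry the HDF5 signature.
    if header[:8] == b"\x89HDF\r\n\x1a\n":
        return "netCDF-4 (HDF5)"
    return "unknown"


print(netcdf_flavour(b"CDF\x01\x00\x00\x00\x00"))  # classic
```

To check the file from the example, read its first 8 bytes: `with open("vardah.nc", "rb") as f: print(netcdf_flavour(f.read(8)))`.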
JSON
This format is the first that comes to mind when thinking about sharing data. It's the easiest to understand, and reading it is the simplest thing to code. But it's a bad idea to use it with medium-sized matrices, since the file can be four times the size of the original uncompressed GeoTIFF or more (490 kB against 102 kB in the performance comparison).
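The size penalty comes from writing each number as decimal text: a float32 value read from the raster and converted with float() keeps all the digits of its exact binary value, so it takes many characters in JSON against 4 bytes in binary. A small sketch on a hypothetical 100 x 60 grid of pressure-like values shows the effect:

```python
import json
import struct

nx, ny = 100, 60  # hypothetical grid size
n = nx * ny
fmt = "<%df" % n

# Hypothetical values in Pa, rounded to float32 as GDAL would return them,
# then turned into Python floats, as float(data[j][i]) does.
raw_values = [100000.0 + 0.37 * k for k in range(n)]
values = list(struct.unpack(fmt, struct.pack(fmt, *raw_values)))

binary_size = len(struct.pack(fmt, *values))
json_size = len(json.dumps({"nx": nx, "ny": ny, "data": values}).encode("utf-8"))

print(binary_size)                        # 24000: 4 bytes per value
print(round(json_size / binary_size, 1))  # how many times bigger the JSON is
```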
HTML example
<!DOCTYPE html>
<html>
<body>
<script>
var oReq = new XMLHttpRequest();
oReq.addEventListener("load", function(event){
  var t0 = performance.now();
  var jsonData = JSON.parse(this.response);
  var t1 = performance.now();
  console.log("Decoding took " + (t1 - t0) + " milliseconds.");
});
oReq.open("GET", "vardah.json");
oReq.send();
</script>
</body>
</html>
- Just parse the JSON file!
- Of course, all the metadata is easy to add, so the format is very flexible
Creating the JSON sample file using python is easy:
from osgeo import gdal  # older GDAL versions also accepted "import gdal"
import json

d_s = gdal.Open("vardah.tiff")
data = d_s.GetRasterBand(1).ReadAsArray()
print(data.dtype)
print("Size:", data.shape)

# Flatten the matrix row by row into a plain list of Python floats
out_data = []
for j in range(data.shape[0]):
    for i in range(data.shape[1]):
        out_data.append(float(data[j][i]))

json_data = {}
json_data['nx'] = data.shape[1]  # columns
json_data['ny'] = data.shape[0]  # rows
json_data['data'] = out_data

with open("vardah.json", "w") as fp:
    fp.write(json.dumps(json_data))
- To keep the examples consistent, all the numbers are put in a flat list, but a matrix (a list of rows) could be created the same way and could be more convenient in certain cases
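Since the loops write the rows one after another, the pixel at row j and column i ends up at index j * nx + i of the flat list. The same formula works in the browser (jsonData.data[j * jsonData.nx + i]). A minimal sketch with hypothetical helper names and a tiny 2 x 3 matrix:

```python
import json


def to_flat(matrix):
    """Flatten a row-major matrix (list of rows) into one list."""
    return [value for row in matrix for value in row]


def value_at(doc, row, col):
    """Read one pixel back from the flat 'data' list: index = row * nx + col."""
    return doc["data"][row * doc["nx"] + col]


matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]  # ny = 2 rows, nx = 3 columns
doc = json.loads(json.dumps({"nx": 3, "ny": 2, "data": to_flat(matrix)}))
print(value_at(doc, 1, 2))  # 6.0
```

With NumPy, data.flatten().tolist() produces the same row-major order as the nested loops.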
JSON with encoded data
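Plain JSON data is expensive in terms of space. What if we encode the data in Base64? Base64 turns every 3 bytes into 4 characters, so it's about a third bigger than the raw binary data, but still much smaller than writing every number as decimal text, and the JSON format can store all the metadata we want with the same flexibility.
Let's first look at how we can create the sample file: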
from osgeo import gdal
import json
from base64 import b64encode
import struct

d_s = gdal.Open("vardah.tiff")
data = d_s.GetRasterBand(1).ReadAsArray()
print(data.dtype)
print("Size:", data.shape)

out_data = []
for j in range(data.shape[0]):
    for i in range(data.shape[1]):
        out_data.append(float(data[j][i]))

json_data = {}
json_data['nx'] = data.shape[1]
json_data['ny'] = data.shape[0]
# Pack as float32 ('f'), then Base64-encode; b64encode returns bytes
b64 = b64encode(struct.pack(str(len(out_data)) + 'f', *out_data)).decode("utf-8")
json_data['data'] = b64

with open("vardahb64.json", "w") as fp:
    fp.write(json.dumps(json_data))
- Just encode the list after packing it as a binary string
- I have packed the elements using f, i.e. as float32 values. If this is changed, remember to change the decoding part! Some variables, such as classifications, can be stored as bytes, which is much more efficient
- The b64encode function returns bytes, so they have to be decoded into a UTF-8 string before serializing them into the JSON
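The Base64 size is easy to predict: 4 characters for every 3 bytes, rounded up to a full block. That matches the table, where the 101 kB binary file becomes a 135 kB Base64 JSON. A sketch with a hypothetical helper, b64_float32, that packs explicitly as little-endian float32 ('<f', an assumption; plain 'f' gives the same bytes on common little-endian hardware):

```python
import base64
import math
import struct


def b64_float32(values):
    """Pack values as little-endian float32 and return the Base64 text."""
    raw = struct.pack("<%df" % len(values), *values)
    # Base64 output is plain ASCII, so decoding as ASCII or UTF-8 is the same.
    return base64.b64encode(raw).decode("ascii")


n = 6000  # e.g. a 100 x 60 grid
text = b64_float32([1013.25] * n)
# 4 bytes per float, 4 characters for every 3 bytes.
print(len(text), 4 * math.ceil(4 * n / 3))  # 32000 32000
```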
HTML example
<!DOCTYPE html>
<html>
<body>
<script>
var oReq = new XMLHttpRequest();
oReq.addEventListener("load", function(event){
  var t0 = performance.now();
  var jsonData = JSON.parse(this.response);
  // atob() decodes Base64 into a "binary string": one char per byte (0-255)
  var data = atob(jsonData['data']);
  var b = new Uint8Array(
    data.split("").map(function(d){return d.charCodeAt(0);})
  );
  // Reinterpret the same bytes as float32 values
  var float32Data = new Float32Array(b.buffer);
  var t1 = performance.now();
  console.log("Decoding took " + (t1 - t0) + " milliseconds.");
});
oReq.open("GET", "vardahb64.json");
oReq.send();
</script>
</body>
</html>
Reading this data is quite efficient, but not as easy as with plain JSON. The steps are:
- Parse the JSON data with the JSON.parse function
- Convert the encoded field to a binary string using the atob function, which decodes the Base64 string
- Retrieve all the bytes:
  - Split the string into characters and map each one to its character code with charCodeAt. After atob, every code is a single byte value between 0 and 255
  - Put all the values into a JavaScript typed array (a Uint8Array), so we can convert them later
- Since the values were stored as float32, create a Float32Array from the buffer of the unsigned int8 array, which reinterprets the same bytes as floats. That's all
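The same steps can be reproduced in Python to check a file without a browser. It's a sketch assuming the data was packed as little-endian float32; base64.b64decode gives the bytes directly, and struct.unpack plays the role of the Float32Array view. The name decode_b64_json is hypothetical:

```python
import base64
import json
import struct


def decode_b64_json(text):
    """Python counterpart of the browser steps: parse, Base64-decode, reinterpret."""
    doc = json.loads(text)                       # parse the JSON
    raw = base64.b64decode(doc["data"])          # Base64 text -> raw bytes
    n = len(raw) // 4                            # 4 bytes per float32
    return list(struct.unpack("<%df" % n, raw))  # bytes -> float32 values


payload = json.dumps({
    "nx": 2, "ny": 1,
    "data": base64.b64encode(struct.pack("<2f", 1.5, -2.25)).decode("ascii"),
})
print(decode_b64_json(payload))  # [1.5, -2.25]
```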
Binary data
Using binary data directly can be a bit more difficult, but the size is compact, the format is very flexible and the performance is very good. Also, it doesn't require any external library, which is very convenient in many cases. And since you control the whole format, the original data can be obfuscated easily.
If we want to store metadata, different data types may be involved, making the scripts more complicated, but it’s efficient and not so difficult to do.
Creating the file is easy:
from osgeo import gdal
import struct

d_s = gdal.Open("vardah.tiff")
data = d_s.GetRasterBand(1).ReadAsArray()
print(data.dtype)

out_data = []
for j in range(data.shape[0]):
    for i in range(data.shape[1]):
        out_data.append(float(data[j][i]))

# 'f' = float32, in the machine's native byte order
with open("vardah.bin", "wb") as fp:
    fp.write(struct.pack(str(len(out_data)) + 'f', *out_data))
- Just use the pack function to store the data
- Note that the data is packed with the f letter, that is, as float32 elements
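One detail worth knowing: struct's plain 'f' uses the machine's native byte order, and JavaScript typed arrays also use the platform's byte order, which is little-endian on practically all browsers' hardware. Writing '<f' makes the file explicitly little-endian, independent of the machine that creates it (in the browser, a DataView lets you pick the byte order when that matters). A small comparison:

```python
import struct
import sys

values = [1013.25, 1012.5, 1011.75]
n = len(values)

native = struct.pack("%df" % n, *values)   # 'f' alone: native byte order
little = struct.pack("<%df" % n, *values)  # '<f': always little-endian

# Each float32 takes 4 bytes, so a raster file holds 4 * nx * ny bytes.
print(len(little))  # 12
# True on little-endian machines (x86 and most ARM systems).
print(sys.byteorder, native == little)
```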
HTML example
Reading the binary data is really easy using JavaScript typed arrays:
<!DOCTYPE html>
<html>
<body>
<script>
var oReq = new XMLHttpRequest();
oReq.addEventListener("load", function(event){
  var t0 = performance.now();
  // View the raw bytes as float32 values; no parsing is needed
  var floatArray = new Float32Array(this.response);
  var t1 = performance.now();
  console.log("Decoding took " + (t1 - t0) + " milliseconds.");
});
oReq.open("GET", "vardah.bin");
oReq.responseType = 'arraybuffer';
oReq.send();
</script>
</body>
</html>
- Note that the request's responseType must be set to arraybuffer
- Just read the response into a new Float32Array. All the values will be there
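As mentioned above, adding metadata to a raw binary file means defining a small layout of your own. Here is one hypothetical layout, not a standard: two little-endian uint32 values (nx, ny) followed by nx * ny float32 values. The 8-byte header keeps the floats at an offset that is a multiple of 4, which matters in the browser because a Float32Array view must start at a multiple of its element size. The function names are illustrative:

```python
import struct

# Hypothetical layout: two little-endian uint32 (nx, ny), then nx * ny float32.
HEADER = struct.Struct("<II")  # 8 bytes, so the floats start at a multiple of 4


def write_grid(nx, ny, values):
    """Serialize a grid as header + float32 data."""
    if len(values) != nx * ny:
        raise ValueError("expected nx * ny values")
    return HEADER.pack(nx, ny) + struct.pack("<%df" % len(values), *values)


def read_grid(buf):
    """Parse the header, then read nx * ny floats right after it."""
    nx, ny = HEADER.unpack_from(buf, 0)
    values = struct.unpack_from("<%df" % (nx * ny), buf, HEADER.size)
    return nx, ny, list(values)


blob = write_grid(2, 2, [1.0, 2.0, 3.0, 4.0])
print(len(blob))        # 24
print(read_grid(blob))  # (2, 2, [1.0, 2.0, 3.0, 4.0])
```

In JavaScript, the same file could be read with new Uint32Array(buffer, 0, 2) for the header and new Float32Array(buffer, 8, nx * ny) for the values.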
LZW compressed binary data
Of course, as in the GeoTIFF case, all the data can be compressed. Using complex compression algorithms makes you lose the advantage of coding everything without an external library, but the LZW algorithm is so simple that it can be added with a few lines of code.
I will use the code sample from the rosettacode.org site.
File creation using Python
from osgeo import gdal
import struct

'''
Compression algorithm (Python 2)
'''
def compress(uncompressed):
    """Compress a string to a list of output symbols."""
    # Build the dictionary.
    dict_size = 256
    dictionary = dict((chr(i), i) for i in xrange(dict_size))
    w = ""
    result = []
    for c in uncompressed:
        wc = w + c
        if wc in dictionary:
            w = wc
        else:
            result.append(dictionary[w])
            # Add wc to the dictionary.
            dictionary[wc] = dict_size
            dict_size += 1
            w = c
    # Output the code for w.
    if w:
        result.append(dictionary[w])
    return result

d_s = gdal.Open("vardah.tiff")
data = d_s.GetRasterBand(1).ReadAsArray()
out_data = []
for j in range(data.shape[0]):
    for i in range(data.shape[1]):
        out_data.append(float(data[j][i]))
out_data_bytes = struct.pack(str(len(out_data)) + 'f', *out_data)
compressed = compress(out_data_bytes)
fp = open("vardah.lzw.bin", "wb")
fp.write(struct.pack(str(len(compressed)) + 'H', *compressed))
fp.close()
- The compression function is copied directly from the rosettacode.org site
- It's meant to work with a string, so we convert our list of floats into a binary byte string first
- pack converts the data list into a string with the binary data, so the compression works byte by byte
- The data is compressed with the compress function
- The output is written as unsigned shorts ('H'). The compressed data is a list of dictionary codes, and as long as the dictionary stays below 65536 entries, every code is between 0 and 65535, so an unsigned short is the most efficient way to store them. The dictionary grows by one entry per output code, so this holds for a small file like this one, but larger inputs would need wider codes or a dictionary reset
The size is reduced by 50% in our example. If a classification is used instead of float values, the compression will be much more efficient.
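Because of the 16-bit limit, a small guard before writing the file gives a clearer error than a bare struct.error when a larger raster produces codes above 65535. The helper name pack_codes is hypothetical, and it writes explicitly little-endian shorts ('<H'), which matches plain 'H' on common little-endian hardware:

```python
import struct


def pack_codes(codes):
    """Pack LZW codes as little-endian unsigned shorts ('H', 2 bytes each).

    struct.pack would also refuse a code above 65535, but checking first
    gives a clearer message about what went wrong.
    """
    largest = max(codes, default=0)
    if largest > 0xFFFF:
        raise ValueError(
            "code %d does not fit in 16 bits; the LZW dictionary "
            "grew past 65536 entries" % largest)
    return struct.pack("<%dH" % len(codes), *codes)


print(len(pack_codes([65, 66, 256, 65535])))  # 8
```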
If you are using Python 3, the compress function would be:
def compress(uncompressed):
    """Compress a string to a list of output symbols."""
    # Build the dictionary.
    dict_size = 256
    dictionary = {bytes([i]): i for i in range(dict_size)}
    w = b""
    result = []
    for c in uncompressed:
        # Iterating over bytes yields ints, so turn c back into a one-byte string
        wc = w + bytes([c])
        if wc in dictionary:
            w = wc
        else:
            result.append(dictionary[w])
            # Add wc to the dictionary.
            dictionary[wc] = dict_size
            dict_size += 1
            w = bytes([c])
    # Output the code for w.
    if w:
        result.append(dictionary[w])
    return result
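- Python 2 str values become bytes in Python 3, so the dictionary keys and w are bytes; iterating over a bytes object yields integers, which bytes([c]) turns back into one-byte strings
- xrange has to be changed to range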
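Before shipping the file, it's handy to check in Python that the codes decompress back to the original floats. The decompress function below is a sketch of the standard LZW decoder, mirroring the JavaScript uncompress function but working on bytes; compress is repeated so the block runs on its own, and the sample values are hypothetical:

```python
import struct


def compress(data: bytes) -> list:
    """LZW compression (same as the Python 3 version above)."""
    dict_size = 256
    dictionary = {bytes([i]): i for i in range(dict_size)}
    w = b""
    result = []
    for c in data:
        wc = w + bytes([c])
        if wc in dictionary:
            w = wc
        else:
            result.append(dictionary[w])
            dictionary[wc] = dict_size
            dict_size += 1
            w = bytes([c])
    if w:
        result.append(dictionary[w])
    return result


def decompress(codes: list) -> bytes:
    """LZW decompression, the Python counterpart of uncompress() in JavaScript."""
    if not codes:
        return b""
    dict_size = 256
    dictionary = {i: bytes([i]) for i in range(dict_size)}
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == dict_size:
            # Special case: the code refers to the entry being built right now.
            entry = w + w[:1]
        else:
            raise ValueError("bad LZW code: %d" % k)
        out.append(entry)
        dictionary[dict_size] = w + entry[:1]
        dict_size += 1
        w = entry
    return b"".join(out)


values = [1013.25, 1013.25, 1013.5, 1013.5] * 50
raw = struct.pack("<%df" % len(values), *values)
codes = compress(raw)
restored = struct.unpack("<%df" % len(values), decompress(codes))
print(len(raw), len(codes) * 2)  # bytes before, and after packing codes as 'H'
print(list(restored) == values)  # True
```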
HTML example
<!DOCTYPE html>
<html>
<body>
<script>
var oReq = new XMLHttpRequest();
oReq.addEventListener("load", function(event){
  var t0 = performance.now();
  // The file is a sequence of unsigned shorts (LZW codes)
  var compressedArray = new Uint16Array(this.response);
  console.info(compressedArray.length);
  var uncompressed = uncompress(compressedArray);
  var t1 = performance.now();
  console.log("Decoding took " + (t1 - t0) + " milliseconds.");
});
oReq.open("GET", "vardah.lzw.bin");
oReq.responseType = 'arraybuffer';
oReq.send();

//https://rosettacode.org/wiki/LZW_compression#JavaScript
function uncompress(compressed) {
  var i,
    dictionary = [],
    w,
    result,
    k,
    entry = "",
    dictSize = 256;
  for (i = 0; i < 256; i += 1) {
    dictionary[i] = String.fromCharCode(i);
  }
  w = String.fromCharCode(compressed[0]);
  result = w;
  for (i = 1; i < compressed.length; i += 1) {
    k = compressed[i];
    if (dictionary[k]) {
      entry = dictionary[k];
    } else {
      if (k === dictSize) {
        entry = w + w.charAt(0);
      } else {
        return null;
      }
    }
    result += entry;
    // Add w+entry[0] to the dictionary.
    dictionary[dictSize++] = w + entry.charAt(0);
    w = entry;
  }
  // Convert from chars (one byte each) to a Float32Array
  var b = new Uint8Array(
    result.split("").map(function(d){return d.charCodeAt(0);})
  );
  return new Float32Array(b.buffer);
}
</script>
</body>
</html>
- As in the other cases, just call the uncompress function and the float array will be in the uncompressed variable
- The uncompress function is the same as the one on the rosettacode.org site, but modified to convert the byte string to a Float32Array:
  - Split the string into characters and map each one to its character code with charCodeAt. Since every dictionary entry is built from String.fromCharCode values below 256, each code is one byte
  - Put all the values into a Uint8Array JavaScript typed array, so we can convert them later
  - The Uint8Array's buffer is then reinterpreted as a Float32Array
Not so difficult! If some metadata has to be added, things can be a bit more complicated, especially if different types are involved
Performance comparison
I ran all the options to compare the final file size and the parsing time:
Format | Size | Parsing time |
---|---|---|
Uncompressed GeoTIFF | 102 kB | 20 ms |
Packbits GeoTIFF | 103 kB | 80 ms |
LZW GeoTIFF | 53 kB | 54 ms |
Deflate GeoTIFF | 40 kB | 59 ms |
JSON | 490 kB | 9 ms |
Base64 JSON | 135 kB | 12 ms |
Binary | 101 kB | 0.15 ms |
LZW binary | 54 kB | 14 ms |
- GeoTIFF files, especially when compressed with LZW or Deflate, are the smallest ones, but they have the highest parsing times (PackBits doesn't reduce the size of this float data at all). Anyway, 60 ms is a very good time, so GeoTIFF will be the usual method
- JSON files are the most inefficient in terms of space, and the parsing time is not as low as it could be, because there are many characters to parse
- Binary files are really fast to parse, and the size is quite small if compressed
What to do with all this binary data?
With the HTML canvas element and some libraries out there, many visualizations can be done with a point matrix. I made a tutorial some time ago: d3-raster-tools-docs
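The key step for drawing on a canvas is turning the values into colours: an ImageData buffer holds 4 bytes (R, G, B, A) per pixel, row after row, which is the same row-major order as the arrays above. Here is a Python sketch of that mapping with a hypothetical linear blue-to-red ramp; the function name and colours are illustrative, and the resulting bytes have the layout ImageData.data expects:

```python
def to_rgba(values, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map float values to RGBA bytes, 4 per pixel, in row-major order.

    Colours are a linear blend from `low` to `high`; values outside
    [vmin, vmax] are clamped. Alpha is always 255 (opaque).
    """
    out = bytearray(4 * len(values))
    span = (vmax - vmin) or 1.0  # avoid dividing by zero on flat fields
    for n, v in enumerate(values):
        t = min(max((v - vmin) / span, 0.0), 1.0)
        for c in range(3):
            out[4 * n + c] = round(low[c] + t * (high[c] - low[c]))
        out[4 * n + 3] = 255
    return out


pixels = to_rgba([990.0, 1000.0, 1010.0], vmin=990.0, vmax=1010.0)
print(list(pixels))
```

In practice a library colour scale (such as the d3 ones used in the tutorial) replaces the linear blend, but the byte layout stays the same.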
Links
- drawing raster data with d3js
- Vardah and leaflet block
- Generating the Vardah data file
- The original geotiff file
- The geotiff.js library
- The netcdfjs library
- Base64 Wikipedia page
- JavaScript typed arrays
- LZW algorithm wikipedia page
- LZW implementation in many languages, including Python and JavaScript
- D3js raster tools documentation