2019 FOSS4G Bucharest Talks speaker: Joachim Ungar
EOxCloudless: Level-3 Analysis Ready Satellite Data
An enormous amount of available data can be difficult to handle. At EOX we create, among other things, global satellite basemaps (https://s2maps.eu), so we understand that saving resources increases the reliability of any tool, product or dataset. While working on the basemaps, we found that we could derive a new product for our customers: multispectral cloud-free mosaics (https://cloudless.eox.at).
Usually, a user is confronted with a large number of single scenes or products with varying degrees of quality and cloud coverage. To make this first step of using Earth observation (EO) data easier, good extraction methods and data bundling are becoming more and more important, so that users can access the data without having to dig through the archives. Such dissemination options allow everyone to easily access large datasets which are reduced and prepared for instant analysis, machine learning, validation, etc.
There are some guidelines which try to define analysis ready data (ARD), but no clear definition is at hand. We are drawing on the experience we have gathered while working closely with our customers, who range from scientists and industry to various national agencies. Every single one of these has its own specifications and
Mapchete - tile-based geodata processing
Mapchete (https://github.com/ungarj/mapchete) is a tool written in Python which helps to process large amounts of geodata such as global high resolution datasets. It does so by executing a user-defined Python function on smaller chunks of data (tiles).
The standard tiling schemes follow the well-known tile pyramid schemes used by WMTS. This also lets the user easily preview process outputs using mapchete's built-in development server (Flask), which hosts an OpenLayers page.
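To illustrate the tile pyramid idea, here is a minimal sketch in plain Python. It assumes a geodetic (EPSG:4326) grid with 2 x 1 tiles at zoom 0, where every further zoom level doubles the number of tiles in both directions. The function names tile_bounds and tile_at are hypothetical and are not mapchete's API.

```python
def tile_bounds(zoom, row, col):
    """Return (left, bottom, right, top) in degrees for one tile.

    Assumption: geodetic grid, 2 columns x 1 row at zoom 0, origin top left.
    """
    size = 180.0 / 2 ** zoom  # tile edge length in degrees
    left = -180.0 + col * size
    top = 90.0 - row * size
    return (left, top - size, left + size, top)


def tile_at(zoom, lon, lat):
    """Return (row, col) of the tile containing the point (lon, lat)."""
    size = 180.0 / 2 ** zoom
    rows, cols = 2 ** zoom, 2 ** (zoom + 1)
    # Clamp so points on the right or bottom edge stay inside the grid.
    col = min(int((lon + 180.0) / size), cols - 1)
    row = min(int((90.0 - lat) / size), rows - 1)
    return (row, col)
```

Because every tile is addressed by (zoom, row, col), a preview client can request only the tiles that cover the current map view.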
By processing large areas as much smaller tiles or metatiles, memory errors can be avoided. Furthermore, tiles can be processed on multiple CPU cores in parallel, which reduces the processing time.
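The general pattern of tile-wise, parallel execution can be sketched with the Python standard library alone. This is not mapchete's implementation: split_into_tiles, process_tile and run are hypothetical names, and the "user-defined function" here merely counts the pixels of its tile.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_into_tiles(width, height, tile_size):
    """Yield (row_start, col_start, row_stop, col_stop) windows covering a grid."""
    for r in range(0, height, tile_size):
        for c in range(0, width, tile_size):
            yield (r, c, min(r + tile_size, height), min(c + tile_size, width))


def process_tile(window):
    """Hypothetical user-defined function: count the pixels in one tile."""
    r0, c0, r1, c1 = window
    return (r1 - r0) * (c1 - c0)


def run(width, height, tile_size, workers=2, executor_cls=ProcessPoolExecutor):
    """Process every tile with a pool of workers and collect the results."""
    tiles = list(split_into_tiles(width, height, tile_size))
    with executor_cls(max_workers=workers) as pool:
        # Each task receives one tile window rather than the whole grid.
        return list(pool.map(process_tile, tiles))
```

Calling run(5, 3, 2, executor_cls=ThreadPoolExecutor) swaps in threads, which can be enough for I/O-bound work. With the default process pool, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.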
All geospatial data (i.e. raster and feature data) are handled internally and exposed to the user-defined process function either as NumPy arrays (raster) or as GeoJSON-like dictionaries (features), which can easily be edited with well-known Python packages like shapely or scipy.
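As a small illustration of the feature side, a GeoJSON-like dictionary is ordinary Python data and can be edited with plain dictionary operations. The sketch below uses only the standard library; the function name point_to_square and the sample property are hypothetical, and raster data as NumPy arrays is not shown.

```python
def point_to_square(feature, half_size):
    """Return a new GeoJSON-like feature: a square polygon around a point."""
    x, y = feature["geometry"]["coordinates"]
    ring = [
        (x - half_size, y - half_size),
        (x + half_size, y - half_size),
        (x + half_size, y + half_size),
        (x - half_size, y + half_size),
        (x - half_size, y - half_size),  # repeat the first vertex to close the ring
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        # Copy the properties so the input feature is left untouched.
        "properties": dict(feature.get("properties", {})),
    }


point = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": (26.1, 44.4)},
    "properties": {"name": "Bucharest"},
}
square = point_to_square(point, 0.5)
```

For real geometry operations, such a dictionary can be converted into a geometry object with shapely.geometry.shape and processed from there.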
For I/O operations mapchete makes heavy use of rasterio (https://github.com/mapbox/rasterio) and Fiona (https://github.com/Toblerity/Fiona). It can read data formats supported by these packages and can currently write outputs into WMTS-like tile directories of GeoTIF