Creating mosaics, clipping and removing overlapping satellite images

Posted on April 2, 2021 · Tagged with gis, python, maps, visualization

Intro

This post describes ways to download, clip and join satellite images. The module sentinel-mosaic is used throughout this blog post.

Background

There’s a number of satellites launched by the European Space Agency that take images of Earth which are then sent to ground stations and made publicly available through the Copernicus Open Access Hub and its respective API.

In this post we’re focusing mainly on the data from Sentinel-2A and Sentinel-2B which were designed for multiple purposes, one of those purposes being land monitoring.

Here are some of the other satellites together with their launch dates and lifespans.

satellitelaunch dateend of lifespan

Sentinel-2A

2015-06-23

2022-09-23 (7.25y lifespan)

Sentinel-2B

2017-05-07

2024-08-07 (7.25y lifespan)

Sentinel-3B

2018-04-25

2025-04-25

Sentinel-3A

2016-02-16

2023-02-16

Sentinel-5P

2017-10-13

2024-10-13 (7y)

Sentinel-6A (currently Sentinel-6 Michael Freilich, also known as Jason-CS)

2020-11-21 link1

2026-04-21 (5.5 years)

Sentinel-6B

~ 2026-01-01

~ 2031-06-01

The Copernicus Hub has a data retention policy. The policy states that data is readily available for download for the past 12 months, and older data downloads can be triggered and downloaded within 24 hours.

Setup

The following libraries are required. We’re installing them through the conda package manager that comes with the Anaconda3 Python distribution. There’s a Windows installer and a Linux installer.

The nice part about Anaconda3 is that they offer curated versions of these packages and their package manager takes care of external libraries as well. This makes the install much easier, so fixes like the one I had to do for Geopandas here when it was installed directly from PyPi are not necessary anymore.

libraryfunctionalityversion

rasterio

raster image library

1.2.1

sentinelsat

library that offers ways to query data collected from Sentinel through the Sentinel-2A and Sentinel-2B satellites

0.14

shapely

library that provides algorithms for processing vector data

1.7.1

GDAL

library and set of tools that allows processing of geographical raster images from commandline. it’s also a dependency for other lirbaries.

3.1.4

geopandas

library for transforming and querying geographical data frames (the data frames can contain metadata with polygon contours)

0.9.0

matplotlib.pyplot

library for plotting data and drawing vectorial as well as raster data (mostly used for previews and summarizing data)

3.4.0

Jupyter

1.0.0

Anaconda distribution

Anaconda3 2020.11

Python version used

Python 3.8.5

In order to install these, create a new environment, switch to it and then install the packages.

conda create -n e3 python=3.8.5
conda activate e3
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install rasterio==1.2.1 sentinelsat==0.14 shapely==1.7.1
conda install gdal==3.1.4 geopandas==0.9.0 matplotlib==3.4.0 jupyter==1.0.0

Now also install the sentinel-mosaic package

pip install sentinel-mosaic

Summary of the full algorithm

The steps are the following:

  1. Reading the input data (the region of interest) from a GeoJSON file

  2. Querying the Sentinel API to get the tiles that intersect the region of interest

  3. Removing redundant granules I

  4. Removing redundant granules II

  5. Downloading the reduced set of granules (multiple zip archives stored on disk)

  6. Unpacking the archives

  7. Raster conversion JP2OpenJPEGGeoTiff

  8. Joining all the raster images TCI (True Color) into one big image

  9. Reprojecting the image to EPSG:4326

  10. Clipping the region of interest from the big image

Removing redundant granules

The AOI(area of interest) itself might be located partially in one satellite image, and partially in another satellite image. In order to extract it, we need to pick certain granules (satellite images) that cover the AOI, download them, join them, and then cut out the AOI.

One of the problems I noticed when dealing with the metadata returned by the Sentinel API was that it returns overlapping granules. This is because the overlapping images were actually taken on different orbits of the satellites over nearly the same area.

Furthermore, here’s a small table describing the types of images that you can expect to get back:

publicleveldimensionsdisk space

No

Level-0

25km x 23km

?

No

Level-1A

25km x 23km

?

No

Level-1B

25km x 23km

27 MB

Yes

Level-1C

100km x 100km

600 MB

Yes

Level-2A

100km x 100km

800 MB

So in order to minimize the amount of data you download, we can decide which of the granules covering the area of interest we want to download (depending on the attributes of each granule). Reducing these images also helps later on with the other steps (the less images we’re joining/reprojecting, the less time it will take).

p5 algo 1 1
Figure 1. 1 Input data
p5 algo 1 2
Figure 2. 2 Union of all polygons
p5 algo 1 3
Figure 3. 3 After removing redundant polygons

 

This algorithm goes through all polygons and adds them to union_poly only if they’re not already contained in union_poly (in other words, we’re only adding them to union_poly if they can increase the total area covered).

In the code below, ps is the set of polygons used as input.

1
2
3
4
5
6
7
8
9
union_poly = ps[0]
union_parts = [ps[0],]
for p in ps[1:]:
    common = union_poly.intersection(p)
    if p.area - common.area < 0.001:
        pass
    else:
        union_parts.append(p)
        union_poly = union_poly.union(p)

The area of interest

The area of interest is a polygon (you can draw one yourself on geojson.io) that describes the final area we want clipped from the granules. Here are two examples, one of Cluj-Napoca, and the other of Madrid.

p5 cluj aoi

p5 madrid aoi

Usage

After going through the setup described above, and defining an area of interest, you can run the entire algorithm like so:

python3.8 ./sentinel-toolbelt.py --auth_file auth.json --aoi_file ./sample_inputs/p12_madrid.geojson --dl_dir ./out/madrid/

You can also use the --start_date and --end_date switches to only use images taken from a certain period.

The file auth.json has the following structure:

{
    "user": ".....",
    "pass": "....."
}

and you can get a username/password by creating an account here.

Results

The dates at which the images were taken will matter, because even though the adjacent images will match (and you can see this by looking at rivers or roads continuing from one image to next one), the colors can differ a lot if the images were taken in different seasons (for example one in autumn and one during winter).

Here are results from intermediary and final steps (you can zoom-in a lot, they’re very large):

If you liked this article and would like to discuss more about custom software solutions feel free to get in touch at stefan.petrea@gmail.com.