by Jeff Siarto
This week, in collaboration with Geoscience Australia, we released our Sentinel-2 Cloud-Optimized GeoTIFF (COG) dataset on AWS Open Data. The collection contains all 11.4 million scenes from the Sentinel-2 Public Dataset, with every JPEG2000 (JP2K) file converted to a COG. The dataset is continuously updated to mirror the growth of the public Sentinel-2 data, and we’ve added SpatioTemporal Asset Catalog (STAC) metadata to the JSON file to make searching, discovering, and working with the data easier.
Sentinel-2 is an important platform for Earth observation and its imagery contributes to ongoing research in climate change, land use, emergency management, security, and a host of other geophysical systems. There are two Sentinel-2 satellites, Sentinel-2A and Sentinel-2B, both in sun-synchronous polar orbits, 180 degrees out of phase with one another. This configuration allows the system to cover the entire planet with a revisit time (the time it takes for a satellite to get back over a previously imaged area) of 5 days at the equator, and less at higher latitudes where adjacent swaths overlap. This means that Sentinel-2 imagery is great for detecting changes on the Earth’s surface (think: last week there were trees here, this week they’re gone) and is a key tool for evaluating geographic areas before and after natural disasters.
By making the Sentinel-2 archive more cloud- and analysis-friendly (via COGs and STAC), we’re making the data easier to use and (hopefully) making the lives of emergency managers, climate scientists, and mapmakers that much easier. This is a small step toward our big goal of doing work that benefits our world.
Why Cloud-Optimized GeoTIFFs (COGs)?
Sentinel-2 L1C and L2A products are distributed in JP2K format, which is not “cloud friendly.” JP2K files are smaller than their COG counterparts (lower storage costs), but partial reads over the internet require 2x the data transfer and almost 25x more GET requests (higher access costs). COGs have internal tiling and internal overviews, and can be read partially over the internet: you download and process only what you need. Reading a tile from a COG is at least 3 times faster than reading one from a JP2K file, which makes COGs much better suited to processing and analyzing data in the cloud.
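To make the "download only what you need" point concrete, here is a minimal back-of-the-envelope sketch that counts how many internal tiles a small pixel window touches. The numbers are assumptions for illustration, not properties confirmed for this dataset: a 10980 x 10980 pixel band, a hypothetical 512-pixel internal tile size, and an arbitrary 1000 x 1000 pixel area of interest.

```python
import math


def tiles_for_window(col_off, row_off, width, height, tile_size):
    """Return the inclusive tile index range (c0, c1, r0, r1) touched by a pixel window."""
    c0 = col_off // tile_size
    c1 = (col_off + width - 1) // tile_size
    r0 = row_off // tile_size
    r1 = (row_off + height - 1) // tile_size
    return c0, c1, r0, r1


def tiles_touched(image_size, tile_size, window):
    """Return (tiles touched by the window, total tiles in a square image)."""
    tiles_per_side = math.ceil(image_size / tile_size)
    c0, c1, r0, r1 = tiles_for_window(*window, tile_size)
    touched = (c1 - c0 + 1) * (r1 - r0 + 1)
    return touched, tiles_per_side ** 2


IMAGE_SIZE = 10980  # assumed band size in pixels
TILE_SIZE = 512     # hypothetical internal tile size; check the actual files
WINDOW = (4000, 4000, 1000, 1000)  # col_off, row_off, width, height (illustrative)

touched, total = tiles_touched(IMAGE_SIZE, TILE_SIZE, WINDOW)
print(f"{touched} of {total} tiles ({touched / total:.1%}) cover the window")
```

Under these assumed numbers the window touches 9 of 484 tiles, so a reader that fetches tiles by byte range transfers only a small fraction of the file. The real savings depend on the actual tile size, compression, and the window you request.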
More than 11.4 million Sentinel-2 scenes have been generated globally since Nov 1, 2016. Each scene has 17 GeoTIFF files, for a total of nearly 194 million COGs. Each scene takes approximately 8 minutes to process, and we’ve processed up to 40,000 scenes/hour using the AWS Spot market, which allows us to keep costs low.
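The figures above can be checked with some quick arithmetic. This is a rough sketch, and it rests on assumptions that are not stated in the post: each worker handles one scene at a time, and the peak rate is treated as if it could be sustained.

```python
import math

SCENES = 11_400_000            # scenes in the archive
FILES_PER_SCENE = 17           # GeoTIFF files per scene
MINUTES_PER_SCENE = 8          # approximate processing time per scene
PEAK_SCENES_PER_HOUR = 40_000  # peak throughput

# Total number of COG files in the archive.
total_cogs = SCENES * FILES_PER_SCENE

# Assumption: one worker processes one scene at a time.
scenes_per_worker_hour = 60 / MINUTES_PER_SCENE
workers_at_peak = math.ceil(PEAK_SCENES_PER_HOUR / scenes_per_worker_hour)

# Assumption: the peak rate is sustained for the whole archive.
hours_at_peak = SCENES / PEAK_SCENES_PER_HOUR

print(f"COG files:         {total_cogs:,}")
print(f"Workers at peak:   {workers_at_peak:,}")
print(f"Hours at peak:     {hours_at_peak:,.0f}")
```

Under those assumptions, the peak rate implies on the order of several thousand scenes in flight at once, which is the kind of bursty, interruptible workload that spare-capacity pricing suits.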
Why SpatioTemporal Asset Catalog (STAC)?
STAC aims to provide common metadata for exposing geospatial assets, simplifying the search and discovery of datasets that would otherwise each require different tools and APIs to access.
The goal of STAC is to enable a global index of all imagery, derived data products and alternative geospatial captures (LiDAR, SAR, Full Motion Video, Hyperspectral and beyond). STAC focuses on an easily implementable standard for organizations to expose their data in a persistent and reliable way.
We’ve even provided a public API called Earth-search, a central search catalog for all AWS Public Datasets that use STAC (including the new Sentinel-2 COGs).
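As an illustration of what a STAC search looks like, the sketch below builds an item-search request body with the `collections`, `bbox`, `datetime`, and `limit` fields from the STAC API specification and posts it to a `/search` endpoint. The collection ID, bounding box, and date range are assumptions chosen for the example, and `api_root` is a parameter you would fill in with the Earth-search root URL from its documentation.

```python
import json
import urllib.request


def build_search_body(collection, bbox, start, end, limit=10):
    """Build a STAC API item-search body for one collection, area, and time range."""
    return {
        "collections": [collection],
        "bbox": list(bbox),            # [west, south, east, north] in degrees
        "datetime": f"{start}/{end}",  # RFC 3339 interval
        "limit": limit,
    }


def search(api_root, body):
    """POST the body to <api_root>/search and return the matching STAC items."""
    request = urllib.request.Request(
        api_root.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["features"]


# Assumed collection ID and an illustrative area and month; adjust to the catalog.
body = build_search_body(
    collection="sentinel-s2-l2a-cogs",
    bbox=(148.9, -35.5, 149.4, -35.1),
    start="2020-01-01T00:00:00Z",
    end="2020-01-31T23:59:59Z",
)
print(json.dumps(body, indent=2))
```

Calling `search(api_root, body)` returns a list of GeoJSON features. In a STAC catalog, each item's `assets` dictionary holds links to the underlying files, which here would be the per-band COGs.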
We are big supporters of the AWS Open Data program and continue to invest in the future of cloud-accessible and analysis-ready geospatial data as a means of reducing time-to-science and making Earth observation data more broadly accessible to the scientific and software development communities.
Need Help Getting Started?
- Intake-STAC with sat-search and Sentinel-2 COGs (Jupyter Notebook)
- Sentinel-2 COGs on AWS Open Data
- Earth Search Overview
Tell Us About Your Project
We’re always interested in hearing about projects using the Sentinel-2 COG Archive or any of our open source tools for search, discovery, and processing. If you’re interested in working with our team or just want to let us know about your awesome geospatial project, please get in touch.