TESSERA Foundation Model Project
Introduction
Background and motivation
Earth Observation (EO) from satellites has already generated petabytes of data in the last few decades. The rate of growth of observation data is set to increase as more satellites are launched due to rapidly declining launch costs. Most observation data are freely available. Moreover, due to the repeat patterns of satellite orbits, observation data form a time series for each observed location on Earth. Careful analysis and interpretation of these time series can help address critical issues including biodiversity monitoring, identifying land use and land cover, choosing optimal crop management strategies, and quantifying forest degradation and deforestation.
Problem
Despite the abundance and free availability of EO data, much of it is “contaminated,” especially in the optical domain, by cloud cover, sensor-specific observation biases, and non-uniform temporal sampling due to the inherent nature of satellite orbital patterns.
State of the art
Existing approaches for handling contaminated observation time series fall into three categories. The first is multi-temporal compositing, which aggregates data collected over a given time period so that regions obscured by clouds during one satellite pass are filled in by an earlier or later observation. This approach is effective, but, crucially, it loses the fine-grained temporal signal embedded in the data. For example, a three-month composite of crop observations would lose much of the reflectance signal induced by crop growth. Similarly, composited forest observations lose valuable phenological information.
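The trade-off described above can be illustrated with a small sketch on synthetic data. Everything here is an assumption made for illustration: the NDVI-like green-up curve, the observation dates, and the positions of the cloudy observations are invented, and the median is only one of several possible compositing rules.

```python
import numpy as np

# Hypothetical single-pixel time series: 18 observations over ~3 months,
# following a smooth green-up curve (NDVI-like values, illustrative only).
days = np.arange(0, 90, 5)
signal = 0.2 + 0.6 / (1.0 + np.exp(-(days - 45) / 8.0))

# Mark a fixed, arbitrary set of observations as cloud-contaminated (NaN).
cloudy = np.zeros(days.size, dtype=bool)
cloudy[[1, 2, 5, 9, 10, 14]] = True
observed = np.where(cloudy, np.nan, signal)

# Median composite over the whole window: one gap-free value per pixel.
composite = np.nanmedian(observed)

clear = ~np.isnan(observed)
print(f"clear observations: {clear.sum()} of {days.size}")
print(f"composite value:    {composite:.3f}")
print(f"range in clear obs: {np.nanmin(observed):.3f} to {np.nanmax(observed):.3f}")
```

The composite is free of gaps, but the whole growth curve has been reduced to a single number, which is the loss of temporal signal noted above.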
The second category comprises multi-spectral techniques, which extract information from weak signals to restore what is missing. However, they work best when the optical signals are only partially affected by clouds.
The third category is in-painting of cloud-obscured patches; Section II.B of the PLFM paper surveys these approaches and describes where they fall short. The most sophisticated in-painting approach in the literature appears to be PLFM, which uses cloud-penetrating microwave radar data collected by Sentinel-1, together with temporal-sequence blending, to remove clouds from Sentinel-2 optical images. However, a radar sensor cannot provide information on parameters such as chlorophyll absorption, which are of prime importance in the optical signal, because these parameters do not affect the measured microwave backscattering coefficient. Moreover, Sentinel-1 captures only very coarse structures and no biochemicals except water. Finally, the goal of PLFM seems to be to produce visually plausible outputs rather than exact ones, whereas exact outputs are what we aim for.
Our approach: Barlow Twins SSL
The key idea in our work is to represent the time series of multi-spectral reflectance patterns from a grid cell (pixel) with a single numerical ‘representation’ that is derived using a self-supervised learning (SSL) algorithm. SSL methods extract meaningful representations of input data by optimising a surrogate objective. Supervised learning approaches require labelled ‘ground truth’ data created by human experts, a process that is time-consuming, expensive, and error-prone; SSL needs no labels. Moreover, SSL representations can typically be transferred more easily across time and space. The extracted representations can be used directly for downstream tasks: although this does not hold in general, for data that are inherently ‘sparse’ the representations implicitly encode a multi-class categorisation or clustering of the input data. Alternatively, the semantic meaning associated with the representations can be discovered by training a classifier, such as a random forest, with only a small amount of labelled data.
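As a rough sketch of the second route, the snippet below trains a random forest on a handful of labelled representations and scores it on the rest. It assumes scikit-learn is available. The "representations" are synthetic Gaussian blobs standing in for real per-pixel embeddings; the class count, the dimensionality, and the number of labels are hypothetical choices, not values from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in for per-pixel representations: 3 hypothetical classes,
# each a Gaussian blob in a 16-dimensional space (not real embeddings).
n_classes, dim, n_per_class = 3, 16, 200
centres = rng.normal(scale=3.0, size=(n_classes, dim))
X = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centres])
y = np.repeat(np.arange(n_classes), n_per_class)

# Label only a small subset (10 pixels per class) for training.
labelled = np.concatenate(
    [rng.choice(np.flatnonzero(y == k), size=10, replace=False)
     for k in range(n_classes)]
)
unlabelled = np.setdiff1d(np.arange(y.size), labelled)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[labelled], y[labelled])

# Score on every pixel that was not used for training.
accuracy = clf.score(X[unlabelled], y[unlabelled])
print(f"held-out accuracy with {labelled.size} labels: {accuracy:.2f}")
```

The blobs are deliberately well separated, so the score says nothing about real EO data; the point is only the workflow of fitting a small classifier on top of fixed representations.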
The specific SSL approach we use is Barlow Twins (BT), a non-contrastive method. Our surrogate task is to ensure that different augmentations of the same input (e.g., different cropped versions of the input signal) lead to the same representation, while the individual components of the representation, measured across a data batch, are as uncorrelated with one another as possible. Like other SSL approaches, Barlow Twins does not need labelled data to create a foundation model. We have recently shown that categorised BT representations achieve high accuracy in crop classification: by training a random forest classifier on a small amount of ground truth data, we can assign representations to crop types with high accuracy. Mantle Labs has also used the representations for quantification of Above Ground Biomass (AGB, in t/ha), land cover mapping, tree species identification, crop type mapping, pasture quality assessment, and selection of counterfactuals, and found them to be of good quality.
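The surrogate objective can be sketched in a few lines of NumPy. This is an illustrative version of the published Barlow Twins loss, not this project's training code: the function name, the weight `lam` (set here to the 5e-3 used in the original Barlow Twins paper), and the batch shapes are assumptions, and a real implementation would run inside an autodiff framework.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-12):
    """Barlow Twins objective for two batches of embeddings, shape (N, D)."""
    n = z_a.shape[0]
    # Standardise each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diagonal(c)
    # Invariance term: matching dimensions should correlate perfectly.
    on_diag = np.sum((1.0 - diag) ** 2)
    # Redundancy-reduction term: different dimensions should not correlate.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))  # hypothetical embeddings of one augmented view
loss_same = barlow_twins_loss(z, z)  # second view identical to the first
loss_diff = barlow_twins_loss(z, rng.normal(size=(256, 8)))  # unrelated view
print(f"identical views: {loss_same:.4f}")
print(f"unrelated views: {loss_diff:.4f}")
```

The loss is close to zero when the two views agree and their dimensions are decorrelated, and it grows when the views disagree, which is the behaviour the surrogate task relies on.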