MPhil/ACS projects
Latest revision as of 11:58, 1 October 2025
Global Carbon Mapping Using TESSERA
We are seeking an MPhil student in Computer Science to undertake an exciting Master's project on global carbon mapping using TESSERA (Temporal Embeddings of Surface Spectra for Earth Representation and Analysis). This project offers a unique opportunity to work with a cutting-edge foundation model that provides unprecedented access to global Earth observation data at 10-meter resolution, enabling robust carbon accounting and ecosystem analysis at scale. TESSERA (https://svr-sk818-web.cl.cam.ac.uk/tessera/) is an open-source foundation model that uses self-supervised learning to distil petabytes of satellite time series into 128-dimensional spectral-temporal representations, preserving critical phenological signals that conventional approaches lose. Your work will contribute to pressing climate challenges by developing novel methods for carbon stock estimation and land-use change detection, leveraging TESSERA's analysis-ready outputs to generate actionable insights for conservation and sustainable land management. This project combines state-of-the-art machine learning with real-world environmental impact, offering excellent opportunities for publication and collaboration with leading researchers in remote sensing and climate science. If you are passionate about applying computational methods to address the climate crisis and want to work with globally scalable, privacy-preserving technology that enables reproducible science, we encourage you to apply.
LLMs to Align Habitat Names Within Consistent Hierarchies
We are seeking an MPhil student in Computer Science to develop innovative applications of large language models (LLMs) for aligning local and regional habitat names to the IUCN habitat classification scheme (https://www.iucnredlist.org/resources/habitat-classification-scheme). This project addresses a critical challenge in biodiversity conservation: ecological surveys and conservation assessments worldwide use diverse, context-specific habitat nomenclatures that are difficult to reconcile with standardized frameworks, hindering global-scale analysis and Red List assessments. The IUCN habitat classification provides a standardized three-level hierarchy using familiar habitat terms that account for biogeography, latitudinal zonation, and depth in marine systems, but mapping heterogeneous local terminologies to this framework remains labour-intensive and inconsistent. Your work will explore how modern LLMs can intelligently parse habitat descriptions, understand ecological context, and propose robust alignments to the IUCN hierarchy, potentially incorporating few-shot learning, semantic reasoning, and uncertainty quantification. This project offers substantial impact for conservation practice by enabling automated, transparent crosswalks between local habitat classifications and global standards, facilitating species Area of Habitat assessments and supporting evidence-based conservation decisions. If you are excited about applying natural language processing to biodiversity informatics and want to bridge the gap between computational linguistics and ecological science, this project provides an excellent opportunity to develop novel methods with immediate real-world applications in global conservation efforts.
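As a rough illustration of the alignment and uncertainty-handling step described above, the sketch below checks a model's proposed IUCN code against the hierarchy before accepting it. It does not call any LLM: the proposals, the self-reported confidence scores, the 0.7 threshold, and the function names are all hypothetical, and the four hierarchy entries are a hand-copied excerpt that should be verified against the official scheme.

```python
# Small hand-copied excerpt of the IUCN habitat hierarchy (assumption:
# verify codes and names against the official scheme before real use).
IUCN_EXCERPT = {
    "1": "Forest",
    "1.4": "Forest - Temperate",
    "4": "Grassland",
    "4.4": "Grassland - Temperate",
}


def parent_chain(code):
    """Return the code and all its ancestors, e.g. '1.4' -> ['1', '1.4']."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def check_alignment(local_name, proposed_code, confidence, min_confidence=0.7):
    """Classify one proposed alignment as accepted, needs_review, or rejected.

    `confidence` stands in for whatever uncertainty estimate the method
    produces; the threshold is an arbitrary illustrative value.
    """
    chain = parent_chain(proposed_code)
    unknown = [c for c in chain if c not in IUCN_EXCERPT]
    if unknown:
        # The model proposed a code that is not in the hierarchy at all.
        status = "rejected"
    elif confidence < min_confidence:
        # Valid code, but too uncertain to accept without a human check.
        status = "needs_review"
    else:
        status = "accepted"
    return {
        "local_name": local_name,
        "code": proposed_code,
        "path": [IUCN_EXCERPT[c] for c in chain if c in IUCN_EXCERPT],
        "status": status,
    }


# Hypothetical model outputs: (local habitat name, proposed code, confidence).
proposals = [
    ("lowland beech woodland", "1.4", 0.92),
    ("chalk downland", "4.4", 0.55),
    ("machair", "4.99", 0.80),  # deliberately invalid code
]
results = [check_alignment(*p) for p in proposals]
for r in results:
    print(r["local_name"], "->", r["code"], r["status"], r["path"])
```

Validating every proposed code against the hierarchy keeps hallucinated categories out of the crosswalk, and routing low-confidence cases to a reviewer keeps the process transparent.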
Creating a 10m Scale Climatology Dataset from a Foundation Model
This project proposes to develop the first 10-meter resolution climatology dataset for the United Kingdom using self-supervised learning (SSL) features derived from satellite imagery. By using precomputed embeddings from geospatial foundation models and weather station data, we can create climate maps with unprecedented spatial detail for applications in conservation, agriculture, and climate adaptation planning.
Climatology data is foundational for applied research across ecology, agriculture, conservation biology, and climate science. However, existing state-of-the-art datasets are limited in spatial resolution:
- WorldClim: Global climate data at ~1km resolution, widely used but spatially coarse
- CHELSA: Climate data at ~1km resolution with improved modeling for high elevations
These coarse-resolution data average out the microclimate variation that drives ecological processes, species distributions, and agricultural productivity. For example, temperature differences between north-facing and south-facing slopes are invisible in current datasets, because a single 1km WorldClim grid cell may cover both slopes and report only one value for them.
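The arithmetic behind this loss of detail can be sketched in a few lines. A 1km cell contains 100 x 100 = 10,000 pixels at 10m resolution, and reporting one value per cell collapses them to their mean. The slope temperatures below are invented for illustration, not measurements.

```python
from statistics import mean

FINE_RES_M = 10       # pixel size of the satellite-derived grid
COARSE_RES_M = 1000   # approximate WorldClim / CHELSA cell size
PIXELS_PER_SIDE = COARSE_RES_M // FINE_RES_M  # 100 pixels along each edge


def make_cell(north_facing_c, south_facing_c):
    """Build one 1km cell: half north-facing pixels, half south-facing."""
    half = PIXELS_PER_SIDE // 2
    return [
        [north_facing_c if row < half else south_facing_c
         for _ in range(PIXELS_PER_SIDE)]
        for row in range(PIXELS_PER_SIDE)
    ]


# Illustrative values only: a cooler north-facing and warmer south-facing slope.
cell = make_cell(north_facing_c=8.0, south_facing_c=11.0)
values = [v for row in cell for v in row]

n_pixels = len(values)                  # fine pixels inside one coarse cell
cell_mean = mean(values)                # the single value a 1km product reports
cell_range = max(values) - min(values)  # variation that the mean hides

print(f"{n_pixels} pixels, coarse value {cell_mean} C, hidden range {cell_range} C")
```

In this toy cell the coarse product would report a temperature that occurs on neither slope.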
Recent advances in satellite remote sensing and machine learning offer new possibilities for climate mapping. The Sentinel-1 and Sentinel-2 satellites provide global coverage at 10m resolution with regular revisit cycles. Geospatial foundation models (e.g. TESSERA, https://github.com/ucam-eo/tessera) can extract environmental information from this imagery, using self-supervised learning to learn rich representations without requiring labeled training data.
The insight is that satellite imagery captures environmental conditions (topography, vegetation, land cover, seasonal dynamics) that directly influence local climate. SSL features from satellite time series should therefore correlate strongly with climate variables measured at weather stations.
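One simple way to test this idea is a linear probe: fit a regularised linear model from the embedding at each weather station to the temperature observed there, then apply it to the embedding of any other pixel. The sketch below is an assumption-laden illustration, not the project's method. It uses synthetic 4-dimensional vectors in place of real 128-dimensional embeddings, invented "true" weights, and a small ridge solver written in pure Python so that it runs without dependencies.

```python
import random

EMBED_DIM = 4  # stand-in for the real 128-dimensional embeddings


def ridge_fit(X, y, alpha=1e-3):
    """Solve (X^T X + alpha*I) w = X^T y by Gaussian elimination.

    Each row of X ends with a constant 1.0 acting as the bias term; for
    brevity the bias is penalised along with the other weights.
    """
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    for col in range(d):  # forward elimination with partial pivoting
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w, x):
    """Linear prediction for one feature vector (bias column included)."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Synthetic "stations": temperature is a noisy linear function of the embedding.
rng = random.Random(42)
TRUE_W = [1.5, -0.5, 0.0, 2.0]   # invented for this example
TRUE_BIAS = 9.0                  # invented mean temperature in deg C


def synthetic_station():
    emb = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    temp = TRUE_BIAS + sum(w * e for w, e in zip(TRUE_W, emb)) + rng.gauss(0.0, 0.1)
    return emb, temp


stations = [synthetic_station() for _ in range(60)]
X = [emb + [1.0] for emb, _ in stations]
y = [temp for _, temp in stations]
weights = ridge_fit(X, y)

# Apply the fitted probe to a pixel that has no weather station.
pixel_embedding = [0.2, -1.0, 0.5, 0.1]
pixel_temp = predict(weights, pixel_embedding + [1.0])
print("fitted weights:", [round(w, 2) for w in weights])
print("predicted temperature at pixel:", round(pixel_temp, 2))
```

With real data the relationship is unlikely to be purely linear, so a probe like this is best read as a baseline against which richer models can be compared.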
The primary objective is to develop 10-meter resolution temperature and precipitation maps for the United Kingdom by combining SSL features derived from satellite imagery with weather station data.
Secondary objectives may include:
- Quantitatively compare the accuracy of SSL-based climate maps against existing climate data
- Analyze the spatial patterns of microclimate variation revealed at 10m resolution
- Demonstrate applications for conservation planning, agricultural modeling, and climate adaptation
- Establish a methodological framework for scaling to other regions or global coverage
Data Sources
Satellite Data: Pre-computed SSL embeddings from Sentinel-1 and Sentinel-2 (2017-2024) accessed via GeoTessera API (https://github.com/ucam-eo/geotessera). These embeddings capture temporal dynamics and environmental patterns at 10m spatial resolution.
Climate Data: weather station measurements including:
- Daily/monthly temperature (minimum, maximum, average)
- Precipitation totals
- Other variables as available (humidity, wind speed)
Station data sources:
- GHCN: https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily
- Met Office: https://www.metoffice.gov.uk/research/climate/maps-and-data/historic-station-data
- ECAD: https://www.ecad.eu/
Validation Data: Independent weather stations held out from training for accuracy testing.
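A minimal sketch of such a hold-out, assuming records are tuples of (station ID, month, temperature): the split is made by station rather than by record, so every observation from a held-out station stays out of training. The station IDs, the 20% fraction, the seed, and the helper names are illustrative choices, not part of the project specification.

```python
import math
import random


def split_stations(station_ids, held_out_fraction=0.2, seed=0):
    """Split unique station IDs into (train_ids, test_ids) sets."""
    ids = sorted(set(station_ids))
    random.Random(seed).shuffle(ids)  # seeded, so the split is reproducible
    n_test = max(1, round(len(ids) * held_out_fraction))
    return set(ids[n_test:]), set(ids[:n_test])


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def mae(predicted, observed):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Synthetic records: 10 stations x 12 months of made-up temperatures.
records = [
    (f"station_{i:02d}", month, 5.0 + 0.5 * i + month)
    for i in range(10)
    for month in range(1, 13)
]

train_ids, test_ids = split_stations(r[0] for r in records)
train_records = [r for r in records if r[0] in train_ids]
test_records = [r for r in records if r[0] in test_ids]

print(len(train_records), "training records,", len(test_records), "held-out records")
print("held-out stations:", sorted(test_ids))
```

Splitting by record instead would let the same station appear on both sides and overstate accuracy. Stations that sit close together can cause a milder version of the same leakage, so a spatially blocked split may be worth considering.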