GreenHyperSpectra: Self-Supervised Learning for Global Vegetation Monitoring

📌 Overview

Created GreenHyperSpectra, a large-scale multi-source hyperspectral pretraining dataset designed to enable label-efficient machine learning for global vegetation trait prediction. Developed self-supervised and semi-supervised learning methods that overcome critical challenges of label scarcity and domain shifts across sensors, ecosystems, and geographical regions.

Type: PhD Research Project

Duration: February 2024 – December 2025

Institution: Mila - Quebec AI Institute, ScaDS.AI Leipzig

Status: Published at NeurIPS 2025


Technical Highlights:

  • Implemented self-supervised learning framework (MAE) using PyTorch for robust spectral representations
  • Compiled multi-source dataset spanning diverse sensors, ecosystems, and geographical regions for pretraining foundation models
  • Developed semi-supervised approaches leveraging both labeled and unlabeled samples to improve label efficiency
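
To illustrate the masked-autoencoder (MAE) idea named in the first highlight, here is a minimal PyTorch sketch of its two core steps on 1D spectra: randomly masking spectral patches, and scoring a reconstruction only on the masked patches. It is not the project's implementation. The function names (`random_patch_mask`, `masked_mse`), the patch size, the mask ratio, and the input shapes are all assumptions made for the example, and a real model would put an encoder and decoder between the two steps.

```python
import torch


def random_patch_mask(spectra, patch_size, mask_ratio, generator=None):
    """Split 1D spectra into patches and pick a random subset to mask.

    spectra: (batch, bands) tensor; bands must be divisible by patch_size.
    Returns (patches, mask): patches is (batch, n_patches, patch_size),
    mask is a bool tensor of shape (batch, n_patches), True = masked.
    Hypothetical helper, not taken from the project code.
    """
    batch, bands = spectra.shape
    if bands % patch_size != 0:
        raise ValueError("bands must be divisible by patch_size")
    n_patches = bands // patch_size
    patches = spectra.reshape(batch, n_patches, patch_size)
    n_masked = int(round(n_patches * mask_ratio))
    # Rank random noise per sample; the n_masked lowest-ranked patches are masked.
    noise = torch.rand(batch, n_patches, generator=generator)
    ranks = noise.argsort(dim=1).argsort(dim=1)
    mask = ranks < n_masked
    return patches, mask


def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only."""
    per_patch = ((pred - target) ** 2).mean(dim=-1)  # (batch, n_patches)
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)


# Usage with synthetic data (assumed sizes: 4 spectra, 64 bands each).
g = torch.Generator().manual_seed(0)
spectra = torch.rand(4, 64, generator=g)
patches, mask = random_patch_mask(spectra, patch_size=8, mask_ratio=0.75, generator=g)
pred = torch.zeros_like(patches)  # stand-in for a decoder's reconstruction
loss = masked_mse(pred, patches, mask)
```

Restricting the loss to masked patches means the model gets no credit for copying the bands it was shown, so it has to infer the hidden parts of the spectrum from the visible ones.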

Technologies: PyTorch, Weights & Biases, Google Earth Engine, QGIS, Python, HPC Systems

Impact:

  • Publication: Accepted and presented at NeurIPS 2025
  • Dataset Release: Publicly available benchmark for vegetation trait prediction research

Links: