GreenHyperSpectra: Self-Supervised Learning for Global Vegetation Monitoring
Overview
Created GreenHyperSpectra, a large-scale multi-source hyperspectral pretraining dataset designed to enable label-efficient machine learning for global vegetation trait prediction. Developed self-supervised and semi-supervised learning methods that overcome critical challenges of label scarcity and domain shifts across sensors, ecosystems, and geographical regions.
Type: PhD Research Project
Duration: February 2024 – December 2025
Institution: Mila - Quebec AI Institute, ScaDS.AI Leipzig
Status: Published at NeurIPS 2025
Technical Highlights:
- Implemented a masked autoencoder (MAE) self-supervised learning framework in PyTorch to learn robust spectral representations
- Compiled multi-source dataset spanning diverse sensors, ecosystems, and geographical regions for pretraining foundation models
- Developed semi-supervised approaches leveraging both labeled and unlabeled samples to improve label efficiency
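As an illustration of the masked-autoencoder idea named above, the following minimal sketch shows the two steps that do not depend on the network itself: randomly masking patches of a spectrum, and scoring a reconstruction only on the masked patches. It is not the project's implementation. The function names, the 200-band toy spectrum, the patch size of 10, and the 0.75 mask ratio are all assumptions chosen for illustration, and the code is kept in plain Python so it runs without PyTorch; in a PyTorch setup the same steps would be tensor operations around an encoder and decoder.

```python
import random


def patchify(spectrum, patch_size):
    """Split a 1-D spectrum into non-overlapping patches of consecutive bands."""
    if len(spectrum) % patch_size != 0:
        raise ValueError("spectrum length must be divisible by patch_size")
    return [spectrum[i:i + patch_size] for i in range(0, len(spectrum), patch_size)]


def random_mask(num_patches, mask_ratio, rng):
    """Pick which patches are hidden; return (visible_idx, masked_idx), both sorted."""
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked_idx = sorted(order[:num_masked])
    visible_idx = sorted(order[num_masked:])
    return visible_idx, masked_idx


def masked_mse(target_patches, predicted_patches, masked_idx):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for t, p in zip(target_patches[i], predicted_patches[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


# Toy example (all values hypothetical): a synthetic 200-band reflectance curve.
rng = random.Random(0)
spectrum = [0.01 * band for band in range(200)]
patches = patchify(spectrum, patch_size=10)
visible_idx, masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Stand-in for a decoder output: an all-zero reconstruction of every patch.
prediction = [[0.0] * 10 for _ in patches]
loss = masked_mse(patches, prediction, masked_idx)
```

In a full pipeline, only the patches in `visible_idx` would be passed to the encoder, and the decoder's output would replace the all-zero `prediction`. Restricting the loss to masked patches means the model is rewarded for inferring hidden bands from context rather than for copying its input.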
Technologies: PyTorch, Weights & Biases, Google Earth Engine, QGIS, Python, HPC Systems
Impact:
- Publication: Accepted and presented at NeurIPS 2025
- Dataset Release: Publicly available benchmark for vegetation trait prediction research
Links:
