SDO-FM: A foundation model for the Sun


SDO-FM is a prototype foundation model that integrates data from SDO's HMI and AIA instruments to encapsulate the Sun's dynamics within an embedding space. Encapsulating the Sun's complex physical interactions in a multi-modal model unlocks the potential for many downstream investigations, lowering the complexity and cost of machine learning in solar physics. A small fine-tuned adapter can be built for a fraction of the cost of building an AI model with classical methods.

SDO-FM consists of four key components: an ingestion pipeline to create machine learning ready datasets, model architecture and training, embeddings and fine-tunable models, and downstream fine-tuned applications. SDO-FM utilizes existing datasets (SDOMLv2) and well defined downstream tasks.

The engineering focused on adapting two model architectures to the SDO dataset, and building a benchmarking harness for scientific validation. Training was engineered for a TPU-distributed approach, designed to be compute-agnostic and aligned with industry best practices.

SDO-FM was built using a science-led approach and active involvement of the scientific community. Validation tasks for the prototype as well as future proposed validation tasks were chosen through a series of workshops and feedback from heliophysics researchers, teams building other foundation models in weather prediction, climate science, and earth sciences, and SDO instrument experts. This collaborative approach will help us optimize the model for useful downstream applications and adapt to emerging tools and methodologies.

Downstream validation tasks

The model is validated by adapting the embeddings and comparing outcomes against published results from classical machine learning methods. The four validation tasks were: predicting the F10.7 solar radio flux (a proxy for the Earth's thermospheric density), reconstructing missing channels, autocalibration of the AIA instrument, and virtualization of the broken MEGS-A instrument (Virtual EVE).

The project will also investigate whether the foundation model can replicate or leverage SDOML (the data product developed in FDL.AI).

Repo structure

├── assets              # images for this readme
├── experiments         # configuration files for different trials 
├── notebooks           # visualisation/testing ipynb
├── scripts             # entrypoint and highest level executors
├── sdofm               # python package
│   ├── ablation        # models without backbone integration
│   ├── benchmarks      # metrics for comparison
│   ├── datasets        # dataloaders/modules
│   ├── finetuning      # modules for finetuning
│   ├── models          # model components 
│   ├── pretraining     # modules for pretraining
└── └── visualisation   # various graphing utilities

Datasets

NASA's Solar Dynamics Observatory (SDO), Pesnell et al. 2012. Three instruments:
  • Atmospheric Imaging Assembly (AIA): 2 ultraviolet channels (1600 & 1700 Å) and 7 extreme-ultraviolet channels (94, 131, 171, 193, 211, 304, and 335 Å).
  • Helioseismic and Magnetic Imager (HMI): visible filtergrams processed into photospheric Dopplergrams, line-of-sight magnetograms, and vector magnetograms.
  • EUV Variability Experiment (EVE): EUV spectral irradiance from 1 to 1050 Å. The MEGS spectrographs disperse EUV light from the full disk of the Sun and corona onto a charge-coupled device.

Granularity & source:
  • 4096 x 4096 at 12-second cadence: AIA (Lemen et al. 2012) and HMI (Hoeksema et al. 2014).
  • 1024 x 2048: EVE (Woods et al. 2012).
  • Downsampled for machine learning to 512 x 512 at 0.6 (AIA) / 0.5 (HMI) arcsec per pixel, at 6-minute (AIA) and 12-minute (HMI) cadence: Galvez et al. 2019 via sdoml.org.
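The resolution reduction from full-disk 4096 x 4096 frames to the 512 x 512 machine-learning-ready product can be illustrated with simple block-averaging. This is an illustrative sketch only; the actual SDOML pipeline (Galvez et al. 2019) applies instrument-specific corrections and calibration on top of spatial reduction.

```python
import numpy as np

def block_downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Downsample an image by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# A full-resolution 4096x4096 frame reduced 8x to 512x512,
# the resolution of the machine-learning-ready SDOML dataset.
frame = np.random.default_rng(0).random((4096, 4096), dtype=np.float32)
small = block_downsample(frame, 8)
print(small.shape)  # (512, 512)
```

Block-averaging preserves the mean intensity of each frame, which matters when downstream tasks regress against integrated quantities such as irradiance.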

Models

Backbones

• Masked Autoencoders Are Scalable Vision Learners. He, Kaiming, et al. "Masked autoencoders are scalable vision learners." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022 (link)
• Foundation Models for Generalist Geospatial Artificial Intelligence (Prithvi). Jakubik, Johannes, et al. "Foundation models for generalist geospatial artificial intelligence." arXiv preprint arXiv:2310.18660 (2023) (link)
• NVAE: A Deep Hierarchical Variational Autoencoder. Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." Advances in Neural Information Processing Systems 33 (2020): 19667-19679 (link)
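The masked-autoencoder backbones above pretrain by hiding most image patches and reconstructing them from the visible remainder. A minimal NumPy sketch of the random patch-masking step (independent of the actual SDO-FM implementation, which operates on multi-channel SDO images):

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Return a boolean mask: True = patch hidden from the encoder (MAE-style)."""
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[perm[:num_masked]] = True
    return mask

# A 512x512 image cut into 16x16 patches yields (512 // 16) ** 2 = 1024 patches;
# MAE typically masks ~75% of them, so the encoder sees only 256.
rng = np.random.default_rng(0)
mask = random_patch_mask(1024, 0.75, rng)
print(int(mask.sum()), int((~mask).sum()))  # 768 256
```

Because the encoder only processes the visible 25% of patches, this masking scheme is what makes MAE-style pretraining affordable at full-disk solar resolutions.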

Heads

• Multichannel autocalibration for the Atmospheric Imaging Assembly using machine learning. Dos Santos, Luiz FG, et al. "Multichannel autocalibration for the Atmospheric Imaging Assembly using machine learning." Astronomy & Astrophysics 648 (2021): A53 (link)
• Virtual EVE: a Deep Learning Model for Solar Irradiance Prediction. Indaco, Manuel, et al. "Virtual EVE: a Deep Learning Model for Solar Irradiance Prediction." (2023) (link)
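Fine-tuned heads like these attach a lightweight predictor to frozen backbone embeddings. As an illustration of the general idea only (the data here are synthetic and the actual SDO-FM heads are neural networks, not least-squares fits), a linear probe on frozen embeddings can be sketched as:

```python
import numpy as np

# Illustrative linear probe: fit a regression head on frozen embeddings.
# In SDO-FM the embeddings would come from the pretrained backbone and the
# target from a downstream task (e.g. an irradiance measurement).
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))            # 200 samples of 64-dim embeddings
true_w = rng.normal(size=64)                # synthetic ground-truth mapping
target = emb @ true_w + 0.01 * rng.normal(size=200)

w, *_ = np.linalg.lstsq(emb, target, rcond=None)  # fit the linear head
pred = emb @ w
print(bool(np.allclose(pred, target, atol=0.1)))  # True
```

If a cheap probe like this already recovers a published baseline, the embeddings demonstrably carry the task-relevant information, which is the point of the downstream validation tasks.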

Setup

Installation

SDO-FM can be installed locally by installing the package in this repository directly:

pip install -e .

Usage

All tasks are assumed to run inside a container built from the image described in the Dockerfile, using the Hydra configurations kept in the experiments directory. The entry point is main.py, and its arguments select a configuration:

python scripts/main.py --config-name=default

CLI overrides are still possible with this selection, but be aware that some shells do not escape quotes or square brackets:

python scripts/main.py --config-name=default experiment.seed=37

Pre-training

python scripts/main.py --config-name=pretrain_tiny

Fine-tuning

Science objective 1: Dimming

python scripts/main.py --config-name=dimming_tiny

Science objective 2: TBD

Science objective 3: TBD

Development

Training a model in GCP Vertex

To train a model in the cloud, start by branching off main and preparing your model and experiment configuration.

git checkout -b <NEW_EXPERIMENT>
cd SDO-FM
cp experiments/default.yaml experiments/<NEW_EXPERIMENT_CONFIG>.yaml

Once ready, commit your changes and tag the commit with a version number of the format v*.*.*, e.g. v0.1.2:

git add .
git commit -m "Added <NEW EXPERIMENT>"
git tag v0.1.2
git push -u origin <NEW_EXPERIMENT>
git push --tags

This will trigger an action to build this repository state in Google Cloud Build; it takes around 10 mins 🍵. Once completed, the build will be available in W&B as a job here. To run it, select "Launch" and define your overrides, e.g.:

{
    "args": [
        "--config-name=<NEW_EXPERIMENT_CONFIG>",
        "experiment_name=<NEW_EXPERIMENT_RUN>"
    ],
    "run_config": {},
    "entry_point": ["/src/scripts/main.py"]
}

Set your compute resources in accordance with this table, e.g.:

MACHINE_TYPE: a2-ultragpu-8g
ACCELERATOR_TYPE: NVIDIA_A100_80GB
ACCELERATOR_COUNT: 8

Citation

@software{SDOFM_2024,
    title           = {{Solar Dynamics Observatory Foundation Model}},
    repository-code = {https://github.com/spaceml-org/SDO-FM},
    year            = {2024}
}