Skip to content

Commit

Permalink
feat: add sparse PCA using variable projection (#170)
Browse files Browse the repository at this point in the history
Publication of this method is available at http://arxiv.org/abs/1804.00341. Associated repository with Python code can be found here: https:/erichson/ristretto/tree/master/ristretto
  • Loading branch information
nicrie authored Jul 18, 2024
1 parent a2669b8 commit e2ccc76
Show file tree
Hide file tree
Showing 25 changed files with 1,864 additions and 160 deletions.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,7 @@ For questions or support, please open a [Github issue](https:/xarray
- MCA: Python package [xMCA](https:/Yefee/xMCA) by Yefee
- CCA: Python package [CCA-Zoo](https:/jameschapman19/cca_zoo) by James Chapman
- ROCK-PCA: Matlab implementation by [Diego Bueso](https:/DiegoBueso/ROCK-PCA)
- Sparse PCA: Based on [Ristretto](https:/erichson/ristretto) library by Benjamin Erichson

## How to cite?

Expand Down
40 changes: 40 additions & 0 deletions docs/_autosummary/xeofs.models.SparsePCA.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
xeofs.models.SparsePCA
======================

.. currentmodule:: xeofs.models

.. autoclass:: SparsePCA
:members:
:show-inheritance:
:inherited-members:


.. automethod:: __init__


.. rubric:: Methods

.. autosummary::

~SparsePCA.__init__
~SparsePCA.components
~SparsePCA.compute
~SparsePCA.deserialize
~SparsePCA.explained_variance
~SparsePCA.explained_variance_ratio
~SparsePCA.fit
~SparsePCA.fit_transform
~SparsePCA.get_params
~SparsePCA.get_serialization_attrs
~SparsePCA.inverse_transform
~SparsePCA.load
~SparsePCA.save
~SparsePCA.scores
~SparsePCA.serialize
~SparsePCA.transform






1 change: 1 addition & 0 deletions docs/api_1_intra.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,5 +10,6 @@ Single-Set Analysis
xeofs.models.ComplexEOF
xeofs.models.ComplexEOFRotator
xeofs.models.ExtendedEOF
xeofs.models.SparsePCA
xeofs.models.OPA
xeofs.models.GWPCA
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
4 changes: 2 additions & 2 deletions docs/auto_examples/1single/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@

.. raw:: html

<div class="sphx-glr-thumbcontainer" tooltip="EOF analysis in S-mode maximises the temporal variance.">
<div class="sphx-glr-thumbcontainer" tooltip="This example demonstrates the application of sparse PCA [1]_ to sea surface temperature data. S...">

.. only:: html

Expand All @@ -76,7 +76,7 @@

.. raw:: html

<div class="sphx-glr-thumbnail-title">EOF analysis (S-mode)</div>
<div class="sphx-glr-thumbnail-title">Sparse PCA</div>
</div>


Expand Down
46 changes: 30 additions & 16 deletions docs/auto_examples/1single/plot_eof-smode.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,14 @@
"metadata": {},
"source": [
"\n",
"# EOF analysis (S-mode)\n",
"# Sparse PCA\n",
"\n",
"EOF analysis in S-mode maximises the temporal variance.\n"
"This example demonstrates the application of sparse PCA [1]_ to sea surface temperature data. Sparse PCA is an alternative to rotated PCA, where the components are sparse, often providing a more interpretable solution.\n",
"\n",
"We replicate the analysis from the original paper [1]_, which identifies the ENSO (El Niño-Southern Oscillation) as the fourth mode, representing about 1% of the total variance. The original study focused on weekly sea surface temperatures from satellite data, whereas we use monthly data from ERSSTv5. Consequently, our results may not match exactly, but they should be quite similar.\n",
"\n",
"## References\n",
".. [1] Erichson, N. B. et al. Sparse Principal Component Analysis via Variable Projection. SIAM J. Appl. Math. 80, 977-1002 (2020).\n"
]
},
{
Expand All @@ -17,12 +22,20 @@
"outputs": [],
"source": [
"# Load packages and data:\n",
"import xarray as xr\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.gridspec import GridSpec\n",
"import xarray as xr\n",
"from cartopy.crs import EqualEarth, PlateCarree\n",
"from matplotlib.gridspec import GridSpec\n",
"\n",
"from xeofs.models import EOF"
"from xeofs.models import SparsePCA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use sea surface temperature data from 1990 to 2017, consistent with the original paper.\n",
"\n"
]
},
{
Expand All @@ -31,14 +44,15 @@
"metadata": {},
"outputs": [],
"source": [
"sst = xr.tutorial.open_dataset(\"ersstv5\")[\"sst\"]"
"sst = xr.tutorial.open_dataset(\"ersstv5\")[\"sst\"]\n",
"sst = sst.sel(time=slice(\"1990\", \"2017\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform the actual analysis\n",
"We perform sparse PCA using the `alpha` and `beta` parameters, which define the sparsity imposed by the elastic net (refer to Table 1 in the paper). In our analysis, we set `alpha` to 1e-5, as specified by the authors. Although the authors do not specify a value for `beta`, it appears that the results are not highly sensitive to this parameter. Therefore, we use the default `beta` value of 1e-4.\n",
"\n"
]
},
Expand All @@ -48,7 +62,7 @@
"metadata": {},
"outputs": [],
"source": [
"model = EOF(n_modes=5, use_coslat=True)\n",
"model = SparsePCA(n_modes=4, alpha=1e-5)\n",
"model.fit(sst, dim=\"time\")\n",
"expvar = model.explained_variance()\n",
"expvar_ratio = model.explained_variance_ratio()\n",
Expand All @@ -60,7 +74,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Explained variance fraction\n",
"The explained variance fraction confirms that the fourth mode explains about 1% of the total variance, which is consistent with the original paper.\n",
"\n"
]
},
Expand All @@ -78,7 +92,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Create figure showing the first two modes\n",
"Examining the first four modes, we clearly identify ENSO as the fourth mode.\n",
"\n"
]
},
Expand All @@ -91,10 +105,10 @@
"proj = EqualEarth(central_longitude=180)\n",
"kwargs = {\"cmap\": \"RdBu\", \"vmin\": -0.05, \"vmax\": 0.05, \"transform\": PlateCarree()}\n",
"\n",
"fig = plt.figure(figsize=(10, 8))\n",
"gs = GridSpec(3, 2, width_ratios=[1, 2])\n",
"ax0 = [fig.add_subplot(gs[i, 0]) for i in range(3)]\n",
"ax1 = [fig.add_subplot(gs[i, 1], projection=proj) for i in range(3)]\n",
"fig = plt.figure(figsize=(10, 12))\n",
"gs = GridSpec(4, 2, width_ratios=[1, 2])\n",
"ax0 = [fig.add_subplot(gs[i, 0]) for i in range(4)]\n",
"ax1 = [fig.add_subplot(gs[i, 1], projection=proj) for i in range(4)]\n",
"\n",
"for i, (a0, a1) in enumerate(zip(ax0, ax1)):\n",
" scores.sel(mode=i + 1).plot(ax=a0)\n",
Expand All @@ -104,7 +118,7 @@
" a0.set_xlabel(\"\")\n",
"\n",
"plt.tight_layout()\n",
"plt.savefig(\"eof-smode.jpg\")"
"plt.savefig(\"sparse_pca.jpg\")"
]
}
],
Expand All @@ -124,7 +138,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.11.4"
}
},
"nbformat": 4,
Expand Down
39 changes: 25 additions & 14 deletions docs/auto_examples/1single/plot_eof-smode.py
Original file line number Diff line number Diff line change
@@ -1,48 +1,57 @@
"""
EOF analysis (S-mode)
Sparse PCA
========================
EOF analysis in S-mode maximises the temporal variance.
This example demonstrates the application of sparse PCA [1]_ to sea surface temperature data. Sparse PCA is an alternative to rotated PCA, where the components are sparse, often providing a more interpretable solution.
We replicate the analysis from the original paper [1]_, which identifies the ENSO (El Niño-Southern Oscillation) as the fourth mode, representing about 1% of the total variance. The original study focused on weekly sea surface temperatures from satellite data, whereas we use monthly data from ERSSTv5. Consequently, our results may not match exactly, but they should be quite similar.
References
----------
.. [1] Erichson, N. B. et al. Sparse Principal Component Analysis via Variable Projection. SIAM J. Appl. Math. 80, 977-1002 (2020).
"""

# Load packages and data:
import xarray as xr
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import xarray as xr
from cartopy.crs import EqualEarth, PlateCarree
from matplotlib.gridspec import GridSpec

from xeofs.models import EOF
from xeofs.models import SparsePCA

# %%
# We use sea surface temperature data from 1990 to 2017, consistent with the original paper.

sst = xr.tutorial.open_dataset("ersstv5")["sst"]
sst = sst.sel(time=slice("1990", "2017"))

# %%
# Perform the actual analysis
# We perform sparse PCA using the `alpha` and `beta` parameters, which define the sparsity imposed by the elastic net (refer to Table 1 in the paper). In our analysis, we set `alpha` to 1e-5, as specified by the authors. Although the authors do not specify a value for `beta`, it appears that the results are not highly sensitive to this parameter. Therefore, we use the default `beta` value of 1e-4.

model = EOF(n_modes=5, use_coslat=True)
model = SparsePCA(n_modes=4, alpha=1e-5)
model.fit(sst, dim="time")
expvar = model.explained_variance()
expvar_ratio = model.explained_variance_ratio()
components = model.components()
scores = model.scores()

# %%
# Explained variance fraction
# The explained variance fraction confirms that the fourth mode explains about 1% of the total variance, which is consistent with the original paper.

print("Explained variance: ", expvar.round(0).values)
print("Relative: ", (expvar_ratio * 100).round(1).values)

# %%
# Create figure showing the first two modes
# Examining the first four modes, we clearly identify ENSO as the fourth mode.

proj = EqualEarth(central_longitude=180)
kwargs = {"cmap": "RdBu", "vmin": -0.05, "vmax": 0.05, "transform": PlateCarree()}

fig = plt.figure(figsize=(10, 8))
gs = GridSpec(3, 2, width_ratios=[1, 2])
ax0 = [fig.add_subplot(gs[i, 0]) for i in range(3)]
ax1 = [fig.add_subplot(gs[i, 1], projection=proj) for i in range(3)]
fig = plt.figure(figsize=(10, 12))
gs = GridSpec(4, 2, width_ratios=[1, 2])
ax0 = [fig.add_subplot(gs[i, 0]) for i in range(4)]
ax1 = [fig.add_subplot(gs[i, 1], projection=proj) for i in range(4)]

for i, (a0, a1) in enumerate(zip(ax0, ax1)):
scores.sel(mode=i + 1).plot(ax=a0)
Expand All @@ -52,4 +61,6 @@
a0.set_xlabel("")

plt.tight_layout()
plt.savefig("eof-smode.jpg")
plt.savefig("sparse_pca.jpg")

# %%
2 changes: 1 addition & 1 deletion docs/auto_examples/1single/plot_eof-smode.py.md5
Original file line number Diff line number Diff line change
@@ -1 +1 @@
9dcc843908baf836161fbc9c490b6136
bb0b6c390528787d00735dd586afb457
Loading

0 comments on commit e2ccc76

Please sign in to comment.