feat: add augurs-clustering crate with DBSCAN algorithm #100

Merged on Sep 4, 2024 (7 commits)
2 changes: 1 addition & 1 deletion .github/workflows/run_benchmarks.yaml
@@ -37,7 +37,7 @@ jobs:
repository: ${{ github.event.pull_request.base.repo.full_name }}

- name: Benchmark base
- run: cargo bench -- --save-baseline main > base.txt
+ run: cargo bench --all-features -- --save-baseline main > base.txt

- name: Upload base benchmark Results
uses: actions/upload-artifact@v4
1 change: 1 addition & 0 deletions Cargo.toml
@@ -22,6 +22,7 @@ keywords = [

[workspace.dependencies]
augurs-changepoint = { version = "0.3.1", path = "crates/augurs-changepoint" }
augurs-clustering = { version = "0.3.1", path = "crates/augurs-clustering" }
augurs-core = { version = "0.3.1", path = "crates/augurs-core" }
augurs-dtw = { version = "0.3.1", path = "crates/augurs-dtw" }
augurs-ets = { version = "0.3.1", path = "crates/augurs-ets" }
4 changes: 3 additions & 1 deletion README.md
@@ -21,8 +21,9 @@ APIs are subject to change, and functionality may not be fully implemented.
| Name | Purpose | Status |
| ------------------------ | -------------------------------------------------------------------- | -------------------------------------------------------------------- |
| [`augurs-changepoint`][] | Changepoint detection for time series | alpha - API is flexible right now |
| [`augurs-clustering`][] | Time series clustering algorithms | alpha - API is flexible right now |
| [`augurs-core`][] | Common structs and traits | alpha - API is flexible right now |
- | [`augurs-dtw`][] | Dynamic Time Warping (DTW) | alpha - API is flexible right now |
+ | [`augurs-dtw`][] | Dynamic Time Warping (DTW) | alpha - API is flexible right now |
| [`augurs-ets`][] | Automatic exponential smoothing models | alpha - non-seasonal models working and tested against statsforecast |
| [`augurs-mstl`][] | Multiple Seasonal Trend Decomposition using LOESS (MSTL) | beta - working and tested against R |
| [`augurs-outlier`][] | Outlier detection for time series | alpha - API is flexible right now |
@@ -62,6 +63,7 @@ Dual-licensed to be compatible with the Rust project.
Licensed under the Apache License, Version 2.0 `<http://www.apache.org/licenses/LICENSE-2.0>` or the MIT license `<http://opensource.org/licenses/MIT>`, at your option.

[`augurs-changepoint`]: https://crates.io/crates/augurs-changepoint
[`augurs-clustering`]: https://crates.io/crates/augurs-clustering
[`augurs-core`]: https://crates.io/crates/augurs-core
[`augurs-dtw`]: https://crates.io/crates/augurs-dtw
[`augurs-ets`]: https://crates.io/crates/augurs-ets
10 changes: 10 additions & 0 deletions crates/augurs-clustering/CHANGELOG.md
@@ -0,0 +1,10 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Other
- Add `augurs-clustering` crate
23 changes: 23 additions & 0 deletions crates/augurs-clustering/Cargo.toml
@@ -0,0 +1,23 @@
[package]
name = "augurs-clustering"
license.workspace = true
authors.workspace = true
documentation.workspace = true
repository.workspace = true
version.workspace = true
edition.workspace = true
keywords.workspace = true
description = "Time series clustering."

[dependencies]
augurs-core.workspace = true

[dev-dependencies]
criterion.workspace = true

[lib]
bench = false

[[bench]]
name = "dbscan"
harness = false
1 change: 1 addition & 0 deletions crates/augurs-clustering/LICENSE-APACHE
1 change: 1 addition & 0 deletions crates/augurs-clustering/LICENSE-MIT
41 changes: 41 additions & 0 deletions crates/augurs-clustering/README.md
@@ -0,0 +1,41 @@
# Time series clustering algorithms

This crate contains algorithms for clustering time series.

So far, only DBSCAN is implemented, and it takes a precomputed distance matrix as input.
For now, use a crate such as [`augurs-dtw`] to calculate that matrix.
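
To make the expected input concrete, here is a minimal sketch of building a square pairwise distance matrix by hand. It uses plain Euclidean distance as a stand-in for DTW, and the helper names `euclidean` and `pairwise_distances` are hypothetical: they are not part of any augurs crate.

```rust
/// Euclidean distance between two equal-length series.
/// Hypothetical helper for illustration; a stand-in for a DTW distance.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Build a full, symmetric square matrix of pairwise distances.
/// The diagonal stays at zero because each series is at distance 0 from itself.
fn pairwise_distances(series: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = series.len();
    let mut matrix = vec![vec![0.0; n]; n];
    for i in 0..n {
        // Only compute the upper triangle, then mirror it.
        for j in (i + 1)..n {
            let d = euclidean(&series[i], &series[j]);
            matrix[i][j] = d;
            matrix[j][i] = d;
        }
    }
    matrix
}
```

The resulting `Vec<Vec<f64>>` has the same shape as the nested vectors passed to `DistanceMatrix::try_from_square` in the usage example.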

## Usage

```rust
use augurs_clustering::{Dbscan, DistanceMatrix};

# fn main() -> Result<(), Box<dyn std::error::Error>> {
let distance_matrix = DistanceMatrix::try_from_square(
    vec![
        vec![0.0, 1.0, 2.0, 3.0],
        vec![1.0, 0.0, 3.0, 3.0],
        vec![2.0, 3.0, 0.0, 4.0],
        vec![3.0, 3.0, 4.0, 0.0],
    ],
)?;
let clusters = Dbscan::new(0.5, 2).fit(&distance_matrix);
assert_eq!(clusters, vec![-1, -1, -1, -1]);
# Ok(())
# }
```
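
In the example above every series is labelled `-1` (noise): the smallest off-diagonal distance is 1.0, which is larger than the epsilon of 0.5, so no point has a neighbour other than itself. The following is a rough, self-contained sketch of DBSCAN over a precomputed matrix that reproduces this behaviour. It is an illustration only, not the code in this crate; the function name `dbscan_sketch`, the `isize` label type, and the choice to count a point as its own neighbour (as scikit-learn does) are assumptions.

```rust
use std::collections::VecDeque;

/// Label used for points that belong to no cluster.
const NOISE: isize = -1;

/// Minimal DBSCAN over a precomputed square distance matrix.
/// Illustrative sketch only; not the `augurs-clustering` implementation.
fn dbscan_sketch(dist: &[Vec<f64>], epsilon: f64, min_points: usize) -> Vec<isize> {
    let n = dist.len();
    // Neighbours of `i` are all points within `epsilon`, including `i` itself.
    let neighbours = |i: usize| -> Vec<usize> {
        (0..n).filter(|&j| dist[i][j] <= epsilon).collect()
    };

    let mut labels = vec![NOISE; n];
    let mut visited = vec![false; n];
    let mut cluster = 0;

    for start in 0..n {
        if visited[start] {
            continue;
        }
        visited[start] = true;
        let seeds = neighbours(start);
        if seeds.len() < min_points {
            // Not a core point: stays noise unless a core point reaches it later.
            continue;
        }
        labels[start] = cluster;
        let mut queue: VecDeque<usize> = seeds.into();
        while let Some(p) = queue.pop_front() {
            if labels[p] == NOISE {
                // Border or core point joins the current cluster.
                labels[p] = cluster;
            }
            if visited[p] {
                continue;
            }
            visited[p] = true;
            let reachable = neighbours(p);
            if reachable.len() >= min_points {
                // Only core points expand the cluster further.
                queue.extend(reachable);
            }
        }
        cluster += 1;
    }
    labels
}
```

With the same matrix and epsilon raised to 1.0, this sketch puts the first two series in one cluster and leaves the other two as noise.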

## Credits

This implementation is based heavily on the implementations in [`linfa-clustering`] and [`scikit-learn`].
The main difference from these is that this crate operates directly on a precomputed distance matrix
rather than calculating distances as part of the clustering algorithm.

[`augurs-dtw`]: https://crates.io/crates/augurs-dtw
[`linfa-clustering`]: https://crates.io/crates/linfa-clustering
[`scikit-learn`]: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

## License

Dual-licensed to be compatible with the Rust project.
Licensed under the Apache License, Version 2.0 `<http://www.apache.org/licenses/LICENSE-2.0>` or the MIT license `<http://opensource.org/licenses/MIT>`, at your option.
24 changes: 24 additions & 0 deletions crates/augurs-clustering/benches/dbscan.rs
@@ -0,0 +1,24 @@
use criterion::{criterion_group, criterion_main, Criterion};

use augurs_clustering::Dbscan;
use augurs_core::DistanceMatrix;

fn dbscan(c: &mut Criterion) {
    let distance_matrix = include_str!("../data/dist.csv")
        .lines()
        .map(|l| {
            l.split(',')
                .map(|s| s.parse::<f64>().unwrap())
                .collect::<Vec<f64>>()
        })
        .collect::<Vec<Vec<f64>>>();
    let distance_matrix = DistanceMatrix::try_from_square(distance_matrix).unwrap();
    c.bench_function("dbscan", |b| {
        b.iter(|| {
            Dbscan::new(10.0, 3).fit(&distance_matrix);
        });
    });
}

criterion_group!(benches, dbscan);
criterion_main!(benches);
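
The benchmark parses `dist.csv` with `unwrap`, which is reasonable for a fixed fixture. For reuse outside a benchmark, the same parsing could report malformed input instead of panicking. A minimal sketch, assuming a hypothetical helper `parse_square_csv` that is not part of this PR:

```rust
/// Parse comma-separated rows into a square matrix, returning an error
/// message instead of panicking. Hypothetical helper, for illustration only.
fn parse_square_csv(text: &str) -> Result<Vec<Vec<f64>>, String> {
    let rows = text
        .lines()
        // Ignore blank lines such as a trailing newline at end of file.
        .filter(|line| !line.trim().is_empty())
        .enumerate()
        .map(|(i, line)| {
            line.split(',')
                .map(|cell| {
                    cell.trim()
                        .parse::<f64>()
                        .map_err(|e| format!("row {}: {}", i, e))
                })
                .collect::<Result<Vec<f64>, String>>()
        })
        .collect::<Result<Vec<Vec<f64>>, String>>()?;

    // A distance matrix must have as many columns in every row as it has rows.
    let n = rows.len();
    if let Some(bad) = rows.iter().position(|row| row.len() != n) {
        return Err(format!(
            "row {} has {} columns, expected {}",
            bad,
            rows[bad].len(),
            n
        ));
    }
    Ok(rows)
}
```

The returned rows could then be handed to `DistanceMatrix::try_from_square` as in the benchmark above.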