ENH: Adding and documenting configs in nx-parallel (#75)
* initial commit

* minor docs updates

* style fix

* mv _set_nx_config to decorators.py

* renamed nx_config to active

* style fix

* added _set_nx_config to main namespace

* added F401

* Renamed _set_nx_config to _configure_if_nx_active

* updated Config.md

* Improved Config.md

* improved Config.md

* added _configure_if_nx_active to all funcs

* renamed cpu_count to get_n_jobs

* removing n_jobs from Parallel() because that will be configured using joblib.parallel_config or networkx config

* renaming cpu_count or total_cores to n_jobs

* updated README

* updated docs acc to config

* updated Config.md and README.md based on the review comments

* improved config docs
Schefflera-Arboricola authored Aug 26, 2024
1 parent a98224c commit 15de782
Showing 25 changed files with 441 additions and 188 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -113,7 +113,7 @@ def parallel_func(G, nx_arg, additional_backend_arg_1, additional_backend_arg_2=

In parallel computing, "chunking" refers to dividing a large task into smaller, more manageable chunks that can be processed simultaneously by multiple computing units, such as CPU cores or distributed computing nodes. It's like breaking down a big task into smaller pieces so that multiple workers can work on different pieces at the same time, and in the case of nx-parallel, this usually speeds up the overall process.

The default chunking in nx-parallel is done by first determining the number of available CPU cores and then allocating the nodes (or edges or any other iterator) per chunk by dividing the total number of nodes by the total CPU cores available. (ref. [chunk.py](./nx_parallel/utils/chunk.py)). This default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https:/networkx/nx-parallel/pull/33)). Also, when [the `config` PR](https:/networkx/networkx/pull/7225) is merged in networkx, and the `config` will be added to nx-parallel, then the user would be able to control the number of CPU cores they would want to use and then the chunking would be done accordingly.
The default chunking in nx-parallel is done by slicing the list of nodes (or edges, or any other iterable) into `n_jobs` number of chunks (ref. [chunk.py](./nx_parallel/utils/chunk.py)). By default, `n_jobs` is `None`. To learn how to modify the value of `n_jobs` and other config options, refer to [`Config.md`](./Config.md). The default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https:/networkx/nx-parallel/pull/33)).
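The slicing step described above can be sketched as follows. This is a hypothetical simplification, not the actual helper in `chunk.py` — only the idea of splitting an iterable into `n_jobs` nearly equal chunks:

```python
# Hypothetical simplification of nx-parallel's default chunking
# (the real helper lives in nx_parallel/utils/chunk.py): slice an
# iterable into `n_chunks` roughly equal lists.
def chunks(iterable, n_chunks):
    """Split `iterable` into `n_chunks` lists of nearly equal size."""
    items = list(iterable)
    base, extra = divmod(len(items), n_chunks)
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder
        yield items[start : start + size]
        start += size

# e.g. 10 nodes split across 4 jobs
print(list(chunks(range(10), 4)))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Each chunk is then handed to one joblib worker, so the number of chunks matches the number of parallel jobs.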

## General guidelines on adding a new algorithm

156 changes: 156 additions & 0 deletions Config.md
@@ -0,0 +1,156 @@
# Configuring nx-parallel

`nx-parallel` provides flexible parallel computing capabilities, allowing you to control settings like `backend`, `n_jobs`, `verbose`, and more. This can be done through two configuration systems: `joblib` and `NetworkX`. This guide explains how to configure `nx-parallel` using both systems.

## 1. Setting configs using `joblib.parallel_config`

`nx-parallel` relies on [`joblib.Parallel`](https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html) for parallel computing. You can adjust its settings through the [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) class provided by `joblib`. For more details, check out the official [joblib documentation](https://joblib.readthedocs.io/en/latest/parallel.html).

### 1.1 Usage

```python
from joblib import parallel_config

# Setting global configs
parallel_config(n_jobs=3, verbose=50)
nx.square_clustering(H)

# Setting configs in a context
with parallel_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)
```

Please refer to the [official joblib documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand the config parameters.

Note: Ensure that `nx.config.backends.parallel.active = False` when using `joblib` for configuration, as NetworkX configurations will override `joblib.parallel_config` settings if `active` is `True`.

## 2. Setting configs using `networkx`'s configuration system for backends

To use NetworkX’s configuration system in `nx-parallel`, you must set the `active` flag (in `nx.config.backends.parallel`) to `True`.

### 2.1 Configs in NetworkX for backends

When you import NetworkX, it automatically sets default configurations for all installed backends, including `nx-parallel`.

```python
import networkx as nx

print(nx.config)
```

Output:

```
NetworkXConfig(
backend_priority=[],
backends=Config(
parallel=ParallelConfig(
active=False,
backend="loky",
n_jobs=None,
verbose=0,
temp_folder=None,
max_nbytes="1M",
mmap_mode="r",
prefer=None,
require=None,
inner_max_num_threads=None,
backend_params={},
)
),
cache_converted_graphs=True,
)
```

As you can see in the above output, by default, `active` is set to `False`. So, to enable NetworkX configurations for `nx-parallel`, set `active` to `True`. Please refer to the [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on the NetworkX configuration system.

### 2.2 Usage

```python
# enabling networkx's config for nx-parallel
nx.config.backends.parallel.active = True

# Setting global configs
nxp_config = nx.config.backends.parallel
nxp_config.n_jobs = 3
nxp_config.verbose = 50

nx.square_clustering(H)

# Setting config in a context
with nxp_config(n_jobs=7, verbose=0):
    nx.square_clustering(H)
```

The configuration parameters are the same as for `joblib.parallel_config`, so you can refer to the [official joblib documentation](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html) to better understand these config parameters.

### 2.3 How Does NetworkX's Configuration Work in nx-parallel?

In `nx-parallel`, there's a `_configure_if_nx_active` decorator applied to all algorithms. This decorator checks the value of `active` (in `nx.config.backends.parallel`) and accordingly uses the appropriate configuration system (`joblib` or `networkx`). If `active=True`, it extracts the configs from `nx.config.backends.parallel`, passes them to a `joblib.parallel_config` context manager, and calls the function within that context. Otherwise, it simply calls the function.
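A conceptual sketch of that control flow is shown below. This is not the actual nx-parallel source: `ParallelConfig` and `parallel_config` here are minimal stand-ins for `nx.config.backends.parallel` and `joblib.parallel_config`, so the logic can be demonstrated without importing either library:

```python
from dataclasses import asdict, dataclass, field
from functools import wraps
from typing import Optional


@dataclass
class ParallelConfig:  # stand-in for nx.config.backends.parallel
    active: bool = False
    n_jobs: Optional[int] = None
    verbose: int = 0
    backend_params: dict = field(default_factory=dict)


config = ParallelConfig()
applied = {}  # records what the "joblib" stand-in was configured with


class parallel_config:  # stand-in for joblib.parallel_config
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __enter__(self):
        applied.update(self.kwargs)

    def __exit__(self, *exc):
        applied.clear()


def _configure_if_nx_active(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if config.active:
            # Extract the networkx-side settings (dropping `active`,
            # flattening `backend_params`) and hand them to joblib.
            cfg = {k: v for k, v in asdict(config).items()
                   if k not in ("active", "backend_params")}
            with parallel_config(**cfg, **config.backend_params):
                return func(*args, **kwargs)
        return func(*args, **kwargs)  # fall back to joblib's own config
    return wrapper


@_configure_if_nx_active
def algorithm():
    return dict(applied)  # the config in effect during the call


config.active, config.n_jobs = True, 4
print(algorithm())  # {'n_jobs': 4, 'verbose': 0}
```

With `active=False`, the wrapper falls through and any active `joblib.parallel_config` settings apply instead.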

## 3. Comparing NetworkX and Joblib Configuration Systems

### 3.1 Using Both Systems Simultaneously

You can use both NetworkX’s configuration system and `joblib.parallel_config` together in `nx-parallel`. However, it’s important to understand their interaction.

Example:

```py
from math import sqrt

import joblib
import networkx as nx

# Enable NetworkX configuration
nx.config.backends.parallel.active = True
nx.config.backends.parallel.n_jobs = 6

# Global Joblib configuration
joblib.parallel_config(backend="threading")

with joblib.parallel_config(n_jobs=4, verbose=55):
    # NetworkX config is used for nx-parallel:
    # backend="loky", n_jobs=6, verbose=0
    nx.square_clustering(G, backend="parallel")

    # Joblib config is used for other parallel tasks:
    # backend="threading", n_jobs=4, verbose=55
    joblib.Parallel()(joblib.delayed(sqrt)(i**2) for i in range(10))
```

- **NetworkX Configurations for nx-parallel**: When calling functions within `nx-parallel`, NetworkX’s configurations will override those specified by Joblib. For example, the `nx.square_clustering` function will use the `n_jobs=6` setting from `nx.config.backends.parallel`, regardless of any Joblib settings within the same context.

- **Joblib Configurations for Other Code**: For any other parallel code outside of `nx-parallel`, such as a direct call to `joblib.Parallel`, the configurations specified within the Joblib context will be applied.

This behavior ensures that `nx-parallel` functions consistently use NetworkX’s settings when enabled, while still allowing Joblib configurations to apply to non-NetworkX parallel tasks.

**Key Takeaway**: When both systems are used together, NetworkX's configuration (`nx.config.backends.parallel`) takes precedence for `nx-parallel` functions. To avoid unexpected behavior, ensure that the `active` setting aligns with your intended configuration system.

### 3.2 Key Differences

- **Parameter Handling**: The main difference is how `backend_params` are passed. Since configurations in networkx are stored as a [`@dataclass`](https://docs.python.org/3/library/dataclasses.html), `backend_params` must be passed as a dictionary, whereas in `joblib.parallel_config` you can pass them along with the other configurations, as shown below:

```py
nx.config.backends.parallel.backend_params = {"max_nbytes": None}
joblib.parallel_config(backend="loky", max_nbytes=None)
```

- **Default Behavior**: By default, `nx-parallel` looks for configs in `joblib.parallel_config` unless `nx.config.backends.parallel.active` is set to `True`.

### 3.3 When Should You Use Which System?

When `nx-parallel` is the only networkx backend you're using, either the NetworkX or the `joblib` configuration system can be used, depending on your preference.

But when working with multiple NetworkX backends, it's crucial to ensure compatibility among the backends to avoid conflicts between different configurations. In such cases, using NetworkX's configuration system to configure `nx-parallel` is recommended, as it helps maintain consistency across backends. For example:

```python
nx.config.backend_priority = ["another_nx_backend", "parallel"]
nx.config.backends.another_nx_backend.config_1 = "xyz"
joblib.parallel_config(n_jobs=7, verbose=50)

nx.square_clustering(G)
```

In this example, if `another_nx_backend` also uses `joblib.Parallel` internally (without exposing it to the user) in its implementation of the `square_clustering` algorithm, then the `nx-parallel` settings passed to `joblib.parallel_config` will also influence the internal `joblib.Parallel` used by `another_nx_backend`. To prevent such unexpected behavior, it is advisable to configure `nx-parallel` through the NetworkX configuration system.

**Future Synchronization:** We are working on synchronizing both configuration systems so that changes in one system automatically reflect in the other. This started with [PR#68](https:/networkx/nx-parallel/pull/68), which introduced a unified context manager for `nx-parallel`. For more details on the challenges of creating a compatibility layer to keep both systems in sync, refer to [Issue#76](https:/networkx/nx-parallel/issues/76).

If you have feedback or suggestions, feel free to open an issue or submit a pull request.

Thank you :)
50 changes: 28 additions & 22 deletions README.md
@@ -4,26 +4,26 @@ nx-parallel is a NetworkX backend that uses joblib for parallelization. This pro

## Algorithms in nx-parallel

- [number_of_isolates](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L8)
- [square_clustering](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#L10)
- [local_efficiency](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L9)
- [closeness_vitality](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L9)
- [is_reachable](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L10)
- [tournament_is_strongly_connected](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L54)
- [all_pairs_node_connectivity](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/connectivity/connectivity.py#L17)
- [approximate_all_pairs_node_connectivity](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/approximation/connectivity.py#L12)
- [betweenness_centrality](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L19)
- [edge_betweenness_centrality](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L94)
- [node_redundancy](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/bipartite/redundancy.py#L11)
- [all_pairs_dijkstra](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L28)
- [all_pairs_dijkstra_path_length](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L71)
- [all_pairs_dijkstra_path](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L121)
- [all_pairs_bellman_ford_path_length](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L164)
- [all_pairs_bellman_ford_path](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L209)
- [johnson](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L252)
- [all_pairs_all_shortest_paths](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/generic.py#L10)
- [all_pairs_shortest_path_length](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L18)
- [all_pairs_shortest_path](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L62)
- [all_pairs_all_shortest_paths](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/generic.py#L11)
- [all_pairs_bellman_ford_path](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L212)
- [all_pairs_bellman_ford_path_length](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L168)
- [all_pairs_dijkstra](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L29)
- [all_pairs_dijkstra_path](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L124)
- [all_pairs_dijkstra_path_length](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L73)
- [all_pairs_node_connectivity](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/connectivity/connectivity.py#L18)
- [all_pairs_shortest_path](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L63)
- [all_pairs_shortest_path_length](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L19)
- [approximate_all_pairs_node_connectivity](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/approximation/connectivity.py#L13)
- [betweenness_centrality](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L20)
- [closeness_vitality](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L10)
- [edge_betweenness_centrality](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L96)
- [is_reachable](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L13)
- [johnson](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L256)
- [local_efficiency](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L10)
- [node_redundancy](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/bipartite/redundancy.py#L12)
- [number_of_isolates](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L9)
- [square_clustering](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#L11)
- [tournament_is_strongly_connected](https:/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L59)

<details>
<summary>Script used to generate the above list</summary>
@@ -107,6 +107,12 @@ Note that for all functions inside `nx_code.py` that do not have an nx-parallel
import networkx as nx
import nx_parallel as nxp

# enabling networkx's config for nx-parallel
nx.config.backends.parallel.active = True

# setting `n_jobs` (by default, `n_jobs=None`)
nx.config.backends.parallel.n_jobs = 4

G = nx.path_graph(4)
H = nxp.ParallelGraph(G)

@@ -121,10 +127,10 @@ nxp.betweenness_centrality(G)

# method 4 : using nx-parallel implementation with ParallelGraph object
nxp.betweenness_centrality(H)

# output : {0: 0.0, 1: 0.6666666666666666, 2: 0.6666666666666666, 3: 0.0}
```

For more on how to play with configurations in nx-parallel, refer to [Config.md](./Config.md)! Additionally, refer to the [NetworkX's official backend and config docs](https://networkx.org/documentation/latest/reference/backends.html) for more on the functionalities networkx provides for backends and configs, like logging, `backend_priority`, etc. Another way to configure nx-parallel is by using [`joblib.parallel_config`](https://joblib.readthedocs.io/en/latest/generated/joblib.parallel_config.html).

### Notes

1. Some functions in networkx have the same name but different implementations, so to avoid name conflicts at the time of dispatching, networkx differentiates them by specifying the `name` parameter in the `_dispatchable` decorator of such algorithms. Hence, `method 3` and `method 4` are not recommended, but you can use them if you know the correct `name`.