Function to rechunk single variables or batch variables from existing MDIO #368

tasansal · 2024-03-12T01:24:14Z

This PR allows users to rechunk MDIO datasets when needed.

Also added examples and documentation.

The MDIO API now supports additional file operation modes (`'w'` for rechunking) and a new rechunking feature. The `copy_mdio` function now accepts strongly typed arguments, and `rechunk` functions have been added to change chunk sizes efficiently on large datasets, with progress tracking and improved error handling.

Expanded the docstrings in `convenience.py` to include examples illustrating how to use `rechunk` and `rechunk_batch` functions for clarity and ease of use for developers.

The new section "Convenience Functions" has been added to the reference documentation. It specifically includes documentation for the `mdio.api.convenience` module, excluding `create_rechunk_plan` and `write_rechunked_values`.

The rechunk operations in the MDIO API now accept an optional `compressor` parameter, allowing users to specify a custom data compression codec. If none is provided, the default compressor, `Blosc('zstd')`, is used, which keeps the change backward compatible.

Inserted a TODO comment in `rechunk` function for writing tests, referencing the relevant issue.

Removed an extraneous newline and introduced a constant, `MAX_BUFFER`, that controls the buffer size used for chunking. Updated the `create_rechunk_plan` docstring to describe the buffer size and to explain that it can be adjusted by changing `MAX_BUFFER`. This change improves code readability and maintainability.
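To illustrate the idea of a buffer cap, here is a minimal sketch of how a size limit could bound the blocks processed during rechunking. It is not the code in this PR: the helper names `block_shape` and `iter_blocks`, the `MAX_BUFFER` value, and the grow-from-the-last-axis strategy are all assumptions made for illustration.

```python
from itertools import product
from math import prod

# Assumed value for illustration only; the PR's actual MAX_BUFFER is not stated here.
MAX_BUFFER = 512 * 1024**2  # bytes


def block_shape(shape, chunks, itemsize, max_buffer=MAX_BUFFER):
    """Grow a chunk-aligned block, last axis first, while it fits in max_buffer bytes."""
    block = [min(c, s) for c, s in zip(chunks, shape)]
    for axis in reversed(range(len(shape))):
        # Bytes consumed per unit of length along this axis.
        bytes_per_unit = prod(block) // block[axis] * itemsize
        max_len = max_buffer // bytes_per_unit
        # Always keep at least one whole chunk, even if it exceeds the cap.
        n_chunks = max(1, max_len // chunks[axis])
        block[axis] = min(shape[axis], n_chunks * chunks[axis])
    return tuple(block)


def iter_blocks(shape, block):
    """Yield tuples of slices that tile `shape` with `block`-sized pieces."""
    ranges = [range(0, s, b) for s, b in zip(shape, block)]
    for starts in product(*ranges):
        yield tuple(
            slice(start, min(start + b, s))
            for start, b, s in zip(starts, block, shape)
        )


if __name__ == "__main__":
    shape, chunks = (100, 100, 100), (10, 10, 10)
    block = block_shape(shape, chunks, itemsize=4, max_buffer=40_000)
    print(block, sum(1 for _ in iter_blocks(shape, block)))
```

Keeping each block a whole multiple of the target chunk size means every output chunk is written exactly once, so no chunk has to be read back and merged.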

Added a new demo `rechunking.ipynb` to demonstrate how to optimize access patterns using rechunking and lossy compression. The notebook includes detailed steps and code snippets to create optimized, compressed copies for different access patterns, enhancing read performance.
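The reason chunk shape matters for access patterns can be seen with a small calculation. The sketch below is not taken from the notebook; the volume size, the chunk shapes, and the helper names `chunks_touched` and `elements_read` are hypothetical. It counts how many chunks a single inline slice intersects under two layouts.

```python
from math import prod


def chunks_touched(selection, chunks):
    """Count chunks intersecting a selection given as (start, stop) per axis."""
    count = 1
    for (start, stop), size in zip(selection, chunks):
        count *= (stop - 1) // size - start // size + 1
    return count


def elements_read(selection, chunks):
    """Whole chunks must be read, so multiply the chunk count by the chunk size."""
    return chunks_touched(selection, chunks) * prod(chunks)


if __name__ == "__main__":
    # Hypothetical 512^3 volume; one inline is a single index on axis 0.
    inline = ((100, 101), (0, 512), (0, 512))
    for chunks in [(64, 64, 64), (4, 512, 512)]:
        print(chunks, chunks_touched(inline, chunks), elements_read(inline, chunks))
```

Under these assumed sizes, the cubic layout touches many more chunks and reads far more samples than the slice actually needs, which is the overhead that a copy rechunked for a given access direction is meant to avoid.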

Streamlined the Jupyter notebook by resetting execution counts and cleaning up metadata fields. This provides a fresh state for the execution environment and a more structured document for other developers to follow.

Corrected the reference from 'notebook' to 'page', added a parenthetical clarification to a section heading, and updated the performance benchmark outputs. These changes improve document clarity and provide the latest performance metrics.

tasansal added the enhancement New feature or request label Mar 12, 2024

tasansal self-assigned this Mar 12, 2024

tasansal added 8 commits March 11, 2024 20:51

Add usage examples to rechunk functions

55596fb

Expanded the docstrings in `convenience.py` to include examples illustrating how to use `rechunk` and `rechunk_batch` functions for clarity and ease of use for developers.

Add convenience functions section to docs

8fb1b6a

The new section "Convenience Functions" has been added to the reference documentation. It specifically includes documentation for the `mdio.api.convenience` module, excluding `create_rechunk_plan` and `write_rechunked_values`.

Add rechunk function TODO comment

d34d816

Inserted a TODO comment in `rechunk` function for writing tests, referencing the relevant issue.

Refactor notebook: reset execution counts and tidy metadata

95d436d

Streamlined the Jupyter notebook by resetting execution counts and cleaning up metadata fields. This provides a fresh state for the execution environment and a more structured document for other developers to follow.

Update rechunking notebook with minor tweaks

363390a

Corrected the reference from 'notebook' to 'page', added a parenthetical clarification to a section heading, and updated the performance benchmark outputs. These changes improve document clarity and provide the latest performance metrics.

tasansal merged commit d157465 into main Mar 12, 2024
20 checks passed

tasansal deleted the add-rechunker branch March 12, 2024 03:48
