Parsing local and cloud SEG-Y files with new I/O library #381

Merged on Jun 25, 2024 (67 commits)

@tasansal (Collaborator) commented Apr 4, 2024

This is a significant overhaul of MDIO that replaces segyio with our own SEG-Y parser. There are minor breaking changes.

SEG-Y Docs, SEG-Y Repo

⚠️ BREAKING ⚠️

  • The endianness parameter of segy_to_mdio is gone; endianness is now inferred from the file.
  • The output format can no longer be changed. The output uses whatever format the input had.
  • Many dependencies require newer versions (may not be an issue).

ℹ️ Import/Export Version Matrix ℹ️

MDIO Version File Version < 0.8.0 File Version >= 0.8.0
< 0.8.0
>= 0.8.0

There are some new capabilities:

  • Input SEG-Y revision and endianness are automatically inferred (or can be overridden).
  • Some SEG-Y binary header keys can now be overridden (e.g., extended text headers and other fields that break the standard and ingestion).
  • Ingesting any SEG-Y spec is now allowed. It doesn't have to strictly adhere to segyio fields (e.g., a spec may omit fields past trace header byte 233 (offset 232)).
  • SEG-Y files can now be ingested from the cloud without download or HTTP links. (Beware of performance: it needs careful configuration and depends on the environment. We are working on documentation to explain this.)
  • A lot of SEG-Y parsing complexity is now offloaded to the segy library.
  • Headers are parsed faster than with segyio (or think of it as fewer Python CPU cycles).
  • Some extra bug fixes here and there.
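
To illustrate how endianness can be inferred from the file itself, here is a minimal sketch using only the standard library. It is a common heuristic, not necessarily the logic the segy library uses: the data sample format code (file bytes 3225-3226) is read in both byte orders, and only one reading can fall in the valid range. The function name `infer_endianness` and the constants are hypothetical.

```python
import struct

TEXT_HEADER_SIZE = 3200            # textual header precedes the 400-byte binary header
SAMPLE_FORMAT_OFFSET = 24          # 0-based offset inside the binary header
VALID_FORMAT_CODES = range(1, 17)  # SEG-Y rev 2 format codes all lie within 1..16


def infer_endianness(header_bytes: bytes) -> str:
    """Guess the byte order from the leading bytes of a SEG-Y file (hypothetical helper)."""
    start = TEXT_HEADER_SIZE + SAMPLE_FORMAT_OFFSET
    raw = header_bytes[start:start + 2]
    if len(raw) != 2:
        raise ValueError("need at least 3226 bytes of the file")

    # Interpret the same two bytes in both byte orders.
    (as_big,) = struct.unpack(">h", raw)
    (as_little,) = struct.unpack("<h", raw)

    # A small code such as 5 reads as 1280 in the wrong byte order,
    # so at most one interpretation can be a valid format code.
    if as_big in VALID_FORMAT_CODES:
        return "big"
    if as_little in VALID_FORMAT_CODES:
        return "little"
    raise ValueError("sample format code is invalid in both byte orders")
```

Passing the first 3600 bytes of a file is enough for this check, which keeps the probe cheap for cloud-hosted files.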

@tasansal tasansal self-assigned this Apr 4, 2024
@tasansal tasansal added the breaking (Breaking Changes) and enhancement (New feature or request) labels Apr 4, 2024
Removed the list type check from the text_header setter in accessor.py. The application now expects a string input instead of a list, simplifying the validation process.
Simplify `header_scan_worker` and `trace_worker` in SEG-Y module by removing unused imports and streamlining parameter list. Update functions to work directly with `SegyFile` instances and clean up data handling logic for efficiency.
Refactor the parsing functions in `src/mdio/segy/parsers.py` to simplify the codebase and improve maintainability. Redundant functions such as `parse_binary_header`, `parse_text_header`, and `get_trace_count` have been removed, while imports have been condensed to only essential modules. The `NUM_CORES` logic is updated to count logical cores instead of just physical ones.
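
As a rough sketch of the logical-core change described above (the actual code in `src/mdio/segy/parsers.py` may use a different call), the standard library's `os.cpu_count()` reports logical CPUs. The helper `default_worker_count` is hypothetical and only shows how such a constant might cap a worker pool.

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if undetermined.
NUM_CORES = os.cpu_count() or 1


def default_worker_count(num_tasks: int, max_workers: int = NUM_CORES) -> int:
    """Pick a pool size: never more workers than tasks, never fewer than one."""
    return max(1, min(num_tasks, max_workers))
```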
Removed unused imports and functions in the SEG-Y converter module to enhance code maintainability. Simplified the arguments for the `segy_to_mdio` function to increase ease of use and readability. Reduced complexity by utilizing `SegyFile` class for SEG-Y file operations.
The get_grid_plan function in utilities.py has been refactored to accept a SegyFile instance instead of individual parameters for the file path. Unused imports were eliminated, and type checking imports are now conditional, improving readability and modularity.
The changes involve major refactoring of the code base to use the 'segy' library instead of 'segyio'. Most notably, this included updating the handling of SEG-Y dtypes, byte order, and trace headers. Unused imports have been removed to clean up the code. A new multiprocessing chunk size has been introduced, and attributes are now set on the SegyFile instance instead of being passed as function arguments.
The segy package version has been updated from 0.0.13 to 0.0.14 in the pyproject.toml file. This upgrade was performed to update software dependencies and to integrate the latest bug fixes and features delivered with the new version.
A new helper function, 'make_segy_factory', has been created to handle the generation of SegyFactory. This function accepts more parameters to provide better control over the creation of the SEG-Y based on the MDIO metadata. Changes also include updates in import declarations and reorganization of some code blocks in the 'mdio_spec_to_segy' function.
In the SegyFactory initialization within creation.py, the sample_interval parameter has been modified to be multiplied by 1000. This change ensures that the value is correctly represented in microseconds, aligning with the expected data format.
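
The unit conversion in the commit above amounts to the following arithmetic. This sketch assumes the MDIO-side value is in milliseconds, which is what multiplying by 1000 to reach microseconds implies. The function name is hypothetical and is not part of creation.py.

```python
def sample_interval_to_microseconds(interval_ms: float) -> int:
    """Convert a sample interval from milliseconds to integer microseconds."""
    # Rounding guards against floating-point noise before the integer cast.
    return round(interval_ms * 1000)
```

For example, a 4 ms sample interval becomes 4000 µs.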
The safety check in noxfile.py has been updated to temporarily ignore a specific Common Vulnerabilities and Exposures (CVE) number because it is not deemed critical. A TODO note has been added as a reminder to remove this exception once the issue is resolved.
@tasansal tasansal marked this pull request as ready for review June 24, 2024 23:09
@tasansal tasansal merged commit f4a5ad4 into main Jun 25, 2024
20 checks passed
@tasansal tasansal deleted the new-segy-io branch June 25, 2024 15:09
Labels: breaking (Breaking Changes), enhancement (New feature or request)