weather-mv: Improve tool's efficiency in terms of time & memory. #291

Open
mahrsee1997 opened this issue Feb 3, 2023 · 2 comments
Labels: enhancement, P1, weather-mv

Comments

mahrsee1997 (Collaborator) commented Feb 3, 2023

Time Efficient:

Use gcloud alpha storage instead of gsutil in the open_local() method in sinks.py.

Findings: downloading data from GCS to the local file system with gsutil is 5 times slower than with gcloud alpha storage.
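A minimal sketch of what the swap could look like, assuming the download is done by shelling out to the CLI. The helper names `build_copy_command` and `copy_to_local` are hypothetical and are not taken from sinks.py; only the `gcloud alpha storage cp` command comes from the finding above.

```python
import subprocess
from typing import List


def build_copy_command(gcs_uri: str, local_path: str) -> List[str]:
    """Build the argument list for copying one object with `gcloud alpha storage cp`."""
    return ["gcloud", "alpha", "storage", "cp", gcs_uri, local_path]


def copy_to_local(gcs_uri: str, local_path: str) -> None:
    """Download gcs_uri to local_path; raises CalledProcessError if the CLI fails."""
    # Hypothetical helper: assumes the gcloud CLI is installed and authenticated.
    subprocess.run(build_copy_command(gcs_uri, local_path), check=True)
```

Keeping the command construction separate from the subprocess call makes it possible to check the arguments without touching GCS.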

Memory Efficient:

Each time we log xr_dataset.nbytes, the complete dataset is loaded into memory, which triggers the OOM killer.
TODO: Find a better way to log the dataset size.
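One possible direction for the TODO, as a sketch only: estimate the size from each variable's shape and dtype, which are metadata and should not require reading the array values. The function name `estimated_nbytes` is hypothetical, and it assumes the dataset exposes a `variables` mapping whose values have `shape` and `dtype.itemsize` (as xarray variables do). Whether this actually avoids the load in weather-mv has not been verified here.

```python
import math


def estimated_nbytes(dataset) -> int:
    """Estimate dataset size in bytes from shape and dtype metadata only."""
    total = 0
    for variable in dataset.variables.values():
        # Element count times bytes per element; no array values are read.
        total += math.prod(variable.shape) * variable.dtype.itemsize
    return total
```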

Real-time data ingestion into BQ:

beam.io.WriteToBigQuery(): in a batch pipeline, data is not ingested into BQ in real time, because the batch pipeline processes all elements before writing to BigQuery.

@mahrsee1997 mahrsee1997 added enhancement New feature or request P1 weather-mv labels Feb 3, 2023
@mahrsee1997
Collaborator Author

A temporary fix has been implemented on the mv-optimization branch (link). Further work is required to prepare the changes for merging.

@mahrsee1997
Collaborator Author

Fixed:
