safety check to prevent small sub dataframe from overwriting full dataframe on disk #126

Merged
kushalkolar merged 1 commit into master from save-to-disk-safety-check
Oct 31, 2022

@kushalkolar kushalkolar commented Oct 29, 2022

Addresses #125

Because a DataFrame's batch path is propagated to any "sub dataframes" created by fancy indexing or other means, saving a sub dataframe could overwrite the existing full dataframe on disk with the smaller one. This PR adds a safety check that prevents that.

For example:

from mesmerize_core import *

df = load_batch("/path/to/batch.pickle")

# create a sub dataframe
sub_df = df[df["algo"] == "cnmf"]

# this is dangerous: it would overwrite the full dataframe on disk, so this PR prevents it
sub_df.save_to_disk()

Users can still override the check with the max_index_diff kwarg:

# force the save as long as the number of rows differs by no more than 5
sub_df.save_to_disk(max_index_diff=5)
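
To make the idea concrete, here is a minimal sketch of the kind of check described above. It is not the code merged in this PR: `guarded_save` and `IndexDiffError` are hypothetical names, and the sketch assumes the check compares the absolute difference in row counts between the object being saved and the one already on disk. It works on anything that supports `len()` and pickling, which includes a pandas DataFrame.

```python
import pickle
from pathlib import Path


class IndexDiffError(Exception):
    """Hypothetical error: the save would change the row count too much."""


def guarded_save(rows, path, max_index_diff=0):
    """Pickle `rows` to `path`, refusing if the row count differs from the
    object already on disk by more than `max_index_diff`.

    Assumption: "differs" means the absolute difference in len().
    """
    path = Path(path)

    if path.exists():
        # Load what is currently on disk so the row counts can be compared
        with path.open("rb") as f:
            on_disk = pickle.load(f)

        diff = abs(len(on_disk) - len(rows))
        if diff > max_index_diff:
            raise IndexDiffError(
                f"Refusing to overwrite {path}: row count differs by {diff}, "
                f"which exceeds max_index_diff={max_index_diff}"
            )

    # Either nothing is on disk yet or the difference is within tolerance
    with path.open("wb") as f:
        pickle.dump(rows, f)
```

With the default `max_index_diff=0`, only an object with the same number of rows as the one on disk can replace it, so a filtered subset is rejected unless the caller explicitly allows a larger difference.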

@kushalkolar kushalkolar added this to the v0.1 beta 2 milestone Oct 29, 2022
@kushalkolar kushalkolar merged commit 607c04f into master Oct 31, 2022
@kushalkolar kushalkolar deleted the save-to-disk-safety-check branch January 8, 2023 08:32