streamline data situation #53
Labels
needs-votes 👍
Please upvote, if this is worthwhile
Comments
among other things, the repeated downloads of the big dumps via |
maxheld83 referenced this issue in subugoe/openairegraph on Apr 29, 2020
This would also be a feature for a lot of users, who might face the same problem when they run this in CI or collaboratively.
We seem to be running into a similar problem in several projects, including http:/subugoe/hoad/, http:/subugoe/openairegraph/ and the crossref dump situation http:/njahn82/cr_dump/:
There's big-ish (>1MB) serialised data, usually JSON, CSV or the same compressed, which is either/or
(I'm not talking about databases here, that's a separate concern).
These files cause several problems / face limitations:
- cannot be `git commit`ed (too large)

Possible straightforward solutions might be:

- store only locally (no reproducibility)
- store on a network drive (no reproducibility)
- setting up a database (too expensive/too much hassle unless absolutely necessary)

I think we need something else which neatly abstracts away all this.
There's probably a good solution out there already.
One avenue to pursue would be git lfs.
Ideally, we should have a solution which understands serialised data and can diff it row by row, where the order of rows does not matter.
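To make the row-diffing idea concrete, here is a minimal sketch in Python of an order-insensitive diff between two versions of a CSV file. It is only an illustration, not an existing tool: the function names `row_multiset` and `diff_rows` and the sample data are made up, and it assumes small, well-formed CSV with a header row that fits in memory.

```python
import csv
import io
from collections import Counter


def row_multiset(csv_text: str) -> Counter:
    """Count each data row, keyed by column name so column order is ignored too."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a sorted tuple of (column, value) pairs, which is hashable.
    return Counter(tuple(sorted(row.items())) for row in reader)


def diff_rows(old_text: str, new_text: str):
    """Return (added, removed) rows between two CSV versions, ignoring row order."""
    old, new = row_multiset(old_text), row_multiset(new_text)
    # Counter subtraction keeps only positive counts, so duplicates are handled.
    added = [dict(r) for r in (new - old).elements()]
    removed = [dict(r) for r in (old - new).elements()]
    return added, removed


# Hypothetical sample data: one row dropped, one row added, rows reordered.
old_csv = "doi,year\n10.1/a,2019\n10.1/b,2020\n"
new_csv = "doi,year\n10.1/b,2020\n10.1/c,2020\n"
added, removed = diff_rows(old_csv, new_csv)
print("added:", added)
print("removed:", removed)
```

A pure reordering of the same rows yields two empty lists, which is the property a line-based diff lacks.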
Anyway, this shouldn't be too complicated and we might start with something small.
I'm going to look into this when I have the time.
I think this could save us all a lot of time.