Extend 2 billion row benchmarks e.g. memory usage, sorting, joining, by-reference #2
This was referenced Jun 8, 2014
mattdowle changed the title from "Extensive benchmarking on large in-memory dataset" to "Extend 2 billion row benchmarks" on Sep 26, 2014
mattdowle changed the title from "Extend 2 billion row benchmarks" to "Extend 2 billion row benchmarks e.g. memory usage, sorting, joining, by-reference" on Dec 12, 2014
For memory usage, perhaps: https://github.com/gsauthof/cgmemtime
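A sketch of how cgmemtime might wrap a benchmark run to capture peak memory (an assumption based on that repo's usage; the group name and the `benchmark.R` script name are illustrative, and the tool requires a one-time cgroup setup as root):

```shell
# One-time cgroup setup (root); group and permission values are illustrative:
sudo ./cgmemtime --setup -g "$USER" --perm 775

# Wrap the benchmark command: cgmemtime reports wall/CPU time plus the
# high-water RSS of the child process and all of its descendants,
# which is what matters for a multi-process or forking benchmark.
./cgmemtime Rscript benchmark.R
```

Unlike sampling-based approaches, this captures the true peak even for short-lived spikes, so it suits memory-focused benchmark comparisons.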
Figured
Closed
I would like to close this one as it is already epic, and will be epic for a long time, due to the broad scope defined here.
We've currently gone to 2e9 rows (the 32-bit index limit) with 9 columns (100GB). See the benchmarks page on the wiki.
Ideally it would be great to compare all available tools that are either specifically developed for large in-memory data manipulation or are capable of handling data at these sizes much better than base R. Base R should of course also be included, typically as the control.
An aspect of the benchmarking should be to highlight not just run time (speed) but also memory usage. Features such as sorting/ordering by reference and sub-assignment by reference should, at this data size, display quite clearly the speed and memory gains attainable.
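A minimal sketch (at a toy size, not 2e9 rows) of the by-reference operations mentioned above, contrasted with the copying base-R equivalents that would serve as the control; the column names and sizes here are illustrative:

```r
library(data.table)

N  <- 1e6L                                  # toy size; the benchmark itself targets 2e9
DT <- data.table(id = sample(N), x = runif(N))

setkey(DT, id)           # sorts DT in place (by reference) -- no copy of the table
DT[id == 42L, x := 0]    # sub-assignment by reference: only the touched values change

# base-R control: each step copies, so peak memory roughly doubles at scale
df <- as.data.frame(DT)
df <- df[order(df$id), ]
df$x[df$id == 42L] <- 0
```

At 100GB, the copying steps in the base-R version are exactly where a memory-usage benchmark (e.g. peak RSS via cgmemtime) should show the largest gap.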