
fread tries to map memory for the entire file when using nrows #4329

Open
Tracked by #3189
Feakster opened this issue Mar 30, 2020 · 5 comments
Labels
fread top request One of our most-requested issues

Comments

@Feakster

Hi,

I've been trying to use fread() to import a large (12GB) tab-delimited text file, which is too large for my machine to import in its entirety. I thought that I would be able to use the nrows parameter to import a cut-down version of the file to draft my code on, but this results in the following error:

System errno 22 unmapping file: Invalid argument
Error in fread("data.tab", header = T, sep = "\t", nrows = 10L) : 
  Opened 11.63GB (12491321418 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.

However, if I use head to create a new file containing only the first 11 rows of the original file, I am able to use fread() to import the new file without issue.

Below is a sample of the code I am running to replicate the issue:

### File Size (Original File) ###
file.size("data.tab")/1e9 # Roughly 12.5 GB

### Import First 10 Rows (Original File) ###
dt <- fread("data.tab",
            header = T, sep = "\t",
            nrows = 10L) # Fails

### Create New File Using the First 11 Rows of the Existing One ###
system("head -n 11 data.tab > data_head.tab")

### File Size (New File) ###
file.size("data_head.tab")/1e3 # Roughly 330 KB

### Import (New File) ###
dt <- fread("data_head.tab",
            header = T, sep = "\t") # Succeeds
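
A possible workaround (not part of the original report) is fread's cmd argument, available since data.table 1.11.6, which reads the output of a shell command through a pipe rather than memory-mapping the on-disk file. A minimal sketch, assuming the same data.tab:

```r
library(data.table)

### Possible Workaround: Stream via a Shell Pipe ###
# fread(cmd = ...) consumes a pipe instead of memory-mapping the file,
# so only the rows emitted by head are ever read.
dt <- fread(cmd = "head -n 11 data.tab",
            header = TRUE, sep = "\t")
```

This avoids creating the intermediate data_head.tab file, though it still relies on head rather than on nrows doing the limiting inside fread.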

I have searched the current issue log for data.table, but the only problem I've found resembling mine is issue #2321, which was closed on 3rd March 2018. The closing messages for that issue stated that it had been fixed in data.table version 1.10.5 through the use of lazy memory mapping. However, I'm using data.table version 1.12.8 and seem to be stumbling across the same issue. Once imported, the 10-row data table is only 1.9 MB, nowhere near the physical memory limit of my machine (4 GB).

My output for sessionInfo() is below:

R version 3.6.3 (2020-02-29)
Platform: aarch64-unknown-linux-gnu (64-bit)
Running under: Manjaro ARM

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.9.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_GB.UTF-8           LC_COLLATE=en_GB.UTF-8       
 [5] LC_MONETARY=en_GB.UTF-8       LC_MESSAGES=en_GB.UTF-8      
 [7] LC_PAPER=en_GB.UTF-8          LC_NAME=en_GB.UTF-8          
 [9] LC_ADDRESS=en_GB.UTF-8        LC_TELEPHONE=en_GB.UTF-8     
[11] LC_MEASUREMENT=en_GB.UTF-8    LC_IDENTIFICATION=en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8 rkward_0.7.1     

loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3   

Thank you in advance,

Ben


elad663 commented Jan 26, 2021

I had a similar problem on Windows with R 4.0.2. It was solved by upgrading R to 4.0.3.

@george-kan

Same issue here with R 4.0.4

@jangorecki
Member

@george-kan could you provide a reproducible example?

@george-kan

@jangorecki what I tried was reading one of the bz2 files from here: https://database.lichess.org/ . The data in the bz2 file are essentially a single column.
The command I used was: fread("lichess_db_standard_rated_2020-03.pgn.bz2", colClasses = "character", nrows = 1000)
This fails. I apologize that this is not 100% reproducible, but I cannot think of another way to share a massive file... Please let me know if I can help more somehow.
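
A self-contained stand-in for the large download could be sketched like this (synthetic.pgn is a hypothetical file name; gzip is used here for portability, but substituting bzip2 would match the .bz2 file in the report):

```shell
# Build a large one-column text file mimicking the shape of a pgn dump,
# then compress a copy of it. This gives a reproducer without sharing
# the original massive file.
yes '1. e4 e5 2. Nf3 Nc6 3. Bb5 a6' | head -n 100000 > synthetic.pgn
gzip -c synthetic.pgn > synthetic.pgn.gz
wc -l synthetic.pgn
```

Reading the compressed copy with fread("synthetic.pgn.gz", nrows = 1000) should then show whether nrows limits the mapping, without anyone having to download the full dump.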

@jangorecki
Member

Thanks for the info. Interesting; I use lichess.org myself occasionally.
