[R-Forge #5384] fread() fail to deal with missing values in integer64 columns #488

Closed
arunsrinivasan opened this issue Jun 8, 2014 · 5 comments

@arunsrinivasan
Member

Submitted by: Peter Stoyanov; Assigned to: Nobody; R-Forge link

Using fread() to read in the data below yields strange results for NA values in columns which fread() detects as integer64. All other columns are OK:

2012,276,,0,"S1","001",1,,724135215,1590915056,
2012,276,2,8,"S1","001",1, ,,154598,0
2012,276,2,12,"S1","001",1,NA,5118863,21819477,
2012,276,2,0,"S1","011",8,3127133583,3127133583,9003982501,0

The resulting data.table has "9218868437227407266" instead of "NA" in columns 8 and 9. Only str() prints these as NA; everything else I tried (min, max, sum, etc.) treats them as numeric values. On the other hand, str() prints the fourth element of column 8 as "1.55e-314" instead of "3127133583".

I first posted this on StackOverflow, but it did not attract any interest for 2 weeks, so I've linked it here as well.
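
One plausible reading of the two odd values reported in this post (an editorial sketch, not a confirmed diagnosis of fread's internals): bit64 stores integer64 values in the 8 bytes of a double, and R's double NA is a NaN whose low 32 bits are 1954. Read as a signed 64-bit integer, the bit pattern 0x7FF00000000007A2 equals 2146435072 * 2^32 + 1954 = 9218868437227407266, which is the value shown in place of NA. Conversely, reading the bits of the integer 3127133583 as a double gives the subnormal 3127133583 * 2^-1074. The base R snippet below inspects both; it assumes an IEEE 754 platform, and the variable names are illustrative.

```r
# Assumption: IEEE 754 doubles, as on all mainstream R platforms.
# Big-endian bytes of R's double NA, joined into one hex string.
na_bytes <- writeBin(NA_real_, raw(), endian = "big")
na_hex <- paste(as.character(na_bytes), collapse = "")

# Split the 8 bytes into two 32-bit words; the low word carries R's NA payload.
high_word <- readBin(na_bytes[1:4], "integer", size = 4, endian = "big")
low_word  <- readBin(na_bytes[5:8], "integer", size = 4, endian = "big")

# A small 64-bit integer k, reinterpreted as a double, is the subnormal k * 2^-1074.
as_double <- 3127133583 * 2^-1074

print(na_hex)
print(c(high_word, low_word))
print(signif(as_double, 3))
```

If the hex string matches 7ff00000000007a2, the symptom is consistent with a double NA having been written into an integer64 column instead of bit64's own NA, which is the bit pattern of the smallest 64-bit integer.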

@richierocks

I've just fallen over this bug too. To reproduce:

fread("x,y\n0,\n", colClasses = list(integer64 = "y"))
##    x                   y
## 1: 0 9218868437227407266

@arunsrinivasan arunsrinivasan added this to the v1.9.6 milestone Mar 15, 2015
@arunsrinivasan arunsrinivasan self-assigned this Mar 15, 2015
@pstoyanov

Thank you (for this and all the other excellent work).

@richierocks

Thanks for this. The fix isn't quite complete though. It works when fread correctly auto-detects the column classes, but not when it has to bump a column to integer64.

To reproduce:

fread(
  "x,y
0,12345678901234
0,
0,
0,
0,
,
,
,
,
,
,
,
,
,
,
,
12345678901234,
0,
0,
0,
0,
0,
")

In this example, missing values still show as 9218868437227407266 in x, but they are correctly missing in y.

@arunsrinivasan
Member Author

Thanks, will take a look asap.

@arunsrinivasan
Member Author

Please write back if this is still not resolved.
