fread: need more flexible behavior when encountering a broken line. #2263
Comments
the also, how would lastly, i hope |
Yes, there is no reason to remove I think it would be useful to think from the use-case perspective. What are the possible reasons to have a file with an incorrect number of fields in some lines? I can think of several reasons:
Anything I missed?
That sounds pretty thorough. I believe 6. is the most common. This falls under 4., but may be worth considering separately: more than one database is contained in a single file, probably separated internally by some YAML/metadata header.
So let's consider how we would tackle each of these situations. What would the ideal
A new parameter bad.lines (or similar) is proposed. This parameter adjusts fread's strategy when dealing with lines that are "broken" (i.e. have fewer or more than the required number of columns). This parameter may take the following values:

- "error" (default) -- stop scanning the file and raise an exception.
- "fill" (currently achieved with fill=TRUE) -- any lines having too few fields are padded with NAs. Here "too few" means fewer than the maximum number of fields observed across all rows in the file.
- "skip" -- broken lines are simply ignored.
- "extract" -- any broken lines are placed into a separate datatable, whereas the "main" datatable retains empty rows in their place. The extra datatable will have at least the following fields:
  - lineno (line number in the original data file),
  - rowno (corresponding row number in the "main" datatable),
  - line (the text of the line),
  - nfields (number of fields detected on that line)

Additionally, there should be a parameter report (default FALSE), which is used for strategies "fill" and "skip", and instructs fread to report to the user the line numbers that were filled/skipped.
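To make the four proposed strategies concrete, here is a minimal sketch in plain Python. It is not fread's implementation: the function name `read_with_bad_lines`, the naive split on a separator (no quoting support), the use of the header width as the "required" column count outside of "fill" mode, and the 1-based `rowno` are all assumptions made for illustration.

```python
def read_with_bad_lines(text, bad_lines="error", report=False, sep=","):
    """Hypothetical illustration of the proposed bad.lines strategies.

    Returns (header, main_rows, extra_rows). None stands in for NA.
    """
    if bad_lines not in ("error", "fill", "skip", "extract"):
        raise ValueError(f"unknown bad_lines strategy: {bad_lines!r}")

    # Naive parsing: one record per physical line, no quote handling.
    parsed = [(lineno, line, line.split(sep))
              for lineno, line in enumerate(text.splitlines(), start=1)]
    header, body = parsed[0][2], parsed[1:]

    if bad_lines == "fill":
        # "Too few" is measured against the widest row in the whole file.
        ncols = max(len(fields) for _, _, fields in parsed)
        header = header + [None] * (ncols - len(header))
    else:
        # Assumption: the header defines the required number of columns.
        ncols = len(header)

    main, extra, touched = [], [], []
    for lineno, line, fields in body:
        if len(fields) == ncols:
            main.append(fields)
        elif bad_lines == "error":
            raise ValueError(
                f"line {lineno}: expected {ncols} fields, found {len(fields)}")
        elif bad_lines == "fill":
            main.append(fields + [None] * (ncols - len(fields)))
            touched.append(lineno)
        elif bad_lines == "skip":
            touched.append(lineno)
        else:  # "extract": keep an empty placeholder row in the main table
            main.append([None] * ncols)
            extra.append({"lineno": lineno, "rowno": len(main),
                          "line": line, "nfields": len(fields)})

    if report and touched:
        print(f"bad_lines={bad_lines!r} affected lines: {touched}")
    return header, main, extra
```

With "extract", the second return value keeps one placeholder row per broken line, so `rowno` in the extra table can be used to patch repaired values back into the main table later.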