
Handle empty rows at end of CVR #455

Closed
HEdingfield opened this issue Jun 18, 2020 · 1 comment

@HEdingfield
Contributor

See the discussion in the email thread "Test" started on 2020-06-09 (it also contains a test file that reproduces the error). When attempting to tabulate a CVR with blank lines at the bottom, the log shows:

2020-06-09 17:00:45 EDT SEVERE: Cast vote record identifier not found for: ballots_for_Multi-Pass_IRV_flowchart.xlsx-16
2020-06-09 17:00:45 EDT SEVERE: Data format error while parsing source file: C:\Users\redacted\Documents\RCV\Bright Spots\Tests\Multi-pass IRV\ballots for Multi-Pass IRV flowchart.xlsx
2020-06-09 17:00:45 EDT INFO: See the log for details.
2020-06-09 17:00:45 EDT SEVERE: Parsing cast vote records failed!
2020-06-09 17:00:45 EDT SEVERE: Aborting tabulation due to cast vote record errors!

@moldover comment:

George, this xlsx seems to have some data after the end of the cvr data. What happens is that all 15 rows are read into memory, and then the xlsx parser finds another row. This row does not have the expected cvr id data (or any data at all), and the cvr id check is simply the first check to fail on it. I poked at the file and deleted some rows, to no avail. Then I copy-pasted only the text cvr data into a new .xlsx, and that file tabulated.
We can add this scenario to the error message. We could also be tolerant of empty rows, and maybe make this a warning. Any idea what might have happened?
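A minimal sketch of the "tolerate empty rows and make it a warning" option. It assumes a hypothetical in-memory model where each row is a list of cell strings; `TrailingRowTrimmer` and its methods are illustrative names, not the tabulator's actual parser. Only blank rows at the very end are dropped, so a blank row in the middle of the data would still fail the normal validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch: rows are modeled as lists of cell strings (null or whitespace = empty cell).
// This is not the tabulator's real xlsx reader; it only illustrates tolerating trailing blank rows.
final class TrailingRowTrimmer {
  private static final Logger LOGGER = Logger.getLogger(TrailingRowTrimmer.class.getName());

  // A row counts as blank when every cell is null or whitespace-only.
  static boolean isBlankRow(List<String> row) {
    for (String cell : row) {
      if (cell != null && !cell.trim().isEmpty()) {
        return false;
      }
    }
    return true;
  }

  // Returns a copy of rows with blank rows at the end removed and logs a warning if any were dropped.
  // Blank rows in the middle are kept so that the usual ID check still rejects them.
  static List<List<String>> trimTrailingBlankRows(List<List<String>> rows, String fileName) {
    int end = rows.size();
    while (end > 0 && isBlankRow(rows.get(end - 1))) {
      end--;
    }
    int dropped = rows.size() - end;
    if (dropped > 0) {
      LOGGER.warning(String.format("Ignoring %d blank row(s) at the end of %s.", dropped, fileName));
    }
    return new ArrayList<>(rows.subList(0, end));
  }
}
```

The trade-off raised below still applies: trimming only at the end keeps the change narrow, but it does silently accept a file that the current validation rejects.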

Another @moldover comment:

I am thinking about this more, and I hesitate to add any exceptions to the data format validation, because it might allow other problems to creep in. If people are copy-pasting various things into cvr files, and this results in weird blank lines... well, that should probably be an error, and require the user to be more deliberate when creating a cvr file.

Here is an improved error message - something like this at least could avoid wasted time identifying the issue:

2020-06-11 09:28:24 PDT SEVERE: Cast vote record identifier missing on row 17 in file ballots_for_Multi-Pass_IRV_flowchart.xlsx. This may be due to an incorrectly formatted xlsx file.
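A hedged sketch of how an ID check could produce a message like this one, reusing the same hypothetical row model. It assumes the first row is a header, so list index `i` maps to spreadsheet row `i + 1`; under that assumption, a header plus 15 records puts the first extra row at row 17. `CvrIdCheck` and `findMissingId` are illustrative names only.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of an ID check that reports a 1-based spreadsheet row number and the file name.
// Assumption: rows.get(0) is a header row, so list index i corresponds to spreadsheet row i + 1.
final class CvrIdCheck {
  // Returns an error message for the first data row whose ID cell is missing or blank,
  // or Optional.empty() if every data row has an ID.
  static Optional<String> findMissingId(List<List<String>> rows, int idColumn, String fileName) {
    for (int i = 1; i < rows.size(); i++) {
      List<String> row = rows.get(i);
      String id = idColumn < row.size() ? row.get(idColumn) : null;
      if (id == null || id.trim().isEmpty()) {
        return Optional.of(String.format(
            "Cast vote record identifier missing on row %d in file %s. "
                + "This may be due to an incorrectly formatted xlsx file.",
            i + 1, fileName));
      }
    }
    return Optional.empty();
  }
}
```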

@tarheel comment:

I would be more on board with that approach if this arose from a clear mistake by the user or if there were an obvious remedy for it. But it seems like when this happens, the only way to get around it is to figure out that you have to copy and paste your data into a whole new file. I agree with the general principle about not trying to work around user errors... but in this case I'm having trouble thinking of a practical downside to making the tabulator always ignore blank lines at the end of CVR files.

Or, at a minimum, the tabulator could just identify this specific case and the error message could tell the user exactly what to do to fix it.
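One way to detect the specific case described above, sketched under the same hypothetical row model (header at index 0, index `i` is spreadsheet row `i + 1`). If the row missing its ID and every row after it are completely blank, the message names the likely cause and the remedy found in this thread (copying the data into a new file); otherwise it falls back to the generic message. `MissingIdDiagnosis` is an illustrative name, not part of the tabulator.

```java
import java.util.List;

// Hypothetical sketch: pick a targeted message for trailing blank rows, or the generic one otherwise.
// Assumption: rows.get(0) is a header row, so list index i corresponds to spreadsheet row i + 1.
final class MissingIdDiagnosis {
  // A row is blank when every cell is null or whitespace-only.
  static boolean isBlank(List<String> row) {
    return row.stream().allMatch(cell -> cell == null || cell.trim().isEmpty());
  }

  // True when the row at `index` and every row after it are completely blank.
  static boolean onlyBlankRowsFrom(List<List<String>> rows, int index) {
    for (int i = index; i < rows.size(); i++) {
      if (!isBlank(rows.get(i))) {
        return false;
      }
    }
    return true;
  }

  // Builds the error message for a row (at list index `index`) whose ID is missing.
  static String describe(List<List<String>> rows, int index, String fileName) {
    String base = String.format(
        "Cast vote record identifier missing on row %d in file %s.", index + 1, fileName);
    if (onlyBlankRowsFrom(rows, index)) {
      return base + " The file appears to contain blank rows after the last cast vote record."
          + " Try copying your cvr data into a new xlsx file to fix this.";
    }
    return base + " This may be due to an incorrectly formatted xlsx file.";
  }
}
```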

My thoughts: in my experience, file reader APIs sometimes have an option specifically for ignoring blank lines at the end. Maybe our existing library has this and it's as simple as flipping the bit for it?

@moldover
Contributor

Error message added. Example output:

2020-09-23 19:57:18 PDT INFO: Reading ES&S cast vote record file: /Users/Jon/Downloads/test/cvrs.xlsx...
2020-09-23 19:57:18 PDT SEVERE: Cast vote record identifier missing on row 17 in file cvrs.xlsx. This may be due to an incorrectly formatted xlsx file. Try copying your cvr data into a new xlsx file to fix this.
2020-09-23 19:57:18 PDT SEVERE: Data format error while parsing source file: /Users/Jon/Downloads/test/cvrs.xlsx
2020-09-23 19:57:18 PDT INFO: See the log for details.
2020-09-23 19:57:18 PDT SEVERE: Parsing cast vote records failed!
2020-09-23 19:57:18 PDT SEVERE: Aborting tabulation due to cast vote record errors!
2020-09-23 19:57:18 PDT INFO: Tabulation session completed.


3 participants