Remove byte-order mark from JSON stream and first CSV chunk #53
Problem
When validating JSON with a readable stream, the presence of a byte-order mark causes a parser error.
When validating CSV, a leading BOM becomes part of the first column name. If the first column name is also quoted, the BOM sits in front of the opening quote, so the value is not recognized as quoted. As a result, the first column does not match any of the expected column names.
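Both failure modes can be reproduced in isolation. The sketch below is illustrative only: `parsesAsJson` and `naiveHeaderName` are hypothetical helpers, not functions from this repository, and the header handling is a simplified stand-in for whatever the real CSV parser does.

```typescript
// U+FEFF is the byte-order mark once UTF-8 bytes EF BB BF are decoded.
const BOM = "\uFEFF";

// JSON: the BOM is not whitespace in the JSON grammar, so JSON.parse rejects it.
function parsesAsJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// CSV (hypothetical, simplified): a field is unquoted only when it starts
// and ends with a double quote. A leading BOM defeats the startsWith check.
function naiveHeaderName(field: string): string {
  if (field.length >= 2 && field.startsWith('"') && field.endsWith('"')) {
    return field.slice(1, -1).replace(/""/g, '"');
  }
  return field;
}

const jsonWithoutBom = parsesAsJson('{"a":1}');
const jsonWithBom = parsesAsJson(BOM + '{"a":1}');
const headerWithoutBom = naiveHeaderName('"id"');
const headerWithBom = naiveHeaderName(BOM + '"id"');
```

With the BOM in place, `jsonWithBom` is `false`, and `headerWithBom` keeps both the BOM and the surrounding quotes, so it no longer equals the expected name `id`.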
Solution
When reading the first chunk of a JSON stream, check for the presence of a byte-order mark. If it is present, remove it.
When reading the first chunk of CSV, check for the presence of a byte-order mark. If it is present, remove it.
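A minimal sketch of the first-chunk check follows. It is not the code from this change: the function names are hypothetical, and it assumes the BOM, if present, arrives entirely within the first non-empty chunk (a BOM split across two byte chunks is not handled).

```typescript
// Remove a UTF-8 BOM (EF BB BF) from the start of a raw byte chunk.
function stripBomFromBytes(chunk: Uint8Array): Uint8Array {
  const hasBom =
    chunk.length >= 3 &&
    chunk[0] === 0xef &&
    chunk[1] === 0xbb &&
    chunk[2] === 0xbf;
  // subarray returns a view on the same memory, so nothing is copied.
  return hasBom ? chunk.subarray(3) : chunk;
}

// Remove a BOM (U+FEFF) from the start of an already-decoded string chunk.
function stripBomFromString(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Returns a function that strips the BOM from the first non-empty chunk only
// and passes every later chunk through unchanged.
function createFirstChunkBomStripper(): (chunk: string) => string {
  let seenFirstChunk = false;
  return (chunk: string): string => {
    if (seenFirstChunk || chunk.length === 0) {
      return chunk;
    }
    seenFirstChunk = true;
    return stripBomFromString(chunk);
  };
}

// Usage: feed chunks through the stripper in the order they are read.
const strip = createFirstChunkBomStripper();
const cleaned = ['\uFEFF"id",name\n', "1,\uFEFFx\n"].map(strip).join("");
```

Only the leading BOM is removed. A U+FEFF character that appears later in the data, as in the second chunk above, is left alone because it is content there, not an encoding marker.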
Test Plan
Added a test case that uses a JSON file encoded as UTF-8 with a BOM.
Added a test case that uses a CSV file encoded as UTF-8 with a BOM. The first column name in the file is quoted.
EDIT: Updated 2024/08/16 with information about the CSV changes (984cb6d)