-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Maps][File upload] Parse geojson files in chunks to avoid thread blocking #46710
Conversation
Pinging @elastic/kibana-gis |
💔 Build Failed |
Giving this a whirl locally, yay :) In addition to having % of file processed displayed, I think it could be helpful to have some additional live-updating meta info like # of features processed, perhaps broken down by # of points, lines, and polygons. Basically, more feedback for the user that shows something is actually happening (we've all stared at our % counter waiting and hoping it will increment...) and can also be of practical value. |
…ous file cancelled" (covered in separate PR) This reverts commit 0688e73.
💚 Build Succeeded |
…tle with less frequent callbacks
…king # Conflicts: # x-pack/legacy/plugins/file_upload/public/components/json_index_file_picker.js
💔 Build Failed |
💔 Build Failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing feedback, some additional suggestions to improve readability
x-pack/legacy/plugins/file_upload/public/components/json_index_file_picker.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/geo_json_clean_and_validate.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/pattern_reader.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/pattern_reader.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/pattern_reader.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/pattern_reader.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/pattern_reader.js
Outdated
Show resolved
Hide resolved
💚 Build Succeeded |
💚 Build Succeeded |
x-pack/legacy/plugins/file_upload/public/components/json_index_file_picker.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/geo_json_clean_and_validate.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/geo_json_clean_and_validate.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/geo_json_clean_and_validate.js
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few more comments to tidy-up, but looking much better, thanks.
I tested this PR with following file, and Kibana crashes completely. Seems to be related to having a null geometry (?)
:5601/wzd/bundles/commons.bundle.js:239591 Uncaught (in promise) TypeError: Cannot read property 'type' of null
at :5601/wzd/bundles/commons.bundle.js:239591
at Array.map (<anonymous>)
at JsonUploadAndParse._setIndexTypes (:5601/wzd/bundles/commons.bundle.js:239589)
at JsonUploadAndParse.componentDidUpdate (:5601/wzd/bundles/commons.bundle.js:239628)
at commitLifeCycles (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:17143)
at commitAllLifeCycles (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:18530)
at HTMLUnknownElement.callCallback (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:149)
at Object.invokeGuardedCallbackDev (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:199)
at invokeGuardedCallback (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:256)
at commitRoot (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:18742)
Even if we do not handle error-messaging in the UX, we should avoid Kibana crashing completely when faulty inputs propagate. For that particular file, you would need to up the limit to 90mb iso 50mb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking really good. Just a few minor changes.
x-pack/legacy/plugins/file_upload/public/components/json_index_file_picker.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/pattern_reader.js
Outdated
Show resolved
Hide resolved
x-pack/legacy/plugins/file_upload/public/util/pattern_reader.js
Outdated
Show resolved
Hide resolved
Interesting dataset, many geometries set to null. For now, I've added code to drop null features and display a parsing error to the user. A more robust solution would likely fall under the previously linked error-handling issue. This is the best solution for now as we're not well equipped in File Upload or Maps to handle features with null geometries. There are a lot of assumptions around features having a |
It is valid GeoJSON to have a null geometry (not 100% why the spec has it
but it does...), so I don't think it needs to or should be treated as an
error to the user. It could be a useful warning I suppose, as some users
may not be aware of these features.
…On Tue, Oct 15, 2019 at 2:48 PM Aaron Caldwell ***@***.***> wrote:
I tested this PR with following file, and Kibana crashes completely. Seems
to be related to having a null geometry (?)
cpd-incidents.geojson.zip
Interesting dataset, many geometries set to null. For now, I've added code
to drop null features and display a parsing error to the user. A more
robust solution would likely fall under the previously linked
error-handling issue. This is the best solution for now as we're not well
equipped in File Upload or Maps to handle features with null geometries.
There are a lot of assumptions around features having a geometry and
associated type throughout the code that we'll need to address if we want
to include them in client results.
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#46710?email_source=notifications&email_token=AAAECXP5G6UAVOVXQEIKOFLQOYF7JA5CNFSM4I22SXW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBJ2IHI#issuecomment-542352413>,
or unsubscribe
<https:/notifications/unsubscribe-auth/AAAECXICBUNLTVR3S6TVQLDQOYF7JANCNFSM4I22SXWQ>
.
|
Thanks @bcamper! Agreed that ultimately we need to be prepared for anything that comes through the door passing schema validation. We're not quite there yet, but I definitely think it's where we want to get to. Prior to this PR, the behavior was to show an error and not allow the user to upload a file that contained features with any "invalid" ( So this is the first step- "Open the file in chunks and stop thread-blocking". So mostly I'm just maintaining existing functionality elsewhere (i.e.- same file limits, similar error cases, etc.). It still shows some features had "errors" as it did before, but it at least allows indexing the remaining ones. Future work, as you suggest, should definitely revisit this though and allow a diversity of warnings and errors to cover different cases! |
retest |
Makes sense, iterative improvement is good :)
…On Tue, Oct 15, 2019 at 3:36 PM Aaron Caldwell ***@***.***> wrote:
It is valid GeoJSON to have a null geometry (not 100% why the spec has it
but it does...), so I don't think it needs to or should be treated as an
error to the user. It could be a useful warning I suppose, as some users
may not be aware of these features.
Thanks @bcamper <https:/bcamper> Agreed that ultimately we
need to be prepared for anything that comes through the door passing schema
validation <#47472>. We're not
quite there yet, but I definitely think it's where we want to get to. Prior
to this PR, the behavior was to show an error and not allow the user to
upload a file that contained features with any "invalid" (null included)
geometries. Basically an "all or nothing" approach. Since this PR is the
first step in "breaking the file into chunks" and looking at it on a
feature-by-feature basis, we now have the luxury of being able to handle
features in a more tailored way.
So this is the first step- "Open the file in chunks and stop
thread-blocking". So mostly I'm just maintaining existing functionality
elsewhere (i.e.- same file limits, similar error cases, etc.), it still
shows some features had "errors" as it did before, but it at least allows
indexing the remaining ones. Future work, as you suggest, should definitely
revisit this though and allow a diversity of warnings and errors to cover
different cases!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#46710?email_source=notifications&email_token=AAAECXJ23MM3FTBYYOV43ADQOYLURA5CNFSM4I22SXW2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBJ6YUY#issuecomment-542370899>,
or unsubscribe
<https:/notifications/unsubscribe-auth/AAAECXKAJM5KRIP3TBK6VJLQOYLURANCNFSM4I22SXWQ>
.
|
…fileHandler return type
💔 Build Failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change. This is great that the parsing of the file no longer blocks the browser
lgtm
code review, tested in chrome
💚 Build Succeeded |
…cking (elastic#46710) * Add file parse chunking, update component on progress * Clean up clean and validate and redo to process single features * Add oboe dependency * Prevent state updates on cancel * Handle new files added mid-way through parsing another file * Fix issue where subsequent index name is wiped out when previous file cancelled * Remove unneeded oboe abort logic * Dice parsing logic up further for testing * Clean up * Revert "Fix issue where subsequent index name is wiped out when previous file cancelled" (covered in separate PR) This reverts commit 0688e73. * Update file parse test to focus on different stream states * Update clean and validate tests to reflect function input/output changes * Bump up file buffer. Simplify ui update logic, not neceesary to throttle with less frequent callbacks * Show features parsed on UI rather than percentage * Remove extra mock reset * Review feedback. Add localized feature tracking callback * Review feedback. Add comment explaining progress update throttling. Also, use debounce to throttle * Remove console log * Consolidate feature handling into one function passed to oboeStream node * Abstract oboe logic to separate class and import for use in file parser * Update file parser test to mock PatternReader import * Prevent file parse active flag from resetting if another file is in progress * Don't pass back result if no features found on complete, throw error with feedback. Add clean-up for prev PatternReader * Use singleton version of jsts reader & writer. Pass back unmodified feature if clean returns nothing * Make fileHandler function async * Return null if no geometry * Handle single features differently. Fixes functional test error * Update jest test to use unique instances & counts of readers * Review feedback * Review feedback * Review feedback. Add error-handling for null geom * Fix i18n error * Clean up handling of cancelled/replaced files to account for changed fileHandler return type
…cking (#46710) (#48306) * Add file parse chunking, update component on progress * Clean up clean and validate and redo to process single features * Add oboe dependency * Prevent state updates on cancel * Handle new files added mid-way through parsing another file * Fix issue where subsequent index name is wiped out when previous file cancelled * Remove unneeded oboe abort logic * Dice parsing logic up further for testing * Clean up * Revert "Fix issue where subsequent index name is wiped out when previous file cancelled" (covered in separate PR) This reverts commit 0688e73. * Update file parse test to focus on different stream states * Update clean and validate tests to reflect function input/output changes * Bump up file buffer. Simplify ui update logic, not neceesary to throttle with less frequent callbacks * Show features parsed on UI rather than percentage * Remove extra mock reset * Review feedback. Add localized feature tracking callback * Review feedback. Add comment explaining progress update throttling. Also, use debounce to throttle * Remove console log * Consolidate feature handling into one function passed to oboeStream node * Abstract oboe logic to separate class and import for use in file parser * Update file parser test to mock PatternReader import * Prevent file parse active flag from resetting if another file is in progress * Don't pass back result if no features found on complete, throw error with feedback. Add clean-up for prev PatternReader * Use singleton version of jsts reader & writer. Pass back unmodified feature if clean returns nothing * Make fileHandler function async * Return null if no geometry * Handle single features differently. Fixes functional test error * Update jest test to use unique instances & counts of readers * Review feedback * Review feedback * Review feedback. Add error-handling for null geom * Fix i18n error * Clean up handling of cancelled/replaced files to account for changed fileHandler return type
Resolves #40205. This PR leverages the capabilities of FileReader to:
It leverages oboejs to parse features out of the binary data stream for cleaning and validation
A couple of things to note with this PR