[Maps][File upload] Parse geojson files in chunks to avoid thread blocking #46710

Merged
merged 43 commits into from
Oct 15, 2019

Conversation

kindsun
Contributor

@kindsun kindsun commented Sep 26, 2019

Resolves #40205. This PR leverages the capabilities of FileReader to:

  • Read the file into a binary format
  • Give progress indication after each chunk has been read

It also leverages oboejs to parse features out of the binary data stream for cleaning and validation.
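
As a rough illustration of the chunked read with progress reporting, here is a minimal sketch. It is not the code in this PR: `readInChunks`, `chunkSize`, `onProgress`, and `onText` are hypothetical names, and it uses `Blob.slice()` with `Blob.arrayBuffer()` instead of a `FileReader` instance so that it also runs outside the browser. In the browser, calling `FileReader.readAsArrayBuffer` on each slice would play the same role.

```javascript
// Sketch only: read a Blob/File in fixed-size slices and report progress
// after each slice. All names here are illustrative, not the PR's API.
async function readInChunks(
  blob,
  { chunkSize = 1024 * 1024, onProgress = () => {}, onText = () => {} } = {}
) {
  // `stream: true` lets the decoder hold back a multi-byte UTF-8 character
  // that happens to be split across two chunks.
  const decoder = new TextDecoder('utf-8');
  let offset = 0;
  while (offset < blob.size) {
    const slice = blob.slice(offset, offset + chunkSize);
    const bytes = new Uint8Array(await slice.arrayBuffer());
    offset += bytes.length;
    onText(decoder.decode(bytes, { stream: true }));
    onProgress(offset / blob.size);
    // Yield to the event loop so rendering and input handling can run
    // between chunks instead of being blocked by one long task.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  // Flush any bytes the decoder is still holding.
  const tail = decoder.decode();
  if (tail) onText(tail);
}
```

Each decoded chunk passed to `onText` could then be handed to a streaming parser, while `onProgress` receives the fraction of the file read so far.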

A couple of things to note with this PR:

  • We aren't currently set up in the Maps app to add layers incrementally in quick succession. In other words, there's no easy way for us to tell if a layer has been added to mapbox-gl and then add more data to it, but it's definitely possible in the near future. This and a few other ideas have been captured in [Maps][File upload] Tracking issue for GeoJSON Upload Optimizations #46376. In the meantime, we are able to build the parsed file incrementally before display, which does work quite a bit better than how it works currently on master.
  • There are a number of different sax-style parsers available. oboejs is one of the stronger options in my opinion, but we're not by any means "locked in". The actual chunking logic is performed by the FileReader instance. Any future option that might replace oboejs would just have to be able to read binary chunks and identify text patterns in a similar way.
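
To make "read chunks and identify text patterns" a bit more concrete, here is a hand-rolled sketch of the idea. It is not oboejs and not the code in this PR; `FeatureScanner` is a hypothetical name. It is also a deliberate simplification: it emits every object that sits directly inside any root-level array, not only those under `features`, whereas oboejs supports much more general patterns.

```javascript
// Sketch only: pull complete objects out of a root-level array as text
// chunks arrive, without waiting for the whole document.
class FeatureScanner {
  constructor(onFeature) {
    this.onFeature = onFeature;
    this.stack = [];       // currently open containers: '{' or '['
    this.inString = false; // inside a JSON string literal?
    this.escaped = false;  // previous character was a backslash?
    this.buffer = null;    // text of the object being collected, if any
  }

  write(chunk) {
    for (const ch of chunk) {
      // An object opening directly inside a root-level array starts a feature.
      const startsFeature =
        this.buffer === null && !this.inString && ch === '{' &&
        this.stack.length === 2 && this.stack[0] === '{' && this.stack[1] === '[';
      if (startsFeature) this.buffer = '';
      if (this.buffer !== null) this.buffer += ch;

      if (this.inString) {
        // Braces and brackets inside strings must not affect nesting.
        if (this.escaped) this.escaped = false;
        else if (ch === '\\') this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue;
      }
      if (ch === '"') {
        this.inString = true;
      } else if (ch === '{' || ch === '[') {
        this.stack.push(ch);
      } else if (ch === '}' || ch === ']') {
        this.stack.pop();
        // Back at array level while collecting: the feature is complete.
        if (this.buffer !== null && this.stack.length === 2) {
          this.onFeature(JSON.parse(this.buffer));
          this.buffer = null;
        }
      }
    }
  }
}
```

Because the scanner keeps its state between `write` calls, a feature (or even a string literal) can be split across chunk boundaries and still be emitted once it closes.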

@kindsun kindsun added the WIP, [Deprecated-Use Team:Presentation]Team:Geo, v8.0.0, Feature:File Upload, and v7.5.0 labels Sep 26, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-gis

@elasticmachine
Contributor

💔 Build Failed

@bcamper

bcamper commented Oct 2, 2019

Giving this a whirl locally, yay :)

In addition to having % of file processed displayed, I think it could be helpful to have some additional live-updating meta info like # of features processed, perhaps broken down by # of points, lines, and polygons. Basically, more feedback for the user that shows something is actually happening (we've all stared at our % counter waiting and hoping it will increment...) and can also be of practical value.

@elasticmachine
Contributor

💚 Build Succeeded

Aaron Caldwell added 3 commits October 7, 2019 11:20
…king

# Conflicts:
#	x-pack/legacy/plugins/file_upload/public/components/json_index_file_picker.js
@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

Contributor

@thomasneirynck thomasneirynck left a comment

Thanks for addressing feedback, some additional suggestions to improve readability

@elasticmachine
Contributor

💚 Build Succeeded

@elasticmachine
Contributor

💚 Build Succeeded

Contributor

@thomasneirynck thomasneirynck left a comment

I added a few more comments to tidy-up, but looking much better, thanks.

I tested this PR with following file, and Kibana crashes completely. Seems to be related to having a null geometry (?)

cpd-incidents.geojson.zip

:5601/wzd/bundles/commons.bundle.js:239591 Uncaught (in promise) TypeError: Cannot read property 'type' of null
    at :5601/wzd/bundles/commons.bundle.js:239591
    at Array.map (<anonymous>)
    at JsonUploadAndParse._setIndexTypes (:5601/wzd/bundles/commons.bundle.js:239589)
    at JsonUploadAndParse.componentDidUpdate (:5601/wzd/bundles/commons.bundle.js:239628)
    at commitLifeCycles (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:17143)
    at commitAllLifeCycles (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:18530)
    at HTMLUnknownElement.callCallback (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:149)
    at Object.invokeGuardedCallbackDev (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:199)
    at invokeGuardedCallback (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:256)
    at commitRoot (webpack://%5Bname%5D/./node_modules/react-dom/cjs/react-dom.development.js?:18742)

Even if we do not handle error-messaging in the UX, we should avoid Kibana crashing completely when faulty inputs propagate. To test with that particular file, you would need to raise the limit from 50 MB to 90 MB.

Contributor

@nreese nreese left a comment

looking really good. Just a few minor changes.

@kindsun
Contributor Author

kindsun commented Oct 15, 2019

I tested this PR with following file, and Kibana crashes completely. Seems to be related to having a null geometry (?)

cpd-incidents.geojson.zip

Interesting dataset, many geometries set to null. For now, I've added code to drop null features and display a parsing error to the user. A more robust solution would likely fall under the previously linked error-handling issue. This is the best solution for now as we're not well equipped in File Upload or Maps to handle features with null geometries. There are a lot of assumptions around features having a geometry and associated type throughout the code that we'll need to address if we want to include them in client results.
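
For illustration, the "drop null features and report it" behavior described above might look roughly like the sketch below. This is an assumption about the shape of the logic, not the code merged in this PR: `dropNullGeometries` and the message text are hypothetical.

```javascript
// Sketch only: separate features that have a usable geometry from those
// that do not, and summarize what was dropped. Names are illustrative.
function dropNullGeometries(features) {
  const kept = [];
  let dropped = 0;
  for (const feature of features) {
    // Guard before touching `geometry.type`, which is what threw in the
    // stack trace reported earlier in this thread.
    if (feature && feature.geometry && feature.geometry.type) {
      kept.push(feature);
    } else {
      dropped += 1;
    }
  }
  // Distinct geometry types among the kept features, in first-seen order.
  const geometryTypes = [...new Set(kept.map((f) => f.geometry.type))];
  const errors =
    dropped > 0 ? [`${dropped} feature(s) without geometry were dropped`] : [];
  return { kept, geometryTypes, errors };
}
```

With a helper like this, downstream code that assumes every feature has a geometry only ever sees `kept`, while `errors` can be surfaced to the user.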

@bcamper

bcamper commented Oct 15, 2019 via email

@kindsun
Contributor Author

kindsun commented Oct 15, 2019

It is valid GeoJSON to have a null geometry (not 100% sure why the spec has it
but it does...), so I don't think it needs to or should be treated as an
error to the user. It could be a useful warning I suppose, as some users
may not be aware of these features.

Thanks @bcamper! Agreed that ultimately we need to be prepared for anything that comes through the door passing schema validation. We're not quite there yet, but I definitely think it's where we want to get to. Prior to this PR, the behavior was to show an error and not allow the user to upload a file that contained features with any "invalid" (null included) geometries. Basically an "all or nothing" approach. Since this PR is the first step in "breaking the file into chunks" and looking at it on a feature-by-feature basis, we now have the luxury of being able to handle features in a more tailored way.

So this is the first step: "Open the file in chunks and stop thread-blocking". Mostly I'm just maintaining existing functionality elsewhere (i.e., same file limits, similar error cases, etc.). It still shows that some features had "errors" as it did before, but it at least allows indexing the remaining ones. Future work, as you suggest, should definitely revisit this and allow a diversity of warnings and errors to cover different cases!

@kindsun
Contributor Author

kindsun commented Oct 15, 2019

retest

@bcamper

bcamper commented Oct 15, 2019 via email

@elasticmachine
Contributor

💔 Build Failed

Contributor

@nreese nreese left a comment

Nice change. It's great that parsing the file no longer blocks the browser.

lgtm
code review, tested in chrome

@elasticmachine
Contributor

💚 Build Succeeded

@kindsun kindsun merged commit dc7bf3d into elastic:master Oct 15, 2019
kindsun pushed a commit to kindsun/kibana that referenced this pull request Oct 15, 2019
…cking (elastic#46710)

* Add file parse chunking, update component on progress

* Clean up clean and validate and redo to process single features

* Add oboe dependency

* Prevent state updates on cancel

* Handle new files added mid-way through parsing another file

* Fix issue where subsequent index name is wiped out when previous file cancelled

* Remove unneeded oboe abort logic

* Dice parsing logic up further for testing

* Clean up

* Revert "Fix issue where subsequent index name is wiped out when previous file cancelled" (covered in separate PR)

This reverts commit 0688e73.

* Update file parse test to focus on different stream states

* Update clean and validate tests to reflect function input/output changes

* Bump up file buffer. Simplify ui update logic, not necessary to throttle with less frequent callbacks

* Show features parsed on UI rather than percentage

* Remove extra mock reset

* Review feedback. Add localized feature tracking callback

* Review feedback. Add comment explaining progress update throttling. Also, use debounce to throttle

* Remove console log

* Consolidate feature handling into one function passed to oboeStream node

* Abstract oboe logic to separate class and import for use in file parser

* Update file parser test to mock PatternReader import

* Prevent file parse active flag from resetting if another file is in progress

* Don't pass back result if no features found on complete, throw error with feedback. Add clean-up for prev PatternReader

* Use singleton version of jsts reader & writer. Pass back unmodified feature if clean returns nothing

* Make fileHandler function async

* Return null if no geometry

* Handle single features differently. Fixes functional test error

* Update jest test to use unique instances & counts of readers

* Review feedback

* Review feedback

* Review feedback. Add error-handling for null geom

* Fix i18n error

* Clean up handling of cancelled/replaced files to account for changed fileHandler return type
kindsun pushed a commit that referenced this pull request Oct 16, 2019
…cking (#46710) (#48306)
Successfully merging this pull request may close these issues.

[File upload][Maps] If possible, use non-blocking option to parse JSON files
5 participants