[Lens] Create mathColumn function to improve performance #101908

Merged: wylieconlon merged 6 commits into elastic:master from lens/formula-performance on Jun 16, 2021

wylieconlon
Contributor

This is one of the followups needed to improve Lens formula performance. We found unacceptably slow performance when using mapColumn + math in combination, where the total execution time for common cases was several seconds long. By combining this into a single mathColumn function we are able to get consistent performance.
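
As a rough illustration of the idea, the sketch below computes a new column and attaches it to each row in a single pass, instead of chaining two separate functions. The `Table`, `Row`, and `MathColumnArgs` shapes and the `compute` callback are assumptions made for this example; they are simplified stand-ins, not Kibana's datatable types or the signature of the function added in this PR.

```typescript
// Hypothetical, simplified table shape (not Kibana's actual Datatable type).
type Row = Record<string, number | null>;

interface Table {
  columns: Array<{ id: string; name: string }>;
  rows: Row[];
}

interface MathColumnArgs {
  id: string;
  name: string;
  // Stand-in for evaluating a math expression against one row.
  compute: (row: Row) => number | null;
}

function mathColumn(input: Table, args: MathColumnArgs): Table {
  // Mirrors the uniqueness check visible in the diff under review.
  if (input.columns.some((column) => column.id === args.id)) {
    throw new Error('ID must be unique');
  }

  // Single pass: compute the value and attach it to the row in one step.
  const rows = input.rows.map((row) => ({ ...row, [args.id]: args.compute(row) }));

  return {
    columns: [...input.columns, { id: args.id, name: args.name }],
    rows,
  };
}
```

The input table is left untouched; a new table with one extra column is returned.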

@wylieconlon wylieconlon added Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) Team:Visualizations Visualization editors, elastic-charts and infrastructure v8.0.0 Team:AppServices release_note:skip Skip the PR/issue when compiling release notes Feature:Lens v7.14.0 labels Jun 10, 2021
@wylieconlon wylieconlon requested a review from a team June 10, 2021 14:43
@wylieconlon wylieconlon requested a review from a team as a code owner June 10, 2021 14:43
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

throw new Error('ID must be unique');
}

const newRows = input.rows.map((row) => {
Contributor

It would be nice to process this in async chunks, so that the thread gets time to run small tasks in between when very large tables are passed.
Lodash exposes a chunk utility for this. What do you think?
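
A minimal sketch of the chunked approach suggested here, assuming a hypothetical `mapInChunks` helper and an arbitrary default chunk size of 1000. A plain `slice` loop stands in for Lodash's `chunk` to keep the example dependency-free; none of this is code from the PR.

```typescript
// Hypothetical helper: maps rows in fixed-size chunks and yields to the
// event loop between chunks so other pending tasks can run.
async function mapInChunks<T, R>(
  rows: readonly T[],
  mapRow: (row: T) => R,
  chunkSize = 1000
): Promise<R[]> {
  if (chunkSize < 1) {
    throw new RangeError('chunkSize must be at least 1');
  }

  const result: R[] = [];
  for (let start = 0; start < rows.length; start += chunkSize) {
    // Process one chunk synchronously.
    for (const row of rows.slice(start, start + chunkSize)) {
      result.push(mapRow(row));
    }
    // Yield before the next chunk (rendering, input handling, timers).
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return result;
}
```

The trade-off is that the caller now receives a Promise, so a synchronous function would have to become async to use it.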

: [],
},
],
expression: [currentColumn.references.length ? `"${currentColumn.references[0]}"` : ``],
Contributor

Seems like this is causing the failing test with an empty formula which is annoying - I tried to fix it using staticColumn, but it's missing separate id/name params. We could add those, not sure whether there's a more elegant solution.

Contributor Author

The best solution I've found is to use mapColumn with an empty expression.

const newRows = input.rows.map((row) => {
return {
...row,
[args.id]: math.fn(
Contributor

This is still calling math separately for each row which causes the tinymath parser to run many times. If there are a lot of rows, this becomes relevant to performance (~4k rows with a very simple formula - can get worse when multiple math contexts are used for column wise calculations):
[Screenshot 2021-06-14 at 11 00 15]

I propose we cache the ast by not calling evaluate, but parse, then interpret. This can be done either by pulling the math logic into this expression function so we can simply call parse once, then interpret for every row, or by using memoize-one in the math function on the parse call.

Can be done in a separate PR.
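
To illustrate the parse-once, interpret-per-row proposal, here is a small sketch. The `parse` and `interpret` functions below are toy stand-ins written for this example (they only understand sums such as `a + b + 3`) and are not tinymath's implementation; `parseCalls` and `addColumn` are likewise hypothetical names used only to show the shape of the loop.

```typescript
type Row = Record<string, number>;
type Term = { kind: 'number'; value: number } | { kind: 'column'; name: string };

let parseCalls = 0; // instrumentation for this sketch only

// Toy parser: understands only sums such as "a + b + 3".
function parse(expression: string): Term[] {
  parseCalls += 1;
  return expression.split('+').map((token): Term => {
    const text = token.trim();
    const value = Number(text);
    return text !== '' && !Number.isNaN(value)
      ? { kind: 'number', value }
      : { kind: 'column', name: text };
  });
}

// Evaluates an already-parsed expression against one row.
// Columns missing from the row count as 0 in this toy version.
function interpret(ast: Term[], row: Row): number {
  return ast.reduce(
    (sum, term) => sum + (term.kind === 'number' ? term.value : row[term.name] ?? 0),
    0
  );
}

function addColumn(rows: Row[], id: string, expression: string): Row[] {
  const ast = parse(expression); // parsed once, outside the row loop
  return rows.map((row) => ({ ...row, [id]: interpret(ast, row) }));
}
```

With this structure the parsing cost is paid once per expression rather than once per row, which is the point of the suggestion.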

Contributor Author

I have added the memoization to tinymath in this PR as it definitely improves the overall speed.

Comment on lines +84 to +87
{
expression: args.expression,
onError: args.onError,
},
Contributor

This object could be declared once at the top and reused for every row, which saves some memory.

I also ran an experiment reusing the same table "template" above, but the performance gain was negligible for a medium-size table, so it's not worth the hack.

Contributor

@flash1293 flash1293 left a comment

Unsure about the memoization of the parsing, could you check?

@@ -23,7 +24,7 @@ function parse(input, options) {
}

try {
- return parseFn(input, options);
+ return memoizeOne(parseFn)(input, options);
Contributor

@flash1293 flash1293 Jun 15, 2021

Didn't test (I can do tomorrow), but is this actually memoizing? Looking at the memoize-one source code, it seems like memoizeOne itself is not memoized on the passed-in function so it would create a new memoization closure on each call without actually ever hitting the cache.

Looks like the memoizeOne call should be moved outside of the parse function

Contributor Author

You're totally right, the memoizeOne function returns a new instance each time!
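
The problem discussed in this thread can be reproduced with a small sketch. `memoizeLast` below is a minimal stand-in written for this example, assumed to behave like memoize-one for a single argument (it remembers only the most recent argument and result); `parseFn`, `parseBroken`, and `parseFixed` are hypothetical names, not the tinymath code.

```typescript
// Minimal stand-in for memoize-one: caches only the most recent call.
function memoizeLast<A, R>(fn: (arg: A) => R): (arg: A) => R {
  let cache: { arg: A; result: R } | undefined;
  return (arg: A): R => {
    if (cache === undefined || cache.arg !== arg) {
      cache = { arg, result: fn(arg) };
    }
    return cache.result;
  };
}

let parseCalls = 0; // instrumentation for this sketch only

function parseFn(input: string): string[] {
  parseCalls += 1;
  return input.split(' ');
}

// Ineffective: a fresh wrapper with an empty cache is built on every call,
// so the cache is never hit.
function parseBroken(input: string): string[] {
  return memoizeLast(parseFn)(input);
}

// Effective: the wrapper is created once at module scope, so its cache
// survives between calls.
const memoizedParse = memoizeLast(parseFn);

function parseFixed(input: string): string[] {
  return memoizedParse(input);
}
```

Calling `parseBroken` repeatedly with the same input runs `parseFn` every time, while `parseFixed` runs it only on the first call.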

@wylieconlon
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

id           before  after  diff
expressions  156     158    +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id           before  after  diff
expressions  1469    1495   +26

Any counts in public APIs

Total count of every any-typed public API. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats any for more detailed information.

id           before  after  diff
expressions  57      58     +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id      before  after  diff
canvas  1.3MB   1.3MB  +719.0B
lens    1.5MB   1.5MB  -59.0B
total                  +660.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id           before   after    diff
expressions  202.8KB  205.9KB  +3.1KB

Unknown metric groups

API count

id           before  after  diff
expressions  1896    1922   +26

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Member

@ppisljar ppisljar left a comment

code LGTM

Contributor

@flash1293 flash1293 left a comment

Tested: parse time and the overall time spent doing math both went down significantly. LGTM.

There are still optimizations we can do, e.g. dropping "pass through" mathColumn calls at the root and for things like moving_average(count()). Right now there's a math call to copy the count metric, then the moving_average call, then a math call to copy the moving average result into the final column, when only the moving_average call is needed. But let's do those separately.

@wylieconlon wylieconlon added the auto-backport Deprecated - use backport:version if exact versions are needed label Jun 16, 2021
@wylieconlon wylieconlon merged commit bdc8740 into elastic:master Jun 16, 2021
@wylieconlon wylieconlon deleted the lens/formula-performance branch June 16, 2021 14:35
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 16, 2021
)

* [Lens] Create mathColumn function to improve performance

* Fix empty formula case

* Fix tinymath memoization

Co-authored-by: Kibana Machine <[email protected]>
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Jun 16, 2021
…102356)

* [Lens] Create mathColumn function to improve performance

* Fix empty formula case

* Fix tinymath memoization

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Wylie Conlon <[email protected]>
@clintandrewhall
Contributor

@wylieconlon This PR has broken Canvas Storybook... investigating why with @spalger.

[Screen Shot 2021-06-17 at 1 40 56 PM]

@clintandrewhall
Contributor

We have a fix: the problem originated in the webpack.config of kbn-storybook. I'll include the fix in #101962.

clintandrewhall added a commit to clintandrewhall/kibana that referenced this pull request Jun 17, 2021
majagrubic pushed a commit to majagrubic/kibana that referenced this pull request Jun 18, 2021
)

* [Lens] Create mathColumn function to improve performance

* Fix empty formula case

* Fix tinymath memoization

Co-authored-by: Kibana Machine <[email protected]>