Preserve x-axis ordering in split series vis. #27723

Closed
wants to merge 8 commits into from

Conversation

lukeelmers
Member

@lukeelmers lukeelmers commented Dec 21, 2018

Closed in favor of #31533

Resolves part of #17532

Summary

This resolves an issue where the original sort order sent back by ES was
lost for point series / vislib visualizations with split series. This
was due to the way the point series agg response handler generated
series data, only filling in series values as it encountered them
bucket-by-bucket, rather than first looking at all x-values and ordering
them consistently within each series.

With this change, when a series is first created in the agg_response, it
will first look at all results, preserving the x-value sort order. Then
when creating new series, it will instantiate a zero-filled array with
the correctly ordered x axis values, filling it in with the real values
as it encounters them.
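
The two-pass approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `buildSeries` name and the flat `{ x, series, y }` row shape are assumptions made for the example.

```javascript
// Hypothetical input: flat rows in the order ES returned them.
// The { x, series, y } row shape is an assumption for illustration only.
function buildSeries(rows) {
  // Pass 1: look at all results and record each x value in first-seen order.
  const xOrder = [];
  const xIndex = new Map();
  for (const row of rows) {
    if (!xIndex.has(row.x)) {
      xIndex.set(row.x, xOrder.length);
      xOrder.push(row.x);
    }
  }

  // Pass 2: create each series as a zero-filled array aligned to xOrder,
  // then overwrite the zeros with real values as they are encountered.
  const seriesByLabel = new Map();
  for (const row of rows) {
    if (!seriesByLabel.has(row.series)) {
      seriesByLabel.set(row.series, {
        label: row.series,
        values: xOrder.map((x) => ({ x, y: 0 })),
      });
    }
    seriesByLabel.get(row.series).values[xIndex.get(row.x)].y = row.y;
  }

  return { xOrder, series: Array.from(seriesByLabel.values()) };
}

const exampleRows = [
  { x: 'b', series: 'one', y: 3 },
  { x: 'a', series: 'two', y: 5 },
  { x: 'a', series: 'one', y: 2 },
  { x: 'c', series: 'two', y: 7 },
];
const built = buildSeries(exampleRows);
console.log(JSON.stringify(built.series));
```

Every series ends up with the same x values in the same order, so a chart that draws one series at a time cannot reorder the axis.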

This duplicates some of the work done in the vislib zero_injection
component, which can likely be cleaned up further, or possibly removed
entirely.

To Do

- [ ] Determine if vislib/components/zero_injection can be removed (Edit: I think we should look at this in a separate PR to keep things smaller and simpler... plus I want to take additional time for testing should we remove this).

Checklist

- [ ] This was checked for cross-browser compatibility, including a check against IE11
- [ ] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
- [ ] Documentation was added for features that require explanation or tutorials
- [ ] This was checked for keyboard-only and screenreader accessibility

@lukeelmers lukeelmers added review Feature:Vislib Vislib chart implementation WIP Work in progress Feature:Visualizations Generic visualization features (in case no more specific feature label is available) v7.0.0 Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Dec 21, 2018
@elasticmachine
Contributor

Pinging @elastic/kibana-app

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💚 Build Succeeded

@lukeelmers lukeelmers changed the title from "[WIP] Preserve x-axis ordering in split series vis." to "Preserve x-axis ordering in split series vis." Jan 23, 2019
@lukeelmers lukeelmers removed the WIP Work in progress label Jan 23, 2019
@elasticmachine
Contributor

💚 Build Succeeded

Member

@ppisljar ppisljar left a comment

LGTM, tested it in chrome linux

src/ui/public/agg_response/point_series/_get_series.js (outdated, resolved)

Member

@markov00 markov00 left a comment

Let me block this PR so we can discuss this a bit:
Are we fixing only the first level of ordering (fixing only the order of the main columns on the x axis)?

Issue #17532 is not only about the ordering of the bars on the x-axis; it is also about the ordering of the split series within each bar (as some of the linked issues report).

You can easily see that if you use the following:

  • first x axis by extensions terms
  • split series by machine os.

Now try changing the ordering of the split series aggregation (ascending and descending) and check the tooltip values: the series appear to be ordered by the series list, and they do not respect the ordering coming from ES.
In the inspector table you can easily see the correct results, but the visualization just inserts points based on series order, not on data order.

@lukeelmers
Member Author

After further investigation with @markov00, we confirmed that this PR does indeed only solve part of the problem: While the x-axis will be ordered correctly, subbuckets will still be sorted based on the results of the first agg.

TL;DR: I recommend we merge this PR as it still solves one use case, and open a new PR for the second.

To reiterate Marco's point, take the following example using kibana_sample_data_logs. Here's the aggregation config:

  • X-Axis terms agg bucket (field.extension.keyword), sorted alphabetically
  • split series terms agg on machine.os.keyword

Here is an excerpt of the ES response:

      "aggregations": {
        "2": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 218,
          "buckets": [
            {
              "3": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 129,
                "buckets": [
                  {
                    "key": "win 7",
                    "doc_count": 114
                  },
                  {
                    "key": "ios",
                    "doc_count": 117
                  },
                  {
                    "key": "osx",
                    "doc_count": 125
                  },
                  {
                    "key": "win 8",
                    "doc_count": 125
                  }
                ]
              },
              "key": "",
              "doc_count": 610
            },
            {
              "3": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 62,
                "buckets": [
                  {
                    "key": "win 8",
                    "doc_count": 43
                  },
                  {
                    "key": "win 7",
                    "doc_count": 47
                  },
                  {
                    "key": "osx",
                    "doc_count": 51
                  },
                  {
                    "key": "ios",
                    "doc_count": 54
                  }
                ]
              },
              "key": "css",
              "doc_count": 257
            },
            {
              "3": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 44,
                "buckets": [
                  {
                    "key": "win 7",
                    "doc_count": 29
                  },
                  {
                    "key": "win 8",
                    "doc_count": 32
                  },
                  {
                    "key": "win xp",
                    "doc_count": 33
                  },
                  {
                    "key": "ios",
                    "doc_count": 34
                  }
                ]
              },
              "key": "deb",
              "doc_count": 172
            },
            ...

This PR will ensure the order is correct for the first bucket:

["", "css", "deb", ...]

However, the subbuckets will all be ordered based on the first result only:

["win 7", "ios", "osx", "win 8", ...]
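
To make the two orderings concrete, here is a small sketch that walks a trimmed copy of the response excerpt above and collects both lists. The `collectOrders` helper is hypothetical and is not the actual response handler; it only mimics the "append on first sight" behaviour described in this thread.

```javascript
// Trimmed copy of the excerpt above: agg "2" is the x-axis terms agg,
// agg "3" is the split series terms agg.
const aggregations = {
  '2': {
    buckets: [
      {
        key: '',
        '3': { buckets: [
          { key: 'win 7', doc_count: 114 },
          { key: 'ios', doc_count: 117 },
          { key: 'osx', doc_count: 125 },
          { key: 'win 8', doc_count: 125 },
        ] },
      },
      {
        key: 'css',
        '3': { buckets: [
          { key: 'win 8', doc_count: 43 },
          { key: 'win 7', doc_count: 47 },
          { key: 'osx', doc_count: 51 },
          { key: 'ios', doc_count: 54 },
        ] },
      },
      {
        key: 'deb',
        '3': { buckets: [
          { key: 'win 7', doc_count: 29 },
          { key: 'win 8', doc_count: 32 },
          { key: 'win xp', doc_count: 33 },
          { key: 'ios', doc_count: 34 },
        ] },
      },
    ],
  },
};

// Hypothetical helper: x values keep the ES order, while series labels are
// appended the first time they are seen, regardless of per-bucket order.
function collectOrders(aggs) {
  const xOrder = [];
  const seriesOrder = [];
  for (const xBucket of aggs['2'].buckets) {
    xOrder.push(xBucket.key);
    for (const subBucket of xBucket['3'].buckets) {
      if (!seriesOrder.includes(subBucket.key)) {
        seriesOrder.push(subBucket.key);
      }
    }
  }
  return { xOrder, seriesOrder };
}

const orders = collectOrders(aggregations);
console.log(orders.xOrder);      // [ '', 'css', 'deb' ]
console.log(orders.seriesOrder); // [ 'win 7', 'ios', 'osx', 'win 8', 'win xp' ]
```

Note that the per-bucket order of "css" and "deb" has no influence on the result, except for appending the previously unseen "win xp" at the end.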

This issue is described in deeper detail in the comments on the original issue, and in some of the duplicate issues. I was focused on solving for the x-axes and missed the second use case.

Solving for the subbucket ordering is more complex as it requires reworking our fundamental structure for passing around series data; currently the data is passed to vislib like this:

[
  {label: "win 7", aggLabel: "Count", aggId: "1", count: 0, values: Array(5)},
  {label: "ios", aggLabel: "Count", aggId: "1", count: 0, values: Array(5)},
  {label: "osx", aggLabel: "Count", aggId: "1", count: 0, values: Array(5)},
  {label: "win 8", aggLabel: "Count", aggId: "1", count: 0, values: Array(5)},
  {label: "win xp", aggLabel: "Count", aggId: "1", count: 0, values: Array(5)}
]

values[] contains the point data for each of the (ordered) items on the x-axis

As you can see, there is no concept of ordering series items within each x-axis bucket, as they only exist once at the outer level.

Solving for the ordering of subbuckets would require a few things:

  1. We would need to decide what is the "source of truth" for ordering the legend. Is this ordered based on the results for the current window of data you are looking at (as was requested in the original issue)? What if you're using a custom metric to order that data?
  2. We would need to rethink the way we are passing data to the charts, such that we could introduce a mechanism to track both overall ordering of the subbuckets, as well as ordering within each individual point on the x-axis.
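
As a rough illustration of the second point, a structure along the following lines could carry both orderings. This is purely a hypothetical shape for discussion, not an existing Kibana or vislib structure; `toOrderedChartData`, `legendOrder` and `orderWithinX` are made-up names, and using first-seen order for the legend is just one possible answer to the "source of truth" question.

```javascript
// Hypothetical input: one entry per x-axis bucket, each with its
// subbuckets in the order ES returned them.
const xBuckets = [
  { key: 'css', sub: [
    { key: 'win 8', value: 43 },
    { key: 'win 7', value: 47 },
    { key: 'osx', value: 51 },
    { key: 'ios', value: 54 },
  ] },
  { key: 'deb', sub: [
    { key: 'win 7', value: 29 },
    { key: 'win 8', value: 32 },
    { key: 'win xp', value: 33 },
    { key: 'ios', value: 34 },
  ] },
];

// Hypothetical shape that tracks three things separately:
//  - xOrder: order of points on the x-axis
//  - legendOrder: overall series order (here: first seen, an assumption)
//  - orderWithinX: series order inside each individual x-axis bucket
function toOrderedChartData(buckets) {
  const xOrder = [];
  const legendOrder = [];
  const orderWithinX = {};
  for (const bucket of buckets) {
    xOrder.push(bucket.key);
    orderWithinX[bucket.key] = bucket.sub.map((s) => s.key);
    for (const s of bucket.sub) {
      if (!legendOrder.includes(s.key)) {
        legendOrder.push(s.key);
      }
    }
  }
  return { xOrder, legendOrder, orderWithinX };
}

const chartData = toOrderedChartData(xBuckets);
console.log(chartData.orderWithinX.deb); // [ 'win 7', 'win 8', 'win xp', 'ios' ]
```

A stacked bar chart could then consult `orderWithinX` per bar, while line and area charts could keep using the single `legendOrder`.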

Since this PR still solves one valid use case and is separate from the subbucket ordering issue, I recommend we merge this and open a new PR for the second use case.

@elasticmachine
Contributor

💚 Build Succeeded

Member

@ppisljar ppisljar left a comment

This solution (probably) has another issue. We are zero filling all the series, which means we are actually changing the results. It will not have any effect on the bar chart, but area and line charts will look different. I don't think there is an easy way around it, as we have no way to differentiate between zero filled values and actual zero values.

A quick overview of where things go wrong:

  • we correctly convert table (tabify) to preserve all the orders
  • we correctly create the series: series are ordered by the order they appeared in the response, and their values are ordered by the order they appeared in the response.

But when the chart tries to draw this, it renders one series at a time, in the order they appeared, and always renders all values for each series. This is what produces the wrong x-axis order.

One way to fix the above would be to pass the chart information about the x-axis, in the form of an ordered array of x-axis values. The chart can then use that to make sure the x-axis order is preserved.
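
A minimal sketch of that idea, assuming sparse (not zero-filled) series and a separately supplied `xAxisOrderedValues` array. Both function names are hypothetical, and this is not how vislib actually builds its axis; it only contrasts the two ways of deriving the x domain.

```javascript
// Sparse series: each one only holds the points it actually has.
const sparseSeries = [
  { label: 'A', values: [{ x: 'b', y: 2 }, { x: 'c', y: 1 }] },
  { label: 'B', values: [{ x: 'a', y: 4 }, { x: 'b', y: 3 }] },
];

// Order of x values as returned by ES (assumed to be passed in separately).
const xAxisOrderedValues = ['a', 'b', 'c'];

// Naive approach: walk the series one at a time and append unseen x values.
// The resulting order depends on which series happens to come first.
function naiveXDomain(series) {
  const domain = [];
  for (const s of series) {
    for (const point of s.values) {
      if (!domain.includes(point.x)) {
        domain.push(point.x);
      }
    }
  }
  return domain;
}

// Ordered approach: use the supplied array as the source of truth and keep
// only the x values that actually occur in the data.
function orderedXDomain(series, orderedValues) {
  const present = new Set();
  for (const s of series) {
    for (const point of s.values) {
      present.add(point.x);
    }
  }
  return orderedValues.filter((x) => present.has(x));
}

console.log(naiveXDomain(sparseSeries));                       // [ 'b', 'c', 'a' ]
console.log(orderedXDomain(sparseSeries, xAxisOrderedValues)); // [ 'a', 'b', 'c' ]
```

With this approach the series themselves do not need to be zero-filled just to fix the axis order.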

However, this still doesn't solve the issue @markov00 is mentioning, but I would argue that it is not really an issue. Our charts were designed to behave that way, and in most scenarios it makes sense:

  1. Any chart with non-stacked split series: when the series are not stacked, you might expect them to always show in the same order. For example, in charts like this:
     [image: column-chart-excel]
     the order of series is always the same, no matter their value. It would be confusing if the red bar kept jumping left and right.

  2. Stacked area or line charts: you will always want the order of series to stay the same, no matter the values. You don't want zig-zag lines just because the order of series changed between data points.

So the only use case where this order doesn't make as much sense (it still might in some scenarios) is a stacked bar chart.

I suggest leaving this out of this PR, opening a feature request for it and referencing it in original issue.

@markov00
Member

markov00 commented Feb 14, 2019

@ppisljar the zig-zag effect depends only on how you order the split series. Since we give the user the ability to change the Order by option, they can decide whether to order the subbuckets by metric (which can produce the zig-zag, but is useful for comparing behaviours across buckets) or alphabetically (which preserves the bucket order and avoids the zig-zag).
The thing is that we are not respecting the split series Order by setting. The order is neither alphabetical nor by value; it is first come, first served. More precisely, we preserve the order of elements in the first subbucket and append any new bucket value to the end of that list. In Luke's example, the first bucket is

                  {
                    "key": "win 7",
                    "doc_count": 114
                  },
                  {
                    "key": "ios",
                    "doc_count": 117
                  },
                  {
                    "key": "osx",
                    "doc_count": 125
                  },
                  {
                    "key": "win 8",
                    "doc_count": 125
                  }

That's the order we maintain throughout the visualization. When we find win xp, we just add it to the end of the ordering list, producing something like:
win 7, ios, osx, win 8, win xp
which has no predictable ordering: it is neither alphabetical nor by value.

So in conclusion: yes, mine is an issue. We are not taking the split series Order by setting into consideration.

@elasticmachine
Contributor

💔 Build Failed

@ppisljar
Member

@markov00 and we never were. But that's not really relevant: it shouldn't be part of this PR, as it's going to be quite a big undertaking. As discussed yesterday over Zoom, the problem is that the data structure we use to represent chart data (series) doesn't hold information about the ordering of points.

@lukeelmers
Member Author

I'm closing this in favor of #31533, which will address the issue as follows:

  • x value ordering is still determined when the series are generated, and is then passed to vislib
  • we ensure that the correct order is preserved during the zero injection process, that way we aren't zero filling everything by default.

"we have no way to differentiate between zero filled values and actual zero values."

@ppisljar Just a note that I think we can check for the presence of an xi key in the series value to determine if it is zero-filled (xi: Infinity is set on all zero-filled items and is not present on "real" values). But regardless, I think the plan described above is simpler as it doesn't touch as many things.
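
The check mentioned above could look something like this. It is only a sketch based on the description in this comment (zero-filled items carry `xi: Infinity`, real values have no `xi` key); `isZeroFilled` is a hypothetical helper name and the sample points are made up.

```javascript
// Assumption taken from the comment above: zero-filled points have an `xi`
// key (set to Infinity), and real values do not have the key at all.
function isZeroFilled(point) {
  return Object.prototype.hasOwnProperty.call(point, 'xi');
}

// Made-up sample values: one real zero, one injected zero, one real value.
const sampleValues = [
  { x: 'a', y: 0 },
  { x: 'b', y: 0, xi: Infinity },
  { x: 'c', y: 5 },
];

const realValues = sampleValues.filter((point) => !isZeroFilled(point));
console.log(realValues.map((point) => point.x)); // [ 'a', 'c' ]
```

The real zero at `a` survives the filter while the injected zero at `b` is dropped, which is the distinction a line or area chart would need.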

Labels
Feature:Vislib Vislib chart implementation Feature:Visualizations Generic visualization features (in case no more specific feature label is available) review Team:Visualizations Visualization editors, elastic-charts and infrastructure v6.7.0 v7.0.0

4 participants