Bar in a bar charts are getting unsorted when adding a sub-bucket aggregation #17532

Closed
melvynator opened this issue Apr 3, 2018 · 9 comments
Labels
enhancement New value added to drive a business result Feature:Vislib Vislib chart implementation Feature:XYAxis XY-Axis charts (bar, area, line) Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@melvynator

Kibana version: 6.2.2

Elasticsearch version: 6.2.2

Server OS version: Mac OS

Browser version: Google chrome Version 65.0.3325.162 (Official Build) (64-bit)

Browser OS version: Chrome 65 on Mac OS X 10

Original install method (e.g. download page, yum, from source, etc.): Download page

Description of the problem including expected versus actual behavior:
The problem appears when adding a sub-bucket to a bar chart. If I have this bar chart:

screen shot 2018-04-03 at 23 57 02

It's a simple terms aggregation on a specific field.

If I want to split the series using another terms aggregation the sorting will be messed up:

screen shot 2018-04-03 at 23 59 00

This visualisation does not match the Elasticsearch response:

{
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "neutral",
                "doc_count": 6091
              }
            ]
          },
          "key": "nicetrybertha",
          "doc_count": 6091
        },
        {
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "positive",
                "doc_count": 5325
              }
            ]
          },
          "key": "JennaGuillaume",
          "doc_count": 5325
        },
        {
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "positive",
                "doc_count": 4626
              }
            ]
          },
          "key": "malgico",
          "doc_count": 4626
        }

It may be because of the polarity of certain buckets, but I have no means to confirm this hypothesis.
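
One way to check the response independently of the chart is to flatten it into rows and confirm the outer buckets arrive sorted by count. This is only a sketch: the bucket values are copied from the excerpt above, while `flatten` and `is_sorted_desc` are hypothetical helper names, not part of Kibana or Elasticsearch.

```python
# Outer buckets as they appear in the response excerpt; the sub-aggregation
# is stored under the id "3".
buckets = [
    {"key": "nicetrybertha", "doc_count": 6091,
     "3": {"buckets": [{"key": "neutral", "doc_count": 6091}]}},
    {"key": "JennaGuillaume", "doc_count": 5325,
     "3": {"buckets": [{"key": "positive", "doc_count": 5325}]}},
    {"key": "malgico", "doc_count": 4626,
     "3": {"buckets": [{"key": "positive", "doc_count": 4626}]}},
]


def flatten(buckets, sub_agg_id="3"):
    """Return (x_value, series, count) rows in response order."""
    rows = []
    for outer in buckets:
        for inner in outer[sub_agg_id]["buckets"]:
            rows.append((outer["key"], inner["key"], inner["doc_count"]))
    return rows


def is_sorted_desc(buckets):
    """True if the outer buckets are in descending doc_count order."""
    counts = [b["doc_count"] for b in buckets]
    return counts == sorted(counts, reverse=True)


rows = flatten(buckets)
print(rows)
print(is_sorted_desc(buckets))  # True
```

If the check holds while the chart shows a different bar order, the reordering happens after the response is received.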

Steps to reproduce:

  1. Build a bar chart
  2. Define a terms aggregation
  3. Split the series using another terms aggregation

@thomasneirynck I will provide a dataset ASAP

@thomasneirynck thomasneirynck added bug Fixes for quality problems that affect the customer experience Feature:Visualizations Generic visualization features (in case no more specific feature label is available) triage_needed labels Apr 3, 2018
@timroes
Contributor

timroes commented Apr 11, 2018

Thanks for the report. I am able to reproduce this with makelogs data as follows:

  • Terms on extension.raw, size: 5
    • Terms on machine.os.raw, size: 2
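
For anyone reproducing this outside the UI, the configuration above corresponds roughly to a nested terms aggregation. The sketch below builds such a search body. The field names and sizes come from the list above; the aggregation id `"2"` and the helper name `nested_terms_body` are assumptions made for illustration (`"3"` matches the id seen in the response earlier in this issue), and ordering by `_count` descending is assumed to be what "metric: Count" maps to.

```python
import json


def nested_terms_body(outer_field, outer_size, inner_field, inner_size):
    """Build a search body with a terms agg and a terms sub-agg, both ordered by count."""
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "2": {  # assumed id for the outer (x-axis) aggregation
                "terms": {
                    "field": outer_field,
                    "size": outer_size,
                    "order": {"_count": "desc"},
                },
                "aggs": {
                    "3": {  # inner (split series) aggregation
                        "terms": {
                            "field": inner_field,
                            "size": inner_size,
                            "order": {"_count": "desc"},
                        }
                    }
                },
            }
        },
    }


body = nested_terms_body("extension.raw", 5, "machine.os.raw", 2)
print(json.dumps(body, indent=2))
```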

screenshot-20180411-143928

There may be a workaround for you. Can you try switching the "Order By" of the "Split Series" to "Term" (or whatever the third option is) instead of using "metric: Count"? Since you have all 3 possible sentiment values within each of the author buckets, this might work.

@timroes timroes added Feature:XYAxis XY-Axis charts (bar, area, line) and removed triage_needed labels Apr 11, 2018
@melvynator
Author

melvynator commented Apr 20, 2018

@timroes @ppisljar

Thanks for the replication.

I tried the workaround, but it doesn't seem to fix the issue:

screen shot 2018-04-20 at 11 22 07

@ogtool

ogtool commented May 17, 2018

Believe we have this issue as well (ES + Kibana both 6.2.3) - I lodged at https://discuss.elastic.co/t/sub-aggregation-graph-ordering-off-first-bucket-not-overall-data-set/132216 and was pointed at this existing request.

What appears to be happening is that the sorting (for the legend and each bar graph) is based on the popularity of the sub-aggs in the first bar graph. The same ordering is then applied to every single bar graph. If your data sample for the first bucket is an outlier (as mine is in the screenshot below), this leads to a pretty illogical UX for the rest of the graph. This assumption seems to be consistent across all the samples other users have provided.

The expected output (in my opinion) is that all sub-aggregations should be considered across the whole displayed time period, and the sorting of the legend and within each bar graph should then reflect all data points, not just the first bucket's sub-aggs.

Some may expect each bar in the graph to be ordered based on its own sub-aggs. Maybe this should be a user-selectable option, as I feel if you ask 100 people for their opinion, you'd get 150 different answers.
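
The behaviour described above can be imitated with a few lines of code. This is a sketch under the assumption that series order is fixed by first appearance while walking the response; it is not Kibana's actual code, and the data and function names are made up for illustration.

```python
# Hypothetical counts per x-axis bucket, in response order. The first
# bucket is an outlier: "error" dominates there but nowhere else.
response = [
    ("mon", [("error", 90), ("ok", 10)]),
    ("tue", [("ok", 80), ("warn", 15), ("error", 5)]),
    ("wed", [("ok", 70), ("warn", 20), ("error", 10)]),
]


def first_seen_series_order(response):
    """Order series by first appearance while walking the response."""
    order = []
    for _x, subs in response:
        for series, _count in subs:
            if series not in order:
                order.append(series)
    return order


def stack(response, order):
    """Lay out every bar with the same series order (0 for a missing series)."""
    return {x: [(s, dict(subs).get(s, 0)) for s in order] for x, subs in response}


order = first_seen_series_order(response)
print(order)                          # ['error', 'ok', 'warn']
print(stack(response, order)["tue"])  # [('error', 5), ('ok', 80), ('warn', 15)]
```

Here "error" leads the legend and every stack only because it was the largest series in the first bar, even though it is the smallest series in the other two.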

image

CCM

@timroes timroes added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Sep 16, 2018
@timroes timroes removed Feature:Visualizations Generic visualization features (in case no more specific feature label is available) labels Oct 1, 2018
@willphillips-armedia

Hey, this is still an open issue in 6.4.2. Anyone know if any progress has been made on this? This is seriously impacting our users. #25687

@lukeelmers
Member

An exploratory PR (#27723) was opened to investigate this, and here are our findings after much discussion (some of which is captured in the PR comments).

There are really two things that are causing confusion:

  1. x-axis values are not ordered consistently when dealing with split series visualizations. I think this was the original intent of the issue @willphillips-armedia created (Sorting of X-Axis is incorrect when "Split Series" is also enabled, #25687). This affects all point series charts.
  2. The ordering of subbuckets within stacked bar charts is determined based on the order of the first agg result that comes back, and that order is applied to all subsequent buckets. This may lead to surprising behavior when using stacked bars, for example if your first result is an outlier and affects the sorting of all other bars (+ the legend). This is essentially the problem @ogtool is describing above.
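
To make item 1 concrete, here is a sketch of one way an x-axis order can drift from the response order once data is pivoted into series. It assumes x values are collected series by series, which is an assumption for illustration and not a description of the vislib code; the rows and function names are hypothetical.

```python
# Hypothetical rows in response order: (x_value, series, count).
rows = [
    ("jpg", "win", 50),
    ("css", "osx", 40),
    ("png", "win", 30),
]


def unique(seq):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(seq))


def x_order_from_response(rows):
    """x values in the order the outer buckets were returned."""
    return unique(x for x, _s, _c in rows)


def x_order_series_by_series(rows):
    """x values collected one series at a time, then de-duplicated."""
    xs = []
    for name in unique(s for _x, s, _c in rows):
        xs.extend(x for x, s, _c in rows if s == name)
    return unique(xs)


print(x_order_from_response(rows))     # ['jpg', 'css', 'png']
print(x_order_series_by_series(rows))  # ['jpg', 'png', 'css']
```

The two orders differ whenever a series is missing from an early x value, which is why the effect only shows up after splitting the series.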

I'm not sure if @melvynator intended for this issue to address item 1, item 2, or both, but for the time being I opened #31534 to track item 1 separately. (A solution for that issue is already in progress)

As for item 2, there are a lot of things to consider:

  1. Kibana has, as far as we are able to tell, always behaved this way. It's a result of the underlying data schema we are using to pass around series data (see the comment on "Preserve x-axis ordering in split series vis.", #27723).
  2. The sorting is primarily an issue when you are using stacked split series bar charts. When using non-stacked charts, I think the current sorting behavior (where subbucket order is consistent across each X value) is probably what most users would expect.
  3. In order to introduce this requested functionality to Kibana, we would need to:
    1. Redesign the schema we are using for vislib / point series data and roll it out across vislib.
    2. Determine how we should actually sort subbuckets. (As @ogtool rightly points out, there could be a lot of opinions on how this should be handled.) For example: Is it based on all of the data that is currently visible? Is it sorted individually for each x value? How should the legend be ordered?
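
Two of the sorting options raised in the last point can be sketched side by side. Neither is a proposal for the actual implementation; the rows and function names are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
from collections import defaultdict

# Hypothetical (x_value, series, count) rows.
rows = [
    ("mon", "error", 90), ("mon", "ok", 10),
    ("tue", "ok", 80), ("tue", "warn", 15), ("tue", "error", 5),
    ("wed", "ok", 70), ("wed", "warn", 20), ("wed", "error", 10),
]


def global_order(rows):
    """One order for all bars and the legend: series ranked by total count."""
    totals = defaultdict(int)
    for _x, series, count in rows:
        totals[series] += count
    return sorted(totals, key=lambda s: (-totals[s], s))


def per_x_order(rows):
    """A separate order per bar: series ranked by count within that x value."""
    by_x = defaultdict(list)
    for x, series, count in rows:
        by_x[x].append((series, count))
    return {
        x: [s for s, _c in sorted(subs, key=lambda p: (-p[1], p[0]))]
        for x, subs in by_x.items()
    }


print(global_order(rows))        # ['ok', 'error', 'warn']
print(per_x_order(rows)["mon"])  # ['error', 'ok']
print(per_x_order(rows)["tue"])  # ['ok', 'warn', 'error']
```

The global variant yields a single legend order. The per-x variant leaves the legend question open, since no single order matches every bar.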

Since this change would be a team effort that requires overhauling large portions of vislib, I'm going to keep this issue open for now in order to keep tabs on item 2 until we can set aside some time to tackle it properly.

The good news is that item 1 should be fixed any day now, and we have a much clearer picture of the effort that would be involved to make item 2 happen.

@lukeelmers lukeelmers added enhancement New value added to drive a business result Feature:Vislib Vislib chart implementation and removed app-stabilizing bug Fixes for quality problems that affect the customer experience labels Feb 20, 2019
@larrywongl

Any update on this issue? I have met the same issue.

Capture

@Knksumanth

Any update on when this issue will be addressed and which version can we expect this to be fixed?

@lukeelmers lukeelmers removed their assignment May 14, 2020
@formiaczek

Hello,

This is to also 👍 this issue and to follow up on another discussion of this subject and its conclusions, made here.

To summarize: the issue can be reproduced with the Kibana demo here

As @lukeelmers pointed out:

The ordering of subbuckets (...) is determined based on the order of the first agg result that comes back, and that order is applied to all subsequent buckets.

Nested buckets allow for 'grouping' the data by some criteria, the innermost bucket being a 'sort-of' result meant to present / make sense of the data.

Currently it is possible to:

  • re-arrange the order of buckets (and it is useful and affects the way data is being processed in subsequent buckets)
  • for each bucket (for most of the aggregations) there is 'OrderBy' and 'Size'. Because it is 'for a given bucket' - it really seems that this should be applied within the bucket (and not some outer/parent bucket).
  • for each bucket, the pair 'OrderBy' and 'Size X' seems like it should produce the 'Top' (or 'Bottom', depending on 'Ascending' vs 'Descending') X items from the resulting bucket (and not necessarily from an outer bucket that likely contains other, unrelated items). There is an issue with this too (see linked discussion), whereby items expected to fall within the 'Top/Bottom X' range might disappear from the bucket if they don't fall into this range in the root bucket (and this is very likely if Size is small enough). Given this unpredictability, the only safe way to currently use 'Size' is to set it to a 'big enough' value so that expected results are not discarded.
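
The distinction drawn in the last bullet, between taking the top X items inside each bucket and taking them once from the root, can be shown with a small sketch. It only illustrates the difference between the two selections; it makes no claim about how Elasticsearch or Kibana actually computes them, and the rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical (parent_bucket, item, count) rows.
rows = [
    ("a", "x", 100), ("a", "y", 90), ("a", "z", 1),
    ("b", "z", 50), ("b", "w", 40), ("b", "x", 2),
]


def parents(rows):
    """Parent bucket keys in first-seen order."""
    return list(dict.fromkeys(p for p, _i, _c in rows))


def top_n_per_parent(rows, n):
    """Top n items chosen separately inside each parent bucket."""
    out = {}
    for parent in parents(rows):
        subs = sorted(
            ((i, c) for p, i, c in rows if p == parent),
            key=lambda ic: (-ic[1], ic[0]),
        )
        out[parent] = [i for i, _c in subs[:n]]
    return out


def top_n_from_root(rows, n):
    """Top n items chosen once over all data, then applied to every parent."""
    totals = Counter()
    for _p, item, count in rows:
        totals[item] += count
    ranked = sorted(totals.items(), key=lambda ic: (-ic[1], ic[0]))
    keep = {i for i, _c in ranked[:n]}
    return {
        parent: [i for p, i, _c in rows if p == parent and i in keep]
        for parent in parents(rows)
    }


print(top_n_per_parent(rows, 2))  # {'a': ['x', 'y'], 'b': ['z', 'w']}
print(top_n_from_root(rows, 2))   # {'a': ['x', 'y'], 'b': ['x']}
```

With the root-level selection, the two largest items of bucket "b" vanish because they are not among the top two overall. That is the kind of disappearance described above.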

Given that an 'OrderBy' set against a particular bucket is not applied within that bucket, and because in some use cases the original order of the first aggregation (the current behaviour) might actually be desired, perhaps either of the following could be considered, to preserve the existing behaviour while addressing the issues that have arisen:

  • 'OrderBy' SHOULD NOT be available and accessible for sub-buckets at all (perhaps it could appear in the 'root/parent' bucket upon adding sub-buckets instead). This would make it unambiguously clear where and when it is applied in the processing.
  • Don't 'hard-code' it like now, and allow for more control by giving 'OrderBy' an additional option about WHERE it is applied, e.g.:
    1. 'Apply in current bucket' (this is what I and others contributing to this and related discussions have expected and really need) AND 'Apply to outermost (root) bucket' (this is the current 'default', and could stay the 'default' to avoid breaking other things)
    2. A more 'generic' version of the above: 'Apply to 'XX' bucket', where XX is 'current', 'root' or any of the buckets on the path from 'current' to 'root' (again, 'root' being the default if nothing is selected, to preserve current behaviour).

When it is possible to control and apply the 'OrderBy' to a selected bucket, this 'Size'-related issue will also be addressed.

When someone really depends on deterministic behaviour, being able to control how aggregation results are sorted is something that is really needed.
Unfortunately there is currently NO way to achieve this. There is an impression that there is, because it 'sometimes' works like that, but 'sometimes' is not really good enough. ELK is and should be a solid stack; it is so powerful and useful that it seems surprising it doesn't already cope with this, especially when it has been raised for many years now. I think it is really time to fix / update this now instead of 'bouncing' it back again: every reply to comments on this gets more and more links to other discussions (spanning many years), and every reply from the ELK Team seems to say 'yeah, maybe someday, not now.. you see? this has been like that forever and we can't change it now'

Sorry if I sound sarcastic - this is not the intention, really - I'm just trying to point out that I really care, and know it could be done better. I can even spare my time to help if possible / needed - just let me know!

Thanks!
Lukasz.

@markov00
Member

Closing this because the 6.x version is no longer under maintenance. Please upgrade to the latest 8.x version.

@markov00 markov00 closed this as not planned Oct 10, 2024