Add new rate aggregation #60674

Closed
imotov opened this issue Aug 4, 2020 · 11 comments · Fixed by #61369
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@imotov
Contributor

imotov commented Aug 4, 2020

We have received several requests to add "rate" functionality or aggregation: how many docs/s were there per bucket in a date_histogram? We have decided to implement this as a special metric aggregation whose scope is limited to the date_histogram aggregation for now. In other words, the rate aggregation will have to be a descendant of a date_histogram, and in the case of nested histograms the closest ancestor will be used to determine the rate.

POST /sales/_search
{
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "the_rate": {
          "rate": {
            "unit": "second" 
          } 
        }
      }
    }
  }
}

{
  ...
  "aggregations": {
    "sales": {
      "buckets": [
        {
          "key_as_string": "2020-07-29",
          "doc_count": 300000,
          "the_rate": {
            "rate": 3.47222222222,
            "rate_as_string": "3.47222222222/s"
          }
        },
       ...
      ]
    }
  }
}
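
The figure in the sample response can be checked by hand: a one-day bucket spans 86,400 seconds, so 300,000 documents come to roughly 3.47 documents per second. A minimal Python sketch of that arithmetic follows. The function name `bucket_rate` and its parameters are illustrative only, not part of Elasticsearch, and the sketch assumes a plain 24-hour day with no DST shift.

```python
def bucket_rate(value: float, bucket_millis: int, unit_millis: int) -> float:
    """Scale a per-bucket value to a rate per `unit_millis` (hypothetical helper)."""
    return value * unit_millis / bucket_millis


DAY_MS = 24 * 60 * 60 * 1000   # assumes a 24-hour calendar day
SECOND_MS = 1000

rate = bucket_rate(300_000, DAY_MS, SECOND_MS)
print(f"{rate:.11f}/s")  # 3.47222222222/s
```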

By default, the number of documents in the bucket will be used to calculate the rate, but it will also be possible to specify a numeric field, in which case the sum of all values of that field is used instead:

      "aggs": {
        "the_rate": {
          "rate": {
            "field": "num_of_requests",
            "unit": "second"
          }
        }
      }
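
To make the difference between the two modes concrete, here is a small Python sketch. It is only an illustration of the proposal as described here: `rate_for_bucket` is a hypothetical name, and skipping documents that lack the field is an assumption, since the proposal does not say how missing values are handled.

```python
from typing import Mapping, Optional, Sequence


def rate_for_bucket(
    docs: Sequence[Mapping[str, float]],
    bucket_seconds: float,
    unit_seconds: float,
    field: Optional[str] = None,
) -> float:
    """Rate per unit for one bucket: doc count by default, field sum if `field` is set."""
    if field is None:
        total = len(docs)
    else:
        # Assumption: documents without the field contribute nothing.
        total = sum(doc[field] for doc in docs if field in doc)
    return total * unit_seconds / bucket_seconds


# Three documents falling into a one-hour bucket, rate expressed per minute.
docs = [{"num_of_requests": 120}, {"num_of_requests": 240}, {"num_of_requests": 0}]
doc_rate = rate_for_bucket(docs, 3600, 60)                           # 3 docs / 60 min
req_rate = rate_for_bucket(docs, 3600, 60, field="num_of_requests")  # 360 requests / 60 min
print(doc_rate, req_rate)
```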

We could also add support for an "accumulative": true flag to address #60619 in a future iteration.

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 4, 2020
@wylieconlon

I'm not sure that it's a good idea to combine this function, which is the "average event rate", and the other function "positive rate". The main reason is that the "event rate" calculation is much simpler.

@imotov
Contributor Author

imotov commented Aug 10, 2020

The date histogram can support fixed, calendar and legacy intervals. These intervals need to work with the rate calculator used by the rate aggregation. To remove ambiguity, I would like to propose the following limitations:

  1. If the date histogram is defined using fixed_interval, the unit parameter in the aggregation will be treated as a fixed interval, and only the values allowed for the fixed interval parameter of a date histogram will be supported. The actual rate division ratio will be calculated by converting both intervals into milliseconds at parse time.
  2. If the date histogram is defined using calendar_interval, the unit parameter will be treated as a calendar interval. Some rate ratios will be calculated at parse time, but some will have to be calculated at query time (QT) based on the actual size of the buckets, as follows:
     unit \ interval  minute   hour    day     week   month  quarter  year
     minute           1        60      1440    10080  QT     QT       QT
     hour             1/60     1       24      168    QT     QT       QT
     day              1/1440   1/24    1       7      QT     QT       QT
     week             1/10080  1/168   1/7     1      QT     QT       QT
     month            N/A      N/A     N/A     N/A    1      3        12
     quarter          N/A      N/A     N/A     N/A    1/3    1        4
     year             N/A      N/A     N/A     N/A    1/12   1/4      1
  3. If the date histogram is defined using a legacy interval, the rate aggregation will throw an exception.
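
The calendar-interval rule can be sketched as a small lookup. This is a rough Python illustration rather than the actual implementation. It assumes that the rows of the ratio table are the rate unit and the columns are the histogram interval, so that each entry is the number of units in one bucket. The ratios are derived from unit sizes instead of being copied cell by cell, and `parse_time_ratio`, `QUERY_TIME` and `UNSUPPORTED` are hypothetical names.

```python
from fractions import Fraction
from typing import Optional, Union

# Unit sizes relative to the smallest member of each group.
MINUTE_GROUP = {"minute": 1, "hour": 60, "day": 1440, "week": 10080}  # in minutes
MONTH_GROUP = {"month": 1, "quarter": 3, "year": 12}                  # in months

QUERY_TIME = "QT"   # ratio depends on the real bucket bounds (e.g. 28 vs 31 days)
UNSUPPORTED = None  # "N/A" in the table


def parse_time_ratio(unit: str, interval: str) -> Optional[Union[Fraction, str]]:
    """Units per histogram bucket, when that is known at parse time.

    Assumption: `unit` is the rate's unit, `interval` is the date_histogram's
    calendar_interval.
    """
    if unit in MINUTE_GROUP and interval in MINUTE_GROUP:
        return Fraction(MINUTE_GROUP[interval], MINUTE_GROUP[unit])
    if unit in MONTH_GROUP and interval in MONTH_GROUP:
        return Fraction(MONTH_GROUP[interval], MONTH_GROUP[unit])
    if unit in MINUTE_GROUP and interval in MONTH_GROUP:
        return QUERY_TIME
    if unit in MONTH_GROUP and interval in MINUTE_GROUP:
        return UNSUPPORTED
    raise ValueError(f"unknown calendar unit: {unit!r} or {interval!r}")


print(parse_time_ratio("minute", "hour"))   # 60
print(parse_time_ratio("week", "day"))      # 1/7
print(parse_time_ratio("minute", "month"))  # QT
```

Using `Fraction` keeps ratios such as 1/7 exact; a real implementation would more likely work in milliseconds, as item 1 describes for fixed intervals.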

@dgieselaar
Member

Will this support histogram fields as well? ie, will it use .counts?

@dgieselaar
Member

dgieselaar commented Aug 11, 2020

Just wanted to add that we do various manual rate calculations in APM and we also have a manual implementation of a monotonically increasing counter for our garbage collection charts. We combine a max, derivative, and bucket_script aggregation to achieve the latter. Would love to get rid of these 😀
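
For readers unfamiliar with that pattern, a rough Python sketch of the max, derivative, bucket_script combination might look like the following. The comment does not say what the bucket_script step does, so clamping negative differences to zero (to absorb counter resets) is purely an assumption, as are the function name and the sample numbers.

```python
from typing import List, Optional, Sequence


def counter_increase_per_bucket(
    samples_per_bucket: Sequence[Sequence[float]],
) -> List[Optional[float]]:
    """Per-bucket increase of a monotonically increasing counter.

    Assumes every bucket holds at least one sample.
    """
    # "max": last known counter value in each bucket.
    maxima = [max(samples) for samples in samples_per_bucket]
    # "derivative": difference to the previous bucket; undefined for the first.
    deltas: List[Optional[float]] = [None]
    deltas += [cur - prev for prev, cur in zip(maxima, maxima[1:])]
    # "bucket_script" (assumed): treat a negative difference as a reset, report 0.
    return [None if d is None else max(d, 0) for d in deltas]


# The counter resets between the second and third bucket.
increases = counter_increase_per_bucket([[100, 120], [130, 150], [5, 20], [40]])
print(increases)  # [None, 30, 0, 20]
```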

@imotov
Contributor Author

imotov commented Aug 11, 2020

> Will this support histogram fields as well? ie, will it use .counts?

I was thinking of using the count by default if the field parameter is not present. Do you see a need for a special value, supported by the field parameter, to represent the count?

> Just wanted to add that we do various manual rate calculations in APM and we also have a manual implementation of a monotonically increasing counter for our garbage collection charts.

We decided to handle monotonically increasing counters as a separate aggregation, which is tracked by #60619. Could you take a look and comment on this issue if you see some functionality that is currently missing?

@dgieselaar
Member

dgieselaar commented Aug 12, 2020

@imotov To clarify, considering the following document type:

{
	"@timestamp": {
		"type": "date"
	},
	"duration": {
		"type": "histogram"
	}
}

I would like to be able to use a rate aggregation as:

{
	"aggs": {
		"by_date": {
			"date_histogram": {
				"field": "@timestamp"
			},
			"aggs": {
				"rate": {
					"rate": {
						"field": "duration"
					}
				}
			}
		}
	}
}

And have the rate based on the count of values stored in the histogram field duration. Would that work out of the box?

@polyfractal
Contributor

@dgieselaar in that scenario, I'm assuming it'd be the total sum of the histo counts in that bucket?

  • an hour bucket
  • two documents with histograms:
    • doc1 histo counts: [1,2,3,4]
    • doc2 histo counts: [5,6,7]

In this case, the total count would be (1+2+3+4+5+6+7)==28 for the two histos, and so the rate would be 28/hr in that bucket.

Is that what you'd expect to happen, or something else?
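
The tally above can be written out as a short Python sketch. It assumes each histogram value carries parallel `values` and `counts` arrays, as the histogram field type does; the `values` shown are made up, and `histogram_rate` is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Dict, List, Sequence

Histogram = Dict[str, List[float]]


def histogram_rate(
    histograms: Sequence[Histogram], bucket_seconds: float, unit_seconds: float
) -> float:
    """Sum every histogram's counts in the bucket and scale to the requested unit."""
    total = sum(sum(h["counts"]) for h in histograms)
    return total * unit_seconds / bucket_seconds


# Two documents in a one-hour bucket; only the counts matter for the rate.
doc1 = {"values": [0.1, 0.2, 0.3, 0.4], "counts": [1, 2, 3, 4]}
doc2 = {"values": [0.1, 0.2, 0.3], "counts": [5, 6, 7]}

per_hour = histogram_rate([doc1, doc2], bucket_seconds=3600, unit_seconds=3600)
print(per_hour)  # 28.0
```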

@dgieselaar
Member

@polyfractal that's right, I should have said sum of counts, not count of values. Thanks for catching that 😃

@polyfractal
Contributor

👍 awesome, sounds good. I started to type out one thing and then confused myself... so figured it'd be good to make sure we were talking about the same thing :)

I'll defer to Igor, who's much deeper in the weeds working on this than me, but that doesn't sound like it would be impossible to implement. It might end up being a separate PR just to keep things simple, though, since the histo implementation is a little different from other fields. But

@imotov
Contributor Author

imotov commented Aug 13, 2020

I think the original plan was to add all values, the way Zach described it.

imotov added a commit that referenced this issue Aug 25, 2020
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
imotov added a commit to imotov/elasticsearch that referenced this issue Nov 6, 2020
In the initial implementation I missed the test to check if bucket
sorting works correctly. This commit adds this test.

Relates to elastic#60674
imotov added a commit that referenced this issue Nov 9, 2020
In the initial implementation I missed the test to check if bucket
sorting works correctly. This commit adds this test.

Relates to #60674