Metric Selector aggregation #48069
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
Potential naming idea:
I'd like to propose an aggregation that "selects" a metric from a document according to some kind of ordering criteria on a second field. For example, you may want the most recent latency value within a date_histogram bucket: in this case, the "metric" is the latency field, and the ordering criteria is timestamp DESC, size: 1.
This is a fairly common use-case which is difficult to accomplish today. top_hits can give you the information, but it fetches an entire document and is not compatible with pipeline aggregations. It is also fairly expensive if many values/documents are being fetched. You can sometimes get the required information with clever usages of other aggs (like a max agg, or scripting) to pull out the document you're looking for, but they are fragile and hacky approaches.
The WeightedAvg agg added support for multiple ValuesSources, so a "metric selector" should not be too difficult to implement.
All naming is tentative, open to better suggestions! :)
Request Syntax
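metric: the field to select the metric value from
sort: the field to sort the metric by
order: how to order the sort field, ascending or descending
size: how many <sort, metric> tuples should be returned (default: 1)
multi_value_mode: how multi-valued metric fields should be collapsed into a single value (default: avg)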
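To make the parameters above concrete, here is a minimal Python sketch that builds a search body using them. The aggregation name metric_selector, the {"field": ...} nesting under metric and sort, the bucket name per_hour, and the helper function itself are all assumptions for illustration; the proposal says naming is tentative, and no such aggregation exists yet.

```python
import json


def metric_selector_request(metric_field, sort_field, order="desc",
                            size=1, multi_value_mode="avg"):
    """Build a search body with a hypothetical metric_selector sub-aggregation."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        "size": 0,  # only aggregation results are wanted, no hits
        "aggs": {
            "per_hour": {
                "date_histogram": {"field": sort_field,
                                   "calendar_interval": "1h"},
                "aggs": {
                    "latest": {
                        # Hypothetical aggregation type and parameter layout.
                        "metric_selector": {
                            "metric": {"field": metric_field},
                            "sort": {"field": sort_field},
                            "order": order,
                            "size": size,
                            "multi_value_mode": multi_value_mode,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(metric_selector_request("latency", "timestamp"), indent=2))
```

With the defaults this expresses the "most recent latency per date_histogram bucket" example: metric latency, ordered by timestamp descending, size 1.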
Response
Note how the sort values are ordered descending per-bucket, and it returns a single metric value for each sort value. There may be 1000 documents in a bucket, but unlike other aggregations this actually returns n individual values from the documents themselves. If there are ties, there would be multiple objects with the same sort.
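The per-bucket behaviour described here can be sketched in plain Python. This is only an illustration of the intended semantics, not an implementation: the function names, the "sort"/"metric" keys in the output, and the set of supported multi_value_mode reducers are assumptions.

```python
from statistics import mean

# Assumed reducers for multi-valued metric fields.
REDUCERS = {"avg": mean, "min": min, "max": max, "sum": sum}


def select_metrics(docs, metric, sort, order="desc", size=1,
                   multi_value_mode="avg"):
    """Return up to `size` <sort, metric> pairs from one bucket's documents."""
    rows = []
    for doc in docs:
        value = doc[metric]
        # Collapse a multi-valued metric field into a single value.
        if isinstance(value, (list, tuple)):
            value = REDUCERS[multi_value_mode](value)
        rows.append({"sort": doc[sort], "metric": value})
    # Each document contributes its own row, so ties on the sort key
    # show up as several rows with the same "sort" value.
    rows.sort(key=lambda row: row["sort"], reverse=(order == "desc"))
    return rows[:size]


bucket = [
    {"timestamp": 1, "latency": 10},
    {"timestamp": 2, "latency": 20},
    {"timestamp": 3, "latency": [30, 50]},  # multi-valued, collapsed by avg
    {"timestamp": 3, "latency": 5},         # tie on the sort key
]
top_two = select_metrics(bucket, "latency", "timestamp", size=2)
```

Here top_two holds two rows that both have sort 3, one per tied document, which matches the tie behaviour described above.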
Misc
We should probably limit size to prevent abuses. It should be fairly easy to track in a breaker, so that might be sufficient. I would feel better if there was a hard/soft limit though :) Like top_hits, this should be used to fetch a handful of values, not an entire index.
[...] keyword, etc).
Since we are sorting asc/desc (e.g. the min or max values of a field), we shouldn't run into top-n accuracy issues like the terms agg can have. Each shard will always send its n min/max values and the coordinator will assemble a global min/max list. It might be that all top n values have the same sort key and others are omitted, but this is not incorrect since we are displaying individual results and not grouping.
/cc @costin @colings86
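The shard/coordinator argument in the Misc notes can be checked with a small simulation. This is a sketch under stated assumptions: shards are modelled as plain lists of (sort, metric) tuples and the function names are hypothetical, not Elasticsearch APIs.

```python
import heapq


def _picker(order):
    # "desc" keeps the largest sort keys, "asc" the smallest.
    return heapq.nlargest if order == "desc" else heapq.nsmallest


def shard_top_n(rows, n, order="desc"):
    """Each shard keeps only its own n best (sort, metric) rows."""
    return _picker(order)(n, rows, key=lambda row: row[0])


def coordinator_merge(shard_results, n, order="desc"):
    """The coordinator merges the per-shard lists into a global top n."""
    merged = [row for shard in shard_results for row in shard]
    return _picker(order)(n, merged, key=lambda row: row[0])


shards = [
    [(5, 1.0), (1, 2.0), (9, 3.0)],
    [(7, 4.0), (8, 5.0)],
    [(2, 6.0)],
]
n = 2
merged = coordinator_merge([shard_top_n(s, n) for s in shards], n)
# Ground truth: top n computed over every row from every shard.
exact = shard_top_n([row for s in shards for row in s], n)
```

Any row in the global top n is also in its own shard's top n, so the merged sort keys match the exact ones. That is why this avoids the accuracy issue the terms agg can have, where per-shard counts for a grouped key may be incomplete.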