Cardinality aggregation missing from response inconsistently #64142

Closed
noamkush opened this issue Oct 26, 2020 · 4 comments · Fixed by #64214
Assignees
nik9000
Labels
:Analytics/Aggregations >bug Team:Analytics

Comments

@noamkush

Elasticsearch version: 7.9.1 (reproducible on demo.elastic.co)

Description of the problem including expected versus actual behavior:
The response to the exact same search request sometimes contains the cardinality aggregation and sometimes it doesn't; the issue is transient.

Steps to reproduce:
The following request (on demo.elastic.co):

POST _search?request_cache=false
{
  "aggs": {
    "date_range": {
      "aggs": {
        "unique_users": {
          "cardinality": {
            "field": "agent.name"
          }
        }
      },
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          {
            "from": "2020-09-01T00:00:00.000000+00:00"
          }
        ]
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "agent.id": []
          }
        }
      ]
    }
  }
}

Sometimes results in:

{
  "took" : 962,
  "timed_out" : false,
  "_shards" : {
    "total" : 480,
    "successful" : 480,
    "skipped" : 477,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_range" : {
      "buckets" : [
        {
          "key" : "2020-09-01T00:00:00.000Z-*",
          "from" : 1.5989184E12,
          "from_as_string" : "2020-09-01T00:00:00.000Z",
          "doc_count" : 0,
          "unique_users" : {
            "value" : 0
          }
        }
      ]
    }
  }
}

And other times in:

{
  "took" : 444,
  "timed_out" : false,
  "_shards" : {
    "total" : 480,
    "successful" : 480,
    "skipped" : 478,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_range" : {
      "buckets" : [
        {
          "key" : "2020-09-01T00:00:00.000Z-*",
          "from" : 1.5989184E12,
          "from_as_string" : "2020-09-01T00:00:00.000Z",
          "doc_count" : 0
        }
      ]
    }
  }
}

Notice the unique_users metric missing from the second response. It doesn't take more than a few tries to get both responses.

noamkush added >bug and needs:triage labels Oct 26, 2020
iverase added :Analytics/Aggregations and removed needs:triage labels Oct 26, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

elasticmachine added the Team:Analytics label Oct 26, 2020
@nik9000
Member

nik9000 commented Oct 26, 2020

This sounds like something I may have broken when I was reworking the timing of aggs. It may be fixed by #64016, which extends a fix I made for a similar-looking bug.

nik9000 self-assigned this Oct 27, 2020
@nik9000
Member

nik9000 commented Oct 27, 2020

I've created a local reproduction and confirmed that it still hits master. This happens when a few things line up:

  1. Some shards map the field that you are running the range on and some do not.
  2. You are querying enough shards to kick in the can_match phase (128).
  3. Your query matches no docs and we recognize this during the can_match phase.
  4. The shards that don't map the field for the range sort before those that do map the field for the range.

All of that causes us to run the "Range.Unmapped" special aggregator, which mistakenly doesn't include its sub-aggs. I'll open a fix for it.
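
One way to check which aggregator implementation a shard actually ran is the search profile API. Here's a sketch reusing the request from the report; the exact type string reported for the unmapped range implementation is an assumption and may vary by version:

# same search as in the report, with profiling enabled
POST _search?request_cache=false
{
  "profile": true,
  "aggs": {
    "date_range": {
      "aggs": {
        "unique_users": {
          "cardinality": {
            "field": "agent.name"
          }
        }
      },
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          {
            "from": "2020-09-01T00:00:00.000000+00:00"
          }
        ]
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "agent.id": []
          }
        }
      ]
    }
  }
}

The aggregator class for each shard that executed shows up under profile.shards[*].aggregations[*].type, so a shard that fell back to the unmapped range implementation should be visible there.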

@nik9000
Copy link
Member

nik9000 commented Oct 27, 2020

It's a bit simpler than that, I just discovered: all you need is for the date_range to target an index that doesn't map the target field. That'll get you the same results. can_match comes up on demo.elastic.co because it has lots of shards, and it makes the results more confusing because, well, some shards do map that field, so you'd expect not to use the unmapped implementation. But can_match kicks in here to help, and we confuse things with a bad aggregator impl.
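
Based on that, a minimal local reproduction could look like the following, where repro-unmapped is a hypothetical index whose mapping deliberately never defines @timestamp, forcing the date_range agg onto the unmapped implementation:

# hypothetical index; @timestamp is deliberately not mapped
PUT repro-unmapped
{
  "mappings": {
    "properties": {
      "message": { "type": "keyword" }
    }
  }
}

# date_range targets the unmapped @timestamp field
POST repro-unmapped/_search
{
  "aggs": {
    "date_range": {
      "aggs": {
        "unique_users": {
          "cardinality": {
            "field": "agent.name"
          }
        }
      },
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          { "from": "2020-09-01T00:00:00.000000+00:00" }
        ]
      }
    }
  }
}

Before the fix, the single range bucket comes back with doc_count 0 and no unique_users entry at all; after it, the sub-agg should be present with value 0.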

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Oct 27, 2020
Now that we're consistently using `can_match` to filter which shards we
run on, we can get this confusing case:
1. You have a search with, say, a range and a sub-agg.
2. That search has a query that `can_match` can recognize will match no
   docs. On *any* shard.
3. So we dutifully run it on a single shard so it can produce the
   "empty" aggs.
4. The shard we pick happens to not have the target of the range mapped.
5. This kicks in the special range aggregator that doesn't collect any
   documents.
6. Before this commit, that range aggregator *also* never produced any
   sub-aggs.

So, without this change, it was quite possible for a search that
happened to match no documents to "throw away" the sub-aggs of a range
and a few other aggs.

We've had this problem for a long, long time but it is more confusing
now because `can_match` is really kicking in and causing us to see cases
where it looks like you are targeting a lot of shards but you really are
only targeting a couple. It used to be that to get the "no sub-aggs"
behavior you had to explicitly target only shards that didn't map the
target field of the `range` agg. And, like, in that case it isn't too
bad because you targeted a sort of degenerate shard. But now that
`can_match` is doing its thing you can end up with the confusing steps
above. It took me several hours to track down what was happening, even
though I know how the individual pieces of all of this work. It took
four hours to figure out how they fit together in this case....

Anyway! This replaces all the aggregator implementations that throw out
the sub-aggregators with ones that keep them. I think this'll be less
confusing in the future.

Closes elastic#64142
nik9000 added a commit that referenced this issue Oct 27, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Oct 27, 2020
…4214)

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Oct 27, 2020
…4214)

nik9000 added a commit that referenced this issue Oct 28, 2020
…64247)

nik9000 added a commit that referenced this issue Oct 29, 2020
…64244)
