Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain shard failure response no longer contains error property #54235

Closed
tylersmalley opened this issue Mar 25, 2020 · 4 comments
Closed

Certain shard failure response no longer contains error property #54235

tylersmalley opened this issue Mar 25, 2020 · 4 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team team-discuss

Comments

@tylersmalley
Copy link
Contributor

This effects 7.7.0+

This is a pretty particular error to hit, so I am assuming the regression was not intentional. It appears that for an invalid scripted field, it is failing only one of the shards and skipping the rest which is resulting in a hits response instead of an error.

This change broke a test we had in Kibana, preventing the promotion of our nightly snapshot. Looking for guidance on if this is an accidental regression or we should update our checks accordingly.

Here is the minimum reproduction I could come up with:

cURL friendly commands

curl -XPUT "http://localhost:9200/test-1" -H 'Content-Type: application/json' -d'{ "settings": { "number_of_shards": 1 }, "mappings": { "properties": { "@timestamp": { "type": "date" } } }}'

curl -XPUT "http://localhost:9200/test-2" -H 'Content-Type: application/json' -d'{ "settings": { "number_of_shards": 1 }, "mappings": { "properties": { "@timestamp": { "type": "date" } } }}'

curl -XPOST "http://localhost:9200/test-*/_search" -H 'Content-Type: application/json' -d'{ "version": true, "size": 500, "sort": [ { "@timestamp": { "order": "desc", "unmapped_type": "boolean" } } ], "script_fields": { "invalid_scripted_field": { "script": { "source": "invalid", "lang": "painless" } } }, "query": { "bool": { "must": [], "filter": [ { "match_all": {} }, { "range": { "@timestamp": { "gte": "2020-03-25T20:26:57.120Z", "lte": "2020-03-25T20:41:57.120Z", "format": "strict_date_optional_time" } } } ] } }}'

PUT /test-1
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      }
    }
  }
}

PUT /test-2
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      }
    }
  }
}

POST /test-*/_search
{
  "version": true,
  "size": 500,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "script_fields": {
    "invalid_scripted_field": {
      "script": {
        "source": "invalid",
        "lang": "painless"
      }
    }
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-03-25T20:26:57.120Z",
              "lte": "2020-03-25T20:41:57.120Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  }
}

This failed search attempt is returning _shards and hits, where the only mention of a failure is within the _shards.failures.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "skipped" : 1,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 0,
        "index" : "test-1",
        "node" : "3XFk8qtTTu-C0cBdpLOBPw",
        "reason" : {
          "type" : "script_exception",
          "reason" : "compile error",
          "script_stack" : [
            "invalid",
            "^---- HERE"
          ],
          "script" : "invalid",
          "lang" : "painless",
          "position" : {
            "offset" : 0,
            "start" : 0,
            "end" : 7
          },
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "variable [invalid] is not defined"
          }
        }
      }
    ]
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

Compare this to the previous behavior, where the response was an error with a root_cause and failed_shards:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "compile error",
        "script_stack" : [
          "invalid",
          "^---- HERE"
        ],
        "script" : "invalid",
        "lang" : "painless",
        "position" : {
          "offset" : 0,
          "start" : 0,
          "end" : 7
        }
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test-1",
        "node" : "3XFk8qtTTu-C0cBdpLOBPw",
        "reason" : {
          "type" : "script_exception",
          "reason" : "compile error",
          "script_stack" : [
            "invalid",
            "^---- HERE"
          ],
          "script" : "invalid",
          "lang" : "painless",
          "position" : {
            "offset" : 0,
            "start" : 0,
            "end" : 7
          },
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "variable [invalid] is not defined"
          }
        }
      }
    ]
  },
  "status" : 400
}
@jimczi jimczi added the :Search/Search Search-related issues that do not fall into other categories label Mar 25, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@jimczi
Copy link
Contributor

jimczi commented Mar 25, 2020

This is a side effect of #53873 that changes the default for pre_filter_shard_size. We try to skip shards that cannot match the query more aggressively by default in 7.7.
However you can reproduce easily in previous versions by setting pre_filter_shard_size to 1 in your request:

POST /test-*/_search?pre_filter_shard_size=1

So this is not a regression per say, but I agree that it is confusing.
Currently we consider that a skipped shard is a successful shard.
This is why the response contains:

"_shards" : {
    "total" : 2,
    "successful" : 1,
    "skipped" : 1,
    "failed" : 1,
}

By default a search request is considered successful if at least one shard is successful.
To make the behavior more consistent with queries that don't skip shards, we should change the heuristic and consider a request successful only if if it has a non-skipped successful shard:
successful > 0 && successful > skipped.
It's currently impossible to skip all shards since we force at least one execution so total is always greater than skipped.

@javanna
Copy link
Member

javanna commented Mar 27, 2020

++ on failing the request if the only shard counted successful was in fact skipped.

I also wonder if we should look at improving the _shards section long-term. I know there are edge cases and that total is not always the sum of successful and failed, but the fact that skipped are also counted as successful is possibly counterintuitive?

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@jimczi
Copy link
Contributor

jimczi commented Dec 4, 2020

That was fixed in #64337, hence closing

@jimczi jimczi closed this as completed Dec 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team team-discuss
Projects
None yet
Development

No branches or pull requests

5 participants