Add 'key' field to 'function_score' query function definition in explanation response #1711

Closed
lrynek opened this issue Dec 13, 2021 · 20 comments · Fixed by #2244 or #2390
Labels
documentation pending, enhancement, Indexing & Search

Comments

@lrynek

lrynek commented Dec 13, 2021

Is your feature request related to a problem? Please describe.
When trying to extract the current function value from the _explanation part of the OpenSearch JSON response (e.g. for debugging or logging purposes), I can do it only by text-matching the script body (and only for functions that are defined by a script; the filter ones are out of reach).

Describe the solution you'd like
I would add a new field key (or whatever name suits best) to the function_score query functions array, as follows:

ACTUAL
(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html)

Request
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "terms": {
              "location.city_id": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}
Response
{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 100.0,
    "hits": [
      {
        "_score": 100.0,
        "_source": {
        },
        "_explanation": {
          "value": 100.0,
          "description": "sum of:",
          "details": [
            {
              "value": 100.0,
              "description": "min of:",
              "details": [
                {
                  "value": 100.0,
                  "description": "function score, score mode [sum]",
                  "details": [
                    {
                      "value": 65.0,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 1.0,
                          "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='return doc['ids'].containsAll(params.ids) ? 1 : 0;', options={}, params={ids=[1,2]}\" and parameters: \n{ids=[1,2]}",
                          "details": []
                        },
                        {
                          "value": 65.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 35.0,
                      "description": "function score, product of:",
                      "details": [
                        {
                          "value": 1.0,
                          "description": "match filter: location.city_id:{1}",
                          "details": []
                        },
                        {
                          "value": 35.0,
                          "description": "product of:",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "constant score 1.0 - no function provided",
                              "details": []
                            },
                            {
                              "value": 35.0,
                              "description": "weight",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

EXPECTED

Request
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
here----->"key": "af59aa50-19f4-45c8-90d2-c1a0b91416e1",
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          },
          "weight": 65
        },
        {
here----->"key": "f4ff6d9e-96d6-401c-8da7-ff99d8228457",
          "filter": {
            "terms": {
              "location.city_id": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}
Response
{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 100,
    "hits": [
      {
        "_score": 100,
        "_source": {},
        "_explanation": {
          "value": 100,
          "description": "sum of:",
          "details": [
            {
              "value": 100,
              "description": "min of:",
              "details": [
                {
                  "value": 100,
                  "description": "function score, score mode [sum]",
                  "details": {
 here-(as-a-key)--->"af59aa50-19f4-45c8-90d2-c1a0b91416e1": {
                      "value": 65,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='return doc['ids'].containsAll(params.ids) ? 1 : 0;', options={}, params={ids=[1,2]}\" and parameters: \n{ids=[1,2]}",
                          "details": []
                        },
                        {
                          "value": 65,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    },
 here-(as-a-key)--->"f4ff6d9e-96d6-401c-8da7-ff99d8228457": {
                      "value": 35,
                      "description": "function score, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "match filter: location.city_id:{1}",
                          "details": []
                        },
                        {
                          "value": 35,
                          "description": "product of:",
                          "details": [
                            {
                              "value": 1,
                              "description": "constant score 1.0 - no function provided",
                              "details": []
                            },
                            {
                              "value": 35,
                              "description": "weight",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

The retrieval of specific computed values will be more precise after such or similar implementation.
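
With the keyed layout proposed above, a client could read a function's value directly instead of scanning descriptions. A minimal sketch in Python, assuming the hypothetical EXPECTED response shape (the `key` field and the keyed `details` object are part of this proposal, not an existing OpenSearch feature; `function_value` is an illustrative name):

```python
from typing import Any, Dict, Optional


def function_value(explanation: Dict[str, Any], key: str) -> Optional[float]:
    """Return the value of the function tagged with `key`, or None if absent.

    Assumes the hypothetical EXPECTED layout: the "function score" node is at
    details[0].details[0], and its "details" is an object keyed by function key.
    """
    try:
        keyed = explanation["details"][0]["details"][0]["details"]
    except (KeyError, IndexError, TypeError):
        return None
    if not isinstance(keyed, dict) or key not in keyed:
        return None
    return float(keyed[key]["value"])


# Trimmed copy of the EXPECTED "_explanation" (inner details omitted).
expected_explanation = {
    "value": 100,
    "description": "sum of:",
    "details": [{
        "value": 100,
        "description": "min of:",
        "details": [{
            "value": 100,
            "description": "function score, score mode [sum]",
            "details": {
                "af59aa50-19f4-45c8-90d2-c1a0b91416e1": {"value": 65, "description": "product of:"},
                "f4ff6d9e-96d6-401c-8da7-ff99d8228457": {"value": 35, "description": "function score, product of:"},
            },
        }],
    }],
}

print(function_value(expected_explanation, "af59aa50-19f4-45c8-90d2-c1a0b91416e1"))
```

The lookup still hard-codes the path to the "function score" node, so it only removes the text matching, not the dependency on the outer nesting.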

Describe alternatives you've considered
🅰️ Another possibility would be to expose the key as a field on each function's explanation entry, if the main proposal is harder to implement:

Request
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
here----->"key": "af59aa50-19f4-45c8-90d2-c1a0b91416e1",
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          },
          "weight": 65
        },
        {
here----->"key": "f4ff6d9e-96d6-401c-8da7-ff99d8228457",
          "filter": {
            "terms": {
              "location.city_id": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}
Response
{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 100.0,
    "hits": [
      {
        "_score": 100.0,
        "_source": {
        },
        "_explanation": {
          "value": 100.0,
          "description": "sum of:",
          "details": [
            {
              "value": 100.0,
              "description": "min of:",
              "details": [
                {
                  "value": 100.0,
                  "description": "function score, score mode [sum]",
                  "details": [
                    {
or-here-------------->"key": "af59aa50-19f4-45c8-90d2-c1a0b91416e1",
(on-first-computed-distinctive-value-level)
                      "value": 65.0,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 1.0,
                          "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='return doc['ids'].containsAll(params.ids) ? 1 : 0;', options={}, params={ids=[1,2]}\" and parameters: \n{ids=[1,2]}",
                          "details": []
                        },
                        {
                          "value": 65.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    },
                    {
or-here-------------->"key": "f4ff6d9e-96d6-401c-8da7-ff99d8228457",
(on-first-computed-distinctive-value-level)
                      "value": 35.0,
                      "description": "function score, product of:",
                      "details": [
                        {
                          "value": 1.0,
                          "description": "match filter: location.city_id:{1}",
                          "details": []
                        },
                        {
                          "value": 35.0,
                          "description": "product of:",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "constant score 1.0 - no function provided",
                              "details": []
                            },
                            {
                              "value": 35.0,
                              "description": "weight",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
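
With alternative 🅰️ the client would still walk the tree, but it could match on the `key` field instead of on description text. A minimal Python sketch, assuming the hypothetical `key` field shown in the response above (`find_by_key` is an illustrative name):

```python
from typing import Any, Dict, Optional

Node = Dict[str, Any]


def find_by_key(node: Node, key: str) -> Optional[Node]:
    """Depth-first search for the explanation node carrying the given "key"."""
    if node.get("key") == key:
        return node
    for child in node.get("details", []):
        found = find_by_key(child, key)
        if found is not None:
            return found
    return None


# Trimmed copy of the alternative-A "_explanation" (leaf details omitted).
explanation_a = {
    "value": 100.0, "description": "sum of:",
    "details": [{
        "value": 100.0, "description": "min of:",
        "details": [{
            "value": 100.0, "description": "function score, score mode [sum]",
            "details": [
                {"key": "af59aa50-19f4-45c8-90d2-c1a0b91416e1", "value": 65.0,
                 "description": "product of:", "details": []},
                {"key": "f4ff6d9e-96d6-401c-8da7-ff99d8228457", "value": 35.0,
                 "description": "function score, product of:", "details": []},
            ],
        }],
    }],
}

node = find_by_key(explanation_a, "f4ff6d9e-96d6-401c-8da7-ff99d8228457")
print(node["value"] if node else None)
```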

🅱️ Alternatively, the keys and their respective values could simply be returned at the root level of the explanation in the response JSON:

Request
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
here----->"key": "af59aa50-19f4-45c8-90d2-c1a0b91416e1",
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          },
          "weight": 65
        },
        {
here----->"key": "f4ff6d9e-96d6-401c-8da7-ff99d8228457",
          "filter": {
            "terms": {
              "location.city_id": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}
Response
{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 100.0,
    "hits": [
      {
        "_score": 100.0,
        "_source": {
        },
        "_explanation": {
          "value": 100.0,
here----->"key_value_pairs": {
(on-the-root-level)
            "af59aa50-19f4-45c8-90d2-c1a0b91416e1": 65.0,
            "f4ff6d9e-96d6-401c-8da7-ff99d8228457": 35.0
          },
          "description": "sum of:",
          "details": [
            {
              "value": 100.0,
              "description": "min of:",
              "details": [
                {
                  "value": 100.0,
                  "description": "function score, score mode [sum]",
                  "details": [
                    {
                      "value": 65.0,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 1.0,
                          "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='return doc['ids'].containsAll(params.ids) ? 1 : 0;', options={}, params={ids=[1,2]}\" and parameters: \n{ids=[1,2]}",
                          "details": []
                        },
                        {
                          "value": 65.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 35.0,
                      "description": "function score, product of:",
                      "details": [
                        {
                          "value": 1.0,
                          "description": "match filter: location.city_id:{1}",
                          "details": []
                        },
                        {
                          "value": 35.0,
                          "description": "product of:",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "constant score 1.0 - no function provided",
                              "details": []
                            },
                            {
                              "value": 35.0,
                              "description": "weight",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Additional context
The feature was originally requested in the Elasticsearch GitHub repository.

@lrynek added the enhancement and untriaged labels Dec 13, 2021
@lrynek
Author

lrynek commented Jan 25, 2022

@anasalkouz hi there! 👋 Only asking whether you consider this enhancement worth implementing in any of the future versions of OpenSearch? Thanks for any update / prognosis on that! 🙂

@dblock
Member

dblock commented Feb 2, 2022

How do you use this today (when do you match on query body)? How would a uuid make that easier?

The details provided seem to be an ordered list. So if "key" was 1, 2, 3, would it be the same or different from a uuid? Note that a summary could easily have a key of 1, 2, 3, .....

@lrynek
Author

lrynek commented Feb 3, 2022

@dblock the problem is that the value is not always present at the same level of the details, so the order doesn't matter: the position of the actual value varies depending on the computation and the type of function at hand. I recursively descend into the _explanation array in PHP and try to determine from the description whether the assigned value is the one I'm looking for.
With a uuid or any other unique value assigned at whatever level of the tree, I would be able to precisely match the value to the intended function. Best of all, however, would be to stop iterating over that array and simply pick the value directly from a key-value map provided at the root of the array.
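
The description-matching workaround described here can be sketched as follows. This is a minimal Python illustration, not the PHP code used in the application; `find_path` is a hypothetical helper, and the sample data is a trimmed copy of the ACTUAL response from the issue description with the script description abbreviated:

```python
from typing import Any, Dict, Tuple

Node = Dict[str, Any]


def find_path(node: Node, fragment: str, path: Tuple[Node, ...] = ()) -> Tuple[Node, ...]:
    """Return the nodes from the root down to the first node (depth-first)
    whose description contains `fragment`; an empty tuple means no match."""
    path = path + (node,)
    if fragment in node.get("description", ""):
        return path
    for child in node.get("details", []):
        found = find_path(child, fragment, path)
        if found:
            return found
    return ()


actual_explanation = {
    "value": 100.0, "description": "sum of:",
    "details": [{
        "value": 100.0, "description": "min of:",
        "details": [{
            "value": 100.0, "description": "function score, score mode [sum]",
            "details": [
                {"value": 65.0, "description": "product of:", "details": [
                    {"value": 1.0, "description": "script score function, computed with script: containsAll(params.ids)", "details": []},
                    {"value": 65.0, "description": "weight", "details": []},
                ]},
                {"value": 35.0, "description": "function score, product of:", "details": [
                    {"value": 1.0, "description": "match filter: location.city_id:{1}", "details": []},
                    {"value": 35.0, "description": "product of:", "details": [
                        {"value": 1.0, "description": "constant score 1.0 - no function provided", "details": []},
                        {"value": 35.0, "description": "weight", "details": []},
                    ]},
                ]},
            ],
        }],
    }],
}

path = find_path(actual_explanation, "containsAll(params.ids)")
raw, weighted = path[-1]["value"], path[-2]["value"]
print(raw, weighted)
```

The node that matches the text carries only the raw function value; the weighted value sits on its parent, which is why the text match alone does not locate the final number.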

@dblock
Member

dblock commented Feb 3, 2022

@lrynek thanks

Feels like explanation should always contain the exact same amount of detail as the number of queries you're trying to explain. In your example you ask to explain 2 functions and the return is an array of 2 items, isn't it? Do you have an example where that is not the case, and is that a bug? Do you have an example where the response doesn't match the layout of the request, so that you have to match by query/script?

I am not against adding a key, but that is an API change and still feels redundant to me.

@lrynek
Author

lrynek commented Feb 4, 2022

@dblock Yes, I know, but I don't mean the order or the top-level number of functions and their respective results. I mean the fact that the computed value can sometimes be present at a deeper level of the tree within one of those two example outputs.
This is due to having or not having the weight param within the function definition.

Here is sample PHP code for the lookup I mean (for the sake of simplicity and security I haven't shared a real query from our system, so maybe it doesn't serve the purpose well):

public function findFactors(FactorInterface ...$factors): Factors
{
    $matchedFactors = [];

    // Per-function explanations; in the sample response this path is
    // "sum of:" -> "min of:" -> "function score, score mode [sum]" -> details.
    $explainedFactorsData = $this->data['details'][0]['details'][0]['details'] ?? [];

    foreach ($factors as $factor)
    {
        $factorName                    = $factor->name();
        $factorMatchingDescriptionPart = $factor->matchingDescriptionPart();

        foreach ($explainedFactorsData as $data)
        {
            // The identifying text may sit on the function node or on its first child.
            $descriptionLevel1 = $data['description'] ?? '';
            $descriptionLevel2 = $data['details'][0]['description'] ?? '';
            $description       = $descriptionLevel1 . $descriptionLevel2;

            if (false !== \strpos($description, $factorMatchingDescriptionPart))
            {
                $value        = (float)($data['value'] ?? 0);
                $initialValue = isset($data['details'][0]['value']) ? (float)($data['details'][0]['value']) : null;
                $weight       = isset($data['details'][1]['value']) ? (float)($data['details'][1]['value']) : null;

                $matchedFactors[] = new Factor($factorName, $value, $initialValue, $weight);
            }
        }
    }

    return new Factors(...$matchedFactors);
}

This is mostly related to the modular approach to OpenSearch query building in our PHP app: from the higher-level code architecture I cannot rely on the order in which functions are added. With unique keys/ids/whatever I would not have to care about the order in which the functions part of the query is built, nor would I be bothered by the actual depth of the final value.
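
The depth difference is visible in the sample response in the issue description: for the script function the "weight" node is a direct child of the function's node, while for the filter function it sits one level deeper. A minimal Python sketch (an illustration only, separate from the PHP code above) that tolerates both shapes by searching for the node described as "weight" at any depth; `find_weight` is a hypothetical helper:

```python
from typing import Any, Dict, Optional

Node = Dict[str, Any]


def find_weight(node: Node) -> Optional[float]:
    """Return the value of the first node described exactly as "weight",
    searching depth-first below `node`; None if no such node exists."""
    for child in node.get("details", []):
        if child.get("description") == "weight":
            return float(child["value"])
        nested = find_weight(child)
        if nested is not None:
            return nested
    return None


# The two function entries from the sample response (script description abbreviated).
script_function = {"value": 65.0, "description": "product of:", "details": [
    {"value": 1.0, "description": "script score function, computed with script: containsAll(params.ids)", "details": []},
    {"value": 65.0, "description": "weight", "details": []},
]}
filter_function = {"value": 35.0, "description": "function score, product of:", "details": [
    {"value": 1.0, "description": "match filter: location.city_id:{1}", "details": []},
    {"value": 35.0, "description": "product of:", "details": [
        {"value": 1.0, "description": "constant score 1.0 - no function provided", "details": []},
        {"value": 35.0, "description": "weight", "details": []},
    ]},
]}

print(find_weight(script_function), find_weight(filter_function))
```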

Regarding your statement:

but that is an API change and still feels redundant to me.

I think it can be implemented as an optional parameter of the function definition, similar to existing weight parameter for API backward compatibility.

I imagine it looks complicated, so I'm open to having a short call on that topic if you think it would be beneficial 🙂

@lrynek
Author

lrynek commented Feb 12, 2022

@dblock wdyt? 🙂

@dblock
Member

dblock commented Feb 14, 2022

A few thoughts.

  • I see 20+ thumbs up on this issue! Sounds like a lot of people want this? I'd love to hear from others why this is important to your application/client!
  • API changes are easy to make and hard to remove, so I am going to argue against this for a bit more, see if there are simpler solutions, and try to convince myself that this is a good change. I am keeping an open mind.
  • Would reproducing the entire function query in the response, next to "details", at all levels, "as is" also solve this? It would remove the need for additional input.
  • Are there scenarios in OpenSearch where a query with N functions produces more than N details?
  • I'd like to ask maybe @kartg or @andrross to pitch into this discussion.

Regarding your statement:

but that is an API change and still feels redundant to me.

I think it can be implemented as an optional parameter of the function definition, similar to existing weight parameter for API backward compatibility.

For sure, but just because we can still doesn't mean we should :)

@andrross
Member

I think it can be implemented as an optional parameter of the function definition, similar to existing weight parameter for API backward compatibility.

I don't love the idea of an optional parameter that changes the structure of the response as the proposed solution would. It means any code that parses the response would have to be able to handle two different structures based on a parameter in the request.

Would reproducing the entire function query in the response, next to "details", at all levels, "as is" also solve this? It would remove the need for additional input.

^ This proposal seems a lot simpler. @lrynek would this solve the problem? There are potential variants of this as well, such as adding a "tag" (or similar) field to the request that then is included as a "tag" field in each corresponding response. The response structure remains the same so it would be backward compatible (assuming an additional field in the response is ignored by anything that doesn't care about it).

@lrynek
Author

lrynek commented Feb 16, 2022

@andrross @dblock Thank you for your detailed responses and help in trying to understand each other 👍

However, it looks like we have not reached a shared understanding yet. It would be awesome (and maybe even faster for reaching an outcome / consensus) to have a short 15-minute call on this topic, wdyt? ☎️ (Zoom / Google Meet / whatever)

I fully understand that the value sits somewhere deep in the details of each function explanation, and that the number of those details matches the number of functions passed in the request ✔️

However, that is not the problem. The problem at hand is to match these explanations precisely without depending only on the order of the functions, because the higher-level application code (as in my case) may have no control over the order of the requested functions: each one is built by a particular service responsible for a single function definition. In that situation it is far more convenient, and in fact the only workable way, to keep that logic segregated and do the matching afterwards, e.g. via the name of the factor (the naming used at the higher level in my app) involved in the query.

@dblock

Would reproducing the entire function query in the response, next to "details", at all levels, "as is" also solve this? It would remove the need for additional input.

It won't work, as it is what we already do, with the only change being that the entire function definition JSON string would be matched against that value, so it is even worse than the current workaround. It also breaks the requirement you are trying to uphold:

@andrross

I don't love the idea of an optional parameter that changes the structure of the response as the proposed solution would. It means any code that parses the response would have to be able to handle two different structures based on a parameter in the request.

But the existing optional weight param does exactly that: it adds yet another level of depth to go through in order to fetch the "final" value... 🤔

Have you checked both the 🅰️ and 🅱️ alternatives provided in the description of this request? It looks like the 🅱️ one fulfills your requirements perfectly: a separate, additional key available on the _explanation root, so BC is preserved, as people normally iterate only over known keys once integrated (key_value_pairs can be named whatever you consider best):

{
  "_explanation": {
    "key_value_pairs": {
      "af59aa50-19f4-45c8-90d2-c1a0b91416e1": 65.0,
      "f4ff6d9e-96d6-401c-8da7-ff99d8228457": 35.0
    },
    "value": 100.0,
    "description": "sum of:",
    "details": []
  }
}

@dblock
Member

dblock commented Feb 16, 2022

@andrross @dblock Thank you for your detailed responses and help in trying to understand each other 👍

Generally, I'd rather not, because it's important we provide visibility to the community in all discussions and I don't know how else to do it than in writing. If you still want to just brainstorm, I'd be happy to jump on a call, dblock[at]amazon[dot]com, I'm in EST, and I can loop in @andrross.

Back to the problem: I actually think we totally understand each other. Additional input does seem to solve your problem in one easy way, but it is an API change and we must think hard about such things and avoid introducing something that will be here for years. My goal is to provide input and feedback to you so you (or someone else) can implement the best possible solution.

I fully get and understood that we have somewhere deep in the details of each function explanation and those details count matches with the number of functions passed in the request ✔️

However it's not what is a problem. The problem at hand is to match these explanations precisely not depending only on the order of functions as you would have in the higher level application code (as in my case) no control over the order of requested functions mapped to the particular service responsible for the singular function definition building. In that matter is far more convenient and even only this way possible to have that logic segregated and matching afterwards i.e. via the name of the factor (the naming used on higher level in my app) involved in the query.

This still sounds like a convenience, and something that can be solved in your application. I am reading that you are saying "I have a client that doesn't know in which order it passed the query parts in, so it cannot rely on the order of the explanation returned, please modify the service so that I don't have to make any changes in the client to track the order of queries". Am I reading this correctly?

If so, that's the textbook definition of "designing an API for a specific client" and is an anti-pattern.

Reproducing the query does make changing the client a bit easier because you would not need to compare strings, but I understand that doesn't solve the order problem.
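
The client-side alternative of tracking the order could look roughly like this. It is a minimal Python sketch, assuming the explanation contains exactly one entry per function in request order (an assumption this thread questions but does not settle); the factor names and both helper functions are hypothetical:

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def split_named(named: List[Tuple[str, Dict[str, Any]]]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Split (name, function) pairs into names and the `functions` array, same order."""
    return [name for name, _ in named], [fn for _, fn in named]


def values_by_name(names: List[str], function_score_node: Node) -> Dict[str, float]:
    """Pair each remembered name with the explanation entry at the same index."""
    details = function_score_node.get("details", [])
    if len(details) != len(names):
        raise ValueError("explanation entries do not line up with the functions sent")
    return {name: float(entry["value"]) for name, entry in zip(names, details)}


named_functions = [
    ("ids_match", {
        "script_score": {"script": {
            "lang": "painless",
            "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
            "params": {"ids": [1, 2]},
        }},
        "weight": 65,
    }),
    ("city_match", {"filter": {"terms": {"location.city_id": ["1"]}}, "weight": 35}),
]
names, functions = split_named(named_functions)  # `functions` goes into the request

# The "function score, score mode [sum]" node from the sample response, trimmed.
function_score_node = {
    "value": 100.0,
    "description": "function score, score mode [sum]",
    "details": [
        {"value": 65.0, "description": "product of:", "details": []},
        {"value": 35.0, "description": "function score, product of:", "details": []},
    ],
}
print(values_by_name(names, function_score_node))
```

The length check is there because positional matching silently mislabels values if the two lists ever disagree.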

I don't love the idea of an optional parameter that changes the structure of the response as the proposed solution would. It means any code that parses the response would have to be able to handle two different structures based on a parameter in the request.

But actual optional weight param is doing exactly that - adding yet another level of depth in order to fetch the "final" value... 🤔

This is not a fair comparison because weight is meaningful input to ranking, as opposed to an ID which will be unused by the query engine other than to be reproduced in the results.

Have you checked both 🅰️ and 🅱️ alternatives provided in the description of this request? As looks like the 🅱️ one is fulfilling your requirements perfectly (a separate key available on the _explanation root, the additional one, so BC fulfilled, as normally people iterate only over known keys once integrated // key_value_pairs can be whatever you consider best)

{
  "_explanation": {
    "key_value_pairs": {
      "af59aa50-19f4-45c8-90d2-c1a0b91416e1": 65.0,
      "f4ff6d9e-96d6-401c-8da7-ff99d8228457": 35.0
    },
    "value": 100.0,
    "description": "sum of:",
    "details": []
  }
}

I have read that carefully and I disagree with the approach for the many reasons I gave above.

@lrynek

  1. Do you know other products that have an explain function that takes keys/tags/names on input and reproduces them on output?
  2. Can we please consider more alternatives to A and B, such as modifying your application? Do you see any?

@lrynek
Author

lrynek commented Feb 17, 2022

@dblock

Generally, I'd rather not, because it's important we provide visibility to the community in all discussions and I don't know how else to do it than in writing. If you still want to just brainstorm, I'd be happy to jump on a call, dblock[at]amazon[dot]com, I'm in EST, and I can loop in @andrross.

It wasn't intended to slip around this GitHub issue thread and prevent full transparency - only a brainstorm and of course I can add a summary back here after that brainstorming call 👍 I'm in CEST timezone, so very close to yours, please provide any timing that works best for you guys to have a call on the topic.

This still sounds like a convenience, and something that can be solved in your application. I am reading that you are saying "I have a client that doesn't know in which order it passed the query parts in, so it cannot rely on the order of the explanation returned, please modify the service so that I don't have to make any changes in the client to track the order of queries". Am I reading this correctly?

Yes, in terms of maintaining the order itself it's exactly this, but please also take into account the varying depth of the final calculated value. So the convenience covers not only the order but the depth as well.

If so, that's the textbook definition of "designing an API for a specific client" and is an anti-pattern.

Although I fully agree that one-client API design is an anti-pattern, I disagree that ours is a one-client case. It concerns every client that uses any high-level compound solution and would benefit from such a feature. If we considered only direct JSON calls and responses with explanation for direct usage, implementing such a feature wouldn't make sense to me either. However, considering the ease of use at that higher level, I still think it's fair to consider such a move.

This is not a fair comparison because weight is meaningful input to ranking, as opposed to an ID which will be unused by the query engine other than to be reproduced in the results.

OK, agree 👍

  1. Do you know other products that have an explain function that takes keys/tags/names on input and reproduces them on output?

No, it's the first time I have run into such a situation, so I cannot offer any comparison of this sort offhand // maybe someone else following this issue has one? 🤔

  2. Can we please consider more alternatives to A and B, such as modifying your application? Do you see any?

I already have them implemented in the code 😉 But I considered both script parts matching and the order depth manual lookup something that can be leveraged by the API itself as, exactly, a convenience of use. That was the main and only reason to file this issue and as such it won't make any sense to have an outcome decision to go back to the application where I've already been 😜

If you don't mind, a call to brainstorm on this at some point would be awesome. If not, I'll accept the refusal, but we'll still be facing what is IMHO unnecessary hassle and complexity in retrieving explained values. The engine itself knows exactly the level and depth of the final calculated value. For me, it could even repeat that value at a higher level, without any dynamic input key from my query, and that would be far more convenient than the current version. I'm open to any conversation on that, thx! 🙂
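For readers following along, the client-side lookup discussed above (tracking order and depth by hand) might look roughly like the sketch below. It is only an illustration: the helper names `walk` and `function_values` are hypothetical, the sample explanation is trimmed, and it assumes every function matched the document so that positions line up with the request, which is exactly the fragile part.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]


def walk(node: Node, path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], Node]]:
    """Yield (path, node) for every node of an _explanation tree, depth-first."""
    yield path, node
    for index, child in enumerate(node.get("details", [])):
        yield from walk(child, path + (index,))


def function_values(explanation: Node) -> List[float]:
    """Return the direct child values of the first 'function score, score mode'
    node. Mapping them to request functions by position is an assumption that
    only holds if no function was skipped for this document."""
    for _path, node in walk(explanation):
        if node["description"].startswith("function score, score mode"):
            return [child["value"] for child in node.get("details", [])]
    return []


# Hypothetical, trimmed explanation in the shape discussed in this thread.
explanation = {
    "value": 35.0,
    "description": "min of:",
    "details": [
        {
            "value": 35.0,
            "description": "function score, score mode [sum]",
            "details": [
                {"value": 0.0, "description": "product of:", "details": []},
                {"value": 35.0, "description": "function score, product of:", "details": []},
            ],
        },
        {"value": 3.4028235e38, "description": "maxBoost", "details": []},
    ],
}

values = function_values(explanation)
```

The client has to know that the per-function values sit one level below the "score mode" node and has to remember the order in which it sent the functions, which is the bookkeeping this issue asks the engine to make unnecessary.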

@reta
Collaborator

reta commented Feb 17, 2022

@dblock @lrynek @andrross I have run into a couple of scenarios that come down to the same problem (trying to match input and output), and it looks to me like OpenSearch already has a good foundation for that: named queries [1]. At the moment it does not work exactly this way, but if we extend the existing implementation to propagate the query / filter / function names back into the explanation block (we would need to introduce named functions), it should by and large address the problem (not sure how difficult that is, though). It is very close to option 🅰️. Does it look like a viable alternative, or am I missing something?

[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/query-dsl-bool-query.html#named-queries
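As a rough illustration of the existing named queries feature referenced in [1]: a query clause carries a `_name`, and each hit lists the names that matched under `matched_queries`. The sketch below only builds such a request body and reads names from a hit; the field names, query text, helper names, and the sample hit are all made up for the example.

```python
from typing import Any, Dict, Set


def named_match(field: str, text: str, name: str) -> Dict[str, Any]:
    """Build a match clause tagged with a _name (hypothetical helper)."""
    return {"match": {field: {"query": text, "_name": name}}}


# Request body with two named clauses; field names are illustrative only.
request_body = {
    "query": {
        "bool": {
            "should": [
                named_match("title", "search", "title_match"),
                named_match("body", "search", "body_match"),
            ]
        }
    }
}

# Hypothetical hit: only the clause named "title_match" matched this document.
hit = {"_id": "1", "_score": 0.5, "matched_queries": ["title_match"]}


def matched_names(hit: Dict[str, Any]) -> Set[str]:
    """Return the set of query names reported for a single hit."""
    return set(hit.get("matched_queries", []))


names = matched_names(hit)
```

Today this only says which named clauses matched; it does not say what each clause contributed to the score, which is why propagating names into the explanation block is the missing piece.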

@lrynek
Author

lrynek commented Feb 17, 2022

@reta that would work perfectly for me. @dblock, this is the missing example of request / response custom tag propagation you were looking for.

The same _name property is perfectly usable for me, as it would be consistent with the approach already implemented for queries (here it would apply to functions) 👍

@andrross
Member

Expanding the named queries approach seems like a great solution to me. It's an existing solution to a very similar problem so expanding it to functions is a natural extension of the feature.

@reta
Collaborator

reta commented Feb 22, 2022

Looking into it ... :-)

@reta
Collaborator

reta commented Feb 23, 2022

@lrynek @andrross things turned out to be a bit more complicated than expected: the Explanations actually come from Apache Lucene and are not extensible. I have opened an improvement request [1] and am awaiting feedback. But this is where I have got to so far with named scripts support:

{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['user_name'].contains('cash') ? 1 : 0;",
              "_name": "script1"
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "terms": {
              "_name": "terms_filter",
              "abc": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

For now the _name is returned as part of the description, because of [1]:

"description": "script score function, computed with script:\"Script{type=inline, 
lang='painless', idOrCode='return doc['user_name'].contains('cash') ? 1 : 0;', options={}, 
params={}, _name='script1'}\"",
"description": "match filter: abc:{1}, _name=terms_filter",

The complete explanation looks like this:

{
    ...
    "_explanation": {
        "value": 35.0,
        "description": "min of:",
        "details": [
            {
                "value": 35.0,
                "description": "function score, score mode [sum]",
                "details": [
                    {
                        "value": 0.0,
                        "description": "product of:",
                        "details": [
                            {
                                "value": 0.0,
                                "description": "script score function, computed with script:\"Script{type=inline, 
lang='painless', idOrCode='return doc['user_name'].contains('cash') ? 1 : 0;', options={}, 
params={}, _name='script1'}\"",
                                "details": [
                                    {
                                        "value": 1.0,
                                        "description": "_score: ",
                                        "details": [
                                            {
                                                "value": 1.0,
                                                "description": "*:*",
                                                "details": []
                                            }
                                        ]
                                    }
                                ]
                            },
                            {
                                "value": 65.0,
                                "description": "weight",
                                "details": []
                            }
                        ]
                    },
                    {
                        "value": 35.0,
                        "description": "function score, product of:",
                        "details": [
                            {
                                "value": 1.0,
                                "description": "match filter: abc:{1}, _name=terms_filter",
                                "details": []
                            },
                            {
                                "value": 35.0,
                                "description": "product of:",
                                "details": [
                                    {
                                        "value": 1.0,
                                        "description": "constant score 1.0 - no function provided",
                                        "details": []
                                    },
                                    {
                                        "value": 35.0,
                                        "description": "weight",
                                        "details": []
                                    }
                                ]
                            }
                        ]
                    }
                ]
            },
            {
                "value": 3.4028235E38,
                "description": "maxBoost",
                "details": []
            }
        ]
    }
}

[1] https://issues.apache.org/jira/browse/LUCENE-10432
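With the names embedded in the description text as shown above, a client could pull them out with a small tree walk. This is a sketch only: the description format comes from the work-in-progress output in this comment and may well change, and the regular expression and the `values_by_name` helper are assumptions made for illustration.

```python
import re
from typing import Any, Dict, Optional

Node = Dict[str, Any]

# Matches both _name='script1' (quoted) and _name=terms_filter (unquoted).
NAME_RE = re.compile(r"_name='?([\w.\-]+)'?")


def values_by_name(node: Node, parent: Optional[Node] = None,
                   out: Optional[Dict[str, Dict[str, Any]]] = None) -> Dict[str, Dict[str, Any]]:
    """Map each _name found in a description to the value of that node and of
    its parent (the parent is where the weight is applied in this sample)."""
    if out is None:
        out = {}
    match = NAME_RE.search(node["description"])
    if match:
        out[match.group(1)] = {
            "value": node["value"],
            "parent_value": parent["value"] if parent else None,
        }
    for child in node.get("details", []):
        values_by_name(child, node, out)
    return out


# Trimmed sample modelled on the explanation above; the script source is shortened.
explanation = {
    "value": 35.0,
    "description": "function score, score mode [sum]",
    "details": [
        {
            "value": 0.0,
            "description": "product of:",
            "details": [
                {
                    "value": 0.0,
                    "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='return 1;', options={}, params={}, _name='script1'}\"",
                    "details": [],
                },
                {"value": 65.0, "description": "weight", "details": []},
            ],
        },
        {
            "value": 35.0,
            "description": "function score, product of:",
            "details": [
                {"value": 1.0, "description": "match filter: abc:{1}, _name=terms_filter", "details": []},
                {"value": 35.0, "description": "product of:", "details": []},
            ],
        },
    ],
}

by_name = values_by_name(explanation)
```

Parsing free-text descriptions is brittle, so this is best treated as a stopgap until the names are exposed in a more structured way.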

@dblock
Member

dblock commented Mar 1, 2022

Thanks @reta for resolving our conversation productively! Looking forward to a working _name implementation.

@lrynek
Author

lrynek commented Mar 4, 2022

@dblock @reta @andrross Thank you guys! ❤️ 🎉

@reta
Collaborator

reta commented Mar 7, 2022

Documentation: opensearch-project/documentation-website#432

This was referenced Apr 25, 2022
Labels
documentation pending Tracks issues which have PRs merged but documentation changes pending enhancement Enhancement or improvement to existing feature or request Indexing & Search
Projects
None yet
5 participants