[Ingest Node Pipelines] More detailed _simulate response #56004

jloleysens · 2020-04-30T08:44:54Z

Describe the feature:

In Kibana, a new UI was built to support management of ingest node pipelines (elastic/kibana#62321). The UI also gives users a way of simulating the pipeline while creating or editing it.

The simulate UI we are envisioning will provide users with detailed information about the path each document has traveled through the simulated pipeline, including the following:

1. An indication of all processors that ran against the document. This will enable a tree view of the set (or subset) of processors.
2. Whether each processor failed or did not pass its conditional (if). This will enable ✅ and ❌ indications at each processor. Critically, it would be important to know why something went wrong.
3. For a specific processor, what was updated in the document (or a point-in-time look at the document post-processor). Something like this would enable a visual diff at each step.
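To make the last point concrete, here is a minimal sketch of how a consumer could derive a per-step diff from the point-in-time `doc` snapshots. It is written in Python purely for illustration; `source_diff` and `step_diffs` are hypothetical helper names, not part of Elasticsearch or Kibana, and only top-level `_source` keys are compared.

```python
def source_diff(before, after):
    """Compare two _source dicts by top-level key."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


def step_diffs(original_source, processor_results):
    """Return one diff per processor result, each relative to the previous snapshot."""
    diffs = []
    previous = original_source
    for result in processor_results:
        doc = result.get("doc")
        if doc is None:
            # No snapshot for this step, so there is nothing to diff against.
            diffs.append(None)
            continue
        current = doc["_source"]
        diffs.append(source_diff(previous, current))
        previous = current
    return diffs


# Snapshots shaped like the processor_results entries in a verbose response.
original = {"foo": "bar"}
results = [
    {"tag": "THIS SHOULD BE THERE",
     "doc": {"_source": {"field3": "_value3", "foo": "bar"}}},
    {"tag": "field1_renamer",
     "doc": {"_source": {"field3": "_value3", "foo": "bar",
                         "field4": "THIS SHOULD BE THERE FROM FAILURE"}}},
]
for result, diff in zip(results, step_diffs(original, results)):
    print(result["tag"], diff)
```

A diff like this shows what changed at a step, but not which processor (for example an on_failure handler) made the change.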

On ES 8.0.0 snapshot this is an example request-response pair:

Request

{
	"pipeline": {
		"description": "_description",
		"processors": [
			{
				"set": {
					"field": "field3",
					"value": "_value3",
					"tag": "THIS SHOULD BE THERE"
				}
			},
			{
				"rename": {
					"if": "ctx.foo == 'bar'",
					"field": "foo1",
					"target_field": "fieldA",
					"tag": "field1_renamer",
					"on_failure": [
						{
							"set": {
								"field": "field4",
								"value": "THIS SHOULD BE THERE FROM FAILURE",
								"tag": "BLAH TEST"
							}
						}
					]
				}
			},
			{
				"set": {
					"field": "field3",
					"value": "_value3",
					"tag": "THIS SHOULD BE THERE"
				}
			}
		]
	},
	"docs": [
		{
			"_index": "index",
			"_id": "id",
			"_source": {
				"foo": "bar"
			}
		},
		{
			"_index": "index",
			"_id": "id",
			"_source": {
				"foo": "123"
			}
		}
	]
}

Response

{
  "docs": [
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "field1_renamer",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        }
      ]
    },
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        }
      ]
    }
  ]
}

Given the above, we do not have a way of mapping these results back to the submitted pipeline to achieve 1 and 2, only a less detailed version of 3. (Where does THIS SHOULD BE THERE FROM FAILURE actually come from?)

One solution is for the response to be a structural mirror of the pipeline submitted to simulate. This would make it simplest for consumers to map the result tree back to the submitted tree, since each path would map back to a specific processor.

Alternatively, a flat structure could still work if we use tag as a placeholder for a serialised path which points to the processor in the submitted pipeline (e.g., 0.on_failure.1). However, tag will still be exposed to users as a field they can enter values into (so we would be hijacking it for the call to _simulate). See this issue (#56000) concerning the multiple concerns of tag.
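A minimal sketch of the serialised-path idea, in Python for illustration only. `tag_with_paths` is a hypothetical helper, not an existing API. It assumes each processor is a single-key object and only descends into processor-level on_failure lists (a pipeline-level on_failure is not handled). Note that it overwrites any tag the user entered, which is exactly the hijacking concern described above.

```python
import copy


def _assign(processors, prefix):
    """Write a dotted path into the tag of every processor, in place."""
    for index, processor in enumerate(processors):
        path = f"{prefix}{index}"
        # Each processor is a single-key object: {"<type>": {...options...}}
        for options in processor.values():
            options["tag"] = path  # overwrites any user-supplied tag
            _assign(options.get("on_failure", []), f"{path}.on_failure.")


def tag_with_paths(processors):
    """Return a copy of the processors with each tag set to its serialised path."""
    tagged = copy.deepcopy(processors)
    _assign(tagged, "")
    return tagged


processors = [
    {"set": {"field": "field3", "value": "_value3"}},
    {"rename": {
        "field": "foo1",
        "target_field": "fieldA",
        "on_failure": [{"set": {"field": "field4", "value": "from failure"}}],
    }},
]
tagged = tag_with_paths(processors)
print(tagged[1]["rename"]["on_failure"][0]["set"]["tag"])  # 1.on_failure.0
```

The copy sent to _simulate carries the paths, while the user's own pipeline definition is left untouched.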

Assistance here would be greatly appreciated.

CC @jakelandis @talevy @cjcenizal


elasticmachine · 2020-04-30T13:56:50Z

Pinging @elastic/es-core-features (:Core/Features/Ingest)

If a conditional is added to a processor, and that processor fails, and that processor has an on_failure handler, the full trace of all of the executed processors may not be displayed in simulate verbose. The information is correct, but some of the steps used to get there are not displayed. This happens because a conditional processor is a wrapper around the real processor, and a processor with an on_failure handler is also a wrapper around the processor(s). When decorating for simulation we treat compound processors specially, but if a compound processor is wrapped by a conditional processor, that compound processor's processors can be missed for decoration, resulting in the missing steps. The fix is to treat the conditional processor specially and explicitly separate it from the processor it is wrapping. This requires us to keep track of two processors: a possible conditional processor and the actual processor it may be wrapping. related: #56004


elasticmachine · 2020-06-03T16:01:24Z

Pinging @elastic/es-ui (:ES-UI)

jakelandis · 2020-06-17T22:48:13Z

#56478 fixed a bug that may have led to missing information from the verbose output.
#57906 introduces a description for each processor and #58207 will echo that in the verbose output.

With the bug fixed and description displayed, we still need:

display a processor if the if condition resulted in false
better handling when the drop processor is executed (today it is just a null in the output)

I would like to propose to introduce to the output:

a new status field with the options success, error, error_ignored, skipped, dropped
a new if object that shows the condition and the resultant status
a new processor_type to help identify what kind of processor it is.

For example, for the given input (success):

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "1 + 1 == 2",
          "field": "a",
          "value": true,
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

the verbose output is:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "success",
          "processor_type" : "set",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:22:59.338538Z"
            }
          }
        }
      ]
    }
  ]
}

For the given input (skipped):

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "1 + 1 == 3",
          "field": "a",
          "value": true,
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

the output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "skipped",
          "processor_type" : "set",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 3",
            "result" : false
          }
        }
      ]
    }
  ]
}

input (error_ignored):

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "if": "1 + 1 == 2",
          "target_field": "a",
          "field": "b",
          "ignore_failure": true, 
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "error_ignored",
          "processor_type" : "rename",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "ignored_error" : {
            "error" : {
              "root_cause" : [
                {
                  "type" : "illegal_argument_exception",
                  "reason" : "field [b] doesn't exist"
                }
              ],
              "type" : "illegal_argument_exception",
              "reason" : "field [b] doesn't exist"
            }
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : { },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:26:07.137658Z"
            }
          }
        }
      ]
    }
  ]
}

input (error)

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "if": "1 + 1 == 2",
          "target_field": "a",
          "field": "b",
          "ignore_failure": false, 
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "error",
          "processor_type" : "rename",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "error" : {
            "root_cause" : [
              {
                "type" : "illegal_argument_exception",
                "reason" : "field [b] doesn't exist"
              }
            ],
            "type" : "illegal_argument_exception",
            "reason" : "field [b] doesn't exist"
          }
        }
      ]
    }
  ]
}

input (dropped)

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "drop": {
          "if": "1 + 1 == 2",
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "dropped",
          "processor_type" : "drop",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          }
        }
      ]
    }
  ]
}

dropped and error are terminal, such that they will always be the last item displayed (if it exists).
status will always be present, and it is safe to assume that skipped = if evaluated to false.
the if object will only appear if there is an if condition defined
the 'description', 'tag' will only show if they are defined
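As a rough sketch of how a consumer might rely on these rules (Python, for illustration only): `final_outcome` and `skipped_conditions` are hypothetical helper names, and the code assumes the proposed output shape shown above, in particular that status is always present and that a skipped entry always carries an if object.

```python
TERMINAL_STATUSES = {"dropped", "error"}


def final_outcome(processor_results):
    """Return 'dropped', 'error', or 'completed' for one simulated document."""
    last_position = len(processor_results) - 1
    for position, result in enumerate(processor_results):
        # dropped and error are terminal, so they may only appear last.
        if result["status"] in TERMINAL_STATUSES and position != last_position:
            raise ValueError(
                f"terminal status {result['status']!r} at position {position} is not last"
            )
    if processor_results and processor_results[-1]["status"] in TERMINAL_STATUSES:
        return processor_results[-1]["status"]
    return "completed"


def skipped_conditions(processor_results):
    """List the if conditions that evaluated to false (status == skipped)."""
    return [
        result["if"]["condition"]
        for result in processor_results
        if result["status"] == "skipped"
    ]


results = [
    {"status": "skipped", "processor_type": "set",
     "if": {"condition": "1 + 1 == 3", "result": False}},
    {"status": "dropped", "processor_type": "drop",
     "if": {"condition": "1 + 1 == 2", "result": True}},
]
print(final_outcome(results), skipped_conditions(results))
```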

There is a bit of an oddity with the pipeline processor: the pipeline processor itself doesn't show its description or tag. Support for the pipeline processor via simulate verbose has some pre-existing shortcomings. It works, but it requires a real pipeline to call out to, and unless you use the tag or description in creative ways there is no provenance indicating which pipeline that processor originated from.

Here is a more verbose example:

PUT _ingest/pipeline/mypipeline
{
  "processors": [
    {
      "set" : {
        "field": "mypipelineprocessor",
        "value": true,
        "description": "description 4"
      }
    }
  ]
}

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "1 + 1 == 2",
          "field": "a",
          "value": true,
          "description": "description one"
        }
      },
      {
        "rename": {
          "target_field": "ggg",
          "field": "hhhh",
          "ignore_failure": true,
          "description": "description two"
        }
      },
      {
        "pipeline": {
          "description": "description three",
          "name": "mypipeline"
        }
      },
      {
        "drop": {
          "if": "9 * 2 == 11",
          "description": "description five"
        }
      },
      {
        "set": {
          "description": "description six",
          "if": "1 + 1 == 3",
          "field": "b",
          "value": true,
          "tag" : "my tag"
        }
      },
      {
        "drop": {
          "if": "9 * 2 == 18",
          "description": "description seven"
        }
      },
      {
        "set": {
          "description": "description eight",
          "field": "c",
          "value": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "success",
          "processor_type" : "set",
          "description" : "description one",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:38:26.837775Z"
            }
          }
        },
        {
          "status" : "error_ignored",
          "processor_type" : "rename",
          "description" : "description two",
          "ignored_error" : {
            "error" : {
              "root_cause" : [
                {
                  "type" : "illegal_argument_exception",
                  "reason" : "field [hhhh] doesn't exist"
                }
              ],
              "type" : "illegal_argument_exception",
              "reason" : "field [hhhh] doesn't exist"
            }
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:38:26.837775Z"
            }
          }
        },
        {
          "status" : "success",
          "processor_type" : "set",
          "description" : "description 4",
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "mypipelineprocessor" : true,
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "mypipeline",
              "timestamp" : "2020-06-17T22:38:26.837775Z"
            }
          }
        },
        {
          "status" : "skipped",
          "processor_type" : "drop",
          "description" : "description five",
          "if" : {
            "condition" : "9 * 2 == 11",
            "result" : false
          }
        },
        {
          "status" : "skipped",
          "processor_type" : "set",
          "description" : "description six",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 3",
            "result" : false
          }
        },
        {
          "status" : "dropped",
          "processor_type" : "drop",
          "description" : "description seven",
          "if" : {
            "condition" : "9 * 2 == 18",
            "result" : true
          }
        }
      ]
    }
  ]
}

jloleysens · 2020-06-18T16:03:06Z

@jakelandis This looks really great!

With regards to your new additions:

a new status field with the options success, error, error_ignored, skipped, dropped

This sounds fantastic and will be vital for the kind of debugging experience we want to build

a new if object that shows the condition and the resultant status

This is also an excellent addition giving us a way to give the user good feedback on why something was skipped.

a new processor_type to help identify what kind of processor it is.

I am not sure this is as useful for our use case. We will still want to be able to tie this processor result back to the submitted pipeline instance. This means we will also have the processor type available.

A couple of initial follow-up questions from my side:

I also assume no structural changes to the processor results were introduced. So we still have a flat structure inside of processor_results?
In order to tie these results back to the original pipeline programmatically we will make use of the tag field, assigning a unique id. We will then iterate over the processor results and map them to specific processor instances in the tree based on their ids. Is it correct to say that in all cases if I provide a tag it will be returned in the processor_results array for all types of processors (pipeline or otherwise)?
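A minimal sketch of that mapping step, in Python for illustration only. `index_by_tag` and `attach_results` are hypothetical names, and the sketch assumes every processor we submit carries a unique tag and that processor-level on_failure lists are the only nesting.

```python
def index_by_tag(processors, index=None):
    """Walk the processor tree and map each tag to (processor_type, options)."""
    index = {} if index is None else index
    for processor in processors:
        for processor_type, options in processor.items():
            tag = options.get("tag")
            if tag is not None:
                if tag in index:
                    raise ValueError(f"duplicate tag: {tag!r}")
                index[tag] = (processor_type, options)
            index_by_tag(options.get("on_failure", []), index)
    return index


def attach_results(processors, processor_results):
    """Pair each flat result with the processor it came from (None if no match)."""
    by_tag = index_by_tag(processors)
    return [(by_tag.get(result.get("tag")), result) for result in processor_results]


processors = [
    {"set": {"field": "a", "value": True, "tag": "id-0"}},
    {"rename": {"field": "b", "target_field": "c", "tag": "id-1",
                "on_failure": [{"set": {"field": "d", "value": 1, "tag": "id-2"}}]}},
]
results = [{"tag": "id-0", "status": "success"}, {"tag": "id-2", "status": "success"}]
for processor, result in attach_results(processors, results):
    print(processor[0] if processor else None, result["status"])
```

Results whose tag is missing or unknown are paired with None, so the UI can decide how to surface them.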

jakelandis · 2020-06-23T16:17:56Z

I also assume no structural changes to the processors results were introduced.

correct, same structure with some additional elements.

Is it correct to say that in all cases if I provide a tag it will be returned in the processor_results array for all types of processors (pipeline or otherwise)?

There is currently an oddity (as mentioned above) such that the pipeline processor itself (the one that forks out to a different pipeline) is not part of the output... I can look into that further but can't commit to that quite yet. Otherwise, yes, tag will always be included.

jakelandis · 2020-07-30T01:14:12Z

#60433 has been submitted which I believe addresses these concerns. Also, I was able to get the pipeline processor sorted out so it shouldn't be special (as mentioned above).

This commit enhances the verbose output for the `_ingest/pipeline/_simulate?verbose` API. Specifically, it adds the following:
* the pipeline processor is now included in the output
* the conditional (if) and its result are now included in the output iff a conditional was defined
* a status field is always displayed. The possible values of status are:
  * `success` - if the processor ran without errors
  * `error` - if the processor ran but threw an error that was not ignored
  * `error_ignored` - if the processor ran but threw an error that was ignored
  * `skipped` - if the processor did not run (currently only possible if the if condition evaluates to false)
  * `dropped` - if the `drop` processor ran and dropped the document
* a `processor_type` field for the type of processor (e.g. set, rename, etc.)
* a better error is thrown when trying to simulate with a pipeline that does not exist

closes #56004


jakelandis added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Apr 30, 2020

elasticmachine added the Team:Data Management Meta label for data/management team label Apr 30, 2020

jakelandis self-assigned this May 3, 2020

jakelandis mentioned this issue May 8, 2020

Fix ingest simulate verbose on failure with conditional #56478

Merged

cjcenizal added the :ES-UI label Jun 3, 2020

elasticmachine added the Team:Deployment Management Meta label for Management Experience - Deployment Management team label Jun 3, 2020

cjcenizal removed the :ES-UI label Jun 9, 2020

jakelandis mentioned this issue Jul 29, 2020

Enhance the ingest node simulate verbose output #60433

Merged

jakelandis closed this as completed in #60433 Aug 4, 2020

Mpdreamz mentioned this issue Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

61 tasks

stevejgordon mentioned this issue Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed
