[Ingest Node Pipelines] More detailed _simulate response #56004

Closed
jloleysens opened this issue Apr 30, 2020 · 6 comments · Fixed by #60433
Labels
:Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), Team:Data Management (Meta label for data/management team), Team:Deployment Management (Meta label for Management Experience - Deployment Management team)

Comments

@jloleysens
Contributor

jloleysens commented Apr 30, 2020

Describe the feature:

In Kibana, a new UI was built to support management of ingest node pipelines (elastic/kibana#62321). The UI also gives users a way of simulating the pipeline while creating or editing it.

The simulate UI we are envisioning will give users detailed information about the path each document has traveled through the simulated pipeline, including the following:

  1. An indication of all processors that ran against the document. This will enable a tree view of the set (or subset) of processors.
  2. Whether each processor failed or did not pass its conditional (if). This will enable ✅ and ❌ indications at each processor. Critically, we need to know why something went wrong.
  3. What a specific processor updated in the document (or a point-in-time look at the document post-processor). Something like this would enable a visual diff at each step.

On an ES 8.0.0 snapshot, this is an example request-response pair:

Request
{
	"pipeline": {
		"description": "_description",
		"processors": [
			{
				"set": {
					"field": "field3",
					"value": "_value3",
					"tag": "THIS SHOULD BE THERE"
				}
			},
			{
				"rename": {
					"if": "ctx.foo == 'bar'",
					"field": "foo1",
					"target_field": "fieldA",
					"tag": "field1_renamer",
					"on_failure": [
						{
							"set": {
								"field": "field4",
								"value": "THIS SHOULD BE THERE FROM FAILURE",
								"tag": "BLAH TEST"
							}
						}
					]
				}
			},
			{
				"set": {
					"field": "field3",
					"value": "_value3",
					"tag": "THIS SHOULD BE THERE"
				}
			}
		]
	},
	"docs": [
		{
			"_index": "index",
			"_id": "id",
			"_source": {
				"foo": "bar"
			}
		},
		{
			"_index": "index",
			"_id": "id",
			"_source": {
				"foo": "123"
			}
		}
	]
}
Response
{
  "docs": [
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "field1_renamer",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "bar",
              "field4": "THIS SHOULD BE THERE FROM FAILURE"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842161Z"
            }
          }
        }
      ]
    },
    {
      "processor_results": [
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        },
        {
          "tag": "THIS SHOULD BE THERE",
          "doc": {
            "_index": "index",
            "_id": "id",
            "_source": {
              "field3": "_value3",
              "foo": "123"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2020-04-30T08:26:53.842165Z"
            }
          }
        }
      ]
    }
  ]
}

Given the above, we do not have a way of mapping these results back to the submitted pipeline to achieve 1 and 2, only a less detailed version of 3. (Where does THIS SHOULD BE THERE FROM FAILURE actually come from?)
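
As a rough illustration of the "less detailed version of 3" that is possible today, the sketch below diffs consecutive `_source` snapshots from `processor_results`. The function name `source_diffs` and the trimmed sample data are assumptions made for this example; it only compares top-level fields.

```python
from typing import Any, Dict, List


def source_diffs(processor_results: List[Dict[str, Any]],
                 initial_source: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return, per step, the top-level _source keys added, changed, or removed."""
    diffs = []
    previous = initial_source
    for result in processor_results:
        current = result["doc"]["_source"]
        diffs.append({
            "tag": result.get("tag"),
            "added": {k: current[k] for k in current.keys() - previous.keys()},
            "changed": {k: current[k] for k in current.keys() & previous.keys()
                        if current[k] != previous[k]},
            "removed": sorted(previous.keys() - current.keys()),
        })
        previous = current
    return diffs


# First document from the response above, trimmed to tag and _source.
results = [
    {"tag": "THIS SHOULD BE THERE",
     "doc": {"_source": {"field3": "_value3", "foo": "bar"}}},
    {"tag": "field1_renamer",
     "doc": {"_source": {"field3": "_value3", "foo": "bar",
                         "field4": "THIS SHOULD BE THERE FROM FAILURE"}}},
    {"tag": "THIS SHOULD BE THERE",
     "doc": {"_source": {"field3": "_value3", "foo": "bar",
                         "field4": "THIS SHOULD BE THERE FROM FAILURE"}}},
]

diffs = source_diffs(results, {"foo": "bar"})
for step in diffs:
    print(step)
```

Note that the diff attributes field4 to the step tagged field1_renamer, although the pipeline sets it in the on_failure handler tagged BLAH TEST. That is the mapping gap described above.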

One solution is for the response to be a structural mirror of the pipeline submitted to simulate. This would make it simplest for consumers to map the result tree back to the submitted tree, because each path would map back to a specific processor.

Alternatively, a flat structure could still work if we use tag as a placeholder for a serialised path that points to the processor in the submitted pipeline (e.g., 0.on_failure.1). However, tag will still be exposed to users as a field they can enter values into, so we would be hijacking it for the call to _simulate. See this issue (#56000) concerning the multiple concerns of tag.
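
A minimal sketch of the serialised-path idea, assuming each processor is a single-key object and that only on_failure introduces nesting. The helper names `tag_with_paths` and `resolve_path` are hypothetical, and the sketch overwrites any user-supplied tag, which is exactly the hijacking concern raised above.

```python
from typing import Any, Dict, List


def tag_with_paths(processors: List[Dict[str, Any]],
                   prefix: str = "") -> List[Dict[str, Any]]:
    """Return a copy of the processors with tag set to a path like '1.on_failure.0'."""
    tagged = []
    for index, processor in enumerate(processors):
        # Each processor is a single-key object: {"<type>": {...options...}}.
        (ptype, options), = processor.items()
        path = f"{prefix}{index}"
        new_options = dict(options)
        new_options["tag"] = path  # overwrites any tag the user entered
        if "on_failure" in options:
            new_options["on_failure"] = tag_with_paths(
                options["on_failure"], prefix=f"{path}.on_failure.")
        tagged.append({ptype: new_options})
    return tagged


def resolve_path(processors: List[Dict[str, Any]], path: str) -> Dict[str, Any]:
    """Follow a serialised path back to the options of the processor it names."""
    current_list = processors
    options: Dict[str, Any] = {}
    for part in path.split("."):
        if part == "on_failure":
            current_list = options["on_failure"]
        else:
            (_, options), = current_list[int(part)].items()
    return options


# Trimmed version of the pipeline from the request above.
processors = [
    {"set": {"field": "field3", "value": "_value3"}},
    {"rename": {"field": "foo1", "target_field": "fieldA",
                "on_failure": [
                    {"set": {"field": "field4",
                             "value": "THIS SHOULD BE THERE FROM FAILURE"}}]}},
]

tagged = tag_with_paths(processors)
print(tagged[1]["rename"]["on_failure"][0]["set"]["tag"])
print(resolve_path(tagged, "1.on_failure.0")["field"])
```

A consumer could submit `tagged` to _simulate and resolve each returned tag against the original tree, at the cost of not being able to show the user's own tag values in the result.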

Assistance here would be greatly appreciated.

CC @jakelandis @talevy @cjcenizal

jakelandis added the :Data Management/Ingest Node label Apr 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

elasticmachine added the Team:Data Management label Apr 30, 2020
jakelandis self-assigned this May 3, 2020
jakelandis added a commit that referenced this issue May 12, 2020
If a conditional is added to a processor, and that processor fails, and 
that processor has an on_failure handler, the full trace of all of the 
executed processors may not be displayed in simulate verbose. The 
information is correct, but misses displaying some of the steps used 
to get there.

This happens because a conditional processor is a
wrapper around the real processor, and a processor with an on_failure
handler is also a wrapper around the processor(s). When decorating for
simulation we treat a compound processor specially, but if a compound processor
is wrapped by a conditional processor, that compound processor's processors
can be missed for decoration, resulting in the missing displayed steps.

The fix is to treat the conditional processor specially and
explicitly separate it from the processor it is wrapping. This requires
us to keep track of 2 processors: a possible conditional processor and
the actual processor it may be wrapping.

related: #56004
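
The wrapper problem described in this commit message can be illustrated with a toy model. This is not the Elasticsearch implementation: the classes `Leaf`, `Compound`, and `Conditional` and both functions are hypothetical stand-ins, and the sketch only lists which steps would be visible for tracking. It also drops the condition itself, which the real fix keeps alongside the wrapped processor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Leaf:
    tag: str


@dataclass
class Compound:
    # Stand-in for a processor list with an on_failure handler.
    processors: List[object]
    on_failure: List[object] = field(default_factory=list)


@dataclass
class Conditional:
    # Stand-in for the wrapper that applies an `if` to another processor.
    tag: str
    condition: str
    inner: object


def decorate_naive(processor) -> List[str]:
    """Only Compound is unwrapped, so a Conditional hides whatever it wraps."""
    if isinstance(processor, Compound):
        tags = []
        for child in processor.processors + processor.on_failure:
            tags.extend(decorate_naive(child))
        return tags
    return [processor.tag]  # Leaf, or a Conditional treated as one opaque step


def decorate_fixed(processor) -> List[str]:
    """The Conditional is separated from the processor it wraps before recursing."""
    if isinstance(processor, Conditional):
        return decorate_fixed(processor.inner)
    if isinstance(processor, Compound):
        tags = []
        for child in processor.processors + processor.on_failure:
            tags.extend(decorate_fixed(child))
        return tags
    return [processor.tag]


pipeline = Compound(processors=[
    Leaf("set_a"),
    Conditional(
        tag="field1_renamer",
        condition="ctx.foo == 'bar'",
        inner=Compound(processors=[Leaf("field1_renamer")],
                       on_failure=[Leaf("BLAH TEST")]),
    ),
    Leaf("set_b"),
])

naive = decorate_naive(pipeline)
fixed = decorate_fixed(pipeline)
print(naive)
print(fixed)
```

In the naive walk the on_failure step never appears as its own entry, which mirrors the missing step in the simulate output reported in this issue.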
jakelandis added a commit that referenced this issue May 12, 2020
#56635)

@elasticmachine
Collaborator

Pinging @elastic/es-ui (:ES-UI)

elasticmachine added the Team:Deployment Management label Jun 3, 2020
cjcenizal removed the :ES-UI label Jun 9, 2020
@jakelandis
Contributor

jakelandis commented Jun 17, 2020

#56478 fixed a bug that may have led to missing information from the verbose output.
#57906 introduces a description for each processor and #58207 will echo that in the verbose output.

With the bug fixed and description displayed, we still need to:

  • display a processor if the if condition evaluated to false
  • better handle the case where the drop processor is executed (today it is just a null in the output)

I would like to propose introducing the following to the output:

  • a new status field with the options success, error, error_ignored, skipped, dropped
  • a new if object that shows the condition and the resultant status
  • a new processor_type to help identify what kind of processor it is.

For example, for the given input (success):

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "1 + 1 == 2",
          "field": "a",
          "value": true,
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

the verbose output is:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "success",
          "processor_type" : "set",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:22:59.338538Z"
            }
          }
        }
      ]
    }
  ]
}

For the given input (skipped):

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "1 + 1 == 3",
          "field": "a",
          "value": true,
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

the output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "skipped",
          "processor_type" : "set",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 3",
            "result" : false
          }
        }
      ]
    }
  ]
}

input (error_ignored):

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "if": "1 + 1 == 2",
          "target_field": "a",
          "field": "b",
          "ignore_failure": true, 
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "error_ignored",
          "processor_type" : "rename",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "ignored_error" : {
            "error" : {
              "root_cause" : [
                {
                  "type" : "illegal_argument_exception",
                  "reason" : "field [b] doesn't exist"
                }
              ],
              "type" : "illegal_argument_exception",
              "reason" : "field [b] doesn't exist"
            }
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : { },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:26:07.137658Z"
            }
          }
        }
      ]
    }
  ]
}

input (error)

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "if": "1 + 1 == 2",
          "target_field": "a",
          "field": "b",
          "ignore_failure": false, 
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "error",
          "processor_type" : "rename",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "error" : {
            "root_cause" : [
              {
                "type" : "illegal_argument_exception",
                "reason" : "field [b] doesn't exist"
              }
            ],
            "type" : "illegal_argument_exception",
            "reason" : "field [b] doesn't exist"
          }
        }
      ]
    }
  ]
}

input (dropped)

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "drop": {
          "if": "1 + 1 == 2",
          "description": "my description",
          "tag" : "my tag"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}

output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "dropped",
          "processor_type" : "drop",
          "description" : "my description",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          }
        }
      ]
    }
  ]
}

dropped and error are terminal, so either one will always be the last item displayed (if it exists).
status will always be present, and it is safe to assume that skipped means the if condition evaluated to false.
The if object will only appear if an if condition is defined.
description and tag will only be shown if they are defined.
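
The rules above could be checked by a consumer roughly as follows. This is a sketch under the assumption that the proposal is implemented exactly as described in this comment; `check_processor_results` is a hypothetical helper, not part of any Elasticsearch client.

```python
from typing import Any, Dict, List

STATUSES = {"success", "error", "error_ignored", "skipped", "dropped"}
TERMINAL = {"error", "dropped"}


def check_processor_results(results: List[Dict[str, Any]]) -> List[str]:
    """Return a list of rule violations for one document (empty if none)."""
    problems = []
    last = len(results) - 1
    for i, result in enumerate(results):
        status = result.get("status")
        if status not in STATUSES:
            problems.append(f"result {i}: missing or unknown status {status!r}")
            continue
        if status in TERMINAL and i != last:
            problems.append(f"result {i}: {status} is terminal but is not last")
        # skipped is assumed to mean the if condition evaluated to false.
        if status == "skipped" and result.get("if", {}).get("result") is not False:
            problems.append(f"result {i}: skipped without a false if result")
    return problems


ok = [
    {"status": "success", "processor_type": "set",
     "if": {"condition": "1 + 1 == 2", "result": True}},
    {"status": "skipped", "processor_type": "set",
     "if": {"condition": "1 + 1 == 3", "result": False}},
    {"status": "dropped", "processor_type": "drop",
     "if": {"condition": "1 + 1 == 2", "result": True}},
]
bad = [{"status": "dropped", "processor_type": "drop"},
       {"status": "success", "processor_type": "set"}]

ok_problems = check_processor_results(ok)
bad_problems = check_processor_results(bad)
print(ok_problems)
print(bad_problems)
```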

There is a bit of an oddity with the pipeline processor: the pipeline processor itself doesn't show its description or tag. Support for the pipeline processor in simulate verbose has some pre-existing shortcomings. It works, but it requires a real pipeline to call out to, and unless you use the tag or description in creative ways, there is no provenance showing which pipeline a processor originated from.

Here is a more verbose example:

PUT _ingest/pipeline/mypipeline
{
  "processors": [
    {
      "set" : {
        "field": "mypipelineprocessor",
        "value": true,
        "description": "description 4"
      }
    }
  ]
}

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "1 + 1 == 2",
          "field": "a",
          "value": true,
          "description": "description one"
        }
      },
      {
        "rename": {
          "target_field": "ggg",
          "field": "hhhh",
          "ignore_failure": true,
          "description": "description two"
        }
      },
      {
        "pipeline": {
          "description": "description three",
          "name": "mypipeline"
        }
      },
      {
        "drop": {
          "if": "9 * 2 == 11",
          "description": "description five"
        }
      },
      {
        "set": {
          "description": "description six",
          "if": "1 + 1 == 3",
          "field": "b",
          "value": true,
          "tag" : "my tag"
        }
      },
      {
        "drop": {
          "if": "9 * 2 == 18",
          "description": "description seven"
        }
      },
      {
        "set": {
          "description": "description eight",
          "field": "c",
          "value": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {}
    }
  ]
}


output:

{
  "docs" : [
    {
      "processor_results" : [
        {
          "status" : "success",
          "processor_type" : "set",
          "description" : "description one",
          "if" : {
            "condition" : "1 + 1 == 2",
            "result" : true
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:38:26.837775Z"
            }
          }
        },
        {
          "status" : "error_ignored",
          "processor_type" : "rename",
          "description" : "description two",
          "ignored_error" : {
            "error" : {
              "root_cause" : [
                {
                  "type" : "illegal_argument_exception",
                  "reason" : "field [hhhh] doesn't exist"
                }
              ],
              "type" : "illegal_argument_exception",
              "reason" : "field [hhhh] doesn't exist"
            }
          },
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "_simulate_pipeline",
              "timestamp" : "2020-06-17T22:38:26.837775Z"
            }
          }
        },
        {
          "status" : "success",
          "processor_type" : "set",
          "description" : "description 4",
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_source" : {
              "mypipelineprocessor" : true,
              "a" : true
            },
            "_ingest" : {
              "pipeline" : "mypipeline",
              "timestamp" : "2020-06-17T22:38:26.837775Z"
            }
          }
        },
        {
          "status" : "skipped",
          "processor_type" : "drop",
          "description" : "description five",
          "if" : {
            "condition" : "9 * 2 == 11",
            "result" : false
          }
        },
        {
          "status" : "skipped",
          "processor_type" : "set",
          "description" : "description six",
          "tag" : "my tag",
          "if" : {
            "condition" : "1 + 1 == 3",
            "result" : false
          }
        },
        {
          "status" : "dropped",
          "processor_type" : "drop",
          "description" : "description seven",
          "if" : {
            "condition" : "9 * 2 == 18",
            "result" : true
          }
        }
      ]
    }
  ]
}
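
To show how a UI might consume this output, here is a small sketch that turns one document's processor_results into trace lines with ✅ and ❌ markers. The `render_trace` function, the icon mapping, and the line format are assumptions for illustration; the sample data is the output above trimmed to the fields the sketch reads.

```python
from typing import Any, Dict, List

# Only success and error get an icon here; other statuses fall back to "-".
ICONS = {"success": "✅", "error": "❌"}


def render_trace(results: List[Dict[str, Any]]) -> List[str]:
    """Build one human-readable line per processor result."""
    lines = []
    for result in results:
        icon = ICONS.get(result["status"], "-")
        line = f'{icon} {result["processor_type"]} [{result["status"]}]'
        if "description" in result:
            line += f' {result["description"]}'
        if "if" in result:
            cond = result["if"]
            line += f' (if {cond["condition"]} -> {str(cond["result"]).lower()})'
        lines.append(line)
    return lines


results = [
    {"status": "success", "processor_type": "set", "description": "description one",
     "if": {"condition": "1 + 1 == 2", "result": True}},
    {"status": "error_ignored", "processor_type": "rename",
     "description": "description two"},
    {"status": "success", "processor_type": "set", "description": "description 4"},
    {"status": "skipped", "processor_type": "drop", "description": "description five",
     "if": {"condition": "9 * 2 == 11", "result": False}},
    {"status": "skipped", "processor_type": "set", "description": "description six",
     "tag": "my tag", "if": {"condition": "1 + 1 == 3", "result": False}},
    {"status": "dropped", "processor_type": "drop", "description": "description seven",
     "if": {"condition": "9 * 2 == 18", "result": True}},
]

lines = render_trace(results)
for line in lines:
    print(line)
```

The last processor in the request ("description eight") produces no line because dropped is terminal.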

@jloleysens
Contributor Author

jloleysens commented Jun 18, 2020

@jakelandis This looks really great!

With regards to your new additions:

> a new status field with the options success, error, error_ignored, skipped, dropped

This sounds fantastic and will be vital for the kind of debugging experience we want to build.

> a new if object that shows the condition and the resultant status

This is also an excellent addition, giving us a way to give the user good feedback on why something was skipped.

> a new processor_type to help identify what kind of processor it is.

I am not sure this is as useful for our use case. We will still want to be able to tie this processor result back to the submitted pipeline instance. This means we will also have the processor type available.

A couple of initial follow-up questions from my side:

  1. I also assume no structural changes to the processor results were introduced. So we still have a flat structure inside of processor_results?

  2. In order to tie these results back to the original pipeline programmatically, we will make use of the tag field, assigning a unique id to each processor. We will then iterate over the processor results and map them to specific processor instances in the tree based on their ids. Is it correct to say that in all cases if I provide a tag it will be returned in the processor_results array for all types of processors (pipeline or otherwise)?
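
A sketch of the mapping described in question 2, assuming every processor was submitted with a unique tag. The helper names `index_by_tag` and `attach_results` and the tag values are hypothetical, and the sample results are hand-written for illustration rather than actual _simulate output.

```python
from typing import Any, Dict, List, Tuple


def index_by_tag(processors: List[Dict[str, Any]],
                 path: Tuple = ()) -> Dict[str, Dict[str, Any]]:
    """Map each tag to the processor's position in the tree, including on_failure."""
    index: Dict[str, Dict[str, Any]] = {}
    for i, processor in enumerate(processors):
        (ptype, options), = processor.items()
        here = path + (i,)
        if "tag" in options:
            index[options["tag"]] = {"path": here, "type": ptype}
        index.update(index_by_tag(options.get("on_failure", []),
                                  here + ("on_failure",)))
    return index


def attach_results(processors: List[Dict[str, Any]],
                   processor_results: List[Dict[str, Any]]):
    """Split flat results into (tree path, status) pairs and results with no known tag."""
    index = index_by_tag(processors)
    matched, unmatched = [], []
    for result in processor_results:
        entry = index.get(result.get("tag"))
        if entry is None:
            unmatched.append(result)
        else:
            matched.append((entry["path"], result.get("status")))
    return matched, unmatched


processors = [
    {"set": {"field": "a", "value": True, "tag": "p0"}},
    {"rename": {"field": "b", "target_field": "c", "tag": "p1",
                "on_failure": [{"set": {"field": "d", "value": 1, "tag": "p1-f0"}}]}},
]
# Illustrative results only; the last entry has no tag and cannot be mapped.
sample_results = [
    {"tag": "p0", "status": "success"},
    {"tag": "p1-f0", "status": "success"},
    {"status": "success"},
]

matched, unmatched = attach_results(processors, sample_results)
print(matched)
print(unmatched)
```

Any result that comes back without a tag ends up in `unmatched`, which is why the answer to this question matters for the UI.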

@jakelandis
Contributor

> I also assume no structural changes to the processor results were introduced.

Correct, same structure with some additional elements.

> Is it correct to say that in all cases if I provide a tag it will be returned in the processor_results array for all types of processors (pipeline or otherwise)?

There is currently an oddity (as mentioned above) such that the pipeline processor itself (the one that forks out to a different pipeline) is not part of the output. I can look into that further but can't commit to that quite yet. Otherwise, yes, tag will always be included.

@jakelandis
Contributor

#60433 has been submitted which I believe addresses these concerns. Also, I was able to get the pipeline processor sorted out so it shouldn't be special (as mentioned above).

jakelandis added a commit that referenced this issue Aug 4, 2020
This commit enhances the verbose output for the
`_ingest/pipeline/_simulate?verbose` api. Specifically
this adds the following:

* the pipeline processor is now included in the output
* the conditional (if) and its result are now included in the output iff it was defined
* a status field is always displayed. The possible values of status are
  * `success` - if the processor ran without errors
  * `error` - if the processor ran but threw an error that was not ignored
  * `error_ignored` - if the processor ran but threw an error that was ignored
  * `skipped` - if the processor did not run (currently only possible if the if condition evaluates to false)
  * `dropped` - if the `drop` processor ran and dropped the document
* a `processor_type` field for the type of processor (e.g. set, rename, etc.)
* throw a better error if trying to simulate with a pipeline that does not exist

closes #56004
jakelandis added a commit that referenced this issue Aug 27, 2020