[ML] Make anomaly detection jobs compatible with "subobjects" : false #88379

droberts195 · 2022-07-08T12:53:24Z

#86166 added the option for object fields in mappings to have a subobjects : false setting. This in turn allows field names with dots to be nested inside the object, without the usual object/scalar clashes that would arise if some scalar fields have more components than others with the same prefix.

For example, subobjects : false makes the following document possible:

{
  "@timestamp" : "2022-07-08T13:23:39",
  "metrics" : {
    "responsetime" : 100, 
    "responsetime.min" : 10,
    "responsetime.max" : 900
  }
}
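
To illustrate the object/scalar clash that subobjects : false avoids, here is a minimal Python sketch. It is not Elasticsearch code: `expand_dots` is a hypothetical helper that mimics the default behaviour of treating each dot as a step into a sub-object, applied to the fields under metrics in the document above.

```python
def expand_dots(flat):
    """Expand dotted field names into nested dicts (hypothetical helper)."""
    root = {}
    for name, value in flat.items():
        node = root
        parts = name.split(".")
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # An earlier field already stored a scalar at this path.
                raise ValueError(f"[{name}] clashes: [{part}] is already a scalar")
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            # An earlier field already created an object at this path.
            raise ValueError(f"[{name}] clashes: [{leaf}] is already an object")
        node[leaf] = value
    return root


# The fields under "metrics" from the example document.
metrics = {"responsetime": 100, "responsetime.min": 10, "responsetime.max": 900}

try:
    expand_dots(metrics)
    clash = None
except ValueError as err:
    clash = str(err)

print(clash)
```

With dot expansion, responsetime would need to be both a number and an object holding min and max, so the helper raises. Keeping the dotted names as flat leaf fields, as subobjects : false does, sidesteps the problem.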

The mappings for such a document could look like this:

{
  "metrics1": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "metrics": {
          "subobjects": false,
          "properties": {
            "responsetime": {
              "type": "double"
            },
            "responsetime.max": {
              "type": "double"
            },
            "responsetime.min": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}

Historically it would have been possible to store the document, but only by completely disabling mappings for the metrics object. With subobjects : false the dotted fields under metrics can all have mappings and participate in searches and aggregations.

It is currently possible to create a job that analyses all these fields as the field_name of detector functions.
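
As a rough sketch of what such a job could look like, the Python below builds a request body with one detector per metric field. The `mean` function and the `15m` bucket span are arbitrary choices for illustration, and the full field paths assume the metrics1 mappings shown above.

```python
import json

# Full paths of the dotted fields under the "metrics" object.
metric_fields = [
    "metrics.responsetime",
    "metrics.responsetime.min",
    "metrics.responsetime.max",
]

# Illustrative job body: one detector per metric field.
job_body = {
    "analysis_config": {
        "bucket_span": "15m",  # assumed value, not from the issue
        "detectors": [
            {"function": "mean", "field_name": field} for field in metric_fields
        ],
    },
    "data_description": {"time_field": "@timestamp"},
}

print(json.dumps(job_body, indent=2))
```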

But suppose instead we also have dotted fields that we want to use as split fields for our job, for example:

{
  "metrics2": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "attributes": {
          "subobjects": false,
          "properties": {
            "service": {
              "type": "keyword"
            },
            "service.administrator": {
              "type": "keyword"
            },
            "service.category": {
              "type": "keyword"
            }
          }
        },
        "metrics": {
          "subobjects": false,
          "properties": {
            "responsetime": {
              "type": "double"
            },
            "responsetime.max": {
              "type": "double"
            },
            "responsetime.min": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}

Now creation of the job fails if we try to reference multiple fields under attributes. For example, the request returns:

{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "[x_content_parse_exception: [status_exception] Reason: Fields [attributes.service] and [attributes.service.administrator] cannot both be used in the same analysis_config]: [1:359] [cluster:admin/xpack/ml/job/estimate_model_memory] failed to parse field [analysis_config]",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "status_exception",
            "reason": "Fields [attributes.service] and [attributes.service.administrator] cannot both be used in the same analysis_config"
          }
        ],
        "type": "x_content_parse_exception",
        "reason": "[1:359] [cluster:admin/xpack/ml/job/estimate_model_memory] failed to parse field [analysis_config]",
        "caused_by": {
          "type": "status_exception",
          "reason": "Fields [attributes.service] and [attributes.service.administrator] cannot both be used in the same analysis_config"
        }
      },
      "status": 400
    }
  }
}

The reason we prevent this is to make it possible to include the fields in our anomaly records.
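
The kind of check that produces this error can be sketched as follows. This is not the actual Elasticsearch implementation, only an illustration in Python of detecting field names where one is a dotted prefix of another. `find_clashing_fields` is a hypothetical name.

```python
def find_clashing_fields(field_names):
    """Return (shorter, longer) pairs where shorter + "." is a prefix of longer."""
    names = sorted(set(field_names))
    clashes = []
    for i, shorter in enumerate(names):
        # A prefix always sorts before its extensions, so only look ahead.
        for longer in names[i + 1:]:
            if longer.startswith(shorter + "."):
                clashes.append((shorter, longer))
    return clashes


split_fields = ["attributes.service", "attributes.service.administrator"]
for shorter, longer in find_clashing_fields(split_fields):
    print(
        f"Fields [{shorter}] and [{longer}] cannot both be used "
        "in the same analysis_config"
    )
```

Under this sketch, attributes.service.administrator and attributes.service.category on their own would not clash, because neither is a prefix of the other.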

Alternatively, we could allow jobs to be created with fields like this and change the mappings on our results indices. However, there is a problem here: because results indices can be shared, the results index may already exist with mappings that are incompatible with specifying subobjects : false in the results mappings.

It's tricky to incorporate this validation at the parsing stage, as the parser cannot be expected to check the mappings on an existing index.

We have two options:

1. Change nothing - subobjects : false will work with anomaly detection jobs if the dotted fields are used as metrics, and this was the intended use case, as seen in the title of #86166, "Add support for dots in field names for metrics usecases".
2. Change our analysis_config parser to permit field names that would clash in the results unless subobjects : false is added as a results mapping. Then fail when actually creating the job if creating our desired mappings is not possible. There is already a precedent for failing at this time - if the latest job would push the number of mapped fields in the shared results index over 1000 we fail the job creation at the point of modifying the results index.
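
The second option could be sketched roughly as below. Everything here is hypothetical: the function names, the return strings, and the assumption that compatibility shows up as a top-level "subobjects": false in the results index mappings. The field count is also simplified by treating every job field as a new mapped field.

```python
MAX_MAPPED_FIELDS = 1000  # the limit mentioned above


class JobCreationError(Exception):
    """Raised when the results index cannot accommodate the job."""


def has_prefix_clash(fields):
    names = sorted(set(fields))
    return any(
        longer.startswith(shorter + ".")
        for i, shorter in enumerate(names)
        for longer in names[i + 1:]
    )


def validate_at_creation(job_fields, results_mappings, mapped_field_count=0):
    """Creation-time check; results_mappings is None if the index does not exist."""
    # Existing precedent: fail if the shared index would exceed the field limit.
    if mapped_field_count + len(set(job_fields)) > MAX_MAPPED_FIELDS:
        raise JobCreationError("too many mapped fields in the shared results index")
    if not has_prefix_clash(job_fields):
        return "no mapping change needed"
    if results_mappings is None:
        return "create results index with subobjects: false"
    if results_mappings.get("subobjects") is False:
        return "existing results index is compatible"
    raise JobCreationError(
        "existing results index mappings are incompatible with subobjects: false"
    )


fields = ["attributes.service", "attributes.service.administrator"]
print(validate_at_creation(fields, None))
```

The parser would accept the clashing names, and only this later step, which can see the real index, would reject the job.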


elasticmachine · 2022-07-08T12:53:27Z

Pinging @elastic/ml-core (Team:ML)

droberts195 · 2023-07-03T10:05:28Z

#88934 is likely to increase adoption of "subobjects" : false.

droberts195 added :ml Machine learning team-discuss labels Jul 8, 2022

elasticmachine added the Team:ML Meta label for the ML team label Jul 8, 2022

droberts195 mentioned this issue Jul 19, 2022

[ML] Weird advice from explain endpoint about aggregatable fields for DFA index with "subobjects" : false #88605

Closed
