[ML] Make anomaly detection jobs compatible with "subobjects" : false #88379

Open
droberts195 opened this issue Jul 8, 2022 · 2 comments
Labels
:ml Machine learning Team:ML Meta label for the ML team team-discuss

droberts195 commented Jul 8, 2022

#86166 added the option for object fields in mappings to have a subobjects : false setting. This in turn allows field names with dots to be held directly inside the object, without the usual object/scalar clashes that arise when the full name of one scalar field is a dotted prefix of another.

For example, subobjects : false makes the following document possible:

{
  "@timestamp" : "2022-07-08T13:23:39",
  "metrics" : {
    "responsetime" : 100, 
    "responsetime.min" : 10,
    "responsetime.max" : 900
  }
}

The mappings for such a document could look like this:

{
  "metrics1": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "metrics": {
          "subobjects": false,
          "properties": {
            "responsetime": {
              "type": "double"
            },
            "responsetime.max": {
              "type": "double"
            },
            "responsetime.min": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}

Historically it would have been possible to store the document, but only by completely disabling mappings for the metrics object. With subobjects : false the dotted fields under metrics can all have mappings and participate in searches and aggregations.
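To illustrate the object/scalar clash that subobjects : false avoids, here is a minimal Python sketch. It is a simplified model of expanding dotted names into nested objects, not Elasticsearch's actual implementation, and the function name expand_dotted is hypothetical.

```python
def expand_dotted(fields):
    """Expand dotted field names into nested dicts.

    Simplified model of the default behaviour (no subobjects : false).
    Raises ValueError when a name must be both a scalar and an object.
    """
    root = {}
    for name, value in fields.items():
        node = root
        parts = name.split(".")
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"[{name}] clashes: [{part}] is already a scalar")
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"[{name}] clashes: [{leaf}] is already an object")
        node[leaf] = value
    return root


# The contents of the "metrics" object from the example document.
metrics = {"responsetime": 100, "responsetime.min": 10, "responsetime.max": 900}
try:
    expand_dotted(metrics)
    clash = None
except ValueError as err:
    clash = str(err)
print(clash)
```

In this model, responsetime would have to be both a number and an object containing min and max, so expansion fails. With subobjects : false the names are kept as they are and no expansion takes place.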

It is currently possible to create a job that analyses all these fields as the field_name of detector functions.
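As a rough sketch, such a job configuration could be built like this in Python. The bucket span and the choice of mean, min and max functions are assumptions made for illustration only; the resulting body is the kind of document accepted by the create anomaly detection jobs API.

```python
import json

# Hypothetical job body: every dotted field under "metrics" is used only
# as the field_name of a detector, never as a split field.
job_config = {
    "analysis_config": {
        "bucket_span": "15m",  # assumed value
        "detectors": [
            {"function": "mean", "field_name": "metrics.responsetime"},
            {"function": "min", "field_name": "metrics.responsetime.min"},
            {"function": "max", "field_name": "metrics.responsetime.max"},
        ],
    },
    "data_description": {"time_field": "@timestamp"},
}

print(json.dumps(job_config, indent=2))
```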

But suppose instead that we also have dotted fields that we want to use as split fields for our job, for example:

{
  "metrics2": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "attributes": {
          "subobjects": false,
          "properties": {
            "service": {
              "type": "keyword"
            },
            "service.administrator": {
              "type": "keyword"
            },
            "service.category": {
              "type": "keyword"
            }
          }
        },
        "metrics": {
          "subobjects": false,
          "properties": {
            "responsetime": {
              "type": "double"
            },
            "responsetime.max": {
              "type": "double"
            },
            "responsetime.min": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}

Now creation of the job fails if we try to reference multiple fields under attributes. For example, using both attributes.service and attributes.service.administrator produces this error:

{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "[x_content_parse_exception: [status_exception] Reason: Fields [attributes.service] and [attributes.service.administrator] cannot both be used in the same analysis_config]: [1:359] [cluster:admin/xpack/ml/job/estimate_model_memory] failed to parse field [analysis_config]",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "status_exception",
            "reason": "Fields [attributes.service] and [attributes.service.administrator] cannot both be used in the same analysis_config"
          }
        ],
        "type": "x_content_parse_exception",
        "reason": "[1:359] [cluster:admin/xpack/ml/job/estimate_model_memory] failed to parse field [analysis_config]",
        "caused_by": {
          "type": "status_exception",
          "reason": "Fields [attributes.service] and [attributes.service.administrator] cannot both be used in the same analysis_config"
        }
      },
      "status": 400
    }
  }
}

We prevent this so that the fields can be included in our anomaly records: in the results mappings, attributes.service cannot be both a scalar field and an object containing administrator.
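A check of this kind can be sketched in Python as a dotted-prefix comparison. This is only an approximation under the assumption that the rule is "no field name may be a dotted prefix of another"; it is not the actual Elasticsearch code, and find_clashes is a hypothetical name.

```python
def find_clashes(field_names):
    """Return (shorter, longer) pairs where shorter is a dotted prefix of longer."""
    names = sorted(set(field_names))
    clashes = []
    for i, shorter in enumerate(names):
        # After sorting, any extension of a name appears later in the list.
        for longer in names[i + 1:]:
            if longer.startswith(shorter + "."):
                clashes.append((shorter, longer))
    return clashes


split_fields = ["attributes.service", "attributes.service.administrator"]
for shorter, longer in find_clashes(split_fields):
    print(f"Fields [{shorter}] and [{longer}] cannot both be used "
          "in the same analysis_config")
```

Note that attributes.service.administrator and attributes.service.category do not clash with each other in this model; the problem only appears once attributes.service itself is also used.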

Alternatively, we could allow jobs to be created with fields like this and change the mappings on our results indices instead. However, there is a problem here: because results indices can be shared, the results index may already exist with mappings that are incompatible with specifying subobjects : false in the results mappings.

It's tricky to incorporate this validation at the parsing stage, as the parser cannot be expected to check the mappings on an existing index.

We have two options:

  1. Change nothing. subobjects : false will work with anomaly detection jobs if the dotted fields are used as metrics, which was the intended use case, as the title of #86166 shows: "Add support for dots in field names for metrics usecases".
  2. Change our analysis_config parser to permit field names that would clash in the results without a subobjects : false results mapping. Then fail when actually creating the job if the desired mappings cannot be created. There is already a precedent for failing at this point: if the new job would push the number of mapped fields in the shared results index over 1000, we fail job creation at the point of modifying the results index.
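The second option could look roughly like the following Python sketch. It is a simplified model with several assumptions: existing results fields are given as a flat set of dotted names mapped without subobjects : false, a top-level object that is not yet mapped can be created with subobjects : false, and the names check_results_index and JobCreationError are hypothetical.

```python
class JobCreationError(Exception):
    """Raised when the shared results index cannot accommodate a new job."""


def clashes(a, b):
    # True if one dotted name is a strict dotted prefix of the other.
    return a.startswith(b + ".") or b.startswith(a + ".")


def check_results_index(existing_fields, job_fields, field_limit=1000):
    """Creation-time check, run after the parser has accepted the config."""
    existing = set(existing_fields)
    new = set(job_fields) - existing
    # Top-level objects already mapped (assumed without subobjects : false).
    existing_roots = {name.split(".")[0] for name in existing}
    for field in sorted(new):
        for other in sorted((existing | new) - {field}):
            if clashes(field, other) and field.split(".")[0] in existing_roots:
                raise JobCreationError(
                    f"Fields [{field}] and [{other}] clash under an object "
                    "that is already mapped in the results index"
                )
    if len(existing) + len(new) > field_limit:
        raise JobCreationError("Too many mapped fields in the results index")
    return sorted(new)


# A results index with no "attributes" object yet: the job is accepted.
print(check_results_index(
    {"bucket_span"},
    ["attributes.service", "attributes.service.administrator"],
))
```

The point of the sketch is the ordering: the parser stays permissive, and the clash is only reported once the existing mappings of the shared results index are known.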
@droberts195 droberts195 added :ml Machine learning team-discuss labels Jul 8, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jul 8, 2022
@elasticmachine

Pinging @elastic/ml-core (Team:ML)

@droberts195

#88934 is likely to increase adoption of "subobjects" : false.
