Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] adding for_export flag for ml plugin GET resource APIs #63092

Merged
merged 6 commits into from
Oct 2, 2020

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Sep 30, 2020

This adds the new for_export flag to the following APIs:

  • GET _ml/anomaly_detection/<job_id>
  • GET _ml/datafeeds/<datafeed_id>
  • GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.

The following fields are not returned in the objects:

  • any field that is not user settable (e.g. version, create_time)
  • any field that is a calculated default value (e.g. datafeed chunking_config)
  • any field that would effectively require changing to be of use (e.g. datafeed job_id)
  • any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

closes #63055

This PR is completed by the following PR #63899

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent
Copy link
Member Author

//CC @walterra @jgowdyelastic

@@ -33,10 +33,12 @@
public class GetDataFrameAnalyticsRequest implements Validatable {

public static final ParseField ALLOW_NO_MATCH = new ParseField("allow_no_match");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, is there any reason for ALLOW_NO_MATCH to be ParseField rather than String?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not, but just didn't touch the existing field :)

builder.startObject(INDICES_OPTIONS.getPreferredName());
indicesOptions.toXContent(builder, params);
builder.endObject();
} else { // Don't include random defaults or unnecessary defauls in export
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else { // Don't include random defaults or unnecessary defauls in export
} else { // Don't include random defaults or unnecessary defaults in export

builder.startObject(INDICES_OPTIONS.getPreferredName());
indicesOptions.toXContent(builder, params);
builder.endObject();
} else { // Don't include random defaults or unnecessary defauls in export
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this else block is not for making the retrieved config "puttable" but it does something more (clears defaults).
I'm wondering what happens when we ever change the code for generating the default. Then the config is indexed with default D1, then in the subsequent version we change the default to be D2 and then we retrieve the config for export and the default is not cleared (as it was the default in the past but no longer is). Is that a good behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@przemekwitek a related question.

What if the user didn't set their own chunking_config but when they cloned the datafeed, they DID change the aggregation information (maybe the date_histogram bucket size). Then when they try to PUT, the chunking config is now illegal.

Should we allow this behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tough question. I guess it's better to stick to the promise that we are able to PUT the config obtained via GET with for_export...

@@ -19,7 +19,7 @@ Retrieves configuration information for {dfeeds}.

`GET _ml/datafeeds/` +

`GET _ml/datafeeds/_all`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems your editor auto-formatted the asciidoc files you touched. I've had it happen to me too when opening the asciidoc files in intelliJ, so I know edit them with other editors. I think it's worth reverting those unchanged lines to protect git history.

builder.endObject();
return builder;
}

private TimeValue defaultRandomQueryDelay() {
Random random = new Random(jobId.hashCode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a single method that calculates the default query delay between this one and Builder.setDefaultQueryDelay

builder.field(QUERY_DELAY.getPreferredName(), queryDelay.getStringRep());
}
// This is always "match_all"
if (queryProvider.equals(QueryProvider.defaultQuery()) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to not always return the query? The query is integral to the behaviour of the datafeed. Even in the improbable scenario where we change the default query in a future release, it'd be a weird surprise to get a datafeed for_export from a previous version cluster and put it in a newer version to find out different docs are picked because the default query changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put it in a newer version to find out different docs are picked because the default query changed.

If we change the default query to EXCLUDE docs, I think that is a huge thing and should probably never be done.

I can happily include the query here, but it did seem unnecessary to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that changing the default query sounds hard to justify :-)

But I do think we should include the query. We return it in get, so we might as well also return it for_export.

}

private ChunkingConfig defaultChunkingConfig() {
if (aggProvider == null || aggProvider.getParsedAggs() == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, let's avoid duplication on the default calculation. We can reuse this in the Builder.setDefaultChunkingConfig.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit 7bd6e78 into elastic:master Oct 2, 2020
@benwtrent benwtrent deleted the feature/ml-add-for-export-flag branch October 2, 2020 12:29
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 20, 2020
…63092)

This adds the new `for_export` flag to the following APIs:

- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

closes elastic#63055
benwtrent added a commit that referenced this pull request Oct 20, 2020
…ields in GET config APIs (#63899)(#63092) (#63177)

* [ML] adding for_export flag for ml plugin GET resource APIs (#63092)

This adds the new `for_export` flag to the following APIs:

- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

closes #63055

* [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)

When exporting and cloning ml configurations in a cluster it can be
frustrating to remove all the fields that were generated by
the plugin. Especially as the number of these fields change
from version to version.

This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.

APIs supporting this flag:
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

relates to #63055
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Add an option to ML get APIs to return only the fields required by the corresponding put API
5 participants