
[ML] Add new include flag to GET inference/<model_id> API for model training metadata #61922

Merged

Conversation

@benwtrent (Member) commented Sep 3, 2020

Adds a new `include` flag to the get trained models API.

The flag initially has two valid values: `definition` and `total_feature_importance`.

Consequently, the old `include_model_definition` flag is now deprecated.

When `total_feature_importance` is requested, the `total_feature_importance` field
is included in the model's `metadata` object.

Including `definition` is equivalent to the previous `include_model_definition=true`.
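
For illustration, a minimal sketch of calling the endpoint with the new flag via the low-level Java REST client; the model ID `my_regression_model` and the localhost address are assumptions for the example, not part of this PR:

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class GetTrainedModelExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // GET the trained model config; "include" accepts a comma-delimited list of
            // "definition" and/or "total_feature_importance".
            Request request = new Request("GET", "/_ml/inference/my_regression_model");
            request.addParameter("include", "total_feature_importance");
            Response response = client.performRequest(request);
            // When requested, total_feature_importance appears under the model's "metadata" field.
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```

Requesting both values at once would use `include=definition,total_feature_importance`.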

@benwtrent force-pushed the feature/ml-inference-add-description-api branch from ece8df8 to eacc213 on September 3, 2020 16:22
@benwtrent requested a review from lcawl on September 4, 2020 12:52
@benwtrent force-pushed the feature/ml-inference-add-description-api branch 2 times, most recently from 082aff6 to 19b0eb6 on September 4, 2020 13:50
@benwtrent force-pushed the feature/ml-inference-add-description-api branch from 19b0eb6 to bfb390e on September 4, 2020 17:21
@benwtrent (Member, Author) commented:

@elasticmachine update branch

@benwtrent requested a review from lcawl on September 8, 2020 15:39
@lcawl (Contributor) left a comment:

Added three more suggestions, otherwise LGTM


`from`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from]

Sorry I didn't notice this sooner, but these descriptions have the same problem: they're job-specific. I've created model-specific descriptions in #62128.

Suggested change
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from]
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from-models]


`size`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size]

Ditto re needing a model-specific description:

Suggested change
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size]
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size-models]

}
},
{
"feature_name" : "FlightTimeMin",

This is a very long example, so it's worth considering putting "..." in a couple of places where the info is redundant.

@lcawl (Contributor) left a comment:

Docs LGTM

@benwtrent force-pushed the feature/ml-inference-add-description-api branch from 18f2a16 to 4e4a198 on September 16, 2020 14:12
@benwtrent marked this pull request as ready for review on September 16, 2020 17:32
@elasticmachine (Collaborator) commented:

Pinging @elastic/ml-core (:ml)

@benwtrent changed the title from "[ML] Add new inference/<model_id>/_metadata API for model training metadata" to "[ML] Add new include flag to GET inference/<model_id> API for model training metadata" on Sep 16, 2020
@lcawl (Contributor) left a comment:

This PR has changed a lot since my last review, so I've added more suggestions.

<4> Indicate if the total feature importance for the features used in training
should be included in the model `metadata` field.
<5> Should the definition be fully decompressed on GET
<6> Allow empty response if no Trained Models match the provided ID patterns.

Suggested change
<6> Allow empty response if no Trained Models match the provided ID patterns.
<6> Allow empty response if no trained models match the provided ID patterns.

If false, an error will be thrown if no Trained Models match the
ID patterns.
<6> An optional list of tags used to narrow the model search. A Trained Model
<7> An optional list of tags used to narrow the model search. A Trained Model

Suggested change
<7> An optional list of tags used to narrow the model search. A Trained Model
<7> An optional list of tags used to narrow the model search. A trained model

=====
`total_feature_importance`:::
(array)
An array of the total feature importance for each training feature used from

An array of the total feature importance for each training feature used from

AFAIK we've never defined a "training feature". Can we just say feature? If not, this term needs to be explained.

@@ -785,6 +785,23 @@ prediction. Defaults to the `results_field` value of the {dfanalytics-job} that
used to train the model, which defaults to `<dependent_variable>_prediction`.
end::inference-config-results-field-processor[]

tag::inference-metadata-feature-importance-feature-name[]
The training feature name for which this importance was calculated.
@lcawl (Contributor) commented Sep 16, 2020:

training feature name

Ditto earlier comment about "training feature".

Suggested change
The training feature name for which this importance was calculated.
The feature for which this importance was calculated.


I'll fix this in #62643 too

(Optional, string)
A comma delimited string of optional fields to include in the response body.
Valid options are:
- definition: to include the model definition

Suggested change
- definition: to include the model definition
- `definition`: Includes the model definition

A comma delimited string of optional fields to include in the response body.
Valid options are:
- definition: to include the model definition
- total_feature_importance: to include the total feature importance for the

Suggested change
- total_feature_importance: to include the total feature importance for the
- `total_feature_importance`: Includes the total feature importance for the

Valid options are:
- definition: to include the model definition
- total_feature_importance: to include the total feature importance for the
training feature sets. This field will be available in the `metadata` field.

Suggested change
training feature sets. This field will be available in the `metadata` field.
training data sets. This field is available in the `metadata` field in the response body.

@davidkyle (Member) left a comment:

LGTM

can have many tags or none. The trained models in the response will
contain all the provided tags.
<7> Optional boolean value indicating if certain fields should be removed on
<8> Optional boolean value indicating if certain fields should be removed on

Swap these two sentences around; the fact that certain fields are removed is the detail:

<8> Optional boolean value for requesting the trained model in a format that can then be put into another cluster. Certain fields that can only be set when the model is imported are removed.

@@ -408,7 +412,7 @@ public Builder(TrainedModelConfig config) {
this.definition = config.definition == null ? null : new LazyModelDefinition(config.definition);
this.description = config.getDescription();
this.tags = config.getTags();
this.metadata = config.getMetadata();
this.metadata = config.getMetadata() == null ? null : new HashMap<>(config.getMetadata());

The package-private ctor also wraps this map on line 151:

this.metadata = metadata == null ? null : Collections.unmodifiableMap(metadata);

Make the ctor argument an unmodifiable map and have getMetadata() return an unmodifiable map; I think that simplifies these maps being wrapped in a map in a map, etc.
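
To make the suggestion concrete, a minimal sketch of the pattern using a hypothetical `ModelConfigSketch` (not the actual `TrainedModelConfig` code): the constructor exposes an unmodifiable view, and the builder copies before mutating, so configs round-tripped through the builder never share a mutable map:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified config illustrating the defensive-copy pattern under discussion.
final class ModelConfigSketch {
    private final Map<String, Object> metadata;

    ModelConfigSketch(Map<String, Object> metadata) {
        // Wrap once here so callers of getMetadata() cannot mutate internal state.
        this.metadata = metadata == null ? null : Collections.unmodifiableMap(metadata);
    }

    Map<String, Object> getMetadata() {
        return metadata;
    }

    static final class Builder {
        private Map<String, Object> metadata;

        Builder setMetadata(Map<String, Object> metadata) {
            // Copy before storing: the source may be an unmodifiable view from getMetadata().
            this.metadata = metadata == null ? null : new HashMap<>(metadata);
            return this;
        }

        Builder putMetadataField(String key, Object value) {
            // Lazily create a mutable map so later additions never touch a shared instance.
            if (metadata == null) {
                metadata = new HashMap<>();
            }
            metadata.put(key, value);
            return this;
        }

        ModelConfigSketch build() {
            return new ModelConfigSketch(metadata);
        }
    }
}
```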

}
if (this.metadata == null) {
this.metadata = new HashMap<>();
}

Make a copy of the unmodifiable map here

@@ -482,11 +518,11 @@ public void getTrainedModel(final String modelId, final boolean includeDefinitio
try {
builder = handleSearchItem(multiSearchResponse.getResponses()[0], modelId, this::parseInferenceDocLenientlyFromSource);
} catch (ResourceNotFoundException ex) {
listener.onFailure(new ResourceNotFoundException(
getTrainedModelListener.onFailure(new ResourceNotFoundException(

nit: It is clearer if finalListener is used for these onFailure calls.

if (includeTotalFeatureImportance == false) {
finalListener.onResponse(modelBuilders.stream()
.map(TrainedModelConfig.Builder::build)
.sorted(Comparator.comparing(TrainedModelConfig::getModelId))

Configs are still sorted by the search, aren't they? Sorting here shouldn't be necessary.

.sorted(Comparator.comparing(TrainedModelConfig::getModelId))
.collect(Collectors.toList()));
return;


nit: delete extra blank line

.collect(Collectors.toList())),
failure -> {
// total feature importance is not necessary for a model to be valid
// we shouldn't fail if it is not found

👍 makes sense

Is it the case that newer models after version 7.10 will always have the total feature importance?

@@ -142,6 +142,8 @@ yamlRestTest {
'ml/inference_crud/Test put ensemble with tree where tree has out of bounds feature_names index',
'ml/inference_crud/Test put model with empty input.field_names',
'ml/inference_crud/Test PUT model where target type and inference config mismatch',
'ml/inference_metadata/Test get given missing trained model metadata',

I can't find an inference_metadata.yml; did you forget to add it?

@benwtrent (Member, Author) commented:

run elasticsearch-ci/2

created by {dfanalytics} contain `analysis_config` and `input` objects.
.Properties of metadata

This isn't formatting properly, so I think a continuation character is required.

Suggested change
.Properties of metadata
+
.Properties of metadata


I'll fix this in #62643

@benwtrent merged commit fdb7b6d into elastic:master on Sep 18, 2020
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Sep 18, 2020
…raining metadata (elastic#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
benwtrent added a commit that referenced this pull request Sep 18, 2020
…odel training metadata (#61922) (#62620)

* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
@benwtrent deleted the feature/ml-inference-add-description-api branch on October 19, 2020 19:34