
ml inference ingest processor support for local models #2508

Merged: 5 commits, Jun 11, 2024

Conversation

rbhavna (Collaborator) commented on Jun 5, 2024

Description

ml inference ingest processor support for local models

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

-        .put(MLInferenceIngestProcessor.TYPE, new MLInferenceIngestProcessor.Factory(parameters.scriptService, parameters.client));
+        .put(
+            MLInferenceIngestProcessor.TYPE,
+            new MLInferenceIngestProcessor.Factory(parameters.scriptService, parameters.client, xContentRegistry)
+        );
Collaborator: Why do we need the xContentRegistry passed from the plugin?

rbhavna (Author): We add it here so that it can be passed as a dependency to MLInferenceIngestProcessor when it is instantiated in the Factory.
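A minimal sketch of this wiring, assuming the usual pattern of a plugin handing shared services to a processor factory once, which then threads them into every processor it creates; all type names here are simplified stand-ins, not the actual ml-commons or OpenSearch API:

// Simplified stand-ins for the real ScriptService, Client, and NamedXContentRegistry.
interface ScriptService {}
interface Client {}
interface XContentRegistry {}

class SketchProcessor {
    final ScriptService scriptService;
    final Client client;
    final XContentRegistry xContentRegistry;

    SketchProcessor(ScriptService scriptService, Client client, XContentRegistry xContentRegistry) {
        this.scriptService = scriptService;
        this.client = client;
        // Kept on the instance so the processor can parse model input/output
        // later without a global lookup.
        this.xContentRegistry = xContentRegistry;
    }

    static class Factory {
        private final ScriptService scriptService;
        private final Client client;
        private final XContentRegistry xContentRegistry;

        Factory(ScriptService scriptService, Client client, XContentRegistry xContentRegistry) {
            this.scriptService = scriptService;
            this.client = client;
            this.xContentRegistry = xContentRegistry;
        }

        SketchProcessor create() {
            // Every processor instance receives the registry injected above.
            return new SketchProcessor(scriptService, client, xContentRegistry);
        }
    }
}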

        existingFields++;
    }
}
if (!override && existingFields == dotPaths.size()) {
Collaborator: If override is false and existingFields equals the dotPaths size, do we silently skip adding the output mapping?

rbhavna (Author): Yes. If the ingest document already contains the processor's output field (e.g. a text_embedding field), the currently running processor skips it. The user can explicitly set override to true to rewrite the output field.

Collaborator: How does the user know the field was skipped? Maybe we can add some logging.

rbhavna (Author): Sure, will do that.

int existingFields = 0;
for (String path : dotPaths) {
    if (ingestDocument.hasField(path)) {
        existingFields++;
Collaborator: When the document doesn't have the new field, will it add the newField to the output mapping?

rbhavna (Author): No, the output fields are already added to newOutputMapping on line 204. In this for loop we check whether a field at the specified path already exists in the document; if it does and the override flag is false, we remove that field from the output mapping. This saves time by not re-processing a field whose value may already have been inferred.
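A minimal sketch of the check discussed in this thread, folding in the logging suggested above; the helper name, the Predicate-based stand-in for ingestDocument.hasField, and the log message are illustrative assumptions, not the actual ml-commons code:

import java.util.Set;
import java.util.function.Predicate;
import java.util.logging.Logger;

class OverrideCheckSketch {
    private static final Logger logger = Logger.getLogger(OverrideCheckSketch.class.getName());

    // Returns true when every configured output path already exists in the
    // document and override is false, i.e. inference can be skipped.
    static boolean shouldSkipInference(Set<String> dotPaths, boolean override, Predicate<String> hasField) {
        int existingFields = 0;
        for (String path : dotPaths) {
            if (hasField.test(path)) {
                existingFields++;
            }
        }
        if (!override && existingFields == dotPaths.size()) {
            logger.info("All output fields " + dotPaths + " already exist and override is false; skipping inference");
            return true;
        }
        return false;
    }
}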

Signed-off-by: Bhavana Ramaram <[email protected]>
jngz-es previously approved these changes Jun 10, 2024
List embedding1 = JsonPath.parse(document).read("_source.books[0].title_embedding");
Assert.assertEquals(1536, embedding1.size());
List embedding2 = JsonPath.parse(document).read("_source.books[1].title_embedding");
Assert.assertEquals(1536, embedding2.size());
Collaborator: For the IT: this tests the foreach processor with nested documents. Can you also add a test that does not use the foreach processor? Does it still work fine?

rbhavna (Author): Yes, it works fine. Will add a few more tests for local models.
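A hypothetical sketch of what the non-foreach assertion could look like, mirroring the JsonPath/JUnit style of the existing IT above; the top-level _source.title_embedding field name is an assumption for illustration:

import java.util.List;
import com.jayway.jsonpath.JsonPath;
import org.junit.Assert;

class SingleFieldEmbeddingAssertionSketch {
    // Without the foreach processor, the output lands on a single top-level
    // field rather than inside each nested books[i] object.
    static void assertTitleEmbedding(String document) {
        List embedding = JsonPath.parse(document).read("_source.title_embedding");
        Assert.assertEquals(1536, embedding.size());
    }
}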

MLModelConfig modelConfig = TextEmbeddingModelConfig
    .builder()
    .modelType("bert")
    .frameworkType(TextEmbeddingModelConfig.FrameworkType.SENTENCE_TRANSFORMERS)
Collaborator: Does the local model support only sentence transformers? Have you tested other types of local models?

rbhavna (Author): It also works with sparse encoding and cross-encoder models. Will add more unit tests.
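For illustration, a hypothetical variant of the config shown above targeting a different framework type; the HUGGINGFACE_TRANSFORMERS constant and the embeddingDimension value are assumptions about the ml-commons builder rather than verified API, shown only to indicate the processor is not tied to one framework type:

// Assumed builder fields; check TextEmbeddingModelConfig for the actual API.
MLModelConfig modelConfig = TextEmbeddingModelConfig
    .builder()
    .modelType("bert")
    .embeddingDimension(768)
    .frameworkType(TextEmbeddingModelConfig.FrameworkType.HUGGINGFACE_TRANSFORMERS)
    .build();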

rbhavna (Author): Found it difficult to add ITs for the other model types, so I covered them in UTs. We currently don't have predict ITs for those models: the pre-trained models can't be used in ITs because of their size and resulting timeouts. I can add ITs later by first adding a few small model URLs to the test data; I left a TODO in the test class.

ylwu-amzn previously approved these changes Jun 11, 2024
Signed-off-by: Bhavana Ramaram <[email protected]>
@rbhavna dismissed stale reviews from ylwu-amzn and jngz-es via a4f711b on Jun 11, 2024
@rbhavna merged commit 7cd5291 into opensearch-project:main on Jun 11, 2024. 9 checks passed.
opensearch-trigger-bot pushed a commit that referenced this pull request on Jun 11, 2024
* ml inference ingest processor support for local models

Signed-off-by: Bhavana Ramaram <[email protected]>
(cherry picked from commit 7cd5291)
ylwu-amzn pushed a commit that referenced this pull request Jun 11, 2024
* ml inference ingest processor support for local models

Signed-off-by: Bhavana Ramaram <[email protected]>
(cherry picked from commit 7cd5291)

Co-authored-by: Bhavana Ramaram <[email protected]>
opensearch-trigger-bot (Contributor):
The backport to feature/multi_tenancy failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-feature/multi_tenancy feature/multi_tenancy
# Navigate to the new working tree
cd .worktrees/backport-feature/multi_tenancy
# Create a new branch
git switch --create backport/backport-2508-to-feature/multi_tenancy
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7cd52915d04d8ac7ddb6e37a74a256603587ce69
# Push it to GitHub
git push --set-upstream origin backport/backport-2508-to-feature/multi_tenancy
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-feature/multi_tenancy

Then, create a pull request where the base branch is feature/multi_tenancy and the compare/head branch is backport/backport-2508-to-feature/multi_tenancy.

dhrubo-os pushed a commit to dhrubo-os/ml-commons that referenced this pull request on Oct 2, 2024 (opensearch-project#2508)

* ml inference ingest processor support for local models

Signed-off-by: Bhavana Ramaram <[email protected]>
dhrubo-os added a commit that referenced this pull request Oct 2, 2024
* ml inference ingest processor support for local models

Signed-off-by: Bhavana Ramaram <[email protected]>
Co-authored-by: Bhavana Ramaram <[email protected]>