Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AD command for PPL/AD integration #455

Merged
merged 8 commits into from
Mar 8, 2022
Merged

Add AD command for PPL/AD integration #455

merged 8 commits into from
Mar 8, 2022

Conversation

jackiehanyang
Copy link
Contributor

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jackiehanyang jackiehanyang requested a review from a team as a code owner March 3, 2022 21:16
@codecov-commenter
Copy link

codecov-commenter commented Mar 3, 2022

Codecov Report

Merging #455 (fa1ef16) into feature/ppl-ml (9768bd9) will decrease coverage by 0.57%.
The diff coverage is 29.60%.

Impacted file tree graph

@@                 Coverage Diff                  @@
##             feature/ppl-ml     #455      +/-   ##
====================================================
- Coverage             95.78%   95.20%   -0.58%     
- Complexity             2704     2715      +11     
====================================================
  Files                   269      273       +4     
  Lines                  7281     7364      +83     
  Branches                544      550       +6     
====================================================
+ Hits                   6974     7011      +37     
- Misses                  252      298      +46     
  Partials                 55       55              
Flag Coverage Δ
query-workbench 62.91% <ø> (ø)
sql-engine 98.37% <29.60%> (-0.68%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...a/org/opensearch/sql/utils/MLCommonsConstants.java 0.00% <0.00%> (ø)
...ch/sql/opensearch/planner/physical/ADOperator.java 0.00% <0.00%> (ø)
...opensearch/planner/physical/MLCommonsOperator.java 30.43% <0.00%> (+18.95%) ⬆️
...rch/planner/physical/MLCommonsOperatorActions.java 1.75% <1.75%> (ø)
...ain/java/org/opensearch/sql/analysis/Analyzer.java 100.00% <100.00%> (ø)
.../org/opensearch/sql/planner/logical/LogicalAD.java 100.00% <100.00%> (ø)
...ch/sql/planner/logical/LogicalPlanNodeVisitor.java 100.00% <100.00%> (ø)
.../sql/planner/physical/PhysicalPlanNodeVisitor.java 100.00% <100.00%> (ø)
...ecutor/protector/OpenSearchExecutionProtector.java 100.00% <100.00%> (ø)
...search/sql/opensearch/storage/OpenSearchIndex.java 100.00% <100.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9768bd9...fa1ef16. Read the comment docs.

return new HashMap<String, Literal>() {{
put("shingle_size", (ctx.shingle_size != null)
? getArgumentValue(ctx.shingle_size)
: new Literal(8, DataType.INTEGER));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If shingle_size = 8 is default setting, it would be better to encapsulated in AD operator instead of in parser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to not set default value here in the latest revision

@@ -19,7 +19,7 @@

private final Boolean value;

private ExprBooleanValue(Boolean value) {
public ExprBooleanValue(Boolean value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change to public?

Copy link
Contributor Author

@jackiehanyang jackiehanyang Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to reference it in resultBuilder.put(resultKeyName, new ExprBooleanValue(columnValue.booleanValue()));. Other Expr value classes under this path all have public access.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ExprBooleanValue.of() method instead.

return resultBuilder.build();
}

private void popluateResultBuilder(ColumnValue columnValue,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it duplicated with convertRowIntoExprValue() in MLCommonsOperator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can consider abstract such common shared code, that may benefit general train , predict and train_predict command

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract duplicate code in the latest revision

@penghuo
Copy link
Collaborator

penghuo commented Mar 4, 2022

In general,

  1. Please add more detail description in PR.
  2. Please add docs for new commands. It is not only docs also used as integration test. https:/opensearch-project/sql/blob/main/docs/dev/Doctest.md

}

protected MLAlgoParams convertArgumentToMLParameter(Map<String, Literal> arguments) {
if (arguments.get("time_field").getValue() == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See multiple places using "time_field". How about creating constants and reuse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't find any Constants class to use at first in this package, so didn't use constants to represent these fields. Created a constant class in the latest revision to store MLCommons related constants.

// change key name to avoid duplicate key issue in result map
// only value will be shown in the final returned result
if (schema.containsKey(resultKeyName)) {
resultKeyName = resultKeyName + "1";
Copy link
Contributor

@ylwu-amzn ylwu-amzn Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it general name convention in SQL to append "1" to avoid duplicate name? Seems not so readable, maybe we can append algorithm/command name like "_ad", or "_batch_rcf"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's the best practice to address this, need to discuss it with package owner to find out. All the other command arguments are built into a list, instead of a map that we used here. So no similar case to reference.

@@ -89,6 +89,13 @@ kmeansCommand
k=integerLiteral
;

adCommand
: AD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support the other parameters. You can do it in next PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue to track it - #462
Will address this in the PR.

Copy link
Contributor

@ylwu-amzn ylwu-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

import org.opensearch.sql.opensearch.client.MLClient;
import org.opensearch.sql.planner.physical.PhysicalPlan;

public abstract class OperatorActions extends PhysicalPlan {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperatorActions is too generic. Could you make more specific for ML.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, add java docs for abstract class and public/abstract methods.

Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@penghuo penghuo merged commit f042860 into opensearch-project:feature/ppl-ml Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants