Add AD command for PPL/AD integration #455
Signed-off-by: jackieyanghan <[email protected]>
Codecov Report
@@                  Coverage Diff                   @@
##           feature/ppl-ml     #455      +/-   ##
====================================================
- Coverage         95.78%   95.20%   -0.58%
- Complexity         2704     2715      +11
====================================================
  Files               269      273       +4
  Lines              7281     7364      +83
  Branches            544      550       +6
====================================================
+ Hits               6974     7011      +37
- Misses              252      298      +46
  Partials             55       55
return new HashMap<String, Literal>() {{
  put("shingle_size", (ctx.shingle_size != null)
      ? getArgumentValue(ctx.shingle_size)
      : new Literal(8, DataType.INTEGER));
If shingle_size = 8 is the default setting, it would be better to encapsulate it in the AD operator instead of in the parser.
Shingle size is optional; MLCommons sets it to 8 by default: https:/opensearch-project/ml-commons/blob/main/ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/rcf/FixedInTimeRandomCutForest.java#L82
Changed in the latest revision so that no default value is set here.
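For illustration, a minimal sketch of the approach settled on above, assuming the parser simply omits shingle_size when the query does not specify it and the downstream library applies its own default (8, per the RCF code linked above). ShingleArgsSketch, buildArguments, and resolveShingleSize are hypothetical names, and plain Integer values stand in for the PR's Literal type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the parser forwards only arguments the user actually
// supplied; the default is applied downstream, not in the parser.
public class ShingleArgsSketch {

  // Stand-in for the ML library's default (assumed value, taken from the thread).
  static final int DEFAULT_SHINGLE_SIZE = 8;

  /** Parser side: include shingle_size only if the query specified it. */
  static Map<String, Integer> buildArguments(Integer shingleSizeFromQuery) {
    Map<String, Integer> arguments = new HashMap<>();
    if (shingleSizeFromQuery != null) {
      arguments.put("shingle_size", shingleSizeFromQuery);
    }
    return arguments;
  }

  /** Library side: fall back to the default when the key is absent. */
  static int resolveShingleSize(Map<String, Integer> arguments) {
    return arguments.getOrDefault("shingle_size", DEFAULT_SHINGLE_SIZE);
  }

  public static void main(String[] args) {
    System.out.println(resolveShingleSize(buildArguments(null))); // 8
    System.out.println(resolveShingleSize(buildArguments(4)));    // 4
  }
}
```

Keeping the parser free of defaults means the value can change in one place (the ML library) without touching the grammar or AST builder.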
@@ -19,7 +19,7 @@
   private final Boolean value;

-  private ExprBooleanValue(Boolean value) {
+  public ExprBooleanValue(Boolean value) {
Why change this to public?
I need to call the constructor in resultBuilder.put(resultKeyName, new ExprBooleanValue(columnValue.booleanValue())). Other Expr value classes under this path all have public access.
Use the ExprBooleanValue.of() method instead.
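A minimal sketch of why a static factory is preferable to widening the constructor, assuming ExprBooleanValue.of() returns shared instances the way this stand-in does. BoolValueSketch is hypothetical and is not the real class: the constructor stays private and of() hands back one of two cached objects, so building result rows does not allocate a new value per row.

```java
// Hypothetical stand-in for an immutable boolean value class; it illustrates
// the static-factory pattern only and is not the real ExprBooleanValue.
public final class BoolValueSketch {

  // Two shared, immutable instances cover every possible value.
  private static final BoolValueSketch TRUE = new BoolValueSketch(true);
  private static final BoolValueSketch FALSE = new BoolValueSketch(false);

  private final boolean value;

  // Private constructor: callers cannot create extra copies.
  private BoolValueSketch(boolean value) {
    this.value = value;
  }

  /** Returns the cached instance for the given value. */
  public static BoolValueSketch of(boolean value) {
    return value ? TRUE : FALSE;
  }

  public boolean booleanValue() {
    return value;
  }

  public static void main(String[] args) {
    // Both calls return the same object, so nothing is allocated per call.
    System.out.println(BoolValueSketch.of(true) == BoolValueSketch.of(true)); // true
    System.out.println(BoolValueSketch.of(false).booleanValue());             // false
  }
}
```

Under that assumption, the call in resultBuilder.put(...) would become ExprBooleanValue.of(columnValue.booleanValue()), and the constructor could stay private.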
  return resultBuilder.build();
}

private void popluateResultBuilder(ColumnValue columnValue,
Does this duplicate convertRowIntoExprValue() in MLCommonsOperator?
I think we can consider abstracting such shared code; that may benefit the general train, predict, and train_predict commands.
Abstracted the duplicate code in the latest revision.
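A rough sketch of the kind of abstraction discussed above, in which shared row conversion lives in one abstract base class and each ML command supplies only what is specific to it. All class and method names here are hypothetical, and plain lists stand in for the PR's ColumnValue/ExprValue types.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class holding row-conversion logic shared by ML operators.
abstract class MLOperatorBaseSketch {

  /** Zips column names with row values into an insertion-ordered map. */
  protected Map<String, Object> convertRow(List<String> columns, List<Object> row) {
    if (columns.size() != row.size()) {
      throw new IllegalArgumentException("column/value count mismatch");
    }
    Map<String, Object> result = new LinkedHashMap<>();
    for (int i = 0; i < columns.size(); i++) {
      result.put(columns.get(i), row.get(i));
    }
    return result;
  }

  /** Each concrete operator supplies only its algorithm-specific part. */
  abstract String algorithmName();
}

class AdOperatorSketch extends MLOperatorBaseSketch {
  @Override
  String algorithmName() {
    return "ad";
  }
}

class KmeansOperatorSketch extends MLOperatorBaseSketch {
  @Override
  String algorithmName() {
    return "kmeans";
  }
}

public class SharedConversionDemo {
  public static void main(String[] args) {
    MLOperatorBaseSketch op = new AdOperatorSketch();
    Map<String, Object> row = op.convertRow(List.of("score", "anomalous"), List.of(0.7, false));
    System.out.println(op.algorithmName() + " " + row); // ad {score=0.7, anomalous=false}
  }
}
```

With this split, a future train, predict, or train_predict operator would only add a subclass rather than another copy of the conversion loop.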
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/ADOperator.java (two resolved, outdated review threads)
In general,
}

protected MLAlgoParams convertArgumentToMLParameter(Map<String, Literal> arguments) {
  if (arguments.get("time_field").getValue() == null) {
"time_field" appears in multiple places. How about creating constants and reusing them?
At first I couldn't find a constants class in this package, so I didn't use constants for these fields. The latest revision adds a constants class for MLCommons-related constants.
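A minimal sketch of the constants approach, assuming a small final holder class. MLCommonsConstantsSketch, ConstantsDemo, and hasTimeField are hypothetical names; the key strings "time_field" and "shingle_size" are the ones used in this PR.

```java
import java.util.Map;

// Hypothetical constants holder; class and field names are illustrative only.
final class MLCommonsConstantsSketch {
  static final String TIME_FIELD = "time_field";
  static final String SHINGLE_SIZE = "shingle_size";

  private MLCommonsConstantsSketch() {} // not meant to be instantiated
}

public class ConstantsDemo {

  /** Looks up the time field through the shared constant, not a string literal. */
  static boolean hasTimeField(Map<String, String> arguments) {
    return arguments.get(MLCommonsConstantsSketch.TIME_FIELD) != null;
  }

  public static void main(String[] args) {
    System.out.println(hasTimeField(Map.of("time_field", "timestamp"))); // true
    System.out.println(hasTimeField(Map.of()));                          // false
  }
}
```

A typo in a constant name fails at compile time, whereas a typo in a repeated string literal silently looks up the wrong key.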
// change key name to avoid duplicate key issue in result map
// only value will be shown in the final returned result
if (schema.containsKey(resultKeyName)) {
  resultKeyName = resultKeyName + "1";
Is appending "1" to avoid a duplicate name a general naming convention in SQL? It doesn't seem very readable; maybe we could append the algorithm or command name, such as "_ad" or "_batch_rcf"?
I'm not sure what the best practice is here; I need to discuss it with the package owner. All the other commands build their arguments into a list rather than the map used here, so there is no similar case to reference.
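One way the reviewer's suggestion could look, sketched as a hypothetical helper (ResultKeyNaming.uniqueKey is not part of the PR): on a clash, append the command name such as "_ad", and fall back to a numeric counter only if that name is also taken. A Set<String> stands in for the schema's key set.

```java
import java.util.Set;

// Hypothetical helper: prefer a readable command suffix over a bare "1",
// and only add a counter if the suffixed name also collides.
public class ResultKeyNaming {

  static String uniqueKey(Set<String> schemaKeys, String key, String commandSuffix) {
    if (!schemaKeys.contains(key)) {
      return key; // no clash, keep the original name
    }
    String candidate = key + commandSuffix;
    int counter = 1;
    while (schemaKeys.contains(candidate)) {
      candidate = key + commandSuffix + counter;
      counter++;
    }
    return candidate;
  }

  public static void main(String[] args) {
    Set<String> schema = Set.of("score", "score_ad");
    System.out.println(uniqueKey(schema, "anomalous", "_ad")); // anomalous
    System.out.println(uniqueKey(schema, "score", "_ad"));     // score_ad1
  }
}
```

The counter fallback keeps the helper total (it always returns an unused name) while the common case stays readable.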
@@ -89,6 +89,13 @@ kmeansCommand
    k=integerLiteral
    ;

adCommand
    : AD
We should support the other parameters. You can do that in the next PR.
Opened an issue to track it: #462. I'll address it in the next PR.
Signed-off-by: jackieyanghan <[email protected]>
LGTM
import org.opensearch.sql.opensearch.client.MLClient;
import org.opensearch.sql.planner.physical.PhysicalPlan;

public abstract class OperatorActions extends PhysicalPlan {
OperatorActions is too generic. Could you make the name more specific to ML?
By the way, please add Javadoc for the abstract class and its public/abstract methods.
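A hypothetical example of the requested Javadoc on an abstract base class and its public/abstract methods, using an ML-specific name in place of OperatorActions. MLCommonsOperatorSketch and its methods are illustrative only; the sketch does not extend the real PhysicalPlan and is not necessarily the name adopted in this PR.

```java
import java.util.Map;

/**
 * Base class for physical operators that delegate work to ML Commons.
 *
 * <p>Hypothetical example of the requested Javadoc style; the class name is
 * an illustrative ML-specific alternative to "OperatorActions".
 */
abstract class MLCommonsOperatorSketch {

  /**
   * Describes the algorithm parameters derived from the parsed command arguments.
   *
   * @param arguments argument name to value, as produced by the parser
   * @return a human-readable description of the parameters
   */
  protected abstract String describeParameters(Map<String, Object> arguments);

  /**
   * Returns the name of the ML algorithm this operator runs.
   *
   * @return algorithm name, for example {@code "kmeans"}
   */
  public abstract String algorithmName();
}

public class JavadocDemo {
  public static void main(String[] args) {
    MLCommonsOperatorSketch op = new MLCommonsOperatorSketch() {
      @Override
      protected String describeParameters(Map<String, Object> arguments) {
        return algorithmName() + " " + arguments;
      }

      @Override
      public String algorithmName() {
        return "kmeans";
      }
    };
    System.out.println(op.describeParameters(Map.of("k", 3))); // kmeans {k=3}
  }
}
```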
Signed-off-by: jackieyanghan <[email protected]>
LGTM
Description
[Describe what this change achieves]
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.