Scripting: enable regular expressions by default #63029

Merged

Contributor

@stu-elastic stu-elastic commented Sep 29, 2020

  • Setting script.painless.regex.enabled has a new option,
    limited, which is now the default. It enables regular
    expressions but limits their complexity.

    In addition to limited, the setting can be true, as
    before, which enables regular expressions without limiting them.

    false totally disables regular expressions, which was the
    old default.

  • New setting script.painless.regex.limit-factor. This limits
    regular expression complexity by capping the number of characters
    a regular expression can consider, based on input length.

    The default is 6, so a regular expression can consider
    up to 6 * (input length) characters. With input
    foobarbaz (length 9), for example, the regular expression
    can consider 54 (6 * 9) characters.

    This reduces the impact of exponential backtracking in Java's
    regular expression engine.

  • Add the @inject_constant annotation to the whitelist.

    This annotation signals that compiler settings will
    be injected at the beginning of a whitelisted method's arguments.

    The format is argnum=settingname:
    1=foo_setting 2=bar_setting.

    Argument numbers must start at one and must be sequential.

  • Augment
    Pattern.split(CharSequence),
    Pattern.split(CharSequence, int),
    Pattern.splitAsStream(CharSequence), and
    Pattern.matcher(CharSequence)
    to take the value of script.painless.regex.limit-factor as
    an injected parameter, limiting as explained above when this
    setting is in use.

Fixes: #49873
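
The limit-factor arithmetic described above can be illustrated with a minimal sketch. It assumes the limit is enforced by wrapping the input in a CharSequence that counts charAt calls and fails once the count exceeds factor * input length. The names LimitFactorSketch and CountingCharSequence, and the choice of IllegalStateException, are hypothetical and not taken from this PR.

```java
import java.util.regex.Pattern;

class LimitFactorSketch {
    /** Hypothetical wrapper: permits at most limitFactor * length charAt calls. */
    static final class CountingCharSequence implements CharSequence {
        private final CharSequence wrapped;
        private final int maxReads;
        private int reads = 0;

        CountingCharSequence(CharSequence wrapped, int limitFactor) {
            this.wrapped = wrapped;
            this.maxReads = limitFactor * wrapped.length();
        }

        @Override
        public int length() {
            return wrapped.length();
        }

        @Override
        public char charAt(int index) {
            // Every character the regex engine looks at counts against the budget.
            if (++reads > maxReads) {
                throw new IllegalStateException(
                    "regex considered more than [" + maxReads + "] characters");
            }
            return wrapped.charAt(index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return wrapped.subSequence(start, end);
        }

        @Override
        public String toString() {
            return wrapped.toString();
        }
    }

    /** Budget of character reads for a given input and factor. */
    static int maxReads(String input, int limitFactor) {
        return limitFactor * input.length();
    }

    /** Matches the whole input against the pattern under the read budget. */
    static boolean limitedMatches(Pattern pattern, String input, int limitFactor) {
        return pattern.matcher(new CountingCharSequence(input, limitFactor)).matches();
    }

    public static void main(String[] args) {
        System.out.println(maxReads("foobarbaz", 6)); // 6 * 9 = 54
        try {
            System.out.println(limitedMatches(Pattern.compile("foo.*baz"), "foobarbaz", 6));
        } catch (IllegalStateException e) {
            System.out.println("limit exceeded: " + e.getMessage());
        }
    }
}
```

Because the budget scales with input length, a pattern that backtracks heavily on a short input runs out of reads quickly instead of running for an unbounded time.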

* Setting `script.painless.regex.enabled` has a new option,
  `use-factor`, which is now the default.  It enables regular
  expressions but limits their complexity.

  In addition to `use-factor`, the setting can be `true`, as
  before, which enables regular expressions without limiting them.

  `false` totally disables regular expressions, which was the
  old default.

* New setting `script.painless.regex.limit-factor`.  This limits
  regular expression complexity by capping the number of characters
  a regular expression can consider, based on input length.

  The default is `6`, so a regular expression can consider
  `6` * input length number of characters.  With input
  `foobarbaz` (length `9`), for example, the regular expression
  can consider `54` (`6 * 9`) characters.

  This reduces the impact of exponential backtracking in Java's
  regular expression engine.

* Add the `@inject_constant` annotation to the whitelist.

  This annotation signals that compiler settings will
  be injected at the beginning of a whitelisted method's arguments.

  The format is `argnum=settingname`:
  `1=foo_setting 2=bar_setting`.

  Argument numbers must start at one and must be sequential.

* Augment
  `Pattern.split(CharSequence)`,
  `Pattern.split(CharSequence, int)`,
  `Pattern.splitAsStream(CharSequence)`, and
  `Pattern.matcher(CharSequence)`
  to take the value of `script.painless.regex.limit-factor` as
  an injected parameter, limiting as explained above when this
  setting is in use.

Fixes: elastic#49873
@stu-elastic stu-elastic added the >enhancement, :Core/Infra/Scripting, v8.0.0, and v7.10.0 labels Sep 29, 2020
Member

@rjernst rjernst left a comment

I just have a few high level comments mostly. First is on the naming. "use-factor" is pretty cryptic. Can we use something like "limited"? This flows nicely to me. If I ask the question "are regexes enabled?" the answer true/false/limited. Second, it seems there are quite a few TODOs leftover that can probably be removed? If they are to be left in then please expand on them and/or create an issue so someone else on the team could understand what needs to be done.
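
The three-valued answer suggested here (true/false/limited) could be modeled as an enum. This is only a sketch: the enum name RegexEnabled, its parse method, and the error message wording are assumptions for illustration, not code from this PR.

```java
/** Hypothetical model of the three values of script.painless.regex.enabled. */
enum RegexEnabled {
    TRUE("true"),       // regular expressions enabled without limits
    FALSE("false"),     // regular expressions disabled
    LIMITED("limited"); // regular expressions enabled, complexity limited

    final String value;

    RegexEnabled(String value) {
        this.value = value;
    }

    /** Maps a raw setting string to an option, rejecting anything else. */
    static RegexEnabled parse(String raw) {
        for (RegexEnabled option : values()) {
            if (option.value.equals(raw)) {
                return option;
            }
        }
        throw new IllegalArgumentException(
            "invalid value [" + raw + "], must be one of [true, false, limited]");
    }
}
```

With this shape, an unrecognized value such as the earlier `use-factor` name is rejected at parse time rather than silently treated as one of the other options.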

@@ -58,6 +58,7 @@ class java.util.regex.Matcher {
String replaceFirst(String)
boolean requireEnd()
Matcher reset()
# Whitelisting Matcher.reset(String) works around the regex limiting
Member

Since we don't actually whitelist this method, I think this is just a comment to note if we did allow that method it would allow escaping the regex limiting? Could you clarify the comment?

Contributor Author

Will do.

Contributor Author

Updated comment.

@@ -66,6 +66,8 @@ public ScriptScope(PainlessLookup painlessLookup, CompilerSettings compilerSetti
staticConstants.put("$SOURCE", scriptSource);
staticConstants.put("$DEFINITION", painlessLookup);
staticConstants.put("$FUNCTIONS", functionTable);
// TODO(stu): inject compiler settings here
Member

leftover todo?

Contributor Author

Yup, these and all the left over todos were why this was initially a draft. They are gone now.

@@ -211,7 +211,7 @@ protected void injectStaticFieldsAndGetters() {
irLoadFieldMemberNode.setLocation(internalLocation);
irLoadFieldMemberNode.setExpressionType(String.class);
irLoadFieldMemberNode.setName("$NAME");
-irLoadFieldMemberNode.setStatic(true);
+irLoadFieldMemberNode.setStatic(true); // TODO(stu): add $COMPILER_INJECTS, add hash map and set it
Member

what is this todo?

Contributor Author

Removed.

if (arguments.containsKey(argNum) == false) {
throw new IllegalArgumentException("[@inject_constant] missing argument number [" + argNum + "]");
}
// TODO(stu): Jack, how do I verify against CompilerSettings.
Member

Todos like this are very cryptic. Can we just have a normal comment if some explanation is needed?

Contributor Author

Removed when draft.
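
For readers following the `argnum=settingname` format, here is a minimal sketch of parsing and validating it. It reuses the "missing argument number" check shown in the snippet under review, but the class InjectConstantSketch, the method parse, and the other error messages are hypothetical, and it does not verify setting names against CompilerSettings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InjectConstantSketch {
    /**
     * Parses a spec such as "1=foo_setting 2=bar_setting" and returns the
     * setting names in argument order. Argument numbers must start at one
     * and be sequential.
     */
    static List<String> parse(String spec) {
        Map<Integer, String> arguments = new HashMap<>();
        for (String pair : spec.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                throw new IllegalArgumentException(
                    "[@inject_constant] expected argnum=settingname, found [" + pair + "]");
            }
            int argNum = Integer.parseInt(pair.substring(0, eq));
            if (arguments.put(argNum, pair.substring(eq + 1)) != null) {
                throw new IllegalArgumentException(
                    "[@inject_constant] duplicate argument number [" + argNum + "]");
            }
        }
        // Walk 1..n so that any gap or a start other than one is reported.
        List<String> ordered = new ArrayList<>();
        for (int argNum = 1; argNum <= arguments.size(); argNum++) {
            if (arguments.containsKey(argNum) == false) {
                throw new IllegalArgumentException(
                    "[@inject_constant] missing argument number [" + argNum + "]");
            }
            ordered.add(arguments.get(argNum));
        }
        return ordered;
    }
}
```

Collecting into a map first and then walking 1..n keeps the sequential check independent of the order in which the pairs are written.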

import java.util.Collections;
import java.util.List;

public class InjectConstantAnnotation {
Member

Can we have some basic java docs on this?

Contributor Author

Will do.

Contributor Author

Done.

@stu-elastic stu-elastic marked this pull request as ready for review September 30, 2020 14:49
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

@elasticmachine elasticmachine added the Team:Core/Infra label Sep 30, 2020
@jdconrad
Contributor

@stu-elastic One note is we need to ensure we support the operators ==~ and =~ in the BinaryMathNode.write method. Apologies as I dropped the ball on this one.
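
As background for the two operators mentioned here: in Painless, `=~` is the find operator and `==~` is the match operator. In plain Java they correspond to Matcher.find() and Matcher.matches(). The sketch below only illustrates that correspondence; the class and method names are hypothetical, and it does not show how BinaryMathNode.write handles them.

```java
import java.util.regex.Pattern;

class RegexOperatorSketch {
    /** Java equivalent of Painless `text =~ /re/`: true if any part of the input matches. */
    static boolean find(Pattern pattern, CharSequence text) {
        return pattern.matcher(text).find();
    }

    /** Java equivalent of Painless `text ==~ /re/`: true only if the whole input matches. */
    static boolean match(Pattern pattern, CharSequence text) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(find(digits, "abc123"));  // true: "123" is found inside the input
        System.out.println(match(digits, "abc123")); // false: the whole input is not digits
    }
}
```

Both helpers go through Pattern.matcher(CharSequence), so any limiting applied to that method would need to cover the operator paths as well.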

Member

@rjernst rjernst left a comment

LGTM

@stu-elastic stu-elastic merged commit 93f29a4 into elastic:master Oct 5, 2020
stu-elastic added a commit to stu-elastic/elasticsearch that referenced this pull request Oct 5, 2020
stu-elastic added a commit that referenced this pull request Oct 5, 2020
Fixes: #49873
Backport of: 93f29a4
@stu-elastic
Contributor Author

master: 93f29a4
7.x (7.10): 791a9d5

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Oct 20, 2020
Now that we've got regexes enabled by default (elastic#63029) this adds a test
to runtime fields just to make sure that it works with regexes. It does,
but this adds a test to make sure it continues to work.
nik9000 added a commit that referenced this pull request Oct 20, 2020
Now that we've got regexes enabled by default (#63029) this adds a test
to runtime fields just to make sure that it works with regexes. It does,
but this adds a test to make sure it continues to work.
jrodewig added a commit that referenced this pull request Aug 4, 2021
Documents the `script.painless.regex.enabled` and
`script.painless.regex.limit-factor` cluster settings.

Relates to #63029.

Closes #75199.
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Core/Infra Meta label for core/infra team v7.10.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Painless Safety: Regexes
5 participants