Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FormattedMessage doesn't need to use regular expressions #1223

Closed
jvz opened this issue Jan 23, 2023 · 4 comments · Fixed by #1885
Closed

FormattedMessage doesn't need to use regular expressions #1223

jvz opened this issue Jan 23, 2023 · 4 comments · Fixed by #1885
Assignees
Labels
enhancement Additions or updates to features minor API change performance Issues or PRs that affect performance, throughput, latency, etc.
Milestone

Comments

@jvz
Copy link
Member

jvz commented Jan 23, 2023

In order to determine which message format syntax to use in FormattedMessage, this uses a regular expression. This can be simplified by flipping the ordering to check for a parametrized message first before assuming it's a printf-style message.

@jvz jvz added the enhancement Additions or updates to features label Jan 23, 2023
@jvz jvz added this to the 2.20.0 milestone Jan 23, 2023
@adwsingh
Copy link
Contributor

@jvz I was looking into this. How do you suggest we check for a parametrized message, by doing countArgumentPlaceholders(msg) > 0 ?

@ppkarwasz
Copy link
Contributor

@adwsingh, that sounds about right.

I marked you as "assignee" to prevent other contributors to work on it. It's not an obligation.

@adwsingh
Copy link
Contributor

adwsingh commented Feb 17, 2023

@ppkarwasz after reading the docs I am not so sure of my implementation anymore as I think it would break users.

In docs we specify,

The message pattern passed to a FormattedMessage is first checked to see if it is a valid java.text.MessageFormat pattern. If it is, a MessageFormatMessage is used to format it. If not it is next checked to see if it contains any tokens that are valid format specifiers for String.format(). If so, a StringFormattedMessage is used to format it. Finally, if the pattern doesn't match either of those then a ParameterizedMessage is used to format it.

For a message like logger.error("Test message {} %s", "abc");, we would now log Test message abc %s instead of Test message {} abc, which would go against what we specify in the docs.

@ppkarwasz
Copy link
Contributor

@adwsingh,

Yes, it is a breaking change to silence Code scanning alert #51: checking for an unescaped {} is much faster than checking for a java.util.Formatter format specifier.

I believe we can change the documentation to state:

The message factory supports the following format specifiers:

  • those specified by java.text.MessageFormat,
  • those specified by java.util.Formatter,
  • {} placeholders (cf. ParameterizedMessage).
    Mixing specifier from these 3 categories is not supported.

BTW: The regex we use currently is identical to java.util.Formatter.formatSpecifier used by the JRE, so this does not change anything security-wise.

@jvz jvz added the performance Issues or PRs that affect performance, throughput, latency, etc. label Oct 15, 2023
@ppkarwasz ppkarwasz self-assigned this Oct 23, 2023
ppkarwasz added a commit to ppkarwasz/logging-log4j2 that referenced this issue Oct 23, 2023
We change the order in which `FormattedMessage` checks the format of the
provided pattern: we first check for the presence of `{}` placeholders
and only then for `java.util.Format` specifiers.

This eliminates the need for a potentially exponential regular
expression evalutation, which was reported by Spotbugs (apache#1849).

The Javadoc and documentation were improved to clarify the heuristic
used by `FormattedMessage`.

Closes apache#1223.

Remark: since `FormattedMessage` used the **same** regular expression as
`java.util.Format`, if a message uses `java.util.Format` specifiers, it
is still vulnerable to a ReDOS.
@ppkarwasz ppkarwasz modified the milestones: 2.20.0, 2.x Oct 23, 2023
ppkarwasz added a commit to ppkarwasz/logging-log4j2 that referenced this issue Oct 23, 2023
We change the order in which `FormattedMessage` checks the format of the
provided pattern: we first check for the presence of `{}` placeholders
and only then for `java.util.Format` specifiers.

This eliminates the need for a potentially exponential regular
expression evalutation, which was reported by Spotbugs (apache#1849).

The Javadoc and documentation were improved to clarify the heuristic
used by `FormattedMessage`.

Closes apache#1223.

Remark: since `FormattedMessage` used the **same** regular expression as
`java.util.Format`, if a message uses `java.util.Format` specifiers, it
is still vulnerable to a ReDOS.
@ppkarwasz ppkarwasz modified the milestones: 2.x, 2.22.0 Nov 13, 2023
dongjoon-hyun pushed a commit to apache/spark that referenced this issue Nov 22, 2023
### What changes were proposed in this pull request?
The pr aims to upgrade log4j2 from 2.21.0 to 2.22.0.

### Why are the changes needed?
This is the first log4j2 version that provides a CycloneDX Software Bill of Materials (SBOM) and the new version bring some new change and fix like:
- Change the order of evaluation of FormattedMessage formatters. Messages are evaluated using java.util.Format only if they don't comply to the java.text.MessageFormat or ParameterizedMessage format. (apache/logging-log4j2#1223)
- Change default encoding of HTTP Basic Authentication to UTF-8 and add log4j2.configurationAuthorizationEncoding property to overwrite it. (apache/logging-log4j2#1970)
- Removed unused FastDateParser which was causing unnecessary heap overhead ([LOG4J2-3672](https://issues.apache.org/jira/browse/LOG4J2-3672), apache/logging-log4j2#1848)
- Fix MDC pattern converter causing issues for %notEmpty (apache/logging-log4j2#1922)
- Fix NotSerializableException thrown when Logger is serialized with a ReusableMessageFactory (apache/logging-log4j2#1884)

the full release note as follows:
-https:/apache/logging-log4j2/releases/tag/rel%2F2.22.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43940 from LuciferYang/SPARK-46038.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
ppkarwasz added a commit to ppkarwasz/logging-log4j2 that referenced this issue Jan 17, 2024
We change the order in which `FormattedMessage` checks the format of the
provided pattern: we first check for the presence of `{}` placeholders
and only then for `java.util.Format` specifiers.

This eliminates the need for a potentially exponential regular
expression evalutation, which was reported by Spotbugs (apache#1849).

The Javadoc and documentation were improved to clarify the heuristic
used by `FormattedMessage`.

Closes apache#1223.

Remark: since `FormattedMessage` used the **same** regular expression as
`java.util.Format`, if a message uses `java.util.Format` specifiers, it
is still vulnerable to a ReDOS.
ppkarwasz added a commit to ppkarwasz/logging-log4j2 that referenced this issue Jan 26, 2024
We change the order in which `FormattedMessage` checks the format of the
provided pattern: we first check for the presence of `{}` placeholders
and only then for `java.util.Format` specifiers.

This eliminates the need for a potentially exponential regular
expression evalutation, which was reported by Spotbugs (apache#1849).

The Javadoc and documentation were improved to clarify the heuristic
used by `FormattedMessage`.

Closes apache#1223.

Remark: since `FormattedMessage` used the **same** regular expression as
`java.util.Format`, if a message uses `java.util.Format` specifiers, it
is still vulnerable to a ReDOS.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Additions or updates to features minor API change performance Issues or PRs that affect performance, throughput, latency, etc.
Projects
None yet
3 participants