Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LOG4J2-3672 Avoid invoking DateFormatSymbols.getZoneStrings() in FastDateParser #1848

Closed
wants to merge 1 commit into from

Conversation

tristantarrant
Copy link

@vy vy self-requested a review October 9, 2023 20:49
@vy vy self-assigned this Oct 9, 2023
@vy vy added the enhancement Additions or updates to features label Oct 9, 2023
@vy vy changed the title LOG4J2-3672 Avoid invoking DateFormatSymbols.getZoneStrings() in Fast… LOG4J2-3672 Avoid invoking DateFormatSymbols.getZoneStrings() in FastDateParser Oct 10, 2023
@vy vy changed the title LOG4J2-3672 Avoid invoking DateFormatSymbols.getZoneStrings() in FastDateParser LOG4J2-3672 Avoid invoking DateFormatSymbols.getZoneStrings() in FastDateParser Oct 10, 2023
@vy
Copy link
Member

vy commented Oct 10, 2023

@tristantarrant, thanks so much for taking time to not only report the problem, but also provide a fix. 🙇 I have some concerns regarding this change. For one, it is backward incompatible, hence we cannot have your toggle enabled by default. Nevertheless, I think we don't need a toggle: FastDateParser can do better than matching against an enormous regex to figure out the time zone.

Since FastDateParser is borrowed from Apache Commons Lang in 2015 (ouch!), I'd first check if the upstream already has a fix for this. Otherwise, I'd try to figure out a better way. All in all, I have an inkling that we can do better than a regex.

@tristantarrant
Copy link
Author

Hi @vy, I agree we need a better solution.

  • avoid constructing that ridiculous regex with all the timezone names anyway. A simple [A-Za-z0-9\+\-/\s]+ should be enough
  • if the user uses a GMT-xx or RFC822-style timezone, use the quick path
  • if the user uses a primary timezone name (e.g. Europe/Rome), use TimeZone.getTimeZone()
  • otherwise fallback to obtaining the full list of zone names and aliases. Unfortunately there is no other way to look up timezones by alias... By using the approach used in my PR we can save ~700K of heap over the use of DateFormatSymbols.getInstance(locale).getZoneStrings(); (as you can see from the table I've put in the Jira)

Unfortunately this does not solve the bigger issue: if someone uses SimpleDateFormat in their code, they are going to trigger the full JDK caching of timezones, rendering our fix useless anyway.

A possibility, aside from fixing this in the JDK in some way, is to install an alternate ZoneRulesProvider using the java.time.zone.DefaultZoneRulesProvider system property. Such a provider would avoid caching the full TZ dataset.

@tristantarrant
Copy link
Author

I have reworked the PR to lazily initialize the TZ names only if a the date pattern uses a non-GMT/RFC822-style name. A WARNing will be logged once in this case.

My idea about implementing an alternate ZoneRulesProvider is hindered by https://bugs.openjdk.org/browse/JDK-8303899 :(

@vy
Copy link
Member

vy commented Oct 23, 2023

@tristantarrant, I haven't forgotten about this issue and your PR, but swamped with other priorities right now. Please allow me some time.

@tristantarrant
Copy link
Author

@tristantarrant, I haven't forgotten about this issue and your PR, but swamped with other priorities right now. Please allow me some time.

I wasn't expecting you to act on it urgently. Take your time :)

@vy
Copy link
Member

vy commented Nov 1, 2023

Figured the most recent FastDateParser from Apache Commons Lang also still uses getZoneStrings(): https:/apache/commons-lang/blob/5b5656a8b403fd91284b0d2acc893dac7ebd5c29/src/main/java/org/apache/commons/lang3/time/FastDateParser.java#L932

@vy
Copy link
Member

vy commented Nov 2, 2023

@tristantarrant, the more I digged this the more I felt like you are pulling my leg. 😅 Log4j uses date parsing nowhere in the code. I can mark all date parsing related stuff as deprecated and remove them at a certain point. Though all this begs the question "How did you end up having the problem described in LOG4J2-2672?" 🤔 Are you trying to parse date using Log4j libraries? If so, why?

@tristantarrant
Copy link
Author

tristantarrant commented Nov 2, 2023

Log4j does not parse dates: it parses a date formatter which is then used to print timestamps in the logs. We use the following layout:

<PatternLayout pattern="%highlight{%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p (%t) [%c] %m%throwable}{INFO=normal, DEBUG=normal, TRACE=normal}%n"/>

and that %d{....} causes all the timezone initialization.

@tristantarrant
Copy link
Author

I'll create a small verifier and attach it to the issue.

@vy
Copy link
Member

vy commented Nov 2, 2023

getZoneStrings() is only used in FastDateParser, and when I delete that class (and some more other code), the code still compiles fine. Anyway, apparently I am missing something. I will continue tomorrow.

@tristantarrant
Copy link
Author

I apologize: I included the wrong layout. This one triggers the call:

 <PatternLayout pattern="%d{dd/MMM/yyyy:HH:mm:ss Z} %-5p %m%throwable%n"/>

Removing the Z will "fix" it, but then the timestamp will not have the Common Log Format (as described here https://httpd.apache.org/docs/2.4/logs.html) which is what we need.

@tristantarrant
Copy link
Author

I've created the reproducer: https:/tristantarrant/log4j-tzdb-issue

@tristantarrant
Copy link
Author

It will generate a tz.hprof heap dump which you can open with VisualVM or Eclipse MAT.

@vy
Copy link
Member

vy commented Nov 2, 2023

Okay, apparently FastDateFormat eagerly creates a FastDateParser instance. I have checked this with other PMC members and decided to remove the FastDateParser from the code base. I will implement the change and inform you.

vy added a commit that referenced this pull request Nov 2, 2023
@vy
Copy link
Member

vy commented Nov 2, 2023

f9766bb should have fixed the issue. @tristantarrant, would you mind confirming the fix, please? (You can either try building the sources yourself or use 2.22.0-SNAPSHOT after the associated build succeeds.)

@tristantarrant
Copy link
Author

Even better: less code is my favourite code :)
We can close this PR.

vy added a commit that referenced this pull request Nov 2, 2023
@vy
Copy link
Member

vy commented Nov 2, 2023

@tristantarrant, there were forgotten tests. Pushed one more commit. The new CI run: https:/apache/logging-log4j2/actions/runs/6736362709

vy added a commit that referenced this pull request Nov 3, 2023
vy added a commit that referenced this pull request Nov 3, 2023
@tristantarrant
Copy link
Author

@vy just to let you know that our application initial heap footprint went down from ~31MB to 25MB (we also disabled initialization of JMX) thanks to this. Thanks again.

@tristantarrant tristantarrant deleted the LOG4J2-3672 branch November 21, 2023 10:00
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Nov 22, 2023
### What changes were proposed in this pull request?
The pr aims to upgrade log4j2 from 2.21.0 to 2.22.0.

### Why are the changes needed?
This is the first log4j2 version that provides a CycloneDX Software Bill of Materials (SBOM) and the new version bring some new change and fix like:
- Change the order of evaluation of FormattedMessage formatters. Messages are evaluated using java.util.Format only if they don't comply to the java.text.MessageFormat or ParameterizedMessage format. (apache/logging-log4j2#1223)
- Change default encoding of HTTP Basic Authentication to UTF-8 and add log4j2.configurationAuthorizationEncoding property to overwrite it. (apache/logging-log4j2#1970)
- Removed unused FastDateParser which was causing unnecessary heap overhead ([LOG4J2-3672](https://issues.apache.org/jira/browse/LOG4J2-3672), apache/logging-log4j2#1848)
- Fix MDC pattern converter causing issues for %notEmpty (apache/logging-log4j2#1922)
- Fix NotSerializableException thrown when Logger is serialized with a ReusableMessageFactory (apache/logging-log4j2#1884)

the full release note as follows:
-https:/apache/logging-log4j2/releases/tag/rel%2F2.22.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43940 from LuciferYang/SPARK-46038.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Additions or updates to features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants