Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change regex usage to manual parsing #1094

Merged
merged 2 commits into from
Aug 29, 2023

Conversation

lahma
Copy link
Collaborator

@lahma lahma commented Jun 8, 2023

I started profiling the "worst case" of evaluating formulas which seemed to take a lot of resources. The culprit turned out to be usage of regex parsing to get column/row/cell information. I created a custom parser that should fulfill the current needs and is a lot faster as it's being called in hot code paths. I also updated APIs to allow also ReadOnlySpan<char> for operations that needed substring to work with.

Based on running the benchmarks in the other PR, here are the improvement numbers. Still room to improve but this was the low-hanging fruit as first step.

NPOI.Benchmarks.LargeExcelFileBenchmark

Diff Method Mean Error Allocated
Old Load 11.741 s 1.1257 s 3.65 GB
New 11.497 s (-2%) 2.4215 s 2.79 GB (-24%)
Old Write 3.483 s 0.1738 s 1.19 GB
New 3.492 s (0%) 0.2931 s 1.19 GB (0%)
Old Evaluate 62.677 s 4.1891 s 75.32 GB
New 18.854 s (-70%) 2.7709 s 12.93 GB (-83%)

NPOI.Benchmarks.RangeValuesBenchmark

Diff Method Mean Error Allocated
Old Double 85.46 ms 0.367 ms 31.45 MB
New 86.43 ms (+1%) 0.181 ms 31.45 MB (0%)
Old String 99.93 ms 0.449 ms 59.32 MB
New 98.76 ms (-1%) 0.521 ms 59.32 MB (0%)
Old Date 114.71 ms 1.603 ms 35.73 MB
New 115.39 ms (+1%) 0.832 ms 35.73 MB (0%)
Old Formulas 1,064.12 ms 10.889 ms 1480.15 MB
New 402.57 ms (-62%) 1.058 ms 284.07 MB (-81%)

@lahma lahma force-pushed the manual-cell-address-parsing branch from 01009b5 to 2b3a949 Compare June 8, 2023 18:06
@tonyqus
Copy link
Member

tonyqus commented Jun 8, 2023

Wow, this is a incredible result. I'm looking forward to this PR.

@tonyqus tonyqus added this to the NPOI 2.7.0 milestone Jun 8, 2023
@lahma lahma force-pushed the manual-cell-address-parsing branch 6 times, most recently from da33d7d to dea2293 Compare June 9, 2023 17:58
@lahma lahma marked this pull request as ready for review June 9, 2023 18:11
@lahma
Copy link
Collaborator Author

lahma commented Jun 9, 2023

Hey @tonyqus I believe this is ready for review. I've updated the benchmarks table to reflect latest version. I'm quite confident that this PR shouldn't break much and public API only has additions of ROS version side-by-side with the string-taking version. I was a good thing to have the tests as part of PR as they discovered one case I hadn't covered!

To improve read/load speed I think it would require changing from XmlDocument to XmlReader which is a lower-level API. I think there's some easy wins in Write part though, there's some unnecessary string allocations there.

@tonyqus
Copy link
Member

tonyqus commented Jun 21, 2023

I did think of changing XmlDocument to XmlReader. But it's kind of tradeoff between convenience and performance. XmlDocument is much easier to use than XmlReader for most cases.

@tonyqus
Copy link
Member

tonyqus commented Jul 31, 2023

Can you help put CellReferenceParser.cs into SS/Util folder instead of main folder?

@lahma lahma force-pushed the manual-cell-address-parsing branch 2 times, most recently from 4c427cb to 11e5c05 Compare July 31, 2023 19:41
@lahma
Copy link
Collaborator Author

lahma commented Jul 31, 2023

Rebased and tweaks in last commit, Linux build still fails due to the fix PR still pending.

@lahma lahma force-pushed the manual-cell-address-parsing branch from 11e5c05 to 0bd93e9 Compare August 26, 2023 05:27
@tonyqus
Copy link
Member

tonyqus commented Aug 29, 2023

LGTM

@tonyqus tonyqus merged commit d87aae7 into nissl-lab:master Aug 29, 2023
2 checks passed
@lahma lahma deleted the manual-cell-address-parsing branch August 29, 2023 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants