Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s2: Don't use stack for index tables #1014

Merged
merged 3 commits into from
Oct 4, 2024
Merged

s2: Don't use stack for index tables #1014

merged 3 commits into from
Oct 4, 2024

Conversation

klauspost
Copy link
Owner

@klauspost klauspost commented Oct 3, 2024

Provide a pooled array pointer for tables instead of using stack.

Seems like Go is still unstable with large stacks, so use alternative method.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced encoding functionality with improved memory management using temporary byte arrays.
    • Added new benchmark functions for encoding and decoding blocks, supporting both single-threaded and parallel processing.
  • Bug Fixes

    • Improved error handling and assertion logic during encoding processes.
  • Documentation

    • Updated comments and documentation strings to clarify functionality and changes.
  • Tests

    • Introduced new benchmark tests to measure performance for encoding and decoding operations.

Provide a pooled array pointer for tables instead of using stack.

Seems like Go is still unstable with large stacks, so use alternative method.
Copy link

coderabbitai bot commented Oct 3, 2024

📝 Walkthrough

Walkthrough

The pull request introduces significant modifications across several files in the s2 package, focusing on enhancing encoding logic, memory management, and error handling. Key changes include the addition of new types and parameters to various encoding functions, the implementation of synchronization mechanisms for memory management, and the introduction of benchmarking functions for performance analysis. The overall structure and functionality of the encoding process have been refined to improve efficiency, clarity, and maintainability.

Changes

File Change Summary
s2/_generate/gen.go Added regTable type for register management; updated genEncodeBlockAsm and genEncodeBetterBlockAsm to include a temporary table pointer; refined logic for maximum length and offset handling; enhanced error handling in assert function; improved comments and documentation.
s2/encode.go Introduced estblockPool for managing temporary byte arrays; modified EstimateBlockSize to utilize the pool for different input sizes, enhancing memory efficiency during encoding.
s2/encode_amd64.go Added encPools and encBetterPools for managing temporary byte arrays in encoding functions; updated encodeBlock and encodeBlockBetter to use these pools, improving memory management and performance during encoding operations.
s2/encode_go.go Updated calcBlockSize and calcBlockSizeSmall functions to include an additional parameter for a temporary byte array, enhancing processing capabilities.
s2/encodeblock_amd64.go Modified multiple encoding function signatures to include a new temporary buffer parameter, enhancing their capabilities for managing temporary data during encoding operations.
s2/s2_test.go Added benchmarking functions for encoding and decoding operations, allowing for performance analysis; made minor adjustments to the TestMatchLen function for validation against a reference implementation.

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (9)
s2/s2_test.go (3)

2229-2243: LGTM: BenchmarkEncodeBlockSingle implementation

The BenchmarkEncodeBlockSingle function is well-implemented and follows the same structure as BenchmarkDecodeBlockSingle. It correctly handles file downloading, applies size limits when necessary, and uses the benchEncode function for benchmarking.

Consider extracting the common setup code (downloading, reading, and size limiting) into a separate helper function to reduce duplication between BenchmarkDecodeBlockSingle and BenchmarkEncodeBlockSingle.


2245-2251: LGTM: BenchmarkEncodeBlockParallel implementation

The BenchmarkEncodeBlockParallel function is correctly implemented, using the benchFile function for parallel encoding benchmarks. This approach is consistent with the BenchmarkDecodeBlockParallel function and promotes code reuse.

Consider refactoring BenchmarkDecodeBlockParallel and BenchmarkEncodeBlockParallel into a single function with a parameter to determine whether to benchmark encoding or decoding. This would reduce code duplication and improve maintainability. For example:

func BenchmarkBlockParallel(b *testing.B, decode bool) {
	for i := range testFiles {
		b.Run(fmt.Sprint(i, "-", testFiles[i].label), func(b *testing.B) {
			benchFile(b, i, decode)
		})
	}
}

// Usage:
func BenchmarkDecodeBlockParallel(b *testing.B) {
	BenchmarkBlockParallel(b, true)
}

func BenchmarkEncodeBlockParallel(b *testing.B) {
	BenchmarkBlockParallel(b, false)
}

2205-2251: Overall assessment: Solid implementation with room for minor improvements

The new benchmark functions (BenchmarkDecodeBlockSingle, BenchmarkDecodeBlockParallel, BenchmarkEncodeBlockSingle, and BenchmarkEncodeBlockParallel) are well-implemented and significantly enhance the test coverage for block encoding and decoding in both single-threaded and parallel contexts. They follow existing patterns in the file, promoting consistency and readability.

To further improve the code:

  1. Consider extracting the common setup code in the single-threaded benchmarks into a helper function.
  2. Explore the possibility of combining the parallel benchmark functions into a single parameterized function to reduce duplication.

These refactorings would enhance maintainability and reduce the likelihood of inconsistencies in future updates.

s2/encode_amd64.go (1)

114-121: Consider using a switch statement for improved readability.

The repeated if-else conditions for checking the length of src can be replaced with a switch statement to improve code readability and clarity.

Here's an example of how you can refactor the code using a switch statement:

func encodeBlockBetter(dst, src []byte) (d int) {
    // ... existing code ...

    switch {
    case len(src) > 4<<20:
        const sz, pool = 589824, 0
        tmp := getPooledArray(&encBetterPools[pool], sz)
        defer encBetterPools[pool].Put(tmp)
        return encodeBetterBlockAsm(dst, src, tmp)
    case len(src) >= limit12B:
        const sz, pool = 589824, 0
        tmp := getPooledArray(&encBetterPools[pool], sz)
        defer encBetterPools[pool].Put(tmp)
        return encodeBetterBlockAsm4MB(dst, src, tmp)
    case len(src) >= limit10B:
        const sz, pool = 81920, 0
        tmp := getPooledArray(&encBetterPools[pool], sz)
        defer encBetterPools[pool].Put(tmp)
        return encodeBetterBlockAsm12B(dst, src, tmp)
    case len(src) >= limit8B:
        const sz, pool = 20480, 1
        tmp := getPooledArray(&encBetterPools[pool], sz)
        defer encBetterPools[pool].Put(tmp)
        return encodeBetterBlockAsm10B(dst, src, tmp)
    case len(src) >= minNonLiteralBlockSize:
        const sz, pool = 5120, 2
        tmp := getPooledArray(&encBetterPools[pool], sz)
        defer encBetterPools[pool].Put(tmp)
        return encodeBetterBlockAsm8B(dst, src, tmp)
    default:
        return 0
    }
}

Using a switch statement with case conditions makes the code more readable and easier to understand the different length ranges and their corresponding actions.

Also applies to: 124-132, 135-143, 146-153, 158-166

s2/encodeblock_amd64.go (2)

14-14: Consider adding unit tests to verify the behavior with the new function signatures.

While the changes appear to be consistent, it would be beneficial to add unit tests specifically exercising the functions with the new signatures. This will help catch any potential issues related to the additional temporary byte array parameter and ensure the expected behavior is maintained.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161


14-14: Document the purpose and usage of the temporary byte arrays.

The temporary byte arrays have been introduced in the function signatures, but their purpose and usage are not clearly documented. It would be helpful to add comments explaining:

  1. Why the temporary byte arrays are needed.
  2. How they are expected to be used within the functions.
  3. Any assumptions or constraints related to their size and content.

Consider adding comments to each function to clarify the purpose and usage of the temporary byte arrays. For example:

// encodeBlockAsm encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4294967295 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
// The 'tmp' parameter is a temporary byte array used for intermediate storage during encoding.
// It must be a pointer to a [65536]byte array.
//
//go:noescape
func encodeBlockAsm(dst []byte, src []byte, tmp *[65536]byte) int

Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161

s2/encode.go (1)

69-86: Avoid magic numbers and use constants for pool sizes.

The pool sizes 2048 and 32768 are used as magic numbers in multiple places. Define them as constants at the package level with descriptive names to improve readability and maintainability.

For example:

const (
    smallBufferSize = 2048
    largeBufferSize = 32768
)

func EstimateBlockSize(src []byte) (d int) {
    ...
    if len(src) <= 1024 {
        const sz, pool = smallBufferSize, 0
        ...
    } else {
        const sz, pool = largeBufferSize, 1
        ...
    }
    ...
}
🧰 Tools
🪛 GitHub Check: build (1.23.x, macos-latest)

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools


[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools

🪛 GitHub Check: build (1.22.x, macos-latest)

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools


[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools

🪛 GitHub Check: fuzz-s2

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools

🪛 GitHub Check: build (1.21.x, macos-latest)

[failure] 70-70:
undefined: encPools


[failure] 74-74:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 84-84:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)


[failure] 85-85:
undefined: encPools


[failure] 70-70:
undefined: encPools


[failure] 74-74:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)

🪛 GitHub Check: build-special

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools

s2/_generate/gen.go (2)

149-156: Consider using a struct literal for improved readability.

Instead of initializing the regTable fields one by one, consider using a struct literal for improved readability:

table := regTable{r: Load(Param("tmp"), GP64()), disp: 0}

931-933: Consider using a struct literal for improved readability.

Instead of initializing the regTable fields one by one, consider using struct literals for improved readability:

lTab := regTable{r: table, disp: 0}
sTab := regTable{r: table, disp: lTableSize}  
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f73ab1e and e3b2a8d.

⛔ Files ignored due to path filters (1)
  • s2/testdata/fuzz/block-corpus-enc.zip is excluded by !**/*.zip
📒 Files selected for processing (5)
  • s2/_generate/gen.go (9 hunks)
  • s2/encode.go (3 hunks)
  • s2/encode_amd64.go (5 hunks)
  • s2/encodeblock_amd64.go (1 hunks)
  • s2/s2_test.go (1 hunks)
🧰 Additional context used
🪛 GitHub Check: build (1.23.x, macos-latest)
s2/encode.go

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools


[failure] 87-87:
too many arguments in call to calcBlockSize


[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools

🪛 GitHub Check: build (1.22.x, macos-latest)
s2/encode.go

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools


[failure] 87-87:
too many arguments in call to calcBlockSize


[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools

🪛 GitHub Check: fuzz-s2
s2/encode.go

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools


[failure] 87-87:
too many arguments in call to calcBlockSize

🪛 GitHub Check: build (1.21.x, macos-latest)
s2/encode.go

[failure] 70-70:
undefined: encPools


[failure] 74-74:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 84-84:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)


[failure] 85-85:
undefined: encPools


[failure] 87-87:
too many arguments in call to calcBlockSize


[failure] 70-70:
undefined: encPools


[failure] 74-74:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)

🪛 GitHub Check: build-special
s2/encode.go

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 80-80:
undefined: encPools


[failure] 85-85:
undefined: encPools


[failure] 87-87:
too many arguments in call to calcBlockSize

🔇 Additional comments (10)
s2/s2_test.go (2)

2205-2219: LGTM: BenchmarkDecodeBlockSingle implementation

The implementation of BenchmarkDecodeBlockSingle is well-structured and consistent with other benchmark functions in the file. It correctly handles file downloading, applies size limits when necessary, and uses the benchDecode function for actual benchmarking.


2221-2227: LGTM: BenchmarkDecodeBlockParallel implementation

The BenchmarkDecodeBlockParallel function is efficiently implemented, leveraging the existing benchFile function for parallel decoding benchmarks. This approach promotes code reuse and maintains consistency with other benchmark functions.

s2/encode_amd64.go (3)

189-197: Verify the impact of the changes on the Snappy compatibility.

The changes made to the encodeBlockSnappy function, such as the usage of sync.Pool and the modified length conditions, should be carefully reviewed to ensure that they don't break the compatibility with the Snappy format.

To verify the Snappy compatibility, you can run the following script:

#!/bin/bash

# Run Snappy compatibility tests
go test -run=TestSnappyCompatibility ./...

# Verify that the Snappy-encoded data can be decoded correctly
go test -run=TestSnappyDecoding ./...

This script will run the Snappy compatibility tests and verify that the Snappy-encoded data can be decoded correctly. If any of the tests fail, it indicates that the changes might have introduced compatibility issues with the Snappy format.

Additionally, review the Snappy format specification and compare it with the modified code to ensure that the encoding and decoding processes still adhere to the specification.

Also applies to: 200-207, 210-217, 220-227, 232-239


262-270: Ensure the changes are benchmarked and profiled.

The modifications made to the encodeBlockBetterSnappy function, including the usage of sync.Pool and the updated length conditions, should be thoroughly benchmarked and profiled to measure their impact on performance.

To benchmark and profile the changes, you can use the following commands:

# Run benchmarks for the Snappy encoding
go test -bench=BenchmarkEncodeBlockBetterSnappy ./...

# Run CPU and memory profiling
go test -cpuprofile cpu.prof -memprofile mem.prof -bench=BenchmarkEncodeBlockBetterSnappy ./...

# Analyze the CPU profile
go tool pprof cpu.prof

# Analyze the memory profile
go tool pprof mem.prof

These commands will run benchmarks specifically for the encodeBlockBetterSnappy function and generate CPU and memory profiles. Use the go tool pprof command to analyze the profiles and identify any performance bottlenecks or excessive memory allocations.

Compare the benchmark results and profile analysis with the previous implementation to ensure that the changes have a positive impact on performance and memory usage. If there are any regressions or unexpected behavior, investigate and optimize accordingly.

Also applies to: 274-282, 285-293, 296-303, 308-316


14-14: Verify the usage of sync.Pool in the codebase.

The introduction of sync.Pool for managing temporary byte arrays is a good approach to optimize memory usage. However, it's important to ensure that the usage of sync.Pool is consistent throughout the codebase and that it doesn't introduce any unexpected behavior or performance issues.

To verify the usage of sync.Pool, you can run the following script:

This script will help you locate all the instances where sync.Pool is used, as well as the specific usage of encPools and encBetterPools. Review the results to ensure that the usage is consistent and follows the intended pattern.

Also applies to: 90-90

s2/encodeblock_amd64.go (2)

14-14: Ensure the temporary byte arrays are properly initialized and have the expected lifetime.

The new function signatures include a pointer to a fixed-size byte array as the temporary storage. It's important to ensure that:

  1. The byte arrays are properly initialized before being passed to the functions.
  2. The lifetime of the byte arrays is managed correctly to avoid dangling pointers or memory leaks.

To verify the proper initialization and lifetime management of the temporary byte arrays, run the following script:

Review the results to ensure proper initialization and lifetime management of the temporary byte arrays.

Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161


14-14: Verify the impact of the function signature changes on the codebase.

The function signatures have been updated to include an additional parameter for a temporary byte array. This change is consistent across all the affected functions. However, it's crucial to ensure that all the function calls within the codebase have been updated to pass the required temporary byte array.

To verify the impact and ensure all function calls have been updated, run the following script:

If any outdated function calls are found, please update them to pass the temporary byte array argument.

Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161

✅ Verification successful

All function calls have been successfully updated with the additional tmp parameter. No issues found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash

# Description: 
# Verify that all function calls in Go files have been updated to match the new signatures.

# Expected results:
# - No occurrences of the old function signatures.
# - All occurrences of the new function signatures should have the additional argument.

# Search for each function and analyze the results
funcs=(
  "encodeBlockAsm"
  "encodeBlockAsm4MB" 
  "encodeBlockAsm12B"
  "encodeBlockAsm10B"
  "encodeBlockAsm8B"
  "encodeBetterBlockAsm"
  "encodeBetterBlockAsm4MB"
  "encodeBetterBlockAsm12B" 
  "encodeBetterBlockAsm10B"
  "encodeBetterBlockAsm8B"
  "encodeSnappyBlockAsm"
  "encodeSnappyBlockAsm64K"
  "encodeSnappyBlockAsm12B"
  "encodeSnappyBlockAsm10B"
  "encodeSnappyBlockAsm8B"
  "encodeSnappyBetterBlockAsm"
  "encodeSnappyBetterBlockAsm64K" 
  "encodeSnappyBetterBlockAsm12B"
  "encodeSnappyBetterBlockAsm10B"
  "encodeSnappyBetterBlockAsm8B"
  "calcBlockSize"
  "calcBlockSizeSmall"
)

for func in "${funcs[@]}"; do
  echo "Analyzing function: $func"
  rg --type go --ignore-case -A 5 "\b$func\b" 
  echo "---"
done

Length of output: 51334


Script:

#!/bin/bash

# Description:
# Verify that all function calls in Go files have been updated to match the new signatures with the 'tmp' parameter.

# Expected results:
# - No occurrences of the old function signatures without the 'tmp' parameter.
# - All occurrences of the updated function signatures should include the 'tmp' argument.

# Define the list of functions to verify
funcs=(
  "encodeBlockAsm"
  "encodeBlockAsm4MB" 
  "encodeBlockAsm12B"
  "encodeBlockAsm10B"
  "encodeBlockAsm8B"
  "encodeBetterBlockAsm"
  "encodeBetterBlockAsm4MB"
  "encodeBetterBlockAsm12B" 
  "encodeBetterBlockAsm10B"
  "encodeBetterBlockAsm8B"
  "encodeSnappyBlockAsm"
  "encodeSnappyBlockAsm64K"
  "encodeSnappyBlockAsm12B"
  "encodeSnappyBlockAsm10B"
  "encodeSnappyBlockAsm8B"
  "encodeSnappyBetterBlockAsm"
  "encodeSnappyBetterBlockAsm64K" 
  "encodeSnappyBetterBlockAsm12B"
  "encodeSnappyBetterBlockAsm10B"
  "encodeSnappyBetterBlockAsm8B"
  "calcBlockSize"
  "calcBlockSizeSmall"
)

# Iterate over each function and check for calls without the 'tmp' parameter
for func in "${funcs[@]}"; do
  echo "Checking for calls to function: $func without 'tmp' parameter"
  # Search for function calls missing the third argument 'tmp'
  rg --type go --ignore-case "\\b$func\\s*\\([^,]+,[^,]+\\)" -n
  echo "---"
done

# Check for any remaining instances of the old function signatures (without 'tmp')
echo "Checking for any old function signatures without 'tmp' parameter"
rg --type go --ignore-case "func\s+$\{funcs[*]\}\s*\([^,]+,[^,]+\)" -n
echo "---"

Length of output: 7465

s2/encode.go (2)

Line range hint 1-11: LGTM!

The rest of the changes in this file look good to me. The core encoding logic remains unchanged, and the new imports and sync.Pool usage are properly scoped to the EstimateBlockSize function.

Also applies to: 15-57, 88-452


69-77: ⚠️ Potential issue

Pass the temporary buffer as an argument to calcBlockSizeSmall and calcBlockSize.

The static analysis tools report that too many arguments are being passed to calcBlockSizeSmall and calcBlockSize. It seems that the temporary buffer tmp needs to be passed as an argument to these functions.

Modify the calls to calcBlockSizeSmall and calcBlockSize to pass the tmp buffer:

-		d = calcBlockSizeSmall(src, tmp)
+		d = calcBlockSizeSmall(src, tmp[:])

-		d = calcBlockSize(src, tmp)
+		d = calcBlockSize(src, tmp[:]) 

To verify that calcBlockSizeSmall and calcBlockSize accept the temporary buffer as an argument, run:

The expected output should show that both functions accept two arguments, with the second one being a slice.

Also applies to: 79-87

🧰 Tools
🪛 GitHub Check: build (1.23.x, macos-latest)

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall

🪛 GitHub Check: build (1.22.x, macos-latest)

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall

🪛 GitHub Check: fuzz-s2

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall

🪛 GitHub Check: build (1.21.x, macos-latest)

[failure] 70-70:
undefined: encPools


[failure] 74-74:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall


[failure] 70-70:
undefined: encPools


[failure] 74-74:
cannot infer T (/Users/runner/work/compress/compress/internal/race/norace.go:12:17)

🪛 GitHub Check: build-special

[failure] 70-70:
undefined: encPools


[failure] 75-75:
undefined: encPools


[failure] 77-77:
too many arguments in call to calcBlockSizeSmall

s2/_generate/gen.go (1)

84-84: Verify the function signature change in the codebase.

The second parameter to genEncodeBetterBlockAsm has been changed from 14 to 13 for the "encodeSnappyBetterBlockAsm64K" function call. Ensure that all corresponding function definitions and calls have been updated to match this new signature.

Run the following script to verify the function usage:

s2/encode_amd64.go Show resolved Hide resolved
s2/encode.go Outdated Show resolved Hide resolved
s2/encode.go Outdated Show resolved Hide resolved
s2/encode.go Outdated Show resolved Hide resolved
s2/_generate/gen.go Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (8)
s2/encode_go.go (3)

320-320: LGTM. Consider adding documentation for the new parameter.

The addition of the *[32768]byte parameter aligns with the PR objective of using a pooled array pointer for index tables instead of the stack. This change should help address issues related to stack instability in Go when handling large stacks.

Consider adding a comment explaining the purpose of this new parameter, even though it's currently unused within the function. This will help future maintainers understand the intention behind this change.


506-506: LGTM. Consider adding documentation for the new parameter.

The addition of the *[2048]byte parameter is consistent with the changes made to calcBlockSize and aligns with the PR objective. This should help optimize memory usage for smaller blocks while addressing stack instability issues.

Consider adding a comment explaining the purpose of this new parameter and why it uses a smaller array size compared to calcBlockSize. This will provide clarity for future maintainers and contributors.


320-320: Overall impact: Positive change, consider additional updates.

The changes to calcBlockSize and calcBlockSizeSmall are well-aligned with the PR objective of using pooled array pointers for index tables. This should help address stack instability issues in Go when handling large stacks.

Some considerations:

  1. Ensure that the calling code is updated to provide the new array pointers.
  2. Consider adding unit tests to verify that the functions work correctly with the new parameters.
  3. Update any relevant documentation or comments in other parts of the codebase that might reference these functions.

To fully implement the pooled array pointer strategy:

  1. Review and update any other functions in the package that might benefit from a similar approach.
  2. Consider creating a memory pool manager to efficiently allocate and reuse these arrays across multiple compression operations.
  3. Update the package documentation to explain this new memory management strategy and its benefits.

Also applies to: 506-506

s2/encode_amd64.go (1)

38-45: Ensure proper error handling when retrieving from the pool.

The code currently ignores the boolean return value from the pool's Get() method, which indicates whether the retrieved value is newly allocated. It's good practice to handle this explicitly to avoid any potential issues.

Consider adding error handling or logging when the pool returns a newly allocated value. For example:

tmp, ok := encPools[pool].Get().(*[sz]byte)
if !ok {
    log.Printf("encPools[%d]: created new buffer of size %d", pool, sz)
    tmp = &[sz]byte{}
}

This will help track when the pool is unable to reuse an existing buffer and has to allocate a new one.

Also applies to: 48-55, 58-65, 68-75, 80-87, 114-121, 124-132, 135-143, 146-153, 158-166, 189-197, 200-207, 210-217, 220-227, 232-239, 262-270, 274-282, 285-293, 296-303, 308-316

s2/encodeblock_amd64.go (2)

14-14: Consider adding tests to cover the new function signatures.

To ensure the correctness of the changes and prevent regressions, it's a good practice to add tests that exercise the new function signatures.

Create test cases that:

  • Call the modified functions with valid input and the correctly sized tmp parameter.
  • Verify that the functions behave as expected and produce the correct output.
  • Test edge cases and boundary conditions related to the tmp parameter.

Adding these tests will improve the overall test coverage and provide confidence in the changes made to the function signatures.

Do you want me to generate some example test cases for the modified functions?

Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161


14-14: Document the purpose and usage of the tmp parameter.

Update the function documentation comments to clearly explain the purpose and usage of the new tmp parameter. Provide guidelines on how to allocate and initialize the tmp array correctly.

Include information such as:

  • The role of the tmp parameter in the encoding process.
  • The required size of the tmp array for each function.
  • Any assumptions or constraints related to the tmp parameter.

Clear documentation will help developers understand how to use these functions correctly and avoid potential issues related to incorrect usage of the tmp parameter.

Apply this diff to update the documentation comments:

// encodeBlockAsm encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4294967295 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 65536.
//
//go:noescape
func encodeBlockAsm(dst []byte, src []byte, tmp *[65536]byte) int

// encodeBlockAsm4MB encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4194304 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 65536.
//
//go:noescape
func encodeBlockAsm4MB(dst []byte, src []byte, tmp *[65536]byte) int

// encodeBlockAsm12B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 16383 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 16384.
//
//go:noescape
func encodeBlockAsm12B(dst []byte, src []byte, tmp *[16384]byte) int

// encodeBlockAsm10B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4095 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 4096.
//
//go:noescape
func encodeBlockAsm10B(dst []byte, src []byte, tmp *[4096]byte) int

// encodeBlockAsm8B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 511 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 1024.
//
//go:noescape
func encodeBlockAsm8B(dst []byte, src []byte, tmp *[1024]byte) int

// encodeBetterBlockAsm encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4294967295 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 589824.
//
//go:noescape
func encodeBetterBlockAsm(dst []byte, src []byte, tmp *[589824]byte) int

// encodeBetterBlockAsm4MB encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4194304 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 589824.
//
//go:noescape
func encodeBetterBlockAsm4MB(dst []byte, src []byte, tmp *[589824]byte) int

// encodeBetterBlockAsm12B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 16383 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 81920.
//
//go:noescape
func encodeBetterBlockAsm12B(dst []byte, src []byte, tmp *[81920]byte) int

// encodeBetterBlockAsm10B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4095 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 20480.
//
//go:noescape
func encodeBetterBlockAsm10B(dst []byte, src []byte, tmp *[20480]byte) int

// encodeBetterBlockAsm8B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 511 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 5120.
//
//go:noescape
func encodeBetterBlockAsm8B(dst []byte, src []byte, tmp *[5120]byte) int

// encodeSnappyBlockAsm encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4294967295 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 65536.
//
//go:noescape
func encodeSnappyBlockAsm(dst []byte, src []byte, tmp *[65536]byte) int

// encodeSnappyBlockAsm64K encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 65535 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 65536.
//
//go:noescape
func encodeSnappyBlockAsm64K(dst []byte, src []byte, tmp *[65536]byte) int

// encodeSnappyBlockAsm12B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 16383 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 16384.
//
//go:noescape
func encodeSnappyBlockAsm12B(dst []byte, src []byte, tmp *[16384]byte) int

// encodeSnappyBlockAsm10B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4095 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 4096.
//
//go:noescape
func encodeSnappyBlockAsm10B(dst []byte, src []byte, tmp *[4096]byte) int

// encodeSnappyBlockAsm8B encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 511 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter is a pointer to a temporary byte array used during encoding.
+// It must be a pointer to a byte array of size 1024.
//
//go:noescape
func encodeSnappyBlockAsm8B(dst []byte, src []byte, tmp *[1024]byte) int

// encodeSnappyBetterBlockAsm encodes a non-empty src to a guaranteed-large-enough dst.
// Maximum input 4294967295 bytes.
// It assumes that the varint-encoded length of the decompressed bytes has already been written.
+// The `tmp` parameter


Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161

</blockquote></details>
<details>
<summary>s2/encode.go (1)</summary><blockquote>

`69-87`: **Consider adding comments to explain the pool sizes and their usage.**

The changes introduce two different pool sizes (`2048` and `32768`) for handling small and larger input sizes respectively. To improve code readability and maintainability, consider adding comments to explain the rationale behind these pool sizes and their corresponding usage.

Here's an example of how you can add comments to clarify the pool sizes:

```go
// For small inputs (less than or equal to 1024 bytes), use a pool with byte arrays of size 2048.
// This pool is optimized for small input sizes to reduce memory allocation overhead.
const sz, pool = 2048, 0
tmp, ok := estblockPool[pool].Get().(*[sz]byte)
// ...

// For larger inputs (greater than 1024 bytes), use a pool with byte arrays of size 32768.
// This pool is optimized for larger input sizes to provide better performance.
const sz, pool = 32768, 1
tmp, ok := estblockPool[pool].Get().(*[sz]byte)
// ...

Adding clear comments will help future maintainers understand the purpose and usage of these pool sizes.

s2/_generate/gen.go (1)

874-883: Ensure the temporary table size is sufficient.

The genEncodeBetterBlockAsm function now takes an additional tmp *[tableSize]byte parameter to use a temporary table allocated on the heap instead of the stack.

However, the table size calculation and checks seem to be duplicated from the genEncodeBlockAsm function. Consider refactoring this to avoid duplication and ensure the size is consistent and sufficient for all cases.

Apply this diff to remove the duplicated table size calculation and checks:

-	var lTableSize = 4 * (1 << lTableBits)
-	var sTableSize = 4 * (1 << sTableBits)
-	tableSize := lTableSize + sTableSize
-
-	// Memzero needs at least 128 bytes.
-	if tableSize < 128 {
-		panic("tableSize must be at least 128 bytes")
-	}
-	arrPtr := fmt.Sprintf(", tmp *[%d]byte", tableSize)
-

Then update the function signature to reuse the tableSize variable from genEncodeBlockAsm:

-func (o options) genEncodeBetterBlockAsm(name string, lTableBits, sTableBits, skipLog, lHashBytes, maxLen int) {
+func (o options) genEncodeBetterBlockAsm(name string, lTableBits, sTableBits, skipLog, lHashBytes, maxLen int, tableSize int) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f73ab1e and 87a8119.

⛔ Files ignored due to path filters (1)
  • s2/testdata/fuzz/block-corpus-enc.zip is excluded by !**/*.zip
📒 Files selected for processing (6)
  • s2/_generate/gen.go (9 hunks)
  • s2/encode.go (3 hunks)
  • s2/encode_amd64.go (5 hunks)
  • s2/encode_go.go (2 hunks)
  • s2/encodeblock_amd64.go (1 hunks)
  • s2/s2_test.go (1 hunks)
🔇 Additional comments (12)
s2/encode_go.go (1)

320-320: Summary: Approved changes with minor suggestions for improvement.

The modifications to calcBlockSize and calcBlockSizeSmall are well-implemented and align with the PR objective of using pooled array pointers for index tables. These changes should help address stack instability issues in Go when handling large stacks, without altering the core compression logic.

Suggestions for improvement:

  1. Add comments explaining the purpose of the new parameters in both functions.
  2. Update relevant documentation and calling code to reflect these changes.
  3. Consider implementing a memory pool manager for efficient array allocation and reuse.

Overall, this is a positive change that should improve the package's performance and stability.

Also applies to: 506-506

s2/encode_amd64.go (1)

38-45: Verify that the assembly functions are thread-safe.

Since the buffers retrieved from the pools are shared across goroutines, it's crucial to ensure that the assembly functions (encodeBlockAsm, encodeBlockBetterAsm, etc.) are thread-safe and don't introduce any data races when accessing the shared buffers.

To verify the thread-safety of the assembly functions, you can use the Go race detector by running tests with the -race flag:

If the race detector reports any issues, carefully review the assembly code and make necessary synchronization changes.

Additionally, consider adding a comment to each assembly function documenting its thread-safety guarantees. This will help maintainers understand the concurrency semantics of the code.

For example:

// encodeBlockAsm is thread-safe and can be called concurrently with the same dst and src buffers.
// The tmp buffer must not be shared between concurrent calls.
func encodeBlockAsm(dst, src, tmp []byte) int {
    // ...
}

Also applies to: 48-55, 58-65, 68-75, 80-87, 114-121, 124-132, 135-143, 146-153, 158-166, 189-197, 200-207, 210-217, 220-227, 232-239, 262-270, 274-282, 285-293, 296-303, 308-316

s2/encodeblock_amd64.go (2)

14-14: Ensure the tmp parameter is properly initialized and has the correct size.

When calling these functions, make sure to allocate and pass a byte array of the appropriate size as the tmp parameter. Using an incorrectly sized array may lead to buffer overflows or unexpected behavior.

Double-check that the size of the tmp array matches the size specified in the function signature for each function.

Run the following script to verify the initialization of the tmp parameter:

#!/bin/bash

# Description:
# Verify that the `tmp` parameter is properly initialized with the correct size when calling the modified functions.

# Test: Search for the function usages and check the `tmp` parameter initialization.
# Expect: The `tmp` parameter should be initialized with a byte array of the size specified in the function signature.
functions=(
  "encodeBlockAsm:65536"
  "encodeBlockAsm4MB:65536"
  "encodeBlockAsm12B:16384"
  "encodeBlockAsm10B:4096"
  "encodeBlockAsm8B:1024"
  "encodeBetterBlockAsm:589824"
  "encodeBetterBlockAsm4MB:589824"
  "encodeBetterBlockAsm12B:81920"
  "encodeBetterBlockAsm10B:20480"
  "encodeBetterBlockAsm8B:5120"
  "encodeSnappyBlockAsm:65536"
  "encodeSnappyBlockAsm64K:65536"
  "encodeSnappyBlockAsm12B:16384"
  "encodeSnappyBlockAsm10B:4096"
  "encodeSnappyBlockAsm8B:1024"
  "encodeSnappyBetterBlockAsm:589824"
  "encodeSnappyBetterBlockAsm64K:294912"
  "encodeSnappyBetterBlockAsm12B:81920"
  "encodeSnappyBetterBlockAsm10B:20480"
  "encodeSnappyBetterBlockAsm8B:5120"
  "calcBlockSize:32768"
  "calcBlockSizeSmall:2048"
)

for func_size in "${functions[@]}"; do
  func=${func_size%%:*}
  size=${func_size#*:}
  echo "Checking initialization of 'tmp' for $func..."
  rg --type go -A 5 "\b$func\b\(" | awk -v size="$size" '$0 ~ "&\\[[0-9]+\\]byte" { 
    match($0, "&\\[([0-9]+)\\]byte", arr)
    if (arr[1] != size) {
      print "Error: Incorrect size for 'tmp' parameter. Expected: " size ", Found: " arr[1]
    }
  }'
done

If any mismatches are found, update the tmp parameter initialization to use the correct size.

Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161


14-14: Verify the impact of the function signature changes in the codebase.

The function signatures have been updated to include an additional tmp parameter, which is a pointer to a byte array of varying sizes. This change is consistent across all the affected functions.

However, it's crucial to ensure that all the callers of these functions have been updated to pass the required tmp parameter. Failing to do so may result in compilation errors or unexpected behavior.

Run the following script to verify the usage of these functions:

If any inconsistencies are found, the callers must be updated to match the new function signatures.

Also applies to: 21-21, 28-28, 35-35, 42-42, 49-49, 56-56, 63-63, 70-70, 77-77, 84-84, 91-91, 98-98, 105-105, 112-112, 119-119, 126-126, 133-133, 140-140, 147-147, 154-154, 161-161

✅ Verification successful

All function callers correctly pass the tmp parameter.

No inconsistencies were found in the function calls.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash

# Description: 
# Verify that all callers of the modified functions have been updated to pass the `tmp` parameter.

# Test: Search for the function usages. 
# Expect: All occurrences should pass the `tmp` parameter.
functions=(
  "encodeBlockAsm"
  "encodeBlockAsm4MB"
  "encodeBlockAsm12B"
  "encodeBlockAsm10B"
  "encodeBlockAsm8B"
  "encodeBetterBlockAsm"
  "encodeBetterBlockAsm4MB"
  "encodeBetterBlockAsm12B"
  "encodeBetterBlockAsm10B"
  "encodeBetterBlockAsm8B"
  "encodeSnappyBlockAsm"
  "encodeSnappyBlockAsm64K"
  "encodeSnappyBlockAsm12B"
  "encodeSnappyBlockAsm10B"
  "encodeSnappyBlockAsm8B"
  "encodeSnappyBetterBlockAsm"
  "encodeSnappyBetterBlockAsm64K"
  "encodeSnappyBetterBlockAsm12B"
  "encodeSnappyBetterBlockAsm10B"
  "encodeSnappyBetterBlockAsm8B"
  "calcBlockSize"
  "calcBlockSizeSmall"
)

for func in "${functions[@]}"; do
  echo "Checking usages of $func..."
  rg --type go -A 5 "\b$func\b\("
done

Length of output: 11421

s2/encode.go (3)

69-87: Evaluate the impact of the changes on the overall performance.

The modifications in the EstimateBlockSize function introduce the usage of estblockPool and change the way temporary byte arrays are handled. While these changes aim to improve memory efficiency, it's important to evaluate their impact on the overall performance of the encoding process.

To assess the performance impact, consider running benchmarks and profiling the code before and after the changes. Here's a suggested approach:

  1. Create benchmark tests for the EstimateBlockSize function with different input sizes (small and large).
  2. Run the benchmarks on the original code and record the results.
  3. Apply the changes to the EstimateBlockSize function.
  4. Run the benchmarks again on the modified code and compare the results with the original benchmarks.
  5. Analyze the benchmark results to determine if there are any significant performance improvements or regressions.
  6. If necessary, use profiling tools to identify any performance bottlenecks or excessive memory allocations.

By conducting a thorough performance evaluation, you can ensure that the changes indeed improve the efficiency of the encoding process without introducing any unexpected performance issues.


58-59: Ensure proper synchronization and avoid contention.

The introduction of estblockPool as a slice of sync.Pool is a good approach to manage temporary byte arrays. However, it's crucial to ensure that the access to the pool is properly synchronized to avoid contention and race conditions.

To verify the proper usage of estblockPool, run the following script:

#!/bin/bash

# Check if estblockPool is accessed within a mutex lock
if ! rg --type go 'estblockPool\[.*\]\.Get\(\)' -A 10 | rg -q 'mu(tex)?\.Lock\(\)'; then
  echo "estblockPool is accessed without proper synchronization."
else
  echo "estblockPool access is properly synchronized."
fi

If the pool access is not properly synchronized, consider adding necessary mutex locks or using atomic operations to ensure thread safety.


69-77: Ensure proper usage of temporary byte arrays.

The changes introduce the usage of temporary byte arrays fetched from estblockPool to optimize memory usage during block size estimation. It's important to ensure that these temporary arrays are properly initialized, used, and returned to the pool to avoid memory leaks or unexpected behavior.

To verify the proper usage of temporary byte arrays, run the following script:

#!/bin/bash

# Check if the temporary byte arrays are properly returned to the pool
if ! rg --type go 'tmp, ok := estblockPool\[pool\]\.Get\(\)' -A 10 | rg -q 'defer estblockPool\[pool\]\.Put\(tmp\)'; then
  echo "Temporary byte arrays are not properly returned to the pool."
else
  echo "Temporary byte arrays are properly returned to the pool."
fi

# Check if the temporary byte arrays are properly initialized
if ! rg --type go 'tmp, ok := estblockPool\[pool\]\.Get\(\)' -A 10 | rg -q 'tmp = &\[.*\]byte{}'; then
  echo "Temporary byte arrays are not properly initialized when not found in the pool."
else
  echo "Temporary byte arrays are properly initialized when not found in the pool."
fi

If the temporary arrays are not properly managed, make sure to return them to the pool using defer statements and initialize them correctly when they are not found in the pool.

Also applies to: 79-87

s2/s2_test.go (3)

2221-2227: LGTM!

The BenchmarkDecodeBlockParallel looks good. It uses the existing benchFile function to run parallel decoding benchmarks for each test file.


2245-2251: LGTM!

The BenchmarkEncodeBlockParallel looks good. It uses the existing benchFile function to run parallel encoding benchmarks for each test file.


2213-2215: Verify the impact of limiting the data size on benchmarks.

Limiting the data size for benchmarks with sizeLimit may affect the benchmark results and not provide a comprehensive performance assessment. Ensure that the benchmarks cover a representative range of data sizes to get meaningful performance measurements.

To verify the impact, run the benchmarks with different sizeLimit values or without any limit and compare the results:

s2/_generate/gen.go (2)

2942-2942: Verify the output margin is sufficient for all cases.

The output margin is being set to 8 bytes. However, it's important to ensure this margin is sufficient to handle all possible output scenarios without risking buffer overflows.

Perform a thorough analysis of the maximum possible output sizes across all compression algorithms and configurations:


84-84: Verify the maximum skip value is correctly adjusted for Snappy compatibility.

The maximum skip value is being set to 0 for Snappy compatibility. Please ensure this is the intended behavior and that it does not negatively impact performance for Snappy use cases.

To verify, search the codebase for references to maxSkip and confirm the intended usage:

✅ Verification successful

Verified: The maxSkip value is correctly set to 0 for Snappy compatibility and does not negatively impact performance for Snappy use cases.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if setting maxSkip to 0 for Snappy is intentional and valid.

# 1. Search for where maxSkip is set and used. 
# Expect: Should only be set to 0 for Snappy and used correctly.
rg -A 5 -B 5 'maxSkip'

# 2. Search for Snappy-related logic that may be impacted by maxSkip. 
# Expect: Logic should handle maxSkip=0 case properly.
rg -A 5 -B 5 'snappy'

Length of output: 142770

s2/encode_amd64.go Show resolved Hide resolved
s2/encode_amd64.go Show resolved Hide resolved
s2/encode.go Show resolved Hide resolved
s2/s2_test.go Show resolved Hide resolved
s2/s2_test.go Show resolved Hide resolved
s2/_generate/gen.go Show resolved Hide resolved
@klauspost
Copy link
Owner Author

klauspost commented Oct 4, 2024

12 hours fuzz testing passed, execs: 938238817

@klauspost klauspost merged commit dbd6c38 into master Oct 4, 2024
19 checks passed
@klauspost klauspost deleted the s2-dont-use-stack branch October 4, 2024 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant