Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tempodb integer divide by zero error #2167

Closed
kroksys opened this issue Mar 3, 2023 · 9 comments · Fixed by #2333
Closed

tempodb integer divide by zero error #2167

kroksys opened this issue Mar 3, 2023 · 9 comments · Fixed by #2333
Labels
good first issue Good for newcomers

Comments

@kroksys
Copy link

kroksys commented Mar 3, 2023

I'm just letting you know about potential problem. I did some experiments and somehow I broke tempo storage connected with minio (without replication). I have no idea how to reproduce it, but from stack trace it seems like it needs to check for 0 value before dividing it somewhere. I'm obviously using docker-compose setup and after I removed volume and started everything over again the issue is gone.

stack-tempo-1    | panic: runtime error: integer divide by zero
stack-tempo-1    | 
stack-tempo-1    | goroutine 632 [running]:
stack-tempo-1    | github.com/grafana/tempo/tempodb.(*timeWindowBlockSelector).windowForTime(...)
stack-tempo-1    |      /drone/src/tempodb/compaction_block_selector.go:182
stack-tempo-1    | github.com/grafana/tempo/tempodb.newTimeWindowBlockSelector({0x4000284cf0, 0x1, 0xd?}, 0x0, 0x0, 0x0, 0x2, 0x4)
stack-tempo-1    |      /drone/src/tempodb/compaction_block_selector.go:58 +0x604
stack-tempo-1    | github.com/grafana/tempo/tempodb.(*readerWriter).doCompaction(0x40003006c0)
stack-tempo-1    |      /drone/src/tempodb/compactor.go:92 +0xf4
stack-tempo-1    | github.com/grafana/tempo/tempodb.(*readerWriter).compactionLoop(0x40003006c0)
stack-tempo-1    |      /drone/src/tempodb/compactor.go:74 +0x64
stack-tempo-1    | created by github.com/grafana/tempo/tempodb.(*readerWriter).EnableCompaction
stack-tempo-1    |      /drone/src/tempodb/tempodb.go:407 +0x1d0
stack-tempo-1 exited with code 2
@joe-elliott
Copy link
Member

The divide by zero is here:

return t.Unix() / int64(twbs.MaxCompactionRange/time.Second)

and should only happen if compaction_window is set to 0:

compactor:
    compaction:
        compaction_window: 5m

We should error more gracefully and log something meaningful in this case instead of just panicing.

@joe-elliott joe-elliott added the good first issue Good for newcomers label Mar 3, 2023
@kroksys
Copy link
Author

kroksys commented Mar 3, 2023

In my case compactor is set to null. I guess googling and copy/paste don't always work out. But after full reset it works even with this config.
If compactor: null then all objects inside that config will be 0, nil or "", etc. ?

compactor: null

distributor:
  receivers:
    otlp:
      protocols:
        http:
        grpc:

overrides:
  ingestion_burst_size_bytes: 10_000_000
  max_traces_per_user: 1000000

server:
  http_listen_port: 3200

storage:
  trace:
    backend: s3
    s3:
      bucket: tempo
      endpoint: minio:9000
      access_key: minio
      secret_key: 9reatsecr3t
      insecure: true
    pool:
      queue_depth: 2000
    wal:
      path: /var/tempo/wal

@joe-elliott
Copy link
Member

If compactor: null then all objects inside that config will be 0, nil or "", etc. ?

Honestly, I don't know if that would take Tempo defaults are be set to all 0s.

@ie-pham ie-pham self-assigned this Mar 7, 2023
@mghildiy
Copy link
Contributor

mghildiy commented Apr 1, 2023

Can I take this issue?

@ie-pham ie-pham removed their assignment Apr 3, 2023
@ie-pham
Copy link
Contributor

ie-pham commented Apr 3, 2023

@mghildiy go ahead! :)

@mghildiy
Copy link
Contributor

mghildiy commented Apr 4, 2023

The divide by zero is here:

return t.Unix() / int64(twbs.MaxCompactionRange/time.Second)

and should only happen if compaction_window is set to 0:

compactor:
    compaction:
        compaction_window: 5m

We should error more gracefully and log something meaningful in this case instead of just panicing.

I see 2 divisions going on here. I need to take care of both, I guess.

@mghildiy
Copy link
Contributor

mghildiy commented Apr 4, 2023

For testing, a unit test would suffice I guess.
Also, is it possible to simulate 'distributed' setup locally to perform some sort of user testing?

@mapno
Copy link
Member

mapno commented Apr 4, 2023

There are a few example set-ups in this folder https:/grafana/tempo/tree/main/example

@mghildiy
Copy link
Contributor

I have PR here:

#2333

But its build is going on since ages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants