Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topic autocreation stalls producer for 10 seconds #114

Closed
AnatolyRugalev opened this issue Dec 6, 2021 · 8 comments
Closed

Topic autocreation stalls producer for 10 seconds #114

AnatolyRugalev opened this issue Dec 6, 2021 · 8 comments

Comments

@AnatolyRugalev
Copy link

AnatolyRugalev commented Dec 6, 2021

We are using topic autocreation with franz-go and we experience noticeable delay when producing first message to non-existing topic with ProduceSync method. After some debugging I found that reducing MetadataMinAge reduces this delay as well. Is that expected behavior?

Relevant client configuration:

opts := []kgo.Opt{
  kgo.ProduceRequestTimeout(2 * time.Second),
  kgo.RetryTimeout(6 * time.Second),
  kgo.MetadataMaxAge(2 * time.Minute),
  kgo.AllowAutoTopicCreation(),
}

franz-go version: v1.2.3

Click to expand log file
time=2021-12-06T20:01:04+01:00 level=info msg=producing to a new topic for the first time, fetching metadata to learn its partitions[topic local_8800ab87-b192-4b8c-b755-7a0d37c343b2] reporter=job-producer topic=local_8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:04+01:00 level=info msg=immediate metadata update triggered[why forced load due to unknown produce topic] reporter=job-producer why=forced load due to unknown produce topic
time=2021-12-06T20:01:04+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-producer
time=2021-12-06T20:01:04+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-producer
time=2021-12-06T20:01:04+01:00 level=debug msg=issuing api versions request[broker seed 0 version 3] broker=seed 0 reporter=job-producer version=3
time=2021-12-06T20:01:04+01:00 level=debug msg=wrote ApiVersions v3[broker seed 0 bytes_written 31 write_wait 165.368µs time_to_write 39.661µs err <nil>] broker=seed 0 bytes_written=31 err=<nil> reporter=job-producer time_to_write=39.661µs write_wait=165.368µs
time=2021-12-06T20:01:04+01:00 level=debug msg=read ApiVersions v3[broker seed 0 bytes_read 404 read_wait 460.982µs time_to_read 2.669408ms err <nil>] broker=seed 0 bytes_read=404 err=<nil> read_wait=460.982µs reporter=job-producer time_to_read=2.669408ms
time=2021-12-06T20:01:04+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-producer
time=2021-12-06T20:01:04+01:00 level=debug msg=wrote Metadata v9[broker seed 0 bytes_written 67 write_wait 5.06139ms time_to_write 30.492µs err <nil>] broker=seed 0 bytes_written=67 err=<nil> reporter=job-producer time_to_write=30.492µs write_wait=5.06139ms
time=2021-12-06T20:01:04+01:00 level=debug msg=read Metadata v9[broker seed 0 bytes_read 119 read_wait 410.821µs time_to_read 17.599878ms err <nil>] broker=seed 0 bytes_read=119 err=<nil> read_wait=410.821µs reporter=job-producer time_to_read=17.599878ms
time=2021-12-06T20:01:04+01:00 level=info msg=new topic metadata wait failed, retrying wait[topic local_8800ab87-b192-4b8c-b755-7a0d37c343b2 err LEADER_NOT_AVAILABLE: There is no leader for this topic-partition as we are in the middle of a leadership election.] err=LEADER_NOT_AVAILABLE: There is no leader for this topic-partition as we are in the middle of a leadership election. reporter=job-producer topic=local_8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:04+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-producer
time=2021-12-06T20:01:04+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-producer
time=2021-12-06T20:01:04+01:00 level=debug msg=issuing api versions request[broker 1 version 3] broker=1 reporter=job-producer version=3
time=2021-12-06T20:01:04+01:00 level=debug msg=wrote ApiVersions v3[broker 1 bytes_written 31 write_wait 106.234µs time_to_write 28.198µs err <nil>] broker=1 bytes_written=31 err=<nil> reporter=job-producer time_to_write=28.198µs write_wait=106.234µs
time=2021-12-06T20:01:04+01:00 level=debug msg=read ApiVersions v3[broker 1 bytes_read 404 read_wait 475.948µs time_to_read 3.548014ms err <nil>] broker=1 bytes_read=404 err=<nil> read_wait=475.948µs reporter=job-producer time_to_read=3.548014ms
time=2021-12-06T20:01:04+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-producer
time=2021-12-06T20:01:04+01:00 level=debug msg=wrote Metadata v9[broker 1 bytes_written 67 write_wait 5.91561ms time_to_write 32.116µs err <nil>] broker=1 bytes_written=67 err=<nil> reporter=job-producer time_to_write=32.116µs write_wait=5.91561ms
time=2021-12-06T20:01:04+01:00 level=debug msg=read Metadata v9[broker 1 bytes_read 119 read_wait 503.213µs time_to_read 14.927694ms err <nil>] broker=1 bytes_read=119 err=<nil> read_wait=503.213µs reporter=job-producer time_to_read=14.927694ms
time=2021-12-06T20:01:04+01:00 level=info msg=new topic metadata wait failed, retrying wait[topic local_8800ab87-b192-4b8c-b755-7a0d37c343b2 err LEADER_NOT_AVAILABLE: There is no leader for this topic-partition as we are in the middle of a leadership election.] err=LEADER_NOT_AVAILABLE: There is no leader for this topic-partition as we are in the middle of a leadership election. reporter=job-producer topic=local_8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:04+01:00 level=debug msg=wrote Metadata v9[broker seed 0 bytes_written 67 write_wait 267.552µs time_to_write 34.995µs err <nil>] broker=seed 0 bytes_written=67 err=<nil> reporter=job-producer time_to_write=34.995µs write_wait=267.552µs
time=2021-12-06T20:01:04+01:00 level=debug msg=read Metadata v9[broker seed 0 bytes_read 119 read_wait 526.747µs time_to_read 12.997286ms err <nil>] broker=seed 0 bytes_read=119 err=<nil> read_wait=526.747µs reporter=job-producer time_to_read=12.997286ms
time=2021-12-06T20:01:04+01:00 level=info msg=metadata update triggered[why re-updating due to inner errors: LEADER_NOT_AVAILABLE{local_8800ab87-b192-4b8c-b755-7a0d37c343b2}] reporter=job-producer why=re-updating due to inner errors: LEADER_NOT_AVAILABLE{local_8800ab87-b192-4b8c-b755-7a0d37c343b2}
time=2021-12-06T20:01:04+01:00 level=info msg=new topic metadata wait failed, retrying wait[topic local_8800ab87-b192-4b8c-b755-7a0d37c343b2 err LEADER_NOT_AVAILABLE: There is no leader for this topic-partition as we are in the middle of a leadership election.] err=LEADER_NOT_AVAILABLE: There is no leader for this topic-partition as we are in the middle of a leadership election. reporter=job-producer topic=local_8800ab87-b192-4b8c-b755-7a0d37c343b2

============== 10 second gap here

time=2021-12-06T20:01:14+01:00 level=debug msg=wrote Metadata v9[broker 1 bytes_written 67 write_wait 266.433µs time_to_write 50.592µs err <nil>] broker=1 bytes_written=67 err=<nil> reporter=job-producer time_to_write=50.592µs write_wait=266.433µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read Metadata v9[broker 1 bytes_read 197 read_wait 311.16733ms time_to_read 25.563µs err <nil>] broker=1 bytes_read=197 err=<nil> read_wait=311.16733ms reporter=job-producer time_to_read=25.563µs
time=2021-12-06T20:01:14+01:00 level=info msg=done waiting for metadata for new topic[topic local_8800ab87-b192-4b8c-b755-7a0d37c343b2] reporter=job-producer topic=local_8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=initializing producer id[] reporter=job-producer
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote InitProducerID v4[broker seed 0 bytes_written 34 write_wait 173.641µs time_to_write 34.41µs err <nil>] broker=seed 0 bytes_written=34 err=<nil> reporter=job-producer time_to_write=34.41µs write_wait=173.641µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read InitProducerID v4[broker seed 0 bytes_read 26 read_wait 483.243µs time_to_read 1.804594ms err <nil>] broker=seed 0 bytes_read=26 err=<nil> read_wait=483.243µs reporter=job-producer time_to_read=1.804594ms
time=2021-12-06T20:01:14+01:00 level=info msg=producer id initialization success[id 7 epoch 0] epoch=0 id=7 reporter=job-producer
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-producer
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-producer
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-producer
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote Produce v8[broker 1 bytes_written 244 write_wait 981.393µs time_to_write 25.516µs err <nil>] broker=1 bytes_written=244 err=<nil> reporter=job-producer time_to_write=25.516µs write_wait=981.393µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read Produce v8[broker 1 bytes_read 100 read_wait 472.121µs time_to_read 3.200482ms err <nil>] broker=1 bytes_read=100 err=<nil> read_wait=472.121µs reporter=job-producer time_to_read=3.200482ms
time=2021-12-06T20:01:14+01:00 level=debug msg=produced[broker 1 to local_8800ab87-b192-4b8c-b755-7a0d37c343b2[2{0=>1}]] broker=1 reporter=job-producer to=local_8800ab87-b192-4b8c-b755-7a0d37c343b2[2{0=>1}]
time=2021-12-06T20:01:14+01:00 level=info msg=beginning autocommit loop[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=metadata update triggered[why client initialization] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 why=client initialization
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=issuing api versions request[broker seed 0 version 3] broker=seed 0 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 version=3
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote ApiVersions v3[broker seed 0 bytes_written 31 write_wait 62.705µs time_to_write 21.894µs err <nil>] broker=seed 0 bytes_written=31 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=21.894µs write_wait=62.705µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read ApiVersions v3[broker seed 0 bytes_read 404 read_wait 360.535µs time_to_read 2.555572ms err <nil>] broker=seed 0 bytes_read=404 err=<nil> read_wait=360.535µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=2.555572ms
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote Metadata v9[broker seed 0 bytes_written 67 write_wait 4.299755ms time_to_write 24.078µs err <nil>] broker=seed 0 bytes_written=67 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=24.078µs write_wait=4.299755ms
time=2021-12-06T20:01:14+01:00 level=debug msg=read Metadata v9[broker seed 0 bytes_read 197 read_wait 346.522µs time_to_read 1.566851ms err <nil>] broker=seed 0 bytes_read=197 err=<nil> read_wait=346.522µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=1.566851ms
time=2021-12-06T20:01:14+01:00 level=info msg=beginning to manage the group lifecycle[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=joining group[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=issuing api versions request[broker 1 version 3] broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 version=3
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote ApiVersions v3[broker 1 bytes_written 31 write_wait 57.763µs time_to_write 22.921µs err <nil>] broker=1 bytes_written=31 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=22.921µs write_wait=57.763µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read ApiVersions v3[broker 1 bytes_read 404 read_wait 384.176µs time_to_read 1.676039ms err <nil>] broker=1 bytes_read=404 err=<nil> read_wait=384.176µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=1.676039ms
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote FindCoordinator v3[broker 1 bytes_written 63 write_wait 3.565361ms time_to_write 22.191µs err <nil>] broker=1 bytes_written=63 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=22.191µs write_wait=3.565361ms
time=2021-12-06T20:01:14+01:00 level=debug msg=read FindCoordinator v3[broker 1 bytes_read 39 read_wait 422.746µs time_to_read 1.442076ms err <nil>] broker=1 bytes_read=39 err=<nil> read_wait=422.746µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=1.442076ms
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote JoinGroup v7[broker 1 bytes_written 173 write_wait 757.116µs time_to_write 22.229µs err <nil>] broker=1 bytes_written=173 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=22.229µs write_wait=757.116µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read JoinGroup v7[broker 1 bytes_read 248 read_wait 404.855µs time_to_read 5.129187ms err <nil>] broker=1 bytes_read=248 err=<nil> read_wait=404.855µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=5.129187ms
time=2021-12-06T20:01:14+01:00 level=info msg=joined, balancing group[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 member_id test-155e8289-3318-48fb-9e49-ef373a0d877b instance_id 0xc00049b5a0 generation 1 balance_protocol cooperative-sticky leader true] balance_protocol=cooperative-sticky generation=1 group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 instance_id=0xc00049b5a0 leader=true member_id=test-155e8289-3318-48fb-9e49-ef373a0d877b reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=balancing group as leader[] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=balance group member[id test-155e8289-3318-48fb-9e49-ef373a0d877b instance_id test interests interested topics: [local_8800ab87-b192-4b8c-b755-7a0d37c343b2], previously owned: ] id=test-155e8289-3318-48fb-9e49-ef373a0d877b instance_id=test interests=interested topics: [local_8800ab87-b192-4b8c-b755-7a0d37c343b2], previously owned:  reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=balanced[plan test-155e8289-3318-48fb-9e49-ef373a0d877b{local_8800ab87-b192-4b8c-b755-7a0d37c343b2[0 1 2]}] plan=test-155e8289-3318-48fb-9e49-ef373a0d877b{local_8800ab87-b192-4b8c-b755-7a0d37c343b2[0 1 2]} reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=syncing[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 protocol_type consumer protocol cooperative-sticky] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 protocol=cooperative-sticky protocol_type=consumer reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote SyncGroup v5[broker 1 bytes_written 256 write_wait 106.141µs time_to_write 23.118µs err <nil>] broker=1 bytes_written=256 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=23.118µs write_wait=106.141µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read SyncGroup v5[broker 1 bytes_read 115 read_wait 410.639µs time_to_read 2.564327ms err <nil>] broker=1 bytes_read=115 err=<nil> read_wait=410.639µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=2.564327ms
time=2021-12-06T20:01:14+01:00 level=info msg=synced[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 assigned local_8800ab87-b192-4b8c-b755-7a0d37c343b2[0 1 2]] assigned=local_8800ab87-b192-4b8c-b755-7a0d37c343b2[0 1 2] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=new group session begun[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 added map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]] lost map[]] added=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 lost=map[] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=beginning heartbeat loop[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=fetching offsets for added partitions[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 added map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]]] added=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=entering OnAssigned[with map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]]] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 with=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]]
time=2021-12-06T20:01:14+01:00 level=debug msg=sharded request[destinations [1]] destinations=[1] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote OffsetFetch v7[broker 1 bytes_written 121 write_wait 196.254µs time_to_write 24.921µs err <nil>] broker=1 bytes_written=121 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=24.921µs write_wait=196.254µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read OffsetFetch v7[broker 1 bytes_read 122 read_wait 411.192µs time_to_read 1.533869ms err <nil>] broker=1 bytes_read=122 err=<nil> read_wait=411.192µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=1.533869ms
time=2021-12-06T20:01:14+01:00 level=info msg=assigning partitions[why newly fetched offsets for group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 how assigning everything new, keeping current assignment input map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0} 2:{-2.-1~0}]]] how=assigning everything new, keeping current assignment input=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0} 2:{-2.-1~0}]] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 why=newly fetched offsets for group local_8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=assign requires loading offsets[] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=offsets to load broker[broker 1 load {map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0} 2:{-2.-1~0}]] map[]}] broker=1 load={map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0} 2:{-2.-1~0}]] map[]} reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote ListOffsets v5[broker 1 bytes_written 122 write_wait 109.965µs time_to_write 27.414µs err <nil>] broker=1 bytes_written=122 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=27.414µs write_wait=109.965µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read ListOffsets v5[broker 1 bytes_read 142 read_wait 437.378µs time_to_read 1.667824ms err <nil>] broker=1 bytes_read=142 err=<nil> read_wait=437.378µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=1.667824ms
time=2021-12-06T20:01:14+01:00 level=debug msg=handled list results[broker 1 using map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{0 0} 1:{0 0} 2:{0 0}]] reloading map[]] broker=1 reloading=map[] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 using=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{0 0} 1:{0 0} 2:{0 0}]]
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote Fetch v12[broker 1 bytes_written 191 write_wait 845.563µs time_to_write 26.46µs err <nil>] broker=1 bytes_written=191 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=26.46µs write_wait=845.563µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read Fetch v12[broker 1 bytes_read 337 read_wait 352.325µs time_to_read 3.331182ms err <nil>] broker=1 bytes_read=337 err=<nil> read_wait=352.325µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=3.331182ms
time=2021-12-06T20:01:14+01:00 level=debug msg=updated uncommitted[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 to local_8800ab87-b192-4b8c-b755-7a0d37c343b2[], local_8800ab87-b192-4b8c-b755-7a0d37c343b2[], local_8800ab87-b192-4b8c-b755-7a0d37c343b2[2{0=>0=>1}]] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 to=local_8800ab87-b192-4b8c-b755-7a0d37c343b2[], local_8800ab87-b192-4b8c-b755-7a0d37c343b2[], local_8800ab87-b192-4b8c-b755-7a0d37c343b2[2{0=>0=>1}]
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote Fetch v12[broker 1 bytes_written 125 write_wait 148.535µs time_to_write 25.122µs err <nil>] broker=1 bytes_written=125 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=25.122µs write_wait=148.535µs
time=2021-12-06T20:01:14+01:00 level=info msg=metadata update triggered[why client initialization] reporter=job-finder why=client initialization
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=issuing api versions request[broker seed 0 version 3] broker=seed 0 reporter=job-finder version=3
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote ApiVersions v3[broker seed 0 bytes_written 31 write_wait 49.032µs time_to_write 21.183µs err <nil>] broker=seed 0 bytes_written=31 err=<nil> reporter=job-finder time_to_write=21.183µs write_wait=49.032µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read ApiVersions v3[broker seed 0 bytes_read 404 read_wait 342.037µs time_to_read 1.926107ms err <nil>] broker=seed 0 bytes_read=404 err=<nil> read_wait=342.037µs reporter=job-finder time_to_read=1.926107ms
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker seed 0] addr=127.0.0.1:9092 broker=seed 0 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote Metadata v9[broker seed 0 bytes_written 67 write_wait 3.613423ms time_to_write 24.216µs err <nil>] broker=seed 0 bytes_written=67 err=<nil> reporter=job-finder time_to_write=24.216µs write_wait=3.613423ms
time=2021-12-06T20:01:14+01:00 level=debug msg=read Metadata v9[broker seed 0 bytes_read 197 read_wait 362.395µs time_to_read 2.098122ms err <nil>] broker=seed 0 bytes_read=197 err=<nil> read_wait=362.395µs reporter=job-finder time_to_read=2.098122ms
time=2021-12-06T20:01:14+01:00 level=info msg=assigning partitions[why new assignments from direct consumer how assigning everything new, keeping current assignment input map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0} 2:{0.-1~0}]]] how=assigning everything new, keeping current assignment input=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0} 2:{0.-1~0}]] reporter=job-finder why=new assignments from direct consumer
time=2021-12-06T20:01:14+01:00 level=debug msg=assign requires loading offsets[] reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=offsets to load broker[broker 1 load {map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0}]] map[]}] broker=1 load={map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[0:{-2.-1~0} 1:{-2.-1~0}]] map[]} reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=issuing api versions request[broker 1 version 3] broker=1 reporter=job-finder version=3
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote ApiVersions v3[broker 1 bytes_written 31 write_wait 47.074µs time_to_write 22.195µs err <nil>] broker=1 bytes_written=31 err=<nil> reporter=job-finder time_to_write=22.195µs write_wait=47.074µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read ApiVersions v3[broker 1 bytes_read 404 read_wait 328.305µs time_to_read 2.897592ms err <nil>] broker=1 bytes_read=404 err=<nil> read_wait=328.305µs reporter=job-finder time_to_read=2.897592ms
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote Fetch v12[broker 1 bytes_written 125 write_wait 4.608826ms time_to_write 24.381µs err <nil>] broker=1 bytes_written=125 err=<nil> reporter=job-finder time_to_write=24.381µs write_wait=4.608826ms
time=2021-12-06T20:01:14+01:00 level=debug msg=opening connection to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=connection opened to broker[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=connection initialized successfully[addr 127.0.0.1:9092 broker 1] addr=127.0.0.1:9092 broker=1 reporter=job-finder
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote ListOffsets v5[broker 1 bytes_written 106 write_wait 5.441759ms time_to_write 33.574µs err <nil>] broker=1 bytes_written=106 err=<nil> reporter=job-finder time_to_write=33.574µs write_wait=5.441759ms
time=2021-12-06T20:01:14+01:00 level=debug msg=read Fetch v12[broker 1 bytes_read 263 read_wait 422.134µs time_to_read 2.382216ms err <nil>] broker=1 bytes_read=263 err=<nil> read_wait=422.134µs reporter=job-finder time_to_read=2.382216ms
time=2021-12-06T20:01:14+01:00 level=debug msg=read ListOffsets v5[broker 1 bytes_read 0 read_wait 487.09µs time_to_read 1.792011ms err client closed] broker=1 bytes_read=0 err=client closed read_wait=487.09µs reporter=job-finder time_to_read=1.792011ms
time=2021-12-06T20:01:14+01:00 level=info msg=assigning partitions[why invalidating all assignments in LeaveGroup how unassigning everything input map[]] how=unassigning everything input=map[] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 why=invalidating all assignments in LeaveGroup
time=2021-12-06T20:01:14+01:00 level=debug msg=read Fetch v12[broker 1 bytes_read 0 read_wait 393.041µs time_to_read 16.354307ms err context canceled] broker=1 bytes_read=0 err=context canceled read_wait=393.041µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=16.354307ms
time=2021-12-06T20:01:14+01:00 level=info msg=heartbeat errored[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 err context canceled] err=context canceled group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=debug msg=read from broker errored, killing connection[addr 127.0.0.1:9092 broker 1 successful_reads 1 err context canceled] addr=127.0.0.1:9092 broker=1 err=context canceled reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 successful_reads=1
time=2021-12-06T20:01:14+01:00 level=debug msg=entering OnRevoked[with map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]]] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 with=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:[0 1 2]]
time=2021-12-06T20:01:14+01:00 level=debug msg=in CommitOffsetsSync[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 with map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[2:{0 0}]]] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 with=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[2:{0 0}]]
time=2021-12-06T20:01:14+01:00 level=debug msg=issuing commit[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 uncommitted map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[2:{0 0}]]] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 uncommitted=map[local_8800ab87-b192-4b8c-b755-7a0d37c343b2:map[2:{0 0}]]
time=2021-12-06T20:01:14+01:00 level=debug msg=wrote OffsetCommit v8[broker 1 bytes_written 218 write_wait 151.1µs time_to_write 29.123µs err <nil>] broker=1 bytes_written=218 err=<nil> reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_write=29.123µs write_wait=151.1µs
time=2021-12-06T20:01:14+01:00 level=debug msg=read OffsetCommit v8[broker 1 bytes_read 67 read_wait 470.251µs time_to_read 3.46822ms err <nil>] broker=1 bytes_read=67 err=<nil> read_wait=470.251µs reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 time_to_read=3.46822ms
time=2021-12-06T20:01:14+01:00 level=debug msg=updated committed[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2 to local_8800ab87-b192-4b8c-b755-7a0d37c343b2[2{0=>0}]] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 to=local_8800ab87-b192-4b8c-b755-7a0d37c343b2[2{0=>0}]
time=2021-12-06T20:01:14+01:00 level=debug msg=left CommitOffsetsSync[group local_8800ab87-b192-4b8c-b755-7a0d37c343b2] group=local_8800ab87-b192-4b8c-b755-7a0d37c343b2 reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2
time=2021-12-06T20:01:14+01:00 level=info msg=assigning partitions[why clearing assignment at end of group management session how unassigning everything input map[]] how=unassigning everything input=map[] reporter=job-consumer-8800ab87-b192-4b8c-b755-7a0d37c343b2 why=clearing assignment at end of group management session
time=2021-12-06T20:01:14+01:00 level=info msg=flushing[] reporter=job-producer
time=2021-12-06T20:01:14+01:00 level=debug msg=flushed[] reporter=job-producer
@twmb
Copy link
Owner

twmb commented Dec 7, 2021

I looked into this and will change some internal code to hopefully improve this situation.

Currently, an unknown topic triggers an immediate update. When using metadata requests to create topics, Kafka does not assign a leader immediately, so the topic / partitions have no leader. This is considered a "failure" internally, and since this metadata request was from an immediate trigger, the client immediately retries again. This retry happens twice (three tries total), all very fast back to back, and Kafka has not assigned a leader in this time (<1s, per your logs). After three failures, the client triggers itself to load metadata slowly (metadata min age). This is the 10s pause you see, after which it requests again, and by that time Kafka has assigned a leader.

I'm going to change the code to not immediately re-issue a metadata request. The new code will only allow one extra retry when the metadata request has a true error (connection cut), since this case already retries 3x internally before getting to this point (it's retries all the way down). When the request is non-errored but has internal errors (no leader), I will allow retries up to a total of 8 requests, with a 250ms sleep between them. This gives immediate-triggers 2s to be successful, rather than the prior basically 0s.

As well, I'm going to change the default metadata min age to 5s. So, if Kafka still has not stabilized on a leader within 2s, the client will fall back to waiting 5s between attempts rather than 10

All told this will increase the number of metadata requests on per-partition failures. Immediate triggers only happen on per-partition errors, so ideally the increased number of metadata requests only happens when things are going south, and it may help recover in the client sooner.

@twmb
Copy link
Owner

twmb commented Dec 7, 2021

Should be closed with 8325ba7; if it is not, I'm not sure there's much more to do within the client to help this case.

@twmb twmb closed this as completed Dec 7, 2021
@caioaao
Copy link

caioaao commented May 23, 2022

(commenting in this because it seems like it's related) I think this introduced an issue when you start a consumer and immediately after create the topic.it seems like it's sleeping for ~2.5s upon calling PollFetches

@twmb
Copy link
Owner

twmb commented May 24, 2022

^That sounds somewhat related, but I don't think there's much to be improved here. The client doesn't backoff 2.5s, it will retry a few times and then wait 2.5s. It's possible that the partitions don't have a leader immediately?

@caioaao
Copy link

caioaao commented May 24, 2022

Maybe the retries are too fast? Or maybe the 2.5s wait could be replaced by an exponential back-off. wdyt?

@twmb
Copy link
Owner

twmb commented May 24, 2022

This retries every 250ms 8 times in a row. What specific problem are you running into?

@caioaao
Copy link

caioaao commented May 24, 2022

I wrote a snippet to illustrate what I think the issue is:

func TestFranzPollHanging(t *testing.T) {
	topicID := uuid.NewString()

	// Create client used to consume
	subCl, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"),
		kgo.ConsumeTopics(topicID),
	)
	if err != nil {
		t.Fatal(err)
	}

	// start polling for fetches
	wg := new(sync.WaitGroup)
	wg.Add(1)
	go func() {
		fmt.Println(">fetch", time.Now())
		for {
			cctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
			defer cancel()

			rs := subCl.PollFetches(cctx)
			if rs.Err() == nil {
				fmt.Println("<fetch", time.Now())
				wg.Done()
				break
			}
			fmt.Println(" fetch", rs.Err())
		}
	}()

	// create topic
	admCl := kadm.NewClient(subCl)
	admCl.CreateTopics(context.Background(), 1, 1, nil, topicID)

	// Producing with new client
	fmt.Println(">produce", time.Now())
	pubCl, err := kgo.NewClient(kgo.SeedBrokers("localhost:9092"))
	if err != nil {
		t.Fatal(err)
	}

	pubCl.ProduceSync(context.Background(), &kgo.Record{Value: []byte("waiting"), Topic: topicID})
	fmt.Println("<produce", time.Now())

	wg.Wait()
}

This outputs the following:

=== RUN   TestFranz
>fetch 2022-05-24 18:40:32.063644477 -0300 -03 m=+0.001146096
>produce 2022-05-24 18:40:32.185176625 -0300 -03 m=+0.122678175
<produce 2022-05-24 18:40:32.213508301 -0300 -03 m=+0.151011737
<fetch 2022-05-24 18:40:34.572476576 -0300 -03 m=+2.509978615
--- PASS: TestFranz (2.51s)

As you can see, the producer runs virtually instantly after topic creation and 100ms after the fetch was called, but the fetch hangs for 2.5s. I'd expect it to return before, probably after the 250ms and during the first retry. My hypothesis is the retry logic is probably broken - maybe some caching issue?

@twmb
Copy link
Owner

twmb commented May 25, 2022

This is benign behavior that would only be encountered if trying to create a client and consume immediately after topic creation. However, the fix is small so I've done so here: f981856

I still expect the client to take a bit if you, for example, use regex consuming and create a topic. The client only periodically updates metadata.

As well, if your topic doesn't exist and you try to consume it and then you only create the topic a minute later, it will still take the min metadata wait to discover it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants