-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support seed brokers' hostname with multiple addresses #877
Support seed brokers' hostname with multiple addresses #877
Conversation
This seems like a risky change, and I haven't seen other requests for this functionality. Would you consider using rdkafka-ruby instead? I'm wary of introducing risky changes to ruby-kafka unless there's broad demand. |
What kind of risk do you think there is? I think this change makes ruby-kafka easier to use without any risk. |
lib/kafka/cluster.rb
Outdated
@stale = false | ||
|
||
return cluster_info | ||
Resolv.getaddresses(node.hostname).select { |ip| ip =~ Resolv::IPv4::Regex }.shuffle.each do |ip| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about IPv6 addresses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think currently ruby-kafka doesn't support IPv6 addresses because Kafka::BrokerUri.parse
can't parse IPv6 addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that a hostname, a DNS record, could have a IPv6 address, and I didn’t think of such a case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to have this IPv4 regex at all? Why not simply:
Resolv.getaddresses(node.hostname).shuffle.each do |ip|
If an IPv6 address causes an exception further down the stack, better to solve that further down the stack, right? I don't actually follow what the issue would be with Kafka::BrokerUri.parse
-- that's further up the stack and would be handling the text hostname before passing it down to this method.
If someone wanted to pass an IPv6 address in natively, the format should be like Kafka.new("kafka://[::1]:9092")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the code as follows:
(@resolve_seed_brokers ? Resolv.getaddresses(node.hostname).shuffle : [node.hostname]).each
I think currently ruby-kafka doesn't support IPv6 addresses because Kafka::BrokerUri.parse can't parse IPv6 addresses.
This is my misunderstanding. Although I don't remember well, I might have tried Kafka.new("kafka://::1:9092")
...
Is there any reason to have this IPv4 regex at all?
At first, I thought it was meaningless to try IPv6 addresses because both IPv4 addresses and IPv6 addresses might belong to the same broker. However we cannot know whether they belong to the same broker or not, and the DNS record might have only IPv6 addresses.
I also checked the behavior of librdkafka and found that it used both IPv4 addresses and IPv6 addresses even if they belonged to the same broker.
cf. https:/edenhill/librdkafka/blob/v1.6.1/src/rdaddr.c#L162-L232
For the above reasons, I decided not to exclude IPv6 addresses.
If this new behavior can be made opt-in, i.e. disabled by default, then I'd be open to it. |
OK, I'll think about another option or something like that. |
.circleci/config.yml
Outdated
@@ -7,6 +7,7 @@ jobs: | |||
LOG_LEVEL: DEBUG | |||
steps: | |||
- checkout | |||
- run: sudo apt-get update && sudo apt-get install -y cmake # For installing snappy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line relevant to the multiple address support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line was needed to run the tests. I've removed it because the current master branch includes the change.
I strongly agree with this PR! In my experience it is a common practice to use a single hostname that expands to multiple IP address records, and in this case (i.e. using Kafka in the first place because HA is important) it would be helpful to explicitly iterate over the list of records. I think the PR itself has two issues: 1) unrelated change in circleci.yml and 2) IPv4 limitation for some reason, and would ask if @abicky would remove those lines if that would make this eligible to land the PR?
Thank you! |
Looking forward to this, it matches our use case: we run a kubernetes cluster with the Kafka brokers exposed through a single DNS entry. Whenever one of the brokers go down (rare but can happen due to our k8s setup), there's a chance our services will try to reach the IP of the unavailable broker due to local DNS caching, and when it can't connect to that particular IP the service will just crash, thinking there are no brokers available @abicky , are you still working on this? Let me know if I can be of help |
14e9990
to
99fcabb
Compare
By default, the Java implementation of a Kafka client can use each returned IP address from a hostname. cf. https://kafka.apache.org/documentation/#client.dns.lookup librdkafka also does: > If host resolves to multiple addresses librdkafka will > round-robin the addresses for each connection attempt. cf. https:/edenhill/librdkafka/blob/v1.5.3/INTRODUCTION.md#brokers Such a feature helps us to manage seed brokers' IP addresses easily, so this commit implements it.
99fcabb
to
2cf27e4
Compare
(@resolve_seed_brokers ? Resolv.getaddresses(node.hostname).shuffle : [node.hostname]).each do |hostname_or_ip| | ||
node_info = node.to_s | ||
node_info << " (#{hostname_or_ip})" if node.hostname != hostname_or_ip | ||
@logger.info "Fetching cluster metadata from #{node_info}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@logger.info "Fetching cluster metadata from #{node_info}" | |
@logger.info "Fetching cluster metadata from #{node.hostname}#{node_info}" |
This would show both the hostname and the resolved IP address that's being attempted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node.hostname
is not necessary because node.to_s
has the information:
node_info = node.to_s # e.g. "kafka://localhost:9092"
# e.g. "kafka://localhost:9092" => "kafka://localhost:9092 (127.0.0.1)"
node_info << " (#{hostname_or_ip})" if node.hostname != hostname_or_ip
Here is an example output:
/Users/arabiki/ghq/src/github.com/zendesk/ruby-kafka/lib/kafka/cluster.rb:454:in `fetch_cluster_info': Could not connect to any of the seed brokers: (Kafka::ConnectionError)
- kafka://localhost0:9092 (10.0.0.1): Operation timed out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right on, that looks good already!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 looks good!
Can you also add an entry to the changelog?
a5ce0d7
to
4602e1f
Compare
4602e1f
to
ef7c21c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
By default, the Java implementation of a Kafka client can use each returned IP address from a hostname.
cf. https://kafka.apache.org/documentation/#client.dns.lookup
librdkafka also does:
cf. https:/edenhill/librdkafka/blob/v1.5.3/INTRODUCTION.md#brokers
Such a feature helps us to manage seed brokers' IP addresses easily, so this PR implements it.