
[BUG] Coredns cannot resolve node hostname #1087

Closed
ccjjxx99 opened this issue Dec 3, 2022 · 23 comments
Labels: kind/bug

ccjjxx99 commented Dec 3, 2022

What happened:
I have deployed metrics-server on the cloud node, and it keeps reporting the following errors:

E1203 12:49:23.192743       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-219:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-219"
E1203 12:49:23.192760       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-221:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-221"
E1203 12:49:23.192766       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-224:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-224"
E1203 12:49:23.192769       1 scraper.go:139] "Failed to scrape node" err="Get \"https://center:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="center"
E1203 12:49:23.192746       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-222:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-222"
E1203 12:49:23.192801       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-218:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-218"
E1203 12:49:23.192802       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-223:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-223"
E1203 12:49:23.193890       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-220:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-220"
E1203 12:49:23.193916       1 scraper.go:139] "Failed to scrape node" err="Get \"https://dell2015:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="dell2015"
E1203 12:49:23.193923       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-225:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-225"
I1203 12:49:23.445026       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I1203 12:49:33.445641       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"

When I enabled logging in coredns and checked the logs, I found that coredns could not resolve the hostnames:

[INFO] 10.244.0.21:57363 - 10071 "AAAA IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000101561s
[INFO] 10.244.0.21:49140 - 15804 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000197454s
[INFO] 10.244.0.21:52591 - 40725 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000170565s
[INFO] 10.244.0.21:46343 - 27268 "A IN node-222.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000144675s
[INFO] 10.244.0.21:53605 - 21188 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00017991s
[INFO] 10.244.0.21:56493 - 14043 "A IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000145095s
[INFO] 10.244.0.21:57767 - 20232 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000104592s
[INFO] 10.244.0.21:55905 - 46769 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000100891s
[INFO] 10.244.0.21:38400 - 21470 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000154556s
[INFO] 10.244.0.21:42241 - 28115 "AAAA IN node-223.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00014793s
[INFO] 10.244.0.21:46009 - 15495 "AAAA IN node-225.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000150244s
[INFO] 10.244.0.21:43989 - 42034 "A IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000086667s
[INFO] 10.244.0.21:37473 - 36930 "AAAA IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000160677s
[INFO] 10.244.0.21:38626 - 9816 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00009503s
[INFO] 10.244.0.21:57427 - 45436 "A IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000181907s
[INFO] 10.244.0.21:42602 - 2082 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00021916s
[INFO] 10.244.0.21:48372 - 64152 "AAAA IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000215355s
[INFO] 10.244.0.21:38931 - 17188 "A IN node-220.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000149272s
[INFO] 10.244.0.21:47704 - 5818 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000100259s
[INFO] 10.244.0.21:43007 - 5861 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00007362s
[INFO] 10.244.0.21:56270 - 62782 "A IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000167426s

In fact, I have already mounted the yurt-tunnel-nodes configmap into coredns:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  ...
  name: coredns
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - args:
            - '-conf'
            - /etc/coredns/Corefile
          image: 'registry.aliyuncs.com/google_containers/coredns:1.8.4'
          ...
          volumeMounts:
            - mountPath: /etc/edge       # here
              name: edge
              readOnly: true
            - mountPath: /etc/coredns
              name: config-volume
              readOnly: true
      ...
      volumes:
        - configMap:
            defaultMode: 420
            name: yurt-tunnel-nodes     # here
          name: edge
        - configMap:
            defaultMode: 420
            items:
              - key: Corefile
                path: Corefile
            name: coredns
          name: config-volume
  ...

And I added the hosts plugin for that file to the configmap of coredns:

---
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        log {
        }
        ready
        hosts /etc/edge/tunnel-nodes {    # here
            reload 300ms
            fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods verified
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  resourceVersion: '1363115'

And the yurt-tunnel-nodes configmap is as shown below, where 10.107.2.246 is the ClusterIP of x-tunnel-server-internal-svc:

---
apiVersion: v1
data:
  tunnel-nodes: "10.107.2.246\tdell2015\n10.107.2.246\tnode-218\n10.107.2.246\tnode-219\n10.107.2.246\tnode-220\n10.107.2.246\tnode-221\n10.107.2.246\tnode-222\n10.107.2.246\tnode-223\n10.107.2.246\tnode-224\n10.107.2.246\tnode-225\n172.26.146.181\tcenter"
kind: ConfigMap
metadata:
  annotations: {}
  name: yurt-tunnel-nodes
  namespace: kube-system
  resourceVersion: '1296168'

I think all of this configuration is correct, so why does coredns return NXDOMAIN when resolving the node hostnames?

What you expected to happen:
Coredns can resolve the node hostname to the ClusterIP of x-tunnel-server-internal-svc.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • OpenYurt version: 1.1
  • Kubernetes version (use kubectl version): 1.22.8
  • OS (e.g: cat /etc/os-release): Ubuntu 22.04.1 LTS
  • Kernel (e.g. uname -a): 5.15.0-46-generic
  • Install tools: Manually Setup
  • Others:

others

/kind bug

ccjjxx99 added the kind/bug label on Dec 3, 2022
ccjjxx99 changed the title from "[BUG] coredns cannot resolve hostname" to "[BUG] Coredns cannot resolve node hostname" on Dec 3, 2022
@rambohe-ch (Member)

@ccjjxx99 Thank you for posting this issue. Since coredns cannot resolve the node name, I am afraid you may not be using the correct coredns instance, so please check the following points (a sketch of these checks follows the list):

  1. Is the coredns component running as a DaemonSet?
  2. Is the DNS resolution request from metrics-server forwarded to the coredns instance on the same node?
  3. Use the dig command to check whether the specific coredns instance can resolve the node name:

    dig @{coredns pod ip} {node hostname}
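
A sketch of how these checks might be run, assuming a kubeadm-style cluster where the coredns pods carry the label k8s-app=kube-dns (pod IPs and hostnames below are placeholders):

kubectl -n kube-system get ds coredns                          # 1. is coredns running as a DaemonSet?
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide    # which coredns pod runs on which node
kubectl -n kube-system get pods -o wide | grep metrics-server  # 2. which node/IP the metrics-server pod uses
dig @{coredns pod ip} {node hostname}                          # 3. query one coredns instance directly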

ccjjxx99 (Author) commented Dec 4, 2022

@rambohe-ch Thanks for your response. I checked these three points:

  1. coredns runs as a DaemonSet. I have 10 servers: 1 in the cloud and 9 at the edge, and there is a coredns pod on each server.
  2. I don't know how to find out which pod the service forwarded the request to; could you please tell me the specific method? Thanks a lot.
    By the way, I followed the tutorial and adjusted my kube-proxy by commenting out the clientConnection.kubeconfig property in its config.conf. And I created two nodepools: one contains the cloud node, and the other contains the 9 edge nodes. I guess this ensures that DNS requests on the cloud node will only be sent to the cloud coredns, am I right?
  3. The pod IP of coredns on the cloud server is 10.244.0.24. I ran the dig command and it shows the following:
root@center:~# dig @10.244.0.24 node-218

; <<>> DiG 9.18.1-1ubuntu1.2-Ubuntu <<>> @10.244.0.24 node-218
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22864
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: e7a68989a2ce9691 (echoed)
;; QUESTION SECTION:
;node-218.                      IN      A

;; ANSWER SECTION:
node-218.               30      IN      A       10.107.2.246

;; Query time: 0 msec
;; SERVER: 10.244.0.24#53(10.244.0.24) (UDP)
;; WHEN: Sun Dec 04 19:37:55 CST 2022
;; MSG SIZE  rcvd: 73

then I check the log of coredns:

[INFO] 10.244.0.1:48941 - 13681 "A IN node-218. udp 49 false 1232" NOERROR qr,aa,rd 50 0.000116782s
[INFO] 10.244.0.21:40499 - 39570 "AAAA IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000161284s
[INFO] 10.244.0.21:47982 - 49678 "A IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000077668s
[INFO] 10.244.0.21:38718 - 46903 "AAAA IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000153572s

I can see that the first entry shows NOERROR when resolving node-218., but the other three entries, from 10.244.0.21 (the metrics-server pod IP), return NXDOMAIN when resolving names like node-218.kube-system.svc.cluster.local.

@rambohe-ch (Member)

2. I don't know how to find out which pod the service forwarded the request to; could you please tell me the specific method? Thanks a lot.

@ccjjxx99 Please check the contents of the /etc/resolv.conf file in the metrics-server pod's mount namespace. The steps are as follows (a combined sketch is shown after the list):

  1. Get the pid of the metrics-server container: docker inspect {metrics-server container id} | grep Pid
  2. Enter the mount namespace of the metrics-server container: nsenter -t {pid} -m
  3. Check the contents of /etc/resolv.conf: cat /etc/resolv.conf
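
A combined sketch of these steps; the container id is a placeholder. If the image is distroless (no shell and no cat), reading the file through /proc on the host is an alternative that needs no binaries inside the image:

PID=$(docker inspect --format '{{ .State.Pid }}' {metrics-server container id})
nsenter -t "$PID" -m cat /etc/resolv.conf      # works only if cat exists inside the image
cat /proc/"$PID"/root/etc/resolv.conf          # host-side alternative via the container's root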

By the way, as the output of dig @10.244.0.24 node-218 shows, coredns (10.244.0.24) can resolve the node hostname, so you should make sure that the correct coredns instance is being used by metrics-server.

@rambohe-ch (Member)

By the way, the coredns logs also show that coredns can resolve the node-218. DNS request; it is expected that the other types of DNS requests cannot be resolved.

ccjjxx99 (Author) commented Dec 5, 2022

@rambohe-ch I tried various methods, but all of them failed to get into the metrics-server container.

root@center:~# docker ps | grep metrics-server
9ee17aefd09f   1c655933b9c5                                         "/metrics-server --c…"   42 hours ago    Up 42 hours                                                                                                k8s_metrics-server_metrics-server-667fb6bffc-qvbrv_kube-system_4dd07250-6e41-4951-85d6-1c7c7471cf25_0
3fcb46a053a9   registry.aliyuncs.com/google_containers/pause:3.5    "/pause"                 42 hours ago    Up 42 hours                                                                                                k8s_POD_metrics-server-667fb6bffc-qvbrv_kube-system_4dd07250-6e41-4951-85d6-1c7c7471cf25_0
root@center:~# docker inspect 3fcb46a053a9 | grep Pid
            "Pid": 1013438,
            "PidMode": "",
            "PidsLimit": null,
root@center:~# nsenter -t 1013438 -m
nsenter: failed to execute /bin/bash: No such file or directory
root@center:~# sudo nsenter --target 1013438 --mount --uts --ipc --net --pid 
nsenter: failed to execute /bin/bash: No such file or directory
root@center:~# docker exec -it 3fcb46a053a9 /bin/bash
OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
root@center:~# docker exec -it 3fcb46a053a9 /bin/sh
OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown
root@center:~# docker inspect 9ee17aefd09f | grep Pid
            "Pid": 1013554,
            "PidMode": "",
            "PidsLimit": null,
root@center:~# nsenter -t 1013554 -m
nsenter: failed to execute /bin/bash: No such file or directory

But I think this is not the problem.
The DNS requests from metrics-server are indeed sent to the correct coredns pod on the cloud node, because the logs of the cloud node's coredns show many requests from 10.244.0.21, which is exactly the metrics-server pod IP.
The coredns pods on the edge nodes did not receive any requests from metrics-server.

[INFO] 10.244.0.21:60906 - 52465 "AAAA IN node-220.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000196151s
[INFO] 10.244.0.21:45663 - 51091 "AAAA IN node-222.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000079281s
[INFO] 10.244.0.21:50449 - 38732 "AAAA IN center.kube-system.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000216412s
[INFO] 10.244.0.21:34369 - 54770 "A IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000096898s
[INFO] 10.244.0.21:45431 - 16859 "A IN node-221.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000161652s
[INFO] 10.244.0.21:58024 - 15135 "AAAA IN node-223.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00014005s
[INFO] 10.244.0.21:35876 - 48978 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000158383s
[INFO] 10.244.0.21:52847 - 53038 "AAAA IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000161755s
[INFO] 10.244.0.21:39896 - 20432 "AAAA IN node-219.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000161184s
[INFO] 10.244.0.21:35551 - 47097 "AAAA IN node-220.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00012329s
[INFO] 10.244.0.21:35156 - 8781 "A IN node-224.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000156926s
[INFO] 10.244.0.21:59233 - 43706 "A IN node-221.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000080524s
[INFO] 10.244.0.21:35153 - 38501 "A IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000136949s
[INFO] 10.244.0.21:54386 - 16631 "A IN node-224.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000192448s
[INFO] 10.244.0.21:52536 - 39481 "A IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000142629s
[INFO] 10.244.0.21:42583 - 39265 "A IN node-220.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000160651s
[INFO] 10.244.0.21:50282 - 9887 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000136408s
[INFO] 10.244.0.21:60724 - 33562 "A IN node-222.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000149509s
[INFO] 10.244.0.21:45892 - 6711 "A IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000149003s
[INFO] 10.244.0.21:34698 - 28509 "AAAA IN center.svc.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000186134s
[INFO] 10.244.0.21:43259 - 50351 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000142315s
[INFO] 10.244.0.21:59030 - 26753 "AAAA IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00014849s
[INFO] 10.244.0.21:37536 - 56581 "AAAA IN node-219.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00009081s
[INFO] 10.244.0.21:49904 - 64894 "AAAA IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000142386s
[INFO] 10.244.0.21:40334 - 49490 "A IN node-223.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000160979s
[INFO] 10.244.0.21:45590 - 43230 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00014566s
[INFO] 10.244.0.21:38694 - 36789 "A IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000144038s
[INFO] 10.244.0.21:43403 - 2708 "A IN node-222.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000134989s
[INFO] 10.244.0.21:57337 - 63336 "AAAA IN node-220.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00014718s
[INFO] 10.244.0.21:37807 - 56240 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000093111s
[INFO] 10.244.0.21:57894 - 41834 "AAAA IN node-222.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000104832s
[INFO] 10.244.0.21:49952 - 3065 "AAAA IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000141598s

I think the question now is why coredns can resolve node-218 but cannot resolve node-218.kube-system.svc.cluster.local.

rambohe-ch (Member) commented Dec 5, 2022

I tried various methods, but all of them failed to get into the metrics-server container.

@ccjjxx99 Because there is no /bin/bash in the metrics-server image, maybe you can run docker inspect {metrics-server container} and find the file on the host that is mounted as /etc/resolv.conf, as sketched below.
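
A sketch of that approach; Docker records the generated resolv.conf path on the pod's sandbox (pause) container, and the container id here is a placeholder:

RESOLV=$(docker inspect --format '{{ .ResolvConfPath }}' {pause container id of the metrics-server pod})
cat "$RESOLV"      # the file the containers of this pod see as /etc/resolv.conf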

I think the question now is why coredns can resolve node-218 but cannot resolve node-218.kube-system.svc.cluster.local.

You can check the contents of the yurt-tunnel-nodes configmap: only hostname\t{clusterIP/nodeIP} DNS records are loaded into coredns, so node-218.kube-system.svc.cluster.local. cannot be resolved.

ccjjxx99 (Author) commented Dec 5, 2022

@rambohe-ch Thank you for your patient answers. Now I can see the resolv.conf:

root@center:~# docker ps | grep metrics
24c03ace5cb1   5787924fe1d8                                         "/metrics-server --c…"   9 minutes ago   Up 9 minutes                                                                                               k8s_metrics-server_metrics-server-ccc64b89b-bbhnx_kube-system_a4699585-70b4-409a-a2bc-0a76e3c24159_0
95326c677726   registry.aliyuncs.com/google_containers/pause:3.5    "/pause"                 9 minutes ago   Up 9 minutes                                                                                               k8s_POD_metrics-server-ccc64b89b-bbhnx_kube-system_a4699585-70b4-409a-a2bc-0a76e3c24159_0
root@center:~# docker inspect 95326c677726 | grep resolv.conf
        "ResolvConfPath": "/var/lib/docker/containers/95326c677726d9615bbb8b4aa6b60c26189a03e5f8348dedd430c8481c4d4abe/resolv.conf",
root@center:~# cat /var/lib/docker/containers/95326c677726d9615bbb8b4aa6b60c26189a03e5f8348dedd430c8481c4d4abe/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local 
options ndots:5

10.96.0.10 is my kube-dns service IP, so we cannot tell from the resolv.conf file which pod the request is forwarded to.

Referring to the article https://cloud.tencent.com/developer/article/1669860, I now think the problem is that the DNS queries metrics-server sends to coredns are for node-218.kube-system.svc.cluster.local. and node-218.svc.cluster.local. rather than node-218., so coredns cannot resolve them. So maybe it's a bug in metrics-server, not coredns?
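
For what it's worth, those suffixed queries come from the resolver's search list rather than from metrics-server itself: with ndots:5, an unqualified name like node-218 is expanded with the suffixes in resolv.conf before being tried as-is. A sketch to observe this from any pod that has the same resolv.conf and ships dig:

cat /etc/resolv.conf   # search kube-system.svc.cluster.local svc.cluster.local cluster.local, ndots:5
dig +search node-218   # applies the search list first, producing the *.svc.cluster.local. queries seen in the log
dig node-218.          # trailing dot = fully qualified, so only the bare name is sent to coredns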

@rambohe-ch (Member)

10.96.0.10 is my kube-dns service IP, so we cannot tell from the resolv.conf file which pod the request is forwarded to.

Referring to the article https://cloud.tencent.com/developer/article/1669860, I now think the problem is that the DNS queries metrics-server sends to coredns are for node-218.kube-system.svc.cluster.local. and node-218.svc.cluster.local. rather than node-218., so coredns cannot resolve them. So maybe it's a bug in metrics-server, not coredns?

@ccjjxx99 I think the contents of the /etc/resolv.conf file are correct, and the problem is that the DNS requests to the kube-dns service (10.96.0.10) are not forwarded to the correct DNS instance (10.244.0.24). The Yurthub component provides the service topology capability (reference link: https://openyurt.io/docs/user-manuals/network/service-topology/), which makes sure that traffic to a service stays within the nodePool or the node, so you need to check whether the yurthub component is working.

ccjjxx99 (Author) commented Dec 5, 2022

@rambohe-ch I checked the service topology documentation. My understanding is that the following three operations are required to enable the coredns service topology:

  1. Create nodepools. I have completed this step:
root@center:~# kubectl get np
NAME     TYPE    READYNODES   NOTREADYNODES   AGE
master   Cloud   1            0               6d3h
worker   Edge    9            0               5d23h
  2. Configure kube-proxy.
    2.1 The EndpointSliceProxying feature gate must be enabled. I have completed this step.
    2.2 kube-proxy needs to be configured to connect to Yurthub instead of the API server (I guess the "comment kubeconfig line" step in the documentation is for this). I have completed this step.
$ kubectl edit cm -n kube-system kube-proxy

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    featureGates: # 1. enable EndpointSliceProxying feature gate.
      EndpointSliceProxying: true
    bindAddressHardFail: false
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      # kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.244.0.0/16
    configSyncPeriod: 0s
    ...
  3. Add the annotation openyurt.io/topologyKeys: openyurt.io/nodepool to the coredns service. I have completed this step.
$ kubectl edit svc kube-dns -n kube-system

apiVersion: v1
kind: Service
metadata:
  annotations:
    openyurt.io/topologyKeys: openyurt.io/nodepool
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  creationTimestamp: "2022-11-29T05:16:22Z"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
    ...

I think I have completed the steps needed to turn on service topology, and the coredns instance that the DNS requests are forwarded to should be the correct one. The coredns logs confirm my suspicion.
The coredns log on the cloud node is shown below. From it we can see that the DNS requests come only from 10.244.0.34 (the pod IP of metrics-server). Flannel assigns the 10.244.0.xxx segment to the cloud node, so the cloud coredns receives only requests from the cloud.

[INFO] 10.244.0.34:44845 - 43648 "AAAA IN node-222.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000155647s
[INFO] 10.244.0.34:57082 - 59449 "A IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000075229s
[INFO] 10.244.0.34:46446 - 28363 "AAAA IN center.kube-system.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000070907s
[INFO] 10.244.0.34:36274 - 28093 "AAAA IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000171694s
[INFO] 10.244.0.34:56568 - 28186 "AAAA IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00016328s
[INFO] 10.244.0.34:42647 - 18738 "A IN node-222.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000153169s
[INFO] 10.244.0.34:52029 - 37203 "AAAA IN center.svc.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000151947s
[INFO] 10.244.0.34:43274 - 62686 "A IN center.kube-system.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000145011s
[INFO] 10.244.0.34:35558 - 10570 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00014237s
[INFO] 10.244.0.34:46121 - 31936 "AAAA IN node-222.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000076477s
[INFO] 10.244.0.34:32920 - 46910 "AAAA IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00008089s
[INFO] 10.244.0.34:47813 - 10470 "A IN node-220.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000165363s
[INFO] 10.244.0.34:35892 - 9917 "A IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000174996s
[INFO] 10.244.0.34:55406 - 43804 "AAAA IN node-222.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000215967s
[INFO] 10.244.0.34:54117 - 6119 "AAAA IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000148794s
[INFO] 10.244.0.34:47720 - 47531 "AAAA IN node-224.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000184608s
[INFO] 10.244.0.34:58268 - 26468 "AAAA IN node-222.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000083209s
[INFO] 10.244.0.34:34805 - 48727 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000154946s
[INFO] 10.244.0.34:33916 - 25562 "A IN node-219.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000155333s
[INFO] 10.244.0.34:48529 - 56286 "A IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000146616s

The coredns log on an edge node is shown below. From it we can see that there are no requests from the 10.244.0.xxx network segment; coredns on the edge nodes only receives DNS requests from other edge nodes' pods.

[INFO] 10.244.7.24:38999 - 61060 "AAAA IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.003131428s
[INFO] 10.244.3.114:52459 - 1197 "A IN alertmanager-main-0.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.000308356s
[INFO] 10.244.6.20:60046 - 41155 "A IN alertmanager-main-0.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.001142701s
[INFO] 10.244.6.20:43854 - 18915 "AAAA IN alertmanager-main-1.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.002393563s
[INFO] 10.244.7.24:50397 - 57282 "A IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.001983575s
[INFO] 10.244.6.20:57974 - 50704 "A IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.001152781s
[INFO] 10.244.3.114:45886 - 7402 "AAAA IN alertmanager-main-1.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.000257763s
[INFO] 10.244.7.24:59640 - 36814 "A IN alertmanager-main-1.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.00327722s
[INFO] 10.244.6.20:49062 - 32133 "AAAA IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.00237721s
[INFO] 10.244.6.20:56237 - 21911 "A IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.002904769s
[INFO] 10.244.7.24:45237 - 33414 "A IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.00146456s
[INFO] 10.244.7.24:54569 - 28801 "AAAA IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.002224121s
[INFO] 10.244.3.114:52000 - 4809 "AAAA IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.000202435s
[INFO] 10.244.6.20:42462 - 28785 "A IN alertmanager-main-0.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.002152535s
[INFO] 10.244.3.114:50635 - 7729 "AAAA IN alertmanager-main-0.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.000479877s
[INFO] 10.244.3.114:33515 - 7376 "A IN alertmanager-main-1.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.000634407s
[INFO] 10.244.7.24:37460 - 23717 "AAAA IN alertmanager-main-0.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.001543537s
[INFO] 10.244.7.24:36886 - 62752 "A IN alertmanager-main-0.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.002599676s
[INFO] 10.244.7.24:40389 - 43420 "A IN alertmanager-main-1.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.000312099s
[INFO] 10.244.3.114:46481 - 40304 "AAAA IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.00052999s
[INFO] 10.244.6.20:45600 - 29183 "A IN alertmanager-main-1.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.003239748s
[INFO] 10.244.3.114:36099 - 44417 "AAAA IN alertmanager-main-1.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 181 0.000825769s
[INFO] 10.244.7.24:46935 - 50087 "A IN alertmanager-main-2.alertmanager-operated.monitoring.svc.cluster.local. udp 88 false 512" NOERROR qr,aa,rd 174 0.000838249s

@rambohe-ch (Member)

@ccjjxx99 You should check the network settings of the kube-dns service on the cloud node to see whether only one coredns instance is behind the service. Also, the logs of coredns on the edge nodes cannot prove that no DNS requests have been forwarded to them, because DNS requests cannot reach the edge nodes due to the physical network gap.

By the way, you can deploy a test pod (like busybox) on the cloud node, enter the test container, and then use the following commands to check DNS request forwarding (a sketch is shown after the list):

  1. dig node-218.
  2. dig @10.244.0.24 node-218.
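
A sketch of such a test, assuming the cloud node is named center and using a placeholder image that ships dig:

kubectl run dig-test --image={busybox image with dig} --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"center"}}' -- sleep 3600
kubectl exec -it dig-test -- dig node-218.               # 1. goes through the kube-dns service VIP (10.96.0.10)
kubectl exec -it dig-test -- dig @10.244.0.24 node-218.  # 2. queries the cloud coredns pod directly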

ccjjxx99 (Author) commented Dec 5, 2022

@rambohe-ch Thanks a lot. Now I believe that you are right. I started a busybox-dig container; it shows that dig @10.244.0.25 node-218. works, but dig node-218. does not.

root@center:~# kubectl exec -it busybox -- sh
/ # dig node-218.

; <<>> DiG 9.10.2 <<>> node-218.
;; global options: +cmd
;; connection timed out; no servers could be reached
/ # dig @10.244.0.25 node-218.

; <<>> DiG 9.10.2 <<>> @10.244.0.25 node-218.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63414
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;node-218.                      IN      A

;; ANSWER SECTION:
node-218.               30      IN      A       10.107.2.246

;; Query time: 0 msec
;; SERVER: 10.244.0.25#53(10.244.0.25)
;; WHEN: Mon Dec 05 12:08:32 UTC 2022
;; MSG SIZE  rcvd: 61

/ # cat /etc/resolv.conf 
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local 
options ndots:5

So the kube-dns service forwards my DNS requests to other coredns pods rather than to the coredns on the cloud node (IP 10.244.0.25).
But I did follow the service topology tutorial (reference link: https://openyurt.io/docs/user-manuals/network/service-topology/) and completed the three steps:

  1. Create nodepools. Add the cloud node and the edge nodes to two different nodepools.
  2. Configure kube-proxy by enabling the EndpointSliceProxying feature gate and commenting out the kubeconfig line. By the way, the tutorial for manually setting up an OpenYurt cluster does not include the step of enabling the EndpointSliceProxying feature gate (https://openyurt.io/docs/installation/openyurt-prepare/#5-kubeproxy-adjustment), but the service topology tutorial (https://openyurt.io/docs/user-manuals/network/service-topology/) does. Is this step necessary?
  3. Add the annotation openyurt.io/topologyKeys: openyurt.io/nodepool to the coredns service.

Are there any other steps needed? Does the daemonset configuration of coredns need to be adjusted? Thanks a lot!

@rambohe-ch (Member)

  1. Create nodepools. Add the cloud node and the edge nodes to two different nodepools.
  2. Configure kube-proxy by enabling the EndpointSliceProxying feature gate and commenting out the kubeconfig line. By the way, the tutorial for manually setting up an OpenYurt cluster does not include the step of enabling the EndpointSliceProxying feature gate (https://openyurt.io/docs/installation/openyurt-prepare/#5-kubeproxy-adjustment), but the service topology tutorial (https://openyurt.io/docs/user-manuals/network/service-topology/) does. Is this step necessary?
  3. Add the annotation openyurt.io/topologyKeys: openyurt.io/nodepool to the coredns service.

Are there any other steps needed? Does the daemonset configuration of coredns need to be adjusted? Thanks a lot!

@ccjjxx99 Please check the networking settings of the kube-dns service on the cloud node. If kube-proxy uses ipvs mode, you can execute ipvsadm -Ln to check the ipvs rules for the 10.96.0.10 address (the kube-dns service clusterIP).

Maybe you can also try restarting the kube-proxy pod on the cloud node and then check whether the service topology capability works (a sketch is shown below).
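
A sketch of that check-and-restart sequence, assuming kubeadm-style labels (k8s-app=kube-proxy) and the addresses used above; the pod name is a placeholder:

kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide     # find the kube-proxy pod on the cloud node
kubectl -n kube-system delete pod {kube-proxy pod on cloud node}  # the DaemonSet recreates it
iptables-save | grep kube-dns                                     # iptables mode: backends behind 10.96.0.10
ipvsadm -Ln -t 10.96.0.10:53                                      # ipvs mode: real servers of the kube-dns VIP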

By the way, an OpenYurt community meeting will be held at 11:00 AM (Beijing time) on 2022.12.7, where we will introduce the details of the data filter framework of yurthub (including the service topology capability). You are welcome to join the meeting.

ccjjxx99 (Author) commented Dec 6, 2022

@rambohe-ch I checked my kube-proxy mode in the configmap; it's empty. Referring to https://kubernetes.io/docs/reference/config-api/kube-proxy-config.v1alpha1/#kubeproxy-config-k8s-io-v1alpha1-ProxyMode , I know the default mode is iptables.

apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
featureGates: # 1. enable EndpointSliceProxying feature gate.
  EndpointSliceProxying: true
bindAddressHardFail: false
clientConnection:
  acceptContentTypes: ""
  burst: 0
  contentType: ""
  # kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
  qps: 0
clusterCIDR: 10.244.0.0/16
configSyncPeriod: 0s
conntrack:
  maxPerCore: null
  min: null
  tcpCloseWaitTimeout: null
  tcpEstablishedTimeout: null
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: ""
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: null
  minSyncPeriod: 0s
  syncPeriod: 0s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: ""
  strictARP: false
  syncPeriod: 0s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: ""
nodePortAddresses: null
oomScoreAdj: null
portRange: ""
showHiddenMetricsForVersion: ""
udpIdleTimeout: 0s
winkernel:
  enableDSR: false
  networkName: ""
  sourceVip: ""

So I checked the iptables rules. They seem to contain the coredns pod IPs of all edge nodes (like 10.244.12.184 and 10.244.3.123):

root@center:~# iptables-save | grep kube-dns
-A KUBE-SEP-2JVQP42V4GXGTRDZ -s 10.244.0.25/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-2JVQP42V4GXGTRDZ -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.25:53
-A KUBE-SEP-35CFGASLMX4O2XYS -s 10.244.12.184/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-35CFGASLMX4O2XYS -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.12.184:9153
-A KUBE-SEP-35PBZICJ72K6TDDC -s 10.244.0.25/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-35PBZICJ72K6TDDC -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.25:9153
-A KUBE-SEP-73RZ7TGJF4NADUFX -s 10.244.3.123/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-73RZ7TGJF4NADUFX -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.3.123:53
-A KUBE-SEP-746ZY5XFPW3TX56U -s 10.244.11.9/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-746ZY5XFPW3TX56U -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.11.9:9153
-A KUBE-SEP-75ECG6LZX63U6XZ3 -s 10.244.4.113/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-75ECG6LZX63U6XZ3 -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.4.113:9153
-A KUBE-SEP-7VTAZWAUOSXEJ2PM -s 10.244.7.34/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-7VTAZWAUOSXEJ2PM -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.7.34:53
-A KUBE-SEP-BPMPHCNAJ5P5WTU7 -s 10.244.3.123/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-BPMPHCNAJ5P5WTU7 -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.3.123:9153
-A KUBE-SEP-CJ3WRWZCVLDTVQ4C -s 10.244.8.121/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-CJ3WRWZCVLDTVQ4C -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.8.121:9153
-A KUBE-SEP-CJB6EHT3WBN2QYCS -s 10.244.10.116/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-CJB6EHT3WBN2QYCS -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.10.116:53
-A KUBE-SEP-EB7UOQXQHQ2KDGIZ -s 10.244.12.184/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-EB7UOQXQHQ2KDGIZ -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.12.184:53
-A KUBE-SEP-EJ4Y7HFQSWLNVIF5 -s 10.244.2.88/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-EJ4Y7HFQSWLNVIF5 -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.2.88:9153
-A KUBE-SEP-FEPHI6AGRUA3MEOG -s 10.244.4.113/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-FEPHI6AGRUA3MEOG -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.4.113:53
-A KUBE-SEP-HGEVONAH7CJAA36K -s 10.244.8.121/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-HGEVONAH7CJAA36K -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.8.121:53
-A KUBE-SEP-KGRFBN223DEBGXBC -s 10.244.10.116/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-KGRFBN223DEBGXBC -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.10.116:53
-A KUBE-SEP-KLA54WROZXCXTXQK -s 10.244.10.116/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-KLA54WROZXCXTXQK -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.10.116:9153
-A KUBE-SEP-KTE6RAV34VK47FHZ -s 10.244.2.88/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-KTE6RAV34VK47FHZ -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.2.88:53
-A KUBE-SEP-NQYSJ3QY2ADXHRG3 -s 10.244.0.25/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-NQYSJ3QY2ADXHRG3 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.25:53
-A KUBE-SEP-O5N2KCI4BFNWKTWU -s 10.244.3.123/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-O5N2KCI4BFNWKTWU -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.3.123:53
-A KUBE-SEP-RRLYA43EWKJOINKW -s 10.244.8.121/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-RRLYA43EWKJOINKW -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.8.121:53
-A KUBE-SEP-RSAZ4AHHEQZB3MPO -s 10.244.7.34/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-RSAZ4AHHEQZB3MPO -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.7.34:9153
-A KUBE-SEP-SI4TEAGSJKZT227M -s 10.244.7.34/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-SI4TEAGSJKZT227M -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.7.34:53
-A KUBE-SEP-SI7YQJTH55EPZ7HO -s 10.244.4.113/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-SI7YQJTH55EPZ7HO -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.4.113:53
-A KUBE-SEP-SXYFPIHHCW6JAZTC -s 10.244.6.29/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-SXYFPIHHCW6JAZTC -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.6.29:53
-A KUBE-SEP-TUIODCONJ2H6CZWA -s 10.244.11.9/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-TUIODCONJ2H6CZWA -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.11.9:53
-A KUBE-SEP-VHJEIEEW5IDJHN6G -s 10.244.12.184/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-VHJEIEEW5IDJHN6G -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.12.184:53
-A KUBE-SEP-WHUZGM2DPRCURNGZ -s 10.244.6.29/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-WHUZGM2DPRCURNGZ -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.6.29:53
-A KUBE-SEP-WP367DNFQ4JA43MN -s 10.244.11.9/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-WP367DNFQ4JA43MN -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.11.9:53
-A KUBE-SEP-XVAEKRNVK5EZT36U -s 10.244.2.88/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-XVAEKRNVK5EZT36U -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.2.88:53
-A KUBE-SEP-ZRPKSY3Z6V6SOIBR -s 10.244.6.29/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZRPKSY3Z6V6SOIBR -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.6.29:9153
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SVC-ERIFXISQEP7F7OF4 ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.10000000009 -j KUBE-SEP-2JVQP42V4GXGTRDZ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.11111111101 -j KUBE-SEP-CJB6EHT3WBN2QYCS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.12500000000 -j KUBE-SEP-WP367DNFQ4JA43MN
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.14285714272 -j KUBE-SEP-VHJEIEEW5IDJHN6G
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.16666666651 -j KUBE-SEP-KTE6RAV34VK47FHZ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-73RZ7TGJF4NADUFX
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-FEPHI6AGRUA3MEOG
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-SXYFPIHHCW6JAZTC
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SI4TEAGSJKZT227M
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-HGEVONAH7CJAA36K
-A KUBE-SVC-JD5MR3NA4I4DYORP ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.10000000009 -j KUBE-SEP-35PBZICJ72K6TDDC
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.11111111101 -j KUBE-SEP-KLA54WROZXCXTXQK
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.12500000000 -j KUBE-SEP-746ZY5XFPW3TX56U
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.14285714272 -j KUBE-SEP-35CFGASLMX4O2XYS
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.16666666651 -j KUBE-SEP-EJ4Y7HFQSWLNVIF5
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-BPMPHCNAJ5P5WTU7
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-75ECG6LZX63U6XZ3
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-ZRPKSY3Z6V6SOIBR
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-RSAZ4AHHEQZB3MPO
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-SEP-CJ3WRWZCVLDTVQ4C
-A KUBE-SVC-TCOU7JCQXEZGVUNU ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.10000000009 -j KUBE-SEP-NQYSJ3QY2ADXHRG3
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.11111111101 -j KUBE-SEP-KGRFBN223DEBGXBC
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.12500000000 -j KUBE-SEP-TUIODCONJ2H6CZWA
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.14285714272 -j KUBE-SEP-EB7UOQXQHQ2KDGIZ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.16666666651 -j KUBE-SEP-XVAEKRNVK5EZT36U
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-O5N2KCI4BFNWKTWU
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-SI7YQJTH55EPZ7HO
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-WHUZGM2DPRCURNGZ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-7VTAZWAUOSXEJ2PM
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-RRLYA43EWKJOINKW

When I changed the kube-proxy mode to ipvs and restarted the kube-proxy daemonset, I checked the ipvs rules. They also contain the coredns pods of all edge nodes:

root@center:~# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
...
...
TCP  10.96.0.10:53 rr
  -> 10.244.0.25:53               Masq    1      0          0         
  -> 10.244.2.88:53               Masq    1      0          0         
  -> 10.244.3.123:53              Masq    1      0          0         
  -> 10.244.4.113:53              Masq    1      0          0         
  -> 10.244.6.29:53               Masq    1      0          0         
  -> 10.244.7.34:53               Masq    1      0          0         
  -> 10.244.8.121:53              Masq    1      0          0         
  -> 10.244.10.116:53             Masq    1      0          0         
  -> 10.244.11.9:53               Masq    1      0          0         
  -> 10.244.12.184:53             Masq    1      0          0         
TCP  10.96.0.10:9153 rr
  -> 10.244.0.25:9153             Masq    1      0          0         
  -> 10.244.2.88:9153             Masq    1      0          0         
  -> 10.244.3.123:9153            Masq    1      0          0         
  -> 10.244.4.113:9153            Masq    1      0          0         
  -> 10.244.6.29:9153             Masq    1      0          0         
  -> 10.244.7.34:9153             Masq    1      0          0         
  -> 10.244.8.121:9153            Masq    1      0          0         
  -> 10.244.10.116:9153           Masq    1      0          0         
  -> 10.244.11.9:9153             Masq    1      0          0         
  -> 10.244.12.184:9153           Masq    1      0          0         
...
...
UDP  10.96.0.10:53 rr
  -> 10.244.0.25:53               Masq    1      0          0         
  -> 10.244.2.88:53               Masq    1      0          0         
  -> 10.244.3.123:53              Masq    1      0          0         
  -> 10.244.4.113:53              Masq    1      0          0         
  -> 10.244.6.29:53               Masq    1      0          0         
  -> 10.244.7.34:53               Masq    1      0          0         
  -> 10.244.8.121:53              Masq    1      0          0         
  -> 10.244.10.116:53             Masq    1      0          0         
  -> 10.244.11.9:53               Masq    1      0          0         
  -> 10.244.12.184:53             Masq    1      0          0         
...
...  

So the service topology doesn't work. How can I fix it? Thank you very much.

@rambohe-ch (Member)

So the service topology doesn't work. How can I fix it? Thank you very much.

@ccjjxx99 Please check whether the yurthub component, in cloud working mode, is also deployed on the node where metrics-server is deployed, because the service topology capability is provided by the yurthub component.

ccjjxx99 (Author) commented Dec 6, 2022

@rambohe-ch My cloud node's name is center, here are the pods deployed on it:

root@center:~# kubectl get pod -n kube-system -o wide | grep center
NAME                                       READY   STATUS    RESTARTS      AGE     IP               NODE       NOMINATED NODE   READINESS GATES
coredns-zj6sz                              1/1     Running   0             27h     10.244.0.25      center     <none>           <none>
etcd-center                                1/1     Running   4             7d      172.26.146.181   center     <none>           <none>
kube-apiserver-center                      1/1     Running   0             7d      172.26.146.181   center     <none>           <none>
kube-controller-manager-center             1/1     Running   1 (7d ago)    7d      172.26.146.181   center     <none>           <none>
kube-flannel-ds-cqgzb                      1/1     Running   0             7d      172.26.146.181   center     <none>           <none>
kube-proxy-mgvlh                           1/1     Running   0             173m    172.26.146.181   center     <none>           <none>
kube-scheduler-center                      1/1     Running   11 (7d ago)   7d      172.26.146.181   center     <none>           <none>
metrics-server-667fb6bffc-68n7b            0/1     Running   0             3h10m   10.244.0.39      center     <none>           <none>
yurt-app-manager-6fd8dcd6b4-jl294          1/1     Running   0             7d      10.244.0.7       center     <none>           <none>
yurt-controller-manager-7f9fbdf99c-chwxd   1/1     Running   0             6d23h   172.26.146.181   center     <none>           <none>
yurt-tunnel-dns-9cbd69765-94lz4            1/1     Running   0             7d      10.244.0.4       center     <none>           <none>
yurt-tunnel-server-65c55c56b9-nbt47        1/1     Running   0             6d23h   172.26.146.181   center     <none>           <none>

yurt-app-manager, yurt-controller-manager, yurt-tunnel-dns, and yurt-tunnel-server all work without errors.

rambohe-ch (Member) commented Dec 6, 2022

yurt-app-manager, yurt-controller-manager, yurt-tunnel-dns, and yurt-tunnel-server all work without errors.

@ccjjxx99 The reason is now obvious: the Yurthub component is not deployed on the cloud node. I suggest splitting the Kubernetes control-plane components (etcd, kube-apiserver, kube-controller-manager, etc.) and the OpenYurt control-plane components (yurt-controller-manager, yurt-tunnel, etc.) across two nodes, and deploying Yurthub in cloud working mode on the cloud node together with the OpenYurt control-plane components.

ccjjxx99 (Author) commented Dec 6, 2022

@rambohe-ch I see.
I suggest that the OpenYurt community improve the documentation:

  1. OpenYurt setup: https://openyurt.io/docs/installation/manually-setup/
  2. Service Topology: https://openyurt.io/docs/user-manuals/network/service-topology/
  3. Prometheus: https://openyurt.io/docs/user-manuals/monitoring/prometheus/

None of these documents mentions the Yurthub component, which is very confusing to users.

I have only one cloud server. Can I deploy yurthub on the cloud node together with the Kubernetes control-plane components? The command below that starts yurthub in the container seems intended for joining a node to the cluster, but my cloud node is itself the k8s control-plane node of the cluster.

    command:
    - yurthub
    - --v=2
    - --server-addr=https://__kubernetes_master_address__
    - --node-name=$(NODE_NAME)
    - --join-token=__bootstrap_token__

And thank you very much for your patient answers.

@rambohe-ch (Member)

@rambohe-ch I see. I suggest that the OpenYurt community improve the documentation:

  1. OpenYurt setup: https://openyurt.io/docs/installation/manually-setup/
  2. Service Topology: https://openyurt.io/docs/user-manuals/network/service-topology/
  3. Prometheus: https://openyurt.io/docs/user-manuals/monitoring/prometheus/
    None of these documents mentions the Yurthub component, which is very confusing to users.

I have only one cloud server. Can I deploy yurthub on the cloud node together with the Kubernetes control-plane components? The command below that starts yurthub in the container seems intended for joining a node to the cluster, but my cloud node is itself the k8s control-plane node of the cluster.

    command:
    - yurthub
    - --v=2
    - --server-addr=https://__kubernetes_master_address__
    - --node-name=$(NODE_NAME)
    - --join-token=__bootstrap_token__

And thank you very much for your patient answers.

@ccjjxx99 Yes, yurthub can be deployed on the Kubernetes control-plane node. You need to execute the following steps (a sketch is shown after the list):

  1. Besides the parameter settings above, add another parameter, --working-mode=cloud, for yurthub, then move yurthub.yaml into /etc/kubernetes/manifests and make sure the yurthub component starts up correctly.
  2. Adjust kubelet.conf (server: http://127.0.0.1:10261) so that kubelet accesses kube-apiserver through yurthub, then restart the kubelet component.
  3. Restart the kube-proxy and coredns pods in order to enable the service topology capability.
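
A minimal sketch of these steps, assuming kubeadm default paths (/etc/kubernetes/manifests, /etc/kubernetes/kubelet.conf), kubeadm-style pod labels, and the node name center; adapt the yurthub.yaml fragment to your actual manifest:

# 1. add the cloud working mode flag to the yurthub static pod args, e.g.:
#      command:
#      - yurthub
#      - --v=2
#      - --server-addr=https://__kubernetes_master_address__
#      - --node-name=$(NODE_NAME)
#      - --join-token=__bootstrap_token__
#      - --working-mode=cloud
mv yurthub.yaml /etc/kubernetes/manifests/        # kubelet starts it as a static pod
# 2. point kubelet at yurthub (http://127.0.0.1:10261) instead of the apiserver, then restart kubelet
sed -i 's#server: https://.*#server: http://127.0.0.1:10261#' /etc/kubernetes/kubelet.conf
systemctl restart kubelet
# 3. restart kube-proxy and coredns on this node so they pick up the nodepool-scoped endpoints
kubectl -n kube-system delete pod -l k8s-app=kube-proxy --field-selector spec.nodeName=center
kubectl -n kube-system delete pod -l k8s-app=kube-dns --field-selector spec.nodeName=center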

ccjjxx99 (Author) commented Dec 7, 2022

@rambohe-ch Thanks a lot! It works. I successfully deployed yurthub on the master node and the service topology function works well. My metrics-server is also working fine. Thanks again for your answer!

@rambohe-ch (Member)

@ccjjxx99 I will close this issue because the problem has been solved; feel free to reopen it if you need to.

rambohe-ch (Member) commented Dec 7, 2022

@ccjjxx99 Would you mind registering as an OpenYurt user in this issue?

Your registration will encourage us to keep improving OpenYurt.

@TonyZZhang (Contributor)

If I want to use an edge node's hostname to access a service, do I need to mount the yurt-tunnel-nodes configmap into coredns?

@rambohe-ch (Member)

If I want to use an edge node's hostname to access a service, do I need to mount the yurt-tunnel-nodes configmap into coredns?

@TonyZZhang Yes, you need to mount the yurt-tunnel-nodes configmap into the coredns container, but we recommend installing an independent coredns instance (named yurt-tunnel-dns: https://github.com/openyurtio/openyurt/blob/master/config/setup/yurt-tunnel-dns.yaml) for hostname resolution on cloud nodes or master nodes.
