
Nacos 2.4.0 deployed on k8s cannot scale in/out automatically #12727

Open
Umbrallerstackcode opened this issue Oct 14, 2024 · 11 comments
Labels
help wanted Extra attention is needed status/need feedback

Comments

@Umbrallerstackcode

Deployment environment:
nacos 2.4.0
k8s 1.28.2
containerd 1.7.22
mysql 5.7.44

The Nacos cluster is deployed with a StatefulSet.

After a successful deployment the cluster starts with two nodes, nacos-0 and nacos-1:
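For reference, a minimal sketch of such a setup, assuming a headless Service plus StatefulSet (the Service name nacos-test-svc is inferred from the DNS names shown below; the image tag and labels are assumptions, not the reporter's actual manifest):

```yaml
# A headless Service gives each StatefulSet pod a stable DNS name:
# <pod>.<service>.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: nacos-test-svc
  namespace: nacos-test
spec:
  clusterIP: None          # headless
  selector:
    app: nacos
  ports:
    - name: client
      port: 8848
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nacos
  namespace: nacos-test
spec:
  serviceName: nacos-test-svc   # must match the headless Service above
  replicas: 2
  selector:
    matchLabels:
      app: nacos
  template:
    metadata:
      labels:
        app: nacos
    spec:
      containers:
        - name: nacos
          image: nacos/nacos-server:v2.4.0
          ports:
            - containerPort: 8848
```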
user@k8s-master1:~$ kubectl get pod -n nacos-test
NAME READY STATUS RESTARTS AGE
nacos-0 1/1 Running 0 3m
nacos-1 1/1 Running 0 2m59s

Scaling up online with:
$ kubectl scale sts nacos --replicas=3 -n nacos-test

user@k8s-master1:~$ kubectl get pod -n nacos-test
NAME READY STATUS RESTARTS AGE
nacos-0 1/1 Running 0 3m
nacos-1 1/1 Running 0 2m59s
nacos-2 1/1 Running 0 29s

The Nacos cluster grew from two nodes to three.
But when we inspect cluster.conf at this point, the new node's DNS entry is missing:
user@k8s-master1:~$ for i in 0 1 2; do echo nacos-$i; kubectl exec nacos-$i -n nacos-test -- sh -c "cat conf/cluster.conf"; done
nacos-0
#2024-10-14T09:49:17.967
10.244.36.66:8848
nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1
#2024-10-14T09:49:18.768
10.244.107.198:8848
nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-2
#2024-10-14T09:51:54.892
10.244.169.131:8848
nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848

At this point I manually edited the ConfigMap nacos-config to add the new node's DNS entry:
nacos-servers: nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848 nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848 nacos-2.nacos-test-svc.nacos-test.svc.cluster.local:8848
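A minimal sketch of what that ConfigMap edit might look like (the key name nacos-servers is taken from the line above; the overall manifest shape is an assumption and may differ from the reporter's nacos.yaml):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nacos-config
  namespace: nacos-test
data:
  # Space-separated member list, typically consumed by the pods
  # as the NACOS_SERVERS environment variable
  nacos-servers: >-
    nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848
    nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848
    nacos-2.nacos-test-svc.nacos-test.svc.cluster.local:8848
```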

After adding it I ran:
$ kubectl apply -f nacos.yaml
Even after waiting a while, the new node still did not appear in the Nacos console or in cluster.conf; only restarting the StatefulSet worked:

$ kubectl rollout restart sts nacos -n nacos-test

user@k8s-master1:~$ for i in 0 1 2; do echo nacos-$i; kubectl exec nacos-$i -n nacos-test -- sh -c "cat conf/cluster.conf"; done
nacos-0
#2024-10-14T09:59:02.973
10.244.36.67:8848
nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-2.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1
#2024-10-14T09:58:59.051
10.244.107.199:8848
nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-2.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-2
#2024-10-14T09:58:55.452
10.244.169.132:8848
nacos-0.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-1.nacos-test-svc.nacos-test.svc.cluster.local:8848
nacos-2.nacos-test-svc.nacos-test.svc.cluster.local:8848

Questions:
1. According to the Nacos website, the peer-finder plugin is already integrated into Nacos 2.x, but in practice it has no effect here. Is my configuration wrong? Where is the problem?
2. Can Nacos scale in and out automatically, without manually updating cluster.conf or even restarting the StatefulSet?

Any insight from someone with experience would be much appreciated!

@KomachiSion KomachiSion added the help wanted Extra attention is needed label Oct 14, 2024
@KomachiSion
Collaborator

The problems I can see:

  1. The nodes were not started in hostname mode. When they start with their IP (the default), the IP conflicts with the domain names in the config file, and each node adds its own IP to the file.
  2. cluster.conf takes effect dynamically; no restart is needed. As for the peer-finder plugin, please open an issue in its own repository.
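In the official nacos-docker / nacos-k8s images, hostname mode is selected via the PREFER_HOST_MODE environment variable. A sketch of the relevant container env for point 1, assuming the stock image and a ConfigMap named nacos-config with a nacos-servers key (both assumptions):

```yaml
env:
  - name: PREFER_HOST_MODE
    value: "hostname"        # register by DNS name instead of pod IP
  - name: MODE
    value: "cluster"
  - name: NACOS_SERVERS
    valueFrom:
      configMapKeyRef:
        name: nacos-config   # hypothetical ConfigMap name
        key: nacos-servers
```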

@Umbrallerstackcode
Author

Umbrallerstackcode commented Oct 14, 2024

Hi, thanks for the answer. I will try point 1 shortly. Regarding point 2: since peer-finder is already integrated into Nacos 2.x, do you mean that after I scale up the replicas I only need to apply the YAML file, without updating the node DNS entries in NACOS_SERVERS?

@Umbrallerstackcode
Author

One more question: as soon as I use a headless service, the logs keep reporting that the domain names cannot be resolved, even though the network and CoreDNS are both fine. With a NodePort service everything works normally.

With the headless service, the console looks like this:
[console screenshot]

The logs then report:
2024-10-14 17:43:03,238 WARN [Channel<2401>: (nacos-1.nacos-headless.nacos-test.svc.cluster.local:7848)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host nacos-1.nacos-headless.nacos-test.svc.cluster.local, cause=java.lang.RuntimeException: java.net.UnknownHostException: nacos-1.nacos-headless.nacos-test.svc.cluster.local
at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
at io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.UnknownHostException: nacos-1.nacos-headless.nacos-test.svc.cluster.local
at java.net.InetAddress$CachedAddresses.get(InetAddress.java:765)
at java.net.InetAddress.getAllByName0(InetAddress.java:1292)
at java.net.InetAddress.getAllByName(InetAddress.java:1145)
at java.net.InetAddress.getAllByName(InetAddress.java:1066)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:632)
at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:219)
... 5 more
}
2024-10-14 17:43:03,475 WARN [Channel<2427>: (nacos-0.nacos-headless.nacos-test.svc.cluster.local:7848)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host nacos-0.nacos-headless.nacos-test.svc.cluster.local, cause=java.lang.RuntimeException: java.net.UnknownHostException: nacos-0.nacos-headless.nacos-test.svc.cluster.local: Name does not resolve
at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
at io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.UnknownHostException: nacos-0.nacos-headless.nacos-test.svc.cluster.local: Name does not resolve
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:868)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1303)
at java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:816)
at java.net.InetAddress.getAllByName0(InetAddress.java:1292)
at java.net.InetAddress.getAllByName(InetAddress.java:1145)
at java.net.InetAddress.getAllByName(InetAddress.java:1066)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:632)
at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:219)
... 5 more
}

The ports in these errors (7848, 9849, 8848, 9848) are all configured in the headless service, yet the errors above still appear. Could you take a look at what is going on here?
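For comparison, a headless Service exposing all four ports the errors mention might look like the sketch below. The port names are assumptions; `publishNotReadyAddresses` is a real Service field worth checking here, since without it a headless Service does not publish DNS records for pods that have not yet passed readiness checks, which can produce exactly this kind of UnknownHostException during startup:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nacos-headless
  namespace: nacos-test
spec:
  clusterIP: None
  # Publish DNS records even for not-yet-ready pods so that
  # peers can resolve each other while the cluster bootstraps
  publishNotReadyAddresses: true
  selector:
    app: nacos
  ports:
    - name: client
      port: 8848
    - name: client-rpc
      port: 9848
    - name: raft-rpc
      port: 7848
    - name: old-raft-rpc
      port: 9849
```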

@Umbrallerstackcode
Author

Umbrallerstackcode commented Oct 15, 2024

Hi, I tested both points yesterday as you suggested. Point 1 is solved. For point 2 there is still a small issue: after scaling, the Nacos cluster does update to the latest number of nodes, but only the newly added node's cluster.conf contains the full node list; the original nodes still only list the initial members. Is manually adding the new node's DNS address the only option?

@KomachiSion
Collaborator

In that case you would need to open an issue in the peer-finder repository and ask whether some misconfiguration is keeping peer-finder from working properly.

As I recall, the peer-finder author said the finder is meant for testing. I recommend deploying via the operator instead.

@Umbrallerstackcode
Author

OK, thank you.

@Umbrallerstackcode
Author

I have tried basically every environment variable that can be passed to the container, e.g. both NACOS_SERVERS and MEMBER_LIST, but still only the newly added pod's cluster.conf contains the DNS entries of all cluster nodes; the old pods do not. It does not affect the pods' ability to fetch data, since the data lives in the database, and the peer-finder plugin is integrated into Nacos 2.x.

@Umbrallerstackcode
Author

Umbrallerstackcode commented Oct 17, 2024

I mean the cluster.conf file, i.e. issue #12739.

@KomachiSion
Collaborator

Then it is a duplicate. I will mark it as duplicated and consolidate the discussion into one issue.

@KomachiSion KomachiSion added status/duplicate This issue or pull request already exists help wanted Extra attention is needed status/need feedback and removed help wanted Extra attention is needed status/need feedback status/duplicate This issue or pull request already exists labels Oct 21, 2024
@KomachiSion
Collaborator

Please decide which issue to close, #12739 or this one.

@Umbrallerstackcode
Author

> Please decide which issue to close, #12739 or this one.

Close this one.
