Nacos cluster scaled from 3 pods straight to 5 pods after setting NACOS_REPLICAS = 5 #166

Closed
scriptyang opened this issue Oct 30, 2020 · 30 comments

Comments

@scriptyang

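The k8s cluster has 5 nodes and Nacos was running as 3 pods. After scaling to 5 pods, no leader can be elected.
kubectl logs shows nothing useful.
After entering the nacos-0 pod, the following appears in logs/nacos-0.log: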
java.lang.IllegalStateException: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848
at com.alibaba.nacos.naming.consistency.persistent.raft.RaftCore.receivedVote(RaftCore.java:494)
at com.alibaba.nacos.naming.controllers.RaftController.vote(RaftController.java:88)
at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:800)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:660)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CorsFilter.doFilterInternal(CorsFilter.java:96)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.boot.actuate.web.trace.servlet.HttpTraceFilter.doFilterInternal(HttpTraceFilter.java:90)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.alibaba.nacos.naming.web.DistroFilter.doFilter(DistroFilter.java:154)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.alibaba.nacos.core.auth.AuthFilter.doFilter(AuthFilter.java:60)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.alibaba.nacos.naming.web.TrafficReviseFilter.doFilter(TrafficReviseFilter.java:75)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:209)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:178)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:357)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:270)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.filterAndRecordMetrics(WebMvcMetricsFilter.java:117)
at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:106)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:791)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1417)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)

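The usual workaround:
Restarting every pod once lets them all recognize the leader.

Question: can newly added pods recognize the leader without deleting pods or updating the YAML configuration? Leader election breaks after pods are added.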

@paderlol
Collaborator

Changing that parameter has no effect. Are you using the peer-finder plugin, or did you configure the cluster addresses yourself?

@scriptyang
Author

The leaderDueMs values on the newly added pods differ from those on the original pods.
nacos-0
{"services":1,"peers":[{"ip":"nacos-0.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":91,"leaderDueMs":13151,"heartbeatDueMs":2500,"state":"FOLLOWER"},{"ip":"nacos-1.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":9780,"heartbeatDueMs":1237,"state":"FOLLOWER"},{"ip":"nacos-2.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":91,"leaderDueMs":15365,"heartbeatDueMs":5000,"state":"LEADER"}]}
nacos-1
{"services":1,"peers":[{"ip":"nacos-0.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":9104,"heartbeatDueMs":2239,"state":"FOLLOWER"},{"ip":"nacos-1.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":91,"leaderDueMs":16225,"heartbeatDueMs":2500,"state":"FOLLOWER"},{"ip":"nacos-2.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":91,"leaderDueMs":15365,"heartbeatDueMs":5000,"state":"LEADER"}]}
nacos-2
{"services":1,"peers":[{"ip":"nacos-0.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":91,"leaderDueMs":15651,"heartbeatDueMs":5000,"state":"FOLLOWER"},{"ip":"nacos-1.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":91,"leaderDueMs":19225,"heartbeatDueMs":5000,"state":"FOLLOWER"},{"ip":"nacos-2.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":91,"leaderDueMs":12365,"heartbeatDueMs":2000,"state":"LEADER"}]}
nacos-3
{"services":1,"peers":[{"ip":"nacos-3.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-3.nacos-headless.default.svc.cluster.local:8848","term":5,"leaderDueMs":2711,"heartbeatDueMs":500,"state":"CANDIDATE"},{"ip":"nacos-0.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":10616,"heartbeatDueMs":2227,"state":"FOLLOWER"},{"ip":"nacos-1.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":5365,"heartbeatDueMs":1482,"state":"FOLLOWER"},{"ip":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":14185,"heartbeatDueMs":2217,"state":"FOLLOWER"},{"ip":"nacos-4.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-3.nacos-headless.default.svc.cluster.local:8848","term":5,"leaderDueMs":19988,"heartbeatDueMs":1500,"state":"FOLLOWER"}]}
nacos-4
{"services":1,"peers":[{"ip":"nacos-3.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-4.nacos-headless.default.svc.cluster.local:8848","term":4,"leaderDueMs":18283,"heartbeatDueMs":5000,"state":"FOLLOWER"},{"ip":"nacos-0.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":2117,"heartbeatDueMs":4114,"state":"FOLLOWER"},{"ip":"nacos-1.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":12398,"heartbeatDueMs":339,"state":"FOLLOWER"},{"ip":"nacos-2.nacos-headless.default.svc.cluster.local:8848","term":0,"leaderDueMs":3217,"heartbeatDueMs":2862,"state":"FOLLOWER"},{"ip":"nacos-4.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-3.nacos-headless.default.svc.cluster.local:8848","term":5,"leaderDueMs":5488,"heartbeatDueMs":2000,"state":"FOLLOWER"}]}
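
Read together, the five dumps above suggest why the new pods stay leaderless. The sketch below is an illustration, not Nacos code. `PEER_VIEWS` is condensed by hand from the dumps (short pod names only). It assumes that a pod refuses a vote request from a candidate missing from its own peer list, which is what the `can not find peer` exception points to, and that a candidate needs the usual Raft majority of `n // 2 + 1` over the peers it knows about.

```python
# Peer lists condensed by hand from the per-pod dumps above (short names only).
PEER_VIEWS = {
    "nacos-0": ["nacos-0", "nacos-1", "nacos-2"],
    "nacos-1": ["nacos-0", "nacos-1", "nacos-2"],
    "nacos-2": ["nacos-0", "nacos-1", "nacos-2"],
    "nacos-3": ["nacos-3", "nacos-0", "nacos-1", "nacos-2", "nacos-4"],
    "nacos-4": ["nacos-3", "nacos-0", "nacos-1", "nacos-2", "nacos-4"],
}


def majority(peer_count):
    """Usual Raft majority (assumed here, not read from the Nacos source)."""
    return peer_count // 2 + 1


def possible_voters(candidate, views):
    """Pods that list `candidate` as a peer and so could approve its vote."""
    return sorted(pod for pod, peers in views.items() if candidate in peers)


def can_win(candidate, views):
    """True if the candidate can reach a majority of its own peer view."""
    needed = majority(len(views[candidate]))
    return len(possible_voters(candidate, views)) >= needed


if __name__ == "__main__":
    for pod in sorted(PEER_VIEWS):
        print(pod, possible_voters(pod, PEER_VIEWS), can_win(pod, PEER_VIEWS))
```

Under these assumptions nacos-3 and nacos-4 can each collect at most two approvals (from each other and themselves) while needing three, whereas nacos-2 still clears the two-vote majority of the original three-member view.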

@scriptyang
Author

image

@paderlol
Collaborator

Was cluster.conf updated after scaling out?

@scriptyang
Author

@paderlol cluster.conf is updated on both scale-out and scale-in, but when I query the leader with a command, the newly added pods do not have one.
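
For checking whether cluster.conf really lists every member after scaling, a small comparison like the following may help. It is a sketch under assumptions: the member names follow the StatefulSet DNS pattern seen in the logs in this thread, the file holds one `host:port` entry per line with `#` comment lines, and `expected_members` and `missing_members` are hypothetical helper names.

```python
def expected_members(replicas, name="nacos", service="nacos-headless",
                     namespace="default", port=8848):
    """Build the member addresses a StatefulSet of `replicas` pods would have.

    The defaults mirror the host names that appear in this thread.
    """
    return [
        "%s-%d.%s.%s.svc.cluster.local:%d" % (name, i, service, namespace, port)
        for i in range(replicas)
    ]


def missing_members(conf_text, replicas):
    """Return expected members that are absent from the cluster.conf text."""
    present = set()
    for line in conf_text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):  # skip blanks and comments
            present.add(entry)
    return [m for m in expected_members(replicas) if m not in present]


if __name__ == "__main__":
    # A made-up file that still lists only the original three pods.
    stale_conf = "\n".join(["# example content"] + expected_members(3))
    print(missing_members(stale_conf, 5))
```

Running the same check inside each pod would show whether any member still carries a stale list.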

@paderlol
Collaborator

The newly added pods have no leader?

@scriptyang
Author

Yes, the newly added pods cannot elect a leader, and nacos-0 shows the error above: java.lang.IllegalStateException: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848. However, running curl nacos-3.nacos-headless.default.svc.cluster.local:8848 from inside nacos-0 does return a response.

@scriptyang
Author

If the original 3 pods are restarted, the leader information becomes correct.

@scriptyang
Author

Environment: MySQL is an external instance; NFS uses the one from nacos-k8s.

@paderlol
Collaborator

OK, we'll take a look.

@scriptyang
Author

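After deleting the pod that is the leader, the election proceeds normally, and checking confirms that it succeeds. While the leader is being deleted, Nacos disrupts the business services; the new leader is only elected once the deleted pod has finished restarting.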

@scriptyang
Author

image
After deleting the leader and checking through the web console, the leader is unstable.

@paderlol
Collaborator

You have confirmed that the new entries were added to cluster.conf on every pod, right? Could you pull the raft logs so we can take a look?

@scriptyang
Author

image
Logs printed in a newly added pod:
[root@nacos-4 nacos]# cat logs/naming-raft.log
2020-10-30 17:39:15,000 WARN [IS LEADER] no leader is available now!

2020-10-30 17:39:17,962 INFO leader timeout, start voting,leader: null, term: 12

2020-10-30 17:39:17,974 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-4.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-1.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote

2020-10-30 17:39:18,035 INFO received approve from peer: {"ip":"nacos-3.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-4.nacos-headless.default.svc.cluster.local:8848","term":13,"leaderDueMs":19746,"heartbeatDueMs":1000,"state":"FOLLOWER"}

2020-10-30 17:39:18,841 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-4.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-2.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote

2020-10-30 17:39:19,087 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-4.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-0.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote

[root@nacos-4 nacos]# curl http://nacos-1.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote
{"timestamp":"2020-10-30T09:39:39.396+0000","status":501,"error":"Not Implemented","message":"no such api:GET:/nacos/v1/ns/raft/vote","path":"/nacos/v1/ns/raft/vote"}

@scriptyang
Author

After scaling out, /nacos/v1/ns/raft/vote is gone on all Nacos pods.

@paderlol
Collaborator

Which version?

@scriptyang
Author

nacos-peer-finder-plugin:1.0

@paderlol
Collaborator

You are sending a GET; that request has to be a POST.
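
As a rough illustration of the GET-versus-POST point: an endpoint that only accepts POST can answer a plain GET with 501 even though it exists. The toy server below is a stand-in built on Python's standard library, not Nacos code. Only the path is taken from the log lines in this thread; `PostOnlyHandler`, `status_of`, and `demo` are hypothetical names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

VOTE_PATH = "/nacos/v1/ns/raft/vote"  # path copied from the logs in this thread


class PostOnlyHandler(BaseHTTPRequestHandler):
    """Toy POST-only endpoint; it defines no do_GET, so GET gets a 501."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the request body
        body = b"ok"
        self.send_response(200 if self.path == VOTE_PATH else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def status_of(url, method):
    """Return the HTTP status code for a request made with `method`."""
    data = b"" if method == "POST" else None
    req = urllib.request.Request(url, data=data, method=method)
    # Ignore any proxy settings so the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    """Start the toy server on a free port and probe it with GET and POST."""
    server = HTTPServer(("127.0.0.1", 0), PostOnlyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d%s" % (server.server_address[1], VOTE_PATH)
    try:
        return status_of(url, "GET"), status_of(url, "POST")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

So the 501 returned to a bare curl does not by itself show that the vote endpoint disappeared after scaling.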

@scriptyang
Author

Requests to this path are failing:

2020-10-30 17:40:40,384 INFO received approve from peer: {"ip":"nacos-4.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-3.nacos-headless.default.svc.cluster.local:8848","term":18,"leaderDueMs":18896,"heartbeatDueMs":3500,"state":"FOLLOWER"}

2020-10-30 17:40:40,387 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-1.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote

2020-10-30 17:40:40,460 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-2.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote

2020-10-30 17:40:40,883 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-0.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote

2020-10-30 17:40:45,001 WARN [IS LEADER] no leader is available now!

2020-10-30 17:40:58,377 INFO leader timeout, start voting,leader: null, term: 18

2020-10-30 17:40:58,389 INFO received approve from peer: {"ip":"nacos-4.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-3.nacos-headless.default.svc.cluster.local:8848","term":19,"leaderDueMs":19425,"heartbeatDueMs":500,"state":"FOLLOWER"}

2020-10-30 17:40:58,391 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-1.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote

2020-10-30 17:40:59,089 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-2.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote
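
The pattern in these log lines can be tallied mechanically. The sketch below is only an aid for reading naming-raft.log: the regular expressions are written against the exact line shapes quoted in this thread and may not match other Nacos versions, and `tally` is a hypothetical helper name.

```python
import json
import re

# Patterns written against the log lines quoted in this thread (an assumption
# about the format, not something taken from the Nacos source).
VOTE_FAILED = re.compile(
    r"vote failed: caused: can not find peer: (?P<candidate>[^;]+);, "
    r"url: http://(?P<rejecter>[^/]+)/"
)
APPROVE = re.compile(r"received approve from peer: (?P<peer>\{.*\})")

SAMPLE = r"""
2020-10-30 17:40:40,384 INFO received approve from peer: {"ip":"nacos-4.nacos-headless.default.svc.cluster.local:8848","voteFor":"nacos-3.nacos-headless.default.svc.cluster.local:8848","term":18,"leaderDueMs":18896,"heartbeatDueMs":3500,"state":"FOLLOWER"}
2020-10-30 17:40:40,387 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-1.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote
2020-10-30 17:40:40,460 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-2.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote
2020-10-30 17:40:40,883 ERROR NACOS-RAFT vote failed: caused: can not find peer: nacos-3.nacos-headless.default.svc.cluster.local:8848;, url: http://nacos-0.nacos-headless.default.svc.cluster.local:8848/nacos/v1/ns/raft/vote
"""


def tally(lines):
    """Return (approving peers, rejecting peers) found in raft log lines."""
    approved, rejected = set(), set()
    for line in lines:
        failed = VOTE_FAILED.search(line)
        if failed:
            rejected.add(failed.group("rejecter"))
            continue
        ok = APPROVE.search(line)
        if ok:
            approved.add(json.loads(ok.group("peer"))["ip"])
    return sorted(approved), sorted(rejected)


if __name__ == "__main__":
    approved, rejected = tally(SAMPLE.splitlines())
    print("approved by:", approved)
    print("rejected by:", rejected)
```

On the sample, the only approval comes from nacos-4 and all three rejections come from the original pods nacos-0, nacos-1, and nacos-2.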

@scriptyang
Author

I've joined the DingTalk group; let's talk there.

@chuntaojun
Member

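Have you tried building a Nacos cluster yourself without k8s or Docker, just unpacking the release and running it, to see whether scaling out in that mode has the same problem? Scaling out that way works fine for me.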

@chuntaojun
Member

I'm working on it.

@paderlol
Collaborator

This has been handled; see alibaba/nacos#4110 for details.

@scriptyang
Author

image
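Still the same with 1.4.0. After upgrading Nacos on k8s and starting with 3 pods, only two of the three have a leader and one does not; after scaling to 5 pods the behavior is the same as with the previous version.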

@paderlol
Collaborator

paderlol commented Dec 4, 2020

Fixed in 1.4.1.

@scriptyang
Author

image
The newest version seems to be 1.4.0.
Is latest 1.4.1?

@paderlol
Collaborator

paderlol commented Dec 4, 2020

1.4.1 has not been released yet.

@scriptyang
Author

Oh, OK.

@zijiwork

The problem still exists in 1.4.1.

@paderlol
Collaborator

OK. Is your situation the same as in the issue above? You can file the details of your problem directly as an issue on nacos.

4 participants