BUG: Registry cluster data inconsistency #7151
Labels
status/duplicate
This issue or pull request already exists
Comments
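Under normal circumstances, a sync task sent to a downed node should return
If this is confirmed to be a problem, could you submit a PR to fix it?
Subscribing. On k8s we have been troubled by this cluster data inconsistency for a long time; server version 1.4.1.
Also on K8S: after restarting one node, the data becomes inconsistent.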
Closed
This problem is caused by the queue blocking; for now we rely on monitoring!
Hitting the same problem, server version 2.1.0.
Refer to #8099
KomachiSion added the status/duplicate label and removed the status/need feedback label on Aug 8, 2022.
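When one node in the cluster goes down, the remaining nodes show inconsistent registry data when new data is written.
We have identified the cause and can reproduce it reliably; the reproduction matches our hypothesis.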
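Scenario:
The cluster originally has 3 nodes, and one of them goes down.
After a while, newly registered instances are inconsistent across the remaining nodes (ephemeral instances, using the Distro protocol).
The newly registered instances are still synchronized correctly after some time.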
Cause:
com.alibaba.nacos.common.task.engine.NacosExecuteTaskExecuteEngine#executeWorkers field
com.alibaba.nacos.common.task.engine.TaskExecuteWorker#queue field
The BlockingQueue queue held in these fields fills up.
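To illustrate what a full queue means for a submitter, here is a minimal sketch using the standard `java.util.concurrent` API. It is not Nacos code: the class name `BoundedQueueSketch`, the capacity, and the timeout are assumptions for illustration, and the issue does not state how the worker enqueues tasks or how large its queue is.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Nacos.
final class BoundedQueueSketch {

    /**
     * Fills a bounded queue to capacity, then tries to enqueue one more task.
     * Returns true only if the extra task was accepted within the timeout.
     */
    static boolean offerWhenFull(int capacity, long timeoutMs) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(capacity);
        try {
            for (int i = 0; i < capacity; i++) {
                queue.put(() -> { }); // stands in for a queued sync task; space is still available here
            }
            // No consumer drains the queue, so this waits for the timeout and then gives up.
            // An untimed put() at this point would block the submitting thread instead.
            return queue.offer(() -> { }, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("extra task accepted: " + offerWhenFull(4, 50));
    }
}
```

Either way, once the queue is at capacity a newly submitted sync task cannot make progress until the worker removes something from the head.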
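When a node goes down, the other nodes keep running tasks that sync data to the downed node. These tasks fail and are then resubmitted.
When the queue is processed more slowly than tasks are submitted, sync tasks for new instances are blocked behind the backlog.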
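The fail-and-resubmit loop can be sketched as a small deterministic simulation. This is a toy model, not the Nacos implementation: it assumes a single worker with 1000 ms of work time per simulated second, a cost of 1 ms for a task to a live node (an assumption), 300 ms for a task to the downed node (the figure reported in this issue), and immediate resubmission of failed tasks at the tail of the queue. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of one worker queue; not part of Nacos.
final class RetryBacklogSketch {
    static final long DEAD_NODE_COST_MS = 300; // failure cost reported in this issue
    static final long LIVE_NODE_COST_MS = 1;   // assumed cost of a successful sync

    /** Returns the number of tasks still queued after the given number of simulated seconds. */
    static int backlogAfter(int seconds, int liveTasksPerSecond, int deadTasksPerSecond) {
        Deque<Boolean> queue = new ArrayDeque<>(); // true = task targets the downed node
        for (int s = 0; s < seconds; s++) {
            for (int i = 0; i < liveTasksPerSecond; i++) {
                queue.addLast(false);
            }
            for (int i = 0; i < deadTasksPerSecond; i++) {
                queue.addLast(true);
            }
            long budgetMs = 1000; // work the single worker can do in this second
            while (!queue.isEmpty()) {
                boolean targetsDeadNode = queue.peekFirst();
                long cost = targetsDeadNode ? DEAD_NODE_COST_MS : LIVE_NODE_COST_MS;
                if (cost > budgetMs) {
                    break; // out of time for this second
                }
                budgetMs -= cost;
                queue.pollFirst();
                if (targetsDeadNode) {
                    queue.addLast(true); // the sync failed, so the task is resubmitted
                }
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("all nodes up:  " + backlogAfter(60, 10, 0));
        System.out.println("one node down: " + backlogAfter(60, 10, 1));
    }
}
```

In this model the tasks aimed at the downed node never leave the queue and consume most of the worker's time, so tasks for live nodes wait behind them even though each is cheap on its own.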
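At peak we observed more than 30,000 tasks backed up in the queue.
The underlying problem is that a sync task to a downed node takes too long: we observed this method taking 300 ms.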
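As a rough worked estimate, the two figures above can be combined into a worst-case wait for a task that joins the back of the queue. The sketch assumes a single worker processing tasks serially and every queued task costing the full 300 ms; a real backlog mixes cheap and expensive tasks, so the actual delay would be lower. The class name `DrainTimeEstimate` is hypothetical.

```java
import java.time.Duration;

// Hypothetical helper for a back-of-the-envelope estimate; not part of Nacos.
final class DrainTimeEstimate {

    /** Upper bound on the time to work through a backlog when every task costs costPerTaskMs. */
    static Duration worstCaseDrain(long queuedTasks, long costPerTaskMs) {
        return Duration.ofMillis(Math.multiplyExact(queuedTasks, costPerTaskMs));
    }

    public static void main(String[] args) {
        // 30,000 tasks at 300 ms each = 9,000,000 ms = 150 minutes
        System.out.println(worstCaseDrain(30_000, 300).toMinutes() + " minutes");
    }
}
```

Under these assumptions the bound is on the order of hours, which fits the observation that new instances are eventually synchronized, only late.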
2021-10-30 07:17:29,952 WARN [DISTRO] Sync data change failed.
com.alibaba.nacos.api.exception.NacosException: Client not connected.
at com.alibaba.nacos.common.remote.client.RpcClient.asyncRequest(RpcClient.java:727)
at com.alibaba.nacos.core.cluster.remote.ClusterRpcClientProxy.asyncRequest(ClusterRpcClientProxy.java:192)
at com.alibaba.nacos.naming.consistency.ephemeral.distro.v2.DistroClientTransportAgent.syncData(DistroClientTransportAgent.java:95)
at com.alibaba.nacos.core.distributed.distro.task.execute.DistroSyncDeleteTask.doExecuteWithCallback(DistroSyncDeleteTask.java:60)
at com.alibaba.nacos.core.distributed.distro.task.execute.AbstractDistroExecuteTask.run(AbstractDistroExecuteTask.java:64)
at com.alibaba.nacos.common.task.engine.TaskExecuteWorker$InnerWorker.run(TaskExecuteWorker.java:116)