
[Bug]: "failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114" #36593

Open
1 task done
1292253144 opened this issue Sep 29, 2024 · 19 comments
Labels: kind/bug, priority/critical-urgent, triage/needs-information

Comments

@1292253144

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.4.1
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

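Accessing the collection vector_info_day_2024_09_29 through the alias aa fails with: failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0. However, Attu shows that this collection does not use the query node with ID 114 at all.
[image]
The collection that alias aa was previously bound to, vector_info_day_2024_09_28, did use the query node with ID 114.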

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

[2024/09/29 02:09:47.858 +00:00] [WARN] [proxy/lb_policy.go:169] ["search/query channel failed, node not available"] [traceID=b71e1ccf9e642c9247dc8286866a195d] [collectionID=452131967330518986] [collectionName=ads_aic_app_album_photo_vector_info_day] [channelName=by-dev-rootcoord-dml_12_452131967330518986v0] [nodeID=114] [error="can not find client of node 114"]
[2024/09/29 02:09:47.858 +00:00] [WARN] [retry/retry.go:46] ["retry func failed"] [traceID=b71e1ccf9e642c9247dc8286866a195d] [retried=0] [error="failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114"] [errorVerbose="failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:176\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0\nWraps: (3) can not find client of node 114\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString"]
[2024/09/29 02:09:48.035 +00:00] [INFO] [proxy/meta_cache.go:395] ["meta update success"] [database=default] [collectionName=ads_aic_app_smallvideo_title_vector_info_day_v2_2024_09_29] [collectionID=452131967344872926]
[2024/09/29 02:09:48.038 +00:00] [INFO] [proxy/meta_cache.go:395] ["meta update success"] [database=default] [collectionName=ads_aic_app_smallvideo_title_vector_info_day_v2_2024_09_29] [collectionID=452131967344872926]
[2024/09/29 02:09:48.040 +00:00] [INFO] [proxy/meta_cache.go:395] ["meta update success"] [database=default] [collectionName=ads_aic_app_smallvideo_title_vector_info_day_v2_2024_09_29] [collectionID=452131967344872926]
[2024/09/29 02:09:48.042 +00:00] [INFO] [proxy/meta_cache.go:395] ["meta update success"] [database=default] [collectionName=ads_aic_app_smallvideo_title_vector_info_day_v2_2024_09_29] [collectionID=452131967344872926]
[2024/09/29 02:09:48.059 +00:00] [INFO] [proxy/meta_cache.go:994] ["clearing shard cache for collection"] [collectionName=ads_aic_app_album_photo_vector_info_day]
[2024/09/29 02:09:48.061 +00:00] [WARN] [proxy/lb_policy.go:126] ["no available shard delegator found"] [traceID=b71e1ccf9e642c9247dc8286866a195d] [collectionID=452131967330518986] [collectionName=ads_aic_app_album_photo_vector_info_day] [channelName=by-dev-rootcoord-dml_12_452131967330518986v0] [nodes="[114]"] [excluded="[114]"]
[2024/09/29 02:09:48.061 +00:00] [WARN] [proxy/lb_policy.go:157] ["failed to select node for shard"] [traceID=b71e1ccf9e642c9247dc8286866a195d] [collectionID=452131967330518986] [collectionName=ads_aic_app_album_photo_vector_info_day] [channelName=by-dev-rootcoord-dml_12_452131967330518986v0] [nodeID=-1] [error="channel not available[channel=no available shard delegator found]"]
[2024/09/29 02:09:48.061 +00:00] [WARN] [proxy/task_search.go:511] ["search execute failed"] [traceID=b71e1ccf9e642c9247dc8286866a195d] [nq=1] [error="failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114"] [errorVerbose="failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:176\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0\nWraps: (3) can not find client of node 114\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString"]
[2024/09/29 02:09:48.062 +00:00] [WARN] [proxy/task_scheduler.go:469] ["Failed to execute task: "] [traceID=b71e1ccf9e642c9247dc8286866a195d] [error="failed to search: failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114"] [errorVerbose="failed to search: failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | [...repeated from below...]\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:176\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (4) failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0\nWraps: (5) can not find client of node 114\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString"]
[2024/09/29 02:09:48.062 +00:00] [WARN] [proxy/impl.go:2919] ["Search failed to WaitToFinish"] [traceID=b71e1ccf9e642c9247dc8286866a195d] [role=proxy] [db=default] [collection=ads_aic_app_album_photo_vector_info_day] [partitions="[]"] [dsl="series_name in ["胜达"]"] [len(PlaceholderGroup)=3084] [OutputFields="[photo_url,photo_weight,id]"] [search_params="[{"key":"anns_field","value":"vector"},{"key":"topk","value":"5"},{"key":"metric_type","value":"L2"},{"key":"round_decimal","value":"-1"},{"key":"offset","value":"0"},{"key":"params","value":"{\"nprobe\": 10}"}]"] [guarantee_timestamp=1727575782844] [nq=1] [error="failed to search: failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114"] [errorVerbose="failed to search: failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0: can not find client of node 114\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | [...repeated from below...]\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:176\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (4) failed to get delegator 114 for channel by-dev-rootcoord-dml_12_452131967330518986v0\nWraps: (5) can not find client of node 114\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString"]
[2024/09/29 02:09:48.067 +00:00] [INFO] [proxy/meta_cache.go:395] ["meta update success"] [database=default] [collectionName=ads_aic_app_smallvideo_text_vector_info_day_v2_2024_09_29] [collectionID=452131967332546943]
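
When reading a log like the one above, the fields that matter most are `nodeID`, `channelName`, `nodes`, and `excluded`. The following is a minimal sketch for extracting them, assuming the bracketed `[key=value]` layout seen in this excerpt. The helper names `parse_fields` and `all_candidates_excluded` are hypothetical and not part of Milvus, and values containing unescaped inner quotes (such as the `dsl` field) are not handled.

```python
import re

# Matches [key=value] or [key="value"]. Assumption: quoted values do not
# contain unescaped double quotes, which holds for the short fields used here.
FIELD_RE = re.compile(r'\[(\w+)=(?:"((?:[^"\\]|\\.)*)"|([^\]\s]*))\]')


def parse_fields(line: str) -> dict:
    """Return the key=value fields of one log line as a dict of strings."""
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def all_candidates_excluded(fields: dict) -> bool:
    """True when every node listed in `nodes` also appears in `excluded`."""
    nodes = set(re.findall(r"-?\d+", fields.get("nodes", "")))
    excluded = set(re.findall(r"-?\d+", fields.get("excluded", "")))
    return bool(nodes) and nodes <= excluded
```

Applied to the `no available shard delegator found` entry, this shows that the only candidate node for the channel (114) had already been excluded, so the proxy had nothing left to try.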

Anything else?

No response

@1292253144 1292253144 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 29, 2024
@yanliang567
Contributor

@1292253144 could you please try searching with the collection name instead of the collection alias? This would help us determine whether the alias is the part that is not working.
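
One way to run that comparison is to issue the same search twice, once against the alias and once against the real collection name, and record which one fails. This is a minimal sketch: `compare_alias_and_name` is a hypothetical helper, and `search` stands for any callable that takes a collection name and raises on failure.

```python
from typing import Callable, Dict


def compare_alias_and_name(
    search: Callable[[str], object], alias: str, name: str
) -> Dict[str, str]:
    """Run the same search against the alias and the collection name.

    Returns a dict such as {"alias": "error: ...", "name": "ok"}.
    """
    outcome = {}
    for label, target in (("alias", alias), ("name", name)):
        try:
            search(target)
            outcome[label] = "ok"
        except Exception as exc:  # broad on purpose: only the message is recorded
            outcome[label] = f"error: {exc}"
    return outcome
```

With pymilvus, `search` could be something like `lambda c: client.search(collection_name=c, data=[query_vector], limit=5)`, where `client` is a `MilvusClient` and `query_vector` is your own query embedding; both names are assumptions for illustration. If only the alias call fails, the problem is likely in how the alias is resolved rather than in the collection itself.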

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 29, 2024
@yanliang567
Contributor

/assign @1292253144

@xiaofan-luan
Contributor

My guess is that func (m *MetaCache) update(ctx context.Context, database, collectionName string, collectionID UniqueID) (*collectionInfo, error) is not handling aliases correctly.

@SimFG please help with it
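
To make the suspected failure mode concrete, here is a toy model of a proxy-side cache that stores shard leaders under the name the client used. It is not Milvus code: the class `ToyShardCache`, the collection IDs 1 and 2, and node ID 7 are hypothetical, and the real `MetaCache` may behave differently. It only illustrates how a cached entry for an alias can keep pointing at node 114 after the alias is re-bound, until that entry is invalidated.

```python
class ToyShardCache:
    """Toy cache: requested name -> (collection ID, shard leader node IDs)."""

    def __init__(self, resolve, leaders_of):
        self._resolve = resolve        # name or alias -> collection ID
        self._leaders_of = leaders_of  # collection ID -> list of node IDs
        self._entries = {}

    def leaders(self, name):
        # Only consult the source of truth on a cache miss.
        if name not in self._entries:
            cid = self._resolve(name)
            self._entries[name] = (cid, list(self._leaders_of(cid)))
        return self._entries[name][1]

    def invalidate(self, name):
        self._entries.pop(name, None)


alias_target = {"aa": 1}      # alias -> collection ID (yesterday's collection)
leaders = {1: [114], 2: [7]}  # collection ID -> delegator node IDs
cache = ToyShardCache(alias_target.__getitem__, leaders.__getitem__)

cache.leaders("aa")           # first lookup caches node 114 under "aa"
alias_target["aa"] = 2        # alias is re-bound to today's collection
stale = cache.leaders("aa")   # cache hit: still the old collection's leaders
cache.invalidate("aa")        # what an alias-aware update would need to do
fresh = cache.leaders("aa")   # miss: resolves the alias again
```

In this model `stale` still holds node 114 while `fresh` holds the new collection's leader, which matches the symptom reported here: the error names a node that only the previously bound collection used.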

@yanliang567
Contributor

/assign @SimFG

@yanliang567
Contributor

it is easy to reproduce if there are multiple replicas... @SimFG

@yanliang567 yanliang567 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Sep 30, 2024
@yanliang567
Contributor

The error is different, though, and the search requests recover within 1 second:

09/30/2024 03:52:40 AM - ERROR - <MilvusException: (code=65535, message=fail to search on QueryNode 2: worker(2) query failed: segment 452894936951134023 belongs to partition 452894936953134673, which is not in [452894936952131753])>
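
Because the failure in this reproduction clears within about a second, a client-side retry with a short backoff can mask it while the underlying bug is investigated. This is a minimal sketch of that workaround, not a fix: `call_with_retry` and its defaults are hypothetical, and whether retrying is safe for a given request is an assumption the caller has to check.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on any exception wait base_delay * 2**i seconds and retry.

    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if i < attempts - 1:  # no sleep after the final attempt
                sleep(base_delay * (2 ** i))
    raise last_exc
```

With the defaults, the total wait before giving up is 0.5 + 1.0 = 1.5 seconds, which is slightly more than the recovery time observed above.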

@xiaofan-luan
Contributor


is this a recoverable error?

@1292253144
Author


Anything else?

No response

Is this a recoverable error?


Yeah, it'll recover on its own in about half an hour

@xiaofan-luan
Contributor

@aoiasd @chyezh
I guess that's exactly the cause we suspected: it is recovering, just too slowly.

@1292253144
Author

What might be the cause of this, and how can I avoid it?

@xiaofan-luan
Contributor

> What might be the cause of this, and how can I avoid it?

There is actually a bug in rocksmq (standalone only): it consumes only 1k messages every 200 ms. Since your cluster has not inserted data for a very long time, all the data in rocksmq is timetick messages, so it takes a relatively long time to consume all of them.

With this bug fixed, watching the DML channel should recover in 1-2 minutes.
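
As a rough illustration of the throughput cap described in this comment, the arithmetic can be sketched as below. It assumes the cap of 1k messages per 200 ms is the only bottleneck and applies only to the rocksmq scenario being discussed; the function names are hypothetical and not part of Milvus.

```python
def drain_seconds(backlog_messages, batch_size=1000, interval_ms=200):
    """Seconds needed to consume a backlog when at most `batch_size`
    messages are consumed per `interval_ms` (values taken from the comment)."""
    batches = -(-backlog_messages // batch_size)  # ceiling division
    return batches * interval_ms / 1000


def backlog_for(seconds, batch_size=1000, interval_ms=200):
    """Number of messages that can be consumed in `seconds` under the same cap."""
    return int(seconds * 1000 / interval_ms) * batch_size


# 1k messages per 200 ms is 5000 messages per second.
print(drain_seconds(9_000_000))  # time to drain nine million timeticks
print(backlog_for(30 * 60))      # backlog that would take half an hour to drain
```

Under these assumptions, a half-hour recovery would correspond to a backlog of roughly nine million timetick messages.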

@1292253144
Author

In which version was this bug fixed?

@SimFG
Contributor

SimFG commented Oct 8, 2024

@1292253144
Is your Milvus deployment in cluster or standalone mode? Which MQ is used? The speculation above should only apply in standalone mode with rocksmq as the MQ, but the issue description says it is cluster mode using Kafka.
reference issue: #36569

@xiaofan-luan
Contributor

Ignore me, I thought this is a bug related to alias where some meta cache is not updated

@yanliang567
Contributor

@1292253144 could you please help collect an etcd backup for investigation? Please refer to this doc: https://github.com/milvus-io/birdwatcher to back up etcd with birdwatcher.

/assign @1292253144

@SimFG
Contributor

SimFG commented Oct 8, 2024

Based on the information in the log and the issue description, can you help confirm:

  1. Is the alias name ads_aic_app_album_photo_vector_info_day?
  2. Are the names of the two collections ads_aic_app_album_photo_vector_info_day_2024_09_29 and ads_aic_app_album_photo_vector_info_day_2024_09_28?
  3. What are the IDs of the above two collections?
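
One possible way to gather the answers to these three questions is sketched below. This is only an illustration: `collection_identity` is a hypothetical helper, and it assumes a client object whose `describe_collection(collection_name=...)` call returns a dict containing `collection_id` and `aliases` keys.

```python
def collection_identity(client, names):
    """Return {name: (collection_id, aliases)} for each collection name.

    `client` is any object exposing describe_collection(collection_name=...)
    that returns a dict; the "collection_id" and "aliases" keys are assumed.
    """
    result = {}
    for name in names:
        info = client.describe_collection(collection_name=name)
        result[name] = (info.get("collection_id"), list(info.get("aliases", [])))
    return result
```

With pymilvus, a `MilvusClient` instance connected to the cluster could be passed as `client`, with the two dated collection names from question 2 as `names`. Whether the returned dict carries exactly these keys in version 2.4.1 should be checked against the SDK in use.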

@SimFG
Contributor

SimFG commented Oct 9, 2024

@1292253144 the search has been reporting this error for half an hour, right?

@1292253144
Author

I am not certain of the exact duration of the error, but it lasted more than half an hour.

@SimFG
Contributor

SimFG commented Oct 9, 2024

@1292253144 Could you please confirm the three questions above? Also, could you provide a complete log of the error period?
