recover readers exactly from checkpoint #1620

Merged: 10 commits into alibaba:main on Jul 23, 2024

Abingcbc (Collaborator)

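Problem

Previously, when a reader was restored from a checkpoint, it was always placed into readerArray regardless of its earlier position, and was moved to the correct position afterwards.
If the number of rotated log files exceeds the maximum size of readerArray and an inode is reused, the order of readerArray becomes wrong (readerArray strictly requires readers to be ordered by file rotation, descending, i.e. log.2 log.1 log).
Known problems caused by this so far:

  1. Incorrect recovery closes the reader, so logs are truncated
  2. Incorrect recovery resets the file read position to the beginning, so logs are collected twice

Solution

Add a new field to the checkpoint that stores the reader's position in readerArray.
-2: not in the queue
-1: default value; a newly created reader that must be placed at the end of readerArray
>0: the actual position in readerArray
When restoring from the checkpoint, the reader is restored to its exact previous position according to this value.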
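
A minimal sketch of the restore logic described above, not the code from this PR. The type and function names (`ReaderCheckpoint`, `RestoredReaders`, `RestoreReaders`), the `idx` member, and the constant names are hypothetical. The sketch also assumes positions are zero-based, so it treats 0 as a valid position.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sentinel values from the description above; the constant names are hypothetical.
constexpr int kNotInArray = -2;  // reader was not in readerArray
constexpr int kNewReader = -1;   // default: new reader, append to the end

// Hypothetical, simplified checkpoint entry (not the real iLogtail type).
struct ReaderCheckpoint {
    std::string path;
    int idx;  // saved position in readerArray, or one of the sentinels
};

struct RestoredReaders {
    std::vector<std::string> readerArray;  // rotation order, e.g. log.2 log.1 log
    std::vector<std::string> notInArray;   // readers restored outside the array
};

// Rebuild readerArray directly from the saved positions instead of inserting
// every reader first and reordering afterwards.
RestoredReaders RestoreReaders(const std::vector<ReaderCheckpoint>& checkpoints) {
    RestoredReaders out;
    std::vector<ReaderCheckpoint> positioned;
    std::vector<ReaderCheckpoint> appended;
    for (const ReaderCheckpoint& cp : checkpoints) {
        assert(cp.idx >= kNotInArray);  // values below -2 are not defined
        if (cp.idx == kNotInArray) {
            out.notInArray.push_back(cp.path);
        } else if (cp.idx == kNewReader) {
            appended.push_back(cp);
        } else {
            positioned.push_back(cp);
        }
    }
    // Saved positions decide the order; stable_sort keeps checkpoint order on ties.
    std::stable_sort(positioned.begin(), positioned.end(),
                     [](const ReaderCheckpoint& a, const ReaderCheckpoint& b) {
                         return a.idx < b.idx;
                     });
    for (const ReaderCheckpoint& cp : positioned) out.readerArray.push_back(cp.path);
    for (const ReaderCheckpoint& cp : appended) out.readerArray.push_back(cp.path);
    return out;
}
```

Because readers with a saved position are sorted by that position, the order in which checkpoint entries are loaded no longer affects the resulting readerArray.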

henryzhx8 (Collaborator) left a comment

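  1. Checkpoint recovery should not be subject to the queue length limit of 20;
  2. Please confirm the behavior at the moment of upgrade, when the checkpoint has no idx field.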

core/event_handler/EventHandler.cpp (resolved)
commit c8385ca
Author: henryzhx8 <[email protected]>
Date:   Fri Jul 19 09:42:31 2024 +0800

    fix core caused by concurrent use of non-thread-safe gethostbyname (alibaba#1611)

    * fix core caused by concurrent use of non-thread-safe gethostbyname

commit 8fc252e
Author: Qiu Fengshuo <[email protected]>
Date:   Thu Jul 18 09:42:06 2024 +0800

    speedup CI UT job (alibaba#1606)

    * Split the original UT CI job into two separate jobs: one with SPL and one without SPL

    * fix: change design. Build .a and .so at the same CI UT job

    * fix

    * fix

    * fix

    * fix

    * fix

    * fix

    * fix

messixukejia (Collaborator)

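We need purpose-built e2e test scenarios so that each of these cases has a chance to be triggered. Let's record a dedicated task for this.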

core/event_handler/EventHandler.cpp (outdated, resolved)
core/event_handler/EventHandler.cpp (outdated, resolved)
henryzhx8 added the bug label on Jul 23, 2024
henryzhx8 added this to the v2.0 milestone on Jul 23, 2024
henryzhx8 merged commit 939937a into alibaba:main on Jul 23, 2024
15 checks passed
Abingcbc added 3 commits to Abingcbc/ilogtail that referenced this pull request on Jul 23, 2024