Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix core caused by concurrent use of non-thread-safe gethostbyname #1611

Merged
merged 2 commits into from
Jul 19, 2024

Conversation

henryzhx8
Copy link
Collaborator

@henryzhx8 henryzhx8 commented Jul 12, 2024

  • 问题:在启动发送本地磁盘文件时(上次退出时落盘的),当本地磁盘文件较大时,有一定概率导致core,堆栈信息如下:
image

排查下来大概率是由于定时检查ip变没变时调用的非可重入版本gethostbyname,与同步发送时curl使用的gethostbyname函数发生了data race。

  • 解决方案:将gethostbyname改成线程安全且跨平台的getaddrinfo。

@henryzhx8 henryzhx8 added the core Core feature label Jul 12, 2024
@henryzhx8 henryzhx8 added this to the v2.0 milestone Jul 12, 2024
@henryzhx8 henryzhx8 added bug Something isn't working and removed core Core feature labels Jul 12, 2024
core/common/MachineInfoUtil.cpp Outdated Show resolved Hide resolved
core/common/MachineInfoUtil.cpp Outdated Show resolved Hide resolved
#endif
if (i == 0) {
firstIp = tmp;
#if defined(__linux__)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

宏的嵌套比之前复杂了很多,感觉还是之前linux windows整体分开更清晰些,虽然会多一些外围的循环。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1、windows是怎么测试的
2、这个问题自己是否能复现,并发是怎么测试的

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. windows目前没有测试,会记一个todo等windows那边能编译后再测试
  2. yitao的机器在特定输入下有大概率复现,lingzhu之前的dns core问题目前看也是这个导致的,两个场景可以复现。代码修复后多次运行没有再出现core。

@henryzhx8 henryzhx8 merged commit c8385ca into main Jul 19, 2024
15 checks passed
@henryzhx8 henryzhx8 deleted the fix/curl_core branch July 19, 2024 01:42
Abingcbc added a commit to Abingcbc/ilogtail that referenced this pull request Jul 19, 2024
commit c8385ca
Author: henryzhx8 <[email protected]>
Date:   Fri Jul 19 09:42:31 2024 +0800

    fix core caused by concurrent use of non-thread-safe gethostbyname (alibaba#1611)

    * fix core caused by concurrent use of non-thread-safe gethostbyname

commit 8fc252e
Author: Qiu Fengshuo <[email protected]>
Date:   Thu Jul 18 09:42:06 2024 +0800

    speedup CI UT job (alibaba#1606)

    * Split the original UT CI job into two separate jobs: one with SPL and one without SPL

    * fix: change design. Build .a and .so at the same CI UT job

    * fix

    * fix

    * fix

    * fix

    * fix

    * fix

    * fix
linrunqi08 pushed a commit that referenced this pull request Jul 23, 2024
…1611)

* fix core caused by concurrent use of non-thread-safe gethostbyname
linrunqi08 pushed a commit that referenced this pull request Aug 27, 2024
…1611)

* fix core caused by concurrent use of non-thread-safe gethostbyname
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants