
release v4.2.0 #6191

Merged (8 commits) on Dec 21, 2023

@jameslamb (Collaborator) commented Nov 14, 2023

Release checklist:

Copied from #6076, with a few changes.

before merge

  • Update VERSION.txt number.
  • Update version in Appveyor config file.
  • Update version in configure file of R-package: /gha run r-configure.
  • Change development.mode from unreleased to release in pkgdown config file.
  • Update version in python-package/pyproject.toml
  • Add release branch to RTD versions, trigger a new build, check docs
  • All docs for new behavior have Sphinx versionadded:: annotations (docs on those)
    • nothing in this release requires those
  • All new parameters in config.h have *New in version {version}* comments added
    • no new parameters added in config.h in this release
  • Run the valgrind checks with /gha run r-valgrind (docs link)
  • manually test Python and R packages on M1/M2 Mac
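Several of the checklist items above write the same version string into different files. The sketch below shows one way that consistency could be checked automatically. It is a hypothetical helper, not part of LightGBM's release tooling, and it assumes the file formats seen in this PR: a bare version in VERSION.txt, a "version: X.Y.Z.{build}" line in the AppVeyor config, and a version = "X.Y.Z" line in python-package/pyproject.toml.

```python
import re


def declared_versions(version_txt: str, pyproject_toml: str, appveyor_yml: str) -> dict:
    """Hypothetical helper: pull the version each release file declares.

    A file whose version line is not found maps to None.
    """
    found = {"VERSION.txt": version_txt.strip() or None}

    # First line of the form: version = "4.2.0"
    m = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_toml, flags=re.MULTILINE)
    found["pyproject.toml"] = m.group(1) if m else None

    # AppVeyor appends ".{build}" to the version, e.g. "version: 4.2.0.{build}"
    m = re.search(r"^version:\s*(\S+?)\.\{build\}", appveyor_yml, flags=re.MULTILINE)
    found["appveyor"] = m.group(1) if m else None
    return found


def mismatched_files(expected: str, **file_contents: str) -> list:
    """Return the names of files whose declared version differs from `expected`."""
    found = declared_versions(**file_contents)
    return [name for name, version in found.items() if version != expected]


print(mismatched_files(
    "4.2.0",
    version_txt="4.2.0\n",
    pyproject_toml='[project]\nname = "lightgbm"\nversion = "4.2.0"\n',
    appveyor_yml="version: 4.1.0.99.{build}\n",  # this one was not bumped
))
# ['appveyor']
```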

PRs that should be merged before releasing:

after merge

Notes for Reviewers

I believe this should be v4.2.0 instead of v4.1.0 because of the two breaking changes:

This release of the R package will not be published to CRAN, as #5987 has still not been resolved. I'm still working on that (and making good progress!), but let's not delay the critical fix for quantized training (#6108) waiting on that. #6191 (comment)

@jameslamb (Collaborator Author) commented Nov 14, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/6858665668

Status: failure ❌.

@jameslamb (Collaborator Author) commented Nov 14, 2023

Add release branch to RTD versions, trigger a new build, check docs

✅ successful build: https://readthedocs.org/projects/lightgbm/builds/22543919/

✅ docs look good: https://lightgbm.readthedocs.io/en/release-v4.2.0/

@jmoralez (Collaborator)

I think we should include a fix for #6195 in this release, I can work on it this week. Also given that the work for supporting arrow isn't complete yet I think we could wait for it as well, WDYT?

@jameslamb (Collaborator Author)

I think we should include a fix for #6195 in this release, I can work on it this week.

Yeah since you already have #6218 up, I'm good with waiting to officially release this until that's included.

Also given that the work for supporting arrow isn't complete yet I think we could wait for it as well, WDYT?

I still feel the way I did in #6034 (comment) ... we shouldn't delay releasing to wait for the Arrow stuff to be done. I want to get that fix for quantized training out soon.

@jameslamb (Collaborator Author) commented Dec 1, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/7055105840

Status: failure ❌.

@borchero (Collaborator) commented Dec 5, 2023

we shouldn't delay releasing to wait for the Arrow stuff to be done.

I think we're very close to being done now though 😄 given the release cycle of this package, it would be a pity to wait another couple months for it to arrive.

@jameslamb (Collaborator Author) commented Dec 7, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/7135050470

Status: failure ❌.

@jameslamb (Collaborator Author)

The valgrind checks are failing with 3 errors, all reporting bytes "possibly" lost and all on code paths involving pthread_create@@GLIBC_2.34.

valgrind output:
==5666== 
==5666== HEAP SUMMARY:
==5666==     in use at exit: 356,517,358 bytes in 57,373 blocks
==5666==   total heap usage: 11,237,862 allocs, 11,180,489 frees, 9,435,448,587 bytes allocated
==5666== 
==5666== 352 bytes in 1 blocks are possibly lost in loss record 157 of 2,135
==5666==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5666==    by 0x40147D9: calloc (rtld-malloc.h:44)
==5666==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==5666==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==5666==    by 0x4DA37B4: allocate_stack (allocatestack.c:430)
==5666==    by 0x4DA37B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==5666==    by 0x572D25F: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==5666==    by 0x5723A10: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==5666==    by 0x17BB2B04: LGBM_DatasetCreateFromCSC (c_api.cpp:1512)
==5666==    by 0x17BEA3CB: LGBM_DatasetCreateFromCSC_R (lightgbm_R.cpp:184)
==5666==    by 0x495AE00: R_doDotCall (dotcode.c:894)
==5666==    by 0x4965E41: do_dotcall (dotcode.c:1551)
==5666==    by 0x49A7662: Rf_eval (eval.c:1253)
==5666==    by 0x49AE2CF: do_set (eval.c:3556)
==5666==    by 0x49A7409: Rf_eval (eval.c:1225)
==5666== 
==5666== 352 bytes in 1 blocks are possibly lost in loss record 158 of 2,135
==5666==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5666==    by 0x40147D9: calloc (rtld-malloc.h:44)
==5666==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==5666==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==5666==    by 0x4DA37B4: allocate_stack (allocatestack.c:430)
==5666==    by 0x4DA37B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==5666==    by 0x74DB328: std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==5666==    by 0x177F719E: std::thread::thread<LightGBM::PipelineReader::Read(char const*, int, std::function<unsigned long (char const*, unsigned long)> const&)::{lambda()#1}, , void>(LightGBM::PipelineReader::Read(char const*, int, std::function<unsigned long (char const*, unsigned long)> const&)::{lambda()#1}&&) (std_thread.h:163)
==5666==    by 0x177F65B7: LightGBM::PipelineReader::Read(char const*, int, std::function<unsigned long (char const*, unsigned long)> const&) (pipeline_reader.h:56)
==5666==    by 0x177F9EA1: LightGBM::TextReader<int>::ReadAllAndProcess(std::function<void (int, char const*, unsigned long)> const&) (text_reader.h:103)
==5666==    by 0x177F7A7F: LightGBM::TextReader<int>::ReadAllLines() (text_reader.h:160)
==5666==    by 0x177ED0E9: LightGBM::DatasetLoader::LoadTextDataToMemory[abi:cxx11](char const*, LightGBM::Metadata const&, int, int, int*, std::vector<int, std::allocator<int> >*) (dataset_loader.cpp:967)
==5666==    by 0x177E85FA: LightGBM::DatasetLoader::LoadFromFile(char const*, int, int) (dataset_loader.cpp:231)
==5666==    by 0x17BC0CB9: LightGBM::DatasetLoader::LoadFromFile(char const*) (dataset_loader.h:26)
==5666==    by 0x17BAEC75: LGBM_DatasetCreateFromFile (c_api.cpp:983)
==5666== 
==5666== 352 bytes in 1 blocks are possibly lost in loss record 159 of 2,135
==5666==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==5666==    by 0x40147D9: calloc (rtld-malloc.h:44)
==5666==    by 0x40147D9: allocate_dtv (dl-tls.c:375)
==5666==    by 0x40147D9: _dl_allocate_tls (dl-tls.c:634)
==5666==    by 0x4DA37B4: allocate_stack (allocatestack.c:430)
==5666==    by 0x4DA37B4: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==5666==    by 0x572D25F: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==5666==    by 0x5723A10: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==5666==    by 0x17BC6D7E: LightGBM::Booster::Predict(int, int, int, int, int, std::function<std::vector<std::pair<int, double>, std::allocator<std::pair<int, double> > > (int)>, LightGBM::Config const&, double*, long*) const (c_api.cpp:441)
==5666==    by 0x17BB9C62: LGBM_BoosterPredictForMat (c_api.cpp:2482)
==5666==    by 0x17BF15CF: LGBM_BoosterPredictForMat_R (lightgbm_R.cpp:974)
==5666==    by 0x495AFB2: R_doDotCall (dotcode.c:909)
==5666==    by 0x4965E41: do_dotcall (dotcode.c:1551)
==5666==    by 0x49A7662: Rf_eval (eval.c:1253)
==5666==    by 0x49AC6F4: do_begin (eval.c:2977)
==5666== 
==5666== LEAK SUMMARY:
==5666==    definitely lost: 0 bytes in 0 blocks
==5666==    indirectly lost: 0 bytes in 0 blocks
==5666==      possibly lost: 1,056 bytes in 3 blocks
==5666==    still reachable: 356,516,302 bytes in 57,370 blocks
==5666==                       of which reachable via heuristic:
==5666==                         newarray           : 4,264 bytes in 1 blocks
==5666==         suppressed: 0 bytes in 0 blocks
==5666== Reachable blocks (those to which a pointer was found) are not shown.
==5666== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5666== 
==5666== For lists of detected and suppressed errors, rerun with: -s
==5666== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
writing valgrind output to valgrind-logs.log
valgrind found 0 bytes definitely lost
valgrind found 0 bytes indirectly lost
valgrind found 1056 bytes possibly lost
Error: Process completed with exit code 255.

I strongly believe these are false positives, based on these:

And based on the fact that CRAN has previously accepted our submissions which show these "possibly lost" valgrind findings related to pthread_create().

I'm going to build the R package from this branch and submit it to CRAN as v4.2.0. Will post here when I've done that.

@jameslamb (Collaborator Author)

I've built the R package from this branch and submitted it to CRAN as v4.2.0.

In addition to all the tests that passed here, I tested the locally built package by running R CMD check --as-cran on my Mac (Intel) and by submitting it to win-builder.

Logs from the successful win-builder r-devel build: https://win-builder.r-project.org/lWkS6twSPNK8/00check.log

@shiyu1994 I'm not sure, but you might receive an email from CRAN asking about the maintainer change. Please check at your earliest convenience, and click any confirmation links they send you.

I'll post updates here as the checks run on CRAN.

@jameslamb (Collaborator Author)

🎉 the package passed the 2 automatic checks!

I just got a message from CRAN saying so, and that they'll continue with the other checks once you confirm the maintainer change, @shiyu1994.

package lightgbm_4.2.0.tar.gz has been auto-processed.
We are waiting for confirmation from the old maintainer address now.

Log dir: https://win-builder.r-project.org/incoming_pretest/lightgbm_4.2.0_20231208_055726/
The files will be removed after roughly 7 days.
Installation time in seconds: 420
Check time in seconds: 199
R Under development (unstable) (2023-12-07 r85661 ucrt)

Pretests results:
Windows: https://win-builder.r-project.org/incoming_pretest/lightgbm_4.2.0_20231208_055726/Windows/00check.log
Status: 1 NOTE
Debian: https://win-builder.r-project.org/incoming_pretest/lightgbm_4.2.0_20231208_055726/Debian/00check.log
Status: 1 NOTE

@shiyu1994 (Collaborator)

@jameslamb I've confirmed the changes.

@jameslamb (Collaborator Author)

thank you!!

@jameslamb (Collaborator Author)

R-package update: so far, so good 🎉

  • passed the CRAN pre-checks
  • binaries have been built for macOS (x86_64 and arm64)
  • binary for Windows R-devel has been built
  • checks have passed for 6 CRAN check flavors

https://cran.r-project.org/web/checks/check_results_lightgbm.html

@mayer79 (Contributor) commented Dec 9, 2023

Wonderful, thank you so much!

@jameslamb (Collaborator Author)

The CRAN checks for the R package are progressing well!

Only 2 of the main CRAN checks remain (unsure how many of the extra ones from https://cran.r-project.org/web/checks/check_issue_kinds.html will be run or if any have been run already).


https://cran.r-project.org/web/checks/check_results_lightgbm.html


Given that... let's continue with the release. Here's my proposed sequence:

  1. merge [python-package] Allow to pass Arrow table for prediction #6168
  2. merge [ci] [R-package] allow more possibly-lost warnings from valgrind #6233
  3. re-run valgrind checks and confirm they work on this branch
  4. add any versionadded:: annotations in docs
  5. manually test on M1/M2 mac (I'll do that on my laptop)
  6. get approvals from everyone
  7. merge this branch and do all the other checklist stuff (add new git tag, create GitHub release, upload to PyPI, etc.)

@jameslamb (Collaborator Author)

manually test Python and R packages on M1/M2 Mac

Seems that LightGBM can be compiled successfully on arm64 Macs, but experiences deadlocks if OpenMP is enabled (which is the default). Looks like #4229 might still be an issue with newer versions of libomp.

I'll put some time into that for the next release... but I don't think it should stop this one, as it's been a problem for a while.

R-package details:

The R package is passing all checks on CRAN's arm64 Mac checks:

But it isn't finding OpenMP. e.g., see https://www.r-project.org/nosvn/R.check/r-release-macos-arm64/lightgbm-00install.html

* installing *source* package ‘lightgbm’ ...
** package ‘lightgbm’ successfully unpacked and MD5 sums checked
** using staged installation
checking location of R... /Library/Frameworks/R.framework/Resources
checking whether MM_PREFETCH works... no
checking whether MM_MALLOC works... yes
checking whether OpenMP will work in a package... no
***********************************************************************************************
 OpenMP is unavailable on this macOS system. LightGBM code will run single-threaded as a result.
 To use all CPU cores for training jobs, you should install OpenMP by running

     brew install libomp
***********************************************************************************************
configure: creating ./config.status
Python package details:

On my M2 Mac with the following:

  • OS: macOS 14.1.2 (Sonoma)
  • compiler: AppleClang 15.0.0
  • Python: 3.11.7

I ran the following on this branch

sh build-python.sh sdist
pip install ./dist/lightgbm-4.2.0.tar.gz

lightgbm compiled successfully and could be imported (python -c "import lightgbm"), but I found that even the following simple program deadlocks (hangs indefinitely) during Dataset construction.

import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)
dtrain = lgb.Dataset(X, label=y)
dtrain.construct()  # hangs indefinitely here when built with OpenMP enabled

I saw a similar deadlock trying to run the tests.

pytest tests/python_package_test/test_basic.py

I uninstalled lightgbm and tried reinstalling with OpenMP support turned off.

pip uninstall lightgbm
pip install \
    --config-settings=cmake.define.USE_OPENMP=OFF \
    ./dist/lightgbm-4.2.0.tar.gz

When I did that, that simple example and the tests ran successfully and very fast.
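Because the failure mode here is a hang rather than an error, checking for it automatically needs a timeout. A minimal sketch, assuming it is acceptable to treat "did not finish within N seconds" as a likely deadlock: the reproducer runs in a separate interpreter so that a hung process can be killed. The helper name and the timeout value are hypothetical.

```python
import subprocess
import sys

# The reproducer from above, to be run in a child interpreter.
REPRO = """
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)
lgb.Dataset(X, label=y).construct()
"""


def finishes_within(code: str, timeout_s: float) -> bool:
    """Run `code` with the current Python interpreter.

    Returns False if the child is still running after `timeout_s` seconds
    (subprocess.run kills it in that case). A non-zero exit status raises
    subprocess.CalledProcessError, so crashes are not mistaken for hangs.
    """
    try:
        subprocess.run([sys.executable, "-c", code], timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        return False
    return True
```

For example, finishes_within(REPRO, timeout_s=60) would be expected to return False on a build showing the deadlock described above and True on the USE_OPENMP=OFF build.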

@jameslamb (Collaborator Author)

I think this release is ready to go!

@guolinke @shiyu1994 @jmoralez could you please review?

@borchero let us know here if you have any questions about the release process or anything you see in the diff of this PR.

@@ -1,4 +1,4 @@
-version: 4.1.0.99.{build}
+version: 4.2.0.{build}
Collaborator

Just want to make sure: why are we using 4.2.0 now instead of 4.2.0.99 to differentiate between released version and the one built from source?

Collaborator Author

Like we've done for previous releases, after this that'll get changed to 4.2.0.99 in a follow-up PR.

For example: #6090

This version doesn't affect any artifacts that are delivered to users or anything. Just the way builds are organized in the AppVeyor UI.

https://ci.appveyor.com/project/guolinke/lightgbm/history


Doing this on the release PR ensures these builds are identifiable in the future as belonging to the 4.2.0 release.

differentiate between released version and the one built from source

The commit produced when we merge this PR will be the released version.
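The X.Y.Z.99 convention works because dotted numeric versions are compared component by component, so a development build made after a release sorts after that release but before the next one. A minimal sketch using plain integer tuples; this is an illustration, not how AppVeyor or pip parse versions, although PEP 440 tools such as packaging.version.Version give the same ordering for these strings.

```python
def version_key(version: str) -> tuple:
    """Turn a dotted numeric version such as "4.2.0.99" into (4, 2, 0, 99)."""
    return tuple(int(part) for part in version.split("."))


history = ["4.2.1", "4.2.0.99", "4.1.0.99", "4.2.0"]

# Tuples compare element by element; a shorter tuple that is a prefix of a
# longer one sorts first, so (4, 2, 0) < (4, 2, 0, 99) < (4, 2, 1).
print(sorted(history, key=version_key))
# ['4.1.0.99', '4.2.0', '4.2.0.99', '4.2.1']
```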

@jameslamb (Collaborator Author)

Thanks everyone! I'll publish the release some time today.

@borchero (Collaborator) left a comment

For completeness in the reviewer list 😁🚀

@shiyu1994 (Collaborator)

@jameslamb Thanks for your explanation!

@jameslamb merged commit 0a9a6bb into master on Dec 21, 2023 (42 checks passed)
@jameslamb deleted the release/v4.2.0 branch on December 21, 2023 at 04:28
@jameslamb (Collaborator Author) commented Dec 21, 2023

Ran the following to create the v4.2.0 tag and update the stable tag.

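git fetch upstream --tags
# move "stable": delete the old tag locally and on the remote...
git tag -d stable
git push upstream :refs/tags/stable
# ...then tag the current commit as both "stable" and "v4.2.0" and push both
git tag stable
git tag v4.2.0
git push upstream stable v4.2.0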

(NOTE: I alias this repo to upstream and my fork to origin in my git settings)

https://github.com/microsoft/LightGBM/tags


That triggered an Azure DevOps build which should create the release automatically: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=15616&view=results. This takes around 90 minutes (because of the QEMU CI job).

I'll do the remaining tasks tomorrow.

@jameslamb (Collaborator Author)

v4.2.0 release has been created: https://github.com/microsoft/LightGBM/releases/tag/v4.2.0

I'll handle PyPI, NuGet, and homebrew later today.

@jameslamb (Collaborator Author)

Update version and commit hash in Homebrew formula

Homebrew/homebrew-core#157978

@jameslamb (Collaborator Author)

Upload release to test PyPI
Upload release to PyPI.

Uploaded v4.2.0 to test PyPI

gh release download \
    --repo microsoft/LightGBM \
    --dir ./artifacts \
    --pattern 'lightgbm*-py3-*.whl' \
    --pattern 'lightgbm-4.2.0.tar.gz' \
    v4.2.0

twine upload \
    -r testpypi \
    ./artifacts/*

(gh is the GitHub CLI, see https://cli.github.com/manual/gh_release_download)
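To see which release assets the two --pattern globs above select before they are handed to twine, the matching can be approximated with Python's fnmatch module. This assumes gh applies ordinary shell-style glob matching to asset names; the asset names below are hypothetical examples, not the actual v4.2.0 asset list.

```python
from fnmatch import fnmatchcase

# The globs passed to `gh release download --pattern` above.
PATTERNS = ["lightgbm*-py3-*.whl", "lightgbm-4.2.0.tar.gz"]


def select_artifacts(asset_names, patterns=PATTERNS):
    """Return the asset names matching any pattern, sorted (case-sensitive)."""
    return sorted(
        name for name in asset_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )


# Hypothetical asset names, for illustration only.
assets = [
    "lightgbm-4.2.0-py3-none-win_amd64.whl",
    "lightgbm-4.2.0.tar.gz",
    "lib_lightgbm.so",
    "lightgbm-4.1.0.tar.gz",
]
print(select_artifacts(assets))
# ['lightgbm-4.2.0-py3-none-win_amd64.whl', 'lightgbm-4.2.0.tar.gz']
```

Anything that does not match, such as other files attached to the release, never lands in ./artifacts/, so twine upload ./artifacts/* only sees the wheels and the sdist.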

Then confirmed that installing the latest wheel works.

pip install -i https://test.pypi.org/simple/ 'lightgbm==4.2.0'
python ./examples/python-guide/logistic_regression.py

Then pushed them to real PyPI.

twine upload \
    ./artifacts/*

@jameslamb (Collaborator Author)

Add new tag to RTD versions and trigger a new build.

Remove the release branch from RTD versions

These are done. The v4.2.0 docs are now available on readthedocs.

@jameslamb (Collaborator Author)

Published to NuGet: https://www.nuget.org/packages/LightGBM/4.2.0

And with that, this release is done! Thanks again to everyone who contributed 👋🏻
