More consistent platform priority #1302

Open
wants to merge 1 commit into
base: master

sawenzel commented Apr 5, 2024

This fixes an inconsistency observed in GRID jobs when loading a software package "A".

The bug was:

  • The GRID runtime determines the available platforms (say el8 and el7) for package A,
  • then launches an el8 apptainer container.
  • alienv, executed inside the el8 container, loads the el7 software. This is inconsistent with the choice of the el8 container, even though the el8 build of A is actually available.

The fix slightly adjusts the platform priority: we always search the current platform first, before falling back to any other platform.
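
The effect of the search order can be sketched with a small lookup function. This is only an illustration of the behaviour described above, not the actual alienv code: the function name `resolve_platform`, the list of available platforms, and the two priority lists are assumptions made up for the example.

```shell
#!/bin/bash
# Hypothetical sketch: pick the first platform in a priority list for which
# the package is available. All names and values are illustrative only.

# resolve_platform "<priority list>" "<available platforms>"
# Prints the first entry of the priority list that is also available.
resolve_platform() {
  local priority="$1" available="$2" p a
  for p in $priority; do
    for a in $available; do
      if [ "$p" = "$a" ]; then
        printf '%s\n' "$p"
        return 0
      fi
    done
  done
  return 1
}

available="el7-x86_64 el8-x86_64"   # assumed: package A exists for both

# General priority with el7 listed first: el7 wins even inside an el8 container.
old_choice=$(resolve_platform "el7-x86_64 el8-x86_64 el9-x86_64" "$available")

# Current platform (el8) moved to the front: el8 wins, matching the container.
new_choice=$(resolve_platform "el8-x86_64 el7-x86_64 el9-x86_64" "$available")

echo "before: $old_choice, after: $new_choice"
```

With these assumed inputs, the first lookup returns the el7 entry and the second returns the el8 entry, which is the change in behaviour this PR is after.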

sawenzel commented Apr 5, 2024

This goes back to an observation made in ticket https://its.cern.ch/jira/browse/O2-4804.

ktf commented Apr 5, 2024

Yes, although this is just a copy of the cvmfs files, so the changes should be made there (as was already done for el9, actually).

sawenzel commented Apr 5, 2024

Well, isn't this repo the authoritative source for the script on cvmfs? I thought we merely publish it there, directly from here. (Anything else appears to be error-prone.)

In any case, I don't know how to edit files on cvmfs.

ktf commented Apr 5, 2024

Agreed. There are historical reasons for this, and I guess it is time to fix them once I am back.

@@ -67,7 +67,7 @@ case $distro_name in
 ;;
 8*)
 distro_xrelease=8.x
-platform=el7
+platform=el8
Member

This one is now on CVMFS.

# The above priority is a general search priority. However, given a platform, we should actually
# start searching for software matching this precise platform and only later fall back to other
# platforms. This prevents loading el9 or el7 software, when I am actually on el8 (and el8 software is available).
PLATFORM_PRIORITY="${platform}-$uname_m ${PLATFORM_PRIORITY//${platform}-$uname_m/}"
Member

How can this happen? We should never have O2 / O2Physics tags for different architectures with the same name.

Author

@sawenzel sawenzel May 13, 2024

I don't understand your question (but I suppose that we might in fact have the same name on different arches). This line fixes the problem mentioned above: "alienv executed inside el8 container loads el7 software, inconsistent with the choice of el8 container ... and despite the fact that el8 software A is actually available."

The fix is simply to check first whether we have software for el8, instead of loading the el7 one simply because it is higher up the list. Refining the search list seems a reasonable thing to do, and I don't see what would speak against it.
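
To make the reordering concrete, the line under review can be run in isolation. The values of `platform`, `uname_m` and the initial `PLATFORM_PRIORITY` below are assumptions chosen for illustration; the real script sets them elsewhere and its actual default order may differ.

```shell
#!/bin/bash
# Illustration of the reordering line from the diff, with assumed input values.
platform=el8                                           # assumed: detected platform
uname_m=x86_64                                         # assumed: result of `uname -m`
PLATFORM_PRIORITY="el7-x86_64 el8-x86_64 el9-x86_64"   # assumed general order

# Put the current platform first; ${var//pattern/} deletes its other occurrences.
PLATFORM_PRIORITY="${platform}-$uname_m ${PLATFORM_PRIORITY//${platform}-$uname_m/}"

# The deletion leaves a double space where the entry used to be. Unquoted word
# splitting in a for loop ignores it, so the list still has three entries.
for p in $PLATFORM_PRIORITY; do
  echo "$p"
done
```

Under these assumptions the el8 entry is printed first, followed by el7 and el9 in their original relative order. Note that `${var//pattern/}` is a bash extension, so the sketch assumes the script runs under bash rather than a plain POSIX `sh`.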

Author

This is a reproducer:

ssh lxplus8.cern.ch
/cvmfs/alice.cern.ch/bin/alienv enter O2sim::async-2023-pp-apass3-20240320.1-1
