Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Cargo's target information when possible #1225

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

madsmtm
Copy link
Contributor

@madsmtm madsmtm commented Sep 30, 2024

rustc's target triples generally only have a vague resemblance to each other and to the information needed by cc. Let's instead prefer CARGO_CFG_* variables when available, since these contain the information directly from the compiler itself.

In the cases where it isn't available (i.e. when running outside of a build script), we fall back to parsing the target triple, but instead of doing it in an ad-hoc fashion with string manipulation, we do it in a more structured fashion up front.

Fixes #1219 (at least my main gripe with the current implementation).

Builds upon #1224.

src/target.rs Outdated Show resolved Hide resolved
src/target.rs Outdated Show resolved Hide resolved
rustc's target triples generally only have a vague resemblance to each
other and to the information needed by `cc`. Let's instead prefer
`CARGO_CFG_*` variables when available, since these contain the
information directly from the compiler itself.

In the cases where it isn't available (i.e. when running outside of a
build script), we fall back to parsing the target triple, but instead of
doing it in an ad-hoc fashion with string manipulation, we do it in a
more structured fashion up front.
src/lib.rs Show resolved Hide resolved
src/target.rs Outdated Show resolved Hide resolved
src/target.rs Outdated Show resolved Hide resolved
src/target.rs Outdated Show resolved Hide resolved
Some("ppc64")
} else {
None
fn map_darwin_target_from_rust_to_compiler_architecture(target: &Target) -> &str {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be in target.rs

I also wonder if rustc provides these info?
It looks like something that could be available in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do generation, then we might be able to extract it from llvm_target.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be nice, I think cc has too many hardcoded information in it, and it gets hard to maintain or read very quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to punt on this for later, as we're not guaranteed to be able to know llvm_target (e.g. custom target JSON and/or unknown targets), and so we'll have to implement some sort of conversion anyhow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

This PR is already a lot of improvements to cc codebase.

@madsmtm
Copy link
Contributor Author

madsmtm commented Oct 2, 2024

I've now gone through my own FIXMEs, and opened PRs against rustc to fix the discrepancies I saw when trying to parse the target information.

@madsmtm
Copy link
Contributor Author

madsmtm commented Oct 8, 2024

I've implemented the approach discussed in #1225 (comment) now.

@madsmtm madsmtm marked this pull request as ready for review October 8, 2024 21:01
Comment on lines +16 to +19
writeln!(
f,
"pub(crate) fn get(target_triple: &str) -> Option<Target> {{"
)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think write_str is more readable, since you can use { instead of {{, which is really hard to read

Suggested change
writeln!(
f,
"pub(crate) fn get(target_triple: &str) -> Option<Target> {{"
)?;
f.write_str("pub(crate) fn get(target_triple: &str) -> Option<Target> {"
)?;

Comment on lines +30 to +37
writeln!(f, " {triple:?} => Target {{")?;
writeln!(f, " full_arch: {full_arch:?}.into(),")?;
writeln!(f, " arch: {arch:?}.into(),")?;
writeln!(f, " vendor: {vendor:?}.into(),")?;
writeln!(f, " os: {os:?}.into(),")?;
writeln!(f, " env: {env:?}.into(),")?;
writeln!(f, " abi: {abi:?}.into(),")?;
writeln!(f, " }},")?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it possible to have an array of [(&'static str, Target); N]?

I think writing a large match here is less readable than having a sorted array.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suspect that a match would generate more code than a linear/binary search in sorted array, thus taking longer to compile.

Matching would basically be lots of ifs, the compiler might also optimize the if into a jump table, but that would still mean a lot of Target creation at runtime with OOM panic handling blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I seemed to remember Cow not really being usable in consts because they have drop glue, but apparently I remembered wrong, and I think it's doable.

///
/// This differs from `cfg!(target_arch)`, which only specifies the
/// overall architecture, which is too coarse for certain cases.
pub full_arch: Cow<'static, str>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just use &'a str here, it'd be simpler and a smaller struct.

For from_cargo_environment_variables, we can cache the environment variables in cc::Build so that we can return a Target<'_>, we already do env caching so this is not a problem.

We can use the homebrew OnceLock impl within our codebase, so that we could avoid lifetime problems with MutexGuard/RwLockGuard.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. I'm not worried/obsessed about the allocation as it doesn't matter much, I just prefer a simpler structure.

Caching the env var in Build would also be consistent with how other env var currently works:

Once we read the env var from system into Build, it's never changed in that object and any objects created by Build::clone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't see how OnceLock would help with avoiding a guard? Do you mean to store cached environment variables in a global instead?

Copy link
Collaborator

@NobodyXu NobodyXu Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put OnceLock inside cc::Build struct.

For example:

struct Build {
    target_triple: Arc<OnceLock<String>>,
    // ...
}

store individual fields of Target directly in it, and return Target<'_> with a lifetime.

Arc is used to be consistent with existing env cache code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Build::from_cargo or similar to make it very explicit when in a build script
2 participants