create a self-hosted incremental linker #1535

Closed
andrewrk opened this issue Sep 17, 2018 · 18 comments
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. frontend Tokenization, parsing, AstGen, Sema, and Liveness.
Milestone

Comments

@andrewrk
Member

LLD has no plans to do incremental linking, even though incremental linking is a perfect fit for zig's debug builds. Not good enough. Zig project will have its own linker.

This is a huge project - support has to be explicitly added for every target OS and every target architecture.

Once this project is far enough along, zig will drop its dependency on LLD. No reason to have both.

@andrewrk andrewrk added contributor friendly This issue is limited in scope and/or knowledge of Zig internals. standard library This issue involves writing Zig code for the standard library. labels Sep 17, 2018
@andrewrk andrewrk added this to the 1.1.0 milestone Sep 17, 2018
@ghost

ghost commented Sep 17, 2018

Isn’t linking about as fast as cat?
If so how is this big effort worth it vs just relinking everything?

@andrewrk
Member Author

andrewrk commented Sep 17, 2018

It's about 5x slower than cat.

Also I just tried catting a debug build of clang to /dev/null and it took 3 seconds on my SSD. That's not fast enough. If you change a single function and recompile a large project, zig should only have to compile 1 function and update only the bytes of that function in the output file.
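
A minimal sketch of the "update only the bytes of that function" idea, in Python for brevity. Everything here is an assumption for illustration: the `IncrementalImage` class, the padded-slot layout, and the `padding_factor` are hypothetical and are not how zig's linker is implemented. Each function gets a slot with some spare room, so a recompiled function that still fits is overwritten in place and nothing else in the output is touched.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    offset: int    # where the function's bytes start in the output image
    size: int      # bytes currently used by the function
    capacity: int  # bytes reserved, including padding for future growth


class IncrementalImage:
    """Toy output file: function bytes stored in padded slots (hypothetical layout)."""

    def __init__(self, padding_factor=1.5):
        self.data = bytearray()
        self.slots = {}  # function name -> Slot
        self.padding_factor = padding_factor
        self.bytes_written = 0  # bytes touched so far, to make the savings visible

    def _append(self, name, code):
        capacity = max(1, int(len(code) * self.padding_factor))
        offset = len(self.data)
        self.data += code + b"\x00" * (capacity - len(code))
        self.slots[name] = Slot(offset, len(code), capacity)
        self.bytes_written += capacity

    def update(self, name, code):
        slot = self.slots.get(name)
        if slot is None or len(code) > slot.capacity:
            # New or outgrown function: place it at the end of the image.
            # A real linker would also have to re-point every reference to it.
            self._append(name, code)
            return
        # Fits in the reserved slot: overwrite only this function's bytes.
        end = slot.offset + slot.capacity
        self.data[slot.offset:end] = code + b"\x00" * (slot.capacity - len(code))
        slot.size = len(code)
        self.bytes_written += slot.capacity
```

With two functions in the image, recompiling one of them rewrites only that function's slot; the rest of the image, and its total size, stay unchanged. The cost is the padding, which is a reasonable trade for debug builds but not for release builds.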

@ghost

ghost commented Sep 17, 2018

OK, as much as I'd argue against a rewrite, this does seem to make sense in the long run.
Maybe someone else will have started the same effort by the time 1.1 becomes a thing. Maybe rust, at least they have function level recompile as well.

@ghost

ghost commented Sep 17, 2018

There is this Google project, the "Gold" linker, whose goal is/was both fast and incremental linking.

https://www.airs.com/blog/archives/38

https:/pathscale/binutils/tree/master/gold

The commits seem a bit dated, but the goals look like a perfect match, so it could at least serve as a guide. Maybe it even still works, which would be quite something...

Doing a bit more digging, it seems gold did add incremental linking, but at least for full links it's half as fast as lld, for whatever reason.
Maybe lld is not so slow after all.

@Sahnvour
Contributor

LLD is crazy fast already, but incremental linking could help on huge projects.

Do you mean LLD will explicitly not have incremental linking?

@ghost

ghost commented Sep 17, 2018

Maybe rust, at least they have function level recompile as well.

Regarding Rust, they are thinking about it as well:
rust-lang/rust#39915 (comment). Incremental linking is already possible on Windows, so I don't think it's absurd to suggest lld might add this?
(They also talk about gold. It does not seem to be completely dead, just not working for them, but maybe it would work for zig's incremental linking?)

Given that lld is already very fast, supports many (all?) important platforms, and zig and rust would have the same demand...

@andrewrk
Member Author

Do you mean LLD will explicitly not have incremental linking?

Yes, I've asked them about it and others have asked them about it, and they just want a fast but deterministic linker that redoes the whole job every time.

@andrewrk
Member Author

Another motivation for this issue is that the Mach-O code in LLD is poor quality, and nobody in the LLVM community wants to improve it.

See for example https://gist.github.com/srgpqt/61163a279baa4f8d41b01a653c2635bc

A self-hosted linker would fit into the bootstrapping plan (#853), no problem. If we had a self-hosted linker and we dropped the dependency on LLD:

  • stage1 builds with the system C++ compiler and linker. (status quo)
  • stage2 builds with stage1 and the system linker.
  • stage3 builds with stage2

@bnoordhuis
Contributor

Not that I want to move the goalposts too much, but a custom linker means no LTO unless someone writes that as well (super hard!).

@andrewrk
Member Author

andrewrk commented Oct 1, 2018

Our current plan for LTO is emitting everything into a single .o file and running -O3 on that. That's what stage 1 does. It's really slow, but for a release build that's the trade-off.

Meanwhile in debug builds the plan is to split into as many .o files as would speed up the compilation.

It's possible that there may be a setting for release mode, how much to compromise the optimization in exchange for faster build times and less compilation memory requirements.

@ghost

ghost commented Oct 1, 2018

The other thing that makes me suspicious is that Rust manages to cross-compile, and they use lld?

And there is also ThinLTO, which is actually MUCH faster, and that would go away as an option as well.

I mean, it's your project, so this is just meant as food for thought to avoid unnecessary work, but it's your decision.

Reading http://lists.llvm.org/pipermail/llvm-dev/2018-June/123782.html does sound quite discouraging, and surprising tbh.

@andrewrk
Member Author

andrewrk commented Oct 1, 2018

I would love to know how rust accomplishes cross compiling for the macOS target. It's very unlikely they're using LLD.

@ghost

ghost commented Oct 1, 2018

It's very unlikely they're using LLD.

I forgot to adjust my comment, adding a reference to the rust thread I already linked to previously.


Searching through their issues revealed another problem with creating our own linker:
rust-lang/rust#54637

For a project at Google, we need retpoline support.

We should still support retpolines when plt is used. That being said, it seems to me that this is almost entirely a linker’s job.


rust-lang/rust#39915 (comment)

They are trying to use LLD:

Once that's done we can advertise it to the community, asking for feedback. Here we can gain both timing information as well as bug reports to send to LLD. If everything goes smoothly (which is sort of doubtful with a whole brand new linker, but hey you never know!) we can turn it on by default, otherwise we can work to stabilize the selection of LLD and then add an option to Cargo.toml so projects can at least opt-in to it.


About the current status, here is a search for lld-related commits:
https:/rust-lang/rust/search?q=lld&type=Commits

As far as I can tell, they have their own lld fork, like zig, which they call rust-lld,
at least for ARM: https://www.reddit.com/r/rust/comments/9a7te2/nightly_rust_is_switching_to_use_lld_llvms_new/#bottom-comments

rg -F "linker: Some("
src/librustc_target/spec/riscv32imac_unknown_none_elf.rs
28:            linker: Some("rust-lld".to_string()),

src/librustc_target/spec/windows_base.rs
77:        linker: Some("gcc".to_string()),

src/librustc_target/spec/thumb_base.rs
46:        linker: Some("rust-lld".to_string()),

src/librustc_target/spec/armebv7r_none_eabihf.rs
31:            linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/wasm32_unknown_unknown.rs
54:        linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/msp430_none_elf.rs
34:            linker: Some("msp430-elf-gcc".to_string()),

src/librustc_target/spec/riscv32imc_unknown_none_elf.rs
28:            linker: Some("rust-lld".to_string()),

src/librustc_target/spec/l4re_base.rs
35:        linker: Some("ld".to_string()),

src/librustc_target/spec/aarch64_unknown_none.rs
23:        linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/armv7r_none_eabihf.rs
31:            linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/armv7r_none_eabi.rs
31:            linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/armebv7r_none_eabi.rs
31:            linker: Some("rust-lld".to_owned()),

Windows seems to use something other than lld:

src/librustc_target/spec/windows_base.rs
77:        linker: Some("gcc".to_string()),

For Mac there is:
https:/rust-lang/rust/blob/master/src/librustc_target/spec/i686_apple_darwin.rs#L17


Sorry, this turned into a bit of a mess.

andrewrk added a commit that referenced this issue Oct 6, 2018
See #1535

we'll have a better macos linker someday
andrewrk added a commit that referenced this issue Nov 2, 2018
 * add a --system-linker-hack command line parameter to work around
   poor LLD macho code. See #1535
 * build.zig correctly handles static as well as dynamic dependencies
   when building the self hosted compiler.
   - no more unnecessary libxml2 dependency
   - a static build on macos produces a completely static self-hosted
     compiler for macos (except for libSystem as intended).
@bheads

bheads commented Nov 16, 2018

Maybe some useful links on linkers:
https://www.iecc.com/linker/
https://www.airs.com/blog/index.php?s=linker

@andrewrk
Member Author

Linkers & Loaders is a brilliant resource. This book helped me go from being clueless, to generally understanding what linkers do enough that I can read linker source code and understand the concepts. Note that the author has recommended that people check out a copy at a library or maybe even buy the book because the online content is outdated and therefore contains errors.

@andrewrk
Member Author

andrewrk commented Feb 5, 2019

Another motivation for having our own linker has to do with Thread Local Storage (#924). Thread local variables go in the .tdata and .tbss sections, and then the linker merges them together. On Linux, libc uses the auxiliary vector (auxv) to locate the PT_TLS program header entry and find out the size of the TLS at runtime. musl-libc, for example, preallocates 16 * pointer_size bytes for TLS but then has to call mmap if it isn't enough:

static struct builtin_tls {
    char c;
    struct pthread pt;
    void *space[16];
} builtin_tls[1];

...

    if (libc.tls_size > sizeof builtin_tls) {
#ifndef SYS_mmap2
#define SYS_mmap2 SYS_mmap
#endif
        mem = (void *)__syscall(
            SYS_mmap2,
            0, libc.tls_size, PROT_READ|PROT_WRITE,
            MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        /* -4095...-1 cast to void * will crash on dereference anyway,
         * so don't bloat the init code checking for error codes and
         * explicitly calling a_crash(). */
    } else {
        mem = builtin_tls;
    }

This all happens before main. On the other hand, if we had our own linker, we could have a special placeholder for the statically allocated TLS array, and thus always avoid this mmap before main.
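
To make the size question concrete, here is a rough sketch (in Python, for brevity) of how a linker could lay out the `.tdata` and `.tbss` input sections and learn the total TLS size at link time. The function names, the `(name, size, alignment)` tuples, and the simplified size check are assumptions for illustration. The real musl check compares against `sizeof builtin_tls`, which also includes `struct pthread`, and this is not zig's or any existing linker's implementation.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def merge_tls(sections):
    """Lay out (name, size, alignment) TLS input sections back to back.

    Returns (total_size, max_alignment, offsets), where offsets maps each
    section name to its position inside the merged TLS image.
    """
    offset = 0
    max_align = 1
    offsets = {}
    # Initialized data (.tdata) goes first, zero-filled data (.tbss) after it.
    # sorted() is stable, so input order is kept within each group.
    ordered = sorted(sections, key=lambda s: s[0].endswith(".tbss"))
    for name, size, alignment in ordered:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
        max_align = max(max_align, alignment)
    return align_up(offset, max_align), max_align, offsets


def needs_runtime_mmap(tls_size, pointer_size=8, slots=16):
    """True if the TLS image exceeds a fixed reserve of slots * pointer_size bytes."""
    return tls_size > slots * pointer_size
```

Because `merge_tls` already produces the final size, a linker that also controls the placeholder could reserve exactly `total_size` bytes statically, so a check like `needs_runtime_mmap` would never need to fire at startup.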

It works differently on Windows. I haven't looked up how it works there yet.

@zimmi
Contributor

zimmi commented Sep 13, 2019

Maybe relevant: Building a better Go linker

@andrewrk
Member Author

This exists now: https:/ziglang/zig/blob/master/src-self-hosted/link.zig

It's far from complete and so far only addresses the needs of the self-hosted compiler, and no work has been put into making it link arbitrary objects. But that's the direction it is headed. Bugs & additional features can be separate issues from this one.

@andrewrk andrewrk modified the milestones: 1.1.0, 0.7.0 Aug 21, 2020
@andrewrk andrewrk added frontend Tokenization, parsing, AstGen, Sema, and Liveness. and removed standard library This issue involves writing Zig code for the standard library. labels Aug 21, 2020

5 participants