Use sccache for builds #1724

jrmuizel · 2017-09-19T20:32:31Z

mozilla/sccache#179 has some instructions.

When I last looked at these I believe I couldn't figure out how exactly to communicate the encrypted AWS key to sccache. The details are fuzzy though.

jrmuizel · 2017-09-20T17:34:28Z

In talking with kats I realized that we might be better off just trying to get sccache to work with task-cluster directly instead of getting it working with travis first.

jrmuizel · 2017-09-20T17:37:06Z

Here's an example of using secrets with taskcluster:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/upload-generated-sources/kind.yml#35
https://dxr.mozilla.org/mozilla-central/source/build/upload_generated_sources.py#94

and upload the secret using https://tools.taskcluster.net/secrets

jrmuizel · 2017-09-20T17:39:08Z

Also:
ted: jrmuizel: you can set env vars for sccache for AWS creds (assuming your CI won't leak them in logs), or you could have a little script to rewrite the JSON secret into ~/.aws/credentials format
ted: https:/mozilla/sccache/blob/master/src/simples3/credential.rs is the code that sccache uses to find AWS creds (forked from rusoto)

and https://dxr.mozilla.org/mozilla-central/source/taskcluster/scripts/builder/build-linux.sh#55
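
For illustration, the "little script" ted mentions could look roughly like the sketch below. It assumes the secret is a JSON object with keys named `aws_access_key_id` and `aws_secret_access_key`; those key names, the function name, and the `default` profile are assumptions made for this example, not something the thread specifies.

```python
import configparser
import io
import json


def secret_to_aws_credentials(secret_json: str, profile: str = "default") -> str:
    """Render a JSON secret as text in the ~/.aws/credentials INI format.

    The JSON key names used here are assumed for illustration only.
    """
    secret = json.loads(secret_json)
    config = configparser.ConfigParser()
    config[profile] = {
        "aws_access_key_id": secret["aws_access_key_id"],
        "aws_secret_access_key": secret["aws_secret_access_key"],
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Example usage (writes nothing to disk; the caller decides where the text goes):
#   text = secret_to_aws_credentials(raw_json_from_secrets_service)
#   pathlib.Path.home().joinpath(".aws", "credentials").write_text(text)
```

Returning the text rather than writing the file keeps the credentials out of any accidental log output and leaves file permissions up to the caller.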

glennw · 2017-09-20T20:33:44Z

@jrmuizel @metajack What is the likelihood of getting someone at Mozilla who works on this stuff to look into this officially (running CI on task cluster)? I certainly don't know enough about that stuff to look into it.

We haven't been able to merge anything for ~3 days with the latest travis issues, which is following another fortnight of similar issues...

jrmuizel · 2017-09-20T20:43:38Z

@staktrace is looking at this a bit right now and we can probably get @luser to help before he goes on PTO next week.

staktrace · 2017-09-20T21:25:52Z

At the moment I only have basic linux64 taskcluster integration working. There are two steps involved:

1. Install the integration github app on the repo at https:/apps/taskcluster (I have this app installed for my fork, staktrace/webrender).
2. Apply the change from my taskcluster-ci branch. This just adds the .taskcluster.yml file and tweaks one of the reftest fuzz numbers.

After doing this, each PR update or push will trigger the taskcluster job. You can see a sample one (from my taskcluster-ci commit) at https://tools.taskcluster.net/groups/GNsTmjKaQyeF5v623NM6eQ - it runs the debug and release commands on whatever the current stable rust version is. @jrmuizel said that for now we can just leave the "nightly" rust commands on travis.

If we want to pin to a specific rust version, we can update the docker image to have that rust version preinstalled, and remove the rustup commands from the .taskcluster.yml script.

Next steps are figuring out how to hook up sccache and getting OS X jobs running. I think OS X is probably more important at this point, since with that we can start using taskcluster "in production", and then work on getting it faster with sccache.

I'd like to merge my taskcluster-ci branch as soon as possible but I'm not sure what effect that will have on bors/homu and the regular workflow. We should probably set up a "maintenance window" or something where we can do the merge, and ensure things are working or roll back if they aren't. Or if there is a test repo somewhere with bors/homu we can use that to try this out.

metajack · 2017-09-20T21:53:05Z

sccache works fine, so I'm not sure what in particular is causing problems, or if your request is actually related to sccache builds or just builds being messed up in general. It's pretty easy for us to move a particular repo over to our buildbot instances if that is what is needed.

staktrace · 2017-09-21T18:17:28Z

I talked to taskcluster folks and I have steps on setting up OS X worker machines for taskcluster, so that we can run our own CI farm. It's fairly straightforward and I have it running using my laptop as a test. I think we can rustle up some OS X machines in the Toronto office or hosted remotely somewhere and use them as dedicated CI machines for webrender.

As a bonus, since the OS X setup doesn't use docker, it doesn't reset the machine state after each job is run. This means even if we just use a local sccache we should get a good speedup.

jrmuizel · 2017-09-21T20:00:38Z

The easiest thing to do is probably just get a dedicated mac mini or two from macstadium and expense it.

staktrace · 2017-09-21T22:29:44Z

I set up the worker on the mac mini that jrmuizel rented from macstadium. It seems to be working ok.

Next step to move this along is to try it on servo/webrender instead of just my clone of the repo. Whoever owns servo/webrender needs to install the github-taskcluster integration tool from https:/apps/taskcluster.

Then we need to get :jonasfj to add the necessary scopes to this repo, so that it can spawn the "kats-webrender-ci-osx" type worker via the "localprovisioner" TC provisioner.

And then after that I can make a PR from my branch with the .taskcluster.yml file and see how bors/homu deal with it.

glennw · 2017-09-21T23:41:25Z

@metajack @larsbergstrom Is ^ something you can help with (enabling the TC tool on the WR repo)?

metajack · 2017-09-21T23:44:48Z

Should be done.

staktrace · 2017-09-22T14:58:14Z

Thanks. I got :jonasfj to add the scopes as well so we should be good to try the PR. I'll submit that shortly.

While I was waiting I installed sccache on the OS X worker but I ran into a rustc internal compiler error when building WR with it. I'll investigate that more but for now let's do this without sccache.

luser · 2017-09-22T15:17:55Z

We've seen that error before elsewhere:

thread 'rustc' panicked at 'failed to acquire jobserver token: Error { repr: Os { code: 35, message: "Resource temporarily unavailable" } }', src/libcore/result.rs:860:4

I think this is because cargo creates a make-style jobserver now, and it will pass it down to rustc (for use when you use codegen-units=N). There's some weird interaction here with how the jobserver fd gets passed down and I don't quite understand it.
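
As background, a make-style jobserver is a pipe preloaded with one byte per available job slot; a process takes a token by reading a byte. On macOS, errno 35 is EAGAIN, which is what a non-blocking read returns when no byte is available. The sketch below only demonstrates that general mechanism with a plain pipe. It does not establish the cause of the rustc panic, and the function name is made up for this example.

```python
import errno
import os


def acquire_tokens(token_count: int, attempts: int) -> list:
    """Preload a pipe with token_count tokens, then try to take `attempts` tokens.

    Returns one entry per attempt: "token" on success, "EAGAIN" when the
    non-blocking read finds the pipe empty.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, b"+" * token_count)  # one byte per job slot
        os.set_blocking(read_fd, False)  # reads now fail instead of waiting
        results = []
        for _ in range(attempts):
            try:
                os.read(read_fd, 1)
                results.append("token")
            except BlockingIOError as exc:
                if exc.errno != errno.EAGAIN:
                    raise
                results.append("EAGAIN")
        return results
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

With a blocking read end, the third attempt would wait for a token to be returned instead of failing.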

staktrace · 2017-09-22T15:24:28Z

Yeah, I just commented in rust-lang/rust#42867 which appears to be tracking this problem.

staktrace · 2017-09-25T15:27:09Z

Quick update: I made PR #1746 to get the .taskcluster.yml file merged into the webrender repo. By default this will run the CI jobs via taskcluster for PRs by "collaborators" and for pushes. (We need to set allowPullRequests: public to make it run on PRs by anybody, see documentation). I did this intentionally since until we get everything hooked up it's not too useful to run the jobs on every random PR.

I looked at the bors and homu code/docs to figure out exactly what it is they do and what integration we need there. It seems like when we run CI with travis it reports the result to the bots via webhooks. AFAICT taskcluster-github doesn't have webhook capability yet, so we can either request that and wait for it, or just make the CI command itself call out to a webhook and report the success/failure.
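
A rough sketch of the second option, where the CI command itself reports its result, might look like this. The webhook URL, the payload fields (`commit`, `state`, `exit_code`), and both function names are hypothetical; the thread does not describe the payload homu actually expects.

```python
import json
import subprocess
import sys
import urllib.request


def run_and_build_report(command: list, commit: str) -> dict:
    """Run the CI command and describe its outcome as a JSON-serializable dict."""
    completed = subprocess.run(command)
    return {
        "commit": commit,
        "state": "success" if completed.returncode == 0 else "failure",
        "exit_code": completed.returncode,
    }


def post_report(webhook_url: str, report: dict) -> int:
    """POST the report as JSON to a (hypothetical) webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(report).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example usage inside a CI step (URL and commit are placeholders):
#   report = run_and_build_report(["cargo", "test"], commit="<sha>")
#   post_report("https://example.invalid/hook", report)
#   sys.exit(report["exit_code"])
```

Exiting with the original exit code keeps the taskcluster task status in sync with what was reported to the webhook.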

The other important thing is that homu right now runs tests on the merge commit from the PR and latest master. That means we need some way of triggering the taskcluster run from homu, the same way it triggers travis/appveyor runs. I haven't looked into whether this is possible yet; it might be a feature that we need to request of the taskcluster-github integration tool. Having an API to do this would also allow us to make things like retry requests work. Right now retrying has to be done manually via the taskcluster task page, and even then it won't update the final status of the build (I filed bug 1402136 for this).

And finally, one more thing that would be nice is if taskcluster canceled obsolete jobs if e.g. somebody pushes new commits into a PR. It doesn't do this yet and it's an optimization but one that would be good to have. I filed bug 1402884 for this.

Add a .taskcluster.yml file to run CI using taskcluster

This is a test PR to see if (a) taskcluster correctly picks up the PR and schedules the CI jobs and (b) to see how bors/homu deal with this extra CI job. This is related to #1724

glennw · 2017-09-26T05:39:49Z

Wow, nice work. The TC builds are so fast compared to how long the normal builds take!

staktrace · 2017-10-03T20:57:17Z

> We need to set allowPullRequests: public to make it run on PRs by anybody, see documentation.

This is done now, in #1789.

> The other important thing is that homu right now runs tests on the merge commit from the PR and latest master.

I realized that there's no special magic needed to make this work. It looks like the merge head is pushed as the auto branch on the repo, and since it's a branch update the taskcluster CI runs on it automatically. So that's one less thing that needs to be done.

I think really the next thing we want to do here is set up a webhook equivalent for taskcluster, so that it can notify the bots on success/failure. And then have the bots accept travis || taskcluster as success conditions for landing the merge.

staktrace · 2017-10-13T15:30:56Z

#1871 adds "routes" to the .taskcluster.yml file which will allow us to listen for task-completion notifications. We would need code running somewhere that listens for the four tasks for a particular PR to complete successfully and uses that as a success condition for landing the PR. This can either be added to homu directly or run as a separate service that simulates a travis webhook, or something. With the mozillapulse python library doing most of the work it shouldn't be too hard to glue things together.
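
The bookkeeping such a listener would need can be sketched independently of how the notifications arrive. The example below assumes each notification carries a task id and a final state, and that a state of "completed" means success; the class, the state strings, and `can_land` are illustrative names, and no mozillapulse calls are shown because the thread does not describe them. `can_land` mirrors the "travis || taskcluster" condition mentioned earlier.

```python
from dataclasses import dataclass, field

EXPECTED_TASKS = 4  # the thread mentions four tasks per PR


@dataclass
class CommitStatus:
    """Collects final task states for one commit and derives an overall verdict."""

    expected: int = EXPECTED_TASKS
    results: dict = field(default_factory=dict)  # task id -> final state string

    def record(self, task_id: str, state: str) -> None:
        """Store (or overwrite, e.g. after a retry) the final state of a task."""
        self.results[task_id] = state

    def verdict(self) -> str:
        """Return "failure", "success", or "pending" for the commit as a whole."""
        if any(state != "completed" for state in self.results.values()):
            return "failure"
        if len(self.results) >= self.expected:
            return "success"
        return "pending"


def can_land(travis: str, taskcluster: str) -> bool:
    """Accept the merge if either CI system reports success."""
    return travis == "success" or taskcluster == "success"
```

Keying results by task id means a retried task replaces its earlier state rather than counting twice.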

glennw · 2018-01-16T07:21:44Z

We are running CI on TaskCluster now. Do we still need this open @staktrace @jrmuizel ?

staktrace · 2018-01-16T14:43:57Z

I think we can close it. Until rust-lang/rust#42867 is solved we probably won't get sccache to work on the OS X builder anyway and it might not be worth the effort unless we start building up a backlog again.

staktrace mentioned this issue Sep 22, 2017

Add a .taskcluster.yml file to run CI using taskcluster #1746

Merged

kvark added area: infrastructure difficulty: moderate type: enhancement labels Oct 17, 2017

glennw closed this as completed Jan 16, 2018
