Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetching symlink inputs for top-level output #11536

Closed
wants to merge 2 commits into from

Conversation

ulfjack
Copy link
Contributor

@ulfjack ulfjack commented Jun 2, 2020

Top-level output files can be symlinks to non-top-level output files, such as
in the case of a cc_binary linking against a .so built in the same build.

I reuse the existing mayInsensitivelyPropagateInputs() method, which was
originally introduced for action rewinding, but seems to have exactly the
intended semantics here as well.

Unfortunately, this requires a small change to the BlazeModule API to pass in
the full analysis result (so we can access the action graph, so we can read
the aformentioned flag).

I considered using execution-phase information on whether an output is a
symlink. However, at that point it's too late! The current design only allows
pulling an output file when its generating action runs. It would be better if
this was changed to pull output files independently of action execution. That
would avoid a lot of problems with the current design, such as #10902.

Fixes #11532.

Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921

@ulfjack
Copy link
Contributor Author

ulfjack commented Jun 2, 2020

@buchgr (who wrote the code originally)
I'd page Mark, but he's apparently not on GitHub... (because I'm abusing a flag in Action that is used by rewinding) (sorry @mschaller)

@benjaminp
Copy link
Collaborator

itym @anakanemison

@ulfjack
Copy link
Contributor Author

ulfjack commented Jun 3, 2020

@anakanemison is not linked to Google. Might want to do that?

@aiuto aiuto requested review from gregestren and janakdr June 4, 2020 01:52
ActionExecutionMetadata action =
(ActionExecutionMetadata) analysisResult.getActionGraph().getGeneratingAction((Artifact) actionInput);
if (action.mayInsensitivelyPropagateInputs()) {
filesToDownload.addAll(action.getInputs().toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for there to be more than one symlink-ish action chained together? My concern is that this may be insufficient if the inputs to this action are outputs of other symlink actions. Rewinding recurses in such a case. Perhaps that would be overkill here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

@@ -282,13 +283,13 @@ public BuildOptions getDefaultBuildOptions(BlazeRuntime runtime) {
* @param env the command environment
* @param request the build request
* @param buildOptions the build's top-level options
* @param configuredTargets the build's requested top-level targets as {@link ConfiguredTarget}s
* @param analysisResult the build's requested top-level targets as {@link ConfiguredTarget}s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this description deserves some rewording, given that the parameter's type changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, reworded.

@anakanemison
Copy link
Contributor

Sorry for the delay. I'm not used to GitHub! I think I'm in the Google org on GitHub already.
But I wasn't in bazelbuild, until Tony extended me an invite tonight!

@ulfjack
Copy link
Contributor Author

ulfjack commented Jun 4, 2020

No worries, and thanks for the review. GitHub seems to think that @anakanemison is not in the Google org. For example, you can compare your profile page with Jakob's: @buchgr vs. @anakanemison

Did you add your @google.com address to your GitHub profile?

@anakanemison
Copy link
Contributor

Thanks for pointing out my empty profile! I entered some information there. (Looks like I could say anything I wanted there..)

LGTM. I'll do the internal merging thing and respond here when the change is in.


private static void fetchSymlinkDependenciesRecursively(
ActionGraph actionGraph, Set<ActionInput> builder, Artifact artifact) {
ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if a top-level artifact is a tree artifact generated by a template action? I think that can happen?

If so, then it should be ok to ignore non-ActionExecutionMetadata instances, since templates shouldn't insensitively propagate their inputs (although I guess theoretically there could be a template action that just symlinked its inputs?). Could pull #mayInsensitivelyPropagateInputs() up to ActionAnalysisMetadata too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Right now that leads to a Bazel crash. Fixed!

The main issue right now are SymlinkAction and SymlinkTreeAction, but in general, any remotely run action can return a symlink in the open-source remote execution API. Therefore, this will eventually need to be rewritten to follow symlinks at execution time. The current design mixes up action execution and output fetching, and separating them should actually result in a nicer implementation, because we can then remove the code that injects remote execution flags into Skyframe.

So yeah, this is a short-term fix to get cc_binary to work properly, and there's more work needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a TODO for that somewhere appropriate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, now that I understand this better, could you add a test exhibiting this behavior? I assume it would be a Starlark action that created a symlink on the remote worker?

The behavior of creating symlinks on the remote worker seems very problematic to me. What happens if the action creates an intermediate file and then its final output is a symlink to that file? We'll just download a dangling symlink? Is there any chance of getting rid of this symlinking feature altogether?

@ulfjack
Copy link
Contributor Author

ulfjack commented Jun 19, 2020

Added a test for top-level template actions and fixed the resulting ClassCastException. PTAL.


private static void fetchSymlinkDependenciesRecursively(
ActionGraph actionGraph, Set<ActionInput> builder, Artifact artifact) {
ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a TODO for that somewhere appropriate?

@ulfjack
Copy link
Contributor Author

ulfjack commented Aug 6, 2020

Added a TODO as requested. Sorry for the delay - the last month was pretty hectic.

if (remoteOutputsMode != null && remoteOutputsMode.downloadToplevelOutputsOnly()) {
Preconditions.checkState(actionContextProvider != null, "actionContextProvider was null");
boolean isTestCommand = env.getCommandName().equals("test");
TopLevelArtifactContext artifactContext = request.getTopLevelArtifactContext();
ImmutableSet.Builder<ActionInput> filesToDownload = ImmutableSet.builder();
for (ConfiguredTarget configuredTarget : configuredTargets) {
Set<ActionInput> filesToDownload = new HashSet<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be Artifact instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This includes test outputs which are not declared as artifacts, so no, not easily.


private static void fetchSymlinkDependenciesRecursively(
ActionGraph actionGraph, Set<ActionInput> builder, Artifact artifact) {
ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, now that I understand this better, could you add a test exhibiting this behavior? I assume it would be a Starlark action that created a symlink on the remote worker?

The behavior of creating symlinks on the remote worker seems very problematic to me. What happens if the action creates an intermediate file and then its final output is a symlink to that file? We'll just download a dangling symlink? Is there any chance of getting rid of this symlinking feature altogether?

@ulfjack
Copy link
Contributor Author

ulfjack commented Aug 19, 2020

Bazel currently errors out if the remote executor returns a dangling symlink, and also if it returns a symlink when build-without-the-bytes is enabled.

@ulfjack
Copy link
Contributor Author

ulfjack commented Aug 19, 2020

Not quite: it returns an error if an action generates a symlink and --remote_download_minimal is active, but not if --remote_download_toplevel is active. I added a test for this.

@ulfjack
Copy link
Contributor Author

ulfjack commented Aug 19, 2020

Note that genrules can also generate symlinks on the remote machine.

Copy link
Contributor

@janakdr janakdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test!

Just the question about the declaration of Set and my new request for the logline.

}
ActionExecutionMetadata action = (ActionExecutionMetadata) actionGraph.getGeneratingAction(artifact);
if (action.mayInsensitivelyPropagateInputs()) {
List<Artifact> inputs = action.getInputs().toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a warning-level logline here if the list is >5 elements or so? I don't think we expect such actions to insensitively propagate inputs, and if there's a bug, we could slow down significantly due to nested set expansion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Top-level output files can be symlinks to non-top-level output files, such as
in the case of a cc_binary linking against a .so built in the same build.

I reuse the existing mayInsensitivelyPropagateInputs() method, which was
originally introduced for action rewinding, but seems to have exactly the
intended semantics here as well.

Unfortunately, this requires a small change to the BlazeModule API to pass in
the full analysis result (so we can access the action graph, so we can read
the aformentioned flag).

I considered using execution-phase information on whether an output is a
symlink. However, at that point it's too late! The current design only allows
pulling an output file *when its generating action runs*. It would be better if
this was changed to pull output files independently of action execution. That
would avoid a lot of problems with the current design, such as bazelbuild#10902.

Fixes bazelbuild#11532.

Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921
Change-Id: Ie1bf49a8d08f0b2422426ecd95fe79b3686f8427
@bazel-io bazel-io closed this in 07e152e Sep 21, 2020
Yannic pushed a commit to Yannic/bazel that referenced this pull request Oct 5, 2020
Top-level output files can be symlinks to non-top-level output files, such as
in the case of a cc_binary linking against a .so built in the same build.

I reuse the existing mayInsensitivelyPropagateInputs() method, which was
originally introduced for action rewinding, but seems to have exactly the
intended semantics here as well.

Unfortunately, this requires a small change to the BlazeModule API to pass in
the full analysis result (so we can access the action graph, so we can read
the aformentioned flag).

I considered using execution-phase information on whether an output is a
symlink. However, at that point it's too late! The current design only allows
pulling an output file *when its generating action runs*. It would be better if
this was changed to pull output files independently of action execution. That
would avoid a lot of problems with the current design, such as bazelbuild#10902.

Fixes bazelbuild#11532.

Change-Id: Iaf1e48895311fcf52d9e1802d53598288788a921

Closes bazelbuild#11536.

Change-Id: Ie1bf49a8d08f0b2422426ecd95fe79b3686f8427
PiperOrigin-RevId: 332939828
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remote_download_outputs=toplevel doesn't download libraries in cc_import needed to run.
5 participants