Skip to content

Conversation

@fmeum
Copy link
Collaborator

@fmeum fmeum commented May 23, 2022

With the new output mode --output=files, cquery lists all files advertised by the matched targets in the currently requested output groups.

This new mode has the following advantages over --output=starlark combined with an appropriate handcrafted --starlark:expr:

  • provides a canonical answer to the very common "Where are my build outputs?" question
  • is more friendly to new users as it doesn't require knowing about providers and non-BUILD dialect Starlark
  • takes the value of --output_groups into account
  • stays as close to the logic for build summaries printed by bazel build as possible

Fixes #8739

RELNOTES: cquery's new output mode --output=files lists the output files of the targets matching the query. It takes the current value of --output_groups into account.

@fmeum fmeum requested review from fweikert and gregestren as code owners May 23, 2022 12:36
@fmeum fmeum changed the title Add "files" output mode to cquery Add --output=files mode to cquery May 23, 2022
@fmeum fmeum force-pushed the cquery-output-files branch from f37b578 to a986ca1 Compare May 23, 2022 13:23
@aiuto aiuto requested review from aiuto and removed request for fweikert May 23, 2022 14:09
@aiuto aiuto added the team-Configurability platforms, toolchains, cquery, select(), config transitions label May 23, 2022
@aiuto
Copy link

aiuto commented May 23, 2022

I am very hesitant about adding this since it seems to be just a convenience for what can be done with starlark mode. We are trying to reduce the size of bazel and move features to starlark not the other way around. I can look at it later this week.

@fmeum
Copy link
Collaborator Author

fmeum commented May 23, 2022

I am very hesitant about adding this since it seems to be just a convenience for what can be done with starlark mode. We are trying to reduce the size of bazel and move features to starlark not the other way around. I can look at it later this week.

I generally agree with this position, but I think this case is a bit special: "Where is my build output?" is a very common question that is also quite relevant for developers that are not core Bazel users and just happen to interface with it. Think of an OSS project that needs to build another OSS project that uses Bazel and has to figure out the set of files it needs to import. If we have the barrier to entry somewhere at "knows about DefaultInfo", that may already be too high for people to comfortably build their OSS project with Bazel knowing that others less familiar with Bazel will have to answer such questions.

@gregestren
Copy link
Contributor

Disclaimer: I haven't reviewed this in any meaningful way yet.

Generally speaking I agree with @aiuto 's comment and default to a position of needing strong justification for new user-facing features.

I see @fmeum that you appreciate these points and are trying to make such a justification for this change. Whatever we ultimately decide I think it's fair for us to evaluate your points on their merit.

@pcjanzen
Copy link
Contributor

A wise man once said:

The major weakness of this model is that it makes paths completely opaque. This is a calculated compromise: it assumes new APIs can be developed to help users find the subset of outputs they care about, and that these APIs will be important no matter what path model Bazel uses.

@gregestren
Copy link
Contributor

Still support that statement 100%!

I'll hold back from saying more until I've properly read through this, and how all the themes connect.

@aiuto
Copy link

aiuto commented May 23, 2022

I think the value is not just from the output paths, but attaching a meaningful labeling to each. The output group name is generally good, but not everything produces that. Sometimes you have to go into the rule specific providers to understand the meaning of a file. Once that happens, we are back at writing rule specific starlark code.

@gregestren
Copy link
Contributor

Reviewed further. I support this, with some caveats.

  • As @pcjanzen wisely pointed out, I've long supported a proper API for finding build outputs.
  • That's because it's strongly in the dev team's interest to have flexibility to change output path patterns (bazel-out/arm-fastbuild-<hash>-...). This lets us focus on API and performance improvements freely. So builds can get faster and more cacheable at a faster pace.
  • Historically, users rely on exact path patterns (like looking at what bazel build produced and copying the exact path into downstream tools). That's the complete opposite of what we want. The only alternative is a canonical API.
  • This is important enough to be a) universal, b) easy to use, c) works out of the box
  • I think that overcomes the "keep the API simple" concerns mentioned
  • I never knew quite what that API would look like. This is a nice approach to it: leveraging existing concepts (cquery, output groups) clearly and I think effectively. And with room for further expansion
  • I still want to be careful to support backwards compatible API expansion
  • I agree IdentifyingName: output would be a more powerful API
  • That requires API expansion (I suppose extending the Starlark calls that declare outputs). Sounds manageable but would require its own design
  • For now, is it reasonable to output a raw list? Or a dictionary where every key is empty to offer forward-facing structure for consuming tools?
  • I also agree more complex cases still require custom Starlark. But I think it's reasonable to offer a trivial API for the common case and custom Starlark for advanced cases
  • Conceptually we could still implement the logic in Starlark, if there was some way to bundle that behind the canonical API. But I don't think we have the infrastructure for that. And, again, I think working cleanly, out of the box, without requiring build expertise is a powerful argument.

@gregestren
Copy link
Contributor

gregestren commented May 24, 2022

One more comment:

There's also some awkwardness with multi-target builds. I can imagine someone wanting to just replace their bazel build ... command with bazel cquery .... But:

  • Multiple targets all produce output in a big blob: you can't tell which output was from which target
  • Syntax isn't interchangeable: bazel build //:foo //:bar -> bazel cquery '//:foo+//:bar'

@fmeum
Copy link
Collaborator Author

fmeum commented May 25, 2022

  • Syntax isn't interchangeable: bazel build //:foo //:bar -> bazel cquery '//:foo+//:bar'

In fact, even before working on this change, I have always wondered about why bazel (c)query doesn't support a strict superset of the target specification grammar supported by bazel build. How about supporting bazel (c)query <expr 1> <expr 2> -- -<expr 3> and simply turning it into (<expr 1>) + (<expr 2>) - (<expr 3>) in the command logic before handing it to the query engine? bazel (c)query currently errors when it receives more than one positional argument, so this change would be backwards compatible and - at least in my opinion - useful beyond the situation of the current PR.

@fmeum
Copy link
Collaborator Author

fmeum commented May 25, 2022

  • That's because it's strongly in the dev team's interest to have flexibility to change output path patterns (bazel-out/arm-fastbuild-<hash>-...). This lets us focus on API and performance improvements freely. So builds can get faster and more cacheable at a faster pace.

With that aim, it is important to keep the following in mind: If content-based output paths are used to make builds more cacheable and if these paths are used throughout Bazel and not just on the level of Spawns, then cquery would no longer be the right tool to show output paths since knowing the paths would require executing (or recalling) a partial build.

  • Historically, users rely on exact path patterns (like looking at what bazel build produced and copying the exact path into downstream tools). That's the complete opposite of what we want. The only alternative is a canonical API.

Fully agreed. It's not completely uncommon to see ST hashes on the build outputs of top-level targets, at which point relying on exact paths is no longer reliable.

  • I agree IdentifyingName: output would be a more powerful API

Could you explain what you envision as IdentifyingName here? Would it contain more information than just the target (or possibly owner) label?

  • For now, is it reasonable to output a raw list? Or a dictionary where every key is empty to offer forward-facing structure for consuming tools?

I don't have a strong opinion here. When in doubt, I would prefer an output format that plays well with other Unix tools, given that this command will be a common interface to the non-Bazel world. There is jq however and it wouldn't be too uncommon to require it (or commit to emitting canonically indented JSON, which could also be parsed with grep in a pinch).

  • Conceptually we could still implement the logic in Starlark, if there was some way to bundle that behind the canonical API. But I don't think we have the infrastructure for that. And, again, I think working cleanly, out of the box, without requiring build expertise is a powerful argument.

I could implement --output=files as a thin wrapper around --output=starlark pointing at a shipped Starlark file, but for that would have to:

  1. Parse the value of the --output_groups option in Starlark, duplicating the parsing logic.
  2. Duplicate the filtering logic used in BuildResultPrinter.

That would probably defeat the purpose of decoupling this logic from core Bazel.

1 similar comment
@fmeum
Copy link
Collaborator Author

fmeum commented May 25, 2022

  • That's because it's strongly in the dev team's interest to have flexibility to change output path patterns (bazel-out/arm-fastbuild-<hash>-...). This lets us focus on API and performance improvements freely. So builds can get faster and more cacheable at a faster pace.

With that aim, it is important to keep the following in mind: If content-based output paths are used to make builds more cacheable and if these paths are used throughout Bazel and not just on the level of Spawns, then cquery would no longer be the right tool to show output paths since knowing the paths would require executing (or recalling) a partial build.

  • Historically, users rely on exact path patterns (like looking at what bazel build produced and copying the exact path into downstream tools). That's the complete opposite of what we want. The only alternative is a canonical API.

Fully agreed. It's not completely uncommon to see ST hashes on the build outputs of top-level targets, at which point relying on exact paths is no longer reliable.

  • I agree IdentifyingName: output would be a more powerful API

Could you explain what you envision as IdentifyingName here? Would it contain more information than just the target (or possibly owner) label?

  • For now, is it reasonable to output a raw list? Or a dictionary where every key is empty to offer forward-facing structure for consuming tools?

I don't have a strong opinion here. When in doubt, I would prefer an output format that plays well with other Unix tools, given that this command will be a common interface to the non-Bazel world. There is jq however and it wouldn't be too uncommon to require it (or commit to emitting canonically indented JSON, which could also be parsed with grep in a pinch).

  • Conceptually we could still implement the logic in Starlark, if there was some way to bundle that behind the canonical API. But I don't think we have the infrastructure for that. And, again, I think working cleanly, out of the box, without requiring build expertise is a powerful argument.

I could implement --output=files as a thin wrapper around --output=starlark pointing at a shipped Starlark file, but for that would have to:

  1. Parse the value of the --output_groups option in Starlark, duplicating the parsing logic.
  2. Duplicate the filtering logic used in BuildResultPrinter.

That would probably defeat the purpose of decoupling this logic from core Bazel.

@gregestren
Copy link
Contributor

  • I agree IdentifyingName: output would be a more powerful API

Could you explain what you envision as IdentifyingName here? Would it contain more information than just the target (or possibly owner) label?

Instead of:

$ bazel cquery --output=files '//tt:arm' 
bazel-out/arm-fastbuild-ST-7d684b6a9cec/bin/tt/libcb.a
bazel-out/arm--fastbuild-ST-7d684b6a9cec/bin/tt/libcb.so
bazel-out/arm--fastbuild-ST-7d684b6a9cec/bin/tt/libcb.ifso
bazel-out/arm--fastbuild-ST-7d684b6a9cec/bin/tt/arm.manifest

Something like:

$ bazel cquery --output=files '//tt:arm' 
StaticLibrary: bazel-out/arm-fastbuild-ST-7d684b6a9cec/bin/tt/libcb.a
DynamicLibrary: bazel-out/arm--fastbuild-ST-7d684b6a9cec/bin/tt/libcb.so
InterfaceSharedObjects: bazel-out/arm--fastbuild-ST-7d684b6a9cec/bin/tt/libcb.ifso
ManifestFile: bazel-out/arm--fastbuild-ST-7d684b6a9cec/bin/tt/arm.manifest

To be clear, I'm not asking you to implement this in this PR. I'm just thinking about a forward-facing API.

@gregestren gregestren self-assigned this Jun 7, 2022
@gregestren
Copy link
Contributor

Broadly speaking, and after more discussion with @brandjon , I support the simplest syntax:

$ bazel cquery --output=files '//tt:arm' 
bazel-out/arm-fastbuild-ST-7d684b6a9cec/bin/tt/libcb.a
...

which you're already doing.

The more complex structures quickly get speculative, where the core need is just a simple, easy API. Any extra syntax we add now would make future API changes that much more complicated if we get it wrong. @brandjon had some interesting ideas about separating the content request (outputs vs. BUILD vs. ...) with the format (JSON vs. plain vs. ...).

Now's not the time to get caught up in the subtleties.

@gregestren
Copy link
Contributor

Stated differently, lacking more experience with this, I think it's fair to assume users know which targets they're requesting outputs for. So we don't have to preemptively add API features to contextualize.

@fmeum fmeum force-pushed the cquery-output-files branch from 546a20a to 81da138 Compare June 30, 2022 12:37
ckolli5 pushed a commit to ckolli5/bazel that referenced this pull request Jul 22, 2022
With the new output mode `--output=files`, cquery lists all files advertised by the matched targets in the currently requested output groups.

This new mode has the following advantages over `--output=starlark` combined with an appropriate handcrafted `--starlark:expr`:
* provides a canonical answer to the very common "Where are my build outputs?" question
* is more friendly to new users as it doesn't require knowing about providers and non-BUILD dialect Starlark
* takes the value of `--output_groups` into account
* stays as close to the logic for build summaries printed by `bazel build` as possible

Fixes bazelbuild#8739

RELNOTES: `cquery`'s new output mode [`--output=files`](https://bazel.build/docs/cquery#files-output) lists the output files of the targets matching the query. It takes the current value of `--output_groups` into account.

Closes bazelbuild#15552.

PiperOrigin-RevId: 462630629
Change-Id: Ic648f22aa160ee57b476180561b444f08799ebb6
@ckolli5
Copy link

ckolli5 commented Jul 25, 2022

Hey @fmeum, we are trying to cherry pick this change to release-5.3.0. But presubmit checks are failing. Could you please help us in cherry-picking this PR to release-5.3.0? Thanks!

@gregestren
Copy link
Contributor

That might have been a merge discrepancy. I think it's just a matter of the right dependency in a BUILD file being added.

@gregestren
Copy link
Contributor

There's some kind of baseline discrepancy.

For example, for ProtoOutputFormatterCallback, this commit only adjusts the line

super(eventHandler, options, out, skyframeExecutor, accessor, /*uniquifyResults=*/ false);

But the cherrypick also includes

 OutputType outputType,
 @Nullable TransitionFactory<RuleTransitionData> trimmingTransitionFactory) {

which was added by 9994c32.

@aiuto
Copy link

aiuto commented Jul 26, 2022 via email

@fmeum fmeum deleted the cquery-output-files branch July 26, 2022 06:07
fmeum added a commit to fmeum/bazel that referenced this pull request Jul 26, 2022
With the new output mode `--output=files`, cquery lists all files advertised by the matched targets in the currently requested output groups.

This new mode has the following advantages over `--output=starlark` combined with an appropriate handcrafted `--starlark:expr`:
* provides a canonical answer to the very common "Where are my build outputs?" question
* is more friendly to new users as it doesn't require knowing about providers and non-BUILD dialect Starlark
* takes the value of `--output_groups` into account
* stays as close to the logic for build summaries printed by `bazel build` as possible

Fixes bazelbuild#8739

RELNOTES: `cquery`'s new output mode [`--output=files`](https://bazel.build/docs/cquery#files-output) lists the output files of the targets matching the query. It takes the current value of `--output_groups` into account.

Closes bazelbuild#15552.

PiperOrigin-RevId: 462630629
Change-Id: Ic648f22aa160ee57b476180561b444f08799ebb6
@fmeum
Copy link
Collaborator Author

fmeum commented Jul 26, 2022

I submitted #15979 with the merge conflicts resolved.

ckolli5 pushed a commit that referenced this pull request Jul 26, 2022
With the new output mode `--output=files`, cquery lists all files advertised by the matched targets in the currently requested output groups.

This new mode has the following advantages over `--output=starlark` combined with an appropriate handcrafted `--starlark:expr`:
* provides a canonical answer to the very common "Where are my build outputs?" question
* is more friendly to new users as it doesn't require knowing about providers and non-BUILD dialect Starlark
* takes the value of `--output_groups` into account
* stays as close to the logic for build summaries printed by `bazel build` as possible

Fixes #8739

RELNOTES: `cquery`'s new output mode [`--output=files`](https://bazel.build/docs/cquery#files-output) lists the output files of the targets matching the query. It takes the current value of `--output_groups` into account.

Closes #15552.

PiperOrigin-RevId: 462630629
Change-Id: Ic648f22aa160ee57b476180561b444f08799ebb6
copybara-service bot pushed a commit that referenced this pull request Aug 22, 2022