Improve pre-push file list when merges are involved
If I have a feature branch based on some old commit in develop or master, and I merge in develop or master, the pre-push hooks will run over all the merge commit files. Would it be possible to add a setting to provide a known "good" branch whose changes can be ignored? This is potentially useful as you gradually roll out a new check on a codebase, or if only some people have hooks installed.
A current workaround for this is by manually doing something like git diff $(git merge-base HEAD develop) --name-only [--diff-filter=ACMR?] in custom hooks, but I'd like to do this for other hooks I haven't written. The effect of this setting would just be to shrink the list of files provided to the hook, so their code wouldn't change.
Example repo: https://github.com/prem-nuro/precommit-issue-860
Hmmm, the three dots here are supposed to avoid merge commits.
I'm also not sure:
- how this feature would be configured
- top level configuration value?
- refspec?
- that it's a good idea
- even with a branch name, how would
pre-commitknow where to generate the diff from? - what if that branch doesn't exist locally?
- what if that branch does exist locally but is out of date?
- what if the fetched version of the remote branch is out of date?
- what if someone is pushing to that named branch?
- even with a branch name, how would
and I really don't think I'd want to take on that complexity for a feature that would only be used very rarely (pre-push already doesn't see all that much use as it is). It would likely be broken very easily :(
So basically: convince me this is a problem worth fixing, convince me it'll not add significant complexity / special snowflakes, convince me it can't (easily) be done in hooks themselves :)
Regarding the current use of ..., I thought you'd need --no-merges or --first-parent or something.
Good points, here's my attempt at addressing them:
top level configuration value?
One option is a per-hook configuration: ignore_branch: [develop]
Any hooks already aware of special branches don't need them, but we can add them to hooks we haven't authored. I think a simple implementation supporting one special branch or tag is good enough for most use cases. And maybe this option could only have relevant for push stage hooks.
Another option is just top level overall.
refspec?
I think something simpler would be most convenient, matching no-commit-to-branch, hence just develop.
even with a branch name, how would
pre-commitknow where to generate the diff from?
We can simplify and just do git diff develop...foo. The diff will grow larger as time goes on, but that's okay. We're totally ignoring the current behavior here, which AFAIK is to just do the diff on un-pushed commits.
what if that branch doesn't exist locally?
Use the remote of the current branch's upstream. If it does exist locally, use the upstream version of that branch. Error if it doesn't exist in whatever remote we use.
what if that branch does exist locally but is out of date?
I think the ... notation will do the right thing here, if we use the remote upstream instead.
what if the fetched version of the remote branch is out of date?
Use the upstream version of the branch.
what if someone is pushing to that named branch?
If pushing to the branch, do the diff with whatever is upstream like origin/develop...develop. Special care for no upstream.
Summary:
| Special branch is not local or remote | Pushing to special branch which only exists locally | Pushing to special branch | Pushing to other branch | |
|---|---|---|---|---|
| Diff computation | Error | Everything | develop...origin/develop |
origin/develop...foo |
| Which remote/upstream | Error | None required | Special branch remote/upstream | Special branch remote/upstream if it exists locally. Else, assume it is the same name in current branch's remote. If that doesn't exist, error. |
And to reiterate, I think the use-case here is:
- We pre-push because we want to run a lot of hooks but not on every commit since they're all work in progress until we push.
- Everything's going to be merged into
developeventually, which has varying levels of satisfying each hook as a new one is turned on. So we really care only about the diff withdevelop, such that any new code will satisfy hooks. This also helps us ignore things like rebasing ondevelop.
The way you describe it should already be the existing behaviour.
Since the commits in develop already exist on the remote, it should only notice the difference between those not pushed and those that you're pushing. Can you provide an example showing that not being the case?
I see. I linked an example repo above: https://github.com/prem-nuro/precommit-issue-860
master is the special branch and foo is the regular one. If you look at the entries in LOG you should be able to follow along, let me know if the entries aren't clear.
I had trouble following your example so I created a script:
script
#!/usr/bin/env bash
set -euxo pipefail
rm -rf upstream clone
git init upstream
git -C upstream commit --allow-empty -m 'Initial commit'
git -C upstream config receive.denyCurrentBranch ignore
git clone upstream clone
cd clone
cat > .pre-commit-config.yaml <<EOF
repos:
- repo: local
hooks:
- id: echo
name: echo
entry: echo
verbose: true
language: system
stages: [push]
EOF
git add .pre-commit-config.yaml
pre-commit install -t pre-push
git commit -m "Add push"
git push origin HEAD
touch master_only_file
git add master_only_file
git commit -m "Add master_only_file"
git push origin HEAD
git checkout origin/master^ -b bar
touch bar
git add bar
git commit -m "Add bar"
git merge origin/master --no-edit
git log --oneline --graph --decorate
: merge base
mb="$(git merge-base origin/master HEAD)"
git diff "${mb}" HEAD --name-only
: commits to push
git rev-list HEAD --topo-order --reverse --not --remotes=origin |
xargs git show
: source vs origin calculation
first_ancestor=$(
git rev-list HEAD --topo-order --reverse --not --remotes=origin |
head -1
)
source="$(git rev-parse "${first_ancestor}^")"
git show "${source}"
git diff --name-only "${source}...HEAD"
git push origin HEAD
output
$ bash t.sh
+ rm -rf upstream clone
+ git init upstream
Initialized empty Git repository in /tmp/t/upstream/.git/
+ git -C upstream commit --allow-empty -m 'Initial commit'
[master (root-commit) 50bebbc] Initial commit
+ git -C upstream config receive.denyCurrentBranch ignore
+ git clone upstream clone
Cloning into 'clone'...
done.
+ cd clone
+ cat
+ git add .pre-commit-config.yaml
+ pre-commit install -t pre-push
pre-commit installed at /tmp/t/clone/.git/hooks/pre-push
+ git commit -m 'Add push'
[master 625ef71] Add push
1 file changed, 9 insertions(+)
create mode 100644 .pre-commit-config.yaml
+ git push origin HEAD
echo.....................................................................Passed
hookid: echo
.pre-commit-config.yaml
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 354 bytes | 354.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To /tmp/t/upstream
50bebbc..625ef71 HEAD -> master
+ touch master_only_file
+ git add master_only_file
+ git commit -m 'Add master_only_file'
[master bbcf4f2] Add master_only_file
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 master_only_file
+ git push origin HEAD
echo.....................................................................Passed
hookid: echo
master_only_file
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 306 bytes | 306.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To /tmp/t/upstream
625ef71..bbcf4f2 HEAD -> master
+ git checkout 'origin/master^' -b bar
Switched to a new branch 'bar'
+ touch bar
+ git add bar
+ git commit -m 'Add bar'
[bar 90ef098] Add bar
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 bar
+ git merge origin/master --no-edit
Merge made by the 'recursive' strategy.
master_only_file | 0
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 master_only_file
+ git log --oneline --graph --decorate
* aabbc0a (HEAD -> bar) Merge remote-tracking branch 'origin/master' into bar
|\
| * bbcf4f2 (origin/master, origin/HEAD, master) Add master_only_file
* | 90ef098 Add bar
|/
* 625ef71 Add push
* 50bebbc Initial commit
+ : merge base
++ git merge-base origin/master HEAD
+ mb=bbcf4f2e695bf186b7af76f49ec288e068ad804e
+ git diff bbcf4f2e695bf186b7af76f49ec288e068ad804e HEAD --name-only
bar
+ : commits to push
+ git rev-list HEAD --topo-order --reverse --not --remotes=origin
+ xargs git show
commit 90ef098415b3069e676bde710e9885243b02b0bc
Author: Anthony Sottile <[email protected]>
Date: Mon Nov 12 11:20:31 2018 -0800
Add bar
diff --git a/bar b/bar
new file mode 100644
index 0000000..e69de29
commit aabbc0a9e4f1c3b60b0d93d4fd977efb8f8bf996 (HEAD -> bar)
Merge: 90ef098 bbcf4f2
Author: Anthony Sottile <[email protected]>
Date: Mon Nov 12 11:20:31 2018 -0800
Merge remote-tracking branch 'origin/master' into bar
+ : source vs origin calculation
++ head -1
++ git rev-list HEAD --topo-order --reverse --not --remotes=origin
+ first_ancestor=90ef098415b3069e676bde710e9885243b02b0bc
++ git rev-parse '90ef098415b3069e676bde710e9885243b02b0bc^'
+ source=625ef7124a9b348855de5c34caa7cbced08cf011
+ git show 625ef7124a9b348855de5c34caa7cbced08cf011
commit 625ef7124a9b348855de5c34caa7cbced08cf011
Author: Anthony Sottile <[email protected]>
Date: Mon Nov 12 11:20:30 2018 -0800
Add push
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..79a4a6f
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,9 @@
+repos:
+- repo: local
+ hooks:
+ - id: echo
+ name: echo
+ entry: echo
+ verbose: true
+ language: system
+ stages: [push]
+ git diff --name-only 625ef7124a9b348855de5c34caa7cbced08cf011...HEAD
bar
master_only_file
+ git push origin HEAD
echo.....................................................................Passed
hookid: echo
bar master_only_file
Counting objects: 4, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 523 bytes | 523.00 KiB/s, done.
Total 4 (delta 1), reused 0 (delta 0)
To /tmp/t/upstream
* [new branch] HEAD -> bar
If we do this at all, I don't want a configuration value. If you can find a way to improve the push routine or revision differencing to determine this I'd be happiest with that.
https://github.com/pre-commit/pre-commit/issues/2424 was marked as duplicate of this, but the workflow I reported there does not involve merges but strictly rebase + force pushes.
If we do this at all, I don't want a configuration value. If you can find a way to improve the push routine or revision differencing to determine this I'd be happiest with that.
Internally I patched pre-commit in our monorepo with this
--- a/pre_commit/commands/hook_impl.py
+++ b/pre_commit/commands/hook_impl.py
@@ -117,14 +117,16 @@ def _pre_push_ns(
local_branch, local_sha, remote_branch, remote_sha = line.split()
if local_sha == Z40:
continue
- elif remote_sha != Z40 and _rev_exists(remote_sha):
- return _ns(
- 'pre-push', color,
- from_ref=remote_sha, to_ref=local_sha,
- remote_branch=remote_branch,
- local_branch=local_branch,
- remote_name=remote_name, remote_url=remote_url,
- )
+ # Irrelevant to monorepo workflow
+ #
+ # elif remote_sha != Z40 and _rev_exists(remote_sha):
+ # return _ns(
+ # 'pre-push', color,
+ # from_ref=remote_sha, to_ref=local_sha,
+ # remote_branch=remote_branch,
+ # local_branch=local_branch,
+ # remote_name=remote_name, remote_url=remote_url,
+ # )
else:
# ancestors not found in remote
ancestors = subprocess.check_output((
If current feature branch was updated because of a git rebase origin/master, we don't want our user have to pay attention to the files that was updated between origin/master@{1}..origin/master, but strictly only the files that was touched between the merge-base commit and feature-branch's HEAD.
So it makes no sense for us to run pre-commit for the diff between remote commit vs local commit in this workflow.
I think https://facebook.github.io/watchman/docs/scm-query.html described this class of diff calculation problem really well. Definitely worth a read.
solutioning:
I do recognize that this problem is workflow specific. So I think a top-level configuration to enable/disable this pre-push strategy, depending on workflow of the repository, would be much appreciated here.