-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Ensure all logging tasks are closed and simplify log task dispatching #5375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
chrisd8088
merged 10 commits into
git-lfs:main
from
chrisd8088:update-tasklog-complete-all-tasks
May 30, 2023
Merged
Ensure all logging tasks are closed and simplify log task dispatching #5375
chrisd8088
merged 10 commits into
git-lfs:main
from
chrisd8088:update-tasklog-complete-all-tasks
May 30, 2023
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We can update several code comments in the "tasklog" package to reflect their corresponding functions and fix typos.
In commit e90d54d of PR git-lfs#2329 the "git/githistory/log" package was added, including the PercentageTask structure and associated methods. Then in commit 122b765 of PR git-lfs#2610 the Entry() method was added to provide support for verbose logging in the "git lfs migrate impot" command. However, the name of the receiver argument for the Entry() method differs from that of the other methods for the PercentageTask structure, so we update it to match now, as well as fixing a typo which remains from the original commit in PR git-lfs#2329. Note that the "git/githistory/log" package was later renamed to the "tasklog" package in commits b9ab79e and 45c580e of PR git-lfs#2747.
In commit 3129b9a of PR git-lfs#2537 the "githistory" package was updated so the WithLogger global option variable now required a *log.Logger argument instead of calling log.NewLogger(). This was done so the "git lfs migrate" commands could pass in a previously initialized logger to which the commands had already written log messages; that change was introduced in PR git-lfs#2538. In the same commit of PR git-lfs#2537, a WithLoggerTo global option function was added to preserve the existing behaviour. However, this option has never been utilized, and so to reduce unnecessary complexity we just remove it now. Note that the use of functional options in the Git LFS project stems from PR git-lfs#2295 and is inspired by the design proposed in several articles from 2014; see, for reference: https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis https://commandcenter.blogspot.com/2014/01/self-referential-functions-and-design.html
The SimpleTask structure of the "tasklog" package, which was introduced in commit 7a760b6 of PR git-lfs#2756, is used at several points in the implementation of the "git lfs fetch" and "git lfs prune" commands, and in each case is constructed with a call to the NewSimpleTask() function, and then passed to a Logger structure's Enqueue() method. However, in commit 0cad488 of PR git-lfs#2767, a Logger.Simple() method was added to both create a new SimpleTask structure and pass it to a Logger structure's Enqueue() method. This Logger.Simple() method was added to be consistent with other existing similar methods such as Logger.List() and Logger.Percentage(). These other methods are used consistently throughout our code to create and enqueue tasks, so we now update the locations where we call NewSimpleTask() to use the Logger.Simple() method instead. This also provides an opportunity to tidy some of the surrounding code and whitespace.
In commit f100293 of PR git-lfs#2732 a local maxInt() function was added to the "git/githistory/log" package in order to avoid a circular dependancy on the "tools" package, which provides a common MaxInt() implementation. See, for reference, the discussion in this PR comment: git-lfs@f100293#r151551639 The "git/githistory/log" package was subsequently renamed to the "tasklog" package in commits b9ab79e and 45c580e of PR git-lfs#2747. At present, though, there is no cyclic dependency which prevents the use of the standard tools.MaxInt() function in the "tasklog" package, so we revert to its use now.
The *Logger.consume() method in the "tasklog" package was first introduced in commit e90d54d of PR git-lfs#2329, and since that time has included logic to manage an internal "pending" list of tasks; however, that logic is never exercised because tasks are only appended to the "pending" array when the "next" variable is non-nil, but that variable is initialized as nil and can only be set to another value if the "pending" array is non-empty. As a result, we can remove both variables and the related logic and simplify the consume() method as a result.
In commit e90d54d of PR git-lfs#2329 the "git/githistory/log" package was added, including the PercentageTask structure and associated methods, but unlike the WaitingTask which was added at the same time, no Complete() method was defined for the PercentageTask structure. (Note that the "git/githistory/log" package was later renamed to the "tasklog" package in commits b9ab79e and 45c580e of PR git-lfs#2747.) Other task types such as the ListTask and SimpleTask structures (introduced in commit 31ffeb9 of PR git-lfs#2335 and commit 7a760b6 of PR git-lfs#2756, respectively) provide a Complete() method, and the Meter task type of the "tq" package, which implemented the Task interface in commit 7c0f9e2 of PR git-lfs#2732, provides an equivalent Finish() method. These methods allow the caller to explicitly close the channel returned by the Updates() method, on which the anonymous goroutine started by the Logger.consume() method for the task is waiting, in a "range" loop on the channel in the Logger.logTask() method. One key use of the PercentageTask structure is in the methods of the Rewriter structure from the "git/githistory" package. It is initialized with the number of commits to be rewritten during a "git lfs migrate" command's traversal of a Git repository's history. As each commit is rewritten, the Count() method is called to increment the percentage completed value. When every commit has been rewritten, the count reaches the expected total, and the Count() method closes the channel, thus allowing the task receiving updates to finish and exit its goroutine. Under exceptional circumstances, though, the Rewrite() method of the Rewriter structure may never finish iterating through all the expected commits, and return in error or via a panic call. When this happens, the goroutine waiting on the PercentageTask's updates channel can never exit, leading to a hung process which never fully exits. In order to allow the Rewriter's Rewrite() method to define a deferred function which will always be called when it returns, under any circumstances, we add a Complete() method for the PercentageTask structure which sets the number of completed elements to the expected total in an atomic swap, and then closes the updates channel if the number of completed elements was previously less than the expected total. We also add a test to validate this new method's behaviour, and update the existing TestPercentageTaskCallsDoneWhenComplete() function to also confirm that calling the new Complete() method after the number of completed elements has reached the expected total does not cause a second attempt to close the updates channel.
In a prior commit in this PR, we added a Complete() method for the PercentageTask structure which sets the number of completed elements of the task to the task's expected total in an atomic swap, and then closes the task's updates channel if the number of completed elements was less than the expected total. As noted in that same commit, the PercentageTask structure is used in the methods of the Rewriter structure from the "git/githistory" package. It is initialized with the number of commits to be rewritten during a "git lfs migrate" command's traversal of a Git repository's history. As each commit is rewritten, the Count() method is called to increment the percentage completed value. When every commit has been rewritten, the count reaches the expected total, and the Count() method closes the channel, thus allowing the task receiving updates to finish and exit its goroutine. Under exceptional circumstances, though, the Rewrite() method of the Rewriter structure may never finish iterating through all the expected commits, and return in error or via a panic call. When this happens, the anonymous goroutine started by the consume() method of the Logger structure in "tasklog" package that is waiting on the PercentageTask's updates channel can never exit, leading to a hung process which never fully exits. We can resolve this problem now by defining a deferred call to the PercentageTask's new Complete() method after we create the task structure in the Rewrite() method of the Rewriter structure. As a result, even if the Rewrite() method suffers a panic() or other error condition causing an early return, the PercentageTask's updates channel is closed and the waiting goroutine will complete and exit normally. This usage corresponds to the resource management approach used elsewhere, such as in the scanAll() and pointersToFetchForRefs() functions of the "git lfs fetch" command, where SimpleTask objects are created and a deferred call to their Complete() method is immediately established. This technique is also used in the UpdateRefs() method of the refUpdater structure in the "git/githistory" package, where a ListTask is created and a matching deferred call to its Complete() method is then defined.
In a prior commit in this PR, we added a Complete() method for the PercentageTask structure which sets the number of completed elements of the task to the task's expected total in an atomic swap, and then closes the task's updates channel if the number of completed elements was less than the expected total. As noted in that same commit, this allows us to establish a deferred call to the Complete() method after creating a PercentageTask structure, akin to how we handle other "tasklog" package task structures such as ListTask and WaiterTask. Use of this idiom ensures that even if the calling function returns due to an unexpected error or panic condition, the anonymous goroutine started by the consume() method of the Logger structure that is waiting on the task's updates channel will exit cleanly, preventing a hung process. We therefore now update the other use of the PercentageTask structure, in the "git lfs prune" command's pruneDeleteFiles() function, to follow the same approach as we have already implemented for the Rewrite() method of the Rewriter structure in the "git/githistory" package.
In the implementation of several commands we create structures with the Task interface from the "tasklog" package, and enqueue them to be reported by an anonymous goroutine created by the Logger structure's consume() method for each task. These goroutines only exit when the channel returned by the the Task's Updates() method is closed. In order to ensure that we always close the channel associated with each task, we refactor the creation and use of these tasks into small utility functions. We can then establish a deferred call to the tasks' Complete() methods immediately after creating the tasks. As these deferred calls will always be executed when returning from the new utility functions, even under exceptional conditions, the waiting goroutines will never fail to exit and thus cause the entire program to hang. This usage corresponds to the resource management approach used elsewhere, such as in the scanAll() and pointersToFetchForRefs() functions of the "git lfs fetch" command, where SimpleTask objects are created and a deferred call to their Complete() method is immediately established. This technique is also used in the UpdateRefs() method of the refUpdater structure in the "git/githistory" package, where a ListTask is created and a matching deferred call to its Complete() method is then defined, and in previous commits in this PR we updated the locations in our code where we create PercentageTask structures to follow the same pattern too.
bk2204
approved these changes
May 30, 2023
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Jun 1, 2023
Support for Git macro attributes was added in a series of commits in PR git-lfs#3391, including commit 9d3e52d, where the Line structure of the "git/gitattr" package was updated to include a Macro element which would be set non-nil for macro definition lines in a .gitattributes file file, while the existing Pattern element would be set non-nil for all other lines. The "git lfs track" command, among others, was then adjusted to create a MacroProcessor structure (from the same "git/gitattr" package) and call its ProcessLines() method to resolve any macro references and thus convert the "raw" parsed Line structures into a set for which the Pattern element was always non-nil, and no Macro elements appeared. Later, the "git lfs fsck" command gained the ability to process macro definitions in .gitattributes files, in PR git-lfs#4525. However, the "git lfs migrate import" command was not adjusted, specifically in the implementation of its "--fixup" option, which initializes a Tree structure (also of the "git/gitattr" package) for the root tree of each commit in a repository's history using the package's New() function. This function traverses all the trees in the hierarchy and finds and parses all the .gitattributes files in them. Then, when the command visits each file within the commit's tree using the Rewrite() method of the Rewriter structure in the "git/githistory" package, it calls the (*Tree).Applied() method to match the file's path against any applicable Git attributes, to see if the file should be treated as a Git LFS object. This lack of support for macro attributes in the "git lfs migrate import --fixup" command was then propagated to the "git lfs migrate info --fixup" command in commit 4800c5e of PR git-lfs#4501, when the "git lfs migrate info" command was updated to respect the --fixup option. As a result, both of these commands (when used with the --fixup option) would panic if they encountered a .gitattributes file with any macro definition, as they would call the (*Tree).Applied() method and it would attempt to access the nil Pattern element of the lines with non-nil Macro elements. (Prior to the changes in commit c374d1f of PR git-lfs#5375 the "git lfs migrate import --fixup" command would then stall indefinitely, but it now also exits after the panic condition.) These problems were reported in issue git-lfs#5332. To resolve this problem and avoid similar ones in the future, we refactor the Line structure into a Line interface, which only provides a Attrs() method to retrieve a slice of Attr attributes, and no other methods. We then also define two additional interfaces, each of which embeds the Line interface, PatternLine and MacroLine, with corresponding getter methods for their respective elements. The ParseLine() function of the "git/gitattr" package now returns a slice of generic Line types, each of which is either a PatternLine or a MacroLine, but never both. Callers like the Applied() method of the Tree structure therefore need to perform type assertions or switches to determine which type of Line they are handling, which ensures they always access the line's data through safe methods. We then update the Go tests for the "git/gitattr" package as appropriate, and also add two tests each to the t/t-migrate-fixup.sh and t/t-migrate-import.sh test suites. All four of these new shell tests fail without the changes in this commit. In particular, several of these tests make sure to run the "git lfs migrate" commands outside of any shell pipeline so the test will fail if the command panics and produces no output, even if no output is the expected condition for a successful execution of the command.
chrisd8088
added a commit
to chrisd8088/git-lfs
that referenced
this pull request
Jun 1, 2023
Support for Git macro attributes was added in a series of commits in PR git-lfs#3391, including commit 9d3e52d, where the Line structure of the "git/gitattr" package was updated to include a Macro element which would be set non-nil for macro definition lines in a .gitattributes file file, while the existing Pattern element would be set non-nil for all other lines. The "git lfs track" command, among others, was then adjusted to create a MacroProcessor structure (from the same "git/gitattr" package) and call its ProcessLines() method to resolve any macro references and thus convert the "raw" parsed Line structures into a set for which the Pattern element was always non-nil, and no Macro elements appeared. Later, the "git lfs fsck" command gained the ability to process macro definitions in .gitattributes files, in PR git-lfs#4525. However, the "git lfs migrate import" command was not adjusted, specifically in the implementation of its "--fixup" option, which initializes a Tree structure (also of the "git/gitattr" package) for the root tree of each commit in a repository's history using the package's New() function. This function traverses all the trees in the hierarchy and finds and parses all the .gitattributes files in them. Then, when the command visits each file within the commit's tree using the Rewrite() method of the Rewriter structure in the "git/githistory" package, it calls the (*Tree).Applied() method to match the file's path against any applicable Git attributes, to see if the file should be treated as a Git LFS object. This lack of support for macro attributes in the "git lfs migrate import --fixup" command was then propagated to the "git lfs migrate info --fixup" command in commit 4800c5e of PR git-lfs#4501, when the "git lfs migrate info" command was updated to respect the --fixup option. As a result, both of these commands (when used with the --fixup option) panic if they encounter a .gitattributes file with any macro definitions, as they call the (*Tree).Applied() method and it attempts to access the nil Pattern element of the lines with non-nil Macro elements. (Prior to the changes in commit c374d1f of PR git-lfs#5375 the "git lfs migrate import --fixup" command would then stall indefinitely, but it now also exits after the panic condition.) These problems were reported in issue git-lfs#5332. To resolve this problem and avoid similar ones in the future, we refactor the Line structure into a Line interface, which only provides an Attrs() method to retrieve a slice of Attr attributes, and no other methods. We then also define two additional interfaces, each of which embeds the Line interface, PatternLine and MacroLine, with corresponding getter methods for their respective elements. The ParseLine() function of the "git/gitattr" package now returns a slice of generic Line types, each of which is either a PatternLine or a MacroLine, but never both. Callers like the Applied() method of the Tree structure therefore need to perform type assertions or switches to determine which type of Line they are handling, which ensures they always access the line's data through safe methods. We then update the Go tests for the "git/gitattr" package as appropriate, and also add two tests each to the t/t-migrate-fixup.sh and t/t-migrate-import.sh test suites. All four of these new shell tests fail without the changes in this commit. In particular, several of these tests make sure to run the "git lfs migrate" commands outside of any shell pipeline so the test will fail if the command panics and produces no output, even if no output is the expected condition for a successful execution of the command.
This was referenced Jun 1, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR revises our handling of structures that implement the
Taskinterface of thetasklogpackage so that in all cases we always immediately register a deferred call to theirComplete()method after allocating and initializing a new structure. This ensures that the channels created for the structures are always closed, which preventsLoggerinstances waiting on those channels from hanging indefinitely following a panic or other exceptional error condition.As well, we simplify some instances of the construction of these types of structures, and clean up some details in the implementation of the
tasklogpackage.In the case of the
PercentageTaskstructure, we first add a newComplete()method, which does not currently exist (although it was discussed in #2757 (comment), in relation to thegit-lfs-prune(1)command). We can then register a call to our new*PercentageTask.Complete()method withdeferfollowing the creation of aPercentageTaskstructure, such as occurs in theRewrite()method of theRewritestructure in thegit/githistorypackage.This particular use case resolves one of the problems reported in #5332, where after a panic occurs (due to an unrelated bug while processing Git macro attributes) in the Git history rewriting phase of a
git lfs migrate import --fixupcommand, the command hangs because the deferred*Logger.Close()call of the "main" goroutine is stuck waiting on a semaphore which will never decrement to zero. Under normal circumstances the*Rewriter.Rewrite()method would iterate through all the commits selected for rewriting, and so the final call to the*PercentageTask.Count()would cause the task's channel to be closed, thus notifying the anonymous function that is waiting on the channel that is should cease waiting and begin to exit. But if the*Rewriter.Rewrite()method panics during its processing of commits, it never closes the channel of thePercentageTask. By adding an explicitly deferred call to our newComplete()method for that task, though, we can guarantee that the channel will always be closed by the time the*Rewriter.Rewrite()method returns.(The macro attribute processing bug from #5332, and the discrepancy between how the
importandinfosubcommands of thegit-lfs-migrate(1)command manage the cleanup of their*Loggerstructures, will be dealt with in subsequent PRs.)Also, as in the case of the
SimpleTaskstructure used by thegit-lfs-prune(1)command'sprune()function, we refactor our code so as to introduce small utility functions which encapsulate the initialization and cleanup of structures implementing theTaskinterface. This allows us to always establish a deferred call to the structure'sCleanup()method immediately after initializing and enqueuing the structure, so the return from the utility function guarantees the structure's channel is closed in all instances.This pattern was discussed in #2757 (comment) as a desirable one to follow for all resource allocation and deallocation, and corresponds to the resource management approach used elsewhere, such as in the
UpdateRefs()method of therefUpdaterstructure in thegit/githistorypackage, where aListTaskis created and a matching deferred call to itsComplete()method is then defined, and in thescanAll()andpointersToFetchForRefs()functions of thegit-lfs-fetch(1)command, whereSimpleTaskstructures are created and a deferred call to theirComplete()method is immediately established. (Note, though, that we refine these functions by making use of the*Logger.Simple()method, which was introduced in commit 0cad488 of PR #2767 but is not used anywhere at present.)This PR also fixes some minor issues relating to the
tasklogpackage, removing unused or unnecessary code and correcting code comments, and simplifying the initialization of someSimpleTaskstructures with the*Logger.Simple()method, as mentioned above.One notable change is the removal of the
pendingandnextvariables and the code to populate and process them from the*Logger.consume()method of thetasklogpackage. This method was introduced in the initial form of thetasklogpackage in commit e90d54d of PR #2329, and included the internalpendingslice and the logic to push and pop tasks from it in much the same form as that code still exists today.However, that logic which manages the
pendingslice is never exercised because tasks are only appended in a conditional block which executes if thenextvariable is non-nil, but that variable is initialized asniland can only be set to another value if thependingslice is non-empty. As a result, neither condition is ever true, and so only the in the conditional block for when thenextvariable isnilever executes, and we can remove the rest.It may be easiest to review this PR commit-by-commit, as each commit has a detailed description and should be logically independent and bisectable, but it should also be possible to review this PR as a single diff.
/cc #5332 where the
git lfs migrate import --fixuphang behaviour was reported./cc @ttaylorr for insight on the design of the
tasklogand its buffering logic.