Modeling Real-World Versions #1997
Description
This is, in a way, a continuation of #1975. In #1975, I set out to find a way to get what we need without coding special versions such as develop into the system. That thread morphed into discussions of how we can double-down on the special versions, including changing the title to the OPPOSITE of what I originally sought. Hence I'm starting a new Issue. I would respectfully ask the following in this discussion:
- Please keep an open mind as we look at example problems and solutions; we need to free ourselves from bias toward how things are currently, or what would be super-easy to implement in the current system. Those aspects are important later on, but not now. At this point, I would like to come up with a model that we think WORKS, in the sense of modeling the real world.
- Please separate semantics from syntax. We might think up new ideas of how things MIGHT work, without yet understanding how the user should specify them.
- Please avoid jumping to comparisons or shooting things down until we have a clear idea of what we're discussing. Do not start shooting something down just because it's different. New designs sometimes need time to solidify before being evaluated.
- Please address the goals, issues and designs brought up in this thread. New ideas are welcome. Reiteration of the ideas in #1983 ("extend Version class so that 2.0 > 1.develop > 1.1 and develop > master > head > trunk > 9999") is not, because that idea has already been discussed and we already know how it works.
Example Versioning Schemes
Example 1
It's been mentioned before that not all packages use the same versioning scheme. To illustrate that point, let me bring up a couple of examples. One well-known piece of software uses the versions: Cheetah < Puma < Jaguar < Panther < Tiger < Leopard < Snow Leopard < Lion < Mountain Lion < Mavericks < Yosemite < El Capitan < Sierra. Within each of these major versions are sub-versions.
But you say... those are just aliases for underlying "regular" version numbers. True. But people frequently refer to versions by the alias. This scheme could be well represented using version numbers plus aliases.
Conclusion: People like version aliases.
NOTE: Aliases could replace the current preferred=True flag.
Another similar example is Ubuntu versions, the latest of which is Yakkety Yak. These are also aliases for a well-ordered set of numeric versions. Recent versions sort alphabetically, but not the earliest versions. (4.10 = Warty Warthog but 5.04 = Hoary Hedgehog).
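As a rough sketch of "version numbers plus aliases," an alias could be a per-package lookup that maps a name to its numeric version before any comparison happens. The table and the helper names (`UBUNTU_ALIASES`, `resolve_alias`, `version_key`) are hypothetical and not part of Spack; only the two alias/number pairs come from the example above.

```python
# Hypothetical per-package alias table; only these two pairs are taken
# from the discussion above.
UBUNTU_ALIASES = {
    "Warty Warthog": "4.10",
    "Hoary Hedgehog": "5.04",
}


def resolve_alias(name, aliases):
    """Return the numeric version for an alias, or the input unchanged."""
    return aliases.get(name, name)


def version_key(version_string):
    """Turn '5.04' into (5, 4) so comparison is numeric, not alphabetical."""
    return tuple(int(part) for part in version_string.split("."))


warty = version_key(resolve_alias("Warty Warthog", UBUNTU_ALIASES))
hoary = version_key(resolve_alias("Hoary Hedgehog", UBUNTU_ALIASES))
print(warty < hoary)                         # ordering by numeric version
print("Warty Warthog" < "Hoary Hedgehog")    # ordering by alias text differs
```

Ordering is decided only by the underlying number, so the alias text never has to sort correctly on its own.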
Example 2
Another well-known piece of software uses the versions: 3.1 < 95 < 98 < 98SE < 98SE sp3 < Me < XP < Vista < 7 < 8 < 8.1 < 10. This is a little simplistic... it's actually two product lines. Let's try again:
Product line DOS: 3.1 < 95 < 98 < 98SE < 98SE sp3 < Me
Product line NT: 3.1 < 3.5 < 3.51 < 4.0 < XP < ...etc.
Apparently, releases in both product lines all have "regular" numeric identifiers. Thus, as in Example 1, the trade names are also aliases for underlying numeric versions. But we have a new wrinkle here: these versions actually come from two separate "branches," with no specific ordering between those branches. For example, between DOS-98 and NT-3.51, we can't say which one is greater or lesser.
Conclusion: Versions are partially ordered, not fully ordered. Spack needs to deal with it gracefully, rather than forcing versions to be fully ordered.
One possible way to deal with the partial ordering (in some cases at least) is to define a Version as a tuple of (label, version). The idea here is that Versions with the same label are fully ordered; and Versions with different labels are non-comparable. In this case, there seem to be two labels (DOS and NT).
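A minimal sketch of the (label, version) idea might look like the following. The class and enum names (`LabeledVersion`, `Ordering`) are made up for illustration, and the numeric part is assumed to be a tuple of integers; this is one possible reading of the proposal, not an existing Spack API.

```python
from enum import Enum


class Ordering(Enum):
    LESS = "less"
    EQUAL = "equal"
    GREATER = "greater"
    INCOMPARABLE = "incomparable"


class LabeledVersion:
    """A version as a (label, parts) pair; hypothetical, for illustration."""

    def __init__(self, label, parts):
        self.label = label
        self.parts = tuple(parts)

    def compare(self, other):
        # Different labels: no ordering is defined between them.
        if self.label != other.label:
            return Ordering.INCOMPARABLE
        # Same label: fall back to ordinary tuple comparison.
        if self.parts == other.parts:
            return Ordering.EQUAL
        return Ordering.LESS if self.parts < other.parts else Ordering.GREATER


nt_3_5 = LabeledVersion("NT", (3, 5))
nt_3_51 = LabeledVersion("NT", (3, 51))
dos_3_1 = LabeledVersion("DOS", (3, 1))

print(nt_3_5.compare(nt_3_51))   # same label, so the numeric parts decide
print(dos_3_1.compare(nt_3_51))  # different labels, so no order is defined
```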
Example 3: Git
Now consider any piece of software residing in a Git repo. Each Git hash is potentially a version that can be installed, arranged in a partial order. But given two hashes themselves (with no additional information), there is no way to determine their relationship in that partial order. The algorithm to determine that relationship is specific to each package (in the sense that it needs information from that package's Git repo).
Conclusion: Partial ordering on versions is package-specific, not universal to Spack. So are version aliases, for that matter.
Maybe labels could correspond to Git branches; meaning, a git hash N is labeled by a particular branch X if N is reachable from X.
Now suppose that certain hashes within this Git repo are actually releases (say, they are tagged; or we have some other way to know which Git hash corresponds to which release number). So... we have 1.1 at one place in the Git repo, 1.2 at another place, and 2.0 somewhere else still. Let's give these all the RELEASE label. We would like to have RELEASE-1.1 < RELEASE-1.2 < RELEASE-2.0. This will not always be the case within the Git repo itself.
Conclusion: Partial order cannot be determined JUST from the raw Git repo itself. It needs annotation by the Spack package author. Given a labelled release (say 2.0) and a random Git hash, it may or may not be easy to come up with an algorithm that places the Git hash within a useful partial order with respect to the labelled release.
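For the raw Git part of this, one possible approach is to ask Git whether one commit is reachable from the other. The sketch below uses `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second. The function names and the string results are assumptions for illustration, and nothing here handles the release annotations discussed above.

```python
import subprocess


def is_ancestor(repo_dir, older, newer):
    """True if commit `older` is reachable from commit `newer`.

    Any non-zero exit status (including Git errors such as an unknown
    hash) is treated as "not an ancestor" in this sketch.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", older, newer],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def compare_commits(a, b, ancestor_test):
    """Place two commits in the partial order given by reachability.

    ancestor_test(older, newer) is any callable answering reachability,
    for example a wrapper around is_ancestor() bound to one repository.
    """
    if a == b:
        return "EQUAL"
    if ancestor_test(a, b):
        return "LESS"
    if ancestor_test(b, a):
        return "GREATER"
    return "INCOMPARABLE"  # the commits sit on diverged branches
```

Because the reachability test needs a specific repository, a comparison like this is inherently tied to one package, which matches the conclusion in Example 3.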
Example 4: Floating Versions
We've begun to discuss floating versions. If those versions come out of a Git repo and we have a good handle on versions in that repo (see Example 3 above), then floating versions are not so bad. If I install head of the master branch, then Spack could convert to a version identified by Git hash, and then do whatever it needs there.
Example 5: Uses of Version Comparison
AFAIK, Version ordering is used in (at least) two places: concretization and package conditionals. In package conditionals, it is common to have code like "if version between 3.1 and 4.5.2, then apply this patch...".
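To make the package-conditional case concrete under a partial order, here is a small sketch of a range check. It assumes versions are (label, parts) tuples as suggested in Example 2, and it assumes an incomparable version simply fails the check. Both are illustrative choices, and `satisfies_range` is a hypothetical helper, not Spack code.

```python
def satisfies_range(version, low, high):
    """Return True if low <= version <= high within a single label.

    Each argument is a (label, parts) tuple, e.g. ("RELEASE", (3, 1)).
    """
    label, parts = version
    if label != low[0] or label != high[0]:
        # Incomparable: assume the conditional does not apply.
        return False
    return low[1] <= parts <= high[1]


low = ("RELEASE", (3, 1))
high = ("RELEASE", (4, 5, 2))

print(satisfies_range(("RELEASE", (4, 0)), low, high))  # inside the range
print(satisfies_range(("RELEASE", (4, 6)), low, high))  # above the range
print(satisfies_range(("master", (4, 0)), low, high))   # different label
```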
Concretization is a bit different: we want to select the "greatest" version, subject to certain constraints. In particular, unless the user specifies otherwise, release versions are to be preferred, EVEN if there are other available versions that are "greater."
Conclusion: Version number comparison is not the same as priority in the concretization algorithm.
Maybe labels can help us distinguish between release and non-release versions. A "RELEASE" label applied to ONLY released versions would allow for this.
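One way to picture the difference between comparison and concretization priority: selection first filters by label, and only then looks for the greatest version. The sketch below assumes (label, parts) tuples and a hypothetical `choose_version` helper; the "master" entry is an invented non-release version for the example.

```python
def choose_version(candidates, satisfies=lambda v: True, label="RELEASE"):
    """Pick the greatest candidate carrying `label` that meets the constraint.

    candidates: iterable of (label, parts) tuples, e.g. ("RELEASE", (1, 2)).
    Returns None when nothing qualifies.
    """
    pool = [v for v in candidates if v[0] == label and satisfies(v)]
    return max(pool, key=lambda v: v[1]) if pool else None


available = [
    ("RELEASE", (1, 1)),
    ("RELEASE", (1, 2)),
    ("RELEASE", (2, 0)),
    ("master", (2, 1)),  # hypothetical non-release version
]

# Default: releases win even though a "greater" non-release exists.
print(choose_version(available))
# The user can add a constraint, or ask for another label explicitly.
print(choose_version(available, satisfies=lambda v: v[1] < (2,)))
print(choose_version(available, label="master"))
```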
I think we're still a ways from a complete versioning scheme that models the way real-world versions really work. But I would suggest the following places to start from in thinking about designs and solutions:
- Version comparison needs to reflect a partial order API, not a full order API. So we can't use the regular __lt__() any more. Instead, we need something like:
    class Version(object):
        def compare(self, other):
            """Returns the smaller Version: SELF, OTHER, EQUAL or INCOMPARABLE"""
- Since version schemes seem to be package-specific, that should be reflected in the Spack API. This will give us the flexibility we need to implement different versioning schemes, partial orders, etc. For example:
a) Versions should maintain a reference to the Package to which they are related.
b) New versions are constructed off of a Package (eg: Package.new_version() method).
c) Comparison of Versions between packages throws an exception.
d) Comparison of Versions belonging to the same package is delegated to a method on the Package. Eg:
    class Version(object):
        def compare(self, other):
            if self.package != other.package:
                raise Exception(....)
            return self.package.compare_versions(self, other)
Of course, delegating this to packages doesn't solve the problem of how version schemes should work for 99% of packages that will use the standard scheme. But it does give us flexibility for the corner cases. More importantly, it allows the standard scheme to do package-specific things (based on, for example, what it finds in package.py or the Git repo).
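To show how points (a) through (d) could fit together, here is a self-contained sketch. Everything beyond `compare`, `compare_versions` and `new_version` is an assumption made for the example: the `Smaller` enum, the `sort_key` hook, and the `NamedReleasePackage` subclass are hypothetical. Neither scheme shown here ever returns INCOMPARABLE; a Git-backed package would.

```python
from enum import Enum


class Smaller(Enum):
    """Which operand is smaller, following the compare() docstring above."""
    SELF = "self"
    OTHER = "other"
    EQUAL = "equal"
    INCOMPARABLE = "incomparable"


class Version:
    def __init__(self, package, value):
        self.package = package  # (a) each Version remembers its Package
        self.value = value

    def compare(self, other):
        # (c) comparing across packages is an error
        if self.package is not other.package:
            raise ValueError("cannot compare versions of different packages")
        # (d) the package decides how its versions are ordered
        return self.package.compare_versions(self, other)


class Package:
    """Standard scheme: dotted numeric versions, fully ordered."""

    def new_version(self, value):  # (b) versions are built off a Package
        return Version(self, value)

    def sort_key(self, version):
        return tuple(int(p) for p in version.value.split("."))

    def compare_versions(self, a, b):
        key_a, key_b = self.sort_key(a), self.sort_key(b)
        if key_a == key_b:
            return Smaller.EQUAL
        return Smaller.SELF if key_a < key_b else Smaller.OTHER


class NamedReleasePackage(Package):
    """Package-specific scheme: order given by an explicit list of names."""
    release_order = ["Cheetah", "Puma", "Jaguar", "Panther", "Tiger"]

    def sort_key(self, version):
        return (self.release_order.index(version.value),)


numeric = Package()
cats = NamedReleasePackage()
print(numeric.new_version("1.2").compare(numeric.new_version("2.0")))
print(cats.new_version("Tiger").compare(cats.new_version("Puma")))
```

The standard scheme lives in the base class, so most packages would inherit it unchanged and only the corner cases would override the ordering.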
- There probably needs to be some notion of "resolving" versions. If I ask to install "head of the master branch," Spack should be able to resolve that to a SINGLE version within its well-defined partial ordering (e.g. the git hash representing head of master at that time). If we can do useful comparisons with Git hashes, then the moving branch problem becomes pretty easy.
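A sketch of what "resolving" might mean in practice: a floating request such as a branch name is pinned to the commit it points at when the request is made, and everything downstream works with that fixed hash. `git rev-parse --verify` is a real Git command; the function names, the `lookup` parameter and the returned dictionary are assumptions for illustration.

```python
import subprocess


def git_rev_parse(repo_dir, ref):
    """Ask Git which commit `ref` (branch, tag or hash) currently names."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", ref + "^{commit}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


def resolve_version(requested, lookup):
    """Pin a floating request to a single commit.

    lookup(ref) returns a commit hash. With a real repository it could be
    functools.partial(git_rev_parse, "/path/to/repo"), where the path is a
    placeholder.
    """
    return {"requested": requested, "commit": lookup(requested)}


# Stand-in for a repository: branch name -> current head (made-up hashes).
heads = {"master": "abc123"}
pinned = resolve_version("master", heads.__getitem__)

heads["master"] = "def456"   # the branch moves on afterwards
print(pinned["commit"])      # the resolved version stays where it was
```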