Python identifiers, PEP 8, and consistency
While there are few rules on the names of variables, classes, functions, and so on (i.e. identifiers) in the Python language, there are some guidelines on how those things should be named. But, of course, those guidelines were not always followed in the standard library, especially in the early years of the project. A suggestion to add aliases to the standard library for identifiers that do not follow the guidelines seems highly unlikely to go anywhere, but it led to an interesting discussion on the python-ideas mailing list.
To a first approximation, a Python identifier can be any sequence of Unicode code points that correspond to characters, but it cannot start with a numeral nor be the same as one of the 35 reserved keywords. That leaves a lot of room for expressiveness (and some confusion) in those names. There is, however, PEP 8 ("Style Guide for Python Code") that has some naming conventions for identifiers, but the PEP contains a caveat: "The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent".
But consistency is just what Matt del Valle was after when he proposed making aliases for identifiers in the standard library that do not conform to the PEP 8 conventions. The idea came to him while reading the documentation for the threading module in the standard library, which has a note near the top about the camel-case function names in the module being deprecated in favor of others that are in keeping with the guidelines in PEP 8. The camel-case names are still present, but were deprecated in Python 3.10 in favor of names that are lower case, sometimes with underscores (e.g. threading.current_thread() instead of threading.currentThread()).
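The two spellings can be compared directly. The sketch below calls the PEP 8 name and, if the camel-case alias is still present in the running interpreter, calls that too while recording any warnings. The function name `call_both_names` is invented for this example, and the alias is looked up with `getattr()` on the assumption that it might be removed in some future release.

```python
import threading
import warnings


def call_both_names():
    """Call the PEP 8 name and, if still present, the camel-case alias."""
    new_style = threading.current_thread()
    # Look the old name up defensively in case a later release drops it.
    old_func = getattr(threading, "currentThread", None)
    if old_func is None:
        return new_style, None, []
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        old_style = old_func()
    return new_style, old_style, [w.category.__name__ for w in caught]


new_style, old_style, categories = call_both_names()
print(new_style is old_style)  # both names should hand back the same Thread
print(categories)              # warning categories raised by the old name, if any
```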
The PEP
PEP 8 suggests that function names "should be lowercase, with words separated by underscores as necessary to improve readability", which is what the changes for threading do. In addition, the PEP says that names for variables, methods, and arguments should follow the function convention, while types and classes should use camel case (as defined by the PEP, which includes an initial capital letter, unlike other camel-case definitions out there). Del Valle called that form of capitalization "PascalCase" and noted that there are various inconsistencies in capitalization in the standard library:
I realize that large chunks of the stdlib predates pep8 and therefore use various non-uniform conventions. For example, the logging module is fully camelCased, and many core types like `str` and `list` don't use PascalCase as pep8 recommends. The `collections` module is a veritable mosaic of casing conventions, with some types like `deque` and `namedtuple` being fully lowercased while others like `Counter` and `ChainMap` are PascalCased.
Given the precedent in threading, he wondered if it would be feasible to "add aliases across the board for all public-facing stdlib types and functions that don't follow pep8-recommended casing". The "wart" of inconsistent naming conventions in his code bothers him, perhaps more than it should, he said, but he thought others might feel similarly, which could perhaps lead to the problem being solved rather than endured. Beyond that, though, it makes it somewhat more difficult to teach good practices in the language:
I always try to cover pep8 very early to discourage people I'm training from internalizing bad habits, and it means you have to explain that the very standard library itself contains style violations that would get flagged in most modern code reviews, and that they just have to keep in mind that despite the fact that the core language does it, they should not.
Reactions
Overall, the reception was rather chilly, though not universally so. The commenters generally acknowledged that there are some unfortunate inconsistencies, but the pain of making a change like what he proposed is too high for the value it would provide. Eric V. Smith put it this way:
The cost of having two ways to name things for the indefinite future is too high. Not only would you have to maintain it in the various Python implementations, you'd have to explain why code uses "str" or "Str", or both.
Among Del Valle's suggested changes were aliasing the "type functions" to their PascalCase equivalents (e.g. str() to Str()), as Smith mentions. But that would be a fundamental change with no real upside and a high cost, Smith said. Mike Miller agreed with that, but wondered if there might be some middle ground, noting some common confusion with the datetime module:
One of my biggest peeves is this:
import datetime # or
from datetime import datetime
Which is often confusing... is that the datetime module or the class someone chose at random in this module? A minor thorn that… just doesn't go away.
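The ambiguity Miller describes can be made concrete. In the sketch below the module and the class are imported under distinct, made-up local names (`datetime_module`, `datetime_class`) purely so that both can be shown side by side; the date used is arbitrary.

```python
import datetime as datetime_module               # the module
from datetime import datetime as datetime_class  # the class inside it

# With a plain `import datetime`, the class is reached through the module:
moment_a = datetime_module.datetime(2021, 11, 1)
# With `from datetime import datetime`, the bare name is the class itself:
moment_b = datetime_class(2021, 11, 1)

print(moment_a == moment_b)
print(datetime_module.datetime is datetime_class)
# The bare name `datetime` therefore means different things under each import
# style: `datetime.now()` works with one, `datetime.datetime.now()` with the other.
```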
Neil Girdhar also thought that changing str() and friends was "way too ambitious. But some minor cleanup might not be so pernicious?"
On the other hand, Jelle Zijlstra brought some first-hand experience with changes of this sort to the discussion. He had worked on explicitly deprecating (i.e. with DeprecationWarning) some of the camel-case identifiers in the threading module; "in retrospect I don't feel like that was a very useful contribution. It just introduces churn to a bunch of codebases and makes it harder to write multiversion code."
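For readers unfamiliar with the pattern, an explicitly deprecated alias is typically a thin wrapper that warns and then forwards to the new name. The following is a minimal sketch under that assumption, not the code used in the standard library; `deprecated_alias`, `current_widget`, and `currentWidget` are hypothetical names.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns, then forwards to `new_func`.

    Hypothetical helper for illustration; not part of the standard library.
    """
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated, use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return new_func(*args, **kwargs)
    return wrapper


def current_widget():  # hypothetical PEP 8 style name
    return "widget"


# The old camel-case spelling keeps working, but now warns on every call.
currentWidget = deprecated_alias(current_widget, "currentWidget")
```

Code that has to run without warnings on releases from both before and after such a change must either pick the name at run time or filter the warning, which is presumably the kind of churn Zijlstra had in mind.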
Chris Angelico had a number of objections to Del Valle's ideas, but existing code that already reuses the names of some of the identifiers is particularly problematic:
Absolutely no value in adding aliases for everything, especially things that can be shadowed. It's not hugely common, but suppose that you deliberately shadow the name "list" in your project - now the List alias has become disconnected from it, unless you explicitly shadow that one as well. Conversely, a much more common practice is to actually use the capitalized version as a variant:
class List(list):
    ...
This would now be shadowing just one, but not the other, of the built-ins. Confusion would abound.
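The second case Angelico describes can be simulated without touching the real built-ins. In this sketch the alias is created at module level as a stand-in for a hypothetical built-in `List`; the `first()` method is invented only to give the subclass some content.

```python
# Hypothetical: pretend a capitalized alias had been added next to the built-in.
List = list
assert List is list  # at this point alias and original are the same object


# A project that already defines its own `List` subclass replaces the alias...
class List(list):
    """Project-specific list variant (illustrative only)."""

    def first(self):
        return self[0] if self else None


# ...so the two spellings no longer refer to the same type.
names_differ = List is not list
print(names_differ)
print(isinstance([1, 2], List), isinstance(List([1, 2]), list))
```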
Angelico, along with others in the thread, pointed to the first section of PEP 8, which is titled "A Foolish Consistency is the Hobgoblin of Little Minds" (from the Ralph Waldo Emerson quote). That section makes it clear that the PEP is meant as a guide; consistency is most important at the function and module level, with project-level consistency being next in line. Any of those is more important than rigidly following the guidelines. As Angelico put it: "When a style guide becomes a boat anchor, it's not doing its job."
Paul Moore had a more fundamental objection to aliasing the type functions, noting that the PEP does not actually offer clear-cut guidance. He quoted from the "Naming Conventions" section and showed how it led to ambiguity:
"""
Names that are visible to the user as public parts of the API should follow conventions that reflect usage rather than implementation.
"""
To examine some specific cases, lists are a type, but list(...) is a function for constructing lists. The function-style usage is far more common than the use of list as a type name (possibly depending on how much of a static typing advocate you are...). So "list" should be lower case by that logic, and therefore according to PEP 8. And str() is a function for getting the string representation of an object as well as being a type - so should it be "str" or "Str"? That's at best a judgement call (usage is probably more evenly divided in this case), but PEP 8 supports both choices. Or to put it another way, "uniform" casing is a myth, if you read PEP 8 properly.
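Moore's point about dual usage is easy to see in practice. The short sketch below uses `list` and `str` both in the function-like way and as type names; `total_length` is a made-up function that exists only to carry an annotation.

```python
# `list` and `str` are classes, but they are very often used like functions.
print(isinstance(list, type), callable(list))

letters = list("abc")  # function-style use: build a list from an iterable
text = str(3.5)        # function-style use: get a string representation


def total_length(items: list) -> int:  # type-style use: an annotation
    return sum(len(str(item)) for item in items)


print(letters, text, isinstance(letters, list), total_length(letters))
```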
But there are tools, such as the flake8 linter, that try to rigidly apply the PEP 8 "rules" to a code base; some projects enforce the use of these tools before commits can be made. But linters cannot really determine the intent of the programmer, so they are inflexible and are probably not truly appropriate as an enforcement mechanism. Moore said:
Unfortunately, this usually (in my experience) comes about through a "slippery slope" of people saying that mandating a linter will stop endless debates over style preferences, as we'll just be able to say "did the linter pass?" and move on. This of course ignores the fact that (again, in my experience) far *more* time is wasted complaining about linter rules than was ever lost over arguments about style :-(
Changes
Del Valle acknowledged that "some awkward shadowing edge-cases are the strongest argument against this proposal", but Angelico disagreed: "The strongest argument is churn - lots and lots of changes for zero benefit." Del Valle recognized that the winds were strongly blowing against the sweeping changes he had suggested, but in the hopes of "salvaging *something* out of it" he reduced the scope substantially: "Add pep8-compliant aliases for camelCased public-facing names in the stdlib (such as logging and unittest) in a similar manner as was done with threading".
While Ethan Furman was in favor of such a change, others who had also mentioned the inconsistencies in unittest and logging did not follow suit. Most who replied to Furman recommended switching to pytest instead of unittest for testing, though alternatives to logging were not really on offer.
Guido van Rossum had a succinct response to the idea: "One thought: No." That essentially put the kibosh on it (not formally, of course, but Van Rossum's opinion carries a fair amount of weight), so Del Valle withdrew it entirely. It is clear there was no groundswell of support for it, even in more limited guises, but the discussion touched on various aspects of the language and its history. It seems clear that if Python had been developed in one fell swoop, rather than being added to in a piecemeal fashion over decades, different choices would have been made. More (or even fully) consistent identifiers within the project's code base may well have been part of that.
But, at this point, it is far too late for a retrofit, at least in the eyes of many; even if everyone agreed on how to change things, the upheaval, code churn, and dual-naming would be messy. And the gain, while not zero, is not huge. Beyond that, the day when the inconsistent names could actually be removed is extremely distant (likely never, in truth). So users and teachers of the language will need to keep in mind some semi-strange inconsistencies in the darker corners; such warts exist in all programming (and other) languages. Humans are not consistent beasts, after all.
