Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy, and will likely be dropped five years after the last post in Jan 2023.

Showing posts with label python essentials. Show all posts
Showing posts with label python essentials. Show all posts

Tuesday, November 10, 2015

Formatting Strings and the str.format() family of functions -- Python 3.4 Notes

I have to be clear that I am obsessed with the str.format() family of functions. I've happily left the string % operator behind. I recently re-discovered the vars() function.

My current go-to technique for providing debugging information is this:

print( "note: local={local!r}, this={this!r}, that={that!r}".format_map(vars)) )

I find this to be handy and expressive. It can be replaced with logging.debug() without a second thought. I can readily expand what's being dumped because all locals are provided by vars().

I also like this as a quick and dirty starting point for a class:

def __repr__(self):
    return "{__class__.__name__}(**{state!r})".format(__class__=self.__class__, state=vars(self))

This captures the name and state. But. There are nicer things we can do. One of the easiest is to use a helper function to reformat the current state in keyword parameter syntax, like this:

def args(obj):
    return ", ".join( "{k}={v!r}".format(k=k,v=v) for k,v in vars(obj).items())

This allows us to dump an object's state in a slightly nicer format. We can replace vars(self) with args(self) in our __repr__ method. We've dumped the state of an object with very little class-specific code. We can focus on the problem domain without having to wrestle with Python considerations.

Format Specifications

The use of !r for formatting is important. I've (frequently) messed up and used things like :s where data might be None. I've discovered that -- starting in Python 3.4 -- the :s format is unhappy with None objects. Here's the exhaustive enumeration of cases. 

>>> "{0} {1}".format("s",None)
's None'
>>> "{0:s} {1:s}".format("s",None)
Traceback (most recent call last):
  File "", line 1, in 
    "{0:s} {1:s}".format("s",None)
TypeError: non-empty format string passed to object.__format__
>>> "{0!s} {1!s}".format("s",None)
's None'
>>> "{0!r} {1!r}".format("s",None)
"'s' None"

Many things are implicitly converted to strings. This happens in a lot of places. Python is riddled with str() function evaluations. But they aren't everywhere. Python 3.3 had one that was removed for Python 3.4 and up.

Bottom Line: be careful where you use :s formatting.  It may do less than you think it should do.

Tuesday, July 21, 2015

A Surprising Confusion

Well, it was surprising to me.

And it should not have been a surprise.

This is something I need to recognize as a standard confusion. And rewrite some training material to better address this.

The question comes up when SQL hackers move beyond simple queries and canned desktop tools into "Big Data" analytics. The pure SQL lifestyle (using spreadsheets, or Business Objects, or SAS) leads to an understanding of data that's biased toward working with collections in an autonomous way.

Outside the SELECT clause, everything's a group or a set or some kind of collection. Even in spreadsheet world, a lot of Big Data folks slap summary expressions on the top of a column to show a sum or a count without too much worry or concern.

But when they start wrestling with Python for loops, and the atomic elements which comprise a set (or list or dict), then there's a bit of confusion that creeps in.

An important skill is building a list (or set or dict) from atomic elements. We'll often have code that looks like this:

some_list = []
for source_object in some_source_of_objects:
    if some_filter(source_object):
        useful_object = transform(source_object)
        some_list.append(useful_object)

This is, of course, simply a list comprehension. In some case, we might have a process that breaks one of the rules of using a generator and doesn't work out perfectly cleanly as a comprehension. This is somewhat more advanced topic.

The transformation step is what seems to causes confusion. Or -- more properly -- it's the disconnect between the transformation calculations on atomic items and the group-level processing to accumulate a collection from individual items.

The use of some_list.append() and some_list[index] and some_list is something that folks can't -- trivially -- articulate. The course material isn't clarifying this for them. And (worse) leaping into list comprehensions doesn't seem to help.

These are particularly difficult to explain if the long version isn't clear.

some_list = [transform(source_object) for source_object in some_source_of_objects if some_filter(source_object)]

and

some_list = list( map(transform, filter(some_filter, some_source_of_objects)) )

I'm going to have to build some revised course material that zeroes in on the atomic vs. collection concepts. What we do with an item (singular) and what we do with a list of items (plural).

I've been working with highly experienced programmers too long. I've lost sight of the n00b questions.

The goal is to get to the small data map-reduce. We have some folks who can make big data work, but the big data Hadoop architecture isn't ideal for all problems. We have to help people distinguish between big data and small data, and switch gears when appropriate. Since Python does both very nicely, we think we can build enough education to school up business experts to also make a more informed technology choice.