Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy, and will likely be dropped five years after the last post in Jan 2023.

Showing posts with label mastering object-oriented python. Show all posts

Tuesday, October 6, 2020

The Python Podcast __init__

Check out https://www.pythonpodcast.com/steven-lott-learn-to-code-episode-283/. This was a fun conversation on Python and learning.

We didn't talk about my books in detail. Instead, we talked about learning and what it takes to get closer to mastery.

It's a thing I worry about. I suspect other writers worry about it, also. Will the reader take the next steps? Or will they simply think they've got it because they read about it?

Tuesday, October 16, 2018

The Edge of the Envelope

I don't -- generally -- think of myself as an edge-of-the-envelope developer. I'm a tried-and-proven kind of engineer. I want stuff that's been around for years, with a long history of changes.

Except.

Today.

Currently, I'm revising Mastering Object-Oriented Python. Second Edition.

That means upgrading everything to Python 3.7 with full type hints throughout almost all of the 18 chapters. (SQLAlchemy presents some problems, so we're not going deep there.)

The chapter on foundational WSGI applications is *totally* broken. I can't get anything to work with mypy. (The unit tests run, but mypy complains. Loudly.) Of course, I tried every wrong thing for three solid days. Then I pulled the stub file from typeshed and realized how dumb I was.

Okay. I finally got the correct type hints. Yay!

But.

Something in mypy is balking at the start_response() function calls. Too many arguments.

Read the issues. Hm. Stack Overflow. Hm.

Just to be sure, I updated to the new 0.630 release in September, 2018.

Problem solved. So. I've arrived at the edge of the envelope. I now require the absolutely latest and greatest mypy release. By the time I'm done with the rewrites, this release will be ancient history. But today, it was wonderful to get past the examples.

Tuesday, July 24, 2018

Mastering Object-Oriented Python -- 2nd Edition

It's time to revise Mastering Object-Oriented Python. While the previous edition is solidly focused on Python3, it lacks some important features:
  • F-Strings
  • Type Hints
  • typing.NamedTuple
  • Data Classes
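As a rough illustration of the four features in the list, here is a minimal sketch. The Card and Hand classes are hypothetical names chosen for this example; they are not taken from the book.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple


class Card(NamedTuple):
    """An immutable record built with typing.NamedTuple and type hints."""
    rank: str
    suit: str


@dataclass
class Hand:
    """A mutable container built with the dataclass decorator."""
    cards: List[Card] = field(default_factory=list)

    def add(self, card: Card) -> None:
        self.cards.append(card)


hand = Hand()
hand.add(Card("A", "S"))
hand.add(Card("10", "H"))

# An f-string formats the result.
summary = f"{len(hand.cards)} cards: " + ", ".join(f"{c.rank}{c.suit}" for c in hand.cards)
print(summary)  # 2 cards: AS, 10H
```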
So. There's some stuff to add. I don't think there's too much to take away. I plan to make some things a little more tidy. I will remove all references to Python2 and all references to how things used to be and why they're better now.

It will be several months before this is available. Stand by for updates.

The earliest drafts of this book date back to 2002. Seriously. I've been over this material a lot in the past 1.5 decades.

The nascent form of this book took me years (maybe 10 years) to accumulate. It covered everything: data structures, statements, built-in functions, classes, and a bunch of libraries. It was beyond merely ambitious and off into some void of "cover all the things." 

I was motivated by my undergrad CS text books on the foundations of computer science. The idea of putting the language features into a parallel structure with boolean algebra, set theory, and number theory was too cool for words. And -- lacking the necessary formal background -- it was something I wasn't able to present very well.

While I wanted to cover all of Computer Science, acquisition editors pointed out how crazy that idea was. A focus on the object-oriented features of Python was sufficient to sell a distinctive book. And they were absolutely right.

As I rework the outline for the 2nd edition, there are some other topics that crop up. These are not going to wind up in the book, but they're an implicit feature of the topics being covered.

CS Foundations and Python

One of the best of the introductory books (which came out after I graduated) was Structured Concurrent Programming With Operating Systems Applications. They presented a nested collection of sub-languages: SP/k. The organization of the nested subsets can be helpful for exposing programming incrementally. There are issues, and we'll look at them in detail below. Here's the collection of subsets from the original book (and related articles.)

  • SP/1 expressions and output. The print() function.
  • SP/2 variables, assignment, and the input() function.
  • SP/3 selection and repetition. The Python if and while constructs are the logical minimum, but the for statement makes more sense because it's so widely used.
  • SP/4 character strings. 
  • SP/5 arrays. Python lists, really.
  • SP/6 procedures. Python function definition.
  • SP/7 formatted input-output. f-strings for output, and regular expressions for parsing.
  • SP/8 records and files.
There are a lot of gaps between this list of subsets and modern programming languages. SP/k was explicitly based on a subset of PL/I, saving the complexity of implementing special compilers. It also reflects the mid-70's state of the art.

What didn't age well is the implicit understanding that numbers are the only built-in data types. Strings are so magical they're isolated into two separate subsets: SP/4 and SP/7. Arrays are called out, but sets and dictionaries didn't exist in PL/I and aren't part of this nested sequence.

Also. And even more fundamental.

There's a bias toward "procedural" programming. The SP/k subsets expose the statements of the language. There are few data structures, and it seems the data structures require some statements before they're useful.

This leads to my restructuring of this. It doesn't apply to the Mastering OO Python book. It's something I use for Python bootcamp training.

  • py/1 expressions and output: int, float, numeric built-in functions, and the print() function.
  • py/2 variables, assignment, and the input() function.
  • py/3 strings, formatting, and various built-in string parsing methods.
  • py/4 tuples and multiple assignment. (Since tuples are immutable, they're more like strings than they are like lists.) And yes, this is kind of short.
  • py/5 if statements and try/except statements. These are the two fundamental "selection" statements. The raise statement is deferred until the functions section.
  • py/6 sets and the for statement.
  • py/7 lists.
  • py/8 dictionaries.
  • py/9 functions (avoiding higher-order functions, decorators, and generator functions.)
  • py/10 contexts, with, and file I/O.
  • py/11 classes and objects.
  • py/12 modules and packages.
The point here is to expose the data structures as the central theme of Python. Statements follow as needed to work with the data structures. 

Note that some topics -- like break, continue, and while -- are advanced parts of working with data structures.

The standard library? Not included. Perhaps should be. But. It's technically separate from the language and all of this can be done without any imports. We would then cover a bunch of standard library modules. The order includes math, random, re, collections, typing, and pathlib.

Tuesday, September 13, 2016

On One Aspect of Design Patterns -- Flexibility

Something I forget to think about is the degree of detail or granularity of design patterns.  I have my own viewpoint and I often assume that others share it.

Here's a quote from an email describing the PLoP (Pattern Languages of Programs) patterns as quite distinct from the Gang of Four (Design Patterns: Elements of Reusable Object-Oriented Software) patterns.
In the main, the PLoP patterns are less granular than the persnickety GoF
"Design Patterns." (Classic GoF, in part, static type binding work-arounds. And
you need to talk about a "facade" pattern? Really? Although see Fowler's at it
again, coining a ™ term - "fluent API" - for Some Not Egregiously Stupid
Practice, to feed to the credulous who have never reflected on what they are doing.)
Cutting through the editorializing, the author is describing two families.

  • GoF patterns that are essentially ways to cope with static type checking in Java and C++.
  • PLoP patterns which are a little more generic and more widely applicable.

More...
"Plug-in Pattern" is a nice example. Enumerates the stuff you kinda know, with
qualities / attributes of its proposal, plus application samples / outcomes of
applying the pattern. The claims to relevance throughout are reminiscent of the
investigation behind Parnas' "Criteria for Decomposing Systems into Modules."
My habit is to assume this is pretty widely known. I assume everyone has wrestled with design patterns large and small and found that some of the GoF apply to Python, but the implementation details will differ. Dramatically. 

Look at the Singleton design pattern, for example. The concept is profound. There are times when we want stateful, global, Singleton instances. The Java or C++ technique of a small factory method which returns the one-and-only instance (or creates the one-and-only instance in the rare edge case) is extremely strange in Python. We can implement it. But why?

Module objects in Python are stateful singletons. Rather than invent a Singleton class, we can -- trivially -- just use a module. And we're done. Problem solved. No Code Written.
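Here's a minimal sketch of the idea. The module name app_state and its counter attribute are hypothetical. Normally this would be a file named app_state.py; it's built in memory here only to keep the example self-contained.

```python
import sys
import types

# Hypothetical stand-in for a file named app_state.py containing "counter = 0".
app_state = types.ModuleType("app_state")
app_state.counter = 0
sys.modules["app_state"] = app_state

# Two separate imports, as two different client modules might write them.
import app_state as first
import app_state as second

first.counter += 1

print(first is second)   # True: both names refer to the one module object
print(second.counter)    # 1: the state change is visible through every import
```

The import system caches modules in sys.modules, so every import of the same name hands back the same object. That cache is what does the work of a Singleton factory method.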

The email served as a reminder that sometimes people aren't quite so flexible in their understanding of design patterns. I need to cut them some slack and guide them to seeing that there's wiggle room there. The email reminds me that some people feel compelled to either follow the GoF prescription or discard the GoF entirely. The reminder about PLoP and other pattern languages is a helpful reminder to be more flexible.

The point here is that patterns are a concept. Not a law.

Tuesday, April 7, 2015

Going to PyCon 2015

In Montreal! How cool is that?

I'll be working for my current employer, also a sponsor, to locate Python talent.

I'll have a few copies of my books that I can give away.

Most importantly, the promotional code PYCON_LOTT gives 50% off my Packt titles and runs from April 7th to April 14th.

Tuesday, January 20, 2015

Webcast Wednesday

Be there: http://www.oreilly.com/pub/e/3255

Of course, I've got too many slides. 58 slides for a 60 minute presentation. That's really about 2 hours of material. Unless people have questions, then it's a half-day seminar.

Seriously.

I think I've gone waaaay too far on this. But it's my first one, and I'd hate to burn through all eight slides, take a few questions and be done too soon.

If this goes well, perhaps I'll see if I can come up with other 1-hour topics.

I worry a great deal about rehashing the obvious.

On the other hand, I'm working with a room full of newbies, and I think I could spend several hours on each of their questions.

And straightening out their confusions.

Case in point. Not directly related to the webcast.

One of my colleagues had seen a webcast which described Python's &, |, and ~ operators, comparing them with and, or, and not.

I'm not 100% sure, but... I think that this webcast -- I'm getting this second-hand; it's just hearsay -- showed that there's an important equivalence between and and &.

This is true, but hopelessly obscure. Since & has a higher priority than the comparison operators, there will be serious confusion when one fails to parenthesize properly.

Examples like this abound:

>>> 3 == 3 & 4 < 5
False
>>> (3 == 3) & (4 < 5)
True

Further, the fact that & can't short-circuit had become confusing to the colleague. I figured out some of what was going on when trying to field some seemingly irrelevant questions on "Why are some operators more efficient?" and "How do you know which to use?"
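A small sketch makes the short-circuit difference visible. The operand() helper is hypothetical; it exists only to record which operands were actually evaluated.

```python
evaluated = []


def operand(name: str, value: bool) -> bool:
    """Record that this operand was evaluated, then return its value."""
    evaluated.append(name)
    return value


# "and" stops as soon as the left side is false.
result_and = operand("left", False) and operand("right", True)
seen_by_and = list(evaluated)

# "&" is an ordinary binary operator: both sides are always evaluated.
evaluated.clear()
result_amp = operand("left", False) & operand("right", True)
seen_by_amp = list(evaluated)

print(result_and, seen_by_and)  # False ['left']
print(result_amp, seen_by_amp)  # False ['left', 'right']
```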

Um. That's not really the point. There's no confusion if you set the bit-fiddling operators aside.

The point is that and, or, not, and the if-else conditional expression live in their own domain of boolean values. The fact that &, |, ^, and ~ will also operate on boolean values is a kind of weird duplication, not a useful feature. The arithmetic operators also work on booleans. Weirdly.

The Python rules are the rules; it makes sense for True&True to yield True. Results depend on the operands. It would be wrong in that sense for True&True to be 1. But it would also fit the concept of these operators a little better if they always coerced bool to int. This happens for * and +: True+True == 2.

Why can't it be true for & and |? It would reduce potential confusion. 
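For reference, this is how the behavior under discussion looks in a Python 3 interpreter. It's just a quick check of the result types, nothing more.

```python
# bool & bool stays in the boolean domain.
both = True & True
print(both, type(both).__name__)      # True bool

# Arithmetic coerces bool to int.
total = True + True
print(total, type(total).__name__)    # 2 int

# Mixing bool and int with & also falls back to int.
mixed = True & 1
print(mixed, type(mixed).__name__)    # 1 int
```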

I'm sure the person who implemented __and__(), __or__(), __xor__(), and __invert__() was happy to create a parallel universe between and and &. I'm not sure I agree.

And perhaps I should have a webcast on Python logic. It seems like a rehash of fundamentals to me. But I have colleagues confused by fundamentals. So perhaps I'm way wrong about what's fundamental and what's useful information.


Thursday, November 6, 2014

Thursday, June 26, 2014

Package Deal for Learning Python

If you're very new to programming in general, Python's a great place to start.

There are many, many tutorials. I won't even try to summarize them. They're generally good. And the more you read, the more you learn.

Moving past the n00bz needs, there are some more advanced books. Here's a collection for generalists:


My suggestion is to master the general features of the language overall.

Focus on specific things (Django, NLTK, SciPy, Maya, Scrapy, MatPlotLib, etc.) can follow.

I worry that early exposure to some of the details of Python-based packages may obscure the fundamentals of using the language properly. Perhaps that worry is misplaced. I know that the NLTK Book has numerous good examples of Python which are independent of the NLTK focus.

Thursday, June 12, 2014

TDD, API Design and Refactoring

See this short discussion on a Stingray Reader feature:
https://sourceforge.net/p/stingrayreader/discussion/COBOL/thread/d2132851/?limit=25#2a3a

This turned into an exercise in pure TDD.

<rant>
I'm not a fan of applying TDD in a strict, death-march fashion.

I see the comments on Stack Overflow that indicate that some folks feel strongly that strict TDD is somehow helpful. While "test before code" is laudable and often helpful, there's no royal road to good software.

Design involves a great deal of back and forth between code and test. A great deal.

It's logically impossible to write a test without having thought about the code. In order to write the test first, there must be a notional API against which the test is written. Anyone who requires that the test file must be written before the notional class or module is just playing at petty tyranny.

The notional design -- the rough outline of the class or module -- can be written into a file before any tests. It's okay. It is still test-driven because the considerations of testability drove the design process.

In particular, when starting "from scratch" -- with nothing -- writing tests first is senseless. Some module or package structure must exist for the test modules to import.

</rant>
Having ranted, it still arises that the tests do come before any code under some circumstances.

In this case, the requested functionality was quite difficult to visualize. However, it was possible to cobble together a test case that simplified the problem down to something like this:


01 Some-Record.
     05 Header PIC XXX.
     05 Body PIC X(17).

01 ABC-Segment.
     05 Field-ABC PIC X(17).

01 DEF-Segment.
     05 Field-DEF PIC X(17).


In COBOL, the program would use logic like IF Header EQUALS "ABC" THEN MOVE Body TO ABC-Segment. We need a way to handle something like this in Python so that we can parse the EBCDIC COBOL data.
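To make the problem concrete, here's a rough sketch of header-based dispatch in plain Python. This is not the Stingray API. The SEGMENTS table and parse_record() function are hypothetical, and the sketch assumes the data uses the cp037 EBCDIC code page.

```python
# Hypothetical layout table: header value -> name of the segment that redefines Body.
SEGMENTS = {"ABC": "ABC-Segment", "DEF": "DEF-Segment"}


def parse_record(record: bytes) -> tuple:
    """Split a 20-byte EBCDIC record into (segment name, decoded body)."""
    header = record[:3].decode("cp037")   # Header PIC XXX
    body = record[3:20].decode("cp037")   # Body PIC X(17)
    return SEGMENTS[header], body


# Build a sample record: 3-byte header plus a body padded to 17 characters.
sample = "ABC".encode("cp037") + "some ABC data".ljust(17).encode("cp037")
segment, body = parse_record(sample)
print(segment, repr(body))
```

The header picks the layout, and the body bytes are then interpreted according to that layout. A real parser would apply the segment's field definitions rather than decode the whole body as text.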

This summarized example allowed construction of a test case that made use of an API that might have existed. I was pretty sure I had a test case that showed an approach.

What Actually Happened

Since the application already had 178 unit tests, there was plenty of structure that worked.

The single new unit test relied on a notional API that wasn't really in place. The new test bombed grotesquely.

There are two solutions:

  • Modify the test.
  • Fix the notional API so that it works properly.

I started out chasing the second option. I tweaked some things. More tests failed. I tweaked some more things. The new test finally passed, but another test was failing.

Some careful study of the failing test revealed that my approach was wrong. Way wrong.

The notional API was a bad idea.

The tweaks to make it work were a worse idea.

Back to the Lab Bench

At this point, I had made enough changes that the only thing to do was copy the new test and use the Git Revoke on the local changes to unwind the awful mistakes.

Starting again, I had a slightly better grip on the relevant code. I had a failing test. I tried a different approach that wasn't quite so inventive. This meant modifying the test.

I actually went through a few iterations of the test, using the test method as a kind of lab bench.

A more Pythonic approach to the lab bench is to work from the >>> prompt. I think that all of the exemplary projects use the >>> prompt examples in their documentation. This is a way to narrow and clarify the API. As projects get big, they can sprawl. New features can wind up with many imports to pick and choose elements from existing modules.

When it becomes difficult to use the >>> prompt as the lab bench, that's a sign that the API is too complex. Refactoring must happen.
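One way to keep the >>> prompt honest is to put the prompt session into a docstring and let doctest check it. This is a minimal sketch; segment_for() is a hypothetical function, not part of any project mentioned here.

```python
import doctest


def segment_for(header: str) -> str:
    """Map a record header to a segment name.

    >>> segment_for("ABC")
    'ABC-Segment'
    >>> segment_for("DEF")
    'DEF-Segment'
    """
    return f"{header}-Segment"


# Parse the docstring's >>> examples and run them against the function.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    segment_for.__doc__, {"segment_for": segment_for}, "segment_for", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.attempted, results.failed)  # 2 0
```

In a module on disk, running python -m doctest on the file does the same job with less ceremony. If the examples need a dozen imports before the first call, that's the sign of a sprawling API.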

Using the unit test framework as the lab bench was a hint that something had drifted out of tolerance.

However. I did get a test which passed. Yay. Sort of.

The test code was hideous.

TDD and API Design

The point of TDD, however, is that we have a working suite of tests. Refactoring won't break anything.

The point was that the hideous API could be rewritten into something that both

  • Passed all the tests, and
  • Was usable at the >>> prompt.
It's difficult to express how valuable the Python >>> prompt is to help clarify API design issues.

The rule is this:

If the API doesn't make sense at the >>> prompt, it's incomprehensible

Sadly, Java doesn't have this kind of boundary. Java programming can spin into quite complex API's, limited only by the laziness of the programmer who avoids refactoring.

Or the malice of the programmer's manager in not allowing time to refactor.

Thursday, May 29, 2014

Stingray 4.4 Update -- the Posix split command applied to COBOL files

Here's an interesting problem. Implement the split command for mainframe COBOL EBCDIC files with their BDW and RDW headers.

The conventional split can't handle COBOL EBCDIC files because they don't have sensible \n line breaks. Translating an EBCDIC file to ASCII is high-risk because COMP and COMP-3 fields will be trashed by the translation.
If the files include Occurs Depending On, then the FTP transfer should include the RDW/BDW headers. The SITE RDW (or LOCSITE RDW) commands are essential. It's much faster to include this overhead. Stingray can process files without the headers, but it's slower.
There are two essential Python techniques for building file splitters that involve parsing.
  • The itertools.groupby() function.
  • The with statement.
Along with this, we need an iterator over the underlying records. For example, the stingray.cobol.RECFM subclasses will parse the various mainframe RECFM options and iterate over records or records+RDW headers or blocks (BDW headers plus records with RDW headers).

The itertools.groupby() function can break a record iterator into groups based on some group-by criteria. We can use this to break into sequential batches.

itertools.groupby( enumerate(reader), lambda x: x[0]//batch_size )

This expression will break the iterable, reader, into groups each of which has a size of batch_size records. The last group will have total%batch_size records, or a full batch_size records when the total divides evenly.
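Here's the batching expression applied to a small stand-in for the reader. The list of seven strings is hypothetical sample data; any iterable of records would behave the same way.

```python
import itertools

# Hypothetical stand-in for a record iterator: seven records.
reader = [f"record-{n}" for n in range(7)]
batch_size = 3

# The key for each (index, record) pair is index // batch_size: 0, 0, 0, 1, 1, 1, 2.
batches = [
    (key, [row for _, row in group])
    for key, group in itertools.groupby(enumerate(reader), lambda x: x[0] // batch_size)
]

for key, rows in batches:
    print(key, rows)
# 0 ['record-0', 'record-1', 'record-2']
# 1 ['record-3', 'record-4', 'record-5']
# 2 ['record-6']
```

Because the index only ever increases, equal keys are always adjacent, which is exactly what groupby() needs.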

The with statement allows us to make each individual group into a separate context. This assures that each file is properly opened and closed no matter what kinds of exceptions are raised.

Here's a typical script.

    import collections
    import itertools
    import pprint

    import stingray.cobol

    batch_size = 1000
    counts = collections.defaultdict(int)
    with open("some_file.schema", "rb") as source:
        # Iterate over blocks, preserving the BDW and RDW headers.
        reader = stingray.cobol.RECFM_VB(source).bdw_iter()
        # The group key is the batch number: index // batch_size.
        batches = itertools.groupby(enumerate(reader), lambda x: x[0] // batch_size)
        for group, group_iter in batches:
            # One output file per batch; the with statement closes it even on error.
            with open("some_file_{0}.schema".format(group), "wb") as target:
                for index, row in group_iter:
                    target.write(row)
                    counts['rows'] += 1
                    counts[str(group)] += 1
    pprint.pprint(dict(counts))

There are several possible variations on the construction of the reader object.

  • cobol.RECFM_F( source ).record_iter() -- result is RECFM_F
  • cobol.RECFM_F( source ).rdw_iter() -- result is RECFM_V; RDW's have been added. 
  • cobol.RECFM_V( source ).rdw_iter() -- result is RECFM_V; RDW's have been preserved. 
  • cobol.RECFM_VB( source ).rdw_iter() -- result is RECFM_V; RDW's have been preserved; BDW's have been discarded. 
  • cobol.RECFM_VB( source ).bdw_iter() -- result is RECFM_VB; BDW's and RDW's have been preserved. The batch size is the number of blocks, not the number of records.
This should allow slicing up a massive mainframe file into pieces for parallel processing.

Thursday, May 22, 2014

Python Package Design, Refactoring and the Stingray Reader Project

We'll be digging into Mastering Object-Oriented Python. Chapter 17, specifically.

We'll also be looking at a big refactoring of the Stingray Schema-Based File Reader.

We can identify three species of packages.

One common design is a Simple Package. A directory with an empty __init__.py file. This package name becomes a qualifier for the internal module names. The package is simply a namespace for modules. We’ll use the package with something like this:

import package.module


Another common design is the Module-Package. This is a package which appears to be a module. It will have a larger and more sophisticated __init__.py that is, effectively, a module definition. There are two variations on this theme. Sometimes we'll use this during the early stages of development because we don't know if the package will get really big or stay small. If we start out with a package and all the code is in the __init__.py, we can refactor down to a module.

The more common use for a module-package is to have the __init__.py import objects or other modules from the package directory. Or, it can stand as a part of a larger design that includes the top-level module and the qualified sub-modules. We’ll use the package with something like this:

import package

or perhaps

from package import thing
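
A self-contained sketch of the Module-Package idea follows. It writes a throwaway package to a temporary directory; the names demo_pkg, impl, and Thing are hypothetical and only illustrate the layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    package_dir = Path(root) / "demo_pkg"
    package_dir.mkdir()
    # The implementation lives in a submodule...
    (package_dir / "impl.py").write_text("class Thing:\n    pass\n")
    # ...and __init__.py re-exports it, so the package looks like a module.
    (package_dir / "__init__.py").write_text("from demo_pkg.impl import Thing\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        from demo_pkg import Thing
    finally:
        sys.path.remove(root)

print(Thing is demo_pkg.Thing)   # True: one class, reachable from the package top level
print(Thing.__module__)          # demo_pkg.impl: where it is actually defined
```

Client code only sees the top-level name. Where the class physically lives can change without touching any import statement outside the package.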


The third common pattern is a package where the __init__.py selects among alternative implementations. The os module is a good example of this. We’ll use the package with something like this:


import package


Knowing that it did something roughly like the following for us.

import package.implementation as package
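
The standard library shows this selection on a small scale with os.path. The os module itself is a single module rather than a package directory, but the choice of implementation works the same way: at import time it binds os.path to either posixpath or ntpath.

```python
import ntpath
import os
import posixpath

# os picks one path implementation when it is imported, based on the platform.
expected = ntpath if os.name == "nt" else posixpath

print(os.path is expected)   # True
print(os.path.__name__)      # 'posixpath' or 'ntpath', depending on the platform
```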


Refactoring Module to Package

The Stingray angle on this is the need to add iWork '13 Numbers to the collection of spreadsheets which it can parse. The iWork '13 format is unique.

Previously, all of the spreadsheets fell into three families:

iWork '13 uses Snappy compression and Protobuf serialization. Without some documentation, the files would be incomprehensible. Read this: https://github.com/obriensp/iWorkFileFormat. Brilliant.

The previous releases of Stingray had a single, large module to handle a variety of workbook formats. Folding iWork '13 into this module would have been lunacy. It was already large to the point of being painful to understand.

The original module will be transparently turned into a Module-Package. The API (import stingray.workbook or from stingray.workbook import SomeClass) will remain the same.

However.

The implementation will involve a package with each workbook format as a separate module inside that package. At the top, the __init__.py will include code like the following.

    from stingray.workbook.csv import CSV_Workbook
    from stingray.workbook.xls import XLS_Workbook
    from stingray.workbook.xlsx import XLSX_Workbook
    from stingray.workbook.ods import ODS_Workbook
    from stingray.workbook.numbers_09 import Numbers09_Workbook
    from stingray.workbook.numbers_13 import Numbers13_Workbook
    from stingray.workbook.fixed import Fixed_Workbook

This has the advantage of allowing us to include additional parsing cruft in each module that's not part of the exposed API in the workbook package.

The Mastering Object-Oriented Python book has more details on this kind of design.

Thursday, May 15, 2014

Want a copy of Mastering Object-Oriented Python? Free?

Want a copy free? See this contest: http://www.blog.pythonlibrary.org/2014/05/12/ebook-contest-win-a-free-copy-of-mastering-object-oriented-python/.

If you're really interested, I can sign a copy.  That will double the shipping cost, so perhaps that's not the best idea.

The bad news is that the errata have started to trickle in. Some of the default serialization cases in chapter 9 aren't handled properly. The demonstrations don't exercise the defaults very well, so things happen to work for me, but don't work when generalized or pulled out of context.

Sigh.

What's important -- I guess -- is that we have a critical mass of readers who are applying the concepts and finding problems.

Thursday, April 3, 2014

Mastering Object-Oriented Python

See http://www.packtpub.com/mastering-object-oriented-python/book

Coming soon.

This is relatively deep, under-the-hood stuff for folks who want to master the Python feature set.

Here's the overview of what you get:

  • 0 Some Preliminaries 3 examples, 56 lines
  • 1 The __init__() Method 55 examples, 351 lines
  • 2 Integrating Seamlessly with Python: Basic Special Methods 92 examples, 558 lines
  • 3 Attribute Access, Properties, and Descriptors 33 examples, 310 lines
  • 4 The ABC's of Consistent Design 18 examples, 108 lines
  • 5 Using Callables and Contexts 17 examples, 214 lines
  • 6 Creating Containers and Collections 50 examples, 438 lines
  • 7 Creating Numbers 12 examples, 232 lines
  • 8 Decorators And Mixins – Cross Cutting Aspects 39 examples, 233 lines
  • 9 Serializing and Saving: JSON, YAML, Pickle, CSV and XML 77 examples, 648 lines
  • 10 Storing and Retrieving Objects via shelve 34 examples, 272 lines
  • 11 Storing and Retrieving Objects via SQLite 45 examples, 410 lines
  • 12 Transmitting and Sharing Objects 38 examples, 388 lines
  • 13 Configuration Files and Persistence  59 examples, 490 lines
  • 14 The Logging and Warning Modules 46 examples, 343 lines
  • 15 Designing for Testability 38 examples, 393 lines
  • 16 Coping With The Command Line  42 examples, 222 lines
  • 17 Module and Package Design  31 examples, 93 lines
  • 18 Quality and Documentation  42 examples, 269 lines
  • Preface 3 examples, 12 lines
  • Bonus Chapter 1 Archives and Directories  11 examples, 119 lines
  • Bonus Chapter 2 Case Study: Document Analysis  39 examples, 308 lines

824 examples, 6467 lines

Yes. That's a lot of code. It's relentless.