Python 4
Simple patterns
Using regular expression
Pattern power
Modifying string
Introduction
\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
What is this?
Is it a language?
Is it an engine?
Introduction (cont…)
To get complete information from
incomplete data in hand
Regular expression patterns are
compiled into a series of bytecodes
which are then executed by a matching
engine written in C
Simple Patterns
A string
A pattern that matches itself
Special characters
. ^ $ * + ? { [ ] \ | ( )
[ and ]
The first metacharacters we'll look at are "["
and "]".
Used for specifying a character class, which is
a set of characters that you wish to match
Characters can be listed individually, or a
range of characters can be indicated by giving
two characters and separating them by a "-".
Character class cont …
Metacharacters are not active inside
classes.
^ - To match the characters not within
a range by complementing the set
Special provision for backslash
Class using backslash
\d [0-9]
\D [^0-9]
\w [a-zA-Z0-9_]
\W [^a-zA-Z0-9_]
\s [ \t\n\r\f\v]
\S [^ \t\n\r\f\v]
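The shorthand classes above can be checked against their bracketed equivalents. This is a minimal sketch in Python 3 syntax; the sample string is made up for illustration, and re.ASCII is passed because in Python 3 the shorthands also match non-ASCII characters by default.

```python
import re

sample = "Room 42, floor_3\tnext to B-17\n"  # hypothetical test string

pairs = [
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
    (r"\w", r"[a-zA-Z0-9_]"),
    (r"\W", r"[^a-zA-Z0-9_]"),
    (r"\s", r"[ \t\n\r\f\v]"),
    (r"\S", r"[^ \t\n\r\f\v]"),
]

results = {}
for shorthand, explicit in pairs:
    # re.ASCII restricts each shorthand to the ASCII set listed in the table
    found_short = re.findall(shorthand, sample, re.ASCII)
    found_explicit = re.findall(explicit, sample)
    results[shorthand] = (found_short == found_explicit)

digits = re.findall(r"\d", sample)
print(results)
print(digits)
```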
How the engine works
Eager
Greedy
Lazy
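The three terms can be seen side by side in a short sketch (Python 3 syntax; the sample strings are invented for illustration):

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as possible, then backs off only as far as needed.
greedy = re.search(r"<.*>", text).group()

# Lazy: .*? grabs as little as possible.
lazy = re.search(r"<.*?>", text).group()

# Eager: the engine reports the first alternative that succeeds at the
# leftmost position, even if a later alternative would match more text.
eager = re.search(r"cat|category", "category").group()

print(greedy)
print(lazy)
print(eager)
```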
Compiling Regular Expressions
>>> import re
>>> p = re.compile('ab*')
>>> print p
<re.RegexObject instance at 80b4150>
re.compile() also accepts an optional flags
argument
>>> p = re.compile('ab*', re.IGNORECASE)
Performing Matches
match() Determine if the RE matches at the beginning of the string.
VERBOSE, X Enable verbose REs, which can be organized more cleanly and understandably.
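As a rough illustration of the two entries above, the sketch below writes the same pattern compactly and in verbose form, then compares match() with search(). The phone-like pattern and sample strings are assumptions made for the example (Python 3 syntax).

```python
import re

compact = re.compile(r"\d{3}-\d{4}")

# With re.VERBOSE, whitespace in the pattern is ignored and # starts a comment.
verbose = re.compile(r"""
    \d{3}   # three-digit prefix
    -       # literal hyphen
    \d{4}   # four-digit line number
""", re.VERBOSE)

number = "555-1234"
same = compact.match(number).group() == verbose.match(number).group()

# match() only succeeds at the beginning of the string; search() scans ahead.
at_start = verbose.match("call 555-1234")
anywhere = verbose.search("call 555-1234")

print(same, at_start, anywhere.group())
```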
Pattern powers
|
$
^
\b
\B
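A small sketch of how each of these behaves, in Python 3 syntax; all sample strings are invented for the example:

```python
import re

text = "no class at all, subclass, declassified"

# \b requires a word boundary on both sides: only the standalone word matches.
whole_word = [m.start() for m in re.finditer(r"\bclass\b", text)]

# \B requires that there is no boundary: only "class" buried inside a word matches.
inside_word = [m.start() for m in re.finditer(r"\Bclass\B", text)]

# ^ and $ anchor at line starts and ends when re.MULTILINE is set.
lines = "first line\nsecond line"
first_words = re.findall(r"^\w+", lines, re.MULTILINE)
last_words = re.findall(r"\w+$", lines, re.MULTILINE)

# | tries the alternatives from left to right.
robot = re.match(r"Crow|Servo", "Servo").group()

print(whole_word, inside_word, first_words, last_words, robot)
```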
Grouping
>>> p = re.compile('(ab)*')
>>> print p.match('ababababab').span()
(0, 10)
Grouping
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
Non-capturing and Named
Groups
>>> m = re.match("([abc])+", "abc")
>>> m.groups()
('c',)
>>> m = re.match("(?:[abc])+", "abc")
>>> m.groups()
()
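The session above covers non-capturing groups only. For named groups, a minimal sketch might look like this (Python 3 syntax; the group names and the sample name are assumptions chosen for illustration):

```python
import re

# (?P<name>...) captures like an ordinary group but can also be read by name.
m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Jane Doe")

by_name = m.group("first")
by_number = m.group(1)      # named groups are still numbered
as_dict = m.groupdict()

print(by_name, by_number, as_dict)
```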
(?=...)
Positive lookahead assertion. This
succeeds if the contained regular
expression, represented here by ...,
successfully matches at the current
location, and fails otherwise. But, once
the contained expression has been
tried, the matching engine doesn't
advance at all; the rest of the pattern is
tried right where the assertion started.
(?!...)
Negative lookahead assertion. This is the
opposite of the positive assertion; it
succeeds if the contained expression
doesn't match at the current position in the
string.
What are the following?
.*[.].*$
.*[.](?!bat$).*$
.*[.](?!bat$|exe$).*$
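One way to explore the three patterns is to run them against a few file names. The names below are hypothetical test inputs (Python 3 syntax):

```python
import re

any_ext = re.compile(r".*[.].*$")
not_bat = re.compile(r".*[.](?!bat$).*$")
not_bat_or_exe = re.compile(r".*[.](?!bat$|exe$).*$")

names = ["foo.bar", "autoexec.bat", "setup.exe", "README"]

# For each name, record which of the three patterns accept it.
table = {
    name: (
        bool(any_ext.match(name)),
        bool(not_bat.match(name)),
        bool(not_bat_or_exe.match(name)),
    )
    for name in names
}

for name, row in table.items():
    print(name, row)
```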
Modifying strings
Method/Attribute Purpose
split() Split the string into a list, splitting it wherever the RE matches
sub() Find all substrings where the RE matches, and replace them with a different string
subn() Does the same thing as sub(), but returns the new string and the number of replacements
Splitting Strings
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', 3)
['This', 'is', 'a', 'test, short and sweet, of split().']
Know the delimiter
>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
Search and Replace
>>> p = re.compile('(blue|white|red)')
>>> p.sub('colour', 'blue socks and red shoes')
'colour socks and colour shoes'
>>> p.sub('colour', 'blue socks and red shoes', count=1)
'colour socks and red shoes'
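subn() from the table above is not shown in the session. A short sketch using the same pattern (Python 3 syntax; the second input string is made up to show the no-match case):

```python
import re

p = re.compile(r"(blue|white|red)")

# subn() returns a tuple: (new string, number of replacements made).
new_text, count = p.subn("colour", "blue socks and red shoes")

# When nothing matches, the string comes back unchanged with a count of 0.
unchanged = p.subn("colour", "no match here")

print(new_text, count)
print(unchanged)
```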
Debugging
Kodos
Site: http://kodos.sourceforge.net/
Case studies
Street address
URL
A valid C variable
Phone numbers
Cell numbers
Justifying output
for x in range(1, 11):
    print repr(x).rjust(2), repr(x*x).rjust(3),
    # Note trailing comma on previous line
    print repr(x*x*x).rjust(4)
for x in range(1,11):
    print '%2d %3d %4d' % (x, x*x, x*x*x)
zfill
>>> '12'.zfill(5)
'00012'
>>> '-3.14'.zfill(7)
'-003.14'
>>> '3.14159265359'.zfill(5)
'3.14159265359'
Passing tuple
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '%-10s ==> %10d' % (name, phone)
Files and I/O
Opening
Closing
Reading
Writing
Binary files
Files
>>> f=open('/tmp/workfile', 'w')
>>> print f
<open file '/tmp/workfile', mode 'w' at 80a0960>
>>> myfile = open('myfile', 'w')
>>> myfile.write('hello text file\n')
>>> myfile.close()
>>> myfile = open('myfile')
>>> myfile.readline( )
'hello text file\n'
>>> myfile.readline( )
''
Storing and parsing Python
objects
>>> X, Y, Z = 43, 44, 45
>>> S = 'Spam'
>>> D = {'a': 1, 'b': 2}
>>> L = [1, 2, 3]
>>>
>>> F = open('datafile.txt', 'w')
>>> F.write(S + '\n')
>>> F.write('%s,%s,%s\n' % (X, Y, Z))
>>> F.write(str(L) + '$' + str(D) + '\n')
>>> F.close( )
Methods of File Objects
call f.read(size), which reads some quantity of
data and returns it as a string. size is an
optional numeric argument.
When size is omitted or negative, the entire
contents of the file will be read and returned;
it's your problem if the file is twice as large as
your machine's memory
>>> f.read()
'This is the entire file.\n'
Raw bytes display
>>> bytes = open('datafile.txt').read( )
>>> bytes
"Spam\n43,44,45\n[1, 2, 3]${'a': 1, 'b': 2}\n"
>>> print bytes
Spam
43,44,45
[1, 2, 3]${'a': 1, 'b': 2}
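Reading the stored objects back means parsing each line according to how it was written. The sketch below is one possible approach in Python 3 syntax, not necessarily the one intended by the slides: it writes the same three lines to a temporary path (an assumption, to keep the example self-contained) and uses ast.literal_eval to turn the list and dictionary text back into objects.

```python
import ast
import os
import tempfile

X, Y, Z = 43, 44, 45
S = 'Spam'
D = {'a': 1, 'b': 2}
L = [1, 2, 3]

# Temporary location chosen for this example only.
path = os.path.join(tempfile.mkdtemp(), 'datafile.txt')
with open(path, 'w') as f:
    f.write(S + '\n')
    f.write('%s,%s,%s\n' % (X, Y, Z))
    f.write(str(L) + '$' + str(D) + '\n')

with open(path) as f:
    name = f.readline().rstrip()                          # plain string
    numbers = [int(p) for p in f.readline().split(',')]   # int() ignores the trailing \n
    list_text, dict_text = f.readline().rstrip().split('$')
    parsed_list = ast.literal_eval(list_text)             # safe parse of a literal
    parsed_dict = ast.literal_eval(dict_text)

print(name, numbers, parsed_list, parsed_dict)
```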
Binary files
>>> F = open('data.bin', 'wb')
>>> import struct
>>> bytes = struct.pack('>i4sh', 7, 'spam', 8)
>>> bytes
'\x00\x00\x00\x07spam\x00\x08'
>>> F.write(bytes)
>>> F.close( )
Binary files (read)
>>> F = open('data.bin', 'rb')
>>> data = F.read( )
>>> data
'\x00\x00\x00\x07spam\x00\x08'
>>> values = struct.unpack('>i4sh', data)
>>> values
(7, 'spam', 8)
Binary files
file = open('test.txt', 'rb')
while True:
    chunk = file.read(10)
    if not chunk:
        break
    print chunk,
readline
f.readline() reads a single line from the
file; a newline character (\n) is left at
the end of the string, and is only
omitted on the last line of the file if the
file doesn't end in a newline.
>>> f.readline()
'This is the first line of the file.\n'
readlines
f.readlines()
>>> for line in f: print line
Read in loops
file = open('test.txt')
while True:
    char = file.read(1)
    if not char: break
    print char,
As opposed to…
for char in open('test.txt').read( ):
    print char
leftovers
>>> f = open('/tmp/workfile', 'r+')
>>> f.write('0123456789abcdef')
>>> f.seek(5) # Go to the 6th byte in the file
>>> f.read(1)
'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
>>> f.read(1)
'd'
Object flexibility
L = ['abc', [(1, 2), ([3], 4)], 5]
Modules
Importing everything
Importing specific functions
>>> from fibo import fib, fib2
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
The Module Search Path
Compiled python files
Existing modules
sys
os
__builtin__
dir functions
Using stdout
>>> import sys
>>> temp = sys.stdout
>>> sys.stdout = open('log.txt', 'a')
>>> print 'spam'
>>> print 1, 2, 3
>>> sys.stdout.close( )
>>> sys.stdout = temp
The >>
>>> log = open('log.txt', 'w')
>>> print >> log, 1, 2, 3
>>> print >> log, 4, 5, 6
>>> log.close( )
>>> print 7, 8, 9
7 8 9
>>> print open('log.txt').read( )
1 2 3
4 5 6
Exception Handling
Try…except block
Raising exceptions
Else block
Finally
Creating own exception class
Try block
while True:
    try:
        x = int(input("Please enter a number: "))
        print("x is", x)
        break
    except ValueError:
        print("Oops! That was no valid number. Try again...")
Multiple except
import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except FileNotFoundError:
    print("No such file")
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
Argument to Exception
>>> try:
...     raise Exception('spam', 'eggs')
... except Exception as inst:
...     print type(inst)
...     print inst.args
...     print inst
...     x, y = inst.args
...     print 'x =', x
...     print 'y =', y
...
Else block
for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except FileNotFoundError:
        print('cannot open', arg)
    else:
        print(arg, 'has', len(f.readlines()), 'lines')
        f.close()
Else block…why
The use of the else clause is better than
adding additional code to the try clause
because it avoids accidentally catching
an exception that wasn't raised by the
code being protected by the try ...
except statement.
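A small sketch of that point, in Python 3 syntax. The function names are hypothetical; the idea is that a ValueError raised by the follow-up code in the else clause is not mistaken for a failed conversion.

```python
def parse_and_report(text, report):
    try:
        number = int(text)           # the only call the try is meant to protect
    except ValueError:
        return "not a number"
    else:
        # Runs only if int() succeeded. A ValueError raised by report()
        # propagates instead of being swallowed by the except clause above.
        return report(number)


def strict_report(n):
    if n < 0:
        raise ValueError("negative values are not supported")
    return "got %d" % n


print(parse_and_report("12", strict_report))
print(parse_and_report("abc", strict_report))
```

Had report(number) been placed inside the try clause, a call with "-1" would return "not a number", hiding the real cause of the error.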
The details
>>> def this_fails():
...     x = 1/0
...
>>> try:
...     this_fails()
... except ZeroDivisionError, detail:
...     print 'Handling run-time error:', detail
Raising Exceptions
>>> try:
...     raise NameError, 'HiThere'
... except NameError:
...     print 'An exception flew by!'
...     raise
...
User-defined Exceptions
class Expletive(Exception):
    def __str__(self):
        # __str__ must return the message rather than print it
        return "An Expletive occurred!"
def main():
    raise Expletive
if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("Unable to import something...")
    except Exception:
        raise
Finally block
>>> try:
...     raise KeyboardInterrupt
... finally:
...     print 'Goodbye, world!'
...
Tying them together
>>> def divide(x, y):
...     try:
...         result = x / y
...     except ZeroDivisionError:
...         print "division by zero!"
...     else:
...         print "result is", result
...     finally:
...         print "executing finally clause"
...
Debugging
pdb.run(statement[, globals[, locals]])
pdb.runeval(expression[, globals[,
locals]])
pdb.runcall(function[, argument, ...])
pdb.set_trace()
pdb.post_mortem(traceback)
pdb.pm()
pdb.run()
pdb.run() executes the
string statement under the debugger's
control.
Global and local dictionaries are optional
parameters
Example
import pdb
def test_debugger(some_int):
    print "start some_int>>", some_int
    return_int = 10 / some_int
    print "end some_int>>", some_int
    return return_int
if __name__ == "__main__":
    pdb.run("test_debugger(0)")
pdb.runeval()
pdb.runeval() is identical to pdb.run(),
except that
pdb.runeval() returns the value of the
evaluated string expression
if __name__ == "__main__":
    pdb.runeval("test_debugger(0)")
pdb.runcall()
pdb.runcall() calls the
specified function and passes any
specified arguments to it
if __name__ == "__main__":
    pdb.runcall(test_debugger, 0)
pdb.set_trace()
drops the code into the debugger when
execution hits it:
def test_debugger(some_int):
    pdb.set_trace()
    …
if __name__ == "__main__":
    test_debugger(0)
pdb.post_mortem(traceback)
pdb.post_mortem() performs postmortem
debugging of the specified traceback
if __name__ == "__main__":
    try:
        test_debugger(0)
    except:
        import sys
        tb = sys.exc_info()[2]
        pdb.post_mortem(tb)
pdb.pm()
pdb.pm() performs postmortem
debugging of the traceback contained
in sys.last_traceback
def do_debugger(type, value, tb):
    pdb.pm()
if __name__ == "__main__":
    sys.excepthook = do_debugger
    test_debugger(0)
A sample session
First, step all the way down to f2() using
the (s)tep command,
then see the current position with
the (l)ist command.
Set a breakpoint in f4() using
the (b)reak command: b f4
Continue to the breakpoint using
the (c)ontinue command.
Cont…
Next, issue a (w)here command.
Navigate up the stack trace with
the (u)p command (several times).
Execute the list command.
Execute the list command with 2 arguments:
list 5, 10
The up command moves the debugger
up a frame in the stack trace to an older
frame.
args lists the arguments passed to
the current function.
To execute a Python statement,
precede it with !
!var = "value"
Managing breakpoint
To view all breakpoints:
break
To disable and re-enable breakpoints,
use the disable and enable commands;
use break to view their state.
clear deletes a breakpoint entirely.
tbreak sets a temporary breakpoint,
which is cleared as soon as it is hit.
Conditional breakpoint
b 17, j > 10
To pass over a breakpoint without
stopping there:
(Pdb) ignore 1 2
Ignoring and Triggering
Explicitly resetting the ignore count to
zero re-enables the breakpoint
immediately.
(Pdb) ignore 1 0
(Pdb) break 9
(Pdb) commands 1
An example session
(Pdb) commands 1
(com) print 'debug i =', i
(com) print 'debug j =', j
(com) print 'debug n =', n
(com) end
(Pdb) continue
(Pdb) continue
navigate down the stack trace a few
frames with the (d)own command
what does it really mean to be in a
different frame?
print out a couple of variables in two
different frames
print some_arg
d
print some_arg
Jumping around
jumping ahead
jumping back
Illegal jumps
Do not jump into functions
You cannot jump into the middle of a
block such as a for loop or try:except
The code in a finally block must all be
executed, so you cannot jump out of
the block
Controlling file
$ cat ~/.pdbrc
# Show python help
alias ph !help(%1)
# Overridden alias
alias redefined p 'home definition'
$ cat .pdbrc
# Breakpoints
break 10
Python Extension
The header file Python.h.
The C functions you want to expose as
the interface from your module.
A table mapping the names of your
functions as Python developers will see
them to C functions inside the extension
module.
An initialization function.
The C functions
static PyObject *MyFunction(PyObject *self, PyObject *args);
static PyObject *MyFunctionWithKeywords(PyObject *self, PyObject *args, PyObject *kw);
static PyObject *MyFunctionWithNoArgs(PyObject *self);
The method mapping table
struct PyMethodDef {
    char *ml_name;
    PyCFunction ml_meth;
    int ml_flags;
    char *ml_doc;
};
This table needs to be terminated with
a sentinel that consists of NULL and 0
values for the appropriate members.
The description of structures
ml_name: This is the name of the
function as the Python interpreter will
present it when it is used in Python
programs.
ml_meth: This must be the address of a
function that has any one of the
signatures described in the previous
section.
ml_flags: This tells the interpreter which
of the three signatures ml_meth is
using.
This flag will usually have a value of
METH_VARARGS.
This flag can be bitwise or'ed with
METH_KEYWORDS if you want to allow
keyword arguments into your function.
This can also have a value of
METH_NOARGS that indicates you don't
want to accept any arguments.
ml_doc: This is the docstring for the
function, which could be NULL if you
don't feel like writing one
The initialization function
PyMODINIT_FUNC initModule() {
    Py_InitModule3(func, module_methods, "docstring...");
}
func: This is the name of the module, passed
as a C string; the initialization function must
be named init followed by that name.
module_methods: This is the mapping
table name defined above.
docstring: This is the comment you
want to give in your extension.
For keyword arguments
Use the API function PyArg_ParseTuple to
extract the arguments from the one
PyObject pointer passed into your C
function.
The first argument to PyArg_ParseTuple
is the args argument. This is the object
you'll be parsing.
The second argument is a format string
describing the arguments as you expect
them to appear.
Returning Values
static PyObject *foo_add(PyObject *self, PyObject *args) {
    int a;
    int b;
    if (!PyArg_ParseTuple(args, "ii", &a, &b)) {
        return NULL;
    }
    return Py_BuildValue("i", a + b);
}
Returning a tuple
static PyObject *foo_add_subtract(PyObject *self, PyObject *args) {
    int a;
    int b;
    if (!PyArg_ParseTuple(args, "ii", &a, &b)) {
        return NULL;
    }
    /* "ii" builds a Python tuple of two integers */
    return Py_BuildValue("ii", a + b, a - b);
}
Swig commands
$ swig -python example.i
$ gcc -O2 -fPIC -c example.c
$ gcc -O2 -fPIC -c example_wrap.c -I/usr/local/include/python2.5
$ gcc -shared example.o example_wrap.o -o _example.so
What Can You Do With
Decorators?
Decorators allow you to inject or modify
code in functions or classes
Aspect Oriented Programming
Suppose you'd like to do something at
the entry and exit points of a function
(such as perform some kind of security,
tracing, locking, etc.)
Function Decorators
@myDecorator
def aFunction(): print "inside aFunction"
When the compiler passes over this
code, aFunction() is compiled and the
resulting function object is passed to the
myDecorator code, which does something to
produce a function-like object that is then
substituted for the original aFunction().
The decorator class
class myDecorator(object):
    def __init__(self, f):
        print("inside myDecorator.__init__()")
        f() # Prove that function definition has completed
    def __call__(self): # Has to be there
        print("inside myDecorator.__call__()")
The call
@myDecorator
def aFunction():
    print "inside aFunction()"
print "Finished decorating aFunction()"
aFunction()
Using Functions as Decorators
def entryExit(f):
    def new_f():
        print "Entering", f.__name__
        f()
        print "Exited", f.__name__
    return new_f
Implementation
@entryExit
def func1():
    print "inside func1()"
@entryExit
def func2():
    print "inside func2()"
func1()
func2()
print func1.__name__
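With the decorator as written, func1.__name__ reports new_f, because the name func1 now refers to the wrapper. If the original name should survive, one common approach is functools.wraps. This is a sketch in Python 3 syntax; the snake_case name entry_exit is a variation introduced here, not part of the slides.

```python
import functools


def entry_exit(f):
    @functools.wraps(f)   # copies __name__, __doc__ and similar metadata from f
    def new_f():
        print("Entering", f.__name__)
        f()
        print("Exited", f.__name__)
    return new_f


@entry_exit
def func1():
    print("inside func1()")


func1()
name = func1.__name__
print(name)
```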
Further info:
http://wiki.python.org/moin/PythonDecoratorLibrary
The Threading Module:
threading.activeCount(): Returns the
number of thread objects that are
active.
threading.currentThread(): Returns the
Thread object corresponding to the caller's
thread of control.
threading.enumerate(): Returns a list of
all thread objects that are currently
active.
run(): The run() method is the entry
point for a thread.
start(): The start() method starts a
thread by calling the run method.
join([time]): The join() waits for threads
to terminate.
isAlive(): The isAlive() method checks
whether a thread is still executing.
getName(): The getName() method
returns the name of a thread.
setName(): The setName() method sets
the name of a thread.
Creating Thread
using Threading Module:
Define a new subclass of
the Thread class.
Override the __init__(self
[,args]) method to add additional
arguments.
Then override the run(self [,args])
method to implement what the thread
should do when started.
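The three steps above can be sketched as follows, in Python 3 syntax. The class name Worker, its arguments and the shared results list are assumptions made for the example.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, label, results):
        # Step 2: extend __init__ with extra arguments, calling the base class first.
        super().__init__(name=label)
        self.results = results

    def run(self):
        # Step 3: run() holds the work; start() arranges for it to be called
        # in the new thread.
        self.results.append(self.name)


results = []
threads = [Worker("worker-%d" % i, results) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()          # wait for every thread to terminate

print(sorted(results))
```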
Multithreaded Priority Queue
get(): The get() removes and returns an item
from the queue.
put(): The put() adds an item to the queue.
qsize(): The qsize() returns the number of
items that are currently in the queue.
empty(): The empty() returns True if the queue
is empty; otherwise, False.
full(): The full() returns True if the queue is full;
otherwise, False.
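A brief sketch of these methods on a priority queue. It uses Python 3, where the module is named queue (Queue in Python 2); the tasks and priorities are invented for the example.

```python
import queue

q = queue.PriorityQueue(maxsize=3)

# Items are (priority, task) tuples; the lowest priority number comes out first.
q.put((2, "write report"))
q.put((1, "fix bug"))
q.put((3, "refactor"))

was_full = q.full()
size_before = q.qsize()

order = []
while not q.empty():
    priority, task = q.get()
    order.append(task)

print(was_full, size_before, order)
```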
Testing Code
def my_function(a, b):
    """
    >>> my_function(2, 3)
    6
    >>> my_function('a', 3)
    'aaa'
    """
    return a * b
$ python -m doctest -v doctest_simple.py
Handling Unpredictable Output
class MyClass(object):
    pass
def unpredictable(obj):
    """Returns a new list containing obj.
    >>> unpredictable(MyClass()) #doctest: +ELLIPSIS
    [<doctest_ellipsis.MyClass object at 0x...>]
    """
    return [obj]
Dictionaries and sets
keys = [ 'a', 'aa', 'aaa' ]
d1 = dict( (k,len(k)) for k in keys )
d2 = dict( (k,len(k)) for k in reversed(keys) )
print 'd1:', d1
print 'd2:', d2
print 'd1 == d2:', d1 == d2
s1 = set(keys)
s2 = set(reversed(keys))
print
print 's1:', s1
print 's2:', s2
print 's1 == s2:', s1 == s2
Tracebacks
Tracebacks are a special case of changing
data.
The paths in a traceback depend on the
location where a module is installed on the
filesystem on a given system,
It would be impossible to write portable tests
if they were treated the same as other
output.
The solution
def this_raises():
    """This function always raises an exception.
    >>> this_raises()
    Traceback (most recent call last):
    RuntimeError: here is the error
    """
    raise RuntimeError('here is the error')
Solution
def double_space(lines):
    """Prints a list of lines double-spaced.
    Line two.
    """
    for l in lines:
        print l
        print
    return
Unittest
import unittest
class SimplisticTest(unittest.TestCase):
    def test(self):
        self.assertTrue(True)
if __name__ == '__main__':
    unittest.main()
$ python3 test_simple.py
Test Outcomes
Tests have 3 possible outcomes:
ok
The test passes.
FAIL
The test does not pass, and raises an AssertionError exception.
ERROR
The test raises an exception other than AssertionError.
Unittest
There is no explicit way to cause a test to "pass", so a test's status depends on the
presence (or absence) of an exception.
import unittest
class OutcomesTest(unittest.TestCase):
    def test_pass(self):
        self.assertTrue(True)
    def test_fail(self):
        self.assertTrue(False)
    def test_error(self):
        raise RuntimeError('Test error!')
if __name__ == '__main__':
    unittest.main()
Asserting Truth
Most tests assert the truth of some condition. There are a few different ways to
write truth-checking tests,
import unittest
class TruthTest(unittest.TestCase):
    def test_assert_true(self):
        self.assertTrue(True)
    def test_assert_false(self):
        self.assertFalse(False)
if __name__ == '__main__':
    unittest.main()
Failure Messages
These assertions are handy, since the values being compared appear in the failure
message when a test fails.
import unittest
class InequalityTest(unittest.TestCase):
    def testEqual(self):
        self.assertNotEqual(1, 3-2)
    def testNotEqual(self):
        self.assertEqual(2, 3-2)
if __name__ == '__main__':
    unittest.main()
Nose Fixtures
Nose extends the unittest fixture model
of setup/teardown.
We can add specific code to run:
at the beginning and end of a module of
test code
(setup_module/teardown_module)
To get this to work, you just have to use
the right naming rules.
Nose Fixtures
at the beginning and end of a class of
test methods
(setup_class/teardown_class)
To get this to work, you have to use the
right naming rules, and include the
‘@classmethod’ decorator
before and after a test function call
(setup_function/teardown_function)
Nose Fixtures
You can use any name. You have to
apply them with the ‘@with_setup’
decorator imported from nose.
You can also use direct assignment,
which I’ll show in the example.
before and after a test method call
(setup/teardown)
To get this to work, you have to use the
right name.
Nose Fixtures
setup_module() function: runs before
anything else in the file
teardown_module() function: runs after
everything else in the file
setup() method: runs before every test
method
teardown() method: runs after every
test method
Example
from nose.tools import with_setup
def my_setup_function():
    pass
def my_teardown_function():
    pass
@with_setup(my_setup_function, my_teardown_function)
def test_numbers_3_4():
    assert multiply(3,4) == 12
With classes
class TestUM:
    @classmethod
    def setup_class(cls):
        print ("setup_class() before any methods in this class")
    @classmethod
    def teardown_class(cls):
        print ("teardown_class() after any methods in this class")