Jabba Laci

Generators

October 19, 2010 Jabba Laci

“Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left-off (it remembers all the data values and which statement was last executed).”

Let’s rewrite our Fibonacci function using generators. In the previous approach, we specified how many Fibonacci numbers we wanted. The function calculated all of them and returned a list containing all the elements. With generators, we can calculate the numbers one by one. The new function will calculate a number, return it, and suspend its execution. When we ask for the next value, it will resume where it left off and run until it computes the next number, and so on.

First let’s see a Fibonacci function that calculates the numbers in an infinite loop:

#!/usr/bin/env python

def fib():
    a, b = 0, 1
    while True:
        print a    # the current number is here
        a, b = b, a+b

fib()

In order to rewrite it in the form of a generator, we need to locate the part where the current value is calculated. This is the line with print a. We only need to replace this with yield a. It means that the function will return this value and suspend its execution until called again.

So, with generators it will look like this:

#!/usr/bin/env python

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b

f = fib()
for i in range(10):    # print the first ten Fibonacci numbers
    print f.next(),    # 0 1 1 2 3 5 8 13 21 34
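The snippet above is written for Python 2. As a minimal sketch, assuming Python 3 (where generators have no .next() method and the built-in next(), available since Python 2.6, is used instead), the suspend-and-resume behaviour can be observed step by step. The helper first_n() is a made-up name for this illustration.

```python
def fib():
    """Infinite Fibonacci generator (same logic as above)."""
    a, b = 0, 1
    while True:
        yield a          # hand back the current number and pause here
        a, b = b, a + b  # execution resumes from this line on the next request


def first_n(gen, n):
    """Hypothetical helper: collect the next n values from a generator."""
    return [next(gen) for _ in range(n)]


f = fib()
print(next(f))        # 0
print(next(f))        # 1
print(first_n(f, 8))  # [1, 2, 3, 5, 8, 13, 21, 34]
```

Note that the last call continues from the third number: the generator object f keeps its own state between calls.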

It is also possible to get a slice from the values of a generator. For instance, suppose we want the 5th, 6th, and 7th Fibonacci numbers (counting from zero, so the sequence starts with the 0th number, 0):

#!/usr/bin/env python

from itertools import islice

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b

for i in islice(fib(), 5, 8):
    print i    # 5 8 13
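As a small Python 3 sketch of the same idea: islice() counts positions from 0 and only accepts non-negative arguments, so it selects by position. When the stopping point depends on the values instead, itertools.takewhile() is one possible alternative. The limit of 100 below is just an example value.

```python
from itertools import islice, takewhile


def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# select by position: items at indices 5, 6 and 7
print(list(islice(fib(), 5, 8)))                  # [5, 8, 13]

# select by value: everything before the first number >= 100
print(list(takewhile(lambda x: x < 100, fib())))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```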

More info on islice is here. For this post I used tips from here.

Update (20110406)

Here is a presentation in PDF entitled “Generator Tricks For Systems Programmers” by David Beazley (presented at PyCon 2008). (The Reddit thread is here.)

Categories: python Tags: David Beazley, fibonacci, generators, pdf, presentation, pycon 2008, slides

PyCon 2010, EuroPython 2010

October 18, 2010 Jabba Laci

PyCon is the largest annual gathering for the community using and developing the open-source Python programming language. Several videos are available too.

EuroPython is the European Python conference. It is aimed at everyone in the Python community, of all skill levels, both users and programmers. A lucky blogger was there; read his impressions here.

Update (20110509)

PyCon Video Archive

“This is a complete list of all recorded PyCon talks since 2009 with direct links to the video download. The official archive can be found at pycon.blip.tv.”

Categories: python

The News Television Project (HírTV)

October 18, 2010 Jabba Laci

In this post I describe how to watch news on a Hungarian site. Although the video that we want to play is in Hungarian, you might get some ideas that you can use in a different project.

Project description

Currently I live abroad and sometimes I want to watch the news in my mother tongue. The Hungarian News Television (HírTV) collects its news programs at http://www.hirtv.hu/view/videoview/hirado . There, a video has the following URL: http://www.hirtv.net/filmek/hirado21/hiradoYYYYMMDD.wmv , where YYYYMMDD is the date (for instance http://www.hirtv.net/filmek/hirado21/hirado20101018.wmv). Instead of starting a web browser, visiting this page and clicking on a link, I want to launch the news video with a Python script.

Difficulty

When the script is executed, the news of the current day may not have been uploaded yet, so we need to verify that the URL exists. However, if we request a WMV file that doesn’t exist, the HírTV web server returns an HTML page instead of indicating that the given URL is missing. So we have to check the Content-Type of the URL: text/html means an error, video/x-ms-wmv means OK.

Solution

#!/usr/bin/env python

import datetime
import urllib
import os

WMV  = 'video/x-ms-wmv'

base = 'http://www.hirtv.net/filmek/hirado21/hirado'
ext = '.wmv'

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']

def date_to_str(d):
    return "%d%02d%02d" % d

def prettify(d):
    return "%d-%02d-%02d" % d

def play_video(video_url):
    print "> " + video_url
    command = 'mplayer %s 1>/dev/null 2>&1' % video_url
    #command = 'vlc %s 1>/dev/null 2>&1' % video_url    # if you prefer VLC
    os.system(command)

today = datetime.date.today().timetuple()[:3]
video_today = base + date_to_str(today) + ext
if get_content_type(video_today) == WMV:
    play_video(video_today)
else:
    yesterday = (datetime.date.today() - datetime.timedelta(days = 1)).timetuple()[:3]
    video_yesterday = base + date_to_str(yesterday) + ext

    print "The video for today (%s) is not available." % prettify(today)
    val = raw_input( "Do you want to watch the video of yesterday (%s) [y/n]? " % prettify(yesterday) )
    if val == "y":
        if get_content_type(video_yesterday) == WMV:
            play_video(video_yesterday)
        else:
            print "Sorry. The video of yesterday (%s) is not available either." % prettify(yesterday)

First we determine today’s date and use it to build the URL of the video file. If the file really exists (i.e. the Content-Type is correct), we play it by calling mplayer. If the Content-Type is incorrect, today’s video has not been uploaded yet. In this case we offer to play yesterday’s video instead.

Update (20101107): A bug in date_to_str() and prettify() was corrected. Months and days must be padded with zeros, e.g. 6 must become 06. VLC support was also added as a commented-out line.
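The URL-building step can also be sketched with strftime(), which pads months and days with zeros on its own. This is only an illustration for Python 3, not the script above: the base URL and extension are taken from the post, while the function names video_url() and candidate_urls() are made up here. No network access is involved.

```python
import datetime

BASE = 'http://www.hirtv.net/filmek/hirado21/hirado'  # from the post
EXT = '.wmv'


def video_url(day):
    """Build the news video URL for a datetime.date."""
    return BASE + day.strftime('%Y%m%d') + EXT


def candidate_urls(today):
    """Hypothetical helper: today's URL first, then yesterday's as a fallback."""
    yesterday = today - datetime.timedelta(days=1)
    return [video_url(today), video_url(yesterday)]


# a fixed date is used so that the result is reproducible
for url in candidate_urls(datetime.date(2010, 11, 1)):
    print(url)
# http://www.hirtv.net/filmek/hirado21/hirado20101101.wmv
# http://www.hirtv.net/filmek/hirado21/hirado20101031.wmv
```

In a real run one would pass datetime.date.today() and try the URLs in order, checking the Content-Type of each as described above.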

Categories: python Tags: hirtv, mplayer, news, television, video, yesterday

Get URL info (file size, Content-Type, etc.)

October 18, 2010 Jabba Laci

Problem

You have a URL and you want to get some info about it. For instance, you want to figure out the content type (text/html, image/jpeg, etc.) of the URL, or the file size without actually downloading the given page.

Solution

Let’s see an example with an image. Consider the URL http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg .

#!/usr/bin/env python

import urllib

def get_url_info(url):
    d = urllib.urlopen(url)
    return d.info()

url = 'http://'+'www'+'.geos.ed.ac.uk'+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_url_info(url)

Output:
Date: Mon, 18 Oct 2010 18:58:07 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6
X-Powered-By: Zope (www.zope.org), Python (www.python.org)
Last-Modified: Thu, 08 Nov 2007 09:56:19 GMT
Content-Length: 103984
Accept-Ranges: bytes
Connection: close
Content-Type: image/jpeg

That is, the size of the image is 103,984 bytes and its content type is indeed image/jpeg.

In the code, d.info() returns a dictionary-like object (header fields can be looked up by name as in a dict), so extracting a specific field is very easy:

#!/usr/bin/env python

import urllib

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']

url = 'http://'+'www'+'.geos.ed.ac.uk'+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_content_type(url)    # image/jpeg
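For readers on Python 3, here is a minimal sketch of the same lookup with urllib.request. It is an adaptation, not the code of this post: it sends an explicit HEAD request so that only the headers are asked for, and the example.com URL is a placeholder. Only the request object is built when the snippet runs; get_content_type() contacts the server when you call it, and urlopen() raises an HTTPError for responses such as 404.

```python
import urllib.request


def build_head_request(url):
    """Prepare a request that asks for the headers only (HTTP HEAD)."""
    return urllib.request.Request(url, method='HEAD')


def get_content_type(url, timeout=10):
    """Send the HEAD request and return the Content-Type header (or None)."""
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return resp.headers.get('Content-Type')  # header names are case-insensitive


req = build_head_request('http://www.example.com/picture.jpg')  # placeholder URL
print(req.get_method())  # HEAD
```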

This post is based on this thread.

Update (20121202)

With requests:

>>> import requests
>>> from pprint import pprint
>>> url = 'http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg'
>>> r = requests.head(url)
>>> pprint(r.headers)
{'accept-ranges': 'none',
 'connection': 'close',
 'content-length': '103984',
 'content-type': 'image/jpeg',
 'date': 'Sun, 02 Dec 2012 21:05:57 GMT',
 'etag': 'ts94515779.19',
 'last-modified': 'Thu, 08 Nov 2007 09:56:19 GMT',
 'server': 'Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6',
 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)'}

Categories: python Tags: content-type, header, mime, requests, url, url size

check if URL exists

October 17, 2010 Jabba Laci

Problem

You want to check if a URL exists without actually downloading the given file.

Solution

Update (20120124): My previous solution didn’t work correctly. Here is my revised version.

import httplib
import urlparse

def get_server_status_code(url):
    """
    Download just the header of a URL and
    return the server's status code.
    """
    # http://stackoverflow.com/questions/1140661
    host, path = urlparse.urlparse(url)[1:3]    # elems [1] and [2]
    try:
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except StandardError:
        return None

def check_url(url):
    """
    Check if a URL exists without downloading the whole file.
    We only check the URL header.
    """
    # see also http://stackoverflow.com/questions/2924422
    good_codes = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]
    return get_server_status_code(url) in good_codes

Tests:

assert check_url('http://www.google.com')    # exists
assert not check_url('http://simile.mit.edu/crowbar/nothing_here.html')    # doesn't exist

We only get the header of a given URL and we check the response code of the web server.
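The same header-only check can be sketched for Python 3, where httplib became http.client and urlparse moved to urllib.parse. This is an adaptation under a few assumptions of mine, not the original solution: it also handles https URLs, keeps the query string, closes the connection, and uses a 10-second timeout. The helper split_target() is a made-up name.

```python
import http.client
from urllib.parse import urlsplit

GOOD_CODES = (http.client.OK, http.client.FOUND, http.client.MOVED_PERMANENTLY)


def split_target(url):
    """Return (scheme, host, request target); the target keeps the query string."""
    parts = urlsplit(url)
    target = parts.path or '/'      # an empty path means the site root
    if parts.query:
        target += '?' + parts.query
    return parts.scheme, parts.netloc, target


def get_server_status_code(url, timeout=10):
    """Send a HEAD request and return the status code, or None on failure."""
    scheme, host, target = split_target(url)
    conn_class = (http.client.HTTPSConnection if scheme == 'https'
                  else http.client.HTTPConnection)
    conn = None
    try:
        conn = conn_class(host, timeout=timeout)
        conn.request('HEAD', target)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        if conn is not None:
            conn.close()


def check_url(url):
    """True if the server answers with one of the accepted status codes."""
    return get_server_status_code(url) in GOOD_CODES


print(split_target('http://www.google.com'))  # ('http', 'www.google.com', '/')
```

Nothing is sent over the network until check_url() or get_server_status_code() is called.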

Update (20121202)

With requests:

>>> import requests
>>>
>>> url = 'http://hup.hu'
>>> r = requests.head(url)
>>> r.status_code
200    # requests.codes.OK
>>> url = 'http://www.google.com'
>>> r = requests.head(url)
>>> r.status_code
302    # requests.codes.FOUND
>>> url = 'http://simile.mit.edu/crowbar/nothing_here.html'
>>> r = requests.head(url)
>>> r.status_code
404    # requests.codes.NOT_FOUND

Categories: python Tags: header, requests, url

date today

October 17, 2010 Jabba Laci

Let’s see how to get today’s date in the format yyyymmdd, i.e. {year}{month}{day}. This format has an advantage. If you have several dates like this and you sort them lexicographically, then you get them in chronological order. I often use this format when creating subdirectories in the file system. Most file managers sort them automatically, so I can see them in order.

#!/usr/bin/env python

import datetime

def date_to_str(d):
    # zero-pad each element so that e.g. June becomes 06, not 6
    return ''.join('%02d' % i for i in d)

today = datetime.date.today().timetuple()[:3]

print today                 # (2010, 10, 17)
print date_to_str(today)    # 20101017

Here today is a tuple with three elements. The function date_to_str() joins the elements and returns a string. If you use the separator '-', i.e. '-'.join(...), then you get the following output: 2010-10-17.
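An alternative, as a minimal Python 3 sketch: strftime() and isoformat() from the standard datetime module produce zero-padded strings directly, so June 5 becomes 0605 rather than 65. A fixed date is used instead of date.today() to keep the output reproducible, and the list of stamps is made-up sample data.

```python
import datetime

d = datetime.date(2010, 6, 5)
print(d.strftime('%Y%m%d'))  # 20100605
print(d.isoformat())         # 2010-06-05

# lexicographic order of such strings equals chronological order
stamps = ['20101017', '20100605', '20091231']
print(sorted(stamps))        # ['20091231', '20100605', '20101017']
```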

Good to know

The form year before month before day is standard in Asian countries, Hungary, Sweden and the US armed forces.

Categories: python Tags: date, today

Pylint

October 16, 2010 Jabba Laci

“Pylint is a lint-like tool for Python code. It performs almost all the verifications that pychecker does, and additionally can perform some stylistic verification and coding standard enforcements. The checked code is assigned a mark based on the number and the severity of the encountered problems. The previous mark of a given piece of code is cached so that you can see if the code quality has improved since the last check.”

Pylint is a very nice code checker. If you use Ubuntu, you can install it from the repositories (sudo apt-get install pylint). Its usage is very simple:

pylint file.py

The output is a nice report with suggestions on how to improve the code quality. I especially like the “unused import” warnings.

Check out this tiny tutorial for some examples.

Categories: python Tags: code checker, pylint, refactoring

chomp() functionality in Python

October 11, 2010 Jabba Laci

In Perl there is a function called chomp() which is very useful when reading a text file line by line. It removes the newline character ('\n') at the end of a line. How do we do the same thing in Python?

Solution #1

To get the same effect, remove the '\n' from each line:

#!/usr/bin/env python

f = open('test.txt', 'r')
for line in f:
    line = line.replace('\n', '')    # remove '\n' only
    # do something with line
    
f.close()

This will replace the '\n' with an empty string.

Solution #2

There is a string method called rstrip() which removes ALL whitespace characters on the right side of a string, not only the '\n', so it is not entirely the same as the previous solution. However, if you don’t need those whitespace characters, you can use this solution too.

#!/usr/bin/env python

f = open('test.txt', 'r')
for line in f:
    line = line.rstrip()    # remove ALL whitespace on the right side, including '\n'
    # do something with line
    
f.close()

Update (20111011): As it was pointed out by John C in the comments, “rstrip() also accepts a string of characters…, so line.rstrip('\n') will remove just trailing newline characters.” More info on rstrip here.
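The difference between the two calls can be seen in a short Python 3 sketch. An in-memory io.StringIO object stands in for the text file so that the example is self-contained, and read_chomped() is a made-up helper name.

```python
import io


def read_chomped(f):
    """Hypothetical helper: yield lines with only the trailing newline removed."""
    for line in f:
        yield line.rstrip('\n')


sample = io.StringIO('first  \nsecond\t\nthird')  # stands in for open('test.txt')
print(list(read_chomped(sample)))      # ['first  ', 'second\t', 'third']

print(repr('first  \n'.rstrip()))      # 'first'
print(repr('first  \n'.rstrip('\n')))  # 'first  '
```

With a real file, a with statement (with open('test.txt') as f:) closes the file automatically, so the explicit f.close() is not needed.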

Categories: python Tags: chomp, perl, text file, whitespace

Python Challenge #1

October 8, 2010 Jabba Laci

This exercise is from http://www.pythonchallenge.com.

Challenge

We have the following encoded text:

g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amknsrcpq ypc dmp. bmgle gr gl zw fylb gq glcddgagclr ylb rfyr'q ufw rfgq rcvr gq qm jmle. sqgle qrpgle.kyicrpylq() gq pcamkkclbcb. lmu ynnjw ml rfc spj.

And the following hint to decode it: “K -> M, O -> Q, E -> G”.

First try to solve it yourself. My solution is below.

Categories: python Tags: ascii, caesar, challenge, cipher, translation

Python Challenge

October 8, 2010 Jabba Laci

If you learn Python and you like challenges, don’t forget to visit the site http://www.pythonchallenge.com. These exercises are fun to solve and you will learn a lot from them. I will also start to solve them and post my solutions here. Of course, first you should try to solve them by yourself. When you are done, you can compare your solution with the others’.

Challenge #0 is just to warm you up. I will start with Challenge #1 in the next post.

Categories: python Tags: challenge
