Archive
Fluffy is gone
We are sad to inform you that Fluffy, the world’s longest snake living in captivity, has died. The 18-year-old, 300-pound snake held the Guinness World Records title of longest snake and was a hit attraction at the Columbus Zoo.
Find more info here.
Levenshtein distance
The Levenshtein distance (or edit distance) between two strings is the minimum number of “edit operations” required to change one string into the other. The two strings can have different lengths. There are three kinds of edit operations: deletion, insertion, or substitution of a single character.
Example: the Levenshtein distance of “ag-tcc” and “cgctca” is 3.
#!/usr/bin/env python

def LD(s, t):
    # prepend a space so that index 0 stands for the empty prefix
    s = ' ' + s
    t = ' ' + t
    d = {}
    S = len(s)
    T = len(t)
    # the distance between a prefix and the empty string is the prefix length
    for i in range(S):
        d[i, 0] = i
    for j in range(T):
        d[0, j] = j
    for j in range(1, T):
        for i in range(1, S):
            if s[i] == t[j]:
                d[i, j] = d[i-1, j-1]
            else:
                d[i, j] = min(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + 1)
    return d[S-1, T-1]

a = 'ag-tcc'
b = 'cgctca'
print LD(a, b) # 3
The implementation is from here.
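The table d above stores every cell, although each row of the table depends only on the row before it. As a minimal sketch in Python 3 (the name levenshtein is made up for this example and is not taken from the linked implementation), the same recurrence can be computed while keeping just two rows:

```python
def levenshtein(s, t):
    """Edit distance between s and t, keeping only two rows of the table."""
    # Row 0: turning the empty string into t[:j] takes j insertions.
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        # Turning s[:i] into the empty string takes i deletions.
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


print(levenshtein('ag-tcc', 'cgctca'))  # 3
```

Memory use drops from len(s) * len(t) cells to roughly 2 * len(t), which can matter for long strings.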
Hamming distance
The Hamming distance is defined between two strings of equal length. It measures the number of positions with mismatching characters.
Example: the Hamming distance between “toned” and “roses” is 3.
#!/usr/bin/env python

def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'toned'
    b = 'roses'
    print hamming_distance(a, b) # 3
If you need the number of matching character positions:
#!/usr/bin/env python

def similarity(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'toned'
    b = 'roses'
    print similarity(a, b) # 2
Actually this is equal to len(s1) - hamming_distance(s1, s2). Remember, len(s1) == len(s2).
More info on zip() here.
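A quick check of this identity, as a Python 3 sketch that reuses the two function bodies from above. It also shows why the length check matters: zip() stops at the shorter input, so without the assert a length mismatch would go unnoticed.

```python
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def similarity(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))


a, b = 'toned', 'roses'
print(hamming_distance(a, b))  # 3
print(similarity(a, b))        # 2
# Every position either matches or mismatches, so the two counts add up.
print(similarity(a, b) + hamming_distance(a, b) == len(a))  # True

# zip() silently drops the extra character of the longer string.
print(list(zip('abc', 'ab')))  # [('a', 'a'), ('b', 'b')]
```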
Permutations of a list
Update (20120321): The methods presented here can generate all the permutations. However, the permutations are not ordered lexicographically. If you need the permutations in lexicographical order, refer to this post.
Problem
You need all the permutations of a list.
Solution
With generators:
#!/usr/bin/env python

def perms01(li):
    if len(li) <= 1:
        yield li
    else:
        for perm in perms01(li[1:]):
            # insert the first element at every position of the shorter permutation
            for i in range(len(perm)+1):
                yield perm[:i] + li[0:1] + perm[i:]

for p in perms01(['a','b','c']):
    print p
Output:
['a', 'b', 'c']
['b', 'a', 'c']
['b', 'c', 'a']
['a', 'c', 'b']
['c', 'a', 'b']
['c', 'b', 'a']
This tip is from here.
Without generators:
def perms02(l):
    sz = len(l)
    if sz <= 1:
        return [l]
    return [p[:i]+[l[0]]+p[i:] for i in xrange(sz) for p in perms02(l[1:])]

for p in perms02(['a','b','c']):
    print p
Output:
['a', 'b', 'c']
['a', 'c', 'b']
['b', 'a', 'c']
['c', 'a', 'b']
['b', 'c', 'a']
['c', 'b', 'a']
This tip is from here.
The two outputs contain the same elements in a different order.
Notes
If S is a finite set of n elements, then there are n! permutations of S. For instance, if we have 4 letters (say a, b, c, and d), then we can arrange them in 4! = 4 * 3 * 2 * 1 = 24 different ways.
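The n! count can be checked with the standard library. The sketch below (Python 3) uses itertools.permutations, which is a different technique from the two recursive functions above: it yields tuples instead of lists, and when the input is sorted it produces the permutations in lexicographic order.

```python
from itertools import permutations
from math import factorial

letters = ['a', 'b', 'c', 'd']
perms = list(permutations(letters))

print(len(perms))               # 24
print(factorial(len(letters)))  # 24
print(perms[0])                 # ('a', 'b', 'c', 'd')
print(perms[-1])                # ('d', 'c', 'b', 'a')

# The input list is sorted, so the output is already in lexicographic order.
print(perms == sorted(perms))   # True
```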
PyCon 2010, EuroPython 2010
PyCon is the largest annual gathering for the community using and developing the open-source Python programming language. Several videos are available too.
EuroPython is the European Python conference. It is aimed at everyone in the Python community, of all skill levels, both users and programmers. A lucky blogger was there; read his impressions here.
Update (20110509)
“This is a complete list of all recorded PyCon talks since 2009 with direct links to the video download. The official archive can be found at pycon.blip.tv.”
The News Television Project (HírTV)
In this post I describe how to watch news on a Hungarian site. Although the video that we want to play is in Hungarian, you might get some ideas that you can use in a different project.
Project description
Currently I live abroad and sometimes I want to watch news in my mother tongue. The Hungarian News Television (HírTV) collects its news programs at http://www.hirtv.hu/view/videoview/hirado . Here, a video has the following URL: http://www.hirtv.net/filmek/hirado21/hiradoYYYYMMDD.wmv , where YYYYMMDD is the date (for instance http://www.hirtv.net/filmek/hirado21/hirado20101018.wmv). Instead of starting a web browser, visiting this page and clicking on a link, I want to launch the news video with a Python script.
Difficulty
When the script is executed, the news of the current day may not have been uploaded yet, so we need to verify that the URL exists. However, if we request a WMV file that doesn’t exist, the web server of HírTV returns an HTML page instead of indicating that the given URL is missing. So we have to check the Content-Type of the URL: text/html means an error, video/x-ms-wmv means OK.
Solution
#!/usr/bin/env python
import datetime
import urllib
import os

WMV = 'video/x-ms-wmv'
base = 'http://www.hirtv.net/filmek/hirado21/hirado'
ext = '.wmv'

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']

def date_to_str(d):
    return "%d%02d%02d" % d

def prettify(d):
    return "%d-%02d-%02d" % d

def play_video(video_url):
    print "> " + video_url
    command = 'mplayer %s 1>/dev/null 2>&1' % video_url
    #command = 'vlc %s 1>/dev/null 2>&1' % video_url # if you prefer VLC
    os.system(command)

today = datetime.date.today().timetuple()[:3]
video_today = base + date_to_str(today) + ext
if get_content_type(video_today) == WMV:
    play_video(video_today)
else:
    yesterday = (datetime.date.today() - datetime.timedelta(days=1)).timetuple()[:3]
    video_yesterday = base + date_to_str(yesterday) + ext
    print "The video for today (%s) is not available." % prettify(today)
    val = raw_input("Do you want to watch the video of yesterday (%s) [y/n]? " % prettify(yesterday))
    if val == "y":
        if get_content_type(video_yesterday) == WMV:
            play_video(video_yesterday)
        else:
            print "Sorry. The video of yesterday (%s) is not available either." % prettify(yesterday)
First we determine today’s date and use it to create the URL of the video file. If the file really exists (i.e. the Content-Type is correct), we play it by calling mplayer. If the Content-Type is incorrect, today’s video has not been uploaded yet. In this case we offer to play yesterday’s video instead.
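The URL construction and the Content-Type test can be tried without touching the network. The following is only a sketch in Python 3; the helper names video_url, candidate_urls and is_wmv are invented for this example and are not part of the script above. It formats the date with strftime() and ignores any parameters that a server might append to the Content-Type value.

```python
import datetime

BASE = 'http://www.hirtv.net/filmek/hirado21/hirado'
EXT = '.wmv'
WMV = 'video/x-ms-wmv'


def video_url(day):
    """URL of the news video for a datetime.date (hypothetical helper)."""
    return BASE + day.strftime('%Y%m%d') + EXT


def candidate_urls(today, days_back=1):
    """URLs to try in order: today first, then the previous days."""
    return [video_url(today - datetime.timedelta(days=n))
            for n in range(days_back + 1)]


def is_wmv(content_type):
    """True if a Content-Type header value names a WMV video."""
    if content_type is None:
        return False
    # Drop optional parameters such as '; charset=utf-8'.
    return content_type.split(';')[0].strip().lower() == WMV


print(video_url(datetime.date(2010, 10, 18)))
# http://www.hirtv.net/filmek/hirado21/hirado20101018.wmv
print(is_wmv('video/x-ms-wmv'))            # True
print(is_wmv('text/html; charset=utf-8'))  # False
```

Subtracting a timedelta from a date object handles month and year boundaries, so yesterday's URL is also correct on the first day of a month.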
Update (20101107): A bug in date_to_str() and prettify() was corrected. Months and days must be padded with 0s, e.g. 6 must become 06. VLC support was also added; it is commented out.
Get URL info (file size, Content-Type, etc.)
Problem
You have a URL and you want to get some info about it. For instance, you want to figure out the content type (text/html, image/jpeg, etc.) of the URL, or the file size without actually downloading the given page.
Solution
Let’s see an example with an image. Consider the URL http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg .
#!/usr/bin/env python
import urllib

def get_url_info(url):
    d = urllib.urlopen(url)
    return d.info()

url = 'http://'+'www'+'.geos.ed.ac.uk'+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_url_info(url)
Output:
Date: Mon, 18 Oct 2010 18:58:07 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6
X-Powered-By: Zope (www.zope.org), Python (www.python.org)
Last-Modified: Thu, 08 Nov 2007 09:56:19 GMT
Content-Length: 103984
Accept-Ranges: bytes
Connection: close
Content-Type: image/jpeg
That is, the size of the image is 103,984 bytes and its content type is indeed image/jpeg.
In the code, d.info() returns a dictionary-like object that holds the response headers, so the extraction of a specific field is very easy:
#!/usr/bin/env python
import urllib

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']

url = 'http://'+'www'+'.geos.ed.ac.uk'+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_content_type(url) # image/jpeg
This post is based on this thread.
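How the header lookup behaves can be tried offline. In Python 3 the object returned by info() is an http.client.HTTPMessage, a subclass of email.message.Message, so the sketch below parses some of the header lines from the output above with the standard email package to get a comparable object. This only illustrates the lookup; it does not contact the server.

```python
from email import message_from_string

# Header lines copied from the output shown earlier.
RAW_HEADERS = """\
Date: Mon, 18 Oct 2010 18:58:07 GMT
Last-Modified: Thu, 08 Nov 2007 09:56:19 GMT
Content-Length: 103984
Accept-Ranges: bytes
Connection: close
Content-Type: image/jpeg
"""

headers = message_from_string(RAW_HEADERS)

print(headers['Content-Type'])         # image/jpeg
print(headers['content-type'])         # image/jpeg (names are case-insensitive)
print(int(headers['Content-Length']))  # 103984
print(headers.get('ETag'))             # None (this header was not sent)
```

Header values are always strings, so a numeric field such as Content-Length has to be converted with int() before it is used as a size.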
Update (20121202)
With requests:
>>> import requests
>>> from pprint import pprint
>>> url = 'http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg'
>>> r = requests.head(url)
>>> pprint(r.headers)
{'accept-ranges': 'none',
 'connection': 'close',
 'content-length': '103984',
 'content-type': 'image/jpeg',
 'date': 'Sun, 02 Dec 2012 21:05:57 GMT',
 'etag': 'ts94515779.19',
 'last-modified': 'Thu, 08 Nov 2007 09:56:19 GMT',
 'server': 'Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6',
 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)'}
Check if a URL exists
Problem
You want to check if a URL exists without actually downloading the given file.
Solution
Update (20120124): My previous solution didn’t work correctly. Here is my revised version.
import httplib
import urlparse

def get_server_status_code(url):
    """
    Download just the header of a URL and
    return the server's status code.
    """
    # http://stackoverflow.com/questions/1140661
    host, path = urlparse.urlparse(url)[1:3] # elems [1] and [2]
    try:
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except StandardError:
        return None

def check_url(url):
    """
    Check if a URL exists without downloading the whole file.
    We only check the URL header.
    """
    # see also http://stackoverflow.com/questions/2924422
    good_codes = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]
    return get_server_status_code(url) in good_codes
Tests:
assert check_url('http://www.google.com') # exists
assert not check_url('http://simile.mit.edu/crowbar/nothing_here.html') # doesn't exist
We only get the header of a given URL and we check the response code of the web server.
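For readers on Python 3, where httplib and urlparse became http.client and urllib.parse, the same idea might look like the sketch below. It is an adaptation under a few assumptions of mine, not the original code: the helper request_target and the timeout parameter are additions, the query string is kept as part of the request, and https URLs get an HTTPSConnection.

```python
import http.client
from http import HTTPStatus
from urllib.parse import urlsplit

GOOD_CODES = (HTTPStatus.OK, HTTPStatus.FOUND, HTTPStatus.MOVED_PERMANENTLY)


def request_target(url):
    """Split a URL into (scheme, host, path-with-query) for a HEAD request."""
    parts = urlsplit(url)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    return parts.scheme, parts.netloc, target


def get_server_status_code(url, timeout=5.0):
    """Send a HEAD request and return the status code, or None on failure."""
    scheme, host, target = request_target(url)
    if scheme == 'https':
        conn = http.client.HTTPSConnection(host, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request('HEAD', target)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, bad responses.
        return None
    finally:
        conn.close()


def check_url(url):
    """True if the server answers the HEAD request with a 'good' status."""
    return get_server_status_code(url) in GOOD_CODES


print(request_target('http://www.google.com'))
# ('http', 'www.google.com', '/')
```

Calling check_url('http://www.google.com') needs network access, so it is left out of the printed example.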
Update (20121202)
With requests:
>>> import requests
>>>
>>> url = 'http://hup.hu'
>>> r = requests.head(url)
>>> r.status_code
200 # requests.codes.OK
>>> url = 'http://www.google.com'
>>> r = requests.head(url)
>>> r.status_code
302 # requests.codes.FOUND
>>> url = 'http://simile.mit.edu/crowbar/nothing_here.html'
>>> r = requests.head(url)
>>> r.status_code
404 # requests.codes.NOT_FOUND
Date today
Let’s see how to get today’s date in the format yyyymmdd, i.e. {year}{month}{day}. This format has an advantage. If you have several dates like this and you sort them lexicographically, then you get them in chronological order. I often use this format when creating subdirectories in the file system. Most file managers sort them automatically, so I can see them in order.
#!/usr/bin/env python
import datetime

def date_to_str(d):
    # pad month and day with zeros, e.g. 6 becomes 06
    return ''.join('%02d' % i for i in d)

today = datetime.date.today().timetuple()[:3]
print today # (2010, 10, 17)
print date_to_str(today) # 20101017
Here today is a tuple with three elements. The function date_to_str() joins the elements and returns a string. If you use the separator '-', i.e. '-'.join(...), then you get the following output: 2010-10-17.
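An alternative that skips the tuple is strftime(), which always pads the month and the day to two digits. The sketch below is in Python 3; the sep parameter is an addition for this example. It also checks the sorting claim on a few arbitrary sample dates.

```python
import datetime


def date_to_str(d, sep=''):
    """Format a datetime.date as yyyymmdd, optionally with a separator."""
    return d.strftime(sep.join(['%Y', '%m', '%d']))


day = datetime.date(2010, 6, 7)
print(date_to_str(day))       # 20100607
print(date_to_str(day, '-'))  # 2010-06-07

# Sorting the strings gives the same order as sorting the dates.
days = [datetime.date(2011, 1, 5),
        datetime.date(2010, 12, 31),
        datetime.date(2010, 6, 7)]
names = sorted(date_to_str(d) for d in days)
print(names)  # ['20100607', '20101231', '20110105']
```

The sorting property depends on the zero padding: without it, 2010-6-7 would become 201067, which sorts after 20101231.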
Good to know
The form year before month before day is standard in Asian countries, Hungary, Sweden and the US armed forces.

