Get the IMDb Top 250 list

August 19, 2016

Problem
From IMDb you want to get the list of the Top 250 movies.

Solution
There is a Top 250 list here: http://akas.imdb.com/chart/top. To access IMDb info, I use the excellent imdbpy package. It has a get_top250_movies() function but it returns an empty list :)

During my research I found this post on SO. It suggests that one should download the official IMDb dump from here. The Top 250 list is in the file ratings.list.gz. However, this file doesn’t contain the IMDb IDs of the movies, so it’s good for nothing :(

There was only one solution left: let’s do some scraping. Here is the Python code that did the job for me. I didn’t use BeautifulSoup, just plain ol’ regular expressions:

import requests
import re

top250_url = "http://akas.imdb.com/chart/top"

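def get_top250():
    """Return the IMDb IDs (digits only, without the "tt" prefix) found on the chart page."""
    r = requests.get(top250_url)
    result = []
    # split() already drops the newlines, so each line can be searched directly
    for line in r.text.split("\n"):
        m = re.search(r'data-titleid="tt(\d+?)">', line)
        if m:
            result.append(m.group(1))
    return result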

It returns the IMDb IDs of the Top 250 movies. Since you now have the movie IDs, you can use the imdbpy package to look up all the information about each movie.
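
The extraction step can be tried offline, without hitting IMDb. The sketch below is my own illustration: `extract_title_ids` and the sample markup are hypothetical, and it assumes the page marks each entry with a `data-titleid="tt..."` attribute, which may well have changed since this post was written. Unlike the function above, it scans the whole text at once and drops repeated IDs.

```python
import re

# Same attribute as in get_top250(), matched across the whole page text.
TITLE_ID_RE = re.compile(r'data-titleid="tt(\d+)"')


def extract_title_ids(html):
    """Return the numeric IMDb IDs found in html, in page order, without duplicates."""
    seen = set()
    ids = []
    for title_id in TITLE_ID_RE.findall(html):
        if title_id not in seen:
            seen.add(title_id)
            ids.append(title_id)
    return ids


# Hypothetical sample markup, only for demonstration.
sample = '''
<div class="wlb_ribbon" data-titleid="tt0111161"></div>
<div class="seen-widget" data-titleid="tt0111161"></div>
<div class="wlb_ribbon" data-titleid="tt0068646"></div>
'''
print(extract_title_ids(sample))
```

Removing duplicates is a design choice: if the same ID shows up in several attributes of one table row, you still get one entry per movie.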

Categories: python

string distances

August 17, 2016

See the Jellyfish project: “Jellyfish is a python library for doing approximate and phonetic matching of strings”.

Jellyfish implements the following algorithms: Levenshtein Distance, Damerau-Levenshtein Distance, Jaro Distance, Jaro-Winkler Distance, Match Rating Approach Comparison, Hamming Distance.

See the project page for more info.
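
To get a feel for what two of these measures compute, here is a minimal pure-Python sketch. It is not Jellyfish's implementation, and the function names `levenshtein` and `hamming` are my own. Raising an error for strings of different lengths in `hamming` is an assumption of this sketch, following the classic definition.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein("kitten", "sitting"))
print(hamming("karolin", "kathrin"))
```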

Categories: python

compile lxml on Ubuntu 16.04

August 4, 2016

Problem
lxml doesn’t want to compile on Ubuntu 16.04.

Solution

$ sudo apt install libxml2-dev libxslt1-dev python-dev zlib1g-dev

I was getting the error “/usr/bin/ld: cannot find -lz”. It turned out that the package zlib1g-dev was the cure…

Note that this is for Python 2. For Python 3 you might need to install the package python3-dev.

Categories: python, ubuntu

installing a Flask webapp on a Digital Ocean Ubuntu 16.04 box using Systemd

August 4, 2016

I’ve updated my Digital Ocean Flask notes on GitHub. Now it includes information about installing a Flask webapp on a Digital Ocean Ubuntu 16.04 box using Systemd.

Categories: flask, python, ubuntu

Flask RESTful POST JSON

July 17, 2016

Problem
Using Flask-RESTful, I needed an API endpoint that accepts JSON data.

Solution
I found the solution here: http://stackoverflow.com/questions/22273671/flask-restful-post-json-fails. You can copy / paste that code. Note that the JSON data is POSTed to your API endpoint, thus you need to implement the post() method.

But how do you test it?

1) using cURL:

$ curl -i -H "Content-Type: application/json" -H "Accept: application/json" -X POST -d "{\"Hello\":\"Karl\"}" http://domain/your_api_endpoint

Damn, that’s complicated, right? Is there an easier way?

2) using httpie:
You can install httpie with your favorite package manager. Then:

$ http POST http://domain/your_api_endpoint Hello=Karl
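
3) from Python:
A third option, sketched here with the standard library only, is to build the same request in Python. The helper name `build_json_post` is my own, and the URL is the same placeholder as above. The request is only constructed here; the actual sending is left in a comment because it needs a reachable server.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build a POST request carrying payload as JSON, like the cURL call above."""
    data = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_json_post("http://domain/your_api_endpoint", {"Hello": "Karl"})
print(req.get_method(), req.full_url)
print(req.data)

# To actually send it (needs a running server at that URL):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```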

get the tweets of a user and save them in CSV

July 16, 2016
Categories: python

remove tags from HTML

July 13, 2016

Problem
You have an HTML string and you want to remove all the tags from it.

Solution
Install the package “bleach” via pip. Then:

>>> import bleach
>>> html = "Her <h1>name</h1> was <i>Jane</i>."
>>> cleaned = bleach.clean(html, tags=[], attributes={}, strip=True)
>>> html
'Her <h1>name</h1> was <i>Jane</i>.'
>>> cleaned
'Her name was Jane.'

Tip from here.
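
If installing bleach is not an option, something similar can be roughly approximated with the standard library. This is only a sketch under my own naming (`strip_tags`, `_TextOnly`): it collects the text between tags and is not a sanitizer, so for example the contents of script or style elements would be kept as text.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text nodes of a document, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return html with all tags removed (text content only)."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("Her <h1>name</h1> was <i>Jane</i>."))
```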

Categories: python

for / else and try / except / else

June 17, 2016

Problem
What is that “else” in a for loop? And that “else” in an exception handler?

Solution
They can be confusing but in this thread I found a perfect way to remember what they mean. Asdayasman suggests that we should always annotate these “else” branches:

for _ in []:
    ...
else:  # nobreak
    ...

try:
    ...
except:
    ...
else:  # noexcept
    ...

To be honest, IMO it is best to avoid for / else completely.
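
For completeness, here is a small runnable sketch of both constructs with the annotations in place. The function names `first_negative` and `parse_int` are made up for the illustration.

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            found = n
            break
    else:  # nobreak: the loop ran to the end without hitting break
        found = None
    return found


def parse_int(text):
    """Return text converted to int, or None if it is not a valid integer."""
    try:
        value = int(text)
    except ValueError:
        return None
    else:  # noexcept: runs only if the try block raised nothing
        return value


print(first_negative([3, -1, -2]), first_negative([1, 2]))
print(parse_int("42"), parse_int("forty-two"))
```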

Categories: python

email notification from a script

June 15, 2016

Problem
You want to send an email to yourself from a script.

Solution
You can find here how to do it from a Bash script. That solution uses the mailx command.

Here is a simple Python wrapper for the mailx command:

#!/usr/bin/env python3
# coding: utf8

import os

DEBUG = True
# DEBUG = False

class NoSubjectError(Exception):
    pass

class NoRecipientError(Exception):
    pass

def send_email(to='', subject='', body=''):
    if not subject:
        raise NoSubjectError
    if not to:
        raise NoRecipientError
    #
    if not body:
        cmd = """mailx -s "{s}" < /dev/null "{to}" 2>/dev/null""".format(
            s=subject, to=to
        )
    else:
        cmd = """echo "{b}" | mailx -s "{s}" "{to}" 2>/dev/null""".format(
            b=body, s=subject, to=to
        )
    if DEBUG:
        print("#", cmd)
    #
    os.system(cmd)

def main():
    send_email(to="[email protected]",
               subject="subject")
    #
    send_email(to="[email protected]",
               subject="subject",
               body='this is the body of the email')

#############################################################################

if __name__ == "__main__":
    main()

You can also find this code as a gist.
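
One caveat with the wrapper above: because the command is assembled as a shell string, a subject or body containing double quotes or other shell metacharacters can break it. A possible variant, sketched here as my own assumption rather than what the gist does, passes the arguments as a list to subprocess and feeds the body through stdin. The helper name `build_mailx_command` is hypothetical, and ValueError stands in for the custom exception classes.

```python
import subprocess


def build_mailx_command(to, subject):
    """Return the mailx argument list; no shell is involved, so no quoting issues."""
    if not subject:
        raise ValueError("subject is required")
    if not to:
        raise ValueError("recipient is required")
    return ["mailx", "-s", subject, to]


def send_email(to, subject, body=""):
    """Send an email via mailx and return its exit code.

    The body is written to mailx's stdin; an empty body sends an empty message.
    """
    cmd = build_mailx_command(to, subject)
    completed = subprocess.run(
        cmd, input=body, text=True, stderr=subprocess.DEVNULL
    )
    return completed.returncode


# Example call (commented out so that nothing is sent by accident):
# send_email("someone@example.com", 'subject with "quotes"', "this is the body")
print(build_mailx_command("someone@example.com", 'subject with "quotes"'))
```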

Categories: python

a simple GUI pomodoro timer

For managing my TODO lists, I use a sheet of paper where I make a list of tasks to do. A few days ago I started to use the pomodoro technique, which helps a lot to actually DO those tasks :)

As I don’t have a tomato-shaped kitchen timer (yet!), I wrote a simple GUI timer that you can find on GitHub.

Update (20160608)
Here is an online timer: http://www.timeanddate.com/timer/. It can play a sound, and you can also launch several timers if you want. Thanks Jeszy for the link.
