untangle | Python Adventures

Check Gmail for new messages

September 8, 2012 Jabba Laci 2 comments

Problem
I want to check my new Gmail messages periodically. When I get a message from a specific sender (with a specific Subject), I want to trigger some action. How to do that?

Solution
Fortunately, there is an Atom feed of unread Gmail messages at https://mail.google.com/mail/feed/atom. All you have to do is visit this page, send your login credentials, fetch the feed, and process it.

import urllib2

FEED_URL = 'https://mail.google.com/mail/feed/atom'

def get_unread_msgs(user, passwd):
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='{user}@gmail.com'.format(user=user),
        passwd=passwd
    )
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    feed = urllib2.urlopen(FEED_URL)
    return feed.read()

##########

if __name__ == "__main__":
    import getpass

    user = raw_input('Username: ')
    passwd = getpass.getpass('Password: ')
    print get_unread_msgs(user, passwd)
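The listing above is Python 2 (urllib2, raw_input, the print statement). Here is a rough Python 3 sketch of the same idea using urllib.request from the standard library. The helper name build_feed_opener is my own, and the sketch assumes the feed still answers HTTP Basic authentication with the realm 'New mail feed'; I have not verified that against the current Gmail service.

```python
import urllib.request

FEED_URL = 'https://mail.google.com/mail/feed/atom'


def build_feed_opener(user, passwd):
    """Build an opener that can answer HTTP Basic auth challenges for the feed.

    Assumption: the realm and base URI are the same as in the Python 2 version.
    """
    auth_handler = urllib.request.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='{user}@gmail.com'.format(user=user),
        passwd=passwd,
    )
    return urllib.request.build_opener(auth_handler)


def get_unread_msgs(user, passwd):
    """Fetch the feed and return its raw XML as bytes."""
    opener = build_feed_opener(user, passwd)
    with opener.open(FEED_URL) as feed:
        return feed.read()
```

Unlike the original, this does not call install_opener, so the credentials stay attached to this one opener instead of becoming the process-wide default.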

For reading XML I use the untangle module:

import untangle    # sudo pip install untangle

xml = get_unread_msgs(USER, PASSWORD)
o = untangle.parse(xml)
try:
    for e in o.feed.entry:
        title = e.title.cdata
        print title
except IndexError:
    pass    # no new mail
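The snippet above only prints the titles. To react to a specific sender and Subject, as described in the Problem, something like the following sketch could work. It uses the standard library's xml.etree.ElementTree instead of untangle so that it runs without extra installs. SAMPLE_FEED is a made-up document, and its layout (entry/title and entry/author/email) is my assumption about what the feed looks like, so check it against a real response first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real feed layout is assumed, not verified.
SAMPLE_FEED = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>Gmail - Inbox</title>
  <entry>
    <title>Backup finished</title>
    <author><name>Server</name><email>server@example.com</email></author>
  </entry>
  <entry>
    <title>Lunch?</title>
    <author><name>Friend</name><email>friend@example.com</email></author>
  </entry>
</feed>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def find_child(element, name):
    """Return the first direct child with the given local name, or None."""
    for child in element:
        if local_name(child.tag) == name:
            return child
    return None


def matching_titles(xml_text, sender, subject):
    """Return the titles of entries from `sender` whose title contains `subject`."""
    root = ET.fromstring(xml_text)
    titles = []
    for entry in root:
        if local_name(entry.tag) != 'entry':
            continue
        title = find_child(entry, 'title')
        author = find_child(entry, 'author')
        email = find_child(author, 'email') if author is not None else None
        title_text = title.text if title is not None and title.text else ''
        email_text = email.text if email is not None and email.text else ''
        if email_text == sender and subject in title_text:
            titles.append(title_text)
    return titles
```

With the output of get_unread_msgs as xml_text, the action can then be triggered once for every title that matching_titles returns; an empty list simply means there is nothing to do.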

Links

How to auto log into gmail atom feed with Python? (I took the script from here)
Read XML painlessly (the untangle module)

Categories: python Tags: atom, gmail, rss, untangle, xml

Upload an image to imgur.com from Python

November 6, 2011 Jabba Laci

If you are familiar with reddit, you must have noticed that most images are hosted on imgur. I would like to upload several images from my computer and I want to collect their URLs on imgur. Let’s see how to do that.

Imgur has an API, and this is what we’ll use. Anonymous upload is fine for my needs. For this you need to register to get an API key. Under the examples there is a very simple piece of Python code. When you execute it, pycurl prints the server’s XML response to the standard output. How can we store that in a variable instead? We want to extract some data from that XML.

Here is an extended version of the uploader script:

#!/usr/bin/env python

import pycurl
import cStringIO
import untangle    # XML parser

your_api_key = 'YOUR_API_KEY'    # placeholder: put the key you got after registering here

def upload_from_computer(image):
    response = cStringIO.StringIO()   # XML response is stored here
    
    c = pycurl.Curl()
    
    values = [
              ("key", your_api_key),
              ("image", (c.FORM_FILE, image))]
    # OR:     ("image", "http://example.com/example.jpg")]
    # OR:     ("image", "YOUR_BASE64_ENCODED_IMAGE_DATA")]
    
    c.setopt(c.URL, "http://api.imgur.com/2/upload.xml")
    c.setopt(c.HTTPPOST, values)
    c.setopt(c.WRITEFUNCTION, response.write)   # put the server's output in here
    c.perform()
    c.close()
    
    return response.getvalue()

def process(xml):
    o = untangle.parse(xml)
    url = o.upload.links.original.cdata
    delete_page = o.upload.links.delete_page.cdata
    
    print 'url:        ', url
    print 'delete page:', delete_page

#############################################################################

if __name__ == "__main__":
    img = '/tmp/something.jpg'
    xml = upload_from_computer(img)
    process(xml)

The tip for storing the XML output in a variable is from here. Untangle is a lightweight XML parser; more info here.
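The script above uploads a single image, while the goal was to upload several and collect their URLs. A minimal sketch of that loop follows. The uploader is passed in as a parameter, so upload_from_computer can be plugged in directly; fake_upload is a hypothetical stand-in that lets you try the loop offline, and its response contains only the two fields that process() reads. The parsing uses the standard library's xml.etree.ElementTree rather than untangle, purely to keep the sketch dependency-free.

```python
import xml.etree.ElementTree as ET


def extract_original_url(xml_text):
    """Read upload/links/original from a response shaped like the one process() expects."""
    root = ET.fromstring(xml_text)          # root is the <upload> element
    return root.findtext('links/original')


def collect_urls(images, upload):
    """Upload every image with `upload` and return a dict: local path -> imgur URL."""
    urls = {}
    for image in images:
        urls[image] = extract_original_url(upload(image))
    return urls


def fake_upload(image):
    """Hypothetical offline uploader; returns a made-up response for testing the loop."""
    name = image.rsplit('/', 1)[-1]
    return ('<upload><links>'
            '<original>http://i.example.com/{0}</original>'
            '<delete_page>http://example.com/delete/{0}</delete_page>'
            '</links></upload>').format(name)
```

For a real run, call collect_urls(list_of_paths, upload_from_computer) and print or save the resulting dictionary.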

Categories: python Tags: imgur, imgur api, pycurl, reddit, untangle, upload image

Read XML painlessly

October 30, 2011 Jabba Laci 3 comments

Problem
I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries but I didn’t like any of them. Is there a simple, brain-friendly way for this? After all, it’s Python, so everything should be simple.

Solution
Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s in PyPI, so installation is very easy:

sudo pip install untangle

For some examples, visit the project page.

Use Case
Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.

#!/usr/bin/env python

import untangle

#XML = 'examples/planet_python.xml'     # can read a file too
XML = 'http://planet.python.org/rss20.xml'

o = untangle.parse(XML)
for item in o.rss.channel.item:
    title = item.title.cdata
    link = item.link.cdata
    if link:
        print title
        print '   ', link

It couldn’t be any simpler :)

Limitations
According to Chris, untangle doesn’t support documents with namespaces (yet).

Related posts

Write XML

Alternatives (update 20111031)
Here are some alternatives (thanks reddit).

Python and XML (overview)
lxml
amara [official tutorial]
xmltodict (converts XML to dict; added on 20141229)

lxml and amara are heavyweight solutions built upon C libraries, so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice for reading a small and simple XML file.
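For comparison, here is a sketch of the Planet Python use case with xml.etree.ElementTree, which ships with Python and needs no installation. It is not one of the libraries listed above, just a baseline to compare against. SAMPLE_RSS is a made-up feed that mimics the rss/channel/item layout used in the untangle example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the same rss/channel/item layout as the untangle example.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample planet</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Post without a link</title>
      <link></link>
    </item>
  </channel>
</rss>"""


def titles_and_links(xml_text):
    """Return (title, link) pairs for every item that has a non-empty link."""
    root = ET.fromstring(xml_text)          # root is the <rss> element
    result = []
    for item in root.findall('channel/item'):
        title = item.findtext('title', default='')
        link = item.findtext('link', default='')
        if link:
            result.append((title, link))
    return result
```

The path strings do the job of untangle's attribute access (o.rss.channel.item), at the cost of being a little less readable.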

Categories: python Tags: amara, lxml, read xml, untangle, xml, xml library
