Archive
BeautifulSoup with CssSelect? Yes!
(20131215) This post is out-of-date. BeautifulSoup 4 has built-in support for CSS selectors. Check out this post.
A few days ago I started to explore lxml (it’s been on my list for a long time) and I really like its CSS selector. As I used BeautifulSoup a lot in the past, I wondered if it were possible to add this functionality to BS. I made a quick search on Google and here is what I found: https://code.google.com/p/soupselect/.
“A single function, select(soup, selector), that can be used to select items from a BeautifulSoup instance using CSS selector syntax. Currently supports type selectors, class selectors, id selectors, attribute selectors and the descendant combinator.”
Just what I needed :) You can also patch BS and integrate this new functionality:
>>> from BeautifulSoup import BeautifulSoup as Soup
>>> import soupselect; soupselect.monkeypatch()
>>> import urllib
>>> soup = Soup(urllib.urlopen('http://slashdot.org/'))
>>> soup.findSelect('div.title h3')
[</pre>
<h3>...
