Showing posts with label HTML5. Show all posts

Tuesday, February 10, 2015

Recursively dumping the structure of an HTML5 document

By Vasudev Ram





A while ago I wrote this post,

The html5lib Python library (and Animatron :-)

which shows basic usage of html5lib, a Python library that lets you parse HTML5 documents and then walk through their structure.

That post uses this HTML5 document as input for the program shown in it:


Yesterday I modified the program (test_html5lib.py) shown in that earlier post to make it recursive, which also simplifies it. Here is the code for the resulting program, html5_dump.py:
# Demo program to show how to dump the structure of 
# an HTML5 document to text, using html5lib.
# Author: Vasudev Ram.
# Copyright 2015 Vasudev Ram - http://www.dancingbison.com

import html5lib

# Define a function to dump HTML5 element info recursively, 
# given a top-level element.
def print_element(elem, indent, level):
    for sub_elem in elem:
        print "{}{}".format(indent * level, sub_elem)
        # Recursive call to print_element().
        print_element(sub_elem, indent, level + 1)

# Parse the HTML document, closing the file once parsing is done.
with open("html5doc.html") as f:
    tree = html5lib.parse(f)
indent = '----'
level = 0
print_element(tree, indent, level)
I ran the program with:
$ py html5_dump.py

where py in the command refers to the Python Launcher for Windows.

Here is the program output, which is basically the same as that of the previous version, but produced using recursion:
<Element u'{http://www.w3.org/1999/xhtml}head' at 0x02978938>
<Element u'{http://www.w3.org/1999/xhtml}body' at 0x02978968>
----<Element u'{http://www.w3.org/1999/xhtml}header' at 0x02978980>
--------<Element u'{http://www.w3.org/1999/xhtml}h1' at 0x02978920>
--------<Element u'{http://www.w3.org/1999/xhtml}h2' at 0x02978B00>
--------<Element u'{http://www.w3.org/1999/xhtml}h3' at 0x02978AB8>
----<Element u'{http://www.w3.org/1999/xhtml}p' at 0x02978AE8>
----<Element u'{http://www.w3.org/2000/svg}svg' at 0x02978788>
--------<Element u'{http://www.w3.org/2000/svg}defs' at 0x02A12050>
--------<Element u'{http://www.w3.org/2000/svg}rect' at 0x02A12020>
--------<Element u'{http://www.w3.org/2000/svg}text' at 0x02A12068>
----<Element u'{http://www.w3.org/1999/xhtml}footer' at 0x02A12080>
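
The element names in this output are in the "{namespace}local-name" form that ElementTree uses for namespaced tags. As a minimal sketch (split_tag is a hypothetical helper, not part of html5lib or of the program above), the two parts can be separated with plain string handling:

```python
def split_tag(tag):
    """Split an ElementTree tag of the form '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        # Drop the leading '{', then cut at the first '}'.
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    # Tags without a namespace are returned with an empty namespace.
    return "", tag

# Tag strings copied from the output above.
print(split_tag("{http://www.w3.org/1999/xhtml}header"))
print(split_tag("{http://www.w3.org/2000/svg}rect"))
```

Printing only the local part (for example, split_tag(sub_elem.tag)[1]) would make a dump shorter, at the cost of hiding whether an element is HTML or SVG.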

The recursion helps in two ways: 1) it prints sub-elements at any depth, and 2) we do not have to keep track of the indentation level ourselves, because the Python interpreter's handling of nested calls, and of returning from them, takes care of that for us. See the line:
print_element(sub_elem, indent, level + 1)
However, for very deeply nested documents, we have to keep Python's recursion limit in mind (by default, CPython allows about 1000 nested calls).
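
One way to sidestep the recursion limit is to keep our own stack instead of relying on nested calls. The sketch below is a variation under stated assumptions, not the program from this post: dump_lines is a hypothetical name, it collects tag names rather than printing Element reprs, and it builds a small stand-in tree with the standard library's xml.etree.ElementTree so that it runs without html5lib or an input file. html5lib's default tree builder produces the same kind of Element objects, so the same function should work on its output.

```python
import xml.etree.ElementTree as ET

def dump_lines(root, indent="----"):
    """Return one line per descendant of root, indented by depth, without recursion."""
    lines = []
    # Explicit stack of (element, level) pairs. Children are pushed in
    # reverse so that they are popped in document order.
    stack = [(child, 0) for child in reversed(list(root))]
    while stack:
        elem, level = stack.pop()
        lines.append("{}{}".format(indent * level, elem.tag))
        stack.extend((child, level + 1) for child in reversed(list(elem)))
    return lines

# Stand-in tree (an assumption; not the html5doc.html used in the post).
root = ET.fromstring(
    "<html><head/><body><header><h1/><h2/></header><p/></body></html>"
)
for line in dump_lines(root):
    print(line)
```

The trade-off is that the level has to be carried along with each element, which is exactly the bookkeeping the recursive version gets for free.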

Enjoy.

- Vasudev Ram - online Python trainer and freelance programmer


Saturday, March 8, 2014

The html5lib Python library (and Animatron :-)

By Vasudev Ram



I came across the html5lib Python library recently. The site describes it thusly:

"html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers."

So it doesn't say explicitly that it is for parsing HTML5, though the library's name includes "5". But I tried it out on a simple HTML5 document, and it seems to be able to parse HTML5, at least the few HTML5 elements I tried it on.

Here's the code I used to try out html5lib:
# test_html5lib.py
# A program to try out the html5lib Python library.
# Author: Vasudev Ram - www.dancingbison.com
import html5lib

f = open("html5doc.html")
tree = html5lib.parse(f)
print "tree:"
print repr(tree)
print
print "items in tree:"

for item in tree:
    print item
    for item2 in item:
        print "-" * 4, item2
        for item3 in item2:
            print "-" * 8, item3
And here is the output of running python test_html5lib.py:

<Element u'{http://www.w3.org/1999/xhtml}head' at 0x02B663C8>
<Element u'{http://www.w3.org/1999/xhtml}body' at 0x02B66488>
---- <Element u'{http://www.w3.org/1999/xhtml}header' at 0x02B664B8>
-------- <Element u'{http://www.w3.org/1999/xhtml}h1' at 0x02B66530>
-------- <Element u'{http://www.w3.org/1999/xhtml}h2' at 0x02B664E8>
-------- <Element u'{http://www.w3.org/1999/xhtml}h3' at 0x02B665F0>
---- <Element u'{http://www.w3.org/1999/xhtml}p' at 0x02B66650>
---- <Element u'{http://www.w3.org/2000/svg}svg' at 0x02B66BC0>
-------- <Element u'{http://www.w3.org/2000/svg}defs' at 0x02B66B60>
-------- <Element u'{http://www.w3.org/2000/svg}rect' at 0x02B66B30>
-------- <Element u'{http://www.w3.org/2000/svg}text' at 0x02B66BD8>
---- <Element u'{http://www.w3.org/1999/xhtml}footer' at 0x02B66BF0>
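
The three nested loops above print at most three levels of elements below the root. A quick way to check whether that is enough for a given document is to measure how deep its tree goes. This is a sketch only: max_depth is a hypothetical helper, and the tree is a stand-in built with the standard library's xml.etree.ElementTree rather than parsed from html5doc.html.

```python
import xml.etree.ElementTree as ET

def max_depth(elem):
    """Return how many levels of elements lie below elem (0 if it has no children)."""
    depths = [max_depth(child) for child in elem]
    return 1 + max(depths) if depths else 0

# Stand-in tree (an assumption; not the document used in the post).
root = ET.fromstring(
    "<html><head/><body><header><h1/></header><p/></body></html>"
)
print(max_depth(root))  # 3: body, then header, then h1
```

If the result is greater than 3, the nested loops would silently skip the deeper elements.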

Here is the documentation for html5lib.

And speaking of HTML5, coincidentally, I came across Animatron via Hacker News, today:



Animatron is "a simple and powerful online tool that allows you to create stunning HTML5 animations and interactive content." Animatron is not really related to html5lib, except for the fact that both of them are about HTML5, but it looks cool. Check it out.

Hacker News thread about Animatron.

Enjoy.


- Vasudev Ram - Dancing Bison Enterprises


Friday, September 21, 2012

PyMob to create mobile apps in Python

By Vasudev Ram


PyMob™ is a technology which allows developers to create mobile apps in Python.

It claims to support Android, iOS, HTML5 and Windows 8 as targets.

- Vasudev Ram - Dancing Bison Enterprises


Wednesday, July 25, 2012

Tuesday, July 3, 2012

Mozilla to launch HTML5 based mobile OS with broad telecom industry support

Mozilla Gains Global Support For a Firefox Mobile OS | The Mozilla Blog

Interesting indeed. If it works out, it should spur more competition, more choice for users, and more innovation.

- Vasudev Ram
www.dancingbison.com

Monday, March 26, 2012

Rich-layout ebooks with EPUB3, HTML5 and CSS3


http://www.ibm.com/developerworks/library/x-richlayoutepub/index.html

The article is by Liza Daly, VP Engineering, Safari Books Online. It also links to an earlier EPUB tutorial by Liza, which uses Java and Python.

- Vasudev Ram
www.dancingbison.com
twitter.com/vasudevram

Wednesday, August 10, 2011

Amazon announces HTML5-based Kindle Cloud Reader

By Vasudev Ram - dancingbison.com | @vasudevram | jugad2.blogspot.com

Amazon has released a Kindle Cloud Reader "written from the ground up in HTML5", according to PCMag.com. It can automatically sync with other Kindle apps, allowing you to start reading on the Web and continue on other devices. You can also read offline.

See here:

http://goo.gl/HP47w

or here:

http://www.pcmag.com/article2/0,2817,2390784,00.asp

(Both of the above links go to the same page at PCMag.com.)

You can access Amazon Cloud Reader here: https://read.amazon.com/about

Posted via email
- Vasudev Ram