Showing posts with label HTML. Show all posts

Monday, September 29, 2014

CommonMark, a pure Python Markdown parser and renderer


By Vasudev Ram

I got to know about CommonMark.org via this post on the Python Reddit:

CommonMark.py - pure Python Markdown parser and renderer

From what I could gather, CommonMark is, or aims to be, two things:

1. "A standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests".

2. A Python parser and renderer for the CommonMark Markdown spec.

CommonMark on PyPI, the Python Package Index.

Excerpts from the CommonMark.org site:

[ We propose a standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests to validate Markdown implementations against this specification. We believe this is necessary, even essential, for the future of Markdown. ]

[ Who are you?
We're a group of Markdown fans who either work at companies with industrial scale deployments of Markdown, have written Markdown parsers, have extensive experience supporting Markdown with end users – or all of the above.

John MacFarlane
David Greenspan
Vicent Marti
Neil Williams
Benjamin Dumke-von der Ehe
Jeff Atwood ]

So I installed the Python library for it with:
pip install commonmark
Then modified this snippet of example code from the CommonMark PyPI site:
import CommonMark
parser = CommonMark.DocParser()
renderer = CommonMark.HTMLRenderer()
print(renderer.render(parser.parse("Hello *World*")))
on my local machine, to add a few more types of Markdown syntax:
import CommonMark
parser = CommonMark.DocParser()
renderer = CommonMark.HTMLRenderer()
markdown_string = \
"""
Heading
=======
 
Sub-heading
-----------
 
# Atx-style H1 heading.
## Atx-style H2 heading.
### Atx-style H3 heading.
#### Atx-style H4 heading.
##### Atx-style H5 heading.
###### Atx-style H6 heading.
 
Paragraphs are separated
by a blank line.
 
Leave 2 spaces at the end of a line to do a  
line break
 
Text attributes *italic*, **bold**, `monospace`.
 
A [link](http://example.com).
 
Shopping list:
 
  * apples
  * oranges
  * pears
 
Numbered list:
 
  1. apples
  2. oranges
  3. pears
 
"""
print(renderer.render(parser.parse(markdown_string)))
Here is a screenshot of the output HTML generated by CommonMark, loaded in Google Chrome:


Reddit user bracewel, who seems to be a CommonMark team member, said on the Py Reddit thread:

eventually we'd like to add a few more renderers, PDF/RTF being the first....

So CommonMark looks interesting and worth keeping an eye on, IMO.

- Vasudev Ram - Dancing Bison Enterprises - Python training and consulting

Dancing Bison - Contact Page

Saturday, March 8, 2014

The html5lib Python library (and Animatron :-)

By Vasudev Ram



I came across the html5lib Python library recently. The site describes it thusly:

"html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers."

So it doesn't say explicitly that it is for parsing HTML5, though the library's name includes a "5". But I tried it out on a simple HTML5 document, and it seems to be able to parse HTML5, at least the few HTML5 elements I tried it on.

Here's the code I used to try out html5lib:
# test_html5lib.py
# A program to try out the html5lib Python library.
# Author: Vasudev Ram - www.dancingbison.com
import html5lib

# Use a with-block so the file is closed once parsing is done.
with open("html5doc.html") as f:
    tree = html5lib.parse(f)
print "tree:"
print repr(tree)
print
print "items in tree:"

for item in tree:
    print item
    for item2 in item:
        print "-" * 4, item2
        for item3 in item2:
            print "-" * 8, item3
And here is the output of running python test_html5lib.py:

<Element u'{http://www.w3.org/1999/xhtml}head' at 0x02B663C8>
<Element u'{http://www.w3.org/1999/xhtml}body' at 0x02B66488>
---- <Element u'{http://www.w3.org/1999/xhtml}header' at 0x02B664B8>
-------- <Element u'{http://www.w3.org/1999/xhtml}h1' at 0x02B66530>
-------- <Element u'{http://www.w3.org/1999/xhtml}h2' at 0x02B664E8>
-------- <Element u'{http://www.w3.org/1999/xhtml}h3' at 0x02B665F0>
---- <Element u'{http://www.w3.org/1999/xhtml}p' at 0x02B66650>
---- <Element u'{http://www.w3.org/2000/svg}svg' at 0x02B66BC0>
-------- <Element u'{http://www.w3.org/2000/svg}defs' at 0x02B66B60>
-------- <Element u'{http://www.w3.org/2000/svg}rect' at 0x02B66B30>
-------- <Element u'{http://www.w3.org/2000/svg}text' at 0x02B66BD8>
---- <Element u'{http://www.w3.org/1999/xhtml}footer' at 0x02B66BF0>
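The three nested loops in the program only reach three levels of the tree. A recursive walk handles any depth. Here is a small sketch of that idea. It uses the standard library's xml.etree.ElementTree on a made-up XHTML string as a stand-in, on the assumption that html5lib's default tree builder returns ElementTree-style elements (which the output above suggests). The names walk and local_name are my own, not part of html5lib.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in document; html5lib.parse() would build a similar tree from real HTML.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><header><h1>Title</h1></header><p>Text</p><footer/></body>"
    "</html>"
)


def walk(element, depth=0):
    """Yield (depth, tag) for the element and all of its descendants."""
    yield depth, element.tag
    for child in element:
        for item in walk(child, depth + 1):
            yield item


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XHTML)
# Indent each tag name with four dashes per level, as in the program above.
lines = [("-" * (4 * depth) + " " + local_name(tag)).strip()
         for depth, tag in walk(root)]
print("\n".join(lines))
```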

Here is the documentation for html5lib.

And speaking of HTML5, coincidentally, I came across Animatron via Hacker News, today:



Animatron is "a simple and powerful online tool that allows you to create stunning HTML5 animations and interactive content." Animatron is not really related to html5lib, except for the fact that both of them are about HTML5, but it looks cool. Check it out.

Hacker News thread about Animatron.

Enjoy.


- Vasudev Ram - Dancing Bison Enterprises


Friday, November 22, 2013

Errata for recent post "Publish Microsoft Excel XLSX data to HTML with openpyxl"



By Vasudev Ram

Dear readers, my recent post, Publish Microsoft Excel XLSX data to HTML with openpyxl, had some errors in the HTML markup in its code listing. They had to do with missing or wrongly typed HTML entities, HTML elements, or quotes.

My apologies for the inconvenience caused.

I've now posted the corrected code below:

# XLSXtoHTML.py

# Program to convert the data from an XLSX file to HTML.
# Uses the openpyxl library.

# Author: Vasudev Ram - http://www.dancingbison.com

import openpyxl
from openpyxl import load_workbook

workbook = load_workbook('fruits.xlsx')
worksheet = workbook.get_active_sheet()

html_data = """
<html>
    <head>
        <title>
        XLSX to HTML demo
        </title>
    </head>
    <body>
        <h3>
        XLSX to HTML demo
        </h3>
        <table>
"""

ws_range = worksheet.range('A1:H13')
for row in ws_range:
    html_data += "<tr>"
    for cell in row:
        if cell.value is None:
            html_data += "<td>" + ' ' + "</td>"
        else:
            html_data += "<td>" + str(cell.value) + "</td>"
    html_data += "</tr>"
html_data += "</table></body></html>"

with open("fruits.html", "w") as html_fil:
    html_fil.write(html_data)

# EOF
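A related point: the program writes cell values into the markup as they are, so a value containing <, > or & could break the generated HTML. Below is a minimal sketch of one way to guard against that, using html.escape from the Python 3 standard library. The rows list is made-up data standing in for the worksheet's cell values, and render_table is a hypothetical helper name, not part of openpyxl. Unlike the program above, it leaves empty cells blank rather than writing a space.

```python
from html import escape

# Made-up rows standing in for the worksheet's cell values.
rows = [
    ["Fruit", "Note"],
    ["apples", "size < 5 & red"],
    ["pears", None],
]


def render_table(rows):
    """Build an HTML table, escaping text so '<', '>' and '&' show literally."""
    parts = ["<table>"]
    for row in rows:
        cells = "".join(
            "<td>" + escape("" if value is None else str(value)) + "</td>"
            for value in row
        )
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


table_html = render_table(rows)
print(table_html)
```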


- Vasudev Ram - Dancing Bison Enterprises



Thursday, November 21, 2013

Publish Microsoft Excel XLSX data to HTML with openpyxl


By Vasudev Ram

A while ago, I came across openpyxl, a library by Eric Gazoni for reading and writing Microsoft Excel XLSX files (Office Open XML).

So today I wrote a demo program that reads the data from an XLSX file using openpyxl and writes that data to HTML as a table. Here is a screenshot of the sample XLSX file used, fruits.xlsx:


Here is the program, XLSXtoHTMLdemo.py:
# XLSXtoHTMLdemo.py

# Program to convert the data from an XLSX file to HTML.
# Uses the openpyxl library.

# Author: Vasudev Ram - http://www.dancingbison.com

import openpyxl
from openpyxl import load_workbook

workbook = load_workbook('fruits.xlsx')
worksheet = workbook.get_active_sheet()

html_data = """
<html>
    <head>
        <title>
        XLSX to HTML demo
        </title>
    </head>
    <body>
        <h3>
        XLSX to HTML demo
        </h3>
        <table>
"""

ws_range = worksheet.range('A1:H13')
for row in ws_range:
    html_data += "<tr>"
    for cell in row:
        if cell.value is None:
            html_data += "<td>" + ' ' + "</td>"
        else:
            html_data += "<td>" + str(cell.value) + "</td>"
    html_data += "</tr>"
html_data += "</table></body></html>"

with open("fruits.html", "w") as html_fil:
    html_fil.write(html_data)

# EOF
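The listing above targets the openpyxl release that was current when this post was written. In more recent releases, as far as I know, the active sheet is exposed as the workbook.active property, and a range of cells is read by indexing the worksheet with the range string. Here is a minimal sketch under that assumption. It builds a tiny in-memory workbook with made-up data, so it does not need fruits.xlsx.

```python
from openpyxl import Workbook

# Build a tiny in-memory workbook as a stand-in for fruits.xlsx (made-up data).
workbook = Workbook()
worksheet = workbook.active  # a property, rather than a get_active_sheet() call
worksheet.append(["Fruit", "Quantity"])
worksheet.append(["apples", 3])

# Indexing the worksheet with a range string yields rows of cells.
values = [[cell.value for cell in row] for row in worksheet["A1:B2"]]
print(values)
```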

You can run the program with:
python XLSXtoHTMLdemo.py
Then the program's HTML output will be in the file fruits.html, a screenshot of which is below:


- Enjoy.

- Vasudev Ram - Python, C, Linux, databases, open source - training and consulting.


Sunday, December 16, 2012

D3.js looks interesting

D3.js - Data-Driven Documents

I have been seeing the D3.js JavaScript library mentioned on tech sites for some days. Took a look at it today. Seems interesting and useful.

It uses a functional style, at least in parts, which can result in shorter and clearer code, something like the difference between SQL and earlier C-ISAM based approaches to data handling.
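To give a rough feel for that difference in style, here is a small sketch in Python. It is not D3 code. The data values and the bar helper are made up for illustration. Both versions turn a list of numbers into SVG rect elements: the first manages the index and the accumulation by hand, while the second states how one datum maps to one element and applies that to all of them.

```python
data = [4, 8, 15, 16, 23, 42]

# Imperative style: walk the data by index and build each element by hand.
bars_imperative = []
i = 0
while i < len(data):
    bars_imperative.append(
        '<rect x="%d" width="18" height="%d"/>' % (i * 20, data[i])
    )
    i += 1


# Functional style: describe how one datum maps to one element,
# then apply that mapping to the whole data set.
def bar(index, value):
    return '<rect x="%d" width="18" height="%d"/>' % (index * 20, value)


bars_functional = [bar(index, value) for index, value in enumerate(data)]

print(bars_functional[0])
```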

D3.js is an abstraction over the DOM but still gives you access to the underlying HTML, CSS and SVG elements.

The D3 Gallery:

https://github.com/mbostock/d3/wiki/Gallery

- Vasudev Ram
www.dancingbison.com

Wednesday, October 24, 2012

epubmaker, a Project Gutenberg tool to convert HTML or reST to EPUB, Kindle, PDF

By Vasudev Ram


epubmaker is a Project Gutenberg tool to convert HTML or reStructuredText to EPUB, Kindle, and PDF formats.

- Vasudev Ram - Dancing Bison Enterprises


Thursday, September 27, 2012

Docverter HTTP API converts marked-up docs to PDF, Docx, RTF or ePub


Docverter is an HTTP API that converts marked-up docs to PDF, Docx, RTF or ePub (and other formats, both input and output).

Docverter is a paid service.

It uses pandoc, the swiss-army-knife format conversion tool (open source), which I've blogged about a couple of times before.

UPDATE: The Docverter service is not available yet; it is in closed beta. When you try to sign up, you see a form to enter your email address so they can inform you when it is open for use. They are using the model of gauging user interest and collecting email addresses of interested people, as some other startups are doing nowadays. But in this case, the creator says that he already has some working code, which just needs some improvement before users are let in. There is an interesting thread about it on Hacker News, where the creator, HN user zrail, also participates, answering questions about the service, including why it is a paid service when pandoc is free.

Inspired by nature.
- dancingbison.com | @vasudevram | jugad2.blogspot.com

Monday, September 17, 2012

Pdf2htmlEX, an open source PDF to HTML converter


Pdf2htmlEX, a PDF to HTML converter

Though it has a focus on converting PDFs with mathematical content, it works for regular PDFs too, says the site.

The HN thread about pdf2htmlEX has many comments, some of them interesting.

Inspired by nature.
- dancingbison.com | @vasudevram | jugad2.blogspot.com