Miki> Hello All, I'm looking for a PDF parser. Any pointers?
A little more info would be helpful: do you need access to all the pdf
structures or just the text? AFAIK, there is no full pdf parser in
python. The subject has come up several times before, so check the
Google Groups archives.
"John Hunter" wrote:
> A little more info would be helpful: do you need access to all the pdf
> structures or just the text? AFAIK, there is no full pdf parser in
> python.
If you need access to the graphical elements, you can use pstoedit to
convert the PDF into SVG (Scalable Vector Graphics). Since SVG is XML, you
can then use any Python-based XML toolkit to parse the data.
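
For what it's worth, here is a rough sketch of the XML side using the standard library's xml.etree.ElementTree. It assumes the conversion to SVG has already been done; SAMPLE_SVG is a made-up stand-in for the converter's output, and the helper names count_elements and path_data are just for illustration. Real output will be larger and may use different elements.

```python
import xml.etree.ElementTree as ET

# SVG elements live in this XML namespace, so ElementTree reports
# tags as "{http://www.w3.org/2000/svg}path" and so on.
SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical stand-in for a converted file (not real converter output).
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <g>
    <path d="M 10 10 L 90 10 L 90 90 Z" fill="none" stroke="black"/>
    <rect x="20" y="20" width="30" height="40"/>
  </g>
  <path d="M 0 0 L 5 5"/>
</svg>"""


def count_elements(svg_text):
    """Return a dict mapping each local tag name to how often it occurs."""
    root = ET.fromstring(svg_text)
    counts = {}
    for elem in root.iter():
        # Drop the "{namespace}" prefix to get the bare element name.
        name = elem.tag.rsplit("}", 1)[-1]
        counts[name] = counts.get(name, 0) + 1
    return counts


def path_data(svg_text):
    """Return the 'd' attribute of every <path>, in document order."""
    root = ET.fromstring(svg_text)
    return [p.get("d") for p in root.iter(SVG_NS + "path")]


if __name__ == "__main__":
    print(count_elements(SAMPLE_SVG))
    for d in path_data(SAMPLE_SVG):
        print(d)
```

For a file on disk, ET.parse(filename).getroot() gives the same kind of root element. Note this only gets you the drawing primitives; interpreting the path data is a separate job.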