Parsing

  • Michael

    Parsing

    I have been assigned a project to parse a webpage for data using
    Python. I have finished only basic tutorials. Any suggestions as to
    where I should go from here? Thanks in advance.
  • Simon Bayling

    #2
    Re: Parsing

    whatsupg21@hotmail.com (Michael) wrote in
    news:e5fb8973.0307100938.13fcea56@posting.google.com:

    > I have been assigned a project to parse a webpage for data using
    > Python. I have finished only basic tutorials. Any suggestions as to
    > where I should go from here? Thanks in advance.

    Parsing? What are you looking for?
    Do you have to download the page as well?

    If it's a fairly simple thing to find, you could use something like:

    >>> import urllib.request  # urlopen lives here in Python 3
    >>> source = urllib.request.urlopen("http://www.google.com").readlines()
    >>> for line in source:
    ...     if b"logo.gif" in line:  # each line is a bytes object
    ...         print("Found google logo")

    If the data to find is more complicated, or you need to parse the HTML as
    well, you should look at more string methods, maybe regular expressions
    (import re)...
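
As a rough illustration of the regular-expression route, here is a minimal sketch that pulls the src value out of every img tag. The sample page, the pattern name IMG_SRC, and the helper find_image_sources are all made up for this example, and a regex like this only copes with simple, well-formed markup.

```python
import re

# Hypothetical sample page; in practice this would be the downloaded source.
html = (
    '<html><body>'
    '<img src="/images/logo.gif" alt="Google">'
    '<img src="/x/photo.JPG">'
    '</body></html>'
)

# Capture the quoted src attribute of each <img> tag, ignoring case.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_image_sources(page):
    """Return the src value of every <img> tag found in page."""
    return IMG_SRC.findall(page)


print(find_image_sources(html))
```

If the markup gets messier than this (unquoted attributes, tags split across lines, comments), a real HTML parser is usually the safer choice.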

    Cheers,
    Simon.


    • Peter van Kampen

      #3
      Re: Parsing

      In article <e5fb8973.0307100938.13fcea56@posting.google.com>, Michael wrote:
      > I have been assigned a project to parse a webpage for data using
      > Python. I have finished only basic tutorials. Any suggestions as to
      > where I should go from here? Thanks in advance.


      Try to be a little more specific. Parse for what? Links? Images? Tags?

      Anyway. A good start might be the HTMLParser that comes with the
      batteries since 2.2 if I remember correctly. See



      for a tiny example.
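
For a feel of how HTMLParser is used, here is a minimal sketch that collects link targets from a page. It is an illustration only, not the example referred to above: the class LinkCollector, the helper collect_links, and the sample markup are invented for it. In Python 3 the class is imported from html.parser (the module was called HTMLParser in Python 2).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser that gathers the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(page):
    """Feed page to a fresh LinkCollector and return the links it found."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


# Made-up sample markup; a downloaded page would be passed in the same way.
sample = '<p><a href="http://www.python.org">Python</a> and <A HREF="/docs">docs</A></p>'
print(collect_links(sample))
```

The same pattern works for images or any other tag: test for a different tag name in handle_starttag, or override handle_data to pick up the text between tags.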

      PterK

      --
      Peter van Kampen
      pterk -- at -- datatailors.com
