Tokenize

  • Ken Fettig

    Tokenize

    Does Python have an equivalent to the Java StringTokenizer? If so, what is
    it and how do you implement it?
    Thanks
    Ken Fettig
    kenfettig@btinet.net
    kfettig@state.nd.us



  • Alan Kennedy

    #2
    Re: Tokenize

    Ken Fettig wrote:
    > Does Python have an equivalent to the Java StringTokenizer? If so,
    > what is it and how do you implement it?

    Is this the kind of thing that you mean?

    Python 2.3b1 (#40, Apr 25 2003, 19:06:24)
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s = "This is a string to be tokenised"
    >>> s.split()
    ['This', 'is', 'a', 'string', 'to', 'be', 'tokenised']
    >>> s = "This:is:a:string:to:be:tokenised"
    >>> s.split(':')
    ['This', 'is', 'a', 'string', 'to', 'be', 'tokenised']
    >>> s.split(':', 2)
    ['This', 'is', 'a:string:to:be:tokenised']
    >>>
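
One difference from Java's StringTokenizer is worth spelling out: it treats every character in its delimiter string as a separate delimiter and never returns empty tokens, whereas `s.split(':')` splits on the whole separator string. A minimal sketch of that behaviour follows. The `tokenize` helper is a hypothetical name, not a standard library function, and the default delimiter set is assumed to mirror Java's (space, tab, newline, carriage return, form feed).

```python
import re

def tokenize(text, delims=" \t\n\r\f"):
    """Yield the tokens of text, splitting on any character in delims.

    Hypothetical helper: runs of delimiters are collapsed and empty
    tokens are skipped. delims must not be empty.
    """
    # Build a character class such as [:;\ ]+ from the delimiter characters.
    pattern = "[" + re.escape(delims) + "]+"
    for token in re.split(pattern, text):
        if token:  # drop the empty strings produced at the edges
            yield token

print(list(tokenize("This:is;a string", ":; ")))
# ['This', 'is', 'a', 'string']
```

Because it is a generator, it can also be consumed one token at a time with `next()`, which is roughly what `hasMoreTokens()` and `nextToken()` give you in Java.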

    Or maybe you have something more specific in mind?

    --
    alan kennedy
    -----------------------------------------------------
    check http headers here: http://xhaus.com/headers
    email alan: http://xhaus.com/mailto/alan


    • Andrew Dalke

      #3
      Re: Tokenize

      Ken Fettig wants a 'StringTokenizer'.

      Alan Kennedy points out
      > >>> s = "This is a string to be tokenised"
      > >>> s.split()
      > ['This', 'is', 'a', 'string', 'to', 'be', 'tokenised']
      ...
      > Or maybe you have something more specific in mind?

      Another option is the little-known 'shlex' module, part of the standard
      library.
      >>> import shlex, StringIO
      >>> infile = StringIO.StringIO("""ls -lart "have space.*" will travel""")
      >>> x = shlex.shlex(infile)
      >>> x.get_token()
      'ls'
      >>> x.get_token()
      '-'
      >>> x.get_token()
      'lart'
      >>> x.get_token()
      '"have space.*"'
      >>> x.get_token()
      'will'
      >>> x.get_token()
      'travel'
      >>> x.get_token()
      ''
      >>>

      As you can see, it treats '-' unexpectedly (compared to the shell):
      '-lart' comes back as two tokens, '-' and 'lart'. Also, with __iter__
      in newer Pythons, if this module were useful then it would be nice if
      "for token in shlex..." worked.
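
For anyone reading this with a current Python 3, here is a short sketch of the same experiment. It assumes only the standard library: `shlex.split` applies shell-like (POSIX) rules, and `shlex.shlex` objects accept a plain string and can be iterated over directly.

```python
import shlex

cmd = 'ls -lart "have space.*" will travel'

# shlex.split() uses POSIX rules: '-lart' stays whole and quotes are stripped.
print(shlex.split(cmd))
# ['ls', '-lart', 'have space.*', 'will', 'travel']

# The default (non-POSIX) lexer still splits '-' off and keeps the quotes,
# but it can now be used in a for loop.
tokens = []
for token in shlex.shlex(cmd):
    tokens.append(token)
print(tokens)
# ['ls', '-', 'lart', '"have space.*"', 'will', 'travel']
```

Which of the two fits depends on whether you want shell-style words or the lexer's finer-grained tokens.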

      Andrew
      dalke@dalkescientific.com



      • Harry George

        #4
        Re: Tokenize

        "Ken Fettig" <kfettig@state.nd.us> writes:

        > Does Python have an equivalent to the Java StringTokenizer? If so, what is
        > it and how do you implement it?
        > Thanks
        > Ken Fettig
        > kenfettig@btinet.net
        > kfettig@state.nd.us

        See shlex (in the main distribution) or one of the various lexer/parser
        tools such as Ply or Yapp.py.

        shlex is about the level of complexity you want. See the Library
        Reference Manual for instructions.
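
If shlex turns out to be too little and a full parser generator too much, a regular-expression lexer is a common middle ground. The sketch below is only an illustration under assumed requirements: the token names, the patterns in `TOKEN_SPEC`, and the `lex` function are all made up for this example and are not part of shlex or any of the tools mentioned above.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),       # whitespace, discarded
    ("MISMATCH", r"."),     # anything else is an error
]
# One alternation with a named group per token class.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def lex(text):
    """Yield (kind, value) pairs for each token in text."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError("unexpected character %r at position %d"
                             % (match.group(), match.start()))
        yield kind, match.group()

print(list(lex("x = 3 + y1")))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('NAME', 'y1')]
```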

        --
        harry.g.george@boeing.com
        6-6M31 Knowledge Management
        Phone: (425) 342-5601


        • Hartmut Goebel

          #5
          Re: Tokenize

          Andrew Dalke wrote:

          > Another option is the little-known 'shlex' module, part of the standard
          > library.
          ...
          > As you can see, it treats '-' unexpectedly (compared to the shell).

          This is why shellword was written (see
          <http://www.crazy-compilers.com/py-lib/>)

          Regards
          Hartmut Goebel
          --
          | Hartmut Goebel | We build the crazy compilers |
          | [email protected] | Compiler Manufacturer |
