email bug?

  • Stuart D. Gathman

    email bug?

    Running the following with Python 2.2.2:

    import email

    txt = """Subject: IE is Evil
    Content-Type: image/pjpeg; name="Jim&amp;&amp;Jill"

    <html>
    </html>
    """

    msg = email.message_from_string(txt)
    print msg.get_params()

    I get:
    [('image/pjpeg', ''), ('name', '"Jim&amp'), ('&amp', ''), ('Jill"', '')]

    What IE apparently gets is:

    [('image/pjpeg', ''), ('name', '"Jim&amp;&amp;Jill"')]

    Is this a bug (in the email package, I mean - obviously IE is buggy)?

    Do I have to write my own custom param parsing routines to handle this?

    --
    Stuart D. Gathman <[email protected] m>
    Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
    "Confutatis maledictis, flamis acribus addictis" - background song for
    a Microsoft sponsored "Where do you want to go from here?" commercial.
  • Andrew Dalke

    #2
    Re: email bug?

    Stuart D. Gathman:
    > Content-Type: image/pjpeg; name="Jim&amp;&amp;Jill"

    > What IE apparently gets is:
    >
    > [('image/pjpeg', ''), ('name', '"Jim&amp;&amp;Jill"')]
    >
    > Is this a bug (in the email package, I mean - obviously IE is buggy)?
    >
    > Do I have to write my own custom param parsing routines to handle this?

    BTW, I verified this in 2.3.

    Looks like the Content-Type syntax is defined in RFC 2045,

    5.1. Syntax of the Content-Type Header Field

        content := "Content-Type" ":" type "/" subtype
                   *(";" parameter)

        parameter := attribute "=" value

        value := token / quoted-string

        token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                    or tspecials>

        tspecials := "(" / ")" / "<" / ">" / "@" /
                     "," / ";" / ":" / "\" / <">
                     "/" / "[" / "]" / "?" / "="
                     ; Must be in quoted-string,
                     ; to use within parameter values

    So the ";" must be in a quoted string. That's defined in
    RFC 822, http://www.faqs.org/rfcs/rfc822.html
    (now obsolete)

        quoted-string = <"> *(qtext/quoted-pair) <">

        qtext = <any CHAR excepting <">,    ; => may be folded
                 "\" & CR, and including
                 linear-white-space>

        CHAR = <any ASCII character>

    The ';' is in CHAR and is not <">, "\", or CR, so it's in qtext,
    so it's part of quoted-string, so it's allowed in a value
    without extra interpretation.
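To make the reading of the grammar above concrete, here is a minimal sketch of a splitter that treats ';' as a separator only outside a quoted-string and skips over quoted-pairs (backslash escapes). The function name `split_params` is hypothetical and is not part of the email package; the sketch is written for Python 3 and ignores comments and header folding.

```python
def split_params(value):
    """Split a header value on ';' characters that lie outside quoted-strings.

    Hypothetical helper for illustration only; not the email package's parser.
    """
    parts = []
    current = []
    in_quotes = False   # True while between an opening and closing <">
    escaped = False     # True right after a backslash inside quotes
    for ch in value:
        if escaped:
            # second half of a quoted-pair: take the character literally
            current.append(ch)
            escaped = False
        elif ch == '\\' and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == ';' and not in_quotes:
            # a real parameter separator
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return parts


header = 'image/pjpeg; name="Jim&amp;&amp;Jill"'
print(split_params(header))
```

With the header from the original post this yields two pieces, the content type and a single `name` parameter, which is what the grammar calls for.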

    It looks like RFC 2822 (the updated version of RFC 822) at
    http://www.faqs.org/rfcs/rfc2822.html agrees.

    So I think it's a bug in the email module's parser.

    The actual bug is in email/Parser.py with

    # Regular expression used to split header parameters. BAW: this may be too
    # simple. It isn't strictly RFC 2045 (section 5.1) compliant, but it catches
    # most headers found in the wild. We may eventually need a full fledged
    # parser eventually.
    paramre = re.compile(r'\s*;\s*')

    A quick scan of the code suggests that it isn't a quick fix (e.g., it's
    not just a matter of tweaking that regexp).
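As a small illustration of why the regexp alone cannot cope, the sketch below applies the same pattern to the header value from the original post. It is written for Python 3 and uses only the `re` module; the variable names are just for this example.

```python
import re

# Same pattern as the paramre quoted above.
paramre = re.compile(r'\s*;\s*')

header = 'image/pjpeg; name="Jim&amp;&amp;Jill"'

# The pattern has no notion of quoting, so every ';' is a split point,
# including the two inside the quoted parameter value.
parts = paramre.split(header)
print(parts)
```

The quoted value is cut into three fragments, which matches the extra ('&amp', '') and ('Jill"', '') entries reported in the first post.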

    Could you file a bug report against it?

    Andrew
    dalke@dalkescientific.com



    • Stuart D. Gathman

      #3
      Re: email bug?

      On Sun, 24 Aug 2003 00:14:45 -0400, Andrew Dalke wrote:

      > A quick scan of the code suggests that it isn't a quick fix (eg, not
      > just a matter of tweaking that regexp).

      Here is my quick (and probably incorrect) fix:

      from email.Message import Message
      from email.Utils import unquote

      # helper to split params while ignoring ';' inside quotes
      def _parseparam(s):
          plist = []
          while s[:1] == ';':
              s = s[1:]
              end = s.find(';')
              # an odd number of '"' before the ';' means it is inside quotes
              while end > 0 and (s.count('"', 0, end) & 1):
                  end = s.find(';', end + 1)
              if end < 0:
                  end = len(s)
              f = s[:end]
              if '=' in f:
                  # normalize "Name = value" to "name=value"
                  i = f.index('=')
                  f = f[:i].strip().lower() + '=' + f[i+1:].strip()
              plist.append(f.strip())
              s = s[end:]
          return plist

      class MimeMessage(Message):

          def getparam(self, name, header='content-type'):
              for key, val in self.getparams(header):
                  if key == name:
                      return unquote(val)
              return None

          # like get_params but obey quotes
          def getparams(self, header='content-type'):
              "Return all parameter names and values. Use parser that handles quotes."
              val = self.get(header)
              result = []
              if val:
                  plist = _parseparam(';' + val)
                  for p in plist:
                      i = p.find('=')
                      if i >= 0:
                          result.append((p[:i].lower(), unquote(p[i+1:])))
              return result

      --
      Stuart D. Gathman <[email protected] m>
      Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
      "Confutatis maledictis, flamis acribus addictis" - background song for
      a Microsoft sponsored "Where do you want to go from here?" commercial.


      • Andrew Dalke

        #4
        Re: email bug?

        Stuart D. Gathman:
        > Here is my quick (and probably incorrect) fix:

        There's a test suite in email.tests. It includes tests for
        getparams, and I see some commented out code which
        lists a test known to fail.

        You could use that to check the validity of your code.
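As a rough idea of what such a check could look like outside the package's own suite, here is a small self-contained sketch. It assumes Python 3, so the helper from the previous post is ported under the hypothetical name `parse_params` (with the leading ';' added inside the function), and the test cases are made up for illustration rather than taken from the email tests.

```python
import unittest


def parse_params(s):
    """Python 3 port of the _parseparam helper posted earlier in this thread."""
    plist = []
    s = ';' + s
    while s[:1] == ';':
        s = s[1:]
        end = s.find(';')
        # an odd number of '"' before the ';' means it is inside quotes
        while end > 0 and (s.count('"', 0, end) & 1):
            end = s.find(';', end + 1)
        if end < 0:
            end = len(s)
        f = s[:end]
        if '=' in f:
            i = f.index('=')
            f = f[:i].strip().lower() + '=' + f[i + 1:].strip()
        plist.append(f.strip())
        s = s[end:]
    return plist


class ParseParamsTest(unittest.TestCase):
    def test_semicolon_inside_quotes(self):
        header = 'image/pjpeg; name="Jim&amp;&amp;Jill"'
        self.assertEqual(parse_params(header),
                         ['image/pjpeg', 'name="Jim&amp;&amp;Jill"'])

    def test_plain_params(self):
        header = 'text/plain; charset=us-ascii; format=flowed'
        self.assertEqual(parse_params(header),
                         ['text/plain', 'charset=us-ascii', 'format=flowed'])

    def test_name_is_lowercased(self):
        self.assertEqual(parse_params('text/plain; CharSet="UTF-8"'),
                         ['text/plain', 'charset="UTF-8"'])


# Run the cases programmatically and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseParamsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

One known gap in this sketch: counting '"' characters does not account for backslash-escaped quotes inside a quoted-string, so a case covering that would be a useful next test to add.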

        Andrew
        dalke@dalkescientific.com

