Unicode question

This topic is closed.
  • Gerhard Häring

    Unicode question

    >>> u"äöü"
    u'\x84\x94\x81'

    (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

    Why does this work?

    Does Python guess which encoding I mean? I thought Python should refuse
    to guess :-)


    -- Gerhard

  • Thomas Heller

    #2
    Re: Unicode question

    Gerhard Häring <[email protected] > writes:
    > >>> u"äöü"
    > u'\x84\x94\x81'
    >
    > (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
    >
    > Why does this work?
    >
    > Does Python guess which encoding I mean? I thought Python should
    > refuse to guess :-)

    I stumbled over this yesterday, and it seems it is (at least) partially
    answered by PEP 263:

    In Python 2.1, Unicode literals can only be written using the
    Latin-1 based encoding "unicode-escape". This makes the programming
    environment rather unfriendly to Python users who live and work in
    non-Latin-1 locales such as many of the Asian countries. Programmers
    can write their 8-bit strings using the favorite encoding, but are
    bound to the "unicode-escape" encoding for Unicode literals.

    I have the impression that this is undocumented on purpose, because you
    should not write unescaped non-ansi characters into the source file
    (with 'unknown' encoding).
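
A minimal sketch of where the output `u'\x84\x94\x81'` may come from, written for Python 3 so it runs today. It assumes the console used code page 850 (an assumption; the thread does not name the code page, though cp437 maps these three characters to the same bytes). Python 2 without a declared encoding then read those bytes as Latin-1, one byte per code point.

```python
# Assumption: the console encoded the typed characters with cp850.
typed = "äöü"
console_bytes = typed.encode("cp850")

# These are the raw bytes the interpreter received from the console.
assert console_bytes == b"\x84\x94\x81"

# Reading them as Latin-1 maps every byte to the code point of the same
# value, which reproduces the values shown in the question.
misread = console_bytes.decode("latin-1")
print(repr(misread))

# Decoding with the encoding that actually produced the bytes recovers
# the intended text.
intended = console_bytes.decode("cp850")
print(intended)
```

So nothing is guessed in this reading: the bytes pass through unchanged, and the result only looks right if the source really was Latin-1.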

    Thomas


    • Gerhard Häring

      #3
      Re: Unicode question

      Thomas Heller wrote:
      > Gerhard Häring <[email protected] > writes:
      >
      >> >>> u"äöü"
      >> u'\x84\x94\x81'
      >>
      >> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
      >>
      >> Why does this work?
      >>
      >> Does Python guess which encoding I mean? I thought Python should
      >> refuse to guess :-)
      >
      >
      > I stumbled over this yesterday, and it seems it is (at least) partially
      > answered by PEP 263:
      >
      > In Python 2.1, Unicode literals can only be written using the
      > Latin-1 based encoding "unicode-escape". This makes the programming
      > environment rather unfriendly to Python users who live and work in
      > non-Latin-1 locales such as many of the Asian countries. Programmers
      > can write their 8-bit strings using the favorite encoding, but are
      > bound to the "unicode-escape" encoding for Unicode literals.
      >
      > I have the impression that this is undocumented on purpose, because you
      > should not write unescaped non-ansi characters into the source file
      > (with 'unknown' encoding).

      I agree that using latin1 as default is bad. If there's an encoding
      cookie in the 2.3+ source file then this encoding could be used.
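
A minimal sketch of the encoding cookie from PEP 263, written for Python 3, where `compile()` honors the cookie when given source as bytes. The cp850 encoding, the `<cookie-demo>` filename label, and the variable name `text` are assumptions chosen for illustration.

```python
# Build a tiny source file as bytes. The first line is the PEP 263
# encoding cookie; the literal on line two is stored in that encoding.
source = '# -*- coding: cp850 -*-\ntext = "äöü"\n'.encode("cp850")

# On disk, the literal holds the same raw bytes seen in the question.
assert b"\x84\x94\x81" in source

# The compiler reads the cookie and decodes the literal accordingly.
namespace = {}
exec(compile(source, "<cookie-demo>", "exec"), namespace)

decoded = namespace["text"]
print(decoded == "äöü")
```

With the cookie in place the interpreter no longer has to assume anything about the bytes inside the literal.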

      I stumbled on this when giving another Python user on this list a
      pointer to the relevant section in the Python tutorial
      (http://www.python.org/doc/current/tu...00000000000000)
      where Guido uses u"äöü" in an example.

      As this is BAD the tutorial should probably be changed. I'll file a bug
      report.

      -- Gerhard


      • Gerhard Häring

        #4
        Re: Unicode question

        Gerhard Häring wrote:
        > Ricardo Bugalho wrote:
        >> On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
        >>
        >>>> Gerhard Häring <[email protected] > writes:
        >>>>
        >>>>> >>> u"äöü"
        >>>>>
        >>>>> u'\x84\x94\x81'
        >>>>> [this works, but IMO shouldn't]

        > [...]
        > You'll get warnings if you don't define an encoding (either encoding
        > cookie or BOM) and use 8-bit characters in your source files. These
        > warnings will become errors in later Python versions.
        >
        > It's all in the PEP :)

        I feel like an idiot now :-( I do get the warnings when I run a Python
        script, but I do not get the warnings when I'm using the interactive
        prompt. So it's all good (almost). Why not also produce warnings at the
        interactive prompt?
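
For comparison, a small sketch of how this ends up in Python 3, where the default source encoding is UTF-8 and undeclared 8-bit bytes in a literal are rejected outright instead of producing a warning. The `<no-cookie>` filename label is hypothetical, and the exact exception type and message may differ between versions, so the sketch catches both likely types.

```python
# Source bytes with raw 8-bit characters in a literal and no encoding
# cookie or BOM. The bytes are not valid UTF-8.
source = b'text = "\x84\x94\x81"\n'

try:
    compile(source, "<no-cookie>", "exec")
    rejected = False
except (SyntaxError, ValueError) as exc:
    # Python 3 refuses the source instead of guessing an encoding.
    rejected = True
    print(type(exc).__name__)
```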

        -- Gerhard
