Windows XP - Environment variable - Unicode

  • sebastien.hugues

    Windows XP - Environment variable - Unicode

    Hi

    I would like to retrieve the application data directory path of the
    logged user on Windows XP. To achieve this goal I use the environment
    variable APPDATA.

    The logged user has this name: sébastien. The second character is not an
    ASCII one, and when I try to encode the path that contains this name in
    UTF-8, I got this error:

    Ascii error: index not in range (128)

    I would like to first decode this string and then re-encode it in UTF-8, but
    I am not able to find out what encoding is used when I make:

    appdata = os.environ['APPDATA']

    Any ideas ?

    Thanks in advance
    Sebastien
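
The error described in this post can be reproduced in a few lines. This is only a minimal sketch for a current Python 3 interpreter, under two assumptions that are not in the original post: the path below is made up, and the byte 0xE9 stands for "é" as it would in a Western European code page. In the Python 2 of this thread, calling `.encode('utf-8')` on a byte string first decodes it implicitly as ASCII, and that implicit step is what fails.

```python
# Hypothetical APPDATA value as raw bytes; 0xE9 is "é" in cp1252 (assumed code page).
raw_path = b"C:\\Documents and Settings\\s\xe9bastien\\Application Data"


def decode_as_ascii(raw):
    """Return (text, None) on success, or (None, error message) on failure."""
    try:
        return raw.decode("ascii"), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


text, error = decode_as_ascii(raw_path)
print(error)
```

The decode fails because 0xE9 lies outside the 0-127 range that the ASCII codec accepts.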

  • Rob Williscroft

    #2
    Re: Windows XP - Environment variable - Unicode

    sebastien.hugues wrote in news:3f0e77fc@epflnews.epfl.ch:

    > Hi
    >
    > I would like to retrieve the application data directory path of the
    > logged user on Windows XP. To achieve this goal I use the environment
    > variable APPDATA.
    >
    > The logged user has this name: sébastien. The second character is not
    > an ASCII one, and when I try to encode the path that contains this name
    > in UTF-8, I got this error:
    >
    > Ascii error: index not in range (128)
    >
    > I would like to first decode this string and then re-encode it in
    > UTF-8, but I am not able to find out what encoding is used when I
    > make:
    >
    > appdata = os.environ['APPDATA']
    >
    > Any ideas ?

    I don't know if it will help but:

    >>> import win32com.client
    >>> shell = win32com.client.Dispatch("WScript.Shell")
    >>> env = shell.GetEnvironment("VOLATILE")
    >>> j = []
    >>> for i in env:
    ...     j.append(i)
    ...
    >>> j
    [u'LOGONSERVER=\\\\COMPUTERNAME', u'APPDATA=C:\\Documents and Settings\\username\\Application Data']
    >>>

    Note the leading u, which I don't get with:

    >>> import os
    >>> os.environ["APPDATA"]
    'C:\\Documents and Settings\\username\\Application Data'

    Also note that APPDATA should also be in:

    >>> env = shell.GetEnvironment("PROCESS")

    HTH

    Rob.

    • John Roth

      #3
      Re: Windows XP - Environment variable - Unicode


      "sebastien.hugues" wrote in message
      news:3f0e77fc@epflnews.epfl.ch...

      > Hi
      >
      > I would like to retrieve the application data directory path of the
      > logged user on Windows XP. To achieve this goal I use the environment
      > variable APPDATA.
      >
      > The logged user has this name: sébastien. The second character is not an
      > ASCII one, and when I try to encode the path that contains this name in
      > UTF-8, I got this error:
      >
      > Ascii error: index not in range (128)
      >
      > I would like to first decode this string and then re-encode it in UTF-8, but
      > I am not able to find out what encoding is used when I make:
      >
      > appdata = os.environ['APPDATA']
      >
      > Any ideas ?

      I don't think encoding is an issue. Windows XP stores all character data as
      unicode internally, so whatever you get back from os.environ() is either
      going to be unicode, or it's going to be translated back to some single byte
      code by Python. In the latter case, you may not be able to recover non-ascii
      values, so Rob Williscroft's workaround to get the unicode version may be
      your only hope.

      If you're getting a standard string though, I'd try using Latin-1, or the
      Windows equivalent first (it's got an additional 32 characters that aren't
      in Latin-1.) Sorry I don't remember the actual names.

      Note that Release 2.3 fixes the unicode problems for files under XP.
      It's currently in late beta, though. I don't know if it fixes the
      os.environ() interface though, and it's rather late to get anything into 2.3.

      John Roth

      > Thanks in advance
      > Sebastien



      • Martin v. Löwis

        #4
        Re: Windows XP - Environment variable - Unicode

        John Roth wrote:

        > I don't think encoding is an issue. Windows XP stores all character data as
        > unicode internally, so whatever you get back from os.environ() is either
        > going to be unicode, or it's going to be translated back to some single byte
        > code by Python.

        Read the source, Luke. Python uses environ, which is a C library
        variable pointing to byte strings, so no Unicode here.

        > In the latter case, you may not be able to recover non-ascii
        > values, so Rob Williscroft's workaround to get the unicode version may be
        > your only hope.

        You are certainly able to recover non-ascii values, as long as they
        only use CP_ACP.
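
The recovery described here amounts to decoding the byte string with the ANSI code page and then encoding the result as UTF-8. The following is a sketch rather than a definitive recipe, written for a current Python 3 interpreter. The helper names and the sample path are made up for illustration. On Windows, Python's "mbcs" codec corresponds to CP_ACP; that codec does not exist on other platforms, so cp1252 is assumed as a stand-in there. In the Python 2 of this thread, the equivalent should be along the lines of `os.environ['APPDATA'].decode('mbcs').encode('utf-8')`.

```python
import sys


def ansi_codec_name(fallback="cp1252"):
    """Name of the codec to use for ANSI (CP_ACP) byte strings.

    "mbcs" exists only on Windows; the fallback is an assumption
    that suits Western European systems only.
    """
    return "mbcs" if sys.platform == "win32" else fallback


def recode_to_utf8(raw, source_codec):
    """Decode bytes with source_codec, then encode the text as UTF-8."""
    return raw.decode(source_codec).encode("utf-8")


# Hypothetical APPDATA value as raw bytes; 0xE9 is "é" in cp1252.
raw_path = b"C:\\Documents and Settings\\s\xe9bastien\\Application Data"
utf8_path = recode_to_utf8(raw_path, "cp1252")
```

On a real Windows system, `recode_to_utf8(value, ansi_codec_name())` would pick up whatever code page is active instead of hard-coding cp1252.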

        > If you're getting a standard string though, I'd try using Latin-1, or the
        > Windows equivalent first (it's got an additional 32 characters that aren't in
        > Latin-1.)

        That, in general, is wrong. It is only true for the Western European and
        American editions of Windows. In all other installations, CP_ACP differs
        significantly from Latin-1.
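
A small illustration of why guessing Latin-1 is only safe on some installations. This sketch assumes nothing beyond the codecs that ship with Python: cp1252 is the Western European Windows code page and cp1251 the Cyrillic one. The same byte decodes to different characters depending on which one is active.

```python
# The byte that represents "é" on a Western European system.
byte = b"\xe9"

decoded = {codec: byte.decode(codec) for codec in ("latin-1", "cp1252", "cp1251")}

# cp1252 also differs from Latin-1 in the 0x80-0x9F range.
euro_in_cp1252 = b"\x80".decode("cp1252")
control_in_latin1 = b"\x80".decode("latin-1")

for codec, char in decoded.items():
    print(codec, repr(char))
```

Latin-1 and cp1252 agree on 0xE9, but cp1251 maps it to a Cyrillic letter, so a hard-coded Latin-1 decode would silently produce the wrong name there.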

        > Note that Release 2.3 fixes the unicode problems for files under XP.
        > It's currently in late beta, though. I don't know if it fixes the
        > os.environ()

        It doesn't. "Fixing" something here is less urgent and more difficult,
        as environment variables rarely exceed CP_ACP.

        If people get support for Unicode environment variables, they want
        Unicode command line arguments next.

        Regards,
        Martin


        • John Roth

          #5
          Re: Windows XP - Environment variable - Unicode


          "Martin v. Löwis" wrote in message:

          > John Roth wrote:
          >
          > > I don't think encoding is an issue. Windows XP stores all character data as
          > > unicode internally, so whatever you get back from os.environ() is either
          > > going to be unicode, or it's going to be translated back to some single byte
          > > code by Python.
          >
          > Read the source, Luke.

          I haven't gotten into the Python source, and my name is not Luke.
          Also, don't respond to my e-mail address. Unfortunately, I had a problem
          where I had to reload my system, and it's gotten out to usenet. It used
          to go to an ISP I no longer have an account with.

          > Python uses environ, which is a C library
          > variable pointing to byte strings, so no Unicode here.

          The OP's question revolved around ***which*** code page was
          being used internally. Windows uses Unicode. That's not the same
          question as what code set Python uses to attempt to translate Unicode
          into a single byte character set.

          > > In the latter case, you may not be able to recover non-ascii
          > > values, so Rob Williscroft's workaround to get the unicode version may be
          > > your only hope.
          >
          > You are certainly able to recover non-ascii values, as long as they
          > only use CP_ACP.

          I said "may not," not "cannot in any and all circumstances."

          > > If you're getting a standard string though, I'd try using Latin-1, or the
          > > Windows equivalent first (it's got an additional 32 characters that aren't in
          > > Latin-1.)
          >
          > That, in general, is wrong. It is only true for the Western European and
          > American editions of Windows. In all other installations, CP_ACP differs
          > significantly from Latin-1.

          The OP's problem was a character that's in the Western European range.

          > > Note that Release 2.3 fixes the unicode problems for files under XP.
          > > It's currently in late beta, though. I don't know if it fixes the
          > > os.environ()
          >
          > It doesn't. "Fixing" something here is less urgent and more difficult,
          > as environment variables rarely exceed CP_ACP.

          Less urgent I can see, unless you're concerned about whether Python
          survives against systems that do it right. Now that the Windows 9x
          series is dying off, the vast majority of systems on the desktop are
          going to have Unicode support internally. Granted, Python is not
          targeted at "the vast majority of systems," but if you can't easily get
          Unicode from the environment and the registry, then it's not very
          useful for system administration tasks or automation tasks on
          Windows.

          Many, if not most, environment variables are file names. If file
          names need Unicode support, then so do environment variables.

          As to more difficult, as I said above, I haven't perused the source,
          so I can't comment on that. If I had to do it myself, I'd probably
          start out by always using the Unicode variant of the Windows API
          call, and then check the type of the argument to environ() to determine
          which to pass back. I'm not sure whether or not I'd throw an exception
          if the actual value couldn't be translated to the current SBCS code.

          > If people get support for Unicode environment variables, they want
          > Unicode command line arguments next.

          Why not? I can enter a command with Unicode at the Windows
          command prompt, and that command is likely to contain file names.
          Same problem rearing its head in a different spot.

          John Roth

          On reading this over, it does sound a bit more strident than my
          responses usually do, but I will admit to being irritated at the
          assumption that you need to read the source to find out the
          answer to various questions.

          > Regards,
          > Martin



          • Martin v. Löwis

            #6
            Re: Windows XP - Environment variable - Unicode

            "John Roth" <newsgroups@jhrothjr.com> writes:

            > The OP's question revolved around ***which*** code page was
            > being used internally. Windows uses Unicode. That's not the same
            > question as what code set Python uses to attempt to translate Unicode
            > into a single byte character set.

            Yes and no. What Windows uses is largely irrelevant, as Python does
            not use Windows here. Instead, it uses the Microsoft C library, in
            which environment variables are *not* stored in some Unicode encoding,
            when accessed through the _environ pointer.

            > As to more difficult, as I said above, I haven't perused the source,
            > so I can't comment on that. If I had to do it myself, I'd probably
            > start out by always using the Unicode variant of the Windows API
            > call, and then check the type of the argument to environ() to determine
            > which to pass back. I'm not sure whether or not I'd throw an exception
            > if the actual value couldn't be translated to the current SBCS code.

            Notice that os.environ is not a function, but a dictionary. So there
            is no system call involved when retrieving an environment
            variable. Instead, they are all precomputed.
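
The point that os.environ is a mapping and not a function can be checked directly. A minimal sketch for a current Python 3 interpreter; the variable names DEMO_VARIABLE and DEMO_VARIABLE_THAT_IS_NOT_SET are hypothetical and exist only for this illustration.

```python
import os
from collections.abc import MutableMapping

# os.environ behaves like a dictionary, so it is indexed, not called.
is_mapping = isinstance(os.environ, MutableMapping)
is_callable = callable(os.environ)

os.environ["DEMO_VARIABLE"] = "demo"  # hypothetical variable name
value = os.environ["DEMO_VARIABLE"]   # plain mapping lookup
missing = os.environ.get("DEMO_VARIABLE_THAT_IS_NOT_SET", "fallback")

print(is_mapping, is_callable, value, missing)
```

Writing `os.environ("APPDATA")` therefore raises a TypeError; the lookup has to use square brackets or `.get()`.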

            > On reading this over, it does sound a bit more strident than my
            > responses usually do, but I will admit to being irritated at the
            > assumption that you need to read the source to find out the
            > answer to various questions.

            If the question is "how does software Foo do something", the *only*
            reliable way is to read the source. You may have a mental model that
            may allow you to give an educated guess how Foo *might* do
            something. In this case, your educated guess was wrong, that's why I
            referred you to the source.

            Regards,
            Martin


            • John Roth

              #7
              Re: Windows XP - Environment variable - Unicode


              "Martin v. Löwis" wrote in message:

              > "John Roth" <newsgroups@jhrothjr.com> writes:
              >
              > > The OP's question revolved around ***which*** code page was
              > > being used internally. Windows uses Unicode. That's not the same
              > > question as what code set Python uses to attempt to translate Unicode
              > > into a single byte character set.
              >
              > Yes and no. What Windows uses is largely irrelevant, as Python does
              > not use Windows here. Instead, it uses the Microsoft C library, in
              > which environment variables are *not* stored in some Unicode encoding,
              > when accessed through the _environ pointer.

              I've found at various times that using the C library causes lots of
              problems with Microsoft.

              > > As to more difficult, as I said above, I haven't perused the source,
              > > so I can't comment on that. If I had to do it myself, I'd probably
              > > start out by always using the Unicode variant of the Windows API
              > > call, and then check the type of the argument to environ() to determine
              > > which to pass back. I'm not sure whether or not I'd throw an exception
              > > if the actual value couldn't be translated to the current SBCS code.
              >
              > Notice that os.environ is not a function, but a dictionary. So there
              > is no system call involved when retrieving an environment
              > variable. Instead, they are all precomputed.

              Good point. That does make it somewhat harder; the routine
              would have to precompute both versions, and store them with
              both standard strings and unicode strings as keys. Whether the
              overhead would be worth it is debatable. It's not, however,
              all that difficult to understand for the user of the facility, though.
              It would work exactly the same way the file functions work: if
              you use a unicode key, you get a unicode result.
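
The file-function behaviour referred to here ("the type you pass in decides the type you get back") can be seen with os.listdir. This is a sketch for a current Python 3 interpreter, where the two string types are str and bytes; in the Python 2.3 discussed in this thread the corresponding pair would have been unicode and str. The file name example.txt is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create one file so the listing is not empty.
    open(os.path.join(folder, "example.txt"), "w").close()

    text_names = os.listdir(folder)               # str path in, str names out
    byte_names = os.listdir(os.fsencode(folder))  # bytes path in, bytes names out

print(text_names, byte_names)
```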

              John Roth

              > Regards,
              > Martin



              • John Roth

                #8
                Re: Windows XP - Environment variable - Unicode


                "Fredrik Lundh" <fredrik@pythonware.com> wrote in message:

                > John Roth wrote:
                >
                > > > Read the source, Luke.
                > >
                > > I haven't gotten into the Python source, and my name
                > > is not Luke.
                >
                > And life's too short to waste on movies...

                Depends on what your goals in life are.

                > > On reading this over, it does sound a bit more strident than my
                > > responses usually do, but I will admit to being irritated at the
                > > assumption that you need to read the source to find out the
                > > answer to various questions.
                >
                > Well, you obviously didn't bother to read the documentation for
                > os.environ, so pointing you to the source sounds like a reasonable
                > idea.

                Not particularly. I might be one of that not inconsiderable number
                of people that doesn't know C. I'm not, but the number of people
                who use Python and who don't know C is not zero.

                I like Python because, for the most part, it's much more
                understandable than many languages I know, and that
                makes it much more productive. What I've learned in this
                conversation is that os.environ fails to handle one of the
                major corner cases in a Windows NT/2000/XP environment.
                So if I need that corner case, I'm going to have to use
                the Windows API call. Not a big deal, but also not something
                that I regard as one of the language's strengths.
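
Going straight to the Windows API, as mentioned above, might look roughly like the following today. This is a hedged sketch, not what anyone in this thread actually ran: it uses the ctypes module, which did not ship with the Python of that time, and the helper name get_env_wide is made up. GetEnvironmentVariableW is the wide-character (Unicode) variant of the Win32 call. On other platforms the sketch simply falls back to os.environ. In current Python 3 on Windows, os.environ already returns text, so this is illustrative only.

```python
import ctypes
import os
import sys


def get_env_wide(name):
    """Return an environment variable as text, or None if it is not set."""
    if sys.platform != "win32":
        # No wide-character API here; fall back to the standard mapping.
        return os.environ.get(name)

    from ctypes import wintypes

    func = ctypes.windll.kernel32.GetEnvironmentVariableW
    func.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    func.restype = wintypes.DWORD

    # First call with no buffer: returns the size needed, terminator included.
    size = func(name, None, 0)
    if size == 0:
        return None  # treated as "not set" in this sketch

    buffer = ctypes.create_unicode_buffer(size)
    func(name, buffer, size)
    return buffer.value


os.environ["GR_DEMO_VAR"] = "value"  # hypothetical variable for the demo
demo_value = get_env_wide("GR_DEMO_VAR")
```

The sketch does not distinguish an unset variable from other failures of the call; a more careful version would inspect the Windows error code.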

                John Roth

                > </F>



                • John Roth

                  #9
                  Re: Windows XP - Environment variable - Unicode


                  "Martin v. Löwis" wrote in message:

                  > John Roth wrote:
                  > > Good point. That does make it somewhat harder; the routine
                  > > would have to precompute both versions, and store them with
                  > > both standard strings and unicode strings as keys.
                  >
                  > That doesn't work. You cannot have separate dictionary entries
                  > for unicode and byte string keys if the keys compare and hash
                  > equal, which is the case for all-ASCII keys (which environment
                  > variable names typically are).

                  Ah, so.
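
The dictionary behaviour quoted above can be illustrated in any Python version. In the Python 2 of this thread, 'APPDATA' and u'APPDATA' compare equal and hash equal. In Python 3, bytes and str never compare equal, so this sketch uses 1 and 1.0 as a stand-in pair with the same property; that substitution is an assumption made purely for illustration.

```python
# Two keys of different types that compare equal and hash equal.
same_hash = hash(1) == hash(1.0)
same_value = 1 == 1.0

table = {}
table[1] = "stored under the int key"
table[1.0] = "stored under the float key"  # overwrites the entry above

print(len(table), table[1])
```

Only one entry survives, which is why a dictionary cannot hold a byte-string value and a Unicode value side by side under keys that are equal.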

                  John Roth

                  > Regards,
                  > Martin

