Starting point for unicode conversion

  • Howard Lightstone

    I *foolishly* started a Python project (3 years ago) without considering
    Unicode issues. Now, I want to resolve future problems with international
    versions of my software.

    The key point here is Tkinter. I believe (from reading this list) that I
    can expect that SOME returned text may be Unicode (depending on content and
    Windows locale settings).

    Would it be best to just (somehow) force all text into Unicode or would it
    be "better" to handle specific instances?

    I also have the problem of embedded text in data files I create that I have
    to store as *something* that I can fully recover and convert back to
    something reasonable even if the locale changes.

    Any thoughts welcome .... this is something I am NOT looking forward to.

    Thanks
  • Martin v. Löwis

    Re: Starting point for unicode conversion

    Howard Lightstone <howard@eegsoftware.com> writes:
    > The key point here is Tkinter. I believe (from reading this list) that I
    > can expect that SOME returned text may be Unicode (depending on content and
    > Windows locale settings).

    Yes, and no. Yes, some returned text may be Unicode, but no, it won't
    depend on the locale settings. Instead, Tkinter will return a byte
    string if the result contains only ASCII characters, and return a
    Unicode string if there are non-ASCII characters.
    > Would it be best to just (somehow) force all text into Unicode or would it
    > be "better" to handle specific instances?

    If you are prepared to deal with Unicode, it would be best to force
    that throughout. I was contemplating making this an option in
    _tkinter, but that has not been implemented - contributions are
    welcome.

    Meanwhile, you can use

    s = unicode(s)

    on all strings returned from Tkinter: if s is an ASCII string, the
    default encoding should happily convert it to a Unicode object; if s
    is a Unicode string, unicode(s) will be a no-op.
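As a rough sketch of the same normalization in a form that runs on both Python 2 and Python 3: the helper below returns text unchanged and decodes byte strings. The names `text_type` and `to_text` are hypothetical, not part of Tkinter or the standard library, and the ASCII default only mirrors the default-encoding assumption described above.

```python
import sys

# Python 2 calls its text type `unicode`; Python 3 calls it `str`.
# The conditional keeps `unicode` from being evaluated on Python 3.
text_type = unicode if sys.version_info[0] < 3 else str  # noqa: F821


def to_text(value, encoding="ascii"):
    """Return `value` as a text (Unicode) string.

    Text strings are returned unchanged; byte strings are decoded.
    The ASCII default is an assumption for this sketch.
    """
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


plain = to_text(b"abc")         # byte string, decoded to text
accented = to_text(u"caf\xe9")  # already text, returned as-is
```

Funnelling every value returned from the GUI through one such helper keeps the rest of the program working with a single string type.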
    > I also have the problem of embedded text in data files I create that I have
    > to store as *something* that I can fully recover and convert back to
    > something reasonable even if the locale changes.

    Don't worry about the locale; it does not matter here.
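One way to keep stored text independent of the locale, offered here as an assumption rather than something prescribed in this thread, is to always read and write data files with an explicit encoding such as UTF-8. A minimal sketch using `io.open`; the helper names `save_text` and `load_text` and the file name are hypothetical:

```python
import io
import os
import tempfile


def save_text(path, text):
    # Write with an explicit encoding instead of relying on a
    # locale-dependent default.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    # Read back with the same explicit encoding.
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


# Round-trip a string containing a non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
original = u"Martin v. L\xf6wis"
save_text(path, original)
restored = load_text(path)
```

Because the encoding is fixed in the code, the bytes on disk are the same whatever locale the program runs under.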

    Regards,
    Martin
