Vacuuming unicode database

  • Tambet Matiisen

    Vacuuming unicode database


    My Postgres databases used to have the default (SQL_ASCII) encoding. I
    could store any 8-bit character in them regardless of the actual
    charset, because all clients also used the default encoding and no
    charset conversion was done.

    Now we have started a new project with Qt3, and its Postgres driver
    defaults to UNICODE client encoding. This time charset conversion comes
    into play and messes up all 8-bit characters. This forces me to create
    all databases with the correct charsets. So I recreated my database with
    UNICODE encoding and imported all the data into it, paying attention to
    the client encoding. I didn't re-initdb the whole cluster, only
    recreated the problematic database. Everything seems to work fine,
    except that I can't vacuum the database:

    epos=# vacuum verbose analyze;
    INFO: --Relation pg_catalog.pg_description--
    INFO: Pages 18: Changed 0, Empty 0; Tup 1895: Vac 0, Keep 0, UnUsed 5.
    Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: --Relation pg_toast.pg_toast_16416--
    INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
    Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: Analyzing pg_catalog.pg_description
    INFO: --Relation pg_catalog.pg_group--
    INFO: Pages 1: Changed 0, Empty 0; Tup 31: Vac 0, Keep 0, UnUsed 31.
    Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: --Relation pg_toast.pg_toast_1261--
    INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
    Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO: Analyzing pg_catalog.pg_group
    ERROR: Invalid UNICODE character sequence found (0xdc6b)

    Table pg_group is giving errors because I have a group name with 8-bit
    characters in it. As I understand it, groups are common to all databases
    and pg_group is created during initdb, so it should be treated as having
    the SQL_ASCII charset, not UNICODE. Seems like a bug to me?
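
A minimal sketch in Python (not part of the original session) of why the two bytes in the error message, 0xdc and 0x6b, are rejected as UNICODE (UTF-8) yet are perfectly readable as single-byte text. The helper name `try_decode` is made up for illustration, and Latin-1 is only an assumption about how the group name was originally typed.

```python
# The two bytes reported in the error message: 0xdc followed by 0x6b.
raw = bytes([0xDC, 0x6B])


def try_decode(data: bytes, encoding: str):
    """Return the decoded string, or None if data is invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# In UTF-8, 0xDC (binary 110xxxxx) opens a two-byte sequence and must be
# followed by a continuation byte (binary 10xxxxxx). 0x6B is plain ASCII "k",
# so the sequence is invalid.
as_utf8 = try_decode(raw, "utf-8")

# Assumption: the name was entered in a Latin-1-like single-byte encoding,
# where 0xDC is U+00DC (capital U with diaeresis).
as_latin1 = try_decode(raw, "latin-1")
```

Here `as_utf8` is `None` while `as_latin1` is a two-character string, which matches the observed behaviour: the bytes were stored without conversion and only fail once they are interpreted as UNICODE.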

    What would you suggest in this case:
    1. Re-initdb with UNICODE encoding and recreate all databases. Basically
    all databases should have the same encoding.
    2. Use some single-byte encoding for the database instead of UNICODE.
    Vacuuming wouldn't complain any more, but I suspect that CREATE
    GROUP "Name with 8-bit characters" behaves differently depending on
    the encoding of the active database.
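
The concern in option 2 can be illustrated with a short Python sketch. It assumes the server converts the name from the client encoding to the encoding of the active database before storing it in the shared catalog; `stored_bytes` and the group name are hypothetical and only model that conversion.

```python
def stored_bytes(name: str, database_encoding: str) -> bytes:
    """Model the bytes that would be stored for a name in a given database encoding."""
    return name.encode(database_encoding)


# Hypothetical group name starting with U+00DC (capital U with diaeresis).
group_name = "\u00dcksus"

# Created while connected to a LATIN1 database: one byte per character.
latin1_bytes = stored_bytes(group_name, "latin-1")

# Created while connected to a UNICODE (UTF-8) database: U+00DC takes two bytes.
utf8_bytes = stored_bytes(group_name, "utf-8")

# The same logical name yields different byte strings in the shared table.
same_bytes = latin1_bytes == utf8_bytes
```

Under that assumption `same_bytes` is `False`, so a name created from one database would not look the same, or even be valid, when read from a database with a different encoding.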

    Tambet

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

  • Tom Lane

    #2
    Re: Vacuuming unicode database

    "Tambet Matiisen" <t.matiisen@aprote.ee> writes:
    > Table pg_group is giving errors, because I have group name with 8-bit
    > characters in it. As I understand, groups are common for all databases
    > and pg_group is created during initdb, so it should be considered having
    > SQL_ASCII charset, not UNICODE. Seems like a bug to me?

    Unfortunately, we don't have any way of dealing with different character
    sets or locales in different tables. (AFAICT this is not practical
    without implementing our own locale library, which would be an enormous
    task; someday we will solve this problem, but don't hold your breath.)
    So pg_shadow, pg_group, and pg_database are all risk spots.

    I think the best advice is to limit your user/group/database names to
    7-bit-ASCII if you are going to use different encodings in different
    databases. That way, they'll look valid in all databases.
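
One way to act on this advice is to check names before creating them. The following is a rough Python sketch, not anything provided by PostgreSQL; the function names and the sample names are invented for illustration.

```python
def is_seven_bit_ascii(name: str) -> bool:
    """True if every character of the name is in the 7-bit ASCII range."""
    return all(ord(ch) < 128 for ch in name)


def unsafe_names(names):
    """Return the names that contain characters outside 7-bit ASCII."""
    return [n for n in names if not is_seven_bit_ascii(n)]


# Hypothetical user/group/database names to check.
candidates = ["sales", "epos", "m\u00fc\u00fck"]
flagged = unsafe_names(candidates)

# A 7-bit ASCII name has the same bytes in UTF-8 and in Latin-1, which is why
# it looks valid no matter which of those encodings a database uses.
ascii_is_portable = "sales".encode("utf-8") == "sales".encode("latin-1")
```

Only the third candidate is flagged, since it is the only one with characters above 0x7F.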

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 9: the planner will ignore your desire to choose an index scan if your
    joining column's datatypes do not match
