Skip to content

non-ascii characters assigned to var/char columns in SQL are truncated to one byte #7932

@Ceshion

Description

@Ceshion

Issue Description

We are using MS-SQL in typeorm, and our data model includes some varchar columns. When we are inserting or updating records using strings that contain non-ASCII characters, those characters are stored without any errors, but using only their second byte. For example, if we store "\u2021" (‡), the value actually stored to the database will be equal to "\u0021" (!). Note also that the object returned from e.g. save shows the original value, "\u2021" (‡). The collation setting for the database, tables, and columns are all SQL_Latin1_General_CP1_CI_AS, and the characters we are attempting to store can therefore be represented by the corresponding codepage.

Expected Behavior

If a value can be stored correctly in a given var/char column in MS-SQL based on the relevant collation settings, we would expect that it would be stored unchanged. If it cannot, we would expect an error to be thrown when trying to store it. It seems that determining whether a value can be stored or not should be the responsibility of the database server, and we could just forward any error from it to the end consumer.

Actual Behavior

Attempting to store any value that contains characters that do not fit into one byte in MS-SQL results in each character being stored as if only the second byte were set.

Steps to Reproduce

  1. Set up a project with the mssql driver, and an ormconfig with a type: 'mssql' connection which has an entity type that has at least one char or varchar column
  2. Store any string into that or those columns that contain a character that must be encoded in ucs2 or above
  3. Inspect the actual value stored to the database

./index.ts

import {
    createConnection,
    getConnection,
    Repository
} from 'typeorm';
import { OneModel } from './entity/OneModel';

async function main() {
    await createConnection();
    const connection = getConnection();
    const repository: Repository<OneModel> =
        connection.getRepository(OneModel);

    const one = new OneModel();
    one.content = '\u2021';

    const claimedSavedOne = await repository.save(one);
    const actualSavedOne = 
        await repository.findOne({ order: { created: 'DESC' } });

    // expect(claimedSavedOne.content).toBe(one.content) // assertion succeeds
    // expect(actualSavedOne.content).toBe(one.content); // assertion fails
    // or
    console.log(claimedSavedOne.content === one.content); // Output: true
    console.log(claimedSavedOne.content, one.content); // Output: ‡ ‡

    console.log(actualSavedOne?.content === one.content); // Output: false
    console.log(actualSavedOne?.content, one.content); // Output: ! ‡
}

main().then(() => process.exit(0));

./entity/OneModel.ts

import {
    Entity,
    PrimaryGeneratedColumn,
    Column,
    CreateDateColumn
} from 'typeorm';

@Entity()
export class OneModel {

    @PrimaryGeneratedColumn('uuid')
    id?: string;

    @CreateDateColumn({ type: 'datetime' })
    created?: Date;

    @Column('varchar', { length: 10 })
    content: string = '';
}

My Environment

Dependency Version
Operating System Windows 10.2004 (build 19041.1110) and macOS v11.2.3
Node.js version v14.15.5 and v8.9.3
Typescript version v4.3.5 and v2.9.2
TypeORM version v0.2.34 and v0.2.22
mssql driver version v7.1.3 and v4.1.0
Database version Microsoft SQL Server 2019 Express and Microsoft SQL Server 2017

Additional Context

I have tracked down the point where truncation occurs to be within tedious' data-types/varchar module, in which varchar values are converted to bytes to be written into the SQL RPC request stream using Buffer.from(value, 'ascii'). Based on this issue it seems that the onus is on consumers to use nvarchar parameters when appropriate as a solution to this issue, even when setting a varchar column.

Relevant Database Driver(s)

  • aurora-data-api
  • aurora-data-api-pg
  • better-sqlite3
  • cockroachdb
  • cordova
  • expo
  • mongodb
  • mysql
  • nativescript
  • oracle
  • postgres
  • react-native
  • sap
  • sqlite
  • sqlite-abstract
  • sqljs
  • sqlserver

Are you willing to resolve this issue by submitting a Pull Request?

  • Yes, I have the time, and I know how to start.
  • Yes, I have the time, but I don't know how to start. I would need guidance.
  • No, I don't have the time, although I believe I could do it if I had the time...
  • No, I don't have the time and I wouldn't even know how to start.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions