SQLServerBulkCSVFileRecord incorrectly splits line when given delimiters with special regex characters  #1691

@stavshamir

Description

Driver version

9.2.1.jre15

JAVA/JVM version

JDK 15

Problem description

Expected behaviour: Importing data from a CSV file whose delimiter contains a special regex character (for example, "|") should work.

I have a csv file where the delimiter is the pipe character (|):

FOO1 | BAR1 | 1
FOO2 | BAR2 | 2

When trying to import the data using SQLServerBulkCSVFileRecord, import fails because the lines are not split correctly:

        try (
                var connection = DriverManager.getConnection(connectionUrl);
                var bulkCopy = new SQLServerBulkCopy(connection)
        ) {
            // delimiter = |
            var fileRecord = new SQLServerBulkCSVFileRecord(csvSourcePath, null, "|", false);
            ...

            bulkCopy.setDestinationTableName(destinationTable);
            bulkCopy.writeToServer(fileRecord);
        }

I looked into SQLServerBulkCSVFileRecord::getRowData, and this happens because the method uses String::split to split the line. String::split interprets its argument as a regular expression, so an unescaped special character such as | is treated as a regex operator rather than as a literal delimiter, and the line is not split correctly.
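To illustrate the behaviour described above outside the driver, here is a minimal sketch using only the JDK. The class name `SplitDemo` and the helper `splitRaw` are hypothetical and exist only for this example; the sketch assumes the delimiter is handed to String::split unchanged, and it does not reproduce the driver's actual code.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits a line the naive way, passing the
    // delimiter to String::split as-is (so it is parsed as a regex).
    static String[] splitRaw(String line, String delimiter) {
        return line.split(delimiter);
    }

    public static void main(String[] args) {
        String line = "FOO1 | BAR1 | 1";

        // "|" as a regex is an alternation of two empty patterns, so it
        // matches the empty string between every pair of characters.
        String[] broken = splitRaw(line, "|");
        System.out.println(broken.length);            // prints 15 (one element per character)
        System.out.println(Arrays.toString(broken));

        // Escaping the pipe makes the regex match the literal character.
        String[] fixed = splitRaw(line, "\\|");
        System.out.println(fixed.length);             // prints 3
        System.out.println(Arrays.toString(fixed));   // prints [FOO1 ,  BAR1 ,  1]
    }
}
```

The unescaped call yields one element per character instead of three fields, which would explain why the column count no longer matches the destination table.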

This can be fixed by one of the following:

  1. Implementing special handling for delimiters passed to the constructor of SQLServerBulkCSVFileRecord
  2. Passing the responsibility to the user and explicitly stating in the Javadoc that special regex characters must be escaped.

I think the second approach is better, but I would be happy to provide a pull request for either approach.
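Either approach comes down to quoting the delimiter before it reaches String::split. A rough sketch of what that could look like with `java.util.regex.Pattern.quote` follows; the class `DelimiterEscaping` and the method `splitLiteral` are hypothetical names for illustration and are not part of the driver.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DelimiterEscaping {
    // Hypothetical helper: treats the delimiter as a literal string by
    // quoting it before it is compiled as a regex. The limit of -1 keeps
    // trailing empty fields, which is usually what a CSV reader wants.
    static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Pipe delimiter from the issue.
        System.out.println(Arrays.toString(splitLiteral("FOO1 | BAR1 | 1", "|")));
        // prints [FOO1 ,  BAR1 ,  1]

        // Other regex metacharacters are handled the same way.
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));
        // prints [a, b, c]

        // Trailing empty fields are preserved.
        System.out.println(splitLiteral("a||", "|").length);
        // prints 3
    }
}
```

Under approach 1 the quoting would happen inside the driver, as in `splitLiteral`. Under approach 2 the caller would do the equivalent themselves, for example by passing `"\\|"` or `Pattern.quote("|")` as the delimiter argument, assuming the driver forwards that string to String::split unchanged.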

Labels

Enhancement: An enhancement to the driver. Lower priority than bugs.