Fix bulk insert parsing of isolated quotes in tab-delimited data (#2792) by Ananya2 · Pull Request #2795 · microsoft/mssql-jdbc

Ananya2 · 2025-10-08T09:58:09Z

Overview
This PR fixes bulk insert parsing of isolated quotes in tab-delimited data by removing problematic global quote state tracking from the parseString method in SQLServerBulkCSVFileRecord. The fix ensures that isolated quote characters are treated as literal data rather than field boundary markers, resolving IndexOutOfBoundsException errors during bulk copy operations.

Problem Description
The current implementation uses a global quoted boolean state in the parseString method that toggles on every quote character encounter. This causes issues when tab-delimited data contains isolated quotes within fields:
    if (buffer.charAt(i) == doubleQuoteChar) {
        quoted = !quoted;
    } else if (!quoted && /* delimiter found */) {
        // Process delimiter
    }

When parsing data like "Do you wish to remove the product "\t22451\t1", the isolated quote incorrectly toggles the quoted state, causing subsequent tab delimiters to be ignored. This results in:
Expected: 5 fields parsed correctly
Actual: 3 fields parsed, causing IndexOutOfBoundsException
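
The failure mode described above can be reproduced with a minimal sketch. The class `QuoteToggleDemo`, the method `toggleParse`, and the sample line are hypothetical and written only for illustration. They simplify the toggle-based scan quoted in the problem description and are not the driver's actual `parseString` code.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteToggleDemo {
    // Hypothetical simplification of a scan that flips a single "quoted"
    // flag on every double quote; not the driver's real implementation.
    static List<String> toggleParse(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                quoted = !quoted;      // an isolated quote also flips the flag
                current.append(c);
            } else if (!quoted && c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);     // delimiters are swallowed while quoted
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample: five tab-separated fields, one ending in a lone quote.
        String line = "7\tDo you wish to remove the product \"\t22451\t1\tdone";
        List<String> fields = toggleParse(line, '\t');
        System.out.println(fields.size()); // 2, although the line has 5 fields
    }
}
```

Once the lone quote sets the flag, every later tab on the line is kept as data, so any code that indexes fields by column position runs past the end of the parsed list.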

Root Cause
PR #2434 introduced quote handling logic to fix stack overflow issues in CSV parsing. While the fix successfully resolved the stack overflow problem for CSV files, it created a new issue where isolated quotes in tab-delimited data are treated as field boundary markers instead of literal characters.

Solution
Reverted to using currentLine.split(delimiter, -1) instead of parseString(currentLine, delimiter) for simple delimiter-based parsing.
Maintained the stack overflow fix from PR #2434 while fixing the quote parsing regression.
Added comprehensive test coverage with the exact problematic data patterns from issue #2792.
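
As a rough illustration of the `split(delimiter, -1)` call named in the first item, the sketch below shows the two properties that matter here: quote characters are ordinary data, and the `-1` limit keeps trailing empty fields. The class `SplitLimitDemo`, the helper `splitFields`, and the sample line are assumptions made for this example and are not taken from the driver.

```java
class SplitLimitDemo {
    // Hypothetical helper mirroring the call shape from the description.
    static String[] splitFields(String line, String delimiter) {
        // A negative limit keeps trailing empty strings in the result.
        return line.split(delimiter, -1);
    }

    public static void main(String[] args) {
        // Made-up sample: a lone quote in field 2 and two empty trailing fields.
        String line = "7\tDo you wish to remove the product \"\t22451\t\t";

        String[] withLimit = splitFields(line, "\t");
        System.out.println(withLimit.length);   // 5
        System.out.println(withLimit[1]);       // Do you wish to remove the product "

        // The default limit (0) drops trailing empty strings.
        System.out.println(line.split("\t").length); // 3
    }
}
```

Keeping the trailing empty strings matters for bulk copy because a row whose last columns are empty must still produce one value per column.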

Testing

Added the testBulkCopyTabDelimitedWithQuotes() test case with the problematic data from issue #2792 ("Bulk insert does not handle " properly").

Closes #2792

codecov · 2025-10-08T10:25:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.20%. Comparing base (e783ae4) to head (beaa4e6).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##               main    #2795   +/-   ##
=========================================
  Coverage     52.20%   52.20%           
- Complexity     4142     4144    +2     
=========================================
  Files           149      149           
  Lines         34306    34306           
  Branches       5723     5723           
=========================================
+ Hits          17908    17909    +1     
+ Misses        13906    13905    -1     
  Partials       2492     2492


abbrev · 2026-01-12T15:48:12Z

This removes support for a regex delimiter.

After digging through old versions of the code, it appears that the original support for a regex delimiter was an accidental feature, but it might not have worked properly in some cases (especially when escapeDelimiters is true and there's a double quote in the line). Also, a note about regex was added to comments in #1711 in response to #1691.

At any rate, a delimiter cannot be a regex as of this merge, so all I propose now is to remove the mention of regex in the following comment (it appears twice in SQLServerBulkCSVFileRecord.java) since it is no longer true:

     *        Delimiter to used to separate each column. Regex characters must be escaped with double backslashes.
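
For readers unfamiliar with the distinction, the following sketch shows in plain Java what it means for a delimiter to be interpreted as a regex versus as literal text, and why the quoted comment asks for escaping. It relies only on `String.split` and `Pattern.quote` from the standard library. The class `DelimiterRegexDemo` and its helpers are hypothetical and say nothing about how the driver handles delimiters internally.

```java
import java.util.regex.Pattern;

class DelimiterRegexDemo {
    // Passes the delimiter to String.split unchanged, so it is read as a regex.
    static String[] splitAsRegex(String line, String delimiter) {
        return line.split(delimiter, -1);
    }

    // Quotes the delimiter first, so every character is matched literally.
    static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String line = "a.b.c";
        // "." as a regex matches every character, leaving only empty strings.
        System.out.println(splitAsRegex(line, ".").length);   // 6
        // Escaping with a double backslash in source restores the literal dot.
        System.out.println(splitAsRegex(line, "\\.").length); // 3
        // A literal interpretation needs no escaping from the caller.
        System.out.println(splitLiteral(line, ".").length);   // 3
    }
}
```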

Ananya2 · 2026-01-13T14:53:54Z

@abbrev - Thanks for the detailed context.
Since this pull request removes support for regex delimiters, and the update you’re suggesting corrects comments/documentation rather than the code behavior addressed here, could you please open a new issue to track the documentation change?

Ananya2 added 2 commits October 8, 2025 15:12

Fix bulk insert parsing of isolated quotes in tab-delimited data (#2792)

6861de4

removed comments

1b76bf9

Ananya2 self-assigned this Oct 8, 2025

Ananya2 requested review from David-Engel, divang, machavan and muskan124947 October 8, 2025 09:58

machavan reviewed Oct 8, 2025

Comment thread src/main/java/com/microsoft/sqlserver/jdbc/SQLServerBulkCSVFileRecord.java

fixed CI check failure

beaa4e6

Ananya2 requested a review from machavan October 13, 2025 10:34

Ananya2 added this to the 13.3.0 milestone Oct 13, 2025

machavan approved these changes Oct 13, 2025

muskan124947 approved these changes Oct 13, 2025

Ananya2 merged commit 7f4a3a3 into main Oct 14, 2025
19 checks passed

abbrev mentioned this pull request Jan 14, 2026

Remove outdated comment about escaping delimiter #2879

Closed

Ananya2 mentioned this pull request Jan 19, 2026

Remove outdated comment about escaping delimiter #2880

Merged

Ananya2 merged 3 commits into main from user/anagarg/issue#2792
