Cleaning Numeric Strings Before Conversion in SQL Server

Working with messy numeric data is one of those unavoidable realities in database development. Whether you’re importing data from CSV files, web APIs, or legacy systems, you’ll often encounter numeric values stored as strings with all sorts of unwanted characters mixed in. SQL Server’s conversion functions are pretty strict about what they’ll accept, so you need to clean up these strings before attempting any conversions.

The Challenge with Dirty Numeric Data

When SQL Server tries to convert a string to a numeric type using functions like CAST(), CONVERT(), or their TRY_ equivalents TRY_CAST() and TRY_CONVERT(), it expects the string to be in a clean, recognizable format. Throw in some extra spaces, currency symbols, formatting characters, or non-numeric junk, and you’ll either get conversion errors or NULL values where you expected numbers.

Usually the best way around this is to systematically strip out the unwanted characters and normalize the format before attempting the conversion. This approach is typically more reliable than trying to handle every edge case during the conversion itself.

Common Cleaning Options

Most numeric string cleaning involves removing or replacing specific character patterns. Here are some common options:

  • REPLACE() for removing specific characters like currency symbols, commas, or parentheses
  • TRANSLATE() to do a similar thing to REPLACE() (although these functions work differently)
  • LTRIM() and RTRIM() for stripping whitespace from the beginning and end
  • SUBSTRING() for extracting portions of strings when the numeric part is embedded in text
  • PATINDEX() for finding positions of patterns, useful for locating where numeric data starts or ends
  • Regular expressions through CLR functions or newer SQL Server features for complex pattern matching

The trick is to apply these operations in the right order. You typically want to handle the most disruptive characters first, then work your way down to the subtle formatting issues.

Example

In this example we’ll use REPLACE() to clean the numeric strings before conversion. We’ll also look at combining REPLACE() with TRANSLATE() to achieve the same result.

Sample Data

Suppose we have the following table and data:

CREATE TABLE SalesRaw (
    Id INT IDENTITY(1,1),
    RawAmount VARCHAR(50)
);

INSERT INTO SalesRaw (RawAmount)
VALUES
    (' 1,200 '),     -- extra spaces and a comma
    ('$450.75'),     -- currency symbol and decimal
    ('3000'),        -- clean number
    ('12abc34'),     -- bad characters inside
    ('(200)'),       -- negative stored in accounting style
    (NULL),          -- null value
    ('7 500');       -- embedded space

SELECT * FROM SalesRaw;

Output:

Id  RawAmount
-- ---------
1 1,200
2 $450.75
3 3000
4 12abc34
5 (200)
6 null
7 7 500

The RawAmount column is VARCHAR, which is a string type (not a numeric type).

Attempting to Convert the Data Without Cleaning

Now if we try to convert the above column directly to a numeric type, we’ll get errors on most rows:

SELECT 
    Id,
    RawAmount,
    CAST(RawAmount AS DECIMAL(18,2)) AS CleanAmount
FROM SalesRaw;

Output:

Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric.

It threw an error at line 1, due to it having a value that couldn’t be converted to our numeric type (in this case DECIMAL(18,2)).

We could avoid the error by replacing CAST() with TRY_CAST(), but that still won’t result in a conversion:

SELECT 
    Id,
    RawAmount,
    TRY_CAST(RawAmount AS DECIMAL(18,2)) AS CleanAmount
FROM SalesRaw;

Output:

Id  RawAmount  CleanAmount
-- --------- -----------
1 1,200 null
2 $450.75 null
3 3000 3000
4 12abc34 null
5 (200) null
6 null null
7 7 500 null

This function returns null whenever a conversion fails, and so most of our rows return null, which means most of the conversions failed.

Using REPLACE() to Clean the Data

In order to get the conversions to succeed, we’ll need to clean the raw data before attempting to convert it. The idea is simple in that we will strip out everything that isn’t part of a valid number. In particular, we’ll remove commas, spaces, and dollar signs. We’ll also handle accounting-style negatives (e.g., (200) becomes -200).

One way to do this in SQL Server is by using a custom REPLACE() chain.

Here’s an approach that handles most of the cases above:

SELECT 
    Id,
    RawAmount,
    TRY_CAST(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            RawAmount, '(', '-'
                        ), ')', ''
                    ), '$', ''
                ), ',', ''
            ), ' ', ''
        ) AS DECIMAL(18,2)
    ) AS CleanAmount
FROM SalesRaw;

Result:

Id  RawAmount  CleanAmount
-- --------- -----------
1 1,200 1200
2 $450.75 450.75
3 3000 3000
4 12abc34 null
5 (200) -200
6 null null
7 7 500 7500

We can see that most of the rows were converted successfully. Two failed, because our data cleansing process didn’t catch them. This is mainly because one of the raw values was null, which is an unknown value, so we wouldn’t be able to clean it anyway (there’s nothing to clean). The other value contains a bunch of letters in the middle, which suggests that it’s either not a number, or it uses a different base (e.g., base-16), which means that it would need more than just a bit of a clean.

Using TRANSLATE() to Clean the Data

Here’s an example that uses TRANSLATE() to replace some of the REPLACE() functions (although we still use one REPLACE() function):

SELECT 
    Id,
    RawAmount,
    CleanAmount = TRY_CAST(
        REPLACE(
            TRANSLATE(RawAmount, '()$, ', '-' + REPLICATE(' ', 4)),
            ' ', ''
        ) AS DECIMAL(18,2)
    )
FROM SalesRaw;

Result:

Id  RawAmount  CleanAmount
-- --------- -----------
1 1,200 1200
2 $450.75 450.75
3 3000 3000
4 12abc34 null
5 (200) -200
6 null null
7 7 500 7500

In this case the code is a lot more concise, however, you’ll need to be careful that the arguments work for your scenario, otherwise you could end up with a mess.

Takeaways

  • It’s good practice to cleanse numeric strings before conversion in order to avoid runtime errors.
  • You can use REPLACE() (and/or TRANSLATE() if you prefer) to strip out known problem characters.
  • Use TRY_CAST() or TRY_CONVERT() to avoid failures when bad data slips through (and set the bad values to null).
  • Watch for accounting-style negatives and other business-specific quirks.

With a reliable cleansing pattern in place, you can safely convert messy text-based numbers into proper numeric data types and move on with analysis or reporting.