Working with messy numeric data is one of those unavoidable realities in database development. Whether you’re importing data from CSV files, web APIs, or legacy systems, you’ll often encounter numeric values stored as strings with all sorts of unwanted characters mixed in. SQL Server’s conversion functions are pretty strict about what they’ll accept, so you need to clean up these strings before attempting any conversions.
The Challenge with Dirty Numeric Data
When SQL Server tries to convert a string to a numeric type using functions like CAST(), CONVERT(), or their TRY_ equivalents TRY_CAST() and TRY_CONVERT(), it expects the string to be in a clean, recognizable format. Throw in some extra spaces, currency symbols, formatting characters, or non-numeric junk, and you’ll either get conversion errors or NULL values where you expected numbers.
Usually the best way around this is to systematically strip out the unwanted characters and normalize the format before attempting the conversion. This approach is typically more reliable than trying to handle every edge case during the conversion itself.
Common Cleaning Options
Most numeric string cleaning involves removing or replacing specific character patterns. Here are some common options:
REPLACE()for removing specific characters like currency symbols, commas, or parenthesesTRANSLATE()to do a similar thing toREPLACE()(although these functions work differently)LTRIM()andRTRIM()for stripping whitespace from the beginning and endSUBSTRING()for extracting portions of strings when the numeric part is embedded in textPATINDEX()for finding positions of patterns, useful for locating where numeric data starts or ends- Regular expressions through CLR functions or newer SQL Server features for complex pattern matching
The trick is to apply these operations in the right order. You typically want to handle the most disruptive characters first, then work your way down to the subtle formatting issues.
Example
In this example we’ll use REPLACE() to clean the numeric strings before conversion. We’ll also look at combining REPLACE() with TRANSLATE() to achieve the same result.
Sample Data
Suppose we have the following table and data:
CREATE TABLE SalesRaw (
Id INT IDENTITY(1,1),
RawAmount VARCHAR(50)
);
INSERT INTO SalesRaw (RawAmount)
VALUES
(' 1,200 '), -- extra spaces and a comma
('$450.75'), -- currency symbol and decimal
('3000'), -- clean number
('12abc34'), -- bad characters inside
('(200)'), -- negative stored in accounting style
(NULL), -- null value
('7 500'); -- embedded space
SELECT * FROM SalesRaw;
Output:
Id RawAmount
-- ---------
1 1,200
2 $450.75
3 3000
4 12abc34
5 (200)
6 null
7 7 500
The RawAmount column is VARCHAR, which is a string type (not a numeric type).
Attempting to Convert the Data Without Cleaning
Now if we try to convert the above column directly to a numeric type, we’ll get errors on most rows:
SELECT
Id,
RawAmount,
CAST(RawAmount AS DECIMAL(18,2)) AS CleanAmount
FROM SalesRaw;
Output:
Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric.
It threw an error at line 1, due to it having a value that couldn’t be converted to our numeric type (in this case DECIMAL(18,2)).
We could avoid the error by replacing CAST() with TRY_CAST(), but that still won’t result in a conversion:
SELECT
Id,
RawAmount,
TRY_CAST(RawAmount AS DECIMAL(18,2)) AS CleanAmount
FROM SalesRaw;
Output:
Id RawAmount CleanAmount
-- --------- -----------
1 1,200 null
2 $450.75 null
3 3000 3000
4 12abc34 null
5 (200) null
6 null null
7 7 500 null
This function returns null whenever a conversion fails, and so most of our rows return null, which means most of the conversions failed.
Using REPLACE() to Clean the Data
In order to get the conversions to succeed, we’ll need to clean the raw data before attempting to convert it. The idea is simple in that we will strip out everything that isn’t part of a valid number. In particular, we’ll remove commas, spaces, and dollar signs. We’ll also handle accounting-style negatives (e.g., (200) becomes -200).
One way to do this in SQL Server is by using a custom REPLACE() chain.
Here’s an approach that handles most of the cases above:
SELECT
Id,
RawAmount,
TRY_CAST(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
RawAmount, '(', '-'
), ')', ''
), '$', ''
), ',', ''
), ' ', ''
) AS DECIMAL(18,2)
) AS CleanAmount
FROM SalesRaw;
Result:
Id RawAmount CleanAmount
-- --------- -----------
1 1,200 1200
2 $450.75 450.75
3 3000 3000
4 12abc34 null
5 (200) -200
6 null null
7 7 500 7500
We can see that most of the rows were converted successfully. Two failed, because our data cleansing process didn’t catch them. This is mainly because one of the raw values was null, which is an unknown value, so we wouldn’t be able to clean it anyway (there’s nothing to clean). The other value contains a bunch of letters in the middle, which suggests that it’s either not a number, or it uses a different base (e.g., base-16), which means that it would need more than just a bit of a clean.
Using TRANSLATE() to Clean the Data
Here’s an example that uses TRANSLATE() to replace some of the REPLACE() functions (although we still use one REPLACE() function):
SELECT
Id,
RawAmount,
CleanAmount = TRY_CAST(
REPLACE(
TRANSLATE(RawAmount, '()$, ', '-' + REPLICATE(' ', 4)),
' ', ''
) AS DECIMAL(18,2)
)
FROM SalesRaw;
Result:
Id RawAmount CleanAmount
-- --------- -----------
1 1,200 1200
2 $450.75 450.75
3 3000 3000
4 12abc34 null
5 (200) -200
6 null null
7 7 500 7500
In this case the code is a lot more concise, however, you’ll need to be careful that the arguments work for your scenario, otherwise you could end up with a mess.
Takeaways
- It’s good practice to cleanse numeric strings before conversion in order to avoid runtime errors.
- You can use
REPLACE()(and/orTRANSLATE()if you prefer) to strip out known problem characters. - Use
TRY_CAST()orTRY_CONVERT()to avoid failures when bad data slips through (and set the bad values tonull). - Watch for accounting-style negatives and other business-specific quirks.
With a reliable cleansing pattern in place, you can safely convert messy text-based numbers into proper numeric data types and move on with analysis or reporting.