Padding with Zeroes in Sql Server

The problem: Take an integer column with a value like 123 and output it as four digits, prefixing with zero if necessary, i.e. 123 gets output as 0123. You get situations like this if, for example, you store e.g. the last four digits of a card number as an integer-type, and then try and output it. As some cards will end 0xxx this means you have to be prepared to recreate the leading zeroes when displaying the value later. Here are two approaches to solve this problem.


Convert to STR, Replace Spaces

This method basically takes the number, converts it to a fixed-length character string, and then replaces the spaces with zeroes:

REPLACE(STR(CardPostfix, 4), ' ', '0')

where 4 is the desired length of the field.

Convert to VARCHAR (twice) – then prefix with Zeroes

Here we go – we convert the number to a string of maximum length, then do the same again to find its length (I do not know if Sql Server is clever enough to optimise that and do it just once):

REPLICATE('0', 4 - LEN(CONVERT(varchar(4), CardPostfix)))
    + CONVERT(varchar(4), CardPostfix)

Comparison


CREATE TABLE #temp (Name VARCHAR(30) PRIMARY KEY NOT NULL,
					CardPostfix INT)

INSERT INTO #temp
      SELECT 'Normal Length', 1234
UNION SELECT 'Short 1', 123
UNION SELECT 'Short 2', 12
UNION SELECT 'Null Value', NULL
UNION SELECT 'Long 1 (Bad data?)', 12345
UNION SELECT 'Long 2 (Bad data?)', 345678901

DECLARE @colLength INT
SET @colLength = 4;

SELECT	Name,
        CardPostfix,
        -- STR converts to a right-aligned string of the specified length, then replaces spaces with zeroes
        REPLACE(STR(CardPostfix, @colLength), ' ', '0') AS [ReplaceSpaces],
        -- Calculates the length of the string, and for each digit 'short', prefix with zeroes to the correct length
        REPLICATE('0', @colLength - LEN(CONVERT(varchar(8), CardPostfix))) + CONVERT(varchar(8), CardPostfix) AS [PrefixWithZeroes],
        REPLICATE('0', @colLength - LEN(CONVERT(varchar(4), CardPostfix))) + CONVERT(varchar(4), CardPostfix) AS [PrefixWithZeroes1]
FROM #temp

And here is the output, reformatted a little to improve readability:

Name                 CardPostfix ReplaceSpaces PrefixWithZeroes PrefixWithZeroes1
-------------------- ----------- ------------- ---------------- -----------------
Normal Length               1234 1234          1234             1234      
Null Value                  NULL NULL          NULL             NULL
Short 1                      123 0123          0123             0123      
Short 2                       12 0012          0012             0012      
Long 1 (Bad data?)         12345 ****          NULL             000*      
Long 2 (Bad data?)     345678901 ****          000*             000*      

Consider first the behaviour for Normal Length data, Null Values and Short values: all of our techniques have come up with the expected outputs: 0123 for a four-digit output of 123, and 0012 for a four-digit output of 12.

But what about if we have incorrectly – or unexpectedly – long data? Firstly, we can see that the ‘Replace Spaces’ algorithm has output ‘****’ in both cases. I like this: it draws attention to the idea that the data does not fit, and so instead of displaying partial data, or breaking the instruction to output a particular length of data, it outputs a ‘warning’ with the asterisks.

In contrast, the ‘Prefix with Zeroes’ algorithm has null or ’000*’, dependent on the length of the data in comparison to the intermediate type we cast the expression to (the VARCHAR(4) or VARCHAR(8))… and now perhaps you can see why I have output two slightly differing algorithms; first we can not do this: VARCHAR(@colLength) so we have to state a column length – twice – and then it seems we have to push the number high ‘just to be sure’ that we will not have inconsistent outputs based on our data being too long. Indeed, one sample I found when researching this post used VARCHAR(8000). In practical terms, this will not matter, but still I prefer the handling and behaviour of the replace-spaces approach!

In Conclusion

To my surprise, although I sort of think the REPLACE mechanism cheats a little, the SQL is so much more succinct than the alternative, and can be converted more readily to a generic function than the alternative. Additionally, its handling of longer-than-expected values is more satisfying to my eye.

We should not forget, though, that in this particular example, it may have been more appropriate to have a different data-type to store the last four digits. Why? Because the data ceases to be numeric; I would never want to add this to another number, or consider the average of the column values… had we stored this as a CHAR(4) we would not have taken any additional space in the DB but we have used a type that is more agreeable to the sort of operations we would expect to carry out on it: with integer type columns we might save a byte or two, but we may have to write SQL like this:

REPLACE(STR(CardPrefix, @colLength), ' ', '0') + '******' + REPLACE(STR(CardPostfix, @colLength), ' ', '0')

whereas, had the values been stored as CHAR then we could do

CardPrefix + '******' + CardPostfix

which is far easier to understand!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>