Replacing Data – Part 2

I have been recently devoting a-lot of time answering questions on STACKOVERFLOW.

Many times, you are just helping and/or teaching younger SQL Server developers and/or administrators the tricks of the trade. Sometimes you come across a GEM, a question that you can blog about so that when they are asked again, you have a well constructed answer.

One such question was recently asked. How do you replace all the digits in a credit card, expiration date and authorization code string?

In my last article, I created a inline table valued function to solve this problem. However, this was specific to replacing numbers. This solution does not scale very well when we want to replace tens, if not hundreds, of key value pairs.

A table value parameter is the best way to pass the (search string/replacement string) mapping pair to the function. It first starts with defining a table type. The code below creates a table type named [AssociativeArray] in the [AdventureWorks2012] database.

Ultimately, we want to package up the logic into a in-line table valued functions since they perform very well. See Wayne Sheffield’s blog article that attests to this fact.

However, the statement inside the RETURN has to evaluate to a single SELECT statement. Performing self assignment of the input parameter using the REPLACE() function violates this rule. We need this logic since it applies every mapping to the source string.

Therefore, we need to execute a CREATE FUNCTION with two TSQL statements. The return value is a simple character string. Please notice that I did ORDER the associative array BY LEN(KEY) descending, then KEY ascending. We will talk about this design choice later.

Now that we have this new function named [ufn_Super_Replace]. Let’s see how we use it to solve the numbers masking problem. First, define a table variable and load it with the key value mappings. Second, call the function passing the correct arguments. Last but not least, verify the output.

super-replace-output1

It is now time to talk about that ORDER BY design decision. I am assuming that a user of the [usp_Super_Replace] function wants the biggest replacement to WIN when one key is a subset of another.

The example below has a TV show named Maximum Exposure that was popular in the year 2000. The word max is a subset of the word maximum. In this example, the larger word replacement wins.

Let’s do a quick run down of this article. The [usp_Super_Replace] function takes an associative array (table value parameter) and source string as input. It returns a destination string in which key value mappings are replaced in length descending order. In short, this function may be of use if you are masking or cleaning up data in the near future.

Related posts

Leave a Comment