String Functions – SOUNDEX()


I am going to continue my series of very short articles or tidbits on Transaction SQL string functions. I will exploring the SOUNDEX() function today.

The SOUNDEX() function calculates a four-character code that is based on how the string sounds when spoken. Please see my prior blog entry for a sample use of this function in conjunction with the DIFFERENCE() function.

Soundex was developed by Robert C. Russell and Margaret K. Odell and patented in 1918 / 1922. The United States government created a variation named the American Soundex which was used in the 1930s for retrospective analysis of the US censuses from 1890 through 1920.

The system became popular when Donald Knuth’s reference it in his book named “The Art of Computer Programming”.

The current American Soundex standard is maintained by the National Archives and Records Administration (NARA).

I will not go into the nitty gritty details of the rules for this system. However, I want to review how the word ‘Mongoose’ recieves a soundex code of M522.

The system disregards the letters A, E, I, O, U, H, W, and Y. A soundex code always starts with the first letter of the word. It is followed by a three digit number. Zeros are added at the end if neccessary.

Therefore, ‘Mongoose’ becomes ‘ngs’. Given the lookup table, the number 522 represents this sequence. Finally, the first letter and number are combined into the code ‘M522’.

In summary, the SOUNDEX() function is a great way to hash words into codes by considering what sounds simular. Next time, I will be talking about the SPACE() function.

Related posts

Leave a Comment