Translate (Convert) Words to Numbers
-
Monday, February 14, 2011 10:50 PMModerator
What is the best way to translate English number words into numbers? Project calls for T-SQL only solution. Supporting scripts/links below. Thanks.
-- Number dictionary to be used in translation DECLARE @NumberWord TABLE (Word varchar(24), Number bigint) INSERT @NumberWord VALUES ('ZERO', 0) ,('ONE', 1) ,('TWO', 2) ,('THREE', 3) ,('FOUR', 4) ,('FIVE', 5) ,('SIX', 6) ,('SEVEN', 7) ,('EIGHT', 8) ,('NINE', 9) ,('TEN', 10) ,('ELEVEN', 11) ,('TWELVE', 12) ,('THIRTEEN', 13) ,('FOURTEEN', 14) ,('FIFTEEN', 15) ,('SIXTEEN', 16) ,('SEVENTEEN', 17) ,('EIGHTEEN', 18) ,('NINETEEN', 19) ,('TWENTY', 20) ,('THIRTY', 30) ,('FORTY', 40) ,('FIFTY', 50) ,('SIXTY', 60) ,('SEVENTY', 70) ,('EIGHTY', 80) ,('NINETY', 90) ,('HUNDRED', 100) ,('THOUSAND', 1000) ,('MILLION', 1000000) ,('BILLION', 1000000000) ,('TRILLION', 1000000000000) ,('QUADRILLION',1000000000000000) SELECT * FROM @NumberWord -- Translate the following words to numbers DECLARE @EnglishNumber TABLE (Words varchar( 512)) INSERT @EnglishNumber VALUES ('Nine Hundred Ninety-Nine') ,('One Thousand One Hundred Eleven') ,('Eleven Hundred Eleven') ,('Eight Thousand Eight Hundred Eighty-Eight') ,('Eighty-Eight Hundred Eighty-Eight') ,('Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven') ,('Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine') SELECT * FROM @EnglishNumber
Number to word translator UDF: http://www.sqlusa.com/bestpractices2008/number-to-words/
Kalman Toth SQL SERVER 2012 & BI TRAINING
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012- Edited by Kalman TothMicrosoft Community Contributor, Moderator Monday, October 01, 2012 6:25 PM
All Replies
-
Tuesday, February 15, 2011 12:37 AMModerator
Why do you think T-SQL is the best way for artificial intelligence?
I do not think that. It is just T-SQL I know and love because it can deal with sets of all different kinds and sizes with ease.
Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM -
Tuesday, February 15, 2011 7:38 PM
Could be done as RPN logic and using a stack.
eight hundred thousand translates as
8 <- on stack
hundred <- multiply
800 <- on stack
thousand <- multiply
800000 <- on stack
Nine-teen hundred sixty nine translates as
19 <- on stack
hundred <- multiply
1900 <- on stack
sixty <- add
1960 <- on stack
nine <- add
1969 <- on stack
All you have do to with @NumberWord above is to type the values to either ADD or MULT.- Proposed As Answer by Naomi NMicrosoft Community Contributor, Moderator Tuesday, February 15, 2011 8:01 PM
- Unproposed As Answer by Kalman TothMicrosoft Community Contributor, Moderator Tuesday, February 15, 2011 8:13 PM
- Proposed As Answer by Naomi NMicrosoft Community Contributor, Moderator Tuesday, February 15, 2011 8:16 PM
- Unproposed As Answer by Kalman TothMicrosoft Community Contributor, Moderator Tuesday, February 15, 2011 8:17 PM
-
Tuesday, February 15, 2011 7:41 PM
I saw similar task at TSQL Challenge 45 at:
It is about evaluating mathematical expressions using Roman Numerals. I believe the princip is the same as your request. At the link bellow are winners with their solutions, so you can study how others solve this via T-SQL:
http://beyondrelational.com/tc/challenge45/Winners/default.aspx
- Proposed As Answer by Naomi NMicrosoft Community Contributor, Moderator Tuesday, February 15, 2011 8:02 PM
- Unproposed As Answer by Kalman TothMicrosoft Community Contributor, Moderator Tuesday, February 15, 2011 8:16 PM
-
Tuesday, February 15, 2011 8:05 PMModerator
A while ago I had a similar problem, so at that time I studied these links - may be they can be of some help
http://en.wikipedia.org/wiki/Reverse_Polish_notation
http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node74.html
http://www-128.ibm.com/developerworks/java/library/j-w3eval/j-w3eval.htmlUniversalThread.com thread is <B>Re: Expression parser</B> Thread #1271086
(but we discussed this problem for Visual FoxPro, so it may be not very suitable here).
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blog -
Tuesday, February 15, 2011 8:11 PM
One way is to create a Numbers table with a column for the English word for the numbers from 0 thru 999. You could do this using standard techniques for generating a numbers table and then loading the English word using the function in the link in the original post. So the table looks like
Create Table NumbersWithWords(Number int, Cardinal varchar(1000));
Load the numbers from 0 to 1000, then create the English words:
Update NumbersWithWords Set Cardinal = dbo.fnNumberToWords(Number); Update NumbersWithWords Set Cardinal = 'Zero' Where Number = 0;
(The special case for zero is because the function as written returns the empty string for 0 and we want it to be 'Zero'.) Now write a function that chops the passed English words into the various parts and computes the result. Something like the following. Note that this function only handles numbers less than 1 billion. Adding the capability to handle billions, trillions, etc is left as an exercise for the reader.
TomCreate Function dbo.EnglishToNumber (@Word varchar(1000)) Returns bigint As Begin Declare @RemainingWord varchar(1000), @IndexValue int, @Hundreds varchar(100), @Thousands varchar(100), @Millions varchar(100); Set @RemainingWord = @Word; -- Get the millions part Set @IndexValue = CharIndex('Million', @Word); If @IndexValue > 0 Begin Set @Millions = Left(@RemainingWord, @IndexValue - 1); Set @RemainingWord = SubString(@RemainingWord, @IndexValue + 8, Len(@RemainingWord)) End -- Get the thousands part Set @IndexValue = CharIndex('Thousand', @RemainingWord); If @IndexValue > 0 Begin Set @Thousands = Left(@RemainingWord, @IndexValue - 1); Set @RemainingWord = SubString(@RemainingWord, @IndexValue + 9, Len(@RemainingWord)) End -- Get the hundreds part -- We need this for the special case of numbers like eighty-eight hundred Set @IndexValue = CharIndex('Hundred', @RemainingWord); If @IndexValue > 0 Begin Set @Hundreds = Left(@RemainingWord, @IndexValue - 1); Set @RemainingWord = SubString(@RemainingWord, @IndexValue + 8, Len(@RemainingWord)) End -- What is left is the less than 1000 part, so return the result Return Coalesce(1000000 * (Select Number From NumbersWithWords Where @Millions = Cardinal), 0) + Coalesce(1000 * (Select Number From NumbersWithWords Where @Thousands = Cardinal), 0) + Coalesce(100 * (Select Number From NumbersWithWords Where @Hundreds = Cardinal), 0) + (Select Number From NumbersWithWords Where @RemainingWord = Cardinal); End go
-
Tuesday, February 15, 2011 8:29 PMModerator
One way is to create a Numbers table with a column for the English word for the numbers from 0 thru 999. Tom
Tom - Isn't the @NumberWord table (posted in question) sufficient for the conversion? How would a third grader do it? Thanks.
Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM -
Tuesday, February 15, 2011 10:15 PM
Tom - Isn't the @NumberWord table (posted in question) sufficient for the conversion? How would a third grader do it? Thanks.
Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAMYes you could. But it would make the function more complex. If you had a number like "twenty-three thousand two hundred fourteen", instead of just finding the "thousand", grabbing the "twenty-three" and looking up "twenty-three" and multiplying the value by 1000, the function would have to first split the "twenty-three" into two separate strings, then look up the values.
So the trade-off of having a look-up table with fewer entries is the function must do more string manipulation. Since my table only needs 1,000 rows, it would seem to me to be a better choice since T-SQL is far better at table look-ups than it is at string manipulation.
Indeed, if I had to do this for some reasonably small number of numbers, say the numbers from 1 to 1,000,000, I would just do this with a look-up table with every possible number text from 1 to 1,000,000 (some numbers would get two rows because 8,888 can be either "eighty-eight hundred eighty-eight" or "eight thousane eight hundred eighty-eight", but still the table would only take a few 10's of megabytes and the answer could be directly looked up with no processing, so it would (IMO) be worth it.
Tom
-
Tuesday, February 15, 2011 10:24 PMModerator
Thanks Tom. I understand your point, however, performance is not top priority in this case, and I am looking for various logic algorithms.
I forgot to stress in the question post that I seek a solution based on the @NumberWord table, like a child would parse a word number.
Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM -
Wednesday, February 16, 2011 3:52 AM
I forgot to stress in the question post that I seek a solution based on the @NumberWord table , like a child would parse a word number.
Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAMIf you wanted a solution based on the @NumberWord table, then
TomCreate Function dbo.EnglishToNumber (@Word varchar(1000)) Returns bigint As Begin DECLARE @NumberWord TABLE (Word varchar(24), Number bigint) INSERT @NumberWord VALUES ('ZERO', 0) ,('ONE', 1) ,('TWO', 2) ,('THREE', 3) ,('FOUR', 4) ,('FIVE', 5) ,('SIX', 6) ,('SEVEN', 7) ,('EIGHT', 8) ,('NINE', 9) ,('TEN', 10) ,('ELEVEN', 11) ,('TWELVE', 12) ,('THIRTEEN', 13) ,('FOURTEEN', 14) ,('FIFTEEN', 15) ,('SIXTEEN', 16) ,('SEVENTEEN', 17) ,('EIGHTEEN', 18) ,('NINETEEN', 19) ,('TWENTY', 20) ,('THIRTY', 30) ,('FORTY', 40) ,('FIFTY', 50) ,('SIXTY', 60) ,('SEVENTY', 70) ,('EIGHTY', 80) ,('NINETY', 90) ,('HUNDRED', 100) ,('THOUSAND', 1000) ,('MILLION', 1000000) ,('BILLION', 1000000000) ,('TRILLION', 1000000000000) ,('QUADRILLION',1000000000000000) Declare @ResolvedValue bigint, @CurrentGroupValue bigint, @CurrentValue bigint, @CurrentWord varchar(512), @RemainingWord char(512); Set @ResolvedValue = 0; Set @CurrentGroupValue = 0; Set @RemainingWord = Replace(@Word, '-', ' '); While Len(@RemainingWord) > 0 Begin Set @CurrentWord = Left(@RemainingWord, CharIndex(' ', @RemainingWord)); Select @CurrentValue = Number From @NumberWord Where Word = @CurrentWord; If @CurrentValue >= 1000 Begin Set @ResolvedValue = @ResolvedValue + @CurrentGroupValue * @CurrentValue; Set @CurrentGroupValue = 0 End Else If @CurrentValue >= 100 Begin Set @CurrentGroupValue = @CurrentGroupValue * @CurrentValue; End Else Begin Set @CurrentGroupValue = @CurrentGroupValue + @CurrentValue; End Set @RemainingWord = SubString(@RemainingWord, Len(@CurrentWord) + 2, Len(@RemainingWord)); End Return @ResolvedValue + @CurrentGroupValue; End go -- Translate the following words to numbers DECLARE @EnglishNumber TABLE (Words varchar( 512)) INSERT @EnglishNumber VALUES ('Nine Hundred Ninety-Nine') ,('One Thousand One Hundred Eleven') ,('Eleven Hundred Eleven') ,('Eight Thousand Eight Hundred Eighty-Eight') ,('Eighty-Eight Hundred Eighty-Eight') ,('Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven') ,('Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine') --SELECT * FROM @EnglishNumber SELECT *, dbo.EnglishToNumber(Words) FROM @EnglishNumber go Drop Function dbo.EnglishToNumber
- Marked As Answer by Kalman TothMicrosoft Community Contributor, Moderator Wednesday, February 16, 2011 3:10 PM
-
Wednesday, February 16, 2011 3:14 PMModeratorThanks for all the contributions. Special thanks to Tom Cooper.
Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM -
Wednesday, February 16, 2011 4:43 PM
Need more tweak.
,('Two thousand hundred eleven')
-
Wednesday, February 16, 2011 4:51 PMModeratorThis is not correct number in English. The function probably does not filter invalid input.
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blog

