none
Translate (Convert) Words to Numbers

    Question

  • What is the best way to translate English number words into numbers? Project calls for T-SQL only solution. Supporting scripts/links below. Thanks.

    -- Number dictionary to be used in translation
    DECLARE @NumberWord TABLE (Word varchar(24), Number bigint)
    INSERT @NumberWord VALUES
    ('ZERO',	0)
    ,('ONE',	1)
    ,('TWO',	2)
    ,('THREE',	3)
    ,('FOUR',	4)
    ,('FIVE',	5)
    ,('SIX',	6)
    ,('SEVEN',	7)
    ,('EIGHT',	8)
    ,('NINE',	9)
    ,('TEN',	10)
    ,('ELEVEN',	11)
    ,('TWELVE',	12)
    ,('THIRTEEN',	13)
    ,('FOURTEEN',	14)
    ,('FIFTEEN',	15)
    ,('SIXTEEN',	16)
    ,('SEVENTEEN',	17)
    ,('EIGHTEEN',	18)
    ,('NINETEEN',	19)
    ,('TWENTY',		20)
    ,('THIRTY',		30)
    ,('FORTY',		40)
    ,('FIFTY',		50)
    ,('SIXTY',		60)
    ,('SEVENTY',	70)
    ,('EIGHTY',		80)
    ,('NINETY',		90)
    ,('HUNDRED',	100)
    ,('THOUSAND',	1000)
    ,('MILLION',  1000000)
    ,('BILLION',  1000000000)
    ,('TRILLION',  1000000000000)
    ,('QUADRILLION',1000000000000000)
    SELECT * FROM @NumberWord
    
    -- Translate the following words to numbers
    DECLARE @EnglishNumber TABLE (Words varchar( 512))
    INSERT @EnglishNumber VALUES
    ('Nine Hundred Ninety-Nine')
    ,('One Thousand One Hundred Eleven')
    ,('Eleven Hundred Eleven')
    ,('Eight Thousand Eight Hundred Eighty-Eight')
    ,('Eighty-Eight Hundred Eighty-Eight')
    ,('Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven')
    ,('Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine')
    SELECT * FROM @EnglishNumber
    
    

    Number to word translator UDF: http://www.sqlusa.com/bestpractices2008/number-to-words/


    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012
    Monday, February 14, 2011 10:50 PM
    Moderator

Answers

  • I forgot to stress in the question post that I seek a solution based on the @NumberWord  table , like a child would parse a word number.
    Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM

    If you wanted a solution based on the @NumberWord table, then

    Create Function dbo.EnglishToNumber (@Word varchar(1000)) Returns bigint As
    Begin
    	DECLARE @NumberWord TABLE (Word varchar(24), Number bigint)
    	INSERT @NumberWord VALUES
    	('ZERO',	0)
    	,('ONE',	1)
    	,('TWO',	2)
    	,('THREE',	3)
    	,('FOUR',	4)
    	,('FIVE',	5)
    	,('SIX',	6)
    	,('SEVEN',	7)
    	,('EIGHT',	8)
    	,('NINE',	9)
    	,('TEN',	10)
    	,('ELEVEN',	11)
    	,('TWELVE',	12)
    	,('THIRTEEN',	13)
    	,('FOURTEEN',	14)
    	,('FIFTEEN',	15)
    	,('SIXTEEN',	16)
    	,('SEVENTEEN',	17)
    	,('EIGHTEEN',	18)
    	,('NINETEEN',	19)
    	,('TWENTY',		20)
    	,('THIRTY',		30)
    	,('FORTY',		40)
    	,('FIFTY',		50)
    	,('SIXTY',		60)
    	,('SEVENTY',	70)
    	,('EIGHTY',		80)
    	,('NINETY',		90)
    	,('HUNDRED',	100)
    	,('THOUSAND',	1000)
    	,('MILLION', 1000000)
    	,('BILLION', 1000000000)
    	,('TRILLION', 1000000000000)
    	,('QUADRILLION',1000000000000000)
    	Declare @ResolvedValue bigint,
    	 @CurrentGroupValue bigint,
    	 @CurrentValue bigint,
    	 @CurrentWord varchar(512),
    	 @RemainingWord char(512);
    	 
    	Set @ResolvedValue = 0;
    	Set @CurrentGroupValue = 0;
    	Set @RemainingWord = Replace(@Word, '-', ' ');
    	While Len(@RemainingWord) > 0
    	Begin
    		Set @CurrentWord = Left(@RemainingWord, CharIndex(' ', @RemainingWord));
    		Select @CurrentValue = Number From @NumberWord Where Word = @CurrentWord;
    		If @CurrentValue >= 1000
    		Begin
    		 Set @ResolvedValue = @ResolvedValue + @CurrentGroupValue * @CurrentValue;
    		 Set @CurrentGroupValue = 0
    		End
    		Else If @CurrentValue >= 100
    		Begin
    		 Set @CurrentGroupValue = @CurrentGroupValue * @CurrentValue;
    		End
    		Else
    		Begin
    		 Set @CurrentGroupValue = @CurrentGroupValue + @CurrentValue;
    		End
    		Set @RemainingWord = SubString(@RemainingWord, Len(@CurrentWord) + 2, Len(@RemainingWord));
    	End
    	Return @ResolvedValue + @CurrentGroupValue;
    End
    go
    
    
    -- Translate the following words to numbers
    DECLARE @EnglishNumber TABLE (Words varchar( 512))
    INSERT @EnglishNumber VALUES
    ('Nine Hundred Ninety-Nine')
    ,('One Thousand One Hundred Eleven')
    ,('Eleven Hundred Eleven')
    ,('Eight Thousand Eight Hundred Eighty-Eight')
    ,('Eighty-Eight Hundred Eighty-Eight')
    ,('Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven')
    ,('Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine')
    --SELECT * FROM @EnglishNumber
    
    SELECT *, dbo.EnglishToNumber(Words) FROM @EnglishNumber
    
    go
    
    Drop Function dbo.EnglishToNumber
    
    Tom

    Wednesday, February 16, 2011 3:52 AM

All replies

  • Why do you think T-SQL is the best way for artificial intelligence?

    I do not think that.  It is just T-SQL I know and love because it can deal with sets of all different kinds and sizes with ease.


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM
    Tuesday, February 15, 2011 12:37 AM
    Moderator
  • Could be done as RPN logic and using a stack.

    eight hundred thousand translates as

    8 <- on stack
    hundred <- multiply
    800 <- on stack
    thousand <- multiply
    800000 <- on stack


    Nine-teen hundred sixty nine translates as

    19 <- on stack
    hundred <- multiply
    1900 <- on stack
    sixty <- add
    1960 <- on stack
    nine <- add
    1969 <- on stack

    All you have do to with @NumberWord above is to type the values to either ADD or MULT.
    Tuesday, February 15, 2011 7:38 PM
  • I saw similar task at TSQL Challenge 45 at:

    http://beyondrelational.com/blogs/tc/archive/2010/12/13/tsql-challenge-45-evaluate-mathematical-expressions-presented-using-roman-numerals.aspx

    It is about evaluating mathematical expressions using Roman Numerals. I believe the princip is the same as your request. At the link bellow are winners with their solutions, so you can study how others solve this via T-SQL:

    http://beyondrelational.com/tc/challenge45/Winners/default.aspx

    Tuesday, February 15, 2011 7:41 PM
  • A while ago I had a similar problem, so at that time I studied these links - may be they can be of some help

    http://en.wikipedia.org/wiki/Reverse_Polish_notation

    http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node74.html

    http://www-128.ibm.com/developerworks/java/library/j-w3eval/j-w3eval.html

    UniversalThread.com thread is <B>Re: Expression parser</B> Thread #1271086

    (but we discussed this problem for Visual FoxPro, so it may be not very suitable here).


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Tuesday, February 15, 2011 8:05 PM
    Moderator
  • One way is to create a Numbers table with a column for the English word for the numbers from 0 thru 999.  You could do this using standard techniques for generating a numbers table and then loading the English word using the function in the link in the original post.  So the table looks like

    Create Table NumbersWithWords(Number int, Cardinal varchar(1000));
    

    Load the numbers from 0 to 1000, then create the English words:

    Update NumbersWithWords Set Cardinal = dbo.fnNumberToWords(Number);
    Update NumbersWithWords Set Cardinal = 'Zero' Where Number = 0;
    

    (The special case for zero is because the function as written returns the empty string for 0 and we want it to be 'Zero'.)  Now write a function that chops the passed English words into the various parts and computes the result.  Something like the following.  Note that this function only handles numbers less than 1 billion.  Adding the capability to handle billions, trillions, etc is left as an exercise for the reader.

     

    Create Function dbo.EnglishToNumber (@Word varchar(1000)) Returns bigint As
    Begin
     Declare @RemainingWord varchar(1000),
      @IndexValue int,
      @Hundreds varchar(100),
      @Thousands varchar(100),
      @Millions varchar(100);
     Set @RemainingWord = @Word;
     
     -- Get the millions part
     Set @IndexValue = CharIndex('Million', @Word);
     If @IndexValue > 0
     Begin
      Set @Millions = Left(@RemainingWord, @IndexValue - 1);
      Set @RemainingWord = SubString(@RemainingWord, @IndexValue + 8, Len(@RemainingWord))
     End
    
     -- Get the thousands part
     Set @IndexValue = CharIndex('Thousand', @RemainingWord);
     If @IndexValue > 0
     Begin
      Set @Thousands = Left(@RemainingWord, @IndexValue - 1);
      Set @RemainingWord = SubString(@RemainingWord, @IndexValue + 9, Len(@RemainingWord))
     End
    
     -- Get the hundreds part
     -- We need this for the special case of numbers like eighty-eight hundred
     Set @IndexValue = CharIndex('Hundred', @RemainingWord);
     If @IndexValue > 0
     Begin
      Set @Hundreds = Left(@RemainingWord, @IndexValue - 1);
      Set @RemainingWord = SubString(@RemainingWord, @IndexValue + 8, Len(@RemainingWord))
     End
    
     -- What is left is the less than 1000 part, so return the result
     Return Coalesce(1000000 * (Select Number From NumbersWithWords Where @Millions = Cardinal), 0)
      + Coalesce(1000 * (Select Number From NumbersWithWords Where @Thousands = Cardinal), 0)
      + Coalesce(100 * (Select Number From NumbersWithWords Where @Hundreds = Cardinal), 0)
      + (Select Number From NumbersWithWords Where @RemainingWord = Cardinal);
    End
    go
    
    Tom

    Tuesday, February 15, 2011 8:11 PM
  • One way is to create a Numbers table with a column for the English word for the numbers from 0 thru 999.  Tom

     


    Tom - Isn't the @NumberWord table (posted in question) sufficient for the conversion? How would a third grader do it? Thanks.
    Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM
    Tuesday, February 15, 2011 8:29 PM
    Moderator

  • Tom - Isn't the @NumberWord table (posted in question) sufficient for the conversion? How would a third grader do it? Thanks.
    Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM

    Yes you could.  But it would make the function more complex.  If you had a number like "twenty-three thousand two hundred fourteen", instead of just finding the "thousand", grabbing the "twenty-three" and looking up "twenty-three" and multiplying the value by 1000, the function would have to first split the "twenty-three" into two separate strings, then look up the values.

    So the trade-off of having a look-up table with fewer entries is the function must do more string manipulation.  Since my table only needs 1,000 rows, it would seem to me to be a better choice since T-SQL is far better at table look-ups than it is at string manipulation.

    Indeed, if I had to do this for some reasonably small number of numbers, say the numbers from 1 to 1,000,000, I would just do this with a look-up table with every possible number text from 1 to 1,000,000 (some numbers would get two rows because 8,888 can be either "eighty-eight hundred eighty-eight" or "eight thousane eight hundred eighty-eight", but still the table would only take a few 10's of megabytes and the answer could be directly looked up with no processing, so it would (IMO) be worth it.

    Tom

    Tuesday, February 15, 2011 10:15 PM
  • Thanks Tom. I understand your point, however, performance is not top priority in this case, and I am looking for various logic algorithms.

    I forgot to stress in the question post that I seek a solution based on the @NumberWord  table, like a child would parse a word number.

     

     


    Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM
    Tuesday, February 15, 2011 10:24 PM
    Moderator
  • I forgot to stress in the question post that I seek a solution based on the @NumberWord  table , like a child would parse a word number.
    Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM

    If you wanted a solution based on the @NumberWord table, then

    Create Function dbo.EnglishToNumber (@Word varchar(1000)) Returns bigint As
    Begin
    	DECLARE @NumberWord TABLE (Word varchar(24), Number bigint)
    	INSERT @NumberWord VALUES
    	('ZERO',	0)
    	,('ONE',	1)
    	,('TWO',	2)
    	,('THREE',	3)
    	,('FOUR',	4)
    	,('FIVE',	5)
    	,('SIX',	6)
    	,('SEVEN',	7)
    	,('EIGHT',	8)
    	,('NINE',	9)
    	,('TEN',	10)
    	,('ELEVEN',	11)
    	,('TWELVE',	12)
    	,('THIRTEEN',	13)
    	,('FOURTEEN',	14)
    	,('FIFTEEN',	15)
    	,('SIXTEEN',	16)
    	,('SEVENTEEN',	17)
    	,('EIGHTEEN',	18)
    	,('NINETEEN',	19)
    	,('TWENTY',		20)
    	,('THIRTY',		30)
    	,('FORTY',		40)
    	,('FIFTY',		50)
    	,('SIXTY',		60)
    	,('SEVENTY',	70)
    	,('EIGHTY',		80)
    	,('NINETY',		90)
    	,('HUNDRED',	100)
    	,('THOUSAND',	1000)
    	,('MILLION', 1000000)
    	,('BILLION', 1000000000)
    	,('TRILLION', 1000000000000)
    	,('QUADRILLION',1000000000000000)
    	Declare @ResolvedValue bigint,
    	 @CurrentGroupValue bigint,
    	 @CurrentValue bigint,
    	 @CurrentWord varchar(512),
    	 @RemainingWord char(512);
    	 
    	Set @ResolvedValue = 0;
    	Set @CurrentGroupValue = 0;
    	Set @RemainingWord = Replace(@Word, '-', ' ');
    	While Len(@RemainingWord) > 0
    	Begin
    		Set @CurrentWord = Left(@RemainingWord, CharIndex(' ', @RemainingWord));
    		Select @CurrentValue = Number From @NumberWord Where Word = @CurrentWord;
    		If @CurrentValue >= 1000
    		Begin
    		 Set @ResolvedValue = @ResolvedValue + @CurrentGroupValue * @CurrentValue;
    		 Set @CurrentGroupValue = 0
    		End
    		Else If @CurrentValue >= 100
    		Begin
    		 Set @CurrentGroupValue = @CurrentGroupValue * @CurrentValue;
    		End
    		Else
    		Begin
    		 Set @CurrentGroupValue = @CurrentGroupValue + @CurrentValue;
    		End
    		Set @RemainingWord = SubString(@RemainingWord, Len(@CurrentWord) + 2, Len(@RemainingWord));
    	End
    	Return @ResolvedValue + @CurrentGroupValue;
    End
    go
    
    
    -- Translate the following words to numbers
    DECLARE @EnglishNumber TABLE (Words varchar( 512))
    INSERT @EnglishNumber VALUES
    ('Nine Hundred Ninety-Nine')
    ,('One Thousand One Hundred Eleven')
    ,('Eleven Hundred Eleven')
    ,('Eight Thousand Eight Hundred Eighty-Eight')
    ,('Eighty-Eight Hundred Eighty-Eight')
    ,('Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven')
    ,('Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine')
    --SELECT * FROM @EnglishNumber
    
    SELECT *, dbo.EnglishToNumber(Words) FROM @EnglishNumber
    
    go
    
    Drop Function dbo.EnglishToNumber
    
    Tom

    Wednesday, February 16, 2011 3:52 AM
  • Thanks for all the contributions. Special thanks to Tom Cooper.
    Kalman Toth, Admin/Prog, SSAS, SSIS, SSRS; SQL 2008 GRAND SLAM
    Wednesday, February 16, 2011 3:14 PM
    Moderator
  • Need more tweak.

    ,('Two thousand hundred eleven')

    Wednesday, February 16, 2011 4:43 PM
  • This is not correct number in English. The function probably does not filter invalid input.
    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Wednesday, February 16, 2011 4:51 PM
    Moderator