Converting string with accented characters to non-accented equivalent

  • Question

  • I want to create a VB.NET function that will take a string containing accented characters and convert it into the non-accented equivalent. My particular requirement is associated with non-English names. Some real examples are Štimac, Hukić, Böttcher, Bjørnbakk, Fürnrohr and Synnevåg. I would want to convert those examples to Stimac, Hukic, Bottcher, Bjornbakk, Furnrohr and Synnevag, respectively.

    I am aware that a ü (with an umlaut accent) is often replaced by 'ue' but replacing it with u is better for my purposes. Similarly, I want to replace ö with o, rather than 'oe'. I am assuming that all the required conversions will need to be hard-coded in the function.

    Are there any slick ways to write such a function, ideally preserving the capitalisation in the original string?

    David

    Thursday, February 9, 2012 1:45 PM

Answers

  • Hi haggis999,

    If you come across any other characters, just add them to the two strings in this code, at matching positions.

    :-)

    Worth reading: the page below lists most of the accented characters, with a note about keyboard shortcuts in WordPad, so I have now edited the code below.  :-D

    http://www.jarte.com/help_new/accent_marks_diacriticals_and_special_characters.html

    The page includes the following:

    • Acute accented characters
    • Grave accented characters
    • Umlaut accented characters
    • Circumflex accented characters
    • Tilde accented characters
    • Cedilla accented characters
    • Ring accented characters
    • Caron accented characters

    Please try this with one Button on a Form:

    Public Class Form1
    
        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    
            Dim inputString As String = "Štimac"
            Dim outputString As String = ""
            outputString = UnAccent(inputString)
            MessageBox.Show(outputString)
    
        End Sub
    
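        Public Function UnAccent(ByVal aString As String) As String
            ' The two strings are aligned position by position: each character in toReplace
            ' is swapped for the character at the same index in replaceChars.
            ' The spaces only separate the accent groups and map to themselves.
            ' ć and Ć are included so that the example name Hukić is converted too.
            Dim toReplace() As Char = "àèìòùÀÈÌÒÙ äëïöüÄËÏÖÜ âêîôûÂÊÎÔÛ áéíóúÁÉÍÓÚðÐýÝ ãñõÃÑÕšŠžŽçÇåÅøØćĆ".ToCharArray
            Dim replaceChars() As Char = "aeiouAEIOU aeiouAEIOU aeiouAEIOU aeiouAEIOUdDyY anoANOsSzZcCaAoOcC".ToCharArray
            For index As Integer = 0 To toReplace.GetUpperBound(0)
                aString = aString.Replace(toReplace(index), replaceChars(index))
            Next
            Return aString
        End Function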
    
    End Class




    Regards,


    • Proposed as answer by chipmunkofdoom2 Thursday, February 9, 2012 8:47 PM
    • Edited by John Anthony Oliver Thursday, February 9, 2012 9:41 PM Added the slashed letter O character Ø in upper and lower case.
    • Marked as answer by haggis999 Friday, February 10, 2012 12:28 AM
    Thursday, February 9, 2012 7:59 PM

All replies

  • There is no built-in syntax that tells the compiler an accented e is equivalent to a plain one; to the compiler they are simply two different Char values. So you will need your own lookup "database" for the comparison. A multi-dimensional array could work for this. Another option is an XML file that maps each accented character to its plain equivalent, but then you also need to consider the file's encoding.

    If a post helps you in any way or solves your particular issue, please remember to use the Propose As Answer option or Vote As Helpful
    ~ "The universe is an intelligence test." - Timothy Leary ~


    • Edited by Troy Garner Thursday, February 9, 2012 2:17 PM
    Thursday, February 9, 2012 2:16 PM
  • I am aware that I could write code as follows, but is there a slicker way to do it?

    'Lower case substitutions
    NewString = Replace(OrigString, "š", "s")
    NewString = Replace(NewString, "ć", "c")
    NewString = Replace(NewString, "ö", "o")
    NewString = Replace(NewString, "ø", "o")
    NewString = Replace(NewString, "ü", "u")
    NewString = Replace(NewString, "å", "a")

    'Upper case substitutions
    NewString = Replace(NewString, "Š", "S")
    NewString = Replace(NewString, "Ć", "C")
    NewString = Replace(NewString, "Ö", "O")
    NewString = Replace(NewString, "Ø", "O")
    NewString = Replace(NewString, "Ü", "U")
    NewString = Replace(NewString, "Å", "A")



    • Edited by haggis999 Thursday, February 9, 2012 3:02 PM
    Thursday, February 9, 2012 2:47 PM
  • Yes, there are better ways; you can do this in almost any number of ways. As I said, if you want to keep it that simple, a two-dimensional array might help you out :)

    Read more here: http://msdn.microsoft.com/en-us/library/02e7z943(v=vs.80).aspx

    Basically, you put each pair from your code above side by side in the two-dimensional array, accented character in one column and its replacement in the other. Then, whenever you find a character that isn't a regular A-Z letter, you look it up and swap it.




    • Edited by Troy Garner Thursday, February 9, 2012 2:54 PM
    Thursday, February 9, 2012 2:52 PM
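
    For anyone who has not used arrays before, here is a minimal sketch of the two-dimensional array idea described above. The module name AccentTable, the function name ReplaceFromTable and the table contents are hypothetical and only cover the substitutions already listed in this thread; the array-literal syntax assumes Visual Basic 2010 or later.

```vbnet
Imports System.Text

Module AccentTable

    ' Hypothetical lookup table: column 0 holds the accented character,
    ' column 1 holds its plain replacement. Extend it as needed.
    Private ReadOnly Map(,) As Char = {
        {"š"c, "s"c}, {"Š"c, "S"c},
        {"ć"c, "c"c}, {"Ć"c, "C"c},
        {"ö"c, "o"c}, {"Ö"c, "O"c},
        {"ø"c, "o"c}, {"Ø"c, "O"c},
        {"ü"c, "u"c}, {"Ü"c, "U"c},
        {"å"c, "a"c}, {"Å"c, "A"c}
    }

    Public Function ReplaceFromTable(ByVal text As String) As String
        Dim result As New StringBuilder(text.Length)
        For Each ch As Char In text
            Dim replacement As Char = ch
            ' Only characters outside plain ASCII need a lookup.
            If AscW(ch) > 127 Then
                For row As Integer = 0 To Map.GetUpperBound(0)
                    If Map(row, 0) = ch Then
                        replacement = Map(row, 1)
                        Exit For
                    End If
                Next
            End If
            result.Append(replacement)
        Next
        Return result.ToString()
    End Function

End Module
```

    Because upper and lower case are separate rows, capitalisation is preserved, and any character not in the table passes through unchanged.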
  • I've never used arrays before (except perhaps in copied code samples) so thanks for the link. I will check it out.

    Are there any clever ways to handle the different case conversions or can that only be done by explicitly defining each substitution in both upper and lower case?

    David

    Thursday, February 9, 2012 3:10 PM
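
    On the question of handling both cases without listing every substitution: one technique not proposed elsewhere in this thread is Unicode normalization. The sketch below assumes it is acceptable to decompose each letter with String.Normalize and then drop the combining marks, which treats upper and lower case alike. The names DiacriticsSketch and RemoveDiacritics are hypothetical. Letters such as ø have no decomposition, so they still need an explicit replacement.

```vbnet
Imports System.Globalization
Imports System.Text

Module DiacriticsSketch

    Public Function RemoveDiacritics(ByVal text As String) As String
        ' FormD splits "Š" into "S" followed by a combining caron, and so on.
        Dim decomposed As String = text.Normalize(NormalizationForm.FormD)
        Dim result As New StringBuilder(decomposed.Length)
        For Each ch As Char In decomposed
            ' Drop the combining marks and keep everything else.
            If CharUnicodeInfo.GetUnicodeCategory(ch) <> UnicodeCategory.NonSpacingMark Then
                result.Append(ch)
            End If
        Next
        ' ø and Ø do not decompose, so they are replaced explicitly.
        ' Other such letters (for example ð) would need the same treatment.
        Return result.ToString().Replace("ø"c, "o"c).Replace("Ø"c, "O"c)
    End Function

End Module
```

    With this approach, Štimac, Hukić, Böttcher, Bjørnbakk, Fürnrohr and Synnevåg should come out as Stimac, Hukic, Bottcher, Bjornbakk, Furnrohr and Synnevag, with the original capitalisation intact.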
  • The framework defines EncoderFallback and EncoderFallbackBuffer classes in System.Text to provide a mechanism for solving this kind of issue.

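    You can create your own EncoderFallback class and associated buffer, then use an instance of the fallback class when encoding from Unicode to ASCII. The fallback is invoked only for characters the target encoding cannot represent, so it applies to ASCII; UTF-8 can encode accented characters directly and would leave them unchanged.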

    Please see the following MSDN Library pages for more detail:
    http://msdn.microsoft.com/en-us/library/system.text.encoderfallback.aspx
    http://msdn.microsoft.com/en-us/library/system.text.encoderfallbackbuffer.aspx


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Thursday, February 9, 2012 4:37 PM
    Moderator
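
    To show how the EncoderFallback classes relate to the problem, here is a rough sketch, not a definitive implementation. The class names UnaccentFallback and UnaccentFallbackBuffer, the helper ToPlainAscii and the short lookup strings are all hypothetical. The idea is that the ASCII encoder calls the fallback for each character it cannot encode, and the fallback hands back a plain letter instead.

```vbnet
Imports System.Text

' Hypothetical fallback: called only for characters the target
' encoding (ASCII here) cannot represent.
Public Class UnaccentFallback
    Inherits EncoderFallback

    Public Overrides ReadOnly Property MaxCharCount() As Integer
        Get
            Return 1 ' each unknown character becomes one replacement
        End Get
    End Property

    Public Overrides Function CreateFallbackBuffer() As EncoderFallbackBuffer
        Return New UnaccentFallbackBuffer()
    End Function
End Class

Public Class UnaccentFallbackBuffer
    Inherits EncoderFallbackBuffer

    ' Aligned lookup strings (illustrative subset, extend as needed).
    Private Const Accented As String = "šŠćĆöÖøØüÜåÅ"
    Private Const Plain As String = "sScCoOoOuUaA"

    Private replacement As Char
    Private charsLeft As Integer = 0
    Private charsGiven As Integer = 0

    Public Overrides Function Fallback(ByVal charUnknown As Char, ByVal index As Integer) As Boolean
        Dim position As Integer = Accented.IndexOf(charUnknown)
        ' Unlisted characters become "?" rather than being dropped.
        replacement = If(position >= 0, Plain.Chars(position), "?"c)
        charsLeft = 1
        charsGiven = 0
        Return True
    End Function

    Public Overrides Function Fallback(ByVal charUnknownHigh As Char, ByVal charUnknownLow As Char, ByVal index As Integer) As Boolean
        ' Surrogate pairs are outside the scope of this sketch.
        replacement = "?"c
        charsLeft = 1
        charsGiven = 0
        Return True
    End Function

    Public Overrides Function GetNextChar() As Char
        If charsLeft = 0 Then Return ChrW(0) ' nothing left to hand out
        charsLeft -= 1
        charsGiven += 1
        Return replacement
    End Function

    Public Overrides Function MovePrevious() As Boolean
        ' Step back only if a replacement has already been handed out.
        If charsGiven = 0 Then Return False
        charsGiven -= 1
        charsLeft += 1
        Return True
    End Function

    Public Overrides ReadOnly Property Remaining() As Integer
        Get
            Return charsLeft
        End Get
    End Property
End Class

Module FallbackDemo
    ' Encodes to ASCII with the custom fallback, then decodes back to a String.
    Public Function ToPlainAscii(ByVal text As String) As String
        Dim ascii As Encoding = Encoding.GetEncoding("us-ascii", New UnaccentFallback(), DecoderFallback.ReplacementFallback)
        Return ascii.GetString(ascii.GetBytes(text))
    End Function
End Module
```

    This is noticeably more code than a simple lookup, so it is probably only worth considering when the text is being converted to ASCII bytes anyway.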
  • Hi Reed,
    Thanks for the contribution but as my current VB.NET skills are fairly basic, your reply went right over my head. I've checked your links but they are designed for someone who already knows the purpose of the EncoderFallback and EncoderFallbackBuffer classes. I've no idea how these classes relate to my stated problem.

    David

    Thursday, February 9, 2012 6:59 PM
  • John's post is probably the best solution. It's a good blend of being an elegant solution and being understandable.
    Thursday, February 9, 2012 8:48 PM
  • Hi John,
    That is just the sort of compact, elegant and clear solution I was hoping for. I have just tried it out on a test ASP.NET web page and it works perfectly.

    Many thanks for advancing my VB.NET skills ;=)

    David

     
    Thursday, February 9, 2012 8:56 PM
  • "John's post is probably the best solution. It's a good blend of being an elegant solution and being understandable."

     Hi chipmunkofdoom2,

     Thank you for your vote, for proposing my previous post as the answer, and for your kind words.  :-D




    Regards,


    Thursday, February 9, 2012 8:59 PM
  • Hi John,
    That is just the sort of compact, elegant and clear solution I was hoping for. I have just tried it out on a test ASP.NET web page and it works perfectly.

    Many thanks for advancing my VB.NET skills ;=)

    David

     

    Hi David,

    You are welcome.  :)

    If it answers your question, please mark my post with the code as the answer by clicking Mark As Answer in my first post here  :-)




    Regards,


    Thursday, February 9, 2012 9:09 PM
  • Hi John,
    Oops. I forgot about the 'Mark As Answer' link and clicked the 'Vote As Helpful' link instead. You now have both.

    Thanks again!

    David

    Friday, February 10, 2012 12:31 AM