How to export Windows Character Map?

  • Question

  • Hi all,

    I have written a program that takes an input file of UTF-8 strings. For each string, it launches the program I am testing, attempts to create a file using the string as its name and, if that succeeds, attempts to open the same file. The purpose is to test my software application and make sure it does not choke on special (foreign-language) characters.

    The program appears to work great.

    Now I would like to generate a robust list of test strings for it to use.

    To this end, I would like to output every character from the Windows Character Map.

    Unfortunately, the Windows Character Map requires you to select each character individually.

    Is there a way to extract the entire character set?

    Thanks,

    Steve

    Monday, July 30, 2018 1:09 PM

All replies

  • I believe I am getting closer in that I found this resource:

    https://www.fileformat.info/info/unicode/utf8test.htm

    However, even this will be tedious to extract into lines of text.

    It would be great to have a list of all UTF-8 character mappings as plain text.

    Steve

    Monday, July 30, 2018 1:23 PM
  • I have discovered this resource:

    https://www.fileformat.info/info/charset/UTF-8/list.htm

    It only lists them a page at a time, so you must click "more..." at the bottom of each page, but it should be relatively easy to strip all of these out into a text file.

    Steve

    Monday, July 30, 2018 2:48 PM
  • This is a little rough and still dumps a few non-display characters (depending on what you use to view the file), but this may get you close enough for your needs:

    Imports System.Globalization
    Imports System.IO
    Imports System.Text

    ' Writes the displayable characters of the Basic Multilingual Plane (U+0001 to U+FFFF) to filePath.
    Public Sub DumpUnicodeCharacters(filePath As String)
        ' FileMode.Create truncates an existing file, so no stale bytes are left behind the new content.
        Using stream = File.Open(filePath, FileMode.Create)
            ' Encoding.Unicode is UTF-16 LE; substitute Encoding.UTF8 if the consumer expects UTF-8.
            Using writer As New StreamWriter(stream, Encoding.Unicode)
                For i As Integer = 1 To UShort.MaxValue
                    ' Convert the code unit directly so that a lone surrogate stays a
                    ' surrogate and is filtered out by the category check below.
                    Dim c = Convert.ToChar(i)

                    Select Case Char.GetUnicodeCategory(c)
                        Case UnicodeCategory.Control,
                             UnicodeCategory.Format,
                             UnicodeCategory.EnclosingMark,
                             UnicodeCategory.LineSeparator,
                             UnicodeCategory.NonSpacingMark,
                             UnicodeCategory.ParagraphSeparator,
                             UnicodeCategory.SpacingCombiningMark,
                             UnicodeCategory.OtherNotAssigned,
                             UnicodeCategory.PrivateUse,
                             UnicodeCategory.Surrogate
                            ' Skip characters that do not display on their own.
                        Case Else
                            writer.Write(c)
                    End Select
                Next
            End Using
        End Using
    End Sub
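
    Since the stated goal is a UTF-8 input file of strings to try as file names, the same category filter could be adapted along these lines. This is only a sketch: the module name FileNameTestStrings, the helpers PrintableFileNameChars and WriteTestStrings, and the charsPerLine parameter are made up for illustration, and it assumes the test program reads one candidate string per line. It also drops whatever Path.GetInvalidFileNameChars reports for the current platform, so that failures point at the application rather than at the file system.

    ```vbnet
    Imports System.Collections.Generic
    Imports System.Globalization
    Imports System.IO
    Imports System.Text

    ' Hypothetical module; the names are illustrative and not part of any library.
    Public Module FileNameTestStrings

        ' Yields BMP characters that display on their own and are legal in file names here.
        Public Iterator Function PrintableFileNameChars() As IEnumerable(Of Char)
            Dim invalid As New HashSet(Of Char)(Path.GetInvalidFileNameChars())

            For codePoint As Integer = 1 To UShort.MaxValue
                Dim c As Char = Convert.ToChar(codePoint)
                If invalid.Contains(c) Then Continue For

                Select Case Char.GetUnicodeCategory(c)
                    Case UnicodeCategory.Control,
                         UnicodeCategory.Format,
                         UnicodeCategory.EnclosingMark,
                         UnicodeCategory.LineSeparator,
                         UnicodeCategory.NonSpacingMark,
                         UnicodeCategory.ParagraphSeparator,
                         UnicodeCategory.SpacingCombiningMark,
                         UnicodeCategory.OtherNotAssigned,
                         UnicodeCategory.PrivateUse,
                         UnicodeCategory.Surrogate
                        ' Same exclusions as DumpUnicodeCharacters: skip these categories.
                    Case Else
                        Yield c
                End Select
            Next
        End Function

        ' Writes the characters as UTF-8 (no byte order mark), charsPerLine characters per line.
        Public Sub WriteTestStrings(filePath As String, charsPerLine As Integer)
            Using writer As New StreamWriter(filePath, False, New UTF8Encoding(False))
                Dim line As New StringBuilder()

                For Each c In PrintableFileNameChars()
                    line.Append(c)
                    If line.Length = charsPerLine Then
                        writer.WriteLine(line.ToString())
                        line.Clear()
                    End If
                Next

                ' Flush the final, shorter line if there is one.
                If line.Length > 0 Then writer.WriteLine(line.ToString())
            End Using
        End Sub
    End Module
    ```

    Calling WriteTestStrings with a path of your choosing and, say, 16 for charsPerLine gives short lines that each make a plausible file name. One caveat: Windows normally strips trailing spaces and periods from file names, so a line that happens to end in one of those may not round-trip even when the application is behaving correctly.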
    


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Monday, July 30, 2018 7:39 PM
    Moderator