Extracting Chinese UTF-8 Characters to a text file returns "???"

  • Question

  • I have an MS Word macro which searches for italicized words. For each italicized word found, the code captures the word, the current page number and the current Section number into variables.  The variables are concatenated into a "$" delimited string and appended to a text file. The code loops until all of the italicized words are extracted. Once this is complete, an Access program imports the text file as a table.

    This works great for Western character sets, but now I am applying the macro to a Chinese (UTF-8) document, and of course when I open the text file, the variables holding the Chinese words come back as "????".
    This is the VBA code I use to append the concatenated string to the text file:

    Open strPath For Append As #1
    varstrLLT = iCounter & "$" & varSection & "$" & strText & "$" & strPage & "$" & pstrLevel & "$" & strNote
    Print #1, varstrLLT
    Close #1
    
    Any suggestions on how I can maintain the UTF-8 characters? Is there a better way I can do this? I am looking into using ADODB.Stream (for the first time.) Can Word VBA insert the string directly into an Access table?  Thanks in advance!

    Friday, July 13, 2018 8:07 PM

Answers

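  • The "Open", "Print" etc. commands in VBA are quite old and have not been kept up to date in a number of ways. In particular, Print # converts the string to the system's ANSI code page, so any character that code page cannot represent is written as "?".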

    I would guess most people would use the FileSystemObject to do this (on Windows Word, anyway). First use Tools->References to make a reference to Microsoft Scripting Runtime, then use code along the following lines:

    Dim fso As Scripting.FileSystemObject
    Dim ts As Scripting.TextStream
    
    Set fso = New Scripting.FileSystemObject
    ' TristateTrue opens the file as Unicode (UTF-16), not UTF-8
    Set ts = fso.OpenTextFile( _
               FileName:=strPath, _
               IOMode:=IOMode.ForAppending, _
               create:=True, _
               Format:=Tristate.TristateTrue)
    varstrLLT = iCounter & "$" & varSection & "$" & strText & "$" & strPage & "$" & pstrLevel & "$" & strNote
    ' or use ts.Write
    ts.WriteLine varstrLLT
    ts.Close
    Set ts = Nothing
    Set fso = Nothing
    

    Although you can use an ADODB Stream to write a text file (with the Unicode characters intact), I only used it once (rather a long time ago) and cannot remember what problems and limitations I encountered. For example, since you only get to specify a filename when you save the stream, it is possible that there is a buffer size limit. In any case, first make a reference to the appropriate ADODB library (e.g. Microsoft ActiveX Data Objects 6.1 Library), then you can use code along the lines of:

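    Dim ads As ADODB.Stream
    Set ads = New ADODB.Stream
    ' Text stream; the default Charset is "Unicode" (UTF-16), so ask for UTF-8 explicitly
    ads.Type = adTypeText
    ads.Charset = "utf-8"
    ads.Open
    varstrLLT = iCounter & "$" & varSection & "$" & strText & "$" & strPage & "$" & pstrLevel & "$" & strNote
    
    ' The first write starts at the beginning of the stream; later writes append until you close.
    ' adWriteLine adds a line separator after the text (adWriteChar would not).
    ads.WriteText varstrLLT, adWriteLine
    
    ' Save and close when done. adSaveCreateOverWrite replaces any existing file.
    ads.SaveToFile strPath, adSaveCreateOverWrite
    ads.Close
    Set ads = Nothing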
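
    Because SaveToFile with adSaveCreateOverWrite replaces the file, appending to a file that already exists needs a load-then-write step. The following is only a minimal sketch of one way to do that, not tested against your documents. AppendUtf8Line is a hypothetical helper name, and the stream is created late-bound with the ADO constants declared locally so that the sketch stands on its own.

    ```vba
    ' Hypothetical helper: appends one line to a UTF-8 text file via ADODB.Stream.
    ' Late-bound, so no reference to the ADO library is needed for this Sub.
    Sub AppendUtf8Line(ByVal strPath As String, ByVal strLine As String)
        Const adTypeText As Long = 2
        Const adWriteLine As Long = 1
        Const adSaveCreateOverWrite As Long = 2

        Dim ads As Object
        Set ads = CreateObject("ADODB.Stream")
        ads.Type = adTypeText
        ads.Charset = "utf-8"
        ads.Open

        ' If the file already exists, load it and move to the end of the stream
        If Len(Dir$(strPath)) > 0 Then
            ads.LoadFromFile strPath
            ads.Position = ads.Size
        End If

        ' Write the new line (with a line separator) and save the whole stream back
        ads.WriteText strLine, adWriteLine
        ads.SaveToFile strPath, adSaveCreateOverWrite
        ads.Close
        Set ads = Nothing
    End Sub
    ```

    You would call it as AppendUtf8Line strPath, varstrLLT inside the loop. It rereads the whole file on every call, so for a large number of italicized words it is probably better to keep one stream open for the whole loop and save once at the end.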
    

    You can also use ADO or DAO to insert the string directly into a field in a record in an Access table. For ADO, again, make a reference to the appropriate ADODB library, as above. There are then several ways you could add records, e.g. via a Recordset or using SQL commands. The Recordset is probably the simplest, e.g.

    Dim con As ADODB.Connection
    Dim rst As ADODB.Recordset
    
    Set con = New ADODB.Connection
    ' strPath would need to be the full path and name of the database
    con.Open "Provider=Microsoft.ACE.OLEDB.12.0;User ID=Admin;Data Source=" & strPath & ";"
    
    Set rst = New ADODB.Recordset
    Set rst.ActiveConnection = con
    ' Assumes you want to append to a table called "mytable"
    rst.Open "mytable", , , adLockOptimistic
    
    ' Say you have a table with columns for iCounter etc.,
    ' called "Counter", "Section", "Text", "Page", "Level"
    ' and "Note". Then you can do this. I assume the columns
    ' have been created with suitable data types
    rst.AddNew ' start a new record; without this the current record would be edited
    rst("Counter") = iCounter
    rst("Section") = varSection
    rst("Text") = strText
    rst("Page") = strPage
    rst("Level") = pstrLevel
    rst("Note") = strNote
    rst.Update
    
    rst.Close
    Set rst = Nothing
    con.Close
    Set con = Nothing
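
    For the SQL command route mentioned above, a parameterized INSERT is one option. This is a rough sketch rather than tested code: InsertItalicWord is a hypothetical name, only two of the six columns are shown, and the column types (a Long Integer "Counter" and a Short Text "Text" of up to 255 characters) are assumptions about how "mytable" was created. It is late-bound with the ADO constants declared locally so that it stands on its own.

    ```vba
    ' Hypothetical helper: inserts one row into "mytable" with a parameterized INSERT.
    ' Assumes [Counter] is a Long Integer and [Text] is Short Text (max 255 characters).
    Sub InsertItalicWord(ByVal strDbPath As String, ByVal iCounter As Long, ByVal strText As String)
        Const adCmdText As Long = 1
        Const adParamInput As Long = 1
        Const adInteger As Long = 3
        Const adVarWChar As Long = 202
        Const adExecuteNoRecords As Long = 128

        Dim con As Object
        Dim cmd As Object

        Set con = CreateObject("ADODB.Connection")
        ' strDbPath is the full path and name of the database
        con.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & strDbPath & ";"

        Set cmd = CreateObject("ADODB.Command")
        Set cmd.ActiveConnection = con
        cmd.CommandType = adCmdText
        ' Brackets guard against column names that are reserved words
        cmd.CommandText = "INSERT INTO mytable ([Counter], [Text]) VALUES (?, ?)"

        ' Parameters are matched to the ? placeholders by position, not by name
        cmd.Parameters.Append cmd.CreateParameter("pCounter", adInteger, adParamInput, , iCounter)
        cmd.Parameters.Append cmd.CreateParameter("pText", adVarWChar, adParamInput, 255, strText)

        cmd.Execute , , adExecuteNoRecords

        Set cmd = Nothing
        con.Close
        Set con = Nothing
    End Sub
    ```

    Passing the values as parameters avoids having to escape quotes in the extracted text, and adVarWChar keeps the Chinese characters as Unicode on the way into the table. Opening and closing the connection for every row is slow, so in a real loop you would probably open it once outside the loop.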





    Peter Jamieson

    • Proposed as answer by Terry Xu - MSFT Monday, July 16, 2018 12:38 AM
    • Marked as answer by Herr Egg Monday, July 16, 2018 5:32 PM
    Sunday, July 15, 2018 10:40 AM

All replies

  • Thank you. I will give it a try!

    Sunday, July 15, 2018 2:21 PM