OpenDataSource wdOpenFormatText UTF8 file

  • Question

  • I have been using OpenDataSource via COM for some time with ANSI CSV files

    I set the format to wdOpenFormatText and it opens fine

    However, if I present it with a UTF-8 CSV file with a BOM marker at the front, it opens the conversion wizard, and I don't want that

    I thought if I set the format to wdOpenFormatUnicodeText it would work, but it does not; it still brings up the conversion wizard

    In both cases it has pre-selected UTF-8, so it obviously knows.

    How can I prevent the dialog from appearing when presenting Word with a UTF-8 CSV file and just have it get on with it?

    Thanks

    Wednesday, April 22, 2020 1:15 PM

All replies

  • One of the OpenDataSource Method's parameters is to set ConfirmConversions:=False. Did you try that?
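    For reference, a minimal sketch of such a call (the file path here is illustrative, not from the original post):

```vba
' Sketch: open a delimited text data source with the
' conversion-confirmation prompt suppressed. Path is illustrative.
Sub OpenCsvDataSource()
    ActiveDocument.MailMerge.OpenDataSource _
        Name:="C:\data\recipients.csv", _
        Format:=wdOpenFormatText, _
        ConfirmConversions:=False, _
        ReadOnly:=True, _
        LinkToSource:=False
End Sub
```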

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Wednesday, April 22, 2020 11:26 PM
  • Yes I already have that set to false

    The registry hack mentioned in point 2 here 'works' but it is not a viable solution for commercial software communicating via COM

    https://answers.microsoft.com/en-us/msoffice/forum/all/file-conversion-encoding-default/5ad8234b-284f-46b0-9532-7f2e1f8f80ee

    Thursday, April 23, 2020 7:41 AM
  • A viable alternative might be to edit the file via code and remove the BOM marker before using it as a datasource.

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Thursday, April 23, 2020 8:45 AM
  • But would it treat that file as ANSI or UTF8 ?
    Thursday, April 23, 2020 8:48 AM
  • The problem seems to be that there is a "sticky" flag within Word that remembers whether or not you specified noEncodingDialog in the last Documents.Open call.

    I haven't explored in depth, but if you can

     a. programmatically open a file (preferably something you know you can open without problems, perhaps even the Mailmerge Data Source file that you want to use), using Documents.Open and passing noEncodingDialog:=true
     b. Close the file without saving
     c. then do your OpenDataSource

    then I think you will find that the Encoding dialog does not pop up, and that Word will correctly recognise UTF-8 with BOM and will probably correctly recognise UTF-8 without BOM.

    Of course you shouldn't have to do any of that - you should just be able to specify whatever parameters are needed for Word to recognise the encoding, the delimiters and so on. But there it is.

    FWIW my own experience has generally been that using OpenDataSource programmatically (at least with a text file) never pops up the "Confirm Conversions" dialog, regardless of the setting in Word and the value of the ConfirmConversions parameter. However, the Encoding dialog popping up on its own is something I hadn't personally experienced, at least not that I remember.

    When OpenDataSource opens a text file in this way, it uses Word's internal text file converter, which seems to do a pretty good job of recognising encoding anyway. When you open a text file data source *manually*, what happens depends on whether Confirm Conversions is enabled in the User Interface or not. If it is, you'll probably get the conversion dialog with the OLE DB data source option, which is a lot less likely to read the file correctly. If it isn't enabled, Word opens using its text converter and that generally works a lot better (though if you have large data sources it may also be a lot slower).


    Peter Jamieson

    Thursday, April 23, 2020 5:20 PM
  • Thanks for the detailed reply

    I've been having a go with this without much success

    If I call Documents.Open passing my CSV, visible false, read only true, encoding msoEncodingUTF8 and noEncodingDialog true then it opens my CSV without a pop up

    If I immediately close it again and then call OpenDataSource I still get the popup, regardless of whether I leave the format unset or set it to wdOpenFormatUnicodeText or wdOpenFormatText

    I then had a Ha! moment when I saw that there was a DefaultTextEncoding of the Options object

    So I got it and it said 1252 so I set it to msoEncodingUTF8 (65001) and .... no difference with OpenDataSource

    I tried setting it again at the earliest opportunity after connecting to Word but still no luck

    Interestingly, if I create that hack registry entry then when I get DefaultTextEncoding it is 65001 and not 1252, and OpenDataSource does work without the popup

    So either the existence of that reg entry also sets something else, or it is just not possible to set this early enough to have the desired effect

    Frustrating to say the least



    Friday, April 24, 2020 11:04 AM
  • You could, of course, have your code write the registry entry before opening the file, then remove the entry afterwards. For some VBA code you could adapt, see 'Toggle the SQL security entry in the registry through vba' at: https://www.gmayor.com/word_vba_examples.htm
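    A sketch of that write-then-remove pattern via WScript.Shell. The exact value name and the Office version segment in the path below are assumptions - take the real entry from the linked thread and adjust for your installation:

```vba
' Sketch: temporarily create the registry entry around the merge.
' The value name "DefaultCPG" and the "16.0" version segment are
' assumptions - check the linked thread for the exact entry.
Sub MergeWithTemporaryRegistryEntry()
    Dim sh As Object
    Dim key As String
    Set sh = CreateObject("WScript.Shell")
    key = "HKCU\Software\Microsoft\Office\16.0\Word\Options\DefaultCPG"
    sh.RegWrite key, 65001, "REG_DWORD"
    ' ... OpenDataSource / Execute the merge here ...
    sh.RegDelete key        ' remove the entry afterwards
End Sub
```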

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Friday, April 24, 2020 11:31 AM
  • But would it treat that file as ANSI or UTF8 ?
    Try it and see.

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Friday, April 24, 2020 11:32 AM
  • Well, just in case, here's the code I was using - it's Excel VBA code to simulate controlling Word from outside (it's just a test - I happen to know that Word is open and has a document open). If I use False in the last parameter of the Documents.Open call, I see the encoding dialog twice. If I use True, I don't see it either time.

    But unfortunately that probably just means there is some other factor in play.

    BTW, what does your OpenDataSource call look like, and which versions of Word do you have to support? Does the OpenDataSource call have any parameters other than the file name? There is another way to do this that *might* avoid the dialog box, but it involves two additional external files (a .odc file and a schema.ini file) and it can only work with text files that can be opened using the ACE/Jet OLE DB provider (e.g. there is a limit of 255 columns).

    Finally, yes, the DefaultTextEncoding property looks like it's made for the job but unfortunately only affects the format when Word is saving (to HTML, I think).

    Sub dsconnect()
    ' Excel VBA: drive an already-running Word instance from outside.
    Dim wapp As Word.Application
    Dim wdoc As Word.Document
    Dim wtmp As Word.Document
    Set wapp = GetObject(, "Word.Application")
    Set wdoc = wapp.Documents(1)
    ' 2nd positional argument is ConfirmConversions:=False;
    ' the 15th (the final True) is NoEncodingDialog:=True
    Set wtmp = wapp.Documents.Open("g:\test\utf8nobom.txt", False, , , , , , , , , , , , , True)
    wtmp.Close savechanges:=wdDoNotSaveChanges
    Set wtmp = Nothing
    With wdoc.MailMerge
      .OpenDataSource Name:="g:\test\utf8nobom.dat"
    End With
    Set wdoc = Nothing
    Set wapp = Nothing
    End Sub
    



    Peter Jamieson

    Friday, April 24, 2020 12:11 PM
  • Thanks again

    Currently my OpenDataSource is passed the filename, wdOpenFormatUnicodeText, ConfirmConversions false, ReadOnly true, LinkToSource false

    I have tried different format values including nothing but it does not make a difference

    Friday, April 24, 2020 12:25 PM
  • Yes, I was more concerned that you might have had stuff in there that *might* have told Word to use some other mechanism to open the data source (like a Connection String and a SQL Query).

    However, I am fairly sure that in Word 2016/19, when you pass a .txt file in the Name parameter, Word ignores any connection string and query anyway.

    (The Name parameter is almost certainly enough in this case.)

    Peter Jamieson

    Friday, April 24, 2020 1:05 PM
  • "There is another way to do this that *might* avoid the dialog box but it involves two additional external files ( a .odc file and a schema.ini file) and it can only work with text files that can be opened using the ACE/Jet OLE DB provider (e.g. there is a limit of 255 columns)"

    If I am looking at the right thing this seems to suggest schema.ini only supports ANSI and OEM ?

    https://docs.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-ver15#selecting-a-character-set

    Friday, April 24, 2020 2:15 PM
  • Maybe the only solution is to connect Word to an SQL database. I assume that if I did that, it can handle Unicode data that is in nchar and nvarchar fields ?
    Friday, April 24, 2020 2:18 PM
  • Yes, you could connect via a .odc to SQL Server, for example (you may still have to use an OLE DB provider with SQL Server because Word doesn't "see" Unicode field types when you use ODBC). Or maybe you are thinking of something like MySQL - should still be OK AFAIK.

    Peter Jamieson

    Friday, April 24, 2020 2:30 PM
  • When you connect to a *text file* from Open Data Source what you need for a comma-delimited UTF-8 file is a folder that contains the .txt file and the schema.ini. It can also contain the .odc if that makes sense.

    Let's say your folder is g:\data and your text file is called myfile.txt. Then schema.ini needs an entry like this:

    [myfile.txt]
    ColNameHeader=True 
    Format=Delimited(,) 
    MaxScanRows=25 
    CharacterSet=65001


    (MaxScanRows tells the provider how many rows to look at to decide what data type is in each column. You may have come across the problem with Excel where the provider scans 8 rows and may return the data wrongly with mixed types. You can set it to 0 to get the provider to scan all the rows if necessary, or you can specify all the data types in schema.ini). I think you should be able to use

    Format=CSVDelimited

    if you prefer.

    Your .odc can be a completely empty text file, which is handy because that means it doesn't need any paths hardcoded in it. Let's call it universal.odc and put it in the same folder.

    Your OpenDataSource then needs to be:

    WordDocObject.MailMerge.OpenDataSource _
      Name:="G:\data\universal.odc", _
      Connection:="Provider=Microsoft.ACE.OLEDB.12.0;Data Source=g:\data;Mode=Read;Jet OLEDB:Engine Type=96;", _
      SQLStatement:="SELECT * FROM [myfile.txt]"
    

    (Usually I assume people have the ACE provider now, but on older systems you'd have to use the older Jet one.)

    Anyway, see how that goes...

     

    Peter Jamieson

    Friday, April 24, 2020 2:53 PM
  • Thanks Peter

    I've got to the 'next bit' but when I come to do the merge I get the 

    So it has not picked up the column names from the first row; they are all F* apart from one called T

    Friday, April 24, 2020 3:26 PM
  • My best guess is that your title row has titles with double quotation marks around them.

    You shouldn't need those unless you have some really weird characters in your column names. Spaces in the names shouldn't cause a problem, for example.

    You should be able to have double quotation marks around the data values below the title row if you need them.

    FWIW all my tests have been with a CSV or Tab delimited file with no double quotation marks in the title row.

    Peter Jamieson

    Friday, April 24, 2020 4:04 PM
  • OK, it seems that it can't cope with "" around column names (yes, I know they should not be needed)

    But it does cope with them in the body of the CSV data

    So OpenDataSource now works

    However, I tried putting an é in the text, and both Outlook and my personal webmail, and Word itself if you merge to a new document, are displaying it as Ã©

    So it looks like it read it correctly, as those are the correct code point bytes, but I think Word is seeing it as two bytes and not one character

    My dev machine has Office16 on it

    Friday, April 24, 2020 4:07 PM
  • Thanks looks like our messages crossed mid air !

    So yes it is now reading ok …. just not displaying OK ….

    Friday, April 24, 2020 4:11 PM
  • BTW, if that problem *is* down to double quotation marks, I suppose it's possible that they also had an effect on whether or not Word displays the encoding dialog. I don't think so, but perhaps worth checking.

    Peter Jamieson

    Friday, April 24, 2020 4:12 PM
  • It's all OK here. Can you try copying an

    é

    (e-acute) from the Windows Character map into your text file and trying again?

    Is the character set definitely set to 65001 in the schema.ini, with the schema.ini in the same folder as the text file?

    Peter Jamieson

    Friday, April 24, 2020 4:18 PM
  • [myfile.csv]
    ColNameHeader=True
    Format=CSVDelimited
    MaxScanRows=25
    CharacterSet=65001

    The é in the character map is the ANSI one (E9); the UTF8 one is C3 A9

    If I add that registry hack, Word still displays the UTF8 incorrectly; it makes no difference

    End of day here so speak Monday. 

    Thanks


    • Edited by Minieggs1999 Friday, April 24, 2020 4:35 PM incorrectly
    Friday, April 24, 2020 4:32 PM
  • Hi 

    Just quickly

    Last night I looked at the exact same message on my iPhone that had been sent to my personal account and that displayed the é correctly

    So I would say Word is reading the data correctly and outputting correctly it is just whatever is viewing it, including Word itself is not being told this is UTF8

    I had a 30 second go at changing the Outlook advanced option for encoding of outgoing messages to UTF8 but that has not affected the email created by Word

    I could not see a similar option in Word advanced settings

    Anyway enjoy your weekend and chat next week

    Thanks

    Saturday, April 25, 2020 7:09 AM
  • Ha !

    With the UTF8 setting in Outlook the same message actually displays INCORRECTLY on the iPhone

    So it has an effect ... just not a desired one

    I'm definitely stopping now for the weekend !

    Cheers

    Saturday, April 25, 2020 7:13 AM
  • I was originally going to wait until Monday. The thing about the character map é being "E9" is a bit of a red herring - when you put it in your text file and save as UTF8, you should get the correct UTF8 encoding. If necessary, at some point I'll post my small UTF-8 test file.

    Meanwhile, it looks to me as if the OLE DB provider is not finding the schema.ini. You could check that by editing the schema.ini and changing the delimiter specified in there. If OLE DB *is* seeing the schema.ini, that should make Word interpret the title line wrongly (e.g. you may just see one long column name). If it *isn't* seeing the schema.ini, then OLE DB is assuming csv anyway and you should get the same result as before. My best guess is that if you did something like save schema.ini from Notepad, it may actually have saved as schema.ini.txt.

    Another possibility is that the OpenDataSource code isn't actually opening using OLE DB (as it is here). If it is, after the code runs, if you put ?ActiveDocument.MailMerge.DataSource.Type in the Immediate Window of the VBE, you should see 5 rather than 0. Catch up on Monday...


    Peter Jamieson

    Sunday, April 26, 2020 4:59 PM
  • I can confirm the mailmerge datasource type is 5

    The problem does seem to be the schema.ini file is not being used

    It is definitely called schema.ini and is in the same folder as the csv

    If I change it to TabDelimited or rename it to zschema.ini then it has no effect; the file is still opened and the characters displayed "incorrectly", so the defaults must be CSV with header row

    I tried renaming the csv to txt in all places but that did not appear to have an effect either, just in case it was ignoring schema.ini for the csv extension

    I can also confirm that if I open the SAME CSV directly, without the odc data source and with the first row not having quotes, I still get the encoding dialog; but if I choose UTF8 (which is pre-selected) then it does display the é correctly in the merged document.

    So if I could get the odc version to read my schema.ini it should work

    I can't for the life of me see why it is not


    • Edited by Minieggs1999 Monday, April 27, 2020 10:13 AM added info on direct read of CSV
    Monday, April 27, 2020 9:20 AM
  • I can't spot any errors, but I'm going to put together a small example for you to download to see if that makes any difference. It won't be ready until later today. Meanwhile, perhaps a good time to explore the other options mentioned way back (AFAICR the other simple approach is to use Word to open the text file with a known encoding, then save as a *Word document* where encoding is a non-issue, and use that as the data source.)

    Peter Jamieson

    Monday, April 27, 2020 11:27 AM
  • Thanks Peter I look forward to the sample

    Monday, April 27, 2020 11:31 AM
  • I've put a .zip file at https://1drv.ms/u/s!AgOi7jMKouNNoxqp1CFDC-JlWaog?e=NqAKR7 - unzip it, put the contents in c:\test if possible, read the readme.txt and take it from there.

    Peter Jamieson

    Monday, April 27, 2020 1:21 PM
  • Peter

    Thanks VERY much for this highly detailed example

    Your example appears to work (as you would expect!) and editing the schema.ini file has an effect too so that is obviously being used

    I can't see anything obviously different that your example does other than it is running inside of Word and is not a 3rd party calling it via the COM interface

    Whilst you were creating your excellent sample I set up a test and tried connecting Word directly to an SQL database and put this obscure character ߷ in one of the nchar fields

    http://www.fileformat.info/info/unicode/char/07f7/index.htm

    Word seems to merge it OK and Outlook displays it correctly in the merged message it sent and in my web email and on my iPhone

    So that does seem like a possibility although it is a bit of a sledge hammer to crack a nut and I would like to understand why I can't get the csv/OLE example to work. 

    Do you think there could be a quirk between automating Word from within itself as opposed to connecting to it externally via COM ?


    • Edited by Minieggs1999 Monday, April 27, 2020 2:07 PM schema.ini !
    Monday, April 27, 2020 2:02 PM
  • I'd forgotten about that aspect after the early tests as there was no problem here, but I've uploaded a .xlsm with an adapted version of the same code. It's at https://1drv.ms/x/s!AgOi7jMKouNNoxsXpiAv3oSUOB3h?e=mJHi0h . If you can download it to the same test folder as the other stuff, open it up, edit the Sub in there if necessary, then run it, that should at least show that it's not external COM in general that's creating a problem. You may get the usual problems if you have saved the .docm with a data source so it's probably worth starting Word, opening the .docm, and disconnecting the datasource before running the Excel code.

    What are you actually using to automate Word?

    Peter Jamieson

    Monday, April 27, 2020 3:09 PM
  • Thanks for the new sample

    I managed to download it but when I first clicked on the link it opened it in an online version of Excel

    That seems to work too !

    You almost certainly won't have heard of the language I am using. It's called DataFlex and is by dataaccess.com

    I will look again at my code but it is the same as yours in terms of the operations performed

    I have Office 365 (which I think is 16.0) on my dev machine

    Monday, April 27, 2020 3:25 PM
  • Oh I meant to say is there any way of knowing where it is ACTUALLY looking for the schema.ini ?
    Monday, April 27, 2020 3:27 PM
  • I wonder if Dataflex tries to intercept the Jet/ACE drivers/providers in any way, or sets up its own Jet/ACE settings in the registry. OTTOMH I can't think of any settings that would actually affect this, and since in both my tests it is still the Word object that is actually opening the file, it would have to be something that Dataflex changes while it's running *and* which affects what Word does.

    Re. the "using SQL Server" approach, I'm sure it's workable - you basically need a .odc with the necessary connection parameters, or you may still be able to use an empty .odc + a connect string in the OpenDataSource call as in my existing example. The main problem is usually security - it's very difficult to set that up without leaving a plaintext username/password in the .odc or .docx file, unless you're using Microsoft's Integrated Security (in a domain) or perhaps they have an equivalent for the Azure version now.

    Peter Jamieson

    Monday, April 27, 2020 3:44 PM
  • Thanks again

    I don't think dataflex is doing anything like that it is just making calls via COM

    I think the key to my problem is the schema.ini and more specifically why it is not finding it

    At the moment it in effect does not exist

    I've tried putting it in the same folder as the dataflex application itself but no difference

    **** Massive face palm ****

    My schema.ini is not called schema.ini it is called schema..ini

    Unsurprisingly it now works !

    Thanks for all your extensive help !

    Monday, April 27, 2020 4:07 PM
  • Well spotted! Phew! But since I've rarely had good follow-up in this kind of area, you've pushed me into discovering something which may simplify things anyway, which is that you seem to be able to specify the CharacterSet in the connection string, by adding e.g. this:

    ExtendedProperties=""CharacterSet=65001;"";

    (Not sure which other items in the schema.ini might be used in this way but I don't think you actually need any of the other ones).
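    Folded into the earlier example, the whole call might then look like this - a sketch, with illustrative paths and the ExtendedProperties syntax copied as-is from above:

```vba
' Sketch: character set specified in the connection string, so no
' schema.ini is needed. Paths and file names are illustrative.
Sub OpenUtf8TextSource()
    ActiveDocument.MailMerge.OpenDataSource _
        Name:="G:\data\universal.odc", _
        Connection:="Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=g:\data;Mode=Read;" & _
            "Jet OLEDB:Engine Type=96;" & _
            "ExtendedProperties=""CharacterSet=65001;"";", _
        SQLStatement:="SELECT * FROM [myfile.txt]"
End Sub
```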

    Peter Jamieson

    Monday, April 27, 2020 4:14 PM
  • Thanks

    That seems to work too but I will double check tomorrow after the day I have had today !

    It's a much better solution in a multiuser environment than having to ensure schema.ini doesn't get overwritten

    Monday, April 27, 2020 4:57 PM
  • Yes, that was my thinking. Unfortunately I think the only options for "the delimiter" in the connection string are "FMT=Delimited" and "FMT=Fixed", so in the end you are reliant on the Text provider being able to detect the delimiter correctly. As long as you don't have commas in your data values, or the provider deals with them correctly inside quoted strings, you're probably OK.

    Peter Jamieson

    Monday, April 27, 2020 5:05 PM
  • Thanks again for all your help

    One quick question, I've not tested it but can the SQL Statement for the CSV have a 'simple' Where clause eg

    "SELECT * FROM [myfile.txt] Where Email<>'' and LastName<>''"

    For example ?

    I'm not expecting anything too complex but some basic filtering would be handy

    Wednesday, April 29, 2020 11:33 AM
  • Yes, for data sources accessed via the Jet/ACE provider (which is what we're doing here) the SQL dialect is basically the same as in Access (i.e. quite a powerful SQL dialect) so you can do that kind of stuff. Word imposes a limit of around 255 characters on the SQLStatement parameter - you can extend by using the other SQLStatement1 (?) parameter if necessary. If you need wildcards, use "_" for a single character and "%" for 0->many characters instead of the "?" and "*" typically used within Access.

    If you open the file *without* OLE DB then what you get is Word's internal SQL dialect, which is a really simple dialect that supports what you can specify in the MailMerge Sort/Filter dialog box (when the data source is connected) and nothing more. But that can still do the <> "" stuff for a number of columns.
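    Putting that together with the earlier OLE DB example, a filtered call might look like this (a sketch; paths and column names are illustrative):

```vba
' Sketch: OLE DB text data source opened with a simple WHERE clause.
' Note Jet/ACE wildcards here are % and _, not * and ?.
Sub OpenFilteredTextSource()
    ActiveDocument.MailMerge.OpenDataSource _
        Name:="G:\data\universal.odc", _
        Connection:="Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=g:\data;Mode=Read;" & _
            "Jet OLEDB:Engine Type=96;", _
        SQLStatement:="SELECT * FROM [myfile.txt] " & _
            "WHERE Email<>'' AND LastName<>''"
End Sub
```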

    Peter Jamieson

    Wednesday, April 29, 2020 12:26 PM
  • RecordCount seems to always return -1

    If the selection results in no records then it carries on until the Execute of the MailMerge object

    Word pops up a message saying there is no data which is fine

    Except after this I need to stop some things happening if there were no records

    Obviously Execute does not return anything and RecordCount is always -1 so what can I hook into to know the Merge did not happen due to lack of records ?

    Thanks

     

    Thursday, April 30, 2020 1:14 PM
  • You should be able to trap error 5631.

    You can certainly get a .RecordCount of -1, e.g. if you use a SELECT DISTINCT query, or a query that uses the "Like" comparison operator (even without wildcards in the match expression).
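    In VBA the trap might be sketched like this (translate the pattern into your language's COM error handling):

```vba
' Sketch: trap the "no records" error (5631) raised by Execute.
Sub ExecuteMergeSafely()
    On Error Resume Next
    ActiveDocument.MailMerge.Execute Pause:=False
    If Err.Number = 5631 Then
        MsgBox "No records matched - merge skipped."
    ElseIf Err.Number <> 0 Then
        MsgBox "Merge failed: " & Err.Description
    End If
    On Error GoTo 0
End Sub
```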


    Peter Jamieson

    Thursday, April 30, 2020 2:49 PM
  • Yes I have like in my query

    I've had a look to trap the error 5631 but unless I've missed something I can't see an OnError type event of the Application object or MailMerge object or anywhere else in the Word COM interface

    I can see events for things like spelling errors etc

    I also typed the word error in the filter here but there was nothing for Word at all

    https://docs.microsoft.com/en-us/office/vba/api/overview/word/object-model

    Thursday, April 30, 2020 3:07 PM
  • Normally in VBA we'd use its On Error and test the VBA error object, but I don't know how you do that in DataFlex - the only thing I found was  https://help.dataaccess.com/mergedProjects/DevelopmentGuide/COM_Error_Handling.htm

    Peter Jamieson

    Thursday, April 30, 2020 3:36 PM
  • Otherwise, I think you need to trap this situation before you .Execute the merge. What you do (VBA syntax) is set

    ActiveDocument.MailMerge.DataSource.ActiveRecord = wdLastRecord

    (wdLastRecord is a constant with value -5)
    Word then goes off and actually finds the last record. It doesn't alter .RecordCount, but .ActiveRecord will be the same as the record count if there are records, and 0 if there are not. NB, for a very large number of data rows, it can take Word quite a long time to find the end (e.g. it took my system several minutes to find the last record in a 1-million row SQL Server table).

    That technique is only really reliable if there is no opportunity for the user to exclude records. If they can do that, and some records have been returned but they have all been excluded, .ActiveRecord is rather unhelpfully set to 1 rather than 0 when you do this.
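    The technique described above might be wrapped up like this (a sketch; the reset to wdFirstRecord afterwards is my addition):

```vba
' Sketch: get the real record count by forcing Word to walk to the
' last record. Can be slow on very large data sources.
Function MergeRecordCount(doc As Word.Document) As Long
    With doc.MailMerge.DataSource
        .ActiveRecord = wdLastRecord     ' wdLastRecord = -5
        MergeRecordCount = .ActiveRecord ' 0 if there are no records
        If MergeRecordCount > 0 Then .ActiveRecord = wdFirstRecord
    End With
End Function
```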

    Peter Jamieson

    Thursday, April 30, 2020 4:00 PM
  • Thanks for this

    I've had a play around and this is what I have found

    Having opened the data source, simply getting ActiveDocument.MailMerge.DataSource.ActiveRecord is enough to tell you if the selection resulted in any records i.e. 1 if it did or 0 if it didn't

    However, if I also want to be able to set FirstRecord/LastRecord then you do need to set the ActiveRecord to the last record and test whether it is >= the first record you want to set.

    Otherwise again you won't find a problem until you execute the merge and it says it has no records to merge

    I'm not quite sure about your last para - do you mean having opened the data source you loop through and set the Included property on them all to false, resulting in no records that can be processed ?

    Friday, May 1, 2020 9:20 AM
  • A better solution, checking .ActiveRecord then. So in the simplest case you don't need to do what I suggested.

    Forgetting about individually included/excluded records for now...

    Let's say you actually retrieved 7 records. Let's go through some "normal" settings of .FirstRecord and .LastRecord

    .FirstRecord = -1
    .LastRecord = -1 (or any legitimate value <0)

    You get records 1->7

    .FirstRecord = -1
    .LastRecord = 0

    You get record 1

    .FirstRecord = -1
    .LastRecord >= 1 and .LastRecord <=7

    You get records 1 to .LastRecord

    .FirstRecord = -1
    .LastRecord > 7

    You get records 1 to 7

    .FirstRecord >=1 and .FirstRecord <=7
    .LastRecord < 0 *or* .LastRecord > .FirstRecord

    You get records .FirstRecord to 7

    .FirstRecord > 7
    .LastRecord = -1 (or <0), or .LastRecord > 7

    You get error 5631

    If you have other combinations, such as .FirstRecord = 0, .LastRecord = -1, both values < -1, you get situations where records from the end are duplicated at the beginning, or you get records taken from the end of list rather than the beginning, and so on. 

    So IMO if you are not sticking to "all", it is important to ensure that either *one* of .FirstRecord or .LastRecord is -1 and the other one is between 1 and the actual record count, or they are both between 1 and the actual record count and .LastRecord >= .FirstRecord.

    Either of which, in the general case, requires you to know what the actual last record is, unless you're only merging the first record.

    The situation where the users can include/exclude individual records creates another layer of difficulty which I'd rather not try to deal with unless you really have to!


    Peter Jamieson

    Friday, May 1, 2020 11:38 AM
  • Thanks again

    Initially I thought we had slightly crossed wires but having read it a couple of times I see what you mean

    I was starting from a slightly different angle but the end result is the same in wanting to avoid error 5631

    1) At the point of open has the data source returned any records at all i.e. is it worth continuing

    2) We do have some records and either the end user has supplied some start and finish row numbers (possibly) or we want to process this data and merge to email in blocks of say 50 with a pause between. i.e. we need to know the total number of records and chop them up

    You have now given me enough information to achieve this

    On a separate note have you done many Merge to email from Word ? We are not looking to do 1000's but say 1500 is not unusual for a large event such as a fun run entry confirmation etc

    I'm just wondering, I assume Word will be tied up for the length of time it takes to "print" the emails to Outlook, but I then presume they sit in the outbox until sent i.e. Outlook is not tied up while it sends those 1500 emails ?

    It's a little difficult to do performance bulk testing without actually sending lots of emails :-)
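    The block-of-50 idea above might be sketched with .FirstRecord/.LastRecord like this. Assumptions: the total record count is already known (e.g. via the wdLastRecord technique discussed earlier), and the merge's email fields and destination are already configured:

```vba
' Sketch: merge to email in fixed-size blocks, with room for a pause
' between blocks. totalRecords must be the actual record count.
Sub MergeInBlocks(doc As Word.Document, totalRecords As Long)
    Const BLOCK_SIZE As Long = 50
    Dim first As Long, last As Long
    doc.MailMerge.Destination = wdSendToEmail
    For first = 1 To totalRecords Step BLOCK_SIZE
        last = first + BLOCK_SIZE - 1
        If last > totalRecords Then last = totalRecords
        doc.MailMerge.DataSource.FirstRecord = first
        doc.MailMerge.DataSource.LastRecord = last
        doc.MailMerge.Execute Pause:=False
        ' e.g. insert a wait here before the next block
    Next first
End Sub
```

    Keeping both bounds between 1 and the actual count, with LastRecord >= FirstRecord, avoids the odd combinations described above.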

    Friday, May 1, 2020 1:56 PM
  • I don't do my own bulk emails any more and rarely use Outlook so the answer is "I don't know". Perhaps a set of scheduled merges out of working hours is the way to go to avoid holdups, as long as you have some way to avoid all dialog boxes (such as the ones that "give you access" to Outlook during a merge).

    Peter Jamieson

    Friday, May 1, 2020 2:17 PM
  • No problem

    Thanks again for all your assistance, it's been MOST useful

    Cheers

    Friday, May 1, 2020 2:20 PM
  • Due to all your help I am now a bit further down the road, one question …..

    When performing the mailmerge Word does a somewhat strobe lighting effect to show you how busy it is !

    To prevent this I set the Visible of the Application object to false, which works

    However this means that if there are any problems, say opening the CSV, you don't see the error and everything seems to hang

    If you go into task manager you can see Word in there and you can switch task to the error dialog (say incorrect merge field name)

    Is it possible to set it up so I don't see all the flashing as it merges each page but do see any error dialogs ?

    The Pause argument on MailMerge.Execute doesn't help much here

    Thanks

    Wednesday, May 6, 2020 1:11 PM
  • Haven’t tried, but how about minimising Word?
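    A sketch of that suggestion, with Application.ScreenUpdating added as a further flicker-reduction option (an assumption on my part, untested against this scenario). Unlike Visible = False, both should leave error dialogs reachable from the taskbar:

```vba
' Sketch: minimise Word and suspend screen updating during the merge
' instead of hiding it entirely, so error dialogs stay reachable.
Sub QuietMerge(wapp As Word.Application)
    wapp.WindowState = wdWindowStateMinimize
    wapp.ScreenUpdating = False
    wapp.ActiveDocument.MailMerge.Execute Pause:=False
    wapp.ScreenUpdating = True
    wapp.WindowState = wdWindowStateNormal
End Sub
```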

    Peter Jamieson

    Wednesday, May 6, 2020 1:37 PM
  • I thought I would briefly explore a few other options

    1) It appears that to use OLE DB to read a CSV it has to have a CSV extension, i.e. it can't have any other. No biggie, just an observation

    2) I created a simple HTML table in a text editor. When I used it as a data source as a Word user, I was prompted for which table (there was only one) and whether there was a header row (I had used TH tags); not sure if this dialog could be suppressed using OpenDataSource. However, two more important things: despite a BOM marker and a meta tag to say UTF8, when I clicked on edit recipients or did an actual merge it was reading it as though it was ANSI data. Secondly, a TD tag containing 10.00 was truncated to 10

    3) I tried a similar test with an XLSX file which displayed the UTF8 chars OK but it still showed 10 rather than 10.00 regardless of any column formatting options I saved in the XLSX file

    Thanks

    Thursday, May 21, 2020 2:58 PM
  • If you are able to write the text file in any format you need, one possibility would be to write it as an XML WordProcessingML file in "Flat OPC format" that Word should treat as a Word document.

    What you do is start with a "minimal document". The first chunk of the file looks like this (you should not need to change anything):

    <?xml version='1.0' encoding='UTF-8'?>
    <pkg:package xmlns:pkg='http://schemas.microsoft.com/office/2006/xmlPackage'>
    
      <pkg:part pkg:name='/_rels/.rels'
        pkg:contentType='application/vnd.openxmlformats-package.relationships+xml'>
        <pkg:xmlData>
          <Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>
            <Relationship Id='rid1'
              Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'
              Target='word/document.xml'/>
          </Relationships>
        </pkg:xmlData>
      </pkg:part>
    
      <pkg:part pkg:name='/word/document.xml'
        pkg:contentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'>
        <pkg:xmlData>
          <w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
            <w:body>

    The last part of the file looks like this:

         </w:body>
          </w:document>
        </pkg:xmlData>
      </pkg:part>
    </pkg:package>
    

    Between those two predefined pieces, you need to put your data. There are various ways you can do that, but there's no real reason not to stick to "CSV" format. So if you have a data source like this:

    key,text
    1,"mytext1"
    2,"mytext2"

    then the XML would need to look like this:

    <w:p><w:r><w:t>key,text</w:t></w:r></w:p>
    <w:p><w:r><w:t>1,"mytext1"</w:t></w:r></w:p>
    <w:p><w:r><w:t>2,"mytext2"</w:t></w:r></w:p>
    If you have the characters "&" or "<" in the data, you have to "escape" them - in essence, instead of "&" use &amp; and instead of "<" use &lt;

    As for "Unicode" characters, you can either insert the appropriate UTF-8 byte sequence or use a numeric character reference in "&#xNNNN;" format, so you could insert "Greek Small Letter Alpha" using "&#x03B1;"
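Putting the two points above together (escaping plus per-line `<w:p>` wrapping), here is a minimal sketch of how the CSV-to-WordProcessingML step might be coded. The `csv_lines_to_wml` helper is hypothetical, not something from the thread; it assumes you already have the CSV rows as strings.

```python
# Hypothetical helper: wrap each CSV line in a WordProcessingML paragraph,
# escaping the XML-special characters "&" and "<" (and ">") first.
from xml.sax.saxutils import escape

def csv_lines_to_wml(lines):
    # escape() handles "&" -> "&amp;" and "<" -> "&lt;"; other characters
    # can be written as-is in UTF-8 or as numeric references like &#x03B1;
    return ["<w:p><w:r><w:t>%s</w:t></w:r></w:p>" % escape(line)
            for line in lines]

rows = ['key,text', '1,"Fish & Chips"', '2,"\u03b1 test"']
for wml in csv_lines_to_wml(rows):
    print(wml)
```

These paragraphs would then be pasted between the fixed header and footer chunks shown earlier.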


    Peter Jamieson

    Thursday, May 21, 2020 4:33 PM
  • Thanks very much for this Peter

    I've been having a play in Notepad++ and it looks like it has some real legs

    It took me a moment to work out that I should save it with a .doc extension to use it as a data source interactively in Word, but MailMerge.OpenDataSource does not seem fussed, which is great

    The UTF-8 I have tried so far seems to work, as do commas in fields, and it hangs onto the .00 decimal places on numbers, which is great

    One thing I've not bottomed out is embedding a CRLF in the data to produce a soft CR or para break

    I tried

    1234,"Some Text&#x0D;&#x0A;More Text",84.00,Smith

    and the chars are stripped and the remaining columns stay in the correct places, but there is no line or para break in the merged document output

    Friday, May 22, 2020 10:08 AM
  • Adding the following line below the XML declaration at the top of the file helps file type recognition on Windows but it may interfere in other ways:

    <?mso-application progid="Word.Document"?>
    

    To insert line/para. breaks, there are two possibilities. In the example below, the first one inserts a hard line break, and the second one inserts a paragraph mark.

    <w:p><w:r><w:t>key,text</w:t></w:r></w:p>
    <w:p><w:r><w:t>1,"myte</w:t><w:br/><w:t>xt1"</w:t></w:r></w:p>
    <w:p><w:r><w:t>2,"myte</w:t></w:r></w:p><w:p><w:r><w:t>xt2"</w:t></w:r></w:p>
    


    These are only going to work if you have quoted the relevant fields.

    I suppose the simple way to see it is that for a hard line break you need to insert

    </w:t><w:br/><w:t>

    and for a paragraph break you need to insert 

    </w:t></w:r></w:p><w:p><w:r><w:t>
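As a rough illustration of that splicing step (this is my sketch, not code from the thread): if a field's text contains embedded newlines, escape the text first and then join the pieces with the break markup, so the inserted tags themselves are not escaped. `field_with_breaks` is a hypothetical helper name.

```python
# Hypothetical sketch: turn embedded newlines in a field's text into a
# Word hard line break (or paragraph break) in the Flat OPC markup.
from xml.sax.saxutils import escape

BR = "</w:t><w:br/><w:t>"                     # hard line break
PARA = "</w:t></w:r></w:p><w:p><w:r><w:t>"    # paragraph break

def field_with_breaks(text, marker=BR):
    # Escape each piece first, then splice in the already-well-formed markup.
    return marker.join(escape(part) for part in text.split("\n"))

line = '1234,"%s",84.00,Smith' % field_with_breaks("Some Text\nMore Text")
print("<w:p><w:r><w:t>" + line + "</w:t></w:r></w:p>")
```

The escape-then-join order matters: escaping after splicing would mangle the `<w:br/>` tag into text.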



    Peter Jamieson

    Friday, May 22, 2020 10:25 AM
  • Thanks Peter

    I've not added the mso bit but the rest seems to work a treat in Notepad++

    I presume this overall WordProcessingML data source technique should work from Word 2007 onwards?

    This is for me to solve, but just out of curiosity, how would you write the line of code in VB to put </w:t><w:br/><w:t> in the middle of a string that you are adding to an XML doc without it escaping the <>?


    Friday, May 22, 2020 10:40 AM
  • Don't worry, it's obvious when I look at it a bit harder! …. it is still valid XML, it just didn't quite look like it at first glance

    My bad, as they say

    Friday, May 22, 2020 10:52 AM
  • Yes, Word 2007 on Windows and later recognise WordProcessingML. If you need support for earlier versions you could probably get back to 2002/XP using the older "WordML" format, and to 2000 using Word's "HTML/XML round-trip" format, which I think would probably cope with the encoding issues. On the whole I think the newer WordProcessingML is a better choice, primarily because it's the current standard and is more widely understood and better documented. You could use RTF, but I would have to be in a very good frame of mind to go there!

    Peter Jamieson

    Friday, May 22, 2020 11:06 AM
  • Thanks again

    I went down a RTF worm hole about 25 years ago …. I don't want to be going back :-) 

    Friday, May 22, 2020 11:10 AM
  • Hi Peter

    I'm probably going to end up giving these Word document 'data files' their own extension to avoid confusion but just to ask a few questions

    When using MailMerge.OpenDataSource, is the Format argument only needed if you want to convert what you are reading? If not, is wdOpenFormatXMLDocument the correct value for these "document data files"?

    Looking here https://en.wikipedia.org/wiki/Office_Open_XML seems to suggest they should have a DOCX extension. I've always thought (probably wrongly) that DOCX files were always zip files containing a bunch of XML files

    I guess Word might just look for PK as the first couple of bytes and, if not, assumes it to be a flat XML file
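To illustrate that guess (Word's actual detection logic is internal and more involved, as the next reply explains, so this is purely a sketch of the sniffing idea): .docx packages are zip files and start with the "PK" magic bytes, while a Flat OPC file starts with an XML declaration, possibly preceded by a UTF-8 BOM.

```python
# Illustration only: approximate "is this a zip package or flat XML?"
# by sniffing the first bytes of the file, the way the post speculates.
def looks_like_zip(first_bytes):
    # Zip archives (and hence .docx packages) begin with the bytes "PK".
    return first_bytes[:2] == b"PK"

def looks_like_flat_xml(first_bytes):
    # Allow for an optional UTF-8 BOM before the XML declaration.
    stripped = first_bytes.lstrip(b"\xef\xbb\xbf")
    return stripped.startswith(b"<?xml")

print(looks_like_zip(b"PK\x03\x04"))            # zip/.docx signature
print(looks_like_flat_xml(b"<?xml version='1.0' encoding='UTF-8'?>"))
```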

    Thanks

    Tuesday, May 26, 2020 3:59 PM
  • I did quite a lot of experiments many years ago and, as far as I could tell, many of the parameters of OpenDataSource have no effect at all. Some of them only have an effect in certain circumstances. My assumption was that the developers probably copied the method signature of Documents.Open, on the grounds that if you needed a parameter for an Open, you would likely need it for a data source Open. But then they seem to have gone in a different direction, because the types of thing you can open as a data source aren't just files. I'm pretty sure the Format parameter has no effect.

    re. ".xml", the thing is that, strictly speaking, .docx is the only format of Office Open XML for Word actually documented in the ECMA and ISO standards for the format. The "Flat OPC" format, which is what we're talking about here, isn't actually covered by the standard. In fact I can't remember if there's even a documented schema for the extra XML elements it uses. But you have always been able to save Word documents in that format, and Word uses the .xml extension for both that and the older Word 2003 XML format. You could of course create a .docx as your data source, but you would have to create a bunch of folders and files and be able to compress them; it's not completely straightforward from VBA.

    So I don't really see why you would have to name it as a .docx.

    [Word has three main "recognition" techniques for detecting file formats. It recognises all generations and variants of its own .doc, .docx, .rtf and .xml formats internally, along with a couple of other things such as HTML. Then there's the old converter technology, where Word queries each installed external converter to see if it can open the file; the converter is passed the file name, so it gets a chance to read the entire thing if need be and decide whether it can convert it. Then there's a more modern COM converter, which was the basis of the original standalone .doc to .docx converter. I don't think that last one is used by OpenDataSource.]


    Peter Jamieson


    Tuesday, May 26, 2020 4:38 PM
  • Thanks for the detailed reply. Most useful.

    One other quirk I have worked around that is true for this XML document source as well as CSV

    If you want a data source file that only contains the column headers so that you can setup a template without any data …..

    If you create a CSV or WordML file with a single row of comma-delimited column headings and attach to it, either with OpenDataSource or interactively in Word, you will get the Header Record Delimiters dialog

    To avoid this, write out a row of empty quoted fields to match your structure

    e.g.

    Title,FirstName,LastName
    "","",""

    Wednesday, May 27, 2020 9:07 AM
  • Yes, FWIW the old "Header data" thing isn't supported any more in the .docx format (i.e. you can create a header source in VBA, but when you save as .docx and re-open, Word either reports an XML error or ignores the header, I forget which).

    I usually used to copy the header line and make it the first data line. AFAICR the only problem then is that if you define a filter in the Edit Recipients dialog, Word will probably use quotation marks around values you want to treat as numeric when it shouldn't, and vice versa.

    Peter Jamieson

    Wednesday, May 27, 2020 9:15 AM