Unicode characters in the file name

    Question

  • Hi, I searched the forum but couldn't find an answer to my question. I recently got a client who has Unicode characters in a file name, and the VFP FILE() function fails to locate the file.

    Eg. FILE("C:\..\Desktop\PRJ0004‐128‐800‐PC‐REP‐0002_C.txt") && returns .F. though the file exists.

    I'm aware that VFP uses ANSI characters, but I was wondering whether there is any workaround for this issue. I mean, can we get the VFP FILE() function working by changing regional settings, code pages, etc.?

    If not, how can I identify whether there are any Unicode/multibyte characters in a string using VFP?

    Thanks in advance!


    Cheers!

    Thursday, September 13, 2012 5:03 AM

Answers

  • Using VFP9 SP2 on an XP SP3 machine.

    Your file name shows small squares in place of the hyphens in Windows Explorer.

    As you stated, FILE() does not find the file.

    GetFile() will list the file, but when it is selected, it seems to automatically convert the Unicode hyphen to the ANSI/ASCII hyphen, code 45. So the file name is "changed" into ANSI, and a subsequent FILE() does not find a match.

    Even the low-level Scripting.FileSystemObject and its GetFile method did not find it with a hard-coded file name. See code.

    m.ls1 = "Found"
    TRY
        * Unicode file name
        m.gofs = CREATEOBJECT("Scripting.FileSystemObject")
        m.lobj2 = m.gofs.GetFile("C:\Program Files\Microsoft Visual FoxPro 9\AAA\P-1-8-PC-RP-0_C.txt")
    CATCH
        m.ls1 = "Not Found"
    ENDTRY
    WAIT WINDOW "Type is " + VARTYPE(m.lobj2) + " " + m.ls1 && L = logical or O = object
    RELEASE lobj2, gofs

    Even using the GetFolder() method and iterating through the Files collection, using ASC() on each character of the file name, did not detect it. See below.

    #DEFINE CS_CRLF CHR(13) + CHR(10)  && assumed definition of the constant used below
    m.gofs = CREATEOBJECT("Scripting.FileSystemObject")  && recreated, it was released above
    m.lobj2 = m.gofs.GetFolder("C:\Program Files\Microsoft Visual FoxPro 9\AAA")
    m.lobj3 = m.lobj2.Files
    FOR EACH afile IN m.lobj3
        m.ls3 = ""
        m.ls2 = ""
        m.ls1 = afile.Name
        FOR m.ln1 = 1 TO LEN(m.ls1)
            m.ls3 = m.ls3 + CHR(ASC(SUBSTR(m.ls1, m.ln1, 1)))
            m.ls2 = m.ls2 + SUBSTR(m.ls1, m.ln1, 1) + " is "  && Display the character
            TRY
                m.ls2 = m.ls2 + STR(ASC(SUBSTR(m.ls1, m.ln1, 1)), 4, 0) + CS_CRLF  && Display its ANSI value
            CATCH
                m.ls2 = m.ls2 + "??" + CS_CRLF
            ENDTRY
        ENDFOR
        MESSAGEBOX(m.ls2 + CS_CRLF + m.ls3)
    NEXT
    RELEASE lobj2, lobj3, gofs

    So, to me it looks as if you cannot detect Unicode characters in a file name.

    Thursday, September 13, 2012 11:48 AM
  • I don't even see Unicode characters in your posting!?

    Obviously you can't write a Unicode string literal in a PRG, because PRG files also have a codepage, which can differ from country to country but cannot be Unicode. So you'd have to start in your source code by using STRCONV() or similar to get from ANSI text to a variable holding Unicode, for use with the Unicode versions of the Win API file functions.

    Nevertheless, there is a way to get an ASCII file name: ADIR() has an option to retrieve file names in DOS 8.3 format, which Windows 7 still hands out.

    I created a file 'd:\temp\α-version.txt' and, using ADIR(laFile,'*version.txt','',2), got back -VERSI~1.TXT as the DOS name. FILE('-VERSI~1.TXT') does return .F., but ADIR(laFile,'-VERSI~1.TXT') finds it and returns 1, and FOPEN('-VERSI~1.TXT') can open it, for example.
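
    The 8.3 route described above could be sketched roughly as follows. This is only an illustration: it assumes the test file from this post exists in d:\temp\, that the volume still generates 8.3 short names (this can be switched off per volume), and the variable names are made up for the example.

    ```foxpro
    * Sketch: open a file through its DOS 8.3 short name.
    * Assumes d:\temp\ holds the test file from the post; names are illustrative.
    LOCAL ARRAY laFile[1]
    LOCAL lcDir, lcShort, lnHandle
    lcDir = "d:\temp\"
    IF ADIR(laFile, lcDir + "*version.txt", "", 2) > 0   && flag 2 = DOS 8.3 names
        lcShort = laFile[1, 1]                           && short name of the first match
        lnHandle = FOPEN(lcDir + lcShort)                && low-level open, read-only
        IF lnHandle >= 0
            ? FGETS(lnHandle)                            && first line of the file
            = FCLOSE(lnHandle)
        ELSE
            ? "Could not open " + lcShort
        ENDIF
    ENDIF
    ```

    Since FILE() was seen to fail on the short name here, the sketch checks the ADIR() count and the FOPEN() handle instead.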

    You can also use Scripting.FileSystemObject if you also set SYS(3101) to translate into something reversible to Unicode, e.g. to retrieve UTF-8:

    o=CreateObject("Scripting.Filesystemobject")
    Sys(3101,65001) && retrieve UTF-8 from COM objects
    oFolder = o.GetFolder('d:\temp\')
    For each oFile in oFolder.Files
    lcFilename = oFile.Name
     ? lcFilename
     ? o.FileExists('d:\temp\'+lcFilename)
    EndFor 

    Essential here is the SYS(3101,65001) setting, which converts whatever a COM object like Scripting.FileSystemObject expects or returns to and from UTF-8. The file name isn't readable in VFP, but the UTF-8 bytes arrive in VFP, and when you pass such a file name into FileExists(), Unicode arrives there.
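
    Building on the UTF-8 names, one possible way to answer the original "how can I detect such characters" question is a round trip through the ANSI codepage. The following is a sketch under that assumption, not a tested recipe: it relies on STRCONV() types 11 (UTF-8 to ANSI/DBCS) and 9 (ANSI/DBCS to UTF-8), and the folder and variable names are illustrative.

    ```foxpro
    * Sketch: list file names that do not survive a round trip through ANSI.
    LOCAL loFSO, loFolder, loFile, lcUtf8, lcAnsi
    loFSO = CREATEOBJECT("Scripting.FileSystemObject")
    SYS(3101, 65001)                          && COM strings now arrive as UTF-8 bytes
    loFolder = loFSO.GetFolder("d:\temp\")
    FOR EACH loFile IN loFolder.Files
        lcUtf8 = loFile.Name
        lcAnsi = STRCONV(lcUtf8, 11)          && UTF-8 -> current ANSI/DBCS codepage
        IF NOT STRCONV(lcAnsi, 9) == lcUtf8   && back to UTF-8; differs if something was lost
            ? "Non-ANSI characters in: " + lcAnsi
        ENDIF
    ENDFOR
    ```

    A name that comes back changed contains at least one character the current ANSI codepage cannot represent, which is the case where FILE() is expected to fail. Note that SYS(3101) stays at 65001 after this runs.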

    So when you need a Unicode file name literal, you can either retrieve a UTF-8 file name from Scripting.FileSystemObject or, when you know the UTF-8 bytes, e.g. from character tables on the internet, compose the file name yourself and use FileExists(). In this case alpha is CE B1, used via 0hceb1 here:

    o=CreateObject("scripting.filesystemobject")
    sys(3101,65001)
    
    lcFilename ='d:\temp\'+0hceb1+'-version.txt'
    ? o.FileExists(lcFilename)
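
    The same check can probably be done without COM, through the Unicode ("W") version of a Win API file function as mentioned at the top of this post. A minimal sketch, assuming the same test file; GetFileAttributesW is a real kernel32 function, while the variable names and the choice of this particular function are only for illustration:

    ```foxpro
    * Sketch: test for a file with the Unicode ("W") Win API.
    DECLARE INTEGER GetFileAttributesW IN kernel32 STRING lpFileName
    LOCAL lcUtf8, lcWide, lnAttr
    lcUtf8 = "d:\temp\" + CHR(0xCE) + CHR(0xB1) + "-version.txt"  && alpha as UTF-8 bytes
    lcWide = STRCONV(lcUtf8, 12) + CHR(0) + CHR(0)   && UTF-8 -> UTF-16, plus wide null terminator
    lnAttr = GetFileAttributesW(lcWide)
    ? lnAttr <> -1 AND BITAND(lnAttr, 16) = 0        && .T. if it exists and is not a directory
    ```

    A return value of -1 is INVALID_FILE_ATTRIBUTES, meaning the path was not found or not accessible.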

    Bye, Olaf.


    • Edited by Olaf Doschke Friday, September 14, 2012 7:15 AM
    • Proposed as answer by EnglishBob2Editor Friday, September 14, 2012 10:25 AM
    • Marked as answer by eCasper Wednesday, September 19, 2012 10:18 AM
    Friday, September 14, 2012 6:46 AM
  • >I really don't understand what is happening here.

    Well, there is a lot of automatic codepage translation going on behind the scenes, even with copy & paste. The clipboard is capable of carrying Unicode, e.g. from a browser to Word, but FoxPro will normally get it translated to ANSI.

    The clipboard is more complex than FoxPro's system variable _CLIPTEXT. Look into foxtools.chm and you'll find it offers GetClipDat() to get clipboard data as CF_TEXT, but also CF_OEMTEXT. The Windows API offers the one you'd need, CF_UNICODETEXT: http://winapi.freetechsecrets.com/win32/WIN32GetClipboardData.htm

    SYS(3101) is also just a hint at what happens at transitions between processes, here between the FoxPro world and the COM world. Windows is a mixed world, for historical reasons. Almost any application other than FoxPro can handle Unicode, but most Win API functions still come in pairs for ANSI and Unicode, besides several conversion functions to and from other encodings such as UTF-7, UTF-8, UTF-16, double-byte character sets, etc.

    Without Japanese installed, ASC("]") is still just its ASCII value of 93. I don't know how ASC() would ever return something >255, but ANSI codepages can in fact also have double-byte characters.
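
    One likely reading of the 33117, assuming the Japanese setting switches the ANSI codepage to 932 (Shift-JIS): in that codepage the Unicode hyphen U+2010 is the double-byte sequence 0x81 0x5D, and 0x815D is exactly 33117. The trail byte 0x5D on its own is "]", which would explain why the hyphens showed up as "]" in the IDE. The arithmetic can be checked in the command window:

    ```foxpro
    * Worked numbers only; nothing here depends on the system codepage.
    ? 0x815D                  && 33117, the value reported for ASC()
    ? BITRSHIFT(33117, 8)     && 129 = 0x81, the lead byte
    ? BITAND(33117, 0xFF)     && 93 = 0x5D, the trail byte
    ? CHR(93)                 && ], the trail byte read as a single-byte character
    ```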

    You can always think of a FoxPro string as a byte array, capable of containing anything in any codepage. FoxPro just can't put that into source code, as source code is translated into a codepage, so you had better convert to binary (0h prefix), or you could use STRCONV() to convert to base64: anything you can put into the PRG editor's codepage without losing the original character information.

    For example, if I copy α into a Fox window it becomes '?', CHR(63), as does any Unicode character that has no equivalent in ANSI 1252. I get the Unicode bytes (hex b103) if I do:

    Set Library To Foxtools Additive
    OpenClip(_vfp.hWnd)
    ? CreateBinary(GetClipDat(13)) && 13 = CF_UNICODETEXT
    CloseClip() && release the clipboard for other processes

    With formats 1 and 7, GetClipDat() returns ? and a as replacements for α.

    Be warned: only one process can have the clipboard open this way, so also call CloseClip() afterwards to be able to paste into other windows. Read more in foxtools.chm.

    Bye, Olaf.


    • Edited by Olaf Doschke Friday, September 14, 2012 11:13 AM
    • Marked as answer by eCasper Wednesday, September 19, 2012 10:18 AM
    Friday, September 14, 2012 11:08 AM

All replies

  • Hi eCasper,

    Which VFP version are you using?


    Please "Mark as Answer" if this post answered your question. :)

    Kalpesh Chhatrala | Software Developer | Rajkot | India

    Kalpesh 's Blog

    VFP Form to C#, Vb.Net Conversion Utility

    Thursday, September 13, 2012 6:04 AM
  • Hi Kalpesh, it's VFP 9 SP2.

    Cheers!

    Thursday, September 13, 2012 6:07 AM
  • Try with single quotes

     FILE('C:\..\Desktop\PRJ0004‐128‐800‐PC‐REP‐0002_C.txt')

    or, if the file is hidden, try it like below

     FILE('C:\..\Desktop\PRJ0004‐128‐800‐PC‐REP‐0002_C.txt',1)




    Thursday, September 13, 2012 6:30 AM
  • Hi Kalpesh, I'm sorry but I don't think you understood my question. Please try to create a file on your desktop with the same name I mentioned (i.e. copy and paste the file name with the Unicode characters in it). Then you will notice that the VFP FILE() function fails to locate the file created on your desktop.

    PS: Sometimes the above method will not work, as copy & paste may remove the Unicode characters.


    Cheers!


    • Edited by eCasper Thursday, September 13, 2012 7:06 AM
    Thursday, September 13, 2012 7:01 AM
  • In my case the Unicode characters become ???? marks.

    VFP does not support Unicode, as far as I know.



    Thursday, September 13, 2012 8:56 AM
  • Do you think we can identify Unicode characters in a string using VFP?

    Cheers!

    Thursday, September 13, 2012 9:06 AM
  • Hi EnglishBob2, so it seems we are passing the wrong value to each of the above functions.

    a) Isn't it because we don't have the correct font installed?

    b) If VFP does not support Unicode at all, how is it able to save Unicode data in VFP databases, and how does it manage to show Unicode text in labels, etc.?

    c) Will VFP be able to read the file name if we select the correct language (one with Unicode, e.g. Javanese) as the language for non-Unicode programs via Control Panel, Regional and Language Options, Advanced tab?

    d) Will changing the VFP code page make any difference to the end result?


    Cheers!



    • Edited by eCasper Friday, September 14, 2012 4:32 AM
    Friday, September 14, 2012 4:30 AM
  • I got the feeling that there is something in it.

    I managed to get the FILE() function working after installing the Asian languages and selecting "Japanese" as the language for non-Unicode programs via Control Panel, Regional and Language Options, Advanced tab.

    I also changed the IDE font to MS Mincho, but VFP still did not locate the file. When I copied the file path into the IDE it appeared as follows.

    C:\..\Desktop\PRJ0004-128-800-PC-REP-0002_C.txt

    Then I copied the file path into a Word document and copied it back into VFP, i.e. FILE(_CLIPTEXT) returns .T. now. Also, when I copy the file path into the IDE, it appears as follows.

    C:\..\Desktop\PRJ0004]128]800]PC]REP]0002_C.txt

    I also checked the ASC("]") value, which returns 33117, and CHR(33117) returns "‐", which is the correct value.

    I really don't understand what is happening here.


    Cheers!



    • Edited by eCasper Friday, September 14, 2012 6:51 AM
    Friday, September 14, 2012 6:47 AM