none
Finding string between characters RRS feed

  • Question

  • I have an Access app that is a front-end for a MySQL DB that also has a web presence via PHP. The clever users (and, of course, clever PHP developer) want to display an entry that includes character 333. This is lower-case "o" with a diacritical macron (long vowel), or this: ō.

    In order to accommodate PHP, MySQL stores this as this: ō PHP, of course, interprets this as Unicode character 333 and correctly displays "ō".

    But I need to render this on an Access report. I wrote a function specific to this case, using this: Replace(<MyFieldName>, "&#333;", ChrW(333)).

    But how would I (in a function), handle the general case where the "333" could be any number; that is, I need to reliably retrieve whatever numbers exists between matched sets of "&#" and ";" And there could be more than one set of these in the original string.

    Thursday, July 30, 2015 4:50 PM

Answers

  • Are you saying that 333 is just an example and that there will be others? If so, regular expressions should do nicely. If not, then you need to explain more.
    • Proposed as answer by ProcessIT Thursday, July 30, 2015 10:04 PM
    • Marked as answer by Brian D. Hart Wednesday, August 5, 2015 3:40 PM
    Thursday, July 30, 2015 5:40 PM
  • Hi Brian,

    Thanks for the explanation. In that case, I agree with Alphonse that you may be able to use regular expression, and something like this might help you get started.

    Hope that helps...

    • Marked as answer by Brian D. Hart Wednesday, August 5, 2015 3:40 PM
    Thursday, July 30, 2015 5:56 PM
  • Please try the following.  The function accepts a string containing numeric character references and returns a modified string containing Unicode equivalents.

    'Requires a project reference to Microsoft VBScript Regular Expressions 5.5
     Public Function UnicodeFromNCR(strNCR As String) As String
         Dim oMatches As VBScript_RegExp_55.MatchCollection
         Dim oRegex As VBScript_RegExp_55.RegExp
         Dim strTest As String
        
         strTest = strNCR
         Set oRegex = New RegExp
         With oRegex
             .Pattern = "&#(\d{1,});"
             .Global = False
             Do While .Test(strTest)
                 Set oMatches = .Execute(strTest)
                 strTest = .Replace(strTest, ChrW(oMatches(0).SubMatches(0)))
             Loop
         End With
         Set oMatches = Nothing
         Set oRegex = Nothing
         UnicodeFromNCR = strTest
     End Function
    • Edited by Mark18991230 Wednesday, August 5, 2015 9:04 PM
    • Marked as answer by Brian D. Hart Wednesday, August 5, 2015 10:46 PM
    Wednesday, August 5, 2015 8:38 PM

All replies

  • Hi,

    Not sure I understand the issue. You're not looking for just 333 but the whole thing: &#333;, so why would there be any confusion? Also, does your function need to check for other special characters as well? If so, I would recommend storing them in a table, so you can easily look up their replacement values. Just my 2 cents...

    Thursday, July 30, 2015 5:08 PM
  • Are you saying that 333 is just an example and that there will be others? If so, regular expressions should do nicely. If not, then you need to explain more.
    • Proposed as answer by ProcessIT Thursday, July 30, 2015 10:04 PM
    • Marked as answer by Brian D. Hart Wednesday, August 5, 2015 3:40 PM
    Thursday, July 30, 2015 5:40 PM
  • This is an Instr question, I suspect. The question is not what to use for the replacement; that will always be the number between "&#" and ";", but how to identify substrings beginning with "&#" and ending with ";" and how to extract the values between those two.

    A replacement table could work but would require an entry for every Unicode character. But once I identify a substring beginning with "&#" and ending with ";", is it not possible to extract whatever characters lie between those two?

    Examples:

    My current function includes this line (hard-coded to handle Unicode character 333)

    • Replace(<source string>, "&#333;", ChrW(333))

    If my source string is this: ABC&#333;XYZ, the result would be:

    • ABCōXYZ

    But I do not know that it will always be character code 333. I could have this:

    • ABC&#339;XYZ

    To complicate things, I could have something like this:

    • ABC&#333;DEF&#339;XYZ

    Unicode 333 is ō, and Unicode 339 is œ, so needs should render as:

    • ABCōDEFœXYZ

    But how do I:

    • Find multiple instances of "&#<some number>;
    • For each instance, isolate the characters between "&#" and ";"

    Once I have that, I just pass that number to ChrW. I know I will need to iterate, replacing them one at a time until there are no more &#" and ";" left in my resulting string. I am just finding the logic to be a bit of a brain teaser.

    I guess it would go something like this:

    1. See if there is a "&#" followed by ";" in the string using If Instr(...) <> 0 and Instr (...) <> 0 and the second Instr return value is greater than the first Instr value.
    2. Find the position of the first "&#" from Instr as StartPosition
    3. Use Instr to identify position of first ";" after StartPosition, to ignore any ";" that could occur prior to Start Position (i.e. not as part of character delimiter set) - as EndPosition
    4. Extract the characters between StartPosition+2 and EndPosition-1 as CharCode
    5. Replace the Mid string from StartPosition to EndPosition with ChrW(CharCode)
    6. Loop until there are no more "&#" and ";" pairs.

    So maybe now I have the answer. I was just hoping I would not have to think that hard:)

    Thursday, July 30, 2015 5:48 PM
  • Hi Brian,

    Thanks for the explanation. In that case, I agree with Alphonse that you may be able to use regular expression, and something like this might help you get started.

    Hope that helps...

    • Marked as answer by Brian D. Hart Wednesday, August 5, 2015 3:40 PM
    Thursday, July 30, 2015 5:56 PM
  • Ha, ha! After 15 years of developing Access apps (some quite large), this is the first time I have ever heard of regular expressions. Just never needed anything quite like this before. That certainly looks like a lot more elegant than my convoluted plan (which would probably work, but why reinvent the wheel?).

    Thank you. I will try this.

    Thursday, July 30, 2015 6:05 PM
  • Hi Brian,

    You're welcome! Alphonse and I were happy to assist. Please let us know how it goes. Good luck!

    Thursday, July 30, 2015 6:20 PM
  • I am sure Regular Expressions would have done the trick. I can see, though, that Regular Expressions syntax is very powerful but also very complex, and the examples I found were generally (if not all) for extracting one or more instances of patterns within strings and not so were not particularly helpful in figuring out how to replace characters between found strings when there is an unknown number of instances of a varying number of characters between two known strings.

    To be clear, I am sure that is a limitation of my time and brain capacity, not one of your answer or the MS scripting developers. So I eventually opted to accomplish the same thing using Instr to find the relative positions of the two the Unicode wrapper strings("&#" and ";") then simply rebuild my string, using Mid to manually replace strings by position rather than using Replace by found string. I loop until there are no more Unicode wrapper pairs.

    Public Function ReplaceSpecialCharacters(strBase As Variant) As String
    If IsNull(strBase) Then
        ReplaceSpecialCharacters = ""
        Exit Function
    Else
        ReplaceSpecialCharacters = strBase
    End If
    'declare variables
        Dim strStart As String
        Dim StartPosition As Integer
        Dim strEnd As String
        Dim EndPosition As Integer
        Dim strBetween As Variant
        Dim strReplacement As String
    'replace Unicode character codes with Unicode characters
        strStart = "&#"
        strEnd = ";"
        Do While InStr(ReplaceSpecialCharacters, strStart) > 0 And InStr(ReplaceSpecialCharacters, strEnd) > InStr(ReplaceSpecialCharacters, strStart)
            StartPosition = InStr(ReplaceSpecialCharacters, strStart)
            EndPosition = InStr(ReplaceSpecialCharacters, strEnd)
            If StartPosition > 0 And EndPosition > StartPosition Then 'both strings are found
                strBetween = Mid(ReplaceSpecialCharacters, StartPosition + 2, EndPosition - StartPosition - 2) 'get the Unicode character code between "&#" and ";"
                strReplacement = ChrW(strBetween)
                ReplaceSpecialCharacters = Left(ReplaceSpecialCharacters, StartPosition - 1) & strReplacement & Mid(ReplaceSpecialCharacters, EndPosition + 1)
            End If
        Loop
    End Function

    Wednesday, August 5, 2015 3:56 PM
  • Hi Brian. Congratulations on finding a solution that works for you. You did say that you didn't want to reinvent the wheel, but I would also do it if I was more comfortable with creating it from scratch. Good luck with your project.
    Wednesday, August 5, 2015 7:07 PM
  • Thanks...but I also did mark your posts as answers, since it looks like it would have been the ticket had I a better understanding of Regular Expressions. And now I know another option to try next time I run into one of these!
    Wednesday, August 5, 2015 7:13 PM
  • Hi Brian. Thanks for that. Just FYI, you can also user RegEx to "replace" matching patterns. My code sample just displays them instead. Cheers!
    Wednesday, August 5, 2015 7:24 PM
  • I could see that RegEx could do replacements, but this is not a simple pattern replacement or even multiple pattern replacement. I have to find two strings ("&#" and ";), extract the number represented by the characters between them, then replace the whole thing (from "&#" to ";", including the number) with the Unicode character represented by the number between those two substrings. To complicate things, there could be more than one of these, each having a different number of numeric characters between the "&#" and ";" in my source string. That is:

    So, for example, I need to convert this:

    ABC&#333;DEF&#339;XYZ

    so that it correctly displays thus:

    ABCōDEFœXYZ

     This effectively replaces this

    &#333;

    with this

    ō

    and this

    &#339;

    with this

    œ

    The numbers 333 & 339 being the number for the required Unicode characters ō and œ.

    I suspected I might have to loop through the source string even if I used Regular Expressions. Am I missing something there?

    It is quite the brain teaser for me. Haven't had one of these in a while.

    Wednesday, August 5, 2015 8:23 PM
  • Please try the following.  The function accepts a string containing numeric character references and returns a modified string containing Unicode equivalents.

    'Requires a project reference to Microsoft VBScript Regular Expressions 5.5
     Public Function UnicodeFromNCR(strNCR As String) As String
         Dim oMatches As VBScript_RegExp_55.MatchCollection
         Dim oRegex As VBScript_RegExp_55.RegExp
         Dim strTest As String
        
         strTest = strNCR
         Set oRegex = New RegExp
         With oRegex
             .Pattern = "&#(\d{1,});"
             .Global = False
             Do While .Test(strTest)
                 Set oMatches = .Execute(strTest)
                 strTest = .Replace(strTest, ChrW(oMatches(0).SubMatches(0)))
             Loop
         End With
         Set oMatches = Nothing
         Set oRegex = Nothing
         UnicodeFromNCR = strTest
     End Function
    • Edited by Mark18991230 Wednesday, August 5, 2015 9:04 PM
    • Marked as answer by Brian D. Hart Wednesday, August 5, 2015 10:46 PM
    Wednesday, August 5, 2015 8:38 PM
  • So, for example, I need to convert this:

    ABC&#333;DEF&#339;XYZ

    so that it correctly displays thus:

    ABCōDEFœXYZ

    Hi Brian,

    A part of my "clients" are poor volunteer organizations, working mostly on "old" hardware, in quite different environments. That is why I try not to use RegEx, because of the external reference.

    I can give you a small function with the functionality as in the above citation, if you are interested. But only tomorrow, as I am in a little hurry on this moment.

    Imb.,

    Wednesday, August 5, 2015 10:07 PM
  • Mark,

    I tested and see that this accomplishes the same thing as my logic, but more concisely, for sure.

    Thank you.

    Wednesday, August 5, 2015 10:46 PM
  • You're welcome!
    Wednesday, August 5, 2015 11:16 PM