Expert: Text uniqueness

  • Question

  • User761125113 posted

    Hi all,

    I'm trying to write a script in ASP.NET (VB) that checks the uniqueness of a text, like this website does: http://www.dupecop.com/duplicate-content-checker.php

    Does anyone know of some sample code, or can anyone help me get started?

    Thanks a lot,

    Mike

    Saturday, January 3, 2009 9:43 PM

All replies

  • User1006193418 posted

    Hi Mike,

    I have tested the web page you provided above; however, I am confused about how it compares the texts in the two text boxes. I always get a result of 0% or 50% but don't know how it is calculated.

    Could you explain a bit more about the method the tool uses? It would be very helpful to know its working principle. Thanks for your understanding.

    Best Regards,
    Shengqing Yang

    Wednesday, January 7, 2009 12:11 AM
  • User761125113 posted

    Hi Shengqing,

    Just enter a small text, the same in both boxes.

    Then change one word at a time and you'll see the percentage change (use IE).

    The logic behind it is quite simple. For example:

    If you have a text of 100 words and you change 1 word, that is a 1% difference.

    I see that the online tool is not good, because if I change the order of the sentences it considers the text unique.

     

    Wednesday, January 7, 2009 4:23 PM
  • User1006193418 posted

    Hi Xcraft,

    Actually, I am still a bit confused about the way this tool works. For example, I compared 'abcdefg' and 'aaaaaaa' and the tool returned 50% unique. I have no idea how this result is calculated.

    However, based on your kind explanation, I worked out something like the code below. It compares the two text boxes word by word, position by position, and reports the percentage of words that differ.

    <%@ Page Language="vb" %>
    
    <script runat="server">
        Protected Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
            ' Split both texts into words on single spaces.
            Dim s1, s2 As String()
            Dim result As Single = 100
            s1 = TextBox1.Text.Split(" ")
            s2 = TextBox2.Text.Split(" ")
            ' Each matching word lowers the unique rate by this amount.
            Dim cell As Single = 100 / s1.Length
    
            ' Compare the words at the same position in both texts.
            For i As Integer = 0 To s1.Length - 1
                If s2.Length > i AndAlso s1(i) = s2(i) Then
                    result -= cell
                End If
            Next
    
            Response.Write(String.Format("{0:f2}%", result))
        End Sub
    </script>
    
    <html>
    <head id="Head1" runat="server">
        <title></title>
    </head>
    <body>
        <form id="form1" runat="server">
        <div>
            <asp:TextBox ID="TextBox1" runat="server"></asp:TextBox>
            <asp:TextBox ID="TextBox2" runat="server"></asp:TextBox>
            <asp:Button ID="Button1" runat="server" Text="Button" />
        </div>
        </form>
    </body>
    </html>

    Here are some tests I did:

    'a b c d e f g' and 'h i j k l m n' returns 100.00%

    'a b c d e f g' and 'h i j k l f n' returns 85.71%

    I hope this demo helps you get started with the task.

    Best Regards,
    Shengqing Yang

    Thursday, January 8, 2009 2:40 AM
  • User761125113 posted

    Hi Shengqing,

    Thanks for the code, but it is a little more complex than that.

    Make the text boxes multiline and enter this text in the left one:

    " I have tested the web page you provided above, however, I am confused on how it works to compare the texts in two textbox. I always get the result of 0% or 50% but don't know how it comes."

    and in the right one enter the same text with the first part removed:

    "provided above, however, I am confused on how it works to compare the texts in two textbox. I always get the result of 0% or 50% but don't know how it comes."

    It considers the second text 100% unique, which it is not: all of its text comes from the first text box, so the result should be 0%.

     

    Thanks

    Thursday, January 8, 2009 4:40 PM
  • User1006193418 posted

    Hi Xcraft,

    Yes. With the code I provided before, the two sentences are totally different: the code compares the words one by one, by position, and from the first word ('I' in the first sentence but 'provided' in the second) to the last, it finds no pair that matches.

    Here is another version I worked out just now. It takes each word in the second sentence and tries to find the same word in the first sentence. Every time a match is found, the code reduces the unique rate. I think this version is much closer to what you are after.

    Protected Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
        Dim s1, s2 As String()
        Dim result As Single = 100
        s1 = TextBox1.Text.Split(" ")
        s2 = TextBox2.Text.Split(" ")
    
        ' Each word of the second text that is found in the first lowers the unique rate by this amount.
        Dim cell As Single = 100 / s2.Length
        ' Marks the words of the first text that have already been matched.
        Dim isCheck(s1.Length - 1) As Boolean
    
        For i As Integer = 0 To s2.Length - 1
            For j As Integer = 0 To s1.Length - 1
                If isCheck(j) = False AndAlso s2(i) = s1(j) Then
                    result -= cell
                    isCheck(j) = True
                    Exit For
                End If
            Next j
        Next i
    
        Response.Write(String.Format("{0:f2}%", result))
    End Sub

    You can replace the Button1_Click event handler in the previous version with the code above and try again.

    In addition, I tested these two texts in the web tool you gave (http://www.dupecop.com/duplicate-content-checker.php), but I have no idea how its result is calculated.

    Best Regards,
    Shengqing Yang

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, January 8, 2009 10:39 PM
  • User761125113 posted

    Hi Shengqing,

    This is perfect, thanks a lot. I only needed to remove the line breaks and the double spaces.

    The only issue is that if the same words appear in a different order, making totally different content, the text is still seen as not original. For example:

    love running dogs

    dogs love running

    But for me it works fine. At least with this script we know when a text is unique.

    Thanks a lot!
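
    The clean-up mentioned above (removing line breaks and double spaces) can also be handled at the splitting step. The following is only a sketch, not code from this thread: SplitWords is a hypothetical helper name, and it assumes that spaces, tabs, and line breaks are the only word separators that matter.

```vb
Imports System
Imports Microsoft.VisualBasic

Module TokenizeDemo
    ' Hypothetical helper (not from the thread): splits text into words on
    ' spaces, tabs, and line breaks, and drops the empty entries that
    ' double spaces or blank lines would otherwise produce.
    Function SplitWords(ByVal text As String) As String()
        Dim separators As Char() = New Char() {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        ' Double space, line break, and trailing space in the input.
        Dim words As String() = SplitWords("love  running" & vbCrLf & "dogs ")
        Console.WriteLine(words.Length)              ' 3
        Console.WriteLine(String.Join("|", words))   ' love|running|dogs
    End Sub
End Module
```

    Splitting both text boxes with a helper like this, instead of Split(" "), would keep empty strings out of the word arrays, so they are never counted as words.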

    Friday, January 9, 2009 3:05 AM
  • User1006193418 posted

    Hi Xcraft,

    Sorry for the late reply.

    About this problem:

    if the same words appear in a different order, making totally different content, the text is still seen as not original, e.g.:

    love running dogs

    dogs love running

    I think there is still a way to solve it.

    Try the code below:

    Protected Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
        Dim s1, s2 As String()
        Dim result As Single = 100
        s1 = TextBox1.Text.Split(" ")
        s2 = TextBox2.Text.Split(" ")
    
        Dim cell As Single = 100 / s2.Length
        Dim isCheck(s1.Length - 1) As Boolean
    
        For i As Integer = 0 To s2.Length - 1
            ' This line is the only place I modified to meet the order requirement:
            ' the search in s1 starts at position i (capped at the last index) instead of 0.
            For j As Integer = Math.Min(i, s1.Length - 1) To s1.Length - 1
                If isCheck(j) = False AndAlso s2(i) = s1(j) Then
                    result -= cell
                    isCheck(j) = True
                    Exit For
                End If
            Next j
        Next i
    
        Response.Write(String.Format("{0:f2}%", result))
    End Sub

    When I tested it with 'love running dogs' and 'dogs love running', the result comes to 66.67%, because only one of the three words is still matched once the word order is taken into account.

    Best Regards,
    Shengqing Yang

    Sunday, January 11, 2009 10:01 PM
  • User761125113 posted

    Hi Shengqing,

    It seems like this is not that easy.

    Try these two texts (I only moved the end of the text to the start):

     Thats because these areas have the highest percentages of mortgage holders whose monthly housing costs

    >>>

    housing costs Thats because these areas have the highest percentages of mortgage holders whose monthly 

     87,50%!!!!

     

    Monday, January 12, 2009 1:45 PM
  • User1006193418 posted

    Hi Xcraft,

    After some tests, I think the problem is caused by the order in which the texts are compared. Taking the example you provided above, if we enter the texts the other way around:

    TextBox1: housing costs Thats because these areas have the highest percentages of mortgage holders whose monthly

    TextBox2: Thats because these areas have the highest percentages of mortgage holders whose monthly housing costs

    we get a result of 13.33%.

    So I moved the comparison out of the Button1_Click event handler into its own function, so that we can compare the texts in both directions and return the smaller result to the user.

    Here is the latest version. Try it, and feel free to tell me if you have any more questions:

    <%@ Page Language="VB" %>
    
    <script runat="server">
        Protected Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
            Dim s1, s2 As String()
    
            s1 = TextBox1.Text.Trim().Split(" ")
            s2 = TextBox2.Text.Trim().Split(" ")
    
            ' Compare in both directions and report the smaller unique rate.
            Dim result1 As Single = TextCompare(s1, s2)
            Dim result2 As Single = TextCompare(s2, s1)
    
            Response.Write(String.Format("{0:f2}%", Math.Min(result1, result2)))
        End Sub
    
        ' Returns the percentage of words in s2 that cannot be matched to an
        ' unused word at the same or a later position in s1.
        Protected Function TextCompare(ByVal s1 As String(), ByVal s2 As String()) As Single
            Dim result As Single = 100
    
            Dim cell As Single = 100 / s2.Length
            Dim isCheck(s1.Length - 1) As Boolean
    
            For i As Integer = 0 To s2.Length - 1
                For j As Integer = Math.Min(i, s1.Length - 1) To s1.Length - 1
                    If isCheck(j) = False AndAlso s2(i).ToLower() = s1(j).ToLower() Then
                        result -= cell
                        isCheck(j) = True
                        Exit For
                    End If
                Next j
            Next i
    
            Return result
        End Function
    </script>
    
    
    <html>
    <head id="Head1" runat="server">
        <title></title>
    </head>
    <body>
        <form id="form1" runat="server">
        <div>
            <asp:TextBox ID="TextBox1" runat="server"></asp:TextBox>
            <asp:TextBox ID="TextBox2" runat="server"></asp:TextBox>
            <asp:Button ID="Button1" runat="server" Text="Button" />
        </div>
        </form>
    </body>
    </html>

    P.S. I also made a small change so that the comparison is not case-sensitive. I hope you like it.

    Best Regards,
    Shengqing Yang

    Tuesday, January 13, 2009 3:55 AM
  • User761125113 posted

    Hi Shengqing,

    Thanks again.

    It seems to work. The question is always: what is a unique text?

    I would like to find out what Google considers unique ;-)

    Thanks a lot,

    Mike

    Tuesday, January 13, 2009 6:36 AM
  • User761125113 posted

    Hi Shengqing,

    It seems there is still a problem ;-)

    Try comparing two texts like this:

    1- i'm a big boy in a small world.

    2- i'm a boy in a world.

    See, 0% difference :-(

    Maybe count the words?

    Thanks a lot,

    Mike
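
    One possible way to "count", sketched under assumptions: keep the matching loop of TextCompare from the earlier reply, but weight each match by the word count of the longer text, so that words which were simply deleted still show up as a difference. TextCompareByLongest is a hypothetical name, and dividing by the longer length is an assumption of this sketch, not something settled in this thread.

```vb
Imports System
Imports System.Globalization

Module LengthAwareCompare
    ' Same matching loop as the thread's TextCompare, but every match is worth
    ' 100 / (word count of the longer text) instead of 100 / s2.Length.
    Function TextCompareByLongest(ByVal s1 As String(), ByVal s2 As String()) As Single
        ' Guard for empty inputs: two empty texts are identical, otherwise fully different.
        If s1.Length = 0 OrElse s2.Length = 0 Then
            Return If(s1.Length = s2.Length, 0.0F, 100.0F)
        End If

        Dim longest As Integer = Math.Max(s1.Length, s2.Length)
        Dim cell As Single = 100.0F / CSng(longest)
        Dim result As Single = 100.0F
        Dim isCheck(s1.Length - 1) As Boolean

        For i As Integer = 0 To s2.Length - 1
            For j As Integer = Math.Min(i, s1.Length - 1) To s1.Length - 1
                If Not isCheck(j) AndAlso String.Equals(s2(i), s1(j), StringComparison.OrdinalIgnoreCase) Then
                    result -= cell
                    isCheck(j) = True
                    Exit For
                End If
            Next
        Next

        Return result
    End Function

    Sub Main()
        Dim separators As Char() = New Char() {" "c}
        Dim a As String() = "i'm a big boy in a small world.".Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim b As String() = "i'm a boy in a world.".Split(separators, StringSplitOptions.RemoveEmptyEntries)

        ' Compare in both directions and keep the smaller value, as in the thread.
        Dim unique As Single = Math.Min(TextCompareByLongest(a, b), TextCompareByLongest(b, a))
        Console.WriteLine(String.Format(CultureInfo.InvariantCulture, "{0:F2}%", unique))   ' 25.00%
    End Sub
End Module
```

    With the two example sentences, six of the eight words of the longer text are matched, so the result is 25% instead of 0%.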

    Sunday, February 1, 2009 12:44 PM
  • User-1630302068 posted

    Maybe you want the Damerau-Levenshtein distance?
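
    A minimal sketch of that suggestion, applied to words rather than characters. It assumes the optimal string alignment variant of the distance (insert, delete, substitute, or swap two adjacent words) and turns the distance into a percentage by dividing by the longer text's word count. WordEditDistance, SameWord, and UniquePercent are hypothetical names, and the percentage scaling is an assumption of this sketch, not something from this thread.

```vb
Imports System
Imports System.Globalization

Module EditDistanceDemo
    ' Case-insensitive word comparison (an assumption of this sketch).
    Function SameWord(ByVal a As String, ByVal b As String) As Boolean
        Return String.Equals(a, b, StringComparison.OrdinalIgnoreCase)
    End Function

    ' Word-level distance, optimal string alignment variant: the fewest word
    ' insertions, deletions, substitutions, and swaps of adjacent words
    ' needed to turn s1 into s2.
    Function WordEditDistance(ByVal s1 As String(), ByVal s2 As String()) As Integer
        Dim n As Integer = s1.Length
        Dim m As Integer = s2.Length
        ' d(i, j) = distance between the first i words of s1 and the first j words of s2.
        Dim d(n, m) As Integer

        For i As Integer = 0 To n
            d(i, 0) = i
        Next
        For j As Integer = 0 To m
            d(0, j) = j
        Next

        For i As Integer = 1 To n
            For j As Integer = 1 To m
                Dim cost As Integer = If(SameWord(s1(i - 1), s2(j - 1)), 0, 1)
                Dim best As Integer = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
                ' Two adjacent words swapped.
                If i > 1 AndAlso j > 1 AndAlso SameWord(s1(i - 1), s2(j - 2)) AndAlso SameWord(s1(i - 2), s2(j - 1)) Then
                    best = Math.Min(best, d(i - 2, j - 2) + 1)
                End If
                d(i, j) = best
            Next
        Next

        Return d(n, m)
    End Function

    ' Distance expressed as a percentage of the longer text's word count.
    Function UniquePercent(ByVal text1 As String, ByVal text2 As String) As Single
        Dim separators As Char() = New Char() {" "c}
        Dim w1 As String() = text1.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim w2 As String() = text2.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim longest As Integer = Math.Max(w1.Length, w2.Length)
        If longest = 0 Then Return 0.0F
        Return 100.0F * CSng(WordEditDistance(w1, w2)) / CSng(longest)
    End Function

    Sub Main()
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        Console.WriteLine(String.Format(inv, "{0:F2}%", UniquePercent("i'm a big boy in a small world.", "i'm a boy in a world.")))   ' 25.00%
        Console.WriteLine(String.Format(inv, "{0:F2}%", UniquePercent("love running dogs", "dogs love running")))                     ' 66.67%
    End Sub
End Module
```

    Unlike the matching loops earlier in the thread, this measure gives the same value whichever text is entered first, so there is no need to compare twice and take the smaller result.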

    Tuesday, February 3, 2009 8:57 AM