Get Url in a text file

  • Question

  • Hi

    I'm working on an application that reads every image file URL (jpg, gif, ...) from text files, uploads each image over FTP, replaces the URL with a new link, and saves the file.

    This is my code so far (FORM1_FTP is the FTP uploader sub):

    Try
        Dim mydir As String = "C:/"
        Dim savetxt As New List(Of String)
        For Each txtfile As String In System.IO.Directory.GetFiles(mydir, "*.txt") 'only picks up .txt files; can be adjusted to take all files
            For Each line As String In System.IO.File.ReadAllLines(txtfile, System.Text.Encoding.Default)
                '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                'It will just replace the URL in the line if it contains one. It needs to keep the rest of the line. It needs to detect only image URLs.
                'Gets the URL of image files, then FORM1_FTP(url)
                'Then saves as img.xxxx.com/imagefilename.xxx
                '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            Next
            System.IO.File.WriteAllLines(txtfile, savetxt.ToArray, System.Text.Encoding.UTF8)
            savetxt.Clear()
        Next
    Catch ex As Exception
        MsgBox("Error!" + vbNewLine + "Possible cause:" + vbNewLine + "No folder selected", MsgBoxStyle.Critical, "Error")
        Exit Sub
    End Try
    End Sub


    Thursday, May 2, 2013 2:15 PM

Answers

  • Thanks for your help. I found a solution. This is it:

    Dim regex As Regex = New Regex( _
          "(?<=http://).*?(?=\.png)", _
        RegexOptions.Multiline _
        Or RegexOptions.CultureInvariant _
        Or RegexOptions.IgnorePatternWhitespace _
        Or RegexOptions.Compiled _
        )
    Dim m As Match = regex.Match(InputText)

    The same code is used for the other image formats and for the filename.
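Rather than repeating the pattern once per extension, a single alternation with named groups might cover all formats and return the filename as well. This is only a sketch under assumptions: the module name `ImageUrlFinder`, the function `FindImageUrls`, and the extension list are made up for illustration, URLs are assumed to contain no whitespace, and anything after the extension (such as a query string) is ignored.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ImageUrlFinder
    ' Hypothetical pattern: "http://", then non-whitespace text up to an image extension.
    ' Group "url" is the whole link; group "file" is the part after the last "/".
    ' The trailing lookahead rejects matches that stop in the middle of a longer name.
    Private ReadOnly ImageUrl As New Regex( _
        "(?<url>http://\S*?(?<file>[^/\s]+\.(?:png|jpe?g|gif|bmp|tiff?)))(?![\w/]|\.\w)", _
        RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)

    ' Returns (url, file name) pairs for every image link found in the text.
    Function FindImageUrls(text As String) As List(Of KeyValuePair(Of String, String))
        Dim found As New List(Of KeyValuePair(Of String, String))
        For Each m As Match In ImageUrl.Matches(text)
            found.Add(New KeyValuePair(Of String, String)(m.Groups("url").Value, m.Groups("file").Value))
        Next
        Return found
    End Function

    Sub Main()
        ' Prints: http://anotherdomain.com/somefolder/apicture.png -> apicture.png
        For Each pair As KeyValuePair(Of String, String) In FindImageUrls("see http://anotherdomain.com/somefolder/apicture.png, thanks")
            Console.WriteLine(pair.Key & " -> " & pair.Value)
        Next
    End Sub
End Module
```

Unlike the lookbehind/lookahead version, the match here keeps the "http://" prefix and the extension, so the value of the "url" group can be passed straight to an uploader or to String.Replace.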



    • Marked as answer by xboost Friday, May 3, 2013 6:55 PM
    • Edited by xboost Friday, May 3, 2013 7:00 PM
    Friday, May 3, 2013 6:55 PM

All replies

  • Can you post a few sample lines of the input text? Replace anything confidential with XXX.

    jdweng

    Thursday, May 2, 2013 3:01 PM
  • Just a simple txt

    blablabla bla bla 

    bla bla http:/xx......jpg

    bla bla

    http://yy.......png


    Thursday, May 2, 2013 4:12 PM
  • So the URL can appear in the middle of the line but always starts with "http://"? Or do some of the URLs start with "file://" or "FTP://"? I may need to use the Regex class to extract the URL. Just checking.


    jdweng

    Thursday, May 2, 2013 4:23 PM
  • Only starts with http://
    Thursday, May 2, 2013 4:33 PM
  • See if the code below helps

    Sub Main()
            Try
                Dim mydir As String = "C:/"
                Dim savetxt As New List(Of String)
                For Each txtfile As String In System.IO.Directory.GetFiles(mydir, "*.txt") 'only picks up .txt files; can be adjusted to take all files
                    For Each url As String In System.IO.File.ReadAllLines(txtfile, System.Text.Encoding.Default)
                        'get all the characters after the last forward slash "/"
                        Dim baseFileName As String = url.Substring(url.LastIndexOf("/") + 1)
                        If baseFileName.StartsWith("img") = True Then
                            FORM1_FTP(baseFileName)
                            '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                            'It will just replace the URL in the line if it contains one. It needs to keep the rest of the line. It needs to detect only image URLs.
                            'Gets the URL of image files, then FORM1_FTP(url)
                            'Then saves as img.xxxx.com/imagefilename.xxx
                            '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                        End If
                        savetxt.Add(url) 'keep every line so the file is not emptied when it is saved
                    Next
                    System.IO.File.WriteAllLines(txtfile, savetxt.ToArray, System.Text.Encoding.UTF8)
                    savetxt.Clear()
                Next
            Catch ex As Exception
                MsgBox("Error!" + vbNewLine + "Possible cause:" + vbNewLine + "No folder selected", MsgBoxStyle.Critical, "Error")
                Exit Sub
            End Try
        End Sub


    jdweng

    Thursday, May 2, 2013 5:35 PM
  • But this code will only get the URL if it starts with img (img.xxx.xxx/xxx.png).

    The image can be on the main domain or on another subdomain.

    Also, I need to keep the rest of the line, before and after the URL, and put the new URL between them before saving (beforetext + "resim.xxxx.com/" + filename + aftertext).

    resim.xxx.com is the HTTP address of the FTP server, so the reader will see the new URL.

    • Edited by xboost Thursday, May 2, 2013 5:54 PM
    Thursday, May 2, 2013 5:51 PM
  • You are right. I didn't think about that. Does this make more sense? Note that I now pass url into FORM1_FTP.

        Sub Main()
            Try
                Dim mydir As String = "C:/"
                Dim savetxt As New List(Of String)
                For Each txtfile As String In System.IO.Directory.GetFiles(mydir, "*.txt") 'only picks up .txt files; can be adjusted to take all files
                    For Each url As String In System.IO.File.ReadAllLines(txtfile, System.Text.Encoding.Default)
                        'get all the characters after the last forward slash "/"
                        Dim lastSlash As Integer = url.LastIndexOf("/")
                        Dim baseFileName As String = url.Substring(lastSlash + 1)
                        If baseFileName.StartsWith("img") = True Then
                            FORM1_FTP(url)
                        ElseIf lastSlash > 0 Then
                            'get all the characters after the second-to-last forward slash (the parent segment)
                            Dim parentBaseFileName As String = url.Substring(url.LastIndexOf("/", lastSlash - 1) + 1)
                            If parentBaseFileName.StartsWith("img") = True Then
                                FORM1_FTP(url)
                                '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                                'It will just replace the URL in the line if it contains one. It needs to keep the rest of the line. It needs to detect only image URLs.
                                'Gets the URL of image files, then FORM1_FTP(url)
                                'Then saves as img.xxxx.com/imagefilename.xxx
                                '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                            End If
                        End If
                        savetxt.Add(url) 'keep every line so the file is not emptied when it is saved
                    Next
                    System.IO.File.WriteAllLines(txtfile, savetxt.ToArray, System.Text.Encoding.UTF8)
                    savetxt.Clear()
                Next
            Catch ex As Exception
                MsgBox("Error!" + vbNewLine + "Possible cause:" + vbNewLine + "No folder selected", MsgBoxStyle.Critical, "Error")
                Exit Sub
            End Try
        End Sub


    jdweng

    Thursday, May 2, 2013 6:18 PM
  • But what I need is:

    get the filename as a string from the URL

    get urloffile as a string (the extension, not the URL, can be used to identify images)

    FORM1_FTP(urloffile)

    then:

    if line.contains(urloffile) then
    line=line.replace(urloffile,"http://resim.xxx.com/"+filename)
    end if
    savetxt.add(line)
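The replace step in the pseudocode above could be sketched roughly as follows, once the URL has been located by some other means. The module name `UrlRewriter` and the function `ReplaceImageUrl` are hypothetical, the host `http://resim.xxx.com/` is the placeholder from the post, and the upload through FORM1_FTP is left to the caller.

```vbnet
Imports System
Imports System.IO

Module UrlRewriter
    ' Swaps one known image URL inside a line for newBase + file name,
    ' leaving the text before and after the URL untouched.
    Function ReplaceImageUrl(line As String, urlOfFile As String, newBase As String) As String
        ' Uri.LocalPath is the path part of the URL, e.g. "/thisimage.jpg".
        Dim fileName As String = Path.GetFileName(New Uri(urlOfFile).LocalPath)
        If line.Contains(urlOfFile) Then
            line = line.Replace(urlOfFile, newBase & fileName)
        End If
        Return line
    End Function

    Sub Main()
        Dim line As String = "bla bla http://somedomain.com/thisimage.jpg bla"
        ' Prints: bla bla http://resim.xxx.com/thisimage.jpg bla
        Console.WriteLine(ReplaceImageUrl(line, "http://somedomain.com/thisimage.jpg", "http://resim.xxx.com/"))
    End Sub
End Module
```

The returned string is what would be added to savetxt, whether or not a replacement took place.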


    • Edited by xboost Thursday, May 2, 2013 6:45 PM
    Thursday, May 2, 2013 6:43 PM
  • If the URLs are all well formatted and encoded, you can find the index of "HTTP://" and then the index of the next whitespace character; together they give you the starting point and length of the URL substring. You can then construct a new Uri instance from that substring and check the file extension. If it matches one of your desired extensions (probably stored in an array, list, or concatenated string), you can execute the Replace method as per your pseudocode above.

    If the URLs are not well formatted or not encoded (that is, they may contain a space in the name of the file instead of a %20), then it will take more knowledge of the text file format to figure out where the URL ends.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Thursday, May 2, 2013 7:27 PM
  • They are formatted correctly and use http. But how can I get the URL from a string? (It will work like the code in the first post.)

    To summarize, I need to find the URL in a string and then check its extension. If it is OK (an image), I need two strings: the URL and the filename.


    • Edited by xboost Thursday, May 2, 2013 7:47 PM
    Thursday, May 2, 2013 7:35 PM
  • Here's an example.  Note in the comments how your file contents may affect the actual code that you need to write:

    Public Class Form1
        Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
            Dim testString As String = <text>This is a line of text.
    And then we have a link to http://somedomain.com/thisimage.jpg on the second line.
    Another line of worthless text follows.
    And then we find another link to http://anotherdomain.com/somefolder/apicture.png on the last line.</text>.Value
    
            Dim imageExtensions As String() = {".JPG", ".PNG", ".BMP", ".TIF", ".GIF"} 'create a list of desired extensions
            For Each line As String In testString.Split(ControlChars.Lf) 'loop each line in some source
                'find index of "http://", ignoring case so the line itself keeps its original casing
                Dim first As Integer = line.IndexOf("http://", StringComparison.OrdinalIgnoreCase)
                If first > -1 Then 'If the text was found
                    Dim last As Integer = first + 1 'start looking for the end of the URL
                    Do While last < line.Length
                        'the first whitespace character will designate the end;
                        'will not work if a URL is followed by punctuation;
                        'more testing may be needed based on your text file structure
                        If Char.IsWhiteSpace(line(last)) Then
                            Exit Do
                        End If
                        last += 1
                    Loop
                    'Once URL position is known...
                    Dim url As New Uri(line.Substring(first, last - first)) 'construct URI from URL string
                    Dim fileName As String = System.IO.Path.GetFileName(url.LocalPath) 'Get file name
                    Dim extension As String = System.IO.Path.GetExtension(url.LocalPath).ToUpperInvariant() 'Get extension, uppercased for the comparison
                    If Array.IndexOf(imageExtensions, extension) >= 0 Then 'Compare extension (an empty extension never matches)
                        Dim newline As String = line.Substring(0, first) & "the new url" & line.Substring(last) 'replace with some other URL
                        MessageBox.Show(newline)
                    End If
                End If
            Next
        End Sub
    End Class


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    • Proposed as answer by Junelily Friday, May 3, 2013 10:58 PM
    Thursday, May 2, 2013 8:07 PM
  • I think using the Split function is the cleanest way of doing this task. See the code below.

        Sub Main()
            Try
                Dim mydir As String = "C:/"
                Dim savetxt As New List(Of String)
                For Each txtfile As String In System.IO.Directory.GetFiles(mydir, "*.txt") 'only picks up .txt files; can be adjusted to take all files
                    For Each url As String In System.IO.File.ReadAllLines(txtfile, System.Text.Encoding.Default)
                        'split the line on every forward slash "/"
                        Dim splitFileName() As String = url.Split("/"c)
                        Select Case splitFileName.Length
                            'http://xxx.xxx splits into "http:", "" and "xxx.xxx"
                            Case 3
                                If splitFileName(2).StartsWith("img") = True Then
                                    FORM1_FTP(url)
                                End If
                            Case Is > 3
                                'check both the file name (last part) and the host (third part)
                                If splitFileName(splitFileName.Length - 1).StartsWith("img") = True Or _
                                   splitFileName(2).StartsWith("img") Then
                                    FORM1_FTP(url)
                                End If
                        End Select
                        '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                        'It will just replace the URL in the line if it contains one. It needs to keep the rest of the line. It needs to detect only image URLs.
                        'Gets the URL of image files, then FORM1_FTP(url)
                        'Then saves as img.xxxx.com/imagefilename.xxx
                        '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                        savetxt.Add(url) 'keep every line so the file is not emptied when it is saved
                    Next
                    System.IO.File.WriteAllLines(txtfile, savetxt.ToArray, System.Text.Encoding.UTF8)
                    savetxt.Clear()
                Next
            Catch ex As Exception
                MsgBox("Error!" + vbNewLine + "Possible cause:" + vbNewLine + "No folder selected", MsgBoxStyle.Critical, "Error")
                Exit Sub
            End Try
        End Sub


    jdweng

    Thursday, May 2, 2013 9:49 PM
  • We simply don't know enough about the source text file to say what is best.

    For example, suppose one line is:

    Image http://somedomain.com/someimage.jpg uploaded on 5/2/2013

    Perhaps a simple rule like "find the last /" will work against the source file, or perhaps a more complex rule than I used would be necessary.  We just don't know.
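To make the concern concrete, here is a small sketch of how the "find the last /" rule behaves on the sample line above. The module name `LastSlashDemo` and the helper `AfterLastSlash` are made up for illustration; the rule itself is the Substring/LastIndexOf approach used earlier in the thread.

```vbnet
Imports System

Module LastSlashDemo
    ' The "everything after the last /" rule.
    Function AfterLastSlash(line As String) As String
        Return line.Substring(line.LastIndexOf("/") + 1)
    End Function

    Sub Main()
        ' URL alone on the line: the rule finds the file name.
        ' Prints: someimage.jpg
        Console.WriteLine(AfterLastSlash("http://somedomain.com/someimage.jpg"))

        ' Sample line with a date: the last "/" belongs to the date, not the URL.
        ' Prints: 2013
        Console.WriteLine(AfterLastSlash("Image http://somedomain.com/someimage.jpg uploaded on 5/2/2013"))
    End Sub
End Module
```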


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Thursday, May 2, 2013 10:10 PM
  • Reed: Read all the postings. xboost already said that wasn't the case.

    jdweng

    Friday, May 3, 2013 3:32 AM
  • What if I have other characters after the link? I think the URL can be recognised by its last four characters (e.g. .png or .jpg). What I mean is: find http://xxxxx.xxx/xxxx/xxx.png in the string and give me the string between http:// and .png (the URL), and the string between the last / before .png and .png (the filename). The same structure would apply to other file types such as tif, jpg, etc.
    Friday, May 3, 2013 5:31 PM
  • Final full code

    Imports System.Text.RegularExpressions
    
    Public Class Form3
    
        Private Sub Form3_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Try
                Dim mydir As String = Form2.TextBox2.Text
                Dim savetxt As New List(Of String)
                For Each txtfile As String In System.IO.Directory.GetFiles(mydir, "*.txt") 'only picks up .txt files; can be adjusted to take all files
                    For Each line As String In System.IO.File.ReadAllLines(txtfile, System.Text.Encoding.UTF8)
                        Dim regex As Regex = New Regex( _
          "(?<=]).*?(?=\[)", _
        RegexOptions.Multiline _
        Or RegexOptions.CultureInvariant _
        Or RegexOptions.IgnorePatternWhitespace _
        Or RegexOptions.Compiled _
        ) 'yup, i placed ][ near urls
                        Dim m As Match = regex.Match(line)
                        If (m.Success) Then
                            Dim cnttest As String = m.Value
                            If cnttest.Contains(".png") OrElse cnttest.Contains(".bmp") OrElse _
                               cnttest.Contains(".gif") OrElse cnttest.Contains(".tif") OrElse _
                               cnttest.Contains(".jpg") Then
                                ListBox1.Items.Add(m.Value) 'image URL
                            Else
                                ListBox3.Items.Add(m.Value) 'any other URL
                            End If
                        End If
                        savetxt.Add(line)
                    Next
                    System.IO.File.WriteAllLines(txtfile, savetxt.ToArray)
                    savetxt.Clear()
                    ftp()
                    ProgressBar1.Style = ProgressBarStyle.Blocks
                    ProgressBar1.Maximum = ListBox1.Items.Count
                Next
                Label3.Text = "Images found: " & ListBox1.Items.Count & " Other URLs found: " & ListBox3.Items.Count & " upload status is shown in the status bar..."
            Catch ex As Exception
                MsgBox("Error!" + vbNewLine + "Possible causes:" + vbNewLine + "No folder selected" + vbNewLine + "There is a problem with the internet connection", MsgBoxStyle.Critical, "Error")
                Exit Sub
            End Try
        End Sub
        Private Sub ftp()
            Do Until ListBox1.Items.Count = 0
                'ftp code
            Loop
        End Sub
    End Class


    • Edited by xboost Friday, May 3, 2013 7:16 PM
    Friday, May 3, 2013 7:14 PM
  • thank you for helping out the community!

    Friday, May 3, 2013 10:59 PM