Reading HTML with WebBrowser1.DocumentText

  • Question

  • WebBrowser1.DocumentText returns the entire HTML document, but how can I read just one value from that HTML?

    For example, the HTML at http://www.xxx.com/ is as follows:

    Code Snippet
    <html>
    <head>
    <title>Welcome</title>
    </head>
    <body>
    <table>
    <tr><td>Welcome to <b>www.xxx.com</b></td></tr>
    <tr><td><img src="logo.jpg"></td></tr>
    <tr><td style="font-size: 8pt">Total Guest: 12345</td></tr>
    </table>
    </body>
    </html>

    My current program is as follows:

    Code Snippet

    Public Class Form1

        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            WebBrowser1.Navigate(TextBox1.Text)
        End Sub

        Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
            MessageBox.Show(WebBrowser1.DocumentText)
        End Sub
    End Class

    How should I change this if I only want to read the number after Total Guest?

    March 27, 2008, 2:59 PM

Answers

  • Use Regex to do this. For reference:

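    // Requires: using System.Text.RegularExpressions;
    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        string s = webBrowser1.DocumentText;
        // Capture the run of digits that follows "Total Guest: "
        Match m = Regex.Match(s, @"Total Guest: (\d+)");
        int guestCount;

        if (m.Success)
        {
            // TryParse returns false (instead of throwing) if the digits do not fit in an int
            if (!int.TryParse(m.Groups[1].Value, out guestCount))
            {
                guestCount = -1;
            }
        }
        else
        {
            // No match: use -1 as the failure marker
            guestCount = -1;
        }

        if (guestCount == -1)
        {
            MessageBox.Show("Extraction failed");
        }
        else
        {
            MessageBox.Show("Total Guest: " + guestCount);
        }
    }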

    March 27, 2008, 3:28 PM
  • Here is a version converted back to VB, for everyone's reference

    Code Snippet

    Imports System.Text.RegularExpressions

    Public Class Form1

        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            WebBrowser1.Navigate(TextBox1.Text)
        End Sub

        Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
            Dim s As String = WebBrowser1.DocumentText
            Dim m As Match = Regex.Match(s, "Total Guest: (\d+)")
            Dim guestCount As Integer

            If m.Success Then
                If Integer.TryParse(m.Groups(1).Value, guestCount) = False Then
                    guestCount = -1
                End If
            Else
                guestCount = -1
            End If

            If guestCount = -1 Then
                MessageBox.Show("Extraction failed")
            Else
                ' Use & to concatenate: with +, VB tries to convert the string to a number and fails
                MessageBox.Show("Total Guest: " & guestCount)
            End If
        End Sub
    End Class

    March 27, 2008, 3:44 PM

All replies

  • Solved. Thanks, boss!

    March 27, 2008, 3:31 PM
  • A question: what does \d+ mean in Total Guest: (\d+)?

    And this statement:

    If Integer.TryParse(m.Groups(1).Value, guestCount) = False Then
        guestCount = -1
    End If

    Could you explain it?

    March 27, 2008, 4:29 PM
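  • \d matches a digit (0-9), and + is a quantifier meaning one or more...

    For more on regular expressions, see: .NET Framework Regular Expressions

    It also explains how to work with the related Match and Group classes.

    March 28, 2008, 9:55 AM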
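
    The reply above covers \d+ but not the TryParse statement, so here is a minimal console sketch of how the two pieces fit together. It is only an illustration: the module name RegexDemo and the helper ExtractGuestCount are hypothetical and not from the thread, and the input strings are made up. Integer.TryParse returns False instead of throwing when the captured text cannot be stored in an Integer, which is why the thread's code falls back to -1 in that case.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexDemo
    ' Hypothetical helper (not from the thread): returns the number that
    ' follows "Total Guest: ", or -1 when no usable number is found.
    Function ExtractGuestCount(ByVal html As String) As Integer
        ' \d matches one digit, + means "one or more", and the parentheses
        ' capture the matched digits as group 1.
        Dim m As Match = Regex.Match(html, "Total Guest: (\d+)")
        Dim guestCount As Integer

        ' TryParse writes the parsed value into guestCount and returns True;
        ' it returns False (no exception) if the text is not a valid Integer.
        If m.Success AndAlso Integer.TryParse(m.Groups(1).Value, guestCount) Then
            Return guestCount
        End If
        Return -1
    End Function

    Sub Main()
        Console.WriteLine(ExtractGuestCount("<td>Total Guest: 12345</td>"))       ' 12345
        Console.WriteLine(ExtractGuestCount("<td>Total Guest: none</td>"))        ' -1 (no digits, no match)
        Console.WriteLine(ExtractGuestCount("<td>Total Guest: 99999999999</td>")) ' -1 (too large for Integer)
    End Sub
End Module
```

    The third call shows why the TryParse check is still needed after a successful match: \d+ accepts any number of digits, but an Integer can only hold values up to 2147483647.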