Urgent help: using regular expressions to extract hyperlinks and replace text

  • Question

  • The code snippet is as follows: <div class="mod_interact_main">
          <div class="mudule_comment_cont">
           <a href="http://qz.qq.com/51062169/home"> 明明 </a>
           <time datetime="1年前">1年前</time>
           <a href="javascript:;" id="2_link_reply_1268192440_1" class="link_reply" onclick="QOM.FP.showCommentBox('1268192440_1',2);return false;">回复</a>
           <a href="javascript:;" id="2_link_hide_1268192440_1" class="link_reply none" onclick="QOM.FP.hideCommentBox();return false;">收起回复</a>
          </div>
          <div class="mudule_comment_detail">  希望能增加留言和相册备份  </div>
          
          <ol class="sub_comment " id="2_sub_comment_1268192440_1">
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo2.store.qq.com/qzone/51062169/51062169/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/51062169/home">  明明 </a>
               <time datetime="1年前">回复 1年前</time>
              </div>
              <div class="mudule_comment_detail">  还有说说等
      </div>
             </div>
            </li>
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo1.store.qq.com/qzone/405797768/405797768/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/405797768/home">    非狐 </a>
               <time datetime="1年前">回复 1年前</time>
              </div>
              <div class="mudule_comment_detail">  谢谢你的建议,正在开发中,敬请期待<img src="http://qzs.qq.com/qzone/em/e181.gif"/>  </div>
             </div>
            </li>
          
       
        <li class="mod_interact">
         <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo2.store.qq.com/qzone/59991021/59991021/30" alt="pic" /></div>
         <div class="mod_interact_main">
          <div class="mudule_comment_cont">
           <a href="http://qz.qq.com/59991021/home"> ♂Jay_小翟♀ </a>
           <time datetime="05月01日 02:24">05月01日 02:24</time>
           <a href="javascript:;" id="2_link_reply_1268192440_3" class="link_reply" onclick="QOM.FP.showCommentBox('1268192440_3',2);return false;">回复</a>
           <a href="javascript:;" id="2_link_hide_1268192440_3" class="link_reply none" onclick="QOM.FP.hideCommentBox();return false;">收起回复</a>
          </div>
          <div class="mudule_comment_detail">  远程服务器返回错误:(404)未找到
    。。。。
    什么情况?  </div>
          
          <ol class="sub_comment " id="2_sub_comment_1268192440_3">
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo1.store.qq.com/qzone/405797768/405797768/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/405797768/home">    非狐 </a>
               <time datetime="06月07日 20:22">回复 06月07日 20:22</time>
              </div>
              <div class="mudule_comment_detail">  请下载最新版本软件,并在参数里面设置为完全备份即可~  </div>
             </div>
            </li>
           
          </ol>
       
        <li class="mod_interact">
         <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo2.store.qq.com/qzone/59991021/59991021/30" alt="pic" /></div>
         <div class="mod_interact_main">
          <div class="mudule_comment_cont">
           <a href="http://qz.qq.com/59991021/home"> ♂Jay_小翟♀ </a>
           <time datetime="06月07日 21:11">06月07日 21:11</time>
           <a href="javascript:;" id="2_link_reply_1268192440_6" class="link_reply" onclick="QOM.FP.showCommentBox('1268192440_6',2);return false;">回复</a>
           <a href="javascript:;" id="2_link_hide_1268192440_6" class="link_reply none" onclick="QOM.FP.hideCommentBox();return false;">收起回复</a>
          </div>
          <div class="mudule_comment_detail">  新版本?是这个吗?[url=http://dl.dbank.com/c0ttdot198]http://dl.dbank.com/c0ttdot198[/url]
    不行啊  </div>
          
          <ol class="sub_comment " id="2_sub_comment_1268192440_6">
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo1.store.qq.com/qzone/405797768/405797768/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/405797768/home">    非狐 </a>
               <time datetime="06月08日 10:15">回复 06月08日 10:15</time>
              </div>
              <div class="mudule_comment_detail">  请到http://www.sbys.org.cn下载最新版本~  </div>
             </div>
            </li>
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo2.store.qq.com/qzone/59991021/59991021/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/59991021/home">  ♂Jay_小翟♀ </a>
               <time datetime="06月08日 10:34">回复 06月08日 10:34</time>
              </div>
              <div class="mudule_comment_detail">  这个域名和主机是哪家的?价格多少呀?用着怎样?  </div>
             </div>
            </li>
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo2.store.qq.com/qzone/59991021/59991021/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/59991021/home">  ♂Jay_小翟♀ </a>
               <time datetime="06月08日 16:11">回复 06月08日 16:11</time>
              </div>
              <div class="mudule_comment_detail">  我表示 我没找到下载的地方  </div>
             </div>
            </li>
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo1.store.qq.com/qzone/405797768/405797768/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/405797768/home">    非狐 </a>
               <time datetime="06月09日 09:45">回复 06月09日 09:45</time>
              </div>
              <div class="mudule_comment_detail">  直接加我的QQ吧~  </div>
             </div>
            </li> 
          </ol>
          
       
        <li class="mod_interact">
         <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo1.store.qq.com/qzone/94028028/94028028/30" alt="pic" /></div>
         <div class="mod_interact_main">
          <div class="mudule_comment_cont">
           <a href="http://qz.qq.com/94028028/home"> min&#39;er </a>
           <time datetime="07月07日 13:40">07月07日 13:40</time>
           <a href="javascript:;" id="2_link_reply_1268192440_10" class="link_reply" onclick="QOM.FP.showCommentBox('1268192440_10',2);return false;">回复</a>
           <a href="javascript:;" id="2_link_hide_1268192440_10" class="link_reply none" onclick="QOM.FP.hideCommentBox();return false;">收起回复</a>
          </div>
          <div class="mudule_comment_detail">  <img src="/qzone/em/e106.gif">我觉得,如果备份出来的日志有时间信息会更好  </div>
          
          <ol class="sub_comment " id="2_sub_comment_1268192440_10">
           
            <li class="mod_interact">
             <div class="mod_interact_avatar"><span class="avatar_round"></span><img src="http://qlogo1.store.qq.com/qzone/405797768/405797768/30" alt="pic" /></div>
             <div class="mod_interact_main">
              <div class="mudule_comment_cont">
               <a href="http://qz.qq.com/405797768/home">    非狐 </a>
               <time datetime="07月08日 11:15">回复 07月08日 11:15</time>
              </div>
              <div class="mudule_comment_detail">  好,马上就更新,谢谢你<img src="http://qzs.qq.com/qzone/em/e181.gif"/>  </div>
             </div>
            </li> 
      </div>
     </div>
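    I want to extract the name inside each hyperlink, for example “非狐”, together with the content that follows it (that is, the comment text).
    One more thing: I want to replace something like [img]love[/img] with love.gif, but I don't know how. Any pointers from the experts would be much appreciated. Thanks!
    July 25, 2011, 8:51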

Answers

  • Because your content comes from a web page, I suggest handling it directly with the WebBrowser control:

    Here is an example you can use as a reference:

      Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Try
          ' Collect every <a> element once the page has finished loading.
          Dim doc As HtmlDocument = WebBrowser1.Document
          Dim links As HtmlElementCollection = doc.GetElementsByTagName("a")
          For Each link As HtmlElement In links
            ' Print the link target, then the visible text (the commenter's name).
            Console.WriteLine(link.GetAttribute("href"))
            Console.WriteLine(link.InnerText)
          Next
        Catch ex As Exception
          MsgBox(ex.Message)
        End Try
      End Sub
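
    The question also asks for the comment text, which sits in the <div class="mudule_comment_detail"> elements rather than in the links. A rough sketch of reading those through the same WebBrowser document follows. PrintComments is a hypothetical helper name, and the sketch assumes the page keeps the class name shown in the question and that the control exposes the class attribute as "className", as the IE-based DOM usually does.

    ```vbnet
    Imports System
    Imports System.Windows.Forms

    Public Module CommentReader
        ' Hypothetical helper: prints the text of every comment <div> in a loaded page.
        Public Sub PrintComments(ByVal doc As HtmlDocument)
            For Each div As HtmlElement In doc.GetElementsByTagName("div")
                ' Assumption: the class attribute is read as "className" in this DOM.
                If div.GetAttribute("className") = "mudule_comment_detail" Then
                    ' InnerText can be Nothing for an empty element.
                    Dim content As String = If(div.InnerText, "")
                    Console.WriteLine(content.Trim())
                End If
            Next
        End Sub
    End Module
    ```

    It could be called from the DocumentCompleted handler as PrintComments(WebBrowser1.Document).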
    

    The other task does not need a regular expression at all: just delete every [img] and replace every [/img] with .gif
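
    A minimal sketch of that plain string replacement, assuming the markers are written exactly as [img] and [/img] as in the question (the module name and sample string are only for illustration):

    ```vbnet
    Imports System

    Module ReplaceWithoutRegex
        Sub Main()
            Dim source As String = "[img]love[/img]"

            ' Drop the opening marker, then turn the closing marker into the extension.
            Dim result As String = source.Replace("[img]", "").Replace("[/img]", ".gif")

            Console.WriteLine(result) ' love.gif
        End Sub
    End Module
    ```

    String.Replace is case-sensitive, so markers such as [IMG] would be left untouched by this version.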

    Best regards,

     


    Mike Feng [MSFT]

    July 27, 2011, 12:08
    Moderator
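  • Let me offer an approach that does not use WebBrowser.

    First match <a href=".+?">.+?</a>, then replace the <a href=".+?"> and </a> parts with "" so that only the link text remains. The quantifiers are lazy (.+?) so that a single match cannot run across several links.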
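
    As a sketch of the same idea done in one pass, the pattern below captures the name and the comment with named groups instead of stripping the tags afterwards. It assumes the markup keeps the shape shown in the question: a profile link whose href starts with http, followed later by a <div class="mudule_comment_detail">. The module name and the shortened sample string are only for illustration.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions
    Imports Microsoft.VisualBasic

    Module ExtractComments
        Sub Main()
            ' A shortened stand-in for the page source in the question.
            Dim html As String =
                "<div class=""mudule_comment_cont"">" & vbLf &
                " <a href=""http://qz.qq.com/405797768/home"">    非狐 </a>" & vbLf &
                " <time datetime=""06月09日 09:45"">回复 06月09日 09:45</time>" & vbLf &
                "</div>" & vbLf &
                "<div class=""mudule_comment_detail"">  直接加我的QQ吧~  </div>"

            ' href="http..." skips the javascript:; reply links.
            ' [\s\S]*? lazily crosses line breaks up to the next comment div.
            Dim pattern As String =
                "<a href=""http[^""]*"">(?<name>[^<]*)</a>" &
                "[\s\S]*?<div class=""mudule_comment_detail"">(?<comment>[\s\S]*?)</div>"

            For Each m As Match In Regex.Matches(html, pattern)
                Console.WriteLine(m.Groups("name").Value.Trim() & ": " &
                                  m.Groups("comment").Value.Trim())
            Next
        End Sub
    End Module
    ```

    The captured comment still contains any inline markup such as <img> tags, so those would need a further pass if only plain text is wanted.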

     

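    For the img part, match \[img\].+?\[/img\] (the square brackets must be escaped, otherwise the regex engine reads them as character classes). Once you have a match, replace [img] and [/img] with nothing to get the content, then modify it and, where the tags are still wanted, add the removed [img] and [/img] back at the start and end.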
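
    If the goal is simply to turn [img]love[/img] into love.gif, the match and the rewrite can be combined in a single Regex.Replace call. This is a sketch under that assumption; the module name and sample string are only for illustration.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Module ReplaceImgTags
        Sub Main()
            Dim source As String = "I [img]love[/img] this, [img]smile[/img]"

            ' \[ and \] match literal brackets; (.+?) lazily captures the name
            ' so that two tags on one line are handled separately.
            Dim result As String = Regex.Replace(source, "\[img\](.+?)\[/img\]", "$1.gif")

            Console.WriteLine(result) ' I love.gif this, smile.gif
        End Sub
    End Module
    ```

    In the replacement string, $1 stands for the text captured by the first group.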

     

    July 31, 2011, 16:16
