Regular expressions: filtering a string

Answer

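  • That should be possible. I match text that has an <a> tag on both sides (without including the tags themselves) and take the characters in between.

    It is easier to see with English text:

    // requires: using System; using System.Text.RegularExpressions;
    string s = "<div class='breadcrumb' name='__Breadcrumb_pub'><a href='http://category.dangdang.com/all/?category_path=01.00.00.00.00.00' target='_blank' class='domain' name='__Breadcrumb_pub'><b class='domain'>Book</b></a>&nbsp;&gt;&nbsp;<a href='http://category.dangdang.com/all/?category_path=01.05.00.00.00.00' target='_blank' name='__Breadcrumb_pub'>BB</a>&nbsp;&gt;&nbsp;<a href='http://category.dangdang.com/all/?category_path=01.05.15.00.00.00' target='_blank' name='__Breadcrumb_pub'>AA</a>&nbsp;&gt;&nbsp;<span>BB</span></div>";
    // Lookbehind: the text must directly follow the ">" that closes an opening <a ...> tag.
    // Lookahead: the text must be directly followed by </a>.
    Regex reg = new Regex(@"(?<=<a.*?>)\w+(?=</a>)");
    foreach (Match item in reg.Matches(s))
    {
        Console.WriteLine(item.Value);
    }

    Result:

    BB

    AA

    "Book" is not matched because it is followed by </b> rather than </a>, and the final "BB" is not matched because it sits inside <span>.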


    May 14, 2013, 9:02
    Moderator

All replies

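  • Counting how many <a> tags there are would also work. The goal is to find out which category level the current product is in. Thanks in advance.
    May 14, 2013, 8:00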
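
    The tag-counting idea above could be sketched roughly as follows. This is only an illustration: it assumes the breadcrumb fragment has already been isolated from the rest of the page and that each opening <a> tag in it stands for one category level. The names BreadcrumbDepth and CountAnchors are hypothetical, and the href values are shortened to '#' for readability.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BreadcrumbDepth
{
    // Counts opening <a ...> tags; each one is assumed to be one category level.
    public static int CountAnchors(string html)
    {
        // "<a" must be followed by whitespace or ">" so that tags such as <abbr>
        // are not counted. Closing tags start with "</" and never match.
        return Regex.Matches(html, @"<a[\s>]", RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        // Shortened copy of the breadcrumb from the question (hrefs replaced by '#').
        string s = "<div class='breadcrumb'><a href='#'><b class='domain'>图书</b></a>"
                 + "&nbsp;&gt;&nbsp;<a href='#'>文学</a>"
                 + "&nbsp;&gt;&nbsp;<a href='#'>中国古代随笔</a>"
                 + "&nbsp;&gt;&nbsp;<span>商品详情</span></div>";

        Console.WriteLine(CountAnchors(s)); // prints 3
    }
}
```

    If the whole page is passed in instead of the breadcrumb fragment, every link on the page is counted, so the fragment needs to be cut out first.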
  • You can simply match the Chinese characters directly:

    // requires: using System; using System.Text.RegularExpressions;
    string s = "<div class='breadcrumb' name='__Breadcrumb_pub'><a href='http://category.dangdang.com/all/?category_path=01.00.00.00.00.00' target='_blank' class='domain' name='__Breadcrumb_pub'><b class='domain'>图书</b></a>&nbsp;&gt;&nbsp;<a href='http://category.dangdang.com/all/?category_path=01.05.00.00.00.00' target='_blank' name='__Breadcrumb_pub'>文学</a>&nbsp;&gt;&nbsp;<a href='http://category.dangdang.com/all/?category_path=01.05.15.00.00.00' target='_blank' name='__Breadcrumb_pub'>中国古代随笔</a>&nbsp;&gt;&nbsp;<span>商品详情</span></div>";
    // [\u4e00-\u9fa5]+ matches a run of common CJK ideographs.
    Regex reg = new Regex("(?<=<a.*?>)[\u4e00-\u9fa5]+(?=</a>)");
    foreach (Match item in reg.Matches(s))
    {
        Console.WriteLine(item.Value);
    }


    May 14, 2013, 8:04
    Moderator
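  • That did not capture it. The code above is only a small fragment of the page. Is it possible to match from “class='domain'>图书” through ">商品详情<", or through “<span>商品详情</span>”?
    May 14, 2013, 8:23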
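
    One possible way to capture everything from 图书 through 商品详情 is sketched below. It is not taken from the replies in this thread. It assumes the breadcrumb sits in a <div class='breadcrumb' ...> with single-quoted attributes and no nested <div>, and that each <a> or <span> inside it is one level. BreadcrumbText and ExtractLevels are hypothetical names, and the href values are shortened to '#'.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BreadcrumbText
{
    // Returns the visible text of every <a> and <span> inside the breadcrumb <div>.
    public static List<string> ExtractLevels(string html)
    {
        var levels = new List<string>();

        // Step 1: cut the breadcrumb <div> out of the page (assumes no nested <div>).
        Match div = Regex.Match(html, @"<div class='breadcrumb'[^>]*>(.*?)</div>",
                                RegexOptions.Singleline);
        if (!div.Success) return levels;

        // Step 2: each <a>...</a> or <span>...</span> pair is treated as one level.
        // \1 refers back to the tag name captured by (a|span).
        foreach (Match m in Regex.Matches(div.Groups[1].Value,
                                          @"<(a|span)\b[^>]*>(.*?)</\1>",
                                          RegexOptions.Singleline))
        {
            // Step 3: drop inner tags such as <b class='domain'> and keep the text.
            string text = Regex.Replace(m.Groups[2].Value, @"<[^>]+>", "").Trim();
            if (text.Length > 0) levels.Add(text);
        }
        return levels;
    }

    public static void Main()
    {
        // Shortened copy of the breadcrumb from the question, embedded in other markup.
        string page = "<p>other markup</p>"
                    + "<div class='breadcrumb' name='__Breadcrumb_pub'>"
                    + "<a href='#' class='domain'><b class='domain'>图书</b></a>"
                    + "&nbsp;&gt;&nbsp;<a href='#'>文学</a>"
                    + "&nbsp;&gt;&nbsp;<a href='#'>中国古代随笔</a>"
                    + "&nbsp;&gt;&nbsp;<span>商品详情</span></div>";

        List<string> levels = ExtractLevels(page);
        // Levels found, in order: 图书, 文学, 中国古代随笔, 商品详情
        // (how they render on screen depends on the console encoding).
        Console.WriteLine(string.Join(" > ", levels));
        Console.WriteLine(levels.Count); // prints 4
    }
}
```

    Unlike the lookaround pattern, this also picks up 图书, because the inner <b> tag is stripped after the <a> pair has been matched. For full pages with irregular markup, an HTML parser is usually more robust than regular expressions.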