Date: 2014-05-18

Looking for a regular expression for HTML
The HTML looks like this:
<tr> <td width='20' class='hei14'>·</td> <td width='360'><a href=http://news.xinhuanet.com/travel/2007-05/17/content_6108964.htm target='_blank' class='hei14'>武夷山风景名胜区门票价格上调</a><span class='sj'>(05-17)</span></td> </tr>

I need to extract:
1. http://news.xinhuanet.com/travel/2007-05/17/content_6108964.htm
2. 武夷山风景名胜区门票价格上调
3. 05-17

------Solution--------------------
Is the format fixed? I assume you want to extract several rows at once, so try this:

// Requires: using System.Text.RegularExpressions;
string yourStr = ...........;
// Named groups: url = href value (quoted or unquoted), text = link text, time = date inside the parentheses.
MatchCollection mc = Regex.Matches(yourStr, @"<tr[^>]*?>[\s\S]*?<a\s+href=([""']?)(?<url>[^""'\s]*)\1?[^>]*?>(?<text>[^<]*?)</a>\s*<span[^>]*?>\((?<time>[^<\)]*?)\)</span></td>\s*</tr>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    // Append one captured field per line for each matched row.
    richTextBox2.Text += m.Groups["url"].Value + "\n";
    richTextBox2.Text += m.Groups["text"].Value + "\n";
    richTextBox2.Text += m.Groups["time"].Value + "\n";
}
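
If the real page has whitespace around the link text or the date, the pattern above will either capture the padding or fail to match. The sketch below is one possible variant, not the only way to do it: it runs as a console program instead of writing to richTextBox2, adds `\s*` around the text and the parentheses, and excludes `>` from the url group so an unquoted href that is the last attribute does not swallow the closing bracket. The sample string, including its extra spaces, is an assumption made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical sample row: the row from the question with padding added
        // around the link text and the date to exercise the \s* tolerance.
        string html =
            "<tr><td width='20' class='hei14'>·</td>" +
            "<td width='360'><a href=http://news.xinhuanet.com/travel/2007-05/17/content_6108964.htm" +
            " target='_blank' class='hei14'> 武夷山风景名胜区门票价格上调 </a>" +
            " <span class='sj'> (05-17) </span></td></tr>";

        // Same structure as the answer's pattern; \s* added around the captured
        // text and date, and '>' excluded from the url character class.
        string pattern =
            @"<tr[^>]*>[\s\S]*?<a\s+href=([""']?)(?<url>[^""'\s>]*)\1?[^>]*>\s*(?<text>[^<]*?)\s*</a>" +
            @"\s*<span[^>]*>\s*\((?<time>[^<)]*)\)\s*</span>\s*</td>\s*</tr>";

        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // Each match is one table row; print the three named groups.
            Console.WriteLine(m.Groups["url"].Value);
            Console.WriteLine(m.Groups["text"].Value);
            Console.WriteLine(m.Groups["time"].Value);
        }
    }
}
```

Like the original, this only works while the markup keeps this fixed shape. If the row layout changes, the pattern has to change with it.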