Date: 2014-05-18

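Using a regex to read an HTML table
I read an HTML file into a variable.
How can I extract the information with a regex or some other approach? The output should look like this: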
id image
1 t3.jpg
2 t4.jpg

The HTML file is as follows:
HTML code


<table border="2" width=100%  >
     <th><p align="center">Id</p></th>
     <th><p align="center">image</p></th>

 <tr>
     <td>1</td>
    <td><a href="i3.jpg" target="_blank"><img src = "t3.jpg"></a></td>
</tr>
 <tr>
     <td>2</td>
    <td><a href="i3.jpg" target="_blank"><img src = "t4.jpg"></a></td>
</tr>

</table>





I'm not very familiar with regular expressions, so could someone experienced please post the code?

------Solution--------------------
C# code

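using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var html = @"<table border=""2"" width=100%  >
     <th><p align=""center"">Id</p></th>
     <th><p align=""center"">image</p></th>

 <tr>
     <td>1</td>
    <td><a href=""i3.jpg"" target=""_blank""><img src = ""t3.jpg""></a></td>
</tr>
 <tr>
     <td>2</td>
    <td><a href=""i3.jpg"" target=""_blank""><img src = ""t4.jpg""></a></td>
</tr>

</table>";
        // Match text that directly follows '>' or '<img src = "'
        // and runs up to the next '<' or '"'.
        var pattern = @"(?i)(?<=(?:>|<img\ssrc\s*=\s*""))[^<>\s]+(?=<|"")";
        var i = 0;
        foreach (Match m in Regex.Matches(html, pattern))
        {
            i++;
            Console.Write("{0}\t", m.Value);
            // The table has two columns, so end the line after every second value.
            if (i % 2 == 0) Console.WriteLine();
        }

        /* Output:
        Id    image
        1    t3.jpg
        2    t4.jpg
        */
    }
}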

------Solution--------------------
C# code

 string str = @"<table border=""2"" width=100%  >
     <th><p align=""center"">Id</p></th>
     <th><p align=""center"">image</p></th>

 <tr>
     <td>1</td>
    <td><a href=""i3.jpg"" target=""_blank""><img src = ""t3.jpg""></a></td>
</tr>
 <tr>
     <td>2</td>
    <td><a href=""i3.jpg"" target=""_blank""><img src = ""t4.jpg""></a></td>
</tr>

</table>
";
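        // Requires: using System.Text.RegularExpressions;
        // Response.Write assumes this code runs inside an ASP.NET page.
        // Each match is one <tr>: "num" captures the id cell, "url" the img src value.
        Regex reg = new Regex("(?is)<tr[^>]*?>.*?<td>(?<num>\\d+)</td>.*?<td><a[^>]*?><img[^>]*?\"(?<url>.*?)\"></a></td>.*?</tr>");
        foreach (Match item in reg.Matches(str))
        {
            Response.Write(string.Format("num:{0},url:{1}<hr/>", item.Groups["num"].Value, item.Groups["url"].Value));
        }
        Response.Write("--------------------------Column names below-----------------------------<br/>");
        // Each match is one <th>: "column" captures the header text inside <p>.
        foreach (Match item in Regex.Matches(str, "(?is)<th><p[^>]*?>(?<column>.*?)</p></th>"))
        {
            Response.Write(string.Format("column:{0}<hr/>", item.Groups["column"].Value));
        }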
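
For anyone who wants to try the named-group approach outside an ASP.NET page, here is a minimal console sketch. It reuses the two patterns from the answer above, written as verbatim strings. The names `TableParser`, `SampleHtml`, `ParseColumns` and `ParseRows` are hypothetical helpers introduced only for this illustration, and the patterns are still tied to this exact table layout, so treat it as a starting point rather than a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answers.
public static class TableParser
{
    // The sample table from the question.
    public const string SampleHtml = @"<table border=""2"" width=100%  >
     <th><p align=""center"">Id</p></th>
     <th><p align=""center"">image</p></th>

 <tr>
     <td>1</td>
    <td><a href=""i3.jpg"" target=""_blank""><img src = ""t3.jpg""></a></td>
</tr>
 <tr>
     <td>2</td>
    <td><a href=""i3.jpg"" target=""_blank""><img src = ""t4.jpg""></a></td>
</tr>

</table>";

    // Row pattern from the answer above: "num" is the id cell, "url" the img src value.
    private static readonly Regex RowPattern = new Regex(
        @"(?is)<tr[^>]*?>.*?<td>(?<num>\d+)</td>.*?<td><a[^>]*?><img[^>]*?""(?<url>.*?)""></a></td>.*?</tr>");

    // Header pattern from the answer above: "column" is the text inside <th><p>.
    private static readonly Regex HeaderPattern = new Regex(
        @"(?is)<th><p[^>]*?>(?<column>.*?)</p></th>");

    // Returns the header texts in document order.
    public static List<string> ParseColumns(string html)
    {
        var columns = new List<string>();
        foreach (Match m in HeaderPattern.Matches(html))
            columns.Add(m.Groups["column"].Value);
        return columns;
    }

    // Returns one (id, image) pair per table row.
    public static List<KeyValuePair<string, string>> ParseRows(string html)
    {
        var rows = new List<KeyValuePair<string, string>>();
        foreach (Match m in RowPattern.Matches(html))
            rows.Add(new KeyValuePair<string, string>(
                m.Groups["num"].Value, m.Groups["url"].Value));
        return rows;
    }
}

public static class Program
{
    public static void Main()
    {
        // Print the header line, then one tab-separated line per row.
        Console.WriteLine(string.Join("\t", TableParser.ParseColumns(TableParser.SampleHtml)));
        foreach (var row in TableParser.ParseRows(TableParser.SampleHtml))
            Console.WriteLine("{0}\t{1}", row.Key, row.Value);
    }
}
```

Returning the rows as a list instead of writing them straight to the response keeps the parsing separate from the output, so the same helper can feed a console, a page, or a test.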