Date: 2014-05-17

Scraping information from a web page table


The page source is as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML><HEAD><TITLE></TITLE>
<META content="text/html; charset=GBK" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.7601.18106"></HEAD>
<BODY>
<FORM method=post name=pusManageForm action=pus.do><INPUT type=hidden 
name=method> <INPUT value=15647695 type=hidden name=sid> <INPUT value=2 
type=hidden name=partCount> 
<TABLE width="100%" align=center>
  <TBODY>
  <TR>
    <TD>
      <TABLE border=0 width="100%">
        <TBODY>
        <TR>
          <TD width=10>&nbsp; </TD>
          <TD>
            <TABLE border=0 cellSpacing=1 cellPadding=0 width="95%" 
align=center>
              <TBODY>
              <TR>
                <TD height=40 align=left><B><FONT color=rgb(0,0,20) 
                  size=2>aaaaaa</FONT></B> <BR><B><FONT 
                  color=rgb(0,0,20) size=2>aaaaaa</FONT></B> </TD></TR>
              <TR>
                <TD height=40 align=left><FONT color=rgb(0,0,20) 
                  size=1>aaaaaa</FONT> <FONT color=rgb(0,0,20) 
                  size=1>xxxx(aaaaaa)</FONT> </TD></TR>
              <TR>