
BeautifulSoup: .find(text=True) cannot find the text inside a table

.find(text=True) does not work for some of the text in a table. Here is my code:

import urllib
import urllib2
import cookielib
import re
import csv
import codecs
from bs4 import BeautifulSoup

listmain = 'http://gdemba.gicp.net:84/ListMain.asp'
header = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(listmain,headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

table = soup.find(id='Table11')
f = open('table.csv', 'w')
csv_writer = csv.writer(f)
td = re.compile('td')

client = ""
tag = ""
tel = ""
catalogue = ""
region = ""
client_type = ""
email = ""
creater = ""
department = ""
action = ""

for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 10:
        client = cells[0].find(text=True)
        tag = cells[1].find(text=True)
        tel = cells[2].find(text=True)
        catalogue = cells[3].find(text=True)
        region = cells[4].find(text=True)
        client_type = cells[5].find(text=True)
        email = cells[6].find(text=True)
        creater = cells[7].find(text=True)
        department = cells[8].find(text=True)
        action = cells[9].find(text=True)

    csv_writer.writerow([x.encode('utf-8') for x in [client, tag, tel, catalogue, region, client_type, email, creater, department, action]])

f.close()

Here is one of the rows to be processed:

<tr class="ListTableRow" id="Row0" onclick="javascript:setRowFocus(this,false,0);FirstDataFormat('0000008688')" ondblclick="viewcoinfo('interunit','0000008688','{A31618B2-90CC-456F-A2E7-4C5B0D577E25}')">
<td nowrap=""> <span id="spanshare0000008688"></span>深圳营业部</td>
<td id="0000008688sign" nowrap=""> 福田</td>
<td nowrap=""> 0755-66666666</td>
<td nowrap=""> 手机配件</td>
<td nowrap=""> 深圳市</td>
<td nowrap=""> 普通客户</td>
<td nowrap=""> <span class="BlueText" onclick="javascript:EmailTo('0000008688','123456@qq.com')" onmouseout="javascript:this.style.textDecoration=''" onmouseover="javascript:this.style.textDecoration='underline'">123456@qq.com</span></td>
<td nowrap=""> 信息资源部</td>
<td nowrap=""> 信息资源部</td>
<td height="16" nowrap="" style="width: 78px"> </td>
</tr>

However, the text in the client name and Email cells cannot be extracted.
What is the cause? Is it related to the tags?

杨冬芳 2016-06-17 11:42:30
1 answer
  • 码农|Coder| Pythonista

    There is no need to use cells[0].find(text=True); just use cells[0].text directly.

    2019-07-17 19:42:22
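
A likely explanation for why .text works where .find(text=True) does not: find(text=True) returns only the first text node inside a cell. In the client name and Email cells of the sample row, that first node is the single space before the <span>, so the visible text is never reached. The sketch below reproduces this with three cells copied from the sample row. It makes some assumptions that are not in the question: the event-handler attributes are trimmed, it runs on Python 3, it uses the built-in "html.parser", and it uses string=True, the newer spelling of text=True (Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

# Three cells from the sample row in the question. The onclick/onmouseover
# attributes are trimmed for brevity (an assumption; they do not affect the text).
html = (
    '<table><tr>'
    '<td nowrap=""> <span id="spanshare0000008688"></span>深圳营业部</td>'
    '<td nowrap=""> 福田</td>'
    '<td nowrap=""> <span class="BlueText">123456@qq.com</span></td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# find(string=True) returns only the FIRST text node of each cell.
# For the first and third cells that node is the space before the <span>.
first_nodes = [cell.find(string=True) for cell in cells]

# get_text() concatenates every text node in the cell, including text
# nested inside the <span>; strip=True trims surrounding whitespace.
full_texts = [cell.get_text(strip=True) for cell in cells]

for first, full in zip(first_nodes, full_texts):
    print(repr(first), "->", repr(full))
```

cells[0].text, as suggested in the answer, returns the same concatenated text as get_text() but keeps the leading space; get_text(strip=True) is one way to drop it before writing the row to the CSV file.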