How do I format the entire output (a list) of Beautiful Soup?

I am trying to format the entire output of my Beautiful Soup web scraper. The output looks like this:

AT-FVFX1BN7J1WK:Python 522672$ /Library/Frameworks/Python.framework/Versions/3.7/bin/python3 "/Users/522672/Desktop/Python/Scraper/Beautiful Soup/Python2.py"

[<div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="@-yet">@-yet</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ADDI-DATA">ADDI-DATA</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ADE-Werk">ADE-Werk</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="Adelmann Umwelt">Adelmann Umwelt</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="Ademco 1">Ademco 1</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="adesso">adesso</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ADITO Software">ADITO Software</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ADMOS Gleitlager">ADMOS Gleitlager</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ads-tec Industrial IT">ads-tec Industrial IT</div>
</div>, <div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ADVES">ADVES</div>
</div>]

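This is the raw output I get when I print company_name, but I don't know how to reduce company_name to just the company names. What I want is for printing company_name to give only a complete list of the names, such as "@-yet" or "ADDI-DATA".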

from bs4 import BeautifulSoup
import requests

url = 'https://www.vdma.org/en/mitglieder?p_p_lifecycle=2&p_p_resource_id=getPage&p_p_id=vdma2publicusers_WAR_vdma2publicusers&s=&page=5'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
company_name = soup.find_all('div', class_="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left")
company_website = soup.find_all('div', class_="col-xs-5 col-sm-5 col-md-5 col-lg-5 text-right")
company_adress = soup.find_all('div', class_="col-xs-5 col-sm-5 col-md-5 col-lg-5")
company_contact = soup.find_all('div', class_="col-xs-10 col-sm-10 col-md-9 col-lg-9")

Source: StackOverflow, /questions/59383246/how-to-format-an-entire-output-list-of-a-beautiful-soup

kun坤 2019-12-27 11:19:11 332 0
1 answer
  • Try using this CSS selector to get the company names. Updated code:

    from bs4 import BeautifulSoup
    import requests
    
    url = 'https://www.vdma.org/en/mitglieder?p_p_lifecycle=2&p_p_resource_id=getPage&p_p_id=vdma2publicusers_WAR_vdma2publicusers&s=&page=5'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    # utf-8 so company names with umlauts are written correctly on any platform
    with open("test.txt", "w", encoding="utf-8") as output:
        # Each '.col-lg-10' element is assumed to wrap one company's name and contacts
        companies = soup.select('.col-lg-10')
        for company in companies:
            company_name = company.select('.text-left')[0].text.strip()
            company_contacts = company.select('.col-lg-9 .ellipsis')
            # If you want to check the type of every contact
            # for contact in company_contacts:
            #   if "@" in contact.text.strip():
            #       print("Contact is email")
            #   else:
            #       print("Contact is a number")
            contacts = ', '.join(contact.text.strip() for contact in company_contacts)
            output.write(f"Name: {company_name}\nContacts: {contacts}\n\n")
    
    
            # Output 
            # Name: 2W Technische Informations
            # Contacts: info@2wgmbh.de, (+49 89) 5 20 35-0
    
            # Name: 3 S Schnecken + Spindeln + Spiralen
            # Contacts: office@3s-gmbh.at, (+43 7613) 50 04
    
            # Name: 365FarmNet
            # Contacts: info@365farmnet.com, (+49 30) 2 59 32 95 00, (+49 30) 2 59 32 95 01
    
            # Name: 3D Interaction Technologies
            # Contacts: info@3dit.de, (+49 351) 21 96-74 95
            # ...
    
    2019-12-27 11:19:18
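
If only the names are needed, the list that find_all already returns in the question can probably be reduced directly, without restructuring the scraper. The following is a minimal sketch, not the answerer's method: the `html` string is copied from the first three entries of the output shown in the question so that the example runs without a network request, and the names `company_divs` and `company_names` are introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in the question (first three entries only).
html = """
<div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="@-yet">@-yet</div>
</div>
<div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ADDI-DATA">ADDI-DATA</div>
</div>
<div class="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left">
<div class="ellipsis" title="ADE-Werk">ADE-Werk</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same find_all call as in the question: it returns Tag objects, not strings.
company_divs = soup.find_all("div", class_="col-xs-7 col-sm-7 col-md-7 col-lg-7 text-left")

# get_text(strip=True) drops the markup and surrounding whitespace of each tag.
company_names = [div.get_text(strip=True) for div in company_divs]

print(company_names)  # ['@-yet', 'ADDI-DATA', 'ADE-Werk']
```

With the live page, `soup` would be built from `page.content` as in the question, and the list comprehension would stay the same, assuming the page still uses these class names.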