Hi everyone. The earlier posts covered the basic concepts; today we start writing Playwright scripts in Python.
Installation
Pip
pip install --upgrade pip
pip install playwright
playwright install
Conda
conda config --add channels conda-forge
conda config --add channels microsoft
conda install playwright
playwright install
These commands download the Playwright package and install the browser binaries for Chromium, Firefox, and WebKit.
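Before writing a script, it can help to confirm that the package is actually installed. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical, and it only checks the Python package, not whether the browser binaries were downloaded.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("playwright"))
```

If this prints None, the pip or conda install step above most likely ran in a different Python environment.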
Usage
Once installed, you can import Playwright in a Python script and launch any of the three browsers (chromium, firefox, or webkit).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    browser.close()
Playwright supports two flavors of the API: synchronous and asynchronous. If your project is built on asyncio, use the async API.
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("http://playwright.dev")
        print(await page.title())
        await browser.close()

asyncio.run(main())
Your first Playwright script
In this first script, we visit whatsmyuseragent.org and take a screenshot.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    page.screenshot(path="example.png")
    browser.close()
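By default, Playwright runs browsers in headless mode. To see the browser UI, pass headless=False when launching. You can also use slow_mo to slow down execution; the value is a delay in milliseconds applied to each operation.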
firefox.launch(headless=False, slow_mo=50)
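The launch options above can be wrapped in a small helper so that a single flag switches between a normal run and a visible, slowed-down debugging run. This is a rough sketch: launch_options and take_screenshot are hypothetical names chosen for illustration, and the 50 ms delay is simply the value from the line above.

```python
def launch_options(debug: bool) -> dict:
    """Build keyword arguments for launch(): headed and slowed down when debugging."""
    if debug:
        # headless=False shows the browser window; slow_mo delays each operation (ms).
        return {"headless": False, "slow_mo": 50}
    # An empty dict keeps Playwright's defaults (headless, no slowdown).
    return {}


def take_screenshot(url: str, path: str, debug: bool = False) -> None:
    """Open `url` in Firefox and save a screenshot to `path`."""
    # Imported lazily so launch_options() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(**launch_options(debug))
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=path)
        browser.close()
```

Calling take_screenshot("http://whatsmyuseragent.org/", "example.png", debug=True) would open a visible Firefox window and run each step with the delay.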
Interactive mode (REPL)
You can also start the Python interactive REPL:
python
and then launch Playwright directly from it:
>>> from playwright.sync_api import sync_playwright
>>> playwright = sync_playwright().start()
# Use playwright.chromium, playwright.firefox or playwright.webkit
# Pass headless=False to launch() to see the browser UI
>>> browser = playwright.chromium.launch()
>>> page = browser.new_page()
>>> page.goto("http://whatsmyuseragent.org/")
>>> page.screenshot(path="example.png")
>>> browser.close()
>>> playwright.stop()
The async API works in the REPL too:
python -m asyncio
>>> from playwright.async_api import async_playwright
>>> playwright = await async_playwright().start()
>>> browser = await playwright.chromium.launch()
>>> page = await browser.new_page()
>>> await page.goto("http://whatsmyuseragent.org/")
>>> await page.screenshot(path="example.png")
>>> await browser.close()
>>> await playwright.stop()
PyInstaller
You can use Playwright together with PyInstaller to create a standalone executable.
# main.py
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    page.screenshot(path="example.png")
    browser.close()
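To bundle the browser with the executable, set the following environment variable before installing the browser and building.
Bash
PLAYWRIGHT_BROWSERS_PATH=0 playwright install chromium
pyinstaller -F main.py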
PowerShell
$env:PLAYWRIGHT_BROWSERS_PATH="0"
playwright install chromium
pyinstaller -F main.py
Batch
set PLAYWRIGHT_BROWSERS_PATH=0
playwright install chromium
pyinstaller -F main.py
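The three shell variants above do the same thing, so the build can also be driven from one Python script that works on any platform. This is a sketch under assumptions: bundling_env and build are hypothetical helper names, and the script assumes both playwright and PyInstaller are installed in the interpreter that runs it.

```python
import os
import subprocess
import sys


def bundling_env(base_env=None) -> dict:
    """Copy the environment and set PLAYWRIGHT_BROWSERS_PATH=0 for the build."""
    env = dict(os.environ if base_env is None else base_env)
    env["PLAYWRIGHT_BROWSERS_PATH"] = "0"
    return env


def build(entry_script: str = "main.py") -> None:
    """Install Chromium with the variable set, then build a one-file executable."""
    env = bundling_env()
    # Equivalent to: playwright install chromium
    subprocess.run(
        [sys.executable, "-m", "playwright", "install", "chromium"],
        env=env, check=True,
    )
    # Equivalent to: pyinstaller -F main.py
    subprocess.run(
        [sys.executable, "-m", "PyInstaller", "-F", entry_script],
        env=env, check=True,
    )
```

Running build() from the project directory performs the same two steps as the shell commands, with the variable applied only to those child processes.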
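Known issues
time.sleep() leads to outdated state
Most likely you don't need to wait manually, because Playwright has auto-waiting. If you still rely on it, use page.wait_for_timeout(5000) instead of time.sleep(5). It is best not to wait for a fixed timeout at all, although it is sometimes useful for debugging. In those cases, use Playwright's wait_for_timeout method rather than the time module: Playwright relies on asynchronous operations internally, and they cannot be processed properly while time.sleep(5) blocks.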
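As a small illustration of the advice above, a debugging pause can be routed through the page object rather than the time module. The helper name debug_pause is hypothetical; it only forwards to page.wait_for_timeout, which takes a value in milliseconds.

```python
def debug_pause(page, milliseconds: float = 5000) -> None:
    """Pause through Playwright's own timer instead of time.sleep()."""
    # wait_for_timeout takes milliseconds, so 5000 corresponds to time.sleep(5).
    page.wait_for_timeout(milliseconds)
```

Here page is assumed to be a Page from the sync API, for example the one returned by browser.new_page() in the scripts above.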
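Incompatible with asyncio's SelectorEventLoop on Windows
Playwright runs its driver in a subprocess, so on Windows it requires asyncio's ProactorEventLoop, because SelectorEventLoop does not support async subprocesses.
On Windows with Python 3.7, Playwright sets the default event loop to ProactorEventLoop, which is already the default on Python 3.8 and later.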
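If some other library has replaced the event loop, a quick diagnostic can tell whether the current loop is the kind Playwright needs on Windows. The sketch below is an assumption-laden helper, not part of Playwright: the function names are hypothetical, and the check only looks at the loop class and the platform string.

```python
import asyncio
import sys


def is_proactor_loop(loop) -> bool:
    """Return True if `loop` is an asyncio ProactorEventLoop (a Windows-only class)."""
    # asyncio.ProactorEventLoop is only defined on Windows, hence the getattr.
    proactor_cls = getattr(asyncio, "ProactorEventLoop", None)
    return proactor_cls is not None and isinstance(loop, proactor_cls)


def loop_ok_for_subprocesses(loop, platform: str = sys.platform) -> bool:
    """Rough check: on Windows the loop must be a ProactorEventLoop."""
    if platform != "win32":
        # The SelectorEventLoop limitation described above is specific to Windows.
        return True
    return is_proactor_loop(loop)
```

Inside a coroutine, loop_ok_for_subprocesses(asyncio.get_running_loop()) returning False on Windows would suggest that something switched the loop to SelectorEventLoop.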
Threading
Playwright's API is not thread safe. If you use Playwright in a multi-threaded environment, create a separate playwright instance for each thread.
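One way to follow this rule is to have every worker thread open and close its own Playwright instance, sharing nothing but plain results. The following is a minimal sketch; page_title and run_per_thread are hypothetical names, and starting one thread per URL is an illustrative simplification rather than a recommendation for large lists.

```python
import threading


def page_title(url: str) -> str:
    """Fetch a page title with a Playwright instance owned by the calling thread."""
    # Imported lazily so run_per_thread() can be exercised without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:  # created and stopped inside this thread
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title


def run_per_thread(urls, fetch=page_title) -> dict:
    """Run fetch(url) in one thread per URL and collect the results by URL."""
    results = {}
    lock = threading.Lock()

    def worker(url):
        value = fetch(url)
        with lock:  # only plain strings cross thread boundaries
            results[url] = value

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, run_per_thread(["http://playwright.dev", "http://whatsmyuseragent.org/"]) would launch two browsers in parallel, each driven by its own playwright instance.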