pre-commit suddenly started failing to install the isort hook in our builds today, with the following error:
[INFO] Installing environment for https://github.com/pycqa/isort.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: CalledProcessError: command: ('/builds/.../.cache/pre-commit/repo0_h0f938/py_env-python3.8/bin/python', '-mpip', 'install', '.')
return code: 1
expected return code: 0
[...]
stderr:
    ERROR: Command errored out with exit status 1:
    [...]
      File "/tmp/pip-build-env-_3j1398p/overlay/lib/python3.8/site-packages/poetry/core/masonry/api.py", line 40, in prepare_metadata_for_build_wheel
        poetry = Factory().create_poetry(Path(".").resolve(), with_groups=False)
      File "/tmp/pip-build-env-_3j1398p/overlay/lib/python3.8/site-packages/poetry/core/factory.py", line 57, in create_poetry
        raise RuntimeError("The Poetry configuration is invalid:\n" + message)
    RuntimeError: The Poetry configuration is invalid:
      - [extras.pipfile_deprecated_finder.2] 'pip-shims<=0.3.4' does not match '^[a-zA-Z-_.0-9]+$'
It seems to be related to the Poetry configuration.
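To see why the message complains, the pattern quoted in the error can be tried directly against the offending string. This is only a minimal sketch: it assumes the pattern is applied as an ordinary full-string regular expression, the helper name `matches_name_pattern` is made up for illustration, and it does not reproduce Poetry's actual validation code.

```python
import re

# Pattern copied verbatim from the error message above.
NAME_PATTERN = re.compile(r"^[a-zA-Z-_.0-9]+$")


def matches_name_pattern(value: str) -> bool:
    """Hypothetical helper: True if `value` consists only of allowed characters."""
    return NAME_PATTERN.match(value) is not None


# The bare package name passes, but the version constraint does not,
# because '<' and '=' are not in the allowed character class.
print(matches_name_pattern("pip-shims"))         # True
print(matches_name_pattern("pip-shims<=0.3.4"))  # False
```

So the entry is rejected because of the `<=0.3.4` version constraint attached to the name, not because of the name `pip-shims` itself.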
I have some text containing HTML tags, something like:
text = '<p>Some text</p> <h1>Some text</h1> ....'
soup = BeautifulSoup(text)
I parsed this text using BeautifulSoup. I would like to extract every sentence with its corresponding text and tag. I tried:
for sent in soup:
    print(sent.text)  # OK
    print(sent.tag)   # not OK: NavigableString does not have a tag attribute
I also tried soup.find_all() and got stuck at the same point: I have access to the text but not to the original tag.
Instead of tag, use name to get the element's tag name:
for tag in soup.find_all():
    print(tag.text, tag.name)
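If you would rather keep iterating over the soup directly, as in the question, one option is to skip the bare strings between elements and keep only Tag objects. A minimal sketch, assuming you only care about top-level elements; the sample HTML and the variable name `pairs` are illustrative only:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<p>Some text</p> <h1>Some text</h1>"
soup = BeautifulSoup(html, "html.parser")

# Iterating over the soup yields Tag objects and NavigableString objects
# (here, the whitespace between the two elements). Keep only the tags.
pairs = [
    (child.name, child.get_text())
    for child in soup.children
    if isinstance(child, Tag)
]
print(pairs)  # [('p', 'Some text'), ('h1', 'Some text')]
```

Unlike find_all(), this does not descend into nested elements, which may or may not be what you want.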
Pass 'html.parser' as the parser argument to avoid the behavior of lxml, which BeautifulSoup picks by default when it is installed. lxml slightly reshapes the structure and wraps partial HTML in <html> and <body>.
Example
from bs4 import BeautifulSoup

html = '''<p>Some text</p><h1>Some text</h1>'''
soup = BeautifulSoup(html, 'html.parser')

for tag in soup.find_all():
    print(tag.text, tag.name)
Output
Some text p
Some text h1
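For comparison, the 'html.parser' option is built on Python's standard-library html.parser module, so the same name/text pairing can be sketched without BeautifulSoup. This is not the approach used in the answer above, just an illustration; the class name `TagTextCollector` is made up, and the sketch assumes flat, non-nested elements like those in the example.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect (tag name, text) pairs for flat elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # finished (name, text) pairs
        self._current = None  # name of the element currently open, if any
        self._chunks = []     # text pieces seen inside that element

    def handle_starttag(self, tag, attrs):
        self._current = tag
        self._chunks = []

    def handle_data(self, data):
        # Text outside any element (e.g. whitespace between tags) is ignored.
        if self._current is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.pairs.append((tag, "".join(self._chunks)))
            self._current = None


collector = TagTextCollector()
collector.feed("<p>Some text</p> <h1>Some text</h1>")
collector.close()
print(collector.pairs)  # [('p', 'Some text'), ('h1', 'Some text')]
```

For anything with nesting or messy markup, BeautifulSoup's find_all() as shown above is the more practical choice.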