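[ Source: html-text ]
Package: python3-html-text (0.7.0-1.1)
Experimental package
Warning: This package is from the experimental distribution. That means it is likely unstable or buggy, and it may even cause data loss. Please be sure to consult the changelog and other possible documentation before using it.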
Extract text from HTML.
How is html_text different from .xpath('//text()') from lxml or .get_text() from Beautiful Soup?
* Text extracted with html_text does not contain inline styles, JavaScript, comments, and other text that is not normally visible to users;
* html_text normalizes whitespace, but in a smarter way than .xpath('normalize-space()'), adding spaces around inline elements (which are often used as block elements in HTML markup) and trying to avoid adding extra spaces for punctuation;
* html_text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers.
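The differences listed above can be illustrated with a small standard-library sketch. This is not html_text's implementation (the package builds on lxml); it is a simplified stand-in that only shows the general idea of dropping invisible text, collapsing whitespace, and breaking lines at block elements. The names `VisibleTextParser`, `visible_text`, `HIDDEN_TAGS`, and `BLOCK_TAGS` are hypothetical, the tag sets are an illustrative subset, and the sketch does not attempt the inline-element or punctuation spacing heuristics described above.

```python
import re
from html.parser import HTMLParser

# Tags whose text content is never shown to users (illustrative subset).
HIDDEN_TAGS = {"script", "style"}
# Tags that start a new line when rendered (illustrative subset).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}


class VisibleTextParser(HTMLParser):
    """Collect user-visible text, grouped into lines at block boundaries."""

    def __init__(self):
        super().__init__()
        self.lines = [[]]      # each line is a list of text chunks
        self.hidden_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self.lines.append([])

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth:
            self.hidden_depth -= 1
        elif tag in BLOCK_TAGS:
            self.lines.append([])

    def handle_data(self, data):
        # Comments never reach this method; HTMLParser routes them to
        # handle_comment, which does nothing by default.
        if not self.hidden_depth:
            self.lines[-1].append(data)


def visible_text(html):
    """Return visible text with collapsed whitespace, one block per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    lines = (re.sub(r"\s+", " ", "".join(chunks)).strip()
             for chunks in parser.lines)
    return "\n".join(line for line in lines if line)


sample = ("<h1>Title</h1><style>p {color: red}</style>"
          "<p>Hello,   <b>world</b>!</p><!-- note -->"
          "<script>var x = 1;</script>")
print(visible_text(sample))
```

With the package installed, the upstream documentation describes `html_text.extract_text(html)` as the entry point; it covers many more cases than this sketch does.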
Other packages related to python3-html-text
- dep: python3
  interactive high-level object-oriented language (default python3 version)
- dep: python3-lxml
  pythonic binding for the libxml2 and libxslt libraries
- dep: python3-lxml-html-clean
  blocklist-based HTML cleaner