[ Source: html-text ]
Package: python3-html-text (0.7.0-1.1)
External Resources:
- Homepage [github.com]
Experimental package
Warning: This package is from the experimental distribution. That means it is likely unstable or buggy, and it may even cause data loss. Please be sure to consult the changelog and other possible documentation before using it.
extract text from HTML.
How is html_text different from .xpath('//text()') from lxml or .get_text() from Beautiful Soup?
* Text extracted with html_text does not contain inline styles, JavaScript, comments, or other text that is not normally visible to users.
* html_text normalizes whitespace, but more intelligently than .xpath('normalize-space()'): it adds spaces around inline elements (which are often used as block elements in HTML markup) and tries to avoid adding extra spaces around punctuation.
* html_text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers.
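
The differences listed above can be illustrated with a small standard-library sketch. This is not html_text's implementation: the class `VisibleTextParser`, the function `visible_text`, and the tag sets `HIDDEN_TAGS` and `BLOCK_TAGS` are hypothetical names chosen for this example, and the real package (which builds on lxml) handles many more cases, such as spacing around inline elements and punctuation.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag sets for this sketch; html_text's own defaults may differ.
HIDDEN_TAGS = {"script", "style", "noscript", "head"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "tr"}


class VisibleTextParser(HTMLParser):
    """Collect text a reader would normally see, breaking lines at block tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a tag whose text is not shown

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.hidden_depth == 0:
            # Collapse source whitespace (including newlines) to single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))

    # Comments are dropped because the inherited handle_comment does nothing.


def visible_text(html):
    """Return visible text with one line per block-level element."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)


sample = (
    "<html><head><title>T</title><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><!-- note --><p>big   <b>world</b>!</p>"
    "<script>var x = 1;</script></body></html>"
)
print(visible_text(sample))  # prints "Hello" and "big world!" on separate lines
```

A plain `//text()` query over the same markup would also return the style rule and the script body. With the package installed, the comparable call according to the upstream README is `html_text.extract_text(html)`.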
Other packages related to python3-html-text
- dep: python3
  interactive high-level object-oriented language (default python3 version)
- dep: python3-lxml
  pythonic binding for the libxml2 and libxslt libraries
- dep: python3-lxml-html-clean
  blocklist-based HTML cleaner
Download python3-html-text
Architecture | Package Size | Installed Size | Files |
---|---|---|---|
all | 10.0 kB | 40.0 kB | [list of files] |