Package: python3-html-text (0.7.0-1.1)

Links for python3-html-text

Source package: html-text

Maintainer:

Christian Marillat

External resources:

Homepage [github.com]

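Experimental package

Warning: This package is from the experimental distribution. That means it is likely to be unstable or buggy, and it may even cause data loss. Be sure to consult the changelog and other available documentation before using it.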

Extract text from HTML.

How is html_text different from .xpath('//text()') from lxml or .get_text() from Beautiful Soup?

 * Text extracted with html_text does not contain inline styles,
   JavaScript, comments and other text that is not normally visible to
   users;
 * html_text normalizes whitespace, but in a smarter way than
   .xpath('normalize-space()'), adding spaces around inline elements (which
   are often used as block elements in HTML markup), and trying to avoid
   adding extra spaces around punctuation;
 * html_text can add newlines (e.g. after headers or paragraphs), so that
   the output text looks more like how it is rendered in browsers.
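
To make the first two points concrete, here is a minimal standard-library sketch of the general idea: skip text that is not visible (styles, scripts, comments) and collapse whitespace between elements. It is only an illustration and not how html_text is implemented. The names HIDDEN_TAGS, VisibleTextParser and visible_text are hypothetical, and the sketch does not handle punctuation spacing or add layout newlines.

```python
import re
from html.parser import HTMLParser

# Assumption for this sketch: only these tags hold text the reader never sees.
HIDDEN_TAGS = {"script", "style"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that sit outside hidden tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Comments arrive via handle_comment, which is not overridden,
        # so they never reach this list.
        if self._hidden_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return visible text with all whitespace runs collapsed to one space."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Join with a space so adjacent elements do not run together.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


if __name__ == "__main__":
    sample = (
        "<style>p {color: red}</style>"
        "<p>Hello<!-- note --> <b>world</b></p>"
        "<script>var x = 1;</script>"
    )
    print(visible_text(sample))
```

Because the sketch always joins text nodes with a space, it would put a space before punctuation that directly follows an inline element, which is the kind of artifact the description says html_text tries to avoid. With the package installed, the library's own entry point is html_text.extract_text(html).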

Other packages related to python3-html-text

Legend: depends, recommends, suggests, enhances
dep: python3
    interactive high-level object-oriented language (default python3 version)

dep: python3-lxml
    pythonic binding for the libxml2 and libxslt libraries

dep: python3-lxml-html-clean
    blocklist-based HTML cleaner

Download python3-html-text

Download for all available architectures
Architecture	Package Size	Installed Size	Files
all	10.0 kB	40.0 kB	[list of files]