intohole/sixgod

sixgod: Python web page main-text extraction

Approach

  • Advantages: linear running time, no DOM tree is built, and the method does not depend on specific HTML tags
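
```python
import requests
from vampire.htmlextract import HtmlExtract

# Download the page; resp is a requests.Response, not raw HTML
resp = requests.get('http://www.fabao365.com/fangchan/167193/')
# Decode the body as UTF-8 instead of the encoding requests infers from the headers
resp.encoding = "utf-8"
ex = HtmlExtract()
# Print the main text extracted from the HTML string
print(ex.get_text(resp.text))
```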
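The README lists the advantages but does not describe the algorithm. One well-known technique with all three properties is line-block distribution: strip the tags, count the visible characters on each line, and keep the densest run of consecutive lines. The sketch below implements that technique under stated assumptions; it is not necessarily how HtmlExtract works, and the function `extract_main_text` and its `block_size` and `threshold` parameters are hypothetical names chosen for illustration.

```python
import html
import re

# Hypothetical defaults chosen for illustration; not taken from HtmlExtract.
BLOCK_SIZE = 3   # number of consecutive lines that form one "block"
THRESHOLD = 30   # a block needs more non-whitespace characters than this to count as text

# Script/style bodies and HTML comments never contain visible text.
_DROP = re.compile(r"(?is)<(script|style)\b.*?</\1\s*>|<!--.*?-->")
# Any tag at all: the tag name is never inspected, so no tag is special.
_TAG = re.compile(r"<[^>]*>")


def extract_main_text(raw_html, block_size=BLOCK_SIZE, threshold=THRESHOLD):
    """Line-block distribution sketch: no DOM tree, one pass per step."""
    # 1. Remove invisible content, then strip every tag but keep line breaks.
    text = _TAG.sub("", _DROP.sub("", raw_html))
    lines = [html.unescape(line).strip() for line in text.splitlines()]
    n = len(lines)
    if n == 0:
        return ""
    # 2. Non-whitespace characters per line.
    lengths = [len("".join(line.split())) for line in lines]
    # 3. blocks[i] = characters in lines i .. i+block_size-1 (sliding window, O(n)).
    blocks = []
    window = sum(lengths[:block_size])
    for i in range(n):
        blocks.append(window)
        window -= lengths[i]
        if i + block_size < n:
            window += lengths[i + block_size]
    # 4. Keep the run of consecutive dense blocks whose start lines hold the most text.
    best_start, best_end, best_score = 0, -1, 0
    start, score = None, 0
    for i, b in enumerate(blocks + [0]):  # trailing 0 closes the last run
        if i < n and b > threshold:
            if start is None:
                start, score = i, 0
            score += lengths[i]
        elif start is not None:
            if score > best_score:
                best_start, best_end, best_score = start, i - 1, score
            start = None
    if best_end < 0:
        return ""  # no block was dense enough
    # 5. Blocks best_start..best_end cover lines up to best_end + block_size - 1.
    end_line = min(n, best_end + block_size)
    return "\n".join(line for line in lines[best_start:end_line] if line)
```

Because tags are removed by pattern instead of being parsed, this never builds a tree and never needs to know which element holds the article. The trade-off is that `block_size` and `threshold` may need tuning for a given site, and a stray `<` in the text can swallow content up to the next `>`.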

About

Main-text extraction | extract content from HTML
