Reading HTML with Python

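Python can read HTML pages with a library called Beautiful Soup (the `beautifulsoup4` package, imported as `bs4`). With it, we can search for the values of HTML tags and extract specific data, such as the page title and the list of headings on the page.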
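As a first taste of what that looks like, here is a minimal sketch that parses an inline HTML string, so it needs no network access, and pulls out the page title and the list of headings. The markup and heading texts are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page, used only for illustration.
html_doc = """
<html>
  <head><title>Python Overview</title></head>
  <body>
    <h1>Python Overview</h1>
    <h2>History of Python</h2>
    <h2>Python Features</h2>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# The page title: the text inside the first <title> tag.
page_title = soup.title.string

# All headings in document order; find_all accepts a list of tag names.
heading_tags = ["h1", "h2", "h3", "h4", "h5", "h6"]
headings = [h.get_text(strip=True) for h in soup.find_all(heading_tags)]

print(page_title)
print(headings)
```

This prints `Python Overview` followed by `['Python Overview', 'History of Python', 'Python Features']`.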

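Installing Beautiful Soup

We use the Anaconda package manager to install the required package and its dependencies.

conda install beautifulsoup4

If you are not using Anaconda, pip works as well: pip install beautifulsoup4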
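To confirm the installation worked, a quick check like the following may help. It only imports the package, prints its version, and parses a one-tag snippet with Python's built-in `html.parser` backend.

```python
import bs4
from bs4 import BeautifulSoup

# If this import succeeds, the package is installed; show which version.
print("beautifulsoup4", bs4.__version__)

# Smoke test: parse a tiny snippet with the built-in html.parser backend.
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.string)  # → ok
```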

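Reading an HTML file

In the example below, we request the URL and load the page into the Python environment, parse the entire HTML document with the 'html.parser' parser, and then print the first few lines of the prettified page.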

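from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the HTML page (urllib2 is Python 2 only; Python 3 uses urllib.request)
response = urlopen('http://learnfk.com/python/python_overview.htm')
html_doc = response.read()

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(html_doc, 'html.parser')

# Re-serialize the parse tree as indented HTML
strhtm = soup.prettify()

# Print the first 225 characters
print(strhtm[:225])

When we run the code above, it produces the following output.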

<!DOCTYPE html>
<!--[if IE 8]><html class="ie ie8"> <![endif]-->
<!--[if IE 9]><html class="ie ie9"> <![endif]-->
<!--[if gt IE 9]><!-->
<html>
 <!--<![endif]-->
 <head>
  <!-- Basic-->
  <meta charset="utf-8"/>
  <title>
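The snippet above fetches the page with no timeout and no error handling. Below is a hedged Python 3 sketch of two helpers: `soup_from_url` adds a timeout and returns `None` on failure, and `soup_from_file` parses a page saved on disk. Both function names, the `User-Agent` value, and the UTF-8 default for files are assumptions for illustration, not part of the original tutorial.

```python
import os
import tempfile
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def soup_from_url(url, timeout=10):
    """Fetch a page and parse it; return None if the request fails."""
    # Assumed header: some servers refuse urllib's default User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=timeout) as response:
            html_bytes = response.read()
    except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
        print(f"Could not fetch {url}: {exc}")
        return None
    # Passing raw bytes lets Beautiful Soup detect the document's encoding.
    return BeautifulSoup(html_bytes, "html.parser")


def soup_from_file(path, encoding="utf-8"):
    """Parse an HTML file saved on disk (the encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return BeautifulSoup(f, "html.parser")


# Demo of the file helper with a throwaway file, so no network access is needed.
with tempfile.TemporaryDirectory() as tmpdir:
    page_path = os.path.join(tmpdir, "page.html")
    with open(page_path, "w", encoding="utf-8") as f:
        f.write("<html><head><title>Python Overview</title></head></html>")
    soup = soup_from_file(page_path)

print(soup.prettify()[:60])
```

For the live page, `soup_from_url('http://learnfk.com/python/python_overview.htm')` would replace the first three statements of the earlier example; whether that URL still serves the same content is not guaranteed.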

Extracting tag values

We can use the following code to extract the value from the first instance of a tag.

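from urllib.request import urlopen
from bs4 import BeautifulSoup

response = urlopen('http://learnfk.com/python/python_overview.htm')
html_doc = response.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Attribute access (soup.title, soup.a, soup.b) returns the first matching tag
print(soup.title)          # the whole <title> element
print(soup.title.string)   # the text inside <title>
print(soup.a.string)       # text of the first <a>; None if it has several children
print(soup.b.string)       # text of the first <b>

When we run the code above, it produces the following output.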

<title>Python Overview</title>
Python Overview
None
Python is Interpreted
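In the output above, `soup.a.string` printed `None`. Beautiful Soup's `.string` returns `None` when a tag has more than one child, and `get_text()` is the usual way to collect all of its text instead. We don't know what the live page's first `<a>` contains, so the markup below is hypothetical: it gives the link an icon element plus a text node to reproduce the effect.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <a> holds an <i> icon plus text,
# so it has two children.
html_doc = """
<html><head><title>Python Overview</title></head>
<body>
<a href="/"><i class="icon"></i> Home</a>
<b>Python is Interpreted</b>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title)                    # <title>Python Overview</title>
print(soup.title.string)             # Python Overview
print(soup.a.string)                 # None: <a> has two children
print(soup.a.get_text(strip=True))   # Home (text of all children, stripped)
print(soup.b.string)                 # Python is Interpreted
```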

Extracting all tags

The following code extracts the values from all instances of a tag.

from urllib.request import urlopen
from bs4 import BeautifulSoup

response = urlopen('http://learnfk.com/python/python_overview.htm')
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')

# find_all returns every <b> tag in document order
for x in soup.find_all('b'):
    print(x.string)

When we run the code above, it produces the following output.

Python is Interpreted
Python is Interactive
Python is Object-Oriented
Python is a Beginner's Language
Easy-to-learn
Easy-to-read
Easy-to-maintain
A broad standard library
Interactive Mode
Portable
Extendable
Databases
GUI Programming
Scalable
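`find_all` can do more than match a tag name. The sketch below, run against a small hypothetical snippet rather than the live page, collects results into lists and shows a few documented filters: `limit=` to stop after N matches, `href=True` to require an attribute, and `class_=` to match a CSS class. It uses `get_text()` rather than `.string` so that tags with nested children still yield their text.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, loosely modeled on the output above.
html_doc = """
<ul>
  <li><b>Easy-to-learn</b></li>
  <li><b>Easy-to-read</b></li>
  <li class="note"><b>Portable</b></li>
</ul>
<a href="/python/index.htm">Python Tutorial</a>
<a name="top">Top</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Text of every <b> tag, collected into a list.
bold_texts = [b.get_text() for b in soup.find_all("b")]

# limit= stops the search after the first N matches.
first_two = [b.get_text() for b in soup.find_all("b", limit=2)]

# href=True keeps only <a> tags that actually have an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

# class_ (trailing underscore, since class is a Python keyword) filters by CSS class.
notes = [li.get_text(strip=True) for li in soup.find_all("li", class_="note")]

print(bold_texts)   # ['Easy-to-learn', 'Easy-to-read', 'Portable']
print(first_two)    # ['Easy-to-learn', 'Easy-to-read']
print(links)        # ['/python/index.htm']
print(notes)        # ['Portable']
```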
