How do you remove escape characters from scraped website data?

For the escape characters that appear in the question, such as "" and "\n", the following methods can remove them:

1. Use a regular expression:

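import re

# Patterns that capture the paragraph body. findcontant2 is a fallback;
# its definition is not shown in the original, so this one is an assumption.
findcontant1 = re.compile(r'<p style="width: 100%;">(.*?)</p>', re.S)
findcontant2 = re.compile(r'<p[^>]*>(.*?)</p>', re.S)
dr = re.compile(r'<[^>]+>', re.S)  # matches any HTML tag

item = '<p style="width: 100%;">Hello <b>world</b></p>'  # one scraped fragment (sample value)

contant = re.findall(findcontant1, item)
if len(contant) <= 0:
    contant = re.findall(findcontant2, item)
# Join the matches first; str(contant) would leave list brackets and quotes in the text
contant = dr.sub('', ''.join(contant))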

2. Parse the HTML with BeautifulSoup:

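from bs4 import BeautifulSoup

# "(.*)" stands in for the scraped paragraph's content
html = '<p style="width: 100%;">(.*)</p>'
soup = BeautifulSoup(html, "html.parser")
# .text returns the element's text with all tags removed
contant = soup.find('p').text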

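After this processing, the HTML tags are gone and you are left with clean text content. Any whitespace escapes that remain in the text, such as "\n", can then be dropped with str.strip() or str.replace().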
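As a rough sketch of the whole cleanup in one place, the function below strips tags with the same kind of regular expression used in method 1, decodes HTML entities, and collapses leftover whitespace escapes. The name clean_text and the sample input are illustrative assumptions, not part of the original answer, and a regex-based tag stripper is only suitable for simple fragments.

```python
import html
import re

TAG_RE = re.compile(r'<[^>]+>')   # any HTML tag
SPACE_RE = re.compile(r'\s+')     # runs of whitespace, including \n, \r, \t


def clean_text(raw):
    """Return raw with tags removed and whitespace escapes normalized.

    clean_text is a hypothetical helper written for this example.
    """
    text = TAG_RE.sub('', raw)           # drop the markup
    text = html.unescape(text)           # &nbsp; -> \xa0, &amp; -> &, ...
    text = text.replace('\xa0', ' ')     # non-breaking space -> plain space
    text = SPACE_RE.sub(' ', text)       # collapse \n, \r, \t and repeats
    return text.strip()


sample = '<p style="width: 100%;">Hello&nbsp;<b>world</b>\r\n\t&amp; more</p>'
print(clean_text(sample))
```

If the line breaks in the scraped text are meaningful, replace only the specific characters you want gone (for example text.replace('\r', '')) instead of collapsing all whitespace.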

Author: admin @ 24资源网, 2025-01-14
