Python Blog Crawler Algorithm

Updated: 2025-05-18

Author: admin (administrators group)

I want to save blog posts from certain websites as local Markdown (.md) files, using Python.

Think what you like; it worked for me.

Step 1: C:\Users\wangrusheng\PycharmProjects\FastAPIProject1\hello.py

```python
import requests
from bs4 import BeautifulSoup
import html2text  # HTML-to-Markdown conversion library

# Target article URL (left empty in the original; fill it in before running)
url = ''

# Request headers that mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

try:
    # Send the HTTP request; the timeout keeps the script from hanging forever
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')

    # Locate the article body
    article_content = soup.find('div', {'id': 'content_views'})

    if article_content:
        # Create the HTML-to-Markdown converter
        h = html2text.HTML2Text()
        h.body_width = 0        # disable automatic line wrapping
        h.emphasis_mark = '*'   # marker for italics
        h.strong_mark = '**'    # marker for bold
        h.ul_item_mark = '-'    # marker for unordered list items

        # Convert the article HTML to Markdown
        md_content = h.handle(str(article_content))

        # Save to a file
        with open('csdn_article2.md', 'w', encoding='utf-8') as f:
            f.write(md_content)

        print('Article saved as Markdown.')
    else:
        print('Article content not found; check the page structure.')

except requests.exceptions.RequestException as e:
    print(f'Request failed: {e}')
except Exception as e:
    print(f'An error occurred: {e}')
```
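The script above always writes to the fixed name `csdn_article2.md`, so saving a second post overwrites the first. One way around that is to derive the file name from the post title. The sketch below is not part of the original script: `safe_filename` and `save_markdown` are hypothetical helpers, and the set of stripped characters assumes a Windows file system, as the path in Step 1 suggests.

```python
import re
from pathlib import Path

# Characters Windows does not allow in file names, plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, fallback: str = "article", max_length: int = 80) -> str:
    """Turn an article title into a file name ending in .md (hypothetical helper)."""
    # Replace forbidden characters with spaces, then collapse runs of whitespace.
    name = _INVALID_CHARS.sub(" ", title)
    name = re.sub(r"\s+", " ", name).strip(" .")
    # Truncate long titles; Windows also rejects names ending in a space or dot.
    name = name[:max_length].rstrip(" .")
    return f"{name or fallback}.md"


def save_markdown(md_content: str, title: str, out_dir: str = ".") -> Path:
    """Write md_content into out_dir under a name derived from title."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / safe_filename(title)
    path.write_text(md_content, encoding="utf-8")
    return path
```

In the script, the title could come from the parsed page, for example `soup.title.string` when the page has a `<title>` tag, and `save_markdown(md_content, title)` would then stand in for the `with open(...)` block. Two posts with the same title would still collide, so this only reduces overwrites rather than ruling them out.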

end
