A Python Script for One-Click Checking of Butterfly Theme Friend Links

 1 week ago
source link: https://blog.zhheo.com/p/baa6b18b.html

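I have far too many friend links to check them frequently by hand, so I wrote a Python script that reads the local yml file and sends a HEAD request to each link to see whether it can be reached. When the check finishes, the script writes a txt file listing every URL that could not be requested, and I then visit those URLs manually, one by one, to see whether something is really wrong. Links that are confirmed unreachable get moved to the lost-contact list. This makes the whole process much more efficient.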

Friend link format

My friend links are stored in this format:

- class_name: 友情鏈接
  class_desc: 那些人,那些事
  link_list:
    - name: JerryC
      link: https://jerryc.me/
      avatar: https://jerryc.me/img/avatar.png
      descr: 今日事,今日畢
    - name: Hexo
      link: https://hexo.io/zh-tw/
      avatar: https://d33wubrfki0l68.cloudfront.net/6657ba50e702d84afb32fe846bed54fba1a77add/827ae/logo.svg
      descr: 快速、簡單且強大的網誌框架
- class_name: 網站
  class_desc: 值得推薦的網站
  link_list:
    - name: Youtube
      link: https://www.youtube.com/
      avatar: https://i.loli.net/2020/05/14/9ZkGg8v3azHJfM1.png
      descr: 視頻網站
    - name: Weibo
      link: https://www.weibo.com/
      avatar: https://i.loli.net/2020/05/14/TLJBum386vcnI1P.png
      descr: 中國最大社交分享平台
    - name: Twitter
      link: https://twitter.com/
      avatar: https://i.loli.net/2020/05/14/5VyHPQqR6LWF39a.png
      descr: 社交分享平台
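
To make the structure concrete, here is a minimal sketch of how the parsed file might be flattened into a list of links to check. It assumes that `yaml.safe_load` returns a list of dictionaries shaped like the literal below (shortened, with the `avatar` and `descr` fields left out); `collect_links` is a hypothetical helper name and is not part of the original script.

```python
# Assumed result of yaml.safe_load() on a file in the format above (shortened).
data = [
    {
        "class_name": "友情鏈接",
        "class_desc": "那些人,那些事",
        "link_list": [
            {"name": "JerryC", "link": "https://jerryc.me/"},
            {"name": "Hexo", "link": "https://hexo.io/zh-tw/"},
        ],
    },
    {
        "class_name": "網站",
        "class_desc": "值得推薦的網站",
        "link_list": [
            {"name": "Youtube", "link": "https://www.youtube.com/"},
        ],
    },
]


def collect_links(sections):
    """Flatten every section's link_list into (index, name, url) tuples."""
    collected = []
    for section in sections:
        # A section without link_list (or with an empty one) contributes nothing.
        for item in section.get("link_list") or []:
            collected.append((len(collected), item["name"], item["link"]))
    return collected


links = collect_links(data)
for index, name, url in links:
    print(index, name, url)
```

The running index records the order in which links appear in the file, so results can be reported in that order later.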

Python script

import yaml
import requests
import concurrent.futures
import os

# Path to the YAML file containing the link information (replace with your own path)
yaml_file_path = 'path/to/your/link.yml'

# Path to the output text file that will list all inaccessible links (replace with your own path)
output_txt_path = 'path/to/inaccessible_links.txt'

# Load the YAML data
with open(yaml_file_path, 'r', encoding='utf-8') as file:
    data = yaml.safe_load(file)

# User-Agent string to mimic a web browser
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

# Dictionaries to store accessible and inaccessible links with their original index
accessible_links = {}
inaccessible_links = {}

# Function to check if a link is accessible with a HEAD request
def check_link_accessibility(link, index):
    headers = {"User-Agent": user_agent}  # Add User-Agent to headers
    try:
        # Send a HEAD request instead of GET
        response = requests.head(link, headers=headers, timeout=5)
        if response.status_code == 200:
            accessible_links[index] = link  # Store accessible link with its index
            print(f"Accessible: {link}", flush=True)  # Print accessible links
        else:
            inaccessible_links[index] = link  # Store inaccessible link with its index
    except requests.RequestException:
        inaccessible_links[index] = link  # Store inaccessible link with its index

# Use a ThreadPoolExecutor to check multiple links concurrently
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    # Collect all links from the YAML data
    links_to_check = []
    index = 0  # Index to maintain the original order
    for section in data:
        if 'link_list' in section:
            for item in section['link_list']:
                links_to_check.append((index, item['link']))  # Keep track of index
                index += 1

    # Submit all the tasks to the executor with the original index
    futures = [executor.submit(check_link_accessibility, link, idx) for idx, link in links_to_check]

    # Ensure all futures are completed
    concurrent.futures.wait(futures)

# Write the inaccessible links to the output text file in original order
with open(output_txt_path, 'w', encoding='utf-8') as file:
    if inaccessible_links:
        file.write("Inaccessible Links:\n")
        for idx in sorted(inaccessible_links.keys()):  # Sort by index to maintain order
            file.write(f"{inaccessible_links[idx]}\n")
    else:
        file.write("All links are accessible.")

# Print the accessible links in the original order
print("Accessible Links:")
for idx in sorted(accessible_links.keys()):  # Sort by index to maintain order
    print(accessible_links[idx], flush=True)
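
The script above counts a link as accessible only when the status code is exactly 200. Because `requests.head` does not follow redirects unless `allow_redirects=True` is passed, a site that answers with a 301 or 302 ends up in the inaccessible list and has to be checked by hand. The following is a rough sketch of one possible variation, not the author's method: it treats any 2xx or 3xx status as reachable and returns results from the worker threads instead of writing to shared dictionaries. The names `is_reachable`, `find_unreachable` and `fake_fetch_status` are hypothetical, and the fetch function is a stand-in with canned responses so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def is_reachable(status):
    """Treat any 2xx or 3xx status as reachable; None means the request failed."""
    return status is not None and 200 <= status < 400


def find_unreachable(links, fetch_status, max_workers=10):
    """Return the links that are not reachable, in their original order."""
    links = list(links)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so no index bookkeeping is needed.
        statuses = list(executor.map(fetch_status, links))
    return [link for link, status in zip(links, statuses) if not is_reachable(status)]


def fake_fetch_status(link):
    """Hypothetical stand-in for a real HEAD request, using canned status codes."""
    canned = {
        "https://a.example/": 200,
        "https://b.example/": 301,
        "https://c.example/": 404,
    }
    return canned.get(link)  # None simulates a timeout or connection error


unreachable = find_unreachable(
    [
        "https://a.example/",
        "https://b.example/",
        "https://c.example/",
        "https://d.example/",
    ],
    fake_fetch_status,
)
print(unreachable)
```

In a real run, `fake_fetch_status` could be replaced by a function that calls `requests.head(link, headers=headers, timeout=5)` and returns `response.status_code`, or `None` when `requests.RequestException` is raised.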
