
HTTP Error 403: Forbidden with urlretrieve

source link: http://coding2live.com/detail/78/HTTP%E9%94%99%E8%AF%AF403-%E7%A6%81%E6%AD%A2%E4%BD%BF%E7%94%A8urlretrieve


coding2live, 2021-01-29 15:27:18. Tags: python, http, python-requests, urllib

I'm downloading a PDF and getting this error: HTTP Error 403: Forbidden

My guess is that the server is blocking the request, but I haven't found a solution.

Here is my code:

import urllib.request
import urllib.parse
import requests


def download_pdf(url):

    full_name = "Test.pdf"
    urllib.request.urlretrieve(url, full_name)


try:
    url = ('http://papers.xtremepapers.com/CIE/Cambridge IGCSE/Mathematics (0580)/0580_s03_qp_1.pdf')

    print('initialized')

    hdr = {}
    hdr = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36',
        'Content-Length': '136963',
    }

    print('HDR received')

    req = urllib.request.Request(url, headers=hdr)

    print('Header sent')

    resp = urllib.request.urlopen(req)

    print('Request sent')

    respData = resp.read()

    download_pdf(url)

    print('Complete')

except Exception as e:
    print(str(e))

The answer below is for reference only.

Your guess is right.

The remote server is apparently checking the User-Agent header and rejecting requests that carry urllib's default value, Python-urllib/3.x.

urllib.request.urlretrieve() has no parameter for setting HTTP request headers, but you can use urllib.request.URLopener.retrieve() instead:

import urllib.request

opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'whatever')
filename, headers = opener.retrieve(url, 'Test.pdf')
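If you would rather keep calling urlretrieve() but avoid the URLopener class, another option is a global opener: urlretrieve() fetches through urlopen(), which uses whatever opener install_opener() registered, so an opener whose addheaders carries a custom User-Agent changes the header urlretrieve() sends. A minimal sketch, assuming the server only inspects User-Agent; the USER_AGENT value and the install_user_agent() helper are hypothetical. Keep in mind that this changes the header for every later urlopen()/urlretrieve() call in the process.

```python
import urllib.request

# Hypothetical browser-like User-Agent string; use any value the server accepts.
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2)"


def install_user_agent(user_agent=USER_AGENT):
    """Make every later urlopen()/urlretrieve() call send this User-Agent."""
    opener = urllib.request.build_opener()
    # Replaces the default [('User-agent', 'Python-urllib/3.x')] list.
    opener.addheaders = [("User-Agent", user_agent)]
    # urlopen(), and therefore urlretrieve(), now goes through this opener.
    urllib.request.install_opener(opener)


install_user_agent()
# urllib.request.urlretrieve(url, "Test.pdf")  # now sends USER_AGENT
```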

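Note: since you're on Python 3, urlretrieve() and URLopener belong to urllib.request's "legacy interface", and URLopener has been deprecated since Python 3.3.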

So it's better not to keep relying on these older APIs.
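Staying with the standard library, the non-legacy route is to send the header on a Request and write the response body to disk yourself. In the question's code, download_pdf() calls urlretrieve(), which never sees hdr, so that download goes out with urllib's default User-Agent. A minimal sketch, assuming the server only checks User-Agent; the download_pdf(url, filename, user_agent) signature and the USER_AGENT value are illustrative, not from the original. It also leaves out the Content-Length entry from hdr, which does not belong on a GET request without a body.

```python
import shutil
import urllib.parse
import urllib.request

# Hypothetical browser-like User-Agent; any value the server accepts will do.
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2)"


def download_pdf(url, filename, user_agent=USER_AGENT):
    """Download url to filename, sending a custom User-Agent header."""
    # Percent-encode unsafe characters such as spaces; recent Python versions
    # reject raw spaces in the request path. '%' stays safe so already-encoded
    # sequences are not encoded twice.
    safe_url = urllib.parse.quote(url, safe=":/?=&%")
    req = urllib.request.Request(safe_url, headers={"User-Agent": user_agent})
    # urlopen() raises urllib.error.HTTPError for 4xx/5xx responses such as 403,
    # before the output file is opened.
    with urllib.request.urlopen(req) as resp, open(filename, "wb") as out:
        # Stream the body to disk instead of reading it all into memory.
        shutil.copyfileobj(resp, out)
    return filename
```

Usage, with the URL from the question: download_pdf('http://papers.xtremepapers.com/CIE/Cambridge IGCSE/Mathematics (0580)/0580_s03_qp_1.pdf', 'Test.pdf').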

Beyond that, even straightforward URL fetching with urllib involves a fair amount of hassle.

Your project already imports the requests package, so use requests instead of urllib.

requests is simpler to use:

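import requests

url = 'http://papers.xtremepapers.com/CIE/Cambridge IGCSE/Mathematics (0580)/0580_s03_qp_1.pdf'
# requests percent-encodes the spaces in the URL for you.
# If the server also rejects requests' default User-Agent, pass headers={'User-Agent': ...}.
r = requests.get(url)
r.raise_for_status()  # stop on 403/404 instead of saving an error page as the PDF
with open('0580_s03_qp_1.pdf', 'wb') as outfile:
    outfile.write(r.content)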
