Getting a website's cookies with Python

source link: https://www.hi-roy.com/posts/python-%E8%8E%B7%E5%8F%96%E7%BD%91%E7%AB%99cookie/

2013-11-15

For a typical website, the following code is enough to obtain its cookies:

import urllib2
import urllib
import cookielib

logurl = "https://www.digikey.com/classic/RegisteredUser/Login.aspx?"
# Collect the cookies from every response in an in-memory jar
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Make urllib2.urlopen use this cookie-aware opener from now on
urllib2.install_opener(opener)
resp = urllib2.urlopen(logurl)
# Print every cookie the server set
for index, cookie in enumerate(cj):
    print '[', index, ']', cookie

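After that, build the POST data and send it to the target URL. (As for headers: some people say that submitting your own headers at this point overrides the cookie-bearing header obtained above. I have not tested this myself, but if it is true, you could try adding them through opener.addheaders instead.)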
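The POST step and the header question above can be sketched as follows. This is only an illustration under assumptions: it uses the Python 3 module names (`urllib.request`, `http.cookiejar`) rather than the Python 2 ones in the article, and the helper names `make_opener`, `build_login_request`, and `cookie_header_for` are hypothetical. Nothing is sent over the network; the requests are only built and inspected.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener(jar, user_agent):
    """Build an opener that stores cookies in `jar` and sends a default User-Agent."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # addheaders is a plain list of (name, value) tuples applied to every request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def build_login_request(url, fields, extra_headers=None):
    """Build (but do not send) a POST request carrying URL-encoded form fields."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data, headers=extra_headers or {})


def cookie_header_for(jar, request):
    """Return the Cookie header the request carries once the jar has been applied."""
    jar.add_cookie_header(request)
    return request.get_header("Cookie")
```

Sending the request would then be `make_opener(jar, ua).open(build_login_request(url, fields))`. As far as I can tell from the standard library, the jar only adds a Cookie header when the request does not already have one, so a hand-written Cookie header takes the place of the jar's cookies, while other headers such as User-Agent do not interfere.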

For some reason, though, the digikey site does not return any cookie when visited this way.

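After some experimenting, I found that copying the post-login cookie straight out of the browser into the headers and then requesting a logged-in page directly is enough to be logged in. Even the POST data seems to be unnecessary.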

teurl = "https://www.digikey.com/classic/RegisteredUser/MyDigikey.aspx"
headers = {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language":"q=0.8,en-us",
            "Host":"www.digikey.com",
            "Cookie":"TS6b482d=1aa460f525235eaaaf6763d718b53ba564323; sid=5278426151-44225; TS50f921=c7b09221315dc72ed80a2b24f014fc2ffa4b16ccf877976952494826a3c65dd840b6a4ae6bad0a341809f751ca85ab033df615a7; TS168127=b58c4c2296cc37b2a5515974e403a008cdc46701e181d2a252494823019cdd5115d4a2a6a3c65dd840b6a4ae6bad0a341809f751ca85ab033df615a724f3d3ac91dbdb26; cur=USD; SiteForCur=US; utag_main=_st:1380536340264$ses_id:1380534378638%3Bexp-session; TS50f921_77=6487_b8824fd7e25d22fc_rsb_0_rs_https%3A%2F%2Fwww.digikey.com%2Fclassic%2FRegisteredUser%2FLogin.aspx%3FReturnUrl%3D%252fclassic%252fregistereduser%252fmydigikey.aspx%253fsite%253dus%2526lang%253den%26site%3Dus%26lang%3Den_rs_0; WT_FPC=id=36664e24-272b-40b3-9c92-2bce365f251d:lv=1380484152114:ss=1380483959856"
          }
#postdata = {
#            "__EVENTARGUMENT":"",
#            "__EVENTTARGET":"",
#            "__EVENTVALIDATION":"BcGpYOslmB3LGgxIVeQ+h35cvehYPZQcz1tM4jAlXqyYqV/g1blGRZnSJ4itN0YHd4C7aQtlJT0qWTL7vspdqVLEZtyljs5BJJuR+NhrIxCG0sdcfegZ1ZR1hdl/qIcNf1qpWfClikXsLCYWLe1N/Q6P1kU=",
#            "__LASTFOCUS":"",
#            "__SCROLLPOSITIONX":0,
#            "__SCROLLPOSITIONY":0,
#            "ctl00$ctl00$mainContentPlaceHolder$mainContentPlaceHolder$btnLogin":"Log In",
#            "ctl00$ctl00$mainContentPlaceHolder$mainContentPlaceHolder$txtPassword":"xxxxx",
#            "ctl00$ctl00$mainContentPlaceHolder$mainContentPlaceHolder$txtUsername":"xxxxx",
#          }
#postdata=urllib.urlencode(postdata)
# postdata is commented out above, so send a plain GET that carries only the headers
req = urllib2.Request(teurl, headers=headers)
res = urllib2.urlopen(req)
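For readers on Python 3, the same idea might look like the sketch below. It is an illustration only: the helper names `cookie_header` and `build_authenticated_request` are hypothetical, and the cookie names and values passed in would be whatever you copy from your own logged-in browser session.

```python
import urllib.request


def cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value."""
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


def build_authenticated_request(url, cookies, user_agent):
    """Build (but do not send) a GET request that reuses browser cookies."""
    headers = {"User-Agent": user_agent, "Cookie": cookie_header(cookies)}
    return urllib.request.Request(url, headers=headers)
```

Passing the resulting request to `urllib.request.urlopen` would perform the actual fetch. Keeping the cookies in a dict rather than one long string makes it easier to update a single value when the browser session changes.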

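I did not expect the final solution to be this simple; it even works when scraping from changing (dynamic) IP addresses. What I do not know is whether the cookie will expire at some point.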
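One possible way to notice an expired cookie is to check where the request finally ended up. The sketch below assumes, without confirmation from the site, that an expired session is redirected to the login page used earlier; the function name `redirected_to_login` is hypothetical.

```python
from urllib.parse import urlsplit


def redirected_to_login(final_url, login_path="/classic/RegisteredUser/Login.aspx"):
    """Return True if the final URL of a response is the login page."""
    # Compare only the path, ignoring case and any ReturnUrl query string.
    return urlsplit(final_url).path.lower() == login_path.lower()
```

With the response object from the request above, the check would be `redirected_to_login(res.geturl())`; a True result would suggest that a fresh cookie needs to be copied from the browser.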

As for why no cookie could be obtained, a 302 page redirect may be the cause.
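To check the redirect theory, one option is to stop the opener from following redirects and look at the first response directly. The following is a minimal sketch using Python 3 module names; the class `NoRedirect` and the function `fetch_without_redirect` are hypothetical names, and whether digikey actually sets its cookies on a 302 response is not something the sketch can confirm.

```python
import http.cookiejar
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses so they can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; urllib then raises
        # the 3xx response as an HTTPError.
        return None


def fetch_without_redirect(url, jar):
    """Return (status, Location, list of Set-Cookie headers) for the first response."""
    opener = urllib.request.build_opener(
        NoRedirect(), urllib.request.HTTPCookieProcessor(jar))
    try:
        resp = opener.open(url, timeout=10)
    except urllib.error.HTTPError as err:
        resp = err  # HTTPError also exposes .code and .headers
    set_cookie = resp.headers.get_all("Set-Cookie") or []
    return resp.code, resp.headers.get("Location"), set_cookie
```

If the returned status is 302 and the Set-Cookie list is empty, the cookie is probably being set somewhere other than the redirect response.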

In that case, consider using LWPCookieJar or MozillaCookieJar to save the obtained cookies to a file and then read them back with load().
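Saving and reloading could look roughly like this. The sketch uses the Python 3 module `http.cookiejar` (the counterpart of `cookielib`); the helper names `save_cookies` and `load_cookies` and the file path are hypothetical.

```python
import http.cookiejar
import os


def save_cookies(jar, path):
    """Write every cookie in an LWPCookieJar to `path`."""
    # ignore_discard=True keeps session cookies, which would otherwise be skipped.
    jar.save(path, ignore_discard=True, ignore_expires=True)


def load_cookies(path):
    """Return an LWPCookieJar filled from `path`, or an empty one if the file is missing."""
    jar = http.cookiejar.LWPCookieJar(path)
    if os.path.exists(path):
        jar.load(ignore_discard=True, ignore_expires=True)
    return jar
```

The jar returned by `load_cookies` can be handed to `HTTPCookieProcessor` in the same way as the in-memory `CookieJar` used earlier, so later runs start with the cookies from the previous one.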

