HTML Analysis and Authentication in Python

advertisements

I'm trying to write a Python script that will automatically authenticate a user so that the site can be parsed and (eventually) output relevant data. I can't seem solve the auto authentication problem, which is likely why the parsing will not take place. One issue is that the site we're trying to log on to does not change and is always the same IP. We're also not using a a "submit" on this webpage (line 18), so we're not sure how to go about modifying that so it fits out needs.

from lxml import html
import requests
import sys

USER = ''
PASS = ''

URL = ''

def main():
# Start a session so we can have persistant cookies
session = requests.session(config={'verbose': sys.stderr})

# This is the form data that the page sends when logging in
login_data = {
    'InputPanel': USER,
    'InputPassword': PASS,
    'submit': 'login',
}

# Authenticate
r = session.post(URL, data=login_data)

# Try accessing a page that requires you to be logged in
r = session.get('')

if __name__ == '__main__':
main()

go('')

fv("1", "InputPanel", "")
fv("1", "InputPassword", "")

submit('0')

page = requests.get('')
tree = html.fromstring(page.text)

#Hours
hours = tree.xpath('//div[@class="staticObject"]/text()')
#Machine
projectors = tree.xpath('//div[@class1="corners dynamicObject"]/text()')

print 'Hours: ', hours
print 'Projectors: ', projectors

Searched for authentication in Python and found some results but not many seemed to apply to me. The code I have now is from an example I found, only as you'll see on line 25, there is no page we can't get to without being logged in, as the URL remains the same.

Any help would be great.

Without more information this is hard to answer. You could just go ahead and do the post request using

response = urllib2.urlopen(url)

(https://docs.python.org/2/library/urllib2.html) for the login and catch the session that it outputs, if it's a header/cookie you can catch them and use them for the rest of the session but you'll likely have to pass them for each page request. There isn't really enough information in your question to give you a full in depth answer here.

If you're parsing html you may want to look at http://www.crummy.com/software/BeautifulSoup/bs4/doc/

HTML Analysis and Authentication in Python

HTML Analysis and Authentication in Python

Recommend

从市场到政策，中国汽车消费者都更需要打着“国产烙印”的混动技术

Move a view when you move a button

How Important Is Email Marketing? 7 Points to Justify

Oculus Quest 2 gets official wireless-VR mode, 120 Hz support via patch

优爱腾管不住抖音快手

Good Bots & Bad Bots: Management And Their Impacts | Codecondo

Verizon, AT&T, and T-Mobile kill their cross-carrier RCS messaging plans

成立17年向潮流化转型，完美世界能靠近30款新游戏缓解利润压力吗？

10-30万元区间内，小米汽车的潜在对手有哪些？

英伟达携手Steam，玩家有机会买到原价显卡？

About Joyk