A Python Script for Logging In to a Website Automatically

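Free VPNs are hard to come by. This one drops the connection after a while, but that is more than enough for me to connect to the US servers and update Diablo 3. The catch is that the VPN requires logging in once every 3 days, which is a hassle.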

Approach:

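A long time ago I read some articles on scraping websites with Python (the ones by Observer, the author of simplecd.org, are excellent), and they covered automatic login. So I simply borrowed that approach.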

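These days a login form without a CAPTCHA is hard to find, so that was another problem to solve. Fortunately, this site's CAPTCHA is very plain: no distortion, warping, or added noise, and nothing like Google's nightmarish CAPTCHA. It is just ordinary English letters and digits:
- So the obvious choice is Google's open-source OCR engine, tesseract-ocr, which I came across earlier while looking into Sikuli.
- To connect it to Python, a quick Google search turned up pytesser, a simple tesseract wrapper that meets my needs.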
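
The script below lowercases tesseract's output and removes whitespace before submitting it. Since this CAPTCHA consists only of plain English letters and digits, a slightly stricter clean-up can also discard any other character the OCR might produce. A minimal Python 3 sketch, assuming the server accepts the code in lowercase (as the script's `.lower()` implies); `clean_captcha` is a hypothetical helper, not part of pytesser:

```python
import string

# Assumption: the CAPTCHA only ever contains ASCII letters and digits.
ALLOWED = set(string.ascii_lowercase + string.digits)


def clean_captcha(ocr_text: str) -> str:
    """Normalize raw OCR output into a candidate CAPTCHA code.

    Lowercases the text and keeps only ASCII letters and digits, so any
    whitespace, newlines or stray punctuation from the OCR step is dropped.
    """
    return "".join(ch for ch in ocr_text.lower() if ch in ALLOWED)


print(clean_captcha(" A b3\n9Z.\n\n"))  # ab39z
```

Filtering against an explicit whitelist is slightly stricter than the script's `''.join(ocr_str.split())`, which only removes whitespace.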

Getting started:

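Install the latest Windows build of tesseract-ocr (on other operating systems, install the corresponding version).
Download pytesser and extract pytesser.py, util.py and errors.py into the script's folder.
Then write the script itself, as follows: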

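#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# filename: AutoLogin.py
# Python 2 script (urllib2, cookielib and pytesser are Python 2 libraries).

from __future__ import unicode_literals
import urllib
import urllib2
import cookielib
from cStringIO import StringIO

try:
    from PIL import Image  # Pillow or newer PIL
except ImportError:
    import Image  # old standalone PIL

from pytesser import image_to_string

# Site address hidden: if the site notices this script, the CAPTCHA will
# probably get tougher -_-||
LOGIN_URL = 'http://*.*.*.*/lr.sm'
IMAGE_URL = 'http://*.*.*.*/image'
USER = 'yourusername'
PWD = 'yourpassword'

### Cookie handling ###
# Install the cookie-aware opener before fetching the CAPTCHA, so the image
# request and the login POST share the same session cookie (if the server
# ties the validation code to one).
cookie_support = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
opener = urllib2.build_opener(cookie_support)
urllib2.install_opener(opener)

### OCR using pytesser ###
img_file = urllib2.urlopen(IMAGE_URL)
img = StringIO(img_file.read())
check_img = Image.open(img)
ocr_str = image_to_string(check_img).lower()
# Remove the spaces and newlines that tesseract adds around the text
CODE = ''.join(ocr_str.split())

postdata = urllib.urlencode({
    'user.nick': USER,
    'password': PWD,
    'validationCode': CODE,
})

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1',
    'Referer': LOGIN_URL,
}

req = urllib2.Request(url=LOGIN_URL, data=postdata, headers=headers)

result = urllib2.urlopen(req).read()
decoded_result = result.decode('utf-8')
# '欢迎您' means "welcome"; '**' stands for the hidden site name, so replace it
# with the real page text. A plain substring test is used because '**' is not
# a valid regular expression.
if '{} **欢迎您'.format(USER) in decoded_result:
    print 'Logged in successfully!'
else:
    # Save the raw response bytes for debugging
    with open('result.html', 'wb') as f:
        f.write(result)
    print 'Login failed, check result.html for details'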
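
urllib2 and cookielib exist only in Python 2. In Python 3 they became `urllib.request` and `http.cookiejar`, and `urlencode` moved to `urllib.parse`. As a sketch of the same login request in Python 3, the code below builds the cookie-aware opener and the POST request without sending anything; the URL and form field names are the placeholders from the script above, and `build_login_request` is a hypothetical helper name:

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://*.*.*.*/lr.sm"  # hidden address, as in the original script
USER = "yourusername"
PWD = "yourpassword"


def build_login_request(code: str) -> urllib.request.Request:
    """Build (but do not send) the login POST used by the script above."""
    # urlencode returns str; a POST body must be bytes in Python 3
    postdata = urllib.parse.urlencode({
        "user.nick": USER,
        "password": PWD,
        "validationCode": code,
    }).encode("ascii")
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1",
        "Referer": LOGIN_URL,
    }
    return urllib.request.Request(LOGIN_URL, data=postdata, headers=headers)


# Python 3 counterpart of urllib2.HTTPCookieProcessor(cookielib.CookieJar())
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

req = build_login_request("ab39z")
print(req.get_method())  # POST (a request with data is sent as POST)
# To actually log in: html = opener.open(req).read().decode("utf-8")
```

Calling `opener.open(req)` directly, rather than `install_opener`, keeps the cookie jar tied to this one opener; the CAPTCHA image would need to be fetched through the same opener so both requests share any session cookie. pytesser was written for Python 2, so a Python 3 port would also need a different OCR wrapper, which this sketch does not cover.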

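Logging in is presumably all that is needed, so the script does nothing further with the cookies (they are kept in memory only for the duration of the run). I'll look into cookielib when I have time.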
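
On the cookielib question: its Python 3 successor `http.cookiejar` includes `MozillaCookieJar`, which can save cookies to a text file and load them back, for example to reuse a login session between runs. A minimal round-trip sketch; the cookie name `SESSIONID`, its value and the domain `example.com` are made up for illustration, and whether this VPN site's session cookie can actually be reused is untested:

```python
import http.cookiejar
import os
import tempfile


def make_session_cookie(name: str, value: str, domain: str) -> http.cookiejar.Cookie:
    """Build a session cookie by hand (normally the server sets it via Set-Cookie)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None,
        discard=True,  # session cookie: no expiry date
        comment=None, comment_url=None, rest={},
    )


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cookies.txt")

    jar = http.cookiejar.MozillaCookieJar(path)
    jar.set_cookie(make_session_cookie("SESSIONID", "abc123", "example.com"))
    # save() skips discard=True cookies unless ignore_discard=True
    jar.save(ignore_discard=True, ignore_expires=True)

    restored = http.cookiejar.MozillaCookieJar(path)
    restored.load(ignore_discard=True, ignore_expires=True)

saved = [(c.name, c.value, c.domain) for c in restored]
print(saved)  # [('SESSIONID', 'abc123', 'example.com')]
```

Session cookies carry no expiry date, so the jar marks them `discard` and `save()` silently drops them unless `ignore_discard=True` is passed; that flag is the main thing to remember when persisting a login.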

Published by seganw