Converting a Sogou WeChat Link into the Real WeChat URL

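The advantages of Sogou WeChat search need no introduction. Because it makes WeChat articles easy to find, it has naturally attracted a great many crawlers, and Sogou has upgraded its anti-crawler defenses many times in response; by now they can only be described as ruthless.
Among other things, WeChat article pages opened through Sogou are now time-limited: after a while, the link simply stops working.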

So what should I do if I just want to bookmark the URL?

1. Analysis

First, compare the two URLs of the same article; the difference between them is easy to see.

# WeChat URL
http://mp.weixin.qq.com/s?__biz=MzI5MjMyNjk2Mg==&mid=2247485233&idx=1&sn=fb9ca534526a3940c0d5bb3e96b4c8c9&chksm=ec025c6cdb75d57aa401d411af2f0bb649cfe9e548982180bb94f12f26be3dc4c4a045a77df3&scene=4#wechat_redirect  
# Sogou URL
http://mp.weixin.qq.com/s?src=3&timestamp=1477458807&ver=1&signature=waQ8jIV2bU4Ak9MpzkBgORUzHVHxHscAQQP5rASR2E6yCWLHue87FI2zP0UP4HvAIGT3L0O15SNwK4M8VDjJHb721nsEOnxFJdocP-67ymL4io8apFiq5KhPvPwSgVlrRISGW2NHpdvhF*6kiv132PIVOMQ2SnKMGpM-igEQTV4=  

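A quick look shows that the WeChat URL is made up of several parameters, so restoring the real URL means finding those parameters.
Fortunately, they are embedded in the page itself and no complicated computation is involved: all we have to do is extract them from the page source and reassemble them into a new URL.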
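
To make the comparison concrete, the short sketch below lists the query parameter names of the two sample URLs. It is written for Python 3 and uses only the standard library; the helper name `param_names` is made up for this illustration.

```python
from urllib.parse import parse_qsl, urlsplit

# The two sample URLs from the comparison above.
WEIXIN_URL = (
    "http://mp.weixin.qq.com/s?__biz=MzI5MjMyNjk2Mg==&mid=2247485233&idx=1"
    "&sn=fb9ca534526a3940c0d5bb3e96b4c8c9"
    "&chksm=ec025c6cdb75d57aa401d411af2f0bb649cfe9e548982180bb94f12f26be3dc4c4a045a77df3"
    "&scene=4#wechat_redirect"
)
SOGOU_URL = (
    "http://mp.weixin.qq.com/s?src=3&timestamp=1477458807&ver=1"
    "&signature=waQ8jIV2bU4Ak9MpzkBgORUzHVHxHscAQQP5rASR2E6yCWLHue87FI2zP0UP4HvA"
    "IGT3L0O15SNwK4M8VDjJHb721nsEOnxFJdocP-67ymL4io8apFiq5KhPvPwSgVlrRISGW2NHpd"
    "vhF*6kiv132PIVOMQ2SnKMGpM-igEQTV4="
)


def param_names(url):
    """Return the query parameter names of url, in order of appearance."""
    query = urlsplit(url).query  # urlsplit separates the '#fragment' for us
    return [name for name, _value in parse_qsl(query, keep_blank_values=True)]


print("WeChat:", param_names(WEIXIN_URL))
print("Sogou: ", param_names(SOGOU_URL))
```

The two URLs share the same host and path but have no parameter in common, which is why the permanent parameters have to be read from the page rather than derived from the Sogou link.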

2. Solution

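Search the page text with re and pull out the matching string.
Without further ado, here is the Python code (Python 2, since it uses urllib2 and the print statement):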

# -*- coding: utf-8 -*-
import re
import urllib2

def weixin_params_fix(url):
    # Download the page that the Sogou link points to.
    request = urllib2.Request(url)
    response = urllib2.urlopen(request, timeout=5)
    content = response.read()
    # The real link is assigned to the JavaScript variable msg_link.
    params = re.findall('''var msg_link = "(.*?)";''', content)
    if not params:
        return None  # msg_link was not found in the page
    # Replace every ';' with '&', then drop the last three characters.
    url_true = str(params[0]).replace(";", "&")
    return url_true[:-3]

sogou_url = "http://mp.weixin.qq.com/s?src=3&timestamp=..."
weixin_url = weixin_params_fix(sogou_url)
print weixin_url
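
The extraction step can also be tried offline. The following is a minimal Python 3 sketch, not the code from this post: it applies the same `msg_link` pattern to a made-up page fragment. It assumes that `&` appears HTML-escaped as `&amp;` inside the page and that the link ends in a short fragment such as `#rd`; both are assumptions about the page format, and `extract_msg_link` and `SAMPLE_PAGE` are names invented for the example.

```python
import html
import re

# Same pattern as in the post: the link is assigned to the JavaScript
# variable msg_link somewhere in the page source.
MSG_LINK_RE = re.compile(r'var msg_link = "(.*?)";')


def extract_msg_link(page_source):
    """Return the link stored in msg_link, or None if it is not present."""
    match = MSG_LINK_RE.search(page_source)
    if match is None:
        return None
    # Assumption: '&' is written as '&amp;' inside the page source.
    link = html.unescape(match.group(1))
    # Drop a trailing '#fragment' (for example '#rd'), if there is one.
    return link.split("#", 1)[0]


# Hypothetical page fragment, made up for this example.
SAMPLE_PAGE = """
<script>
var msg_title = "demo";
var msg_link = "http://mp.weixin.qq.com/s?__biz=MzI5MjMyNjk2Mg==&amp;mid=2247485233&amp;idx=1&amp;sn=fb9ca534526a3940c0d5bb3e96b4c8c9#rd";
</script>
"""

print(extract_msg_link(SAMPLE_PAGE))
print(extract_msg_link("<html>no link here</html>"))
```

Keeping the extraction in a pure function that takes the page source as a string means it can be checked without any network access; the download step stays a separate concern.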

For convenience, here is the full version:

# -*- coding: UTF-8 -*-
import re
import urllib2
import webbrowser

def weixin_params_fix(url):
    # Download the page that the Sogou link points to.
    request = urllib2.Request(url)
    response = urllib2.urlopen(request, timeout=5)
    content = response.read()
    # The real link is assigned to the JavaScript variable msg_link.
    rParams = '''var msg_link = "(.*?)";'''
    params = re.findall(rParams, content)
    if not params:
        return None  # msg_link was not found in the page
    url_temp = str(params[0])
    # Replace every ';' with '&', then drop the last three characters.
    url_true = url_temp.replace(";", "&")
    return url_true[:-3]

url = raw_input("please input sogou url:")
link = weixin_params_fix(url)

if link is not None:
    print "url resolved:", link
    print "done."
    # Open the resolved link in the default browser.
    webbrowser.open(link)
else:
    print "resolve failed."
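
Before opening or bookmarking the result, it can be worth checking that it really looks like a permanent article URL. The Python 3 sketch below is one possible check, not part of the original script: the set of required parameter names is an assumption read off the sample WeChat URL in the analysis section, and `looks_permanent` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: these names are taken from the sample WeChat URL in the
# analysis section; the real set of mandatory parameters may differ.
REQUIRED_PARAMS = {"__biz", "mid", "idx", "sn"}


def looks_permanent(url):
    """Return True if url has the shape of a permanent WeChat article URL."""
    if not url:
        return False
    parts = urlsplit(url)
    if parts.netloc != "mp.weixin.qq.com":
        return False
    # parse_qs returns a dict keyed by parameter name.
    return REQUIRED_PARAMS.issubset(parse_qs(parts.query))


permanent = ("http://mp.weixin.qq.com/s?__biz=MzI5MjMyNjk2Mg==&mid=2247485233"
             "&idx=1&sn=fb9ca534526a3940c0d5bb3e96b4c8c9")
temporary = "http://mp.weixin.qq.com/s?src=3&timestamp=1477458807&ver=1"

print(looks_permanent(permanent))
print(looks_permanent(temporary))
```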

leeway
