python - Get Google search pages from specific dates


I'm trying to scrape Google for specific date ranges: the years 2002, 2004, and so on. I can't use pygoogle, xgoogle or google search, since they do not have an option to specify the period I'm searching for. So I found out the query for that, but when running the script, Google sends me the same results no matter which search page I am on.

This is the code:

import time
import urllib2
import re
import random

# Define the search term.
agent = 'pt+e+pmdb'

# Define the request headers.
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

# Counter used as the key for each stored link.
contador = 0
# Dictionary where the links are stored.
links2002 = {}
# Number of pages to search through.
npages = 50

# Start the routine.
for i in range(1, npages, 1):
    tempurl2002 = 'https://www.google.com/search?q='+str(agent)+'&hl=pt-br&biw=1137&bih=1354&sa=x&ei=er8ru8hteiqhkqeeuocicg&ved=0cboqpwuobjgu&source=lnt&tbs=cdr%3a1%2ccd_min%3a01%2f01%2f2002%2ccd_max%3a31%2f12%2f2002&tbm=#filter=0&hl=pt-br&q='+str(agent)+'&start='+str(i*10)+'&tbs=cdr:1,cd_min:01/01/2002,cd_max:31/12/2002'
    # URL used for the request.
    req = urllib2.Request(tempurl2002, headers=hdr)
    # Run the search.
    searchresults = urllib2.urlopen(req)
    # Get the search data.
    page = searchresults.read()
    # Define a random pause for the algorithm.
    wt = random.uniform(10, 30)
    # Pause the algorithm so that Google does not block it.
    time.sleep(wt)
    # Get the links.
    links = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', page)
    # Store the results.
    for url in links:
        contador = contador + 1
        links2002[contador] = url

Does anyone know how to do this right? Is there a clever way to get Google search results for specific dates?

Best, Julio.
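
One possible cause, offered as a guess rather than a confirmed diagnosis: in the URL built by the script, the start value comes after a '#'. Everything after '#' is a URL fragment, not part of the query string, so the page offset may never reach Google as a query parameter. The sketch below builds the URL with every parameter before any '#'. The helper name build_search_url is hypothetical. The parameter names (q, hl, filter, tbs, start) are taken from the URL in the script. It is an assumption that the session-specific parameters (ei, ved, sa, biw, bih) can be dropped and that Google still honors the remaining ones.

```python
try:
    from urllib import urlencode          # Python 2, as in the script above
    from urlparse import urlsplit
except ImportError:
    from urllib.parse import urlencode, urlsplit  # Python 3


def build_search_url(query, year, page):
    """Hypothetical helper: build one results-page URL for a single year."""
    params = [
        ('q', query),
        ('hl', 'pt-br'),
        ('filter', '0'),
        # Date range in the dd/mm/yyyy form used in the original URL.
        ('tbs', 'cdr:1,cd_min:01/01/%d,cd_max:31/12/%d' % (year, year)),
        # Offset of the first result; 10 results per page is assumed.
        ('start', str(page * 10)),
    ]
    # urlencode escapes each value and joins the pairs with '&'.
    return 'https://www.google.com/search?' + urlencode(params)


url = build_search_url('pt e pmdb', 2002, 3)
parts = urlsplit(url)
print(url)
print(parts.query)           # all parameters, including start, are here
print(repr(parts.fragment))  # the fragment is empty
```

In the original loop, the string concatenation could be replaced by build_search_url('pt e pmdb', 2002, i), leaving the request, pause, and link extraction unchanged.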

