how to scrape search results if returned in javascript (using python) -

February 15, 2010

the site want scrape populates returns using javascript.

can call script somehow , work results? (then without pagination, of course.) don't want run entire thing scrape resulting formatted html, raw source blank.

have look: http://kozbeszerzes.ceu.hu/searchresults.xhtml?q=1998&page=0

the source of return simply

<?xml version="1.0" encoding="utf-8"?> <?xml-stylesheet type="text/xsl" href="/templates/base_template.xsl"?> <content>   <head>     <script type="text/javascript" src="/js/searchresultsview.js"></script>       </head>     <whitebox>     <div id = "hits"></div>     </whitebox> </content>

i prefer simple python tools.

i downloaded selenium , chromedriver.

from selenium import webdriver  driver = webdriver.chrome() driver.get('http://kozbeszerzes.ceu.hu/searchresults.xhtml?q=1998&page=0')  e in driver.find_elements_by_class_name('result'):     link = e.find_element_by_tag_name('a')     print(link.text.encode('ascii', 'ignore'), link.get_attribute('href').encode('ascii', 'ignore'))  driver.quit()

if you're using chrome, can inspect page attributes using f12, pretty useful.

Search This Blog

Silver

how to scrape search results if returned in javascript (using python) -

Comments

Post a Comment

Popular posts from this blog

user interface - How to replace the Python logo in a Tkinter-based Python GUI app? -

objective c - Greedy NSProgressIndicator Allocation -

how to set an OCR language in Google Drive -