how to scrape search results if returned in javascript (using python) -
the site want scrape populates returns using javascript.
can call script somehow , work results? (then without pagination, of course.) don't want run entire thing scrape resulting formatted html, raw source blank.
have look: http://kozbeszerzes.ceu.hu/searchresults.xhtml?q=1998&page=0
the source of return simply
<?xml version="1.0" encoding="utf-8"?> <?xml-stylesheet type="text/xsl" href="/templates/base_template.xsl"?> <content> <head> <script type="text/javascript" src="/js/searchresultsview.js"></script> </head> <whitebox> <div id = "hits"></div> </whitebox> </content> i prefer simple python tools.
i downloaded selenium , chromedriver.
from selenium import webdriver driver = webdriver.chrome() driver.get('http://kozbeszerzes.ceu.hu/searchresults.xhtml?q=1998&page=0') e in driver.find_elements_by_class_name('result'): link = e.find_element_by_tag_name('a') print(link.text.encode('ascii', 'ignore'), link.get_attribute('href').encode('ascii', 'ignore')) driver.quit() if you're using chrome, can inspect page attributes using f12, pretty useful.
Comments
Post a Comment