Python - XPath: get the text from multiple lines
I have this HTML:

    <td width="70%">regen real estate, dubai – u.a.e

    rera id: 12087

    specialist licensed property brokers & consultants
    residential / commercial – buying, selling, r <a href="http://www.justproperty.com/company_view/index/3963">...read more...</a></td>

I want the text inside the td.
What have you tried?
I tried normalize-space(td/text()), but that only returned the last line.
What should the resulting lines look like?
You can use u"".join(selector.xpath('.//td//text()').extract()) or u"".join(selector.css('td ::text').extract()).
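As a rough illustration of what the join gives you: the text nodes keep the line breaks from the markup, so the joined string usually still needs its whitespace collapsed. The sketch below assumes a hand-written `fragments` list standing in for what `.extract()` would return on the question's HTML; it is not output captured from Scrapy.

```python
# Hypothetical stand-in for the list returned by
# selector.xpath('.//td//text()').extract() on the question's HTML:
# one text node before the <a>, one inside it.
fragments = [
    u"regen real estate, dubai \u2013 u.a.e\n\nrera id: 12087\n\n"
    u"specialist licensed property brokers & consultants\n"
    u"residential / commercial \u2013 buying, selling, r ",
    u"...read more...",
]

# Plain join: the line breaks from the markup are still in the string.
joined = u"".join(fragments)

# split() with no argument splits on any run of whitespace, so joining
# the pieces with one space collapses line breaks and repeated spaces.
collapsed = u" ".join(joined.split())

print(repr(joined))
print(repr(collapsed))
```

Note that `str.split()` treats any Unicode whitespace as a separator, which is slightly broader than XPath's normalize-space() (space, tab, CR and LF only).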
I forgot the simplest way: if you want all the text content of a specific node, you can use normalize-space() on it directly:
    paul@wheezy:~$ ipython
    Python 2.7.3 (default, Jan 2 2013, 13:56:14)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.13.1 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.

    In [1]: from scrapy.selector import Selector

    In [2]: selector = Selector(text="""<td width="70%">regen real estate, dubai – u.a.e
       ...:
       ...: rera id: 12087
       ...:
       ...: specialist licensed property brokers & consultants
       ...: residential / commercial – buying, selling, r <a href="http://www.justproperty.com/company_view/index/3963">...read more...</a></td>""", type="html")

    In [3]: selector.xpath("normalize-space(.//td)")
    Out[3]: [<Selector xpath='normalize-space(.//td)' data=u'regen real estate, dubai \u2013 u.a.e rera id'>]

    In [4]: selector.xpath("normalize-space(.//td)").extract()
    Out[4]: [u'regen real estate, dubai \u2013 u.a.e rera id: 12087 specialist licensed property brokers & consultants residential / commercial \u2013 buying, selling, r ...read more...']

    In [5]: [td.xpath("normalize-space(.)").extract() for td in selector.css("td")]
    Out[5]: [[u'regen real estate, dubai \u2013 u.a.e rera id: 12087 specialist licensed property brokers & consultants residential / commercial \u2013 buying, selling, r ...read more...']]

Remember that normalize-space() only considers the first node in the node-set you give it as an argument, which is generally what you want if you are sure the argument will match one and only one node.
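To make the first-node caveat concrete, here is a minimal standard-library sketch. It does not run XPath at all: `normalize_space` is a hypothetical helper that approximates normalize-space(.) for one element, and `ROW` is an invented, well-formed two-cell fragment parsed with ElementTree instead of Scrapy.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical two-cell row; the second cell mimics the question's layout.
ROW = """<tr>
  <td>first
      cell</td>
  <td>second cell,
      line two <a href="#">...read more...</a></td>
</tr>"""


def normalize_space(node):
    """Approximate XPath normalize-space(.) for a single element."""
    # String value of the element: all descendant text in document order.
    text = "".join(node.itertext())
    # Collapse runs of XML whitespace, then trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" \t\r\n")


cells = ET.fromstring(ROW).findall(".//td")

# What a single normalize-space(.//td) call would keep: the first match only.
first_only = normalize_space(cells[0])

# One call per node, in the spirit of the per-td loop in the session above.
per_cell = [normalize_space(td) for td in cells]

print(first_only)
print(per_cell)
```

With more than one matching td, the single-call form silently drops every cell after the first, so the per-node loop is the safer default when the number of matches is not known in advance.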