python - XPath: get the text from multiple lines


I have this HTML:

<td width="70%">regen real estate, dubai – u.a.e  rera id: 12087  specialist licensed property brokers &amp; consultants residential / commercial – buying, selling, r <a href="http://www.justproperty.com/company_view/index/3963">...read more...</a></td> 

I want the text inside the td.

What have I tried?

normalize-space(td/text()) 

But I only got the last line.

What should I do to get all the lines?

You can use u"".join(selector.xpath('.//td//text()').extract()) or u"".join(selector.css('td ::text').extract())
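The same idea (collect every descendant text node, then join them) can be sketched with only the Python standard library. This is an analogue rather than the Scrapy call itself: it assumes the fragment is well-formed enough for `xml.etree.ElementTree` to parse, and it uses `itertext()` in place of `.//td//text()`. The line breaks in the `HTML` string are assumed from the multi-line layout of the original markup.

```python
import xml.etree.ElementTree as ET

# The td from the question; the newline positions are an assumption.
HTML = (
    '<td width="70%">regen real estate, dubai – u.a.e\n\n'
    'rera id: 12087\n\n'
    'specialist licensed property brokers &amp; consultants\n'
    'residential / commercial – buying, selling, r '
    '<a href="http://www.justproperty.com/company_view/index/3963">'
    '...read more...</a></td>'
)

td = ET.fromstring(HTML)

# itertext() walks the element and all of its descendants in document
# order, yielding each text node, much like the XPath .//text().
text_nodes = list(td.itertext())

# Joining with an empty string keeps the original whitespace and newlines.
joined = "".join(text_nodes)
print(repr(joined))
```

Note that the joined string still contains the raw newlines; collapsing them is a separate step.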

I forgot the simple way: if you want all the text content of a specific node, you can use normalize-space() on it directly:

paul@wheezy:~$ ipython
Python 2.7.3 (default, Jan  2 2013, 13:56:14)
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from scrapy.selector import Selector

In [2]: selector = Selector(text="""<td width="70%">regen real estate, dubai – u.a.e
   ...:
   ...: rera id: 12087
   ...:
   ...: specialist licensed property brokers &amp; consultants
   ...: residential / commercial – buying, selling, r <a href="http://www.justproperty.com/company_view/index/3963">...read more...</a></td>""", type="html")

In [3]: selector.xpath("normalize-space(.//td)")
Out[3]: [<Selector xpath='normalize-space(.//td)' data=u'regen real estate, dubai \u2013 u.a.e rera id'>]

In [4]: selector.xpath("normalize-space(.//td)").extract()
Out[4]: [u'regen real estate, dubai \u2013 u.a.e rera id: 12087 specialist licensed property brokers & consultants residential / commercial \u2013 buying, selling, r ...read more...']

In [5]: [td.xpath("normalize-space(.)").extract() for td in selector.css("td")]
Out[5]: [[u'regen real estate, dubai \u2013 u.a.e rera id: 12087 specialist licensed property brokers & consultants residential / commercial \u2013 buying, selling, r ...read more...']]

Remember that normalize-space() only considers the first node in the node-set you give it as an argument, which is generally what you want if you are sure the argument will match one and only one node.
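To illustrate the first-node caveat, here is a minimal standard-library sketch. It is not the Scrapy API: `normalize_space` is a hypothetical helper that approximates the XPath function, and the two-cell `ROW` is invented for this example. Taking a single match mirrors what happens when normalize-space() receives a node-set, while looping over the cells mirrors the per-node pattern from the session above.

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(value):
    """Approximate XPath normalize-space(): trim the ends and collapse
    runs of space, tab, CR and LF into a single space."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


# Hypothetical row with two cells, made up for this illustration.
ROW = "<tr><td>first\n  cell</td><td>second <a href='#'>cell</a></td></tr>"

row = ET.fromstring(ROW)

# find() returns only the first matching td, similar to how
# normalize-space(.//td) uses only the first node of the node-set.
first_only = normalize_space("".join(row.find(".//td").itertext()))

# Normalizing each td separately handles every cell, similar to
# calling normalize-space(.) on each selected node in turn.
per_cell = [normalize_space("".join(td.itertext())) for td in row.iter("td")]

print(first_only)
print(per_cell)
```

When the expression can match several nodes, looping and normalizing each one is the safer choice.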

