python - Regex pattern to extract tag and its contents


Consider this input:

input = """yesterday<person>peter</person>drove to<location>new york</location>""" 

How can I use a regex pattern to extract:

person: peter
location: new york

This works well, but I don't want to hard-code the tags, since they can change:

import re

print(re.findall("<person>(.*?)</person>", input))
print(re.findall("<location>(.*?)</location>", input))

Use a tool designed for this kind of work. I happen to use lxml, but there are others.

>>> minput = """yesterday<person>peter smith</person>drove to<location>new york</location>"""
>>> from lxml import html
>>> tree = html.fromstring(minput)
>>> for e in tree.iter():
...     print(e, e.tag, e.text_content())
...     if e.tag == 'person':  # e.tag is a string attribute, not a method
...         # getting the last name, per the comment
...         last = e.text_content().split()[-1]
...         print(last)
...
<Element p at 0x3118ca8> p yesterdaypeter smithdrove tonew york
<Element person at 0x3118b48> person peter smith
smith                                            # here is the last name
<Element location at 0x3118ba0> location new york

If you are new to Python, you might want to visit this site, which provides installers for a number of packages, including lxml.
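
If the input stays as simple as the example, a regex with a backreference can also avoid hard-coding the tag names. This is only a minimal sketch, assuming the tags are flat (not nested) and carry no attributes; `extract_tags` and `TAG_PATTERN` are hypothetical names made up for illustration. For anything closer to real HTML or XML, a parser is still the safer choice.

```python
import re

# Assumption: tags are flat (no nesting) and have no attributes.
# (\w+) captures the tag name; \1 requires the closing tag to match it.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_tags(text):
    """Return a list of (tag, contents) pairs found in text."""
    return TAG_PATTERN.findall(text)


sample = "yesterday<person>peter</person>drove to<location>new york</location>"
for tag, contents in extract_tags(sample):
    print("%s: %s" % (tag, contents))
```

Because the pattern has two groups, `re.findall` returns one tuple per match, so adding a new tag to the input needs no change to the code.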

