python - Regex pattern to extract tag and its contents -

August 15, 2012

Consider this input:

input = """yesterday<person>peter</person>drove to<location>new york</location>"""

How can I use regex patterns to extract:

person: peter
location: new york

This works well, but I don't want to hard-code the tags, since they can change:

import re

print(re.findall("<person>(.*?)</person>", input))
print(re.findall("<location>(.*?)</location>", input))
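For reference, one way to avoid hard-coding the tag names while staying with `re` is a backreference: capture the tag name in the opening tag and require the same name in the closing tag. The sketch below assumes simple, non-nested tags without attributes; the names `TAG_PATTERN` and `extract_tags` are illustrative and not from the original post.

```python
import re

text = "yesterday<person>peter</person>drove to<location>new york</location>"

# (\w+) captures the tag name; \1 requires the closing tag to use the same name.
# (.*?) is non-greedy so each match stops at the first matching closing tag.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_tags(s):
    """Return (tag, contents) pairs for simple, non-nested tags (hypothetical helper)."""
    return TAG_PATTERN.findall(s)


for tag, contents in extract_tags(text):
    print("%s: %s" % (tag, contents))
```

This breaks down as soon as tags nest or carry attributes, which is where a real parser becomes the better choice.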

Use a tool designed for this kind of work. I happen to like lxml, but there are others.

>>> minput = """yesterday<person>peter smith</person>drove to<location>new york</location>"""
>>> from lxml import html
>>> tree = html.fromstring(minput)
>>> for e in tree.iter():
...     print(e, e.tag, e.text_content())
...     if e.tag == 'person':  # getting the last name, per the comment
...         last = e.text_content().split()[-1]
...         print(last)
...
<Element p at 0x3118ca8> p yesterdaypeter smithdrove tonew york
<Element person at 0x3118b48> person peter smith
smith                                            # here is the last name
<Element location at 0x3118ba0> location new york

If you are new to Python, you might want to visit this site, which offers installers for a number of packages, including lxml.
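If installing lxml is not an option, roughly the same extraction can be sketched with the standard library's `html.parser` (Python 3). This is a swapped-in technique, not the lxml approach shown above. The class name `TagTextCollector` is made up for illustration, and the sketch assumes tags are not nested.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect (tag, text) pairs for non-nested tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.current = None  # name of the tag we are inside, if any
        self.buffer = []     # text pieces seen inside the current tag
        self.found = []      # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.current = tag
        self.buffer = []

    def handle_data(self, data):
        # Text outside any tag (e.g. "yesterday") is ignored.
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            self.found.append((tag, "".join(self.buffer)))
            self.current = None


minput = "yesterday<person>peter smith</person>drove to<location>new york</location>"
collector = TagTextCollector()
collector.feed(minput)
collector.close()

for tag, text in collector.found:
    print("%s: %s" % (tag, text))
```

As in the lxml session, the last name is then available as `text.split()[-1]` for the `person` entry.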
