python - Regex pattern to extract tag and its contents
Consider this:
input = """yesterday<person>peter</person>drove to<location>new york</location>"""
How can I use regex patterns to extract:
person: peter
location: new york
This works well, but I don't want to hard-code the tags, since they can change:
import re
print(re.findall("<person>(.*?)</person>", input))
print(re.findall("<location>(.*?)</location>", input))
Use a tool designed for this kind of work. I happen to use lxml, but there are others.
>>> minput = """yesterday<person>peter smith</person>drove to<location>new york</location>"""
>>> from lxml import html
>>> tree = html.fromstring(minput)
>>> for e in tree.iter():
...     print(e, e.tag, e.text_content())
...     if e.tag == 'person':  # e.tag is a string attribute, not a method
...         # getting the last name, per the comment
...         last = e.text_content().split()[-1]
...         print(last)
...
<Element p at 0x3118ca8> p yesterdaypeter smithdrove tonew york
<Element person at 0x3118b48> person peter smith
smith  # here is the last name
<Element location at 0x3118ba0> location new york
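If lxml is not installed, the same walk over elements can be sketched with the standard library's html.parser module instead. This is a swapped-in technique, not the lxml approach above, and the names TagCollector and extract_tags are made up for this illustration. It assumes Python 3 and collects only the text sitting directly inside each tag, not text from nested child tags.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect (tag, text) pairs for every tag that is opened and closed."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, each as (name, [text parts])
        self.found = []   # completed (name, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, []))

    def handle_data(self, data):
        # Text outside any tag (e.g. "yesterday") is ignored.
        if self.stack:
            self.stack[-1][1].append(data)

    def handle_endtag(self, tag):
        # Only pair a closing tag with the innermost open tag of the same name.
        if self.stack and self.stack[-1][0] == tag:
            name, parts = self.stack.pop()
            self.found.append((name, "".join(parts)))


def extract_tags(text):
    """Return a list of (tag, text) pairs without hard-coding tag names."""
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return parser.found


minput = "yesterday<person>peter smith</person>drove to<location>new york</location>"
for tag, content in extract_tags(minput):
    print("%s: %s" % (tag, content))
```

For real-world HTML with nesting or broken markup, lxml remains the more forgiving choice.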
If you are new to Python, you might want to visit this site, which offers installers for a number of packages, including lxml.
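For completeness: if the markup really stays as flat as in the question (no nested tags, no attributes), a regex with a backreference also avoids hard-coding the tag names. This is only a sketch under that assumption, and the names TAG_PATTERN and extract_pairs are made up for the illustration. The group `(\w+)` captures the tag name, and `\1` requires the closing tag to match it.

```python
import re

# (\w+) captures the tag name; \1 forces the closing tag to use the same name.
# .*? is non-greedy so each match stops at the first matching closing tag.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_pairs(text):
    """Return a list of (tag, content) tuples for flat, attribute-free tags."""
    return TAG_PATTERN.findall(text)


text = "yesterday<person>peter</person>drove to<location>new york</location>"
for tag, content in extract_pairs(text):
    print("%s: %s" % (tag, content))
# prints:
# person: peter
# location: new york
```

As soon as tags carry attributes or nest inside each other, this pattern stops being reliable and a parser is the safer route.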