xml data handling regex find replace with conditional values -
i got xml file looks this
<documentelement> <table1> <date>2013-08-24</date> <time>00:07:23</time> <type>in</type> <number>393483419761</number> <name>marc</name> <message>lorem ipsum</message> </table1> <table1> <date>2013-08-24</date> <time>00:09:09</time> <type>out</type> <number>1215468498561</number> <name>marc</name> <message>lorem ipsum</message> </table1> <documentelement>
what want check date value , if month 01, add <month>january</month>
after </date>
, , if month 02 add <month>february</month>
, on. got far either:
<date>(\d{4})-01-(\d{2})</date> <date>$1-01-$2</date> <month>january</month>
or i'd like:
<date>(\d{4})-(\d{2})-(\d{2})</date> if ($2 = 01) { <date>$1-$2-$3</date> <month>january</month> } elseif ($2 = 02) { <date>$1-$2-$3</date> <month>february</month> }
whats usual way handle , manipulate data this?
normally if parsing xml use real parser instead of regular expressions. in particular case simple operation want do. go on each line, print it, , if current line date, extract month , print additional line.
here's example python script that.
import re months = ["january", "february", "march", "april", "may", "june", "july", "august", "september", "october", "november", "december"] open(your_xml_file) f: line in f: print line match = re.search(r'<date>\d{4}-(?p<month>\d{2})-\d{2}</date>', line) if match not none: print months[int(match.group('month')) - 1]
note, fail insert whitespace or add else attributes date. that's why it's better use real parser. if format stated, faster write small throw away script this.
Comments
Post a Comment