Python - parsing space-delimited, named fields
I have data in a specific format (exported from Splunk) that is a mixture of CSV and named fields. I would like to understand whether it is possible in Python to parse such data via a template (or a simplified regex that an average human can understand).

    "harry potter", "book", "12 mar 2014 note=""good"" language=""english"""
    "forrest gump", "movie", "14 march 2015 note=""good"" language=""aztec"""
As you can see, the first fields are comma separated, and then comes one long string that starts with a date and has a few named fields (note, language).
I would like to build a list of dicts containing solely the named fields:
    [{'note': 'good', 'language': 'english'}, {'note': 'good', 'language': 'aztec'}]
After parsing the CSV I end up with the last field (e.g. "12 mar 2014 note=""good"" language=""english""" for the first line), and there I am stuck. The only solution I can think of is to try to describe the line in a regex (which is scary :). And if I managed to extract tuples, how would I translate them into a dict?
The csv module handles the outer and doubled quoting for you, out of the box. Your columns have outer quotes (making sure that delimiters, quotes, and newlines in values are preserved), and quotes inside values are doubled; csv.reader() will remove the outer quotes and return strings with single quotes for the 3rd column.
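To make the quoting behaviour concrete, here is a minimal sketch that feeds one line from the question to csv.reader(). The variable names line and row are only illustrative, and skipinitialspace=True is used in the same way as in the code further down.

```python
import csv

# One line in the exported format shown in the question.
line = '"harry potter", "book", "12 mar 2014 note=""good"" language=""english"""'

# csv.reader() accepts any iterable of lines, so a one-element list is enough.
row = next(csv.reader([line], skipinitialspace=True))

# The outer quotes are gone and each doubled quote is collapsed to one.
print(row[2])  # 12 mar 2014 note="good" language="english"
```

The third column now contains plain key="value" pairs, which are much easier to match than the raw, doubly quoted text.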
The named fields can then be handled with a regular expression:
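    import csv
    import re

    # Matches key="value" pairs; the key contains no quotes, equals signs, or spaces
    keyvalue = re.compile(r'([^"= ]+)="([^"]+)"')

    # filename is the path to the exported file.
    # newline='' is what the csv module expects in Python 3 (Python 2 used mode 'rb').
    with open(filename, newline='') as infh:
        reader = csv.reader(infh, skipinitialspace=True)
        # Column index 2 holds the date followed by the named fields
        namedfields = [dict(keyvalue.findall(row[2])) for row in reader]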
The skipinitialspace option removes the spaces after the delimiter; this is needed to ensure that the spaces before the quoted column values are removed, which in turn ensures that the quoting is handled correctly.
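The effect of skipinitialspace can be seen by parsing the same line with and without it. This is only a sketch; default_row and skipped_row are illustrative names, not part of the answer's code.

```python
import csv

line = '"harry potter", "book", "12 mar 2014 note=""good"" language=""english"""'

# Default settings: the space after each comma becomes part of the next field,
# so that field no longer starts with a quote and is read as unquoted text.
default_row = next(csv.reader([line]))

# With skipinitialspace=True the space is dropped and the quotes are parsed.
skipped_row = next(csv.reader([line], skipinitialspace=True))

print(repr(default_row[1]))  # quotes and leading space are kept in the value
print(repr(skipped_row[1]))  # 'book'
```

Without the option, the doubled quotes also survive in the third column, so a key="value" pattern would have nothing clean to match against.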
The findall() method of the compiled pattern here returns a list of (key, value) tuples, and the dict() type can turn those directly into a dictionary.
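As a small illustration of that step, the sketch below applies the pattern to a single, already unquoted column value. The names field and pairs are illustrative only.

```python
import re

keyvalue = re.compile(r'([^"= ]+)="([^"]+)"')

# The third column as csv.reader() returns it: doubled quotes already collapsed.
field = '12 mar 2014 note="good" language="english"'

# Two capture groups in the pattern, so findall() yields 2-tuples.
pairs = keyvalue.findall(field)
print(pairs)        # [('note', 'good'), ('language', 'english')]

# dict() accepts any iterable of (key, value) pairs.
print(dict(pairs))  # {'note': 'good', 'language': 'english'}
```

The leading date is skipped because it is never followed by an equals sign and a quoted value. Two caveats that follow from the pattern as written: a field with an empty value such as note="" would not match, since [^"]+ requires at least one character, and if a key appeared twice in one value, dict() would keep the last occurrence.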
Demo:
    >>> import csv
    >>> import re
    >>> keyvalue = re.compile(r'([^"= ]+)="([^"]+)"')
    >>> sample = '''\
    ... "harry potter", "book", "12 mar 2014 note=""good"" language=""english"""
    ... "forrest gump", "movie", "14 march 2015 note=""good"" language=""aztec"""
    ... '''
    >>> reader = csv.reader(sample.splitlines(True), skipinitialspace=True)
    >>> [dict(keyvalue.findall(row[2])) for row in reader]
    [{'note': 'good', 'language': 'english'}, {'note': 'good', 'language': 'aztec'}]