Do not match word boundary between parentheses with Python regex
I have:
regex = r'\bon the\b'
but I need the regex to match only if the keyword (actually "on the") is not between parentheses in the text:
should match:
john is on the beach
let me put this on the fridge
he (my son) is on the beach
arnold is on the road (to home)
should not match:
(my son is )on the beach
john is at the beach
bob is at the pool (berkeley)
the spoon (is on the table)
I don't think a regex alone will help you here in the general case. For your examples, this regex should work the way you want:
(?<=[^\(\)].{3})\bon the\b(?=.{3}[^\(\)])
Description:
(?<=[^\(\)].{3}) positive lookbehind - asserts that the regex below can be matched
    [^\(\)] matches a single character not present in the list below
        \( matches the character ( literally
        \) matches the character ) literally
    .{3} matches any character (except newline)
        quantifier: exactly 3 times
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
on the matches the characters "on the" literally (case sensitive)
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
(?=.{3}[^\(\)]) positive lookahead - asserts that the regex below can be matched
    .{3} matches any character (except newline)
        quantifier: exactly 3 times
    [^\(\)] matches a single character not present in the list below
        \( matches the character ( literally
        \) matches the character ) literally
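As a quick check of what the fixed distances in that pattern imply, here is a minimal sketch that runs it (written with balanced parentheses so that it compiles) over three sentences. The sentences and the names pattern, samples and results are my own illustration, not taken from the question. The third sentence shows the weak spot: a parenthesis directly next to the keyword is not seen, because the lookbehind only inspects the character four positions back.

```python
import re

# The pattern described above, with balanced parentheses.
pattern = re.compile(r'(?<=[^\(\)].{3})\bon the\b(?=.{3}[^\(\)])')

# Illustrative sentences (assumptions, not from the original post).
samples = [
    'john is on the beach',         # keyword outside parentheses
    'the spoon (is on the table)',  # "(" sits exactly 4 characters before "on"
    'it is (on the way)',           # "(" sits directly before "on"
]

# True where the pattern finds a match somewhere in the sentence.
results = {s: bool(pattern.search(s)) for s in samples}

for sentence, matched in results.items():
    print(matched, sentence)
```

The last sentence is matched even though the keyword is inside parentheses, which is the kind of case the fixed widths cannot cover.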
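If you want to generalize the problem to any string between the parentheses and any string you are searching for, this regex will not work. The issue is the length of the string between the parentheses and the length of your string. In Python's re module a lookbehind must have a fixed width, so indefinite quantifiers such as * or + are not allowed inside it.
In this regex I used a positive lookahead and a positive lookbehind. The same result can be achieved with negative ones, but the issue remains.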
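The fixed-width restriction is easy to confirm with the standard re module. This is only a small sketch; the helper name compiles is hypothetical and the two patterns are illustrative.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A lookbehind with a fixed width (exactly 4 characters) is accepted.
fixed_width = compiles(r'(?<=\(.{3})on the')

# A lookbehind containing an indefinite quantifier is rejected by re.
variable_width = compiles(r'(?<=\(.*)on the')

print(fixed_width, variable_width)
```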
Suggestion: write a small piece of Python code that checks a whole line for the text when it is not between parentheses, since a regex alone can't do the job.
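Example:
    import re

    mystr = 'on the'
    data = '\n'.join(mylist)  # assumption: data is the text of all lines in mylist

    # Here you put the unwanted string series, which is easy to define with a
    # regex: mystr inside parentheses, or mystr directly after a ")".
    unwanted = re.findall(r'\(.*' + re.escape(mystr) + r'.*\)|\)' + re.escape(mystr), data)

    # Delete the lines containing unwanted strings. Build a new list instead
    # of removing items from mylist while iterating over it.
    mylist = [line for line in mylist if not any(item in line for item in unwanted)]

    # Look for what you want.
    for line in mylist:
        if mystr in line:
            print(line)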
where:
    mylist: a list that contains the lines you want to search through.
    mystr: the string you want to find.
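A different way to do the same kind of check, given here only as a rough sketch and not as part of the code above, is to track the parenthesis depth of every character and keep only the matches found at depth 0. The function name matches_outside_parens is hypothetical. It assumes the keyword starts and ends with word characters (so \b makes sense), and it treats text directly after a ")" as outside the parentheses, which differs from the "(my son is )on the beach" expectation in the question.

```python
import re

def matches_outside_parens(text, keyword):
    """Return start offsets of whole-word keyword matches at parenthesis depth 0."""
    # depth_at[i] is the nesting depth that applies to character i.
    depth_at = []
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1              # the "(" itself counts as inside
        depth_at.append(depth)
        if ch == ')' and depth > 0:
            depth -= 1              # characters after ")" are one level out
    pattern = re.compile(r'\b' + re.escape(keyword) + r'\b')
    return [m.start() for m in pattern.finditer(text)
            if depth_at[m.start()] == 0]

print(matches_outside_parens('he (my son) is on the beach', 'on the'))
print(matches_outside_parens('the spoon (is on the table)', 'on the'))
```

Unlike the fixed-width lookbehind, this does not depend on how long the text between the parentheses is.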
Hope this helps.