Finding sentences that contain one of an array of keywords using Python
I'm using Python 2.7.
I want to go through a .txt file and keep only the sentences that contain one or more of a list of keywords.
After that I want to go through the remaining text once more with another list of keywords and repeat the process.
The result I want to save in a .txt file; the rest can be deleted.
I'm new to Python (but loving it!), so don't worry about hurting my feelings: you're free to assume little knowledge on my side and dumb it down a bit :)
This is what I have so far:
    import re

    f = open('c:\\python27\\test\\a.txt')
    text = f.read()
    define_words = 'contractual'
    print re.findall(r"([^.]*?%s[^.]*\.)" % define_words, text)

This works in so far as it filters out any sentence with 'contractual' in it. If I put 'contractual obligation' in there, it only filters out sentences that have those two words next to each other.
What I'm stuck at is how to change this so it takes an array of words that are considered separately from each other: 'contractual', 'obligation', 'law', 'employer', etc.
Edit regarding applepi's answer:
I've done some testing with a small test text:
"the quick brown fox jumps over the lazy dog.
new line.
yet another nice new line."
I only get the sentence if I put two words that are both in that sentence into the list, e.g. ['quick', 'brown']:
output: ['t', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ' ', 'b', 'r', 'o', 'w', 'n', ' ', 'f', 'o', 'x', 'y', ' ', 'j', 'u', 'm', 'p', 's', ' ', 'o', 'v', 'e', 'r', ' ', 't', 'h', 'e', ' ', 'l', 'a', 'z', 'y', ' ', 'd', 'o', 'g', '.']
So ['quick', 'another'] comes up with nothing.
['yet', 'another'] comes up with:
output: [' ', '\n', '\n', 'y', 'e', 't', ' ', 'a', 'n', 'o', 't', 'h', 'e', 'r', ' ', 'n', 'i', 'c', 'e', ' ', 'n', 'e', 'w', ' ', 'l', 'i', 'n', 'e', '.']
Why not use a list comprehension?

    print [sent for sent in text.split('.') if any(word in sent for word in define_words.split())]

Or, if you change define_words to a list of strings:

    # define_words = ['contractual', 'obligations']
    define_words = 'contractual obligations'.split()
    print [sent for sent in text.split('.') if any(word in sent for word in define_words)]
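To cover the rest of the question (a second pass with another keyword list, then saving the result), here is a rough sketch that builds on the same idea. It rests on a few assumptions that are not in the question: the function names are made up for illustration, "the remaining text" is taken to mean the sentences kept by the previous pass, sentences are split naively on full stops just like text.split('.'), and keywords are matched as whole words regardless of case (so 'law' does not match 'lawful'). It avoids the print statement so it should run on Python 2.7 as well as Python 3.

```python
import re


def split_sentences(text):
    """Naive splitter: cut on full stops, trim whitespace, put the stop back."""
    return [part.strip() + '.' for part in text.split('.') if part.strip()]


def keep_matching(sentences, keywords):
    """Keep the sentences that contain at least one keyword as a whole word."""
    # re.escape guards against keywords containing regex metacharacters.
    alternatives = '|'.join(re.escape(word) for word in keywords)
    pattern = re.compile(r'\b(?:%s)\b' % alternatives, re.IGNORECASE)
    return [sent for sent in sentences if pattern.search(sent)]


def filter_text(text, keyword_lists):
    """Apply each keyword list in turn to whatever the previous pass kept."""
    sentences = split_sentences(text)
    for keywords in keyword_lists:
        sentences = keep_matching(sentences, keywords)
    return sentences


def filter_file(in_path, out_path, keyword_lists):
    """Read in_path, filter it, and write one kept sentence per line to out_path."""
    with open(in_path) as source:
        kept = filter_text(source.read(), keyword_lists)
    with open(out_path, 'w') as target:
        target.write('\n'.join(kept))
    return kept
```

It could be called as filter_file('a.txt', 'result.txt', [['contractual', 'obligation'], ['law', 'employer']]), where both file names are placeholders. If you would rather keep the plain substring test from the list comprehension above, replace the body of keep_matching with that comprehension.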