hadoop - How to send multi-line text as a key/value in MapReduce, not a single line
I'm studying MapReduce and have run into a problem.
My data looks like this (as an example):
<doc>
<id> no-001 </id>
<value> 001 value. </value>
</doc>
<doc>
<id> no-002 </id>
<value> 002 value. </value>
</doc>
...
I need to change the above text into this:
001 value. no-001
002 value. no-002
...
I want to send the multiple lines between <doc> and </doc> as the value to the mapper in MapReduce. The key can be anything. I have searched for examples, but I can't solve the problem.
To solve it, I think I must handle the InputFormat.
Please help me with this problem.
You should use Mahout's XmlInputFormat class for parsing XML files. It allows you to configure your driver code like this:
conf.set("xmlinput.start", "<doc>");
conf.set("xmlinput.end", "</doc>");
job.setInputFormatClass(XmlInputFormat.class);
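To picture what those two settings ask for, here is a minimal plain-Java sketch of cutting a text into records that each run from a start tag to the matching end tag. It is only an illustration under assumptions: `TagRecordSplitter` and `split` are hypothetical names, not part of Hadoop or Mahout, and the real XmlInputFormat works on file splits and streams rather than on an in-memory string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop or Mahout.
class TagRecordSplitter {

    // Returns every substring that starts with startTag and ends with the next endTag.
    static List<String> split(String text, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no further record
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: ignore it
            }
            end += endTag.length();
            records.add(text.substring(start, end)); // one multi-line record
            from = end;
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "<doc>\n<id> no-001 </id>\n<value> 001 value. </value>\n</doc>\n"
                     + "<doc>\n<id> no-002 </id>\n<value> 002 value. </value>\n</doc>\n";
        for (String record : split(input, "<doc>", "</doc>")) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

Each element of the returned list corresponds to what one map() call would receive as its value, that is, a whole <doc> ... </doc> block rather than a single line.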
And inside your mapper you may process the XML content with any parser you like. There is also a tutorial on the XmlInputFormat class.
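As a rough sketch of that parsing step, the snippet below turns one <doc> record into the "value id" line asked for in the question, using the DOM parser from the Java standard library. `DocRecordParser` and `toLine` are hypothetical names, and the sketch assumes each record contains exactly one <id> and one <value> element; in a real mapper you would presumably pass it the text of the incoming value and write the result to the context.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; the mapper itself is not shown.
class DocRecordParser {

    // Parses one <doc>...</doc> record and returns "<value text> <id text>".
    static String toLine(String record) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(record)));
            // Assumes exactly one <id> and one <value> element per record.
            String id = doc.getElementsByTagName("id").item(0).getTextContent().trim();
            String value = doc.getElementsByTagName("value").item(0).getTextContent().trim();
            return value + " " + id;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse record: " + record, e);
        }
    }

    public static void main(String[] args) {
        String record = "<doc>\n<id> no-001 </id>\n<value> 001 value. </value>\n</doc>";
        System.out.println(toLine(record)); // prints: 001 value. no-001
    }
}
```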