hadoop - How to send multi-line text as a key/value pair in MapReduce instead of a single line


I'm studying MapReduce and have run into a problem.

My data looks like this (as an example):

<doc>
<id> no-001 </id>
<value> 001 value. </value>
</doc>
<doc>
<id> no-002 </id>
<value> 002 value. </value>
</doc>
...

I need to change the text above into this:

001 value. no-001
002 value. no-002
...

I want to send the multiple lines between <doc> and </doc> to the mapper as its value in MapReduce. The key can be anything. I have searched for examples, but I can't solve the problem.

To solve this, I think I have to customize the InputFormat.

Please help me with this problem.

You should use Mahout's XmlInputFormat class for parsing XML files. It lets you configure your driver code like this:

// Every block from <doc> to </doc> becomes one input record.
conf.set("xmlinput.start", "<doc>");
conf.set("xmlinput.end", "</doc>");
job.setInputFormatClass(XmlInputFormat.class);
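To make the effect of the start and end tags concrete, here is a minimal plain-Java sketch of the idea: scan the input for the start tag, then for the matching end tag, and treat everything in between (tags included) as one record. This is only an illustration and not Mahout's actual implementation; the class name TagRecordSplitter and its split method are hypothetical and do not exist in Hadoop or Mahout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop or Mahout.
public class TagRecordSplitter {

    /** Returns every substring that begins with startTag and ends with endTag. */
    public static List<String> split(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(startTag, from);
            if (start < 0) {
                break; // no further record
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record, ignore it
            }
            int stop = end + endTag.length();
            records.add(input.substring(start, stop)); // may span several lines
            from = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "<doc>\n<id> no-001 </id>\n<value> 001 value. </value>\n</doc>\n"
                    + "<doc>\n<id> no-002 </id>\n<value> 002 value. </value>\n</doc>\n";
        for (String record : split(data, "<doc>", "</doc>")) {
            System.out.println("--- one record ---");
            System.out.println(record);
        }
    }
}
```

With the sample data from the question, this yields two records, each containing the full multi-line <doc> block. That is the shape of value a mapper would be expected to receive once the input format is configured with these two tags.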

Inside your mapper you can then process the XML content with whatever parser you like. There is a tutorial on the XmlInputFormat class.
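As a sketch of the parsing step, the helper below turns one <doc> block into the "value id" line the question asks for, using the DOM parser from the Java standard library. The class name DocRecordParser and the method toOutputLine are hypothetical names chosen for this example, and it assumes each record is a well-formed <doc> element with one <id> and one <value> child.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes one well-formed <doc> element per record.
public class DocRecordParser {

    /** Parses one <doc> block and returns "<value text> <id text>". */
    public static String toOutputLine(String docXml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(docXml)));

        // Take the first <id> and <value> elements and strip surrounding whitespace.
        String id = doc.getElementsByTagName("id").item(0).getTextContent().trim();
        String value = doc.getElementsByTagName("value").item(0).getTextContent().trim();
        return value + " " + id;
    }

    public static void main(String[] args) throws Exception {
        String record = "<doc>\n<id> no-001 </id>\n<value> 001 value. </value>\n</doc>";
        System.out.println(toOutputLine(record)); // prints "001 value. no-001"
    }
}
```

In a real job, a mapper would presumably call such a helper on value.toString() and write the result to the context. That assumes the input format delivers each <doc> block as the mapper's Text value, which is worth verifying against the XmlInputFormat version you use.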

