java - Mahout random forest classifier example ArrayIndexOutOfBoundsException
While trying to run the random forest example, I encounter a java.lang.ArrayIndexOutOfBoundsException: 100 error. Here, 100 equals the number of trees. The map phase reaches 100% completion, but reduce stays at 0%. I use hadoop-1.2.1 and mahout-distribution-0.7, and I have also tried mahout-distribution-0.9 with the same error.
Has anyone run this example with luck?
Problem found. When Hadoop runs with mapred.job.tracker=local, PartialBuilder cannot determine the number of map tasks from mapred.map.tasks. As a consequence, it computes the number of trees per map task incorrectly.
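To see why a wrong map count can surface as ArrayIndexOutOfBoundsException: 100, here is a simplified, hypothetical simulation. It is not Mahout's actual code. It assumes the builder gives each map task numTrees / assumedMaps trees (partition 0 also takes the remainder) and that every emitted tree is then stored in an array sized by -t. If more map tasks run than the builder assumed, the total exceeds 100 and the write at index 100 fails. The class and method names (TreeAllocationSketch, treesForPartition, collectTrees) are invented for illustration.

```java
/** Hypothetical simulation of per-map tree allocation; not Mahout's actual code. */
class TreeAllocationSketch {

    /** Trees assigned to one partition: an even share, with the remainder given to partition 0. */
    static int treesForPartition(int assumedMaps, int numTrees, int partition) {
        int share = numTrees / assumedMaps;
        if (partition == 0) {
            share += numTrees - share * assumedMaps;
        }
        return share;
    }

    /**
     * Runs 'actualMaps' simulated map tasks, each sized using 'assumedMaps',
     * and stores every produced tree in an array of length numTrees.
     */
    static int[] collectTrees(int assumedMaps, int actualMaps, int numTrees) {
        int[] trees = new int[numTrees];
        int index = 0;
        for (int partition = 0; partition < actualMaps; partition++) {
            int produced = treesForPartition(assumedMaps, numTrees, partition);
            for (int t = 0; t < produced; t++) {
                trees[index] = index; // placeholder for a tree; throws once index reaches numTrees
                index++;
            }
        }
        return trees;
    }

    public static void main(String[] args) {
        // Consistent configuration: 4 assumed maps, 4 actual maps, exactly 100 trees.
        System.out.println(collectTrees(4, 4, 100).length); // prints 100
        try {
            // Mismatch: the builder assumes 1 map task, but 2 actually run (200 trees).
            collectTrees(1, 2, 100);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

On Java 8 and earlier the exception message is just the offending index ("100"), which matches the question; newer JVMs print "Index 100 out of bounds for length 100".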
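Solution: don't use the parameter "-p" when running the random forest job on a local Hadoop. Without -p, BuildForest uses the in-memory implementation instead of PartialBuilder, and the job completes.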
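If the job is launched from a script or wrapper, the rule can be applied mechanically. The sketch below is a hypothetical helper (BuildForestArgs is not part of Mahout) that assembles the BuildForest options used in this thread and drops -p whenever mapred.job.tracker is "local", which is the Hadoop 1.x default. It assumes the caller already knows the job tracker value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles BuildForest arguments and omits -p in local mode. */
class BuildForestArgs {

    /**
     * @param jobTracker value of mapred.job.tracker ("local" is the Hadoop 1.x default)
     * @param partial    whether the partial implementation (-p) was requested
     */
    static List<String> build(String jobTracker, boolean partial,
                              String data, String dataset, int selection,
                              int trees, String output) {
        List<String> args = new ArrayList<String>(Arrays.asList(
                "-d", data,
                "-ds", dataset,
                "-sl", Integer.toString(selection),
                "-t", Integer.toString(trees),
                "-o", output));
        // Omit -p in local mode, where PartialBuilder cannot determine the number of map tasks.
        if (partial && !"local".equals(jobTracker)) {
            args.add("-p");
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(build("local", true, "testdata/KDDTrain+.arff",
                "testdata/KDDTrain+.info", 5, 100, "nsl-forest"));
    }
}
```

The resulting options can be appended to the usual `hadoop jar ... org.apache.mahout.classifier.df.mapreduce.BuildForest` command line.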
Details:

```
windiana@host:~/mahout/data/> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -t 100 -o nsl-forest
Warning: $HADOOP_HOME is deprecated.

14/08/07 11:25:18 INFO mapreduce.BuildForest: InMem Mapred implementation
14/08/07 11:25:18 INFO mapreduce.BuildForest: Building the forest...
14/08/07 11:25:18 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.info in /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata-work-5026960219142699303 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.arff in /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata-work-5750487161401524172 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO mapred.JobClient: Running job: job_local966281240_0001
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Waiting for map tasks
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Starting task: attempt_local966281240_0001_m_000000_0
14/08/07 11:25:19 INFO util.ProcessTree: setsid exited with exit code 0
14/08/07 11:25:19 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2df8fdda
14/08/07 11:25:19 INFO mapred.MapTask: Processing split: [firstId:0, nbTrees:100, seed:null]
14/08/07 11:25:19 INFO inmem.InMemMapper: Loading data...
14/08/07 11:25:20 INFO mapred.JobClient:  map 0% reduce 0%
14/08/07 11:25:21 INFO inmem.InMemMapper: Data loaded : 125973 instances
14/08/07 11:25:25 INFO mapred.LocalJobRunner:
14/08/07 11:25:26 INFO mapred.JobClient:  map 1% reduce 0%
...
14/08/07 11:27:59 INFO mapred.JobClient:  map 98% reduce 0%
14/08/07 11:28:00 INFO mapred.Task: Task:attempt_local966281240_0001_m_000000_0 is done. And is in the process of commiting
14/08/07 11:28:00 INFO mapred.LocalJobRunner:
14/08/07 11:28:00 INFO mapred.Task: Task attempt_local966281240_0001_m_000000_0 is allowed to commit now
14/08/07 11:28:00 INFO output.FileOutputCommitter: Saved output of task 'attempt_local966281240_0001_m_000000_0' to file:/home/martin/programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapred.LocalJobRunner:
14/08/07 11:28:00 INFO mapred.Task: Task 'attempt_local966281240_0001_m_000000_0' done.
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local966281240_0001_m_000000_0
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Map task executor complete.
14/08/07 11:28:00 INFO mapred.JobClient:  map 99% reduce 0%
14/08/07 11:28:00 INFO mapred.JobClient: Job complete: job_local966281240_0001
14/08/07 11:28:00 INFO mapred.JobClient: Counters: 12
14/08/07 11:28:00 INFO mapred.JobClient:   File Output Format Counters
14/08/07 11:28:00 INFO mapred.JobClient:     Bytes Written=2353226
14/08/07 11:28:00 INFO mapred.JobClient:   File Input Format Counters
14/08/07 11:28:00 INFO mapred.JobClient:     Bytes Read=0
14/08/07 11:28:00 INFO mapred.JobClient:   FileSystemCounters
14/08/07 11:28:00 INFO mapred.JobClient:     FILE_BYTES_READ=61962918
14/08/07 11:28:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45667235
14/08/07 11:28:00 INFO mapred.JobClient:   Map-Reduce Framework
14/08/07 11:28:00 INFO mapred.JobClient:     Map input records=100
14/08/07 11:28:00 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient:     Spilled Records=0
14/08/07 11:28:00 INFO mapred.JobClient:     Total committed heap usage (bytes)=132120576
14/08/07 11:28:00 INFO mapred.JobClient:     CPU time spent (ms)=0
14/08/07 11:28:00 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient:     SPLIT_RAW_BYTES=90
14/08/07 11:28:00 INFO mapred.JobClient:     Map output records=100
14/08/07 11:28:00 INFO common.HadoopUtil: Deleting file:/home/martin/programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapreduce.BuildForest: Build Time: 0h 2m 41s 702
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest num Nodes: 130056
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean num Nodes: 1300
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean max Depth: 19
14/08/07 11:28:00 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq
```
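In this successful in-memory run, the "Map output records" counter equals the 100 trees requested with -t. A hedged way to spot the mismatch described in the answer is to compare that counter with -t after a run. The sketch assumes that each map output record corresponds to one tree (as these counters suggest) and that the counter appears as `Map output records=N`. ForestLogCheck and its methods are hypothetical names, and the regex is case-insensitive so it also matches lowercased copies of the log.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log check: compares the "Map output records" counter with the -t value. */
class ForestLogCheck {

    // Case-insensitive so it also matches lowercased copies of the log.
    private static final Pattern OUTPUT_RECORDS =
            Pattern.compile("Map output records=(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Returns the last "Map output records" value found, or -1 if the counter is absent. */
    static long mapOutputRecords(List<String> logLines) {
        long value = -1;
        for (String line : logLines) {
            Matcher m = OUTPUT_RECORDS.matcher(line);
            if (m.find()) {
                value = Long.parseLong(m.group(1));
            }
        }
        return value;
    }

    /** True when the job emitted exactly as many records as trees requested with -t. */
    static boolean treeCountMatches(List<String> logLines, int requestedTrees) {
        return mapOutputRecords(logLines) == requestedTrees;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "14/08/07 11:28:00 INFO mapred.JobClient:     Map input records=100",
                "14/08/07 11:28:00 INFO mapred.JobClient:     Map output records=100");
        System.out.println(treeCountMatches(log, 100)); // prints true
    }
}
```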