tokenize - Java: StringTokenizer does not respect separator
I have the following code, which extracts tab-separated strings into a list of strings:
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.StringTokenizer;

    public static List<String> getContents(File aFile, String separator) {
        // Strings, split based on the separator
        List<String> contentList = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(Util.getContents(aFile), separator);
        while (tokenizer.hasMoreTokens()) {
            contentList.add(tokenizer.nextToken());
        }
        return contentList;
    }
The separator in this case is therefore "\t".
As long as 2 strings are separated by 1 tab, this works great. However, the dataset sometimes has 2 strings separated by 2 tabs. That means 1 parameter is missing, and an empty string should be added to the list. The method ignores this and returns a list with 1 string less.
In this particular case, I want an array of 5 strings back. That means a text containing 4 tabs and no other text should return an array of 5 empty strings (this is needed because the parsing job is based on it). Unfortunately, I have no control over the content, and I am working with millions of files that are generated outside of my control.
Is there a better way than StringTokenizer? Or do I have to implement this on my own?
Here are some examples:
    String ok = "a\tb\tc\td\te";
    String nok = "a\tb\tc\t\te";
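A minimal sketch of the behaviour described above, run against the two example strings. The class name TokenizerDemo and the helper tokenize are made up for illustration; only java.util.StringTokenizer itself comes from the question. StringTokenizer treats a run of consecutive delimiters as a single gap, so the empty field between the two tabs never shows up as a token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Hypothetical helper: collects every token StringTokenizer reports.
    static List<String> tokenize(String text, String separator) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, separator);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String ok = "a\tb\tc\td\te";
        String nok = "a\tb\tc\t\te";
        System.out.println(tokenize(ok, "\t").size());  // prints 5
        System.out.println(tokenize(nok, "\t").size()); // prints 4, the empty field is skipped
    }
}
```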
Ralf
I found this: How to split a string in Java
and I can use
"mystring".split("\t", -1);
to obtain the empty strings when multiple separators cluster in one place.
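As a rough sketch of how this could replace the tokenizer loop: the class name SplitDemo and the helper splitKeepingEmpty are hypothetical, and the use of Pattern.quote is an assumption on my part, added because String.split interprets its first argument as a regular expression and a separator such as "|" or "." would otherwise be misread. The negative limit is what keeps the empty strings, including trailing ones; without it, split drops trailing empty strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitDemo {
    // Hypothetical helper: splits on the literal separator and keeps all empty fields.
    static List<String> splitKeepingEmpty(String text, String separator) {
        // Pattern.quote makes the separator literal; the limit -1 preserves trailing empty strings.
        return Arrays.asList(text.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        // Two adjacent tabs yield an empty string in position 3.
        System.out.println(splitKeepingEmpty("a\tb\tc\t\te", "\t").size()); // prints 5
        // Four tabs and no text yield five empty strings.
        System.out.println(splitKeepingEmpty("\t\t\t\t", "\t").size());     // prints 5
        // Without the negative limit, all trailing empty strings are removed.
        System.out.println("\t\t\t\t".split("\t").length);                  // prints 0
    }
}
```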
Thanks anyway!