reshape - Data set containing columns of unequal length to long form in R
Edited to clarify: the NAs should be removed in the final data frame. I added the NAs upon import only to avoid dealing with blanks; they have no significance beyond that.
I have a data set (a CSV file) consisting of several columns of character vectors, each of a different length. I would like to combine them into long form (I believe "long form" is the correct term in this case, but please correct me if I am wrong). Below is a simple example to illustrate what I want.
When I imported the data, I filled the missing spaces with NA to avoid dealing with blanks, which have caused me problems in the past. The following code simulates how the data look upon import after filling in the NAs:
set1 <- c("a", "f", "r", "g", NA, NA, NA, NA)
set2 <- c("g", "q", "u", "i", "g", "d", "k", "b")
set3 <- c("v", "s", "m", "j", "k", "l", NA, NA)
dat <- data.frame(set1, set2, set3)
This gives the following R console output:
  set1 set2 set3
1    a    g    v
2    f    q    s
3    r    u    m
4    g    i    j
5 <NA>    g    k
6 <NA>    d    l
7 <NA>    k <NA>
8 <NA>    b <NA>
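If the data come from a CSV file, base R's read.csv() can do the NA filling itself: its na.strings argument lists the strings to be read as NA, and passing na.strings = "" turns empty fields into NA on import. A minimal sketch, assuming the file looks like the hypothetical csv_text below (the real file was not shown):

```r
# Hypothetical CSV content standing in for the real file: the shorter
# columns leave empty trailing fields.
csv_text <- "set1,set2,set3
a,g,v
f,q,s
r,u,m
g,i,j
,g,k
,d,l
,k,
,b,"

# na.strings = "" makes read.csv() read empty fields as NA,
# so no manual filling is needed after import.
dat_csv <- read.csv(text = csv_text, na.strings = "")
print(dat_csv)
```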
I would like the data to appear in a two-column format with the NAs removed. The first column should contain the number of the column each letter appears in; the second column should contain each of the columns stacked on top of each other. I believe this is called long form, but I may be mistaken. Like this:
   col char
1    1    a
2    1    f
3    1    r
4    1    g
5    2    g
6    2    q
7    2    u
8    2    i
9    2    g
10   2    d
11   2    k
12   2    b
13   3    v
14   3    s
15   3    m
16   3    j
17   3    k
18   3    l
I have managed to make this work with a combination of the stack function, removing the NAs, and a bit of code that counts the number of occurrences and puts them in the first column. This seems overly cumbersome, and I would like to know if there is a better way, or a better way to handle the kind of data I have to deal with. A data frame does not seem the best fit, since the columns are of different lengths, but I do not know of a suitable alternative.
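On the question of a better container: a named list holds vectors of different lengths without any padding, and the two-column form then follows from rep() and unlist(). A minimal base-R sketch using the example letters (the names sets and long are only illustrative):

```r
# Keep each set as its own character vector in a named list,
# so no NA padding is needed.
sets <- list(
  set1 = c("a", "f", "r", "g"),
  set2 = c("g", "q", "u", "i", "g", "d", "k", "b"),
  set3 = c("v", "s", "m", "j", "k", "l")
)

# lengths() gives each vector's length; rep() repeats each column number
# that many times, and unlist() stacks the vectors in the same order.
long <- data.frame(
  col  = rep(seq_along(sets), lengths(sets)),
  char = unlist(sets, use.names = FALSE)
)
print(long)
```

If the data are already in the NA-padded data frame, a list of the same shape can be obtained with lapply(dat, function(x) x[!is.na(x)]).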
The reason I need the data in this format is so that I can plot it in ggplot2. There are corresponding numerical values for each letter, which I left out of the example above for simplicity. The final result for my actual dataset will be a dot plot with the column number on the x axis, the numerical value on the y axis, and the points color coded by the character vectors.
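For reference, the plot described above can be sketched in ggplot2 once the data are in long form. This assumes ggplot2 is installed, and the value column is entirely hypothetical, since the real numbers were left out of the question:

```r
library(ggplot2)

# Long-form data in the col/char layout shown above; "value" holds
# hypothetical placeholder numbers, not the real data.
plot_dat <- data.frame(
  col   = rep(1:3, c(4, 8, 6)),
  char  = c("a", "f", "r", "g", "g", "q", "u", "i", "g", "d", "k", "b",
            "v", "s", "m", "j", "k", "l"),
  value = c(2.1, 3.4, 1.8, 4.0, 2.7, 3.9, 1.2, 2.2, 3.3, 4.5, 1.6, 2.9,
            3.1, 1.4, 2.6, 3.8, 4.2, 1.9)
)

# Dot plot: column number on the x axis (as a discrete factor),
# numeric value on the y axis, points coloured by character.
p <- ggplot(plot_dat, aes(x = factor(col), y = value, colour = char)) +
  geom_point() +
  labs(x = "column", y = "value", colour = "character")
print(p)
```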
Thanks for your help.
Here are two approaches that produce the two-column output shown in the question, given dat:
stack
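# convert any factor columns to character, stack the columns, drop the NAs, and turn the column labels into numbers
transform(na.omit(stack(lapply(dat, as.character))), ind = as.numeric(ind))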
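The stack() route returns columns named values and ind rather than col and char, and na.omit() leaves gaps in the row names. If the exact layout from the question is wanted, a small follow-up step can rename, reorder and renumber. This is a sketch; the name res is only illustrative:

```r
# Rebuild dat from the question so this block runs on its own.
set1 <- c("a", "f", "r", "g", NA, NA, NA, NA)
set2 <- c("g", "q", "u", "i", "g", "d", "k", "b")
set3 <- c("v", "s", "m", "j", "k", "l", NA, NA)
dat <- data.frame(set1, set2, set3)

# stack() returns the columns "values" and "ind" (in that order).
res <- transform(na.omit(stack(lapply(dat, as.character))), ind = as.numeric(ind))

# Put the column number first, rename to col/char, and renumber the rows
# (na.omit() keeps the original row names, so they have gaps).
res <- setNames(res[c("ind", "values")], c("col", "char"))
rownames(res) <- NULL
print(res)
```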
reshape
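# reshape all columns to long form, keep the time (column number) and value columns, and drop the NAs
na.omit(reshape(dat, direction = "long", varying = list(names(dat)))[1:2])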
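With varying given as a list of all the column names, reshape() puts the stacked values in a single column named after the first varying column (set1) and records the original column number in time; the id column is dropped by [1:2]. A quick check that the two approaches agree row for row, as a sketch (long_r and long_s are illustrative names):

```r
# Rebuild dat from the question so this block runs on its own.
set1 <- c("a", "f", "r", "g", NA, NA, NA, NA)
set2 <- c("g", "q", "u", "i", "g", "d", "k", "b")
set3 <- c("v", "s", "m", "j", "k", "l", NA, NA)
dat <- data.frame(set1, set2, set3)

# reshape() version: columns "time" (column number) and "set1" (letters).
long_r <- na.omit(reshape(dat, direction = "long", varying = list(names(dat)))[1:2])

# stack() version: columns "values" (letters) and "ind" (column number).
long_s <- transform(na.omit(stack(lapply(dat, as.character))), ind = as.numeric(ind))

# Same column numbers and letters, in the same order.
stopifnot(all(long_r$time == long_s$ind))
stopifnot(all(long_r$set1 == long_s$values))
```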