R - getting from histogram counts to CDF
I have a data frame of values, and each value has a count associated with it, so plotting counts against values gives me a histogram. I have 3 types: a, b, and c.
    value    counts type
        0 139648267    a
        1  34945930    a
        2   5396163    a
        3   1400683    a
        4    485924    a
        5    204631    a
        6     98599    a
        7     53056    a
        8     30929    a
        9     19556    a
       10     12873    a
       11      8780    a
       12      6200    a
       13      4525    a
       14      3267    a
       15      2489    a
       16      1943    a
       17      1588    a
      ...       ...  ...
How can I get the CDF from this?
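To pin down the target: for one type t, the CDF at a value v is the share of that type's total count that falls at or below v. Writing n_t(w) for the count recorded at value w for type t (notation introduced here only for illustration), that is:

```latex
F_t(v) = \frac{\sum_{w \le v} n_t(w)}{\sum_{w} n_t(w)}
```

So once the rows of a type are sorted by value, the numerator is just a running (cumulative) sum of the counts, and the denominator is the type's total count.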
So far my approach is very inefficient. First I write a function that, for each value, sums the counts of all values up to and including it:
    get_cumulative <- function(x) {
      result <- numeric(nrow(x))
      # for each row, sum the counts of all rows whose value is <= this row's value
      for (i in seq_along(result)) {
        result[i] <- sum(x[x$num_groups <= x$num_groups[i], ]$count)
      }
      x$cumulative <- result
      x
    }
Then I wrap it in a ddply call that splits the data by type. This is clearly not the best way, and I'd love suggestions on how to proceed.
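The loop above rescans the whole data frame for every row, so it does quadratic work. Here is a minimal sketch of a loop-free version, assuming the same column names the question's function uses (num_groups for the value, count for the count); get_cumulative_fast is a hypothetical name. It totals the counts per distinct value once, takes a running sum over the sorted distinct values, and looks each row's value up in that result.

```r
# Hypothetical loop-free replacement for get_cumulative(). Assumes the same
# column names as the question's function: num_groups (value) and count.
get_cumulative_fast <- function(x) {
  vals <- sort(unique(x$num_groups))
  # total count at each distinct value (also handles repeated values)
  totals <- tapply(x$count, factor(x$num_groups, levels = vals), sum)
  # running total over the sorted distinct values, looked up for each row
  x$cumulative <- as.numeric(cumsum(totals))[match(x$num_groups, vals)]
  x
}

# Small made-up example, deliberately unsorted and with a repeated value
x <- data.frame(num_groups = c(2, 0, 1, 1), count = c(5, 10, 3, 2))
res <- get_cumulative_fast(x)
print(res)
```

On this input it gives the same cumulative column as the loop version (20, 10, 15, 15), and it still works per type if called on each type's subset.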
You can use ave and cumsum (assuming your data is in df and sorted by value within each type):
    transform(df, cdf = ave(counts, type, FUN = function(x) cumsum(x) / sum(x)))
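Because cumsum only accumulates in row order, the one-liner relies on the rows already being ordered by value within each type. A minimal sketch that sorts first, using made-up numbers in the question's value/counts/type layout (the data here is hypothetical):

```r
# Hypothetical data in the question's layout (value, counts, type),
# deliberately out of order within each type.
df <- data.frame(
  value  = c(2, 0, 1, 1, 0, 2),
  counts = c(5, 10, 5, 4, 6, 2),
  type   = c("a", "a", "a", "b", "b", "b")
)

# ave() + cumsum() needs rows ordered by value within each type, so sort first
df <- df[order(df$type, df$value), ]

# per-type running sum of counts divided by that type's total count
df <- transform(df, cdf = ave(counts, type, FUN = function(x) cumsum(x) / sum(x)))
print(df)
```

For type a the counts 10, 5, 5 become 0.5, 0.75, 1; for type b the counts 6, 4, 2 become 0.5, 0.833..., 1.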
Here is a toy example:
    df <- data.frame(counts = sample(1:100, 10), type = rep(letters[1:2], each = 5))
    transform(df, cdf = ave(counts, type, FUN = function(x) cumsum(x) / sum(x)))
That produces (the exact numbers vary from run to run, since sample() draws at random):
       counts type       cdf
    1      55    a 0.2750000
    2      61    a 0.5800000
    3      27    a 0.7150000
    4      20    a 0.8150000
    5      37    a 1.0000000
    6      45    b 0.1836735
    7      79    b 0.5061224
    8      12    b 0.5551020
    9      63    b 0.8122449
    10     46    b 1.0000000
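A quick way to sanity-check the result is to confirm the two properties any CDF built this way must have: within each type it never decreases, and its last value is 1. This sketch fixes an arbitrary seed (an addition, not part of the original example) so the toy data is reproducible, and uses tapply for the per-type checks.

```r
set.seed(1)  # arbitrary seed, only so the toy data is reproducible
df <- data.frame(counts = sample(1:100, 10), type = rep(letters[1:2], each = 5))
df <- transform(df, cdf = ave(counts, type, FUN = function(x) cumsum(x) / sum(x)))

# Within each type the CDF should never decrease...
nondecreasing <- tapply(df$cdf, df$type, function(p) all(diff(p) >= 0))
# ...and its final value should be 1 (up to floating-point error)
ends_at_one <- tapply(df$cdf, df$type, function(p) isTRUE(all.equal(p[length(p)], 1)))

print(nondecreasing)
print(ends_at_one)
```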