r - Looking for a better way of creating a new data.frame with purchase frequency information from another data.frame -
ok, may come across little convoluted, i'll try , explain mean:
i have data.frame tells me number of products in different categories sold @ different sessions throughout each day of several years, , give true/false column each product in each category tells me if @ least 1 of type of product sold within session.
here's small example:
year month day session number sold @ least 1 of x 2014 1 15 19876 10 true 2014 1 15 25698 10 false 2014 1 15 2156 2 true 2014 1 16 21578 0 false 2014 1 16 1123 12 false 2014 1 16 15469 25 true 2014 1 16 32654 0 false 2014 1 17 12589 8 true 2014 1 18 54269 4 true 2014 1 18 25136 14 true
so, first observation, there 10 items in category sold, , @ least 1 of them product x. there 10 sold in category in next session, none of them product x.
here's dput() above table:
structure(list(year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014), month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), day = c(15, 15, 15, 16, 16, 16, 16, 17, 18, 18), session = c(19876, 25698, 2156, 21578, 1123, 15469, 32654, 12589, 54269, 25136), numbsold = c(10, 10, 2, 0, 12, 25, 0, 8, 4, 14), min1x = structure(c(2l, 1l, 2l, 1l, 1l, 2l, 1l, 2l, 2l, 1l), .label = c("false", "true"), class = "factor")), .names = c("year", "month", "day", "session", "numbsold", "min1x"), row.names = c(na, -10l), class = "data.frame")
so, i'm making data.frames show how % of time x @ least 1 of products purchased within category day of year. i'm looking following output:
date frequency sold 2014-01-15 0.6666667 2014-01-16 0.5000000 2014-01-17 1.0000000 2014-01-18 0.5000000
with extremely limited knowledge of r (i'm learning, slowly), know can done so:
myout<-data.frame(c("1/15/2014","1/16/2014","1/17/2014","1/18/2014"), c(sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==15) /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 15) ,sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==16) /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 16) ,sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==17) /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 17) ,sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==18) /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 18))) names(myout)<-c("date","frequency sold") myout$date<-as.date(myout$date, "%m/%d/%y")
where i'm repeating commands , forcing build data.frame 1 day @ time. of course, doing on course of 2 years , dozens of different products gets quickly.
what can use tell r build data.frame particular product, showing percentage of time product made @ least 1 purchase within category each date found within dataset?
thanks!
this problem can solved using data.table
package (which you'll have install).
first create date column out of year, month, , day columns:
mydf$date = as.date(paste(mydf$year, mydf$month, mydf$day, sep="-"))
then turn data.table
:
library(data.table) mydt = as.data.table(mydf)
then can perform summarizing operation with:
mydt[, list(frequencysold=mean(min1x[numbsold > 0] == "true")), by=date]
Comments
Post a Comment