r - Looking for a better way of creating a new data.frame with purchase frequency information from another data.frame -


ok, may come across little convoluted, i'll try , explain mean:

i have data.frame tells me number of products in different categories sold @ different sessions throughout each day of several years, , give true/false column each product in each category tells me if @ least 1 of type of product sold within session.

here's small example:

    year    month   day session  number sold  @ least 1 of x     2014    1        15 19876         10         true     2014    1        15 25698         10         false     2014    1        15 2156           2         true     2014    1        16 21578          0         false     2014    1        16 1123          12         false     2014    1        16 15469         25         true     2014    1        16 32654          0         false     2014    1        17 12589          8         true     2014    1        18 54269          4         true     2014    1        18 25136         14         true 

so, first observation, there 10 items in category sold, , @ least 1 of them product x. there 10 sold in category in next session, none of them product x.

here's dput() above table:

structure(list(year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014,  2014, 2014, 2014), month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), day = c(15,  15, 15, 16, 16, 16, 16, 17, 18, 18), session = c(19876, 25698,  2156, 21578, 1123, 15469, 32654, 12589, 54269, 25136), numbsold = c(10,  10, 2, 0, 12, 25, 0, 8, 4, 14), min1x = structure(c(2l, 1l, 2l,  1l, 1l, 2l, 1l, 2l, 2l, 1l), .label = c("false", "true"), class = "factor")), .names = c("year",  "month", "day", "session", "numbsold", "min1x"), row.names = c(na,  -10l), class = "data.frame") 

so, i'm making data.frames show how % of time x @ least 1 of products purchased within category day of year. i'm looking following output:

      date      frequency sold     2014-01-15  0.6666667     2014-01-16  0.5000000     2014-01-17  1.0000000     2014-01-18  0.5000000 

with extremely limited knowledge of r (i'm learning, slowly), know can done so:

  myout<-data.frame(c("1/15/2014","1/16/2014","1/17/2014","1/18/2014"),               c(sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==15)                 /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 15)               ,sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==16)                  /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 16)               ,sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==17)                  /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 17)               ,sum(mydf$min1x=="true" & mydf$year==2014 & mydf$month==1 & mydf$day==18)                  /sum(mydf$numbsold > 0 & mydf$year==2014 & mydf$month==1 & mydf$day == 18))) names(myout)<-c("date","frequency sold") myout$date<-as.date(myout$date, "%m/%d/%y")   

where i'm repeating commands , forcing build data.frame 1 day @ time. of course, doing on course of 2 years , dozens of different products gets quickly.

what can use tell r build data.frame particular product, showing percentage of time product made @ least 1 purchase within category each date found within dataset?

thanks!

this problem can solved using data.table package (which you'll have install).

first create date column out of year, month, , day columns:

mydf$date = as.date(paste(mydf$year, mydf$month, mydf$day, sep="-")) 

then turn data.table:

library(data.table) mydt = as.data.table(mydf) 

then can perform summarizing operation with:

mydt[, list(frequencysold=mean(min1x[numbsold > 0] == "true")), by=date] 

Comments

Popular posts from this blog

android - Get AccessToken using signpost OAuth without opening a browser (Two legged Oauth) -

org.mockito.exceptions.misusing.InvalidUseOfMatchersException: mockito -

google shop client API returns 400 bad request error while adding an item -