lm - Removing a character-level outlier in R
I have a linear model: model1 <- lm(divorce_rate ~ marriage_rate + median_age + population)
The leverage plot shows an outlier at observation 28 (state variable ID "nevada"). I'd like to specify the model without Nevada in the dataset. I tried the following but got stuck.
    data <- read.dta("census.dta")
    attach(data)
    data1 <- data.frame(pop, divorce, marriage, popurban, medage, divrate, marrate)
    attach(data1)
    model1 <- lm(divrate ~ marrate + medage + pop, data = data1)
    summary(model1)
    layout(matrix(1:4, 2, 2))
    plot(model1)
    dfbetaPlots(lm(divrate ~ marrate + medage + pop), id.n = 50)
    vif(model1)
    datanv <- data[!data$state == "nevada", ]
    attach(datanv)
    model3 <- lm(divrate ~ marrate + medage + pop, data = datanv)
The last line of the code above gives me:
    Error in model.frame.default(formula = divrate ~ marrate + medage + pop, : variable lengths differ (found for 'medage')
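I suspect you have a glitch in your code such that attach()ed copies of the variables are still lying around in your environment. When lm() can't find a variable in the data frame you pass it, it falls back to those full-length copies, and their length no longer matches the reduced data. That's why it's best practice not to use attach(). The following code works for me: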
    library(foreign)
    ## best not to call the data 'data'
    mydata <- read.dta("http://www.stata-press.com/data/r8/census.dta")
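As an aside, here's a minimal sketch of how a leftover attach()ed copy can trigger the "variable lengths differ" error. It uses the built-in mtcars data rather than the census file, and the names full, reduced and msg are just placeholders I made up for the illustration. The idea is that the reduced data frame is missing one of the model variables, so lm() quietly picks up the full-length attached copy instead.

```r
# Toy illustration with the built-in mtcars data (not the census data).
full <- mtcars
attach(full)  # puts full-length copies of every column on the search path

# Drop one row AND leave 'wt' out of the reduced data frame.
reduced <- full[rownames(full) != "Maserati Bora", c("mpg", "hp")]

# mpg and hp come from 'reduced', but 'wt' is not there, so lm() falls
# back to the attached copy, which has one more element than the others.
msg <- tryCatch({
  lm(mpg ~ hp + wt, data = reduced)
  "no error"
}, error = function(e) conditionMessage(e))

detach(full)  # clean up the search path again
print(msg)
```

Without the attach() call, the same lm() call would instead complain that 'wt' can't be found, which is a much easier error to diagnose.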
I didn't find divrate or marrate in the data set, so I'm going to speculate that you want per capita rates:
    ## best practice is to use a new name rather than transforming 'in place'
    mydata2 <- transform(mydata, marrate = marriage/pop, divrate = divorce/pop)
    model1 <- lm(divrate ~ marrate + medage + pop, data = mydata2)
    library(car)
    plot(model1)
    dfbetaPlots(model1)
This works fine for me in a clean session:
    datanv <- subset(mydata2, state != "nevada")
    ## update() may be nice to avoid repeating the details of
    ## the model specification (not necessary in this case)
    model3 <- update(model1, data = datanv)
Or you can use the subset argument:
    model4 <- update(model1, subset = (state != "nevada"))
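If you want to convince yourself that the two routes (a subsetted data frame vs. the subset argument) give the same fit, something like the following sketch should do. It runs on the built-in mtcars data so it doesn't need the download, with a made-up car column standing in for state; cars_df, fit_all, fit_data and fit_subset are hypothetical names for the example only. Note that the string comparison is case-sensitive, so the value has to match exactly how it is stored in the data.

```r
# Toy data: mtcars with the row names copied into a 'car' column,
# standing in for the 'state' column of the census data.
cars_df <- data.frame(car = rownames(mtcars), mtcars)
fit_all <- lm(mpg ~ hp + wt, data = cars_df)

# Route 1: refit on an explicitly subsetted data frame.
fit_data <- update(fit_all, data = subset(cars_df, car != "Maserati Bora"))

# Route 2: keep the data as-is and pass a subset condition instead.
fit_subset <- update(fit_all, subset = (car != "Maserati Bora"))

# Number of observations used by each fit, and whether the two
# reduced fits have the same coefficients.
print(c(all = nobs(fit_all), data = nobs(fit_data), subset = nobs(fit_subset)))
print(all.equal(coef(fit_data), coef(fit_subset)))
```

Checking nobs() after the refit is a cheap way to confirm that exactly one row was dropped, rather than none (a typo in the name) or several.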