
Regression(Personal Project)

zionadd 2018. 10. 16. 15:53
Project Regression

Regression analysis

Install and load the carData package, and list the datasets it contains.

if (!require(carData)) {   # require() attaches the package and returns FALSE if it is not installed
  install.packages("carData")
  library(carData)
}
## Loading required package: carData
if (!require(car)) {
  install.packages("car")
  library(car)
}
## Loading required package: car
data(package = "carData")   # list the datasets shipped with carData
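The if (!require(...)) pattern above can be wrapped in a small helper that handles several packages at once. The sketch below is hypothetical: load_or_install() is not part of carData or car. It uses requireNamespace() to check availability before attaching. It also shows that data(package = ...) returns an object whose $results matrix has an "Item" column with the dataset names as a character vector; the base datasets package stands in for carData here so the sketch runs without extra installs.

```r
# Hypothetical helper generalising the if (!require(...)) pattern above
load_or_install <- function(pkgs) {
  for (p in pkgs) {
    # requireNamespace() checks availability without attaching the package
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p)
    }
    library(p, character.only = TRUE)   # attach it so its functions and data are visible
  }
  invisible(pkgs)
}

load_or_install(c("stats", "utils"))   # both ship with R, so nothing is installed

# data(package = ...) returns an object whose $results matrix holds the listing;
# the "Item" column gives the dataset names as a character vector
items <- data(package = "datasets")$results[, "Item"]
head(items)
```

With carData installed, data(package = "carData")$results[, "Item"] should list Prestige among the items.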

From the datasets in carData, load the Prestige dataset and check its codebook.

data("Prestige")
help("Prestige")
## starting httpd help server ... done

Take a quick look at the Prestige dataset, examine its structure, and compute basic descriptive statistics.

head(Prestige)
##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof
str(Prestige)
## 'data.frame':    102 obs. of  6 variables:
##  $ education: num  13.1 12.3 12.8 11.4 14.6 ...
##  $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
##  $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
##  $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
##  $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
##  $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
summary(Prestige)
##    education          income          women           prestige    
##  Min.   : 6.380   Min.   :  611   Min.   : 0.000   Min.   :14.80  
##  1st Qu.: 8.445   1st Qu.: 4106   1st Qu.: 3.592   1st Qu.:35.23  
##  Median :10.540   Median : 5930   Median :13.600   Median :43.60  
##  Mean   :10.738   Mean   : 6798   Mean   :28.979   Mean   :46.83  
##  3rd Qu.:12.648   3rd Qu.: 8187   3rd Qu.:52.203   3rd Qu.:59.27  
##  Max.   :15.970   Max.   :25879   Max.   :97.510   Max.   :87.20  
##      census       type   
##  Min.   :1113   bc  :44  
##  1st Qu.:3120   prof:31  
##  Median :5135   wc  :23  
##  Mean   :5402   NA's: 4  
##  3rd Qu.:8312            
##  Max.   :9517
raw <- na.omit(Prestige)   # drop the 4 occupations with missing type (102 -> 98 rows)
summary(raw)
##    education          income          women           prestige    
##  Min.   : 6.380   Min.   : 1656   Min.   : 0.000   Min.   :17.30  
##  1st Qu.: 8.445   1st Qu.: 4250   1st Qu.: 3.268   1st Qu.:35.38  
##  Median :10.605   Median : 6036   Median :14.475   Median :43.60  
##  Mean   :10.795   Mean   : 6939   Mean   :28.986   Mean   :47.33  
##  3rd Qu.:12.755   3rd Qu.: 8226   3rd Qu.:52.203   3rd Qu.:59.90  
##  Max.   :15.970   Max.   :25879   Max.   :97.510   Max.   :87.20  
##      census       type   
##  Min.   :1113   bc  :44  
##  1st Qu.:3116   prof:31  
##  Median :5132   wc  :23  
##  Mean   :5400            
##  3rd Qu.:8328            
##  Max.   :9517
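summary() reports 4 NA's in type, and na.omit() drops every row that contains at least one NA, which is why the later model has 91 residual degrees of freedom with 7 coefficients (98 rows). The sketch below uses a small hypothetical data frame to show how to count missing values per column before dropping them, and how to see which rows na.omit() removed.

```r
# Hypothetical toy data mimicking Prestige: one occupation has no type
toy <- data.frame(
  income = c(1000, 2000, 3000, 4000),
  type   = factor(c("prof", NA, "prof", "wc"), levels = c("bc", "prof", "wc"))
)

colSums(is.na(toy))          # missing values per column: income 0, type 1

clean <- na.omit(toy)        # drop every row containing at least one NA
nrow(clean)                  # 3
attr(clean, "na.action")     # index of the dropped row, stored with class "omit"
```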

Fit a multiple regression model.

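# note: census is an occupational code, so its numeric value has no substantive meaning as a predictor
fit <- lm(income ~ education + women + prestige + census + type, data = raw)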
summary(fit)
## 
## Call:
## lm(formula = income ~ education + women + prestige + census + 
##     type, data = raw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7752.4  -954.6  -331.2   742.6 14301.3 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.32053 3037.27048   0.002  0.99808    
## education    131.18372  288.74961   0.454  0.65068    
## women        -53.23480    9.83107  -5.415 4.96e-07 ***
## prestige     139.20912   36.40239   3.824  0.00024 ***
## census         0.04209    0.23568   0.179  0.85865    
## typeprof     509.15150 1798.87914   0.283  0.77779    
## typewc       347.99010 1173.89384   0.296  0.76757    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2633 on 91 degrees of freedom
## Multiple R-squared:  0.6363, Adjusted R-squared:  0.6123 
## F-statistic: 26.54 on 6 and 91 DF,  p-value: < 2.2e-16
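type is a factor with levels bc, prof and wc. lm() turns it into indicator columns using R's default treatment contrasts, with the first level (bc) as the baseline, which is why the output shows typeprof and typewc but no typebc. Each of those coefficients is the estimated income difference from blue-collar occupations with the other predictors held fixed (neither is significant here). The sketch below uses a hypothetical toy factor with the same levels to show the design matrix lm() builds.

```r
# Toy factor with the same levels as Prestige$type (values are hypothetical)
type <- factor(c("bc", "prof", "wc", "bc"))
levels(type)                 # "bc" "prof" "wc": the first level is the baseline

mm <- model.matrix(~ type)   # design matrix lm() would build for this term
mm
# Columns: (Intercept), typeprof, typewc. A "bc" row has 0 in both dummies,
# so typeprof and typewc measure shifts relative to "bc".
```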

Fit a further multiple regression that excludes the non-significant variables.

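# Forward selection that starts from the full model has nothing left to add,
# so fit.fw is identical to fit (compare the two summaries).
fit.fw <- step(fit, direction = "forward", trace = FALSE)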
summary(fit.fw)
## 
## Call:
## lm(formula = income ~ education + women + prestige + census + 
##     type, data = raw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7752.4  -954.6  -331.2   742.6 14301.3 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.32053 3037.27048   0.002  0.99808    
## education    131.18372  288.74961   0.454  0.65068    
## women        -53.23480    9.83107  -5.415 4.96e-07 ***
## prestige     139.20912   36.40239   3.824  0.00024 ***
## census         0.04209    0.23568   0.179  0.85865    
## typeprof     509.15150 1798.87914   0.283  0.77779    
## typewc       347.99010 1173.89384   0.296  0.76757    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2633 on 91 degrees of freedom
## Multiple R-squared:  0.6363, Adjusted R-squared:  0.6123 
## F-statistic: 26.54 on 6 and 91 DF,  p-value: < 2.2e-16
fit.bw <- step(fit, direction = "backward", trace = FALSE)   # backward elimination by AIC: keeps dropping terms while AIC decreases
summary(fit.bw)
## 
## Call:
## lm(formula = income ~ women + prestige, data = raw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7678.4 -1050.9  -310.1   839.6 14114.3 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   653.17     838.09   0.779    0.438    
## women         -50.50       8.42  -5.997 3.61e-08 ***
## prestige      163.74      15.46  10.593  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2587 on 95 degrees of freedom
## Multiple R-squared:  0.6334, Adjusted R-squared:  0.6257 
## F-statistic: 82.08 on 2 and 95 DF,  p-value: < 2.2e-16
fit.cor <- lm(income ~ women + prestige, data = raw)   # refit the backward-selected model explicitly
summary(fit.cor)
## 
## Call:
## lm(formula = income ~ women + prestige, data = raw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7678.4 -1050.9  -310.1   839.6 14114.3 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   653.17     838.09   0.779    0.438    
## women         -50.50       8.42  -5.997 3.61e-08 ***
## prestige      163.74      15.46  10.593  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2587 on 95 degrees of freedom
## Multiple R-squared:  0.6334, Adjusted R-squared:  0.6257 
## F-statistic: 82.08 on 2 and 95 DF,  p-value: < 2.2e-16
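step() with direction = "forward" only adds terms, so it needs a small starting model plus a scope giving the largest model it may reach; started from the full model it stops immediately. The sketch below runs a forward search from an intercept-only model on simulated data (x1, x2, x3 and y are hypothetical), then uses anova() for a partial F-test of the selected model against the full one. For this post's data the same shape would start from lm(income ~ 1, data = raw) with scope = formula(fit), and anova(fit.cor, fit) would test the dropped terms jointly; those results are not reproduced here.

```r
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 3 * d$x1 - 2 * d$x2 + rnorm(n)      # x3 has no effect on y (simulated data)

null_fit <- lm(y ~ 1, data = d)              # starting point: intercept only
full_fit <- lm(y ~ x1 + x2 + x3, data = d)   # upper limit of the search

# Forward selection: add one term at a time while AIC decreases
fw <- step(null_fit, scope = formula(full_fit), direction = "forward", trace = 0)
attr(terms(fw), "term.labels")

# Partial F-test of the selected model against the full model
anova(fw, full_fit)
```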

Show all six regression diagnostic plots together on a multi-panel canvas.

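op <- par(mfrow = c(2, 3))   # 2 x 3 grid of panels
plot(fit.cor, which = 1:6)   # all six diagnostic plots for an lm fit
par(op)                      # restore the previous layout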
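The six panels of plot() for an lm fit with which = 1:6 are: residuals vs fitted, normal Q-Q, scale-location, Cook's distance, residuals vs leverage, and Cook's distance vs leverage. The quantities behind them can be pulled out with rstandard(), hatvalues() and cooks.distance(), which makes it easier to list flagged observations than reading them off the plots. The sketch below uses simulated data with one planted outlier; the 4/n cutoff is a common rule of thumb, not a fixed threshold. The same three calls should apply directly to fit.cor.

```r
set.seed(2)
n <- 50
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
d$y[1] <- d$y[1] + 10            # plant one gross outlier (simulated data)

m  <- lm(y ~ x, data = d)
r  <- rstandard(m)               # standardized residuals (which = 2, 3, 5)
h  <- hatvalues(m)               # leverages (x-axis of which = 5, 6)
cd <- cooks.distance(m)          # Cook's distance (which = 4, 6)

sum(h)                           # trace of the hat matrix = number of coefficients (2)
which(cd > 4 / n)                # observations above the 4/n rule of thumb
```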

Diagnose multicollinearity.

vif(fit.cor)   # car::vif(); values near 1 mean each predictor is almost uncorrelated with the others
##    women prestige 
##  1.01228  1.01228
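The variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. With only two predictors both VIFs reduce to 1 / (1 - r^2), r being their correlation, which is why women and prestige show identical values; 1.01228 corresponds to r^2 of about 0.012 (|r| roughly 0.11). The helper vif_manual() below is a hypothetical sketch on simulated data, not car's implementation (car::vif() also reports generalized VIFs for factor terms).

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(3)
x1 <- rnorm(200)
x2 <- 0.3 * x1 + rnorm(200)           # mildly correlated with x1 (simulated data)
v  <- vif_manual(data.frame(x1 = x1, x2 = x2))
v

# With exactly two predictors, both VIFs reduce to 1 / (1 - r^2)
1 / (1 - cor(x1, x2)^2)

# Reading the post's value backwards: VIF = 1.01228 implies r^2 of about 0.012
1 - 1 / 1.01228
```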

Use the fitted model to predict a new value (mean years of education 12, occupational prestige score 33, percentage of women in the occupation 22).

new <- data.frame(education = 12, prestige = 33, women = 22)   # education is ignored: fit.cor uses only women and prestige
predict(fit.cor,new,interval = "none")
##        1 
## 4945.629
predict(fit.cor,new,interval = "confidence")
##        fit      lwr      upr
## 1 4945.629 4247.515 5643.743
predict(fit.cor,new,interval = "prediction")
##        fit       lwr      upr
## 1 4945.629 -236.9247 10128.18
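All three predict() calls share one point estimate, which is the fitted equation evaluated at women = 22 and prestige = 33: with the rounded coefficients, 653.17 - 50.50 * 22 + 163.74 * 33 = 4945.59. The intervals differ only in their standard error: the confidence interval uses se.fit, while the prediction interval adds the residual variance, sqrt(se.fit^2 + sigma^2). With a residual standard error of about 2587, that is why the prediction interval is so wide that its lower bound falls below zero; the normal-error interval is not constrained to positive incomes. The second part of the sketch checks both formulas against predict() on simulated data (hypothetical variables x and y).

```r
# Point prediction by hand from the rounded coefficients reported for fit.cor
b <- c(intercept = 653.17, women = -50.50, prestige = 163.74)
x <- c(1, 22, 33)
sum(b * x)   # 4945.59, matching 4945.629 up to coefficient rounding

# Interval formulas checked on simulated data
set.seed(4)
d <- data.frame(x = runif(60, 0, 10))
d$y <- 5 + 2 * d$x + rnorm(60, sd = 3)
m  <- lm(y ~ x, data = d)
nd <- data.frame(x = 4)

p  <- predict(m, nd, se.fit = TRUE)
tq <- qt(0.975, df = p$df)                                                   # 95% t quantile
conf_int <- p$fit + c(-1, 1) * tq * p$se.fit                                 # mean response
pred_int <- p$fit + c(-1, 1) * tq * sqrt(p$se.fit^2 + p$residual.scale^2)    # new observation
conf_int
pred_int
```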
