Classification (Personal Project)
Project Classification
Classification
Install and load the party package (plus caret, used later for the confusion matrix), then list the datasets that ship with party
if (!require(party)) {
  install.packages("party")
  library(party)
}
## Loading required package: party
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
if (!require(caret)) {
  install.packages("caret")
  library(caret)
}
## Loading required package: caret
## Loading required package: lattice
## Loading required package: ggplot2
data(package="party")
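The install-if-missing pattern above can be wrapped in a small helper. This is only a sketch: the function name `load_or_install` is hypothetical and not part of the original post, and it is demonstrated on `stats`, which ships with R, so that running it does not trigger a download.

```r
# Hypothetical helper: install a package only when it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# Demonstrated on a package that ships with R, so nothing is downloaded.
load_or_install("stats")
"package:stats" %in% search()
```

With such a helper, the two blocks above would reduce to a loop like `for (p in c("party", "caret")) load_or_install(p)`.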
Load the readingSkills dataset from the party package and check its codebook (help page)
data("readingSkills")
help("readingSkills")
## starting httpd help server ... done
Quick look at the readingSkills dataset: first rows, structure, and basic descriptive statistics
head(readingSkills)
## nativeSpeaker age shoeSize score
## 1 yes 5 24.83189 32.29385
## 2 yes 6 25.95238 36.63105
## 3 no 11 30.42170 49.60593
## 4 yes 7 28.66450 40.28456
## 5 yes 11 31.88207 55.46085
## 6 yes 10 30.07843 52.83124
str(readingSkills)
## 'data.frame': 200 obs. of 4 variables:
## $ nativeSpeaker: Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 1 2 2 1 ...
## $ age : int 5 6 11 7 11 10 7 11 5 7 ...
## $ shoeSize : num 24.8 26 30.4 28.7 31.9 ...
## $ score : num 32.3 36.6 49.6 40.3 55.5 ...
summary(readingSkills)
## nativeSpeaker age shoeSize score
## no :100 Min. : 5.000 Min. :23.17 Min. :25.26
## yes:100 1st Qu.: 6.000 1st Qu.:26.23 1st Qu.:33.94
## Median : 8.000 Median :27.85 Median :40.33
## Mean : 7.925 Mean :27.87 Mean :40.66
## 3rd Qu.: 9.250 3rd Qu.:29.49 3rd Qu.:47.57
## Max. :11.000 Max. :32.33 Max. :56.71
Reorder the levels of the response variable nativeSpeaker so that yes < no
raw <- readingSkills
raw <- na.omit(raw)
# Use raw (not readingSkills) so the column stays aligned if na.omit() drops rows
raw$nativeSpeaker <- factor(raw$nativeSpeaker, levels = c("yes", "no"), ordered = TRUE)
str(raw)
## 'data.frame': 200 obs. of 4 variables:
## $ nativeSpeaker: Ord.factor w/ 2 levels "yes"<"no": 1 1 2 1 1 1 2 1 1 2 ...
## $ age : int 5 6 11 7 11 10 7 11 5 7 ...
## $ shoeSize : num 24.8 26 30.4 28.7 31.9 ...
## $ score : num 32.3 36.6 49.6 40.3 55.5 ...
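The level order matters later on: `table()` lays out its rows and columns in level order, and `confusionMatrix()` treats the first level as the positive class by default. A minimal sketch on a made-up vector (the vector `x` and the names `f_default` and `f_yes_first` are hypothetical, not taken from readingSkills):

```r
# Hypothetical toy vector; default factor levels are sorted alphabetically.
x <- c("no", "yes", "yes", "no", "yes")
f_default <- factor(x)
levels(f_default)            # "no" "yes"

# Put "yes" first, as in the post.
f_yes_first <- factor(x, levels = c("yes", "no"), ordered = TRUE)
levels(f_yes_first)          # "yes" "no"

# The labels are unchanged; only the underlying integer codes differ.
as.integer(f_default)        # 1 2 2 1 2
as.integer(f_yes_first)      # 2 1 1 2 1

# table() follows the level order, so "yes" now comes first.
names(table(f_yes_first))    # "yes" "no"
```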
Split the data into training and test sets at a 70:30 ratio
set.seed(1234)
train<-sample(nrow(raw),0.7*nrow(raw))
data.train<-raw[train,]
data.test<-raw[-train,]
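A quick sanity check for this kind of split, sketched on a hypothetical 200-row data frame (`toy`, `idx`, `toy.train`, and `toy.test` are illustrative names, not from the post): the two subsets should have 140 and 60 rows and share no rows.

```r
# Hypothetical stand-in for a 200-row dataset.
toy <- data.frame(id = 1:200, value = seq(0.5, 100, by = 0.5))

set.seed(1234)
n <- nrow(toy)
# round() guards against floating-point results just below a whole number.
idx <- sample(n, round(0.7 * n))

toy.train <- toy[idx, ]    # rows drawn by sample()
toy.test  <- toy[-idx, ]   # everything else

nrow(toy.train)                               # 140
nrow(toy.test)                                # 60
length(intersect(toy.train$id, toy.test$id))  # 0, no overlap
```

Negative indexing (`toy[-idx, ]`) is what guarantees that the test set is exactly the complement of the training set.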
Quick look at the training and test data
str(data.train)
## 'data.frame': 140 obs. of 4 variables:
## $ nativeSpeaker: Ord.factor w/ 2 levels "yes"<"no": 2 1 2 1 2 2 1 1 1 1 ...
## $ age : int 8 8 10 7 5 9 6 8 11 7 ...
## $ shoeSize : num 28.7 27.7 29.8 26.7 28 ...
## $ score : num 38.1 43.8 45.7 39.5 26.2 ...
str(data.test)
## 'data.frame': 60 obs. of 4 variables:
## $ nativeSpeaker: Ord.factor w/ 2 levels "yes"<"no": 1 1 2 2 2 2 1 1 1 1 ...
## $ age : int 5 11 7 6 6 7 6 8 7 7 ...
## $ shoeSize : num 24.8 31.9 26.7 26.9 25.2 ...
## $ score : num 32.3 55.5 33.9 30 30.4 ...
Build the classification rules (a conditional inference tree) from the training data and plot them
ctre<-ctree(nativeSpeaker~.,data = data.train)
print(ctre)
##
## Conditional inference tree with 8 terminal nodes
##
## Response: nativeSpeaker
## Inputs: age, shoeSize, score
## Number of observations: 140
##
## 1) score <= 30.86356; criterion = 1, statistic = 26.067
## 2)* weights = 21
## 1) score > 30.86356
## 3) score <= 50.84003; criterion = 0.96, statistic = 6.1
## 4) age <= 6; criterion = 1, statistic = 24.668
## 5)* weights = 17
## 4) age > 6
## 6) age <= 9; criterion = 0.98, statistic = 7.344
## 7) score <= 43.34602; criterion = 1, statistic = 23.825
## 8) age <= 7; criterion = 0.999, statistic = 12.697
## 9) score <= 34.72458; criterion = 1, statistic = 18.526
## 10)* weights = 10
## 9) score > 34.72458
## 11)* weights = 10
## 8) age > 7
## 12)* weights = 24
## 7) score > 43.34602
## 13)* weights = 21
## 6) age > 9
## 14)* weights = 16
## 3) score > 50.84003
## 15)* weights = 21
par(mfrow=c(1,1))
plot(ctre,type="simple")
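The printed tree can be read as nested if/else rules. The sketch below transcribes the split points from the output above into a plain R function that returns the terminal node number for one observation. The function name `terminal_node` is hypothetical, and because the printout does not show the predicted class of each leaf, the function stops at the node number rather than guessing a label. `shoeSize` is absent because the tree never splits on it.

```r
# Transcription of the splits printed by print(ctre); thresholds are copied
# from the output above. Returns the terminal node number only.
terminal_node <- function(score, age) {
  if (score <= 30.86356) return(2)
  if (score > 50.84003) return(15)
  if (age <= 6) return(5)
  if (age > 9) return(14)
  # From here on, age is in (6, 9].
  if (score > 43.34602) return(13)
  if (age > 7) return(12)
  # From here on, age is in (6, 7] and score is in (30.86356, 43.34602].
  if (score <= 34.72458) 10 else 11
}

terminal_node(score = 25, age = 5)   # 2
terminal_node(score = 55, age = 11)  # 15
terminal_node(score = 36, age = 7)   # 11
```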
Classify the training data using the fitted rules
cpart.prob.train<-predict(ctre,data.train)
Cross-tabulate the actual response in the training data against the classes predicted by the rules
cpart.perf.train <- table(cpart.prob.train, data.train$nativeSpeaker,
dnn=c( "TrainRule", "TrainActual"))
addmargins(cpart.perf.train)
## TrainActual
## TrainRule yes no Sum
## yes 69 0 69
## no 2 69 71
## Sum 71 69 140
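How `table()`, `dnn`, and `addmargins()` fit together is easier to see on a tiny example. The six predictions and labels below are made up for illustration, and the names `pred`, `actual`, and `tab` are hypothetical.

```r
# Hypothetical predictions and true labels for six cases.
lv <- c("yes", "no")
pred   <- factor(c("yes", "yes", "no", "no", "yes", "no"), levels = lv)
actual <- factor(c("yes", "no",  "no", "no", "yes", "yes"), levels = lv)

# Rows come from the first argument, columns from the second;
# dnn names the two dimensions.
tab <- table(pred, actual, dnn = c("Rule", "Actual"))
tab_with_totals <- addmargins(tab)   # adds a "Sum" row and a "Sum" column
tab_with_totals

# Correctly classified cases sit on the diagonal.
sum(diag(tab)) / sum(tab)
```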
Classify the test data using the fitted rules
cpart.prob.test <- predict(ctre, data.test)
Cross-tabulate the actual response in the test data against the classes predicted by the rules
cpart.perf.test <- table(cpart.prob.test, data.test$nativeSpeaker,
dnn=c("TestPredicted", "TestActual"))
addmargins(cpart.perf.test)
## TestActual
## TestPredicted yes no Sum
## yes 28 2 30
## no 1 29 30
## Sum 29 31 60
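With both cross-tabulations available, training and test accuracy can be compared directly. The sketch below rebuilds the two tables from the counts printed in this post; the helper `accuracy` and the object names are hypothetical.

```r
# Counts copied from the two cross-tabulations printed in this post
# (rows = predicted, columns = actual; matrix() fills column by column).
lv <- c("yes", "no")
train_tab <- matrix(c(69, 2, 0, 69), nrow = 2,
                    dimnames = list(Rule = lv, Actual = lv))
test_tab  <- matrix(c(28, 1, 2, 29), nrow = 2,
                    dimnames = list(Predicted = lv, Actual = lv))

# Share of cases on the diagonal, i.e. predicted class equals actual class.
accuracy <- function(tab) sum(diag(tab)) / sum(tab)

train_acc <- accuracy(train_tab)   # 138 / 140
test_acc  <- accuracy(test_tab)    # 57 / 60
round(c(train = train_acc, test = test_acc), 4)
```

The gap between the two figures is small here, although a single 60-row test split is a thin basis for a firm conclusion about how well the rules generalize.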
Compute the confusion matrix and its summary statistics
confusionMatrix(cpart.perf.test)
## Confusion Matrix and Statistics
##
## TestActual
## TestPredicted yes no
## yes 28 2
## no 1 29
##
## Accuracy : 0.95
## 95% CI : (0.8608, 0.9896)
## No Information Rate : 0.5167
## P-Value [Acc > NIR] : 1.837e-13
##
## Kappa : 0.9
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9655
## Specificity : 0.9355
## Pos Pred Value : 0.9333
## Neg Pred Value : 0.9667
## Prevalence : 0.4833
## Detection Rate : 0.4667
## Detection Prevalence : 0.5000
## Balanced Accuracy : 0.9505
##
## 'Positive' Class : yes
##
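Most of the figures reported above follow from the four cell counts. The sketch below recomputes them in base R, taking "yes" as the positive class as the output states; all object names are hypothetical. The interval from `binom.test()` is an exact binomial interval, which agrees with the 95% CI printed above.

```r
# Test-set counts copied from the output above (rows = predicted, columns = actual).
lv  <- c("yes", "no")
tab <- matrix(c(28, 1, 2, 29), nrow = 2,
              dimnames = list(TestPredicted = lv, TestActual = lv))

tp <- tab["yes", "yes"]; fp <- tab["yes", "no"]
fn <- tab["no", "yes"];  tn <- tab["no", "no"]
n  <- sum(tab)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)   # share of actual "yes" predicted as "yes"
specificity <- tn / (tn + fp)   # share of actual "no" predicted as "no"
ppv         <- tp / (tp + fp)   # Pos Pred Value
npv         <- tn / (tn + fn)   # Neg Pred Value
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement.
expected   <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

# Exact (Clopper-Pearson) 95% interval for the accuracy.
ci <- binom.test(tp + tn, n)$conf.int

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv,
        balanced = balanced, kappa = kappa_stat), 4)
round(ci, 4)
```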