
[Machine Learning] Decision Tree

by 밍톨맹톨 2020. 10. 30.

🔎 Decision Tree

    - A model that analyzes training data and uses the patterns inherent in that data to predict and classify new data

    - A decision tree is built by specifying a splitting criterion and a stopping rule

 

Advantages

1️⃣ Easy to understand and easy to apply

2️⃣ The decision process can be explained

      ✔️ Useful in fields such as medicine and finance, where the reason behind a decision has to be given

3️⃣ Useful for selecting important variables

      ✔️ Variables used near the top of the tree are the important ones (weather, in the figure above)

4️⃣ No statistical assumptions about the data are required

      ✔️ e.g. LDA requires the assumption that the data are normally distributed
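
Advantages 2 and 3 can be sketched with scikit-learn. The snippet below is only an illustration under assumptions: it uses the bundled wine dataset (not the weather example from the figure) and an arbitrary `max_depth=2` so the printed rules stay short. `export_text` prints the if/else rules of the tree, and `feature_importances_` gives an impurity-based importance per variable.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

wine = load_wine()

# A shallow tree (depth chosen arbitrarily) keeps the printed rules readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(wine.data, wine.target)

# Human-readable if/else rules: the "explanation" behind each decision.
rules = export_text(tree, feature_names=list(wine.feature_names))
print(rules)

# Variable used at the very top (root) of the tree.
root_feature = wine.feature_names[tree.tree_.feature[0]]
print("root split feature:", root_feature)

# Impurity-based importance of each variable (the values sum to 1.0).
importances = dict(zip(wine.feature_names, tree.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    if score > 0:
        print(f"{name}: {score:.3f}")
```

Variables that never appear in a split get an importance of 0, which is why a fitted tree can double as a rough variable-selection tool.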

 


Disadvantages

1️⃣ Requires a lot of data

2️⃣ Building the tree takes a relatively long time

3️⃣ Sensitive to changes in the data

     ✔️ The domains of the training data ↔️ test data must be similar

4️⃣ Becomes complex when predicting data with a linear structure

     ✔️ In this case it is better to use a model other than a decision tree

(Figure: example for item 4)
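
Item 4 can be reproduced on synthetic data. This is a minimal sketch under assumptions: the noise-free line `y = 2x + 1`, the test points, and `max_depth=3` are all made up for illustration. A tree can only predict one constant per leaf, so it approximates a straight line with a staircase and cannot extrapolate beyond the training range, while a linear model recovers the line directly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic, noise-free linear data (assumption for illustration): y = 2x + 1.
x_train = np.linspace(0, 10, 50).reshape(-1, 1)
y_train = 2 * x_train.ravel() + 1

# Test points; the last one lies outside the training range.
x_test = np.array([[2.5], [7.3], [12.0]])
y_test = 2 * x_test.ravel() + 1

linear = LinearRegression().fit(x_train, y_train)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(x_train, y_train)

# Largest absolute error of each model on the test points.
linear_error = np.abs(linear.predict(x_test) - y_test).max()
tree_error = np.abs(tree.predict(x_test) - y_test).max()

print("linear regression max error:", linear_error)
print("decision tree max error:    ", tree_error)
print("tree leaves used:", tree.get_n_leaves())
```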


📌 How to build a decision tree

Split the data into two or more parts according to the splitting criterion

🔻

Split recursively so that the data in each part becomes as pure (homogeneous) as possible
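
One common way to measure purity is Gini impurity (the `criterion='gini'` option used later in this post). The sketch below shows a single splitting step on one numeric feature. The helper names `gini` and `best_threshold` and the toy temperature data are hypothetical, and a real implementation repeats this step recursively over every feature.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for more mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try each midpoint between sorted distinct values and return the
    (threshold, weighted_gini) pair with the lowest weighted impurity."""
    distinct = sorted(set(values))
    best = (None, gini(labels))  # baseline: no split at all
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy data: one numeric feature and a binary label.
temperature = [15, 18, 21, 24, 27, 30]
play = ["no", "no", "no", "yes", "yes", "yes"]

print(gini(play))                         # 0.5
print(best_threshold(temperature, play))  # (22.5, 0.0)
```

Splitting at 22.5 leaves two perfectly pure parts, so the weighted impurity drops from 0.5 to 0.0 and no further splitting is needed.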

Classification vs Prediction (Regression)

🔸 Classification

✔️ Stopping condition for splitting: each leaf node holds observations that belong to similar categories (classes)

✔️ Decision: the most frequent value of the dependent variable (y) in the leaf node is assigned as the result for new data

(Figure: classification)

📌 The tendency can also be expressed as a probability
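
The classification rule above amounts to a majority vote inside the leaf, and the class fractions in the leaf serve as probabilities. A minimal sketch, assuming a hypothetical helper `leaf_prediction` and a made-up leaf of four observations:

```python
from collections import Counter


def leaf_prediction(leaf_labels):
    """Return (majority_class, class_probabilities) for one leaf node."""
    counts = Counter(leaf_labels)
    total = sum(counts.values())
    probabilities = {label: count / total for label, count in counts.items()}
    majority_class = counts.most_common(1)[0][0]
    return majority_class, probabilities


# Hypothetical leaf holding 4 training observations.
label, probs = leaf_prediction(["yes", "yes", "yes", "no"])
print(label)  # yes
print(probs)  # {'yes': 0.75, 'no': 0.25}
```

In scikit-learn, `predict` and `predict_proba` on a fitted `DecisionTreeClassifier` expose the same two pieces of information.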

 

🔸 Prediction (Regression)

✔️ Stopping condition for splitting: each leaf node holds observations that have similar numeric values

✔️ Decision: the mean of the dependent variable (y) in the leaf node is assigned as the result for new data

📌 For prediction, a neural network or regression analysis often works better than a regression tree
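
The leaf-mean rule can be checked with a depth-1 regression tree. The toy numbers below are assumptions chosen so that the two groups are clearly separated; the tree then predicts the mean of whichever group a new point falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: two clearly separated groups of target values.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([5.0, 6.0, 7.0, 20.0, 22.0, 24.0])

# A depth-1 tree (a "stump") makes exactly one split, giving two leaves.
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

low_prediction = stump.predict([[2.5]])[0]
high_prediction = stump.predict([[11.5]])[0]

# Each prediction equals the mean of y among the training points in that leaf.
print(low_prediction, y[:3].mean())
print(high_prediction, y[3:].mean())
```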


📌 Overfitting

    - The tree is fitted so completely to the training data that the error on test data generally increases

 🔎 How to avoid it

📌 Pruning

    - The idea is to merge nodes back together, not to throw data away
    - After the tree model has been built, unnecessary branches are removed
    - Generally performs better than stopping growth early
    - Finds the set of branches that minimizes the pruning cost function
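
scikit-learn implements this idea as minimal cost-complexity pruning, where the cost is usually written as R_alpha(T) = R(T) + alpha * |T| (tree error plus a penalty per leaf). Whether the post means exactly this cost function is an assumption. The sketch below grows a full tree, asks `cost_complexity_pruning_path` for the candidate alpha values, and refits one pruned tree per alpha via the `ccp_alpha` parameter.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Grow the full tree first, then compute the candidate pruning strengths.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha; a larger alpha removes more branches.
# Scoring on the test split keeps the sketch short; a separate validation
# split or cross-validation would be cleaner for actually choosing alpha.
results = []
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative round-off
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    results.append((alpha, pruned.get_n_leaves(), pruned.score(X_test, y_test)))

for alpha, n_leaves, accuracy in results:
    print(f"alpha={alpha:.4f}  leaves={n_leaves}  test accuracy={accuracy:.3f}")
```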
📌 Stopping growth early

- Set a max depth for the tree model
- Stop growing the tree model when a specific condition is met while it is being grown

[ Classification tree / Data: wine classification ]

 

1️⃣ Load the data

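### Wine

from sklearn import datasets
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()

# wine.data is already a 2-D array of shape (n_samples, n_features),
# so this reshape leaves it unchanged.
n_samples = len(wine.data)
data = wine.data.reshape((n_samples, -1))

# Hold out 30% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(data, wine.target, test_size=0.3, shuffle=True)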

2️⃣ Build the model

 

from sklearn.tree import DecisionTreeClassifier

# Split on Gini impurity; random_state fixes the tie-breaking between splits.
dtc = DecisionTreeClassifier(criterion='gini', random_state=1)
dtc.fit(X_train, y_train)

[ DecisionTreeClassifier manual ]

 

 

3️⃣ Measure performance
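
The original measurement code did not survive in this post, so the following is only a sketch of one common way to do it, using accuracy and a confusion matrix. It repeats the loading and fitting steps so that it runs on its own, and it adds `random_state=1` to the split (an assumption) to make the numbers repeatable.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, shuffle=True, random_state=1
)

dtc = DecisionTreeClassifier(criterion='gini', random_state=1)
dtc.fit(X_train, y_train)

# Predict the held-out samples and compare against the true classes.
y_pred = dtc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

print("test accuracy:", accuracy)
print(matrix)
```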

 


[ Regression tree / Data: Boston house price prediction ]

 

1️⃣ Load the data

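### Boston Housing
# Note: load_boston was removed in scikit-learn 1.2,
# so this code needs an older scikit-learn version.

from sklearn import datasets
from sklearn.model_selection import train_test_split

price = datasets.load_boston()

# price.data is already 2-D, so this reshape leaves it unchanged.
n_samples = len(price.data)
data = price.data.reshape((n_samples, -1))

# Hold out 30% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(data, price.target, test_size=0.3, shuffle=True)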

2️⃣ Build the model

from sklearn.tree import DecisionTreeRegressor

# Default settings: splits minimize squared error and the tree grows fully.
regressor = DecisionTreeRegressor(random_state=1)
regressor.fit(X_train, y_train)

[ DecisionTreeRegressor manual ]

 

3️⃣ Measure performance
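
The original measurement code is missing here as well. The sketch below shows one common approach with mean squared error and R². Because `load_boston` is no longer available in current scikit-learn releases, it swaps in the bundled `load_diabetes` dataset as a stand-in (an assumption, not the data used in this post); the same calls apply to any regression dataset.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in dataset (assumption): diabetes progression instead of Boston prices.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=1
)

regressor = DecisionTreeRegressor(random_state=1)
regressor.fit(X_train, y_train)

# Evaluate on the held-out samples.
y_pred = regressor.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
train_r2 = regressor.score(X_train, y_train)

print("test MSE:", mse)
print("test R^2:", r2)
print("train R^2:", train_r2)
```

Comparing the training R² with the test R² gives a quick view of how strongly an unrestricted tree overfits.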

 

 
