
모두의 휴게소 (Everyone's Rest Area): AI Part

1. Diary-based emotion analysis

1. Data

Uses the Emotional Dialogue Corpus (감성 대화 말뭉치) from AI Hub.

The corpus consists of conversations between an AI and people, labeled with 6 major emotion categories (anger, sadness, anxiety, hurt, embarrassment, joy) that are subdivided into 60 fine-grained emotions.

The data is evenly distributed across the emotions.

The "hurt" category was removed because it does not fit movie and music recommendation.

2. Preprocessing

The AI's replies are dropped from each conversation and only the human utterances are kept.

Any part of a human utterance that is not text is removed with a regex.

The emotion labels are then converted to numeric indices.
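The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `(speaker, utterance, emotion)` row layout, the `"human"` speaker tag, the English label names, their order, and the exact character class treated as "text" are all assumptions made for the example.

```python
import re

# Assumed label order: the five major emotions left after dropping "hurt".
EMOTIONS = ["anger", "sadness", "anxiety", "embarrassment", "joy"]
LABEL_TO_INDEX = {label: i for i, label in enumerate(EMOTIONS)}

# Assumption: "text" means Hangul syllables, Latin letters, digits and whitespace.
NON_TEXT = re.compile(r"[^가-힣a-zA-Z0-9\s]")


def clean_utterance(text: str) -> str:
    """Replace non-text characters with spaces and collapse repeated whitespace."""
    text = NON_TEXT.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(rows):
    """rows: iterable of (speaker, utterance, emotion) tuples (hypothetical layout)."""
    samples = []
    for speaker, utterance, emotion in rows:
        if speaker != "human":             # drop the AI side of the dialogue
            continue
        if emotion not in LABEL_TO_INDEX:  # e.g. the removed "hurt" class
            continue
        samples.append((clean_utterance(utterance), LABEL_TO_INDEX[emotion]))
    return samples
```

Each returned sample is a `(cleaned_text, label_index)` pair that can be fed to a tokenizer.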

3. Models

1. LSTM

Compared the performance of the Okt, Komoran, and Hannanum tokenizers from KoNLPy.

Stopword removal is based on the linked stopword list.

Building the stopword list by morphological analysis (removing particles, endings, and the like) was also tried, but it performed worse than the list above and was not used.

Best Valid Accuracy

  • Hannanum: 0.6716

  • Komoran: 0.6699

  • Okt: 0.6636
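As a rough sketch of the tokenize-then-filter step compared above, the helper below takes the tokenizer as a parameter so the three analyzers can be swapped in and out. The function name and the whitespace stand-in are assumptions for illustration; with KoNLPy installed, the `morphs` method of `Okt()`, `Komoran()`, or `Hannanum()` could be passed instead.

```python
def tokenize_without_stopwords(text, tokenize, stopwords):
    """Tokenize text and drop every token that appears in the stopword set."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


# Stand-in tokenizer so the sketch runs without a Java runtime.
# With KoNLPy available, pass e.g. Okt().morphs in its place.
whitespace_tokenize = str.split
```

Keeping the tokenizer pluggable means the same training loop can be rerun once per analyzer to reproduce a comparison like the one listed above.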

2. BERT

Uses the tokenizer and classification model of bert-base-multilingual-cased.

The optimizer is Adam.

Performance was not bad, but each epoch took about an hour, so this model was not used.

3. KoBERT

The model used in the final system.

Uses the default BERT tokenizer.

Most inputs are 80 tokens or shorter, so the model was first trained with max-len 80. Accuracy was good, but performance felt noticeably worse in real use, so max-len was raised to 200 and the model was retrained. More data is needed to address this properly.

Best Valid Accuracy: 0.7485
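The max-len trade-off described above can be checked with a small helper like the one below. It is only a sketch: the function names are made up for this example, and `pad_id=0` is a placeholder that should be replaced by the padding id of whichever tokenizer is actually used.

```python
def coverage(lengths, max_len):
    """Fraction of sequences whose token count fits within max_len."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Cut to max_len, then right-pad with pad_id so every row has equal length.

    pad_id=0 is a placeholder value, not the id of any particular tokenizer.
    """
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))
```

Comparing `coverage` on training data against the length of real diary entries is one way to see why a limit that looks sufficient in training can truncate longer inputs in production.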

Image (click to open the wandb page)

2. Emotion-based movie recommendation

1. Data

Uses the Large Movie Review Dataset.

Reviews were chosen as the way to link movies to emotions.

A model trained on the emotion dataset provided by Hugging Face runs inference on the reviews, and each movie is matched to the emotion that the largest number of its reviewers felt.
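The matching step can be read as a majority vote over per-review predictions. The sketch below assumes the classifier output has already been reduced to `(movie_id, emotion)` pairs, one per review; that input shape and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def dominant_emotion_per_movie(predictions):
    """predictions: iterable of (movie_id, emotion) pairs, one per review."""
    counts = defaultdict(Counter)
    for movie_id, emotion in predictions:
        counts[movie_id][emotion] += 1
    # most_common(1) yields the top (emotion, count) pair; ties keep first-seen order.
    return {movie_id: c.most_common(1)[0][0] for movie_id, c in counts.items()}


def movies_by_emotion(predictions):
    """Group movie ids under the emotion most of their reviews were classified as."""
    grouped = defaultdict(list)
    for movie_id, emotion in dominant_emotion_per_movie(predictions).items():
        grouped[emotion].append(movie_id)
    return dict(grouped)
```

The resulting emotion-to-movies mapping is what a recommender would look up once the diary model has predicted the user's emotion.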

2. Preprocessing

  • sent table: movie reviews and movie IDs

  • crew table: director and actor IDs

  • name table: maps director and actor IDs to their names

  • title table: movie ID, movie title, and release year

After running inference on the sent table, all tables are joined on their IDs to fetch each movie's title and director, keeping only movies released in 1990 or later.
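The join and year filter can be sketched with plain dictionaries as below. The table shapes (one director per movie, `titles` holding `(title, year)` tuples) and all names are simplifying assumptions for the example; the real tables may carry more columns.

```python
def build_movie_records(movie_emotions, titles, crew, names, min_year=1990):
    """Join the tables on their ids and keep films released in min_year or later.

    movie_emotions: {movie_id: emotion}        (inference result on the sent table)
    titles:         {movie_id: (title, year)}  (title table)
    crew:           {movie_id: director_id}    (crew table, simplified)
    names:          {person_id: name}          (name table)
    """
    records = []
    for movie_id, emotion in movie_emotions.items():
        if movie_id not in titles:   # no title row, nothing to show
            continue
        title, year = titles[movie_id]
        if year < min_year:          # drop films older than the cutoff
            continue
        director = names.get(crew.get(movie_id))  # None when unknown
        records.append({
            "movie_id": movie_id,
            "title": title,
            "year": year,
            "director": director,
            "emotion": emotion,
        })
    return records
```

With larger data the same logic maps onto ordinary inner joins on the id columns followed by a filter on the release year.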

3. Model

Uses the DistilBERT model published on Hugging Face.

3. Emotion-based music recommendation

Classification is based on https://sites.tufts.edu/eeseniordesignhandbook/2015/music-mood-classification/

1. Data

A model is trained to distinguish the Happy, Sad, and Calm moods from the audio features provided by Spotify.

Training uses data_moods.csv from https://github.com/cristobalvch/Spotify-Machine-Learning, and inference is run on tracks fetched from the URIs of Korean Spotify playlists.

2. Model

Uses a classifier provided by Keras.

4. Serving

Platform: AWS EC2

The trained model files are uploaded to S3, then downloaded with the AWS CLI for inference.

On the free-tier micro instance the process was killed as soon as the model started loading, so a t2.large instance is used.
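As a rough sketch of the download step, the helper below wraps the AWS CLI's `aws s3 cp` command. The bucket name, object key, and destination path in the usage note are hypothetical, and the code assumes the AWS CLI is installed and has credentials configured on the instance.

```python
import subprocess


def s3_download_command(bucket, key, dest):
    """Build the AWS CLI argument list that copies one S3 object to a local path."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest]


def download_model(bucket, key, dest):
    """Run the copy; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(s3_download_command(bucket, key, dest), check=True)
    return dest
```

For example, `download_model("my-model-bucket", "kobert/model.pt", "/tmp/model.pt")` would fetch the weights once at startup, before the model is loaded for inference.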


Future development goals

Because the purpose of the service is sharing emotions, it should match a user's rough profile (age group, gender, region, and so on) with their emotion and recommend content used by other users who feel the same emotion.

To get there, feedback on the current content has to be collected as data, and the system then has to be replaced with a full recommender-system model.

In addition, BERT is heavy and slow at inference time, so it should be replaced with a lightweight ML model, using feature engineering and additional data collection to keep performance as high as possible.