Junseo5/SSAFY_AI_PJT_2025

📒 Kaggle VQA Challenge - Integrated Notebook Version

🎯 Project Overview

A single-notebook project for the Kaggle Visual Question Answering (VQA) challenge.

  • Model: Qwen2.5-VL (3B/7B) + QLoRA
  • Target accuracy: 85-88% (top 10%)
  • Environment: fully compatible with the T4 GPU
  • Highlight: every feature is integrated into one notebook

🚀 Quick Start

📒 Main notebook

Kaggle_AllInOne_Pro.ipynb - the full pipeline in one notebook

This single notebook covers everything:

  • ✅ Environment setup and package installation
  • ✅ Unified Config management
  • ✅ Data loading and EDA
  • ✅ Stratified K-Fold CV
  • ✅ Advanced training loop (AMP, EMA, SWA, Cosine Warmup)
  • ✅ TTA inference
  • ✅ Ensembling
  • ✅ Submission file generation

🔵 Baseline reference

251023_Baseline.ipynb - competition baseline code (for reference only)

✨ Key Features

1. Full T4 GPU compatibility

  • ✅ Float16 (instead of BFloat16)
  • ✅ SDPA Attention (FlashAttention removed)
  • ✅ 4-bit QLoRA
  • ✅ Gradient Checkpointing

2. Label alignment fix (key!)

  • ✅ The correct answer is included in the assistant message
  • ✅ add_generation_prompt=False is used
  • ✅ The model is trained on the exact positions of the answer tokens

3. Advanced training techniques

  • ✅ AMP (Automatic Mixed Precision)
  • ✅ EMA (Exponential Moving Average)
  • ✅ SWA (Stochastic Weight Averaging)
  • ✅ Cosine Warmup Scheduler
  • ✅ Gradient Clipping
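
Two of the techniques above are easy to show in isolation. The following is a minimal sketch, not the notebook's actual code: the function names `cosine_warmup_lr` and `ema_update`, the `min_lr` parameter, and the plain-dict weight representation are assumptions made for illustration.

```python
import math


def cosine_warmup_lr(step, total_steps, warmup_steps, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(shadow, weights, decay=0.999):
    """Blend the current weights into the shadow (averaged) copy in place."""
    for name, value in weights.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Hypothetical 100-step run with 10 warmup steps.
for step in (0, 9, 55, 100):
    print(step, cosine_warmup_lr(step, total_steps=100, warmup_steps=10))
```

With EMA, the shadow copy rather than the raw weights would typically be used for evaluation; whether the notebook does this is not stated here.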

4. K-Fold Cross-Validation

  • ✅ Stratified K-Fold (preserves the answer distribution)
  • ✅ 3-Fold by default
  • ✅ Each fold is trained independently
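
To illustrate what "preserves the answer distribution" means, here is a small standard-library sketch of stratified fold assignment. It is an assumption-laden illustration, not the notebook's code: the function name `stratified_folds` and the toy `answers` list are hypothetical, and in practice scikit-learn's `StratifiedKFold` (scikit-learn is in the install list) is the usual tool for this.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=3, seed=42):
    """Return a fold index (0..n_folds-1) for every sample, balanced per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    fold_of = [None] * len(labels)
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the samples of this label round-robin across the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds
    return fold_of


# Toy answer column: 6 x "a", 3 x "b", 3 x "c".
answers = ["a"] * 6 + ["b"] * 3 + ["c"] * 3
folds = stratified_folds(answers)
for k in range(3):
    val_labels = [answers[i] for i, f in enumerate(folds) if f == k]
    print(k, dict(Counter(val_labels)))
```

Each validation fold in this toy case holds two "a", one "b" and one "c", which mirrors the 2:1:1 ratio of the full set.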

5. TTA & Ensemble

  • ✅ Test-Time Augmentation support
  • ✅ Majority-voting ensemble
  • ✅ Weighted ensemble option
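
The two ensemble options can be sketched as follows for a single test question. This is a minimal illustration under assumptions: `majority_vote` and `weighted_vote` are hypothetical names, and the weights (for example, per-fold validation accuracy) are made up.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Return the answer with the largest total weight."""
    totals = defaultdict(float)
    for answer, weight in zip(predictions, weights):
        totals[answer] += weight
    return max(totals, key=totals.get)


# Predictions of three fold models for one question.
fold_predictions = ["a", "b", "b"]
print(majority_vote(fold_predictions))                    # b
print(weighted_vote(fold_predictions, [0.9, 0.3, 0.3]))   # a
```

The second call shows how weighting can overturn a plain majority when one model is trusted much more than the others.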

📊 Expected Performance

| Setting                     | Accuracy | Training time | Notes            |
|-----------------------------|----------|---------------|------------------|
| Baseline (200 samples)      | 60-65%   | ~20 min       | Quick test       |
| Single Fold (3B, full data) | 75-78%   | ~2 h          | Single model     |
| 3-Fold Ensemble (3B)        | 79-82%   | ~6 h          | Ensemble         |
| 3-Fold Ensemble (7B)        | 83-85%   | ~12 h         | High performance |
| + TTA + Optimization (7B)   | 85-88%   | ~15 h         | Best performance |

🗂️ Project Structure

SSAFY_AI_PJT_2025/
├── 📒 Kaggle_AllInOne_Pro.ipynb    ⭐ Main integrated notebook
├── 📒 251023_Baseline.ipynb         Reference baseline
├── README.md                         This file
├── PROJECT_SUMMARY.md                Project summary
├── requirements.txt                  Package list
├── install.sh                        Automatic install script
├── data/                             Data folder
│   ├── train.csv
│   ├── test.csv
│   └── sample_submission.csv
├── experiments/                      Experiment results
│   └── README.md
├── checkpoints/                      Model checkpoints (created after training)
├── outputs/                          Submission files (created after inference)
└── logs/                             Training logs (optional)

🎓 Usage

1. Prepare the environment (Colab/Kaggle)

# Run the first code cell of Kaggle_AllInOne_Pro.ipynb
!pip install -q "transformers>=4.44.2" "accelerate>=0.34.2" "peft>=0.13.2" \
    "bitsandbytes>=0.43.1" datasets pillow pandas torch torchvision \
    scikit-learn matplotlib seaborn tqdm --upgrade
!pip install -q qwen-vl-utils==0.0.8

# Restart the runtime after the install finishes

2. Upload the data

On Colab:

from google.colab import drive
drive.mount('/content/drive')

# Unzip the data
!unzip "/content/drive/My Drive/data.zip" -d "/content/"

On Kaggle:

  • Add Data → Upload Dataset

3. Configure

Adjust the hyperparameters in the notebook's Config cell:

class Config:
    # Model settings
    MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"  # or the 7B variant
    IMAGE_SIZE = 384  # 384, 512, 768

    # K-Fold settings
    N_FOLDS = 3
    USE_KFOLD = True

    # Training settings
    NUM_EPOCHS = 1  # real runs: 3~5
    BATCH_SIZE = 1
    GRAD_ACCUM_STEPS = 4
    LEARNING_RATE = 1e-4

    # Advanced techniques
    USE_AMP = True
    USE_EMA = True
    USE_SWA = False
    USE_TTA = False

    # Sampling (for debugging)
    USE_SAMPLE = True  # False: use the full dataset
    SAMPLE_SIZE = 200

4. Run sequentially

Run every cell of the notebook in order:

  1. Environment setup
  2. Config
  3. Data loading & EDA
  4. K-Fold creation
  5. Dataset definition
  6. Model loading
  7. Training
  8. Inference
  9. Ensemble
  10. Result analysis

5. Submit

Download outputs/submission_ensemble.csv (or submission_single.csv) and submit it.
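
For orientation, writing such a file could look like the sketch below. The column names `id` and `answer`, the sample IDs, and the helper name `write_submission` are assumptions for illustration only; the real header must match data/sample_submission.csv.

```python
import csv
from pathlib import Path


def write_submission(predictions, path="outputs/submission_single.csv"):
    """Write {sample_id: answer} pairs as a two-column CSV file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Assumed header; copy the real one from sample_submission.csv.
        writer.writerow(["id", "answer"])
        for sample_id, answer in predictions.items():
            writer.writerow([sample_id, answer])
    return path


# Hypothetical predictions for two test samples:
# write_submission({"test_0001": "a", "test_0002": "c"})
```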

🔧 Hyperparameter Tuning Guide

Quick test (20 minutes)

USE_SAMPLE = True
SAMPLE_SIZE = 200
NUM_EPOCHS = 1
USE_KFOLD = False

Single-model experiment (2 hours)

USE_SAMPLE = False
NUM_EPOCHS = 3
USE_KFOLD = False
MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"

3-Fold ensemble (6-12 hours)

USE_SAMPLE = False
NUM_EPOCHS = 3
USE_KFOLD = True
N_FOLDS = 3
MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"  # higher accuracy

Best performance (15 hours)

USE_SAMPLE = False
NUM_EPOCHS = 5
USE_KFOLD = True
N_FOLDS = 3
MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"
IMAGE_SIZE = 512  # or 768
USE_EMA = True
USE_SWA = True
USE_TTA = True
TTA_SCALES = [0.9, 1.0, 1.1]
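
One plausible reading of TTA_SCALES is that each test image is evaluated at several resolutions and the results are combined. The sketch below assumes exactly that and averages per-choice scores; `tta_predict`, `predict_fn` and `fake_model` are hypothetical names, and the notebook may combine TTA results differently (for example by voting).

```python
def tta_predict(predict_fn, image_size, scales=(0.9, 1.0, 1.1)):
    """Average per-choice scores over several input resolutions."""
    totals = {}
    for scale in scales:
        size = int(round(image_size * scale))
        # Hypothetical callable: returns {choice: probability} for this size.
        scores = predict_fn(size)
        for choice, prob in scores.items():
            totals[choice] = totals.get(choice, 0.0) + prob / len(scales)
    best = max(totals, key=totals.get)
    return best, totals


def fake_model(size):
    """Stand-in for a real model: prefers "a" only at low resolution."""
    return {"a": 0.6, "b": 0.4} if size < 500 else {"a": 0.3, "b": 0.7}


# With IMAGE_SIZE = 512 the evaluated sizes are 461, 512 and 563.
answer, averaged = tta_predict(fake_model, image_size=512)
print(answer)  # b
```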

⚠️ Important Notes

T4 GPU compatibility

  • Use Float16 (not BFloat16) - the T4 does not support BF16
  • SDPA Attention (FlashAttention removed) - FlashAttention is not optimized for the T4
  • 4-bit Quantization - for memory efficiency
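
The choices above follow from the GPU generation: the T4 is a Turing card (CUDA compute capability 7.5), while native BF16 and FlashAttention-2 target Ampere (8.0) and newer. A minimal sketch of that decision, where the function name `precision_settings` and the returned dictionary keys are illustrative assumptions rather than the notebook's code:

```python
def precision_settings(compute_capability):
    """Map a CUDA compute capability (major, minor) to load-time choices."""
    ampere_or_newer = compute_capability >= (8, 0)
    return {
        "dtype": "bfloat16" if ampere_or_newer else "float16",
        "attn_implementation": "flash_attention_2" if ampere_or_newer else "sdpa",
        "load_in_4bit": True,  # QLoRA keeps the base model in 4-bit either way
    }


print(precision_settings((7, 5)))  # T4 is compute capability 7.5
# {'dtype': 'float16', 'attn_implementation': 'sdpa', 'load_in_4bit': True}
```

On a live machine the tuple can be read with `torch.cuda.get_device_capability()`.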

Label alignment fix

This is the most important fix!

❌ Wrong (training and inference do not match):

# Training without the answer in the conversation
messages = [
    {"role": "user", "content": [...]},
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

✅ Correct (labels aligned):

# Training with the answer included
messages = [
    {"role": "user", "content": [...]},
    {"role": "assistant", "content": [{"type": "text", "text": "a"}]}  # the answer!
]
text = processor.apply_chat_template(messages, add_generation_prompt=False)  # False!
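
Once the answer is part of the training text, a common follow-up is to supervise only the answer tokens by masking the prompt in the labels. The sketch below shows that idea with made-up token IDs; `build_labels` is a hypothetical helper, and whether the notebook masks the prompt this way is an assumption.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_labels(prompt_ids, full_ids):
    """Mask the prompt so only the assistant answer tokens are supervised."""
    n_prompt = len(prompt_ids)
    if list(full_ids[:n_prompt]) != list(prompt_ids):
        raise ValueError("full_ids must start with prompt_ids")
    return [IGNORE_INDEX] * n_prompt + list(full_ids[n_prompt:])


# Made-up IDs: four prompt tokens, then the answer token and an end token.
prompt_ids = [101, 7, 8, 9]
full_ids = [101, 7, 8, 9, 42, 2]
print(build_labels(prompt_ids, full_ids))
# [-100, -100, -100, -100, 42, 2]
```

Here `prompt_ids` would come from tokenizing the conversation up to the start of the assistant turn, and `full_ids` from the text that includes the answer.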

Reproducibility

  • Seed fixed to 42
  • torch.backends.cudnn.deterministic = True
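
A seeding helper along these lines is a common way to apply both settings. This is a sketch, not the notebook's code: the name `seed_everything` is hypothetical, and `cudnn.benchmark = False` is an extra companion setting that the list above does not mention.

```python
import os
import random


def seed_everything(seed=42):
    """Seed Python, and NumPy / PyTorch when they are installed."""
    random.seed(seed)
    # Only affects Python subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False  # assumption: not listed above
    except ImportError:
        pass


seed_everything(42)
```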

Memory management

  • Gradient Checkpointing enabled
  • Batch Size 1 + Gradient Accumulation 4
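
Batch size 1 with 4 accumulation steps behaves like an effective batch size of 4: gradients from four micro-batches are averaged before a single optimizer update. The toy sketch below uses plain numbers in place of gradient tensors; `accumulate` is a hypothetical name.

```python
def accumulate(micro_batch_grads, accum_steps=4):
    """Average gradients over `accum_steps` micro-batches before each update."""
    updates, running = [], 0.0
    for i, grad in enumerate(micro_batch_grads, start=1):
        running += grad / accum_steps  # scale so the sum equals the mean
        if i % accum_steps == 0:
            updates.append(running)    # stands in for optimizer.step()
            running = 0.0              # stands in for optimizer.zero_grad()
    return updates


BATCH_SIZE, GRAD_ACCUM_STEPS = 1, 4
print("effective batch size:", BATCH_SIZE * GRAD_ACCUM_STEPS)  # 4
print(accumulate([1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0]))    # [2.5, 4.0]
```

Eight micro-batches produce only two updates, so peak memory stays at the batch-size-1 level while each update sees four samples.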

📌 FAQ

Q1: I get an OOM (Out of Memory) error

A: Try the following:

  • Reduce BATCH_SIZE to 1
  • Reduce IMAGE_SIZE to 384
  • Switch MODEL_ID to the 3B model
  • Set USE_EMA = False and USE_SWA = False

Q2: Training is too slow

A:

  • Run a quick test with USE_SAMPLE = True and SAMPLE_SIZE = 200
  • Reduce NUM_EPOCHS to 1
  • Set USE_KFOLD = False to train a single model

Q3: Accuracy is low

A:

  • Increase NUM_EPOCHS (3~5)
  • Switch MODEL_ID to the 7B model
  • Increase IMAGE_SIZE (512, 768)
  • Set USE_KFOLD = True to ensemble
  • Set USE_EMA = True and USE_TTA = True

Q4: There is no scripts/ folder

A: All code is integrated into the Kaggle_AllInOne_Pro.ipynb notebook. No separate script files are needed.

📊 Changes (compared with the previous version)

✅ Consolidated

  • ❌ scripts/ folder → ✅ merged into the notebook
  • ❌ config/ folder → ✅ merged into the Config class
  • ❌ notebooks/VQA_Training_Complete.ipynb → ✅ replaced by Kaggle_AllInOne_Pro.ipynb

✅ Added

  • ✅ EMA (Exponential Moving Average)
  • ✅ SWA (Stochastic Weight Averaging)
  • ✅ Cosine Warmup Scheduler
  • ✅ TTA (Test-Time Augmentation)
  • ✅ Unified Config management
  • ✅ Automatic EDA & visualization

✅ Kept

  • ✅ T4 compatibility (Float16, SDPA)
  • ✅ Label alignment fix
  • ✅ Stratified K-Fold
  • ✅ QLoRA (4-bit)
  • ✅ Gradient Checkpointing

🎯 Next Steps

  1. Experiment tracking: save experiment logs in the experiments/ folder
  2. Hyperparameter optimization: use Optuna or a similar tool
  3. Ensemble improvements: Weighted Voting, Stacking
  4. Data augmentation: Choice Shuffle, Paraphrase
  5. Error analysis: analyze the samples the model gets wrong
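
As an illustration of the Choice Shuffle idea in step 4, the sketch below permutes the answer options of one multiple-choice sample and moves the correct letter along with its text. The dictionary layout, the option texts, and the name `shuffle_choices` are assumptions; the actual column layout of train.csv is not described here.

```python
import random


def shuffle_choices(choices, answer, rng):
    """Permute the options and return them with the relocated answer letter."""
    letters = sorted(choices)              # e.g. ["a", "b", "c", "d"]
    order = list(range(len(letters)))      # order[new_pos] = old_pos
    rng.shuffle(order)
    new_choices = {
        letters[new]: choices[letters[old]] for new, old in enumerate(order)
    }
    # Find where the originally correct option ended up.
    new_answer = letters[order.index(letters.index(answer))]
    return new_choices, new_answer


rng = random.Random(42)
choices = {"a": "cat", "b": "dog", "c": "bird", "d": "fish"}
new_choices, new_answer = shuffle_choices(choices, "b", rng)
print(new_choices, new_answer)
```

Whatever the permutation, `new_choices[new_answer]` is still "dog", so the sample stays correctly labeled while the answer position varies.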

📧 Contact

  • GitHub Issues: project-related questions

🤖 SSAFY AI Project 2025

⭐ Good luck!
