
Charting Student Math Misunderstandings (MAP)

🥈 Silver Medal (Top ~5%), Rank 53 / 1858

Public LB 0.945 Private LB 0.946

[Competition Page]

1. Problem Overview

Multi-class classification: predict the most likely Category:Misconception label (65 classes) from a math question, the student’s chosen answer, and written explanation.
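The evaluation metric is assumed here to be MAP@3, consistent with the Top-3 submission format used later in this writeup. A minimal sketch of how it scores a prediction list:

```python
def map_at_3(y_true, y_pred_top3):
    """Mean Average Precision @ 3: the correct label at position k
    (0-indexed) contributes 1 / (k + 1); a miss contributes 0."""
    total = 0.0
    for truth, preds in zip(y_true, y_pred_top3):
        for k, p in enumerate(preds[:3]):
            if p == truth:
                total += 1.0 / (k + 1)
                break
    return total / len(y_true)

# Correct at rank 1, correct at rank 2, and a miss: (1 + 0.5 + 0) / 3
print(map_at_3(
    ["A", "B", "C"],
    [["A", "X", "Y"], ["X", "B", "Y"], ["X", "Y", "Z"]],
))  # → 0.5
```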

2. Overall Approach

We adopted a multi-LLM strategy: LoRA fine-tuning of four instruction-tuned models, combined via a weighted ensemble, to exploit their complementary linguistic, logical, and mathematical strengths.

| Model | Size | Fine-tune | Role |
| --- | --- | --- | --- |
| Gemma2-9B-IT | 9B | LoRA-CV945 | Main baseline |
| Qwen3-8B | 8B | LoRA-MAP | Semantic coverage |
| DeepSeek-Math-7B | 7B | LoRA-MAP | Math logic understanding |
| Hunyuan-7B | 7B | LoRA-MAP | Stable reasoning generalization |

3. Data Processing

**Label construction**

```python
from sklearn.preprocessing import LabelEncoder

# Concatenate Category and Misconception into a single 65-class target
train["target"] = train["Category"] + ":" + train["Misconception"]
train["label"] = LabelEncoder().fit_transform(train["target"])
```

**Correct-answer extraction**

```python
# Rows whose Category starts with "True" mark the correct choice;
# keep one representative per (QuestionId, MC_Answer) pair
idx = train.apply(lambda r: r.Category.split("_")[0], axis=1) == "True"
correct = train.loc[idx].groupby(["QuestionId", "MC_Answer"]).head(1)
```

**Input template**

```text
Question: {QuestionText}
Answer: {MC_Answer}
Correct? {Yes/No}
Student Explanation: {StudentExplanation}
```
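A minimal sketch of rendering one training row into the template above (the `build_prompt` helper and the precomputed `is_correct` flag are assumptions for illustration, not the exact competition code):

```python
def build_prompt(row: dict) -> str:
    """Render one row into the four-line input template. `row` is assumed
    to carry the competition columns plus an `is_correct` boolean derived
    from the correct-answer lookup."""
    return (
        f"Question: {row['QuestionText']}\n"
        f"Answer: {row['MC_Answer']}\n"
        f"Correct? {'Yes' if row['is_correct'] else 'No'}\n"
        f"Student Explanation: {row['StudentExplanation']}"
    )

row = {
    "QuestionText": "What is 1/2 + 1/4?",
    "MC_Answer": "2/6",
    "is_correct": False,
    "StudentExplanation": "I added the tops and the bottoms.",
}
print(build_prompt(row))
```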

4. Model Fine-Tuning

| Model | LoRA Rank | LR | Batch | CV |
| --- | --- | --- | --- | --- |
| Gemma2-9B-IT | 16 | 2e-4 | 8 | 0.945 |
| Qwen3-8B | 16 | 2e-4 | 8 | 0.944 |
| DeepSeek-Math-7B | 16 | 2e-4 | 8 | 0.944 |
| Hunyuan-7B | 16 | 2e-4 | 8 | 0.943 |
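For reference, the rank in the table refers to the LoRA update h = Wx + (α/r)·B·A·x, where only the low-rank factors A (r×d) and B (d_out×r) are trained while W stays frozen. A toy pure-Python illustration (dimensions and values are illustrative only, not the training code):

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x): only A and B carry gradients."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy sizes: d = 2, r = 1, so A is 1x2 and B is 2x1
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]       # r x d
B = [[0.5], [0.5]]     # d_out x r
print(lora_forward(W, A, B, [2.0, 3.0], alpha=1.0, r=1))  # → [4.5, 5.5]
```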

5. Inference Pipeline

Each model produces per-row class probabilities; intermediate outputs follow the schema:

```text
row_id, top_classes, prob_0, prob_1, ..., prob_24
```
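A sketch of deriving the ranked `top_classes` entry from one row's probability columns (the function name and output convention are assumptions, not the actual pipeline code):

```python
def top3_classes(probs):
    """Return the indices of the three highest probabilities, descending."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:3]

probs = [0.05, 0.60, 0.10, 0.25]
print(top3_classes(probs))  # → [1, 3, 2]
```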

6. Ensemble Strategy

```python
final_score = 0.6 * weighted_mean_prob \
            + 0.3 * agreement_bonus \
            + 0.1 * confidence_bonus
```

Outputs from the four models are combined with this score, and the Top-3 classes per row form the final prediction.
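A minimal sketch of the scoring formula above, assuming `agreement_bonus` is the fraction of models whose argmax picks the class and `confidence_bonus` is the maximum per-model probability for it (both definitions are assumptions; the writeup does not specify them):

```python
def ensemble_scores(model_probs, model_weights):
    """model_probs: one probability vector per model, all the same length.
    Returns a combined score per class:
      0.6 * weighted mean prob + 0.3 * argmax agreement + 0.1 * max prob."""
    n_classes = len(model_probs[0])
    argmaxes = [max(range(n_classes), key=p.__getitem__) for p in model_probs]
    total_w = sum(model_weights)
    scores = []
    for c in range(n_classes):
        weighted_mean = sum(w * p[c] for w, p in zip(model_weights, model_probs)) / total_w
        agreement = sum(a == c for a in argmaxes) / len(model_probs)
        confidence = max(p[c] for p in model_probs)
        scores.append(0.6 * weighted_mean + 0.3 * agreement + 0.1 * confidence)
    return scores

# Two toy models over 3 classes; both favor class 0
scores = ensemble_scores([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]], [1.0, 1.0])
print(max(range(3), key=scores.__getitem__))  # → 0
```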

7. Results Summary

| Model | CV | Public | Private |
| --- | --- | --- | --- |
| DeepSeek-Math-7B | 0.944 | 0.942 | 0.942 |
| Qwen3-8B | 0.945 | 0.944 | 0.945 |
| Gemma2-9B | 0.942 | 0.943 | 0.944 |
| Hunyuan-7B | 0.943 | 0.943 | 0.943 |
| Ensemble (final) | 0.948 | 0.945 | 0.946 🥈 |

8. Environment & Reproducibility

9. Repository Structure

```text
├── gemma2_inference.py
├── qwen3_deepseek_inference.py
├── hunyuan_inference.py
├── ensemble.py
└── submission.csv
```