🥈 Silver Medal (Top 5%), Rank 53 / 1858
Public LB: 0.945 · Private LB: 0.946

The task is multi-class classification: given a math multiple-choice question, the student's chosen answer, and their written explanation, predict the error type the student exhibits — one of 65 `Category:Misconception` labels.

Our solution combines LoRA fine-tuning of multiple LLMs with a weighted ensemble: four instruction-tuned models whose linguistic, logical, and mathematical strengths complement one another.
| Model | Size | Fine-tuning | Strength |
|---|---|---|---|
| Gemma2-9B-IT | 9B | LoRA-CV945 | English comprehension and semantic robustness |
| Qwen3-8B | 8B | LoRA-MAP | Multilingual adaptability |
| DeepSeek-Math-7B | 7B | LoRA-MAP | Mathematical reasoning |
| Hunyuan-7B-Instruct | 7B | LoRA-MAP | Instruction generalization and stability |
train["target"] = train["Category"] + ":" + train["Misconception"]
train["label"] = LabelEncoder().fit_transform(train["target"])
idx = train.apply(lambda row: row.Category.split("_")[0], axis=1) == "True"
correct = train.loc[idx].groupby(["QuestionId","MC_Answer"]).head(1)
```
Question: {QuestionText}
Answer: {MC_Answer}
Correct? {Yes/No}
Student Explanation: {StudentExplanation}
```
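The `Correct?` slot is filled from the `correct` lookup built above. Below is a minimal sketch of the prompt assembly; the `is_correct` flag and `build_prompt` helper are illustrative, not the authors' exact code (the column names come from the template and snippet above):

```python
# Flag each row's chosen answer as correct/incorrect via the lookup table.
correct_keys = set(zip(correct["QuestionId"], correct["MC_Answer"]))
train["is_correct"] = [
    (q, a) in correct_keys for q, a in zip(train["QuestionId"], train["MC_Answer"])
]

def build_prompt(row) -> str:
    """Fill the prompt template from one dataframe row."""
    return (
        f"Question: {row['QuestionText']}\n"
        f"Answer: {row['MC_Answer']}\n"
        f"Correct? {'Yes' if row['is_correct'] else 'No'}\n"
        f"Student Explanation: {row['StudentExplanation']}"
    )

train["prompt"] = train.apply(build_prompt, axis=1)
```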
| Model | LoRA Rank | LR | Batch | CV | Notes |
|---|---|---|---|---|---|
| Gemma2-9B-IT | 16 | 2e-4 | 8 | 0.945 | Main model |
| Qwen3-8B | 16 | 2e-4 | 8 | 0.944 | Semantic breadth |
| DeepSeek-Math-7B | 16 | 2e-4 | 8 | 0.944 | Mathematical logic |
| Hunyuan-7B | 16 | 2e-4 | 8 | 0.943 | Stability |
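A minimal sketch of the shared fine-tuning setup with Hugging Face `peft`, assuming a sequence-classification head over the 65 labels. Only the LoRA rank (16), learning rate (2e-4), and batch size (8) come from the table; `lora_alpha`, dropout, and target modules are guesses:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

lora_config = LoraConfig(
    r=16,                                  # LoRA rank (from the table)
    lora_alpha=32,                         # assumed; not reported
    lora_dropout=0.05,                     # assumed; not reported
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="SEQ_CLS",
)

# Repeated for each of the four base models; trained with lr=2e-4, batch 8.
model = AutoModelForSequenceClassification.from_pretrained(
    "google/gemma-2-9b-it",
    num_labels=65,                         # one logit per Category:Misconception
)
model = get_peft_model(model, lora_config)
```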
Each model's inference script writes its predictions as:

```
row_id, top_classes, prob_0, prob_1, ..., prob_24
```

The ensemble then scores each class as:

```python
final_score = 0.6 * weighted_mean_prob \
            + 0.3 * agreement_bonus \
            + 0.1 * confidence_bonus
```
The four models' outputs are merged, and the Top-3 classes form the final prediction, as sketched below.
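The writeup specifies only the 0.6/0.3/0.1 mix, so the sketch below fills in plausible definitions: `agreement_bonus` as the fraction of models voting for a class, `confidence_bonus` as the strongest single-model probability, and per-model weights that are likewise assumed:

```python
import numpy as np

def ensemble_scores(probs, weights):
    """Combine per-model class probabilities, each of shape [n_rows, n_classes]."""
    stacked = np.stack(probs)                      # [n_models, n_rows, n_classes]
    w = np.asarray(weights)[:, None, None]
    weighted_mean_prob = (w * stacked).sum(axis=0) / w.sum()

    # agreement_bonus: fraction of models whose top-1 prediction is each class.
    n_models, n_rows, n_classes = stacked.shape
    top1 = stacked.argmax(axis=2)                  # [n_models, n_rows]
    agreement_bonus = np.zeros((n_rows, n_classes))
    for m in range(n_models):
        agreement_bonus[np.arange(n_rows), top1[m]] += 1.0 / n_models

    # confidence_bonus: the strongest single-model probability per class.
    confidence_bonus = stacked.max(axis=0)

    return 0.6 * weighted_mean_prob + 0.3 * agreement_bonus + 0.1 * confidence_bonus

# Example with dummy probabilities (4 models, 5 rows, 65 classes); weights assumed.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(65), size=5) for _ in range(4)]
final = ensemble_scores(probs, weights=[0.4, 0.2, 0.2, 0.2])
top3 = np.argsort(-final, axis=1)[:, :3]           # Top-3 classes per row
```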
| Model | CV | Public | Private |
|---|---|---|---|
| DeepSeek-Math-7B | 0.944 | 0.942 | 0.942 |
| Qwen3-8B | 0.945 | 0.944 | 0.945 |
| Gemma2-9B | 0.942 | 0.943 | 0.944 |
| Hunyuan-7B | 0.943 | 0.943 | 0.943 |
| Ensemble (final) | 0.948 | 0.945 | 0.946 🥈 |
```
├── gemma2_inference.py
├── qwen3_deepseek_inference.py
├── hunyuan_inference.py
├── ensemble.py
└── submission.csv
```