
Kitakyushu City Smart Waste Sorting System

Interview Project Introduction

Project Overview

Project Name: Kita – Kitakyushu City Smart Waste Sorting Q&A System
Project Type: RAG (Retrieval-Augmented Generation) Question Answering System
Development Period: GMO Internship Project
Team Size: 3 members (Sota Aoki, Hanyang Yin, Taro Yasuda)
My Role: [Please describe your specific role and responsibilities]

Background and Objectives

Business Challenges

Solution

We developed a smart Q&A system based on a RAG architecture that provides the following through natural language interaction:


System Architecture Design

Technology Stack Selection

  • Frontend Layer
  • Backend Layer
  • Core RAG Engine
  • LLM Inference

Architecture Highlights


User Input → Hybrid Grounding → ChromaDB Search → RAG Prompt → LLM Generation → Response
(Smart Recognition)   (Multi-source Retrieval)                (Streaming / Blocking)

Three-layer Architecture:
  • Presentation Layer (Streamlit): Chat UI, logs display, file upload
  • Business Layer (FastAPI): API routing, RAG orchestration, data validation
  • Data Layer (ChromaDB): Waste rules, area information, user knowledge base

User Interface

Text-based Query Interface

The system provides an intuitive chat interface where users can ask questions in natural language about waste sorting and collection schedules.

Figure: Text-based query interface showing natural language Q&A interaction

Image-based Query Interface

Users can also upload images of items they want to dispose of, and the system will identify the item and provide appropriate disposal instructions.

Figure: Image-based query interface with visual item recognition

Core Technical Innovations

1. Hybrid Grounding System v2.0 (Key Technology)

Problem: Traditional MeCab tokenization lacks accuracy for complex queries.

Solution: A three-layer intelligent recognition system.

Layer 1: Exact Match

  • Full match with database → confidence 1.0 → immediate response
  • Response time < 5 ms
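Layer 1 amounts to a normalized dictionary lookup that short-circuits the rest of the pipeline. A minimal sketch (the table contents and normalization rules here are illustrative assumptions, not the production data):

```python
# Illustrative exact-match table; the real system looks up the ChromaDB
# item database. Normalization (strip + lowercase) is an assumption.
ITEM_TABLE = {
    "laptop": {"category": "bulky waste"},
    "battery": {"category": "hazardous waste"},
}

def exact_match(query: str):
    """Return (entry, 1.0) on a full match, else None to fall through to Layer 2."""
    key = query.strip().lower()
    if key in ITEM_TABLE:
        return ITEM_TABLE[key], 1.0  # confidence 1.0 -> immediate response
    return None
```

Because this is a plain hash lookup, the sub-5 ms response time follows directly.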

Layer 2: Smart Routing (Path Selection)

if input_length < 20:
    → Path A (Fast Path): Global embedding search
    → Response time < 300 ms
else:
    → Path A + Path B (Dual Path):
       - Path A: Global semantic search
       - Path B: LLM-assisted phrase extraction + segmented search
    → Response time < 600 ms
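The length-based routing above can be sketched as a small dispatcher (the 20-character threshold comes from the pseudocode; the path names are labels, not real function names):

```python
def route(query: str) -> list[str]:
    """Choose retrieval paths by input length, per the Layer-2 pseudocode."""
    if len(query) < 20:
        return ["path_a"]            # fast path: global embedding search only
    return ["path_a", "path_b"]      # dual path: add LLM-assisted phrase
                                     # extraction + segmented search
```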

Layer 3: Confidence Evaluation

* High (≥ 0.70): Direct adoption

* Medium (0.45–0.70): Presented to user

* Low (< 0.45): Automatic fallback to MeCab
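The three confidence bands can be expressed as one small function (the action labels are illustrative; thresholds match the bands above):

```python
def route_by_confidence(score: float) -> str:
    """Map a grounding confidence score to the Layer-3 action."""
    if score >= 0.70:
        return "adopt"            # high: use the candidate directly
    if score >= 0.45:
        return "ask_user"         # medium: present candidates to the user
    return "mecab_fallback"       # low: fall back to MeCab tokenization
```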

Results

* Accuracy improvement: 78% → 92%

* Average response time: < 400 ms

* Automatic fault tolerance via fallback mechanisms


2. Streaming Response Optimization

Problem: Blocking responses caused long wait times and poor UX.

Solution:

* Implemented Server-Sent Events (SSE) for streaming responses

* Token-by-token output with Time to First Byte (TTFB) < 1s

* Real-time frontend rendering, significantly reducing perceived latency
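The SSE stream can be produced by a generator that frames each token as a `data:` event, ending with a `done` event. A sketch, assuming the frame format shown in the API section below; in the real endpoint the tokens come from the LLM and the generator is wrapped in FastAPI's `StreamingResponse(..., media_type="text/event-stream")`:

```python
import json

def sse_events(tokens):
    """Yield each token as a Server-Sent Event frame, then a 'done' frame."""
    for tok in tokens:
        yield f"data: {json.dumps({'type': 'token', 'content': tok})}\n\n"
    # Final frame carries metadata (references) once generation finishes.
    yield f"data: {json.dumps({'type': 'done', 'references': []})}\n\n"
```

Emitting frames as tokens arrive is what keeps TTFB under a second even when the full answer takes several seconds to generate.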


3. Multi-source Knowledge Base Integration

Three independent ChromaDB collections (waste rules, area information, user knowledge base).

Benefits:

* Higher retrieval accuracy through specialization

* Support for hot updates of knowledge bases

* Easier access control and version management
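The retrieval step queries each specialized collection independently and merges the hits into one labeled context. A pure-Python sketch with stubbed collection contents (real queries go through ChromaDB vector search; the documents here are illustrative):

```python
# Stub data standing in for the three ChromaDB collections.
COLLECTIONS = {
    "gomi": ["Dispose of laptops as bulky waste."],
    "area": ["Yahatahigashi Ward: household waste on Mon/Thu."],
    "knowledge": ["Remove the battery before disposal."],
}

def retrieve(query: str, k: int = 3) -> dict:
    """Return up to k documents per collection for the query.

    A real implementation embeds `query` and runs a vector search
    against each collection separately.
    """
    return {name: docs[:k] for name, docs in COLLECTIONS.items()}

def build_context(results: dict) -> str:
    """Concatenate per-collection hits, labeled by source collection."""
    lines = []
    for name, docs in results.items():
        lines += [f"[{name}] {d}" for d in docs]
    return "\n".join(lines)
```

Keeping the collections separate is what enables per-source `k` values and independent (hot) updates to each knowledge base.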


Implementation Details

RAG Pipeline Walkthrough

# 1. Query understanding
query = "I want to dispose of a laptop"

# 2. Item extraction via Hybrid Grounding
result = hybrid_grounding.extract(query)
# → primary_candidate: "laptop"
# → confidence: "high" (0.98)
# → execution_time: 35ms

# 3. Multi-source retrieval
gomi_docs = chroma_gomi.query(embedding(candidate_item), k=3)
area_docs = chroma_area.query(embedding(town_name), k=2)
knowledge_docs = chroma_knowledge.query(embedding(query), k=2)

# 4. Context construction
context = format_rag_prompt(gomi_docs, area_docs, knowledge_docs)

# 5. LLM generation
response = ollama.generate(
    model="swallow:latest",
    prompt=context + query,
    stream=True  # streaming response
)

Data Structure Design

ChromaDB Collection Schema:
# gomi collection
{
    "id": "gomi_001",
    "document": "Please dispose of laptops as bulky waste...",
    "metadata": {
        "item_name": "laptop",
        "category": "bulky waste",
        "source_file": "gomi_rules.pdf",
        "page": 15
    }
}

# area collection
{
    "id": "area_001",
    "document": "Household waste in Yahatahigashi Ward is collected on Mondays and Thursdays...",
    "metadata": {
        "town": "Yahatahigashi Ward",
        "waste_type": "household waste",
        "collection_days": ["Mon", "Thu"]
    }
}

API Design

Blocking Mode – POST /api/bot/respond
{
  "message": "I want to dispose of a laptop",
  "user_id": "user123",
  "stream": false
}
Streaming Mode – POST /api/bot/respond_stream
data: {"type": "token", "content": "laptop"}
...
data: {"type": "done", "references": [...]}
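On the client side, each `data:` line is parsed as JSON and token frames are accumulated until the `done` frame arrives. A sketch assuming the frame format above (blank lines and non-`data:` lines, e.g. heartbeats, are skipped):

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE 'data: {...}' line into a dict, or None for other lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # ignore comments, heartbeats, and blank separator lines
    return json.loads(line[len("data:"):].strip())

def collect_answer(lines):
    """Accumulate token frames into the full answer, stopping at 'done'."""
    answer = []
    for line in lines:
        event = parse_sse_line(line)
        if event is None:
            continue
        if event["type"] == "token":
            answer.append(event["content"])
        elif event["type"] == "done":
            break
    return "".join(answer)
```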

Performance Metrics

Response Performance

* TTFB (Time to First Byte): 800 ms

* Full Response Time: 3–5 seconds (depending on answer length)

* Hybrid Grounding: 35–600 ms

* ChromaDB Retrieval: 50–150 ms

Accuracy Metrics

* Item Recognition Accuracy: 92% (vs. MeCab 78%)

* Waste Rule Accuracy: 96%

* Area Information Accuracy: 99% (structured data)

System Resources

* VRAM Usage: 6–8 GB (8B model)

* CPU Usage: 20–40%

* Memory Usage: 4–6 GB


Development Process and Challenges

Challenge 1: Low Accuracy in Japanese Item Name Extraction

Issue:

* MeCab tokenization struggles with long sentences and compound nouns

* Example: “使わなくなったノートパソコン” (“a laptop I no longer use”) → extraction failure

Resolution: Replaced MeCab-only extraction with the Hybrid Grounding system described above (LLM-assisted phrase extraction with automatic MeCab fallback).

Result: Accuracy improved from 78% to 92%

Challenge 2: Choppy Streaming Responses

Issue: Frontend rendering lag and token accumulation.

Solution:

* Tuned backend buffer size

* Implemented asynchronous updates using useEffect + useState

* Added heartbeat detection mechanism


Challenge 3: ChromaDB Performance Optimization

Optimizations:

Key Learnings

Technical Growth

Engineering Skills

Business Understanding