<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>RecBERT Recommendation System</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
    <style>
        /* Use Inter font */
        body {
            font-family: 'Inter', sans-serif;
            background-color: #f8fafc; /* Light gray background */
        }
        /* Custom styles for arrows and boxes */
        .arrow {
            position: relative;
            width: 100%;
            height: 2px;
            background-color: #6b7280; /* Gray-500 */
            margin: 1.5rem 0;
        }
        .arrow::after {
            content: '';
            position: absolute;
            right: -1px;
            top: -4px;
            width: 0;
            height: 0;
            border-top: 5px solid transparent;
            border-bottom: 5px solid transparent;
            border-left: 8px solid #6b7280; /* Gray-500 */
        }
        .llm-box {
            border: 2px solid #fbbf24; /* Amber-400 */
            background-color: #fefce8; /* Yellow-50 */
        }
        .transformer-box {
            border: 2px solid #60a5fa; /* Blue-400 */
            background-color: #eff6ff; /* Blue-50 */
        }
        .data-box {
            border: 2px dashed #a78bfa; /* Violet-400 */
            background-color: #f5f3ff; /* Violet-50 */
        }
        .input-box, .output-box {
            border: 2px solid #9ca3af; /* Gray-400 */
            background-color: #ffffff; /* White */
        }
        .process-box {
            border: 2px solid #34d399; /* Emerald-400 */
            background-color: #ecfdf5; /* Emerald-50 */
        }
        .card {
            background-color: white;
            border-radius: 0.75rem; /* rounded-xl */
            box-shadow: 0 4px 6px -1px rgb(0 0 0 / 0.1), 0 2px 4px -2px rgb(0 0 0 / 0.1);
            padding: 1.5rem; /* p-6 */
            margin-bottom: 2rem; /* mb-8 */
        }
        .section-title {
            font-size: 1.5rem; /* text-2xl */
            font-weight: 600; /* font-semibold */
            margin-bottom: 1rem; /* mb-4 */
            color: #1f2937; /* Gray-800 */
        }
        .step-label {
            font-size: 0.875rem; /* text-sm */
            font-weight: 500; /* font-medium */
            color: #4b5563; /* Gray-600 */
            margin-bottom: 0.5rem; /* mb-2 */
            text-align: center;
        }
        .formula {
            font-family: 'Courier New', Courier, monospace;
            background-color: #f3f4f6; /* Gray-100 */
            padding: 0.5rem;
            border-radius: 0.25rem;
            font-size: 0.8rem;
            overflow-x: auto;
            white-space: pre;
        }
    </style>
</head>
<body class="p-4 md:p-8">

    <h1 class="text-3xl md:text-4xl font-bold text-center mb-8 md:mb-12 text-gray-900">
        Visualizing the RecBERT Recommendation System
    </h1>

    <div class="card">
        <h2 class="section-title">1. Training the RecBERT Embedding Model</h2>
        <p class="text-gray-700 mb-6">RecBERT first adapts a base transformer model to the specific domain of user comments and then fine-tunes it to generate meaningful sentence-level embeddings.</p>

        <div class="grid grid-cols-1 md:grid-cols-5 gap-4 items-start">
            <div class="flex flex-col items-center">
                <div class="step-label">Base Model</div>
                <div class="transformer-box p-3 rounded-lg w-full text-center shadow">
                    <span class="font-semibold text-blue-700">RoBERTa</span>
                    <div class="text-xs text-blue-600">(Pre-trained on general text)</div>
                </div>
            </div>

            <div class="flex flex-col items-center justify-start md:mt-6">
                 <div class="arrow w-16 md:w-full"></div>
                 <div class="text-xs text-center text-gray-500 -mt-4">Domain Adaptation (Masked Language Modeling on User Comments)</div>
            </div>

            <div class="flex flex-col items-center">
                <div class="step-label">Domain Adapted</div>
                <div class="transformer-box p-3 rounded-lg w-full text-center shadow">
                     <span class="font-semibold text-blue-700">RoBERTa</span>
                    <div class="text-xs text-blue-600">(Understands comment-specific language)</div>
                </div>
                 <div class="data-box p-2 mt-2 rounded-lg w-full text-center shadow text-xs">
                    <span class="font-semibold text-violet-700">Input Data:</span>
                    <div class="text-violet-600">User Comments Dataset (e.g., MyAnimeList reviews)</div>
                 </div>
            </div>

             <div class="flex flex-col items-center justify-start md:mt-6">
                 <div class="arrow w-16 md:w-full"></div>
                 <div class="text-xs text-center text-gray-500 -mt-4">Fine-tuning (SimCSE + Multiple Negatives Ranking Loss)</div>
            </div>

            <div class="flex flex-col items-center">
                <div class="step-label">Fine-Tuned Model</div>
                <div class="transformer-box p-3 rounded-lg w-full text-center shadow border-emerald-500 bg-emerald-50">
                     <span class="font-semibold text-emerald-700">RecBERT</span>
                    <div class="text-xs text-emerald-600">(Generates Semantic Comment Embeddings)</div>
                </div>
                 <div class="process-box p-2 mt-2 rounded-lg w-full text-center shadow text-xs">
                    <span class="font-semibold text-emerald-700">Method:</span>
                    <div class="text-emerald-600">Siamese Network + SimCSE (Contrastive Learning)</div>
                 </div>
            </div>
        </div>
         <p class="text-sm text-gray-600 mt-8">
            <span class="font-semibold">Benefit:</span> This process creates a model that can accurately represent the semantic meaning of entire user comments as dense vectors (embeddings), tailored to the specific language used in those comments. This is crucial for comparing comments and queries effectively.
        </p>
    </div>

    <div class="card">
        <h2 class="section-title">2. Query Processing &amp; Ranking Retrieval</h2>
        <p class="text-gray-700 mb-6">When a user query arrives, RecBERT uses an LLM to segment it into subqueries, then computes similarity scores through two channels (the full query and the subqueries) to rank the relevant classes (e.g., stories, items).</p>

        <div class="grid grid-cols-1 md:grid-cols-3 gap-6 items-start mb-8">
             <div class="flex flex-col items-center">
                <div class="step-label">User Query (γ)</div>
                <div class="input-box p-3 rounded-lg w-full text-center shadow">
                    "isekai story with strong female lead and magic system"
                </div>
            </div>

             <div class="flex flex-col items-center justify-start md:mt-6">
                 <div class="arrow w-16 md:w-full"></div>
                 <div class="text-xs text-center text-gray-500 -mt-4">LLM Query Segmentation (Few-Shot)</div>
            </div>

             <div class="flex flex-col items-center">
                <div class="step-label">Subqueries (γ1, γ2, γ3)</div>
                <div class="llm-box p-3 rounded-lg w-full text-left text-sm shadow">
                    <ul class="list-disc list-inside">
                        <li>isekai story (γ1)</li>
                        <li>strong female lead (γ2)</li>
                        <li>magic system (γ3)</li>
                    </ul>
                </div>
            </div>
        </div>

        <div class="grid grid-cols-1 md:grid-cols-2 gap-8 border-t border-gray-200 pt-8">
            <div class="process-box p-4 rounded-lg shadow">
                <h3 class="font-semibold text-emerald-800 mb-2 text-center">Channel 1: Full Query Similarity (S1)</h3>
                <div class="flex flex-col items-center space-y-3">
                    <div class="input-box p-2 rounded text-xs w-full text-center">Full Query Embedding e(γ)</div>
                    <div class="text-emerald-600 text-2xl">&darr;</div>
                    <div class="data-box p-2 rounded text-xs w-full text-center">KNN Search vs. All Comment Embeddings e(A)</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                     <div class="output-box p-2 rounded text-xs w-full text-center">Max Similarity per Class</div>
                     <div class="formula mt-2">S1 = max(cos_sim(e(γ), e(A)))</div>
                </div>
            </div>

            <div class="process-box p-4 rounded-lg shadow">
                 <h3 class="font-semibold text-emerald-800 mb-2 text-center">Channel 2: Subquery Similarity (S2)</h3>
                 <div class="flex flex-col items-center space-y-3">
                    <div class="input-box p-2 rounded text-xs w-full text-center">Subquery Embeddings e(γ1), e(γ2), ...</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                    <div class="data-box p-2 rounded text-xs w-full text-center">KNN Search per Subquery vs. All Comment Embeddings e(B)</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                     <div class="output-box p-2 rounded text-xs w-full text-center">Avg. of Max Similarities per Class (s2)</div>
                     <div class="formula mt-2">s2 = avg(max(cos_sim(e(γi), e(B))))</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                     <div class="output-box p-2 rounded text-xs w-full text-center">Adjusted Similarity (S2)</div>
                     <div class="formula mt-2">S2 = clamp(tanh⁻¹(s2), max=1)</div>
                 </div>
            </div>
        </div>
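
        <p class="text-sm text-gray-600 mt-8">
            <span class="font-semibold">Worked example (illustrative):</span> The Channel 2 steps above can be traced by hand for a single class. The three per-subquery maxima below are made-up values chosen for easy arithmetic, not results from RecBERT.
        </p>
        <div class="formula mt-2">max cos_sim for γ1, γ2, γ3 = 0.70, 0.50, 0.60   (hypothetical)
s2 = (0.70 + 0.50 + 0.60) / 3 = 0.60
S2 = clamp(tanh⁻¹(0.60), max=1) ≈ clamp(0.693, max=1) = 0.693</div>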

        <div class="mt-12 pt-8 border-t border-gray-200 text-center">
             <h3 class="text-lg font-semibold mb-3 text-gray-800">Final Class Ranking</h3>
             <div class="flex flex-col items-center">
                 <div class="flex items-center gap-4 mb-4">
                     <div class="output-box p-3 rounded-lg shadow text-sm">Similarity S1</div>
                     <div class="text-2xl font-bold text-gray-700">&amp;</div>
                     <div class="output-box p-3 rounded-lg shadow text-sm">Similarity S2</div>
                 </div>
                 <div class="text-emerald-600 text-2xl mb-2">&darr;</div>
                 <div class="process-box p-4 rounded-lg shadow inline-block">
                     <div class="font-semibold text-emerald-800">Final Score (S) per Class</div>
                     <div class="formula mt-2">S = max(S1, S2)</div>
                 </div>
                  <div class="text-emerald-600 text-2xl mt-2">&darr;</div>
                  <div class="output-box p-3 rounded-lg shadow text-sm font-medium">Ranked List of Classes</div>
             </div>
        </div>

        <p class="text-sm text-gray-600 mt-8">
            <span class="font-semibold">Benefit:</span> Query segmentation lets RecBERT match different facets of a complex query that may be discussed in separate comments within the same class. Taking the maximum of the full-query and subquery similarities captures both direct matches and composite relevance. Because tanh⁻¹(x) ≥ x for 0 ≤ x &lt; 1, the <code>tanh⁻¹</code> adjustment non-linearly boosts the averaged subquery score, most strongly when several subqueries match well within a class.
        </p>
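
        <p class="text-sm text-gray-600 mt-4">
            <span class="font-semibold">Worked example (illustrative):</span> A sketch of the final scoring for three hypothetical classes, X, Y, and Z, using made-up similarity values. Class Y has a lower full-query similarity than X but outranks it because its adjusted subquery similarity is higher, and Class Z shows the clamp taking effect.
        </p>
        <div class="formula mt-2">Class X: S1 = 0.65, s2 = 0.40  →  S2 = tanh⁻¹(0.40) ≈ 0.424               →  S = max(0.65, 0.424) = 0.65
Class Y: S1 = 0.55, s2 = 0.60  →  S2 = tanh⁻¹(0.60) ≈ 0.693               →  S = max(0.55, 0.693) ≈ 0.693
Class Z: S1 = 0.50, s2 = 0.80  →  tanh⁻¹(0.80) ≈ 1.099, clamped to S2 = 1  →  S = max(0.50, 1) = 1
Ranking: Z (1), Y (0.693), X (0.65)</div>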
    </div>
</body>
</html>