<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>RecBERT Recommendation System</title>
<script src="https://cdn.tailwindcss.com"></script>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
<style>
/* Use Inter font */
body {
font-family: 'Inter', sans-serif;
background-color: #f8fafc; /* Light gray background */
}
/* Custom styles for arrows and boxes */
.arrow {
position: relative;
width: 100%;
height: 2px;
background-color: #6b7280; /* Gray-500 */
margin: 1.5rem 0;
}
.arrow::after {
content: '';
position: absolute;
right: -1px;
top: -4px;
width: 0;
height: 0;
border-top: 5px solid transparent;
border-bottom: 5px solid transparent;
border-left: 8px solid #6b7280; /* Gray-500 */
}
.llm-box {
border: 2px solid #fbbf24; /* Amber-400 */
background-color: #fefce8; /* Yellow-50 */
}
.transformer-box {
border: 2px solid #60a5fa; /* Blue-400 */
background-color: #eff6ff; /* Blue-50 */
}
.data-box {
border: 2px dashed #a78bfa; /* Violet-400 */
background-color: #f5f3ff; /* Violet-50 */
}
.input-box, .output-box {
border: 2px solid #9ca3af; /* Gray-400 */
background-color: #ffffff; /* White */
}
.process-box {
border: 2px solid #34d399; /* Emerald-400 */
background-color: #ecfdf5; /* Emerald-50 */
}
.card {
background-color: white;
border-radius: 0.75rem; /* rounded-xl */
box-shadow: 0 4px 6px -1px rgb(0 0 0 / 0.1), 0 2px 4px -2px rgb(0 0 0 / 0.1);
padding: 1.5rem; /* p-6 */
margin-bottom: 2rem; /* mb-8 */
}
.section-title {
font-size: 1.5rem; /* text-2xl */
font-weight: 600; /* font-semibold */
margin-bottom: 1rem; /* mb-4 */
color: #1f2937; /* Gray-800 */
}
.step-label {
font-size: 0.875rem; /* text-sm */
font-weight: 500; /* font-medium */
color: #4b5563; /* Gray-600 */
margin-bottom: 0.5rem; /* mb-2 */
text-align: center;
}
.formula {
font-family: 'Courier New', Courier, monospace;
background-color: #f3f4f6; /* Gray-100 */
padding: 0.5rem;
border-radius: 0.25rem;
font-size: 0.8rem;
overflow-x: auto;
white-space: pre;
}
</style>
</head>
<body class="p-4 md:p-8">
<h1 class="text-3xl md:text-4xl font-bold text-center mb-8 md:mb-12 text-gray-900">
Visualizing the RecBERT Recommendation System
</h1>
<div class="card">
<h2 class="section-title">1. Training the RecBERT Embedding Model</h2>
<p class="text-gray-700 mb-6">RecBERT first adapts a pre-trained RoBERTa model to the language of user comments with masked language modeling (MLM). It then fine-tunes the model with contrastive learning so that it produces sentence-level embeddings of whole comments.</p>
<div class="grid grid-cols-1 md:grid-cols-5 gap-4 items-start">
<div class="flex flex-col items-center">
<div class="step-label">Base Model</div>
<div class="transformer-box p-3 rounded-lg w-full text-center shadow">
<span class="font-semibold text-blue-700">RoBERTa</span>
<div class="text-xs text-blue-600">(Pre-trained on general text)</div>
</div>
</div>
<div class="flex flex-col items-center justify-start md:mt-6">
<div class="arrow w-16 md:w-full"></div>
<div class="text-xs text-center text-gray-500 -mt-4">Domain Adaptation (MLM on User Comments)</div>
</div>
<div class="flex flex-col items-center">
<div class="step-label">Domain Adapted</div>
<div class="transformer-box p-3 rounded-lg w-full text-center shadow">
<span class="font-semibold text-blue-700">RoBERTa</span>
<div class="text-xs text-blue-600">(Understands comment-specific language)</div>
</div>
<div class="data-box p-2 mt-2 rounded-lg w-full text-center shadow text-xs">
<span class="font-semibold text-violet-700">Input Data:</span>
<div class="text-violet-600">User Comments Dataset (e.g., MyAnimeList reviews)</div>
</div>
</div>
<div class="flex flex-col items-center justify-start md:mt-6">
<div class="arrow w-16 md:w-full"></div>
<div class="text-xs text-center text-gray-500 -mt-4">Fine-tuning (SimCSE + Multiple Negatives Ranking Loss)</div>
</div>
<div class="flex flex-col items-center">
<div class="step-label">Fine-Tuned Model</div>
<div class="transformer-box p-3 rounded-lg w-full text-center shadow border-emerald-500 bg-emerald-50">
<span class="font-semibold text-emerald-700">RecBERT</span>
<div class="text-xs text-emerald-600">(Generates Semantic Comment Embeddings)</div>
</div>
<div class="process-box p-2 mt-2 rounded-lg w-full text-center shadow text-xs">
<span class="font-semibold text-emerald-700">Method:</span>
<div class="text-emerald-600">Siamese Network + SimCSE (Contrastive Learning)</div>
</div>
</div>
</div>
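<div class="mt-6">
<p class="text-sm text-gray-600 mb-2">Domain adaptation continues RoBERTa's masked-language-modeling objective on the comment corpus. The sketch below shows only the input-corruption step and assumes the standard BERT/RoBERTa recipe. About 15% of tokens are selected as prediction targets. Of those, 80% are replaced with the mask token, 10% with a random token, and 10% are left unchanged. <code>MASK_ID</code>, <code>VOCAB_SIZE</code>, and <code>mask_tokens</code> are hypothetical placeholders, not RecBERT's code. In practice, a library data collator would usually handle this step.</p>
<pre class="formula"><code>
```python
import random

MASK_ID = 4          # hypothetical ID of the tokenizer's mask token
VOCAB_SIZE = 1000    # hypothetical vocabulary size
IGNORE = -100        # label a cross-entropy loss can ignore (PyTorch's default ignore_index)


def mask_tokens(token_ids, mask_prob=0.15, rng=random):
    """BERT/RoBERTa-style MLM corruption. Returns (inputs, labels)."""
    inputs = list(token_ids)
    labels = [IGNORE] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                  # position is not a prediction target
        labels[i] = tok               # the model must recover the original token here
        roll = rng.random()
        if roll >= 0.2:               # 80%: replace with the mask token
            inputs[i] = MASK_ID
        elif roll >= 0.1:             # 10%: replace with a random token
            inputs[i] = rng.randrange(VOCAB_SIZE)
        # remaining 10%: keep the original token as input
    return inputs, labels


# Usage on a stand-in for one tokenized comment.
ids = list(range(10, 110))
inputs, labels = mask_tokens(ids, rng=random.Random(0))
selected = sum(1 for lab in labels if lab != IGNORE)   # about 15 of 100 on average
```
</code></pre>
</div>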
<p class="text-sm text-gray-600 mt-8">
<span class="font-semibold">Benefit:</span> The resulting model maps each entire user comment to a dense vector (embedding) that captures its meaning as expressed in the vocabulary of the comment domain. As a result, comments and queries can be compared directly by cosine similarity.
</p>
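<div class="mt-6">
<p class="text-sm text-gray-600 mb-2">The fine-tuning stage can be illustrated with a minimal pure-Python sketch of the Multiple Negatives Ranking (MNR) loss. In this loss, each anchor must pick out its own positive from all positives in the batch. The sketch assumes the unsupervised SimCSE setup, in which the two views of a comment come from encoding it twice with different dropout masks. Here, those views are replaced by fixed toy vectors. It also assumes a similarity scale of 20, the default of the sentence-transformers <code>MultipleNegativesRankingLoss</code>. The function names are illustrative, not RecBERT's code.</p>
<pre class="formula"><code>
```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def mnr_loss(anchors, positives, scale=20.0):
    """Multiple Negatives Ranking loss: for anchor i, positives[i] is the target
    and every other positive in the batch serves as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        m = max(logits)                                   # shift for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]                  # -log softmax(logits)[i]
    return total / len(anchors)


# Stand-ins for two dropout-perturbed encodings of the same three comments.
view1 = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
view2 = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
aligned = mnr_loss(view1, view2)                     # each anchor paired with its own view
shuffled = mnr_loss(view1, view2[1:] + view2[:1])   # positives no longer match anchors
```
</code></pre>
<p class="text-sm text-gray-600 mt-2">With these toy vectors, the aligned pairing gives a loss near zero and the shuffled pairing is penalized heavily. This gap is the training signal that pulls the two views of a comment together and pushes other comments apart.</p>
</div>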
</div>
<div class="card">
<h2 class="section-title">2. Query Processing, Retrieval &amp; Ranking</h2>
<p class="text-gray-700 mb-6">When a user query arrives, RecBERT uses an LLM to split it into subqueries. It then computes similarity scores through two channels (the full query and the subqueries) and combines them to rank classes (e.g., stories or items), each represented by its user comments.</p>
<div class="grid grid-cols-1 md:grid-cols-3 gap-6 items-start mb-8">
<div class="flex flex-col items-center">
<div class="step-label">User Query (γ)</div>
<div class="input-box p-3 rounded-lg w-full text-center shadow">
"isekai story with strong female lead and magic system"
</div>
</div>
<div class="flex flex-col items-center justify-start md:mt-6">
<div class="arrow w-16 md:w-full"></div>
<div class="text-xs text-center text-gray-500 -mt-4">LLM Query Segmentation (Few-Shot)</div>
</div>
<div class="flex flex-col items-center">
<div class="step-label">Subqueries (γ1, γ2, γ3)</div>
<div class="llm-box p-3 rounded-lg w-full text-left text-sm shadow">
<ul class="list-disc list-inside">
<li>isekai story (γ1)</li>
<li>strong female lead (γ2)</li>
<li>magic system (γ3)</li>
</ul>
</div>
</div>
</div>
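<div class="mb-8">
<p class="text-sm text-gray-600 mb-2">The segmentation step can be sketched as two parts: building a few-shot prompt, then parsing the reply into subqueries. The demonstration queries, the prompt wording, and the one-subquery-per-line reply format are assumptions for illustration, not RecBERT's actual prompt. The LLM call itself is omitted, and <code>fake_reply</code> stands in for the model's output.</p>
<pre class="formula"><code>
```python
FEW_SHOT_EXAMPLES = [  # hypothetical demonstrations, not taken from RecBERT
    ("dark fantasy with morally grey characters",
     ["dark fantasy", "morally grey characters"]),
    ("slice of life set in a small town with cooking",
     ["slice of life", "small town setting", "cooking"]),
]


def build_prompt(query):
    """Assemble a few-shot prompt that asks for one subquery per line."""
    parts = ["Split the search query into short, self-contained subqueries, one per line.\n"]
    for example_query, subqueries in FEW_SHOT_EXAMPLES:
        parts.append("Query: " + example_query)
        parts.extend("- " + s for s in subqueries)
        parts.append("")
    parts.append("Query: " + query)
    return "\n".join(parts)


def parse_subqueries(reply):
    """Keep the bulleted lines of the reply, in order, without duplicates."""
    found = []
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("-"):
            item = line.lstrip("- ").strip()
            if item and item not in found:
                found.append(item)
    return found


query = "isekai story with strong female lead and magic system"
prompt = build_prompt(query)                       # would be sent to the LLM
fake_reply = "- isekai story\n- strong female lead\n- magic system"
subqueries = parse_subqueries(fake_reply)
```
</code></pre>
</div>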
<div class="grid grid-cols-1 md:grid-cols-2 gap-8 border-t border-gray-200 pt-8">
<div class="process-box p-4 rounded-lg shadow">
<h3 class="font-semibold text-emerald-800 mb-2 text-center">Channel 1: Full Query Similarity (S1)</h3>
<div class="flex flex-col items-center space-y-3">
<div class="input-box p-2 rounded text-xs w-full text-center">Full Query Embedding e(γ)</div>
<div class="text-emerald-600 text-2xl">↓</div>
<div class="data-box p-2 rounded text-xs w-full text-center">KNN Search vs. All Comment Embeddings e(A)</div>
<div class="text-emerald-600 text-2xl">↓</div>
<div class="output-box p-2 rounded text-xs w-full text-center">Max Similarity per Class</div>
<div class="formula mt-2">S1 = max_{A ∈ class} cos_sim(e(γ), e(A))</div>
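<div class="w-full text-left text-xs">
<p class="text-gray-600 mb-2">A brute-force sketch of this channel, assuming each comment embedding is stored together with the class it belongs to. It performs exact KNN by scoring every comment; a large deployment would more likely use an approximate nearest-neighbor index. It keeps the top <code>k</code> comments and records the best similarity per class. Classes with no comment in the top <code>k</code> get no S1. <code>k</code>, <code>channel1_scores</code>, and the toy vectors are hypothetical.</p>
<pre class="formula"><code>
```python
import math


def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def channel1_scores(query_emb, comments, k=3):
    """comments: list of (class_id, embedding). Returns {class_id: S1}."""
    # Exact KNN: score every comment, keep the k most similar.
    top = sorted(((cos_sim(query_emb, emb), cls) for cls, emb in comments),
                 reverse=True)[:k]
    s1 = {}
    for sim, cls in top:
        s1[cls] = max(sim, s1.get(cls, sim))   # max similarity per class
    return s1


comments = [
    ("A", [0.9, 0.1, 0.0]),
    ("A", [0.2, 0.9, 0.1]),
    ("B", [0.7, 0.7, 0.0]),
    ("C", [0.0, 0.0, 1.0]),
]
s1_scores = channel1_scores([1.0, 0.2, 0.0], comments, k=3)
```
</code></pre>
</div>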
</div>
</div>
<div class="process-box p-4 rounded-lg shadow">
<h3 class="font-semibold text-emerald-800 mb-2 text-center">Channel 2: Subquery Similarity (S2)</h3>
<div class="flex flex-col items-center space-y-3">
<div class="input-box p-2 rounded text-xs w-full text-center">Subquery Embeddings e(γ1), e(γ2), ...</div>
<div class="text-emerald-600 text-2xl">↓</div>
<div class="data-box p-2 rounded text-xs w-full text-center">KNN Search per Subquery vs. All Comment Embeddings e(B)</div>
<div class="text-emerald-600 text-2xl">↓</div>
<div class="output-box p-2 rounded text-xs w-full text-center">Avg. of Max Similarities per Class (s2)</div>
<div class="formula mt-2">s2 = avg_i ( max_{B ∈ class} cos_sim(e(γi), e(B)) )</div>
<div class="text-emerald-600 text-2xl">↓</div>
<div class="output-box p-2 rounded text-xs w-full text-center">Adjusted Similarity (S2)</div>
<div class="formula mt-2">S2 = clamp(tanh⁻¹(s2), max=1)</div>
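<div class="w-full text-left text-xs">
<p class="text-gray-600 mb-2">A matching sketch of this channel, assuming comments are stored as (class, embedding) pairs. For brevity, it finds each class's best-matching comment exhaustively instead of through a per-subquery KNN index. It averages those maxima into s2, then applies tanh⁻¹ capped at 1. <code>channel2_scores</code> and the toy vectors are hypothetical.</p>
<pre class="formula"><code>
```python
import math


def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def channel2_scores(subquery_embs, comments):
    """comments: list of (class_id, embedding). Returns {class_id: S2}."""
    by_class = {}
    for cls, emb in comments:
        by_class.setdefault(cls, []).append(emb)
    scores = {}
    for cls, embs in by_class.items():
        # s2: average over subqueries of the best-matching comment in this class
        s2 = sum(max(cos_sim(q, e) for e in embs) for q in subquery_embs) / len(subquery_embs)
        # S2 = clamp(atanh(s2), max=1). atanh(s2) reaches 1 exactly at s2 = tanh(1),
        # so cap there; this also avoids atanh(1) = infinity. Assumes s2 > -1.
        scores[cls] = 1.0 if s2 >= math.tanh(1.0) else math.atanh(s2)
    return scores


comments = [
    ("A", [1.0, 0.0, 0.0]),   # matches subquery 1
    ("A", [0.0, 1.0, 0.0]),   # matches subquery 2
    ("B", [0.7, 0.7, 0.0]),   # partial match to both
]
s2_scores = channel2_scores([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], comments)
```
</code></pre>
</div>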
</div>
</div>
</div>
<div class="mt-12 pt-8 border-t border-gray-200 text-center">
<h3 class="text-lg font-semibold mb-3 text-gray-800">Final Class Ranking</h3>
<div class="flex flex-col items-center">
<div class="flex items-center gap-4 mb-4">
<div class="output-box p-3 rounded-lg shadow text-sm">Similarity S1</div>
<div class="text-2xl font-bold text-gray-700">&</div>
<div class="output-box p-3 rounded-lg shadow text-sm">Similarity S2</div>
</div>
<div class="text-emerald-600 text-2xl mb-2">↓</div>
<div class="process-box p-4 rounded-lg shadow inline-block">
<div class="font-semibold text-emerald-800">Final Score (S) per Class</div>
<div class="formula mt-2">S = max(S1, S2)</div>
</div>
<div class="text-emerald-600 text-2xl mt-2">↓</div>
<div class="output-box p-3 rounded-lg shadow text-sm font-medium">Ranked List of Classes</div>
</div>
</div>
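<div class="mt-6">
<p class="text-sm text-gray-600 mb-2">The combination step amounts to a per-class maximum followed by a sort. This sketch assumes that a class missing from one channel's results scores 0 in that channel; the diagram does not specify this. <code>rank_classes</code> and the example scores are hypothetical.</p>
<pre class="formula"><code>
```python
def rank_classes(s1, s2, missing=0.0):
    """Final score S = max(S1, S2) per class, sorted from best to worst."""
    classes = set(s1) | set(s2)
    final = {c: max(s1.get(c, missing), s2.get(c, missing)) for c in classes}
    # Sort by score, highest first; ties broken by class id for a stable order.
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_classes({"A": 0.83, "B": 0.91}, {"A": 1.0, "C": 0.88})
# ranking -> [('A', 1.0), ('B', 0.91), ('C', 0.88)]
```
</code></pre>
</div>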
<p class="text-sm text-gray-600 mt-8">
<span class="font-semibold">Benefit:</span> Query segmentation lets RecBERT match different facets of a complex query even when those facets are discussed in separate comments within the same class. Taking the maximum of the full-query and subquery scores ranks a class highly in two cases: when a single comment matches the whole query (S1), or when its comments jointly cover the subqueries (S2). Because tanh⁻¹(x) &gt; x for 0 &lt; x &lt; 1 and rises steeply as x approaches 1, the adjustment boosts s2 most when every subquery finds a strong match within the class. The clamp keeps S2 at most 1.
</p>
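<div class="mt-6">
<p class="text-sm text-gray-600 mb-2">To gauge the size of the boost, the snippet below evaluates the Channel 2 adjustment for a few arbitrary values of s2. For example, s2 = 0.5 becomes about 0.549 and s2 = 0.75 becomes about 0.973. Any s2 of at least tanh(1) ≈ 0.762 is clamped to 1. <code>adjust</code> is a hypothetical helper name.</p>
<pre class="formula"><code>
```python
import math


def adjust(s2):
    """S2 = clamp(atanh(s2), max=1), written so that s2 = 1 does not raise."""
    return 1.0 if s2 >= math.tanh(1.0) else math.atanh(s2)


for s2 in (0.3, 0.5, 0.7, 0.75, 0.8):
    boosted = adjust(s2)
    print(f"s2 = {s2:.2f}  ->  S2 = {boosted:.3f}  (boost {boosted - s2:+.3f})")
```
</code></pre>
</div>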
</div>
</body>
</html>