<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>RecBERT Recommendation System</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
    <style>
        /* Use Inter font */
        body {
            font-family: 'Inter', sans-serif;
            background-color: #f8fafc; /* Light gray background */
        }
        /* Custom styles for arrows and boxes */
        .arrow {
            position: relative;
            width: 100%;
            height: 2px;
            background-color: #6b7280; /* Gray-500 */
            margin: 1.5rem 0;
        }
        .arrow::after {
            content: '';
            position: absolute;
            right: -1px;
            top: -4px;
            width: 0;
            height: 0;
            border-top: 5px solid transparent;
            border-bottom: 5px solid transparent;
            border-left: 8px solid #6b7280; /* Gray-500 */
        }
        .llm-box {
            border: 2px solid #fbbf24; /* Amber-400 */
            background-color: #fefce8; /* Yellow-50 */
        }
        .transformer-box {
            border: 2px solid #60a5fa; /* Blue-400 */
            background-color: #eff6ff; /* Blue-50 */
        }
        .data-box {
            border: 2px dashed #a78bfa; /* Violet-400 */
            background-color: #f5f3ff; /* Violet-50 */
        }
        .input-box, .output-box {
            border: 2px solid #9ca3af; /* Gray-400 */
            background-color: #ffffff; /* White */
        }
        .process-box {
            border: 2px solid #34d399; /* Emerald-400 */
            background-color: #ecfdf5; /* Emerald-50 */
        }
        .card {
            background-color: white;
            border-radius: 0.75rem; /* xl */
            box-shadow: 0 4px 6px -1px rgb(0 0 0 / 0.1), 0 2px 4px -2px rgb(0 0 0 / 0.1);
            padding: 1.5rem; /* p-6 */
            margin-bottom: 2rem; /* mb-8 */
        }
        .section-title {
            font-size: 1.5rem; /* text-2xl */
            font-weight: 600; /* font-semibold */
            margin-bottom: 1rem; /* mb-4 */
            color: #1f2937; /* Gray-800 */
        }
        .step-label {
            font-size: 0.875rem; /* text-sm */
            font-weight: 500; /* font-medium */
            color: #4b5563; /* Gray-600 */
            margin-bottom: 0.5rem; /* mb-2 */
            text-align: center;
        }
        .formula {
            font-family: 'Courier New', Courier, monospace;
            background-color: #f3f4f6; /* Gray-100 */
            padding: 0.5rem;
            border-radius: 0.25rem;
            font-size: 0.8rem;
            overflow-x: auto;
            white-space: pre;
        }
    </style>
</head>
<body class="p-4 md:p-8">

    <h1 class="text-3xl md:text-4xl font-bold text-center mb-8 md:mb-12 text-gray-900">
        Visualizing the RecBERT Recommendation System
    </h1>

    <div class="card">
        <h2 class="section-title">1. Training the RecBERT Embedding Model</h2>
        <p class="text-gray-700 mb-6">RecBERT first adapts a base transformer model to the specific domain of user comments and then fine-tunes it to generate meaningful sentence-level embeddings.</p>

        <div class="grid grid-cols-1 md:grid-cols-5 gap-4 items-start">
            <div class="flex flex-col items-center">
                <div class="step-label">Base Model</div>
                <div class="transformer-box p-3 rounded-lg w-full text-center shadow">
                    <span class="font-semibold text-blue-700">RoBERTa</span>
                    <div class="text-xs text-blue-600">(Pre-trained on general text)</div>
                </div>
            </div>

            <div class="flex flex-col items-center justify-start md:mt-6">
                 <div class="arrow w-16 md:w-full"></div>
                 <div class="text-xs text-center text-gray-500 -mt-4">Domain Adaptation (MLM on User Comments)</div>
            </div>

            <div class="flex flex-col items-center">
                <div class="step-label">Domain Adapted</div>
                <div class="transformer-box p-3 rounded-lg w-full text-center shadow">
                     <span class="font-semibold text-blue-700">RoBERTa</span>
                    <div class="text-xs text-blue-600">(Understands comment-specific language)</div>
                </div>
                 <div class="data-box p-2 mt-2 rounded-lg w-full text-center shadow text-xs">
                    <span class="font-semibold text-violet-700">Input Data:</span>
                    <div class="text-violet-600">User Comments Dataset (e.g., MyAnimeList reviews)</div>
                 </div>
            </div>

             <div class="flex flex-col items-center justify-start md:mt-6">
                 <div class="arrow w-16 md:w-full"></div>
                 <div class="text-xs text-center text-gray-500 -mt-4">Fine-tuning (SimCSE + MNR Loss)</div>
            </div>

            <div class="flex flex-col items-center">
                <div class="step-label">Fine-Tuned Model</div>
                <div class="transformer-box p-3 rounded-lg w-full text-center shadow border-emerald-500 bg-emerald-50">
                     <span class="font-semibold text-emerald-700">RecBERT</span>
                    <div class="text-xs text-emerald-600">(Generates Semantic Comment Embeddings)</div>
                </div>
                 <div class="process-box p-2 mt-2 rounded-lg w-full text-center shadow text-xs">
                    <span class="font-semibold text-emerald-700">Method:</span>
                    <div class="text-emerald-600">Siamese Network + SimCSE (Contrastive Learning)</div>
                 </div>
            </div>
        </div>
         <p class="text-sm text-gray-600 mt-8">
            <span class="font-semibold">Benefit:</span> This process creates a model that can accurately represent the semantic meaning of entire user comments as dense vectors (embeddings), tailored to the specific language used in those comments. This is crucial for comparing comments and queries effectively.
        </p>
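        <p class="text-sm text-gray-600 mt-4">
            As a rough illustration of the contrastive fine-tuning step, the sketch below computes an in-batch multiple negatives ranking (MNR) loss with NumPy. It assumes the two embedding matrices hold paired views of the same comments (in unsupervised SimCSE, the same text encoded twice with different dropout masks). The function name <code>mnr_loss</code>, the <code>scale</code> value of 20, and the random toy data are illustrative assumptions, not RecBERT's actual training code.
        </p>
        <pre class="formula mt-2"><code>import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch contrastive (multiple negatives ranking) loss.

    anchors[i] and positives[i] form a positive pair; every other
    row of `positives` serves as a negative for anchors[i].
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # shape (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i (its own positive).
    return float(-np.mean(np.diag(log_probs)))

# Toy usage with random vectors standing in for comment embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
aligned = mnr_loss(emb, emb + 0.01 * rng.normal(size=(4, 8)))  # pairs match
shuffled = mnr_loss(emb, emb[::-1])                            # pairs mismatched</code></pre>
        <p class="text-sm text-gray-600 mt-2">
            The loss is small when each row is most similar to its own partner and grows when the pairing is broken, which is the signal that pulls matching comment embeddings together during fine-tuning.
        </p>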
    </div>

    <div class="card">
        <h2 class="section-title">2. Query Processing &amp; Ranking Retrieval</h2>
        <p class="text-gray-700 mb-6">When a user query arrives, RecBERT segments it using an LLM and calculates similarity scores through two channels (full query and subqueries) to rank relevant classes (e.g., stories, items).</p>

        <div class="grid grid-cols-1 md:grid-cols-3 gap-6 items-start mb-8">
             <div class="flex flex-col items-center">
                <div class="step-label">User Query (γ)</div>
                <div class="input-box p-3 rounded-lg w-full text-center shadow">
                    "isekai story with strong female lead and magic system"
                </div>
            </div>

             <div class="flex flex-col items-center justify-start md:mt-6">
                 <div class="arrow w-16 md:w-full"></div>
                 <div class="text-xs text-center text-gray-500 -mt-4">LLM Query Segmentation (Few-Shot)</div>
            </div>

             <div class="flex flex-col items-center">
                <div class="step-label">Subqueries (γ1, γ2, γ3)</div>
                <div class="llm-box p-3 rounded-lg w-full text-left text-sm shadow">
                    <ul class="list-disc list-inside">
                        <li>isekai story (γ1)</li>
                        <li>strong female lead (γ2)</li>
                        <li>magic system (γ3)</li>
                    </ul>
                </div>
            </div>
        </div>
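        <p class="text-sm text-gray-600 mb-2">
            The segmentation step above can be sketched as a prompt builder plus a reply parser. Everything here is hypothetical: the instruction wording, the two few-shot examples, and the bulleted reply format are assumptions made for illustration, and the call to the LLM itself is left out.
        </p>
        <pre class="formula mb-8"><code># Hypothetical few-shot examples; not taken from the RecBERT authors.
FEW_SHOT_EXAMPLES = [
    ("dark fantasy with political intrigue and slow romance",
     ["dark fantasy", "political intrigue", "slow romance"]),
    ("sports anime about teamwork",
     ["sports anime", "teamwork"]),
]

def build_segmentation_prompt(query):
    """Assemble a few-shot prompt that asks for one subquery per bullet."""
    lines = ["Split each query into short, self-contained subqueries.", ""]
    for example_query, parts in FEW_SHOT_EXAMPLES:
        lines.append(f"Query: {example_query}")
        lines.extend(f"- {part}" for part in parts)
        lines.append("")
    lines.append(f"Query: {query}")
    return "\n".join(lines)

def parse_subqueries(completion):
    """Keep the bulleted lines of the model's reply as subqueries."""
    return [line[2:].strip()
            for line in completion.splitlines()
            if line.startswith("- ")]

prompt = build_segmentation_prompt(
    "isekai story with strong female lead and magic system")
subqueries = parse_subqueries(
    "- isekai story\n- strong female lead\n- magic system")</code></pre>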

        <div class="grid grid-cols-1 md:grid-cols-2 gap-8 border-t border-gray-200 pt-8">
            <div class="process-box p-4 rounded-lg shadow">
                <h3 class="font-semibold text-emerald-800 mb-2 text-center">Channel 1: Full Query Similarity (S1)</h3>
                <div class="flex flex-col items-center space-y-3">
                    <div class="input-box p-2 rounded text-xs w-full text-center">Full Query Embedding e(γ)</div>
                    <div class="text-emerald-600 text-2xl">&darr;</div>
                    <div class="data-box p-2 rounded text-xs w-full text-center">KNN Search vs. All Comment Embeddings e(A)</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                     <div class="output-box p-2 rounded text-xs w-full text-center">Max Similarity per Class</div>
                     <div class="formula mt-2">S1 = max(cos_sim(e(γ), e(A)))</div>
                </div>
            </div>

            <div class="process-box p-4 rounded-lg shadow">
                 <h3 class="font-semibold text-emerald-800 mb-2 text-center">Channel 2: Subquery Similarity (S2)</h3>
                 <div class="flex flex-col items-center space-y-3">
                    <div class="input-box p-2 rounded text-xs w-full text-center">Subquery Embeddings e(γ1), e(γ2), ...</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                    <div class="data-box p-2 rounded text-xs w-full text-center">KNN Search per Subquery vs. All Comment Embeddings e(B)</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                     <div class="output-box p-2 rounded text-xs w-full text-center">Avg. of Max Similarities per Class (s2)</div>
                     <div class="formula mt-2">s2 = avg(max(cos_sim(e(γi), e(B))))</div>
                     <div class="text-emerald-600 text-2xl">&darr;</div>
                     <div class="output-box p-2 rounded text-xs w-full text-center">Adjusted Similarity (S2)</div>
                     <div class="formula mt-2">S2 = clamp(tanh⁻¹(s2), max=1)</div>
                 </div>
            </div>
        </div>

        <div class="mt-12 pt-8 border-t border-gray-200 text-center">
             <h3 class="text-lg font-semibold mb-3 text-gray-800">Final Class Ranking</h3>
             <div class="flex flex-col items-center">
                 <div class="flex items-center gap-4 mb-4">
                     <div class="output-box p-3 rounded-lg shadow text-sm">Similarity S1</div>
                     <div class="text-2xl font-bold text-gray-700">&amp;</div>
                     <div class="output-box p-3 rounded-lg shadow text-sm">Similarity S2</div>
                 </div>
                 <div class="text-emerald-600 text-2xl mb-2">&darr;</div>
                 <div class="process-box p-4 rounded-lg shadow inline-block">
                     <div class="font-semibold text-emerald-800">Final Score (S) per Class</div>
                     <div class="formula mt-2">S = max(S1, S2)</div>
                 </div>
                  <div class="text-emerald-600 text-2xl mt-2">&darr;</div>
                  <div class="output-box p-3 rounded-lg shadow text-sm font-medium">Ranked List of Classes</div>
             </div>
        </div>

        <p class="text-sm text-gray-600 mt-8">
            <span class="font-semibold">Benefit:</span> Query segmentation allows RecBERT to match different facets of a complex query that may be discussed in separate comments within the same class. Combining the full-query and subquery similarities provides a robust ranking, capturing both direct matches and composite relevance. Because <code>tanh⁻¹(x) ≥ x</code> on [0, 1), the <code>tanh⁻¹</code> adjustment non-linearly boosts the averaged subquery score, most strongly when every subquery finds a close match within a class, and the clamp caps the result at 1.
        </p>
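        <p class="text-sm text-gray-600 mt-4">
            The two channels and the final score can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the system's implementation: it uses a brute-force dot product instead of a KNN index, assumes all embeddings are already L2-normalised, and clips the averaged subquery score just below 1 so that <code>arctanh</code> stays finite before the clamp. The function name <code>rank_classes</code> and the 2-D toy vectors are made up for the example.
        </p>
        <pre class="formula mt-2"><code>import numpy as np

def rank_classes(query_vec, subquery_vecs, comment_vecs, comment_class):
    """Score each class with S = max(S1, S2) and rank classes by S.

    Vectors are assumed L2-normalised, so a dot product is a cosine
    similarity. comment_class[i] is the class of comment_vecs[i].
    """
    comment_class = np.asarray(comment_class)
    full_sims = comment_vecs @ query_vec       # (n_comments,)
    sub_sims = comment_vecs @ subquery_vecs.T  # (n_comments, n_subqueries)
    scores = {}
    for label in np.unique(comment_class):
        mask = comment_class == label
        s1 = full_sims[mask].max()             # Channel 1: best full-query match
        # Channel 2: best match per subquery, averaged over subqueries.
        s2_raw = sub_sims[mask].max(axis=0).mean()
        # Assumption: clip just below 1 so arctanh stays finite.
        s2 = min(np.arctanh(min(s2_raw, 1 - 1e-7)), 1.0)
        scores[str(label)] = float(max(s1, s2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: class A covers each subquery in a separate comment,
# class B has one comment that is close to the full query.
query = np.array([1.0, 1.0]) / np.sqrt(2.0)
subqueries = np.array([[1.0, 0.0], [0.0, 1.0]])
comments = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]])
ranked = rank_classes(query, subqueries, comments, ["A", "A", "B"])</code></pre>
        <p class="text-sm text-gray-600 mt-2">
            In this toy case class A ranks first with a score of 1.0 from the subquery channel, even though no single comment in A matches the full query well, while class B scores about 0.99 from the full-query channel.
        </p>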
    </div>
</body>
</html>