<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>JEPA and Cognitive Architectures</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
    <style>
        @import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');

        body {
            font-family: 'Inter', sans-serif;
            background-color: #f9fafb;
            color: #111827;
        }

        .gradient-header {
            background: linear-gradient(135deg, #4f46e5 0%, #7c3aed 100%);
        }

        .diagram-container {
            background-color: #f3f4f6;
            border-radius: 0.5rem;
            padding: 1.5rem;
            margin: 1.5rem 0;
            border-left: 4px solid #4f46e5;
        }

        .concept-card {
            transition: all 0.3s ease;
            border-radius: 0.5rem;
            box-shadow: 0 1px 3px rgba(0,0,0,0.1);
        }

        .concept-card:hover {
            transform: translateY(-2px);
            box-shadow: 0 10px 15px -3px rgba(0,0,0,0.1);
        }

        .section-divider {
            border-top: 2px dashed #d1d5db;
            margin: 2rem 0;
        }

        .key-point {
            background-color: #eef2ff;
            border-left: 4px solid #4f46e5;
            padding: 1rem;
            margin: 1rem 0;
            border-radius: 0 0.375rem 0.375rem 0;
        }

        code {
            background-color: #f3f4f6;
            padding: 0.2rem 0.4rem;
            border-radius: 0.25rem;
            font-family: 'Courier New', monospace;
            font-size: 0.9em;
            color: #7c3aed;
        }

        .pseudo-code {
            background-color: #1e293b;
            color: #f8fafc;
            padding: 1rem;
            border-radius: 0.5rem;
            font-family: 'Courier New', monospace;
            overflow-x: auto;
            margin: 1.5rem 0;
        }

        .pseudo-code .keyword {
            color: #f472b6;
        }

        .pseudo-code .comment {
            color: #94a3b8;
            font-style: italic;
        }

        .pseudo-code .string {
            color: #86efac;
        }

        .pseudo-code .function {
            color: #60a5fa;
        }
    </style>
</head>
<body class="bg-gray-50">
    <div class="max-w-5xl mx-auto px-4 py-8">

        <header class="gradient-header text-white rounded-xl p-8 mb-8 shadow-lg">
            <div class="flex items-center justify-between">
                <div>
                    <h1 class="text-4xl font-bold mb-2">JEPA and Cognitive Architectures</h1>
                    <p class="text-xl opacity-90">A Comprehensive Introduction to Predictive AI Systems</p>
                </div>
                <div class="bg-white/20 p-4 rounded-lg">
                    <i class="fas fa-brain text-4xl"></i>
                </div>
            </div>
        </header>

        <nav class="bg-white rounded-lg shadow-sm p-4 mb-8 sticky top-4 z-10">
            <ul class="flex flex-wrap gap-4 justify-center">
                <li><a href="#motivation" class="text-indigo-600 hover:text-indigo-800 font-medium">Motivation</a></li>
                <li><a href="#jepa-core" class="text-indigo-600 hover:text-indigo-800 font-medium">JEPA Core</a></li>
                <li><a href="#cognitive-arch" class="text-indigo-600 hover:text-indigo-800 font-medium">Cognitive Architecture</a></li>
                <li><a href="#modules" class="text-indigo-600 hover:text-indigo-800 font-medium">Modules</a></li>
                <li><a href="#examples" class="text-indigo-600 hover:text-indigo-800 font-medium">Examples</a></li>
                <li><a href="#conclusion" class="text-indigo-600 hover:text-indigo-800 font-medium">Conclusion</a></li>
            </ul>
        </nav>

        <main class="space-y-8">

            <section id="motivation" class="bg-white rounded-xl shadow-sm p-6">
                <h2 class="text-2xl font-bold mb-4 text-gray-800 flex items-center">
                    <i class="fas fa-lightbulb text-yellow-500 mr-3"></i>
                    <span>1. Motivation and Background</span>
                </h2>

                <h3 class="text-xl font-semibold mt-6 mb-3 text-gray-700">1.1 The Need for Predictive Representations</h3>
                <p class="text-gray-700 mb-4">
                    Modern AI systems must <span class="font-medium">perceive</span>, <span class="font-medium">reason</span>, and <span class="font-medium">act</span> in complex, dynamic environments. Human intelligence excels not because we memorize every detail, but because we <span class="font-medium">summarize</span>, <span class="font-medium">predict</span>, and <span class="font-medium">plan</span> using abstract representations—ignoring irrelevant noise and focusing on what is useful for future reasoning or action.
                </p>
                <p class="text-gray-700 mb-4">
                    Recent advances in deep learning (e.g., large language models, vision transformers) have shown the power of self-supervised representation learning. However, standard architectures (like autoregressive models) are often forced to model all details, including noise and unpredictability, limiting robustness and sample efficiency.
                </p>

                <h3 class="text-xl font-semibold mt-6 mb-3 text-gray-700">1.2 Enter JEPA: Joint Embedding Predictive Architecture</h3>
                <p class="text-gray-700">
                    Proposed by Yann LeCun and colleagues, <span class="font-medium text-indigo-700">JEPA</span> offers a novel approach:
                </p>
                <ul class="list-disc pl-6 mt-2 space-y-2 text-gray-700">
                    <li><span class="font-medium">Learn representations by predicting only what is predictable</span>—not every detail, but the essential structure that allows for accurate reasoning and planning.</li>
                </ul>

                <div class="key-point mt-6">
                    <p class="font-medium text-gray-800">Key Insight:</p>
                    <p>JEPA focuses on learning the predictable aspects of data while ignoring unpredictable noise, leading to more robust and efficient representations.</p>
                </div>
            </section>

            <section id="jepa-core" class="bg-white rounded-xl shadow-sm p-6">
                <h2 class="text-2xl font-bold mb-4 text-gray-800 flex items-center">
                    <i class="fas fa-puzzle-piece text-blue-500 mr-3"></i>
                    <span>2. JEPA: Core Ideas and Mechanism</span>
                </h2>

                <h3 class="text-xl font-semibold mt-6 mb-3 text-gray-700">2.1 What is JEPA?</h3>
<p class="text-gray-700 mb-4"> |
|
<span class="font-medium text-indigo-700">JEPA (Joint Embedding Predictive Architecture)</span> is a self-supervised learning framework where a model is trained to embed contexts (observed parts) and targets (future or missing parts) into a shared semantic space. |
|
</p> |
                <div class="bg-blue-50 p-4 rounded-lg mb-6">
                    <p class="font-medium text-blue-800">Objective:</p>
                    <ul class="list-disc pl-6 mt-2 space-y-1 text-blue-800">
                        <li>If the context and target belong together (e.g., two halves of the same image, or a sentence and its continuation), their embeddings should be <span class="font-medium">close</span>.</li>
                        <li>If they do not (random combinations), their embeddings should be <span class="font-medium">far apart</span>.</li>
                        <li>This can be implemented with a <span class="font-medium">contrastive loss</span>, although LeCun advocates non-contrastive variants that predict the target embedding directly and rely on regularization (or a gradient-free target encoder) to prevent representational collapse. A minimal sketch of both objectives follows below.</li>
                    </ul>
                </div>
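                <p class="text-gray-700 mb-4">
                    To make the objective concrete, here is a minimal PyTorch-style sketch of both loss variants. It is illustrative only: <code>context_encoder</code>, <code>target_encoder</code>, and <code>predictor</code> are placeholders for small neural networks, not components of any specific published implementation.
                </p>
                <pre class="pseudo-code">
<span class="comment"># Minimal JEPA-style objective (illustrative sketch)</span>
import torch
import torch.nn.functional as F

def jepa_loss(context_encoder, target_encoder, predictor,
              x_context, x_target, contrastive=False):
    z_context = context_encoder(x_context)      <span class="comment"># (batch, dim)</span>
    with torch.no_grad():                        <span class="comment"># target encoder gets no gradient from this loss</span>
        z_target = target_encoder(x_target)      <span class="comment"># (batch, dim)</span>

    z_pred = predictor(z_context)                <span class="comment"># predict the target embedding from the context embedding</span>

    if not contrastive:
        <span class="comment"># Predictive (non-contrastive) objective: regress the target embedding.</span>
        return F.mse_loss(z_pred, z_target)

    <span class="comment"># Contrastive (InfoNCE-style) variant: matching pairs close, mismatched pairs far apart.</span>
    z_pred = F.normalize(z_pred, dim=-1)
    z_target = F.normalize(z_target, dim=-1)
    logits = z_pred @ z_target.t() / 0.1         <span class="comment"># pairwise similarities, temperature 0.1</span>
    labels = torch.arange(len(logits), device=logits.device)
    return F.cross_entropy(logits, labels)
</pre>
                <p class="text-gray-700 mb-4">
                    In the non-contrastive case, something must rule out the trivial solution where every input maps to the same embedding; keeping the target encoder out of the gradient path (for example, updating it as a moving average of the context encoder) is one common way to do this.
                </p>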
                <h3 class="text-xl font-semibold mt-6 mb-3 text-gray-700">2.2 Why Is This Powerful?</h3>
                <div class="grid grid-cols-1 md:grid-cols-3 gap-4 mb-6">
                    <div class="concept-card bg-white p-4 border border-gray-200">
                        <div class="text-purple-600 mb-2">
                            <i class="fas fa-filter text-xl"></i>
                        </div>
                        <h4 class="font-semibold mb-2">Focuses on Structure</h4>
                        <p class="text-sm text-gray-600">Encodes only predictable, meaningful features while ignoring noise</p>
                    </div>
                    <div class="concept-card bg-white p-4 border border-gray-200">
                        <div class="text-green-600 mb-2">
                            <i class="fas fa-shapes text-xl"></i>
                        </div>
                        <h4 class="font-semibold mb-2">Multi-Modal</h4>
                        <p class="text-sm text-gray-600">Works for vision, language, audio, video, and more</p>
                    </div>
                    <div class="concept-card bg-white p-4 border border-gray-200">
                        <div class="text-red-600 mb-2">
                            <i class="fas fa-robot text-xl"></i>
                        </div>
                        <h4 class="font-semibold mb-2">Transferable Features</h4>
                        <p class="text-sm text-gray-600">Learns representations useful for reasoning and planning</p>
                    </div>
                </div>

                <h3 class="text-xl font-semibold mt-6 mb-3 text-gray-700">2.3 The JEPA Training Loop</h3>
                <div class="diagram-container">
                    <div class="flex flex-col items-center">
                        <div class="flex items-center justify-center space-x-8 mb-6">
                            <div class="text-center">
                                <div class="bg-indigo-100 p-3 rounded-lg inline-block">
                                    <i class="fas fa-eye text-indigo-600 text-2xl"></i>
                                </div>
                                <p class="mt-2 font-medium">Context Encoder</p>
                                <p class="text-sm text-gray-600">Takes observed input</p>
                            </div>
                            <div class="text-center">
                                <div class="bg-indigo-100 p-3 rounded-lg inline-block">
                                    <i class="fas fa-project-diagram text-indigo-600 text-2xl"></i>
                                </div>
                                <p class="mt-2 font-medium">Embedding Space</p>
                                <p class="text-sm text-gray-600">Shared representation</p>
                            </div>
                            <div class="text-center">
                                <div class="bg-indigo-100 p-3 rounded-lg inline-block">
                                    <i class="fas fa-bullseye text-indigo-600 text-2xl"></i>
                                </div>
                                <p class="mt-2 font-medium">Target Encoder</p>
                                <p class="text-sm text-gray-600">Takes future/missing part</p>
                            </div>
                        </div>
                        <div class="w-full bg-indigo-50 p-4 rounded-lg">
                            <div class="flex justify-between items-center px-4">
                                <div class="text-center">
                                    <p class="font-medium">Input Context</p>
                                    <p class="text-sm">(e.g., left image half)</p>
                                </div>
<div class="text-center"> |
|
<p class="font-medium">Similarity</p> |
|
<p class="text-sm">Contrastive Loss</p> |
|
</div> |
                                <div class="text-center">
                                    <p class="font-medium">Input Target</p>
                                    <p class="text-sm">(e.g., right image half)</p>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
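                <p class="text-gray-700 mb-4">
                    The diagram above corresponds to a simple training loop. The sketch below is illustrative: <code>make_context_target_pairs</code> is a hypothetical helper (sketched after the examples that follow), <code>jepa_loss</code> is the sketch from Section 2.1, and the exponential-moving-average (EMA) update of the target encoder is one common choice (used, for example, in I-JEPA) rather than the only option.
                </p>
                <pre class="pseudo-code">
<span class="comment"># Sketch of a JEPA training loop (illustrative, not a reference implementation)</span>
for batch in dataloader:
    x_context, x_target = make_context_target_pairs(batch)   <span class="comment"># e.g., image halves or masked blocks</span>

    loss = jepa_loss(context_encoder, target_encoder, predictor,
                     x_context, x_target)                     <span class="comment"># predictive loss in embedding space</span>

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                          <span class="comment"># updates the context encoder and predictor</span>

    <span class="comment"># The target encoder tracks the context encoder via EMA (no gradients), which helps avoid collapse.</span>
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(0.996).add_(p_c, alpha=0.004)
</pre>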
                <h4 class="font-semibold mt-6 mb-2 text-gray-700">Concrete Examples:</h4>
                <div class="grid grid-cols-1 md:grid-cols-2 gap-4">
                    <div class="bg-gray-50 p-4 rounded-lg border border-gray-200">
                        <div class="flex items-center mb-2">
                            <div class="bg-purple-100 p-2 rounded-full mr-3">
                                <i class="fas fa-image text-purple-600"></i>
                            </div>
                            <h5 class="font-medium">Vision Example</h5>
                        </div>
                        <ul class="list-disc pl-6 text-sm text-gray-700">
                            <li>Context: Left half of a cat image</li>
                            <li>Target: Right half</li>
                            <li>Embeddings should be close if they come from the same photo, far otherwise</li>
                        </ul>
                    </div>
                    <div class="bg-gray-50 p-4 rounded-lg border border-gray-200">
                        <div class="flex items-center mb-2">
                            <div class="bg-green-100 p-2 rounded-full mr-3">
                                <i class="fas fa-language text-green-600"></i>
                            </div>
                            <h5 class="font-medium">Language Example</h5>
                        </div>
                        <ul class="list-disc pl-6 text-sm text-gray-700">
                            <li>Context: "The cat sat on the"</li>
                            <li>Target: "mat"</li>
                            <li>Close if the sequence is real, far if target is random</li>
                        </ul>
                    </div>
                </div>
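                <p class="text-gray-700 mt-4 mb-4">
                    Both examples reduce to the same data interface: split each sample into a context part and a target part. A hypothetical <code>make_context_target_pairs</code> for the vision example might look like the sketch below; the half-image split is only the simplest illustrative choice, and practical systems more often mask out blocks or patches instead.
                </p>
                <pre class="pseudo-code">
<span class="comment"># Hypothetical pair construction for the vision example (left half = context, right half = target)</span>
def make_context_target_pairs(images):
    <span class="comment"># images: tensor of shape (batch, channels, height, width)</span>
    width = images.shape[-1]
    x_context = images[..., : width // 2]      <span class="comment"># left half of each image</span>
    x_target = images[..., width // 2 :]       <span class="comment"># right half of each image</span>
    return x_context, x_target
</pre>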
            </section>

            <section id="cognitive-arch" class="bg-white rounded-xl shadow-sm p-6">
                <h2 class="text-2xl font-bold mb-4 text-gray-800 flex items-center">
                    <i class="fas fa-sitemap text-teal-500 mr-3"></i>
                    <span>3. From Representation to Reasoning: JEPA in Cognitive Architectures</span>
                </h2>

                <p class="text-gray-700 mb-4">
                    JEPA shines as a <span class="font-medium">perception module</span> within a larger, <span class="font-medium">modular cognitive agent</span>. This mirrors biological systems: sensory organs and cortex encode perceptions, while higher reasoning and planning are handled by specialized systems.
                </p>

                <h3 class="text-xl font-semibold mt-6 mb-3 text-gray-700">3.1 The Modular Agent</h3>
                <p class="text-gray-700 mb-4">
                    The LeCun-style architecture for an intelligent agent typically includes:
                </p>

                <div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4 mb-6">
                    <div class="concept-card bg-indigo-50 p-4">
                        <div class="flex items-center mb-2">
                            <div class="bg-indigo-100 p-2 rounded-full mr-3">
                                <i class="fas fa-eye text-indigo-600"></i>
                            </div>
                            <h4 class="font-medium">1. Perception Module (JEPA)</h4>
                        </div>
                        <p class="text-sm text-gray-700">Encodes current observation into a compact, predictive embedding</p>
                    </div>
                    <div class="concept-card bg-blue-50 p-4">
                        <div class="flex items-center mb-2">
                            <div class="bg-blue-100 p-2 rounded-full mr-3">
                                <i class="fas fa-memory text-blue-600"></i>
                            </div>
                            <h4 class="font-medium">2. Short-term Memory</h4>
                        </div>
                        <p class="text-sm text-gray-700">Stores recent sequence of embeddings (history)</p>
                    </div>
                    <div class="concept-card bg-purple-50 p-4">
                        <div class="flex items-center mb-2">
                            <div class="bg-purple-100 p-2 rounded-full mr-3">
                                <i class="fas fa-globe text-purple-600"></i>
                            </div>
                            <h4 class="font-medium">3. World Model</h4>
                        </div>
                        <p class="text-sm text-gray-700">Integrates the sequence to produce a latent state</p>
                    </div>
                    <div class="concept-card bg-green-50 p-4">
                        <div class="flex items-center mb-2">
                            <div class="bg-green-100 p-2 rounded-full mr-3">
                                <i class="fas fa-cogs text-green-600"></i>
                            </div>
<h4 class="font-medium">4. Configurator</h4> |
|
</div> |
|
|
|
</html> |