burtenshaw committed
Commit f140a78 · 1 Parent(s): d338da0

build gradio app around scripts
.gitignore CHANGED
@@ -189,4 +189,7 @@ cython_debug/
  *.gif
  *.bmp
  *.tiff
- *.pdf
+ *.pdf
+
+ .DS_Store
+ .vscode
app/.tmp/pdf/presentation.html ADDED
The diff for this file is too large to render. See raw diff
 
app/.url_cache/presentations_cache.db ADDED
Binary file (41 kB)
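This committed binary is the `shelve` store that `app.py` reads and writes (see `URL_CACHE_FILE` below). A minimal sketch for inspecting it locally, assuming you run from the repo root — note that `shelve.open` takes the base name without the `.db` suffix, which the dbm backend adds on disk:

```python
import shelve

# Inspect the committed URL cache; keys are source URLs, values hold the
# generated markdown plus the parsed slide dicts stored by app.py.
with shelve.open("app/.url_cache/presentations_cache") as cache:
    for url, entry in cache.items():
        print(url, "->", len(entry.get("slides_data", [])), "slides")
```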
 
app/app.py ADDED
@@ -0,0 +1,1096 @@
import gradio as gr
import requests
from bs4 import BeautifulSoup
import os
import re
import subprocess
import tempfile
import shutil
from pathlib import Path
import logging
from dotenv import load_dotenv
import shelve

# Import functions from your scripts (assuming they are structured appropriately)
# It's often better to refactor scripts into functions for easier import
try:
    from src.create_presentation import (
        generate_presentation_with_llm,
        DEFAULT_LLM_MODEL,
        DEFAULT_PRESENTATION_PROMPT_TEMPLATE,
    )
    from src.transcription_to_audio import text_to_speech, VOICE_ID
    from src.create_video import (
        find_audio_files,
        convert_pdf_to_images,
        create_video_clips,
        concatenate_clips,
        cleanup_temp_files,
    )
    from huggingface_hub import InferenceClient
except ImportError as e:
    print(f"Error importing script functions: {e}")
    print("Please ensure scripts are in the 'src' directory and structured correctly.")
    exit(1)

load_dotenv()

# --- Configuration & Setup ---
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

HF_API_KEY = os.getenv("HF_API_KEY")
LLM_MODEL = os.getenv("LLM_MODEL", DEFAULT_LLM_MODEL)
PRESENTATION_PROMPT = os.getenv(
    "PRESENTATION_PROMPT", DEFAULT_PRESENTATION_PROMPT_TEMPLATE
)
CACHE_DIR = ".cache"  # For TTS caching
URL_CACHE_DIR = ".url_cache"
URL_CACHE_FILE = os.path.join(URL_CACHE_DIR, "presentations_cache")

# Initialize clients (do this once if possible, or manage carefully in functions)
try:
    if HF_API_KEY:
        hf_client = InferenceClient(token=HF_API_KEY, provider="cohere")
    else:
        logger.warning("HF_API_KEY not found. LLM generation will fail.")
        hf_client = None
except Exception as e:
    logger.error(f"Failed to initialize Hugging Face client: {e}")
    hf_client = None

# --- Helper Functions ---


def fetch_webpage_content(url):
    """Fetches and extracts basic text content from a webpage."""
    logger.info(f"Fetching content from: {url}")
    try:
        response = requests.get(url, timeout=15)
        response.raise_for_status()  # Raise an exception for bad status codes
        soup = BeautifulSoup(response.text, "html.parser")

        # Basic text extraction (can be improved significantly)
        paragraphs = soup.find_all("p")
        headings = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
        list_items = soup.find_all("li")

        content = (
            "\n".join([h.get_text() for h in headings])
            + "\n\n"
            + "\n".join([p.get_text() for p in paragraphs])
            + "\n\n"
            + "\n".join(["- " + li.get_text() for li in list_items])
        )

        # Simple cleanup
        content = re.sub(r"\s\s+", " ", content).strip()
        logger.info(
            f"Successfully fetched and parsed content (length: {len(content)})."
        )
        return content
    except requests.exceptions.RequestException as e:
        logger.error(f"Error fetching URL {url}: {e}")
        return None
    except Exception as e:
        logger.error(f"Error parsing URL {url}: {e}")
        return None


def parse_presentation_markdown(markdown_content):
    """Splits presentation markdown into slides with content and notes."""
    slides = []
    slide_parts = re.split(r"\n\n---\n\n", markdown_content)
    for i, part in enumerate(slide_parts):
        if "???" in part:
            content, notes = part.split("???", 1)
            slides.append({"id": i, "content": content.strip(), "notes": notes.strip()})
        else:
            # Handle slides without notes (like title slide maybe)
            slides.append(
                {
                    "id": i,
                    "content": part.strip(),
                    "notes": "",  # Add empty notes field
                }
            )
    logger.info(f"Parsed {len(slides)} slides from markdown.")
    return slides


def reconstruct_presentation_markdown(slides_data):
    """Reconstructs the markdown string from slide data."""
    full_md = []
    for slide in slides_data:
        slide_md = slide["content"]
        if slide["notes"]:  # Only add notes separator if notes exist and are not just whitespace
            slide_md += f"\n\n???\n{slide['notes'].strip()}"
        full_md.append(slide_md.strip())  # Ensure each slide part is stripped
    return "\n\n---\n\n".join(full_md)


def generate_pdf_from_markdown(markdown_file_path, output_pdf_path):
    """Generates a PDF from a Markdown file using bs export + decktape."""
    logger.info(f"Attempting PDF gen: {markdown_file_path} -> {output_pdf_path}")
    working_dir = os.path.dirname(markdown_file_path)
    markdown_filename = os.path.basename(markdown_file_path)
    html_output_dir_name = "bs_html_output"
    html_output_dir_abs = os.path.join(working_dir, html_output_dir_name)
    expected_html_filename = os.path.splitext(markdown_filename)[0] + ".html"
    generated_html_path_abs = os.path.join(html_output_dir_abs, expected_html_filename)
    pdf_gen_success = False  # Flag to track success

    # ---- Step 1: Generate HTML using bs export ----
    try:
        Path(html_output_dir_abs).mkdir(parents=True, exist_ok=True)
        export_command = ["bs", "export", markdown_filename, "-o", html_output_dir_name]
        logger.info(f"Running: {' '.join(export_command)} in CWD: {working_dir}")
        export_result = subprocess.run(
            export_command,
            cwd=working_dir,
            capture_output=True,
            text=True,
            check=True,
            timeout=60,
        )
        logger.info("Backslide (bs export) OK.")
        logger.debug(f"bs export stdout:\n{export_result.stdout}")
        logger.debug(f"bs export stderr:\n{export_result.stderr}")

        if not os.path.exists(generated_html_path_abs):
            logger.error(f"Expected HTML not found: {generated_html_path_abs}")
            try:
                files_in_dir = os.listdir(html_output_dir_abs)
                logger.error(f"Files in {html_output_dir_abs}: {files_in_dir}")
            except FileNotFoundError:
                logger.error(
                    f"HTML output directory {html_output_dir_abs} not found after bs run."
                )
            raise FileNotFoundError(
                f"Generated HTML not found: {generated_html_path_abs}"
            )

    except FileNotFoundError:
        logger.error(
            "`bs` command not found. Install backslide (`npm install -g backslide`)."
        )
        raise gr.Error("HTML generation tool (backslide/bs) not found.")
    except subprocess.CalledProcessError as e:
        logger.error(f"Backslide (bs export) failed (code {e.returncode}).")
        logger.error(f"bs stderr:\n{e.stderr}")
        raise gr.Error(f"Backslide HTML failed: {e.stderr[:500]}...")
    except subprocess.TimeoutExpired:
        logger.error("Backslide (bs export) timed out.")
        raise gr.Error("HTML generation timed out (backslide).")
    except Exception as e:
        logger.error(f"Unexpected error during bs export: {e}", exc_info=True)
        raise gr.Error(f"Unexpected error during HTML generation: {e}")

    # ---- Step 2: Generate PDF from HTML using decktape ----
    try:
        Path(output_pdf_path).parent.mkdir(parents=True, exist_ok=True)
        html_file_url = Path(generated_html_path_abs).as_uri()
        decktape_command = ["decktape", html_file_url, str(output_pdf_path)]
        logger.info(f"Running PDF conversion: {' '.join(decktape_command)}")
        decktape_result = subprocess.run(
            decktape_command,
            capture_output=True,
            text=True,
            check=True,
            timeout=120,
        )
        logger.info("Decktape command executed successfully.")
        logger.debug(f"decktape stdout:\n{decktape_result.stdout}")
        logger.debug(f"decktape stderr:\n{decktape_result.stderr}")

        if os.path.exists(output_pdf_path):
            logger.info(f"PDF generated successfully: {output_pdf_path}")
            pdf_gen_success = True  # Mark as success
            return output_pdf_path
        else:
            logger.error("Decktape command finished but output PDF not found.")
            return None

    except FileNotFoundError:
        logger.error(
            "`decktape` command not found. Install decktape (`npm install -g decktape`)."
        )
        raise gr.Error("PDF generation tool (decktape) not found.")
    except subprocess.CalledProcessError as e:
        logger.error(f"Decktape command failed (code {e.returncode}).")
        logger.error(f"decktape stderr:\n{e.stderr}")
        raise gr.Error(f"Decktape PDF failed: {e.stderr[:500]}...")
    except subprocess.TimeoutExpired:
        logger.error("Decktape command timed out.")
        raise gr.Error("PDF generation timed out (decktape).")
    except Exception as e:
        logger.error(
            f"Unexpected error during decktape PDF generation: {e}", exc_info=True
        )
        raise gr.Error(f"Unexpected error during PDF generation: {e}")
    finally:
        # --- Cleanup HTML output directory ---
        if os.path.exists(html_output_dir_abs):
            try:
                shutil.rmtree(html_output_dir_abs)
                logger.info(f"Cleaned up HTML temp dir: {html_output_dir_abs}")
            except Exception as cleanup_e:
                logger.warning(
                    f"Could not cleanup HTML dir {html_output_dir_abs}: {cleanup_e}"
                )
        # Log final status
        if pdf_gen_success:
            logger.info(f"PDF generation process completed for {output_pdf_path}.")
        else:
            logger.error(f"PDF generation process failed for {output_pdf_path}.")


# --- Helper Function to Read CSS ---
def load_css(css_path="template/style.css"):
    """Loads CSS content from a file."""
    try:
        with open(css_path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        logger.warning(f"CSS file not found at {css_path}. No custom styles applied.")
        return ""  # Return empty string instead of None
    except Exception as e:
        logger.error(f"Error reading CSS file {css_path}: {e}")
        return ""  # Return empty string on error


# --- Gradio Workflow Functions ---


def step1_fetch_and_generate_presentation(url, progress=gr.Progress(track_tqdm=True)):
    """Fetches content, generates presentation markdown, prepares editor, and copies template. Uses caching based on URL."""
    if not url:
        raise gr.Error("Please enter a URL.")
    logger.info(f"Step 1: Fetching & Generating for {url}")
    gr.Info(f"Starting Step 1: Fetching content from {url}...")

    # --- Cache Check ---
    try:
        os.makedirs(URL_CACHE_DIR, exist_ok=True)  # Ensure cache dir exists
        with shelve.open(URL_CACHE_FILE) as cache:
            if url in cache:
                logger.info(f"Cache hit for URL: {url}")
                progress(0.5, desc="Loading cached presentation...")
                cached_data = cache[url]
                presentation_md = cached_data.get("presentation_md")
                slides_data = cached_data.get("slides_data")

                if presentation_md and slides_data:
                    temp_dir = tempfile.mkdtemp()
                    md_path = os.path.join(temp_dir, "presentation.md")
                    try:
                        with open(md_path, "w", encoding="utf-8") as f:
                            f.write(presentation_md)
                        logger.info(
                            f"Wrote cached presentation to temp file: {md_path}"
                        )

                        # --- Copy Template Directory for Cached Item ---
                        template_src_dir = "template"
                        template_dest_dir = os.path.join(temp_dir, "template")
                        if os.path.isdir(template_src_dir):
                            try:
                                shutil.copytree(template_src_dir, template_dest_dir)
                                logger.info(
                                    f"Copied template dir to {template_dest_dir} (cached)"
                                )
                            except Exception as copy_e:
                                logger.error(
                                    f"Failed to copy template dir for cache: {copy_e}"
                                )
                                shutil.rmtree(temp_dir)
                                raise gr.Error(f"Failed to prepare template: {copy_e}")
                        else:
                            logger.error(
                                f"Template source dir '{template_src_dir}' not found."
                            )
                            shutil.rmtree(temp_dir)
                            raise gr.Error(
                                f"Required template '{template_src_dir}' not found."
                            )

                        progress(0.9, desc="Preparing editor from cache...")
                        logger.info(f"Using cached data for {len(slides_data)} slides.")
                        # Return updates for the UI state and controls
                        return (
                            temp_dir,
                            md_path,
                            slides_data,
                            gr.update(visible=True),  # editor_column
                            gr.update(visible=True),  # btn_generate_pdf (Enable PDF button next)
                            gr.update(interactive=False),  # btn_fetch_generate (disable)
                        )
                    except Exception as e:
                        logger.error(f"Error writing cached markdown: {e}")
                        if os.path.exists(temp_dir):
                            shutil.rmtree(temp_dir)
                else:
                    logger.warning(f"Cache entry for {url} incomplete. Regenerating.")

        # --- Cache Miss or Failed Cache Load ---
        logger.info(f"Cache miss for URL: {url}. Proceeding with generation.")
        progress(0.1, desc="Fetching webpage content...")
        if not hf_client:
            raise gr.Error("LLM Client not initialized. Check API Key.")

        web_content = fetch_webpage_content(url)
        if not web_content:
            raise gr.Error("Failed to fetch or parse content from the URL.")

        progress(0.3, desc="Generating presentation with LLM...")
        try:
            presentation_md = generate_presentation_with_llm(
                hf_client, LLM_MODEL, PRESENTATION_PROMPT, web_content, url
            )
        except Exception as e:
            logger.error(f"Error during LLM call: {e}", exc_info=True)
            raise gr.Error(f"Failed to generate presentation from LLM: {e}")

        if not presentation_md:
            logger.error("LLM generation returned None.")
            raise gr.Error("LLM generation failed (received None).")

        # Check for basic structure early, but parsing handles final validation
        if "---" not in presentation_md:
            logger.warning(
                "LLM output missing slide separators ('---'). Parsing might fail."
            )
        if "???" not in presentation_md:
            logger.warning(
                "LLM output missing notes separators ('???'). Notes might be empty."
            )

        progress(0.7, desc="Parsing presentation slides...")
        slides_data = parse_presentation_markdown(presentation_md)
        if not slides_data:
            logger.error("Parsing markdown resulted in zero slides.")
            raise gr.Error("Failed to parse generated presentation markdown.")

        # Create a temporary directory for this session
        temp_dir = tempfile.mkdtemp()
        md_path = os.path.join(temp_dir, "presentation.md")
        with open(md_path, "w", encoding="utf-8") as f:
            f.write(presentation_md)
        logger.info(f"Presentation markdown saved to temp file: {md_path}")

        # --- Copy Template Directory for New Item ---
        template_src_dir = "template"
        template_dest_dir = os.path.join(temp_dir, "template")
        if os.path.isdir(template_src_dir):
            try:
                shutil.copytree(template_src_dir, template_dest_dir)
                logger.info(f"Copied template directory to {template_dest_dir}")
            except Exception as copy_e:
                logger.error(f"Failed to copy template directory: {copy_e}")
                shutil.rmtree(temp_dir)
                raise gr.Error(f"Failed to prepare template: {copy_e}")
        else:
            logger.error(f"Template source dir '{template_src_dir}' not found.")
            shutil.rmtree(temp_dir)
            raise gr.Error(f"Required template '{template_src_dir}' not found.")

        # --- Store in Cache ---
        try:
            with shelve.open(URL_CACHE_FILE) as cache_write:
                cache_write[url] = {
                    "presentation_md": presentation_md,
                    "slides_data": slides_data,
                }
                logger.info(
                    f"Stored generated presentation in cache for URL: {url}"
                )
        except Exception as e:
            logger.error(f"Failed to write to cache for URL {url}: {e}")

        progress(0.9, desc="Preparing editor...")
        logger.info(f"Prepared data for {len(slides_data)} slides.")

        # Return updates for the UI state and controls
        return (
            temp_dir,
            md_path,
            slides_data,
            gr.update(visible=True),  # editor_column
            gr.update(visible=True),  # btn_generate_pdf (Enable PDF button next)
            gr.update(interactive=False),  # btn_fetch_generate (disable)
        )

    except Exception as e:
        logger.error(f"Error in step 1 (fetch/generate): {e}", exc_info=True)
        raise gr.Error(f"Error during presentation setup: {e}")


def step2_build_slides(
    state_temp_dir,
    state_md_path,
    state_slides_data,
    *editors,
    progress=gr.Progress(track_tqdm=True),
):
    """Renamed from step2_generate_pdf"""
    if not all([state_temp_dir, state_md_path, state_slides_data]):
        raise gr.Error("Session state missing.")
    logger.info("Step 2: Building Slides (PDF + Images)")
    gr.Info("Starting Step 2: Building slides...")
    num_slides = len(state_slides_data)
    MAX_SLIDES = 20
    all_editors = list(editors)
    if len(all_editors) != MAX_SLIDES * 2:
        raise gr.Error(f"Incorrect editor inputs: {len(all_editors)}")
    edited_contents = all_editors[:MAX_SLIDES][:num_slides]
    edited_notes_list = all_editors[MAX_SLIDES:][:num_slides]
    if len(edited_contents) != num_slides or len(edited_notes_list) != num_slides:
        raise gr.Error("Editor input mismatch.")

    progress(0.1, desc="Saving edited markdown...")
    updated_slides = []
    for i in range(num_slides):
        updated_slides.append(
            {"id": i, "content": edited_contents[i], "notes": edited_notes_list[i]}
        )
    updated_md = reconstruct_presentation_markdown(updated_slides)
    try:
        with open(state_md_path, "w", encoding="utf-8") as f:
            f.write(updated_md)
        logger.info(f"Saved edited markdown: {state_md_path}")
    except IOError as e:
        raise gr.Error(f"Failed to save markdown: {e}")

    progress(0.3, desc="Generating PDF...")
    pdf_output_path = os.path.join(state_temp_dir, "presentation.pdf")
    generated_pdf_path = generate_pdf_from_markdown(state_md_path, pdf_output_path)
    if not generated_pdf_path:
        raise gr.Error("PDF generation failed (check logs).")

    progress(0.7, desc="Converting PDF to images...")
    pdf_images = []
    try:
        pdf_images = convert_pdf_to_images(
            generated_pdf_path, dpi=150
        )  # Use generated path
        if not pdf_images:
            raise gr.Error("PDF to image conversion failed.")
        logger.info(f"Converted PDF to {len(pdf_images)} images.")
        if len(pdf_images) != num_slides:
            gr.Warning(
                f"PDF page count ({len(pdf_images)}) != slide count ({num_slides}). Images might mismatch."
            )
            # Pad or truncate? For now, just return what we have, UI update logic handles MAX_SLIDES
    except Exception as e:
        logger.error(f"Error converting PDF to images: {e}", exc_info=True)
        # Proceed without images? Or raise error? Let's raise.
        raise gr.Error(f"Failed to convert PDF to images: {e}")

    info_msg = f"Built {len(pdf_images)} slide images. Ready for Step 3."
    logger.info(info_msg)
    gr.Info(info_msg)
    progress(1.0, desc="Slide build complete.")
    # Return tuple WITHOUT status textbox update
    return (
        generated_pdf_path,
        pdf_images,  # Return the list of image paths
        gr.update(visible=True),
        gr.update(visible=False),
        gr.update(value=generated_pdf_path, visible=True),
    )


def step3_generate_audio(*args, progress=gr.Progress(track_tqdm=True)):
    """Generates audio files for the speaker notes using edited content."""
    # Args structure:
    # args[0]: state_temp_dir
    # args[1]: state_md_path
    # args[2]: original_slides_data (list of dicts, used to get count)
    # args[3 : 3 + MAX_SLIDES]: values from all_code_editors
    # args[3 + MAX_SLIDES :]: values from all_notes_textboxes

    state_temp_dir = args[0]
    state_md_path = args[1]
    original_slides_data = args[2]
    editors = args[3:]
    num_slides = len(original_slides_data)
    if num_slides == 0:
        logger.error("Step 3 (Audio) called with zero slides data.")
        raise gr.Error("No slide data available. Please start over.")

    MAX_SLIDES = 20  # Ensure this matches UI definition
    code_editors_start_index = 3
    notes_textboxes_start_index = 3 + MAX_SLIDES

    # Slice the *actual* edited values based on num_slides
    edited_contents = args[
        code_editors_start_index : code_editors_start_index + num_slides
    ]
    edited_notes_list = args[
        notes_textboxes_start_index : notes_textboxes_start_index + num_slides
    ]

    if not state_temp_dir or not state_md_path:
        raise gr.Error("Session state lost (Audio step). Please start over.")

    # Check slicing
    if len(edited_contents) != num_slides or len(edited_notes_list) != num_slides:
        logger.error(
            f"Input slicing error (Audio step): Expected {num_slides}, got {len(edited_contents)} contents, {len(edited_notes_list)} notes."
        )
        raise gr.Error(
            f"Input processing error: Mismatch after slicing ({num_slides} slides)."
        )

    logger.info(f"Processing {num_slides} slides for audio generation.")
    audio_dir = os.path.join(state_temp_dir, "audio")
    os.makedirs(audio_dir, exist_ok=True)

    # --- Update the presentation.md file AGAIN in case notes changed after PDF ---
    # This might be redundant if users don't edit notes between PDF and Audio steps,
    # but ensures the audio matches the *latest* notes displayed.
    progress(0.1, desc="Saving latest notes...")
    updated_slides_data = []
    for i in range(num_slides):
        updated_slides_data.append(
            {
                "id": original_slides_data[i]["id"],  # Keep original ID
                "content": edited_contents[i],  # Use sliced edited content
                "notes": edited_notes_list[i],  # Use sliced edited notes
            }
        )

    updated_markdown = reconstruct_presentation_markdown(updated_slides_data)
    try:
        with open(state_md_path, "w", encoding="utf-8") as f:
            f.write(updated_markdown)
        logger.info(f"Updated presentation markdown before audio gen: {state_md_path}")
    except IOError as e:
        logger.error(f"Failed to save updated markdown before audio gen: {e}")
        # Continue with audio gen, but log warning
        gr.Warning(f"Could not save latest notes to markdown file: {e}")

    generated_audio_paths = ["" for _ in range(num_slides)]
    audio_generation_failed = False
    successful_audio_count = 0

    for i in range(num_slides):
        note_text = edited_notes_list[i]
        slide_num = i + 1
        progress(
            (i + 1) / num_slides * 0.8 + 0.1,
            desc=f"Audio slide {slide_num}/{num_slides}",
        )
        output_file_path = Path(audio_dir) / f"{slide_num}.wav"
        if not note_text or not note_text.strip():
            try:  # Generate silence
                subprocess.run(
                    [
                        "ffmpeg",
                        "-y",
                        "-f",
                        "lavfi",
                        "-i",
                        "anullsrc=r=44100:cl=mono",
                        "-t",
                        "0.1",
                        "-q:a",
                        "9",
                        str(output_file_path),
                    ],
                    check=True,
                    capture_output=True,
                    text=True,
                )
                generated_audio_paths[i] = str(output_file_path)
            except Exception as e:
                audio_generation_failed = True
                logger.error(f"Silence gen failed slide {i + 1}: {e}")
            continue
        try:  # Generate TTS
            success = text_to_speech(
                note_text, output_file_path, voice=VOICE_ID, cache_dir=CACHE_DIR
            )
            if success:
                generated_audio_paths[i] = str(output_file_path)
                successful_audio_count += 1
            else:
                audio_generation_failed = True
                logger.error(f"TTS failed slide {i + 1}")
        except Exception as e:
            audio_generation_failed = True
            logger.error(f"TTS exception slide {i + 1}: {e}", exc_info=True)

    # --- Prepare outputs for Gradio ---
    audio_player_updates = [
        gr.update(value=p if p else None, visible=bool(p and os.path.exists(p)))
        for p in generated_audio_paths
    ]
    regen_button_updates = [gr.update(visible=True)] * num_slides
    audio_player_updates.extend(
        [gr.update(value=None, visible=False)] * (MAX_SLIDES - num_slides)
    )
    regen_button_updates.extend([gr.update(visible=False)] * (MAX_SLIDES - num_slides))

    info_msg = f"Generated {successful_audio_count}/{num_slides} audio clips. "
    if audio_generation_failed:
        info_msg += "Some audio failed. Review/Regenerate before video."
        gr.Warning(info_msg)
    else:
        info_msg += "Ready for Step 4."
        gr.Info(info_msg)
    logger.info(info_msg)
    progress(1.0, desc="Audio generation complete.")

    # Return tuple WITHOUT status textbox update
    return (
        audio_dir,
        gr.update(visible=True),  # btn_generate_video
        gr.update(visible=False),  # btn_generate_audio
        *audio_player_updates,
        *regen_button_updates,
    )


def step4_generate_video(
    state_temp_dir,
    state_audio_dir,
    state_pdf_path,  # Use PDF path from state
    progress=gr.Progress(track_tqdm=True),
):
    """Generates the final video using PDF images and audio files."""
    if not state_temp_dir or not state_audio_dir or not state_pdf_path:
        raise gr.Error("Session state lost (Video step). Please start over.")
    if not os.path.exists(state_pdf_path):
        raise gr.Error(f"PDF file not found: {state_pdf_path}. Cannot generate video.")
    if not os.path.isdir(state_audio_dir):
        raise gr.Error(
            f"Audio directory not found: {state_audio_dir}. Cannot generate video."
        )

    video_output_path = os.path.join(state_temp_dir, "final_presentation.mp4")

    progress(0.1, desc="Preparing video components...")
    pdf_images = []  # Initialize to ensure cleanup happens
    try:
        # Find audio files (natsorted)
        audio_files = find_audio_files(state_audio_dir, "*.wav")
        if not audio_files:
            logger.warning(
                f"No WAV files found in {state_audio_dir}. Video might lack audio."
            )
            # Decide whether to proceed with silent video or error out
            # raise gr.Error(f"No audio files found in {state_audio_dir}")

        # Convert PDF to images
        progress(0.2, desc="Converting PDF to images...")
        pdf_images = convert_pdf_to_images(state_pdf_path, dpi=150)
        if not pdf_images:
            raise gr.Error(f"Failed to convert PDF ({state_pdf_path}) to images.")

        # Allow video generation even if audio is missing or count mismatch
        # The create_video_clips function should handle missing audio gracefully (e.g., use image duration)
        if len(pdf_images) != len(audio_files):
            logger.warning(
                f"Mismatch: {len(pdf_images)} PDF pages vs {len(audio_files)} audio files. Video clips might have incorrect durations or missing audio."
            )
            # Pad the shorter list? For now, let create_video_clips handle it.

        progress(0.5, desc="Creating individual video clips...")
        buffer_seconds = 1.0
        output_fps = 10
        video_clips = create_video_clips(
            pdf_images, audio_files, buffer_seconds, output_fps
        )

        if not video_clips:
            raise gr.Error("Failed to create any video clips.")

        progress(0.8, desc="Concatenating clips...")
        concatenate_clips(video_clips, video_output_path, output_fps)

        logger.info(f"Video concatenation complete: {video_output_path}")

        progress(0.95, desc="Cleaning up temp images...")
        cleanup_temp_files(pdf_images)  # Pass the list of image paths

    except Exception as e:
        if pdf_images:
            cleanup_temp_files(pdf_images)
        logger.error(f"Video generation failed: {e}", exc_info=True)
        raise gr.Error(f"Video generation failed: {e}")

    info_msg = f"Video generated: {os.path.basename(video_output_path)}"
    logger.info(info_msg)
    gr.Info(info_msg)
    progress(1.0, desc="Video Complete.")
    # Return tuple WITHOUT status textbox update
    return (
        gr.update(value=video_output_path, visible=True),  # video_output
        gr.update(visible=False),  # btn_generate_video
    )


def cleanup_session(temp_dir):
    """Removes the temporary directory."""
    if temp_dir and isinstance(temp_dir, str) and os.path.exists(temp_dir):
        try:
            shutil.rmtree(temp_dir)
            logger.info(f"Cleaned up temporary directory: {temp_dir}")
            return "Cleaned up session files."
        except Exception as e:
            logger.error(f"Error cleaning up temp directory {temp_dir}: {e}")
            return f"Error during cleanup: {e}"
    logger.warning(f"Cleanup called but temp_dir invalid or not found: {temp_dir}")
    return "No valid temporary directory found to clean."


# --- Gradio Interface ---

# Load custom CSS
custom_css = load_css()

with gr.Blocks(
    theme=gr.themes.Soft(), css=custom_css, title="Webpage to Video"
) as demo:
    gr.Markdown("# Webpage to Video Presentation Generator")

    # State variables
    state_temp_dir = gr.State(None)
    state_md_path = gr.State(None)
    state_audio_dir = gr.State(None)
    state_pdf_path = gr.State(None)
    state_slides_data = gr.State([])
    state_pdf_image_paths = gr.State([])

    MAX_SLIDES = 20

    # --- Tabbed Interface ---
    with gr.Tabs(elem_id="tabs") as tabs_widget:
        # Tab 1: Generate Presentation
        with gr.TabItem("1. Generate Presentation", id=0):
            with gr.Row():
                with gr.Column(scale=1):
                    gr.Markdown("**Step 1:** Enter URL")
                    input_url = gr.Textbox(
                        label="Webpage URL",
                        value="https://huggingface.co/blog/llm-course",
                    )
                    btn_fetch_generate = gr.Button(
                        value="1. Fetch & Generate", variant="primary"
                    )
                with gr.Column(scale=4):
                    gr.Markdown(
                        "### Instructions\n1. Enter URL & click 'Fetch & Generate'.\n2. Editor appears below tabs.\n3. Go to next tab."
                    )

        # Tab 2: Build Slides
        with gr.TabItem("2. Build Slides", id=1):
            with gr.Row():
                with gr.Column(scale=1):
                    gr.Markdown("**Step 2:** Review/Edit, then build slides.")
                    btn_build_slides = gr.Button(
                        value="2. Build Slides", variant="secondary", visible=False
                    )
                    pdf_download_link = gr.File(
                        label="Download PDF", visible=False, interactive=False
                    )
                with gr.Column(scale=4):
                    gr.Markdown(
                        "### Instructions\n1. Edit content/notes below.\n2. Click 'Build Slides'. Images appear.\n3. Download PDF from sidebar.\n4. Go to next tab."
                    )

        # Tab 3: Generate Audio
        with gr.TabItem("3. Generate Audio", id=2):
            with gr.Row():
                with gr.Column(scale=1):
                    gr.Markdown("**Step 3:** Review/Edit notes, then generate audio.")
                    btn_generate_audio = gr.Button(
                        value="3. Generate Audio", variant="primary", visible=False
                    )
                with gr.Column(scale=4):
                    gr.Markdown(
                        "### Instructions\n1. Finalize notes below.\n2. Click 'Generate Audio'.\n3. Regenerate if needed.\n4. Go to next tab."
                    )

        # Tab 4: Generate Video
        with gr.TabItem("4. Create Video", id=3):
            with gr.Row():
                with gr.Column(scale=1):
                    gr.Markdown("**Step 4:** Create the final video.")
                    btn_generate_video = gr.Button(
                        value="4. Create Video", variant="primary", visible=False
                    )
                with gr.Column(scale=4):
                    gr.Markdown(
                        "### Instructions\n1. Click 'Create Video'.\n2. Video appears below."
                    )
                    video_output = gr.Video(label="Final Video", visible=False)

    # Define the shared editor structure once, AFTER tabs
    slide_editors_group = []
    with gr.Column(visible=False) as editor_column:  # Initially hidden
        gr.Markdown("--- \n## Edit Slides & Notes")
        gr.Markdown("_(PDF uses content & notes, Audio uses notes only)_")
        for i in range(MAX_SLIDES):
            with gr.Accordion(f"Slide {i + 1}", open=(i == 0), visible=False) as acc:
                with gr.Row():  # Row for Content/Preview/Image
                    with gr.Column(scale=1):
                        code_editor = gr.Code(
                            label="Content (Markdown)",
                            language="markdown",
                            lines=15,
                            interactive=True,
                            visible=False,
                        )
                        notes_textbox = gr.Code(
                            label="Script/Notes (for Audio)",
                            lines=8,
                            language="markdown",
                            interactive=True,
                            visible=False,
                        )
                    with gr.Column(scale=1):
                        slide_image = gr.Image(
                            label="Slide Image",
                            visible=False,
                            interactive=False,
                            height=300,
                        )
                        md_preview = gr.Markdown(visible=False)
                with gr.Row():  # Row for audio controls
                    audio_player = gr.Audio(
                        label="Generated Audio",
                        visible=False,
                        interactive=False,
                        scale=3,
                    )
                    regen_button = gr.Button(
                        value="Regen Audio", visible=False, scale=1, size="sm"
                    )
            slide_editors_group.append(
                (
                    acc,
                    code_editor,
                    md_preview,
                    notes_textbox,
                    audio_player,
                    regen_button,
                    slide_image,
                )
            )
            code_editor.change(
                fn=lambda x: x,
                inputs=code_editor,
                outputs=md_preview,
                show_progress="hidden",
            )

    # --- Component Lists for Updates ---
    all_editor_components = [comp for group in slide_editors_group for comp in group]
    all_code_editors = [group[1] for group in slide_editors_group]
    all_notes_textboxes = [group[3] for group in slide_editors_group]
    all_audio_players = [group[4] for group in slide_editors_group]
    all_regen_buttons = [group[5] for group in slide_editors_group]
    all_slide_images = [group[6] for group in slide_editors_group]

    # --- Function to regenerate audio --- (Assumed correct)
    # ... (regenerate_single_audio implementation as fixed before)...
    def regenerate_single_audio(
        slide_idx, note_text, temp_dir, progress=gr.Progress(track_tqdm=True)
    ):
        # ...(Implementation as fixed before)...
        if (
            not temp_dir
            or not isinstance(temp_dir, str)
            or not os.path.exists(temp_dir)
        ):
            logger.error(f"Regen audio failed: Invalid temp_dir '{temp_dir}'")
            return gr.update(value=None, visible=False)
        slide_num = slide_idx + 1
        audio_dir = os.path.join(temp_dir, "audio")
        os.makedirs(audio_dir, exist_ok=True)
        output_file = Path(audio_dir) / f"{slide_num}.wav"
        logger.info(f"Regenerating audio for slide {slide_num} -> {output_file}")
        progress(0.1, desc=f"Regen audio slide {slide_num}...")
        if not note_text or not note_text.strip():
            logger.warning(f"Note for slide {slide_num} empty. Generating silence.")
            try:
                subprocess.run(
                    [
                        "ffmpeg",
                        "-y",
                        "-f",
                        "lavfi",
                        "-i",
                        "anullsrc=r=44100:cl=mono",
                        "-t",
                        "0.1",
                        "-q:a",
                        "9",
                        str(output_file),
                    ],
                    check=True,
                    capture_output=True,
                    text=True,
                )
                logger.info(f"Created silent placeholder: {output_file}")
                progress(1.0, desc=f"Generated silence slide {slide_num}.")
                return gr.update(value=str(output_file), visible=True)
            except Exception as e:
                logger.error(f"Failed silent gen slide {slide_num}: {e}")
                return gr.update(value=None, visible=False)
        else:
            try:
                success = text_to_speech(
                    note_text, output_file, voice=VOICE_ID, cache_dir=CACHE_DIR
                )
                if success:
                    logger.info(f"Regen OK slide {slide_num}")
                    progress(1.0, desc=f"Audio regen OK slide {slide_num}.")
                    return gr.update(value=str(output_file), visible=True)
                else:
                    logger.error(f"Regen TTS failed slide {slide_num}")
                    return gr.update(value=None, visible=False)
            except Exception as e:
                logger.error(
                    f"Regen TTS exception slide {slide_num}: {e}", exc_info=True
                )
                return gr.update(value=None, visible=False)

    # --- Connect the individual Re-generate buttons ---
    # Update unpacking to include slide_image (7 items)
    for i, (
        acc,
        code_edit,
        md_preview,
        notes_tb,
        audio_pl,
        regen_btn,
        slide_image,
    ) in enumerate(slide_editors_group):
        regen_btn.click(
            fn=regenerate_single_audio,
            inputs=[gr.State(i), notes_tb, state_temp_dir],
            outputs=[audio_pl],
            show_progress="minimal",
        )

    # --- Main Button Click Handlers --- (Outputs use locally defined component vars)

    # Step 1 Click Handler
    step1_outputs = [
        state_temp_dir,
        state_md_path,
        state_slides_data,
        editor_column,  # Show the editor column
        btn_build_slides,  # Enable the button in Tab 2
        btn_fetch_generate,  # Disable self
    ]
    btn_fetch_generate.click(
        fn=step1_fetch_and_generate_presentation,
        inputs=[input_url],
        outputs=step1_outputs,
    ).then(
        fn=lambda s_data: [
            upd
            for i, slide in enumerate(s_data)
            if i < MAX_SLIDES
            for upd in [
                gr.update(
                    label=f"Slide {i + 1}: {slide['content'][:25]}...",
                    visible=True,
                    open=(i == 0),
                ),  # Accordion
                gr.update(value=slide["content"], visible=True),  # Code Editor
                gr.update(value=slide["content"], visible=True),  # MD Preview
                gr.update(value=slide["notes"], visible=True),  # Notes Textbox
                gr.update(value=None, visible=False),  # Audio Player
                gr.update(visible=False),  # Regen Button
                gr.update(value=None, visible=False),  # Slide Image
            ]
        ]
        + [
            upd
            for i in range(len(s_data), MAX_SLIDES)
            for upd in [gr.update(visible=False)] * 7
        ],
        inputs=[state_slides_data],
        outputs=all_editor_components,
        show_progress="hidden",
    ).then(lambda: gr.update(selected=1), outputs=tabs_widget)  # Switch to Tab 2

    # Step 2 Click Handler
    step2_inputs = (
        [state_temp_dir, state_md_path, state_slides_data]
        + all_code_editors
        + all_notes_textboxes
    )
    step2_outputs = [
        state_pdf_path,
        state_pdf_image_paths,
        btn_generate_audio,  # Enable button in Tab 3
        btn_build_slides,  # Disable self
        pdf_download_link,  # Update download link in Tab 2
    ]
    btn_build_slides.click(
        fn=step2_build_slides,
        inputs=step2_inputs,
        outputs=step2_outputs,
    ).then(
        fn=lambda image_paths: [
            gr.update(
                value=image_paths[i] if i < len(image_paths) else None,
                visible=(i < len(image_paths)),
            )
            for i in range(MAX_SLIDES)
        ],
        inputs=[state_pdf_image_paths],
        outputs=all_slide_images,
        show_progress="hidden",
    ).then(lambda: gr.update(selected=2), outputs=tabs_widget)  # Switch to Tab 3

    # Step 3 Click Handler
    step3_inputs = (
        [state_temp_dir, state_md_path, state_slides_data]
        + all_code_editors
        + all_notes_textboxes
    )
    step3_outputs = (
        [
            state_audio_dir,
            btn_generate_video,  # Enable button in Tab 4
            btn_generate_audio,  # Disable self
        ]
        + all_audio_players
        + all_regen_buttons
    )
    btn_generate_audio.click(
        fn=step3_generate_audio,
        inputs=step3_inputs,
        outputs=step3_outputs,
    ).then(lambda: gr.update(selected=3), outputs=tabs_widget)  # Switch to Tab 4

    # Step 4 Click Handler
    step4_inputs = [state_temp_dir, state_audio_dir, state_pdf_path]
    step4_outputs = [
        video_output,  # Update video output in Tab 4
        btn_generate_video,  # Disable self
    ]
    btn_generate_video.click(
        fn=step4_generate_video,
        inputs=step4_inputs,
        outputs=step4_outputs,
    )

if __name__ == "__main__":
    os.makedirs(CACHE_DIR, exist_ok=True)
    os.makedirs(URL_CACHE_DIR, exist_ok=True)
    demo.queue().launch(debug=True)
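A quick, hypothetical round-trip check for the two markdown helpers defined above, assuming `parse_presentation_markdown` and `reconstruct_presentation_markdown` are imported from `app.py`; the sample deck is invented, but the separators (`\n\n---\n\n` between slides, `???` before notes) match what the parser expects:

```python
# The sample text is made up for illustration.
sample = (
    "class: impact\n\n# Title Slide\n\n???\nWelcome everyone.\n\n---\n\n"
    "# Second Slide\n\n- a bullet\n\n???\nNotes read aloud by the TTS step."
)
slides = parse_presentation_markdown(sample)
assert [s["id"] for s in slides] == [0, 1]
assert slides[1]["notes"] == "Notes read aloud by the TTS step."
# Reconstruction is lossless for well-formed input like this:
assert reconstruct_presentation_markdown(slides) == sample
```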
app/src/__init__.py ADDED
File without changes
{scripts → app/src}/create_presentation.py RENAMED
@@ -19,10 +19,12 @@ You are an expert technical writer and presentation creator. Your task is to con
  2. **Slide Format:** Each slide should start with `# Slide Title` derived from the corresponding `## Heading`.
  3. **Content:** Include the relevant text, code blocks (preserving language identifiers like ```python), and lists from the input markdown within each slide.
  4. **Images:** Convert Markdown images `![alt](url)` into Remark.js format: `.center[![alt](url)]`. Ensure the image URL is correct and accessible.
- 5. **Presenter Notes (Transcription Style):** For each slide, generate a detailed script or **transcription** of what the presenter should say to explain the slide's content. This should be flowing text suitable for reading aloud. Place this transcription after the slide content, separated by `???`.
+ 5. **Presenter Notes (Transcription Style):** For each slide, generate a detailed **transcription** of what the presenter should say with the slide's content. This should be flowing text suitable for reading aloud. Place this transcription after the slide content, separated by `???`.
+ 6. **Speaker Style:** The speaker should flow smoothly from one slide to the next. No need to explicitly mention the slide number or introduce the content directly.
  6. **Separators:** Separate individual slides using `\n\n---\n\n`.
  7. **Cleanup:** Do NOT include any HTML/MDX specific tags like `<CourseFloatingBanner>`, `<Tip>`, `<Question>`, `<Youtube>`, or internal links like `[[...]]`. Remove frontmatter.
- 8. **Start Slide:** Begin the presentation with a title slide:
+ 8. **References:** Do not include references to files like `2.mdx`. Instead, refer to the title of the section.
+ 9. **Start Slide:** Begin the presentation with a title slide:
  ```markdown
  class: impact

@@ -34,7 +36,8 @@ You are an expert technical writer and presentation creator. Your task is to con
  ???
  Welcome everyone. This presentation, automatically generated from the course material titled '{input_filename}', will walk you through the key topics discussed in the document. Let's begin.
  ```
- 9. **Output:** Provide ONLY the complete Remark.js Markdown content, starting with the title slide and ending with the last content slide. Do not include any introductory text, explanations, or a final 'Thank You' slide.
+ 10. **Output:** Provide ONLY the complete Remark.js Markdown content, starting with the title slide and ending with the last content slide. Do not include any introductory text, explanations, or a final 'Thank You' slide.
+ 11. **Style:** Keep slide content concise and to the point with no paragraphs. Speaker notes can expand the content of the slide further.

  **Generate the Remark.js presentation now:**
  """
{scripts → app/src}/create_video.py RENAMED
File without changes
{scripts → app/src}/transcription_to_audio.py RENAMED
File without changes
{chapter1 → app}/template/index.html RENAMED
File without changes
{chapter1 → app}/template/remark.min.js RENAMED
File without changes
{chapter1 → app}/template/style.scss RENAMED
File without changes
chapter1/material/1_presentation.md ADDED
@@ -0,0 +1,130 @@
class: impact

# Presentation based on 1.md
## Generated Presentation

.center[![Hugging Face Logo](https://huggingface.co/front/assets/huggingface_logo.svg)]

???
Welcome everyone. This presentation, automatically generated from the course material titled '1.md', will walk you through the key topics discussed in the document. Let's begin.

---

# Introduction

???
Welcome to the first chapter of our course. In this introductory section, we'll set the stage for what you can expect to learn and explore throughout this journey.

---

# Welcome to the 🤗 Course!

This course will teach you about large language models (LLMs) and natural language processing (NLP) using libraries from the [Hugging Face](https://huggingface.co/) ecosystem — [🤗 Transformers](https://github.com/huggingface/transformers), [🤗 Datasets](https://github.com/huggingface/datasets), [🤗 Tokenizers](https://github.com/huggingface/tokenizers), and [🤗 Accelerate](https://github.com/huggingface/accelerate) — as well as the [Hugging Face Hub](https://huggingface.co/models). It's completely free and without ads.

???
Welcome to the Hugging Face course! This course is designed to teach you about large language models and natural language processing using the Hugging Face ecosystem. We'll be exploring libraries like Transformers, Datasets, Tokenizers, and Accelerate, as well as the Hugging Face Hub. The best part? This course is completely free and ad-free.

---

# Understanding NLP and LLMs

While this course was originally focused on NLP (Natural Language Processing), it has evolved to emphasize Large Language Models (LLMs), which represent the latest advancement in the field.

**What's the difference?**
- **NLP (Natural Language Processing)** is the broader field focused on enabling computers to understand, interpret, and generate human language. NLP encompasses many techniques and tasks such as sentiment analysis, named entity recognition, and machine translation.
- **LLMs (Large Language Models)** are a powerful subset of NLP models characterized by their massive size, extensive training data, and ability to perform a wide range of language tasks with minimal task-specific training. Models like the Llama, GPT, or Claude series are examples of LLMs that have revolutionized what's possible in NLP.

Throughout this course, you'll learn about both traditional NLP concepts and cutting-edge LLM techniques, as understanding the foundations of NLP is crucial for working effectively with LLMs.

???
Let's start by understanding the difference between NLP and LLMs. NLP, or Natural Language Processing, is a broad field that focuses on enabling computers to understand, interpret, and generate human language. It encompasses various techniques and tasks like sentiment analysis and machine translation. On the other hand, LLMs, or Large Language Models, are a subset of NLP models known for their massive size and ability to perform a wide range of language tasks with minimal task-specific training. Throughout this course, we'll explore both traditional NLP concepts and cutting-edge LLM techniques.

---

# What to expect?

.center[![Brief overview of the chapters of the course](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary.svg)]

- Chapters 1 to 4 provide an introduction to the main concepts of the 🤗 Transformers library. By the end of this part of the course, you will be familiar with how Transformer models work and will know how to use a model from the [Hugging Face Hub](https://huggingface.co/models), fine-tune it on a dataset, and share your results on the Hub!
- Chapters 5 to 8 teach the basics of 🤗 Datasets and 🤗 Tokenizers before diving into classic NLP tasks and LLM techniques. By the end of this part, you will be able to tackle the most common language processing challenges by yourself.
- Chapter 9 goes beyond NLP to cover how to build and share demos of your models on the 🤗 Hub. By the end of this part, you will be ready to showcase your 🤗 Transformers application to the world!
- Chapters 10 to 12 dive into advanced LLM topics like fine-tuning, curating high-quality datasets, and building reasoning models.

This course:
- Requires a good knowledge of Python
- Is better taken after an introductory deep learning course
- Does not expect prior PyTorch or TensorFlow knowledge, though some familiarity will help

???
Now, let's talk about what you can expect from this course. We'll start with an introduction to the Transformers library, followed by exploring Datasets and Tokenizers. We'll then dive into classic NLP tasks and LLM techniques. By the end of this course, you'll be equipped to tackle common language processing challenges and even build and share your own models. This course assumes a good knowledge of Python and is best taken after an introductory deep learning course.

---

# Who are we?

About the authors:

- **Abubakar Abid**: Completed his PhD at Stanford in applied machine learning. Founded Gradio, acquired by Hugging Face.
- **Ben Burtenshaw**: Machine Learning Engineer at Hugging Face, PhD in NLP from the University of Antwerp.
- **Matthew Carrigan**: Machine Learning Engineer at Hugging Face, previously at Parse.ly and Trinity College Dublin.
- **Lysandre Debut**: Machine Learning Engineer at Hugging Face, core maintainer of the 🤗 Transformers library.
- **Sylvain Gugger**: Research Engineer at Hugging Face, co-author of _Deep Learning for Coders with fastai and PyTorch_.
- **Dawood Khan**: Machine Learning Engineer at Hugging Face, co-founder of Gradio.
- **Merve Noyan**: Developer Advocate at Hugging Face, focused on democratizing machine learning.
- **Lucile Saulnier**: Machine Learning Engineer at Hugging Face, involved in NLP research projects.
- **Lewis Tunstall**: Machine Learning Engineer at Hugging Face, co-author of _Natural Language Processing with Transformers_.
- **Leandro von Werra**: Machine Learning Engineer at Hugging Face, co-author of _Natural Language Processing with Transformers_.

???
Let me introduce you to the authors of this course. We have a diverse team of experts, including Abubakar Abid, who founded Gradio; Ben Burtenshaw, with a PhD in NLP; and many others who bring a wealth of knowledge and experience to this course.

---

# FAQ

- **Does taking this course lead to a certification?**
  Currently, no certification is available, but a program is in development.

- **How much time should I spend on this course?**
  Each chapter is designed for 1 week, with 6-8 hours of work per week.

- **Where can I ask a question?**
  Click the "Ask a question" banner to be redirected to the [Hugging Face forums](https://discuss.huggingface.co/).

.center[![Link to the Hugging Face forums](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/forum-button.png)]

- **Where can I get the code for the course?**
  Click the banner to run code in Google Colab or Amazon SageMaker Studio Lab.

.center[![Link to the Hugging Face course notebooks](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/notebook-buttons.png)]

- **How can I contribute to the course?**
  Open an issue on the [course repo](https://github.com/huggingface/course) or help translate the course.

- **Can I reuse this course?**
  Yes, under the [Apache 2 license](https://www.apache.org/licenses/LICENSE-2.0.html).

???
Before we proceed, let's address some frequently asked questions. We're working on a certification program, but it's not available yet. Each chapter is designed for one week of study, with 6-8 hours of work per week. If you have questions, you can ask them on the Hugging Face forums. The code for the course is available on GitHub, and you can contribute by opening issues or helping with translations. Finally, this course is released under the Apache 2 license, so feel free to reuse it.

---

# Let's Go

In this chapter, you will learn:
- How to use the `pipeline()` function to solve NLP tasks such as text generation and classification
- About the Transformer architecture
- How to distinguish between encoder, decoder, and encoder-decoder architectures and use cases

???
Now that we've covered the basics, let's dive into what you'll learn in this chapter. We'll explore the `pipeline()` function for solving NLP tasks, understand the Transformer architecture, and learn to distinguish between different model architectures and their use cases.

---

class: center, middle

# Thank You!

???
That concludes the material covered in this presentation, generated from the provided course document. Thank you for your time and attention. Are there any questions?
```
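For reference, the `pipeline()` call that the "Let's Go" slide points to looks roughly like this; the task string and example input here are illustrative, and the first call downloads a default model:

```python
from transformers import pipeline

# One-line inference for a classic NLP task; other task strings such as
# "text-generation" work the same way.
classifier = pipeline("sentiment-analysis")
print(classifier("This course is great!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.9998...}]
```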