Victarry committed
Commit c048b97 · 0 parents

Initial commit: PP schedule visualization.

Files changed (14):
  1. .gitattributes +2 -0
  2. .gitignore +12 -0
  3. Dockerfile +19 -0
  4. LICENSE +21 -0
  5. README.md +157 -0
  6. app.py +340 -0
  7. conf/config.yaml +25 -0
  8. main.py +156 -0
  9. pyproject.toml +69 -0
  10. requirements.txt +9 -0
  11. src/__init__.py +3 -0
  12. src/execution_model.py +401 -0
  13. src/strategies.py +581 -0
  14. src/visualizer.py +612 -0
.gitattributes ADDED
@@ -0,0 +1,2 @@
+ *.png filter=lfs diff=lfs merge=lfs -text
+ assets/*.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,12 @@
+ # Python
+ .venv/
+ uv.lock
+ outputs/
+ .cursor/*
+ *.json
+
+ # Uncomment below if you want to include these files
+ # !assets/*.png
+ # !assets/*.jpg
+ # !docs/*.png
+ # !docs/*.jpg
Dockerfile ADDED
@@ -0,0 +1,19 @@
+ # Read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
+ # Note: the base image must satisfy requires-python >=3.10 from pyproject.toml
+ FROM python:3.10-slim
+
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV PATH="/home/user/.local/bin:$PATH"
+ ENV HOME="/home/user"
+ WORKDIR /home/user/app
+
+ COPY --chown=user requirements.txt ./
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+ COPY --chown=user . ./
+
+ # Expose the port the app will run on
+ EXPOSE 7860
+
+ # Start the app
+ CMD ["gunicorn", "-b", "0.0.0.0:7860", "app:server"]
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,157 @@
+ # Pipeline Parallelism Emulation and Visualization
+
+ This project provides tools for emulating and visualizing the pipeline-parallelism strategies used in large language model training.
+
+ ## Overview
+
+ Pipeline parallelism is a technique for training large models by partitioning the model across multiple devices and processing data in a pipelined fashion. This project allows you to:
+
+ - Simulate different pipeline-parallelism strategies (1F1B, Interleaved, Zero-Bubble, etc.)
+ - Visualize the execution schedule on multiple devices
+ - Compare the efficiency of different strategies
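For intuition about the efficiency comparison, the classic back-of-envelope bubble-ratio formula can be sketched as follows. This is an illustrative aid, not code from this repo, and it assumes uniform forward/backward times across stages:

```python
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle ('bubble') share of an ideal 1F1B pipeline: (p - 1) / (m + p - 1)."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# More microbatches shrink the bubble: 4 stages, 8 vs. 32 microbatches.
print(bubble_fraction(4, 8))   # ~0.273
print(bubble_fraction(4, 32))  # ~0.086
```

Schedules such as Zero-Bubble and DualPipe exist precisely to push this idle fraction further down than plain 1F1B allows.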
+ ## Features
+
+ - **Supported pipeline strategies**:
+   - 1F1B (One-Forward-One-Backward)
+   - Interleaved 1F1B
+   - Zero-Bubble 1F1B (ZB-1P)
+   - 1F1B with computation-communication overlap
+   - Interleaved 1F1B with computation-communication overlap
+   - DualPipe (bidirectional pipeline parallelism with full forward-backward overlap)
+
+ - **Visualization**:
+   - Interactive visualization dashboard using Plotly/Dash
+
+ - **Configuration**:
+   - Configurable simulation parameters through Hydra
+   - Customizable stage latency and communication costs
+
+ ## Installation
+
+ This project uses [uv](https://github.com/astral-sh/uv) for dependency management.
+
+ Set up `uv` if it is not already installed on your computer:
+ ```bash
+ # On macOS and Linux
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ ```
+
+ ## Running the Interactive Server
+
+ To visualize schedules interactively:
+
+ ```bash
+ uv run app.py
+ ```
+
+ This starts a Dash server on port 7860. Open `http://127.0.0.1:7860/` in your web browser.
+
+ You can then adjust parameters such as the number of devices, stages, microbatches, and operation times, and select different scheduling strategies to see the resulting pipeline visualization.
+
+ ## Running from the Command Line
+
+ ### Running the 1F1B strategy:
+ ```bash
+ uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8
+ ```
+ ![1f1b](assets/1f1b.png)
+
+ ### Running the interleaved strategy:
+ ```bash
+ uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8
+ ```
+ ![interleave](assets/interleave_1f1b.png)
+
+ ### Running the ZB-1P strategy:
+ ```bash
+ uv run python main.py strategy=zb1p num_devices=4 num_stages=4 num_batches=8
+ ```
+ ![zb1p](assets/zb1p.png)
+
+ ### Running the DualPipe strategy:
+ ```bash
+ uv run python main.py strategy=dualpipe num_devices=8 num_stages=8 num_batches=20
+ ```
+ ![dualpipe](assets/dualpipe.png)
+
+ ### Running the 1F1B-batch-overlap strategy:
+ ```bash
+ uv run python main.py strategy=1f1b_overlap num_devices=4 num_stages=4 num_batches=8
+ ```
+ ![1f1b_overlap](assets/1f1b_overlap.png)
+
+ ### Running the 1F1B-interleave-overlap strategy:
+ ```bash
+ uv run python main.py strategy=1f1b_interleave_overlap num_devices=4 num_stages=8 num_batches=8
+ ```
+ ![1f1b_interleave_overlap](assets/1f1b_interleave_overlap.png)
+
+ ## Configuration
+
+ The default configuration is in `conf/config.yaml`. You can override any parameter on the command line or create configuration groups for different scenarios.
+
+ ### Overriding Specific Parameters
+
+ You can override specific parameters at runtime:
+ ```bash
+ uv run python main.py op_times.forward=0.5 op_times.backward=1.0 num_batches=6
+ ```
+
+ Using DualPipe as an example, you can manually set different times for forward/backward/backward_D/backward_W/overlapped_forward_backward:
+ ```bash
+ uv run python main.py strategy=dualpipe num_devices=8 num_stages=8 num_batches=32 op_times.forward=1.0 op_times.backward=2.0 op_times.backward_D=1.0 op_times.backward_W=1.0 op_times.overlapped_forward_backward=2.5
+ ```
+
+ ### Using Different Configuration Files
+
+ You can use different configuration files with Hydra in several ways:
+
+ 1. Create multiple configuration files in the `conf` directory for different use cases:
+    ```
+    conf/
+    ├── config.yaml   # Default configuration
+    └── model_A.yaml  # Your own config, e.g. with stage-specific latency for performance projection
+    ```
+
+ 2. Run with your desired configuration using the `--config-name` flag:
+    ```bash
+    uv run python main.py --config-name=model_A
+    ```
+
+
+ ## Project Structure
+
+ ```
+ PP-Emulation/
+ ├── conf/                  # Hydra configuration files
+ │   └── config.yaml        # Default configuration
+ ├── src/                   # Source code
+ │   ├── __init__.py        # Package initialization
+ │   ├── execution_model.py # Schedule execution models
+ │   ├── strategies.py      # Pipeline parallelism strategies
+ │   └── visualizer.py      # Visualization utilities
+ ├── app.py                 # Interactive Dash app
+ ├── main.py                # Main command-line entry point
+ ├── pyproject.toml         # Project metadata and dependencies
+ └── README.md              # This file
+ ```
+
+ ## References
+
+ 1. _PipeDream: Fast and Efficient Pipeline Parallel DNN Training_. [arXiv](https://arxiv.org/abs/1806.03377)
+ 2. _Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM_. [arXiv](https://arxiv.org/abs/2104.04473)
+ 3. _Zero Bubble Pipeline Parallelism_. [arXiv](https://arxiv.org/abs/2401.10241)
+ 4. _Communication-Computation Overlap in MoE Training with 1F1B Pipeline Parallelism_. [blog](https://zhuanlan.zhihu.com/p/28463368206)
+
+ ## License
+
+ This project is licensed under the MIT License — see the LICENSE file for details.
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
app.py ADDED
@@ -0,0 +1,340 @@
+ import dash
+ import dash_bootstrap_components as dbc
+ from dash import dcc, html, Input, Output, State, callback_context
+ import plotly.graph_objects as go
+
+ from src.execution_model import ScheduleConfig, Schedule
+ from src.strategies import (
+     generate_1f1b_schedule,
+     generate_zero_bubble_1p_schedule,
+     generate_1f1b_overlap_schedule,
+     generate_1f1b_interleave_schedule,
+     generate_1f1b_interleave_overlap_schedule,
+     generate_dualpipe_schedule,
+ )
+ from src.visualizer import convert_schedule_to_visualization_format, create_pipeline_figure
+
+ STRATEGIES = {
+     "1f1b": generate_1f1b_schedule,
+     "zb1p": generate_zero_bubble_1p_schedule,
+     "1f1b_overlap": generate_1f1b_overlap_schedule,
+     "1f1b_interleave": generate_1f1b_interleave_schedule,
+     "1f1b_interleave_overlap": generate_1f1b_interleave_overlap_schedule,
+     "dualpipe": generate_dualpipe_schedule,
+ }
+
+ app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP], suppress_callback_exceptions=True)
+ app.title = "Pipeline Parallelism Schedule Visualizer"
+
+ # Initial default values
+ default_values = {
+     "num_devices": 4,
+     "num_stages": 8,
+     "num_batches": 16,
+     "p2p_latency": 0.0,
+     "op_time_forward": 1.0,
+     "op_time_backward_d": 1.0,
+     "op_time_backward_w": 1.0,
+     "op_time_backward": 2.0,
+     "strategy": "1f1b_interleave",
+     "op_time_overlapped_fwd_bwd": None,
+ }
+
+ # Define input groups using dbc components
+ basic_params_card = dbc.Card(
+     dbc.CardBody([
+         html.H5("Basic Parameters", className="card-title"),
+         html.Div([
+             dbc.Label("Number of Devices (GPUs):"),
+             dbc.Input(id='num_devices', type='number', value=default_values["num_devices"], min=1, step=1),
+         ], className="mb-3"),
+         html.Div([
+             dbc.Label("Number of Stages (Model Chunks):"),
+             dbc.Input(id='num_stages', type='number', value=default_values["num_stages"], min=1, step=1),
+         ], className="mb-3"),
+         html.Div([
+             dbc.Label("Number of Microbatches:"),
+             dbc.Input(id='num_batches', type='number', value=default_values["num_batches"], min=1, step=1),
+         ], className="mb-3"),
+         html.Div([
+             dbc.Label("P2P Latency (ms):"),
+             dbc.Input(id='p2p_latency', type='number', value=default_values["p2p_latency"], min=0, step=0.01),
+         ], className="mb-3"),
+     ])
+ )
+
+ scheduling_params_card = dbc.Card(
+     dbc.CardBody([
+         html.H5("Scheduling Parameters", className="card-title"),
+         html.Div([
+             dbc.Label("Scheduling Strategies:"),
+             dbc.Checklist(
+                 id='strategy-checklist',
+                 options=[{'label': k, 'value': k} for k in STRATEGIES.keys()],
+                 value=list(STRATEGIES.keys()),
+                 inline=False,
+             ),
+         ], className="mb-3"),
+     ])
+ )
+
+ timing_params_card = dbc.Card(
+     dbc.CardBody([
+         html.H5("Operation Timing (ms)", className="card-title"),
+         html.Div([
+             dbc.Label("Forward:"),
+             dbc.Input(id='op_time_forward', type='number', value=default_values["op_time_forward"], min=0.01, step=0.01),
+         ], className="mb-3"),
+         html.Div([
+             dbc.Label("Backward (Combined):"),
+             dbc.Input(id='op_time_backward', type='number', value=default_values["op_time_backward"], min=0.01, step=0.01),
+             dbc.FormText("Used when strategy does NOT require split backward."),
+         ], className="mb-3"),
+         html.Div([
+             dbc.Label("Backward D (Data Grad):"),
+             dbc.Input(id='op_time_backward_d', type='number', value=default_values["op_time_backward_d"], min=0.01, step=0.01),
+             dbc.FormText("Used when strategy requires split backward (e.g., ZB-1P, DualPipe)."),
+         ], className="mb-3"),
+         html.Div([
+             dbc.Label("Backward W (Weight Grad):"),
+             dbc.Input(id='op_time_backward_w', type='number', value=default_values["op_time_backward_w"], min=0.01, step=0.01),
+             dbc.FormText("Used when strategy requires split backward (e.g., ZB-1P, DualPipe)."),
+         ], className="mb-3"),
+         html.Div([
+             dbc.Label("Overlapped Forward+Backward:"),
+             dbc.Input(id='op_time_overlapped_fwd_bwd', type='number', placeholder="Optional: Defaults to Fwd + Bwd times", min=0.01, step=0.01, value=default_values["op_time_overlapped_fwd_bwd"]),
+             dbc.FormText("Specify a custom duration if Forward and Backward ops overlap completely."),
+         ], className="mb-3"),
+     ])
+ )
+
+ # App layout built from the dbc cards above
+ app.layout = dbc.Container([
+     html.H1("Pipeline Parallelism Schedule Visualizer", className="my-4 text-center"),
+
+     dbc.Row([
+         dbc.Col(basic_params_card, md=4),
+         dbc.Col(scheduling_params_card, md=4),
+         dbc.Col(timing_params_card, md=4),
+     ]),
+
+     dbc.Row([
+         dbc.Col([
+             dbc.Button('Generate Schedule', id='generate-button', n_clicks=0, color="primary", className="mt-4"),
+         ], className="text-center")
+     ]),
+
+     dbc.Row([
+         dbc.Col([
+             dcc.Loading(
+                 id="loading-graph-area",
+                 type="circle",
+                 children=html.Div(id='graph-output-container', className="mt-4")
+             )
+         ])
+     ])
+ ], fluid=True)
+
+ @app.callback(
+     Output('graph-output-container', 'children'),
+     Input('generate-button', 'n_clicks'),
+     State('num_devices', 'value'),
+     State('num_stages', 'value'),
+     State('num_batches', 'value'),
+     State('p2p_latency', 'value'),
+     State('op_time_forward', 'value'),
+     State('op_time_backward', 'value'),
+     State('op_time_backward_d', 'value'),
+     State('op_time_backward_w', 'value'),
+     State('op_time_overlapped_fwd_bwd', 'value'),
+     State('strategy-checklist', 'value'),
+     prevent_initial_call=True
+ )
+ def update_graph(n_clicks, num_devices, num_stages, num_batches, p2p_latency,
+                  op_time_forward, op_time_backward, op_time_backward_d, op_time_backward_w,
+                  op_time_overlapped_fwd_bwd,
+                  selected_strategies):
+
+     # Define the desired display order for strategies
+     strategy_display_order = ["1f1b", "1f1b_interleave", "1f1b_overlap", "1f1b_interleave_overlap", "dualpipe", "zb1p"]
+
+     output_components = []
+     valid_results = []  # Store (strategy_name, schedule, vis_data) for valid schedules
+     error_messages = []  # Store (strategy_name, error_message) for errors
+     automatic_adjustments = []  # Store messages about automatic parameter adjustments
+
+     if not selected_strategies:
+         return [dbc.Alert("Please select at least one scheduling strategy.", color="warning")]
+
+     if not all([num_devices, num_stages, num_batches, op_time_forward]):
+         return [dbc.Alert("Missing required basic input values (Devices, Stages, Batches, Forward Time).", color="danger")]
+
+     for strategy in selected_strategies:
+         error_message = ""
+         placement_strategy = ""
+
+         # Use local copies of params that might be adjusted for this strategy
+         current_num_stages = num_stages
+         current_num_devices = num_devices
+
+         # Apply automatic adjustments for dualpipe
+         if strategy == "dualpipe" and num_stages != num_devices:
+             current_num_stages = num_devices  # Force num_stages = num_devices for dualpipe
+             automatic_adjustments.append(
+                 f"Strategy '{strategy}': Number of Stages automatically adjusted to {num_devices} to match Number of Devices."
+             )
+
+         # Apply automatic adjustments for strategies that require num_stages == num_devices
+         if strategy in ["1f1b", "1f1b_overlap", "zb1p"] and num_stages != num_devices:
+             current_num_stages = num_devices
+             automatic_adjustments.append(
+                 f"Strategy '{strategy}': Number of Stages automatically adjusted to {num_devices} to match Number of Devices."
+             )
+
+         split_backward = strategy in ["zb1p", "dualpipe"]
+
+         if split_backward and not all([op_time_backward_d, op_time_backward_w]):
+             error_message = f"Strategy '{strategy}': Backward D and Backward W times are required."
+         elif not split_backward and not op_time_backward:
+             error_message = f"Strategy '{strategy}': Combined Backward time is required."
+
+         if not error_message:
+             if strategy in ["1f1b", "1f1b_overlap", "zb1p"]:
+                 placement_strategy = "standard"
+                 # No need to check num_stages == num_devices as we've enforced it above
+             elif strategy in ["1f1b_interleave", "1f1b_interleave_overlap"]:
+                 placement_strategy = "interleave"
+                 if current_num_stages % current_num_devices != 0:
+                     error_message = f"Strategy '{strategy}': Requires Number of Stages to be divisible by Number of Devices."
+             elif strategy == "dualpipe":
+                 placement_strategy = "dualpipe"
+                 if current_num_stages % 2 != 0:
+                     error_message = f"Strategy '{strategy}' (DualPipe): Requires an even number of stages."
+
+         # Create adjusted operation times based on placement strategy
+         if not error_message:
+             try:
+                 # Calculate number of stages per device for time adjustment
+                 stages_per_device = current_num_stages // current_num_devices
+
+                 # Calculate scaling factor - this normalizes operation time by stages per device.
+                 # For standard placement (1:1 stage:device mapping), this remains 1.0.
+                 # For interleaved placement, this scales down the time proportionally.
+                 time_scale_factor = 1.0 / stages_per_device if stages_per_device > 0 else 1.0
+
+                 if stages_per_device > 1:
+                     automatic_adjustments.append(
+                         f"Strategy '{strategy}': Operation times scaled by 1/{stages_per_device} to account for {stages_per_device} stages per device."
+                     )
+
+                 # Apply scaling to operation times
+                 op_times = {
+                     "forward": float(op_time_forward) * time_scale_factor
+                 }
+
+                 if split_backward:
+                     op_times["backward_D"] = float(op_time_backward_d) * time_scale_factor
+                     op_times["backward_W"] = float(op_time_backward_w) * time_scale_factor
+                     # Keep combined for compatibility
+                     op_times["backward"] = (float(op_time_backward_d) + float(op_time_backward_w)) * time_scale_factor
+                 else:
+                     op_times["backward"] = float(op_time_backward) * time_scale_factor
+
+                 if op_time_overlapped_fwd_bwd is not None:
+                     try:
+                         overlapped_val = float(op_time_overlapped_fwd_bwd)
+                         if overlapped_val > 0:
+                             # Scale overlapped time too
+                             op_times["overlapped_forward_backward"] = overlapped_val * time_scale_factor
+                     except (ValueError, TypeError):
+                         pass
+
+                 config = ScheduleConfig(
+                     num_devices=int(current_num_devices),
+                     num_stages=int(current_num_stages),  # Use adjusted value
+                     num_batches=int(num_batches),
+                     p2p_latency=float(p2p_latency),
+                     placement_strategy=placement_strategy,
+                     split_backward=split_backward,
+                     op_times=op_times,
+                 )
+
+                 schedule_func = STRATEGIES.get(strategy)
+                 if not schedule_func:
+                     raise ValueError(f"Invalid strategy function for: {strategy}")
+
+                 schedule = schedule_func(config)
+                 schedule.execute()
+
+                 # Store valid results instead of creating figure immediately
+                 vis_data = convert_schedule_to_visualization_format(schedule)
+                 valid_results.append((strategy, schedule, vis_data))
+
+             except (AssertionError, ValueError, TypeError) as e:
+                 error_message = f"Error generating schedule for '{strategy}': {e}"
+                 import traceback
+                 traceback.print_exc()
+             except Exception as e:
+                 error_message = f"An unexpected error occurred for '{strategy}': {e}"
+                 import traceback
+                 traceback.print_exc()
+
+         if error_message:
+             error_messages.append((strategy, error_message))
+
+     # Add alerts for any automatic parameter adjustments
+     for adjustment in automatic_adjustments:
+         output_components.append(
+             dbc.Alert(adjustment, color="info", dismissable=True)
+         )
+
+     # If we have valid results, calculate the maximum execution time across all schedules
+     if valid_results:
+         # Find global maximum execution time
+         max_execution_time = max(schedule.get_total_execution_time() for _, schedule, _ in valid_results)
+
+         # Sort valid results according to the display order
+         sorted_valid_results = []
+
+         # First add strategies in the predefined order
+         for strategy_name in strategy_display_order:
+             for result in valid_results:
+                 if result[0] == strategy_name:
+                     sorted_valid_results.append(result)
+
+         # Then add any remaining strategies that might not be in the predefined order
+         for result in valid_results:
+             if result[0] not in strategy_display_order:
+                 sorted_valid_results.append(result)
+
+         # Create figures with aligned x-axis, using the sorted results
+         for strategy, _, vis_data in sorted_valid_results:
+             fig = create_pipeline_figure(vis_data, max_time=max_execution_time, show_progress=False)
+
+             # Force the x-axis range to be the same for all figures.
+             # Add a small margin (5%) for better visualization.
+             margin = max_execution_time * 0.05
+             fig.update_layout(
+                 xaxis=dict(
+                     range=[0, max_execution_time + margin]
+                 )
+             )
+
+             output_components.append(html.Div([
+                 html.H4(f"Schedule: {strategy}", className="text-center mt-3 mb-2"),
+                 dcc.Graph(figure=fig)
+             ]))
+
+     # Add error messages to output
+     for strategy, msg in error_messages:
+         output_components.append(
+             dbc.Alert(msg, color="danger", className="mt-3")
+         )
+
+     return output_components
+
+ # For Hugging Face Spaces deployment
+ server = app.server
+
+ if __name__ == '__main__':
+     app.run_server(debug=False, host='0.0.0.0', port=7860)
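The per-device time scaling inside the callback can be summarized in isolation. This is a standalone sketch with a hypothetical helper name (`scaled_op_times` is not part of the repo), assuming operation time is normalized by the number of model chunks placed on each device:

```python
def scaled_op_times(forward: float, backward: float,
                    num_stages: int, num_devices: int) -> dict:
    """Normalize per-op time by chunks per device (1F1B interleaved placement)."""
    # With interleaving, each device hosts num_stages // num_devices chunks,
    # so the time of each chunk's op is the whole-device time divided by that count.
    chunks_per_device = num_stages // num_devices
    scale = 1.0 / chunks_per_device if chunks_per_device > 0 else 1.0
    return {"forward": forward * scale, "backward": backward * scale}

print(scaled_op_times(1.0, 2.0, 8, 4))  # {'forward': 0.5, 'backward': 1.0}
```

With a standard 1:1 stage-to-device placement the scale factor is 1.0 and the times pass through unchanged, matching the callback's behavior.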
conf/config.yaml ADDED
@@ -0,0 +1,25 @@
+ # Default configuration for Pipeline Parallelism Emulation
+ num_devices: 4
+ num_stages: 4
+ num_batches: 8
+ visualization_port: 8050
+ strategy: "1f1b"  # Options: "1f1b", "interleave", "zb1p", "1f1b_overlap", "1f1b_interleave_overlap", "dualpipe"
+ p2p_latency: 0.0
+
+ # Operation time configurations
+ op_times:
+   # Option 1: Simple configuration (same time for all stages)
+   forward: 1.0
+   backward: 2.0
+   backward_D: 1.0
+   backward_W: 1.0
+   overlapped_forward_backward: 3.0
+
+   # Option 2: Commented example of stage-specific configuration
+   # forward:
+   #   0: 0.8  # Stage 0 forward time
+   #   1: 1.2  # Stage 1 forward time
+   #   2: 1.5  # Stage 2 forward time
+   #   3: 0.9  # Stage 3 forward time
+   # backward:
+   #   0: 1.0  # Stage 0 backward time
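The two `op_times` shapes above (a scalar for all stages, or a per-stage mapping) can be resolved uniformly. A minimal sketch with a hypothetical helper (`stage_time` is not the repo's actual loader, whose behavior may differ):

```python
from typing import Dict, Union

# Either a single latency for every stage, or a {stage_id: latency} mapping.
StageTimes = Union[float, Dict[int, float]]

def stage_time(op_times: StageTimes, stage_id: int) -> float:
    """A plain float applies to every stage; a dict gives per-stage values."""
    if isinstance(op_times, dict):
        return op_times[stage_id]
    return op_times

print(stage_time(1.0, 2))               # 1.0  (Option 1: scalar)
print(stage_time({0: 0.8, 1: 1.2}, 1))  # 1.2  (Option 2: per-stage)
```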
main.py ADDED
@@ -0,0 +1,156 @@
+ from src.execution_model import ScheduleConfig
+ from src.strategies import (
+     generate_1f1b_interleave_overlap_schedule,
+     generate_1f1b_interleave_schedule,
+     generate_1f1b_overlap_schedule,
+     generate_1f1b_schedule,
+     generate_zero_bubble_1p_schedule,
+     generate_dualpipe_schedule,
+ )
+ from src.visualizer import visualize_pipeline_parallelism_dash
+ import hydra
+ from omegaconf import DictConfig, OmegaConf
+
+
+ @hydra.main(config_path="conf", config_name="config", version_base=None)
+ def main(cfg: DictConfig) -> None:
+     """Run pipeline parallelism simulation with the specified configuration."""
+     print(f"Running with configuration: {cfg}")
+
+     if cfg.strategy == "1f1b":
+         run_1f1b(cfg)
+     elif cfg.strategy == "interleave":
+         run_interleave(cfg)
+     elif cfg.strategy == "zb1p":
+         run_zero_bubble_1p(cfg)
+     elif cfg.strategy == "1f1b_overlap":
+         run_1f1b_overlap(cfg)
+     elif cfg.strategy == "1f1b_interleave_overlap":
+         run_1f1b_interleave_overlap(cfg)
+     elif cfg.strategy == "dualpipe":
+         run_dualpipe(cfg)
+     else:
+         raise ValueError(f"Unknown strategy: {cfg.strategy}")
+
+
+ def run_1f1b(cfg: DictConfig) -> None:
+     """Run 1F1B pipeline parallelism simulation."""
+     # Convert OmegaConf to dict for op_times if it exists
+     op_times = (
+         OmegaConf.to_container(cfg.op_times) if hasattr(cfg, "op_times") else None
+     )
+
+     schedule_config = ScheduleConfig(
+         num_devices=cfg.num_devices,
+         num_stages=cfg.num_stages,
+         num_batches=cfg.num_batches,
+         p2p_latency=cfg.p2p_latency,
+         op_times=op_times,
+         placement_strategy="standard",
+     )
+     schedule = generate_1f1b_schedule(schedule_config)
+     schedule.execute()
+
+     visualize_pipeline_parallelism_dash(schedule, port=cfg.visualization_port)
+
+
+ def run_interleave(cfg: DictConfig) -> None:
+     """Run interleaved pipeline parallelism simulation."""
+     # Convert OmegaConf to dict for op_times if it exists
+     op_times = (
+         OmegaConf.to_container(cfg.op_times) if hasattr(cfg, "op_times") else None
+     )
+
+     schedule_config = ScheduleConfig(
+         num_devices=cfg.num_devices,
+         num_stages=cfg.num_stages,
+         num_batches=cfg.num_batches,
+         p2p_latency=cfg.p2p_latency,
+         placement_strategy="interleave",
+         op_times=op_times,
+     )
+     schedule = generate_1f1b_interleave_schedule(schedule_config)
+     schedule.execute()
+     visualize_pipeline_parallelism_dash(schedule, port=cfg.visualization_port)
+
+
+ def run_zero_bubble_1p(cfg: DictConfig) -> None:
+     """Run zero bubble 1P pipeline parallelism simulation."""
+     # Convert OmegaConf to dict for op_times if it exists
+     op_times = (
+         OmegaConf.to_container(cfg.op_times) if hasattr(cfg, "op_times") else None
+     )
+
+     schedule_config = ScheduleConfig(
+         num_devices=cfg.num_devices,
+         num_stages=cfg.num_stages,
+         num_batches=cfg.num_batches,
+         p2p_latency=cfg.p2p_latency,
+         op_times=op_times,
+         split_backward=True,
+     )
+     schedule = generate_zero_bubble_1p_schedule(schedule_config)
+     schedule.execute()
+     visualize_pipeline_parallelism_dash(schedule, port=cfg.visualization_port)
+
+
+ def run_1f1b_overlap(cfg: DictConfig) -> None:
+     """Run 1F1B overlap pipeline parallelism simulation."""
+     # Convert OmegaConf to dict for op_times if it exists
+     op_times = (
+         OmegaConf.to_container(cfg.op_times) if hasattr(cfg, "op_times") else None
+     )
+
+     schedule_config = ScheduleConfig(
+         num_devices=cfg.num_devices,
+         num_stages=cfg.num_stages,
+         num_batches=cfg.num_batches,
+         p2p_latency=cfg.p2p_latency,
+         op_times=op_times,
+         split_backward=False,
+     )
+     schedule = generate_1f1b_overlap_schedule(schedule_config)
+     schedule.execute()
+     visualize_pipeline_parallelism_dash(schedule, port=cfg.visualization_port)
+
+
+ def run_1f1b_interleave_overlap(cfg: DictConfig) -> None:
+     """Run 1F1B interleave overlapped pipeline parallelism simulation."""
+     # Convert OmegaConf to dict for op_times if it exists
+     op_times = (
+         OmegaConf.to_container(cfg.op_times) if hasattr(cfg, "op_times") else None
+     )
+
+     schedule_config = ScheduleConfig(
+         num_devices=cfg.num_devices,
+         num_stages=cfg.num_stages,
+         num_batches=cfg.num_batches,
+         p2p_latency=cfg.p2p_latency,
+         placement_strategy="interleave",
+         op_times=op_times,
+     )
+     schedule = generate_1f1b_interleave_overlap_schedule(schedule_config)
+     schedule.execute()
+     visualize_pipeline_parallelism_dash(schedule, port=cfg.visualization_port)
+
+
+ def run_dualpipe(cfg: DictConfig) -> None:
+     """Run DualPipe pipeline parallelism simulation."""
+     # Convert OmegaConf to dict for op_times if it exists
+     op_times = (
+         OmegaConf.to_container(cfg.op_times) if hasattr(cfg, "op_times") else None
+     )
+
+     schedule_config = ScheduleConfig(
+         num_devices=cfg.num_devices,
+         num_stages=cfg.num_stages,
+         num_batches=cfg.num_batches,
+         p2p_latency=cfg.p2p_latency,
+         op_times=op_times,
+         split_backward=True,
+         placement_strategy="dualpipe",
+     )
+     schedule = generate_dualpipe_schedule(schedule_config)
+     schedule.execute()
+     visualize_pipeline_parallelism_dash(schedule, port=cfg.visualization_port)
+
+
+ if __name__ == "__main__":
+     main()
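The `if/elif` chain in `main()` above could equivalently be table-driven, which keeps the strategy list in one place. A sketch under hypothetical names (`make_dispatcher` is not part of the repo; the string values stand in for the `run_*` functions):

```python
def make_dispatcher(runners: dict):
    """Map a strategy name to its runner, with a clear error for unknown names."""
    def dispatch(strategy: str):
        try:
            return runners[strategy]
        except KeyError:
            # Mirrors the ValueError raised by the else branch in main()
            raise ValueError(f"Unknown strategy: {strategy}") from None
    return dispatch

dispatch = make_dispatcher({"1f1b": "run_1f1b", "interleave": "run_interleave"})
print(dispatch("1f1b"))  # run_1f1b
```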
pyproject.toml ADDED
@@ -0,0 +1,69 @@
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [project]
+ name = "pp-emulation"
+ version = "0.1.0"
+ description = "Pipeline Parallelism Emulation and Visualization"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ authors = [
+     {name = "Zhenhuan Liu"}
+ ]
+ classifiers = [
+     "Programming Language :: Python :: 3",
+     "License :: OSI Approved :: MIT License",
+     "Operating System :: OS Independent",
+ ]
+ dependencies = [
+     "dash>=2.14.0",
+     "hydra-core>=1.3.2",
+     "omegaconf>=2.3.0",
+     "plotly>=5.18.0",
+     "pandas>=2.1.0",
+     "numpy>=1.26.0",
+     "tqdm>=4.67.0",
+     "dash-bootstrap-components>=1.7.1",
+     "gunicorn>=23.0.0",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=7.4.0",
+     "black>=23.7.0",
+     "isort>=5.12.0",
+     "mypy>=1.5.1",
+ ]
+
+ # Hatch configuration to explicitly define where source code is located
+ [tool.hatch.build.targets.wheel]
+ packages = ["src"]
+
+ [tool.hatch.build.targets.sdist]
+ include = [
+     "src",
+     "main.py",
+     "conf",
+     "LICENSE",
+     "README.md",
+ ]
+
+ [tool.black]
+ line-length = 88
+ target-version = ["py310"]
+
+ [tool.isort]
+ profile = "black"
+ line_length = 88
+
+ [tool.mypy]
+ python_version = "3.10"
+ warn_return_any = true
+ warn_unused_configs = true
+ disallow_untyped_defs = true
+ disallow_incomplete_defs = true
+
+ # pytest only reads pyproject settings from [tool.pytest.ini_options]
+ [tool.pytest.ini_options]
+ testpaths = ["tests"]
+ pythonpath = ["."]
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ dash==2.14.2
+ dash-bootstrap-components==1.7.1
+ plotly==5.18.0
+ gunicorn==21.2.0
+ hydra-core==1.3.2
+ omegaconf==2.3.0
+ pandas==2.1.0
+ numpy==1.26.0
+ tqdm==4.67.0
src/__init__.py ADDED
@@ -0,0 +1,3 @@
+ """Pipeline Parallelism Emulation and Visualization package."""
+
+ __version__ = "0.1.0"
src/execution_model.py ADDED
@@ -0,0 +1,401 @@
+ from collections import defaultdict
+ from typing import Dict, List, Optional, Union
+
+
+ class Operation:
+     """Operation is a single operation in the pipeline."""
+
+     def __init__(self, batch_id: int, stage_id: int, op_type: str):
+         self.batch_id = batch_id
+         self.stage_id = stage_id
+         self.op_type = op_type
+         self.device_id = None
+
+         self.start_time = None
+         self.end_time = None
+
+     def set_end_time(self, end_time: float):
+         self.end_time = end_time
+
+     def set_start_time(self, start_time: float):
+         self.start_time = start_time
+
+     def __repr__(self) -> str:
+         return f"Operation(batch_id={self.batch_id}, stage_id={self.stage_id}, op_type={self.op_type})"
+
+
+ class OverlappedOperation:
+     """Represents multiple operations that are overlapped/executed concurrently."""
+
+     def __init__(self, operations: List[Operation]):
+         self.operations = operations
+         self.device_id = operations[0].device_id
+
+         # Validate all operations are on the same device
+         for op in operations:
+             assert op.device_id == self.device_id, "All operations must be on the same device"
+
+         # Create a combined op_type (e.g., "overlapped_forward_backward")
+         self.op_type = "overlapped_" + "_".join([op.op_type for op in operations])
+
+         # Use the batch_id and stage_id of the first operation for identification
+         # (though we'll track all operations internally)
+         self.batch_id = operations[0].batch_id
+         self.stage_id = operations[0].stage_id
+
+         # Initialize timing information
+         self.start_time = None
+         self.end_time = None
+
+     def set_end_time(self, end_time: float):
+         self.end_time = end_time
+         for op in self.operations:
+             op.set_end_time(end_time)
+
+     def set_start_time(self, start_time: float):
+         self.start_time = start_time
+         for op in self.operations:
+             op.set_start_time(start_time)
+
+     def __repr__(self) -> str:
+         op_str = ", ".join([f"({op.batch_id},{op.stage_id},{op.op_type})" for op in self.operations])
+         return f"OverlappedOperation([{op_str}])"
+
+
+ class DeviceQueue:
+     def __init__(self, stages: List[int], device_id: int):
+         self.stages = stages
+         self.device_id = device_id
+         self.ops = []  # List of operations
+
+     def add_operation(self, op: Operation):
+         assert op.stage_id in self.stages
+         self.ops.append(op)
+         assert op.device_id is None, f"Operation {op.batch_id}, {op.stage_id}, {op.op_type} already has a device id on {op.device_id}"
+         op.device_id = self.device_id
+
+
+ class ScheduleConfig:
+     def __init__(
+         self,
+         num_devices: int,
+         num_stages: int,
+         num_batches: int,
+         p2p_latency: float = 0.0,
+         placement_strategy: str = "standard",
+         split_backward: bool = False,
+         op_times: Optional[Dict[str, Union[float, Dict[int, float]]]] = None,
+     ):
+         self.num_devices = num_devices
+         self.num_stages = num_stages
+         self.num_batches = num_batches
+         self.p2p_latency = p2p_latency
+         self.placement_strategy = placement_strategy
+         self.split_backward = split_backward
+
+         # Initialize default operation times
+         if self.split_backward:
+             self.op_times = {
+                 "forward": 1.0,
+                 "backward_D": 1.0,
+                 "backward_W": 1.0,
+                 "backward": 2.0,
+             }
+         else:
+             self.op_times = {
+                 "forward": 1.0,
+                 "backward": 2.0,
+             }
+
+         # Update with user-provided operation times
+         if op_times:
+             for op_type, times in op_times.items():
+                 if isinstance(times, dict):
+                     # If a dict is provided, it maps stage_id -> time
+                     if op_type not in self.op_times:
+                         self.op_times[op_type] = {}
+                     elif not isinstance(self.op_times[op_type], dict):
+                         # Convert float to dict if needed
+                         self.op_times[op_type] = {i: self.op_times[op_type] for i in range(num_stages)}
+
+                     # Update with provided stage-specific times
+                     for stage_id, time in times.items():
+                         if not isinstance(self.op_times[op_type], dict):
+                             self.op_times[op_type] = {i: self.op_times[op_type] for i in range(num_stages)}
+                         self.op_times[op_type][stage_id] = time
+                 else:
+                     # If a float is provided, use same time for all stages
+                     self.op_times[op_type] = times
+
+         assert num_stages % num_devices == 0, "num_stages must be divisible by num_devices"
+         self.num_stages_per_device = num_stages // num_devices
+
+         self.init_device_to_stages()
+         if self.placement_strategy == "dualpipe":
+             assert (
+                 sum(len(stages) for stages in self.device_to_stages.values()) == num_stages * 2
+             )
+         else:
+             assert (
+                 sum(len(stages) for stages in self.device_to_stages.values()) == num_stages
+             )
+
+     def init_device_to_stages(self):
+         if self.placement_strategy == "standard":
+             # Evenly distributed
+             stages_per_device = self.num_stages // self.num_devices
+             self.device_to_stages = defaultdict(list)
+             for i in range(self.num_stages):
+                 device_to_put = i // stages_per_device
+                 self.device_to_stages[device_to_put].append(i)
+         elif self.placement_strategy == "interleave":
+             self.device_to_stages = defaultdict(list)
+             for i in range(self.num_stages):
+                 device_to_put = i % self.num_devices
+                 self.device_to_stages[device_to_put].append(i)
+         elif self.placement_strategy == "dualpipe":
+             # For DualPipe, each device has two stages
+             assert self.num_devices == self.num_stages, "DualPipe requires num_devices == num_stages"
+             assert self.num_devices % 2 == 0, "DualPipe requires an even number of devices"
+             self.device_to_stages = defaultdict(list)
+             for i in range(self.num_stages):
+                 self.device_to_stages[i] = [i, self.num_stages - i - 1]
+         else:
+             raise ValueError(f"Invalid placement strategy: {self.placement_strategy}")
+
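The three placement strategies map stages to devices quite differently (contiguous blocks, round-robin for virtual pipeline, and mirrored pairs for DualPipe). A minimal standalone sketch of the same mapping rules, using a hypothetical `place_stages` helper that is not part of the package:

```python
def place_stages(strategy: str, num_devices: int, num_stages: int) -> dict:
    """Sketch of the device -> stages mapping used by init_device_to_stages."""
    if strategy == "standard":  # contiguous blocks of stages per device
        per_dev = num_stages // num_devices
        return {d: [s for s in range(num_stages) if s // per_dev == d]
                for d in range(num_devices)}
    if strategy == "interleave":  # round-robin: stage i goes to device i % D
        return {d: [s for s in range(num_stages) if s % num_devices == d]
                for d in range(num_devices)}
    if strategy == "dualpipe":  # device i owns stage i and its mirror
        return {d: [d, num_stages - d - 1] for d in range(num_devices)}
    raise ValueError(strategy)

print(place_stages("standard", 2, 4))    # {0: [0, 1], 1: [2, 3]}
print(place_stages("interleave", 2, 4))  # {0: [0, 2], 1: [1, 3]}
print(place_stages("dualpipe", 4, 4))    # {0: [0, 3], 1: [1, 2], 2: [2, 1], 3: [3, 0]}
```

Note that under "dualpipe" every stage appears on two devices, which is why the config asserts a total of `num_stages * 2` placed stages for that strategy.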
+     def get_op_time(self, op_type: str, stage_id: int):
+         # For overlapped operations, extract the original operation types
+         if op_type.startswith("overlapped_"):
+             if op_type in self.op_times:
+                 if isinstance(self.op_times[op_type], dict):
+                     if stage_id in self.op_times[op_type]:
+                         return self.op_times[op_type][stage_id]
+                     else:
+                         raise ValueError(f"No time specified for operation {op_type} at stage {stage_id}")
+                 else:
+                     return self.op_times[op_type]
+             else:
+                 op_parts = op_type.split("_")[1:]
+                 if len(op_parts) >= 2:
+                     op_type1, op_type2 = op_parts[0], op_parts[1]
+                     return self.get_op_time(op_type1, stage_id) + self.get_op_time(op_type2, stage_id)
+
+         if op_type not in self.op_times:
+             raise ValueError(f"Invalid operation type: {op_type}")
+         times = self.op_times[op_type]
+         if isinstance(times, dict):
+             # If we have stage-specific times, use those
+             if stage_id not in times:
+                 raise ValueError(f"No time specified for operation {op_type} at stage {stage_id}")
+             return times[stage_id]
+         else:
+             # If we have a single float, use the same value for all stages
+             return times
+
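The `op_times` table accepts either a single float per op type (uniform across stages) or a per-stage dict. A condensed standalone sketch of that lookup rule, with hypothetical timing values:

```python
from typing import Dict, Union

# Hypothetical timing table mirroring ScheduleConfig.op_times:
# floats apply to every stage, dicts override per stage_id.
op_times: Dict[str, Union[float, Dict[int, float]]] = {
    "forward": 1.0,
    "backward": {0: 2.0, 1: 2.5, 2: 3.0, 3: 2.0},
}

def lookup(op_type: str, stage_id: int) -> float:
    times = op_times[op_type]
    if isinstance(times, dict):
        return times[stage_id]  # stage-specific override
    return times  # uniform time for all stages

print(lookup("forward", 3))   # 1.0 (uniform)
print(lookup("backward", 1))  # 2.5 (stage override)
```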
+
+ class Schedule:
+     def __init__(self, config: ScheduleConfig, init_ops: bool = True):
+         self.ops = {}  # (batch_id, stage_id, op_type) -> Operation
+         self.device_queues: List[DeviceQueue] = []
+         for dev_id in range(config.num_devices):
+             self.device_queues.append(DeviceQueue(config.device_to_stages[dev_id], dev_id))
+         self.config = config
+
+         if init_ops:
+             self.init_operations()
+         self.op_to_overlapped = {}
+
+     def register_overlapped_operation(self, overlapped_op: OverlappedOperation):
+         for op in overlapped_op.operations:
+             self.op_to_overlapped[(op.batch_id, op.stage_id, op.op_type)] = overlapped_op
+             self.ops[(op.batch_id, op.stage_id, op.op_type)] = overlapped_op
+
+     def register_operation(self, op: Operation):
+         assert (op.batch_id, op.stage_id, op.op_type) not in self.ops, f"Operation {op.batch_id}, {op.stage_id}, {op.op_type} already registered"
+         self.ops[(op.batch_id, op.stage_id, op.op_type)] = op
+
+     def init_operations(self):
+         op_types = ["forward", "backward"]
+         if self.config.split_backward:
+             op_types = ["forward", "backward_D", "backward_W"]
+         for batch_id in range(self.config.num_batches):
+             for stage_id in range(self.config.num_stages):
+                 for op_type in op_types:
+                     self.ops[(batch_id, stage_id, op_type)] = Operation(
+                         batch_id, stage_id, op_type
+                     )
+
+     def get_op(self, batch_id: int, stage_id: int, op_type: str, allow_none=False):
+         if (batch_id, stage_id, op_type) in self.op_to_overlapped:
+             return self.op_to_overlapped[(batch_id, stage_id, op_type)]
+         if allow_none:
+             if (batch_id, stage_id, op_type) not in self.ops:
+                 return None
+         return self.ops[(batch_id, stage_id, op_type)]
+
+     def get_dependencies(self, op: Operation, include_device_dependency=True):
+         deps = []
+         if isinstance(op, OverlappedOperation):
+             for sub_op in op.operations:
+                 deps.extend(self.get_dependencies(sub_op, include_device_dependency=False))
+
+             if include_device_dependency:
+                 device_index = self.device_queues[op.device_id].ops.index(op)
+                 if device_index > 0:
+                     deps.append((self.device_queues[op.device_id].ops[device_index - 1], 0.0))
+             return deps
+         if op.op_type == "forward":
+             if op.stage_id > 0:
+                 deps.append(
+                     (
+                         self.get_op(op.batch_id, op.stage_id - 1, "forward"),
+                         self.config.p2p_latency,
+                     )
+                 )
+         if self.config.split_backward:
+             if op.op_type == "backward_D":
+                 if op.stage_id < self.config.num_stages - 1:
+                     op_bwd_d = self.get_op(op.batch_id, op.stage_id + 1, "backward_D", allow_none=True)
+                     if op_bwd_d is not None:
+                         deps.append(
+                             (
+                                 op_bwd_d,
+                                 self.config.p2p_latency,
+                             )
+                         )
+                     else:
+                         deps.append(
+                             (
+                                 self.get_op(op.batch_id, op.stage_id + 1, "backward"),
+                                 self.config.p2p_latency,
+                             )
+                         )
+             elif op.op_type == "backward_W":
+                 if op.stage_id < self.config.num_stages - 1:
+                     op_bwd_d = self.get_op(op.batch_id, op.stage_id, "backward_D", allow_none=True)
+                     if op_bwd_d is not None:
+                         deps.append(
+                             (
+                                 op_bwd_d,
+                                 self.config.p2p_latency,
+                             )
+                         )
+                     else:
+                         deps.append(
+                             (
+                                 self.get_op(op.batch_id, op.stage_id, "backward"),
+                                 self.config.p2p_latency,
+                             )
+                         )
+             elif op.op_type == "backward":
+                 if op.stage_id < self.config.num_stages - 1:
+                     op_bwd = self.get_op(op.batch_id, op.stage_id + 1, "backward", allow_none=True)
+                     if op_bwd is not None:
+                         deps.append(
+                             (
+                                 op_bwd,
+                                 self.config.p2p_latency,
+                             )
+                         )
+                     else:
+                         deps.append(
+                             (
+                                 self.get_op(op.batch_id, op.stage_id + 1, "backward_D"),
+                                 self.config.p2p_latency,
+                             )
+                         )
+         else:
+             if op.op_type == "backward":
+                 if op.stage_id < self.config.num_stages - 1:
+                     deps.append(
+                         (
+                             self.get_op(op.batch_id, op.stage_id + 1, "backward"),
+                             self.config.p2p_latency,
+                         )
+                     )
+
+         if include_device_dependency:
+             device_index = self.device_queues[op.device_id].ops.index(op)
+             if device_index > 0:
+                 deps.append((self.device_queues[op.device_id].ops[device_index - 1], 0.0))
+         return deps
+
+     def show(self):
+         """Display detailed information about the schedule for debugging purposes."""
+         print("\n=== SCHEDULE DETAILS ===")
+         print(f"Devices: {self.config.num_devices}, Stages: {self.config.num_stages}, Batches: {self.config.num_batches}")
+         print(f"Placement Strategy: {self.config.placement_strategy}")
+         print("\n=== DEVICE QUEUES ===")
+
+         for dev_id in range(self.config.num_devices):
+             print(f"\nDEVICE {dev_id} (Stages: {self.device_queues[dev_id].stages}):")
+             print("-" * 80)
+             print(f"{'Batch':^6} | {'Stage':^6} | {'Type':^10} | {'Start':^10} | {'End':^10} | {'Duration':^10}")
+             print("-" * 80)
+
+             for op in self.device_queues[dev_id].ops:
+                 op_type = op.op_type
+                 start = f"{op.start_time:.2f}" if op.start_time is not None else "N/A"
+                 end = f"{op.end_time:.2f}" if op.end_time is not None else "N/A"
+
+                 duration = "N/A"
+                 if op.start_time is not None and op.end_time is not None:
+                     duration = f"{op.end_time - op.start_time:.2f}"
+
+                 print(f"{op.batch_id:^6} | {op.stage_id:^6} | {op_type:^10} | {start:^10} | {end:^10} | {duration:^10}")
+
+         # Find the total execution time (if timing info is available)
+         if all(op.end_time is not None for op in self.ops.values()):
+             total_time = max(op.end_time for op in self.ops.values())
+             print(f"\nTotal execution time: {total_time:.2f}")
+
+     def execute(self):
+         # TODO: change the execution order to topological order via DAG
+         def execute_op(op: Operation):
+             if op.end_time is not None:
+                 return
+             deps = self.get_dependencies(op)
+             if len(deps) == 0:
+                 op.set_start_time(0.0)
+             else:
+                 for dep, gap in deps:
+                     if dep.end_time is None or dep.start_time is None:
+                         execute_op(dep)
+                 op.set_start_time(max(dep.end_time + gap for dep, gap in deps))
+             op.set_end_time(op.start_time + self.config.get_op_time(
+                 op.op_type, op.stage_id
+             ))
+
+         op_num = len(self.device_queues[0].ops)
+         for i in range(op_num):
+             for dev_id in range(self.config.num_devices):
+                 if len(self.device_queues[dev_id].ops) <= i:
+                     continue
+                 op = self.device_queues[dev_id].ops[i]
+                 execute_op(op)
+
+         for op in self.ops.values():
+             assert (
+                 op.start_time is not None
+             ), f"op {op.batch_id}, {op.stage_id}, {op.op_type} has no start time"
+             assert (
+                 op.end_time is not None
+             ), f"op {op.batch_id}, {op.stage_id}, {op.op_type} has no end time"
+
+     def get_total_execution_time(self):
+         return max(op.end_time for op in self.ops.values())
+
+     def get_bubble_rate(self):
+         actual_time = self.get_total_execution_time()
+         ideal_time = 0
+         for stage_id in range(self.config.num_stages):
+             for op_type in ["forward", "backward"]:
+                 ideal_time += self.config.get_op_time(op_type, stage_id)
+         ideal_time = ideal_time * self.config.num_batches / self.config.num_devices
+
+         return (actual_time - ideal_time) / ideal_time
+
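`get_bubble_rate` compares the simulated makespan against the ideal fully packed per-device work. For plain 1F1B with uniform per-stage times, the makespan has the closed form `(m + p - 1) * (t_f + t_b)` for `p` devices and `m` microbatches, so the bubble rate reduces to `(p - 1) / m`. A quick standalone check of that arithmetic (hypothetical helper, not part of the package):

```python
def analytic_bubble_rate(p: int, m: int, t_f: float, t_b: float) -> float:
    # 1F1B with uniform per-stage times: makespan = (m + p - 1) * (t_f + t_b)
    actual = (m + p - 1) * (t_f + t_b)
    ideal = m * (t_f + t_b)  # perfectly packed work per device
    return (actual - ideal) / ideal

# 4 devices, 8 microbatches, forward 1.0 / backward 2.0 (the defaults above)
print(analytic_bubble_rate(4, 8, 1.0, 2.0))  # (p - 1) / m = 0.375
```

This analytic value is a useful sanity check against the simulated `get_bubble_rate` output for the 1F1B schedule with default op times.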
+     def get_device_running_time(self):
+         device_time = [0] * self.config.num_devices
+         for dev_id in range(self.config.num_devices):
+             for op in self.device_queues[dev_id].ops:
+                 device_time[dev_id] += op.end_time - op.start_time
+         return device_time
src/strategies.py ADDED
@@ -0,0 +1,581 @@
+ from collections import defaultdict, deque
+ from src.execution_model import OverlappedOperation, Operation, Schedule, ScheduleConfig
+
+
+ def generate_1f1b_schedule(config: ScheduleConfig):
+     schedule = Schedule(config)
+
+     assert config.num_devices == config.num_stages, "num_devices must be equal to num_stages for 1F1B"
+
+     for i in range(config.num_devices):
+         fwd_batch_id = 0
+         bwd_batch_id = 0
+         cooldown_batches = warmup_batches = config.num_devices - i - 1
+         steady_batches = config.num_batches - warmup_batches
+
+         for _ in range(warmup_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(fwd_batch_id, i, "forward")
+             )
+             fwd_batch_id += 1
+
+         for _ in range(steady_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(fwd_batch_id, i, "forward")
+             )
+             fwd_batch_id += 1
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(bwd_batch_id, i, "backward")
+             )
+             bwd_batch_id += 1
+
+         for _ in range(cooldown_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(bwd_batch_id, i, "backward")
+             )
+             bwd_batch_id += 1
+
+     return schedule
+
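The 1F1B phase counts above follow a simple pattern: device `i` of `p` runs `p - i - 1` warmup forwards, then `m - (p - i - 1)` steady 1F1B pairs, then the same number of cooldown backwards, so every device issues exactly `2 * m` operations. A standalone sketch of that bookkeeping (hypothetical helper, not part of the package):

```python
def one_f_one_b_phases(num_devices: int, num_batches: int):
    # Per-device (warmup, steady, cooldown) microbatch counts, matching the
    # loop structure of generate_1f1b_schedule.
    phases = []
    for i in range(num_devices):
        warmup = cooldown = num_devices - i - 1
        steady = num_batches - warmup
        phases.append((warmup, steady, cooldown))
    return phases

phases = one_f_one_b_phases(4, 8)
print(phases)  # [(3, 5, 3), (2, 6, 2), (1, 7, 1), (0, 8, 0)]
# Every device issues exactly 2 * num_batches ops (m forwards + m backwards).
assert all(w + 2 * s + c == 16 for w, s, c in phases)
```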
+ def generate_zero_bubble_1p_schedule(config: ScheduleConfig):
+     # Requires a config with split_backward=True so that backward_D and backward_W operations exist
+     schedule = Schedule(config)
+     total_batches = config.num_batches
+     assert config.num_devices == config.num_stages, "num_devices must be equal to num_stages for ZB-1P"
+     assert config.split_backward, "ZB-1P requires split_backward=True"
+
+     for i in range(config.num_devices):
+         fwd_batch_id = 0
+         bwd_d_batch_id = 0
+         bwd_w_batch_id = 0
+
+         cooldown_batches = warmup_batches = config.num_devices - i - 1
+         steady_batches = total_batches - warmup_batches
+
+         for _ in range(warmup_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(fwd_batch_id, i, "forward")
+             )
+             fwd_batch_id += 1
+
+         for _ in range(steady_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(fwd_batch_id, i, "forward")
+             )
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(bwd_d_batch_id, i, "backward_D")
+             )
+             if fwd_batch_id - bwd_w_batch_id >= config.num_devices - 1:
+                 schedule.device_queues[i].add_operation(
+                     schedule.get_op(bwd_w_batch_id, i, "backward_W")
+                 )
+                 bwd_w_batch_id += 1
+             bwd_d_batch_id += 1
+             fwd_batch_id += 1
+
+         for _ in range(cooldown_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(bwd_d_batch_id, i, "backward_D")
+             )
+
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(bwd_w_batch_id, i, "backward_W")
+             )
+
+             bwd_w_batch_id += 1
+             bwd_d_batch_id += 1
+
+         while bwd_w_batch_id < total_batches:
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(bwd_w_batch_id, i, "backward_W")
+             )
+             bwd_w_batch_id += 1
+
+     return schedule
+
+
+ def generate_1f1b_overlap_schedule(config: ScheduleConfig):
+     schedule = Schedule(config)
+
+     assert config.num_devices == config.num_stages, "num_devices must be equal to num_stages for 1F1B"
+
+     for i in range(config.num_devices):
+         fwd_batch_id = 0
+         bwd_batch_id = 0
+         cooldown_batches = warmup_batches = 2 * (config.num_devices - i - 1) + 1
+         steady_batches = config.num_batches - warmup_batches
+
+         for _ in range(warmup_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(fwd_batch_id, i, "forward")
+             )
+             fwd_batch_id += 1
+
+         for _ in range(steady_batches):
+             fwd_op = schedule.get_op(fwd_batch_id, i, "forward")
+             bwd_op = schedule.get_op(bwd_batch_id, i, "backward")
+             overlapped_op = OverlappedOperation([fwd_op, bwd_op])
+             schedule.register_overlapped_operation(overlapped_op)
+             schedule.device_queues[i].add_operation(overlapped_op)
+
+             fwd_batch_id += 1
+             bwd_batch_id += 1
+
+         for _ in range(cooldown_batches):
+             schedule.device_queues[i].add_operation(
+                 schedule.get_op(bwd_batch_id, i, "backward")
+             )
+             bwd_batch_id += 1
+
+     return schedule
+
+
+ def _get_pp_rank_microbatches(
+     num_microbatches,
+     num_devices,
+     device_id,
+     num_stages_per_device,
+     microbatch_group_size_per_vp_stage,
+ ):
+     """Get the number of warmup microbatches for a pipeline-parallel rank."""
+     total_num_microbatches = num_microbatches * num_stages_per_device
+
+     if num_devices > 1:
+         # Run (num_model_chunks-1)*microbatch_group_size_per_vp_stage on
+         # all workers, followed by more microbatches after depending on
+         # stage ID (more forward passes for earlier stages, later stages can
+         # immediately start with 1F1B).
+         num_warmup_microbatches = (num_devices - device_id - 1) * 2
+         num_warmup_microbatches += (num_stages_per_device - 1) * microbatch_group_size_per_vp_stage
+     else:
+         # forward_backward_no_pipelining
+         num_warmup_microbatches = 1
+
+     if num_warmup_microbatches >= total_num_microbatches:
+         num_warmup_microbatches = total_num_microbatches
+
+     return num_warmup_microbatches
+
+
+ def _get_schedule_table(num_microbatches, num_model_chunks, microbatch_group_size_per_vp_stage):
+     """Get the schedule table for PP scheduling.
+
+     Create a tunable schedule lookup table.
+     The schedule lookup table uses the virtual_microbatch_id to find the corresponding microbatch_id and model_chunk_id.
+     For example, the tunable schedule table for PP2 N3M5 with VP2 is constructed as below:
+     virtual_microbatch_id | 0 1 2 3 4 5 6 7 8 9
+     microbatch_id         | 0 1 2 0 1 2 3 4 3 4
+     model_chunk_id        | 0 0 0 1 1 1 0 0 1 1
+     """
+     schedule_table = []
+     for min_microbatch_id_in_group in range(
+         0, num_microbatches, microbatch_group_size_per_vp_stage
+     ):
+         if min_microbatch_id_in_group + microbatch_group_size_per_vp_stage >= num_microbatches:
+             # Construct schedule for the last microbatch group
+             schedule_table.extend(
+                 [
+                     (microbatch_id, model_chunk_id)
+                     for model_chunk_id in range(num_model_chunks)
+                     for microbatch_id in range(min_microbatch_id_in_group, num_microbatches)
+                 ]
+             )
+         else:
+             # Construct schedule for other microbatch groups
+             schedule_table.extend(
+                 [
+                     (microbatch_id, model_chunk_id)
+                     for model_chunk_id in range(num_model_chunks)
+                     for microbatch_id in range(
+                         min_microbatch_id_in_group,
+                         min_microbatch_id_in_group + microbatch_group_size_per_vp_stage,
+                     )
+                 ]
+             )
+     return schedule_table
+
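The table construction can be checked against the docstring's PP2 N3M5 VP2 example with a condensed standalone re-implementation (hypothetical helper; clamping the group's upper bound is equivalent to the original's separate last-group branch):

```python
def schedule_table(num_microbatches: int, num_model_chunks: int, group_size: int):
    # Condensed sketch of _get_schedule_table: within each microbatch group,
    # iterate model chunks in the outer loop and microbatches in the inner one.
    table = []
    for lo in range(0, num_microbatches, group_size):
        hi = min(lo + group_size, num_microbatches)  # clamp the last group
        table.extend(
            (mb, chunk)
            for chunk in range(num_model_chunks)
            for mb in range(lo, hi)
        )
    return table

# PP2 N3M5 with VP2: 5 microbatches, 2 model chunks, groups of 3
table = schedule_table(5, 2, 3)
print([mb for mb, _ in table])       # [0, 1, 2, 0, 1, 2, 3, 4, 3, 4]
print([chunk for _, chunk in table])  # [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
```

The two printed rows reproduce the `microbatch_id` and `model_chunk_id` rows of the docstring's lookup table.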
+
+ def _convert_schedule_table_to_order(num_warmup_microbatches, num_model_chunks, schedule_table):
+     """Convert a tunable schedule lookup table to the te.make_graphed_callables() accepted
+     order format. For example, the tunable schedule table for PP2 N3M5 with VP2 is as below:
+     virtual_microbatch_id | 0 1 2 3 4 5 6 7 8 9
+     microbatch_id         | 0 1 2 0 1 2 3 4 3 4
+     model_chunk_id        | 0 0 0 1 1 1 0 0 1 1
+
+     Then the forward backward separated order is:
+     forward  |  1  1  1  2  2  2  1  1  2  2
+     backward | -2 -2 -2 -1 -1 -1 -2 -2 -1 -1
+
+     If num_warmup_microbatches is 5, the output order is:
+     1 1 1 2 2 2 -2 1 -2 1 -2 2 -1 2 -1 -1 -2 -2 -1 -1
+     """
+     _, model_chunk_id_table = zip(*schedule_table)
+     forward_order = [chunk_id + 1 for chunk_id in model_chunk_id_table]
+     backward_order = [chunk_id - num_model_chunks for chunk_id in model_chunk_id_table]
+     order = forward_order[:num_warmup_microbatches]
+     for i in range(num_warmup_microbatches, len(forward_order)):
+         order.append(forward_order[i])
+         order.append(backward_order[i - num_warmup_microbatches])
+     if num_warmup_microbatches > 0:
+         order.extend(backward_order[-num_warmup_microbatches:])
+     return order
+
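The warmup-then-interleave conversion can be verified against the docstring's worked example with a standalone sketch (hypothetical helper mirroring the logic above):

```python
def interleave_order(chunk_ids, num_chunks, warmup):
    # Sketch of _convert_schedule_table_to_order: emit warmup forwards, then
    # alternate one forward with one backward, then drain remaining backwards.
    fwd = [c + 1 for c in chunk_ids]           # forward ids are 1-based chunk ids
    bwd = [c - num_chunks for c in chunk_ids]  # backward ids are negative
    order = fwd[:warmup]
    for i in range(warmup, len(fwd)):
        order += [fwd[i], bwd[i - warmup]]
    if warmup > 0:
        order += bwd[-warmup:]
    return order

# model_chunk_id row of the PP2 N3M5 VP2 table, with 5 warmup microbatches
order = interleave_order([0, 0, 0, 1, 1, 1, 0, 0, 1, 1], 2, 5)
print(order)
# [1, 1, 1, 2, 2, 2, -2, 1, -2, 1, -2, 2, -1, 2, -1, -1, -2, -2, -1, -1]
```

The printed sequence matches the docstring's expected output for that example.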
+ # Some codes are copied from Megatron-LM
+ def generate_1f1b_interleave_schedule(config: ScheduleConfig):
+     schedule = Schedule(config)
+
+     for device_id in range(config.num_devices):
+         microbatch_group_size_per_vp_stage = config.num_devices
+         num_warmup_microbatches = _get_pp_rank_microbatches(
+             config.num_batches,
+             config.num_devices,
+             device_id,
+             config.num_stages_per_device,
+             microbatch_group_size_per_vp_stage,
+         )
+
+         schedule_table = _get_schedule_table(
+             config.num_batches,
+             config.num_stages_per_device,
+             microbatch_group_size_per_vp_stage,
+         )
+
+         order = _convert_schedule_table_to_order(
+             num_warmup_microbatches,
+             num_model_chunks=config.num_stages_per_device,
+             schedule_table=schedule_table,
+         )
+
+         cur_stage_microbatch_id = {}
+         for i in range(1, config.num_stages_per_device + 1):
+             cur_stage_microbatch_id[i] = 0
+             cur_stage_microbatch_id[-i] = 0
+         for order_item in order:
+             stage_id = schedule.device_queues[device_id].stages[abs(order_item) - 1]
+
+             if order_item > 0:
+                 op_type = "forward"
+                 micro_batch_id = cur_stage_microbatch_id[order_item]
+                 cur_stage_microbatch_id[order_item] = cur_stage_microbatch_id[order_item] + 1
+             elif order_item < 0:
+                 op_type = "backward"
+                 micro_batch_id = cur_stage_microbatch_id[order_item]
+                 cur_stage_microbatch_id[order_item] = cur_stage_microbatch_id[order_item] + 1
+             else:
+                 raise ValueError(f"Invalid order item: {order_item}")
+             schedule.device_queues[device_id].add_operation(
+                 schedule.get_op(micro_batch_id, stage_id, op_type)
+             )
+     return schedule
+
+
+ def generate_1f1b_interleave_overlap_schedule(config: ScheduleConfig):
+     schedule = Schedule(config)
+
+     for device_id in range(config.num_devices):
+         microbatch_group_size_per_vp_stage = config.num_devices
+         num_warmup_microbatches = _get_pp_rank_microbatches(
+             config.num_batches,
+             config.num_devices,
+             device_id,
+             config.num_stages_per_device,
+             microbatch_group_size_per_vp_stage,
+         )
+
+         schedule_table = _get_schedule_table(
+             config.num_batches,
+             config.num_stages_per_device,
+             microbatch_group_size_per_vp_stage,
+         )
+
+         # NOTE: Add one more warmup microbatch for overlapped operations!
+         num_warmup_microbatches += 1
+         order = _convert_schedule_table_to_order(
+             num_warmup_microbatches,
+             num_model_chunks=config.num_stages_per_device,
+             schedule_table=schedule_table,
+         )
+
+         cur_stage_microbatch_id = {}
+         for i in range(1, config.num_stages_per_device + 1):
+             cur_stage_microbatch_id[i] = 0
+             cur_stage_microbatch_id[-i] = 0
+         i = 0
+
+         num_overlapped_batches = len(order) - num_warmup_microbatches * 2
+         while i < len(order):
+             if i < num_warmup_microbatches:
+                 order_item = order[i]
+                 assert order_item > 0
+                 op_type = "forward"
+                 micro_batch_id = cur_stage_microbatch_id[order_item]
+                 cur_stage_microbatch_id[order_item] = cur_stage_microbatch_id[order_item] + 1
+
+                 stage_id = schedule.device_queues[device_id].stages[abs(order_item) - 1]
+                 schedule.device_queues[device_id].add_operation(
+                     schedule.get_op(micro_batch_id, stage_id, op_type)
+                 )
+                 i += 1
+             elif i >= num_warmup_microbatches and i < num_warmup_microbatches + num_overlapped_batches - 1:
+                 order_item_a = order[i]
+                 order_item_b = order[i + 1]
+
+                 op_type_a = "forward" if order_item_a > 0 else "backward"
+                 micro_batch_id_a = cur_stage_microbatch_id[order_item_a]
+                 cur_stage_microbatch_id[order_item_a] = cur_stage_microbatch_id[order_item_a] + 1
+
+                 op_type_b = "forward" if order_item_b > 0 else "backward"
+                 micro_batch_id_b = cur_stage_microbatch_id[order_item_b]
+                 cur_stage_microbatch_id[order_item_b] = cur_stage_microbatch_id[order_item_b] + 1
+
+                 stage_id_a = schedule.device_queues[device_id].stages[abs(order_item_a) - 1]
+                 stage_id_b = schedule.device_queues[device_id].stages[abs(order_item_b) - 1]
+
+                 op_a = schedule.get_op(micro_batch_id_a, stage_id_a, op_type_a)
+                 op_b = schedule.get_op(micro_batch_id_b, stage_id_b, op_type_b)
+                 overlapped_op = OverlappedOperation([op_a, op_b])
+                 schedule.register_overlapped_operation(overlapped_op)
+                 schedule.device_queues[device_id].add_operation(overlapped_op)
+
+                 i += 2
+             else:
+                 assert i >= num_warmup_microbatches + num_overlapped_batches
+                 order_item = order[i]
+                 assert order_item < 0
+                 op_type = "backward"
+                 micro_batch_id = cur_stage_microbatch_id[order_item]
+                 cur_stage_microbatch_id[order_item] = cur_stage_microbatch_id[order_item] + 1
+
+                 stage_id = schedule.device_queues[device_id].stages[abs(order_item) - 1]
+                 schedule.device_queues[device_id].add_operation(
+                     schedule.get_op(micro_batch_id, stage_id, op_type)
+                 )
+                 i += 1
+
+     return schedule
+
+
+ def create_overlapped_ops(schedule, batch_id1, batch_id2, stage_id, type1, type2):
+     """
+     Helper function to create overlapped operations correctly.
+     This handles the underlying operation creation and registration to avoid device_id issues.
+     """
+     # Get the operations from the schedule
+     op1 = schedule.ops[(batch_id1, stage_id, type1)]
+     op2 = schedule.ops[(batch_id2, stage_id, type2)]
+
+     # Create the overlapped operation
+     overlapped_op = OverlappedOperation([op1, op2])
+
+     # Register in the schedule to ensure proper tracking
+     schedule.register_overlapped_operation(overlapped_op)
+
+     return overlapped_op
+
+
+ def generate_dualpipe_schedule(config: ScheduleConfig):
+     """
+     Implements the DualPipe scheduling strategy.
+
+     DualPipe is a bidirectional pipeline parallelism algorithm that achieves full overlap of forward
+     and backward computation-communication phases and reduces pipeline bubbles.
+
+     The DualPipe strategy has the following characteristics:
+     1. Requires placement_strategy="dualpipe" in ScheduleConfig (set automatically)
+     2. Each device handles both a forward stage and a reverse stage
+     3. Overlaps forward and backward operations to reduce bubble size
+     4. Assumes config.num_batches corresponds to half the total microbatches in the original paper (M).
+     5. Currently only supports split_backward=True.
+
+     Args:
+         config: The scheduling configuration
+
+     Returns:
+         A Schedule object with the DualPipe scheduling
+     """
+     # Ensure placement strategy is set for Schedule initialization
+     assert config.placement_strategy == "dualpipe", "DualPipe schedule currently only supports placement_strategy='dualpipe'"
+     # Assertions based on DualPipe requirements
+     assert config.num_stages % 2 == 0, "DualPipe requires an even number of stages (and devices)"
+     assert config.num_devices == config.num_stages, "DualPipe requires num_devices == num_stages"
+     assert config.num_batches % 2 == 0, "DualPipe requires an even number of microbatches (config.num_batches)"
+     # Assertion based on original implementation: num_chunks >= num_ranks * 2
+     # Here, M (config.num_batches) corresponds to half_num_chunks
+     assert config.num_batches >= config.num_devices, "DualPipe requires config.num_batches >= config.num_devices"
+     assert config.split_backward, "DualPipe schedule currently only supports split_backward=True"
+
+     schedule = Schedule(config, init_ops=False)
+
+     num_stages = config.num_stages
+     num_devices = config.num_devices
+     # config.num_batches is M in the original paper, which corresponds to half_num_chunks
+     half_num_chunks = config.num_batches // 2
+     num_half_ranks = num_devices // 2
+
+     fwd_batch_ids = defaultdict(int)  # (device_id, phase) -> batch_id
+     bwd_d_batch_ids = defaultdict(int)  # (device_id, phase) -> batch_id
+
+     waited_weight_grad = [deque() for _ in range(num_devices)]  # (device_id, ) -> List[(stage_id, batch_id)]
+
+     for device_id in range(num_devices):
+         is_in_second_half = device_id >= num_half_ranks
+         if is_in_second_half:
+             fwd_batch_ids[device_id, 1] = 0
+             fwd_batch_ids[device_id, 0] = config.num_batches // 2
+             bwd_d_batch_ids[device_id, 1] = 0
+             bwd_d_batch_ids[device_id, 0] = config.num_batches // 2
+         else:
+             fwd_batch_ids[device_id, 0] = 0
+             fwd_batch_ids[device_id, 1] = config.num_batches // 2
+             bwd_d_batch_ids[device_id, 0] = 0
+             bwd_d_batch_ids[device_id, 1] = config.num_batches // 2
+
+     def get_stage_for_phase(device_id, phase, num_stages, is_in_second_half):
435
+ stage_fwd_dir = device_id # Stage handled when moving forward (0 to N-1)
436
+ stage_rev_dir = num_stages - 1 - device_id # Stage handled when moving backward (N-1 to 0)
437
+ if not is_in_second_half:
438
+ # First half: phase 0 -> fwd_dir, phase 1 -> rev_dir
439
+ return stage_fwd_dir if phase == 0 else stage_rev_dir
440
+ else:
441
+ # Second half: phase 0 -> rev_dir, phase 1 -> fwd_dir
442
+ return stage_rev_dir if phase == 0 else stage_fwd_dir
443
+
444
+
445
+ def add_op_to_queue(device_id, stage_id, op_type, batch_id):
446
+ # Create a new Operation and register it with the schedule
447
+ op = Operation(batch_id, stage_id, op_type)
448
+ schedule.register_operation(op)
449
+ # Add to the device queue
450
+ schedule.device_queues[device_id].add_operation(op)
451
+
452
+ def _schedule_forward_chunk(device_id, phase, is_in_second_half):
453
+ """Schedules a forward compute operation."""
454
+ stage_id = get_stage_for_phase(device_id, phase, num_stages, is_in_second_half)
455
+ batch_id = fwd_batch_ids[device_id, phase]
456
+ add_op_to_queue(device_id, stage_id, "forward", batch_id)
457
+ fwd_batch_ids[device_id, phase] += 1
458
+
459
+ def _schedule_backward_chunk(device_id, phase, is_in_second_half):
460
+ """Schedules a combined backward (backward_D + backward_W) compute operation."""
461
+ stage_id = get_stage_for_phase(device_id, phase, num_stages, is_in_second_half)
462
+ batch_id = bwd_d_batch_ids[device_id, phase]
463
+ add_op_to_queue(device_id, stage_id, "backward", batch_id)
464
+ bwd_d_batch_ids[device_id, phase] += 1
465
+
466
+ def _schedule_backward_input_chunk(device_id, phase, is_in_second_half):
467
+ """Schedules a backward_D compute operation."""
468
+ stage_id = get_stage_for_phase(device_id, phase, num_stages, is_in_second_half)
469
+ batch_id = bwd_d_batch_ids[device_id, phase]
470
+ add_op_to_queue(device_id, stage_id, "backward_D", batch_id)
471
+ bwd_d_batch_ids[device_id, phase] += 1
472
+ waited_weight_grad[device_id].append((stage_id, batch_id))
473
+
474
+ def _schedule_backward_weight_chunk(device_id):
475
+ """Schedules a backward_W compute operation."""
476
+ stage_id, batch_id = waited_weight_grad[device_id].popleft()
477
+ add_op_to_queue(device_id, stage_id, "backward_W", batch_id)
478
+
479
+ def _schedule_forward_backward_chunk(device_id, fwd_phase, bwd_phase, is_in_second_half):
480
+ """Schedules an overlapped forward and backward_D compute operation."""
481
+ fwd_stage_id = get_stage_for_phase(device_id, fwd_phase, num_stages, is_in_second_half)
482
+ bwd_stage_id = get_stage_for_phase(device_id, bwd_phase, num_stages, is_in_second_half)
483
+
484
+ fwd_batch_id = fwd_batch_ids[device_id, fwd_phase]
485
+
486
+ fwd_op = Operation(fwd_batch_id, fwd_stage_id, "forward")
487
+ schedule.register_operation(fwd_op)
488
+ fwd_batch_ids[device_id, fwd_phase] += 1
489
+
490
+ bwd_batch_id_d = bwd_d_batch_ids[device_id, bwd_phase]
491
+ bwd_op = Operation(bwd_batch_id_d, bwd_stage_id, "backward")
492
+ schedule.register_operation(bwd_op)
493
+ bwd_d_batch_ids[device_id, bwd_phase] += 1
494
+
495
+ # Create and register the overlapped operation
496
+ overlapped_op = OverlappedOperation([fwd_op, bwd_op])
497
+ schedule.register_overlapped_operation(overlapped_op)
498
+
499
+ # Add the overlapped operation to the queue
500
+ schedule.device_queues[device_id].add_operation(overlapped_op)
501
+
502
+
503
+ # Process each device (rank in original code)
504
+ for device_id in range(num_devices):
505
+ half_rank = min(device_id, num_devices - 1 - device_id)
506
+ is_in_second_half = device_id >= num_half_ranks
507
+ is_middle_rank = (device_id == num_half_ranks - 1) or (device_id == num_half_ranks)
508
+
509
+ # Map original steps to operation additions
510
+ # Step 1: nF0
511
+ step_1_count = (num_half_ranks - half_rank - 1) * 2
512
+ for _ in range(step_1_count):
513
+ _schedule_forward_chunk(device_id, 0, is_in_second_half) # F0
514
+
515
+ # Step 2: nF0F1
516
+ step_2_count = half_rank + 1
517
+ for i in range(step_2_count):
518
+ _schedule_forward_chunk(device_id, 0, is_in_second_half) # F0
519
+ _schedule_forward_chunk(device_id, 1, is_in_second_half) # F1
520
+
521
+ # Step 3: nB1W1F1
522
+ step_3_count = num_half_ranks - half_rank - 1
523
+ for _ in range(step_3_count):
524
+ _schedule_backward_input_chunk(device_id, 1, is_in_second_half) # B1_D
525
+ _schedule_backward_weight_chunk(device_id) # W1
526
+ _schedule_forward_chunk(device_id, 1, is_in_second_half) # F1
527
+
528
+ # Step 4 (Main step): nF0B1F1B0
529
+ step_4_count = half_num_chunks - num_devices + half_rank + 1
530
+ for i in range(step_4_count):
531
+ # Overlap F0 with the combined B1 backward
538
+ _schedule_forward_backward_chunk(device_id, 0, 1, is_in_second_half) # F0+B1
539
+
540
+ # Overlap F1 with the combined B0 backward
541
+ _schedule_forward_backward_chunk(device_id, 1, 0, is_in_second_half) # F1+B0
542
+
543
+ # Step 5: nB1F1B0
544
+ step_5_count = num_half_ranks - half_rank - 1
545
+ for _ in range(step_5_count):
546
+ _schedule_backward_chunk(device_id, 1, is_in_second_half) # B1_D + B1_W
547
+ _schedule_forward_backward_chunk(device_id, 1, 0, is_in_second_half) # F1+B0
548
+
549
+ # Step 6: nB1B0
550
+ step_6_count = half_rank + 1
551
+ enable_zb = False
552
+ for i in range(step_6_count):
553
+ if i == step_6_count // 2 and half_rank % 2 == 1:
554
+ enable_zb = True
555
+ if enable_zb:
556
+ _schedule_backward_input_chunk(device_id, 1, is_in_second_half)
557
+ else:
558
+ _schedule_backward_chunk(device_id, 1, is_in_second_half)
559
+ if i == step_6_count // 2 and half_rank % 2 == 0:
560
+ enable_zb = True
561
+ if enable_zb:
562
+ _schedule_backward_input_chunk(device_id, 0, is_in_second_half)
563
+ else:
564
+ _schedule_backward_chunk(device_id, 0, is_in_second_half)
565
+
566
+ # Step 7: nWB0
567
+ step_7_count = num_half_ranks - half_rank - 1
568
+ for _ in range(step_7_count):
569
+ _schedule_backward_weight_chunk(device_id) # W1 (use gradient from B1_D scheduled previously)
570
+ _schedule_backward_input_chunk(device_id, 0, is_in_second_half) # B0_D
571
+
572
+ # Step 8: nW
573
+ step_8_count = half_rank + 1
574
+ for _ in range(step_8_count):
575
+ # W0 uses gradients from B0_D scheduled in steps 4, 5, 6.
576
+ # W1 uses gradients from B1_D scheduled in steps 3, 4, 5, 6.
577
+ # The last W0 gradients correspond to B0_D from step 6 or 7.
578
+ _schedule_backward_weight_chunk(device_id) # W0 (use gradient from B0_D scheduled previously)
579
+
580
+ return schedule
581
+
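The bidirectional placement described in the docstring (each device owns one forward-direction stage and one reverse-direction stage, with the phase meaning flipped in the second half of the ranks) can be sketched standalone. This mirrors the `get_stage_for_phase` helper above; it is an illustrative extract, not part of the repository.

```python
def stage_for_phase(device_id: int, phase: int, num_stages: int) -> int:
    """Mirror of get_stage_for_phase: each device serves two pipeline stages."""
    stage_fwd = device_id                   # stage when traversing 0 -> N-1
    stage_rev = num_stages - 1 - device_id  # stage when traversing N-1 -> 0
    second_half = device_id >= num_stages // 2
    if not second_half:
        # First half of ranks: phase 0 is the forward-direction stage
        return stage_fwd if phase == 0 else stage_rev
    # Second half of ranks: the phase meaning is swapped
    return stage_rev if phase == 0 else stage_fwd

# With 8 stages, device 0 and device 7 both serve stages 0 and 7,
# but with opposite phase assignments.
```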
src/visualizer.py ADDED
@@ -0,0 +1,612 @@
1
+ import dash
2
+ from dash import dcc, html
3
+ from dash.dependencies import Input, Output
4
+ import plotly.graph_objects as go
5
+ from typing import List, Dict
6
+ from tqdm import tqdm
7
+ from functools import lru_cache
8
+ import webbrowser
9
+ from threading import Timer
10
+
11
+ from src.execution_model import Schedule, OverlappedOperation
12
+
13
+
14
+ def convert_schedule_to_visualization_format(schedule: Schedule):
15
+ """
16
+ Converts a Schedule object to the format needed for visualization.
17
+
18
+ Returns:
19
+ Dict[int, List[Dict]]: Dictionary mapping device_id to a list of operation dictionaries
20
+ """
21
+ # Make sure all operations have start and end times
22
+ for op in schedule.ops.values():
23
+ if op.start_time is None or op.end_time is None:
24
+ raise ValueError(
25
+ "Operations must have start and end times. Run ScheduleExecutor.execute() first."
26
+ )
27
+
28
+ visualization_data = {}
29
+
30
+ # Organize operations by device
31
+ for device_id, device_queue in enumerate(schedule.device_queues):
32
+ visualization_data[device_id] = []
33
+
34
+ for op in device_queue.ops:
35
+ # Handle both regular Operations and OverlappedOperations
36
+ if isinstance(op, OverlappedOperation):
37
+ visualization_data[device_id].append(
38
+ {
39
+ "type": op.op_type,
40
+ "batch": op.batch_id + 1, # +1 because batch_id is 0-indexed
41
+ "stage": op.stage_id,
42
+ "start_time": op.start_time,
43
+ "duration": op.end_time - op.start_time,
44
+ "is_overlapped": True,
45
+ "operations": [
46
+ {
47
+ "type": nested_op.op_type,
48
+ "batch": nested_op.batch_id + 1,
49
+ "stage": nested_op.stage_id
50
+ }
51
+ for nested_op in op.operations
52
+ ]
53
+ }
54
+ )
55
+ else:
56
+ visualization_data[device_id].append(
57
+ {
58
+ "type": op.op_type,
59
+ "batch": op.batch_id + 1, # +1 because batch_id is 0-indexed
60
+ "stage": op.stage_id,
61
+ "start_time": op.start_time,
62
+ "duration": op.end_time - op.start_time,
63
+ "is_overlapped": False
64
+ }
65
+ )
66
+
67
+ return visualization_data
68
+
69
+
70
+ # Cache the color calculation as it's repeatedly called with the same parameters
71
+ @lru_cache(maxsize=128)
72
+ def get_color(op_type: str, stage_id: int, num_devices: int):
73
+ # A more harmonious blue palette with low saturation and high brightness
74
+ forward_colors = [
75
+ "#0a5aff", # Intense blue
76
+ "#4c88ff", # Blue (deeper)
77
+ "#7aa7ff", # Medium blue
78
+ "#a8c5ff", # Soft blue
79
+ "#d6e4ff", # Very light blue
80
+ ]
81
+
82
+ # Orange palette for backward operations with low saturation and high brightness
83
+ backward_colors = [
84
+ "#f47b00", # Intense orange
85
+ "#ffa952", # Orange
86
+ "#ffc78e", # Light orange
87
+ "#ffe6cc", # Very light orange
88
+ ]
89
+
90
+ # Improved teal/turquoise palette with low saturation and high brightness
91
+ backward_d_colors = [
92
+ "#4dcccc", # Light teal
93
+ "#33b3b3", # Teal
94
+ "#009999", # Medium teal
95
+ "#008080", # Dark teal
96
+ ]
97
+
98
+ # Improved green palette with low saturation and high brightness
99
+ backward_w_colors = [
100
+ "#33b373", # Medium green
101
+ "#009959", # Forest green
102
+ "#008040", # Dark green
103
+ ]
104
+
105
+ virtual_stage = stage_id // num_devices
106
+
107
+ # If virtual_stage is beyond our color list, cycle through the colors
108
+ color_index = virtual_stage % len(forward_colors)
109
+
110
+ if op_type == "forward":
111
+ return forward_colors[color_index]
112
+ elif op_type == "backward":
113
+ return backward_colors[color_index % len(backward_colors)]
114
+ elif op_type == "backward_D":
115
+ return backward_d_colors[color_index % len(backward_d_colors)]
116
+ elif op_type == "backward_W":
117
+ return backward_w_colors[color_index % len(backward_w_colors)]
118
+ else:
119
+ raise ValueError(f"Invalid operation type: {op_type}")
120
+
121
+
122
+ def create_pipeline_figure(
123
+ schedule_data: Dict[int, List[Dict]], max_time=None, show_progress=True
124
+ ):
125
+ """
126
+ Create a Plotly figure for pipeline parallelism scheduling.
127
+
128
+ Args:
129
+ schedule_data: Dictionary mapping device IDs to lists of tasks (converted from Schedule)
130
+ max_time: Optional maximum time to display
131
+ show_progress: Whether to show a progress bar
132
+ """
133
+ # Find the number of devices
134
+ num_devices = len(schedule_data)
135
+
136
+ empty_color = "whitesmoke"
137
+
138
+ # Find the maximum time in the schedule if not provided
139
+ if max_time is None:
140
+ max_time = 0
141
+ for device in schedule_data:
142
+ for task in schedule_data[device]:
143
+ end_time = task["start_time"] + task["duration"]
144
+ if end_time > max_time:
145
+ max_time = end_time
146
+
147
+ # Determine maximum batch number to decide whether to show text labels
148
+ max_batch = 0
149
+ for device in schedule_data:
150
+ for task in schedule_data[device]:
151
+ max_batch = max(max_batch, task["batch"])
152
+
153
+ # Flag to determine whether to show text labels
154
+ num_operations_per_device = len(schedule_data[0])
155
+ show_text_labels = num_operations_per_device <= 64
156
+
157
+ # Create a figure
158
+ fig = go.Figure()
159
+
160
+ # Initialize progress tracking
161
+ total_tasks = sum(len(tasks) for tasks in schedule_data.values())
162
+ tasks_processed = 0
163
+
164
+ if show_progress:
165
+ progress_bar = tqdm(
166
+ total=total_tasks + num_devices + 3, desc="Creating visualization"
167
+ )
168
+
169
+ # Create a custom y-axis with no gaps between devices
170
+ y_spacing = 1.0 # Use 1.0 for no gaps
171
+
172
+ # Batch processing for increased performance
173
+ shapes = []
174
+ annotations = []
175
+ hover_traces = []
176
+
177
+ # Add rectangles for each task
178
+ for device_idx, device in enumerate(schedule_data):
179
+ device_idx_reversed = num_devices - device_idx - 1
180
+
181
+ # Sort tasks by start time to ensure correct rendering
182
+ sorted_tasks = sorted(schedule_data[device], key=lambda t: t["start_time"])
183
+
184
+ for task in sorted_tasks:
185
+ # Calculate y positions with no gaps
186
+ y_pos = device_idx_reversed * y_spacing
187
+ start_time = task["start_time"]
188
+ duration = task["duration"]
189
+
190
+ # Special handling for overlapped operations
191
+ if task.get("is_overlapped", False) and "operations" in task:
192
+ # Prepare hover text for the entire overlapped operation
193
+ op_details = "<br>".join([
194
+ f"- {op['type']} (Batch {op['batch']}, Stage {op['stage']})"
195
+ for op in task["operations"]
196
+ ])
197
+ hover_text = (
198
+ f"Overlapped Operations:<br>{op_details}<br>"
199
+ f"Start: {task['start_time']:.2f}<br>"
200
+ f"End: {task['start_time'] + task['duration']:.2f}<br>"
201
+ f"Duration: {task['duration']:.2f}"
202
+ )
203
+
204
+ # Add invisible marker for hover info
205
+ hover_traces.append(
206
+ dict(
207
+ x=[start_time + duration / 2],
208
+ y=[y_pos],
209
+ mode="markers",
210
+ marker=dict(opacity=0), # Invisible marker
211
+ hoverinfo="text",
212
+ text=hover_text,
213
+ showlegend=False,
214
+ )
215
+ )
216
+
217
+ # Calculate height of each sub-operation
218
+ sub_height = 1.0 / len(task["operations"])
219
+
220
+ # Add rectangles and annotations for each sub-operation
221
+ for i, sub_op in enumerate(task["operations"]):
222
+ # Determine color for this sub-operation
223
+ color = get_color(sub_op["type"], sub_op["stage"], num_devices)
224
+
225
+ # Calculate y position for this sub-operation
226
+ sub_y_pos_bottom = y_pos - 0.5 + (i * sub_height)
227
+ sub_y_pos_top = sub_y_pos_bottom + sub_height
228
+ sub_y_center = (sub_y_pos_bottom + sub_y_pos_top) / 2
229
+
230
+ # Add rectangle for this sub-operation
231
+ shapes.append(
232
+ dict(
233
+ type="rect",
234
+ x0=start_time,
235
+ y0=sub_y_pos_bottom,
236
+ x1=start_time + duration,
237
+ y1=sub_y_pos_top,
238
+ line=dict(color="black", width=0.5),
239
+ fillcolor=color,
240
+ layer="above",
241
+ )
242
+ )
243
+
244
+ # Add batch number text for this sub-operation only if show_text_labels is True
245
+ if show_text_labels:
246
+ # Determine text color based on background color
247
+ if sub_op["type"] in ["backward", "backward_D", "backward_W"]:
248
+ text_color = "black"
249
+ else:
250
+ text_color = "white"
251
+
252
+ annotations.append(
253
+ dict(
254
+ x=start_time + duration / 2,
255
+ y=sub_y_center,
256
+ text=f"{sub_op['batch']}",
257
+ showarrow=False,
258
+ font=dict(color=text_color, size=12, family="Arial, bold"),
259
+ )
260
+ )
261
+ else:
262
+ # Regular (non-overlapped) operation
263
+ # Determine task color and text color
264
+ if task["type"] == "forward":
265
+ color = get_color(task["type"], task["stage"], num_devices)
266
+ text_color = "white"
267
+ name = "Forward"
268
+ elif task["type"] == "backward":
269
+ color = get_color(task["type"], task["stage"], num_devices)
270
+ text_color = "black"
271
+ name = "Backward"
272
+ elif task["type"] == "backward_D":
273
+ color = get_color(task["type"], task["stage"], num_devices)
274
+ text_color = "black"
275
+ name = "Backward (Grad)"
276
+ elif task["type"] == "backward_W":
277
+ color = get_color(task["type"], task["stage"], num_devices)
278
+ text_color = "black"
279
+ name = "Backward (Weight)"
280
+ else:
281
+ color = empty_color
282
+ text_color = "black"
283
+ name = "Unknown"
284
+
285
+ # Add rectangle for the task
286
+ shapes.append(
287
+ dict(
288
+ type="rect",
289
+ x0=start_time,
290
+ y0=y_pos - 0.5,
291
+ x1=start_time + duration,
292
+ y1=y_pos + 0.5,
293
+ line=dict(color="black", width=0.5),
294
+ fillcolor=color,
295
+ layer="above",
296
+ )
297
+ )
298
+
299
+ # Add batch number text only if show_text_labels is True
300
+ if show_text_labels:
301
+ annotations.append(
302
+ dict(
303
+ x=start_time + duration / 2,
304
+ y=y_pos,
305
+ text=f"{task['batch']}",
306
+ showarrow=False,
307
+ font=dict(color=text_color, size=12, family="Arial, bold"),
308
+ )
309
+ )
310
+
311
+ # Prepare hover data
312
+ hover_text = (
313
+ f"Batch: {task['batch']}<br>"
314
+ f"Stage: {task['stage']}<br>"
315
+ f"Type: {name}<br>"
316
+ f"Start: {task['start_time']:.2f}<br>"
317
+ f"End: {task['start_time'] + task['duration']:.2f}<br>"
318
+ f"Duration: {task['duration']:.2f}"
319
+ )
320
+
321
+ hover_traces.append(
322
+ dict(
323
+ x=[start_time + duration / 2],
324
+ y=[y_pos],
325
+ mode="markers",
326
+ marker=dict(opacity=0), # Invisible marker
327
+ hoverinfo="text",
328
+ text=hover_text,
329
+ showlegend=False,
330
+ )
331
+ )
332
+
333
+ # Update progress
334
+ if show_progress:
335
+ tasks_processed += 1
336
+ progress_bar.update(1)
337
+
338
+ # Add all shapes at once for better performance
339
+ fig.update_layout(shapes=shapes)
340
+
341
+ # Add all annotations at once
342
+ fig.update_layout(annotations=annotations)
343
+
344
+ # Add all hover traces at once
345
+ for trace in hover_traces:
346
+ fig.add_trace(go.Scatter(**trace))
347
+
348
+ # Add custom legend
349
+ legend_items = []
350
+
351
+ # Find the maximum virtual stage in the data
352
+ max_virtual_stage = 0
353
+ for device in schedule_data:
354
+ for task in schedule_data[device]:
355
+ virtual_stage = task["stage"] // num_devices
356
+ max_virtual_stage = max(max_virtual_stage, virtual_stage)
357
+
358
+ # Check if overlapped operations exist
359
+ has_overlapped = any(
360
+ task.get("is_overlapped", False)
361
+ for device in schedule_data
362
+ for task in schedule_data[device]
363
+ )
364
+
365
+ # Add forward and backward items for each virtual stage
366
+ for vs in range(max_virtual_stage + 1):
367
+ legend_items.append(
368
+ dict(
369
+ name=f"Forward (VS {vs})",
370
+ color=get_color("forward", vs * num_devices, num_devices),
371
+ )
372
+ )
373
+ legend_items.append(
374
+ dict(
375
+ name=f"Backward (VS {vs})",
376
+ color=get_color("backward", vs * num_devices, num_devices),
377
+ )
378
+ )
379
+ # Add entries for split backward operations if this is a zb1p schedule
380
+ if any(
381
+ task["type"] in ["backward_D", "backward_W"]
382
+ for device in schedule_data
383
+ for task in schedule_data[device]
384
+ ):
385
+ legend_items.append(
386
+ dict(
387
+ name=f"Backward Grad (VS {vs})",
388
+ color=get_color("backward_D", vs * num_devices, num_devices),
389
+ )
390
+ )
391
+ legend_items.append(
392
+ dict(
393
+ name=f"Backward Weight (VS {vs})",
394
+ color=get_color("backward_W", vs * num_devices, num_devices),
395
+ )
396
+ )
397
+
398
+ # If no tasks found, add default legend items
399
+ if not legend_items:
400
+ legend_items = [
401
+ dict(name="Forward (VS 0)", color=get_color("forward", 0, num_devices)),
402
+ dict(name="Backward (VS 0)", color=get_color("backward", 0, num_devices)),
403
+ dict(
404
+ name="Backward Grad (VS 0)",
405
+ color=get_color("backward_D", 0, num_devices),
406
+ ),
407
+ dict(
408
+ name="Backward Weight (VS 0)",
409
+ color=get_color("backward_W", 0, num_devices),
410
+ ),
411
+ ]
412
+
413
+ for i, item in enumerate(legend_items):
414
+ fig.add_trace(
415
+ go.Scatter(
416
+ x=[None],
417
+ y=[None],
418
+ mode="markers",
419
+ marker=dict(size=10, color=item["color"]),
420
+ name=item["name"],
421
+ showlegend=True,
422
+ )
423
+ )
424
+ if show_progress and i < len(legend_items) - 1:
425
+ progress_bar.update(1)
426
+
427
+ # Set axis properties
428
+ device_labels = [f"Device {i+1}" for i in range(num_devices)]
429
+
430
+ # Calculate tick positions with no gaps
431
+ tick_positions = [(num_devices - i - 1) * y_spacing for i in range(num_devices)]
432
+
433
+ # Adjust the range to ensure there are no empty spaces at the end
434
+ x_end = max_time * 1.05 # Add a small margin
435
+
436
+ title_text = "Pipeline Parallelism Schedule"
437
+
438
+ fig.update_layout(
439
+ yaxis=dict(
440
+ tickmode="array",
441
+ tickvals=tick_positions,
442
+ ticktext=device_labels,
443
+ showgrid=False,
444
+ zeroline=False,
445
+ ),
446
+ margin=dict(l=50, r=20, t=40, b=40),
447
+ plot_bgcolor="white",
448
+ title=dict(
449
+ text=title_text,
450
+ x=0.5,
451
+ y=0.98, # Move title position closer to the top
452
+ font=dict(size=20),
453
+ ),
454
+ legend=dict(
455
+ orientation="v", # Changed from horizontal to vertical
456
+ yanchor="top",
457
+ y=1.02, # Position at the top
458
+ xanchor="right",
459
+ x=1.20, # Position further to the right to accommodate more items
460
+ title=dict(text="<b>Operation Types:</b>"),
461
+ itemsizing="constant",
462
+ tracegroupgap=0,
463
+ ),
464
+ width=2000, # Increase width to accommodate the expanded legend
465
+ height=400, # Maintain current height
466
+ bargap=0,
467
+ bargroupgap=0,
468
+ )
469
+
470
+ if show_progress:
471
+ progress_bar.update(1)
472
+ progress_bar.close()
473
+
474
+ return fig
475
+
476
+
477
+ # Cache for storing processed schedule data
478
+ _schedule_data_cache = {}
479
+
480
+
481
+ def create_dash_app(
482
+ schedule: Schedule, schedule_type="1f1b", enable_caching: bool = True
483
+ ):
484
+ """
485
+ Create a Dash app to visualize the pipeline schedule.
486
+
487
+ Args:
488
+ schedule: Schedule object to visualize
489
+ schedule_type: Type of schedule ("1f1b", "zb1p", or custom description)
490
+ enable_caching: Whether to cache the schedule data and figure
491
+ """
492
+ # Process schedule data only once and cache it
493
+ global _schedule_data_cache
494
+ cache_key = id(schedule)
495
+
496
+ if enable_caching and cache_key in _schedule_data_cache:
497
+ schedule_data = _schedule_data_cache[cache_key]
498
+ print("Using cached schedule data")
499
+ else:
500
+ schedule_data = convert_schedule_to_visualization_format(schedule)
501
+ if enable_caching:
502
+ _schedule_data_cache[cache_key] = schedule_data
503
+ print("Cached schedule data")
504
+
505
+ total_tasks = sum(len(tasks) for tasks in schedule_data.values())
506
+ print(f"Total tasks in schedule: {total_tasks}")
507
+
508
+ app = dash.Dash(__name__)
509
+ app.title = f"Pipeline Parallelism Visualization - {schedule_type}"
510
+
511
+ # Create a more informative layout with data size information
512
+ app.layout = html.Div(
513
+ [
514
+ html.H1(
515
+ f"Pipeline Parallelism Visualization - {schedule_type}",
516
+ style={"textAlign": "center"},
517
+ ),
518
+ html.Div(
519
+ [
520
+ html.P(
521
+ f"Number of devices: {len(schedule_data)}",
522
+ style={"display": "inline-block", "marginRight": "20px"},
523
+ ),
524
+ html.P(
525
+ f"Total tasks: {total_tasks}",
526
+ style={"display": "inline-block", "marginRight": "20px"},
527
+ ),
528
+ ],
529
+ style={"marginBottom": "20px"},
530
+ ),
531
+ html.Div(id="graph-container", children=[]),
532
+ dcc.Loading(
533
+ id="loading-graph",
534
+ type="circle",
535
+ children=[
536
+ dcc.Graph(
537
+ id="pipeline-graph",
538
+ config={
539
+ "displayModeBar": True,
540
+ "toImageButtonOptions": {
541
+ "format": "png",
542
+ "filename": "pipeline_visualization",
543
+ },
544
+ },
545
+ ),
546
+ ],
547
+ ),
548
+ ]
549
+ )
550
+
551
+ # Cache for storing figure to avoid regenerating it
552
+ figure_cache = {}
553
+
554
+ @app.callback(
555
+ Output("pipeline-graph", "figure"),
556
+ Input("graph-container", "children"),
557
+ prevent_initial_call=False,
558
+ )
559
+ def load_graph(_):
560
+ # Use cached figure if available
561
+ cache_key = f"{id(schedule)}"
562
+ if enable_caching and cache_key in figure_cache:
563
+ print("Using cached figure")
564
+ return figure_cache[cache_key]
565
+
566
+ # Create the figure
567
+ figure = create_pipeline_figure(schedule_data, show_progress=True)
568
+
569
+ # Cache the figure
570
+ if enable_caching:
571
+ figure_cache[cache_key] = figure
572
+ print("Cached figure")
573
+
574
+ return figure
575
+
576
+ return app
577
+
578
+
579
+ def visualize_pipeline_parallelism_dash(
580
+ schedule: Schedule,
581
+ port: int = 8050,
582
+ debug: bool = False,
583
+ enable_caching: bool = True,
584
+ schedule_type="1f1b",
585
+ open_browser: bool = True,
586
+ ):
587
+ """
588
+ Launch a Dash app to visualize the pipeline schedule interactively.
589
+
590
+ Args:
591
+ schedule: Schedule object to visualize
592
+ port: Port to run the Dash app on
593
+ debug: Whether to run the Dash app in debug mode
594
+ enable_caching: Whether to cache schedule data and figures
595
+ schedule_type: Type of schedule ("1f1b", "zb1p", or custom description)
596
+ open_browser: Whether to automatically open a browser window
597
+ """
598
+ app = create_dash_app(
599
+ schedule, schedule_type=schedule_type, enable_caching=enable_caching
600
+ )
601
+
602
+ # Define function to open browser after a short delay
603
+ def open_browser_tab():
604
+ webbrowser.open_new_tab(f"http://localhost:{port}/")
605
+
606
+ # Open browser automatically if requested
607
+ if open_browser:
608
+ # Use a timer to open the browser after the server has started
609
+ Timer(1.0, open_browser_tab).start()
610
+
611
+ print(f"Starting Dash app on http://localhost:{port}/")
612
+ app.run_server(debug=debug, port=port)
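The color selection in `get_color` keys off the virtual stage (`stage_id // num_devices`, reflecting interleaved stage placement) and cycles when the virtual stage exceeds the palette length. A minimal standalone sketch of that indexing (the palette size is a placeholder, not tied to the hex palettes above):

```python
def palette_index(stage_id: int, num_devices: int, palette_size: int) -> int:
    # Same indexing as get_color: with interleaved placement, stage_id // num_devices
    # identifies the virtual stage; wrap around when it exceeds the palette length.
    virtual_stage = stage_id // num_devices
    return virtual_stage % palette_size
```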