How to Containerise FramePack with CUDA 12.1 (and Fix Every Gotcha)
FramePack is an impressive video frame-packing toolbox, but its current repository targets an old pre-release of PyTorch 2.2. If you naïvely docker build, you hit four classic road-blocks:
1 - NumPy/SciPy compile for ages and the disk fills up: a --no-binary=:all: flag hidden in the requirements forces source builds.
2 - "operator torchvision::nms does not exist": torch 2.2.1 installed next to a torchvision 0.17.1 wheel from a different CUDA build.
3 - cudnn_sdp_enabled() missing: the API is named flash_sdp_enabled() in PyTorch 2.2.
4 - Gradio dependency hell: the repo pins gradio==5.23.0, which conflicts with fastapi>=0.110.
Below is a step-by-step recipe that solves them all and produces an image that builds in under 8 minutes, weighs roughly 3 GB, and boots straight into a Gradio UI on port 7860.
Final Dockerfile
FROM nvidia/cuda:12.1.1-base-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive \
PYTHONUNBUFFERED=1 \
UV_TORCH_BACKEND=cu121 \
GRADIO_SERVER_NAME=0.0.0.0
RUN apt-get update && apt-get install -y --no-install-recommends \
git python3-pip build-essential \
libgl1 libglib2.0-0 libsm6 libxrender1 libxext6 \
openssl libssl-dev ca-certificates \
&& rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install -U pip wheel uv
ENV TORCH_VER=2.2.1 \
TVISION_VER=0.17.1 \
TAUDIO_VER=2.2.1
RUN uv pip install --system --no-cache-dir \
--index-url https://download.pytorch.org/whl/cu121 \
torch==${TORCH_VER}+cu121 \
torchvision==${TVISION_VER}+cu121 \
torchaudio==${TAUDIO_VER}+cu121 \
xformers
RUN pip install --no-cache-dir \
numpy==1.26.4 \
"scipy>=1.12,<1.14"
WORKDIR /opt
RUN git clone --depth 1 https://github.com/lllyasviel/FramePack.git
WORKDIR /opt/FramePack
RUN sed -i 's/gradio==5\.23\.0/gradio>=5.25.2,<6/' requirements.txt \
&& pip install --no-cache-dir -r requirements.txt
RUN sed -i 's/cudnn_sdp_enabled/flash_sdp_enabled/' \
diffusers_helper/models/hunyuan_video_packed.py
EXPOSE 7860
CMD ["python3", "demo_gradio.py"]
Why Each Step Exists
Step 1 – OS layer
Only the essentials: compiler tool-chain, OpenGL/X11 libs for torchvision, and OpenSSL so pip can fetch wheels over HTTPS.
Step 2 – Python toolchain
uv downloads large wheels (torch, xformers…) in parallel and installs them into the system interpreter (--system) to keep the image tidy.
Step 3 – PyTorch stack
Pin the trio torch / torchvision / torchaudio to exactly the same CUDA tag (+cu121) so C++ operators such as nms load correctly.
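As an optional pre-build guard (not part of the image), the three pins can be checked for a consistent CUDA tag with a few lines of shell; the version strings below simply mirror the Dockerfile:

```shell
# Every wheel pin must carry the identical +cu121 suffix; a stray CPU or
# differently-tagged wheel is exactly what triggers the missing-nms error.
TAG=cu121
for pin in torch==2.2.1+cu121 torchvision==0.17.1+cu121 torchaudio==2.2.1+cu121; do
  case "$pin" in
    *+"$TAG") echo "ok: $pin" ;;
    *) echo "CUDA tag mismatch: $pin" >&2; exit 1 ;;
  esac
done
```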
Step 4 – Scientific wheels
Remove --no-binary; the official manylinux wheels work fine on NVIDIA images and install in seconds instead of compiling SciPy for 30 minutes.
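If you cannot edit the requirements upstream, the flag can also be stripped at build time with one sed line. The sample file below is a made-up stand-in for illustration, not FramePack's actual requirements:

```shell
# Stand-in requirements file containing the problematic flag (assumed layout).
printf -- '--no-binary=:all:\nnumpy==1.26.4\nscipy>=1.12,<1.14\n' > /tmp/req-sci.txt
# Delete any line that requests source builds, then install from the cleaned file.
sed -i '/--no-binary/d' /tmp/req-sci.txt
cat /tmp/req-sci.txt
```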
Step 5 – Clone FramePack
--depth 1 keeps the context small.
Step 6 – Gradio upgrade
FramePack’s old pin on gradio==5.23.0 conflicts with fastapi>=0.110. A one-line sed relaxes the pin to gradio>=5.25.2,<6 and pip resolves the rest.
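The Dockerfile's sed can be dry-run on a throwaway copy first; the two-line requirements below are an assumed sample, not FramePack's real file:

```shell
# Sample pins (assumed): the old gradio pin next to the fastapi floor it fights with.
printf 'gradio==5.23.0\nfastapi>=0.110\n' > /tmp/req-demo.txt
# Same substitution as the Dockerfile step: relax the exact pin to a range.
sed -i 's/gradio==5\.23\.0/gradio>=5.25.2,<6/' /tmp/req-demo.txt
cat /tmp/req-demo.txt
```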
Step 7 – Patch the renamed API
Rather than juggling sitecustomize.py, we surgically replace the obsolete cudnn_sdp_enabled() call with its new name, flash_sdp_enabled(), directly in the source. Zero runtime overhead, zero import-ordering issues.
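The same rename can be exercised against a stand-in file; the one-line call site inside it is an assumption about what the code looks like, the real target being diffusers_helper/models/hunyuan_video_packed.py in the clone:

```shell
# Minimal stand-in for a source file that still uses the old API name (assumed call site).
printf 'if torch.backends.cuda.cudnn_sdp_enabled():\n    use_sdp = True\n' > /tmp/patch-demo.py
# Rename the obsolete call in place, exactly as the Dockerfile step does.
sed -i 's/cudnn_sdp_enabled/flash_sdp_enabled/' /tmp/patch-demo.py
grep sdp /tmp/patch-demo.py
```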
Step 8 – Entrypoint
Gradio listens on 0.0.0.0:7860; RunPod forwards that port to the outside world.
Building & Running on RunPod
Clone this repo or copy the Dockerfile, then:
docker build -t framepack-cu121 .
On RunPod, choose “Custom Container”:
image: your-registry/framepack-cu121:latest
port : 7860
GPU : any card with CUDA 12.1 (L4, A10, A100…)
On launch you should see:
Torch 2.2.1+cu121 | CUDA 12.1 OK
Running on http://0.0.0.0:7860
Open the provided URL and start experimenting with FramePack’s video models.
Conclusion
By pinning compatible CUDA wheels, avoiding unnecessary compilation, and directly patching a one-line API change, we reduced a **40 min / 15 GB** build to a slick **8 min / ≈ 3 GB** image.
Use the template above as a blueprint for any PyTorch + CUDA project that ships with _just-old-enough_ dependencies: it will save you hours of debugging and gigabytes of disk space.
Happy hacking!