How to Containerise FramePack with CUDA 12.1 (and Fix Every Gotcha)
FramePack is an impressive video frame-packing toolbox, but its current repository targets an old pre-release of PyTorch 2.2. If you naïvely docker build, you hit four classic road-blocks:
1 - NumPy/SciPy compile for ages and the disk fills up: a --no-binary=:all: flag hidden in the requirements forces source builds.
2 - "operator torchvision::nms does not exist": torch 2.2.1 installed next to a torchvision 0.17.1 wheel from a different CUDA build.
3 - cudnn_sdp_enabled() missing: the API is named flash_sdp_enabled() in PyTorch 2.2.
4 - Gradio dependency hell: the repo pins gradio==5.23.0, which conflicts with fastapi>=0.110.
Below is a step-by-step recipe that solves them all and produces an image that builds in under 8 minutes, weighs roughly 3 GB, and boots straight into a Gradio UI on port 7860.
Final Dockerfile
FROM nvidia/cuda:12.1.1-base-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive \
PYTHONUNBUFFERED=1 \
UV_TORCH_BACKEND=cu121 \
GRADIO_SERVER_NAME=0.0.0.0
RUN apt-get update && apt-get install -y --no-install-recommends \
git python3-pip build-essential \
libgl1 libglib2.0-0 libsm6 libxrender1 libxext6 \
openssl libssl-dev ca-certificates \
&& rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install -U pip wheel uv
ENV TORCH_VER=2.2.1 \
TVISION_VER=0.17.1 \
TAUDIO_VER=2.2.1
RUN uv pip install --system --no-cache-dir \
--index-url https://download.pytorch.org/whl/cu121 \
torch==${TORCH_VER}+cu121 \
torchvision==${TVISION_VER}+cu121 \
torchaudio==${TAUDIO_VER}+cu121 \
xformers
RUN pip install --no-cache-dir \
numpy==1.26.4 \
"scipy>=1.12,<1.14"
WORKDIR /opt
RUN git clone --depth 1 https://github.com/lllyasviel/FramePack.git
WORKDIR /opt/FramePack
RUN sed -i 's/gradio==5\.23\.0/gradio>=5.25.2,<6/' requirements.txt \
&& pip install --no-cache-dir -r requirements.txt
RUN sed -i 's/cudnn_sdp_enabled/flash_sdp_enabled/' \
diffusers_helper/models/hunyuan_video_packed.py
EXPOSE 7860
CMD ["python3", "demo_gradio.py"]
Why Each Step Exists
Step 1 – OS layer
Only the essentials: compiler tool-chain, OpenGL/X11 libs for torchvision, and OpenSSL so pip can fetch wheels over HTTPS.
Step 2 – Python toolchain
uv downloads large wheels (torch, xformers…) in parallel and installs them into the system interpreter (--system) to keep the image tidy.
Step 3 – PyTorch stack
Pin the trio torch / torchvision / torchaudio to exactly the same CUDA tag (+cu121) so C++ operators such as nms load correctly.
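As an optional pre-build guard (not part of the image), the three pins can be checked for a consistent CUDA tag with a few lines of shell; the version strings below simply mirror the Dockerfile:

```shell
# Every wheel pin must carry the identical +cu121 suffix; a stray CPU or
# differently-tagged wheel is exactly what triggers the missing-nms error.
TAG=cu121
for pin in torch==2.2.1+cu121 torchvision==0.17.1+cu121 torchaudio==2.2.1+cu121; do
  case "$pin" in
    *+"$TAG") echo "ok: $pin" ;;
    *) echo "CUDA tag mismatch: $pin" >&2; exit 1 ;;
  esac
done
```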
Step 4 – Scientific wheels
Remove --no-binary; the official manylinux wheels work fine on NVIDIA images and install in seconds instead of compiling SciPy for 30 minutes.
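If you cannot edit the requirements upstream, the flag can also be stripped at build time with one sed line. The sample file below is a made-up stand-in for illustration, not FramePack's actual requirements:

```shell
# Stand-in requirements file containing the problematic flag (assumed layout).
printf -- '--no-binary=:all:\nnumpy==1.26.4\nscipy>=1.12,<1.14\n' > /tmp/req-sci.txt
# Delete any line that requests source builds, then install from the cleaned file.
sed -i '/--no-binary/d' /tmp/req-sci.txt
cat /tmp/req-sci.txt
```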
Step 5 – Clone FramePack
--depth 1 keeps the context small.
Step 6 – Gradio upgrade
FramePack’s old pin on gradio==5.23.0 conflicts with fastapi>=0.110. A one-line sed relaxes the pin to gradio>=5.25.2,<6 and pip resolves the rest.
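The Dockerfile's sed can be dry-run on a throwaway copy first; the two-line requirements below are an assumed sample, not FramePack's real file:

```shell
# Sample pins (assumed): the old gradio pin next to the fastapi floor it fights with.
printf 'gradio==5.23.0\nfastapi>=0.110\n' > /tmp/req-demo.txt
# Same substitution as the Dockerfile step: relax the exact pin to a range.
sed -i 's/gradio==5\.23\.0/gradio>=5.25.2,<6/' /tmp/req-demo.txt
cat /tmp/req-demo.txt
```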
Step 7 – Patch the renamed API
Rather than juggling sitecustomize.py, we surgically replace the obsolete cudnn_sdp_enabled() call with its new name, flash_sdp_enabled(), directly in the source. Zero runtime overhead, zero import-ordering issues.
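The same rename can be exercised against a stand-in file; the one-line call site inside it is an assumption about what the code looks like, the real target being diffusers_helper/models/hunyuan_video_packed.py in the clone:

```shell
# Minimal stand-in for a source file that still uses the old API name (assumed call site).
printf 'if torch.backends.cuda.cudnn_sdp_enabled():\n    use_sdp = True\n' > /tmp/patch-demo.py
# Rename the obsolete call in place, exactly as the Dockerfile step does.
sed -i 's/cudnn_sdp_enabled/flash_sdp_enabled/' /tmp/patch-demo.py
grep sdp /tmp/patch-demo.py
```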
Step 8 – Entrypoint
Gradio listens on 0.0.0.0:7860; RunPod forwards that port to the outside world.
Building & Running on RunPod
Clone this repo or copy the Dockerfile, then:
docker build -t framepack-cu121 .
On RunPod, choose “Custom Container”:
image: your-registry/framepack-cu121:latest
port : 7860
GPU : any card with CUDA 12.1 (L4, A10, A100…)
On launch you should see:
Torch 2.2.1+cu121 | CUDA 12.1 OK
Running on http://0.0.0.0:7860
Open the provided URL and start experimenting with FramePack’s video models.
Conclusion
By pinning compatible CUDA wheels, avoiding unnecessary compilation, and directly patching a one-line API change, we reduced a **40 min / 15 GB** build to a slick **8 min / ≈ 3 GB** image.
Use the template above as a blueprint for any PyTorch + CUDA project that ships with _just-old-enough_ dependencies: it will save you hours of debugging and gigabytes of disk space.
Happy hacking!