sebastiankamph

Turn Images into Video with SVD 1.1

Added 2024-02-13 13:10:11 +0000 UTC

SVD Image2Video. Stable Video Diffusion subscriber guide.

In this guide we’ll focus on getting Stable Video Diffusion 1.1 to work in ComfyUI. It accompanies the YouTube video found here: https://youtu.be/ue1qHBvlurA

If you need help installing ComfyUI, I recommend this video: https://youtu.be/KTPLOqAMR0s

1. DOWNLOADING AND INSTALLING STABLE VIDEO DIFFUSION 1.1

To run Stable Video Diffusion 1.1 locally, download the model from this link.

Press the download button marked above.

Place the downloaded SVD 1.1 model in the following folder, relative to the root of your ComfyUI installation:

ComfyUI/models/checkpoints
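If you prefer to script this step, here is a minimal sketch that moves the downloaded file into the checkpoints folder. The function name install_checkpoint and the example paths in the comment are hypothetical; adjust them to wherever your download and your ComfyUI folder actually live.

```python
from pathlib import Path
import shutil


def install_checkpoint(downloaded_file: Path, comfyui_root: Path) -> Path:
    """Move a downloaded model file into <comfyui_root>/models/checkpoints."""
    checkpoints_dir = comfyui_root / "models" / "checkpoints"
    # Create the folder if it does not exist yet.
    checkpoints_dir.mkdir(parents=True, exist_ok=True)
    target = checkpoints_dir / downloaded_file.name
    shutil.move(str(downloaded_file), str(target))
    return target


# Example call (hypothetical paths, not run here):
# install_checkpoint(Path.home() / "Downloads" / "svd_xt_1_1.safetensors",
#                    Path("ComfyUI"))
```

After moving the file, refresh or restart ComfyUI so the model shows up in the checkpoint dropdown.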

2. SETTING UP STABLE VIDEO DIFFUSION 1.1 IN COMFYUI

Comfy has released a base workflow to get you started with SVD. I recommend testing this workflow first before you delve deeper.

Download here: https://comfyanonymous.github.io/ComfyUI_examples/video/workflow_image_to_video.json

Drag & drop this into your ComfyUI:

Starting at the top left node (Image Only Checkpoint Loader), select svd_xt_1_1.safetensors from the dropdown menu:

Drag & drop an image into the Load Image box. For best results, use a 1024x576 image or another image with a 16:9 aspect ratio (the same aspect ratio as 1920x1080).
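To check whether an image already has the right shape, the arithmetic can be sketched as below. This is only an illustration: the helper names is_16_9 and center_crop_box are made up for this guide, and the crop box is just one reasonable way to get to 16:9 before resizing to 1024x576 in an image editor.

```python
from fractions import Fraction

TARGET_W, TARGET_H = 1024, 576  # size recommended in this guide (16:9)


def is_16_9(width: int, height: int) -> bool:
    """True if width:height is exactly 16:9."""
    return Fraction(width, height) == Fraction(16, 9)


def center_crop_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered 16:9 crop."""
    target = Fraction(TARGET_W, TARGET_H)
    if Fraction(width, height) > target:
        # Image is too wide: keep the full height and trim the sides.
        new_w = int(height * target)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is too tall (or already 16:9): keep the full width.
    new_h = int(width / target)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


print(is_16_9(1920, 1080))          # True
print(center_crop_box(1024, 1024))  # (0, 224, 1024, 800)
```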

VideoLinearCFGGuidance: This node gradually changes the CFG over the course of the video. The video starts at the CFG level set in this node (min_cfg) and then moves towards the CFG set in the KSampler. This helps with video quality. Leave at default if unsure.
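As a rough picture of what "gradually change" means, here is a minimal sketch that assumes the CFG is interpolated in a straight line from the value in this node on the first frame to the KSampler value on the last frame. The function name is hypothetical and the numbers 1.0 and 2.5 are example values only, not settings taken from this guide.

```python
def linear_cfg_schedule(min_cfg: float, cfg: float, num_frames: int) -> list:
    """Per-frame CFG values, moving linearly from min_cfg to cfg."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [float(min_cfg)]
    step = (cfg - min_cfg) / (num_frames - 1)
    return [min_cfg + i * step for i in range(num_frames)]


# Example values only: 25 frames, starting at 1.0 and ending at 2.5.
schedule = linear_cfg_schedule(1.0, 2.5, 25)
print(round(schedule[0], 3), round(schedule[12], 3), round(schedule[-1], 3))
```

So early frames get weaker guidance and later frames get stronger guidance, rather than one CFG value applied to the whole clip.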

Width & height: The size of your output video. 1024x576 is the default size this model was trained on, but other sizes can also be used.

video_frames: Sets how many frames, or images, make up the video. For best results, use the number your model was trained on, i.e. 25.

motion_bucket_id: Higher numbers mean more movement in the video. SVD 1.1 was trained on a motion bucket value of 127, so this is a good starting number. However, don’t be afraid to play around with it: aside from the seed, this number can influence your resulting animation the most.

fps: Frames per second. Higher numbers make the video smoother. Good values are 6, 12 and 25, but SVD 1.1 has been trained specifically on 6, so keep the default of 6 for now.
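The frame count and frame rate together decide how long the clip runs. A minimal sketch of that arithmetic, assuming the clip is played back at the given frame rate (the function name is made up for illustration):

```python
def clip_duration_seconds(video_frames: int, frame_rate: float) -> float:
    """Playback length of a clip: number of frames divided by frames per second."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return video_frames / frame_rate


# 25 frames at 6 fps is a little over four seconds of video.
print(round(clip_duration_seconds(25, 6), 2))   # 4.17
# The same 25 frames at 12 fps play back in about half that time.
print(round(clip_duration_seconds(25, 12), 2))  # 2.08
```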

augmentation level: This scales how much your video will change from your starting image. Under the hood, it is the amount of noise added to the input image, which gives Stable Video Diffusion more to generate from. Higher value = more change. Lower value = less change. Keep this value low or your image will start breaking.
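To make the "noise added to the input" idea concrete, here is a toy sketch. It assumes the augmentation is Gaussian noise multiplied by the augmentation level and added to each pixel value; the function name and the four-pixel "image" are made up for illustration, and the real node works on image tensors rather than Python lists.

```python
import random


def add_augmentation_noise(pixels: list, augmentation_level: float, seed: int = 0) -> list:
    """Add Gaussian noise, scaled by augmentation_level, to each pixel value."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, 1.0) * augmentation_level for p in pixels]


image = [0.2, 0.5, 0.8, 0.4]  # toy stand-in for image pixel values

# Level 0 leaves the image untouched; higher levels push it further away.
for level in (0.0, 0.1, 0.5):
    noisy = add_augmentation_noise(image, level)
    drift = sum(abs(a - b) for a, b in zip(image, noisy))
    print(level, round(drift, 3))
```

The total drift grows in proportion to the level, which is why high values quickly make the result stop resembling the starting image.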

The generation itself happens in the KSampler node.

Seed: The seed, or starting noise, used for the generation. If you want to test settings, set the seed to fixed so you can see the effect of your changes. If you want a new generation each time, make sure it’s set to randomize.
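Why a fixed seed makes settings comparable can be shown with a small stand-in. This sketch uses Python's random module in place of the real noise generator, so the function name and the numbers are purely illustrative: the same seed always gives the same starting noise, and a different seed gives different noise.

```python
import random


def starting_noise(seed: int, count: int = 4) -> list:
    """Toy stand-in for the sampler's starting noise: seeded Gaussian values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


# Fixed seed: identical noise on every run, so only your setting changes matter.
print(starting_noise(42) == starting_noise(42))  # True
# Different seed: different noise, so a different animation.
print(starting_noise(42) == starting_noise(43))  # False
```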

Steps: The number of sampling steps. 20-50 is a good range here. I usually use 20-30.

Cfg: This works together with the VideoLinearCFGGuidance node described earlier. It is the ending CFG. Leave at default if unsure.

Sampler_name: Your preferred sampler. The sampler is the tool that creates your images. You can test various ones, for example euler, euler_ancestral, or dpmpp_2m (with the karras scheduler). Euler has been tested to work very well with SVD.

Denoise: Similar to augmentation level, this controls how much is changed from your initial image. Leave at the default of 1.

Your finished video will be saved as a webp or gif. I recommend adding the VHS_VideoCombine node (a custom node that has to be installed from the manager), which will give you more format options.

Double-click on an empty part of the canvas and search for the VideoCombine node.

If you need to install VHS, search for VideoHelperSuite in the manager and install it.

Click on the format field for other options, such as h.264/mp4 or webm.

Frame_rate: Determines the final frame rate (or fps) of the resulting video.

Loop_count: How many times you want the animation to repeat within the resulting video.

Filename_prefix: Determines the name prefix of the resulting video. You can also use a backslash (\) to create subfolders (e.g. Cyberpunk\Video_) within your ComfyUI output folder.

Format: The type of file that you’re saving your animation to. The choice video/h264-mp4 covers most use cases.

Pingpong: Toggle to true if you want your animation to play back in reverse once it reaches the end.
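The pingpong idea can be sketched on a plain list of frames: play them forward, then append them again in reverse. This is a minimal illustration, not the node's own code. Skipping the first and last frame in the reversed half (so they are not shown twice in a row when the clip loops) is an assumption about how such an effect is usually built.

```python
def pingpong(frames: list) -> list:
    """Forward frames followed by the same frames in reverse.

    The reversed half leaves out the last and first frame so neither is
    repeated back to back when the result loops (an assumption, see above).
    """
    return frames + frames[-2:0:-1]


print(pingpong([0, 1, 2, 3]))  # [0, 1, 2, 3, 2, 1]
```

With 25 generated frames this gives 48 frames in total, so the clip runs almost twice as long at the same frame rate.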

Save_image: Whether or not files are automatically saved to the output folder. If set to false, you can save the animation by right-clicking on the animation and clicking “save preview”.

Crf: Constant rate factor, which determines the quality and file size of the output. Lower numbers give higher quality and larger files. A good default value is around 18-20. 0 means lossless.
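CRF is an encoder setting, so the same number means the same thing if you ever encode frames yourself with ffmpeg. The sketch below only builds an ffmpeg command line for H.264 and does not run it. It is not how the VideoCombine node calls the encoder; the function name and the file names frame_%05d.png and out.mp4 are hypothetical.

```python
def build_ffmpeg_command(input_pattern: str, output_path: str,
                         frame_rate: int, crf: int) -> list:
    """Build an ffmpeg argument list that encodes numbered frames to H.264."""
    if not 0 <= crf <= 51:
        # 0-51 is the CRF range of 8-bit libx264; 0 is lossless.
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg", "-y",
        "-framerate", str(frame_rate),  # input frame rate
        "-i", input_pattern,            # e.g. frame_00001.png, frame_00002.png, ...
        "-c:v", "libx264",              # H.264 encoder
        "-crf", str(crf),               # lower = better quality, bigger file
        "-pix_fmt", "yuv420p",          # widely compatible pixel format
        output_path,
    ]


print(" ".join(build_ffmpeg_command("frame_%05d.png", "out.mp4", 6, 19)))
```

To actually run it you would pass the list to subprocess.run, with ffmpeg installed and on your PATH.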

Save_metadata: Determines whether the ComfyUI information will be embedded into the animation file. If set to true, you can drag and drop the animation file into ComfyUI to call up the workflow and settings used.

Audio_file: Specify a file with which to add audio to your animation.