sebastiankamph

AI Glossary

Added 2024-10-09 08:13:50 +0000 UTC

4x-Ultrasharp - A popular upscaling model that upscales your image 4x, but can also be used for other sizes.

AI Upscaler - Software that uses AI models to increase image resolution. Unlike traditional methods, AI upscalers can add detail.

AnimateDiff - A tool that creates AI animations. It generates a sequence of coherent frames, allowing for motion in generated scenes. AnimateDiff can create anything from small movements to detailed character animations.

Automatic1111 - One of the first and most popular user interfaces for Stable Diffusion. It's a feature-rich web UI and offers extensive customization options, support for various models and extensions, and a user-friendly interface. Automatic1111 is popular among both beginners and advanced users.

Black Forest Labs - The creators of Flux. BFL is a company working with AI and machine learning research for image generation. A major part of the team was initially a part of Stability AI.

Checkpoint Model - A term to describe an AI model, such as Stable Diffusion or Flux. Historically, models were .ckpt filetypes and called checkpoints. The name refers to a save point taken during training: because training progresses in steps, a checkpoint is a snapshot of the model after a given number of steps.

Civitai - A popular platform for sharing visual AI models and resources. It hosts a wide range of custom models, LoRAs and other resources.

Classifier-Free Guidance (CFG) Scale - A parameter that controls the balance between prompt adherence and creative freedom. Higher CFG values make the output more closely match the prompt, while lower values allow for more variation.
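
As a rough sketch of what the scale does numerically: at each step the model predicts noise twice, once with the prompt and once without, and the CFG scale pushes the result away from the unconditional prediction. The function name and the plain-list inputs below are illustrative only, not any particular UI's code.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Blend unconditional and prompt-conditioned noise predictions."""
    # guided = uncond + scale * (cond - uncond)
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.0, 1.0]   # toy prediction without the prompt
cond = [1.0, 3.0]     # toy prediction with the prompt
print(apply_cfg(uncond, cond, 1.0))  # scale 1 reproduces the conditioned prediction
print(apply_cfg(uncond, cond, 7.0))  # a larger scale exaggerates the prompt's direction
```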

CLIP - Contrastive Language-Image Pretraining (CLIP) is a type of AI that learns by looking at images and their matching text description. This helps models like Stable Diffusion understand text and create images based on it. Some call CLIP a "text encoder," but that's only part of what it does; it also connects text with images.

ComfyUI - A node-based interface for AI models, offering granular control over the generation process. It allows users to create complex workflows by connecting various processing nodes. ComfyUI is favored by advanced users for its flexibility and power.

ControlNet - An extension that gives visual AI models precise control over pose, shape, or composition using reference images. ControlNet enables highly directed image generation and editing through ControlNet models.

DDIM - Denoising Diffusion Implicit Models (DDIM), a sampling technique for faster image generation. Think of it like taking shortcuts when creating an image, so the AI can make high-quality pictures in fewer steps. It doesn’t follow the usual, slower path and is popular because it works quickly and still makes great images.

Deforum - A tool for creating animations with Stable Diffusion. It allows for keyframing of prompts, settings, and camera movements to produce complex sequences. Deforum is popular for creating dream-like animations and visual effects.

Denoising Strength - A parameter controlling how much an input image is altered in img2img operations. Lower values preserve more of the original, while higher values allow for more drastic changes. It's crucial for balancing fidelity and creativity in image editing.
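
A hedged sketch of how strength often maps to the work done: some implementations (an assumption here, not something every UI guarantees) add noise to the input image and then run only about `steps * strength` of the denoising steps. The function name is hypothetical.

```python
def img2img_steps(num_steps, strength):
    """Return how many denoising steps actually run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Low strength: few steps, so the input image is mostly preserved.
    return min(int(num_steps * strength), num_steps)

for s in (0.0, 0.5, 1.0):
    print(s, img2img_steps(30, s))
```

Under this assumption, a strength of 1.0 behaves almost like generating from scratch, while a strength of 0.0 leaves the input untouched.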

Descriptive Prompting - Descriptive prompting is when you give an AI detailed and clear instructions about what you want. Instead of vague or general prompts, you provide specific descriptions of objects, settings, colors, styles, and other details to guide the AI in creating the most accurate result.

Diffusion - The core process in Stable Diffusion where random noise is gradually transformed into an image. It involves iteratively denoising data, guided by the learned model and input prompt. This process allows for the controlled generation of complex, high-quality images.

DPM Sampler - A DPM sampler (Diffusion Probabilistic Model sampler) is a method used by AI to generate images faster and more efficiently, typically needing fewer steps than basic samplers. One of my favourites!

Dreambooth - A tool for training Stable Diffusion models with a dataset of input images. It allows users to teach the model new people, concepts or styles in high quality. It is a more intensive process than LoRA training.

Embedding - A learned representation of concepts or styles in Stable Diffusion. Embeddings can be trained on specific images or text to introduce new elements into the model's vocabulary. They allow for quick fine-tuning and style control without modifying the entire model. Generally the quickest and lowest quality training.

Euler Sampler - A basic sampling method used in diffusion models. It's known for its speed but may produce lower quality results compared to more advanced methods. It is the default sampler for Flux.
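
For intuition, one Euler step in the noise-level (sigma) formulation used by k-diffusion-style samplers can be sketched as follows. The `toy_denoise` function is a purely hypothetical stand-in for the real model; here it always guesses that the clean value is 0.

```python
def euler_step(x, sigma, sigma_next, denoise):
    """Advance one sample value from noise level sigma to sigma_next."""
    denoised = denoise(x, sigma)          # model's guess at the clean value
    d = (x - denoised) / sigma            # slope: estimated noise direction
    return x + d * (sigma_next - sigma)   # straight-line (Euler) update

def toy_denoise(x, sigma):
    return 0.0  # hypothetical model: the clean "image" is zero

x = 10.0
sigmas = [10.0, 5.0, 1.0, 0.0]            # decreasing noise levels
for s, s_next in zip(sigmas, sigmas[1:]):
    x = euler_step(x, s, s_next, toy_denoise)
    print(s_next, x)
```

Each step follows the current slope in a straight line, which is why Euler is fast but can be less accurate than higher-order samplers.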

Face Restoration - Post-processing techniques to improve the quality of faces in generated images. These methods can enhance details, correct proportions and increase realism.

Fine-Tunes - AI models that have been further trained on specific datasets. This allows the model to specialize in certain styles or subjects. Fine-tuned models can produce more consistent and tailored results for specific use cases.

Flux - Flux is Black Forest Labs' visual AI model. Flux was released in August 2024 and at that time was one of the highest-quality models available, if not the highest.

Fooocus - A simplified interface for Stable Diffusion, designed for ease of use. It streamlines the image generation process by automating many settings and offering a clean, intuitive UI. Fooocus is ideal for users who want quick results without deep technical involvement.

Hires.fix - A technique to generate high-resolution images in two stages. It first creates a low-res version, then upscales and refines it with additional diffusion steps. This method can produce detailed, high-quality images while managing memory usage effectively. It was mainly popular for Stable Diffusion 1.5 models that were trained on lower resolution.

HuggingFace - A platform for sharing and collaborating on machine learning models. It hosts numerous AI models, datasets, and tools. HuggingFace has become a central hub for AI researchers and practitioners to access and contribute to the latest developments.

Hypernetwork - A small neural network used to modify the behavior of a larger network. Hypernetworks offer a lightweight method for customizing model outputs.

Img2Img, Image2Image, Image to Image - A technique that uses an existing image as a starting point for generation. It allows for guided modifications, style transfers, or variations of an original image. Img2img is fairly simple in comparison to more advanced tools like ControlNet.

Inpainting - The process of repainting specific parts of an image while keeping the rest unchanged. It's used for selective editing, object removal, or adding new elements to existing images. Inpainting is a key tool for precise image manipulation with AI.

Inference - The actual generation: running a trained model to produce an output, such as an image from a given prompt. The time this takes is called inference time.

InstantID - A method for quickly adding someone's face into generated images.

IP-adapter - A method for integrating image prompts into the generation process. It allows users to guide the output using reference images alongside text prompts.

KDiffusion - KDiffusion (k-diffusion) is an open-source library of sampling methods for diffusion models, used by many UIs to control how images are created step by step. In a diffusion model, the process starts with pure noise, and the AI works to "denoise" it, gradually refining the image until it becomes clear.

KSampler - KSampler is the ComfyUI node that runs the sampling process, using samplers such as those from KDiffusion. A sampler's job is to decide how to take those gradual steps when removing the noise and generating the image. The KSampler node is where you set the seed, steps, CFG, sampler and scheduler that determine the path the model follows during generation.

Kohya - A tool for training and fine-tuning AI models like Stable Diffusion and Flux. It offers advanced options for customizing the training process.

LAION-5B - A large dataset of image-text pairs used in training many AI models, including some versions of Stable Diffusion. It contains billions of diverse images with associated captions.

Latent Diffusion - The specific type of diffusion model used in Stable Diffusion. It operates in a compressed latent space rather than pixel space (the one we see), allowing for efficient processing of high-resolution images. Latent diffusion is key to Stable Diffusion's ability to generate detailed images quickly.

Latent Space - A compressed representation of images used by Stable Diffusion for efficient processing. It's a lower-dimensional space where the model performs its operations. Working in latent space allows for faster computation and more effective handling of high-resolution outputs.
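
As a worked example of the size difference, assuming the figures commonly cited for Stable Diffusion 1.5 (an 8x downscale per side and 4 latent channels; other models differ):

```python
def latent_shape(width, height, downscale=8, channels=4):
    """Shape of the latent tensor for an image, under the assumed VAE factors."""
    return (channels, height // downscale, width // downscale)

pixels = 3 * 512 * 512                 # RGB values in a 512x512 image
c, h, w = latent_shape(512, 512)
latents = c * h * w                    # values the diffusion model works on
print((c, h, w), pixels // latents)    # latent shape and compression ratio
```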

LoRA - Low-Rank Adaptation, a technique for fine-tuning AI models. A LoRA is generally used to train a model on a new character or style. LoRAs are popular due to the ease and speed of training as well as being a small separate file independent of the main model.
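
The small file size follows from the low-rank idea: instead of storing a full weight update, a LoRA stores two thin matrices whose product is the update. A sketch of the parameter arithmetic, with a layer size and rank chosen only for illustration:

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare parameters in a full weight update with a rank-r LoRA update."""
    full = d_out * d_in                 # a full update matrix, d_out x d_in
    lora = rank * (d_out + d_in)        # B is d_out x r, A is r x d_in
    return full, lora

full, lora = lora_param_counts(768, 768, rank=8)
print(full, lora, full // lora)         # full size, LoRA size, and the ratio
```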

LyCORIS - Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion - An extension of LoRA that allows for more flexible and powerful model customization. It offers additional training options and can capture more complex styles or concepts.

Negative Prompt - Text instructions specifying what should not appear in the generated image.

GGUF - GGUF is a file format used to store AI models in a way that makes them easier and faster for computers to work with. Think of it like a compact version of the AI's brain, optimized so that it runs faster and more efficiently on different devices.

Node - A node is like a building block or tool used in a user interface to create AI workflows. Each node performs a specific function or task. These nodes are connected together to form a workflow, which is a visual way of organizing how an AI model operates.

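Noise Schedule - A noise schedule in AI (especially in diffusion models) is like a plan for how much noise or randomness is present at each step. During training it sets how much noise is added to an image; during image generation it sets how much noise remains to be removed at each step.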
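
One simple example is a linear schedule like the one in the original DDPM formulation. The start and end values below are the defaults commonly used there and are assumptions for this sketch; real models may use other schedules.

```python
def linear_noise_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise amounts (betas) and the remaining signal fraction."""
    betas = [
        beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]
    signal = []          # cumulative product of (1 - beta): how much image survives
    keep = 1.0
    for b in betas:
        keep *= 1.0 - b
        signal.append(keep)
    return betas, signal

betas, signal = linear_noise_schedule(1000)
print(betas[0], betas[-1])     # noise added per step grows from start to end
print(signal[0], signal[-1])   # signal shrinks toward zero as noise accumulates
```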

Outpainting - The process of extending an image beyond its original boundaries.

Prompt - The text input that tells the AI what you want. An example prompt for visual generative AI could be "A cat in a hat is sitting on a table and reading a book titled 'Paw-sitively funny dad jokes'"

Prompt Engineering - The practice of crafting effective prompts to achieve desired outcomes in AI image generation.

Prompt Schedule - A technique where the prompt changes during the generation process. This is often used in animations, as prompts can change per frame.
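
A minimal sketch of the idea, loosely modeled on the frame-keyed prompt lists used by animation tools. The dictionary layout and function name are hypothetical, not any specific tool's format.

```python
def prompt_for_frame(schedule, frame):
    """Return the prompt of the latest keyframe at or before `frame`."""
    keys = [k for k in sorted(schedule) if k <= frame]
    if not keys:
        raise ValueError("no keyframe at or before this frame")
    return schedule[keys[-1]]

schedule = {
    0: "a forest in spring",
    30: "a forest in autumn",
    60: "a forest in winter",
}
for frame in (0, 29, 30, 75):
    print(frame, prompt_for_frame(schedule, frame))
```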

Quantized Models - Quantized models are like smaller, faster versions of AI models. They use fewer numbers (or less detailed numbers) to do their calculations, which makes them quicker and take up less space. However, they might be a little less accurate compared to the original, larger models.
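
To make "less detailed numbers" concrete, here is a sketch of simple symmetric 8-bit quantization. Real formats (such as those used in GGUF files) are more elaborate; this only illustrates the round trip and the small error it introduces.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    scale = max(abs(v) for v in values) / 127 or 1.0   # avoid a zero scale
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)          # small integers that fit in one byte each
print(restored)   # close to the original weights, but not identical
```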

Regional Prompter - A technique for applying different prompts to specific regions of an image.

Render Time - The total time taken to generate an image, including all processing steps. Also referred to as inference time.

Safetensors - A file format for storing model weights, designed to be more secure and efficient. Replaced the previous .ckpt checkpoint filetype.

Sampling Method/Sampler - The algorithm used to generate images from the noise during the diffusion process. Different sampling methods can affect image quality and speed. Examples are Euler, DDIM, DPM.

Sampling Steps - The number of iterations in the denoising process when generating an image. More steps can lead to higher quality images, but increase generation time.

SD.Next - A fork of the Automatic1111 web UI with additional features and optimizations. It was earlier referred to as Vlad Diffusion.

SDXL - Stable Diffusion XL, a larger, more capable version of Stable Diffusion released in 2023. Compared to the earlier models trained on 512x512px, it was trained on 1024x1024px, which gives better quality overall.

Seed - A seed is a number that tells the AI how to start its random guessing process. In image generation, if you use the same seed, model, and settings, the AI will produce the same image every time.
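
The same principle can be shown with Python's standard random number generator, used here as a stand-in for the noise generator inside an image model. The function name is illustrative.

```python
import random

def starting_noise(seed, n=4):
    """Generate n pseudo-random 'noise' values from a seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

print(starting_noise(42) == starting_noise(42))  # same seed, same starting noise
print(starting_noise(42) == starting_noise(43))  # different seed, different noise
```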

Stable Diffusion - An AI model for generating images based on text descriptions. Since its release in 2022, it has become a cornerstone of AI image generation, spawning numerous variants and a vibrant community.

Stable Diffusion versions
1.x Internal versions of SD before the public release.
1.4 The first public release, quickly replaced by 1.5.
1.5 Trained on 512x512px. The first widely popular model, still in active use to this day due to huge community support with custom models and fine-tunes.
2.0 A failure at launch; the community never widely adopted it.
2.1 A failure at launch; the community never widely adopted it.
SDXL Trained on 1024x1024px resolution. Great community adoption and still being used to this day.
3.0 Semi-failure at launch. While the model can produce good images, it has huge problems with human anatomy. The community never adopted the model for future fine-tuning.

SwarmUI - A user interface built on top of ComfyUI, adding a better user experience while still maintaining everything ComfyUI has to offer.

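T5 - Text-to-Text Transfer Transformer, an AI language model that turns every task into a text problem, like answering questions or summarizing stories, by transforming one piece of text into another. In image generation, T5 is used as a text encoder by models such as Flux and Stable Diffusion 3.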

Text2Image, Txt2Img, Text-to-Image - The process of generating images directly from text descriptions. Text2Image is the primary use case of Stable Diffusion, enabling users to create visuals based on text prompts.

Textual Inversion - A technique for teaching Stable Diffusion new concepts or styles with just a few example images. Textual Inversion allows users to introduce custom objects, characters or artistic styles into the model's vocabulary.

Tokenizer - A tokenizer is a tool that breaks down text into smaller pieces, called tokens, so that an AI model can understand and work with it. Tokens can be as small as single characters, parts of words or full words.
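
A toy illustration of splitting text into pieces. Real tokenizers learn their vocabulary from data and use more refined rules; the greedy longest-match approach and the tiny vocabulary below are assumptions made only for this example.

```python
def tokenize(text, vocab):
    """Greedy longest-match split into known pieces, else single characters."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):     # try the longest piece first
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            tokens.append(text[i])              # unknown: fall back to one character
            i += 1
    return tokens

toy_vocab = {"up", "scal", "ing", "cat", " "}   # hypothetical vocabulary
print(tokenize("upscaling cat", toy_vocab))
```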

Trigger Keyword - A specific word or phrase in a prompt that activates a particular style or trained concept. Trigger keywords are often associated with custom LoRAs: including the keyword in your prompt activates what the LoRA was trained on.

U-Net - A U-Net is a type of neural network designed for tasks like image processing. Imagine a U-shaped pipe: data goes in one side, gets processed in the middle, and then comes out the other side. Along the way, the network "shrinks" the image, keeping important details, and then "expands" it back to its original size.

UniPC - Unified Predictor-Corrector, an advanced sampling method for diffusion models. It combines the strengths of various sampling techniques to achieve high-quality results with fewer steps, offering a balance between speed and image quality.

User Interface (UI) - Everything you see and interact with when using a computer program or website. Think of buttons, menus, icons, and text boxes: these are all parts of the UI. In image generation, example UIs are: ComfyUI, Automatic1111, Forge, Swarm, Fooocus.

VAE - Variational AutoEncoder, a key component in Stable Diffusion that handles the compression (encoding) of images into a latent space and their decompression (decoding) back into pixels we can see. Different AI models usually have different VAEs.

Workflow - A set of steps and settings used to create specific outputs in generative AI. This term is most used together with ComfyUI where you can save and share your workflows.