How does it differ from FLUX.1-dev?

FLUX.1-dev generates images from text prompts only. FLUX.1-Kontext-dev takes both an input image and a text prompt, allowing you to modify, augment, or transform existing images while maintaining character consistency and style.

What are the hardware requirements?

FLUX.1-Kontext-dev requires significant GPU memory, typically 16GB+ VRAM. It’s optimized for NVIDIA RTX GPUs. The model can use CPU offloading to reduce VRAM requirements.

What image dimensions does it support?

The model requires dimensions to be multiples of 16 and may adjust other requested sizes automatically. The tool described in this article handles this by rounding the requested dimensions to multiples of 16 and resizing the output back to the requested size.

Can I use it commercially?

FLUX.1-Kontext-dev is released under a non-commercial license. For commercial use, you need to obtain a commercial license from Black Forest Labs.

What can I use it for?

Common use cases include character consistency across scenes, local image editing, style transfer, removing unwanted elements, adding elements to images, and transforming images based on text descriptions.

FLUX.1-Kontext-dev: Image Augmentation AI Model

What is FLUX.1-Kontext-dev?

FLUX.1-Kontext-dev is an AI model by Black Forest Labs that enables image-to-image generation. Unlike text-to-image models, it takes an existing image and a text prompt to generate augmented versions while preserving key elements of the original.

AI model for augmenting images with text instructions


Black Forest Labs has released FLUX.1-Kontext-dev, an advanced image-to-image AI model that augments existing images using text instructions.

Unlike FLUX.1-dev, which generates images from text alone, FLUX.1-Kontext-dev takes both an input image and a text prompt to create modified versions while preserving key elements.

[Image: gopher on a bicycle]

This image demonstrates FLUX.1-Kontext-dev's ability to augment images.

The original Go mascot image:

[Image: the Go gopher logo]

was transformed with the instruction "this gopher rides on the bicycle on the hilly road". A decent result, isn't it?

What is FLUX.1-Kontext-dev?

FLUX.1-Kontext-dev is designed for in-context image generation and editing. Key features include:

Character Consistency: Preserves unique elements (like characters or objects) across multiple scenes
Local Editing: Modifies specific parts of an image without affecting the rest
Style Reference: Generates new scenes while maintaining styles from reference images
Image Augmentation: Transforms images based on text instructions

Installing

Prerequisites

You’ll need:

16GB+ VRAM on your GPU (NVIDIA RTX recommended)
Python 3.8+ with pip
Access to Hugging Face (account and token)

Setup Steps

Create a Hugging Face account at huggingface.co if you don’t have one
Visit the model page: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
Accept the license agreement (non-commercial use)
Create an access token at https://huggingface.co/settings/tokens (a Read token is enough to download the model)
Download the model:

git lfs install
git clone https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev

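Alternatively, skip the clone and pass the Hub repository id, black-forest-labs/FLUX.1-Kontext-dev, to from_pretrained in your code. Diffusers then downloads the weights into the Hugging Face cache on first use; log in first (for example with huggingface-cli login) so the gated download is authorized.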
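Both routes work because from_pretrained accepts either a local directory or a Hub repository id. A minimal sketch of preferring a local clone and falling back to the Hub; resolve_model_source is a hypothetical helper, and it assumes a usable clone contains the model_index.json file that diffusers pipeline repositories keep at their root:

```python
import os

# Hub repository id from the model page above.
HUB_REPO_ID = "black-forest-labs/FLUX.1-Kontext-dev"


def resolve_model_source(local_dir: str) -> str:
    """Return local_dir if it looks like a diffusers clone, else the Hub repo id.

    Hypothetical helper: a diffusers pipeline checkout has model_index.json
    at its root, so that file is used as the "is this a clone?" check.
    """
    if os.path.isfile(os.path.join(local_dir, "model_index.json")):
        return local_dir
    # Falling back to the repo id makes diffusers download into the
    # Hugging Face cache; the gated model needs an authenticated token.
    return HUB_REPO_ID
```

Pass its result as model_path in the script below, for example resolve_model_source("/path/to/FLUX.1-Kontext-dev").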

Installation

Install required Python packages:

pip install -U diffusers torch transformers pillow accelerate sentencepiece
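Before loading a multi-gigabyte model, it can be worth confirming the environment. A small sketch with hypothetical helper names: it checks that each package can be found by its import name (Pillow imports as PIL) and whether the installed diffusers exposes FluxKontextPipeline, which older diffusers releases lack; the -U flag above is what pulls in a recent enough version.

```python
import importlib
import importlib.util

# Import names (not pip names) of the packages installed above.
REQUIRED = ["diffusers", "torch", "transformers", "PIL", "accelerate", "sentencepiece"]


def missing_packages(names):
    """Return the import names that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_kontext_pipeline() -> bool:
    """True if the installed diffusers exposes FluxKontextPipeline."""
    if importlib.util.find_spec("diffusers") is None:
        return False
    diffusers = importlib.import_module("diffusers")
    return hasattr(diffusers, "FluxKontextPipeline")
```

Running print(missing_packages(REQUIRED), has_kontext_pipeline()) should print an empty list and True on a working setup.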

Or using uv:

cd tools/fkon
uv sync

Usage

Basic Python Script

Here’s a complete example using FLUX.1-Kontext-dev:

import torch
from diffusers import FluxKontextPipeline
from PIL import Image

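# Load the model: a local clone, or the Hub repo id
# "black-forest-labs/FLUX.1-Kontext-dev" to download on first use
model_path = "/path/to/FLUX.1-Kontext-dev"
pipe = FluxKontextPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)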

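# Enable CPU offloading to save VRAM. The two modes are alternatives, so call only one.
# Sequential offloading uses the least VRAM but is slow:
pipe.enable_sequential_cpu_offload()
# Model offloading is faster if your GPU has more memory:
# pipe.enable_model_cpu_offload()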

# Load your input image
input_image = Image.open("path/to/your/image.png").convert("RGB")

# Define your augmentation prompt
prompt = "this gopher rides on the bicycle on the hilly road"

# Generate augmented image
result = pipe(
    prompt=prompt,
    image=input_image,
    height=496,  # 31 x 16
    width=680,   # not a multiple of 16 (680 / 16 = 42.5); the pipeline adjusts it
    guidance_scale=3.5,
    num_inference_steps=60,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(42)
)

# Save the result
output_image = result.images[0]
output_image.save("augmented_image.jpg")
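In diffusers, enable_model_cpu_offload() and enable_sequential_cpu_offload() are alternative strategies: model offloading keeps whole components (text encoders, transformer, VAE) on the CPU and moves each to the GPU only while it runs, while sequential offloading streams individual submodules and needs the least VRAM at a large speed cost. A sketch of picking one from the detected GPU memory; the helper names and thresholds are hypothetical rough estimates, assuming the 12B-parameter transformer alone takes about 24 GB in bfloat16.

```python
def choose_offload(total_vram_gb: float) -> str:
    """Pick an offloading mode from total GPU memory (hypothetical thresholds)."""
    if total_vram_gb >= 48:
        return "none"        # assumption: every component fits on the GPU at once
    if total_vram_gb >= 32:
        return "model"       # assumption: the ~24 GB transformer fits when loaded alone
    return "sequential"      # stream submodules: lowest VRAM, much slower


def apply_offload(pipe, mode: str):
    """Configure a diffusers pipeline for the chosen mode (call only one offload method)."""
    if mode == "none":
        pipe.to("cuda")
    elif mode == "model":
        pipe.enable_model_cpu_offload()
    elif mode == "sequential":
        pipe.enable_sequential_cpu_offload()
    else:
        raise ValueError(f"unknown offload mode: {mode!r}")
    return pipe


def detected_vram_gb(device: int = 0) -> float:
    """Total memory of a CUDA device in GB, or 0.0 when CUDA is unavailable."""
    import torch  # imported lazily so choose_offload works without torch

    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(device).total_memory / 1024**3
```

In the script, apply_offload(pipe, choose_offload(detected_vram_gb())) would replace the hard-coded offload call.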

Dimension Handling

FLUX.1-Kontext-dev has specific dimension requirements:

Multiples of 16: Dimensions should be multiples of 16
Automatic adjustment: The model may adjust dimensions to meet its requirements
Output resizing: Our tool automatically resizes output back to requested dimensions

The tool handles this by:

Rounding requested dimensions to multiples of 16
Resizing the input image to the rounded dimensions
Generating the image (model may adjust further)
Resizing the output back to your requested dimensions
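The rounding step is simple arithmetic. A minimal sketch, assuming round-half-up to the nearest multiple of 16 (the tool's exact rounding rule isn't stated, and the helper names are hypothetical):

```python
from typing import Tuple

MULTIPLE = 16  # dimension step required by FLUX.1-Kontext-dev


def round_to_multiple(value: int, multiple: int = MULTIPLE) -> int:
    """Round value to the nearest multiple (halves round up), never below one multiple."""
    return max(multiple, (value + multiple // 2) // multiple * multiple)


def generation_size(width: int, height: int) -> Tuple[int, int]:
    """Size to resize the input to and to request from the model."""
    return round_to_multiple(width), round_to_multiple(height)
```

For the 680 x 496 example, generation_size(680, 496) gives (688, 496). With Pillow, the input would be resized with input_image.resize(generation_size(680, 496)) before generation and the result brought back with output_image.resize((680, 496)) afterwards.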

Example Use Cases

Character Transformation

Transform a character while maintaining consistency:

prompt = "this gopher rides on the bicycle on the hilly road"

Object Removal

Remove unwanted elements:

prompt = "please remove the human dressed as minnie mouse from this photo"

Tips and Best Practices

VRAM Management: Use enable_model_cpu_offload() if you have limited VRAM, or enable_sequential_cpu_offload() (slower, but uses the least VRAM); call only one of them
Dimension Planning: Request dimensions that are multiples of 16 to minimize adjustments
Prompt Clarity: Be specific in your text instructions for better results
Batch Generation: Generate multiple variations (for example with the tool's --n 4 option) to get the best result
Seed Control: Use manual seeds for reproducible results
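Batch generation and seed control combine naturally: give each variation its own fixed seed so any favourite can be regenerated exactly. A sketch, assuming consecutive seeds from a base value (the tool's own --n behaviour may differ) and a pipe loaded as in the script above; the helper names are hypothetical.

```python
from typing import List


def variation_seeds(base_seed: int, n: int) -> List[int]:
    """Consecutive seeds, one per variation, so each result is reproducible."""
    return [base_seed + i for i in range(n)]


def generate_variations(pipe, image, prompt: str, n: int = 4, base_seed: int = 42):
    """Run the pipeline once per seed and return (seed, image) pairs."""
    import torch  # imported lazily so variation_seeds works without torch

    results = []
    for seed in variation_seeds(base_seed, n):
        out = pipe(
            prompt=prompt,
            image=image,
            guidance_scale=3.5,
            num_inference_steps=60,
            generator=torch.Generator("cpu").manual_seed(seed),
        )
        results.append((seed, out.images[0]))
    return results
```

Saving each result as f"augmented_{seed}.jpg" keeps the seed next to the image, so a favourite can be regenerated later with the same prompt and settings.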

Limitations

Non-commercial license: Requires commercial license for business use
Hardware intensive: Needs powerful GPU with significant VRAM
Dimension constraints: May adjust dimensions automatically
Processing time: Can take 10-15 minutes per image depending on hardware