FLUX.1-Kontext-dev: Image Augmentation AI Model
AI model for augmenting images with text instructions
Black Forest Labs has released FLUX.1-Kontext-dev, an advanced image-to-image AI model that augments existing images using text instructions.
Unlike FLUX.1-dev which generates images from text alone, FLUX.1-Kontext-dev takes both an input image and a text prompt to create modified versions while preserving key elements.
This image demonstrates FLUX.1-Kontext-dev’s ability to augment images.
The original Go mascot image was transformed with the instruction "this gopher rides on the bicycle on the hilly road". A decent result, isn’t it?
What is FLUX.1-Kontext-dev?
FLUX.1-Kontext-dev is designed for in-context image generation and editing. Key features include:
- Character Consistency: Preserves unique elements (like characters or objects) across multiple scenes
- Local Editing: Modifies specific parts of an image without affecting the rest
- Style Reference: Generates new scenes while maintaining styles from reference images
- Image Augmentation: Transforms images based on text instructions
Installing
Prerequisites
You’ll need:
- 16GB+ VRAM on your GPU (NVIDIA RTX recommended)
- Python 3.8+ with pip
- Access to Hugging Face (account and token)
Setup Steps
- Create a Hugging Face account at huggingface.co if you don’t have one
- Visit the model page: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
- Accept the license agreement (non-commercial use)
- Create an access token at https://huggingface.co/settings/tokens (a Read token is enough to download the model)
- Download the model (Git LFS must be installed so the weight files are fetched):
git clone https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
Or use the model path directly in your code.
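Either a local directory or the Hub repository ID can be passed to from_pretrained. A minimal sketch, assuming you prefer a local clone and fall back to the repository ID from the model page URL above; the helper name resolve_model_source is hypothetical and not part of diffusers:

```python
from pathlib import Path

# Hub repository ID, taken from the model page URL.
HUB_REPO_ID = "black-forest-labs/FLUX.1-Kontext-dev"


def resolve_model_source(local_dir: str) -> str:
    """Return the local clone if it looks usable, otherwise the Hub repo ID.

    Hypothetical helper: a diffusers pipeline directory has a
    model_index.json at its root, so its presence is used as the check.
    """
    candidate = Path(local_dir).expanduser()
    if (candidate / "model_index.json").is_file():
        return str(candidate)
    return HUB_REPO_ID
```

The returned string can serve as model_path in the script under Usage. When the repository ID is returned, the weights are downloaded on first use, which requires your Hugging Face token to be configured because the model is gated.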
Installation
Install required Python packages:
pip install -U diffusers torch transformers pillow accelerate sentencepiece
Or using uv:
cd tools/fkon
uv sync
Usage
Basic Python Script
Here’s a complete example using FLUX.1-Kontext-dev:
import torch
from diffusers import FluxKontextPipeline
from PIL import Image
# Load the model
model_path = "/path/to/FLUX.1-Kontext-dev"
pipe = FluxKontextPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
)
# Enable CPU offloading to save VRAM. Pick one mode rather than calling both:
# sequential offload uses the least VRAM but is slower; model offload is
# faster but needs enough VRAM to hold the largest component.
pipe.enable_sequential_cpu_offload()
# pipe.enable_model_cpu_offload()
# Load your input image
input_image = Image.open("path/to/your/image.png").convert("RGB")
# Define your augmentation prompt
prompt = "this gopher rides on the bicycle on the hilly road"
# Generate augmented image
result = pipe(
    prompt=prompt,
    image=input_image,
    height=496,
    width=680,  # not a multiple of 16, so it may be adjusted (see Dimension Handling)
    guidance_scale=3.5,
    num_inference_steps=60,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(42),  # fixed seed for reproducibility
)
# Save the result
output_image = result.images[0]
output_image.save("augmented_image.jpg")
Dimension Handling
FLUX.1-Kontext-dev has specific dimension requirements:
- Multiples of 16: Dimensions should be multiples of 16
- Automatic adjustment: The model may adjust dimensions to meet its requirements
- Output resizing: Our tool automatically resizes output back to requested dimensions
The tool handles this by:
- Rounding requested dimensions to multiples of 16
- Resizing the input image to the rounded dimensions
- Generating the image (model may adjust further)
- Resizing the output back to your requested dimensions
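The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: the function names are hypothetical, rounding goes to the nearest multiple with ties rounded up (the rounding direction is not specified above), and generate stands in for any callable that wraps the pipeline call.

```python
from typing import Callable


def round_to_multiple(value: int, multiple: int = 16) -> int:
    """Round to the nearest multiple (ties round up), never below one multiple."""
    rounded = (value + multiple // 2) // multiple * multiple
    return max(multiple, rounded)


def augment_at_size(generate: Callable, image, width: int, height: int):
    """Generate at model-friendly dimensions, then restore the requested size.

    `image` is any object with a PIL-style resize((w, h)) method, and
    `generate(image, width, height)` is a hypothetical wrapper around the
    pipeline call that returns an image of the same kind.
    """
    # Step 1: round the requested dimensions to multiples of 16.
    model_w = round_to_multiple(width)
    model_h = round_to_multiple(height)
    # Step 2: resize the input image to the rounded dimensions.
    resized_input = image.resize((model_w, model_h))
    # Step 3: generate (the model may adjust the size further).
    output = generate(resized_input, model_w, model_h)
    # Step 4: resize the output back to exactly what was requested.
    return output.resize((width, height))
```

With the script from the Usage section, generate could be something like lambda img, w, h: pipe(prompt=prompt, image=img, width=w, height=h).images[0]. Resizing back in the last step changes the aspect ratio slightly whenever the requested size is not already a multiple of 16.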
Example Use Cases
- Character Transformation
Transform a character while maintaining consistency:
prompt = "this gopher rides on the bicycle on the hilly road"
- Object Removal
Remove unwanted elements:
prompt = "please remove the human dressed as minnie mouse from this photo"
Tips and Best Practices
- VRAM Management: Use enable_model_cpu_offload() if you have limited VRAM
- Dimension Planning: Request dimensions that are multiples of 16 to minimize adjustments
- Prompt Clarity: Be specific in your text instructions for better results
- Batch Generation: Generate multiple variations (--n 4) to get the best result
- Seed Control: Use manual seeds for reproducible results
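Batch generation and seed control combine naturally: derive one seed per variation from a base seed, so any single result can be regenerated later. A minimal sketch, assuming consecutive seeds are acceptable; the helper names are hypothetical and this is not necessarily how the --n option is implemented:

```python
from typing import Callable, List, Tuple


def variation_seeds(base_seed: int, n: int) -> List[int]:
    """Return n distinct, reproducible seeds derived from base_seed."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [base_seed + i for i in range(n)]


def generate_variations(
    generate_one: Callable[[int], object], base_seed: int, n: int
) -> List[Tuple[int, object]]:
    """Call generate_one(seed) once per seed and return (seed, result) pairs."""
    return [(seed, generate_one(seed)) for seed in variation_seeds(base_seed, n)]
```

Here generate_one would wrap the pipeline call from the Usage section and pass generator=torch.Generator("cpu").manual_seed(seed). Keeping the seed next to each result makes it easy to rerun only the variation you liked.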
Limitations
- Non-commercial license: Business use requires a separate commercial license
- Hardware intensive: Needs powerful GPU with significant VRAM
- Dimension constraints: May adjust dimensions automatically
- Processing time: Can take 10-15 minutes per image depending on hardware