FLUX.1-Kontext-dev: Image Augmentation AI Model

AI model for augmenting images with text instructions

Black Forest Labs has released FLUX.1-Kontext-dev, an advanced image-to-image AI model that augments existing images using text instructions.

Unlike FLUX.1-dev, which generates images from text alone, FLUX.1-Kontext-dev takes both an input image and a text prompt and produces a modified version that preserves key elements of the original.

[Image: gopher on a bicycle]

This image demonstrates FLUX.1-Kontext-dev’s ability to augment images.

The original Go mascot image:

[Image: the Go gopher logo]

was transformed with the instruction “this gopher rides on the bicycle on the hilly road”. A decent result, isn’t it?

What is FLUX.1-Kontext-dev?

FLUX.1-Kontext-dev is designed for in-context image generation and editing. Key features include:

  • Character Consistency: Preserves unique elements (like characters or objects) across multiple scenes
  • Local Editing: Modifies specific parts of an image without affecting the rest
  • Style Reference: Generates new scenes while maintaining styles from reference images
  • Image Augmentation: Transforms images based on text instructions

Installing

Prerequisites

You’ll need:

  • 16GB+ VRAM on your GPU (NVIDIA RTX recommended)
  • Python 3.8+ with pip
  • Access to Hugging Face (account and token)
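Before installing anything heavy, the Python and GPU items above can be checked with a short script. This is a minimal sketch: the function name check_prerequisites is hypothetical, it only inspects GPU 0, and it cannot verify Hugging Face access. The thresholds are taken from the list above.

```python
import sys
from typing import List

MIN_PYTHON = (3, 8)  # from the prerequisites list
MIN_VRAM_GB = 16     # from the prerequisites list


def check_prerequisites() -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} is older than 3.8")
    try:
        import torch  # may not be installed yet at this point
    except ImportError:
        problems.append("PyTorch is not installed yet (see Installation)")
        return problems
    if not torch.cuda.is_available():
        problems.append("No CUDA-capable GPU detected")
    else:
        # total_memory is reported in bytes
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb < MIN_VRAM_GB:
            problems.append(f"GPU 0 has {vram_gb:.1f} GB VRAM; 16 GB+ recommended")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisite checks passed")
```

With less VRAM the model can still run using CPU offloading, only more slowly.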

Setup Steps

  1. Create a Hugging Face account at huggingface.co if you don’t have one

  2. Visit the model page: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev

  3. Accept the license agreement (non-commercial use)

  4. Create an access token at https://huggingface.co/settings/tokens (Read access is enough to download the model) and log in with it, for example via huggingface-cli login

  5. Download the model (Git LFS must be installed, otherwise only pointer files are cloned instead of the weights):

git clone https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev

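Alternatively, skip the clone and pass the repository ID black-forest-labs/FLUX.1-Kontext-dev to from_pretrained() in your code; diffusers then downloads the weights into the Hugging Face cache on first use.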

Installation

Install the required Python packages (FluxKontextPipeline is only available in recent diffusers releases, hence -U):

pip install -U diffusers torch transformers pillow accelerate sentencepiece

Or, using uv from the accompanying tool’s directory:

cd tools/fkon
uv sync

Usage

Basic Python Script

Here’s a complete example using FLUX.1-Kontext-dev:

import torch
from diffusers import FluxKontextPipeline
from PIL import Image

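# Load the model from a local clone (or pass the repo ID
# "black-forest-labs/FLUX.1-Kontext-dev" to download it automatically)
model_path = "/path/to/FLUX.1-Kontext-dev"
pipe = FluxKontextPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)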

# Offload idle sub-models to the CPU to save VRAM. Enable only one offload
# mode: enable_sequential_cpu_offload() saves even more VRAM but is much slower.
pipe.enable_model_cpu_offload()

# Load your input image
input_image = Image.open("path/to/your/image.png").convert("RGB")

# Define your augmentation prompt
prompt = "this gopher rides on the bicycle on the hilly road"

# Generate augmented image
result = pipe(
    prompt=prompt,
    image=input_image,
    height=496,  # keep both dimensions multiples of 16
    width=688,   # the pipeline may still rescale the final size
    guidance_scale=3.5,
    num_inference_steps=60,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(42)
)

# Save the result
output_image = result.images[0]
output_image.save("augmented_image.jpg")

Dimension Handling

FLUX.1-Kontext-dev has specific dimension requirements:

  • Multiples of 16: Height and width should both be multiples of 16
  • Automatic adjustment: The pipeline may rescale the requested size to meet its requirements and logs a warning when it does
  • Output resizing: Our tool automatically resizes the output back to the requested dimensions
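To get a feel for how much the size can change, here is a rough Python sketch of the adjustment. It assumes, based on a reading of diffusers’ FluxKontextPipeline (details may differ between versions), that the pipeline keeps the aspect ratio, scales the area to a default max_area of 1024*1024 pixels and then floors both sides to a multiple of 16. The function name predict_pipeline_size is hypothetical.

```python
from typing import Tuple


def predict_pipeline_size(width: int, height: int,
                          max_area: int = 1024 * 1024,
                          multiple_of: int = 16) -> Tuple[int, int]:
    """Approximate the (width, height) the pipeline actually generates.

    Assumption: aspect ratio is preserved, the area is scaled to max_area,
    and each side is floored to a multiple of 16.
    """
    aspect_ratio = width / height
    new_width = round((max_area * aspect_ratio) ** 0.5)
    new_height = round((max_area / aspect_ratio) ** 0.5)
    # Floor to the nearest multiple so the latent grid divides evenly
    return (new_width // multiple_of * multiple_of,
            new_height // multiple_of * multiple_of)
```

Under this assumption a small request such as 680 x 496 is scaled up to roughly one megapixel, which is why the output size rarely matches the request exactly.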

The tool handles this by:

  1. Rounding requested dimensions to multiples of 16
  2. Resizing the input image to the rounded dimensions
  3. Generating the image (model may adjust further)
  4. Resizing the output back to your requested dimensions

Example Use Cases

  1. Character Transformation

Transform a character while maintaining consistency:

prompt = "this gopher rides on the bicycle on the hilly road"

  2. Object Removal

Remove unwanted elements:

prompt = "please remove the human dressed as minnie mouse from this photo"

Tips and Best Practices

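  1. VRAM Management: Use enable_model_cpu_offload() if you have limited VRAM, or enable_sequential_cpu_offload() (slower, lower VRAM use) if that is not enough; enable only one of them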
  2. Dimension Planning: Request dimensions that are multiples of 16 to minimize adjustments
  3. Prompt Clarity: Be specific in your text instructions for better results
  4. Batch Generation: Generate multiple variations (for example with the tool’s --n 4 option) and pick the best result
  5. Seed Control: Use manual seeds for reproducible results
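Tips 4 and 5 combine naturally: give each variation its own seed and record it, so the best result can be regenerated later. The sketch below assumes the pipe object from the basic script; the helper names and the consecutive-seed scheme are hypothetical and not necessarily how the tool’s --n option works.

```python
from typing import Any, Callable, List, Tuple


def variation_seeds(base_seed: int, n: int) -> List[int]:
    """One consecutive seed per variation, so every result can be regenerated."""
    return [base_seed + i for i in range(n)]


def output_filename(stem: str, seed: int, ext: str = "jpg") -> str:
    """Put the seed in the file name so the best variation is easy to reproduce."""
    return f"{stem}_seed{seed}.{ext}"


def generate_variations(pipe: Callable[..., Any], image: Any, prompt: str,
                        n: int = 4, base_seed: int = 42,
                        **pipe_kwargs: Any) -> List[Tuple[int, Any]]:
    """Run the pipeline once per seed and return (seed, image) pairs."""
    import torch  # imported here so the helpers above also work without PyTorch

    results = []
    for seed in variation_seeds(base_seed, n):
        # A fresh seeded generator per call makes each variation reproducible
        generator = torch.Generator("cpu").manual_seed(seed)
        result = pipe(prompt=prompt, image=image, generator=generator, **pipe_kwargs)
        results.append((seed, result.images[0]))
    return results


# Hypothetical usage with `pipe`, `input_image` and `prompt` from the basic script:
# for seed, image in generate_variations(pipe, input_image, prompt, n=4,
#                                        guidance_scale=3.5, num_inference_steps=60):
#     image.save(output_filename("augmented", seed))
```

Re-running with the seed from the chosen file name and the same settings should reproduce that variation on the same hardware and library versions.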

Limitations

  • Non-commercial license: Business use requires a separate commercial license
  • Hardware intensive: Needs a powerful GPU with plenty of VRAM
  • Dimension constraints: May adjust dimensions automatically
  • Processing time: Can take 10-15 minutes per image depending on hardware