AI Product Photography with Flux 2 Dev: What We Learned After 50+ Images
AI Tools

April 3, 2026
7 min read
Chris Perkles

Product photography is expensive. A shoot with a photographer, location, and model easily costs a full day rate. So we tested: can you place a real physical product into AI-generated lifestyle scenes that are good enough for social media and concept presentations?

The short answer: yes — with caveats you need to understand.

The long answer follows. After 50+ generated images, 4 testing rounds, and an automated 3-agent pipeline, we have a clear picture of what works and what doesn't.

AI-generated product photo — device in foreground on sideboard, yoga scene in background. Foreground placement with 3 reference images: product, proportions, and scene all land correctly

The Setup

  • Model: Flux 2 Dev (FP8 mixed precision) with Mistral 3 Small multimodal text encoder
  • Method: Reference latent conditioning via FluxKontextMultiReferenceLatentMethod
  • Hardware: RTX 5090 with 32 GB VRAM, ComfyUI with --lowvram
  • Goal: Place a real product naturally into lifestyle scenes

The Core Technique: Reference Latent Conditioning

The key difference from regular text-to-image: we don't just describe the product in the prompt — we feed actual product photos through the VAE encoder and chain them as latent conditioning. The model then "knows" what the product actually looks like.

LoadImage → FluxKontextImageScale → VAEEncode → ReferenceLatent (chained)
→ FluxKontextMultiReferenceLatentMethod → KSampler

This sounds technical, but the effect is significant: instead of a generic "white tube," the model renders the actual product with correct proportions and details.
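The chain above can be sketched as a ComfyUI API-format workflow built in Python. The node class names come straight from the workflow; the input socket names (`image`, `pixels`, `conditioning`, `latent`) and the upstream `prompt_encoder` / `vae_loader` node IDs are assumptions for illustration and may differ in your ComfyUI version.

```python
# Sketch: build the reference-latent chain as a ComfyUI API-format
# workflow dict. One ReferenceLatent node per product photo, chained,
# then handed to FluxKontextMultiReferenceLatentMethod.

def build_reference_chain(image_paths):
    workflow = {}
    prev_cond = ["prompt_encoder", 0]  # upstream text conditioning (assumed node ID)
    for i, path in enumerate(image_paths):
        workflow[f"load_{i}"] = {"class_type": "LoadImage",
                                 "inputs": {"image": path}}
        workflow[f"scale_{i}"] = {"class_type": "FluxKontextImageScale",
                                  "inputs": {"image": [f"load_{i}", 0]}}
        workflow[f"encode_{i}"] = {"class_type": "VAEEncode",
                                   "inputs": {"pixels": [f"scale_{i}", 0],
                                              "vae": ["vae_loader", 0]}}
        workflow[f"ref_{i}"] = {"class_type": "ReferenceLatent",
                                "inputs": {"conditioning": prev_cond,
                                           "latent": [f"encode_{i}", 0]}}
        prev_cond = [f"ref_{i}", 0]  # each reference feeds the next in the chain
    workflow["multi_ref"] = {
        "class_type": "FluxKontextMultiReferenceLatentMethod",
        "inputs": {"conditioning": prev_cond}}
    return workflow
```

Each additional reference image adds one link to the conditioning chain, which is also why reference count becomes a tuning knob later on.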

The Two-Block Prompt Structure

Every successful prompt has two clearly separated blocks:

Block 1 — Product description: Exact dimensions with human-scale comparisons ("40cm long, about the size of a large thermos"), material/color, distinctive features described subtly, orientation constraints, anti-duplicate rules ("Only ONE device"), and anti-artifact rules ("No dust, no haze").

Block 2 — Scene description: Environment, lighting, person/action, a single specific surface for placement, camera settings (lens, aperture, DOF), and a style anchor ("Photorealistic, editorial lifestyle photography").

This structure is non-negotiable. Mixing both blocks produces inconsistent results.
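The two-block structure can be pinned down in code so it never gets mixed by accident. This is a minimal sketch: the `build_prompt` helper and the blank-line separator are our own conventions, and the example strings below are illustrative, not the exact prompts from the test runs.

```python
def build_prompt(product_block: str, scene_block: str) -> str:
    """Assemble the two-block prompt: product description first,
    scene description second, clearly separated by a blank line."""
    return product_block.strip() + "\n\n" + scene_block.strip()

# Block 1: dimensions with a human-scale anchor, material, subtle
# features, anti-duplicate and anti-artifact rules.
product = (
    "A white cylindrical device, 40cm long, about the size of a large "
    "thermos, matte finish, small subtle blue accent. Only ONE device "
    "in the scene. No dust, no haze.")

# Block 2: environment, lighting, one specific surface, camera
# settings, style anchor.
scene = (
    "A bright modern living room, soft morning light through large "
    "windows, the device sits ON the sideboard. 50mm lens, f/1.8, "
    "shallow depth of field. Photorealistic, editorial lifestyle "
    "photography.")
```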

What Works Reliably: Product in Foreground

Foreground placement with 3-4 reference images is the sweet spot. The model renders the product prominently with accurate shape, proportions, and details. People can be in the background or to the side.

AI-generated lifestyle product photo — device on side table, warm evening mood with couple on sofa. Another foreground hit: warm light, natural placement, people in background

Optimal foreground settings:

| Parameter | Value |
|---|---|
| Reference images | 3-4 from different angles |
| Lens in prompt | 50mm (most natural proportions) |
| Placement | "Prominently visible in the foreground" |
| Surface | Name ONE specific surface ("sits ON the nightstand") |
| Steps | 16-20 (sweet spot) |
| Sampler | euler, simple scheduler |
| CFG | 1.0 |
| Aperture in prompt | f/1.4 – f/1.8 |

Success rate: roughly 80% usable results. That's extremely good for generative AI.

The Hard Problem: Product in Background

This is where it gets interesting — and frustrating. The product should appear naturally in the background while a person is the main subject. But the entire reference latent conditioning method is designed to reproduce the referenced object. It wants to show the product prominently.

We tested 4 approaches across 36+ images:

| Approach | Method | Result |
|---|---|---|
| A: Standard + "FAR behind" | Product block first, 2 refs, strong distance language | Product still lands in foreground |
| B: Scene-first | Full scene description first, product mentioned casually at end | Best result: product appears in background ~40% of the time |
| C: Aggressive minimizing | "MAIN SUBJECT" / "BACKGROUND", "5% of frame" | Too aggressive: product either vanishes or becomes an unrecognizable blob |
| D: Casual mention | No product block at all, product as "just another kitchen item" | Mixed: sometimes perfect, sometimes wrong size |

Winner: Scene-first (approach B) with just 1 reference image.

The logic: fewer reference images = less "show this object" pressure. And when the scene is described first, the model prioritizes context.
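Both findings can be condensed into one lookup: placement goal determines block order and reference count. This `placement_config` helper is hypothetical, a way to encode the rules above rather than anything from the actual pipeline.

```python
def placement_config(placement: str) -> dict:
    """Map a placement goal to prompt block order and reference-image
    count, following the test results: product-first with 3-4 refs for
    foreground, scene-first (approach B) with 1 ref for background."""
    if placement == "foreground":
        return {"block_order": ["product", "scene"], "num_refs": (3, 4)}
    if placement == "background":
        # Fewer reference images = less "show this object" pressure;
        # scene first makes the model prioritize context.
        return {"block_order": ["scene", "product"], "num_refs": (1, 1)}
    raise ValueError(f"unknown placement: {placement}")
```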

AI-generated product photo — device subtly in background on shelf, mother and child playing in foreground. Background placement with scene-first prompt and 1 reference image — the device blends in naturally

AI-generated product photo — device rendered standing upright in kitchen instead of lying horizontal. The orientation problem: the cylindrical device gets rendered standing instead of lying flat

Hard Rules We Learned the Hard Way

After dozens of failed generations, clear no-gos emerged:

  1. Never mention mirrors — guaranteed product duplicate (reflection = second product)
  2. Never name two surfaces — "on the shelf or the counter" = product on both
  3. Never say "LED ring" or "glowing" — model overemphasizes luminous features. Use "small subtle blue accent" instead
  4. Never use 35mm lens — barrel distortion warps product proportions
  5. Never go below 16 steps — faces become soft, haze appears everywhere
  6. Never use inpainting to add the product — ghosting and morphing at every denoise level
  7. Fewer refs for background (1), more for foreground (3-4)
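Most of these rules are mechanical enough to check automatically before a generation run. Here is a small prompt linter as a sketch; the regex patterns and messages are our own illustrative encoding of the rules and will need tuning for real prompts.

```python
import re

# No-go patterns from the rules above (illustrative, not exhaustive).
FORBIDDEN = {
    r"\bmirror": "mirrors cause a duplicate product via reflection",
    r"\bLED ring\b|\bglowing\b": 'use "small subtle blue accent" instead',
    r"\b35mm\b": "35mm barrel distortion warps product proportions",
    r"\bon the \w+ or (the )?\w+": "name exactly ONE surface",
}

def lint_prompt(prompt: str, steps: int = 20) -> list:
    """Return a list of rule violations for a prompt + step count."""
    problems = []
    for pattern, why in FORBIDDEN.items():
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            problems.append(why)
    if steps < 16:
        problems.append("below 16 steps, faces go soft and haze appears")
    return problems
```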

The Orientation Problem

Even when the product is a horizontal cylinder that sits flat in a cradle, the model frequently renders it standing upright — like a speaker or thermos. Reference image choice is crucial:

  • A 3/4 angle photo (where the cylinder appears somewhat vertical in perspective) → model biases toward upright placement
  • A clearly horizontal photo (side view, lying flat) → helps but doesn't guarantee horizontal output
  • Explicit language like "LYING FLAT ON ITS SIDE, NOT standing upright, like a rolling pin" → helps but isn't foolproof

The Size Calibration Problem

For background placement, the product is either too big or too small:

  • "tiny" / "very small" / "5% of frame" → product disappears
  • "large" / prominent language → product dominates the scene
  • Real-world size anchors work best: "about the size of a large water bottle" gives the model a concrete reference
  • Never use percentage-based size language — it's unpredictable
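One way to operationalize the size-anchor rule is a small lookup that turns real dimensions into human-scale comparisons. The anchor table below is an illustrative assumption, not calibrated data from the test runs.

```python
# Hypothetical helper: map a product's longest side to a concrete
# real-world size anchor instead of percentage language.
SIZE_ANCHORS = [
    (15, "about the size of a soda can"),
    (30, "about the size of a wine bottle"),
    (45, "about the size of a large water bottle"),
    (70, "about the size of a rolled yoga mat"),
]

def size_anchor(longest_side_cm: float) -> str:
    for limit, anchor in SIZE_ANCHORS:
        if longest_side_cm <= limit:
            return anchor
    # Fall back to an explicit dimension for oversized products
    return f"roughly {longest_side_cm:.0f}cm long"
```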

Text on Products: Forget It

Flux 2 Dev can't render readable text on products. Logos become random letter-like shapes. This is a model limitation, not fixable through prompting.

For your workflow, this means: AI product photos are perfect for mood boards, social media concepts, and internal presentations. For final print assets with visible logos, you still need real photos or compositing.

The Automated Pipeline

We built a 3-agent system that scales the process:

  • Creative Director: Generates shot briefs from brand research, adjusts based on feedback
  • Producer: Builds ComfyUI workflows, manages reference images, generates images
  • Photo Judge: Uses Claude's vision capabilities to score product accuracy, proportions, scene quality, and placement naturalness

Pass threshold: all scores ≥ 6 out of 10, average ≥ 6.5. The judge ignores text garbling (known limitation).
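The pass gate itself is a one-liner. The score keys below are illustrative; only the thresholds (every score ≥ 6, average ≥ 6.5) come from the pipeline.

```python
def passes(scores: dict) -> bool:
    """Photo Judge gate: every score >= 6/10 AND average >= 6.5.
    Text garbling is a known model limitation and is never scored,
    so it does not appear among the keys."""
    values = list(scores.values())
    return min(values) >= 6 and sum(values) / len(values) >= 6.5
```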

The result: instead of manually scrolling through hundreds of images, the pipeline automatically filters and delivers only the usable results.

Optimal Settings Cheat Sheet

| Parameter | Value |
|---|---|
| Steps | 16 (fast iteration) or 20 (final quality) |
| Sampler | euler, simple scheduler |
| CFG | 1.0 |
| Denoise | 1.0 |
| Resolution | 768×1344 (9:16) or 832×1024 (4:5) |
| Refs (foreground) | 3-4 images |
| Refs (background) | 1 image |
| Lens in prompt | 50mm (natural) or 85mm (compressed background) |
| Aperture in prompt | f/1.4 – f/1.8 |
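As a copy-paste starting point, the cheat sheet maps onto a sampler settings dict. The key names mirror common ComfyUI KSampler inputs but are assumptions about your node setup.

```python
# Cheat-sheet defaults as a settings dict (key names assumed to match
# a typical ComfyUI KSampler node).
FLUX2_PRODUCT_SETTINGS = {
    "steps": 20,                 # drop to 16 for fast iteration
    "sampler_name": "euler",
    "scheduler": "simple",
    "cfg": 1.0,
    "denoise": 1.0,
    "resolution": (768, 1344),   # 9:16; use (832, 1024) for 4:5
}
```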

Verdict: Usable, But Not a Replacement

AI product photography with Flux 2 Dev isn't a replacement for professional shoots — but it's a powerful tool for quick concepts, social media content, and internal presentations.

Foreground placement is production-ready now. Background placement needs 2-3x more generations but delivers usable results with the right technique.

The biggest advantage: everything runs locally, zero API costs, full control over the workflow. And with the automated pipeline, what used to be a manual trial-and-error process becomes a scalable system.


Want to test AI-generated product photos for your business? I help with setup — from hardware to ComfyUI to automated pipelines. Let's talk →

Flux 2
AI Product Photography
ComfyUI
RTX 5090
Product Photography

Chris Perkles

AI Consultant, Speaker & Workshop Facilitator

I help businesses leverage AI strategically and achieve real results. From workshops to implementation - together we bring your AI projects to success.
