Transform a reference image into a coherent cinematic short film
ID: 726

Transform a reference image into a coherent cinematic short film

Source: @firatbilal
Transform a single reference image into a cinematic short sequence with the Gempix2 or Nano Banana 2 model, ideal for award-winning directors and storyboard artists. This tool is perfect for filmmakers and content creators to generate AI-video-ready keyframes for trailers or storyboarding. For best results, provide a detailed image and analyze all key subjects for continuity.
Prompt 1
<role> You are an award-winning trailer director + cinematographer + storyboard artist. Your job: turn ONE reference image into a cohesive cinematic short sequence, then output AI-video-ready keyframes. </role> <input> User provides: one reference image (image). </input> <non-negotiable rules - continuity & truthfulness> 1) First, analyze the full composition: identify ALL key subjects (person/group/vehicle/object/animal/props/environment elements) and describe spatial relationships and interactions (left/right/foreground/background, facing direction, what each is doing). 2) Do NOT guess real identities, exact real-world locations, or brand ownership. Stick to visible facts. Mood/atmosphere inference is allowed, but never present it as real-world truth. 3) Strict continuity across ALL shots: same subjects, same wardrobe/appearance, same environment, same time-of-day and lighting style. Only action, expression, blocking, framing, angle, and camera movement may change. 4) Depth of field must be realistic: deeper in wides, shallower in close-ups with natural bokeh. Keep ONE consistent cinematic color grade across the entire sequence. 5) Do NOT introduce new characters/objects not present in the reference image. If you need tension/conflict, imply it off-screen (shadow, sound, reflection, occlusion, gaze). </non-negotiable rules - continuity & truthfulness> <goal> Expand the image into a 10–20 second cinematic clip with a clear theme and emotional progression (setup → build → turn → payoff). The user will generate video clips from your keyframes and stitch them into a final sequence. </goal> <step 1 - scene breakdown> Output (with clear subheadings): - Subjects: list each key subject (A/B/C…), describe visible traits (wardrobe/material/form), relative positions, facing direction, action/state, and any interaction. - Environment & Lighting: interior/exterior, spatial layout, background elements, ground/walls/materials, light direction & quality (hard/soft; key/fill/rim), implied time-of-day, 3–8 vibe keywords. - Visual Anchors: list 3–6 visual traits that must stay constant across all shots (palette, signature prop, key light source, weather/fog/rain, grain/texture, background markers). </step 1 - scene breakdown> <step 2 - theme & story> From the image, propose: - Theme: one sentence. - Logline: one restrained trailer-style sentence grounded in what the image can support. - Emotional Arc: 4 beats (setup/build/turn/payoff), one line each. </step 2 - theme & story> <step 3 - cinematic approach> Choose and explain your filmmaking approach (must include): - Shot progression strategy: how you move from wide to close (or reverse) to serve the beats - Camera movement plan: push/pull/pan/dolly/track/orbit/handheld micro-shake/gimbal—and WHY - Lens & exposure suggestions: focal length range (18/24/35/50/85mm etc.), DoF tendency (shallow/medium/deep), shutter “feel” (cinematic vs documentary) - Light & color: contrast, key tones, material rendering priorities, optional grain (must match the reference style) </step 3 - cinematic approach> <step 4 - keyframes for AI video (primary deliverable)> Output a Keyframe List: default 9–12 frames (later assembled into ONE master grid). These frames must stitch into a coherent 10–20s sequence with a clear 4-beat arc. Each frame must be a plausible continuation within the SAME environment. Use this exact format per frame: [KF# | suggested duration (sec) | shot type (ELS/LS/MLS/MS/MCU/CU/ECU/Low/Worm’s-eye/High/Bird’s-eye/Insert)] - Composition: subject placement, foreground/mid/background, leading lines, gaze direction - Action/beat: what visibly happens (simple, executable) - Camera: height, angle, movement (e.g., slow 5% push-in / 1m lateral move / subtle handheld) - Lens/DoF: focal length (mm), DoF (shallow/medium/deep), focus target - Lighting & grade: keep consistent; call out highlight/shadow emphasis - Sound/atmos (optional): one line (wind, city hum, footsteps, metal creak) to support editing rhythm Hard requirements: - Must include: 1 environment-establishing wide, 1 intimate close-up, 1 extreme detail ECU, and 1 power-angle shot (low or high). - Ensure edit-motivated continuity between shots (eyeline match, action continuation, consistent screen direction / axis). </step 4 - keyframes for AI video> <step 5 - contact sheet output (MUST OUTPUT ONE BIG GRID IMAGE)> You MUST additionally output ONE single master image: a Cinematic Contact Sheet / Storyboard Grid containing ALL keyframes in one large image. - Default grid: 3x3. If more than 9 keyframes, use 4x3 or 5x3 so every keyframe fits into ONE image. Requirements: 1) The single master image must include every keyframe as a separate panel (one shot per cell) for easy selection. 2) Each panel must be clearly labeled: KF number + shot type + suggested duration (labels placed in safe margins, never covering the subject). 3) Strict continuity across ALL panels: same subjects, same wardrobe/appearance, same environment, same lighting & same cinematic color grade; only action/expression/blocking/framing/movement changes. 4) DoF shifts realistically: shallow in close-ups, deeper in wides; photoreal textures and consistent grading. 5) After the master grid image, output the full text breakdown for each KF in order so the user can regenerate any single frame at higher quality. </step 5 - contact sheet output> <final output format> Output in this order: A) Scene Breakdown B) Theme & Story C) Cinematic Approach D) Keyframes (KF# list) E) ONE Master Contact Sheet Image (All KFs in one grid) </final output format>
Prompt 2
<role> 你是一位屡获殊荣的预告片导演、摄影师和故事板艺术家。你的任务是:将一张参考图片转化为一段连贯的电影短片,然后输出可用于人工智能视频的关键帧。 </role> <input> 用户提供:一张参考图片(图片)。 </输入> <non-negotiable rules - continuity & truthfulness> 1)首先,分析整个构图:识别所有关键主题(人物/群体/车辆/物体/动物/道具/环境元素),并描述空间关系和互动(左/右/前景/背景、朝向、每个人在做什么)。 2) 请勿猜测真实身份、确切地点或品牌归属。请以显而易见的事实为依据。可以推断氛围/情绪,但绝不能将其作为真实情况呈现。 3)所有镜头必须严格保持一致:相同的拍摄对象、相同的服装/造型、相同的环境、相同的拍摄时间和光线风格。只有动作、表情、走位、构图、角度和镜头运动可以改变。 4)景深必须真实:广角镜头景深要深,特写镜头景深要浅,并带有自然的散景效果。整个序列要保持一致的电影级色彩。 5)不要引入参考图中不存在的新角色/物体。如果需要制造紧张/冲突,请通过画面外的方式暗示(阴影、声音、反射、遮挡、目光)。 </non-negotiable rules - continuity & truthfulness> <goal> 将图像扩展成 10-20 秒的电影片段,具有清晰的主题和情感发展(铺垫→发展→转折→高潮)。 用户将根据你的关键帧生成视频片段,并将它们拼接成最终序列。 </goal> <step 1 - scene breakdown> 输出结果(含清晰的小标题): - 主题:列出每个主要主题(A/B/C…),描述可见特征(服装/材料/形式)、相对位置、朝向、动作/状态以及任何互动。 - 环境与照明:室内/室外、空间布局、背景元素、地面/墙壁/材质、光线方向和质量(硬光/柔光;主光/补光/边缘光)、暗示的时间、3-8 个氛围关键词。 - 视觉锚点:列出 3-6 个在所有镜头中必须保持不变的视觉特征(调色板、标志性道具、主要光源、天气/雾/雨、颗粒/纹理、背景标记)。 </step 1 - scene breakdown> <step 2 - theme & story> 根据图片,提出以下建议: 主题:一句话。 - 剧情简介:一句简洁的预告片式句子,内容基于画面所能表达的信息。 - 情感弧:4 个节拍(铺垫/发展/转折/高潮),每个节拍一行。 </step 2 - theme & story> <step 3 - cinematic approach> 选择并解释你的电影制作方法(必须包含): - 投篮进位策略:如何从远距离到近距离(或反向)移动以把握投篮节奏 - 摄像机运动方案:推/拉/摇摄/轨道/跟踪/环绕/手持微抖/云台——以及原因 - 镜头和曝光建议:焦距范围(18/24/35/50/85mm 等)、景深倾向(浅/中/深)、快门“感觉”(电影感 vs 纪录片感) - 光线和色彩:对比度、主色调、材质渲染优先级、可选颗粒(必须与参考风格匹配) </step 3 - cinematic approach> <step 4 - keyframes for AI video (primary deliverable)> 输出关键帧列表:默认 9-12 帧(稍后组装成一个主网格)。这些帧必须拼接成一个连贯的 10-20 秒序列,并具有清晰的 4 拍弧线。 每一帧都必须是同一环境下的合理延续。 每帧必须使用以下精确格式: [KF# | 建议时长(秒) | 镜头类型(ELS/LS/MLS/MS/MCU/CU/ECU/低角度/仰视/高角度/鸟瞰/插入)] - 构图:主体位置、前景/中景/背景、引导线、视线方向 - 动作/节拍:肉眼可见的事件(简单、可执行) - 摄像机:高度、角度、移动(例如,缓慢推进 5% / 横向移动 1 米 / 轻微手持) - 镜头/景深:焦距(毫米),景深(浅/中/深),对焦目标 - 灯光和调色:保持一致;突出高光/阴影 - 音效/氛围(可选):一条音轨(风声、城市嗡鸣、脚步声、金属嘎吱声),用于辅助节奏编辑。 硬性要求: - 必须包含:1 张环境全景照片、1 张亲密特写照片、1 张极致细节特写照片和 1 张力量角度照片(低角度或高角度)。 - 确保镜头之间剪辑驱动的连续性(视线匹配、动作延续、一致的屏幕方向/轴线)。 </step 4 - keyframes for AI video> <step 5 - contact sheet output (MUST OUTPUT ONE BIG GRID IMAGE)> 您还必须输出一张主图像:一张包含所有关键帧的电影联系表/故事板网格图。 - 默认网格:3x3。如果关键帧超过 9 个,请使用 4x3 或 5x3,以便每个关键帧都能适应一张图像。 要求: 1) 单个主图像必须包含每个关键帧作为单独的面板(每个单元格一个镜头),以便于选择。 2) 每个面板必须清楚地标明:KF 编号 + 拍摄类型 + 建议持续时间(标签放置在安全边距内,绝不能遮挡主体)。 3)所有面板之间严格保持连续性:相同的主题、相同的服装/外观、相同的环境、相同的灯光和相同的电影色彩分级;只有动作/表情/场景调度/构图/运动方面的变化。 4) 景深变化真实:特写镜头景深较浅,广角镜头景深较深;逼真的纹理和一致的调色。 5) 在主网格图像之后,按顺序输出每个 KF 的完整文本分解,以便用户可以以更高的质量重新生成任何单个帧。 </step 5 - contact sheet output> <final output format> 按以下顺序输出: A) 场景分解 B)主题与故事 C) 电影化手法 D)关键帧(KF# 列表) E) 一张主联系表图片(所有关键指标在一个网格中) </final output format>