This is the workflow I use to make Instagram carousels where the first slide looks like a proper magazine cover instead of a bland AI-generated document.
The short version:
ChatGPT makes the visual assets. Codex saves the files, runs the code, and exports the final carousel. Bun + Playwright render the slides as exact 1080x1350 PNGs.
You do not need the OpenAI Images API for this version. The point is to use your ChatGPT subscription inside Chrome.
This is a guide, and the most important step to get started is step 10. TL;DR: install Codex, go to step 10, copy the entire one-shot prompt, and paste it into Codex to set the whole thing up. Then read the rest of the guide to see how the setup works :) For questions or troubleshooting, ask in my free Skool community: https://www.skool.com/ai-automation-experts-5012/about?ref=a0e8de0b607240589239ff47d72650a8
What You Need
- A ChatGPT account with image generation available.
- Google Chrome.
- Codex on your computer.
- Bun.
- A folder of face reference images if you want the cover slide to include you.
- A project folder. Codex can create the carousel renderer inside it.
Official setup links:
- Codex CLI: https://help.openai.com/en/articles/11096431
- Bun install docs: https://bun.sh/docs/installation
1. Install Codex
Install Codex using the official command:
npm install -g @openai/codex
Then create and open the folder you want Codex to work in:
mkdir -p ~/Documents/ai-carousel-system
cd ~/Documents/ai-carousel-system
codex
If you use Codex Desktop instead, open the project folder inside the app.
Important: for this workflow, Codex needs to be able to control Chrome through Computer Use. That is how it can open ChatGPT, upload your reference photos, download the generated images, move them into the right folder, and run the render script.
2. Install Bun
On macOS or Linux:
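curl -fsSL https://bun.com/install | bash
Restart your terminal, then check it worked:
bun --version
Once Codex has created package.json, run this inside your carousel project:
bun install
Playwright also needs its Chromium browser downloaded once:
bunx playwright install chromium
3. Log Into ChatGPT In Chrome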
Open Chrome and go to:
https://chatgpt.com
Log in.
Do not use a private window. Do not log out after setup.
Codex can only automate the image step if Chrome is already logged into ChatGPT. This matters because the workflow uses your ChatGPT subscription, not an API key.
4. Create The Custom GPT
In ChatGPT, create a Custom GPT called something like:
Instagram Carousel Hook Slide
Then upload your best face reference images to the GPT's Knowledge files.
Use clear photos:
- front-facing
- good lighting
- no sunglasses
- no weird crop
- a few different angles
Then paste this as the Custom GPT instructions:
You are my Instagram carousel hook image generator.
Your job is to create 4:5 portrait hook-slide images for my personal brand carousels.
Always use the uploaded face reference images as the likeness source. Preserve my facial identity, hair, face shape, skin tone, and overall recognisability. Do not invent a different person.
If image generation cannot reliably access the uploaded reference photos, ask me to upload two face reference images directly in the chat before generating.
Expression:
I should look elated, blissful, and genuinely happy: huge smile, eyes closed, confident, approachable.
Image style:
Create a professional high-resolution editorial photograph for a social media hook slide. Make it premium, vibrant, high contrast, and scroll-stopping.
Clothing:
Clean modern casual clothing, such as a neutral hoodie, crisp linen shirt, or high-quality crew neck.
Background:
Use vibrant attention-grabbing scenes with depth:
- lush paddock
- yellow wildflower meadow
- cobalt blue sky
- waterfall
- turquoise water
- bright park
- colourful outdoor architecture
Avoid plain studio-only backgrounds unless I explicitly ask for one.
Composition:
Use true 4:5 portrait framing for Instagram carousel export. Fill the full frame. No borders, no side gutters, no padding, no poster-on-background effect.
Keep the face, hands, logos, and any text inside the central 4:5 crop-safe area.
Typography:
Only add text when I explicitly ask for a finished hook slide. Use professional magazine-style typography. Keep it readable. Do not add random extra words.
For empty background assets:
Remove me completely.
Remove all text completely.
Keep the same scene, lighting, colour palette, and visual style.
Leave clean negative space for Codex to typeset slide text later.
Never add watermarks.
Never add extra people.
Never add a dark shade or vignette unless I ask.
5. Use This Folder Structure
Keep everything boring and predictable. That is what lets Codex automate it.
ai-carousel-system/
├── package.json
├── tsconfig.json
├── src/
│   └── render.ts
├── templates/
│   └── styles.css
├── config/
│   ├── brand-voice.md
│   └── chatgpt-carousel-workflow.md
├── content/
│   └── carousels/
│       └── my-carousel.json
├── assets/
│   ├── face reference/
│   │   ├── face-01.png
│   │   └── face-02.png
│   ├── Inspiration hook slides/
│   │   └── reference.png
│   └── heroes/
│       ├── my-carousel-cover.png
│       ├── my-carousel-background.png
│       └── my-carousel-05-finished.png
└── output/
    └── my-carousel/
        ├── 01.png
        ├── 02.png
        ├── 03.png
        ├── 04.png
        ├── 05.png
        ├── 06.png
        ├── 07.png
        ├── 08.png
        └── caption.txt
Rules:
- content/carousels/ stores the carousel scripts.
- assets/face reference/ stores your identity photos.
- assets/Inspiration hook slides/ stores screenshots of cover styles you like.
- assets/heroes/ stores ChatGPT-generated image assets.
- output/ is what you upload to Instagram.
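The renderer rules in step 10 (use the cover for slide 1, the background behind slides 2-8, and a finished slide 5 if one exists) come down to a lookup on these file names. Here is a minimal TypeScript sketch, assuming the naming shown in the tree above. `carouselPaths` and `planSlides` are hypothetical helpers for illustration, not something Codex is guaranteed to generate.

```typescript
// Hypothetical helper: every path the guide's naming convention implies for one carousel id.
function carouselPaths(id: string) {
  return {
    script: `content/carousels/${id}.json`,
    cover: `assets/heroes/${id}-cover.png`,
    background: `assets/heroes/${id}-background.png`,
    finished05: `assets/heroes/${id}-05-finished.png`,
    // 01.png through 08.png, zero-padded so they sort in posting order.
    slides: Array.from({ length: 8 }, (_, i) => `output/${id}/${String(i + 1).padStart(2, "0")}.png`),
    caption: `output/${id}/caption.txt`,
  };
}

// What the renderer should do for one slide.
type SlidePlan =
  | { kind: "image"; path: string } // use a finished ChatGPT image as-is
  | { kind: "typeset"; background?: string }; // render text, over the background if one exists

// Hypothetical helper: decide per slide, given a predicate that reports which files exist.
function planSlides(id: string, exists: (path: string) => boolean): SlidePlan[] {
  const paths = carouselPaths(id);
  const background = exists(paths.background) ? paths.background : undefined;
  return paths.slides.map((_, i): SlidePlan => {
    const slideNumber = i + 1;
    if (slideNumber === 1 && exists(paths.cover)) return { kind: "image", path: paths.cover };
    if (slideNumber === 5 && exists(paths.finished05)) return { kind: "image", path: paths.finished05 };
    // The guide only puts the generated background behind slides 2-8.
    return { kind: "typeset", background: slideNumber === 1 ? undefined : background };
  });
}
```

In a real renderer you would pass existsSync from node:fs as the predicate, for example planSlides("my-carousel", existsSync). Taking the predicate as a parameter keeps the decision logic testable without touching the disk.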
6. After Setup, Put Files Here
Once Codex has created the project, this is where everything goes.
Put your face photos here:
assets/face reference/
Use 2-8 clear images. Good lighting. No sunglasses. No weird crops. If you want the image to look like you, this folder matters more than any clever prompt.
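Step 10 has Codex attach exactly two of these photos. If you ever script that choice yourself, a small filter keeps stray files such as .DS_Store out of the upload and makes the pick repeatable. A sketch, assuming the photos are PNG, JPEG, or WebP files; pickFaceReferences is a hypothetical name.

```typescript
// Extensions treated as photos; anything else in the folder is ignored.
const PHOTO_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp"];

// Hypothetical helper: choose which face photos to attach, in a stable (sorted) order.
function pickFaceReferences(fileNames: string[], count = 2): string[] {
  const photos = fileNames
    .filter((name) => !name.startsWith(".")) // skip hidden files like .DS_Store
    .filter((name) => PHOTO_EXTENSIONS.some((ext) => name.toLowerCase().endsWith(ext)))
    .sort();
  if (photos.length < count) {
    throw new Error(`Need ${count} face photos in assets/face reference/, found ${photos.length}`);
  }
  return photos.slice(0, count);
}
```

Because the result is sorted by file name, naming your two best shots face-01.png and face-02.png decides which ones get sent.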
Put visual inspiration here:
assets/Inspiration hook slides/
These can be screenshots of carousel covers, magazine layouts, colour styles, or hook-slide designs you like. They are optional, but useful.
Put carousel scripts here:
content/carousels/
Each carousel is one JSON file. Example:
content/carousels/chatgpt-codex-workflow.json
Generated ChatGPT images go here:
assets/heroes/
Use this naming:
assets/heroes/chatgpt-codex-workflow-cover.png
assets/heroes/chatgpt-codex-workflow-background.png
assets/heroes/chatgpt-codex-workflow-05-finished.png
Final Instagram files appear here:
output/chatgpt-codex-workflow/
That folder should contain:
01.png
02.png
03.png
04.png
05.png
06.png
07.png
08.png
caption.txt
7. Carousel JSON Format
Each carousel is one JSON file.
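Before rendering, it helps to check a script against the rules this guide keeps repeating: exactly 8 slides, a hook first, and none of the banned corporate words from the writing rules in step 10. Below is a sketch with types inferred from the example that follows. The field names come from that example. checkCarousel, buildCaption, and the caption.txt layout (caption, blank line, hashtags) are assumptions, not the format Codex will necessarily write.

```typescript
// Slide shapes inferred from the example carousel; optional fields vary by slide.
type Slide =
  | { type: "hook"; eyebrow?: string; display: string; size?: string; kicker?: string }
  | { type: "content"; label: string; headline: string; bullets?: string[]; numbered?: string[]; code?: string }
  | {
      type: "compare";
      label: string;
      headline: string;
      bad: { label: string; body: string; meta?: string };
      good: { label: string; body: string; meta?: string };
    }
  | {
      type: "cta";
      eyebrow?: string;
      headline: string;
      instruct?: string;
      primary?: { label: string };
      secondary?: { label: string };
    };

interface Carousel {
  id: string;
  title: string;
  caption: string;
  hashtags: string[];
  slides: Slide[];
}

// Corporate words the writing rules ask to avoid.
const BANNED_WORDS = ["leverage", "utilize", "actionable", "unlock"];

// Flatten every string value in a slide into one lowercase blob for word checks.
function textOf(value: unknown): string {
  if (typeof value === "string") return value.toLowerCase();
  if (Array.isArray(value)) return value.map(textOf).join(" ");
  if (value !== null && typeof value === "object") return Object.values(value).map(textOf).join(" ");
  return "";
}

// Hypothetical check: returns a list of problems; an empty list means the script looks usable.
function checkCarousel(carousel: Carousel): string[] {
  const problems: string[] = [];
  if (carousel.slides.length !== 8) problems.push(`expected 8 slides, got ${carousel.slides.length}`);
  if (carousel.slides[0]?.type !== "hook") problems.push("slide 1 should be the hook");
  carousel.slides.forEach((slide, i) => {
    const text = textOf(slide);
    for (const word of BANNED_WORDS) {
      if (text.includes(word)) problems.push(`slide ${i + 1} uses "${word}"`);
    }
  });
  return problems;
}

// Assumed caption.txt layout: the caption, a blank line, then the hashtags.
function buildCaption(carousel: Carousel): string {
  return `${carousel.caption}\n\n${carousel.hashtags.map((tag) => `#${tag}`).join(" ")}\n`;
}
```

Returning a list of problems instead of throwing on the first one lets you fix a whole script in one pass.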
Example:
{
  "id": "chatgpt-codex-carousel-system",
  "title": "How I automate beautiful carousels with ChatGPT and Codex",
  "caption": "ChatGPT makes the visual assets. Codex saves the files, runs the script, and exports the deck.",
  "hashtags": ["ChatGPT", "Codex", "AIAutomation", "InstagramCarousel"],
  "slides": [
    {
      "type": "hook",
      "eyebrow": "ChatGPT + Codex",
      "display": "I stopped making carousels manually.",
      "size": "md",
      "kicker": "Here is the workflow."
    },
    {
      "type": "content",
      "label": "The problem",
      "headline": "Most AI carousels look like formatted documents.",
      "bullets": [
        "The copy might be fine.",
        "The layout might be clean.",
        "But nothing makes someone stop."
      ]
    },
    {
      "type": "content",
      "label": "The fix",
      "headline": "Separate the image job from the rendering job.",
      "bullets": [
        "ChatGPT makes the cover.",
        "ChatGPT makes the matching empty background.",
        "Codex renders the final slides."
      ]
    },
    {
      "type": "content",
      "label": "The asset",
      "headline": "The cover and background live in one folder.",
      "code": "assets/heroes/my-carousel-cover.png\nassets/heroes/my-carousel-background.png"
    },
    {
      "type": "compare",
      "label": "The difference",
      "headline": "Claude can render. ChatGPT can make people stop.",
      "bad": {
        "label": "Bland",
        "body": "Clean text on a plain background.",
        "meta": "Correct but forgettable."
      },
      "good": {
        "label": "Visual",
        "body": "A cover image with a real scroll-stopping hook.",
        "meta": "Better for attention."
      }
    },
    {
      "type": "content",
      "label": "The workflow",
      "headline": "One prompt starts the run.",
      "bullets": [
        "Codex opens Chrome.",
        "ChatGPT generates the images.",
        "Codex downloads and normalises them.",
        "Bun exports the PNGs."
      ]
    },
    {
      "type": "content",
      "label": "The pro move",
      "headline": "Make every deck use the same asset contract.",
      "numbered": [
        "Cover image.",
        "Empty background.",
        "JSON content.",
        "Rendered PNGs."
      ]
    },
    {
      "type": "cta",
      "eyebrow": "Want the system",
      "headline": "Comment Carousel and I will send you the workflow.",
      "instruct": "Includes the prompts, folder structure, and Codex setup.",
      "primary": { "label": "Comment Carousel" },
      "secondary": { "label": "Save this" }
    }
  ]
}
8. The Cover Prompt
This is the prompt Codex should send to your Custom GPT after uploading two face reference images. Customise it freely to match your own style!
Use the two attached face photos as the primary identity source.
Generate one finished true 4:5 portrait Instagram carousel hook slide WITH typography included in the image.
Topic:
[CAROUSEL TOPIC]
Identity:
The person must look recognisably like me from the two attached face photos. Preserve face shape, skin tone, hair, smile, eyes, jawline, and overall identity. Do not invent a different person.
Expression:
Make me look elated and blissful: huge genuine smile, eyes closed, confident, approachable.
Scene:
Put me in a vibrant attention-grabbing outdoor setting that fits the topic. Use a lush paddock, yellow wildflower meadow, cobalt blue sky, waterfall, turquoise water, bright park, or colourful outdoor architecture.
Product marks:
If the topic is about ChatGPT, include a tasteful ChatGPT mark.
If the topic is about Claude, include a tasteful Claude mark.
If the topic is about Codex, include a tasteful Codex mark.
Keep marks secondary, like editorial props or stickers.
Ratio:
True 4:5 Instagram carousel frame. Fill the whole frame. No borders, no side gutters, no padding, no poster-on-background effect. Keep all text, face, hands, logos, and CTA safely inside the 4:5 frame.
Typography:
Small top header: Daniel Builds AI
Main headline: [MAIN HEADLINE]
Subheading: [SUBHEADING]
Bottom CTA: [CTA]
Use readable premium magazine-style fonts. Strong hierarchy. No random text. No watermark. No extra people.
9. The Matching Background Prompt
After ChatGPT makes the cover, ask it for the empty background.
Create a matching empty background asset from the generated cover image above for slides 2-8.
Keep the same true 4:5 Instagram carousel framing, vibrant background world, lighting, colour palette, and premium editorial style.
Remove me completely.
Remove every bit of typography and text completely.
No people. No captions. No words. No watermark.
Keep any product logo objects as small tasteful decorative elements, secondary and unobtrusive.
Leave generous readable negative space across the upper-left and center so Codex can typeset slide text later.
Keep the environment visually appealing. Preserve the meadow, paddock, waterfall, cobalt sky, turquoise water, or other strong scene from the cover.
Do not add a dark shade, vignette, smoke, dimming layer, side gutters, borders, or padding.
10. The One-Shot Prompt For Codex
Paste this into Codex from a blank project folder or an existing carousel project.
This prompt tells Codex to create the project if it does not exist, explain the folder structure, generate the assets through ChatGPT in Chrome, and render the first carousel.
Replace the bracketed parts.
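If you reuse this prompt for every carousel, you can fill the bracketed parts with a script instead of by hand, and catch any you forgot. A minimal sketch: fillPrompt is a hypothetical helper, and it assumes placeholders are uppercase words in square brackets, as in the prompts in this guide.

```typescript
// Matches [CAROUSEL_ID], [MAIN HEADLINE], [WRITE THE MAIN POINTS YOU WANT COVERED], etc.
const PLACEHOLDER = /\[([A-Z0-9_ ]+)\]/g;

// Hypothetical helper: replace every placeholder, and throw if any are left unfilled.
function fillPrompt(template: string, values: Record<string, string>): string {
  const missing = new Set<string>();
  const filled = template.replace(PLACEHOLDER, (match: string, key: string) => {
    if (Object.prototype.hasOwnProperty.call(values, key)) return values[key];
    missing.add(key); // remember it, and leave the bracket in place for the error message
    return match;
  });
  if (missing.size > 0) {
    throw new Error(`Unfilled placeholders: ${[...missing].join(", ")}`);
  }
  return filled;
}
```

For example, fillPrompt(template, { CAROUSEL_ID: "my-carousel", CTA: "Comment Carousel" }) fills every [CAROUSEL_ID] and [CTA], and throws if something like [CUSTOM_GPT_URL] still has no value.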
Set up and render one Instagram carousel from beginning to end using the ChatGPT subscription image workflow.
You are starting from this project folder. If the carousel renderer project does not already exist, create it from scratch.
Goal:
Build a Bun + Playwright Instagram carousel system that renders exact 1080x1350 PNG slides from JSON content files, uses ChatGPT subscription image generation in Chrome for the cover/background assets, and outputs one complete carousel.
Project setup:
- Runtime must be Bun.
- Do not use Node/npm/yarn for running the project.
- Use `bun install`.
- Render with Playwright Chromium, headless.
- Use TypeScript.
- Export exact 1080x1350 PNGs.
- Use Google Chrome + Computer Use for ChatGPT image generation.
- Do not use the OpenAI Images API.
- Do not ask me to manually copy prompts, download images, or move files.
Create this folder structure if missing:
.
├── package.json
├── tsconfig.json
├── src/
│   └── render.ts
├── templates/
│   └── styles.css
├── config/
│   ├── brand-voice.md
│   └── chatgpt-carousel-workflow.md
├── content/
│   └── carousels/
│       └── [CAROUSEL_ID].json
├── assets/
│   ├── face reference/
│   ├── Inspiration hook slides/
│   └── heroes/
└── output/
Also create a short README or guide that explains what goes in each folder:
- `assets/face reference/` = two or more clear face photos.
- `assets/Inspiration hook slides/` = optional visual references.
- `content/carousels/` = JSON carousel scripts.
- `assets/heroes/` = generated cover/background images.
- `templates/styles.css` = carousel visual system.
- `src/render.ts` = Bun + Playwright renderer.
- `output/` = final Instagram-ready PNGs and caption.
Carousel:
- id: [CAROUSEL_ID]
- topic: [CAROUSEL_TOPIC]
- hook: [HOOK_TEXT]
- main headline for cover: [MAIN_HEADLINE]
- subheading for cover: [SUBHEADING]
- CTA for cover: [CTA]
Content direction:
[WRITE THE MAIN POINTS YOU WANT COVERED]
Renderer requirements:
- Accept `bun src/render.ts content/carousels/[CAROUSEL_ID].json`.
- Accept `bun src/render.ts --all`.
- Render exactly 8 slides.
- Export exact 1080x1350 PNGs.
- Write `caption.txt`.
- If `assets/heroes/[CAROUSEL_ID]-cover.png` exists, use it directly for slide 1.
- If `assets/heroes/[CAROUSEL_ID]-background.png` exists, use it behind slides 2-8.
- If `assets/heroes/[CAROUSEL_ID]-05-finished.png` exists, use it directly as slide 5.
- Do not render slide numbers or page counters.
- Use readable `Daniel Builds AI`, not tiny all-caps.
- Use readable sentence-case section labels.
- Use actual bullet points, not circular dot badges.
- Keep text readable and not all caps after slide 1.
- Do not apply a dark shade over the generated background.
Custom GPT:
- Open this in Google Chrome: [CUSTOM_GPT_URL]
- Chrome should already be logged into ChatGPT.
- Use Google Chrome + Computer Use.
- Attach exactly two face reference images from `assets/face reference/`.
Cover image prompt to send to ChatGPT:
Use the two attached face photos as the primary identity source.
Generate one finished true 4:5 portrait Instagram carousel hook slide WITH typography included in the image.
Topic:
[CAROUSEL_TOPIC]
Identity:
The person must look recognisably like me from the two attached face photos. Preserve face shape, skin tone, hair, smile, eyes, jawline, and overall identity. Do not invent a different person.
Expression:
Make me look elated and blissful: huge genuine smile, eyes closed, confident, approachable.
Scene:
Put me in a vibrant attention-grabbing setting that fits the topic. Use a visually rich background with depth, such as a yellow wildflower meadow, cobalt blue sky, waterfall, turquoise water, rooftop, colourful architectural space, bright park, or bold studio scene.
Pose:
Choose a pose that fits the topic: lying in grass looking up, walking toward camera, sitting confidently, leaning into a creative desk, holding a product logo, or celebrating with arms open.
Product marks:
If the topic is about ChatGPT, include a tasteful ChatGPT mark.
If the topic is about Claude, include a tasteful Claude mark.
If the topic is about Codex, include a tasteful Codex mark.
Keep marks secondary, like editorial props or stickers.
Ratio:
True 4:5 Instagram carousel frame. Fill the whole frame. No borders, no side gutters, no padding, no poster-on-background effect. Keep all text, face, hands, logos, and CTA safely inside the 4:5 frame.
Typography:
Small top header: Daniel Builds AI
Main headline: [MAIN_HEADLINE]
Subheading: [SUBHEADING]
Bottom CTA: [CTA]
Use readable premium magazine-style fonts. Strong hierarchy. No random text. No watermark. No extra people.
After the cover is generated:
1. Download it.
2. Save it as `assets/heroes/[CAROUSEL_ID]-cover.png`.
3. Ask ChatGPT for a matching empty background with this prompt:
Create a matching empty background asset from the generated cover image above for slides 2-8.
Keep the same true 4:5 Instagram carousel framing, vibrant background world, lighting, colour palette, and premium editorial style.
Remove me completely.
Remove every bit of typography and text completely.
No people. No captions. No words. No watermark.
Keep any product logo objects as small tasteful decorative elements, secondary and unobtrusive.
Leave generous readable negative space across the upper-left and center so Codex can typeset slide text later.
Keep the environment visually appealing. Preserve the meadow, paddock, waterfall, cobalt sky, turquoise water, rooftop, bright park, colourful architecture, or other strong scene from the cover.
Do not add a dark shade, vignette, smoke, dimming layer, side gutters, borders, or padding.
Then:
- Download the background.
- Save it as `assets/heroes/[CAROUSEL_ID]-background.png`.
- Normalise every downloaded image to exact 1080x1350 without side padding, blurred gutters, or borders. Crop first if needed, then resample.
- If slide 5 needs a full visual comparison, generate it and save it as `assets/heroes/[CAROUSEL_ID]-05-finished.png`.
Writing rules:
- Direct, blunt, useful.
- Opinion first, explanation second.
- One concept per slide.
- Every slide should make the next slide feel necessary.
- Avoid corporate words like leverage, utilize, actionable, unlock.
- Avoid vague phrases like visual world, taste layer, production team.
- Do not write generic creator advice.
Design rules:
- No slide numbers or page counters.
- Use `Daniel Builds AI`, not tiny all-caps.
- Make section labels readable sentence case.
- Use actual bullet points, not circular dot badges.
- Keep typography readable and not all caps after slide 1.
- Use the generated background for slides 2-8.
- Do not apply a dark shade over the background.
- Use visual variety. Slide 5 should be the layout shift.
Deliverables:
- Create the project files if missing.
- Create `content/carousels/[CAROUSEL_ID].json`.
- Save generated assets in `assets/heroes/`.
- Render final PNGs to `output/[CAROUSEL_ID]/01.png` through `08.png`.
- Write `output/[CAROUSEL_ID]/caption.txt`.
- Run `bunx tsc --noEmit`.
- Verify every PNG is exactly 1080x1350.
- Create a contact sheet preview and show it to me.
- Explain what files were created and where everything goes.
Start now and do the whole workflow without stopping unless Chrome login, ChatGPT generation, or file upload is blocked.
11. Normalise Image Downloads
ChatGPT sometimes gives you a file that is close to 4:5 but not exact.
Do not add side padding. Do not blur-fill the edges. Do not put the image on a bigger canvas.
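Crop to 4:5, then resample. On macOS, the built-in sips tool does both. The crop numbers below suit a download that is 1122 px wide and at least 1402 px tall. For other sizes, keep the full width and crop the height to width × 5/4 if the image is taller than 4:5, or keep the full height and crop the width to height × 4/5 if it is wider:
sips --cropToHeightWidth 1402 1122 input.png --out /tmp/crop.png
sips --resampleHeightWidth 1350 1080 /tmp/crop.png --out assets/heroes/my-carousel-cover.png
If the crop cuts off the face, headline, CTA, or product marks, regenerate the image with stricter crop-safe instructions.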
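To work out the crop for any download instead of one fixed size, the arithmetic can be scripted. This TypeScript sketch computes the largest centred 4:5 box and prints the two sips commands. The function names are hypothetical. It assumes sips crops around the centre of the image, which is why only the crop size is passed.

```typescript
interface CropBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Largest centred 4:5 (width:height) rectangle that fits inside the image.
function crop4x5(width: number, height: number): CropBox {
  let cropWidth = width;
  let cropHeight = Math.floor((width * 5) / 4); // too tall: keep full width, trim top and bottom
  if (cropHeight > height) {
    cropHeight = height; // too wide: keep full height, trim the sides
    cropWidth = Math.floor((height * 4) / 5);
  }
  return {
    x: Math.floor((width - cropWidth) / 2),
    y: Math.floor((height - cropHeight) / 2),
    width: cropWidth,
    height: cropHeight,
  };
}

// Hypothetical helper: the crop-then-resample commands from this step for one download.
function sipsCommands(input: string, output: string, width: number, height: number): string[] {
  const box = crop4x5(width, height);
  return [
    `sips --cropToHeightWidth ${box.height} ${box.width} "${input}" --out /tmp/crop.png`,
    `sips --resampleHeightWidth 1350 1080 /tmp/crop.png --out "${output}"`,
  ];
}
```

For a 1122 × 1402 download this reproduces the commands above. For a 1024 × 1536 image, crop4x5 keeps all 1024 px of width and 1280 px of height, trimming 128 px from the top and the bottom.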
12. Render The Carousel
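Under the hood, the approach is to build one HTML page per slide at exactly 1080 × 1350 CSS pixels, then let headless Chromium screenshot it. Codex writes its own src/render.ts, so this is only an illustration of the HTML half. It assumes the background is inlined as a base64 data URL so the page needs no file access. slideHtml and the inline styles are hypothetical, not the styles.css Codex generates.

```typescript
// A typeset slide (slides 2-8) in its simplest form.
interface TextSlide {
  label: string;
  headline: string;
  bullets?: string[];
}

// Escape text so headlines containing <, >, & or quotes render literally.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Inline a PNG so the page does not depend on file:// access.
function pngDataUrl(bytes: Uint8Array): string {
  return `data:image/png;base64,${Buffer.from(bytes).toString("base64")}`;
}

// One self-contained page per slide. No dark overlay on the background, per the design rules.
function slideHtml(slide: TextSlide, backgroundDataUrl?: string): string {
  const background = backgroundDataUrl
    ? `url("${backgroundDataUrl}") center / cover no-repeat`
    : "#ffffff";
  // Real <li> bullets, not circular dot badges.
  const bullets = (slide.bullets ?? []).map((b) => `<li>${escapeHtml(b)}</li>`).join("");
  return `<!doctype html>
<html>
<head>
<style>
  html, body { margin: 0; width: 1080px; height: 1350px; overflow: hidden; }
  body { background: ${background}; font-family: Georgia, serif; color: #111; }
  .text { padding: 120px 96px; max-width: 760px; }
  .label { font-size: 36px; }
  h1 { font-size: 76px; line-height: 1.05; margin: 24px 0; }
  li { font-size: 40px; margin: 12px 0; }
</style>
</head>
<body>
  <div class="text">
    <div class="label">${escapeHtml(slide.label)}</div>
    <h1>${escapeHtml(slide.headline)}</h1>
    <ul>${bullets}</ul>
  </div>
</body>
</html>`;
}
```

With Playwright, each page can then become a PNG through browser.newPage({ viewport: { width: 1080, height: 1350 } }), page.setContent(html) and page.screenshot({ path: "output/my-carousel/02.png" }). At the default device scale factor of 1, that screenshot is 1080 × 1350 pixels.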
Render one carousel:
bun src/render.ts content/carousels/my-carousel.json
Render every carousel:
bun src/render.ts --all
Check image sizes:
sips -g pixelWidth -g pixelHeight output/my-carousel/*.png
You want:
pixelWidth: 1080
pixelHeight: 1350
13. What Can Break
ChatGPT can fail at image generation.
If that happens, retry with a shorter prompt. The most important bits are:
- use the two attached face references
- true 4:5 frame
- no side gutters
- vibrant background
- keep text inside the safe area
- no random text
Chrome can log you out.
If that happens, log back into ChatGPT in Chrome, then run the Codex prompt again.
The image might not look like you.
If that happens, upload two face references directly into the live chat, not just into the Custom GPT knowledge. Say:
Use the two attached photos as the identity source. Preserve likeness more strongly than pose, background, or typography.
The deck might look too samey.
Fix that by making slide 5 a visual shift:
- split-screen comparison
- full-slide generated visual
- code screenshot style
- before/after layout
14. The Final Checklist
Before posting:
- Is slide 1 scroll-stopping?
- Does slide 2 create tension?
- Does each slide make the next one feel necessary?
- Does slide 5 re-engage attention?
- Is every PNG exactly 1080x1350?
- Are there no slide numbers?
- Are the bullets real bullets?
- Is the background vivid and clean?
- Is the copy specific, not generic?
- Does the CTA ask for one simple action?
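For the 1080x1350 question, sips only exists on macOS. A cross-platform alternative is to read the size straight from each PNG file, because the PNG format stores width and height in the IHDR chunk that immediately follows the 8-byte file signature. Here is a sketch for Bun or Node. readPngSize and findWrongSizes are hypothetical names.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// The 8-byte signature every PNG file starts with.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Layout: signature (bytes 0-7), chunk length (8-11), chunk type "IHDR" (12-15),
// then width and height as big-endian unsigned 32-bit integers at offsets 16 and 20.
function readPngSize(bytes: Uint8Array): { width: number; height: number } {
  if (bytes.length < 24 || PNG_SIGNATURE.some((b, i) => bytes[i] !== b)) {
    throw new Error("not a PNG file");
  }
  const chunkType = String.fromCharCode(...bytes.subarray(12, 16));
  if (chunkType !== "IHDR") throw new Error("missing IHDR chunk");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// List every PNG in a folder whose size is not exactly width x height.
function findWrongSizes(dir: string, width = 1080, height = 1350): string[] {
  return readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith(".png"))
    .sort()
    .filter((name) => {
      const size = readPngSize(readFileSync(join(dir, name)));
      return size.width !== width || size.height !== height;
    });
}
```

findWrongSizes("output/my-carousel") should return an empty array. Any file name it does return needs re-rendering.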
That is the system.
ChatGPT makes the visual asset people notice. Codex handles the file work. Bun and Playwright export the deck.