Give your agent a real image rendering tool.
DALL-E hallucinates layouts. Midjourney can't hit a brand spec. For agents that need a logo on a card, a chart from state, or a certificate per user action — you need deterministic rendering. MCP-native, REST-callable, CLI-shellable. Pick your wiring.
The problem
LLM image generation (DALL-E, Midjourney, Imagen) is creative but non-deterministic. Ask for “a card with our logo in the top left, headline in 72px Inter, gradient background” and you get an approximation — different every call, never quite on-brand, text often garbled.
It's also expensive (~$0.04+ per image) and slow (5-30s). For an agent that needs to issue 1,000 certificates, render a chart from query results, or stamp an OG image per blog post — that's a non-starter. You need rendering, not generation.
Three ways to give your agent the tool
1. MCP server (native)
Claude Desktop / Code, Cursor, Windsurf, Cline
Install @codetoimage/mcp-server via npx. Two intent-aware tools appear in your client: render_html_to_image (inline base64) and render_html_to_url (24h hosted). The model picks the right one. Listed in Anthropic's official MCP Registry.
2. REST API (any agent loop)
LangChain, custom GPT, n8n HTTP node
POST to /v1/render with HTML and dimensions, get bytes or a hosted URL back. Wrap it as a LangChain Tool, an OpenAI function-calling schema, or an n8n HTTP step. Same credit pool as MCP.
3. CLI shell (anything that can exec)
n8n Execute Command, Make, shell agents
Run npx @codetoimage/cli render from any process. n8n Execute Command, Make custom modules, shell-tool agents, GitHub Actions: if it can spawn a process, it can render images.
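For the CLI route, spawning the command from a Node process might look like the sketch below. Only `npx @codetoimage/cli render` comes from the text above. The source does not list the CLI's flags, so they are left to the caller as `extraArgs`, and the helper names `cliCommand` and `runCli` are illustrative.

```typescript
import { spawn } from "node:child_process";

// Build the argv for the CLI. Flags are NOT documented in this page,
// so the caller supplies them (hypothetical; check the CLI's own docs).
export function cliCommand(extraArgs: string[]): { command: string; args: string[] } {
  return { command: "npx", args: ["-y", "@codetoimage/cli", "render", ...extraArgs] };
}

// Spawn the CLI and resolve with its exit code.
export function runCli(extraArgs: string[]): Promise<number> {
  const { command, args } = cliCommand(extraArgs);
  return new Promise((resolve, reject) => {
    // On Windows, npx is a .cmd shim, so `shell: true` may be needed here.
    const child = spawn(command, args, { stdio: "inherit", env: process.env });
    child.on("error", reject);
    child.on("close", (code) => resolve(code ?? 1));
  });
}
```

Keeping argv construction separate from the spawn makes the command easy to log or unit-test without starting a process.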
MCP setup — Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "codetoimage": {
      "command": "npx",
      "args": ["-y", "@codetoimage/mcp-server"],
      "env": {
        "CODETOIMAGE_API_KEY": "cti_live_..."
      }
    }
  }
}
Restart Claude Desktop. Now try a prompt like:
“Render a 1200x630 card with a dark gradient background, our logo bottom-left, and the headline ‘Q4 revenue up 38%’ in Inter 72px. Return a hosted URL so I can post it to Slack.”
Claude writes the HTML, picks render_html_to_url, returns the link. No prompt engineering, no tool selection hints needed.
Non-MCP agent — LangChain Tool wrapper
For LangChain, OpenAI function-calling, or any agent framework that takes Tool definitions, wrap the REST API:
// tools/renderImage.ts
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// LangChain tool: POSTs HTML to the render endpoint and returns a hosted URL.
export const renderImage = tool(
  async ({ html, width, height, format }) => {
    const res = await fetch("https://api.codetoimage.app/v1/render", {
      method: "POST",
      headers: {
        "X-API-Key": process.env.CODETOIMAGE_API_KEY!,
        "Content-Type": "application/json",
      },
      // Fall back to a 1200x630 PNG when the model omits the optional fields.
      body: JSON.stringify({
        html,
        width: width ?? 1200,
        height: height ?? 630,
        format: format ?? "png",
        output: "url",
      }),
    });
    // Surface HTTP failures to the agent instead of returning an empty result.
    if (!res.ok) throw new Error(`render failed: ${res.status}`);
    const { url } = await res.json();
    return url;
  },
  {
    name: "render_image",
    description:
      "Render HTML/CSS to a pixel-perfect image. Returns a 24h hosted URL. Use for cards, certificates, OG images, charts, or anything else where layout precision matters.",
    schema: z.object({
      html: z.string().describe("Full HTML with inline CSS or <style> block"),
      width: z.number().optional(),
      height: z.number().optional(),
      format: z.enum(["png", "jpeg", "webp"]).optional(),
    }),
  }
);
The same shape works for OpenAI function calling: replace the wrapper with a JSON schema and call the endpoint from your agent loop.
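As a rough sketch of that OpenAI variant, the tool definition below mirrors the fields of the LangChain wrapper in the shape the Chat Completions `tools` parameter expects. The names `renderImageTool` and `toRenderBody` are illustrative, and the defaults simply copy the ones used in the wrapper above.

```typescript
// Tool definition to pass in the `tools` array of a chat completion request.
export const renderImageTool = {
  type: "function",
  function: {
    name: "render_image",
    description: "Render HTML/CSS to an image and return a 24h hosted URL.",
    parameters: {
      type: "object",
      properties: {
        html: { type: "string", description: "Full HTML with inline CSS or <style> block" },
        width: { type: "number", description: "Width in px (defaults to 1200)" },
        height: { type: "number", description: "Height in px (defaults to 630)" },
        format: { type: "string", enum: ["png", "jpeg", "webp"] },
      },
      required: ["html"],
    },
  },
} as const;

type RenderArgs = {
  html: string;
  width?: number;
  height?: number;
  format?: "png" | "jpeg" | "webp";
};

// Convert the model's JSON-encoded tool-call arguments into the body
// that the wrapper above sends to /v1/render.
export function toRenderBody(argsJson: string) {
  const args = JSON.parse(argsJson) as RenderArgs;
  if (typeof args.html !== "string") throw new Error("html is required");
  return {
    html: args.html,
    width: args.width ?? 1200,
    height: args.height ?? 630,
    format: args.format ?? "png",
    output: "url" as const,
  };
}
```

In an agent loop, the string in the tool call's `arguments` field would go through `toRenderBody`, and the result would be POSTed with the same headers as the LangChain example.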
vs LLM image generation
| Aspect | DALL-E / Midjourney / Imagen | codetoimage |
|---|---|---|
| Determinism | Different every call | ✓ Same input → same pixels |
| Layout precision | Approximate, text often garbled | ✓ Exact — it's a real browser |
| Per-call cost | ~$0.04 (DALL-E 3 standard) | ✓ <$0.003 at Hobby |
| Latency | 5-30s | ✓ ~600ms median |
| Brand consistency | Best-effort prompt steering | ✓ Your design system, your fonts |
| Re-render same input | New image, new cost | ✓ Cache the rendered output (hosted URLs last 24h) |
| Creative novelty | ✓ Concept art, illustration | Whatever HTML/CSS can express |
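Because the same input yields the same pixels, a render can be cached under a hash of its inputs. The sketch below is one way to do that, not part of the product. `RenderRequest`, `cacheKey`, and `renderCached` are hypothetical names, and it stores image bytes rather than the hosted URL since the URLs are described as lasting 24 hours.

```typescript
import { createHash } from "node:crypto";

export interface RenderRequest {
  html: string;
  width: number;
  height: number;
  format: "png" | "jpeg" | "webp";
}

// Derive a stable key: serialize the fields in a fixed order, then hash.
export function cacheKey(req: RenderRequest): string {
  const canonical = JSON.stringify([req.html, req.width, req.height, req.format]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Return cached bytes when the same request was rendered before;
// otherwise call the supplied render function once and remember the result.
export async function renderCached(
  req: RenderRequest,
  render: (req: RenderRequest) => Promise<Uint8Array>,
  cache: Map<string, Uint8Array>,
): Promise<Uint8Array> {
  const key = cacheKey(req);
  const hit = cache.get(key);
  if (hit) return hit;
  const bytes = await render(req);
  cache.set(key, bytes);
  return bytes;
}
```

An in-memory `Map` only survives for the life of the process. A real deployment would more likely key object storage or a CDN by the same hash.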
What you get
Agentic OG generation
LLM writes the HTML from the post content, codetoimage renders it. Per-post social previews without a designer in the loop.
Tool-call certificates
Issue a branded PNG per user action — course completion, badge, receipt. Same template, dynamic copy, deterministic output.
Chart / data viz from agent state
Pipe query results into a Chart.js or vanilla SVG template, render to PNG, hand back to the user as an attachment.
Slack / Discord image bots
Agent receives a message, renders it as a branded card, posts back. Beautiful unfurls without any image-editing tooling.
n8n / Zapier image steps
Drop an HTTP node or shell exec into a workflow. Every automation gets image rendering — no per-template setup, no Bannerbear-style template UI.
MCP Registry listing
The only HTML-to-image stack in Anthropic's official MCP Registry. Discoverable by name in any client that indexes the registry.
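To make the chart use case above concrete, here is a minimal sketch of a vanilla SVG template that turns query results into an HTML string ready for the `html` field of a render request. The function name, colors, and spacing are illustrative choices, not part of codetoimage.

```typescript
export interface Bar {
  label: string;
  value: number;
}

// Escape text so labels coming from data cannot break the markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a full HTML document containing a simple SVG bar chart.
export function barChartHtml(title: string, bars: Bar[], width = 1200, height = 630): string {
  const pad = 60;
  // Largest value sets the scale; fall back to 1 to avoid dividing by zero.
  const max = Math.max(0, ...bars.map((b) => b.value)) || 1;
  const slot = (width - 2 * pad) / Math.max(1, bars.length);
  const plotHeight = height - 3 * pad; // leave room for title and labels
  const shapes = bars
    .map((b, i) => {
      const h = Math.max(0, (b.value / max) * plotHeight);
      const x = pad + i * slot + slot * 0.15;
      const y = height - pad - h;
      const cx = pad + i * slot + slot / 2;
      return (
        `<rect x="${x.toFixed(1)}" y="${y.toFixed(1)}" width="${(slot * 0.7).toFixed(1)}" height="${h.toFixed(1)}" fill="#4f46e5"/>` +
        `<text x="${cx.toFixed(1)}" y="${height - pad + 28}" text-anchor="middle" font-size="22">${escapeHtml(b.label)}</text>`
      );
    })
    .join("");
  return (
    `<!doctype html><html><body style="margin:0">` +
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" font-family="sans-serif">` +
    `<text x="${pad}" y="${pad}" font-size="36">${escapeHtml(title)}</text>` +
    shapes +
    `</svg></body></html>`
  );
}
```

The same pattern covers certificates and cards: a pure function from data to HTML, with escaping applied to anything that came from a user or a query.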
FAQ
When should I use this vs DALL-E or other LLM image generation?
Use generative models (DALL-E, Imagen, Midjourney) when you want creative novelty: concept art, illustrations, mood boards. Use codetoimage when you want determinism: branded cards, certificates, charts, OG images, dashboards. If the same input must always produce the same pixel-perfect output, or the layout has to match your design system exactly, you need rendering, not generation. Bonus: by the figures on this page, rendering is roughly 17x cheaper per call at Hobby pricing ($0.0023 vs $0.040) and, at ~600ms median against 5-30s, at least about 8x faster.
Can the LLM generate the HTML itself?
Yes — this is the canonical pattern. Prompt the model with the data plus a style guide ("render a 1200x630 card with our brand gradient, headline X, subheadline Y") and let it produce the HTML/CSS string. Pass that to render_html_to_image. The model handles creative layout, codetoimage handles pixel-perfect rendering. Works great for OG images, social cards, certificates with dynamic copy.
How much does it cost per render?
Sandbox is free for 50 renders/month. Hobby is $7/month for 3,000 renders — that's roughly $0.0023 per render. Pro is $19/month for 10,000 renders ($0.0019 each). Compare to DALL-E 3 standard at $0.040 per image — codetoimage is ~17x cheaper and the result is deterministic, so you can cache aggressively to drive effective cost even lower.
Which MCP clients are supported?
Any client that supports stdio MCP servers. Verified: Claude Desktop, Claude Code, Cursor. Should work without changes: Windsurf, Cline, Zed, OpenCode, and anything else that runs MCP servers via stdio. Non-MCP agents (LangChain, custom GPT actions, n8n, Make, Zapier) wire up via the REST API or by shelling out to @codetoimage/cli.
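The per-render figures in the pricing answer above can be reproduced with a few lines of arithmetic. This sketch uses only the prices quoted on this page and assumes the full monthly quota is used. The names are illustrative.

```typescript
export interface Tier {
  name: string;
  monthlyUsd: number;
  renders: number;
}

// Prices as quoted in the FAQ above.
export const tiers: Tier[] = [
  { name: "Hobby", monthlyUsd: 7, renders: 3000 },
  { name: "Pro", monthlyUsd: 19, renders: 10000 },
];

// DALL-E 3 standard price per image, as quoted in the FAQ above.
export const DALLE3_STANDARD_USD = 0.04;

// Cost per render if every render in the monthly quota is used.
export function perRenderUsd(t: Tier): number {
  return t.monthlyUsd / t.renders;
}

// How many times cheaper one render is than one baseline image.
export function timesCheaper(t: Tier, baselineUsd: number = DALLE3_STANDARD_USD): number {
  return baselineUsd / perRenderUsd(t);
}

for (const t of tiers) {
  console.log(`${t.name}: $${perRenderUsd(t).toFixed(4)} per render, ${timesCheaper(t).toFixed(1)}x cheaper`);
}
// Hobby: $0.0023 per render, 17.1x cheaper
// Pro: $0.0019 per render, 21.1x cheaper
```

If a month uses fewer renders than the quota, the effective per-render cost rises accordingly.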
Plug image rendering into your agent.
Free Sandbox tier — 50 renders/month, no credit card. MCP, REST, or CLI — same credits, same renderer.