Tutorial · 2026-05-28 · 6 min read

How to give Claude or Cursor an image generation tool in 60 seconds

AI image generation is great when you want something creative: a stylized hero illustration, a moodboard, a "draw me a cat in a spacesuit" moment. It's terrible when you need something exact. A logo at 512×512. A certificate with the right name. A receipt header. A chart from numbers your agent already has in memory. DALL-E and Midjourney are non-deterministic by design, and you can't ship that into a customer-facing flow.

Sixty seconds of config solves it. We're going to give your AI client (Claude Desktop, Claude Code, Cursor, whichever you use) a deterministic image rendering tool, and the model will call it on its own whenever the prompt asks for an exact image. No fine-tuning. No prompt wizardry. One small block of JSON.

What MCP actually is, in one minute

MCP stands for Model Context Protocol. It's a small JSON-RPC protocol Anthropic published in late 2024 to standardize how tools get exposed to LLM clients. Before MCP, every app rolled its own plugin system; after MCP, the same server works in Claude Desktop, Claude Code, Cursor, Windsurf, Cline, Zed, and a growing list of others.

An MCP server is just a tiny process. The client launches it (usually via npx), the server advertises a list of tools over stdio, and the model — when it decides a tool would help — calls one. The model sees real responses, including images, and can react to them. It's roughly the agent equivalent of a browser plugin: install once, available everywhere the agent runs.
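To make "advertises tools over stdio" concrete, here is a minimal sketch of the messages a client writes to the server's stdin. It assumes the MCP stdio transport (one JSON-RPC 2.0 message per line) and skips the `initialize` handshake a real client performs first. The `html` argument name is an assumption for illustration; the server's actual input schema comes back in the `tools/list` result.

```typescript
// Minimal sketch of MCP stdio framing: each JSON-RPC 2.0 message is
// serialized as a single line of JSON terminated by "\n".
// A real client sends an `initialize` request and a
// `notifications/initialized` notification before anything below.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

function request(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const msg: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) msg.params = params;
  return msg;
}

// JSON.stringify escapes newlines inside strings, so one message
// always fits on one line.
function frame(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

// Ask the server which tools it offers.
const listTools = frame(request("tools/list"));

// Invoke a tool the model picked. The `html` argument name is an
// assumption, not the server's documented schema.
const callTool = frame(
  request("tools/call", {
    name: "render_html_to_image",
    arguments: { html: "<h1>Hello</h1>" },
  }),
);

console.log(listTools + callTool);
```

The client never decides which tool to call; it only relays the model's choice as a `tools/call` request and hands the result back to the model.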

codetoimage ships an official MCP server, published on npm as @codetoimage/mcp-server and listed in the official MCP Registry under the id io.github.beznazwiska/codetoimage-mcp-server. It exposes two tools: render_html_to_image, which returns the PNG bytes inline so the model can see them, and render_html_to_url, which returns a CDN-served URL that stays live for 24 hours, for when you'd rather paste a link.
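When the model calls one of these tools, the server answers with a `tools/call` result made of content blocks. Here is a minimal sketch of how a client might unpack one, assuming the `image` and `text` block shapes from the MCP spec. Which blocks this particular server returns for each tool (for example, whether the hosted URL arrives as a text block) is an assumption.

```typescript
// Content block shapes as defined by MCP for tool results.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }; // data is base64

interface CallToolResult {
  content: ContentBlock[];
  isError?: boolean;
}

type Rendered =
  | { kind: "image"; bytes: Uint8Array; mimeType: string }
  | { kind: "text"; text: string };

// Decode image blocks to raw bytes and pass text blocks through.
// Tool-level failures arrive as isError: true with a text explanation.
function unpack(result: CallToolResult): Rendered[] {
  if (result.isError) {
    const msg = result.content
      .map((c) => (c.type === "text" ? c.text : ""))
      .join(" ");
    throw new Error(`tool failed: ${msg}`);
  }
  return result.content.map((c): Rendered =>
    c.type === "image"
      ? { kind: "image", bytes: Buffer.from(c.data, "base64"), mimeType: c.mimeType }
      : { kind: "text", text: c.text },
  );
}
```

Base64 is what lets the PNG travel inside a JSON-RPC message; it also inflates the payload by about a third, which is one reason a URL-returning variant is handy for large images.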

Setup in three steps

You need Node 18 or newer. That's it: there's no global install, because npx pulls the package on first run.

1. Grab an API key

Head to codetoimage.app/signup and create a free account. No credit card. The free Sandbox tier gives you 50 renders per month, watermarked, capped at 800×800 — plenty for testing the integration. When you outgrow it, Hobby is $7/mo for 3,000 renders, full size, no watermark.

After signup you'll land on the dashboard with a key that starts with cti_live_. Copy it.

2. Drop the config into your client

The JSON is the same shape everywhere. Only the file path changes.

Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json on macOS, or %APPDATA%\Claude\claude_desktop_config.json on Windows:

{
  "mcpServers": {
    "codetoimage": {
      "command": "npx",
      "args": ["-y", "@codetoimage/mcp-server"],
      "env": { "CODETOIMAGE_API_KEY": "cti_live_..." }
    }
  }
}

Claude Code — easiest path is the CLI:

claude mcp add codetoimage \
  -e CODETOIMAGE_API_KEY=cti_live_... \
  -- npx -y @codetoimage/mcp-server

Or, for a project-scoped setup your teammates share, put the same JSON shape in a .mcp.json file at your project root.

Cursor — drop a .cursor/mcp.json in your project root (or ~/.cursor/mcp.json for global), same JSON:

{
  "mcpServers": {
    "codetoimage": {
      "command": "npx",
      "args": ["-y", "@codetoimage/mcp-server"],
      "env": { "CODETOIMAGE_API_KEY": "cti_live_..." }
    }
  }
}

Windsurf and Cline both use the same shape — check their docs for the exact config path.
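If your config file already lists other servers, pasting the block over the top would drop them. Here is a minimal sketch, assuming you'd rather script the edit, that merges the codetoimage entry into an existing config object and leaves every other entry alone. The `mergeServer` helper is hypothetical, not part of any client.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [other: string]: unknown; // clients may store unrelated settings here
}

// Hypothetical helper: add or replace one server under "mcpServers"
// without mutating the input or touching other keys.
function mergeServer(config: McpConfig, name: string, entry: McpServerEntry): McpConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

// An existing config with an unrelated, hypothetical server already in it.
const existing: McpConfig = JSON.parse(
  '{"mcpServers":{"github":{"command":"npx","args":["-y","some-other-server"]}}}',
);

const updated = mergeServer(existing, "codetoimage", {
  command: "npx",
  args: ["-y", "@codetoimage/mcp-server"],
  env: { CODETOIMAGE_API_KEY: "cti_live_..." },
});

console.log(JSON.stringify(updated, null, 2));
```

Read the file, parse it, merge, and write it back with `JSON.stringify(updated, null, 2)`; the same function works for every client because the shape is shared.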

3. Restart and verify

Quit and relaunch your client (Claude Desktop especially won't pick up MCP changes until you fully restart). Then ask the model:

What tools do you have available?

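You should see render_html_to_image and render_html_to_url in the list. If they're missing, run CODETOIMAGE_API_KEY=cti_live_... npx -y @codetoimage/mcp-server in a terminal (macOS/Linux shell syntax) and check that it starts without errors. A stdio server then sits waiting for input, so press Ctrl+C to exit. The usual culprits are an old Node version or a typo in the API key.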
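The two usual culprits are easy to check from a script. Here is a hypothetical preflight sketch. The `cti_live_` prefix comes from the dashboard, and checking only the prefix is a heuristic, not real key validation.

```typescript
// Parse the major version from strings like "v20.11.1".
function nodeMajor(version: string): number {
  const major = version.replace(/^v/, "").split(".")[0] ?? "";
  return Number.parseInt(major, 10);
}

// Hypothetical preflight: returns a list of problems (empty means OK).
function preflight(version: string, apiKey: string | undefined): string[] {
  const problems: string[] = [];
  if (!(nodeMajor(version) >= 18)) problems.push(`Node ${version} is too old; need 18+`);
  if (!apiKey) problems.push("CODETOIMAGE_API_KEY is not set");
  else if (apiKey !== apiKey.trim()) problems.push("API key has leading/trailing whitespace");
  else if (!apiKey.startsWith("cti_live_")) problems.push("API key does not start with cti_live_");
  return problems;
}

const issues = preflight(process.version, process.env.CODETOIMAGE_API_KEY);
console.log(issues.length > 0 ? issues.join("\n") : "looks OK");
```

Stray whitespace is worth its own check: a key copied with a trailing space or newline looks right in the config file but won't match on the server.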

That's the whole setup. The model now has the tool. You don't have to tell it to use it; it will usually reach for it whenever the prompt smells like an image task.

Trying it out — three prompts

Prompt 1 — inline render:

Render a card that says "Welcome, Jakub" in 1200×630 with a purple gradient background and big bold Inter font. Show me the result.

The model writes a small HTML+CSS snippet, calls render_html_to_image, and the PNG appears inline in the chat. You can ask it to tweak the gradient or font size and it'll re-render. The rendering step is exact, with no creative reinterpretation: the same HTML always produces the same image. The guarantee attaches to the HTML, not the prompt; re-running the prompt may lead the model to write slightly different markup.
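For a sense of what the model hands to render_html_to_image, here is a hypothetical builder for the Prompt 1 card. It illustrates the kind of HTML a model might write, not what Claude will literally produce. The purple hex values are placeholders for "brand purple", and whether Inter is installed in the renderer is an assumption (you may need a web-font link). Escaping matters once names come from user data.

```typescript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical card builder. It is a pure function, so identical inputs
// always yield identical HTML, which is what makes the image repeatable.
function welcomeCard(name: string, width = 1200, height = 630): string {
  return `<!doctype html>
<html><head><style>
  body { margin: 0; }
  .card {
    width: ${width}px; height: ${height}px;
    display: flex; align-items: center; justify-content: center;
    background: linear-gradient(135deg, #6d28d9, #a855f7);
    color: #fff; font: 800 88px "Inter", sans-serif;
  }
</style></head>
<body><div class="card">Welcome, ${escapeHtml(name)}</div></body></html>`;
}

console.log(welcomeCard("Jakub"));
```

Without escaping, a name like `O'Brien & <Co>` would corrupt the markup; with it, the certificate shows exactly the characters you passed in.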

Prompt 2 — hosted URL:

Generate an OG image for this blog post title: "Give Claude an image generation tool in 60 seconds". Use our brand purple. Return a hosted URL I can paste into meta tags.

This time the model calls render_html_to_url. You get back a JSON object with a URL on cdn.codetoimage.app that stays live for 24 hours: long enough to paste into a draft or share with a teammate, short enough that you're not accidentally building a permanent CDN out of free-tier renders. For meta tags on a page that will outlive that window, download the image and host it yourself.
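Once you have the URL, the meta tags themselves are standard Open Graph and Twitter card markup. Here is a minimal sketch that builds them for a 1200×630 image. Keep the 24-hour TTL in mind: a URL with an expiry is fine for a preview or draft, not for a page that crawlers will revisit later.

```typescript
// Escape a value for use inside a double-quoted HTML attribute.
function attr(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Open Graph + Twitter card tags for a rendered image URL.
// og:image:width/height are optional hints; 1200x630 is the common OG size.
function ogImageTags(url: string, alt: string, width = 1200, height = 630): string {
  return [
    `<meta property="og:image" content="${attr(url)}">`,
    `<meta property="og:image:width" content="${width}">`,
    `<meta property="og:image:height" content="${height}">`,
    `<meta property="og:image:alt" content="${attr(alt)}">`,
    `<meta name="twitter:card" content="summary_large_image">`,
  ].join("\n");
}
```

Escaping `&` matters more than it looks: signed or query-string URLs often contain it.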

Prompt 3 — agent-state to image:

Take the revenue numbers we discussed earlier and render a card I can post in Slack. Three rows, big numbers, brand purple accent, 1200 wide.

This is where it gets interesting. The model already has the data in its context; it composes the HTML around it and calls the tool. You go from "agent that talks about your data" to "agent that drops a shareable image of your data" without writing a single line of integration code.
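Under the hood, "composes the HTML around it" amounts to something like the function below: structured data in, markup out. It's a hypothetical sketch; the row shape, styling and purple hex value are assumptions, and the figures in the example are illustrative, not real data. Pinning the number format's locale keeps the output independent of the machine it runs on.

```typescript
// Hypothetical shape of figures the agent already holds in its context.
interface RevenueRow {
  label: string;
  amount: number; // whole US dollars
}

// Escape text before it goes into markup.
function esc(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Pin locale and currency so the output doesn't depend on the host's locale.
const usd = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
  minimumFractionDigits: 0,
  maximumFractionDigits: 0,
});

function revenueCard(rows: RevenueRow[], width = 1200): string {
  const body = rows
    .map(
      (r) =>
        `<tr><td>${esc(r.label)}</td>` +
        `<td style="text-align:right;font-weight:800">${usd.format(r.amount)}</td></tr>`,
    )
    .join("");
  return (
    `<div style="width:${width}px;padding:48px;box-sizing:border-box;` +
    `border-left:16px solid #7c3aed;font:40px Inter,sans-serif">` +
    `<table style="width:100%">${body}</table></div>`
  );
}

// Illustrative input only, not real figures.
const card = revenueCard([
  { label: "Product A", amount: 120000 },
  { label: "Product B", amount: 45500 },
  { label: "Services", amount: 9800 },
]);
```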

When this is the right tool, and when it isn't

It's the right tool for: deterministic layouts, branded social cards, OG images, certificates with real names, receipt headers, invoice previews, data-viz from agent state, code screenshots, dashboard snapshots, anything where "the same input produces the same output" matters.

It's the wrong tool for: photorealistic content, creative interpretation, illustration, anything where you want the model to make aesthetic choices. For that, keep using DALL-E, Midjourney, Imagen, or whatever generative image model you already like. The two categories don't compete — they cover different jobs.

The cost difference is worth flagging. A render on our Hobby tier works out to roughly $0.0023 ($7 for 3,000 renders). A single standard-quality 1024×1024 DALL-E 3 image is $0.04, and HD costs more. That makes the deterministic render roughly 17× cheaper per image when you don't need a model to invent the pixels, and many "image" tasks in an agent workflow don't. Use the cheap deterministic tool for the boring layouts; save the creative budget for the cases that actually need it.
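The per-render figure is simple division, shown below as a worked check. It assumes you use the full monthly quota (unused renders raise the effective price) and takes OpenAI's $0.04 list price for a standard-quality 1024×1024 DALL-E 3 image as the comparison point; check current pricing before relying on it.

```typescript
// Effective cost per render on a flat monthly plan, assuming the full quota is used.
function costPerRender(monthlyUsd: number, rendersPerMonth: number): number {
  return monthlyUsd / rendersPerMonth;
}

const hobby = costPerRender(7, 3000); // 7 / 3000 ≈ 0.00233 USD
const dalle3Standard = 0.04; // USD, standard quality, 1024x1024
const ratio = dalle3Standard / hobby; // 0.04 / 0.00233 ≈ 17.1

console.log(`$${hobby.toFixed(4)} per render, ${ratio.toFixed(1)}x cheaper`);
// prints "$0.0023 per render, 17.1x cheaper"
```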

Two follow-up paths

If you don't have an MCP-aware client but want the same power from a shell script or CI job, our npm CLI wraps the same API: npx @codetoimage/cli render --html '<h1>hi</h1>' --out card.png. It's nice for batch jobs and webhook handlers.
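For a batch job, a Node script can drive that same CLI call in a loop. This is a hypothetical sketch that uses only the `render --html ... --out ...` flags shown above; how the CLI picks up your API key isn't covered here, so check its help output. Passing arguments as an array (no shell) means HTML containing quotes or `$` needs no shell escaping.

```typescript
import { execFileSync } from "node:child_process";

// Build argv for the documented `render --html ... --out ...` call.
function renderArgs(html: string, outFile: string): string[] {
  return ["-y", "@codetoimage/cli", "render", "--html", html, "--out", outFile];
}

// Hypothetical batch: one card per name, written as card-1.png, card-2.png, ...
function renderBatch(names: string[]): void {
  names.forEach((name, i) => {
    const html = `<h1>Hello, ${name}</h1>`; // escape `name` if it comes from users
    execFileSync("npx", renderArgs(html, `card-${i + 1}.png`), { stdio: "inherit" });
  });
}

// Usage: renderBatch(["Ada", "Grace"]);
```

On Windows, `npx` is a `.cmd` shim, so you would call `npx.cmd` (or pass `shell: true` and handle quoting yourself).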

If you're building a custom agent loop and want raw HTTP, the REST endpoint is one POST to https://api.codetoimage.app/v1/render with your API key in the X-API-Key header. Returns binary PNG/JPEG/WebP or, with output: "url", a hosted URL. We have a dedicated guide on wiring it into AI agents.
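Here is a minimal sketch of that REST call using Node 18's built-in `fetch`. The endpoint, the `X-API-Key` header and the `output: "url"` option come from the description above; the `html` body field and the JSON `Content-Type` are assumptions, so check the API reference for the real request schema.

```typescript
import { writeFile } from "node:fs/promises";

const ENDPOINT = "https://api.codetoimage.app/v1/render";

// Build the request. `html` as the body field name is an assumption.
function buildRenderRequest(apiKey: string, html: string, asUrl = false) {
  const body: Record<string, string> = { html };
  if (asUrl) body.output = "url"; // ask for a hosted URL instead of bytes
  return {
    url: ENDPOINT,
    init: {
      method: "POST",
      headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Binary mode: the response body is the image itself.
async function renderToFile(apiKey: string, html: string, outFile: string): Promise<void> {
  const { url, init } = buildRenderRequest(apiKey, html);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`render failed: ${res.status} ${await res.text()}`);
  await writeFile(outFile, Buffer.from(await res.arrayBuffer()));
}

// Usage: await renderToFile(process.env.CODETOIMAGE_API_KEY ?? "", "<h1>hi</h1>", "card.png");
```

Keeping request construction separate from the network call makes it easy to log or unit-test exactly what your agent loop sends.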

Source for the MCP server is on GitHub — PRs welcome. More posts at /blog; follow @beznazwiska on GitHub for release notes.