How to apply syntax highlighting to content from a headless CMS

When you build a blog or documentation site with a headless CMS (like Marble, Contentful, or Sanity), the data typically arrives at your frontend as raw HTML or Markdown.

If you write technical content, you inevitably need to display code blocks. However, out of the box, your CMS is just going to hand you something like this:

<pre><code class="language-typescript">
const message = "Hello world";
console.log(message);
</code></pre>

If you render this directly to your page, you get monochrome text with no visual structure, which is hard to scan. To get the colored, editor-like presentation readers expect, you need syntax highlighting.

But how do you apply syntax highlighting to HTML strings coming from a remote database without ruining your site's performance? Let's break it down.

The Options: Prism vs. Highlight.js vs. Shiki

There are a few major players in the web syntax highlighting ecosystem. Let's look at why you might choose one over the other.

1. Prism.js

Prism is an extremely popular, lightweight, and extensible syntax highlighter. It relies on CSS and JavaScript included on the frontend. When the page loads, Prism scans the DOM for code blocks and wraps each token in a <span> with a class name; the colors themselves come from a CSS theme. It's great if you want a simple drop-in solution and easy theme customization right in your CSS.

2. Highlight.js

Similar to Prism, Highlight.js works primarily on the client side (though it can run in Node). It is known for its incredible language support and "auto-detection" feature, where it can guess what language a code block is written in if the author forgot to specify it.

3. Shiki

Shiki works differently. Rather than maintaining its own grammar definitions, it uses the same TextMate grammars that power syntax highlighting in VS Code, so its output closely matches what you see in the editor. It can generate the HTML with inline styles (or classes) ahead of time on the server. Because the highlighting is already in the markup, no highlighting JavaScript has to be sent to the client and there is no flash of unhighlighted code, which makes it a good fit for modern server-rendered frameworks like Next.js, Astro, or Remix.

In this tutorial, we will be using Shiki.

The Strategy: Server-Side Replacement

Our goal is to intercept the raw HTML string coming from the headless CMS on the server, find the code blocks, run them through Shiki, and pass the final, fully-styled HTML to our frontend components.

Most rich-text editors and headless CMS platforms (including Marble) serialize code blocks by attaching a language-* class to the <code> tag.
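As a small illustration of that convention, the sketch below pulls the language name out of a class attribute value. The helper name `extractLanguage` is hypothetical and not part of any CMS or Shiki API; it only assumes the `language-*` prefix described above.

```typescript
// Hypothetical helper: returns the language named by a `language-*` class,
// or null when the class list carries no such token.
function extractLanguage(classAttr: string): string | null {
  const prefix = "language-";
  for (const token of classAttr.split(/\s+/)) {
    if (token.startsWith(prefix) && token.length > prefix.length) {
      return token.slice(prefix.length).toLowerCase();
    }
  }
  return null;
}

console.log(extractLanguage("language-typescript")); // "typescript"
console.log(extractLanguage("hljs language-tsx")); // "tsx"
console.log(extractLanguage("wrap")); // null
```

Returning null rather than a default keeps the fallback decision (for example, plain text) with the caller.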

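Below is a TypeScript function that finds these blocks with a regular expression, decodes their contents, and highlights them using Shiki. The regex approach works well when the CMS emits predictable markup; for arbitrary HTML, a real HTML parser is the safer choice.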

Setting up the Highlighter

First, install Shiki:

npm install shiki

Next, let's create our highlighter utility. We want to initialize Shiki as a singleton so we don't reload the massive TextMate grammars on every single request.

import { createHighlighter } from "shiki";

// Ensure we only spin up the highlighter engine once.
// We cache the promise (not the resolved instance) so that concurrent
// requests arriving during startup share a single initialization.
// https://shiki.style/guide/install#highlighter-usage
let highlighterPromise: ReturnType<typeof createHighlighter> | null = null;

function getHighlighter() {
  if (!highlighterPromise) {
    highlighterPromise = createHighlighter({
      themes: ["github-dark", "github-light"],
      langs: [
        "javascript", "typescript", "json", "html", "css",
        "bash", "jsx", "tsx", "markdown", "python", "go", "rust"
        // Add whatever languages your CMS authors write in!
      ],
    });
  }
  return highlighterPromise;
}
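The same run-once idea can be sketched independently of Shiki. The example below uses a hypothetical `memoizeAsync` helper and a stand-in factory (neither comes from Shiki) to show why caching the promise, rather than the resolved value, lets callers that arrive during startup share one initialization.

```typescript
// Hypothetical helper: wraps an async factory so it runs at most once.
// The promise itself is cached, so callers that arrive while the first
// call is still pending receive the same in-flight promise.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached === null) {
      cached = factory().catch((error) => {
        cached = null; // allow a retry after a failed initialization
        throw error;
      });
    }
    return cached;
  };
}

// Stand-in for an expensive setup step such as loading grammars.
let factoryCalls = 0;
const getEngine = memoizeAsync(async () => {
  factoryCalls += 1;
  return { ready: true };
});

// Two "concurrent" requests share one initialization.
const first = getEngine();
const second = getEngine();
console.log(first === second, factoryCalls); // true 1
```

Clearing the cache on failure is a design choice in this sketch: without it, one failed startup would be cached for the lifetime of the process.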

The HTML Transformer Function

Now, we write a function that takes our CMS HTML string, finds the code blocks, and replaces them.

export async function highlightContent(
  htmlContent: string,
  theme: "light" | "dark" = "dark"
): Promise<string> {
  const highlighter = await getHighlighter();

  // Regex to match: <pre...><code class="language-jsx">...</code></pre>
  const codeBlockRegex =
    /<pre[^>]*>\s*<code(?:\s+[^>]*?class="[^"]*?language-([^"\s]+)[^"]*?")?[^>]*>([\s\S]*?)<\/code>\s*<\/pre>/g;

  return htmlContent.replace(codeBlockRegex, (match, language, code) => {
    try {
      // 1. CMS platforms escape characters like < and > to prevent XSS.
      // We must decode them back to raw text before passing to Shiki.
      // Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<".
      const decodedCode = code
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&amp;/g, "&");

      // 2. Identify the language. Fallback to raw text if none is provided.
      const lang = language || "text";

      // 3. Ensure the language is actually loaded in our Shiki instance
      const supportedLanguages = highlighter.getLoadedLanguages();
      const finalLang = supportedLanguages.includes(lang) ? lang : "text";

      // 4. Generate the beautiful HTML
      return highlighter.codeToHtml(decodedCode, {
        lang: finalLang,
        theme: theme === "dark" ? "github-dark" : "github-light",
      });

    } catch (error) {
      console.warn("Failed to highlight code block:", error);
      // If something goes wrong, gracefully degrade by returning the original unstyled HTML
      return match;
    }
  });
}
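The entity-decoding step inside the transformer can also be written as a single pass, which sidesteps any ordering concerns between `&amp;` and the other entities. The following is a minimal sketch, assuming the CMS emits only the five common named entities plus numeric references; `decodeEntities` and `NAMED_ENTITIES` are hypothetical names, and a dedicated library may be a better fit if you need full HTML5 entity coverage.

```typescript
// Named entities this sketch understands (an assumption about CMS output).
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

// Hypothetical helper: decodes each entity exactly once, left to right,
// so "&amp;lt;" becomes the literal text "&lt;" rather than "<".
function decodeEntities(input: string): string {
  return input.replace(
    /&(?:#[xX]([0-9a-fA-F]+)|#(\d+)|([a-zA-Z]+));/g,
    (entity: string, hex?: string, dec?: string, name?: string): string => {
      if (name !== undefined) {
        return NAMED_ENTITIES.get(name) ?? entity; // unknown names pass through
      }
      const codePoint =
        hex !== undefined ? parseInt(hex, 16) : parseInt(dec ?? "", 10);
      // Guard against values String.fromCodePoint would reject.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : entity;
    }
  );
}

console.log(decodeEntities("if (a &lt; b &amp;&amp; c &gt; d) {}")); // if (a < b && c > d) {}
console.log(decodeEntities("&amp;lt;div&amp;gt;")); // &lt;div&gt;
console.log(decodeEntities("say &quot;hi&quot; &#39;ok&#39; &#x41;")); // say "hi" 'ok' A
```

Because `String.prototype.replace` does not rescan text it has already replaced, an article that discusses HTML entities themselves (and therefore contains `&amp;lt;` in its source) still renders the intended literal text.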

Rendering in your Framework

Now, wherever you fetch data from your CMS (e.g., inside a Next.js React Server Component), you pass the HTML through your transformer before rendering it.

Because this happens on the server, the client just receives pre-colored <span> tags.

For example, our initially plain code block from earlier is handed to the browser looking like this (line breaks and indentation added here for readability):

<pre class="shiki github-dark" style="background-color:#24292e;color:#e1e4e8" tabindex="0"><code>
  <span class="line"><span style="color:#F97583">const</span><span style="color:#79B8FF"> message</span><span style="color:#F97583"> =</span><span style="color:#9ECBFF"> "Hello world"</span><span style="color:#E1E4E8">;</span></span>
  <span class="line"><span style="color:#E1E4E8">console.</span><span style="color:#B392F0">log</span><span style="color:#E1E4E8">(message);</span></span>
</code></pre>

All you have to do is render it!

// Example in a Next.js Server Component (app/blog/[slug]/page.tsx)
import { highlightContent } from "@/utils/highlight";
import { getPostFromCMS } from "@/lib/cms";

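// Note: in Next.js 15+, `params` is a Promise and must be awaited first.
export default async function BlogPostPage({
  params,
}: {
  params: { slug: string };
}) {
  const post = await getPostFromCMS(params.slug);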
  
  // Transform the HTML on the server
  const highlightedHtml = await highlightContent(post.htmlContent, "dark");

  return (
    <article className="prose lg:prose-xl">
      <h1>{post.title}</h1>
      {/* Inject the pre-highlighted content. This assumes the CMS HTML is trusted or already sanitized. */}
      <div dangerouslySetInnerHTML={{ __html: highlightedHtml }} />
    </article>
  );
}

Wrapping Up

By highlighting on the server with Shiki, you can format code blocks from a headless CMS with very little code, and pages stay fast for your users. You get editor-quality highlighting without adding any highlighting code to your client-side JavaScript bundle.

P.S. If you are using Marble as your headless CMS, the API delivers data structured perfectly for this regex, making it incredibly simple to drop this exact snippet into your codebase!
