# Astro RSS Feeds with Full MDX Content
RSS is pretty cool. We can pick our favorite websites, magazines, blogs, really anything that offers a public feed, add them to our aggregator of choice, and get notified about new content. I use NetNewsWire, because it has a great UI and lets me sync my reading across Apple devices, but there are many other great options available.
This blog is built with Astro. I initially used the recommended @astrojs/rss library to generate an RSS feed, but now I want my feed to include the full post content, not just title and description. Astro has a guide for that, but there is one major problem with that approach.
> “Note: this will not process components or JSX expressions in MDX files.”
… I’m using MDX. There hasn’t been a clear solution for this problem until recently, when Astro 4.9 was released. In this article, I show you exactly how I use the new Astro Container API to render the full article content when using MDX.
## The Astro Container API
This new API, at this point still marked as experimental, lets us render a single Astro Component to a string. That’s perfect for generating RSS feeds, because we only need the HTML of the post content, not the whole document structure. Let’s look at the code.
```typescript
// getPostsWithContent.ts
import type { APIContext } from 'astro';
import { experimental_AstroContainer as AstroContainer } from 'astro/container';
import { loadRenderers } from 'astro:container';
import { getContainerRenderer as getMDXRenderer } from '@astrojs/mdx';
import { render } from 'astro:content';
import { rehype } from 'rehype';
// Project helpers; the paths of the next two imports are assumptions.
import getSortedBlogPosts from '@utils/getSortedBlogPosts';
import sanitizeHTML from './sanitizeHTML';
import getSiteURL from './getSiteURL';

export default async function getPostsWithContent(context: APIContext) {
  const siteURL = getSiteURL(context);

  // The container needs the MDX renderer to render MDX components.
  const container = await AstroContainer.create({
    renderers: await loadRenderers([getMDXRenderer()]),
  });

  const posts = await getSortedBlogPosts();

  return Promise.all(
    posts.map(async post => {
      // Turn the collection entry into an Astro component, then into HTML.
      const { Content } = await render(post);
      const rawContent = await container.renderToString(Content);

      // Parse the HTML as a fragment, sanitize it, and serialize it again.
      const file = await rehype()
        .data('settings', {
          fragment: true,
        })
        .use(sanitizeHTML, { siteURL })
        .process(rawContent);

      return {
        post,
        content: String(file),
      };
    })
  );
}
```
This function is responsible for fetching the post metadata and rendering the post content to a sanitized string. The `siteURL` comes from Astro’s `APIContext`, which is available in all API functions. The function will be used inside of a `GET` request handler, but I’ll get to that later.
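The `getSiteURL` helper is not shown in this post. As a rough idea of what such a helper could look like, here is a minimal sketch. It assumes the helper prefers the `site` value from the Astro config and falls back to the origin of the current request. The `SiteContext` interface is a hypothetical stand-in that only types the two `APIContext` fields used here, so the snippet runs without Astro.

```typescript
// Hypothetical helper; the original getSiteURL.ts is not shown in this post.
// Only the two APIContext fields used here are typed.
interface SiteContext {
  site?: URL; // the `site` option from the Astro config, if it is set
  url: URL; // URL of the current request
}

function getSiteURL(context: SiteContext): string {
  // Prefer the configured site; fall back to the request origin (e.g. in dev).
  return (context.site ?? new URL(context.url.origin)).href;
}

const siteURL = getSiteURL({
  site: new URL('https://prass.tech'),
  url: new URL('http://localhost:4321/feed/rss'),
});
console.log(siteURL); // https://prass.tech/
```

Returning a string rather than a `URL` object keeps it easy to pass the value around as the base for `new URL(path, siteURL)`.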
I’m using MDX for my blog posts, so the renderer I need to load for the Astro container is the MDX renderer (there are more renderers available, and you can also write your own). Once the `container` is created, I load the posts from the content collection and `render` each of them to an Astro component.
At this point I could add the `<Content />` component to a `.astro` page, but since this is an API function, I pass it to `container.renderToString()` instead, which renders the Astro component as HTML to a string.
I could stop here and put this string into the RSS post content, but there are some issues with the HTML output that I have to fix first.
## Sanitizing the Output
Astro, being built for websites, renders the post for a web page: links to pages and images of the website use relative paths. Unfortunately, that won’t work in RSS readers. To fix this, I need to prefix each path with the site URL. Doing this properly requires a three-step process:
1. parse the HTML into AST format
2. modify some of the nodes
3. render the modified AST back into a string
This sounds like a lot. Parsing and rendering HTML is far from trivial. Lucky for us, there are great tools available. I chose unified, or rather rehype, because it’s well-documented and widely used. In fact, it’s used by Astro internally for rendering Markdown.
I added a single plugin to the processing chain, `sanitizeHTML`. `rehype` internally places it between `rehypeParse`, which turns the HTML string into an abstract syntax tree (AST), and `rehypeStringify`, which turns the AST back into serialized HTML.
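To make steps 2 and 3 more tangible, here is a dependency-free sketch that works on a tiny hand-built tree. The node types, `absolutizeLinks` and `serialize` are illustrative stand-ins that I made up for this example. In the real pipeline, rehype does the parsing and serializing, and handles escaping and void elements, which this sketch ignores.

```typescript
// Minimal hast-like node types, declared locally so the sketch has no dependencies.
type TextNode = { type: 'text'; value: string };
type ElementNode = {
  type: 'element';
  tagName: string;
  properties: Record<string, string>;
  children: TreeNode[];
};
type TreeNode = TextNode | ElementNode;

// Step 2: walk the tree and rewrite relative link targets.
function absolutizeLinks(node: TreeNode, siteURL: string): void {
  if (node.type !== 'element') return;
  if (node.tagName === 'a' && node.properties.href !== undefined) {
    node.properties.href = new URL(node.properties.href, siteURL).href;
  }
  for (const child of node.children) absolutizeLinks(child, siteURL);
}

// Step 3: turn the tree back into an HTML string.
// Simplified: no escaping and no void elements.
function serialize(node: TreeNode): string {
  if (node.type === 'text') return node.value;
  const attrs = Object.entries(node.properties)
    .map(([name, value]) => ` ${name}="${value}"`)
    .join('');
  const inner = node.children.map(serialize).join('');
  return `<${node.tagName}${attrs}>${inner}</${node.tagName}>`;
}

// Step 1 is skipped: this is the tree a parser would produce for
// <p>Read <a href="/blog/view-transitions">this</a></p>
const tree: TreeNode = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    { type: 'text', value: 'Read ' },
    {
      type: 'element',
      tagName: 'a',
      properties: { href: '/blog/view-transitions' },
      children: [{ type: 'text', value: 'this' }],
    },
  ],
};

absolutizeLinks(tree, 'https://prass.tech');
console.log(serialize(tree));
// <p>Read <a href="https://prass.tech/blog/view-transitions">this</a></p>
```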
Let me show you the sanitizing plugin.
```typescript
// sanitizeHTML.ts
import type { Element, Root } from 'hast';
import type { Plugin } from 'unified';
import { visitParents } from 'unist-util-visit-parents';

interface SanitizeHTMLOptions {
  siteURL: string;
}

const sanitizeHTML: Plugin<[SanitizeHTMLOptions], Root> = ({ siteURL }) => {
  return tree => {
    visitParents(tree, (node, parents) => {
      if (node.type !== 'element') {
        return;
      }

      // Remove all style tags
      if (node.tagName === 'style') {
        return removeElementNode(node, parents);
      }

      // Remove all script tags
      if (node.tagName === 'script') {
        return removeElementNode(node, parents);
      }

      // Remove all spans inside code tags
      if (
        node.tagName === 'span' &&
        parents.some(parent => parent.type === 'element' && parent.tagName === 'code')
      ) {
        return removeElementNode(node, parents, true);
      }

      // Fix relative link URLs
      if (node.tagName === 'a' && typeof node.properties.href === 'string') {
        node.properties.href = new URL(node.properties.href, siteURL).href;
      }

      if (node.tagName === 'a' && 'target' in node.properties) {
        delete node.properties.target;
      }

      // Fix relative image URLs
      if (node.tagName === 'img' && typeof node.properties.src === 'string') {
        node.properties.src = new URL(node.properties.src, siteURL).href;
      }

      // Drop all style attributes
      if ('style' in node.properties) {
        delete node.properties.style;
      }

      // Drop all class attributes
      if ('className' in node.properties) {
        delete node.properties.className;
      }

      // Remove Astro's data-astro-cid-... attributes
      for (const key of Object.keys(node.properties)) {
        if (key.startsWith('dataAstroCid')) {
          // eslint-disable-next-line @typescript-eslint/no-dynamic-delete
          delete node.properties[key];
        }
      }
    });
  };
};

export default sanitizeHTML;
```
I’m using `unist-util-visit-parents` because it gives me access to the parent nodes of each visited node. `removeElementNode` is a simple helper function that removes a node from its parent and, when the third argument is `true`, leaves the node’s children in its place. The plugin performs the following sanitization steps.
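The `removeElementNode` helper itself is not shown above, so here is a minimal sketch of how it could look. It assumes the third argument decides whether the children are kept, which is how the plugin uses it for `<span>` tags. The types are local stand-ins for the `hast` types, and the returned tuple mirrors the `[SKIP, index]` action of `unist-util-visit-parents`, where `SKIP` is the string `'skip'`. Treat both as my reading of the plugin, not as the original code.

```typescript
// Local stand-ins for the hast types, so the sketch runs without dependencies.
type Child = { type: 'text'; value: string } | ElementNode;
type ElementNode = {
  type: 'element';
  tagName: string;
  properties: Record<string, unknown>;
  children: Child[];
};
type Parent = { type: 'root' | 'element'; children: Child[] };

// Hypothetical reconstruction of the helper.
// `parents` is the ancestor list passed to the visitor, ordered from the
// root down to the direct parent.
function removeElementNode(
  node: ElementNode,
  parents: Parent[],
  keepChildren = false,
): ['skip', number] | undefined {
  const parent = parents[parents.length - 1];
  if (!parent) return undefined;

  const index = parent.children.indexOf(node);
  if (index === -1) return undefined;

  // Replace the node with its children, or with nothing at all.
  parent.children.splice(index, 1, ...(keepChildren ? node.children : []));

  // Skip the removed subtree and continue at the same index, which now
  // holds the first promoted child or the next sibling.
  return ['skip', index];
}

// Usage: unwrap a highlighting span inside a code element.
const span: ElementNode = {
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{ type: 'text', value: 'const' }],
};
const code: ElementNode = {
  type: 'element',
  tagName: 'code',
  properties: {},
  children: [span, { type: 'text', value: ' x' }],
};

removeElementNode(span, [code], true);
console.log(code.children.length); // 2
```

Continuing at the same index matters for nested spans: once the outer span is unwrapped, its inner spans sit directly in the parent and still get visited.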
1. Remove all `<style>` tags. RSS readers take care of text styling and an RSS feed shouldn’t come with styles attached.
2. Remove all `<script>` tags, for similar reasons. RSS feeds should provide only text content and semantics, and most RSS readers will ignore inline scripts anyway.
3. Remove all `<span>` tags inside `<code>` tags. Astro’s syntax highlighting plugin adds a lot of `<span>` tags for styling purposes, but without styles they only add bloat.
4. Fix relative link URLs. For example, a link to `/blog/view-transitions` will be converted to `https://prass.tech/blog/view-transitions`.
5. Fix relative image URLs for similar reasons.
6. Drop inline style attributes.
7. Drop all class attributes.
8. Remove `data-astro-cid-...` attributes. Those are used for styling in Astro.
This will produce minimal clutter-free HTML output.
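Steps 4 and 5 rely on the two-argument form of the standard `URL` constructor, which resolves its first argument against a base. A short sketch of how that behaves for different kinds of links (the `example.com` URL and the `#sanitizing` fragment are made up for illustration):

```typescript
const siteURL = 'https://prass.tech';

// Root-relative paths are resolved against the site URL.
const internal = new URL('/blog/view-transitions', siteURL).href;

// Absolute URLs ignore the base and pass through unchanged.
const external = new URL('https://example.com/page', siteURL).href;

// Fragment-only links are resolved against the base as well, so they end up
// pointing at the site root rather than at the post they came from.
const fragment = new URL('#sanitizing', siteURL).href;

console.log(internal); // https://prass.tech/blog/view-transitions
console.log(external); // https://example.com/page
console.log(fragment); // https://prass.tech/#sanitizing
```

This is why the plugin can run every `href` and `src` through `new URL()` without checking first whether it is relative. In-page anchor links are the one case that may need extra care.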
## Creating RSS, Atom and JSON feeds
@astrojs/rss has no built-in support for Atom feeds. That’s why I decided to use the popular feed library. It can handle RSS 2.0, Atom 1.0 and also JSON Feed 1.0. Perfect!
In Astro, to create an XML file, we can use a `GET` handler. I added the following files to the `/pages/feed` folder: `atom.ts`, `json.ts` and `rss.ts`. The `generateFeed` function contains the shared logic to create a `new Feed()`.
```typescript
import { SITE } from '@config';
import getPostsWithContent from '@utils/feed/getPostsWithContent';
import type { APIContext } from 'astro';
import { Feed, type Author } from 'feed';
import getSiteURL from './getSiteURL';

export async function generateFeed(context: APIContext): Promise<Feed> {
  const siteURL = getSiteURL(context);

  const author: Author = {
    name: SITE.author,
    email: SITE.email,
    link: SITE.website,
  };

  // Feed-level metadata shared by all three output formats
  const feed = new Feed({
    id: SITE.website,
    link: siteURL,
    language: SITE.language,
    title: SITE.title,
    description: SITE.desc,
    favicon: new URL('/favicon.ico', siteURL).toString(),
    copyright: SITE.license,
    author,
    feedLinks: {
      json: new URL('/feed/json', siteURL).toString(),
      atom: new URL('/feed/atom', siteURL).toString(),
      rss: new URL('/feed/rss', siteURL).toString(),
    },
  });

  const postsWithContent = await getPostsWithContent(context);

  // One feed item per post, with the sanitized HTML as content
  for (const { post, content } of postsWithContent) {
    const link = new URL(`/blog/${post.id}/`, siteURL).toString();

    feed.addItem({
      id: link,
      link,
      title: post.data.title,
      description: post.data.description,
      published: post.data.pubDate,
      content,
      date: post.data.updatedDate || post.data.pubDate,
      category: post.data.tags.map(tag => ({
        name: tag,
        term: tag.toLowerCase(),
        domain: new URL(`/tags/${tag.toLowerCase()}/`, siteURL).toString(),
      })),
    });
  }

  return feed;
}
```
Each of these files exports a `GET` function that returns the feed output in a `Response`. This is the one for RSS.
```typescript
import { generateFeed } from '@utils/feed/generateFeed';
import type { APIContext } from 'astro';

export async function GET(context: APIContext) {
  const feed = await generateFeed(context);

  return new Response(feed.rss2(), {
    headers: {
      'Content-Type': 'application/xml',
    },
  });
}
```
_NOTE: The `Content-Type` header for `feed.json1()` is `application/json` and for `feed.atom1()` it’s `application/atom+xml`._
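Since the three handlers differ only in the serializer method and the `Content-Type`, that pairing could also live in one place. The following is a sketch of that idea, not what this blog actually does. `FeedLike`, `FeedFormat`, `serializers` and `serializeFeed` are hypothetical names. `FeedLike` is a structural stand-in for the `Feed` class, so the snippet runs without the `feed` package.

```typescript
// Structural stand-in covering the three serializer methods used in this post.
interface FeedLike {
  rss2(): string;
  atom1(): string;
  json1(): string;
}

type FeedFormat = 'rss' | 'atom' | 'json';

// One entry per format: the Content-Type header and the matching serializer.
const serializers: Record<
  FeedFormat,
  { contentType: string; render: (feed: FeedLike) => string }
> = {
  rss: { contentType: 'application/xml', render: feed => feed.rss2() },
  atom: { contentType: 'application/atom+xml', render: feed => feed.atom1() },
  json: { contentType: 'application/json', render: feed => feed.json1() },
};

function serializeFeed(feed: FeedLike, format: FeedFormat) {
  const { contentType, render } = serializers[format];
  return { body: render(feed), contentType };
}

// Usage with a fake feed; a real handler would pass the result of generateFeed().
const fakeFeed: FeedLike = {
  rss2: () => '<rss></rss>',
  atom1: () => '<feed></feed>',
  json1: () => '{}',
};

const atom = serializeFeed(fakeFeed, 'atom');
console.log(atom.contentType); // application/atom+xml
```

Inside a `GET` handler, the result would then go into `new Response(body, { headers: { 'Content-Type': contentType } })`.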
## Discoverability
One more small change is needed to make the feeds discoverable from every page of my website. I’m using a shared layout component, where I added the following lines inside of the `<head>` tag.
```astro
<link rel="alternate" type="application/rss+xml" title={`${SITE.shortTitle} RSS Feed`} href="/feed/rss" />
<link rel="alternate" type="application/json" title={`${SITE.shortTitle} JSON Feed`} href="/feed/json" />
<link rel="alternate" type="application/atom+xml" title={`${SITE.shortTitle} Atom Feed`} href="/feed/atom" />
```
And that’s it. Everyone can now read the full article content inside their favorite RSS/Atom reader.
Curious to see the live result?
* RSS: https://prass.tech/feed/rss
* Atom: https://prass.tech/feed/atom
* JSON: https://prass.tech/feed/json
I hope you enjoyed this little excursion into the world of RSS and HTML parsing.