Ghost and Markdown: Getting everything consistent on the backend

First of all, this was kind of a shit-show, certainly much more so than it needed to be. My goal in all of this was to get some consistency in and between my older posts (which have been imported → exported → imported god knows how many times and through how many platforms) and newer posts created directly within Ghost.

I went into this wanting to have everything present as Markdown inside the Ghost editor. It took many, many hours and a lot of headache before I was finally able to make that happen, mainly because of Ghost’s implementation of Mobiledoc and their Mobiledoc editor, which is something I just wasn’t expecting.

Of course all this yak-shaving has the masochist geek in me kind of itching to play around with a new platform (Hugo maybe?), but for now I'm sticking with Ghost. In any event, this process has at least proved to me that getting my data out of Ghost and into Markdown shouldn’t be too difficult if and when the time comes.

My (thought) process

The following outlines (a small percentage of) the steps I went through and how I ended up with Markdown in the end…well, Mobiledoc Markdown cards. Hopefully this will save some folks some time if they have plans to do something similar (and me too should I have occasion to do anything like this again).

Getting the posts out of Ghost

This was the easiest part as there’s only one option: use Ghost’s export dingus, which spits out a single JSON file, with post content in the Mobiledoc format (something I didn’t realize until much further into this whole ordeal). Most of my 3000+ posts were stored as HTML, and the export includes that HTML too, along with “plaintext” versions. 🧐
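To see which format each post is actually stored in, a quick tally of the content fields helps. This is a minimal sketch, assuming the export is shaped like `{"db": [{"meta": {...}, "data": {"posts": [...]}}]}` with `mobiledoc`, `html` and `plaintext` keys on each post (check this against your own file); the `summarize_export` name is hypothetical.

```ruby
require "json"

# Hypothetical helper: tallies which content fields are populated across the
# posts in a Ghost export. Assumes the {"db" => [{"data" => {"posts" => [...]}}]}
# layout; adjust the fetch chain if your export is shaped differently.
def summarize_export(json_text)
  posts = JSON.parse(json_text).fetch("db").first.fetch("data").fetch("posts")
  counts = Hash.new(0)
  posts.each do |post|
    %w[mobiledoc html plaintext].each do |field|
      # nil and "" both count as "not present"
      counts[field] += 1 unless post[field].to_s.empty?
    end
  end
  counts["total"] = posts.size
  counts
end

# Usage (path is a placeholder):
#   p summarize_export(File.read("path/to/ghost-export.json"))
```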

Converting the posts to Markdown

After a bit of searching I came across Ghost-to-MD, and to my great surprise, it worked flawlessly out of the box, which never happens with these sorts of things. I didn’t have to massage the JSON or fart around with the code at all. (I ended up making a few slight modifications to template.md and index.js for particular things I wanted included in the output, but otherwise just left it alone.)

At the time I thought it would be an issue that this utility was built to turn a Ghost installation into individual Markdown files for a static-site generator (e.g., Jekyll, Hugo, Gatsby, etc.). Given all the work I was going to have to do to get everything how I wanted, it ultimately turned out to be a good thing.

Fortunately for me, I still had a working Jekyll installation on one of my machines (I used Jekyll to power this site for many years). I spent a little time going through a random selection of the generated Markdown files to ensure things were converted properly, and nearly everything was. The one issue I noticed was with block quotes that contained newlines. (A similar issue cropped up again later, as you’ll see.) It was easy enough to fix, but still annoying; I considered modifying the code to use a converter other than h2m, but decided I had bigger fish to fry.
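For anyone who hasn’t worked with Jekyll, each post ends up as a file named like `YYYY-MM-DD-slug.md` with YAML front matter above the Markdown body. The Ruby below is only a rough sketch of that shape, not Ghost-to-MD’s actual code (it’s a Node tool, hence the index.js); the `jekyll_post` helper and the particular front matter keys are my assumptions.

```ruby
require "time"
require "yaml"

# Hypothetical helper: renders one post as a Jekyll-style Markdown file.
# Returns [filename, contents]; the front matter keys are illustrative only.
def jekyll_post(title:, slug:, published_at:, markdown:)
  time = Time.parse(published_at).utc
  front_matter = {
    "layout" => "post",
    "title" => title,
    "slug" => slug,
    "date" => time.iso8601
  }
  # YAML.dump starts with "---\n"; close the block with a second "---".
  contents = "#{YAML.dump(front_matter)}---\n\n#{markdown.strip}\n"
  ["#{time.strftime('%Y-%m-%d')}-#{slug}.md", contents]
end
```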

Converting the individual Markdown files to JSON

Ghost's importer requires a JSON file in the format that WordPress exports, but I didn't want to go through the hassle of moving the Markdown files to WordPress and then out again. Frankly, I didn't want anything touching WordPress because I wanted to keep the content as "clean" as possible, and there's no way WordPress wouldn't add all kinds of useless, messy shit to the export file. (Furthermore, if I wanted to go that route, I would have started by looking for Ghost-to-WordPress utilities, of which I'm sure there are more than a few.)

Because I didn't want to deal with WordPress and because I already had a working Jekyll installation, I went hunting for a Jekyll plugin that would do the conversion, and came across Jekyll-to-Ghost Exporter. (If it doesn't seem like the plugin is running, ensure that you've configured your _config.yml correctly; in my previous installation, I called out a single plugin in my config file, which was causing Jekyll to ignore the _plugins directory when building.)

There were a number of minor issues to resolve, but for the most part the conversion went smoothly. I did notice, however, that the JSON contained both Markdown and HTML versions of every single post, which made me nervous that it would either break the Ghost importer entirely or cause inconsistencies in how the data was saved in Ghost’s backend. Because of this, I modified the Jekyll-to-Ghost code so that only Markdown content would end up in the JSON import file.
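If you’d rather not touch the plugin for this, a post-processing pass over the generated file should get the same result. A minimal sketch, assuming the posts live under either `db[0].data.posts` (Ghost’s own export) or a bare `data.posts`; the `strip_html_fields` name is hypothetical.

```ruby
require "json"

# Hypothetical post-processing pass: removes the "html" copy from every post so
# only the Markdown content is handed to the importer.
# Handles both {"db" => [{"data" => ...}]} and bare {"data" => ...} layouts (assumed).
def strip_html_fields(json_text)
  export = JSON.parse(json_text)
  data = export.key?("db") ? export["db"].first["data"] : export["data"]
  data.fetch("posts").each { |post| post.delete("html") }
  JSON.pretty_generate(export)
end
```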

Importing the JSON file into Ghost

Once I got everything to a place where I thought it was working fine, I created a JSON import file that contained just 5 posts (by putting only 5 Markdown posts in my Jekyll _posts directory). I changed the dates of the 5 posts to nearly the same date/time (to easily locate them in the Ghost interface), and modified their slugs so as not to disrupt my current installation or trigger new RSS entries.
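Tweaking the test posts by hand is fine for 5 files, but the same edit can be scripted. This is a sketch under a few assumptions: each file has `---`-delimited YAML front matter with a `slug` key, and the `mark_as_test_post` helper and its `import-test-` prefix are hypothetical.

```ruby
require "date"
require "yaml"

# Hypothetical helper for building a small test import: pins the post's date
# (so test posts sit together in Ghost's post list) and prefixes its slug
# (so it can't collide with a live URL). Expects "---" delimited front matter.
def mark_as_test_post(markdown, date:, slug_prefix: "import-test-")
  match = markdown.match(/\A---\s*\n(.*?)\n---\s*\n/m)
  raise ArgumentError, "no YAML front matter found" unless match

  # Jekyll dates load as Date/Time objects, so those classes must be permitted.
  front_matter = YAML.safe_load(match[1], permitted_classes: [Date, Time]) || {}
  slug = front_matter.fetch("slug") { raise ArgumentError, "no slug in front matter" }
  front_matter["date"] = date
  front_matter["slug"] = "#{slug_prefix}#{slug}"
  # YAML.dump emits the opening "---"; add the closing one before the body.
  "#{YAML.dump(front_matter)}---\n#{match.post_match}"
end
```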

The import into Ghost failed for reasons that of course were entirely opaque. The error message said the import was unsupported and that I should install Ghost v1.0 (Ghost is currently on v3+, so I thought that was kind of weird), import there, and then update to the latest version of Ghost. 🙄

This wasn't an option for me because I don't self-host the Ghost software; I use Ghost(Pro), which is managed fully by them.

Given that it seemed I had no choice, I got the Ghost CLI up and running locally on my machine (which turned out to be a real blessing given all the work that was about to come), and the import failed there too, except this time part of the error message said "Unrecognized export version: 000". On a whim, I searched the JSON for that "000", changed it to the version number I found in the "About" section of Ghost(Pro), and the import worked!
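Editing the version by parsing the JSON, rather than with a text search, only touches the `meta` block. A minimal sketch, assuming the version sits at `meta.version` (or `db[0].meta.version` in Ghost’s own export layout); `set_export_version` is a hypothetical name, and the version string should be whatever your Ghost install reports.

```ruby
require "json"

# Hypothetical helper: replaces the placeholder export version ("000") with the
# version your Ghost install reports (e.g. "3.40.2").
def set_export_version(json_text, version)
  export = JSON.parse(json_text)
  meta = export.key?("db") ? export["db"].first["meta"] : export["meta"]
  meta["version"] = version
  JSON.generate(export)
end
```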

But, there was a big problem: everything imported correctly (date, title, slug, etc.), except the content. There was literally empty space where the content should have been.

So, I then went back and started analyzing the original export that Ghost created to see what might be throwing it off. This is when I noticed for the first time all the Mobiledoc stuff, and after some research realized that Ghost’s import won’t accept anything other than Mobiledoc; HTML, Markdown, etc. are ignored.

Ghost has a migrate tool that’ll convert HTML in the JSON to Mobiledoc, but unfortunately there’s no affordance for doing the same with Markdown. It was at this point I started thinking that it just might not be possible to get what I wanted.

I used the tool to do the conversion (which went off without a hitch), and the generated JSON file imported perfectly into Ghost (with some caveats listed below).

At this point, all of my Ghost posts were consistent, but not in the way I wanted, namely Markdown. For a day I left it alone, but it kept nagging at me that everything wasn’t in Markdown; even though I could still very easily export everything out and convert it to individual Markdown files should I need to at any point, I wanted every post to present as Markdown in the editor.

Markdown and Mobiledoc

I did more research into Mobiledoc and the JSON file that Ghost exports in an effort to see if I could somehow get everything into Markdown cards within Mobiledoc.

I realized I probably could wrap all of the necessary Mobiledoc stuff around the Markdown content for each post, but I wasn’t quite sure yet how I could actually make that happen. My first thought was to come up with some crazy regex find/replace that I could run against the JSON file, but I really didn’t want to go down that path because I knew it’d be pretty difficult, if not impossible.

As a baseline test, I worked from a single Markdown post that contained just one word (plus all the necessary YAML front matter). With this single file I used Jekyll-to-Ghost to create the JSON, and then used Ghost’s migrate tool to get it into the accepted Mobiledoc format:

migrate json html /path/to/file.json

I also wiped my local Ghost install, created a single post with this same word, and then exported it via Ghost. When I eyed the Mobiledoc elements of both JSON files, everything looked identical, but the JSON I had generated with Jekyll-to-Ghost and the migrate tool simply would not carry the content (in this case just one word) through the Ghost import process (similar to what happened earlier, before I was aware of the Mobiledoc requirement). Even straight copying the Mobiledoc element from one file to the other didn’t fix it.

Because I couldn’t see any difference by eyeing the text, I ended up diff’ing the Mobiledoc elements against each other, and noticed that, following the Markdown content (again, in this case, just a single word), there was an “n” where a quote character should have been. Once I replaced that “n” (which I think the importer was reading as a newline, since it was preceded by an escape character) with a quote character, the import worked!

With this, I was now certain that I could just wrap the Markdown elements of each post with the Mobiledoc stuff I gleaned from the Ghost export.
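A quicker check than eyeballing diffs: the `mobiledoc` field is itself a JSON document stored as a string, so it has to parse a second time. A sketch of that check is below; an unescaped quote (or, I’d expect, a raw newline) inside the Markdown should make the second parse fail. The `invalid_mobiledoc_slugs` name and the file layout handling are assumptions.

```ruby
require "json"

# Hypothetical checker: parses each post's "mobiledoc" string as JSON and
# returns the slugs of posts whose inner Mobiledoc is missing or invalid.
# Handles both {"db" => [{"data" => ...}]} and bare {"data" => ...} layouts (assumed).
def invalid_mobiledoc_slugs(json_text)
  export = JSON.parse(json_text)
  data = export.key?("db") ? export["db"].first["data"] : export["data"]
  data.fetch("posts").each_with_object([]) do |post, bad|
    begin
      JSON.parse(post.fetch("mobiledoc"))
    rescue JSON::ParserError, KeyError, TypeError
      # ParserError: malformed inner JSON; KeyError: no field; TypeError: nil
      bad << post["slug"]
    end
  end
end
```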

Wrapping the Markdown with Mobiledoc

Because I didn’t think the regex find/replace route was feasible, I thought instead about modifying the Jekyll-to-Ghost code, hopefully bypassing Ghost's migrate tool (which requires HTML) entirely. Accordingly, I made the following changes to jekylltoghost.rb.

Commented out these lines:

"markdown" => post.content,
"html" => converter.convert(post.content),

Added this line (in the same spot as the two lines above):

"mobiledoc" => "{\"version\":\"0.3.1\",\"atoms\":[],\"cards\":[[\"markdown\",{\"markdown\":\"" + (post.content).strip + "\"}]],\"markups\":[],\"sections\":[[10,0],[1,\"p\",[]]]}",

I don't even want to tell you how long it took me to get that one line right. There were a number of formatting and escaping issues, including the fact that post.content (which is basically everything in your Markdown file except the YAML front matter) kept ending with a trailing newline character, and for the life of me I couldn't figure out why. In any event, I used Ruby's strip method to remove it.
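Most of those escaping headaches likely come from building JSON by string concatenation. Below is a sketch of the same wrapper built with Ruby’s standard json library, so quotes, backslashes and newlines in the Markdown get escaped automatically. The `markdown_mobiledoc` helper name is mine and the Mobiledoc structure is copied from the hand-written line above; treat it as untested against a real Ghost import and try it on a handful of posts first.

```ruby
require "json"

# Sketch: the Mobiledoc wrapper from the hand-written line, generated instead of
# concatenated. Structure: one Markdown card, a card section (10) pointing at
# card 0, then an empty paragraph section.
def markdown_mobiledoc(markdown)
  JSON.generate({
    "version" => "0.3.1",
    "atoms" => [],
    "cards" => [["markdown", { "markdown" => markdown.strip }]],
    "markups" => [],
    "sections" => [[10, 0], [1, "p", []]]
  })
end
```

In jekylltoghost.rb it would slot in as `"mobiledoc" => markdown_mobiledoc(post.content),`. Assuming the plugin then serializes the whole export with to_json, each newline gets escaped twice and should end up as \\n in the final file, which is the same result the manual find/replace produces.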

Importing the Mobiledoc-wrapped Markdown into Ghost

The import of this JSON file (which contained just that single post with a single word) went perfectly, and I figured I had solved the damn thing. So I ran Ghost-to-MD across my entire site, ran the modified Jekyll-to-Ghost code against those files, imported the result into my local Ghost installation and…again, everything came over except the content! 🤬

As before, I reran everything with just a few posts (and this time with “real” posts that had content beyond just a single word), and diff’d the Mobiledoc elements again. It was clear that, within the Markdown portions, it was again (unescaped) newline characters that were throwing off the import process.

After running Ghost-to-MD and the modified Jekyll-to-Ghost code again across my entire site, and doing a find/replace across the generated multi-megabyte JSON to change every \n to \\n (for which I had to use UltraEdit because I couldn’t get BBEdit to interpret the sequence literally), everything imported perfectly into my local installation! FINALLY. After sanity checking a ton of the posts, I imported into my live site without issue. Whew!
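The same find/replace can be done in Ruby without fighting an editor’s escaping rules. A minimal sketch (the `double_escape_newlines` name is hypothetical); like the editor version, it is a blunt text substitution that touches every field in the file, not only the Mobiledoc strings.

```ruby
# Replaces every literal backslash-n pair (two characters) in the generated
# JSON text with backslash-backslash-n. Single quotes keep the backslashes
# literal; the block form stops gsub from treating backslashes in the
# replacement as back-references.
def double_escape_newlines(json_text)
  json_text.gsub('\n') { '\\\\n' }
end

# Usage (paths are placeholders):
#   File.write("path/to/fixed.json", double_escape_newlines(File.read("path/to/generated.json")))
```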

I cannot tell you how excited I was to have figured this out and completed the transition. Now, when I jump into any post on my Ghost site, all content shows up as a single Markdown card. 🎉

In summary, follow these steps

  1. Use Ghost to export your entire site
  2. Run Ghost-to-MD on the exported JSON file
  3. Copy the output files into your Jekyll _posts directory
  4. Modify Jekyll-to-Ghost Exporter as described above and run jekyll build
  5. Open the generated JSON file (in your _site directory) and change "version":"000" to the version your Ghost installation reports ("version":"3.40.2" in my case), and then replace every instance of \n with \\n
  6. Import the modified JSON into Ghost

Damn, it seems so easy now. 🤪

Some caveats

  • When you export your site from Ghost, pages get exported as well, but they show up as regular posts on import (wtf?). I recommend copying their raw content into a text file somewhere (especially if you’re running any kind of JavaScript, etc. within them) and pasting it back into new pages after importing. Don’t forget to remove the posts that were created from them on import, or you won’t be able to reuse the slugs you had before. If you aren’t wiping your entire site before doing this stuff (which also wipes your pages), you don’t need to worry about this, but I can’t imagine why anyone wouldn’t wipe their site in this scenario.
  • Tags, images, and authors aren’t covered by the process outlined above; if your posts have multiple authors, you might want to try this modified version of Jekyll-to-Ghost to see if it helps
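For the pages caveat, something like the Ruby below could pull each page’s HTML out of the export into its own file before the wipe. It’s a sketch, not something Ghost provides: it assumes pages are marked with "type" => "page" (as I understand Ghost 3.x exports to do) or "page" => true (older exports), so check your own file first; `save_pages` is a hypothetical name.

```ruby
require "fileutils"
require "json"

# Hypothetical helper: writes the HTML of every page in a Ghost export to
# out_dir/<slug>.html and returns the paths written.
# Assumes pages carry "type" => "page" (or "page" => true in older exports).
def save_pages(json_text, out_dir)
  export = JSON.parse(json_text)
  data = export.key?("db") ? export["db"].first["data"] : export["data"]
  FileUtils.mkdir_p(out_dir)
  pages = data.fetch("posts").select { |p| p["type"] == "page" || p["page"] == true }
  pages.map do |page|
    path = File.join(out_dir, "#{page.fetch('slug')}.html")
    File.write(path, page["html"].to_s)
    path
  end
end
```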

Next up for the site?

Fixing search. Ghost’s native search is, well, awful. It’s embarrassingly bad. Thinking about trying out something like Algolia. If you’ve thoughts about Algolia or have other suggestions, please get in touch.