Migrate from Blogger to Ghost

My long-time readers may have noticed a huge number of changes to the look and feel of my blog recently. After so many years on Blogger I finally took the plunge and moved to a new platform, a new domain, a new comment system and a new look. Say "hi" to Ghost!

My old blog served me for a long time. The Blogger platform was relatively simple and free, and it always "just worked", mostly.

The old blog looked a bit dated.

There were a few issues with the old blog as well: the library I used for code formatting was a major version behind, the theme and layout engine were very hard to tame, I had let my tags become a mess over the years, and many of my page urls were far from search engine friendly.

While digging into my options I created a list of desirements:

  • I must be able to clean up my tags
  • I want to move to a new domain name
  • I want to be able to clean up my page urls
  • I want to replace the code formatting library
  • I want to consolidate a few pages into one
  • I want to keep my SEO ranking
  • I'm willing to pay a bit of money to get what I want
  • I need to keep the old RSS/ATOM feed urls for aggregators
  • I want to keep the old comments on my old posts
  • I need to take my blog images from Blogger to the new platform

I wanted my migration to support most of those things. Initially I tried Hugo and Jekyll, but the whole Python|Ruby|Node experience on Windows is still just a mess. The migration tools wouldn't work and relied on the old ATOM+RSS feed from Blogger, which has been hamstrung by Google over time. The migration tools were also all written in languages I have not fully mastered.

We're using WordPress at Xpirit to host our website and I always get lost in all the control panel options, fields and tabs.

Then I found Ghost Pro, the hosted version of Ghost. It's simple, looks great and offers a powerful import/export module which gives full control over the content you're bringing into the platform. There was even an existing Import/Export tool for Blogger. So I decided to give it a try.

Trying out Ghost

The first thing I did was set up a trial account and play around. I manually copied over a few blog posts and tried out a few things to see if I could get my Google+ comments to move over, add the site to Google Analytics and such. I didn't bump into any major issues. Then I exported my test site to analyze the structure of the export file and weigh my options.

Trial Migration

Next step was to try the existing Migration tool which was written in Node. A very, very old version of Node. Unfortunately I couldn't get the migration tool to play nice and while it did migrate over some of the content, it butchered most of it. So I took a brave step... I decided to build my own migration tool: Blogger2Ghost.NET. Written in C#, my most comfortable language. This step of course took a bit of time, but it also gave me full control over the migration process.

Getting the data out

The first thing I needed for the migration was a reliable method to get my data out. Step one was to get the raw page contents. Luckily Blogger has added an "Export Data" option which gives you a file with the full XML contents of the blog.

Use the Back up Content option to get your data out.

This provided a great place to start. The next step was to parse all the HTML content to grab the image urls, which, with the help of the HTML Agility Pack, was a breeze as well. I created a simple command line tool to host the migration.

blogger2ghost images -i bloggerbackup.xml -o .\migration

This parses the Blogger backup file, finds all the image urls and downloads them to a single images subfolder.

At this stage I found out that I had quite a few broken image links in old blog posts. The migration tool provides a list of broken images, but of course these will have to be fixed manually.

Later I found out that the <img> tag for a blogger hosted image links to a thumbnailed version of the image. A bit of extra code also grabs the larger image which can be found in the enclosing <a> tag.
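As an illustration of that last point, here is a minimal sketch in Python (not the C# and HTML Agility Pack code that Blogger2Ghost.NET actually uses). It assumes that an enclosing link only counts as the larger image when its href ends in a common image extension. The names find_image_urls and IMAGE_EXTENSIONS are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: a link is treated as "the larger image" only when it ends in one of these.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


class ImageUrlCollector(HTMLParser):
    """Collects image urls, preferring the enclosing <a> href over the <img> src."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._link_stack = []  # hrefs of the <a> elements we are currently inside

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self._link_stack.append(attributes.get("href"))
        elif tag == "img":
            src = attributes.get("src")
            href = self._link_stack[-1] if self._link_stack else None
            if href and href.lower().endswith(IMAGE_EXTENSIONS):
                self.urls.append(href)  # larger image behind the thumbnail
            elif src:
                self.urls.append(src)  # no usable link, keep the <img> source

    def handle_endtag(self, tag):
        if tag == "a" and self._link_stack:
            self._link_stack.pop()


def find_image_urls(html):
    """Returns the image urls found in a post body, in document order."""
    collector = ImageUrlCollector()
    collector.feed(html)
    return collector.urls
```

Feeding it the HTML body of each post from the backup file would yield the list of urls to download.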

Mapping things

Of course Blogger and Ghost don't use the exact same file format and I also wanted to clean up a few things. This meant generating a number of mapping files:

  • Image mappings. Based on the previously downloaded images it contains a list of all the original blogger image urls and the relative url to the Ghost blog root.
  • Url mappings. It contains a list of all the urls on blogger and their target slug on Ghost. This will later be used to change post urls, consolidate pages and a few other things.
  • User mappings. In order to map the Blogger author(s) to Ghost users.
  • Tag mappings. Blogger has fewer restrictions on tags, so in order to import the blog I had to rename a few. I also use this file to delete and consolidate some tags.

The skeleton for these mapping files can easily be generated based on the Blogger export, so I built a feature in the migration tool to do this:

blogger2ghost mapping -i bloggerbackup.xml -o .\migration --all

After running it you'll find a number of mapping files in the migration folder which will need to be edited by hand:

In the authors.json, add your to_email and slug. If your admin user is the blog author (which is the case for me), make sure the data matches the existing user data. Other users will be created on import.

[
  {
    "id": "1",
    "name": "Jesse Houwing",
    "from_google_plus_url": "https://plus.google.com/108560985897799710937",
    "to_email": "jesse.houwing@gmail.com",
    "slug": "jesse-houwing"
  }
]

In the tags.json you'll be able to set the target slug and consolidate the tags in blogger through the blogger_tag list. You'll need to duplicate that to aliases unfortunately due to some incomplete refactoring.

[
  {
    "blogger_tag": [
      "Agile",
      "agile olympics"
    ],
    "slug": "agile",
    "name": "Agile",
    "description": null,
    "order": 1,
    "aliases": [
      "Agile",
      "agile olympics"
    ],
    "child_tags": [
      {
        "blogger_tag": [
          "Teams"
        ],
        "slug": "teams",
        "name": "Agile teams",
        "description": null,
        "order": 2,
        "aliases": [
          "Teams"
        ],
        "child_tags": []
      }
    ]
  }
]  

To remove a tag completely, simply remove it from the tags.json. To influence the order in which tags are assigned, change the value of order. Higher numbers will be assigned to posts first on import.
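To make the consolidation and ordering rules a bit more concrete, here is a rough Python sketch of how a post's Blogger labels could be resolved against a tags.json structure like the one above. It is not the code from Blogger2Ghost.NET. The function names are invented for this example, and exact, case-sensitive label matching is an assumption.

```python
# Trimmed copy of the tags.json sample above.
TAGS = [
    {
        "blogger_tag": ["Agile", "agile olympics"],
        "slug": "agile",
        "order": 1,
        "child_tags": [
            {"blogger_tag": ["Teams"], "slug": "teams", "order": 2, "child_tags": []}
        ],
    }
]


def flatten(tags):
    """Yields every tag entry, including nested child_tags."""
    for tag in tags:
        yield tag
        yield from flatten(tag.get("child_tags", []))


def map_labels(labels, tags):
    """Maps Blogger labels to Ghost slugs, highest order first.

    Labels without an entry are dropped, which mirrors removing a tag
    from tags.json to delete it.
    """
    by_label = {}
    for tag in flatten(tags):
        for label in tag["blogger_tag"]:
            by_label[label] = tag  # assumption: exact, case-sensitive match

    matched = {}
    for label in labels:
        tag = by_label.get(label)
        if tag is not None:
            matched[tag["slug"]] = tag  # several labels may collapse into one slug

    ordered = sorted(matched.values(), key=lambda t: t["order"], reverse=True)
    return [tag["slug"] for tag in ordered]
```

With this sample data, both "Agile" and "agile olympics" collapse into the single agile slug, and teams comes first because it has the higher order value.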

Ghost supports parent/child tags on import, though I haven't found a reliable way to change this later.

The urls.json file is used to map the Blogger urls to their new location on Ghost. This file is also used to fix internal links between pages on your blog (Blogger by default always uses absolute urls for everything).

[
  {
    "from_url": "http://blog.jessehouwing.nl/2013/10/connecting-to-tfs-from-any-version-of.html",
    "to_url": "vsts-tfs-connect-any-visual-studio-version"
  }
]
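Fixing the internal links with such a mapping can be sketched roughly as follows. This is an illustrative Python version, not the tool's own code. The function name is hypothetical, and it assumes the new links should take the form /slug/.

```python
def rewrite_internal_links(html, url_mappings):
    """Replaces absolute Blogger urls in a post body with relative Ghost urls."""
    # Longest urls first, so a url that is a prefix of another cannot clobber it.
    ordered = sorted(url_mappings, key=lambda m: len(m["from_url"]), reverse=True)
    for mapping in ordered:
        # Assumption: posts live at /<slug>/ on the new blog.
        target = "/" + mapping["to_url"].strip("/") + "/"
        html = html.replace(mapping["from_url"], target)
    return html
```

A plain string replacement is enough here because Blogger writes the full absolute url into every internal link.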

The final mapping file is not generated as part of the preparation stage but is generated as part of the conversion. I'll discuss the redirects.json later.

Resizing images

If you want to make sure your images are similar in size or smaller than a certain dimension, this is the right time to do that. I did not build this as a feature of the migration tool, but it's pretty straightforward to use an image editing tool to resize your images on disk. There are even a number of free bulk-conversion tools available.

I used Adobe Photoshop's batch processing feature to reduce the size of my images. Just save the images in-place and they'll be the ones that are imported into Ghost.

Generating your Ghost import

When you've edited all of the mapping files, it's time for your first trial migration. Run the convert command to generate a ghost.json and then package that together with all of the images into a single zip file.

If you want to keep your Google+ comments, copy post.hbs to custom-blogger.hbs and add --template blogger to the command below.

This is explained in the follow-up: Show Google+ comments from Blogger in Ghost

blogger2ghost convert -i bloggerbackup.xml -o .\migration --zip --markdown

The convert command may find a few issues (such as misspelled tags, missing authors, etc) so you may need to go back to fix your mapping files. Once it generates a zip file successfully it's time to import it into Ghost to see what it looks like.
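Some of these issues can be caught before running the conversion. The Python sketch below shows the kind of checks I mean, based only on the rules described earlier: every author needs a to_email and a slug, and each tag's aliases must duplicate its blogger_tag list. The function names are made up for this example and the checks in the convert command itself may well differ.

```python
def check_authors(authors):
    """Reports authors whose to_email or slug is still empty."""
    problems = []
    for author in authors:
        for field in ("to_email", "slug"):
            if not author.get(field):
                problems.append(f"author {author.get('name')!r} is missing {field}")
    return problems


def check_tags(tags):
    """Reports tags whose aliases do not duplicate blogger_tag, including child tags."""
    problems = []
    for tag in tags:
        if set(tag.get("blogger_tag", [])) != set(tag.get("aliases", [])):
            problems.append(f"tag {tag.get('slug')!r}: blogger_tag and aliases differ")
        problems.extend(check_tags(tag.get("child_tags", [])))
    return problems
```

Loading authors.json and tags.json with json.load and passing the results to these functions gives a quick list of things to fix by hand.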

Import the ghost.zip using the Import Content option

It took me a few rounds and a couple of manual tweaks to get to a reasonable result. There are still a few things that get lost in the migration though. I decided to fix these after the migration.

Setting up your redirects

The convert command also spits out a redirects.json which can be loaded into Ghost to make sure all of your old urls are still working. It will include:

  • All of your manually mapped urls from urls.json
  • Redirects from your Atom and RSS feed urls to the correct Ghost endpoints
  • Redirects from the original tag search urls to your tags on Ghost

You can manually tweak the file to add more redirects. I personally added the following to redirect all the old /year/ and /year/month/ urls to the root as well as catch requests for index.html.

  {
    "from": "\\/\\d{4}\\/?(\\d{2}\\/?)?$",
    "to": "/",
    "permanent": true
  },
  {
    "from": "\\/index\\.html$",
    "to": "/",
    "permanent": true
  }

Note: The from regex is case sensitive. I've updated Blogger2Ghost.NET to generate redirects that are case insensitive (since Blogger is case insensitive).
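It's worth testing patterns like these before importing them. Here is a small Python sketch that loads the two rules above and checks a few paths against them. It uses Python's re module rather than Ghost's own matching, so treat it as an approximation. The resolve helper and its ignore_case switch are hypothetical and only there to illustrate the case sensitivity mentioned in the note.

```python
import json
import re

# The two extra rules from above, exactly as they appear in redirects.json.
REDIRECTS = json.loads(r'''
[
  {"from": "\\/\\d{4}\\/?(\\d{2}\\/?)?$", "to": "/", "permanent": true},
  {"from": "\\/index\\.html$", "to": "/", "permanent": true}
]
''')


def resolve(path, rules, ignore_case=False):
    """Returns the target of the first rule that matches path, or None."""
    flags = re.IGNORECASE if ignore_case else 0
    for rule in rules:
        if re.search(rule["from"], path, flags):
            return rule["to"]
    return None


for path in ("/2013/", "/2013/10/", "/index.html", "/Index.html"):
    print(path, "->", resolve(path, REDIRECTS))
```

A regular post url such as /2013/10/connecting-to-tfs-from-any-version-of.html matches neither rule, so the mapped redirects from urls.json still apply to it.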

Finally, import the redirects.json using the Redirects feature which can also be found under Labs in Ghost:

Import redirects.json using the Redirects feature

Spot check to complete

Finally, I clicked through the most visited pages of my blog (the Blogger Stats page comes in handy) and corrected a few issues.

The Blogger stats page will give you a good overview of important pages to check

Things that didn't migrate over well were:

In future blogs I'll walk through the other things I did: