One of the problems I faced when switching over to my new domain was how I was going to get my entries exported from Blogger into MoveableType. Luckily there are a number of resources out there. One thing I noticed, though, is that the tutorial I followed, while really good, didn’t offer any Mac-specific help for getting your images off Blogger (mentioning only the Windows-only* utility WinHTTrack).
This post discusses a Mac alternative to WinHTTrack, the freeware WebGrabber 0.7.
The main interface of WebGrabber looks like this:
Yes, it’s imposing with all those settings and options, but there were just three simple changes I needed to make in order to get the job done.
1. In the “Web Site to Download” field, fill in the address of a webpage to rip photos from. It must be an HTML page, so that WebGrabber knows which images to look for. Enter the address of one of your archive pages (the more entries, the better; more on this later). For the purposes of this tutorial, I’m going to paste an archive page from my brother Anand’s blog, since my Blogger site is currently difficult to navigate 🙂
2. Check the “Ignore Robots.txt” box. Robots.txt is a file used by search engines and other tools (like WebGrabber) when indexing the contents of the web. If the file exists, the search engine/program skips whatever files are listed inside. In this case, Blogger creates a file which will cause WebGrabber to not download anything, so we want this box checked. For the curious, you can read more about the Robot Exclusion Standard here.
You should now have something that looks like this:
3. Since you’re ripping pictures, you only need to download files with suffixes (extensions) that correspond to graphic files. You can restrict WebGrabber to just the common file types by switching to the “Filtering” tab:
…and entering their extensions into the “Allowed File Suffixes” field. I used “jpg png gif” (without the quotes); you can enter as many as you’d like, just make sure you separate each type with a space character. While you’re on the Filtering tab, you should also enter the domain of the server containing your image files. This isn’t strictly necessary, but it can speed things up a fair bit, especially if your blog has ads and other 3rd-party image content on it. To determine your domain, open one of your existing images in a new browser window and note the URL. If you’re using Blogger to host, it’ll probably look something like the following:
The part you want is everything after “http://” up to the first “/”, i.e., “photos1.blogger.com”; substitute as appropriate for your hosting service. Enter your domain into the “Allowed Domains” field. This will ensure WebGrabber only retrieves images from your image hosting service.
When you’re done, your tab should look something like this:
Click on the “Start” button and WebGrabber will retrieve all of the images linked to in the URL you specified. If you only spent a few months on Blogger (as was my case), you can simply copy the URL for each month of your archives and paste it into the “Web Site to Download” field on the main screen, clicking “Start” after each one and waiting for it to complete. If you’ve spent many months on Blogger and this would be cumbersome, you might consider changing your Blogger front page to hold all of your entries, as the tutorial explains in Steps 7 and 8 of the section titled “EXTRACTING YOUR ENTRIES FROM BLOGGER”, with one small exception. Instead of clicking “Preview” in Step 8, click on “Save Template Changes”, and now you will have a proper URL (your blog address) to give to WebGrabber.
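For the curious, here is a rough Python sketch of what the three settings above amount to. This is not WebGrabber's actual code, just my guess at the logic: find the images on an HTML page, keep only the ones with an allowed suffix on an allowed domain, and see what a robots.txt file would have said about them. The function names, the sample page, and the robots.txt contents are all made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit
from urllib.robotparser import RobotFileParser

# Same values as typed into the "Filtering" tab in this tutorial.
ALLOWED_SUFFIXES = {"jpg", "png", "gif"}
ALLOWED_DOMAINS = {"photos1.blogger.com"}


class ImageCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(urljoin(self.base_url, src))


def is_wanted(url):
    """True if the URL has an allowed suffix and lives on an allowed domain."""
    parts = urlsplit(url)
    suffix = PurePosixPath(parts.path).suffix.lstrip(".").lower()
    return suffix in ALLOWED_SUFFIXES and parts.hostname in ALLOWED_DOMAINS


def wanted_images(html, base_url):
    """Return the image URLs on the page that pass both filters."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return [url for url in collector.urls if is_wanted(url)]


def robots_allows(robots_txt, url, agent="*"):
    """Ask whether a robots.txt (given as text) permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical archive page: one real photo, one ad, one non-image file.
SAMPLE_PAGE = """
<img src="http://photos1.blogger.com/img/1/2/640/beach.jpg">
<img src="http://ads.example.com/banner.gif">
<img src="http://photos1.blogger.com/img/1/2/notes.txt">
"""

# Hypothetical robots.txt that shuts out every crawler.
BLOCK_ALL = "User-agent: *\nDisallow: /"

if __name__ == "__main__":
    for url in wanted_images(SAMPLE_PAGE, "http://example.blogspot.com/archive.html"):
        print(url, "allowed by robots.txt:", robots_allows(BLOCK_ALL, url))
```

A tool that honours a block-everything robots.txt like the sample one would fetch nothing, which is why the “Ignore Robots.txt” box matters here.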
What you’ll end up with, when all is said and done, is a folder on your Desktop labelled “http”. If you navigate into this folder, you should see a folder resembling the name of your domain, e.g. “photos1.blogger.com_80”, and all of your images will be located inside! 🙂
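If your images end up scattered across several subfolders, a short script can gather them into one place for uploading. The following is only a sketch: it assumes the downloads sit somewhere under the “http” folder, the function name `collect_images` is my own invention, and the destination folder should live outside the “http” folder so the copies are not picked up again.

```python
import shutil
from pathlib import Path

# Suffixes to gather; these match the ones used on the Filtering tab.
IMAGE_SUFFIXES = {".jpg", ".png", ".gif"}


def collect_images(download_root, destination):
    """Copy every image found under download_root into one flat folder.

    If two images share a file name, later ones get a numeric tag
    (pic.jpg, pic_1.jpg, ...) so nothing is overwritten.
    """
    download_root = Path(download_root)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(download_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            target = destination / path.name
            counter = 1
            while target.exists():
                target = destination / f"{path.stem}_{counter}{path.suffix}"
                counter += 1
            shutil.copy2(path, target)
            copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever your folders actually are.
    desktop = Path.home() / "Desktop"
    for image in collect_images(desktop / "http", desktop / "blog-images"):
        print(image)
```

Copying rather than moving leaves the original download untouched in case something goes wrong.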
Hopefully this helps out a few Mac-using MoveableTypoids out there.
* Yes, I realize the source code is available for HTTrack. No, it won’t compile.