How to convert a bunch of files from Simplified Chinese into Unicode

I converted 154 files from GBK-encoded Simplified Chinese to Unicode (UTF-8) today. Here’s how I ended up doing it on my OS X box:

find . -name "*.cfg" -exec sh -c 'iconv -f GBK -t UTF-8 "$1" > "../new/zh_CH/$1"' -- {} \;

lhunath’s answer to this question on Stack Overflow was instrumental in getting the syntax right.
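One thing the one-liner takes for granted is that the matching folders already exist under the output directory, since the shell redirect won’t create them. Here is a rough sketch of a variant that mirrors the directory tree as it goes. The function name convert_tree and its two arguments are made up for illustration, and I’m assuming every *.cfg file really is GBK:

```shell
#!/bin/sh
# convert_tree SRC DEST (hypothetical helper, not part of iconv or find):
# convert every *.cfg under SRC from GBK to UTF-8 and write the results
# under DEST, keeping the same relative paths.
convert_tree() {
    mkdir -p "$2" || return 1
    # Resolve DEST to an absolute path, because we cd into SRC below.
    dest=$(cd "$2" && pwd) || return 1
    (
        cd "$1" || exit 1
        # "sh" fills $0 of the inner shell; DEST arrives as $1 and the
        # file names found by find follow it.
        find . -type f -name '*.cfg' -exec sh -c '
            dest=$1
            shift
            for f in "$@"; do
                # Create the matching subdirectory before redirecting.
                mkdir -p "$dest/$(dirname "$f")"
                iconv -f GBK -t UTF-8 "$f" > "$dest/$f" ||
                    echo "conversion failed: $f" >&2
            done
        ' sh "$dest" {} +
    )
}
```

Called as convert_tree . ../new/zh_CH it should do the same job as the command above, with one inner shell handling a batch of files instead of one shell per file.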

The other half was figuring out which Chinese text encoding the source files used. EditPad Pro for Windows was extremely helpful in this regard, as it let me quickly preview how the text looked under many different encodings. In the command line above, “GBK” is the source text encoding (one of several text encoding standards for Simplified Chinese).
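Without an editor like that handy, a crude version of the same preview can probably be done with iconv itself. The sketch below is only an illustration: try_encodings is a made-up function name, and the list of candidate encodings is my guess at the usual suspects for Chinese text, so adjust it to taste.

```shell
#!/bin/sh
# try_encodings FILE (hypothetical helper): decode FILE under a few
# candidate encodings and print the first lines of each result so they
# can be compared by eye in a UTF-8 terminal.
try_encodings() {
    # Candidate list is an assumption; add or remove names as needed.
    for enc in GBK GB18030 BIG5 UTF-8; do
        printf '== %s ==\n' "$enc"
        # iconv exits non-zero when the bytes are not valid in $enc.
        if out=$(iconv -f "$enc" -t UTF-8 "$1" 2>/dev/null); then
            printf '%s\n' "$out" | head -n 5
        else
            printf '(does not decode as %s)\n' "$enc"
        fi
    done
}
```

A clean decode doesn’t prove the guess is right, since the same bytes can be valid in more than one encoding. You still have to look at the output and pick the one that reads as real Chinese.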

It costs money, but BinaryMark’s Batch Encoding Converter for Windows could also have done the conversion once I’d figured out that the source text encoding was GBK. In the end I used the “iconv” tool, which is built into OS X.

