
PHP: Working with Japanese Text April 19, 2008

Posted by bluejay002 in PHP.

Japanese text behaves differently from English text. English characters are each stored in a single byte, while each Japanese character takes two or more bytes, depending on the encoding. That is why we call them multibyte characters.
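A quick way to see the difference is to compare byte counts with character counts. This is just a small sketch, assuming PHP 7 or later with the mbstring extension enabled. The sample string is written with Unicode escapes so the result does not depend on how the file itself is saved.

```php
<?php
// "日本語" written with Unicode escapes, so the bytes are UTF-8
// regardless of the encoding the source file is saved in.
$text = "\u{65E5}\u{672C}\u{8A9E}";

// strlen() counts bytes; mb_strlen() counts characters.
$byteCount = strlen($text);
$charCount = mb_strlen($text, 'UTF-8');

// The same text converted to Shift_JIS takes two bytes per character.
$sjis = mb_convert_encoding($text, 'SJIS', 'UTF-8');
$sjisByteCount = strlen($sjis);

echo "UTF-8 bytes: $byteCount, characters: $charCount, Shift_JIS bytes: $sjisByteCount\n";
// prints "UTF-8 bytes: 9, characters: 3, Shift_JIS bytes: 6"
```

Any function that counts or cuts by bytes (strlen(), substr() and so on) can split a character in half, so the mb_* versions are usually the safer choice for Japanese text.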

I had a lot of trouble the first time I made a website that needed to handle Japanese characters, both input and output. To make things easier for you, here is a checklist for working with Japanese text:

Choose which character encoding you are going to use
There are several character encodings (charsets) that you can choose from. I suggest one of these two:
1. UTF-8
2. Shift_JIS
These are the most commonly used encodings for Japanese text.

If you plan to make websites that handle only Japanese text (with some English text mixed in), then Shift_JIS will do. It covers a large set of kanji (check other Japanese websites; many of them use Shift_JIS). Text in Shift_JIS can also be converted to UTF-8 with little trouble.

If you plan on making a website that will later cater to other multibyte scripts as well (e.g., Chinese), then UTF-8 is the better choice. UTF-8 covers the kanji too, but converting from it to a narrower encoding such as Shift_JIS can make some characters unreadable, since not every character has an equivalent in the target encoding.
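One way to find out whether a piece of text survives a change of encoding is to convert it there and back and compare the result with the original. This is a rough sketch, assuming PHP 7 or later with the mbstring extension. The helper name survivesRoundTrip() is made up for this post, and the Korean sample is only there as an example of text that Shift_JIS cannot represent.

```php
<?php
/**
 * Hypothetical helper: returns true when $utf8Text comes back unchanged
 * after being converted to $targetEncoding and back to UTF-8.
 */
function survivesRoundTrip(string $utf8Text, string $targetEncoding): bool
{
    $converted = mb_convert_encoding($utf8Text, $targetEncoding, 'UTF-8');
    $restored  = mb_convert_encoding($converted, 'UTF-8', $targetEncoding);

    return $restored === $utf8Text;
}

$japanese = "\u{65E5}\u{672C}\u{8A9E}"; // "日本語"
$korean   = "\u{D55C}\u{AE00}";         // "한글"; Hangul is not part of Shift_JIS

var_dump(survivesRoundTrip($japanese, 'SJIS')); // bool(true)
var_dump(survivesRoundTrip($korean, 'SJIS'));   // bool(false)
```

Characters that the target encoding cannot hold are replaced by a substitute character during conversion, which is why the second round trip does not match.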

Configure your IDE in accordance with the encoding you are going to use
I ran into problems when my pages declared Shift_JIS as the charset but my editor saved the files as UTF-8. The characters did not display correctly.

Specify the charset in your website
You can specify the charset of your website with a meta tag inside the head tag:

<meta http-equiv="Content-Type" content="text/html; charset=shift_jis" />
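Since the pages are generated by PHP, the same charset can also be sent as a real HTTP header, which browsers normally give priority over the meta tag. A minimal sketch follows; the helper name contentTypeHeader() is hypothetical, and Shift_JIS is used only to match the meta tag in this post.

```php
<?php
// Hypothetical helper that builds the header line for a given charset.
function contentTypeHeader(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only works before any output has been sent to the browser.
if (!headers_sent()) {
    header(contentTypeHeader('Shift_JIS'));
}
```

Whichever way you declare it, the declared charset has to match the bytes the page actually contains.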

Set the collation in your database
To store your data the way it should be, the database should use the same character set as your pages. Otherwise, you need to convert the encoding in PHP on every read and write, which adds a little conversion time, although it is not really that much.

If this is the case, use PHP's mb_convert_encoding() function. It is really handy when an existing database already holds critical data in a different encoding, though you cannot expect the conversion to work perfectly in every case.
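Here is roughly how that conversion could look. This is only a sketch under assumed conditions: the pages are served in Shift_JIS, the database stores UTF-8, and the mbstring extension is enabled. The constant and function names are made up for this example, and the database calls themselves are left out.

```php
<?php
// Assumed setup for this sketch: pages use Shift_JIS, the database uses UTF-8.
const PAGE_ENCODING = 'SJIS';
const DB_ENCODING   = 'UTF-8';

// Hypothetical helper: convert form input before saving it.
function toDatabaseEncoding(string $input): string
{
    // Reject byte sequences that are not valid in the page encoding.
    if (!mb_check_encoding($input, PAGE_ENCODING)) {
        throw new InvalidArgumentException('Input is not valid ' . PAGE_ENCODING);
    }

    return mb_convert_encoding($input, DB_ENCODING, PAGE_ENCODING);
}

// Hypothetical helper: convert stored data before printing it on a page.
function toPageEncoding(string $stored): string
{
    return mb_convert_encoding($stored, PAGE_ENCODING, DB_ENCODING);
}

// Simulate form input arriving as Shift_JIS bytes.
$formInput = mb_convert_encoding("\u{6771}\u{4EAC}", PAGE_ENCODING, 'UTF-8'); // "東京"
$stored    = toDatabaseEncoding($formInput); // UTF-8 bytes, ready to save
$output    = toPageEncoding($stored);        // Shift_JIS bytes, ready to print
```

Always pass the source encoding as the third argument. If you leave it out, PHP falls back to its internal encoding setting, which may not be the encoding your data is really in.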

That’s it! With this you should be ready to make websites in Japanese; the rest will rely on your skills.

If you find this helpful, or if you have comments, suggestions, corrections or additions, I would really appreciate it if you would drop a comment.

Happy programming! God bless!