ASP and Unicode UTF-8

Posted on Thursday, 01 May 2008

Tagged with utf-8, unicode, syntaxhighlighter, Spanish, Russian, Portugese, French, ASP

We were recently working on the internationalisation of a client site which needed to support French, Portuguese, Spanish and Russian translations of the existing English site. The European translations were pretty easy, although we came across some issues with the different character sets, and the Russian translation was near impossible because most characters appeared as ?'s.

After some further investigation and much googling, we discovered that we needed to use a Unicode encoding. On the web, the most common Unicode encoding is UTF-8. For a good introduction to Unicode, UTF-8 and other character sets, read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. Also check out the Unicode site and I18n Guy.

To serve UTF-8 encoded text correctly from ASP, we must set the content type and character set in the HTTP headers and switch the response code page to UTF-8:

ASP CODE:

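' Declare the MIME type of the response
Response.ContentType = "text/html"
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
' 65001 is the Windows code page identifier for UTF-8
Response.CodePage = 65001
' Append the character set name to the Content-Type header
Response.Charset = "UTF-8"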


and the following HTML META tag:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
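
Putting the two pieces together, a complete page might look like the sketch below. It is an illustration rather than the exact page we deployed: the @ CodePage directive, the sample Russian string and the page title are assumptions added here, and the file itself is assumed to be saved as UTF-8. The separate AddHeader call is left out in this sketch because Response.Charset already appends the character set to the Content-Type header.

ASP CODE:

<%@ Language="VBScript" CodePage="65001" %>
<%
' Assumption: this .asp file is saved as UTF-8, matching the CodePage directive above
Response.ContentType = "text/html"
Response.CodePage = 65001    ' encode the response output as UTF-8
Response.Charset = "UTF-8"   ' appends the charset to the Content-Type header
%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<title>UTF-8 test</title>
</head>
<body>
<!-- Hypothetical sample text; it should render as Cyrillic rather than ?'s -->
<p><%= "Привет, мир" %></p>
</body>
</html>

If the Russian text still shows up as ?'s, the first thing to check is whether the file was really saved as UTF-8 and not in the editor's default code page.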