We are using MySQL at the company I work for, and we build both client-facing an
ID: 639692 • Letter: W
Question
We are using MySQL at the company I work for, and we build both client-facing and internal applications using Ruby on Rails.
When I started working here, I ran into a problem what I had never encountered before; the database on the production server is set to Latin-1, meaning that the MySQL gem throws an exception whenever there is user input where the user copies & pastes UTF-8 characters.
My boss calls these "bad characters" since most of them are non-printable characters, and says that we need to strip them out. I've found a few ways to do this, but eventually we've ended up in a circumstance where a UTF-8 character was needed. Plus it's a bit of a hassle, especially since it seems like the only solution I ever read about for this issue is to just set the database to UTF-8 (makes sense to me).
The only argument that I've heard for sticking with Latin-1 is that allowing non-printable UTF-8 characters can mess up text/full-text searches in MySQL. Is this really true?
Are there other reasons one should use Latin-1 over UTF-8? It's my understanding that it is superior and becoming more ubiquitous.
Explanation / Answer
Unicode is certainly difficult, and the UTF-8 encoding has a couple of inconvenient properties. However, UTF-8 has become the de-facto standard encoding on the web, surpassing ASCII, Latin-1, UCS-2 and UTF-16. Just use UTF-8 everywhere.
The most important reason why you should support Unicode is that you shouldn't make unnecessary assumptions about user input. I have no idea what your domain is, but things like Hebrew usernames, a blog post about China, a comment with Emoji, or simply well styled text
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.