Well, I am not black-and-white on this topic either. Both have valid use cases: UTF-8 is generally more space-efficient wherever ASCII characters predominate, while UTF-16 is more efficient where characters outside Latin-1 predominate. UTF-8 is backwards compatible with ASCII, if that matters to you. UTF-16 makes it easier and faster to count characters, take slices of characters, and scan for characters (and, so far, no one has produced a valid argument to the contrary; all the UTF-8 solutions I have seen are hacky and/or harder to maintain). UTF-8 also works nicely with the native C char type; UTF-16 does not. In the end, all of these considerations need to be weighed, and your solution should accommodate whatever is best for your problem. In my case, all I want to say is that UTF-16 often turns out to be the better approach for text processing, while UTF-8 is useful for text storage and transmission.
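
To make the space trade-off concrete, here is a small Python sketch (the example strings are my own picks, not anything canonical): an ASCII-heavy string encodes smaller in UTF-8, while a CJK string encodes smaller in UTF-16, since BMP characters outside Latin-1 cost 3 bytes in UTF-8 but only 2 in UTF-16.

```python
# Compare encoded sizes for ASCII-heavy vs CJK text.
ascii_text = "Hello, world!"   # predominantly ASCII
cjk_text = "你好，世界"         # Chinese "Hello, world" (all BMP characters)

for label, s in [("ASCII", ascii_text), ("CJK", cjk_text)]:
    utf8_bytes = len(s.encode("utf-8"))
    # utf-16-le avoids counting the 2-byte BOM that plain "utf-16" prepends
    utf16_bytes = len(s.encode("utf-16-le"))
    print(f"{label}: {len(s)} chars, UTF-8 = {utf8_bytes} B, UTF-16 = {utf16_bytes} B")
# ASCII: 13 chars, UTF-8 = 13 B, UTF-16 = 26 B
# CJK: 5 chars, UTF-8 = 15 B, UTF-16 = 10 B
```

Note that UTF-16 is still a variable-width encoding (surrogate pairs for characters beyond U+FFFF), so the counting and slicing advantage only holds strictly for text confined to the BMP.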