Bugzilla Strings Now Have UTF-8 Bit Set

Max Kanat-Alexander mkanat at bugzilla.org
Fri Nov 23 07:13:50 UTC 2007

	So, everybody should know that the following bug was just fixed:


	For anybody who doesn't understand the world of Perl and
Unicode, it can be kind of a complex subject, but in short: this is
going to make our lives a lot easier.

	Here's a brief explanation: When you're using UTF-8, in many
written languages a single character is more than one byte. This means
that length($string), if it measures bytes, doesn't really return what
we'd think of as the "length" of the string. For example, Insídeṛ looks
like seven characters, but it's 10 bytes long.
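
	To make the byte/character difference concrete, here is a
minimal sketch in plain Perl (not Bugzilla code). The variable names are
only for illustration, and the bytes are written as escapes so that the
example doesn't depend on how the file itself is encoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The raw UTF-8 bytes of "Insídeṛ": "í" takes two bytes, "ṛ" takes three.
my $bytes = "Ins" . "\xC3\xAD" . "de" . "\xE1\xB9\x9B";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

print length($bytes), "\n";   # counts bytes
print length($chars), "\n";   # counts characters
```

	The first length() call reports 10 and the second reports 7,
which matches the numbers above.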

	Now, in Bugzilla, when the utf8 parameter is turned on, all
strings are treated as characters instead of bytes, so length($string)
will return the correct "length" of the string, not the number of bytes
in it. Strings going into the database and coming out of the database
are automatically considered to be UTF-8.
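
	As a rough illustration of what "going in" and "coming out"
means, here is a sketch of the general Perl pattern of encoding at the
storage boundary and decoding on the way back. The to_storage and
from_storage subs are hypothetical stand-ins, not Bugzilla functions,
and this is not necessarily how Bugzilla itself does it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers standing in for a storage boundary.
sub to_storage   { my ($chars) = @_; return encode('UTF-8', $chars); }
sub from_storage { my ($bytes) = @_; return decode('UTF-8', $bytes); }

# A character string: U+00ED is "í" and U+1E5B is "ṛ".
my $name   = "Ins\x{ED}de\x{1E5B}";
my $stored = to_storage($name);      # bytes, as they would be stored
my $loaded = from_storage($stored);  # characters again

printf "%d characters, %d bytes stored\n", length($loaded), length($stored);
```

	The code in between only ever sees character strings, so
length() and friends behave the way you'd expect.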

	7-bit ASCII strings (containing only values of 127 or below) are
still treated as bytes, because a byte and a character are the same
thing in ASCII. If for some reason you need to find out whether a string
has been marked as UTF-8 or is just ASCII, you can use
"utf8::is_utf8($string)", which checks Perl's internal UTF-8 flag. You
don't need to "use utf8" to do that--the function is always available as
part of the Perl core.
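
	Here is a small sketch of that check in plain Perl. It assumes
the strings were decoded with utf8::decode, which only turns the flag on
when the string actually contains multi-byte characters; the variable
names are just for illustration.

```perl
use strict;
use warnings;

my $ascii = "Inside";                                   # 7-bit ASCII only
my $text  = "Ins" . "\xC3\xAD" . "de" . "\xE1\xB9\x9B"; # raw UTF-8 bytes

# Decode both in place. Only the string with multi-byte characters
# ends up with the UTF-8 flag set.
utf8::decode($ascii);
utf8::decode($text);

print "ascii: ", (utf8::is_utf8($ascii) ? "flagged" : "not flagged"), "\n";
print "text:  ", (utf8::is_utf8($text)  ? "flagged" : "not flagged"), "\n";
```

	Note that utf8::is_utf8 only reports the internal flag. It
doesn't inspect the contents of the string, so a string that was never
decoded will not be flagged even if it holds valid UTF-8 bytes.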

	However, you should never have to worry about that anymore. All
Bugzilla code should now automatically do the "right thing" without you
ever having to think about whether or not you're dealing with Unicode,
thanks to the patch just checked in for the above bug.

	Since all of this only works with the "utf8" parameter on,
anybody who wants to use a non-ASCII language in Bugzilla should turn
it on. This has always been true (as long as we've had the utf8
parameter), but since Bugzilla now works even *better* with non-ASCII
languages, it's even *more* true.

	I just wanted to let you know about all this, and the details of
Unicode in Perl, in case for some reason somebody needs to know these
things in the future.

Competent, Friendly Bugzilla and Perl Services. Everything Else, too.
