ajf

Andrea Faulds

PHP RFC: Unicode Codepoint Escape Syntax

Introduction

Despite the wide and increasing adoption of Unicode (and UTF-8 in particular) in PHP applications, PHP does not yet have a Unicode codepoint escape syntax in string literals, unlike many other languages. This is unfortunate, as it is often useful to specify a Unicode codepoint by number rather than embedding the character itself. For example, say you wish to output the UTF-8 encoded Unicode codepoint U+202E RIGHT-TO-LEFT OVERRIDE in order to display text right-to-left. You could embed it in source code directly, but it is an invisible character, and an editor would then display the rest of the line of code (or indeed the entire program) in reverse!

The solution is to add a Unicode codepoint escape sequence syntax to string literals. This would mean you could produce U+202E like so:

echo "\u{202E}Reversed text"; // outputs ‮Reversed text
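To make "UTF-8 encoded codepoint" concrete, here is a minimal sketch that encodes a codepoint by hand and compares the result with the escape sequence. The helper `codepoint_to_utf8()` is hypothetical and is not part of this RFC; it assumes PHP 7.0 or later and does no validation of surrogates or out-of-range values.

```php
<?php
// Hypothetical helper, for illustration only: encode one Unicode
// codepoint as UTF-8 bytes. No validation of surrogates (U+D800..U+DFFF)
// or of values above U+10FFFF is performed.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

$rlo = "\u{202E}";

var_dump($rlo === codepoint_to_utf8(0x202E)); // bool(true)
var_dump($rlo === "\xE2\x80\xAE");            // bool(true)
echo bin2hex($rlo), "\n";                     // e280ae
```

The escape sequence is therefore shorthand for writing the UTF-8 bytes out by hand with `\x` escapes, without having to work out the encoding yourself.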

Another use is to distinguish between Unicode characters that look similar or identical but are encoded differently, if you need to output one or the other specifically. The following two lines of code actually have slightly different output, but you couldn't tell by looking at them:

echo "mañana";
echo "mañana";

However, by using an escape sequence to produce the ñ, it becomes clearer:

echo "ma\u{00F1}ana"; // pre-composed character
echo "man\u{0303}ana"; // "n" with combining ~ character (U+0303)
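The difference between the two forms shows up once the strings are inspected as bytes. The following is a small sketch using only `strlen()` and `bin2hex()`; the variable names are illustrative, and PHP 7.0 or later is assumed.

```php
<?php
$precomposed = "ma\u{00F1}ana";  // U+00F1 LATIN SMALL LETTER N WITH TILDE
$decomposed  = "man\u{0303}ana"; // "n" followed by U+0303 COMBINING TILDE

// The strings render alike but are not byte-for-byte equal.
var_dump($precomposed === $decomposed); // bool(false)

// strlen() counts bytes, not characters.
echo strlen($precomposed), "\n"; // 7
echo strlen($decomposed), "\n";  // 8

echo bin2hex($precomposed), "\n"; // 6d61c3b1616e61
echo bin2hex($decomposed), "\n";  // 6d616ecc83616e61
```

Comparing or hashing such strings without normalising them first would treat them as different values, which is why being explicit in source code about which form is intended can matter.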

A further use is to produce characters you can't type on your keyboard. If you are unable to type the emoji for FACE WITH TEARS OF JOY, you can use its escape sequence instead:

echo "\u{1F602}"; // outputs 😂

Versions

Version Changed Date
5 Added Errata
0.1.3 \u without a following opening { passes through verbatim
0.1.2 Ruby support
0.1.1 Added Future Scope note on named literals
0.1 Initial version
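The 0.1.3 entry above can be illustrated with a short sketch, assuming the behaviour as shipped in PHP 7.0: a `\u` that is not followed by an opening `{` is not treated as an escape sequence and is left in the string unchanged.

```php
<?php
$escaped  = "\u{202E}"; // codepoint escape: the three UTF-8 bytes of U+202E
$verbatim = "\u202E";   // no opening brace: backslash, "u", "2", "0", "2", "E"

var_dump(strlen($escaped));       // int(3)
var_dump(strlen($verbatim));      // int(6)
var_dump($verbatim === '\u202E'); // bool(true), same as the single-quoted literal
```

This keeps existing double-quoted strings that happen to contain `\u`, such as JSON fragments written as `"\u00e9"`, behaving as they did before.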

Votes

An option needs 2/3 votes to win

Accept the Unicode Codepoint Escape Syntax RFC and merge into master? (92% approved)
User Vote
aharvey Yes
ajf Yes
davey Yes
dragoonis Yes
fa Yes
guilhermeblanco Yes
gwynne Yes
hywan Yes
indeyets Yes
jedibc Yes
jwage Yes
kalle Yes
klaussilveira Yes
kriscraig Yes
laruence No
mbeccati Yes
mfischer No
mike Yes
nikic Yes
pierrick Yes
pollita Yes
reeze Yes
stas Yes
yohgaki Yes
yunosh Yes