pollita

Sara Golemon

PHP RFC: IntlChar class

Introduction

ICU exposes a great deal of i18n/l10n functionality beyond what is currently exposed by PHP. This RFC seeks to expose just a little bit more…

Proposal

Expose additional ICU functionality from uchar.h as IntlChar::*() following the ICU API as much as possible.

See hphp/runtime/ext/icu/ext_icu_uchar.php in https://reviews.facebook.net/D30573 for a full breakdown of the functions (complete with docblocks). Constants can be found in either PR's *-enum.h files.

Proposed PHP Version(s)

PHP 7 (or 5.next if there is one)

New Constants

Enumerations of UProperty, UCharNameChoice, UPropertyNameChoice, UCharDirection, UBlockCode, etc… For example:

class IntlChar {
  const PROPERTY_ALPHABETIC = _UCHAR_ALPHABETIC_;
  const PROPERTY_ASCII_HEX_DIGIT = _UCHAR_ASCII_HEX_DIGIT_;
  /* etc... */
}

New Static Methods

Mapping of ICU API to PHP. For example:

class IntlChar {
  static public function hasBinaryProperty(int $codepoint, int $property): bool;
  static public function isAlphabetic(int $codepoint): bool;
  /* etc... */
}
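
As a rough illustration of how the constants and methods fit together, the sketch below passes two of the property constants to hasBinaryProperty(). It assumes an ext/intl build that provides IntlChar; the variable names are only for illustration.

```php
<?php
// Sketch only: assumes ext/intl is loaded and provides the IntlChar class.
$a = 0x41; // U+0041 LATIN CAPITAL LETTER A
$g = 0x47; // U+0047 LATIN CAPITAL LETTER G

// Each call asks ICU whether the codepoint carries the given binary property.
$aIsAlphabetic = IntlChar::hasBinaryProperty($a, IntlChar::PROPERTY_ALPHABETIC);
$aIsHexDigit   = IntlChar::hasBinaryProperty($a, IntlChar::PROPERTY_ASCII_HEX_DIGIT);
$gIsHexDigit   = IntlChar::hasBinaryProperty($g, IntlChar::PROPERTY_ASCII_HEX_DIGIT);

var_dump($aIsAlphabetic, $aIsHexDigit, $gIsHexDigit);
// bool(true)
// bool(true)
// bool(false)
```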

Note that methods taking a codepoint will accept either an integer codepoint value (e.g. 0x2603 for U+2603 SNOWMAN), or the character encoded as UTF-8 (e.g. "\xE2\x98\x83"). Methods which return a codepoint return an int, unless the codepoint was passed in as a UTF-8 string, in which case the result is also a UTF-8 string.
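
To make the int versus UTF-8 behaviour concrete, here is a small sketch using charName() and toupper(). It assumes an ext/intl build that provides IntlChar with these two methods; the variable names are mine.

```php
<?php
// Sketch only: assumes ext/intl is loaded and provides the IntlChar class.

// The same codepoint can be passed as an int or as a UTF-8 string.
$nameFromInt    = IntlChar::charName(0x2603);          // U+2603 as an integer
$nameFromString = IntlChar::charName("\xE2\x98\x83");  // U+2603 encoded as UTF-8

// A method returning a codepoint mirrors the type it was given.
$upperFromInt    = IntlChar::toupper(0x61); // int in, int out
$upperFromString = IntlChar::toupper("a");  // string in, string out

var_dump($nameFromInt, $nameFromString, $upperFromInt, $upperFromString);
// string(7) "SNOWMAN"
// string(7) "SNOWMAN"
// int(65)
// string(1) "A"
```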

Notes

I also added IntlChar::chr() and IntlChar::ord(), which aren't directly part of the ICU API, but they made sense as wrappers for the U8_*() family of macros.
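
A minimal round trip through the two helpers might look like this. It assumes an ext/intl build that provides IntlChar; the variable names are only illustrative.

```php
<?php
// Sketch only: assumes ext/intl is loaded and provides the IntlChar class.

// chr(): integer codepoint to its UTF-8 encoding.
$encoded = IntlChar::chr(0x2603);

// ord(): UTF-8 encoded character back to its integer codepoint.
$decoded = IntlChar::ord("\xE2\x98\x83");

var_dump(bin2hex($encoded)); // string(6) "e29883"
var_dump($decoded);          // int(9731), i.e. 0x2603
```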

Some methods take a range in the form ($start, $limit), where the range is INclusive of $start and EXclusive of $limit, i.e. (0x20, 0x30) ⇒ 0x20..0x2F. I kept this meaning for $limit to stay consistent with the ICU API, but changing $limit to have the semantics of $end would probably make more sense in PHP.
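
The half-open range convention can be sketched in plain PHP without touching ICU at all. Both helpers below are hypothetical and not part of the proposed API; they only show which codepoints a ($start, $limit) pair covers and how an inclusive $end would map onto it.

```php
<?php
// Hypothetical helpers for illustration; not part of the proposed IntlChar API.

/** Expand an ICU-style half-open range [$start, $limit) into a list of codepoints. */
function codepointsInHalfOpenRange(int $start, int $limit): array
{
    // An empty or inverted range covers no codepoints.
    return $limit > $start ? range($start, $limit - 1) : [];
}

/** Convert an inclusive $end into the exclusive $limit that ICU expects. */
function limitFromInclusiveEnd(int $end): int
{
    return $end + 1;
}

$codepoints = codepointsInHalfOpenRange(0x20, 0x30);
printf("first=0x%X last=0x%X count=%d\n", $codepoints[0], end($codepoints), count($codepoints));
// first=0x20 last=0x2F count=16
```

With $end semantics a caller would write (0x20, 0x2F) and the extension would add one internally, which is what limitFromInclusiveEnd() stands in for here.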

Implementation

Votes

An option needs 50%+1 votes to win

Accept the IntlChar RFC and merge into master? (100% approved)
User Vote
ab Yes
aharvey Yes
ajf Yes
guilhermeblanco Yes
indeyets Yes
irker Yes
jedibc Yes
kinncj Yes
leigh Yes
mike Yes
pollita Yes
rasmus Yes
salathe Yes
stas Yes