zeev

Zeev Suraski

francois

Francois Laupretre

dmitry

Dmitry Stogov , from Zend Technologies
(user votes do not necessarily reflect their company's views)

Contents

PHP RFC: Coercive Types for Function Arguments

Background & Summary

Scalar Type Hints have been a top requested feature for PHP for a very, very long time. There have been numerous attempts at introducing them to the language, all of which failed to make it into the language thus far.

While there seems to be consensus regarding the viability and usefulness of adding Scalar Type Hints (STH), theres been a long standing debate regarding what is the correct way to implement them. The two key schools of thoughts around STH that emerged over the years are:

  1. Strict STH. In essence, this approach conducts a very simple zval.type check on the value being passed from the caller to the callee; If theres a match the argument is accepted, otherwise rejected. Coercion is (almost) never performed, so, for instance, a callee that expects an integer argument and is passed a 32 (string) will reject it.
  2. Dynamic (weak) STH. Unlike strict STH, dynamic STH automatically converts values which are not already of the type that is expected by the callee into that type, according to the same ruleset presently used by internal functions. That means that a callee expecting an integer that will be passed 32 (string), will successfully receive the argument, and it will be automatically converted to 32 (int). Note that existing dynamic STH rules (as they already exist for internal functions) are stricter than explicit casts. Example: a non-numeric string, like “oranges” is rejected when expecting an integer, while explicitly casting “oranges” into (int) will succeed without a notice, and result in 0.

Its important to note that in terms of the code *inside* the callee, theres absolutely no difference between the two schools of thought. In both cases, the callee can rely with absolute confidence that if it type hinted a certain argument as an int, this argument will always be an int when function execution begins. The difference is localized to the behavior surrounding the invocation of the callee by the caller, with Strict STH rejecting a lot more potential inputs, compared to Dynamic STH.

Proponents of Strict STH cite numerous advantages, primarily around code safety/security. In their view, the conversion rules proposed by Dynamic STH can easily allow garbage input to be silently converted into arguments that the callee will accept but that may, in many cases, hide difficult-to-find bugs or otherwise result in unexpected behavior.

Proponents of Dynamic STH bring up consistency with the rest of the language, including some fundamental type-juggling aspects that have been key tenets of PHP since its inception. Strict STH, in their view, is inconsistent with these tenets.

This RFC proposes a composite solution, which attempts to address the main goals of both camps, dubbed Coercive STH. Coercive STH is less restrictive than simple zval.type checks, but a lot more restrictive than the conversion rules presently employed by internal functions. It attempts to strike a balance between rejecting erroneous input, and allowing valid-but-wrongly-typed input, and outlines a gradual roadmap for transitioning internal functions to this new rule-set.

Finally, the RFC outlines a potential future evolution of employing the new rule-set into additional parts of PHP, most notably implicit type conversions (outside the scope of this specific RFC).

Proposal

Coercion Rules

A new set of coercion rules will apply to both user-land type hints and internal type hints. The guiding principals behind these new rules are:

  1. If the type of the value is an exact match to the type requested by the hint - allow.
  2. If the value can be coerced to the type requested by the hint without data loss and without creation of likely unintended data - allow.
  3. In all other cases - reject.

Here are the rules we get when applying these changes to the rules currently used in PHP 5 :

Value Type
Hint boolean int float string
bool Accept Accept* Reject Accept*
int Accept Accept Only if no DL† Numeric integer string only‡
float Accept Only if no DL† Accept Numeric string only‡
string Accept Accept Accept Accept

* Coercion from int or string into bool is done using the same rules that apply in the rest of PHP; 0, “0” and “” coerce to false; Any other value coerces to true.

† Float to int coercion will be accepted only if there are no significant digits after the decimal point. E.g. 7.0 will be coerced to 7, but 7.3 will be rejected. Int to float coercion will be accepted only if the integer value can be represented without loss of accuracy using a floating point number. Extremely large integers (with absolute value larger than 2^52) will be rejected.

‡ Numeric strings may be converted to int or float types, only in case there is no loss in data or accuracy. Leading zeroes, as well as leading and trailing whitespaces are accepted. Other non-numeric trailing data will be rejected. For int hints, numeric strings with significant digits after the decimal point will be rejected. For floating point hints, integer values that cannot be represented without loss of accuracy (exceed 2^52 in absolute value) will be rejected as well.

Handling of non-scalar values

Generally speaking, coercion from non-scalar values into scalar type hints is not supported and will be rejected, with few exceptions.

Arrays and resources will always be rejected as valid inputs for scalar type hinted arguments . Objects will always be rejected as valid inputs for scalar type hinted arguments, with one exception - an object with a __toString() method will be accepted for string type hint. Nulls will be rejected as valid inputs for scalar type hinted arguments when using user-land type hints, but presently accepted for internal functions. See the 'Changes to Internal Functions' section for more information.

User-land Additions

This RFC proposes to introduce four new type hints into PHP – int, float, string and bool. These new hints will adhere to the new coercion rules detailed above. Values that cannot be accepted per the coercion rules above, will result in E_RECOVERABLE_ERROR being triggered. Note that if the Exceptions in the Engine RFC is accepted, this will throw an exception instead, making recovery simpler and more straightforward.

These type hints can be used for function arguments, as well as for return values, as described in the Return Type Declarations RFC. In both cases, they are handled exactly the same way.

No type declaration for resources is added, as this would prevent moving from resources to objects for existing extensions which some have already done (e.g. GMP).

Changes to Internal Functions

This RFC proposes to bring the rule-set described in the last section to internal functions as well, through updates to the zend_parse_parameters() function.

However, given that unlike the introduction of STH - which is a new, previously unused feature that will (for the most part) not affect existing code - changes to what internal functions would be willing to accept could have substantial compatibility implications.

To mitigate the risk of compatibility breakage being introduced between PHP 5.6 and 7.0, two mitigation steps are proposed:

  1. To allow developers time to adhere to the updated rules, a two-staged migration strategy will be used. At the first stage - in PHP 7, conversions which were supported in the past and are no longer allowed due to the new rules, will emit E_DEPRECATED warnings, but will still allow the values in and convert them using the same rules as PHP 5. When it is considered safe (PHP 8 or later), the E_DEPRECATED warnings will be upgraded to E_RECOVERABLE_ERROR errors (or exceptions, depending on the engine standards). The two-staged would provide users ample time to update their code as needed to fit the new, more restrictive rule-set.
  2. Unlike user-land scalar type hints, internal functions will accept nulls as valid scalars. Based on preliminary testing, this is an extremely common use case, most often used in conjunction with uninitialized values. Disallowing it - language-wide for all internal functions - may be too big of a shift. Therefore, internal functions receiving a NULL (non-)value for a scalar will accept it, and convert it to 0 or an empty string in the same way PHP 5 does. Note that this discrepancy may be resolved in the future - by introducing new options for arguments that would explicitly reject NULL values, in the same manner user-land STH do. However, since this requires substantial auditing of internal functions - especially ones that have default values but don't explicitly declare themselves as accepting NULLs - it's outside the scope of this RFC and will be revisited for 7.1. Note that if the https://wiki.php.net/rfc/nullable_types is accepted it will further reduce this discrepancy, by allowing user-land functions and internal functions the same level of granularity in terms of accepting or rejecting NULL values for function arguments.
  3. As we don't clearly define a date for switching E_DEPRECATE to fatal errors, the RFC states that such decision cannot, in any cases, be made before a delay of 5 years after the first public release of a PHP distribution containing the STH features described here. This statement is voted upon as the rest of the RFC. So, it cannot be violated without a new vote on this specific subject. This statement is provided as a guarantee to the developers that they will have ample time to fix their code.

Impact on Real World Applications

The patch has been tested with numerous real world apps and frameworks, to attempt to gauge the impact the changes to the internal functions rules would have:

  • Drupal 7 homepage: One new E_DEPRECATED warning, which seems to catch faulty-looking code
  • Drupal 7 admin interface (across the all pages): One new E_DEPRECATED warning, which again seems to catch a real bug – stripslsahes() operating on a boolean.
  • Magento 1.9 homepage (w/ Magento's huge sample DB): One new E_DEPRECATED warning, again, seems to be catching a real bug of ‘false’ being fed as argument 1 of in json_decode() – which is expecting a string full of json there.
  • WordPress 3.4 homepage: One new E_DEPRECATED warning, again, seems to be catching a real bug of ‘false’ being fed as argument 1 of substr().
  • Zend Framework 2 skeleton app: Zero new E_DEPRECATED warnings.
  • Symfony ACME app: Zero new E_DEPRECATED warnings (across the app).
  • PHPUnit: Several E_DEPRECATED issues that were fixed in a matter of hours.

The negative impact on real world apps appears to be very, very limited - which is consistent with the premise that the Coercive STH RFC aims to allow the conversions which are common and most likely sensible, and block the ones which are likely faulty - which means we shouldn't see too many of those in real world apps.

In addition, the patch was tested with numerous unit-test suites; PHP's test suite shows a lot of new errors, however, the majority of them stem from tests purposely designed to check 'insensible' conversions (e.g. readgzfile($filename, -10.5)), and not code blocks that we're ever likely to bump into in the real world.

The Symfony and Zend Framework test suites were also run and showed new deprecation errors; Based on very preliminary analysis, it seems that most of them either fall into the same bucket as the PHP unit tests above (purposely designed to test insensible conversions), or seem to point out issues that may translate into real world bugs, and that can be fixed in a relatively small number of changes.

All in all, the signal to noise ratio of turning the new coercive rules for the entirety of PHP seems to be very good.

Difference from PHP 5

The following table details the changes made to the values acceptable by internal functions proposed by this patch (dash (-) means there were no changes):

Value Type
Hint boolean int float string
bool Unchanged Unchanged Reject Unchanged
int Reject Unchanged Restrict* Restrict†
float Reject Unchanged Unchanged Restrict†
string Reject Unchanged Unchanged Unchanged

* Only accept inputs that contain no significant digits after the decimal point.

† Numeric strings that have non-blank alphanumeric characters (e.g., “7 dogs”, “3.14 pizzas”) are no longer accepted.

Note that in all cases, when conversion occurs - its rules are identical to those in PHP 5. PHP 7 will accept fewer types of inputs as valid, but will apply the same conversion rules to the ones that are accepted.

Examples for conversions now deprecated for Internal Functions

Here are examples of conversions which, while still providing the same results as in PHP 5, now also raise an E_DEPRECATED error :

false -> int           # No more conversion from bool
true -> string         # No more conversion from bool
7.5 -> int             # 7.5 cannot be converted to an integer without data loss
"8.2" -> int           # "8.2" cannot be converted to an integer without data loss
4.3 -> bool            # No more conversion from float to bool
"7 dogs" -> int        # Non-blank trailing characters no longer supported
"3.14 pizzas" -> float # Non-blank trailing characters no longer supported 

Potential Future Changes to Implicit Casting Rules

While outside the scope of this RFC, the introduction of the new coercive-yet-more-restrictive rule-set may be considered for additional areas in PHP, most notably implicit casting. For example, today, the result of “Apples” + “Oranges” is 0, because the + operator implicitly casts anything into a number. It could be imagined that in the future, the + operator will accept only values that would fit into an int or float STH, and warn users about others (realistically, most probably through E_STRICT). Users would still be able to use permissive explicit casting ($foo = (int) “Apples”; would still assign 0 into $foo), but the risk sometimes associated with implicit casting will be eliminated.

Ability to add Strict STH in the future

It should be noted that nothing in this RFC conflicts with the ability to add Strict STH in the future. The ability to add a 2nd mode via declare() or some other mechanism will always be there. We do believe that demand for it will greatly diminish with the introduction of these scalar type hints, but in case it doesn't - there'll be no technical blocks preventing us from adding it in the future, even in the 7.x lifetime.

Comparison to the other RFC

Numerous community members have invested substantial effort into creating another comprehensive RFC, that proposes to introduce STH into PHP Scalar Type Hints RFC v0.5 ("Dual Mode RFC"). However, we believe the proposal in this RFC is better, for several different reasons:

  1. Single Mode. Thanks to the fact this RFC proposes a single mode and enables it across the board for any code that's run, you get the benefit of stricter-yet-sensible rules overnight. Thanks to the good signal-to-noise ratio, moving to PHP 7 is likely to help people find real world bugs, without having to proactively enable stricter modes and without having to introduce a lot of changes to their code.
  2. Smaller cognitive burden. Even though the Dual Mode RFC presents a novel idea about how to allow developers to choose which mode they'd like to use, and use different modes in different parts of the app, it still introduces the burden of two different modes. Two different rule-sets that need to be learned increase the language's complexity. Further, the two sets can cause the same functions to behave differently depending on where they're being called, and potentially a new class of bugs stemming from developers not realizing which mode they're in in a particular file. This RFC is unaffected by these issues, as it presents a single, composite rule set.
  3. Too strict may lead to too lax. In the Dual Mode RFC, when in Strict mode, in many cases, functions would reject values that, semantically, are acceptable. For example, a “32” (string) value coming back from an integer column in a database table, would not be accepted as valid input for a function expecting an integer. Since semantically the developer is interested in this argument-passing succeeding, they would have the choice of either removing the integer STH altogether, or, more likely, explicitly casting the value into an integer. This would have the opposite of the desired outcome of strict STHs - as explicit casts ($foo = (int) $foo;) always succeed, and would happily convert “100 dogs”, “Apples” and even arrays and booleans into an integer. Further, since already today, internal functions employ coercion rules that are more restrictive than PHP's explicit casting, pushing people towards explicit casting will actually make things worse in case developers opt for explicit casting as they pass values in an internal function call.
  4. Smooth integration with Data Sources. PHP uses strings extensively across the language, and in most cases, data sources always feed data into PHP as strings. PHP applications rely extensively on internal type juggling to convert that string-based data according to the needed context. Strict zval.type based STH effectively eliminates this behavior, moving the burden of worrying about type conversion to the user. The solution proposed in this RFC allows code that relies on type coercion to Just Work when the values are sensible, but fail (and appropriately warn the developer) otherwise.
  5. Forward compatibility for internal function calls. Codebases which will be tested & improved to work on PHP 7, can run without a problem on PHP 5 and benefit from the improved code quality.

In addition, there appear to be numerous misconception about benefits of strict type hinting, that to the best of our (deep) understanding of the associated technologies, aren't really there:

  1. Performance. There's complete consensus that there are no tangible performance differences between the strict and coercive typing. The difference is that strict typing would block scenarios that coercive typing would allow; But that's a difference in behavior, not performance.
  2. AOT/JIT implications. It is our position that there is no difference at all between strict and coercive typing in terms of potential future AOT/JIT development - none at all. In both the case of strict and coercive STH, we can have full confidence that the value inside the callee is of the requested type; And in both the case of strict and coercive STH, we can't make any assumptions about what is the type of value that the caller is passing as an argument. Again, the difference is only that strict typing may reject values that coercive typing may accept; But the very same checks need to be conducted in both cases; The very same type inference can be used in both cases to potentially optimize these checks away; Etc.
  3. Static Analysis. It is the position of several Strict STH proponents that Strict STH can help static analysis in certain cases. For the same reasons mentioned above about JIT, we don't believe that is the case - although it's possible that Strict Typing may be able to help static analysis in certain edge cases. However, it is our belief that even if that is true, Static Analyzers need to be designed for Languages, rather than Languages being designed for Static Analyzers.

Backward Incompatible Changes

Given the change to the acceptable values into a wide range of internal functions, this RFC is likely to result in a substantial number of newly introduced E_DEPRECATED warnings in internal function invocations, although those can be easily suppressed. When E_DEPRECATED is replaced with E_RECOVERABLE_ERROR in a future PHP version, users will be forced to update their code and 'clean it up' before they can upgrade. Also, the newly-introduced type hints (int, float, string and bool) will no longer permitted as class/interface/trait names (including with use and class_alias)

Proposed PHP Version(s)

7.0

Glossary

  • STH - Scalar Type Hints. Code structures designed to provide information to PHP regarding the nature of an argument that a function expects.
  • JIT - Just In Time (Compilation). In the context of PHP - selective translation of PHP opcodes into machine code while the code is already running, while potentially taking advantage of information known only at runtime.
  • AOT - Ahead Of Time (Compilation). Compiling PHP opcodes into machine code before it executes it.
  • Static Analysis - Analyzing code without running it, attempting to derive conclusions about security, performance, etc.

Patches and Tests

Votes

An option needs 2/3 votes to win

coercive_sth (38% approved)
User Vote
aharvey No
ajf No
andi Yes
ashnazg Yes
bate Yes
bishop Yes
brandon No
brianlmoon Yes
bwoebi No
Damien Tournoud No
danack No
demon Yes
derick No
diegopires No
dm No
dmitry Yes
elf Yes
eliw No
francois Yes
fredemmott No
galvao No
guilhermeblanco No
hirokawa Yes
indeyets No
ircmaxell No
jedibc No
jmikola No
joey No
jwage No
kalle No
kguest Yes
kinncj No
kk Yes
klaussilveira No
krakjoe No
lcobucci No
leigh No
mfischer No
mike Yes
mj Yes
mrook No
nikic No
ocramius No
olemarkus No
pajoye No
patrickallaert Yes
pauloelr No
peehaa No
philstu No
rasmus Yes
rdlowrey No
rdohms No
reeze No
rmf No
salathe No
sammywg Yes
sebastian Yes
shimooka Yes
sobak No
spriebsch Yes
stas Yes
stewartlord Yes
subjective No
thekid Yes
theseer Yes
thorstenr No
till No
toby No
weierophinney Yes
yohgaki Yes
zeev Yes