Viewed   63 times

I'm pretty new to PHP, but I've been programming in similar languages for years. I was flummoxed by the following:

class Foo {
    public $path = array(

It produced a syntax error: Parse error: syntax error, unexpected '(', expecting ')' in test.php on line 5 which is the realpath call.

But this works fine:

$path = array(

After banging my head against this for a while, I was told you can't call functions in an attribute default; you have to do it in __construct. My question is: why?! Is this a "feature" or sloppy implementation? What's the rationale?



The compiler code suggests that this is by design, though I don't know what the official reasoning behind that is. I'm also not sure how much effort it would take to reliably implement this functionality, but there are definitely some limitations in the way that things are currently done.

Though my knowledge of the PHP compiler isn't extensive, I'm going try and illustrate what I believe goes on so that you can see where there is an issue. Your code sample makes a good candidate for this process, so we'll be using that:

class Foo {
    public $path = array(

As you're well aware, this causes a syntax error. This is a result of the PHP grammar, which makes the following relevant definition:

      | T_VARIABLE '=' static_scalar //...

So, when defining the values of variables such as $path, the expected value must match the definition of a static scalar. Unsurprisingly, this is somewhat of a misnomer given that the definition of a static scalar also includes array types whose values are also static scalars:

static_scalar: /* compile-time evaluated scalars */
      | T_ARRAY '(' static_array_pair_list ')' // ...

Let's assume for a second that the grammar was different, and the noted line in the class variable delcaration rule looked something more like the following which would match your code sample (despite breaking otherwise valid assignments):

      | T_VARIABLE '=' T_ARRAY '(' array_pair_list ')' // ...

After recompiling PHP, the sample script would no longer fail with that syntax error. Instead, it would fail with the compile time error "Invalid binding type". Since the code is now valid based on the grammar, this indicates that there actually is something specific in the design of the compiler that's causing trouble. To figure out what that is, let's revert to the original grammar for a moment and imagine that the code sample had a valid assignment of $path = array( 2 );.

Using the grammar as a guide, it's possible to walk through the actions invoked in the compiler code when parsing this code sample. I've left some less important parts out, but the process looks something like this:

// ...
// Begins the class declaration
zend_do_begin_class_declaration(znode, "Foo", znode);
    // Set some modifiers on the current znode...
    // ...
    // Create the array
    // Add the value we specified
    zend_do_add_static_array_element(znode, NULL, 2);
    // Declare the property as a member of the class
    zend_do_declare_property('$path', znode);
// End the class declaration
zend_do_end_class_declaration(znode, "Foo");
// ...
// ...

While the compiler does a lot in these various methods, it's important to note a few things.

  1. A call to zend_do_begin_class_declaration() results in a call to get_next_op(). This means that it adds a new opcode to the current opcode array.
  2. array_init() and zend_do_add_static_array_element() do not generate new opcodes. Instead, the array is immediately created and added to the current class' properties table. Method declarations work in a similar way, via a special case in zend_do_begin_function_declaration().
  3. zend_do_early_binding() consumes the last opcode on the current opcode array, checking for one of the following types before setting it to a NOP:

Note that in the last case, if the opcode type is not one of the expected types, an error is thrown – The "Invalid binding type" error. From this, we can tell that allowing the non-static values to be assigned somehow causes the last opcode to be something other than expected. So, what happens when we use a non-static array with the modified grammar?

Instead of calling array_init(), the compiler prepares the arguments and calls zend_do_init_array(). This in turn calls get_next_op() and adds a new INIT_ARRAY opcode, producing something like the following:

SEND_VAL        '.'
DO_FCALL        'realpath'

Herein lies the root of the problem. By adding these opcodes, zend_do_early_binding() gets an unexpected input and throws an exception. As the process of early binding class and function definitions seems fairly integral to the PHP compilation process, it can't just be ignored (though the DECLARE_CLASS production/consumption is kind of messy). Likewise, it's not practical to try and evaluate these additional opcodes inline (you can't be sure that a given function or class has been resolved yet), so there's no way to avoid generating the opcodes.

A potential solution would be to build a new opcode array that was scoped to the class variable declaration, similar to how method definitions are handled. The problem with doing that is deciding when to evaluate such a run-once sequence. Would it be done when the file containing the class is loaded, when the property is first accessed, or when an object of that type is constructed?

As you've pointed out, other dynamic languages have found a way to handle this scenario, so it's not impossible to make that decision and get it to work. From what I can tell though, doing so in the case of PHP wouldn't be a one-line fix, and the language designers seem to have decided that it wasn't something worth including at this point.

Sunday, August 7, 2022

The mysql extension is ancient and has been around since PHP 2.0, released 15 years ago (!!); which is a decidedly different beast than the modern PHP which tries to shed the bad practices of its past. The mysql extension is a very raw, low-level connector to MySQL which lacks many convenience features and is thereby hard to apply correctly in a secure fashion; it's therefore bad for noobs. Many developers do not understand SQL injection and the mysql API is fragile enough to make it hard to prevent it, even if you're aware of it. It is full of global state (implicit connection passing for instance), which makes it easy to write code that is hard to maintain. Since it's old, it may be unreasonably hard to maintain at the PHP core level.

The mysqli extension is a lot newer and fixes all the above problems. PDO is also rather new and fixes all those problems too, plus more.

Due to these reasons* the mysql extension will be removed sometime in the future. It did its job in its heyday, rather badly, but it did it. Time has moved on, best practices have evolved, applications have gotten more complex and require a more modern API. mysql is being retired, live with it.

Given all this, there's no reason to keep using it except for inertia.

* These are my common sense summary reasons; for the whole official story, look here:

Choice quotes from that document follow:

The documentation team is discussing the database security situation, and educating users to move away from the commonly used ext/mysql extension is part of this.


Moving away from ext/mysql is not only about security but also about having access to all features of the MySQL database.


ext/mysql is hard to maintain code. It is not not getting new features. Keeping it up to date for working with new versions of libmysql or mysqlnd versions is work, we probably could spend that time better.

Friday, December 16, 2022

One reason is the lack of a JIT compiler in PHP, as others have mentioned.

Another big reason is PHP's dynamic typing. A dynamically typed language is always going to be slower than a statically typed language, because variable types are checked at run-time instead of compile-time. As a result, statically typed languages like C# and Java are going to be significantly faster at run-time, though they typically have to be compiled ahead of time. A JIT compiler makes this less of an issue for dynamically typed languages, but alas, PHP does not have one built-in. (Edit: PHP 8 will come with a built-in JIT compiler.)

Saturday, December 24, 2022

Globals are evil

This is true for the global keyword as well as everything else that reaches from a local scope to the global scope (statics, singletons, registries, constants). You do not want to use them. A function call should not have to rely on anything outside, e.g.

function fn()
    global $foo;              // never ever use that
    $a = SOME_CONSTANT        // do not use that
    $b = Foo::SOME_CONSTANT;  // do not use that unless self::
    $c = $GLOBALS['foo'];     // incl. any other superglobal ($_GET, …)
    $d = Foo::bar();          // any static call, incl. Singletons and Registries

All of these will make your code depend on the outside. Which means, you have to know the full global state your application is in before you can reliably call any of these. The function cannot exist without that environment.

Using the superglobals might not be an obvious flaw, but if you call your code from a Command Line, you don't have $_GET or $_POST. If your code relies on input from these, you are limiting yourself to a web environment. Just abstract the request into an object and use that instead.

In case of coupling hardcoded classnames (static, constants), your function also cannot exist without that class being available. That's less of an issue when it's classes from the same namespace, but when you start mix from different namespaces, you are creating a tangled mess.

Reuse is severly hampered by all of the above. So is unit-testing.

Also, your function signatures are lying when you couple to the global scope

function fn()

is a liar, because it claims I can call that function without passing anything to it. It is only when I look at the function body that I learn I have to set the environment into a certain state.

If your function requires arguments to run, make them explicit and pass them in:

function fn($arg1, $arg2)
    // do sth with $arguments

clearly conveys from the signature what it requires to be called. It is not dependent on the environment to be in a specific state. You dont have to do

$arg1 = 'foo';
$arg2 = 'bar';

It's a matter of pulling in (global keyword) vs pushing in (arguments). When you push in/inject dependencies, the function does not rely on the outside anymore. When you do fn(1) you dont have to have a variable holding 1 somewhere outside. But when you pull in global $one inside the function, you couple to the global scope and expect it to have a variable of that defined somewhere. The function is no longer independent then.

Even worse, when you are changing globals inside your function, your code will quickly be completely incomprehensible, because your functions are having sideeffects all over the place.

In lack of a better example, consider

function fn()
    global $foo;
    echo $foo;     // side effect: echo'ing
    $foo = 'bar';  // side effect: changing

And then you do

$foo = 'foo';
fn(); // prints foo
fn(); // prints bar <-- WTF!!

There is no way to see that $foo got changed from these three lines. Why would calling the same function with the same arguments all of a sudden change it's output or change a value in the global state? A function should do X for a defined input Y. Always.

This gets even more severe when using OOP, because OOP is about encapsulation and by reaching out to the global scope, you are breaking encapsulation. All these Singletons and Registries you see in frameworks are code smells that should be removed in favor of Dependency Injection. Decouple your code.

More Resources:

  • How is testing the registry pattern or singleton hard in PHP?
  • Flaw: Brittle Global State & Singletons
  • static considered harmful
  • Why Singletons have no use in PHP
  • SOLID (object-oriented design)
Monday, October 17, 2022

To answer your second question, where is _v?

Your version of the descriptor keeps _v in the descriptor itself. Each instance of the descriptor (the class-level instance SomeClass1, and all of the object-level instances in objects of class SomeClass2 will have distinct values of _v.

Look at this version. This version updates the object associated with the descriptor. This means the object (SomeClass1 or x2) will contain the attribute _v.

class MyDescriptor(object):
  def __get__(self, obj, type=None):
    print "get", self, obj, type
    return obj._v
  def __set__(self, obj, value):
    obj._v = value
    print "set", self, obj, value
Thursday, December 15, 2022
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :