Consider the following PHP Code:
//Method 1
$array = array(1,2,3,4,5);
foreach($array as $i=>$number){
$number++;
$array[$i] = $number;
}
print_r($array);
//Method 2
$array = array(1,2,3,4,5);
foreach($array as &$number){
$number++;
}
print_r($array);
Both methods accomplish the same task, one by assigning a reference and another by re-assigning based on key. I want to use good programming techniques in my work and I wonder which method is the better programming practice? Or is this one of those it doesn't really matter things?
Since the highest scoring answer states that the second method is better in every way, I feel compelled to post an answer here. True, looping by reference is more performant, but it isn't without risks/pitfalls.
Bottom line, as always: "Which is better X or Y", the only real answers you can get are:
Be that as it may, as Orangepill showed, the reference-approach offers better performance. In this case, the tradeoff one of performance vs code that is less error-prone, easier to read/maintan. In general, it's considered better to go for safer, more reliable, and more maintainable code:
I guess that means the first method has to be considered best practice. But that doesn't mean the second approach should be avoided at all time, so what follows here are the downsides, pitfalls and quirks that you'll have to take into account when using a reference in a
foreach
loop:Scope:
For a start, PHP isn't truly block-scoped like C(++), C#, Java, Perl or (with a bit of luck) ECMAScript6... That means that the
$value
variable will not be unset once the loop has finished. When looping by reference, this means a reference to the last value of whatever object/array you were iterating is floating around. The phrase "an accident waiting to happen" should spring to mind.Consider what happens to
$value
, and subsequently$array
, in the following code:The output of this snippet will be:
In other words, by reusing the
$value
variable (which references the last element in the array), you're actually manipulating the array itself. This makes for error-prone code, and difficult debugging. As opposed to:Maintainability/idiot-proofness:
Of course, you might say that the dangling reference is an easy fix, and you'd be right:
But after you've written your first 100 loops with references, do you honestly believe you won't have forgotten to unset a single reference? Of course not! It's so uncommon to
unset
variables that have been used in a loop (we assume the GC will take care of it for us), so most of the time, you don't bother. When references are involved, this is a source of frustration, mysterious bug-reports, or traveling values, where you're using complex nested loops, possibly with multiple references... The horror, the horror.Besides, as time passes, who's to say that the next person working on your code won't foget about
unset
? Who knows, he might not even know about references, or see your numerousunset
calls and deem them redundant, a sign of your being paranoid, and delete them all together. Comments alone won't help you: they need to be read, and everyone working with your code should be thoroughly briefed, perhaps have them read a full article on the subject. The examples listed in the linked article are bad, but I've seen worse, still:This is a simple example of a traveling value problem. I did not make this up, BTW, I've come across code that boils down to this... honestly. Quite apart from spotting the bug, and understanding the code (which has been made more difficult by the references), it's still quite obvious in this example, mainly because it's a mere 15 lines long, even using the spacious Allman coding style... Now imagine this basic construct being used in code that actually does something even slightly more complex, and meaningful. Good luck debugging that.
side-effects:
It's often said that functions shouldn't have side-effects, because side-effects are (rightfully) considered to be code-smell. Though
foreach
is a language construct, and not a function, in your example, the same mindset should apply. When using too many references, you're being too clever for your own good, and might find yourself having to step through a loop, just to know what is being referenced by what variable, and when.The first method hasn't got this problem: you have the key, so you know where you are in the array. What's more, with the first method, you can perform any number of operations on the value, without changing the original value in the array (no side-effects):
This generates the output:
As you can see, the first, seventh and tenth elements have been altered, the others haven't. If we were to rewrite this code using a loop by reference, the loop looks a lot smaller, but the output will be different (we have a side-effect):
To counter this, we'll either have to create a temporary variable, or call the function tiwce, or add a key, and recalculate the initial value of
$v
, but that's just plain stupid (that's adding complexity to fix what shouldn't be broken):Anyway, adding branches, temp variables and what have you, rather defeats the point. For one, it introduces extra overhead which will eat away at the performance benefits references gave you in the first place.
If you have to add logic to a loop, to fix something that shouldn't need fixing, you should step back, and think about what tools you're using. 9/10 times, you chose the wrong tool for the job.
The last thing that, to me at least, is a compelling argument for the first method is simple: readability. The reference-operator (
&
) is easily overlooked if you're doing some quick fixes, or try to add functionality. You could be creating bugs in the code that was working just fine. What's more: because it was working fine, you might not test the existing functionality as thoroughly because there were no known issues.Discovering a bug that went into production, because of your overlooking an operator might sound silly, but you wouldn't be the first to have encountered this.
Note:
Passing by reference at call-time has been removed since 5.4. Be weary of features/functionality that is subject to changes. a standard iteration of an array hasn't changed in years. I guess it's what you could call "proven technology". It does what it says on the tin, and is the safer way of doing things. So what if it's slower? If speed is an issue, you can optimize your code, and introduce references to your loops then.
When writing new code, go for the easy-to-read, most failsafe option. Optimization can (and indeed should) wait until everything's tried and tested.
And as always: premature optimization is the root of all evil. And Choose the right tool for the job, not because it's new and shiny.