
What exactly is the difference between array_map, array_walk and array_filter? What I could see from the documentation is that you can pass a callback function to perform an action on the supplied array. But I don't seem to find any particular difference between them.

Do they perform the same thing?
Can they be used interchangeably?

I would appreciate your help with an illustrative example if they are different at all.

 Answers

4
  • Changing Values:
    • array_map cannot change the values inside input array(s) while array_walk can; in particular, array_map never changes its arguments.
  • Array Keys Access:
    • array_map cannot operate with the array keys, array_walk can.
  • Return Value:
    • array_map returns a new array, array_walk only returns true. Hence, if you don't want to create an array as a result of traversing one array, you should use array_walk.
  • Iterating Multiple Arrays:
    • array_map also can receive an arbitrary number of arrays and it can iterate over them in parallel, while array_walk operates only on one.
  • Passing Arbitrary Data to Callback:
    • array_walk can receive an extra arbitrary parameter to pass to the callback. This is mostly irrelevant since PHP 5.3 (when anonymous functions were introduced); a short sketch of this appears right after the list.
  • Length of Returned Array:
    • The resulting array of array_map has the same length as that of the largest input array; array_walk does not return an array but at the same time it cannot alter the number of elements of original array; array_filter picks only a subset of the elements of the array according to a filtering function. It does preserve the keys.
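
For instance, the extra-data parameter mentioned above can be sketched roughly like this (the appendCurrency function, the $prices array and the 'EUR' value are made up purely for illustration):

<?php

$prices = array("apple" => 1.25, "pear" => 2.50);

// pre-5.3 style: pass extra data through array_walk's third argument
function appendCurrency(&$value, $key, $currency)
{
    $value = $value . ' ' . $currency;
}
array_walk($prices, 'appendCurrency', 'EUR');
print_r($prices); // apple => 1.25 EUR, pear => 2.5 EUR

// since PHP 5.3 a closure can simply capture the extra data instead
$prices2 = array("apple" => 1.25, "pear" => 2.50);
$currency = 'EUR';
array_walk($prices2, function (&$value, $key) use ($currency) {
    $value = $value . ' ' . $currency;
});
print_r($prices2); // same result

?>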

Example:

<?php

$origarray1 = array(2.4, 2.6, 3.5);
$origarray2 = array(2.4, 2.6, 3.5);

print_r(array_map('floor', $origarray1)); // $origarray1 stays the same

// changes $origarray2
array_walk($origarray2, function (&$v, $k) { $v = floor($v); }); 
print_r($origarray2);

// this is a more proper use of array_walk
array_walk($origarray1, function ($v, $k) { echo "$k => $v", "\n"; });

// array_map accepts several arrays
print_r(
    array_map(function ($a, $b) { return $a * $b; }, $origarray1, $origarray2)
);

// select only elements that are > 2.5
print_r(
    array_filter($origarray1, function ($a) { return $a > 2.5; })
);

?>

Result:

Array
(
    [0] => 2
    [1] => 2
    [2] => 3
)
Array
(
    [0] => 2
    [1] => 2
    [2] => 3
)
0 => 2.4
1 => 2.6
2 => 3.5
Array
(
    [0] => 4.8
    [1] => 5.2
    [2] => 10.5
)
Array
(
    [1] => 2.6
    [2] => 3.5
)
Friday, October 28, 2022
1

Actually, this can be done, through a PHP extension.

File: config.m4

PHP_ARG_ENABLE(test, whether to enable test Extension support, [ --enable-test   Enable test ext support])

if test "$PHP_TEST" = "yes"; then
  AC_DEFINE(HAVE_TEST, 1, [Enable TEST Extension])
  PHP_NEW_EXTENSION(test, test.c, $ext_shared)
fi

File: php_test.h

#ifndef PHP_TEST_H
#define PHP_TEST_H 1

#define PHP_TEST_EXT_VERSION "1.0"
#define PHP_TEST_EXT_EXTNAME "test"

PHP_FUNCTION(getaddress4);
PHP_FUNCTION(getaddress);

extern zend_module_entry test_module_entry;
#define phpext_test_ptr &test_module_entry

#endif

File: test.c

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_test.h"

ZEND_BEGIN_ARG_INFO_EX(func_args, 1, 0, 0)
ZEND_END_ARG_INFO()

static function_entry test_functions[] = {
    PHP_FE(getaddress4, func_args)
    PHP_FE(getaddress, func_args)
    {NULL, NULL, NULL}
};

zend_module_entry test_module_entry = {
#if ZEND_MODULE_API_NO >= 20010901
    STANDARD_MODULE_HEADER,
#endif
    PHP_TEST_EXT_EXTNAME,
    test_functions,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
#if ZEND_MODULE_API_NO >= 20010901
    PHP_TEST_EXT_VERSION,
#endif
    STANDARD_MODULE_PROPERTIES
};

#ifdef COMPILE_DL_TEST
ZEND_GET_MODULE(test)
#endif

PHP_FUNCTION(getaddress4)
{
    zval *var1;
    zval *var2;
    zval *var3;
    zval *var4;
    char r[500];
    if( zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "aaaa", &var1, &var2, &var3, &var4) == FAILURE ) {
      RETURN_NULL();
    }
    sprintf(r, "n%p - %p - %p - %pn%p - %p - %p - %p", var1, var2, var3, var4, Z_ARRVAL_P(var1), Z_ARRVAL_P(var2), Z_ARRVAL_P(var3), Z_ARRVAL_P(var4) );
    RETURN_STRING(r, 1);
}

PHP_FUNCTION(getaddress)
{
    zval *var;
    char r[100];
    if( zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a", &var) == FAILURE ) {
      RETURN_NULL();
    }
    sprintf(r, "%p", Z_ARRVAL_P(var));
    RETURN_STRING(r, 1);
}

Then all you have to do is phpize it, configure it, and make it. Add an "extension=/path/to/so/file/modules/test.so" line to your php.ini file. And finally, restart the web server, just in case.

<?php
  $x = array("123"=>"123");
  $w = $x;
  $y = $x;
  $z = &$x;
  var_dump(getaddress4($w,$x,$y,$z));
  var_dump(getaddress($w));
  var_dump(getaddress($x));
  var_dump(getaddress($y));
  var_dump(getaddress($z));
?>

Returns (at least for me; your memory addresses will probably be different):

string '
0x9efeb0 - 0x9effe0 - 0x9ef8c0 - 0x9efeb0
0x9efee0 - 0x9f0010 - 0x9ed790 - 0x9efee0' (length=84)

string '0x9efee0' (length=8)

string '0x9f0010' (length=8)

string '0x9ed790' (length=8)

string '0x9efee0' (length=8)

Thanks to Artefacto for pointing this out: my original code was passing the arrays by value, so it was recreating the arrays (including the referenced one) and giving you bad memory values. I have since changed the code to force all params to be passed by reference. This allows references, arrays, and objects to be passed in unmolested by the PHP engine. $w/$z are the same thing, but $w/$x/$y are not. The old code actually showed the reference breakage, and the fact that the memory addresses would change or match when all variables were passed in vs. multiple calls to the same function. This was because PHP would reuse the same memory when doing multiple calls, so comparing the results of the original function would be useless. The new code should fix this problem.

FYI - I'm using php 5.3.2.

Sunday, September 4, 2022
 
sk0x50
 
3

You can try the code below to merge the arrays. It generates the desired output; I have used the sample arrays as given by you:

<?php
    $arr1=array(
        "384"=>array("name"=>"SomeMovieName1","age"=>"12.2 hrs","IMDBLink"=>"","IMDBRating"=>"", "coverArt"=>""),
        "452"=>array("name"=>"SomeMovieName2","age"=>"15.2 hrs","IMDBLink"=>"","IMDBRating"=>"", "coverArt"=>""),
        "954"=>array("name"=>"SomeMovieName3","age"=>"4.2 hrs","IMDBLink"=>"","IMDBRating"=>"", "coverArt"=>"")
    );
    $arr2=array(
       "384" => array("IMDBLink" => "7.2", "IMDBRating" => "http://www.imdb.com/LinkToMovie1", "coverArt" => "http://www.SomeLinkToCoverArt.com/1"),
       "452" => array("IMDBLink" => "5","IMDBRating" => "http://www.imdb.com/LinkToMovie2", "coverArt" => "http://www.SomeLinkToCoverArt.com/2"),
       "954"=>array("IMDBLink" => "8","IMDBRating" => "http://www.imdb.com/LinkToMovie3", "coverArt" => "http://www.SomeLinkToCoverArt.com/3")
    );
    $arr3 = array();
    foreach($arr1 as $key=>$val)
    {
         $arr3[] = array_merge($val, $arr2[$key]);
    }
    echo "<pre>";
    print_r($arr3);
?>
Tuesday, September 13, 2022
4

rev4: A very eloquent comment by user Sammaron has noted that, perhaps, this answer previously confused top-down and bottom-up. While originally this answer (rev3) and other answers said that "bottom-up is memoization" ("assume the subproblems"), it may be the inverse (that is, "top-down" may be "assume the subproblems" and "bottom-up" may be "compose the subproblems"). Previously, I had read of memoization being a different kind of dynamic programming, as opposed to a subtype of dynamic programming. I was quoting that viewpoint despite not subscribing to it. I have rewritten this answer to be agnostic of the terminology until proper references can be found in the literature. I have also converted this answer to a community wiki. Please prefer academic sources. List of references: {Web: 1,2} {Literature: 5}

Recap

Dynamic programming is all about ordering your computations in a way that avoids recalculating duplicate work. You have a main problem (the root of your tree of subproblems), and subproblems (subtrees). The subproblems typically repeat and overlap.

For example, consider your favorite example of Fibonacci. This is the full tree of subproblems, if we did a naive recursive call:

TOP of the tree
fib(4)
 fib(3)...................... + fib(2)
  fib(2)......... + fib(1)       fib(1)........... + fib(0)
   fib(1) + fib(0)   fib(1)       fib(1)              fib(0)
    fib(1)   fib(0)
BOTTOM of the tree

(In some other rare problems, this tree could be infinite in some branches, representing non-termination, and thus the bottom of the tree may be infinitely large. Furthermore, in some problems you might not know what the full tree looks like ahead of time. Thus, you might need a strategy/algorithm to decide which subproblems to reveal.)


Memoization, Tabulation

There are at least two main techniques of dynamic programming which are not mutually exclusive:

  • Memoization - This is a laissez-faire approach: You assume that you have already computed all subproblems and that you have no idea what the optimal evaluation order is. Typically, you would perform a recursive call (or some iterative equivalent) from the root, and either hope you will get close to the optimal evaluation order, or obtain a proof that will help you arrive at the optimal evaluation order. You would ensure that the recursive call never recomputes a subproblem because you cache the results, and thus duplicate sub-trees are not recomputed. (A minimal PHP sketch of both techniques appears right after this list.)

    • example: If you are calculating the Fibonacci sequence fib(100), you would just call this, and it would call fib(100)=fib(99)+fib(98), which would call fib(99)=fib(98)+fib(97), ...etc..., which would call fib(2)=fib(1)+fib(0)=1+0=1. Then it would finally resolve fib(3)=fib(2)+fib(1), but it doesn't need to recalculate fib(2), because we cached it.
    • This starts at the top of the tree and evaluates the subproblems from the leaves/subtrees back up towards the root.
  • Tabulation - You can also think of dynamic programming as a "table-filling" algorithm (though usually multidimensional, this 'table' may have non-Euclidean geometry in very rare cases*). This is like memoization but more active, and involves one additional step: You must pick, ahead of time, the exact order in which you will do your computations. This should not imply that the order must be static, but that you have much more flexibility than memoization.

    • example: If you are computing Fibonacci, you might choose to calculate the numbers in this order: fib(2),fib(3),fib(4)... caching every value so you can compute the next ones more easily. You can also think of it as filling up a table (another form of caching).
    • I personally do not hear the word 'tabulation' a lot, but it's a very decent term. Some people consider this "dynamic programming".
    • Before running the algorithm, the programmer considers the whole tree, then writes an algorithm to evaluate the subproblems in a particular order towards the root, generally filling in a table.
    • *footnote: Sometimes the 'table' is not a rectangular table with grid-like connectivity, per se. Rather, it may have a more complicated structure, such as a tree, or a structure specific to the problem domain (e.g. cities within flying distance on a map), or even a trellis diagram, which, while grid-like, does not have an up-down-left-right connectivity structure, etc. For example, user3290797 linked a dynamic programming example of finding the maximum independent set in a tree, which corresponds to filling in the blanks in a tree.
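
To make the two techniques concrete, here is a minimal PHP sketch of both applied to Fibonacci (the function names fibMemo and fibTab are made up for illustration):

<?php

// Memoization: start from the top and cache each subproblem the first time it is solved
function fibMemo($n, array &$cache = array())
{
    if ($n <= 1) return $n;
    if (!isset($cache[$n])) {
        $cache[$n] = fibMemo($n - 1, $cache) + fibMemo($n - 2, $cache);
    }
    return $cache[$n];
}

// Tabulation: pick the evaluation order up front and fill the table from the bottom up
function fibTab($n)
{
    $table = array(0, 1);
    for ($i = 2; $i <= $n; $i++) {
        $table[$i] = $table[$i - 1] + $table[$i - 2];
    }
    return $table[$n];
}

echo fibMemo(30), "\n"; // 832040
echo fibTab(30), "\n";  // 832040

?>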

(At its most general, in a "dynamic programming" paradigm, I would say the programmer considers the whole tree, then writes an algorithm that implements a strategy for evaluating subproblems which can optimize whatever properties you want (usually a combination of time-complexity and space-complexity). Your strategy must start somewhere, with some particular subproblem, and perhaps may adapt itself based on the results of those evaluations. In the general sense of "dynamic programming", you might try to cache these subproblems, and more generally, try to avoid revisiting subproblems, with a subtle distinction perhaps being the case of graphs in various data structures. Very often, these data structures are at their core like arrays or tables. Solutions to subproblems can be thrown away if we don't need them anymore.)

[Previously, this answer made a statement about the top-down vs bottom-up terminology; there are clearly two main approaches called Memoization and Tabulation that may be in bijection with those terms (though not entirely). The general term most people use is still "Dynamic Programming" and some people say "Memoization" to refer to that particular subtype of "Dynamic Programming." This answer declines to say which is top-down and bottom-up until the community can find proper references in academic papers. Ultimately, it is important to understand the distinction rather than the terminology.]


Pros and cons

Ease of coding

Memoization is very easy to code (you can generally* write a "memoizer" annotation or wrapper function that automatically does it for you), and should be your first line of approach. The downside of tabulation is that you have to come up with an ordering.

*(this is actually only easy if you are writing the function yourself, and/or coding in an impure/non-functional programming language... for example if someone already wrote a precompiled fib function, it necessarily makes recursive calls to itself, and you can't magically memoize the function without ensuring those recursive calls call your new memoized function (and not the original unmemoized function))
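
As a rough illustration of such a wrapper in PHP (memoize below is a hypothetical helper, not a built-in, and, per the caveat above, it only helps when the recursive calls also go through the wrapped version):

<?php

// wraps any callable in a closure that caches results keyed by the serialized arguments
function memoize($fn)
{
    $cache = array();
    return function () use ($fn, &$cache) {
        $args = func_get_args();
        $key  = serialize($args);
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = call_user_func_array($fn, $args);
        }
        return $cache[$key];
    };
}

$slowSquare = function ($x) { sleep(1); return $x * $x; };
$fastSquare = memoize($slowSquare);
echo $fastSquare(4), "\n"; // slow the first time
echo $fastSquare(4), "\n"; // served instantly from the cache

?>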

Recursiveness

Note that both top-down and bottom-up can be implemented with recursion or iterative table-filling, though it may not be natural.

Practical concerns

With memoization, if the tree is very deep (e.g. fib(10^6)), you will run out of stack space, because each delayed computation must be put on the stack, and you will have 10^6 of them.

Optimality

Either approach may not be time-optimal if the order you happen (or try to) visit subproblems is not optimal, specifically if there is more than one way to calculate a subproblem (normally caching would resolve this, but it's theoretically possible that caching might not in some exotic cases). Memoization will usually add on your time-complexity to your space-complexity (e.g. with tabulation you have more liberty to throw away calculations, like using tabulation with Fib lets you use O(1) space, but memoization with Fib uses O(N) stack space).
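
For example, the O(1)-space tabulation of Fib mentioned above might look like this in PHP (a sketch: only the last two values are kept instead of the whole table):

<?php

// tabulation with the table collapsed to two variables: O(n) time, O(1) space
function fibConstantSpace($n)
{
    if ($n <= 1) return $n;
    $prev = 0; // fib(i - 2)
    $curr = 1; // fib(i - 1)
    for ($i = 2; $i <= $n; $i++) {
        $next = $prev + $curr;
        $prev = $curr;
        $curr = $next;
    }
    return $curr;
}

echo fibConstantSpace(30), "\n"; // 832040

?>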

Advanced optimizations

If you are also doing an extremely complicated problem, you might have no choice but to do tabulation (or at least take a more active role in steering the memoization where you want it to go). Also, if you are in a situation where optimization is absolutely critical and you must optimize, tabulation will allow you to do optimizations which memoization would not otherwise let you do in a sane way. In my humble opinion, in normal software engineering, neither of these two cases ever comes up, so I would just use memoization ("a function which caches its answers") unless something (such as stack space) makes tabulation necessary... though technically to avoid a stack blowout you can 1) increase the stack size limit in languages which allow it, or 2) eat a constant factor of extra work to virtualize your stack (ick), or 3) program in continuation-passing style, which in effect also virtualizes your stack (not sure the complexity of this, but basically you will effectively take the deferred call chain from the stack of size N and de-facto stick it in N successively nested thunk functions... though in some languages without tail-call optimization you may have to trampoline things to avoid a stack blowout).


More complicated examples

Here we list examples of particular interest, that are not just general DP problems, but interestingly distinguish memoization and tabulation. For example, one formulation might be much easier than the other, or there may be an optimization which basically requires tabulation:

  • the algorithm to calculate edit-distance[4], interesting as a non-trivial example of a two-dimensional table-filling algorithm
Thursday, December 22, 2022
 
mini
 
1

Unlike a List<> ...

  1. A HashSet is a List with no duplicate members.

  2. Because a HashSet is constrained to contain only unique entries, the internal structure is optimised for searching (compared with a list) - it is considerably faster

  3. Adding to a HashSet returns a boolean - false if the addition fails because the item already exists in the set

  4. Can perform mathematical set operations against a Set: Union/Intersection/IsSubsetOf etc.

  5. HashSet doesn't implement IList, only ICollection

  6. You cannot use indices with a HashSet, only enumerators.

The main reason to use a HashSet would be if you are interested in performing Set operations.

Given 2 sets: hashSet1 and hashSet2

 //merges the distinct items of hashSet2 into hashSet1, in place
 hashSet1.UnionWith( hashSet2 );

flies in comparison with an equivalent operation using LINQ. It's also neater to write!

Tuesday, August 16, 2022
 