I am downloading a CSV file from another server as a data feed from a vendor.

I am using curl to get the contents of the file and saving that into a variable called $contents.

I can get to that part just fine, but when I try to explode on \r and \n to get an array of lines, it fails with an 'out of memory' error.

I echo strlen($contents) and it's about 30.5 million chars.

I need to manipulate the values and insert them into a database. What do I need to do to avoid memory allocation errors?

 Answers

1

PHP is choking because it's running out of memory. Rather than having curl populate a PHP variable with the contents of the file, use the CURLOPT_FILE option to save the file to disk.

// Untested sketch to give you the idea; assumes $url holds the feed URL

$fp = fopen('path/to/save/file', 'w');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp); // write the response body straight to the file
curl_exec($ch);
curl_close($ch);
fclose($fp);

Then, once the file is saved, instead of using the file or file_get_contents functions (which would load the entire file into memory, killing PHP again), use fopen and fgets to read the file one line at a time.
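
The fopen/fgets loop described above might look roughly like the following. This is a minimal sketch, not tested against the actual feed: the function name process_csv_lines and the callback are hypothetical, and it assumes a comma-delimited file with no line breaks inside quoted fields (if the feed has those, fgetcsv is the safer choice).

```php
<?php
// Hypothetical helper: stream a saved CSV file one line at a time,
// so only the current line is ever held in memory.
function process_csv_lines(string $path, callable $handleRow): int
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    try {
        while (($line = fgets($fp)) !== false) {
            $line = rtrim($line, "\r\n");   // handles both \n and \r\n endings
            if ($line === '') {
                continue;                   // skip blank lines
            }
            // Assumption: comma delimiter, double-quote enclosure.
            $handleRow(str_getcsv($line, ',', '"', '\\'));
            $count++;
        }
    } finally {
        fclose($fp);
    }

    return $count;                          // number of rows handed to the callback
}
```

The callback is where you would manipulate the values and run the database insert, so nothing accumulates between lines.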

Friday, October 14, 2022
4

When you declare a class method or variable as static, it is bound to and shared by the class, not by any individual object. From a memory-management perspective, this means the static members are created once, when the class definition is loaded, rather than once per object. Every instance then refers to that single shared copy, so an update made through one object is visible to all of them. This does reduce memory use, but not by much.

From a design standpoint, people usually choose static variables for architectural reasons rather than to optimize memory. In other words, one might create static variables like the ones you mentioned when implementing a singleton or factory pattern. They give you a way of knowing what is going on at the "class" level as opposed to what transpires at the "object" level.
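
To illustrate the class-level versus object-level distinction, here is a small sketch. It assumes PHP 7.4 or later, and the class name Widget and its properties are made up for the example.

```php
<?php
// Hypothetical class: one static counter shared by the class,
// one ordinary property stored separately in every object.
class Widget
{
    public static int $created = 0; // single copy, owned by the class
    public string $label;           // one copy per instance

    public function __construct(string $label)
    {
        $this->label = $label;
        self::$created++;           // every constructor call updates the same slot
    }
}

$a = new Widget('first');
$b = new Widget('second');

echo Widget::$created, "\n";        // prints 2
echo $a->label, ' / ', $b->label, "\n"; // prints first / second
```

The counter answers a question about the class as a whole (how many widgets exist), which is the kind of class-level bookkeeping the answer has in mind.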

Sunday, August 28, 2022
1

Your loop fills the variable $field for no reason (it writes to a different cell on every loop iteration), thereby using up more memory with every line.

You can replace:

$field[$loop] = explode ($delimiter, $line);
$export_date = $field[$loop][0];
$genre_id = $field[$loop][1];
$application_id = $field[$loop][2];

With:

list($export_date, $genre_id, $application_id) = explode($delimiter, $line);

For improved performance, you could take advantage of REPLACE INTO's ability to insert several rows at once by grouping N lines into a single query.
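
One way to group rows into a single REPLACE INTO is to build a multi-row statement with placeholders. This is only a sketch: the helper names build_batch_replace and flatten_rows and the table name genre_application are hypothetical, and the column names are taken from the variables above.

```php
<?php
// Build one multi-row REPLACE INTO statement with positional placeholders.
// Table and column names are NOT escaped here, so they must come from
// trusted code, never from user input.
function build_batch_replace(string $table, array $columns, int $rowCount): string
{
    if ($rowCount < 1 || count($columns) === 0) {
        throw new InvalidArgumentException('Need at least one row and one column');
    }
    // "(?, ?, ?)" for a single row
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    // "(?, ?, ?), (?, ?, ?), ..." for all rows in the batch
    $allRows = implode(', ', array_fill(0, $rowCount, $rowPlaceholder));

    return sprintf('REPLACE INTO %s (%s) VALUES %s', $table, implode(', ', $columns), $allRows);
}

// Flatten [[a, b, c], [d, e, f]] into [a, b, c, d, e, f] to match the placeholders.
function flatten_rows(array $rows): array
{
    return array_merge([], ...array_map('array_values', array_values($rows)));
}

$columns = ['export_date', 'genre_id', 'application_id'];
echo build_batch_replace('genre_application', $columns, 2), "\n";
// prints: REPLACE INTO genre_application (export_date, genre_id, application_id) VALUES (?, ?, ?), (?, ?, ?)
```

With PDO this would be used roughly as `$pdo->prepare($sql)->execute(flatten_rows($batch))`, flushing the batch every N lines and once more after the loop for whatever remains.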

Sunday, September 11, 2022
 
mdaniel
 
4

If you're afraid of O(m*n) behaviour (basically, you needn't be; such cases don't occur naturally), here's a KMP implementation I had lying around, which I've modified to take the length of the haystack. There's also a wrapper. If you want to do repeated searches with the same needle, write your own wrapper and reuse the borders array.

No guarantees for bug-freeness, but it seems to still work.

#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen */

int *kmp_borders(char *needle, size_t nlen){
    if (!needle) return NULL;
    int i, j, *borders = malloc((nlen+1)*sizeof(*borders));
    if (!borders) return NULL;
    i = 0;
    j = -1;
    borders[i] = j;
    while((size_t)i < nlen){
        while(j >= 0 && needle[i] != needle[j]){
            j = borders[j];
        }
        ++i;
        ++j;
        borders[i] = j;
    }
    return borders;
}

char *kmp_search(char *haystack, size_t haylen, char *needle, size_t nlen, int *borders){
    size_t max_index = haylen-nlen, i = 0, j = 0;
    while(i <= max_index){
        while(j < nlen && *haystack && needle[j] == *haystack){
            ++j;
            ++haystack;
        }
        if (j == nlen){
            return haystack-nlen;
        }
        if (!(*haystack)){
            return NULL;
        }
        if (j == 0){
            ++haystack;
            ++i;
        } else {
            do{
                i += j - (size_t)borders[j];
                j = borders[j];
            }while(j > 0 && needle[j] != *haystack);
        }
    }
    return NULL;
}

char *sstrnstr(char *haystack, char *needle, size_t haylen){
    if (!haystack || !needle){
        return NULL;
    }
    size_t nlen = strlen(needle);
    if (haylen < nlen){
        return NULL;
    }
    int *borders = kmp_borders(needle, nlen);
    if (!borders){
        return NULL;
    }
    char *match = kmp_search(haystack, haylen, needle, nlen, borders);
    free(borders);
    return match;
}
Tuesday, December 6, 2022
 
malachi
 
1

If you want to read back a Clojure datum previously written to a file as a literal, you need to use read or read-string rather than load-file:

(require '[clojure.java.io :as io])

(with-open [fd (java.io.PushbackReader.
                (io/reader (io/file "/path/to/file")))]
  (read fd))

You can call read multiple times to read successive forms (as long as you hold the Reader open, of course).

This involves no evaluation except when the #= reader macro occurs in the input stream, in which case the form immediately following it is evaluated at read time and replaced with the result in read's output (e.g. (read-string "#=(+ 1 2)") returns 3). To prohibit evaluation of #=-prefixed forms, bind *read-eval* to false.

Friday, November 11, 2022
 
verbeia
 