I have a PHP script that takes a long time (5-30 minutes) to complete. In case it matters, the script uses curl to scrape data from another server, which is why it takes so long: it has to wait for each page to load before processing it and moving on to the next.

I want to be able to start the script and leave it alone until it's done; when it finishes, it sets a flag in a database table.

What I need to know is how to end the HTTP request before the script has finished running. Also, is a PHP script the best way to do this?

 Answers

It can certainly be done with PHP. However, you should NOT do this as a background task; the new process has to be dissociated from the process group in which it was started.

Since people keep giving the same wrong answer to this FAQ, I've written a fuller answer here:

http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html

From the comments:

The short version is shell_exec('echo /usr/bin/php -q longThing.php | at now'), but the reasons why are too long to include here.
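A minimal PHP sketch of that one-liner, assuming a Unix-like host where the at daemon (atd) is running and the web server's user is not blocked by /etc/at.deny. The helper names build_at_command() and launch_detached() and the script path are hypothetical. The web request only queues the job and returns; atd then starts the PHP process itself, so the scraper does not run as a child of the web server.

```php
<?php
// Sketch only. Assumptions: Unix-like host, atd running, web server user
// allowed to submit `at` jobs. Helper names and paths are hypothetical.

/**
 * Build a shell command that queues `$phpBinary -q $script` for immediate
 * execution by `at`, so the job is started by atd rather than by the
 * web server process.
 */
function build_at_command(string $phpBinary, string $script): string
{
    // The job line that `at` will later run through /bin/sh.
    $job = escapeshellarg($phpBinary) . ' -q ' . escapeshellarg($script);

    // Quote the job line a second time so `echo` hands it to `at` unchanged.
    return 'echo ' . escapeshellarg($job) . ' | at now';
}

/**
 * Queue the job and return immediately with whatever `at` printed
 * (it reports the job number on stderr, hence 2>&1), or null on failure.
 */
function launch_detached(string $phpBinary, string $script): ?string
{
    $output = shell_exec(build_at_command($phpBinary, $script) . ' 2>&1');
    return is_string($output) ? trim($output) : null;
}

// Example usage inside the web request (hypothetical paths):
// $status = launch_detached('/usr/bin/php', '/var/www/scripts/longThing.php');
// echo "Scrape queued: ", $status, "\n";
```

Building the command separately from running it makes the quoting easy to inspect: escapeshellarg() is applied once for the job line and once more for echo, so paths containing spaces survive both shells.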

Tuesday, September 27, 2022

Swiftmailer

It's awesome! I've used it in all my projects with great results!

About the mail going to spam, I suggest this article

Wednesday, November 16, 2022

I know it's obvious, but the first place to check is the PHP configuration. Make sure that curl is NOT listed in either:

disable_functions=
disable_classes=

While checking the PHP settings, make sure that you have (based on your path):

extension_dir="C:\PHP\ext"

Verify (again) that C:\PHP\ext\php_curl.dll actually exists and that it wasn't just copied from the XAMPP install.

After that, depending on how you launch Apache (as a service using the System account or as a user), you may need to check the user path in addition to the system path.
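The configuration checks above can also be done from PHP itself. This is a hedged diagnostic sketch, not part of the original answer: curl_diagnostics() is a hypothetical helper, and it should be run through the same SAPI as the site (e.g. as a page served by Apache), because the CLI may load a different php.ini.

```php
<?php
// Diagnostic sketch: report what the running PHP configuration says about curl.
// curl_diagnostics() is a hypothetical helper name.

function curl_diagnostics(): array
{
    // disable_functions is a comma-separated list; normalise it to an array
    // and drop empty entries.
    $disabled = array_filter(array_map('trim', explode(',', (string) ini_get('disable_functions'))));

    // Keep only entries that belong to the curl extension.
    $disabledCurl = array_values(array_filter($disabled, function (string $name): bool {
        return stripos($name, 'curl_') === 0;
    }));

    return [
        'php_ini_loaded_file'     => php_ini_loaded_file(), // false if no php.ini was loaded
        'extension_dir'           => (string) ini_get('extension_dir'),
        'curl_extension_loaded'   => extension_loaded('curl'),
        'curl_init_available'     => function_exists('curl_init'),
        'disabled_curl_functions' => $disabledCurl,
    ];
}

// Print one "key : value" line per check.
foreach (curl_diagnostics() as $key => $value) {
    echo str_pad($key, 24), ': ', var_export($value, true), PHP_EOL;
}
```

If extension_dir or the loaded php.ini differs from what you expect, the web server is reading a different configuration than the one you edited.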

At one point, I found that I had multiple copies of ssleay32.dll and libeay32.dll that were being picked up from the path BEFORE the ones shipped with the PHP binaries, causing a version conflict. I've included an example of the trouble I ran into below.

To see the full path being used (assuming Apache is being run from user space):

C:\> echo %PATH%
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Program Files\curl;C:\Program Files\curl\dlls;C:\Web\php;C:\Python27;

On this machine, these entries are from my user path and not the system path:

C:\Program Files\curl;C:\Program Files\curl\dlls;C:\Web\php;C:\Python27;

To check the location of installed files in the path:

C:\> where ssleay32.dll
C:\Program Files\curl\dlls\ssleay32.dll
C:\Web\php\ssleay32.dll
C:\>

In this example, the DLLs bundled with my builds of curl and PHP were far enough apart in version that they were incompatible with each other. PHP was trying to load curl's DLLs and failing. I don't have it installed, so I can't check right now, but TortoiseSVN may also include them.

It would probably also be worth verifying the permissions on the file:

C:\> cacls c:\Web\php\ssleay32.dll
C:\Web\php\ssleay32.dll BUILTIN\Administrators:(ID)F
                        NT AUTHORITY\SYSTEM:(ID)F
                        BUILTIN\Users:(ID)R
                        NT AUTHORITY\Authenticated Users:(ID)C

As a side note, copying files to %SystemDrive%\Windows\System32 is a bad idea. It's basically just a hack to get whatever files into the system path (in a priority position) without having to explain to the user how to edit the path variables.

Monday, October 31, 2022
 

Here are some points:

  • async void methods are only appropriate for asynchronous event handlers. Your async void ExecuteAsync() returns instantly (as soon as the code flow reaches await _pauseSource inside it). Essentially, your _task is in the completed state after that, while the rest of ExecuteAsync will be executed unobserved (because it's void). It may not even continue executing at all, depending on when your main thread (and thus, the process) terminates.

  • Given that, you should make it async Task ExecuteAsync(), and use Task.Run or Task.Factory.StartNew instead of new Task to start it. Because you want your task's action method to be async, you'd be dealing with nested tasks here, i.e. Task<Task>, which Task.Run automatically unwraps for you; with Task.Factory.StartNew you have to call Unwrap yourself.

  • PauseTokenSource takes the following approach (by design, AFAIU): the consumer side of the code (the one that calls Pause) only requests a pause but doesn't synchronize on it. It will continue executing after Pause, even though the producer side may not have reached the awaiting state yet, i.e. await _pauseSource.Token.WaitWhilePausedAsync(). This may be OK for your app logic, but you should be aware of it.

[UPDATE] Below is the correct syntax for using Factory.StartNew. Note Task<Task> and task.Unwrap(). Also note _task.Wait() in Stop: it's there to make sure the task has completed when Stop returns (similar to Thread.Join). Also, TaskScheduler.Default is used to instruct Factory.StartNew to use the thread pool scheduler. This is important if you create your HighPrecisionTimer object from inside another task which in turn was created on a thread with a non-default synchronization context, e.g. a UI thread.

using System;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    public class HighPrecisionTimer
    {
        Task _task;
        CancellationTokenSource _cancelSource;

        public void Start()
        {
            _cancelSource = new CancellationTokenSource();

            Task<Task> task = Task.Factory.StartNew(
                function: ExecuteAsync, 
                cancellationToken: _cancelSource.Token, 
                creationOptions: TaskCreationOptions.LongRunning, 
                scheduler: TaskScheduler.Default);

            _task = task.Unwrap();
        }

        public void Stop()
        {
            _cancelSource.Cancel(); // request the cancellation

            _task.Wait(); // wait for the task to complete
        }

        async Task ExecuteAsync()
        {
            Console.WriteLine("Enter ExecuteAsync");
            while (!_cancelSource.IsCancellationRequested)
            {
                await Task.Delay(42); // for testing

                // DO CUSTOM TIMER STUFF...
            }
            Console.WriteLine("Exit ExecuteAsync");
        }
    }

    class Program
    {
        public static void Main()
        {
            var highPrecisionTimer = new HighPrecisionTimer();

            Console.WriteLine("Start timer");
            highPrecisionTimer.Start();

            Thread.Sleep(2000);

            Console.WriteLine("Stop timer");
            highPrecisionTimer.Stop();

            Console.WriteLine("Press Enter to exit...");
            Console.ReadLine();
        }
    }
}
Tuesday, September 6, 2022
 
desau
 

Try this to see if SELinux will let the web server connect to the network:

getsebool httpd_can_network_connect

If it's off, allow it with:

setsebool -P httpd_can_network_connect on

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/selinux_users_and_administrators_guide/sect-managing_confined_services-the_apache_http_server-booleans

Monday, December 5, 2022
 
vatti
 