
In my PHP application I need to read multiple lines starting from the end of many files (mostly logs). Sometimes I need only the last one, sometimes I need tens or hundreds. Basically, I want something as flexible as the Unix tail command.

There are questions here about how to get the single last line from a file (but I need N lines), and different solutions were given. I'm not sure which one is best and which performs best.

 Answers

2

Methods overview

Searching on the internet, I came across different solutions. I can group them in three approaches:

  • naive ones that use the PHP file() function;
  • cheating ones that run the tail command on the system;
  • mighty ones that happily jump around an opened file using fseek().

I ended up choosing (or writing) five solutions: a naive one, a cheating one and three mighty ones.

  1. The most concise naive solution, using built-in array functions.
  2. The only possible solution based on the tail command, which has one big problem: it does not run if tail is not available, e.g. on non-Unix systems (Windows) or in restricted environments that don't allow system functions.
  3. The solution in which single bytes are read from the end of file searching for (and counting) new-line characters, found here.
  4. The multi-byte buffered solution optimized for large files, found here.
  5. A slightly modified version of solution #4 in which buffer length is dynamic, decided according to the number of lines to retrieve.

All solutions work, in the sense that they return the expected result from any file and for any number of lines we ask for (except for solution #1, which can hit the PHP memory limit on large files and return nothing). But which one is better?
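For reference, the naive approach (solution #1) can be sketched roughly as follows. This is a minimal illustration rather than the exact code that was benchmarked; the function name tailNaive is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Naive approach: load every line into an array, then keep the last $lines.
// tailNaive is a hypothetical name used only for this sketch.
function tailNaive(string $path, int $lines = 1): string
{
    if ($lines < 1) {
        return '';
    }
    // file() reads the whole file into memory, one array entry per line.
    // This is what makes the approach risky for large files.
    $all = file($path, FILE_IGNORE_NEW_LINES);
    if ($all === false) {
        return '';
    }
    // A negative offset makes array_slice() count from the end of the array.
    return implode("\n", array_slice($all, -$lines));
}
```

Because the whole file ends up in an array, memory use grows with the file size, not with the number of lines requested.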

Performance tests

To answer the question I ran tests. That's how these things are done, isn't it?

I prepared a sample 100 KB file by joining together different files found in my /var/log directory. Then I wrote a PHP script that uses each of the five solutions to retrieve 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000 lines from the end of the file. Each single test is repeated ten times (that's 5 × 28 × 10 = 1400 tests), measuring the average elapsed time in microseconds.
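A timing loop of this kind might look like the sketch below. It is not the actual test script; averageMicroseconds and $lineCounts are hypothetical names, and microtime(true) is assumed to be precise enough for the comparison.

```php
<?php
// Average wall-clock time of $fn over $repeat runs, in microseconds.
// averageMicroseconds is a hypothetical helper, not part of the original script.
function averageMicroseconds(callable $fn, int $repeat = 10): float
{
    $total = 0.0;
    for ($i = 0; $i < $repeat; $i++) {
        $start = microtime(true);          // seconds as a float
        $fn();
        $total += microtime(true) - $start;
    }
    return $total / $repeat * 1e6;         // seconds -> microseconds
}

// The line counts described above: 1..10, 20..100, 200..1000.
$lineCounts = array_merge(
    range(1, 10),
    range(20, 100, 10),
    range(200, 1000, 100)
);
```

Each of the five solutions would then be passed to the helper once per entry in $lineCounts, wrapped in a closure.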

I ran the script on my local development machine (Xubuntu 12.04, PHP 5.3.10, 2.70 GHz dual core CPU, 2 GB RAM) using the PHP command line interpreter. Here are the results:

Solutions #1 and #2 seem to be the worst ones. Solution #3 is good only when we need to read a few lines. Solutions #4 and #5 seem to be the best ones. Note how the dynamic buffer size can optimize the algorithm: execution time is a little smaller when reading few lines, because of the reduced buffer.

Let's try with a bigger file. What if we have to read a 10 MB log file?

Now solution #1 is by far the worst one: in fact, loading the whole 10 MB file into memory is not a great idea. I also ran the tests on 1 MB and 100 MB files, and the situation is practically the same.

And for tiny log files? That's the graph for a 10 KB file:

Solution #1 is the best one now! Loading 10 KB into memory isn't a big deal for PHP. Solutions #4 and #5 also perform well. However, this is an edge case: a 10 KB log means something like 150 to 200 lines...

You can download all my test files, sources and results here.

Final thoughts

Solution #5 is strongly recommended for the general use case: it works great with every file size and performs particularly well when reading a few lines.
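The idea behind solution #5 (read fixed-size chunks backwards from the end of the file, with a chunk size that depends on how many lines are wanted) can be sketched as follows. This is an illustration written for this answer's description, not the benchmarked code: the name tailBuffered and the thresholds 64, 512 and 4096 are assumptions, "\n" line endings are assumed, and the type declarations need PHP 7 or later.

```php
<?php
// Buffered backward read with a buffer size chosen from the requested line count.
// tailBuffered and the buffer thresholds are assumptions made for this sketch.
function tailBuffered(string $path, int $lines = 1): string
{
    if ($lines < 1) {
        return '';
    }
    $f = fopen($path, 'rb');
    if ($f === false) {
        return '';
    }

    // Small buffer for few lines, larger buffer for many lines.
    $buffer = $lines < 2 ? 64 : ($lines < 10 ? 512 : 4096);

    fseek($f, 0, SEEK_END);
    $pos = ftell($f);
    $output = '';
    $newlines = 0;

    // Walk backwards until more than $lines newlines have been seen
    // (one extra covers a trailing newline) or the file start is reached.
    while ($pos > 0 && $newlines <= $lines) {
        $read = min($pos, $buffer);
        $pos -= $read;
        fseek($f, $pos);
        $chunk = fread($f, $read);
        $output = $chunk . $output;
        $newlines += substr_count($chunk, "\n");
    }
    fclose($f);

    // Drop a single trailing newline so it does not count as an empty last line.
    if (substr($output, -1) === "\n") {
        $output = substr($output, 0, -1);
    }
    // The first piece may be a partial line; the last $lines pieces are complete.
    return implode("\n", array_slice(explode("\n", $output), -$lines));
}
```

Only the tail of the file is ever read, so memory use depends on the number of lines requested and their length, not on the total file size.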

Avoid solution #1 if you need to read files bigger than 10 KB.

Solutions #2 and #3 were not the best ones in any test I ran: #2 never runs in less than 2 ms, and #3 is heavily influenced by the number of lines you ask for (it works quite well only with 1 or 2 lines).

Thursday, October 20, 2022
2

Where are loggers used?

In general, there are two major use cases for loggers within your code:

  • Invasive logging:

    For the most part people use this approach because it is the easiest to understand.

    In reality you should only use invasive logging if logging is part of the domain logic itself. For example - in classes that deal with payments or management of sensitive information.

  • Non-invasive logging:

    With this method, instead of altering the class that you wish to log, you wrap an existing instance in a container that lets you track every exchange between the instance and the rest of the application.

    You also gain the ability to enable such logging temporarily, while debugging some specific problem outside of the development environment or when you are conducting some research of user behaviour. Since the class of the logged instance is never altered, the risk of disrupting the project's behaviour is a lot lower when compared to invasive logging.

Implementing an invasive logger

To do this you have two main approaches available. You can either inject an instance that implements the Logger interface, or provide the class with a factory that in turn will initialize the logging system only when necessary.

Note:
Since it seems that direct injection is not some hidden mystery for you, I will leave that part out... only I would urge you to avoid using constants outside of a file where they have been defined.

Now, the implementation with a factory and lazy loading.

You start by defining the API that you will use (in a perfect world you start with unit tests).

class Foobar 
{
    private $loggerFactory;

    public function __construct(Creator $loggerFactory, ....)
    {
        $this->loggerFactory = $loggerFactory;
        ....
    }
    .... 

    public function someLoggedMethod()
    {
        $logger = $this->loggerFactory->provide('simple');
        $logger->log( ... logged data .. );
        ....
    }
    ....
}

This factory will have two additional benefits:

  • it can ensure that only one instance is created without a need for global state
  • it provides a seam for use when writing unit tests

Note:
Actually, when written this way the class Foobar only depends on an instance that implements the Creator interface. Usually you will inject either a builder (if you need one type of instance, probably with some settings) or a factory (if you want to create different instances with the same interface).
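The snippets in this answer refer to Creator and Logger interfaces and to a NullLogger class without showing them. A minimal guess at what they could look like is given below; the method signatures are assumptions inferred from how the objects are called here (provide($name) and log(...)), not definitions taken from any library.

```php
<?php
// Hypothetical interfaces, inferred from the calls made in this answer.
interface Logger
{
    public function log($message);
}

interface Creator
{
    // Returns the instance registered under $name, creating it on first use.
    public function provide($name);
}

// A logger that discards everything, handy as a test double.
class NullLogger implements Logger
{
    public function log($message)
    {
        // intentionally empty
    }
}
```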

The next step is implementing the factory:

class LazyLoggerFactory implements Creator
{

    private $loggers = [];
    private $providers = [];

    public function addProvider($name, callable $provider)
    {
        $this->providers[$name] = $provider;
        return $this;
    }

    public function provide($name)
    {
        if (array_key_exists($name, $this->loggers) === false)
        {
            $this->loggers[$name] = call_user_func($this->providers[$name]);
        }
        return $this->loggers[$name];
    }

}

When you call $factory->provide('thing');, the factory checks whether the instance has already been created. If it has not, the factory creates a new instance and keeps it for later calls.

Note: I am actually not entirely sure that this can be called "factory" since the instantiation is really encapsulated in the anonymous functions.

And the last step is actually wiring it all up with providers:

$config = include '/path/to/config/loggers.php';

$loggerFactory = new LazyLoggerFactory;
$loggerFactory->addProvider('simple', function() use ($config){
    $instance = new SimpleFileLogger($config['log_file']);
    return $instance;
});

/* 
$loggerFactory->addProvider('fake', function(){
    $instance = new NullLogger;
    return $instance;
});
*/

$test = new Foobar( $loggerFactory );

Of course to fully understand this approach you will have to know how closures work in PHP, but you will have to learn them anyway.
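As a small illustration of the closure behaviour the providers rely on: a closure captures variables listed in use by value at the moment it is defined, and its body only runs when it is called. The variable names and the path below are made up for the example.

```php
<?php
$config = ['log_file' => '/tmp/app.log'];   // hypothetical config value
$created = 0;

// $config is copied into the closure now; $created is captured by reference
// so that the counter outside the closure can be updated.
$provider = function () use ($config, &$created) {
    $created++;
    return 'logger for ' . $config['log_file'];
};

// Changing the outer variable does not affect the copy inside the closure.
$config['log_file'] = '/tmp/other.log';

// Nothing has been "created" until the closure is actually invoked.
$first = $provider();
```

This is why the factory can register providers up front without paying for loggers that are never requested.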

Implementing non-invasive logging

The core idea of this approach is that instead of injecting the logger, you put an existing instance in a container which acts as a membrane between said instance and the application. This membrane can then perform different tasks, one of which is logging.

class LogBrane
{
    protected $target = null;
    protected $logger = null;

    public function __construct( $target, Logger $logger )
    {
        $this->target = $target;
        $this->logger = $logger;
    }

    public function __call( $method, $arguments )
    {
        if ( method_exists( $this->target, $method ) === false )
        {
            // sometime you will want to log call of nonexistent method
        }

        try
        {
            $response = call_user_func_array( [$this->target, $method], 
                                              $arguments );

            // write log, if you want
            $this->logger->log(....);

            // hand the result back, so the wrapper behaves like the target
            return $response;
        }
        catch (Exception $e)
        {
            // write log about exception 
            $this->logger->log(....);

            // and re-throw to not disrupt the behavior
            throw $e;
        }
    }
}

This class can also be used together with the above described lazy factory.

To use this structure, you simply do the following:

$instance = new Foobar;

$instance = new LogBrane( $instance, $logger );
$instance->someMethod();

At this point the container which wraps the instance becomes a fully functional replacement of the original. The rest of your application can handle it as if it is a simple object (pass around, call methods upon). And the wrapped instance itself is not aware that it is being logged.

And if at some point you decide to remove the logging then it can be done without rewriting the rest of your application.
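A reduced, self-contained version of such a wrapper is sketched below to show the two behaviours that matter: the target's return value is passed back to the caller, and exceptions are logged and re-thrown. All class names (ArrayLogger, Calculator, LoggingProxy) are hypothetical stand-ins invented for the demonstration.

```php
<?php
// Hypothetical stand-ins, used only for this demonstration.
class ArrayLogger
{
    public $entries = [];

    public function log($message)
    {
        $this->entries[] = $message;
    }
}

class Calculator
{
    public function add($a, $b)
    {
        return $a + $b;
    }

    public function fail()
    {
        throw new RuntimeException('boom');
    }
}

// Forwards every call to the wrapped object and records what happened.
class LoggingProxy
{
    private $target;
    private $logger;

    public function __construct($target, ArrayLogger $logger)
    {
        $this->target = $target;
        $this->logger = $logger;
    }

    public function __call($method, $arguments)
    {
        try {
            $response = call_user_func_array([$this->target, $method], $arguments);
            $this->logger->log("called $method");
            return $response;               // keep the caller's view unchanged
        } catch (Exception $e) {
            $this->logger->log("$method threw: " . $e->getMessage());
            throw $e;                       // do not swallow the failure
        }
    }
}

$logger = new ArrayLogger;
$calc = new LoggingProxy(new Calculator, $logger);
$sum = $calc->add(2, 3);
```

One caveat: a wrapper like this is not an instance of the wrapped class, so code that type-hints the original class will reject it unless both share an interface.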

Wednesday, August 3, 2022
5

Your global variables are already accessible in $GLOBALS['foo'], $GLOBALS['bar'] etc. Inside function scope, this is a clearer indication that they come from the global scope than using the global keyword. It should not affect performance in any meaningful way.

Many will tell you that best practice is to avoid global variables in the first place and instead pass variables through function calls and object constructors.
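The three options mentioned here look like this side by side. The function names are hypothetical and the snippet is assumed to run at the top level of a script, so that $foo really lives in the global scope.

```php
<?php
$foo = 10;   // defined in the global scope

// Explicit: the array access makes the global origin visible at the point of use.
function readViaGlobals()
{
    return $GLOBALS['foo'];
}

// Implicit: after the declaration, $foo looks like any local variable.
function readViaKeyword()
{
    global $foo;
    return $foo;
}

// No global state at all: the caller decides which value the function sees.
function readViaParameter($foo)
{
    return $foo;
}
```

The last form is the one that is easiest to test, because the function's input is visible in its signature.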

Monday, December 5, 2022
 
3

With Eclipse go to Preferences -> PHP -> Editor -> Templates -> New and use something like this:

/**
 * Method to get the ${PropertyName}
 *
 * @return  ${ReturnType}  return the ${PropertyName}.
 *
 * @since   __DEPLOY_VERSION__
*/
public function get${MethodName}() {
  return $$this->${PropertyName};
}

/**
 * Method to set the ${PropertyName}
 *
 * @return  boolean  True on success and false on failed case
 *
 * @since   __DEPLOY_VERSION__
*/
public function set${MethodName}($$value) {
  $$this->${PropertyName} = $$value;
}

To use the template, type its name and press Ctrl+Space. A context menu should also appear automatically when you type the name.

Saturday, September 17, 2022
3

PHP has a built-in function that does exactly what you want: strip_tags

$text = '<b>Hello</b> World';
print strip_tags($text); // outputs Hello World

If you expect broken HTML, you are going to need to load it into a DOM parser and then extract the text.
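A rough sketch of the DOM route, using PHP's bundled DOM extension, is shown below. The function name htmlToText is hypothetical, and the sketch assumes the extension is installed; character-encoding handling is left out, so non-ASCII input may need extra care. Note that the text of script and style elements is also included in textContent.

```php
<?php
// Extract the text of possibly malformed HTML via DOMDocument.
// htmlToText is a hypothetical helper name used for this sketch.
function htmlToText(string $html): string
{
    if ($html === '') {
        return '';
    }
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);      // the HTML parser recovers from unclosed tags

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    return $root === null ? '' : trim($root->textContent);
}
```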

Monday, October 10, 2022