Are there any free OCR libraries that work with PHP or Python on a Linux server? The idea is to be able to upload an image and pull out characters from it, or allow users to "draw characters", and parse them out of said image.

 Answers

3

Since you're on a Linux box, I would highly recommend Google's open source project OCRopus.

It's not PHP, but I think it will be your best option. Of course, you can call it from within PHP via exec. It's mature and has a lot of options. From the project site:

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.

There is also another open source project, Tesseract. I've used it in the past as well and have been pleased with the results. It supports training, restricting the recognized alphabet, and so on.
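
Since the question also allows Python, here is a minimal sketch of calling the Tesseract command-line tool from a script, in the same spirit as calling it from PHP via exec. It assumes the `tesseract` binary is installed and on the PATH. The function names `build_tesseract_command` and `ocr_image` and the file name `upload.png` are made up for illustration. Whether the character whitelist is honored can depend on the Tesseract version and engine mode, so treat that part as something to verify on your own install.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", whitelist=None):
    """Build the argument list for a Tesseract CLI call that prints text to stdout."""
    # Passing "stdout" as the output base makes Tesseract write to standard output.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if whitelist:
        # Restrict recognition to the given characters (limiting the alphabet).
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def ocr_image(image_path, lang="eng", whitelist=None):
    """Run Tesseract on an image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract binary was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, whitelist),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout.strip()


# Example call ("upload.png" is a placeholder for the uploaded image):
# print(ocr_image("upload.png", whitelist="0123456789"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the file name comes from a user upload.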

Sunday, August 21, 2022
5

You can use the PhpSpreadsheet library to read an existing Excel file, add new rows or columns to it, and then write it back as a real Excel file.

Disclaimer: I am one of the authors of this library.

Thursday, August 11, 2022
5

Try this, found here:

//This input should be from somewhere else, hard-coded in this example
$file_name = '2013-07-16.dump.gz';

// Raising this value may increase performance
$buffer_size = 4096; // read 4 KB at a time
// Strip only the trailing ".gz" (str_replace would remove every occurrence)
$out_file_name = preg_replace('/\.gz$/', '', $file_name);

// Open our files (in binary mode)
$file = gzopen($file_name, 'rb');
$out_file = fopen($out_file_name, 'wb');
if ($file === false || $out_file === false) {
    die('Could not open the input or output file');
}

// Keep repeating until the end of the input file
while (!gzeof($file)) {
    // Read buffer-size bytes
    // Both fwrite and gzread are binary-safe
    fwrite($out_file, gzread($file, $buffer_size));
}

// Files are done, close files
fclose($out_file);
gzclose($file);
Wednesday, August 17, 2022
3

I came across this link. It will do what you want (I've tested it and posted the results). Just pass the class the path of the TTF file you want to parse the data out of, then use $fontinfo[1].' '.$fontinfo[2] for the name.

In case you don't want to register, here is the class

Resulting Data

Array
(
    [1] => Almonte Snow
    [2] => Regular
    [3] => RayLarabie: Almonte Snow: 2000
    [4] => Almonte Snow
    [5] => Version 2.000 2004
    [6] => AlmonteSnow
    [8] => Ray Larabie
    [9] => Ray Larabie
    [10] => Larabie Fonts is able to offer unique free fonts through the generous support of visitors to the site. Making fonts is my full-time job and every donation, in any amount, enables me to continue running the site and creating new fonts. If you would like to support Larabie Fonts visit www.larabiefonts.com for details.
    [11] => http://www.larabiefonts.com
    [12] => http://www.typodermic.com
)

Usage

<?php 
    include 'ttfInfo.class.php'; 
    $fontinfo = getFontInfo('c:\windows\fonts\_LDS_almosnow.ttf'); 
    echo '<pre>'; 
    print_r($fontinfo); 
    echo '</pre>'; 
?>

ttfInfo.class.php

<?php 
/** 
 * ttfInfo class 
 * Retrieve data stored in a TTF file's 'name' table 
 * 
 * @original author Unknown 
 * found at http://www.phpclasses.org/browse/package/2144.html 
 * 
 * @ported for use on http://www.nufont.com 
 * @author Jason Arencibia 
 * @version 0.2 
 * @copyright (c) 2006 GrayTap Media 
 * @website http://www.graytap.com 
 * @license GPL 2.0 
 * @access public 
 * 
 * @todo: Make it Retrieve additional information from other tables 
 *  
 */ 
class ttfInfo { 
    /** 
    * variable $_dirRestriction 
    * Restrict the resource pointer to this directory and above. 
    * Change to 0 to allow the class to look outside of its current directory 
    * @protected 
    * @var int 
    */ 
    protected $_dirRestriction = 1; 
    /** 
    * variable $_recursive 
    * Controls whether readFontsDir() descends into subdirectories. 
    * Change to 1 for nested directories 
    * @protected 
    * @var int 
    */ 
    protected $_recursive = 0; 

    /** 
    * variable $fontsdir 
    * This is to declare this variable as protected 
    * don't edit this!!! 
    * @protected 
    */ 
    protected $fontsdir; 
    /** 
    * variable $filename 
    * This is to declare this variable as protected 
    * don't edit this!!! 
    * @protected 
    */ 
    protected $filename; 

    /** 
    * function setFontFile() 
    * set the filename 
    * @public 
    * @param string $data the new value 
    * @return object reference to this 
    */ 
    public function setFontFile($data) 
    { 
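        // Reject paths containing "./" or "../" segments (the original pattern matched any "/") 
        if ($this->_dirRestriction && preg_match('#(^|[\\\\/])\.\.?[\\\\/]#', $data)) 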
        { 
            $this->exitClass('Error: Directory restriction is enforced!'); 
        } 

        $this->filename = $data; 
        return $this; 
    } // public function setFontFile 

    /** 
    * function setFontsDir() 
    * set the Font Directory 
    * @public 
    * @param string $data the new value 
    * @return object reference to this 
    */ 
    public function setFontsDir($data) 
    { 
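        // Reject paths containing "./" or "../" segments (the original pattern matched any "/") 
        if ($this->_dirRestriction && preg_match('#(^|[\\\\/])\.\.?[\\\\/]#', $data)) 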
        { 
            $this->exitClass('Error: Directory restriction is enforced!'); 
        } 

        $this->fontsdir = $data; 
        return $this; 
    } // public function setFontsDir 

    /** 
    * function readFontsDir()  
    * @public 
    * @return array information contained in the TTF 'name' table of all fonts in a directory, keyed by file path. 
    */ 
    public function readFontsDir() 
    { 
        if (empty($this->fontsdir)) { $this->exitClass('Error: Fonts Directory has not been set with setFontsDir().'); } 

        // Make sure the directory ends with exactly one separator 
        $dir = rtrim($this->fontsdir, '/\\') . DIRECTORY_SEPARATOR; 
        $result = array(); 
        $d = dir($dir); 

        while (false !== ($e = $d->read())) 
        { 
            if ($e != '.' && $e != '..') 
            { 
                $e = $dir . $e; 
                if ($this->_recursive && is_dir($e)) 
                { 
                    // Descend into the subdirectory, then restore the parent directory 
                    $this->fontsdir = $e; 
                    $result = array_merge($result, $this->readFontsDir()); 
                    $this->fontsdir = $dir; 
                } 
                else if ($this->is_ttf($e) === true) 
                { 
                    $this->filename = $e; 
                    $result[$e] = $this->getFontInfo(); 
                } 
            } 
        } 

        $d->close(); 
        return $result; 
    } // public function readFontsDir 

    /** 
    * function setProtectedVar() 
    * @public 
    * @param string $var the new variable 
    * @param string $data the new value 
    * @return object reference to this 

    * DISABLED, NO REAL USE YET 

    public function setProtectedVar($var, $data) 
    { 
        if ($var == 'filename') 
        { 
            $this->setFontFile($data); 
        } else { 
            //if (isset($var) && !empty($data)) 
            $this->$var = $data; 
        } 
        return $this; 
    } 
    */ 
    /** 
    * function getFontInfo()  
    * @public 
    * @return information contained in the TTF 'name' table. 
    */ 
    public function getFontInfo() 
    { 
        $fd = fopen ($this->filename, "r"); 
        $this->text = fread ($fd, filesize($this->filename)); 
        fclose ($fd); 

        $number_of_tables = hexdec($this->dec2ord($this->text[4]).$this->dec2ord($this->text[5])); 

        for ($i=0;$i<$number_of_tables;$i++) 
        { 
            $tag = $this->text[12+$i*16].$this->text[12+$i*16+1].$this->text[12+$i*16+2].$this->text[12+$i*16+3];

            if ($tag == 'name') 
            { 
                $this->ntOffset = hexdec( 
                    $this->dec2ord($this->text[12+$i*16+8]).$this->dec2ord($this->text[12+$i*16+8+1]). 
                    $this->dec2ord($this->text[12+$i*16+8+2]).$this->dec2ord($this->text[12+$i*16+8+3])); 

                $offset_storage_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+4]).$this->dec2ord($this->text[$this->ntOffset+5])); 
                $number_name_records_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+2]).$this->dec2ord($this->text[$this->ntOffset+3])); 
            } 
        } 

        $storage_dec = $offset_storage_dec + $this->ntOffset; 
        $storage_hex = strtoupper(dechex($storage_dec)); 

        $font_tags = array(); 

        for ($j=0;$j<$number_name_records_dec;$j++) 
        { 
            $platform_id_dec    = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+0]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+1])); 
            $name_id_dec        = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+6]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+7])); 
            $string_length_dec    = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+8]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+9])); 
            $string_offset_dec    = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+10]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+11])); 

            // Keep the first non-empty string per name ID. Note that !empty() also 
            // skips name ID 0 (the copyright notice), so index 0 is never filled. 
            if (!empty($name_id_dec) and empty($font_tags[$name_id_dec])) 
            { 
                $font_tags[$name_id_dec] = ''; 
                for($l=0;$l<$string_length_dec;$l++) 
                { 
                    // Drop NUL bytes so that UTF-16 encoded strings collapse to plain ASCII 
                    if (ord($this->text[$storage_dec+$string_offset_dec+$l]) == 0) { continue; } 
                    else { $font_tags[$name_id_dec] .= ($this->text[$storage_dec+$string_offset_dec+$l]); } 
                } 
            } 
        } 
        return $font_tags; 
    } // public function getFontInfo 

    /** 
    * function getCopyright()  
    * @public 
    * @return 'Copyright notice' contained in the TTF 'name' table at index 0 
    */ 
    public function getCopyright() 
    { 
        $this->info = $this->getFontInfo(); 
        return $this->info[0]; 
    } // public function getCopyright 

    /** 
    * function getFontFamily()  
    * @public 
    * @return 'Font Family name' contained in the TTF 'name' table at index 1 
    */ 
    public function getFontFamily() 
    { 
        $this->info = $this->getFontInfo(); 
        return $this->info[1]; 
    } // public function getFontFamily 

    /** 
    * function getFontSubFamily()  
    * @public 
    * @return 'Font Subfamily name' contained in the TTF 'name' table at index 2 
    */ 
    public function getFontSubFamily() 
    { 
        $this->info = $this->getFontInfo(); 
        return $this->info[2]; 
    } // public function getFontSubFamily 

    /** 
    * function getFontId()  
    * @public 
    * @return 'Unique font identifier' contained in the TTF 'name' table at index 3 
    */ 
    public function getFontId() 
    { 
        $this->info = $this->getFontInfo(); 
        return $this->info[3]; 
    } // public function getFontId 

    /** 
    * function getFullFontName()  
    * @public 
    * @return 'Full font name' contained in the TTF 'name' table at index 4 
    */ 
    public function getFullFontName() 
    { 
        $this->info = $this->getFontInfo(); 
        return $this->info[4]; 
    } // public function getFullFontName 

    /** 
    * function dec2ord() 
    * Used to lessen redundant calls to multiple functions. 
    * @protected 
    * @return object 
    */ 
    protected function dec2ord($dec) 
    { 
        return $this->dec2hex(ord($dec)); 
    } // protected function dec2ord 

    /** 
    * function dec2hex() 
    * protected function to convert decimal to hexadecimal, zero-padded to two digits. 
    * @protected 
    * @return string 
    */ 
    protected function dec2hex($dec) 
    { 
        return str_repeat('0', 2-strlen(($hex=strtoupper(dechex($dec))))) . $hex; 
    } // protected function dec2hex 

    /** 
    * function exitClass() 
    * protected helper that prints an error message and stops the script. 
    * @protected 
    * @return void 
    */ 
    protected function exitClass($message) 
    { 
        echo $message; 
        exit; 
    } // protected function exitClass 

    /** 
    * function is_ttf() 
    * protected helper function to test if the file in question is a TTF. 
    * @protected 
    * @return bool 
    */ 
    protected function is_ttf($file) 
    { 
        $ext = explode('.', $file); 
        $ext = $ext[count($ext)-1]; 
        return preg_match("/ttf$/i",$ext) ? true : false; 
    } // protected function is_ttf 
} // class ttfInfo 

function getFontInfo($resource) 
{ 
    $ttfInfo = new ttfInfo; 
    $ttfInfo->setFontFile($resource); 
    return $ttfInfo->getFontInfo(); 
} 
?>

Update 2021

Here is an updated version of the class with some fixes: https://github.com/HusamAamer/TTFInfo.git
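
To make it clearer what the class above is doing, here is a rough Python sketch of the same walk through the file. It reads the table directory, finds the 'name' table, and then decodes each name record. This is an illustration, not a port of the PHP class. The function name `read_name_table` is made up, and decoding every non-Unicode, non-Windows platform as Latin-1 is a simplifying assumption. Unlike the PHP code, it keeps name ID 0 (the copyright notice).

```python
import struct


def read_name_table(data):
    """Return {nameID: text} from the 'name' table of raw TrueType font bytes."""
    # Offset table: 4-byte version, then numTables as a big-endian uint16.
    num_tables = struct.unpack_from(">H", data, 4)[0]

    name_offset = None
    for i in range(num_tables):
        # Each 16-byte table record holds: tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(">4sLLL", data, 12 + 16 * i)
        if tag == b"name":
            name_offset = offset
            break
    if name_offset is None:
        raise ValueError("no 'name' table found")

    # Name table header: format, record count, offset to the string storage.
    _format, count, string_offset = struct.unpack_from(">HHH", data, name_offset)
    storage = name_offset + string_offset

    names = {}
    for j in range(count):
        # Each 12-byte name record: platform, encoding, language, nameID, length, offset.
        platform_id, _enc, _lang, name_id, length, offset = struct.unpack_from(
            ">6H", data, name_offset + 6 + 12 * j
        )
        if name_id in names:
            continue  # keep the first record seen for each name ID
        raw = data[storage + offset : storage + offset + length]
        # Platforms 0 (Unicode) and 3 (Windows) store UTF-16BE strings.
        # Treating everything else as Latin-1 is a simplification.
        encoding = "utf-16-be" if platform_id in (0, 3) else "latin-1"
        names[name_id] = raw.decode(encoding, errors="replace")
    return names
```

With a real font you would read the file in binary mode and pass the bytes in. Name IDs 1 and 2 then give the family and subfamily, matching $fontinfo[1] and $fontinfo[2] above.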

Friday, August 26, 2022
 
27

There are several online utilities that can be used to identify fonts, including:

  • WhatTheFont!, which can automatically match a font in an image you submit to the closest matches in the database;
  • Identifont and Fonts.com, where you specify the appearance of the characters in the font to identify the font.

These utilities cannot be used to determine the formatting of the text in an image. However, you can use OCR programs such as Tesseract (open source) and Smart OCR (commercial, starting from US$99.90) to detect formatting such as paragraph alignment and line spacing as well as font styles such as bold or italic (see this question). Note that some OCR programs can attempt to identify the font(s) in an image as well.

Saturday, September 10, 2022