Are there any free OCR libraries that work with PHP or Python on a Linux server? The idea is to be able to upload an image and pull out characters from it, or allow users to "draw characters", and parse them out of said image.
Answers
You can use the PhpSpreadsheet library, to read an existing Excel file, add new rows/columns to it, then write it back as a real Excel file.
Disclaimer: I am one of the authors of this library.
Try this found here
//This input should be from somewhere else, hard-coded in this example
$file_name = '2013-07-16.dump.gz';
// Raising this value may increase performance
$buffer_size = 4096; // read 4kb at a time
$out_file_name = str_replace('.gz', '', $file_name);
// Open our files (in binary mode)
$file = gzopen($file_name, 'rb');
$out_file = fopen($out_file_name, 'wb');
// Keep repeating until the end of the input file
while (!gzeof($file)) {
// Read buffer-size bytes
// Both fwrite and gzread and binary-safe
fwrite($out_file, gzread($file, $buffer_size));
}
// Files are done, close files
fclose($out_file);
gzclose($file);
I came across this link. It will do what you want (I've tested it and posted results). Just pass the class the path of the TTF file you want to parse the data out of. then use $fontinfo[1].' '.$fontinfo[2]
for the name.
In case you don't want to register, here is the class
Resulting Data
Array
(
[1] => Almonte Snow
[2] => Regular
[3] => RayLarabie: Almonte Snow: 2000
[4] => Almonte Snow
[5] => Version 2.000 2004
[6] => AlmonteSnow
[8] => Ray Larabie
[9] => Ray Larabie
[10] => Larabie Fonts is able to offer unique free fonts through the generous support of visitors to the site. Making fonts is my full-time job and every donation, in any amount, enables me to continue running the site and creating new fonts. If you would like to support Larabie Fonts visit www.larabiefonts.com for details.
[11] => http://www.larabiefonts.com
[12] => http://www.typodermic.com
)
Usage
<?php
include 'ttfInfo.class.php';
$fontinfo = getFontInfo('c:windowsfonts_LDS_almosnow.ttf');
echo '<pre>';
print_r($fontinfo);
echo '</pre>';
?>
ttfInfo.class.php
<?php
/**
* ttfInfo class
* Retrieve data stored in a TTF files 'name' table
*
* @original author Unknown
* found at http://www.phpclasses.org/browse/package/2144.html
*
* @ported for used on http://www.nufont.com
* @author Jason Arencibia
* @version 0.2
* @copyright (c) 2006 GrayTap Media
* @website http://www.graytap.com
* @license GPL 2.0
* @access public
*
* @todo: Make it Retrieve additional information from other tables
*
*/
class ttfInfo {
/**
* variable $_dirRestriction
* Restrict the resource pointer to this directory and above.
* Change to 1 for to allow the class to look outside of it current directory
* @protected
* @var int
*/
protected $_dirRestriction = 1;
/**
* variable $_dirRestriction
* Restrict the resource pointer to this directory and above.
* Change to 1 for nested directories
* @protected
* @var int
*/
protected $_recursive = 0;
/**
* variable $fontsdir
* This is to declare this variable as protected
* don't edit this!!!
* @protected
*/
protected $fontsdir;
/**
* variable $filename
* This is to declare this varable as protected
* don't edit this!!!
* @protected
*/
protected $filename;
/**
* function setFontFile()
* set the filename
* @public
* @param string $data the new value
* @return object reference to this
*/
public function setFontFile($data)
{
if ($this->_dirRestriction && preg_match('[./|../]', $data))
{
$this->exitClass('Error: Directory restriction is enforced!');
}
$this->filename = $data;
return $this;
} // public function setFontFile
/**
* function setFontsDir()
* set the Font Directory
* @public
* @param string $data the new value
* @return object referrence to this
*/
public function setFontsDir($data)
{
if ($this->_dirRestriction && preg_match('[./|../]', $data))
{
$this->exitClass('Error: Directory restriction is enforced!');
}
$this->fontsdir = $data;
return $this;
} // public function setFontsDir
/**
* function readFontsDir()
* @public
* @return information contained in the TTF 'name' table of all fonts in a directory.
*/
public function readFontsDir()
{
if (empty($this->fontsdir)) { $this->exitClass('Error: Fonts Directory has not been set with setFontsDir().'); }
if (empty($this->backupDir)){ $this->backupDir = $this->fontsdir; }
$this->array = array();
$d = dir($this->fontsdir);
while (false !== ($e = $d->read()))
{
if($e != '.' && $e != '..')
{
$e = $this->fontsdir . $e;
if($this->_recursive && is_dir($e))
{
$this->setFontsDir($e);
$this->array = array_merge($this->array, readFontsDir());
}
else if ($this->is_ttf($e) === true)
{
$this->setFontFile($e);
$this->array[$e] = $this->getFontInfo();
}
}
}
if (!empty($this->backupDir)){ $this->fontsdir = $this->backupDir; }
$d->close();
return $this;
} // public function readFontsDir
/**
* function setProtectedVar()
* @public
* @param string $var the new variable
* @param string $data the new value
* @return object reference to this
* DISABLED, NO REAL USE YET
public function setProtectedVar($var, $data)
{
if ($var == 'filename')
{
$this->setFontFile($data);
} else {
//if (isset($var) && !empty($data))
$this->$var = $data;
}
return $this;
}
*/
/**
* function getFontInfo()
* @public
* @return information contained in the TTF 'name' table.
*/
public function getFontInfo()
{
$fd = fopen ($this->filename, "r");
$this->text = fread ($fd, filesize($this->filename));
fclose ($fd);
$number_of_tables = hexdec($this->dec2ord($this->text[4]).$this->dec2ord($this->text[5]));
for ($i=0;$i<$number_of_tables;$i++)
{
$tag = $this->text[12+$i*16].$this->text[12+$i*16+1].$this->text[12+$i*16+2].$this->text[12+$i*16+3];
if ($tag == 'name')
{
$this->ntOffset = hexdec(
$this->dec2ord($this->text[12+$i*16+8]).$this->dec2ord($this->text[12+$i*16+8+1]).
$this->dec2ord($this->text[12+$i*16+8+2]).$this->dec2ord($this->text[12+$i*16+8+3]));
$offset_storage_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+4]).$this->dec2ord($this->text[$this->ntOffset+5]));
$number_name_records_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+2]).$this->dec2ord($this->text[$this->ntOffset+3]));
}
}
$storage_dec = $offset_storage_dec + $this->ntOffset;
$storage_hex = strtoupper(dechex($storage_dec));
for ($j=0;$j<$number_name_records_dec;$j++)
{
$platform_id_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+0]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+1]));
$name_id_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+6]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+7]));
$string_length_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+8]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+9]));
$string_offset_dec = hexdec($this->dec2ord($this->text[$this->ntOffset+6+$j*12+10]).$this->dec2ord($this->text[$this->ntOffset+6+$j*12+11]));
if (!empty($name_id_dec) and empty($font_tags[$name_id_dec]))
{
for($l=0;$l<$string_length_dec;$l++)
{
if (ord($this->text[$storage_dec+$string_offset_dec+$l]) == '0') { continue; }
else { $font_tags[$name_id_dec] .= ($this->text[$storage_dec+$string_offset_dec+$l]); }
}
}
}
return $font_tags;
} // public function getFontInfo
/**
* function getCopyright()
* @public
* @return 'Copyright notice' contained in the TTF 'name' table at index 0
*/
public function getCopyright()
{
$this->info = $this->getFontInfo();
return $this->info[0];
} // public function getCopyright
/**
* function getFontFamily()
* @public
* @return 'Font Family name' contained in the TTF 'name' table at index 1
*/
public function getFontFamily()
{
$this->info = $this->getFontInfo();
return $this->info[1];
} // public function getFontFamily
/**
* function getFontSubFamily()
* @public
* @return 'Font Subfamily name' contained in the TTF 'name' table at index 2
*/
public function getFontSubFamily()
{
$this->info = $this->getFontInfo();
return $this->info[2];
} // public function getFontSubFamily
/**
* function getFontId()
* @public
* @return 'Unique font identifier' contained in the TTF 'name' table at index 3
*/
public function getFontId()
{
$this->info = $this->getFontInfo();
return $this->info[3];
} // public function getFontId
/**
* function getFullFontName()
* @public
* @return 'Full font name' contained in the TTF 'name' table at index 4
*/
public function getFullFontName()
{
$this->info = $this->getFontInfo();
return $this->info[4];
} // public function getFullFontName
/**
* function dec2ord()
* Used to lessen redundant calls to multiple functions.
* @protected
* @return object
*/
protected function dec2ord($dec)
{
return $this->dec2hex(ord($dec));
} // protected function dec2ord
/**
* function dec2hex()
* private function to perform Hexadecimal to decimal with proper padding.
* @protected
* @return object
*/
protected function dec2hex($dec)
{
return str_repeat('0', 2-strlen(($hex=strtoupper(dechex($dec))))) . $hex;
} // protected function dec2hex
/**
* function dec2hex()
* private function to perform Hexadecimal to decimal with proper padding.
* @protected
* @return object
*/
protected function exitClass($message)
{
echo $message;
exit;
} // protected function dec2hex
/**
* function dec2hex()
* private helper function to test in the file in question is a ttf.
* @protected
* @return object
*/
protected function is_ttf($file)
{
$ext = explode('.', $file);
$ext = $ext[count($ext)-1];
return preg_match("/ttf$/i",$ext) ? true : false;
} // protected function is_ttf
} // class ttfInfo
function getFontInfo($resource)
{
$ttfInfo = new ttfInfo;
$ttfInfo->setFontFile($resource);
return $ttfInfo->getFontInfo();
}
?>
Update 2021
Here is an updated version of the class with some fixes https://github.com/HusamAamer/TTFInfo.git
There are several online utilities can be used to identify fonts, including:
- WhatTheFont!, which can automatically match a font in an image you submit to the closest matches in the database;
- Identifont and Fonts.com , where you specify the appearance of the characters in the font to identify the font.
These utilities cannot be used to determine the formatting of the text in an image. However, you can use OCR programs such as Tesseract (open source) and Smart OCR (commercial, starting from US$99.90) to detect formatting such as paragraph alignment and line spacing as well as font styles such as bold or italic (see this question). Note that some OCR programs can attempt to identify the font(s) in an image as well.
Since you're on a Linux box, I would highly recommend Google's open source project ocropus.
It's not PHP, but I think it will be your best option. Of course you can call it from within PHP via
exec
. Its mature and has a lot of options. From the project site:There is also another open source project, tesseract. I've used this in the past as well and have been pleased with the results. Includes training, limiting your alphabet, etc.