I'm trying to read .doc and .docx files in PHP. Everything works fine, but the last line of the output contains garbage characters. Please help.
Here is the code, which was written by someone else:
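    function parseWord($userDoc)
    {
        // Read the whole file as raw bytes.
        $fileHandle = fopen($userDoc, "rb");
        $line = @fread($fileHandle, filesize($userDoc));
        fclose($fileHandle);
        // Split the data on carriage returns (0x0D).
        $lines = explode(chr(0x0D), $line);
        $outtext = "";
        foreach ($lines as $thisline) {
            // Keep only non-empty chunks that contain no NUL byte.
            $pos = strpos($thisline, chr(0x00));
            if ($pos === false && strlen($thisline) > 0) {
                $outtext .= $thisline . " ";
            }
        }
        // Drop every character outside the allowed set.
        $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", "", $outtext);
        return $outtext;
    }

    $userDoc = "k.doc";
    echo parseWord($userDoc);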
Here is a screenshot of the output.
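DOC files are not plain text. The legacy .doc format is a binary container and .docx is a ZIP archive of XML parts, so reading the raw bytes and filtering out characters will always leave some garbage behind.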
Try a library such as PHPWord (old CodePlex site).
nb: This answer has been updated multiple times as PHPWord has changed hosting and functionality.
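
If pulling in a library is not an option, the .docx half of the problem can be sketched with PHP's bundled ZipArchive class, since the main text of a .docx lives in the archive entry word/document.xml. This is only a rough sketch: readDocxText is a hypothetical helper name, it assumes the zip extension is enabled, it ignores headers, footers, footnotes and tables layout, and it does nothing for legacy binary .doc files, which still need a real parser.

```php
<?php
// Hypothetical helper (not part of any library): pull plain text out of a .docx.
// Assumes the zip extension is available and the file is a valid .docx archive.
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }

    // The main document body is stored in this entry of the archive.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("word/document.xml not found in $path");
    }

    // Add a newline after each paragraph so paragraphs do not run together.
    $xml = str_replace('</w:p>', "</w:p>\n", $xml);

    // Strip the XML markup, then decode entities such as &amp; back to characters.
    $text = html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');

    return trim($text);
}
```

Usage would look like `echo readDocxText("k.docx");`. Stripping tags is a crude shortcut; a library that understands the document structure will give cleaner results.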