
I have a large number of files whose original file names have been replaced by IDs from my database. For example, what was once named word_document.doc is now 12345. Through a process I have lost the original name.

I am now trying to present these files for download. The person should be able to download the file and view it using its original application. The files are all in one of the following formats:

  • .txt (text)
  • .doc (Word document)
  • .docx (Word document)
  • .wpd (WordPerfect)
  • .pdf (PDF)
  • .rtf (rich text)
  • .sxw (StarOffice)
  • .odt (OpenOffice)

I'm using

$fhandle = finfo_open(FILEINFO_MIME);
$file_mime_type = finfo_file($fhandle, $filepath);

to get the MIME type and then mapping the MIME type to an extension.
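A minimal sketch of that approach, assuming the fileinfo extension is loaded. Passing FILEINFO_MIME_TYPE instead of FILEINFO_MIME returns only the type (for example `text/plain`) without the `; charset=...` suffix, so the result can be used as a map key directly. The function name `detectMimeType` is hypothetical.

```php
<?php

// Hypothetical helper: return the bare MIME type of a file, or null on failure.
function detectMimeType(string $filepath): ?string
{
    // FILEINFO_MIME_TYPE yields e.g. "application/pdf" with no charset suffix,
    // unlike FILEINFO_MIME ("application/pdf; charset=binary").
    $fhandle = finfo_open(FILEINFO_MIME_TYPE);
    if ($fhandle === false) {
        return null;
    }

    // The handle is released automatically when it goes out of scope.
    $type = finfo_file($fhandle, $filepath);

    return $type === false ? null : $type;
}

// Usage: write a small plain-text file and detect its type.
$tmp = tempnam(sys_get_temp_dir(), 'mime');
file_put_contents($tmp, "Hello, world\n");
echo detectMimeType($tmp), PHP_EOL;
unlink($tmp);
```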

The problem I am running into is that some of the files have a MIME type of application/octet-stream. From what I've read online, this is a catch-all type for arbitrary binary data, so I can't easily tell what the extension needs to be. For some of these files, setting the extension to .wpd works and for others it doesn't. The same goes for .sxw.
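When libmagic falls back to application/octet-stream, one option is to inspect the first bytes of the file yourself. This is a heuristic sketch, not a complete detector, and it only covers the formats listed above. It assumes commonly documented signatures: `\xFFWPC` at the start of WordPerfect documents, the OLE2 header `D0 CF 11 E0 A1 B1 1A E1` used by legacy .doc files, `%PDF` and `{\rtf` for PDF and RTF, and the uncompressed `mimetype` entry that .odt (and typically .sxw) ZIP archives store at the start. Treating any other ZIP as .docx is an assumption based on the question's list. `guessExtensionFromSignature` is a hypothetical name.

```php
<?php

// Hypothetical helper: guess an extension from a file's leading bytes.
// Covers only the formats listed in the question; returns null otherwise.
function guessExtensionFromSignature(string $path): ?string
{
    $head = file_get_contents($path, false, null, 0, 512);
    if ($head === false || $head === '') {
        return null;
    }

    // WordPerfect: byte 0xFF followed by "WPC".
    if (strncmp($head, "\xFFWPC", 4) === 0) {
        return 'wpd';
    }

    // Legacy Word (.doc) is an OLE2 compound file.
    if (strncmp($head, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'doc';
    }

    if (strncmp($head, '%PDF', 4) === 0) {
        return 'pdf';
    }

    // Single quotes: '{\rtf' is a literal backslash followed by "rtf".
    if (strncmp($head, '{\rtf', 5) === 0) {
        return 'rtf';
    }

    // ZIP-based formats: look for the MIME string in the leading "mimetype" entry.
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        if (strpos($head, 'application/vnd.sun.xml.writer') !== false) {
            return 'sxw';
        }
        if (strpos($head, 'application/vnd.oasis.opendocument.text') !== false) {
            return 'odt';
        }
        return 'docx'; // assumption: the only other ZIP format in the list
    }

    return null;
}
```

Signature checks like these are cheap, so they can run only for files that finfo reports as application/octet-stream.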

Lol, I think the key phrase in your post is 'Through a process I have lost the original name'. You already keep some info in the database, so why not keep the filenames there too? - degr
Maybe this will help you: tika.apache.org - sanderbee
@degr I do keep filenames in the database, but users are allowed to "delete" their files. "Deleting" simply removes the database row that holds information such as the filename. As part of the website we need to keep the files and keep them accessible, since the files are now owned by others. - Caleb Doucet
@Caleb Doucet Then you should delete the file together with its database row. If you need to keep the files, you can keep the rows too; just add one more 'bit' field named deleted. - degr
@degr I understand the solution would be to just keep the database record, but that would require a lot of rework (it is a big system). The budget won't allow for what you are proposing. - Caleb Doucet
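For comparison, degr's soft-delete suggestion could look roughly like this. It is a sketch against an in-memory SQLite database through PDO (it needs the pdo_sqlite extension); the `files` table and its columns are hypothetical stand-ins for the real schema.

```php
<?php

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: the "deleted" flag replaces physically removing the row.
$db->exec('CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0
)');
$db->exec("INSERT INTO files (id, filename) VALUES (12345, 'word_document.doc')");

// "Deleting" a file only sets the flag, so the original name survives.
$db->prepare('UPDATE files SET deleted = 1 WHERE id = ?')->execute([12345]);

// The owner's file list hides flagged rows...
$visible = $db->query('SELECT COUNT(*) FROM files WHERE deleted = 0')->fetchColumn();

// ...but a download can still look up the original filename by id.
$stmt = $db->prepare('SELECT filename FROM files WHERE id = ?');
$stmt->execute([12345]);
$filename = $stmt->fetchColumn();

echo $visible, ' ', $filename, PHP_EOL; // 0 word_document.doc
```

Every query that lists a user's files then has to filter on `deleted = 0`, which is where the rework in a large system comes from.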

1 Answer


Symfony2 guesses a file's MIME type using three methods, trying each until one succeeds:

1) mime_content_type

$type = mime_content_type($path);

// remove charset (added as of PHP 5.3)
if (false !== $pos = strpos($type, ';')) {
    $type = substr($type, 0, $pos);
}

return $type;

2) file -b --mime

ob_start();
passthru(sprintf('file -b --mime %s 2>/dev/null', escapeshellarg($path)), $return);
if ($return > 0) {
    ob_end_clean();

    return;
}

$type = trim(ob_get_clean());
if (!preg_match('#^([a-z0-9\-]+/[a-z0-9\-\.]+)#i', $type, $match)) {
    // it's not a type, but an error message
    return;
}

return $match[1];

3) finfo

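if (!class_exists('finfo')) {
    return;
}

// The optional second constructor argument is a magic database file,
// not the file to inspect, so don't pass $path here.
$finfo = new \finfo(FILEINFO_MIME_TYPE);

return $finfo->file($path);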
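Put together, the three steps form a fallback chain. The sketch below is a simplified standalone version of that idea, not Symfony's actual MimeTypeGuesser class. It also treats application/octet-stream as "no answer" so the next method gets a chance, which is an assumption about what you want here. It needs the fileinfo extension and, for the second step, the `file` command on the PATH. `guessMimeType` is a hypothetical name.

```php
<?php

// Hypothetical fallback chain modelled on the three steps above.
function guessMimeType(string $path): ?string
{
    $guessers = [
        // 1) mime_content_type(), dropping any "; charset=..." suffix
        function (string $path): ?string {
            if (!function_exists('mime_content_type')) {
                return null;
            }
            $type = mime_content_type($path);
            return $type === false ? null : trim(explode(';', $type, 2)[0]);
        },
        // 2) the `file` command-line tool (Unix-like systems only)
        function (string $path): ?string {
            $output = [];
            $status = 0;
            exec(sprintf('file -b --mime %s 2>/dev/null', escapeshellarg($path)), $output, $status);
            if ($status !== 0 || !isset($output[0])
                || !preg_match('#^([a-z0-9\-]+/[a-z0-9\-\.]+)#i', $output[0], $match)) {
                return null; // command failed or printed an error message
            }
            return $match[1];
        },
        // 3) the finfo class from the fileinfo extension
        function (string $path): ?string {
            if (!class_exists('finfo')) {
                return null;
            }
            $type = (new finfo(FILEINFO_MIME_TYPE))->file($path);
            return $type === false ? null : $type;
        },
    ];

    foreach ($guessers as $guess) {
        $type = $guess($path);
        // Assumption: treat the generic binary type as "unknown" and keep trying.
        if ($type !== null && $type !== '' && $type !== 'application/octet-stream') {
            return $type;
        }
    }

    return null;
}
```

All three methods may rely on libmagic data, so they often agree; a file that every method reports as application/octet-stream still needs a different check.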

Once you have the MIME type, you can get the extension from a predefined map, for example from here or here:

$map = array(
    'application/msword' => 'doc',
    'application/x-msword' => 'doc',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document' => 'docx',
    'application/pdf' => 'pdf',
    'application/x-pdf' => 'pdf',
    'application/rtf' => 'rtf',
    'text/rtf' => 'rtf',
    'application/vnd.sun.xml.writer' => 'sxw',
    'application/vnd.oasis.opendocument.text' => 'odt',
    'text/plain' => 'txt',
);
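A sketch of the lookup step, assuming the MIME type has already been detected by one of the methods above. It normalises the type (drops any `; charset=...` parameter and lowercases it) before consulting the map, and returns null for unknown types such as application/octet-stream so the caller can fall back to another check. `extensionForMimeType` is a hypothetical name, and the `application/vnd.wordperfect` entry for .wpd is an assumption added on top of the map above.

```php
<?php

// Hypothetical helper: map a detected MIME type to one of the question's extensions.
function extensionForMimeType(string $mimeType): ?string
{
    static $map = [
        'application/msword' => 'doc',
        'application/x-msword' => 'doc',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document' => 'docx',
        'application/pdf' => 'pdf',
        'application/x-pdf' => 'pdf',
        'application/rtf' => 'rtf',
        'text/rtf' => 'rtf',
        'application/vnd.sun.xml.writer' => 'sxw',
        'application/vnd.oasis.opendocument.text' => 'odt',
        'text/plain' => 'txt',
        'application/vnd.wordperfect' => 'wpd', // assumption: not in the original map
    ];

    // "text/plain; charset=us-ascii" -> "text/plain"
    $key = strtolower(trim(explode(';', $mimeType, 2)[0]));

    return $map[$key] ?? null;
}

// Usage: build a download name for a file stored under its database id.
$id = 12345;
$ext = extensionForMimeType('application/msword; charset=binary');
$downloadName = $ext === null ? (string) $id : $id . '.' . $ext;
echo $downloadName, PHP_EOL; // 12345.doc
```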