Thursday, May 28, 2009

Handling a TIFF image produced by MS Office Document Scanning

I scanned a paper using MS Office Document Scanning (on Windows XP) and the output was a TIFF image file. Using the file command, I could see the file type:

image.tif: TIFF image data, little-endian

As usual, I wanted to open it with an image editor like GIMP, ImageMagick or Krita, or even with a document viewer like Okular, but to no avail. The error produced by GIMP was: unsupported layout, No RGBA loader. The error produced by Krita was: Cannot create storage. I then tried convert, a utility packaged with ImageMagick, to convert the image to JPEG:

[zamri@triniton KINGSTON]# convert image.tif -quality 90 kpd.jpg

I got this error:
convert: image.tif: unknown field with tag 512 (0x200) encountered. `TIFFReadDirectory' @ tiff.c/TIFFWarnings/525.
convert: image.tif: unknown field with tag 513 (0x201) encountered. `TIFFReadDirectory' @ tiff.c/TIFFWarnings/525.
convert: image.tif: unknown field with tag 514 (0x202) encountered. `TIFFReadDirectory' @ tiff.c/TIFFWarnings/525.
convert: image.tif: unknown field with tag 37677 (0x932d) encountered. `TIFFReadDirectory' @ tiff.c/TIFFWarnings/525.
convert: image.tif: unknown field with tag 37678 (0x932e) encountered. `TIFFReadDirectory' @ tiff.c/TIFFWarnings/525.
convert: image.tif: unknown field with tag 37680 (0x9330) encountered. `TIFFReadDirectory' @ tiff.c/TIFFWarnings/525.
convert: compression not supported `image.tif' @ tiff.c/ReadTIFFImage/811.
convert: missing an image filename `kpd.jpg' @ convert.c/ConvertImageCommand/2776.
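Tags 512, 513 and 514 are defined in the TIFF 6.0 specification as JPEGProc, JPEGInterchangeFormat and JPEGInterchangeFormatLength, the fields used by the old-style JPEG-in-TIFF scheme (Compression value 6). That is probably what "compression not supported" refers to, although I have not confirmed the Compression value stored in this particular file. To check without ImageMagick, here is a minimal Python sketch that lists the tags in the first image directory of a TIFF file. The function names read_first_ifd and compression are my own, and the sketch only handles classic (non-BigTIFF) files and only the first directory.

```python
import struct

# Names for a few tags from the TIFF 6.0 specification; others are left unnamed.
TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    259: "Compression",
    512: "JPEGProc",
    513: "JPEGInterchangeFormat",
    514: "JPEGInterchangeFormatLength",
}


def read_first_ifd(data):
    """Return (endian, entries) for the first IFD of a classic TIFF.

    Each entry is (tag, field_type, count, raw 4-byte value/offset field).
    """
    if data[:2] == b"II":
        endian = "<"  # little-endian, as reported by the file command above
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    entries = []
    for i in range(entry_count):
        # Every IFD entry is 12 bytes: tag, type, count, value-or-offset.
        entry = struct.unpack_from(endian + "HHI4s", data, ifd_offset + 2 + 12 * i)
        entries.append(entry)
    return endian, entries


def compression(data):
    """Return the Compression tag value (tag 259), or None if it is absent."""
    endian, entries = read_first_ifd(data)
    for tag, field_type, count, raw in entries:
        if tag == 259 and field_type == 3 and count == 1:
            # A single SHORT is stored left-justified in the 4-byte field.
            return struct.unpack(endian + "H", raw[:2])[0]
    return None
```

Calling compression(open("image.tif", "rb").read()) would return the Compression value; 6 would mean old-style JPEG, 7 the newer JPEG scheme. TAG_NAMES can be used to label the tags returned by read_first_ifd.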

So I searched the internet and found the solution on this website. Luckily Mandriva has foremost in its repositories, so I installed it by issuing this command: urpmi foremost. Then I issued this command:

foremost -i image.tif -o image

The first option, -i, names the input file, and the second, -o, names the directory where foremost writes whatever it extracts from the image. It appeared that the TIFF file was compressed and contained several files, including the JPEG file that was of interest to me. The output of the above command was a directory (folder) named image. In that directory, I got:

audit.txt jpg/ ole/

In the jpg directory, I got:

00000000.jpg 00000703.jpg

The file named 00000000.jpg was the JPEG image, which can be opened with any image viewer. The other one was the thumbnail.
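foremost is a file carving tool: it scans the input for the known headers and footers of file formats and copies out whatever lies between them. To give a rough idea of how the JPEG files above could be recovered, here is a minimal Python sketch of that idea. It is not foremost's actual implementation; the function name carve_jpegs is my own, and it naively treats the first end marker after a start marker as the end of the file, which can cut a JPEG short if it embeds its own thumbnail.

```python
JPEG_START = b"\xff\xd8\xff"  # SOI marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"        # EOI marker


def carve_jpegs(data):
    """Return (offset, bytes) for each span from an SOI marker to the next EOI."""
    found = []
    pos = 0
    while True:
        start = data.find(JPEG_START, pos)
        if start == -1:
            break  # no more JPEG headers
        end = data.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # header without a footer: give up
        end += len(JPEG_END)  # include the EOI marker itself
        found.append((start, data[start:end]))
        pos = end  # continue searching after this file
    return found
```

Each returned chunk could then be written to its own .jpg file, for example by looping over carve_jpegs(open("image.tif", "rb").read()).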

3 comments:

imperium said...

I often find Office's document scanning to be haphazard; it's usually better to buy a separate DMS. Commercial ones are often very cheap.

zamri said...

Yeah, that's true, but if you can't install one then foremost should be the savior.

joshua said...

Hi, nice blog. Document scanning company offering document conversion and data capture services, including scanning of photographs, microfilm, microfiche, questionnaires, and technical drawings, and conversion to CAD and OCR-editable text.