GIF, LZW, PostScript, and PDF

September 06, 2007, at 05:00 PM

Well, the other day when I wrote about this, I did a test where I used an external tool to decode the LZW payload of a GIF and feed that into my generated PDF. That worked. Then I tried decoding with PostScript's codec by using Ghostscript interactively, and comparing the data to what I expected. That worked. Then I tried telling the PDF to decode with the same filter. That did not work.

So I slept on it.

Today I looked in on it again and the problem was immediately obvious: PostScript and PDF each define an LZWDecode filter, and the two aren't quite identical. Specifically, their parameters have the same names, but in PDF only values between 9 and 12 are valid for the initial number of bits in each code. In PostScript, any value is valid. The stream I was testing with used 3 bits. I had actually already tried generating a stream that wouldn't have that issue, since I was confused by some wording in the PDF spec, but that one used 8 bits!
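To make the mismatch concrete, here is a minimal sketch in Python (chosen only for brevity) that compares the width of a GIF stream's first codes with what PDF's LZWDecode expects. The function names are hypothetical. It relies on two facts: a GIF stream starts with codes one bit wider than its declared minimum code size, and PDF's filter starts at 9 bits. It compares widths only and does not model bit-packing order or GIF's sub-block framing, so a True result does not by itself mean the stream can be passed through unchanged.

```python
# PDF's LZWDecode always begins with 9-bit codes and grows to at most 12.
PDF_INITIAL_CODE_BITS = 9


def gif_initial_code_bits(min_code_size: int) -> int:
    """Width of the first codes in a GIF LZW stream.

    GIF stores a 'minimum code size' byte before the image data; the
    first codes are one bit wider, to leave room for the clear and end codes.
    """
    return min_code_size + 1


def widths_match_pdf(min_code_size: int) -> bool:
    """True if the GIF stream starts at the code width PDF expects."""
    return gif_initial_code_bits(min_code_size) == PDF_INITIAL_CODE_BITS


for size in (2, 7, 8):
    print(size, gif_initial_code_bits(size), widths_match_pdf(size))
```

A minimum code size of 2 gives 3-bit codes and a size of 7 gives 8-bit codes, both outside PDF's range.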

So that answers that question. Yay! Now I need to decide what to do about it! Obviously I need to decode and re-encode the thing. Since I want to link against Ghostscript anyway to handle EPS conversion, maybe I should figure out how to link against it now and then have it do the recoding. That way I won't need a Haskell implementation of LZW. Not that LZW is that hard... I've implemented it before, but that's precisely why I don't really want to do it again. Boring to keep doing the same thing. And since Ghostscript already contains a well-tested implementation, as well as implementations of any other codec I might possibly want...
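For a sense of what the decode half involves, here is a minimal sketch of a GIF-style LZW decoder in Python. It is not the Ghostscript route described above, and not a Haskell implementation. The function name and its arguments are assumptions for illustration. It expects the image data with the GIF sub-block length bytes already stripped, plus the minimum code size byte that precedes the data in the file.

```python
def gif_lzw_decode(data: bytes, min_code_size: int) -> bytes:
    """Decode a GIF-style LZW code stream into raw pixel indices."""
    clear_code = 1 << min_code_size
    end_code = clear_code + 1

    def fresh_table():
        # One entry per literal value, plus placeholders for clear and end.
        return [bytes([i]) for i in range(clear_code)] + [b"", b""]

    table = fresh_table()
    code_size = min_code_size + 1  # first codes are one bit wider than pixels
    prev = None                    # string emitted for the previous code
    out = bytearray()
    bit_buffer = 0
    bit_count = 0

    for byte in data:
        # GIF packs codes least-significant-bit first.
        bit_buffer |= byte << bit_count
        bit_count += 8
        while bit_count >= code_size:
            code = bit_buffer & ((1 << code_size) - 1)
            bit_buffer >>= code_size
            bit_count -= code_size

            if code == clear_code:
                table = fresh_table()
                code_size = min_code_size + 1
                prev = None
                continue
            if code == end_code:
                return bytes(out)

            if code < len(table):
                entry = table[code]
            elif code == len(table) and prev is not None:
                # Code not in the table yet: previous string plus its first byte.
                entry = prev + prev[:1]
            else:
                raise ValueError(f"invalid LZW code {code}")

            out += entry
            if prev is not None and len(table) < 4096:
                table.append(prev + entry[:1])
                # Widen the codes once the table fills the current width.
                if len(table) == (1 << code_size) and code_size < 12:
                    code_size += 1
            prev = entry

    return bytes(out)


# A short hand-checked stream of 9-bit codes (minimum code size 8).
sample = bytes.fromhex("0051fc1b2870a0c1830101")
pixels = gif_lzw_decode(sample, 8)
print(pixels.hex())
```

The decoded bytes would still need re-encoding with a filter that PDF accepts before being written into the PDF stream; this sketch does not cover that step.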
