Post by a***@spenarnc.xs4all.nl
Post by Don Y
Post by a***@spenarnc.xs4all.nl
Post by bitrex
Post by Don Y
I eventually scanned everything with a Perfect Binding and now fit those
same books on a single microSD card (in a Nook; PDFs on a 12" tablet).
PDFs are a dreadful format! Maybe there's a high-end e-ink that
processes them effectively but they look like shit on the cheaper ones
like most of the Kindles with e-ink displays.
PDFs are worrying. The format is under the control of one company, Adobe,
and it is changed without notice. I regret the move away from PostScript,
which at least was well defined.
How is this any different than other file formats "controlled" by their
originators? MS can't even access THEIR older versions of THEIR format.
I have PCB layout tools that can't read THEIR OWN files from just one
version earlier, etc.
PDF is better than WORD. See my other response.
PDF is somewhat open and stable. Most other formats are closed and
subject to the whims of their creators/owners.
Post by a***@spenarnc.xs4all.nl
Post by Don Y
You could always render your document to a TIFF (and then encapsulate it
in a PDF!), losing the textual nature in the process...
What? That is stupid. I don't go for the looks.
PDF is ALL about the page layout. If all you care about is the *content*
(and not the format/layout), then you could use HTML to encapsulate
the document (assuming you have other media besides just "ASCII text")
Post by a***@spenarnc.xs4all.nl
OTOH, I'd rather convert a PDF document into UTF-8 than into a graphical
format.
There are tools that will make these conversions (scanned images OCRed,
PDF/PS to text, etc.). But, you lose all of the non-text content.
I use PDFs as a versatile container format that lets me show content
exactly how I want it presented, include graphics, audio, video/animation,
etc. I can describe a piece of code and "attach" the code to the explanation
(without having to "include" all of that text IN the presentation).
How do I -- in prose -- describe the different audio characteristics
of speech created with two different glottal waveform generators?
And, be reasonably sure that the reader truly understands the (audible)
pros and cons of each.
Or, how do I illustrate which classes of cubic Beziers exhibit discontinuities?
Which have degenerate forms? You can describe these mathematically... but,
it is far simpler to just SHOW them, graphically.
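For comparison, the "describe them mathematically" route might look like this minimal sketch (function names are hypothetical). It treats a cusp candidate as a parameter t where the tangent vector of the cubic vanishes, using B'(t)/3 = (1-t)^2 (P1-P0) + 2(1-t)t (P2-P1) + t^2 (P3-P2), and treats a curve whose four control points are collinear as degenerate, since it collapses onto a straight line (possibly doubling back on itself). A vanishing tangent is where cusps can occur; it is not by itself proof of a visible cusp.

```python
import math


def _quad_roots(a, b, c, eps=1e-12):
    """Real roots of a*t**2 + b*t + c = 0, or None if the polynomial is identically zero."""
    if abs(a) < eps:
        if abs(b) < eps:
            return None if abs(c) < eps else []
        return [-c / b]
    disc = b * b - 4 * a * c
    if disc < -eps:
        return []
    r = math.sqrt(max(disc, 0.0))
    return [(-b - r) / (2 * a), (-b + r) / (2 * a)]


def _derivative_coeffs(v0, v1, v2, v3):
    """Coefficients (A, B, C) of B'(t)/3 = A*t**2 + B*t + C for one coordinate."""
    d0, d1, d2 = v1 - v0, v2 - v1, v3 - v2
    return d0 - 2 * d1 + d2, 2 * (d1 - d0), d0


def cusps(points, eps=1e-9):
    """Parameters t in [0, 1] where the tangent vector of the cubic vanishes."""
    cx = _derivative_coeffs(*(p[0] for p in points))
    cy = _derivative_coeffs(*(p[1] for p in points))
    candidates = _quad_roots(*cx)
    if candidates is None:            # x'(t) is identically zero: solve y'(t) instead
        candidates = _quad_roots(*cy)
    if candidates is None:            # all four control points coincide
        raise ValueError("degenerate curve: a single point")

    def deriv(c, t):
        return c[0] * t * t + c[1] * t + c[2]

    hits = {round(t, 9) for t in candidates
            if -eps <= t <= 1 + eps
            and abs(deriv(cx, t)) < eps and abs(deriv(cy, t)) < eps}
    return sorted(hits)


def is_collinear(points, eps=1e-12):
    """True if all control points lie on one line, so the curve is a straight line."""
    x0, y0 = points[0]
    vecs = [(x - x0, y - y0) for x, y in points[1:]]
    return all(abs(ax * by - ay * bx) < eps
               for i, (ax, ay) in enumerate(vecs)
               for bx, by in vecs[i + 1:])
```

For example, the control polygon (0,0), (1,1), (0,1), (1,0) yields `cusps(...) == [0.5]`, while the arch (0,0), (0,1), (1,1), (1,0) yields an empty list. Which rather proves the point: the numbers are correct, but a picture of the two curves says it faster.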
Post by a***@spenarnc.xs4all.nl
I'm writing a program to read TIFFs on behalf of OCR.
There are tools that will already do this for you. You can have an
invisible "text" layer that sits "under" the corresponding TIFF imagery
in a PDF. These are funky documents to use, because selecting text based on
the *visible* imagery actually highlights the regions occupied by the
INvisible text. So, highlighting "these words" may actually show
"these words ma" as highlighted -- even though pasting that selection
will deliver the expected results! :< WYSInWYG!
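The invisible layer relies on PDF text rendering mode 3 (`3 Tr`), which lays glyphs out without painting them; the mismatched highlighting appears when the font's glyph widths don't match the scanned words. Below is a minimal sketch (not any particular tool's method) of a page content stream that places each OCRed word invisibly and then paints the scan over it, stretching every word with horizontal scaling (`Tz`) so its width matches the word's bounding box. It assumes the page's resource dictionary names the scanned image `/Im0` and maps `/F1` to the standard Courier font (both resource names are hypothetical); Courier is chosen because every glyph is 600/1000 em wide, so widths need no font metrics. Text is assumed to be plain ASCII, with coordinates in PDF points from the bottom-left corner.

```python
COURIER_ADVANCE = 0.6   # assumption: /F1 is standard Courier, each glyph 600/1000 em wide


def pdf_escape(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string (...)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def ocr_page_content(words, page_w, page_h, image="Im0", font="F1") -> str:
    """Content stream: invisible OCR text first, then the page image painted over it.

    words: iterable of (text, x, y, box_width, font_size) in PDF points,
    where (x, y) is the left end of the word's baseline on the scan.
    """
    ops = ["BT", "3 Tr"]                  # rendering mode 3: neither fill nor stroke
    for text, x, y, box_w, size in words:
        natural_w = COURIER_ADVANCE * size * len(text)
        scale = 100.0 * box_w / natural_w if natural_w else 100.0
        ops += [
            f"/{font} {size} Tf",
            f"{scale:.2f} Tz",            # horizontal scaling, percent of normal width
            f"1 0 0 1 {x} {y} Tm",        # put the word's baseline origin at (x, y)
            f"({pdf_escape(text)}) Tj",
        ]
    ops.append("ET")
    # Image XObjects occupy the unit square; scale it to cover the whole page.
    ops += ["q", f"{page_w} 0 0 {page_h} 0 0 cm", f"/{image} Do", "Q"]
    return "\n".join(ops)
```

With a word box of 66 pt for "these words" at 12 pt, the stream sets `83.33 Tz`, so the selectable (invisible) text spans the same width as the visible word instead of spilling into the next one.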
Post by a***@spenarnc.xs4all.nl
In TIFF there are several compression schemes that are possible, e.g.
You can also render to different pixel depths - 1, 4, 16, etc. I created
some documents with a *2* bit pixel representation (which seemed "legal"
per the definition of the TIFF format), but they were not recognized by most
of the tools available at that time. So, I had to "inflate" them to a 4b representation
in order to render them.
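That "inflation" step can be sketched as follows, assuming uncompressed greyscale data with one sample per pixel, the default most-significant-bits-first fill order, and rows padded to whole bytes (function names are hypothetical). Each 2-bit level is mapped linearly onto the 4-bit range (0, 1, 2, 3 become 0, 5, 10, 15) so relative brightness is preserved; the file's BitsPerSample and strip byte counts would of course have to be rewritten to match.

```python
def inflate_2bit_row(row: bytes, width: int) -> bytes:
    """Convert one row of 2-bit greyscale samples to 4-bit samples."""
    out = bytearray((width + 1) // 2)
    for i in range(width):
        # 2-bit input: four samples per byte, most significant bits first.
        shift = 6 - 2 * (i % 4)
        level = (row[i // 4] >> shift) & 0b11
        value = level * 5                 # 0..3 -> 0..15, spanning the full 4-bit range
        # 4-bit output: two samples per byte, high nibble first.
        if i % 2 == 0:
            out[i // 2] |= value << 4
        else:
            out[i // 2] |= value
    return bytes(out)


def inflate_2bit_image(data: bytes, width: int, height: int) -> bytes:
    """Inflate an uncompressed 2-bit image; every row starts on a byte boundary."""
    stride = (width + 3) // 4             # input bytes per row, including padding
    return b"".join(inflate_2bit_row(data[r * stride:(r + 1) * stride], width)
                    for r in range(height))
```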
Post by a***@spenarnc.xs4all.nl
one of those is the one used by black-and-white fax machines. TIFF is worse than
PDF, and you couldn't search it for text content.
You use TIFF as a semi-photographic rendering of the page. I often
request technical documents from my local public library. Invariably,
these are *FAXed* to the library. Then, printed for delivery to me.
So, the original document was scanned (at some resolution), FAXed
(with some potential for resampling in the FAX software), printed
(yet another resolution/resampling) and, finally, *I* scan it (so
I don't have to keep track of piles of paper) in yet another
resampling.
OTOH, I end up with a readable document, including any illustrations
that it may have had (often, color -- and greyscale -- is stripped
in the processing). This is far preferable to a searchable document
that I *don't* have! :-/
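As for which of TIFF's compression schemes a given scan uses: the choice is recorded in the Compression tag (259) of the image file directory, where codes 3 and 4 are the CCITT Group 3 and Group 4 fax encodings. A minimal sketch of how a TIFF reader (such as the OCR front end mentioned upthread) might look it up, assuming a classic (non-BigTIFF) file and examining only the first image; the function name is hypothetical.

```python
import struct

# Some common values of the TIFF Compression tag (259).
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3 fax (T.4)",
    4: "CCITT Group 4 fax (T.6)",
    5: "LZW",
    6: "JPEG (old-style)",
    7: "JPEG",
    8: "Deflate (Adobe)",
    32773: "PackBits",
}
COMPRESSION_TAG = 259


def tiff_compression(data: bytes) -> int:
    """Return the Compression code of the first image in a classic TIFF file."""
    if data[:2] == b"II":
        endian = "<"                      # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"                      # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i   # each directory entry is 12 bytes
        tag, _type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits left-justified in the 4-byte value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1                              # tag absent: the default is no compression
```

Reading a file's bytes and passing them in, then looking the result up in `COMPRESSION_NAMES`, is enough to tell a FAX-derived Group 4 scan from, say, an LZW or PackBits one before handing it to a decoder.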
(File names for documents are really important. Folks who deliver documents
with names like C484915.pdf should be flogged and then shot! Are they
hosting those documents on an MSDOS FAT12 filesystem that can't handle
long DESCRIPTIVE file names????)