- From a MasterFile database or briefcase, select the documents you want to process and then click [R+ Evidence Cruncher > Document Services].
A dialog box appears, listing the available services.
The services available in a briefcase differ depending on the state of the briefcase; they are covered in Production.
OCR/PDF crunch now and OCR/PDF crunch directory now
This is the most frequently used option. Use it to crunch documents in a MasterFile database, or document files in directories on disk, in preparation for Express Load.
Many options are available to control PDF conversion and OCR generation. For example, you can stamp PDF files with headers or footers, scale their contents, choose colour or black-and-white output, set OCR options, and so on. For more detail on each option, click the "?" prefix on its field description for contextual help.
- Set any options you require and click "OK" to start the OCR/PDF crunching process.
"Re-crunch" forces the Evidence Cruncher to use the source file (that is, the second attachment in the profile) to re-create the PDF document and reload the text. The existing PDF document, if any, and any OCR text loaded from it are erased and replaced.
When you use "Only selected" to process selected documents, a limitation in IBM Notes means they are not necessarily processed in the order in which they appear in the view; as a result, printed documents may come out in an arbitrary order.
During OCR processing, original document images are converted to 'B&W TIFF in PDF' format to conserve space.
Process near duplicates / Dump documents to disk
Choose this option to extract document files and save them to disk. You can dump native document files, PDF files, redacted PDFs, or just profile metadata and, optionally, OCR text in matching .OCR files. A corresponding MasterFile CSV load file is always created in the same directory.
This same function also dumps document and email text for near-duplicate clustering and email threading; that process starts with the dump here. Clustering, threading and review are explained in the article on near duplicates.
Dump PDF documents into one PDF
Choose this option to extract PDFs and aggregate them into one PDF. The aggregate PDF will have bookmarks to each document within it. A corresponding MasterFile CSV load file is also created in the same directory.
Load text from selected PDFs
Choose this option to load text from multiple PDFs (for example, OCR from searchable PDFs) into the profile's "OCR/Transcript/Full Text of Document" field.
Less frequently used
Un-crunch deletes the first PDF file attached to the profile. If there is only one PDF file attached to the profile, it is retained. Un-crunch also erases any OCR text loaded and resets the 'OCR/PDF Crunch Status' profile field.
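The Un-crunch rule can be modeled in a few lines to make its conditions explicit. This is a hypothetical sketch, not Evidence Cruncher code: the `Profile` class, its field names, the `.pdf` extension test, and representing a reset status as an empty string are all assumptions made for illustration. It applies the rule literally as stated: the first attached PDF is deleted only when more than one PDF is attached.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    # Hypothetical model of a MasterFile profile; field names are illustrative only.
    attachments: List[str] = field(default_factory=list)
    ocr_text: str = ""
    crunch_status: str = ""


def uncrunch(profile: Profile) -> Profile:
    """Sketch of the documented Un-crunch rule (not the product's implementation)."""
    pdfs = [name for name in profile.attachments if name.lower().endswith(".pdf")]
    # Delete the first attached PDF only when more than one PDF is attached;
    # a lone PDF attachment is retained.
    if len(pdfs) > 1:
        profile.attachments.remove(pdfs[0])
    # Un-crunch always erases loaded OCR text and resets the crunch status.
    profile.ocr_text = ""
    profile.crunch_status = ""  # assumption: "reset" is modeled as an empty value
    return profile
```

For example, a profile with attachments `["crunched.pdf", "source.pdf"]` keeps only `source.pdf`, while a profile whose only attachment is `only.pdf` keeps it; in both cases the OCR text and status are cleared.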
Queue for OCR/PDF crunch now
You can add documents to the database's Evidence Cruncher queue to delegate or defer processing. Each MasterFile database has its own queue. When you are ready, select the documents from [L+ Evidence Cruncher Status > Evidence Cruncher Queue], set any options you require and click "OK" to start the OCR/PDF crunching process.
Dump documents to printer
The Evidence Cruncher can print documents from all major desktop applications, including Microsoft Office applications such as Word and Excel, as well as scanned document images, PDFs, and so on.
Choose this option to print selected or all documents. Specify a printer in the field that appears. If you leave the printer name field blank or enter "Default", the documents are printed to the default Windows printer. Other printer names can be found in the "Printers" folder in "Control Panel". For example:
identifies the printer "MyPrinter" attached to "MyServer".
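The printer-name rule can be summarized as a small decision function. This is an illustrative sketch only: `resolve_printer` is a hypothetical name, returning `None` stands for "use the default Windows printer", and ignoring surrounding spaces and matching "Default" case-insensitively are assumptions, since the documentation does not say exactly how the field is compared.

```python
from typing import Optional


def resolve_printer(name: Optional[str]) -> Optional[str]:
    """Return None for 'use the default Windows printer', else the printer name as entered.

    Hypothetical sketch of the documented rule, not Evidence Cruncher code.
    """
    # Assumption: surrounding whitespace is ignored and "Default" is case-insensitive.
    cleaned = (name or "").strip()
    if cleaned == "" or cleaned.lower() == "default":
        return None
    return cleaned
```

A blank field or "Default" both fall through to the Windows default printer; any other entry is passed on unchanged as a printer name.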
Dump PDF documents to TIFF format
Choose this option to rasterize PDF documents into single or multi-page B&W TIFFs. A corresponding MasterFile CSV load file is also created in the same directory. You may also choose to dump OCR text in matching .OCR files.