Archive for May, 2009

Searchable PDFs and TIFFs with OCR text

Almost all cases now contain some form of electronic data. Even if most of your case involves paper documents, the paper is scanned and the document productions are actually electronic image files (TIFFs or PDFs) on a disk. For this example, we’ll use a scenario where our images come from scanned documents.

Regardless of the image format, you must OCR image files for them to be searchable. The OCR process handles the resulting text a little differently for PDF images than for TIFF images. A PDF image actually embeds the OCR text within itself (kind of behind the image itself). The PDF software and the OCR software work together to align the OCR text directly behind the words in the image, so when you use the search function in Adobe, the search hit is highlighted on the image. As a user, all you see is the text in the scanned image, while the software sees both the image and the OCR text aligned behind it. Many clients like this method because it is typically fairly cheap and easy to use. This method does have its limits, however, and does not work efficiently in large collections.
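To make the idea of aligned text a little more concrete, here is a toy model in Python. It is only a sketch of the concept, not how Adobe or any OCR product works internally: the `OcrWord` class, the `find_highlights` function, and the coordinates are all made up for illustration. Each OCR word carries the position of the matching word on the page image, so a search hit can be turned into a rectangle to highlight.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One word in the hidden text layer (hypothetical structure)."""
    text: str      # word recognized by OCR
    x: float       # left edge of the word on the page image
    y: float       # top edge of the word on the page image
    width: float
    height: float


def find_highlights(words, term):
    """Return the (x, y, width, height) box of every word matching term."""
    needle = term.lower()
    return [
        (w.x, w.y, w.width, w.height)
        for w in words
        # Ignore case and trailing punctuation when comparing words.
        if w.text.lower().strip(".,;:") == needle
    ]


# Made-up text layer for one scanned page.
page_layer = [
    OcrWord("Purchase", 72, 100, 60, 12),
    OcrWord("agreement", 136, 100, 70, 12),
    OcrWord("dated", 210, 100, 36, 12),
    OcrWord("Agreement.", 72, 118, 74, 12),
]

highlights = find_highlights(page_layer, "agreement")
print(highlights)  # two boxes, one for each occurrence of the word
```

The viewer never shows the OCR words themselves. It only uses their positions to draw the highlight on top of the image.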

This is where TIFF images come in. When a paper document is scanned to TIFF format, it also has to be put through the OCR process to make it searchable. When a TIFF image is OCRed, a separate text file (.txt) is created and used in the search process. So let’s recap real quick:

  • A searchable PDF is a single file.
  • An OCRed TIFF image is two files (the actual TIFF image and a text file).

Technically, a TIFF image is not really searchable; it is the text file produced by the OCR process that is searchable. After OCR, each text file corresponds to the TIFF image the text came from, and when a search hit occurs in the text file, the search result points to that TIFF image. This cannot be done in Adobe; it requires more advanced document management software. In addition to our online review platform ImageDepot, the two most common document management packages are Summation and Concordance. We consult with our clients to help them choose the right solution for their case.
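The pairing of text file and TIFF image described above can be sketched in a few lines of Python. This is a minimal illustration, not how ImageDepot, Summation, or Concordance work internally. It assumes each text file has the same base name as its image (for example DOC0001.txt next to DOC0001.tif), which is a naming convention we picked for the example. The function name `search_ocr_folder` and the sample files are hypothetical.

```python
import tempfile
from pathlib import Path


def search_ocr_folder(folder, term):
    """Return the names of TIFF images whose OCR text file contains term."""
    needle = term.lower()
    hits = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        # Assumed convention: DOC0001.txt belongs to DOC0001.tif.
        tiff_path = txt_path.with_suffix(".tif")
        if not tiff_path.exists():
            continue  # text file without a matching image, skip it
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            hits.append(tiff_path.name)
    return hits


# Small demonstration with made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    base = Path(folder)
    (base / "DOC0001.tif").write_bytes(b"")  # stand-in for a real image
    (base / "DOC0001.txt").write_text("Lease agreement between the parties", encoding="utf-8")
    (base / "DOC0002.tif").write_bytes(b"")
    (base / "DOC0002.txt").write_text("Invoice for services", encoding="utf-8")

    hits = search_ocr_folder(folder, "Agreement")
    print(hits)  # ['DOC0001.tif']
```

Notice that the search only ever reads the text files. The TIFF image is just what the hit points back to.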

These options may seem cumbersome, but in large document collections they are the way to go. You can use multiple search criteria and conditional searches, such as “AND/OR” searches, to further cull your collection. Understanding OCR and how it works can save you and your client literally hundreds of review hours.
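As a rough illustration of “AND/OR” searching, here is a short Python sketch. It is a simplified stand-in for what a document management package does, not any product’s actual search syntax. The `matches` function, its `all_of` and `any_of` parameters, and the sample documents are assumptions made up for the example.

```python
def matches(text, all_of=(), any_of=()):
    """True if text has every all_of term and, when given, at least one any_of term."""
    lowered = text.lower()
    has_all = all(term.lower() in lowered for term in all_of)           # AND
    has_any = not any_of or any(term.lower() in lowered for term in any_of)  # OR
    return has_all and has_any


# Made-up OCR text, keyed by the TIFF image it came from.
ocr_text = {
    "DOC0001.tif": "Lease agreement between landlord and tenant",
    "DOC0002.tif": "Invoice for legal services",
    "DOC0003.tif": "Sublease agreement with tenant, invoice attached",
}

# agreement AND tenant
and_hits = [name for name, text in ocr_text.items()
            if matches(text, all_of=["agreement", "tenant"])]

# invoice OR landlord
or_hits = [name for name, text in ocr_text.items()
           if matches(text, any_of=["invoice", "landlord"])]

# agreement AND (invoice)
combined_hits = [name for name, text in ocr_text.items()
                 if matches(text, all_of=["agreement"], any_of=["invoice"])]

print(and_hits)       # ['DOC0001.tif', 'DOC0003.tif']
print(or_hits)        # ['DOC0001.tif', 'DOC0002.tif', 'DOC0003.tif']
print(combined_hits)  # ['DOC0003.tif']
```

Each added AND term narrows the set of documents left to review, which is where the time savings in a large collection come from.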

Jason Lopez
Imaging Department


Not Just A Pretty Face (Parts 2 & 3: Interview With A Viking)

Here is the next update in ‘Not Just A Pretty Face’, our series of video interviews that we’ll be doing roughly once per month. Today is extra special, though, as we’re giving you TWO interviews with the same man! Different personalities, but the same man. As I said last time, we’re truly lucky to have some great people working here at New Jersey Legal, and Jay, one of our drivers from our Princeton office, fits that description exactly. Many of our clients in the areas surrounding Princeton know Jay and see him in their firms regularly, and my first video interview with Jay will show him as you know him. The second interview, though…

In my second video interview with Jay, we’ll be interviewing Jason as people know him in his free time, on weekends, and as he would prefer to be (but thankfully can’t) at work. We’ll call this one Interview With A Viking.

As usual, below you’ll find the YouTube video embedded, as well as a link to the interview on another video hosting site in case YouTube is unavailable for you. If you’re viewing the YouTube video, we suggest pressing the ‘HQ’ button after pressing play so you can view the video in high quality.

VIDEO INTERVIEW 1

Video Interview 1 via alternate site, Viddler

VIDEO INTERVIEW 2

Video Interview 2 via alternate site, Viddler

