Techinfobook

What is Document Capture in a SharePoint World?

By Raimund Wasner, Managing Director, Kollabria

Document capture is a term that vendors in the document management industry use to describe what happens after you put a paper document into the scanner and push the scan button.  It is a widely accepted term, but it covers so many features and capabilities that it really requires redefinition in a SharePoint world. Vendors use the term so loosely that it is nearly impossible for the average user to understand which features and capabilities they truly get, and to decide which of those are necessary to do the job at hand.

 


Why Capture, why not Scanning? - A Historical Perspective

Getting a piece of paper into a computing system is a process that consists of several steps.  First and foremost is document preparation (the removal of staples, for example). Next comes the actual scanning and image enhancement (making ugly paper documents look good on the screen and in the subsequent file), then indexing (filling out a "properties" sheet on the document), then extraction (turning characters into machine-readable text), then classification (automatically detecting the type of document), and finally release (sending the image someplace to be stored, and the metadata about the image (index, extracted content, classification) to another place).  Until now, the presumption has been that all of those things had to be done at scan time, and therefore that one product was needed to do all of them.  To demonstrate to people that there is more to scanning than putting a piece of paper into a scanner and pushing a button, the process was referred to as the "capture" process.  Software that could do all of the above, and then some, was called "Capture Management Software".
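The sequence of steps described above can be sketched in a few lines of Python. This is only an illustration of the order in which things happen, not how any vendor's capture product works. Every name in it (Document, run_capture, the stage functions, the two stores) is made up for the example, and each stage is a trivial stand-in for the real work, such as deskewing or OCR.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record for one scanned document."""
    doc_id: str
    image: str                                   # stand-in for the scanned bitmap
    metadata: dict = field(default_factory=dict)
    steps: list = field(default_factory=list)    # names of the stages already run


def prepare(doc):
    # Physical preparation (removing staples, for example); nothing to compute.
    return doc


def scan_and_enhance(doc):
    # Stand-in for image clean-up: here we only trim stray whitespace.
    doc.image = doc.image.strip()
    return doc


def index(doc):
    # Fill out the "properties" sheet; here just the document id.
    doc.metadata["doc_id"] = doc.doc_id
    return doc


def extract(doc):
    # Stand-in for OCR: treat the "image" itself as the recognized text.
    doc.metadata["text"] = doc.image
    return doc


def classify(doc):
    # Toy rule: anything that mentions "invoice" is classified as an invoice.
    text = doc.metadata.get("text", "").lower()
    doc.metadata["type"] = "invoice" if "invoice" in text else "unknown"
    return doc


STAGES = [prepare, scan_and_enhance, index, extract, classify]


def run_capture(doc, image_store, metadata_store):
    """Run every stage in order, then release image and metadata separately."""
    for stage in STAGES:
        doc = stage(doc)
        doc.steps.append(stage.__name__)
    # Release: the image goes to one place, the metadata about it to another.
    image_store[doc.doc_id] = doc.image
    metadata_store[doc.doc_id] = doc.metadata
    doc.steps.append("release")
    return doc
```

Calling run_capture(Document("d1", "  Invoice 42  "), images, metadata) with two empty dictionaries walks the document through all six steps and leaves the cleaned-up image in one store and its index, text and type in the other. In the traditional model, all of that happens in one product at scan time.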

The need for Capture Management Software was dictated by several technology related issues:

1. Document scanners were dumb - they were mechanical devices with cameras inside.  The "thinking" behind them had to be done inside the computer to which the scanner was attached.

2. PCs were slow and underpowered, and you needed a lot of them to do document imaging - the PC was not architected for, or natively capable of, managing, storing and processing large numbers of file objects that were also bitmaps of paper documents. PCs came with no facilities to accomplish this feat and therefore needed specialized software to help them do it.

3. Networks were slow and finicky - Bandwidth was limited.  Scanning and moving large numbers of big files around the network was simply not doable on a PC LAN without separating scanning traffic from retrieval traffic.

For these reasons, scanning, image clean-up, indexing and classification had to be done first.  The final image, and the metadata that indexes it, were then "released" to the document management software, against which users executed retrievals of the document image.  Using a single "capture" platform to perform that sequence of steps was desirable and, given the state of the various technologies, also necessary.  All content management architectures (except some holdouts that wanted to do their own "capture" as part of the document/content management architecture) were built with that expectation in mind.

All solutions are sold as an integrated system: scanners are connected to capture software, the capture process is performed, and the images and metadata are handed off to the management platform, which handles how the objects are manipulated. The preservation of those objects is controlled either by the management software itself or by a separate set of software and hardware.  What do you do if there is no "management" software?  Hold that thought, we'll come back to that in a minute.

What is Scanning?

The customer expects that any product that says it "scans" also actually performs that operation.  When you buy a car, you expect it to start, and you expect it to roll.  Well, there are crummy cars that don't start all the time and don't roll very well.  So it is with scanners. They are not all the same, even though they may have the same "specifications".  What sets scanners apart is the purpose they were built for, what they have inside in order to render the best possible picture, their ability to put the image where you want it, and how reliably they perform those actions.  The expectation is that the scanner will provide you with the best possible "picture" at the push of a single button, without the need to fiddle with a bunch of presets, sliders and other foolishness. It will simply put the "picture" of that document onto your computer desktop or into a folder.

It takes a lot of complex and powerful image processing technology to take a high quality picture of a paper document and turn it into a useable document image.  Without going into too much detail here, there is a long list of features like:

- Make sure the image comes out straight even though the paper went in crooked

- Take out the black dots you get when you scan 3-hole punched paper

- Make carbonless copies look good and readable

- Give you both a color and a black and white image if you need it

... you get the picture.

This is what separates a document scanner from an MFP (Multifunction Peripheral), a cheap MFP from Staples, or any other scanner that is not specialized for documents.  Many high quality scanners (see below) come with those and other features, which is an indication of just how powerful they have become.

Is Capture a Form of ad hoc Document Management?

It is when it comes to SharePoint.  SharePoint for all intents and purposes comes "out of the box" with the ability to share, collaborate, distribute and manage files, including document images.  So, here is the chicken and egg part of this article.  Can I scan first, and "capture" later? Do I need to "capture" at all?

Given the technology built into the high quality scanners (see below), you can simply scan to your desktop or, with some of them but not all, directly into a folder. With Kodak scanners in particular you can scan directly into SharePoint itself, making it very easy for SharePoint users to collaborate around paper documents.  The power of the scanners themselves pretty much gives you everything you need to do that.  Simply adding paper documents as PDFs to SharePoint portals, to your own SharePoint workspace or anywhere else inside SharePoint becomes an easy thing to do.

The built-in taxonomy/folksonomy features of SharePoint allow the person who has scanned a document to tag the scanned images with labels that they make up (folksonomies), or with labels drawn from formal taxonomies provided by SharePoint administrators.  You can use SharePoint search or tag clouds to execute retrievals.  You can also manually add your own metadata (index values) in SharePoint and link it directly to the scanned document.
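To make the difference between the two kinds of labels concrete, here is a small Python sketch. It does not use or imitate the SharePoint API. TAXONOMY, ScannedDocument, tag and find_by_tag are hypothetical names invented for the illustration, and the vocabulary is an assumed example of what an administrator might publish.

```python
from dataclasses import dataclass, field

# Assumed example of a controlled vocabulary supplied by an administrator.
TAXONOMY = {"invoice", "contract", "purchase order"}


@dataclass
class ScannedDocument:
    """Hypothetical scanned file with its labels and index values."""
    name: str
    managed_tags: set = field(default_factory=set)   # labels from the taxonomy
    free_tags: set = field(default_factory=set)      # labels the user made up
    properties: dict = field(default_factory=dict)   # manually added index values


def tag(doc, label):
    """File a label under the taxonomy if it is a known term, else as a free tag."""
    label = label.strip().lower()
    if label in TAXONOMY:
        doc.managed_tags.add(label)
    else:
        doc.free_tags.add(label)


def find_by_tag(docs, label):
    """Return the names of documents carrying the label, managed or free."""
    label = label.strip().lower()
    return [d.name for d in docs
            if label in d.managed_tags or label in d.free_tags]
```

A user who tags scan-001.pdf with "Invoice" and "needs review" ends up with one managed tag and one free tag, and a later search for either label finds the document. That is about all an ad hoc user needs from "capture".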

For casual and ad hoc operations, and indeed for workgroup or even small, infrequent departmental efforts, that may be good enough, and may well be all that is expected.

For departments scanning higher volumes, you can also scan first and capture later, but now you are using the capture software (if it can do so) to bring order to the documents that you have already scanned.  With a product like Kodak Capture Pro, for example, you can organize your scanned images in SharePoint by using it to look at each document and create new metadata, either automatically with the built-in OCR features or manually by adding your own standard index values. You can even make searchable PDFs that SharePoint's FAST engine can search for and retrieve. All of this can be run as batch operations on previously scanned documents or, if you prefer, done at scan time.

For production scanning applications however, it becomes a whole new ball game.  That is fodder for the next article.

Recommended Vendors

In the column below you will find a short list of recommended vendors each of which has some unique products that provide a wide range of document management capabilities.

Kodak 

All Kodak scanners come with native SharePoint scanning support

Kodak Capture Pro  

Software to help you process the image and its content before and after you scan

Canon

Well-made, powerful scanners, optimized for document scanning

Kofax Express

Software to help you process the image and its content before you scan

 

For more assistance, you might consider joining our executive member community, which gives you access to additional content not publicly distributed and also lets you discuss your selection issues with an analyst.  The $495 annual subscription fee will come back to you as thousands of dollars of savings.

Kollabria
The voice behind techinfocenter.com

 


Article Details

Last Updated
28th of July, 2010


Visitor Comments

  1. Comment #1 (Posted by David Houlston )
    It is worth noting that SharePoint 2010 has a much better handle on metadata 'tagging' for documents. This means that a well designed taxonomy within the SharePoint site can now be used for batch-oriented capture. Front this with intelligent software tools like barcode and data lookups and you now have a viable ECM platform to build upon.
  2. Comment #2 (Posted by Raimund Wasner )
    Exactly David!
