RATIONALES AND STRATEGIES FOR DIGITIZING LIBRARY INFORMATION RESOURCES
This paper provides a foundation for understanding digitization, covering planning, implementation, promotion and evaluation, as well as the features of the source material being converted, metadata, and scanning and its technology. It also explains the functional requirements for digital reproduction and how the end product will be put to use. Finally, the paper offers advice and guidance for practicing librarians and information scientists.
Librarians have been digitizing collections for decades, and their collective experience has produced a depth of technical expertise and practical knowledge that has been widely shared among colleagues and well reported in the literature. Libraries nevertheless face several issues when undertaking digitization. Some of these issues are constraints that, if left unaddressed, may limit the potential of digitization to enhance research and teaching. They must therefore be identified and explored, and possible solutions provided, in order to sustain digitization projects effectively.
Digitization is the process of taking traditional library materials that are in the form of books and papers and converting them to an electronic form in which they can be stored and manipulated by a computer (Ian and David 2003). The High-Tech Dictionary defines digitization as the process of translating data into digital form (binary coded files for use in computers). Scanning images, sampling sound and converting text on paper into text in computer files are all examples of digitization.
Digitization converts materials from formats that can be read by people (analog) to a format that can be read only by machines (digital). Flatbed scanners, digital cameras, planetary cameras and a number of other devices can be used to digitize library and cultural heritage materials. Nevertheless, the accessibility of digital information differs among libraries as well as among individual end-users. These differences are attributed to factors such as the availability of the required hardware, personnel, the types of metadata used to organize the materials and the users' ability to identify, select, retrieve and navigate, as well as the libraries' role in creating awareness, facilitating access and stimulating the use of digital information.
Ding (2000) has elaborated the works of Getz (1997), Line (1996) and McKinley (1997) on the advantages of digitization. They maintained that:
i. Digitization means no new buildings are required; information sharing can be enhanced and redundancy of collections reduced.
ii. Digitization leads to the development of Internet-based digital libraries, as the Internet is now the preferred medium of publication and dissemination.
iii. Digital materials can be sorted, transmitted and retrieved easily and quickly.
iv. Access to electronic information is cheaper than its print counterpart when all the files are stored in an electronic warehouse with compatible facilities and equipment.
v. Digital texts can be linked and thus made interactive; linking also enhances the retrieval of more information.
In the light of the foregoing advantages, it is natural today to find more information being digitized and uploaded to the Internet or stored on Compact Disc Read-Only Memory (CD-ROM) so that it becomes correspondingly accessible globally.
There are three main needs for digitization; two or all three of them may apply to your digital library project. Smith (1999), in the work titled "Why Digitize?", gives the following reasons:
i. To preserve the documents: that is, to allow people to read older or unique documents without damage to the originals.
ii. To make the documents more accessible: that is, to serve existing users better, e.g. to allow users to search the full text of the documents; to serve more users than envisaged, including those in remote locations or more than one person at a time; to bring together scattered materials on a specific topic; or to respond to a particular request for a digital library.
iii. To reuse the documents: that is, to convert documents into different formats, for example to use images in a slideshow or to adapt the content for a different purpose.
Digitizing documents can take a lot of time, effort and money. The following questions should be considered before going into digitization.
Reasons to be considered
i. Is it worth digitizing?
Do the documents contain information that is valuable enough to warrant the costs of digitization? There is no point in digitizing documents that are already out of date, no matter how bulky they are, but it is worthwhile to digitize old, unique documents that can be easily damaged, so that people can use them without handling the originals. These unique documents are sometimes called heritage documents.
ii. Who is your audience?
If there are only a few users, or if there are many potential users who do not have computers to access the digital library, they may be better served by sending them photocopies. It may be difficult to judge the demand for documents. It is, however, wise to get other people's opinions. Ask the potential users of the documents what they see as their priorities.
iii. Do the documents form a collection?
It is important to verify if the documents form a collection. In fact, the documents in a digital library should have something in common like a common subject focus.
iv. How easy is it to digitize documents?
Another important factor to take into account is how easy it will be to digitize the documents. Not all hard copy documents can be easily converted to electronic format. There is a need to check the physical characteristics of the documents to understand how easy it will be to digitize them. If you have a lot of documents that are hard to digitize, you might choose not to include them in the digital library. Alternatively, it is advisable to store them as image files rather than as searchable text documents.
Having established the ground rules mentioned above, the next step is to select and prioritize the materials to be digitized. Developing a score sheet in a tabular format will guide you in this job. Creating a digital library collection involves the following steps: planning, implementation, promotion and evaluation. Each of these processes requires careful management if the finished product is to meet the users' needs and conform to accepted quality standards.
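The score sheet mentioned above can be sketched in a few lines of Python. This is only an illustration: the four criteria are taken from the questions in the preceding section, while the 1 to 5 scale, the weights, the names `Candidate`, `weighted_score` and `rank`, and the sample materials are all hypothetical assumptions that a library would replace with its own.

```python
from dataclasses import dataclass

# Hypothetical weights for the four selection questions; they are
# illustrative assumptions, not values given in this paper.
WEIGHTS = {"value": 0.4, "audience": 0.3, "collection_fit": 0.1, "ease": 0.2}


@dataclass
class Candidate:
    title: str
    scores: dict  # criterion name -> score from 1 (low) to 5 (high)


def weighted_score(candidate, weights=WEIGHTS):
    """Return the weighted sum of a candidate's criterion scores."""
    return sum(weights[name] * candidate.scores[name] for name in weights)


def rank(candidates):
    """Sort candidates from highest to lowest digitization priority."""
    return sorted(candidates, key=weighted_score, reverse=True)


# Made-up sample rows of the score sheet.
candidates = [
    Candidate("Heritage manuscripts",
              {"value": 5, "audience": 3, "collection_fit": 5, "ease": 2}),
    Candidate("Outdated circulars",
              {"value": 1, "audience": 1, "collection_fit": 2, "ease": 5}),
    Candidate("Staff theses",
              {"value": 4, "audience": 4, "collection_fit": 4, "ease": 4}),
]

for c in rank(candidates):
    print(f"{c.title:22s} {weighted_score(c):.2f}")
```

A weighted sum is only one possible way to combine the criteria; a library could equally use a simple yes/no checklist if finer scoring is not needed.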
Planning mainly involves identifying the various tasks related to digitization, developing strategies for handling these tasks, identifying the required resources and formulating a timeline for accomplishing the tasks. A successful digitization project may follow these planning steps:
i. Define goals and objectives,
ii. Needs analysis,
iii. Establish a working staff/team,
iv. Agree on a plan of action,
v. Agree on a timetable and end product,
vi. Monitor project process,
vii. Re-assess and revise goals as unforeseen situations develop,
viii. Control the process and the outcomes, and
ix. Assess the outcomes/results.
If a large digitization project is needed, you may consider conducting a feasibility study to assess the viability of the project before detailed planning. The outcome of the feasibility study could be a formal proposal for obtaining management approval or a grant for the project.
a. Define goals and objectives
The first step in planning the digitization of a library collection is to develop a keen understanding of the overall goals and mission of your institution, and to specify the need, its purpose and the target user community. You should indicate whether management, the users or others have expressed this need, and define what the need is. The purpose could be improving the preservation of some rare or delicate materials, improving access to and the visibility of certain material, or facilitating the re-use of documents. It is important to identify the target user community and its profile.
b. Needs analysis
Once these overview aspects of the digitization project have been established, and it is clear that a digitization project will meet your needs and an audience has been determined, the next step in planning is to take stock of your environment and resources in order to assess needs. Typically, this kind of analysis achieves several goals. These include, but are not limited to:
· determining funding sources,
· assessing the staffing required, and
· examining the extent and type of technical support needed.
To conduct these analyses, it is helpful to ask specific questions such as:
* Do you have the hardware to digitize?
* Do you have the software to digitize?
* Do you have adequate storage for master digital images?
* Do you have the software and hardware to provide access to the digitized collections?
* Will your equipment provide the speed of access needed for large files?
* Will you be able to upgrade equipment as newer technologies come online?
ii. Documentation and conventional practices
* Do you have sound documentation, or will you need to substantially re-work your collection data?
* Do you have appropriate metadata for the collection, or can it be derived quickly from previous work on the collection? (e.g., do you have document identification, capture information, acquisition records, provenance information, navigation paths, indexing?)
iii. Administration and staffing
* Do you understand the scale of the project and how it will affect routine work flow?
* Does the cost of the digitization project fit within the planned budget? Is the project worth the cost? Will additional monies be needed to complete the project?
* Do you have enough time to complete the project?
* Do you have sufficiently skilled staff (including those who understand the technical needs of digitization) to effectively complete the project?
* Do you have the means to train staff and keep their training current?
iv. Audience and patrons
* Will the digitized materials meet your audience's needs?
It is from such an analysis that project goals and objectives are refined.
c. Establish a working staff/team
A project's long-term success depends on the accurate assessment of the required human resources. Institutions vary in their areas of expertise, and different types of projects require different skills. Most digitization projects in library institutions will require the following tasks:
* Conservation: A crucial aspect of any digitization initiative will be a conservation assessment of the hard copy materials. Under some conditions this may show that before some material can be digitized, it will require conservation intervention.
* Digitization/Encoding: This can involve digital imaging, keyboarding, Optical Character Recognition, character or full-text encoding, or a combination of these.
* Metadata/Cataloging: The creation of metadata records for the digital material is a specialized task. This work may also involve cataloging the analog material or searching for information to enhance the metadata record where it is absent from the analog version.
* Technical Development/Support: This falls into two distinct areas: the creation or implementation of specific IT solutions for creating, managing, delivering, or preserving the digital material, and the provision of IT support for project hardware and software. This latter area includes workstations, desktop applications, network services, and capture devices.
In smaller institutions staff may carry out tasks in more than one area. For example, the digitizer may also handle technical development, or the project manager may take on metadata creation. A digitization project staff may include any combination of the following: advisory board, project manager, curatorial staff, archive staff, library staff, volunteers, interns, catalogers, systems analyst, programmer, web designer, or photographer. Above all, digitization projects involve a team approach, even if that team is very small. A variety of skills and expertise is required to execute a successful digitization project. Below are some points to consider when hiring new staff for a digitization project.
· Job descriptions
· Desirable employee characteristics, with particular reference to traits such as the capacity to learn constantly and quickly, flexibility, innate skepticism, risk-taking, a public service perspective, team spirit and appreciation of other perspectives, skill at enabling and fostering change, and the capacity and desire to work independently
· Professional staffing standards
d. Agree on a plan of action
There is a need to define the source materials that can be digitized and their key attributes. Examples of source material include project reports, staff publications, working papers, theses, audio and video lectures, songs and musical scores. There is also a need to specify what portion of the material is to be digitized, and whether all the material or only a sub-set will be covered. Remember to assess copyright restrictions.
Define the key features of the library collection you plan to digitize. Identify the nature of the collection with respect to the digitization process, e.g. static or dynamic. Assess the document formats to be used for storing and delivering documents, and the user-end requirements for handling these formats. Plan a strategy for maintaining back-up copies.
e. Agree on timetable and end product
The key task in digitizing library materials is the conversion of the source materials available in hard copy into a digital format. There should be a clear-cut statement about the related requirements, timeframe and processes, namely:
i. how the source material will be converted into the required digital format,
ii. what the digitization requirements are,
iii. the workflow involved in digitizing the source material,
iv. the resources and money required for digitizing and maintaining the digital collections, and
v. the type of information technology (IT) infrastructure required for digitizing and maintaining the digital collections.
f. Monitor project process
As the digitization project gets underway, the project manager should outline the processes involved so that staff time is used efficiently and no process is missed. The timeline for the digitization of collections will naturally be determined by the institutional goals, the staffing of the institution and the fiscal resources available. A general table of the steps in the digitization process, based on the model established by the Library of Congress, is as follows:
% of Time (approximate)
· Establish project goals and objectives
· Select items or collection(s)
· Research copyright, use restrictions, other
  - record information appropriately
· Plan the project
  - a small project or a small collection?
  - a selected project across collections?
  - the audience?
  - type of access? metadata?
· Develop the work-plan with staff and admin.
· Hire or re-assign staff
· Determine division of labor and roles of staff
· Train staff in proper handling, etc.
· Define work space
· Determine the structure and/or arrangement of material
· Prepare the material
  - Organize: reformat material if necessary
  - Preserve: repair or adjust
  - Describe: develop finding aid, catalog, or database
· Determine name and subject authorities (LC, DOC, etc.)
· Apply consistent digital naming conventions
· Establish processes for physical "handling"
  - fragile material
  - oversized material, etc.
· Establish access and use guidelines
· Determine the costs of contracted services
· Establish reputation of service
· Allocate portions to be outsourced
· Prepare RFPs, if necessary
· Draft work statement
· Draft timeline
· Evaluate proposals
Digital Capture Process
· Conduct post-processing
  - create multiples (access, thumbnails, etc.)
  - name files
  - convert text, format, create headers, compress, set up for Web
· Inspect 10% of records for accuracy
· Inspect 10% of images for quality
· Check technical requirements and standards
· Give feedback to administration, contracted services
· Record assessment
· Make adjustments where necessary
· Determine archival storage method
· Record all necessary information for migration purposes
Prepare for Web Access
· Prepare HTML files
· Create indexes
· Assess quality of Web creation
  - Web accessibility for disabled
  - consistent with current standards
· Test, re-design, if necessary
· Establish distribution network (internal or external)
· Prepare educational modules, if applicable
· Qualitative and quantitative assessment
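The two inspection steps in the table (inspect 10% of records for accuracy and 10% of images for quality) amount to drawing a random sample from each batch. The following Python sketch shows one way this could be done; the function name `qc_sample`, the file names and the fixed seed are hypothetical, and rounding the sample size up is an assumption, since the table does not say how to round.

```python
import random


def qc_sample(items, percent=10, seed=None):
    """Pick a random sample of about `percent` percent of items to inspect.

    The sample size is rounded up (an assumption), so even a very small
    batch gets at least one item checked.
    """
    if not items:
        return []
    # Integer ceiling of len(items) * percent / 100, avoiding float error.
    size = max(1, (len(items) * percent + 99) // 100)
    rng = random.Random(seed)  # a seed makes the selection reproducible
    return sorted(rng.sample(items, size))


# Hypothetical file names for one scanning batch.
batch = [f"img_{n:04d}.tif" for n in range(1, 251)]
to_inspect = qc_sample(batch, seed=42)
print(len(to_inspect), "of", len(batch), "images selected for inspection")
```

Recording the seed alongside the assessment makes it possible to show later exactly which items were checked.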
Digitization Project Workflow
- Use document removed cards
- Place in scan file cabinet
5 hours/week, to be done semi-weekly
- Clean scanner
- Calibrate
- Scan
- Save following file name procedures
- Move folder to catalogers' cabinet/flag for cataloging
workflow sheet for daily work
- Prepare Dublin Core
- Create text file for web pages and transfer to web staff
workflow sheet for daily work
- Create jpg and thumbnail images
- Store images in appropriate folders on server
- Create backup CDs for tiffs
- Update index for backup CDs
Web site development
- Using template, create pages for each image.
- Include contextual information in page and Dublin Core in
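Two of the workflow steps above, saving files according to a file name procedure and preparing Dublin Core, lend themselves to a short Python sketch. The naming convention (collection code plus a five-digit item number), the function names and the wrapping `record` element are hypothetical assumptions; only the Dublin Core element set namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def derivative_names(collection, item_no):
    """Build master, access and thumbnail file names from one convention.

    The pattern <collection>_<5-digit number> is a hypothetical example.
    """
    stem = f"{collection}_{item_no:05d}"
    return {
        "master": f"{stem}.tif",           # archival TIFF
        "access": f"{stem}.jpg",           # JPEG for web delivery
        "thumbnail": f"{stem}_thumb.jpg",  # small preview image
    }


def dublin_core_record(fields):
    """Return an XML string with one dc: element per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


names = derivative_names("maps", 12)
xml_text = dublin_core_record({
    "title": "Survey map of the old campus",  # made-up sample metadata
    "type": "Image",
    "format": "image/tiff",
    "identifier": names["master"],
})
print(names)
print(xml_text)
```

Deriving all three file names from one stem keeps the master, access copy and thumbnail of an item linked without a separate lookup table.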
g. Re-assess and revise goals as unforeseen situations develop
A workflow table such as the one above can be modified in any number of ways to reflect the activities, staffing and time frames of a digitization project. The project manager must also recognize that no workflow table or project planning chart is static; it should allow for changes made necessary by unforeseen circumstances. Having a chart or table allows the project manager and project work team to map out the work and see exactly what needs to be done, when and by whom. When all staff are aware of each other's responsibilities and deadlines, the effects of any changes can be more easily understood by all involved.
h. Control the process and the outcomes
The project work team should note that the digital product is not to be ignored after the digitization work has been completed and the product mounted on the Web. It is at this point that concerns about site maintenance and data migration begin to pay off. Even if digital products were self-maintaining, they would probably continue to draw the attention of staff. Most digital collections made available online cause an increase in requests for the material and increase the reference duties of the host institution.
i. Assess the outcomes/results
Finally, there is a need to define how the project is going to be implemented and what the major milestones and time requirements are.
Planning is followed by implementation, that is, getting down to the actual steps required to set up the collection. Management approval for the plan and the required resources must be obtained before proceeding with the implementation. A project manager should be identified and designated to lead the implementation of the digital project. For large digital library projects, it is essential to have a full-time project manager for the project period.
The implementation of a digital library project involves the following activities.
i. Establish the project team
ii. Set up the Information Technology (IT) infrastructure
iii. Procure and install digitization equipment and software
iv. Finalize policies and specifications
v. Complete arrangement of workflow for digitization
vi. Obtain copyright permissions and
vii. Release the digital copies for use.
PROMOTION AND PROVISION OF SERVICES
The digital library collection created should be visible, and it should provide easy access for users. One way of achieving this is to include links to the collection site in the appropriate pages of the library website and of other related on-line services in the organization. In addition to, or in the absence of, remote on-line access to the digital collection, there is a need to explore other modes of providing access. These may include:
i. Setting up local public access computers on the library Local Area Network.
ii. Provision of e-mail based services and
iii. CD-ROM based distribution of the collection.
Creating a digital project involves multiple steps and considerations, including evaluation, which may be formative (carried out during the development of the digital resources) or summative (carried out to assess continuing impacts). An evaluative component therefore needs to be planned during project initiation in order to identify potential improvements as well as the impacts of the digital project over time. This helps in understanding costs and benefits, as well as whether the presentation and inter-operative framework are appropriate for users.
To construct a good evaluation framework, project planners need to understand why evaluation is so important and what the general characteristics of evaluation are. When beginning to contemplate evaluation, consider these questions: Why are you doing evaluation, what do you want to find out, and what will you do with the answers? Answering these questions is the first step in developing an evaluation plan and provides a perspective on the qualities of evaluation.
Three reasons why evaluation matters:
i. To improve performance by helping project staff manage the process of developing, planning, and implementing prototypes and systems.
ii. To provide evidence for usability, cost-effectiveness, and added value of projects, including systems, output, and configurations developed.
iii. To contribute to the overall learning from the project.
And six things to remember about evaluation:
i. Evaluation results from design not accident.
ii. Evaluation has purpose.
iii. Evaluation is about quality.
iv. Evaluation is more than measurement.
v. Evaluation doesn't have to be big.
vi. There is no one right way to evaluate.
There is a wide variety of answers to the second question: what do you want to find out? Thinking about this during the planning process will help you develop appropriate evaluation approaches. For instance, you may want to discover whether or not your workflow is appropriate and efficient for the production of the digital project. On the other hand, you may want to find out whether the website is pleasing to look at or easy to use. Also, some funding agencies require a focus on identifying desired outcomes and basing an evaluation plan on identifying actual outcomes. Deciding what you want to find out determines what kinds of measurements you will take and what questions you will ask of the data collected.
Finally, the intended audience for the evaluation is an important consideration. Are you reporting administrative details or are you reporting user feedback? This affects not only the form that the dissemination plan takes, but also the direct results that can follow from the evaluation process. Evaluation is only as useful as the actions that can be implemented as a result, including decisions that no action needs to take place.
Selection of the necessary equipment can have the greatest impact on the quality of images for a digital project. The development of scanning and digital camera technology has led to a proliferation of equipment varying in quality and availability. Before any equipment is purchased, consider the following overall questions:
a. What can your staff and your physical environment accommodate?
b. What can your current technology support?
c. What type of material are you digitizing (photos, documentation, art images, artifacts, etc.)?
d. What financial restrictions do you have?
e. How will you provide storage for your project?
Hardware: Digital Capture
There are five basic types of digital capture devices.
* Flatbed scanner - The most commonly used type, it accepts a broad range of formats and varies in quality and price. Flatbed scanners are typically built for a scan area of 8" x 11", but larger flatbed scanners are available. They can be purchased with transparency adapters, which handle negatives and slides very easily. High-end scanners have fewer problems with "flare" and now come with front-side USB and FireWire connectors, which are much easier to use, especially with digital cameras.
* Sheet-fed scanner - Similar to the flatbed scanner, it is used for batch work and should never be employed with originals, because potential jamming could damage or destroy them.
* Drum scanner - The drum scanner produces high-quality images but is quite expensive. Because materials are affixed to a rotating drum, drum scanners are not recommended for cultural heritage materials but are suitable for surrogate negatives and transparencies. There are now scanners of this class, sometimes called roll scanners, that use a conveyor belt arrangement instead of a rotating drum, which is less damaging to the original materials.
* Digital camera - Good for 3-dimensional objects, digital cameras vary widely in quality and price. They also have a problem with "flare", or bright patches on the images. Lenses are geared toward the capture of 3-dimensional scenes and may introduce distortions to flat materials. If a digital camera is necessary, it works best in a controlled, studio-type environment.
* Film scanner - Specifically designed to digitize transparent materials such as 35 mm film, the film scanner is particularly good for roll film but less productive for slides. It too has a problem with "flare."
Advantages and Disadvantages of Digital Capture Devices
Flatbed scanner
Advantages:
· Highly addressable
· Many units can handle both transmission and reflection materials
· Flexible software
· Most good up to 600 dpi of real resolution
· Low learning curve
Disadvantages:
· Low productivity, frequent document handling
· Tendency toward streaking and color mis-registration
· Prone to inflated marketing claims
Sheet-fed scanner
Advantages:
· High productivity
· As good as or better than flatbed
· Many automatic features
Disadvantages:
· Unsuitable for fragile, bound, wrinkled, 3-D or inflexible objects
· More expensive than flatbed scanners
· May not handle all sizes of documents
Drum scanner
Advantages:
· Very high image quality
  - high resolution
  - low noise
  - high dynamic range
  - good tone/color fidelity
  - few artifacts
· Very flexible software drivers
· Variable sampling rate
Disadvantages:
· Low productivity
· Frequent handling
· High operator skill level
· Handles limited document types: must be mountable on drum
Digital camera
Advantages:
· Can handle a variety of document/object types (3-D, bound, glass plates, non-flat, oversized)
· Unlimited field size
· User-controlled lighting
· Rapid capture for area arrays
· Non-contact capture
· May have interchangeable lenses
· Generally good image quality
Disadvantages:
· Good models are expensive
· Limited sensor size
· Low productivity for linear array types
· Non-uniformity artifacts common
· Area array devices prone to low dynamic range due to flare
· Moderate skill level required
Film scanner
Advantages:
· Highly productive for roll film
· Low flare/good dynamic range for linear arrays
Disadvantages:
· Low productivity for sheet film or slides
· Potential for high flare in area-array devices
· Dust/scratch artifacts common
· Image quality characterization difficult due to lack of targets
Select the computer that will be used in digital production. It is recommended that one computer be devoted to this task. Some guidelines for choosing it are outlined below. Select a computer that:
· has as much Random Access Memory (RAM) as possible (at least 512 MB). More memory allows the computer to process large amounts of image data more quickly.
· has a processor that is optimized for image manipulation (Pentium IV and higher).
· supports high-speed data input through serial connections such as Universal Serial Bus (USB 2.0) or Institute of Electrical and Electronics Engineers (IEEE) 1394 "FireWire."
· has an International Organization for Standardization (ISO) 9660 compliant CD-RW burner to create archival storage CD-ROMs of your digital images.
If you are going to be purchasing a new computer to act as your digitization station, it is recommended that you review trade publications such as PC Magazine to help make an informed decision. In making these decisions, it is recommended that you involve your technology support as much as possible. Not only can technology personnel provide help in making decisions, but they will be better able to perpetuate their support throughout your digitization project.
In purchasing hardware, consider these issues:
a. What are the resolution capabilities?
b. Is the scan bed large enough to handle your originals?
c. How long does it take to scan one image at your master image specifications?
d. Does the manufacturer have a good reputation for service and durability?
e. Optics quality is important. Manufacturers' claims may sometimes be unreliable, especially those relating to the number of pages scanned per minute and the maximum possible resolution. Look for product reviews, ask those using the equipment, and pay close attention to actual rather than interpolated resolution. A scanner's speed is directly related to the associated computer's capabilities: the more RAM and hard disk space and the faster the CPU, the better.
Some kind of software usually accompanies the digital production device. For a scanner, it is called the scanning software, while for a digital camera it refers to the software that provides the interface for downloading images from the camera to the computer. A second kind of software is used to manipulate the scanned image. This is image manipulation software. It may come with the scanner, but it will then usually allow for only very basic editing of an image. Manipulation software is installed on the hard drive of a computer and is used to orient the image; crop it; adjust brightness, contrast and resolution; transform; flip; or otherwise manipulate the image.
The de facto standard for image manipulation is the software package Adobe Photoshop. It can import the scanning software so that you are able to scan and manipulate the image within the Photoshop umbrella application. There are several versions of Photoshop, ranging from Photoshop Elements to Photoshop Creative Suite Premium. Other imaging software is adequate for basic tasks (Paint Shop Pro, DeskScan II, etc.). It is recommended that you look for software that allows you some flexibility for advanced manipulation and saves the image in all the common formats (e.g., TIFF, JPEG, GIF). It is also recommended that the software allow conversion from one format to another. If the project will require the processing of a large volume of images, it is best to consider additional software that allows batch processing (e.g., Photoshop, DeBabelizer or ImageMagick), which enables the automatic processing of files and the standardization of compression.
When selecting image manipulation software, institutions should look for:
· Ability to work directly with scanner software through Technology Without An Interesting Name (TWAIN) or other plug-ins
· Support for a wide variety of file formats
· Tools for controllable image optimization (e.g., color adjustment or color spaces)
· Usable documentation and reliable technical support
· Ability to create macros for frequently applied functions
· Batch processing
a. Manipulation software
How versatile is the software? What storage formats does it support? What are the options for manipulating the image? Can you turn off some of the options or does the software force you to "improve" the image?
b. Scan software
What are its resolution capabilities? What are the save file options? Can you set the default? Does it allow you to change the default settings, or must you change them each time you scan or start a scanning session?
The main factors to consider in purchase:
Scanners and digital cameras can range anywhere from $100.00 (or less) to thousands of dollars. Remember, you get what you pay for. Scanners in the mid-range of several hundred dollars are likely to be adequate for most scanning projects. Look carefully at warranties, maintenance reputation, reliability, good documentation, flexibility of the scanning platform and non-proprietary interface cards.
Installing the scanner should be straightforward. With only a few exceptions, the scanner is a plug-and-play peripheral. Be very careful to purchase a scanner that does not require a proprietary interface card, as this card may create incompatibility with other computer functions. A Universal Serial Bus (USB) interface has become the standard, although Small Computer System Interface (SCSI-2) is still better, and FireWire connectors are almost as popular because they are faster. SCSI-2 allows attachment of other devices to the computer with few complications (tape drives, Zip drives, CD-ROM drives, etc.), but also requires a special hardware card. The other devices may be required for storage and for transport of large files as the institution's digital collections grow. Installation is not an issue for digital cameras, although you will want to be sure that accompanying hardware will work on your computer platform.
c. Destination of the image
If the use will be for Web images alone, an inexpensive capture device may suffice. If archiving or migration is of concern, higher-end machines are needed.
d. Resolution needed
A 4 x 5 photograph will be fine on a 600 dpi scanner. A 1 x 2 contact print will need a higher resolution, more in the range of 1200 dpi, and will require a more expensive scanner.
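The relationship between original size, scanner resolution and the size of the captured image is simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes the "4 x 5" and "1 x 2" dimensions are in inches.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for an original scanned at the given dpi.

    Assumes the original's dimensions are given in inches.
    """
    return round(width_in * dpi), round(height_in * dpi)


photo = pixel_dimensions(4, 5, 600)      # 4 x 5 photograph at 600 dpi
contact = pixel_dimensions(1, 2, 1200)   # 1 x 2 contact print at 1200 dpi
print(photo)    # (2400, 3000)
print(contact)  # (1200, 2400)
```

The small contact print needs the higher resolution simply to yield an image with a comparable number of pixels.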
e. Number of items to be scanned
If you plan to process large collections, the 30 seconds or more needed to scan one image can add up to an enormous drain on resources. Consider buying a faster scanner or buy two scanners (this won't help if your staff is small!). A "single pass" scanner is the faster scanner but may not capture all the information.
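To see how quickly scanning time accumulates, a rough estimate can be computed as below. This is only a sketch: the function name is hypothetical, the 30-second figure is the one quoted above, and handling time, rescans and breaks are ignored. It also assumes every scanner has its own operator, which is why a second scanner does not help a small staff.

```python
def scanning_hours(item_count, seconds_per_scan=30, scanners=1):
    """Estimate the hours of pure scanning time for a collection.

    Assumes one scan per item and one operator per scanner.
    """
    total_seconds = item_count * seconds_per_scan
    return total_seconds / 3600 / scanners


print(round(scanning_hours(10_000), 1))              # 83.3
print(round(scanning_hours(10_000, scanners=2), 1))  # 41.7
```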
f. Format of items to be digitized
Slides, photographs, color, grayscale, half-tone print, graphics, text, three-dimensional objects, etc. will all need to be treated differently for best results. Can the scanner handle a variety of formats? If there are three-dimensional objects or oversized flat materials to be digitized, a digital camera will need to be purchased. Slides and film require more sophisticated scanners, and the purchase price will be higher if a stand-alone system is purchased.
g. Additional Tools
Some tools will come with the scanner. These often include masks for transparencies and negatives. These are strongly recommended, as a dark surrounding field for transparencies and negatives produces the best scanned image. Compressed air, and/or a soft brush will be useful for photographs and to keep the bed of the scanner free of lint. Tripods and other equipment are necessary for a digital camera to create a stable digitization station. These would be items that would have to be purchased in addition to your camera. And of course, add to this list of tools cotton gloves for those handling originals.
If you are purchasing an expensive capture device, company representatives should demonstrate its capabilities. You should also negotiate a trial period in which you can evaluate the results of digitizing a full range of materials.
DIFFERENT STAGES IN DIGITIZING DOCUMENTS
There are seven stages in digitizing documents for a digital library: Selection, Registering, Scanning, Optical Character Recognition, Proofreading, Formatting, and producing the Final Version.
a. Selection
Materials to be digitized have to be selected and identified. This gives a clear picture of what you are doing.
b. Registering
Before scanning a large number of documents, first register them and use a filing system to keep track of them. If not, you risk misplacing hardcopies, losing files, skipping steps in the process or duplicating work, perhaps without realizing it. There is also the risk of losing electronic versions of files because they have been misnamed or saved in the wrong subdirectory. Moreover, a good filing system is vital so that everyone in the digitizing team knows what they are supposed to do and can fill in for another person in case of absence.
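A register of this kind can be very simple. The sketch below is one hypothetical way to combine a consistent file-naming rule with a record of the stage each document has reached; the naming pattern, stage names and class are assumptions made for illustration, not a prescribed system.

```python
# Stage names follow the seven stages listed above (hypothetical labels).
STAGES = ["selected", "registered", "scanned", "ocr",
          "proofread", "formatted", "final"]


def make_filename(doc_id, page, extension="tif"):
    """Build a predictable file name so scans are not misnamed or misplaced."""
    return f"doc{doc_id:04d}_p{page:03d}.{extension}"


class Register:
    """Track which stage each registered document has reached."""

    def __init__(self):
        self._stage = {}

    def add(self, doc_id):
        if doc_id in self._stage:
            # Registering twice would mean duplicated work.
            raise ValueError(f"document {doc_id} is already registered")
        self._stage[doc_id] = "registered"

    def advance(self, doc_id):
        """Move a document to the next stage and return the new stage."""
        current = STAGES.index(self._stage[doc_id])
        if current + 1 < len(STAGES):
            self._stage[doc_id] = STAGES[current + 1]
        return self._stage[doc_id]


register = Register()
register.add(12)
print(make_filename(12, 3))   # doc0012_p003.tif
print(register.advance(12))   # scanned
```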
c. Scanning
It is necessary to clean and dust off the documents to be scanned and to make sure that all the pages are present and in the right order. If the document is in poor condition, try to find a fresh copy. If you are using a sheet-fed scanner, cut the book open to get individual sheets to feed through the scanner. If necessary, you can rebind the books later. If you do not want to damage the books, you can photocopy each page and feed the photocopy through the scanner, though this uses a lot of paper and reduces the quality of the scan. To scan a document, place it face down on the scanner platen or put the pages into the sheet feeder. Then, in the software, choose a setting, resolution and colour, and scan each page of the document at the settings you have chosen.
d. Optical Character Recognition (OCR)
Optical Character Recognition (OCR) software converts a scanned image into a text file that a word processor can read. To do this, it must first recognize where the text is on the page. The software breaks the text blocks down into lines and then into individual characters. It tries to match the image of each letter against patterns it recognizes as an "a", "b", etc. Problems can arise with languages that use Latin scripts with accented characters. As a solution, use OCR software that is specific to the language of the document.
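The pattern-matching step can be illustrated with a toy example. The sketch below is not how any particular OCR product works; it assumes tiny 3 x 3 black-and-white glyphs and three hypothetical letter templates, and simply picks the template that agrees with the scanned glyph on the most pixels.

```python
# Hypothetical 3 x 3 templates: "1" is an inked pixel, "0" is blank.
TEMPLATES = {
    "I": ("111",
          "010",
          "111"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}


def similarity(glyph, template):
    """Count the pixels on which a glyph and a template agree."""
    return sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )


def recognize(glyph):
    """Return the letter whose template best matches the glyph."""
    return max(TEMPLATES, key=lambda letter: similarity(glyph, TEMPLATES[letter]))


# A "T" with one pixel lost in scanning is still closest to the T template.
damaged_t = ("110",
             "010",
             "010")
print(recognize(damaged_t))  # T
```

An accented character that has no template of its own would be matched to the nearest unaccented letter, which is why language-specific OCR software matters.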
e. Proofreading
This is the act of making corrections to the document text and layout. It can be done in two ways:
i. Comparing the scanned text on the screen with the hardcopy and entering the corrections directly into the computer. The word processor's spellchecker will help catch spelling errors quickly.
ii. Printing out the scanned text and comparing it with the original copy. Mark any corrections on the printout, and then enter them into the computer. This is a slower method, but may be the best option if there are not enough computers for each proofreader.
f. Formatting
The Optical Character Recognition (OCR) software may produce a document that consists of straight text with no columns, headers or footers. You will need to reinsert these by hand or correct where they appear on the page. You may also need to change the typeface, heading styles and so on, to make the document more attractive and readable. Alternatively, you may be able to adjust the settings of your OCR program to preserve the layout of the page.
g. Final Version
For many documents, there is a need to add some information to the text so that readers can identify it easily. For a book, make sure that the book title, the author or the editor, the publisher and the publication date are all included. For a chapter in a book, include the title and the author of that chapter and the original page numbers in the printed version of the book. For a journal article, include the journal title, the date, the volume and the issue number, the article title, the authors and the page numbers in the original printed journal. In other words, there is a need to add metadata to describe each document.
It is expensive for institutions to go back and re-digitize their holdings. Few ever do so. In addition, many originals could suffer from the handling and exposure to bright light required by digitization. Therefore, it is best to simply "scan once," create a master image, and make any future duplicates from it.
Step One - Create a Master Image
The highest quality copy of a digital image, often called the master image, is expected to serve as a high-quality surrogate for the original. As such, it should represent the unmanipulated original, be created at a high resolution and be stored in an uncompressed format, usually Tagged Image File Format (TIFF). High resolution means a large amount of information captured, and a large amount of information captured usually means a higher quality digital image. The higher the quality, the longer the life of the digital copy and the more versatile its uses. It is the master image that holds the promise of versatility and longevity. From it, high quality prints or publications might be made, as well as derivatives for a variety of uses.
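The storage cost of an uncompressed master can be estimated from its pixel dimensions. The sketch below uses a hypothetical function name, assumes 24-bit colour by default, and ignores the small overhead of the file header, so treat the result as an approximation.

```python
def uncompressed_size_mb(width_px, height_px, bits_per_pixel=24):
    """Approximate size in megabytes of an uncompressed image.

    Assumes 24-bit colour unless told otherwise; file header overhead
    is ignored.
    """
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return size_bytes / (1024 * 1024)


# A 2400 x 3000 pixel master (for example, a 4 x 5 inch photograph at 600 dpi).
print(round(uncompressed_size_mb(2400, 3000), 1))                    # 20.6
print(round(uncompressed_size_mb(2400, 3000, bits_per_pixel=8), 1))  # 6.9
```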
Step Two - Create an Access Image
Access images are lower resolution copies taken from the master copy by using a "save as" function and changing the storage format and resolution. They may be of varying quality and are generally manipulated for better display on the screen or page (cropping, re-sizing, etc.). Additional images, such as "thumbnails" (even lower resolution copies), may also be created from the master or access image. These thumbnails allow for even quicker downloads of pages and faster retrieval of large numbers of images.
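When derivatives are re-sized, the proportions of the master should be preserved. The sketch below only works out target pixel dimensions; the function name and the 1000-pixel and 150-pixel limits are assumptions chosen for illustration, and the actual resampling would be done in the imaging software.

```python
def fit_within(width_px, height_px, max_side):
    """Scale dimensions so the longest side is at most max_side pixels,
    keeping the original proportions. Images already small enough are
    returned unchanged."""
    longest = max(width_px, height_px)
    if longest <= max_side:
        return width_px, height_px
    new_width = max(1, round(width_px * max_side / longest))
    new_height = max(1, round(height_px * max_side / longest))
    return new_width, new_height


master = (2400, 3000)
print(fit_within(*master, 1000))  # access image: (800, 1000)
print(fit_within(*master, 150))   # thumbnail: (120, 150)
```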
Step Three - Storing the Master Image
The master image is the copy to be maintained for the long term. As such, it should be stored appropriately. Master images take up a great deal of space, and most institutions will not wish to store them for the long term on computer hard drives. Institutions maintaining large numbers of digital images will wish to work with a form of tape or server backup, while institutions engaged in more modest digital projects should copy master images onto CDs. If an institution decides to use the CD as a storage medium, it is suggested that two copies of each CD be prepared and stored separately. One will serve as the "master" CD and the other will be the "use" CD from which access images, copies for users, etc. may be prepared. CDs used in this way should be "refreshed" regularly, that is, copied from the old CD to a new CD (approximately every 5 years). Not all CDs are equal. Master images should be stored on gold CD-Rs. The gold in these CDs does not oxidize, so the storage medium lasts longer.
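A refresh schedule is easy to lose track of, so it helps to record the date each CD was written. The sketch below shows one hypothetical way to flag discs that have reached the five-year mark suggested above; the function and constant names are assumptions.

```python
from datetime import date

REFRESH_YEARS = 5  # approximate refresh interval suggested in the text


def refresh_due(burned_on, today):
    """Return True if a CD written on burned_on should be refreshed by today."""
    try:
        due = burned_on.replace(year=burned_on.year + REFRESH_YEARS)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        due = burned_on.replace(year=burned_on.year + REFRESH_YEARS, day=28)
    return today >= due


print(refresh_due(date(2006, 3, 1), date(2010, 12, 31)))  # False
print(refresh_due(date(2006, 3, 1), date(2011, 3, 1)))    # True
```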
How do we find the materials in our libraries, archives, museums, and historical societies? The descriptive tools that allow special collections to be accessed take many forms. Yet libraries, archives, museums, historical societies and indeed all cultural heritage institutions depend upon these tools of access to make them viable. Among the many information repositories, libraries have the longest history of providing accessibility in a standardized format. The broad acceptance of cataloging conventions such as the Anglo-American Cataloging Rules (AACR2) and the Machine Readable Catalog (MARC21) format allows users to move easily from library to library. In contrast, historical societies, museums, and archives have often used locally developed cataloging and access tools, reflecting the special nature of their holdings. Archives, for example, hold materials in many different formats (e.g., manuscripts, oral histories, photographs, objects, and films). Historical museums are even more varied, and art museums are hybrids, combining many objects, archives, and library materials. From institution to institution (and individual collection to individual collection), their access tools vary in descriptive elements and formats. The uniqueness of special collections has made the development and implementation of broad, uniform practices difficult, preventing broad cross-collection access.
Recent advances offer hope for greater and more uniform access. Digitization has been a clear part of those efforts, and every digital project must address metadata issues to provide the best access to its materials and to ensure that its collection information is available in the larger arena of digital access. Today, there are several good descriptive systems available for use in cultural institutions. The most widely adopted is Dublin Core, a general descriptive system used by many multi-partner digitization projects to manage their electronic resources. Other systems include Encoded Archival Description (EAD), which is a system of encoding finding aids for the Web, and the Text Encoding Initiative (TEI), a system for encoding textual documents primarily from the humanities and social sciences. These systems are generally favored by large, established institutions. Other descriptive systems have been developed for specific formats.
These individual systems can be related to each other through the descriptive elements that they share (e.g., creator/author or subject). This process is often referred to as "crosswalking." Shared collection access methods (e.g., searching by subject across the holdings of several archives or across an archive, museum, and library) were difficult to accomplish in the pre-digital age. With technology, the dream of shared access is rapidly becoming a reality.
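A crosswalk is, at its simplest, a table that maps the fields of one system onto the shared elements of another. The sketch below maps four MARC field tags (245 title statement, 100 personal name main entry, 650 topical subject, 520 summary) onto Dublin Core element names. It is a deliberately simplified illustration, not a complete or official crosswalk, and the sample record is invented.

```python
# Simplified, partial mapping chosen for illustration only.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
}


def crosswalk(record, mapping):
    """Translate a record keyed by source field into one keyed by target element.

    Fields with no equivalent in the mapping are left out.
    """
    result = {}
    for field, values in record.items():
        target = mapping.get(field)
        if target is None:
            continue
        result.setdefault(target, []).extend(values)
    return result


# Invented sample record: each field holds a list of values.
marc_record = {
    "245": ["Letters from the coast"],
    "100": ["Doe, Jane"],
    "650": ["Fishing", "Harbors"],
    "999": ["local shelf note"],   # local field with no shared equivalent
}
print(crosswalk(marc_record, MARC_TO_DC))
```

Once two collections have been translated into the same shared elements, a single subject search can run across both.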
The uniform description of resources (what librarians have always called cataloging) in an electronic form is one of the first steps in creating shared access. Describing a resource is a difficult process, but an important one if the resource is to be accessible to the user. The more conformity to uniform practices, the more likely the resource will be located and used. The choice of a "cataloging system" is actually a choice of "metadata" formats.
What Is Metadata?
According to North Carolina Exploring Cultural Heritage Online guidelines (2006), Metadata is formally defined as "information about information" or any data associated with a resource that describes that particular resource. A more general definition that is useful for library and cultural institutions is "structured information about any information resource of any media type or format." In this context, an information object is anything that can be addressed and manipulated by a human or a system as a discrete entity. The essential aspect of a metadata system that describes an object, then, is its ability to provide a structured format for information about that object.
Metadata itself is essentially a modern term for the bibliographic information that libraries traditionally entered into their catalogs. However, the term metadata is most commonly used to refer to descriptive information about electronic resources. Library and cultural heritage institutions have been creating metadata for as long as they have been collecting cultural materials for their preservation and presentation to the public. The impact that the digital environment has had on metadata is the creation of electronic information in structured formats.
The creation of metadata for digital resources is an important part of any digitization project and must be incorporated into a project's workflow. Metadata should be created and associated with the digital resource to support the discovery, use, management, reusability, and sustainability of that resource. Metadata relating to digital resources is most often divided into five conceptual types (with some overlap among the five), as described by Duval et al. (2002) in "Metadata Principles and Practicalities."
a. Descriptive metadata: information used for the indexing, discovery, and identification of a digital resource.
b. Analytical metadata: information about the subject and context of a digital resource.
c. Structural metadata: information used to display and navigate a digital resource; also includes information on the internal organization of the digital resource. Structural metadata might include information such as the structural divisions of a resource (e.g., chapters in a book) or sub-object relationships (such as individual diary entries in a diary section).
d. Administrative metadata: information needed for the management of the digital resource, which includes information regarding access, display, and rights management.
e. Preservation metadata: information about the digital image for preservation purposes, including the resolution at which the images were scanned, the hardware/software used to produce the image, compression information, pixel dimensions, etc., important for migration and long-term sustainability of the digital resource. This can also be referred to as technical metadata.
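The five types can be pictured as five groups of fields attached to one digital object. The sketch below is a hypothetical record for a scanned diary; the field names and values are invented for illustration, and the small check simply reports which of the five types have not yet been filled in.

```python
METADATA_TYPES = ("descriptive", "analytical", "structural",
                  "administrative", "preservation")

# Invented example: one digital object described under all five types.
record = {
    "descriptive": {"title": "Diary of a schoolteacher", "identifier": "ms-0042"},
    "analytical": {"subject": ["Education", "Rural life"]},
    "structural": {"sections": ["January entries", "February entries"]},
    "administrative": {"rights": "Reproduction requires permission"},
    "preservation": {"resolution_dpi": 600, "format": "TIFF",
                     "pixel_dimensions": (2400, 3000)},
}


def missing_types(rec):
    """List the metadata types that are absent or empty in a record."""
    return [kind for kind in METADATA_TYPES if not rec.get(kind)]


print(missing_types(record))                            # []
print(missing_types({"descriptive": {"title": "x"}}))
```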
"Finding" or "accessing" holdings is the most visible role of metadata in the electronic environment. Today's users are coming to the digital resource from their home, work, school, etc., at any time of the day, and often without the assistance of a librarian, archivist, curator, museum educator, or other cultural heritage professional. In addition, digital resources present their own unique characteristics, and cultural institutions need to consider these characteristics as they try to integrate management of these resources into their traditional holdings. Metadata for digital resources needs to provide information that:
* certifies the authenticity and degree of completeness of the content;
* establishes and documents the context of the content;
* identifies and exploits the structural relationships that exist between and within information objects;
* provides a range of intellectual access points for an increasingly diverse range of users;
* provides some of the information that an information professional might have provided in a physical reference or research setting;
* provides information about the digital resource to the information professional to aid in the resource's sustainability.
Unfortunately, there is no uniform metadata solution for all library and cultural materials. The metadata for text is different from the metadata for visual images. Further, the elements used to describe an object can change and grow as more becomes known about that object. Metadata should be thought of as a dynamic process. New metadata schemes continue to emerge for different formats of library materials or for different needs in managing those materials. It is important to stay current as the field of metadata grows and changes.
How Do I Select the Best Metadata Standard for my Materials?
As indicated earlier on, there is a wide variety of metadata standards available to library and cultural institutions. Selection of a standard should be based on the needs of the repository and its users. Deciding which metadata system to use for a collection can be a very individualized process and a difficult one. Here are some general guidelines that can be followed while making choices about metadata systems:
· What is the purpose of the metadata process?
· Is there an institution similar to ours that is using a particular metadata documentation standard? Are they happy with the standard they chose? What would they do differently if they had it to do over?
· What is the reputation of the selected standard? How widely used is it? How old is the standard? Is it likely to be around for some time?
· Are my practice and experience compatible with the standard? Can I understand the elements as they relate to my collection?
· If I select a specific standard, will my collection be compatible with larger systems?
Dublin Core 15 Metadata Element Set
Element: Description of Item
Title: The name of the object: the title of a book, the name given to a work of art, the name of a manuscript collection, a map name, etc. If the item is unnamed, give it a descriptive title.
Creator: The person(s), family(ies), organization(s) or corporate body(ies) primarily responsible for the creation of the object, collection, or item being described.
Subject and Keywords: What the content of the resource is about or what it is, expressed by terms (topical, personal, corporate, or geographic) for the significant people, places, organizations, events, and topics reflected.
Description: A textual description of the content of the resource, such as an abstract, table of contents, or free-text account of the object. This information can be taken from the object or provided by the record creator and can include specialized information not included in other elements.
Publisher: The institution or repository that makes the resource available on the Web.
Contributor: The person(s) or organization(s) that made significant secondary contributions to the creation of the object, collection, or item being described.
Date: The date of creation of the original item.
Resource Type: The genre or nature of the resource, such as sound recording, image, physical object, collection, or text.
Format: The extent of the original item being described. Can be given in number of pages, linear feet, dimensions, etc.
Identifier: A character string or record number that clearly and uniquely identifies a digital object or resource. This element may be the accession number, record number, International Standard Book Number (ISBN), Uniform Resource Identifier (URI) or Uniform Resource Locator (URL), World Wide Web address, or Digital Object Identifier (DOI).
Source: A reference to a resource from which the item was derived, e.g. a larger collection from which an item was selected or a book from which a chapter was chosen. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.
Language: The language(s) of the intellectual content of the resource. This can be the language(s) in which a text is written or the spoken language(s) of an audio or video resource. Recommended best practice for the value of the language element is defined by RFC 1766, which includes a two-letter language code (taken from the ISO 639 standard), followed optionally by a two-letter country code (taken from the ISO 3166 standard). For example, 'en' for English, 'fr' for French or 'en-GB' for English used in the United Kingdom.
Relation: The relationship to other objects. The element includes a variety of refinements to express the kind of relationship that exists between the resource and the other objects. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.
Coverage: (Spatial) The geographic location(s) associated with the resource. (Temporal) The time period associated with the resource. Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names (TGN)) and, where appropriate, to use named places or time periods in preference to numeric identifiers such as sets of coordinates or date ranges.
Rights Management: A rights management or usage statement, a URL that links to a rights management statement, or a URL that links to a service providing information on rights management of the resource. Rights information often encompasses Intellectual Property Rights (IPR), copyright and various property rights. If the rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.
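A Dublin Core description can be held as a simple set of element and value pairs and then written out as XML. The sketch below uses the standard Dublin Core element namespace; the wrapping "record" element, the function name and the sample values are assumptions made for illustration, and a real project would follow the wrapper format required by its own system.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")


def to_dc_xml(record):
    """Serialize a dict of element name -> list of values as simple XML.

    Rejects names that are not among the 15 Dublin Core elements.
    The <record> wrapper is a hypothetical container, not part of Dublin Core.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name in DC_ELEMENTS:              # keep a stable element order
        for value in record.get(name, []):
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values.
sample = {
    "title": ["Harbor at dawn"],
    "subject": ["Harbors", "Fishing boats"],
    "type": ["image"],
    "language": ["en"],
}
print(to_dc_xml(sample))
```

Every element is optional and repeatable, which is why each value is stored as a list.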
Other Metadata Standards
While Dublin Core is the baseline of metadata standards, there are other standards that provide richer descriptive tools, retrieval possibilities, and other management capabilities for specific types of cultural materials. For example, Dublin Core is not as efficient a tool as some systems when describing relationships between materials and hierarchies of information. This is significant, for instance, in creating descriptions for manuscript and archival collections. Typically, an individual collection of manuscripts is composed of series of materials, a series is composed of sub-series, a sub-series is composed of files or further sub-series, and a file is composed of individual items. A brief list of other metadata standards, as given by NC ECHO (2006), is as follows.
Encoded Archival Description (EAD)
Another metadata standard, Encoded Archival Description (EAD), has been developed to address the need to describe relationships between materials.
Encoded Archival Description (EAD) is a metadata system that leverages the structure of archival description found in archival finding aids through its encoding standard. It is an Extensible Markup Language (XML) document type definition (DTD) that enables EAD-encoded finding aids to be searched, retrieved, displayed, and exchanged. EAD is platform independent and is maintained by the Society of American Archivists. It is a recognized international standard.
EAD is especially helpful in information retrieval because of its ability to identify particular areas of description in the finding aid and its ability to present information in a hierarchical fashion.
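The hierarchy that a finding aid captures (collection, series, sub-series, file, item) can be pictured as a nested structure. The sketch below is not EAD markup; it is a plain illustration with invented titles and a hypothetical function that prints each level indented under its parent.

```python
# Invented example of an archival hierarchy.
collection = {
    "level": "collection", "title": "Smith Family Papers", "children": [
        {"level": "series", "title": "Correspondence", "children": [
            {"level": "subseries", "title": "Outgoing letters", "children": [
                {"level": "file", "title": "1901-1905", "children": [
                    {"level": "item", "title": "Letter to the county clerk",
                     "children": []},
                ]},
            ]},
        ]},
    ],
}


def outline(node, depth=0):
    """Return the hierarchy as indented lines, one line per component."""
    lines = ["  " * depth + f"{node['level']}: {node['title']}"]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(collection)))
```

A flat element set has no natural place for this parent and child nesting, which is the gap that a hierarchical encoding fills.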
Encoded Archival Context (EAC)
EAC is an emerging standard for the description of record creators. It provides sections on identity, description (both formal and informal), relationships, and record maintenance. The standard approaches cultural heritage materials from a new perspective. Rather than describing materials, it describes the creators and provides connections to the materials of those creators. North Carolina Exploring Cultural Heritage On-line (NC ECHO) has a working group, North Carolina Encoded Archival Context (NCEAC), that has examined the beta standard and adopted a union model for the NC ECHO project. This project is in its infancy but hopes to become a resource for the public and institutions alike. Most importantly, it will rely on partner institutions contributing information about the people and corporate bodies that have created the state's cultural heritage materials.
Visual Resources & Objects Standards
Categories for the Description of Works of Art (CDWA)
CDWA was created by the Getty Art Museum for the description of works of art and is used by museums throughout California to catalog their holdings.
Cataloguing Cultural Objects (CCO)
CCO was designed for the description of many types of cultural objects, including architecture, archaeological sites, and artifacts as well as functional objects from the realm of material culture. Like CDWA, it focuses on works of art and their visual surrogates and is not directly intended for historical objects, science and technology specimens, and the like. It focuses on the data content standard and recommendations of controlled vocabularies. The primary emphasis is descriptive metadata intended to describe a cultural work. That description is then used in systems intended to manage that data. CCO excludes administrative and technical metadata insofar as they do not affect the description of the object, and it is therefore recommended that CCO be used in conjunction with other standards to address all the metadata needs of an institution.
Text Encoding Standards
Text Encoding Initiative (TEI)
TEI is the standard system of encoding transcribed documents for presentation on the Web (often rare books, pamphlets, etc.). It is not used to mark up finding aids or to "catalog" digital resources as Dublin Core is. TEI is, however, one of the most prominent systems used to bring full-text resources (and not just images of those resources) to researchers via the Web. The TEI provides guidelines for the long-term preservation of electronic data, and a means of supporting effective usage of such data in many subject areas. It is the encoding scheme of choice for the production of critical and scholarly editions of literary texts, for scholarly reference works and large linguistic corpora, and for the management and production of detailed metadata associated with electronic text and cultural heritage collections of many types. NC ECHO has established a TEI-NC group, which will provide implementation guidelines for institutions interested in using TEI.
Oral histories present interesting issues for metadata. NC ECHO intends to work closely with the oral history projects in North Carolina to provide guidance on metadata for those oral histories. The group has yet to form, but will most likely produce guidelines for oral history metadata.
Maintaining information about the creation and maintenance of your digital images is an important aspect of digitization because it ensures the longevity of your work. NC ECHO has constructed a preservation metadata standard to aid in the long-term sustainability of the digital content created in digitization projects. The tools developed include a content standard as well as a Microsoft Access database tool available for institutions that might need it.
A controlled vocabulary is a set of terms used consistently and defined very carefully. It helps little if archivists, museum professionals, and librarians recognize the same metadata fields, but then choose to fill them with their own descriptive phrasing. That is where controlled vocabularies enter the picture. A controlled vocabulary is used when the search results need to be consistent. If indexing is to work, a controlled vocabulary is a must.
Several different descriptive elements lend themselves to controlled vocabularies. Names of creators or contributors, genres or mediums, and subject listings all reap the benefits of controlled vocabularies. Other fields, such as Date and Language rely on data content standards that dictate the way that that information is entered.
The best practice is to select terms from controlled vocabularies, thesauri, and subject heading lists to use as subject elements, rather than just using keywords. Employing terminology from controlled vocabularies ensures consistency and can improve the quality of search results. It also can reduce the likelihood of spelling errors when inputting metadata records. Recognizing the diverse nature of the statewide initiatives and the involvement of a broad range of cultural heritage institutions, controlled vocabularies have been expanded to include subject discipline taxonomies and thesauri. Some institutions are developing geographic-based lists of terms that may be helpful in achieving a level of consistency in terminology. Many of the thesauri, subject heading lists, and taxonomies are currently available via the web.
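Checking entered terms against a controlled vocabulary can be automated at data-entry time. The sketch below uses a tiny invented vocabulary and hypothetical function names; it accepts a term only if it matches a preferred term once differences of case and spacing are ignored, and returns the preferred spelling so that records stay consistent.

```python
def normalize(term):
    """Ignore case and extra spaces when comparing terms."""
    return " ".join(term.lower().split())


def check_terms(terms, vocabulary):
    """Split entered terms into accepted (as preferred forms) and rejected."""
    lookup = {normalize(preferred): preferred for preferred in vocabulary}
    accepted, rejected = [], []
    for term in terms:
        preferred = lookup.get(normalize(term))
        if preferred is None:
            rejected.append(term)      # not in the vocabulary: needs review
        else:
            accepted.append(preferred)
    return accepted, rejected


# Invented vocabulary and entries.
vocabulary = ["Photographs", "Oral histories", "Maps"]
entered = ["photographs", "Oral  Histories", "Pictures"]
print(check_terms(entered, vocabulary))
# (['Photographs', 'Oral histories'], ['Pictures'])
```

A rejected term such as "Pictures" would be referred to the cataloguer, who either chooses the preferred term or proposes an addition to the vocabulary.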
Digitization involves a number of standards and best practices. While standards and best practices are not new to library and cultural heritage institutions, the nature of digitization makes adherence to them an important matter. Deciding which standards to follow and which practices are really best can be a difficult and overwhelming task. In addition, digital standards are more dynamic than traditional standards; they must often be reshaped quickly to accommodate change. This dynamic nature has indeed created a shift in the way we understand the term "standard." Yet, dynamic or static, standards make it easier for everyone to use information. We hope librarians will keep up to date on changes to standards and best practices.
1. Conway, Paul (1994). The Implications of Digital Imaging for Preservation. In Preservation of Library and Archival Materials, 2nd ed. Edited by Sherelyn Ogden. Andover, MA: Northeast Document Conservation Center.
2. Ding, Choo Ming (2000). Access to Digital Information: Some Breakthroughs and Obstacles. Journal of Librarianship and Information Science, 32 (1).
3. Duval, Erik et al. (2002). Metadata Principles and Practicalities. D-Lib Magazine, Vol. 8.
4. Elisam, Magara and Stephen, Mayega (2005). Digitization of Theses and Dissertations in Universities in Developing Countries: Strategies for Makerere University. University of Dar es Salaam Library Journal, Vol. 7, No. 2.
5. Ian, H. Witten and David Bainbridge (2003). How to Build a Digital Library. London: Morgan
6. North Carolina Exploring Cultural Heritage On-line (2006). Guidelines for Digitization. North Carolina Exploring Cultural Heritage On-line, USA.
7. Smith, Abby (2001). Strategies for Building Digitized Collections. Digital Library Federation, Council on Library and Information Resources: Washington, D.C.
8. Ibid. (2001). Why Digitize. Council on Library and Information Resources: Washington, D.C.