By Naveh Greenberg, Director, US Defense Development, DCL
Any federal content digitization effort incorporates significant requirements for quality control (QC), whether in the civilian or Department of Defense (DoD) sector. Many agencies mistakenly consider QC to be a fully manual, time-consuming, and extremely expensive element of a content conversion initiative. In fact, by focusing on upfront analysis of your content needs and choosing the right tools, you can bring a level of automation to the QC process that saves time, effort, and dollars.
Identify what’s really in your legacy content
The goal of most DoD standards, as well as of content QC processes, is to ensure that you maintain clean content that can be reused as needs dictate. That’s difficult to envision when faced with tremendous volumes of legacy content that might be stored as hard copy, PDFs, microforms, MS Word files, and even multiple versions of older markup languages. Often, an agency has little understanding of how much structure exists in its content, much less how much might be replicated. In fact, we find that up to 50 percent of legacy content is duplicative. Without analyzing what’s in your content before starting a conversion initiative, you could spend a substantial portion of your budget converting twice as much content as you actually need.
So it makes sense to complete a deep-dive analysis of the content. With tools such as DCL’s Harmonizer, that analysis of content quality can be highly automated. The software examines thousands of pages looking for exact matches and near matches; near matches can result from simple typos or from content that is “rewritten” to apply to similar procedures or to a range of parts. Where content has been tagged to support a standard such as S1000D or 40051, Harmonizer provides clues on how to break the data up, or use applicability, in a matter of minutes.
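To make the distinction between exact and near matches concrete, here is a minimal Python sketch. It is not Harmonizer's algorithm, which this article does not describe; it is an illustration that assumes paragraphs are plain strings and that a similarity ratio from the standard library's `difflib` is an acceptable stand-in for "near match." The function names, the sample paragraphs, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial layout differences are ignored."""
    return " ".join(text.lower().split())


def find_matches(paragraphs, near_threshold=0.85):
    """Return (exact, near) lists of paragraph index pairs.

    near_threshold is an assumed cutoff, not a value taken from any tool.
    Near pairs also carry their rounded similarity ratio.
    """
    normalized = [normalize(p) for p in paragraphs]
    exact, near = [], []
    # Pairwise comparison is quadratic; fine for a demo, too slow for large sets.
    for i, j in combinations(range(len(normalized)), 2):
        if normalized[i] == normalized[j]:
            exact.append((i, j))
            continue
        ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
        if ratio >= near_threshold:
            near.append((i, j, round(ratio, 2)))
    return exact, near


# Hypothetical sample paragraphs, invented for illustration.
paragraphs = [
    "Remove the four bolts securing the access panel.",
    "Remove the four bolts securing the access panel.",
    "Remove the four bolts securing the acess panel.",  # typo variant
    "Inspect the hydraulic line for leaks.",
]
exact, near = find_matches(paragraphs)
print("exact:", exact)
print("near:", near)
```

In this sample, the first two paragraphs are reported as an exact match, and the typo variant is reported as a near match to each of them. A production tool would need a far more scalable comparison strategy than the all-pairs loop used here.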
Uncover the reuse potential of content and automate cleanup before conversion
DCL always works closely with clients throughout the entire digitization project, and the time spent in planning the production process pays off by reducing the amount of content to be converted, and increasing the quality of the data before processing it for final production. Harmonizer plays a key role, not only in identifying redundant content at multiple levels of granularity, but also in defining applicability when going to standards such as S1000D. It provides a detailed report that lays the groundwork for implementing the full conversion production process.
We implement a production process that accommodates both the conversion goals and the nature of the content being converted. Harmonizer is one of the automation tools used at several points in that process:
Text extraction: Clean up typographical errors.
Project scoping: Determine reuse potential for the ROI calculation, using metrics on reuse potential across the document set.
Analysis: The Sequence Matches Report shows instances of two or more consecutive paragraphs with matches elsewhere in the document set. This can be used when looking for data chunking in S1000D.
Applicability/effectivity: By using the Information Summary Report and analyzing the close matches, some paragraphs can be converted once and used many times, with slight modification, using applicability.
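The idea behind a sequence match, two or more consecutive paragraphs that recur elsewhere in the document set, can be sketched in a few lines of Python. This is an illustration only and makes no claim about how the Sequence Matches Report is actually produced. It assumes each document is a list of paragraph strings and considers exact matches after whitespace and case normalization; the function name, document names, and sample text are hypothetical.

```python
from collections import defaultdict


def sequence_matches(documents, min_run=2):
    """Find runs of min_run consecutive paragraphs that occur in more than one place.

    documents: dict mapping a document name to its list of paragraphs.
    Returns a dict mapping each repeated run (a tuple of normalized
    paragraphs) to the list of (document name, start index) locations.
    """
    locations = defaultdict(list)
    for name, paragraphs in documents.items():
        normalized = [" ".join(p.lower().split()) for p in paragraphs]
        # Slide a fixed-size window over the paragraphs of each document.
        for start in range(len(normalized) - min_run + 1):
            window = tuple(normalized[start:start + min_run])
            locations[window].append((name, start))
    return {run: locs for run, locs in locations.items() if len(locs) > 1}


# Hypothetical document set, invented for illustration.
docs = {
    "manual_a": ["Disconnect the battery.", "Remove the cover plate.", "Inspect the gasket."],
    "manual_b": ["Wear eye protection.", "Disconnect the battery.", "Remove the cover plate."],
}
repeated = sequence_matches(docs)
for run, locs in repeated.items():
    print(run, "->", locs)
```

Because the window has a fixed size, a longer repeated run shows up as several overlapping windows rather than as one merged block. Merging adjacent windows would be a natural next step when deciding where to chunk content for reuse.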
Align quality control of content digitization to standards
Automation makes a digitization project manageable, and the Harmonizer tool sets the stage for automating substantial portions of the quality control effort. That means the project team can focus on other critical aspects of ensuring the agency’s content repository is accurately digitized and efficiently managed, no matter which standards it must comply with.