
Structured and Unstructured Data

Data is typically regarded as either structured, such as databases that are built and maintained by a database application, or unstructured, which simply means everything else. Structured data is managed by technology that enables querying and reporting against predetermined data types and understood relationships. Companies have been using data mining software for years to extract business intelligence from their structured data. Because the database fields are clearly defined, it is easy to run queries and formulas that extract meaningful information, not just raw data. Computers are great at handling massive quantities of structured information, something people have a hard time doing. Coping with both structured and unstructured data contributes to the increasing storage management gap.
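To make the point about clearly defined fields concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns and the sample rows are hypothetical, invented only for illustration; they do not come from any particular product or data set.

```python
import sqlite3

def revenue_by_region(rows):
    """Load (region, product, amount) rows into a typed table and total them by region."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Predetermined data types: every field is declared before any data arrives.
    conn.execute(
        "CREATE TABLE orders ("
        "region TEXT NOT NULL, product TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Because the fields are defined, one query turns raw rows into a summary.
    result = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data.
sample_orders = [
    ("East", "disk", 1200.0),
    ("West", "tape", 800.0),
    ("East", "tape", 300.0),
]
totals = revenue_by_region(sample_orders)
print(totals)
```

The query works only because the table declares what a region and an amount are; the same figures buried in the body of an e-mail would offer nothing to group or sum.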

Unstructured data is what we find in e-mails, reports, scanned data, PowerPoint presentations, voice mail, phone notes, some scientific data and photographs. Unstructured data comes in two types. Bitmap objects, such as image, video or audio files, are inherently non-language-based. Textual objects, such as Microsoft Word documents, e-mails or Microsoft Excel spreadsheets, are based on a written or printed language. Both object types are classified as data, but capturing relevant information from bitmap objects remains difficult because the technology and methodology for doing so are still in their infancy. The vast majority of the world's digital data is unstructured, with some estimates as high as 85 percent.

Today's database technology addresses textual objects that have naming conventions, tags or a taxonomy to identify "things." With unstructured content, there is no conceptual definition and no data type definition. Some current technologies used for content searches on unstructured data require tagging individual entities such as names, or applying keywords or metadata tags. Human intervention, often labor-intensive, is therefore required to make the unstructured data machine-readable and usable. Even if unstructured data is in a format such as a word processing template, the data is still not usable at a semantic level without a compatible interface or application.
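The tagging step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any specific search product: the `TAXONOMY` dictionary, the `tag_document` function and the sample sentence are all hypothetical, and the keyword lists stand in for the labor-intensive work a person would do by hand.

```python
import re

# Hypothetical, hand-built taxonomy: tag name -> keywords that signal it.
TAXONOMY = {
    "compliance": {"audit", "sarbanes-oxley", "retention"},
    "storage": {"tape", "disk", "backup"},
}

def tag_document(text, taxonomy=TAXONOMY):
    """Attach metadata tags to free text by matching it against a keyword taxonomy."""
    # Lowercase the text and split it into words, keeping internal hyphens.
    words = set(re.findall(r"[a-z][a-z\-]*", text.lower()))
    # A tag applies when at least one of its keywords appears in the text.
    tags = sorted(tag for tag, keywords in taxonomy.items() if words & keywords)
    return {"tags": tags, "length": len(text)}

memo = "Please keep the backup tapes for the Sarbanes-Oxley audit."
record = tag_document(memo)
print(record["tags"])
```

Note that "tapes" does not match the keyword "tape" here; only "backup" earns the storage tag. Exact keyword matching is brittle in just this way, which is part of why tagging unstructured content tends to need ongoing human attention.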

Companies are attempting solutions that bring together the unstructured and structured worlds. A good example of bridging unstructured data with structured data centers on the efforts around compliance and the Sarbanes-Oxley Act. Compliance has highlighted the difference between structured and unstructured data. As data is distributed to employees, partners and customers, it can gradually migrate from structured formats to an unstructured format, causing a reduction in or loss of easy and quick accessibility. The quote "We are drowning in information but are starving for knowledge" has held true for too long. Businesses are forcing the storage industry to resolve the massive challenge of harnessing unstructured data and making it more readily useful. When this happens, the giant and long-awaited shift from data to information can be realized.

Source: Horison Information Strategies: Storage Navigator


© 2005 Horison

