Big Data

Over the last two decades, advances in technology have led businesses to create and consume data in exponentially growing quantities. The data management industry has emerged from this data explosion, giving rise to a wide variety of jobs, from the Data Scientist to the Chief Data Officer. Data, however, is generally of little practical use to the business unless it is of sufficiently high quality.

What is Data Quality and Why Does it Matter?
Data quality is about ensuring that data is fit for purpose: accurate, timely, and complete enough to be used in the way it is intended. Good data quality ensures that the information about your customers is as complete and as accurate as it can be. In short, good data is one of your most valuable assets. Conversely, poor data quality can adversely affect the success of projects and increase both their cost and duration. According to a Gartner survey, poor data quality costs organizations an average of $14.2 million annually.[1] Additionally, bad data can seriously damage your credibility and erode customer confidence.
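The three dimensions named above (accuracy, timeliness, completeness) can be expressed as simple rule-based checks. The sketch below is purely illustrative; the record fields, reference values, and thresholds are assumptions for the example, not part of any cited methodology.

```python
from datetime import datetime

# Hypothetical customer records; field names are illustrative assumptions.
records = [
    {"id": 1, "email": "ana@example.com", "country": "PT", "updated": "2016-01-15"},
    {"id": 2, "email": "", "country": "XX", "updated": "2014-06-01"},
]

def completeness(record, required=("email", "country")):
    """Completeness: fraction of required fields that are non-empty."""
    filled = sum(1 for f in required if record.get(f))
    return filled / len(required)

def is_timely(record, as_of, max_age_days=365):
    """Timeliness: was the record updated within the allowed window?"""
    updated = datetime.strptime(record["updated"], "%Y-%m-%d")
    return (as_of - updated).days <= max_age_days

def is_accurate(record, valid_countries=frozenset({"PT", "US", "GB"})):
    """Accuracy (validity): country code must appear in a reference list."""
    return record["country"] in valid_countries

as_of = datetime(2016, 2, 9)
for r in records:
    print(r["id"], completeness(r), is_timely(r, as_of), is_accurate(r))
```

In practice such rules would run against a reference dataset and feed a quality dashboard, but the principle is the same: each dimension becomes a measurable score per record.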

Emerging Data Quality Trends
Given the data quality problems inherent in most organizations, the following key trends describe how companies are working to solve them effectively:

Data Quality Initiative Diversity
Data quality is gaining popularity across data domains and use cases. Although data quality initiatives focused on customer data are highlighted most frequently, other data types are gaining ground.[2] This diversity of data quality initiatives – which includes transactional data, financial data, location data, and product data – is a result of the increasing variety of data found in organizations. Additionally, this trend may indicate that data quality should be established as a function that is delivered across lines of business. Instead of solving data quality issues on a case-by-case basis for each domain, mature organizations may look to orchestrate information management across the enterprise. Likewise, interest is growing in applying data quality tools and techniques to less structured data sources, such as social data. It is estimated that unstructured information may account for 70–80% of all data in organizations, representing a significant growth area for data quality tools and practices.[3]

More Roles Participate in Improving Data Quality
Data quality initiatives have traditionally been driven by IT. Recently, however, the business has taken a more active role in managing the goals, rules, processes, and metrics associated with improving data quality. As they recognize the importance of data quality, business functions have begun to establish roles such as Data Stewards, Chief Data Officers, and Information Governance Teams. To deliver on these stewardship-oriented activities, organizations have identified a need for structured processes for detecting, tracking, and correcting data quality issues. Similarly, as the balance shifts toward data quality roles within the business, vendors are beginning to offer solutions with self-service capabilities. These solutions are tailored to business workers, who must be able to understand and manage data quality capabilities in order to have an impact.
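The detect–track–correct cycle described above can be pictured, in a minimal form, as a shared issue log that any role (an IT process or a business steward) can write to and resolve. Everything in the sketch below — the rule, the issue structure, the correction step — is a hypothetical illustration, not a reference to any vendor tool.

```python
# Minimal sketch of a detect -> track -> correct loop for data quality issues.
# All field names and rules here are illustrative assumptions.

records = [
    {"id": 1, "phone": "+351-21-000-0000"},
    {"id": 2, "phone": ""},  # missing phone will trigger an issue
]

issue_log = []  # the shared "tracking" store visible to data stewards

def detect(records):
    """Detect: flag records that violate a simple completeness rule."""
    for r in records:
        if not r["phone"]:
            issue_log.append({"record_id": r["id"],
                              "rule": "phone_missing",
                              "status": "open"})

def correct(record_id, new_phone):
    """Correct: fix the record and close the tracked issue."""
    for r in records:
        if r["id"] == record_id:
            r["phone"] = new_phone
    for issue in issue_log:
        if issue["record_id"] == record_id:
            issue["status"] = "resolved"

detect(records)                      # IT or a scheduled job detects the issue
correct(2, "+351-21-111-1111")       # a data steward supplies the correction
```

The point of the structure is auditability: every defect is recorded before it is fixed, so both IT and business roles can see what was found, by which rule, and whether it was resolved.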

Big Data and Information Trust
Information trust issues risk serious damage to an organization’s reputation. With the proliferation of big data projects, data quality and information trust challenges are squarely in the public eye. According to Mark Smith of Ventana Research, most of the time spent on big data projects relates to data quality and data preparation.[4] An alternative to the highly promoted data lake approach is gaining ground, referred to as the “data reservoir approach”. According to Gartner analysts Merv Adrian and Nick Heudecker, a data lake aims to gather data in a big data environment without further preparation and cleansing work, while a data reservoir focuses on making data more consumption-ready for a wider audience, not only for a limited number of highly skilled data scientists.[5] Under that vision, data quality becomes a building block of big data initiatives, rather than a separate discipline.

IT leaders are beginning to realize that poor-quality data limits their ability to optimize the performance of workers and business processes, make better decisions, manage risk, and cut costs. The key trends highlighted above are increasingly being used to make data an organization-wide asset.

[1] Friedman, Ted; Judah, Saul. The State of Data Quality: Current Practices and Evolving Trends. Gartner. 2013.
[2] Friedman. 2013.
[3] Holzinger, Andreas; Stocker, Christof; Ofner, Bernhard; Prohaska, Gottfried; Brabenetz, Alberto; Hofmann-Wellenhof, Rainer. “Combining HCI, Natural Language Processing, and Knowledge Discovery – Potential of IBM Content Analytics as an Assistive Technology in the Biomedical Field”. Lecture Notes in Computer Science. Springer. pp. 13–24. 2013.
[4] Smith, Mark. “Big Data Requires Integration Technology.” Perspectives. Ventana Research, 07 November 2014. Web. 09 February 2016.
[5] Adrian, Merv; Heudecker, Nick. (Producer). (2014). Hadoop 2015: The Road Ahead. [Video Webinar]. Retrieved from