Data Governance Definition
Data governance is the combination of people, processes, and technology used to manage the availability, usability, integrity, and security of data in enterprise systems. It ensures that data is consistent and trustworthy and that it is not misused.
Three Major Parts of Data Governance
Data governance consists of three main parts: curation, discovery and understanding, and protection.
Curating Data at Scale
Curating data involves identifying and managing valuable data sources such as databases, data lakes, and data warehouses. Curation limits the uncontrolled transformation and proliferation of critical data assets and helps ensure that data is accurate, fresh, and free of sensitive information.
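The freshness and sensitive-information checks mentioned above can be sketched in Python as follows. The function flags a dataset that has not been refreshed within a maximum age, and it scans string fields for values that look sensitive. The function name `curation_issues`, the one-day default `max_age`, and both regular expressions are hypothetical choices for this example. Real sensitive-data scanners detect far more than two patterns.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical detection patterns; real sensitive-data scanners are far broader.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def curation_issues(records, last_updated, max_age=timedelta(days=1), now=None):
    """Return freshness and sensitive-data issues for a list of row dicts.

    last_updated (and now, if given) must be timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    issues = []

    # Freshness: flag the dataset if it has not been refreshed within max_age.
    age = now - last_updated
    if age > max_age:
        issues.append(f"stale: last updated {age} ago")

    # Sensitive information: scan every string field against each pattern.
    for i, record in enumerate(records):
        for field_name, value in record.items():
            if not isinstance(value, str):
                continue
            for label, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(value):
                    issues.append(f"row {i}, field '{field_name}': possible {label}")
    return issues
```

A check like this could run whenever a new source is registered, so that problems surface before the data is widely copied.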
Understanding Data in Context
Understanding your data means ensuring that users can discover it and comprehend what it means. A centralized data catalog simplifies discovery and access, so users can rely on the data with confidence to drive business value.
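The minimal in-memory sketch below makes the idea of a centralized catalog concrete. Each dataset is registered with a description, an owner, and tags, and users find datasets by keyword. `DataCatalog`, `CatalogEntry`, and the sample names are hypothetical. This is not the interface of any particular catalog service.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str
    owner: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """Minimal in-memory stand-in for a centralized data catalog."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Re-registering a name replaces the previous entry.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Return sorted names of entries whose name or description contains
        the term, or that carry a tag equal to it (case-insensitive)."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or term in {t.lower() for t in e.tags}
        )
```

Recording an owner on every entry also gives users a clear point of contact when they have questions about what a dataset means.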
Protecting Data
Protecting data involves balancing privacy, security, and access. Effective tools support governance across organizational boundaries and let both business and engineering users manage data intuitively.
Strategic Asset and Governance Focus
Data governance treats data as a strategic asset and builds the competencies needed to manage it effectively. It involves exercising authority and control over data to meet stakeholder expectations and support business initiatives.
Key Data Governance Roles
To succeed with data governance, define roles such as data owner, data steward, and IT. Assign these roles to the right individuals with segregation of duties in mind, so that no single person holds responsibilities that should be kept separate.
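Once role assignments are recorded, segregation of duties can be checked mechanically. Purely for illustration, the sketch below assumes that one person should not both approve access to data (`data_owner`) and provision it (`it_admin`). The role names and the `CONFLICTING_ROLES` set are hypothetical. An organization would take both from its own policy.

```python
from itertools import combinations

# Hypothetical policy: pairs of roles one person should not hold together.
CONFLICTING_ROLES = {
    frozenset({"data_owner", "it_admin"}),  # approving access vs. provisioning it
}


def sod_violations(assignments: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (person, role_a, role_b) for every conflicting pair a person holds."""
    violations = []
    for person, roles in sorted(assignments.items()):
        # Check each unordered pair of the person's roles against the policy.
        for role_a, role_b in combinations(sorted(roles), 2):
            if frozenset({role_a, role_b}) in CONFLICTING_ROLES:
                violations.append((person, role_a, role_b))
    return violations
```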
Data Stewards’ Role
A data steward is a business representative with detailed knowledge of the data needed for business initiatives. Data stewards manage day-to-day project tasks and understand the data issues likely to affect those initiatives.
Data Owners’ Role
Data owners, often at the executive level, make decisions about data policies, including regulatory and compliance rules. They determine who can access data such as claims or customer data, and they work closely with data stewards to guide the stewards’ work.
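An owner's control over access can be modeled as a simple rule: only the owner of a dataset may grant read access to it. The names `Dataset`, `grant`, and `can_read` are hypothetical. The sketch does not describe how any specific AWS service enforces permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    owner: str
    readers: set[str] = field(default_factory=set)

    def grant(self, requested_by: str, grantee: str) -> None:
        """Grant read access; only the data owner may make this decision."""
        if requested_by != self.owner:
            raise PermissionError(f"{requested_by} does not own {self.name}")
        self.readers.add(grantee)

    def can_read(self, user: str) -> bool:
        # The owner always has access; everyone else needs an explicit grant.
        return user == self.owner or user in self.readers
```

For example, `Dataset("claims", owner="vp_claims")` would let `vp_claims` grant access to a steward, while a grant attempted by anyone else raises `PermissionError`.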
IT Roles in Data Governance
IT deploys and manages data governance tools, such as AWS services, and provides data stewards with the tools and capabilities they need to manage data.
Data Profiling
Data profiling systematically examines data to understand its structure, content, and characteristics and to detect issues such as missing, duplicate, or out-of-range values. This helps ensure that the data is accurate and aligned with business needs.
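A basic column profile usually reports row and null counts and the number of distinct values, plus the minimum, maximum, and mean for numeric columns. The plain-Python sketch below computes these statistics for a list of row dictionaries. The function names, and the choice to treat a missing key as a null, are assumptions made for this example.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column; None counts as a null."""
    non_null = [v for v in values if v is not None]
    # Exclude bools, which are a subclass of int in Python.
    numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }
    if numeric:
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["mean"] = sum(numeric) / len(numeric)
    return profile


def profile_table(rows):
    """Profile every column seen in any row; a missing key is treated as null."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}
```

A high null count or an unexpected minimum in the output is the kind of issue that profiling is meant to surface before the data is used.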
Data Catalog and Data Lineage
A data catalog makes data discoverable and accessible to those who need it, while data lineage tracks the flow of data, showing where it originated and how it was transformed, moved, and stored.
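Lineage is naturally represented as a directed graph that runs from source datasets to the datasets derived from them. The hypothetical `LineageGraph` below records each transformation step. It walks the graph upstream to find a dataset's origins and the full chain of steps that produced it. The sketch assumes the graph has no cycles, as is typical for lineage, and the dataset names in any usage are illustrative.

```python
from collections import defaultdict


class LineageGraph:
    """Records (source -> target) transformation steps; assumes no cycles."""

    def __init__(self):
        # target dataset -> list of (source dataset, transformation description)
        self._parents = defaultdict(list)

    def record(self, source, target, transformation):
        self._parents[target].append((source, transformation))

    def origins(self, dataset):
        """Return the root sources that a dataset ultimately derives from."""
        parents = self._parents.get(dataset)
        if not parents:
            return {dataset}  # no recorded parents: this is an origin
        roots = set()
        for source, _ in parents:
            roots |= self.origins(source)
        return roots

    def history(self, dataset):
        """Return every upstream step as (source, transformation, target),
        with each source's own history listed before the step that uses it."""
        steps = []
        for source, transformation in self._parents.get(dataset, []):
            steps.extend(self.history(source))
            steps.append((source, transformation, dataset))
        return steps
```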
AWS Glue DataBrew for Data Governance
AWS Glue DataBrew is a visual data preparation tool that helps users clean and normalize data without writing code. Its data profiling features let users examine data, define data quality rules, and detect problems. DataBrew also tracks data lineage, helping users understand where their data comes from and how it flows.
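A data quality rule can be described independently of any tool. It consists of a check applied to a column, plus a threshold for the fraction of rows that must pass. The Python below is a hypothetical illustration of that idea. It does not use or reproduce the DataBrew API, and `QualityRule`, `evaluate`, and any sample rules are invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class QualityRule:
    name: str
    column: str
    check: Callable[[Any], bool]  # returns True when a value passes
    threshold: float = 1.0        # minimum fraction of rows that must pass


def evaluate(rules, rows):
    """Return {rule name: (passed, pass ratio)} for a list of row dicts."""
    results = {}
    for rule in rules:
        values = [row.get(rule.column) for row in rows]
        passed = sum(1 for v in values if rule.check(v))
        # An empty dataset is treated as passing every rule.
        ratio = passed / len(values) if values else 1.0
        results[rule.name] = (ratio >= rule.threshold, ratio)
    return results
```

A threshold below 1.0 tolerates a small share of bad rows. This is useful when a column is expected to be mostly, but not entirely, complete.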