Cerebrus

Litmus7’s Metadata-Driven Data Quality Engine

Data, a key driver of innovation and competitiveness, is often called the new oil and the backbone of sustainable growth. Most organizations are moving towards a data-driven approach, so the quality of the data they use is of utmost importance. Because huge amounts of data are collected from a wide variety of sources, there is a high chance of incorrect or incomplete data being ingested due to manual or machine errors. This ultimately affects sales revenue, customer relationships, and brand reputation.

The role of Cerebrus

A metadata-driven data quality engine developed by Litmus7, Cerebrus aims to tackle the problem of inaccurate and incomplete data. By integrating Cerebrus into their data management processes, users can validate data as it moves from any source to its destination, giving the business faster, more reliable, and more consistent data.

Key Features

User-Friendly Metadata Registry UI

Register column descriptions for quick reference

Minimal Manual Effort

Built in and ready to use with minimal user input. Non-technical users can define data assertions.

Column and Row level detailed checks

Column-level checks such as data type, size, values, range, pattern, and action. Row-level checks test each row against the configured conditions.

Platform Agnostic DQ engines

Python- and Spark-based engines to suit your data engineering (DE) platform

Customizable DQ functions

Leverages custom-defined DQ functions for detailed data quality checks. Automatically converts rules into functions.

Proactive Actions and Thresholds

Control actions and thresholds based on DQ events at the record or dataset level

Smart Audit Log

Record-level audit log that captures DQ rule failures at runtime, along with their error messages. Graphical representation of data quality output.
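To make the features above more concrete, here is a minimal sketch in plain Python of how declarative column rules might be converted into check functions, applied record by record, logged to an audit list, and compared against a dataset-level threshold. It is not Cerebrus code. The metadata keys, the function names `build_checks` and `run_checks`, the `max_failure_rate` parameter, and the action names are all hypothetical, and the "action" check from the feature list is not modeled.

```python
import re
from typing import Any, Callable, Dict, List, Optional

# Hypothetical metadata for one dataset; these keys are illustrative only.
METADATA = {
    "order_id": {"type": int},
    "status": {"values": {"NEW", "SHIPPED", "CANCELLED"}},
    "quantity": {"type": int, "range": (1, 100)},
    "sku": {"size": 8, "pattern": r"[A-Z]{3}-\d{4}"},
}

# A check returns an error message, or None when the value passes.
Check = Callable[[Any], Optional[str]]


def build_checks(rules: Dict[str, Any]) -> List[Check]:
    """Convert the declarative rules of one column into check functions."""
    checks: List[Check] = []
    if "type" in rules:
        expected = rules["type"]
        checks.append(lambda v: None if isinstance(v, expected)
                      else f"expected {expected.__name__}")
    if "size" in rules:
        size = rules["size"]
        checks.append(lambda v: None if len(str(v)) <= size
                      else f"longer than {size} characters")
    if "values" in rules:
        allowed = rules["values"]
        checks.append(lambda v: None if v in allowed else "value not allowed")
    if "range" in rules:
        low, high = rules["range"]
        checks.append(lambda v: None
                      if isinstance(v, (int, float)) and low <= v <= high
                      else f"outside range {low}..{high}")
    if "pattern" in rules:
        regex = re.compile(rules["pattern"])
        checks.append(lambda v: None if regex.fullmatch(str(v))
                      else "pattern mismatch")
    return checks


def run_checks(records, metadata, max_failure_rate=0.5):
    """Apply all column checks and decide a dataset-level action."""
    compiled = {col: build_checks(rules) for col, rules in metadata.items()}
    audit = []           # one entry per failed rule, per record
    failed_rows = set()  # row numbers with at least one failure
    for row_no, record in enumerate(records):
        for col, checks in compiled.items():
            for check in checks:
                error = check(record.get(col))
                if error:
                    audit.append({"row": row_no, "column": col, "error": error})
                    failed_rows.add(row_no)
    failure_rate = len(failed_rows) / len(records) if records else 0.0
    # Hypothetical dataset-level action driven by the threshold.
    action = "reject_dataset" if failure_rate > max_failure_rate else "accept_dataset"
    return audit, failure_rate, action


records = [
    {"order_id": 1, "status": "NEW", "quantity": 5, "sku": "ABC-1234"},
    {"order_id": 2, "status": "LOST", "quantity": 500, "sku": "abc-1"},
]
audit, failure_rate, action = run_checks(records, METADATA)
for entry in audit:
    print(entry)
print(failure_rate, action)
```

Keeping the rules as data and generating the functions from them is what lets non-technical users edit rules without touching code. A record-level action, such as quarantining a single row, could be derived from the same audit entries.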

Advantages of Cerebrus

Cerebrus provides a user interface that is simple to use, even for non-technical users. Business users therefore get more control over defining data quality rules without having to deal with the technical details of how validation runs in the background. The user can upload a sample data file, from which Cerebrus infers many details such as column names and data types. The user can then define further validation rules such as uniqueness, applicable values, valid range, expected pattern, and field length. This validation metadata can be persisted in a database and reused to validate the actual source data any number of times.

When the data quality engine runs over the source data using the registered metadata, valid and invalid records are separated into different files or tables. An audit file is also generated, capturing detailed information about validation failures. This audit file is used to give the user insights such as the most successful column, the most frequently failing column, and the most frequently failing validation constraint.
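The workflow described above can be illustrated with a small standard-library Python sketch: infer column names and types from a sample file, add a couple of user-defined rules, validate source data into valid and invalid sets, and summarize the audit entries. This is an assumption-laden illustration, not the Cerebrus implementation. The function names, the metadata keys (`type`, `unique`, `max_length`), and the CSV samples are all hypothetical, and persisting the metadata to a database is left out.

```python
import csv
import io
from collections import Counter

CASTERS = {"int": int, "float": float, "str": str}


def infer_type(values):
    """Return the narrowest type name (int, float, else str) that fits all values."""
    for name in ("int", "float"):
        try:
            for value in values:
                CASTERS[name](value)
            return name
        except ValueError:
            continue
    return "str"


def infer_metadata(sample_csv):
    """Infer column names and data types from a sample CSV string."""
    reader = csv.DictReader(io.StringIO(sample_csv))
    rows = list(reader)
    return {col: {"type": infer_type([row[col] for row in rows])}
            for col in (reader.fieldnames or [])}


def validate(source_csv, metadata):
    """Split source rows into valid and invalid lists and build an audit list."""
    valid, invalid, audit = [], [], []
    seen = {col: set() for col, meta in metadata.items() if meta.get("unique")}
    for row_no, row in enumerate(csv.DictReader(io.StringIO(source_csv)), start=1):
        failures = []
        for col, meta in metadata.items():
            value = row.get(col) or ""
            try:
                CASTERS[meta["type"]](value)
            except ValueError:
                failures.append((col, "type"))
            if "max_length" in meta and len(value) > meta["max_length"]:
                failures.append((col, "length"))
            if col in seen:
                if value in seen[col]:
                    failures.append((col, "unique"))
                seen[col].add(value)
        (invalid if failures else valid).append(row)
        audit.extend({"row": row_no, "column": col, "rule": rule}
                     for col, rule in failures)
    return valid, invalid, audit


def summarize(audit):
    """Report the most frequently failing column and rule, if any."""
    columns = Counter(entry["column"] for entry in audit)
    rules = Counter(entry["rule"] for entry in audit)
    return {
        "most_failing_column": columns.most_common(1)[0][0] if columns else None,
        "most_failing_rule": rules.most_common(1)[0][0] if rules else None,
    }


SAMPLE = "id,price,code\n1,9.99,AB\n2,12.50,CD\n"
SOURCE = "id,price,code\n1,5.00,AB\n1,abc,CD\nx,7.25,TOOLONG\n"

metadata = infer_metadata(SAMPLE)
metadata["id"]["unique"] = True     # user-defined rule added after inference
metadata["code"]["max_length"] = 2  # user-defined rule added after inference

valid, invalid, audit = validate(SOURCE, metadata)
print(len(valid), len(invalid), summarize(audit))
```

In a real deployment the valid and invalid lists would be written to separate files or tables, and the audit list would feed the graphical insights.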

Key Benefits

Improved accuracy and reliability of data

Empowering business users and data owners to define data assertions

Enhanced data stewardship and power to business

Time and cost savings

Automated enforcement of compliance and governance