Streaming data is growing exponentially. With the rise of SaaS, e-commerce and connected devices, organizations across the globe are increasingly required to deliver and react to data in real-time to meet customer and competitive demands. As a result, more and more companies are exploring event-driven architecture (EDA), an architectural pattern in which changes to data are delivered as they occur, as a series of discrete events.
The proliferation of streaming data represents a fundamental shift in how organizations send data between systems. Businesses increasingly rely on messaging frameworks for streaming data delivery, with the open-source, distributed streaming platform Apache Kafka the clear favorite due to its high-throughput, low-latency real-time streaming, flexible data retention, redundancy and scalability. The platform quickly communicates events such as changes in customer information, system log updates and transactions from the systems or applications that create the data (producers) to any number of systems that subscribe to ingest it (consumers).
In industries like financial services, the ability to monitor and update data in real-time is a critical capability, but organizations also need to establish data quality for streaming data to ensure that data is being reliably delivered and data integrity is maintained despite high volumes and high speeds.
There are numerous Kafka use cases in the financial services space. One example is a large multinational investment bank and financial services company that was using Kafka to manage thousands of daily financial transactions. The organization was concerned about data quality in this high-throughput environment, and implemented in-line data quality validations to check for completeness and consistency before transactions were posted. These on-the-fly quality checks ensured that any discrepancies, or data messages that failed to meet pre-defined patterns, would be routed to a workflow for investigation and resolution before being posted to the official record. Monitoring investment transactions immediately, rather than in an end-of-day batch, gives the bank a significant advantage in delivering reliable investment financials to its customers and partners.
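An in-line check of this kind can be sketched as a small validation step applied to each message before it is posted, with failures routed to a quarantine queue that feeds the investigation workflow. The field names, account-number pattern, and rules below are illustrative assumptions, not the bank's actual schema or implementation:

```python
import re

# Illustrative required fields and account-number pattern
# (assumptions for this sketch, not a real bank schema).
REQUIRED_FIELDS = {"transaction_id", "account", "amount"}
ACCOUNT_PATTERN = re.compile(r"^[A-Z]{2}\d{8}$")

def validate(message: dict) -> list:
    """Return a list of data quality violations for one transaction message."""
    errors = []
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        errors.append(f"incomplete: missing {sorted(missing)}")
    account = message.get("account", "")
    if account and not ACCOUNT_PATTERN.match(account):
        errors.append(f"pattern: account {account!r} is malformed")
    if "amount" in message and message["amount"] <= 0:
        errors.append("consistency: non-positive amount")
    return errors

def route(messages):
    """Split messages into those safe to post and those needing review."""
    posted, quarantined = [], []
    for msg in messages:
        errors = validate(msg)
        if errors:
            quarantined.append({"message": msg, "errors": errors})
        else:
            posted.append(msg)
    return posted, quarantined
```

In a real deployment, the quarantined records would typically be published to a dedicated Kafka topic (often called a dead-letter topic) that drives the error-remediation workflow, rather than collected in an in-memory list.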
Concerns about streaming data quality aren’t exclusive to large, global financial institutions. Smaller banks are also implementing Kafka data quality measures to monitor streaming data. For instance, a local bank used Kafka on its trading platform to send information from one system to another. In this case, the organization wasn’t concerned with in-line validations, but with ensuring that data did not fall outside expected thresholds, such as statistical control limits on trade volumes. For example, if the expected number of daily transactions was around 2,000, but that number fell to 100, they would know there was an issue that required immediate investigation. The organization was also looking to reconcile the Kafka messages that were received against the data stored in the source system. These data quality rules were applied to all data within a given timeframe, with any inconsistencies or outliers routed to the appropriate resource for review.
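A threshold check of this kind can be sketched as a simple statistical control on daily volume. The expected count of 2,000 mirrors the example above; the ±50% control band is an illustrative stand-in for a statistically derived limit:

```python
def volume_check(daily_count: int, expected: int = 2000, tolerance: float = 0.5) -> str:
    """Flag a daily transaction count that falls outside the control band.

    The band is expected +/- tolerance (here 2000 +/- 50%), an illustrative
    placeholder for a control limit derived from historical volumes.
    """
    lower = expected * (1 - tolerance)
    upper = expected * (1 + tolerance)
    if daily_count < lower:
        return f"ALERT: volume {daily_count} below lower limit {lower:.0f}"
    if daily_count > upper:
        return f"ALERT: volume {daily_count} above upper limit {upper:.0f}"
    return "OK"
```

In practice, a check like this would run on a schedule against counts aggregated from the Kafka topic, with any alert routed to the appropriate resource for immediate investigation.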
Both of these cases demonstrate the importance of maintaining data quality in high-volume, high-throughput environments like Kafka. In industries such as financial services, accuracy is critical not only from a regulatory compliance perspective, but also for operational effectiveness, competitive advantage and a positive customer experience. Across systems and processes, from creation to consumption, errors must be quickly identified before they have the opportunity to proliferate downstream.
Data quality is essential to ensuring data remains an enterprise asset for financial institutions. An enterprise data quality solution can help banks protect the integrity of their data and provide validation at speeds that keep up with streaming data platforms like Kafka.
Enterprise data quality can ensure data quality not only at data’s source, but within Kafka and at the point of consumption. Basic quality checks can occur in real-time to ensure data conformity, patterns and completeness as well as verify various financial transactions and amounts. The data quality solution should not only provide visibility into issues, but feature robust error remediation and workflow to ensure timely resolution of potential quality issues.
Yet not all data quality rules in Kafka must be real-time. Data quality can also be checked on a batch basis in circumstances where a real-time response is not required, or for more advanced data quality dimensions such as integrity (Is the data retained and properly transformed as it moves across systems?) or accuracy (Is the data correct at the point of consumption?). Batch processing is ideal for data reconciliation, which provides insight into aggregated transactions or balances from system to system.
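Batch reconciliation can be sketched as computing the same aggregate, here the per-account sum of amounts over a window, from both the received Kafka messages and the source system, then surfacing any accounts where the two disagree. The record shape and account-level grouping are illustrative assumptions:

```python
from collections import defaultdict

def aggregate(transactions):
    """Sum transaction amounts per account for one reconciliation window."""
    totals = defaultdict(float)
    for txn in transactions:
        totals[txn["account"]] += txn["amount"]
    return dict(totals)

def reconcile(kafka_msgs, source_records, tolerance=0.01):
    """Return accounts whose aggregated amounts disagree between systems."""
    kafka_totals = aggregate(kafka_msgs)
    source_totals = aggregate(source_records)
    discrepancies = {}
    # Check every account seen in either system, treating a missing
    # account as a zero balance on that side.
    for account in kafka_totals.keys() | source_totals.keys():
        k = kafka_totals.get(account, 0.0)
        s = source_totals.get(account, 0.0)
        if abs(k - s) > tolerance:
            discrepancies[account] = {"kafka": k, "source": s}
    return discrepancies
```

Any accounts returned by a check like this would then be routed to the remediation workflow, just as failed in-line validations are.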
Event-driven architecture and streaming data are quickly gaining popularity in the financial sector. High-velocity messaging will continue to change the way banks and other institutions collect, store, share, manage and utilize data. Still, data quality is always a critical component of any successful data strategy, regardless of the speed or amount of data.
Are you looking for additional information about solving data quality challenges for streaming data? Check out the eBook below.
For a deeper dive into this topic, visit our resource center. Here you will find a broad selection of content that represents the compiled wisdom, experience, and advice of our seasoned data experts and thought leaders.