Defining Big Data – Making Compliance Data “Relevant” Data
March 25th, 2016
“Big Data”, perhaps the hottest buzzword since “transparency”, reflects the critical need for more insight within a compliance program. It has become one of those talking points everyone uses liberally: everyone claims familiarity with Big Data concepts, but in reality hardly anyone seems to know what to do with it.
Yes, there is no question that Big Data is the way of the future. Over the past year and a half, enterprises have started to invest in and build up their data processing capabilities by implementing data lakes on Hadoop’s distributed file system (HDFS). As those data assets grow, companies are starting to layer on real-time analytics, using stream-processing technologies such as Apache Storm, typically fed by ingestion tools like Apache Flume, to deliver more responsive monitoring.
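As a rough illustration of the data-lake half of that build-out, the sketch below lands a batch of trade records in a date-partitioned HDFS path. It is a minimal sketch, assuming a WebHDFS endpoint and the open-source Python `hdfs` client; the namenode URL, user, paths, and record fields are all illustrative placeholders, not a prescribed architecture.

```python
# Minimal sketch: landing daily trade records in an HDFS data lake,
# partitioned by date so downstream analytics can prune irrelevant files.
# Assumes the Python `hdfs` package (a WebHDFS client); the namenode URL,
# user, and paths below are illustrative placeholders.
import csv
import io
from datetime import date

from hdfs import InsecureClient  # pip install hdfs

client = InsecureClient("http://namenode.example.com:50070", user="compliance")

def land_trades(trades, business_date=None):
    """Write a batch of trade records to a date-partitioned lake path."""
    business_date = business_date or date.today().isoformat()
    target = f"/datalake/compliance/trades/dt={business_date}/trades.csv"

    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["trade_id", "trader", "symbol", "qty", "price"]
    )
    writer.writeheader()
    writer.writerows(trades)

    # Overwrite keeps the load idempotent if the batch is replayed.
    client.write(target, data=buf.getvalue(), overwrite=True)

# Would write to the (placeholder) cluster named above.
land_trades([{"trade_id": "T1", "trader": "jdoe", "symbol": "XYZ",
              "qty": 100, "price": 10.25}])
```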
Merriam-Webster defines “Big Data” as an accumulation of data that is too large and complex for processing by traditional database management tools. By that definition alone, firms adopting the technology are up against some massive challenges. Definitions of big data analytics go further, describing “the process of examining large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information”. Here lies the true intent of what compliance is trying to achieve by moving to a Hadoop-like environment.
The largest of these challenges is deciding what to actually include in your big data platform. How much information is too much? There is no definitive answer, but just like any other process put in place (particularly in compliance), bogging down an analyst’s system with too much data will only hinder the monitoring and review process. For example, when an alert is presented to an analyst, they must review all data around that alert. Even if some of that data is insignificant or unrelated, policy typically states that all data associated with an alert must be reviewed and dispositioned. The result is a process no different from today’s: long and drawn out instead of efficient and effective. The sketch below illustrates the cost.
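The following is a minimal sketch of that review burden. Under a “review everything associated with the alert” policy, every extra data source loaded into the platform adds directly to analyst effort; the source names and the 90-seconds-per-record figure are illustrative assumptions.

```python
# Minimal sketch of the review burden: under a policy that requires
# dispositioning every record tied to an alert, each extra data source
# loaded into the platform adds directly to analyst effort.
from dataclasses import dataclass

SECONDS_PER_RECORD = 90  # assumed average time to review one record

@dataclass
class Alert:
    alert_id: str
    associated_records: dict  # source name -> list of records

def review_effort(alert: Alert) -> int:
    """Total review time in seconds when every associated record,
    relevant or not, must be reviewed and dispositioned."""
    total = sum(len(recs) for recs in alert.associated_records.values())
    return total * SECONDS_PER_RECORD

alert = Alert("A-1001", {
    "orders":       [f"ord-{i}" for i in range(40)],
    "emails":       [f"msg-{i}" for i in range(200)],  # mostly unrelated noise
    "badge_swipes": [f"swp-{i}" for i in range(500)],  # almost certainly irrelevant
})
print(review_effort(alert) / 3600, "hours")  # ~18.5 hours for a single alert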
Big Data, in the financial compliance use case, should be renamed “relevant data”. Feeding in every data point a firm has at its disposal is not necessarily the right approach. Compliance needs to step back, assess the available data points against what it is trying to achieve, and decide which dots actually need to be connected to give analysts the insight to make an educated decision. Just as firms run a compliance review, they need to apply the same discipline to the data they want to use. Nor is this a one-time exercise; it must be a continuing process, because both the volume of data and the regulatory pressure firms face keep growing.
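One way to make that assessment concrete is a simple mapping from alert type to the data sources compliance has judged necessary to connect the dots, applied as a filter before anything reaches the analyst. This is a minimal sketch; the alert types and source names are illustrative assumptions, and the mapping itself is the artifact to revisit as data and regulation evolve.

```python
# Minimal sketch of a "relevant data" assessment: each alert type maps to
# the data sources compliance has deemed necessary for an educated decision.
# Alert types and source names are illustrative assumptions; the mapping
# itself should be re-reviewed on a recurring schedule.
RELEVANT_SOURCES = {
    "wash_trade":      {"orders", "executions", "account_links"},
    "insider_dealing": {"orders", "executions", "news_events", "emails"},
    "spoofing":        {"orders", "cancellations", "market_data"},
}

def relevant_view(alert_type: str, associated_records: dict) -> dict:
    """Trim the full record set down to the sources deemed relevant,
    so the analyst reviews only what supports the decision."""
    allowed = RELEVANT_SOURCES.get(alert_type, set())
    return {src: recs for src, recs in associated_records.items()
            if src in allowed}

trimmed = relevant_view("spoofing", {
    "orders": ["o1", "o2"], "cancellations": ["c1"], "badge_swipes": ["s1"],
})
print(trimmed)  # badge swipes never reach the analyst's queue
```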
This leads to the other major challenge: finding compliance personnel with the knowledge base to actually use such advanced systems. By putting sophisticated processes in place, firms have shrunk the pool of qualified employees able to draw insight from the data those systems now provide. Since human interaction will never be eliminated, nor should it be, firms need to invest in proper training and in hiring the right people. The ideal candidate has deep knowledge of the markets but can also speak SQL as fluently as trader slang.
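To make that skill blend tangible, here is a minimal sketch of an analyst turning a trader-slang hypothesis (“someone’s painting the tape in XYZ”) into a query. It runs against an in-memory SQLite table; the schema, sample data, and thresholds are illustrative assumptions, not a real surveillance model.

```python
# Illustrative sketch: translating market intuition into SQL.
# "Painting the tape": a burst of small same-side trades into the close.
# Schema, data, and the 15:55 cutoff are all illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trades (
    trader TEXT, symbol TEXT, qty INTEGER, price REAL, ts TEXT)""")
conn.executemany("INSERT INTO trades VALUES (?,?,?,?,?)", [
    ("jdoe",   "XYZ", 100, 10.00, "2016-03-24 15:58:01"),
    ("jdoe",   "XYZ", 100, 10.05, "2016-03-24 15:58:45"),
    ("jdoe",   "XYZ", 100, 10.10, "2016-03-24 15:59:30"),
    ("asmith", "ABC", 500, 42.00, "2016-03-24 11:02:10"),
])

# Flag repeated small trades in the final minutes of the session.
rows = conn.execute("""
    SELECT trader, symbol, COUNT(*) AS n_trades,
           MAX(price) - MIN(price) AS price_drift
    FROM trades
    WHERE time(ts) >= '15:55:00'
    GROUP BY trader, symbol
    HAVING n_trades >= 3
""").fetchall()
print(rows)  # jdoe flagged: 3 trades into the close, drifting ~0.10
```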
Firms will get there eventually, but bridging the technology and knowledge gap is an immediate challenge that cannot be overlooked.