Deutsche Börse – Leveraging AWS For Rapid Infrastructure Evolution

Abstract

Client: The Deutsche Börse, Market Data and Services Business Groups

Project Dates: January 2017 through January 2018

Technology Solutions Used:

  • AWS Cloud Hosting
  • Docker Swarm: orchestration framework
  • Redis: in-memory cache
  • Confluent Kafka: scalable replay log
  • TICK: monitoring framework
  • Graylog: log aggregation
  • Jenkins: CI/CD pipeline

Summary: Leveraging AWS, we empowered our client with the automation and tools needed to rapidly create and test a production-scale on-demand infrastructure within a very tight deadline to support MIFID2 regulations.

Given the extreme time pressure that we were under to deliver a mission-critical platform, together with Risk Focus we decided to use AWS for Development, QA, and UAT, which proved to be the right move, allowing us to hit the ground running. The Risk Focus team created a strong partnership with my team to deliver the project.

Maja Schwob

CIO, Data Services, Deutsche Börse

Problem Statement

In 2017, the Deutsche Börse needed an APA (Approved Publication Arrangement) developed for their RRH business to support MIFID2 regulations, with the service fully operational by January 3, 2018. The service provides real-time MIFID2 trade reporting to around 3,000 financial services clients. After an open RFP process, the Deutsche Börse selected Risk Focus to build the system, expand its capacity to process twenty times the volume of messages without increasing latency, and deliver it to their on-premises hardware within four months.

AWS-Based Solution

To satisfy these needs, we proposed a radical infrastructure overhaul of their systems that involved replacing their existing Qpid bus with Confluent Kafka. This involved architecture changes and configuration tuning.
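
The case study does not include the actual broker or client settings we delivered, but a minimal sketch of the kind of producer tuning such a migration involves is shown below; the topic name, message key, and specific values are illustrative assumptions, not the production configuration.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TradeReportPublisher {

        public static KafkaProducer<String, String> buildProducer(String bootstrapServers) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Durability first: wait for all in-sync replicas and enable idempotence,
            // as befits a regulatory reporting feed.
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
            // Throughput/latency trade-off: a small linger lets the producer batch
            // messages without adding noticeable delay at high volumes.
            props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(64 * 1024));
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
            return new KafkaProducer<>(props);
        }

        public static void main(String[] args) {
            try (KafkaProducer<String, String> producer = buildProducer("localhost:9092")) {
                // "trade-reports" is an illustrative topic name; keying by instrument
                // keeps reports for the same instrument ordered on one partition.
                producer.send(new ProducerRecord<>("trade-reports", "DE000BASF111", "{...trade report JSON...}"));
            }
        }
    }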

The client’s hardware procurement timelines and costs precluded developing and testing on-premises. Instead, we developed, tested, and certified the needed infrastructure in AWS and applied the resulting topology and tuning recommendations to the on-site infrastructure. Finding the optimal configuration required executing hundreds of performance tests with hundreds of millions of messages flowing through a complex, mission-critical infrastructure; this would have been impossible in the few weeks available without the elasticity and repeatability provided by AWS. We implemented this with an automated CI/CD system that built both the environment and the application, allowing developers and testers to create production-scale, on-demand infrastructure quickly and cost-effectively. A minimal sketch of the core of such a performance test follows.
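
As an illustration only, the harness below pushes a batch of messages through a Kafka topic and reports throughput and publish-latency percentiles; the topic name, message count, and endpoint are placeholder assumptions, and the real test suite was driven by the CI/CD pipeline rather than a single class.

    import java.util.Arrays;
    import java.util.Properties;
    import java.util.concurrent.atomic.AtomicInteger;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ThroughputTest {
        public static void main(String[] args) {
            int messages = 1_000_000;                     // scale up per test run
            long[] latenciesMicros = new long[messages];
            AtomicInteger acked = new AtomicInteger();

            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // test cluster endpoint
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            long start = System.nanoTime();
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < messages; i++) {
                    final int idx = i;
                    final long sent = System.nanoTime();
                    // Record per-message acknowledgement latency in the send callback.
                    producer.send(new ProducerRecord<>("perf-test", Integer.toString(i), "payload"),
                            (metadata, exception) -> {
                                latenciesMicros[idx] = (System.nanoTime() - sent) / 1_000;
                                acked.incrementAndGet();
                            });
                }
                producer.flush();   // wait for all outstanding sends to complete
            }
            double seconds = (System.nanoTime() - start) / 1e9;

            Arrays.sort(latenciesMicros);
            System.out.printf("acked=%d, throughput=%.0f msg/s, p50=%dus, p99=%dus%n",
                    acked.get(), messages / seconds,
                    latenciesMicros[messages / 2], latenciesMicros[(int) (messages * 0.99)]);
        }
    }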

Financial Services Expertise

Although this was a technical implementation, we were ultimately selected by the business unit at the Deutsche Börse because of our strong Financial Services domain expertise and our ability to deliver both the technical and the business requirements. The stakeholder group also included the internal IT team and BaFin (the German financial regulator), and all technology, infrastructure, and cloud provider decisions had to be made collectively by all three groups.

Risk Focus’s deep domain knowledge in Regulatory Reporting and Financial Services was crucial to understanding and proposing a viable solution to address the client’s needs and satisfy all stakeholders. That domain expertise, in combination with Risk Focus’ technology acumen, allowed for the delivery of the service that met requirements.

Setting the Stage for The Future

The system was delivered to the client’s data center on time, within a startlingly short timeframe. We worked closely with their internal IT departments to hand over the delivered solution in full, allowing us to disengage from the project and transfer all knowledge to the internal team. All test environments and automation were also handed over to the client, enabling them to further tune and evolve the system.

The ability to develop and experiment in AWS empowers the client to make precise hardware purchasing decisions as volumes change, and they now have a testbed for further development as new regulatory requirements arrive. They are well positioned for the future: the AWS environment provides a pathway to public cloud migration once that path is green-lit by regulators.

Using the Siren Platform for Superior Business Intelligence

Can we use a single platform to uncover and visualize interconnections within and across structured and unstructured data sets at scale?

Objective

At Risk Focus we are often faced with new problems or requirements that cause us to look at technology and tools outside of what we are familiar with. We frequently engage in short Proof-Of-Concept projects based on a simplified, but relevant, use case in order to familiarize ourselves with the technology and determine its applicability to larger problems.

For this POC, we wanted to analyze email interactions between users to identify nefarious activity such as insider trading, fraud, or corruption. We identified the Siren platform as a tool that could potentially aid in this endeavor by providing visualizations on top of Elasticsearch indexes to allow drilling down into the data based on relationships. We also wanted to explore Siren’s ability to define relationships between ES and existing relational databases.

The Setup

Getting started with the Siren platform is easy given the dockerized instances provided by Siren and the Getting Started guide. Using the guide, I was able to get an instance of the platform running with pre-populated data quickly. I then followed their demo to interact with Siren and get acquainted with its different features.

Once I had a basic level of comfort with Siren, I wanted to see how it could be used to identify relationships in emails, such as who communicates with whom and whether anyone circumvents Chinese wall restrictions by communicating through an intermediary. I chose the publicly available Enron email corpus as my test data and indexed it in ES using a simple Java program that I adapted from code I found here. I created one index containing the emails and another index of all the people, identified by email address, who were either senders or recipients of the emails. The resulting data contained half a million emails and more than 80,000 people.
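
The indexing program itself is not reproduced here, but a minimal sketch of the two-index structure, using the Elasticsearch high-level REST client, could look like the following; the index names, field names, and the assumption that messages are already parsed are my own simplifications.

    import java.util.List;
    import java.util.Map;
    import org.apache.http.HttpHost;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;

    public class EnronIndexer {

        private final RestHighLevelClient client =
                new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

        /** Writes one parsed message to the "emails" index and each address to the "people" index. */
        public void indexEmail(String from, List<String> to, String subject, String body, String date) throws Exception {
            client.index(new IndexRequest("emails")
                    .source(Map.of("from", from, "to", to, "subject", subject, "body", body, "date", date)),
                    RequestOptions.DEFAULT);

            indexPerson(from);
            for (String recipient : to) {
                indexPerson(recipient);
            }
        }

        /** Uses the address as the document id so each person ends up indexed only once. */
        private void indexPerson(String address) throws Exception {
            client.index(new IndexRequest("people").id(address).source(Map.of("email", address)),
                    RequestOptions.DEFAULT);
        }
    }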

With the data in place, I next set up the indices in Siren and defined the relationships between them. The straightforward UI makes this process very simple. The indices are discoverable by name, and all of the fields are made available. After selecting the fields that should be exposed in Siren, and potentially filtering the data, a saved search is created.

Once all of the indices are loaded and defined, the next step is to define the relationships. There is a beta feature to do this automatically, but it is not difficult to set up manually. Starting with one of the index pattern searches, the Relations tab is used to define the relationships the selected index has to any others that were defined. The fields used in the relationships must be keyword types in ES, or primary keys for other data sources, so it helps to create the indices with explicit mappings rather than relying on dynamic mapping; a sketch of this follows.
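
Continuing the indexer sketch above (and keeping my own illustrative index and field names), creating the people index with its address field mapped as a keyword, so Siren can join on it, might look like this:

    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.common.xcontent.XContentType;

    // Added to the EnronIndexer sketch above: create the "people" index with an
    // explicit mapping before indexing, so the join field is a keyword type.
    public void createPeopleIndex() throws Exception {
        CreateIndexRequest request = new CreateIndexRequest("people");
        request.mapping(
                "{ \"properties\": { \"email\": { \"type\": \"keyword\" } } }",
                XContentType.JSON);
        client.indices().create(request, RequestOptions.DEFAULT);
    }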

Now that the indices are loaded and connected by relationships, the next step is to create some dashboards. From the Discovery section, the initial dashboards can be automatically created with a few common visualizations that are configured based on the data in the index. Each dashboard is linked to an underlying saved search which can then be filtered. There is also a visualization component that allows for filtering one dashboard based on the selection in a related dashboard.

Dashboard

Each dashboard is typically associated with a saved search and contains different visualizations based on the search results. Some of the visualizations show an aggregated view of the results, while others provide an alternative way to filter the data further and to view the results. Once the user has identified a subset of data of interest in a particular dashboard, s/he can quickly apply that filter to another related dashboard using the relational navigator widget. For example, one can identify a person of interest on the people dashboard and then click a link on the relational navigator to be redirected to the emails dashboard, which will be filtered to show just the emails that person sent/received.

The above screenshot shows two versions of the same information. The people who sent the most emails are on the x-axis, and the people they emailed are on the y-axis, with the number of emails sent shown in the graph. By clicking on the graphs, the data can be filtered to drill down to, for example, emails sent from one person to another.

Graph Browser

One of the most interesting features of Siren is the graph browser, which allows one to view search results as a graph with the various relationships shown. It is then possible to add or remove specific nodes, expand a node to its neighboring relationships, and apply lenses to alter the appearance of the graph. The lenses that come with Siren allow for visualizations such as changing the size or color of nodes based on the value of one of their fields, adding a glyph icon to a node, or changing the labels. Custom lenses can also be developed via scripts.

In the screenshot above, I started with a single person and then used the aggregated expansion feature to show all the people that person had emailed. The edges represent the number of emails sent. I then took it one step further by expanding each of those new nodes in the same way. The result is a graph showing a subset of people and the communication between them.

Obstacles

As this was my first foray into both Elasticsearch and Siren, I faced some difficulty in loading the data into ES in such a way that it would be useful in Siren. In addition, the data was not entirely clean given that some of the email addresses were represented differently between emails even though they were for the same person. There were also many duplicate emails since they were stored in multiple inboxes, but there was no clear ID to link them and thus filter them out.
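
The post leaves these data issues unresolved, but one plausible clean-up, sketched below under my own assumptions about the address formats, is to normalize addresses and derive a deterministic message ID from the content, so duplicate copies stored in different inboxes collapse into a single document when that ID is used as the Elasticsearch document id.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Locale;

    public class EmailCleanup {

        /** Lower-cases an address and strips any display name, e.g. "Jeff <jeff@enron.com>" -> "jeff@enron.com". */
        public static String normalizeAddress(String raw) {
            String addr = raw.trim().toLowerCase(Locale.ROOT);
            int lt = addr.indexOf('<');
            int gt = addr.indexOf('>');
            if (lt >= 0 && gt > lt) {
                addr = addr.substring(lt + 1, gt).trim();
            }
            return addr;
        }

        /** Derives a stable id from sender, subject, and body so copies of the same email dedupe. */
        public static String messageId(String from, String subject, String body) throws Exception {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest((normalizeAddress(from) + "|" + subject + "|" + body)
                    .getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).toString(16);
        }
    }

Indexing each email with messageId(...) as its document id would make re-indexed duplicates overwrite one another instead of accumulating, though this is only one possible approach.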

Apart from the data issues, I also had some difficulty using Siren. While the initial setup is not too difficult, there are details that can easily be missed, leading to unexpected results. For example, when I loaded my ES indices into Siren, I did not specify a primary key field. This is required to leverage the aggregated expansion functionality in the graph browser, but I didn’t find anything in the documentation about it. I also experienced some odd behavior when using the graph browser. Eventually, I contacted someone at Siren for assistance. I had a productive call in which I learned that there was a newer patch release that fixed several bugs, especially in the graph browser. They also explained how the primary key of the index drives the aggregated expansion list. Finally, I asked how to write custom scripts to create more advanced lenses or expansion logic. Unfortunately, this is not yet documented and not widely used; most people who write scripts currently just modify the packaged ones.

Final Thoughts

In the short amount of time that I spent with Siren, I could see how it could be quite useful for linking different data sets to find patterns and relationships. With the provided Siren platform Docker image, it was easy to get up and running. As with any new technology, there is a learning curve to fully utilize the platform, but for Kibana users it will be minimal. The documentation is still lacking in some areas, but the team is continuously updating it and is readily available to assist new users with any questions.

For my test use case, I feel that my data was not optimal for maximizing the benefits of the relational dependencies and navigation, but for another use case or a more robust data set, it could be beneficial. I also did not delve into the monitoring/alerting functionality to see how it could be used with streaming data to detect anomalies in real time, so that could be another interesting use case to investigate in the future.