Unlocking More Value From Your Trade Data with Bigbang

November 16, 2023 | 16:41

Transcript

Paul Kaisharis, Senior Vice President of Engineering, Molecule

Alex Cerbone, Sales Engineer, Molecule

Kari Foster, Vice President of Marketing, Molecule

0:00

Narrator: Hello, and thanks for watching!

This video is a previously recorded demo of Bigbang, Molecule's data lake-as-a-service platform. You'll see a general overview of how it works and the types of reports you can create using your Molecule data.

If you have any questions about Molecule and Bigbang that are more specific to your organization, we're happy to show you more - reach out to us at molecule.io/bigbang.

Enjoy the demo!

0:34

Paul Kaisharis: Alright, so let's first talk about the differences between multi-tenant systems and legacy or on-premise systems.

Multi-tenant systems, obviously, Molecule is one. Typically, with multi-tenant systems, you're getting rapid updates. At Molecule, actually, we do releases - as our existing customers know - we do releases every couple of weeks.

Being multi-tenant, running in the cloud, we're able to take advantage and leverage leading technology. We're able to adapt and move to technology very quickly. You know, uptime is a very important aspect of a multi-tenant solution or a hosted solution like Molecule offers. It's our responsibility to ensure, you know, uptime of the application.

Typically, these systems also provide very rich APIs. Molecule has a pretty rich REST API that we've worked on over the past few years. It gives you access to all the data that Molecule has to offer. But on the downside, you don't get access to the underlying database. That's one of the downsides.

Now, looking at the legacy side - the on-premise type solutions - customers control when updates are made. I've worked at a few of these legacy companies in my past. Typically, these upgrades are very long and arduous and use older technology. Keeping the system up and running is the customer's responsibility. Data integrations with these systems can typically be expensive, very custom, and bespoke. But an advantage is the direct access to the data. And so, that's just a comparison between the two.

So, the idea with Bigbang is really to bring those two worlds together. We built Bigbang to break the barrier of data being controlled by the SaaS solution: Molecule can stream live Molecule data directly to a database, while you keep all the advantages of using a multi-tenant solution like what we offer.

You also get access to the data directly, without having to go through REST APIs and do some of the additional work that was required to get to that data. We're also not going to be dictating the technology that our customers use to access that data. Some examples here - a lot of our customers use Power BI and Excel Power Query.

There may be in-house applications that our customers have that need direct access to the database. Some other examples: desktop SQL editors, or any custom code that needs to be written in whatever language of choice you run in your organization - .NET, Java, and Python are tools you can use with it. And of course, modern business intelligence tooling. Molecule provides some of that within the application itself, but some of our customers also have other types of tooling they use for business intelligence.

And, you'll see in a second, too, where our customers can also upload their own data to Bigbang, to bring some of that data together. So, there's varied uses or tooling that can be used to go against this data. And again, Molecule's not necessarily going to be dictating what tooling to use.

We utilize Kafka to provide these data streaming services, and we're running on and taking advantage of Confluent Cloud to provide them. One of the reasons we chose Kafka, and Confluent Cloud in particular, is that it's a highly performant, reliable, and secure technology, especially when we need to ensure that data coming out of Molecule gets to where it needs to go.

We want guarantees that the data is accurate, and the ability of tooling like Kafka to ensure data gets to its destination is one of the big reasons we chose it. What this diagram is illustrating is that as events occur in Molecule - for example, a trade is created or a trade is valued - that information flows in real time through to its target destination. We're supporting trades with sublegs, products, market data, and valuations. That's the main data we're currently supporting today.

4:58

And, what you can see in this diagram, too, what it's illustrating is that each one of our accounts really has their own data stream or has their own pipeline through our data streaming services. So, what that allows us to do is really to scale very independently from other accounts that might have larger volumes of data versus some that might have smaller volumes of data.

So, it's really independent flows of data through our data streaming pipeline, which allows us to scale quite a bit horizontally for some of our larger customers that have millions of transactions running through our system every day. So this technology allows us to support that.
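
To make the per-account isolation concrete, here is a minimal sketch in plain Python. It does not use Kafka or Confluent Cloud, and the topic naming scheme (`<account>.<entity>`), the account IDs, and the event fields are all assumptions for illustration; the demo does not describe how streams are actually named or partitioned.

```python
from collections import defaultdict

def topic_for(account_id: str, entity: str) -> str:
    """Build a per-account stream name (hypothetical naming scheme)."""
    return f"{account_id}.{entity}"

def route(events):
    """Group events so that each account's data lands only in its own streams."""
    streams = defaultdict(list)
    for event in events:
        streams[topic_for(event["account"], event["entity"])].append(event)
    return dict(streams)

# Hypothetical events from two accounts with different volumes.
events = [
    {"account": "acct-a", "entity": "trades", "id": 1},
    {"account": "acct-b", "entity": "trades", "id": 2},
    {"account": "acct-a", "entity": "valuations", "id": 3},
    {"account": "acct-a", "entity": "trades", "id": 4},
]

streams = route(events)
for topic, batch in sorted(streams.items()):
    print(topic, [e["id"] for e in batch])
```

Because no stream mixes accounts, a high-volume account can be given more capacity without affecting the others, which is the scaling property described above.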

Security is, obviously, important for Molecule and for our customers, making sure that the security of the data is managed. All along this whole set of services, there are multiple layers of protection in place to secure the data and make sure the data that needs to go into a particular account is going to that account. So, we've spent a lot of time ensuring the integrity of that information.

Now, one of the big reasons we chose Confluent is really this connector technology. If you look toward the right, where it says Confluent connectors, that technology allows us to connect to multiple targets. Today, we're supporting a hosted PostgreSQL solution. Tomorrow, we can support other offerings.

In the future, the plan is for us to be able to support data storage and data lakes. If the customer's running MySQL, Oracle, or Snowflake and needs data pushed to other destinations, the same connector technology and the same implementation we have in place will cover that. To be clear, what we're offering today is the hosted PostgreSQL solution.

And, just diving a little bit deeper into the schema itself, this hosted database solution. So, there's two schemas that are managed within that solution: there's a Molecule schema and then the customer schema.

The Molecule schema is where Molecule writes the data. So, the data that's coming out of Molecule - trades, valuations, et cetera - is written to that schema. Customers can't write to that schema, but they can read from it. There's also a customer schema that customers can write to and that Molecule can, optionally, read from. But, the power there is really to bring those together.

So, if our customers have data in their environment, and they want to push it here that combines with our data, they can use this tooling to combine the data and pull it together and create meaningful information and any type of analytics that might help them make their decisions.
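
As a rough sketch of what combining the two schemas can look like, the example below joins a table in a `molecule` schema with a customer-uploaded table in a `customer` schema. Several things here are assumptions: SQLite's `ATTACH DATABASE` stands in for the two PostgreSQL schemas so the example runs without a server, and the table names, column names, and the idea of a per-book limit table are hypothetical, not Bigbang's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite stand-in for the two schemas in the hosted PostgreSQL database.
conn.execute("ATTACH DATABASE ':memory:' AS molecule")
conn.execute("ATTACH DATABASE ':memory:' AS customer")

# Hypothetical table written by Molecule (read-only for customers in Bigbang).
conn.execute(
    "CREATE TABLE molecule.trades (trade_id INTEGER, book TEXT, quantity REAL)"
)
# Hypothetical table uploaded by the customer.
conn.execute("CREATE TABLE customer.book_limits (book TEXT, max_quantity REAL)")

conn.executemany(
    "INSERT INTO molecule.trades VALUES (?, ?, ?)",
    [(1, "power", 500.0), (2, "power", 700.0), (3, "gas", 300.0)],
)
conn.executemany(
    "INSERT INTO customer.book_limits VALUES (?, ?)",
    [("power", 1000.0), ("gas", 1000.0)],
)

# Join Molecule data with customer data across the two schemas.
rows = conn.execute(
    """
    SELECT t.book, SUM(t.quantity) AS total, l.max_quantity
    FROM molecule.trades AS t
    JOIN customer.book_limits AS l ON l.book = t.book
    GROUP BY t.book, l.max_quantity
    ORDER BY t.book
    """
).fetchall()

over_limit = [book for book, total, limit in rows if total > limit]
print(rows)
print("Over limit:", over_limit)
```

In PostgreSQL, the same query shape would apply with schema-qualified table names; the read-only rule on the Molecule schema would be enforced by database permissions rather than by anything in the query.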

7:57

Alex Cerbone: So, Paul talked about the pros and cons of having a SaaS offering. One of the biggest ones is direct access to the database. Things within Molecule are structured, but that does create a limitation around how we can pull that data out of the system.

The data lake really addresses some of those things. One of those is time series data. Most things exist in Molecule in snapshots of as-of dates. With direct database access, you can pull a series of date ranges and look at the variance between those.

Next one is going to go into pulling in data from other systems. You can create the custom fields in Molecule, use those as mappings, and get your external keys to your external systems to make the mappings a bit easier in those connections.

The next piece is all about really analyzing your data. You can create custom views, either in SQL or some other BI tool that your organization already has access to. The dashboard reports that are embedded in Molecule are already an example of this. We use Mode, which is white-labeled within the system there. But, there's no reason that with a data lake service, you can't create more specific reports outside of that implementation process where we typically set this up today.

The next piece is connecting Power BI. So yes, it is a SQL database that's sitting on the back end, which does require some technical skills. But, Power BI has a natural language question processing system. So, if you have non-technical people within the group that want to analyze this data, that's a really good option to access the database directly.

Kari, I'm gonna take over the screen share here, and we'll go directly into the system.

9:45

Right up, you will see this is Mode on the back end. There are a couple of different ways to access the system here, but Mode is our system of choice, so we will stick with this for today. And, let's look at the first example that we use here, which is the analysis of your time series data.

If we wanted to pull a forward curve in Molecule, then we could go to our market data, pull a curve, and we have to choose it for an individual date. In contrast, if I want to see how our nat gas curve has changed over time, then I could pull a range of dates and create a visualization of how this time series has changed. So, we see in these dark numbers - these are closer to the as-of date today - that there's been a significant shift in the value of this power curve over time, but the shape has remained relatively the same.
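
The comparison across as-of dates can be sketched as follows. This is a minimal illustration in plain Python, assuming curve data arrives as rows of (as-of date, delivery month, price); the row layout, the prices, and the `curve_shift` helper are hypothetical and not taken from the Bigbang schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (as_of_date, delivery_month, price)
curve_rows = [
    (date(2023, 11, 1), "2024-01", 3.10),
    (date(2023, 11, 1), "2024-02", 3.05),
    (date(2023, 11, 8), "2024-01", 3.30),
    (date(2023, 11, 8), "2024-02", 3.20),
    (date(2023, 11, 15), "2024-01", 3.55),
    (date(2023, 11, 15), "2024-02", 3.45),
]

def curve_shift(rows, start, end):
    """Price change per delivery month between the earliest and latest
    as-of dates that fall inside [start, end]."""
    by_month = defaultdict(dict)
    for as_of, month, price in rows:
        if start <= as_of <= end:
            by_month[month][as_of] = price
    shifts = {}
    for month, prices in by_month.items():
        first, last = min(prices), max(prices)
        shifts[month] = round(prices[last] - prices[first], 4)
    return shifts

shifts = curve_shift(curve_rows, date(2023, 11, 1), date(2023, 11, 15))
for month, change in sorted(shifts.items()):
    print(month, change)
```

A single as-of snapshot cannot show this; the shift only becomes visible once several as-of dates are pulled together.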

The next big use case for this is the introduction of data from other systems. And, it seems like from our poll here, that's really what our audience is interested in. So, let's spend a little bit more time there.

You'll see when I pull this data series, I have a raw data series here. We talked already about the simplified architecture of the data on the back end here. Where you might have 100 different tables creating the architecture in the back end of Molecule, you only have five showing up here.

But, you can also introduce supplementary data sources. So, for this example, we're talking about looking at a treasury system or a GL system that you might be using for processing your netting agreements and your payments. We can pull directly from that payments - you see, I have it up here already - what payments have been processed. I can create those keys that we talked about before.

Let's say you're using SAP and you have the short name keys for your counterparties, you need to map those to your counterparty names that you use in Molecule. You can create a connection off of these counterparty connections. You can connect your payments to your trade IDs that they're going off of, and you can realize the netting terms off of it.

So, that allows us to create something like this, where we can summarize our total counterparties. With our trade IDs, you'll see where there's null. That means it's a combination of trade IDs off of the netting terms associated with that counterparty. It'll tell you your payment status. It'll summarize the realized payments from Molecule, and it will let you know, after you connect that treasury system, how much of your remaining balance is due.
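
The reconciliation described above can be sketched in a few lines. This is an illustration only: the short-name mapping, the counterparty names, the amounts, and the `remaining_balances` helper are all hypothetical, and real netting terms would be more involved than a simple sum per counterparty.

```python
# Hypothetical mapping from treasury/GL short names to Molecule counterparty names.
short_name_to_counterparty = {
    "ACME": "Acme Energy LLC",
    "GLBX": "Globex Power",
}

# Hypothetical realized amounts from Molecule: (counterparty, trade_id, amount)
realized = [
    ("Acme Energy LLC", 101, 12000.0),
    ("Acme Energy LLC", 102, -2000.0),
    ("Globex Power", 201, 5000.0),
]

# Hypothetical processed payments from the treasury system: (short_name, amount)
payments = [
    ("ACME", 10000.0),
    ("GLBX", 2000.0),
]

def remaining_balances(realized, payments, name_map):
    """Net realized amounts per counterparty, then subtract processed payments."""
    due = {}
    for counterparty, _trade_id, amount in realized:
        due[counterparty] = due.get(counterparty, 0.0) + amount
    for short_name, amount in payments:
        counterparty = name_map[short_name]  # map the external key to Molecule's name
        due[counterparty] = due.get(counterparty, 0.0) - amount
    return due

balances = remaining_balances(realized, payments, short_name_to_counterparty)
for counterparty, balance in sorted(balances.items()):
    status = "settled" if balance == 0 else "outstanding"
    print(counterparty, balance, status)
```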

And, that's really just one option. There's really infinite possibilities when it comes to connecting external data sources to supplement the data feeds in here. And, the last piece we're going to spend most of our time here is in creating more complex drill-downs of your information.

When we set up your reports as part of the implementation process, typically, it's a summary of what you need to make your decisions at start of day. Right? We give you the end state of that decision. But, it doesn't give you the option to drill further into that data with those sources.

When we do something like this, we want to go into our positions by product. Then, we can get a little bit more of a detailed view, summarized how we see fit. And let's say that, "Oh, wow. This ERCOT position is extremely negative compared to the others." Now, I can go directly from here. I can query my database directly and get into the detail of why that position is outstanding from the others, not just notice that it is low.

One thing to point out here, as well, is that this is a live feed coming from the system. So, let's go to another report that I have up here, which is going to be an emissions coverage. You have different plant locations. You have emissions associated with those power plants, and you are trading RGGIs to offset those positions.

We'll see here that this December position is lacking a little bit. We have some obligations outstanding here. So, let's go into the system and see how this trades, and see if we want to mop up this position.

We're going to go into Molecule, connect it to my system here. We're going to go into Tickets directly since that's the source of that trade.

And, if I create a ticket, these are my emissions. It was in my plant one; it was a December position, so I'll grab a random December date. And, just to bump that one up a little bit, we're going to put in a 40,000 volume estimate, and we're going to tack on a random price. And then, I'm going to put this tagged to a product that is associated with that asset.

15:26

When I save this here, it will automatically feed into the database on the back end and then come into those reports. So, if I go back to this report, we see this is still red here. Let's go back to my query. I'm going to rerun. It'll let me know that that succeeded here. And, if I hop back to my report, we can see it's updated in real time for me: now we're green, and we're covered in that position.
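
The red-to-green change in the coverage report can be sketched as a simple comparison of obligations against allowance volumes. The obligation figure, the existing ticket volumes, and the `coverage_status` helper are assumptions for illustration; only the 40,000 volume comes from the ticket entered in the demo.

```python
def coverage_status(obligation, allowance_volumes):
    """Return 'green' when purchased allowances cover the obligation, else 'red'."""
    return "green" if sum(allowance_volumes) >= obligation else "red"

# Hypothetical December obligation and existing allowance tickets for one plant.
december_obligation = 100_000
december_tickets = [35_000, 30_000]

before = coverage_status(december_obligation, december_tickets)

# The new ticket from the demo: a 40,000 volume tagged to the plant's product.
december_tickets.append(40_000)

# Rerunning the query picks up the new ticket.
after = coverage_status(december_obligation, december_tickets)

print(before, "->", after)
```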

So to refresh, most major use cases that we're seeing are going to be time series analysis that wouldn't be available in Molecule otherwise, connecting external data sources so that you can supplement your data in the Molecule database, and creating custom reports that you might not see otherwise in the application.