The role of risk managers is expanding rapidly, requiring integration across organizations and deeper relationships between risk, data and digital product teams.
Pursuing net zero targets, participating in new markets such as carbon and LNG, and undergoing digital transformation all generate new sources of risk, despite the great opportunities they present. Today, we sat down with Carrie Jaquith, Global Head of Digital Product at Abaxx Technologies, to explore how risk teams can balance risk and reward and what role data and product teams play in the process.
The following Q&A is created using slightly edited excerpts from the episode transcript, optimized for readability.
CJ: Oh, it’s such a juicy topic, and I am so thrilled to get to nerd out with you on it because digital product is an amazing space. We get to work at the intersection of design, engineering, business, risk and governance. When you’re working in digital product, you’re orthogonally positioned at the center of all the things.
Digital product is so fun to talk about because it’s a space that we all interact with in our day-to-day lives. When a digital product is done well, the product is amazing and makes our lives easy, and we know that largely because it’s invisible to us.
When a digital product is done terribly, we know because we get an error message or because the thing we’re trying to do doesn’t work. And when it’s done really poorly, when a digital product doesn’t connect with risk, everyone knows — especially the lawyers. So I’m super excited to talk with you about this space of digital product and data and risk. We could do many, many pods on this.
These are great questions. I’m one of those humans that is a collector of signals. A collector of themes. And that always feeds back into the digital products that my teams are building. We’ll see these themes percolate around the people that we’re working with, the humans that we’re working with. And I am one of those humans that is a connector of humans to solutions. So I love to collect the themes and I love to take those and make those into solutions for humans.
When I think about the needs of both chief data officers and chief risk officers, the needs bubble up into needing to get access to single sources of truth, needing to get access to analytics platforms that give you a pane of glass, if you will, to be able to see into your data, to ask questions of it.
More recently, tools are emerging that are purpose-built for listening. Think: tools that listen to our data on our behalf and then translate it for us. If you’ve been hanging out with chief data officers, as I do (not everyone probably takes joy from that, but they are some of my favorite people, and if you get the opportunity you should leap at it), what you would have heard them talking about recently is big efforts to build out data repositories that provide a single source of truth, that leash together disparate data sets that have grown up in organizations both intentionally and unintentionally, and then to build digital products that sit on top of this data.
These needs are really interesting, right? There’s a need for digital products that sit on top of single-source-of-truth data, because what you find in the field is that there are all kinds of digital products you could slap on top of your data. But at the end of the day, if your data is dirty in a way you didn’t realize, or is structured in a way that doesn’t let you permission it to the product sitting on top of it in the way you need, then you’re just burning millions and millions of dollars.
This happens in all kinds of companies: you have this goal of building a single source of truth. You hire people to help you do it. And you get a year into the project only to find out that you didn’t architect permissions far enough upstream in the data’s life cycle, and you have to undo everything and redo it. Sometimes that happens because you inherited a data set or an ecosystem when you acquired a company. Sometimes it happens because you just didn’t get enough time. And sometimes it happens because the tech changes: by the time your teams have started to build, the tools have evolved, because this is a space that moves really, really fast.
Single sources of truth are really, really interesting. The truth changes depending on the audience. So you have this whole other aspect of, like, what is David’s truth versus what is Carrie’s truth? And just getting to agreement on which truth is valuable for our purpose is a whole other situation.
There are a couple of goals around the single source of truth. One of them is you want to be able to bring data to bear that is generated in one area of your ecosystem that another area can use. I should preface how I talk about data and product and risk with the fact that I’ve spent many years in highly regulated spaces.
So I’ve worked in digital product in investment banking, in insurance and property and casualty insurance, and now at Abaxx: these are spaces that are super highly regulated. The implications of the work that we do have an impact on economies and on human lives. So the purposes of bringing data to bear in these ecosystems run the spectrum: from data that impacts decisions about actual human health and human life, to data that impacts decisions and recommendations about what governments and corporations should do, and, beyond that, to driving revenue. The goal is to bring data to the table so that you can present differentiating, revenue-driving recommendations drawn from data that you just can’t get to without bringing it into a single source of truth and a governance space.
Historically, yes. What is happening in the data space now is an evolution from highly hand-curated data, to hybrid human-and-machine-cleaned data, onward to data that is cleaned machine to machine. So there’s definitely an element of evolution that we see there.
Yeah. It’s the idea of laying a sheet of glass on top of your data so that you, the human, can ask questions that you can’t ask in an Excel spreadsheet or in a two-dimensional data space.
In terms of building tools, we’re at a place now where we’ve got visualization tools (honestly, coming out of the video game space) that allow us to look at our data in real time in ways we previously could not. This visual language lets us see information that we, with our human eyeballs, could not have comprehended 20 years ago.
This pane of glass paradigm is really interesting because it is super easy to give humans all of the data in the data set, right? That is the easiest thing in the world. What’s really hard, from a digital data product perspective, is finding the right way to aggregate, distill and filter the data so that the human sees what the human needs to see.
I was at a machine learning conference recently, and one of the heads of fraud and risk from, I think it was Equifax, was presenting on work they were doing around machine-to-machine learning models and how they were able to visualize the behavior of risk in credit fraud in a way you wouldn’t have been able to five or six years ago. Imagine a piece of paper with sand sprinkled on it: you run a little magnet underneath it and it pulls all the little metallic bits together. We have data sets that our risk teams get to use now that are structured in a way that, when looked at through this kind of pane of glass, lets you visually see behaviors around risk actively move and cluster together. You can see this bloom of cluster-y, risky or fraud-y behavior. That, to me, is the most amazing thing, because it wasn’t that long ago that you could not have imagined being able to see that, let alone see it in real time. All you could see was a big flat field of dots.
Oh, for sure. This is such a fascinating space, because at this point we’re able to leash in data that streams in from sensors. For instance, in the property and casualty space, you’re able to enhance how you model risk based on data feeds from moisture sensors on construction sites.
And just two generations into that tech, you’re going from data that was super unstructured, super noisy and super hard to understand, to hardware that’s been optimized so that it’s kicking out cleaner data, and you’re able to bring that data together. My teams have worked on projects where scarce data was a problem that drove us to devise and build synthetic data to augment it, so that we could model and ask questions of that data. There have been scenarios where I had so few data points that I could tell, for example, that Dave is Dave. If the data is scarce enough that only one person in it is Dave-shaped, I’m going to be able to figure out that it’s Dave even if I change the name.
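The ‘Dave-shaped’ problem is what privacy engineers call re-identification through quasi-identifiers: removing a name does not help if the remaining attributes single someone out. One common check is k-anonymity, which requires every combination of quasi-identifier values to be shared by at least k records. The minimal Python sketch below flags combinations shared by fewer than k records; the field names, records and threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records.

    Any combination listed here could single out an individual even after
    direct identifiers such as names have been removed or changed.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, illustrative records with names already stripped.
records = [
    {"zip": "10001", "age_band": "30-39", "role": "trader"},
    {"zip": "10001", "age_band": "30-39", "role": "trader"},
    {"zip": "10001", "age_band": "40-49", "role": "analyst"},  # the only one of its kind
]

risky = small_groups(records, ["zip", "age_band", "role"], k=2)
print(risky)  # {('10001', '40-49', 'analyst'): 1}
```

Widening categories (for example, broader age bands) or suppressing rare rows are typical ways to raise the smallest group size; synthetic data is another.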
So we’ve gone from this problem of having too little data — where we were generating synthetic data to obfuscate the things that we need to make sure don’t come out — to having tons and tons of data. When you’re talking about sensor data flowing in and very large streams of data, you have the new problem of big data. With risk teams, you need to be able to understand the risk associated with this volume and pipeline of data that’s flowing through your ecosystem.
So there’s this really cool, and also really terrifying, space around listening tools: digital products that are designed to sit on top of systems and listen. There’s a suite of digital products focused on process mining, where software listens to the activity in your ecosystem to make recommendations to you about what you could do better.
You’ve got listening tools that are listening for things like servers that should be asleep but, at four in the morning, are not. And they’re kicking data out, which can be an indication of a breach. We’ve got listening tools that are listening to our voice over IP (VoIP) calls. In some scenarios, this is something that you want, because you’ve got highly regulated conversations that need to be audited and auditable. So you’ll have tools listening to conversations to understand whether or not something like insider trading might be happening. And then you’ve also got kind of the creepy listening, where you’re listening to customer service calls, converting them from speech to text and then parsing them to see: was the customer given good support or not? These kinds of listening tools are both wielded by risk teams and inform risk teams. It’s a super, super interesting space.
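The ‘servers that should be asleep’ example is, at its simplest, a rule that compares observed activity against an expected schedule. The Python sketch below assumes a hypothetical log of (server, hour, bytes sent) readings, a hypothetical quiet window and a hypothetical byte threshold; real listening tools typically learn baselines rather than hard-coding them, so this only illustrates the idea.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    server: str
    hour: int        # hour of day, 0-23, in the server's local time
    bytes_out: int   # outbound bytes observed during that hour

# Hypothetical quiet window: hours 1 through 4 (01:00 to 04:59) should be idle.
QUIET_HOURS = range(1, 5)
# Hypothetical tolerance for background chatter such as health checks.
MAX_QUIET_BYTES = 10_000

def off_hours_alerts(readings):
    """Flag readings where a server sends significant data while it should be asleep."""
    return [
        r for r in readings
        if r.hour in QUIET_HOURS and r.bytes_out > MAX_QUIET_BYTES
    ]

readings = [
    Reading("db-01", 4, 2_500_000_000),  # 4 a.m. and pushing gigabytes out
    Reading("db-02", 4, 3_000),          # normal background noise
    Reading("db-01", 14, 900_000_000),   # busy, but during working hours
]

for alert in off_hours_alerts(readings):
    print(f"ALERT {alert.server}: {alert.bytes_out} bytes out at {alert.hour}:00")
```

An alert like this is a prompt to investigate, not proof of a breach; the same traffic could come from a scheduled backup that nobody added to the expected schedule.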
What do some of those levels of technology between the raw data that’s being amassed and the human being who needs it for decision-making look like?
The data pipeline is super interesting and it is utterly invisible. You only know that it matters when it breaks. The data pipelines that my teams work with are varied. There’s raw data that’s collected by humans in some cases. In some cases, it’s collected by machines. It is either hand cleaned or machine cleaned. It’s structured by someone who’s known as a data modeler. It sits in different kinds of storage devices like cloud servers. And it is sometimes transformed on its way in. It’s sometimes transformed as it leaves and heads to a digital product where the human actually sees it.
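The pipeline stages Carrie lists (collect, clean, structure, store, transform on the way in and on the way out) can be sketched as a chain of small functions. Everything in this Python example, including the stage names, the moisture-sensor fields and the in-memory list standing in for cloud storage, is a hypothetical simplification meant to show the shape of the flow, not any particular team’s stack.

```python
from statistics import mean

def collect():
    """Raw readings as they might arrive from people or machines (hypothetical)."""
    return [
        {"site": "A", "moisture": "0.31"},
        {"site": "A", "moisture": ""},        # missing value
        {"site": "B", "moisture": "0.55"},
        {"site": "B", "moisture": "0.61"},
    ]

def clean(rows):
    """Drop rows with missing values: a stand-in for hand or machine cleaning."""
    return [r for r in rows if r["moisture"]]

def structure(rows):
    """Apply a simple data model (the transform on the way in): typed, consistent fields."""
    return [{"site": r["site"], "moisture": float(r["moisture"])} for r in rows]

def store(rows, storage):
    """Persist structured rows; a list stands in for a cloud data store."""
    storage.extend(rows)
    return storage

def transform_out(storage):
    """Shape stored data for the product view (the transform on the way out)."""
    by_site = {}
    for r in storage:
        by_site.setdefault(r["site"], []).append(r["moisture"])
    return {site: round(mean(values), 2) for site, values in by_site.items()}

storage = []
view = transform_out(store(structure(clean(collect())), storage))
print(view)  # {'A': 0.31, 'B': 0.58}
```

Keeping each stage as a separate function mirrors why the pipeline is invisible until it breaks: a fault in any one stage silently changes what the human sees at the end.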
In between, there are humans designing the user experience and user interface. They’re making sure that what you see is understandable to you, that you can ask questions of what you see, and that you can visualize it and use it to make decisions. They then, in turn, take data on how you’re using these products and feed it back into the data that’s driving the products. It’s quite an engine.
Well, again, video games are probably going to eat the world. Everything that’s happening in compute and rendering in video games precedes what’s happening in the enterprise by about six years. So the way in which you navigate your data is just going to get more and more immersive and more and more interactive. You are going to feed the data as a human in ways that you do not expect right now, in both good and bad ways. This is definitely a Spider-Man ‘with great power comes great responsibility’ situation.
I think if you’re going to play with them right now, you’d better be taking a look at Roblox, Minecraft and Fortnite. Fortnite, I think, will probably eat the world. But any one of those three will give you a view of where data’s headed.
That’s a really good question. There are three obstacles that come to mind: fluency, timing and resources. Those three themes come up a lot in this space. Fluency means your teams need to be fluent enough to build, configure for purpose and deploy new products. And while they’re working full time, it can be hard for your teams to be free enough to ‘learn up’ to be able to do the new thing.
Timing can be a real ‘gotcha!’ in big organizations, because if you’ve just leased servers for five years and a brand new, really high-value tool comes out, you may not actually be able to use it until your leases are up and your software licenses cycle over. Capital expenditure cycles alone can be a real gotcha in this space.
In terms of fluency and timing, there’s also the piece around rules and government regulations that are actively changing. For your teams to be able to build digital data products and run them compliantly in your organization, you have to be actively working with your risk team. We’re four years into GDPR and four years into California’s Consumer Privacy Act, and just this month in Illinois, Clearview AI and the ACLU came to an agreement, under the state’s biometric privacy law, around how Clearview’s data tools will be wielded. There is this interesting challenge working in this space because you are constantly learning, constantly keeping your fluency up and constantly looking to keep up with regulatory changes.
What you may go to market with today may be illegal to sell tomorrow. You can find out that your business model is blown up within 24 hours. That’s a reality right now. Regulatory requirements are changing as soon as people get fluent with how data is getting used. Regulations are being implemented to provide guardrails to make sure that data is not abused.
In terms of resources, getting the right people in the right room at the right time is so important with risk and data products, because you can unintentionally introduce vectors of risk exposure if you build these things in a vacuum, without collaboration and without diversity of view. An example of this from real life: my data science team was working on a project to use machine learning to make recommendations for a nonprofit, specifically a recommendation engine for children’s coat sizes. The team was very young, none of them had children, and they were all one gender. So you had a group of data scientists, all one gender, none of them with children, collecting data to make a recommendation on a child’s coat size and then building the machine learning model around that data.
And because it was a super narrow and slightly biased group of humans building this model and collecting this data, the first version of the model’s output was not that great. It didn’t account for real-life kids who are much bigger than the statistics in the commonly used datasets, which come from the 1970s and 1980s.
Their model didn’t allow for kids who might be one gender but want a different gender of coat. It didn’t allow for region. It didn’t allow for geography. There were just data points that the group couldn’t imagine, because they were coming at this problem from a very specific background.
So when you’re building digital products and data, it’s super critical to have team members from your risk team directly in the room, to have your data governance people and your data privacy people directly in the room, and to have domain experts directly in the room. That’s where conversations will occur like, “Oh, hey, you know, in practice we all think the word Dave in this data set means Dave.” But, in the right set of hands, Dave does not mean Dave. Dave is coded language for, let’s say, chocolate cupcakes. So, having the right people in the room is super critical.
You make a great point that you need somebody with experience at the end of that line to be able to say whether the answer coming out of the box is stupid or reasonable. That code example you just gave is a great one of how this can happen. As we use the technology and data to better manage our risks, what other new risks are we unintentionally creating?
We introduce risk into data unintentionally, into technology unintentionally. Every time a new generation of tooling comes out, humans are very predictable. We invent a thing. We make cat memes with the thing. We figure out a way to turn the cat meme into a business model. We make amazing businesses with the thing. Then we realize that we break some things with the thing. And we, very quickly, in best case scenarios, respond, and we write governance and guardrails around the thing.
Carnegie Mellon stood up its department focused on computer science and ethics only five or six years ago. Historically, we haven’t had ethics programs within our computer science programs. We’ve had them in medicine. I think we are, at scale, seeing a need to embed ethics, governance, risk, earlier in academia than we might have historically because we’re seeing the tools that we’re building wielded in ways that we didn’t anticipate.
This can range from being wielded around really thorny, horrific use cases that our data science and our data product people couldn’t have imagined when they were building the products to completely unintended consequences where product teams don’t realize what will happen until they get the product in the hands of humans. Humans are really good at breaking things and using them as not intended.
There’s another unintended consequence around risk that I’m personally fascinated by. I was chatting with a data scientist friend of mine, Zoe, about it last weekend. It’s the idea that we have set up listening tools to listen for risk, and we’ve set up these tools to listen for signals in language around risk.
There’s a whole area of digital products that listens to tweets, that listens to Facebook, to LinkedIn, that scrapes earnings reports, that scrapes CEO keynotes and listens for words to surface signals around the value of a company and in some cases, the truthfulness of the human that’s talking and in some cases — the risk associated with that company because of the language that they use.
So there’s this digital product space that is like a giant microphone pointing at all of the data that’s sitting out in public spaces and is listening for things like risk. And this giant microphone is wielded by risk teams. In some cases, it’s also wielded by asset managers to surface signals that will guide whether or not they invest or don’t invest.
And the humans who are talking have noticed this giant microphone that’s listening to these signals. So once they figure out that they’re being listened to for specific keywords like ‘um’ and ‘uh’, they hack how they talk and what they say in public spaces. They actually introduce signal noise and signal skews, inserting keywords into their keynotes that will skew the giant microphone. They end up bending the signal that the microphone picks up and bending how the algorithm runs. The data scientists designing the algorithm then have to tune it for the aspirational skew that comes in from the humans. So these are really interesting spaces. There’s this unintended consequence that results from a tool being stood up to do one thing, assess risk through signals, and then being totally hacked by the humans.
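A toy version of the ‘giant microphone’ shows why it is so easy to game. The Python sketch below scores a transcript by how often hypothetical hesitation keywords (‘um’, ‘uh’) appear per 100 words. The keyword list and the scoring rule are assumptions for illustration; production signal models are far richer, but the same weakness applies once speakers learn what is being counted.

```python
import re

# Hypothetical keyword list a naive "listening" model might track.
HESITATION_WORDS = {"um", "uh"}

def hesitation_rate(transcript):
    """Hesitation keywords per 100 words: a crude stand-in for a language-risk signal."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in HESITATION_WORDS)
    return 100 * hits / len(words)

candid = "Um, revenue is, uh, roughly in line, um, with guidance."
coached = "Revenue is roughly in line with guidance, and we remain confident and disciplined."

print(f"candid:  {hesitation_rate(candid):.1f}")
print(f"coached: {hesitation_rate(coached):.1f}")
```

The two statements carry essentially the same message, yet the coached version drives the score to zero simply by dropping the counted words and padding the sentence, which is the kind of skew the algorithm’s designers then have to tune for.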
It is really interesting! It’s a topic for another pod. But yes, it is super, super fun and super fascinating to watch.
One of the traits of Web2 is that it’s a curated space in a lot of ways. So much of what’s in Web2 is intentionally placed there. Either someone’s putting that data out there on your behalf or you’re putting your data there yourself. The key differentiator with Web3, and one that introduces huge risk, is that you’re generating the web in real time through behavior in a lot of ways, rather than by hitting a publish button or a post button.
One of the core traits of Web3 is that it is actively being built in real time by the humans and the machines interacting in it. And that, from a privacy perspective, is a completely different paradigm. It’s a really big mind shift to go from ‘Oh, the data about me is put in Web2 by me, or by someone on my behalf, or maybe observed by me’, with some level of control around that, to Web3, which is more of an embedding of live creation. That’s going to take a few years. There are going to be a few years of learning around that.
When I look at the bleeding edge of tech right now, and the emergent spaces around tech, there’s a huge opportunity, some really interesting risk, and a huge amount of education that has to happen around innovations such as decentralized autonomous organizations and decentralized identity.
When you think about the way you and I consider ourselves as humans in the world right now — we think of our identities as existing in physically bound artifacts. We think of our identities as existing in our passport. We think of our identities online as kind of a username and password like an avatar, maybe it’s our Twitter avatar.
There is a coming evolution in how identity is handled in digital space. It will equip you, David, and me, Carrie, with data about ourselves that we hold the keys to. Right now, we do not have the keys. Currently, the keys sit with very large organizations, in buckets of human data that we can’t really control. Those buckets are fed by third parties and held by third parties. There’s a very interesting future state around what will happen with decentralized identity, and what will happen if we have a mechanism to hold more information about ourselves. It means that we could potentially sell that information, right? If I’m the owner of it, it takes on value. It changes my relationship with some really big companies. And it also introduces risk, around questions like: what if I blow up my own decentralized ID data? Is it possible for me to delete myself from the metaverse? I don’t know, maybe!
It’s such an early space that the risk cases haven’t been fully baked out yet. From a digital product perspective and from a risk perspective, we have frameworks for working within organizations where there’s a known pattern, and emergent decentralized autonomous organizations, or DAOs, don’t necessarily follow it.
There’s an HR department, there’s a data privacy person, there’s a legal person, there’s the sales team and there’s the marketing team. You’ve got these known ecosystems within big organizations. When you look at DAOs, you’re potentially looking at a future state where the structure, the organizational design that many of us have navigated for many years, gets atomized. It exists as unconnected, smart-contracted relationships, where the relationship between departments flows through code: a computer-coded contract. That, from a risk perspective, is a really interesting problem, right? How do I measure the risk associated with contracts that are designed to be automated, human-hands-free, and just happen in the background until they don’t?
It’s coming! So get ready.
I feel like we're at a bit of a fork in the road where we have access to all this information which can help us better measure and manage risk, but it raises a very important issue of privacy. Is the loss of that privacy too high a cost to pay for the benefits of better risk management? Do we go down a path where it feels, to me, very Orwellian where we're basically naked in the metaverse and everything about us is known and taken and sold for profit? Or do we have a way to protect our privacy while still being able to collect enough information and use it to manage risk better? I hope the answer of ‘can we protect our privacy and manage risk better?’ is yes. And if so, how can we do that?
There are a couple of areas in this space that I keep an eye on and that the product teams I get to work with actively work on. Take obfuscation and encryption: every data team is working on how to ensure they’re encrypting what they should be encrypting, from the right start point to the right endpoint, and that they’re disclosing only what people need to see.
At the end of the day, Dave, let’s say, probably doesn’t need to see all the data. Dave needs to see just the view of the data that is legal and that allows Dave to do an amazing job. I’m really fascinated by the work that’s happening around zero-knowledge proofs and homomorphic encryption. Some of my favorite humans are working on homomorphic encryption tools, with the goal of encrypting data in ways that allow you to take, say, PII or your health data from one place, encrypt it using homomorphic encryption, pair it with someone else’s health data, run models against it, and then bring it back down.
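Homomorphic encryption lets a party compute on data without decrypting it. As a minimal illustration, the Python sketch below implements a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The tiny hard-coded primes make it completely insecure, and it is not a representation of any specific tool mentioned in the interview; real deployments use vetted libraries and far larger keys.

```python
import math
import secrets

# Toy parameters: far too small to be secure, chosen only so the numbers stay readable.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                                         # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1) # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                              # modular inverse; valid because G = N + 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    """Recover m as L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ

# Two parties encrypt their own figures; a third party combines them without seeing either.
a, b = encrypt(1200), encrypt(345)
total = add_encrypted(a, b)
print(decrypt(total))  # 1545
```

Only the key holder can decrypt the combined total, which is the property that makes ‘pair it with someone else’s data and run models against it’ conceivable; richer models need schemes that also support multiplication, at much higher computational cost. The three-argument `pow` with exponent `-1` requires Python 3.8 or later.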
That’s a really interesting space because you can also do that with financial data. Imagine potentially taking the model that you’ve built to use to measure risk around a book of trading data that you can’t really take over the wall — you can’t even take over the desk wall, like desk to desk.
You can imagine that being able to encrypt in a way that allows you to bring that data over the desk-wall, or over the actual wall, could be really powerful. There is an active effort across financial services, health care — anywhere where you’ve got highly regulated data, private data that you really need to look after — to make sure that the data that the humans are interacting with is obfuscated where required and is redacted where required.
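The idea of showing Dave ‘just the view of the data that is legal’, with obfuscation and redaction where required, can be sketched as field-level policies applied before data reaches a person. In the Python example below, the roles, field names and policy table are all hypothetical: each field is shown, masked (obfuscated) or dropped (redacted) depending on who is asking, and unknown roles see nothing by default.

```python
# Hypothetical policy: for each role, how each field may be shown.
# "show" = visible, "mask" = obfuscated, absent = redacted entirely.
POLICY = {
    "risk_analyst": {"trade_id": "show", "notional": "show", "counterparty": "mask"},
    "support":      {"trade_id": "show"},
}

def mask(value):
    """Obfuscate a value, keeping only its last two characters as a hint."""
    text = str(value)
    return "*" * max(len(text) - 2, 0) + text[-2:]

def view_for(role, record):
    """Return only what this role is allowed to see; unknown roles get an empty view."""
    rules = POLICY.get(role, {})
    visible = {}
    for field, value in record.items():
        rule = rules.get(field)
        if rule == "show":
            visible[field] = value
        elif rule == "mask":
            visible[field] = mask(value)
        # any other rule, or no rule at all, means the field is redacted
    return visible

trade = {
    "trade_id": "T-1001",
    "notional": 5_000_000,
    "counterparty": "Acme Fund LP",
    "trader_tax_id": "123-45-6789",
}
print(view_for("risk_analyst", trade))
print(view_for("support", trade))
```

Defaulting to redaction means a newly added sensitive field stays hidden until someone deliberately grants access, which is one way of architecting permissions upstream rather than retrofitting them.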
In the past few years, we’ve actually seen the rise of the great deletion. Some of the biggest spends in the last two years have been clawed back from building and redeployed to deleting. It’s fascinating, because we’ve built all of these machine learning tools and web scrapers to go out, crawl, gather data and bring it together, and now we’ve got teams realizing, ‘Oh, we actually have new rules that govern this big bucket of data.’ The budget that was going to be used to build stuff is now being used to go back and delete data. It’s incredibly interesting, and it will create an opportunity in the market for some really smart data scientists to build data deletion tools.
Yeah. It’s a really thorny problem, even on the ground, in building. We’re sometimes working with data that our own engineers can’t see, and shouldn’t see. It’s sort of like trying to build a house in a blank vault. In the absence of good synthetic data and really well-obfuscated data, you have huge risks for your teams.
You see this kind of problem crop up with companies that have been fined for having buckets of test data sitting out in the open web. Those engineers probably went in with really good intentions and were just trying to get the data that would allow them to build the digital product that would be useful and would drive revenue.
Most people don’t leave data lying around because they’re malicious litterers of data. Many years ago, there weren’t great ways for teams to rapidly generate synthetic data. There weren’t great ways to rapidly obfuscate. There weren’t great ways to govern. It takes some big mistakes for humans to figure out big fixes.
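One simple, and deliberately naive, way to generate synthetic test data is to sample each column independently from the values seen in the real data. The Python sketch below does that with hypothetical columns. It preserves each column’s distribution but breaks correlations between columns, and because it reuses real values it offers no formal privacy guarantee on its own: rare values can still point to a real person. Treat it as an illustration of the idea, not a safe synthetic-data method.

```python
import random

def synthesize(rows, n, seed=None):
    """Generate n synthetic rows by sampling each column independently.

    Column-wise sampling keeps each column's value distribution but breaks
    the link between columns, so synthetic rows are not simply copies of
    real rows (although a sampled combination can coincide with a real one
    by chance, and rare values can still leak).
    """
    rng = random.Random(seed)
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    return [
        {key: rng.choice(values) for key, values in columns.items()}
        for _ in range(n)
    ]

# Hypothetical "real" rows that engineers should not see directly.
real = [
    {"region": "NE", "policy_type": "home", "claim_amount": 1200},
    {"region": "SW", "policy_type": "auto", "claim_amount": 450},
    {"region": "NE", "policy_type": "auto", "claim_amount": 3100},
]

for row in synthesize(real, n=5, seed=7):
    print(row)
```

Passing a seed makes the test data reproducible across runs; more serious approaches model joint distributions and add formal protections such as differential privacy.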
I think you’ll continue to see an evolution of how risk managers are able to quantify the value of the things that they need to assign a value to, so that they can assign a risk score to the thing of value.
We’ve gone, for instance in property and casualty spaces, from having very simple models for how we assign the value of a pack of llamas or the value of a painting to much richer ones. Across the board with risk, we now have data that allows us to measure what we’re measuring more richly and to view what we’re trying to measure in new ways. I think we’ll absolutely see a continued evolution in how we create scores and bundles of scores around risk. Those portfolios are bundles of scores, and they can in turn be leveraged beyond risk. In some cases they’re business differentiating when you start to drill into them. So I think we’ll totally see that continue to happen.
I’m on a think tank at NYU and the think tank is filled with a number of humans that work in the genomics space and were members of the teams that mapped the human genome the first time, so the topic of risk comes up frequently. There are quite a lot of questions around like, ‘What happens if we edit the genes of humans and we turn ourselves into monsters?’
This is a concern. And with regard to evolution versus revolution, I’m reminded of what Dr. Mishra often says in that space: we’re always afraid that our machine learning models and our edits will alter humanity. In practice, we operate in spaces that are more like a river. Our models will change the course of the river for a period of time, but then evolution sort of moves it back toward its own course.
I think about Dr. Mishra and how he describes human interventions with genomics, the idea that we think that we will revolutionize but, in practice, sometimes our revolution turns into an evolution. We build a big dam and then the river rolls through the big dam and changes our course, and we have to modify how we’re doing things at the same time as learning how those things in the field change themselves.
I think we will ultimately see both evolution and revolution in the risk space. My goal, and the goal of my teams, is certainly to minimize the destruction that comes with revolution but to maximize the growth that comes with evolution, and to leave things better than you find them.
Ultimately, that is the goal: to leave things better than you find them, to help the humans get to the data, and to help the people building the data get to the humans. That’s the goal anyway.