The risks of unstructured data with Joe Pearce
Do you know how much unstructured data you have? If you’re like most organizations, probably not. That’s a big problem.
According to some reports, unstructured data – data that does not have a pre-defined structure, like documents or media files – makes up as much as 90% of the typical organization’s data. Understanding what is in this data has long been an issue for organizations but, as this week’s guest – RecordPoint’s Head of Product Joe Pearce explains – resolving it has become much more urgent thanks to the advent of Generative AI.
Joe and Kris (no Anthony this time) go deep on one of the most significant problems the average organization faces – and probably the one they ignore the most.
They also discuss:
- What are the challenges with unstructured data?
- What are the risks?
- Why does unstructured data accumulate so quickly?
- Why does AI make this task more urgent?
- What are the steps organizations should take?
Key takeaways:
- Unstructured data - data without a pre-defined structure like documents, emails, and media files - makes up 70-90% of a typical organization's data, posing a significant challenge.
- The rise of generative AI amplifies the urgency of addressing unstructured data, as these powerful models can expose sensitive information if fed uncontrolled data.
- Effective unstructured data governance requires a three-pronged approach:
  - Knowing what AI tools you're using
  - Understanding what those AIs are doing
  - Ensuring the right data is feeding into them
Resources:
📏 Benchmark: How much PII does the average organization store?
📑 Blog post: Mitigating AI risk in your organization
📑 Blog post: To improve cyber resilience, organizations must manage their unstructured data risk
Transcript
Kris: Welcome to FILED, a monthly conversation with those at the convergence of data privacy, data security, data regulation, records and governance. I'm Kris Brown, co-host here at FILED. Anthony Woodward, the CEO of RecordPoint, isn't joining me today, so we're gonna fly solo, but I'm very excited about the guest we're going to have. While we normally float around high-level issues such as data privacy or AI, this week we wanted to get a little bit more specific and focus on something every business needs to work hard on: understanding their unstructured data. And for that, well, I couldn't think of anyone better to help me with this topic than RecordPoint's very own Head of Product, Joe Pearce.
Welcome, Joe.
Joe: Thanks, Kris. It's really great to be here.
Kris: Look, Joe, it's great to have you on the podcast. And for the listener, I'd love it if you could just share a little bit about yourself and the history before you came to RecordPoint.
Joe: Happy to. So, I've got about two decades building out enterprise SaaS out there in the field. I first encountered the insurtech world, and then, most relevant to the path that led me to RecordPoint, I spent about a decade creating security, compliance and privacy software. And it was living in this world where I used to be blown away by just how few people had any clue where their regulated data was before we would go out there and start to work with them.
And so, getting into topics like the ones that we're talking about today with unstructured data, this is the sort of stuff where it is now completely possible to get a handle on these things. And I'm super excited to kind of dig into this world and make it a little less scary for folks, effectively.
Kris: And of course, you joined RecordPoint because I'm here, Joe.
That's what you are supposed to say there. But anyway, that's fine.
Joe: You had a great pitch. I, I'll give you that.
Kris: Super excited to chat with you today. So, look, this week we're gonna be talking about the risks that come with unstructured data, which is perhaps a little less newsy than our usual topics, but one I would imagine affects every single one of our listeners.
And so, look, can you completely, briefly go over it? What is unstructured data and, and why is it problematic?
Joe: Yeah, happy to. Really, the best way to understand what unstructured data is, is to compare it to its counterpart in structured data. You can think of structured data as stuff that's in a database, right?
Or some application where you're using a database. So, this is data that has very well-defined structure. This field is going to be a number. It can't be longer than this length. That sort of stuff is structured data. So, imagine that you're using really most of your SaaS applications. You're using Salesforce, you're entering a new lead into Salesforce: you're creating structured data in the backend there.
You can also think of this a lot of the time as stuff the engineers do or the data scientists do, 'cos they really love structured data for these sorts of situations. Unstructured data is basically everything else. It's the Word documents, it's the images, it's the emails. I was at a privacy conference a few months ago, and I ran across several people that were doing drones and autonomous cars, and they were talking about trying to regulate and get governance on top of biometric data, which is another form of unstructured data. So, really, the big difference is it doesn't take the engineers to create unstructured data.
You and I have all the tools that we need to create it today in our email applications, or heck, generating images on Midjourney and things like that. And it's because of the human element that it's a little problematic, right? It's very inconsistent. It's very unpredictable. If you could predict human actions, then you'd have a huge superpower out there.
We can put whatever we want into an email. Unstructured data is interesting too because it's difficult to search. So, if I had a million rows in a structured database, a SQL or an Oracle database, or even a semi-structured database like MongoDB, and I wanted to search through those million rows, I could do it with a query.
I mean, we're talking like milliseconds to return this data. But if I had a million unstructured files, some are Word, some are Excel, some are videos, some are source code, and I wanna search for certain data inside of that, it's really hard. And this is why, for so long, folks were really just kind of sticking their heads in the sand and ignoring it, particularly from the privacy and security point of view.
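Joe's contrast between querying rows and searching files can be sketched in a few lines of Python. This is an illustration only and was not part of the episode: the `leads` table, its columns, and the sample documents are all hypothetical. The structured side shows a typed, length-limited field and an indexed query; the unstructured side has no schema to lean on, so every file's content has to be read.

```python
import sqlite3

# Structured side: a hypothetical "leads" table with typed, constrained columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leads ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL CHECK (length(name) <= 50),"  # "can't be longer than this length"
    " employees INTEGER NOT NULL)"                      # "this field is going to be a number"
)
conn.executemany(
    "INSERT INTO leads (name, employees) VALUES (?, ?)",
    [("Acme", 120), ("Globex", 4500), ("Initech", 75)],
)
conn.execute("CREATE INDEX idx_employees ON leads (employees)")


def large_leads(min_employees):
    """One query answers the question; the schema says exactly where to look."""
    rows = conn.execute(
        "SELECT name FROM leads WHERE employees >= ? ORDER BY name",
        (min_employees,),
    )
    return [name for (name,) in rows]


# Unstructured side: free text standing in for documents and emails.
documents = {
    "notes.txt": "Met Acme today, roughly 120 staff. Follow up next week.",
    "email.eml": "Globex has about 4,500 employees according to their CFO.",
    "todo.md": "Book flights. Ask Initech how big the team is.",
}


def files_mentioning(term):
    """No schema to rely on: the full content of every file has to be scanned."""
    needle = term.lower()
    return sorted(name for name, text in documents.items() if needle in text.lower())
```

Calling `large_leads(100)` uses the index on `employees`, while `files_mentioning("globex")` walks every document. With real Word, Excel, video, and source files, each format would also need its own parser before any text could be scanned at all.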
Kris: And I'd probably go further: the average worker is actually generating a lot more unstructured data. That office worker is generating a lot more Word documents and emails than perhaps they're generating Salesforce entries or working directly in a database.
That's not to say that they're not generating lots of structured data, but generally, I would think that I spend a lot of my time in unstructured data sources: creating Word documents, creating contracts, creating other reviews and, obviously, responding to RFPs, talking to partners, these sorts of things. This is actually what the bulk of an organization, especially a commercial organization, is generating. Think of banks, think of utilities, think of government organizations. This is the type of data that they're spending a lot more of their time on. So, with all of that, Joe, why is it so difficult to manage?
You pointed out it's difficult to search, but, you know, why is it so difficult to manage unstructured data? Like, you and I have both now got 50 plus years in, you know, this space. I dunno why I said that out loud. Why is it so difficult to manage?
Joe: And you were mentioning that it's most of what we generate. There's a bunch of different research on this. It's probably, for most orgs, somewhere around 70 to 90% of the data that we generate, right? And just think about it: right now, as we record this podcast, we are creating unstructured data in the video, we're creating unstructured data in the audio recording.
We're just creating a big lump of unstructured data out there, right? The emails that we send back and forth, the Teams messages that we send back and forth, this is all unstructured data. Very rarely do I see anyone opening up Airtable and creating a structured database when you're just a normal, everyday worker. And even things that look like structured data, like Excel, are not. You could put anything you want in Excel, right?
You could drop a link with some binary data and put a whole bunch of columns right beside it. I'll keep trying not to use the word structure again, but there's no order to it really. And so, when you try to wrap your mind around all this, not only is it most of the data that's out there, but it's tons of different formats.
And these different formats have their own standards. A PDF document is not a Word document, but we very well could be putting sensitive and regulated information in either of those things, and it's created in just dozens of tools, if not hundreds of tools. If you look at some of the large enterprises out there, they're tracking a thousand plus different SaaS applications, and goodness knows how many desktop applications.
And these all have their own file formats and their own different ways of outputting the information. I was trying to think of a good example as you were saying that, but think of images alone. How many different ways can we create images? Right. I could go and open up Photoshop, or I could jump over to Midjourney, or, today in the news, everyone's talking about how photorealistic OpenAI's ability to create images is. All of a sudden, bam, I've got some new unstructured data there, and I could put whatever I want in there.
The other thing that makes unstructured data hard is it doesn't always live in the cloud. Going back to my Salesforce example, I can pull up the reports and the logs and the audit trails from Salesforce to see what's going on there. But I can create a Word document on my desktop that never gets uploaded to anywhere that most enterprises are keeping track of.
And so the example you could think of there is I could be doing my taxes, right? If I'm doing my taxes, I'm putting my PII in there, I'm putting sensitive financial data in there, and it's on my desktop. It's not being monitored. It's just a much larger net and much more opportunity to create kind of a security, vulnerability, privacy, whatever you wanna say, footprint with unstructured data.
Just more opportunity in general to do those sorts of things.
Kris: We beat this drum very hard, certainly for me, through the late nineties and early two thousands. Some of the different terms I heard: the tsunami of data, the data explosion, the rules of Moore's Law, everything doubling every 18 months.
And again, I've used this example even on this pod, where it's like 10 years ago I was using an iPhone that might have had four or 16 gig of storage. And now I'm walking around with a terabyte in my pocket and it's full. And the problem with that is that, you know, I'm obviously now paying for the terabyte data plan.
I'm paying for the iCloud plan. I'm going to the next version of iPhone. I'm having to go, well, I have to buy the most expensive one, and I'm hoping that they go to two terabytes, which will get even more expensive again, just because I'm creating so much more video, I'm taking many more photos of mostly my bulldog, but, you know, clearly every now and then my kids as well.
But even just when I travel, it's the way in which we interact now, the simplicity of what we want to be able to do in our day-to-day life. And I use travel as a really good example 'cos I do a lot of it with work, but all of my expenses, that coffee I grab in the morning, the train ticket to go out and see the customer, the parking that I'll have at the airport.
All of these generate receipts either electronically or in paper form. And in the paper form, I then take the photo of them and then I put them into that cloud application, as you said, and I fill in the data for it. Thankfully, in a lot of cases now, those paper forms can be read by AI, which fills in a lot of the work for me, so I'm just taking the photo. But I'm forgetting that that's a one, two, three megabyte photo that I'm then pushing up, and that is being managed and stored and has references to where I was and when.
Those little things, even just location data, for example, all have their own level of risk, but just because of the way I work and the way in which I interact, I'm generating so much more than I ever did before. I remember a time when I carried a manila folder in my backpack for the receipts for that trip, so that when I got back to the office, I could then staple them to the expense form,
fill it out and sign it just to get my expenses paid, for example. And we should never forget in the unstructured world that paper is included in that. But, you know, we obviously have that piece where we're trying to become more paperless. So, those volumes are huge. And I'd be interested, you know, do you have any stats there around volumes of data?
What's sort of happening there?
Joe: So, I know quite well that 70 to 90% stat, but it's the overwhelming amount of data that's being created at this point. And you were mentioning a whole lot about the effort that's gone in, how you've been preaching for years about the tsunami of data. There's a lot of good tools out there for structured data, for sorting through it, for organizing it, for categorizing it.
Like, there's a lot of stuff out there, but right now it's still a little bit Wild West. Forget individuals; for most enterprises, it's still the Wild West when it comes to getting a hold of that structured data, excuse me, unstructured data.
Kris: And I guess, you know, that leads me to the next question. So, we've just said it's difficult to manage.
We've just said it's two thirds to three quarters to almost all of the actual data that you've got inside an organization. We've talked a little bit about how it's got benefits too, being able to take insight from that information. But what are the risks?
Joe: Oftentimes, I try to fall on the side of how do we add governance to enable business cases, but I think we have to be serious that you can't enable if you don't control your risk.
So, let's think about the situation we were in a moment ago. Like, we've got a ton of data, most of the data that we have as an enterprise, being created by everyone. So, you have very little control over who creates it. You have very little control over where it's stored. You have very little control over what format it's in.
And so, we've just got this almost apocalyptic scenario of an overload of data that's out there. That makes it difficult to do things, because you've gotta be able to discover what data you have before you can add any sort of privacy or security controls on top of it. If you can't discover it, I mean, you're certainly not gonna classify it.
You're not gonna figure out whether you have regulated data inside of it if you can't classify it. In fact, you can't govern it. So, how many conversations like this are going on in the modern corporate world today? I was debriefing with a telecom security professional the other day who was part of a conversation that said, "Let's go get CMMC certified."
And for those who don't know CMMC, it's the new framework, if you wanna sell to the US Department of Defense, effectively to control sensitive unclassified data. So, their conversation had a whole bunch of people on there from legal and compliance and security, and it went: all right, we know exactly what the return is, or have a good estimate of what the return is, if we get CMMC certified. Let's figure out the cost to get CMMC certified. And so, everyone's like, all right, let's do that. How do we do that? First, we need to put those controls on that sensitive unclassified data. And the immediate question from everyone was, okay, what is sensitive unclassified data?
So, after a very long conversation with a legal team, which defined it as strictly as possible, making it as difficult as possible to implement, everyone ended the meeting by saying, "Super. I have no clue where this data is or how to get control of it." So, here you've got literally what could be a multi-billion dollar business case that is blocked because you just have no way to govern your CMMC regulated data.
Another side effect of this, by the way, is when you can't govern your data, is that you hold onto a lot more of it. Like how many folks we run across that are just hoarding unstructured data, quote unquote, just in case. And I think we all know that when you hoard data, you're increasing your overall security footprint and having impacts on that front.
Kris: It's born of, and I do harp on about this myself, the storage-is-cheap mentality that we've had for many, many years. But the issue is that, with the sheer volume of what we're doing these days in those different formats that are very, very large, it's not the case anymore. The day of the free lunch, if you will, on that storage side is gone.
Organizations that we bump into are constantly looking at, well, hey, how do we actually save money across all of the storage that we've got here? And again, everyone's afraid to pull the trigger, if you will. You know, they don't want to be the one to delete something when they're not sure what it is, but they know they need to save the money that's related to that ever-growing cost on the bottom line that is just storage.
Joe: And then as a result, you know, it's interesting, you almost see a backlash to some of these controls that people are trying to put on the different ways of creating unstructured data. In fact, I'm sure everyone's following the news here in the US, but I categorize Signalgate, which is hot off the presses, as the perfect example of how easy it is to expose sensitive information in unstructured formats.
So, I'm sure you've been reading up on this, Kris. We had a Signal chat with the United States Vice President, the Defense Secretary, the Secretary of State, and a bunch of other folks sitting there discussing sensitive war plans. Very, very regulated information, and you would expect there to be some level of controls in place, but they kind of usurped those controls
to go out there and find a way where they could have these conversations and generate this sensitive content without the appropriate governance happening around it.
Kris: We've all done it, Joe. Like, I mean, the number of times you've grabbed a Messenger message and gone: you're chatting with one person, you're chatting with another person, and then you're like, yeah, the bill's gonna be $1.42.
And the person's like, what? And you're like, that's for someone else, soz. Again, you've probably been in that group where someone's like, I don't think I belong here. Interestingly, for me, very early in the piece I got my Gmail address, so I've got my name and, for whatever reason, it's a bit of a spam trap: Kris Brown.
So, I get every Kris Brown on the planet's servicing of their car, the real estate agent chasing them up for this. So, I end up on all sorts of groups. My current one: I'm regularly a part of a chat for a running club in Jamaica.
Things that I will definitely not be doing: running, and probably visiting Jamaica anytime soon. Unfortunately. I would love to, but we've seen these things, right? It's very easy for it to happen. You'd think they'd be a little bit more, what's the word I want to use, sensitive almost to the fact that they're talking about sensitive topics, but yeah, they've deliberately gone there, because they could. I'm like, why are they able to do that?
Well, they're humans. They've got phones, they're connected to the internet. The other humans have got phones. They're connected to the internet, and they introduced the reporter. And today wasn't gonna be so newsy, but that was their control. It is a great example, as you said, of how easy it is for that stuff to slip through and create the noise that it has.
Joe: You hit the nail on the head there, right? Humans are the reason unstructured data is difficult to manage. Like if we summarize everything we've said, it's humans, it's us.
Kris: So, hit on that too, though. We do talk about regulation and legislation here quite a lot as it relates to the information management world.
The other risks here are penalties, fines, a good old-fashioned slap on the wrist, depending on the jurisdiction you're in. Ultimately there's a lot of risk there. Just in that space, if you start to look globally, there are a ton of regulations. Every country, every jurisdiction has some form of public records act for the government side, and we're seeing growing privacy legislation here as well across the globe.
At some count it was like 161 nations have some form of federal or growing federal legislation. And you know, obviously in the States there's an awful lot of state-based legislation that's growing, and those risks are also there because it is targeting that data. This is where that information lives. As you say, in the structured systems, it's probably relatively easy to understand that the field called name has names in it, and the field called date of birth has dates of birth in it.
And if you don't wanna hang onto it, you know where to go looking for it. It's very easy. But in a recent study that we did here at RecordPoint, a little bit before your time, we had a bunch of our customers run a trial on the privacy product, and we found that just 1% of all of the stuff that ran through the platform in any given month had PCI, so, credit card information, in it.
Doesn't sound like a lot. But the platform does hundreds of millions of objects on a regular basis. 1% of those for just one customer was 20 to 30,000 credit card numbers living in unstructured data from SharePoint, from file shares, from Teams, from interactions with customers who trust the entity that they're dealing with.
So, all of a sudden, that trust is betrayed because it's sitting in places where everybody has access to it. Now, there was probably no intent from anybody in those organizations to do anything bad, but the issue now is that if I was to scrape an entire SharePoint environment, my expectation would be that 1% of it has credit card information in it.
So, if I'm a bad actor, that risk exists there. And I tend to feel that from a security perspective, and you'll have a better idea than I here, this is more of a when than an if situation: when are you going to get breached, as opposed to if you are going to get breached.
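As a rough illustration of how credit card numbers can be spotted inside free text, here is a minimal Python sketch. It is not RecordPoint's detection method, which the episode does not describe; it assumes a common, simple approach of finding digit runs with a regular expression and then filtering them with the Luhn checksum that payment card numbers satisfy. The function names are hypothetical.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single
# spaces or hyphens. This pattern is an assumption chosen for illustration.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return the digit strings in `text` that look like valid card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

For example, `find_card_numbers("Please charge 4111-1111-1111-1111 for the invoice.")` picks up the widely used test number `4111111111111111`. A real classifier would need much more than this, such as text extraction from each file format, context checks to cut false positives, and handling for scanned images.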
Joe: Yeah, and on your SharePoint example: in Seattle, we have a big Boeing presence here. I mean, was it just a year and a half ago, somebody was scraping a SharePoint site from Boeing, and from there they pulled employee PII, they pulled defense projects, they pulled compliance reports on export rules. When they underwent their audit on what they should have done better,
it was, hey, you should actually classify this stuff correctly, so you know what you have in there, and you need to put access controls on it. And to kinda follow up on something else you said, it's not always unintentional, folks doing this, either. I have those examples from AT&T, T-Mobile, Capital One, Wells Fargo, of call center employees stealing data from their customers, financial information, social security numbers or national IDs, and then in some cases literally selling it to the mafia.
And so, you've gotta stay on top of these things, right?
Kris: Let's pivot a little bit. It wouldn't be a FILED podcast if we didn't say the magic words, AI or artificial intelligence. So, let's go the AI hook. Generative AI loves unstructured data. This is what our ChatGPTs and all of the other ones, the Geminis, et cetera, are all driving on and training on.
Training an LLM on a particular data set is a great way to turn a volume of files into insights. So, I think we are both big believers in that. So, should everybody be doing this? The irony here is we need to ensure that we've got solid AI governance in place, so the tools used for understanding the unstructured data aren't being trained on that sensitive and confidential stuff.
I'm sort of working with Microsoft at the moment around a number of Copilot pieces, and I think you recently, Joe, did a session where you were talking about how a number of Copilot or other AI and LLM implementations are being blocked on the back of that worry of releasing sensitive, confidential information.
So, so talk to us a little bit about that and, and how unstructured data now is linked to AI and LLMs.
Joe: Yeah, I mean, you're right. AI is the hot topic of the day. They should give you a soundboard and you have a, a crazy noise, like the morning shock jocks every time you say AI.
Kris: Let me pivot there 'cos, actually, that'd be fun. Maybe that can be the Christmas party edition, where we roll through the highlights of every time Kris says AI. Or do that for the producers, the edit: how many different ways can Kris say AI? I listened to a session from someone yesterday and they were saying AI today is like Amazon sort of 30 years ago.
It's like, everybody was saying, what do you mean someone's gonna sell books online and that's gonna be, you know, bigger than Walmart, bigger than some of these other organizations? And people really couldn't quite grasp what was truly going to be possible to do with these.
And interestingly, Jeff Bezos, you know, whether you like him or hate him, had a bit of a vision there. It was less about the books, and it was more about how to work out where in the country, you know, obviously the United States, you should be putting warehouses, because books are all roughly the same size.
So, shipping a book from one place to another is a finite problem, and so is being able to work out the quickest and best way to create warehouses across those locations. That data that he created in those first few years, that's what Amazon was all about: building out the infrastructure.
You know, AWS came from the cloud infrastructure. He rode that wave off the back of studying the books. And so, the interesting correlation I'm trying to make here is that what we think is possible with AI, and what it's capable of, is almost like looking at what we thought Amazon was capable of 20-something years ago.
We are not really sure of the underlying things that this is gonna be capable of. It is gonna be huge for us over the next 20, 30 years. We do harp on about it a little bit here on FILED, but it's really, really interesting. I found that a really, really poignant piece. But we've still got that problem: it relies on unstructured data.
We just said unstructured data has a bunch of risk. It's huge volume. And what does that mean for the listener, Joe?
Joe: I think you framed it well. That's literally the trillion dollar problem to solve, right? Every company wants to put AI into their organization, even if right now folks are thinking super low level, just rolling out Copilot into their organizations.
But if they want to roll this out, they wanna do it mostly with unstructured data. They want to look through their meeting notes and their emails to help them make better decisions. That was the whole basis for Mills Chat, which General Mills has now rolled out to pretty much everywhere in the world, almost a hundred countries they pushed it out to.
We keep running across situations with architecture documents, where they want to use them to make better railroad engineering or bridge engineering decisions. But to do these things, you've gotta be able to enable that unstructured data. One thing that we know from some market research that we've done at RecordPoint is that somewhere around 80 to 90% of those companies that want to use AI, which is effectively everyone, are frozen. They can't do anything.
And the reason they're frozen is because they don't have data governance in place, and they don't have security governance in place. And so, what's happening is organization after organization is rolling out what I heard nicely termed the other day the AI firewall, which is basically a policy that says: do not use ChatGPT to do your job today.
And we also know from research that effectively everyone's ignoring that. They're just firing up their personal ChatGPT account. I saw some data from Cyberhaven the other day: somewhere around 75 to 90% of ChatGPT, Copilot and Gemini use in the workplace right now is on a personal account, and I guarantee you barely anyone has gone in there and checked that little flag that says, please do not train on my data. So, now we're leaking tons of sensitive data out there, and we have numbers around this. Again, from Cyberhaven, and I've been following their work closely: around 23% of all queries that go out into personal ChatGPT accounts, for example, have sensitive, confidential or regulated data in them. And I posted an article to LinkedIn the other day. I think this is realistically the largest leak of sensitive data in human history. Everyone is just going out there and pushing their sensitive data out there, whether it's source code or, you know, it's guaranteed to be PII.
And you probably see all kinds of other stuff out there, but in the meantime, these fears and these worries are why, as you were saying about Microsoft, I think it's around three to 4% of folks that are trialling Copilot in the enterprise that have gotten out of the pilot phase. Everyone else is just stuck trying to enable it, because step one, I turn on Copilot.
Step two, I wanna put data in there. Step three, everyone freaks out, 'cause now you're gonna put my sensitive IP and things like that out into an AI. There's a reason why industry organizations like the IAPP have listed this as literally the number one problem that we have to solve for the year 2025. I think we all know, as the magic of AI wears off
and we start to get down to the reality of doing business with it, that this isn't gonna be solved this year. It's gonna take us many years to mature this, to where we can actually start to get those business cases of saving bunches of money and making bunches of money off of our AI business plans.
Kris: Like anything, legislation and regulation are gonna be slow to follow it.
It isn't gonna move as quickly as the market does. So, we can't wait for those things to come along and tell us vendors, us customers, us organizations who have got lots of data, what should be done. So, let's play the advice game.
Joe's got the crystal ball. What can be done? You need the right tools and the right data governance in place, but what can be done?
Joe: Yeah. Well, we've talked about a lot of this. I mean, step one, step one is adding the governance on your data. But let me step up. Kind of a lever, a level higher than that.
There are frameworks out there that are in place to recommend how to get control of your AI, and they kind of break into three areas. I'm gonna summarize what's literally hundreds of controls down to three areas. The first one is know what AIs you're using. So, you probably know that you're doing a trial of Copilot, for example, in your enterprise.
But I was chatting with some Gartner analysts the other day, and they pointed out that in the next two to three years, 90% plus of your enterprise SaaS apps are gonna have some form of AI built into them. And so, whether you think you're using AI or not today, you're using AI. So that's knowing what AI you're using. The second step is know what your AIs are doing.
So, you actually have to look into it, whether it's custom models that you're building internally or the external tools. I mean, we receive these inquiries quite often now: how are you using our data? How is it being trained, or are you training on the data within your systems? But this is also gonna introduce some new concepts to a lot of enterprises, like the concept of bias and fairness in knowing what your AI does, and these are often very specific terms to industries.
Once you know which AI you're using and you know what your AI is doing, then we get into the exciting world of knowing what data is feeding into AI, and this is actually the hardest part. Like, I can do an inventory of AI across any size organization. I can pull the SaaS systems that are being used within an enterprise.
Like this is a finite amount of information to collect. Getting the data that's in there, that's much more difficult because you want to apply those privacy controls. You wanna apply those security controls down to the user level. You wanna do it against your unstructured data. If I was an enterprise tackling this, I would tackle that hard problem first, because that's where the inevitable leaks from the quote unquote shadow AI are happening.
Get governance over your unstructured data and then you can feed it out into your various different AI systems. So, what does getting governance mean? Just to kind of summarize at a high level, and we've talked about this a few times on the podcast, Kris, it's get all your data into one view so you know everything you have, then go through the time of classifying your data.
Is this a legal document? Is this an HR document? I mean, there's various different ways you can classify it, but know what type of data it is, and then get into the tricky business of knowing if you have sensitive data in there. We're talking about, you know, you detected all those credit cards for that one customer: know where you have credit card data out there.
And then afterwards, once you have data governance in place, you can implement your retention and your data lifecycle management, your controls, to minimize your security footprint. That's basic data governance. And then you can extend that to, hey, we know what's inside of our data. We know what data we don't want to go into AI.
We know who we don't want to have access to that data. Magic, you kind of now have a concept of AI governance out there. And really, you're talking about the frameworks, that's basically what they're telling you to do, whether it's the EU AI Act, or I know the UN, or some folks are working with, like, the OECD to put together frameworks.
A whole bunch of national governments are trying to regulate this. I personally am a big fan of the NIST AI RMF. I think that's kind of the most holistic structure that's out there. But even if you don't want to get down into the details of learning a new framework and possibly even doing internal and external audits against a new framework,
if you start to take your privacy controls today, your GDPR, your CCPAs, and your security controls, so your PCIs, your CMMCs, or whatever frameworks you're complying with, if you take those and just extend them into your AI environments, once you have control of your data there, you know what AI you have, and you know what they're doing,
you're more advanced than, like, I'm talking to people all the time, you're more advanced than 99% of folks out there. So, don't always reach for the stars. Sometimes just do what you're already good at and extend it out into this new AI environment.
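[Editor's illustration: The governance steps Joe outlines above (classify each document, detect sensitive data such as credit card numbers, then decide what may feed into an AI) could be sketched roughly as follows. This is a minimal sketch, not anything described in the episode: the `Document` type, the category labels, the `allowed_for_ai` function, and the choice of a regular expression plus a Luhn checksum for card detection are all assumptions made for illustration.]

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """Flag text containing a digit run that looks like a payment card number."""
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False


@dataclass
class Document:
    # Hypothetical record for one item in the unified data inventory.
    name: str
    category: str  # assumed labels, e.g. "legal", "hr", "general"
    text: str


def allowed_for_ai(doc: Document,
                   blocked_categories: frozenset = frozenset({"hr", "legal"})) -> bool:
    """Allow a document into an AI system only if its category is not blocked
    and no card-like number is detected in its text."""
    return (doc.category not in blocked_categories
            and not contains_card_number(doc.text))
```

[In a sketch like this, the classification and sensitive-data checks run before any data is handed to an AI tool, which mirrors the order Joe suggests: get governance over the data first, then feed it out.]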
Kris: And I think it's all great advice. And interestingly, for the last 20-something years, we've sort of been saying the same thing: get your data under control. There's plenty of value there. We've started to look at the storage problem and how you can get a bunch of cost savings around storage by doing good data minimization. You help with regulation around classification and just straight-out data governance, as you've said.
We now have a genuine new use case, which is: if you've got all those things, you've done the right thing, you can actually take more advantage of AI than anyone else, and this is going to become a competitive advantage. So, if you are that records manager, if you are that information governance professional,
if you are advising organizations around security or AI, you should be saying to them: get a good handle on everything that you've got in the business. This will lead to faster implementation. It will lead to value more quickly. This is a massive use case for this industry that has not been there previously.
And I think while today's topic might have been a little bit dry, and certainly, you know, Joe and I have tried our very, very best to keep it sort of real and exciting, getting unstructured data under control, you know, should be one of the top priorities for any information governance professional,
especially in the era of AI. Joe, thanks for joining me today. On behalf of Anthony Woodward and myself, Kris Brown, we're looking forward to hearing from you all and your comments as we trip around; it's conference season again. Thanks for listening, and we'll see you next time on FILED.