Four fifths of organizations have been burned by employees using Gen AI, with the leaking of sensitive data almost as common as false or inaccurate results, research by Komprise has found.
And while companies are racing to adopt AI, they are also playing catch-up when it comes to storing and managing their data, the data management vendor’s AI, Data and Enterprise Risk study found.
Over two-thirds said infrastructure was a top priority when it came to supporting AI initiatives, with 9 percent rating it the most important factor after cybersecurity.
Over a third identified increasing storage capacity as their top AI-related storage investment, while 37 percent pointed to data management for AI, on the basis that AI is only useful when it incorporates the organization’s own data.
And just under a third said acquiring “performant storage to work with GPUs” was their top priority. Overall, 46 percent said all three paths were top priorities.
Just finding and moving the right unstructured data was a key challenge for 55 percent of companies, with a lack of visibility across data and the absence of “easy ways to classify and segment data” also key concerns.
And a third of respondents said they were having “internal disagreement on how to approach data management and governance for AI.”
Krishna Subramanian, co-founder of Komprise, said companies were starting to investigate tooling to enforce strong AI governance and compliance. The alternative was company data leaking and becoming “part of the public LLM.”
Let’s get tactical
“A top tactic is classifying sensitive data and using workflow automation to prevent its improper use with AI (73 percent). More than half (55 percent) are also instituting policies and training their workforces.”
This would seem obvious, she said, “but it’s encouraging to see that it’s actually taking place.”
And some are restricting the use of public Gen AI tools as they roll out their own internal tools.
Customers were trying to get better visibility into their data so they could manage it, Subramanian said, “and are looking into tagging to classify and segment data as well as automation to help feed the right data to AI and monitor the outcomes.”
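To make the tagging idea concrete: the kind of workflow Subramanian describes can be sketched as a classifier that tags documents containing sensitive patterns, with an automation gate that only lets untagged data through to AI tools. This is a minimal illustration, not Komprise's product; the pattern names and functions are hypothetical, and real deployments use far richer classification than two regexes.

```python
import re

# Illustrative sensitive-data patterns only (hypothetical examples);
# production classifiers cover many more categories and formats.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def tag_document(text: str) -> set:
    """Return the set of sensitive-data tags found in a document."""
    return {tag for tag, pat in SENSITIVE_PATTERNS.items() if pat.search(text)}

def allowed_for_ai(text: str) -> bool:
    """Gate: only documents with no sensitive tags may be fed to AI tools."""
    return not tag_document(text)

docs = [
    "Quarterly revenue grew 12 percent.",
    "Contact jane.doe@example.com, SSN 123-45-6789.",
]
safe_for_ai = [d for d in docs if allowed_for_ai(d)]
```

The same tags can drive the segmentation side of the workflow, routing tagged data to restricted storage while untagged data is curated for AI use.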
She said the reality was that few companies would be training their own models at any great scale. That means less need for GPUs and GPU-accessible storage, but it does mean they will have to get to grips with unstructured data.
“Rather, your focus is on getting the right corporate data to pre-trained models so they can deliver optimal business outcomes. Curating data for AI is emerging as the next phase and core investment in AI.”
“As the inferencing market begins to take off, the focus will be on helping enterprises use AI effectively with their own data,” Subramanian said. “After all, models have already been trained on all the publicly available data.”