How do storage suppliers find a way to be relevant in the ChatGPT world? Generative AI using large language models (LLMs) has received an enormous boost with ChatGPT and its successors. IT industry execs including Nvidia's Jensen Huang, Alphabet's Sundar Pichai and Microsoft's Satya Nadella are all saying this is an epoch-defining moment in AI development and tech.
Several storage suppliers are directly involved in delivering storage capacity to the GPU servers used in generative AI processing, including Dell, DDN, HPE, IBM, NetApp, Panasas, Pure Storage, VAST Data and Weka. Others are fielding data sets for use by the models such as Databricks, Snowflake and their ilk. Building vector databases? Think Pinecone.
Every storage supplier faces the same problem here. How do you define and promote your product/service relevance in this generative AI tidal wave?
They may all see their markets being affected in some way by AI and could have to add AI interfaces for their customers or find a way to help their customers use AI resources in some way. Data lifecycle and governance manager Komprise is already walking along this path as we found out through an interview with president and COO Krishna Subramanian.
Blocks & Files: Is there a role for data management/governance in this new world of generative AI?
Krishna Subramanian: Yes there is, and we are calling this SPLOG – Security, Privacy, Lineage, Ownership and Governance.
If you have sensitive or PII (personally identifiable information) data and are using AI on it, how do you protect that proprietary and protected data? You don't know if you are generating results from another company's PII, and your own company's PII and IP could get leaked. There are few controls here: how do you keep your data in your own domain and prevent leakage, so that it doesn't get used by a general LLM?
Data lineage is another way to describe data transparency: if an LLM gets data from general sources, how do you know whether it is verified, gathered with consent, or whether it contains bias or inaccuracies? There is currently no requirement to share source data, and it is very difficult to track data sources.
There is also a data ownership angle. Who does the IP belong to if you use a derivative work, and who is liable if something goes wrong? This requires legal coordination and likely regulation.
From a data governance viewpoint you need to ask: how do you know who did what with which data, so you can ensure compliance and investigate any issues that may arise with your data in an LLM? You need an internal framework for this.
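That "who did what with which data" trail can be made concrete. A minimal sketch, assuming a toy in-memory log (all names here are illustrative, not a Komprise API), of the kind of audit record such a framework would keep when data is shared with an AI tool:

```python
# Hypothetical sketch: a minimal audit record tracking which data was
# shared with which AI tool, by whom. Illustrative only -- not a real
# Komprise or vendor API.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataAccessEvent:
    user: str       # who
    action: str     # what, e.g. "shared_with_llm"
    dataset: str    # which data
    ai_tool: str    # destination, e.g. "chatgpt"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


audit_log: list[DataAccessEvent] = []


def record_access(user: str, action: str, dataset: str, ai_tool: str) -> DataAccessEvent:
    """Append an immutable record of the access to the audit trail."""
    event = DataAccessEvent(user, action, dataset, ai_tool)
    audit_log.append(event)
    return event


# Compliance questions then become simple queries over the log:
record_access("alice", "shared_with_llm", "customer_emails.csv", "chatgpt")
events = [e for e in audit_log if e.dataset == "customer_emails.csv"]
```

In practice such a trail would be append-only and tamper-evident, but even this toy version shows how "investigate any issues" reduces to querying recorded events.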
Blocks & Files: How can Komprise help?
Krishna Subramanian: Our software can get you a solid understanding of your own data wherever it resides. Where’s your sensitive PII and customer data? Can you monitor usage to make sure these data types aren’t inadvertently fed into an AI tool and to prevent a security or privacy breach?
You can tag data from derivative works by the owner, or the individual or department that commissioned the project to help with compliance and tracking.
It can recognize when unintentional data leakage occurs and alert when sensitive or corporate data is shared with LLMs.
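The detection side of this can be illustrated with a simple pattern-based guard. A minimal sketch, assuming regex matching on outbound prompts (the patterns and function names are hypothetical; a real product would use much richer classifiers than regexes):

```python
# Hypothetical sketch of pattern-based leakage detection: scan text
# bound for an external LLM and flag likely PII before it leaves the
# domain. Patterns and names are illustrative only.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def scan_outbound(text: str) -> list[str]:
    """Return the PII categories detected in text destined for an LLM."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)]


def guard_prompt(text: str) -> str:
    """Block the prompt (and, in a real system, raise an alert) on a hit."""
    hits = scan_outbound(text)
    if hits:
        raise PermissionError(f"Blocked: prompt contains {hits}")
    return text
```

For example, `scan_outbound("contact bob@example.com")` flags the email category, while a clean prompt passes through `guard_prompt` unchanged.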
Blocks & Files: Could you use a generative AI to monitor another generative AI’s use of data?
Krishna Subramanian: As you know, generative AI is basically good at things like natural language processing that require deep pattern recognition and the ability to “generate” new content based on prior learnings. So, could a generative AI be used to look for patterns in another AI’s output to try and police things? The answer is yes, and we already see examples of this in tools that are now being used to spot if students are cheating using ChatGPT.
However, this assumes that you can see recognizable patterns in what generative AI creates – this is certainly true right now, but as generative AI advances, I think this will become harder and harder to do. In summary, I would say yes, AI can be used to spot another AI's use of data as long as you can characterize what you are trying to police in some recognizable pattern (e.g. the use of PII, or certain proprietary IP patterns).
The power and the danger with generative AI lies in the fact that it is not deterministic – just as you cannot always control a toddler’s behavior, you cannot deterministically predict the output of generative AI, making it hard to police.
Blocks & Files: Could Komprise add a generative AI interface to its capabilities?
Krishna Subramanian: Komprise can add a conversational element to our interface, powered by generative AI, that responds to prompts. Its underlying logic would be an analytics-driven data management framework that leverages both generative AI and more deterministic machine learning and predictive automation techniques to address data governance, security and data management.
So, generative AI adds a natural-language interface to the analytics-driven data management that Komprise already offers, but in such a way that the logic is still deterministic and verifiable.
Blocks & Files: Could a customer, for example, ask a Komprise generative AI: "What PII data do I have in my data estate?" Or "Is any of it unprotected?" And "Apply protection policy X to all unprotected PII data."
Krishna Subramanian: Komprise already answers these questions, except that you would phrase these currently as queries in Komprise as opposed to natural language prompts.
Komprise analyzes all the data it is pointed at, creates a Global File Index and classifies the data so you can create such queries as well as take actions using Komprise Deep Analytics Actions by setting policies. Adding a generative AI chatbot will create the ability to interface with natural language to the functionality we already provide.
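The architecture Subramanian describes – a conversational front end over deterministic, verifiable query logic – can be sketched in miniature. In this hedged illustration the LLM's only job is to translate a prompt into a structured query; the query itself runs through an ordinary deterministic filter over a toy file index (all function and field names are hypothetical, not the Komprise Deep Analytics API):

```python
# Hypothetical sketch: a conversational layer on top of a deterministic
# query engine. translate_prompt() stands in for an LLM call that emits
# a structured query; run_query() is the verifiable, deterministic part.
def translate_prompt(prompt: str) -> dict:
    """Stand-in for an LLM that turns natural language into a query."""
    p = prompt.lower()
    query = {"classification": None, "protected": None}
    if "pii" in p:
        query["classification"] = "pii"
    if "unprotected" in p or "not protected" in p:
        query["protected"] = False
    return query


def run_query(index: list[dict], query: dict) -> list[dict]:
    """Deterministic filter over a (toy) global file index."""
    results = index
    if query["classification"] is not None:
        results = [f for f in results
                   if f["classification"] == query["classification"]]
    if query["protected"] is not None:
        results = [f for f in results
                   if f["protected"] == query["protected"]]
    return results


index = [
    {"path": "/hr/salaries.xlsx", "classification": "pii", "protected": False},
    {"path": "/eng/design.doc", "classification": "internal", "protected": True},
]
hits = run_query(index, translate_prompt("What unprotected PII data do I have?"))
```

Because the structured query is inspectable before execution, the answer stays deterministic and auditable even though the interface is conversational – which is the design point being made above.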
In addition, a key role we can provide through data management is to help organizations ensure the security and privacy of their data as their employees interface with AI applications – by answering questions like, “Am I exposing PII data to an AI application”, or flagging when employees may be inadvertently sharing corporate data and creating data leakage with an AI application. This goes beyond Komprise using AI in its own interface to Komprise helping organizations leverage generative AI safely while protecting their data security and privacy.
Blocks & Files: Could a Komprise generative AI be made proactive? For example, ask a Komprise generative AI “What unprotected PII data do I have in my data estate?” The response is: “New data set Y contains a list of all unprotected PII data. Shall I protect it?” What do you think?
Krishna Subramanian: Absolutely – "adaptive automation" is the broader category you are referring to, and this has always been our goal. Generative AI is simply one tool for achieving it. More broadly: how can we continually add to our analytics to proactively learn more from data? Can we anticipate what could happen and protect against it? Can we be smarter in how we automate? These are all areas where we see data management evolving.