CTERA launches Data Intelligence to link file data to AI models

CTERA says its new Data Intelligence offering supports retrieval-augmented generation (RAG) by linking its cloud file services data to customer-selected GenAI models and giving them real-time private context to help prevent inadequate responses.

Oded Nagel, CTERA

GenAI chatbots can give poor responses to user requests when they rely solely on their generic training data sets. Giving them access to an organization’s proprietary data is like giving a hotel’s concierge staff access to its guest register: they know who is staying in the hotel and what their preferences are, which makes for much better service. CTERA thinks that, while AI services offer some methods for uploading files, these integrations fall short in their ability to handle live enterprise file storage, impose extensive network and compute overhead, and expose organizations to sensitive data leakage.

CEO Oded Nagel stated: “With CTERA Data Intelligence, we combine our expertise in secure file services, connecting distributed object and file data sources with automated AI data processing. We’re enabling AI that truly understands your data – delivering relevant, up-to-date insights grounded in the unique context of your enterprise, all while ensuring the highest levels of data privacy and security.”

CTERA Data Intelligence capabilities include:

  • AI that knows your data: A semantic RAG engine that uses the CTERA Notification API to constantly update its knowledge with live data sources.
  • Identity-based enforcement: Restricts AI data visibility based on the granular file-level access permissions (ACLs) of the currently logged-in user.
  • AI experts: Customizable virtual assistants with predefined personas and domain-specific knowledge scopes.
  • Fully private: On-premises option for organizations that wish to keep their data and LLM 100 percent private.
  • Distributed ingestion: In-cloud and edge data processing using the CTERA Direct protocol, eliminating costly egress fees and latency impact.
  • OpenAI and Microsoft Copilot integration: Agentic extensions for popular AI services with SSO authentication.
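
To make the identity-based enforcement idea concrete, here is a minimal Python sketch of permission-aware retrieval. It is an illustration only, not CTERA's implementation: the `Chunk` and `User` types and the `visible_chunks` function are hypothetical names, and real ACLs are richer than a flat set of principals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved text fragment plus who may read its source file (assumed model)."""
    path: str
    text: str
    allowed: frozenset  # user or group names with read access


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset = frozenset()


def visible_chunks(user, candidates):
    """Keep only chunks whose source file the logged-in user may read."""
    principals = {user.name} | user.groups
    return [c for c in candidates if principals & c.allowed]


candidates = [
    Chunk("/hr/salaries.xlsx", "salary bands", frozenset({"hr"})),
    Chunk("/eng/design.pdf", "design notes", frozenset({"eng", "alice"})),
]
alice = User("alice", frozenset({"eng"}))
permitted = visible_chunks(alice, candidates)
```

Filtering before the text reaches the model's context means a user cannot get an answer drawn from a file they could not open directly.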
Aron Brand, CTERA

We asked CTERA CTO Aron Brand some questions to find out more.

Blocks & Files: How is ingested data vectorized? Which vectorizing engine is used?

Aron Brand: The vectorizing engine is configurable. Customers have a choice between public embedding models like OpenAI’s and private Ollama-based embedding models of their choice.
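
A configurable embedding engine of this kind can be pictured as a registry of interchangeable backends chosen by a config key. The Python sketch below is hypothetical: the backend names, the `embedding_backend` key, and the toy hashing embedder are stand-ins, and no real OpenAI or Ollama call is made.

```python
import hashlib
from typing import Callable, Dict, List

Embedder = Callable[[str], List[float]]


def toy_hash_embedder(dim: int) -> Embedder:
    """Stand-in for a real embedding model: hashed bag-of-words counts."""
    def embed(text: str) -> List[float]:
        vec = [0.0] * dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
        return vec
    return embed


# Hypothetical registry: in practice each entry would wrap a public or private model.
EMBEDDERS: Dict[str, Embedder] = {
    "public-stub": toy_hash_embedder(64),
    "private-stub": toy_hash_embedder(32),
}


def get_embedder(config: dict) -> Embedder:
    """Select the embedding backend named in the configuration."""
    name = config["embedding_backend"]
    try:
        return EMBEDDERS[name]
    except KeyError:
        raise ValueError(f"unknown embedding backend: {name!r}") from None


vector = get_embedder({"embedding_backend": "public-stub"})("quarterly sales report")
```

Because vectors from different models are not comparable, switching backends in a setup like this would normally mean re-embedding the existing content.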

Blocks & Files: How does the system recognize incoming data and trigger the vector engine?

Aron Brand: The CTERA Notification Service, which is part of the CTERA SDK, provides an API for microservices to subscribe to file events that meet certain filter criteria. CTERA Data Intelligence uses this API to trigger ingestion when needed, according to a predefined policy.
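
The subscribe-with-filter pattern Brand describes can be sketched in a few lines of Python. This is an in-process illustration, not the CTERA SDK: `EventBus`, `FileEvent`, and `ingestion_filter` are hypothetical names, and the policy shown (created or modified PDF files) is an assumed example.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class FileEvent:
    kind: str  # assumed values: "created", "modified", "deleted"
    path: str


class EventBus:
    """Tiny in-memory publish/subscribe hub standing in for a notification service."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, predicate, handler):
        self._subscriptions.append((predicate, handler))

    def publish(self, event):
        # Deliver the event only to subscribers whose filter accepts it.
        for predicate, handler in self._subscriptions:
            if predicate(event):
                handler(event)


def ingestion_filter(event):
    """Assumed policy: ingest new or changed PDF files only."""
    return event.kind in {"created", "modified"} and fnmatch(event.path, "*.pdf")


ingest_queue = []
bus = EventBus()
bus.subscribe(ingestion_filter, ingest_queue.append)

bus.publish(FileEvent("created", "/projects/spec.pdf"))
bus.publish(FileEvent("deleted", "/projects/old.pdf"))
bus.publish(FileEvent("modified", "/projects/notes.txt"))
```

Only the first event passes the filter, so only that file would be queued for vectorization.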

Blocks & Files: How are the vector embeddings stored? Where is the index of vector embeddings (used for semantic search) stored, i.e. in which public cloud repositories?

Aron Brand: The embedding vectors and index are stored in a database that is part of the offering. Customers have the option to deploy it on-prem or in-cloud based on their security and performance needs.
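
For readers unfamiliar with how an embedding index serves semantic search, the following Python sketch ranks stored vectors by cosine similarity to a query vector. It assumes a plain in-memory dictionary as the index; the database in the actual offering is not described in this detail, and production systems typically use approximate nearest-neighbor indexes instead of a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(index, query_vec, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(index.items(), key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical two-dimensional embeddings keyed by document id.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
nearest = top_k(index, [1.0, 0.0], k=2)
```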

Blocks & Files: Does CTERA have a capability for supporting edge location GenAI LLMs?

Aron Brand: Yes. This is a multi-LLM platform, meaning you can configure multiple LLM targets to work simultaneously based on the usage profile. Those LLMs can be either public (like OpenAI), hosted open source (like Groq or Together AI), or fully private (Ollama-based).
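
One way to picture routing by usage profile is a lookup table from profile to LLM target. The Python sketch below is purely illustrative: the profile names, model names, and the choice to fall back to a private target for unknown profiles are all assumptions, not documented product behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMTarget:
    name: str
    deployment: str  # "public", "hosted", or "private"


# Hypothetical routing table mapping usage profiles to LLM targets.
ROUTES = {
    "general": LLMTarget("public-model", "public"),
    "bulk-summarization": LLMTarget("hosted-open-model", "hosted"),
    "confidential": LLMTarget("on-prem-model", "private"),
}


def pick_target(usage_profile, default="confidential"):
    """Choose an LLM target for a usage profile; unknown profiles use the default."""
    return ROUTES.get(usage_profile, ROUTES[default])


target = pick_target("bulk-summarization")
fallback = pick_target("unrecognized-profile")
```

Defaulting unknown profiles to the private target is a conservative design choice in this sketch: a misconfigured request never leaves the organization's own infrastructure.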

Blocks & Files: What kind of unstructured data can be ingested from which source systems?

Aron Brand: Our focus for now is on data from the CTERA global file system. So any files stored in the CTERA Portal can be ingested. We support a range of file formats, including PDF, office files, and media.

Blocks & Files: Is a demo available showing a customer employee interaction with a CTERA RAG system?

Aron Brand: See this video

CTERA Data Intelligence dashboard

Blocks & Files: What about customizable virtual assistants, are these agentic?

Aron Brand: The CTERA Experts are virtual assistants that have a predefined content scope, agent profile, and target LLM. Customers can use them to create domain-specific “experts” that can be used with the built-in web UI or in an agentic way from external AI services such as OpenAI and Copilot.
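
The three properties Brand lists (content scope, agent profile, target LLM) map naturally onto a small configuration record. The Python sketch below is hypothetical: the `Expert` class, its field names, and the use of folder prefixes to express scope are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expert:
    name: str
    content_scope: Tuple[str, ...]  # assumed: folder prefixes the expert may draw on
    agent_profile: str              # persona or system instructions
    target_llm: str                 # which configured LLM answers for this expert

    def in_scope(self, path: str) -> bool:
        """True if the file lies under one of the expert's scope prefixes."""
        return any(path.startswith(prefix) for prefix in self.content_scope)


hr_expert = Expert(
    name="HR policy expert",
    content_scope=("/hr/policies/",),
    agent_profile="Answer only from HR policy documents and cite the file path.",
    target_llm="on-prem-model",
)
```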

Blocks & Files: Tell us more about the OpenAI and Copilot integration.

Aron Brand: CTERA Data Intelligence is accessible from OpenAI and Copilot as an external agent. Customers can create a new GPT or Copilot app that is defined to use the CTERA Data Intelligence API for fetching data from CTERA and embedding it into its context.

On a general note, you might have noticed some analogies to our global file system architecture: Multi-Cloud/Multi-LLM, Edge Filer/Edge Ingestion Data Connectors, file system metadata database/embedding database, public-hybrid-private deployment options, end-to-end file permissions enforcement, etc. This is no coincidence, as the challenges enterprises face when extending AI services to corporate unstructured data are similar to those faced when creating global file systems: having to deal with distributed locations, security compliance, and flexible deployment options.

The first stage of our journey was being able to connect unstructured data under a global file system, as done by the CTERA File Services Platform, which is a prerequisite for enabling AI. Once we solved that, the next phase is moving from regular file system metadata to AI metadata and enabling AI inference workflows and agentic access. This is what CTERA Data Intelligence is all about.

 ***

Camberley Bates, Chief Technology Advisor at The Futurum Group, said: “The success and progress of enterprise AI projects are very much linked to the data quality including classification, security, and data governance requirements … By directly addressing these requirements, CTERA Data Intelligence has the potential to significantly accelerate AI adoption across the enterprise landscape.”