Nasuni aims to become a data intelligence company, using software to analyze and locate automatically indexed and tagged file data.
Jim Liddle, VP for Nasuni’s Access Anywhere Product, told us the business – which currently supplies cloud-based file services, in contrast to on-premises filers – has exceeded a $100 million annual revenue run rate, and stepped over the 750 customer mark. Many of its clients are large and geo-distributed enterprises, he added. It has some 500 staff, runs in the three big public clouds, has a CEO with experience of running listed companies, and a new CRO. What’s next?
Company-wise it looks like an IPO may be on the cards, and in terms of technology Nasuni wants to build on its file metadata base, both penetrating file data and broadening its overall scope.
“We index all of a customer’s data today. We’re working on expanding this to include the content.” That means reading files and indexing the content. How? “Our indexer integrates Apache Solr and we use Apache Tika as the metadata extractor.”
Apache Solr is an open-source full-text search platform built on Apache Lucene. Liddle said Lucene and Tika need to be integrated with a data source file, and given rules to tell them what to detect and extract. The rules can define semantic terms and also words to ignore.
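To make the idea of extraction rules concrete, here is a minimal Python sketch. It is not Nasuni’s indexer and does not call Solr, Lucene or Tika; the `ExtractionRules` class, the `extract` function and the sample project name are all hypothetical, chosen only to show how a rule set might pair semantic terms to detect with words to ignore.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ExtractionRules:
    """Hypothetical rule set: phrases to tag and words to skip."""
    semantic_terms: dict = field(default_factory=dict)  # phrase -> tag
    ignore_words: set = field(default_factory=set)


def extract(text, rules):
    """Return (indexable tokens, tags) for one document's text."""
    lowered = text.lower()
    # Tokenise on letters and digits, dropping any word the rules say to ignore.
    tokens = [t for t in re.findall(r"[a-z0-9]+", lowered)
              if t not in rules.ignore_words]
    # A tag is emitted when its phrase appears anywhere in the text.
    tags = {tag for phrase, tag in rules.semantic_terms.items()
            if phrase.lower() in lowered}
    return tokens, tags


rules = ExtractionRules(
    semantic_terms={"project falcon": "project:falcon"},
    ignore_words={"the", "a", "for", "is"},
)
tokens, tags = extract("The budget for Project Falcon is approved", rules)
```

In a real Solr deployment the ignore list would more likely be handled by the analyzer’s stop-word filtering than by application code.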
“We use graph technology to build our content index.” Storage admins can create their own rules and Nasuni adds tagging on top of these.
A file has to be read in full the first time it is content-indexed. After that, only the updates are read, and any new index entries are added to the file’s existing content index.
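The full-read-then-incremental pattern can be sketched with a toy inverted index. This is an illustration under stated assumptions, not Nasuni’s graph-based design: the `ContentIndex` class and its method names are hypothetical, and the sketch only handles content being added to a file, not removed.

```python
import re
from collections import defaultdict


class ContentIndex:
    """Toy inverted index: term -> set of file paths."""

    def __init__(self):
        self.postings = defaultdict(set)

    def _terms(self, text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def index_full(self, path, text):
        """First pass: the whole file is read and indexed."""
        for term in self._terms(text):
            self.postings[term].add(path)

    def index_update(self, path, appended_text):
        """Later pass: only the new content is read; return the fresh terms."""
        fresh = {t for t in self._terms(appended_text)
                 if path not in self.postings[t]}
        for term in fresh:
            self.postings[term].add(path)
        return fresh

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = ContentIndex()
index.index_full("/shares/legal/contract.txt", "Supply contract with Acme")
fresh = index.index_update("/shares/legal/contract.txt", "Acme renewal signed")
hits = index.search("renewal")
```

Here `fresh` holds only the terms the update introduced, since "acme" was already indexed for that file on the first pass.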
A search on the content index returns a path to the file. Users running such searches can only access files they are allowed to see, with permissions enforced through Active Directory or a similar facility. A user could search for files with content related to a particular project name and have them tagged with a group identifier. This has obvious applicability to legal activities and to grouping files which mention a product, a department, a subsidiary, a person, a customer, etc.
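A rough sketch of permission-trimmed search followed by group tagging might look like this. The `IndexedFile` record, the `allowed_groups` field (a stand-in for whatever Active Directory would report), the paths and the `case-2041` identifier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedFile:
    path: str
    terms: frozenset
    allowed_groups: frozenset  # hypothetical stand-in for directory ACLs


def search(files, term, user_groups):
    """Return paths of files that contain the term and that the user may see."""
    term = term.lower()
    return [f.path for f in files
            if term in f.terms and f.allowed_groups & user_groups]


def tag_results(tags, paths, group_id):
    """Attach a group identifier to every path in a result set."""
    for p in paths:
        tags.setdefault(p, set()).add(group_id)
    return tags


files = [
    IndexedFile("/legal/falcon-nda.docx",
                frozenset({"falcon", "nda"}), frozenset({"legal"})),
    IndexedFile("/hr/falcon-salaries.xlsx",
                frozenset({"falcon", "salary"}), frozenset({"hr"})),
]
hits = search(files, "Falcon", {"legal", "staff"})
tags = tag_results({}, hits, "case-2041")
```

Both files mention the project, but the searcher is not in the `hr` group, so only the legal file is returned and tagged.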
The next stage would be to use this content indexing to automate the discovery of sensitive information and/or personally identifiable information (PII). This PII detection could work in background mode and result in the automatic quarantining of discovered files.
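As a simplified illustration of background PII detection, the sketch below scans file text against two regular expressions and reports which files would be quarantined. The patterns, function names and sample paths are assumptions made for the example; production PII detection needs far broader coverage and validation than two regexes, and nothing here reflects how Nasuni would implement it.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return the sorted names of the PII patterns found in the text."""
    return sorted(name for name, rx in PII_PATTERNS.items() if rx.search(text))


def scan(files):
    """Map each path with findings to its PII types; these are the quarantine candidates."""
    quarantined = {}
    for path, text in files.items():
        found = find_pii(text)
        if found:
            quarantined[path] = found
    return quarantined


files = {
    "/hr/new-starter.txt": "Contact jane.doe@example.com, SSN 123-45-6789",
    "/eng/readme.txt": "Build with make",
}
quarantined = scan(files)
```

The sketch only decides which files to quarantine; moving or locking them would be a separate step in the file system.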
The third stage, influenced by Large Language Model AI/ML software, would augment the content indexing on a vertical basis using what Liddle called “Big AI.”
For example, a user who has already set up video content rules could ingest a video into their file system. The video is passed to Google Vision AI, which can classify images and videos and their content. It returns metadata that could include a scene-by-scene description, an audio transcription, a yes/no copyright indicator, and the individuals identified in an image or video. This is passed back to Nasuni, which adds it to the file’s existing metadata, making it part of Nasuni’s file metadata search space.
There is a three-part roadmap here: full-text file indexing, then search and actions based on the indexed content, and finally media file content indexing. Nasuni will no longer just provide data access but will also support management activity based on file content.
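The final step, folding returned metadata into the file’s existing metadata, can be sketched as a simple merge. No real AI service is called here: the `enrichment` dictionary is a hand-written stand-in for whatever the service returns, and its field names, the `vision` prefix and the `merge_metadata` helper are hypothetical.

```python
def merge_metadata(existing, enrichment, source):
    """Return a new metadata dict with enrichment fields namespaced by source."""
    merged = dict(existing)  # copy, so the original record is left untouched
    for key, value in enrichment.items():
        merged[f"{source}.{key}"] = value
    return merged


existing = {"path": "/media/launch.mp4", "owner": "marketing"}
# Hypothetical response shape; a real service defines its own fields.
enrichment = {
    "scenes": ["stage intro", "product demo"],
    "transcript": "Welcome to the launch",
    "copyrighted": False,
}
merged = merge_metadata(existing, enrichment, "vision")
```

Prefixing the added keys with their source keeps AI-derived fields distinguishable from file system metadata while leaving both searchable in the same record.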
Liddle said: “Nasuni is a data management company moving into data intelligence.” Data governance, CTERA’s focus, is part of this, but data intelligence means much more than governance. Automated content indexing and searching open doorways into an organisation’s filed content memories and enable it to discover and respond to much, much more than database records and website form box input.