Arcitecta now adept at AI

Arcitecta has evolved its Mediaflux data access and management fabric to an AI-ready platform with structured and unstructured data support, a vector database, and AI model support.

Mediaflux is distributed data management software supporting file and object data storage with a single namespace and tiering capability. It works across on-premises, public cloud, and hybrid environments, using storage tiers formed from SSDs, disk, and tape. It has a Livewire data mover and metadata database. Mediaflux Multi-Site, Edge, and Burst offerings help geo-distributed workers get fast access to shared data – text, images, video, time-series, etc. – with Mediaflux Real-Time offering virtually instant access to content data. Now Mediaflux supports AI workflows, and users can apply AI more widely in areas such as cancer research, genomic analysis, defense, M&E, finance, government, and scientific discovery.


CEO Jason Lohrey stated: “As organizations increasingly rely on AI and machine learning, the challenge of making vast, diverse datasets accessible and usable for AI training has become paramount. With an enhanced version of Mediaflux that powers AI, we are delivering a revolutionary data fabric that integrates any data asset into an AI-ready resource pool, allowing our customers to achieve better models faster and with unparalleled operational efficiency.”

He reckons: “This integrated approach bypasses the need for fragmented software development tools and separate vector stores, setting a new standard for AI data management. The result will be outcomes such as transformative advancements in cancer research, accelerated drug discovery and preservation of the world’s most important cultural archives.”

There are many potential data stores that could contribute data to AI models, such as transactional databases, analytic databases, SaaS transactional systems such as CRM, data warehouses, lakes and lakehouses, unstructured file and object stores, backup stores and archival data vaults. We just enumerated ten types of data silo. Getting data in these silos ready for AI can involve silo-specific or patchworked tools for ingest, tagging, and ETL, plus a vector database and vector search capability.

Arcitecta defines this as a mess and claims Mediaflux is a unified data management platform designed from the ground up to turn all your data—text, images, time series, genomics, and more—into AI-ready assets.

Lohrey told us in a briefing: “Basically you need data to train your models. Indeed. That’s been the whole mission along the way; to get as much data from disparate sources into the one system, a single pane of glass, and enable your pathways to AI. And for me, actually in some ways, there’s nothing particularly special about AI. It’s a compute framework. It’s computation that’s just using a particular method; vectors basically.”

He said: “We see people moving towards unlocking the value of AI or the perceived value of that. You actually need really good infrastructure and software to drive that. It’s pretty clear though that if you win that race in many ways, then you are competitively advantaged compared to others.”

And: “You need unified data platform. So that’s our role. We are not in the business of doing the models themselves. Our job is to house the output of the models, the vectors themselves, and just treat it as general data. So we will orchestrate the external systems to go and do their analysis.”

For Arcitecta that means: “I’m heading us into a position where we are the data company basically; that’s all. Not the producer of it. Everything we do is centered around data. Doesn’t matter whether it’s in your traditional database or your non-database, your unstructured data, it’s all the one thing. So you end up with this platform that brings it all together and spans on-premises to the cloud.”

From the B&F point of view Mediaflux AI – our name – doesn’t just make data passively available to AI. It actively helps AI with a set of features that provide a linkage layer between data and AI models and make data ready for use by AI, without involving separate, external infrastructure components. Its features include:

  • Metadata – a schema-less metadata catalog and fabric covering diverse data sources with full metadata and vector indexing in a single system, combining metadata, vector, file and object data across multiple locations
  • Vector embeddings stored in Arcitecta’s in-house developed XODB database
  • RAG-ready data and pipeline 
  • Similarity search with native vector search engine and fast semantic queries across trillions of records in milliseconds
  • Built-in pipelines to automate ingest, tagging and transformation
  • Single-pane orchestration 

The XODB database is a foundational pillar for AI-ready Mediaflux and has built-in capabilities for vector embeddings and plugin support for new models managed within Mediaflux. It supports object, time-series, geospatial and vector data, and manages metadata in real time, instantly directing users toward their data, regardless of scale or location.

Vector search is critical here. Lohrey tells us: “We do see, obviously, one of the things that underpins AI vector database is in vector maths. … The beauty of vectors is we don’t have to know exactly what they mean. We can perform similarity matches on them just using very standard maths.”
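To make the “very standard maths” concrete, here is a minimal sketch of a similarity match using cosine similarity, one common measure for comparing embeddings. It is an illustration only and not Arcitecta's implementation: the record names, the three-dimensional vectors, and the `cosine_similarity` and `top_k` helpers are all hypothetical, and a production vector engine would use an index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, records, k=2):
    """Return the names of the k records whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in records.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical embeddings; real ones would come from an external embedding model.
records = {
    "scan_a": [0.9, 0.1, 0.0],
    "scan_b": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

print(top_k([1.0, 0.0, 0.0], records))
```

Nothing in the calculation depends on what each dimension means, which is the point Lohrey makes: the match falls out of dot products and vector lengths alone.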


We shouldn’t think of XODB as being localized: “An emphasis for us is to say the database is not necessarily in one place. It can be in multiple places and actually joined together. You might do your analysis on data that’s 2000 kilometers or 20,000 kilometers away and produce outcomes that come back into a central system or remain in place where those are for distributed querying. The end user can get at that global network.”

XODB has a uniqueness, with Lohrey saying: “I don’t know of anybody else that’s put a database in the middle of their file system and done it successfully. … A file is just a container of data. So it’s just data, and that’s all in the database and that’s how we get to the ability to do a wild card searches across file systems. … When we show people this, they are just gobsmacked and can’t believe it. But you can do an arbitrary wild card search for digital certificates in two or three building files and do it in tens of milliseconds and then go and do the same search across all points in time.”

“It’s very fast and … it handles objects, [in] which a file is an object, a directory is an object, a person is an object, an airplane is an object and has time series aspects.”
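As a rough illustration of why holding file records in a database makes wildcard search straightforward, the sketch below uses Python's built-in SQLite as a stand-in. This is not XODB and says nothing about its internals or performance; the `files` table, its columns, the sample paths, and the `wildcard_search` helper are all assumptions made for the example.

```python
import sqlite3

# In-memory stand-in for a database that holds one row per file version.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, version INTEGER, size INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/bldg1/certs/site.pem", 1, 1200),
        ("/bldg1/certs/site.pem", 2, 1250),  # a later point in time for the same file
        ("/bldg2/docs/plan.pdf", 1, 90000),
        ("/bldg2/certs/gate.pem", 1, 1100),
    ],
)


def wildcard_search(pattern, latest_only=True):
    """Match paths against a glob pattern, for the latest version or for all versions."""
    if latest_only:
        sql = (
            "SELECT path, MAX(version) FROM files "
            "WHERE path GLOB ? GROUP BY path ORDER BY path"
        )
    else:
        sql = (
            "SELECT path, version FROM files "
            "WHERE path GLOB ? ORDER BY path, version"
        )
    return conn.execute(sql, (pattern,)).fetchall()


print(wildcard_search("*/certs/*.pem"))                     # current state
print(wildcard_search("*/certs/*.pem", latest_only=False))  # every point in time
```

Because each version is just another row, the same pattern can be run against the current state or against the whole history by changing the query, not the storage layout.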

This is a key point for him and Arcitecta: “We’re a database company. You can’t be in the space of managing data and have sophisticated data management systems really without a decent database because that is the limiting factor of any system. The more densely you can pack your information, the better you are. And the holy grail in databases is maximizing the information content per unit byte that you’ve got. And if you do that, then you can get at it much more quickly. And again, that’s why building these systems and incorporating vector support actually puts us in a fairly unique position with supporting AI. You’ve got everything in the one system without trying to integrate multiple systems.”

Arcitecta thinks its updated AI-ready Mediaflux offering is a good fit for organizations working with massive data volumes needing a scalable, high-performance infrastructure in areas such as R&D, data science, genomics, medical imaging and machine learning Ops. Lohrey says of Arcitecta: “I really do emphasize the point that we are a data-based company as much as we are a data management company and all of this is just part of that whole persona of being in and around data.”

The new Mediaflux AI-ready capabilities are available as an integrated part of the existing Mediaflux platform. It is licensed by user count, with no capacity-based fees, and, Arcitecta says, offers a pricing edge compared to patchworked tools.

Read more in the blog “Unlocking AI at Scale: Why Mediaflux is Your Data Fabric for the AI Era” and the solution brief “Mediaflux: AI-Ready Data Fabric with Native Vector Search.”

Comment

Object storage supplier Cloudian has added the Milvus vector database to its HyperStore offering. Scality supports external vector databases. Data orchestrator Hammerspace does not explicitly support a vector database. Ditto data manager Komprise. Both can help with the collection of data and its feeding to a vector embedding model, with the resulting vectors stored in an external vector database.