High-quality data key for effective AI agents

COMMISSIONED: As enterprises increasingly adopt GenAI-powered AI agents, making high-quality data available for these software assistants will come into sharper focus.

This is why it’s more important than ever for IT leaders to get their data house in order. Unfortunately, most IT shops’ data houses may be messier than they should be, as poor data quality remains one of the biggest challenges confronting organizations’ generative AI strategies.

In fact, 55 percent of organizations avoid certain GenAI use cases due to data-related issues, according to Deloitte research. The consultancy also found that 75 percent of organizations have increased their tech investments around data lifecycle management due to GenAI.

“I think we’re probably spending as much time on data strategy and management as on pure GenAI questions, because data is the foundation for GenAI work,” the chief technology officer at a manufacturing company told Deloitte.

As IT leaders, you know that high-quality data is critical for organizations seeking value from their GenAI use cases – and this is especially true for agentic architectures.

Agents are capable of “thinking”: reasoning, planning, making decisions and learning from feedback. Memory systems help make this possible, ensuring that agents can retrieve information, including relevant context, procedural knowledge of how to execute processes and details about past events.

Enthusiasm for these tools is robust, with 82 percent of organizations saying they expect to adopt AI agents in one to three years, Capgemini says. Eventually, experts say, multi-agent systems – an entire host of agents communicating with each other and other applications as they execute tasks without human intervention – will automate entire workflows and business processes.

That is the dream, anyway. What is real today is that poor data hygiene – data rife with errors or duplications – can break an agentic AI system.

The data preparation playbook

Agents require the right information – high-quality data – in their moment of need to complete their tasks. This playbook can help:

Define requirements: What do you wish to accomplish with GenAI? Do you want to create basic digital assistants, autonomous agents or something else? Will a small language model do, or do you require an LLM? Be sure to account for specific features and attributes you want your applications to execute.

Identify and collect: You’ll want to figure out the data you need to achieve your goals. Instruct your data architects and engineers to collect the data necessary to train your systems.

Clean: “Cleaning” data means handling missing values, correcting errors, removing duplicates and addressing outliers. If some groups are over- or underrepresented, balance the dataset: use undersampling to remove data points from majority groups and oversampling to duplicate data points from minority groups.
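As a rough illustration of the cleaning step, here is a minimal Python sketch that drops records with missing values, removes duplicates and rebalances groups by undersampling or oversampling. The record layout (`text` and `label` fields), the function names `clean` and `rebalance`, and the choice to treat text that differs only in case or surrounding whitespace as a duplicate are all assumptions made for this example. Outlier handling and error correction are left out because they depend heavily on your data.

```python
import random
from collections import defaultdict


def clean(records):
    """Drop records with missing fields and duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for rec in records:
        text, label = rec.get("text"), rec.get("label")
        if not text or label is None:
            continue  # missing value: skip the record
        # Assumption: case and surrounding whitespace do not distinguish records.
        key = (text.strip().lower(), label)
        if key in seen:
            continue  # duplicate: skip the record
        seen.add(key)
        cleaned.append({"text": text.strip(), "label": label})
    return cleaned


def rebalance(records, strategy="undersample", seed=0):
    """Equalize group sizes by trimming large groups or duplicating small ones."""
    if not records:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)
    sizes = [len(group) for group in by_label.values()]
    # Undersampling shrinks every group to the smallest one;
    # oversampling grows every group to the largest one.
    target = min(sizes) if strategy == "undersample" else max(sizes)
    balanced = []
    for group in by_label.values():
        if len(group) >= target:
            balanced.extend(rng.sample(group, target))
        else:
            extra = rng.choices(group, k=target - len(group))
            balanced.extend(group + extra)
    return balanced


raw = [
    {"text": "Reset my password", "label": "account"},
    {"text": "reset my password ", "label": "account"},  # duplicate
    {"text": None, "label": "billing"},                   # missing value
    {"text": "Refund request", "label": "billing"},
    {"text": "Change email", "label": "account"},
    {"text": "Close account", "label": "account"},
]
cleaned = clean(raw)
under = rebalance(cleaned, strategy="undersample")
over = rebalance(cleaned, strategy="oversample")
```

Undersampling discards data and oversampling repeats it, so which one suits you depends on how much data you can afford to lose and how sensitive your model is to repeated examples.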

Preprocess: You’ll preprocess the data, which may include tokenizing texts, resizing images or extracting audio features. This will make it suitable for training.
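To make the text case concrete, the sketch below tokenizes strings and maps each token to an integer ID. It is a deliberately simple word-and-punctuation tokenizer written for illustration; production LLMs typically rely on subword tokenizers supplied with the model. The names `tokenize`, `build_vocab` and `encode`, and the `<unk>` placeholder for unseen tokens, are assumptions for this example.

```python
import re

# Words (runs of letters, digits, underscores) or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def build_vocab(token_lists):
    """Assign each distinct token an integer ID; ID 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Convert tokens to IDs, falling back to the unknown ID for unseen tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


tokens = tokenize("Reset my password, please!")
vocab = build_vocab([tokens])
ids = encode(tokenize("Reset my account"), vocab)
```

Here "account" never appeared when the vocabulary was built, so it is encoded with the unknown ID rather than raising an error.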

Label: You’ll manually assign labels to each data point, indicating what the data represents. Although time-consuming, labeling is essential for training a high-quality model.

Organize: Organize the data for training your model, which includes splitting the data into training, validation and test sets. This is no trivial task, as many organizations struggle with organizing their data.
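A split into training, validation and test sets can be sketched in a few lines of Python. The 80/10/10 proportions, the fixed seed and the function name `split_dataset` are assumptions for this example, not recommendations from the research cited above.

```python
import random


def split_dataset(records, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle records and split them into (train, validation, test) lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


records = list(range(100))
train, val, test = split_dataset(records)
```

Shuffling before splitting keeps any ordering in the source data, such as records sorted by date or by label, from leaking into a single set. Each record lands in exactly one set, so the test set stays unseen during training.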

Model training: With high-quality and well-organized data, you can begin training. This is where the model learns to generate new data consistent with the patterns present in the training data.

Model evaluation: After training, you should evaluate the generative model’s performance using the validation and test datasets. Assess text, images or other outputs to ensure they meet your desired criteria.

Monitor: Regularly monitor the model for errors, inconsistencies and data outliers. Monitoring your data quality helps ensure that your GenAI models are always using the best possible data – and should help you avoid bias creep. Consider bias detection tools to examine your data as you work.
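One lightweight way to start monitoring is to compute a few quality metrics on each incoming batch and raise an alert when they cross a threshold. The sketch below is an assumption-laden example, not a substitute for dedicated bias detection tools: the field names, the function names `quality_report` and `check_report`, and every threshold value are hypothetical and should be tuned to your own data. The share held by the largest label serves here as a crude signal of imbalance.

```python
from collections import Counter


def quality_report(records, required_fields=("text", "label")):
    """Summarize missing values, duplicates and label balance for a batch."""
    total = len(records)
    missing = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in required_fields)
    )
    keys = Counter(tuple(r.get(f) for f in required_fields) for r in records)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    labels = Counter(r["label"] for r in records if r.get("label") is not None)
    labeled = sum(labels.values())
    return {
        "total": total,
        "missing_rate": missing / total if total else 0.0,
        "duplicate_rate": duplicates / total if total else 0.0,
        "max_label_share": max(labels.values()) / labeled if labeled else 0.0,
    }


def check_report(report, max_missing=0.05, max_duplicates=0.05, max_share=0.8):
    """Return alert messages for metrics above the (hypothetical) thresholds."""
    alerts = []
    if report["missing_rate"] > max_missing:
        alerts.append("too many records with missing values")
    if report["duplicate_rate"] > max_duplicates:
        alerts.append("too many duplicate records")
    if report["max_label_share"] > max_share:
        alerts.append("one label dominates the batch")
    return alerts


batch = [
    {"text": "Reset my password", "label": "account"},
    {"text": "Reset my password", "label": "account"},  # duplicate
    {"text": "", "label": "billing"},                    # missing text
    {"text": "Refund request", "label": "billing"},
]
report = quality_report(batch)
alerts = check_report(report)
```

Running a check like this on a schedule, and tracking the metrics over time, gives you an early warning when the data feeding your agents starts to degrade.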

The takeaway

As an IT leader, you can help shape the outcomes of your agents – and any GenAI application or service – by following best practices. This will require embracing data management solutions.

From vector databases that help manage the copious amounts of data LLMs generate to data lakehouses that help consolidate data silos, there are several emerging tools at your disposal.

Fortunately, trusted advisors can help you navigate this new AI landscape. Dell Technologies offers accelerator workshops, or half-day events in which Dell consultants work with your business and technology stakeholders to brainstorm ideas for AI use cases that can augment your business.

Learn more about the GenAI Accelerator Workshop.

Brought to you by Dell Technologies.