AWS is promising more disciplined and cost-effective AI data pipelines with new SageMaker and S3 Tables capabilities that go live today.
As part of its contribution to Pi Day, the cloud giant has announced general availability of SageMaker Unified Studio, which Sirish Chandrasekaran, AWS VP of analytics, told Blocks and Files was a single development environment integrating AWS’s data analytics and AI/ML services.
It pulls together the vendor’s Lakehouse platform and its SageMaker Catalog, “which is the governance layer.” He added: “The studio … you can do everything from SQL analytics, data prep, data integration, model building, generative AI app development, all in one place.”
It has added new models under SageMaker AI, he said, such as Claude 3.7 Sonnet and DeepSeek R1.
“We’ve added capabilities like latency sensitive inferencing for specific models from Anthropic, Meta and Amazon. And we’ve also made it simpler in terms of how you can use Bedrock to both prototype applications but also share them across team members.”
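Latency-sensitive inferencing surfaces in Bedrock as a per-request performance setting rather than a separate service. As a rough sketch only, assuming the boto3 Converse API and using an illustrative region and model ID (availability differs by model), a request might look like this:

```python
import boto3

# Bedrock Runtime client. Latency-optimized inference is only offered in
# certain regions and for certain Anthropic, Meta, and Amazon models, so the
# region and model ID here are illustrative placeholders.
client = boto3.client("bedrock-runtime", region_name="us-east-2")

response = client.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",  # example only
    messages=[
        {"role": "user", "content": [{"text": "Summarize last week's pipeline failures."}]}
    ],
    # Ask for the latency-optimized profile instead of the standard one.
    performanceConfig={"latency": "optimized"},
)

print(response["output"]["message"]["content"][0]["text"])
```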
AWS has also announced the ability to access S3 Tables from within SageMaker Lakehouse. “You can now run … SQL, Spark jobs, model building, Gen AI apps. You can combine your S3 Table data with other data in your Lakehouse, whether it’s in Redshift, with what we call native Parquet on S3, on premises and federated sources; all of it you can bring together.”
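S3 Tables store data as Apache Iceberg tables, and AWS publishes an Iceberg catalog implementation that lets Spark address a table bucket directly. A minimal PySpark sketch, assuming that catalog and the Iceberg Spark runtime are on the classpath, with placeholder bucket, namespace, and table names:

```python
from pyspark.sql import SparkSession

# Placeholder table bucket ARN. Assumes the iceberg-spark-runtime and
# s3-tables-catalog-for-iceberg jars are available to Spark.
TABLE_BUCKET_ARN = "arn:aws:s3tables:us-east-1:111122223333:bucket/analytics"

spark = (
    SparkSession.builder.appName("s3tables-lakehouse")
    # Register the table bucket as an Iceberg catalog named "s3tables".
    .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3tables.catalog-impl",
            "software.amazon.s3tables.iceberg.S3TablesCatalog")
    .config("spark.sql.catalog.s3tables.warehouse", TABLE_BUCKET_ARN)
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Query the S3 Table like any other Spark table; joins against data
# registered elsewhere in the Lakehouse work the same way.
spark.sql("""
    SELECT region, SUM(amount) AS total
    FROM s3tables.sales.orders   -- namespace.table inside the table bucket
    GROUP BY region
""").show()
```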
This would all help companies – or at least those using AWS services – build a better data foundation for their AI projects, he said.
“Our perspective is the way you differentiate is through your data, because every modern business is a data business, and what’s unique to your company is your data.”
“What we’re seeing increasingly with our customers is that … the silos are slowing them down,” he said, whether because of the challenge of bringing data into the same place or of collaboration between different teams. In other organizations, meanwhile, those silos were blurring, he said.
Clearly some companies are rushing to pull data together as they dive into AI, which has led to fears that traditional data management disciplines and skills are being left by the wayside. Chandrasekaran said he was seeing the opposite. “What I’m seeing a lot from companies is that they have this realization now that the way they move faster is by getting back to basics.”
“A lot of how we’re reimagining the SageMaker Lakehouse is being able to query data where it is,” he said. “You do not need to now transfer data from Redshift to S3 or from S3 to Redshift. You can query lake data from Redshift.”
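As an illustration of querying data in place, Redshift can reach lake data registered in the Glue Data Catalog through an external schema, without copying it into the warehouse. A hypothetical sketch using the Redshift Data API, with placeholder workgroup, database, and IAM role names:

```python
import boto3

# Redshift Data API client; the workgroup, databases, and IAM role below
# are hypothetical placeholders.
client = boto3.client("redshift-data")

statements = [
    # Expose a Glue Data Catalog database (lake tables on S3) as an
    # external schema, so Redshift can query it where it sits.
    """
    CREATE EXTERNAL SCHEMA IF NOT EXISTS lakehouse
    FROM DATA CATALOG
    DATABASE 'sales_lake'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftLakeAccess'
    """,
    # Query the lake data directly from Redshift, with no load step.
    """
    SELECT region, SUM(amount) AS total
    FROM lakehouse.orders
    GROUP BY region
    """,
]

response = client.batch_execute_statement(
    WorkgroupName="analytics-wg",  # Redshift Serverless workgroup
    Database="dev",
    Sqls=statements,
)
print("Batch statement id:", response["Id"])
```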
This reduced duplication, he said. “And that obviously saves costs, same as federated sources.”
At the same time, he said, companies were acutely aware of the need for governance. “But I think what’s different about this new world is that governance is no longer just about compliance. It’s about confidence.” That includes confidence that AI projects are using and being trained on trusted data, “but also confidence that your AI is adhering to responsible AI use policies.”