Canada wants to phase out data copying

The Standards Council of Canada has approved an IT standard intended to eliminate new data silos and copies when organizations adopt or build new applications – in effect, preventing new applications from creating data copies at all. However, “last time integration” could still bring in legacy data via a final copy.

The CAN/CIOSC 100-9, Data governance – Part 9: Zero-Copy Integration standard can be downloaded here.

It states: “The organization shall avoid creation of application-specific data silos when adding new applications and application functionality. The organization shall adopt a shared data architecture enabling multiple applications to collaborate on a single shared copy of data. The organization should continue to support the use of application-specific data schema without the need to generate application-specific data copies.”
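To picture what that requires, here is a minimal sketch, assuming a relational store, of application-specific schemas expressed as views over a single shared table rather than as extracted copies. The standard does not prescribe any particular technology, and all table, view, and column names below are hypothetical.

```python
# A minimal sketch (not from the standard) of app-specific schemas over one
# shared copy of data, using SQLite views. All names here are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")

# The single shared copy of the data that all applications collaborate on.
db.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    full_name TEXT,
    email TEXT,
    postal_code TEXT)""")
db.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace', 'ada@example.com', 'K1A 0B1')")

# Application-specific schemas defined as views over the shared table,
# instead of each app extracting its own silo of copied records.
db.execute("CREATE VIEW billing_app AS SELECT id, full_name, postal_code FROM customers")
db.execute("CREATE VIEW marketing_app AS SELECT id, email FROM customers")

# There is only one physical copy, so an update made anywhere is
# immediately visible through every application's view.
db.execute("UPDATE customers SET email = 'ada@lovelace.dev' WHERE id = 1")
print(db.execute("SELECT * FROM marketing_app").fetchall())  # [(1, 'ada@lovelace.dev')]
```

Because both views resolve to the same rows, no application-specific copy ever exists to drift out of sync or escape the owner’s control.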

This standard was created by two Canadian organizations – the nonprofit Digital Governance Council trade body (DGC, previously the CIO Strategy Council) and the nonprofit Data Collaboration Alliance.

Keith Jansa, DGC Canada

The DGC told us: “The CIO Strategy Council Technical Committee 1 on Data Governance comprises policy makers, regulators, business executives, academics, civil society representatives and experts in data management, privacy and related subjects from coast-to-coast-to-coast.”

Keith Jansa, Digital Governance Council CEO, issued a statement saying: “By eliminating silos and copies from new digital solutions, Zero-Copy Integration offers great potential in public health, social research, open banking, and sustainability. These are among the many areas in which essential collaboration has been constrained by the lack of meaningful control associated with traditional approaches to data sharing.”

The problems the standard is meant to fix center on data ownership and control, and compliance with data protection regulations such as the California Consumer Privacy Act (CCPA) and the EU’s General Data Protection Regulation (GDPR). The creation of data copies, supporters of the standard say, transfers control of data away from its original owners.

Dan DeMers, CEO of dataware company Cinchy and Technical Committee member for the standard, said: “With Zero-Copy Integration, organizations can achieve a powerful combination of digital outcomes that have always been elusive: meaningful control for data owners, accelerated delivery times for developers, and simplified data compliance for organizations. And, of course, this is just the beginning – we believe this opens the door to far greater innovation in many other areas.”

The DGC says viable projects for Zero-Copy Integration include the development of new applications, predictive analytics, digital twins, customer 360 views, AI/ML operationalization, and workflow automation, as well as legacy system modernization and SaaS application enrichment. It is developing plans to advance Zero-Copy Integration within international standards organizations.

Implications

We talked to several suppliers, whose products and services currently involve making data copies, about the implications for them.

WANdisco’s CTO Paul Scott Murphy told us: “At first glance, the standard may appear to push against data movement, but in fact, it supports our core technology architecture. Our data activation platform works to move data that would otherwise be siloed in distributed environments, like the edge, and aggregates it in the cloud to prevent application-specific copies from proliferating. 

“Our technology eliminates the need for application-specific data management and ensures it can be held as a single physical copy, regardless of scale.”

He added: “Notably, there’s a particularly important aspect of the new guidance on preventing data fragmentation by building access and collaboration into the underlying data architecture. Again, our technology supports this approach, dealing directly with data in a single, physical destination (typically a cloud storage service). Our technology does not rely on, require, or provide application-level interfaces. 

“In response to the standard, Canadian organizations will need to adopt solutions and architectures that do not require copies of data in distributed locations – even when datasets are massive and generated from dispersed sensor networks, mobile environments, or other complex systems.” 

A product and service such as Seagate’s Lyve Mobile depends on making data copies and physically transporting them to an AWS datacenter or a customer’s central site. Both would be affected, for new apps at least, if the Canadian Zero-Copy Integration standard were adopted.

A Seagate spokesperson told us: “Seagate is monitoring the development and review of Zero-Copy Integration Standard for Canada by technical committee and does not speculate on potential adoption or outcome at this time.”

What does the standard mean for backup-as-a-service suppliers such as Clumio and Druva, which make backup data copies stored in a public cloud’s object store?

W Curtis Preston, Chief Technical Evangelist at Druva, told us: “This is the first I’ve heard of the standard. However, I believe it’s focusing on a different part of IT, meaning the apps themselves. They’re saying if you’re developing a new app you should share the same data, rather than making another copy of the data. The more copies of personal data you have the harder it is to preserve privacy of personal data in that data set. I don’t have any problem with that idea as a concept/standard.

“I don’t see how anyone familiar with basic concepts of IT could object to creating a separate copy of data for backup purposes. That’s an entirely different concept.”

Poojan Kumar, co-founder and CEO of Clumio, told us: “This is an important development; companies are being encouraged to use a single source of truth – such as a data lake – to feed data into apps, analytics platforms, and machine learning models, rather than creating multiple copies and using bespoke copy-data platforms and warehouses.”

He added: “Backup and DR strategies will evolve to focus on protecting the shared data repository (the data lake), rather than individual apps and their copies of data. We have maintained that backups should be designed not as ‘application-specific’ copies, but in a way that powers the overall resilience of the business against ransomware and operational disruptions. This validates our position of backups being a layer of business resilience, and not simply copies.”

DGC view

We asked the DGC how it would cope with the examples above and what it would recommend to organizations wanting to develop their IT infrastructure in these ways. Dan DeMers told us: “One of the most important concepts within the Zero-Copy Integration framework is the emphasis on data sharing via granting access for engagement on uncopied datasets (collaboration) rather than data sharing via the exchange of copies of those datasets (cooperation).

“But as you point out, many IT ecosystems are entirely reliant upon the exchange of copies, and that is why Zero-Copy Integration focuses on how organizations build and support new digital solutions.”
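DeMers’ collaboration-versus-cooperation distinction can be sketched in a few lines of code. This is our illustration only, with hypothetical names: cooperation hands a consumer a copy the owner can no longer control, while collaboration grants revocable access to the one uncopied dataset.

```python
# Hypothetical sketch of "cooperation" (exchanging copies) versus
# "collaboration" (granting access to uncopied data). Names are ours,
# not the standard's.
from dataclasses import dataclass, field

@dataclass
class SharedDataset:
    rows: list = field(default_factory=list)

def share_by_copy(dataset: SharedDataset) -> list:
    """Cooperation: once the copy leaves, the owner loses all control."""
    return list(dataset.rows)

@dataclass
class AccessGrant:
    """Collaboration: scoped, revocable access to the single shared copy."""
    dataset: SharedDataset
    active: bool = True

    def read(self) -> list:
        if not self.active:
            raise PermissionError("access revoked by data owner")
        return self.dataset.rows  # same physical data, no copy made

    def revoke(self) -> None:
        self.active = False

orders = SharedDataset(rows=[{"order": 1, "total": 99.0}])
grant = AccessGrant(orders)
print(grant.read())  # consumer engages with the uncopied dataset
grant.revoke()       # owner retains meaningful control
# grant.read() would now raise PermissionError
```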

Legacy apps can contribute, he said. “One capability that is not defined in the standard (but we are seeing in new data management technologies such as dataware) is the ability to connect legacy data sources into a shared, zero-copy data architecture. These connections are bi-directional, enabling the new architecture to be fueled by legacy apps and systems on a ‘last time integration’ (final copy) basis.

“It’s like building the plane while it’s taking off in that sense – you’re using a Zero-Copy data architecture to build new solutions without silos or data integration, but it’s being supplied with some of its data from your existing data ecosystem. 

“It’s all about making a transition, not destroying what’s working today, so the scenarios you outlined would not be an issue in its adoption.”
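The “last time integration” pattern DeMers describes can be sketched just as simply: seed the shared architecture from the legacy source with one final copy, after which the legacy app reads and writes the shared store directly. The function and structure names below are assumptions for illustration, not anything defined in the standard.

```python
# Hypothetical sketch of "last time integration": one final copy from a
# legacy source into the shared zero-copy architecture, then direct,
# bi-directional use of the shared store. All names are illustrative.

shared_store: dict = {}  # stands in for the shared zero-copy data architecture
legacy_db = {"cust_1": {"name": "Ada"}, "cust_2": {"name": "Grace"}}

def last_time_integration(legacy: dict, shared: dict) -> None:
    """Perform the final copy: seed the shared store from the legacy source."""
    for key, record in legacy.items():
        shared.setdefault(key, dict(record))

last_time_integration(legacy_db, shared_store)

# From here on the connection is bi-directional: the legacy app operates on
# the shared store itself, so no further copies are ever exchanged.
def legacy_app_update(key: str, changes: dict) -> None:
    shared_store[key].update(changes)

legacy_app_update("cust_1", {"name": "Ada Lovelace"})
print(shared_store["cust_1"])  # {'name': 'Ada Lovelace'} – one shared copy
```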