
Cloudian gets more funding as it reaches breakeven, bags HPE GreenLake wins

Object storage demand is growing, and Cloudian has gained $23 million in funding as its business, now at breakeven, notches up four customer wins through HPE’s GreenLake subscription service.

Cloudian claims its geo-distributed, S3-compatible HyperStore is used for AI data lake storage and that it is experiencing a 30 percent year-on-year increase in annual recurring revenue (ARR). A HyperStore data lake stores and feeds data to large language models and works with AI and data analytics tools such as PyTorch, TensorFlow, Kafka, and Druid. Cloudian has a partnership with HPE under which HyperStore is sold as a GreenLake service.
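Because HyperStore speaks the S3 API, any standard S3 SDK can read from it. The sketch below, using Python’s boto3 with placeholder endpoint, credential, and bucket names (none of which come from Cloudian documentation), shows the kind of call an AI data pipeline might make to pull training objects from such a data lake.

# Minimal sketch: reading training data from an S3-compatible object store
# such as HyperStore, using boto3. The endpoint URL, credentials, and bucket
# name below are placeholders, not Cloudian-documented values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://hyperstore.example.internal",  # hypothetical on-prem endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# List objects under a prefix and pull one down for a training job
resp = s3.list_objects_v2(Bucket="ai-data-lake", Prefix="training/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

s3.download_file("ai-data-lake", "training/sample-000.parquet", "/tmp/sample-000.parquet")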

Michael Tso, Cloudian CEO and co-founder, issued a statement claiming Cloudian was experiencing “substantial growth as enterprises increasingly leverage Cloudian’s AI-ready data lake technology to create insights and advance their AI initiatives. Achieving breakeven alongside this growth marks a pivotal moment in our financial journey.”

Michael Tso.

The funding comes from Morgan Stanley Expansion Capital and its exec director, Stanley Hua, said: “We are delighted to support Cloudian in the next phase of growth and to help the company continue delivering exceptional value to its customers.” Cloudian will use the cash to pay for product development and sales and marketing initiatives.

Cloudian was founded in 2011 and has raised some $256 million in funding, with $60 million added to its coffers last year.

Customers gained through GreenLake include a global retailer, where Cloudian is supporting its Splunk data analytics infrastructure. This retailer, which could be Carrefour, has more than 200,000 staff, and switched its backend Splunk storage from HPE 3PAR/Primera to HyperStore, lowering its storage costs by 51 percent.

A second GreenLake Cloud customer is cloud service provider Verinext, which provides fully managed backup, disaster recovery and storage services using Cloudian’s AI data lake. We understand two more Cloudian service provider deals are to be announced shortly.

HyperStore supports the AWS Mountpoint facility, meaning applications running in AWS can issue file IO requests to HyperStore running in AWS. That means AI apps running in AWS could use HyperStore data lake storage, pumping data to GPU instances. With Hammerspace’s Global Data Environment now supporting S3, it could be used to provide a high-performance bridge between on-premises HyperStore and NVIDIA GPU servers.

Cloudian’s HyperStore software is available from the AWS Marketplace, HPE GreenLake, Lenovo, and Cloudian’s worldwide roster of resellers. The company has relationships with data analytics vendors and products such as Snowflake, Teradata, Microsoft SQL Server, VMware Greenplum, and Vertica.

Nasuni expands operations in France

Following an upsurge of business in France, Nasuni is bulking up its office with extra hires and more partners.

Nasuni is a cloud file services supplier that uses edge caches to accelerate IO to object-based storage in the Amazon public cloud. It believes its data-managing file system provides fast enough access to enable customers to move away from on-premises NAS systems such as NetApp filers. Nasuni is also building on the GenAI boom by offering data intelligence services.

Chris Addis

Chris Addis, EMEA VP of Sales at Nasuni, stated: “Our expansion in the French market across sales, technical sales, and partnerships reflects the growing demands we’re seeing for the Nasuni File Data Platform in supporting enterprises with these [AI implementation] challenges and driving growth. We are excited to have now achieved critical mass in France, and this rapid growth alongside the growth of our partner network marks an exciting time for Nasuni as we continue to expand operations in Europe.”

The company started operations in France in 2017 and customers there include nine of France’s top 50 market cap companies such as Pernod Ricard, TBWA, Colas, Safran, and France Habitation. French clients run across different market sectors with a focus on automotive, manufacturing, consumer goods, engineering, and energy.

Addis writes: “While our current success in France stems from Nasuni’s core benefits, the future looks even more promising, as our advanced ransomware protection services, Fit-for-AI capabilities, and latest innovations are well matched to the needs of French companies.”

He expects Nasuni to take advantage of AI interest in France, pointing out: “The AI market in France is projected to grow by over 28 percent in the next six years, and the importance of cyber resilience is well understood in the region, as the country is a frequent target of ransomware attacks. We’re building out an entire go-to-market team focused on the region. We’re going to strengthen local user groups.” 

Addis told us: “As part of Nasuni’s expansion into the French market, we have hired a Regional Sales Manager, Solution Architect and Sales Development Representative, who work alongside our existing French-speaking support staff. These new hires are helping us to drive further growth in the region, which includes expanding our partnership network. We currently work with AWS, Google and Microsoft as our cloud partners and an additional four channel partners in France, this includes arcITek who have been a key partner of ours since 2017.”

That means three new partners, one of whom we understand to be Abomicro, although you’ll find neither it nor arcITek listed under France on Nasuni’s Partner Base webpage.

Nasuni is on a charge. It has recently recruited new execs, rebranded to sharpen its image and messaging, and is working with Microsoft’s GenAI Copilot. The AI surge could give it the momentum needed to run an IPO next year, leaving competitors CTERA and Panzura in their pre-IPO states.

Nasuni will hold a customer meetup in France in October. Sign up for attendance here.

Ransomware thieves beware

SPONSORED FEATURE: You know that a technology problem is serious when the White House holds a summit about it. Ransomware is no longer a simple nerd-borne irritation; it’s an organized criminal scourge. Research from the Enterprise Strategy Group (ESG) found 79 percent of companies have experienced ransomware attacks within the last 12 months. Nearly half were attacked at least once a month, with many reporting attacks happening on a daily basis.

From the early days of enterprise ransomware, security pros had one common piece of guidance: back up your data. It’s still good advice, even in the era of double-extortion attacks where criminals exfiltrate victims’ information while encrypting it. But there’s a problem: attackers are very aware of your backup systems, and they’re searching for them while also looking for production data to encrypt or exfiltrate.

A typical ransomware attack starts when the attacker gains a foothold, often through phishing emails or exploited/unpatched vulnerabilities. Once inside, attackers aim to locate and encrypt production data to cripple operations.

Increasingly, though, they’re also searching for backup environments and data. If they find them unsecured they’ll encrypt that too, hampering recovery efforts. In fact, some attacks – such as 2021’s REvil attack on Kaseya – target backup systems first to ensure that backups will be useless after the malware scrambles production data.

According to Veeam’s 2023 Ransomware Trends Report, 93 percent of cyber attacks last year targeted backup storage to force ransom payments. Attackers successfully stopped victims’ recovery in three quarters of those cases, said the company, which specializes in backup and recovery software and services.

Companies are aware of the problem and are looking for help. The ESG study, which surveyed over 600 organizations, found nearly nine in 10 were concerned that their backups have become ransomware targets.

“Government cybersecurity agencies now tell businesses that they should plan on when, rather than if, they’re breached,” points out Eric Schott, chief product officer at Object First.

Started by Veeam’s founders, Object First is on the front line of the battle to protect backup data with its immutable backup storage appliances. “We understand backups are an early target for recon and subsequent attack,” says Schott.

Object First designed its out-of-the-box immutability (Ootbi) backup storage to integrate with Veeam’s backup software. The immutable storage feature prevents data tampering, even if attackers were to gain access to the object storage buckets or appliance administration.

Zero trust data resilience

Employing immutable storage is part of a strategy that Object First and Veeam developed based on the Zero Trust Maturity Model. This framework, which the U.S. Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency (CISA) introduced in September 2021, follows a gradual 15-year development of zero-trust principles that use the ‘trust no one’ approach to cybersecurity.

Zero Trust focuses on stopping people from compromising systems after they breach initial defenses. At its core is the assumption that you’re already breached (or will be at some point in the future).

“We view system hardening as important, but it is not the same as Zero Trust,” says Schott, explaining why the company chose this approach as a foundational part of its system design.

The Object First and Veeam framework building on that model is Zero Trust Data Resilience (ZTDR). It contains several principles. One is the use of least-privilege access to backup infrastructure; others include end-to-end system visibility and threat intelligence to protect systems from attack, along with automated recovery plans if an attack does occur.

Another important principle is segmentation, which divides the backup infrastructure into distinct, isolated ‘resilience zones’, each with its own security controls and policies. This minimizes the attack surface and limits the impact of a single hardware or software compromise.

When applied to backup infrastructure, this multi-layered security approach ensures that a breach in one zone does not compromise the ability to recover the zone, and does not compromise the entire backup infrastructure. For example, primary and secondary backup storage can be placed in separate zones to enhance resilience.

Object First has also used this principle to segment its backup hardware from backup software. This makes it harder for an attacker to move laterally to the backup storage.

“Object First’s appliance is a single-function device, so it is also easier to manage and secure,” says Schott. “It makes things simpler for smaller organizations to deploy without security specialists or dedicated IT staff and improves operations in large organizations by reducing administrator overhead.”

Divide and conquer, encrypt and protect

What happens if an attacker does reach Object First’s hardware? This is where Zero Trust principles come into play. Object First’s Ootbi (out-of-the-box immutability) appliance is built to ensure that backup data cannot be modified or deleted once it is written. “It’s crucial for protecting data from ransomware attacks and other cyber threats,” Schott adds.

To achieve immutability, Object First based Ootbi on the S3 storage protocol. This includes a feature called Object Lock, which uses a write once, read many (WORM) approach to ensure that written data cannot be modified or deleted after the fact. Users control the time limit on immutability using retention periods in Veeam, and they can apply legal holds to prevent deletion or modification of data until the hold is removed.
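For illustration, here is a minimal sketch of how a backup writer could use S3 Object Lock through Python’s boto3 against an S3-compatible target. The endpoint, bucket, key names, and 30-day retention period are placeholders rather than Ootbi or Veeam specifics, and the bucket is assumed to have been created with Object Lock enabled.

# Minimal sketch of S3 Object Lock WORM retention using boto3 against an
# S3-compatible endpoint. Bucket, key, endpoint, and retention period are
# illustrative; the bucket must have Object Lock enabled at creation time.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3", endpoint_url="https://backup-target.example.internal")

retain_until = datetime.now(timezone.utc) + timedelta(days=30)

# Write a backup object that cannot be modified or deleted until the
# retention date passes, even by an account with full credentials.
s3.put_object(
    Bucket="veeam-backups",
    Key="job-2024-06-21/restore-point.vbk",
    Body=open("restore-point.vbk", "rb"),
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=retain_until,
)

# A legal hold blocks deletion independently of the retention date.
s3.put_object_legal_hold(
    Bucket="veeam-backups",
    Key="job-2024-06-21/restore-point.vbk",
    LegalHold={"Status": "ON"},
)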

Immutability means that even total system compromise won’t enable hackers to delete or scramble your data. “Even if you have full admin credentials and access to every bucket secret, you can’t destroy immutable data,” Schott says.

A hacker with physical access could conceivably take a hammer to the appliance if they want to destroy the data, but that’s where the 3-2-1 backup approach recommended by Veeam and Object First is important. It involves keeping at least three copies of your data, storing them on at least two different types of media, and having one copy stored offsite or in the cloud.

Immutability was a key driver for managed IT service provider Waident Technology Solutions, which tested multiple products before settling on Ootbi to support its customers. This gave the company an on-site primary backup solution that it could combine with off-site backups in the U.S. and Europe.

Scale and grow

Object storage’s architecture provides an optimal platform for backup workflows because it’s not bound by the size limitations associated with file and block storage. It uniquely separates data from metadata, storing each as discrete objects. This architecture allows it to easily scale on demand to accommodate large amounts of data, addressing the needs of modern businesses dealing with swelling data volumes.

Conversely, file and block storage is constrained by hierarchical structures or fixed capacity limits. People wanting to scale block-based storage architectures typically build smaller systems and manage them individually, introducing more management complexity and overhead. Object First joins multiple storage units into a single cluster, allowing scalability and load balancing without shared storage hardware or a single distributed database for metadata. This provides on-premises scaling and performance without burdening the administrator with managing separate storage systems.

Object storage is well suited for cloud environments. Its focus on individual data objects supports the distributed, often multi-regional nature of cloud resources in a way that is more difficult for file and block architectures.

One company that relied on this immutability in the cloud was SaaS-based legal practice management company Centerbase. Legal companies are top targets for ransomware because of the sensitive data they hold about their clients.

The company used Ootbi storage for its immutability and Veeam integration. It felt this combination could reduce its Recovery Time Objective (RTO) and Recovery Point Objective (RPO) metrics, helping it to get back up and running more quickly in the event of an attack. After installing Ootbi, it slashed its RPO by 50 percent from eight hours to four, while also improving backup speeds, it reported.

End-to-end encryption excludes exfiltration

Out of the box immutability protects data from malicious encryption or deletion, but that’s not all that ransomware attackers want to do. They increasingly want to steal data, threatening to publish it unless victims pay up. To protect customers from that, Object First relies on another capability in the backup software from Veeam: end-to-end encryption.

Veeam’s end-to-end encryption ensures that all data sent into the backup storage is encrypted, providing an additional layer of protection against data exfiltration. By encrypting data at all locations within the 3-2-1 backup environment, Veeam makes it impossible for attackers to read sensitive data in the highly unlikely event that they’re able to reach it at all.
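As a conceptual sketch only – this illustrates the general AES-GCM pattern, not Veeam’s actual implementation – the snippet below shows how a backup block can be encrypted before it is written to object storage, so that any exfiltrated copy is unreadable without the key, which in practice would live in a KMS rather than in code.

# Conceptual sketch of encrypting a backup block before it is written to
# object storage, so exfiltrated data is unreadable. This shows the general
# AES-GCM pattern only; it is not Veeam's implementation.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in practice held in a KMS, not in code
aesgcm = AESGCM(key)

def encrypt_block(plaintext: bytes) -> bytes:
    nonce = os.urandom(12)                 # unique nonce per block
    return nonce + aesgcm.encrypt(nonce, plaintext, None)

def decrypt_block(blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return aesgcm.decrypt(nonce, ciphertext, None)

sealed = encrypt_block(b"backup block contents")
assert decrypt_block(sealed) == b"backup block contents"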

The Veeam encryption keys can be securely stored within Veeam servers, or within external Key Management Services (KMS) including those stored in the cloud.

Having both on-site and off-site backups with immutable storage and Veeam’s encryption enables busy admins to enforce the same set of operations across both domains for maximum security without complex configuration, Schott explains.

“This level of protection provides a strong deterrent against ransomware attacks, safeguarding businesses and enabling continuity in operations,” he says.

In the face of rising ransomware threats targeting backup data, the combination of Veeam’s end-to-end encryption and Object First’s immutable storage provides an advanced line of defense. To develop an easy approach to zero-trust backup deployment, Object First did a pretty good job of thinking outside the box.

Sponsored by Object First.

The Synnovis cyber attack, glaring neglect, and business discontinuity

Comment. The Synnovis pathology lab malware attack on June 3 has resulted in delayed hospital treatments and operations in London, UK, with the Qilin ransomware gang now publishing stolen information. The affected Synlab business has been attacked twice before by malware gangs, and cyber security specialists are asking questions about why it hasn’t protected its systems more effectively against such attacks.

So far the attack has caused more than 2,194 outpatient appointments and 1,134 elective procedures (operations) to be postponed at the UK King’s College Hospital NHS Foundation Trust and Guy’s and St Thomas’ NHS Foundation Trust – 1,184 of which were for cancer treatment – because of the inability to get blood analysis results from the affected Synlab facility.

The Synlab pathology operations are mission-critical to the hospital trusts, yet the effects of the attack – an NHS statement notes “disruption from the cyber incident will be felt over coming months” – show that there was no effective business continuity or disaster recovery procedure in place for the affected IT systems.

What’s been done?

A Synnovis spokesperson told us: “Synlab constantly improves security measures and emergency processes as they are vital components in responding to and mitigating cyber attacks on essential healthcare providers.

“Furthermore, Synlab follows a ‘zero trust’ approach to cyber security and continuously invests in the security of its IT systems and processes as well as the awareness of employees to protect its infrastructure and data.

“We have taken several steps to further secure our infrastructure and implement operational mitigations for partners. These have included but are not limited to:

  • Standing up new datacenter infrastructure
  • Resetting all service platform passwords and expiring MFA tokens.”

This is after-the-event activity. Why was it not done before? Why was it not done after Synlab France was attacked in June last year, and Synlab Italy in April this year?

A rude awakening

Dmitry Sotnikov.

Columbus, Ohio-based Cayosoft is a software developer focused on security, efficiency, and compliance for Active Directory (AD), Azure Active Directory (Azure AD), and Microsoft 365 customers. Chief product officer Dmitry Sotnikov told us: “Unfortunately, the situation with Synnovis and the affected hospitals is not unique, and healthcare organizations are experiencing a rude awakening. In the past, many assumed that hackers – those who were state-affiliated – mostly targeted commercial enterprises and generally avoided sensitive industries such as healthcare. However, the war in Ukraine radically changed cyber security and left no industry immune. The recent attacks on Change Healthcare and Ascension in the US are stark reminders that massive ransomware attacks can now put numerous lives in danger. 

“The healthcare industry needs to review its priorities and urgently invest in IT security. It is responsible for the privacy and lives of its patients.

“Luckily, the attacks that we are seeing are not tailored to healthcare targets at all. The criminals are using the same general-purpose attack methods in healthcare as they do with other industries: attacking via phishing or credentials, and then using Active Directory to spread laterally and elevate privileges. Cyber defense solutions exist to mitigate the threat of ransomware – MFA, threat detection, monitoring, recovery, etc. – but the sector needs to start applying them consistently.”

Contingency plans

An NHS document dating from the 2017 WannaCry attack states: “Organizations should have plans in place to detect and eliminate malware within their systems. These plans should include measures to minimize the impact of a security breach and to expedite the organization’s response. Organizations should adopt a ‘defence-in-depth’ approach, using multiple layers of defence with various mitigation techniques at each layer to detect malware and prevent it from causing significant harm.”

Further: “All NHS organizations must have business continuity plans in place so that they can maintain their services to the public and patients in the event of both large and small incidents.”

Because the Synlab June 3 attack has had such devastating consequences, and recovery will take several months, one might conclude the Synnovis partnership may not have followed this advice.

A combined Guy’s Hospital Foundation Trust and St Thomas’ Hospital Foundation Trust document refers to their claimed responsible attitude to cyber security, at least concerning the Synnovis EPIC software-powered pathology laboratory information management system: 

As the June 3 attack and its ongoing effects seem to demonstrate, though, Synnovis did not have “strong procedures in place to detect and eliminate malware” within its systems. You’d be forgiven for thinking lip service was paid to cyber security instead of erecting and maintaining effective malware attack defenses. The current business discontinuity was the result of this inadequate provision.

Leaving the door open

Richard May.

Richard May, CEO at UK-based specialist Cloud Service Provider, virtualDCS, told us: “As a specialist in data recovery, I am frequently disheartened by the delayed and often lackluster approach many organizations take toward disaster recovery (DR) and backup solutions. Decisions that should be made with urgency often languish for over six months, leaving critical data vulnerable. If an individual left their front door wide open, they would undoubtedly rush to secure it. It is expected that organizations handle others’ information with the same level of care and expedience as they would their personal belongings.

“The situation with Synlab is a glaring example of neglect in both risk prevention and mitigation. Recent reports suggest that restoring their services could take several months, highlighting a severe failure in its data protection strategy. This delay demonstrates negligent complacency and a blatant disregard for safeguarding mission-critical systems. Clearly, data assets were neither properly identified nor protected, and there appears to be no effective DR ‘playbook’ in place to facilitate a swift recovery.

“This incident should serve as a wake-up call for all organizations to prioritize the implementation of robust DR and backup solutions immediately. The protection of data should be approached with the utmost urgency and diligence, ensuring that comprehensive plans and technologies are in place to mitigate risks and ensure rapid recovery from any disruptions.”

Check out this specific UK NHS website to read updates about the attack and the progress of the recovery from it.

GigaOm identifies unstructured data management leaders

Data reflected in a man's glasses.

GigaOm has six suppliers competing for the leadership position in its 2024 unstructured data management (UDM) report.

Analyst Walt Whitman writes: “UDM solutions go beyond mere data storage, empowering a range of professionals to extract vital value from their previously untapped data assets. Through a powerful set of features, UDM transforms once-amorphous data into readily searchable and interpretable assets, empowering informed decision-making across the organization.” UDM offerings, covering both file and object data management, include metadata management, user-defined tags, and rapid retrieval from embedded search engines.

Previous UDM Radar reports separated infrastructure and business use cases, he says. “The infrastructure report delved into the nuts and bolts of UDM solutions, exploring aspects like data tiering, lifecycle management, and search functions. The business report, in contrast, examined how UDM tackles challenges like compliance, security, and big data analytics.” 

The latest UDM report has a crowded Radar diagram, with nine leaders, 12 challengers, and three entrants. They are combined into three groups: Mature platform players, innovative platform players, and innovative feature players. The analyst evaluates and scores each supplier in three tables, looking at key features, emerging features, and business criteria. Each of these three looks at several individually rated characteristics.

GigaOm Radar stats

The suppliers are then located in one of four quadrants of the radar diagram – see bootnote below – to provide a fuller picture:

GigaOm Radar

Note that NetApp has recently deprecated (removed) features in its BlueXP Classification offering, such as third-party storage support and the ability to move, copy, and delete source files. BlueXP Classification was the NetApp product focus for GigaOm’s analyst, and the deprecated feature set is not reflected in the GigaOm report.

For comparison with the 2024 GigaOm report, here is the 2023 infrastructure version of this chart:

GigaOm Radar

The 12 infrastructure vendors assessed in 2023 have become a larger group of 24 in 2024. We can also see that Panzura’s positioning has abruptly changed from being a slightly innovative feature new entrant in 2023 to a balanced innovative platform challenger in 2024. Datadobi, Druva, and Komprise have gone backward, with the latter also changing from a moderately innovative platform player to a quite mature platform player in 2024.

The dozen new suppliers assessed in 2024, compared to the 2023 infrastructure report, are Aparavi, BigID, Google with Elastifile, Hammerspace, HPE, IBM, Nasuni, Pure Storage, Quantum, Qumulo, Scality, and Varonis.

You will have to obtain the report from GigaOm to find out more. There is a free account feature which may provide access.

Bootnote

The GigaOm Radar diagram locates vendors across three concentric rings, with those set closer to the center judged to be of higher overall value. The new entrants are in the outside ring, challengers in the next inner ring, and leaders in the one closer to the center after that. The central bullseye area is kept clear and positioned as a theoretical target that is always out of reach.

The chart characterizes each vendor on two axes – balancing Maturity versus Innovation and Feature Play versus Platform Play – while providing an arrow that projects each solution’s rate of evolution over the coming 12 to 18 months: forward mover, fast mover, or outperformer.

GigaOm says mature players emphasize stability and continuity and may be slower to innovate. Innovative suppliers are flexible and responsive to market needs. Feature players provide specific functionality and use case support while perhaps lacking broad capabilities. Platform players have broad capability and use case support with, possibly, more complexity as a result.

OpenAI acquires Rockset for vector database capabilities

GenAI LLM and chatbot pioneer OpenAI has acquired Rockset, an eight-year-old vector database business.

OpenAI was started up in 2015 by Sam Altman, Greg Brockman, Reid Hoffman, Jessica Livingston, Elon Musk, Peter Thiel, and others with $1 billion in funding. It has had a checkered, eventful, and spectacular history with its ChatGPT chatbot revolutionizing the speed and accuracy of large language models (LLMs). OpenAI is now engaged in a chatbot race with Google, Anthropic, X (Twitter as was), Meta, and others to develop an AGI (artificial general intelligence) capability. 

Sam Altman

OpenAI wants to build an AI that counters hallucinations, gets faster, and has generic appeal. Altman says: “We can imagine a world where all of us have access to help with almost any cognitive task, providing a great force multiplier for human ingenuity and creativity.”

These LLMs depend upon a semantic search of chunks (tokens) of information represented as vectors and stored in vector databases. Rockset has developed and is extending its own vector database technology, with better indexing, retrieval, and ranking capabilities. It can index vectors as well as text, document, geo, and time series data, and is focused on hybrid search at scale.
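To illustrate the idea of hybrid search – combining a structured metadata filter with vector similarity ranking – here is a toy numpy sketch. It is a generic illustration of the concept, not Rockset’s API or SQL dialect.

# Toy illustration of hybrid search: filter on structured metadata, then
# rank the remaining candidates by vector similarity. Generic numpy sketch,
# not Rockset's API.
import numpy as np

documents = [
    {"id": 1, "region": "EU", "embedding": np.array([0.9, 0.1, 0.0])},
    {"id": 2, "region": "US", "embedding": np.array([0.8, 0.2, 0.1])},
    {"id": 3, "region": "EU", "embedding": np.array([0.1, 0.9, 0.3])},
]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.85, 0.15, 0.05])

# Metadata filter first, then similarity ranking over the survivors.
candidates = [d for d in documents if d["region"] == "EU"]
ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
print([d["id"] for d in ranked])  # -> [1, 3]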

An OpenAI blog says the company has “acquired Rockset, a leading real-time analytics database that provides world-class data indexing and querying capabilities … We will integrate Rockset’s technology to power our retrieval infrastructure across products, and members of Rockset’s world-class team will join OpenAI.”

Rockset co-founder and CEO Venkat Venkataramani blogged: “I’m excited to share that OpenAI has completed the acquisition of Rockset. We are thrilled to join the OpenAI team and bring our technology and expertise to building safe and beneficial AGI.” 

Venkat Venkataramani

He says: “Rapid advancements in LLMs are enabling a Cambrian explosion and numerous innovations across every industry, driving a preponderance of AI applications. While the nature of these applications has changed, the underlying infrastructure challenges have not. Advanced retrieval infrastructure like Rockset will make AI apps more powerful and useful. With this acquisition, what we’ve developed over the years will help make AI accessible to all in a safe and beneficial way.”

This is a technology and acqui-hire, with Venkataramani writing: “We’ll be helping OpenAI solve the hard database problems that AI apps face at massive scale.”

Rockset has attracted $105.5 million in funding across seed ($3 million), A ($18.5 million), B ($40 million), and extended B ($37 million) rounds and debt financing ($7 million). The acquisition price was not disclosed. According to Crunchbase, OpenAI has accumulated a gigantic $11.3 billion in funding from investors including Tiger Global, Sequoia, Andreessen Horowitz, and Microsoft. It can afford to splash cash on Rockset.

Other LLM developers such as Anthropic, Databricks (DBRX), Google (Gemini), Meta (Llama-3), and Mistral may now be looking at vector database suppliers – think Nuclia, Pinecone, and Zilliz – to ensure their LLMs can keep up with OpenAI’s GPT-4o.

Rockset said: “Existing Rockset customers will experience no immediate change. We will gradually transition current customers off Rockset and are committed to ensuring a smooth process.” Customers can check out the FAQ here. 

Datadog helps detect ‘problem’ Spark and Databricks jobs

Cloud monitoring and security firm Datadog has introduced Data Jobs Monitoring, which allows teams to detect problematic Spark and Databricks jobs anywhere in their data pipelines. It also allows them to remediate failed and long-running jobs faster, and to optimize over-provisioned compute resources to reduce costs, promises the provider.

Matt Camilli.

Jobs Monitoring is said to immediately surface specific jobs that need optimization and reliability improvements, while enabling teams to drill down into job execution traces so that they can correlate their job telemetry to their cloud infrastructure for “fast debugging.”

On the technology, Matt Camilli, head of engineering at Rhythm Energy, said: “My team is able to resolve our Databricks job failures 20 percent faster, because of how easy it is to set up real-time alerting and find the root cause of the failing job.”

“When data pipelines fail, data quality is impacted, which can hurt stakeholder trust and slow down decision making,” added Michael Whetten, VP of product at Datadog. “Data Jobs Monitoring gives data platform engineers full visibility into their largest, most expensive jobs, to help them improve data quality, optimize their pipelines and prioritize cost savings.”

Michael Whetten.

Out-of-the-box alerts immediately notify teams when jobs have failed or are running beyond automatically detected baselines, so this can be addressed before there are negative impacts to the end user experience. And recommended filters in Jobs Monitoring surface the most important issues that are impacting job and cluster health, so that they can be prioritized.

In addition, detailed trace views show teams exactly where a job failed in its execution flow, so they have the full context for faster troubleshooting. Also, multiple job runs can be compared to one another to expedite root cause analysis, and identify trends and changes in run duration, Spark performance metrics, cluster utilization and configuration.

Finally, resource utilization and Spark application metrics help teams identify ways to lower compute costs for over-provisioned clusters and optimize inefficient job runs.

A Gartner magic quadrant named the leading observability and APM vendors in 2023 as Dynatrace, Datadog, New Relic, Splunk, and Honeycomb. There were 14 other vendors mentioned in the MQ.

WD mixes TLC and QLC NAND in SN5000 gumstick product line

Western Digital’s Blue SN5000 gumstick SSD updates the existing SN580 TLC NAND product and adds a 4TB capacity variant by using QLC flash with a newer process technology.

The Blue SN580 is an M.2 2280 format PCIe 4 drive with 250GB, 500GB, 1TB, and 2TB capacity levels built using BiCS5 112-layer 3D NAND in a 3 bits/cell (TLC) process, and sold as a consumer drive, not OEM. Its companion SN570 OEM drive, with the same capacity levels but using PCIe 3 rather than PCIe 4, used the same BiCS5 NAND and was replaced by the SN5000S.

The SN5000S used BiCS6 162-layer NAND organized in QLC (4 bits/cell) fashion, with 512GB, 1TB, and 2TB capacity points. WD has not taken the same approach in the SN580 to SN5000 transition, opting for TLC BiCS5 flash at the 500GB, 1TB, and 2TB capacity points but BiCS6 162-layer QLC NAND for the upper 4TB capacity point.

This means that the SN5000 line has two controllers, one for the three BiCS5 TLC variants and a second for the 4TB BiCS6 QLC product. Both SN5000 variants are PCIe 4 drives, like the SN580.

A table summarizes the drive characteristics and maximum performance levels: 

This simple picture is deceptive where the SN5000 is concerned, though. A look at the detailed performance numbers from the datasheet shows how the SN5000 model’s performance varies with capacity and NAND type:

The random read IOPS start at 460,000 for the 500GB model, peak at 730,000 for the 1TB product, then decline to 650,000 at 2TB before rising to 690,000 at 4TB with the TLC-to-QLC and BiCS5-to-BiCS6 changeover. The random write IOPS also change with the switch, from 770,000 to a much higher 900,000 with the 4TB model.

The sequential write bandwidth also shows an inconsistent pattern: 4GB/sec at 500GB; 4.9GB/sec at 1TB; 4.85GB/sec at 2TB; and 5GB/sec at 4TB. Sequential read bandwidth is less variable and increases slightly with capacity, progressing from 5GB/sec to 5.5GB/sec.

Endurance in terabytes written terms increases in a simple way with capacity: 500GB – 150TBW; 1TB – 300TBW; 2TB – 600TBW; and 4TB – 1,200TBW.
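As a rough check on those figures, converting TBW into drive writes per day (DWPD) – assuming WD’s usual five-year warranty term for Blue drives, which is our assumption here rather than a datasheet quote – shows the endurance scaling linearly with capacity:

# Rough drive-writes-per-day (DWPD) calculation from the quoted TBW figures,
# assuming a five-year warranty period (an assumption, not a datasheet value).
WARRANTY_DAYS = 5 * 365

for capacity_tb, tbw in [(0.5, 150), (1, 300), (2, 600), (4, 1200)]:
    dwpd = tbw / (capacity_tb * WARRANTY_DAYS)
    print(f"{capacity_tb} TB: {dwpd:.2f} DWPD")
# Each model works out to roughly 0.16 DWPD, i.e. endurance scales
# linearly with capacity rather than with the NAND type.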

WD has set Blue SN5000 prices at: 500GB – $69.99, 1TB – $79.99, 2TB – $139.99 and 4TB – $279.99. Download a datasheet here.

Next DLP improves data protection using origin and movement

Insider risk and data protection biz Next DLP has unveiled its Secure Data Flow technology intended to improve protection for customers. Part of the supplier’s Reveal Platform, Secure Data Flow takes into account the origin, movement and modification of data to widen protection.

The tech, we’re told, can be used to secure the flow of critical business data from any SaaS application, including Salesforce, Workday, SAP, and GitHub, helping to prevent accidental loss and malicious theft.

John Stringer.

“In current IT environments, intellectual property commonly resides in an organization’s SaaS applications and cloud data stores,” said John Stringer, head of product at Next DLP. “The risk here is that high-impact data in these locations cannot be easily identified based on its content. Secure Data Flow, Reveal ensures firms can… protect their most critical data assets with confidence, regardless of their location or application.”

Next DLP claims legacy data protection technologies are “falling short”. It says they rely heavily on pattern matching, regular expressions, keywords, user-applied tags, and fingerprinting, which “can only cover a limited range of text-based data types”.

It adds that recent studies show employees download an average of 30 GB of data each month from SaaS applications to their endpoints, including mobile phones, laptops, and desktops, which underscores the need for advanced data protection measures.

By tracking data as it flows to sanctioned and unsanctioned channels within an organization, Secure Data Flow can “prevent data theft and misuse effectively”, through complementing traditional content and sensitivity classification-based approaches with origin-based data identification, manipulation detection, and data egress controls.

This results in an “all-encompassing, 100 percent effective, false-positive-free solution that simplifies the lives of security analysts,” claims Next DLP.

“Secure Data Flow is a novel approach to data protection and insider risk management,” said Ken Buckler, research director at analyst house Enterprise Management Associates. “It not only boosts detection and protection capabilities, but also streamlines the overall data management process, enhancing the fidelity of data sensitivity recognition and minimizing endpoint content inspection costs in today’s diverse technological environments.”

IBM unveils uStore for faster remote data access using NVMe-oF in Storage Scale

IBM is planning a “uStore” for Storage Scale to get data from remote drives faster using NVMe over Fabrics.

Update: Added that Frank Kraemer’s quote comes from a LinkedIn post, 26 June 2024. Invalid IBM use of Cloudian’s HyperStore brand and its cessation noted, 30 August 2024.

Storage Scale is the latest incarnation of IBM’s venerable GPFS (General Parallel File System), which speeds file reads and writes by having file system nodes (servers) operate in parallel. NVMe over Fabrics (NVMe-oF) is a protocol that extends NVMe block-level drive access across network links – TCP/IP, RDMA over Ethernet or InfiniBand, and Fibre Channel – so that accessing servers can treat remote drives much like locally attached ones.

IBM IT architect Frank Kraemer, via a LinkedIn post, thinks that the ideas expressed by Tom Lyon in the “NFS must die” article are “pretty cool” and says: “We have plans that go into a similar direction with using NVMe-oF for speed but we’ll still keep the classic way of file system interface and Erasure Coding (GPFS Native Raid – GNR) for ease of use and safe operations.” 

These plans center on a planned Storage Scale uStore feature, which will provide an NVMe-oF performance pool.

An IBM presentation, IBM Vendor Update – Storage, at the HPC User forum mentioned the concept. Presenter Chris Maestas, Chief Architect for Storage File and Object Systems in IBM’s Data and AI Storage Solutions unit, said that data is everywhere in a hybrid and multi-cloud world, and compute, both CPU and GPU-based, wants to access remote data as if it were local.

Update: The IBM presentation slides mentioned a “HyperStore” codename for the uStore concept and this was invalid as HyperStore is a trademarked Cloudian brand for its object storage. The IBM use of the Hyperstore codename has been discontinued. August 2024.

Storage Scale generally enables that to happen by providing storage access, abstraction with a single global namespace, and acceleration, as a slide illustrates:

IBM Storage Scale slide
Note: HyperStore is a Cloudian brand and its use by IBM is invalid and has since been retracted.

With reference to AI workloads and GPUs, he said admins could have remote data effectively run closer to the compute, emulating local storage on the GPU compute nodes with NVMe-oF. This principle was demonstrated at the SC22 event using IBM’s ESS 3500 with SSDs, delivering more than 10 million IOPS and hundreds of GBps to accessing compute clients. The system used an integrated extreme high IOPS storage pool.

This led to the uStore development:

IBM Hyperstore slide
Note: HyperStore is a Cloudian brand and its use by IBM is invalid and has since been retracted.

The uStore concept was revealed to the Spectrum Scale User Group last week in London. It is a tiered system with Storage Scale providing an intermediate reliable pool of storage using GNR (native declustered RAID), the performance pool, and local drives on the client compute nodes. 

These access the reliable storage pool using Network Shared Disks (NSD), a logical grouping of storage disks in a network on file storage systems. Storage Scale stripes files across NSD servers, which store the stripes as blocks. Accessing clients do real-time parallel IO to the NSD servers.

The performance pool drives, a subset of the reliable pool’s drives, are accessed using NVMe-oF, which is faster. There is a shared cache across all the compute nodes.

More uStore details will be revealed in coming months.

Actian unveils ‘faster and smarter’ database for edge working

Actian has updated its embedded database to help businesses run “faster and smarter” applications at the edge.

Actian, the data and analytics division of HCLSoftware, has released Actian Zen 16.0, which is designed for real-time data processing across mobile networks, IoT devices, edge gateways, and complex machinery.

It’s a key market to address as analyst house IDC reckons edge computing will account for $232 billion in spending this year.

Zen 16.0 simplifies and optimizes edge computing for resource-constrained environments ranging from industrial IoT and connected healthcare to smart cities. The database now introduces performance enhancements and new features designed to improve efficiency and functionality for the 13,000-plus organizations already using Zen, and those Actian now expects to come on board.

The latest version addresses the need to support high-performance intelligent applications with “minimal administration,” particularly for frequent data update use cases like sensor data collection to monitor patient well-being, or asset management tracking using RFID scanners.

Zen 16.0 ensures data synchronization from the edge to cloud, supports both SQL and NoSQL data access, and uses popular programming languages to help developers build low-latency embedded applications.

It includes improved L2 cache sizing, page pre-load for large data files, Kafka data stream support, EasySync (a new datasync utility), enhanced JSON support, and support for Btrieve Python. There is also Docker and Kubernetes container support, and extended index key length.

Emma McGrattan, senior vice president of engineering and product at Actian, added: “Zen 16’s secure and scalable design allows for easy data synchronization with Zero-ETL, making it perfect for developers creating intelligent applications that can deliver real-time decisioning from the edge to the cloud, to give businesses a competitive advantage.”

“Zen continues to deliver exactly what we need and we’re enthusiastic about the new capabilities of Zen 16.0 to empower our business operations even further,” said Trent Maynard, director of product and engineering at customer Global Shop Solutions.

Storage news ticker – June 21

Anomalo has expanded its platform that monitors the quality of structured data in data warehouses and data lakes to monitor unstructured text. We’re told this makes it possible for enterprises to discover, curate, use, and ingest high volumes of text data without the risk of using low quality data, which is helpful for GenAI applications. Unstructured text documents can be organized and evaluated for data quality around various documents and document collection characteristics, including length, duplicates, topics, tone, language, abusive language, PII, and sentiment. Users can evaluate the quality of a document collection and identify issues in individual documents, reducing the time needed to organize and use unstructured text data. The feature is currently in private beta.

Databricks has launched its 2024 State of Data + AI Report. Featuring data from its 10,000-plus customers across the globe, this research provides an in-depth look into how organizations across industries are approaching AI. Key stats include:

  • Across all organizations, 1,018 percent more models were registered for production this year compared to last year.
  • The use of vector databases to enable the customization of AI models grew 377 percent in the last year.
  • Across both Llama and Mistral users, 77 percent choose models that are 13B parameters or smaller.

More information can be found here.

Databricks and Informatica have an expanded partnership bringing together Informatica’s AI-powered Intelligent Data Management Cloud (IDMC) capabilities within the Databricks Data Intelligence Platform. This will enable customers to deploy enterprise-grade GenAI applications at scale, based on a foundation of high-quality, trusted data and metadata. The expanded partnership includes four new capabilities: 

  • GenAI solution blueprint for Databricks DBRX  
  • Native Databricks SQL ELT 
  • Cloud data integration-free service (CDI-Free) on Databricks Partner Connect 
  • Full IDMC support via Unity Catalog

The DNA Storage Alliance has announced new governing board members Biomemory, Entegris, and Imagene to join Catalog, Quantum, Twist Bioscience, and Western Digital. Biomemory is a pure play DNA data storage startup that aims to develop end-to-end DNA data storage systems. Entegris is a leading supplier of advanced materials and critical process solutions for the semiconductor, life sciences, and other high-tech industries. Imagene has more than 20 years of expertise in room temperature storage and stability assessment of nucleic acids and bio-specimens.

Document relational database company Fauna is launching new schema features to solve a problem faced by developers building GenAI, edge, and IoT applications: Ensuring data consistency and security at scale in document databases. With these new features, Fauna enables developers to define their database schema in their application code, combining the flexibility of document databases with the data integrity of relational databases – something it claims MongoDB and DynamoDB can’t deliver.

The FMS (Future of Memory and Storage) 2024 show has opened nominations for its Best of Show Awards across various categories including:

  • Most Innovative Sustainability Technology
  • Most Innovative Artificial Intelligence (AI) Application
  • Most Innovative Hyperscaler Implementation
  • Most Innovative Customer Implementation
  • Most Innovative Startup Company
  • Most Innovative Technology
  • Most Innovative Consumer Application
  • Most Innovative Enterprise Business Application

Nominations are due by 6pm PDT on June 21 and may be completed online here. FMS will be held August 6-8 at the Santa Clara Convention Center. The FMS24 BOS Awards Ceremony is August 6, from 5:30 to 7pm in the Exhibit Hall FMS Theater. Winners will be announced and awards given at that time. Register for the event here.

In-memory streaming data supplier Hazelcast has hired distributed compute specialist Anthony Griffin as chief architect. He will focus on applying his serverless compute and financial systems skills to evolve Hazelcast’s unified real-time data platform with a focus on mission-critical and AI applications. Griffin served most recently as a senior engineering leader at AWS Lambda, an event-driven, serverless compute platform provided by Amazon.

We’re told IBM has announced the latest version of the Virtualize software for FlashSystem and SVC. Version 8.7.0 brings a raft of major and minor enhancements and the instantiation of a scalable storage platform using Flash Grid. Flash Grid is a scalable storage platform comprising multiple FlashSystem or SVC systems with federated management, AI-powered data placement recommendations, and flexible deployment options. v8.7.0 features:

  • Flash Grid – scale out up to eight FlashSystem or SVC systems to be managed as one including non-disruptive workload mobility between members of the grid.
  • New topology designs for Policy-based High Availability (PB-HA) including fully active-active pathing models and non-uniform host configurations. Minor additions with further OS support and clustering using SCSI-PR.
  • Up to two RDMA-based Ethernet partnerships between systems.
  • Pauseless Volume Group Snapshot (VGS) triggering. Default grain size changes for VGS and the ability to convert ThinClones into full Clones
  • Reclaim of unmapped space in Standard Pools (using VDM under the covers)
  • Auto FCM firmware update of candidate drives
  • Auto download of Security Patches from Fix Central
  • Some minor tweaks:
    • VMware discovers volumes as “Flash” without manual changes at ESXi
    • Volume Groups support multi-tenancy (ownership group assignments)
    • Improved vVol scalability and VASA service RAS
    • Full performance stats available via REST-API

India’s largest full-service stockbroking firm, Kotak Securities, is using Infinidat’s InfiniBox storage solution to support business growth, millions of time-sensitive customer transactions daily, and operational cost control initiatives. Kotak has a footprint of 175-plus branches, 1,300-plus franchisees and satellite offices across more than 370 cities in India. It adopted Infinidat’s flexible consumption “pay as you grow” approach, which allows the customer to only pay for the storage capacity they need and use. Read the full case study here.

Wedbush analyst Matt Bryson reckons Micron is expanding its HBM production in the US and considering extending a Malaysia test and assembly plant to build HBM chips. It wants to triple its HBM market share to 25 percent; the same as its DRAM market share. Currently, Nvidia buys nearly half of the global HBM output.

NetApp says its Spot by NetApp software has achieved the FinOps Certified Platform certification from the FinOps Foundation, validating the platform’s ability to provide the depth and breadth of capabilities organizations need to practice sound cloud financial management. Spot by NetApp has also expanded its certified platform with the general availability of its Cost Intelligence and Billing Engine solutions.

Cloud file services and collaboration biz Panzura has hired Petra Davidson as Global Head of Marketing. She is a returnee having been VP Customer Experience before she left in January 2023. CEO Dan Waldschmidt has also recruited Thomas Morelli from Cisco, after 2.5 years in the Office of the Chief Strategy Officer, to be his VP and Global Comms head.

Petra Davidson (left) and Thomas Morelli (right)

Redis did well in a vector database performance benchmark using the Qdrant framework. This is the first side-by-side comparison of the new, improved Redis Query Engine released June 20. Redis claims the top spot from Qdrant, and its tests show that Redis is faster for vector database workloads than any other vector database tested, at recall >= 0.98. Redis has 62 percent more throughput than the second-ranked database for lower-dimensional datasets (deep-image-96-angular) and 21 percent more throughput for high-dimensional datasets (dbpedia-openai-1M-angular). Read a Redis blog to find out more.
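For readers who want to try the query engine themselves, here is a minimal redis-py sketch of a KNN vector search. The index name, key prefix, and four-dimensional vectors are illustrative, and it assumes a local Redis instance with the search capability available.

# Minimal sketch of a KNN vector search using redis-py. Index name, key
# prefix, and vector dimensions are illustrative; assumes a local Redis
# instance with the query engine / search module available.
import numpy as np
import redis
from redis.commands.search.field import VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Create an HNSW vector index over hashes with the "doc:" prefix.
r.ft("docs").create_index(
    fields=[VectorField("embedding", "HNSW",
                        {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"})],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Store a few vectors as raw float32 bytes.
for i, vec in enumerate([[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6]]):
    r.hset(f"doc:{i}", mapping={"embedding": np.array(vec, dtype=np.float32).tobytes()})

# KNN query: return the 2 nearest neighbours to the query vector.
query = (Query("*=>[KNN 2 @embedding $vec AS score]")
         .sort_by("score").return_fields("score").dialect(2))
result = r.ft("docs").search(
    query, query_params={"vec": np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32).tobytes()}
)
print([doc.id for doc in result.docs])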

Redis benchmark chart

Object storage supplier Scality announced a large-scale deployment of its RING distributed file and object storage solution to optimize and accelerate the data lifecycle for high-throughput genomics sequencing laboratory SeqOIA Médecine Génomique. SeqOIA is one of two national laboratories integrating whole genome sequencing into the French healthcare system to benefit patients with rare diseases and cancer. Alban Lermine, IS and Bioinformatics Director of SeqOIA, said: “In collaboration with Scality, we have solved our analytics processing needs through a two-tier storage solution, with all-flash access of temporary hot datasets and long-term persistent storage in RING.”

Seagate is now selling refurbished disk drives. It has set up an official storefront on eBay as a direct channel for consumers to access factory-certified hard drives as part of the Seagate Circularity Program. Seagate says it’s committed to enabling the secure reuse of storage devices and reducing hard drive shredding. Hard drive shredding entails breaking down a hard drive into tiny pieces so that data cannot be recovered. As rare earth materials contained in those parts cannot be reused, hard drive shredding harms the environment and is not sustainable.

The SNIA STA forum (STA) announced completion of the 20th Serial Attached SCSI (SAS) Plugfest around 24G SAS. The plugfest brought together eight SAS equipment manufacturers in Austin, Texas, and was co-located with SNIA Regional SDC Austin for the first time. Test results were audited by an independent engineering consultant. The following companies attended the plugfest, underscoring a commitment to advancing SAS technology: AIC, Amphenol Corp, Broadcom, ConnPro, Kioxia, Microchip Technology, Samsung, and Teledyne LeCroy Corp.

Tiger Technology says Tiger Bridge subscriptions available on Azure Marketplace are enrolled in Microsoft Azure Consumption Commitment (MACC). This means that every dollar of a Tiger Bridge subscription purchased via Azure Marketplace counts toward a customer’s available MACC spend and can be conveniently tracked within its MCA or EA billing account.

Super-fast file system supplier WEKA has accumulated 113 patents to stop competitors using its technology. It has over 70 patent applications still pending. Co-founder and CEO Liran Zvibel said: “When my co-founders and I started WEKA a decade ago, we wanted to create a radically different approach to data management that was infinitely simpler and more customer-centric, which required a complete rethink of traditional data architectures and developing deeply differentiated intellectual property. Crossing this patent milestone validates our day-one vision to build a revolutionary new approach that eliminates the compromises and challenges of legacy data infrastructure to enable organizations to thrive in the AI era.”

DigiTimes reports that China’s YMTC NAND fabber has moved virtually all its production to its 232-layer fourth generation (Xtacking 3.0) process, and is working on its fifth generation with ~300 layers. This is despite US technology export restrictions. For reference, Micron is at the 232-layer level, Samsung at 236, SK hynix at 238, and Kioxia/WD at 218. Samsung is developing a 286-layer product and SK hynix’s gen 8 technology is 321 layers.