
Coronavirus outbreak: Which NAND and disk suppliers are at risk?


NAND and disk drive supplies could be reduced after China put entire cities into lockdown in response to the coronavirus outbreak.

So far some 6,000 people are known to have been infected in China and there have been 132 confirmed deaths. The numbers are expected to rise.

Wuhan, a city of 10 million people in Hubei province, is the epicentre of the outbreak and has been in lockdown since January 23, with public transport effectively suspended, the Lunar New Year holiday extended, and factories and offices closed.

Coronaviruses. Photo credit: CDC/Dr. Fred Murphy, via the Centers for Disease Control and Prevention’s Public Health Image Library (PHIL), image #4814.

Fifteen other Chinese cities have put similar restrictions in place, including Chibi, Dangyang, Enshi, Ezhou, Huanggang, Huangshi, Jingmen, Suizhou, Qianjiang, Xiangyang, Xianning, Xiantao, Xiaogan, Yichang, and Zhijiang.

This extensive set of restrictions will have knock-on effects on storage media supplies from factories in the affected areas.

Storage media suppliers at risk

A list of NAND and disk drive suppliers with operations in China includes:

  • Intel – Fab 68, making 3D NAND in Dalian,
  • Micron – manufacturing facility in Xi’an,
  • Samsung – operations in three Chinese locations:
    • Shaanxi Province – Samsung China Semiconductor,
    • Xi’an – F1x1 NAND wafer fab,
    • Suzhou – Samsung Suzhou Research Center (SSCR),
  • SK Hynix – facilities in Chongqing and Wuxi, where it has its HC1 and HC2 plants,
  • Tsinghua Unigroup – operations in Chengdu and Nanjing,
  • Yangtze Memory Technology Company (YMTC) – a plant in Wuhan,
  • Seagate – finished disk drive production in Wuxi,
  • Western Digital – its HGST unit has disk drive component manufacturing operations in Shenzhen and another facility in Shanghai.

Tsinghua Unigroup and YMTC are focussed on sales inside China. The others have integrated global supply chains whose output will be interrupted to some degree if their Chinese plants are shut down beyond the normal Lunar New Year holiday period.

They will be affected by direct plant closures and also by supply chain interruptions due to affected component suppliers in China.

Supply chain choke point risks

The coronavirus outbreak is still developing with the Chinese authorities reacting to its growth.

China’s Lunar New Year break started on January 25 and was due to last a week, finishing on Sunday, February 2. Chinese officials have extended it to February 6, and the city of Shanghai has gone further, extending it to February 9.

Businesses in Suzhou, a manufacturing hub in eastern China, are closed until at least February 8. This could well affect Samsung if its plant is within the shutdown area.

Affected storage media suppliers, as well as other technology companies, will be examining their Chinese manufacturing operations and supply chains to work out the effects on their product supply operations. Details of any media shortages will emerge over the next few days.

British Airways has suspended flights to China from the UK. Passengers from China are being screened for the virus at five US airports: New York’s John F. Kennedy International Airport, San Francisco International Airport, Los Angeles International Airport, Hartsfield-Jackson Atlanta International Airport, and Chicago O’Hare International Airport.

Shares in US technology companies on the S&P 500 declined 1.6 per cent on Monday, January 27, but recovered by 1 per cent on January 28. 

The BBC is reporting it could be another 10 days before the outbreak peaks.

Intel flags up delay for Optane 3D, gen 2 delivery slowdown

Intel’s latest annual report signposted a delay in bringing second generation Optane SSD and DIMM products to full availability.

The report states: “With our Intel 3D NAND technology and Intel Optane technology, we are developing products to disrupt the memory and storage hierarchy. The 4th generation of Intel-based SSDs are scheduled to launch in 2020 with 144-layer QLC memory technology.”

It continues: “The 2nd generation Intel Optane SSDs for data centres are scheduled to start shipping samples in 2020, and are designed to deliver three times the throughput while reducing application latency by four times. In addition, the second-generation Intel Optane DC persistent memory is expected to achieve PRQ in 2020, and is designed for use with our future Intel Xeon CPUs.”

Optane SSD

PRQ (Product Qualification Report) is an earlier stage in productisation than sample shipping. Therefore PRQ for gen 2 Optane DIMMs has not yet occurred, and neither has sample shipping.

The QLC (4 bits/cell) SSDs were mentioned back in September and their 2020 ship date has not been altered.

Barlow Pass and Alder Stream

Back in September, Intel said it would release gen 2 Optane SSDs, code-named Alder Stream, and DIMMs, code-named Barlow Pass, in 2020. Now it is saying Alder Stream sample shipping will start in 2020, and Barlow Pass PRQ will take place this year also.

There is no mention of when the Barlow Pass Optane DIMMs will actually start sample shipping, or when full availability will start. That means it could be pushed as far back as 2021, as Tom’s Hardware has noted.

Optane chip manufacture

Intel confirmed it will still buy 3D XPoint chips from Micron, stating: “The next generation of Intel Optane technology and SSDs are being developed in New Mexico following the sale of our non-controlling interest in IMFT to Micron on October 31, 2019. We will continue to purchase product manufactured by Micron at the IMFT facility under established supply agreements.”

In other words there is nothing Intel wishes to say yet about having its own production fab for its Optane chips. Building a new fab would be a costly exercise, to the tune of billions of dollars.

We understand Micron’s X100 Optane product uses second generation XPoint technology. Essentially Intel has become an OEM of Micron, using its IMFT output. Until it builds its own XPoint fab, it will not be independent from Micron.

The 144 layer QLC chips are being manufactured at Intel’s Dalian plant in China. Gen 2 Optane is being developed at Intel’s Fab 11x in Rio Rancho, New Mexico, described as a centre of Optane technology advancement. Gen 3 Optane is also being developed there.

HyperFlex becomes mates with K8s: No need to go through vSphere first

Cisco’s hyperconverged HyperFlex system has been re-engineered to support containerisation with native Kubernetes and Intersight cloud-based management.

The rationale is that users need to be able to move apps between on-premises and public cloud environments. Cloud-native apps can do that efficiently and Kubernetes is the way to orchestrate them and their myriad components. Giving HyperFlex native Kubernetes (K8s) support, running on Linux, enables it to operate effectively in the hybrid, multi-cloud world.

Liz Centoni, SVP and GM for Cloud, Compute and IoT at Cisco, issued a quote saying: “With the HyperFlex Application Platform (HX-AP) we are making Kubernetes, the new de facto standard for app developers, much easier to deploy and manage for both app and infrastructure teams.” 

Native HyperFlex Kubernetes

Up until now HyperFlex has supported K8s running in a VMware virtual machine, necessitating an ESXi license. Now customers can go one hundred per cent cloud-native with no intervening vSphere layer.

The Intersight management service has had container support added so it integrates with HX-AP.

With no virtual server layer, there is a common HX-AP environment shared by DevOps app teams and the infrastructure managers. Apps can be developed either in the public cloud or on-premises, with self-service resource provisioning attributes, and deployed anywhere – AWS and Azure for starters. The Google Cloud Platform will surely soon be supported as well.

Cisco says K8s HyperFlex has a curated stack of components above basic K8s, and a turn-key infrastructure, but supplied no component details. It functions as a container-as-a-service platform.

The cloud, compute and IoT GM blogged about how HX-AP spares customers from paying the V-tax, so to speak: “The HyperFlex Application Platform is designed to take the hard work out of K8s and make it as easy as deploying an appliance. We integrate the Kubernetes components and lifecycle manage the operating system, libraries, packages and patches you need for K8s.

“Plus, we manage the security updates and check for consistency between all components every time you deploy or upgrade a cluster. We then enable IT to deliver a Container-as-a-Service experience to developers – much like they are used to getting in the public cloud.”

Users can run HX-AP and traditional, VMware-based HyperFlex software on the same hardware should they wish. HyperFlex will also support bare metal Linux in the future.

Cloud and competition

Todd Brannon, senior director for Data Centre Marketing at Cisco, told Blocks & Files in a briefing that HX-AP “looks and feels like Kubernetes in the cloud.” He added: “The cloud is not a place but an operating model.”

Google has its Anthos cloud-based container services system, which enables application container movement between on-premises and the AWS, Azure and GCP clouds. How that will interoperate with HX-AP, if it does so, is not clear.

HPE and Google are providing a hybrid cloud for containers, using Anthos. It does not yet support HPE’s hyperconverged systems, such as SimpliVity. Nimble storage is supported, and HPE has its distributed HCI (dHCI) product using ProLiant servers and Nimble storage.

Cisco says Intersight can understand the resource needs for applications at all layers of the stack, for bare metal apps, ones running in virtual machines and also containerised ones. It integrates with Cisco’s AppDynamics performance monitoring software for this.

Intersight also has a Workload Optimiser function, a real-time decision engine, to help decide where in the on-premises, multi-cloud environment it’s best to run an app. Together, Cisco says, HX-AP, Intersight and AppDynamics provide a closed-loop operating model.

HyperFlex Application Platform for Kubernetes will be available for early access in the second quarter of calendar 2020.

Come to MAMR! Western Digital unfurls HDD tech roadmap

Western Digital clarified its hard disk drive (HDD) technology road map at a Storage Tech Field Day session on January 22, 2020, but managed to avoid revealing how its latest ePMR technology works.

At the moment HDDs are in a perpendicular magnetic recording (PMR) era, with bits set upright – perpendicular to the platter surface – in the recording medium. The two largest HDD manufacturers, Seagate and Western Digital, are both espousing energy-assisted technologies to get past approaching areal density limits with PMR. Essentially, due to growing thermal instability, PMR bit values become less and less reliable as 3.5-inch disk platters move to the 2TB/platter areal density level and beyond.

Energetic assistance

To move past that, some form of more stable recording medium will be needed, with bit values written while the bit location in the medium is either heated (HAMR, or heat-assisted magnetic recording) or excited by microwaves (MAMR, or microwave-assisted magnetic recording). The heat or microwaves lower the medium’s coercivity (its resistance to magnetic polarity change), which returns to a higher value once the energy assist is removed. This makes the written bit value stable.

Seagate is working on developing HAMR as its next main recording technology, while Western Digital has settled on a 3-phase approach. This was outlined at the session by Carl Che, WD’s VP of HDD technology:

  1. Energy-assisted PMR (ePMR), as used in its 18TB DC HC550 drives,
  2. MAMR,
  3. HAMR. 

Che did not say how energy-assistance worked with ePMR. He did say: “When we worked on MAMR we found there is a new physical phenomenon we can utilise. And by combining this phenomena we created a new recording scheme called ePMR; standing for energy-assist PMR.”

Coming drives will use this technology, which is a stepping stone or building block based on MAMR technology, enabling full MAMR in the future. Che wouldn’t say more about it, beyond: “We will have a product coming very soon and I’m definitely looking forward to having conversations on that.”

MAMR may give way to HAMR in WD’s roadmap; that decision has not been made yet.

ePMR roadmap

With the 9-platter, 18TB HC550, the energy assist is not about areal density, but rather reliability. Che said the HC550 will come this year and WD is simultaneously working on the next generation – 20TB and beyond – to exploit the ePMR gains and drive areal density upwards. He mentioned that the surface width of an ePMR track is around 50nm. The drives need very little extra power to operate.

The application of Shingled Magnetic Recording (SMR), with partially overlapping write tracks, will add a 20 per cent capacity uplift (24TB). After that the roadmap shows a progression with ePMR to about 24TB in 2023 and 30TB with shingling. 

Then there’s a jump to 30TB with non-shingled full energy assist – MAMR or HAMR – and 34TB with shingled MAMR or HAMR.  From this point the roadmap progresses to 50TB (non-shingled) and 60TB shingled in 2026.

WD outlook on SMR penetration of data centre disk drives: 40%+ in 2024.

Read/Write heads

Such a progression needs read/write head developments as well as media technology progression. The 2cm long head has to stay on track with plus/minus 1nm positioning accuracy. 

TMR stands for Tunnelling Magneto-Resistance and PZT for Piezoelectric actuator. PZT also denotes the compound lead zirconate titanate (Pb[Zr,Ti]O3), a thin-film piezoelectric material. VCM is Voice Coil Motor.

These heads need to be able to position themselves precisely at track densities of 500,000 tracks per inch (500 kTPI) and beyond.

WD has a diagram of its dual-stage micro actuator (WDMA) showing dual piezoelectric strips. When a differential voltage is applied to the WDMA, one piezo element expands as the other contracts. This action causes a slight rotational motion of the read/write head.

Western Digital is currently shipping dual-stage micro actuators, with triple-stage ones due in the Spring/second half of 2020 (HC550), providing the required 500 kTPI-class positioning accuracy and beyond. These move the head in three places along the head assembly: where it is mounted (as in a single-stage actuator), along the loadbeam (as in a dual-stage actuator), and at the tip of the actuator (slider/gimbal).

The optimised use of these three controls should reduce seek time and, Che said, “help our overall (IOPS) performance.”

Track-based writes for SMR 

Che said areal density in shingled drives was improved by using on-the-fly track-based ECC. There is ECC on the data blocks and ECC on a whole track. 

This enables improved track read and write accuracy. WD predicts it will get a 20 per cent capacity gain by shingling ePMR out to 2023, with 30TB SMR drives. The chart shows four generations of ePMR drives, starting with the 18TB HC550 this year, followed by 20TB, 22TB and 24TB products. The commensurate SMR product stages are 20TB (HC650), 24TB, 26TB and 30TB.

Reducing IO Density

The increased number of tracks and bits per track – overall increased areal density – reduces the number of IOPS/TB. Che said: “When we drive the capacity bigger you will see the unit of IO will reduce; that’s given.” He said WD’s hyperscale cloud customers know this.
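
To illustrate the effect, here is a minimal back-of-the-envelope sketch in Python. The ~80 random-read IOPS per drive is our assumption for a typical 7,200rpm nearline drive, not a WD figure; only the capacity points come from the roadmap above.

    # Illustration only: random-read IOPS per nearline drive stays roughly flat
    # (assumed here at ~80) while capacity grows, so IOPS per terabyte falls.
    NOMINAL_DRIVE_IOPS = 80  # assumption, not a WD-quoted number

    for capacity_tb in (18, 20, 24, 30, 50):
        print(f"{capacity_tb:>2} TB drive: {NOMINAL_DRIVE_IOPS / capacity_tb:.1f} IOPS/TB")

    # 18 TB -> ~4.4 IOPS/TB, 30 TB -> ~2.7, 50 TB -> 1.6: the same mechanics
    # serve ever more data, which is why IO density declines as drives grow.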

He said intelligent IO queue management by the drive can be used to reach a target IOPS performance while maintaining latency at higher queue depths (4, 16, etc.).

Che didn’t mention multi- or dual-actuator drives, which Seagate is developing to increase IO density. However, WD discussed dual actuators last year, and Blocks & Files thinks it will use them eventually.

He said WD has a clear vision of how to get to 50TB and beyond – except that the decision about whether to use MAMR or HAMR post-2023 hasn’t yet been made. And we still don’t know how ePMR works.

Iguazio emits storage-integrated parallelised, real-time AI/machine learning workflows


Workflow-integrated storage supplier Iguazio has received $24m in C-round funding and announced its Data Science Platform. This is deeply integrated into AI and machine learning processes, and accelerates them to real-time speeds through parallel access to multi-protocol views of a single storage silo using data container tech.

The firm said digital payment platform provider Payoneer is using it for proactive fraud prevention with real-time machine learning and predictive analytics.

Real-time fraud detection

Yaron Weiss, VP Corporate Security and Global IT Operations (CISO) at Payoneer, said of Iguazio’s Data Science Platform: “We’ve tackled one of our most elusive challenges with real-time predictive models, making fraud attacks almost impossible on Payoneer.”

He said Payoneer had built a system which adapts to new threats and enables it to prevent fraud with minimal false positives. The system’s predictive machine learning models continuously identify suspicious fraud and money laundering patterns.

Weiss said fraud used to be detected retroactively with offline machine learning models; customers could only block users after damage had already been done. Now Payoneer can take the same models and serve them in real time against fresh data.

The Iguazio system uses a low latency serverless framework, a real-time multi-model data engine and a Python eco-system running over Kubernetes. Iguazio claims an estimated 87 per cent of data science models which have shown promise in the lab never make it to production because of difficulties in making them operational and able to scale.

Data containers

It is based on so-called data containers that store normalised data from multiple sources: incoming stream records, files, binary objects, and table items. The data is indexed and encoded by a parallel processing engine, and stored in the most efficient way to reduce data footprint while maximising search and scan performance for each data type.

Data containers are accessed through a V3IO API and can be read as any type regardless of how the data was ingested. Applications can read, update, search, and manipulate data objects, while the data service ensures data consistency, durability, and availability.

Customers can submit SQL or API queries for file metadata, to identify or manipulate specific objects without long and resource-consuming directory traversals, eliminating any need for separate and non-synchronised file-metadata databases.

So-called API engines use offload techniques for common transactions, analytics queries, real-time streaming, time-series, and machine-learning logic. They accept data and metadata queries, distribute them across all CPUs, and leverage data encoding and indexing schemes to eliminate I/O operations. Iguazio claims this provides orders-of-magnitude faster analytics and eliminates network chatter.

The Iguazio software is claimed to be able to accelerate the performance of tools such as Apache Hadoop and Spark by up to 100 times without requiring any software changes.

This Data Science Platform can run on-premises or in the public cloud. The Iguazio website contains much detail about its components and organisation.

Iguazio will use the $24m to fund product innovation and support global expansion into new and existing markets. The round was led by INCapital Ventures, with participation from existing and new investors, including Samsung SDS, Kensington Capital Partners, Plaza Ventures and Silverton Capital Ventures.

Tabletop storage: Georgia Tech looks to SMASH an exabyte into DNA ‘sugar cube’

Georgia Tech Research Institute (GTRI) is looking into ways to speed up DNA-based cold storage in a $25m Scalable Molecular Archival Software and Hardware (SMASH) project.

DNA is a biopolymer molecule composed of two chains in a double helix formation, carrying genetic information. The chains are made up of nucleotides containing one of four nucleobases: cytosine (C), guanine (G), adenine (A) and thymine (T). Both chains carry the same data, which is encoded into sequences of the four nucleobases.
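
As a toy illustration of that encoding idea, here is a minimal sketch mapping bytes to bases at two bits per base. Real DNA storage codecs add error correction and avoid problem sequences such as long runs of the same base; the particular bit-to-base mapping below is arbitrary.

    # Toy sketch: 2 bits per nucleobase. Real DNA storage codecs add error
    # correction and avoid long homopolymer runs; this mapping is arbitrary.
    BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
    BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

    def encode(data: bytes) -> str:
        bits = "".join(f"{byte:08b}" for byte in data)
        return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

    def decode(strand: str) -> bytes:
        bits = "".join(BASE_TO_BITS[base] for base in strand)
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    strand = encode(b"SMASH")
    assert decode(strand) == b"SMASH"
    print(strand)  # CCATCATCCAACCCATCAGA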

DNA double helix concept.

GTRI senior research scientist Nicholas Guise said in a quote that DNA storage is “so compact that a practical DNA archive could store an exabyte of data, equivalent to a million terabyte hard drives, in a volume about the size of a sugar cube.” 

Put another way, Alexa Harter, director of GTRI’s Cybersecurity, Information Protection, and Hardware Evaluation Research (CIPHER) Laboratory, said: “What would take acres in a data farm today could be kept in a device the size of the tabletop.”

The intent is to encode and decode terabytes of data in a day at costs and rates more than 100 times better than current technologies. 

This is still slow by HDD and SSD standards. The intent is to use DNA storage for data that must be kept indefinitely, but accessed infrequently; backup/archive-type data in other words. 

Guise said: “Scientists have been able to read DNA from animals that died centuries ago, so the data lasts essentially forever under the right conditions.”

The grant has been awarded by the Intelligence Advanced Research Projects Activity’s (IARPA) Molecular Information Storage (MIST) program and is for a multi-phase project involving:

  • Georgia Tech’s Institute for Electronics and Nanotechnology – will provide fabrication facilities,
  • Twist Bioscience – will engineer a DNA synthesis platform on silicon that “writes” the DNA strands which code the stored data,
  • Roswell Biotechnologies – will provide molecular electronic DNA reader chips which are under development,
  • The University of Washington, collaborating with Microsoft – will provide system architecture, data analysis and coding expertise. 

GTRI envisages a hybrid chip with DNA grown above standard CMOS layers containing the electronics. Current technology uses modified inkjet printing to produce DNA strands. The SMASH project plans to grow the biopolymer more rapidly and in larger quantities using parallelized synthesis on these hybrid chips.

GTRI researchers Brooke Beckert, Nicholas Guise, Alexa Harter and Adam Meier are shown outside the cleanroom of the Institute for Electronics and Nanotechnology at the Georgia Institute of Technology. Device fabrication for the DNA data storage project will be done in the facility behind them. (Credit: Branden Camp, Georgia Tech)

Data will be read from DNA strands using a molecular electronic sensor array chip, on which single molecules are drawn through nanoscale current meters that measure the electrical signatures of each letter, C, G, A and T, in the nucleotide sequence.  

GTRI research engineer Brooke Beckert said: “We’ll be working with commercial foundries, so when we get the processing right, it should be much easier to transition the technology over to them. Connecting to the existing technology infrastructure is a critical part of this project, but we’ll have to custom-make most of the components in the first stage.”

Guise cast more light on the difficulties: “The basic synthesis is proven at a scale of hundreds of microns. We want to shrink that by a factor of 100, which leads us to worry about such issues as crosstalk between different DNA strands in adjacent locations on the chips.”

Current human genome sequencing in biomedicine hopes to achieve a cost of $1,000 per genome. The SMASH project is looking for a cost of $10 per data genome – a hundredfold reduction.

Blocks & Files thinks we’re looking at a two to three-year project here.

GTRI senior research scientist Adam Meier said: “We don’t see any killers ahead for this technology. There is a lot of emerging technology and doing this commercially will require many orders of magnitude improvement. Magnetic tape for archival storage has been improving steadily for 60 years, and this investment from IARPA will power the advancements needed to make DNA storage competitive with that.”

We could imagine a DNA helix as a kind of ribbon or tape, only at a molecular level. Storing an exabyte in a sugar cube-sized chip would certainly make tape density look pretty shabby.

Mind the air gap: Quantum reinforces tape defences against ransomware

LTO tape

Quantum has added a software lock mechanism to prevent backup tapes being accessed in its Scalar i3, i6 and i6000 libraries as a further barrier against ransomware.

Tape cartridges stored in tape libraries are placed in shelves, where they are offline and hence air-gapped from any network access. If users need to run a tape backup or restore, the library is sent commands; it selects a cartridge, moves it to a drive, and carries out the directed operation.

Eric Bassier of Quantum’s product marketing team said: “Tape’s inherent offline attribute makes it the most secure place to keep a copy of data, and with Quantum’s Active Vault intelligent tape software customers can now store their content in an ultra-secure offline repository without any human intervention.”

Quantum Scalar library administrators can now set policies for tapes to be placed in a so-called Active Vault partition. The partition is logical: the tapes are not physically moved inside the library. Normally, the firm said, backup tapes sit in a Backup Application Partition, which is connected to the backup application.

These Active Vault tapes are, ironically, inactive until returned to their normal state in the logical Backup Application Partition via an administrator-directed command.

Existing i3, i6 and i6000 Scalar library users can get Active Vault software upgrades.

The company is making three Ransomware Protection Packs available. These are three Scalar library configurations bundled in with the Active Vault software:

  • Small – i3 to 600TB in 3U
  • Medium – i6 to 1.2PB in 6U
  • Large – i6 to 2.4PB in 12U.

The i6000 library supports Active Vault but there are no pre-configured systems available. A Quantum spokesperson said: “We are deploying more Scalar i6000’s than ever before – for archive and cold storage of exabyte-scale unstructured data.”

Universal memory candidate technology

Lancaster University researchers have devised a universal memory candidate technology.

By using a structure based on members of the III-V chemical compound family, it’s possible to build a memory cell with the switching speed of DRAM, data retention better than NAND, and a far lower switching voltage than NAND.

It is called an UltraRAM cell and uses an indium arsenide (InAs) floating gate to store the memory state (bit value). This gate is isolated by a layered barrier built from indium arsenide and aluminium antimonide, as this diagram illustrates:

Device structure. (a) Schematic of the processed device with control gate (CG), source (S) and drain (D) contacts (gold). The red spheres represent stored charge in the floating gate (FG). (b) Details of the layer structure within the device. In both (a,b) InAs is coloured blue, AlSb grey and GaSb dark red. (c) Cross-sectional scanning transmission electron microscopy image showing the high quality of the epitaxial material, the individual layers and their heterointerfaces.

A schematic diagram expands on this:

See list of the III-V compounds used below.

Interactions between the layers trap the memory-state charge in the floating gate. The isolation of the floating gate is such that the cell holds data for an extremely long time: a retention period of an extraordinary 100 trillion years has been predicted through a simulation exercise.

The cell’s state can be altered by applying a voltage of just over 2V, taking advantage of a dual quantum well resonant tunnelling junction through the isolating barrier. The researchers say this takes 0.1 per cent of the energy needed to switch NAND and 1 per cent of that needed to switch DRAM.

Writing logic state 1, adding charge, takes just over 5ns, while writing logic state 0, emptying charge, takes 3ns.

The UltraRAM cell is described in a pay-to-download paper, Simulations of Ultralow-Power Nonvolatile Cells for Random-Access Memory, but the initial research is in a free-to-download report, Room-temperature Operation of Low-voltage, Non-volatile, Compound-semiconductor Memory Cells.

The papers were authored by Professor Manus Hayne and PhD student Dominic Lane, both in the university’s physics department.

They say the cells can be built into bit-addressable arrays and so used for computing storage/memory devices. Watch this space to see if the research technology can be productised.

III-V Compounds

The III-V compounds or alloys are composed of particular elements from the boron (group III) and nitrogen (group V) groups of the periodic table. The ones used by the researchers are:

  • InGaAs – Indium gallium arsenide – a room-temperature semiconductor,
  • GaSb – Gallium antimonide – a semiconducting compound,
  • AlGaAs – Aluminium gallium arsenide – a wider-bandgap semiconductor used as a barrier material in GaAs-based heterostructure devices,
  • GaAs – Gallium arsenide – a compound semiconductor,
  • AlSb – Aluminium antimonide – another compound semiconductor, 
  • InAs – Indium arsenide – another semiconductor.

Isilon supports containers, larger files, faster Azure compute access and cloud-based management

Dell EMC has delivered a powerful update to its OneFS operating system, which runs its scale-out Isilon filers, adding support for containers, larger files and greater effective capacity, faster Azure cloud compute access, and cloud-based management.

Version 8.2.2 OneFS supports 16TB files, four times more than before. Dell EMC also says OneFS’ inline compression and deduplication on its hybrid SSD/HDD H5600 system can increase its effective capacity 3x. As raw H5600 chassis capacity is 800TB that means it can effectively be 2,400TB. But this depends upon the stored files’ contents and your experience may be different.

Isilon H5600.

There is a Container Storage Interface (CSI) plugin for Kubernetes so that containers can get persistent volumes from an Isilon system through Kubernetes. Container code can call for volume provisioning/deletion, snapshot creation/deletion, creating volumes from snapshots and shared storage access for NFS file shares across multiple Kubernetes Pods.
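
As a sketch of what consuming such a plugin looks like from the Kubernetes side, the snippet below requests a persistent volume through the official Kubernetes Python client. The storage class name "isilon-nfs", the claim name and the size are placeholders, not values documented by Dell EMC.

    # Sketch: ask Kubernetes for a persistent volume backed by a CSI storage class.
    # "isilon-nfs" is a placeholder storage class name, not a Dell EMC default.
    from kubernetes import client, config

    config.load_kube_config()  # use the local kubeconfig

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="analytics-share"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],   # an NFS share can serve many Pods
            storage_class_name="isilon-nfs",  # placeholder CSI storage class
            resources=client.V1ResourceRequirements(requests={"storage": "1Ti"}),
        ),
    )

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc
    )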

As a side note, Blocks & Files notes CSI support is spreading, witness Portworx, VAST Data, StorPool and Cisco HyperFlex. We suggest it is going to become a standard feature of storage arrays and hyper-converged systems.

Two cloud extras

OneFS now supports Microsoft Azure ExpressRoute Local, a fast link to a nearby Azure cloud data centre. This means data can be moved from the Isilon box to an Azure compute instance with latency as low as 1.2ms and bandwidth up to 200Gbit/s. Any data written back to the Isilon system won’t incur Azure egress charges.

Dell EMC says this means that Isilon-held data can be used to support infrequent large compute needs in the Azure cloud as well as normal, lower-level, local processing.

CloudIQ on smart phone.

The fourth update item is CloudIQ support for Isilon systems, CloudIQ being Dell EMC’s no-charge, cloud-based array performance analysis service. It says this uses machine learning technology to analyse information from the Isilon filer and provide a system health check. Admin staff get to find out about potential problems and fix them faster and more easily than before.

They can access CloudIQ through a desktop/notebook browser or mobile phone. It also supports other Dell EMC products such as PowerMax, Unity XT, XtremIO, the SC Series, PowerVault and Connectrix switches.

Dell EMC notes that the CSI plugin is available free of charge on its Community forums for CSI Drivers and Containers, GitHub and Docker Hub.

Mainframe cycle drives IBM storage revenues upwards

High-end DS8900 storage sales were boosted by IBM z15 mainframe demand and pushed IBM storage sales up three per cent in Big Blue’s fourth 2019 quarter.

As reported in our sister publication The Register, IBM’s Q4fy19 total revenues were $21.8bn, up 0.1 per cent, with the Systems business pulling in $2.6bn, up 18 per cent y/y.

In the earnings call, James Kavanaugh, IBM’s CFO, said: “If you look at Systems, we’re off to a very good start. That segment has always been predicated based on bringing new innovation and value to market. Our z15 and new high-end storage, which we brought the market, both grew nicely. Value proposition resonating. We expect a very strong first half in both of those.”

Within the Systems business results, systems hardware was the main revenue contributor at $2.6bn, up 18 per cent as the z15 mainframe cycle went full tilt, recording a 62 per cent increase y/y. The z15 was announced in September 2019 and this was the first full quarter of its availability. POWER server revenues fell 23 per cent. 

DS8900F.

The high-end DS8900F storage array was also announced in September 2019.

Storage system revenues increased three per cent, and we calculate that at $470m, based on the number we reported a year ago.

There are no separate numbers for IBM cloud storage or storage software, so we can’t judge how well it’s doing in these sections of the market. However, Kavanaugh’s earnings call remarks suggest IBM will have good overall storage results for the next two quarters.

Mainframe lock-in is a great benefit to IBM, albeit declining, and that makes storing mainframe bits, bytes and blocks an uplifting Big Blue business too.

NAS to NAS storage migration is nasty… but there’s a better way, claims Datadobi

Replacing an old filer or object store entails migrating data to the new system – and that can lead to a world of hurt.

“The ROI of a new array only begins when migration ends,” says Michael Jack, head of sales at Datadobi, a company that has developed storage migration software called DobiMigrate.

Until migration ends, the customer has two arrays on their premises, taking up floor space and needing power, cooling and system management, according to Jack. The data migration process in-flight often takes much longer than expected. It is generally a one-off exercise, conducted by businesses that are not data migration experts. Few lessons are learnt or carried over from previous migrations.

Datadobi’s DobiMigrate concept

Data migration is a multi-phase process:

  • Start. Scan the source system and build a what-to-migrate catalogue.
  • Update this during the migration process
  • Move the data
  • Write to the target
  • Verify it is correct
  • Finish. Cutover from the source to the target

File system scanning can take a long time when there are petabytes of data and billions of files. [Imagine one person has to list every book in the US Library of Congress. You would need to organise an army of people to work in parallel to complete the task in weeks rather than years.]

Also, data that is written to tape in a backup process is read back to verify that what was written is what should have been written. With data migration this happens only when specialist software is used. Only then do you have a data custody chain that can satisfy compliance regulations.

Robocopy and Rsync

Datadobi’s Jack told Blocks & Files that NAS and object storage system vendors minimise migration difficulties and suggest their customers do it themselves with scripting, using Windows Robocopy or Unix/Linux Rsync.

However, old-school software utilities date from pre-petabyte times when file populations were much smaller. They can take a long time to finish, need scripts written, have limited protocol support, do not cover cases where there are multiple different access permission schemes and cannot guarantee that a migration has completed successfully.

For example,

  • Rsync is single-threaded and only supports NFS. Multiple rsync instances can be run in parallel by writing complicated shell scripts to parse the file system structure and assign each portion to a unique rsync instance (see the sketch after this list). This approach does not scale well and limits performance.
  • Robocopy is limited to scanning NTFS file systems. It supports multiple threads (eight by default), but only one scan thread is used to update file system maps.
  • Permission data is stored differently by different suppliers. A Datadobi tech brief states: “NetApp, for example, stores either NTFS or UNIX permissions but not both. EMC’s VNX and Unity platforms running in ‘Native’ access mode will store both NTFS and UNIX permissions separately while EMC’s Isilon implements a ‘Unified’ permission model wherein both sets of permissions are combined into a single permission model.”
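
Here is a minimal sketch of that parallel-rsync pattern: one rsync process per top-level source directory, driven from Python rather than shell. The paths are placeholders, and a real migration script would also need retry handling, logging and the permission-model work described above.

    # Sketch: run one rsync per top-level source directory in parallel.
    # SOURCE and DEST are placeholder paths.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    SOURCE = Path("/mnt/source_nas")
    DEST = "/mnt/target_nas"

    def sync(subdir: Path) -> int:
        # -a (archive) preserves permissions, ownership and timestamps where it can
        cmd = ["rsync", "-a", f"{subdir}/", f"{DEST}/{subdir.name}/"]
        return subprocess.run(cmd).returncode

    subdirs = [d for d in SOURCE.iterdir() if d.is_dir()]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(sync, subdirs))

    print(f"{results.count(0)} of {len(results)} rsync jobs succeeded")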

DobiMigrate includes a DobiMiner component which supports multi-processing and uses multiple threads. It has SMB and NFS proxies, which means NTFS and Linux file systems can be scanned in parallel. DobiMiner can also scan the same source file data in SMB and NFS modes, scooping up the different metadata from each protocol. This includes the permission data, which is migrated to the destination system automatically.

DobiMigrate can scan 10 billion files or more and supports NFS v3 and v4, SMB v1, 2 and 3, S3, ECS, and Isilon formats. Azure and Google Cloud Platform object formats are on the development roadmap.

Data moving

Datadobi moves data across a network link between arrays in a parallel fashion to speed data movement. There isn’t a slow single stream. It is sensitive to the host system workload burden and throttles its activities if the workload is affected beyond set limits.

Making a hash of data writing

When file data is selected for migration, a hash of its contents is calculated before it is written to the target system. It is read back, a new hash calculated and the two hashes are compared to ensure an exact copy has been made. If there is a mismatch the file is copied again. DobiMigrate software does this automatically.
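
The sketch below shows the general copy-then-verify pattern being described. It is not DobiMigrate’s actual implementation, and the choice of SHA-256 is ours for illustration.

    # Sketch of copy-then-verify: hash the source, copy the file, re-read the
    # target and compare hashes, re-copying on a mismatch. SHA-256 is illustrative.
    import hashlib
    import shutil

    def file_hash(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        return h.hexdigest()

    def migrate_file(src: str, dst: str, retries: int = 3) -> None:
        expected = file_hash(src)
        for _ in range(retries):
            shutil.copy2(src, dst)            # copy data plus basic metadata
            if file_hash(dst) == expected:    # read back and compare hashes
                return
        raise IOError(f"verification failed for {src} after {retries} attempts")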

DobiMigrate and hash calculations

Rsync uses hashes too but in a different way. It breaks a file to be migrated into chunks and makes chunk-level hashes. These are compared to similar chunk-level hashes on the target system. If there is no match the new chunk is written to the target. But the copied data’s integrity is not verified, i.e., that what was written matches what should have been written.

Robocopy does not natively check the integrity of files written to a destination. The unsupported Microsoft FCIV (File Checksum Integrity Verifier) utility can do this. It requires scripting for it to read both the source and target files, calculate the hashes, and compare them. If errors are detected the affected files must be re-copied and re-checked.

Datadobi fact file

Datadobi was founded by four Dell EMC engineers following the closure of the Centera object storage centre in Belgium in 2009. Their first migrations were Centera to Centera. This moved on to Centera to Isilon, then NetApp to Isilon and from there to nearly any NAS to any NAS or object store.

Datadobi is entirely self-funded. There are about 60 employees and revenues in 2018 were €10m. Customers are typically one-off users and there is little recurring revenue.

To date the company has worked for 737 customers, with 30 per cent apiece in finance and healthcare. Eighty per cent of its work is in the USA where it has 30 staff. Customer migrations include data moving between on-premises and cloud destinations.

We note that Dell EMC uses Datadobi for NetApp-to-Isilon OneFS migrations.

Data migration niche

Blocks & Files considers Datadobi is unlikely to attract much VC investment. Nor is it a business that would attract a vendor, as their main interest is to provide an on-ramp to their kit, not an off-ramp.

Even then, the difficulties inherent in developing software to scan, copy, move and verify data from multiple sources would limit what they could do. Yet there is a need for data migration – every time a customer buys a new filer or object system. Hence the niche business that Datadobi is opening up.

Companies such as InfiniteIO (file access acceleration), Igneous (extremely large file system storage), Komprise (file system lifecycle management), Actifio and Cohesity (secondary data management), have expertise in scanning file system metadata but apply it for their own purposes, not data migration.

The Ghost in the self-driving storage machine (OK, it’s a car)

Interview: Blocks & Files aims to get a sense of the market for data storage in autonomous and near-autonomous vehicles (AVs). Seagate and Renovo have suggested an AV could generate up to 32TB of data a day. Data will presumably be stored in the vehicle if the vehicle is not in constant or regular connection to a cloud data centre.

How much data storage will an AV need and what storage media will it use? Let’s ask John Hayes, the CEO of Mountain View-based Ghost Locomotion, a self-driving car software startup.  He co-founded the company in 2017 with CTO Volkmar Uhlig, who designed and built a fully automated programmatic media trading platform at Adello. Hayes was a Pure Storage co-founder and chief architect at the company from 2009 to 2017.

Ghost is focused initially on producing an AI system for self-driving cars on highways, not on rural or urban roads. The system is designed to retrofit existing vehicles and should launch this year.

Blocks & Files: How much data will near-autonomous and autonomous vehicles (AV) generate per day?

John Hayes, Ghost Locomotion CEO

John Hayes: At the sensor, an AV might generate TBs a day; however, that’s not practical. It’s like saying your phone camera generates 0.5 GB/sec while recording video, when in practice it writes 0.5 MB/sec after the video is MPEG compressed.

The highest bandwidth sensors are cameras; distant second is lidar where each laser is 40 KB/sec x 256 lasers. That’s 10 MB/sec, but even simple compression brings that down to 1 MB/sec. Everything else (like IMUs, GPS or logs) is trivial amounts of data.

Compression is important because when you’re sending data around a realtime system, any transmission delays within the vehicle increase the latency for decisions. If you want a decision on information less than 100ms ago, that can only be done by compressing before transmission.

Our system, with 8 cameras, stores 10 GB/hour. That’s already a large enough transmission challenge that we cut back on the data to transmit. Doing 10x that would significantly reduce the duty cycle of a vehicle.

Blocks & Files: Will this data be transmitted to the cloud for decision-making and analysis? Will the AV be completely self-sufficient and make its own decisions, or will there be a balance between the two approaches?

Hayes: AVs will have to make their decisions entirely internally; the speed of decision doesn’t support remote transmission and the Internet is unreliable. The most common reason to connect to a data center is remote driving or other exception handling, where a person has to look around and make a decision for the AV. Here you’re limited by wireless bandwidth – probably at most a few compressed video streams.

Blocks & Files: Will the AV IT system be centralised, with sensors and component systems networked to a single central compute/storage and networking resource? Or will it be a distributed one, with a set of separate systems each with their own processor, memory, etc. – i.e. navigation, braking, suspension, engine management and so on?

Hayes: There will most likely be a central computer for driving, and there will be a backup computer with more limited capabilities, like pulling over to a safe location. Powertrain management will probably be a separate computer again for isolation purposes.

Blocks & Files: Will the AV-generated data have to be stored in the vehicle and, if so, for how long?

Hayes: AV generated data will definitely be stored in the vehicle and companies that test autonomous vehicles tend to swap the data storage components rather than connecting either a wired or wireless network. This increases the duty cycle of the expensive ($150-500k) vehicles. In this scenario, data is held no longer than 12 hours.

In personally-owned AVs, it is preferable to store data on the vehicle rather than in the data centre for privacy/legal exposure reasons. It might be stored for a period of days to weeks, but not much longer than three months.

Blocks & Files: How will the data be uploaded to a cloud data centre? How often?

Hayes: There are two models, uploading everything in batch and processing for interesting data during an upload stage, or processing for interesting data and then uploading. AVs for testing will prefer the former because you want to keep the AVs in service. Personally-owned AVs are idle 20-23 hours a day and can use that time to reprocess data and select it for upload. Our upload target is <1 per cent of observed data.

Uploading will be whatever is cheaper, home Wi-Fi, or LTE.

Blocks & Files: What is the maximum amount of storage capacity that will be needed in an AV to cope with the data generation load and the worst case data transmission capability?

Hayes: Space/power limitations create practical limits on the amount of data storage. You can put a rack server in a trunk, but only half depth and 4-6U tall. I’ve never seen one with a drive shelf.

Blocks & Files: Will disk drives or flash storage be used or a combination?

Hayes: Flash will probably be preferred because there are lower service requirements and it’s a tiny percentage of the vehicle cost.

Blocks & Files: Assuming flash storage is used will the workload be a mixed read/write one? If so, how much endurance should the flash have? (AVs could have a 15+ year working life.)

Hayes: AVs will almost certainly not have a 15-year working life. If they’re individually owned, the electronics will be replaced every few years, in line with a typical consumer electronics cycle. If they’re used as a robo-taxi, the lifetime of the vehicle is determined more by miles driven than calendar age, putting it at 3-5 years.

Blocks & Files: Will the flash have to be ruggedised to cope with the AV environment with its vibrations and temperature/moisture variations?

Hayes: Car interiors are already suitable for people and auto-grade electronics are characterized by extended environmental range, rather than extended warranty. Ordinary max temperatures of 70-85C will work fine.

Net:net

Semi and fully autonomous vehicles will not need a lot of in-vehicle data storage. In the test phase Hayes sees storage drives being swapped out (“swap the data storage components”), and in operation, personally owned vehicles will need to store up to a terabyte. That assumes 10GB/hour, one hour of operation per day, and data stored for up to 90 days. This is a trivial amount.
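
A quick back-of-the-envelope check of that figure, using the numbers above:

    # Back-of-the-envelope: Ghost's stated 10 GB/hour capture rate, one hour of
    # operation per day, and data retained for up to 90 days.
    gb_per_hour = 10
    hours_per_day = 1
    retention_days = 90

    total_gb = gb_per_hour * hours_per_day * retention_days
    print(f"{total_gb} GB")  # 900 GB - i.e. under a terabyte of on-board storage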