Does trying to find a better economic approach to storage give you “Butterflies”? (Part 2)

This is the conclusion of a two-part conversation with Liam Devine, the global post-sales face of Butterfly. In Part 1, we talked about Butterfly’s unique approach to storage infrastructure analytics and how Butterfly came to be an IBM company.

The Line: It’s been a couple of years since 2011, and you have had the opportunity to both analyze a lot of data and have a number of conversations with financial decision makers. What have you found to be the most compelling talking points?

Liam: The most compelling stuff comes from the data. We’ve analyzed hundreds of different infrastructures in nearly every conceivable configuration and have discovered some extraordinary things about software-defined storage and IBM’s approach to backup.

  • Compared to an as-is physical storage environment, transforming to a software-defined storage environment with something like IBM SmartCloud Virtual Storage Center can be, on average, 63% more efficient. That’s the average; your results may vary. (Editorial comment: in one of my posts from IBM Edge 2013 I talked about LPL Financial, who followed the recommendations of a Butterfly Storage AER and saved an astounding 47% in total infrastructure. Listen to Chris Peek, Senior Vice President of IT at LPL Financial.)
  • Compared to as-is competitive backup environments, transforming to an IBM Tivoli Storage Manager (TSM) approach can be, on average, 38% more efficient. Again, your results may vary. [Modified: For example, when we segment just the mass of Backup AER results from as-is Symantec NetBackup, CommVault Simpana, or EMC NetWorker environments, each shows that transformation to a TSM approach produces different, and in these cases at least, somewhat stronger economic savings.] We’ve got data by industry and for many other competitive backup approaches, but you get the picture. Choosing a TSM approach can save money.

The Line: For my readers, the Butterfly team had discovered most of these trends before IBM acquired them. As I noted above, that had a lot to do with IBM’s interest in the company. [Modified: Now that IBM owns Butterfly, they have been quick to add legal disclaimers around anything that might be construed as a competitive claim*.]

Now Liam, switching back to you. Butterfly has been part of IBM for about 11 months. How has the transition been?

Liam: Very successful and pretty much as I had expected. We had a few minor technical hiccups in switching infrastructure (freeware and open source components to a more IBM-standard architecture), as you would expect, but those hiccups are behind us now. The global IBM sales force and Business Partner community have created a lot more demand for our analytics, so we are busy scaling out our delivery capability. The good news is that we’re meeting our business goals.

The Line: Can you give us an idea of what you and the team are working on next?

Liam: Right, well, we’re working on a couple of important things. First is an automated “self-service” AER generation model that will enable us to scale out further still and present the AERs as a service to IBM and its Business Partners. And second, as you can imagine, the data-driven AER reports are causing a lot of IT managers to rethink their infrastructure and begin transitioning to a new software-defined approach. We are continuing to refine our migration automation to assist clients with the transition, especially between backup approaches.

The Line: Before ending, I have to ask about your Twitter handle @keydellkop. What’s the story?

Liam: Hmmm, a bit of a strange explanation here, I’m afraid; it’s more a play on words. I see much of life as a set of confused circumstances that can be placed into an ultimate order, which reminds me of the Keystone Kops. On that theme, I live in an area called Keydell in the south of England, and being a manic Liverpool supporter, you get The Kop (the famous Liverpool stand at Anfield stadium) – hence @KeydellKop. All tweets are my own, covering such subjects as information protection, Liverpool Football Club, and life in general, with a smidge of humor thrown in where appropriate.

The Line: Liam, thank you for spending a few minutes with me and sharing your story with my readers.

If you have questions for Liam, please join the conversation below.

* Backup Analysis Engine Reports from >1.5 exabytes of data analyzed by Butterfly Software. Savings are the average of individual customer Analysis Engine Reports from Butterfly Software, May 2013, n+450. The savings include cumulative 36-month hardware, hardware maintenance, and electrical power savings. Excludes one-time TSM migration cost. All client examples cited or described are presented as illustrations of the manner in which some clients have used IBM products and the results they have achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions. Contact IBM to see what we can do for you.

Does trying to find a better economic approach to storage give you “Butterflies”? (Part 1)

Recently there has been a lot of talk coming out of IBM about the economics of storage. In fact, all of my top 5 observations from IBM Edge 2013 had something to do with economics. Sure, technology advancements are still important, but increasingly what CIOs are chasing is a clear understanding of the economic benefits a new technology approach can bring.

Late last year IBM acquired Butterfly Software, a small company in the United Kingdom that had developed some BIG thoughts around communicating the economic benefits brought by certain approaches to storage. Butterfly has developed what they call an Analysis Engine Report (AER) that follows a straightforward thought process.

  1. Using a very lightweight collector, gather real data about the existing storage infrastructure at a potential customer.
  2. Using that data, explain in good detail what the as-is effectiveness of the environment is and what costs will look like in five years’ time if the customer continues on the current approach.
  3. Show what a transformed storage infrastructure would look like compared to the as-is approach and, more importantly, what future costs could look like compared to continuing as-is (a simple sketch of this kind of projection follows this list).
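To make that as-is versus transformed comparison a bit more concrete, here is a minimal, purely illustrative Python sketch of the kind of multi-year cost projection an AER presents. The growth rate, cost per terabyte, and efficiency factor are hypothetical placeholders of my own, not Butterfly’s actual model.

```python
# Illustrative five-year storage cost projection: as-is vs. transformed.
# All numbers are hypothetical; Butterfly's actual AER model is proprietary.

def project_costs(start_tb, annual_growth, cost_per_tb_year, years=5, efficiency=1.0):
    """Return year-by-year cost, applying an efficiency factor to the capacity needed."""
    costs = []
    capacity = start_tb
    for _ in range(years):
        costs.append(capacity * cost_per_tb_year / efficiency)
        capacity *= (1 + annual_growth)
    return costs

as_is = project_costs(start_tb=500, annual_growth=0.35, cost_per_tb_year=2000)
transformed = project_costs(start_tb=500, annual_growth=0.35, cost_per_tb_year=2000,
                            efficiency=1.63)  # hypothetical gain from thin provisioning, compression, tiering

print(f"As-is 5-year total:       ${sum(as_is):,.0f}")
print(f"Transformed 5-year total: ${sum(transformed):,.0f}")
print(f"Savings:                  {1 - sum(transformed) / sum(as_is):.0%}")
```

With these made-up inputs the transformed case comes out about 39% cheaper over five years; a real AER, of course, is driven by the collector’s actual data rather than assumed constants.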

Butterfly has two flavors of AERs: one for primary storage infrastructure and one for copy data (or backup) infrastructure. They have analyzed some 850 different infrastructures scattered across every industry in most parts of the world and comprising over 2 exabytes of data. In all that analysis, they have discovered some remarkable things about IBM’s ability to transform the economic future of storage for its clients. (Editorial comment: the results probably have something to do with why IBM acquired the company.)

Butterfly AER

I was able to catch up with the global post-sales face of Butterfly, Liam Devine, to talk about the company and where he sees the storage economics conversation going (see if you can hear his distinctly British accent come through).

The Line: Liam, let’s start with a little background for my readers. You’ve been a systems and storage manager, consulted for some pretty impressive companies in finance and healthcare and even spent a little time at vendors like NEC and EMC.

Liam: That’s right. I’ve had the pleasure of holding numerous IT roles in a variety of interesting companies for some 14 years before moving over to The Dark Side, or vendor land, where I have been for the past 12 years. The majority of that time was spent at EMC in two stints, first supporting financial customers and second supporting Electronic Data Systems (EDS – now HP Enterprise Services).

The Line: Okay, so rewind us back to 2011. What was the motivation for joining Butterfly Software?

Liam: Everything is becoming software defined. Compute is ahead of storage, but storage is accelerating quickly. The reasons are rooted in economics. I became aware of this company, Butterfly, that was creating unique analytics to help communicate the economic value of shifting from traditional storage infrastructure approaches to more software-oriented approaches. Once I had spoken to the founders and understood their strategic vision encompassing both primary storage infrastructure and copy data, there was nowhere else I wanted to be.

Check back soon for Part 2 of the interview as Liam shares some of the extraordinary savings that Butterfly analytics have uncovered.

IBM on Software Defined Storage

I routinely follow a number of blogs by storage industry thought leaders. Among them is a usually insightful blog by EMC’s Chuck Hollis. Last Friday I read his post titled Software-Defined Storage – Where Are We? As Chuck described, the post was intended to explore “Where are the flags being planted? Is there any consistency in the perspectives? How do various vendor views stack up? And what might we see in the future?” The questions themselves captured my attention. First, they are great questions that everyone who is watching this space should want answered. Second, I wanted to see which vendors EMC was interested in comparing itself with. Notably missing from Chuck’s list was IBM, a vendor that has both a lot to say and a lot to offer on the subject of software defined.

I thought Chuck did a nice job in the sections of his post on Basic [Software Defined Storage] SDS Concepts and Towards a Superset of Characteristics. My only critique would be that he didn’t acknowledge some of the forward-leaning work being done in the space. For example, in the area of concepts he rightly observed of the past that “there is little consensus on what is software-defined storage, and what isn’t,” but he failed to acknowledge the important work by the team at IDC in providing the industry with an unbiased nomenclature and taxonomy for software-based storage. See my post from a couple of months back on How do you define Software-defined Storage? Chuck also suggested that “the required technology isn’t quite there yet — but there are all signs that it’s coming along very quickly. By next year, there should be several good products in the marketplace to concretely evaluate.” That may be true for EMC and the rest of the vendors he chose to talk about, but by the end of this post I hope you will understand that when it comes to IBM, Chuck’s statement is several years behind.

The aim of software-defined

Software-defined storage isn’t an end unto itself. It is a necessary piece in the evolution to a software defined environment (SDE), also referred to as a software-defined datacenter (SDDC). I like IDC’s definition of what this is: “a loosely coupled set of software components that seek to virtualize and federate datacenter-wide hardware resources such as storage, compute, and network resources and eventually virtualize facilities-centric resources as well. The goal for a software-defined datacenter is to tie together these various disparate resources in the datacenter and make the datacenter available in the form of an integrated service…” IBM is one of the few vendors working in all the areas of software-defined, and Jamie Thomas, Vice President and General Manager of Software Defined Systems, heads the division that coordinates that work.

Jamie thinks about SDE from the perspective of workloads and patterns of expertise that can help simplify operations, reducing labor costs and improving security. A software defined environment is also more responsive and adaptive as workloads expand from today’s enterprise applications to mobile, social, big data analytics and cloud. Her view is that open source and standards communities are crucial to the long-term viability of SDE. IBM’s work in software defined compute with the Open Virtualization Alliance and oVirt, our work in SDN with OpenDaylight, and our work in cloud with OpenStack are helping propel the construction of software defined environments.

IBM SDE Standards

IBM’s work in software defined storage

The words have morphed over time. What VMware did for Intel servers has been referred to as a hypervisor, as virtualization, and now is being called software defined compute to line up with the rest of the SDE vocabulary. The foundation of a software defined environment is, well, software that offers a full suite of services and federates physical infrastructure together to provide the basic commodity. In the case of VMware, the commodity is Intel megahertz. In the case of SDS, the commodity is terabytes.

IBM clients first began using these capabilities in 2003 with the IBM SAN Volume Controller software drawing its compute horsepower from commodity Intel processors and managing terabytes provided by federated disk arrays. That software base has since been renamed to the Storwize family software platform and given an expanded set of commodity engines to run on. Today, there are federating systems with no storage capacity of their own, systems with internal solid-state drives to speed the input/output (I/O) of other federated storage, and systems that carry their own serial attached SCSI (SAS) disk and flash capacity to augment other federated capacity. There are entry models, midrange models, enterprise models and even models that are embedded in the IBM PureSystems family converged infrastructure. For a more complete description of the suite of services offered, the breadth of physical storage that can be federated, and the I/O performance that can be enjoyed, see my post Has IBM created a software-defined storage platform? Over the last decade, this software platform has been referred to as virtualization, as a storage hypervisor, and now with a total capacity under Storwize software management on its way to an exabyte, we call it SDS v1.0.

IBM Software Defined Storage (SDS)

SDS v2.0

SDS v2.0 came along early in 2012 with the introduction of IBM SmartCloud Virtual Storage Center (VSC). Building on the successful base of the Storwize family software platform, VSC added a number of important capabilities.

  • Service catalog:  Administrators organize the suite of VSC storage services into named patterns – catalog entries.  Patterns describe workload needs in terms of capacity efficiency, I/O performance, access resilience, and data protection.  For example, a pattern for ‘Database’ might describe needs that translate to compressed, thin provisioned capacity on a hybrid flash and SAS pool, with a single direction synchronous mirror and load-balanced multi-path access. The beauty of the service catalog is that requestors (application owners or orchestrators as we’ll see shortly) don’t need to concern themselves with the details. They just need to know they need ‘Database’ capacity.
  • Programmable means of requesting services: VSC includes APIs that surface the service catalog patterns to portals and orchestrators. The questions that must be answered are quite simple: How much capacity do you need? In what service level do you need it? Who needs access? From there, storage-centric orchestration takes over and performs all the low-level, mundane tasks of satisfying the request. And it works on a wide variety of physical storage infrastructure. The VSC APIs have been consumed by an end-user accessible portal, SmartCloud Storage Access, and by higher level SDE orchestrators like SmartCloud Orchestrator. (A hypothetical sketch of a catalog pattern and request follows this list.)
  • Metering for usage-based chargeback: Service levels and capacity usage are metered in VSC. Metering information is made available to usage and cost managers like SmartCloud Cost Management so that individual consumers may be shown their consumption or charged for it. Because VSC meters service levels as well as usage, higher prices can be established for higher levels of SDS service. Remember IBM’s perspective: we are building out SDE, of which SDS is a necessary part. SmartCloud Cost Management follows the model, providing insight into the full spectrum of virtualized and physical assets.
  • Management information and analytics: When the challenges of day-to-day operations happen (and they do happen most every day), administrators need straightforward information surrounded by visually intuitive graphics and analytic-driven automation to speed decision making and problem resolution.
    SmartCloud Virtual Storage Center management and analytics
    Last year we introduced just this approach with SmartCloud Virtual Storage Center. I discussed it more thoroughly in my post Do IT managers really “manage” storage anymore? If you watch the news, you’ll know that IBM is leading a transformation toward cognitive computing. We’re not there yet with the management of SDS, but consider this scenario. You are an IT manager who has invested in two tiers of physical disk arrays, probably from different vendors. You have also added a third storage technology – a purpose-built flash drawer. You have gathered all that physical capacity and put it under the management of a software-defined storage layer like the SmartCloud Virtual Storage Center. All of your workloads store their data in virtual volumes that SmartCloud Virtual Storage Center can move at will across any of the physical disk arrays or flash storage. Knowing which ones to move, when, and where to move them is where SmartCloud Virtual Storage Center excels. Here’s an example. Let’s suppose there is a particular database workload that is only active during month-end processing. The analytics in SmartCloud Virtual Storage Center can discover this and create a pattern of sorts that has this volume living in a hybrid pool of tier-1 and flash storage during month end and on tier-2 storage the rest of the month. In preparation for month end, the volume can be transparently staged into the hybrid pool (we call it an EasyTier pool), at which point more real-time analytics take over, identifying which blocks inside the database are being most accessed. Only these are actually staged into flash, leaving the lesser-utilized blocks on tier-1 spinning disks. Can you see the efficiency?
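To picture the first two bullets, here is a hypothetical sketch of a service catalog pattern and the three-question request a consumer makes against it. The field names and the request function are illustrative stand-ins of my own, not the actual VSC API.

```python
# Hypothetical sketch of a storage service catalog and a capacity request.
# Field names and the request call are illustrative, not the real VSC API.

SERVICE_CATALOG = {
    "Database": {
        "capacity_efficiency": ["thin_provisioned", "compressed"],
        "performance_pool": "hybrid_flash_sas",
        "access_resilience": "multipath_load_balanced",
        "data_protection": "synchronous_mirror",
    },
    "E-mail": {
        "capacity_efficiency": ["thin_provisioned"],
        "performance_pool": "nearline_sas",
        "access_resilience": "multipath_failover",
        "data_protection": "daily_snapshot",
    },
}

def request_capacity(pattern_name, size_gb, host):
    """A requester only answers three questions: which pattern, how much, and who needs access."""
    pattern = SERVICE_CATALOG[pattern_name]
    # In a real system, orchestration would now create the volume, set up
    # replication, zone the SAN, and map the volume to the requesting host.
    return {"pattern": pattern_name, "size_gb": size_gb, "host": host, "services": pattern}

print(request_capacity("Database", size_gb=500, host="dbserver01"))
```

The point of the abstraction is what the requester never has to say: nothing about which array, which pool, or which replication technology satisfies the pattern.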

SDS v3.0

So where are we?

  • SDS v1.0 delivered. Software that offers a full suite of services and federates physical infrastructure.
  • SDS v2.0 delivered. A service catalog with a programmable means of accessing services, a portal and SDE cloud orchestration integration. Metering for usage-based chargeback and management information with analytics.

Where do we go from here? At IBM we’re busy opening up the Storwize family software platform for industry innovation, helping VSC become even more aware of application patterns, and progressing the notion of cognitive, analytics-driven decision making in SDS. Watch this space!

Users of IBM SDS speak

More than just theory and a point of view, IBM SDS is helping real customers. At the recent IBM Edge conference there were over 75 client testimonials shared, many of them about the benefits realized from using IBM SDS. I covered several of them in my post on Edge Day 2.

One of the coolest stories came earlier in the year at the IBM Pulse conference from IBM’s internal IT operations. IBM’s CIO manages 100 petabytes of data and, by leveraging SmartCloud Virtual Storage Center, was able to reduce costs by 50% with no impact to performance.

Did this help clarify IBM’s position in SDS?

Edge 2013 Day 2 – “I could have cried!”

In my post yesterday I mentioned that we heard the first of over 75 client testimonials being shared at IBM Edge 2013. Today, the client stories came fast and furious. Several caught my attention.

Sprint is a US telecommunications firm that has 90% of its 16 petabytes of SAN storage capacity under the control of software-defined storage – specifically the Storwize family software running on IBM SAN Volume Controller engines. Because of the flexibility of software-defined storage, Sprint was able to seamlessly introduce IBM FlashSystem capacity as a new tier of MicroLatency capacity and transparently move a call center workload to the new flash storage. The results were impressive: 45x faster access to customer records. That’s right, a 4,500% improvement!

eBay is both the world’s largest online marketplace and a company that offers solutions to help foster merchant growth. They are serious about open, collaborative solutions in their datacenters. When it comes to cloud, they use OpenStack. eBay implemented IBM XIV storage with its OpenStack Cinder driver integration and is now able to guarantee storage service levels to its internal customers.

Ricoh is a global technology company specializing in office imaging equipment, production print solutions, document management systems and IT services. All of their physical storage capacity is under the control of a Storwize family software-defined layer inside the IBM SmartCloud Virtual Storage Center. This enabled extreme efficiency, saving them 125 TB of real capacity and delivering a 40% cost reduction through tiering. As the Ricoh speaker left the stage, the IBM host asked an unscripted question, “Can you imagine running your IT without software-defined storage virtualization?” to which Ricoh responded, “No! It would be catastrophic.”

LPL Financial is the largest independent broker-dealer in the US. Their physical storage infrastructure was multi-vendor, isolated in islands, underutilized, and offered little administrative visibility. The inflexible nature of physical storage had tied workloads to certain disk arrays even though excess capacity might exist elsewhere in the datacenter. LPL implemented SmartCloud Virtual Storage Center (built on a Storwize family software-defined layer) for their most problematic areas in just three months – 3 months! The seamless workload mobility provided by this software-defined storage approach solved issues like performance incident resolution, islands of waste, and the headaches associated with retiring old physical arrays. The quote of the day came from Chris Peek, Senior Vice President of Production Engineering at LPL Financial: “It was so good I could have cried!” LPL continued by building a new datacenter with a 100% software-defined storage infrastructure using SmartCloud Virtual Storage Center. Using software layer capabilities like tiering, thin provisioning and real-time compression they were able to save an astounding 47% in total infrastructure.

Arkivum is a public archive cloud provider in Europe. They use tape media with the IBM Linear Tape File System as the foundation of an archive service that economically offers a 100% guarantee for 25 years. The thing that struck me is that, in a storage industry that speaks in terms of five-nines, Arkivum is combining cloud and tape with a 100% guarantee.

There were others too. Kroger is a grocery chain in the US. They implemented IBM FlashSystem and reduced latency tenfold for their Oracle Retail platform. And CloudAccess.net is a cloud service provider that needed to drive 400,000 I/Os per second. They replaced a bank of disk drives with a single IBM FlashSystem drawer at one-tenth the cost.

I have to say that all the focus on client outcomes is refreshing. Sure, Edge has plenty of discussion around IBM’s strategy and the innovative technology being announced this week. But I agree with Kim Stevenson, Chief Information Officer at Intel, who said “Organizations don’t buy technology, they buy benefits.”

I’m sure there were other client stories shared in sessions that I missed. Share your favorite outcomes below. Leave a comment!

Do IT managers really “manage” storage anymore?

Originally posted on May 15, 2013 as a guest blog on Service Management 360
 

A while back, I asked an IT manager in Europe what tools he uses to manage storage. His response changed the way I think about our mission as a supplier. He leaned back in his chair and with a grin on his face said, “Manage storage? I don’t manage storage. I herd storage.” As the conversation progressed, I listened to a story that has become quite familiar to most folks who are involved in the care and feeding of storage infrastructure:

  • A constant coming and going of physical disk arrays
  • Headaches associated with provisioning, scheduling data migration and the associated application outages
  • A never-ending series of reactionary events around backup, performance, replication, etc.
  • Disaster recovery tests to perform

And this IT manager was trying to do all that inside the constraints of a multi-vendor infrastructure that his CFO liked because it gave them bargaining power in their hardware infrastructure purchases. Sounds like a new job description – storage herder.

Today’s storage management crisis
In the time since that conversation, I’ve come to realize that IT is in a real crisis when it comes to storage administration. There is a generation of admins who grew up in a world of fast, but not overwhelming data growth, a world where a server was a server, a disk was a disk and managing them required a fairly good understanding of how all of it physically worked. These ‘storage scientists’ make great use of all the knobs and dials that we vendors expose to them in order to tune and tweak their environment. They take great time and great pride in “managing” storage. I know, because before coming to IBM, I spent the first 10 years of my career as one of those storage scientists. The IT manager I met with in Europe was one of them too. The crisis is that the world that trained these experts is rapidly evaporating.

Today’s world is marked by out-of-control, overwhelming data growth. Analysts have tried to describe the pace using words like avalanche, explosion, and tidal wave. You see the mental imagery. Whatever you call it, the reality is that data is growing faster than hardware vendors’ ability to increase the areal density of disk drives, meaning that from here on out, there is going to be more storage capacity coming in than going out.

Today’s world is also virtual. Words like software-defined have worked their way into common vocabulary. Servers aren’t servers; they are virtual machines that are elastic in horsepower and mobile. Tapes aren’t tapes; they are a deduplicated, replicated figment of the imagination that is stored on a disk. And increasingly, disks aren’t disks; they are thin provisioned, compressed virtual volumes that are replicated, snapshotted and mobile from tier-to-tier, vendor-to-vendor, and site-to-site. Virtualization has also dramatically increased pace. There is no longer a built-in physical governor on the speed of provisioning new workloads – no cables to pull and physical infrastructure to power up. Workloads, and the data they work on, can be built up and torn down almost at the speed of thought. And managing global data availability (disaster avoidance, recovery, discovery) for all this data is converging into a single idea. In this world, an administrator who attempts to manage as a storage scientist simply can’t keep up. He becomes a storage herder.

A new generation of storage admins
Another shift I am seeing with a lot of the clients I work with is that the traditional storage administrators – the storage scientist generation – aren’t the only people with storage responsibilities any more. They are being joined by a new breed of virtual environment, converged infrastructure and cloud admins who are responsible – but not nearly as prepared – for managing storage. I call them the iPod generation. These folks have been brought up with a whole different level of administrative expectation. They want to pick an outcome and have the “system” just take care of the details. They have no desire or expertise to deal with knobs and dials to tune an environment. And they value learning an interface approach once and then using it everywhere. These are the guys and gals who say “I like the ‘Apple’ interface across my laptop, tablet and phone” or “the ‘Google’ interface across my tablet, phone and glasses”. In IT, these folks want to learn the interface and use it for all things storage, regardless of what physical infrastructure vendor the CFO might be able to get a better deal from at the moment. I guess you could say that they want to deal in hardware-agnostic service level agreements.

So, IT has the storage scientists who, using knobs and dials, simply can’t keep up. They are scientists reduced to herders. And you can imagine the chaos if you were to hand the iPod generation a different command-line interface (CLI) for each vendor and type of storage device in the infrastructure.

An entirely new approach
IBM, with a lot of help from its clients, is addressing the problem with an entirely new approach to storage administration. Under the umbrella of IBM Design Thinking we are bringing together visually intuitive graphics and deep automation to create a storage administration approach that is so clear and straightforward that overall productivity climbs considerably.

One of the things I find most intriguing about this new approach to storage administration is that, for the first time in my career, social interaction was leveraged to engage clients at every phase of the development process. We call it transparent development. Through IBM Service Management Connect (which is powered by IBM Connections software)  the designers and developers at IBM engaged hundreds of clients, administrators, and business partners to understand needs, share designs, test implementations, get feedback, and ultimately create something that hasn’t been seen before in competitive offerings. You can get a feel for the public side of this interaction in the Tivoli Storage Operations Center community.

The thing that has stopped our clients in their tracks is that, because this is a software-defined approach to storage, what we’ve designed works equally well regardless of the physical storage infrastructure they choose. Choose infrastructure from EMC, NetApp, Dell, IBM, HP, Brocade, Cisco, Oracle StorageTek – it really doesn’t matter.  CFOs and IT managers have complete flexibility to get their best deal on hardware without affecting capability and productivity one bit. Sounds interesting huh?

Today’s IT manager is concerned with storage of two basic types of data.

  1. First is the data that supports the applications that run the business.
    SmartCloud Virtual Storage Center administration for primary data
    This data is characterized by the need for speed, mobility, accessibility and resiliency. The software-defined storage layer for this primary data takes care of things like provisioning (from the storage capacity, to the replication relationships, to the storage networks that tie it all together), auto-tiering, performance analytics and problem isolation, snapshotting and replication. The clients we work with who choose IBM SmartCloud Virtual Storage Center as the software-defined layer for managing primary data enjoy dramatically improved productivity and common capabilities regardless of their choice in physical storage infrastructure.

  2. Second is the copy data – the copies of primary data that are maintained for backups, for archives, for development and testing, for future analytics and so on.
    Tivoli Storage Manager Operations Center administration for copy data
    In sheer volume, copy data is now larger and growing faster than primary data, but it also has a significantly different set of needs. Copy data has the need for hyper efficiency, remote protection, and discoverability. The software layer for this ‘copy data’ takes care of things like capturing, deduplicating, compressing, encrypting, vaulting, and inventorying. Regardless of their chosen storage hardware infrastructure, the software layer for managing copy data is IBM Tivoli Storage Manager (TSM). The dramatically improved productivity is coming soon from the completely new TSM Operations Center. (A toy sketch of deduplication, one source of that hyper efficiency, follows this list.)
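As promised above, here is a toy illustration of deduplication, one of the reasons copy data can be stored so much more efficiently than primary data. It sketches the general content-hashing idea only; it is not TSM’s actual deduplication implementation.

```python
# Toy content-addressed deduplication: identical chunks are stored only once.
# Illustrative only; this is not TSM's actual deduplication implementation.
import hashlib

CHUNK_SIZE = 4096
store = {}                                # chunk hash -> chunk bytes

def backup(data: bytes):
    """Split data into fixed-size chunks and keep only chunks not already stored."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # stored once, no matter how many copies arrive
        recipe.append(digest)
    return recipe                         # enough to reconstruct this particular backup

nightly = b"mostly unchanged database pages " * 2000
recipe1 = backup(nightly)
recipe2 = backup(nightly)                 # the second copy adds no new chunks
print(f"chunks referenced: {len(recipe1) + len(recipe2)}, chunks stored: {len(store)}")
```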

Two visually intuitive and deeply automated interfaces – one for primary data and one for copy data – regardless of your choice in physical storage infrastructure. This is one of the use cases that clients we work with point to as making the idea of software-defined storage come alive.

I was recently asked “As IT managers shift toward a software-defined approach to managing storage, how well do the storage scientist skills adapt?” It’s a good question. I would like to hear from you. What do you think?

EMC ViPR: Breathtaking, but not Breakthrough

Last Monday, EMC announced ViPR as its new Software-defined Storage platform. Almost simultaneously, Chuck Hollis described it as ‘Breathtaking’ in his usually excellent blog. I must admit, one thing I routinely find breathtaking about EMC is their approach to marketing. They have a knack for being able to take unexceptional technology (or, as in this case, combinations of technology and theories about the future), and spin an extraordinarily compelling story. With all seriousness and without tongue in cheek… Nicely done EMC!

Chuck’s blog described ViPR in three parts. To a heritage EMC customer, these three concepts may seem revolutionary because, to-date, EMC hasn’t successfully offered this sort of technology. However, for clients of IBM, Hitachi, or other smaller vendors, the environment EMC hopes to create with ViPR will seem familiar because, in large part, it’s been evolving for years. Let’s look at the three parts one at a time.


The first ViPR idea Chuck describes is to help create “a better control plane for existing storage arrays: EMC and others”. To be clear, EMC is just getting started with ViPR, so initially the ‘others’ include only NetApp, but you can expect the list to expand if ViPR matures. Chuck is describing a software virtualization layer that discovers existing physical storage arrays and allows administrators to construct virtual storage arrays as abstractions across the multiple units. The ‘better control plane’ comes when the virtual array capabilities are surfaced via a storage service catalog that describes things like snaps, replication, remote sites, etc. Administrators are then able to make requests for these services, in turn driving an orchestrated set of provisioning steps. IBM clients over the last decade have come to understand that this first idea is extraordinarily powerful. Today, the IBM SmartCloud Virtual Storage Center helps clients create a software-defined abstraction layer over existing physical arrays from EMC and LOTS of others. Regardless of the brand, tier, or capability of your existing physical arrays, the virtual arrays are capable of snaps, replication, stretching a virtual volume across two physical sites at distance to facilitate active-active datacenters, thin provisioning, real-time compression, transparent data mobility, and so on. Administrators can describe named collections of services for different workloads — “here are the services ‘Database’ workloads need, and here are the different set of services ‘E-mail’ workloads need” — greatly simplifying provisioning. If you need help in understanding your unique data and its needs, IBM has developed consulting services to assist. Once service levels are defined and named, administrators simply specify a) what service level they need, b) how much capacity they need in that service level, and c) what machine needs access. Requests kick off an orchestrated workflow that performs all the mundane tasks of creating virtual volumes with the right services, provisioning the remote replication relationships if needed, zoning the SAN and masking the virtual volumes for secure access, configuring the host multi-pathing for access resiliency, and so on. Requests can be made by administrators via a visually intuitive GUI, or programmatically via REST APIs, an OpenStack Cinder plug-in, or deep integration with VMware vSphere Storage APIs. SmartCloud Virtual Storage Center also meters client capacity usage by service level. CIOs can effectively manage these and other IT costs with IBM SmartCloud Cost Management.
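For readers who like to see the programmatic path spelled out, here is a minimal sketch of what a service-level capacity request could look like over a REST interface. The endpoint URL, payload fields, and lack of authentication are hypothetical placeholders of my own, not the documented VSC API.

```python
# Hypothetical REST request for capacity at a named service level.
# The URL and JSON fields are illustrative placeholders, not the real VSC API.
import requests

payload = {
    "service_level": "Database",   # a) which catalog pattern
    "capacity_gb": 500,            # b) how much capacity
    "host": "dbserver01",          # c) which machine needs access
}

response = requests.post(
    "https://storage-portal.example.com/api/requests",  # placeholder endpoint
    json=payload,
    timeout=30,
)
response.raise_for_status()
print("Provisioning request accepted:", response.json())
```

Everything below those three answers, the volume creation, replication setup, SAN zoning and host mapping, is the orchestration’s job, not the requester’s.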

SmartCloud Virtual Storage Center visually intuitive GUI

The second ViPR idea Chuck describes is ‘changing how data is presented depending on a given application’s access needs’. What he is describing is a storage approach that layers access methods. In the case of ViPR, Chuck describes NFS as the base method and then other methods that could be layered on top, like object-over-NFS or HDFS-over-NFS. The SmartCloud Virtual Storage Center implements block storage as the base layer. Related offerings that use the same block code stack, like the IBM Storwize V7000 Unified and IBM SONAS, offer file-over-block and are looking forward to adding other object methods. This area is evolving rapidly, and I agree with Chuck’s speculation that storing a piece of data once and accessing it through multiple methods could be important in the future.

The third ViPR idea Chuck describes is ‘Storage Services For Cloud Applications’. In his blog, he’s wrestling with a great question. A decade ago, ‘server virtualization’ was a budding young concept. Today it is foundational to the way we do IT. CIOs have long since made their decisions on server virtualization and are now working to complete the virtual datacenter. We’ve found that with the servers handled, virtualizing the storage infrastructure is the focus in 2013. The question Chuck is wrestling with is “Is ViPR a modern interpretation of what we now mean when we say ‘storage virtualization’?” It’s certainly EMC’s modern interpretation, having tried before to virtualize physical storage with Invista (circa 2005) and VPLEX (circa 2010). At IBM, we started virtualizing storage in 2003. Today, that software stack and its ecosystem of integration with applications, server hypervisors, orchestrators, cloud stacks, and cost managers is implemented in thousands of datacenters. If nothing else, we’ve stayed focused on growing what works. In recent posts, I have explored how the industry defines software-defined storage, and whether it is a key to a successful private cloud. If EMC breaks tradition and sticks with ViPR for the long term, the words they are using in their marketing demonstrate they understand what ViPR needs to become if it wants to be a complete offering. However, as CIOs make decisions on software-defining their storage in 2013, I think they’ll find that the IBM SmartCloud Virtual Storage Center is already accomplishing for storage what server hypervisors have accomplished for servers.

SNW Spring 2013 recap

(Originally posted April 4, 2013 on my domain blog at ibm.com. Reposted here for completeness as I move to WordPress.com)
 

I’m just returning from the SNW Spring conference in Orlando. It seemed sparsely attended but my 5-foot tall wife of almost 28 years has always told me that dynamite comes in small packages (I believe her!).

As I noted in my last post, I was in Orlando to participate in a round table discussion on storage hypervisors hosted by ESG Senior Analyst Mark Peters. I was joined by Claus Mikkelsen – Chief Scientist at Hitachi Data Systems, Mark Davis – CEO of Virsto (now a VMware company), and George Teixeira – CEO of DataCore. Conspicuously missing from the conversation both at this SNW and at a similar round table held during the SNW Fall 2012 conference was any representation from EMC. More on that in a moment.

The session this time drew a crowd roughly three times the size of the Fall 2012 installment – a completely full room. And the level of audience participation in questioning the panel members further demonstrated just how much the industry conversation is accelerating. I was pleased to see that most of the discussion focused on use cases for what was interchangeably referred to as storage virtualization, storage hypervisors, and software-defined storage. Following are a few of the use cases that were probed.

Data migration was noted as an early and enduring use case for software-defined storage. Today’s physical disk arrays are capable of housing many TBs of data, often from MANY simultaneous business applications. When one of these physical disk arrays has reached the end of its useful life (the lease is about to terminate), the process of emptying the data from that old disk array onto a newer, more modern disk array can be time-consuming. The difficult part isn’t the volume of data, it’s the number of application disruptions that have to be scheduled to make the data available for moving. And if you happen to be switching physical disk array vendors, that can create related effort on each of the host machines accessing the data to ensure the correct drivers are installed. Clients we have worked with tell us the process can take months. That’s not only hard on the storage administration team, but it’s also wasteful because a) you have to bring in a new target array months ahead of time and b) both it and the source array remain only partially used during those months as the data is migrated. The economic value of solving this data migration issue is an early use case that has fueled solutions like IBM SAN Volume Controller (SVC), Hitachi Virtual Storage Platform, and DataCore SANsymphony-V. Each of these is designed to provide the basic mechanics of storage virtualization and mobility across most any physical disk array you might choose – all without disruption of any kind to the business applications that are accessing the data.

A quick side comment. While the data migration use case carries a strong economic benefit for IT managers (transparent migration from old to new disk arrays), it can just as easily be used to migrate from old to new disk array ‘vendors’. For the IT manager, this has the potential for even greater economic benefit because it creates the very real threat of competition among physical disk array vendors, driving cost down and service up. But for an incumbent disk array vendor, there’s not a lot of built-in motivation to introduce their client to such a technology. At SNW this week, it was suggested that this dynamic may be responsible for the relatively low awareness and deployment of storage virtualization technologies. Incumbent vendors are happy to keep their clients in the dark about software-defined storage and data migration use cases. Interestingly, almost 10 years after these technologies were first introduced, EMC (whose market share makes them the most frequent incumbent physical disk array vendor) is still only talking about this topic in the shadows of ‘small NDA sessions’. See Chuck’s Blog from earlier this week.

Flash storage ‘everywhere’ was identified as a more recent, and perhaps more powerful, use case. SNW drew a strong contingent of storage industry analysts from firms like IDC, ESG, Evaluator Group, Silverton Consulting and Mesabi Group. A consistent theme from the analysts I spoke with, as well as from the panel discussion, is that data- and performance-hungry workloads are driving an unusually rapid adoption of flash storage. Early deployments were as simple as adding a new ‘flash’ disk type into existing physical disk arrays, but now flash is showing up ‘everywhere’ in the data path from the server on down. The frontier now is in the efficient management of this relatively expensive real estate, whether it is deployed in disk arrays, in purpose-built drawers, or in servers. Flash is simply too expensive to park whole storage volumes on, because a lot of what gets stored isn’t frequently accessed and would be better stored on something slower and less expensive. This is where the basic mechanics of storage virtualization and mobility from the data migration use case come in. At IBM, we’ve evolved the original SVC capabilities to now couple the basic mechanics with analytics and automation that guide how and when to employ the mechanics most efficiently. The evolved offering, SmartCloud Virtual Storage Center, was introduced last year. Consider this scenario. You are an IT manager who has invested in two tiers of physical disk arrays. You have also added a third disk technology – a purpose-built flash drawer (perhaps an IBM TMS RamSan). You have gathered all that physical capacity and put it under the management of a software-defined storage layer like the SmartCloud Virtual Storage Center. All of your application data is stored in virtual volumes that SmartCloud Virtual Storage Center can move at will across any of the physical disk arrays or flash storage. Knowing which ones to move, when, and where to move them is where SmartCloud Virtual Storage Center excels. Here’s an example. Let’s suppose there is a particular database-driven workload that is only active during month-end processing. The analytics engine in SmartCloud Virtual Storage Center can discover this and create a pattern of sorts that has this volume living in a hybrid pool of tier-1 and flash storage during month end and on tier-2 storage the rest of the month. In preparation for month end, the volume can be transparently staged into the hybrid pool (we call it an EasyTier pool), at which point more real-time analytics take over, identifying which blocks inside the database are being most accessed. Only these are actually staged into flash, leaving the lesser-utilized blocks on tier-1 spinning disks. Do you see the efficiency? The icing on the cake comes when all this data is compressed in real time by the storage hypervisor. This kind of intelligent analytics – directing the mechanics of mobility – from a software-defined layer is critical to economically deploying flash.
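To picture how little flash you actually need when analytics choose the blocks, here is a toy sketch that promotes only the hottest blocks of a volume into a small flash budget. It illustrates the general idea only; it is not EasyTier’s actual placement algorithm, and the skewed I/O pattern is simulated.

```python
# Toy block-heat tiering: promote only the most-accessed blocks into scarce flash.
# Illustrative only; not EasyTier's actual placement algorithm.
from collections import Counter
import random

BLOCKS_IN_VOLUME = 10_000
FLASH_BUDGET = 500                      # flash can hold only 5% of the volume

heat = Counter()
random.seed(42)
for _ in range(100_000):                # simulate a skewed I/O pattern
    block = int(random.paretovariate(2)) % BLOCKS_IN_VOLUME
    heat[block] += 1

hot_blocks = {b for b, _ in heat.most_common(FLASH_BUDGET)}
served_from_flash = sum(count for b, count in heat.items() if b in hot_blocks)

print(f"{len(hot_blocks)} of {BLOCKS_IN_VOLUME} blocks promoted to flash")
print(f"{served_from_flash / sum(heat.values()):.0%} of I/O now served from flash")
```

Because real-world I/O is heavily skewed, a small slice of flash capacity ends up absorbing the large majority of the I/O, which is exactly the economics the analytics are chasing.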

Commoditization of physical disk capacity. YYYYiiiikkkkeeeessss!!! One of the more insightful observations offered by panel members, including VMware, was that if you follow the intent of a software-defined storage layer to its conclusion, it leads to a commoditization of physical disk capacity prices. From a client perspective, this is welcome news, and really, it’s economically required to keep storage viable. Think about it: data is already growing at a faster pace than disk vendors’ ability to improve areal density (the primary driver behind reduced cost), and the rate of data growth is only increasing. Intelligence, analytics, efficiency, mobility… in a software-defined storage layer will increase in value, freeing IT managers to shift, en masse, toward much lower-cost storage capacity.

Another quick side comment. With EMC still lurking in the shadows on this conversation and VMware agreeing with the ultimate end state, it seems the two still have some internal issues to resolve. I don’t fault them. It’s a sobering thought for any vendor who has a substantial business in physical disk capacity. But at least for the two disk vendors represented on this week’s SNW panel, we are actively engaged in helping clients achieve the necessary end goal.

The conversation continues. Check out the blog by Kate Davis at HP, How do you define software-defined storage?

Join the conversation! Share your point of view here. Follow me on Twitter @RonRiffe and the industry conversation under #SoftwareDefinedStorage.

Storage hypervisor round table at SNW Spring 2013

(Originally posted March 27, 2013 on my domain blog at ibm.com. Reposted here for completeness as I move to WordPress.com)
 

Greetings! And thank you for taking a moment to peruse my first post as an independent blogger. A short introduction is in order.

I am coming up on my 27th anniversary in the storage industry having held positions as both a consumer and a peddler and in roles spanning from administrator, to thought leader, to strategic planner, to senior manager. Most recently I have served as the Business Line Manager for IBM Storage Software. This blog – The Line – is for the open exchange of ideas relating to the storage software and systems that come together to solve the enduring challenges we all face in taking care of the world’s data.

Back at the Storage Networking World Fall 2012 conference, I participated in a round table on storage hypervisors hosted by ESG Senior Analyst Mark Peters. I was joined by Claus Mikkelsen – Chief Scientist at Hitachi Data Systems, Mark Davis – CEO of Virsto (now a VMware company), and George Teixeira – CEO of DataCore. Following the conference, Mark Peters posted a very nice series of three video blogs with perspective from the round table participants. They are worth a listen.

Part 1

Part 2

Part 3

The discussion is continuing at SNW Spring 2013 at Rosen Shingle Creek in Orlando, Florida. The panel discussion “Analyst Perspective: The Storage Hypervisor: Myth or Reality?” will happen on Tuesday, April 2 at 5:00 pm EDT.

With the mechanics of storage virtualization being offered by IBM and Hitachi for 10 and 9 years respectively, EMC joining the list almost 3 years ago, VMware’s acquisition of Virsto earlier this year, and talk of software-defined everything, the conversation around storage hypervisors is heating up and that’s been keeping us very, very busy. As we prepare for the round table next week, I thought it worthwhile to offer a point of view on storage hypervisors.

The fuel behind the storage hypervisor conversation is use cases – beyond being a cool technology, how does it contribute to helping me solve those enduring challenges we all face in taking care of the world’s data?

Perhaps the most obvious expectation is improved efficiency and data mobility. The basic idea behind hypervisors (server or storage) is that they allow you to gather up physical resources into a pool, and then consume virtual slices of that pool until it’s all gone (this is how you get the really high utilization). The kicker comes from being able to non-disruptively move those slices around. In the case of a storage hypervisor, you can move a slice (or virtual volume) from tier to tier, from vendor to vendor, and now, from site to site all while the applications are online and accessing the data. This opens up all kinds of use cases that have been described as “cloud”. One of the coolest is active-active datacenters. Each year almost all of the tropical storm activity in the Atlantic Ocean happens between June and November. If you operate a datacenter near the Atlantic coast, and if you have implemented both a server hypervisor (let’s say VMware vSphere for your Intel servers and IBM PowerVM for your Power systems), and a storage hypervisor (let’s say IBM SmartCloud Virtual Storage Center), then here’s how you might react to a tropical storm in the forecast: “Hey, the hurricane is coming, let’s move operations to our active-active datacenter further inland…” IBM SmartCloud Virtual Storage Center in a stretched-cluster configuration allows you to access the same data at both locations giving you the ability to do an inter-site VMware vMotion and PowerVM Live Partition Mobility (LPM) move – non-disruptively. IBM and its Business Partners have been helping hundreds of clients implement this sort of stretched-cluster configuration all over the world for the last 5 years.

But storage hypervisors are more, much more than just virtual slices and data mobility. We’re driving cost out of the equation. Sure, we’re getting high utilization from allocating virtual slices, but are we being as smart as we could be about allocating those slices? A good storage hypervisor helps you be smart.

  • Thin provisioning: You have a client that asks for 500GB of new capacity. You’re going to give it to him as thin-provisioned virtual capacity, which is a fancy way of saying you’re not going to actually back it with real physical storage until he writes real data on it. That helps you keep cost down. (A toy sketch of this idea follows this list.)
  • Compression: Same guy also asks to keep several snapshot copies of his data for recovery purposes. You’re going to start by giving him thin provisioned capacity for those snapshots, but you’re also going to compress whatever data those snapshots produce – again adding to your efficiency. For that matter, you’re going to compress his source data too.
  • Agnostic about vendors: Because you’re getting your storage services from a storage hypervisor (software-defined storage), you have the freedom to shift the physical storage you operate from all tier-1 to a more efficient mix of lower tiers, and while you’re doing it you can create a little competition among as many disk array vendors as you like to get the best price / support.  
  • Smart about tiers: If you shut your eyes real tight and think about the concept of a “virtual” disk that is mobile across arrays and tiers, you’ll quickly start asking questions about having the storage hypervisor monitor the utilization and response of your physical hardware infrastructure, watch for I/O patterns on blocks within those virtual disks, and apply some analytic intelligence toward moving the right data to the right tier to both meet requested SLAs and optimize utilization of your hardware infrastructure. This is especially important with flash showing up in multiple places in the infrastructure (in arrays, in the network, in the server). You simply won’t be able to manage all that with a tier-management system that is tied to an array. You need…dare I say it…a software-defined storage layer (a storage hypervisor) that includes both the raw mechanics of virtualization and the analytics to determine when and how to best use the mechanics.
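As referenced in the first bullet, here is a toy illustration of the thin provisioning idea: physical capacity is consumed only when data is actually written. It is a sketch of the concept, not any vendor’s implementation.

```python
# Toy thin-provisioned volume: physical blocks are consumed only on first write.
# Illustrative only; real storage hypervisors add compression, snapshots, and more.

class ThinVolume:
    BLOCK_SIZE_GB = 1

    def __init__(self, virtual_size_gb):
        self.virtual_size_gb = virtual_size_gb   # what the client was promised
        self.allocated = {}                      # block index -> data, grows on demand

    def write(self, block_index, data):
        if block_index >= self.virtual_size_gb // self.BLOCK_SIZE_GB:
            raise IndexError("write beyond virtual capacity")
        self.allocated[block_index] = data       # physical capacity is consumed here

    @property
    def physical_usage_gb(self):
        return len(self.allocated) * self.BLOCK_SIZE_GB

vol = ThinVolume(virtual_size_gb=500)            # the client sees 500 GB immediately
for block in range(40):                          # but has only written 40 GB so far
    vol.write(block, b"application data")

print(f"virtual: {vol.virtual_size_gb} GB, physical: {vol.physical_usage_gb} GB")
```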

To truly enable a hypervisor – in servers or storage – it’s important that the hypervisor not be dependent on the underlying physical hardware for anything except capacity (compute capacity in the case of a server hypervisor like VMware, storage capacity in the case of a storage hypervisor). Think about it… Wouldn’t it be odd to have a pair of VMware ESX hosts in a cluster, one running on IBM hardware and one on HP hardware, and be told that you couldn’t vMotion a virtual machine between the two because some feature of your virtual machine would just stop working?  If you tie a virtual machine to a specific piece of hardware in order to take advantage of the function in that hardware, it sort of defeats the whole point of mobility. The same thing applies to storage hypervisors. Virtual volumes that are dependent on a particular physical disk array for some function, say mirroring or snapshotting for example, aren’t really mobile from tier to tier or vendor to vendor any more.

But it’s more than just a philosophical issue; there’s real money at stake. The reason so many datacenters have an overabundance of tier-1 disk arrays on the floor is because, historically, if you wanted to take advantage of things like thin provisioning, application-integrated snapshots, robust mirroring for disaster recovery, high performance for database workloads, access to flash storage, and so on, you had to buy tier-1 ‘array capacity’ to get access to these tier-1 ‘storage services’ (did you catch the subtle difference?). Now, I don’t have anything against tier-1 disk arrays (my company sells a really good one). In fact, they have a great reputation for availability (a lot of the bulk in these units is sophisticated, redundant electronics that keep the thing available all the time). But with a good storage hypervisor, tier-1 ‘storage services’ are no longer tied to tier-1 ‘array capacity’ because the service levels are provided by the hypervisor. Capacity…is capacity…and you can choose any kind you want. Many clients we work with are discovering the huge cost savings that can be realized by continuing to deliver tier-1 service (from the hypervisor), only doing it on lower-tier disk arrays. We routinely see clients shift their mix of ‘array capacity’ from 70% or 80% tier-1 to 70% or 80% lower-tier arrays while continuing to deliver tier-1 ‘storage services’ to their data.
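To see why that shift in capacity mix matters financially, here is a back-of-the-envelope calculation using hypothetical per-terabyte prices of my own; the ratio between the tiers, not the absolute numbers, is the point.

```python
# Back-of-the-envelope blended storage cost, with hypothetical $/TB prices.
TIER1_COST_PER_TB = 3000        # hypothetical tier-1 array capacity price
LOWER_TIER_COST_PER_TB = 1000   # hypothetical lower-tier array capacity price
TOTAL_TB = 1000

def blended_cost(tier1_fraction):
    tier1_tb = TOTAL_TB * tier1_fraction
    lower_tb = TOTAL_TB - tier1_tb
    return tier1_tb * TIER1_COST_PER_TB + lower_tb * LOWER_TIER_COST_PER_TB

before = blended_cost(0.80)     # 80% of capacity on tier-1 arrays
after = blended_cost(0.20)      # 20% tier-1; tier-1 *services* now come from the hypervisor

print(f"before: ${before:,.0f}  after: ${after:,.0f}  savings: {1 - after / before:.0%}")
```

With these made-up prices the capacity bill drops by close to half, while the service levels the applications see are unchanged because they come from the software layer.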

Join the conversation! Share your point of view here. And if you are going to be at SNW next week, come by and listen to the round table. I would love to meet you. Follow me on Twitter @RonRiffe