Cloud Lessons Learned
Since the birth of the phrase “cloud computing” about 12 years ago, we have seen many papers and articles covering it with similar subtitles or taglines: “Cloud Challenges and Opportunities,” “Cloud Computing: Myths and Realities,” or something along those lines. This author is admittedly guilty as charged. Responding to another call for papers on the benefits, barriers, risks, and so forth, of the cloud is therefore a challenge: what can be useful to the reader that has not been said already?
This article tries to take a very pragmatic viewpoint: what are the things we have learned during this journey? What do most reasonable analysts and users now agree on, as opposed to questions on which the jury is still out? What should you spend time worrying about, and what should you consider settled, for good or for bad? Finally, with various lessons learned, what should you educate your managers or clients about, so they don’t waste their time or yours?
My answers to those questions have been influenced by several things:
A historical perspective: utility computing is as old as mainframe timesharing — and was predicted by John McCarthy as early as 1961.
Cloud services are a very broad domain; we use many of them as individuals without much hesitation.
What I have heard from my consulting clients.
My work since 2011 within the Cloud Standards Customer Council (CSCC), recently rebranded as the Cloud Working Group of the Object Management Group (OMG), which I cochair.
With this background in mind, I will attempt to provide guidance on the following topics:
The benefits of cloud adoption — in particular, is cost reduction a real argument and what are the other justifications?
What is the impact of the cloud on how IT relates to the rest of the business?
How do you address security in the cloud?
How do you measure success?
How do you select a cloud strategy given the increasing variety of deployment options?
Cost Shifting, Not Cost Reduction
One of the original arguments made to justify “going to the cloud” was that it would significantly reduce IT costs. We had heard this before; it was one of the major arguments for outsourcing in the 1990s. Cloud vendors emphasized that argument because it was an obvious way to sell their offering. Some customers really believed it, while others may have doubted it, but cynically thought that “it may or may not be true, but that’s the only argument I can make to sell this to my boss or to the board.”
We have learned that the impact of the cloud on IT costs is more complex than a simple reduction. Over the total lifetime of a system or application, renting the service in the cloud could very well cost more than it would to operate it on premises. However:
The nature of the cost is different. Instead of a capital expenditure (CAPEX) that is depreciated over three or five years, it is a “pay per use” monthly cost. Thus, there is a risk that you will keep paying a monthly fee beyond the point where the two cost curves cross, but frankly, how often have we used equipment or software beyond its depreciation lifetime?
Depreciation is a nice accounting trick, but you still need to pay the price up front. For companies with cash flow limitations, paying by the month is a lot easier to manage.
The cloud reduces or eliminates the risk of ending up with costly unused (or underutilized) systems. If you want or need to get off a particular cloud service, you can stop paying, as long as you have ensured that the contract had reasonable termination clauses.
In fact, the cloud has completely upset the traditional total cost of ownership calculation. The initial purchase or licensing costs are low or zero (though you still have to count the labor costs incurred to adopt the solution, including such things as data migration and user training), while the recurring costs are higher than the traditional 15% or 18% annual support fees charged for on-premises systems.
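The crossover point between the two cost models can be sketched with a simple break-even calculation. This is only an illustration; the function and all figures below are hypothetical, not vendor pricing:

```python
# Hypothetical break-even comparison: on-premises CAPEX plus annual support
# versus a cloud pay-per-use monthly fee. All figures are illustrative.

def months_to_breakeven(capex, annual_support_rate, monthly_cloud_fee):
    """Return the first month at which cumulative cloud spend exceeds
    cumulative on-premises spend, or None if it never does within 10 years."""
    for month in range(1, 121):
        onprem = capex + capex * annual_support_rate * (month / 12)
        cloud = monthly_cloud_fee * month
        if cloud > onprem:
            return month
    return None

# On-premises: $100,000 up front plus 18% annual support.
# Cloud: $3,500 per month, no up-front cost.
print(months_to_breakeven(100_000, 0.18, 3_500))  # 51
```

Under these made-up numbers, the cloud becomes the more expensive option only after month 51 — beyond the typical three-year depreciation horizon, which is exactly the point made above.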
You should explain to decision makers that the key effects of cloud adoption are to shift costs from CAPEX to operating expenses (OPEX) and to reduce the risk of write-offs of unused assets — which is likely to be worthwhile even if the total lifetime cost of a solution is not necessarily lower.
It’s About Agility
So if cost reduction is not the key motivation or benefit of going to the cloud, what other key factor(s) should genuinely influence that decision?
We have learned that a big part of the answer is agility. Call it flexibility or scalability if you want, but “agility” is a bit broader and evokes the principles of the Agile movement. The cloud allows you to try something, fail, and move on to something else with relatively less severe consequences. The cloud does not require you to calculate with high accuracy the amount of resources you will need, or the length of time you will need them. You should be able to scale up and down, to add and remove services, to get the latest updates, and so on — without having to do the work yourself.
Interestingly, while we can be reluctant to use agility or flexibility as a primary motivating factor in the enterprise, it is exactly what we consider on a personal basis when we buy a new smartphone and a communications plan for it. Think about this: I would never have dreamed of paying $750 for a phone — yet I will fairly happily pay $25 per month for three years instead. During that time, I know that I can trade the phone in for a new model, add or remove services, and so forth.
I am not arguing that you should lightly choose the first solution you hear without studying your requirements. After all, switching from one CRM solution in the cloud (to take an example) to another one is costly in terms of configuration, data migration, and learning curve. But choosing the wrong system, or sizing it inaccurately, is no longer a five-year sentence if it is cloud-based.
A primary business justification for “going to the cloud” should be the agility it gives the enterprise to adapt to change through faster IT sourcing and the ability to scale up or down, or to change solutions, whenever necessary.
The Cloud and “Shadow IT”
The IT department used to own the keys to the computing resources, literally and figuratively. The data center was the crown jewel of IT. To most users, it was impressive and mysterious, and a symbol of the relationship between IT and its users.
Because users in a line of business (LOB) had to go through the corporate IT department to obtain any IT resources, they had to follow the process defined by IT. This made the CIO or IT manager both powerful and resented. Few IT people understood the users’ plight of preparing justifications, listening to technical jargon they did not understand, and sitting in countless meetings until finally (i.e., after their project had been delayed by lack of the required capability) receiving a deliverable that may or may not have been what they needed. And then they got the bill.
What we have learned is that the cloud has enabled a shadow IT to emerge. That sounds scary (mostly to the IT people), right? But we have also learned that shadow IT is not totally a bad thing, as long as there is communication, coordination, architecture, and governance. Let’s discuss this more specifically.
Shadow IT means that each LOB, or each function (e.g., HR, marketing, sales, field service) is making its own decisions and paying its own costs for certain IT capabilities that it deems necessary, without going through the centrally controlled process on which we just heaped sarcasm. As a result, the agility benefit of cloud solutions can now directly and immediately benefit end users. Since many of these departments do not have the skills needed to perform a good study of their requirements and carefully select a solution, the decision process has shortcomings and can border on the arbitrary. Here are three typical ways in which a shadow IT solution is selected:
A department employee who seems to be pretty good with technology (often a junior one) is told to research and propose a solution.
A manager who has used a certain solution at home — for his homeowners’ association, say, or heard about it from a buddy at the gym — decides that it must be good enough for his business need (that’s how so many confidential documents end up being stored in Google Docs).
A consultant is hired to do the study. A very cheap consultant is chosen because the cost of this study needs to fly below the financial controls radar. In terms of the suitability of the recommendation, you get what you pay for.
What is typically not done is to go to the CIO and say, “Look, we need a solution for X, and we need it quickly. We’re not willing to go through a protracted process, and we heard that you guys typically take way too long. But we do want to make sure that what we choose will not create a mess; we want to make certain we can exchange data with other systems — even though we may not yet know what those are; we realize that we may need some form of support at some point; and we want to remain reasonably good friends. Will you help us achieve those goals? Call it a ‘proof of concept’ if that sounds better. And guess what, you’re even going to learn something useful in the process, which could be applied elsewhere in the organization.”
Seen this way (i.e., very optimistically), shadow IT projects don’t quite deserve their name; they are no longer happening in complete darkness. They try to achieve a balance: work fast without having to jump through bureaucratic hoops, but not risk rejection by the host organism.
Some progressive CIOs understand the need to monitor, facilitate, and even embrace these processes rather than ignore or fight them. One way to do this (usually in a large enterprise because of the resource allocation that it requires) is to create a small “rapid reaction team” of IT specialists who work in quasi-startup mode. Their job is to quickly build prototypes for the internal clients, trying free or cheap solutions in the cloud, allowing decisions to be made in the course of days or weeks, not months, while keeping the IT organization informed and ensuring that the selected solutions can be integrated and supported. A key challenge is to protect this team from being “recaptured” inside mainstream projects when a resource crunch or an emergency happens.
If IT does not embrace agility in delivering solutions to the business, the business will go ahead and select solutions in the cloud without involving IT. But if the business procures such solutions on its own, integration will become very difficult.
Follow the Data
One of the key reasons organizations give for staying away from public clouds is security, and specifically the risk that unauthorized parties will gain access to data.
From a technology perspective, placing data in a public cloud may actually improve security rather than weaken it. There are four simple reasons for this:
A cloud service provider (CSP) is likely to have many more security specialists than any one of its clients and can therefore do a better job at protecting its infrastructure and reacting to an attack.
If a CSP suffers a damaging attack, the business consequences can be devastating. Therefore, a CSP is even more highly motivated to protect its infrastructure than its clients are to protect theirs.
Multi-tenancy, which is a usual property of a public cloud, makes it harder to find a particular client’s data. If I stored my data on premises, attackers could find my IP addresses through the domain name system and would know exactly where to attack. If my documents are in a public cloud instead, attackers must find a needle in a haystack.1
Most security breaches are committed by insiders. Within a company, it is relatively easy to copy data without leaving fingerprints. Retrieving data from a public CSP, however, is likely to leave a trace.
Statistics about security breaches suffered by public CSPs are misleading. Companies prefer to keep quiet about internal security incidents, which means that claims of higher security for on-premises infrastructure are suspect. (The same arguments, by the way, can be made to debunk the claims that data is less available when placed in the cloud.)
However, we need to balance the technology arguments with the management perspective. Indeed, a combination of lack of education about the above considerations and issues of risk management, regulatory compliance, and legal exposure may result in staying away from the cloud. Personal survival may also come into play. If there is a successful attack against a bank and credit card numbers are stolen, the CEO can hope to escape personal blame (often by firing the CIO immediately after the initial mop-up effort). But if the CEO approved a proposal to move to the cloud, and a similar attack against the CSP results in a loss of confidential information, then the CEO’s head is also likely to roll.
The punitive measures included in the General Data Protection Regulation (GDPR) of the European Union (EU) have heightened, for managers of large companies, fears about losing control of data. If data contains personal information about citizens of the EU, a company may be fined as much as 4% of its annual revenue (or €20 million, whichever is higher) if this data is leaked “intentionally or negligently.” Only if a data leak could not reasonably have been expected, by virtue of having implemented strict security measures such as those detailed in Article 32, can the organization avoid the most severe penalties.
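The penalty ceiling just described (4% of annual revenue or €20 million, whichever is higher) reduces to a one-line calculation. The revenue figures below are purely illustrative:

```python
# Maximum GDPR fine for the most serious infringements: the higher of
# 4% of annual revenue or EUR 20 million. Revenues below are made up.

def max_gdpr_fine(annual_revenue_eur):
    return max(0.04 * annual_revenue_eur, 20_000_000)

# A company with EUR 2 billion in revenue faces up to EUR 80 million;
# a firm with only EUR 10 million in revenue still faces the EUR 20 million floor.
print(max_gdpr_fine(2_000_000_000))  # 80000000.0
print(max_gdpr_fine(10_000_000))     # 20000000
```

Note that the “whichever is higher” clause means small companies are exposed to a fine that can exceed their entire annual revenue.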
Other countries are now considering the adoption of measures similar to those of the GDPR, which is often considered “model legislation.”2 This is the case in Canada with the Personal Information Protection and Electronic Documents Act (PIPEDA). And while it is unlikely to become law in its exact current form, a bill was recently introduced in the US Congress that would impose not only GDPR-like penalties but also jail time on executives of companies that fail to protect consumers’ personal data.
As if the issue of personal data protection were not difficult enough, we need to superimpose on it the problem of data residency. This term refers to the set of issues and challenges posed by the location of data — especially if it ends up located in (or even just passing through) a jurisdiction that exposes it to greater risks, including legal demands for access, prohibition of strong encryption, or state-sanctioned spying.3
Rational decisions about using or not using a public cloud are made more difficult by the fact that some IT managers will exaggerate the risks in order to preserve their fiefdom. Claiming that a public cloud is inherently less secure is a way to preserve the data center and staff from which they derive their status.
Much of what is feared about security in the cloud is a myth. Whether to place data outside of the enterprise or not is ultimately a cross-functional risk management decision, involving business management (including the legal department), IT, and the CSP itself, which must specify in the cloud service agreement the security measures it has deployed and its commitment to prompt reporting and remediation of attacks.
Measuring the Impact
It should be clear by now that I do not propose to measure the success of a cloud transformation strategy through cost savings, since the nature of the costs is changing, and the ultimate gain or loss can only be assessed over the entire lifetime of an application or system.
We have learned that security and availability metrics (e.g., number of incidents, time to repair) can be improved rather than degraded by a move to the cloud, but that the issue is clouded in myth. Therefore, it seems essential to firmly base this discussion on facts.
Baseline metrics about the “as is” environment are often missing. Before embarking on a cloud transformation, an organization should have data about its current on-premises performance levels, especially in terms of security and reliability. Only then will it be able to determine later whether the cloud provides better results — allowing fact-based decisions to change policy or to change providers.
Since we highlighted, early on, the importance of agility, it must be reflected in metrics (and a baseline measurement), such as the following:
Time taken (including management and administrative tasks) to upgrade IT resources (e.g., to add storage space)
Number of incidents related to lack of scalability (e.g., degraded operations due to a saturated resource)
Time required to fulfill an internal customer request, from the initial study of the requirement until it is satisfied
User satisfaction with the IT sourcing process, as measured through surveys
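As a minimal sketch of how such a baseline might be computed, the request-fulfillment metric above can be derived from a simple log of request open and close dates. The records below are hypothetical:

```python
# Baseline the "time to fulfill an internal customer request" metric
# from a hypothetical log of (opened, fulfilled) dates.
from datetime import date
from statistics import mean, median

requests = [  # illustrative records only
    (date(2018, 1, 8),  date(2018, 3, 2)),
    (date(2018, 2, 1),  date(2018, 2, 20)),
    (date(2018, 4, 16), date(2018, 7, 30)),
]

durations = [(done - opened).days for opened, done in requests]
print(f"mean: {mean(durations):.1f} days, median: {median(durations)} days")
```

Recording both the mean and the median matters: a few pathological requests can inflate the mean, and the gap between the two figures is itself a useful baseline to revisit after a cloud migration.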
A well-managed cloud adoption program should lead to improvements in all these metrics — unless IT was doing a fabulous job on all counts, which, frankly, is very rare.
Establishing goals for these metrics, before a cloud migration, should be possible in many cases — and a CSP can help. It should have statistics from its other customers, and its service-level agreement should specify such things as how long it takes to bring a new server or a new disk online, or how many hours of downtime are needed to upgrade an application.
Measure what matters to the business, not only to IT. And do not compare the performance of a CSP to ideal on-premises statistics of 100% reliability and 0% security incidents, because that’s not the truth.
The Increasingly Complex Cloud Deployment Options
At the beginning of the cloud era, things were relatively simple. Thanks to the National Institute of Standards and Technology (NIST) Cloud Computing Reference Architecture from 2011, we could divide the cloud world into three service models (software as a service [SaaS], platform as a service [PaaS], and infrastructure as a service [IaaS]) and four deployment models (public, private, hybrid, and community). Most initial adoptions focused on public clouds, and either on SaaS or IaaS — none of which required a great deal of technical complexity to adopt.
In 2018, things have become much more complex. Here are some things we have learned since NIST’s original work:
Hybrid clouds and multiclouds have become much more frequent than we first expected, and they require significant tooling in order to manage the complex environment they create.
To avoid being locked into a relationship with a single provider, ways to package the payload (especially in the case of applications; data storage is easier) have been developed to achieve portability across providers.
We now have a proliferation of deployment technologies: containers, container as a service (CaaS), cloud-native applications, microservices, function as a service (FaaS), bare metal servers, virtual machines (VMs), nested VMs … the list seems to increase each quarter.4
This evolution means that if an organization is migrating to the cloud some of its internal applications (or legacy commercial ones that are not delivered in the cloud by the vendor), it needs people with new technical knowledge to either package or refactor these applications. While retraining existing IT personnel may be possible, the learning curve can be steep and not everyone can climb it. In the short term, an IT organization probably needs to hire specialized consultants or contractors, which does not come cheap.
Do not underestimate the pace of evolution of cloud technologies. If no one in the organization watches (and understands) the new cloud delivery technologies, you may choose the wrong one, or you may be misled by a vendor that will fail to mention a better option because the vendor does not offer it.
Conclusion: Cloud Is the New Normal
The “accelerating pace of IT” has become a trite phrase. We should be quite used to these ever-shortening cycles of transformation, so it should not be a surprise that in the course of a decade the cloud has changed the landscape of IT so quickly for so many organizations.
It is now obvious that people who initially doubted the durability of the cloud phenomenon were wrong: the cloud has impacted business and IT much more completely than even the optimists thought possible. McCarthy’s prediction of an age of “utility computing” came true half a century after it was made. Even with some real risks and persistent misunderstandings (what I have described as myths), the cloud is clearly the “new normal” of IT management.
If your business is not to supply computing resources, then owning such resources is not important — it is even a distraction. Most companies don’t operate their own power generator or water treatment plant. Except in some very specific cases (e.g., real-time constraints, massive data access rates), why should they operate their own servers?
What is important, instead, is managing this environment and the resulting relationships. The IT organization becomes the broker in charge of supplying appropriate and scalable resources in a timely manner to its internal customers. Thus, in conclusion, we have learned that the key capabilities of IT in the cloud era will not be software development or system administration, but the ability to:
Elicit and understand user requirements
Keep track of technology and market trends
Implement a solid (but rapid) sourcing process
Manage relationships with providers
Measure performance and service levels and react to changes
1 This argument was made as far back as July 2010 by my late Cutter Consortium colleague Mitch Ummel; see: “Cloud Computing: Separating the Hype from the Reality.”
2 Remarks by Satya Nadella, Chairman of Microsoft, during an interview at the Viva Technology conference in Paris, 24 May 2018. From the author’s notes.
3 The nature of the risks, particularly their potential negative impact on the cloud industry, are documented in a paper by the Cloud Standards Customer Council (CSCC); see: “Data Residency Challenges: A Joint Paper with the Object Management Group (OMG).”
4 To bring some clarity to this complexity, the Cloud Working Group of OMG is starting a new guide on this topic, due out in the first quarter of 2019.