Cloud as an emerging IT platform

Description

The “cloud” is a broad and often vague term to describe a set of changes in the IT environment, procurement methods and value propositions for IT platform resources which have occurred over the last decade. Within this trend description, we will seek to be specific about the definition and about the architectural and organizational changes implied by the trend as we see them now. First, we will define “cloud” in a manner similar to the US National Institute of Standards and Technology (NIST, http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf):

The baseline IT platform model for an organization is the set of on-premises assets owned and operated by an IT department for the benefit of the entire organization. These assets include networking components (routers, cabling, etc.), computational resources (hardware, virtual machines, operating systems, applications), storage resources (file services or databases), and maintenance tools (like monitoring or security systems). These assets also comprise an IT platform and are provisioned and maintained through procurement of goods and services directly by the organization. Generally, these are provisioned in separate environments composed of zones: internal, or external/DMZ (internet-accessible) defined at specific geographic – or facilities – locations. Internal systems typically have all the platform elements of security/identity management, application hosting (multiple types), data storage, compute provisioning (could be physical or VM), and networking. Most elements of this environment do NOT span to the external environment and are not extensible beyond the internal implementation (e.g., internal security is not extensible to systems deployed in an extranet environment, or compute resources cannot be relocated between internal and external zones easily). These assets may be owned and/or operated by the IT department or a business unit and their contractor(s).

The trend towards cloud computing is fundamentally a shift in one or more of the following key aspects of computational systems (and the IT platform):

  • Computation environments (formerly zones) may be composed of services and resources which behave similarly regardless of where they are deployed; i.e., the network segment is not a constraint on the architecture (the same Lego™ blocks are available everywhere). Environments are accessible from either internet or intranet, constrained by security permissions.
  • Computation, networking and storage are composed out of reusable (non-custom) building blocks which can be instantiated programmatically; i.e., the building blocks are available in all environments. (Self-service)
  • Environments composed of these reusable building blocks have a guarantee around service level for the platform. This may be in terms of availability, or other aspects, and may extend to applications as well as the base platform.
  • Computation capacity may scale up or down dynamically based on system loading and other constraints (elasticity).
  • Security and identity are common between environments and based on policy; i.e., identity may include internal and external users for an application, but changes to who has access do not change the architecture of the application. Identity specifically is an aggregate of multiple providers, some of which are not managed by the organization.
  • Cloud systems have explicit resource management (optionally pooling and grouping of resources), change management, and auditing systems which span all environments.
  • Computation environments have an explicit cost per component, often based on usage over time, and can be managed by cost constraints from the business. (Measured service)
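The elasticity and measured-service aspects above can be made concrete with a small sketch. It is illustrative only: the function names, the per-instance capacity, the instance bounds and the hourly rate are all assumptions made for the example, not figures from any vendor.

```python
import math


def instances_needed(load_rps, capacity_per_instance_rps,
                     min_instances=1, max_instances=10):
    """Elasticity: number of instances for a given load, kept within bounds.

    All parameters are hypothetical; real platforms expose their own
    autoscaling rules and limits.
    """
    wanted = math.ceil(load_rps / capacity_per_instance_rps)
    return max(min_instances, min(max_instances, wanted))


def usage_cost(hourly_load_rps, capacity_per_instance_rps,
               rate_per_instance_hour):
    """Measured service: cost is instance-hours actually used times a rate."""
    instance_hours = sum(
        instances_needed(load, capacity_per_instance_rps)
        for load in hourly_load_rps
    )
    return instance_hours * rate_per_instance_hour


def within_budget(cost, budget):
    """Cost constraint from the business, applied to the measured cost."""
    return cost <= budget


if __name__ == "__main__":
    # Three sample hours of load (requests per second), assumed values.
    loads = [50, 250, 1200]
    cost = usage_cost(loads, capacity_per_instance_rps=100,
                      rate_per_instance_hour=0.5)
    print(cost, within_budget(cost, budget=10.0))
```

The point of the sketch is that capacity follows load hour by hour, and that the bill follows capacity, so a business cost constraint can be checked against usage rather than against a fixed hardware purchase.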

An organization that implements an IT platform and changes (or envisions changes to) any of these aspects is said to be utilizing cloud computing. This trend illustrates that the organization is moving beyond the baseline environment to include one or more cloud environments. Our distinction here is between the datacenter model of a decade ago – where not all these aspects were managed, even in a virtualized environment – and that of the private cloud, which now has the benefits of low provisioning times, full virtualization and mobility, software-defined networking boundaries and security, and resource/capacity on demand even if the physical facility may be on the owner’s property. Many vendors have supplied an offsite, private cloud capability for years (IBM and EDS notably going back into the 1980s at least), where a company could contract for dedicated capability for critical systems like ERP.

We will call the evolved state of the organization a hybrid cloud: a combination of the traditional baseline platforms, plus the inclusion of one or more cloud systems (on-premises cloud, private cloud, community cloud/multi-tenant, public cloud, etc.). The public cloud term represents two specific additional aspects of the hosting model: 1) the systems generally can face out to the internet and consumers, and 2) the hardware/virtualized compute assets are not dedicated to a single customer; there may be shared components, or workloads may be moved between physical assets based on load (e.g., your virtual database may be in a different VM on the same physical machine as another company’s database VM).

NIST further defines three categories of cloud services agnostic of environment, which define the level of management which the organization expects to take on:

  1. Software as a service (SaaS) – generally systems which are provisioned with internet-accessibility and are multi-tenant; the organization manages its own tenant and options related to the actual application being delivered.
  2. Platform as a service (PaaS) – an IT platform (compute, network and storage elements) which lets the organization manage only the application which sits on top of the OS stack. The OS, virtualization, and potentially other services are managed by the “cloud” vendor. Often this service can provide an improved SLA over the next service type (IaaS) by leveraging common fabric components for fault tolerance, disaster recovery, or load balancing.
  3. Infrastructure as a service (IaaS) – a partial IT platform which provides standardized compute, networking, and storage services on a virtualized platform. The customer organization manages the resources directly (e.g., patching the OS) and often has tools to migrate existing baseline environment resources to this type of environment, gaining benefits in elasticity or other aspects.

We will endeavor to describe cloud systems and the trend towards cloud computing in these terms – components and aspects – neutrally towards the various vendors of these services. That is, we will adopt a description which fits the organization – the buyer and owner – of these capabilities. Please see the section below on “What questions do I need to ask of my vendors?” for more specifics. The primary reference models we will use are NIST, leading vendor service descriptions (Amazon Web Services, Microsoft Azure, Google Compute Engine[1]), colocation vendor service descriptions (IBM, Verizon, Rackspace), and other on-premises private cloud hosting vendors (IBM, VMware).

We will not talk about the following in this perspective on the cloud trend:

  • Calculating the internal cost to serve of the baseline environment (on-premises)
  • Broad business value calculations (please see References for starting points here)

The following graphic shows the Microsoft Azure offerings – quite a complex bunch of parts for the architect to consider!

fig1

Figure 1.  Azure components, courtesy of Microsoft Corporation.

[1] Gartner and other analysts publish magic quadrant information annually, such as http://www.networkworld.com/article/2898095/cloud-computing/amazon-microsoft-and-salesforce-top-forrester-cloud-platform-list.html?phint=newt%3Dnetworkworld_cloud_computing_alert&phint=idg_eid%3Db256ca257831cc1f14c936c452b333ba#tk.NWWNLE_nlt_cloud_security_2015-03-19 .

Perspectives

So, what changes with the cloud?

From an architecture perspective, many of our basic ground rules and processes change when looking at cloud systems, but our analysis method still has very much the same steps. The customer has an opportunity to re-examine the business value of a new system, migration/upgrade of an old system, or retirement of an asset through the lens of new alternatives. Does the enterprise architect have to do different calculations for the value of systems and timing on the portfolio road map? Yes. Does the solution architect have to look at the cost of the system differently? Yes. In many cases, this merely shines a light on aspects we did not investigate before: understanding business value in quantitative business terms (value to customer being discrete instead of a “hunch”), or understanding what it costs to maintain an application internally as a comparison. If you start a project and only include the project cost and not an estimate of the long-term maintenance (and how many of us could actually say how much that critical, old Windows 95 machine out in R&D costs the company nowadays?), where does the maintenance funding actually come from?

That’s just at the front end.  Let’s talk through the development cycle of a project specifically now.  If the architect now has a choice between a data center-hosted solution internally and a cloud solution, s/he needs to compare the proposed designs and views of the new system in a comparable, apples-to-apples manner:

  • What are the fundamental capabilities I am trying to provide? (Are any of these easier to achieve in one environment or the other? Are some of these new business challenges that would cause me to rethink the legacy system entirely?)
  • What hosting environment is the system going to be in? (public cloud? Which one? Internal hosting?)
  • What software packages or development do I need to undertake? (if it’s developed code, do my processes support the environment and best practice I want to achieve?)
  • What integrations are needed and how will I achieve them? (How about security?)
  • Are there compliance aspects I need to consider?
  • What topology will I select? (does the environment support the optimal deployment? How do I scale if the project is successful?)
  • How will I test the solution?
  • How do I deal with legacy data and gain user adoption? (How will I maintain the system and provide governance and changes in the capability over the lifespan of the solution?)

fig2

Figure 2. Cloud concerns during the life-cycle of a project

Many of these questions we ask already for every project – the cloud just gives us another column of answers on our spreadsheet, an alternative. Sometimes this new set of proposed values gives us capabilities we did not have previously. Sometimes, this new set of answers may fall short of what our strategy was with internal hosting capabilities.

Cloud systems from major vendors – especially in the software-as-a-service and infrastructure-as-a-service areas – have well-defined answers to each of these questions, and patterns we can leverage (see references for pointers to many of these). The challenge for the architect – as advisor to the business on the overall issue of how to procure IT assets – then is two-fold:

  1. Translate the “goal” of moving some types of systems to the cloud into a set of principles which can be applied to many types of projects
  2. Develop a low-risk approach to this on the technical side which can be grown within the people model we have in the organization

fig3

Figure 3.  A standard comparison of platform aspects for on premise and cloud systems.

Both of these rely on the architect investing in their own skills to understand how this new environment can be leveraged.  To be effective technical leaders, we will need to at least understand the different options in the cloud – no different than understanding the trend towards NoSQL databases, or the patterns behind the Internet-of-Things as they apply to our organization’s business model – so that we can build our roadmap effectively and coach others to get to the point where we have a technical capability to take on this kind of work.

How does the architect determine the benefits and risks of this new environment?

Well, this boils down to comparing the current application environment – and its guiding principles – against the new application environment and determining what it is useful for. Just as there was no one-size-fits-all for internal hosting previously (we still needed some workloads to be on application-specific “appliances” or had some applications which could not even virtualize), the cloud does not present a silver bullet for all types of applications. The architect will have to use experience with the existing applications to see how they would look in the new model; the architect will also have to look at where requirements and demand are coming from in the organization to understand if the cloud presents a compelling reason to bypass the existing application model. Typically, we would compare alternatives across a set of criteria from the design side and non-functional requirements (also known as quality attributes in the ITABoK). A simple list of these might look like the following:

  • Design requirements (per application)
    • Web design, rich client, mobile application design
    • Business logic, including workflow
    • External integrations
    • Compliance with regulatory restrictions
    • Data storage and schema
  • Non-functional requirements
    • Performance and scalability
    • Reliability, availability, recoverability (SLA, if any)
    • Testability
    • Security (integrity, privacy, authentication, authorization mechanisms)
    • Portability
    • Networking considerations including where the users of the system are located
    • Maintainability, extensibility and usability, especially for custom developed pieces
    • Monitoring and operability (ITIL support)
    • Other risk areas
    • (Cost)

Design requirements pose the question much as an RFP for a capability would – do we have a full solution to meet what the business needs? For instance, is there a CRM case management system that meets the stated needs of our field service teams? Does it also have the social integration features we want to reach our customers?

The non-functional requirements are often ones that can be tested in the final system and would be represented in a viewpoint or system aspect design (see PBAAM and ATAM methods elsewhere).  These are often not binary choices at the beginning of a project but certainly influence cost and schedule to deliver the solution.  We will omit cost for now under the assumption that, once the other elements are analyzed, we may have to cost each alternative separately and that each has a specific calculator for determining the cost elements for the project.

The next step for the alternative assessment is to fill in the expected values for these and look at them side-by-side to make a recommendation.  In the table below, we see some of the benefits and questions for the “cloud” alternatives:

Table of system aspects, by service type (Platform as a service, Infrastructure as a service, Software as a service):

  • Web client
    • Platform as a service: Supports latest development techniques for LAMP/IIS development, often advanced support for scalable, ecommerce solutions. Speed to provision and extend is often significantly faster than on premise.
    • Infrastructure as a service: Supports multiple web servers, restricted support for advanced controls (web gardens, ISAPI, etc.). Speed to provision and extend is often significantly faster than on premise.
    • Software as a service: Many common packages prehosted as SaaS (SF.com, Office 365, MySAP), and components. Look for API support for integrations and configurability. Speed to provision and extend is often significantly faster than on premise.
  • Rich client
    • Platform as a service: Can be supported on remote desktop model; check performance compared to a client-distribution model.
    • Infrastructure as a service: N/A
    • Software as a service: Some hosted solutions have pre-integrated rich clients (OneDrive, etc.).
  • Mobile application design
    • Platform as a service: Advanced support for database, media, available file sharing.
    • Infrastructure as a service: N/A
    • Software as a service: Some hosted solutions have pre-integrated mobile clients (GooglePlay, etc.).
  • Business logic, including workflow
    • Platform as a service: New patterns available such as service bus (for IoT) or Redis cache, which may not be available on premise.
    • Infrastructure as a service: Can configure business logic components such as MQ similar to on premise.
    • Software as a service: Usually through provider-supplied APIs in REST/JSON format.
  • External integrations
    • Platform as a service: Integrations to web-based software (API, REST, or service) are well supported.
    • Infrastructure as a service: Similar to on premise, look at network segments to determine appropriate secure transport.
    • Software as a service: Synchronization between on premise and cloud systems requires design.
  • Compliance with regulatory restrictions[1]
    • Platform as a service: Compliance generally the system designer’s responsibility.
    • Infrastructure as a service: Partial support from IaaS vendor.
    • Software as a service: Compliance supported by service provider.
  • Data storage and schema
    • Similar schema and database infrastructure options to on premise; some advanced analytics options.
    • Software as a service: Configured by service provider.
  • Performance and scalability
    • Platform as a service: Advanced scalability, recovery and performance options in cloud not generally available on premise.
    • Infrastructure as a service: Advanced scalability, recovery and performance options in cloud not generally available on premise.
    • Software as a service: Configured by service provider with financial SLA; integrations with on premise systems usually not covered.
  • Reliability, availability, recoverability (SLA, if any)
    • Platform as a service: Advanced options available by contract; significant changes to design for recoverability/backup.
    • Infrastructure as a service: Advanced options available by contract; significant changes to design for recoverability/backup.
    • Software as a service: Configured by service provider with financial SLA.
  • Testability
    • Platform as a service: Most test tools work with cloud or on premise deployment.
    • Infrastructure as a service: May not be able to get specific VM configuration or access to certain OS settings.
    • Software as a service: Contractual capabilities by service provider; some allow additional testing (such as penetration testing) or support through add-on contracts.
  • Security (integrity, privacy, authentication, authorization mechanisms)
    • Platform as a service: Many new options available based on OAuth standards; optimized for both internal AD-driven and consumer-driven authentication.
    • Infrastructure as a service: Internal authentication typically through a federation model across the S2S bridge between cloud and on premise environments. May require testing.
    • Software as a service: Authentication options per service provider.
  • Portability
    • Platform as a service: Some lock-in to particular development tools & hosting services.
    • Infrastructure as a service: Minimal change to on premise model.
    • Software as a service: Onboarding and off-boarding are defined by service provider contract; for major systems changing vendor is a significant migration.
  • Networking considerations including where the users of the system are located
    • Significantly improved networking responsiveness over on premise for internet-facing systems; internal routing dependent on speed of site-to-site routing. Internal network often the gating factor here[2]
  • Maintainability, extensibility and usability, especially for custom developed pieces
    • Similar skills application from on premise model.
    • Software as a service: Roadmap and usability provided with service; often participation in advisory board is recommended.
  • Monitoring and operability (ITIL support)
    • Platform as a service: Some advanced telemetry options; most PaaS options have a dashboard for monitoring production applications.
    • Infrastructure as a service: Monitoring strategy needs to be extended to the cloud (Nagios, SCOM, etc.). Other IT practices such as chargeback or recharge or show-back may need to be redesigned.
    • Software as a service: Monitoring is usually done per SLA; limited monitoring done from on premise.

This table indicates that there are trade-offs and that no single cloud offering is, de facto, the obvious choice for moving any given system. The benefits of the “cloud” model, however, do concentrate in certain architectural aspects:

  • Speed of initial deployment
  • Options for large-scale solutions (flexibility to grow from pilot system to broad consumer-scale system)
  • Options for dynamic systems which fluctuate in demand
  • No need for physical facilities (and accompanying bill of materials to do initial provisioning)
  • Networking capability to reach globally (either external or internal customers, some geographies we may not have presence in such as China)
  • Automation of maintenance and upgrades
  • High reliability, consistent security due to standardization

Some aspects have changed radically, requiring a new model: BC/DR, where we used to keep spare copies of (most) items in an offsite location[3], is not feasible with the cloud, but other mechanisms are provided for “warmer” fail-over, data storage models, infrastructure deployment patterns and security management. Most of these aspects are testable/verifiable, and are included in service provider contracts.

As an architect, if you have requirements in one of these areas, one or more cloud solution types should be considered alongside traditional models.
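One common way to put the side-by-side assessment described above into numbers is a weighted scoring matrix. The sketch below is a minimal illustration, not a prescribed method: the criteria names echo the list of non-functional requirements, while the weights, the 1-to-5 scores and the alternative names are invented for the example.

```python
def weighted_score(scores, weights):
    """Weighted average of per-criterion scores.

    Raises KeyError if an alternative is missing a score for a weighted
    criterion, which keeps incomplete assessments from ranking silently.
    """
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank_alternatives(alternatives, weights):
    """Return (name, score) pairs, best score first."""
    scored = [(name, weighted_score(scores, weights))
              for name, scores in alternatives.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights agreed with stakeholders (higher = more important).
    weights = {"scalability": 3, "security": 3, "portability": 1,
               "operability": 2}
    # Hypothetical scores on a 1-5 scale for each hosting alternative.
    alternatives = {
        "on premise": {"scalability": 2, "security": 4, "portability": 4,
                       "operability": 3},
        "IaaS": {"scalability": 4, "security": 3, "portability": 4,
                 "operability": 3},
        "PaaS": {"scalability": 5, "security": 4, "portability": 2,
                 "operability": 4},
    }
    for name, score in rank_alternatives(alternatives, weights):
        print(f"{name}: {score:.2f}")
```

The ranking is only as good as the weights, so the useful part of the exercise is usually the stakeholder discussion about why one criterion outweighs another. Cost is deliberately left out here, in line with the text, and would be compared separately for each alternative.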

Who else does this affect?

The ability to add either hybrid cloud or public/multi-tenant cloud to the architecture will affect different parts of the delivery team and the stakeholders differently.

The architect has options with the trend towards cloud systems, and other people on the implementation team also have a stake in this change: the project manager, the infrastructure architect and engineer, the software developer, the test & QA team, and more broadly the IT finance organization and the end stakeholders in the business.

Each of these roles will need both an understanding of why the model is changing and a way to plan for this new model. For example, the infrastructure engineer may have to learn new machine types which can be provisioned and map existing ones to similar types in the cloud. PowerShell scripts may have to be modified to work in this new environment. Monitoring tools should be extended. Though the task has not changed in any of these cases, new technical details must be learned to be successful.

In terms of which ITABoK capability areas are affected, the cloud trend impacts the following domains the most:

  1. Business technology strategy – the value case with the customer
  2. IT environment – where systems are located physically and what controls we have over them, new partners
  3. Design skills – new technologies and skill sets
  4. Quality attributes – new service levels

The cloud trend does not affect the following areas significantly:

  1. Human dynamics

[1] Different cloud service providers have certifications but that may not be enough for your stakeholders.  E.g., what parts of HIPAA do you still have to comply with beyond what the service gives you?

[2] If a cloud system has to reach users over slow site links, it will be no faster than an internal deployment.  This is an opportunity to reduce load on corporate network lines, if the end user already has a direct-to-internet line of sufficient capacity.

[3] We mentioned that the cloud often shines a light on our own practices, and disaster recovery is a good example where we often have little visibility into the true cost of providing this service across even mission-critical systems, and many organizations fail to actually execute a DR drill before an emergency happens, to actually test the plan.

Best Practices

The CAPEX question

One of the more controversial changes introduced by the cloud affects large organizations: hosted cloud of the PaaS, IaaS and SaaS types generally can only be considered an operational expense from an accounting perspective.  This contrasts with purchasing hardware and to some degree other assets (perpetual software licenses) which a large organization could classify as a capital expense, or CAPEX.  Capital purchases are often favored for a couple of reasons, including depreciation on taxable income, and the ability to undertake large projects and spread the spend across multiple years as might be common for an ERP system.

The architect should understand this cost basis change and be able to evaluate the overall life-cycle costs of the proposed cloud system on an equivalent basis to the on premise alternative. This may involve total cost of ownership (TCO) or benefits calculations such as IRR. It is important to understand the full life-cycle costs including hardware, software, and labor, as often it is not the hardware component which is the dominant cost element. The cloud alternative changes each of these components of cost. The on premise solution cost may similarly require some investigation to determine the true TCO. Often, the IT organization may have made investments in a data center (physical space, VM hosts, processes, staff) and is then incentivized to make it attractive for business units to leverage the data center and achieve high utilization of the facility. This is neither bad nor good, but tends to subsidize/reduce the apparent cost of an on premise solution by ignoring certain fixed costs of operating the data center, including labor. The key is to get to a true apples-to-apples comparison both for the short-term project and the longer-term maintenance cycle.
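A rough way to see why the evaluation period matters in such a comparison is to add up upfront and recurring costs for each alternative over several years. The sketch below is a simplified illustration that ignores discounting, depreciation and tax effects; every figure and cost category in it is a made-up assumption, not data from a real project.

```python
def total_cost_of_ownership(upfront, annual_costs, years):
    """Upfront spend plus recurring annual costs over the evaluation period.

    `annual_costs` maps a cost element (hardware refresh, software,
    labor, ...) to its yearly amount, so no element is left out by accident.
    """
    return upfront + years * sum(annual_costs.values())


def first_year_cheaper(candidate, baseline, horizon):
    """First year (1-based) in which `candidate` has a cumulative cost at or
    below `baseline`, or None if that never happens within `horizon` years.

    Each option is an (upfront, annual_costs) pair.
    """
    for year in range(1, horizon + 1):
        if (total_cost_of_ownership(*candidate, year)
                <= total_cost_of_ownership(*baseline, year)):
            return year
    return None


if __name__ == "__main__":
    # Hypothetical figures: large capital purchase, lower running cost.
    on_premise = (120_000, {"labor": 20_000, "facilities": 6_000,
                            "licenses": 4_000})
    # Hypothetical figures: small migration cost, higher subscription cost.
    cloud = (10_000, {"subscription": 48_000, "labor": 12_000})

    for years in (3, 5):
        print(years,
              total_cost_of_ownership(*on_premise, years),
              total_cost_of_ownership(*cloud, years))
    print(first_year_cheaper(on_premise, cloud, horizon=5))
```

With these invented numbers the cloud option costs less over three years and the on premise option costs less over five, which is why both the short-term project and the longer-term maintenance cycle need to be on the same basis before a recommendation is made.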

Several cost and benefit calculators are provided in the references below.  If there are other services provided by the internal hoster, such as recharging costs back to business units, the architect has options in the cloud as solutions on top of the service provider, such as Apptio, Cloud Cruiser, Hanu Insight, etc.

The Integration question

One of the more complex issues that the architect will face on a cloud or hybrid project is the issue of integration.  Even if one system is in the cloud, often it will depend on other systems which remain in another environment.  At the very least, the data needed to operate the new cloud system may reside on premise in a different system and may need to be imported to the cloud system, possibly even bi-directionally synchronized.

A CRM system, or a customer-facing commerce system such as a product catalog, or an IT/OT system as we see in Internet-of-Things type systems all require this communication between two hosting environments.  There may even be more complexity in this environment when we try to add a cloud-only service (say, Cortana or Echo) to a traditional on premise application.

As an advanced scenario, we would expect the architecture function to look at other patterns (call up other architects who may have similar systems to understand lessons learned), analyze trade-offs between potential implementations, and/or prototype one or more implementations to verify that there is a viable implementation. We will have to look at views and viewpoints of the system to ensure performance, security, testability (similar to integration/data transfer testing between two different trading partners) and other aspects.

How can I present hybrid cloud to my department?

Whether it is a single project moving to the cloud, or whether it is part of a larger strategy, the broader implementation team needs information on how to accomplish this.  The architect may be the first one aware of the strategy on the project and may be called upon to present to the other architects, or the project team as they start the implementation.

The architect may have to learn many of these skills and then be prepared to translate them to the rest of the team.  See the references for some common starting points on training.  The architect should be comfortable describing the new technologies which will be used, how they connect, and laying out how decisions will be made on the project – these are principles that should be shared.

As the writer of this trend perspective, I am including two samples below – one for a set of decision criteria for moving a system to the cloud, and one as a discussion starter for a multi-project move to cloud hosting. Neither of these is intended to be applicable to a specific organization, but both have helped me convey the nature of these changes to other architects. You may find these useful as starting points for your own discussions.

fig4

Figure 4. Decision criteria for moving projects to the cloud.

fig5

Figure 5. Certain systems make sense on different types of hosted/cloud environments

Certainly one of the challenges to the architect is to develop these sorts of materials at the same time as learning about these systems himself or herself. Join one of the IASA discussion forums on this to ask other architects how they did this.

Capabilities and Maturity

In this final section, we will discuss specific capabilities and maturity in both the context of the individual architect’s growth and the organization.

Maturity level, with expectations for the architect and for the organization:

BASIC

Architect, at the CITA-Foundation level, should be:

  • Familiar with all terms in this article, have read introductory vendor material
  • Managing a personal or small subscription for internet-only access
  • Able to draw equivalent components to current system diagrams using cloud symbols
  • Able to help provision IaaS components or SaaS application as part of another project

Organization has the following capabilities:

  • Architects are familiar with types of cloud computing and have basic ability to recognize components equivalent to current process
  • Limited infrastructure capability to integrate with cloud
  • Limited staff to develop or deploy systems to cloud
  • Limited ability to procure cloud resources
  • Limited understanding of how cloud security and compliance mechanisms work

INITIAL OPERATIONS

Architect, at the CITA-Associate level, should be:

  • Able to select appropriate cloud components for systems design and frame decisions between alternative SaaS and IaaS implementations
  • Able to size IaaS components including compute, storage and networking
  • Able to define project costs and operating cost structure
  • Able to implement prototype functionality in a production subscription and define connectivity required to corporate environments
  • May have led a SaaS or IaaS implementation

Organization has the following capabilities:

  • Basic infrastructure connectivity to support small systems in a single cloud
  • Procurement and single stakeholder is prepared for operational expense model through a single corporate subscription/contract
  • All existing cloud services are identified
  • Developing strategy/principles, and a sizing estimator for subsequent projects

STANDARDIZED

Architect, at the CITA-Specialist level, should be:

  • Able to define and execute single projects on all cloud environments (IaaS, PaaS, and/or SaaS)
  • Compare on premise hosted versus hybrid versus public cloud solutions using common architecture tools
  • Trains other architects on techniques for doing cloud projects
  • Able to differentiate at the portfolio level which projects are better in each environment
  • May have led integrated (hybrid) projects

Organization has the following capabilities:

  • Single-vendor strategy to cloud defined and architects using in regular practice
  • May have a core cloud onboarding team formed to lead projects and disseminate skills to broader implementation teams
  • May have multiple cloud service providers actively contracted
  • Multiple projects in the cloud and ability to do tradeoff estimation for new projects
  • Roadmap being defined for systems and domains within the company to include cloud trend

ADVANCED

Architect, at the CITA-Professional level, should be:

  • Able to utilize multiple integrations (cloud-to-cloud, cloud-to-datacenter) for complex projects
  • Able to combine cloud services into existing applications
  • Trained on principles, able to present principles and rationale to business units

Organization has the following capabilities:

  • Adds governance discussions with business units
  • Able to charge directly (or show) costs allocated to specific applications and units
  • Strategic roadmap updated and reviewed by business units

Resources

Architecture

Analysts

Traditional hosting

Business value and cost

Design and viewpoints

Drawings and tools

Training and technical certification

Author

This perspective on the trend towards cloud systems was written by Brian Loomis, who is both a CITA-Professional and trainer for IASA Global. His “day job” is as an enterprise architect for large organizations through Microsoft and leading software teams to deliver products in education, healthcare and manufacturing. He has presented at industry and academic conferences including ACM SuperComputing, International Telemetry Conference, and at national business IT conferences. Brian served as an officer in the United States Air Force and holds a Master’s degree in Computer Science from California State University and a Bachelor’s degree in Computer Engineering from Princeton University. Check out his profile on the IASA instructors web site or drop him a line at bwloomis404@gmail.com if you have an architecture question.