Monday, December 31, 2012

2013 Cloud Predictions

Here's my quick cloud predictions for 2013:

1. OpenStack continues to gain traction but many early adopters bypass Folsom in anticipation of Grizzly.

2. Amazon's push to the enterprise means we will see more hosted, packaged apps from Microsoft, SAP and other large ISV's. Their IaaS/PaaS introductions will be lackluster compared to previous years.

3. BMC and CA will acquire their way into the cloud.

4. SAP Hana will quickly determine that Teradata isn't their primary competitor as the rise of OSS solutions matures.

5. Data service layers (think Netflix/Cassandra) become common in large cloud deployments.

6. Rackspace, the "Open Cloud Company" continues to gain traction but users find more and more of their services 'not open'.

7. IBM goes another year without a cohesive cloud strategy.

8. Puppet and Chef continue to grow presence but Cfengine gets a resurgence in mindshare.

9. Cloud Bees, Rightscale, Canonical, Inktank, Enstratus, Piston Cloud, PagerDuty, Nebula and Gigaspaces are all acquired.

10. Eucalyptus sunsets native storage solutions and adopts OpenStack solutions.

11. VMware solution dominates over other CloudFoundry vendors.

12. Cloud 'cost control' vendors (Newvem, Cloudyn, Cloud Cruiser, Amysta, Cloudability, Raveld, CloudCheckR, Teevity, etc.) find the space too crowded and begin shifting focus.

13. PaaS solutions begin to look more and more like orchestration solutions with capabilities to leverages SDN, provisioned IOPS, IAM and autonomic features. Middleware vendors that don't offer open source solutions lose significant market share in cloud.

14. Microsoft's server-side OS refresh opens the door to more HyperV and private cloud.

15. Microsoft, Amazon and Google pull away from the pack in the public cloud while Dell, HP, AT&T and others grow their footprint but suffer growing pains (aka, outages).

16. Netflix funds and spins out a cloud automation company.

17. Red Hat focuses on the basics, mainly integrating/extending existing product lines with a continued emphasis on OpenStack.

18. Accenture remains largely absent from the cloud, leaving Capgemini and major off-shore companies to take the revenue lead.

19. EMC will continue to thrive: it's even easier to be sloppy with storage usage in the cloud and users realize it isn't 'all commodity hardware'.

20. In 2013, we'll see another talent war. It won't be as bad as dot-com, but talent will be tight.

I try to keep my predictions upbeat and avoid the forecasts on who will meet their demise - but yes, I anticipate a few companies will close doors or do asset sales. It's all part of the journey.

Enjoy your 2013!

Saturday, December 29, 2012

AWS Outage: Netflix and Stackato

Over the last few days, a few of the engineers at TranscendComputing have been discussing what we could have done to have helped Netflix avoid their Christmas outage. For those of you who aren’t aware, AWS suffered an outage in the Elastic Load Balancer (ELB) service in the East Region.

In the middle of our discussions on creating massively scalable, highly available, clustered load balancers with feature parity to ELB, I caught a post by Diane Mueller at Active State. The gist of her post is that ‘Netflix went down because of AWS but her personal app (which leveraged FeedHenry and Stackato’ was revived after 10 minutes. The post seems to imply that if you use PaaS (like Stackato), one can switch clouds easily, like she did when she moved to her application to the HP Cloud.

I’ll avoid the overly dramatic retort but let’s just say that I disagree with Dianne’s implication. Here’s my position: if core Netflix applications were negatively affected by any core service (such as ELB), it would be extremely difficult to quickly switch to another cloud. Here are some specifics:
  1. No disrespect to my friends on the HP Cloud team but I honestly believe that if Netflix were to have done a sudden switch from AWS to HP it would have brought HP Cloud to its knees. ELB’s (if they had them) would have been crushed and Internet gateways would have been overloaded. Finding a very large number of idle servers may have also been a challenge.
  2. In this imaginary scenario, I guess we’ll assume that Netflix decided to keep their movie library and all application services running on multiple clouds. Sure this would be expensive but it wouldn’t have been realistic for them to do a just-in-time copy of the data from one location to the other.
  3. Netflix has done a great job of publishing their technical architecture: EMR, ELB, EIP, VPC, SQS, Autoscale, etc. None of these are available in the solution Dianne prescribed (Stackato), nor does HP Cloud offer them natively. There is a complete mismatch of services between the clouds. CloudFoundry offers some things that are ‘similar’ but I’m concerned that they wouldn’t have offered performance at scale.
  4. Netflix has also created tools specific to the AWS cloud (Asgard, Astyanax, etc.) as well as tuned off-the-shelf tools for AWS like Cassandra. These would have to be refined to work on each target cloud.

In summary, there’s little-to-no chance that Netflix could have quickly moved to ANY other cloud provider (including Rackspace or Azure) and there’s not a thing that Stackato would have done to alleviate the problem. All medium and large customers have real needs that are service dependent. I’ve joked that CloudFoundry is a toy. It is, but it’s a toy that is maturing and eventually may help with ‘real’ problems – but let’s be clear – that day isn’t today. Any suggestion that it is ready for a ‘Netflix-like-outage’ is either na├»ve or intentionally misleading.

I’ve spent the last three years working on solving the AWS portability problem – and it’s a bitch. Like Dianne, if you have a simple app my solution, TopStack, will work. It replicates core AWS services for workload portability. As proud as I am as what the team at Transcend Computing has done, I’m also quick to note that cloning any of the AWS services at massive scale with minimal down-time, across heterogeneous cloud platforms and providers is an incredibly tough problem.

Here’s my belief: Running the Transcend Computing ELB service on HP Cloud would not have worked for Netflix in their time of need. Our software would have been crushed. HP’s cloud would have been crushed. Netflix’s homegrown software wouldn’t have had ‘practically portable’.  It would not have worked.

I’m happy to acknowledge where we suck. We’ll continue to listen to the unfortunate incidents that AWS, Netflix and others encounter. My 2013 prediction for Transcend Computing is this: we’ll suck less. Acknowledging reality is the first step.