Saturday, August 29, 2015

My VMworld 2015 Schedule

I am heading to San Francisco in a few hours for VMworld 2015. I will only be presenting at a single session this year:

  • CNA4725 - Scalable Cloud Native Apps with Docker and Mesos. Weds 8:30am-9:30am
After initially thinking that all of my session proposals had been rejected, I consider myself fortunate to have a speaking slot at all, and I am happy to be speaking about something I have spent a lot of time on recently. Two other sessions feature Nutanix employees:

  • SDDC6827 - Nutanix Industry Panel including Hallmark Business Connections
  • STP6311 - Datacenter Battles: Hyperconvergence vs 3-Tier Infrastructure
I would also highly recommend some High Performance Computing sessions. The first is run by my former vHPC partner in VMware's Office of the CTO, Josh Simons, and the second features Mark Achtemichuk, who is the performance guru for VMware's Tech Marketing group:

  • CTO6454 - Delivering Maximum Performance for Scale-out Applications With ESX 6
  • VAPP5724 - Extreme Performance Series: BCA High Performance Panel
If anyone wants to talk Hadoop, HPC, or cloud-native platforms, please come find me at the Nutanix booth #1729.

Thursday, August 6, 2015

Platforms for CI/CD: Cloud Foundry, Mesos, and Kubernetes

Working with customers on next-gen platforms, and watching the container ecosystem evolve, I have been able to see what gets attention and what is ubiquitous across the platforms. My talk at Hadoop Summit, where I advocated Mesos as a platform for building platforms, was recently published:
https://www.youtube.com/watch?v=FAxmal6ozLY

Spending more time with customers since then, I have seen the arguments evolve around Mesos, or Kubernetes, or Kubernetes on Mesos, or something else, driven by proponents of each side. This kind of debate is important, as everyone shares the same goal of encapsulating and scheduling the next generation of decoupled yet collaborative app architectures (or, to use the overloaded term, microservices). I have had to get into more nuanced conversations about where each platform differentiates itself and which parts of each are most conducive to a modern data pipeline. Extra complexity and layers only make a complicated system less reliable.

A modern data pipeline is, in my mind, a very complex system in itself and, just like the data flowing through it, it must constantly adapt and evolve to drive useful results. Shown below is a slide from the presentation Chris Mutchler and I gave at VMworld 2014, which gives an (albeit very busy) illustration of the different components that could comprise a modern data pipeline.

So besides providing the largest, most flexible resource pool, which of these platforms supports the most straightforward method of change and injection of new updates to a running service? Specifically, how does each platform choose to endorse continuous integration and/or continuous delivery of new updates?

A model for this story, as for most cloud-native developments, begins with Netflix:
http://highscalability.com/blog/2011/12/12/netflix-developing-deploying-and-supporting-software-accordi.html

Some key principles from the Netflix article include, but are not limited to:

  • Launch new "canary" instances and evaluate health
  • Every component is behind a load-balancer
  • Facilitate rolling upgrades and tear-down of old running components
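As a rough illustration of how these three principles fit together, here is a minimal Python sketch. It is a toy model under my own assumptions, not Netflix's tooling: `Instance`, `LoadBalancer`, `canary_rollout`, and the `is_healthy` callback are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Instance:
    name: str
    version: str


@dataclass
class LoadBalancer:
    """Toy load balancer: a pool of instances that receive traffic."""
    pool: List[Instance] = field(default_factory=list)

    def versions(self) -> List[str]:
        return [inst.version for inst in self.pool]


def canary_rollout(
    lb: LoadBalancer,
    new_version: str,
    is_healthy: Callable[[Instance], bool],
) -> bool:
    """Launch one canary; if it is healthy, replace old instances one by one.

    Returns True if the rollout completed, False if it was rolled back.
    """
    old_instances = list(lb.pool)

    # Step 1: put a single canary behind the load balancer.
    canary = Instance("canary", new_version)
    lb.pool.append(canary)

    # Step 2: evaluate its health; on failure, remove it and stop.
    if not is_healthy(canary):
        lb.pool.remove(canary)
        return False

    # Step 3: rolling upgrade. Add a replacement before tearing down
    # each old instance so capacity never drops below the original.
    for inst in old_instances:
        lb.pool.append(Instance(inst.name, new_version))
        lb.pool.remove(inst)

    # Step 4: retire the canary so the pool returns to its original size.
    lb.pool.remove(canary)
    return True


if __name__ == "__main__":
    lb = LoadBalancer([Instance("web-1", "v1"), Instance("web-2", "v1")])
    finished = canary_rollout(lb, "v2", is_healthy=lambda inst: True)
    print(finished, lb.versions())
```

In a real system the health check would look at metrics over a period of time rather than return immediately, but the order of operations is the point here: canary first, then replace, then tear down.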
Fast-forward a few years, with Docker and containers as the new "unit of work". Developers inject self-contained code, effectively compiled and built with its environment, into a container. This by itself is more reliable and scalable, since the local app dependencies are built in rather than assumed to exist in the broader operating environment.

Cloud Foundry facilitates blue-green testing reminiscent of Netflix's approach with load-balancing and canary deployment:
https://docs.cloudfoundry.org/devguide/deploy-apps/blue-green.html
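The core of a blue-green switch is just a change in which app a route points to. The sketch below models that idea in Python; it is not the Cloud Foundry API or CLI. The `Router` class, its `map_route` and `unmap_route` methods, and the route name `app.example.com` are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import Dict, List


class Router:
    """Toy router: each route (hostname) maps to the apps that serve it."""

    def __init__(self) -> None:
        self.routes: Dict[str, List[str]] = defaultdict(list)

    def map_route(self, route: str, app: str) -> None:
        if app not in self.routes[route]:
            self.routes[route].append(app)

    def unmap_route(self, route: str, app: str) -> None:
        if app in self.routes[route]:
            self.routes[route].remove(app)

    def backends(self, route: str) -> List[str]:
        return list(self.routes[route])


def blue_green_switch(router: Router, route: str, blue: str, green: str) -> None:
    """Move a live route from blue to green without a gap in service."""
    # Map the live route to green first, so both versions share traffic.
    router.map_route(route, green)
    # Then unmap blue, leaving green as the only backend for the route.
    router.unmap_route(route, blue)


if __name__ == "__main__":
    router = Router()
    router.map_route("app.example.com", "blue")
    blue_green_switch(router, "app.example.com", "blue", "green")
    print(router.backends("app.example.com"))
```

Because blue is only unmapped, not deleted, rolling back is a matter of mapping the route to blue again.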

Mesos by itself doesn't actually do this, since it leaves rolling updates to its constituent frameworks. For example, the Aurora scheduler offers, in my opinion, a good demonstration of the granularity and abstraction of job updates:
https://github.com/apache/aurora/blob/master/docs/client-commands.md#updating-a-job

Last but not least, Kubernetes uses the replication controller to abstract updates to a given pod when a new Docker image is incorporated. This built-in replication service handles the ongoing orchestration of pushing new pod templates and cleaning up the old instances:
https://github.com/GoogleCloudPlatform/kubernetes/tree/master/docs/user-guide/update-demo
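To make the "push new, clean up old" pattern concrete, here is a small Python sketch of a one-at-a-time rolling update expressed purely as replica counts. It is a simplified model, not Kubernetes code, and the function name `rolling_update` is my own for illustration.

```python
from typing import Iterator, Tuple


def rolling_update(replicas: int) -> Iterator[Tuple[int, int]]:
    """Yield (old_count, new_count) after each step of the update.

    Each step starts one pod from the new template and then retires
    one pod from the old template, until no old pods remain.
    """
    old, new = replicas, 0
    while old > 0:
        new += 1  # scale the new template up by one
        old -= 1  # scale the old template down by one
        yield old, new


if __name__ == "__main__":
    for old_count, new_count in rolling_update(3):
        print(f"old={old_count} new={new_count}")
```

The total number of serving pods stays roughly constant throughout, which is what lets the service remain available during the update.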

I'm sure there are more examples out there, and the number of container platforms will probably continue to grow, but the flexibility and granularity of updates will be key differentiators in my opinion. Because scaling out and perpetual updates appear to be a given for these new cloud-native apps, scheduling patterns and what system builders find most adaptive and reliable should determine adoption.