Sunday, November 23, 2014

Build a Mesos cluster using vSphere Big Data Extensions 2.x


I've been helping customers virtualize Hadoop and scale-out applications for a while, starting with just the creation and tuning of VMs through the old C# client and scripts.  The open-source project Serengeti simplified virtual deployment and is an integral part of vSphere Big Data Extensions (BDE).  Under the covers, BDE includes an open-source Chef server that holds the Big Data cluster blueprints and recipes.  Part of what I have been working on in VMware's Office of the CTO to support our virtualized HPC customers is taking those lessons learned and applying them to the automation of creating and managing HPC clusters.

In essence, most if not all scale-out applications follow a simple model of master(s) and workers.  BDE defines a cluster model, or blueprint, in a JSON file that is then used to clone virtual machines from a base template and assign each one a master, worker, client, or other type of role.  In BDE's case, these roles are declared in Chef along with the corresponding application cookbooks and recipes.  I won't be going through an in-depth tutorial of Chef in this post, so if you're not familiar with it, some of the steps below won't make much sense; some basic lessons are here: http://learn.getchef.com/.  In this example, I'll be using Apache Mesos.

  • Download the BDE OVA, at least version 2.0.  It's not a top-level download, but you can find it under the VMware vSphere product heading, Enterprise edition and higher: https://my.vmware.com/web/vmware/details?downloadGroup=BDE_210_GA&productId=353&rPId=6997
  • Deploy the OVA using either the vCenter Web Client or the C# client.  This creates a vApp containing both the "Management Server" and a "template node" CentOS VM, which will be the template for all node deployments.  If you prefer another distro such as RHEL or Ubuntu, now is the time to prep that image: http://pubs.vmware.com/bde-2/topic/com.vmware.bigdataextensions.admin.doc/GUID-CAD01F1F-F915-42C9-A1C1-A6093C2564D3.html
  • Since you'll be using Chef, you'll want to get some Mesos recipes, either from the Supermarket at community.opscode.com or from some other appropriate place on GitHub.
  • Log into the Management Server as root or the serengeti user and upload your Mesos cookbooks.  The default Chef directory is /opt/serengeti/chef.  A quick sketch of pulling down a cookbook is below.
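
For example, here is one way to fetch a community Mesos cookbook onto the Management Server.  The cookbook name and the GitHub URL are placeholders; substitute whichever Mesos cookbook you settled on in the previous step.

cd /opt/serengeti/chef/cookbooks
# Option 1: pull a cookbook from the community site (the "mesos" cookbook name is an assumption)
knife cookbook site download mesos
tar xzf mesos-*.tar.gz && rm mesos-*.tar.gz
# Option 2: clone from GitHub (URL is a placeholder)
# git clone https://github.com/someuser/mesos-cookbook.git mesos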


  • Make a copy of the basic cluster JSON file, /opt/serengeti/samples/basic_cluster.json, to a name that describes your new blueprint, such as mesos_cluster.json.
  • From the basic example, you can see the classes of node groups you can deploy (master or worker) and the role that BDE will assign after cloning and customizing each node.  Right now, each node will only receive the "basic" role.  Adjust each nodeGroup type to use an appropriate role, such as "mesos_master" for master nodes and "mesos_worker" for worker nodes.
  • You can write your node roles in the /opt/serengeti/chef/roles directory and then upload them as you would to any other Chef server, for example: knife role from file /path/to/file/mesos_master.rb
  • For a sample format, you can copy basic.rb to a mesos_master.rb and a mesos_worker.rb.  Make sure to adjust the name parameter on line 1 and the run_list to match your Mesos recipes; a sketch of what these role files end up looking like is below.
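
To make that concrete, here is a minimal sketch of the two role files and their upload, assuming a cookbook named mesos that provides master and slave recipes; adjust the run_list to whatever your cookbook actually calls them.

cat > /opt/serengeti/chef/roles/mesos_master.rb <<'EOF'
# the role name must match the role listed in the mesos_cluster.json nodeGroup
name "mesos_master"
description "Mesos master node deployed by BDE"
run_list "recipe[mesos::master]"
EOF

cat > /opt/serengeti/chef/roles/mesos_worker.rb <<'EOF'
name "mesos_worker"
description "Mesos worker node deployed by BDE"
run_list "recipe[mesos::slave]"
EOF

knife role from file /opt/serengeti/chef/roles/mesos_master.rb
knife role from file /opt/serengeti/chef/roles/mesos_worker.rb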


  • Also make sure you've uploaded your cookbooks to the Chef server with knife cookbook upload -a or knife cookbook upload mycookbookname
  • The quickest way to test this is to run the serengeti CLI by entering the command serengeti.
  • Connect to the BDE server and log in by entering connect --host mybdehostname:8443.  This is actually authenticating via SSO, so you could use administrator@vsphere.local or an equivalent user with administrator privileges on that vCenter instance.
  • To create a cluster, enter at a minimum: cluster create --name myclustername --specFile /path/to/my/mesos_cluster.json --networkName defaultNetwork
  • You can add the optional argument --password yes to the cluster create command to set the root password for each VM, which I typically recommend.
  • At this point, you can see the cluster creation occurring within the CLI window as well as under "All Tasks" in the vSphere client. The time that it takes mostly depends on the speed of cloning in your environment.
  • When the cluster creation completes, you will have your own Mesos cluster to log into and check out, or you may have a broken cluster depending on a variety of issues such as missing hostnames, missing IP addresses, recipes failing to complete successfully, and so on.  For testing purposes, I recommend making small clusters of a single master and a single worker to evaluate the accuracy of the Mesos recipes.  If (when?) you need to completely delete a cluster, the command for that is cluster delete --name myclustername, which will power off all the VMs and destroy those VM files.  Don't expect any data to remain from that cluster after this step.  At this point, you'll be in an experimentation mode where you can update recipes, upload them to Chef, and recreate the cluster; a condensed CLI session is sketched below.
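
Putting the CLI steps together, a test iteration looks roughly like this.  The cluster name and hostname are placeholders, and the spec file path should point at wherever you saved mesos_cluster.json.

serengeti
connect --host mybdehostname:8443
cluster create --name mesostest --specFile /path/to/my/mesos_cluster.json --networkName defaultNetwork --password yes
# ...test, fix recipes, upload them again with knife, then tear down and recreate
cluster delete --name mesostest
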
There's a lot more to learn if you plan on extending these recipes or writing your own, so I'm planning to post more work here.  In particular, dependencies between services will cause issues, as will the proper setup of static IPs/FQDNs or DHCP and DNS.  Hope this was helpful to get started!

Additional links:
http://community.opscode.com
http://www.vmware.com/products/vsphere/features/big-data
https://mesosphere.com/


Friday, August 22, 2014

HPC sessions at VMworld US 2014

Considering the number of sessions up for vote:
http://virtual-hiking.blogspot.com/2014/05/virtualized-hpc-and-customer-sessions.html

there were considerably fewer accepted sessions, but still a few to pay attention to next week, either in your schedule or to review after the conference is over.  This is a short list since it's last minute, and my apologies if I'm missing a session; please let me know in the comments or on Twitter.  As a rule, I would check the master schedule onsite as times and dates may change.  In addition, during the TAM day lunch on Sunday, there will be a table with Josh Simons, resident master of HPC in the Office of the CTO, and Matt Herreras, SLED SE manager, to answer informal questions about virtualizing HPC.

Virtualized-HPC-as-a-Service #vBrownBag Talk
Monday at 12:30pm in the Community hangspace:
http://professionalvmware.com/2014/08/vbrownbag-tech-talk-schedulevmworld-usa-2014/

INF1466: High-Performance Computing in the Virtualized Datacenter
This customer session was already full for Tuesday, 2-3pm.  It would be great to get a repeat session but no guarantees.  Edmond had in-depth results to share last year and I am looking forward to this one for more real-world experience.
 

VAPP1856: How to Engage with Your Engineering, Science, and Research Groups About Virtualization and Cloud Computing
This is with Josh Simons and Matt Herreras, Thursday 10:30-11:30am.

TEX1808: Data Plane Performance for NFV with VMware vSphere and Intel DPDK
This is with Bhavesh Davda from VMware's Office of the CTO and Edwin Verplanke, a Systems Architect for Intel, to discuss the latest on low latency networking.  A bit early on Wednesday at 8-9am but definitely still recommended.

VAPP1428: Hadoop as a Service: Utilizing VMware vCloud Automation Center and Big Data Extensions at Adobe
Discussing the business use-case but also performance recommendations and handling operations for high-utilization VMs, Monday 1-2pm and Wednesday 2:30-3:30pm.

Finally, the Office of the CTO and HPC team will have staffing at the VMware OCTO booth for additional questions or comments.

Monday, August 18, 2014

My VMworld 2014 Schedule

Hopefully everyone who is going to be attending VMworld in San Francisco this year has their accommodations already worked out.  The following is tentatively where I'll be presenting or outside of private customers meetings in a week.  I look forward to seeing you there!

Sunday - 8/24/14

I'll hopefully be working at the VCDX bootcamp at the local office in San Francisco.  I really enjoy working with the up-and-coming VMware architects in the industry.  I believe becoming a VCDX, and more specifically going through the preparation and panel process, which includes focused analysis of design, tradeoffs, and realities, really helped me gain a depth of understanding in architecture.  To be authoritative in one area, for example storage or networking or a specific application, is a first step.  To become authoritative in multiple areas and orchestrate a solution is another step.  And in the VCDX process, to not only author solutions in front of customers but to defend them in front of a prepared panel of experts and peers is another step altogether.  It would never be satisfying for me to sit across the table from potential architects and see them fail, so it is my hope to mentor and offer what insights I have as well as learn from their experiences.  What is knowledge without perspective?

Monday - 8/25/14

12:30pm-12:45pm - #vBrownBag vHPC-as-a-Service: I'll be presenting in the Community Hangspace.  Basically a condensed lightning talk about the what/why/how of doing virtualized HPC-aaS.  If there are additional questions, that's great, as I'll be at the VMware Office of the CTO booth as well as around the venue for the next few days.

1:00-2:00pm - VAPP1428 Hadoop-as-a-Service: Chris Mutchler from Adobe and I will be giving this update on the most recent developments and examples around the growing possibilities for companies to evolve their data pipelines in an elastic, self-service, and scalable manner.

4:00-5:00pm - VAPP1807 Best Practices of Virtualizing Hadoop on vSphere - Customer Panel: Wouldn't miss this for the insight and perspective on other customers' high performance data analytics environments.

Tuesday - 8/26/14

1:00-6:00pm - Office of the CTO booth: On the Solutions Exchange floor at the VMware booth to answer questions and show off some of the bleeding edge work being done by VMware's OCTO organization.

Wednesday - 8/27/14

2:30-3:30pm - VAPP1428 Hadoop-as-a-Service (repeat session)

Thursday - 8/28/14

10:30-11:30am - STO1424 Massively Scaling VSAN Implementations: Another co-presentation between Frans Van Rooyen from Adobe and me on the early and rapidly evolving work being done to build and manage VSAN beyond single clusters and datacenters.

Thursday, June 26, 2014

#QRQ 3

My VM is locked up in VMware Fusion 6 and yet I can't restart the VM, because it's locked up:-)  How do I restart?

Hold down the (alt) option key when you click on "Virtual Machine" on the menu bar.  You should see "Suspend" become "Force Suspend", "Shut Down" become "Force Shut Down", and so on.

Additional links:
http://kb.vmware.com/kb/1006215

Sunday, June 22, 2014

vCAC 6.x with Linux Catalog Items

Starting fresh to customize some Linux guest VMs with vCAC 6.0.1, I found that there was no simple, authoritative source for doing so, at least according to several colleague queries and some Googling, so my apologies if I missed something obvious.  This also gives me my first chance to make a potentially [unsupported] recommendation.

I am working specifically with RHEL 6.4, CentOS 6.4, and Ubuntu 12.04, so of course with other Linux distros or versions, YMMV.  For each of those specific versions, I created a generic VM and mostly followed the defaults, installing the basic server packages and nothing else.  Feel free to customize your software and repos according to what you want in your baseline.  At the least, make sure you have Perl installed so that you can install VMware Tools; a "Core" or "Minimal" install does not even include Perl.  These VMs were also created using static IPs.
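
On the RHEL and CentOS baselines, a quick sanity check before installing Tools might look like the following.  This assumes yum can reach a repository; the gcc/kernel-devel line is optional and only needed if Tools has to compile its kernel modules.

rpm -q perl || yum install -y perl
# optional: compiler and kernel headers for building the VMware Tools kernel modules
# yum install -y gcc kernel-devel-$(uname -r)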

After booting into the guest OS, install VMware Tools; I stick to the defaults here as well for each of the Linux distros, and there shouldn't be much, if any, variance in the install options.  At this point, since we're prepping for vCAC, you will most likely benefit from including the gugent (guest agent) in your baseline.  The gugent allows callbacks to an agent service running on the deployed guest for additional configuration and customization.  However, this will vary depending on your Linux distro since some are supported, like CentOS, Red Hat, and SuSE, but not Ubuntu.  Check
http://www.vmware.com/pdf/vcloud-automation-center-60-support-matrix.pdf, page 9, for details.

An excellent guide for installing the vCAC Linux customization agent, and a recommended blog to follow, is here:
http://www.vmtocloud.com/how-to-create-a-vcac-6-linux-guest-agent-template/

If you are allowed to use DHCP, then that's pretty much it.  You can stop there and create individual blueprints in vCAC matching each template.  However, for static IPs things are a little less intuitive.  For Red Hat and CentOS, you can follow these template prep guidelines:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.0/html/Evaluation_Guide/Evaluation_Guide-Create_RHEL_Template.html

Just leave out the first step:

# touch /.unconfigured

and follow steps 2-5, unless you want the user to work through network configuration dialogs every time a new VM is provisioned.  After deleting the HWADDR entry, if you have multiple eth# devices present, I have seen those sometimes get reordered after a reboot, so be cautious if you are multihoming your Linux VMs or have multiple private networks.  For step 4, I've been told you can also just delete the file:

rm /etc/udev/rules.d/70-persistent-net.rules

and this file will be automatically recreated after a reboot anyway, so I haven't experienced any issues with that approach either.
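
For reference, the cleanup I end up scripting into the template right before shutting it down looks roughly like this.  The paths assume a RHEL/CentOS 6 guest with a single eth0; adjust for additional NICs.

# strip the MAC-specific entry so each clone regenerates its own
sed -i '/^HWADDR/d' /etc/sysconfig/network-scripts/ifcfg-eth0
# recreated automatically on the next boot
rm -f /etc/udev/rules.d/70-persistent-net.rules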

Now right-click on the VM and, annoyingly enough, "Convert to Virtual Machine" isn't one of the obvious options; you'll find it under "All vCenter Actions" -> "Convert to Virtual Machine".  Next, I created a Linux customization spec with the following options:



  • Computer name: just use the virtual machine name and refer to your vCAC machine prefixes.  Incidentally, did you know that even though vCAC 6.x is stated to be multi-tenant, the prefixes are shared across tenants?
  • Time zone: you're using UTC, right?
  • Network: feel free to set this to manual if you want, but vCAC provisioning should override it with whatever static IP is available from your Network Profile.
  • DNS: of course, make sure your DNS and search path reflect your environment.

Now when you create your blueprint, you can use this custom spec across your templates for CentOS, Red Hat, and Ubuntu, right?




Red Hat works, and Ubuntu also works (though without a gugent for additional customization), but what about CentOS?  I would get an error deploying the CentOS template similar to the one below, which also appeared when simply cloning directly from the vSphere Web Client:



OK, so what may be unsupported is the workaround.  Setting the CentOS VM's guest OS type to Red Hat allows everything to proceed as normal, and vCAC will deploy using the customization specification without griping.






Pretty straightforward once it's all in one place, hopefully:-)

Additional links:
http://www.vmware.com/pdf/vcloud-automation-center-60-support-matrix.pdf
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.0/html/Evaluation_Guide/Evaluation_Guide-Create_RHEL_Template.html
 http://www.vmtocloud.com/how-to-create-a-vcac-6-linux-guest-agent-template/

Monday, June 16, 2014

VMware Office of the CTO, High Performance Computing

If you know me, then you know that I have a passion for HPC and virtualization and I enjoy a challenge.  I have been working with several customers behind the scenes on virtualizing HPC and developing this market for VMware and had presented at VMworld last year with UCSF on early work virtualizing their genome pipelines.  Now my latest challenge is to join full-time with VMware's Office of the CTO and I am humbled to be able to learn and contribute here.

I will be working with Josh Simons, who leads HPC for VMware's OCTO and is an HPC veteran formerly with Sun Microsystems, toward advancing how VMware approaches this market and the unique problems inherent in HPC, as well as the problems now being shared by large "Web-scale" distributed systems.  It was at Hadoop Summit two weeks ago that I saw a lot of parallels emerging between HPC and the Hadoop ecosystem.  The common problems are classic computer science issues such as resource management and utilization, as well as scheduling at different layers of the system.

And of course all of this is moving so fast that it's hard not to get distracted, which is why I appreciate being able to focus on this space: I will be able to leverage my background in compute, network, and storage as well as work on the bleeding edge of integrating and optimizing those with next-generation applications and frameworks.

Thursday, May 29, 2014

Quick Random Question #2

In a high-latency, low-bandwidth ROBO deployment scenario, how can a customer optimize their ESXi image for deployment so that the remote site will not suffer through VIB deployments, can function independently, and can tolerate unreliable connectivity?

For starters:
1. On a vanilla ESXi image, run "esxcli software vib list" to get the baseline of included VIBs.
2. On one of your production ESXi hosts (clustered, updated, drivers, everything matching what a production host will look like at the remote site), run "esxcli software vib list" to get all the VIBs you'll need.
3. Use ESXi Image Builder to create the software depot you need; you can then export an ISO to bootstrap any remote hosts.
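
Comparing the two lists shows which VIBs have to be added to the base image profile.  A quick way to do that, with the file names as placeholders and the comparison done on an admin workstation rather than in the ESXi shell:

# on the vanilla host
esxcli software vib list > base_vibs.txt
# on the production-equivalent host
esxcli software vib list > prod_vibs.txt
# copy both files to an admin box, then list the VIBs only present in production
diff base_vibs.txt prod_vibs.txt | grep '^>'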






Additional links
Installing patches on an ESXi 5.x host from the command line
http://blogs.vmware.com/vsphere/2012/04/using-the-vsphere-esxi-image-builder-cli.html

Hadoop Summit and Hadoop as a Service

In the beginning of this year, I mentioned working on Big Data and virtualization and it has been a fruitful time.  Next week I will be co-presenting with Chris Mutchler from Adobe on "Hadoop-as-a-Service for Lifecycle Management Simplicity" at the Hadoop Summit conference in San Jose, CA.  Our session will be on Wednesday from 4:35pm-5:15pm.

I am humbled and excited to help present alongside other sessions from some of the most respected names in the industry: Yahoo!, Google, Cloudera, Hortonworks, MapR, and Microsoft.  The growing depth, evolution, and community of the Big Data ecosystem is impressive, to say the least.  I hope to attend other Hadoop customer sessions as well as investigate what other large players are accomplishing with their respective stacks.  I see a lot of advanced sessions around new use-cases for Hadoop and research into adding additional layers and abstractions to Hadoop.  The Adobe session is focused on the usability of Hadoop from an IT operations perspective, with a few key points to make:
  • Explain why virtualizing Hadoop is good from a business, technical, and operational perspective
  • Accommodate the evolution and diversity of Big Data solutions
  • Simplify the lifecycle deployment of these layers for engineering and operations.
  • Create Hadoop-as-a-Service with VMware vSphere, Big Data Extensions, and vCloud Automation Center
Hadoop is truly becoming a complicated stack at this point.  After I started actually getting hands-on and working with customers on Hadoop-specific projects in 2011, I found that calling this new technology "Hadoop" seemed a bit disingenuous.  There was really MapReduce and HDFS, a compute layer and a storage layer.  They were tightly coupled, but that coupling was enforced for very good and simple reasons.  Spending even more time on this has given me more perspective on the different layers and their corresponding workloads.  Unless you're only running one type of job for your compute layer and sucking in data from a static set of sources for your data layer, these workloads will vary, and vary independently.  However, in a physical world, with both layers exactly coupled, how can they scale independently and flexibly?

Enter virtualization and everything I've been working on around virtualizing distributed systems, data analytics, Hadoop, and so forth.  Consider the layering of functionality for different distributions and look for the similarities.  If you take a look at Cloudera:


Or Hortonworks:
And Pivotal HD:

As a wise donkey in a certain movie once explained with onions and cakes, they all have layers, and so does any next-gen analytics platform.  Now we have our data layer, then a scheduling layer, and on top of that we can look at batch jobs, SQL jobs, streaming, machine learning, etc.  That is many moving parts, each with probable workload variability per application and per customer.  What abstraction layer helps pool resources, dynamically moves components for elastic scale-in and scale-out, and allows for flexible deployment of these many moving parts?  Virtualization is a good answer, but one of the first questions I get is "How's performance?"  Well, I have seen vSphere scale and perform next to bare metal.  Listed below is the link to the performance whitepaper detailing recommendations that have been tested on large clusters.

Speaking of all these layers, this leads to complexity very quickly, so another angle specific to the Adobe Hadoop Summit presentation is hiding this complexity from the end developers and making it easier and faster to develop, prototype, and release their analytics features into production.  Some sessions are exploring even deeper, more complex uses of Hadoop and I am eager to see their results; however, enabling this lifecycle management for ops is essential to adoption of the latest functionality in any vendor's Big Data stack.  VMware's Big Data Extensions, in this case with vCloud Automation Center, allows for self-service consumption and easier experimentation.  There's a (disputed) quote attributed to Einstein: "Everything should be made as simple as possible, but not simpler."  There are a few vendors working on making Hadoop easier to consume, and I would argue simplifying consumption of this technology is a worthwhile goal for the community.  Dare I say even Microsoft's vision of allowing Big Data analysis via Excel is actually very intriguing if they can make it that simple to consume.

Another common question I get is "Virtualization is fine for dev/test, but how is it for production?"  First, simplicity, elasticity, and flexibility are even more important to a production environment.  And maybe more importantly, let's not discount the importance of experimentation to any software development lifecycle.  As much as Hadoop enterprise vendors would like to make any Hadoop integration turnkey with any data source, any platform, any applications, I would argue we have a long way to go.  Any innovation depends on experimentation and the ability to test out new algorithms, replacing layers of the stack, evaluating and isolating different variables in this distributed system.

One more assumption that keeps coming up is the perception that 100% utilization on a Hadoop cluster equals a high degree of efficiency.  I am not a Java guru or an expert Hadoop programmer by any means, but if you think about it, it would be very easy for me to write something that drives a Yahoo!-scale set of MapReduce nodes to 100% utilization but which really gives me no benefit whatsoever.  Now take that a step further: a job can have some benefit to the user but still be very resource inefficient.  Quantifying that is worthy of more research, but for now, optimizing the efficiency of any type of job or application will bring better business and operational intelligence to an organization and actually make its data lake (pond, ocean, deep murky loch?) worth the money.

Add to these business and operational justifications the added security posture:
http://virtual-hiking.blogspot.com/2014/04/new-roles-and-security-in-virtualized.html
and now you should have a much better idea of the solutions that forward-thinking customers are adopting to weaponize their in-house and myriad vendor analytics platforms.

Really exciting tech and hope to see you next week in San Jose!


Additional links
Hadoop Summit:
http://hadoopsummit.org/san-jose/schedule/
http://hadoopsummit.org/san-jose/speakers/#andrew-nelson
http://hadoopsummit.org/san-jose/speakers/#chris-mutchler
Hadoop performance case study and recommendations for vSphere:
http://blogs.vmware.com/vsphere/2013/05/proving-performance-hadoop-on-vsphere-a-bare-metal-comparison.html
http://www.vmware.com/files/pdf/techpaper/hadoop-vsphere51-32hosts.pdf
Open source Project Serengeti for Hadoop automated deployment on vSphere:
http://www.projectserengeti.org/
vSphere Big Data Extensions product page:
http://www.vmware.com/products/vsphere/features-big-data
How to set up Big Data Extensions workflows through vCloud Automation Center v6.0:
https://solutionexchange.vmware.com/store/products/hadoop-as-a-service-vmware-vcloud-automation-center-and-big-data-extension#.U4broC8Wetg
Big Data Extensions setup on vSphere:
https://www.youtube.com/watch?v=KMG1QlS6yag
HBASE cluster setup with Big Data Extensions:
https://www.youtube.com/watch?v=LwcM5GQSFVY
Big Data Extensions with Isilon:
https://www.youtube.com/watch?v=FL_PXZJZUYg
Elastic Hadoop on vSphere:
https://www.youtube.com/watch?v=dh0rvwXZmJ0

Tuesday, May 6, 2014

Virtualized HPC and Customer Sessions For Your Consideration...VMworld 2014

Now that the floodgates are open, I wanted to let you know about some key sessions regarding virtualized high performance computing applications, analytics and Big Data.
https://vmworld2014.activeevents.com/scheduler/publicVoting.do

Session 1682: Agile HPC-as-a-Service with VMware vCloud Automation Center
What I am aiming at is bringing the simplicity of deployment and integration that is available with Big Data Extensions to HPC clusters.  A simple idea that's already gotten very complicated very quickly but worth the effort to research.

Session 1688: Is Someone Bitcoin Mining on my Cluster? High Performance Security for Virtualized High Performance Computing
If we have these targeted compute clusters, what are we doing to make sure they are being used appropriately?  This issue will only become more prevalent so why not learn how to get in front of it?

Session 1428: Hadoop as a Service: Utilizing VMware vCloud Automation Center and Big Data Extensions at Adobe
Real world details of a customer I have worked with virtualizing and automating Hadoop deployments, from Chris Mutchler of Adobe.  This will be focused on the automation and self-service flexibility gained through virtualization and leveraging BDE.

Session 1424: Massively scaled VSAN Implementations
Another real customer implementation detailed with Frans Van Rooyen, a Compute Platform Architect at Adobe, around their use-case for VSAN for large-scale analytics.

Session 1466: High-Performance Computing in the Virtualized Datacenter
Edmond DeMattia from the JHU Applied Physics Laboratory discusses his latest real world experience at scale of pooling virtualized compute clusters.  His session was a hit last year and I hope he gets a chance to update everyone this year on his work.

Session 1856: How to Engage with Your Engineering, Science, and Research Groups About Virtualization and Cloud Computing
This session is being done by my two good friends Matt Herreras, systems engineering manager for SLED, and Josh Simons, who works in the Office of the CTO focused on HPC.  For virtualizing HPC to work, getting buy-in from the end users is definitely necessary.

Session 2508: Extreme Computing on vSphere
Both Josh Simons and Bhavesh Davda from the Office of the CTO at VMware present on their virtualization of latency-sensitive workloads using InfiniBand, GPGPUs, and Xeon Phi from virtual machines.

Session 1539: Why the Hypervisor isn't a Commodity. Performance Best Practices from 3 Tier to HPC workloads
Bhavesh Davda and Aaron Blasius, a Product Line Manager for vSphere, discuss performance tips and tricks from the VMKernel.

Session 1232: Reference Architectures and Best Practices for Hadoop on vSphere
Justin Murray from VMware Tech Marketing and Chris Greer from Fedex discuss their current architecture for Hadoop on vSphere as well as look at how this is evolving for the next generation of Hadoop.

Session 1697: Hadoop on vSphere for the Enterprise
Joe Russell, PM of Storage and Big Data for VMware, hosts his first customer panel discussing experiences virtualizing Hadoop, with Northrop Grumman, FedEx, and Adobe.

Session 1807: Best Practices of Virtualizing Hadoop on vSphere - Customer Panel
Joe Russell's second customer panel including Adobe, Wells Fargo, and GE.

Customer Sessions:
Also, in addition to the customer panels and the Adobe and JHU APL sessions, I would highly recommend voting for customer sessions on the technologies that you want to see.  I always get asked for customer references for all different kinds of technologies, and now is the opportunity for you to invite customers who want to talk about their implementations and lessons learned.  Please take advantage and help validate the work that these customers have done to advance the community.  A few examples, in no particular order and certainly not a definitive list:
1400, Kroger and their ROBO implementation
2382, 2505, Symantec and their cloud implementation
2463, Francis Drilling Fluids and their SDDC
2770, University of Wisconsin and migrating to the vCenter Server Appliance
1526, Greenpages/LogisticsOne and VSAN
1635, MolsonCoors and their SDDC, including virtualizing SAP, using vCOps, VIN, and SRM
1897, Boeing and their ITaaS
2687, McAfee and Intel elastic cloud
2385, McKesson OneCloud
2285, Grizzly Oil and Horizon View

Thanks for taking the time to read and vote!

Friday, April 25, 2014

Much Ado About Containers


There has been a lot of sudden interest in containers, whether in tech publications fervently discussing “Docker vs. virtualization”, around the Red Hat Summit, in blogs, and so on.  In my opinion, they are leaving out two words: "Cloud Foundry".  Through the buzzword bingo, it would appear that the Red Hat/OpenShift camp, a Cloud Foundry competitor, is aligning with Docker.  By contrast, Cloud Foundry has used Warden as the "container" form factor for building and linking cloud applications.  Since being put into the community's hands, versus VMware's or Pivotal's, it's possible that Docker will even become a choice here as well (Decker).  Ultimately, customers' options haven't really changed, even though many articles portray that a turning point is imminent.  Customers will be able to utilize the virtual form factor that fits their business needs, and dare I say fits their business culture?

This could be on a VMware SDDC, or PaaS, which can certainly be deployed on VMware-based IaaS or of course other alternatives.  It depends on what set of abstractions they are comfortable with even though any startup focused on Docker needs to swing the needle to their side in order to better justify their existence.  I am a fan of Cloud Foundry and PaaS in general, but I don’t believe that every app will be run at that layer of abstraction.  But hey, maybe that last statement will be one of those quotes like “640K ought to be enough for anybody.”  I am just thankful to be working with such a diverse group of customers to keep this in perspective as well as get to play at the bleeding/leading edge.

Additional links:
https://groups.google.com/a/cloudfoundry.org/forum/#!topic/vcap-dev/V-lVpMpNqL4/discussion
https://docs.google.com/document/d/1DDBJlLJ7rrsM1J54MBldgQhrJdPS_xpc9zPdtuqHCTI/edit
http://blog.cloudfoundry.org/2013/09/08/combining-voice-with-velocity-thru-the-cloud-foundry-community-advisory-board/

Tuesday, April 8, 2014

New Role and Security in a Virtualized Environment

I've accepted additional responsibility in VMware's CTO Ambassador program, so in addition to working with the Office of the CTO on advanced projects around vHPC and vHadoop, I will be working to bring more visibility and feedback from customers directly to the PMs and R&D engineers responsible for our products.

Despite not officially being in VMware's Network and Security business unit (NSBU) for the past year, I am still constantly called in to customers to discuss security in a virtualized environment.  In the past, I was responsible for establishing security documentation and baselines for military branches such as the USMC, covering how they could achieve their security accreditation for any given site.  This effort to apply DISA STIGs to virtual environments predated VMware's standard security hardening guidelines.

Now, all of VMware’s security information can be found at http://www.vmware.com/security.  Customers want security reference architectures but they also want more detail about how to customize those for their specific environment.  If you take a vanilla installation of vSphere and apply the latest security hardening guide, you have the basis for the reference configuration that was submitted for Common Criteria certification.  For example, vCloud Networking and Security (vCNS) v5.5 recently achieved EAL 4+ certification.

Two of the biggest issues I still see in virtualized environments are inconsistency and complexity, which have a direct bearing on companies' security posture.  One would think that with standardized templates, tools, and scripts, consistency would be much easier.  However, production IT departments have to support a wide range of applications, and this can quickly devolve into catalog sprawl and managing a fairly complex array of templates.  In addition, security policy becomes increasingly specific to each VM or application and, again, too complex to manage.

I believe the idealized notion of hackers sitting in front of their keyboards elegantly dissecting a target environment is more than a bit disingenuous.  Malware toolkits, or "Sploits", built to do all of the tedious work are getting even better at spotting inconsistencies and exploiting them before the IT and security admins who should be most familiar with a given environment.  We used to make fun of "script kiddies", those who ran scripted attacks without understanding why they would be effective, but the threats have evolved as always.  In an autonomous fashion, attacker toolkits can explore a local OS and network, probe for weaknesses, and evade detection in running memory.  One of their typical tasks besides evasion is to compromise any local security controls, antivirus, etc.  So why not base security policy only in the network, at a different layer?  Those local controls do provide the best context for what is actually going on within the OS, but without appropriate isolation, they can be easily bypassed.  Why should the IT and security admin toolkit be any less dynamic and automated?

The “Goldilocks Zone”, first discussed by NASA and reinterpreted for security by Martin Casado of VMware and Nicira fame, keys into leveraging virtualization as being “just right” to support security in a virtualized environment.  To me, and to many customers I've spoken with, the virtualization layer, along with providing the bare-metal abstractions, delivers the right amount of context for local apps and the OS while remaining isolated from potential malware that would otherwise corrupt or compromise those local controls.  ESXi can be this trusted layer, with TXT and appropriate application of the security hardening guide recommendations.  This can also be audited via third-party tools such as HyTrust.

Within vCNS, you can configure security groups to be the logical container for policies.  You can start with static security groups defined according to the standard virtual infrastructure layout that vSphere admins are already familiar with, such as the virtual datacenter, clusters, or port groups.  The next step is to allow security groups to be dynamic by leveraging the object model AND enabling partner solutions to interact with those objects to affect security policy in real time and coordinate security responses to violations.  NSX allows this by essentially creating security tags for VMs that may be read by NSX as well as NSX-ready partners, for example Trend Micro and Palo Alto Networks.

Much more to be said about this, but needless to say, excited to be working more directly with customers on next-gen apps with NSX.  I will actually be discussing this live with Trend Micro and Accuvant on 4/18/14 and will post a link to a recording when it is available.

Additional links:
http://blogs.vmware.com/networkvirtualization/2014/03/goldilocks-zone-security-sddc.html
http://www.vmware.com/security/certifications
http://www.forrester.com/No+More+Chewy+Centers+Introducing+The+Zero+Trust+Model+Of+Information+Security/fulltext/-/E-RES56682
http://www.vmware.com/products/nsx

Tuesday, March 25, 2014

Do not forget NTP

I have seen a lot of issues related to NTP and time skew in VMware environments recently.  Any new appliance that has SSO as a dependency needs time skew to be kept to a minimum.  If you're deploying vCloud Automation Center (vCAC) 6.x, then time needs to be in line with a consistent source, most likely the domain controller.  This also applies to the VMware Big Data Extensions (BDE) appliance, and to the vCloud Networking and Security (vCNS) Manager, aka vShield Manager, appliance if you've deployed it in the past with vSphere 5.1.  The errors may not be particularly helpful unless you dig into the logs and see something similar to the following: "Server returned 'request expired' less than 0 seconds after request was issued".

In a Windows world, the servers can be pointed at the domain controller, and on a Linux distro you typically point /etc/ntp.conf at the pool.ntp.org servers and always make sure UDP port 123 is open.  In a virtualized context, you could be lazy and synchronize with host time via VMware Tools; most of the appliances now expose this under the "Admin" tab.  But let's say you go through the ESXi host configuration and point the NTP client at the domain controller.  That should fix it, right?
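
For the Linux guests and appliances where you manage ntpd yourself, a minimal configuration pointed at the pool might look like the following; swap in your domain controller or internal time source as appropriate, and note that the service name varies by distro (ntpd on RHEL/CentOS, ntp on Ubuntu).

cat > /etc/ntp.conf <<'EOF'
driftfile /var/lib/ntp/drift
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
EOF
service ntpd restart
# verify peers are reachable and the offset is shrinking
ntpq -p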

If you're still running into the issue and noticing that your ESXi hosts are not syncing, then you need to read this:
Synchronizing ESXi/ESX time with a Microsoft Domain Controller
because there are several steps that require going into the ESXi shell to remedy.

In the interest of strong design, please don't take NTP for granted.  Your logging, which can be mind-numbing to troubleshoot to begin with, obviously makes no sense if there are time discrepancies.  Then we have log management tools like Log Insight now for example, which have basically become mandatory.  Do you still set all your servers to a specific timezone or have you standardized on UTC?


Additional Links:
http://tycho.usno.navy.mil/NTP/
https://blogs.vmware.com/management/2014/01/vmware-vcenter-log-insight-1-5-technology-and-features.html
http://blogs.technet.com/b/askds/archive/2007/10/23/high-accuracy-w32time-requirements.aspx

Thursday, February 13, 2014

QRQ #1 (of many quick random questions)

Quick Random Question: How can we treat a VMDK as a first-class citizen for sharing data between VMs?

If we need to share a virtual disk in a read-only fashion between VMs, that's basically an ISO, isn't it?  We can drop the ISO on VMFS or NFS and map it to as many VMs as we need, but that's not really for a working data set.  If we need to share a virtual disk between cluster members, there is the multi-writer flag; if you work with Oracle RAC in a vSphere environment then you know about that one.  There is the physical mode RDM to share quorum and data between cluster nodes (is MSCS the final offender there?) and of course NFS for sharing files, but what about VMDKs specifically?

Since v5.1, vCloud Director has provided this capability as "Independent Disks", which allows the creation of a new disk within an Org vDC.  As the name would imply, this disk is then independent of any vApp and can be attached read-write to any given VM.  The capacity is given in the API call, but what about performance and placement?  You can also attach a storage profile to help define performance and chargeback.
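
As a rough sketch, creating an independent disk is a single POST against the Org vDC.  Treat the media type, element names, and size units below as assumptions to verify against the API guide linked at the end of this post; the URL, token, and 100 GB size are placeholders.

# create a named independent disk in the Org vDC (size is in bytes)
curl -k -X POST \
  -H "x-vcloud-authorization: $TOKEN" \
  -H "Content-Type: application/vnd.vmware.vcloud.diskCreateParams+xml" \
  -d '<DiskCreateParams xmlns="http://www.vmware.com/vcloud/v1.5">
        <Disk name="shared-working-set" size="107374182400"/>
      </DiskCreateParams>' \
  https://vcloud.example.com/api/vdc/<vdc-id>/disk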

In this particular customer's use case, it serves as a transitional working set between VMs and vApps within that Org vDC.  To take the discussion to a new level, they are also working with Cloud Foundry, creating vBlobs, and wanted a similar storage construct that they can move around independent of a given VM's or vApp's lifecycle.

I realize that vCAC is "the new hotness" and vCD is "old and busted", but there are still several features of vCD that come in handy on a regular basis: saving vApps back to the catalog, spinning up tenant networks automatically (vCAC 6.0 + NSX is getting there), and vCloud Connector for defining a pub/sub model for your vDC catalog.  I also like vCD's extensible object metadata repository.  Try it: you can store anything about your VMs and use those tags to programmatically identify any set of objects within your vDC that you want.  This reminds me that I need to learn more about vCAC's custom properties and wade through the latest vCAC 6.0 SDK...and try to be a vCO ninja this year:-) Oh well, maybe a green belt at least?

Documentation links:
vCD 5.1 Independent disk links
http://pubs.vmware.com/vcd-51/index.jsp?topic=%2Fcom.vmware.vcloud.api.doc_51%2FGUID-E49DFB64-7862-4360-AD13-E73336CEA657.html
http://pubs.vmware.com/vcd-51/index.jsp?topic=%2Fcom.vmware.vcloud.api.doc_51%2FGUID-33EAA54E-391A-4636-B4CB-EE8651BD111A.html
vCD 5.5 API link (see pages 80 and 112 for independent disk info)
http://pubs.vmware.com/vcd-55/topic/com.vmware.ICbase/PDF/vcd_55_api_guide.pdf
SCSI Multi-writer flag usage
http://kb.vmware.com/kb/1034165
Get and set vCloud Director metadata
https://communities.vmware.com/docs/DOC-20475


Friday, January 31, 2014

Objectives for 2014

I believe I missed the cutoff for the obligatory 2013 retrospective post, and I'm sure I will revisit as necessary, but I am really excited about 2014.  I've worked for VMware as a network and security specialist, a BCDR specialist, and a storage specialist and before that as a PSO solutions architect.  I enjoy getting my hands dirty and staying close to the customers and this year has lots of opportunities to do that.

Storage

Keeping with my storage roots, I am working a lot with Virtual SAN these days.  There's no shortage of "How to set up Virtual SAN" posts, but why another scale-out storage platform?  Isilon is scale-out NAS that pretty much looks like racked servers, not to mention Gluster and Lustre.  Then ScaleIO, Nutanix, SimpliVity; the list could go on...and could change on a regular basis, since the storage market has been lucrative since I started paying attention in 2000.  However, Virtual SAN focuses on proactive alignment of storage capabilities with a VM's storage policy.

Traditional storage is built on basic capabilities or service levels providing availability, performance, and capacity.  Availability could be local or remote (RAID and block replication for example).  Performance could be tied to spindle count in a RAID set, amount of read or write cache, auto-tiering with flash.  Capacity can be thick or thin-provisioned.  And to get the correct and corresponding SLA was (and still is) a perpetual balancing act of enterprise storage admins.  Should this database go on RAID5 or RAID10?  What happens when the tablespaces outgrow their LUN(s)?  And so on...

Auto-tiering for performance gets closer to the problem; however, there are still reactive policies that govern the migration of blocks between tiers.  You can place a workload on a datastore and hope that those policies react fast enough to satisfy the performance requirements and, at the same time, are efficient enough to make the most out of a starved flash resource.  With the rush of All-Flash Arrays (AFAs), obviously there is a market for not doing auto-tiering at all.  Why bother with policies to manage tiers if you only have the highest tier to work with?  It's much simpler with an AFA, and even though I believe that not all data is flash-worthy, simplicity as a design principle should never be discounted.

So to circle back, Virtual SAN doesn't ask for RAID configuration or LUN carving of a storage pool.  By creating storage policies and assigning them to VMs, those VMs inherit the benefits of the storage pool accordingly; availability, performance, and capacity can be proactively set per policy and react dynamically to the workload without the traditional management headaches.  I plan on going into detail on the use cases, such as scale-out applications, Big Data (perhaps redundant, as you could just as easily describe it as "scale-out data"), test/dev, virtual storage appliance replacement, and a DR target.

vHPC and vHadoop

Last year I presented at VMworld US and Europe around virtualizing high performance computing with UCSF.  This year, with the latest improvements to vSphere 5.5 and vCAC 6.0, I have my work cut out for me proving the flexibility and performance of using virtualization for HPC-type applications.  Early on in my career I worked at Argonne National Labs outside of Chicago, and since college I have had a passion for distributed systems in computer science.  Except now I typically approach it from a systems engineering perspective instead of as a programmer.  But everything I'm working on lately has programming attached, or to be more accurate, event-driven scripting and slogging through APIs.

In addition, I've spent the past two years working on virtualizing Hadoop and helping customers who have been early adopters.  Automation and virtualization arguably go together very well and scaling with flexibility is key.  Even Hadoop, to me, is a bit of a misnomer these days as I think in terms of MapReduce and HDFS.  And then Hive, or Drill, or Giraph, and layers upon layers.  Sidenote: when I think about layers, I think about Shrek, and then that makes me think of "I am Legend".  So much potential for optimization of the compute and data layers, independently or tightly coupled.  Looking forward to working on several of these use-cases this year as well.

BC/DR

I started at VMware focused on business continuity and disaster recovery.  If you looked through my virtual customer whiteboards, you would see thousands of DR plans and Site Recovery Manager drawings.  I haven't given up on SRM and am pleasantly surprised by the number of customers building out stretched clusters, active/active applications across regions with tools like Gemfire, and pushing the boundaries of what and how they can abstract site services with PaaS.

Again, really looking forward to 2014 and working on lots of opportunities.

Thursday, January 30, 2014

My blog because REASONS!

Really?  Another tech blog?  Who cares?

Simply, I'm doing this for myself, to remember, and hopefully for anyone else interested, but still mainly to keep a chronicle.  I had written a blog for a few months when I was in Australia, and my wife has kept our family blog alive, and I read them to remember.  I wish I had the picture-perfect memory I used to have, but with kids and time, I tend to forget all the little points along the way.  I remember the mountain tops and the dark valleys, but there is a lot of stuff in between.

My work in technology is a big part of my life and I would like to share what I am working on, what I am working towards, and what I am looking forward to.  This blog is my own and contains my own opinions.  Because there's still a lot of work to do and places to go, this is my virtual hike.

Thanks for coming along on any part of it.