IT 2.0: Next Generation IT Infrastructures

So long VMware, Hello AWS (Wed, 27 Sep 2017)

I have an awesome job, an awesome manager and I work for one of the best companies around.

Yet, Friday September 29th 2017 is my last day at VMware.

On Monday October 2nd I will join Amazon Web Services as a Principal Solutions Architect.

This was not a decision I took lightly.

This blog post (in its original draft) was 7 pages long. I intended to explain, at a certain level of detail, the thought process I went through to make this decision.

I eventually figured that this was (psychologically) useful only to myself, and there was a chance my blog readers would be bored by it. So now you get to see only the final result and not “the 7 pages sausage making thought process”. You’re welcome.

A bit of background about me

I had worked at IBM in various positions since 1994, pretty much always on x86 architectures. I started with OS/2, moved on to Windows and eventually joined the Systems and Technology Group working on the Netfinity line of Intel-based servers (soon to become eServer xSeries).

However, for me, it all started in October 2001.

I was in Redmond for a Windows 2000 Datacenter training at Microsoft when I stopped by the IBM facility in Kirkland. I spent time talking to Bob Stephens, one of the big guys there. Bob was telling me about this huge scale up big server (the xSeries 440) that was so big… that nobody really wanted/needed it.

At that point, he said they were in talks with this small startup in Silicon Valley that was working on a thin software layer that you could install on this server and that would allow a user to carve out “small software partitions” (for lack of better wording at that time) where customers could install Windows.

I distinctly remember that I chuckled at Bob’s whiteboarding, and the first words that came out of my mouth were “come on Bob, are you kidding? A software layer between Windows and the hardware? This will never work!”.

Bob acknowledged my concerns and handed me a CD with a label on it: “ESX”. He suggested I give it a try.

The next week I got back to the office in Milan, booted that CD, installed ESX 1.0 onto my xSeries 440 in the lab, and I distinctly remember I chuckled (again) and said to myself “Holy moly, this thing is going to change the world entirely forever”. And it did.

The rest is history.

I have been working exclusively on VMware technologies since 2001.

For IBM up until February 2010, when I joined VMware.

I guess it’s fair to say I have seen it all from ESX 1.0 all the way to what VMware is today.

Why am I leaving VMware?

There isn’t any reason other than, after 17 years, I feel the need for a professional change. VMware is an amazing company and one I could easily have retired at while still having fun (which is a priority for me).

That said, in the last few years I have grown a desire to expand my reach into different technology areas and different communities, and to embrace different cultures (more on this later). In other words, to see something different, from different perspectives.

VMware has always been amazing to me (both when I was a partner at IBM and an employee at VMware). I only hope I was able to repay that debt with my contributions during the last 17 years.

I will forever be grateful to the company and the community.

Why am I joining Amazon (Web Services)?

As I watched industry trends and crossed them with my current career interests, it was clear that it was time for me to try a new experience and move on to my next stint in the industry.

In the last few years my interests have evolved towards “consuming” [resources and services] Vs. “building” [infrastructures]. This is something that has been close to my heart for a long time now.

Also, just like I have never been a “multi-hypervisor” type of guy, this “multi-cloud” thing has never really warmed me up.

Long story short, the pivot to a leading public cloud provider with a strong personality and value proposition was becoming very appealing to me.

I started creating a short list in my head of organizations I would like to join. To be honest, I think there are just “two and a half” of them (I will not name names).

The output of a very well-thought-out process is that I decided to accept an offer from Amazon Web Services for a Principal Solutions Architect role.

I have always been fascinated by the Amazon culture. There is some magic behind it, IMO, that I have always heard of and, at some point, I really felt the desire to live it from the inside. There are intimidating aspects of that culture, but the fascinating aspects hands down dwarf any possible concern.

There are a couple of articles I suggest you read that fairly represent why the Amazon culture is, in my opinion, unique:

There are tons of such articles out there but I think these two capture my thinking fairly well.

This customer-obsession attitude and unprecedented service automation at company scale is just mind-blowing.

I couldn’t be more excited to join AWS and see this from the inside, being part of it.

As a final thought, I am joining AWS in a (relatively) senior position at the Principal level and I will bring as much “experience” as I can into the new role. After all, as Andy Jassy once said, “there is no compression algorithm for experience”.

That said, the graph below (source here) accurately describes the mindset I am starting this new adventure with.

The more you live in this industry, the more you understand that what you know is a tiny bit of the entire universe we are immersed in. There is no doubt it’s always going to be “day 1”!

When I started talking about this to my (awesome) manager at VMware (Steve), I remember telling him that this opportunity is my “Cloud MBA”.

Saying that I am super excited to start this new chapter of my professional life is an understatement.

Ad maiora!


P.S. Ian Massingham’s feedback to this post was: “You missed one critical aspect: Ian M literally would not stop nagging me to join”. I know for a fact that he meant it, for real. That made me chuckle. I will take it as a token of appreciation from Ian.


“VMware Cloud on AWS” Vs. “Azure Stack” (Mon, 18 Sep 2017)

Introduction

VMware, Amazon Web Services and Microsoft are in the middle of some interesting technology and service rollouts that have the potential to move the needle in cloud adoption (spoiler alert: whatever cloud means). VMware comes from a very strong (almost exclusive) market share in the on-prem data center virtualization space. AWS is the 800-pound cloud gorilla and Microsoft is one of the biggest contenders in the same public cloud space.

VMware just made available “VMC on AWS” (aka VMWonAWS, aka the ability to run the entire VMware SDDC stack on the AWS infrastructure).

Microsoft is making available “Azure Stack” (aka an on-prem instantiation of the services available on the Azure public cloud).

These two announcements will generate (and are already generating) lots of interest in the respective communities. In this post, I would like to make a comparison between the different approaches VMware, AWS and Microsoft are taking when it comes to hybridity (again, whatever hybridity means).


The cloud industry has been shaped in the last 10+ years by the fact that, when AWS pioneered it, it changed two very different paradigms at the very same time:

  • it changed the economic paradigm with a PAYGO/OPEX model (Vs. the traditional on-prem/CAPEX model).
  • it also shifted the application architectural paradigm with a cloud-native/Cattle application model (Vs. the traditional enterprise/Pet application model).

I won’t bore you more with this because I already ranted about it a couple of years ago in my “What do Cloud Native Applications have to do with Cloud?” blog post. It would be beneficial to scan through it if the topic is of interest to you.

The picture below is intended to summarize visually this multi-dimensional aspect I have alluded to in the blog post linked above:

As you can see, AWS introduced both a new economic and delivery model (X axis) as well as a new application architectural paradigm (Y axis).

In the last few years the industry has witnessed a number of attempts to bridge these two worlds and dimensions. “VMC on AWS” and “Azure Stack” are two such examples of that (albeit the respective approaches couldn’t be more different).

Let’s explore them.


VMC on AWS

When VMware and AWS teamed up, it’s clear that they focused on tackling the economic and delivery model of the traditional Enterprise data center stack dimension (X axis).

The idea is to keep the VMware software stack “constant” (and hence compatible and consistent with what the majority of the Enterprise customers are running on-prem) and make it available “as-a-Service” on the AWS infrastructure. That is to say you can consume one (or more) instances of vSphere/vSAN/NSX clusters as a service, on-demand, in the public cloud.

This picture should visually convey this concept:

In other words, the strategy here is as simple as “bringing the existing data center stack into a cloud with a cloud delivery model”.  AWS gets an additional workload capability (a huge one with an enormous total addressable market) while VMware (and its customers) get access to a different “infrastructure economic and delivery model” option.

Azure Stack

When Microsoft looked at the hybridity challenge, they took a completely different approach. Instead of focusing on the economic and delivery model aspect of “the cloud”, they looked at it from the perspective of the application architecture and the stack required to run pure cloud-native applications (Y axis).

The idea here is to keep the on-prem delivery model (somewhat) “constant” and focus on making available (in your own data center) services and technologies that usually are only available in “the cloud” (that is, the public cloud).

This picture should, ideally, convey this approach:

Note how the traditional on-prem “data center virtualization” stack from Microsoft has to give way to the new “Azure Stack” stack (no pun intended). Azure Stack isn’t meant to run on a traditional data center stack, nor is it meant to run traditional “pet” workloads.

In other words, the strategy here is about “bringing the cloud services required to build and run cloud-native applications into an on-prem data center”.


In conclusion, discussing and comparing “VMC on AWS” and “Azure Stack” doesn’t make much sense, given they are meant to solve completely different problems in the hybridity space. Because of this, they both have complementary holes in their respective value propositions.

“VMC on AWS” shines if you are a happy vSphere customer, you have to support “pet workloads” but you would like to do so with a different economic and delivery model (i.e. if you want to get out of the business of managing VMware clusters, if you want zero-effort global reach, if you want a “pay-per-use” infrastructure opex billing… all without touching your application stack). On the other hand, this solution doesn’t address how to run cloud-native applications on-prem (at least in a way that is compatible with the AWS services – VMware and Pivotal have their own solutions for that but this is out of scope for this blog).

“Azure Stack” shines if you are re-architecting your applications using the 12-factor application methodology (“cattle workloads”) but, for various reasons, you want to run them on-prem in your data center (i.e. you want to leverage advanced cloud services such as Azure Functions, Object Storage, Cosmos DB… all without having to move your applications to the public cloud). On the other hand, this solution doesn’t address how to run traditional applications (“pets”) with a cloud(-ish) economic and delivery model.

If you are clear about what your challenges are and what you need for your own organization, you have already picked the right solution without any further comparison.

In the end, both solutions remain spectacular projects and I am sure they will be successful in their own respective domains. Making these things happen takes gigantic engineering efforts that most people do not even appreciate.

Super kudos to the teams at AWS, Microsoft and VMware for the astonishing work.

Interesting times ahead!



A data center provisioning horror story (Wed, 26 Jul 2017)

Yesterday I noted a tweet from Frank Denneman:

I guess he was asking this in the context of the VMWonAWS cloud offering and how, with said service, you could provision vSphere capacity without having to “acquire server hardware”.

This reminded me of an anecdote I often use in talks to describe some of the data center provisioning and optimization horror stories. This won’t answer Frank’s question specifically, but it offers a broader view of how awful (and off the rails) things could quickly get inside a data center.

It was around year 2005/2006 and I was working at IBM as a hardware pre-sales on the xSeries server line. I was involved in a traditional Server Consolidation project at a major customer. The pattern, those days, was very common:

  1. Pitch vSphere
  2. POC vSphere
  3. Assess the existing environment
  4. Produce a commercial offer that would result in an infrastructure optimization through the consolidation of the largest number of physical servers currently deployed
  5. Roll out the project
  6. Go Party

We executed flawlessly up until stage #4, at which point the CIO decided to provide “value” to the discussion. He stopped the PO process because he thought the cost of the VMware software licenses was too high (failing to realize, I should add, that the “value for the money” he was going to extract out of those licenses was much higher than the “value for the money” he was paying for the hardware).

They decided to split the purchase of the hardware from the purchase of the VMware licenses. They executed on the former while starting a fierce negotiation for the latter with VMware directly (I think Diane Greene still remembers those phone calls).

Meanwhile (circa two weeks), the hardware was shipped to the customer.

And the fun began.

The various teams and LOBs had projects in flight for which they needed additional capacity. The original plan was that those projects would be served by the new virtualized infrastructure that was to become the default (some projects could have been deployed on bare metal, but that would have been more of an exception).

The core IT team had the physical servers available but didn’t have the VMware licenses that were supposed to go with the hardware. They pushed back on those requests as much as they could, but they got to the point where they couldn’t hold them off anymore.

Given that IT had run out of the small servers they used in the past to serve small requirements (to be fulfilled by VMs from then on), they started to provision the newly acquired, super powerful 4-socket (2 cores each) / 64GB bare metal servers to host small scale-out web sites and DNS servers!

While they would have traditionally used a small server for this (at a 5% utilization rate), they now had to use monster hardware (at a 0.5% utilization rate).

If you think this is bad, you’ve seen nothing yet. More horror stories to come.

Time went by, negotiations came to an end and an agreement with VMware was found. As soon as the licenses were made available, a new assessment had to be done (given the data center landscape had drifted in the meantime).

At that time, there were strict rules and best practices regarding what you could (or should) virtualize. One of those best practices was that you could (or should) not virtualize servers with a high number of CPUs (discussing the reason for this is beyond the scope of this short post).

Given that those recently deployed small web sites and DNS servers showed up in the assessment as “8 CPU servers”, they were immediately deemed servers that couldn’t be virtualized for technical reasons.

We were left with a bunch of workloads that were supposed to go onto 2 vCPU VMs in the first place but that had ended up on 8 CPU monster hardware (due to gigantically broken decisions). And we couldn’t do anything about it.

This was 2005 and lots of these specific things have changed. However, I wonder how many of these horror stories still exist nowadays in different forms and shapes.



Yelb, yet another sample app (Fri, 21 Jul 2017)

Another pet project I have spent cycles on as of late is an open source sample application called Yelb (thanks to my partner in crime, chief developer Andrea Siviero, for initiating me into the mysteries of Angular2).

This is the link to the Yelb repo on GitHub.

I am trying to be fairly verbose in the README files in the repo so I am not going to repeat myself here. Someone said GitHub repos are the new blog posts in the DevOps and cloud era. I couldn’t agree more.

For the record, Yelb looks like this (as of today). The skeleton of this interface was literally stolen (with permission) from a sample application the Clarity team developed.

When deployed with direct Internet access, it allows people to vote and (thanks to Angular2 and the Clarity-based UI) you will see graphs change dynamically. In addition to that, Yelb also tracks the number of page views as well as the hostname of the application layer container serving the request.

I thought this was a good mix of features for demoing an app to an audience while inspecting what is going on in the app architecture (e.g. watching the container name serving the request change when multiple containers are deployed in the app server layer).

Good for using it to demo Docker at a conference, good for using it as the basis to build a new deployment YAML for the 37th container orchestration framework we will see next week.

This is the architecture of the application (as of today).
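For those who want a feel for what deploying it looks like, a plain Docker run of the tiers might be sketched as below. This is an illustrative sketch only: the image names, ports and flags are assumptions on my part, so check the README in the repo for the real instructions.

```shell
# Hypothetical sketch; image names/ports are assumptions, see the Yelb README.
docker network create yelb-network

# Cache (page view counter) and database layers
docker run -d --network yelb-network --name redis-server redis
docker run -d --network yelb-network --name yelb-db mreferre/yelb-db

# Application server layer (scale this out to watch the serving hostname change)
docker run -d --network yelb-network --name yelb-appserver mreferre/yelb-appserver

# Angular2 UI, published on the Docker host
docker run -d --network yelb-network --name yelb-ui -p 8080:80 mreferre/yelb-ui
```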

Check on the GitHub repo for more (up to date) information about Yelb.

If you are into the container space, I think it helps a lot to own something that you can bring from personal development (from scratch) to production. You get to see all the problems a dev sees by taking his/her own app into production using containers and frameworks of sorts.

While you are more than welcome to use Yelb for your own demos and tests (at your own peril), I truly suggest you build your own Yelb.

Not to mention the amount of things you learn as you go through these exercises. I am going to embarrass myself here by saying I didn’t even know Angular was not server side, nor did I know how the mechanics of the Angular compiling process worked. Stack Overflow is such an awesome resource when you are into these things.


Project Harbor makes an entry into Rancher! (Thu, 20 Jul 2017)

This article was originally posted on the VMware Cloud Native corporate blog. I am re-posting it here for the convenience of the readers of my personal blog.

Early this year I challenged myself with a pet project to create a Rancher catalog entry for Project Harbor (a VMware-started open source enterprise container registry).

This is something I have been working on, off and on, in my spare time. I originally envisioned this to be a private catalog entry; in other words, an entry that a Rancher administrator could add as a private catalog to her/his own instance.

I am happy to report that, a few weeks ago, Rancher decided to include the Project Harbor entry into the Rancher community catalog. The community catalog is a catalog curated by Rancher but populated by the community of partners and users.

The specific entry for Project Harbor is located here.

As a Rancher administrator, you don’t have to do anything to configure it other than enabling visibility of the Rancher community catalog. Once you have that option set, every Rancher user out there can point to Project Harbor and deploy it in their Cattle environments.

This is what the view of the Rancher community catalog looks like today:

Note that, as of today, this catalog entry only works with Rancher Cattle environments (depending on community interest, support could be expanded to Rancher Kubernetes and Rancher Swarm environments as well).

Originally, this catalog entry had a couple of deployment models for Harbor (standalone and distributed). The latest version has only one model: depending on the parameters you select, Harbor will be deployed either on a single Docker host in the Cattle environment or distributed across the hosts.

The README of the catalog entry will explain the various options and parameters available to you.

If you are interested in understanding the genesis of this pet project and all the technical details you have to consider to build such a catalog entry for Harbor, I suggest you read the original blog post, which includes lots of technical insights about this implementation (including challenges, trade-offs and limitations). Note that, at the time of this writing, the Rancher community catalog entry for Project Harbor will instantiate the OSS version of Harbor 1.1.1.

Last but not least, mind that the Rancher community catalog is supported by the community. The Project Harbor catalog entry follows the same support pattern, so if you have any issue with it, please file an issue on the GitHub project.


Hashidays 2017 – London: a personal report (Thu, 22 Jun 2017)

I have lately started a tradition of copying/pasting reports of events I attend so that the community can read them. As always, they are organized as a mix of (personal) thoughts that, as such, are always questionable, as well as raw notes that I took during the keynotes and breakout sessions.

You can find/read previous reports at these links:

Note some of these reports have private comments meant to be internal considerations to be shared with my team. These comments are removed before posting the blog publicly and replaced with <Redacted comments>.

Have a good read. Hopefully, you will find this small “give back” to the community helpful.


Massimo Re Ferré, CNA BU – VMware

Hashidays London report.  

London – June 12th 2017

Executive summary and general comments

This was in general a good, full-day event. My gut feeling is that the audience was fairly technical, which mapped well to the spirit of the event (and HashiCorp in general).

There were nuggets of marketing messages spread primarily by Mitchell H (e.g. “provision, secure, connect and run any infrastructure for any application”), but these messages seemed a little bit artificial and bolted on. HashiCorp remains (to me) a very engineering-focused organization whose products market themselves in an (apparently) growing and loyal community of users.

There were very few mentions of Docker and Kubernetes compared to other similar conferences. While this may be due to my personal bias (I tend to attend more container-focused conferences as of late), I found it interesting that more time was spent talking about HashiCorp’s view on serverless than about containers and Docker.

The HashiCorp approach to intercepting the container trend seems interesting. Nomad seems to be the product they are pushing as a counter-answer to the likes of Docker Swarm / Docker EE and Kubernetes. Yet Nomad seems to be a general-purpose scheduler which (almost incidentally) supports Docker containers. However, a lot of the advanced networking and storage workflows available in Kubernetes and in the Docker Swarm/EE stack aren’t apparently available in Nomad.

One of the biggest tenets of HashiCorp’s strategy is, obviously, multi-cloud. They tend to compete with specific technologies available from specific cloud providers (that only work in said cloud), so the notion of having cloud-agnostic technologies that work seamlessly across different public clouds is something they leverage (a ton).

Terraform seemed to be the special product in terms of highlights and number of sessions. Packer and Vagrant were hardly mentioned outside of the keynote, with Vault, Nomad and Consul sharing the remainder of the available time almost equally.

In terms of the backend services and infrastructures they tend to work with (or their customers tend to end up on), I would say the event was 100% centered around public cloud. <Redacted comments>.

All examples, talks, demos, customers’ scenarios etc. etc. were focused on public cloud consumption. If I have to guess a share of “sentiment” I’d say AWS gets a good 70% with GCP another 20% and Azure 10%. These are not hard data, just gut feelings.

The monetization strategy for HashiCorp remains (IMO) an interesting challenge. A lot (all?) of the talks from customers were based on scenarios where they were using standard open source components. Some of them specifically prided themselves on having built everything using free open source software. There was a mention that at some point one specific customer would buy Enterprise licenses, but the way it was phrased led me to think this was to be done as a give-back to HashiCorp (to which they owe a lot) rather than out of specific technical needs for the Enterprise version of the software.

Having said that, there is no doubt HashiCorp is doing amazing things technologically, and their work is very well respected.

In the next few sections there are some raw notes I took during the various speeches throughout the day.

Opening Keynote (Mitchell Hashimoto)  

The HashiCorp User Group in London is the largest (1300 people) in the world.

HashiCorp strategy is to … Provision (Vagrant, Packer, Terraform), secure (Vault), connect (Consul) and run (Nomad) any infrastructure for any application.

In the last few years, lots of Enterprise features found their way into many of the above products.

The theme for Consul has been “easing the management of Consul at scale”. The family of Autopilot features is an example of that (a set of features that allows Consul to manage itself). Some of these features are only available in the Enterprise version of Consul.

The theme for Vault has been to broaden the feature set. Replication across data centers is one such feature (achieved via log shipping).

Nomad is being adopted by the largest companies first (a very different pattern compared to the other HashiCorp tools). The focus recently has been on solving some interesting problems that surface with these large organizations. One such advancement is Dispatch (HashiCorp’s interpretation of serverless). You can now also run Spark jobs on Nomad.

The theme for Terraform has been to improve platform support. To achieve this, HashiCorp is splitting the Terraform core product from the providers (managing the community of contributors is going to be easier with this model). Terraform will download providers dynamically, but they will be developed and distributed separately from the Terraform product code. In the next version, you can also version the providers and require a specific version in a specific Terraform plan. “terraform init” will download the providers.
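As a hypothetical illustration of what that could look like in a plan (the version number and region here are invented, and the exact syntax may differ in the shipping release):

```hcl
# Sketch: requiring a specific provider series now that providers
# are released separately from Terraform core.
provider "aws" {
  version = "~> 0.1"   # accept 0.1.x releases of the aws provider
  region  = "eu-west-1"
}

# "terraform init" downloads an aws provider plugin satisfying this
# constraint before "terraform plan" / "terraform apply" can run.
```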

Mitchell brings up the example of the DigitalOcean firewall feature. HashiCorp didn’t know it was coming, but 6 hours after the DO announcement they received a PR from DO that implemented all the firewall features in the DO provider (these situations are much easier to manage when community members contribute to provider modules that are not part of the core Terraform code base).

Modern Secret Management with Vault (Jeff Mitchell) 

Vault is not just an encrypted key/value store. For example, generating and managing certificates is something that Vault is proving to be very good at.

One of the key Vault features is that it provides multiple (security related) services fronted with a single API and consistent authn/authz/audit model.

Jeff talks about the concept of “Secure Introduction” (i.e. how you provision a client/consumer with a security key in the first place). There is no one-size-fits-all: it varies and depends on your situation, the infrastructure you use, what you trust and don’t trust, etc. It also varies if you are using bare metal, VMs, containers, public cloud, etc., as each of these models has its own facilities to enable “secure introduction”.
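One concrete pattern in this space (my own example, not necessarily one Jeff showed) is Vault’s AppRole backend, which splits the credential into a RoleID and a SecretID that can be delivered to the app through two different channels. A rough CLI sketch, assuming a reachable, unsealed Vault you are already authenticated against, with invented role and policy names:

```shell
# Sketch only; assumes a running, unsealed Vault and an admin token.
# Enable the AppRole auth backend (older CLIs used "vault auth-enable approle")
vault auth enable approle

# Create a role for the app, bound to a policy
vault write auth/approle/role/my-app policies="my-app-policy"

# A trusted orchestrator fetches the two halves of the credential...
vault read auth/approle/role/my-app/role-id
vault write -f auth/approle/role/my-app/secret-id

# ...delivers them via separate channels, and the app logs in with both:
vault write auth/approle/login role_id=<role-id> secret_id=<secret-id>
```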

Jeff then talks about a few scenarios where you could leverage Vault to secure client to app communication, app to app communication, app to DB communications and how to encrypt databases.

Going multi-cloud with Terraform and Nomad (Paddy Foran) 

The message of the session focuses on multi-cloud. Some of the reasons to choose multi-cloud are resiliency and consuming cloud-specific features (which I read as counter-intuitive to the idea of multi-cloud?).

Terraform provisions infrastructure. Terraform is declarative, graph-based (it will sort out dependencies), predictable and API agnostic.

Nomad schedules apps on infrastructure. Nomad is declarative, scalable, predictable and infrastructure agnostic.

Paddy shows a demo of Terraform / Nomad across AWS and GCP. He explains how you can take outputs of the AWS plan and use them as inputs for the GCP plan and vice versa. This is useful when you need to set up VPN connections between two different clouds and you want to avoid lots of manual configuration (which may be error-prone).
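I did not capture the actual code, but the general mechanism of feeding one plan’s outputs into another can be sketched as follows (all names here are invented):

```hcl
# In the AWS plan: export a value that other plans will need.
output "vpn_gateway_ip" {
  value = "${aws_instance.vpn.public_ip}"
}

# In the GCP plan: read the AWS plan's outputs via its remote state...
data "terraform_remote_state" "aws" {
  backend = "s3"
  config {
    bucket = "example-tfstate-bucket"
    key    = "aws/terraform.tfstate"
    region = "us-east-1"
  }
}

# ...and reference them in resources, e.g.:
#   peer_ip = "${data.terraform_remote_state.aws.vpn_gateway_ip}"
```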

Paddy then customizes the standard example.nomad job to deploy to the “datacenters” he created with Terraform (on AWS and GCP). This instantiates a Redis Docker image.
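A customized job file of the kind shown in the demo might look roughly like this (the datacenter names are invented for illustration):

```hcl
# Sketch of an example.nomad-style job targeting both clouds.
job "example" {
  # The "datacenters" created with Terraform on AWS and GCP
  datacenters = ["aws-us-east-1", "gcp-us-central1"]

  group "cache" {
    task "redis" {
      driver = "docker"        # Nomad runs this task as a Docker container

      config {
        image = "redis:3.2"
      }

      resources {
        cpu    = 500           # MHz
        memory = 256           # MB
      }
    }
  }
}
```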

The closing remark of the session is that agnostic tools should be the foundation for multi-cloud.

Running Consul at Massive Scale (James Phillips) 

James goes through some fundamental capabilities of Consul (DNS, monitoring, K/V store, etc.).

He then talks about how they have been able to solve scaling problems using a Gossip Protocol.

It was a very good and technical session arguably targeted to existing Consul users/customers that wanted to fine tune their Consul deployments at scale.

Nomad and Next-generation Application Architectures (Armon Dadgar)

Armon starts by defining the role of a scheduler (broadly).

There are a couple of personas that HashiCorp kept in mind when building Nomad: developers (or Nomad consumers) and infrastructure teams (or Nomad operators).

Similarly to Terraform, Nomad is declarative (not imperative): you declare the desired end state and Nomad figures out how to get there without you having to spell out the steps.

The goal for Nomad was never to build an end-to-end platform but rather to build a tool that would do the scheduling and bring in other HashiCorp (or third party) tools to compose a platform. This after all has always been the HashiCorp spirit of building a single tool that solves a particular problem.

Monolithic applications have intrinsic application complexity. Micro-services applications have intrinsic operational complexity. Frameworks have helped with monoliths much like schedulers are helping now with micro-services.

Schedulers introduce abstractions that help with service composition.

Armon talks about the “Dispatch” jobs in Nomad (HashiCorp’s FaaS).

Evolving Your Infrastructure with Terraform (Nicki Watt)

Nicki is the CTO @ OpenCredo.

There is no right or wrong way of doing things with Terraform. It really depends on your situation and scenario.

The first example Nicki talks about is a customer that has used Terraform to deploy infrastructure on AWS to setup Kubernetes.

She walks through the various stages of maturity that customers find themselves in. They usually start with hard-coded values inside a single configuration file; then they start using variables applied to parametrized configuration files.

Customers then move on to a pattern where you usually have a main Terraform configuration file composed of reusable and composable modules.

Each module should have very clearly identified inputs and outputs.
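A minimal sketch of that pattern (module paths, variables and outputs below are made up):

```hcl
# Root configuration composing reusable modules
module "vpc" {
  source     = "./modules/vpc"
  cidr_block = "${var.cidr_block}"
}

module "k8s_cluster" {
  source    = "./modules/k8s"
  # an output of one module becomes the input of another
  subnet_id = "${module.vpc.subnet_id}"
}
```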

The next phase is nested modules (base modules embedded into logical modules).

The last phase is to treat subcomponents of the setup (i.e. Core Infra, RDS, K8s cluster) as totally independent modules. This way you manage these components independently hence limiting the possibility of making a change (e.g. in a variable) that can affect the entire setup.

Once you have moved to this “distributed” stage of independent components and modules, you need to orchestrate what needs to be run first and so on. Different people solve this problem in different ways: from README files that guide you through the manual steps, to off-the-shelf tools such as Jenkins, all the way to DIY orchestration tools.

This was really an awesome session! Very practical and very down to earth!

Operational Maturity with HashiCorp (James Rasell and Iain Gray)

This is a customer talk.

Elsevier has an AWS-first strategy. They have roughly 40 development teams (each with 2-3 AWS accounts, and each account has 1-6 VPCs).

This is very hard to manage manually at such scale. Elsevier has established a practice inside the company to streamline and optimize these infrastructure deployments (they call this practice “operational maturity”). This is the charter of the Core Engineering team.

The “operational maturity” team has 5 pillars:

  • Infrastructure governance (base infrastructure consistency across all accounts). They have achieved this via a modular Terraform approach (essentially a catalog of company standard TF modules developers re-use).
  • Release deployment governance
  • Configuration management (everything is under source control)
  • Security governance (an “AMI bakery” that produces secured AMIs and makes them available to developers)
  • Health monitoring

They chose Terraform because:

– it had a low barrier to entry

– it was cloud agnostic

– codified with version control

Elsevier suggests that in the future they may want to use Terraform Enterprise, which underlines the difficulty of monetizing open source software: they are apparently extracting a great deal of value from Terraform while HashiCorp is making nothing out of it.

Code to Current Account: a Tour of the Monzo Infrastructure (Simon Vans-Colina and Matt Heath)

Enough said. They are running (almost) entirely on free software (with the exception of a small system that allows communications among banks). I assume this implies they are not using any HashiCorp Enterprise pay-for products.

Monzo went through some technology “trial and fail” such as:

  • from Mesos to Kubernetes
  • from RabbitMQ to Linkerd
  • from AWS Cloud Formation to Terraform

They currently have roughly 250 services, which all communicate with each other over HTTP.

They use Linkerd for inter-services communication. Matt suggests that Linkerd integrates with Consul (if you use Consul).

They found they had to integrate with some banking systems (e.g. faster payments) via on-prem infrastructure (Matt: “these services do not provide an API, they rather provide a physical fiber connection”). They appear to be using on-prem capacity mostly as a proxy into AWS.

Terraform Introduction training

The day after the event I attended the one day “Terraform Introduction” training. This was a mix of lecture and practical exercises. The mix was fair and overall the training wasn’t bad (albeit some of the lecture was very basic and redundant with what I already knew about Terraform).

The practical side of it guides you through deploying instances on AWS and using modules and variables, with Terraform Enterprise towards the end.

I would advise taking this specific training only if you are very new to Terraform, given that it assumes you know nothing. If you have already used Terraform in one way or another, it may be too basic for you.



Kubecon 2017 – Berlin: a personal report Mon, 03 Apr 2017 13:48:58 +0000

Following the established best practice of ‘open sourcing’ my trip reports of conferences I attend, I am copying and pasting hereafter my raw comments related to the recent Kubecon EMEA 2017 trip.

I have done something similar for Dockercon 2016 and Serverlessconf 2016 last year and, given the feedback I received, this is apparently something worthwhile.

As always:

  1. These reports contain some facts (hopefully I got those right) plus personal opinions and interpretations.
  2. If you are looking for a properly written piece of art, this is not it.
  3. These are, for the most part, raw notes (sometimes written lying on the floor during the oversubscribed sessions at the conference).

Take it or leave it.


Massimo Re Ferre’ – Technical Product Manager – CNA Business Unit @ VMware

Kubecon Europe 2017 – Report

Executive Summary

In general, the event was largely what I expected. We are in the early (but consistent) stages of Kubernetes adoption. The ball is still (mainly) in the hands of geeks (gut feeling: more devs than ops). While there are pockets of efforts on the Internet to help the uninitiated get started with Kubernetes, the truth is that there is still a steep learning curve you have to go through pretty much solo. Kubecon 2017 Europe was an example of this approach: you don’t go there to learn Kubernetes from scratch (e.g. there are no 101 introductory sessions). You go there because you know Kubernetes (already) and you want to move the needle by listening to someone else’s (advanced) experiences. In this sense Dockercon is (vastly) different compared to Kubecon. The former appears to be more of a “VMworld mini-me” at this point, while the latter is still more of a “meetup on steroids”.

All in all, the enthusiasm and the tail winds behind the project are very clear. While the jury is still out re who is going to win in this space, the odds are high that Kubernetes will be a key driving force and a technology that is going to stick around. Among all the projects at that level of the stack, Kubernetes is clearly the one with the most mind-share.

These are some additional (personal) core takeaways from the conference:

  • K8s appears to be extremely successful with startups and small organizations, as well as in pockets of Enterprises. The technology has not been industrialized to the point where it has become a strategic choice (not yet at least). Because of this, the prominent deployment model seems to be “you deploy it, you own it, you consume it”; hence RBAC, multi-tenancy and security haven’t been major concerns. We are at a stage, though, where in large Enterprises the teams that own the deployment are seeking IT help and support in running Kubernetes for them.
  • The cloud native landscape is becoming messier and messier. The CNCF Landscape slide is doing a disservice to cloud native beginners. It doesn’t serve any purpose other than officially underlining the complexity of this domain. While I am probably missing something about the strategy here, I am puzzled by how the CNCF is creating category A and category B projects by listing hundreds of projects in the landscape while only selecting a small subset to be part of the CNCF.
  • This is a total gut feeling (I have no data to back this up) but up until 18/24 months ago I would have said the container orchestration/management battle was among Kubernetes, Mesos and Docker. Fast forward to these days, my impression is that Mesos is fading out a bit. The industry seems to be consolidating around two major centers of gravity: one is Kubernetes and its ecosystem of distributions, the other being Docker (Inc.) and their half-proprietary stack (Swarm and UCP). More precisely, there seems to be a consensus that Docker is a better fit and getting traction for those projects that cannot handle the Kubernetes complexity (and consider K8s a bazooka to shoot a fly), while Kubernetes is a better fit and getting traction for those projects that can absorb the Kubernetes complexity (and probably require some of its advanced features). In this context Mesos seems to be in search of its own differentiating niche (possibly around big data?).
  • The open source and public cloud trends are so pervasive in this domain of the industry that they are also changing some of the competitive and positioning dynamics. While in the last 10/15 years the ‘lock-in’ argument was around ‘proprietary software’ Vs. ‘open source software’, right now the ‘lock-in’ argument seems to be around ‘proprietary public cloud services’ Vs. ‘open source software’. Proprietary software doesn’t even seem to be considered a contender in this domain; instead, its evil role has been assumed by ‘proprietary cloud services’. According to the CNCF, the only way you can fight this next level of lock-in is through (open source) software that you have the freedom to instantiate on-prem or off-prem at will (basically de-coupling the added-value services from the infrastructure). This concept was particularly clear in Alexis Richardson’s keynote.


The Expo was pretty standard and what you’d expect to see. The dominant areas of the ecosystem seem to be:

  • Kubernetes setup / lifecycle (to testify that this is a hot/challenging area)
  • Networking
  • Monitoring

My feeling is that storage seems to be “underrepresented” (especially considering the interest/buzz around stateful workloads). There were not a lot of startups representing this sub-domain.

Monitoring, on the other hand, seems to be ‘a thing’. Sematext and Sysdig (to name a few) have interesting offerings and solutions in this area. ‘We have a SaaS version and an on-prem version if you want it’ is the standard delivery model for these tools. Apparently.

One thing that stood out to me was Microsoft’s low profile at the conference (particularly compared to their presence at, say, Dockercon). There shouldn’t be a reason why they wouldn’t want to go big on Kubernetes (too).

Keynote (Wednesday)

There are circa 1,500 attendees at the conference. Judging by the polls during the various breakout sessions, the majority seem to be devs with a minority of ops (of course boundaries are a bit blurry in this new world).

The keynote opens up with the news (not news) that Containerd is joining the CNCF. RKT makes the CNCF too. Patrick C and Brandon P get on stage briefly to talk about, respectively, Containerd and RKT.

Aparna Sinha (PM at Google) gets on stage to talk about K8s 1.6 (just released). She talks about the areas of improvement (namely 5000-host support, RBAC, dynamic storage provisioning). One of the new (?) features in the scheduler allows for “taints” / “tolerations”, which may be useful to segment specific worker nodes for specific namespaces, e.g. dedicating nodes to tenants (this needs additional research).
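From what I gathered, the mechanism looks roughly like this (1.6-era syntax; node, tenant and pod names below are made up): you taint the dedicated nodes, and only pods that tolerate the taint can be scheduled there.

```yaml
# After tainting the nodes, e.g.:
#   kubectl taint nodes node1 dedicated=tenant-a:NoSchedule
# a pod opts in with a matching toleration:
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "tenant-a"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx
```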

Apparently RBAC has been contributed largely by Red Hat, something I have found interesting given the fact that this is an area where they try to differentiate with OpenShift.

Etcd version 3 gets a mention as having quite a big role in the K8s scalability enhancements (note: some customers I have been talking to are a bit concerned about how to [safely] migrate from etcd version 2 to etcd version 3).

Aparna then talks about disks. She suggests leveraging claims to decouple the K8s admin role (infrastructure aware) from the K8s user role (infrastructure agnostic). Dynamic storage provisioning is available out of the box and it supports a set of back-end infrastructures (GCE, AWS, Azure, vSphere, Cinder). She finally alludes to some network policy capabilities being cooked up for the next version.
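The decoupling she described looks roughly like this (names and parameters are illustrative): the admin defines a StorageClass that encodes infrastructure details, and the user only references it by name in a claim.

```yaml
# Admin side: infrastructure-aware class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
# User side: infrastructure-agnostic claim
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
```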

I will say that tracking where all (old and new) features sit on the spectrum of experimental, beta, supported (?) is not always very easy. Sometimes a “new” feature is being talked about just to find out that it has moved from one stage (e.g. experimental) to the next (i.e. beta).

Clayton Coleman from Red Hat talks about K8s security. Interestingly enough, when he polls the room about how many people stand up and consume their own Kubernetes cluster, a VAST majority of users raise their hands (assumption: very few are running one or a few centralized K8s instances that users access in multi-tenant mode). This is understandable given the fact that RBAC has only just made it into the platform. Clayton mentions that security in these “personal” environments isn’t as important, but as K8s starts to be deployed and managed by a central organization for users to consume, a clear definition of roles and proper access control will be of paramount importance. As a side note, with 1.6 cluster-up doesn’t enable RBAC by default but Kubeadm does.
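For reference, the kind of role definition RBAC enables looks like this (1.6-era `v1beta1` API; namespace, names and user are made up):

```yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: team-a
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# Bind the role to a user within the namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: team-a
  name: read-pods
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```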

Brandon Philips from CoreOS is back on stage to demonstrate how you can leverage a Docker registry to not only push/pull Docker images but also entire Helm apps (cool demo). Brandon suggests the standard and specs for doing this are currently being investigated and defined (hint: this is certainly an area that project Harbor should explore and possibly endorse).

Keynote (Thursday)

Alexis Richardson does a good job at defining what Cloud Native is and the associated patterns.

CNCF is “project first” (that is, they prefer to put forward actual projects rather than focus on abstract standards -> they want to aggregate people around code, not standards).

Bold claim that all projects in the CNCF are interoperable.

Alexis stresses the concept of “cloud lock-in” (Vs generic vendor lock-in). He is implying that there are more people going to AWS/Azure/GCP to consume higher level services (OSS operationalized by the CSP) than there are people using and being locked in by proprietary software.

Huawei talks about their internal use case. They are running 1M (one million!) VMs. They are on a path to reduce those VMs by introducing containers.

Joe Beda (CTO at Heptio) gets on stage. He talks about how to grow the user base 10x. Joe claims that K8s contributors are more concerned about theoretical distributed model problems than they are with solving simpler practical problems (quote: “for most users out there the structures/objects we had 2 or 3 versions ago are enough to solve the majority of the problems people have. We kept adding additional constructs/objects that are innovative but didn’t move the needle in user adoption”).

Joe makes an interesting comment about finding a good balance between solving product problems in the upstream project Vs solving them by wrapping specific features into K8s distributions (a practice he described as “building a business around the fact that K8s sucks”).

Kelsey Hightower talks about Cluster Federation. Cluster Federation is about federating different K8s clusters. The Federation API control plane is a special K8s client that coordinates dealing with multiple clusters.


These are some notes I took while attending breakout sessions. Into some sessions I physically could not step (sometimes rooms were completely full). I skipped some of the breakouts as I opted to spend more time at the expo.


This session was presented by Docker (of course).

Containerd was born in 2015 to control/manage runC.

The project governance is new (but the code is “old”). It’s a core container runtime on top of which you could build a platform (Docker, K8s, Mesos, etc.).

The K8s integration will look like:

Kubelet –> CRI shim –> containerd –> containers

No (opinionated) networking support, no volumes support, no build support, no logging management support etc. etc.

Containerd uses gRPC and exposes gRPC APIs.

There is the expectation that you interact with containerd through the gRPC APIs (hence via a platform above). The containerd API is NOT expected to be a viable way for a standard user to deal with containerd directly. That is to say… containerd will not have a fully featured/supported CLI. It’s code to be used/integrated into higher level code (e.g. Kubernetes, Docker etc.).

gRPC and container metrics are exposed via a Prometheus endpoint.

Full Windows support is planned (not yet in the repo as of today).

Speaker (Justin Cormack) mentions VMware a couple of times as an example of an implementation that can replace containerd with a different runtime (i.e. VIC Engine).

Happy to report that my Containerd blog post was fairly accurate (albeit it did not go into much detail).

Kubernetes Day 2 (Cluster Operations)

Presented by Brandon Philips (CoreOS CTO). Brandon’s sessions are always very dense and useful. Never a dull moment with him on stage.

This session covered some best practices to manage Kubernetes clusters. What stood out for me in this preso was the mechanism Tectonic uses to manage the deployment: fundamentally CoreOS deploys Kubernetes components as containers and let Kubernetes manage those containers (basically letting Kubernetes manage itself). This way Tectonic can take advantage of K8s own features from keeping the control plane up and running all the way to rolling upgrades of the API/scheduler.


This session was presented by two of the engineers responsible for the project. The session was pretty full and roughly 80% of attendees claimed to be using K8s in production (wow). Helm is a package manager for Kubernetes. Helm Charts are logical units of K8s resources + variables (note to self: research the differences between “OpenShift applications” and “Helm charts” <- they appear to be the same thing [or at least are being defined similarly]).

There is a mention of a front-end UI to monocular.

The aim of the session was to seed the audience with new use cases for Helm that aspire to go beyond the mere packaging and interactive setup of a multi-container app.


The attendance was low. The event population being skewed towards developers tends to greatly penalize sessions aimed primarily at solving Ops problems.

Their value prop (at the very high level) is similar to vSphere Integrated Containers, or Intel Clear Containers for that matter: run Docker images as virtual machines (as opposed to containers). Hyper pride themselves on being hypervisor agnostic.

They claim a sub-second start-time (similar to Clear Path). Note: while the VIC value prop isn’t purely about being able to start containerVMs fast, tuning for that characteristic will help (half-joke: I would be more comfortable running VIC in production than showing a live demo of it at a meetup).

The most notable thing about Hyper is that it’s CRI compliant and it naturally fits/integrates into the K8s model as a pluggable container engine.

Docker Containerd Explained in Plain Words Thu, 16 Mar 2017 15:15:24 +0000

This article was originally posted on the VMware Cloud Native corporate blog. I am re-posting here for the convenience of the readers of my personal blog.

I have been frequently asked “what’s [Docker] Containerd?” The short answer I gave may be of benefit for the larger community so I am turning this into a short blog post. I hope the condensed format is useful.

Background: What’s the Problem

Docker started a technology (Docker Engine) that allows you to package and run your application in a Linux container on a single host. Linux containers have been around for decades.

Docker made them consumable for the masses. At this point a couple of things happened in parallel:

  • The ecosystem started to flourish and more open source projects started to leverage Docker (Engine) as a core building block to run containerized applications at scale on a distributed infrastructure (think Kubernetes).
  • Docker Inc. (the company) started working on solving similar problems and decided to embed into the core Docker (Engine) technologies that would help solve the problems of running containerized applications at scale on a distributed infrastructure (think Docker Swarm Mode)

This created a dynamic where Docker Inc. and the ecosystem started building similar solutions: the ecosystem is building on top of a core building block that happens to embed another solution (created by Docker Inc.) to the same problems, and this is creating friction in the industry.

What Happened Next?

Purists and open source advocates argue that, by doing so, Docker Inc. is bloating the core building block with additional bugs and instability to make space for code that isn’t needed (when third party solutions are being used). Third party vendors claim that Docker Inc. is creating an artificial funnel and path with commercial interest. The general fear is rooted in the idea that (1) using Docker (Engine) for free results in (2) enabling Swarm Mode for free, which leads to (3) buying Docker Data Center. The industry has started to bifurcate: either fork Docker (Engine) or build a completely separate container runtime.

Enter Containerd

Docker has announced Containerd, an open-source project that the industry can use as a common container run-time to build added value on top of (e.g. container orchestration, etc.).

Containerd is a daemon that runs on Linux and Windows, and it can be used to manage the container lifecycle, including tasks such as image transfer, container execution, and some storage and networking functions. With Containerd, Docker has evolved once again.

There are many questions such as how Containerd will be packaged, how current Docker Engine will be re-packaged, etc. that have yet to be answered.  I will write another blog to follow up, and look forward to hearing your thoughts.


Automating vSphere Integrated Containers Deployments Tue, 10 Jan 2017 15:04:30 +0000

In the second half of 2016 I had an opportunity to focus (almost) exclusively on vSphere Integrated Containers all the way through its GA in early December.

I had been working on VIC Engine and project Bonneville since 2015, but in H2 2016 I focused pretty much full-time on that and I had the pleasure of being involved in the discussions that led vSphere Integrated Containers to be what it is today.

As part of my job I found myself in situations where I had to stand up and tear down environments for tests and various other purposes. Because of this I had been using a script (with many rough edges) to properly set up, against a given vSphere environment, the three VIC components (i.e. VIC Engine, Harbor and Admiral).

I figured I’d share that script publicly at this point as it may (somehow) benefit the community. This isn’t only so you can use it as-is (it could be adapted and enhanced to fit your specific needs) but also because it may show you how each component is set up and installed (you can read through the script and see how the mechanics work).

The script is called and it is available at this GitHub repo.

This solution is nothing more than the codified version of the user workflows we created for the alpha and beta phases of the VIC product (where users were given a step by step flow to setup an end-to-end VIC stack).

The purpose of this script was (also) to validate whether the documented steps were technically valid.

Using code to perform documentation quality checks, if you will. Mh…

Since this script may evolve over time, I am going to document what it does and what its major limitations are on GitHub, so you can always read the latest version (Vs printing a description in stone on a static blog post while the code, potentially, evolves).

Please move to the GitHub repo of the project if you want to read more about it (and if you want to give it a try).



VMware Harbor as a Rancher catalog entry Mon, 09 Jan 2017 10:23:51 +0000

In the last few months I have been looking at Rancher (an open source container management product) as a way to learn more about Docker containers and understand better the ecosystem around them.

One of the things that appealed to me about Rancher is the notion of an extensible catalog of application and infrastructure services. There is an “official” catalog tier called Library (maintained and built by Rancher the company), there is a “community” catalog tier called Community (maintained by Rancher the company but built and supported by the Rancher community) and then there is a “private” tier (where you can add your own private catalog entries that you own and maintain).

While Rancher lets users connect to cloud-based registries, I noticed there is only one container registry package in the Community catalog that one can deploy and manage (that is, Convoy Registry). I thought this was a great opportunity (and a great learning exercise) to add VMware Harbor (an Enterprise-class container registry) as a private catalog item option.


What you will see and read below should be considered, literally, a proof of concept that, in its current shape and form, can’t be used as-is for production purposes. That said, if there is enough appetite for this, I am willing to work more on it and refine some of the rough edges that exist today.


Before we dive into the topic, I want to thank Raul Sanchez from Rancher that has (patiently) answered all of my questions (and fixed some of my broken yaml). Without his help this blog post would have been much shorter. Oh I also want to thank my wife and my dog (although I don’t have a dog), because everyone does that.

An escalation of difficulties (read: learning opportunities)

It became immediately clear that there were different sets of tasks I needed to accomplish to achieve the initial goal. One task was, usually, dependent on another.

In order to be able to create a Rancher catalog entry, you need to be able to instantiate your application from an application definition file (when using the default Cattle scheduler that would be a standard Docker Compose file) as well as a Rancher Compose file. You can’t really run any shell script or stuff like that as part of a Rancher catalog entry.

If you explore how Harbor gets installed on a Docker host (via the documented “on-line installer”), it’s not really compatible with the Rancher catalog model.

With the standard on-line installer, you have to download the Harbor on-line tar.gz installer file, explode it, set your configuration settings in the harbor.cfg file, and run a “prepare” script that takes the harbor.cfg file as an input and creates configuration files and ENVIRONMENT variables files for you, to THEN, eventually, run the Docker Compose file passing the config files and ENVIRONMENT variable files as volumes and directives of Docker Compose (note some of these steps are buried under the main install script but that’s what happens behind the scenes). At this point the Docker Compose file actually grabs the Docker images off of Docker Hub and instantiates Harbor based on the configuration inputs.

At the end, what started as a simple “PoC” project, turned into three “sub-projects”:

  1. Dockerizing (is this a word?) the Harbor on-line installer so that you can include the “preparation” procedure as part of the Docker Compose file and pass input parameters as variables to Docker Compose (instead of editing the harbor.cfg file manually and performing the whole preparation circus)
  2. Rancherizing (this is definitely not a word!) the dockerized Harbor on-line installer and creating a Rancher private catalog entry that mimics the typical single-host Harbor setup
  3. As a bonus: rancherizing the dockerized Harbor on-line installer and creating a Rancher private catalog entry that allows Harbor to be installed on a distributed cluster of Docker hosts

Note that, while I had to create a dockerized Harbor on-line installer to fit the Rancher catalog model, you can also use it for other use cases where you need to automatically stand up Harbor and don’t have a way to do that manually and interactively (Rancher being one of those use cases).

In the next few sections, I am going to cover in more specific details what I have done to implement these sub-projects.

Sub-project #1: Dockerizing the Harbor on-line installer

At the time of this writing, Harbor 0.5.0 can be installed either using an OVA or through an installer. The installer can be on-line (images get dynamically pulled from Docker Hub) or off-line (images are shipped as part of the installer and loaded locally).

We are going to focus on the on-line installer.

As we have alluded already, once you download the on-line installer, you have to “prepare” your Harbor installation by tweaking the parameters of the harbor.cfg file that ships with a template inside the install package.

The resulting set of configurations is then passed as input to the Docker Compose file (via local directories mapped as “volumes” and via the “env_file” directive).

Wouldn’t it be easier/better if you could pass the Harbor setup parameters directly to the Docker Compose file without having to go through the “preparation” process?

Enter harbor-setupwrapper.

Harbor-setupwrapper is a Harbor installer package that includes a new docker image which (more or less) implements the “preparation” process in a docker container. This container accepts Harbor configuration parameters as environment variables. Last but not least, the container runs a script that launches the preparation routines (this is all self-contained inside the container).

The Dockerfile for this image and the script that kicks off the preparation routines that ships with it are worth a thousand words.

If you will, what the script inside the container does is very similar to what the “prepare” script does for the standard Harbor on-line installer.

We now have a new Docker Compose file which is largely based on the original Docker Compose file that ships with the official on-line installer. You can now “compose up” this new Docker Compose file passing the parameters that you would have otherwise tweaked in the harbor.cfg file.
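To give an idea of the shape of such a compose file, here is a sketch loosely modeled on the harbor-setupwrapper approach (service names, image names and tags are illustrative, not the exact contents of the repo):

```yaml
version: '2'
services:
  harbor-setupwrapper:
    image: harbor-setupwrapper          # illustrative image name
    environment:
      # the same parameters you would have set in harbor.cfg
      - HARBORHOSTNAME=${HARBORHOSTNAME}
      - HARBOR_ADMIN_PASSWORD=${HARBOR_ADMIN_PASSWORD}
  ui:
    image: vmware/harbor-ui:0.5.0       # illustrative tag
    volumes_from:
      - harbor-setupwrapper             # pick up the generated config files
```

The key point is that the user-facing knobs become Docker Compose variables, while the generated configuration files travel to the application containers via “volumes_from”.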

This is a graphical representation of what I have done:

Remember this is just a PoC. Mind there are caveats!

  • I have only really tested this with the HARBORHOSTNAME and HARBOR_ADMIN_PASSWORD variables. Other variables should work but I haven’t tested them
  • There will definitely be scenarios where this will break. For example, I haven’t implemented a way to create certificates if you choose to use a secure connection (https). This would need to be implemented as additional logic inside the wrapper (hint: do not try to enable https, because weird things may happen)
  • The original on-line installer is (really) meant to be run on a single Docker host. The approach I have implemented in this new install mechanism honors that model and assumes the same constraints
  • Because of the above, I didn’t even try to deploy this compose file on a distributed Swarm cluster. BTW, in the transition from “legacy Swarm” to “Swarm mode”, Docker Compose doesn’t seem to have gained compatibility with the latter and, given I didn’t want to waste too much time with the former, I opted not to test it in a Swarm environment at all
  • More caveats that I haven’t thought about (but that certainly may exist!)

Making the configuration files (generated by the preparation script) available from the wrapper to the application containers was the easy piece. I achieved that with the “volumes_from” directive, through which the application containers grab their relevant configuration files directly from the wrapper container.

What proved to be more challenging was figuring out a way to pass the ENVIRONMENT variables (that are, again, in various files on the wrapper container) to the application containers. I could not use the “env_file” directive in compose because the value of the directive refers to files that are visible to the system where the compose is run from (whereas in my case those files were inside the wrapper container). Long story short, I ended up tweaking the ENTRYPOINT of the application containers to point to a script that would, first, load those environment variables and, then, it would start the original script or command that was in the original ENTRYPOINT. If you are curious, you can check all the entrypoint*.sh files on the harbor-setupwrapper GitHub repo.
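The trick can be demonstrated in isolation. The following is a sketch of the idea, not the actual entrypoint*.sh files from the repo (paths and file names are made up for the demo):

```shell
# Sketch of the ENTRYPOINT trick: source the env file generated by the
# wrapper, then exec the container's original command.
mkdir -p /tmp/harbor-demo/config

# pretend this env file was generated by the setup wrapper container
cat > /tmp/harbor-demo/config/ui.env <<'EOF'
HARBOR_ADMIN_PASSWORD=MySecretPassword
EOF

# the replacement ENTRYPOINT script
cat > /tmp/harbor-demo/entrypoint-ui.sh <<'EOF'
#!/bin/sh
set -a                                  # export everything we source
. /tmp/harbor-demo/config/ui.env
set +a
exec "$@"                               # hand off to the original command
EOF
chmod +x /tmp/harbor-demo/entrypoint-ui.sh

# the "original command" now sees the variables
/tmp/harbor-demo/entrypoint-ui.sh sh -c 'echo $HARBOR_ADMIN_PASSWORD'
# prints MySecretPassword
```

Because the wrapper script ends with `exec "$@"`, the original process replaces the shell and keeps PID 1 semantics intact, which is why this pattern is commonly used for entrypoint shims.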

If you want to play with this and set up Harbor using this new mechanism, all you need to do is clone the harbor-setupwrapper repo and “up” the Docker Compose file you find in the harbor-setupwrapper directory. However, before you launch it, you will have to export the HARBORHOSTNAME and the HARBOR_ADMIN_PASSWORD variables. This is the equivalent of tweaking the harbor.cfg file in the original installer. If you forget to export these variables, Docker Compose will show this:

root@harbor:~/harbor-setupwrapper# docker-compose up -d
WARNING: The HARBORHOSTNAME variable is not set. Defaulting to a blank string.
WARNING: The HARBOR_ADMIN_PASSWORD variable is not set. Defaulting to a blank string.
Creating network "harborsetupwrapper_default" with the default driver

At a minimum, the HARBORHOSTNAME variable needs to be set, and it needs to be set to the IP address or FQDN of the host you are installing on (otherwise the setup will not be functional, for reasons I am going to explain later in the post). If you do not set the HARBOR_ADMIN_PASSWORD variable, you will have to use the default Harbor password (Harbor12345).

What you want to do is this:

root@harbor:~/harbor-setupwrapper# export HARBORHOSTNAME=<IP-or-FQDN-of-this-host>
root@harbor:~/harbor-setupwrapper# export HARBOR_ADMIN_PASSWORD=MySecretPassword
root@harbor:~/harbor-setupwrapper# docker-compose up -d
Creating network "harborsetupwrapper_default" with the default driver
Creating harbor-log

Hint: if you keep bringing Harbor instances up and down on the same host and you intend to start them from scratch, please remember to get rid of the /data directory on the host (that is where the instance state is saved, and new instances will inherit that state if they find the directory).

Sub-project #2: Creating the Rancher catalog entry for the single-host deployment

Now that we have a general purpose dockerized Harbor installer that we can bring up by just doing a “compose up”, we can turn our attention to the second sub-project. That is, creating the structure of the Rancher catalog entry.

I thought this was the easy part. After all, this was pretty much about re-using the new docker-compose.yml file we discussed above, in the context of Rancher. Right? Well…

I have learned (the hard way) that the devil is always in the details and, particularly in the context of containers, a tweak here to “fix” a particular problem often means opening up a can of worms somewhere else.

I will probably report hereafter more than what you need/want to read to consume this package, but I am doing so to share my pains, in the hope they may help you in other circumstances (I am also documenting my findings because otherwise I will forget them in two weeks’ time).

First and foremost, in Rancher you can only do “volumes_from” within the context of a sidekick. I originally played with adding “io.rancher.sidekicks: harbor-setupwrapper” to every container in the compose file. However, I soon found out that this creates a harbor-setupwrapper helper container for every container it is declared a sidekick of. Although this sounded ok to start with, I eventually figured that running multiple instances of the preparation scripts for a single Harbor deployment may lead to various configuration inconsistencies (e.g. tokens signed with untrusted keys, etc.).

I had to revert to a strategy where I only had one instance of the harbor-setupwrapper container (which generates all the configuration files consistently in one go), and I accomplished that by making it the main container, with all the other application containers being sidekicks of it. Practically, I just added “io.rancher.sidekicks: registry, ui, jobservice, mysql, proxy” as a label of the harbor-setupwrapper container. Warning: I did not tell Raul this and it may horrify him (or any other Rancher expert). But it works, so bear with me.
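In rancher-compose terms, the resulting shape is roughly this (a sketch; the image name is illustrative, while the label value is the one quoted above):

```yaml
harbor-setupwrapper:
  image: harbor-setupwrapper            # illustrative image name
  labels:
    # make every application container a sidekick of the wrapper,
    # so each of them can "volumes_from" it
    io.rancher.sidekicks: registry,ui,jobservice,mysql,proxy
```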

As usual, we fixed a problem by opening up another. Name resolution with sidekick containers doesn’t really work the way you would expect, so I had to put in place another workaround (if you are interested, you can read about the problem and the fix here).

There were another couple of problems I had to resolve in the process of creating a Rancher catalog entry:

  • Harbor requires the variable “harborhostname” to be set to the exact value users will use to connect to that Harbor instance.
  • All Harbor containers need to be deployed on a single host, which is most likely going to be one of the hosts of a (Cattle) cluster of many hosts.

Again, in a move that may horrify Rancher experts, I have configured the Docker Compose file to schedule all containers on the host that has a “harbor-host=true” label.

This allowed me to make sure all containers get deployed on the same host (and, more importantly, have some degree of control over which one that is). In fact, given that I know which host my containers are going to land on, I can choose wisely the variable “harborhostname”. That could be the host IP address or the host FQDN.
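The scheduling hint boils down to a host-affinity label on each service. A sketch of what this looks like for the proxy service (the image name merely stands in for the real Harbor proxy image; the label syntax is Rancher’s Cattle host-affinity scheduling):

```yaml
proxy:
  image: nginx                          # stands in for the Harbor proxy image
  labels:
    # only schedule on a host carrying the harbor-host=true label
    io.rancher.scheduler.affinity:host_label: harbor-host=true
```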

Last but not least, the Docker Compose file publishes ports 80 and 443 of the proxy container on the host (it goes without saying that those ports need to be free on that host, otherwise the deployment will fail). Again, perhaps not a great best practice, but something basic and easy that works.

Note: remember that state is saved in the /data directory of the host so if you intend to bring up and down instances of Harbor for test purposes, you need to be aware that state is kept there across multiple deployments. This is far from how you’d run a true cloud native application but it is how Harbor (0.5.0) is architected and I am just honoring the original operational model in the Rancherization scenario for single host.

The following picture shows the details and relationships of the various components in a single-host deployment:

This picture shows the actual deployment in action in my lab:

As a recap, at the high level, the current status of the Harbor private catalog entry for Rancher for single-host deployment is as follows:

  • It only works with the Cattle scheduler
    • Building Harbor catalog versions for Swarm and K8s based off the Cattle version should be relatively trivial (famous last words)
  • This catalog entry inherits all of the limitations of the dockerized on-line installer described above (e.g. it doesn’t support https, etc.)
  • The Docker hosts where you pull/push images need to have the “--insecure-registry” flag set on the Docker daemon (because we can only fire up Harbor with http access)
  • One of the hosts will have to have the “harbor-host=true” label for the docker-compose to work and schedule containers properly
  • The host with the “harbor-host=true” label will have to have ports 80 and 443 available
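As a reminder of what that “--insecure-registry” requirement looks like on the client side: on Docker engines that support /etc/docker/daemon.json, an equivalent way to declare the registry as insecure is the following (the hostname is a placeholder for whatever you use as the harbor hostname); restart the Docker daemon afterwards:

```json
{
  "insecure-registries": ["harbor.example.com"]
}
```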

You can locate the deliverable for this sub-project inside my overall Rancher catalog extension repo.

You can watch the single-host deployment in action in this short video:

Sub-project #3: Creating the Rancher catalog entry for the distributed deployment

This is the part where I have learned the most about the challenges of operationalizing distributed applications. It was definitely a fun and useful ride.

While Harbor is shipped as a containerized application, there are some aspects of it that do not make it an ideal candidate for applying cloud native application operational best practices. It doesn’t really adhere to The Twelve-Factor App methodology.

First and foremost, the Harbor installer has been built with the assumption that all 6 containers run on a “well-known” single host. Let me give you a couple of examples that underline some of these challenges (we may have already mentioned some of them):

  • The Harbor package comes with an embedded syslog server that the Docker daemon talks/logs to. If you look at the original Docker Compose file, you will notice that all application containers log to a syslog endpoint on the local host, implying and assuming that the syslog container runs on the same host as all the other containers
  • You have to enter (as a setup parameter) the exact harbor hostname that users will use to connect to the registry server. Ideally, in a cloud native context, an app should be able to work with any given IP / FQDN that has been associated with it. As a last resort, you should have an option to set (post-setup) the proper IP/FQDN endpoint the app should be using. With Harbor 0.5.0 you have to know (upfront) what that IP/FQDN is before launching the setup, which makes things a bit more difficult to operationalize in a dynamic, self-service and distributed environment. That is to say: if your Harbor service ends up being exposed to users at a given FQDN, you have (had) to enter that exact string as the FQDN at deployment time (without even possibly knowing which hosts the containers are going to be deployed on)
  • As part of the assumption that Harbor runs on a single well-known host, the product saves its own state on local directories on the host it is deployed onto. This is accomplished by means of various directory mappings in the containers configuration

The goal for this sub-project is to make Harbor run distributed on a Cattle cluster and no longer on a well-known host.

In order to do that the log image gets instantiated on every node of the cluster (requirement: each node has to have the label “harbor-log=true”). A more elegant solution would be to have a separate syslog server to point to (thus getting rid completely of the log service in Docker Compose).

Also, given we don’t know which host the proxy server is going to end up on (and given that in this scenario we wanted to implement a low touch experience in terms of service discovery) we have implemented the Harbor distributed model by leveraging Traefik (as explained in this blog post by Raul). If you are familiar with Docker, what Traefik does is (somewhat) similar to the “HTTP Routing Mesh” out-of-the-box experience that Docker provides with Swarm Mode. Please note that the proxy container ports (80 and 443) do not get exposed on the host and Traefik is the only way to expose the service to the outside world (in this particular distributed implementation).

The overall idea is that your DNS can resolve the IP of where Traefik runs and then Traefik “automagically” adds the hostname you have entered at Harbor setup time to its configuration. Check Raul’s blog post for more information on the setup concepts.

Storage management has been another interesting exercise. In a distributed environment you can’t let containers store data on the server they happen to run on at any given point in time.

If the container is restarted on another host (due to a failure or due to an upgrade) it needs to get access to the same set of data. Not to mention if other containers (that may happen to run on different hosts) need to access the same set of data.

To overcome this problem, I opted to use the generic NFS service Rancher provides. This turned out to be useful, flexible and handy because it allows you to pre-provision all the required volumes (in which case they persist across re-instantiations of the Harbor catalog entry), or you can let Docker Compose create them automatically at instantiation time (in which case they will be removed when the Harbor instance is brought down). Please note that, to horrify the purists, all volumes are mapped to all the application containers (except the log and proxy containers, which don’t require volumes). There is vast room for optimization here (not all volumes need to be mapped to all containers) but I figured I’d leave it like that for now.

This approach leaves all the hosts stateless as there is no volume directories mapping in Docker Compose (all volumes are named volumes that live on the NFS share).
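In compose terms, this means the top-level volumes section declares named volumes backed by a driver instead of host-path mappings. A sketch (the volume names are illustrative; the driver name comes from Rancher’s NFS service):

```yaml
volumes:
  harbor-registry-data:                 # illustrative volume names
    driver: rancher-nfs
  harbor-database:
    driver: rancher-nfs
```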

The following picture shows the details and relationships of the various components in a distributed deployment:

This picture shows the actual deployment in action in my lab:

As a recap, at the high level, the current status of the Harbor private catalog entry for Rancher for distributed deployment is as follows:

  • It only works with the Cattle scheduler
    • Building Harbor catalog versions for Swarm and K8s based off the Cattle version should be relatively trivial (famous last words)
  • This catalog entry inherits all of the limitations of the dockerized on-line installer described above (e.g. it doesn’t support https, etc.)
  • The Docker hosts where you pull/push images need to have the “--insecure-registry” flag set on the Docker daemon (because we can only fire up Harbor with http access)
  • All of the hosts in the cluster will have to have the “harbor-log=true” label for the docker-compose to work and schedule the log image properly
  • The Traefik service (found in the Community catalog) needs to be up and running for being able to access Harbor from the outside. This has been tested exposing port 80 (note the Traefik default is 8080)
  • The NFS service (found in the Library catalog) needs to be up and running and properly configured to connect to an NFS share. The Docker Compose file has been parametrized to potentially use other drivers, but the only one I have tested is “rancher-nfs”

You can locate the deliverable for this sub-project inside my overall Rancher catalog extension repo.

You can watch a distributed deployment in action in this short video:

Overall challenges and issues

Throughout the project I came across a few challenges. I am going to mention some of them here, mostly to keep a personal record for future reference.

  • Even small things can turn into large holes. Sometimes it’s a cascading problem (i.e. to do A you need to do B, but doing B requires you to do C). A good example was the Rancher sidekick requirement for being able to perform a “volumes_from”. That basically broke name resolution entirely (see the single-host section for more info on what the problem was)
  • Containers coming up “all green” doesn’t mean your app is up and running (properly). There were situations where containers started ok with no errors, but I couldn’t log into Harbor (due to certificate mismatches generated by running multiple instances of the install wrapper). There were situations where I could log in but couldn’t push images. And there were situations where I could push images but the UI couldn’t show them (due to the registry container not being able to resolve the ui container name, because of the name resolution issues with sidekicks)
  • Debugging containers in a distributed environment is hard. At some point I came across what seemed to be a random issue, only to find out that the problem was due to a particular container getting scheduled (randomly) on a specific host that was misconfigured. Fixing the issue was easy once it was root-caused. Root-causing it was hard
  • Knowledge of the application internals is of paramount importance when packaging it to run in containers (and, most importantly, when orchestrating its deployment). One of the reasons I left all named volumes connected to all containers in the distributed scenario is that I am not 100% sure which container reads/writes from/to which volume. Plus a thousand other things for which not knowing the application makes packaging it difficult (particularly for debugging purposes when something doesn’t work properly). All in all, this reinforces my idea that containers (and their orchestration) are more akin to how you package and run an application vs. how you manage an infrastructure
  • While container orchestration is all about automation and repeatable outcomes, it’s also a little bit like “hand-made elf art”. There was an issue at some point where (randomly) the proxy container would only show the nginx welcome page (and not the Harbor user interface). Eventually I figured out that restarting that container (after the deployment) fixed the problem. I thought this was due to some sort of start-up sequencing issue. I tried leveraging the “depends_on” directive to make the proxy container start “towards the end” of the compose up, but that didn’t work. It seems to work consistently now by leveraging the “external_links” directive (which, in theory, shouldn’t be required, AFAIK). All in all, being able to properly orchestrate the start-up of containers is still very much a work in progress and a fine art (apparently since 2014)
  • Managing infrastructure (and services) to run containerized applications is hard. During my brief stint leveraging something as simple as the basic Rancher NFS service, I came across a few issues that I had to work around using different levels of software, different deployment mechanisms, etc. Upgrades from one version of the infrastructure to another are also very critical
  • Another NFS-related issue I came across was that volumes don’t get purged properly on the NFS share when the stack is brought down. In the Rancher UI the volumes seem to be gone but, looking directly at the NFS share, some of them (a random number) remain as leftover directories. I didn’t dig too deep into why that was the case
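Several of the start-up ordering symptoms above (the proxy needing a manual restart, containers “all green” while the app is broken) are classic retry-until-healthy problems. A generic sketch of the pattern follows; it is not part of any Harbor or Rancher tooling:

```shell
# Generic retry helper: run a command until it succeeds or we give up.
retry() {
  max=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
}

# demo: a "service" probe that only succeeds on the third attempt,
# simulated with a counter file
probe() {
  count=$(cat /tmp/probe-count 2>/dev/null || echo 0)
  count=$((count + 1))
  echo "$count" > /tmp/probe-count
  [ "$count" -ge 3 ]
}

rm -f /tmp/probe-count
retry 5 probe && echo "service is up"
```

In a real deployment the probe would be an HTTP check against the application endpoint rather than a counter file, but the control flow is the same.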


As I alluded to, this is a rough integration that, needless to say, can be perfected (euphemism). This was primarily an awesome learning exercise, and future enhancements (e.g. integration with the Kubernetes scheduler in Rancher, enabling the https protocol, etc.) will allow me to stretch it even further (and possibly make it more useful).

An exercise like this is also very useful for practicing some of the challenges that you could come across in the context of distributed systems with dynamic provisioning consumed via self-service. Sure, this was not super complex, but getting your hands dirty with these challenges helps you better understand the opportunities that exist to solve them.

As a side note, going deep into these experiments allows you to appreciate the difference between PaaS and CaaS (or, more generally, between a structured and an unstructured approach, if you will).

With PaaS, a lot of the design decisions (particularly around scheduling, load balancing, name resolution, etc.) are solved for you out of the box. However, hammering your application to make it fit a distributed operational model may not work at all, or may only work in a too opinionated and limited way. With an unstructured approach (like the one discussed in this blog post) there is a lot more work to do to deploy an application (and a lot more “sausage making” you get exposed to), but it can definitely be tailored to your specific needs.

