IT 2.0 Next Generation IT Infrastructures

Compute abstractions on AWS
Tue, 12 Jun 2018

When I joined AWS last year, I was trying to find a way to explain, in the easiest way possible, all the options the platform offers to our users from a compute perspective. There are of course many ways to peel this onion, and I wanted to create a “visual story” that was easy for me to tell. I ended up drafting an animated slide that I have presented at many customer meetings and public events. I have always received positive feedback, so I thought I would tell the same story on my blog.

I spent a large chunk (if not all) of my career working in the compute domain. I personally define the compute domain as “anything that has CPU and memory capacity that allows you to run an arbitrary piece of code written in a specific programming language”. It goes without saying that your mileage may vary in how you define it, but this is a broad enough definition that it should cover a lot of different interpretations.

A key part of my story is around the introduction of different levels of compute abstractions this industry has witnessed in the last 20 years or so.

In the remainder of this blog post I will unfold the story as I usually present it to AWS customers.

Separation of duties

The start of my story is a line. In a cloud environment, this line defines the perimeter between the consumer role and the provider role. In the cloud, there are things that AWS will do and things that the consumer of AWS services will do. The perimeter of these responsibilities varies depending on the services you opt to use. If you want to understand more about this concept I suggest you read the AWS shared responsibility model documentation.

This is the first build-up of my visual story:

The different abstraction levels

The line above is oblique because it needs to intercept different compute abstraction levels. If you think about what happened in the last 20 years of IT, we have seen a surge of different compute abstractions that have changed the way people consume CPU and memory resources. It all started with physical (x86) servers back in the eighties, and the industry has since added a number of abstraction layers over the years (i.e. hypervisors, containers, functions).

As you can see from the graphic below, the higher you go in the abstraction levels, the more value the cloud provider can add and the more it can offload the consumer from non-strategic activities. A lot of these activities tend to be “undifferentiated heavy lifting”: something that AWS customers have to do but that doesn’t necessarily differentiate them from their competitors (because those activities are table stakes in that particular industry).

This is how the visual keeps building-up during my story:

In the next few paragraphs I am going to call out some AWS services that intercept this layout. What we found is that supporting millions of customers on the platform requires a certain degree of flexibility in the services we offer because there are many different patterns, use cases and requirements that we need to satisfy. Giving our customers choices is something AWS always strives for.

A couple of final notes before we dig deeper: the way this story (and its visual) builds up through the blog post is aligned with the announcement dates of the various services (with some duly noted exceptions). Also, all the services mentioned in this blog post are generally available and production-grade. There are no services in preview being disguised as generally available services. For full transparency, the integration among some of them may still be work in progress, and this will be explicitly called out as we go through them.

The instance (or virtual machine) abstraction

This is the very first abstraction we introduced on the AWS platform, back in 2006. Amazon Elastic Compute Cloud (Amazon EC2) is the service that allows AWS customers to launch instances in the cloud. When customers intercept the platform at this level, they retain responsibility for the guest operating system and above (middleware, applications, etc.) and their life cycle. Similarly, customers leave to AWS the responsibility for managing the hardware and the hypervisor, including their life cycle.

At the very same level of the stack there is also Amazon Lightsail. Quoting from the FAQ, “Amazon Lightsail is the easiest way to get started with AWS for developers, small businesses, students, and other users who need a simple virtual private server (VPS) solution. Lightsail provides developers compute, storage, and networking capacity and capabilities to deploy and manage websites and web applications in the cloud”.

And this is how these two services appear on the slide:

The container abstraction 

With the rise of microservices, a new abstraction has taken the industry by storm in the last few years: containers. Containers are not a new technology, but the rise of Docker a few years ago democratized access to this abstraction. In a nutshell, you can think of a container as a self-contained environment with soft boundaries that includes both your own application and the software dependencies needed to run it. Whereas an instance (or VM) virtualizes a piece of hardware so that you can run dedicated operating systems, container technology virtualizes an operating system so that you can run separate applications with different (and often incompatible) software dependencies.

And now the tricky part. Modern containers-based solutions are usually implemented in two main logical pieces:

  • A containers “control plane” that is responsible for exposing the API and interfaces to define, deploy and manage the life cycle of containers. This is also sometimes referred to as the container orchestration layer.
  • A containers “data plane” that is responsible for providing capacity (as in CPU/memory/network/storage) so that those containers can actually run and connect to a network. From a practical perspective this is (typically) a Linux host or (less often) a Windows host where the containers get started and wired to the network.

Arguably, in a compute abstraction discussion the data plane is key, but it is just as important to understand what’s happening in the control plane.

Back in 2014 Amazon launched a production-grade containers control plane called Amazon Elastic Container Service (ECS). Again, quoting from the FAQ, “Amazon Elastic Container Service (ECS) is a highly scalable, high performance container management service that supports Docker […]. Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure”.
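To make “defining a container” to such a control plane concrete, here is a minimal ECS-style task definition sketched as a Python dictionary. The family name, image and sizing values are hypothetical placeholders; a real definition would be registered through the ECS API or console:

```python
import json

# A minimal ECS-style task definition, expressed as a plain Python dict.
# All names and values below are illustrative placeholders, not a real workload.
task_definition = {
    "family": "demo-web-app",            # hypothetical task family name
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",     # the container image to run
            "cpu": 256,                  # CPU units reserved for the container
            "memory": 512,               # memory (MiB) reserved for the container
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "essential": True,           # the task stops if this container stops
        }
    ],
}

# The control plane consumes a definition like this (as JSON) and is then
# responsible for placing the resulting containers on the data plane.
print(json.dumps(task_definition, indent=2))
```

The split described above is visible even in this sketch: the definition says nothing about which host the container will run on; that placement decision belongs to the data plane.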

In 2017 Amazon also announced the intention to release a new service called Amazon Elastic Container Service for Kubernetes (EKS) based on Kubernetes, a successful open source containers control plane technology. Amazon EKS has been made generally available in early June 2018.

Just as with ECS, the aim of this service is to free AWS customers from having to manage a containers control plane. In the past, AWS customers would spin up a number of EC2 instances and deploy/manage their own Kubernetes masters (“masters” is the name of the Kubernetes hosts running the control plane) on top of the EC2 abstraction. However, we believe many AWS customers will leave to AWS the burden of managing this layer by consuming either ECS or EKS (depending on their use cases). A comparison between ECS and EKS is beyond the scope of this blog post.

You may have noticed that everything we discussed so far is about the container control plane. How about the containers data plane? This is typically a fleet of EC2 instances managed by the customer. In this particular setup, the containers control plane is managed by AWS while the containers data plane is managed by the customer. One could argue that, with ECS and EKS, we have raised the abstraction level for the control plane but we have not yet really raised the abstraction level for the data plane as the data plane is still comprised of regular EC2 instances that the customer has responsibility for.

There is more on that later on but, for now, this is how the containers control plane and the containers data plane services appear on the slide I use to tell my story:


The function abstraction 

At re:Invent 2014, AWS also introduced another abstraction layer: AWS Lambda. Lambda is an execution environment that allows an AWS customer to run a single function on the AWS platform. So instead of having to manage and run a full-blown OS instance (to run your code), or instead of having to track all software dependencies in a user-built container (to run your code), Lambda allows you to upload your code and let AWS figure out how to run it (at scale). Again, from the FAQ: “AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app”.

What makes Lambda so special is its event driven model. As you can read from the FAQ, not only can you invoke Lambda directly (e.g. via the Amazon API Gateway) but you can trigger a Lambda function upon an event in another AWS service (e.g. an upload to Amazon S3 or a change in an Amazon DynamoDB table).

In the context of this blog post, the key point about Lambda is that you don’t have to manage the infrastructure underneath the function you are running. No need to track the status of the physical hosts, no need to track the capacity of the fleet, no need to patch the OS where the function will be running. In a nutshell, no need to spend time and money on the undifferentiated heavy lifting.
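As a minimal sketch of this model (the bucket and object names are placeholders, and the processing logic is deliberately trivial), here is what a Python Lambda handler reacting to an S3 upload event might look like:

```python
# A minimal Lambda-style handler for an S3 "object created" event.
# The event structure below follows the documented S3 notification format;
# the processing itself is a placeholder.
def handler(event, context):
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Placeholder "business logic": a real function might fetch the
        # object, transform it, write results to a database, etc.
        results.append(f"processed s3://{bucket}/{key}")
    return {"statusCode": 200, "processed": results}


# Local simulation of an S3 event; on AWS, Lambda passes this in for you.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "data.csv"}}}
    ]
}
print(handler(sample_event, None))
```

Note what is absent from the sketch: no server, no OS, no capacity management. You upload the function and wire up the trigger; AWS runs it.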

And this is how the Lambda service appears on the slide:


The bare metal abstraction

Also known as the “no abstraction”.

As recently as re:Invent 2017, we announced (the preview of) the Amazon EC2 bare metal instances. We made this service generally available to the public in May 2018.

As alluded to at the beginning of this blog post, this announcement is part of Amazon’s strategy to provide choice to our customers. In this case we are giving customers direct access to hardware. To quote from Jeff’s post: “…. [AWS customers] wanted access to the physical resources for applications that take advantage of low-level hardware features such as performance counters and Intel® VT that are not always available or fully supported in virtualized environments, and also for applications intended to run directly on the hardware or licensed and supported for use in non-virtualized environments”.

This is how the bare metal Amazon EC2 i3.metal instance appears on the slide:

As a side note, and as also alluded to by Jeff in his blog post, i3.metal is the foundational EC2 instance type on top of which VMware created their own “VMware Cloud on AWS” service. We are now offering any AWS user the ability to provision bare metal instances. This doesn’t necessarily mean you can load your hypervisor of choice out of the box, but you can certainly do things you wouldn’t be able to do with a traditional EC2 instance (note: this was just a Saturday afternoon hack).

More seriously, a question I get often asked is whether users could install ESXi on i3.metal on their own. Today this cannot be done and I’d be interested in hearing your use case for this as opposed to using the “VMware Cloud on AWS” service.

The full container abstraction (for lack of a better term)

Now that we have covered all the abstractions, it is time to go back to the whiteboard slide and see whether there are further optimizations we can provide for AWS customers. When we discussed the container abstraction above, we called out that, while there are two different fully managed containers control planes (ECS and EKS), there wasn’t a managed option for the data plane (i.e. customers could only deploy their containers on top of customer-owned EC2 instances).

Some customers were (and still are) happy about being in full control of said instances.

Others have been very vocal that they wanted to get out of the (undifferentiated heavy-lifting) business of managing the life cycle of that piece of infrastructure.

Enter AWS Fargate. AWS Fargate is a production-grade service that provides compute capacity to AWS containers control planes.  To quote from the Fargate service home page: “With AWS Fargate, you no longer have to provision, configure, and scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing. AWS Fargate removes the need for you to interact with or think about servers or clusters. Fargate lets you focus on designing and building your applications instead of managing the infrastructure that runs them”.

Practically speaking, Fargate is making the containers data plane fall into the “Provider space” responsibility. This means the compute unit exposed to the user is the container abstraction, while AWS will manage transparently the data plane abstractions underneath.

This is how the Fargate service appears on the slide:

As alluded to in the slide above, ECS now has two so-called “launch types”: one called “EC2” (where your tasks get deployed on a customer-managed fleet of EC2 instances), and the other called “Fargate” (where your tasks get deployed on an AWS-managed fleet of EC2 instances).
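A hedged sketch of how the launch type surfaces in practice (the cluster, task definition and subnet names are hypothetical placeholders; with boto3, dictionaries like these would be passed as keyword arguments to `run_task`):

```python
# Sketch of the parameters for running the same task on the two ECS launch
# types. With Fargate, networking must be specified explicitly because there
# is no customer-managed instance to inherit it from. Names are placeholders.
run_on_ec2 = {
    "cluster": "demo-cluster",
    "taskDefinition": "demo-web-app:1",
    "launchType": "EC2",        # tasks land on your own EC2 container instances
}

run_on_fargate = {
    "cluster": "demo-cluster",
    "taskDefinition": "demo-web-app:1",
    "launchType": "FARGATE",    # tasks land on AWS-managed capacity
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],   # hypothetical subnet
            "assignPublicIp": "ENABLED",
        }
    },
}

print(run_on_ec2["launchType"], run_on_fargate["launchType"])
```

The task definition is the same in both cases; only who owns the data plane changes.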

For EKS the strategy is very similar albeit, to quote again from the Fargate service home page, “AWS Fargate support for Amazon EKS will be available in 2018”.  For those of you interested in some of the exploration being done to make this happen, this is a good read.


In this blog post we covered the spectrum of abstraction levels available on the AWS platform and how AWS customers can intercept them depending on their use cases and where they sit in their cloud maturity journey. Customers with a “lift & shift” approach may be more inclined to consume services on the left-hand side of the slide, whereas customers with a more mature cloud native approach may be more interested in consuming services on the right-hand side of the slide.

In general, customers tend to use higher level services to get out of the business of managing non-differentiating activities. I was for example recently talking to a customer interested in using Fargate. The trigger there was the fact that Fargate is ISO, PCI, SOC and HIPAA compliant and this was a huge time and money saver for them (that is, it’s easier to point to an AWS document during an audit than having to architect and document for compliance the configuration of a DIY containers data plane).

As a recap, this is the final slide I tend to show with all the abstractions available:

I hope you found it useful. Any feedback is obviously greatly appreciated.


My first 6 months at AWS
Tue, 08 May 2018

As you may have heard, late last year I joined Amazon Web Services. I recently passed the 6-month mark at AWS (or 180 x Day 1), which is often a good point to pause and reflect. Also, so many people have been asking me how I am doing here that I thought a public blog post would scale better than many 1:1 interactions.

The TL;DR version of it is: it is exactly as I had envisioned before joining; I didn’t have any major surprises; my due diligence was accurate (i.e. I did my homework properly). Actually, it’s probably even slightly better than what I thought.

I will talk (briefly) about the overall culture, as well as the Solutions Architect role (my current role at AWS) in more detail below, if you are interested.

The AWS culture     

The culture at Amazon is very interesting. As part of my due diligence I talked to a lot of people who either were working at Amazon or had worked there in the past, and one comment stood out for me. It was along the lines of:

“Usually every vendor has its own coded values and principles but you often just read them when you join the company and you forget about those. At Amazon it’s different: you live and breathe every day the Amazon Leadership Principles”.

It couldn’t have been more true. The LPs are front and center. They dictate what you do, how you do it and, ultimately, they define the metric of your success at Amazon.

Another interesting aspect of working at Amazon is the public communication posture. I will admit that this was my biggest concern when I joined AWS, because from the outside you often hear stories like “you need to erase all of your social media accounts” or “you need to stop blogging” and those sorts of things. Well, if nothing else, this blog is a testament that the rumors you hear are highly exaggerated. There are indeed rules you need to follow, but nothing mind-blowing to be fair. Most of them are common sense and standard social media best practices at many large organizations. Here they just take them seriously. I will concede that if you spend your day trolling people on Twitter, your social media activity may need to adapt a bit. I am actually enjoying this part, honestly, and it’s helping me be less of a jerk on social media than I was before.

Last but not least on culture, I think it’s fair to say that I have never seen an organization so obsessed about customers. Granted this is not a secret, you can read it everywhere (including in the Leadership Principles). But trust me, the first time you get to hear things like “we think we should go meet with customer XYZ to optimize their deployment because there are savings we can probably suggest” you do really feel you are on candid camera. Customer obsession, for real.

The Solutions Architect role

In general, life (in the field) at a service company is fundamentally different than what life (in the field) looks like at a hardware or software vendor. The way a customer buys, the way you optimize, the way you help them. I am indeed learning a ton. In a Solutions Architect role you breathe this day in, day out.

One other concern I had when I joined AWS (other than the communication posture) was that I was going back to a field role in Italy (that is, not Silicon Valley). The reality is that, regardless of the customers I have been working with so far (small vs. big, mature vs. less mature in terms of cloud adoption, etc.), I am learning a lot in every single interaction with them. I get to see the whole spectrum, both in terms of size and in terms of use cases and complexity. I am loving it to the point that I am volunteering to sign up for a 5-hour drive to deliver a first call deck.

Being involved in the full spectrum of use cases also allows me to see not only the traditional lift & shift scenarios (e.g. “I have 1,674 VMs in my data centers and I would like to move them to 1,674 instances on AWS”) but also what I consider the most interesting aspects associated with business outcomes and use cases. Forget VMs (and containers, to an extent): I am building a customer demo as I type this that takes clicks from the AWS IoT Button, sends data to an object store in CSV format, queries the object store with SQL syntax and visualizes the data in a pie chart. I did this in 4 hours, and only because I know nothing, yet, about this stuff. Someone who knows this better than me could have done it in 20 minutes.

Sure, for most this sounds like the new normal already, but for someone who has spent his life on infrastructure-related things this is literally jaw-dropping when you think how much time (and money) this could save customers.

Which brings me to (IMO) the most interesting part of this blog post. Where am I spending the majority of my energy? What’s the role of a Solutions Architect at AWS? What is the most difficult part of the job? Where do you push to get better at what you are doing? In other words, what does AWS pay you for?

The SA role is one of the many functions inside AWS. SAs typically operate in a couple of dimensions:

  • they tend to have a long-term working relationship with customers, architecting and optimizing their deployments on AWS in a 1:1 setting.
  • they contribute to generating content in the form of blogs, solutions and documentation, as well as presenting at AWS and industry events, in a 1:many setting.

I personally found the second dimension to be the easiest (relatively speaking). I have already done some of these things at AWS Summits and other industry events. You typically focus on an area of expertise and you deliver.

The first dimension is what I found more challenging and, frankly, the most interesting part of my job. I have broken it down into three distinct challenges/tasks that you have to carry out in parallel:

  • You get to know all of the AWS services. Before joining AWS I thought this was the most difficult challenge. After joining AWS, I figured it was the easiest part (again, relatively speaking). Don’t get me wrong, learning all the services is a lot of work, it’s a moving target and you will never be a guru on all of them. The challenge here is to know as much as possible about all of them.
  • In many situations you have to work backwards from the customer’s needs and translate their business objectives into a meaningful architecture that can deliver the results. This isn’t so much of a problem when the need is “I have 1,674 VMs in my data centers and I would like to move them to 1,674 instances on AWS”. But it is a challenge when the need is expressed in business terms such as “I need to do predictive maintenance on my panel bender line of products” or “I want to build a 3D map of my plants to offer training without having employees come on-site”. This is quite a challenging mental task because, among the many difficulties, it requires a good understanding of the customer’s business to actually understand (or better, proactively anticipate) the use case being discussed.
  • The third and possibly most difficult part is this, though: once you get a good understanding of the services portfolio, and once you get a good understanding of the use case, which of the potentially many combinations of services do you use to deliver the best solution to the customer? You will find out that there are (almost) infinite ways to build a solution but, in the end, only a handful of different combinations of services make sense in a given situation. There are 5 dimensions you usually need to consider when designing a solution: operations (you want the architecture to be easy to maintain and easy to evolve), security (you want the architecture to be secure), reliability (you want the architecture to be reliable and avoid single points of failure), performance (you want the architecture to be fast) and costs (you want the architecture to be as cost-effective as possible). It is not by chance that these aspects are the foundational pillars of the AWS Well-Architected Framework. Finding the balance among all these aspects is key and possibly the most challenging (and interesting) task for any Solutions Architect. The other reason this is challenging is that it builds on top of #1 and #2: it assumes you have a good understanding of all of the services (perhaps the one you don’t know well and fail to consider is the one that would be best suited) and it assumes you have a good understanding of the use case and the business needs of the customer.


All in all, I couldn’t be happier about my move. I am on track to achieve what I set out to achieve, and I am enjoying every minute of this ride. I don’t want to repeat myself, but when I left my previous company to join AWS I told my manager I was doing so to do a “Cloud MBA”. This is exactly what it turned out to be.

The only suggestion I have is that you spend some time on this link here, as it may open up a world for you as it did for me:


AWS Identity and Access Management: Introduction to Resource Access Control
Thu, 05 Apr 2018

This is my first blog post as an AWS employee. I have spent the last 6+ months learning new things (IAM being one of them) and I figured I could (and should) share some of these learnings with my followers. I hope it can smooth the learning curve when you transition from a data-center-centric view of the world to a cloud-centric one. This blog post doesn’t add new information that can’t be found in the official AWS documentation; however, it flows in a way that makes the most sense to me. Given this is a personal preference, if you are like me, I hope you may find it useful. Let’s get started.

As you learn how user and access management work in the AWS Cloud, some concepts might be unfamiliar to you if you are new to cloud computing. I recommend that you start by learning some fundamental AWS Identity and Access Management (IAM) concepts to help you securely control access to your AWS resources. In this blog post, I walk through some of the options that AWS customers have to configure access to resources. Among the many, there are four specific use cases I will discuss throughout the document that demonstrate the flexibility and granularity that IAM provides:

  1. Assigning permissions to users within an account
  2. Assigning permissions to applications running on EC2 within an account
  3. Assigning cross account permissions to users
  4. Assigning cross account permissions to applications running in Lambda

First things first: IAM users and groups

When you first create an AWS account, you begin with an identity that has access to all AWS services and resources in the account—this is called the root user. You access your root user by signing in with the email address and password that you used to create your account. Because the root user has unlimited privileges, you need to secure the user by enabling multi-factor authentication (MFA) and then abstain from using the root user for common tasks. I recommend you use IAM users for common tasks because you can control access to the AWS services and resources in your AWS account.

Up next: IAM policies

Before we dive into the actual use cases, we need to clear the air on one additional key component: IAM policies. With IAM, you manage permissions by attaching policies to identities (such as IAM users, directly or through group membership) and resources (such as AWS services). You can attach permissions policies to identities (identity-based policies) or to resources (resource-based policies). Identity-based policies describe which actions the user can perform on the resources described in the policy. Resource-based policies describe which actions can be performed on a resource by the users specified in the policy.

In this blog post, I also use trust policies, which are resource-based policies that are attached to an IAM role and that define who can assume the role. IAM roles are a mechanism for granting temporary permissions to entities in the AWS Cloud. More on this later in the document.

For more information about IAM policies, see IAM Policies, which includes an overview of the JSON policy syntax (policies are written in JSON). Understanding how IAM policies are structured is important as you read the remainder of this blog post, which includes some simple IAM policy examples.
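As a hedged sketch of that syntax (the bucket name and account ID below are placeholder values, not real resources), here is one identity-based policy and one resource-based policy expressed as Python dictionaries and serialized to the JSON that IAM actually consumes:

```python
import json

# Identity-based policy: attached to a user, group, or role; it says what
# that identity may do. The bucket name is a hypothetical placeholder.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-bucket",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

# Resource-based policy: attached to the resource itself; note the extra
# Principal element naming who may act on it. The account ID is a placeholder.
resource_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

print(json.dumps(identity_policy, indent=2))
```

The structural difference is exactly the one described above: an identity-based policy has no Principal element (the attached identity is implicit), while a resource-based policy must name the principal it grants access to.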

Assigning permissions to users within an account

Most AWS customers get started with IAM by creating users from their root account and associating permissions policies with them. You can also put users in IAM groups (an IAM best practice) and associate policies with groups instead of individual users. If you associate IAM policies with groups, users in those groups will have the group’s permissions. If you remove a user from a group, the user no longer has the permissions associated with that group. If you don’t use groups, you must manage permissions for each user individually. This creates more administrative work and introduces the possibility of misconfiguring permissions.

In the following diagram, a user (1) has been granted permissions to work with Amazon EC2 instances (2a) and Amazon RDS databases (2b). The user can do this work either through the AWS Management Console or by using the AWS CLI and AWS SDKs. Note that using the console requires the user to sign in using a user ID and password. However, if the user wants to interact programmatically with AWS services, the user would use an access key ID and secret access key. For every IAM user, you can decide which types of access you want to grant—console credentials, programmatic access keys, or both.

Some users use their access key ID and secret access key within their applications when accessing AWS resources. Similarly, others embed credentials inside an application running on an Amazon EC2 instance that requires access to Amazon RDS or another AWS service. These approaches become a problem when, for example, you rotate keys: you don’t want to have to change the code of your application every time you change the credentials of your user, because this creates a critical dependency (if you don’t change both at the same time, your application loses access to AWS resources). In the next section we explore how IAM can address these situations more properly.

Assigning permissions to applications running on EC2

A best practice for allowing an Amazon EC2 instance to access other AWS services is to use IAM roles. An IAM role can be assumed by an entity (such as an IAM user or an AWS service) to receive temporary credentials. Those entities can assume the role temporarily and gain access to the permissions assigned to the role to perform tasks.

In the case of the scenario illustrated in the preceding diagram, for the Amazon EC2 instance to be able to gain access to AWS services, your account administrator must:

  1. Create an IAM role for Amazon EC2 with a set of permissions associated with the role.
  2. Assign the role to the Amazon EC2 instance.

Note that when you create an IAM role, you are assigning both a trust policy (which defines the identity or resource that is permitted to assume the role) and a permissions policy (which defines which actions the role can perform). In this scenario, the trust policy states that the EC2 service is the entity trusted to assume the IAM role we created, and the set of permissions assigned to the IAM role is AmazonRDSFullAccess (an AWS managed policy with the permissions assigned in the preceding diagram to the user or group). Assigning these permissions allows the Amazon EC2 instance to assume the role and access Amazon RDS (see the following diagram).
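As a sketch of those two documents (the inline permissions statement below is an illustrative, narrower stand-in for the AmazonRDSFullAccess managed policy, not its actual content):

```python
import json

# Trust policy: which entity may assume the role. Here the EC2 service
# itself is the trusted principal, so instances with this role attached
# can obtain temporary credentials.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: what the role may do once assumed. This inline
# statement is a sketch; in the scenario above you would attach the
# AmazonRDSFullAccess managed policy instead.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "rds:*", "Resource": "*"}],
}

print(json.dumps(trust_policy, indent=2))
```

Keeping the two documents separate is the point: "who can become this role" and "what this role can do" are managed independently.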

With roles for Amazon EC2, you do not need to keep long-term credentials on the instance to run an application. The Amazon EC2 instance uses the Amazon EC2 instance metadata service. You can query the metadata service explicitly and programmatically; however, the AWS CLI and SDKs automatically retrieve the temporary credentials from the metadata service. Using the AWS CLI and SDKs allows you to focus on your code logic instead of managing credentials. To learn more, see Retrieving Security Credentials from Instance Metadata.
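To make this concrete, the following Python sketch shows the kind of metadata URL the CLI and SDK query behind the scenes. The endpoint and path come from the EC2 instance metadata documentation; the role name used here is a made-up example.

```python
# Sketch of the instance-metadata lookup the AWS CLI/SDK performs for you.
# The endpoint and path are documented in the EC2 instance metadata docs;
# the role name "MyRDSAccessRole" is a hypothetical example.
METADATA_BASE = "http://169.254.169.254/latest/meta-data"

def credentials_url(role_name: str) -> str:
    """Return the metadata URL that serves temporary credentials for a role."""
    return f"{METADATA_BASE}/iam/security-credentials/{role_name}"

# On an actual instance you could fetch this URL (for example with
# urllib.request.urlopen) and get back a JSON document with temporary
# AccessKeyId, SecretAccessKey and Token values that rotate automatically.
print(credentials_url("MyRDSAccessRole"))
```

The point is that no long-term secret ever lives on the instance: the SDK resolves this URL, caches the short-lived credentials, and refreshes them before they expire.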

The following diagram shows the Amazon EC2 instance assuming the IAM role (4). The role (3) allows the Amazon EC2 instance (and the code running inside the instance) to access the Amazon RDS resource (2b).

Assigning cross account permissions to users

An IAM role is useful in many other ways and enables you to rely on short-term credentials instead of long-term credentials. Many AWS customers require multi-account deployments, and IAM roles can provide cross-account access to resources. Note that some AWS services have native capabilities for cross-account access by virtue of being able to configure resource-based policies. However, not all AWS services support these capabilities and, in those cases, using roles can be helpful.

Continuing the example from the previous section, let’s say you have a user in AWS account #2 assume the role you created in AWS account #1 (see the following diagram). Doing so allows the user in the second account to access the Amazon RDS instance in the first account. To grant such access, you have to create a role in the first account to delegate permissions to an IAM user in the second account. This creates a trust (see the “Trust Policy” section on Roles Terms and Concepts for more information). By delegating a user in account #2 to assume a role that you created in account #1, you are creating a trust in which account #1 trusts account #2.

Note that after you create a role, you can edit it to modify its properties (for example, to list more than one account you want to trust). The following diagram illustrates a setup that allows a user in the second account to access resources (6) in the first account.

Now that you have established trust between the two accounts, an administrator in the second account can grant a user in that account the right to assume the role in the first account. Once the user has the permissions (assigned by using a permissions policy), they can assume the role created in the first account through the AWS Management Console, the CLI, or the APIs to gain access to the permissions assigned to that role.
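As a hedged illustration of what happens at that moment: the STS AssumeRole response contains a Credentials block with an access key, a secret key, and a session token. The pure-Python sketch below uses a made-up sample response to show the three values a client would extract and use for subsequent API calls.

```python
# Sketch of handling an STS AssumeRole response. The response shape
# (a "Credentials" block with AccessKeyId/SecretAccessKey/SessionToken)
# matches the STS API; the values below are made-up placeholders.
def extract_temporary_credentials(assume_role_response: dict) -> dict:
    """Pull the three values a client needs from an AssumeRole response."""
    creds = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

sample_response = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "examplesecret",
        "SessionToken": "exampletoken",
        "Expiration": "2018-06-12T15:00:00Z",
    }
}
print(extract_temporary_credentials(sample_response))
```

These temporary credentials expire (see the Expiration field), which is exactly why roles remove the key-rotation problem described at the beginning of the post.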

Note that the preceding diagram shows two roles (represented by two green construction hard hats) in the first account. This is because the first role has a trust relationship (as a result of the trust policy) with Amazon EC2 (meaning it’s a role that an Amazon EC2 instance can assume). The JSON policy document for the trust relationship looks like the following code example. As you can see, the Principal (the entity affected by the policy) is Amazon EC2.

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "Service": ""
      "Action": "sts:AssumeRole"

The second role has a trust relationship with the second account, which means an administrative user in the second account can allow users and roles in that account to assume the role in the first account. The following code example is the policy document for this second role.

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account #2>:<admin user>"
      "Action": "sts:AssumeRole",
      "Condition": {}


Assigning cross account permissions to applications running in Lambda

In this post, I have shown the high-level concepts of how to use IAM roles to enable a resource (such as an Amazon EC2 instance) in one AWS account to consume a service (such as Amazon RDS) in the same account. I also showed how to allow a user in one AWS account to assume a role to consume a resource in another account.

Now, I will show you how to configure a resource in one AWS account to assume a role and consume a resource in a different AWS account. For example, you can allow a Lambda function in the second AWS account to access the Amazon RDS database in the first account (see the following diagram). You can read more about Lambda here but, in essence, it’s a serverless event-driven computational model that allows you to run single application functions without requiring any host (EC2 or otherwise). The process is similar to the process that allows a user to assume a role. However, in this case, you also must create a role in the second account that will be associated with the Lambda function.

The role I create in the second account has the following trust relationship.

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "Service": ""
      "Action": "sts:AssumeRole"


In the preceding code, I create a trust relationship with Lambda (so that this role can be associated with a Lambda function). The trust explicitly allows the Lambda service (lambda.amazonaws.com) to assume a role (sts:AssumeRole). If I were to create a role and assign it to, for example, an Amazon EC2 instance instead, the principal would have been ec2.amazonaws.com instead of lambda.amazonaws.com.

After I create this trust relationship, I need to assign it the proper permissions. The role in the second AWS account will have an inline permissions policy that explicitly allows the Lambda function to assume the role in the first account. Note that the permissions policy (edited inline in this case) should not be confused with the trust policy. This permissions policy has the same permissions that were assigned to the IAM user in the previous example. The trust relationship lets Lambda assume a role by calling the AssumeRole API, and the permissions policy assigned to the role allows Lambda to access AWS services defined in the policy. 
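As a sketch (not the exact policy from the post), the inline permissions policy attached to the Lambda role in account #2 would have the following shape, allowing sts:AssumeRole on the role in account #1. The role name and account ID below are placeholders.

```python
import json

# Hypothetical inline permissions policy for the Lambda execution role in
# account #2: it allows calling sts:AssumeRole on the role in account #1.
# The account ID (111111111111) and role name (RDSAccessRole) are made up.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::111111111111:role/RDSAccessRole",
    }],
}
print(json.dumps(permissions_policy, indent=2))
```

Note the division of labor: the trust policy (above) says who may assume the Lambda role, while this permissions policy says what the function may do once it runs, namely call AssumeRole on the role in the first account.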

Now, the Lambda function can explicitly request to assume the role via the AWS CLI or SDK. AWS Security Token Service (AWS STS) grants the function temporary credentials that carry the permissions associated with the role in the first account that the Lambda function is assuming. As shown in the following diagram, the Lambda function in AWS account #2 is associated with (8) a role in account #2 that has permissions to assume the role in account #1 (6). And the role in account #1 has permissions (3) to access Amazon RDS (2b). As a result, the Lambda function can access Amazon RDS without relying on long-term credentials such as secret and access keys.


In this blog post I explored some fundamental concepts of AWS IAM as they relate to assigning permissions to entities on the AWS Cloud. I explained the high-level concepts of how to grant users and groups in an account permissions to access resources in the same account. I then explored how to enable resources in an account to access other resources in the same account. Finally, I presented a cross-account scenario that uses IAM roles.

The scenarios in this post use some variations and different API calls to obtain temporary access to AWS resources. You can learn more about the different calls and methods by visiting Requesting Temporary Security Credentials. This documentation topic is also a useful starting point to dive deeper into other access management scenarios, based on your specific needs and use cases.

If you have comments about this post, please submit them in the “Comments” section below. If you have questions about how to implement the solutions in this post, start a new thread on the IAM forum.


So long VMware, Hello AWS Wed, 27 Sep 2017 19:12:46 +0000

I have an awesome job, an awesome manager and I work for one of the best companies around.

Yet, Friday September 29th 2017 is my last day at VMware.

On Monday October 2nd I will join Amazon Web Services as a Principal Solutions Architect.

This was not a decision I took lightly.

This blog post (in its original draft) was 7 pages long. I intended to explain, at a certain level of detail, the thought process I went through to make this decision.

I eventually figured that this was just (psychologically) useful to myself and there was a chance my blog readers would be bored by it. So now you get to see only the final result and not “the 7-page sausage-making thought process”. You’re welcome.

A bit of background about me

I had worked at IBM in various positions since 1994, pretty much always on x86 architectures. I started with OS/2, moved on to Windows, and eventually joined the Systems and Technology Group working on the Netfinity line of Intel-based servers (soon to become eServer xSeries).

However, for me, it all started in October 2001.

I was in Redmond for a Windows 2000 Datacenter training at Microsoft when I stopped by the IBM facility in Kirkland. I spent time talking to Bob Stephens, one of the big guys there. Bob was telling me about this huge scale-up server (the xSeries 440) that was so big… that nobody really wanted/needed it.

At that point, he said they were in talks with this small startup in Silicon Valley that was working on a thin software layer that you could install on this server and that would allow a user to carve out “small software partitions” (for lack of better wording at that time) where customers could install Windows.

I distinctly remember that I chuckled at Bob’s whiteboarding, and the first words that came out of my mouth were “come on Bob, are you kidding? A software layer between Windows and the hardware? This will never work!”.

Bob acknowledged my concerns and handed me a CD with a label on it: “ESX”. He suggested I give it a try.

The next week I got back to the office in Milan, booted that CD, installed ESX 1.0 onto my xSeries 440 in the lab, and I distinctly remember chuckling (again) and saying to myself “Holy moly, this thing is going to change the world entirely forever”. And it did.

The rest is history.

I have been working exclusively on VMware technologies since 2001.

First at IBM, up until February 2010, when I joined VMware.

I guess it’s fair to say I have seen it all from ESX 1.0 all the way to what VMware is today.

Why am I leaving VMware?

There isn’t any reason other than, after 17 years, I feel the need for a professional change. VMware is an amazing company and one I could have easily retired at still having fun (which is a priority for me).

That said, in the last few years I have grown the desire to expand my reach into different technology areas and communities, and to embrace different cultures (more on this later). In other words, to see something different from different perspectives.

VMware has always been amazing to me (both when I was a partner at IBM and an employee at VMware). I only hope I was able to repay my debt with my contributions during the last 17 years.

I will forever be grateful to the company and the community.

Why am I joining Amazon (Web Services)?

As I watched industry trends and crossed them with my career interests, it was clear that it was time for me to try a new experience and move on to my next stint in the industry.

In the last few years my interests have evolved towards ”consuming” [resources and services] Vs. “building” [infrastructures]. This is something that has been close to my heart for a long time now.

Also, just like I have never been a “multi-hypervisor” type of guy, this “multi-cloud” thing hasn’t really warmed me up.

Long story short, the pivot to a leading public cloud provider with a strong personality and value proposition was becoming very appealing to me.

I started creating a short list in my head of organizations I would have liked to join. To be honest, I think there are just “two and a half” of them (I will not name names).

The output of a very well-thought-out process is that I decided to accept an offer from Amazon Web Services for a Principal Solutions Architect role.

I have always been fascinated by the Amazon culture. There is some magic behind it, IMO, that I have always heard of and, at some point, I really felt the desire to live it from the inside. There are intimidating aspects of that culture, but the fascinating aspects dwarf any possible concern hands down.

There are a couple of articles I suggest you read that fairly represent why the Amazon culture is, in my opinion, unique:

There are tons of such articles out there but I think these two capture my thinking fairly well.

This customer-obsession attitude and unprecedented service automation at company scale are literally mind-blowing.

I couldn’t be more excited to join AWS and see this from the inside, being part of it.

As a final thought, I am joining AWS in a (relatively) senior position at the Principal level and I will bring as much “experience” as I can into the new role. After all, as Andy Jassy once said, “there is no compression algorithm for experience”.

That said, the graph below (source here) accurately describes the mindset I am starting this new adventure with.

The more you live in this industry, the more you understand that what you know is a tiny bit of the entire universe we are immersed in. There is no doubt it’s always going to be “day 1”!

When I started talking about this to my (awesome) manager at VMware (Steve), I remember telling him that this opportunity is my “Cloud MBA”.

Saying that I am super excited to start this new chapter of my professional life is an understatement.

Ad maiora!


P.S. Ian Massingham’s feedback to this post was: “You missed one critical aspect: Ian M literally would not stop nagging me to join”. I know for a fact that he meant it, for real. That made me chuckle. I will take it as a token of appreciation from Ian.


“VMware Cloud on AWS” Vs. “Azure Stack” Mon, 18 Sep 2017 13:28:53 +0000

Introduction

VMware, Amazon Web Services and Microsoft are in the middle of rolling out some interesting technologies and services that have the potential to move the needle in cloud adoption (spoiler alert: whatever cloud means). VMware comes from a very strong (almost exclusive) market share in the on-prem data center virtualization space. AWS is the 800-pound cloud gorilla, and Microsoft is one of the biggest contenders in the same public cloud space.

VMware just made available “VMC on AWS” (aka VMWonAWS, aka the possibility to run the entire VMware SDDC stack on the AWS infrastructure).

Microsoft is making available “Azure Stack” (aka an on-prem instantiation of the services available on the Azure public cloud).

These two announcements will generate (and are already generating) lots of interest in the respective communities. In this post, I would like to make a comparison between the different approaches VMware, AWS and Microsoft are taking when it comes to hybridity (again, whatever hybridity means).


The cloud industry has been shaped over the last 10+ years by the fact that, when AWS pioneered it, it changed two very different paradigms at the very same time:

  • it changed the economic paradigm with a PAYGO/OPEX model (Vs. the traditional on-prem/CAPEX model).
  • it also shifted the application architectural paradigm with a cloud-native/Cattle application model (Vs. the traditional enterprise/Pet application model).

I won’t bore you more with this because I already ranted about it a couple of years ago in my “What do Cloud Native Applications have to do with Cloud?” blog post. It would be beneficial to scan through it if the topic interests you.

The picture below is intended to summarize visually this multi-dimensional aspect I have alluded to in the blog post linked above:

As you can see, AWS introduced both a new economic and delivery model (X axis) as well as a new application architectural paradigm (Y axis).

In the last few years the industry has witnessed a number of attempts to bridge these two worlds and dimensions. “VMC on AWS” and “Azure Stack” are two such examples of that (albeit the respective approaches couldn’t be more different).

Let’s explore them.


When VMware and AWS teamed up, it’s clear that they focused on tackling the economic and delivery model of the traditional Enterprise data center stack dimension (X axis).

The idea is to keep the VMware software stack “constant” (and hence compatible and consistent with what the majority of the Enterprise customers are running on-prem) and make it available “as-a-Service” on the AWS infrastructure. That is to say you can consume one (or more) instances of vSphere/vSAN/NSX clusters as a service, on-demand, in the public cloud.

This picture should visually convey this concept:

In other words, the strategy here is as simple as “bringing the existing data center stack into a cloud with a cloud delivery model”.  AWS gets an additional workload capability (a huge one with an enormous total addressable market) while VMware (and its customers) get access to a different “infrastructure economic and delivery model” option.

Azure Stack

When Microsoft looked at the hybridity challenge, they took a completely different approach. Instead of focusing on the economic and delivery model aspect of “the cloud”, they looked at it from the perspective of the application architecture and the stack required to run pure cloud-native applications (Y axis).

The idea here is to keep the on-prem delivery model (somewhat) “constant” and focus on making available (in your own data center) services and technologies that usually are only available in “the cloud” (that is, the public cloud).

This picture should, ideally, convey this approach:

Note how the traditional on-prem “data center virtualization” stack from Microsoft has to give way to the new “Azure Stack” stack (no pun intended). Azure Stack isn’t meant to run on a traditional data center stack, nor is it meant to run traditional “pets” workloads.

In other words, the strategy here is about “bringing the cloud services required to build and run cloud-native applications into an on-prem data center”.


In conclusion, discussing and comparing “VMC on AWS” and “Azure Stack” doesn’t make much sense, given they are meant to solve completely different problems in the hybridity space. Because of this, they both have complementary holes in their respective value propositions.

“VMC on AWS” shines if you are a happy vSphere customer, you have to support “pet workloads” but you would like to do so with a different economic and delivery model (i.e. if you want to get out of the business of managing VMware clusters, if you want zero-effort global reach, if you want a “pay-per-use” infrastructure opex billing… all without touching your application stack). On the other hand, this solution doesn’t address how to run cloud-native applications on-prem (at least in a way that is compatible with the AWS services – VMware and Pivotal have their own solutions for that but this is out of scope for this blog).

“Azure Stack” shines if you are re-architecting your applications using the 12-factor application methodology (“cattle workloads”) but, for various reasons, you want to run them on-prem in your data center (i.e. you want to leverage advanced cloud services such as Azure Functions, Object Storage, Cosmos DB… all without having to move your applications to the public cloud). On the other hand, this solution doesn’t address how to run traditional applications (“pets”) with a cloud(-ish) economic and delivery model.

If you are clear about what your challenges are and what your organization needs, you have already picked the right solution without any further comparison.

In the end, both solutions remain spectacular projects, and I am sure they will be successful in their respective domains. Making these things happen requires gigantic engineering efforts that most people do not even appreciate.

Super kudos to the teams at AWS, Microsoft and VMware for the astonishing work.

Interesting times ahead!



A data center provisioning horror story Wed, 26 Jul 2017 12:21:38 +0000

Yesterday I noted a tweet from Frank Denneman:

I guess he was asking this in the context of the VMWonAWS cloud offering and how, with said service, you could provision vSphere capacity without having to “acquire server hardware”.

This reminded me of an anecdote I often use in talks to describe some data center provisioning and optimization horror stories. It won’t answer Frank’s question specifically, but it offers a broader view of how awful (and off the rails) things could quickly get inside a data center.

It was around 2005/2006 and I was working at IBM as a hardware pre-sales on the xSeries server line. I was involved in a traditional server consolidation project at a major customer. The pattern, in those days, was very common:

  1. Pitch vSphere
  2. POC vSphere
  3. Assess the existing environment
  4. Produce a commercial offer that would result in infrastructure optimization through the consolidation of the largest possible number of the physical servers currently deployed
  5. Roll out the project
  6. Go Party

We executed flawlessly up until stage #4, at which point the CIO decided to provide “value” to the discussion. He stopped the PO process because he thought the cost of the VMware software licenses was too high (failing to realize, I should add, that the “value for the money” he was going to extract out of those licenses was much higher than the “value for the money” he was paying for the hardware).

They decided to split the purchase of the hardware from the purchase of the VMware licenses. They executed on the former while they started a fierce negotiation for the latter directly with VMware (I think Diane Greene still remembers those phone calls).

Meanwhile (circa 2 weeks), the hardware was shipped to the customer.

And the fun began.

The various teams and LOBs had projects in flight for which they needed additional capacity. The original plan was that these projects would be served by the new virtualized infrastructure that was to become the default (some projects could have been deployed on bare metal, but that would have been more of an exception).

The core IT team had the physical servers available but didn’t have the VMware licenses that were supposed to go with the hardware. They pushed back on those requests as much as they could, but they got to the point where they couldn’t hold out anymore.

Given that IT had run out of the small servers they had used in the past to serve small requirements (which were supposed to be fulfilled by VMs from then on), they started to provision the newly acquired, super powerful 4-socket (2 cores each) / 64GB bare metal servers to host small scale-out web sites and DNS servers!

While they would traditionally have used a small server for this (at a 5% utilization rate), they now had to use monster hardware (at a 0.5% utilization rate).

If you think this is bad, you’ve seen nothing yet. More horror stories to come.

Time went by, negotiations came to an end, and an agreement with VMware was reached. As soon as the licenses were made available, a new assessment had to be done (given that the data center landscape had drifted in the meantime).

At that time, there were strict rules and best practices regarding what you could (or should) virtualize. One of those best practices was that you could (or should) not virtualize servers with a high number of CPUs (discussing the reasons for this is beyond the scope of this short post).

Given that those recently deployed small web sites and DNS servers showed up in the assessment as “8-CPU servers”, they were immediately deemed servers that couldn’t be virtualized for technical reasons.

We were left with a bunch of workloads that were supposed to go into 2-vCPU VMs in the first place but instead had to run on 8-CPU monster hardware (due to gigantically broken decisions). And we couldn’t do anything about it.

This was 2005, and a lot of these specific things have changed. However, I wonder how many of these horror stories still exist nowadays in different forms and shapes.



Yelb, yet another sample app Fri, 21 Jul 2017 11:57:00 +0000

Another pet project I have spent cycles on as of late is an open source sample application called Yelb (thanks to my partner in crime and chief developer Andrea Siviero for initiating me into the mysteries of Angular2).

This is the link to the Yelb repo on GitHub.

I am trying to be fairly verbose in the README files in the repo so I am not going to repeat myself here. Someone said GitHub repos are the new blog posts in the DevOps and cloud era. I couldn’t agree more.

For the record, Yelb looks like this (as of today). The skeleton of this interface was literally stolen (with permission) from a sample application the Clarity team developed.

When deployed with direct Internet access, it allows people to vote and (thanks to Angular2 and the Clarity-based UI) you will see graphs change dynamically. In addition, Yelb also tracks the number of page views as well as the hostname of the application-layer container serving the request.

I thought this was a good mix of features to be able to demo an app to an audience while inspecting what was going on in the app architecture (e.g. watching the container name serving the request changing when multiple containers are deployed in the app server layer).

Good for using it to demo Docker at a conference, good for using it as the basis to build a new deployment YAML for the 37th container orchestration framework we will see next week.

This is the architecture of the application (as of today).

Check on the GitHub repo for more (up to date) information about Yelb.

If you are into the container space, I think it helps a lot to own something that you can bring from personal development (from scratch) to production. You get to see all the problems a dev sees by taking his/her own app into production using containers and frameworks of all sorts.

While you are more than welcome to use Yelb for your own demos and tests (at your own peril), I truly suggest you build your own Yelb.

Not to mention the number of things you learn as you go through these exercises. I am going to embarrass myself here by saying I didn’t even know Angular was not server side, and that I didn’t know how the mechanics of the Angular compilation process worked. Stack Overflow is such an awesome resource when you are into these things.


Project Harbor makes an entry into Rancher! Thu, 20 Jul 2017 09:03:45 +0000

This article was originally posted on the VMware Cloud Native corporate blog. I am re-posting it here for the convenience of the readers of my personal blog.

Early this year I challenged myself with a pet project to create a Rancher catalog entry for Project Harbor (a VMware-started open sourced enterprise container registry).

This is something that I have been working on, on and off, in my spare time. I originally envisioned this as a private catalog entry. In other words, an entry that a Rancher administrator could add as a private catalog to her/his own Rancher instance.

I am happy to report that, a few weeks ago, Rancher decided to include the Project Harbor entry into the Rancher community catalog. The community catalog is a catalog curated by Rancher but populated by the community of partners and users.

The specific entry for Project Harbor is located here.

As a Rancher administrator, you don’t have to do anything to configure it other than enabling visibility of the Rancher community catalog. Once you have that option set, every Rancher user out there can point to Project Harbor and deploy it in their Cattle environments.

This is how the Rancher community catalog looks today:

Note that, as of today, this catalog entry only works with Rancher Cattle environments (depending on community interest, support could be expanded to Rancher Kubernetes and Rancher Swarm environments as well).

Originally, this catalog entry had a couple of deployment models for Harbor (standalone and distributed). The latest version of the catalog entry has only one model: depending on the parameters you select, Harbor will be deployed either on a single Docker host in the Cattle environment or across the hosts in a distributed fashion.

The README of the catalog entry will explain the various options and parameters available to you.

If you are interested in understanding the genesis of this pet project and all the technical details you have to consider to build such a catalog entry for Harbor, I suggest you read the original blog post, which includes lots of technical insights about this implementation (including challenges, trade-offs, and limitations). Note that, at the time of this writing, the Rancher community catalog entry for Project Harbor will instantiate the OSS version of Harbor 1.1.1.

Last but not least, mind that the Rancher community catalog is supported by the community. The Project Harbor catalog entry follows the same support pattern so, if you have any issue with this catalog entry, please file an issue on the GitHub project.


Hashidays 2017 – London: a personal report Thu, 22 Jun 2017 10:06:55 +0000

I have lately started a tradition of copying/pasting reports of events I attend for the community to be able to read them. As always, they are organized as a mix of (personal) thoughts that, as such, are always questionable, as well as raw notes that I took during the keynotes and breakout sessions.

You can find/read previous reports at these links:

Note some of these reports have private comments meant to be internal considerations to be shared with my team. These comments are removed before posting the blog publicly and replaced with <Redacted comments>.

Have a good read. Hopefully, you will find this small “give back” to the community helpful.


Massimo Re Ferré, CNA BU – VMware

Hashidays London report.  

London – June 12th 2017

Executive summary and general comments

This was in general a good full-day event. My gut feeling is that the audience was fairly technical, which mapped well to the spirit of the event (and of HashiCorp in general).

There were nuggets of marketing messages spread primarily by Mitchell H (e.g. “provision, secure, connect and run any infrastructure for any application”) but these messages seemed a little bit artificial and bolted on. HashiCorp remains (to me) a very engineering-focused organization whose products market themselves in an (apparently) growing and loyal community of users.

There were very few mentions of Docker and Kubernetes compared to other similar conferences. While this may be due to my personal bias (I tend to attend more containers-focused conferences as of late), I found it interesting that more time was spent talking about HashiCorp's view on Serverless than about containers and Docker.

The HashiCorp approach to intercepting the container trend seems interesting. Nomad seems to be the product they are pushing as an answer to the likes of Docker Swarm / Docker EE and Kubernetes. Yet Nomad is a general-purpose scheduler which (almost incidentally) supports Docker containers. However, a lot of the advanced networking and storage workflows available in Kubernetes and in the Docker Swarm/EE stack aren't apparently available in Nomad.

One of the biggest tenets of HashiCorp's strategy is, obviously, multi-cloud. They tend to compete with specific technologies from specific cloud providers (that only work in said cloud), so the notion of cloud-agnostic technologies that work seamlessly across different public clouds is something they leverage (a ton).

Terraform seemed to be the star product in terms of highlights and number of sessions. Packer and Vagrant were hardly mentioned outside of the keynote, with Vault, Nomad and Consul sharing the remainder of the time almost equally.

In terms of the back-end services and infrastructures they tend to work with (or that their customers tend to end up on), I would say that the event was 100% centered around public cloud. <Redacted comments>.

All examples, talks, demos, customers’ scenarios etc. etc. were focused on public cloud consumption. If I have to guess a share of “sentiment” I’d say AWS gets a good 70% with GCP another 20% and Azure 10%. These are not hard data, just gut feelings.

The monetization strategy for HashiCorp remains (IMO) an interesting challenge. A lot (all?) of the talks from customers were based on scenarios where they were using standard open source components. Some of them specifically prided themselves on having built everything using free open source software. There was a mention that at some point one specific customer would buy Enterprise licenses, but the way it was phrased led me to think this was meant as a give-back to HashiCorp (to which they owe a lot) rather than driven by specific technical needs for the Enterprise version of the software.

That said, there is no doubt HashiCorp is doing amazing things technologically and their work is super well respected.

In the next few sections there are some raw notes I took during the various speeches throughout the day.

Opening Keynote (Mitchell Hashimoto)  

The HashiCorp User Group in London is the largest (1300 people) in the world.

HashiCorp strategy is to … Provision (Vagrant, Packer, Terraform), secure (Vault), connect (Consul) and run (Nomad) any infrastructure for any application.

In the last few years, lots of Enterprise features found their way into many of the above products.

The theme for Consul has been “easing the management of Consul at scale”. The family of Autopilot features is an example of that (a set of features that allows Consul to manage itself). Some of these features are only available in the Enterprise version of Consul.

The theme for Vault has been to broaden the feature set. Replication across data centers is one such feature (achieved via log shipping).

Nomad is being adopted by the largest companies first (a very different pattern compared to the other HashiCorp tools). The focus recently has been on solving some interesting problems that surface within these large organizations. One such advancement is Dispatch (HashiCorp's interpretation of Serverless). You can now also run Spark jobs on Nomad.

The theme for Terraform has been to improve platform support. To achieve this, HashiCorp is splitting the Terraform core product from the providers (managing the community of contributors will be easier with this model). Terraform will download providers dynamically, but they will be developed and distributed separately from the Terraform core code. In the next version, you can also version the providers and require a specific provider version in a specific Terraform plan. “terraform init” will download the providers.
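To make the provider split concrete, here is a minimal sketch of what version-pinning a provider looks like (0.10-era syntax, as announced around this event; the region value is just an example):

```hcl
# The AWS provider plugin is no longer part of Terraform core.
# Running "terraform init" downloads the plugin matching this constraint.
provider "aws" {
  version = "~> 1.0"     # pin a provider version range per plan
  region  = "eu-west-1"
}
```

This decoupling is what lets provider releases (like the DigitalOcean example below) ship on their own cadence, independent of Terraform core releases.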

Mitchell brings up the example of the DigitalOcean firewall feature. They didn't know it was coming, but 6 hours after the DO announcement they received a PR from DO that implemented all the firewall features in the DO provider (these situations are much easier to manage when community members contribute to provider modules that are not part of the core Terraform code base).

Modern Secret Management with Vault (Jeff Mitchell) 

Vault is not just an encrypted key/value store. For example, generating and managing certificates is something that Vault is proving to be very good at.

One of the key Vault features is that it provides multiple (security related) services fronted with a single API and consistent authn/authz/audit model.

Jeff talks about the concept of “Secure Introduction” (i.e. how you enable a client/consumer with a security key in the first place). There is no one size fits all. It varies and depends on your situation, infrastructure you use, what you trust and don’t trust etc. etc. This also varies if you are using bare metal, VMs, containers, public cloud, etc. as every one of these models has its own facilities to enable “secure introduction”.

Jeff then talks about a few scenarios where you could leverage Vault to secure client to app communication, app to app communication, app to DB communications and how to encrypt databases.
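As one concrete flavor of the database-encryption scenario Jeff describes, Vault's transit secrets engine provides encryption as a service. A command sketch (the mount path is the default and the key name is hypothetical):

```shell
# Enable the transit secrets engine (encryption as a service).
vault secrets enable transit

# Create a named encryption key; the key material never leaves Vault.
vault write -f transit/keys/orders-db

# The app sends base64-encoded plaintext and stores only the returned ciphertext.
vault write transit/encrypt/orders-db plaintext="$(echo -n 'sensitive row' | base64)"
```

Decryption goes through the matching `transit/decrypt/` endpoint, so every operation is governed by the same authn/authz/audit model Jeff mentioned above.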

Going multi-cloud with Terraform and Nomad (Paddy Foran) 

The message of the session focuses on multi-cloud. Some of the reasons to choose multi-cloud are resiliency and consuming cloud-specific features (which I read as counter-intuitive to the idea of multi-cloud?).

Terraform provisions infrastructure. Terraform is declarative, graph-based (it will sort out dependencies), predictable and API agnostic.

Nomad schedules apps on infrastructure. Nomad is declarative, scalable, predictable and infrastructure agnostic.

Paddy shows a demo of Terraform / Nomad across AWS and GCP. He explains how you can take outputs of the AWS plan and use them as inputs to the GCP plan and vice versa. This is useful when you need to set up VPN connections between two different clouds and you want to avoid lots of manual configuration (which may be error-prone).
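The output-to-input wiring Paddy demoed can be sketched with a remote state data source (modern Terraform syntax; bucket, key and resource names are hypothetical, and most required tunnel arguments are omitted for brevity):

```hcl
# In the AWS plan: expose an address the GCP plan will need.
output "vpn_gateway_ip" {
  value = aws_vpn_connection.to_gcp.tunnel1_address
}

# In the GCP plan: read that output from the AWS plan's state
# instead of copy/pasting values by hand (and risking typos).
data "terraform_remote_state" "aws" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "aws/terraform.tfstate"
    region = "eu-west-1"
  }
}

resource "google_compute_vpn_tunnel" "to_aws" {
  name    = "to-aws"
  peer_ip = data.terraform_remote_state.aws.outputs.vpn_gateway_ip
  # (other required tunnel arguments omitted for brevity)
}
```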

Paddy then customizes the standard example.nomad task to deploy on the “datacenters” he created with Terraform (on AWS and GCP). This will instantiate a Redis Docker image.
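The customization boils down to the `datacenters` list in the job file. A trimmed-down sketch of such a job (datacenter names are hypothetical; the rest mirrors the stock Redis example):

```hcl
job "redis" {
  # The only real change from the stock example: schedule across the
  # two Terraform-built environments on AWS and GCP.
  datacenters = ["aws-us-east-1", "gcp-us-central1"]

  group "cache" {
    task "redis" {
      driver = "docker"     # Nomad's Docker driver pulls and runs the image
      config {
        image = "redis:3.2"
      }
      resources {
        cpu    = 500        # MHz
        memory = 256        # MB
      }
    }
  }
}
```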

The closing remark of the session is that agnostic tools should be the foundation for multi-cloud.

Running Consul at Massive Scale (James Phillips) 

James goes through some fundamental capabilities of Consul (DNS, monitoring, K/V store, etc.).

He then talks about how they have been able to solve scaling problems using a Gossip Protocol.

It was a very good and technical session arguably targeted to existing Consul users/customers that wanted to fine tune their Consul deployments at scale.

Nomad and Next-generation Application Architectures  (Armon Dadgar)

Armon starts by defining the role of a scheduler (broadly).

There are a couple of roles that HashiCorp kept in mind when building Nomad: developers (or Nomad consumers) and infrastructure teams (or Nomad operators).

Similarly to Terraform, Nomad is declarative (not imperative): you declare the desired state and Nomad figures out how to get there without you telling it how.

The goal for Nomad was never to build an end-to-end platform but rather to build a tool that would do the scheduling and bring in other HashiCorp (or third party) tools to compose a platform. This after all has always been the HashiCorp spirit of building a single tool that solves a particular problem.

Monolithic applications have intrinsic application complexity. Micro-services applications have intrinsic operational complexity. Frameworks have helped with monoliths much like schedulers are helping now with micro-services.

Schedulers introduce abstractions that help with service composition.

Armon talks about the “Dispatch” jobs in Nomad (HashiCorp’s FaaS).

Evolving Your Infrastructure with Terraform (Nicki Watt)

Nicki is the CTO @ OpenCredo.

There is no right or wrong way of doing things with Terraform. It really depends on your situation and scenario.

The first example Nicki talks about is a customer that has used Terraform to deploy infrastructure on AWS to setup Kubernetes.

She walks through the various stages of maturity that customers find themselves in. They usually start with hard coded values inside a single configuration file. Then they start using variables and applying them to parametrized configuration files.

Customers then move on to a pattern where you usually have a main Terraform configuration file composed of reusable and composable modules.

Each module should have very clearly identified inputs and outputs.
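A minimal sketch of what "clearly identified inputs and outputs" looks like in practice (file paths, module name and resource are illustrative, not from the talk):

```hcl
# modules/network/variables.tf — the module's explicit inputs
variable "cidr_block" {
  description = "CIDR range for the VPC"
}

# modules/network/main.tf
resource "aws_vpc" "main" {
  cidr_block = var.cidr_block
}

# modules/network/outputs.tf — the module's explicit outputs
output "vpc_id" {
  value = aws_vpc.main.id
}

# main.tf — the top-level configuration composes reusable modules
module "network" {
  source     = "./modules/network"
  cidr_block = "10.0.0.0/16"
}
```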

The next phase is nested modules (base modules embedded into logical modules).

The last phase is to treat subcomponents of the setup (i.e. Core Infra, RDS, K8s cluster) as totally independent modules. This way you manage these components independently hence limiting the possibility of making a change (e.g. in a variable) that can affect the entire setup.

Now that you have moved to this “distributed” stage of independent components and modules, you need to orchestrate what needs to be run first and so on. Different people solve this problem in different ways: from README files that guide you through the manual steps, through off-the-shelf tools such as Jenkins, all the way to DIY orchestration tools.

This was really an awesome session! Very practical and very down to earth!

Operational Maturity with HashiCorp (James Rasell and Iain Gray)

This is a customer talk.

Elsevier has an AWS-first strategy. They have roughly 40 development teams (each with 2-3 AWS accounts, and each account has 1-6 VPCs).

This is very hard to manage manually at this scale. Elsevier has established a practice inside the company to streamline and optimize these infrastructure deployments (they call this practice “operational maturity”). This is the charter of the Core Engineering team.

The “operational maturity” team has 5 pillars:

  • Infrastructure governance (base infrastructure consistency across all accounts). They have achieved this via a modular Terraform approach (essentially a catalog of company standard TF modules developers re-use).
  • Release deployment governance
  • Configuration management (everything is under source control)
  • Security governance (“AMI bakery” that produces secured AMIs and make it available to developers)
  • Health monitoring

They chose Terraform because:

  • it had a low barrier to entry
  • it was cloud agnostic
  • it codified infrastructure with version control

Elsevier suggests that in the future they may want to use Terraform Enterprise, which underlines the difficulty of monetizing open source software: they are apparently extracting a great deal of value from Terraform, but HashiCorp is making 0 out of it.

Code to Current Account: a Tour of the Monzo Infrastructure (Simon Vans-Colina and Matt Heath)

Enough said. They are running (almost) entirely on free software (with the exception of a small system that allows communications among banks). I assume this implies they are not using any HashiCorp Enterprise pay-for products.

Monzo went through some technology “trial and fail” such as:

  • from Mesos to Kubernetes
  • from RabbitMQ to Linkerd
  • from AWS Cloud Formation to Terraform

They currently have roughly 250 services, which all communicate with each other over HTTP.

They use Linkerd for inter-service communication. Matt suggests that Linkerd integrates with Consul (if you use Consul).

They found they had to integrate with some banking systems (e.g. faster payments) via on-prem infrastructure (Matt: “these services do not provide an API, they rather provide a physical fiber connection”). They appear to be using on-prem capacity mostly as a proxy into AWS.

Terraform Introduction training

The day after the event I attended the one-day “Terraform Introduction” training. This was a mix of lecture and practical exercises. The mix was fair and overall the training wasn't bad (albeit some of the lecture material was very basic and redundant with what I already knew about Terraform).

The practical side of it guides you through deploying instances on AWS, using modules, variables and Terraform Enterprise towards the end.

I would advise taking this specific training only if you are very new to Terraform, given that it assumes you know nothing. If you have already used Terraform in one way or another, it may be too basic for you.



Kubecon 2017 – Berlin: a personal report Mon, 03 Apr 2017 13:48:58 +0000 Following the established best practice of ‘open sourcing’ my trip reports of conferences I attend, I am copying and pasting hereafter my raw comments related to the recent Kubecon EMEA 2017 trip.


I have done something similar for Dockercon 2016 and Serverlessconf 2016 last year and, given the feedback I received, this is apparently something worthwhile.

As always:

  1. these reports contain some facts (hopefully I got those right) plus personal opinions and interpretations.
  2. if you are looking for a properly written piece of art, this is not it.
  3. these are, for the most part, raw notes (sometimes written lying on the floor during those oversubscribed sessions at the conference)

Take it or leave it.


Massimo Re Ferre’ – Technical Product Manager – CNA Business Unit @ VMware

Kubecon Europe 2017 – Report

Executive Summary

In general, the event was largely what I expected. We are in the early (but consistent) stages of Kubernetes adoption. The ball is still (mainly) in the hands of geeks (gut feeling: more devs than ops). While there are pockets of efforts on the Internet to help the uninitiated get started with Kubernetes, the truth is that there is still a steep learning curve you have to go through pretty much solo. Kubecon 2017 Europe was an example of this approach: you don't go there to learn Kubernetes from scratch (e.g. there are no 101 introductory sessions). You go there because you know Kubernetes (already) and you want to move the needle by listening to someone else's (advanced) experiences. In this sense Dockercon is (vastly) different compared to Kubecon. The former appears to be more of a “VMworld mini-me” at this point, while the latter is still more of a “meetup on steroids”.

All in all, the enthusiasm and the tail winds behind the project are very clear. While the jury is still out re who is going to win in this space, the odds are high that Kubernetes will be a key driving force and a technology that is going to stick around. Among all the projects at that level of the stack, Kubernetes is clearly the one with the most mind-share.

These are some additional (personal) core takeaways from the conference:

  • K8s appears to be extremely successful with startups and small organizations as well as in pockets of Enterprises. The technology has not been industrialized to the point where it has become a strategic choice (not yet at least). Because of this, the prominent deployment model seems to be “you deploy it, you own it, you consume it”. Hence RBAC, multi-tenancy and security haven’t been major concerns. We are at a stage, though, where in large Enterprises these teams that own the deployment are seeking IT help and support in running Kubernetes for them.
  • The cloud native landscape is becoming messier and messier. The CNCF Landscape slide is making a disservice to the cloud native beginners. It doesn’t serve any other purpose than officially underline the complexity of this domain. While I am probably missing something about the strategy here, I am puzzled how the CNCF is creating category A and category B projects by listing hundreds of projects in the landscape but only selecting a small subset to be part of the CNCF.
  • This is a total gut feeling (I have no data to back this up) but up until 18/24 months ago I would have said the container orchestration/management battle was among Kubernetes, Mesos and Docker. Fast forward to these days, it is my impression that Mesos is fading out a bit. These days the industry seems to be consolidating around two major centers of gravity: one is Kubernetes and its ecosystem of distributions, the other being Docker (Inc.) and their half proprietary stack (Swarm and UCP). More precisely there seems to be a consensus that Docker is a better fit and getting traction for those projects that cannot handle the Kubernetes complexity (and consider K8s being a bazooka to shoot a fly) while Kubernetes is a better fit and getting traction for those projects that can absorb the Kubernetes complexity (and probably requires some of its advanced features). In this context Mesos seems to be in search of its own differentiating niche (possibly around big data?).
  • The open source and public cloud trends are so pervasive in this domain of the industry that they are also changing some of the competitive and positioning dynamics. While in the last 10/15 years the ‘lock-in’ argument was around ‘proprietary software’ Vs. ‘open source software’, right now the ‘lock-in’ argument seems to be around ‘proprietary public cloud services’ Vs. ‘open source software’. Proprietary software doesn’t even seem to be considered a contender in this domain. Instead, its evil role has been assumed by the ‘proprietary cloud services’. According to the CNCF, the only way you can fight this next level of lock-in is through (open source) software that you have the freedom to instantiate on-prem or off-prem at your will (basically de-coupling the added-value services from the infrastructure). This concept was particularly clear in Alexis Richardson’s keynote.


The Expo was pretty standard and what you’d expect to see. Dominant areas of ecosystem seem to be:

  • Kubernetes setup / lifecycle (to testify that this is a hot/challenging area)
  • Networking
  • Monitoring

My feeling is that storage seems to be “under represented” (especially considering the interest/buzz around stateful workloads). There were not a lot of startups representing this sub-domain.

Monitoring, on the other hand, seems to be ‘a thing’. Sematext and Sysdig (to name a few) have interesting offerings and solutions in this area. ‘We have a SaaS version and an on-prem version if you want it’ is the standard delivery model for these tools. Apparently.

One thing that stood out to me was Microsoft’s low profile at the conference (particularly compared to their presence at, say, Dockercon). There shouldn’t be a reason why they wouldn’t want to go big on Kubernetes (too).

Keynote (Wednesday)

There are circa 1,500 attendees at the conference. Given the polls during the various breakout sessions, the majority seem to be devs, with a minority of ops (of course the boundaries are a bit blurry in this new world).

The keynote opens up with the news (not news) that Containerd is joining the CNCF. RKT makes the CNCF too. Patrick C and Brandon P get on stage briefly to talk about, respectively, Containerd and RKT.

Aparna Sinha (PM at Google) gets on stage to talk about K8s 1.6 (just released). She talks about the areas of improvement (namely 5000-host support, RBAC, dynamic storage provisioning). One of the new (?) features in the scheduler allows for “taints” and “tolerations”, which may be useful to segment specific worker nodes for specific namespaces, e.g. dedicating nodes to tenants (this needs additional research).
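A minimal sketch of the taint/toleration mechanics, for my own future reference (node, key and pod names are made up):

```yaml
# Taint a node so that only pods tolerating the taint can land on it:
#   kubectl taint nodes node-1 tenant=acme:NoSchedule
# A tenant's pod then opts in with a matching toleration:
apiVersion: v1
kind: Pod
metadata:
  name: acme-app
spec:
  tolerations:
  - key: "tenant"
    operator: "Equal"
    value: "acme"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx
```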

Apparently RBAC has been contributed largely by Red Hat, something I have found interesting given the fact that this is an area where they try to differentiate with OpenShift.

Etcd version 3 gets a mention as having quite a big role in the K8s scalability enhancements (note: some customers I have been talking to are a bit concerned about how to [safely] migrate from etcd version 2 to etcd version 3).

Aparna then talks about disks. She suggests leveraging claims to decouple the K8s admin role (infrastructure aware) from the K8s user role (infrastructure agnostic). Dynamic storage provisioning is available out of the box and supports a set of back-end infrastructures (GCE, AWS, Azure, vSphere, Cinder). She finally alludes to some network policy capabilities being cooked up for the next version.
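The admin/user decoupling she describes can be sketched like this (GCE back end as an example; class and claim names are hypothetical):

```yaml
# The admin (infrastructure-aware) defines a StorageClass...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd   # other built-ins cover AWS, Azure, vSphere, Cinder
parameters:
  type: pd-ssd
---
# ...while the user (infrastructure-agnostic) only claims storage by class name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: fast
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```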

I will say that tracking where all (old and new) features sit on the spectrum of experimental, beta, supported (?) is not always easy. Sometimes a “new” feature is being talked about, just to find out that it has moved from one stage (e.g. experimental) to the next (i.e. beta).

Clayton Coleman from Red Hat talks about K8s security. Interestingly enough, when he polls about how many people stand up and consume their own Kubernetes cluster, a VAST majority of users raise their hands (assumption: very few are running one or a few centralized K8s instances that users access in multi-tenant mode). This is understandable given that RBAC has only just made it into the platform. Clayton mentions that security in these “personal” environments isn't as important, but as K8s starts to be deployed and managed by a central organization for users to consume, a clear definition of roles and proper access control will be of paramount importance. As a side note, with 1.6 cluster-up doesn't enable RBAC by default but Kubeadm does.

Brandon Philips from CoreOS is back on stage to demonstrate how you can leverage a Docker registry to not only push/pull Docker images but entire Helm apps (cool demo). Brandon suggests the standard and specs for doing this is currently being investigated and defined (hint: this is certainly an area that project Harbor should explore and possibly endorse).

Keynote (Thursday)

Alexis Richardson does a good job at defining what Cloud Native is and the associated patterns.

CNCF is “project first” (that is, they prefer to put forward actual projects than just focus on abstract standards -> they want to aggregate people around code, not standards).

Bold claim that all projects in the CNCF are interoperable.

Alexis stresses the concept of “cloud lock-in” (Vs. generic vendor lock-in). He implies that there are more people going to AWS/Azure/GCP to consume higher-level services (OSS operationalized by the CSP) than there are people using, and being locked in by, proprietary software.

Huawei talks about their internal use case. They are running 1M (one million!) VMs. They are on a path to reduce those VMs by introducing containers.

Joe Beda (CTO at Heptio) gets on stage. He talks about how to grow the user base 10x. Joe claims that K8s contributors are more concerned about theoretical distributed model problems than they are with solving simpler practical problems (quote: “for most users out there the structures/objects we had 2 or 3 versions ago are enough to solve the majority of the problems people have. We kept adding additional constructs/objects that are innovative but didn’t move the needle in user adoption”).

Joe makes an interesting comment about finding a good balance between solving product problems in the upstream project Vs solving them by wrapping specific features into K8s distributions (a practice he described as “building a business around the fact that K8s sucks”).

Kelsey Hightower talks about Cluster Federation. Cluster Federation is about federating different K8s clusters. The Federation API control plane is a special K8s client that coordinates dealing with multiple clusters.


These are some notes I took while attending breakout sessions. In some sessions, I could physically not step in (sometimes rooms were completely full). I skipped some of the breakouts as I opted to spend more time at the expo.


This session was presented by Docker (of course).

Containerd was born in 2015 to control/manage runC.

It is new in terms of project governance (but the code is “old”). It’s a core container runtime on top of which you could build a platform (Docker, K8s, Mesos, etc.)

The K8s integration will look like:

Kubelet –> CRI shim –> containerd –> containers

No (opinionated) networking support, no volumes support, no build support, no logging management support etc. etc.

Containerd uses gRPC and exposes gRPC APIs.

There is the expectation that you interact with containerd through the gRPC APIs (hence via a platform above). The containerd API is NOT expected to be a viable way for a standard user to deal with containerd. That is to say, containerd will not have a fully featured/supported CLI. It’s code to be used/integrated into higher-level code (e.g. Kubernetes, Docker etc.).

gRPC and container metrics are exposed via Prometheus end-point.

Full Windows support is in plan (not yet into the repo as of today).

Speaker (Justin Cormack) mentions VMware a couple of times as an example of an implementation that can replace containerd with a different runtime (i.e. VIC Engine).

Happy to report that my containerd blog post was fairly accurate (albeit it did not go into much detail).

Kubernetes Day 2 (Cluster Operations)

Presented by Brandon Philips (CoreOS CTO). Brandon's sessions are always very dense and useful. Never a dull moment with him on stage.

This session covered some best practices to manage Kubernetes clusters. What stood out for me in this preso was the mechanism Tectonic uses to manage the deployment: fundamentally CoreOS deploys Kubernetes components as containers and let Kubernetes manage those containers (basically letting Kubernetes manage itself). This way Tectonic can take advantage of K8s own features from keeping the control plane up and running all the way to rolling upgrades of the API/scheduler.


This session was presented by two of the engineers responsible for the project. The session was pretty full and roughly 80% of attendees claimed to be using K8s in production (wow). Helm is a package manager for Kubernetes. Helm Charts are logical units of K8s resources + variables (note to self: research the differences between “OpenShift applications” and “Helm charts” <- they appear to be the same thing [or at least are being defined similarly]).
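To anchor the "package manager" framing, the Helm workflow (Helm 2, current at the time) boils down to a handful of commands; the chart and value names below are illustrative:

```shell
# Refresh the index of charts from the configured repositories.
helm repo update

# Find a chart (a packaged set of K8s resource templates + default values).
helm search redis

# Install it as a "release", overriding a chart value at install time.
helm install stable/redis --set persistence.enabled=true

# List the releases (installed chart instances) in the cluster.
helm ls
```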

There is a mention of a front-end UI to Monocular.

The aim of the session was to seed the audience with new use cases for Helm that aspire to go beyond the mere packaging and interactive setup of a multi-container app.


The attendance was low. The event population being skewed towards developers tend to greatly penalize sessions that are skewed towards solutions aimed to solve primarily Ops problems.

Their value prop (at the very high level) is similar to vSphere Integrated Containers, or Intel Clear Containers for that matter: run Docker images as virtual machines (as opposed to containers). Hyper pride themselves on being hypervisor agnostic.

They claim a sub-second start time (similar to Clear Containers). Note: while the VIC value prop isn’t purely about being able to start containerVMs fast, tuning for that characteristic will help (half-joke: I would be more comfortable running VIC in production than showing a live demo of it at a meetup).

The most notable thing about Hyper is that it’s CRI compliant and it naturally fits/integrates into the K8s model as a pluggable container engine.

]]> 4 959