A precompiled almost-HAML engine in C#

Introduction

This project is still a work in progress, so this article serves as an introduction to the problem space and walks through how the code works.

In the past, when I wrote web applications, I used Ruby on Rails combined with the HAML template language. HAML is my favorite way to write HTML because it is an abstract representation of an HTML DOM combined with a hint of Python-style syntax.

Being an abstract representation means that it doesn't have to directly correspond to what the resulting HTML looks like. This decoupling lets a HAML render engine reorganize the output to be cleaner and simpler.

Take a look at the following example:

%html{ lang: 'en' }
  %head
    %title Hello world!
  %body
    %a{ href: 'https://technowizardry.net' }= my_link
    %div{ b: 'abc', a: 'xyz' } test

In other template engines, like Ruby's ERB or C#'s Razor, whitespace is preserved: whatever indentation you add is included in the output HTML.

<html lang="en">
  <head>
    <title>Hello world!</title>
  </head>
  <body>
    <a href="https://technowizardry.net">Test</a>
    <div b="abc" a="xyz">test</div>
  </body>
</html>

Indentation can be handy during development, but why waste the space in production? You could delete all the spacing in the source code and check that in, but then your code is harder to read. Can we have the best of both worlds?

The current state of the world

I've been experimenting a lot with the new .NET Core framework because I like the framework and C# as a language. Unfortunately, HAML isn't directly supported; the default render engine in ASP.NET MVC, Razor, is a fairly low-level HTML renderer that has the same problems highlighted above.

Instead, I wanted to see whether I could build my own solution and how far I could push it with performance optimizations. Can we precompile the template into partial HTML streams? Can we optimize the HTML to be more Gzip-friendly? For example, <a class="foo bar" /> and <a class="bar foo" /> are semantically equivalent, so the classes can be ordered consistently, letting Gzip compress them more efficiently.
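As a rough sketch of that idea (my own illustration, not the engine's actual code), class lists can be normalized by sorting the class names before the attribute is emitted:

using System;
using System.Linq;

internal static class AttributeNormalizer
{
    // Sorts the whitespace-separated class names so that semantically
    // equivalent class lists always serialize to the same string, which
    // gives Gzip longer repeated runs to compress across the document.
    public static string NormalizeClassList(string classAttribute)
    {
        var classes = classAttribute
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .OrderBy(c => c, StringComparer.Ordinal);

        return string.Join(" ", classes);
    }
}

With this, NormalizeClassList("foo bar") and NormalizeClassList("bar foo") both return "bar foo", so the two anchors above serialize identically.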

Fair warning, this will be prototype code and not ready for production quite yet.

Adding C# to the mix

I found a previous attempt at this called NHaml. Quite a bit of work had gone into it, but it did not support .NET Core and seemed tightly coupled to ASP.NET. I ended up borrowing the parsing logic (with modifications) and writing my own rendering engine.

But first, let’s see some results:

!!!
%html{ lang: 'en' }
  %head
    %title Hello world
    %meta{ charset: 'utf-8' }
    %meta{ content: 'width=device-width, initial-scale=1.0, maximum-scale=1.0', name: 'viewport' }
  %body
    .page-wrap{ class: DateTime.Now.ToString("yyyy"), d: 'bar', a: 'foo' }
      = DateTime.Now.ToString("yyyy-mm-dd")
      %h1= new Random().Next().ToString()
      %p= model.ToString()
      .content-pane.container
      - if (true)
        - if (1 > 0)
          %div really true
        %div Is True
      - else
        %div wat
      - if (false)
        %div Is False
    .modal-backdrop.in

This gets compiled into the following class, which is cached and reused for subsequent renders.

using System;
using System.IO;

internal sealed class __haml_UserCode_CompilationTarget
{
	private string model;

	public __haml_UserCode_CompilationTarget(string _modelType)
	{
		this.model = _modelType;
	}

	public void render(TextWriter textWriter)
	{
		textWriter.Write("<!DOCTYPE html><html lang=\"en\"><head><title>Hello world</title><meta charset=\"utf-8\"/><meta content=\"width=device-width, initial-scale=1.0, maximum-scale=1.0\" name=\"viewport\"/></head><body><div a=\"foo\" class=\"page-wrap ");
		textWriter.Write(DateTime.Now.ToString("yyyy"));
		textWriter.Write("\" d=\"bar\">");
		textWriter.Write(DateTime.Now.ToString("yyyy-mm-dd"));
		textWriter.Write("<h1>");
		textWriter.Write(new Random().Next().ToString());
		textWriter.Write("</h1><p>");
		textWriter.Write(this.model.ToString());
		textWriter.Write("</p><div class=\"container content-pane\"/>");
		textWriter.Write("<div>really true</div>");
		textWriter.Write("<div>Is True</div>");
		textWriter.Write("</div><div class=\"in modal-backdrop\"/></body></html>");
	}
}

Note how the runs of HTML that never change are collapsed into static strings, and how all elements are normalized consistently.
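A minimal sketch of how that kind of coalescing could work (using a hypothetical fragment type of my own, not the project's actual model): walk the output fragments in order and merge adjacent literals into a single string, so the generated render method emits one Write call per static run.

using System.Collections.Generic;
using System.Text;

// Hypothetical fragment model: a fragment is either a literal HTML string
// or a piece of C# code whose result is written at render time.
internal sealed class Fragment
{
    public bool IsLiteral { get; }
    public string Text { get; }

    public Fragment(bool isLiteral, string text)
    {
        IsLiteral = isLiteral;
        Text = text;
    }
}

internal static class FragmentCoalescer
{
    // Merges adjacent literal fragments so that static HTML between dynamic
    // expressions becomes one string instead of one Write call per node.
    public static List<Fragment> Coalesce(IEnumerable<Fragment> fragments)
    {
        var result = new List<Fragment>();
        var buffer = new StringBuilder();

        foreach (var fragment in fragments)
        {
            if (fragment.IsLiteral)
            {
                buffer.Append(fragment.Text);   // keep growing the static run
            }
            else
            {
                if (buffer.Length > 0)          // close off the current static run
                {
                    result.Add(new Fragment(true, buffer.ToString()));
                    buffer.Clear();
                }
                result.Add(fragment);           // dynamic code passes through as-is
            }
        }

        if (buffer.Length > 0)
        {
            result.Add(new Fragment(true, buffer.ToString()));
        }

        return result;
    }
}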

A walk through the code

The HamlView class is the effective entry point. It checks whether it already has a compiled copy of the template cached in memory; if not, it requests a compilation.
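In rough terms the flow looks like the sketch below. This is a hedged illustration: the cache key, the CompiledTemplate and TemplateCompiler types, and the method names are my own placeholders, not necessarily what the project uses.

using System.Collections.Concurrent;
using System.IO;

internal sealed class HamlView
{
    // Compiled templates cached by template path (placeholder cache key).
    private static readonly ConcurrentDictionary<string, CompiledTemplate> _cache =
        new ConcurrentDictionary<string, CompiledTemplate>();

    private readonly string _templatePath;

    public HamlView(string templatePath) => _templatePath = templatePath;

    public void Render(TextWriter writer, object model)
    {
        // Reuse the compiled template if this file has been seen before;
        // otherwise parse and compile it once, then cache the result.
        var template = _cache.GetOrAdd(_templatePath, path =>
        {
            var source = File.ReadAllText(path);
            return TemplateCompiler.Compile(source);   // hypothetical compiler entry point
        });

        template.Render(writer, model);
    }
}

// Placeholder types standing in for the real compilation output.
internal sealed class CompiledTemplate
{
    public void Render(TextWriter writer, object model) { /* write the HTML */ }
}

internal static class TemplateCompiler
{
    public static CompiledTemplate Compile(string hamlSource) => new CompiledTemplate();
}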

[To be continued]

Best Practices for Elasticsearch mappings

At first, Elasticsearch may appear to be schemaless since you can add new fields at any time, but every field in a document must still match the mapping.

Dynamic Templates reduce boilerplate

How many times have you opened up a mapping file to find something like this, where the same type definition is repeated over and over again?

{
  "properties": {
    "foo": {
      "type": "keyword"
    },
    "foo": {
      "type": "keyword"
    },
    "foo": {
      "type": "keyword"
    },
    "baz": {
      "type": "keyword"
    },
    "other": {
      "type": "text"
    },
    ...
  }
}

It's easy to refactor this into an alternative where, by default, all string values are mapped as keyword, except for the specific fields explicitly listed as text.

{
  "dynamic_templates": [
    {
      "example_name": {
        "match_mapping_type": "string",
        "mapping": {
          "type": "keyword"
        }
      }
    }
  ],
  "properties": {
    "other": {
      "type": "text"
    }
  }
}

Disable type detection

For new fields, Elasticsearch can automatically identify which type to use, but it can guess wrong or do unexpected things. For example, I've seen Elasticsearch identify a decimal value as a long because the first value to go into the index happened to have no decimal point; every subsequent document then failed to index because it did not match. This is especially important for fields with a wide range of values (for example, user-controlled input), because you can't predict whether the first value will look like a number or a date when the field should always be treated as a string.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html

{
  "mappings": {
    "date_detection": false,
    "numeric_detection": false
  }
}

Query-level metrics for PostgreSQL/MySQL in Kubernetes with Packetbeat

MySQL and PostgreSQL can be a bit of a black box if you don't take the time to configure metrics. How do you identify which queries are slow and need to be optimized? MySQL has the slow query log, but that requires a time threshold and only logs queries that run for longer than N seconds. What if you want to identify the most common queries even if they are fast?

Continue reading “Query-level metrics for PostgreSQL/MySQL in Kubernetes with Packetbeat”

Migrate Sprockets to Webpacker with React-rails

Ruby on Rails recently launched support for compiling static assets, such as JavaScript, using Webpack. Among other things, Webpack is more powerful at JS compilation than the previous Rails default, Sprockets. Integration with Rails is provided by the Webpacker gem. The features I was most interested in were tree shaking and support for the NPM package registry. With Sprockets, common JS libraries such as ReactJS had to be imported using gems such as react-rails or classnames-rails, which added friction to adding new dependencies and upgrading to new versions of them.

A couple of my projects used react-rails to render React components on the server side using the legacy Sprockets pipeline. This worked well, but I wanted to migrate to Webpacker to make it easy to upgrade to the newest versions of React and React Bootstrap (previously I imported the latter using reactbootstrap-rails, which stopped being maintained after Webpacker launched). However, migrating React components to Webpack required changes to every single file: adding ES6-style imports, file moves/renames, and scoping changes. That would have been too large a change to make all at once. What if there were a way to slowly migrate the JS code from Sprockets to Webpack, making components on either side available to the other?

Continue reading “Migrate Sprockets to Webpacker with React-rails”

You don’t have enough static analysis

Introduction

Pretty much every programming language has tools that statically analyze your source code and detect problems, ranging from simple things like enforcing consistent casing for variable names in Java to ruthlessly enforcing method-length limits in Ruby. If you've ever used one of these tools, they may seem overbearing and not worth the hassle, but they quickly prove their value once your application becomes larger, has multiple developers, or is business critical and can't afford outages caused by trivial mistakes. Static analysis tools are a very low-cost way to improve the quality of a code base.

Continue reading “You don’t have enough static analysis”

Structured and auditable changes to infrastructure

Note: I'm going to use AWS services for most of my examples in this post, but that's only because I'm most familiar with them. The patterns below are not limited to AWS and can be applied to any cloud provider, or to self-hosted environments where similar primitives exist.

Introduction

Every service depends on some amount of supporting infrastructure: virtual servers (EC2 or otherwise), storage (e.g. S3, DynamoDB), load balancing, and so on. Basically, any resource your service uses that is not your direct business logic can be considered infrastructure. If you use continuous integration and change control for your business logic, why wouldn't you apply the same rules to your infrastructure?

Allowing and requiring developers to make changes through the UI introduces the risk that someone makes a mistake and brings down your production service. Continuing from my last post about infrastructure names, you could also make the same mistake in any of your regional clones.
Continue reading “Structured and auditable changes to infrastructure”

Vending Software Good Practices – Docker Security

Docker containers are the latest craze taking the world by storm. They give software vendors more control over how their software is executed, reducing the amount of work that the people hosting the software are responsible for. By shifting the burden of figuring out environment requirements onto the software vendor, critical decisions that improve security can be made once, and only once, and then distributed to end users. This lowers the cost barrier to more stable and secure software, since users no longer have to think about the intricacies of security and management, something users rarely take the time to invest in.

Docker containers have a number of different security mechanisms. I won't go into detail on all of them here; if you're interested in learning more, read the Docker security documentation page.

Capabilities

In the Linux kernel, each process has a set of capability flags that the kernel checks when the process makes certain privileged syscalls. Processes running as root automatically get certain capabilities assigned to them.

Some example capabilities:

  • CAP_NET_BIND_SERVICE – Enables a process to bind to ports below 1024. By default, non-root processes can't bind to these reserved ports; dropping this capability prevents even root processes from binding to them.
  • Many more are listed on the capabilities man page

According to the principle of least privilege, running with fewer capabilities reduces the attack surface of a given piece of software.

Docker compose.yml

Docker Compose files are a popular way to vend an entire service stack to users. With them, you can describe one or more Docker containers in a YAML-based format; more information is available in the official docs. A little-used feature lets you specify which capabilities your service requires.

For example, this is the configuration that I use for running NGINX on my server:

nginx:
  image: nginx:1.9.10
  cap_drop:
    - ALL
  cap_add:
    - CHOWN
    - DAC_OVERRIDE
    - NET_BIND_SERVICE
    - SETGID
    - SETUID

In this example, I replace the default capability list that Docker provides with an explicit whitelist of only the capabilities that are required. This list lets NGINX modify file permissions (for access logs), bind to ports 80 and 443, and change the process user account. The default whitelist is available in the Docker source code here. By doing this, we reduce the attack surface that a malicious actor can leverage.

Docker Compose is fully self-contained and doesn't require the user to make any changes to their environment to start using it. Compose files and capabilities are a low-cost way to start reducing the attack surface of an application. Every service owner should try running their application with --cap-drop ALL, selectively enable capabilities until the application works, and then vend that list as a best practice.

AppArmor/Security Profiles

Capabilities are a cheap way to start improving security, but they can only restrict a limited subset of kernel syscalls, making fine-grained security control impossible. This is where mandatory access control and AppArmor come in. On distributions that support it (such as Ubuntu), AppArmor is an opt-in security model that lets you whitelist and/or blacklist specific syscalls, along with the parameters of those syscalls. For example, you could configure a containerized application so that it can only open TCP connections to specific IP ranges and ports. Docker supports running containers with specific AppArmor profiles. While this requires more work on the user's side, security-conscious service vendors could vend an AppArmor profile along with their service for users to install. I plan to go into more detail on this in the future.

Conclusion

Anybody who builds a Docker container should leverage the security model Docker provides by running with the fewest privileges and capabilities needed, then include that configuration in the vended configuration, such as Docker Compose files. By doing this, all of your end users can take advantage of a reduced attack surface with only minimal effort on your side. Capabilities are in no way fool-proof, and one should not assume they make an application secure on their own, but they're better than nothing.

Dynamic AWS resource discovery for one-click region spin-ups

Disclaimer: At the time of this article’s writing, I work at Amazon, but not in AWS. This article is based on my own research and ideas and is not the official position of Amazon. This article is not intended as marketing material for AWS, only as some architectural patterns for you to use if you do leverage AWS.

AWS provides a number of different resources you can use to build services, including S3 buckets, SQS queues, etc. When you create a new instance of a resource, you must pick a name, which usually has to be unique within a given namespace. Depending on your naming scheme, you may also end up embedding resource names in code or configuration files. This makes spinning up new regions difficult, because you now have to update the configuration with names for every stage/region you might use. That may not seem like a big deal, but consider that you may have tens of different SQS queues, S3 buckets, etc. for each region/stage. This begins to combinatorially explode: you now have (# regions × # stages × # resources) different configuration entries, which results in a lot of boilerplate.

But what if there was a better way?

Continue reading “Dynamic AWS resource discovery for one-click region spin-ups”

Fast development environments

Setting up new hosts entries for every website you develop is tedious. This workflow lets you completely automate it. The first thing you'll want to do is set up a wildcard DNS record that points to your host. This allows you to dynamically set up new development websites without having to create a new DNS record for each one. I created a fake, internal-only TLD on my local network's DNS server that automatically returns the IP address of my development VM for any query to *.devvm. If you don't have access to that, you could reuse an actual domain and automatically forward something like *.dev.technowizardry.net to the VM. For example, I have an ASUS RT-AC68U router on my personal network, so I SSH'd to the router, typed vi /etc/dnsmasq.conf, then appended:

Continue reading “Fast development environments”