Best Practices for Java testing with JUnit

JUnit is a popular testing library for Java applications and I extensively used it when working at Amazon for the numerous Java applications and services there. However, I came across a number of different anti-patterns and areas to improve the quality of the test code. This post introduces many of the different tricks and patterns that I’ve learned and shared with my coworkers, and now want to share

Another library to know and reference is Mockito, which I use extensively in JUnit test cases and will reference this too below.

These are all real things that I’ve seen developers do.

Continue reading “Best Practices for Java testing with JUnit”

How to build a useful service data change audit log

If you’ve got a service that provides clients with the ability to make changes to those entities, then you probably want an audit log that tracks who makes what changes.

I decided to write this post because I frequently saw teams at Amazon not thinking through these considerations. Some of the guidance does focus on AWS IAM, but a lot of it is practical for any type of audit log.

Important aspects to an audit log:

  • Who made the change?
  • When did they make the change?
  • Where did they make the change?
  • What did they do?
Continue reading “How to build a useful service data change audit log”

Best Practices for working with Google Guice

Google Guice is a dependency injection library for Java and I frequently used it on a number of Java services. Compared to Spring, I liked how simple and narrow focused on just dependency injection it was. However, I often times saw developers using it in incorrect or non-ideal patterns that increased boilerplate or were just wrong.

These are all recommendations that I’ve accumulated over several years at working at Amazon watching engineers and sometimes myself improperly leverage Google Guice.

Continue reading “Best Practices for working with Google Guice”

Defensive Coding: Stop using your storage models everywhere

How to make your system robust against your worst nightmare–your future self

In this post, I talk about some strategies that I’ve learned to simplify class structures in Java services that load and persist data into data stores like DynamoDB or RDS at the same time making the codebase safer.

As always, my opinions are my own.

At Amazon, I ended up joining two teams that were suffering under the technical debt. Each time, I was asked to spend some time understanding why the products were unstable and users were encountering frequent bugs. In one system, responsible for managing critical metadata about products in the catalog, was experiencing problems where users were reporting that they’d randomly lose data.

A service that was losing client data is a terrible service and caused users to lose trust in this system. Note that some details of this story have been modified for confidentiality reasons. Let’s dive in.

Continue reading “Defensive Coding: Stop using your storage models everywhere”

Best Practices for Elasticsearch mappings

At first, Elasticsearch may appear to be schemaless since you can add new fields any time you want, but every field in a document must match the mapping.

Dynamic Templates reduce boilerplate

How many times have you opened up a mapping file to something like this where the same type definition is repeated over and over again?

{
  "properties": {
    "foo": {
      "type": "keyword"
    },
    "foo": {
      "type": "keyword"
    },
    "foo": {
      "type": "keyword"
    },
    "baz": {
      "type": "keyword"
    },
    "other": {
      "type": "text"
    },
    ...
  }
}

It’s super easy to refactor this into an alternative where by default all string values are mapped as keyword, except for the specific field listed as “text”.

{
  "properties": {
    "dynamic_templates": [
      {
        "example_name": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ],
    "other": {
      "type": "text"
    }
  }
}

Disable type detection

For new fields, Elasticsearch can automatically identify what type to use, but it can be wrong or do unexpected things. For example, I’ve seen Elasticsearch accidentally identify a decimal value as a long because the first value to go into the index did not have any decimal points. Then all other documents failed to be indexed because they did not match. This is especially important if you have fields that have a wide range of values (for example, user controlled) because you can’t predict if the first value is going to look like a number or a date, when it should always be considered to be a string.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html

{
  "mappings": {
    "date_detection": false,
    "numeric_detection": false
  }
}