Meaningful Logging and Metrics

Meaningful Logging and Metrics

Pretty much any application needs logging. Except maybe the simplest of applications that do not require user interaction and only do one incredibly specific thing. Or applications that only interact via the console. Pretty much any other application needs logging, though. When things are on fire, you need to be able to see what has happened. Maybe it’s the trigger to let you know that something is on fire.

All too often, though, the code that does the logging looks horrible.

_logger.Info("Created the order");

That’s just peachy. Which order? For what customer? What does it contain? This does not help me at all when trying to pin down the cause of an issue (or even just the issue itself). This also extends to error reporting, by the way. It’s not very helpful to see a message like this in your logs:

2017-05-18T16:42:15Z [Critical] An error occurred: System.InvalidOperationException: Sequence contains no elements
  at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)
  at EnterpriseySystem.Common.BetterThanEntityFramework.DoTheQuery[T](MyQueryable`1 query)
  at EnterpriseySystem.DataLayer.EverythingRepository.SaveChanges()
  at EnterpriseySystem.Web.MagicUnitOfWorkFilter.OnActionExecuted(ActionExecutedContext context)

Now, of course, this is exaggerated a bit. Nevertheless, the point should come across. About the only useful factoids you can get from this message are that ‘something went wrong before, during or after the execution of a query’ and ‘it’s something related to saving changes’. It doesn’t tell you what sort of thing it was saving, what the identity of that thing was, or even what the request URL was.

So, let’s add some properties to our logging.

Properties

Properties start to give us some context and can make our lives a lot easier.

2017-05-18T16:42:15Z [Critical] An error occurred gathering changes preparing to create an order for customer 376572, containing products [ 2352, 23422, 9482 ]: System.InvalidOperationException: Sequence contains no elements
  at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)
  at EnterpriseySystem.Common.BetterThanEntityFramework.DoTheQuery[T](MyQueryable`1 query)
  at EnterpriseySystem.DataLayer.EverythingRepository.SaveChanges()
  at EnterpriseySystem.Web.MagicUnitOfWorkFilter.OnActionExecuted(ActionExecutedContext context)

Now this is something useful. We know the system is trying to create a new order, and which customer it is for, and which products it contains. We also know it’s in the part of the ORM that is gathering changes. This allows us to very quickly focus our search for the issue.

However, think about what a dumpster fire the logging code for this will become.

_logger.Critical(
  ex, 
  "An error occurred gathering changes preparing to create an order " + 
  "for customer {0}, containing products [ {1} ]",
  customer.CustomerId,
  string.Join(", ", order.Lines.Select(l => l.ProductId)));

And this is if I try to make it look nice. Usually it’ll be one giant line of text and parameters. There is another problem, however. What if, during my troubleshooting, I want to find all messages relating to customer 376572? I can search for customer 376572, but that will not find the following message:

2017-05-18T15:32:15Z [Info] Removed address record. AddressId = 8748323, CustomerId = 376572

Clearly we need a better solution. Enter structured logging.

Structured logging

Structured logging makes data a first-class citizen of your logged messages, rather than something you have to pull out using regular expressions and analyzers.

Traditionally, a logged message is, at its very core, simply a couple of lines of text. Those lines of text might include a lot of commonly available data, such as a timestamp, a level (Information, Warning, Error, etc.), and other application-specific properties, but it is, for all intents and purposes, just a string.

Structured logging turns a log message into an object, with properties. Let’s take a look at how that same exception message from earlier might look when using structured logging. This is a raw message dump from Seq (a structured logging back-end).

{
    "@t": "2017-05-26T12:06:09.6427612Z",
    "@mt": "An error occurred gathering changes preparing to create an order for customer {CustomerId}, containing products {OrderProducts}",
    "@m": "An error occurred gathering changes preparing to create an order for customer 376572, containing products [ 2352, 23422, 9482 ]",
    "@i": "9060ab2b",
    "@l": "Critical",
    "@x": "System.InvalidOperationException: Sequence contains no elements\r\n  at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)\r\n  at EnterpriseySystem.Common.BetterThanEntityFramework.DoTheQuery[T](MyQueryable`1 query)\r\n  at EnterpriseySystem.DataLayer.EverythingRepository.SaveChanges()\r\n  at EnterpriseySystem.Web.MagicUnitOfWorkFilter.OnActionExecuted(ActionExecutedContext context)",
    "CustomerId": 376572,
    "OrderProducts": [
        2352,
        23422,
        9482
    ],
    "Environment": "Development",
    "HttpRemoteHost": "::1",
    "HttpUri": "http://myawesomeservice.mycompany.io/order",
    "HttpMethod": "POST",
    "ConnectionId": "0HL36VOIBEH51"
}

You can see that the customer ID is not only available in the ‘rendered message’ (the @m property), but also as a separate property called Customer​Id. Because it’s available as a separate piece of data, you can suddenly do a bunch of interesting things with it. For example, if you have a big collection of log messages like this, you could search for only those related to customer 376572. Or maybe for messages mentioning product 2352. Or you could search for this specific exception and then graph the product count into a histogram, to see if there is any correlation.

Notice there are also properties on the message related to the current execution context, such as Http​Uri and Connection​Id. The better logging frameworks allow you to set ‘ambient’ properties. These can be valid for the lifetime of the application, like the hosting machine name, or only valid for a limited scope, like a HTTP request. All log messages generated while processing that HTTP request will have those properties, without you having to explicitly add them to each log message.

Of course you need a specialized logging back-end to be able to do something useful with these properties. You’ll need a searchable document store and a front-end to display messages and render graphs. There are many currently available, but some of the big names right now are Elastic Stack, Seq and Splunk.

Consistent property names

Something you’ll quickly run into when using structured logging is the consistency of parameter names. If one piece of code logs the products as Products and another logs them as Order​Products, it’ll become a lot more difficult to reliably use those properties.

You can work around this by putting the property name on a static class that‘s used everywhere. Like so:

internal static class LogPropertyNames
{
  public const string OrderProducts = "{OrderProducts}";
  public const string CustomerId = "{CustomerId}";
}

public void CreateOrder(Order order)
{
  _logger.Information(
    $"Creating an order for customer {LogPropertyNames.CustomerId} " +
    $"with products {LogPropertyNames.OrderProducts}.",
    order.CustomerId,
    order.ProductIds);
}

This works pretty nicely. As long as you maintain the convention that all property names must be maintained in the Log​Property​Names class, it’s pretty easy to ensure consistent property names.

Still, it’s quite a bit of code. When you’re just perusing a method, this bit of code shouts at you. And this is just a very basic piece of code. You’ll often have many more properties to log, and sometimes it is a Debug or Trace message. You’ll only want to log those when there is actually a sink or target configured to accept those levels, because logging messages consumes time and memory.

Time to refactor this into a method.

public void CreateOrder(Order order)
{
  LogCreatingOrder(order);
}

private void LogCreatingOrder(Order order)
{
  if (_logger.IsInformationEnabled)
  {
    _logger.Information(
      $"Creating an order for customer {LogPropertyNames.CustomerId} " +
      $"with products {LogPropertyNames.OrderProducts}.",
      order.CustomerId,
      order.Products.Select(p => p.ProductId));
  }
}

Much better! I think we can improve even further. I don’t really like the fact that your classes will become littered with many methods that are still pretty sizable, just so we can do logging. How about we move those somewhere else?

Meaningful Logging

internal static class LoggerExtensions
{
  public static void LogCreatingOrder(this ILogger logger, Order order)
  {
    if (logger.IsInformationEnabled)
    {
      logger.Information(
        "Creating an order for customer {CustomerId} " +
        "with products {OrderProducts}.",
        order.CustomerId,
        order.Products.Select(p => p.ProductId));
    } 
  }
}

public void CreateOrder(Order order)
{
  _logger.LogCreatingOrder(order);
}

Now we actually have strongly typed and meaningful logging methods on our logging interface. Note how I removed the usage of Log​Property​Names; if all your actual logging code is in one file, it’s dead simple to ensure consistency of your property names, so you don’t really need the static class any more. This is starting to look pretty nice!

Metrics

Along with logging, most non-interactive applications, like web applications, Windows services, or web APIs also need metrics: raw numeric points of data that give you an insight into how well the application is performing and whether it is healthy. I think it’s relevant to see these as different from logging. Yes, with structured logging you can output your metrics as part of logging and use those to create graphs, but in my experience the tooling of dedicated metrics applications, like Datadog or Librato, is that much better at managing alerts and just processing large quantities of time-series data, that it warrants treating them as a separate concern.

But oftentimes, logging and metrics go hand in hand. You created a user account? Write a log message and increase a counter. You finished processing a HTTP request? Write a log message and add the elapsed time to a timer metric. Imaging the mess if you use the ‘old’ style of writing logging code and metrics code in between your other code.

Note how usually log messages and metrics are triggered by ‘events‘: a user was created, an error occurred, an HTTP request was processed. It would be nice if your code would reflect this. Instead of being explicit and talking about ‘logging this’ and ‘outputting that metric’, we can talk about events that happen. They are not application events or domain events, however.

Monitoring events

I’d like to propose the term monitoring events, because you usually use the output of the events (such as logging and metrics) to monitor the performance and health of your application. It’s also unambiguously different from the other kinds of events I mentioned.

It is, simply put, an API for your application code to whatever logging, metrics, alerting, or other monitoring solution you have. In my current solutions, it takes the form of a class, like this:

public class MonitoringEvents
{
  ILogger Logger { get; }
  IMetrics Metrics { get; }
  ILogContext LogContext { get; }
}

IMetrics and ILog​Context define a number of operations to do with metrics and ambient properties, respectively. ILogger is Serilog’s ILogger interface. The actual methods are defined as extension methods and look like this:

internal static class MonitoringEventsExtensions
{
  public static void UserCreated(this MonitoringEvents monitoringEvents, User user)
  {
    monitoringEvents.Logger.Information("Created user {UserId}, named {Username}.",
                                        user.UserId,
                                        user.Username);
    monitoringEvents.Metrics.IncrementCounter("users.created");
  }
}

In the consuming code, it looks like this:

void CreateUser()
{
  // do complicated business stuff
  var user = new User();

  MonitoringEvents.UserCreated(user);
}

In this example, Monitoring​Events looks like it’s a property. That’s because it is. It could of course be a field that gets injected via the constructor, but in conversations with my colleagues we decided that this kind of practice leads to ‘constructor’ bloat; it gets very hard to see what the actual dependencies of a class are, and which are the infrastructural dependencies.

For that reason, we opted to expose our interface using a property which gets set by property injection.

This means your actual business logic code, which is supposed to be the raison d'être for your application, can be more about actual business logic, and less about cross-cutting concerns like monitoring. What you log and which metrics you keep are decoupled from the business code. I think that’s pretty nice with regards to the Single Responsibility Principle, which is not really about responsibilities but about reasons for change. If you do decide to change to a different logging framework (but in my humble opinion, at the moment there is nothing better than Serilog), the only places you have to make changes are in your bootstrapping code, in your interface, and in your extension methods.

I really like this approach. In a lot of applications, logging and monitoring is added as an afterthought, if at all, and because there is usually little perceived business value in logging and metrics, it is treated like a red-headed step-child; its code is often sloppy, ugly, and hard to read. Please, think of the step-child.