Logging is one of the most important parts of a software system. Whether you have just started working on a new piece of software or your system is running in a large-scale production environment, you'll always find yourself seeking help from log files. Logs are the first thing people look at when something goes wrong or doesn't work as expected. Logging the right information in the right way makes the life of developers so much easier. To get better at logging, developers need to be aware of two things: what to log and how to log it. In this article, we'll take a look at some of the basic logging etiquettes that can get us the best out of our logs.

The What and How of Logging

Let's take the example of an e-commerce system and look at logging in its Order Management Service (OMS). Suppose a customer order fails due to an error from the Inventory Management Service (IMS), a downstream service that OMS uses to verify the available inventory. Let's assume that OMS has already accepted an order, but during the final verification step IMS returns the following error because the product is no longer available in the inventory:

```
HTTP/2 404
Product Not Available
```

What to Log

Normally, you would log the error this way:

```java
log.error("Exception in fetching product information - {}", ex.getResponseBodyAsString());
```

This will output a log in the following format:

```
[2020-09-27T18:54:41,500+0530]-[ERROR]-[InventoryValidator]-[13] Exception in fetching product information - Product Not Available
```

Well, there isn't much information available in this log statement, is there? A log like this doesn't serve much purpose because it lacks any contextual information about the error. Can we add more information to this log to make it more relevant for debugging? How about adding the Order Id and Product Id?
```java
log.error("Exception in processing Order #{} for Product #{} due to exception - {}", orderId, productId, ex.getResponseBodyAsString());
```

This will output a log in the following format:

```
[2020-09-27T18:54:41,500+0530]-[ERROR]-[InventoryValidator]-[13] Exception in processing Order #182726 for Product #21 due to exception - Product Not Available
```

Now this makes a lot more sense! Looking at the logs, we can understand that an error occurred while processing Order #182726 because Product #21 was not available.

How to Log

While the above log makes perfect sense for us humans, it may not be the best format for machines. Let's take an example to understand why. Suppose there is some issue in the availability of a certain product (say Product #21) due to which all orders containing that product are failing. You have been assigned the task of finding all the failed orders for this product. You happily grep for Product #21 in your logs and excitedly wait for the results. When the search completes, you get something like this:

```
[2020-09-27T18:54:41,500+0530]-[ERROR]-[InventoryValidator]-[13] Exception in processing Order #182726 for Product #21 due to exception - Product Not Available
[2020-09-27T18:53:29,500+0530]-[ERROR]-[InventoryValidator]-[13] Exception in processing Order #972526 for Product #217 due to exception - Product Not Available
[2020-09-27T18:52:34,500+0530]-[ERROR]-[InventoryValidator]-[13] Exception in processing Order #46675754 for Product #21 due to exception - Product Not Available
[2020-09-27T18:52:13,500+0530]-[ERROR]-[InventoryValidator]-[13] Exception in processing Order #332254 for Product #2109 due to exception - Product Not Available
```

Not quite what we were expecting, right? The substring Product #21 also matches Product #217 and Product #2109. So how can we improve this? Structured logging to the rescue.
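The over-matching above can be reproduced outside any log tool. Here is a minimal sketch in plain Java; the class name and the in-memory list of messages are illustrative, not part of the service's actual code:

```java
import java.util.List;

public class SubstringGrepDemo {
    public static void main(String[] args) {
        // Hypothetical copies of the four unstructured log messages from above
        List<String> logs = List.of(
            "Exception in processing Order #182726 for Product #21 due to exception - Product Not Available",
            "Exception in processing Order #972526 for Product #217 due to exception - Product Not Available",
            "Exception in processing Order #46675754 for Product #21 due to exception - Product Not Available",
            "Exception in processing Order #332254 for Product #2109 due to exception - Product Not Available"
        );

        // A naive substring search for "Product #21" also matches #217 and #2109
        long matches = logs.stream()
                           .filter(line -> line.contains("Product #21"))
                           .count();
        System.out.println("Matches for 'Product #21': " + matches); // prints 4, though only 2 orders used Product #21
    }
}
```

A plain grep behaves exactly like `String.contains` here: it has no notion of where the product id ends, so every line "matches".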
Structured Logging

Structured logging solves these common problems and allows log analysis tools to provide additional capabilities. Logs written in a structured format are more machine-friendly, i.e. they can be easily parsed by a machine. This can be helpful in the following scenarios:

- Developers can search logs and correlate events, which is invaluable both during development and for troubleshooting production issues.
- Business teams can parse these logs and perform analysis over certain fields (e.g. unique product count per day) by extracting and summarising those fields.
- You can build dashboards (both business and technical) by parsing the logs and performing aggregations over relevant fields.

Let's use our earlier log statement and make a small change to make it structured:

```java
log.error("Exception in processing OrderId={} for ProductId={} due to Error={}", orderId, productId, ex.getResponseBodyAsString());
```

This will output a log in the following format:

```
[2020-09-27T18:54:41,500+0530]-[ERROR]-[InventoryValidator]-[13] Exception in processing OrderId=182726 for ProductId=21 due to Error=Product Not Available
```

Now this log message can easily be parsed by a machine using "=" as a delimiter to extract the OrderId, ProductId and Error fields. We can now do an exact search over ProductId=21 to accomplish our task. This also allows us to perform more advanced analytics on the logs, such as preparing a report of all the orders that failed with such errors.
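An exact field search is essentially what a log management tool does with these key=value pairs. A minimal sketch of the idea in plain Java, under the same illustrative data as before (the class name and regex are assumptions, not the article's code):

```java
import java.util.List;
import java.util.regex.Pattern;

public class ExactFieldSearchDemo {
    public static void main(String[] args) {
        // The same four failures, now written as key=value structured messages
        List<String> logs = List.of(
            "Exception in processing OrderId=182726 for ProductId=21 due to Error=Product Not Available",
            "Exception in processing OrderId=972526 for ProductId=217 due to Error=Product Not Available",
            "Exception in processing OrderId=46675754 for ProductId=21 due to Error=Product Not Available",
            "Exception in processing OrderId=332254 for ProductId=2109 due to Error=Product Not Available"
        );

        // The word boundary (\b) after "21" stops ProductId=217 and ProductId=2109 from matching
        Pattern exact = Pattern.compile("\\bProductId=21\\b");
        long matches = logs.stream()
                           .filter(line -> exact.matcher(line).find())
                           .count();
        System.out.println("Matches for ProductId=21: " + matches); // prints 2
    }
}
```

Because the field name and value form a single delimited token, the search now returns exactly the two orders that actually contained Product #21.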
If you use a log management system like Splunk, the following query can now produce this result:

```
Error="Product Not Available" | stats count by ProductId

+-----------+-------+
| ProductId | count |
+-----------+-------+
| 21        | 5     |
| 27        | 12    |
+-----------+-------+
```

We could also use a JSON layout to print our logs in JSON format:

```json
{
  "timestamp": "2020-09-27T18:54:41,500+0530",
  "level": "ERROR",
  "class": "InventoryValidator",
  "line": "13",
  "OrderId": "182726",
  "ProductId": "21",
  "Error": "Product Not Available"
}
```

It's important to understand the approach behind structured logging. There is no fixed standard for it, and it can be done in many different ways.

Conclusion

In this article, we saw the pitfalls of unstructured logging and the benefits offered by structured logging. Log management systems such as Splunk benefit hugely from well-structured log messages and can offer easy search and analytics over log events. The biggest challenge with structured logging, however, is establishing a standard set of fields for your software. This can be achieved by following a custom logging model or centralised logging, which ensures that all developers use the same fields in their log messages.

Note

This is the first part of my Microservice Observability series. The next post is on metrics and can be read here.

Also published at https://medium.com/@theawesomenayak/microservice-observability-logging-2be535385616