Overview
Since I started my project qWatch in February, which collects logs for error-analysis, I realized how important it is to have logs in production. It might seem evident for some of you, but for others, logs might just be normal files stored somewhere. In this article, I will share what I know about logs, by going through the following sections:
- What can be found inside a log event?
- Why log files are not enough for the production env?
- Where are logs coming from?
- Log Analytics
Logging Event
Traditionally, a logging event is very simple. Taking Java logging framework
Log4J as an example, the conversion pattern in properties file might look like
the following expression, in which there’s a timestamp (%d
) of the event, the
thread name (%t
), the priority (%p
), the category name (%c
), the message
(%m
) and a line separator (%n
) at the end.
%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5p %c{1} - %m%n
But what if we can enrich the logging event with other information to provide a more complete context? For example, the source code, the web access, the cloud provider… Logging event will become much more interesting. For example, in Datadog, you are able to see this information.
Category | Field | Description |
---|---|---|
Core | Source | The source of logs, e.g. Tomcat Server. |
Core | Host | The host machine. |
Core | Service | The service name. |
Core | Status | The log level: error, warn, info, … |
Source Code | Logger Name | The name of the logger. |
Source Code | Exception Class | From which class the exception was thrown. |
Source Code | Thread Name | The name of the thread. |
Source Code | Stacktrace | The stacktrace of exception. |
Source Code | Exception Message | The message of exception. |
Source Code | Class | The class of exception. |
Customer | Log Type | The type of log, e.g. SSO, Tomcat |
Customer | Environment | Dev, pre-prod, prod, … |
Customer | Project | The project name. |
Web Access | Client IP | The IP address of the client. |
Web Access | OS | The operation system of the client. |
Web Access | Browser | The browser of the client. |
Web Access | Referer | The referer of the client. |
Web Access | Response Time Sec | The response time in second. |
Web Access | URL Path | The path of the URL. |
Web Access | User Agent | The user agent of the client. |
Web Access | Device | The device of the client. |
Web Access | Status Code | The status code of the client. |
Web Access | Method | The HTTP method of the client. |
AWS | ELB Name | Elastic Loading Balancing |
AWS | S3 Bucket | S3 Bucket |
I found this enrichment (I don’t know if this is the right word) very important. When there’s something wrong in production, having a simple log message is not enough. As a developer, I need more detail. More detail about the user, more detail about the cluster, more detail about the source code, …
Log File vs Log Platform
What is the different between log files and log platform? Why log files are not enough for the production environment.
In my opinion, log files are suitable for small projects. You can SSH to your
environment and watch the logs using tail -f
or less +F
. However, face to a
big project, where the number of machines keep growing, watching log files
becomes harder and harder. By the way, sometimes you are not even allowed
to access the machines. On the other hand, log platform provides opportunity to
aggregate, enrich, search, analysis, and monitor log events. These options are
essential for being able to handle critical events in production.
# | Log File | Log Platform |
---|---|---|
Tail | Yes | Yes |
Aggregation | No | Yes |
Analysis | Manual | Graphical |
Enrichment | No | Yes |
Search | Single Source | Multi Source |
Monitoring | No | Yes |
Source of Logs
Logs can come from many sources. Here’re some of them I saw, separated by server, container, cloud, and other.
Server: Apache, Cassandra, Consul, Elasticsearch, HA Proxy, Nginx, MongoDB, Java, Journald, Apache Tomcat, Go, Microsoft .NET, Ruby, Node.js, PostgreSQL, Varnish Cache, Python, MySQL, Redis, Microsoft IIS, Apache Kafka, Apache ZooKeeper, RabbitMQ, PHP, Windows, Custom files
Container: Docker, Kubernetes, Amazon ECS, Amazon EKS, Mesos, CoreOS, RedHat Openshift, AWS Fargate, Istio
Cloud: AWS, Fastly, Azure, Cloud Foundry, Heroku, Google Cloud Platform
Other: Rsyslog, Fluentd, Logstash, Syslog-ng, RxNXLog
Log Analytics
When using log analytics, it is possible to perform time-series analysis based on log events aggregation. It allows you to understand the volume of events in the past. Combined with visualization techniques, you will also be able to find out the import information (anomaly, trends, …) easily. Two concrete solutions in my mind are Datadog - Log Explorer and Elastic - Logging.
Obviously, analysis goes far beyond time-series. There’re also pie chart, maps, … The key here is aggregation and filtering. Thanks to log platform, you are able to group by different information and filter what you need.
Conclusion
In this article, I shared what I know about logs through the information inside a log event, the advantages of having a log platform (aggregate, enrich, search, analysis, monitor), the sources of logs, and log analytics. Hope you enjoy this article, see you the next time!
References
- “Log4J - TTCC”, Wikipedia, 2019. https://en.wikipedia.org/wiki/Log4j#TTCC