Overview
Today, I would like to share my side project with you: qWatch. qWatch stands for "Quality Watch". It is a data aggregator for code quality, based on different metrics. For now, the only implemented metric is statistics around logs. In this article, I will explain why I chose logs as the first metric, how the mechanism works, and how the tool is implemented. At the end, I will also share some thoughts about side projects.
Functionality
First of all, let me explain two basic commands of qWatch:
qwatch collect
qwatch stats [topN]
Command qwatch collect collects logs from a target directory, following a specific format. For now, I export log history from Datadog as CSV files. Then, I launch the qwatch collect command to collect them into my database. The collect command compares new log events with existing ones, and merges them without duplicates.
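Conceptually, the merge is a set union over log events. Here is a minimal sketch of the idea using Vavr's immutable sets (io.vavr.collection.HashSet); strings stand in for the real, structured log events, and deduplication relies on equals() and hashCode():
// Sketch only, not the tool's actual code: strings stand in for
// structured log events; the set deduplicates via equals()/hashCode().
Set<String> existing = HashSet.of("error A", "error B");
Set<String> incoming = HashSet.of("error B", "error C");
Set<String> merged = existing.addAll(incoming); // {error A, error B, error C}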
Command qwatch stats aggregates existing log events from the database using log patterns, and then displays the top N results in the console. It helps to understand which log events are the most frequent ones. In the following example, you can see several pieces of information: the number of entries extracted, the date range covered by the log entries, the top N errors with their number of occurrences, the pattern ID if one exists, and the error summary.
$ qwatch stats 10
1,999 entries extracted (2019-02-13 to 2019-02-26).
Top 10 errors:
- 1,234 [P1] Something goes wrong.
- 234 [P2] Something is incorrect.
- 123 [ ] Incorrect parameter
- 20 [P5] Project ${id} not found
- ...
Why I Developed This Tool
I developed this tool because there are many errors in our production environment. With the tool, I can measure the number of errors for a given time window. Having my own patterns makes the aggregation more precise. This functionality actually already exists in Datadog, but Datadog does not support custom log patterns. Another reason is that I wanted to improve my Java skills by writing more code. And most importantly, I believe in data-driven decision making (DDDM), so that we can fix the most valuable bugs within the given time and resources.
Architecture
The architecture of the logs tool is very simple: it is a command-based tool. When the main function is called, it dispatches the arguments to the different commands:
var command = args[0];
if (CollectCommand.NAME.equals(command)) {
  // Collect log events from the download directory into the database
  CollectCommand.newBuilder()
      .logDir(Paths.get("/Users/mincong/Downloads"))
      .build()
      .execute();
} else if (StatsCommand.NAME.equals(command)) {
  // Top N defaults to 200 when not passed as the second argument
  int n = args.length > 1 ? Integer.parseInt(args[1]) : 200;
  StatsCommand.newBuilder()
      .logDir(Paths.get("/Users/mincong/datadog"))
      .topN(n)
      .days(14)
      .build()
      .execute();
} else {
  logger.warn("Unknown command: {}", command);
}
Each command has its own logic, and no command depends on another. When performing I/O actions, a command delegates to the dedicated classes for import and export, defined in the package qwatch.logs.io.
For now, 3 classes are created:
- CsvImporter, the CSV importer
- JsonImporter, the JSON importer
- JsonExporter, the JSON exporter
Some of these classes support concurrent execution: they run their work in parallel using a thread pool.
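As an illustration, here is a minimal sketch of what such a concurrent import could look like. The method importFile and the thread-pool sizing are assumptions of mine, not necessarily the tool's actual code:
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: import CSV files concurrently via a fixed thread pool.
static List<String> importAll(List<Path> csvFiles) throws Exception {
  ExecutorService pool =
      Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  try {
    List<Future<List<String>>> futures = new ArrayList<>();
    for (Path csv : csvFiles) {
      // importFile(Path) is an illustrative stand-in for the importer logic
      futures.add(pool.submit(() -> importFile(csv)));
    }
    List<String> events = new ArrayList<>();
    for (Future<List<String>> future : futures) {
      events.addAll(future.get()); // blocks until each import finishes
    }
    return events;
  } finally {
    pool.shutdown();
  }
}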
Best Practices
In this section, I would like to share the best practices I learnt and applied to this small tool.
Immutability. Value classes and data structures used in this project are immutable. For value classes, I used Google's AutoValue to generate the immutable implementation. It allows writing value classes in a declarative way, while keeping hashCode(), equals(), and toString() correctly implemented. For data structures, I don't use the Java built-in data structures, but those coming from Vavr. It keeps the syntax concise, and ensures that the structures are immutable.
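To give an idea, here is what such a value class might look like; LogEntry and its fields are illustrative, not qWatch's actual model:
import com.google.auto.value.AutoValue;

// AutoValue generates the immutable implementation, including
// equals(), hashCode(), and toString().
@AutoValue
public abstract class LogEntry {
  public abstract String pattern();
  public abstract long count();

  public static LogEntry of(String pattern, long count) {
    // AutoValue_LogEntry is generated at compile time by AutoValue
    return new AutoValue_LogEntry(pattern, count);
  }
}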
Testing. I tried to do TDD as much as I can. The test coverage of this tool is 76%. The test effort is mostly focused on the I/O part and the log patterns. Keeping testing in mind pushes me toward a better structure in the source code, where code is loosely coupled. Also, I described some behaviors in tests first, so that the source code had a clear goal before being implemented.
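For example, a behavior can be written down as a test before the implementation exists. The API below (Stats.topN) is a hypothetical one, purely for illustration:
import static org.assertj.core.api.Assertions.assertThat;

import io.vavr.collection.List;
import org.junit.Test;

public class TopNTest {

  @Test
  public void topN_returnsMostFrequentPatternsFirst() {
    List<String> events = List.of("P1", "P2", "P1", "P1", "P2", "P3");
    // The expected behavior is described before implementing Stats.topN
    assertThat(Stats.topN(events, 2)).containsExactly("P1", "P2");
  }
}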
Java 11. I took the chance to upgrade to Java 11 for this project. It allows me to use var, the local-variable type inference introduced by JEP 286. I haven't used the Java Platform Module System (JPMS) yet, but I will if there's an opportunity in the future.
Functional Programming. Using the functional library Vavr allows me to manipulate data objects easily in Java. It makes sorting, deduplication, mapping, and other transformations easy.
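For example, counting and ranking error patterns takes only a few lines with Vavr's immutable collections (the data here is illustrative, not from qWatch):
import io.vavr.Tuple2;
import io.vavr.collection.List;

List<String> events = List.of("P1", "P2", "P1", "P1", "P2", "P3");
// Group identical patterns, count them, and sort by count (descending)
List<Tuple2<String, Integer>> top = events
    .groupBy(e -> e)
    .toList()
    .map(t -> t.map2(List::size))
    .sortBy(t -> -t._2);
// Result: [(P1, 3), (P2, 2), (P3, 1)]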
Build. This project is built with Maven. A default Maven configuration is defined under the .mvn folder, where the multi-threaded build is enabled by default via -T 1C (one thread per CPU core). This speeds up the build: in CircleCI, the build takes about 13 seconds.
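For reference, since version 3.3.1 Maven reads extra command-line options from the file .mvn/maven.config and applies them to every mvn invocation, so enabling the parallel build is a one-line file:
-T 1C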
What’s Next?
I don't have a concrete plan right now. Since I only develop this project during lunch breaks on weekdays and during my commute, it's hard to go much further. I'm thinking about several possibilities:
- Avoid manual download from Datadog Log Explorer
- Connect to AWS S3 for log download
- Automate execution
- Measure the Jenkins build and extract warnings and errors
- Generate log patterns from source code directly using a Maven plugin
- Visualize errors using Jupyter Notebook
- Identify bug tickets and measure the bug-fix delay (time between the feature creation date and the bug fix date)
Conclusion
In this article, we saw the tool qWatch and how it collects and aggregates logs. I also explained the architecture of the tool, the best practices I applied, and the possible next steps for the project. I hope you enjoyed this one, see you next time! The source code is available on GitHub: https://github.com/mincong-h/quality-watch.
References
- Kevin Bourrillion, Éamonn McManus, “Google / AutoValue”, github.com, 2019. [Online]. Available: https://github.com/google/auto/tree/master/value
- Daniel Dietrich, “Vavr”, github.com, 2019. [Online]. Available: https://github.com/vavr-io/vavr/