Weekly Technical Report for January 1, 2023

As we move into 2023, this is going to be a tough year with several challenges. One is migrating all the data previously deployed on physical servers to the cloud. Another is accelerating the development of several new team members so that they can take over the services behind the current main business as soon as possible, and can independently resolve user issues and optimize those services. This will allow me to hand some of my work over to them and focus on important goals that are expected to take a long time this year.

I have also reached a stage in my personal learning, in technical and other areas, that will determine the direction of my life for the next 7-8 years. This week, my main focus was on the design and specification of the logging framework and log tracing for several services.

First, log tracing. To follow the logs produced by a single call chain across services, the TraceId needs to be unified. However, the TraceId types in these services are not uniform: some use a Long and some use a string. The languages and technology stacks of these services are not consistent either. The TraceId from a standard distributed tracing framework is unlikely to work for all of the services as they stand today; it could only be adopted in the services with relatively new technology stacks.

Therefore, for compatibility and ease of migration, we are going to use a custom-generated Long value as the TraceId and limit it to 16 decimal digits. The first four digits consist of 99, which marks a unified TraceId, followed by two digits that identify the service. The next four digits come from the current microseconds, and the last eight digits are two groups of four random digits spliced together. Although this TraceId does not guarantee uniqueness, it is sufficient in the current situation.
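The layout above can be sketched in Java roughly as follows. This is only a minimal sketch under assumptions: the class name TraceIds, the serviceCode parameter, and the isUnified helper are hypothetical, and "current microseconds" is interpreted here as the last four digits of the microsecond part of the wall-clock time.

```java
import java.time.Instant;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; names and the exact time source are assumptions.
final class TraceIds {
    private TraceIds() {}

    /**
     * Builds a 16-digit id: "99" + 2-digit service code + 4 digits of
     * microseconds + two groups of 4 random digits.
     */
    static long generate(int serviceCode) {
        if (serviceCode < 0 || serviceCode > 99) {
            throw new IllegalArgumentException("serviceCode must be 0..99");
        }
        // ThreadLocalRandom gives each thread its own generator, so no lock contention.
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        // Last four digits of the microsecond-of-second (0..9999).
        long micros = Instant.now().getNano() / 1_000 % 10_000;
        long randomA = rnd.nextInt(10_000); // 0..9999
        long randomB = rnd.nextInt(10_000); // 0..9999
        return 99L * 100_000_000_000_000L         // digits 1-2: marker 99
                + serviceCode * 1_000_000_000_000L // digits 3-4: service
                + micros * 100_000_000L            // digits 5-8: microseconds
                + randomA * 10_000L                // digits 9-12: random group 1
                + randomB;                         // digits 13-16: random group 2
    }

    /** True when the value is a 16-digit number starting with 99. */
    static boolean isUnified(long id) {
        return id >= 9_900_000_000_000_000L && id <= 9_999_999_999_999_999L;
    }
}
```

Because the largest possible value is 9,999,999,999,999,999, the id always fits in a signed 64-bit Long, and services that keep the TraceId as a string can store its decimal form unchanged.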

If the service is on a Java technology stack, TraceId generation needs to take thread contention into account, so it is best to give each thread its own random number generator, or simply use ThreadLocal. When processing a request, the Java service extracts the TraceId from it. If the TraceId starts with 99, no new TraceId is generated. If not, a TraceId is generated as described above. In both cases the resulting TraceId is stored in the MDC.
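The request-side decision could look something like the sketch below. To keep it runnable with only the JDK, a ThreadLocal map stands in for the MDC; in a real service the put, get, and clear calls would presumably go to org.slf4j.MDC instead. The class name TraceContext, the key "traceId", and the generator parameter are hypothetical, and the check is slightly stricter than the text (it also requires exactly 16 digits), which is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch; CONTEXT is a JDK-only stand-in for the logging MDC.
final class TraceContext {
    static final String KEY = "traceId"; // assumed MDC key

    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private TraceContext() {}

    /** Reuses a unified incoming TraceId, otherwise generates one; stores the result. */
    static String begin(String incoming, LongSupplier generator) {
        String traceId = isUnified(incoming)
                ? incoming
                : Long.toString(generator.getAsLong());
        CONTEXT.get().put(KEY, traceId);
        return traceId;
    }

    /** Assumed rule: exactly 16 decimal digits, starting with 99. */
    static boolean isUnified(String value) {
        if (value == null || value.length() != 16 || !value.startsWith("99")) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** TraceId stored for the current thread, or null if none. */
    static String current() {
        return CONTEXT.get().get(KEY);
    }

    /** Called when the request is finished. */
    static void clear() {
        CONTEXT.remove();
    }
}
```

Passing the generator in as a LongSupplier keeps the sketch independent of how the id is actually produced.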

With asynchronous execution, take care to copy the contents of the MDC to the worker thread, otherwise the trace information will be lost. When a downstream service is invoked, the stored TraceId needs to be passed along to it. Finally, once the request has been processed, the MDC needs to be cleared so that it does not pollute the trace information of the next request.
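One possible way to carry the context into asynchronous work is to wrap each task, as in the sketch below. Again a ThreadLocal map stands in for the MDC so that the example needs only the JDK; with SLF4J the same pattern would presumably use MDC.getCopyOfContextMap(), MDC.setContextMap(...), and MDC.clear(). The class name TracePropagation and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; CONTEXT is a JDK-only stand-in for the logging MDC.
final class TracePropagation {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private TracePropagation() {}

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    /** Clears the current thread's context, e.g. at the end of a request. */
    static void clear() {
        CONTEXT.remove();
    }

    /**
     * Snapshots the caller's context now and installs it on whichever
     * thread later runs the task, clearing it again afterwards.
     */
    static Runnable wrap(Runnable task) {
        // Copied on the submitting thread, at wrap time.
        Map<String, String> snapshot = new HashMap<>(CONTEXT.get());
        return () -> {
            CONTEXT.set(new HashMap<>(snapshot));
            try {
                task.run();
            } finally {
                // Avoid leaking this request's trace data into the next task
                // that reuses the same pooled thread.
                CONTEXT.remove();
            }
        };
    }
}
```

A task would then be submitted as executor.execute(TracePropagation.wrap(task)), and the request thread itself would call clear() in a finally block once the response has been written.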