Weekly Technical Report for the fourth week of February 2023

This week I realized that some of our service frameworks are not written very well, especially certain Java frameworks. Once CPU usage reaches about 40%, a large number of timeouts appear. For these services, it is not that the CPU cores or memory capacity are insufficient, nor that there are too few worker threads, yet the CPU still cannot be driven to full utilization. Analysis with Java performance tools showed that most of the worker threads were in an idle or waiting state. Even after weighing all of these findings together, I am still puzzled. The services use NIO and the Netty framework, but throughput still does not go up. The thread analysis found no particularly busy business threads, so my inference is that I/O or some kind of waiting mechanism is causing the low processing efficiency.
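The kind of thread-state check described above can be sketched in a few lines of Java. This is only an illustration, not the tool I actually used: the class name ThreadStateHistogram and both method names are made up for this example, and it inspects the JVM it runs in, so in practice it would have to run inside the service (or be replaced by a thread dump from jstack).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: counts the threads of the current JVM by state.
public class ThreadStateHistogram {

    /** Counts all live threads in this JVM, grouped by Thread.State. */
    public static Map<Thread.State, Integer> countStates() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Fraction of threads that are WAITING, TIMED_WAITING or BLOCKED. */
    public static double waitingShare(Map<Thread.State, Integer> counts) {
        int total = 0;
        int waiting = 0;
        for (Map.Entry<Thread.State, Integer> e : counts.entrySet()) {
            total += e.getValue();
            Thread.State s = e.getKey();
            if (s == Thread.State.WAITING
                    || s == Thread.State.TIMED_WAITING
                    || s == Thread.State.BLOCKED) {
                waiting += e.getValue();
            }
        }
        return total == 0 ? 0.0 : (double) waiting / total;
    }

    public static void main(String[] args) {
        Map<Thread.State, Integer> counts = countStates();
        System.out.println("threads by state: " + counts);
        System.out.printf("waiting share: %.2f%n", waitingShare(counts));
    }
}
```

One caveat when reading such numbers: a thread that is blocked inside a native I/O call, for example a selector wait in an NIO event loop, is still reported as RUNNABLE, so a low RUNNABLE count is a stronger signal than a high one.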

This time I set out to reduce costs by cutting the number of nodes, and I did not consider many factors before going ahead. When I reduced the node capacity, I only checked the CPU load to decide whether the nodes could withstand it. When I raised the average CPU utilization of the workload in Beijing to 35%-40%, the whole service experienced a large number of timeouts. I was shocked at the time. When I analyzed the monitoring data afterwards, almost all the backend nodes in that region had timed out, i.e., they were in a state of “shock”.

This week, I also continued to refactor an old PHP gateway service; the new gateway is written in Java. I am not really in favor of using Java for a gateway, though. In my view, the execution characteristics of the Java language largely mean that it is not suited to particularly high concurrency. In addition, we are currently on JDK 8, which has no lightweight threads. Each 4c8g container (4 cores, 8 GB of memory) runs at most 800 threads, and going beyond that would make the thread-switching overhead especially large. So the throughput of a single container is limited, and carrying the same amount of traffic requires more containers. At first I rewrote the gateway in Go, using a framework that is very good and has a dedicated team maintaining it, but my leader still had me use the department's self-developed Java framework. It was probably mostly a staffing consideration. I have no choice but to write it in Java for now.
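The link between the thread cap and the container count can be made concrete with a rough Little's law estimate. This is a sketch under stated assumptions, not a measurement: it assumes each in-flight request occupies one thread for its whole duration, and apart from the 800-thread cap, the numbers (200 ms average latency, 20,000 requests per second of target traffic) and the class and method names are invented for illustration.

```java
// Hypothetical back-of-the-envelope calculator; none of this is from the real gateway.
public class GatewayCapacity {

    /**
     * Little's law upper bound on requests per second for one container,
     * assuming every in-flight request holds one thread for avgLatencyMillis.
     */
    public static double maxThroughputPerContainer(int maxThreads, double avgLatencyMillis) {
        if (maxThreads <= 0 || avgLatencyMillis <= 0) {
            throw new IllegalArgumentException("maxThreads and avgLatencyMillis must be positive");
        }
        return maxThreads * 1000.0 / avgLatencyMillis;
    }

    /** Containers needed to carry targetQps at that per-container bound. */
    public static int containersNeeded(double targetQps, int maxThreads, double avgLatencyMillis) {
        double perContainer = maxThroughputPerContainer(maxThreads, avgLatencyMillis);
        return (int) Math.ceil(targetQps / perContainer);
    }

    public static void main(String[] args) {
        int maxThreads = 800;            // thread cap per 4c8g container (from the report)
        double avgLatencyMillis = 200.0; // assumed average request latency
        double targetQps = 20_000.0;     // assumed total traffic

        System.out.printf("per-container bound: %.0f req/s%n",
                maxThroughputPerContainer(maxThreads, avgLatencyMillis));
        System.out.println("containers needed: "
                + containersNeeded(targetQps, maxThreads, avgLatencyMillis));
    }
}
```

Under these assumed numbers, one container tops out at 4,000 requests per second, so 20,000 requests per second needs 5 containers. The bound scales inversely with latency, which is why slow downstream calls are so expensive when threads are the limiting resource.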