Technical Review for the third week of December 2022

This week, the main work was optimizing a Java service whose CPU usage could not ramp up. The first question to consider was whether the service had too few worker threads. It later turned out that the CPU usage could in fact be raised, but raising it caused more timeouts. There had long been feedback that this service's performance was insufficient and that it was not recommended for continued use, so my hunch was that the problem lay in the framework rather than the business code.

After reading and combing through the framework code: the framework uses Netty as its NIO server framework, and when executing business logic it dispatches the business processing tasks to worker threads, which then run the business logic. A problem with this framework in the past was that the number of worker threads was too low to cope with IO-heavy situations. This time, however, the logs show that there are enough worker threads, so it is not the same problem as before. Could it be a client-side issue? In the overall microservice architecture, this service also acts as a client calling the interfaces of other services.
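To make that dispatch model concrete, here is a minimal sketch of the pattern, assuming a plain fixed-size worker pool. DispatchHandler and handleBusinessLogic are illustrative names, not the framework's actual classes; the point is only that the Netty I/O thread hands each decoded request off so business logic never blocks the event loop.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: the Netty I/O thread submits each request to a worker pool,
// keeping the event loop free for network I/O.
public class DispatchHandler extends ChannelInboundHandlerAdapter {
    private final ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        workers.submit(() -> {
            Object response = handleBusinessLogic(msg); // runs off the I/O thread
            ctx.writeAndFlush(response); // Netty schedules the write back on the event loop
        });
    }

    private Object handleBusinessLogic(Object request) {
        // Placeholder for the service's actual business code.
        return request;
    }
}
```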

This could be an entry point. Reading through the code, I found that the framework generates a proxy class, ObjectProxy, by implementing Java's InvocationHandler interface; this proxy takes over RPC calls to other services. When the business code initiates an RPC call to another service's interface, ObjectProxy works with the ProtocolInvoker to obtain the target service's list of valid nodes (the list is refreshed every 30s). It then passes the list to the LoadBalancer to pick the target node for this call. Finally, the call is made to the target node through a protocol-specific Invoker class, which manages the long-lived connections to the target service and, at call time, selects one connection to send the request and receive the response. The exact request path depends on whether the call is synchronous or asynchronous.
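A minimal sketch of this kind of InvocationHandler-based proxy is below. The ProtocolInvoker, LoadBalancer, and Node types mirror the ones described above, but their signatures are my assumptions for illustration, not the framework's real API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

// Sketch of an RPC proxy: every interface call is routed through invoke(),
// which resolves a node and delegates to the protocol-specific invoker.
public class ObjectProxySketch implements InvocationHandler {
    private final ProtocolInvoker protocolInvoker; // node list refreshed every 30s
    private final LoadBalancer loadBalancer;

    public ObjectProxySketch(ProtocolInvoker protocolInvoker, LoadBalancer loadBalancer) {
        this.protocolInvoker = protocolInvoker;
        this.loadBalancer = loadBalancer;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        List<Node> nodes = protocolInvoker.getValidNodes();   // 1. valid node list
        Node target = loadBalancer.select(nodes);             // 2. pick a node
        return protocolInvoker.invoke(target, method.getName(), args); // 3. remote call
    }

    @SuppressWarnings("unchecked")
    public static <T> T create(Class<T> iface, ProtocolInvoker invoker, LoadBalancer lb) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface},
                new ObjectProxySketch(invoker, lb));
    }

    // Minimal stand-ins so the sketch is self-contained.
    interface Node {}
    interface LoadBalancer { Node select(List<Node> nodes); }
    interface ProtocolInvoker {
        List<Node> getValidNodes();
        Object invoke(Node node, String methodName, Object[] args) throws Exception;
    }
}
```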

Each target service the client invokes consists of multiple nodes. For each node, the framework creates two I/O threads by default for network transfers (NIO mode), plus, also by default, one TCP connection per processor. Each I/O thread runs a selector that polls for events on its connections. A TCPSession is created for each TCP connection, and whenever a request is sent, a Ticket is created to track the request and its associated response. For synchronous requests, the caller sends the request and blocks until the response arrives. For asynchronous requests, the Ticket is populated with a callback function; when the response arrives, the TicketNumber (the Ticket's unique index) is used to locate the corresponding Ticket and invoke the pre-populated callback for subsequent processing.
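Here is a rough sketch of how such a Ticket mechanism could work, assuming a concurrent map keyed by TicketNumber. The class and method names are illustrative, not the framework's actual code; it only shows how one structure can serve both the blocking synchronous path and the callback-driven asynchronous path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Sketch: pending requests tracked by a unique TicketNumber; the I/O thread
// completes the matching Ticket when its response frame arrives.
public class TicketRegistry {
    private final AtomicLong ticketCounter = new AtomicLong();
    private final Map<Long, Ticket> pending = new ConcurrentHashMap<>();

    // callback may be null for synchronous calls.
    public Ticket register(Consumer<byte[]> callback) {
        long ticketNumber = ticketCounter.incrementAndGet();
        Ticket ticket = new Ticket(callback);
        pending.put(ticketNumber, ticket);
        return ticket;
    }

    // Called by the I/O thread when a response arrives for ticketNumber.
    public void complete(long ticketNumber, byte[] response) {
        Ticket ticket = pending.remove(ticketNumber);
        if (ticket != null) {
            ticket.complete(response);
        }
    }

    public static class Ticket {
        private final Consumer<byte[]> callback;        // non-null for async calls
        private final CountDownLatch done = new CountDownLatch(1);
        private volatile byte[] response;

        Ticket(Consumer<byte[]> callback) { this.callback = callback; }

        void complete(byte[] response) {
            this.response = response;
            done.countDown();                 // sync path: unblock the waiter
            if (callback != null) {
                callback.accept(response);    // async path: fire the pre-filled callback
            }
        }

        // Sync path: block the calling thread until the response arrives.
        public byte[] await() throws InterruptedException {
            done.await();
            return response;
        }
    }
}
```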

For NIO operations, the framework relies on the NIO library that Java provides; the TCP connections mentioned above are in fact the NIO library's SocketChannels. As for how the framework splits packets: it stores the bytes read so far in a Buffer, and in the RPC protocol we commonly use, the packet's byte count is placed at the start of the packet. Comparing the number of bytes in the Buffer against that size tells us whether a full packet has been read. If the packet is not yet complete, the reader keeps waiting for subsequent data to arrive; once a full packet is available, the corresponding data is split off according to the size specified in the header and handed on for processing. In this framework, a worker thread is taken from a thread pool to do the subsequent processing of the Ticket. From the existing code, this thread pool is used only for that purpose; its default thread count equals the number of cores, and its maximum is twice the number of cores.
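Below is a minimal sketch of this kind of length-prefixed splitting using a plain java.nio ByteBuffer. It assumes a four-byte length header that does not count itself; the real protocol's header layout may differ, and FrameSplitter is an illustrative name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: extract every complete length-prefixed frame from the buffer,
// leaving any partial frame in place for the next socket read.
public class FrameSplitter {
    private static final int HEADER_BYTES = 4;

    public static List<byte[]> split(ByteBuffer buffer) {
        List<byte[]> frames = new ArrayList<>();
        buffer.flip(); // switch from write mode (filled by channel.read) to read mode
        while (buffer.remaining() >= HEADER_BYTES) {
            buffer.mark();
            int frameLength = buffer.getInt();     // size field at the packet's start
            if (buffer.remaining() < frameLength) {
                buffer.reset();                    // incomplete: wait for more data
                break;
            }
            byte[] frame = new byte[frameLength];
            buffer.get(frame);
            frames.add(frame);                     // complete: hand off for processing
        }
        buffer.compact(); // keep unread partial data at the front, back to write mode
        return frames;
    }
}
```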

According to the Java NIO library's definitions, a Channel supports several I/O operations, and a Selector polls to check whether those operations are ready; when one is, it returns a SelectionKey. The SelectionKey carries the parameters needed to perform the ready operation on the corresponding channel.
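For reference, here is a bare-bones polling loop using only the standard java.nio API, showing how a Selector hands back SelectionKeys for ready operations. The host, port, and buffer size are placeholders; the framework's actual I/O threads are of course more elaborate.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Sketch of one I/O thread: a Selector polls a non-blocking SocketChannel
// and dispatches on whichever operations are ready.
public class SelectorLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        channel.connect(new InetSocketAddress("example.com", 8080)); // placeholder address
        channel.register(selector, SelectionKey.OP_CONNECT);

        ByteBuffer readBuffer = ByteBuffer.allocate(64 * 1024);
        while (true) {
            selector.select(); // blocks until at least one channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isConnectable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    if (ch.finishConnect()) {
                        key.interestOps(SelectionKey.OP_READ); // now poll for reads
                    }
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    int n = ch.read(readBuffer); // these bytes would feed the packet splitter
                    if (n < 0) {
                        key.cancel();
                        ch.close();
                    }
                    readBuffer.clear(); // discarded here; the framework would buffer them
                }
            }
            selector.selectedKeys().clear(); // keys must be removed after handling
        }
    }
}
```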

Having read this far, there are no obvious problems on the client side; the approach used is mature and stable. The problem may well be on the server side, which is what I will comb through next.