Weekly Technical Report for the fourth week of November 2022

This week I have mostly been optimizing a service written in Java. Even when there is not much traffic in the production environment, batches of calls would still time out, and while the timeouts were happening the CPU usage was very low. I kept watching, and the CPU usage never went up. At that point I speculated that the threads were all blocked on some operation, and that this was causing the problem.

Most of the services I have worked with, including this one, are IO-intensive. They make a lot of RPC calls, and while an RPC call is in flight the worker thread blocks and cannot process other requests. So the worker thread count for this kind of service is usually set very high, to make sure there are spare threads for IO-bound requests and to keep subsequent requests from going unprocessed, and ultimately timing out in large numbers, when most of the threads are blocked. This is actually problematic: although Java uses the NIO model for handling sockets and parsing requests, the business logic of a request is still executed on a pool of worker threads. Once a worker thread blocks on an IO call, it can only wait for that call to complete or time out. If the result of an IO call is not needed for the final response, the call can instead be submitted to another thread pool, which makes it asynchronous from that worker thread’s point of view.

In the end, the troubleshooting showed that this service had a worker-thread-count configuration that was not being read because of a framework issue. As a result, the service started with the default worker thread setting, which is one worker thread per CPU core. So while the worker threads were handling downstream RPC calls, even a slight fluctuation in the downstream interface’s latency caused a large number of requests to time out. CPU usage stayed low the whole time, because the worker threads spent most of their time blocked, waiting for IO operations to return.

As for how to troubleshoot this: first, configure the logging framework to print the current thread name; for logback, that means adding %t to the log pattern. Then run a pressure test against a single node in the test environment and watch how the service handles requests at a given QPS. It is also worth adding a StopWatch to the request-handling logic to assist the analysis (a small sketch of this follows below). The logs turned out to contain output from only a handful of threads, which was clearly not right. I then used jstack to filter and count the threads, and there were indeed only four. The explanation is that the framework this Java service uses is built on Netty for request handling, and the request logic is ultimately forwarded to the Netty worker thread pool, whose threads carry the prefix nioEventLoopGroup-5. The cumulative CPU time of those threads also helped prove that these four threads really were doing essentially all of the main request-processing work.
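To make that instrumentation concrete, here is a minimal sketch of the kind of timing probe I mean, assuming Spring’s StopWatch and SLF4J with logback; the class and method names are illustrative, not the service’s real code. With the thread name in the logback pattern, for example %d [%t] %-5level %logger - %msg%n, every log line also shows which worker thread handled the request, so under load you can simply count the distinct thread names that appear.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.StopWatch;

public class TimingProbe {

    private static final Logger log = LoggerFactory.getLogger(TimingProbe.class);

    // Hypothetical request handler, instrumented for the pressure test.
    public void handle(String requestId) {
        StopWatch watch = new StopWatch(requestId);

        watch.start("downstream RPC");
        callDownstream(requestId);   // stands in for the real blocking RPC call
        watch.stop();

        watch.start("local processing");
        process(requestId);          // stands in for the rest of the request logic
        watch.stop();

        // With %t in the pattern, this line also carries the worker thread name.
        log.info("{} finished: {}", requestId, watch.prettyPrint());
    }

    private void callDownstream(String requestId) { /* ... */ }

    private void process(String requestId) { /* ... */ }

    public static void main(String[] args) {
        new TimingProbe().handle("req-demo-1");
    }
}
```

If only a few distinct thread names show up at a QPS the node should comfortably handle, that points in the same direction as the jstack output did here: the worker pool is far smaller than intended.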
For this kind of service in production, the number of worker threads is generally set to around 800~1000, and the timeouts of RPC calls are set strictly, to prevent large numbers of worker threads from being blocked and the node’s throughput from dropping drastically. After updating the framework, the problem was solved.

Additionally, I discovered that one of the service’s main interfaces involves a downstream call over HTTP, made with the OkHttp library, and that the call did not set any timeout. This is wrong: if an extreme situation arises and the downstream is slow to return, the worker thread will block for a long time. You must therefore set reasonable timeouts on downstream calls, to keep the flow between upstream and downstream smooth and to prevent avalanches. There are generally three kinds of timeouts, connect, read, and write, and all of them need to be set. In addition, since there may be a large volume of HTTP calls, I turned on OkHttp’s ConnectionPool. According to the documentation, the advantage of the ConnectionPool is that multiple HTTP or HTTP/2 requests to the same address can share the same connection. Note, however, that this sharing only works if the server side supports HTTP long connections (keep-alive). A configuration sketch is included at the end of this entry.

Apart from that, this week I also mainly went through the material for Tencent Cloud’s Advanced Architect (TCP) certification, because the exam was scheduled for the weekend. This TCP exam is somewhat harder than the earlier architect and practitioner exams, and it still requires good preparation. I did not have much time to study at work, so I went over the important material straight through until 5:00 a.m. on Saturday, and fortunately I passed the exam in the end. I should write an article dedicated to this exam. ...
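Coming back to the OkHttp call mentioned above, here is a minimal configuration sketch using OkHttp’s builder API; the timeout values and pool size are placeholders and would have to be tuned to the downstream’s actual latency profile and call volume.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class DownstreamHttpClient {

    // One shared client: each OkHttpClient owns its own connection pool,
    // so reusing a single instance is what actually makes the pool shared.
    private static final OkHttpClient CLIENT = new OkHttpClient.Builder()
            .connectTimeout(500, TimeUnit.MILLISECONDS) // time to establish the connection
            .readTimeout(1, TimeUnit.SECONDS)           // time waiting for response data
            .writeTimeout(1, TimeUnit.SECONDS)          // time spent sending the request body
            // Idle connections kept for reuse; this only helps if the server
            // allows keep-alive (HTTP long connections) or HTTP/2.
            .connectionPool(new ConnectionPool(64, 5, TimeUnit.MINUTES))
            .build();

    public static String fetch(String url) throws IOException {
        Request request = new Request.Builder().url(url).build();
        try (Response response = CLIENT.newCall(request).execute()) {
            return response.body() != null ? response.body().string() : "";
        }
    }
}
```

With the timeouts in place, a slow downstream shows up as a fast, explicit failure instead of a worker thread silently blocked for a long time.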

November 27, 2022

Technical Review for the third week of November 2022

This is the first of my periodic reviews to include a technical problem I’ve encountered at work, so this installment is mostly a summary of experience accumulated over time, to lay some groundwork for future technical reviews.

I have been at the company for almost six months now, and I recently switched from client-side development to backend development. This is something I had hoped for, although it is not something I actually asked for. Personally, I feel that in the current career landscape of China’s tech industry, backend developers get to explore slightly more, and the scale of the problems they are exposed to is much larger. In fact, the client side is also promising: my first reasonably mature open-source project, GpgFrontend, is a client-side project. I invested a lot of time in it and solved a lot of problems, in particular build issues, platform compatibility issues, and stability issues. Neither the depth of exploration nor the difficulty of the problems was small. My current feeling is that the reason client-side and backend development get ranked against each other at all is that people’s minds are not at ease; everyone is always thinking about climbing upward. Besides, the pursuit of technology should not depend on whether it is client, front-end, or backend work; the technical ideas behind them are essentially shared. But nowadays the occupation is given too much weight, and the occupational division of labor ends up carving technology into sharp boundaries. I do not think this is good. So although I am currently a backend developer, I must not think of myself as only a backend developer, ignore other technologies and refuse to learn them, and box my own thinking in.

Now that my position has changed to backend development, I am not quite sure where to start. Speaking of backend development, as I remember it, my earliest backend project was an astronomy forum called Stelescope that I wrote in Node.js when I was 15. It used the Express framework, which was popular at the time. I still remember first coming into contact with technical concepts such as MongoDB, saving login and logout state, asynchronous callbacks, and so on. The one I remember most is asynchronous callbacks, which took me a long time to understand, because I did not even know the basic concepts of processes and threads, let alone asynchronous callbacks and closures. At that time the Internet was still arguing about the advantages and disadvantages of Node.js versus PHP, which scenarios Node.js suits, and whether asynchronous callbacks or multi-process parallelism is better. What I remember is that asynchronous callbacks must never block: when you run into a blocking operation, you use a non-blocking API instead. That API returns immediately, the blocking work is set aside to be handled elsewhere, and the main thread can keep going with the code that follows. When the blocking operation finishes, the main thread is notified and executes the callback to handle its result.
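Since the examples in this blog are in Java, here is a rough Java analogue of that callback model rather than real Node.js code: a single "main loop" thread that never blocks, a separate pool that does the slow work, and a callback posted back to the main loop once the work is done. All the names are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CallbackSketch {

    public static void main(String[] args) throws InterruptedException {
        // Analogue of the event loop: a single thread that must never block.
        ExecutorService mainLoop = Executors.newSingleThreadExecutor();
        // Pool that actually performs the blocking work.
        ExecutorService ioPool = Executors.newFixedThreadPool(4);

        mainLoop.submit(() -> {
            // The non-blocking call returns immediately; the slow read runs on ioPool.
            CompletableFuture
                    .supplyAsync(CallbackSketch::slowRead, ioPool)
                    // When the read finishes, the callback is posted back to the main loop.
                    .thenAcceptAsync(data -> System.out.println("callback got: " + data), mainLoop);
            System.out.println("main loop keeps handling other work");
        });

        Thread.sleep(1500); // crude wait so the demo finishes before shutdown
        ioPool.shutdown();
        mainLoop.shutdown();
    }

    private static String slowRead() {
        try {
            Thread.sleep(1000); // stands in for a blocking file or network read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "file contents";
    }
}
```

Node.js builds this structure into its runtime event loop, which is why you meet callbacks there long before you ever meet threads.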
Later I was also exposed to Python backend development and wrote a simple class attendance management system (SP). That was where I first met the idea of MVC, which I understand as Model, View, and Controller. It is an important idea. At that time the concept of a front end was not yet very clear, and separating front end from back end was not yet the mainstream approach. The server was responsible for generating dynamic pages: a request triggered the controller to respond, the controller computed against the model, a template engine finally rendered the model into a page, and the page was returned to the user’s browser. That was the whole process. I remember defining many templates on the server side, with holes left in chosen places for data to be filled in, or defining a small card control, putting it inside a for statement, and letting the template engine generate a whole stack of cards (a sketch of this follows below). Back then, I as the backend developer had to consider every aspect, including whether the page looked good, data security, response speed, and so on.
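That project was in Python, but to keep the examples here in Java, this is a minimal sketch of the "card in a for loop" kind of server-side rendering, using Thymeleaf as a stand-in template engine; the template and variable names are made up for illustration.

```java
import java.util.List;
import org.thymeleaf.TemplateEngine;
import org.thymeleaf.context.Context;
import org.thymeleaf.templateresolver.StringTemplateResolver;

public class CardRenderDemo {

    public static void main(String[] args) {
        // StringTemplateResolver treats the "template name" passed to process()
        // as the template text itself, which keeps the demo self-contained.
        TemplateEngine engine = new TemplateEngine();
        engine.setTemplateResolver(new StringTemplateResolver());

        // The small "card" control, repeated once per entry in the model.
        String template =
                "<div class=\"card\" th:each=\"name : ${students}\">"
              + "  <span th:text=\"${name}\">placeholder</span>"
              + "</div>";

        // The controller's job: fill the model, then let the engine render it.
        Context model = new Context();
        model.setVariable("students", List.of("Alice", "Bob", "Carol"));

        // The rendered HTML is what would be sent back to the browser.
        System.out.println(engine.process(template, model));
    }
}
```

The controller only fills the model; the template decides how many cards come out and what each one looks like.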
The year after that, I came into contact with the Spring framework, specifically with Spring Boot. That was when I really came to understand relational databases; it turned out that my previous understanding of them amounted to little more than installing and configuring them. I remember very clearly coming across the idea of separating the front end from the back end and thinking it was a good thing. I brought it up with Mr. Wang and said it would be best to use a front-end/back-end separation for our whole-person education management system. Mr. Wang was very open-minded, talked it over with me at length, and agreed with my proposal. Front-end/back-end separation, as the name suggests, moves dynamic page rendering to the user’s side, while the server is only responsible for processing and storing data. This helps with division of labor and decoupling, and although there was a lot of skepticism about it on the Internet at the time, I still believed it was the trend. Back then I was also obsessed with the RESTful style of API design, and I thought that if we followed it we might even be able to avoid writing project documentation. Reality is harsher: in practice, for some complex cases it is difficult to keep to the RESTful style. Working with Spring Boot taught me a great deal about the backend, and I am still drawing on that early experience today. I wrote a lot of Spring Boot projects during my undergraduate years. By the time I joined the company, I found that the department’s tech stack was Java and that, as it happens, the framework used for new projects was Spring Boot. Although I used to say I was put off by Java, finding it bloated and cumbersome, and I still do not have much affection for it, the Java ecosystem really is powerful: any component you want is easy to find, and Java components are mature, well maintained, and well documented. For production-oriented backend projects, the Java stack is genuinely hassle-free, and it is a good stack for finding a job.

I have been working at the company for half a year now, almost a year if the internship is counted. In general, on the backend side I have been tinkering with optimization, caching, threading, and the like. Every day I analyze all kinds of alarms, some of which are business issues and some of which are technical problems. For business problems, the only way is to understand the business background well. For technical problems, you need to broaden your knowledge and study quietly. There are also some problems I find hard to solve, for example: in the containerized deployment, there are always a few containers with occasional timeout problems, and I have not been able to determine whether the problem lies with the container or with the application behind it. I am now thinking that this area calls for some deeper knowledge, such as the virtualization of CPU, memory, and network.

That’s it for today; there is a bit of other stuff to do. ...

November 20, 2022