Weekly report for the first week of March 2023

Lately, compliance requirements have been creeping into the technical side of things. There is always some product coming to the table that wants us to implement this or that compliance requirement, or to fill in some compliance questionnaire. My impression is that compliance is mainly about how user information is stored: access needs to be standardized, and users should gradually gain control over their own data. There are also personnel and organizational changes that ripple down to the technical level. For example, one business line was transferred to another department, but it still shares our databases and other resources. That is when cost becomes the issue: the money should now be counted on their side. We are one company and one business group, but probably because of the internal accounting mechanism, these cost questions are taken quite seriously, and there are endless arguments along the lines of "they use our database, so the bill should land over there." Sometimes this feels like pure internal friction rather than anything driven by a development perspective. On top of that, when other businesses need to adjust shared database structures or interfaces because of their own requirement changes, it gets awkward: we spend man-hours doing this work for them and get no recognition for it, yet their people chase and pressure us every day. Being caught in the middle is hard.

Then there is the PHP gateway migration I mentioned in my last weekly report, which has actually been in progress for a month or two. The new code was written long ago, but switching traffic over to the new service kept running into one problem after another. The pattern was always the same: after switching traffic in the test environment, nobody reported anything for a long time; as soon as the production environment was switched, people came one after another saying this interface suddenly stopped working or that interface was throwing errors. Since the problems appeared online, the only option was to cut the traffic back first and analyze afterwards. One round of this easily eats a week, and sometimes it took four or five days before anyone even reported a problem, so the whole process dragged on even longer. The leader kept asking why this was taking so long, but that was simply the reality. The subsequent coding and maintenance of the new service is not my job, but I do understand that replacing an existing service with a new one has to be a process: with no documentation and no test cases, you do not know what hidden mechanisms an interface has, and you cannot be entirely sure that the code you wrote is equivalent to the original. A lot of this implementation-level work was never explicitly reported, but I feel management has to be able to appreciate it anyway. The process was repetitive and difficult, but the refactoring of this service was finally completed this week and all traffic has been migrated to the new service. The life cycle of the old PHP gateway is over.

I have been in charge of the main backend business for almost half a year now. When I first took it over, my lack of experience made other people's technology seem unfathomable. After being in close contact with it for a long time, I realize it has plenty of imperfections of its own.
There are also design flaws, some of them quite serious. One example is using Redis as a database: to put it bluntly, no cache expiration time is set, yet the data inside that Redis is important and cannot be lost. There are also old services with no documentation whose maintainers left long ago; normally nobody can even begin to scrutinize them, yet there is always one or two of them still being accessed, or long since broken until a customer suddenly notices and presses us to fix them. Plenty of security issues need rectifying as well: a number of API accounts hold high-level permissions that need to be tightened, but I have no idea what those accounts are actually used for. Problems like these come up constantly in daily work, and they are hard to avoid. ...

November 15, 2023

Weekly Technical Report for the fourth week of February 2023

This week I realized that some of our service frameworks are not written very well, certain Java frameworks in particular. Once CPU usage reaches about 40%, a large number of timeouts appear. For these services it is not that CPU cores or memory are insufficient, nor that there are too few worker threads; the CPU simply cannot be driven any higher. Analyzing with Java performance tools showed that most of the worker threads are actually sitting in idle (WAITING or TIMED_WAITING) states. Putting all the evidence together, I am still puzzled: NIO is used, the Netty framework is used, yet the throughput will not go up. Thread analysis found no particularly busy business threads, so my inference is that I/O or some kind of waiting mechanism is behind the low processing efficiency. This time I had planned to cut costs by reducing the number of nodes, and I did not consider many factors before executing it; when shrinking capacity I only checked CPU load to judge whether the remaining nodes could cope. When the average CPU utilization of the Beijing workload rose to 35%-40%, the whole service saw massive timeouts, which shocked me at the time. Looking at the monitoring afterwards, almost every node in that backend region had timed out; the whole thing was in a state of "shock". This week I also continued refactoring the old PHP gateway service; the new gateway is written in Java. I am not really in favor of using Java for a gateway, since the execution characteristics of the language largely make it unsuitable for particularly high concurrency. We are currently on JDK 8, which has no lightweight threads, and each 4c8g container tops out at around 800 threads; beyond that, thread-switching overhead becomes particularly large. So single-container throughput is limited, and carrying the same amount of traffic requires more containers. I had originally rewritten it in Go, using a framework that is very good and has a dedicated team maintaining it, but the leader still wanted me to use the department's self-developed Java framework, probably mostly for personnel reasons. I had no choice but to write it that way first. ...
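For reference, the kind of thread-state check I am describing can be sketched with the JDK's own ThreadMXBean; this is an illustrative snippet rather than the exact tooling I used (in practice jstack and the usual profilers serve the same purpose):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSummary {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump all live threads without lock/monitor details.
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);

        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : infos) {
            if (info == null) continue;
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }

        // If most worker threads are WAITING/TIMED_WAITING rather than RUNNABLE,
        // the bottleneck is more likely I/O or some waiting mechanism than CPU.
        counts.forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```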

November 15, 2023

Technical Weekly Report for the third week of February 2023

This week was focused on dealing with a risk item discovered before the holidays: a service that uses Redis without setting a TTL on its keys, relying instead on Redis' eviction policy. The Redis instance in question is configured with LRU eviction. That may sound fine, but there is a pitfall when a burst of write traffic arrives within a short period: Redis triggers eviction and concentrates its effort on freeing enough space, which means it can no longer serve normal operations such as queries very well. The result is dramatic fluctuation in both read and write latency as seen from the business layer. I have hit this problem twice so far. On top of that, with no TTLs the instance always sits at 100% utilization, so we cannot tell from capacity alone how much purchased capacity the current business volume actually needs, nor whether we could scale down during off-peak hours to reduce cost. Leaving Redis enough free space is therefore very important. So I need to modify the service to set a TTL on every newly written key; by Redis' semantics, writing a key that already exists will set its TTL as well. For the sake of business stability I do not want every existing key to suddenly acquire a TTL, so I discarded the alternative of traversing Redis to look for keys without one (I did research that approach; the traversal has to be done with a cursor). The length of the TTL is also delicate: to prevent a large number of keys from expiring at the same time and causing big latency swings, the TTL is set to a base duration plus a random extra. If the base duration is T, the final TTL falls between T and 2T, so the longer a key is meant to stay, the wider the window over which its expiration is spread. This keeps business latency smoother and avoids putting sudden pressure on the database behind the cache. ...
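A minimal sketch of the jittered-TTL write, assuming the Jedis client; the key names and the base duration are placeholders, not the service's real values:

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitteredTtlWriter {
    // Base TTL T in seconds; the effective TTL is uniform in [T, 2T).
    private static final int BASE_TTL_SECONDS = 24 * 3600;

    private final Jedis jedis;

    public JitteredTtlWriter(Jedis jedis) {
        this.jedis = jedis;
    }

    public void put(String key, String value) {
        int ttl = BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(BASE_TTL_SECONDS);
        // SETEX (re)applies the TTL even when the key already exists, so keys that
        // are rewritten gradually acquire an expiration without a separate scan.
        jedis.setex(key, ttl, value);
    }
}
```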

June 21, 2023

Weekly Technical Report for February 2, 2023

The end of January and the beginning of February fall during Chinese New Year. Over that period, whoever is responsible for keeping things running through the Spring Festival has to be on call to deal with online issues. I was in a constant state of worry, but fortunately no online problems came looking for me; keeping everything untouched through the holiday is the best outcome. This week I evaluated the impact of a major requirement. For a new business requirement, especially one applied to a complex business system, there are many kinds of impact to consider. If at that point you are not particularly familiar with the system and have little experience with it, the best choice is to make the smallest change possible. That is not conservatism; it is keeping the blast radius as small as you can imagine, because you do not know where some counterintuitive mechanism is quietly carrying important business logic. I did not arrive at this conclusion by imagination: this entry was written six months later, and by the time I wrote it I had already run into this at least twice. Back then I made a drastic adjustment to a service; when the change was done everything looked fine, and after release it still seemed normal. Only a number of weeks later did I stumble upon a mechanism that strung upstream and downstream together and had very nearly been broken by my change. A business system you take over has most likely passed through many hands and hides a great deal of history you know nothing about, so do not touch the framework or the core logic if you can possibly avoid it. This week I also completely solved a problem with a security encryption service whose encryption interface could not handle Chinese. The root cause is that it uses the plaintext directly as the Redis key; when the plaintext contains Chinese, the key is stored but can never be found again. My fix is that the stored key should never directly contain any business plaintext: hash it first. That avoids the encoding and compatibility problems, and also improves security considerably. ...
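A minimal sketch of the fix, assuming SHA-256 and a simple key prefix; both choices are mine for illustration and may differ from what the service actually uses:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class CacheKeys {
    private CacheKeys() {}

    /** Builds a Redis key that never embeds the raw business plaintext. */
    public static String forPlaintext(String prefix, String plaintext) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(plaintext.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            // e.g. "enc:3a7bd3e2..." -- ASCII only, so Chinese text in the plaintext
            // can no longer break key lookup, and the original content is not
            // exposed in the key space.
            return prefix + ":" + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```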

June 21, 2023

Weekly Technical Report for the second week of January 2023

This week was mainly about ensuring the stability of various services in the run-up to Chinese New Year. I had recently noticed that a certain service often reported timeouts during peak traffic hours, and I forwarded a reminder to the service owner to deal with it. A few days later the owner still could not explain the cause, so I had to handle the problem myself, because the alarms had become serious: the timeout rate on some nodes reached 20%. Traffic had risen sharply as the holidays approached, up about 100% compared with the end of December, so the first suspicion was that the service simply lacked capacity, and I expanded it. The expansion did not help; alarm frequency and timeout rate stayed basically unchanged. I then read the service's source code and found that its interface first calls a downstream service and then inserts data into the database asynchronously. Since the asynchronous write does not block the worker thread, the downstream call should have been the first suspect (in reality I wrestled with the database side for a long time). Logging into a node to check the logs, I found that every downstream call was going to the same IP and port, through a Scarecrow node. So I first moved the service to the cloud to rule out any effect of the Scarecrow forwarding. After the migration, as soon as I cut traffic over, a large number of timeouts appeared in one region (not the original one); expanding capacity there did not resolve it, which puzzled me, so I temporarily cut the traffic back. I was very curious why the timeout rate got worse after moving to the cloud. Analyzing the cloud monitoring, I found that after the traffic switch one node of the downstream service had very high CPU usage while the others were nearly idle. That pointed to load balancing, so I went back to the source code. Sure enough, after the service fetched the list of available downstream nodes from the registry, it only ever called the first item in the list. So almost all the traffic from that region was concentrated on a single node of the downstream service. When the calls had gone through the Scarecrow, the Scarecrow itself did load balancing across the cloud nodes. So the original problem was that traffic had grown a lot and this service did no load balancing, which overloaded one of the Scarecrows and in turn caused the timeouts. The Scarecrows are proxy nodes with no business logic of their own, so they can handle far more concurrency and the timeouts were not very noticeable there. Once on the cloud, the traffic landed directly on one business node, and the sudden influx instantly produced a flood of timeouts on it. With that hypothesis, I went back to the Scarecrow monitoring and confirmed that one Scarecrow had indeed been overloaded, with CPU usage above 95%.
Knowing the cause, the next step was to continue the move to the cloud: first eliminate the timeout alarms as far as possible, then change the code to add a random load-balancing algorithm. I did it in that order because modifying code right before the holiday is troublesome process-wise. So I first doubled the number of cores on the single overloaded node of the downstream service, which greatly increased its processing capacity. Then, when cutting traffic over, the CPU usage of that one downstream node shot up and settled at a much higher level than the others, exactly as expected. Once the whole service smoothed out, the migration was complete; by then the alarms were gone and the timeout rate had dropped to zero. The next step was to modify the calling service's code to add the load-balancing mechanism and, after the various approvals, release a new version. After the release, CPU usage across the downstream nodes was nearly even, and the problem was finally solved. ...
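For reference, the random selection amounts to something like the following; the generic node type and the surrounding wiring are simplified placeholders rather than the framework's real classes:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class RandomLoadBalancer {
    /**
     * Picks a random node from the registry's list of available nodes,
     * instead of always taking the first entry as the old code did.
     */
    public <T> T select(List<T> nodes) {
        if (nodes == null || nodes.isEmpty()) {
            throw new IllegalStateException("no available downstream nodes");
        }
        return nodes.get(ThreadLocalRandom.current().nextInt(nodes.size()));
    }
}
```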

February 14, 2023

Weekly Technical Report for January 1, 2023

Moving into 2023, this is going to be a tough year, and it brings several challenges. One is migrating all the data previously deployed on physical servers to the cloud. Another is accelerating the growth of several new team members so they can take on the services behind the current main business as soon as possible, and eventually resolve user issues and optimize those services independently; that would let me hand over some of the work and focus on the important goals that are expected to take most of this year. And on a personal level, my technical and other learning has reached a stage that will determine my direction for the next 7-8 years. This week my main focus was the design and specification of the logging framework and log tracing for several services. The first problem is log tracing: to follow the logs generated across services along a single call chain, the TraceId has to be unified. However, the TraceId types of these services are not uniform today; some use a Long, some use a string, and their languages and technology stacks differ as well. Directly adopting the TraceId of a standard distributed tracing framework probably cannot cover all of the services as they stand, only the ones with relatively new stacks. So, for compatibility and ease of adoption, we are going to use a custom-generated Long value as the TraceId and limit it to 16 digits. The first four digits begin with 99, which marks it as a unified TraceId, and the remaining two of those four identify the service; the next four digits are the current microsecond component; and the last eight digits are two groups of four random digits spliced together. This TraceId does not guarantee uniqueness, but it is sufficient for the current situation. For a Java service, generation needs to take thread contention into account, so it is best to give each thread its own random number generator, or simply use a ThreadLocal. A Java service extracts the TraceId from the request it is handling: if it starts with 99, no new TraceId is generated; if not, a TraceId is generated as described above and stored in the MDC. When execution goes asynchronous, the MDC contents must be copied to the worker thread, otherwise the trace information is lost. When a downstream service needs to be invoked, the stored TraceId is passed along with the call. Finally, when the request has been processed, the MDC has to be cleared to avoid polluting the trace information of the next request. ...
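A minimal sketch of the scheme described above; the helper names are mine, and only the 16-digit layout ("99" + service id + microseconds + random digits) and the MDC handling follow the design:

```java
import java.util.concurrent.ThreadLocalRandom;
import org.slf4j.MDC;

public final class TraceIds {
    private static final String TRACE_KEY = "traceId";
    private TraceIds() {}

    /** 16 digits: "99" + 2-digit service id + 4 digits of microseconds + 8 random digits. */
    public static long generate(int serviceId) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();                 // avoids thread contention
        long micros = (System.nanoTime() / 1_000) % 10_000;                  // 4-digit microsecond component
        long random = rnd.nextLong(10_000) * 10_000 + rnd.nextLong(10_000);  // two groups of 4 random digits
        return ((99L * 100 + serviceId) * 10_000 + micros) * 100_000_000L + random;
    }

    /** Reuse an incoming "99..." TraceId, otherwise mint a new one, and put it into the MDC. */
    public static long ensure(String incoming, int serviceId) {
        long traceId = (incoming != null && incoming.length() == 16 && incoming.startsWith("99"))
                ? Long.parseLong(incoming)
                : generate(serviceId);
        MDC.put(TRACE_KEY, Long.toString(traceId));
        return traceId;
    }

    /** Call when the request is finished so the next request is not polluted. */
    public static void clear() {
        MDC.clear();
    }
}
```

For the asynchronous case, the usual pattern is MDC.getCopyOfContextMap() on the calling thread and MDC.setContextMap(...) on the worker thread, so the trace context survives the hop.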

January 20, 2023

Technical Review for the fourth week of December 2022

This week I caught COVID-19 and was home for a total of nine days. During that time, the most important piece of work was assessing the impact of the promotion and launch of a mini program on the basic service system I am responsible for. The mini program hit a real need for people in China at the time, and it was expected to bring a large influx of traffic that might affect the core services of the system. They had already launched a high-traffic feature earlier, so I had evaluated and expanded capacity once. This time, however, after they pushed notifications to hundreds of millions of users, a large number of timeouts occurred. At 8:00 am, while lying in bed recuperating, I was called up and told that the login interface was timing out massively. I immediately took out my laptop, connected to the intranet, and saw that traffic on that interface had grown more than 20-fold. I broke into a sweat. My first suspicion was that there were too few business containers to handle that much traffic, but going through the logs showed that was not the problem. The problem actually had two parts: one was an internal interface whose rate limit had been exceeded by the current traffic; the other was that the database behind one of the caches had been driven to saturation. The rate limit was relatively easy to deal with: an emergency call to the responsible colleagues got the threshold raised. The cache database was worse: the influx of a large number of old users forced reads through to the backing database to load old data into the cache, and as traffic kept climbing, the backing database's I/O was eventually maxed out. Afterwards, several core cache databases were expanded once more, increasing cache capacity and node counts to cope with even larger traffic and concurrency later on. On the routine maintenance side, the main task was adjusting the k8s scheduling policy for a gateway service's Pods so that they are scheduled onto compute nodes backed directly by CVM virtualization (one-tier scheduling). Originally, this service was scheduled onto compute nodes virtualized from physical servers our company had purchased. On such a node, an execution space is virtualized on the CVM (think of Docker), and Pods are then scheduled into it to run (two-tier scheduling). A single compute node may carry several or even a dozen Pods, all running on the same operating system but in different execution spaces. The isolation of CPU, memory, network card, and other resources is poor, so the Pods often interfere with each other. When that CVM compute node needs a reboot or an upgrade, every Pod on it is evicted, and any problem in the node's operating system affects all of its Pods. None of this is suitable for a gateway, a service with demanding reliability and latency requirements. With one-tier scheduling, by contrast, the cloud allocates the compute resources and virtualizes a CVM to run the workload on its own, with CVM resources drawn from the entire cloud resource pool. This approach also works with k8s, because the new scheduling mode is exposed to k8s as a kind of "super node".
k8s can schedule Pods directly onto the super node, so each Pod runs on one-tier scheduling, that is, directly on a CVM of its own. Isolation is much stronger and Pods no longer affect each other. Since each CVM runs only one Pod, when a single Pod has a problem it can be fixed on its own without affecting the other Pods. After I modified the scheduling policy and moved all of the service's Pods onto the super node, the whole service has run much more stably, and the situation where both the timeout rate and CPU utilization spike during peak hours rarely occurs any more. ...

January 19, 2023

Technical Review for the third week of December 2022

This week the main work was optimizing a certain Java service. The service has long had the problem that its CPU usage cannot be pushed up. The first thing to consider was whether it has too few worker threads. Later I found that the real issue is not that CPU usage cannot be raised, but that raising it leads to more timeouts. There was feedback long ago that this framework's performance is insufficient and that it is not recommended for continued use, so my feeling is that the problem lies in the framework rather than the business code. After reading through the framework code: it uses Netty as the NIO server framework and dispatches business processing tasks to worker threads, which then execute the business logic. A previous problem with this framework was that the number of worker threads was too low to cope with IO-heavy situations, but this time it is not the same problem; the logs show the worker thread count is sufficient. Could it be a client-side issue? Within the overall microservice architecture this service also acts as a client calling other services' interfaces, so that is a possible entry point. Reading the code, I found that the framework generates a proxy class, ObjectProxy, by implementing Java's InvocationHandler interface, and this proxy takes over RPC calls to other services. When business code initiates an RPC call to another service's interface, ObjectProxy works with the ProtocolInvoker to obtain the target service's list of valid nodes (the list is refreshed every 30s), passes the list to the LoadBalancer to pick the target node for this call, and then invokes that node through the protocol-specific Invoker class. The Invoker manages the long-lived connections to the target service and, at call time, selects one connection to send the request and receive the response; the exact request path depends on whether the call is synchronous or asynchronous. Each target service the client needs to invoke consists of multiple nodes. For each node the framework creates two I/O threads by default for network I/O (NIO mode), and it creates a number of TCP connections per node equal to the number of processors. Each I/O thread holds a selector that polls events on its connections. A TCPSession is created for each TCP connection, and every time a request is sent a Ticket is created to track the request and its response. A synchronous request blocks after sending until the response arrives; an asynchronous request stores a callback in the Ticket, and when the response arrives, the TicketNumber (the Ticket's unique index) is used to locate the Ticket and invoke the pre-populated callback for further processing. For NIO the framework relies on the NIO library that Java provides; the TCP connections mentioned above are in fact the NIO library's SocketChannel.
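To make the proxy mechanism concrete, here is a stripped-down sketch of an InvocationHandler-based RPC proxy of this kind; the NodeDirectory/LoadBalancer/Invoker interfaces are illustrative stand-ins for the framework's ProtocolInvoker, LoadBalancer and Invoker roles, not its actual source:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

// Illustrative stand-ins for the roles described above.
interface NodeDirectory { List<String> validNodes(String service); }   // list refreshed periodically
interface LoadBalancer  { String pick(List<String> nodes); }
interface Invoker       { Object call(String node, Method method, Object[] args) throws Exception; }

public class ObjectProxySketch implements InvocationHandler {
    private final String service;
    private final NodeDirectory directory;
    private final LoadBalancer loadBalancer;
    private final Invoker invoker;

    public ObjectProxySketch(String service, NodeDirectory d, LoadBalancer lb, Invoker inv) {
        this.service = service; this.directory = d; this.loadBalancer = lb; this.invoker = inv;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        List<String> nodes = directory.validNodes(service);   // current valid node list
        String target = loadBalancer.pick(nodes);              // choose one node for this call
        return invoker.call(target, method, args);             // send over a pooled connection
    }

    @SuppressWarnings("unchecked")
    public static <T> T create(Class<T> api, String service, NodeDirectory d, LoadBalancer lb, Invoker inv) {
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api},
                new ObjectProxySketch(service, d, lb, inv));
    }
}
```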
As for how the framework splits packets: it keeps a Buffer holding the data read so far, and the RPC protocol we commonly use carries the packet's byte count in the packet header, so comparing the bytes accumulated in the Buffer with that count tells you whether a full packet has arrived. If the packet has not been fully read, the framework keeps waiting for the rest of the data; once it has, it slices off the number of bytes given in the header and hands that data over for processing (see the sketch below). At that point the framework takes an additional worker thread from a thread pool to do the subsequent processing of the Ticket. From the existing code, this worker pool is used only for that purpose; its default thread count equals the number of cores and its maximum is twice the number of cores. By the Java NIO library's definitions, a Channel declares several I/O operations, and the Selector polls to check whether those operations are ready; when one is, it returns a SelectionKey carrying the parameters needed to operate on the ready channel correctly. Having read this far, there is no obvious problem on the client side; the approach used is basically mature and stable. The problem may well be on the server side, which I will have to get familiar with and comb through next. ...
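The packet-splitting logic described above boils down to roughly the following; this sketch assumes a 4-byte big-endian length header that counts only the body, which may differ from the real protocol:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class LengthFieldFrameSplitter {
    private static final int HEADER_BYTES = 4;   // assumed: 4-byte big-endian body length

    // Accumulates whatever has been read from the channel so far.
    private ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);

    /** Append newly read bytes and return every complete frame body found so far. */
    public List<byte[]> feed(byte[] chunk) {
        ensureCapacity(chunk.length);
        buffer.put(chunk);
        buffer.flip();                                    // switch to reading mode

        List<byte[]> frames = new ArrayList<>();
        while (buffer.remaining() >= HEADER_BYTES) {
            buffer.mark();
            int bodyLength = buffer.getInt();
            if (buffer.remaining() < bodyLength) {        // not fully read yet: wait for more data
                buffer.reset();
                break;
            }
            byte[] body = new byte[bodyLength];           // full frame available: slice it off
            buffer.get(body);
            frames.add(body);
        }
        buffer.compact();                                  // keep any trailing partial frame
        return frames;
    }

    private void ensureCapacity(int extra) {
        if (buffer.remaining() < extra) {
            ByteBuffer bigger = ByteBuffer.allocate((buffer.position() + extra) * 2);
            buffer.flip();
            bigger.put(buffer);
            buffer = bigger;
        }
    }
}
```

Netty's LengthFieldBasedFrameDecoder implements the same idea out of the box.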

January 19, 2023

Weekly Technical Report for the second week of December 2022

This week's work was mainly a sorting-out of the areas I am responsible for, and quite a few problems have surfaced so far. They mostly revolve around data on the cloud: how to migrate to the cloud safely, how to evolve the current single-region deployment scheme, and how to fix inconsistencies between the off-cloud and on-cloud data. In addition, some services are still using off-cloud databases that really ought to be retired. But these are old services, and code changes carry risk, so investigation has to come before action. The investigation covers the basic principles of the existing data-migration auxiliary services and the details of the related code logic; it is best to uncover problems as early as possible and fix them promptly. On top of that, the migration process needs real-time monitoring: as comprehensive a grasp as possible of interface call quality, timeout rate, write failure rate, inconsistency rate, and so on, best obtained from both report monitoring and log monitoring. For the multi-region deployment scheme, the current intention is to start from these principles: one master with multiple replicas, one-way master-to-replica replication, writes go only to the master, reads go only to the replicas. The main purpose of deploying in multiple regions is to improve service stability, reduce latency for most requests, improve service quality, and remove the impact of unstable cross-region links. The replication delay of a multi-region deployment cannot be ignored; there has to be an acceptable bound on it, understood both from theory and from monitoring. Separately, it really pays to master a scripting language, especially when there is a pile of repetitive work to process or some data to analyze before reaching a conclusion; being reasonably fluent in something like Python is a big advantage. That said, I do not think it would be wise to use Python to write a large program. Every programming language is like a different knife: all of them can cut vegetables, but some are better suited to cutting meat or bones. ...

December 13, 2022

Weekly Technical Report for the first week of December 2022

This week's work, in summary, was putting a core service on the cloud and then progressively turning the off-cloud nodes into traffic-forwarding nodes. The first step of going to the cloud is deploying the service in the cloud environment: migrate the configuration files and environment, build an image for the cloud environment from the stable version of the code, and get the service running there. Once it is running and tested, the on-cloud nodes still carry no traffic, so the next step is to forward part of the off-cloud traffic to the cloud. To do that, some off-cloud nodes are replaced with forwarding nodes, whose job is to relay traffic from the callers to the on-cloud nodes. This slice of traffic can then be used to observe how the on-cloud nodes behave and to check for anomalies, which can be called "traffic grayscale". The grayscale is generally kept at 1%-5% of overall traffic, adjusted according to how critical the service is; in the test environment it can be more aggressive, say 25%-50%. The point of the grayscale is that when a problem occurs, or a problem ticket comes in, you only need to shut down the off-cloud forwarding nodes. Naturally, all of the above has to be exercised in the test environment first, and only after it checks out should the same procedure be carefully applied to production. Some backend systems are intricate, with one call spanning multiple services and confusing business logic; in that case try to control the variables, make only one change at a time, observe for a while, and move on to the next operation only after things have stabilized. Alternatively, split the big step into several small ones, do one small step at a time, observe, then push on to the next; this is less error-prone.

Next, if the grayscale verification passes (it usually lasts a week), the step after that is switching routes. Switching routes means pointing the service's routes directly at the on-cloud nodes; after that, the other on-cloud caller services access this service's on-cloud nodes directly and no longer touch the off-cloud ones. Special care is needed while switching routes, because a large amount of online traffic will hit the on-cloud nodes directly. Before the operation, the number of nodes needed in each region on the cloud has to be calculated from historical data, generally sized for peak traffic, and capacity expanded if it falls short, so that insufficient capacity does not cause a flood of online timeouts. At this stage it is better to have too many on-cloud nodes than too few: extra nodes can be scaled back gradually later at little cost, whereas too few nodes means a pile of timeouts, and expansion takes time (mostly resource scheduling and service startup), which is likely to trigger user complaints. If the timeouts set off a wave of client retries, they may bring the whole service down and cause an online incident. At the time, I picked a low-traffic window so as to minimize the impact of any jitter from the switch.
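As a back-of-the-envelope illustration of the capacity calculation mentioned above; the traffic numbers and the headroom factor are assumptions, not the real service's figures:

```java
public class CapacityEstimate {
    public static void main(String[] args) {
        double peakQps = 60_000;        // assumed peak traffic for one region
        double perNodeQps = 2_500;      // assumed sustainable throughput of a single node
        double headroom = 1.5;          // keep ~50% spare so a surge or a lost node does not cause timeouts

        int nodes = (int) Math.ceil(peakQps * headroom / perNodeQps);
        System.out.println("nodes needed: " + nodes);   // 36 in this example
    }
}
```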
After the route switch, the off-cloud nodes can gradually be converted into forwarding nodes, sending all remaining off-cloud traffic to the cloud for processing. Then the owners of the calling services are notified and encouraged to move their services to the cloud as well, because forwarding from off-cloud nodes to the cloud has its own overhead and adds latency. ...

December 9, 2022