MySpleen: News

MirrorWatch

There have been a couple of outages recently, and they are related to changes we have made on the back end. We recently switched the site off of Apache (again) and went with nginx (we have used lighttpd in the past too, but it's been a while), which has cut the site's load average in half and improved overall stats.
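For context, the new setup is the usual nginx-in-front-of-php-fpm arrangement. A minimal sketch of what a server block like ours might look like (the domain, docroot, and socket path below are placeholders, not our actual config):

server {
    listen 80;
    server_name example.org;        # placeholder domain
    root /var/www/site;             # placeholder docroot
    index index.php;

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;   # socket path is an assumption
    }
}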

The downside was that it doubled all of the log file sizes, and I did not initially manage log rotation correctly, and MySQL has a fit when you run out of disk space. Log rotation should be fixed now, and the server's disk space was also expanded to give more headroom.
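If you're wondering what "managing log rotation correctly" amounts to, it's essentially a logrotate rule along the lines of the stock nginx one below; our actual retention and paths may differ, so treat the numbers as illustrative:

/var/log/nginx/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # tell nginx to reopen its log files after rotation
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 "$(cat /var/run/nginx.pid)"
        fi
    endscript
}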

Why did we change this stuff? I was unhappy with the frequency with which 503s were being returned to clients, and sometimes request times on the site would be ridiculously bad, then blazing fast again on refresh. Digging into the logs, it turned out that our dynamic php-fpm pool configuration was not spawning new workers fast enough (mostly because of our bursty announce/scrape traffic), and that was the root cause of most of the 503 responses from the site.
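For the curious, "dynamic" here refers to php-fpm's process manager mode: with pm = dynamic, the pool only spawns extra workers as the spare-worker count runs low, so a sudden burst can outrun it. The shape of that config looks roughly like this (values are illustrative, not our production numbers):

; dynamic pool: workers are spawned/reaped based on spare-worker thresholds,
; which lags behind sudden announce/scrape bursts
pm = dynamic
pm.max_children = 50
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20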

The fix was twofold. One, we went with a static php-fpm pool, so we either run out of workers at the hard cap or we are fine; no php-fpm scaling slowness can be at fault. Two, to help keep those workers available, we also added rate limiting to all of our php endpoints. If you (or really, your client) send too many requests at once, nginx will not pass the request on to php-fpm and will return a 429 status code instead. So if you see a bunch of 'could not parse bencoded data' errors from the tracker in your client, you probably got a 429 response. Your client will retry (typically after ~30 seconds) until it gets through, and everything will be fine. This has helped spread out the announce bursts from all of our seeders and seems to have improved site response times and availability. As long as I don't run it out of disk space again, it should be great!
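In config terms, the two changes look roughly like the sketch below. The numbers are illustrative, not our production values. First the static pool, then the nginx rate limit that produces the 429:

; php-fpm pool: all workers are started up front, so there is no spawn lag
pm = static
pm.max_children = 50

# nginx (zone goes in the http {} block): limit each IP to a sustained rate
# on the php endpoints, with a small burst allowance; over-limit requests
# get a 429 instead of being queued for php-fpm
limit_req_zone $binary_remote_addr zone=php_limit:10m rate=5r/s;

location ~ \.php$ {
    limit_req zone=php_limit burst=10 nodelay;
    limit_req_status 429;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;   # socket path is an assumption
}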