How docutils broke our production
On 21.07.2019, a beautiful sunday, we got some alerts from our production servers. Around 95% of the requests were getting 500 “Internal Server Errors” and no one modified or touched anything.
We started investigating the cause. Soon we realized that our system got a high load that triggered the autoscaling to start new instances. We don’t have a pre-made image so when our instances start, they install all the python libraries from scratch. One of the libraries that we use was dependent on another library called docutils and they didn’t freeze the version. So instead of requiring “docutils==0.14” they just required “docutils”.
Because of this, our new instances got the new version (0.15) that was just released on that day. And this version had some python 3.x code left in it which was giving a syntax error on our python 2.7 backend. It was sad to see this happening. I found a bug report about the issue and also left a comment.
We had to require docutils ourselves, freezing it to the previous working version and it all came back to normal. I think this mistake caused a lot of other projects to fail and it reminded me of the leftPad failure of the javascript world even though it is not exactly the same thing.
Leave a Reply