On a daily basis our DevOps Consultants are presented with challenges that require the sharpest of minds to solve. This is one such case.
IP addresses and host names have been anonymized in the output below.
Our client reported a strange issue within their OpenStack cluster. HTTP requests to a certain third-party service would wait sixty seconds before finally completing successfully with the expected data. A request that stalls for sixty seconds will usually result in a timeout from one of the hosts involved, so returning successfully is very unusual. Even with a successful response, a sixty-second GET within the end user’s request flow still caused a bad experience (either timing out the user’s request, or at least making them wait a very long time).
Looking at the affected requests, we trimmed them down to a simple reproduction case:
<?php
$url = 'https://api.example.com/simplerequest.php';
$result = file_get_contents($url);
print($result);
?>
Running this simple snippet from the PHP CLI on a VM reproduced the exact same behavior 100% of the time. There’s nothing special going on here: it is a trivial GET request to a third-party service via PHP. Experimenting with the case we quickly found some more details:
These facts revealed a specific combination of factors, and changing any one of them would cause the request to work fine. Only the combination of making a request to this host, via PHP, on a VM in our cluster caused the strange stalls to occur.
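When varying one factor at a time like this, it helps to measure the stall rather than eyeball it. The following is a minimal sketch of how the reproduction case could be timed; it is not part of the original investigation. The helper name `timed_get` and its optional `$fetch` parameter are assumptions introduced purely for illustration, and the URL is the anonymized placeholder from the snippet above.

```php
<?php
// Hypothetical helper (not from the original investigation): fetch a URL
// and report how long the call took, in seconds.
function timed_get(string $url, ?callable $fetch = null): array
{
    // Default to the same call used in the reproduction case.
    $fetch = $fetch ?? 'file_get_contents';

    $start = microtime(true);          // wall-clock time before the request
    $body = $fetch($url);              // false on failure with file_get_contents
    $elapsed = microtime(true) - $start;

    return ['body' => $body, 'seconds' => $elapsed];
}

// Example usage (placeholder URL from the reproduction case):
// $r = timed_get('https://api.example.com/simplerequest.php');
// printf("%.2f s, %d bytes\n", $r['seconds'], strlen((string) $r['body']));
```

Passing a different callable as `$fetch` would let the same timing wrap another HTTP client, which is one way to check whether a stall is specific to a particular request path.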