Business Challenge
A worldwide insurance platform expanding its operations wanted to make it easy for engineers to ship applications from development through a mature CI/CD pipeline into production. They wanted to containerize all of their applications for parity between environments so they could meet the scaling challenges coming their way. Multi-region resiliency, role-based access controls, and compliance with international regulatory requirements were all mandatory.
DevOps Transformation
The engineering team is distributed worldwide, so communication and collaboration are critical. Operations had to be automated from the moment an engineer checked in code and opened a pull request to the moment it was deployed to production. We used Docker and Kubernetes to build their infrastructure on Amazon Web Services, deploying one AWS account per environment plus a management account that oversees operations across them, handles monitoring and log aggregation, and runs the Chef and CI/CD servers. Each environment contains a Kubernetes cluster, with supporting services running alongside the team's software.
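A common way to realize this account-per-environment model is for tooling in the management account to assume a cross-account role in each environment account. The sketch below shows that pattern with boto3; the account ID, role name, and the EC2 listing are illustrative placeholders rather than details of this engagement.

```python
import boto3

# From the management account, assume a role in an environment account.
# The account ID and role name below are placeholders.
sts = boto3.client("sts")

def session_for_environment(account_id, role_name="OrganizationAccountAccessRole"):
    """Return a boto3 Session scoped to one environment account."""
    credentials = sts.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName="management-ops",
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )

# Example: inspect EC2 instances in a (hypothetical) UAT account.
uat = session_for_environment("111111111111")
for reservation in uat.client("ec2").describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["State"]["Name"])
```

Centralizing access this way keeps monitoring, log aggregation, and the CI/CD server reachable across every environment without distributing long-lived per-account credentials.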
Chef provisions the various AWS resources from a central OpsWorks-managed Chef server in the management account. Base cookbooks provision VPNs as well as highly available etcd backends, Kubernetes masters, and nodes to ensure resiliency.
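Because the Chef server is an OpsWorks-managed resource, its state can be queried from the management account through the OpsWorks CM API. A minimal status check, assuming a hypothetical server name and region:

```python
import boto3

# The Chef server is managed by AWS OpsWorks for Chef Automate and is exposed
# through the "opsworkscm" API. The server name and region are placeholders.
opsworks_cm = boto3.client("opsworkscm", region_name="us-east-1")

response = opsworks_cm.describe_servers(ServerName="management-chef-server")
for server in response["Servers"]:
    print(server["ServerName"], server["Status"], server.get("Endpoint"))
```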
Open-source tools such as Jenkins, Elasticsearch, and Sensu are used throughout.
Results
All environments are created with AWS CloudFormation and Chef cookbooks, which handle standing up and maintaining the VPCs, VPNs, monitoring stacks, and Kubernetes clusters.
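As a rough illustration of the CloudFormation side, a stack for one environment's network layer might be driven like this; the template path, stack name, and parameters are hypothetical:

```python
import boto3

# Stand up one environment's network layer from a CloudFormation template.
# The template path, stack name, and parameter names are illustrative only.
cloudformation = boto3.client("cloudformation", region_name="eu-west-1")

with open("templates/environment-vpc.yaml") as template:
    cloudformation.create_stack(
        StackName="uat-environment",
        TemplateBody=template.read(),
        Parameters=[
            {"ParameterKey": "EnvironmentName", "ParameterValue": "uat"},
            {"ParameterKey": "VpcCidr", "ParameterValue": "10.20.0.0/16"},
        ],
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )

# Block until the stack finishes creating before Chef takes over provisioning.
cloudformation.get_waiter("stack_create_complete").wait(StackName="uat-environment")
```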
Applications are written and checked into version control. Once changes are detected, Jenkins running on Kubernetes starts the build, producing a Docker image and shipping it to the EC2 Container Registry (ECR). Developers can run their applications in the Regional Sandbox, which is free for everyone to use and has built-in guardrails. A User Acceptance Testing environment runs its own Kubernetes install and stands up a version of the application for Quality Assurance. Once tests pass and the team is satisfied, Helm updates the application running in production. Regressions are caught in the pipeline and stopped before they ever reach production.
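Condensed into a script, the build-and-ship stages look roughly like the sketch below. The registry address, image name, chart path, and Helm release are placeholders; in the real pipeline each step runs as a Jenkins stage.

```python
import subprocess

# Condensed sketch of the pipeline stages Jenkins runs; registry, image, chart,
# and release names are placeholders, and error handling is omitted for brevity.
REGISTRY = "123456789012.dkr.ecr.eu-west-1.amazonaws.com"   # placeholder account/region
IMAGE = f"{REGISTRY}/claims-service"                        # hypothetical application
TAG = "git-abc1234"                                         # usually the commit SHA

def run(*cmd, **kwargs):
    subprocess.run(cmd, check=True, **kwargs)

# 1. Build the image from the checked-out repository.
run("docker", "build", "-t", f"{IMAGE}:{TAG}", ".")

# 2. Authenticate to ECR and push the image.
password = subprocess.run(
    ["aws", "ecr", "get-login-password", "--region", "eu-west-1"],
    check=True, capture_output=True, text=True,
).stdout
run("docker", "login", "--username", "AWS", "--password-stdin", REGISTRY,
    input=password, text=True)
run("docker", "push", f"{IMAGE}:{TAG}")

# 3. After UAT sign-off, roll the new tag into production with Helm.
run("helm", "upgrade", "--install", "claims-service", "charts/claims-service",
    "--namespace", "production", "--set", f"image.tag={TAG}")
```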
Monitoring lets engineers detect issues with their deployments immediately and alert the team. Binary up/down checks run through Sensu, while metrics are handled by Prometheus. Logs are aggregated with Fluentd and shipped to Elasticsearch and S3.
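On the metrics side, each application exposes an endpoint for Prometheus to scrape. A minimal example using the Python prometheus_client library, with illustrative metric names and port:

```python
import time
from prometheus_client import Counter, Histogram, start_http_server

# Expose application metrics for Prometheus to scrape. The metric names and
# port are illustrative, not taken from the client's services.
REQUESTS = Counter("claims_requests_total", "Requests handled", ["method", "status"])
LATENCY = Histogram("claims_request_seconds", "Request latency in seconds")

@LATENCY.time()
def handle_request(method):
    # ... real application work would happen here ...
    REQUESTS.labels(method=method, status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<pod>:8000/metrics
    while True:
        handle_request("GET")
        time.sleep(1)
```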
The engineering team can build, test, deploy, and monitor their work without touching the infrastructure at all, because the tooling gives them everything they need to confirm that applications behave correctly and perform consistently.
Standing up a new region takes less than an hour and delivers a new VPC, a VPN if needed, and a Kubernetes cluster with its supporting services already in place and ready to use. Teams with access can start using it immediately because it mirrors the existing environments.
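Because every region is built from the same templates, adding one largely amounts to pointing the existing automation at a new region name. A simplified sketch with a placeholder region, stack name, and parameters (the real rollout also layers on Chef runs and cluster bootstrap):

```python
import boto3

# Reuse the same environment template in a new region. Region, stack name,
# and parameters are placeholders for whatever a real rollout targets.
with open("templates/environment-vpc.yaml") as template_file:
    TEMPLATE_BODY = template_file.read()

def stand_up_region(region):
    cloudformation = boto3.client("cloudformation", region_name=region)
    cloudformation.create_stack(
        StackName="prod-environment",
        TemplateBody=TEMPLATE_BODY,
        Parameters=[{"ParameterKey": "EnvironmentName", "ParameterValue": "prod"}],
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cloudformation.get_waiter("stack_create_complete").wait(StackName="prod-environment")

stand_up_region("ap-southeast-1")  # new region comes up mirroring the existing ones
```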
Conclusion
The team can move dozens of applications through different environments during development and testing and deploy them efficiently to the cloud at scale, worldwide. They no longer worry about managing different machine types and instead spend their time delivering features.