Philadelphia, Pennsylvania
April 2–4, 2019
Wednesday, April 3 • 4:35pm - 5:45pm
Deep Dive: Chaos Engineering for Cloud Foundry Platform - Karun Chennuri & Ramesh Krishnaram, T-Mobile USA Inc

Modern Internet-scale microservice architectures exhibit complex communication behavior and failure scenarios with chaotic behavior (a.k.a. the Butterfly Effect) that may lead to large-scale disruptive events. This complexity comes from the Cloud Foundry components, the services running on them, and the underlying infrastructure necessary to provide highly available compute, network, security, storage, and persistence services. For a distributed microservice architecture to function ideally, these elements must all work in tandem and tolerate failure. To systematically verify that a system can tolerate failure, a disciplined approach is necessary. One such approach is “Chaos Engineering.”

Cloud Foundry is key to T-Mobile’s infrastructure, which is undoubtedly one of the largest CF platforms in the world, running business-critical operations with more than 30,000 containers. Building resiliency, self-healing, and high availability into systems and apps is one of the core factors that decide the success of our group. This proposal demonstrates the approach and the custom tools T-Mobile has been working on to purposefully break systems, identify weaknesses, take corrective actions, and prepare for Game Days.

Here at T-Mobile, we started addressing Chaos Engineering at two different levels: “Platform” and “App” level Chaos Engineering. In this talk, we would like to discuss the architecture details, the drivers that we have open-sourced to the community, a demo walk-through of features, and future steps. As a part of this talk, Karun would like to demo the following features:

Simulate App level attacks:
* Bad gateway errors at app level
* Latency between service and database
* Kill an app or service that the app depends on
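
As a rough illustration of the first two app-level attacks listed above, the following Python sketch wraps a call with injected latency and a probabilistic bad-gateway failure. It is not taken from T-Mobile's tooling: the `chaos` decorator, the `BadGatewayError` class, and the `fetch_account` function are all hypothetical names chosen for this example.

```python
import random
import time
from functools import wraps


class BadGatewayError(Exception):
    """Hypothetical stand-in for an HTTP 502 response from a dependency."""
    status_code = 502


def chaos(latency_s=0.0, error_rate=0.0, rng=random.random, sleep=time.sleep):
    """Wrap a call with injected latency and probabilistic 502-style failures.

    rng and sleep are injectable so experiments and tests stay deterministic.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s > 0:
                # Simulate a slow hop, e.g. between a service and its database.
                sleep(latency_s)
            if rng() < error_rate:
                # Simulate a bad gateway error at the app level.
                raise BadGatewayError("injected bad gateway")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@chaos(latency_s=0.05, error_rate=0.0)
def fetch_account(account_id):
    # Hypothetical database lookup used only for illustration.
    return {"id": account_id, "status": "active"}
```

Raising `error_rate` above zero makes a matching fraction of calls fail, which lets a team check whether callers retry, time out, or degrade gracefully.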

Simulate Platform attacks:
* Terminate VM instances
* Host level attacks – CPU, Memory hogs
* Advanced Network Traffic attacks
* Advanced Packet Level attacks
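
For the "Terminate VM instances" attack above, one common safeguard is to cap how much of the platform a single experiment may touch. The Python sketch below shows that idea under stated assumptions: `pick_victims`, `run_attack`, the `max_fraction` cap, and the `terminate` callback are hypothetical and do not represent the drivers described in this talk.

```python
import random


def pick_victims(instances, max_fraction=0.25, protected=(), rng=None):
    """Choose VM names to terminate, capped at max_fraction of the eligible pool."""
    rng = rng or random.Random()
    protected = set(protected)
    eligible = [vm for vm in instances if vm not in protected]
    # Round down so a small pool never loses more than the stated fraction.
    limit = int(len(eligible) * max_fraction)
    return rng.sample(eligible, limit)


def run_attack(instances, terminate, dry_run=True, **kwargs):
    """Terminate the chosen VMs, or only report them when dry_run is True."""
    victims = pick_victims(instances, **kwargs)
    for vm in victims:
        if dry_run:
            print(f"[dry-run] would terminate {vm}")
        else:
            # terminate is a caller-supplied function (an assumption here);
            # in practice it would call the infrastructure provider's API.
            terminate(vm)
    return victims
```

Defaulting to a dry run and excluding a protected list are design choices meant to keep the blast radius small while an experiment is still being rehearsed for a Game Day.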

Python, Go, Spring Boot, Java, PCF, Linux

Taken together, this gives any large technology company a systematic approach to verifying the reliability of the Cloud Foundry platform.


Karun Chennuri

Sr. Engineer, Security Architecture, T-Mobile USA Inc
Karun Chennuri is a Sr. Engineer at T-Mobile who currently leads DevSecOps efforts for the Cloud Foundry and Kubernetes teams within T-Mobile. He is a software developer with security expertise and has about 14 years of experience handling various assignments dealing in Security Solution...

Ramesh Krishnaram

Senior Manager, T-Mobile
Ramesh Krishnaram is the Sr. Manager for Platform Engineering at T-Mobile. His team at T-Mobile is responsible for providing simple, secure, scalable services with which developers can rapidly build, test, and deploy software to the cloud. Over the past few years, Ramesh has spent time...

Wednesday April 3, 2019 4:35pm - 5:45pm EDT
Room 122B