Scaling Your Applications, Automatically

Almost always, there will be more than one instance of each of your applications on Kubernetes. Multiple instances provide both fault tolerance and more serving capacity when demand for your service increases. After all, why did you move your applications to a distributed platform like Kubernetes? Because you want to leverage large amounts of CPU, memory, and I/O across your cluster. However, as you know, these resources cost money, so you want the number of service replicas to increase only when demand increases. When demand is low, the instances should scale down to save you money and lessen your carbon footprint.

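There are three types of scaling in Kubernetes: Horizontal Pod Scaling (changing the number of Pod replicas), Vertical Pod Scaling (changing the CPU and memory assigned to each Pod), and Cluster Scaling (changing the number of nodes).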

This lab shows you how to achieve Horizontal Pod Scaling automatically. While you can scale manually, scaling should ideally happen automatically based on demand, hence the complete name of this Kubernetes feature: the Horizontal Pod Autoscaler (HPA).

Basic automatic scaling is achieved by declaring a CPU threshold together with the minimum and maximum number of Pods. The autoscaler observes the current CPU load metric and triggers a scaling event when the load stays above or below the threshold for a specified period, never going beyond the declared minimum and maximum. It’s essentially a control loop comparing metrics against declared states.
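To make the control loop idea concrete, here is a minimal Python sketch of a single pass of that loop. It is based on the replica calculation described in the Kubernetes documentation (desired replicas = ceil(current replicas × current metric / target metric)). The function name, parameter names, and sample numbers are hypothetical, and the 10% tolerance is assumed to match the usual default. The real autoscaler also handles stabilization windows, missing metrics, and Pods that are not yet ready, none of which is modeled here.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas, max_replicas, tolerance=0.1):
    """Return the replica count one pass of the control loop would request.

    Simplified illustration only; all names here are hypothetical.
    """
    # How far the observed CPU utilization is from the declared target.
    ratio = current_cpu_percent / target_cpu_percent

    # If the metric is close enough to the target, leave the count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally to the ratio, rounding up.
    wanted = math.ceil(current_replicas * ratio)

    # Never go below the declared minimum or above the declared maximum.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Load is double the 50% target: 2 Pods become 4.
    print(desired_replicas(2, 100, 50, min_replicas=1, max_replicas=10))  # 4
    # Load is far below the target: 4 Pods shrink to the minimum of 1.
    print(desired_replicas(4, 10, 50, min_replicas=1, max_replicas=10))   # 1
    # Load would call for 20 Pods, but the declared maximum is 8.
    print(desired_replicas(5, 200, 50, min_replicas=1, max_replicas=8))   # 8
```

The clamping in the last line of the function is what keeps the declared minimum and maximum Pod counts authoritative, no matter how far the metric drifts from the threshold.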

In this lab, you will learn how to:

    ☐ Install the metrics-server for gathering metrics
    ☐ Install a Pod that can be scaled
    ☐ Define the scaling rules and the number of Pods to scale up and down
    ☐ Increase service demand to trigger scaling up
    ☐ Observe scaling up and down
