Amazon SageMaker is a fully managed service that allows data scientists and developers to quickly and easily build, train, and deploy machine learning models.
Amazon SageMaker removes many of the obstacles that usually slow down developers who want to put machine learning into production. With automatic scaling, managing production machine learning models becomes even easier.
Instead of manually managing the number of instances to match the scale your inference workload requires, you can let SageMaker adjust the instance count automatically according to a scaling policy.
Automatic Scaling Explained
Amazon SageMaker supports automatic scaling for production variants. Automatic scaling dynamically adjusts the number of instances provisioned for a production variant in response to changes in your workload.
When the workload increases, automatic scaling brings more instances online. When the workload decreases, it removes the instances that are no longer needed, so you don't pay for provisioned instances you aren't using.
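To make the scale-out and scale-in behaviour concrete, here is a minimal sketch of a proportional rule that keeps the load per instance close to a target. The function name and the numbers are hypothetical, and this is a simplification for illustration, not the algorithm AWS actually runs.

```python
import math


def desired_instance_count(current_instances: int,
                           metric_per_instance: float,
                           target_per_instance: float) -> int:
    """Estimate how many instances keep the per-instance load near the target.

    Simplified proportional rule for illustration only. It ignores capacity
    limits, cooldowns, and CloudWatch alarm evaluation periods.
    """
    if current_instances < 1 or target_per_instance <= 0:
        raise ValueError("need at least one instance and a positive target")
    # Total load currently spread across the variant's instances.
    total_load = current_instances * metric_per_instance
    # Enough instances so that each one sits at or below the target.
    return max(1, math.ceil(total_load / target_per_instance))


# Workload rises: 2 instances, each handling 150 invocations per minute.
scale_out = desired_instance_count(2, 150.0, 100.0)
# Workload drops: 4 instances, each handling 20 invocations per minute.
scale_in = desired_instance_count(4, 20.0, 100.0)
print(scale_out, scale_in)
```

With a target of 100 invocations per minute per instance, the first case asks for one more instance and the second case releases three.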
Regulating the Number of Instances
To use automatic scaling for a production variant, you define and apply a scaling policy that uses Amazon CloudWatch metrics and target values that you assign. Automatic scaling uses the policy to adjust the number of instances up or down in response to the actual workload.
You can use the AWS Management Console to apply a scaling policy based on a predefined metric. A predefined metric is defined in an enumeration, so you can specify it by name in your code or select it in the console. Alternatively, you can use the AWS CLI or the Application Auto Scaling API to apply a scaling policy based on either a predefined or a custom metric. Before relying on an automatic scaling configuration to handle production traffic, perform a load test to check that it works correctly.
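As a rough sketch of the API route, the snippet below builds the two requests that the Application Auto Scaling API expects for a SageMaker variant: one that registers the variant as a scalable target and one that attaches a target-tracking policy using the predefined metric `SageMakerVariantInvocationsPerInstance`. The endpoint name, variant name, policy name, capacity limits, cooldowns, and the load-test figure of 20 requests per second are all hypothetical. The target value follows the rule of thumb from the AWS load-testing guidance (peak requests per second times a safety factor times 60); treat the 0.5 safety factor as a starting assumption. `apply_scaling` is not called here because it needs AWS credentials and an existing endpoint.

```python
from typing import Any, Dict


def target_invocations_per_instance(max_rps: float, safety_factor: float = 0.5) -> float:
    """Convert a load-tested peak requests/second into a per-minute target."""
    if max_rps <= 0 or not 0 < safety_factor <= 1:
        raise ValueError("max_rps must be positive and safety_factor in (0, 1]")
    # The CloudWatch metric is reported per minute, hence the factor of 60.
    return max_rps * safety_factor * 60


def variant_resource_id(endpoint_name: str, variant_name: str) -> str:
    """Resource ID format that Application Auto Scaling uses for a variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def build_scalable_target(resource_id: str, min_capacity: int,
                          max_capacity: int) -> Dict[str, Any]:
    """Request body for register_scalable_target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def build_target_tracking_policy(resource_id: str, policy_name: str,
                                 target_value: float,
                                 scale_in_cooldown: int = 300,
                                 scale_out_cooldown: int = 60) -> Dict[str, Any]:
    """Request body for put_scaling_policy with a predefined metric."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_value,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": scale_in_cooldown,    # seconds
            "ScaleOutCooldown": scale_out_cooldown,  # seconds
        },
    }


def apply_scaling(target: Dict[str, Any], policy: Dict[str, Any]) -> None:
    """Send both requests to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builders work without AWS access

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# Hypothetical names and load-test result.
resource_id = variant_resource_id("my-endpoint", "AllTraffic")
target_value = target_invocations_per_instance(max_rps=20)
target = build_scalable_target(resource_id, min_capacity=1, max_capacity=4)
policy = build_target_tracking_policy(resource_id, "invocations-target", target_value)
print(target["ResourceId"], policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"])
```

Keeping the request builders separate from the call that talks to AWS makes it possible to review or unit-test the configuration before applying it to a live endpoint.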
Components of Automatic Scaling
SageMaker automatic scaling uses a scaling policy to adjust the number of instances hosting a production variant. Automatic scaling has the following components:
- Required permissions - The permissions needed to perform automatic scaling actions
- A target metric - The Amazon CloudWatch metric that SageMaker automatic scaling uses to determine when and how much to scale
- A cooldown period - The amount of time after a scale-in or scale-out activity completes before another scaling activity can begin
- Minimum and maximum capacity - The minimum and maximum number of instances to use when scaling the variant
- A service-linked role - An AWS Identity and Access Management (IAM) role that is linked to a specific AWS service. This role includes all the permissions the service needs to call other AWS services on your behalf. Automatic scaling creates this role for you automatically.
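The way the target metric, the cooldown period, and the capacity limits interact can be sketched with a small toy model. The `ScalingPolicy` class below is hypothetical and deliberately simplified: it clamps a proportional estimate to the capacity limits and refuses to scale while the relevant cooldown is still running. The real service applies more nuanced rules, so read this as an illustration of the concepts only.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    target_per_instance: float  # target metric value per instance
    min_capacity: int           # never scale in below this
    max_capacity: int           # never scale out above this
    scale_out_cooldown: int     # seconds to wait before another scale-out
    scale_in_cooldown: int      # seconds to wait before another scale-in

    def next_capacity(self, current: int, metric_per_instance: float,
                      seconds_since_last_activity: float) -> int:
        """Return the instance count this toy policy would move to."""
        # Proportional estimate of the instances needed to hit the target.
        wanted = math.ceil(current * metric_per_instance / self.target_per_instance)
        # Clamp the estimate to the configured capacity limits.
        wanted = min(self.max_capacity, max(self.min_capacity, wanted))
        # Hold the current count while the matching cooldown is still active.
        if wanted > current and seconds_since_last_activity < self.scale_out_cooldown:
            return current
        if wanted < current and seconds_since_last_activity < self.scale_in_cooldown:
            return current
        return wanted


policy = ScalingPolicy(target_per_instance=100.0, min_capacity=1, max_capacity=4,
                       scale_out_cooldown=60, scale_in_cooldown=300)

capped = policy.next_capacity(2, 400.0, seconds_since_last_activity=120)
held = policy.next_capacity(4, 20.0, seconds_since_last_activity=120)
released = policy.next_capacity(4, 20.0, seconds_since_last_activity=600)
print(capped, held, released)
```

In this example the first call would like eight instances but is capped at the maximum of four, the second call wants to scale in but is held back by the scale-in cooldown, and the third call scales in once the cooldown has elapsed.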
Benefits of Automatic Scaling
Once you configure an endpoint with automatic scaling, SageMaker continuously monitors your deployed models and adjusts the instance count automatically. SageMaker keeps throughput at the desired level as application traffic varies. This makes models in production easier to manage and can reduce the cost of your deployed models, since you no longer need to provision enough capacity for your peak load at all times. Instead, you configure minimum and maximum capacity limits that cover your lowest expected traffic and your highest peak, and SageMaker operates within those limits to minimise costs.
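The cost argument can be illustrated with a back-of-the-envelope comparison of instance-hours. The 24-hour traffic profile, the target of 100 invocations per minute per instance, and the capacity limits below are made-up numbers, and the sketch assumes the instance count tracks demand perfectly each hour, which ignores scaling lag and cooldowns.

```python
import math
from typing import Sequence


def instance_hours(hourly_invocations_per_min: Sequence[float],
                   target_per_instance: float,
                   min_capacity: int,
                   max_capacity: int) -> int:
    """Sum the instances needed in each hour, clamped to the capacity limits."""
    total = 0
    for load in hourly_invocations_per_min:
        needed = math.ceil(load / target_per_instance)
        total += min(max_capacity, max(min_capacity, needed))
    return total


# Hypothetical day: 8 quiet hours, 8 peak hours, 8 moderate hours.
profile = [50] * 8 + [400] * 8 + [150] * 8

autoscaled = instance_hours(profile, target_per_instance=100,
                            min_capacity=1, max_capacity=4)
# Fixed provisioning sized for the peak keeps 4 instances running all day.
fixed = 4 * len(profile)

print(autoscaled, fixed)
```

Under these assumptions the scaled variant uses 56 instance-hours against 96 for a fleet sized permanently for the peak. Actual savings depend on how spiky the traffic is and on the limits you choose.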