Machine learning is moving out of science fiction and into the mainstream. Its use cases range from the futuristic idea of self-driving cars to the less glamorous but perhaps more useful goal of better natural language processing.
While the concept is compelling, it is complex to implement in the real world. Tools such as XGBoost and Amazon SageMaker, though, make building real-world machine learning systems a practical possibility.
What is XGBoost?
XGBoost stands for Extreme Gradient Boosting. It is an open source implementation of gradient boosted trees, a powerful learning algorithm that predicts a target variable by combining the estimates of a large set of simpler models, each of which is typically far less powerful on its own.
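To make the idea of combining many weak models concrete, here is a minimal sketch of gradient boosting in plain Python. It is a toy illustration, not XGBoost's actual implementation: it assumes a squared-error loss, one-dimensional inputs and one-split "decision stumps" as the weak models, and all function names are hypothetical. Each round fits a new stump to the errors left by the models before it.

```python
# Toy gradient boosting for regression with squared-error loss.
# Illustration only: XGBoost itself uses deeper trees, regularisation and
# second-order gradient information. All names here are hypothetical.


def fit_stump(xs, residuals):
    """Fit a one-split decision stump that best predicts the residuals."""
    best = None
    # Try every distinct x except the largest, so both sides stay non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Return a model as (base prediction, learning rate, list of stumps)."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10))
ys = [x * x for x in xs]
single = fit_boosted(xs, ys, n_rounds=1)
boosted = fit_boosted(xs, ys, n_rounds=50)
print("MSE after 1 round:  ", mse(single, xs, ys))
print("MSE after 50 rounds:", mse(boosted, xs, ys))
```

No single stump can follow the curve, but the sum of fifty of them fits the training data far more closely than one does, which is the core intuition behind boosting.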
It has a strong track record in machine learning competitions, and it can handle many different data types and distributions. It exposes numerous tunable parameters and can be applied to many problems, including classification, regression and ranking.
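As a rough sketch of what that tuning looks like, the same handful of parameters can be reused across problem types by switching the objective. The parameter names and objective strings below are standard XGBoost settings, but the values are illustrative starting points rather than recommendations, and the helper make_params is a hypothetical name introduced for this example.

```python
# Illustrative starting values only; sensible settings depend on your data.
COMMON_PARAMS = {
    "max_depth": 6,           # maximum depth of each tree
    "eta": 0.3,               # learning rate applied to each new tree
    "subsample": 0.8,         # fraction of rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of columns sampled per tree
}

# XGBoost objective strings for the three problem types mentioned above.
OBJECTIVES = {
    "regression": "reg:squarederror",
    "classification": "binary:logistic",
    "ranking": "rank:pairwise",
}


def make_params(task, **overrides):
    """Build a parameter dict for a task (hypothetical helper)."""
    params = dict(COMMON_PARAMS)
    params["objective"] = OBJECTIVES[task]
    params.update(overrides)
    return params


print(make_params("classification", max_depth=4))
```

With the xgboost package installed, a dictionary like this would typically be passed as the first argument to xgboost.train, along with the training data and the number of boosting rounds.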
Using AWS for Intelligent Systems
Since the tool is open source, it is freely available and can be deployed on almost any system. It is also offered on Amazon Web Services as a built-in algorithm in SageMaker, a managed platform for training and hosting models. Developers can bring their own containers to the platform, or make use of the algorithms and libraries that are provided with SageMaker.
Amazon SageMaker is a good choice of platform because its infrastructure lets you train models on massive datasets across multiple machines. Deployment is simple, since SageMaker handles most of the work of setting up the environment and scaling it across multiple machines. It also supports A/B testing natively: you can host several models behind one endpoint and assign each a different traffic weighting for inference, which helps you to narrow down which model performs best for your use case.
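The weighted A/B testing described above can be sketched as follows. In SageMaker, each model behind an endpoint is a "production variant" with an initial weight, and each variant receives a share of traffic equal to its weight divided by the sum of all weights. The field names follow the SageMaker endpoint configuration format, while the variant names, model names, instance type and the traffic_share helper are assumptions made up for this example.

```python
# Two hypothetical model variants behind a single endpoint.
variants = [
    {
        "VariantName": "current-model",
        "ModelName": "xgboost-model-a",   # hypothetical model name
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 9.0,
    },
    {
        "VariantName": "candidate-model",
        "ModelName": "xgboost-model-b",   # hypothetical model name
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
]


def traffic_share(variants):
    """Fraction of requests each variant should receive (hypothetical helper)."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


for name, share in traffic_share(variants).items():
    print(f"{name}: {share:.0%} of requests")
```

With boto3 and AWS credentials in place, a list like this would be passed as the ProductionVariants argument when creating an endpoint configuration. Sending only a small share of traffic to the candidate model limits the impact if it performs worse.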
Integration with SageMaker Spark SDK
SageMaker also integrates with the SageMaker Spark SDK, which has a concise but powerful API for working with other tools. Developers can preprocess data on Apache Spark and then call SageMaker's XGBoost algorithm directly from within the Spark environment to train models on the preprocessed data. Because everything happens within one ecosystem, the process is simpler and less error-prone.
In addition, the work runs on Amazon Web Services cloud instances that are spun up or down on demand, and you pay only for what you use. As a result, the long-term running costs can be far lower than paying for an entire server that spends much of its time under-utilized.