Automated Machine Learning: Not Just for Experts Anymore

Since we launched Auger a few months ago, we have found that some of its most significant benefits go to novice machine learning users. These users typically don’t have strong instincts about which machine learning algorithms to use for their problem. They have usually applied some statistical method for basic prediction, such as linear regression inside a spreadsheet, or have tried a newer algorithm such as a Support Vector Machine to eke out a little more accuracy. Selecting a state-of-the-art algorithm based on the characteristics of their own dataset (number of observations, number of features, distribution of values within those features) is not something they are equipped to do. Each algorithm also has its own settings, known as hyperparameters. It is unreasonable to expect a novice user even to know what all of these settings are for every algorithm, let alone to choose the right values for their particular problem.

Automated machine learning got its start with experts who spent weeks on a single prediction or classification problem. They typically tried various algorithms and hyperparameters until they found the best result they could. Manually selecting each prediction algorithm and choosing its hyperparameters took a tremendous amount of time because of the enormous number of possible combinations.
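To give a feel for how quickly those combinations pile up, here is a minimal sketch in plain Python. The algorithm names, hyperparameter grids, and the five-minutes-per-fit figure are all illustrative assumptions, not settings taken from Auger or any other tool.

```python
from itertools import product

# Hypothetical search space. The algorithms and candidate values below
# are assumptions chosen for illustration only.
SEARCH_SPACE = {
    "svm": {
        "C": [0.1, 1, 10, 100],
        "gamma": [0.001, 0.01, 0.1, 1],
        "kernel": ["rbf", "poly", "sigmoid"],
    },
    "random_forest": {
        "n_estimators": [100, 300, 1000],
        "max_depth": [4, 8, 16, None],
        "min_samples_leaf": [1, 5, 20],
    },
    "gradient_boosting": {
        "learning_rate": [0.01, 0.05, 0.1, 0.3],
        "n_estimators": [100, 300, 1000],
        "max_depth": [2, 3, 5, 8],
    },
}


def enumerate_configs(space):
    """Yield (algorithm, params) for every combination in the space."""
    for algorithm, grid in space.items():
        names = list(grid)
        for values in product(*(grid[name] for name in names)):
            yield algorithm, dict(zip(names, values))


configs = list(enumerate_configs(SEARCH_SPACE))
total = len(configs)

MINUTES_PER_FIT = 5  # assumed average time to train and validate one model
hours = total * MINUTES_PER_FIT / 60

print(f"{total} configurations, roughly {hours:.0f} hours of sequential fitting")
```

Even this small space, with three algorithms and three settings each, contains 132 configurations. Adding one more setting with four candidate values to each algorithm would quadruple that number, which is why trying everything by hand does not scale.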

Projects like TPOT and auto-sklearn came about as tools to help data science experts be more productive. Auger.AI shares this heritage. We are a team of machine learning experts who found that algorithm selection and tuning were dominating our efforts in applying machine learning to various problems. We built Auger as a “power tool for prediction” (like an auger for auguring, arg, sorry).

As machine learning gets more widely deployed, it is no longer acceptable to have just a “good enough” prediction model. In most industries, your competitors are using machine learning models too. If you are, for example, competing with other websites in a particular market, your ML-driven homepage optimization (something Auger is already being used for successfully) must be better than your competitors’ if you want to win the most conversions. In any zero-sum prediction race, such as options or futures forecasting that drives trading, the best machine learning model will win. Suboptimal models on the other side of those trades will lose.

Automated machine learning tools allow the novice data scientist to converge quickly on a suitable algorithm and on good hyperparameter settings for that algorithm. One ongoing problem is that these tools generally expect a trained developer to integrate them into a larger machine learning pipeline. By contrast, Auger offers a friendly model manager for viewing and evaluating the features of your model. Just upload your CSV, click “Run” and start watching the resulting models appear on your Leaderboard.

There are other “machine learning model managers” (from Microsoft, Amazon, and others), but none of them offer a “smart search” over machine learning algorithm choices and their options. There are a few “grid search” AutoML players with a visual model manager. Why “grid search” generally yields poorer results than some form of smart search is the topic of another blog post here.
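One common intuition for why a fixed grid can waste its budget is easy to sketch in plain Python. The example below is only an illustration under stated assumptions: `toy_score` is a made-up objective in which one hyperparameter matters far more than the other, and seeded random search stands in for "some form of smart search". It is not Auger's search method, and it does not show that a grid loses in every case.

```python
import random
from itertools import product


def toy_score(important, unimportant):
    """Hypothetical validation score: peaks at important = 0.37 and
    barely depends on the second hyperparameter."""
    return 1.0 - (important - 0.37) ** 2 + 0.001 * unimportant


def grid_trials(points_per_axis):
    """Evenly spaced grid over [0, 1] x [0, 1]."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return list(product(axis, axis))


def random_trials(n_trials, seed=0):
    """Uniform random samples over [0, 1] x [0, 1] with a fixed seed."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def distinct_important_values(trials):
    """How many different values of the important setting were tried."""
    return len({important for important, _ in trials})


def best_score(trials):
    return max(toy_score(a, b) for a, b in trials)


grid = grid_trials(3)          # 3 x 3 = 9 model fits
sampled = random_trials(9)     # the same budget of 9 model fits

for name, trials in (("grid", grid), ("random", sampled)):
    print(
        f"{name}: {len(trials)} fits, "
        f"{distinct_important_values(trials)} distinct values of the "
        f"important setting, best score {best_score(trials):.4f}"
    )
```

With the same budget of nine fits, the grid only ever tries three values of the setting that matters, while random sampling tries nine. Smarter strategies go further by using earlier results to decide where to look next.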

If you are a novice data scientist trying to choose an algorithm for your predictive problem, you owe it to yourself to try an AutoML tool to make an intelligent choice: either a developer-oriented tool like TPOT or auto-sklearn, or one with a visual model manager like Auger.AI.

Adam Blum