Auger.AI Outperforms Other AutoML Tools Even on Their Own Chosen Datasets

If you’re a regular reader of this blog, you probably know that Microsoft recently released its own AutoML SDK. We at Auger.AI were excited to see this development. It helps validate and build awareness of this exciting new product category: Automated Machine Learning. And we love having other products to compare our results against.

In the original paper describing its AutoML approach, Microsoft compares its performance against H2O and TPOT on its own chosen set of eighty-nine (89) OpenML.org datasets. We were also excited to see Bayesian optimization used to search the space of possibilities more intelligently than grid search does. While this approach is prevalent in hyperparameter optimization, it has seen little use in AutoML (with the exception of Auger.AI, as we will discuss in subsequent posts). We were curious to see how Auger.AI would fare on these same datasets with the same amount of training time.

So we ran experiments (attempts to find the best model, that is, the best algorithm plus hyperparameters) with the Azure AutoML SDK, H2O, TPOT and of course Auger.AI. We limited each experiment to one hour. All executions ran on AWS c5.xlarge instances. Note that some executions did not succeed. This was true of Auger as well as Azure and the other tools (although it was particularly challenging to get TPOT to finish an experiment successfully). We are going to continue to run with different hardware in order to get a complete run against as many of these datasets as possible (so the contents of this post will be changing).

[Image: auger automl.png]

In the meantime, experiments succeeded for both Auger and Azure AutoML on 71 of the 89 datasets. Auger achieved higher accuracy on over 90 percent of them (Azure AutoML’s results were better than Auger’s on 7 of these datasets). The average improvement in accuracy was over 3.2%. Below is a table showing the average difference in accuracy (measured in R²) of Auger over Azure AutoML alone for each of those 71 OpenML datasets.
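As a quick check on the 90 percent figure, here is a minimal Python sketch of the arithmetic. It assumes there were no ties, so every dataset on which Azure AutoML was not better counts as a win for Auger. The function name `win_rate` is a hypothetical helper, not part of any tool.

```python
def win_rate(total_datasets: int, losses: int) -> float:
    """Fraction of datasets won, assuming no ties (every non-loss is a win)."""
    if total_datasets <= 0 or not 0 <= losses <= total_datasets:
        raise ValueError("need total_datasets > 0 and 0 <= losses <= total_datasets")
    return (total_datasets - losses) / total_datasets


# 71 datasets completed by both tools; Azure AutoML was better on 7 of them.
rate = win_rate(total_datasets=71, losses=7)
print(f"Auger win rate: {rate:.1%}")  # 64 / 71, just over 90 percent
```

If some datasets were ties, the win rate would be lower than this upper bound.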

The full results with comparisons against H2O and TPOT are here. Note that, unlike Microsoft’s paper, which compares against AutoSKLearn, we did not include that tool because its results were so poor overall on this set of datasets (perhaps not surprising, since AutoSKLearn was Microsoft’s strawman to test against). Results against H2O and TPOT showed an even greater difference in performance (those tools were less accurate). This is again perhaps not surprising, since the datasets were of course chosen by the Azure AutoML creators.

In summary, we continue to find that the accuracy of the winning models Auger.AI selects (algorithms and their hyperparameters) exceeds that of other AutoML tools. What other datasets would you like to see us try Auger against? What other AutoML tools should we include in this survey? Note that we do need to restrict ourselves to open source or publicly available services. Closed proprietary SaaS services for AutoML with subscriptions cannot be included, except perhaps by a third-party consultant who has such a subscription.

In fact, we would encourage you to duplicate these results yourself. Using the dataset ids, you can download each dataset from http://openml.org/d/<number>, where <number> is the dataset id. Auger.AI is a free service (with a paid support model) and the other tools are either open source or (in the case of Azure) easily accessible, so you should be able to duplicate these results easily. We hope you try your own datasets in the process. Please share your results with us.
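As a minimal sketch of the download step, the Python snippet below builds the OpenML page URL for a dataset id using the URL pattern given above. The helper name `openml_dataset_url` is hypothetical, and the example id is a placeholder that is not claimed to be one of the 89 benchmark datasets.

```python
def openml_dataset_url(dataset_id: int) -> str:
    """Return the OpenML page URL for a dataset id, following the
    http://openml.org/d/<number> pattern given in this post."""
    if dataset_id <= 0:
        raise ValueError("OpenML dataset ids are positive integers")
    return f"http://openml.org/d/{dataset_id}"


# Placeholder id for illustration only; substitute the ids from the results table.
example_ids = [31]
for dataset_id in example_ids:
    print(openml_dataset_url(dataset_id))
```

If you work in scikit-learn, `sklearn.datasets.fetch_openml(data_id=...)` can load a dataset by the same id, which may be more convenient than downloading files by hand.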

In future posts we will discuss the intelligent search of the algorithm-hyperparameter space at the core of Auger.AI’s approach, and why these results should not be too surprising.