In our last post, we pointed out that Auger outperforms the new Microsoft AutoML and other AutoML tools such as H2O and TPOT. Specifically, we took the OpenML datasets that Microsoft used to compare their AutoML with other tools, limited training to one hour, and compared Auger’s predictive models (algorithms plus hyperparameters). Auger provided an average of 3.6% better accuracy than Microsoft’s AutoML (and both Auger and Microsoft significantly outperformed H2O and TPOT).
The big question we have received about this is “how is it possible to achieve such dramatically higher accuracy?”. There are several significant reasons. Let’s discuss each of them.
Most AutoML tools don’t do an intelligent search but instead rely on grid search (or “random search,” which is really just random grid search). The AutoML products that do search intelligently use the same optimizer (which chooses the algorithm and hyperparameters to train next) for every dataset they are presented with. By contrast, Auger uses several off-the-shelf optimization algorithms (such as Tree-structured Parzen Estimators, Particle Swarm, and Nelder-Mead) and chooses which optimizer to use based on characteristics of the dataset. And for some datasets Auger uses its own optimizers, which are worth discussing separately.
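To make the idea of metadata-driven optimizer selection concrete, here is a minimal sketch. The thresholds, optimizer names, and the `choose_optimizer` function are illustrative assumptions for this post, not Auger’s actual selection rules.

```python
# Hypothetical sketch: pick a search strategy from simple dataset metadata.
# Thresholds and choices below are invented for illustration only.

def choose_optimizer(n_rows: int, n_features: int) -> str:
    """Return an optimizer name based on dataset characteristics."""
    if n_features > 100:
        # High-dimensional search spaces: tree-structured Parzen estimators
        return "tpe"
    if n_rows < 10_000:
        # Small data trains quickly, so a population-based search is affordable
        return "particle_swarm"
    # Large, low-dimensional data: a cheap local simplex search
    return "nelder_mead"

print(choose_optimizer(n_rows=50_000, n_features=20))  # -> "nelder_mead"
```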
Auger has several proprietary optimizers which outperform the more widely known open algorithms cited above on certain datasets. We will not delve deeply into those here, but there are a couple of insights we can share. First, most Bayesian optimization approaches (such as HyperOpt and HyperBand) rely purely on estimates of the accuracy of the trained algorithm. Auger recognizes that, since exploring the space of possible algorithms and hyperparameters is effectively an infinite-time problem, we need to define the problem as finding the best model within a bounded timeframe. Auger uses estimates of both accuracy and time to train to choose the model with the best chance of achieving the highest accuracy within the selected time bound (and we have patents filed on this).
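A toy sketch of what time-aware selection can look like: rank candidates by predicted accuracy, but only among those expected to finish within the remaining budget. The candidate list, field names, and `pick_next` function are invented for illustration; Auger’s actual scoring is proprietary.

```python
# Hypothetical candidates with estimated accuracy and estimated training time.
candidates = [
    {"model": "xgboost", "est_accuracy": 0.91, "est_train_seconds": 900},
    {"model": "svm_rbf", "est_accuracy": 0.93, "est_train_seconds": 4000},
    {"model": "log_reg", "est_accuracy": 0.88, "est_train_seconds": 60},
]

def pick_next(candidates, seconds_remaining):
    """Choose the candidate with the best predicted accuracy that can
    still be trained before the time bound expires."""
    feasible = [c for c in candidates
                if c["est_train_seconds"] <= seconds_remaining]
    if not feasible:
        return None  # nothing fits in the remaining budget
    return max(feasible, key=lambda c: c["est_accuracy"])

print(pick_next(candidates, seconds_remaining=1200))  # -> the xgboost entry
```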
Just as Auger combines leading complementary models into ensembles to get the highest accuracy, in some circumstances it combines the approaches of different optimizers to find better-suggested models faster. Auger’s proprietary optimizers combine well with the more commonly known approaches, much better than two similar known optimizers (such as HyperOpt and HyperBand) would.
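One very simple way to picture combining optimizers is to interleave their suggestions so that complementary search strategies cover the space together. The functions below are stand-ins; the real combination logic in Auger is more involved than round-robin.

```python
import itertools

# Stand-in "optimizers" that each propose a configuration to train next.
def tpe_like(step):
    return {"optimizer": "tpe", "suggestion": f"config_tpe_{step}"}

def proprietary_like(step):
    return {"optimizer": "auger", "suggestion": f"config_auger_{step}"}

def blended_suggestions(n):
    """Alternate which optimizer proposes the next model to train."""
    sources = itertools.cycle([tpe_like, proprietary_like])
    return [next(sources)(step) for step in range(n)]

for s in blended_suggestions(4):
    print(s)
```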
Whether with off-the-shelf optimizers or its own, Auger accelerates the process of reaching acceptable model accuracy by choosing an initial model (algorithm plus hyperparameters) based on comparing characteristics of the dataset with other datasets it has trained against in the past. As a hosted service we have quite a large collection to learn from. Note that we use only the metadata of those datasets (number of features, number of rows or events, distribution of the data in each feature), not the data itself. We refer to this approach as “warm start”. It’s one of the reasons Auger begins to offer reasonably accurate trained models so quickly after a Run starts.
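A minimal sketch of the warm-start idea: describe a new dataset by its metadata only, find the most similar previously seen dataset, and seed the search with the configuration that worked best there. The stored metadata, distance metric, and `warm_start` function are simplified assumptions, not Auger’s internals.

```python
import math

# Metadata from past runs (no raw data, just summary statistics).
history = [
    {"n_rows": 10_000, "n_features": 15,
     "best_config": {"algo": "random_forest", "n_estimators": 300}},
    {"n_rows": 500_000, "n_features": 80,
     "best_config": {"algo": "lightgbm", "num_leaves": 63}},
]

def metadata_distance(a, b):
    """Compare datasets on log-scaled row and feature counts."""
    return math.hypot(
        math.log10(a["n_rows"]) - math.log10(b["n_rows"]),
        math.log10(a["n_features"]) - math.log10(b["n_features"]),
    )

def warm_start(new_dataset_meta):
    """Return the best-known configuration of the closest past dataset."""
    closest = min(history, key=lambda h: metadata_distance(h, new_dataset_meta))
    return closest["best_config"]

print(warm_start({"n_rows": 12_000, "n_features": 20}))  # -> random_forest config
```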
What were your experiences running Auger on your datasets? What accuracy did you get with other AutoML tools, and what did you get with Auger? How much time did you allow as the time bound on the training process? Especially if you are one of the few counterexamples where Auger didn’t find you a more accurate model faster, we want to hear from you.