Friday, June 18, 2021

How to quickly test dozens of deep learning models in Python

We develop a neural network assembly line that allows us to easily experiment with numerous model configurations.

Assembly Line of Neural Networks (Source: all_is_magic on Shutterstock and author)

The optimization of machine learning (ML) models is not an exact science. The best model architecture, optimization algorithm, and hyperparameter settings depend on the data you are working with. Being able to quickly test different model configurations is therefore critical to maximizing productivity and driving progress in your ML project. In this article we will create an easy-to-use interface that lets you do exactly that: essentially, an assembly line for ML models.

Each model is governed by a set of hyperparameters. We will create some functions that take these hyperparameters and build models ad hoc. Here are the primary hyperparameters that govern neural networks:

  • Number of hidden layers
  • Number of neurons per layer
  • Activation functions
  • Optimization algorithm
  • Learning rate
  • Regularization technique
  • Regularization hyperparameters

We can pack all these into a hash table:

model_info = {}
model_info['Hidden layers'] = [100] * 6
model_info['Input size'] = og_one_hot.shape[1] - 1
model_info['Activations'] = ['relu'] * 6
model_info['Optimization'] = 'adadelta'
model_info['Learning rate'] = .005
model_info['Batch size'] = 32
model_info['Preprocessing'] = 'Standard'
model_info['Lambda'] = 0
model_info['Regularization'] = 'l2'
model_info['Reg param'] = 0.0005

Before we start experimenting with various model architectures, we quickly visualize the data to see what we're working with. Although in my experience standard scaling tends to be the de facto preprocessing method, I visualized the data under a variety of preprocessing tactics, using PCA and t-SNE to reduce the dimensionality of the data for each one. The following are the views in which the data appears most separable:

Source: author

We can then define a function that constructs and compiles a neural network given a hash table of hyperparameters:
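The original code block was not preserved, so here is a minimal sketch of what such a builder could look like, assuming the dictionary keys shown above, a binary classification target, and binary cross-entropy loss (the output layer and loss are my assumptions, not stated in the article):

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, regularizers

def build_model(model_info):
    """Build and compile a Keras model from a hyperparameter dictionary.

    Sketch only: key names mirror the model_info dict above; the sigmoid
    output and binary cross-entropy loss are assumptions for illustration.
    """
    # Optional weight regularization, keyed by the 'Regularization' entry
    reg = None
    if model_info.get('Regularization') == 'l2':
        reg = regularizers.l2(model_info.get('Reg param', 0.0))
    elif model_info.get('Regularization') == 'l1':
        reg = regularizers.l1(model_info.get('Reg param', 0.0))

    model = models.Sequential()
    model.add(tf.keras.Input(shape=(model_info['Input size'],)))
    for units, act in zip(model_info['Hidden layers'], model_info['Activations']):
        model.add(layers.Dense(units, activation=act, kernel_regularizer=reg))
    model.add(layers.Dense(1, activation='sigmoid'))  # assumes binary classification

    # Map the optimizer name in the dict to a Keras optimizer class
    opt_classes = {'adam': optimizers.Adam, 'adadelta': optimizers.Adadelta,
                   'sgd': optimizers.SGD, 'rmsprop': optimizers.RMSprop}
    opt = opt_classes[model_info['Optimization'].lower()](
        learning_rate=model_info['Learning rate'])
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model
```

Because every architectural choice lives in `model_info`, swapping architectures is just a matter of editing a dictionary rather than rewriting model code.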

Now that we have a fast and flexible way of building and compiling neural networks, we can quickly test some basic models. This lets us draw fast conclusions about which hyperparameters seem to work best:

Using the above function and evaluating over a dozen model architectures with 5-fold cross-validation, I found that deeper and wider architectures are needed to achieve high performance on this data. This is probably due to the highly non-linear structure of our data.

An aside: if you are not familiar with k-fold cross-validation, it is a model evaluation technique that involves dividing the data into k disjoint partitions. One of these partitions is used as the test set and the rest as the training set; we then iterate over the folds so that each partition takes a turn as the test set. Running k-fold cross-validation gives us a reliable estimate of model performance.
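The partitioning described above can be sketched in a few lines of NumPy (this is a generic illustration of k-fold splitting, not the article's original code):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    The indices are shuffled once, split into k disjoint folds, and each
    fold takes one turn as the test set while the rest form the training set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each of the k iterations trains a fresh model on `train` and scores it on `test`; averaging the k scores gives the cross-validated estimate.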

Although k-fold cross-validation is a great way to evaluate a model's performance, it is computationally expensive. While optimizing hyperparameters, we can simply split the data into a training set and a test set to get faster heuristics. We save our model after each epoch so that we can retrieve it after training if necessary, and we use the TensorBoard callback to examine how the model trained:
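A minimal sketch of the two callbacks mentioned above, using the standard Keras callback API (the file paths here are placeholders of my choosing):

```python
import tensorflow as tf

# Save a checkpoint of the model after every epoch so any epoch's weights
# can be recovered after training.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath='checkpoints/model_epoch_{epoch:02d}.keras',
    save_freq='epoch')

# Log loss/metric curves for inspection in TensorBoard.
# Note: keep the log directory free of spaces (see the aside below).
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir='logs/run1')

# Hypothetical training call, assuming X_train/y_train etc. are defined:
# model.fit(X_train, y_train, epochs=20, batch_size=32,
#           validation_data=(X_val, y_val),
#           callbacks=[checkpoint_cb, tensorboard_cb])
```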

Once we have gained some insight into which hyperparameter settings work well, we can then run a more robust performance evaluation.

Grid search is not the go-to method for optimizing hyperparameters in industry. Rather, a method known as coarse-to-fine search is more frequently employed. In the coarse-to-fine approach, we start with a wide range of hyperparameters and then narrow in on the settings that work best, randomly sampling hyperparameter settings from the narrowed range of values we want to experiment with. Now that we have a way to dynamically create deep neural networks, we can quickly iterate over many model configurations:
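The random-sampling step can be sketched as follows. The search ranges here are illustrative assumptions, and the dictionary keys mirror the `model_info` table defined earlier:

```python
import random

def sample_config(ranges, rng=random):
    """Randomly sample one hyperparameter configuration from the given ranges."""
    n_layers = rng.choice(ranges['n_layers'])
    return {
        'Hidden layers': [rng.choice(ranges['layer_sizes'])] * n_layers,
        'Activations': [rng.choice(ranges['activations'])] * n_layers,
        'Optimization': rng.choice(ranges['optimizers']),
        # Sample the learning rate and regularization strength log-uniformly,
        # since they vary over orders of magnitude.
        'Learning rate': 10 ** rng.uniform(*ranges['log_lr']),
        'Reg param': 10 ** rng.uniform(*ranges['log_reg']),
    }

# A coarse first pass over wide, hypothetical ranges; after inspecting the
# results we would tighten these ranges and sample again (the "fine" pass).
coarse = {'n_layers': [2, 4, 6, 8], 'layer_sizes': [50, 100, 200, 400],
          'activations': ['relu', 'tanh'], 'optimizers': ['adam', 'adadelta'],
          'log_lr': (-4, -1), 'log_reg': (-5, -2)}
configs = [sample_config(coarse) for _ in range(20)]
```

Each sampled dictionary can then be handed to the model-building function from earlier in the article.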

An aside: when you point TensorBoard at its log directory from your terminal, the file path cannot contain spaces. On Windows, spaces in the log directory prevent TensorBoard from loading the data correctly.

The above code also saves important metrics (such as the area under the ROC curve) for each model to a CSV file, so that you can easily compare models and see which hyperparameter changes lead to changes in performance.
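The original logging code was not preserved, but a minimal version of this idea, using only the standard library, might look like this (the helper name, file path, and metric names are my own for illustration):

```python
import csv
import os
import tempfile

def append_metrics(path, model_info, metrics):
    """Append one row of hyperparameters plus evaluation metrics to a CSV file."""
    # Stringify values so list-valued entries (layer sizes, activations)
    # round-trip through CSV cleanly.
    row = {k: str(v) for k, v in {**model_info, **metrics}.items()}
    write_header = not os.path.exists(path)
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)

# Usage sketch: log two hypothetical runs, then read the file back
path = os.path.join(tempfile.mkdtemp(), 'results.csv')
append_metrics(path, {'Hidden layers': [100] * 6, 'Learning rate': 0.005},
               {'AUC': 0.91})
append_metrics(path, {'Hidden layers': [200] * 8, 'Learning rate': 0.001},
               {'AUC': 0.93})
with open(path, newline='') as f:
    rows = list(csv.DictReader(f))
```

The resulting CSV can be sorted by the metric column to surface the best-performing configurations at a glance.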

Once we have a better idea of which hyperparameter values work well, we can begin optimizing the model within this range. The following function generates a randomized neural network; we can use it to experiment with various randomized hyperparameter settings within the range of values we have narrowed down:
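Since the original code block was lost, here is a self-contained sketch of such a generator. The narrowed ranges below are hypothetical, standing in for whatever ranges the coarse pass identified; the sigmoid output and loss are assumptions as before:

```python
import random
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, regularizers

def random_model(input_size, rng=random):
    """Build and compile a Keras model with hyperparameters drawn at random
    from a narrowed (illustrative) range identified by the coarse search."""
    n_layers = rng.choice([6, 7, 8])
    units = rng.choice([100, 150, 200])
    lr = 10 ** rng.uniform(-3.5, -2.5)           # log-uniform learning rate
    reg = regularizers.l2(10 ** rng.uniform(-4, -3))

    model = models.Sequential()
    model.add(tf.keras.Input(shape=(input_size,)))
    for _ in range(n_layers):
        model.add(layers.Dense(units, activation='relu', kernel_regularizer=reg))
    model.add(layers.Dense(1, activation='sigmoid'))  # assumes binary target
    model.compile(optimizer=optimizers.Adam(learning_rate=lr),
                  loss='binary_crossentropy', metrics=['accuracy'])
    return model
```

Calling `random_model` in a loop, training briefly, and logging each result gives the fine pass of the coarse-to-fine search.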

In this article we have learned how to quickly experiment with numerous model architectures and hyperparameter settings. As always, constructive criticism is appreciated. If you liked the article or learned something new, feel free to follow me on Medium, leave some applause, or send me a message at [email protected]. Thanks again!

Source code: here

