An introduction to AutoML – Deep Learning with TensorFlow 2 and Keras – Second Edition


An introduction to AutoML

The goal of AutoML is to enable domain experts who are unfamiliar with machine learning technologies to use ML techniques easily.

In this chapter, we will go through a practical exercise using Google Cloud, doing quite a bit of hands-on work after briefly discussing the fundamentals. We will talk about automatic data preparation, automatic feature engineering, and automatic model generation. Then, we will introduce AutoKeras and Cloud AutoML, with its multiple solutions for Tables, Vision, Text, Translation, and Video processing.

What is AutoML?

During the previous chapters we have introduced several models used in modern machine learning and deep learning. For instance, we have seen architectures such as Dense networks, CNNs, RNNs, Autoencoders, and GANs.

Two observations are in order. First, these architectures are manually designed by deep learning experts, and are not necessarily easy to explain to non-experts. Second, the composition of these architectures themselves was a manual process, which involved a lot of human intuition and trial and error.

Today, one primary goal of artificial intelligence research is to achieve Artificial General Intelligence (AGI) – the intelligence of a machine that can understand and automatically learn any type of work or activity that a human being can do. However, the reality was very different before AutoML research and industrial applications started. Indeed, before AutoML, designing deep learning architectures was very similar to crafting – the activity or hobby of making decorative articles by hand.

Take for instance the task of recognizing breast cancer from X-rays. After reading the previous chapters, you will probably think that a deep learning pipeline created by composing several CNNs may be an appropriate tool for this purpose.

That is probably a good intuition to start with. The problem is that it is not easy to explain to the users of your model why a particular composition of CNNs works well within the breast cancer detection domain. Ideally, you want to provide easily accessible deep learning tools to the domain experts (in this case, medical professionals) without such a tool requiring a strong machine learning background.

The other problem is that it is not easy to understand whether or not there are variants (for example, different compositions) of the original manually crafted model that can achieve better results. Ideally, you want to provide deep learning tools for exploring the space of variants (for example, different compositions) in a more principled and automatic way.

So, the central idea of AutoML is to reduce the steep learning curve and the huge costs of handcrafting machine learning solutions by making the whole end-to-end machine learning pipeline more automated. To this end, we assume that the AutoML pipeline consists of three macro-steps: data preparation, feature engineering, and automatic model generation (see Figure 1). Throughout the initial part of this chapter, we are going to discuss these three steps in detail. Then, we will focus on Cloud AutoML:

Figure 1: Three steps of an AutoML pipeline

Achieving AutoML

How can AutoML achieve the goal of end-to-end automation? Well, you are probably already guessing that a natural choice is to use machine learning – that's very cool – AutoML uses ML for automating ML pipelines.

What are the benefits? Automating the creation and tuning of the machine learning pipeline end-to-end produces simpler solutions, reduces the time needed to build them, and ultimately might produce architectures that could potentially outperform the models that were crafted by hand.

Is this a closed research area? Quite the opposite. At the beginning of 2020, AutoML is a very open research field, which is not surprising, as the initial paper drawing attention to AutoML was published at the end of 2016.

Automatic data preparation

The first stage of a typical machine learning pipeline deals with data preparation (recall the pipeline of Figure 1). There are two main aspects that should be taken into account: data cleansing, and data synthesis.

Data cleansing is about improving the quality of data by checking for wrong data types, missing values, errors, and by applying data normalization, bucketization, scaling, and encoding. A robust AutoML pipeline should automate all of these mundane but extremely important steps as much as possible.
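As a minimal illustration (not tied to any particular AutoML engine), these mundane cleansing steps can be sketched in plain Python: coercing wrong data types, imputing missing values with the column mean, min-max scaling, and bucketizing:

```python
from statistics import mean

# Toy feature column with a missing value and inconsistent types.
raw = ["3.0", None, "5", "bad", "7.0"]

def to_float(value):
    """Coerce a raw cell to float, returning None for wrong or missing data."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

values = [to_float(v) for v in raw]              # [3.0, None, 5.0, None, 7.0]
fill = mean(v for v in values if v is not None)  # impute with the column mean
cleaned = [v if v is not None else fill for v in values]

# Min-max scaling to [0, 1].
lo, hi = min(cleaned), max(cleaned)
scaled = [(v - lo) / (hi - lo) for v in cleaned]

# Bucketization: map each scaled value to one of three equal-width buckets.
buckets = [min(int(v * 3), 2) for v in scaled]
```

A robust AutoML pipeline runs decisions like these (which imputation strategy, how many buckets) automatically, column by column.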

Data synthesis is about generating synthetic data via augmentation for training, evaluation, and validation. Normally, this step is domain-specific. For instance, we have seen how to generate synthetic CIFAR10-like images (Chapter 4, Convolutional Neural Networks) by using cropping, rotation, resizing, and flipping operations. One can also think about generating additional images or video via GANs (see Chapter 6, Generative Adversarial Networks) and using the augmented synthetic dataset for training. A different approach should be taken for text, where it is possible to train RNNs (Chapter 9, Autoencoders) to generate synthetic text, or to adopt NLP techniques such as BERT, seq2seq, or Transformers to annotate or translate text across languages and then translate it back to the original one – another domain-specific form of augmentation.
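For the image case, a few of these augmentation operations can be sketched with NumPy alone – a toy stand-in for the Keras preprocessing utilities, applied here to a fake random image:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # fake 32x32 RGB image

def random_crop(img, size):
    """Crop a random size x size patch from the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

flipped = image[:, ::-1]          # horizontal flip
cropped = random_crop(image, 24)  # 24x24 random crop
rotated = np.rot90(image)         # 90-degree rotation

augmented = [flipped, cropped, rotated]
```

Each transformed copy is a plausible new training example, which is exactly what augmentation exploits.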

A different approach is to generate synthetic environments where machine learning can occur. This became very popular in reinforcement learning and gaming, especially with toolkits such as OpenAI Gym, which aims to provide an easy-to-set-up simulation environment with a variety of different (gaming) scenarios.

Put simply, we can say that synthetic data generation is another option that should be provided by AutoML engines. Frequently, the tools used are very domain-specific and what works for image or video would not necessarily work in other domains such as text. Therefore, we need a (quite) large set of tools for performing synthetic data generation across domains.

Automatic feature engineering

Feature engineering is the second step of a typical machine learning pipeline (see Figure 1). It consists of three major steps: feature selection, feature construction, and feature extraction. Let's look at each of them in turn:

Feature selection aims at selecting a subset of meaningful features by discarding those that are providing little contribution to the learning task. In this context, meaningful is truly dependent on the application and the domain of your specific problem.

Feature construction has the goal of building new derived features, starting from the basic ones. Frequently, this technique is used to allow better generalization and to have a richer representation of the data.

Feature extraction aims at altering the original feature space by means of a mapping function. This can be implemented in multiple ways; for instance, it can use autoencoders (see Chapter 9, Autoencoders), PCA, or clustering (see Chapter 10, Unsupervised Learning).

In short, feature engineering is an art based on intuition, trial and error, and a lot of human experience. Modern AutoML engines aim to make the entire process more automated, requiring less human intervention.
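As a toy illustration of two of these steps, here is a NumPy-only sketch that drops near-constant features (selection) and then maps the survivors onto two principal components via SVD (extraction); an AutoML engine would make these choices automatically:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 3] = 0.0            # a constant (useless) feature
X[:, 4] = X[:, 0] * 2.0  # a redundant feature; variance filtering alone won't catch it

# Feature selection: drop near-constant columns (variance below a threshold).
variances = X.var(axis=0)
keep = variances > 1e-8
X_sel = X[:, keep]       # the constant column is discarded

# Feature extraction: project onto the top-2 principal components (PCA via SVD).
Xc = X_sel - X_sel.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:2].T    # mapped 2-D representation
```

Note how the redundant column survives the variance filter: catching it would require correlation analysis, one more decision that a good AutoML engine handles for you.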

Automatic model generation

Model generation and hyperparameter tuning is the typical third macro-step of a machine learning pipeline (see Figure 1).

Model generation consists of creating a suitable model for solving specific tasks. For instance, you will probably use CNNs for visual recognition, and you will use RNNs for either time series analysis or for sequences. Of course, many variants are possible, each of which is manually crafted through a process of trial and error, and works for very specific domains.

Hyperparameter tuning happens once the model is manually crafted. This process is generally very computationally expensive and can significantly change the quality of the results in a positive way. That's because tuning the hyperparameters can help to optimize our model further.

Automatic model generation is the ultimate goal of any AutoML pipeline. How can this be achieved? One approach consists in generating the model by combining a set of primitive operations including convolution, pooling, concatenation, skip connections, recurrent neural networks, autoencoders, and pretty much all the deep learning models we have encountered throughout this book. These operations constitute a (typically very large) search space to be explored, and the goal is to make this exploration as efficient as possible. In AutoML jargon, the exploration is called NAS, or Neural Architecture Search.

The seminal paper on AutoML [1] was produced in November 2016. The key idea (see Figure 2) is to use reinforcement learning (RL, see Chapter 11, Reinforcement Learning). An RNN acts as the controller and it generates the model descriptions of candidate neural networks. RL is used to maximize the expected accuracy of the generated architectures on a validation set.

On the CIFAR-10 dataset, this method, starting from scratch, designed a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. The CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, the model can compose a novel recurrent cell that outperforms the widely used LSTM cell (see Chapter 9, Autoencoders), and other state-of-the-art baselines. The cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 better than the previous state-of-the-art model.

The key outcome of the paper is shown in Figure 2. A controller network based on RNNs produces a sample architecture A with probability p. A child network with this candidate architecture A is then trained to get a candidate accuracy R. The gradient of p is computed and scaled by R to update the controller. This reinforcement learning operation is repeated in a cycle a number of times. The process of generating an architecture stops if the number of layers exceeds a certain value. The details of how an RL-based policy gradient method is used by the controller RNN to generate better architectures are in [1]. Here we note that other NAS approaches instead use a meta-modeling algorithm based on Q-learning with an ε-greedy exploration strategy and experience replay (see Chapter 11, Reinforcement Learning) to explore the model search space:

Figure 2: NAS with Recurrent Neural Networks
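To make the controller loop concrete, here is a deliberately tiny, hypothetical sketch in NumPy: the "architecture" is reduced to a single depth choice, child-network training is replaced by a stand-in accuracy function, and a softmax controller is updated with a REINFORCE-style rule that scales the gradient by the reward:

```python
import numpy as np

# Hypothetical toy search space: an "architecture" is just a depth choice.
ARCHS = [1, 2, 3, 4]                     # number of layers in the child network

def child_accuracy(depth):
    """Stand-in for training a child network; real NAS trains a model here."""
    return 1.0 - 0.1 * abs(depth - 3)    # accuracy peaks at depth 3

rng = np.random.default_rng(0)
logits = np.zeros(len(ARCHS))            # parameters of the controller policy
lr, baseline = 0.5, 0.9                  # baseline starts near typical accuracy

for step in range(300):
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax policy
    a = rng.choice(len(ARCHS), p=p)                  # sample an architecture A
    R = child_accuracy(ARCHS[a])                     # candidate accuracy R
    grad = -p; grad[a] += 1.0                        # gradient of log p(a)
    logits += lr * (R - baseline) * grad             # scale the gradient by R
    baseline = 0.9 * baseline + 0.1 * R              # moving-average baseline

best = ARCHS[int(np.argmax(logits))]
```

Real NAS replaces `child_accuracy` with the expensive training of an actual child network, which is precisely why so much research focuses on making this search efficient.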

Since the original paper in late 2016, a Cambrian explosion of model generation techniques has been observed. Initially, the goal was to generate the entire model in one single step. Later, a cell-based approach was proposed where the generation is divided into two macro-steps: first a cell structure is automatically built, and then a predefined number of discovered cells are stacked together to generate an entire end-to-end architecture [2].

This Efficient Neural Architecture Search (ENAS) delivers strong empirical performance using significantly fewer GPU-hours compared with all existing automatic model design approaches, and notably, is 1000x less computationally expensive than standard Neural Architecture Search (in 2018). Here, the primary ENAS goal is to reduce the search space via hierarchical composition. Variants of the cell-based approach have been proposed, including pure hierarchical methods where higher-level cells are generated by incorporating lower-level cells iteratively.

Still considering NAS, a completely different idea is to use transfer learning (see Chapter 5, Advanced Convolutional Neural Networks) to transfer the learning of an existing neural network into a new neural network in order to speed up the design [3]. In other words, we want to use transfer learning in AutoML.

Another approach is based on Genetic Programming (GP) and Evolutionary algorithms (EAs) where the basic operations constituting the model search space are encoded into a suitable representation and then this encoding is gradually mutated to progressively better models in a way that resembles the genetic evolution of living beings [4].

Hyperparameter tuning consists of finding the optimal combination of hyperparameters both related to learning optimization (batch size, learning rate, and so on) and model-specific ones (kernel size; number of feature maps and so on for CNNs; number of neurons for dense or autoencoder networks, and so on). Again, the search space can be extremely large. There are two approaches generally used: grid search and random search.

Grid search divides the search space into a discrete grid of values and tests all the possible combinations in the grid. For instance, if there are three hyperparameters and a grid with only two candidate values for each of them, then a total of 2 × 2 × 2 = 8 combinations must be checked. There are also hierarchical variants of grid search, which progressively refine the grid for regions of the search space and provide better results. The key idea is to use a coarse grid first, and after finding a better grid region, implement a finer grid search on that region.

Random search performs a random sampling of the parameter search space, and this simple approach has been proven to work very well in many situations [5].
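The two strategies can be contrasted in a few lines of Python; here `validation_score` is a hypothetical stand-in for training and evaluating a model with the given hyperparameters:

```python
import itertools
import random

def validation_score(lr, batch, dropout):
    """Hypothetical stand-in for training a model and measuring validation quality."""
    return -((lr - 0.01) ** 2 + (batch - 64) ** 2 / 1e4 + (dropout - 0.3) ** 2)

# Grid search: every combination of the candidate values.
grid = {"lr": [0.001, 0.01, 0.1], "batch": [32, 64], "dropout": [0.2, 0.5]}
combos = list(itertools.product(*grid.values()))       # 3 * 2 * 2 = 12 combinations
best_grid = max(combos, key=lambda c: validation_score(*c))

# Random search: spend the same budget on 12 random points from continuous ranges.
random.seed(0)
samples = [(10 ** random.uniform(-4, -1), random.choice([32, 64, 128]),
            random.uniform(0.0, 0.6)) for _ in range(12)]
best_rand = max(samples, key=lambda c: validation_score(*c))
```

With the same budget, random search explores values the grid can never test (such as learning rates between the grid points), which is a big part of why it often wins in practice [5].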

Now that we have briefly discussed the fundamentals we will do quite a bit of hands-on work on Google Cloud. Let's start.


AutoKeras

AutoKeras [6] provides functions to automatically search for the architecture and hyperparameters of deep learning models. The framework uses Bayesian optimization for efficient neural architecture search. You can install the alpha version by using pip:

pip3 install autokeras # for 0.4 version
pip3 install git+git:// # for 1.0 version

The architecture is explained in Figure 3 (taken from [6]):

  1. The user calls the API
  2. The searcher generates neural architectures on CPU
  3. Real neural networks with parameters are built in RAM from the generated neural architectures
  4. The neural network is copied to GPU for training
  5. Trained neural networks are saved on storage devices
  6. The searcher is updated based on the training results

Steps 2 to 6 will repeat until a time limit is reached:

Figure 3: AutoKeras system overview

Google Cloud AutoML

Cloud AutoML is a full suite of products for image, video, and text processing. As of the end of 2019, the suite consists of the following components, which do not require you to know how the deep learning networks are shaped internally:

AutoML Tables

  • Enables you to automatically build and deploy state-of-the-art machine learning models on structured data used for general supervised classification and regression (see chapters 1, 2, and 3).

AutoML Vision

  • AutoML Vision: Enables you to train machine learning models to classify your images according to your own defined labels.
  • AutoML Object Detection: Used to automatically build a custom model to detect objects in an image with bounding boxes and labels, then deploy it to the cloud or on the edge.

AutoML Natural Language

  • AutoML Text Classification: Used to automatically build a machine learning model to classify content into a custom set of categories.
  • AutoML Sentiment Analysis: Used to automatically build a machine learning model to analyze the sentiment expressed within text.
  • AutoML Entity Extraction: Used to automatically build a machine learning model to recognize a custom set of entities within text.
  • Cloud Natural Language API: Use Google's proven pretrained model for general content classification, sentiment analysis, and entity recognition.

AutoML Video Intelligence

  • AutoML Video Intelligence Classification: Used to automatically build a custom model to classify video segments, then deploy it to the cloud or on the edge.

AutoML Translation

  • AutoML Translation: Build on top of Google's powerful Translation API with the words, phrases, and idioms that you need.

In the remainder of this chapter we will review five AutoML solutions: AutoML Tables, AutoML Vision, AutoML Text Classification, AutoML Translation, and AutoML Video Classification.

Using Cloud AutoML ‒ Tables solution

Let's see an example of using Cloud AutoML Tables (see Figure 4). We'll aim to import some tabular data and train a classifier on that data; we'll use some marketing data from a bank. Note that this and the following examples might be charged by Google according to different usage criteria (please check online for the latest cost estimates).

Figure 4: Google Cloud AutoML

As of the end of 2019, AutoML Tables is still in beta. Thus, we need to enable the beta API (see Figure 5):

Figure 5: AutoML Tables beta API

Then, we can create a new dataset (see Figure 6 and 7) and import the data (see Figure 8):

Figure 6: AutoML Tables: the initial interface

Figure 7: AutoML Tables: create a new dataset

For our example we use a demo dataset stored in Google Cloud Storage inside the bucket gs://cloud-ml-tables-data/bank-marketing.csv:

Figure 8: AutoML Tables: importing a csv dataset from cloud storage

Importing may require quite some time (see Figure 9):

Figure 9: AutoML Tables – importing a CSV dataset

Once the data is imported, AutoML recognizes the type of each column (see Figure 10):

Figure 10: AutoML Tables: importing a CSV dataset

Let's select the target as the Deposit column. Since the selected column is categorical data, AutoML Tables will build a classification model. This will predict the target from the classes in the selected column. The classification is binary: 1 represents a negative outcome, meaning that a deposit is not made at the bank; 2 represents a positive outcome, meaning that a deposit is made at the bank.

The ANALYZE tab (see Figure 11) gives the opportunity to inspect the dataset with several metrics such as feature names, type, missing values, distinct values, invalid values, correlation with the target, mean, and standard deviation:

Figure 11: AutoML Tables: inspecting the dataset

It is now time to train the model by using the TRAIN tab (see Figure 12). In this example, we accept 1 hour as our training budget. During this time, you can go and take a coffee while AutoML works on your behalf (see Figure 13). The training budget is a number between 1 and 72 for the maximum number of node hours to spend training your model.

If your model stops improving before then, AutoML Tables will stop training, and you'll be charged only for the node hours actually used:

Figure 12: AutoML Tables: preparing to train

Training a model costs around $20 per hour of compute resources, billed at the granularity of seconds.

This price includes the use of 92 n1-standard-4 equivalent machines in parallel. An initial six hours of free training are included:

Figure 13: AutoML: training the model

After less than one hour, Google AutoML sent an email to my inbox (see Figure 14):

Figure 14: AutoML Tables: training is concluded, and an email is sent to my account

Clicking on the suggested URL, it is possible to see the results of our training. The AutoML-generated model reached an accuracy of 90% (see Figure 15). Remember that accuracy is the fraction of classification predictions produced by the model that were correct on a test set, which is held out automatically. The log-loss (that is, the cross-entropy between the model predictions and the label values) is also provided. A lower value indicates a higher-quality model.

In addition, the Area Under the Curve of the Receiver Operating Characteristic (AUC ROC) is reported. This ranges from zero to one, and a higher value indicates a higher-quality model. This statistic summarizes the ROC curve, which is a graph showing the performance of a classification model at all classification thresholds. The True Positive Rate (TPR), also known as "recall," is TPR = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. The False Positive Rate (FPR) is FPR = FP / (FP + TN), where FP is the number of false positives and TN is the number of true negatives.

A ROC curve plots TPR versus FPR at different classification thresholds. In Figure 15 you will see the Area Under the Curve (AUC) value summarizing the ROC curve, whereas you can see the ROC curve itself in Figure 17.
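These quantities are easy to compute by hand. The following sketch, with made-up scores and labels, evaluates FPR and TPR at several thresholds and integrates the resulting ROC curve with the trapezoidal rule:

```python
def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) for a classifier thresholded at `threshold`."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return fp / (fp + tn), tp / (tp + fn)

# Made-up model scores and true labels (1 = deposit made, 0 = no deposit).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]

# One (FPR, TPR) point per threshold, sorted along the FPR axis.
points = sorted(roc_point(scores, labels, t)
                for t in [0.0, 0.2, 0.35, 0.55, 0.75, 0.85, 0.95])

# Area under the ROC curve via the trapezoidal rule.
auc = sum((f2 - f1) * (t1 + t2) / 2
          for (f1, t1), (f2, t2) in zip(points, points[1:]))
```

For this toy data the AUC comes out close to 0.89, in the same ballpark as the model trained above.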

It is possible to dive deeper into the evaluation by accessing the evaluation tab, where you can see additional information (see Figure 16) and access the confusion matrix (see Figure 17):

Figure 15: AutoML Tables: analyzing the results of our training

Figure 16: AutoML Tables: deep dive on the results of our training

Note that manually crafted models available online get to an accuracy of ~86-90%. Therefore, our model generated with AutoML definitely achieves a very good result!

Figure 17: AutoML Tables: additional deep dive on the results of our training

If we are happy with our results, we can then deploy the model in production via the PREDICT tab (see Figure 18). Then it is possible to make online predictions by using a REST API, using this command for the example we're looking at in this chapter:

curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ \
  -d @request.json
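The same request can be assembled from Python with the standard library. Note that the endpoint URL and payload below are placeholders: substitute the prediction URL and request.json content that the console generates for your own model:

```python
import json
import urllib.request

# NOTE: placeholder endpoint; substitute the prediction URL shown in your console.
ENDPOINT = "https://<your-automl-predict-endpoint>"

def build_request(endpoint, token, payload):
    """Assemble the same POST request that the curl command above issues."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST")

# The token would come from: gcloud auth application-default print-access-token
req = build_request(ENDPOINT, "<token>", {"payload": {"row": {"values": []}}})
# urllib.request.urlopen(req) would actually send the prediction request.
```

The request is only assembled here, not sent; sending it requires a deployed model and a valid OAuth token.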

You can use your command as generated by the console (see Figure 19) via JSON (see Figure 20):

Figure 18: AutoML Tables: deploying in production

Figure 19: AutoML Tables: querying the deployed model in production

You can also predict via the web console (see Figure 21):

Figure 20: AutoML Table: accessing the deployed model via REST API and JSON

Figure 21: AutoML Table: predicting deposit via the web console

Put simply, we can say that Google Cloud AutoML is very focused on simplicity of use and efficiency. Let's summarize the main steps required (see Figure 22):

  1. The dataset is imported
  2. Your dataset schema and labels are defined
  3. The input features are automatically recognized
  4. AutoML performs the magic by automatically doing feature engineering, creating a model, and tuning the hyperparameters
  5. The automatically built model can then be evaluated
  6. The model is then deployed in production

Of course, it is possible to repeat steps 2-6 in a cycle by changing the schema and the definition of the labels:

Figure 22: AutoML Table – main steps required

In this section we have seen an example of AutoML focused on ease of use and efficiency. The progress made is shown in Faes et al. [7]; quoting the paper:

"We show, to our knowledge, a first of its kind automated design and implementation of deep learning models for health-care application by non-AI experts, namely physicians. Although comparable performance to expert-tuned medical image classification algorithms was obtained in internal validations of binary and multiple classification tasks, more complex challenges, such as multilabel classification, and external validation of these models was insufficient. We believe that AI might advance medical care by improving efficiency of triage to subspecialists and the personalisation of medicine through tailored prediction models. The automated approach to prediction model design improves access to this technology, thus facilitating engagement by the medical community and providing a medium through which clinicians can enhance their understanding of the advantages and potential pitfalls of AI integration."

In this case Cloud AutoML Vision has been used. So, let's look at an example.

Using Cloud AutoML ‒ Vision solution

For this example, we are going to use the code made by Ekaba Bisong, available as open source under the MIT License. Here the task is to classify images:

Figure 23: Lung chest X-rays

This type of classification requires expert knowledge when performed by humans. Using language typical of clinicians who are specialized in analyzing chest X-rays: "The normal chest X-ray (left panel) shows clear lungs with no areas of abnormal opacification. Bacterial pneumonia (middle) typically exhibits a focal lobar consolidation, in this case in the right upper lobe (see arrows), whereas viral pneumonia (right) manifests with a more diffuse "interstitial" pattern in both lungs". (Source: Kermany, D. S., Goldbaum M., et al. 2018. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell.

Let's start. The first step is to activate the Image Classification option under AutoML Vision (see Figure 24):

Figure 24: AutoML Vision – Image Classification

We can now create a new dataset (see Figure 25):

Figure 25: AutoML Vision – creating a new dataset

The dataset contains:

  • 5,232 chest X-ray images from children
  • 3,883 are samples of bacterial (2,538) and viral (1,345) pneumonia
  • 1,349 samples are healthy lung X-ray images

The dataset is hosted on Kaggle, a website dedicated to machine learning where people can compete in creating ML models shared with the community, so we need to get the dataset from there. Let's activate Cloud Shell from the upper-right corner of the Google Cloud Console (see Figure 26):

Figure 26: AutoML Vision – activating Cloud Shell

Then, we can install Kaggle with pip (see Figure 27):

Figure 27: AutoML Vision - getting Kaggle data

sudo pip install kaggle

Now, we need to generate a token from Kaggle, which can be done by accessing<YourLogin>/account (see Figure 28):

Figure 28: Kaggle – creating a new Kaggle API token

The token can be now uploaded on the cloud ephemeral VM via the console (see Figure 29):

Figure 29: Kaggle – uploading the Kaggle token

Move the uploaded kaggle.json key to the .kaggle directory. Then, download the dataset from Kaggle, unzip the archives, and move the data to a Google Cloud Platform (GCP) bucket with the following commands:

Instructions for creating Cloud Storage buckets can be found in the GCP documentation.

a_gulli@cloudshell:~$ mv kaggle.json .kaggle/
a_gulli@cloudshell:~$ kaggle datasets download paultimothymooney/chest-xray-pneumonia
a_gulli@cloudshell:~$ unzip
a_gulli@cloudshell:~$ unzip
a_gulli@cloudshell:~$ gsutil -m cp -r chest_xray gs://authentica-de791-vcm/chestXrays

Now we can create a new dataset for the visual training. We need a list of images on Google storage where each image is annotated with a label, as in the following example:

['gs://authentica-de791-vcm/chestXrays/train/NORMAL/IM-0115-0001.jpeg', 'NORMAL']
['gs://authentica-de791-vcm/chestXrays/train/NORMAL/IM-0117-0001.jpeg', 'NORMAL']
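A list in this format is straightforward to build by deriving each label from the parent directory name. The following is a sketch over a hypothetical, hard-coded listing; in practice you would enumerate the bucket contents (for example, with gsutil or the google-cloud-storage client):

```python
import csv
import io
import posixpath

BUCKET = "gs://authentica-de791-vcm/chestXrays"   # your own GCP bucket

# Hypothetical hard-coded listing standing in for the real bucket contents.
paths = [
    "train/NORMAL/IM-0115-0001.jpeg",
    "train/NORMAL/IM-0117-0001.jpeg",
    "train/PNEUMONIA/person1_bacteria_1.jpeg",
]

# Each row pairs the full gs:// URI with the label taken from its folder name.
rows = [(posixpath.join(BUCKET, p), posixpath.basename(posixpath.dirname(p)))
        for p in paths]

buf = io.StringIO()
csv.writer(buf).writerows(rows)   # CSV content ready for the AutoML import step
```

Deriving labels from folder names like this is exactly what the preprocessing notebook automates over the whole bucket.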

The first thing is to create a new notebook (see Figure 30):

Figure 30: GCP – creating a new notebook

Then a new instance with TensorFlow 2.0 (see Figure 31):

Figure 31: GCP – creating a new notebook instance with TensorFlow 2.0

This will create a new machine (see Figure 32):

Figure 32: GCP – provisioning a new machine with TensorFlow 2.0

Once the machine has been provisioned, we can open Jupyter Notebook (see Figure 33) and clone the repository by clicking the link provided by the environment in the UI (see Figure 34):

Figure 33: GCP – opening JupyterLab

Figure 34: JupyterLab – cloning a Git repo by using the icon in grey

We can now preprocess all the images in our bucket by running all the cells in the notebook (see Figure 35), which will help with preprocessing them. Make sure that you customize the notebook to take into account your own data paths and GCP buckets:

Figure 35: AutoML Vision – importing the dataset

It will take a while to import the data (see Figure 36). When concluded, an email is sent, and it is possible to browse the images (see Figure 37):

Figure 36: AutoML Vision – importing images

Figure 37: AutoML Vision – lung images

The next step is to start training (see Figure 38). Since at least 100 images are currently assigned to each label, there are enough images to start training. Images will be automatically split into training and test sets, so that it's possible to evaluate the model's performance. Unlabeled images will not be used:

Figure 38: AutoML Vision – start training

There are two options: either the model is hosted in the cloud or it is optimized to run on the Edge (see Figure 39):

Figure 39: AutoML Vision – preparing to train the model

Training can take from 15 minutes to several hours (see Figure 40):

Figure 40: AutoML Vision – training the model

At the end we will receive an email and we can access the results (see Figure 41):

Figure 41: AutoML Vision – evaluating the results

When a particular problem includes an imbalanced dataset, accuracy isn't a good metric to look at. For example, if your dataset contains 95 negative and 5 positive examples, a model with 95% accuracy tells you very little: the classifier might label every example as negative and still achieve 95% accuracy. Hence, we need to look at alternative metrics. Precision and recall are very good metrics for dealing with such problems. It is also possible to access a detailed evaluation by clicking the SEE FULL EVALUATION link and see the precision, the Precision@1, and the Recall@1 (see Figure 42) together with the confusion matrix (see Figure 43):
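The 95/5 example above is easy to verify in a few lines of Python:

```python
# 95 negatives, 5 positives; a "classifier" that predicts negative for everything.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # high!
recall = tp / (tp + fn)                            # 0: every positive is missed
precision = tp / (tp + fp) if (tp + fp) else 0.0   # undefined, treated as 0
```

Despite 95% accuracy, recall is zero: the model never finds a single positive case, which for a medical application would be useless.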

Figure 42: AutoML Vision – evaluating the results: Precision, Precision@1, Recall@1

Figure 43: AutoML Vision – evaluating the results: confusion matrix

Note that, again, the AutoML-generated model is comparable to, or even better than, the models manually crafted at the end of 2019. Indeed, the best model available on Kaggle at the end of 2019 reached a recall of 0.98 and a precision of 0.79 (see Figure 44):

Figure 44: Chest X-Ray Images – manually crafted models on Kaggle

Using Cloud AutoML ‒ Text Classification solution

In this section we are going to build a text classifier using AutoML. Let's activate the Text Classification solution from the console (see Figures 45 and 46):

Figure 45: AutoML Text Classification – accessing the natural language interface

Figure 46: AutoML Text Classification – launching the application

We are going to use a dataset already available online, load it into a dataset named "happiness," and perform a single-label classification (see Figure 47). The file is uploaded from my computer (see Figure 48):

Figure 47: AutoML Text Classification – creating the dataset

Figure 48: AutoML Text Classification – uploading the dataset

Once the dataset is loaded you should be able to see that each text fragment is annotated with one category out of seven (see Figure 49):

Figure 49: AutoML Text Classification - sample of text and categories

It is now time to start training the model (see Figure 50, 51, and 52):

Figure 50: AutoML Text Classification – start training

Figure 51: AutoML Text Classification – summary of label distribution

Figure 52: AutoML Text Classification – training a new model

At the end, the model is built and it achieves a good precision of 87.6% and recall of 84.1% (see Figure 53):

Figure 53: AutoML Text Classification – precision and recall

If you are interested in playing some more with happiness-related datasets, I suggest having a look at Kaggle.

Using Cloud AutoML ‒ Translation solution

In this solution, we are going to auto-create a model for translating text from English to Spanish, built on top of a large model provided by Google as the base.

As usual, the first step is to activate the solution (see Figure 54) and then create a dataset (see Figure 55):

Figure 54: AutoML Text Translation – accessing the solution

Figure 55: AutoML Text Translation – creating a new dataset

For this simple example, we use a sample archive already available online and extract the file en-es.tsv from it. You should be able to see a few examples like the following:

Make sure all words are spelled correctly.    Comprueba que todas las palabras están escritas correctamente.
Click for video information    Haz clic para ver la información en vídeo
Click for product information    Haz clic para ver la información sobre el producto
Check website for latest pricing and availability.    Accede al sitio web para consultar la disponibilidad y el precio más reciente.
Tap and hold to copy link    Mantén pulsado el enlace para copiarlo
Tap to copy link    Toca para copiar el enlace
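As the sample shows, the training file is plain TSV: one sentence pair per line, with the English source and the Spanish target separated by a tab. A minimal sketch of reading such a file (using two of the pairs above):

```python
# Each line of en-es.tsv holds "source<TAB>target".
sample_tsv = (
    "Tap to copy link\tToca para copiar el enlace\n"
    "Click for video information\tHaz clic para ver la información en vídeo\n"
)

pairs = []
for line in sample_tsv.splitlines():
    # Split on the first tab only, in case a target contains tabs.
    source, target = line.split("\t", 1)
    pairs.append((source, target))

for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```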

Then, you can create the dataset and select the source and the target language (see Figure 56):

Figure 56: AutoML Text Translation – choosing the language

As the next step, you can upload the training file (see Figure 57) and wait until the data is loaded (see Figure 58):

Figure 57: AutoML Text Translation – select files to train

Figure 58: AutoML Text Translation – examples of sentences

Next, choose a base model from which to start (see Figure 59). As of late 2019, there is only one base model available, named Google Neural Machine Translation (Google NMT). This is the model used in production by Google for online translation. Now, you can start training (see Figure 60):

Figure 59: AutoML Text Translation – selecting the base model

Figure 60: AutoML Text Translation – starting to train

Once the model is trained, we can use it and compare the results against the Google base model (see Figure 61):

Figure 61: AutoML Text Translation – compare the Custom model and Google NMT model

The results are also accessible via a REST API (see Figure 62):

Figure 62: AutoML Text Translation – REST API
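In outline, the REST call shown in Figure 62 is a POST to the trained model's :predict endpoint, carrying the text to translate in the request body. The sketch below only builds the URL and the JSON payload; the project and model IDs are placeholders, and the exact field names and authentication flow should be verified against the current Cloud AutoML documentation:

```python
import json

# Placeholder identifiers -- substitute your own project and model IDs.
project_id = "my-project"
model_id = "my-translation-model"

# The :predict endpoint of the trained model, in the shape shown in
# the console's REST API tab (verify against current documentation).
endpoint = (
    f"https://automl.googleapis.com/v1/projects/{project_id}"
    f"/locations/us-central1/models/{model_id}:predict"
)

# Request body: the text snippet to translate.  Field names here follow
# the v1 AutoML API and should be double-checked before use.
body = json.dumps({
    "payload": {
        "textSnippet": {
            "content": "Tap to copy link",
            "mimeType": "text/plain",
        }
    }
})

print(endpoint)
print(body)
# Sending the request requires an OAuth bearer token, for example:
#   curl -X POST \
#        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#        -H "Content-Type: application/json" \
#        --data "$BODY" "$ENDPOINT"
```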

Using Cloud AutoML ‒ Video Intelligence Classification solution

In this solution, we are going to automatically build a new model for video classification. The intent is to sort different video segments into various categories (or classes) based on their content. The first step is to activate the solution (see Figure 63) and load a dataset (see Figures 64, 65, and 66). We are going to use a collection of about 5,000 demo videos, listed in the file gs://automl-video-demo-data/hmdb_split1_40_mp4.csv stored in a GCP bucket:

Figure 63: AutoML Video Intelligence – activating the solution

Figure 64: AutoML Video Intelligence – choosing the dataset

Figure 65: AutoML Video intelligence – starting to load the dataset

Figure 66: AutoML Video Intelligence – importing the videos
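The import file referenced above (hmdb_split1_40_mp4.csv) indexes the videos for the solution: each row points at a video URI in the bucket together with its label and the start and end times of the labeled segment. The sketch below parses rows in that style; the row layout is an assumption based on the AutoML Video import format, and the file names and labels are invented for illustration:

```python
import csv
import io

# Invented rows in the style of the AutoML Video import format:
# video URI, label, segment start (seconds), segment end (seconds).
sample_csv = """\
gs://my-demo-bucket/videos/ride_horse_001.mp4,ride_horse,0.0,12.4
gs://my-demo-bucket/videos/golf_swing_007.mp4,golf,0.0,8.9
"""

segments = []
for uri, label, start, end in csv.reader(io.StringIO(sample_csv)):
    segments.append({
        "uri": uri,
        "label": label,
        "start": float(start),
        "end": float(end),
    })

for seg in segments:
    print(f"{seg['label']:>10}: {seg['uri']} [{seg['start']}s-{seg['end']}s]")
```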

Once the videos are imported you should be able to preview them with their associated categories (see Figure 67):

Figure 67: AutoML Video Intelligence – imported video preview

We can now start to build a model. In this case, the solution warns that we don't have enough videos in some categories and asks whether we want to add more. Let's ignore the warning for now (see Figure 68):

Figure 68: AutoML Video Intelligence – warning to get more videos

Now we can start training (see Figures 69 and 70):

Figure 69: AutoML Video Intelligence – starting to train

Once the model is trained, you can access the results from the console (see Figure 70). In this case we achieved a precision of 81.18% and a recall of 76.65%. You can play with the model, for instance by increasing the number of labeled videos available, to see how the performance changes:

Figure 70: AutoML Video Intelligence – evaluating the results

Let's have a detailed look at the results via the EVALUATE tab. For instance, we can analyze the precision/recall graph for different levels of threshold (see Figure 71) and the confusion matrix showing examples of wrong classification of shots (see Figure 72):

Figure 71: AutoML Video Intelligence – precision and recall

Figure 72: AutoML Video Intelligence – confusion matrix
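A confusion matrix like the one in Figure 72 directly yields per-class precision and recall: the diagonal entries count correctly classified shots, each column sums the shots predicted as a class, and each row sums the shots that truly belong to it. A toy sketch with made-up counts:

```python
# Toy 3-class confusion matrix (rows = actual class, columns = predicted).
# Class names and counts are invented for illustration.
labels = ["ride_horse", "golf", "run"]
confusion = [
    [80,  5, 15],   # actual ride_horse
    [ 4, 90,  6],   # actual golf
    [10,  8, 82],   # actual run
]

for i, label in enumerate(labels):
    tp = confusion[i][i]                          # diagonal: correct shots
    predicted = sum(row[i] for row in confusion)  # column sum: predicted as i
    actual = sum(confusion[i])                    # row sum: truly class i
    precision = tp / predicted
    recall = tp / actual
    print(f"{label:>10}: precision={precision:.2f} recall={recall:.2f}")
```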

We can also test the predictions of the model that was just created. In this case, we use a demo dataset available at gs://automl-video-demo-data/hmdb_split1_test_gs_predict.csv (see Figure 73):

Figure 73: AutoML Video Intelligence – testing the model

This will start a batch process where all the videos in the test dataset are analyzed by our automatically generated model. Once done, you can inspect each video and get the prediction of what different video segments are all about (see Figure 74, where the prediction is "riding a horse"):

Figure 74: AutoML Video Intelligence – analyzing a video segment


Training on GCP has different costs depending upon the AutoML solution adopted; for example, training all the solutions presented in this chapter and serving the models for testing cost less than 10 dollars at the end of 2019. This does not, however, include the initial 6 hours of free training that were available for the account (a grand total of less than $150). Depending on your organizational needs, this is likely to cost significantly less than buying expensive on-premises hardware.

The most expensive solutions for my datasets are reported in Figure 75. Of course, your costs may be different according to your specific needs and the models generated:

Figure 75: AutoML – example of costs

Bringing Google AutoML to Kaggle

On November 4, 2019, Google decided to integrate AutoML directly into Kaggle. To get started, you need to link your GCP account from Kaggle and authorize access. This is easily done from a Kaggle Notebook, as explained in Figure 76:

Figure 76: AutoML and Kaggle

The final step consists simply of activating AutoML (see Figure 77):

Figure 77: Activating AutoML from Kaggle


Summary

The goal of AutoML is to enable domain experts who are not familiar with machine learning technologies to use ML techniques easily. The primary aim is to reduce the steep learning curve and the huge cost of handcrafting machine learning solutions by automating the whole end-to-end machine learning pipeline: data preparation, feature engineering, and model generation.

After reviewing the state-of-the-art solutions available at the end of 2019, we discussed how to use Cloud AutoML for text, videos, and images, achieving results comparable to those obtained with handcrafted models. AutoML is probably the fastest-growing research topic in deep learning, and the interested reader can follow the latest results in the references at the end of this chapter.

The next chapter discusses the math behind deep learning, a rather advanced topic that is recommended if you are interested in understanding what is going on "under the hood" when you play with neural networks.


References

  1. Neural Architecture Search with Reinforcement Learning, Barret Zoph, Quoc V. Le, 2016.
  2. Efficient Neural Architecture Search via Parameter Sharing, Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean, 2018.
  3. Transfer NAS: Knowledge Transfer between Search Spaces with Transformer Agents, Zalán Borsos, Andrey Khorlin, Andrea Gesmundo, 2019.
  4. NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm, Zhichao Lu, Ian Whalen, Vishnu Boddeti, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, Wolfgang Banzhaf, 2018.
  5. Random Search for Hyper-Parameter Optimization, James Bergstra, Yoshua Bengio, 2012.
  6. Auto-Keras: An Efficient Neural Architecture Search System, Haifeng Jin, Qingquan Song, and Xia Hu, 2019.
  7. Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study, Livia Faes et al., The Lancet Digital Health, Volume 1, Issue 5, September 2019, Pages e232-e242.