IBM develops A.I. to speed deep learning architecture selection

IBM is developing an AI system to help developers choose the right deep learning architecture.

Researcher Martin Wistuba at the cognitive and computing services giant has developed what he calls “an evolutionary algorithm for architecture selection”. The new algorithm is up to 50,000 times faster than other methods, with only a small increase in the error rate, he claims.

Deep learning models are applied in many IBM Watson products and services and can perform complex tasks such as visual recognition, text-to-speech conversion, playing board games, and more. “These models emulate the workings of the human brain, and, like the brain, their architecture is crucial to their function,” wrote Wistuba in a blog post.

Today, engineers and scientists select the best architecture for a deep learning model from a large set of possible candidates – a time-consuming manual process. Using an automated AI solution to select the neural network instead can save time and – crucially – enable non-experts to apply deep learning faster, he said.

“My evolutionary algorithm is designed to reduce the search time for the right deep learning architecture to just hours, making the optimisation of deep learning network architecture affordable for everyone.”

So how does it work?

Escaping the cell

Wistuba’s method treats a convolutional neural network architecture as a sequence of ‘neuro-cells,’ then applies a series of mutations in order to find a structure that improves the performance of the network for any given dataset and machine learning task.
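The article does not publish Wistuba’s implementation, but the search procedure it describes – repeatedly mutate the current best architecture and keep whichever candidate scores highest – can be sketched as a minimal evolutionary loop. Everything here is illustrative: the architecture encoding (a tuple of applied operations), the `evaluate` callback, and the mutation names are assumptions, not IBM’s code.

```python
import random

# Hypothetical mutation operators, named after those described in the
# article; a real implementation would modify a network graph.
MUTATIONS = ["add_layer", "add_connection", "widen_layer", "widen_kernel"]

def mutate(arch, op):
    """Toy encoding: an architecture is just the tuple of ops applied to it."""
    return arch + (op,)

def evolve(initial_arch, evaluate, generations=10, children=4, seed=0):
    """Minimal evolutionary architecture search (sketch only).

    `evaluate` maps an architecture to a fitness score (e.g. validation
    accuracy). Each generation, the incumbent is mutated into several
    children, and the fittest architecture found so far survives.
    """
    rng = random.Random(seed)
    best, best_score = initial_arch, evaluate(initial_arch)
    for _ in range(generations):
        candidates = [mutate(best, rng.choice(MUTATIONS)) for _ in range(children)]
        for arch in candidates:
            score = evaluate(arch)
            if score > best_score:
                best, best_score = arch, score
    return best, best_score
```

With a dummy fitness function such as `lambda arch: len(arch)`, the loop simply grows the architecture one mutation per generation – a stand-in for the real case, where fitness would come from briefly training each candidate network.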

This approach substantially shortens network training time, he said. The mutations alter the structure of the network but don’t change its predictions, and can include adding layers, adding new connections, or widening kernels or layers. Because a mutated network computes the same function as its parent, it inherits the parent’s learned weights rather than starting training from scratch – which is what makes the search so much cheaper.
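To make “alters the structure but doesn’t change the predictions” concrete, here is one well-known example of such a function-preserving mutation: widening a fully-connected layer by duplicating units and splitting their outgoing weights (in the style of Net2Net-type operators). This is a sketch of the general technique, not Wistuba’s specific code.

```python
import numpy as np

def widen_layer(W1, b1, W2, new_width, rng=None):
    """Function-preserving 'widen' mutation for a hidden layer (sketch).

    W1: (in_dim, width) incoming weights; b1: (width,) biases;
    W2: (width, out_dim) outgoing weights. Returns widened (W1', b1', W2')
    such that relu(x @ W1' + b1') @ W2' equals relu(x @ W1 + b1) @ W2
    for every input x, so the network's predictions are unchanged.
    """
    rng = rng or np.random.default_rng(0)
    width = W1.shape[1]
    assert new_width > width
    # Pick existing units to duplicate.
    extra = rng.integers(0, width, size=new_width - width)
    idx = np.concatenate([np.arange(width), extra])

    W1_new = W1[:, idx]          # copy incoming weights and biases
    b1_new = b1[idx]
    # Split each unit's outgoing weights evenly across its copies, so the
    # summed contribution to the next layer is exactly what it was before.
    counts = np.bincount(idx, minlength=width)
    W2_new = W2[idx, :] / counts[idx][:, None]
    return W1_new, b1_new, W2_new
```

The wider network is then fine-tuned from these inherited weights instead of being retrained from random initialisation, which is where the time savings come from.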

Wistuba compared the new ‘neuro-evolutional’ approach with several other methods in an image classification task, using the CIFAR-10 and CIFAR-100 datasets – image collections commonly used to train machine learning and computer vision systems.

He found that his new algorithm had a slightly higher classification error, but required significantly less time, compared with state-of-the-art human-designed architectures, the results of architecture search methods based on reinforcement learning, and the results of other automated methods based on evolutionary algorithms.

It was up to 50,000 times faster than some other methods, with an error rate “at most 0.6 percent higher than the best competitor on the benchmark dataset CIFAR-10”, according to Wistuba.

He hopes that the new optimisation method will eventually be integrated into IBM’s cloud services – which include Watson and Watson Assistant. Before then, he plans to extend it to larger datasets, such as ImageNet, and to additional kinds of information, such as time-series and text data.

Internet of Business says

Wistuba will present his work at the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD) in Dublin, Ireland, on 10-14 September.