Machine/deep learning: debunking the hype

Both machine and deep learning are very hot topics. However, at heart, neither is particularly revolutionary.

Machine learning is essentially self-training data mining. Data mining algorithms, and the tools used to build them (SPSS, SAS, Statistica and so on), have been around for thirty or forty years. Traditionally, what we now call a data scientist had a problem to solve, such as making better recommendations or identifying fraud. The data scientist would run various data mining algorithms against the problem set, train them (that is, feed them lots of relevant data), determine which algorithm best suited the problem at hand, and deploy the resulting model. Because things like buying patterns change over time, best practice was for the data scientist to revisit the problem set periodically to check that the algorithm remained the best fit, and to update or replace it as appropriate.
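As a rough illustration of that traditional workflow, here is a minimal Python sketch: train a couple of candidate algorithms, score each against held-out data, and deploy whichever fits best. Everything in it is an assumption made for illustration: the synthetic fraud data, the two toy candidates (`fit_threshold` and `fit_majority`) and the use of simple accuracy as the yardstick. A real project would use real data and a proper data mining toolkit.

```python
import random


def make_data(n, seed):
    """Synthetic (amount, is_fraud) pairs; the distributions are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_fraud = rng.random() < 0.3
        amount = rng.gauss(900, 200) if is_fraud else rng.gauss(300, 200)
        rows.append((amount, int(is_fraud)))
    return rows


def fit_threshold(train):
    """Candidate A: flag amounts above the cut-off that scores best on the training data."""
    def train_hits(cut):
        return sum((amount > cut) == bool(label) for amount, label in train)
    best_cut = max((amount for amount, _ in train), key=train_hits)
    return lambda amount: int(amount > best_cut)


def fit_majority(train):
    """Candidate B: always predict the most common training label (a baseline)."""
    majority = int(2 * sum(label for _, label in train) > len(train))
    return lambda amount: majority


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(amount) == label for amount, label in rows) / len(rows)


# Train every candidate on one data set, then compare them on unseen data.
train, holdout = make_data(400, seed=1), make_data(200, seed=2)
candidates = {"threshold": fit_threshold, "majority": fit_majority}
scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "-> deploy", best_name)
```

The periodic review described above amounts to re-running this comparison on fresh data and swapping the deployed model if a different candidate comes out on top.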

What machine learning does is automate the process of improving the deployed algorithm. Best practice still means checking periodically that this is the best algorithm for the problem, but it no longer requires checking that the algorithm itself is performing optimally.
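To make the idea of a deployed model that improves itself concrete, the sketch below compares a static model with one that keeps adjusting as new labelled transactions arrive. It is a toy under stated assumptions: the `NearestMeanModel` class, the invented amounts, the sudden shift in buying patterns and the update rule (an exponential moving average) are all hypothetical, and it assumes the true outcome of each transaction eventually becomes known.

```python
import random


class NearestMeanModel:
    """Classifies a transaction amount by the closer of two class means."""

    def __init__(self, legit_mean, fraud_mean, learning_rate=0.0):
        self.means = {0: legit_mean, 1: fraud_mean}
        self.learning_rate = learning_rate  # 0.0 means the model never updates

    def predict(self, amount):
        return min(self.means, key=lambda label: abs(amount - self.means[label]))

    def learn(self, amount, label):
        # Exponential moving average: nudge the class mean toward the new value.
        self.means[label] += self.learning_rate * (amount - self.means[label])


def stream(n, legit_centre, fraud_centre, rng):
    """Yield n synthetic (amount, label) pairs; label 1 means fraud."""
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = fraud_centre if label else legit_centre
        yield rng.gauss(centre, 20), label


# Both models were built when legitimate spend centred on 100 and fraud on 300.
static_model = NearestMeanModel(100, 300)
online_model = NearestMeanModel(100, 300, learning_rate=0.1)

# Buying patterns then shift: legitimate spend now centres on 300, fraud on 500.
rng = random.Random(0)
drifted = list(stream(1000, legit_centre=300, fraud_centre=500, rng=rng))

static_errors = online_errors = 0
for amount, label in drifted:
    static_errors += static_model.predict(amount) != label
    online_errors += online_model.predict(amount) != label
    online_model.learn(amount, label)  # the deployed model keeps training itself

print("static:", static_errors, "self-updating:", online_errors)
```

The static model goes on misclassifying the new legitimate spend as fraud until someone retrains it, while the self-updating model recovers after a handful of observations. Neither version questions whether nearest-mean is still the right algorithm, which is the check that remains manual.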

From a business perspective this is very important. It means that recommendation engines get gradually better over time. It means that the rates of false positives and false negatives (whether in fraud detection or in other environments such as name and address matching) fall incrementally.
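For readers who want to see how those two rates are measured, here is a small Python sketch. The function name `error_rates` and the two sets of predictions are made up for illustration, and it assumes the data contains at least one genuine case of each kind.

```python
def error_rates(predictions, actuals):
    """Return (false_positive_rate, false_negative_rate) for 0/1 labels."""
    pairs = list(zip(predictions, actuals))
    false_pos = sum(p == 1 and a == 0 for p, a in pairs)
    false_neg = sum(p == 0 and a == 1 for p, a in pairs)
    negatives = sum(a == 0 for a in actuals)
    positives = sum(a == 1 for a in actuals)
    return false_pos / negatives, false_neg / positives


# 1 = flagged as fraud, 0 = passed as legitimate (invented examples).
actuals    = [0, 0, 0, 0, 1, 1, 1, 1]
last_month = [1, 1, 0, 0, 1, 0, 0, 1]
this_month = [1, 0, 0, 0, 1, 1, 0, 1]

print(error_rates(last_month, actuals))  # (0.5, 0.5)
print(error_rates(this_month, actuals))  # (0.25, 0.25)
```

A model that is improving over time should show both numbers drifting downwards from one period to the next, as in this made-up case.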

Deep learning, in effect, goes one step further by automating the process of creating the best algorithm for the task at hand. This doesn’t actually do anything directly for the business that machine learning does not: for example, it does not reduce the rate of false positives any more than a well-designed machine learning algorithm might. What it does is remove the need to develop and test multiple algorithms to see which is the best fit for the problem dataset. In other words, to a large extent it removes the data scientist from the equation. Taken to its logical conclusion, this means that deep learning will ultimately automate the role of the data scientist out of existence.
