Can IBM Take Machine Learning Mainstream?
It is clear that Watson, IBM’s cognitive engine, is becoming embedded in just about every tool and application that IBM is selling these days. In its most recent announcement, IBM is making the Watson Machine Learning (ML) engine available on the mainframe to bring machine learning to transactional databases.
This ML engine will be available both for on-premises use and as a private cloud implementation. IBM’s announcement covered an array of capabilities too numerous to discuss in a single blog post, so I would like to highlight the aspects that I found especially important.
The following are my top five takeaways.
- While Watson is commonly known for supporting unstructured data in Natural Language Processing (NLP) solutions, the underlying engine provides a sophisticated machine learning engine. This machine learning engine is applicable to advanced analytics on structured data as well as unstructured data.
- Mainframe transactional databases are the mainstay of many of IBM’s large financial services, retail, and airline customers. Given the complexity, value, and scale of this data, it makes sense to provide advanced analytics based on machine learning to help customers better understand it. Being able to detect patterns and anomalies in this data provides a valuable tool for customers. Equally important is the ability to execute these advanced algorithms close to the data rather than moving the data to an external platform.
- One of the interesting capabilities of the machine learning engine is its ability to provide productivity assistance to developers. Experienced data scientists are experts at building models and selecting the right algorithms; however, there are simply not enough data scientists. Providing cognitive assistance that helps less experienced developers take advantage of machine learning is therefore extremely important, and IBM’s Machine Learning engine provides this capability. In essence, once a model is built, the system learns from the ingested data and recommends the algorithm that best matches the task. Once the algorithm is trained on the data, the system may suggest an alternative algorithm. Helping developers select the most effective algorithm, or part of an algorithm, will make machine learning much more approachable. Through Cognitive Assist for Data Science (CADS), the system runs the test data through all 200 of its algorithms and calculates which algorithm, or combination of algorithms, delivers the highest score and reliability.
- Flexibility and productivity are important aspects of IBM’s announcement. First, the engine enables developers to use the tools they are already familiar with and have made investments in. For example, developers can take advantage of the 55 SPSS algorithms. This is especially important for the large number of organizations that have used SPSS for years. In addition, the ML engine supports many of the languages widely used for machine learning including R, Java, Scala, and Python. Developers have a choice of execution engines including Hadoop and Spark.
- Improving the user experience is another focus of the ML engine. IBM has invested in creating a dashboard interface called the visual model builder that helps developers build models, one of the most complex aspects of machine learning.
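The CADS idea described above, running the same held-out data through every candidate algorithm and keeping the one with the best score, can be illustrated with a toy sketch. This is a minimal Python example under stated assumptions, not IBM’s implementation: the three candidate classifiers, the accuracy metric, the data, and names such as `select_best` and `CANDIDATES` are all hypothetical stand-ins for the 200-algorithm library and the scoring CADS actually uses, and it only scores individual algorithms rather than combinations of them.

```python
"""Toy sketch of automated model selection: score every candidate on held-out data, keep the best.

Assumption: illustrates the general idea only; this is NOT IBM's CADS implementation,
and the three candidates below stand in for a much larger algorithm library.
"""
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, int]  # (feature value, class label)
Model = Callable[[float], int]

# Hypothetical toy data: one numeric feature (think transaction amount)
# and a 0/1 label (think "flagged as anomalous"). Values are illustrative only.
TRAIN: List[Sample] = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
                       (4.5, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
TEST: List[Sample] = [(1.2, 0), (2.8, 0), (4.0, 0), (6.0, 1), (8.5, 1), (5.5, 1)]


def majority_class(train: List[Sample]) -> Model:
    """Baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def nearest_centroid(train: List[Sample]) -> Model:
    """Predict the label whose class mean is closest to x."""
    centroids = {}
    for label in {y for _, y in train}:
        values = [x for x, y in train if y == label]
        centroids[label] = sum(values) / len(values)
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


def one_nearest_neighbor(train: List[Sample]) -> Model:
    """Predict the label of the single closest training sample."""
    return lambda x: min(train, key=lambda s: abs(s[0] - x))[1]


def accuracy(model: Model, test: List[Sample]) -> float:
    """Fraction of test samples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


# Hypothetical candidate library; a real system would hold far more entries.
CANDIDATES: Dict[str, Callable[[List[Sample]], Model]] = {
    "majority_class": majority_class,
    "nearest_centroid": nearest_centroid,
    "one_nearest_neighbor": one_nearest_neighbor,
}


def select_best(train: List[Sample], test: List[Sample]) -> Tuple[str, Dict[str, float]]:
    """Train every candidate, score each on the held-out set, and return the top scorer."""
    scores = {name: accuracy(build(train), test) for name, build in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = select_best(TRAIN, TEST)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:22s} {score:.3f}")
    print("selected:", best)
```

On this toy data the nearest-centroid candidate classifies every held-out sample correctly, while the single-neighbor candidate is misled by the low-valued flagged training point at 4.5. Because the winner is chosen using the held-out set, a careful workflow would confirm its score on a separate test set before trusting it.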
It is not a surprise that IBM would make its ML engine available first on System z. Customers are reluctant to move their crown jewels of data off the mainframe onto other platforms, yet the need to bring advanced analytics to this core data is growing more urgent. IBM plans to bring this same engine to Power Systems in the near future. I liked the pragmatism of this approach to applying cognitive computing and machine learning to complex transactional data. Combined with the ability to reuse SPSS, open source tools, languages, and algorithms, this should make the offering attractive. In addition, the ability to run analytics across a combination of structured and unstructured data will be an important advantage for customers.