Machine learning (ML) today is very different from the machine learning of the past. New computing technologies have changed how it works in practice, though not the principle it rests on: learning from patterns in data.
Although many ML algorithms have been around for a long time, the ability to run complex mathematical calculations over big data, and to deliver faster and more accurate results, is a recent development.
The past couple of years have been good for the freedom of information, as tech giants like Microsoft, Google, Amazon, Facebook, and even Baidu open-sourced several of their ML frameworks.
Choosing the right tools matters for developers who want to build effective algorithms that tap into the power of ML. We’ve gathered some of the best machine learning tools and resources of this year to help you integrate ML into everyday tasks.
12. Shogun
Dimensionality reduction with Shogun
Plus Point: Developed with bioinformatics applications in mind, and supports the use of pre-calculated kernels.
Shogun is an open source machine learning library written in C++. It provides a variety of data structures and algorithms for ML problems. It mainly focuses on kernel machines like support vector machines for classification and regression problems.
It supports dozens of algorithms, including hidden Markov models, k-nearest neighbors, support vector machines, and dimensionality reduction algorithms. It also provides interfaces for Python, Java, C#, Ruby, Octave, Lua, R, and Matlab.
Shogun can process huge datasets containing 10 million samples. A vibrant user community worldwide is currently using this framework as a base for education and research, and contributing to the package.
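To make the idea of a pre-calculated kernel concrete, here is a minimal sketch in plain Python. It is not Shogun's API: the function names `gaussian_kernel` and `precompute_kernel_matrix` and the sample points are assumptions made for illustration. The point is that a kernel machine only needs the pairwise similarity matrix, so that matrix can be computed once up front and reused.

```python
import math


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * width^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def precompute_kernel_matrix(samples, kernel):
    """Evaluate the kernel once for every pair of samples."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = kernel(samples[i], samples[j])
            matrix[i][j] = value
            matrix[j][i] = value  # kernel matrices are symmetric
    return matrix


# Hypothetical toy data: three points in the plane.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = precompute_kernel_matrix(samples, gaussian_kernel)
```

A kernel-based learner can then be handed `K` directly instead of the raw samples, which is useful when the similarity measure is expensive or comes from a domain-specific tool.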
11. Theano
Sample code for Theano
Plus Point: Efficient symbolic differentiation, extensive unit-testing, and self-verification.
Initially released in 2007, Theano is an open source Python library that helps you define, optimize, and evaluate mathematical expressions efficiently. It is primarily designed for the large neural network computations used in deep learning.
It takes your structures and transforms them into efficient code that uses NumPy and efficient native libraries. It then compiles the code and runs it on either CPU or GPU architectures.
Theano applies a number of clever graph optimizations (such as merging duplicate subgraphs, canonicalizing arithmetic, and reducing the memory footprint) to extract the maximum performance from your hardware.
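Symbolic differentiation means the derivative is built as a new expression rather than approximated numerically. The sketch below illustrates the idea in plain Python; it is not Theano code, and the node classes (`Var`, `Const`, `Add`, `Mul`) and the `diff` and `evaluate` helpers are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def diff(expr, wrt):
    """Return a new expression tree: the derivative of expr with respect to wrt."""
    if isinstance(expr, Const):
        return Const(0.0)
    if isinstance(expr, Var):
        return Const(1.0 if expr.name == wrt else 0.0)
    if isinstance(expr, Add):  # sum rule
        return Add(diff(expr.left, wrt), diff(expr.right, wrt))
    if isinstance(expr, Mul):  # product rule
        return Add(Mul(diff(expr.left, wrt), expr.right),
                   Mul(expr.left, diff(expr.right, wrt)))
    raise TypeError(f"unsupported node: {expr!r}")


def evaluate(expr, env):
    """Compute the numeric value of an expression given variable bindings."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    raise TypeError(f"unsupported node: {expr!r}")


x = Var("x")
f = Add(Mul(x, x), Mul(Const(3.0), x))  # f(x) = x^2 + 3x
df = diff(f, "x")                       # expression tree equivalent to 2x + 3
slope_at_2 = evaluate(df, {"x": 2.0})   # 7.0
```

The derivative tree produced here is unsimplified (it still contains terms like `0 * x`), which is exactly the kind of redundancy a graph optimizer would clean up before generating code.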
10. Apache Mahout
Plus Point: Best option for data scientists, statisticians and mathematicians.
Apache Mahout is a distributed linear algebra framework for building scalable machine learning algorithms, focused primarily on clustering, classification, and batch-based collaborative filtering. It’s implemented on top of Apache Hadoop using the MapReduce paradigm.
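As a rough illustration of how MapReduce suits batch collaborative filtering, the sketch below counts how often two items appear together in users' histories. It runs in a single Python process and uses none of Mahout's or Hadoop's APIs; the function names and the sample histories are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user_items):
    """Map: emit ((item_a, item_b), 1) for every item pair in one user's history."""
    for item_a, item_b in combinations(sorted(set(user_items)), 2):
        yield (item_a, item_b), 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts for each item pair."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical purchase histories.
histories = {
    "alice": ["book", "lamp", "pen"],
    "bob": ["book", "pen"],
    "carol": ["lamp", "pen"],
}

emitted = [pair for items in histories.values() for pair in map_phase(items)]
cooccurrence = reduce_phase(shuffle(emitted))
```

Because each user's history is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what makes the pattern scale.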
Mahout includes matrix and vector libraries and ships with Complementary Naive Bayes and Distributed Naive Bayes classifier implementations. It also has distributed fitness function capabilities for evolutionary programming.
Several companies such as Twitter, Yahoo, LinkedIn, Foursquare and Facebook are already using this framework internally. Yahoo uses Mahout for pattern mining whereas Twitter uses it for user interest modeling.
9. Scikit-learn
Comparing different classifiers in scikit-learn on synthetic data
Plus Point: Relatively easy to use and comes with good tutorials and examples.
What started as a Google Summer of Code project is now one of the most popular machine learning libraries for Python. It features a good selection of algorithms for classification, regression, clustering, model selection, and preprocessing.
Scikit-learn uses Cython (a compiler that translates Python-like code to C) to achieve fast performance, and it works well with Python’s numerical (NumPy) and scientific (SciPy) libraries.
If you love to code in Python, Scikit-learn is probably the best option among plain machine learning frameworks. However, if you are working on a large-scale project, we recommend that you consider other tools.
8. Google ML Kit for Mobile
Plus Point: Offers the technologies that Google has long used in its own mobile experiences.
It’s a machine learning framework built for mobile developers to create more engaging and personalized apps. You can use it for image labeling, text recognition, face detection, landmark detection, and barcode scanning.
Google will soon integrate a smart reply feature that suggests text snippets based on context. If the base APIs do not cover your use cases, you can always upload your own TensorFlow Lite models.
7. Gym and Baselines by OpenAI
Four legged creature built with Gym
Plus Point: Supports teaching agents everything from walking to playing games like Pinball or Pong.
With the aim of promoting and developing safe artificial intelligence, tech billionaire Elon Musk and several co-founders started OpenAI, a non-profit AI research organization.
More than 60 full-time researchers currently work at the organization, and they frequently publish papers on AI capabilities as well as open source software tools. Gym is a toolkit for developing and comparing reinforcement learning algorithms, and Baselines is a set of reference implementations of such algorithms.
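Teaching an agent usually comes down to a loop in which the agent picks an action, the environment responds with an observation and a reward, and the episode ends at some point. The sketch below shows that loop in plain Python. It does not import Gym: `CorridorEnv` is a hypothetical toy environment, and its `reset`/`step` methods are only modeled loosely on the classic Gym interface.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right from cell 0 to reach the last cell."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (observation, reward, done, info)."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward the policy collected."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward


def always_right(observation):
    return 1


rng = random.Random(0)


def random_policy(observation):
    return rng.choice([0, 1])


reward_for_always_right = run_episode(CorridorEnv(), always_right)
reward_for_random = run_episode(CorridorEnv(), random_policy)
```

A learning algorithm would sit where the fixed policies are, adjusting its choices based on the rewards it observes across many episodes.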
6. Apple’s Core ML
Plus Point: Optimized for on-device performance.
Core ML offers a simple way to integrate trained machine learning models into macOS, iOS, and tvOS apps. All you need to do is drop the .mlmodel file into your project, and Xcode will automatically create an Objective-C or Swift wrapper class, making it really easy to use the model.
It supports natural language processing, image classification, word tagging, sentence classification, object tracking, and barcode detection, and it works with GameplayKit for evaluating learned decision trees.
Since the framework is built on top of low-level technologies like Metal and Accelerate, it can leverage both CPUs and GPUs to provide maximum performance.
Moreover, running models strictly on the device ensures privacy and guarantees that the application remains functional when you are not connected to the internet.
5. Keras
Plus Point: Sequential models require only a single line of code per layer.
Released in 2015, Keras is an open source neural network library that focuses on being modular, user-friendly, and extensible. In 2017, Google began supporting Keras in TensorFlow’s core library.
It has multiple pre-defined layers, arranged into categories: core, locally connected, embedding, normalization, noise, convolutional, pooling and advanced activations. There is also an API for writing layers.
Each layer performs a specific task. Usually, layers pass most of the compute-intensive operations to a backend such as Microsoft Cognitive Toolkit or TensorFlow.
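The "one line per layer" idea is easy to picture as a container that applies layers in order. The following is a toy sketch in plain Python, not Keras itself: the `Dense` and `Sequential` classes here are simplified stand-ins written for this example, with hand-picked weights and no training or backend.

```python
class Dense:
    """Toy fully connected layer: output = activation(weights * inputs + biases)."""

    def __init__(self, weights, biases, activation=None):
        self.weights = weights  # one row of weights per output unit
        self.biases = biases
        self.activation = activation or (lambda v: v)

    def __call__(self, inputs):
        return [
            self.activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(self.weights, self.biases)
        ]


class Sequential:
    """Toy container that feeds each layer's output into the next layer."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def predict(self, inputs):
        for layer in self.layers:
            inputs = layer(inputs)
        return inputs


def relu(v):
    return max(0.0, v)


model = Sequential()
model.add(Dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], activation=relu))  # 2 inputs -> 2 units
model.add(Dense([[2.0, 1.0]], [0.5]))                                     # 2 units -> 1 output
output = model.predict([3.0, 1.0])  # [7.5]
```

In a real library the arithmetic inside each layer is what gets delegated to the backend; the stacking logic stays this simple.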
Along with standard neural networks, Keras also supports recurrent and convolutional networks. It provides seven common deep learning sample datasets and ten well-known models pre-trained on ImageNet.
4. Apache MXNet
Plus Point: Scales to multiple GPUs across multiple hosts with 85% efficiency.
Adopted by Amazon as its primary deep learning framework on AWS, MXNet can scale almost linearly across several GPUs and servers. It is built to be distributed on dynamic cloud infrastructure via a distributed parameter server.
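A parameter server keeps the shared model weights in one place while workers compute gradients on their own slices of the data. Here is a minimal, synchronous, single-process sketch of that pattern in plain Python. It is not MXNet's implementation or API: the `ParameterServer` class, the `worker_gradient` function, and the toy data are all assumptions made for illustration.

```python
class ParameterServer:
    """Holds the shared weights and applies the average of the workers' gradients."""

    def __init__(self, weights, learning_rate):
        self.weights = list(weights)
        self.learning_rate = learning_rate

    def pull(self):
        """Workers fetch a copy of the current weights."""
        return list(self.weights)

    def push(self, gradients_from_workers):
        """Average the gradients sent by all workers, then take one descent step."""
        count = len(gradients_from_workers)
        for i in range(len(self.weights)):
            mean_grad = sum(g[i] for g in gradients_from_workers) / count
            self.weights[i] -= self.learning_rate * mean_grad


def worker_gradient(weights, shard):
    """Gradient of mean squared error for the model y = w0 + w1 * x on one shard."""
    w0, w1 = weights
    grad0 = grad1 = 0.0
    for x, y in shard:
        error = (w0 + w1 * x) - y
        grad0 += 2.0 * error
        grad1 += 2.0 * error * x
    return [grad0 / len(shard), grad1 / len(shard)]


# Points on the line y = 1 + 2x, split into two shards as if held by two workers.
shards = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
server = ParameterServer([0.0, 0.0], learning_rate=0.05)

for _ in range(2000):
    current = server.pull()
    server.push([worker_gradient(current, shard) for shard in shards])

w0, w1 = server.weights  # approaches 1.0 and 2.0
```

In a distributed setup the list comprehension inside the loop would run on separate machines or GPUs, and only weights and gradients would travel over the network.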
At present, this open-source deep learning framework is supported by Microsoft, Baidu, Intel, and several research institutions like the University of Washington and MIT.
3. Microsoft Cognitive Toolkit (CNTK)
Plus Point: Handles several neural network tasks faster, and has an extensive set of APIs.
The Microsoft Cognitive Toolkit uses directed graphs to describe neural networks as a series of computational steps. This open source framework is built with sophisticated algorithms (its core libraries are written in C++) and production-grade readers to work reliably with large-scale datasets.
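To see what describing a network as a directed graph of computational steps looks like, consider the plain-Python sketch below. It does not use CNTK: the dictionary layout, the operation names, and the `evaluate` helper are assumptions made for this example. Each node names an operation and its inputs, and the graph is evaluated in an order that guarantees inputs are ready before they are used.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to (operation, list of input node names).
graph = {
    "x": ("input", []),
    "w": ("input", []),
    "b": ("input", []),
    "wx": ("times", ["w", "x"]),
    "z": ("plus", ["wx", "b"]),
    "y": ("sigmoid", ["z"]),
}

OPERATIONS = {
    "times": lambda a, b: a * b,
    "plus": lambda a, b: a + b,
    "sigmoid": lambda a: 1.0 / (1.0 + math.exp(-a)),
}


def evaluate(graph, feeds):
    """Compute every node, visiting each one only after all of its inputs."""
    dependencies = {name: set(inputs) for name, (_op, inputs) in graph.items()}
    values = dict(feeds)
    for name in TopologicalSorter(dependencies).static_order():
        op, inputs = graph[name]
        if op != "input":
            values[name] = OPERATIONS[op](*(values[i] for i in inputs))
    return values


values = evaluate(graph, {"x": 2.0, "w": 0.5, "b": -1.0})
prediction = values["y"]  # sigmoid(0.5 * 2.0 - 1.0) = sigmoid(0.0) = 0.5
```

Because the whole computation is available as data before anything runs, a framework can analyze it, parallelize independent branches, and place different nodes on different devices.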
It allows developers to realize and combine well-known model types, including recurrent networks, convolutional neural networks, and feed-forward deep neural networks. CNTK modules can handle sparse data or multi-dimensional dense data from C++, Python, and BrainScript.
Moreover, the framework can implement stochastic gradient descent learning in parallel across multiple GPUs and machines, and can fit even massive-scale models into GPU memory.
2. PyTorch
Dynamically created graph with PyTorch
Plus Point: Perhaps the best option for projects that need to be up and running in a short time.
PyTorch is an open source ML library for Python based on Torch, with Caffe2 merged into it in 2018. It’s primarily developed by Facebook and mostly used for applications like natural language processing.
The two main features it provides are tensor computation with strong GPU acceleration and deep neural networks designed for maximum accuracy and flexibility.
It’s not a Python binding into a monolithic C++ framework. PyTorch is developed to be integrated into Python so it can be used with popular packages and libraries like Numba and Cython.
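A dynamically created graph is one that is recorded while ordinary Python code runs, so loops and conditionals shape it on the fly. The sketch below shows the idea with a tiny reverse-mode differentiation class in plain Python. It is not PyTorch code: the `Value` class and the `power` helper are hypothetical, and they handle only scalar addition and multiplication.

```python
class Value:
    """Scalar that records how it was computed so gradients can flow backward."""

    def __init__(self, data, parents=(), backward_rule=None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_rule = backward_rule  # maps this node's grad to its parents' grads

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), lambda g: (g, g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     lambda g: (g * other.data, g * self.data))

    def backward(self):
        """Order the recorded graph topologically, then apply the chain rule in reverse."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_rule is not None:
                parent_grads = node._backward_rule(node.grad)
                for parent, parent_grad in zip(node._parents, parent_grads):
                    parent.grad += parent_grad


def power(base, exponent):
    """A plain Python loop decides how many multiply nodes the graph gets."""
    result = Value(1.0)
    for _ in range(exponent):
        result = result * base
    return result


x = Value(3.0)
y = power(x, 3) + x  # y = x^3 + x, graph built as the code runs
y.backward()         # x.grad is now dy/dx = 3 * x^2 + 1 = 28.0
```

Changing the `exponent` argument changes the shape of the graph on the next run with no recompilation step, which is what makes this style convenient for models whose structure depends on the input.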
1. TensorFlow
Image credit: Google
Plus Point: Provides abstraction while taking care of the details behind the scenes.
Developed by the Google Brain team, TensorFlow is probably the best open source library for complex computation and massive-scale machine learning. It uses Python to provide a handy front-end API for creating applications with the framework, and implements all matrix multiplications in C++ to make computations fast.
TensorFlow is capable of training and running deep neural networks for simulations based on partial differential equations, natural language processing, word embedding, image recognition, handwritten digit classification, and recurrent neural networks.
If you need to debug and gain introspection into TensorFlow applications, its ‘eager execution’ mode lets you evaluate, inspect, and modify each graph operation individually, rather than building the whole graph as one object and evaluating it all at once.
There is also a TensorBoard visualization suite that gives you an interactive overview of how graphs run. And of course, all these benefits come with the backing of Google, which has made several valuable offerings around TensorFlow over the last couple of years.