Apache MADlib: Big Data Machine Learning in SQL

  • Open source, commercially friendly Apache license
  • For PostgreSQL and Greenplum Database®
  • Powerful machine learning, graph, statistics and analytics for data scientists


Getting Started with Apache MADlib using Jupyter Notebooks

We have created a library of Jupyter Notebooks to help you get started quickly with MADlib. It covers many of the algorithms most commonly used by data scientists.

 

MADlib 1.16 Release

On July 8, 2019, MADlib completed its sixth release as an Apache Software Foundation Top Level Project.

New features include:

  • Deep learning - Early-stage support for Keras with a TensorFlow backend and GPU acceleration, focused on image classification use cases.

  • Deep learning utilities - Load model architectures and weights, load images in parallel from NumPy arrays or the file system, and preprocess images for gradient descent optimization algorithms.

  • Greenplum 6 support.

  • PostgreSQL 11 support.

Improvements:

  • K-nearest neighbors - Improve performance with kd-tree approximate method.

  • Association rules - Set the default maximum itemset size to 10 to reduce runtime.

You are invited to download the 1.16 release and review the release notes. For more details about the new deep learning feature, please refer to the Apache MADlib deep learning notes and the Jupyter notebook examples.

 

MADlib 1.15.1 Release

On Oct 15, 2018, MADlib completed its fifth release as an Apache Software Foundation Top Level Project.

New features include: Ubuntu 16.04 support.

Improvements:

  • Elastic net - Support grouping by non-numeric columns.

  • K-nearest neighbors - Accept expressions for points.

  • Vec2cols - Allow arrays of different lengths.

You are invited to download the 1.15.1 release and review the release notes.

 

MADlib 1.15 Release

On Aug 10, 2018, MADlib completed its fourth release as an Apache Software Foundation Top Level Project.

New features include: Utilities - Columns to vector, vector to columns, drop columns.

Improvements:

  • Multilayer perceptron - Added momentum and Nesterov's accelerated gradient methods to gradient updates.

  • Statistics - Added grouping support to correlation and covariance.

  • Decision tree/random forest - Added impurity variable importance.

  • Decision tree/random forest - Added new helper function to report variable importance values in a more readable way.

  • Install - Refactored and updated the madpack installation and upgrade tool.
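
As a rough illustration of the two gradient-update variants named in the multilayer perceptron item above, here is a minimal Python sketch on a one-dimensional quadratic loss. It is not MADlib's implementation: the function names, the toy loss, the learning rate, and the momentum coefficient are all assumptions chosen for illustration.

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

def momentum_step(w, v, lr=0.1, mu=0.9):
    # Classical momentum: the gradient is evaluated at the current point w.
    v = mu * v - lr * grad(w)
    return w + v, v

def nesterov_step(w, v, lr=0.1, mu=0.9):
    # Nesterov's accelerated gradient: the gradient is evaluated at the
    # look-ahead point w + mu * v instead of at w.
    v = mu * v - lr * grad(w + mu * v)
    return w + v, v

def minimize(step, w=0.0, iters=300):
    # Run a fixed number of updates, starting with zero velocity.
    v = 0.0
    for _ in range(iters):
        w, v = step(w, v)
    return w
```

Calling `minimize(momentum_step)` or `minimize(nesterov_step)` should bring `w` close to 3. The only difference between the two rules is where the gradient is evaluated.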

You are invited to download the 1.15 release and review the release notes.

 

MADlib 1.14 Release

On May 1, 2018, MADlib completed its third release as an Apache Software Foundation Top Level Project.

New features include: Balanced datasets, personalized PageRank, mini-batch optimizer for multilayer perceptron neural networks (and associated pre-processor function), PostgreSQL 10.2 support.
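
For readers unfamiliar with personalized PageRank, the following Python sketch shows the general idea: random jumps return to a chosen set of source vertices instead of to any vertex uniformly. This is an illustrative power iteration under assumed defaults (damping factor 0.85, a fixed iteration count, dangling vertices returning their mass to the sources), not MADlib's SQL interface or implementation, and the function name is hypothetical.

```python
def personalized_pagerank(edges, sources, damping=0.85, iters=100):
    # edges: list of (src, dst) pairs; sources: vertices that receive
    # all of the teleport (random jump) probability.
    sources = set(sources)
    nodes = sorted({v for e in edges for v in e} | sources)
    out = {v: [] for v in nodes}
    for s, d in edges:
        out[s].append(d)
    teleport = {v: (1.0 / len(sources) if v in sources else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {v: (1.0 - damping) * teleport[v] for v in nodes}
        for v in nodes:
            if out[v]:
                # Split this vertex's damped rank evenly over its out-edges.
                share = damping * rank[v] / len(out[v])
                for d in out[v]:
                    new[d] += share
            else:
                # Dangling vertex: return its damped rank to the sources.
                for u in nodes:
                    new[u] += damping * rank[v] * teleport[u]
        rank = new
    return rank
```

With `sources` set to every vertex in the graph, the same routine reduces to ordinary PageRank.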

Improvements:

  • K-nearest neighbors - Added weighted averaging/voting by distance.

  • Summary - Added more statistics, including the number of positive, negative, and zero values, and 95% confidence intervals.

  • Multilayer perceptron - Added support for one-hot encoded categorical dependent variable for classification.
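
To make the "weighted voting by distance" item above concrete, here is a small Python sketch of distance-weighted k-nearest-neighbor classification. It assumes inverse-distance weights and Euclidean distance; the function name and the `eps` guard against division by zero are illustrative choices, and MADlib's actual weighting scheme may differ.

```python
import math
from collections import defaultdict

def knn_weighted_vote(train, query, k=3, eps=1e-12):
    # train: list of (point, label) pairs; each point is a tuple of floats.
    by_distance = sorted(
        ((math.dist(p, query), label) for p, label in train),
        key=lambda t: t[0],
    )
    votes = defaultdict(float)
    for d, label in by_distance[:k]:
        # Closer neighbors contribute larger votes (inverse distance).
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)
```

In this scheme a single very close neighbor can outvote two more distant neighbors of another class, which plain majority voting would not allow.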

You are invited to download the 1.14 release and review the release notes.

 

MADlib Graduates to Apache Top Level Project

On July 19, 2017, the ASF board established Apache MADlib as a Top Level Project by unanimous vote of the directors present. Please see the associated press release from the ASF.

MADlib entered incubation in the fall of 2015 and made five releases as an incubating project. Along the way, the MADlib community has worked hard to ensure that the project is developed according to the principles of The Apache Way. We will continue to do so as a TLP, to the best of our ability.

Thank you to all who have contributed to the project so far. We look forward to more innovation in machine learning as a TLP!