Scikit-learn 0.20.0 is here!

Yesterday, the Scikit-learn community released version 0.20.0 of Scikit-learn, a popular machine learning library for Python. The release brings new features and enhancements to the library.

Scikit-learn is one of the most popular open source machine learning libraries for Python. It provides algorithms for machine learning tasks such as classification, regression, dimensionality reduction, and clustering. It also offers modules for extracting features, processing data, and evaluating models.

Major features in Scikit-learn 0.20.0

New Features

There’s a new impute module in Scikit Learn 0.20.0 that offers estimators for learning despite missing data.
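As a minimal sketch of the impute module, the snippet below fills missing values with SimpleImputer, one of the estimators in sklearn.impute. The toy array and the choice of the "mean" strategy are assumptions made for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data with two missing entries (illustrative values only).
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Replace each missing value with the mean of its column.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

The fitted imputer can then be reused on new data with imputer.transform, so the same column means are applied at prediction time.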

String or pandas Categorical columns can now be encoded with OneHotEncoder or OrdinalEncoder.
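The following sketch shows what encoding string columns might look like in practice. The colour and size values are made up for the example, and handle_unknown="ignore" is just one possible setting.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Two string-valued columns (hypothetical data).
X = np.array([["red", "small"],
              ["green", "large"],
              ["red", "large"]], dtype=object)

# One binary column per category; categories are sorted within each column.
onehot = OneHotEncoder(handle_unknown="ignore")
X_onehot = onehot.fit_transform(X).toarray()

# One integer code per category, one output column per input column.
ordinal = OrdinalEncoder()
X_ordinal = ordinal.fit_transform(X)

print(onehot.categories_)
print(X_onehot)
print(X_ordinal)
```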

PowerTransformer and KBinsDiscretizer now join QuantileTransformer as non-linear transformations.
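To illustrate the two new transformers, here is a short sketch on skewed synthetic data. The log-normal sample, the number of bins and the "quantile" strategy are assumptions chosen for the example; PowerTransformer is used with its default settings.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, PowerTransformer

# Skewed, strictly positive synthetic feature (illustrative only).
rng = np.random.RandomState(0)
X = rng.lognormal(size=(200, 1))

# Power transform: makes the feature more Gaussian-like and standardizes it.
X_power = PowerTransformer().fit_transform(X)

# Discretization: map each value to one of four quantile-based bins (0 to 3).
binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
X_binned = binner.fit_transform(X)

print(X_power.mean(), X_power.std())
print(np.unique(X_binned))
```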

Support for sample_weight has been added to several estimators, including KMeans, BayesianRidge and KernelDensity.
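As a rough illustration of sample weighting, the sketch below fits BayesianRidge twice on synthetic data that contains one outlier: once without weights and once with the outlier given zero weight. The data, the outlier and the weights are all invented for the example.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic line y = 2x + 1 with small noise and one large outlier.
rng = np.random.RandomState(0)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + 0.1 * rng.randn(20)
y[-1] = 500.0  # artificial outlier

# Give the outlier zero weight so it does not influence the fit.
weights = np.ones(20)
weights[-1] = 0.0

unweighted = BayesianRidge().fit(X, y)
weighted = BayesianRidge().fit(X, y, sample_weight=weights)

print("slope without weights:", unweighted.coef_[0])
print("slope with weights:   ", weighted.coef_[0])
```

The weighted fit should recover a slope close to 2, while the unweighted fit is pulled toward the outlier.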

This is the first release to include the Glossary of Common Terms and API Elements, developed by Joel Nothman.

Other changes

Scikit-learn 0.20.0 also brings many changes to sklearn.cluster, sklearn.compose, sklearn.covariance, sklearn.datasets, sklearn.decomposition and other modules. Let’s have a look at them in detail.

sklearn.cluster

cluster.AgglomerativeClustering now supports single linkage clustering via linkage='single'.
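A minimal sketch of single linkage clustering follows. The six points are made up so that they form two well-separated groups.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated groups of points (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [5.0, 5.0], [5.1, 5.0], [5.2, 5.0]])

# Single linkage merges the two clusters whose closest members are nearest.
model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)

print(labels)
```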

cluster.KMeans and cluster.MiniBatchKMeans now support sample weights through a new sample_weight parameter in the fit function.
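The effect of sample_weight on KMeans can be sketched with a tiny one-dimensional example. The points and weights are assumptions; the point at 1.0 gets three times the weight of the others, so its cluster centre should move toward it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four 1-D points forming two obvious clusters (illustrative data).
X = np.array([[0.0], [1.0], [10.0], [11.0]])

# The second point counts three times as much as the others.
weights = np.array([1.0, 3.0, 1.0, 1.0])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X, sample_weight=weights)

# Each centre is the weighted mean of the points assigned to it.
centers = sorted(km.cluster_centers_.ravel())
print(centers)
```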

cluster.KMeans, cluster.MiniBatchKMeans and cluster.k_means now enforce row-major ordering when called with algorithm='full', which improves runtime.

sklearn.compose

compose.ColumnTransformer is a new class that applies different transformers to different columns of arrays or pandas DataFrames.
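Here is a small sketch of how ColumnTransformer might be used on an array with one categorical and one numeric column. The data and the step names "onehot" and "scale" are hypothetical; sparse_threshold=0 is set only to force a dense result for easy printing. With a pandas DataFrame, columns can be selected by name instead of by index.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column 0 is categorical, column 1 is numeric (illustrative data).
X = np.array([["Paris", 25.0],
              ["London", 32.0],
              ["Paris", 47.0],
              ["Berlin", 51.0]], dtype=object)

ct = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(), [0]),   # one-hot encode the city column
        ("scale", StandardScaler(), [1]),   # standardize the age column
    ],
    sparse_threshold=0,  # always return a dense array
)
Xt = ct.fit_transform(X)

print(Xt)
```

The outputs of the individual transformers are concatenated side by side, here three one-hot columns followed by one scaled column.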

compose.TransformedTargetRegressor has also been added in this version. It transforms the target y before fitting a regression model.
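As a sketch of target transformation, the example below fits a linear model on the logarithm of an exponentially growing target. The synthetic data and the choice of np.log and np.exp as the transform pair are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Target grows exponentially with X, so log(y) is linear in X.
X = np.arange(1, 21, dtype=float).reshape(-1, 1)
y = np.exp(0.3 * X.ravel() + 1.0)

# Fit on log(y); predictions are mapped back with exp automatically.
reg = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
reg.fit(X, y)
pred = reg.predict(X)

print(reg.regressor_.coef_)
```

The fitted inner model is available as reg.regressor_, and predict returns values on the original scale of y.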

sklearn.covariance

covariance.graph_lasso, covariance.GraphLasso and covariance.GraphLassoCV have been renamed to covariance.graphical_lasso, covariance.GraphicalLasso and covariance.GraphicalLassoCV, respectively. The old names will be removed in version 0.22.
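Code that used the old names only needs to switch the import. The sketch below uses the new class name on random data; the data and the alpha value are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso  # previously GraphLasso

# Random data with four features (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(100, 4)

# Estimate a sparse inverse covariance (precision) matrix.
model = GraphicalLasso(alpha=0.1)
model.fit(X)

print(model.precision_)
```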

sklearn.datasets

datasets.fetch_openml has been added to fetch datasets from OpenML, a free, open data sharing platform.

In datasets.make_blobs, you can now pass a list to the n_samples parameter to indicate the number of samples to generate per cluster.
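For example, a list of three sizes should produce three clusters of exactly those sizes. The sizes and the random seed below are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Ask for three clusters with 10, 20 and 30 samples respectively.
X, y = make_blobs(n_samples=[10, 20, 30], n_features=2, random_state=0)

# Count how many samples were generated for each cluster label.
counts = np.bincount(y)
print(counts)
```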

The filename attribute has been added to datasets that have a CSV file.

A return_X_y parameter has also been added to several dataset loaders.
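The sketch below shows what return_X_y does, using load_iris as a stand-in because it ships with the library and needs no download. Which loaders gained the parameter in this release is not listed here, so treat the choice of loader as an assumption.

```python
from sklearn.datasets import load_iris

# With return_X_y=True the loader returns (data, target) directly
# instead of a Bunch object with .data and .target attributes.
X, y = load_iris(return_X_y=True)

print(X.shape, y.shape)
```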

sklearn.decomposition

The decomposition.dict_learning functions and models now support positivity constraints, which apply to both the dictionary and the sparse code.
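A rough sketch of the positivity constraints follows, using the DictionaryLearning model with positive_code and positive_dict enabled. The random data, the number of components, the alpha value, the iteration limit and the choice of coordinate descent solvers ("cd" and "lasso_cd") are all assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Non-negative random data (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(30, 8)

dl = DictionaryLearning(
    n_components=3,
    alpha=0.1,
    max_iter=20,
    fit_algorithm="cd",               # coordinate descent while fitting
    transform_algorithm="lasso_cd",   # coordinate descent while encoding
    positive_code=True,               # constrain the sparse code to be >= 0
    positive_dict=True,               # constrain the dictionary atoms to be >= 0
    random_state=0,
)
codes = dl.fit_transform(X)
dictionary = dl.components_

print(codes.min(), dictionary.min())
```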