

News

scikit-learn 0.13.1 is available for download. See what's new and tips on installing.

Presentations

Get in the spirit with videos from scikit-learn tutorials.

Participate

Fork the source code, join the mailing lists, report bugs to the issue tracker, or participate in the next coding sprint. Read more...

Funding

Generous funding provided by INRIA, Google and others. Read more...

Citing

If you use the software, please consider citing scikit-learn.

scikit-learn: machine learning in Python

Easy-to-use and general-purpose machine learning in Python

Scikit-learn integrates machine learning algorithms into the tightly-knit world of scientific Python, building upon numpy, scipy, and matplotlib. As a machine learning library, it provides versatile tools for data mining and data analysis in any field of science and engineering. It strives to be simple and efficient, accessible to everybody, and reusable in various contexts.
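
To make this concrete, here is a minimal sketch (not part of the original page) showing that scikit-learn estimators operate directly on plain numpy arrays; the tiny toy dataset below is invented purely for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Estimators consume ordinary numpy arrays: one row per sample, one column per feature.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0.1, 1.9, 4.1, 6.0])

    model = LinearRegression()
    model.fit(X, y)

    # The learned parameters are exposed as numpy arrays/scalars as well.
    print(model.coef_, model.intercept_)   # roughly 2.0 and 0.0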

Supervised learning
Support vector machines, linear models, naive Bayes, Gaussian processes...
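
As an example of the supervised API, the following sketch fits a support vector classifier on the bundled iris dataset; the default parameters are kept purely for illustration and are not a recommendation.

    from sklearn import datasets, svm

    # Load a small bundled dataset: 150 iris flowers, 4 measurements each, 3 species.
    iris = datasets.load_iris()

    # Fit a support vector classifier on the labelled data, then predict.
    clf = svm.SVC()
    clf.fit(iris.data, iris.target)
    print(clf.predict(iris.data[:2]))   # predicted species for the first two samples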

Unsupervised learning
Clustering, Gaussian mixture models, manifold learning, matrix factorization, covariance...
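
For the unsupervised side, a minimal clustering sketch: K-means on the same iris measurements, ignoring the species labels entirely (three clusters is an arbitrary illustrative choice).

    from sklearn import datasets
    from sklearn.cluster import KMeans

    X = datasets.load_iris().data

    # Group the observations into three clusters using only the features.
    km = KMeans(n_clusters=3, random_state=0)
    km.fit(X)

    print(km.labels_[:10])       # cluster index assigned to the first ten samples
    print(km.cluster_centers_)   # one centroid per cluster, in feature space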

And much more
Model selection, datasets, feature extraction... See below.
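
And a small model-selection sketch: five-fold cross-validation of a classifier, assuming the 0.13-era sklearn.cross_validation module (later releases moved this functionality to sklearn.model_selection).

    from sklearn import datasets, svm
    from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in later releases

    iris = datasets.load_iris()
    clf = svm.SVC(kernel='linear')   # the kernel choice here is arbitrary

    # Fit on four folds, score on the held-out fold, five times.
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    print(scores)          # one accuracy value per fold
    print(scores.mean())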

License: Open source, commercially usable under the 3-clause BSD license.

Documentation for scikit-learn version 0.13.1. For other versions and printable format, see Documentation resources.

User Guide

  • 1. Installing scikit-learn
    • 1.1. Installing an official release
      • 1.1.1. Getting the dependencies
        • 1.1.1.1. Easy install
        • 1.1.1.2. From source package
      • 1.1.2. Windows installer
      • 1.1.3. Building on Windows
    • 1.2. Third party distributions of scikit-learn
      • 1.2.1. Debian and derivatives (Ubuntu)
      • 1.2.2. Python(x,y)
      • 1.2.3. Enthought Python distribution
      • 1.2.4. Macports
      • 1.2.5. Archlinux
      • 1.2.6. NetBSD
    • 1.3. Bleeding Edge
    • 1.4. Testing
  • 2. Tutorials: From the bottom up with scikit-learn
    • 2.1. An introduction to machine learning with scikit-learn
      • 2.1.1. Machine learning: the problem setting
      • 2.1.2. Loading an example dataset
      • 2.1.3. Learning and predicting
      • 2.1.4. Model persistence
    • 2.2. A tutorial on statistical-learning for scientific data processing
      • 2.2.1. Statistical learning: the setting and the estimator object in scikit-learn
        • 2.2.1.1. Datasets
        • 2.2.1.2. Estimator objects
      • 2.2.2. Supervised learning: predicting an output variable from high-dimensional observations
        • 2.2.2.1. Nearest neighbor and the curse of dimensionality
          • 2.2.2.1.1. k-Nearest neighbors classifier
          • 2.2.2.1.2. The curse of dimensionality
        • 2.2.2.2. Linear model: from regression to sparsity
          • 2.2.2.2.1. Linear regression
          • 2.2.2.2.2. Shrinkage
          • 2.2.2.2.3. Sparsity
          • 2.2.2.2.4. Classification
        • 2.2.2.3. Support vector machines (SVMs)
          • 2.2.2.3.1. Linear SVMs
          • 2.2.2.3.2. Using kernels
      • 2.2.3. Model selection: choosing estimators and their parameters
        • 2.2.3.1. Score, and cross-validated scores
        • 2.2.3.2. Cross-validation generators
        • 2.2.3.3. Grid-search and cross-validated estimators
          • 2.2.3.3.1. Grid-search
          • 2.2.3.3.2. Cross-validated estimators
      • 2.2.4. Unsupervised learning: seeking representations of the data
        • 2.2.4.1. Clustering: grouping observations together
          • 2.2.4.1.1. K-means clustering
          • 2.2.4.1.2. Hierarchical agglomerative clustering: Ward
            • 2.2.4.1.2.1. Connectivity-constrained clustering
            • 2.2.4.1.2.2. Feature agglomeration
        • 2.2.4.2. Decompositions: from a signal to components and loadings
          • 2.2.4.2.1. Principal component analysis: PCA
          • 2.2.4.2.2. Independent Component Analysis: ICA
      • 2.2.5. Putting it all together
        • 2.2.5.1. Pipelining
        • 2.2.5.2. Face recognition with eigenfaces
        • 2.2.5.3. Open problem: Stock Market Structure
      • 2.2.6. Finding help
        • 2.2.6.1. The project mailing list
        • 2.2.6.2. Q&A communities with Machine Learning practitioners
  • 3. Supervised learning
    • 3.1. Generalized Linear Models
      • 3.1.1. Ordinary Least Squares
        • 3.1.1.1. Ordinary Least Squares Complexity
      • 3.1.2. Ridge Regression
        • 3.1.2.1. Ridge Complexity
        • 3.1.2.2. Setting the regularization parameter: generalized Cross-Validation
      • 3.1.3. Lasso
        • 3.1.3.1. Setting regularization parameter
          • 3.1.3.1.1. Using cross-validation
          • 3.1.3.1.2. Information-criteria based model selection
      • 3.1.4. Elastic Net
      • 3.1.5. Multi-task Lasso
      • 3.1.6. Least Angle Regression
      • 3.1.7. LARS Lasso
        • 3.1.7.1. Mathematical formulation
      • 3.1.8. Orthogonal Matching Pursuit (OMP)
      • 3.1.9. Bayesian Regression
        • 3.1.9.1. Bayesian Ridge Regression
        • 3.1.9.2. Automatic Relevance Determination - ARD
      • 3.1.10. Logistic regression
      • 3.1.11. Stochastic Gradient Descent - SGD
      • 3.1.12. Perceptron
      • 3.1.13. Passive Aggressive Algorithms
    • 3.2. Support Vector Machines
      • 3.2.1. Classification
        • 3.2.1.1. Multi-class classification
        • 3.2.1.2. Unbalanced problems
      • 3.2.2. Regression
      • 3.2.3. Density estimation, novelty detection
      • 3.2.4. Complexity
      • 3.2.5. Tips on Practical Use
      • 3.2.6. Kernel functions
        • 3.2.6.1. Custom Kernels
          • 3.2.6.1.1. Using Python functions as kernels
          • 3.2.6.1.2. Using the Gram matrix
          • 3.2.6.1.3. Parameters of the RBF Kernel
      • 3.2.7. Mathematical formulation
        • 3.2.7.1. SVC
        • 3.2.7.2. NuSVC
      • 3.2.8. Implementation details
    • 3.3. Stochastic Gradient Descent
      • 3.3.1. Classification
      • 3.3.2. Regression
      • 3.3.3. Stochastic Gradient Descent for sparse data
      • 3.3.4. Complexity
      • 3.3.5. Tips on Practical Use
      • 3.3.6. Mathematical formulation
        • 3.3.6.1. SGD
      • 3.3.7. Implementation details
    • 3.4. Nearest Neighbors
      • 3.4.1. Unsupervised Nearest Neighbors
      • 3.4.2. Nearest Neighbors Classification
      • 3.4.3. Nearest Neighbors Regression
      • 3.4.4. Nearest Neighbor Algorithms
        • 3.4.4.1. Brute Force
        • 3.4.4.2. K-D Tree
        • 3.4.4.3. Ball Tree
        • 3.4.4.4. Choice of Nearest Neighbors Algorithm
        • 3.4.4.5. Effect of leaf_size
      • 3.4.5. Nearest Centroid Classifier
        • 3.4.5.1. Nearest Shrunken Centroid
    • 3.5. Gaussian Processes
      • 3.5.1. Examples
        • 3.5.1.1. An introductory regression example
        • 3.5.1.2. Fitting Noisy Data
      • 3.5.2. Mathematical formulation
        • 3.5.2.1. The initial assumption
        • 3.5.2.2. The best linear unbiased prediction (BLUP)
        • 3.5.2.3. The empirical best linear unbiased predictor (EBLUP)
      • 3.5.3. Correlation Models
      • 3.5.4. Regression Models
      • 3.5.5. Implementation details
    • 3.6. Partial Least Squares
    • 3.7. Naive Bayes
      • 3.7.1. Gaussian Naive Bayes
      • 3.7.2. Multinomial Naive Bayes
      • 3.7.3. Bernoulli Naive Bayes
    • 3.8. Decision Trees
      • 3.8.1. Classification
      • 3.8.2. Regression
      • 3.8.3. Multi-output problems
      • 3.8.4. Complexity
      • 3.8.5. Tips on practical use
      • 3.8.6. Tree algorithms: ID3, C4.5, C5.0 and CART
      • 3.8.7. Mathematical formulation
        • 3.8.7.1. Classification criteria
        • 3.8.7.2. Regression criteria
    • 3.9. Ensemble methods
      • 3.9.1. Forests of randomized trees
        • 3.9.1.1. Random Forests
        • 3.9.1.2. Extremely Randomized Trees
        • 3.9.1.3. Parameters
        • 3.9.1.4. Parallelization
        • 3.9.1.5. Feature importance evaluation
        • 3.9.1.6. Totally Random Trees Embedding
      • 3.9.2. Gradient Tree Boosting
        • 3.9.2.1. Classification
        • 3.9.2.2. Regression
        • 3.9.2.3. Mathematical formulation
          • 3.9.2.3.1. Loss Functions
        • 3.9.2.4. Regularization
          • 3.9.2.4.1. Shrinkage
          • 3.9.2.4.2. Subsampling
        • 3.9.2.5. Interpretation
          • 3.9.2.5.1. Feature importance
          • 3.9.2.5.2. Partial dependence
    • 3.10. Multiclass and multilabel algorithms
      • 3.10.1. One-Vs-The-Rest
        • 3.10.1.1. Multilabel learning with OvR
      • 3.10.2. One-Vs-One
      • 3.10.3. Error-Correcting Output-Codes
    • 3.11. Feature selection
      • 3.11.1. Univariate feature selection
      • 3.11.2. Recursive feature elimination
      • 3.11.3. L1-based feature selection
        • 3.11.3.1. Selecting non-zero coefficients
        • 3.11.3.2. Randomized sparse models
      • 3.11.4. Tree-based feature selection
    • 3.12. Semi-Supervised
      • 3.12.1. Label Propagation
    • 3.13. Linear and Quadratic Discriminant Analysis
      • 3.13.1. Dimensionality Reduction using LDA
      • 3.13.2. Mathematical Idea
    • 3.14. Isotonic regression
  • 4. Unsupervised learning
    • 4.1. Gaussian mixture models
      • 4.1.1. GMM classifier
        • 4.1.1.1. Pros and cons of class GMM: expectation-maximization inference
          • 4.1.1.1.1. Pros
          • 4.1.1.1.2. Cons
        • 4.1.1.2. Selecting the number of components in a classical GMM
        • 4.1.1.3. Estimation algorithm: expectation-maximization
      • 4.1.2. VBGMM classifier: variational Gaussian mixtures
        • 4.1.2.1. Pros and cons of class VBGMM: variational inference
          • 4.1.2.1.1. Pros
          • 4.1.2.1.2. Cons
        • 4.1.2.2. Estimation algorithm: variational inference
      • 4.1.3. DPGMM classifier: Infinite Gaussian mixtures
        • 4.1.3.1. Pros and cons of class DPGMM: Dirichlet process mixture model
          • 4.1.3.1.1. Pros
          • 4.1.3.1.2. Cons
        • 4.1.3.2. The Dirichlet Process
    • 4.2. Manifold learning
      • 4.2.1. Introduction
      • 4.2.2. Isomap
        • 4.2.2.1. Complexity
      • 4.2.3. Locally Linear Embedding
        • 4.2.3.1. Complexity
      • 4.2.4. Modified Locally Linear Embedding
        • 4.2.4.1. Complexity
      • 4.2.5. Hessian Eigenmapping
        • 4.2.5.1. Complexity
      • 4.2.6. Spectral Embedding
        • 4.2.6.1. Complexity
      • 4.2.7. Local Tangent Space Alignment
        • 4.2.7.1. Complexity
      • 4.2.8. Multi-dimensional Scaling (MDS)
        • 4.2.8.1. Metric MDS
        • 4.2.8.2. Nonmetric MDS
      • 4.2.9. Tips on practical use
    • 4.3. Clustering
      • 4.3.1. Overview of clustering methods
      • 4.3.2. K-means
        • 4.3.2.1. Mini Batch K-Means
      • 4.3.3. Affinity Propagation
      • 4.3.4. Mean Shift
      • 4.3.5. Spectral clustering
        • 4.3.5.1. Different label assignment strategies
      • 4.3.6. Hierarchical clustering