We welcome community contributions to MLflow. This page provides useful information about contributing to MLflow.
Table of Contents
Governance of MLflow is conducted by the Technical Steering Committee (TSC), which currently includes the following members:
- Patrick Wendell ([email protected])
- Reynold Xin ([email protected])
- Matei Zaharia ([email protected])
The founding technical charter can be found here.
The MLflow contribution process starts with filing a GitHub issue. MLflow defines four categories of issues: feature requests, bug reports, documentation fixes, and installation issues. Details about each issue type and the issue lifecycle are discussed in the MLflow Issue Policy.
MLflow committers actively triage and respond to GitHub issues. In general, we
recommend waiting for feebdack from an MLflow committer or community member before proceeding to
implement a feature or patch. This is particularly important for
significant changes,
and will typically be labeled during triage with needs design
.
After you have agreed upon an implementation strategy for your feature or patch with an MLflow committer, the next step is to introduce your changes (see developing changes) as a pull request against the MLflow Repository or as a standalone MLflow Plugin. MLflow committers actively review pull requests and are also happy to provide implementation guidance for Plugins.
Once your pull request against the MLflow Repository has been merged, your corresponding changes will be automatically included in the next MLflow release. Every change is listed in the MLflow release notes and Changelog. Congratulations, you have just contributed to MLflow! We appreciate your contribution!
In this section, we provide guidelines to consider as you develop new features and patches for MLflow.
For significant changes to MLflow, we recommend outlining a design for the feature or patch and discussing it with
an MLflow committer before investing heavily in implementation. During issue triage, we try to proactively
identify issues require design by labeling them with needs design
. This is particularly important if your
proposed implementation:
- Introduces changes or additions to the MLflow REST API
- The MLflow REST API is implemented by a variety of open source and proprietary platforms. Changes to the REST API impact all of these platforms. Accordingly, we encourage developers to thoroughly explore alternatives before attempting to introduce REST API changes.
- Introduces new user-facing MLflow APIs
- MLflow's API surface is carefully designed to generalize across a variety of common ML operations. It is important to ensure that new APIs are broadly useful to ML developers, easy to work with, and simple yet powerful.
- Adds new library dependencies to MLflow
- Makes changes to critical internal abstractions. Examples include: the Tracking Artifact Repository, the Tracking Abstract Store, and the Model Registry Abstract Store.
MLflow's users rely on specific platform and API behaviors in their daily workflows. As new versions of MLflow are developed and released, it is important to ensure that users' workflows continue to operate as expected. Accordingly, please take care to consider backwards compatibility when introducing changes to the MLflow code base. If you are unsure of the backwards compatibility implications of a particular change, feel free to ask an MLflow committer or community member for input.
MLflow Plugins enable integration of third-party modules with many of MLflow’s components, allowing you to maintain and iterate on certain features independently of the MLflow Repository. Before implementing changes to the MLflow code base, consider whether your feature might be better structured as an MLflow Plugin. MLflow Plugins are a great choice for the following types of changes:
- Supporting a new storage platform for MLflow artifacts
- Introducing a new implementation of the MLflow Tracking backend (Abstract Store) for a particular platform
- Introducing a new implementation of the Model Registry backend (Abstract Store) for a particular platform
- Automatically capturing and recording information about MLflow Runs created in specific environments
MLflow committers and community members are happy to provide assistance with the development and review of new MLflow Plugins.
Finally, MLflow maintains a list of Plugins developed by community members, which is located at https://mlflow.org/docs/latest/plugins.html#community-plugins. This is an excellent way to inform MLflow users about your exciting new Plugins. To list your plugin, simply introduce a new pull request against the corresponding docs section of the MLflow code base.
For more information about Plugins, see https://mlflow.org/docs/latest/plugins.html.
The majority of the MLflow codebase is developed in Python. This includes the CLI, Tracking Server, Artifact Repositories (e.g., S3 or Azure Blob Storage backends), and of course the Python fluent, tracking, and model APIs.
First, ensure that your name and email are configured in git so that you can sign your work when committing code changes and opening pull requests:
git config --global user.name "Your Name"
git config --global user.email [email protected]
For convenience, we provide a pre-commit git hook that validates that commits are signed-off. Enable it by running:
git config core.hooksPath hooks
Then, install the Python MLflow package from source - this is required for developing & testing changes across all languages and APIs. We recommend installing MLflow in its own conda environment by running the following from your checkout of MLflow:
conda create --name mlflow-dev-env python=3.6
source activate mlflow-dev-env
pip install -r dev-requirements.txt
pip install -r test-requirements.txt
pip install -e . # installs mlflow from current checkout
You may need to run conda install cmake
for the test requirements to properly install, as onnx
needs cmake
.
Ensure Docker is installed.
npm
is required to run the Javascript dev server and the tracking UI.
You can verify that npm
is on the PATH by running npm -v
, and
install npm if needed.
If contributing to MLflow's R APIs, install R. For changes to R
documentation, also install pandoc 2.2.1 or above,
verifying the version of your installation via pandoc --version
. If using Mac OSX, note that
the homebrew installation of pandoc may be out of date - you can find newer pandoc versions at
https://github.com/jgm/pandoc/releases.
If contributing to MLflow's Java APIs or modifying Java documentation, install Java and Apache Maven.
Before running the Javascript dev server or building a distributable wheel, install Javascript dependencies via:
cd mlflow/server/js
npm install
cd - # return to root repository directory
If modifying dependencies in mlflow/server/js/package.json
, run npm update
within
mlflow/server/js
to install the updated dependencies.
Certain MLflow modules are implemented in Java, under the mlflow/java/
directory.
These are the Java Tracking API client (mlflow/java/client
) and the Model Scoring Server
for Java-based models like MLeap (mlflow/java/scoring
).
Other Java functionality (like artifact storage) depends on the Python package, so first install the Python package in a conda environment as described above. Install the Java 8 JDK (or above), and download and install Maven. You can then build and run tests via:
cd mlflow/java
mvn compile test
If opening a PR that makes API changes, please regenerate API documentation as described in Writing Docs and commit the updated docs to your PR branch.
The mlflow/R/mlflow
directory contains R wrappers for the Projects, Tracking and Models
components. These wrappers depend on the Python package, so first install
the Python package in a conda environment:
# Note that we don't pass the -e flag to pip, as the R tests attempt to run the MLflow UI
# via the CLI, which will not work if we run against the development tracking server
pip install .
Install R, then run the following to install dependencies for building MLflow locally:
cd mlflow/R/mlflow
NOT_CRAN=true Rscript -e 'install.packages("devtools", repos = "https://cloud.r-project.org")'
NOT_CRAN=true Rscript -e 'devtools::install_deps(dependencies = TRUE)'
Build the R client via:
R CMD build .
Run tests:
R CMD check --no-build-vignettes --no-manual --no-tests mlflow*tar.gz
cd tests
NOT_CRAN=true LINTR_COMMENT_BOT=false Rscript ../.travis.R
cd -
Run linter:
Rscript -e 'lintr::lint_package()'
If opening a PR that makes API changes, please regenerate API documentation as described in Writing Docs and commit the updated docs to your PR branch.
When developing, you can make Python changes available in R by running (from mlflow/R/mlflow):
Rscript -e 'reticulate::conda_install("r-mlflow", "../../../.", pip = TRUE)'
Please also follow the recommendations from the Advanced R - Style Guide regarding naming and styling.
Verify that the unit tests & linter pass before submitting a pull request by running:
./lint.sh
./travis/run-small-python-tests.sh
# Optionally, run large tests as well. Travis will run large tests on your pull request once
# small tests pass. Note: models and model deployment tests are considered "large" tests. If
# making changes to these components, we recommend running the relevant tests (e.g. tests under
# tests/keras for changes to Keras model support) locally before submitting a pull request.
./travis/run-large-python-tests.sh
Python tests are split into "small" & "large" categories, with new tests falling into the "small" category by default. Tests that take 10 or more seconds to run should be marked as large tests via the @pytest.mark.large annotation. Dependencies for small and large tests can be added to travis/small-requirements.txt and travis/large-requirements.txt, respectively.
We use pytest to run Python tests.
You can run tests for one or more test directories or files via
pytest [--large] [file_or_dir] ... [file_or_dir]
, where specifying --large
tells pytest to
run tests annotated with @pytest.mark.large. For example, to run all pyfunc tests
(including large tests), you can run:
pytest tests/pyfunc --large
Note: Certain model tests are not well-isolated (can result in OOMs when run in the same Python
process), so simply invoking pytest
or pytest tests
may not work. If you'd like to
run multiple model tests, we recommend doing so via separate pytest
invocations, e.g.
pytest --verbose tests/sklearn --large && pytest --verbose tests/tensorflow --large
Note also that some tests do not run as part of PR builds on Travis. In particular, PR builds exclude:
- Tests marked with @pytest.mark.requires_ssh. These tests require that passwordless SSH access to localhost be enabled, and can be run via
pytest --requires-ssh
.- Tests marked with @pytest.mark.release. These tests can be run via
pytest --release
.
In addition, the tests in tests/examples
are run as part of a nightly build on Travis and will
not run on Travis jobs triggered by push requests. If your PR changes anything tested by the tests
or the tests themselves, Travis will detect this and run the nightly tests automatically with the
regular build.
If you need to retrigger Travis tests on a PR, you can push an empty commit to your branch. To create
an empty commit, you can use the --allow-empty` option, e.g.
``git commit --allow-empty -m "Trigger rebuild"
. Note that this will retrigger an entire rebuild -
it is currently not possible to retrigger individual tests.
If opening a PR that changes or adds new APIs, please update or add Python documentation as described in Writing Docs and commit the docs to your PR branch.
If your PR includes code that isn't currently covered by our tests (e.g. adding a new flavor, adding
autolog support to a flavor, etc.), you should write tests that cover your new code. MLflow currently
uses pytest==3.2.1
for testing. Your tests should be added to the relevant file under tests
, or
if there is no appropriate file, in a new file prefixed with test_
so that pytest
includes that
file for testing.
If your tests require usage of a tracking URI, the pytest fixture tracking_uri_mock is automatically set up for every tests. It sets up a mock tracking URI that will set itself up before your test runs and tear itself down after. If you want to deactivate the mock for your test, mark the test with @pytest.mark.notrackingurimock operator.
If you are adding new framework flavor support, you'll need to modify pytest
and Travis configurations so tests for your code can run properly. Generally, the files you'll have to edit are:
.travis.yml
: exclude your tests in the Windows bash scripttravis/run-small-python-tests.sh
: add your tests to the list of ignored framework teststravis/run-large-python-tests.sh
:
- Add your tests to the ignore list, where the other frameworks are ignored
- Add a pytest command for your tests along with the other framework tests (as a separate command to avoid OOM issues)
travis/large-requirements.txt
: add your framework and version to the list of requirements
You can see an example flavor PR here.
To build protobuf files, simply run generate-protos.sh
. The required protoc
version is 3.6.0
.
You can find the URL of a system-appropriate installation of protoc
at
https://github.com/protocolbuffers/protobuf/releases/tag/v3.6.0, e.g.
https://github.com/protocolbuffers/protobuf/releases/download/v3.6.0/protoc-3.6.0-osx-x86_64.zip if
you're on 64-bit Mac OSX.
Then, run the following to install protoc
:
# Update PROTOC_ZIP if on a platform other than 64-bit Mac OSX
PROTOC_ZIP=protoc-3.6.0-osx-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.6.0/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
Verify that .proto files and autogenerated code are in sync by running ./test-generate-protos.sh.
MLflow's Tracking component supports storing experiment and run data in a SQL backend. To make changes to the tracking database schema, run the following from your checkout of MLflow:
# starting at the root of the project
$ pwd
~/mlflow
$ cd mlflow
# MLflow relies on Alembic (https://alembic.sqlalchemy.org) for schema migrations.
$ alembic -c mlflow/store/db_migrations/alembic.ini revision -m "add new field to db"
Generating ~/mlflow/mlflow/store/db_migrations/versions/b446d3984cfa_add_new_field_to_db.py
These commands generate a new migration script (e.g. at
~/mlflow/mlflow/alembic/versions/12341123_add_new_field_to_db.py
) that you should then edit to add
migration logic.
We recommend Running the Javascript Dev Server - otherwise, the tracking frontend will request
files in the mlflow/server/js/build
directory, which is not checked into Git.
Alternatively, you can generate the necessary files in mlflow/server/js/build
as described in
Building a Distributable Artifact.
Install Node Modules, then run the following:
In one shell:
mlflow ui
In another shell:
cd mlflow/server/js
npm start
The MLflow Tracking UI will show runs logged in ./mlruns
at http://localhost:3000.
Install Node Modules, then run the following:
Generate JS files in mlflow/server/js/build
:
cd mlflow/server/js
npm run build
Build a pip-installable wheel in dist/
:
cd -
python setup.py bdist_wheel
First, install dependencies for building docs as described in Prerequisites.
To generate a live preview of Python & other rst documentation, run the following snippet. Note that R & Java API docs must be regenerated separately after each change and are not live-updated; see subsequent sections for instructions on generating R and Java docs.
cd docs
make livehtml
Generate R API rst doc files via:
cd docs
make rdocs
Generate Java API rst doc files via:
cd docs
make javadocs
Generate API docs for all languages via:
cd docs
make html
If changing existing Python APIs or adding new APIs under existing modules, ensure that references
to the modified APIs are updated in existing docs under docs/source
. Note that the Python doc
generation process will automatically produce updated API docs, but you should still audit for
usages of the modified APIs in guides and examples.
If adding a new public Python module, create a corresponding doc file for the module under
docs/source/python_api
- see here
for an example.
In order to commit your work, you need to sign that you wrote the patch or otherwise have the right to pass it on as an open-source patch. If you can certify the below (from developercertificate.org):
Developer Certificate of Origin Version 1.1 Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 1 Letterman Drive Suite D4700 San Francisco, CA, 94129 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.
Then add a line to every git commit message:
Signed-off-by: Jane Smith <[email protected]>
Use your real name (sorry, no pseudonyms or anonymous contributions). You can sign your commit
automatically with git commit -s
after you set your user.name
and user.email
git configs.