Early access guide
Hello early-fish!
The following topics should provide you with the basics to start with MLReef:
Early access documentation | Documentation for |
---|---|
Basic concepts | Providing you the MLReef basics. |
Known limitations | What is currently NOT possible? |
Your contribution | A guide to an enriching early access experience. |
Basic concepts
MLReef is an MLOps platform for the entire Machine Learning lifecycle. It is based on four major pillars:
- Access: Gain access to company-wide ML content, from data operations to models.
- Efficiency: Fast iteration and a clear, structured workflow with full flexibility.
- Reproducibility: Gain confidence and ownership through full transparency in each step.
- Collaboration: Promoting collaboration within a team and across the entire MLReef community.
All concepts and functions in MLReef follow these conceptual pillars.
The following content will provide a fast overview of the major concepts in MLReef:
Basic concepts | Documentation for |
---|---|
Maturity | The end-to-end scope of MLReef. |
ML project repositories | Start your project with a Git based repository. |
Processing data | Create a data processing pipeline in MLReef. |
Visualizing data | Create a data visualization pipeline in MLReef. |
Experiments and models | Experiment pipelines and their output. |
Maturity
MLReef has a broad scope and vision and we are constantly iterating on existing and new features. Some stages and features are more mature than others. To convey the state of our feature set and be transparent, we have developed a maturity framework for categories, application types, and stages.
Planned: Not yet implemented in MLReef, but on our roadmap.
Minimal: A basic foundation for testing and validation.
Viable: Used by users and customers to solve real problems.
Complete: Contains a competitive feature set sufficient to displace other single-purpose MLOps tools.
Data management | Experiments | Inference |
---|---|---|
ML project repositories: Viable | Model pipelines: Viable | Automatic REST deployment: Planned |
Data visualization: Minimal | Experiment metrics: Minimal | |
Data processing: Viable | Model output: Minimal | |
Dataset: Viable | Cloud execution: Viable | |
Data versioning with Git: Viable | Local execution: Minimal | |
Storage optimization: Planned | Hyperparameter optimization: Planned | |
Data source integrations: Planned | | |
ML project repositories
An ML project is hosted and managed within a Git-based repository, similar to how traditional software development is managed but with additional ML-focused functions.
Your data and experiments are stored here. Your different ML pipelines can be created within this repository.
You can either create a new project or start from an existing one via a fork or clone.
For further documentation, visit the ML project repository documentation.
Processing data
Processing data in MLReef is structured through the data processing pipeline and concludes with an atomic dataset.
The following illustration highlights the workflow of processing data in MLReef:
You can access the data processing pipeline in the data tab of your ML project repository. To create a processing pipeline, follow these steps:
- Enter the processing pipeline via the blue "DataOps" button in the data tab.
- Select the data you want to process.
  By selecting a folder, all files it contains are selected automatically.
  The data processing pipeline does not change the folder tree structure, so the dataset will have the same structure. This can be especially relevant if you labeled your data through folders.
- Drag and drop data operations from the right side into the data processing pipeline on the left.
  Important note: The order in which you place multiple data operations matters. The data flows sequentially through all operations, starting with the first.
- Change the parameters of your data operations by expanding the operations you placed in the pipeline.
  A parameter can have specific values, input formats or ranges. MLReef automatically validates your input for errors.
  Note: Advanced parameters have a pre-set value, so user input is not required.
- Execute your data processing pipeline by pressing the "execute" button. This will create your dataset.
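To illustrate why the order of data operations matters, here is a minimal Python sketch. It is not MLReef code: `add_one`, `double` and `run_pipeline` are hypothetical stand-ins that only mimic data flowing through operations one after another, starting with the first.

```python
from functools import reduce
from typing import Callable, List

# Hypothetical stand-ins for data operations; these are NOT MLReef operations.
Operation = Callable[[List[float]], List[float]]


def add_one(values: List[float]) -> List[float]:
    return [v + 1 for v in values]


def double(values: List[float]) -> List[float]:
    return [v * 2 for v in values]


def run_pipeline(data: List[float], operations: List[Operation]) -> List[float]:
    """Pass the data through each operation in order, first to last."""
    return reduce(lambda current, op: op(current), operations, data)


data = [1.0, 2.0, 3.0]
first = run_pipeline(data, [add_one, double])   # computes (v + 1) * 2
second = run_pipeline(data, [double, add_one])  # computes (v * 2) + 1
print(first)
print(second)
```

The same two operations produce different results depending on their position, which is why the placement order in the pipeline deserves attention.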
Data visualizations
Visualizing your data through the data visualization pipeline works very similarly to the data processing pipeline described above.
You can access the data visualization pipeline in the data tab of your ML project repository. To create a data visualization, follow these steps:
- Enter the data visualization pipeline via the blue "data visualization" button in the data tab.
- Select the data files you want to visualize.
  By selecting a folder, all files it contains are selected automatically.
- Drag and drop data visualizations from the right side into the data visualization pipeline on the left.
  Note: In the data visualization pipeline, the order of the data visualizations is NOT relevant. Each data visualization is executed in parallel, and they do not affect each other.
- Change the parameters of your data visualizations by expanding the visualizations you placed in the pipeline.
  A parameter can have specific values, input formats or ranges. MLReef automatically validates your input for errors.
  Note: Advanced parameters have a pre-set value, so user input is not required.
- Execute your data visualization pipeline by pressing the "execute" button. This will create your data visualization.
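In contrast to data operations, visualizations are independent of each other. The following Python sketch illustrates that idea only; it does not show how MLReef schedules its jobs. The functions `value_range`, `mean_value` and `run_visualizations` are hypothetical, and a thread pool is assumed here purely as a simple way to run tasks side by side.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical visualizations: each one only reads the shared input.
def value_range(values):
    return {"min": min(values), "max": max(values)}


def mean_value(values):
    return {"mean": sum(values) / len(values)}


def run_visualizations(data, visualizations):
    """Run every visualization on the same input, independently of the others."""
    frozen = tuple(data)  # immutable copy, so no visualization can alter the input
    with ThreadPoolExecutor() as pool:
        futures = {viz.__name__: pool.submit(viz, frozen) for viz in visualizations}
        return {name: future.result() for name, future in futures.items()}


data = [2.0, 4.0, 6.0]
a = run_visualizations(data, [value_range, mean_value])
b = run_visualizations(data, [mean_value, value_range])
print(a == b)
```

Because every task receives the same unmodified input, swapping the order of the visualizations leaves the results unchanged.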
Experiments and models
MLReef has a built-in experiment environment, accessed through the experiment tab in your ML project repository. Here you can create new ML experiments with full reproducibility of the underlying data and model.
Currently, one model for image classification is available. See FirstDive: "Cats 'n dogs" classification using Resnet50 for a detailed demo of how to use this model.
You can access the experiment pipeline in the experiment tab of your ML project repository. To create an experiment, follow these steps:
- Create a new experiment in the experiment tab.
- Select the data files you want to train your model with.
  By selecting a folder, all files it contains are selected automatically.
  Currently, the data split for training and validation is handled only by the Resnet50 model. You can set the validation percentage in the advanced parameters. No test split is currently done.
- Drag and drop the model from the right side into the experiment pipeline on the left.
  Note: In the experiment pipeline, the order of the models is NOT relevant. Each model training is executed in parallel, and they do not affect each other.
- Change the parameters of your model by expanding the model you placed in the pipeline.
  A parameter can have specific values, input formats or ranges. MLReef automatically validates your input for errors.
  Note: Advanced parameters have a pre-set value, so user input is not required.
- Execute your experiment pipeline by pressing the "execute" button. This will create your experiment entry in your experiment overview page.
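To make the validation percentage more concrete, here is a minimal Python sketch of a training/validation split without a test set. This is an illustration under assumptions, not the model's actual implementation: the function `split_train_validation`, the random shuffling, the fixed seed and the example file names are all hypothetical. The range check loosely mirrors the kind of parameter validation described above.

```python
import random


def split_train_validation(files, validation_percent, seed=0):
    """Split files into training and validation sets; no test set is produced."""
    if not 0 <= validation_percent <= 100:
        raise ValueError("validation_percent must be between 0 and 100")
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_validation = round(len(shuffled) * validation_percent / 100)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical file names, used only for this example.
files = [f"images/cat_{i}.jpg" for i in range(8)] + [f"images/dog_{i}.jpg" for i in range(2)]
train, validation = split_train_validation(files, validation_percent=20)
print(len(train), len(validation))
```

With 10 files and a validation percentage of 20, the sketch yields 8 training files and 2 validation files. Every file lands in exactly one of the two sets.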
Known limitations and problems
MLReef is currently in version alpha 0.3. The general aim of this alpha is to test the broad range of functions, such as the data repository and the three main pipelines.
There are many things that are still in development and probably many that still don't work.
Here you can find a list of major features and known problems. This list will continuously be updated.
Limitation or problem | Description |
---|---|
All projects are found in "explore projects" tab | New projects and forked projects are always here. Status: doing |
Datasets have a copy of original files | This is due to an error in the config file. Status: doing |
Experiment values seem incorrect | A parsing problem reads only the first 5 epochs. Status: Listed |
Your contribution
You can contribute by:
- Sending an email to help@mlreef.com. This will automatically create a ticket in our service desk.
- Raising a ticket manually.
- Asking the community through our Slack channel: https://mlreefcommunity.slack.com
- Answering our Early Access Evaluation.
Thank you from the entire MLReef team!