
A New Hope for ML Experimentation

by Yashaswi Nayak, July 4th, 2022

Too Long; Didn't Read

The DVC VS Code extension combines the power of DVC commands for data management, versioning, and experimentation with the sleek, elegant coding experience of Visual Studio Code. DVC is an excellent tool to track your experiments, models, and related artifacts, but it’s a CLI - which many in the data science community might not be comfortable or familiar with.


Hello There!


Consider a scenario - a lone Data Scientist works away at her system, trying to wade through a huge amount of data: cleaning, sorting, processing, and then building a model to run predictions on the newly processed data. The scientist has a bunch of tools at her disposal - Jupyter Notebooks, Airflow, Anaconda, Pandas, data storage, and a cloud virtual machine.


She trains it for hours and hours, only to fall short of perfection - the model doesn’t perform as well as it should have. She looks out the window - it’s nightfall already. She has yet to test her model with a different set of parameters and track a set of different metrics of her experiments.


She switches off her system, calls it a day, and will try the next day with another model, a different approach with a bunch of new data and parameters. This is a long process that might stretch for days…weeks…and months.


It is difficult to jump back to the point where she tried a specific combination of parameters for an experiment. Knowledge is sometimes lost, because not every experiment and artifact related to the model gets saved. Tracking is crucial for the improvement of the ML model.


I think this lone ranger scenario could be avoided if we had a comprehensive IDE-style environment where we could run multiple experiments, do data management, and track our code, experiment metrics, plots, model, and data artifacts as well. How cool would that be?

Sounds too good to be true, but this is what the DVC VS Code extension is attempting to do.


DVC is an excellent tool to track your experiments, models, and related artifacts, but it’s a CLI - which many in the data science community might not be comfortable or familiar with.


Gone are the days when you had to learn a bunch of pesky CLI commands.


Using DVC got a whole lot easier and more fun.


DVC VSCode Extension

The Iterative team brings you a VS Code extension that combines the power of DVC CLI commands for data management, versioning, and experimentation with the sleek, elegant coding experience of the Visual Studio Code IDE.


The extension in its current form provides you with the following features:


1. Command Palette

The extension is integrated into the VS Code command palette. Press F1 to open the palette and type DVC to view a whole bunch of DVC-related commands at your disposal.


2. Experiments Table

Gives you an in-depth view of the experiments run in the workspace. It is the equivalent of the dvc exp show command in the CLI.


3. Plots / Live Plots

You can view the plots generated by the experiments run in the workspace, compare the plots of different experiments, and even watch the plots update in real time.


4. Source Control Management

You can check the status of the workspace using this feature. You can dvc checkout, dvc commit, dvc add, dvc push & dvc pull from this view.


5. Tracked Artifacts - Datasets, Models, and Tokenizers

A small window for tracking your resources in the workspace. From here you can perform file actions, push & pull specific resources and manage the data within tracked datasets.


6. DVC View Container / Tray

The View Container can be activated by clicking the DVC icon in the VS Code activity bar. It gives general information about the experiments and resources in the workspace.


Here are some advantages compared to CLI alone when you use the extension:

  • Hides the complexity of the CLI and removes friction from the experience.
  • Enhances existing visualizations and provides extra ones.
  • Moves the data science workflows into the build context, so there are fewer unexpected breaks in focus time.
  • Lets you view experiment performance in real time.
  • Everybody loves VS Code ❤️🙂


DVC Extension - Getting Started

Using the DVC Extension can be summarized in 4 steps:

  1. Installation - (One time)
  2. Setting up your project and data
  3. Experimentation
  4. Plotting Graphs and Model Evaluation


Installation

Make sure you have DVC installed on your system. You can run the following command in your terminal:


$ pip3 install dvc


Or you can follow the guide given here for OS-specific installation.


In VS Code, open the extensions menu and search for DVC. Click Install.


https://www.youtube.com/watch?v=INjOkuanRpc
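
If you would rather double-check the setup from a terminal, here is a minimal sketch. It assumes the VS Code `code` launcher is on your PATH and that `Iterative.dvc` is the extension's Marketplace identifier - both are assumptions on my part, and `have` is just a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the given command is available on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

if have dvc; then
  dvc version    # prints the installed DVC version and platform details
else
  echo "dvc not found; install it with: pip3 install dvc"
fi

if have code; then
  # Assumption: Iterative.dvc is the extension's Marketplace identifier.
  code --install-extension Iterative.dvc
else
  echo "VS Code 'code' launcher not found; install the extension from the extensions menu instead"
fi
```

If either tool is missing, the script only prints a hint and changes nothing on your system.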


Now you have the DVC extension ready to go. To get familiar with the usage of the extension, we will download a sample ML project.


Download Sample Project

You can download the sample project from the repo. Open the folder in VS Code. The DVC extension should detect the DVC binary and the Python environment.


If you have a specific environment, you can press F1 and select DVC: Setup The Workspace.

Provide the compiler path and the Python environment binary path.


Using the DVC Extension

You can view the DVC experiments in the current workspace in the DVC view container tab.


Pulling Data

To begin our experimentation, we need to pull the data. Press F1 to open the VS Code command palette and select DVC: Pull.


You can view the output by selecting DVC: Show DVC Output.


Note: As of now, the team is still working on the DVC remote storage option in the VS Code plugin, so you will have to set your storage remote via the command line or the config file.
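
As a rough sketch of the command-line route, the small helper below registers a default remote and then lists what is configured. The function name `add_default_remote` is hypothetical, and the remote name and URL you pass in depend entirely on your own storage; only `dvc remote add --default` and `dvc remote list` are actual DVC commands.

```shell
# Hypothetical helper: register a default DVC remote, then list all remotes.
# Usage: add_default_remote <name> <url>
add_default_remote() {
  dvc remote add --default "$1" "$2" &&   # saves the remote in the project's DVC config
  dvc remote list                         # shows each configured remote with its URL
}
```

For example, `add_default_remote storage s3://example-bucket/dvc-store` would point DVC at a placeholder S3 bucket. Run it from the root of the DVC project.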


Experimentation

You can change the parameters in the params.yaml file and select DVC: Modify Experiment Param(s), Reset and Run in the VS Code command palette.


https://www.youtube.com/watch?v=buuoKsGZvvo
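
For comparison, here is roughly what the same loop looks like on the plain CLI: override one parameter for a single run, then print the experiments table. This is a sketch rather than what the extension runs under the hood, and `run_with_param` is a hypothetical helper name.

```shell
# Hypothetical helper: run one experiment with a parameter override,
# then print the experiments table.
# Usage: run_with_param <param>=<value>
run_with_param() {
  dvc exp run --set-param "$1" &&   # overrides a value from params.yaml for this run
  dvc exp show                      # CLI counterpart of the Experiments Table
}
```

You would call it as `run_with_param train.epochs=10`, where `train.epochs` stands in for whichever key your own params.yaml defines.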


Plots / Live Plots

You can check your experiments and view the plotted graphs using the extension as well.

And the cherry on top is that the extension allows you to cherry-pick your experiments. Pun Intended!


https://www.youtube.com/watch?v=N0VdjyQCo3Q


That’s not all: you can run individual experiments and change specific parameters.

If you wish to view your graphs live for experiments that take a lot of time (say, a DL model with a lot of epochs), you can view them in real time as well. Just run your experiment and click on the plots button in the DVC tray.


https://www.youtube.com/watch?v=ov5ScDPV6Rw


When all is said and done, you can commit and push your changes as well.
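
Outside the extension, that last step might look something like the sketch below. It assumes the project is a Git repository with both a Git remote and a DVC remote already configured; `commit_and_push` is a hypothetical helper name.

```shell
# Hypothetical helper: commit tracked changes to Git, then upload
# DVC-tracked data and models to the configured DVC remote.
# Usage: commit_and_push "<commit message>"
commit_and_push() {
  git add --update &&      # stage changes to files Git already tracks
  git commit -m "$1" &&
  git push &&
  dvc push                 # upload cached data and model files to remote storage
}
```

Each step only runs if the previous one succeeded, so a failed commit never triggers a push.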

The Iterative team is going to add more exciting features to the extension soon. Stay tuned.


Don’t let us keep you, go ahead and start experimenting. Happy DVC time!



A bit of parting philosophy

As an ML Ops practitioner, I deal with various challenges when working with different data science teams. There are various tools available in the market - both paid and open-source. I tend to lean towards open-source tools, as there is a kinship with a community that is actively helping out strangers across the world solve similar problems.


This approach is of great significance for the ML community, as we are still in the adoption stage where a good tool can help you solve your problems faster and with more confidence. A centralized tool integrated with multiple stages of the ML pipeline goes a long way in helping data science teams solve problems; they can focus more on model improvement than on infrastructure and setup - this is what drew me to the DVC tool.


A shout out to the team at Iterative for creating this wonderful extension, hoping to see more magic in the future.