Saving and Loading Scalers using MLflow and Databricks with Python
Databricks is a web-based platform that provides automated cluster management while allowing you to work with Python, R and other languages in IPython-style notebooks.

MLflow is a platform for tracking experiments and for supporting the reproducibility and deployment of machine learning models. In the experimentation phase, MLflow facilitates the versioning of your models. It allows you to save not only the models but also metrics, parameters and datasets. For a quick introduction to MLflow, I found this tutorial and that tutorial to be very helpful.

Say that, to create a model, you standardized a data set using some scaler.

from sklearn.preprocessing import StandardScaler

# fit the scaler on the training data
scaler = StandardScaler()
scaler = scaler.fit(X_train)

To make predictions on a new data set using this model, you will need not only the model but also the fitted scaler. The scaler can be saved as a pickle object, using the package pickle.

# save the scaler locally using the package pickle
import pickle
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# save the scaler alongside the model under your mlflow experiment
import mlflow
import mlflow.sklearn
with mlflow.start_run():
    mlflow.sklearn.log_model(model_object, 'model')
    mlflow.log_artifact('scaler.pkl')
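As a quick sanity check on the pickling step, the sketch below fits a scaler on made-up toy data, writes it to a temporary file and reads it back. The data, the temporary path and the variable names are assumptions for illustration only; nothing here touches MLflow.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data (made up for illustration).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X_train)

# Write the fitted scaler to a temporary directory and read it back.
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
with open(scaler_path, "wb") as f:
    pickle.dump(scaler, f)
with open(scaler_path, "rb") as f:
    restored_scaler = pickle.load(f)

# The restored object should carry the same fitted statistics
# and produce the same transformed values as the original.
same_mean = np.allclose(scaler.mean_, restored_scaler.mean_)
same_output = np.allclose(
    scaler.transform(X_train), restored_scaler.transform(X_train)
)
print(same_mean, same_output)
```

If both checks pass, the file is safe to log as an artifact. Note that a pickled scaler should be loaded with a scikit-learn version compatible with the one that created it.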

In the Databricks UI, a quick glance at the experiment’s latest run shows you both the model and the scaler.

[Image: the run's artifacts in the Databricks UI, showing the model and the scaler]

To retrieve the model, we use the function load_model().

# load the model saved under run id '124bf7c43bdf4733a0a17c5d4435da71'
run_id = '124bf7c43bdf4733a0a17c5d4435da71'
# the last part of the URI must match the artifact path passed to log_model()
model_uri = 'runs:/' + run_id + '/model'
model = mlflow.sklearn.load_model(model_uri=model_uri)

To retrieve the scaler, we use the function download_artifacts(), as shown in the documentation.

import os
import pickle

client = mlflow.tracking.MlflowClient()
local_dir = "/tmp/artifact_downloads"
if not os.path.exists(local_dir):
    os.mkdir(local_dir)
# download the run's artifacts, then unpickle the scaler
local_path = client.download_artifacts('124bf7c43bdf4733a0a17c5d4435da71', '', local_dir)
with open('/tmp/artifact_downloads/scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
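Once the model and the scaler have been retrieved, new data has to pass through the scaler before it reaches the model. The sketch below illustrates that order of operations. The scaler and model are fitted locally on made-up toy data as stand-ins for the objects loaded from MLflow, and the use of LogisticRegression is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-ins for the objects retrieved from MLflow (toy data, made up).
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [8.0, 40.0], [9.0, 20.0]])
y_train = np.array([0, 0, 1, 1])
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# New data must be transformed with the same fitted scaler
# (transform only, never fit again) before calling predict.
X_new = np.array([[1.5, 190.0], [8.5, 30.0]])
X_new_scaled = scaler.transform(X_new)
predictions = model.predict(X_new_scaled)
print(predictions)
```

Calling fit() or fit_transform() on the new data instead would recompute the mean and standard deviation, so the model would see inputs on a different scale from the one it was trained on.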

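The same procedure can be applied to retrieve any other artifact logged with the run, such as a data set. Metrics and parameters are not stored as artifacts; they can be read from client.get_run(run_id).data instead.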
