Recommand · October 22, 2021 0

Parallelization of sklearn functions using MPI without cross-validation

I have a group of time series to which I want to apply LASSO regression using sklearn. Because the datasets are quite sparse, I need the full length of each time series for training, so I can't cross-validate. The datasets are also large and training is time-consuming, so I have to run it on a cluster.
To use multiple nodes I use MPI. As far as I know, it is possible to run sklearn functions on a cluster using MPI, but the approaches I have found work by distributing cross-validation chunks, as in the following project:
https://github.com/sebp/scikit-learn-mpi-grid-search

I was wondering whether there is any other way to use MPI to parallelize the training process in sklearn without cross-validation. I think this would mean that the underlying algorithm of the sklearn function itself would have to be parallelized.
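
For illustration, here is a minimal sketch of one setup that needs no cross-validation. It assumes each time series gets its own independent LASSO model, which the question does not actually state. Under that assumption the series can be split across MPI ranks with mpi4py, with each rank fitting its share on the full series. The function names, the sizes, the random placeholder data and the fixed `alpha` are all hypothetical. This only distributes independent fits; it does not speed up a single fit.

```python
def series_for_rank(n_series, size, rank):
    """Round-robin split: rank r handles series r, r + size, r + 2*size, ..."""
    return list(range(rank, n_series, size))


def fit_all_series(n_series=8, n_samples=200, n_features=20, alpha=0.1):
    # Imported here so the helper above stays usable without MPI installed.
    import numpy as np
    from mpi4py import MPI
    from sklearn.linear_model import Lasso

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local_coefs = {}
    for i in series_for_rank(n_series, size, rank):
        # Placeholder data (assumption): replace with loading series i.
        rng = np.random.default_rng(i)
        X = rng.normal(size=(n_samples, n_features))
        y = X[:, 0] + 0.1 * rng.normal(size=n_samples)

        # alpha is fixed up front, so the whole series is used for training.
        model = Lasso(alpha=alpha)
        model.fit(X, y)
        local_coefs[i] = model.coef_

    # Collect the per-rank dictionaries on rank 0 and merge them.
    gathered = comm.gather(local_coefs, root=0)
    if rank == 0:
        return {k: v for part in gathered for k, v in part.items()}
    return None
```

To try it, call `fit_all_series()` at the bottom of a script (the name `fit_lasso.py` is just an example) and launch it with something like `mpiexec -n 4 python fit_lasso.py`. Rank 0 gets a dictionary mapping each series index to its coefficient vector, and the other ranks get `None`.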