MLlib in Spark
To develop ML models, Spark provides MLlib, the native library for applying scalable machine learning algorithms in Spark applications. While the scikit-learn library is great for standalone, single-node (non-distributed) applications, a multi-node Spark cluster needs a library built for distributed computation. That is where MLlib comes to the rescue: it trains models on DataFrames whose partitions are spread across the cluster's executors.
MLlib includes the most commonly applied algorithms for classification, regression, clustering, and recommendation (collaborative filtering), along with model families such as decision trees that serve both classification and regression.
Here is an example of applying MLlib algorithms to the Titanic dataset:
https://colab.research.google.com/drive/1JZHrM7t10QH0v8VZZtjhAGDVtYUKFjO6?usp=sharing