Tool · AI Tools & APIs · v3.5

Apache Spark MLlib

by Apache Software Foundation · open-source · Last verified 2026-03-17

Apache Spark's built-in machine learning library for distributed, large-scale ML on data lakes and warehouses. MLlib provides scalable algorithms for classification, regression, clustering, and collaborative filtering, plus a pipeline API for feature engineering.

https://spark.apache.org/mllib
Overall grade: B+ (Good)
Adoption: A · Quality: A · Freshness: B+ · Citations: A+ · Engagement: F

Specifications

License: Apache 2.0
Pricing: open-source
Capabilities: distributed-ml-training, feature-engineering-pipelines, clustering, recommendation, graph-analytics
Integrations: hadoop, kafka, delta-lake, databricks, kubernetes, yarn
Use Cases: large-scale-feature-engineering, batch-ml-training, real-time-streaming-ml, recommendation-systems, fraud-analytics
API Available: No
SDK Languages: scala, python, java, r
Deployment: self-hosted, cloud, kubernetes, databricks, emr, dataproc
Rate Limits: N/A (open-source); cloud costs vary
Data Privacy: Data stays in customer infrastructure; cloud providers vary
Tags: distributed-ml, big-data, spark, feature-engineering, scala
Added: 2026-03-17
Completeness: 100%

Index Score

72.9
Adoption: 85
Quality: 82
Freshness: 78
Citations: 90
Engagement: 0
