Apache Spark MLlib
by Apache Software Foundation · open-source · Last verified 2026-03-17
Apache Spark's built-in machine learning library for distributed, large-scale ML on data lakes and warehouses. MLlib provides scalable algorithms for classification, regression, clustering, and collaborative filtering. It also includes a Pipeline API that chains feature transformations and model training into a single reusable workflow.
https://spark.apache.org/mllib
B+ (Good)
Adoption: A · Quality: A · Freshness: B+ · Citations: A+ · Engagement: F
Specifications
- License: Apache 2.0
- Pricing: open-source
- Capabilities: distributed-ml-training, feature-engineering-pipelines, clustering, recommendation, graph-analytics
- Integrations: hadoop, kafka, delta-lake, databricks, kubernetes, yarn
- Use Cases: large-scale-feature-engineering, batch-ml-training, real-time-streaming-ml, recommendation-systems, fraud-analytics
- API Available: No (no hosted service API; MLlib is used as a library from the SDK languages)
- SDK Languages: scala, python, java, r
- Deployment: self-hosted, cloud, kubernetes, databricks, emr, dataproc
- Rate Limits: N/A (open-source); cloud costs vary
- Data Privacy: Data stays in customer infrastructure; cloud providers vary
- Tags: distributed-ml, big-data, spark, feature-engineering, scala
- Added: 2026-03-17
- Completeness: 100%
Index Score: 72.9
- Adoption: 85
- Quality: 82
- Freshness: 78
- Citations: 90
- Engagement: 0