Beamstack: An Open-Source Framework for Running Machine Learning Pipelines with Apache Beam

Olufunbi Babalola | Sep 27, 2024

Overview of Beamstack


Beamstack is an open-source framework, currently under development, for deploying Machine Learning and GenAI workflow pipelines with Apache Beam on Kubernetes. It provides a Command Line Interface (CLI) intended to reduce the complexity and time involved in deploying pipelines, along with monitoring and visualization features for running jobs.

Why do you need Beamstack?


  • Configurable Deployment Environment: With minimal steps you can set up Beamstack on your local minikube, bare metal, or cloud infrastructure.
  • Ease of Deployment with Beam Low-code YAML: Beamstack adopts a low-code approach to pipeline deployment, which makes the process easier and faster.
  • Composable and Reusable Pipeline Components: Pipeline components are designed for easy composition and customization, enabling efficient workflow creation and deployment.
  • Collaborative Setup for Development Teams: Beamstack’s modular architecture facilitates collaborative setups for various technical teams within an organization.
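
To make the low-code and composability points concrete, here is a minimal sketch in Python of how small, reusable transform definitions could be composed into a Beam YAML pipeline document. It is not part of Beamstack: the helpers transform, chain and to_yaml are hypothetical names introduced for this illustration, and the file names and the score column are made up. The transform types (ReadFromCsv, Filter, WriteToJson) and the chain pipeline type come from Apache Beam's YAML API.

```python
import json


def transform(kind, **config):
    """Build one reusable Beam YAML transform entry (hypothetical helper)."""
    return {"type": kind, "config": config}


def chain(*transforms):
    """Compose transforms into a linear ('chain') Beam YAML pipeline spec."""
    return {"pipeline": {"type": "chain", "transforms": list(transforms)}}


def to_yaml(node, indent=0):
    """Render nested dicts and lists of dicts as block-style YAML lines."""
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(value, indent + 1))
            else:
                # json.dumps yields a double-quoted scalar, which is valid YAML
                lines.append(f"{pad}{key}: {json.dumps(value)}")
    else:  # a list whose items are all non-empty mappings
        for item in node:
            first, *rest = to_yaml(item, indent + 1)
            lines.append(f"{pad}- {first.lstrip()}")
            lines.extend(rest)
    return lines


# Hypothetical file names and column name, for illustration only.
read_scores = transform("ReadFromCsv", path="scores.csv")
keep_confident = transform("Filter", language="python", keep="score > 0.5")
write_results = transform("WriteToJson", path="confident.json")

spec = chain(read_scores, keep_confident, write_results)
text = "\n".join(to_yaml(spec))
print(text)
```

Each component (read_scores, keep_confident, write_results) is an ordinary value, so a team could keep such definitions in a shared module and recombine them into different pipelines. The small emitter here only handles the shapes used in this sketch; a real project would more likely write the YAML by hand or use a YAML library.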

Beamstack Architecture


How to use Beamstack


  • Install the binary: Install the beamstack binary from the releases section of the Beamstack GitHub repository.
  • Configure the target environment: Beamstack initializes your target Kubernetes cluster with the necessary components.
  • Select a YAML package to deploy: Deploy your YAML pipeline with the beamstack deploy command.
  • Monitor your running jobs: Open the Grafana dashboard or the runner UI to observe your running jobs.
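
Before running the steps above, it can help to confirm that the pieces they rely on are in place. The following is a minimal sketch of such a pre-flight check in Python. It does not call Beamstack and makes no claims about its command-line flags. The function name preflight is hypothetical, and the check for kubectl is an assumption (it is the usual client for a Kubernetes cluster, but the article does not state that Beamstack requires it).

```python
import os
import shutil


def preflight(pipeline_file, which=shutil.which, is_file=os.path.isfile):
    """Return a list of problems; an empty list means the basics are in place.

    `which` and `is_file` are injectable so the check can be tested
    without touching the real PATH or filesystem.
    """
    problems = []
    # "kubectl" is an assumed prerequisite, not one named by the article.
    for tool in ("beamstack", "kubectl"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if not is_file(pipeline_file):
        problems.append(f"pipeline file not found: {pipeline_file}")
    elif not pipeline_file.endswith((".yaml", ".yml")):
        problems.append(f"expected a YAML file, got: {pipeline_file}")
    return problems


if __name__ == "__main__":
    # "pipeline.yaml" is a placeholder file name.
    for problem in preflight("pipeline.yaml"):
        print("not ready:", problem)
```

If the list comes back empty, the binary is installed and the pipeline file exists, so the remaining steps (initializing the cluster and running beamstack deploy) can proceed as described in the project's own documentation.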

Beamstack Roadmap


Feature Implementation

Tasks Breakdown