Standalone + Python Embedded Mode with Ray-Based Distributed Runtime? #916
chitralverma started this conversation in Ideas
Replies: 1 comment
cc @mwylde
Hey all — I’ve been digging into Arroyo recently and love the direction it’s heading: great performance, clean architecture, and strong support for connectors and stateful compute out of the box.
I wanted to open up a conversation around a different deployment/usage pattern than what’s currently documented. Most of the examples today assume running Arroyo as a cluster via k8s/Helm and managing pipelines via the web UI or CLI (which needs to be installed beforehand). That makes sense in many production cases, but for some lighter-weight or embedded scenarios, it would be awesome to have something like this:
- A Python-native API to define and run pipelines in-process (returning a `PipelineHandle`-style object)
- A local mode (single node, threadpool/multiprocess for ingestion → transform → sink)
- A distributed mode (by launching arroyo workers via ray workers, which also works with k8s etc.)
- A CLI entry point (`arroyo run ...` as described here)

Why this might be useful?
- Zero-setup onboarding: just `pip install <arroyo pkg name>` — no Helm, no infra setup, no prior installation of the CLI. btw, the CLI can still come from the Python package.
- Since Arroyo is built on datafusion, I think there is a great opportunity to offload tasks between engines like Arroyo (meant for streaming) and powerful batch engines like pyarrow, duckdb, polars etc. (primarily focused on batch at the moment) via the Arrow interop.

Really curious to hear thoughts from the core team and others in the community.
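To make the embedded idea concrete, here's a rough sketch of what a `PipelineHandle`-style API could feel like. Everything below is hypothetical: `Pipeline`, `PipelineHandle`, and all method names are illustrative stand-ins, not an existing Arroyo API. The stub just runs ingestion → transform → sink in-process to show the shape of the interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List

# Hypothetical sketch of a single-process "local" mode.
# None of these names exist in Arroyo today.

@dataclass
class PipelineHandle:
    """Returned by Pipeline.run(); lets the caller inspect/stop the job."""
    name: str
    _sink: List[Any] = field(default_factory=list)
    _running: bool = False

    def results(self) -> List[Any]:
        return list(self._sink)

    def stop(self) -> None:
        self._running = False


@dataclass
class Pipeline:
    name: str
    source: Iterable[Any]
    transform: Callable[[Any], Any]

    def run(self, mode: str = "local") -> PipelineHandle:
        # "local" = ingestion → transform → sink, all in-process.
        # A "distributed" mode could instead ship each stage to a
        # Ray worker; that's out of scope for this sketch.
        if mode != "local":
            raise NotImplementedError("only the local sketch is implemented")
        handle = PipelineHandle(self.name, _running=True)
        for event in self.source:                        # ingestion
            handle._sink.append(self.transform(event))   # transform → sink
        handle._running = False
        return handle


# Usage: pip install, then a few lines of Python — no Helm, no cluster.
handle = Pipeline(
    name="demo",
    source=range(5),
    transform=lambda x: x * x,
).run(mode="local")
print(handle.results())  # [0, 1, 4, 9, 16]
```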
Happy to help prototype or outline what an MVP could look like. I think this kind of flexibility could make Arroyo even more approachable and powerful.
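For the distributed side, the same pipeline stages could be fanned out to Ray workers. Since Ray may not be installed everywhere, the sketch below uses `ThreadPoolExecutor` as a stand-in for Ray actors; with Ray, each `submit` would become a `@ray.remote` task collected via `ray.get`, and the same scheduling would work on a k8s-backed Ray cluster. This is a naive partitioning sketch, not a proposed implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List

def run_distributed(
    source: Iterable[Any],
    transform: Callable[[Any], Any],
    parallelism: int = 4,
) -> List[Any]:
    events = list(source)
    # Split the source into one chunk per worker (naive round-robin).
    chunks = [events[i::parallelism] for i in range(parallelism)]

    def worker(chunk: List[Any]) -> List[Any]:
        return [transform(e) for e in chunk]

    # ThreadPoolExecutor stands in for a pool of Ray workers here.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(worker, c) for c in chunks]
        out: List[Any] = []
        for f in futures:
            # A real runtime would stream to a sink instead of collecting.
            out.extend(f.result())
    return out


print(sorted(run_distributed(range(10), lambda x: x * 2)))
# sorted() because chunked execution does not preserve input order
# -> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```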
Thanks!