Support for Apache Beam #50

Open
MedAnd opened this issue Feb 9, 2019 · 5 comments

Comments

@MedAnd

MedAnd commented Feb 9, 2019

Consider adding support for Apache Beam's unified model for defining both batch and streaming data-parallel processing pipelines.

@cybertyche
Contributor

This has been under active consideration for quite some time now. Part of the issue is that Beam's model and Trill's are substantially different in several ways, enough that supporting Beam would almost require redesigning or reimplementing Trill into something far more Dataflow-like, which would eliminate most of what Trill does really well. That said, there is always the possibility that we could find some innovative way to support both.

Another issue is that the other Beam implementations we've seen perform remarkably worse than the native APIs of the same engines, enough that people tend to lose interest quickly.

I'd love to get a conversation going on this though. What I would like to know, if possible, is:

  • Where does this sit on the list of priorities for you?
  • What scenarios does it open up?
  • What feature of Beam is most appealing to you?

@MedAnd
Author

MedAnd commented Feb 12, 2019

Some initial feedback...

  1. High, especially if Trill supported distributed (multi-node) clusters like Apache Flink, Google Dataflow, etc.
  2. It would let us replace an in-house, Service Fabric-hosted processing engine with an advanced, high-performance .NET engine like Trill. Trill could be offered as an Azure platform (an alternative to Cloud Dataflow), packaged as a container, etc. Many MS platforms are integrating Spark; however, a distributed (multi-node) Trill solution would be easier for MS technology shops to adopt, as we'd be able to leverage the .NET Core ecosystem, tooling, etc.
  3. A unified, cross-platform model (cloud and on-prem) for defining both batch and streaming processing, which avoids vendor and API lock-in.

I think this article is applicable: Why Apache Beam? A Google Perspective

@MedAnd
Author

MedAnd commented Feb 14, 2019

A further discussion stimulator... Batch as a Special Case of Streaming

@cybertyche
Contributor

This is definitely good conversation fodder, and thank you. I think the biggest question here is: if we go forward with a Beam API layer, where in the architecture would it sit? My immediate thought is that it would be atop Trill and not inside it, but that is certainly debatable.

As for batch as a special case of streaming, you've got no argument from me there. :-)
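
To make the "batch as a special case of streaming" idea concrete, here is a minimal sketch in plain Python. It uses neither the Beam API nor the Trill API; the event shape, `tumbling_counts`, and `live_feed` are hypothetical names invented for illustration, and the "stream" is a finite generator so the example terminates. The point is only that one windowed operator can serve both a bounded collection and an incrementally produced source.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp, key).
Event = Tuple[int, str]


def tumbling_counts(
    events: Iterable[Event], window: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, per-key counts) for each non-empty tumbling window.

    Assumes events arrive in non-decreasing timestamp order, so a window
    can be emitted as soon as an event from a later window shows up.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = (ts // window) * window
        if current_start is not None and start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        # End of input: flush the last open window.
        yield current_start, dict(counts)


def live_feed() -> Iterator[Event]:
    """Stand-in for an unbounded source (finite here so the sketch ends)."""
    for event in [(1, "a"), (3, "b"), (4, "a"), (11, "a"), (12, "b"), (25, "c")]:
        yield event


# "Batch": the whole data set is materialized up front.
batch = list(live_feed())
batch_result = list(tumbling_counts(batch, window=10))

# "Streaming": the same operator consumes events one at a time.
stream_result = list(tumbling_counts(live_feed(), window=10))

print(batch_result == stream_result)
```

The same function handles both cases because a batch is just a stream that happens to end. A real engine would also have to deal with out-of-order events, which this sketch deliberately ignores.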

@MedAnd
Author

MedAnd commented Mar 1, 2019

I would use either implementation in a large stream processing application if it were available today ☺️ I think the functionality offered by Azure Stream Analytics (ASA) is compelling; however, on the distributed stream processing side, I believe Azure does not have a true equivalent to Google Dataflow? Hope this project can change that... more conversation fodder to come ☺️
