Support for Apache Beam #50

MedAnd · 2019-02-09T14:00:52Z

Consider adding support for Apache Beam's unified model for defining both batch and streaming data-parallel processing pipelines.

cybertyche · 2019-02-11T18:57:38Z

This has been under active consideration for quite some time now. Part of the issue is that Beam's model and Trill's are substantially different in several ways, enough that supporting Beam would almost require redesigning or reimplementing Trill as something far more Dataflow-like, which would eliminate most of what Trill does really well. That said, there is always the possibility that we could find some innovative way to support both.

Another issue is that we've seen other implementations of Beam perform remarkably worse than their native counterparts, enough that people tend to lose interest quickly.

I'd love to get a conversation going on this though. What I would like to know, if possible, is:

1. Where does this sit on the list of priorities for you?
2. What scenarios does it open up?
3. What feature of Beam is most appealing to you?

MedAnd · 2019-02-12T15:04:51Z

Some initial feedback...

1. High, especially if Trill supported distributed (multi-node) clusters like Apache Flink, Google Dataflow, etc.
2. It would let us replace an in-house, Service Fabric-hosted processing engine with an advanced, high-performance .NET engine like Trill. Trill could be offered as an Azure platform (an alternative to Cloud Dataflow), packaged as a container, etc.? Many MS platforms are integrating Spark; however, a distributed (multi-node) Trill solution would be easier to adopt for MS technology shops, as we'd be able to leverage the .NET Core ecosystem, tooling, etc.!
3. A unified, cross-platform model (cloud and on-prem) for defining both batch and streaming processing, which avoids vendor and API lock-in.

I think this article is applicable: "Why Apache Beam? A Google Perspective"

MedAnd · 2019-02-14T11:39:50Z

A further discussion stimulator... "Batch as a Special Case of Streaming"

cybertyche · 2019-02-21T23:42:14Z

This is definitely good conversation fodder, and thank you. I think the biggest question here is: if we go forward with a Beam API layer, where in the architecture would it sit? My immediate thought is that it would be atop Trill and not inside it, but that is certainly debatable.

As for batch as a special case of streaming, you've got no argument from me there. :-)

MedAnd · 2019-03-01T09:44:50Z

I would use either implementation in a large stream processing application if available today ☺️ I think the functionality offered by Azure Stream Analytics (ASA) is compelling; however, on the distributed stream processing side, I believe Azure does not have a true equivalent to Google Dataflow. Hope this project can change that... more conversation fodder to come ☺️
