Support for protobuf schemas #966

Open
dselans opened this issue Nov 18, 2024 · 6 comments

@dselans

dselans commented Nov 18, 2024

Use Case

Our events are defined as Protobuf and reside in an events repo. For example: https://github.com/streamdal/streamdal/tree/main/libs/protos/protos

Services consume and publish protobuf encoded events to and from RabbitMQ.

It would be really useful if the protobuf events and their structure would be exposed/visible in EventCatalog - that would remove the need to manually inspect the proto definitions in the repo.

Proposed Solution

Ideally, EventCatalog would have some sort of integration that lets me specify a protoset file, which EventCatalog would process at build time and convert into EventCatalog event definitions.

A protoset file is a serialized FileDescriptorSet: the compiled form of one or more protobuf (.proto) definitions, stored in a binary format.

It is useful because it encapsulates all the schema information—including message types, fields, and services—into a single file that can be programmatically loaded and interpreted at runtime.

I've seen other projects add protobuf "support" by ingesting .proto files but this is almost always riddled with import/path-related problems, package resolution problems and so on.

The protoset approach avoids this entirely: you get a single, self-contained binary that includes all the compiled .proto definitions (when it is generated with imports included), so there are no import paths to handle and no packages to resolve manually. That sidesteps the common pitfalls of parsing raw .proto files.
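
For reference, a protoset is typically produced with protoc via --descriptor_set_out, and --include_imports is what pulls every transitively imported file into the output. A minimal sketch of a helper that builds that argument list; the function name, directory and file names are hypothetical, and actually invoking protoc is left to the caller:

```typescript
// Hypothetical helper: builds the argument list for a `protoc` invocation
// that writes a self-contained descriptor set (protoset).
function protosetArgs(protoRoot: string, outFile: string, protoFiles: string[]): string[] {
  return [
    `--proto_path=${protoRoot}`,       // root directory used to resolve imports
    `--descriptor_set_out=${outFile}`, // write a serialized FileDescriptorSet
    '--include_imports',               // embed all transitively imported files
    '--include_source_info',           // keep source info such as comments
    ...protoFiles,
  ];
}
```

Joining the result after the word protoc (for example for protosetArgs('protos', 'events.protoset', ['events/user.proto'])) gives the full command line.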

Implementation Notes

I am not that familiar with how EventCatalog handles this for other formats but I did see in the demo/example different .avro files being located next to different services.

I do not think this is the correct approach for protobuf. It is possible, but it will be painful because non-trivial .proto event definitions usually span multiple files (and trying to piece them together is not a good idea).
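
To illustrate why a descriptor set makes the multi-file problem manageable: every message in every file can be indexed by its fully-qualified name, so a field's typeName (e.g. .common.Metadata) resolves with a plain lookup regardless of which file declared it. A minimal sketch over the plain-object shape of a decoded FileDescriptorSet; the interfaces are trimmed-down assumptions, and enums are left out:

```typescript
// Trimmed-down, assumed shapes of a decoded FileDescriptorSet's contents.
interface FieldDesc { name: string; typeName?: string }
interface MessageDesc { name: string; field?: FieldDesc[]; nestedType?: MessageDesc[] }
interface FileDesc { name: string; package?: string; messageType?: MessageDesc[] }

interface IndexedMessage { file: string; message: MessageDesc }

// Index every message (including nested ones) by fully-qualified name,
// remembering which file declared it.
function indexMessages(files: FileDesc[]): Map<string, IndexedMessage> {
  const index = new Map<string, IndexedMessage>();
  const visit = (prefix: string, file: string, messages: MessageDesc[] = []): void => {
    for (const message of messages) {
      const fqn = `${prefix}.${message.name}`;
      index.set(fqn, { file, message });
      visit(fqn, file, message.nestedType); // nested types extend the parent's name
    }
  };
  for (const f of files) {
    // Fully-qualified names start with a dot, followed by the package (if any).
    visit(f.package ? `.${f.package}` : '', f.name, f.messageType);
  }
  return index;
}
```

A field whose typeName is .common.Metadata can then be resolved with index.get('.common.Metadata'), no matter how many files the definitions were split across.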

Community Notes

  • Please vote by adding a 👍 reaction to the issue to help us prioritize.
  • If you are interested in working on this issue, please leave a comment.
  • If this issue is labeled needs-discussion, it means the spec has not been finalized yet. Please reach out in the EventCatalog Discord.
@dselans
Author

dselans commented Nov 18, 2024

Sidenote: I love what you guys are working on. Debugging large-scale EDA systems is incredibly painful and the primary way to simplify that process is through good documentation. Even without protobuf support, EventCatalog makes sense simply to allow visualizing flows.

@boyney123
Collaborator

@dselans shall we find some time to catch up on this and dive a bit deeper?

@boyney123
Collaborator

Spoke to @dselans; we're going to see how we can get proto (protosets) working with EventCatalog! @dselans will put together an initial POC that we can dive deeper on.

@dselans
Author

dselans commented Dec 7, 2024

Unfortunately, it looks like support for protoset in the JS ecosystem is lacking (at least in the protobufjs lib):

protobufjs/protobuf.js#1117

We might need an alternative approach (and end up running into all those import-related issues we discussed on the call, @boyney123).

@dselans
Author

dselans commented Dec 7, 2024

It is possible but it's fairly gross. Looks something like this:

import * as fs from 'fs';
import * as protobuf from 'protobufjs';
// Importing JSON this way needs "resolveJsonModule": true in tsconfig.json.
import descriptorJson from './descriptor.json';

// Build the FileDescriptorSet type from protobuf's own descriptor schema.
const root = protobuf.Root.fromJSON(descriptorJson);
const FileDescriptorSet = root.lookupType('google.protobuf.FileDescriptorSet');

// Decode the binary protoset into a plain object.
const buffer = fs.readFileSync('file.protoset');
const fdsMessage = FileDescriptorSet.decode(buffer);
const fds = FileDescriptorSet.toObject(fdsMessage, { defaults: true }) as any;

// Skip any file whose path contains one of these (substring match),
// e.g. the bundled google/protobuf/*.proto definitions.
const ignoredPaths = ['google'];

for (const file of fds.file || []) {
  if (ignoredPaths.some(name => file.name.includes(name))) continue;

  console.log(`File: ${file.name}`);
  console.log(`Package: ${file.package}`);
  console.dir(file, { depth: null }); // full dump of the file descriptor

  for (const m of file.messageType || []) {
    console.log(`  Message: ${m.name}, Fields:`, m.field.map((f: any) => f.name));
    console.dir(m, { depth: null });
  }
  for (const e of file.enumType || []) {
    console.log(`  Enum: ${e.name}, Values:`, e.value.map((v: any) => v.name));
  }
  for (const s of file.service || []) {
    console.log(`  Service: ${s.name}, Methods:`, s.method.map((m: any) => m.name));
  }
}

This requires you to download descriptor.json from https://github.com/protobufjs/protobuf.js/blob/master/google/protobuf/descriptor.json.
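
From there, turning the decoded files into catalog entries would mostly be a mapping step. A rough sketch, assuming a hypothetical EventEntry shape (not EventCatalog's actual event format) and the convention that every top-level message is an event; the input interfaces are trimmed-down assumptions about the decoded object:

```typescript
// Trimmed-down, assumed shapes of the decoded descriptor objects.
interface ProtoField { name: string }
interface ProtoMessage { name: string; field?: ProtoField[] }
interface ProtoFile { name: string; package?: string; messageType?: ProtoMessage[] }

// Hypothetical catalog entry; EventCatalog's real event format may differ.
interface EventEntry { id: string; sourceFile: string; fields: string[] }

// Maps every top-level message of every non-ignored file to one entry.
function toEventEntries(files: ProtoFile[], ignoredPaths: string[] = ['google']): EventEntry[] {
  const entries: EventEntry[] = [];
  for (const file of files) {
    // Same substring-based filter as in the dump script.
    if (ignoredPaths.some(p => file.name.includes(p))) continue;
    for (const m of file.messageType || []) {
      entries.push({
        id: file.package ? `${file.package}.${m.name}` : m.name,
        sourceFile: file.name,
        fields: (m.field || []).map(f => f.name),
      });
    }
  }
  return entries;
}
```

Calling toEventEntries(fds.file || []) on the decoded object would give one entry per message, which a build step could then write out in whatever format EventCatalog expects.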

@boyney123
Collaborator

Hmm yeah thanks for looking into it!

What would be the preferred way to do it? Could we write the code in Go, create a binary, and call it from Node? Not ideal, but maybe something like that could work in this case?
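
On the Node side, the binary idea could look roughly like this: the binary reads the protoset and prints JSON on stdout, and Node shells out to it at build time. A minimal sketch; the runJsonTool name is made up for illustration, and the binary and its JSON output are assumptions. Only execFileSync is real Node API:

```typescript
import { execFileSync } from 'child_process';

// Runs an external tool and parses whatever it prints on stdout as JSON.
// Throws if the tool exits non-zero or prints invalid JSON.
function runJsonTool(binary: string, args: string[]): unknown {
  const stdout = execFileSync(binary, args, {
    encoding: 'utf8',            // return a string instead of a Buffer
    maxBuffer: 64 * 1024 * 1024, // descriptor dumps can get large
  });
  return JSON.parse(stdout);
}
```

Usage would be something like runJsonTool('./protoset-dump', ['file.protoset']), where protoset-dump is a hypothetical Go binary shipped alongside the integration.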
