Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] Add Cpp Deployment in Ray Serve #34

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

MingYueQ
Copy link

@MingYueQ MingYueQ commented Jun 30, 2023

Summary

General Motivation

In scenarios such as search, inference, and others, system performance is of utmost importance. Taking the model inference and search engines of Ant Group as an example, these systems require high throughput, low latency, and high concurrency as they handle a massive amount of business. In order to meet these requirements, they have all chosen C++ for designing and developing their systems, ensuring high efficiency and stability. Currently, these systems plan to run on Ray Serve to enhance their distributed capabilities. Therefore, Ray Serve needs to provide C++ deployment so that users can easily deploy their services.

Should this change be within ray or outside?

main ray project. Changes are made to the Ray Serve level.

Stewardship

Required Reviewers

@sihanwang41
@edoakes
@akshay-anyscale

Shepherd of the Proposal (should be a senior committer)

@sihanwang41
@edoakes
@akshay-anyscale

Design and Architecture

Example Model

Taking the recommendation system as an example, when user inputs a search word, the system returns recommended words that are similar to it.

image

The RecommendService receives user requests and calls the FeatureService, SimilarityService, and RankService to calculate similar words, then return the results to users:

#pragma once

#include <memory>

#include "FeatureService.h"
#include "SimilarityService.h"
#include "RankService.h"

namespace ray {
namespace serve {

class RecommendService {
 public:
  RecommendService() {
    feature_service_ = std::make_shared<FeatureService>();
    similarity_service_ = std::make_shared<SimilarityService>();
    rank_service_ = std::make_shared<RankService>();
  }

  std::vector<std::string> Recommend(const std::string &request, const int num) {
    // 1. Calculate vector
    std::vector<float> features = feature_service_->GetVector(request);

    // 2. Calculate similarity
    std::unordered_map<std::string, float> similarities = similarity_service_->GetSimilarity(features);

    // 3. Get similar words
    std::vector<std::string> result = rank_service_->Rank(similarities, num);
    return result;
}

 private:
  std::shared_ptr<FeatureService> feature_service_;
  std::shared_ptr<SimilarityService> similarity_service_;
  std::shared_ptr<RankService> rank_service_;
};

}  // namespace serve
}  // namespace ray

The FeatureService convert requests to vector:

#pragma once

#include <vector>
#include <string>
#include <word2vec.h>

namespace ray {
namespace serve {

class FeatureService {
 public:
  FeatureService() {
    model_.load("model/word2vec.bin");
  }

  ~FeatureService() {
    delete model_;
    model_ = nullptr;
  }

  std::vector<float> GetVector(const std::string &request) {
    return model_.getVector(request);
  }

 private:
  word2vec::Word2Vec model_;
};

}  // namespace serve
}  // namespace ray

The SimilarityService is used to calculate similarity:

#pragma once

#include <vector>
#include <string>
#include <cmath>
#include <unordered_map>

namespace ray {
namespace serve {

class SimilarityService {
 public:
  std::unordered_map<std::string, float> GetSimilarity(std::vector<float> request_vec) {
    std::unordered_map<std::string, float> result;
    for (auto it = recommend_cache_.begin(); it != recommend_cache_.end(); it++) {
      result.insert({it->first, ComputeCosineSimilarity(request_vec, it->second)});
    }
    return result;
  }

 private:
  float ComputeCosineSimilarity(std::vector<float> v1, std::vector<float> v2) {
    int len = v1.size();
    float dotProduct = 0;
    float magnitude1 = 0;
    float magnitude2 = 0;
    for (int i = 0; i < len; i++) {
      dotProduct += v1[i] * v2[i];
      magnitude1 += v1[i] * v1[i];
      magnitude2 += v2[i] * v2[i];
    }
    magnitude1 = std::sqrt(magnitude1);
    magnitude2 = std::sqrt(magnitude2);
    return dotProduct / (magnitude1 * magnitude2);
  }

  std::unordered_map<std::string, std::vector<float>> recommend_cache_ = {
    {"mac", {1.5, 2.3, 3.5, 5.5}},
    {"car", {1.5, 3.2, 3.9, 7.5}},
    {"phone", {1.5, 2.0, 4.5, 8.1}},
  };
};

}  // namespace serve
}  // namespace ray

The RankService is used to sort based on similarity:

#pragma once

#include <vector>
#include <string>
#include <queue>
#include <unordered_map>
#include <algorithm>

namespace ray {
namespace serve {

class RankService {
 public:
  std::vector<std::string> Rank(std::unordered_map<std::string, float> recommends, int num) {
    std::priority_queue<std::pair<std::string, float>, std::vector<std::pair<std::string, float>>, std::greater<std::pair<std::string, float>>> pq;
    for (auto& pair : recommends) {
        pq.push(pair);
        if (pq.size() > num) {
            pq.pop();
        }
    }

    std::vector<std::string> result;
    while (!pq.empty()) {
        result.push_back(pq.top().first);
        pq.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
  }
};

}  // namespace serve
}  // namespace ray

This is the code that uses the RecommendService class:

#include "RecommendService.h"

int main(int argc, char *argv[]) {
  ray::serve::RecommendService recommend_service;
  std::vector<std::string> recommends = recommend_service.Recommend("computer", 1);
  for(std::string recommend_word : recommends) {
    printf("Recommend word is %s", recommend_word.c_str());
  }
  return 0;
}

In this way, all services need to be deployed together, which increases the system load and is not conducive to expansion.

Converting to a Ray Serve Deployment

Through Ray Serve, the core computing logic can be deployed as a scalable distributed service.
First, convert these Services to run on Ray Serve.
FeatureService:

#pragma once

#include <vector>
#include <string>
#include <word2vec.h>

namespace ray {
namespace serve {

class FeatureService {
 public:
  FeatureService() {
    model_.load("model/word2vec.bin");
  }

  ~FeatureService() {
    delete model_;
    model_ = nullptr;
  }

  std::vector<float> GetVector(const std::string &request) {
    return model_.getVector(request);
  }

  static FeatureService *FactoryCreate() {
    return new FeatureService();
  }
 private:
  word2vec::Word2Vec model_;
};

// Register function
SERVE_DEPLOYMENT(FeatureService::FactoryCreate);

}  // namespace serve
}  // namespace ray

SimilarityService:

#pragma once

#include <vector>
#include <string>
#include <cmath>
#include <unordered_map>

namespace ray {
namespace serve {

class SimilarityService {
 public:
  std::unordered_map<std::string, float> GetSimilarity(std::vector<float> request_vec) {
    std::unordered_map<std::string, float> result;
    for (auto it = recommend_cache_.begin(); it != recommend_cache_.end(); it++) {
      result.insert({it->first, ComputeCosineSimilarity(request_vec, it->second)});
    }
    return result;
  }

  static SimilarityService *FactoryCreate() {
    return new SimilarityService();
  }

 private:
  float ComputeCosineSimilarity(std::vector<float> v1, std::vector<float> v2) {
    int len = v1.size();
    float dotProduct = 0;
    float magnitude1 = 0;
    float magnitude2 = 0;
    for (int i = 0; i < len; i++) {
      dotProduct += v1[i] * v2[i];
      magnitude1 += v1[i] * v1[i];
      magnitude2 += v2[i] * v2[i];
    }
    magnitude1 = std::sqrt(magnitude1);
    magnitude2 = std::sqrt(magnitude2);
    return dotProduct / (magnitude1 * magnitude2);
  }

  std::unordered_map<std::string, std::vector<float>> recommend_cache_ = {
    {"mac", {1.5, 2.3, 3.5, 5.5}},
    {"car", {1.5, 3.2, 3.9, 7.5}},
    {"phone", {1.5, 2.0, 4.5, 8.1}},
  };
};

// Register function
SERVE_DEPLOYMENT(SimilarityService::FactoryCreate);

}  // namespace serve
}  // namespace ray

RankService:

#pragma once

#include <vector>
#include <string>
#include <queue>
#include <unordered_map>
#include <algorithm>

namespace ray {
namespace serve {

class RankService {
 public:
  std::vector<std::string> Rank(std::unordered_map<std::string, float> recommends, int num) {
    std::priority_queue<std::pair<std::string, float>, std::vector<std::pair<std::string, float>>, std::greater<std::pair<std::string, float>>> pq;
    for (auto& pair : recommends) {
        pq.push(pair);
        if (pq.size() > num) {
            pq.pop();
        }
    }

    std::vector<std::string> result;
    while (!pq.empty()) {
        result.push_back(pq.top().first);
        pq.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
  }

  static RankService *FactoryCreate() {
    return new RankService();
  }
};

// Register function
SERVE_DEPLOYMENT(RankService::FactoryCreate);

}  // namespace serve
}  // namespace ray

RecommendService is a sequential invocation of other services without complex processing logic, so we can directly use the DAG ability to connect these services, eliminating the need for RecommendService and simplifying user logic.
Next, we start the Ray Serve runtime and use Python Serve API deploy these Service as Deployment:

feature_service = serve.deployment(_func_or_class='FeatureService::FactoryCreate', name='feature_service', language='CPP')
similarity_service = serve.deployment(_func_or_class='SimilarityService::FactoryCreate', name='similarity_service', language='CPP')
rank_service = serve.deployment(_func_or_class='RankService::FactoryCreate', name='rank_service', language='CPP')

with InputNode() as input:
    features = feature_service.GetVector.bind(input[0])
    similarities = similarity_service.GetSimilarity.bind(features)
    rank_result = rank_service.Rank.bind(similarities, input[1])

graph = DAGDriver.bind(rank_result, http_adapter=json_request)
handle = serve.run(graph)
ref = handle.GetVector.remote()
result = ray.get(ref)
print(result)

Calling Ray Serve Deployment with HTTP

curl http://127.0.0.1:8000?request=computer&num=1

Overall Design

image
Ray Serve maintains Controller Actor and Ingress Actor. So these two roles are not related to the language of the user's choice. And they have the ability to manage cross-language deployments and route requests.

C++ Case Deduction

The businesses can send control commands to the Controller Actor of Ray Serve, which includes creating ingress, creating deployment, etc. When publishing C++ online services, the DeploymentState component needs to create C++ Deployment Actors. Users can call their business logic in Python/Java Driver, Ray task, or Ray Actor, and the requests will be dispatched to C++ Deployment Actors.
image

Package

C++ programs are typically compiled and packaged into three types of results: binary, static library, shared library.

  • binary: C++ does not have a standard load binary API, and binaries compiled by different compilers may not be the same. Directly loading the binary can result in many uncontrollable factors.
  • static library: Static library bundle all their dependencies together, whereas common utility libraries such as glog and boost may be loaded via dynamic dependencies in the Ray Serve Deployment. This can lead to a higher probability of conflicts when the same library exists as both a dynamic and static dependency.
  • shared library: C++ provides a standard API for loading shared library, and it can reduce memory usage and prevent conflicts.

In conclusion, the business needs to package the system as a shared library to run it on Ray Serve.

Register function

Ray Serve will add SERVE_FUNC and SERVE_DEPLOYMENT macros to publish user Service as Deployment.
SERVE_FUNC: Resolving overloaded function registration;
SERVE_DEPLOYMENT: Publishing user Service as Deployment.
Example:

static RecommendService *CreateRecommendService(std::string request) {
  return new RecommendService(request);
}

static RecommendService *CreateRecommendService(std::string request, int num) {
  return new RecommendService(request, num);
}

static FeatureService *CreateFeatureService() {
  return new FeatureService();
}

// Register function
SERVE_DEPLOYMENT(SERVE_FUNC(CreateRecommendService, std::string, int),
			    CreateFeatureService);

Compatibility, Deprecation, and Migration Plan

The new feature is to add C++ deployment for Ray Serve, without modifying or deprecating existing functionalities. The changes to the Ray Serve API are also adding new capabilities.

Test Plan and Acceptance Criteria

  • Unit and integration test for core components
  • Benchmarks on C++ Deployment

(Optional) Follow-on Work

  • Init Ray Serve C++ project structure
  • Create C++ Deployment through Python/Java API
  • ServeHandler for C++
  • Accessing C++ Deployment using Python/Java/C++ ServeHandle or HTTP

@edoakes
Copy link
Contributor

edoakes commented Jul 12, 2023

@MingYueQ sorry for the delay on responding to the RFC. We have been putting out fires for Ray 2.6 release & have a company retreat this week. I will have a detailed look next week and get back to you.

The main concern is going to be around maintenance burden — the existing Java integrations are written in a way that causes some headaches because we have a lot of branches in the code that are special-cased for Java. If we decide to go ahead with the CPP extension, we’ll need to make sure to improve this across all three languages.

@MingYueQ
Copy link
Author

@edoakes Thank you for your reply. We have the following suggestions regarding your concerns:

  1. In scenarios with extremely high performance requirements, C++ Deployment is more suitable than Python Deployment and Java Deployment, which can expand the scope of Ray Serve services;
  2. Adding C++ Deployment can give Ray Serve users more options and increase the flexibility of Ray Serve;
  3. We will continue to invest in the development of C++ Deployment, and minimize the burden on the community.

@liuyang-my liuyang-my closed this Jul 18, 2023
@liuyang-my
Copy link

@edoakes Thank you for your reply. At the time when Java Deployment was supported, there was not a good abstraction for multiple languages. Because there was no expectation of support for other languages, Java was hardcoded in some places. In the development work related to this proposal, I will work with @MingYueQ to improve the abstraction part for multiple languages.

@liuyang-my liuyang-my reopened this Jul 18, 2023
@edoakes
Copy link
Contributor

edoakes commented Jul 25, 2023

As discussed offline let's re-open discussion after Ray summit when the Serve team has some more bandwidth.

@MingYueQ
Copy link
Author

MingYueQ commented Aug 3, 2023

OK. During the period of the Ray Summit, we will also prepare more detailed information, including user guide, architecture design, benchmark, and so on.

@MingYueQ
Copy link
Author

@edoakes Please review this topic again.

Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline. High level questions:

  • How does this relate to the existing Ray C++ actor support? Will this require any extensions of it?
  • Will we support the DeploymentHandle interface in C++? Or will applications be required to have a Python ingress deployment which may call into the C++ deployment? Please add an example of a Python deployment calling into a C++ one (including calling multiple different methods on it).

similarity_service = serve.deployment(_func_or_class='SimilarityService::FactoryCreate', name='similarity_service', language='CPP')
rank_service = serve.deployment(_func_or_class='RankService::FactoryCreate', name='rank_service', language='CPP')

with InputNode() as input:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on the Java improvement REP, let's please update this to use bind( with DeploymentHandle calls instead of the call graph API (there should be no InputNode)

RecommendService is a sequential invocation of other services without complex processing logic, so we can directly use the DAG ability to connect these services, eliminating the need for RecommendService and simplifying user logic.
Next, we start the Ray Serve runtime and use Python Serve API deploy these Service as Deployment:
```python
feature_service = serve.deployment(_func_or_class='FeatureService::FactoryCreate', name='feature_service', language='CPP')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty ugly: it's using a private attribute (func_or_class). Maybe we can introduce a simple wrapper class like serve.CPPDeployment(factory_function)

reps/2023-06-30-serve-cpp-deployment.md Show resolved Hide resolved
reps/2023-06-30-serve-cpp-deployment.md Show resolved Hide resolved
@liuyang-my
Copy link

Some comments inline. High level questions:

  • How does this relate to the existing Ray C++ actor support? Will this require any extensions of it?
  • Will we support the DeploymentHandle interface in C++? Or will applications be required to have a Python ingress deployment which may call into the C++ deployment? Please add an example of a Python deployment calling into a C++ one (including calling multiple different methods on it).
  • Based on Ray C++ Actor, no extensions are needed at the Ray core level.
  • C++ DeploymentHandle will be supported, which also relates to whether the common parts of DeploymentHandle across different languages will be moved to C++. This can be implemented in stages.

@edoakes
Copy link
Contributor

edoakes commented Oct 30, 2023

  • C++ DeploymentHandle will be supported, which also relates to whether the common parts of DeploymentHandle across different languages will be moved to C++. This can be implemented in stages.

Could you add some detail about this part to the REP please? What will be the deployment handle interface in C++ and roughly how would a shared core implementation work across the three languages (this part doesn't need to go into too much detail that can be left to implementation)?

wangyingjie added 3 commits November 8, 2023 10:41
Signed-off-by: wangyingjie <[email protected]>
Signed-off-by: wangyingjie <[email protected]>
@MingYueQ MingYueQ force-pushed the feature-serve-cpp-deployment branch from 5f2fb61 to 5b9417e Compare November 8, 2023 02:42
Signed-off-by: wangyingjie <[email protected]>
Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MingYueQ sorry about the delay, just got back from a long vacation! The proposal is looking good. I just have one more comment for you, see inline.

I will send this out to the Ray committers mailing list for broader feedback on Monday and let's try to get it approved by end of next week.

- Init Ray Serve C++ project structure
- Create C++ Deployment through Python/Java API
- ServeHandler for C++
- Accessing C++ Deployment using Python/Java/C++ ServeHandle or HTTP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective I think supporting HTTP is a P0 requirement in order to really say that we support C++ as a language in Ray Serve.

Could you at least sketch a possible API that we could support here? It does not have to be part of the initial scope of implementation. But without having this I think the proposal is incomplete.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
Expose two primary APIs to users: SERVE_FUNC and SERVE_DEPLOYMENT, with a detailed explanation provided in the section titled "Register Function".
Furthermore, users can deploy services through these APIs, as exemplified in the "Converting to a Ray Serve Deployment" section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused -- how would a C++ deployment be directly exposed via HTTP? Or are you proposing that in order to expose it via HTTP it must be placed behind a Python deployment that calls into it via DeploymentHandle?

Copy link
Contributor

@edoakes edoakes Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example above, I see the RecommendService but this takes as arguments a std::string and an int. How are these parsed from an HTTP request? And how can an HTTP response be output?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way of accessing via HTTP needs to be adjusted for this demo.

curl http://127.0.0.1:8000?request=computer&num=1

This is not sufficient because the type information for num=1 is lost in the http query. For cases with multiple input parameters, it is possible to transmit them as a JSON string in the HTTP body, for example:

curl -d '{"request": "computer", "num": 1}' http://127.0.0.1:8000

This way, the request flow is: -> HTTPProxy (access with DeploymentHandle) -> Cpp Deployment.
And currently, the HTTPProxy should be able to meet the requirements without any additional modifications.

@MingYueQ can adjust according to this approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any standard HTTP -> C++ struct parsing libraries that we could lean on here to help define the input/output formatting? For example in Python we use FastAPI to handle all of this. Otherwise this will become a rabbit hole (HTTP input parsing/output formatting could be a whole library in itself).

Copy link
Author

@MingYueQ MingYueQ Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently in the investigation, if C++ Deployment wants to have the capability to map the path to the user register function, it needs to implement this functionality itself. A more native approach is to implement it using macros. Specifically:

The SERVE_DEPLOYMENT macro mentioned in the REP above originally functions to generate a <method_name, std::function> mapping. In order to maintain this capability, a separate cache needs to be generated to store <path, method_name>, so that the call chain becomes: path -> method_name -> std::function. This will result in a longer call chain. Therefore, it would be better to directly generate a <path, std::function> mapping in SERVE_DEPLOYMENT. Taking RankService as an example:

#pragma once

#include "api/serve_deployment.h"

namespace ray {
namespace serve {

class RankService {
 public:
  static RankService *FactoryCreate() {
    return new RankService();
  }
};

// Register function
SERVE_DEPLOYMENT("/rankService/factoryCreate", RankService::FactoryCreate);

}  // namespace serve
}  // namespace ray

In the SERVE_DEPLOYMENT, the first parameter is the path and the second parameter is the user function.

wangyingjie added 2 commits December 1, 2023 19:29
Signed-off-by: wangyingjie <[email protected]>
Signed-off-by: wangyingjie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants