[train] TensorflowTrainer does not work with keras>3.x #47464
Comments
Similar error from here...
And a toy example here:

    import ray
    import tensorflow as tf
    from ray.train.tensorflow import TensorflowTrainer
    from ray.train import ScalingConfig

    def build_model():
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Dense(128, activation="relu"))
        model.add(tf.keras.layers.Dense(10))
        model.compile(optimizer="adam", loss="mean_squared_error")
        return model

    def train_func(config):
        strategy = tf.distribute.MultiWorkerMirroredStrategy()
        with strategy.scope():
            model = build_model()
        dataset = ray.train.get_dataset_shard("train")
        tf_dataset = dataset.to_tf(
            feature_columns="x", label_columns="y", batch_size=32
        )
        print("TF DATASET: ")
        print(tf_dataset)
        model.fit(tf_dataset, epochs=5)

    train_dataset = ray.data.from_items([{"x": x / 10, "y": x % 10} for x in range(1000)])
    scaling_config = ScalingConfig(num_workers=2, use_gpu=False)
    trainer = TensorflowTrainer(
        train_loop_per_worker=train_func,
        datasets={"train": train_dataset},
        scaling_config=scaling_config,
    )
    results = trainer.fit()
    print(results.metrics)

Error:
For more context, this issue was when running
Seeing this issue on custom code using TensorflowTrainer and MultiWorkerMirroredStrategy. Versions:
@beck-weber-ing
For me the error looks like this and happens upon calling model.fit (train.py:210):
@crbellis @beck-weber-ing @ghsanti Thanks for filing this issue. The issue seems to be with the TensorFlow distributed API ([...]). Even without Ray, [...] The problem is that [...]
Workaround 1: Pin the tensorflow (and keras) version
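As a rough illustration of the pinning workaround, the sketch below checks whether an installed TensorFlow falls in the range that ships Keras 3 as `tf.keras`. It assumes the relevant cutoff is TensorFlow 2.16, the first release to default to Keras 3; the exact pin recommended in this comment did not survive in the thread, so the threshold and the helper names (`major_minor`, `ships_keras3_by_default`, `installed_version`) are illustrative assumptions, not the maintainer's instructions.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.16.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def ships_keras3_by_default(tf_version: str) -> bool:
    """Assumption: TensorFlow 2.16 and later default tf.keras to Keras 3."""
    return major_minor(tf_version) >= (2, 16)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    tf_version = installed_version("tensorflow")
    if tf_version is None:
        print("tensorflow is not installed")
    elif ships_keras3_by_default(tf_version):
        print(f"tensorflow {tf_version}: tf.keras is Keras 3 by default")
    else:
        print(f"tensorflow {tf_version}: tf.keras is Keras 2")
```

If the check reports Keras 3, a requirement along the lines of `tensorflow<2.16` would be one way to stay on Keras 2, assuming nothing else in the environment needs a newer release.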
Workaround 2: Use the legacy Keras 2 package
If you need to use a later version of tensorflow, it is still backwards compatible with Keras 2.x, but you'll need to install a new package and change the keras import ([...]). See here: https://keras.io/getting_started/#tensorflow--keras-2-backwards-compatibility
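A minimal sketch of how the legacy-Keras route might be wired up, based on the Keras page linked above: that page describes installing the `tf_keras` package and either setting the `TF_USE_LEGACY_KERAS=1` environment variable before TensorFlow is imported or importing `tf_keras` directly. The helper names here are hypothetical, and forwarding the variable to Ray workers through a `runtime_env` is an assumption about how one could apply it with `TensorflowTrainer`, not something confirmed in this thread.

```python
import os
from typing import Dict, MutableMapping, Optional


def enable_legacy_keras(env: Optional[MutableMapping[str, str]] = None) -> None:
    """Set TF_USE_LEGACY_KERAS=1 so that tf.keras resolves to tf_keras.

    This must run before `import tensorflow` in the same process.
    """
    target = os.environ if env is None else env
    target["TF_USE_LEGACY_KERAS"] = "1"


def legacy_keras_runtime_env() -> Dict[str, Dict[str, str]]:
    """Build a Ray runtime_env dict that forwards the variable to workers.

    Assumption: training workers are separate processes, so they need the
    variable in their own environment as well.
    """
    return {"env_vars": {"TF_USE_LEGACY_KERAS": "1"}}


if __name__ == "__main__":
    enable_legacy_keras()
    print(os.environ["TF_USE_LEGACY_KERAS"])
    print(legacy_keras_runtime_env())
```

With `tf_keras` installed, the dict could be passed as `ray.init(runtime_env=legacy_keras_runtime_env())` before constructing the trainer. Alternatively, calling `enable_legacy_keras()` at the top of the training function, before TensorFlow is imported there, may be enough.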
Thanks @justinvyu!
What happened + What you expected to happen
I was trying to run this example from the documentation; however, it results in an error. I've tested this on two different clusters: one CPU-only with GPU set to false, the other a cluster of GPUs.
Tensorflow example here.
The error is
I expected the sample from the docs to run successfully.
Versions / Dependencies
ray==2.30.0
python==3.11
Reproduction script
No changes made to this code from the doc.
Issue Severity
Medium: It is a significant difficulty but I can work around it.