I searched the issues and found no similar issues.
Component
Transforms/universal/ededup
What happened + What you expected to happen
I am trying to run the Google Colab notebook demo_with_launcher.ipynb available on the website, but when I run the cell for exact deduplication I get the following error:
AttributeError: 'EdedupRayTransformConfiguration' object has no attribute 'create_transform_runtime'
The complete trace of the cell is below:
13:12:40 INFO - exact dedup params are {'doc_column': 'contents', 'doc_id_column': 'document_id', 'use_snapshot': False, 'snapshot_directory': None, 'hash_cpu': 0.5, 'num_hashes': 2}
INFO:ededup_transform_base:exact dedup params are {'doc_column': 'contents', 'doc_id_column': 'document_id', 'use_snapshot': False, 'snapshot_directory': None, 'hash_cpu': 0.5, 'num_hashes': 2}
13:12:40 INFO - pipeline id pipeline_id
INFO:data_processing.runtime.execution_configuration:pipeline id pipeline_id
13:12:40 INFO - code location None
INFO:data_processing.runtime.execution_configuration:code location None
13:12:40 INFO - number of workers 2 worker options {'num_cpus': 0.3, 'max_restarts': -1}
INFO:data_processing_ray.runtime.ray.execution_configuration:number of workers 2 worker options {'num_cpus': 0.3, 'max_restarts': -1}
13:12:40 INFO - actor creation delay 0
INFO:data_processing_ray.runtime.ray.execution_configuration:actor creation delay 0
13:12:40 INFO - job details {'job category': 'preprocessing', 'job name': 'ededup', 'job type': 'ray', 'job id': 'job_id'}
INFO:data_processing_ray.runtime.ray.execution_configuration:job details {'job category': 'preprocessing', 'job name': 'ededup', 'job type': 'ray', 'job id': 'job_id'}
13:12:40 INFO - data factory data_ is using local data access: input_folder - output/02_chunk_out output_folder - output/03_ededup_out
INFO:data_processing.data_access.data_access_factory_base976a1a51-da69-4193-8f5d-00c703487760:data factory data_ is using local data access: input_folder - output/02_chunk_out output_folder - output/03_ededup_out
13:12:40 INFO - data factory data_ max_files -1, n_sample -1
INFO:data_processing.data_access.data_access_factory_base976a1a51-da69-4193-8f5d-00c703487760:data factory data_ max_files -1, n_sample -1
13:12:40 INFO - data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.parquet'], files to checkpoint ['.parquet']
INFO:data_processing.data_access.data_access_factory_base976a1a51-da69-4193-8f5d-00c703487760:data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.parquet'], files to checkpoint ['.parquet']
13:12:40 INFO - Running locally
INFO:data_processing_ray.runtime.ray.transform_launcher:Running locally
2024-12-25 13:12:43,179 INFO worker.py:1744 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
13:12:45 INFO - Exception running ray remote orchestration
ray::orchestrate() (pid=8395, ip=172.28.0.12)
File "/usr/local/lib/python3.10/dist-packages/data_processing_ray/runtime/ray/transform_orchestrator.py", line 58, in orchestrate
runtime = runtime_config.create_transform_runtime()
AttributeError: 'EdedupRayTransformConfiguration' object has no attribute 'create_transform_runtime'
INFO:data_processing_ray.runtime.ray.transform_launcher:Exception running ray remote orchestration
ray::orchestrate() (pid=8395, ip=172.28.0.12)
File "/usr/local/lib/python3.10/dist-packages/data_processing_ray/runtime/ray/transform_orchestrator.py", line 58, in orchestrate
runtime = runtime_config.create_transform_runtime()
AttributeError: 'EdedupRayTransformConfiguration' object has no attribute 'create_transform_runtime'
13:12:45 INFO - Completed execution in 0.084 min, execution result 1
INFO:data_processing_ray.runtime.ray.transform_launcher:Completed execution in 0.084 min, execution result 1
Reproduction script
https://colab.research.google.com/drive/1Z9Y_aOmTrIc3Y7-Rn75bDEy-kvLvk29F?usp=sharing
Anything else
No response
OS
Other
Python
3.10.x
Are you willing to submit a PR?