[Bug] #902

Open
1 of 2 tasks
usmanxia opened this issue Dec 25, 2024 · 1 comment
Comments

usmanxia commented Dec 25, 2024

Search before asking

  • I searched the issues and found no similar issues.

Component

Transforms/universal/ededup

What happened + What you expected to happen

I am trying to run the Google Colab notebook demo_with_launcher.ipynb available on the website, but when I run the cell for exact deduplication I get the following error:

AttributeError: 'EdedupRayTransformConfiguration' object has no attribute 'create_transform_runtime'

The complete trace of the cell is shown below:

13:12:40 INFO - exact dedup params are {'doc_column': 'contents', 'doc_id_column': 'document_id', 'use_snapshot': False, 'snapshot_directory': None, 'hash_cpu': 0.5, 'num_hashes': 2}
INFO:ededup_transform_base:exact dedup params are {'doc_column': 'contents', 'doc_id_column': 'document_id', 'use_snapshot': False, 'snapshot_directory': None, 'hash_cpu': 0.5, 'num_hashes': 2}
13:12:40 INFO - pipeline id pipeline_id
INFO:data_processing.runtime.execution_configuration:pipeline id pipeline_id
13:12:40 INFO - code location None
INFO:data_processing.runtime.execution_configuration:code location None
13:12:40 INFO - number of workers 2 worker options {'num_cpus': 0.3, 'max_restarts': -1}
INFO:data_processing_ray.runtime.ray.execution_configuration:number of workers 2 worker options {'num_cpus': 0.3, 'max_restarts': -1}
13:12:40 INFO - actor creation delay 0
INFO:data_processing_ray.runtime.ray.execution_configuration:actor creation delay 0
13:12:40 INFO - job details {'job category': 'preprocessing', 'job name': 'ededup', 'job type': 'ray', 'job id': 'job_id'}
INFO:data_processing_ray.runtime.ray.execution_configuration:job details {'job category': 'preprocessing', 'job name': 'ededup', 'job type': 'ray', 'job id': 'job_id'}
13:12:40 INFO - data factory data_ is using local data access: input_folder - output/02_chunk_out output_folder - output/03_ededup_out
INFO:data_processing.data_access.data_access_factory_base976a1a51-da69-4193-8f5d-00c703487760:data factory data_ is using local data access: input_folder - output/02_chunk_out output_folder - output/03_ededup_out
13:12:40 INFO - data factory data_ max_files -1, n_sample -1
INFO:data_processing.data_access.data_access_factory_base976a1a51-da69-4193-8f5d-00c703487760:data factory data_ max_files -1, n_sample -1
13:12:40 INFO - data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.parquet'], files to checkpoint ['.parquet']
INFO:data_processing.data_access.data_access_factory_base976a1a51-da69-4193-8f5d-00c703487760:data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.parquet'], files to checkpoint ['.parquet']
13:12:40 INFO - Running locally
INFO:data_processing_ray.runtime.ray.transform_launcher:Running locally
2024-12-25 13:12:43,179 INFO worker.py:1744 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
13:12:45 INFO - Exception running ray remote orchestration
ray::orchestrate() (pid=8395, ip=172.28.0.12)
File "/usr/local/lib/python3.10/dist-packages/data_processing_ray/runtime/ray/transform_orchestrator.py", line 58, in orchestrate
runtime = runtime_config.create_transform_runtime()
AttributeError: 'EdedupRayTransformConfiguration' object has no attribute 'create_transform_runtime'
INFO:data_processing_ray.runtime.ray.transform_launcher:Exception running ray remote orchestration
ray::orchestrate() (pid=8395, ip=172.28.0.12)
File "/usr/local/lib/python3.10/dist-packages/data_processing_ray/runtime/ray/transform_orchestrator.py", line 58, in orchestrate
runtime = runtime_config.create_transform_runtime()
AttributeError: 'EdedupRayTransformConfiguration' object has no attribute 'create_transform_runtime'
13:12:45 INFO - Completed execution in 0.084 min, execution result 1
INFO:data_processing_ray.runtime.ray.transform_launcher:Completed execution in 0.084 min, execution result 1
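
One possible first diagnostic, assuming (this is not confirmed anywhere in this thread) that the error comes from a version mismatch between the installed runtime and transform packages: list the installed versions in the Colab session and check whether the configuration object exposes the method the orchestrator calls. The distribution-name prefixes below are assumptions about how the packages are named in the environment, and `installed_versions` and `has_method` are hypothetical helper names introduced only for this sketch.

```python
from importlib import metadata


def installed_versions(prefixes=("data-prep-toolkit", "data_prep_toolkit", "ray")):
    """Return {distribution name: version} for installed distributions
    whose name starts with one of the given prefixes (assumed names)."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith(tuple(prefixes)):
            found[name] = dist.version
    return found


def has_method(obj, method_name):
    """Return True if obj has a callable attribute called method_name."""
    return callable(getattr(obj, method_name, None))


if __name__ == "__main__":
    # Print the versions so they can be pasted into the issue.
    for name, version in sorted(installed_versions().items()):
        print(f"{name}=={version}")
```

In the notebook, calling `has_method(config, "create_transform_runtime")` on the `EdedupRayTransformConfiguration` instance passed to the launcher would show whether the installed transform package provides the method named in the traceback.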

Reproduction script

https://colab.research.google.com/drive/1Z9Y_aOmTrIc3Y7-Rn75bDEy-kvLvk29F?usp=sharing

Anything else

No response

OS

Other

Python

3.10.x

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@usmanxia usmanxia added the bug Something isn't working label Dec 25, 2024
@Bytes-Explorer
Collaborator

@touma-I please take a look when you are back from vacation, and feel free to assign this to, or get help from, other team members.
cc @agoyal26
