Notebook template for transforms to run on Google Colab #851
base: dev
Thanks, @Ryan-Gordon-314159. As we discussed, the open colab icon will work only when this PR is merged. Other than that, I have tested this by manually running it on Google Colab.
Please see the comments. We need to make this notebook tell a story, not just copy from other notebooks.
We need to change the notebook to tell a story that would be interesting to the end user. What do you think of this?
- Step 1: Enable Colab
- Step 2: Pip install
- Step 3: Identify interesting PDF files and use a web crawler to fetch them. If possible, identify PDF content that involves OCR
- Step 4: Configure the transform to generate MD, then run the transform
- Step 5: Show the output from the transform: not only the new content, but also any other fields the transform produced that may be of interest given the content
- Step 6: Repeat steps 4 and 5 with a different configuration of the transform, for example generating JSON instead of MD, extracting OCR, etc.
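To make the outline concrete, here is a rough sketch of how Step 1 and the "same run, different configuration" idea in Steps 4 and 6 might look in a notebook cell. It is only an illustration under assumptions: the `google.colab` check is a commonly used idiom rather than anything taken from this PR, and the function names and configuration keys (`input_folder`, `output_folder`, `contents_type`) are hypothetical placeholders, not the transform's actual parameters.

```python
import sys


def running_in_colab() -> bool:
    """Step 1: report whether the notebook is executing on Google Colab."""
    # Common idiom: Colab runtimes load the google.colab package at startup.
    return "google.colab" in sys.modules


def build_run_configs(input_folder: str, output_root: str) -> list[dict]:
    """Steps 4 and 6: build one configuration per output format to compare.

    All keys below are hypothetical placeholders for illustration only.
    """
    configs = []
    for fmt in ("markdown", "json"):
        configs.append(
            {
                "input_folder": input_folder,
                # Separate output folders keep the two runs easy to compare.
                "output_folder": f"{output_root}/{fmt}",
                "contents_type": fmt,
            }
        )
    return configs


if __name__ == "__main__":
    print("Running in Colab:", running_in_colab())
    for cfg in build_run_configs("input", "output"):
        print(cfg)
```

Writing each run to its own output folder would let Step 5 show the MD and JSON results side by side without one run overwriting the other.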
We want to consider getting rid of my_utils.py file, if possible.
Please do not download my_utils.py. I won't be able to merge this PR with this in it.
We should not be downloading utils.py, should we?
Please do not download my_utils.py. I won't be able to merge this PR with this in it.
Why are these changes needed?
This PR contains a notebook that can serve as a guide for porting transforms to run on Google Colab.
Related issue number (if any).
Issue #844