-
-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update bulk image conversion script to support multiple formats #226
base: dev
Are you sure you want to change the base?
Update bulk image conversion script to support multiple formats #226
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anushlinux you have understood and handled a lot of code flows(in local scripts) to OMRChecker. Great work on your first PR! 🎉
I've finished my review and I think we will need some discussion to make it even more extensible. Let's discuss async in the discord channel once you go through my comments.
|
||
# Usual pre-processing commands for speedups (useful during re-runs) | ||
# python3 scripts/local/convert_images.py -i inputs/ --replace [--filter-ext png,jpg] --output-ext jpg | ||
# python3 scripts/local/resize_images.py -i inputs/ -o outputs --max-width=1500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep all these comments as a guide for the contributors to figure out what needs to be done here
pages = convert_from_path(input_path) | ||
for i, page in enumerate(pages): | ||
output_path = os.path.join(output_dir, f"page_{i + 1}.jpg") | ||
page.save(output_path, 'JPEG') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also include file name here
pages = convert_from_path(input_path) | |
for i, page in enumerate(pages): | |
output_path = os.path.join(output_dir, f"page_{i + 1}.jpg") | |
page.save(output_path, 'JPEG') | |
pages = convert_from_path(input_path) | |
input_path = Path(PurePosixPath(input_path).as_posix()) | |
file_name = input_path.name | |
for i, page in enumerate(pages): | |
output_path = os.path.join(output_dir, f"{file_name}_page_{i + 1}.jpg") | |
print(f"Saving page: {output_path}") | |
page.save(output_path, 'JPEG') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also convert_image and convert_pdf_to_jpg should come from image_utils instead of bulk_ops_common.py.
I am thinking of keeping this file only for operations that are common across all sorts of bulk scripts (bulk resize, convert, watermark, etc)
""" | ||
pages = convert_from_path(input_path) | ||
for i, page in enumerate(pages): | ||
output_path = os.path.join(output_dir, f"page_{i + 1}.jpg") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feature extension idea here, see if you can implement -
Let's add a support to extract only the first page based on a flag (in that case the image's output_dir shouldn't be created and the single image should be output directly with same filename without the page_
prefix.
os.makedirs(output_dir, exist_ok=True) | ||
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf'] | ||
|
||
filepaths = walk_and_extract_files(input_dir, extensions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snake case
filepaths = walk_and_extract_files(input_dir, extensions) | |
file_paths = walk_and_extract_files(input_dir, extensions) |
Bulk converts images and PDFs to the specified format. | ||
""" | ||
os.makedirs(output_dir, exist_ok=True) | ||
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep pdf handling in a separate script (check other comments)
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif', 'pdf'] | |
extensions = ['png', 'jpg', 'jpeg', 'tiff', 'tif'] |
for input_path in filepaths: | ||
relative_path = os.path.relpath(os.path.dirname(input_path), input_dir) | ||
output_subdir = os.path.join(output_dir, relative_path) if not in_place else os.path.dirname(input_path) | ||
os.makedirs(output_subdir, exist_ok=True) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The in_place
flag handling can be a common bulk ops utility. I propose we define a function bulk_apply_on_files(func, input_dir, file_paths, in_place)
- The first arg
func
can beconvert_image
orconvert_pdf_to_image
- The bulk_apply_on_files applies this passed function with the same arguments
- if in_place is
True
and the conversion was successful, the input file gets removed (pdf or image) and the output files are added in the same directory - if in_place is
False
and the conversion was successful, the output files are added in a relative directory output_dir =f"{input_dir}/outputs/"
- If the conversion was unsuccessful, no file changes happen
- if in_place is
Let's discuss more about this in the discord group if needed.
|
||
converted_count = convert_images_in_tree(args) | ||
trigger_size = args["trigger_size"] | ||
converted_count = convert_images(args.filenames, args.format.upper()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can call the function bulk_apply_on_files(convert_image, input_dir, file_paths, in_place)
or something similar based on our discussion (check other comments below)
I have updated the conversion code and implemented my function to the convert_image_hook as you required
Pls review the code
(ps: sorry for deleting the previous pr i did it bimistake)