-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/ard #32688
base: master
Are you sure you want to change the base?
Bugfix/ard #32688
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work!
Is there a related issue?
I've made some suggestions that should also help to pass the CI tests.
You might like to review the yt-dlp extractor as updated in yt-dlp/yt-dlp#9037 and include any additional functionality from there,. or generally re-align the code.
|
||
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(((?:[^/]+/)?(?:player|live|video|serie|sendung)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+))|(((?P<sender>[a-zA-Z0-9\-]+)([/]))?(?P<name>[a-zA-Z0-9\-]+)))' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(((?:[^/]+/)?(?:player|live|video|serie|sendung)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+))|(((?P<sender>[a-zA-Z0-9\-]+)([/]))?(?P<name>[a-zA-Z0-9\-]+)))' | |
_VALID_URL = r'''(?x) | |
https://(?:(?:beta|www)\.)?ardmediathek\.de/ | |
(?: | |
(?:[^/#?]+/)?(?:player|live|video|serie|sendung)/(?:[^/#?]+/)* | |
(?P<id>Y3JpZDovL[a-zA-Z0-9]+)| | |
(?P<sender>[a-zA-Z0-9-]+)/)?(?P<name>[a-zA-Z0-9-]+) | |
) | |
''' |
def _match_id(cls, url): | ||
def _match_id(cls, url, group_name = 'id'): | ||
m = cls.__match_valid_url(url) | ||
assert m | ||
return compat_str(m.group('id')) | ||
return compat_str(m.group(group_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls revert these and use _match_valid_url()
in the extractor, as suggested above.
video_id = self._match_id(url) | ||
video_name = self._match_id(url, group_name='name') | ||
sender = self._match_id(url, group_name='sender') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
video_id = self._match_id(url) | |
video_name = self._match_id(url, group_name='name') | |
sender = self._match_id(url, group_name='sender') | |
video_id, video_name, sender = self._match_valid_url(url).group('id, 'name', 'sender') |
if '/serie/' in url or '/sendung/' in url: | ||
return self._real_extract_serie(video_id) | ||
elif 'none' != video_name.lower(): | ||
return self._real_extract_named_serie(video_name, sender if 'none' != sender.lower() else "ard") | ||
else: | ||
return self._real_extract_video(video_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL pattern matches either for video_id
or for video_name
, so that logic can be used. A non-matched group will be None
rather than 'none'
.
if '/serie/' in url or '/sendung/' in url: | |
return self._real_extract_serie(video_id) | |
elif 'none' != video_name.lower(): | |
return self._real_extract_named_serie(video_name, sender if 'none' != sender.lower() else "ard") | |
else: | |
return self._real_extract_video(video_id) | |
if video_id is not None: | |
return self._real_extract_serie(video_id) | |
return self._real_extract_named_serie(video_name, sender if sender is not None else 'ard') |
If video_name.lower() == 'none'
is an actual possibility, add a test for it and raise ExtractorError( ..., expected=True)
for that case. Or wrap the group in `(?!none...) in the pattern so that it doesn't match.
}).encode(), headers={ | ||
'Content-Type': 'application/json' | ||
})['data']['playerPage'] | ||
f'https://api.ardmediathek.de/page-gateway/pages/ard/item/{video_id}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Py2 (could use .format(video_id)
but this style is used elsewhere):
f'https://api.ardmediathek.de/page-gateway/pages/ard/item/{video_id}', | |
'https://api.ardmediathek.de/page-gateway/pages/ard/item/' + video_id, |
page_number = 0 | ||
page_size = 100 | ||
|
||
while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the loop condition explicit:
while True: | |
total = traverse_obj(widgets, ('pagination', 'totalElements', T(int))) or 0 | |
while page_number * page_size <= total: |
total = widgets['pagination']['totalElements'] | ||
if (page_number + 1) * page_size > total: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer needed:
total = widgets['pagination']['totalElements'] | |
if (page_number + 1) * page_size > total: | |
break |
widgets = self._download_json( | ||
f'https://api.ardmediathek.de/page-gateway/pages/{sender}/editorial/{video_id}', | ||
video_id, | ||
query={'pageSize': str(10), 'pageNumber': 0} | ||
)['widgets'] | ||
|
||
for widget in widgets: | ||
widget_id = widget['id'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
widgets = self._download_json( | |
f'https://api.ardmediathek.de/page-gateway/pages/{sender}/editorial/{video_id}', | |
video_id, | |
query={'pageSize': str(10), 'pageNumber': 0} | |
)['widgets'] | |
for widget in widgets: | |
widget_id = widget['id'] | |
widgets = self._download_json( | |
'https://api.ardmediathek.de/page-gateway/pages/{0}/editorial/{1}'.format(sender, video_id), | |
video_id, query={'pageSize': 10, 'pageNumber': 0} | |
) | |
for widget_id in traverse_obj(widgets, ('widgets', Ellipsis, 'id')): |
for teaser in page_data['teasers']: | ||
if 'EPISODE' == teaser.get('coreAssetType', None) and teaser['type'] not in ['poster'] and ':' not in teaser['id']: | ||
|
||
item = self._real_extract_video(teaser['id']) | ||
item['webpage_url'] = f"https://www.ardmediathek.de/video/{teaser['id']}" | ||
entries.append(item) | ||
|
||
total = page_data['pagination']['totalElements'] | ||
if (page_number + 1) * page_size > total: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And as before:
for teaser in page_data['teasers']: | |
if 'EPISODE' == teaser.get('coreAssetType', None) and teaser['type'] not in ['poster'] and ':' not in teaser['id']: | |
item = self._real_extract_video(teaser['id']) | |
item['webpage_url'] = f"https://www.ardmediathek.de/video/{teaser['id']}" | |
entries.append(item) | |
total = page_data['pagination']['totalElements'] | |
if (page_number + 1) * page_size > total: | |
break | |
entries.extend(traverse_obj(page_data, ( | |
'teasers', lambda _, v: 'EPISODE' == ['coreAssetType'] and v.get('type') != 'poster' and ':' not in v['id'], | |
'id', T(self._mk_teaser)))) | |
total = traverse_obj(page_data, ( | |
'pagination', 'totalElements', T(int))) or 0 |
while True: | ||
page_data = self._download_json( | ||
f'https://api.ardmediathek.de/page-gateway/widgets/{sender}/editorials/{widget_id}', | ||
video_id, | ||
query={'pageSize': page_size, 'pageNumber': page_number} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make loop condition explicit:
while True: | |
page_data = self._download_json( | |
f'https://api.ardmediathek.de/page-gateway/widgets/{sender}/editorials/{widget_id}', | |
video_id, | |
query={'pageSize': page_size, 'pageNumber': page_number} | |
total = 0 | |
while page_number * page_size <= total: | |
page_data = self._download_json( | |
'https://api.ardmediathek.de/page-gateway/widgets/{0}/editorials/{1}'.format(sender, widget_id), | |
video_id, query={'pageSize': page_size, 'pageNumber': page_number} |
Please follow the guide below
x
into all the boxes [ ] relevant to your pull request (like that [x])Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
Fixing the ARD extractor because API is changed