-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for shows.acast.com #32747
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work!
I've made a couple of suggestions to fix/improve the regexes (should make the CI tests pass).
For tests that don't work because the page is unavailable, you can add 'skip': 'reason why unavailable',
to the test case dict with the info_dict
key. If that leaves no valid test page for one of the extraction tactics being tested, a test with a valid page that has the same structure has to be added, or it should be documented in comments that the structure is obsolete.
Co-authored-by: dirkf <[email protected]>
I committed your suggestions and changed the tests as you asked. Let's see whether there are still any errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes, not all related to the current PR, make the download tests pass:
--- PR/youtube-dl/youtube_dl/extractor/acast.py
+++ new/youtube-dl/youtube_dl/extractor/acast.py
@@ -55,16 +55,16 @@
'''
_TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
- 'md5': 'f5598f3ad1e4776fed12ec1407153e4b',
+ 'md5': '9904ad5409bbc3b945efc1fe55e5f5d4',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
'title': '2. Raggarmordet - Röster ur det förflutna',
- 'description': 'md5:a992ae67f4d98f1c0141598f7bebbf67',
+ 'description': r're:(?s)Spår får kontakt med raggarna från 1979\. En titt i arkiven ger oss en gammal förundersökning som någon sparat trots alla bestämmelser om att den bör förstöras\. .{356}\.$', # md5:a992ae67f4d98f1c0141598f7bebbf67',
'timestamp': 1477346700,
'upload_date': '20161024',
'duration': 2766,
- 'creator': 'Anton Berg & Martin Johnson',
+ 'creator': 'Third Ear Studio',
'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna',
}
@@ -83,7 +83,7 @@
}]
def _real_extract(self, url):
- channel, display_id = re.match(self._VALID_URL, url).groups()
+ channel, display_id = re.match(self._VALID_URL, url).group('channel', 'id')
episode = self._call_api(
'%s/episodes/%s' % (channel, display_id),
display_id, {'showInfo': 'true'})
@@ -106,8 +106,7 @@
'info_dict': {
'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus',
- 'description': 'md5:c09ce28c91002ce4ffce71d6504abaae',
- 'skip': 'Error 404'
+ 'description': r're:Hosted by Nosheen Iqbal and Michael Safi, Today in Focus brings you closer to Guardian journalism\. .{994}$',
},
'playlist_mincount': 200,
}, {
@@ -120,7 +119,8 @@
'url': 'https://play.acast.com/s/getthebeltpod',
'info_dict': {
'id': '4caac858-3cda-44e6-b684-50d88b909c18',
- 'title': 'Get The Belt'
+ 'title': 'Get The Belt',
+ 'description': r'''re:(?s)British Podcast that you probably are not ready for! Join Myles with segments 'such as Myles Measures' that discusses relevant pop culture and society shenanigans, .{867}$''',
},
'playlist_mincount': 138,
}]
@@ -137,4 +137,4 @@
for episode in (show.get('episodes') or []):
entries.append(self._extract_episode(episode, show_info))
return self.playlist_result(
- entries, show.get('id'), show.get('title'), show.get('description'))
+ entries, show.get('id'), show.get('title'), clean_html(show.get('description')))
Example:
$ python test/test_download.py TestDownload.test_ACast
[acast] 2.raggarmordet-rosterurdetforflutna: Downloading JSON metadata
[info] Writing video description metadata as JSON to: test_ACast_2a92b283-1a75-4ad8-8396-499c641de0d9.info.json
[debug] Invoking downloader on u'https://sphinx.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna/media.mp3?tk=eyJ0ayI6ImRlZmF1bHQiLCJhZHMiOnRydWUsInNwb25zIjp0cnVlLCJzdGF0dXMiOiJwdWJsaWMifQ==&sig=4mClRAB0qh2d8ZixupPWgjm9YxaIJSPcak-ZATQ-H4g'
[download] Destination: test_ACast_2a92b283-1a75-4ad8-8396-499c641de0d9.mp3
[download] 100% of 10.00KiB in 00:00
.
----------------------------------------------------------------------
Ran 1 test in 3.362s
OK
$
For long text fields, the download test proposes a test value expressed as a digest using MD5. Such a test value is accurate and short, but doesn't reveal what changed if the test starts to fail, as happened here. In the diff I used pattern test values for |
Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
I tried to add support for Acast's url
shows.acast.com
. It's my first PR and I wasn't able to check my regex properly so I hope someone more experienced can review it properly. That being said, I tested with the new URLs and it works 🙂.While testing with the old URLs to ensure no regression, I noticed that some of them are not active anymore: https://www.acast.com/todayinfocus returns a 404.
I didn't remove it as I wasn't sure I should but I'm happy to do a bit of cleaning if someones greenlights it.
I also saw that they are using a new format with a dollar sign for embeds URL but (not being a regex expert) I failed to add it. Here's an example:
https://embed.acast.com/$/611233d8767fdf0012f22cb6/611233e4988b6a001394b5cf?feed=true