[XHamster] Overhaul existing extractors and add playlist extractors #32579
base: master
* include domains listed as trusted in page, aliased to xhamster.com
* excluding domains that redirect to xhamster (eg xhday.com)

* re-factor extraction code
* use traverse_obj()

…count tested
* eg not when `playlist_count` is specified
* avoid `playlist_mincount` if a `lambda` test may test the count

* re-factor existing playlist extraction
  - for a URL with specified page, extract that page only with `(p{n})` appended to title
  - otherwise follow next page continuations with `(all)` appended
* add XHamsterCreatorIE for Creator/Pornstar/Celebrity pages
* add XHamsterCategoryIE for Category pages
* add XHamsterSearchIE for search pages with search term as base title
* add XHamsterSearchKeyIE to support search with xhsearch[n]: pseudo-URL scheme
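The title convention in the last commit can be sketched as follows. This is only an illustration: the function name `qualified_playlist_title` and its parameters are hypothetical and are not taken from the PR's code. It assumes that qualifiers and the page marker are joined with commas inside one pair of parentheses, as in the `hawaiian (fps=60,all)` example from the description.

```python
def qualified_playlist_title(base, page=None, qualifiers=()):
    """Hypothetical helper: append qualifiers and a page marker to a base title.

    `page` is the page number given in the URL, or None when no page was
    specified and all pages are followed.
    """
    parts = list(qualifiers)
    # A single requested page is marked p{n}; otherwise all pages are extracted
    parts.append('p%d' % page if page is not None else 'all')
    return '%s (%s)' % (base, ','.join(parts))


print(qualified_playlist_title('hawaiian', qualifiers=['fps=60']))
print(qualified_playlist_title('hawaiian', page=2))
```

Keeping the marker in the title means two extractions of the same playlist (one page versus all pages) can be told apart without changing the ID, which is the policy question raised in the discussion.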
Channel support is needed too.

One point to be considered there is a general policy issue. Where different versions and subsets of a playlist can be extracted, eg different sorts, 1 page vs all pages, various filters, should the playlist ID reflect these differences, or should that just be, say, in the title?

I'd also welcome comments on this decorator that I'm proposing to add:

    class classpropinit(classproperty):
        """ A Python fubar: parent class vars are not in scope when the
            `class suite` is evaluated, so disallowing `childvar = fn(parentvar)`.

            Instead, the parent class has to be mentioned redundantly and
            unmaintainably, since the current class isn't yet bound.

            This decorator evaluates a class method and assigns its result
            in place of the method.

            class child(parent):
                # before
                childvar = fn(parent.parentvar)
                # now
                @classpropinit
                def childvar(cls):
                    return fn(cls.parentvar)
                # or
                childvar = classpropinit(lambda cls: fn(cls.parentvar))
        """
        ...
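Since the body of the proposed class is elided above, here is one way the described behaviour could be obtained. It is a standalone sketch and not the PR's implementation: it does not subclass `classproperty`, and it relies on `__set_name__`, which requires Python 3.6+ (so it would not suit the Python 2 support that youtube-dl maintains). The names `Parent`, `Child` and `fn` are placeholders.

```python
class classpropinit(object):
    """Sketch: evaluate func(cls) once the class is bound and store the
    result on the class in place of the decorated method."""

    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        # Called by type.__new__ after the class body has been evaluated,
        # so `owner` exists and inherited attributes are visible through it
        setattr(owner, name, self.func(owner))


def fn(value):
    return value * 2


class Parent(object):
    parentvar = 21


class Child(Parent):
    @classpropinit
    def childvar(cls):
        return fn(cls.parentvar)

    # Equivalent lambda form
    othervar = classpropinit(lambda cls: fn(cls.parentvar) + 1)


print(Child.childvar, Child.othervar)
```

After class creation `Child.childvar` is a plain class attribute, so there is no descriptor overhead on later access.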
19cf05a to b2b622a
In my opinion, a generic version of the playlist should always be extracted. That would allow filtering after extraction using flags, and disambiguate between titles,

    @classpropinit()
    def func(cls):
        ...

Not as useful here, but generally considered good practice for consistency with

    @decorator(option=value)
    def func(...):
        ...
Thanks, so this use of

For playlist examples, consider the test URLs here. XH, like some other sites (YT less so), supports subset playlist URLs with additional path components and/or query parameters (XVideos also uses fragment tags). If such a URL is specified, that subset playlist, filtered and/or sorted as specified, must be what is wanted: then shouldn't the user be able to distinguish it using the playlist ID (and not just the title as implemented here)? Or else should the whole playlist be extracted regardless of the specific URL? This certainly wouldn't be right for search URLs.

Here are the test URLs for

Another example that should be added:

So, in the last case, the PR would currently return ID
The ID should be unique, ideally with minimal processing. Using the path for that should work and require no further code. It doesn't matter much, since the video ID is the important part. Extracting only the page and filters requested makes sense from a UX perspective as well, in my opinion. I am unsure whether changing the title that way is the best approach, but honestly I have no better idea for what else to do. It's probably fine; video title and ID matter more in this regard anyway.
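As a rough illustration of the "use the path" suggestion, the sketch below derives a playlist ID from a URL with the standard library only. The function name `playlist_id_from_url` and the example URL are hypothetical, and appending the query string is an assumption made here so that filtered variants also get distinct IDs; the comment above mentions only the path.

```python
from urllib.parse import urlsplit


def playlist_id_from_url(url):
    """Hypothetical helper: build a playlist ID from the URL path (plus query)."""
    parts = urlsplit(url)
    playlist_id = parts.path.strip('/')
    # Assumption: keep query parameters so sorted/filtered subsets differ in ID
    if parts.query:
        playlist_id += '?' + parts.query
    return playlist_id


print(playlist_id_from_url('https://example.com/categories/hawaiian/2?fps=60'))
```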
I should note that I cloned this PR and it still would not download, either when running from the folder directly or after building it into a wheel. It may have broken again.
Boilerplate: own code, new features+improvement
Description of your pull request and other information
This PR fixes and updates the existing XHamster[Embed,User] IEs and adds some new playlist extractors, including a pseudo-URL scheme `xhsearch...:` like `ytsearch...`.

Specifically:

* … `traverse_obj()`
* … `(p{n})` appended to title; otherwise next page continuations are followed with `(all)` appended; any additional qualifications are also added to the title (eg category search `hawaiian (fps=60,all)`)
* `xhsearch...` allows searching from the yt-dl command line, eg `youtube-dl "xhsearchall:no sex"`; if a result count is specified like `xhsearch15:...`, `(first 15)` is added to the search term as the title.

For testing the playlists, a previous performance enhancement that limited test playlist processing to the `playlist_mincount` if specified is now only applied if other playlist counts are not being tested.
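To make the `xhsearch[n]:` scheme concrete, here is a minimal parsing sketch. It is not the PR's code: `parse_search_key` is a hypothetical name, the regular expression is modelled on the prefix pattern that youtube-dl's search extractors use for `ytsearch`, and the handling of an empty prefix (one result) and of `all` (no limit, returned as `None`) are assumptions made for the example.

```python
import re

_SEARCH_KEY = 'xhsearch'
# prefix is empty, a positive integer, or "all"; everything after ":" is the query
_PSEUDO_URL_RE = re.compile(
    r'%s(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\s\S]+)' % _SEARCH_KEY)


def parse_search_key(pseudo_url):
    """Hypothetical helper: return (query, count, title) for a pseudo-URL."""
    m = _PSEUDO_URL_RE.match(pseudo_url)
    if m is None:
        raise ValueError('not a %s pseudo-URL: %r' % (_SEARCH_KEY, pseudo_url))
    prefix, query = m.group('prefix'), m.group('query')
    if prefix == '':
        count = 1        # assumption: bare key means a single result
    elif prefix == 'all':
        count = None     # assumption: None stands for "no limit"
    else:
        count = int(prefix)
    title = query
    if prefix.isdigit():
        # As described above: "(first n)" is added to the search term
        title = '%s (first %d)' % (query, count)
    return query, count, title


print(parse_search_key('xhsearch15:no sex'))
```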