Bing Result Rewrite #148
Comments
I've determined that the set of base64 characters assumed by Clean Links is incomplete. While RFC 4648 states base 64 encoding is |
On a separate note, the pre-existing base 64 encoding method is not base 64; it is base 32, because it does not also use |
All in all, for base64, you should use |
We’re matching base64 currently, otherwise the encoding for http and www wouldn’t start with
On the use of url-base64 instead of normal base64, that would make more sense I agree, but CleanLinks has been built to decode what websites use, not what websites should use. Since browser’s
It seems to be the case that Bing uses proper url-base64, in which case we need to:
That should be enough for the usual mechanisms to kick in. If you’re willing to write a PR I can make some time to look at it. Frankly, I don’t think we can assume that any arbitrary string containing |
Yes and no. Your regex statement relies on case-insensitivity instead of properly defining the limited set of characters the algorithm uses. It's slightly vague to rely on a regex flag, separate from the actual character list, to denote the character space. You should turn off insensitivity and simply include the characters. It might also be slightly faster if the backend routines don't need to attempt case-insensitive matching. As for performance, I'm only saying it might be slightly faster. Maybe a minuscule difference; I'm merely trying to be conscious of it.
Bing is using it above. It's also proper to use it according to RFC. It wouldn't hurt to simply have it. I don't know if you can use the
I've already attempted to write a rule, but my ability to make rules with
I don't think it is spurious. If anything, it's more proper to use correct encoding for file/url than not. |
We match strings starting with
Yes, so we can now support it. So far websites just percent-encoded the base64-encoded string including + and / just like they would any string passed into a URL.
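To illustrate the difference described here, this is a minimal sketch comparing the two ways a site can pass a base64-encoded URL as a parameter. The helper name `toBase64Url` is made up for this example and is not part of CleanLinks.

```javascript
// Hypothetical helper: standard base64 -> RFC 4648 base64url (no padding).
function toBase64Url(text) {
  return btoa(text)
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

const target = 'https://www.youtube.com/watch?v=XHztkFCzbTU';

// What most sites have done so far: standard base64, then percent-encoding,
// so "/" becomes %2F and "=" becomes %3D.
const percentEncoded = encodeURIComponent(btoa(target));

// What Bing appears to do: base64url, which needs no percent-encoding at all.
const urlSafe = toBase64Url(target);

console.log(percentEncoded);
console.log(urlSafe);
console.log(encodeURIComponent(urlSafe) === urlSafe); // true
```

The first form is undone by the usual percent-decoding step, after which plain base64 is visible; the second form never contains `+`, `/` or `%`, so it needs its own handling.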
It’s been a challenge writing something that works, but I’m even worse with interfaces. Not sure how to make it more user-friendly, especially with the limited resources I can spare at the moment. I think the wiki is a good read for starters, but it is still pretty limited and doesn’t include the latest features like search-replace. Also, it seems the search-replace only applies to the path part of the URL, and we’d be interested in modifying a parameter here.
Decoding random strings like |
Again, that's not what I'm talking about. Your regex statement (ignoring the flag) is b32. With the extra a-z brought in by the flag, it functions as b64. It's just poorly written. As for performance, I understand it is almost completely inconsequential; I merely thought to do the best job the first time around.
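To make the character-class point concrete, here is a small sketch. The three patterns are hypothetical stand-ins written for this example; they are not the actual CleanLinks expression, which is not quoted in this thread.

```javascript
// Hypothetical patterns for illustration only.
const withFlag = /^[A-Z0-9+\/]+=*$/i;     // upper-case class, widened by the i flag
const explicit = /^[A-Za-z0-9+\/]+=*$/;   // standard base64 alphabet spelled out
const explicitUrl = /^[A-Za-z0-9_-]+=*$/; // base64url alphabet (RFC 4648, section 5)

const standard = 'aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1YSHp0a0ZDemJUVQ==';
const urlSafe = 'aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g_dj1YSHp0a0ZDemJUVQ';

// Both spellings accept the same standard base64 string.
console.log(withFlag.test(standard), explicit.test(standard)); // true true

// "_" is outside the standard alphabet, so only the base64url class matches.
console.log(explicit.test(urlSafe));    // false
console.log(explicitUrl.test(urlSafe)); // true
```

Spelling out the class keeps the accepted alphabet visible in one place, and adding `-` and `_` becomes a one-character-class change rather than something that interacts with a flag.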
Oh, it's quite alright, I understand the whole 'I'm a programmer, not a writer' plea, I'm the same way. “Want documentation? Read the code!” Heh. I tried the wiki but...it was almost useless for me.
Oh, I thought you meant adding b64, b64url, b32, et cetera. My bad. |
Curious: does Clean Links get a final, normalized path to work with (for example, links written as HTTP already lowercased to http), or not? |
Both |
I meant, if someone wrote |
Follow-Up: I ended up removing the atob/btoa routines; they're not capable of handling RFC 4648 base64url unless you swap
Edit: If you swap, they decode correctly, and you can simply add the other two base64url characters to the regex. I went with (in testing, could be handled better with the whole Bing/YouTube thing):
In
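The code from this comment did not survive, so the following is only a sketch of the swap approach it describes, not the commenter's actual change. It assumes the swap means mapping `-` to `+` and `_` to `/` as defined by RFC 4648; `decodeBase64Url` is a hypothetical name.

```javascript
// Decode RFC 4648 base64url with atob() by mapping it back to the
// standard alphabet first. decodeBase64Url is a hypothetical helper.
function decodeBase64Url(input) {
  // Swap the two URL-safe characters for their standard counterparts.
  let b64 = input.replace(/-/g, '+').replace(/_/g, '/');
  // base64url usually omits padding; restore it to a multiple of 4.
  while (b64.length % 4 !== 0) b64 += '=';
  // atob() returns one character per byte; re-decode those bytes as UTF-8.
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(decodeBase64Url('aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g_dj1YSHp0a0ZDemJUVQ'));
// https://www.youtube.com/watch?v=XHztkFCzbTU
```

Input that is not valid base64 after the swap makes `atob()` throw, so a caller would want to wrap this in `try`/`catch` before treating the result as a link.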
Bing has now taken to using relative links: |
Bing has taken to obfuscating their search results.
Example:
https://www.bing.com/fd/ls/GLinkPing.aspx?IG=9AFD7A29FFBB46F1B9A81FF058C0640E&&ID=SERP,5206.1&url=https%3A%2F%2Fwww.bing.com%2Fck%2Fa%3F!%26%26p%3D0d3eae76a1129f5f677a93348d0d5d6ee2f5906c36c38d0cad86b467db7afa8aJmltdHM9MTY1MjkzNTY0NSZpZ3VpZD05YWZkN2EyOS1mZmJiLTQ2ZjEtYjlhOC0xZmYwNThjMDY0MGUmaW5zaWQ9NTIwNg%26ptn%3D3%26fclid%3Dc74e5d8f-d72e-11ec-a4f0-c181b1119cc1%26u%3Da1aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g_dj1YSHp0a0ZDemJUVQ%26ntb%3D1
Becomes:
https://www.bing.com/ck/a?!&&p=db4b3ebd2e9cc9b2ea00045df2468659404c3b0196457cff9450c3c8c5a7e73dJmltdHM9MTY1MjkxMTQ0MiZpZ3VpZD0yMGZhZDE5My0zNTMxLTRiNWEtOGFmOC1mYzdiODUyODRlOTEmaW5zaWQ9NTIyMA&ptn=3&fclid=6cea1929-d6f6-11ec-8509-a5dffd6b7157&u=a1aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g_dj1YSHp0a0ZDemJUVQ&ntb=1
Should be:
https://www.youtube.com/watch?v=XHztkFCzbTU
The section from halfway through the `p` variable to the end is base64-encoded: `&imts=1652911442&iguid=20fad193-3531-4b5a-8af8-fc7b85284e91&insid=5220`. They're stepping up their crap. Just strip the first two characters of the `u` parameter and the rest, `aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g_dj1YSHp0a0ZDemJUVQ`, is the URL base64-encoded. :)

Edit: I know it's a bit beyond the scope of Clean Links' typical parameter allow/block-listing, but I think it'd be helpful to evolve it?
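As a rough sketch of the unwrapping described above, assuming the two-character prefix rule holds for every `u` value and that targets are ASCII URLs: `unwrapBingLink` is a hypothetical function for illustration, not existing CleanLinks code.

```javascript
// Hypothetical helper: follow Bing's wrappers down to the real target.
// Returns null when the link does not look like one of the forms above.
function unwrapBingLink(href) {
  try {
    const params = new URL(href).searchParams;

    // Outer GLinkPing.aspx wrapper: the /ck/a link is percent-encoded in "url".
    const wrapped = params.get('url');
    if (wrapped) return unwrapBingLink(wrapped);

    // Inner /ck/a link: "u" is a two-character prefix plus base64url.
    const u = params.get('u');
    if (!u || u.length <= 2) return null;

    let b64 = u.slice(2).replace(/-/g, '+').replace(/_/g, '/');
    while (b64.length % 4 !== 0) b64 += '=';

    // Parsing the result rejects anything that is not an absolute URL.
    return new URL(atob(b64)).href;
  } catch (e) {
    return null; // unparsable link, invalid base64, or not a URL
  }
}

const inner = 'https://www.bing.com/ck/a?!&&ptn=3' +
  '&u=a1aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g_dj1YSHp0a0ZDemJUVQ&ntb=1';
console.log(unwrapBingLink(inner));
// https://www.youtube.com/watch?v=XHztkFCzbTU
```

Relative links would need to be resolved against the page URL before being passed in, since `new URL(href)` only accepts absolute URLs when no base is given.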