Add Yiddish language ruleset #336

Merged: 2 commits, merged Dec 25, 2023

2 changes: 1 addition & 1 deletion README.md
@@ -17,7 +17,7 @@ Developed by [Florian Eckerstorfer](https://florian.ec) in Vienna, Europe with t
## Features

- Removes all special characters from a string.
-- Provides custom replacements for Arabic, Austrian, Azerbaijani, Brazilian Portuguese, Bulgarian, Burmese, Chinese, Croatian, Czech, Esperanto, Estonian, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Italian, Latvian, Lithuanian, Macedonian, Norwegian, Polish, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian and Vietnamese special characters. Instead of removing these characters, Slugify approximates them (e.g., `ae` replaces `ä`).
+- Provides custom replacements for Arabic, Austrian, Azerbaijani, Brazilian Portuguese, Bulgarian, Burmese, Chinese, Croatian, Czech, Esperanto, Estonian, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Italian, Latvian, Lithuanian, Macedonian, Norwegian, Polish, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese and Yiddish special characters. Instead of removing these characters, Slugify approximates them (e.g., `ae` replaces `ä`).
- No external dependencies.
- PSR-4 compatible.
- Compatible with PHP >= 8.
52 changes: 52 additions & 0 deletions Resources/rules/yiddish.json
@@ -0,0 +1,52 @@
{
"יאַ": "ya",
"אַ": "a",
"אָ": "o",
"יאָ": "yo",
"א": "",
"בֿ": "v",
"ב": "b",
"ג": "g",
"ד": "d",
"ה": "h",
"װ": "v",
"וו": "v",
"יױ": "yoy",
"ױ": "oy",
"יוי": "yoy",
"וי": "oy",
"יו": "yu",
"ו": "u",
"ז": "z",
"ח": "kh",
"ט": "t",
"יײַ": "yay",
"ײַ": "ay",
"יי": "ey",
"ײ": "ey",
"יע": "ye",
"ייִ": "yi",
"יִ": "i",
"י": "i",
"כּ": "k",
"כ": "kh",
"ך": "kh",
"ל": "l",
"מ": "m",
"ם": "m",
"נ": "n",
"ן": "n",
"ס": "s",
"ע": "e",
"פּ": "p",
"פֿ": "f",
"פ": "ph",
"צ": "ts",
"ץ": "ts",
"ק": "k",
"ר": "r",
"שֹ": "s",
"ש": "sh",
"תּ": "t",
"ת": "s"
}
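Several keys in this ruleset overlap: `"אָ"` is a suffix of `"יאָ"`, and `"ו"`, `"יו"` and `"יוי"` share a prefix. The file does not always list the longer key first (`"אָ"` comes before `"יאָ"`), which is only safe if the rules are applied longest-match-first. The sketch below assumes Slugify applies its rules with PHP's `strtr()`, whose array form tries the longest key first and never rescans replaced text, and contrasts it with `str_replace()`, which applies the pairs one after another. The four rules are a subset of the JSON above, spelled out as `\u{...}` escapes (base letter followed by its vowel point).

```php
<?php
// Four rules from yiddish.json, spelled out as \u{...} escapes so the exact
// code points are visible (U+05D0 alef, U+05B8 kamats, U+05D9 yod).
// Keys keep their relative order from the JSON file: the short "אָ" rule
// comes before the longer "יאָ" rule.
$rules = [
    "\u{05D0}\u{05B8}"         => 'o',  // אָ
    "\u{05D9}\u{05D0}\u{05B8}" => 'yo', // יאָ
    "\u{05D0}"                 => '',   // א (silent alef)
    "\u{05D9}"                 => 'i',  // י
];

$word = "\u{05D9}\u{05D0}\u{05B8}"; // יאָ

// strtr() with an array tries the longest matching key first at each
// position and never rescans replaced text, so key order is irrelevant.
$viaStrtr = strtr($word, $rules);

// str_replace() with arrays runs each search/replace pair over the whole
// string in turn, so "אָ" fires first and "יאָ" can no longer match.
$viaStrReplace = str_replace(array_keys($rules), array_values($rules), $word);

echo $viaStrtr, PHP_EOL;      // yo
echo $viaStrReplace, PHP_EOL; // io
```

Under that assumption, the order of keys in `yiddish.json` does not change the result; with sequential replacement it would.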
53 changes: 53 additions & 0 deletions src/RuleProvider/DefaultRuleProvider.php
@@ -10896,6 +10896,59 @@ class DefaultRuleProvider implements RuleProviderInterface
'Ỷ' => 'Y',
'Ỹ' => 'Y',
),
'yiddish' =>
array (
'יאַ' => 'ya',
'אַ' => 'a',
'אָ' => 'o',
'יאָ' => 'yo',
'א' => '',
'בֿ' => 'v',
'ב' => 'b',
'ג' => 'g',
'ד' => 'd',
'ה' => 'h',
'װ' => 'v',
'וו' => 'v',
'יױ' => 'yoy',
'ױ' => 'oy',
'יוי' => 'yoy',
'וי' => 'oy',
'יו' => 'yu',
'ו' => 'u',
'ז' => 'z',
'ח' => 'kh',
'ט' => 't',
'יײַ' => 'yay',
'ײַ' => 'ay',
'יי' => 'ey',
'ײ' => 'ey',
'יע' => 'ye',
'ייִ' => 'yi',
'יִ' => 'i',
'י' => 'i',
'כּ' => 'k',
'כ' => 'kh',
'ך' => 'kh',
'ל' => 'l',
'מ' => 'm',
'ם' => 'm',
'נ' => 'n',
'ן' => 'n',
'ס' => 's',
'ע' => 'e',
'פּ' => 'p',
'פֿ' => 'f',
'פ' => 'ph',
'צ' => 'ts',
'ץ' => 'ts',
'ק' => 'k',
'ר' => 'r',
'שֹ' => 's',
'ש' => 'sh',
'תּ' => 't',
'ת' => 's',
),
)/*INSERT_END*/;

/**
1 change: 1 addition & 0 deletions src/Slugify.php
@@ -53,6 +53,7 @@ class Slugify implements SlugifyInterface
// Languages are preferred if they appear later, list is ordered by number of
// websites in that language
// https://en.wikipedia.org/wiki/Languages_used_on_the_Internet#Content_languages_for_websites
'yiddish',

The whole list has an invalid order.

Contributor Author

Can you explain what you mean? What makes the order invalid?

Please look at the comment above the list in the code.

> list is ordered by number of websites in that language

Contributor Author

Yes, and it also says "Languages are preferred if they appear later", meaning the more popular languages should come last. As you can see, Russian, German and Polish appear near the bottom of the list here, while on the wiki page they are near the top (and in the reverse order).

You're right that the order in general doesn't match. For example, whoever put Romanian at the bottom of the list made a mistake: it is number 23 on the wiki page, well below German.

But I believe I placed Yiddish correctly at the top of the list, giving it very low priority, since it doesn't even appear on the list of most popular languages. You left this comment on a commit that only adds Yiddish to the list, hence my confusion. Are you saying that Yiddish was placed incorrectly, or just that the list as a whole is not in the stated order?

Exactly, the whole list is out of order.

'armenian',
'azerbaijani',
'burmese',
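The code comment says languages that appear later in the list are preferred. A minimal sketch of why position matters, assuming Slugify activates each listed ruleset in order and merges it into its rule table with `array_merge()`: for string keys, `array_merge()` keeps the value from the later array, so a ruleset placed later overrides an earlier one on any shared key. The two rulesets below are hypothetical and exist only to show the override; under this assumption, placing `'yiddish'` first gives its rules the lowest precedence.

```php
<?php
// Hypothetical rulesets in activation order (first = lowest priority).
// The names and the conflicting 'ä' rule are illustrative only.
$activationOrder = [
    'low-priority'  => ['ä' => 'a'],
    'high-priority' => ['ä' => 'ae', 'ö' => 'oe'],
];

$rules = [];
foreach ($activationOrder as $ruleset) {
    // For string keys, array_merge() keeps the value from the later array,
    // so a ruleset activated later wins any conflict.
    $rules = array_merge($rules, $ruleset);
}

echo $rules['ä'], PHP_EOL;           // ae
echo strtr('Bär', $rules), PHP_EOL; // Baer
```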
1 change: 1 addition & 0 deletions tests/SlugifyTest.php
@@ -260,6 +260,7 @@ public function defaultRuleProvider()
[str_repeat('hi🇦🇹', 5000), substr(str_repeat('hi-', 5000), 0, -1)],
['Č Ć Ž Š Đ č ć ž š đ', 'c-c-z-s-d-c-c-z-s-d'],
['Ą Č Ę Ė Į Š Ų Ū Ž ą č ę ė į š ų ū ž', 'a-c-e-e-i-s-u-u-z-a-c-e-e-i-s-u-u-z'],
['יאַן אַ טאָן יאָ אי רבֿ גיב דו האַװ האַוו יױרן יוירן אַזױ אַזוי יום־כּיפּור חנוכּה יײַכל מײַן בלײך ניי יע ייִדיש פֿליִען צוך סם פ קץ תּורת־אמת', 'yan-a-ton-yo-i-rv-gib-du-hav-hav-yoyrn-yoyrn-azoy-azoy-yum-kipur-khnukh-yaykhl-mayn-bleykh-ney-ye-yidish-flien-tsukh-sm-ph-kts-turs-ms'],
];
}
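Each expected slug in the new test can be derived by hand from the ruleset. For example, the input fragment `יום־כּיפּור חנוכּה` corresponds to `yum-kipur-khnukh` in the expected output. The sketch below reproduces that fragment using only the Yiddish rules it needs, applied with `strtr()` (assumed to be how Slugify applies rules), followed by a hypothetical `sketchSlugify()` helper that collapses everything else, here the maqaf `־` and the space, into `-`. That separator step is a simplified stand-in, not Slugify's actual regular expression.

```php
<?php
// Rules needed for "יום־כּיפּור חנוכּה", taken from the Yiddish ruleset and
// written as \u{...} escapes (U+05BC = dagesh).
$rules = [
    "\u{05D9}\u{05D5}" => 'yu', // יו
    "\u{05D5}"         => 'u',  // ו
    "\u{05D9}"         => 'i',  // י
    "\u{05DD}"         => 'm',  // ם
    "\u{05DB}\u{05BC}" => 'k',  // כּ (with dagesh, wins over plain כ)
    "\u{05DB}"         => 'kh', // כ
    "\u{05E4}\u{05BC}" => 'p',  // פּ
    "\u{05E8}"         => 'r',  // ר
    "\u{05D7}"         => 'kh', // ח
    "\u{05E0}"         => 'n',  // נ
    "\u{05D4}"         => 'h',  // ה
];

// Hypothetical helper, not Slugify's code: transliterate with strtr(), then
// replace each run of bytes outside [a-z0-9] (maqaf U+05BE, spaces) with "-".
function sketchSlugify(string $text, array $rules): string
{
    $ascii = strtr($text, $rules);
    return trim(preg_replace('/[^a-z0-9]+/', '-', $ascii), '-');
}

$input = "\u{05D9}\u{05D5}\u{05DD}\u{05BE}\u{05DB}\u{05BC}\u{05D9}\u{05E4}\u{05BC}\u{05D5}\u{05E8} "
       . "\u{05D7}\u{05E0}\u{05D5}\u{05DB}\u{05BC}\u{05D4}";

echo sketchSlugify($input, $rules), PHP_EOL; // yum-kipur-khnukh
```

Note that `חנוכּה` only yields `khnukh` because the dagesh form `כּ` maps to `k`; with the plain `כ` rule it would become `khnukhh`.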
