-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Char property macro prototype 1 #41
Conversation
Bidi_Class uses names with spaces and hyphens, meaning that the naive $ident match will not work. This also cuts down on chaff, because there are no longer two matches in the macro or repeated information (other than that in the docs). The doc repetition I think is valid, as docs should be considered separately and just populating them with the display name would be redundant.
The display names come from Unicode Chapter 4 <http://www.unicode.org/versions/Unicode10.0.0/ch04.pdf> Table 4-4. General Category
-.- how did I miss my own tests... |
This is cool! Thanks for working on it, @CAD97. It was fast!
I think there are a couple of helper functions in
I think this belongs to a non-ucd component, like So, I was thinking that any helper code, meaning not depending on unicode/i18n data and algorithm, should go under
I think yes. For now, mainly because we usually need these for conformance tests.
I think there are a bunch of fields that need to be defined for each variant. That's why I initially suggested to have a named syntax, instead of an ordered syntax. And, having the names, many of the fields would be optional, and fall back to another one if not provided. For example, there are property values with equal abbr and long names. If we want to provide one function to match both (which I think we should), we only need to list these once. Leaving Also, maybe we can even auto-gen long names from variant names. There are a couple of small crates for this, and I think we can depend on them, or just implement/copy it from there under its own
I think optional fields would be a great thing to have here, because otherwise a lot of manual formatting/etc will be needed for each variant. If we keep the diff limited to implementing the macro and an integration test for it, we can land it and work on it while the rest of the code is evolving, and switch whenever we feel it's ready.
I want to work on these traits before we get the macro out, because, as you already mentioned, this sets the contract for the macro. What I have in mind at the moment is one simple Let me take a shot at this tomorrow and we talk on that PR. WDYT? |
You take a shot and see what you can get. The annotated members is probably a better design for the macro; I just lack the macro knowledge required for the munching required to allow the flexibility it implies. I agree that adding the formal I'll make the simple changes you've suggested for a fair comparison, but I can't wait to see what you come up with. Just keep this macro limitation in mind; I've required exactly one attribute before each enum variant because of it, but you could also choose to require a sigil (such as |
If it's possible to rewrite a bound The reason for taking in the long name form is to allow using it as an |
Actually, I remembered this morning that I meant So, I think we can put both I'll submit a PR today. |
Supplanted by #48 |
48: Char property macro 2.0 r=behnam Replaces #41. See #41 for earlier discussion. An example will show better than I can tell: ```rust char_property! { /// Represents the Unicode character /// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property, /// also known as the *bidirectional character type*. /// /// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types> /// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values> pub enum BidiClass { /// Any strong left-to-right character /// /// ***General Scope*** /// /// LRM, most alphabetic, syllabic, Han ideographs, /// non-European or non-Arabic digits, ... LeftToRight { abbr => L, long => Left_To_Right, display => "Left-to-Right", } /// Any strong right-to-left (non-Arabic-type) character /// /// ***General Scope*** /// /// RLM, Hebrew alphabet, and related punctuation RightToLeft { abbr => R, long => Right_To_Left, display => "Right-to-Left", } /// Any strong right-to-left (Arabic-type) character /// /// ***General Scope*** /// /// ALM, Arabic, Thaana, and Syriac alphabets, /// most punctuation specific to those scripts, ... ArabicLetter { abbr => AL, long => Arabic_Letter, display => "Right-to-Left Arabic", } } } /// Abbreviated name bindings for the `BidiClass` property pub mod abbr_names for abbr; /// Name bindings for the `BidiClass` property as they appear in Unicode documentation pub mod long_names for long; ``` expands to: ```rust /// Represents the Unicode character /// [*Bidi_Class*](http://www.unicode.org/reports/tr44/#Bidi_Class) property, /// also known as the *bidirectional character type*. /// /// * <http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types> /// * <http://www.unicode.org/reports/tr44/#Bidi_Class_Values> #[allow(bad_style)] #[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)] pub enum BidiClass { /// Any strong left-to-right character LeftToRight, /// Any strong right-to-left (non-Arabic-type) character RightToLeft, /// Any strong right-to-left (Arabic-type) character ArabicLetter, } /// Abbreviated name bindings for the `BidiClass` property #[allow(bad_style)] pub mod abbr_names { pub use super::BidiClass::LeftToRight as L; pub use super::BidiClass::RightToLeft as R; pub use super::BidiClass::ArabicLetter as AL; } /// Name bindings for the `BidiClass` property as they appear in Unicode documentation #[allow(bad_style)] pub mod long_names { pub use super::BidiClass::LeftToRight as Left_To_Right; pub use super::BidiClass::RightToLeft as Right_To_Left; pub use super::BidiClass::ArabicLetter as Arabic_Letter; } #[allow(bad_style)] #[allow(unreachable_patterns)] impl ::std::str::FromStr for BidiClass { type Err = (); fn from_str(s: &str) -> Result<Self, Self::Err> { match s { "LeftToRight" => Ok(BidiClass::LeftToRight), "RightToLeft" => Ok(BidiClass::RightToLeft), "ArabicLetter" => Ok(BidiClass::ArabicLetter), "L" => Ok(BidiClass::LeftToRight), "R" => Ok(BidiClass::RightToLeft), "AL" => Ok(BidiClass::ArabicLetter), "Left_To_Right" => Ok(BidiClass::LeftToRight), "Right_To_Left" => Ok(BidiClass::RightToLeft), "Arabic_Letter" => Ok(BidiClass::ArabicLetter), _ => Err(()), } } } #[allow(bad_style)] #[allow(unreachable_patterns)] impl ::std::fmt::Display for BidiClass { fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result { match *self { BidiClass::LeftToRight => write!(f, "{}", "Left-to-Right"), BidiClass::RightToLeft => write!(f, "{}", "Right-to-Left"), BidiClass::ArabicLetter => write!(f, "{}", "Right-to-Left Arabic"), BidiClass::LeftToRight => write!(f, "{}", "Left_To_Right".replace('_', " ")), BidiClass::RightToLeft => write!(f, "{}", "Right_To_Left".replace('_', " ")), BidiClass::ArabicLetter => write!(f, "{}", "Arabic_Letter".replace('_', " ")), _ => { write!( f, "{}", match *self { BidiClass::LeftToRight => "L", BidiClass::RightToLeft => "R", BidiClass::ArabicLetter => "AL", BidiClass::LeftToRight => "LeftToRight", BidiClass::RightToLeft => "RightToLeft", BidiClass::ArabicLetter => "ArabicLetter", } ) } } } } #[allow(bad_style)] impl ::char_property::EnumeratedCharProperty for BidiClass { fn abbr_name(&self) -> &'static str { match *self { BidiClass::LeftToRight => "L", BidiClass::RightToLeft => "R", BidiClass::ArabicLetter => "AL", } } fn all_values() -> &'static [BidiClass] { const VALUES: &[BidiClass] = &[ BidiClass::LeftToRight, BidiClass::RightToLeft, BidiClass::ArabicLetter, ]; VALUES } } ``` All three of the `abbr`, `long`, and `display` properties of the enum are optional, and have sane fallbacks: `abbr_name` and `long_name` return `None` if unspecified, and `fmt::Display` will check, in order, for `display`, `long_name`, `abbr_name`, and the variant name until it finds one to use (stringified, of course). `FromStr` is defined, matching against any of the provided `abbr`, `long`, and variant name. <hr /> Important notes: - <strike>The current format uses associated consts, so it works on beta but won't work on stable until 1.20 is stable.</strike> - Consts have a slightly different meaning than `pub use` -- `pub use` aliases the type where `const` is a new object and if used in pattern matching is a `==` call and not a pattern match. - For this reason I'm actually slightly leaning towards using `pub use` even once associated consts land; they're compartmentalized (so `use Property::*` doesn't pull in 3x as many symbols as there are variants). After using the const based aliasing for a little bit, I'm inclined to like the current solution of `unic::ucd::bidi::BidiClass::*` + `unic::ucd::bidi::bidi_class::abbr_names::*`. These really should be a `pub use` and not a `const`. - Note that I still think `const` are the way to go for cases like `Canonical_Combining_Class`, though. - <strike>The current syntax could easily be adapted to use modules instead of associated consts, but was written with the associated consts so we could get a feel of how it would look with them.</strike> - The zero-or-more meta match before a enum variant conflicts with the ident match before 1.20. See rust-lang/rust#42913, rust-lang/rust#24189 - There only tests of the macro are rather thin and could be expanded. - It's a macro, so the response when you stick stuff not matching the expected pattern is cryptic at best. - The `CharProperty` trait is pretty much the lowest common denominator. It's a starting point, and we can iterate from there. - How and where do we want to make `CharProperty` a externally visible trait? Currently having it in namespace is the only way to access `abbr_name` and `long_name`. - <strike>Earlier discussion suggested putting these into `unic::utils::char_property`. Moving it would be simple, but for now it's living in the root of `unic-utils`</strike> - <strike>The crate `unic-utils` is currently in the workspace by virtue of being a dependency of `unic`, but is not in any way visible a crate depending on `unic`.</strike> - <strike>Documentation doesn't exist.</strike>
This PR provides a macro to somewhat automate the implementation of Unicode
Catalog
andEnumeration
property types (and uses it in ucd/bidi and ucd/category).I haven't touched ucd/normal because I have yet to do the necessary reading about the composition/decomposition specification in Unicode that is a prerequisite to me trusting myself to make changes in it.
Open questions:
unic-ucd-core
? I'm starting to think so; every UCD crate will almost certainly want to use the macro if they expose a Catalog or Enumeration property. And since it's a macro, they just wouldn't#[macro_use]
on their import (which they need forUnicodeVersion
).try_from_name(&str) -> Option<$name>
(or similar) fn? To do so would entail adding theLong_Name
form to the macro definition. (It's just an$ident
and syntax bikeshedding away from being added.)UnicodeProperty
trait. It's a seperate issue to this macro, but related. SinceNumber
properties exist, I'm thinking the only across-all-properties functionality we can promise is anof(ch: char) -> Self
. Should we also have aUnicodePropertyWithAbbreviatedName
(horrible example name) that exposesabbr_name(&self)
andname(&self)
? In any case, since they're all constructed the same way with the same methods they should probably be promised in a trait, even if they remain intrinsic methods.