Useless Factor: oop

Showing posts with label oop. Show all posts

Thursday, March 4, 2010

A protocol for sets

It's bothered me for some time that Factor has a protocol (an informal abstract class) for sequences and associative mappings, but nothing for sets. In core and basis, there are three set implementations: you can use a hashtable as a set, a bit array, or a plain old sequence. Different words are used to manipulate these, and each set type has a rather incomplete set of operations.

In a branch in my repository, I've fixed this omission. Now all three are under a common protocol, and more can easily be added. If you're interested in reading about the details of the protocol, you can pull from this repository, bootstrap and read the documentation included.

There are a number of benefits to this.

All sets now support all operations using names that correspond to the intuitive meanings of the operations
It is much easier to change set representations within a piece of code: just change the code that initializes the set
It's easier to implement new types of sets because you get many operations 'for free' and a nice common API.

The design I went with had a few conflicting goals:

The protocol should be clean and simple, and therefore easy to implement
The new protocol should be mostly compatible with the existing sets vocabulary.
There should be no performance overhead (besides method dispatch) for using this protocol

Each of these had to be sacrificed a little. For performance, I had to make the protocol include many different operations, since different set implementations might have a way to override them for efficiency. To keep things simple and easy to implement, I did this only for operations that were currently in use in the Factor code base, and I included default methods for as many operations as I could. For compatibility, I made unordered sequences act as sets in a way that's very similar to the old sets vocabulary, and the major operations are generalizations of the operations on sequences.

There is not total backwards compatibility, though. If that had been a design requirement (say, if this were a change made on Factor 1.0), the design would be much cruftier. One change is that the word prune is gone, subsumed by members, which generates a sequence of the members of a set. Given a sequence, this gives a sequence with the same elements but only one copy of each.

A bigger change is that hashtables are no longer meant to be used as sets. In their place is a new data structure, the hash set. Hash sets have literal syntax like HS{ 1 2 3 }, similar to other collections. In their current implementation, they use a hashtable underneath, but in a future implementation a more memory-efficient construction may be used. Hash sets implement set operations and hashtables do not. Previously, words like conjoin, conjoin-at and key? were used to query hashtables as sets, but now these are subsumed by the set words adjoin, adjoin-at and in?. There is a lot of code that uses hashtables as sets, and it's not easy to sort out the set uses of hashtables from the non-set uses for someone who hasn't written the code. So for now, words to manipulate hashtables as sets are still present. conjoin and conjoin-at will be eliminated when all code in the Factor repository is updated to use hash sets instead.

It's somewhat to have a language evolution process where the language is not guaranteed to be compatible from one version to the next, as Factor is right now. There will continue to be incompatibilities until version 1.0 is released for the sake of clean organization of the language. There is a tradeoff here: incompatibilities make Factor harder to use now, and prevent adoption today, but the resulting system will be better-organized and easier to use as a result. Factor would probably be in much worse shape today if a policy of backwards compatibility had been adopted a few years ago, and it's a little too soon to start now in freezing the language.

Update: By the way, I forgot to mention in the original post: many other high-level programming languages don't have such a nice generic collection of data structures. Factor's isn't complete, but in many ways it's more advanced than many other popular programming languages. Java, C++ and C# are languages with generic data structures libraries, but using the data structures is more verbose due to certain missing language features, like the lack of syntax for literal hashtables. Popular scripting languages like Python, Ruby and Perl tend to privilege hashtables and resizable arrays over other data structures. It's possible to create data types that look just like arrays or hashtables using Python or Ruby in terms of their interface. But the higher methods for these data structures will only work for the builtin types and there's no way to make them work for your data type but to reimplement them. Functional languages like Scheme and Haskell have libraries for lists and arrays, but the interfaces are different for different data types. Even though Haskell has type classes, both of these languages' standard libraries are written with lists in mind for the most common operations. Factor used to resemble scripting languages in its support of data structures, but experience in writing large programs with these data structures has led to a better thought-out, more object-oriented model.

Tuesday, December 4, 2007

extra/delegation: an annotated vocabulary

In order to allow for better compiler optimizations change-slot operations and other improvements, Slava plans to add tuple inheritance to the Factor object system. The exact form of this isn't completely finalized, but it looks like delegation in its current form isn't going to stick around much longer. I proposed a new kind of delegation mechanism, along with other changes to the Factor object system. In the end, a more conservative change was chosen in the form of mixins.

I described the Factor object system before, but I'll review delegation here. Basically, when a generic word dispatches on the tuple, one of two things can happen: either a method is found, and then called, or no method is found. If no method is found, the generic word checks the delegate slot. If that slot is filled (with something that's not f), the method will be re-called on that delegate in place of the original object. For more information on this, see my previous blog post

There are a few problems with delegates. One problem is the slicing problem: when relying on delegation to extend behavior and a method is delegated, the method call knows only about the delegate, not the original object. Sometimes, this is the desired behavior, but other times it can lead to subtle bugs. Another problem is performance. When looking up the value of a direct tuple slot, the required operation is a simple method dispatch and pointer arithmetic. If the compiler knows the tuple class of an object before runtime, it can even eliminate the dispatch step. However, when looking at a delegate slot, there is an additional pointer indirection. Crucially, with the current system, there is no way to avoid the dispatch. Tuple inheritance would require fewer levels of indirection and give more opportunity for optimization. With delegation, method dispatch takes linear time with respect to the length of the delegate chain, and this is bad. A very subtle problem with delegation is that everything is delegated at once, when often, we only need to delegate a couple things.

Yet delegation is still sometimes useful, and there must be a way to fix the slicing problem without breaking everything*. I've written a vocabulary (vocab) called delegate which attempts to fix this. The premise is that this will constitute delegation after it is removed from the core. Central to the vocab is the concept of a group, or a word which signifies a number of methods. There are three builtin types of groups: tuple classes, generic words and protocols. There is a word called group-words which extracts the constituent words from a group:


GENERIC: group-words ( group -- words )

For a generic word, the group consists just of itself


M: generic group-words
    1array ;

For a tuple class, the group consists of all of the slot readers and writers, except for the delegate slot. (delegate and set-delegate aren't generic words, anyway)


M: tuple-class group-words
    "slots" word-prop 1 tail ! The first slot is the delegate
    ! 1 tail should be removed when the delegate slot is removed
    dup [ slot-spec-reader ] map
    swap [ slot-spec-writer ] map append ;

A protocol is an explicitly-defined list of generic words which form a group. The syntax PROTOCOL: protocol-name words... ; is used to define a protocol, and the words it contains are stored in the "protocol-words" word property.


: define-protocol ( wordlist protocol -- )
    swap { } like "protocol-words" set-word-prop ;

: PROTOCOL:
    CREATE dup reset-generic dup define-symbol
    parse-definition swap define-protocol ; parsing

PREDICATE: word protocol "protocol-words" word-prop ;

M: protocol group-words
    "protocol-words" word-prop ;

Here's an example of the defintion of the sequence protocol:


PROTOCOL: sequence-protocol
    clone clone-like like new new-resizable nth nth-unsafe
    set-nth set-nth-unsafe length immutable set-length lengthen ;

The advantage of defining delegation over these groups is that, usually, we don't need to delegate for everything. With this, multiple delegation over disjoint groups is possible. This is in use in the xml.data vocab, to let tags act like their names, their attributes assoc and their children sequence at the same time. It could also be used for multiple tuple inheritance.

One mechanism which replaces delegation is called consultation. Billy Tanksley has suggested that this is what Factor's delegation should be termed; in essence, consultation is the same as delegation, except that it's limited to a single group. Say you have some class foo which implements all generic words in the group bar. There is another class baz, which has a word baz>foo itself into into a foo, or getting a corresponding foo to which to delegate the methods. Then, we could define CONSULT: bar baz baz>foo ; in order to define the delegation. If bar contains the words qux and quux, then the following method definitions will be generated:


M: baz qux baz>foo qux ;
M: baz quux baz>foo quux ;

Here's the implementation:


: spin ( x y z -- z y x ) ! This stack shuffler makes things easier to think about
    swap rot ;

: define-consult-method ( word class quot -- )
    pick add <method> spin define-method ;

: define-consult ( class group quot -- )
    >r group-words r>
    swapd [ define-consult-method ] 2curry each ;

: CONSULT:
    scan-word scan-word parse-definition swapd define-consult ; parsing

So, this fixes one problem with delegation: that it happens for everything. The second problem, performance, can be solved with increased static typing, for example static typing of tuple slots. But this won't be an available feature of Factor for some time. Until then, if we're sure that a tuple slot is a specific class, we can use declare to let the compiler know. For example,


CONSULT: bar baz
    baz>foo { foo } declare ;

There's still one last problem, the slicing problem. A complete solution would involve heavy hackery into the Factor method dispatch system, allowing one object to mimic another. But a simpler solution is to just copy all of the method implementations from one class to another, for the generic words in a specific group. This can be unsafe in some situations, such as when the group is a tuple class, but it can also be useful. I called the system mimicry.


: define-mimic ( group mimicker mimicked -- )
    >r >r group-words r> r> [
        pick "methods" word-prop at [ 2drop ]
        [ method-def <method> spin define-method ] if*
    ] 2curry each ; 

: MIMIC:
    scan-word scan-word scan-word define-mimic ; parsing

So there you have it: in fewer than 40 lines, the basis for a replacement of the delegation system, at least partially fixing all three main problems. The code is in my git repository. There are still problems, of course: metadata about what mimics/consults what is not stored, and when you see a generic word or class, you'll see all of the mechanically-defined methods. It's amazing, though, how flexible Factor is to this kind of manipulation and metaprogramming.

* Actually, it's possible to re-implement delegation in Factor, completely eliminating the slicing problem, but at a slight performance cost for tuple slot lookups, and at the cost of breaking a lot of code. The thing is, if instances of tuple class a all delegate to tuple class b, then a instances will return f to b?. To get around this, semi-hacks like the is? word are used.

Friday, October 26, 2007

Factor's object system

In my FAQ, I wrote that Factor was object-oriented. That might have confused some of you: how is Factor object-oriented when code isn't organized in terms of classes like in Java? When there is no system for message passing*? When tuples can't inherit from each other?

Rationale

Originally, Factor didn't have an object system. It was said that it's not necessary; we can get by with just words, conditionals and lists. But as the library grew more complicated, it became clear that we needed a cleaner method of dispatch. This took the form of generic words, or words that can be overloaded by the class of one of their arguments. Initially, we only had predicate classes, and generic words were in fact no more than syntactic sugar for cond. But we also needed a good way to define data types, so that they wouldn't be confused with each other. Later, we realized the need for a system of 'inheritance' from tuples to tuples, flexible constructors, extensible unions and other things. Piece by piece, features were added to form a emerging coherent whole which could be used to organize some Factor code.

It's important to note that most words are not methods and not all data is stored in tuples. Groups of Factor code are put in vocabularies, not in classes. It's common to write a lot of code without using any object-oriented features (aside from generic words defined in libraries). Nevertheless, object orientation, when used appropriately, makes code much better organized.

The class hierarchy

In Factor, everything is an object. Well, of course everything is an object; everything is something! But I mean everything is an instance of the class object. Under that class, there are a number of built-in classes, representing basic things like tuple, vector, array, fixnum, float, bignum, ratio, quotation, curry, etc. You can test if an object is in a particular class with words of the scheme object?. To test if an object is a tuple, use the word tuple?.

On top of these builtin classes, there are a number of abstractions. For example, the class number is the union of all numeric classes. Under this is a numerical tower of subclasses, similar to Scheme. An example of a couple unions of this kind:


UNION: integer fixnum bignum ;
UNION: rational ratio integer;

Factor also supports extensible unions, called mixins. Whereas a union is closed, later code is allowed to add things to a mixin. The above code can be translated into mixin syntax as:


MIXIN: rational
MIXIN: integer
INSTANCE: rational ratio
INSTANCE: rational integer
INSTANCE: integer fixnum
INSTANCE: integer bignum

This is useful for things like sequences and assocs, which form mixin classes. Unions are inappropriate here, as new sequences and assocs can be made in the library.

Going in the other direction, classes can be narrowed down using predicate classes. A predicate class is a subclass of some other class consisting of instances which satisfy a conditional. For example, the natural numbers (defined as integers 0 or greater) could be defined as


PREDICATE: integer natural 0 >= ;

Generic words

So what use are all these classes? We can use predicates on them to test membership, but that's not all that useful. The centerpiece of the object system is generic words, or words which dispatch on the class of their arguments. Typically, only single dispatch is needed, so multiple dispatch is not in the core**. A generic word which dispatches on the top of the stack is defined with the parsing word GENERIC:. Methods, or implementations of the generic word, are written with the parsing word M:.Take the following useless example:


GENERIC: process ( thing -- processed )
M: string process
    "foo" append ;
M: integer process
    5 + ;

Here, there is a generic word process with two methods: one on strings (which appends "foo") and one on integers (which adds 5). This is equivalent to:


: process ( thing -- processed )
    {
        { [ dup string? ] [ "foo" append ] }
        { [ dup integer? ] [ 5 + ] }
    } cond ;

But the first version of process has the advantage that it is extensible later, in other files. Additionally, generic words are sensitive to the class hierarchy and will test things in an appropriate order, and dispatch is more optimized than with a simple cond expression.

There are other ways to dispatch besides GENERIC:. Other method combinations include MATH:, for words which dispatch on builtin math types for two arguments, HOOK:, to dispatch on the class of the value of a variable, and GENERIC#, which allows dispatch on items further down on the stack. You can also make your own method combinations.

Tuples, delegation and constructors

It's possible to construct your own data types using just arrays and predicate classes, but that can get awkward and ambiguous. To make new data structures, Factor offers a tuples, basically another name for structs or records. Here is an example of a tuple class called foo, with slots bar and baz, together with a constructor called <foo>:


TUPLE: foo bar baz ;
C: <foo> foo

The first line creates the tuple class, and the second line makes the constructor. (Slava described the details and rationale for the constructor system.) It is simple to use this. To make a new foo tuple, just get the values of bar and baz on the stack and call <foo. With that object, the bar slot can be read with the word foo-bar the baz slot can be read with the word foo-baz. To write values to those slots, use the words set-foo-bar and set-foo-baz. And, of course, to test if an object is a foo tuple, use the word foo?. All in all, TUPLE: is a wildly unhygenic macro, and we like it that way!

But wouldn't it be useful if we could let tuples inherit from each other? We've already seen that Factor allows for general subtypes and supertypes, but this does not extend to tuples. The way that tuples can be based off each other is through delegation: one tuple delegates to another. Each tuple has a slot, called the delegate (accessed with delegate and set-delegate). When a generic word dispatches on a tuple, but it doesn't have a method implementation for that tuple, it looks in the delegate slot for what to do next. If there is a delegate, then the generic word is again invoked on that delegate, with the hope that generic word itself has an implementation for that class. If not, it's on to the next delegate until there is no delegate anymore--the delegate is f.***

Instead of inheriting from a class, tuples delegate to an instance. That is, if I have my tuple TUPLE: foo bar baz and I want it to inherit from TUPLE: bing quux ;, all I have to do is make an instance of foo and an instance of bing, and set the delegate of the foo to the bing. Slot accessors like bing-quux and set-bing-quux are generic words implemented, by default, only on that one tuple class that they were made for. So, if you call bing-quux on a tuple of class foo that delegates to a bing, the foo instance will fail to respond and the generic word will be called again on the delegate, bing. This delegate will know what to do, and it'll return its quux value.

There's a subtle problem here with delegates. Look at the following code:


TUPLE: foo bar baz ;
TUPLE: bing quux ;
GENERIC: who
M: bing who how ;
GENERIC: how
M: bing how drop "bing" print ;
M: foo how drop "foo" print ;

You might expect that when who is called on a foo instance, "foo" is printed, but in fact, "bing" is printed. This is called the slicing problem: the object is "sliced" when using delegates, so only part of the object is represented in further method calls. On one hand, the behavior of M: bing who is more predictable, but on the other hand, information is lost. It's not a horrible thing, just something to watch out for.

Use cases

As I said before, a lot of Factor code has no need for object orientation. Going by my own experience, some complicated libraries, such as Unicode and inverse, don't yet use the object system at all. The XML-RPC vocabulary relies heavily on generic words to perform dispatch in turning Factor data into XML-RPC format, though no new classes are defined. The XML library, however, uses tuples to represent XML at various stages of processing, and dispatches on the class of these tuples to provide complex functionality. Several tuples are defined in the XML library to represent exceptions, though any object can be thrown as an exception.

In general, the object system should be used when it's useful and not used when it's not useful. That seems simple enough, but it is a radical departure from the philosophy of other languages where all procedures are methods. It's good to use whichever strategy you think will produce the most maintainable, clear code.

The future

For a while, I hesitated to write about the object system because it changes so frequently. Tuple constructors, for example, changed very recently to eliminate the default constructor and replace it with a more flexible system. Mixins are also a recent addition, yet they significantly change the way default methods are implemented. Soon, Slava plans to replace delegation with tuple inheritance.

Since the Factor object system is constructed completely in Factor, rather than being built in to the virtual machine, any changes to it can be done within Factor itself. This is a huge advantage. One change that I proposed could easily be written in Factor and run without any special care. Fundamental changes to the object system can even be interactively tested and debugged!

* Eduardo Cavazos has written a message passing object system in Factor, and he uses it for some of his code. Factor allows other object systems to be created, but with his system, not everything is an 'object' that can receive messages. Still, it's an interesting (and surprisingly terse) example of the flexibility of Factor, and of useful syntax extension. It is not widely used, however.

** I implemented double dispatch as a library called visitor (since it's implemented with the visitor pattern, yielding a simple lexicographic method order). Maybe in the future we'll have double dispatch in the Factor core, to better implement words like like, which end up approximating double dispatch using conditionals. Though it makes sense in theory, I can't imagine any use for triple dispatch in Factor.

*** As you can see, if you make an infinite (that is, recursive) delegate chain, method resolution may not halt. But why would you ever want to do that? Anyway, since we have predicate classes, it's already not guaranteed to halt!

Blogging is much more fun than writing essays about the history of South Africa.

Monday, July 30, 2007

How to make ad-hoc polymorphism less ad hoc in Factor

Note: Originally, I used the words type, class and instance where I now use class, group and implementation, respectively, but I changed this because it was too confusing. This proposal may sound sketchy, which is because I haven't implemented it yet. But I'll have a working draft in a few days.

A few days ago, Slava told me that he had plans to radically change the Factor object system. At first, this alarmed me. I'm afraid we might lose users if we keep changing the language so much, with no end in sight. It's amazing how quickly Factor code undergoes bitrot. One of the changes he suggested included getting rid of class names in slot accessors, which I think is both useless and somewhat obfuscating. But after some thought I realized there was a potential for great changes here.

A proposal: what if Factor had type classes, like Haskell? No, it's impossible to port classes directly from Haskell in their current form. But type classes form a good model to build a dynamically typed object system around. For the rest of this blog posting, I'll use the word "group" to refer to my extended notion of a type class. I call it this because it refers to a group of generic words, as well as a group of classes that implement them. I'll continue referring to Factor classes as classes. This may confuse Haskellers, but it makes more sense, since it fits in with current Factor concepts. A good introduction to type classes is Philip Wadler's original paper, "How to make ad-hoc polymorphism less ad hoc." (Note that I'm not doing any of the dictionary translation part, really just the first part. Also, this does not include multiparameter type classes, but may be extended to that by Factor 2.0 to support homogeneous sequences when Factor is optionally statically typed. Really, the link to Haskell is more metaphorical than anything else. You can also think of groups as abstract classes.)

The sequence protocol, one of the first big uses of Factor's object system, is a good example of what's wrong with what we have right now. Here's some code from core/sequences/sequences.factor:


GENERIC: length ( seq -- n ) flushable
GENERIC: set-length ( n seq -- )
GENERIC: nth ( n seq -- elt ) flushable
GENERIC: set-nth ( elt n seq -- )
GENERIC: new ( len seq -- newseq ) flushable
GENERIC: new-resizable ( len seq -- newseq ) flushable
GENERIC: like ( seq prototype -- newseq ) flushable

<PRIVATE
GENERIC: nth-unsafe ( n seq -- elt ) flushable
GENERIC: set-nth-unsafe ( elt n seq -- )
PRIVATE>

Those are just a bunch of generic words, all detached from one another. The only thing connecting them is documentation, and the understanding that all sequences should implement all of them. (Or at least enough of them so that it can function; some of these methods have implementations on object and some don't.) Some methods have a convenient property that they can defer missing methods to the delegate of a tuple, but this only works if no default implementation on object exists.

The first thing to do is to provide a structure for these generic words. In documentation, they form a protocol, but I'll use them to form a group, called sequence. (Note that this code won't work, if you try to run it. It's just an idea. Hopefully I'll have an implementation in a month or so, but I have a lot to do right now.)


GROUP: sequence
    GENERIC: length ( seq -- n ) flushable
    GENERIC: nth ( n seq -- elt ) flushable
    <PRIVATE
        GENERIC: nth-unsafe ( n seq -- elt ) flushable
    PRIVATE>
    GENERIC: new ( len seq -- newseq ) flushable
    GENERIC: new-resizable ( len seq -- newseq ) flushable
    GENERIC: like ( seq prototype -- newseq ) flushable
;

GROUP: mutable FROM: sequence
    GENERIC: set-nth ( elt n seq -- )
    <PRIVATE
        GENERIC: set-nth-unsafe ( elt n seq -- )
    PRIVATE>
;

GROUP: resizable FROM: mutable
    GENERIC: set-length ( n seq -- )
    GENERIC: underlying ( seq -- underlying ) ! Note: defined in core/growable
    GENERIC: set-underlying ( underlying seq -- ) ! ditto
;

Immediately, you can see that I've added some structure to this. The generic words are each inside a GROUP: declaration. Some of the group declarations have FROM: following the group name, meaning that every implementation of that group must also be an instance of the group following the colon. (Multiple classes could be placed in brackets.) I've separated this out into three type classes, because that is the actual structure of the sequence protocol: all sequences are indexable, and some of those are mutable, and some of those are growable.

In each of the above examples, GENERIC: is used, but it's possible to use other method combinations like GENERIC#, MATH: and HOOK:. This proposal doesn't imply any significant change in dispatch mechanisms, except for the removal of delegates as they are now.

Each group forms a class which is the (initially empty) union of the types implementing the group. At first, this sounds useless, but it allows two things we haven't had in Factor that Haskellers take for granted: the ability to do proper default method implementations and create type class instances of forms like instance Foo a => Bar a where ... (this isn't Haskell98, but it's still useful). Here's how some default methods look:


M: sequence nth bounds-check nth-unsafe ;
M: mutable set-nth bounds-check set-nth-unsafe ;
M: resizable set-nth ensure set-nth-unsafe ;

Currently, each of these definitions is repeated multiple times throughout the Factor code base. Definition on object, the current mechanism for default methods, is avoided to make room for delegation. Other default methods, which are currently implemented on object, could be


M: sequence new-resizable drop  ;
M: sequence new drop f  ;
M: sequence like drop ;

It is good that these are implemented on sequence rather than object because that is a more precise way of indicating what the code is actually doing. I always found it confusing that sequence methods would be implemented on object when they would clearly fail on many types of objects.

So, we have this group. How do we make an implementation of it, or a class which belongs to it? If we just started implementing methods without any declaration, it'd be hard to decide whether a type belongs to a particular class. So, here's an example of how the syntax could look, showing the instance of sequences on integers. Note that integers belong only to the class sequence, not mutable or resizable.


JOIN: sequence integer
M: integer length ;
M: integer nth-unsafe drop ;

The JOIN: declaration ensures that integers will be included in the sequence group. (They're joining the group, hence the name.) We may or may not want an error to be triggered if someone attempts to implement the sequence methods without that declaration. I'm not sure. It might also be an error-triggering condition if an JOIN: declaration exists in a file without a complete implementation of the group's methods.

Slava suggested that we may want a behavior, or set of implementations of methods, without adding any slots or new generic words. That could be done easily with this system:


GROUP: foo FROM: sequence ;
M: foo nth-unsafe ... ;

Now, here's an innovation where we go beyond Haskell, sorta. It could, potentially, be included back into Haskell; there's nothing stopping them, but it might not be as useful there. (I should note that this so far doesn't incorporate nearly all of the functionality of Haskell type classes, but that's OK.) I'm talking about a generalization a Haskell extension called generalized deriving for newtype. Basically, if you make a newtype in Haskell, you can use the deriving construct to allow a few classes to implement all methods by looking up the object stored "in" the object of the newtype. (In reality, there's nothing in it; the difference is only at the type level.) That wasn't a very good description, so let me explain how it works in Factor.

My first real programming project was writing an XML parser. Recently, I found a way to make some operations on tag objects much simpler: a tag can act like its name, its children, and its attributes all at the same time! That probably sounds confusing, but it is actually quite simple. The main way names are used is querying the name-url, name-tag and name-space, the three tuple fields. The way a tag's children are used is as a sequence. And attributes are used as assocs. So really, the concept we want to express here is that there are three different delegations, each for a different type class. (Let's imagine here that the tuple type TUPLE: name url space tag ; defines a type class name-slots containing the generic words name-url, name-tag and name-space. I'm also not sure if this is a good idea.) Then we can specify these delegations using:


TUPLE: tag name attrs children ;
DELEGATE: name-slots tag tag-name
DELEGATE: growable-seq tag tag-children
DELEGATE: assoc tag tag-attrs

Currently, my code for this is a mess of delegation using the delegate slot (which is set awkwardly in a manual constructor definition to the name) and this:


M: tag at* tag-attrs at* ;
M: tag set-at tag-attrs set-at ;
M: tag new-assoc tag-attrs new-assoc ;
M: tag assoc-find >r tag-attrs r> assoc-find ;
M: tag delete-at tag-attrs delete-at ;
M: tag clear-assoc tag-attrs clear-assoc ;
M: tag assoc-size tag-attrs assoc-size ;
M: tag assoc-like tag-attrs assoc-like ;

M: tag nth tag-children nth ;
M: tag nth-unsafe tag-children nth-unsafe ;
M: tag set-nth tag-children set-nth ;
M: tag set-nth-unsafe tag-children set-nth-unsafe ;
M: tag length tag-children length ;

which is just plain unnecessary duplication. The simpler preceding code would basically generate the code underneath. Assuming tuple slot accessors are kept the way they are, this would allow a tuple to "inherit" from more than one tuple type. So all classes have the capability to inherit from one or more "abstract classes" (groups) and "concrete classes" (normal classes). For those of you worried about multiple inheritance conflicts, let's take a simple approach and just say the last DELEGATE: declaration takes precedence. There's no reason to have complicated graph algorithms here. If your code depends on something more complicated than that, then there's probably something wrong with it. In general, the classes you implement or delegate to shouldn't overlap too much.

If you've noticed, I have a notion of a class which is a bit broader than the Haskell concept. Now let me make it even broader: every generic word forms a group (and therefore also a class, since every group forms a class which is the union of the types that implement it). This way, every generic word is in some sort of type class. So, by itself, the code GENERIC: foo would be equivalent to


GROUP: foo
    GENERIC: foo
;

In most Factor code, it wouldn't make sense to put most generic words into classes. They're not part of a protocol, and it's just annoying. When a type implements a method, it is automatically made an instance of the class corresponding to that method. Here's a nice useless example of that:


GENERIC: nothing? ( object -- ? )
M: f nothing? drop t ;
M: sequence nothing? empty? ;
M: zero? nothing zero? ;

The final method works on all objects that have an implementation for zero?. Again, I'm not sure whether this is a good idea or not.

So, there's my long-winded proposal for something like type classes in Factor. The terminology is probably confusing, so feel free to ask for clarification (or correct my errors!). Ed Cavazos made another interesting proposal for a radically changed object system in Factor, which he has already implemented and made use of in extra/mortar. I could implement this system in Factor, and may in the future, but haven't yet. Comments?

Update: I made some changes that Slava suggested. I should've noted better that ad-hoc polymorphism, with or without this change, will still be a lot more ad hoc than in Haskell, if I didn't make that clear.

Update 2: I have a working implementation of this at the wee-url pastebin. There are only two problems: it's slow and it'll hang if you give it a cyclic group structure. (Sorry, that statement sounded too abstract algebra-like but it actually has nothing to do with that, AFAIK.) Oh, and a couple more problems: I can't include this in my repository, because loading it affects the whole system, which is bad. And also, it might be a bad idea to make every generic word and set of tuple slots a class, because that'd add 1199 classes to the core as of Factor .89, most of which will never be used. In .89, there were 296 classes in the core. Another problem is that I haven't fully tested the newly modified class< needed because of the new class structure. I'm not sure if it'll work properly, since this includes multiple inheritance. (But then again, so do overlapping unions, which, in my tests, silently and arbitrarily chose one when there was any ambiguity about which method to run.) When writing this, I realized that it's actually really easy to infer by context when a group is joining a class, so an explicit JOIN: declaration is only needed in certain edge cases, or when a group does not have any directly associated methods.

Update 3: Actually, Slava implemented basically the same thing at the same time as I wrote it, but I didn't have internet at the time, so I didn't know it. Huh.