
darren at darrenduncan
Feb 10, 2009, 12:52 AM
ANNOUNCE - Set::Relation version 0.6.0 for Perl 5
P.S. While I generally don't post status updates on my database-related projects to the Bricolage list, given their limited relevance to the list, today's is a rare exception. This post is the first or only one about a specific one of the projects, Set::Relation, and most importantly, this project is already implemented, so (ignoring that it still needs a lot of testing and is probably buggy) you can actually use it right now. That includes use in Bricolage's implementation, if you find it appropriate. Also, this project is a direct descendant of a conversation started on this very list. So goes this rare post.

----------

All,

I am pleased to announce the first (widely announced, and the 9th actual) release of Set::Relation, the official/unembraced version 0.6.0 for Perl 5, on CPAN. You can see it now, with nicely HTMLized documentation, at:

http://search.cpan.org/dist/Set-Relation/

A short summary description with synopsis code is further below in this message.

While new, Set::Relation is effectively done (enough for a first major version), with a full feature set and with everything fully documented in POD, and you can start actually using it now.

That said, this module is officially in alpha release status, so you should take caution with it. While its API is unlikely to change much, and the code appears correct, a lot of it has not yet actually been executed, and the current test suite is almost empty. The module will probably work now but might break in places. See further below if you'd like to help out with this module's future development.

Also expected in the near future, though not today, is a corresponding version for Perl 6, which was intended from day one.

The official discussion forums for Set::Relation currently are just the email-based ones listed at http://mm.darrenduncan.net/mailman/listinfo and labeled 'muldis-db'; the FORUMS pod section in Relation.pm itself also lists these.
Any protracted discussion following this announcement would ideally take place there, so that it is easy to find aggregate information resulting from said discussions. As for replying in other forums, use your discretion as usual.

No official IRC forums for Set::Relation or other Muldis database-related things exist yet, though in the near future I expect to get one set up on perl.org or freenode.org; preferably it would be a logged channel.

--------

Set::Relation provides a simple Perl-native facility for an application to organize and process information using the relational model of data, without having to employ a separate DBMS, and without having to employ a whole separate sub-language (such as Muldis Rosetta does). Rather, it is integrated a lot more into the Perl way of doing things, and you use it much like a Perl array or hash, or like some other third-party Set:: modules available for Perl. This is a standalone Perl 5 object class that represents a Muldis D quasi-relation value, and its methods implement all the Muldis D relational operators.

A simple working example:

    use Set::Relation;

    my $r1 = Set::Relation->new( [ [ 'x', 'y' ], [
        [ 4, 7 ],
        [ 3, 2 ],
    ] ] );

    my $r2 = Set::Relation->new( [
        { 'y' => 5, 'z' => 6 },
        { 'y' => 2, 'z' => 1 },
        { 'y' => 2, 'z' => 4 },
    ] );

    my $r3 = $r1->join( $r2 );

    my $r3_as_nfmt_perl = $r3->members();
    my $r3_as_ofmt_perl = $r3->members( 1 );

    # Then $r3_as_nfmt_perl contains:
    # [
    #     { 'x' => 3, 'y' => 2, 'z' => 1 },
    #     { 'x' => 3, 'y' => 2, 'z' => 4 },
    # ]
    # And $r3_as_ofmt_perl contains:
    # [ [ 'x', 'y', 'z' ], [
    #     [ 3, 2, 1 ],
    #     [ 3, 2, 4 ],
    # ] ]

This is the initial complement of public routines; besides the "new" constructor submethod, there are these 68 object methods: "clone", "export_for_new", "has_frozen_identity", "freeze_identity", "which", "members", "heading", "body", "slice", "attr", "evacuate", "insert", "delete", "degree", "is_nullary", "has_attrs", "attr_names", "cardinality", "is_empty", "is_member", "empty", "insertion", "deletion", "rename", "projection", "cmpl_projection", "wrap", "cmpl_wrap", "unwrap", "group", "cmpl_group", "ungroup", "transitive_closure", "restriction", "restriction_and_cmpl", "cmpl_restriction", "extension", "static_extension", "map", "summary", "is_identical", "is_subset", "is_proper_subset", "is_disjoint", "union", "exclusion", "intersection", "difference", "semidifference", "semijoin_and_diff", "semijoin", "join", "product", "quotient", "composition", "join_with_group", "rank", "limit", "substitution", "static_substitution", "subst_in_restr", "static_subst_in_restr", "subst_in_semijoin", "static_subst_in_semijoin", "outer_join_with_group", "outer_join_with_undefs", "outer_join_with_static_exten", "outer_join_with_exten".

It is important to note that practically anything you can do in a SQL SELECT (and in various other kinds of SQL), for any vendor of DBMS, you can do with the Set::Relation routines (and ordinary Perl). In the short term a "how do I" kind of FAQ or tutorial will be made, but it doesn't exist yet; meanwhile you should be able to figure it out using the routines' reference documentation. For example:

 1. the "SELECT ... FROM $foo" query portion is handled by any of [projection, extension, rename, map, substitution, etc];
 2. the "WHERE" and "HAVING" clauses are handled by [restriction, semijoin, semidifference, etc], which includes "IN" and "NOT IN";
 3. the "GROUP BY" is handled by [group, cmpl_group, etc];
 4. aggregation operators combined with "GROUP BY" are handled by [summary, etc];
 5. ranking, sorting and quota queries like "RANK", "ORDER BY" and "LIMIT" are handled by [rank, limit, etc];
 6. inner joins are handled by [join, product, intersection, etc];
 7. outer joins are handled by the various [outer_join_*, etc];
 8. union, intersection, difference, etc are handled by the methods of the same names;
 9. "COUNT(*)" is handled by [cardinality];
10. recursive queries are handled by [transitive_closure, etc];
11. sub-queries are supported everywhere simply as the normal way of doing things;
12. other features like relational divide, composition, etc are given by [quotient, composition, etc].

Set::Relation is a generic tool and can be widely applied. It has been developed according to a rigorously thought out API and behaviour specification, and it should be easy to learn, to install and use, and to extend.

But in the short term at least, this module is still assumed to be very un-optimized for its conceptually low-level task of data crunching, and you may want to avoid it if your top concern is execution (CPU, RAM, etc) performance. Set::Relation is best used in situations where you either want to just get some correct solution up and working quickly (conserving developer time), such as because it is a prototype or proof of concept, or where your data set is relatively small, or where your task is one that is less time sensitive, like a batch process.

Some suggested uses for Set::Relation include applying it to help with: flat file processing, SQL generation, database APIs, testing database-related code, teaching databases, and general list or set operations. See http://search.cpan.org/dist/Set-Relation/lib/Set/Relation.pm#Appropriate_Uses_For_Set::Relation for more details.
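
As an aside for readers new to the relational operators: the following is a minimal plain-Perl sketch of what a natural join, a restriction, and a projection do conceptually, using the same data as the synopsis above. It does not use Set::Relation at all, and the helper names (tuple_key, dedupe, natural_join, restriction, projection) are hypothetical ones invented for this illustration; they are not the module's API or its implementation. The sketch also assumes that two values are identical when they compare equal as strings, which is a simplification and not necessarily the criterion Set::Relation applies.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; NOT Set::Relation's API.
# Here a "relation" is an array ref of hash refs (one hash per tuple).

# Build a canonical string for a tuple so duplicates can be detected.
sub tuple_key {
    my ($t) = @_;
    return join "\x1f", map { $_ . "\x1e" . $t->{$_} } sort keys %$t;
}

# Drop duplicate tuples, since a relation is a set.
sub dedupe {
    my ($rel) = @_;
    my %seen;
    return [ grep { !$seen{ tuple_key($_) }++ } @$rel ];
}

# Natural join: combine tuples that agree on every attribute they share.
# Assumption: values are compared as strings.
sub natural_join {
    my ($r, $s) = @_;
    my @out;
    for my $t (@$r) {
        for my $u (@$s) {
            my @common = grep { exists $u->{$_} } keys %$t;
            next if grep { $t->{$_} ne $u->{$_} } @common;
            push @out, +{ %$t, %$u };
        }
    }
    return dedupe( \@out );
}

# Restriction (roughly SQL's WHERE): keep tuples for which $pred is true.
sub restriction {
    my ($rel, $pred) = @_;
    return [ grep { $pred->($_) } @$rel ];
}

# Projection (roughly SELECT DISTINCT a, b): keep only the named attributes.
sub projection {
    my ($rel, @attrs) = @_;
    my @narrowed = map { my $t = $_; +{ map { $_ => $t->{$_} } @attrs } } @$rel;
    return dedupe( \@narrowed );
}

my $r1 = [ { x => 4, y => 7 }, { x => 3, y => 2 } ];
my $r2 = [ { y => 5, z => 6 }, { y => 2, z => 1 }, { y => 2, z => 4 } ];

# Two tuples survive the join, both with x = 3 and y = 2.
my $r3 = natural_join( $r1, $r2 );

# Projecting onto (x, y) collapses them into a single tuple.
my $r4 = projection( restriction( $r3, sub { $_[0]{z} > 0 } ), 'x', 'y' );

# Counting tuples corresponds roughly to COUNT(*).
my $count = scalar @$r4;
```

The point of the dedupe step is that a relation is a set of tuples, so a projection can return fewer tuples than it was given; this is the main way the relational operators differ from a naive grep/map over a list of hashes.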

Set::Relation's performance will be improved over time, so some of these issues should go away later; failing that, the sibling project Muldis::Rosetta (still under construction) will have much better performance anyway, due to its greater complexity to address such matters.

Set::Relation requires Perl 5.8.1+, Moose 0.68+, version.pm 0.74+, namespace::clean 0.09+, and List::MoreUtils 0.22+; it has no other direct external dependencies. This module is pure Perl and a single file.

It is now maintained in a Git repository; see http://utsl.gen.nz/gitweb/?p=Set-Relation or the distribution's README file.

If you like Set::Relation, either as it is now or as you see it becoming, and you would like to help improve it, I welcome any and all kinds of assistance that you would like to offer.

Probably the greatest help I can get, if people are willing, is test files that confirm correct behaviour and expose current or regression bugs; other Set:: modules or database-related modules may be an inspiration for copying/adapting tests from.

I would also like to build up a set of usage examples and basic tutorials, meant to answer questions of the sort "how do I do this?". For example, within the context of a relational database represented as a Perl Hash whose elements are Set::Relation objects representing SQL/etc tables/relvars, I would like a number of brief problem descriptions that provide example database schemas and data (multiple questions/examples can share the same schema/data), each saying first in a sentence what a query is trying to find out, then giving example SQL/etc to do it; for each example I/we would then supply Perl code for how to do the same thing with Set::Relation, so that we have a side-by-side comparison.

Otherwise, I invite feedback on all aspects of the module's design, implementation, and documentation.
For example, what sorts of changes do you suggest to the criteria Set::Relation uses to determine whether 2 arbitrary Perl values are to be considered identical or not (that's a big one); what sorts of typical module serialization hooks should I or should I not be using as object identifiers? Is the documentation structured the best way it could be? Is the module making as much use of Moose's features as it can be, or making as much use of the lesser known power features of Perl 5 itself as it should be? Do you think details of the module's API or semantics should change, such as to better integrate it into typical or best practice ways of using Perl? What additional prior art such as other Perl modules should I be looking at, either that Set::Relation should use as a dependency, or that it should copy/adapt functionality or techniques from? How are you applying, or would you consider applying, Set::Relation to your work, and what changes if any might help you adopt it more easily? Do you propose different internal syntax for the module's code, or propose a different factoring of the code? Can you suggest a better way to package the module; e.g. would you propose an alternative to the simple Makefile.PL? Do you propose a particular structure for the test suite? What about examples and tutorials; how might those best be organized and what sorts of things should they contain? What can you suggest for helping performance?

And then there's Perl 6; do you have suggestions for particular Perl 6 features that should be exploited for Set::Relation's Perl 6 native version? Or do you have ideas for the Perl 6 language itself to adapt distinct Set::Relation features into Perl 6 itself, as if a relation were just another generic collection type (which it is)?

Note that the work done on Set::Relation and in improving it and testing it will later feed back into implementing Muldis::Rosetta, whose design overlaps. It is very helpful to me if Set::Relation can be made the best it can be, as soon as possible, so as to make said feedback more timely.

Thank you and have a good day.

-- Darren Duncan