Sclera is now open-source, with the code available on GitHub under the Apache License, Version 2.0. It is also easier to install and maintain, thanks to a brand-new command-line administration tool, scleradmin.
Making the source open gave us an opportunity to carefully revisit the code and, in the process, make it leaner and better structured. We also removed some plugins built on what is now legacy software, such as Apache Mahout and Apache Pig over Apache HBase. The Pig/HBase plugin will eventually be replaced by a far more general plugin based on Apache Drill.
Now that the code is open, do have a peek inside and see if anything in there could be useful for what you are working on. Among other things, you will find:
- A SQL parser, based on Scala parser combinators. It is compact and (in our opinion) simpler than the Lex/YACC-based parsers available elsewhere. It parses a large subset of PostgreSQL-compatible “standard” SQL, with additional constructs that reduce the verbosity of the language.
- A parser for Sclera’s variant of the Grammar of Graphics. It is integrated with the SQL parser, but can easily be separated out.
- A regular expression compiler, which converts a regular expression into a Glushkov non-deterministic finite automaton, and a streaming regular expression matcher built on top of it.
- An implementation of Dynamic Time Warping, used to align two data sequences; this alignment is a generalization of a join.
- Multiple alternative evaluators for relational operators (filter, join, aggregates, etc.).
- A relational optimizer: rather bare-bones at the moment, but it gets the job done.
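For readers unfamiliar with parser combinators, the toy Python sketch below shows the idea: each small parser is an ordinary function, and combinators such as `seq`, `alt`, `opt` and `sep_by1` (hypothetical names, defined here) glue them into a grammar. It is not Sclera's parser, which is written with Scala parser combinators, and it handles only `SELECT col, ... FROM table [WHERE col = value]`.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser maps (text, position) to (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def token(pattern: str) -> Parser:
    """Match a regex at the current position, skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + r")", re.IGNORECASE)
    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse

def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; succeed with the list of their values."""
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def alt(*parsers: Parser) -> Parser:
    """Ordered choice: the first parser that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

def opt(p: Parser) -> Parser:
    """Optional: yields None, consuming nothing, if p fails."""
    def parse(text, pos):
        r = p(text, pos)
        return r if r is not None else (None, pos)
    return parse

def sep_by1(p: Parser, sep: Parser) -> Parser:
    """One or more p, separated by sep."""
    def parse(text, pos):
        r = p(text, pos)
        if r is None:
            return None
        values, pos = [r[0]], r[1]
        while True:
            s = sep(text, pos)
            r = p(text, s[1]) if s is not None else None
            if r is None:
                return values, pos
            values.append(r[0])
            pos = r[1]
    return parse

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by p."""
    def parse(text, pos):
        r = p(text, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse

def kw(word: str) -> Parser:
    """A case-insensitive keyword that must end at a word boundary."""
    return token(word + r"\b")

# Toy grammar: SELECT col, ... FROM table [WHERE col = (number | col)]
ident = token(r"[A-Za-z_][A-Za-z0-9_]*")
number = fmap(token(r"\d+"), int)
condition = fmap(seq(ident, token("="), alt(number, ident)),
                 lambda v: ("=", v[0], v[2]))
select = fmap(
    seq(kw("SELECT"), sep_by1(ident, token(",")), kw("FROM"), ident,
        opt(fmap(seq(kw("WHERE"), condition), lambda v: v[1]))),
    lambda v: {"columns": v[1], "table": v[3], "where": v[4]})

def parse_query(text: str) -> dict:
    r = select(text, 0)
    if r is None or text[r[1]:].strip():
        raise SyntaxError(f"cannot parse: {text!r}")
    return r[0]
```

Each grammar rule is just a value built from smaller parsers, which is what keeps combinator-based parsers compact. The Scala library offers the same building blocks as operators and methods such as `~` (sequence), `|` (choice), `^^` (map), `opt` and `repsep`.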
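The Glushkov construction turns each symbol occurrence in a regular expression into one NFA state (plus an initial state), with transitions read off the "first", "last" and "follow" sets of those positions; the resulting automaton has no epsilon transitions. The Python sketch below illustrates the textbook construction and is not Sclera's implementation: the regex is given as a hypothetical tuple-based AST rather than parsed from text, and `glushkov`, `accepts` and `stream_matches` are names made up for this example.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical regex AST: ("sym", c) | ("eps",) | ("cat", r, s) | ("alt", r, s) | ("star", r)

def glushkov(regex):
    """Build the Glushkov automaton of `regex`.

    States are the positions (symbol occurrences) 1..n plus an implicit
    initial state. Returns (symbols, first, last, follow, nullable)."""
    symbols: Dict[int, str] = {}       # position -> symbol at that position
    follow: Dict[int, Set[int]] = {}   # position -> positions that may come next

    def walk(node) -> Tuple[bool, Set[int], Set[int]]:
        kind = node[0]
        if kind == "sym":
            pos = len(symbols) + 1
            symbols[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "eps":
            return True, set(), set()
        if kind == "alt":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            for p in l1:               # the end of r can be followed by the start of s
                follow[p] |= f2
            return (n1 and n2,
                    f1 | f2 if n1 else f1,
                    l1 | l2 if n2 else l2)
        if kind == "star":
            _, f, l = walk(node[1])
            for p in l:                # loop back from the end of r to its start
                follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown regex node: {kind!r}")

    nullable, first, last = walk(regex)
    return symbols, first, last, follow, nullable

def step(nfa, active: Set[int], c: str, from_start: bool) -> Set[int]:
    """Positions reached by reading c from `active` (and from the initial state if from_start)."""
    symbols, first, _, follow, _ = nfa
    candidates = set(first) if from_start else set()
    for p in active:
        candidates |= follow[p]
    return {q for q in candidates if symbols[q] == c}

def accepts(nfa, text: str) -> bool:
    """Anchored match: does all of `text` match the regex?"""
    last, nullable = nfa[2], nfa[4]
    if not text:
        return nullable
    active: Set[int] = set()
    for i, c in enumerate(text):
        active = step(nfa, active, c, from_start=(i == 0))
    return bool(active & last)

def stream_matches(nfa, stream: Iterable[str]) -> Iterator[int]:
    """Lazily yield each index i at which some non-empty match ends.
    Only the set of active positions is kept, so the input may be unbounded."""
    last = nfa[2]
    active: Set[int] = set()
    for i, c in enumerate(stream):
        active = step(nfa, active, c, from_start=True)  # a match may start anywhere
        if active & last:
            yield i
```

For `(a|b)*abb`, encoded as nested tuples, `stream_matches` over `"babbabb"` reports matches ending at indices 3 and 6. Because the matcher only carries a set of NFA positions from one symbol to the next, it can run over an iterator without buffering the input.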
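Dynamic Time Warping is a classic dynamic-programming algorithm. The Python sketch below is not Sclera's code (the name `dtw` and the default absolute-difference distance are choices made for this example); it computes the cheapest alignment cost and then backtracks to recover the aligned index pairs. Reading those pairs as matched rows is one way to see how alignment generalizes a join: each element may pair with one or more elements of the other sequence, as long as the order of both sequences is preserved.

```python
import math

def dtw(xs, ys, dist=lambda a, b: abs(a - b)):
    """Return (total_cost, path), where path lists the aligned index pairs (i, j)."""
    n, m = len(xs), len(ys)
    # cost[i][j] = cheapest alignment of xs[:i] with ys[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(xs[i - 1], ys[j - 1]) + min(
                cost[i - 1][j],      # step in xs only: ys[j-1] is reused
                cost[i][j - 1],      # step in ys only: xs[i-1] is reused
                cost[i - 1][j - 1])  # step in both sequences
    # Backtrack from (n, m), always moving to the cheapest predecessor cell.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` gives a total cost of 0 and the path `[(0, 0), (1, 1), (1, 2), (2, 3)]`: the `2` in the first sequence pairs with both `2`s in the second, much as one row joins with two matching rows. This version takes O(nm) time and memory; practical implementations often restrict the search to a band around the diagonal.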
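Having alternative evaluators means the same logical operator can be executed by different physical algorithms. As an illustration only, here are two textbook evaluators for an equi-join in Python; they are not Sclera's evaluators, and the function names and the dict-per-row representation are assumptions made for this sketch. Both produce the same rows in the same order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

def nested_loop_join(left: Iterable[Row], right: List[Row], key: str) -> Iterator[Row]:
    """Compare every left row with every right row: O(|left| * |right|) comparisons,
    with no extra memory beyond the materialized right input."""
    for l in left:
        for r in right:
            if l[key] == r[key]:
                yield {**l, **r}   # on a column-name clash, the right row's value wins

def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Build a hash table on the right input, then stream the left input past it:
    roughly O(|left| + |right|) work, at the cost of holding the table in memory."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    for l in left:
        for r in table.get(l[key], ()):
            yield {**l, **r}
```

Both evaluators are generators, so output rows are produced lazily as the left input is consumed. Which one is cheaper depends on the input sizes and the memory available.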
The code is a mix of object-oriented and functional programming. There is a significant bias towards idiomatic functional programming (map, fold, etc.), but a fair number of stateful constructs (most prominently, iterators) are used as well, for the sake of efficiency.
Bug reports and suggestions are appreciated. Contributions are welcome, especially useful and innovative plugins.
Hope you have as much fun using Sclera as we had building it!