ORM’s Dreaming Big: Pt 3 (Big Pappa ORM)

Here are Part 1 and Part 2, if you are interested.

Two ORMs to date have been very interesting to me:

  • RethinkDB – Pushes changes to any listeners. This inherently supports clustering: if a socket is attempting to synchronize, it needs to know about the changes, and luckily, the changes will be pushed to all threads.
  • Waterline – What I really like about Waterline is that instead of being a storage system, it is just the interface to one. This allows you to use specific storages and maximize their best parts without having to write in different query languages.

Databases have gone in and out of style quickly over the past few years: MongoDB, CouchDB, Postgres, Hadoop, MySQL. All of them compete for the marketshare of being the “database of choice”. That being said, all of them have distinct advantages and disadvantages. Anything SQL gives you the ability to run a PHP stack without much trouble. Anything JSON allows you to store documents in the format you’re coding in. Additionally, Redis has shown that moving the global memory of sessions and the like to a single store is very important for clusters. As a result, the storages multiply and, unfortunately, the interfaces multiply as well.

Queries, Adapters, The Bazaar and the Cathedral

If you haven’t read this essay, I think you should. The gist of it is that there is a clear difference between the way people write parts and the way something is created in one huge gulp at a time. Now, I’m in the parts boat because as time continues, there may be new databases. There may be new fancy things. And as a result, you don’t want your API to change with each database you move to. In a bazaar, you have your fruit farmers, butchers, jewelers, etc. Each is specialized, with solid respect for each person and what they do. In a cathedral, you are given your daily bread, and wine on special occasions. This is simple and works; however, some cathedrals’ bread is better than others’. And sometimes you get olive oil with yours. I believe that allowing as many databases as possible to interact with your ORM is superior to forcing the API to adhere to your database’s design. In this way, you can make the query language the best possible without breaking compatibility with any database. As a result, there are a few simple laws I believe the ORM should adhere to:

  • One Query Syntax for all databases it interfaces with
  • “Adapters” decompile the query into what the database will actually use and send it

As a result, there is a dependency: Adapters -> Models. However, since adapters are more abstract, they generally should be reused for multiple databases of the same type. As a result, there is a further dependency:

Adapters -> Connection to a Database -> Model/Table
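To make the “one query syntax, many adapters” law concrete, here is a sketch of what decompiling might look like. Everything here (the query shape, the `decompile` method, both adapters) is made up for illustration, not taken from a real library:

```javascript
// One shared query object, decompiled differently per adapter.
var query = { where: { age: { gt: 21 } }, limit: 10 };

var sqlAdapter = {
  decompile: function (table, q) {
    var clauses = [];
    Object.keys(q.where).forEach(function (field) {
      var cond = q.where[field];
      if (cond.gt !== undefined) clauses.push(field + " > " + cond.gt);
    });
    return "SELECT * FROM " + table +
      " WHERE " + clauses.join(" AND ") +
      " LIMIT " + q.limit;
  }
};

var mongoAdapter = {
  decompile: function (collection, q) {
    var filter = {};
    Object.keys(q.where).forEach(function (field) {
      var cond = q.where[field];
      if (cond.gt !== undefined) filter[field] = { $gt: cond.gt };
    });
    return { collection: collection, filter: filter, limit: q.limit };
  }
};
```

The models only ever see the shared query object; only the adapter knows the backend’s dialect.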

Plugging In, Plugging Out and Letting Everything Just Figure Itself Out

Content management systems are allowing you to design databases in the browser. WordPress has custom PostTypes. Drupal has Schemas. Treeline is making “machines”. However, the most important concept here is that when you make an edit to a database model, the whole server doesn’t have to shut down along the way. PHP has the advantage of not persisting in memory; as a result, each call to the server is essentially a “fresh” instance. NodeJS doesn’t have that kindness. As a result, making sure your ORM is designed in such a way that dependency changes don’t require you to destroy and recreate everything is of utmost importance. Something as simple as

orm.addModel("ModelName", modelConfig)
orm.removeModel("ModelName");

can really make or break what an ORM is capable of. A simplistic algorithm would be something like this:

util.inherits(ORM, EventEmitter);

ORM.prototype.addModel = function(name, config){
  this.models[name] = new Model(config);
  var connDeps = getConnectionDeps(config); // names of connections this model needs
  var modelDeps = getModelDeps(config);     // names of models this model references
  allListeners(name, modelDeps, this, "model");
  anyListeners(name, connDeps, this, "connection");
};



function allListeners(name, deps, orm, key){
  // depnum counts how many dependencies are still missing (as a negative
  // number). When it climbs back to 0, every dependency is present.
  var depnum = 0;
  var addListener = function(){
    depnum++;
    if(depnum === 0){
      orm.emit(
        "add-" + key + "[" + name + "]",
        orm[key][name]
      );
    }
  };
  var remListener = function(){
    if(depnum === 0){
      orm.emit(
        "rem-" + key + "[" + name + "]",
        orm[key][name]
      );
    }
    depnum--;
  };

  deps.forEach(function(depname){
    if(!orm[key][depname]){
      depnum--;
      orm.on(
        "add-" + key + "[" + depname + "]",
        addListener
      );
    }else{
      orm.on(
        "rem-" + key + "[" + depname + "]",
        remListener
      );
    }
  });
  orm.on("destroy-" + key + "[" + name + "]", function(){
    deps.forEach(function(depname){
      orm.off(
        "add-" + key + "[" + depname + "]",
        addListener
      );
      orm.off(
        "rem-" + key + "[" + depname + "]",
        remListener
      );
    });
  });
}

function anyListeners(name, deps, orm, key){
  // depnum counts how many dependencies are currently present. The first
  // one to appear emits "add"; the last one to leave emits "rem".
  var depnum = 0;
  var addListener = function(){
    if(depnum === 0){
      orm.emit(
        "add-" + key + "[" + name + "]",
        orm[key][name]
      );
    }
    depnum++;
  };
  var remListener = function(){
    depnum--;
    if(depnum === 0){
      orm.emit(
        "rem-" + key + "[" + name + "]",
        orm[key][name]
      );
    }
  };

  deps.forEach(function(depname){
    if(!orm[key][depname]){
      orm.on(
        "add-" + key + "[" + depname + "]",
        addListener
      );
    }else{
      depnum++;
      orm.on(
        "rem-" + key + "[" + depname + "]",
        remListener
      );
    }
  });
  orm.on("destroy-" + key + "[" + name + "]", function(){
    deps.forEach(function(depname){
      orm.off(
        "add-" + key + "[" + depname + "]",
        addListener
      );
      orm.off(
        "rem-" + key + "[" + depname + "]",
        remListener
      );
    });
  });
}

With event emitters, you can toggle back and forth with minimal issues. There are other parts that are important, such as…

  • Binding a model to the orm instead of making the call itself
  • Being able to queue requests from a model
  • Throwing errors when things fail
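The queuing point above can be sketched briefly. This is a hypothetical shape (the `Model` constructor, the event names and the `adapter` property are all illustrative, building on the add/rem events from earlier): queries made before the connection exists are held and flushed once it appears.

```javascript
// Sketch: a model that queues queries until its connection is ready,
// then flushes them in order. Names are illustrative, not from a real ORM.
function Model(orm, name) {
  this.ready = false;
  this.queue = [];
  var self = this;
  orm.on("add-connection[" + name + "]", function () {
    self.ready = true;
    self.queue.splice(0).forEach(function (job) { job(); });
  });
  orm.on("rem-connection[" + name + "]", function () {
    self.ready = false;
  });
}

Model.prototype.find = function (criteria, cb) {
  var self = this;
  var job = function () { self.adapter.find(criteria, cb); };
  if (this.ready) job();
  else this.queue.push(job); // held until the connection event fires
};
```

A real version would also want a timeout so queued requests eventually throw instead of waiting forever, per the third bullet.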

Cluster Support

Cluster support is one of the most important parts of any modern JavaScript module nowadays. If it can’t be run on a cluster, it’s not fit for production. If it’s not fit for production, it’s going to end up being just for somebody’s side project. Starting from a simple concept, you can add cluster support by relaying events. This simple example is all we really need to ensure we are sending events properly. First off, we must figure out which events need to be sent globally. For our case, we’ll use the deletion of a model’s instances:

ORM.prototype.addModel = function(){
  var model = this.figureOutDeps(arguments);
  var self = this;
  model.on("delete", function(instances){
    if(self.isChild){
      process.send({
        type: "orm-rebroadcast",
        event: "model[" + model.name + "]-delete",
        data: instances
      });
    }
    self.emit(
      "model[" + model.name + "]-delete",
      instances
    );
  });
};

As you can see, when a delete has happened locally, we tell the master what has happened. From there, the master tells every other worker:

ORM.prototype.asMaster = function(workers){
  var self = this;
  workers.forEach(function(worker){
    worker.on("message", function(msg){
      if(msg.type === "orm-rebroadcast"){
        self.broadcast(msg,worker);
      }
    });
  });
  this.workers = workers;
}

ORM.prototype.broadcast = function(msg,not){
  this.workers.forEach(function(worker){
    if(worker === not) return;
    worker.send(msg);
  });
}

From there, we can implement the worker’s listeners:

ORM.prototype.asWorker = function(){
  var self = this;
  process.on("message", function(msg){
    if(msg.type === "orm-rebroadcast"){
      self.emit(
        msg.event,
        msg.data
      );
    }
  });
};

There are things that can be done a little more nicely. For example, having workers tell the master what they want to listen and not listen for. Additionally, we can reimplement this with Redis or any other API, because it really isn’t that complicated.
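For instance, the same relay could ride on Redis pub/sub instead of `process.send`. This is a sketch under stated assumptions: `attachRedisTransport`, the channel name and the `relay` method are invented here, and `pub`/`sub` are assumed to be two already-connected clients with `publish`/`subscribe`/`on("message")` in the style of the classic `redis` npm package.

```javascript
// Sketch: rebroadcast ORM events over a Redis channel. A worker tags its
// own messages with an id so it can ignore them when they echo back.
function attachRedisTransport(orm, pub, sub, workerId) {
  sub.subscribe("orm-rebroadcast");
  sub.on("message", function (channel, raw) {
    var msg = JSON.parse(raw);
    if (msg.from === workerId) return; // ignore our own broadcasts
    orm.emit(msg.event, msg.data);
  });
  orm.relay = function (event, data) {
    pub.publish("orm-rebroadcast", JSON.stringify({
      from: workerId, event: event, data: data
    }));
  };
}
```

The nice part is that nothing above it changes: the model still emits, and whatever transport is attached decides how far the event travels.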

ORM’s Dreaming Big: Pt 1 (Schema)

So I’ve come across Waterline. Waterline is unique in that it isn’t just an interface to SQL or MongoDB but to anything. *Anything*. (*Anything that someone has made an adapter for.) Unfortunately, my experiences with it have led me to believe it is not a good fit for a full stack outside of Sails. I’ve written more about my issues with the framework here. Now, I’m asking for a lot there, so I don’t expect it to go anywhere. But it sent me on a three-day coding sprint of trying to develop my own ORM the way I wanted it. As I continued, I realized it is a ton of work alone and not all of the things I desire are available to mooch off of. But the idea… the dream… that can live on…

What would it be used for?

This is one of the reasons why I wrote this post. Waterline is awesome. However, it left me wanting more and questioning whether supporting too much is giving me less. As a result, the interfaces I believe are of utmost importance to support are:

  • Memory – This was a proof of concept for them; however, it can be implemented in a fast manner. Libraries like Lazy.js will compile all the arguments so everything is run in a single loop. Additionally, supporting proper indexes can add even more speed. However, the point of an in-memory store is that you can create “collections” easily and beautifully, and query them as you would anything else.
  • LocalStorage – This is another clientside feature that can be implemented. LocalStorage is an interesting beast but nonetheless quite manageable. What you would do is key everything starting with connection/model, where connection and model are the connection and model names. From there, you would store indexes under connection/model/indexName and probably each object in its own place, such as connection/model/ObjectID. This allows you to avoid loading too much at once and to asynchronously retrieve objects as you need them, instead of loading everything into memory and hoping all goes well.
  • HTTP – By providing a wrapper that creates HTTP calls, you can interface with your database easily, as if it were again on the server. Of course, a serverside implementation is also necessary; however, I think that’s relatively simple in the long run. Perhaps that begs the question of creating “Users” that can interface with your ORM.
  • FileSystem – MongoDB is the standard, without a doubt. However, I’m a strong believer in diversity (when it’s convenient to say I am). As a result, creating a filesystem document-based framework doesn’t seem too far off or out of line. It would most likely be quite similar to LocalStorage, actually.
  • MongoDB – Mongo in many ways was the breakthrough db. Might as well still be able to interface with it.
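The Memory interface above is the easiest one to picture, so here is a minimal sketch of it (plain JS, no Lazy.js and no indexes, and all names illustrative) just to show the surface area such an adapter needs:

```javascript
// A minimal in-memory adapter: each model name maps to an array of records,
// and find() is a naive full scan matching every key in the criteria.
function MemoryAdapter() {
  this.collections = {}; // modelName -> array of records
}

MemoryAdapter.prototype.create = function (model, doc, cb) {
  (this.collections[model] = this.collections[model] || []).push(doc);
  cb(null, doc);
};

MemoryAdapter.prototype.find = function (model, criteria, cb) {
  var records = this.collections[model] || [];
  var matches = records.filter(function (rec) {
    return Object.keys(criteria).every(function (key) {
      return rec[key] === criteria[key];
    });
  });
  cb(null, matches);
};
```

A fast version would replace the scan with compiled iteration and real indexes, but the interface (create/find with callbacks) would stay the same.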

Sugary Snacks

The Validator

The validator is without a doubt one of the most important features of any database. The purposes of a validator are not just for the database’s sake but also for the clientside. When creating a form, being able to just hook in and apply a validator is one of the sweetest things possible. Unfortunately, those validation parameters generally are not included with the database. This is partly a good thing, since you don’t want users to see all of your internals; however, for some code, rewriting your models is kind of a pain in the ass. Just tedious work. As a result, here is the first commandment:

  • A Validator can also be easily hooked into any form
  • A Validator can also easily be used to generate forms

The second part is obviously much more complicated, as you can see here and here. But we’re dreaming big here, right? No holds barred. Whatever we damn well please. And I would be pleased to not have to rewrite code for every single form I ever come up with.

As for creating it, here is a laundry list of features…

“Native” Types

The native types I would prefer to keep as simple as possible:

  • Number
  • Buffer
  • String
  • JSON
  • Any
  • Typed ObjectID – Can specify the specific Model(s) allowed.
  • Typed Array – Can specify the type of array it will be (Any is also allowed)

The reason long, date and others are not supported is that those will end up being compiled to these native types anyway. The ObjectID is the only thing that’s really different. Numbers, HashMaps and ObjectIDs are the only things that cannot be evaluated to an array.

To use these:

prop1: Number, //You can provide the object class
prop2: "buffer", //You can provide a string specifying the type
prop3: ["string"], // You can create a typed array
prop4: {
  native: Object //you can specify the type explicitly
},
prop5: Framework.Types.ObjectID, //This specifies any other document
prop6: "objectid:modelname", //This specifies that you expect it to use another model
prop7: AModelClass, //This specifies that you expect to use that other model. This is the same as above
prop8: {
  native: "objectid",
  model: "modelname" //Specify the type explicitly
},
prop9: null, //Specifies anything
prop10: Framework.Types.Anything //Specifies anything as well
Additional Types

You may also use custom schematypes. However, schematypes will not have all the features that a validator expects. In addition, you may also provide anything that you would write in a custom schematype within the schema as well.

To use a custom schematype

prop1: {
  native: CustomSchemaTypeClass
}
prop2: {
  native: "customschematypeclass"
}

If you provide a string, your validator will create a dependency on that schematype. This means that if that schematype is not available in your framework, your model cannot be used until it is. The features below cannot be used within custom SchemaTypes and are only available to the Schema. Additionally, you can provide custom options that will override the SchemaType’s original options:

prop1: {
  native: "customschematypeclass",
  a_custom_option: "a value"
}
prop2: {
  native: CustomSchemaTypeClass({
     a_custom_option: "a value"
  })
}
Basic Validators
  • Required – Cannot be null or undefined
  • Final – After first created, cannot be set again
  • Unique – this will also create an index.

These are simple and straightforward.

Default

At times, you may want to provide a default. And now you can, in three different ways:

property1:{
  native: [String],
  default: ["value"]
},
property2:{
  native: [Number],
  default: function(){
    return Math.random();
  }
},
property3:{
  native: [ObjectID],
  default: function(next){
    Query().find({something: value}).exec(next);
  }
}

Validators Available in Custom SchemaTypes

The following are available for use within your Schema as well as SchemaTypes. SchemaTypes are interesting in that they can be extended indefinitely yet only ever return a SchemaType. This is done because of the following:

function CustomSchemaType(options){
  if(!(this instanceof CustomSchemaType)) return new CustomSchemaType(options);
  this.options = options;
}

// Extending an instance merges in the new options and returns a fresh
// instance, leaving the original untouched
CustomSchemaType.prototype.extend = function(options){
  options = _.merge({}, this.options, options);
  return new this.constructor(options);
};

Simply put, you can extend and extend and extend away.

Custom Validators

Custom validators come in two flavors: synchronous and asynchronous. This will be a theme as we continue.

syncProperty:{
  native:Number,
  validator: function(value){
    return false;
  }
},
asyncProperty:{
  native:Number,
  validator: function(value,next){
    next(false);
  }
}
Value to Array Validation

The idea here is that you may want to compare a value to an array of values. Something like:

//BAD!
arrayCompareBad:{
  native:Number,
  validator: function(value){
    return [1,2,3,4].indexOf(value) !== -1;
  }
},
//Good.
arrayCompareGood:{
  native:Number,
  in:[1,2,3,4]
}

But that is slow, as you create an array every time. Instead, the idea is that you’d be able to define it beforehand:

  • In – Ensures that the value is in the values
  • Not In – Ensures the value is not in the values

Now, you can provide these values up front or through a synchronous or asynchronous function. It’s important to note that anything can use this syntax. Enums have bothered me for quite some time. Any value can be compared to other values to ensure it is restricted to a certain subset. Enums used to only apply to strings; however, they can be applied to numbers, Buffers and, yes, even arrays. HashMaps and ObjectIDs are a different beast, however. HashMap values are keys, so there is no point in attempting to give them an enum. ObjectIDs require that certain ObjectIDs already exist. Now, this can be done; however, at that point you would need to specify a query and do it asynchronously.
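The “define it beforehand” idea amounts to compiling the allowed values once, at schema build time, so each validation is a constant-time lookup instead of an array scan. A sketch (the function names are mine):

```javascript
// Compile the allowed values into a Set once; the returned validator
// closes over it and does an O(1) membership check per value.
function compileIn(values) {
  var allowed = new Set(values);
  return function (value) { return allowed.has(value); };
}

function compileNotIn(values) {
  var inCheck = compileIn(values);
  return function (value) { return !inCheck(value); };
}
```

So `in: [1,2,3,4]` in the schema would become a single `compileIn([1,2,3,4])` call when the model is registered, and the hot path never rebuilds the array.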

Array to Array Validation
  • Any – True If At least One of Validator Array Values are present
  • All – False Unless all of the Validator Array Values are present.
  • More – True if there is more than just the Validator Array Values present
  • Not Any – False if at least one of the Validator Array Values are present
  • Not All – True Unless all of the Validator Array Values are present
  • Not More – False if there is more than only the Validator Array Values are present. Will also return true if empty.

You can use these like so

property:{
  native: [String],
  any:["any","of","these"],
  not_more:["any", "of", "these", "and", "no","more"]
}
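The array-to-array rules can also be precompiled the same way. Here is one possible reading of three of the bullets (the negated forms are just their complements); the `arrayValidators` factory and these exact semantics are my interpretation, not a spec:

```javascript
// Build the reference Set once per schema; each check then scans only the
// candidate array. any: shares at least one value; all: contains every
// reference value; more: contains something beyond the reference values.
function arrayValidators(values) {
  var ref = new Set(values);
  return {
    any: function (arr) {
      return arr.some(function (v) { return ref.has(v); });
    },
    all: function (arr) {
      return values.every(function (v) { return arr.indexOf(v) !== -1; });
    },
    more: function (arr) {
      return arr.some(function (v) { return !ref.has(v); });
    }
  };
}
```

Under this reading, `not_more` is `!more`, which is also true for an empty array, matching the last bullet.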
Population and Save Overrides

At times, you will want to populate your data from a source other than the database. Additionally, you will want to override how that property is stored after it’s been validated. An example would be “User.notifications”. Duplicating the data would be absurd, and if you are storing every ObjectID, you may run into numbers in the thousands. However, you can have that particular part populated on the fly:

image: {
  native: String,
  native_populate: Buffer,
  populate: function(storedvalue, next){
    fs.readFile(storedvalue, next);
  }
}

It should be noted that these aspects should probably also be available as a stream. Something like this would also work

function MyReadableStreamClass(storedvalue){
  ReadableStream.call(this);
  this.storedvalue = storedvalue;
}
image: {
  native: String,
  native_populate: Buffer,
  populate: MyReadableStreamClass
}

This will create the readable stream on the fly. It should be noted that different things will populate in different ways. As a result, while this may send raw data, another might send JSON.

Now, you may be populating data; however, what happens when someone wants to save something?

image: {
  native: Buffer,
  depopulate: function(args,next){
    var ext = mime.findOutMimeExtension(args.name)
    var name = this.instance.id+"/image."+ext;
    fs.writeFile(name, args.buffer, function(e){
      if(e) return next(e);
      next(void(0), name); //name will be stored with the doc
    });
  }
}

//or

image: {
  native: Buffer,
  depopulate: MyWritableStreamClass
}
Digestors (Constructor Overloading) and Verbosity

Your developers may want to be using Moments as dates; however, expect normal JavaScript dates to also be sent in as something to be stored. This is where digestors and verbosity come in:

date: {
  native: Number,
  digestor: [function(date){
    if(date instanceof Date) 
      return date.getTime();
  }, function(time){
    if(typeof time == "number") 
      return time;
  }, function(moment){
    if(moment instanceof Moment) 
      return moment.valueOf();
  }]
}

Here we can see that any of the above values will be considered a valid number. As a result, you don’t have to worry about what you set the date to be. If you want to always get the date back as a Moment:

date:{
  native:Number,
  verbose:function(value){
    return moment(value);
  }
}

This will allow you to easily do whatever you want with the number without changing your database.

Schema Methods

Virtual Properties

Now, we’ve seen actual properties, but there are some properties that will not be stored and are instead derived from the instance itself:

var schema = new Schema(validations);
schema.virtual("virtual_property", function(){
  // getter
  return this.stringA +"-"+ this.stringB;
},function(value){
  value = value.split("-");
  this.stringA = value[0];
  this.stringB = value[1];
});
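Under the hood, a virtual like the one above could plausibly be wired onto instances with `Object.defineProperty`; the schema would just store the getter/setter pair. A sketch (`defineVirtual` is a made-up helper):

```javascript
// Attach a computed property to an instance. It reads and writes through
// the getter/setter and is never handed to the adapter for storage.
function defineVirtual(instance, name, getter, setter) {
  Object.defineProperty(instance, name, {
    get: getter,
    set: setter,
    enumerable: true,
    configurable: true
  });
}
```

Usage mirrors the schema example: a `virtual_property` that joins `stringA` and `stringB` with a dash, and splits on assignment.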

Schema Indexes

The last, and probably the most important, is indexes. Now, indexes are not and should never be available to a custom schematype. Additionally, because indexes can be so flexible, they bring up some interesting decisions. It’s important to note that not only can you index normal properties, you can also index virtual properties. The indexes available are:

var schema = new Schema(validations);

schema.index("propertyname");
schema.index("uniqueproperty", "unique");
schema.index("functionalproperty", function(a,b){
  return b - a;
});
schema.index("callbackproperty", function(a,b,next){
  async.map([a,b],fs.stat,function(err,res){
    if(err) return next(err);
    next(void(0), res[0].size - res[1].size);
  });
});
Validate

Using validate is simple.

  • If the object is JSON – it will validate the JSON
  • If the object is a DOM element – it will validate the element if it’s a form. If it is not, it will throw an error.

This will be available from the model via Model.validate, which is basically Model.constructor.validate.bind(Model.constructor).
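The dispatch described above might look something like this. Everything here is illustrative: the DOM detection is a bare `nodeType` check so it works outside a browser, the schema is a toy `typeof` map, and `formToJSON` is a stub standing in for a real `form.elements` walk.

```javascript
// Validate either a plain object or a <form>: forms are first flattened to
// JSON, non-form DOM elements throw, and everything else is checked directly.
function validate(schema, target) {
  if (target && target.nodeType === 1) { // looks like a DOM element
    if (target.tagName !== "FORM") throw new Error("Cannot validate: not a form");
    target = formToJSON(target);
  }
  return Object.keys(schema).every(function (key) {
    return typeof target[key] === schema[key]; // toy check: typeof match
  });
}

function formToJSON(form) {
  // illustrative stub: collect named inputs into a plain object
  var out = {};
  Array.prototype.forEach.call(form.elements || [], function (el) {
    if (el.name) out[el.name] = el.value;
  });
  return out;
}
```

The bound Model.validate would simply curry the schema into the first argument.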

Finishing words

It’s important to note that the schema is simply a validator and a provider of database indexes. You cannot make queries with it; it essentially does nothing except provide important information for storing the data. Ideally, you will want to do as much as you can with the SchemaTypes so that you can reuse more code. And the indexes, virtual properties, what’s required and so on vary from schema to schema. If you don’t like me dreaming big, well… to be honest… I believe dreaming big is part of the reason why I am here today. Because I see what I want to make and I go out and try to do it. And if I cannot, I flesh out the idea so much that I can look back on it and say “If only”.