Failure and Compression

For the last few days I have been hammering away at an interesting idea: turning any file into a 16-character hexadecimal string. The idea came about when I saw a post on Hacker News about distributed systems and URL standards for them. When it comes to distributed systems and file sharing, I’m of this opinion:

  • You want the requestor to be able to get chunks from multiple computers and consolidate them into one – this also means that every chunk knows it is part of a whole and can point to that whole
  • You want the requestor to know what’s coming to them – this can be metrics about the file such as length, checksum, hash, etc.
  • Security is secondary to ensuring the system works without hiccups. Secure requests should be a matter of encrypted messages with keys that are already available to each computer.

I want to take the second one a step further: I want to make the URL so detailed that the computer can actually recreate the file from that tiny little hash! Impossible? Never! Here is the product of my work: https://github.com/formula1/compression-failures
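The first two requirements can be sketched as a manifest the requestor fetches before any data. This is a minimal sketch assuming Node.js and its built-in crypto module; the manifest fields, the byte-sum checksum and the describeFile/reassemble names are my assumptions, not something from the repository:

```javascript
const crypto = require("crypto");

// SHA-256 of a buffer as a hex string
const sha256 = (buf) => crypto.createHash("sha256").update(buf).digest("hex");

// Hypothetical manifest: whole-file metrics plus per-chunk entries that
// each point back to the hash of the whole file.
function describeFile(buf, chunkSize) {
  const fileHash = sha256(buf);
  let checksum = 0;
  for (const byte of buf) checksum = (checksum + byte) >>> 0; // 32-bit byte sum
  const chunks = [];
  for (let offset = 0; offset < buf.length; offset += chunkSize) {
    const piece = buf.subarray(offset, offset + chunkSize);
    chunks.push({ index: chunks.length, offset, length: piece.length, hash: sha256(piece), parent: fileHash });
  }
  return { length: buf.length, checksum, hash: fileHash, chunks };
}

// Reassemble chunks fetched from any number of peers, in any order,
// checking each chunk and the final result against the manifest.
function reassemble(manifest, pieces) {
  const ordered = [...pieces].sort((a, b) => a.index - b.index);
  for (const p of ordered) {
    const expected = manifest.chunks[p.index];
    if (!expected || sha256(p.data) !== expected.hash) throw new Error(`chunk ${p.index} is invalid`);
  }
  const whole = Buffer.concat(ordered.map((p) => p.data));
  if (whole.length !== manifest.length || sha256(whole) !== manifest.hash) {
    throw new Error("file does not match manifest");
  }
  return whole;
}
```

Because every chunk carries the parent hash, a requestor can accept chunks from different peers and still know they belong to the same file.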

Technique 1: File attributes and filtering possibilities

So the first thing I went after was turning any file into a few 32-bit integers. This included obvious things like the file length, to ensure I have a finite number of possibilities, and checksums, to rule out extremes and limit the possibilities even more. From there it’s kind of free shooting. I figured my best next step was to separate the file into what I will call ‘patterns’. Patterns work like this:

  • A pattern has a segment length – 010 has a length of 1, 001100 a length of 2, 000111 a length of 3, 1111 a length of 4, 00000 a length of 5, etc.
  • A pattern has a repetition count – 010 has 3 repetitions, 001100 also has 3, 000111 has 2, 1111 has 1 and 00000 has 1
  • A pattern starts with either 1 or 0
  • A pattern ends when what would be the next repetition ends early or goes on too long.
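The rules above can be sketched as a small splitter, assuming the file has already been turned into a string of '0' and '1' characters (a real version would read bits from a Buffer); toPatterns and its field names are hypothetical:

```javascript
// Split a bit string into patterns: first find runs of identical bits,
// then group consecutive runs of equal length. A pattern ends as soon as
// the next run is shorter or longer than the current segment length.
function toPatterns(bits) {
  const runs = [];
  for (let i = 0; i < bits.length; ) {
    let j = i;
    while (j < bits.length && bits[j] === bits[i]) j++;
    runs.push({ bit: bits[i], length: j - i });
    i = j;
  }
  const patterns = [];
  for (const run of runs) {
    const last = patterns[patterns.length - 1];
    if (last && last.segmentLength === run.length) {
      last.repetitions++; // same length: one more repetition of this pattern
    } else {
      patterns.push({ segmentLength: run.length, repetitions: 1, startsWith: run.bit });
    }
  }
  return patterns;
}
```

For example, toPatterns("001100") gives a single pattern with segment length 2, 3 repetitions, starting with "0".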

Splitting the file up into patterns sounds like a swell idea, right? Well, the 64 KB file I used had around 10,000 patterns. What’s more interesting, though, is that the segment lengths were at 9 and the repetitions were at 7. These two attributes filter the number of possibilities even more (I did not calculate by how much). Ideally, each of these attributes would get me closer and closer to knowing what the file actually contains. However, I realized this isn’t going to be nearly as simple as I thought:

  • given
    • 00110011
    • 11001100
  • Can you tell them apart by their length, checksum (here, the count of 1 bits), unique segment lengths, unique repetition counts, total repetitions and starts-with-1 count? Yes, you can!
    • 8, 4, 1, 1, 4, 0
    • 8, 4, 1, 1, 4, 1
  • what about
    • 010011000111
    • 001101000111
  • You cannot
    • 12, 6, 3, 1, 6, 0
    • 12, 6, 3, 1, 6, 0
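The six attributes in these examples can be computed with a short function. This is a sketch assuming bit strings as input and reading "checksum" as the count of 1 bits, which is what the numbers above imply; attributes is a hypothetical name:

```javascript
// Returns [length, checksum (count of 1 bits), unique segment lengths,
//          unique repetition counts, total repetitions, patterns starting with 1]
function attributes(bits) {
  const runs = bits.match(/0+|1+/g) || []; // maximal runs of identical bits
  const patterns = [];
  for (const run of runs) {
    const last = patterns[patterns.length - 1];
    if (last && last.segmentLength === run.length) last.repetitions++;
    else patterns.push({ segmentLength: run.length, repetitions: 1, startsWith: run[0] });
  }
  return [
    bits.length,
    (bits.match(/1/g) || []).length,
    new Set(patterns.map((p) => p.segmentLength)).size,
    new Set(patterns.map((p) => p.repetitions)).size,
    patterns.reduce((sum, p) => sum + p.repetitions, 0),
    patterns.filter((p) => p.startsWith === "1").length,
  ];
}
```

Running it on 010011000111 and 001101000111 reproduces the identical tuple 12, 6, 3, 1, 6, 0, which is exactly the collision described above.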

So I had this brilliant idea! What if I could record the order? Not the actual values, just whether they go up or down. To do this I would compare each pattern with the one before it: if the segment length increased, write a 1; if it decreased, write a 0. What could go wrong?

  • given
    • 010011000111
    • 001101000111
  • You can!
    • 12, 6, 3, 1, 6, 0, 11
    • 12, 6, 3, 1, 6, 0, 01
  • What about
    • 001101000111
    • 000111010011
  • You cannot
    • 12, 6, 3, 1, 6, 0, 01
    • 12, 6, 3, 1, 6, 0, 01
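The ordering trick can be sketched on the list of pattern segment lengths (1, 2, 3 for 010011000111; 2, 1, 3 for 001101000111; 3, 1, 2 for 000111010011). directionBits is a hypothetical name:

```javascript
// One bit per neighbouring pair of patterns: "1" if the segment length
// went up, "0" if it went down. Neighbouring patterns never share a
// segment length (they would have merged into one pattern), so there is
// no "equal" case.
function directionBits(segmentLengths) {
  let out = "";
  for (let i = 1; i < segmentLengths.length; i++) {
    out += segmentLengths[i] > segmentLengths[i - 1] ? "1" : "0";
  }
  return out;
}
```

directionBits([2, 1, 3]) and directionBits([3, 1, 2]) both give "01", which is why the second pair still collides.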

How about if I also record the order of the highs and lows? Technically I could keep doing this until they are completely ordered!

  • given
    • 001101000111
    • 000111010011
  • You can!
    • 12, 6, 3, 1, 6, 0, 01, 1
    • 12, 6, 3, 1, 6, 0, 01, 0
  • what about
    • 0000011111010011
    • 0000111101000111
  • You cannot
    • 16, 8, 3, 1, 6, 0, 01, 0
    • 16, 8, 3, 1, 6, 0, 01, 0
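One reading of "the order of the highs" that reproduces the extra bit in these examples is: keep only the local maxima of the segment-length list and apply the same up/down rule to them. This is my assumption about the method, and peakBits is a hypothetical name; the input is again the list of pattern segment lengths:

```javascript
// Keep only local maxima (an endpoint counts if it beats its single
// neighbour), then emit "1"/"0" for each rise/fall between successive
// peaks. Equal peaks count as "0".
function peakBits(segmentLengths) {
  const n = segmentLengths.length;
  const peaks = segmentLengths.filter(
    (v, i) => (i === 0 || v > segmentLengths[i - 1]) && (i === n - 1 || v > segmentLengths[i + 1])
  );
  let out = "";
  for (let i = 1; i < peaks.length; i++) out += peaks[i] > peaks[i - 1] ? "1" : "0";
  return out;
}
```

peakBits([2, 1, 3]) is "1" and peakBits([3, 1, 2]) is "0", separating 001101000111 from 000111010011. The 16-bit strings have segment lengths 5, 1, 2 and 4, 1, 3; both give "0", which is the snag.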

I would need to be able to say exactly how big each of those highs is. Perhaps I could count the number of unique differences? Nope, because in both cases every difference is unique. This was my snag, and I still don’t know how to handle it. I am also fully aware that the ordering may actually be far more data than I anticipate.

Technique 2: Prime numbers!! 😀

:C So the way I planned to do it was this:

  • Turn the file into a gigantic number
  • Boil the number down to prime factors and exponents
  • If a prime is too big, represent it as the ‘nth prime’ instead
  • If the list of factors is too long
    • find the closest prime number to the gigantic number
    • subtract that prime from the gigantic number
    • do it again with the leftover
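The steps above can be sketched on small numbers. This is a toy, assuming plain JavaScript numbers, trial-division factoring and arbitrary thresholds (NTH_THRESHOLD, MAX_FACTORS); it also reads "closest prime" as "closest prime below", so the leftover stays positive. None of these names come from the repository, and factoring a file-sized number this way is hopeless:

```javascript
const NTH_THRESHOLD = 50; // assumption: primes above this are written as nth(k)
const MAX_FACTORS = 4;    // assumption: longer factor lists fall back to add(prime, leftover)

function isPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) if (n % d === 0) return false;
  return true;
}

// nthPrime(1) is 2, nthPrime(4) is 7
function nthPrime(k) {
  let found = 0;
  for (let n = 2; ; n++) if (isPrime(n) && ++found === k) return n;
}

// Inverse of nthPrime for a prime p
function primeIndex(p) {
  let count = 0;
  for (let n = 2; n <= p; n++) if (isPrime(n)) count++;
  return count;
}

// Prime factors with repetition, e.g. 12 -> [2, 2, 3]
function factor(n) {
  const out = [];
  for (let d = 2; d * d <= n; d++) {
    while (n % d === 0) { out.push(d); n /= d; }
  }
  if (n > 1) out.push(n);
  return out;
}

const token = (p) => (p > NTH_THRESHOLD ? `nth(${primeIndex(p)})` : String(p));

function encode(n) {
  if (n < 2) return String(n); // 0 and 1 have no prime factors
  const factors = factor(n);
  if (factors.length === 1) return token(n);
  if (factors.length <= MAX_FACTORS) return `mul(${factors.map(token).join(",")})`;
  let p = n - 1; // too many factors: take the closest prime below n...
  while (!isPrime(p)) p--;
  return `add(${token(p)},${encode(n - p)})`; // ...and encode the leftover
}

// Evaluate an expression back into a number
function decode(expr) {
  const mul = (...xs) => xs.reduce((a, b) => a * b, 1);
  const add = (...xs) => xs.reduce((a, b) => a + b, 0);
  return new Function("mul", "add", "nth", `return ${expr};`)(mul, add, nthPrime);
}

console.log(encode(42)); // prints mul(2,3,7)
console.log(encode(96)); // prints add(nth(24),7)
console.log(decode(encode(96))); // prints 96
```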

What I would be left with is a sequence like this

mul(2,3,nth(7),add(nth(13), mul(5,7)))

Looks great, right? I thought so too! I deliberately designed the syntax so that everything can be encoded as 4-bit operations. Unfortunately, I didn’t realize how slow prime number generation could really be. I ended up writing my own generator because I needed to constantly stream primes and/or hold a binary list of them. The problem with prime generation is that it always needs the previous primes to move forward: finding primes is really about finding what isn’t prime and collecting the leftovers. This sounds straightforward, but it’s actually pretty ugly and leaves me waiting minutes at a time to handle 24-bit primes.

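function* primes(){
  // Sorted by value: the next composite each known prime will reach
  var stack = [];
  var i = 2;
  while(true){
    yield i; // nothing crossed i off, so it is prime
    stack.push({ value: i + i, jump: i }); // 2i exceeds every pending value
    while(stack[0].value === ++i){ // i is composite
      var jump = stack.shift().jump;
      var value = i;
      var index;
      do { // step to the next multiple no other prime has claimed
        value += jump;
        index = sortedIndex(stack, value);
      } while(index < stack.length && stack[index].value === value);
      stack.splice(index, 0, { value: value, jump: jump });
    }
  }
}

// Binary search for the first entry whose value is >= value
function sortedIndex(stack, value){
  var lo = 0, hi = stack.length;
  while(lo < hi){
    var mid = (lo + hi) >> 1;
    if(stack[mid].value < value) lo = mid + 1; else hi = mid;
  }
  return lo;
}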

This generates primes fast, don’t get me wrong. I was surprised at how well it worked. Truly. Not only that, I can take full credit (with inspiration from some sieves). But it doesn’t create primes; it finds what isn’t prime. If it can’t handle 24 bits, what’s to say it can handle 10 bytes, or even 1,000 bytes? When I was writing the readme I decided to write a bit about using workers to make it multithreaded. This is kind of a neat idea, but it’s still not perfect, because each worker must still wait for the previous workers’ primes. As we get into huge numbers that becomes less true, because 2 × current may often be 3 workers later.

Another option is a less absolute prime generator, such as Mersenne primes. These are easy to calculate and work well with logarithms, so it’s possible I could speed up the algorithm to a huge degree. Instead of checking whether a number is prime, I check how far it is from the next power of 2. If it is 1 away (so the number is 2^k - 1), I consider it a Mersenne prime. Otherwise, I take the Mersenne number for the power of 2 below it, multiply it a few times, subtract the total, and do it again with the leftovers. This seems just as good, but prime numbers are pretty special, and as good as Mersenne primes are, they will probably not always be good enough for my purposes.
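The Mersenne idea can be sketched with BigInt so it isn’t limited to small numbers. Two caveats: numbers of the form 2^k - 1 are not all prime (15 and 63 are not), so this trades real primes for speed; and the greedy rule (always take the largest 2^k - 1 that fits) and the mersenneTerms/fromTerms names are my assumptions about how the steps would go:

```javascript
// Greedily write n (a BigInt) as a sum of multiples of (2^k - 1).
// Each term { k, times } stands for times * (2^k - 1).
function mersenneTerms(n) {
  const terms = [];
  while (n > 0n) {
    // Largest k with 2^k - 1 <= n: one less than the bit length of n + 1.
    const k = (n + 1n).toString(2).length - 1;
    const m = (1n << BigInt(k)) - 1n;
    const times = n / m; // BigInt division truncates; always 1n or 2n here
    terms.push({ k, times });
    n -= times * m; // do it again with the leftover
  }
  return terms;
}

// Rebuild the number from its terms
function fromTerms(terms) {
  return terms.reduce((sum, t) => sum + t.times * ((1n << BigInt(t.k)) - 1n), 0n);
}

const terms = mersenneTerms(10n); // 10 = 7 + 3, i.e. k = 3 then k = 2
console.log(fromTerms(terms)); // prints 10n
```

Each step costs a bit-length computation and one division, with no sieve, which is where the hoped-for speed would come from.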

What can I say about it?

What I love about working on projects like these is that I always want to bring them to an end. Usually, I want to reach some conclusion like “it’s impossible” or “way too slow”, but I always find a way to make it work. From there it’s just about implementing it or ignoring it. A project like this has huge uses, but many of them are things I don’t interact with directly. Only potential. I do believe growing myself is important, but I don’t want to lose it over a cute little experiment with big potential.

EventModel, Callbacks, Promises, Generators, Await: Opinion about Async and Threading

Why are things Async?

This is the first thing we need to ask ourselves before continuing, and there is a good reason. What we need to understand about asynchronous programming is waiting. Waiting is the worst enemy of speed, and speed is important for user experience and the ol’ time = money equation. Imagine a world where nothing is asynchronous. We’ll use made-up millisecond figures to add up the total time (I should really create a performance test for this, but I’m just doing this on the fly).

  • Our Server Waits for a Connection – This is what will start the following and what we want our server to do 100% of the time (nearly impossible)
  • On Connection
    • 2 ms – Our Server processes the Domain Name/Path
    • 3 ms – Our Server digests the Query (If applicable)
    • 10 ms – Our Server digests Post Data (If applicable)
    • 10 ms – Our Server Validates Query/Post Data
    • 20 ms to 50 ms – Our Server makes a database call – It is actually unknown how long it will take, but may be one of the slower aspects of our application
    • 30 ms – Our server Turns the data into servable HTML
    • 10 ms – Our server serves the HTML
  • We start waiting for the next connection

So each connection takes between 85 and 115 ms: 65 ms for the absolutely necessary work and 20 to 50 ms of wasted waiting. This also means a server like this can handle only about 9 to 12 connections a second, nowhere near 100. That 20 to 50 ms is what asynchronous programming is really about. I’m most likely overestimating servers because I love them so much, so let’s make a client-side example. Our client makes a few AJAX requests to our server for an awesome app:

  • 30 ms – The Page is rendered
  • 100 ms – We render the initial Map on a canvas
  • 100 ms – We make a call to our server getting favorite locations
  • 100 ms – We make an api call to a map application to get all possible locations
  • 50 ms – We Position the map to our current location
  • 20 ms – We Render favorite locations on the map
  • Wait for Click
    • 100 ms – We make a call to our server load that specific locations information
    • 300 ms – We animate the item click to display a popup
    • 20 ms – We render the location in the popup
  • Wait for Click

This is where asynchronous programming becomes all the more important. On the server it’s more about fear, scalability and just sexy programming. On the client side, every time we block, the user loses control, and every time the user loses control, the experience degrades immensely. In this example, startup takes about 400 ms (nearly half a second) and every click takes about 420 ms (nearly half a second again). These wait times are absolutely absurd from a user-experience standpoint. What asynchronous programming allows us to do is let events tell us when something should happen next. In its most basic form:

  • Event Loop – while(true){ scripts.forEach(script => script.execute()); }
  • Our Server Waits for a Connection – This is what will start the following and what we want our server to do 100% of the time (nearly impossible)
  • On Connection
    • 2 ms – Our Server processes the Domain Name/Path
    • 3 ms – Our Server digests the Query (If applicable)
    • 10 ms – Our Server digests Post Data (If applicable)
    • 10 ms – Our Server Validates Query/Post Data
    • 10 ms – make database call
      • on return (10 to 40 ms)
        • 30 ms – Our server Turns the data into servable HTML
        • 10 ms – Our server serves the HTML
  • We start waiting for the next connection

We have now split a connection into two pieces: 35 ms to process the request and start the database call, then 40 ms to send the response once the data comes back. The wait itself is still there, but it no longer blocks the server.
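To see why splitting a connection helps, here is a toy simulation, assuming a single-threaded loop with a virtual clock and no real I/O; createLoop, work, io and handleConnection are hypothetical names, and the 35/40 ms costs are the made-up figures from the list:

```javascript
// Toy single-threaded event loop with a virtual clock (illustration only).
function createLoop() {
  let now = 0;
  const pending = []; // I/O completions as { at, fn }, earliest first
  const log = [];
  return {
    log,
    // Synchronous work blocks the loop for `ms` virtual milliseconds.
    work(label, ms) {
      now += ms;
      log.push(`${now}ms ${label}`);
    },
    // Start an I/O call; `fn` becomes runnable `ms` later.
    io(ms, fn) {
      pending.push({ at: now + ms, fn });
      pending.sort((a, b) => a.at - b.at);
    },
    // Run completions in order, idling only when nothing is runnable yet.
    run() {
      while (pending.length) {
        const next = pending.shift();
        now = Math.max(now, next.at);
        next.fn();
      }
      return now;
    },
  };
}

// The async server steps: 35 ms up front, then 40 ms once the database answers.
function handleConnection(loop, id, dbMs) {
  loop.work(`#${id} parse + validate + start db call`, 35);
  loop.io(dbMs, () => loop.work(`#${id} render + serve`, 40));
}

const loop = createLoop();
handleConnection(loop, 1, 10);
handleConnection(loop, 2, 10);
const total = loop.run(); // 150
```

Two connections with a 10 ms database wait finish at 150 ms, versus 2 × 85 = 170 ms if each connection blocked through its database call; the gap widens as the database gets slower.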

  • 30 ms – The Page is rendered
  • Wait for both
    • 10 ms – We make a call to our server getting favorite locations
    • 10 ms – We make an api call to a map application to get all possible locations
    • On Return (100 ms)
      • 50 ms – We Position the map to our current location
      • 20 ms – We Render favorite locations on the map
  • 100 ms – We render the initial Map on a canvas
  • Wait for Click
    • 10 ms – We make a call to our server load that specific locations information
      • On Return (100 ms)
        • 20 ms – We render the location in the popup
    • 10 ms – Start animation
      • On Finish (300 ms)
        • That’s it
  • Wait for Click

Here we get far larger speed increases: startup now takes only 150 ms up front and 70 ms when the data comes back, and a click uses only 20 ms up front and 20 ms when the AJAX call returns. The animation is essentially there just to mask the AJAX call anyway. These are big differences:

85 compared with 35 + 40

400 and 420 compare with 150 + 70 and 20 + 20

The split-ups are really important as well, since every time the work is broken up, the application gets a chance to do other tasks.

So Async is Perfect, no problemo

Not exactly… Similar to functional programming (which will likely be a different topic), this makes things fast (and in functional programming, arguably more reliable and predictable). However, the way you have to design your application becomes a little stranger. Right off the bat, it’s important we talk about threading.

Threading – The Building block of Async

In the async programming model, what generally happens is that a separate worker thread receives work, does it, then sends the result back (basically a function call). So if we are to do this raw dog, this is what we are looking at:

var worker = new Worker("./Path/to/a/script");

worker.onMessage = doTheRest;

worker.sendMessage(input)

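This is a basic model for workers. We create a thread, listen for when it returns data to us, and give it an input to work on. (For readability I’m using a simplified interface here; the real Web Worker API uses worker.onmessage, worker.postMessage() and worker.terminate(), and delivers the payload as event.data.) However, multiple scripts/modules/whatever you want to call them will likely be using this one worker. Something like an AJAX call is common to a ton of applications, and we don’t know who will get what.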

setTimeout(function(){
  worker.onMessage = doTheRestOne;
  worker.sendMessage(inputOne);
}, Math.random()*1000)

setTimeout(function(){
  worker.onMessage = doTheRestTwo;
  worker.sendMessage(inputTwo);
}, Math.random()*1000);

Which one happens first? Will doTheRestTwo replace the handler before inputOne has finished, so that doTheRestTwo receives inputOne’s result? As a result, we need to think about how to keep the worker safely reusable.
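Here is that problem made deterministic, assuming a fake worker whose replies are delivered only when flush() is called; createFakeWorker is a hypothetical stand-in using the same simplified onMessage/sendMessage interface as above, not the Web Worker API:

```javascript
// Fake worker: replies are queued and only delivered when flush() is
// called, which makes the timing deterministic.
function createFakeWorker(compute) {
  const queue = [];
  const worker = {
    onMessage: null,
    sendMessage(input) {
      queue.push(input);
    },
    flush() {
      while (queue.length) worker.onMessage(compute(queue.shift()));
    },
  };
  return worker;
}

const results = [];
const worker = createFakeWorker((n) => n * 2);

worker.onMessage = (output) => results.push(["one", output]);
worker.sendMessage(1);
// A second caller replaces the shared handler before the first reply arrives...
worker.onMessage = (output) => results.push(["two", output]);
worker.sendMessage(2);

worker.flush();
// ...so both replies go to handler "two": [["two", 2], ["two", 4]]
```

With one shared handler slot, whoever assigned last receives every reply.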

Object Oriented Events – The XMLHttpRequest standard

First I will create a rather unoptimized class

function OurWorkerClass(){
  this.worker = new Worker("path/to/a/script");
  this.worker.onMessage = function(packet){
    if(packet.error){
      return this.errorFn(packet.error); 
    }
    this.finishFn(packet.output);
    this.worker.destroy();
  }.bind(this);
}

OurWorkerClass.prototype.onFinish = function(fn){
  this.finishFn = fn;
}

OurWorkerClass.prototype.onError = function(fn){
  this.errorFn = fn;
}

OurWorkerClass.prototype.doWork = function(input){
  this.worker.sendMessage(input);
};

This is unoptimized since we are creating a worker and a closure for every instance, but it shows how the thing works:

var worker = new OurWorkerClass();
worker.onError(handleError);
worker.onFinish(function(output1){
  var otherWorker = new AnotherClass();
  otherWorker.onError(handleError);
  otherWorker.onFinish(function(output2){
    var thirdWorker = new thirdClass();
    thirdWorker.onError(handleError);
    thirdWorker.onFinish(finished);
    thirdWorker.doWork(output2);
  });
  otherWorker.doWork(output1);
});
worker.doWork(input);

I would say the framework muddles the code: there is much more initialization than step-by-step logic. Ugly stuff.

Enter Callbacks

var worker = new Worker("./Path/to/a/script");

var pendingWork = {};

function doWork(input, callback){
  var id = Date.now() + Math.random().toString();
  pendingWork[id] = callback;
  worker.sendMessage({id: id, input: input});
}

worker.onMessage = function(packet){
  var id = packet.id;
  var error = packet.error;
  var output = packet.output
  pendingWork[id](error, output);
  delete pendingWork[id];
}

This is a basic callback model for workers. To use it, we call the doWork function with an input and a callback, and it will correctly notify us which work did what. In practice, however, this is what it turns into:

doWork(input1, function(err1, output1){
  if(err1) return finished(err1)
  doOtherWork(output1, function(err2, output2){
    if(err2) return finished(err2);
    thirdWork(output2, function(err3, output3){
      if(err3) return finished(err3);
      fourth(output3, function(err4, output4){
        if(err4) return finished(err4);
        finished(void 0, output4);
      });
    });
  })
});

There’s the argument Ryan Dahl made about creating separate named functions to avoid this. This actually isn’t a bad idea in general, since every function you define inside another function is created anew each time the outer function runs, instead of being created once and referenced. This is what it looks like, though:

doWork(input, callback1.bind(void 0, finished));

function callback1(finished, err, output){
  if(err) return finished(err);
  doOtherWork(output, callback2.bind(void 0, finished));
}

function callback2(finished, err, output){
  if(err) return finished(err);
  thirdWork(output, callback3.bind(void 0, finished));
}

function callback3(finished, err, output){
  if(err) return finished(err);
  fourthWork(output, finished);
}

And this only works because JavaScript hoists function declarations to the top of their scope, so they can be used before they appear. In my humble opinion, this is fugly.
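The id-based dispatch from the callback model above can be checked with a fake worker that answers out of order. createFakeWorker and deliver() are hypothetical; using a counter instead of Date.now() + Math.random() for the id is my choice, not a requirement:

```javascript
// Fake worker that can answer queued messages in any order:
// deliver(i) replies to the i-th message still waiting.
function createFakeWorker(compute) {
  const queue = [];
  const worker = {
    onMessage: null,
    sendMessage(packet) {
      queue.push(packet);
    },
    deliver(i) {
      const packet = queue.splice(i, 1)[0];
      worker.onMessage({ id: packet.id, error: undefined, output: compute(packet.input) });
    },
  };
  return worker;
}

const worker = createFakeWorker((n) => n * 10);
const pendingWork = {};
let nextId = 0;

function doWork(input, callback) {
  const id = String(nextId++); // unique within this page, no randomness needed
  pendingWork[id] = callback;
  worker.sendMessage({ id, input });
}

// One shared handler routes each reply to the callback stored under its id.
worker.onMessage = function (packet) {
  const callback = pendingWork[packet.id];
  delete pendingWork[packet.id];
  callback(packet.error, packet.output);
};

const results = [];
doWork(1, (err, output) => results.push(["one", output]));
doWork(2, (err, output) => results.push(["two", output]));
worker.deliver(1); // the second request finishes first...
worker.deliver(0);
// ...and each reply still reaches its own callback: [["two", 20], ["one", 10]]
```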

Promises – One of the many gifts jQuery popularized

Promises are one of the greatest things that has ever happened, I assure you. But they aren’t too friendly from a speed/memory perspective, according to many Node contributors.

var availableWorkers = [];
function getWorker(){
  if(availableWorkers.length){
    return availableWorkers.shift();
  }
  return new Worker("path/to/our/script");
}

function finishedWorker(worker){
  availableWorkers.push(worker);
}

function doWork(input){
  var worker = getWorker();
  return new Promise(function(res, rej){
    worker.onMessage = function(packet){
      finishedWorker(worker);
      if(packet.error) return rej(packet.error);
      res(packet.output);
    };
    worker.sendMessage(input);
  });
}
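To try the pooled Promise version without real threads, here is a runnable sketch assuming a fake worker that replies via setTimeout and whose work is adding one (negative inputs fail, so rejections can be shown); createFakeWorker is hypothetical:

```javascript
// Fake worker: setTimeout stands in for the worker thread.
function createFakeWorker() {
  const worker = {
    onMessage: null,
    sendMessage(input) {
      setTimeout(() => {
        worker.onMessage(
          input < 0 ? { error: new Error(`negative input: ${input}`) } : { output: input + 1 }
        );
      }, 0);
    },
  };
  return worker;
}

// The worker pool from above.
const availableWorkers = [];
const getWorker = () => availableWorkers.shift() || createFakeWorker();
const finishedWorker = (worker) => availableWorkers.push(worker);

function doWork(input) {
  const worker = getWorker();
  return new Promise((resolve, reject) => {
    worker.onMessage = (packet) => {
      finishedWorker(worker); // hand the worker back before settling
      if (packet.error) return reject(packet.error);
      resolve(packet.output);
    };
    worker.sendMessage(input);
  });
}

doWork(1).then((result) => console.log(result)); // prints 2
```

Because each worker handles one job at a time, reassigning onMessage on a pooled worker is safe here.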

Perhaps I’m muddling these examples with too much worker code. It’s just a lot of fun. Regardless, this is what it turns into:

doWork(input)
  .then(doOtherWork)
  .then(thirdWork)
  .then(fourth)
  .catch(handleError);

Sexy, clean, beautiful. Really gorgeous, in my humble opinion. However, things aren’t always so clean. See example B:

doWork(input).then(function(output1){
  var p = doOtherWork(output1);
  p.catch(handleSpecialError);
  return p.then(function(output2){
    return doOneWith2Arguments(output1, output2);
  });
}).then(doThird.bind(void 0, input))
.then(function(output3){
  return doFourth(output3, input);
}).then(function(output4){
  return Promise.all([
    doFifthA(output4),
    doFifthB(input)
  ]).then(function(outputs){
    return finishFifth(outputs[0], outputs[1]);
  });
}).catch(handleError);

Once we start customizing our catches and arguments, things start getting weird, and it can become quite difficult to figure out what the hell is going on. On line three, that catch only handles a side branch: the rejection from p still travels down the main chain, so handleError runs as well. doThird receives the output of doOneWith2Arguments and also takes input as its first parameter. For doFourth we need to pass input as the second argument. The fifth step runs two pieces of work side by side. So what are we supposed to do?

Generators – Not made for Async, but looks like it

Going Async With ES6 Generators

This is what the example above looks like with generators:

runner(function* main(){
  try{
    var output1 = yield doWork(input);
    var output2;
    try{
      output2 = yield doOtherWork(output1);
    }catch(e){
      handleSpecialError(e);
      return;
    }
    var output3 = yield doOneWith2Arguments(
      output1,
      output2
    );
    var output4 = yield doThird(input, output3);
    var output5 = yield doFourth(output4, input);
    var outputs = yield Promise.all([
      doFifthA(output5),
      doFifthB(input)
    ]);
    var finaloutput = finishFifth(
      outputs[0],
      outputs[1]
    );
    finished(finaloutput);
  }catch(e){
    handleError(e);
  }
})

This is almost the holy grail, what we’ve been waiting for: something that looks like what it should be. It’s a crazy thought, right? A program being sequential and effective? Wild stuff really. Unfortunately, the generator still has to be wrapped in a runner function that feeds it promise results (or used via promises or callbacks).
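The post never defines runner. A minimal sketch of one, in the spirit of libraries such as co (the implementation is my assumption): it resumes the generator with each yielded promise’s value, or throws the rejection back in so that try/catch inside the generator works:

```javascript
function runner(genFn) {
  return new Promise((resolve, reject) => {
    const gen = genFn();
    function step(method, arg) {
      let result;
      try {
        result = gen[method](arg); // gen.next(value) or gen.throw(error)
      } catch (err) {
        return reject(err); // the generator did not catch it
      }
      if (result.done) return resolve(result.value);
      // Non-promise values are wrapped, so `yield 5` also works.
      Promise.resolve(result.value).then(
        (value) => step("next", value),
        (error) => step("throw", error)
      );
    }
    step("next", undefined);
  });
}

// Usage: yield promises, get their values back.
runner(function* () {
  const a = yield Promise.resolve(1);
  const b = yield Promise.resolve(a + 1);
  return a + b;
}).then((sum) => console.log(sum)); // prints 3
```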

Await – The True Holy Grail

https://jakearchibald.com/2014/es7-async-functions/

Basically it’s the above, except that await is built into the language: no runner is needed, and it works on anything that returns a promise (inside an async function). It will be glorious.
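For comparison, here is a trimmed version of the generator example written with async/await, which became standard JavaScript in ES2017. The step functions are hypothetical stubs:

```javascript
// Stand-ins for doWork, doOtherWork, doFifthA and doFifthB.
const later = (value) => new Promise((resolve) => setTimeout(() => resolve(value), 0));
const doWork = (x) => later(x + 1);
const doOtherWork = (x) => (x < 0 ? Promise.reject(new Error("special")) : later(x * 2));
const doFifthA = (x) => later(x + 100);
const doFifthB = (x) => later(x - 100);

// Same shape as the generator version, but no runner is needed.
async function main(input) {
  try {
    const output1 = await doWork(input);
    let output2;
    try {
      output2 = await doOtherWork(output1);
    } catch (e) {
      return `special: ${e.message}`; // the special error stops here
    }
    // Two independent pieces of work run side by side.
    const [a, b] = await Promise.all([doFifthA(output2), doFifthB(input)]);
    return a + b;
  } catch (e) {
    return `error: ${e.message}`;
  }
}

main(1).then((result) => console.log(result)); // prints 5
```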

So, this post is over, right?

Well, yes and no. Let’s go back for a second. Async allows us to handle work in a separate thread and continue execution without blocking the event loop. That is the important thing: blocking. If it weren’t for blocking the main thread, there would be no problem. But as GPU processing becomes easier to use and CPUs go from 64-bit single-core to 64-bit quad-core, we start seeing the opportunity to make the most of what we have.

What if events spawned a new thread?

Let’s look at our server example:

  • Wait for Connection
    • Create a new thread (or retrieve one from the pool)
    • Provide the thread the connection
  • Wait for Connection

The problem here is that database and HTTP calls would require waiting inside threads, probably leaving 20 to 30 threads running at once and slowing everything down.

Let’s look at a client example:

  • On Click
    • CSS Animations (GPU Bound)
    • Ajax Call -> dom manipulations
      • Next Animation Frame Write dom to GPU

CSS animations are fine, but DOM manipulations are global. This means there would need to be a global thread that all the others can mutate.

So the issues would be

  • Mutability of Shared Resources (Dom specifically)
  • There may be a situation where there are more threads than necessary running at once causing everything to slow down.

Lazy Everything: Dirty checking, Caching results, single iteration

One thing that is somewhat popular nowadays is lazy evaluation. Lodash implemented a form of it after a competitor (Lazy.js) was showing big promise for a takeover. However, JavaScript isn’t the source of all lazy evaluation: Haskell, Scala and other functional languages have been taking full advantage of it for a while. Things such as Node’s streaming API and lazy getters can give your application a big speed increase.
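A minimal sketch of lazy evaluation with plain generator functions (not the Lodash or Lazy.js API): values are pulled one at a time through the whole chain, so work stops as soon as enough results exist. map, filter and take are hypothetical helpers:

```javascript
// Nothing runs until a value is pulled, and each element goes through
// the whole chain before the next one is touched.
function* map(iterable, fn) {
  for (const x of iterable) yield fn(x);
}
function* filter(iterable, pred) {
  for (const x of iterable) if (pred(x)) yield x;
}
function* take(iterable, n) {
  if (n <= 0) return;
  for (const x of iterable) {
    yield x;
    if (--n <= 0) return; // stop pulling: upstream work ends here too
  }
}

let calls = 0;
const squares = map([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], (x) => {
  calls++;
  return x * x;
});
const firstTwoEvenSquares = [...take(filter(squares, (x) => x % 2 === 0), 2)];
// firstTwoEvenSquares is [4, 16] and calls is 4: elements 5..10 were never squared
```

An eager array.map(...).filter(...).slice(0, 2) would have squared all ten elements and built two intermediate arrays.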

Functional Programming Model

This is where I could go down another path, but here are the basics. Every time you assign the output of a function to a variable, you create a memory pointer or clone the value, and the same thing happens whenever you pass it as an argument. Hitting memory once instead of twice may increase your speed greatly, and referencing a property directly instead of through a pointer may also speed up your application. But beware, this may be what your application looks like:

lastFunction(
  firstFunctionCalled(),
  FifthfunctionCalled(
    ThirdFunctionCalled(
     SecondFunctionCalled()
    ),
    FourthFunctionCalled()
  )
)

This may start to become intuitive, but I usually think step by step instead of starting from what happens last. That being said, your application generally won’t mutate the global scope or its arguments, so this form of programming should be fine.

Closing thoughts

Async solves the waiting problem which is a very important problem indeed.

ORM’s Dreaming Big: pt 2 (The instance)

Previously we went over the Schema, which covers validation, indexes, population and schema types. Here we’ll go into what people will most likely be using: the instance. So, what is the instance?

An instance:

  • Holds values you wish to store or have retrieved
  • Can be created, requested, updated and deleted
  • Has properties which can match conditions

Basically, an instance is the actual values you want to store or retrieve. It’s probably the most important part of the database, since without it, well, you have nothing.

Generic yet Important Things

Callbacks

Callbacks can be handled in one of two ways: callback(err, obj) or Promises.

Constructor(ObjectId, function(err,instance){
  if(err) throw err;
  console.log("this is our instance", instance);
});

Constructor(ObjectId).then(function(instance){
  console.log("this is our instance", instance); 
}).catch(function(err){
  throw err;
});

This is meant to support whichever style you prefer.
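Supporting both styles from one implementation can be sketched like this, assuming the underlying operation returns a Promise; dualApi and the in-memory store are hypothetical:

```javascript
// Lets callers either pass a trailing Node-style callback(err, result)
// or use the returned promise.
function dualApi(fn) {
  return function (...args) {
    const callback = typeof args[args.length - 1] === "function" ? args.pop() : null;
    const promise = fn.apply(this, args);
    if (callback) {
      // Attaching both handlers also marks a rejection as handled.
      promise.then((result) => callback(undefined, result), (err) => callback(err));
    }
    return promise;
  };
}

// Example: a lookup against an in-memory store (an assumption for illustration).
const store = { abc: { property: "value" } };
const Constructor = dualApi((id) =>
  id in store ? Promise.resolve(store[id]) : Promise.reject(new Error(`no instance ${id}`))
);

Constructor("abc", (err, instance) => console.log("callback style", instance));
Constructor("abc").then((instance) => console.log("promise style", instance));
```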

Handles and Values

Instances are technically either an ObjectID handle or the actual values. Both have the same interface; however, with an ObjectID handle you do not have to load all of the values, and you do not have them all, while with a full instance you do. This is to support as much or as little IO as you like without having to change the interface.

Creating

Creating an instance should be as simple as creating an object in javascript

Standard – Construct and Save
var instance = new Constructor({
  property:"value"
});
instance.property2 = "value2";

instance.save(function(err){
  if(err) throw new Error("creating caused an error");
  console.log("finished creating");
});

Now, we haven’t gotten into “Constructors” or “Models” yet, but hopefully this sort of syntax is familiar to you. It’s simple: we want to create a new instance, so we construct it. Because values may be added or removed, the object is not saved right after it is created. Additionally, it’s important that saving is done asynchronously. We don’t know when or how the record will be saved, only that it will be saved.

Calling the Constructor – Less IO

When all of the values are already in the JSON object, constructing an instance first is a waste of time and resources.

Constructor({
  property:"value",
  property2:"value2"
}, function(err,objectid){
  if(err) throw new Error("error when creating");
  console.log("finished creating");
});

You may notice that the standard version’s callback receives no ObjectID but this one does. When you successfully save a constructed instance, its ObjectID is already set on it, and you already have an interface to interact with the instance, so there is no point in returning anything. When you call the Constructor directly, it gives you an ObjectID handle so that you have an interface to interact with. However, the handle will not have any properties in it, so I would suggest using instances if you want them.
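One way to let the same function act both as a constructor and as a one-shot save is to check new.target (standard since ES2015). A minimal sketch, assuming an in-memory Map as the store; Constructor, save and the string id scheme are hypothetical:

```javascript
// In-memory store standing in for a database.
const saved = new Map();
let nextId = 1;

function Constructor(values, callback) {
  if (new.target) {
    // Called with `new`: build an unsaved instance; nothing is stored yet.
    Object.assign(this, values);
    return;
  }
  // Called as a plain function: save immediately and hand back only the id.
  const id = String(nextId++);
  saved.set(id, { ...values });
  setTimeout(() => callback(undefined, id), 0);
}

// The standard route: after save(), the id is already on the instance,
// so the callback has nothing to return.
Constructor.prototype.save = function (callback) {
  if (!this.id) this.id = String(nextId++);
  saved.set(this.id, { ...this });
  setTimeout(() => callback(undefined), 0);
};
```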

Static Method – Obvious

In addition, you may also call the static method, which returns the instance.

Constructor.create({
  property:"value",
  property2:"value2"
}, function(err,instance){
  if(err) throw new Error("error when creating");
  console.log("finished creating");
});

Retrieving

Generally, retrieving will work through the Constructor’s static methods. However, we are going for sugar here.

Standard – ObjectID Populating

If we have an ObjectID handle, we can populate it into an actual instance. It’s important to note there is a difference between an ObjectID value and an ObjectID handle. The value is the bytes/buffer that actually gets indexed. The handle has all the methods of a normal instance. Generally, all ObjectID values will be transformed into instances when retrieving an instance. In addition, anywhere you can use an ObjectID value, you can use an ObjectID handle.

objectidHandle.populate(function(err,instance){
  if(err) throw new Error("error in populating");
  console.log("populated the instance");
});

By ObjectID Value – Opposite
// Retrieving
Constructor(objectidValue,function(err,instance){
  if(err) throw new Error("error in retrieving");
  console.log("retrieved the instance");
});

The above is simple: we call our constructor with the ObjectID and it returns an instance. This is the exact opposite of the earlier case, where we send in some values and receive an ObjectID handle; here we retrieve the instance based on the handle or value.

Static Method – Obvious
// Retrieving
Constructor.get(objectidValue,function(err,instance){
  if(err) throw new Error("error in retrieving");
  console.log("retrieved the instance");
});

Updating

Standard – save

With your Constructed/Retrieved Object it is the exact same as it was before, simply save.

//Updating
instance.property = "new value";
instance.save(function(err){
  if(err) throw new Error("updating caused an error");
  console.log("finished");
});

Static Method – Obvious

You may also update just by calling the update method of your constructor

Constructor.update(ObjectId, 
  {property:"new value"},
  function(err, instance){
    if(err) throw new Error("error when using update");
    console.log("ran update");
});

Deleting

Deleting is the last part of our CRUD interface. As you might imagine, it’s more of the same.

Standard – destroy
// Deleting
instance.destroy(function(err){
  if(err) throw new Error("destroying caused an error");
  console.log("finished");
});

Static Method – Obvious

You may also delete just by calling the destroy method of your constructor.

Constructor.destroy(ObjectId, function(err){
    if(err) throw new Error("error when using destroy");
    console.log("ran destroy");
});
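To make the static methods concrete, here is a toy in-memory backend, assuming plain objects as instances and string ids in place of real ObjectIDs; defineModel and its storage are hypothetical, not an existing ORM’s API:

```javascript
function defineModel() {
  const rows = new Map();
  let nextId = 1;
  // Always answer asynchronously, like a real database would.
  const reply = (callback, err, result) => setTimeout(() => callback(err, result), 0);
  const missing = (id) => new Error(`no instance with id ${id}`);

  return {
    create(values, callback) {
      const id = String(nextId++);
      rows.set(id, { ...values });
      reply(callback, undefined, { id, ...values });
    },
    get(id, callback) {
      if (!rows.has(id)) return reply(callback, missing(id));
      reply(callback, undefined, { id, ...rows.get(id) });
    },
    update(id, changes, callback) {
      if (!rows.has(id)) return reply(callback, missing(id));
      Object.assign(rows.get(id), changes); // only the given properties change
      reply(callback, undefined, { id, ...rows.get(id) });
    },
    destroy(id, callback) {
      if (!rows.delete(id)) return reply(callback, missing(id));
      reply(callback, undefined);
    },
  };
}

const Model = defineModel();
Model.create({ property: "value" }, (err, instance) => {
  if (err) throw err;
  Model.update(instance.id, { property: "new value" }, (err2, updated) => {
    if (err2) throw err2;
    console.log(updated.property); // prints new value
  });
});
```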

 Property Setting and Getting

Digestors and Verbose

All properties on an instance are actually getter and setter functions.

Object.defineProperty(instance, "propertyname", {
  get: function(){
    return Schema.propertyname.verbose(
      instance._rawValues.propertyname
    );
  },
  set: function(v){
    var ds = Schema.propertyname.digestors;
    var l = ds.length;
    var vv = void(0);
    for(var i=0;i<l;i++){
      vv = ds[i](v);
      if(typeof vv != "undefined") break;
    }
    if(i===l){
      throw new Error("cannot digest " + v);
    }
    instance._rawValues.propertyname = vv;
  }
});

For the getter we are returning the verbose value. For the Setter, we are digesting the value to its raw type.

Marking Dirty Properties and resetting

The first thing we can do is mark dirty properties. This ties in with the digestors and getters above: when setting a value, we also record whether the property is dirty, so that only the dirty values are actually updated.

set: function(v){
    var vv = Schema.property.digest(v);
    if( vv == instance._initvalues.property ){
      delete instance._dirty.property
    }else{
      instance._dirty.property = vv;
    }
    instance._values.property = vv;
   }

Instance.prototype.save = function(cb){
  return Instance.update(this.id,this._dirty,cb);
}

Instance.prototype.reset = function(){
  for(var i in this._dirty){
    // restore the initial value and drop the pending change
    this._values[i] = this._initvalues[i];
    delete this._dirty[i];
  }
}
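A runnable sketch of the same dirty-tracking idea, leaving out the digestors and assuming the storage call is a plain update(changes, callback) function; createInstance is a hypothetical name. Here reset() deletes dirty entries outright, so a later save() never sends stale values:

```javascript
function createInstance(initial, update) {
  const initValues = { ...initial }; // last known stored values
  const values = { ...initial };     // current values
  const dirty = {};                  // property -> changed value
  const instance = {
    // Send only the dirty properties; on success they become the baseline.
    save(callback) {
      update({ ...dirty }, (err) => {
        if (!err) {
          Object.assign(initValues, dirty);
          for (const key of Object.keys(dirty)) delete dirty[key];
        }
        callback(err);
      });
    },
    // Throw away unsaved changes.
    reset() {
      for (const key of Object.keys(dirty)) {
        values[key] = initValues[key];
        delete dirty[key];
      }
    },
    isDirty: (key) => key in dirty,
  };
  for (const key of Object.keys(initial)) {
    Object.defineProperty(instance, key, {
      enumerable: true,
      get: () => values[key],
      set: (v) => {
        values[key] = v;
        if (v === initValues[key]) delete dirty[key]; // back to the stored value
        else dirty[key] = v;
      },
    });
  }
  return instance;
}

const sent = [];
const instance = createInstance({ a: 1, b: 2 }, (changes, callback) => {
  sent.push(changes);
  callback();
});
instance.a = 5; // dirty
instance.b = 2; // same as the stored value, so not dirty
instance.save(() => {}); // sends only { a: 5 }
```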

 

Resync with the sender

At times you may want to ensure your instance exactly matches the one in the database, or wherever the instance came from. All you need to do is resync:

Instance.prototype.resync = function(cb){
  var _this = this;
  Instance(this.id,function(err,values){
    if(err) return cb(err);
    for(var i in values){
      _this[i] = values[i];
    }
    cb(void(0), _this);
  });
}

Listening for Updates

You may also use an event emitter that sends an “update” event with a property name and value:

Instance.prototype.syncTo = function(ee){
  var _this = this
  ee.on("update",function(prop,val){
    _this[prop] = val;
  })
};
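A sketch of syncTo using Node’s built-in EventEmitter (in a browser, any emitter with on/removeListener would do). The Instance class here is a hypothetical stand-in, and returning an unsubscribe function is my addition:

```javascript
const EventEmitter = require("events");

class Instance {
  constructor(values) {
    Object.assign(this, values);
  }

  // Copy every ("update", property, value) event onto this instance.
  // Returns a function that stops listening.
  syncTo(ee) {
    const onUpdate = (prop, val) => {
      this[prop] = val;
    };
    ee.on("update", onUpdate);
    return () => ee.removeListener("update", onUpdate);
  }
}

const feed = new EventEmitter();
const inst = new Instance({ property: "value" });
const stop = inst.syncTo(feed);
feed.emit("update", "property", "new value"); // inst.property is now "new value"
stop();
feed.emit("update", "property", "ignored"); // no listener any more, so no change
```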

Dom and QueryString Interactions

And of course, you will need some DOM interactions. Ideally, I would use available libraries such as qs, serializeObject and deserialize to make sure I don’t mess up. From there I would properly set either the values in the object or the values in the query string or form. It’s also possible to bind the instance to a form by using syncTo.

That is the instance

Perhaps there is too much sugar here. Perhaps not enough in the right areas. I’ve considered streaming as an alternative plan; however, in the end I believe a simple API is a good API. Perhaps I should prioritize a bit. What is definitely in: all of the static methods, ObjectID.populate, instance saving, instance destroying and using the Constructor as, well, a constructor. In addition, the DOM/query string aspects are pretty important, since without them we’re back at square one: a decent ORM that refuses to believe the DOM or URLs that don’t use JSON exist. Everything else is a bit up in the air.

ORMs Dreaming Big: Pt 1 (Schema)

So I came across Waterline. Waterline is unique in that it isn’t just an interface to SQL or MongoDB, but to anything*. Anything*. (*Anything that someone has written an adapter for.) Unfortunately, my experiences with it have led me to believe it is not a good fit for a full stack outside of Sails. I’ve written more about my issues with the framework here. Now, I’m asking for a lot there, so I don’t expect it to go anywhere. But it did send me on a three-day coding sprint, trying to develop my own ORM the way I wanted it. As I continued, I realized it is a ton of work for one person, and not all of the things I want are available to mooch off of. But the Idea… The Dream… That can live on…

What would it be used for?

This is one of the reasons why I wrote this post. Waterline is awesome. However, it left me wanting more and questioning whether supporting too much gives me less. As a result, these are the interfaces I believe are of utmost importance to be compatible with:

  • Memory – This was a proof of concept by them; however, it can be implemented in a fast manner. Libraries like Lazy.js chain operations lazily so the data is only iterated in a single loop. Additionally, supporting proper indexes can add even more speed. However, the point of working in memory is that you can create “collections” easily and beautifully, and query them as you would anything else.
  • LocalStorage – This is another client-side feature that can be implemented. LocalStorage is an interesting beast, but nonetheless quite manageable. What you would do is prefix every key with connection/model, where connection and model are the connection and model names. From there you would store indexes under connection/model/indexName, and probably each object in its own place, such as connection/model/ObjectID. This lets you avoid loading too much at once and retrieve objects asynchronously as you need them, instead of loading everything into memory and hoping all goes well.
  • HTTP – By providing a wrapper that creates HTTP calls, you can interface with your database as easily as if it were on the server. Of course, a server-side implementation is also necessary; however, I think that’s relatively simple in the long run. Perhaps that raises the question of creating “Users” that can interface with your ORM.
  • FileSystem – MongoDB is the standard, without a doubt. However, I’m a strong believer in diversity (when it’s convenient to say I am). As a result, creating a file-system, document-based framework doesn’t seem too far off or out of line. It would most likely be quite similar to localStorage, actually.
  • MongoDB – Mongo in many ways was the breakthrough database. Might as well still be able to interface with it.
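To make the LocalStorage key layout concrete, here is a sketch. It assumes a storage object with the Web Storage `getItem`/`setItem` shape; `MemoryStorage` is a hypothetical Map-backed stand-in so the example runs outside a browser (in a browser you would pass `window.localStorage`). Keeping the list of ids at the bare connection/model key is my own assumption, not part of the scheme described above.

```javascript
// Hypothetical Map-backed stand-in with the Web Storage method shape
function MemoryStorage(){ this.map = new Map(); }
MemoryStorage.prototype.getItem = function(key){
  return this.map.has(key) ? this.map.get(key) : null;
};
MemoryStorage.prototype.setItem = function(key, value){
  this.map.set(key, String(value)); // Web Storage only stores strings
};

// Every key starts with connection/model:
//   connection/model             -> JSON list of stored ObjectIDs (assumption)
//   connection/model/<ObjectID>  -> one JSON document
//   connection/model/<indexName> would hold index data (not shown)
function LocalStorageModel(storage, connection, model){
  this.storage = storage;
  this.prefix = connection + "/" + model;
}

LocalStorageModel.prototype.ids = function(){
  return JSON.parse(this.storage.getItem(this.prefix) || "[]");
};

LocalStorageModel.prototype.save = function(id, doc){
  this.storage.setItem(this.prefix + "/" + id, JSON.stringify(doc));
  var ids = this.ids();
  if(ids.indexOf(id) === -1){
    ids.push(id);
    this.storage.setItem(this.prefix, JSON.stringify(ids));
  }
};

// Loads one document on demand instead of the whole collection
LocalStorageModel.prototype.findById = function(id){
  var raw = this.storage.getItem(this.prefix + "/" + id);
  return raw === null ? null : JSON.parse(raw);
};
```

The Web Storage API itself is synchronous, so the asynchronous retrieval mentioned above would be a wrapper around calls like these. Note also that index names and ObjectIDs share the connection/model/ namespace in this layout, so they must never collide.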

Sugary Snacks

The Validator

The validator is without a doubt one of the most important features of any database. A validator serves not just the database but also the client side. When creating a form, being able to just hook in and apply a validator is without a doubt one of the sweetest things possible. Unfortunately, those validation parameters generally stay with the database and are not shared with the client. This is partly a good thing, since you don’t want users to see all of your internals; however, rewriting your models for client-side code is kind of a pain in the ass. Just tedious work. As a result, here are the first commandments:

  • A Validator can easily be hooked into any form
  • A Validator can easily be used to generate forms

The second part is obviously much more complicated, as you can see here and here. But we’re dreaming big here, right? No holds barred. Whatever we damn well please. And I would be pleased not to have to rewrite code for every single form I ever come up with.

As for creating it, here is a laundry list of features…

“Native” Types

I would prefer to keep the native types as simple as possible:

  • Number
  • Buffer
  • String
  • JSON
  • Any
  • Typed ObjectID – Can specify which Model(s) are allowed.
  • Typed Array – Can specify the type of the array’s elements (Any is also allowed)

Long, Date and other types are not supported because they end up being compiled to these native types anyway. The ObjectID is the only one that’s really different. Numbers, hashmaps (JSON objects) and ObjectIDs are the only types that cannot be evaluated as an array.

To use these:

prop1: Number, //You can provide the object class
prop2: "buffer", //You can provide a string specifying the type
prop3: ["string"], //You can create a typed array
prop4: {
  native: Object //You can specify the type explicitly
},
prop5: Framework.Types.ObjectID, //This specifies any other document
prop6: "objectid:modelname", //This specifies that you expect it to use another model
prop7: AModelClass, //This specifies that you expect to use that other model. Same as above
prop8: {
  native: "objectid",
  model: "modelname" //Specify the type explicitly
},
prop9: null, //Specifies anything
prop10: Framework.Types.Anything //Specifies anything as well

Additional Types

You may also use custom schematypes. However, custom schematypes will not have all the features that a validator expects. In addition, anything you would write in a custom schematype can also be written directly within the schema.

To use a custom schematype:

prop1: {
  native: CustomSchemaTypeClass
}
prop2: {
  native: "customschematypeclass"
}

If you provide a string, your validator will create a dependency on that schematype: if the schematype is not available in your framework, your model cannot be used until it is. The basic validators and defaults described below cannot be used with custom SchemaTypes; they are only available to the Schema. Additionally, you can provide custom options that override the SchemaType’s original options:

prop1: {
  native: "customschematypeclass",
  a_custom_option: "a value"
}
prop2: {
  native: CustomSchemaTypeClass({
     a_custom_option: "a value"
  })
}
Basic Validators
  • Required – Cannot be null or undefined
  • Final – After first created, cannot be set again
  • Unique – No two documents may share the value. This will also create an index.

These are simple and straightforward.
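One way the three basic validators could be checked is sketched below. `checkBasic` and its arguments are hypothetical; in particular, the `existing` Set stands in for what a real database would answer through the unique index.

```javascript
// Hypothetical check for the three basic validators.
//   rules:    {required: bool, final: bool, unique: bool}
//   oldValue: value currently stored (undefined for a new document)
//   existing: Set of values already stored for this property (used by unique)
// Returns the name of the failed rule, or null if the value passes.
function checkBasic(rules, newValue, oldValue, existing){
  if(rules.required && (newValue === null || newValue === undefined)){
    return "required";
  }
  // Final: once a value has been stored, it may not change
  if(rules.final && oldValue !== undefined && newValue !== oldValue){
    return "final";
  }
  // Unique: an unchanged value is not a conflict with itself
  if(rules.unique && newValue !== oldValue && existing.has(newValue)){
    return "unique";
  }
  return null;
}
```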

Default

At times you may want to provide a default. You can do so in three different ways:

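property:{
  native: [String],
  default: ["value"] //A plain value
},
property2:{
  native: [Number],
  default: function(){ //Synchronous: the return value is used
    return [Math.random()];
  }
},
property3:{
  native: [ObjectID],
  default: function(next){ //Asynchronous: pass the result to next(err, value)
    Query().find({something:value}).exec(next);
  }
}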
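A framework would have to tell these three forms apart. A minimal sketch, assuming the distinction is made on the function's declared arity (`fn.length`); `resolveDefault` is a hypothetical name:

```javascript
// Hypothetical resolver for the three `default` forms shown above:
//   1. a plain value             -> used as-is (arrays are copied)
//   2. a function with no params -> called synchronously, its return value used
//   3. a function(next)          -> asynchronous, reports via next(err, value)
// The two function forms are told apart by their declared arity (fn.length).
function resolveDefault(def, cb){
  if(typeof def !== "function"){
    // Copy arrays so documents never share one default array instance
    return cb(null, Array.isArray(def) ? def.slice() : def);
  }
  if(def.length === 0){
    var value;
    try {
      value = def();
    } catch(e){
      return cb(e);
    }
    return cb(null, value);
  }
  def(cb);
}
```

Arity detection is convenient but fragile: default parameters and rest parameters do not count toward `fn.length`, so a real implementation might prefer an explicit flag.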

Validators Available in Custom SchemaTypes

The following validators are available within your Schema as well as in SchemaTypes. SchemaTypes are interesting in that they can be extended indefinitely, yet every extension is still just a SchemaType. This works because of the following:

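var _ = require("lodash");

function CustomSchemaType(options){
  if(!(this instanceof CustomSchemaType)) return new CustomSchemaType(options);
  this.options = options || {};
}

//Calling extend (method name assumed) returns a new SchemaType of the same class
CustomSchemaType.prototype.extend = function(options){
  //Merge into a fresh object so the original options are not mutated
  options = _.merge({}, this.options, options);
  return new this.constructor(options);
};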

Simply put, you can extend, and extend, and extend away.

Custom Validators

Custom validators come in two flavors: synchronous and asynchronous. This will be a theme as we continue.

syncProperty:{
  native:Number,
  validator: function(value){
    return false;
  }
},
asyncProperty:{
  native:Number,
  validator: function(value,next){
    next(false);
  }
}
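Here is a sketch of how both flavors could be run through one code path, assuming (as the examples above suggest) that a validator declaring `(value, next)` is asynchronous and one declaring only `(value)` is synchronous. `runValidator` and `runAll` are hypothetical names.

```javascript
// Hypothetical runner: both validator flavors are normalized to a Promise of a boolean.
function runValidator(validator, value){
  if(validator.length < 2){
    // Synchronous: wrapping in a Promise also turns a thrown error into a rejection
    return new Promise(function(resolve){
      resolve(Boolean(validator(value)));
    });
  }
  // Asynchronous: the validator reports its verdict through next(result)
  return new Promise(function(resolve){
    validator(value, function(result){
      resolve(Boolean(result));
    });
  });
}

// Run every validator for a value; resolves true only if all of them pass
function runAll(validators, value){
  return Promise.all(validators.map(function(v){ return runValidator(v, value); }))
    .then(function(results){ return results.every(Boolean); });
}
```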
Value to Array Validation

The idea here is that you may want to compare a value against an array of values. Something like:

//BAD!
arrayCompareBad:{
  native:Number,
  validator: function(value){
    return [1,2,3,4].indexOf(value) !== -1; //A new array is created on every call
  }
},
//Good.
arrayCompareGood:{
  native:Number,
  in:[1,2,3,4]
}

The first version is slow because it creates a new array every time it runs. Instead, the idea is that you define the values beforehand:

  • In – Ensures that the value is in the values
  • Not In – Ensures the value is not in the values

You can provide these values up front or through a synchronous or asynchronous function. It’s important to note that any type can use this syntax. Enums have bothered me for quite some time. Any value can be compared to other values to ensure it is restricted to a certain subset. Enums used to apply only to strings; however, they can be applied to numbers, Buffers and, yes, even Arrays. Hashmaps and ObjectIDs are a different beast, however. Hashmaps are made of keys, so there is no point in attempting to give them an enum. ObjectIDs require that certain ObjectIDs already exist. Now, this can be done; however, at that point you would need to specify a query and do it asynchronously.
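A sketch of why defining the values beforehand pays off: the list can be compiled into a `Set` once, when the schema is built, and each check is then a single lookup. `compileMembership` is a hypothetical name, and `not_in` is an assumed spelling that follows the `not_more` key used further down.

```javascript
// Hypothetical compiler for the `in` / `not_in` rules.
// The Sets are built once; the returned function is what runs per validation.
function compileMembership(rule){
  var allowed = rule.in ? new Set(rule.in) : null;
  var forbidden = rule.not_in ? new Set(rule.not_in) : null;
  return function(value){
    if(allowed && !allowed.has(value)) return false;
    if(forbidden && forbidden.has(value)) return false;
    return true;
  };
}

var check = compileMembership({in: [1, 2, 3, 4]});
var noAdmin = compileMembership({not_in: ["admin"]});
```

`Set` compares by identity for objects, so Buffers and Arrays would need to be converted to a comparable key first (for example a hex string or JSON) before enums on them could work this way.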

Array to Array Validation
  • Any – True If At least One of Validator Array Values are present
  • All – False Unless all of the Validator Array Values are present.
  • More – True if there is more than just the Validator Array Values present
  • Not Any – False if at least one of the Validator Array Values is present
  • Not All – True unless all of the Validator Array Values are present
  • Not More – False if anything other than the Validator Array Values is present. Will also return true if empty.

You can use these like so:

property:{
  native: [String],
  any:["any","of","these"],
  not_more:["any", "of", "these", "and", "no","more"]
}
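The six array-to-array rules can be sketched as plain functions, under the reading given in the list above: "more" means the validated array contains at least one element outside the validator's array. `arrayChecks` and `validateArray` are hypothetical names.

```javascript
// Hypothetical implementations of the six array-to-array checks.
// `values` is the array being validated, `list` the validator's array.
var arrayChecks = {
  any: function(values, list){
    return list.some(function(item){ return values.indexOf(item) !== -1; });
  },
  all: function(values, list){
    return list.every(function(item){ return values.indexOf(item) !== -1; });
  },
  more: function(values, list){
    return values.some(function(item){ return list.indexOf(item) === -1; });
  }
};
arrayChecks.not_any = function(values, list){ return !arrayChecks.any(values, list); };
arrayChecks.not_all = function(values, list){ return !arrayChecks.all(values, list); };
arrayChecks.not_more = function(values, list){ return !arrayChecks.more(values, list); };

// Apply every rule named in a property definition, e.g. {any: [...], not_more: [...]}
function validateArray(values, definition){
  return Object.keys(arrayChecks).every(function(rule){
    return !definition.hasOwnProperty(rule) || arrayChecks[rule](values, definition[rule]);
  });
}

var definition = {
  native: [String],
  any: ["any", "of", "these"],
  not_more: ["any", "of", "these", "and", "no", "more"]
};
```

An empty array passes `not_more` automatically, since it contains nothing extra.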
Population and Save Overrides

At times you will want to populate your data from a source other than the database. Additionally, you will want to override how that property is stored after it’s been validated. An example would be “User.notifications”. Duplicating the data would be absurd, and if you are storing every ObjectID, you may run into numbers in the thousands. Instead, you can have that particular part populated on the fly:

image: {
  native: String,
  native_populate: Buffer,
  populate: function(storedvalue,next){
    fs.readFile(storedvalue,next);
  }
}

These hooks should probably also be available as streams. Something like this would also work:

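var Readable = require("stream").Readable;
var util = require("util");

function MyReadableStreamClass(storedvalue){
  Readable.call(this);
  this.storedvalue = storedvalue; //the value kept in the document, e.g. a file path
}
util.inherits(MyReadableStreamClass, Readable);
//A real implementation must also define MyReadableStreamClass.prototype._read
image: {
  native: String,
  native_populate: Buffer,
  populate: MyReadableStreamClass
}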

This will create the readable stream on the fly. It should be noted that different things will populate in different ways. As a result, while this may send raw data, another might send JSON.

Now, you may be populating data, but what happens when someone wants to save something?

var path = require("path");
var fs = require("fs");

image: {
  native: String, //What gets stored is the file name
  native_populate: Buffer,
  depopulate: function(args,next){
    var ext = path.extname(args.name); //e.g. ".png"
    var name = this.instance.id+"/image"+ext;
    fs.writeFile(name, args.buffer, function(e){
      if(e) return next(e);
      next(void(0), name); //name will be stored with the doc
    });
  }
}

//or

image: {
  native: String,
  native_populate: Buffer,
  depopulate: MyWritableStreamClass
}
Digestors (Constructor Overloading) and Verbosity

Your developers may want to use Moments as dates, yet still expect to be able to send a normal JavaScript Date to be stored. This is where digestors and verbosity come in:

var moment = require("moment");

date: {
  native: Number,
  digestor: [function(date){
    if(date instanceof Date)
      return date.getTime();
  }, function(time){
    if(typeof time == "number")
      return time;
  }, function(m){
    if(moment.isMoment(m)) //isMoment is moment's own type check
      return m.valueOf();
  }]
}
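How the digestor array might be applied is sketched below, assuming the rule is "the first digestor that returns something other than undefined wins, and if none does, the value is rejected". `digest` is a hypothetical name, and the Moment digestor is left out so the example runs without the moment package.

```javascript
// Hypothetical digest step: try each digestor in order and keep the first
// result that is not undefined. If none recognizes the input, reject it.
function digest(digestors, input){
  for(var i = 0; i < digestors.length; i++){
    var result = digestors[i](input);
    if(result !== undefined) return result;
  }
  throw new TypeError("No digestor accepted value: " + String(input));
}

// The Date and number digestors from above
var dateDigestors = [
  function(date){ if(date instanceof Date) return date.getTime(); },
  function(time){ if(typeof time === "number") return time; }
];
```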

Here we can see that a Date, a number or a Moment will each be turned into a valid number. As a result, you don’t have to worry about which one you set the date to. If you want to always get the date back as a Moment:

date:{
  native:Number,
  verbose:function(value){
    return moment(value);
  }
}

This will allow you to easily do whatever you want with the number without changing your database.

Schema Methods

Virtual Properties

Now, we’ve seen actual properties, but some properties are not stored; instead, they are derived from the instance itself:

var schema = new Schema(validations);
schema.virtual("virtual_property", function(){
  // getter
  return this.stringA +"-"+ this.stringB;
}, function(value){
  // setter
  value = value.split("-");
  this.stringA = value[0];
  this.stringB = value[1];
});
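Under the hood, a virtual like this could be applied to a model's prototype with `Object.defineProperty`. A minimal sketch; `applyVirtual` and `Doc` are hypothetical names, not part of any existing Schema API.

```javascript
// Hypothetical: install a virtual getter/setter pair on a prototype
function applyVirtual(proto, name, getter, setter){
  Object.defineProperty(proto, name, {
    get: getter,
    set: setter,
    enumerable: false,   // derived, so keep it out of for...in and JSON.stringify
    configurable: true
  });
}

function Doc(a, b){ this.stringA = a; this.stringB = b; }

applyVirtual(Doc.prototype, "virtual_property", function(){
  return this.stringA + "-" + this.stringB;
}, function(value){
  value = value.split("-");
  this.stringA = value[0];
  this.stringB = value[1];
});

var d = new Doc("red", "rose");
```

Because the property lives on the prototype and is non-enumerable, only the real properties end up in the serialized document.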

Schema Indexes

The last and probably the most important part is the indexes. Indexes are not, and should never be, available to a custom schematype. Additionally, because indexes can be so flexible, they bring up some interesting decisions. It’s important to note that you can index not only normal properties but also virtual properties. The available indexes are:

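var async = require("async");
var fs = require("fs");
var schema = new Schema(validations);

//Plain index on a property
schema.index("propertyname");
//Unique index
schema.index("uniqueproperty", "unique");
//Synchronous comparator: b - a sorts in descending order
schema.index("functionalproperty", function(a,b){
  return b - a;
});
//Asynchronous comparator: here, compares two file paths by file size
schema.index("callbackproperty", function(a,b,next){
  async.map([a,b],fs.stat,function(err,res){
    if(err) return next(err);
    next(void(0), res[0].size - res[1].size);
  });
});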
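For the in-memory case, an index with a comparator could be kept sorted on every insert using binary search. A sketch covering plain and synchronous-comparator indexes only (not unique or asynchronous ones); `SortedIndex` is a hypothetical name.

```javascript
// Hypothetical in-memory index: document ids ordered by one property,
// using natural ordering or a custom comparator like "functionalproperty" above.
function SortedIndex(property, compare){
  this.property = property;
  this.compare = compare || function(a, b){ return a < b ? -1 : a > b ? 1 : 0; };
  this.entries = []; // [{key, id}], kept sorted by key
}

SortedIndex.prototype.insert = function(id, doc){
  var key = doc[this.property];
  var lo = 0, hi = this.entries.length;
  // Binary search for the first position whose key sorts after `key`
  while(lo < hi){
    var mid = (lo + hi) >> 1;
    if(this.compare(this.entries[mid].key, key) <= 0) lo = mid + 1;
    else hi = mid;
  }
  this.entries.splice(lo, 0, {key: key, id: id});
};

SortedIndex.prototype.ids = function(){
  return this.entries.map(function(e){ return e.id; });
};

var byViews = new SortedIndex("views", function(a, b){ return b - a; });
byViews.insert("a", {views: 5});
byViews.insert("b", {views: 20});
byViews.insert("c", {views: 10});
```

Equal keys are inserted after existing ones, so documents with the same value keep their insertion order.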
Validate

Using validate is simple.

  • If the object is JSON – it will validate the JSON
  • If the object is a DOM element – it will validate the DOM element if it is a form. If it is not, it will throw an error.

This will be available from the model via Model.validate, which is basically Model.constructor.validate.bind(Model.constructor).

Finishing words

It’s important to note that the schema is simply a validator that also provides database indexes. You cannot make queries with it; it does essentially nothing except provide important information for storing the data. Ideally, you will want to do as much as you can with SchemaTypes so that you can reuse more code, while the indexes, virtual properties, required fields and so on are specific to each Schema. If you don’t like me dreaming big, well… To be honest… I believe dreaming big is part of the reason why I am here today. I see what I want to make and I go out and try to do it. And if I cannot, I flesh out the idea so much that I can look back on it and say “If only”.

Automated Mongoose: A Red Herring?

So for many months I have been attempting to automate the views and methods of Mongoose Schemas. However, the longer I attempt it, the more I realize how loose Mongoose can be. A small example is the SchemaTypes (which have little to no documentation). At this location we see an issue that another person and I have both run into. The maintainer isn’t very interested in fixing it, despite the behavior existing in every other schema type (his reasons are his own, not for me to accept or deny), so I have gone down a separate route:

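    //Hypothetical helper name; returns a type name for a Mongoose schema path
    function getPathTypeName(path){
      if(path.hasOwnProperty("caster")){
        return "Array"; //array paths expose a caster for their elements
      }else if(typeof path.instance != "undefined"){
        return path.instance;
      }else{
        return path.options.type.name;
      }
    }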

It’s not a huge issue, but it results in a little frustration. Nonetheless, I’m finding there are many issues with the whole scenario. And I don’t mean in terms of SchemaTypes; I mean other issues. Among them:

  • Extended SchemaTypes: URLs are essentially Strings, however they must go through a different validation process. Should I extend the String SchemaType in order to ensure it has the same possibilities?
  • Faceted Searching: This is very important. When it comes down to finding exactly or approximately what you need, it’s nice to have a way to trim down the results. However, each SchemaType has its own MongoDB comparison or evaluation operators, and I cannot be sure which can be applied to which (unless I check for ancestry).
  • Different Properties are viewed differently: though the input may be the same, ensuring a title is displayed as a title and a URL is used as an href isn’t a given. This may seem obvious; however, I have been attempting for many months to ensure that there is no difference between routing and viewing. Mostly because I enjoy being a lazy DRY programmer.
  • Different Models use a different organization pattern. This is most apparent with map- and photo-oriented data, where nobody really cares about the text unless they click on something.
  • Almost all Models will use some sort of auxiliary index or Model. For example: you can have a user model. It has a name, email, password and role. Basic stuff. But then we want to add events. What have they created? What have they liked? In addition, we want to add an index on the number of views of a particular picture, but also compare those views to videos. This is where we start creating other things that are not attached to the original Model, yet whose information gets appended for the views’ purposes.
  • Terms and Conditions, TourGuide-ing and Multi-Page Methods. This is also pretty important: even though a person may have successfully authenticated, that does not mean they are good to go. However, how are we to know the next step in a multi-page method?
  • User Roles. What is the best way to document a user’s role and the role hierarchy, and to ensure we know who can do what in both routing and viewing?
  • Pretty URLs: Nobody wants to see /Items/3746982119433234. It’s ugly and unfamiliar. People would rather see /item/best_red_rose_bouquet
  • Model Index and Root: What do we do here? Give them a preview? tell them to do X, Y and Z?
  • Aggregating Content: How do we show the content aggregated? With a Schema?
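For the pretty URL point, the slug part at least is easy to sketch. `slugify` is a hypothetical helper that produces the /item/best_red_rose_bouquet style from a title; it uses only built-in string methods.

```javascript
// Hypothetical slug helper: lowercases, strips accents, and joins
// runs of any other characters with a single "_"
function slugify(title){
  return title
    .normalize("NFKD")                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")       // any run of non-alphanumerics becomes one "_"
    .replace(/^_+|_+$/g, "");          // trim leading/trailing underscores
}
```

Slugs are not guaranteed to be unique, so in practice the slug would live in its own unique-indexed property, with the ObjectID kept as the real key.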

There are many issues at hand, and it leads me to further understand how nice content management systems are. Not because they are bloated. Not because they are broken. Not because they don’t offer all the features the language of your choice has to offer. Not because they have ridiculous patterns for event emitting and caching, or because they don’t like anything more than the good ol’ POST. But because they solved those problems for you. They solve them by forcing you to do it the hard way. They solve them by giving you an excuse to complain and want better. They solve them by other people getting motivated and solving the problem through “plugins” or “modules”.

I want to automate Mongoose so badly. I can feel it’s at the tip of my fingers. Barely inches away. And yet I understand that even after I’m done with the beginnings, there is so much more you need to make sure people can do, or have access to the libraries you used to do it.

Timezones: The client, The server and The database

To start out, I want to say why I’m writing this post. I have recently been struggling with date/time handling in server-to-database interactions, and I had no idea why. My thoughts were “maybe I’m using the wrong timestamp”, “maybe I’m not setting the time to midnight”, and then I realized “maybe my server is in a different time zone than my database”. And this was very, very true.

By default, MySQL bases its time zone on the system time zone of the machine it runs on. That means if you’re in New York running a server, it will be based on New York. If London, it will be London. This is fine so long as your server is in the same time zone, and unfortunately for me, it doesn’t seem to be. WordPress has a few functions to work with, such as current_time() and the seemingly undocumented option time_zone. Except it’s useless when it is empty, which basically tells me UTC (the universal time for computers generally). PHP also returned UTC, which also doesn’t help me. In all, the two I hang out with the most (my clean-cut but simple friend WordPress and my old, disorganized but extremely intelligent friend PHP) just don’t see eye to eye with MySQL. So what do I do? Well, change MySQL’s time zone and change my server’s time zone to ensure everyone sees the same thing.

$wpdb->query("SET time_zone = '+0:00'");
date_default_timezone_set('UTC');

Based on this post and the PHP manual, I’ve found this is the ultimate solution to my problem. I was having such problems with time zones being in different places, and WordPress not helping much, that I’ve given up and decided the proper solution is just to set everything to UTC.

This isn’t what I necessarily want to do, though, because I’m forcing the user to experience my application in UTC. Why is this a problem? Well, user experience. However, if I’m going to change the website depending on the user’s time zone, I would need to find the person’s position in the world to find out what time zone they are in. And generally, users don’t just give away their time zones. So how am I supposed to do this?

I make all datetime-oriented aspects of my theme return UTC, and when I want to shift the date, either through XSLT or jQuery, I change it based on what I receive from JavaScript. It isn’t pretty. But I can make it pretty. And in this case, functionality is more important than looks.
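The client-side shift can be sketched in plain JavaScript, assuming the server sends each datetime as an ISO 8601 string ending in "Z". `utcToLocalParts` is a hypothetical helper; it takes the offset as a parameter so it can be tested anywhere, with the same sign convention as `Date.prototype.getTimezoneOffset()` (minutes to add to local time to get UTC, positive west of UTC).

```javascript
// Hypothetical helper: turn a UTC instant into wall-clock fields for a given offset.
function utcToLocalParts(isoUtc, offsetMinutes){
  var utcMs = Date.parse(isoUtc);
  if(isNaN(utcMs)) throw new RangeError("Not a valid date: " + isoUtc);
  // Shift the instant, then read it back with the UTC getters
  var shifted = new Date(utcMs - offsetMinutes * 60000);
  return {
    year: shifted.getUTCFullYear(),
    month: shifted.getUTCMonth() + 1,
    day: shifted.getUTCDate(),
    hour: shifted.getUTCHours(),
    minute: shifted.getUTCMinutes()
  };
}

// In a browser, take the offset from the date itself so daylight saving
// at that date is respected:
// utcToLocalParts(serverValue, new Date(serverValue).getTimezoneOffset());
```

For display alone, `toLocaleString()` or `Intl.DateTimeFormat` will also format a `Date` in the visitor's own time zone without any manual arithmetic.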