Failure and Compression

The last few days I was hammering away at some interesting ideas. They were about making any file into a 16 hexadecimal string.  The idea came about when I saw a post on hacker news about distributed systems and url standards for them. When it comes to distributed systems and sharing of files I’m of this opinion

  • You want the requestor able to get chunks from multiple computers and consolidate them into one – This also means that all the chunks know that they are a part of a whole and are able to point to the whole
  • You want the requestor to know whats coming to them – This can be metrics about the file like length, checksum, hash, etc.
  • Security is secondary to ensuring the system is working without hicups. Secure requests should be a manner of encrypted messeges with keys that are already available to each computer.

I want to take the second one a step further. I want to make the url so detailed that the computer can actually recreate the file from their tiny little hash! Impossible? Never! Here is the product of my work https://github.com/formula1/compression-failures

Technique 1: File attributes and filtering possibilities

So the first thing I went after was attempting to turn any file into a few 32 bit integers. This included obvious things like file length to ensure I have a finite number of possiblities and checksums to remove extremes and to limit my possibilities even more. From there its kind of free shooting. I figured my best next step is to seperate the file out into what I will call ‘patterns’. The way patterns work are as so

  • A pattern has a segment length – 010 has a length of 1, 001100 length of two, 000111 length of three, 1111 4, 00000 5, etc
  • A pattern has a repition count – 010 has 3 repetitions, 001100 also has 3, 000111 has 2, 1111 has 1 and 00000 has 1
  • A pattern must also either start with 1 or zero
  • A pattern ends when what would be the next repetition ends early or goes for too long.

Splitting the file up into patterns sounds like a swell idea right? well with the 64k byte file I used, it had around 10,000 patterns. However, more interestingly is that the segment lengths were at 9 and the repetitions were at 7. Now these two aspects filter the number of possibilties even more (which I did not mathematically calculate). Ideally, each of these attributes would get me closer and closer to finally understanding what the file actually contains. However, I realized this isn’t going to be nearly as simple as what I thought

  • given
    • 00110011
    • 11001100
  • can you tell the difference between their length, checksums, unique segments, unique repetitions, total repetitions and starts with 1 counts? yes you can!
    • 8, 4, 1, 1, 4, 0
    • 8, 4, 1, 1, 4, 1
  • what about
    • 010011000111
    • 001101000111
  • You cannot
    • 12, 6, 3, 1, 6, 0
    • 12, 6, 3, 1, 6, 0

So I had this brilliant idea! What if I can figure out the order? Not the actual count, just the order of largest to least. To do this I would check the difference between each pattern. If the pattern increased in segment length 1 if decreased 0. What could go wrong?

  • given
    • 010011000111
    • 001101000111
  • You can!
    • 12, 6, 3, 1, 6, 0, 11
    • 12, 6, 3, 1, 6, 0, 01
  • What about
    • 001101000111
    • 000111010011
  • You cannot
    • 12, 6, 3, 1, 6, 0, 01
    • 12, 6, 3, 1, 6, 0, 01

How about if I get the order of the highest and lowests? Technically I can do this indefinitely until I order them completely!

  • given
    • 001101000111
    • 000111010011
  • You can!
    • 12, 6, 3, 1, 6, 0, 01, 1
    • 12, 6, 3, 1, 6, 0, 01, 0
  • what about
    • 0000011111010011
    • 0000111101000111
  • You cannot
    • 12, 6, 3, 1, 6, 0, 01, 0
    • 12, 6, 3, 1, 6, 0, 01, 0

I would need ot be able to point out exactly how much is in each of those highest amounts. Perhaps I can count the number of unique differences? Nope, because both are unique. This was my snag. I stil don’t know how to handle it. I am also fully aware that the ordering may actually be far more data than I anticipate.

Technique 2: Prime numbers!! 😀

:C So the way I planned to do it was as so:

  • Turn the file into a Gigantic number
  • Boil the number down to prime numbers and exponents
  • If the prime number if too big, represent it as the ‘nth prime’ instead.
  • if the numbe rof factors are too long
    • find the closes prime number to the gigantic number
    • subtract the gigantic number from the prime
    • Do it again with the left over

What I would be left with is a sequence like this

mul(2,3,nth(7),add(nth(13), mul(5,7)))

Looks great right? I thought so too! I deliberately created the syntax so I can format anything into 4 bit operations. Unfortunately I didn’t realize how slow prime number generation could really be. I ended up creating my own because I needed to be constantly streaming them and or hold a binary list. However, the problem with prime generation is that they always need the previous numbers to go forward. Finding a prime is actually about finding what isn’t a prime and collecting the leftovers. This sounds straight forward but its actually pretty ugly and leaves me waiting for minutes at a time to handle 24 bit primes.

  • stack = [];
  • i = 2
  • do
    • yield i
    • stack.push({ value: i + i, jump: i })
    • while(stack[0].value == ++i)
      • var jump = stack.shift().jump
      • var value = i;
      • var index = -1;
      • do
        • value += jump
        • index = stack.find({ value : value });
      • while(isFound(index))
      • stack.insertAt(index, { value: value, jump: jump})
  • while(true)

This generates primes fast, don’t get me wrong. I was suprised at myself at how well it worked. Truelly. Not only that, I can take full credit (with inspirations from some seives). But if it doesn’t create primes, it finds what isn’t primes. If it can’t handle 24 bits, whats to say it can handle 10 bytes or even 1000 bytes? When I was writing the readme I decided I would write a bit about making workers and making it threaded. This is kind of a neat idea but its still not perfect considering each worker still must wait for its previous workers primes. as we get into huge numbers, that is less true because 2*current may often be 3 workers later. Another concept is using a less absolute prime number creator like a mersenne prime. These are easily calculateable and also can interact well with logs so Its possible I could speed up the algorithm to a huge degree. Instead of trying to find out if its prime. I check how far away from the next 2, If 1, I consider it a mersenne prime. Else, get the mersenne prime of the two before it. Multiply it a few times. Subtract the total. And do it again with the left overs. This seems just as good but prime numbers are pretty special. And as good as mersenne primes are, they will probably not always be good enough for my purposes.

What can I say about it?

What I love about working on projects like these is I tend to want to end them. Usually, I want to come to some conclusion like “it’s imposible” or “way too slow” but I always find a way to make it work. From there its just about implementing it or ignoring it. A project like this has huge uses but many of the uses I don’t have direct interaction with. Only potential. I do believe growing myself is important but I don’t want to lose it over a cute little experiment with big potential

Decentralized Name Ledger : Free Write

Whats worse than a javascript developer? A javascript developer with an unthemed wordpress blog, who wants to decentralize javascript.

So, package management is kind of a big deal. But for me it really became in the limelight with npm. I’m truelly amazed at how a tool like npm can speed up development so much. Turning my little project into something I can build anywhere! However, like the great torrent, decentralization is a possibility. Many node developers. Here, I went at length trying to fight the idea of a decentralized name ledger. Insistant that competition is the only way for a name ledger to work. Hypothetically that may be true but only if proven. Here I will attempt to understand the complexities involved in a blockchain style ledger.

So how do we do this?

What maintainence needs to be done? To start out, somebody needs to keep tabs of it. And not just one person, many people. Normally every individual does but realistically a few groups do. This involves

  1. Accepting New transactions – which may happen at an absurd rate
  2. Mining (Validating) the chain – involves heavy duty processing power
  3. Having the ledger available for distribution – Hypothetically, this can be done in chunks since each block can be infohashed and etc. But, realistically, I have no idea how the the algortihm works so it may be that the entire chain changes per transaction (Highly doubt it). (EDIT) Probably because for this to happen, the block cannot change inorder to ensure immutability. However, if it cannot change, it cannot point forward, only backward. Does blockchain try to point forward? It doesn’t need to technically

Someone needs to do this work. Arguably, the makers / Users / Investors of the block chain will but that means that our coin is… well… valuable. And as a maker, theres only so much I am willing to invest until I realize the money / work I put in will not be the same as I get out. I expect no one else to do the same unless they feel like waisting electricity for a place in internet history. However, many of these issue can be solved by…

  1. Using someone elses block chain

Enter Etherium. This is quite literally what it was made for. For other people to tag on their arbitrary nonsense. So… technically…

  1. Our Coin is “valuable”
  2. There exists a mining community
  3. There are individuals willing to distribute the ledger

And unless someone decides to sell big it will stay that way. Alright. Great. Now lets make a name ledger!

Ok, there exists examples. So EtherId decides to allow you to pay for time. Then you must pay for more time. Who do you pay? Because ethereum has a limit, am I really willing to dish out (now 5 dollars) up to 100 dollars for a name? can I sell it back?  Paying for time seems like it could end up in an endless battle. Lets organize these concerns

  1. does there need to be a cost for a name? – I would say yes. The last thing anyone needs is a spammer destroying a currency.
  2. Is the cost for a name static or can it rise/fall? – It should. There is no reason why a name should always cost 1 ether especially considering nobodies are meant to be publishing their packages happily.
  3. Where does my money go? Can I release the name and recouperate it? – If say no, I have sincere doubts about the viability of the system as a whole. At that point the currency is trying to keep their necks above water where it consumes from everyone else effectively causing deflation.
  4. Paying for time – Lets assume we own the name ‘hello-world’ which resolves to a package used by newbies world wide. Lets say I didn’t pay this month because I was broke. Someone will ill intentions swoops in and takes it. Now they never have to sell or give it back. There is no centralized authority to fight about this. There may be no public shaming if done anonymously. So long as this other individual has enough, they will always have control over it.

Lets trial and error some solutions!

  1. Names will increase in value the more that are taken within a given letter range
    1. Example
      1. For 1 letter names, there exists 36 possibilities.
      2. The first name may be free
      3. the second may cost a bit
      4. third more
      5. etc
      6. Each name’s maximum is the maximum number of Permutations for given length (name.length!)
    2. Good
      1. What this does is curbs spamming of names and rewards more lengthy unique names
      2. When done only done to individual accounts, it doesn’t bring into account when a spammer makes 1,000,000 accounts each registering one name
    3. Bad
      1. Puts a cost (though small at times) on registering names. This will prevent most people from ever registering since then they have to actually have ether
  2. The price of keeping a name increases as the bid to buy it increases
    1. Example
      1. I buy the name ‘big-money-no-wammies’ name at 0.01 ether
      2. Years go buy with not another dime spent
      3. Someone comes in and wants it and offers 0.02 ether
      4. I don’t want to sell but they other person impatiently waits
      5. The price of keeping the name is now 0.01 ether
      6. Another person comes in and sets the price at 0.005
      7. The price starts at 0.01 ether
      8. Another person comes in and makes a bid for 0.02 ether
      9. The price remains at 0.01 ether
      10. Another person comes and makes a bid at 100 ether
      11. The price for keeping the name is now 50 ether
      12. I cannot pay
      13. I recieve their 100 ether bid, I recieve all the money I’ve had to invest back
      14. All other bids persist
    2. Good
      1. People cannot squat on a name
      2. Its possible to crowdsource the purchase
      3. Most names do not need to maintain upkeep so people can keep them for free
      4. People cannot bully others quickly without taking a hit
    3. Bad
      1. This can turn into bullying. Forcing someone to continuously pay .1 is may still cause many people to lose their names
      2. ‘Fairness’ is not rewarded – I lose my name to someone forcing me to pay 10. I want it back so I use their 10 to make them pay. They dont pay so we get the name. They make a bid for 10 again. And the cycle continues where I must invest while they just force me to invest
  3. Rising and failing of the price of a name can be solved by making our own crypto currency! Oh wait…. Bad solution…
  4. Clients can ‘lockin’ a name to a specific user. That user can then point a new name to be considered what the old name was
    1. Example
      1. I get bullied out of ‘poop in my pants’
      2. People only trust me despite the fact the other person is in control. In their client that resolved the name, they ‘locked’ me to the name
      3. I create a new name that points to the old one.
      4. The next time they resolve my name, they then get redirected.

Ok, those solutions are fine but I still think that competitive centralization is a better form. for a couple of reasons

  1. Bullying can be avoiding by blocking IP addresses/deleting users and undoing whatever crap has happened. In our examples, we are trying to curb it but it still can exist
  2. We raise the barrier to entry- We are effectively pushing away people who want to try it and enabling those who want to find reasons to destroy it
  3. Finding out who is the true owner doesn’t need to happen through parsing possibly millions of transactions

All of these are very real and very bad problems.

Here Enters the “Free Market”

So instead of having a distributed ledger with no one source of truth. Lets try to build tools so that individuals don’t have to depend on one source of truth.

The Client Can
  • Use Mulitple Ledgers to resolve a name
    • Issue – This may or may not resolve to the same outcome
      • Solution – Its in the best interest for each to resolve to the same outcome
  • Trust one ledger over others when they conflict
  • Tell multiple ledgers that they wish to take a name, get rid of a name or transfer a name
    • Transfering implies the other user exists on all ledgers and are associated with a single unique identifier
  • Do research on a ledger to identify them as someone they want to support or not
A Ledger Can
  • Be a part of a network using a distributed ledger – This would be ideal if the market becomes crowded and/or to enable competition. In this manner, all shops can run. Some will be used more than others and some will have quality control tests
    • Issue – if adding to the ledger is a free practice, it may result in issues
    • Solution – since this is a distributed network rather then attempting to ‘be’ blockchain, ledger writes may be white listed
  • Deny access to the resource associated to the name for any of the following reasons
    • The person who had registered it had been removed from this ledger for reason X
    • The person who registered it has not associated it to anything
    • The result of the name does not pass the quality requirements of the ledger
    • They don’t feel like it
    • A perfectly valid reason that I cannot think of and they may not be able to either
  • Send a breif history of that name along with the result of resolving the name to prove validity of themselves or others
Everything else is Survival of the Fittest

 

So Whats the point?

Well, to be fair, if there are a million name resolvers running around it can get pretty ugly. But additionally, a completely inhuman approach leads to the inability to police it except from a game theory perspective. I think long term the game theory approach is probably the best but long term is not short term and short term I believe the free market approach is far more appealing. We avoid the headaches of ‘Whoops, didn’t think of that’ and allow people who did think of it to implement it. Additionally, I believe people’s priorities will change over time. Now its just about getting a name. In the future, a testing suite might be necessary. In the far distant future, you may be forced only to use certian technologies. I do believe its in a ledgers best interest to use a distributed ledger. But I don’t think that making it trustless is necessarilly possible. Making it safe is much more possible