Mihai Stancu

Notes & Rants

Liquid networks — 2015-08-26

Chemical interaction

Gases are a chaotic and dynamic state of matter. In a gas, elements can meet, mix, and react quickly. Heat, pressure, or a newer, more attractive reaction can break the newly formed compound apart in a very short span.

Solids are orderly and still. In a solid, elements cannot move, so reactions can take place only on the fringes. Unmoved, the newly formed compound won't enter any new reactions.

Liquids, on the other hand, allow mixing and precipitation, and thus reactions which take longer to catalyze get a chance to occur. The newly formed compound can drift and enter new reactions.

Human interaction

A human network with the properties of a gas is like a boiler room or the open floor of a trade market with no walls: people talk to many others under heat and pressure, trying to find the most attractive deal. Things get done and undone quickly — there's little time for creativity.

A human network with the properties of a solid is like a rigid system of departments with walls between them. People mostly talk among themselves and only communicate with certain other departments, according to the normal workflow. Things get done a certain way and only that way — no room for creativity.

A human network with the properties of a liquid is an open space with just the most basic rules to avoid being disruptive and just the right amount of freedom to let ideas flow, mix, and combine in several stages before a good one can finally surface.

Everything is a file —

UNIX invented it, BSD and Linux gave it to the world

Everything is a file is a very successful paradigm in the UNIX/Linux communities, one which has allowed the kernel to simplify and uniformize how it uses devices by exposing them to the user as files. Every file is treated as a bag of bytes, and reading from or writing to one is straightforward.
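
A quick illustration: because devices and kernel state are exposed as files, the ordinary file tools work on them unchanged.

# /dev/urandom is a device, yet it reads like any other file
head -c 16 /dev/urandom | od -An -tx1

# /proc exposes kernel state as plain text files
grep "model name" /proc/cpuinfo | head -1

# writes work the same way: /dev/null is a file that discards everything
echo "discarded" > /dev/null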

Besides actual data storage, a lot of fruitful exaptation has been derived from this paradigm and from the UNIX/Linux file system conventions:

  • Files, folders, symlinks, hardlinks, named pipes (fifo), network pipes, devices
  • Applications which handle readable files and can work together well (ex.: lines separated with \n, columns separated with \t): less/more, tail, head, sort, split, join, fold, par, grep, awk, column, wc, sed, tee (see the pipeline sketch after this list)
  • Configuration management
  • Application storage
  • Library registry
  • Disk cloning (see the dd sketches after this list)
    • Disk images for backup (dd)
    • Smaller than disk size images (skip unused space)
    • Compress disk images on the fly without storing the uncompressed version (dd | gzip)
    • Restoring disk images from backups
    • Disk recovery — HDDs, CDs, DVDs, USB sticks etc. — when they have bad sectors or scratches
    • Creating bootable USB sticks from a raw image file or an ISO (dd again)
  • Virtual filesystems
    • Mounting a raw image file or ISO as a filesystem
    • Mounting archives and compressed archives as a filesystem (tar, gz, bz, zip, rar)
    • Network filesystems (Samba, NFS) look just like normal folders
    • Using various network protocols as filesystems: HTTP, FTP, SSH
  • Searching everywhere (find, grep, sed)
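
A few hedged shell sketches of the bullets above, as run on Linux (the device /dev/sdb, the file names and the mount point are placeholders; double-check the of= target before running dd):

# clone a whole disk into an image file
sudo dd if=/dev/sdb of=backup.img bs=4M

# compress on the fly, never storing the uncompressed image (dd | gzip)
sudo dd if=/dev/sdb bs=4M | gzip > backup.img.gz

# restore the compressed image back onto the disk
gunzip -c backup.img.gz | sudo dd of=/dev/sdb bs=4M

# write a raw image or ISO to a USB stick, making it bootable
sudo dd if=image.iso of=/dev/sdb bs=4M

# mount an ISO as a read-only filesystem
sudo mount -o loop image.iso /mnt/iso

# a typical pipeline over \n-separated rows: the tools compose freely
grep "ERROR" app.log | sort | uniq -c | sort -rn | head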

Plan9 from Bell Labs made it better

Current UNIX/Linux distros don't implement this paradigm fully — ex.: network devices aren't files — but some lesser-known systems do (such as the UNIX successors Plan 9 / Inferno and their Linux correspondent Glendix).

The Plan 9 project went further, applying the paradigm to:

  • Processes
    • Process management
    • Inter process communication
    • Client-Server network communication
  • Network related issues:
    • Network interfaces are files
    • Access rights to network interfaces are based on filesystem access rights to symlinks pointing to interface files
    • The filesystem (9P) extends over the network as a network communication protocol
  • Graphics interfaces and mouse IO
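
To make the network part concrete, here is a hedged sketch of a Plan 9 shell session (the /net/tcp layout and file names follow Plan 9's documented conventions; the connection number 0 is illustrative). A connection is allocated by reading /net/tcp/clone, and dialing is a write of "connect host!port" to the connection's ctl file; the conversation itself is reads and writes on its data file.

# each TCP connection is a directory of ordinary-looking files:
# ctl, data, listen, local, remote, status
ls /net/tcp/0

# connection state is read with plain file tools, no special APIs
cat /net/tcp/0/status
cat /net/tcp/0/remote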

Other innovations it brought us (which were later implemented in UNIX/Linux):

  • UTF-8 / Unicode
  • Filesystem snapshotting
  • Union filesystems
  • Lightweight threads

Paradigm — 2015-08-23

A distinct set of concepts or thought patterns. [1]
A world view underlying the theories and methodology of a particular scientific subject. [2]
A framework containing the basic assumptions, ways of thinking, and methodology that are commonly accepted by members of a scientific community. [3]

I’d add that the concepts construct a particular context around themselves and the result is a guideline for derivative thinking based on the starting points.

But perhaps my understanding of this concept is skewed by its use in the programming world, especially to describe the need for a paradigm shift in order to correctly understand and use tools based on (initially) foreign concepts.

How-to put a rabbit in the browser —

A rationale for why you would want to do this is here.

I couldn't find an AMQP library written in browser-side JavaScript, but I did find a STOMP JavaScript library which works nicely with a server-side STOMP plugin, so we'll use STOMP as our message protocol.

STOMP is a simpler, more text-oriented messaging protocol which includes HTTP-like headers. RabbitMQ has a plugin implementing STOMP, and that plugin allows access to AMQP-based exchanges and queues by mapping STOMP destinations to exchange/queue names.

You will need:

  • Native WebSockets or SockJS on the browser side
  • The STOMP.js browser library
  • A SockJS server, conveniently available via the rabbitmq_web_stomp plugin (bundled with RabbitMQ)
  • A STOMP-compliant message broker, available via the rabbitmq_stomp plugin (bundled with RabbitMQ)

Install rabbitmq and enable the plugins:

sudo apt-get install rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo rabbitmq-plugins enable rabbitmq_stomp
sudo rabbitmq-plugins enable rabbitmq_web_stomp
sudo service rabbitmq-server restart

Configure exchanges/queues:

  1. Log into http://localhost:15672/ with user guest and password guest.
  2. Create a new test exchange (called test in my code) and a new test queue (called test in my code).
  3. Create bindings between the new exchange and queue.
  4. Manually publish a message in the exchange and ensure it arrives in the queue.
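
Alternatively, if you'd rather script the setup than click through the UI, the management plugin also serves the rabbitmqadmin tool (available at http://localhost:15672/cli/). A hedged sketch — the fanout exchange type and the empty routing key are my assumptions, chosen to match the simple exchange-to-queue binding above:

# declare the exchange and queue used by the example
rabbitmqadmin declare exchange name=test type=fanout
rabbitmqadmin declare queue name=test

# bind them so messages published to the exchange reach the queue
rabbitmqadmin declare binding source=test destination=test

# publish a test message and fetch it back to verify the binding
rabbitmqadmin publish exchange=test routing_key="" payload="hello"
rabbitmqadmin get queue=test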

Copy the following two files:

<!-- index.html -->

<!DOCTYPE html>
<html>
    <head>
        <title>Rabbits in the front-end</title>

        <a href="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.4/jquery.js">https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.4/jquery.js</a>
        <a href="https://cdnjs.cloudflare.com/ajax/libs/stomp.js/2.3.3/stomp.js">https://cdnjs.cloudflare.com/ajax/libs/stomp.js/2.3.3/stomp.js</a>
        <a href="http://script.js">http://script.js</a>
    </head>
    <body>
        <form>
            <input type="text" value="">
            <button id="send">Send</button>
        </form>
    </body>
</html>
/** script.js */

var RabbitMQ = {
    // where the rabbitmq_web_stomp plugin listens (15674 by default)
    hostname: window.location.hostname,
    port: 15674,
    path: "stomp/websocket",

    // default credentials of a fresh RabbitMQ install
    username: "guest",
    password: "guest",

    // STOMP destinations which the plugin maps onto AMQP names
    exchange: "/exchange/test",
    queue: "/queue/test",

    onMessage: function(message) {
        alert(message.body);
    },
    onSuccess: function(message) {
        // connected: subscribe to the queue to start receiving messages
        qc.subscribe(RabbitMQ.queue, RabbitMQ.onMessage);
    },
    onError: function() {
        console.log(arguments);
    }
};


// raw WebSocket endpoint exposed by rabbitmq_web_stomp
var ws = new WebSocket("ws://" + RabbitMQ.hostname + ":" + RabbitMQ.port + "/" + RabbitMQ.path);

// speak STOMP over the WebSocket; disable heartbeats
var qc = Stomp.over(ws);
qc.heartbeat.outgoing = 0;
qc.heartbeat.incoming = 0;
qc.connect(
    RabbitMQ.username,
    RabbitMQ.password,
    RabbitMQ.onSuccess,
    RabbitMQ.onError,
    "/" // virtual host
);



$(window).load(function() {
    $("form button#send").click(function(e) {
        e.preventDefault();

        var parent = $(this).parent();

        // publish the input's value to the exchange, then clear the field
        if ($("input", parent).val()) {
            qc.send(RabbitMQ.exchange, {}, $("input", parent).val());

            $("input", parent).val("");
        }

        return false;
    });
});

And finally, visit http://localhost/index.html to check the results.

Rabbits in the browser — 2015-08-21

eCommerce applications are usually read-intensive — due to the number of product and category listings — and tend to optimize their scaling for a higher number of reads, for example by using replication and letting the slaves handle reads.

Write performance often bottlenecks in the checkout phase of the application, where new orders are registered, stocks are balanced, etc.

This type of bottleneck is all the more visible in highly cached applications, where most of the read-intensive information is served from memory while the checkout still needs a lot of concurrent write access to a single master database.

Replacing synchronous on demand processing with asynchronous message passing and processing should:

  1. Allow more simultaneous connections — since the connections are simple TCP sockets
  2. Decrease the number of processes used — no nginx, no php-fpm, just TCP kernel threads and RabbitMQ worker threads
  3. Decrease memory use — based on the number of consumers used to process the inbound data
  4. Decrease DB concurrency — based on the number of consumers doing the work rather than the number of buyers placing orders (orders of magnitude lower)

Messages with reply-queues could allow asynchronous responses to be received later, once the processor has finished its task. A TCP proxy in front of a cluster of RabbitMQ machines, with a few PHP worker machines behind them, should scale much better than receiving all the heavy-processing traffic in nginx + php-fpm processes.

Message brokers such as RabbitMQ can dynamically generate reply-queues when asked to do so, and those queues are session-based, so their content is accessible only to the user that sent the request.

Security-wise, message brokers support TLS over the socket, but extra security measures can be envisioned — ex.: security tokens, message-digest checks etc.

A short example of the above principles is here.

SQL survived way too long — 2015-08-15

SQL currently forces two computer programs to discuss what they want to do in an English-based textual representation (wtf?), and both programs jump through hoops to adapt to it.

The database itself uses lexers, parsers, optimizers and interpreters to understand and execute the requested SQL statements. The consumer application uses DBALs, ActiveRecords, ORMs or other means of SQL statement construction in order to convey them to the database.

The fact that two computer programs have to talk to each other in a language foreign to both, and that entire third-party projects are dedicated to helping them cross these communication boundaries more easily, should give us pause (once again, wtf?).

Some of the disadvantages of forcing two natively binary machines to speak broken English:

  • computational overhead on both sides — either in generating queries or in parsing them
  • representational overhead on both sides — data communicated between the two entities is needlessly converted into text (including integers, floats, enums, sets) or it is at least syntactically escaped (strings, blobs)
  • structural duplication — a logical entity is declared once in the application logic and once as a (set of) table(s) in the database
  • added vulnerabilities — the entire spectrum of SQL injection attacks is caused by the textual nature of the communication
  • feature restriction — new features need to be added to the database server logic, the consumer application logic and the SQL syntax itself (buzzwords: standardization, vendor acceptance, uniformity), which then ripples into implementations at the DBAL, ActiveRecord or ORM level before the consumer application logic can access them

SEQUEL, the "Structured English Query Language", was born in the 70s and was used directly by (not necessarily technical) humans; that's why it was English-based. SQL is now a mature technology provided by many vendors with very high performance yields, which made it a de facto standard for every data-driven development project — this is a classic case of exaptation.

The very same technology backbone without the “English” part and the “textual” part would mean:

  1. Less code to maintain — less effort and fewer errors
  2. Less market fragmentation from vendors in terms of behavior — or at least in terms of syntax
  3. No need for DBALs, ActiveRecords or ORMs — but complex client libraries and APIs would be needed to expose the same functionality
  4. Deduplication of structural code — consumer applications would provide authoritative structure of database objects
  5. Access to the entire consumer application library in the database — complex mathematical or logical functions would sometimes need to be implemented at the database level in the (more limited) SQL or PL/SQL languages even when already available at the application level; ex.: hashing functions, geo-positioning arithmetic, complex string manipulation, output format manipulation etc.
  6. More permeability of features between database server and consumer application — for example C-level datatypes (structs, enums and unions) with all their nested structuring and functionality, which are currently not available in MySQL (unions, nested structs) but are partially available in PostgreSQL
  7. Added permeability allows shifting responsibilities when necessary — moving some of the processing from an overburdened database server to an easier-to-scale application level
Exaptation — 2015-08-08