Crowd-Sourced Computing on the Web - Proof of Concept

Years ago, services like SETI@home introduced us to crowd-sourcing computing cycles in order to analyze big data. Unfortunately, these programs require that you install software on your computer and keep it running. Lately, I’ve been noodling over applying the same concept to the web, and I thought I’d share my thoughts.

When somebody is viewing a webpage, the browser is relatively idle most of the time, and probably their machine is as well. For years now, modern browsers have implemented the Web Worker API, exposing a way to tap into those unused CPU cycles. The missing step is connecting big data to those computing cycles, and possibly rewarding web surfers or site owners for the cycles used.
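As a quick illustration of the API (a minimal sketch, independent of the code below), a page can push work off the main thread by handing a worker an inline data URI:

// minimal Web Worker sketch: the script is inlined as a data URI,
// so no separate worker file needs to be hosted
var worker = new Worker(
	'data:text/javascript,' + encodeURIComponent('postMessage(21 * 2);')
);

worker.onmessage = function(e) {
	console.log('worker says ' + e.data); // worker says 42
};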

Article updated on April 5, 2013

If you see the not supported error in your browser (after the first worker starts), try Firefox. It seems the latest WebKit treats the data URIs of dynamically inserted JavaScript files differently than those included when the page loads.

How to do it…

For starters, we need a marketplace where somebody who has a lot of data to analyze can go to buy cycles. This is the hard part of the problem, and the part I hope one of you might solve. The gist is that a provider would upload a large amount of segmentable data along with a function for analyzing the segments.
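To make that concrete, here is a hypothetical sketch of what a provider might upload; the names are invented for illustration:

// hypothetical upload, part 1: data split into independently
// processable segments
var segments = [
	[12, 7, 3],
	[9, 4, 22]
	// ... millions more segments
];

// hypothetical upload, part 2: a pure function that can analyze
// any one segment in isolation
function analyzeSegment(segment) {
	return segment.reduce(function(sum, value) {
		return sum + value;
	}, 0);
}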

Sites or users who have opted in to the marketplace include two JavaScript files. The first is the static code to manage the web workers and process data URIs:

(function() {
	var i = 1,
		failCount = 0,
		MAX_FAIL = 10;

	function fetchMoreData(sData) {
		if (! window.WebAnalyzerStop) {
			// fetch more data using JSONP
			var el = document.createElement('script');
			el.async = true;
			el.src = '{PATH_TO_DATA_URL}?data=' + encodeURIComponent(sData);
			document.body.appendChild(el);
		}
	}

	function WebAnalyzerProcessData(sDataUri) {
		try {
			var oWorker = new Worker(sDataUri);

			oWorker.onmessage = function(e) {
				// worker finished. data should be reported back to the server
				//	where it can be recorded and more data can be loaded for processing
				fetchMoreData(e.data);
			};

			oWorker.onerror = function() {
				// worker broke, probably should report somewhere,
				// 	but also get more data for crunching
				failCount++;

				// keep crunching unless something is seriously wrong
				if (MAX_FAIL > failCount) {
					fetchMoreData('');
				}
			};

			console.log("WebWorker " + i++ + " started!");
		}
		catch(e) {
			if (window.console) {
				console.log('WebWorkers are not fully supported by your browser');
			}
		}
	}

	// exposed globally so data loading JSONP requests can start new workers
	window.WebAnalyzerProcessData = WebAnalyzerProcessData;
}());

The second would be a call to the marketplace server that returns a JavaScript data URI and calls the WebAnalyzerProcessData function (this is a JSONP system with a known callback function name):

(function() {
    var WebAnalyzerDataUri = "data:text/javascript;charset=US-ASCII,var%20i%20%3D%20500000000%2C%20n%20%3D%200%2C%20dTime%20%3D%20(new%20Date()).getTime()%3Bwhile%20(0%20%3C%20i)%20%7Bn%20%2B%3D%20i%3Bi%20-%3D%201%3B%7DdTime%20%3D%20(new%20Date()).getTime()%20-%20dTime%3BpostMessage(%22Calculated%20%22%20%2B%20n%20%2B%20%22%20in%20%22%20%2B%20dTime%20%2B%20%22ms%22)%3B";

    if (window.WebAnalyzerProcessData) {
        window.WebAnalyzerProcessData(WebAnalyzerDataUri);
    }
}());

If you heard your fan kick on (it should after ~30 seconds), it is because I am demonstrating the system right now. The web worker is adding the numbers between 1 and 500000000, then calling the data URL again (this is a demo, so the data URL does the same thing every time, but ideally it would serve new data dynamically).
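For reference, decoding the data URI above yields the following worker script (whitespace added for readability):

var i = 500000000,
	n = 0,
	dTime = (new Date()).getTime();

while (0 < i) {
	n += i;
	i -= 1;
}

dTime = (new Date()).getTime() - dTime;
postMessage("Calculated " + n + " in " + dTime + "ms");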

How it works…

The static code manages creating web workers and fetching new data from the dynamic data URL (provided by the marketplace). We expose a global callback function, WebAnalyzerProcessData, to which the dynamic data script can send data URI encoded JavaScript. Inside a try/catch block, a new web worker is created from this data URI, and the message and error events are subscribed to. The try/catch block is used because some older browsers do not yet support web workers, and IE does not support data URI blobs. Whether the worker finishes with an error or a message, we fetch more data, stopping only when MAX_FAIL is reached or somebody sets window.WebAnalyzerStop = true;. Messages are passed back to the marketplace using the data query parameter, so the analysis can be recorded.
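As a usage note, any page can halt the loop at any time by flipping the global flag the static code checks:

// stop fetching new chunks; any in-flight worker still finishes
window.WebAnalyzerStop = true;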

The dynamic data URL needs to extract a chunk of data and combine it with the data analysis function, creating a JavaScript file. This file is then encoded as a string (I used encodeURIComponent, but base64 or another encoding would work as well) and returned dynamically. This data URI is passed into WebAnalyzerProcessData, where another web worker will be created to process the next chunk of data.
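For illustration only, here is a rough sketch of how such an endpoint might be built, assuming a hypothetical Node.js/Express server; the route, the chunk store, and the embedded analysis function are all invented for the example:

var express = require('express');
var app = express();

// hypothetical in-memory store; a real marketplace would pull
// segments from the big data provider's uploaded dataset
var chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
var cursor = 0;

app.get('/data', function(req, res) {
	// record the result reported by the previous worker, if any
	if (req.query.data) {
		console.log('worker result: ' + req.query.data);
	}

	// combine the next chunk with the provider's analysis function
	var chunk = JSON.stringify(chunks[cursor++ % chunks.length]);
	var source = 'var chunk = ' + chunk + ';' +
		'var sum = chunk.reduce(function(a, b) { return a + b; }, 0);' +
		'postMessage(String(sum));';

	// encode the script as a data URI and hand it to the known callback
	var uri = 'data:text/javascript;charset=US-ASCII,' +
		encodeURIComponent(source);

	res.type('application/javascript');
	res.send('window.WebAnalyzerProcessData(' + JSON.stringify(uri) + ');');
});

app.listen(3000);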

If a popular site were to participate, millions of chunks of data could be processed every day, helping solve some big data problems. Besides having to build the marketplace, the onus is on the big data provider to supply the marketplace with segmentable data and a function that can process chunks of that data. The marketplace would need to combine all the results and provide them back to the big data provider.

There’s more…

This technology really excites me, because there are many large data problems that cannot yet be feasibly solved on a single modern computer, but could be solved with millions of computers crunching away at them continuously. Additionally, our computers sit idle most of the time, and this would be a great way to put some of those CPU cycles to good use. If the marketplace is well implemented, it should be profitable by charging the big data providers to use the system, while paying bounties to participating sites or users.

That said, if the marketplace is not well curated, or is intentionally malicious, it could also become the biggest botnet in the world, or possibly be used to attack supposedly intractable problems like breaking encryption, and who knows, maybe even Skynet.

Overall though, I think the benefits outweigh the risks, and hope some entrepreneur takes on the task of building this system.
