What goes into a Hacklab?

We're having a lot of fun getting up and running with our Hacklab. None of us are professional electronics engineers, so we weren't really sure what we needed. I've built a few electronics projects in the past, but never really had the opportunity to set up a dedicated lab; my work has always been done in a spare bedroom, on a desk next to my PC, with the bare minimum of tools. So it's been great fun having a budget and the ability to go out and get whatever we needed to make the lab complete and functional. We made one gigantic order from Adafruit, who, by the way, are an awesome little company that makes great stuff.

A few of the key things we picked up that I'm really happy with:

This DS1052E oscilloscope is frankly awesome. I've not had access to a scope for so long that I had forgotten how much fun they are. One of our marketers has a small recording studio, so he wants to build this B16 VCA Compressor kit to add to his collection. We'll get to make great use of the scope then; perhaps we'll even put together a little post about how to use the scope to confirm the device is working.

The Saleae Logic 16 is a great piece of kit. I personally own the 8-channel model, and it has saved me so much time over the years that it was worth every penny. If you're doing any sort of digital communication between devices, this is an essential tool: it makes it so much easier to debug your signals and figure out what's going wrong.

The most essential tool in our arsenal: the new digital Hakko FX-888D is a great little soldering iron and a welcome upgrade to the standard, much-loved FX-888.

After getting all our kit out, I couldn't resist putting the first smell of solder in the air. We have a couple of these adjustable breadboard power supply kits, and I had some willing colleagues eager to try out their first soldering iron, so we set to work. It was late, so we didn't get much done, but the first solder in our Hacklab has been laid, and it's time to begin the real fun!

When using streaming replication in PostgreSQL 9, it's important to know the latency between the master and its slaves, especially when deploying on cloud-based instances. Ideally, we'd like to know by how many bytes the WAL logs are lagging. PostgreSQL offers a neat way to check exactly that between a given slave and its master via the pg_current_xlog_location() and pg_last_xlog_replay_location() functions. However, the output seems cryptic.

postgres=# select * from pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 6F/E3C53568
(1 row)
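The counterpart query on a slave, pg_last_xlog_replay_location(), returns a value in the same format, marking how far the replica has replayed:

postgres=# select * from pg_last_xlog_replay_location();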

OK, so what does a value like 6F/E3C53568 mean? Looking at the WAL files in the pg_xlog/ directory, we see this file that appears to be related:

000000010000006F000000E2

Since WAL files are fixed-size chunks (16 MB by default), we'd expect the internal xlog pointer to hold a byte-level position that can point into a WAL file that has yet to be written. We'd like to decipher the output of these functions and determine the current byte position.

One of the great things about using open source is that you can quickly drill down into the source code and find out why things are the way they are. PostgreSQL has converted to using Git, so let's check it out:

git clone git://git.postgresql.org/git/postgresql.git

After searching a bit, we find the relevant piece of code in bufpage.h, which defines a struct responsible for tracking the current byte position of the PostgreSQL log.
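Paraphrasing the 9.x sources, the definition amounts to two 32-bit halves of one logical 64-bit position:

typedef struct XLogRecPtr
{
    uint32    xlogid;     /* log file #, the high 32 bits */
    uint32    xrecoff;    /* byte offset within the log, the low 32 bits */
} XLogRecPtr;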

This tells us that the output format of these functions is "${xlogid}/${xrecoff}", with both halves in hex, and that (xlogid << 32) | xrecoff is the 64-bit number representing the current byte position of the xlog.
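A few lines of Python make the arithmetic concrete (assuming the default 16 MB segment size):

def xlog_to_bytes(location):
    # Convert an 'xlogid/xrecoff' hex string to an absolute byte position.
    xlogid, xrecoff = (int(part, 16) for part in location.split('/'))
    return (xlogid << 32) | xrecoff

pos = xlog_to_bytes('6F/E3C53568')
print(pos)  # 480562722152
# With 16 MB segments, xrecoff 0xE3C53568 falls in segment 0xE3 of log 0x6F,
# i.e. just past the 000000010000006F000000E2 file we saw in pg_xlog/.
print(hex(pos >> 32), hex((pos & 0xFFFFFFFF) // (16 * 1024 * 1024)))  # 0x6f 0xe3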

From here, it's straightforward to create something that polls both functions on the master and the slave, takes the difference, and submits it to a monitoring system.

We use a simple Ruby script that does this and posts the delta to Graphite.
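A minimal Python sketch of the same idea (hostnames, database names, and the Carbon endpoint below are all placeholders):

import socket
import time

import psycopg2

MASTER = dict(host='master.example.com', dbname='postgres')  # placeholder
SLAVE = dict(host='slave.example.com', dbname='postgres')    # placeholder
CARBON = ('graphite.example.com', 2003)  # Carbon's plaintext listener

def xlog_to_bytes(location):
    xlogid, xrecoff = (int(part, 16) for part in location.split('/'))
    return (xlogid << 32) | xrecoff

def query_one(params, sql):
    conn = psycopg2.connect(**params)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchone()[0]
    finally:
        conn.close()

while True:
    master_pos = xlog_to_bytes(query_one(MASTER, 'select pg_current_xlog_location()'))
    slave_pos = xlog_to_bytes(query_one(SLAVE, 'select pg_last_xlog_replay_location()'))
    # Graphite plaintext protocol: "<metric path> <value> <unix timestamp>\n"
    message = 'postgres.replication.lag_bytes %d %d\n' % (master_pos - slave_pos, time.time())
    sock = socket.create_connection(CARBON)
    sock.sendall(message.encode())
    sock.close()
    time.sleep(10)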

Ad-hoc measurement of page load times across multiple web servers can be a drag. Quite often you end up with a number of screen windows running timed bash curl loops with file redirects and some grep/awk magic. This tends to work OK until you need to get a bit more sophisticated, for example to compare results or to sample at a fixed frequency.
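Something along these lines, repeated once per endpoint and per screen window:

while true; do curl -s -o /dev/null -w '%{time_total}\n' http://www.google.com >> google.log; sleep 1; done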

Here we give a quick-and-dirty Clojure script that takes that curl loop just a bit further: from a single command you can monitor response times across an arbitrary number of endpoints, retrieve measurements at given time intervals, and track results in real time. The script itself is trivial; most of the functionality comes from the Quartzite scheduling library. You can grab the source at:

https://github.com/SupplyFrame/grunf

Example usage:

lein run -m grunf.bin '(["http://www.google.com" "http://finance.yahoo.com" "http://www.bing.com/news"] 1000)'

(1368072742623 http://www.google.com 225.029)
(1368072743236 http://www.bing.com/news 839.584)
(1368072743457 http://finance.yahoo.com 1059.564)
...

Have fun!

After a lot of procrastination, we have finally decided to get our hands dirty and start building an in-house Hacklab. Why? Because it's fun! We also think we have some great ideas we want to build in our spare time.

For now it's mostly about goofing around. Check out some photos from day #1.

Stay tuned!

One of the challenging things about building "big data" apps is dealing with messy data sets. At SupplyFrame, we ran into a problem while doing some analysis with K-Means clustering: all of the interesting features in our data had varying amounts of missing values. It turns out that how the values are missing is significant! Say you knocked out various cells at random: your analysis won't suffer too much, as the contribution to the error is uniform. This is known as Missing Completely at Random (MCAR).
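As a toy illustration of MCAR knockout (the sizes and rates are made up):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# MCAR: every cell independently has the same 10% chance of going missing
X_mcar = X.copy()
X_mcar[rng.random(X.shape) < 0.10] = np.nan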

However, let's say you knock out cells more frequently when the user came from a certain country with latency problems. Now the contribution to the error is no longer random. We had to modify the K-Means algorithm to handle this situation, and since we also deal with non-Euclidean distances, we adapted K-Means to accept any distance function. Here is a simple Python project that provides a reference implementation:

https://github.com/SupplyFrame/kmeans-pds
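For a flavor of the approach, here's a minimal sketch of K-Means with a partial distance that skips missing values and rescales by the fraction of usable dimensions; it's illustrative only, not the repository's actual API:

import numpy as np

def partial_distance(x, y):
    # Squared Euclidean distance over dimensions observed in both vectors,
    # rescaled by the fraction of dimensions that were usable.
    mask = ~(np.isnan(x) | np.isnan(y))
    if not mask.any():
        return np.inf  # no overlapping observations: treat as maximally distant
    d = x[mask] - y[mask]
    return (x.size / mask.sum()) * np.dot(d, d)

def kmeans_partial(X, k, iters=50, seed=0):
    # Plain Lloyd iterations; assumes every cluster stays non-empty.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.array([min(range(k), key=lambda j: partial_distance(x, centers[j]))
                           for x in X])
        # Per-dimension mean that ignores NaNs
        centers = np.array([np.nanmean(X[labels == j], axis=0) for j in range(k)])
    return labels, centers

Swapping partial_distance for any other pairwise function gives the "any distance function" behavior mentioned above.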

A sample run of the kmeans-pds code on the mouse data set from the Wikipedia article on K-Means is shown in the following graph:

Mouse data set

Stay tuned for a series of articles about how this works and an exploration of the tools used to visualize the results.

This is the first post in what will hopefully become a long line of posts on the SupplyFrame Engineering Blog. Here we'll be documenting our adventures in our shiny new electronics lab and blogging a little about the work we do and the things that interest us.

Stick around: posts may be infrequent, but hopefully they will at least be interesting!