Running the Hackaday ARG


So over the last month I've been very busy helping out Hackaday in preparation for the announcement of The Hackaday Prize.

During the lead-up to the announcement we wanted to build some suspense at Hackaday, so we released a series of space-themed puzzles throughout April. Every week we posted a new puzzle, starting with Transmission #01, which was released on April 1st.

These went down very well: in all, a few hundred people participated in trying to solve them. Much fun was had, and the Hackaday community at large put on an amazing display of ingenuity, persistence and downright evil genius.

Since we've been asked a fair number of questions about why we did this and how we put it together, I've started a series of articles over at Hackaday documenting the event. You can check out the first one, Hackaday Space: Transmission 1, now.

At the beginning of March every year, tech hordes from all over the world flock to Austin to spend a couple of days at the city-wide circus called South by Southwest. Even though the festival gets bigger every year and is definitely losing a bit of its "cool", it's still worth a visit. So a bunch of us packed our bags and decided to swap sunny Pasadena and its 80F for a rainy Austin in the low 40s (not something you would expect in Texas). Bad weather aside, we had so much fun!

On Sunday, March 9, we gave a talk at SXSW StageTwo, alongside a number of awesome hardware startups and tech journalists, about our mission to organize the world of engineering knowledge and how it might be relevant to everyone designing new hardware products.

We also got to spend some time at the SXSW Create event, dedicated to maker/hacker culture, and talked with a number of interesting engineers. What impressed us most were the Internet of Things projects in unexpected areas such as agriculture, the data they're starting to generate, and all the possible "big data" problems that will come out of instrumenting the real world the same way we have been instrumenting the Web.

For more details you can check out our coverage of the event at Hackaday.

Hackaday at SXSW Create - part #1 from Supplyframe on Vimeo.

Now we all know that we Computer Scientists are suckers for Cyberpunk, so when we realized that the MIT Media Lab was hosting a series of events dedicated to sci-fi-inspired hardware and the future of making, they couldn't get us out of the room. We also covered demos of works from the Media Lab SciFi2SciFab class, which you can read all about on Hackaday.

Remembering the Future - Hackaday at SXSW part 2 from Supplyframe on Vimeo.

For a list of all the other hardware-related events taking place at SXSW, check out the SXSW 2014 Hardware Guide.

In many real-world applications, signals are typically recorded and represented as time-dependent functions. For example, we can record the daily trading price in the stock market or the hourly temperature in Los Angeles. In order to extract meaningful characteristics from the data, many different transforms have been developed to decompose it into simpler pieces.

The question is: how do we design a good transform for the signal we want to analyze? This is a classical problem in signal representation: we want to define a transform that provides a "sparse" representation capturing most or all of the information in the signal. Sparseness is one of the reasons for the extensive use of popular transforms: they uncover the structure of the signal and provide a "compact" representation. In this post, we provide an example of how to analyze web traffic with the Discrete Fourier Transform (DFT).

[Figure 1: Website traffic]

Every hour, we record the total number of user actions on our website; the data is shown in figure 1. The signal seems to be periodic, but the question is: how can we explore its structure? The DFT provides a nice tool for representing a discrete-time signal as a periodic Fourier series. The original signal x(t_n) can be represented by
\begin{equation}
\label{eq:1}
x(t_n) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k \cdot e^{i 2 \pi k n / N},
\quad n\in\mathbb{Z}
\end{equation}

where the Fourier coefficients X_k are defined as

\begin{equation}
\label{eq:2}
X_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(t_n) \cdot e^{-i 2 \pi k n / N}, \quad k\in\mathbb{Z}
\end{equation}

After the DFT, we can analyze the energy in each frequency component. The k-th frequency is given by f_k = \frac{k}{N T} , where T is the sampling interval. We plot the power spectrum of the data, |X(f_k)| , in figure 2. The result shows that the web traffic is concentrated in the low-frequency components.

[Figure 2: Energy spectrum]
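To make this concrete, here is a minimal sketch in R of the computation behind figure 2. The `traffic` vector is synthetic stand-in data (hourly counts with daily and weekly cycles), not our actual logs:

```r
# Synthetic hourly traffic: a daily cycle, a weekly cycle, and noise
set.seed(42)
hours   <- 0:(24 * 28 - 1)                  # four weeks of hourly samples
traffic <- 500 +
  300 * sin(2 * pi * hours / 24) +          # daily cycle
  200 * sin(2 * pi * hours / (24 * 7)) +    # weekly cycle
  rnorm(length(hours), sd = 50)

N  <- length(traffic)
dt <- 1 / 24                                # sampling interval T, in days

# Unitary DFT, matching equation (2)
X <- fft(traffic) / sqrt(N)

# Frequencies f_k = k / (N * T), here in cycles per day
f <- (0:(N - 1)) / (N * dt)

# Spectrum magnitude over the first half (the upper half mirrors it)
half <- 1:(N %/% 2)
plot(f[half], Mod(X[half]), type = "h",
     xlab = "frequency (cycles/day)", ylab = "|X(f)|")
```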

Let's look at the different frequency components of the data compared with the original: the original signal is shown in figure 3(a), the components with frequency |f| < 0.46 (in cycles per day) are in figure 3(b), and the components with frequency 0.46 \leq |f| < 1.40 are in figure 3(c). Note that traffic on weekdays is much higher than on weekends, and the low-frequency components successfully capture that change between weekdays and weekends. Traffic is also much higher during working hours than during the rest of the day, which shows up clearly as the daily variation in the second set of components.

[Figure 3: Signal components]
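The components in figure 3 can be sketched the same way: zero out the DFT coefficients outside the band of interest and apply the inverse transform from equation (1). Continuing the example above, `band_filter` is a hypothetical helper, and the 0.46 and 1.40 cycles-per-day cutoffs are the ones used for figures 3(b) and 3(c):

```r
# Keep only frequency components with f_lo <= |f| < f_hi, then invert
band_filter <- function(x, dt, f_lo, f_hi) {
  N <- length(x)
  X <- fft(x)
  f <- (0:(N - 1)) / (N * dt)
  f <- pmin(f, (N:1) / (N * dt))  # fold to |f|: the upper half mirrors the lower
  X[f < f_lo | f >= f_hi] <- 0    # zero everything outside the band
  Re(fft(X, inverse = TRUE)) / N  # base R's inverse fft omits the 1/N factor
}

weekly_trend <- band_filter(traffic, dt = 1/24, f_lo = 0,    f_hi = 0.46)  # fig 3(b)
daily_cycle  <- band_filter(traffic, dt = 1/24, f_lo = 0.46, f_hi = 1.40)  # fig 3(c)
```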

Based on a sparse representation, we can select "significant" components for many different applications, e.g., signal de-noising or pattern recognition. We will introduce the wavelet transform in the next post.

AWS EC2 Reserved Instances sound like a great idea. You spend a lot on EC2, keep instances running for a long time and have predictable workloads throughout the year? Reserved Instances to the rescue! You pay a flat reservation fee upfront and reap the benefit of reduced computing costs over time.

It all sounds great and makes perfect sense, until the day comes when you realize that the original forecast was off. For whatever reason. Perhaps you have improved the performance of your infrastructure and don't need as many resources. Unused reservations start piling up and you start bleeding money. Naively you say "not a problem! We'll just sell these instances on the EC2 Reserved Instance Marketplace and get our money back!". Well, that is where this story begins...

There are two main problems with this kind of reasoning. The first is being able to accurately forecast future demand for resources from very limited historical information. EC2 reservations are usually sold in 1- and 3-year terms, while EC2 has really only been around for 6-7 years. And let's be honest, most of the infrastructure you're trying to buy reserved instances for hasn't been on AWS for more than a couple of years. So, at best, you're trying to forecast 1-3 years of demand from 1-3 years' worth of data. Maybe you have a solid business, a good plan that you won't deviate from and a strong reason to believe that your engineering team won't be able to make things faster or cheaper anytime soon. But maybe you should think twice.

The second and more severe problem is the liquidity risk you're running when dealing with the EC2 Reserved Instance Marketplace. When people hear the word "marketplace" they imagine zillions of transactions a second, hordes of buyers, sellers and arbitrage-opportunity-seekers clearing the market in the blink of an eye, with the forces of supply and demand ensuring everyone gets the right price. But there is a huge difference between calling something a "marketplace" and actually running a liquid one. In an inefficient market, it's very easy to end up on the short end... and these reservations are not cheap.

The RI Marketplace's problems are not limited to simply not having enough buyers and sellers around. The market is highly fragmented: you can only buy and sell instances within the same region and can't really convert instance types. It is really an ecosystem of region-specific marketplaces, all of them highly illiquid, with each unique (region, instance type, duration, offering type) tuple representing a different commodity. There is no market maker either. You would expect that Amazon might want to buy back your unused reservations at a discounted price, but that's not an option, at least not for now. After all, why give someone a fraction of their money back when you can keep all of it? To be fair, Amazon is making some efforts to improve things: until recently you could only buy and sell instances within a particular (region, availability zone) pair, and now it's only (region), but the whole thing is still a long way from a real marketplace you can rely on.

So we went ahead and collected some data on EC2 Reserved Instance Marketplace activity over the last two months, in order to get at least a bit more insight into how good or bad things really are.

Read more...

No project launch is complete without a making-of video. So here it goes ...

Datasheet.net Demo Video - Making of from Supplyframe on Vimeo.

Our latest product, Datasheet.net, is now finally available for anyone to use. After a few months in private beta we feel ready to let the world see what we've been playing with. Here's a quote from the going-live post on the Datasheet.net blog.

We think datasheets suck. Plain and simple, this is a format that hasn’t changed since the invention of the PDF. We’re trying to change that, trying to make datasheets a ‘live’ document, something that actually helps make our lives as engineers better. In the long term this means working closely with manufacturers to help improve the datasheet format; we’re not sure where this leads yet, but it’s something we’re working towards.

But changing the way manufacturers work is a long and slow road, so in the meantime, as a first step, we’ve put together some useful features on top of datasheets as they currently stand.

We'll be promoting the project more over the coming weeks, and hopefully extending it to support all manner of new features and cool things. But for now, sign up for Datasheet.net to start making datasheets a little bit better!

If you have been following the LA tech scene lately, you might have noticed a lot of activity around the Hadoop ecosystem and related projects. While it's easy to dismiss this as just a reflection of the omnipresent buzz around "Big Data" and "Data Science", one can argue that here it's much more than that. The area's traditional focus on "hard science", both in industry and academia, seems to have given the community quite an edge: when talking Data Science there is a lot more appreciation for actual research topics (rather than just black-box tools), and when talking Distributed Systems, a huge passion for 'exotic' subjects such as Haskell and Clojure.

Big Data Camp LA 2013 will be the key event of the season, bringing the different sides of this community together. SupplyFrame is one of the sponsors and a bunch of our engineers will be attending. We have our fair share of challenges, experiences and ideas in this space, and if you do too, we would love to chat! That's what unconferences are for.

See you at #BigDataCampLA!

For the last couple of days we have been goofing around with different design ideas for a new Supplyframe t-shirt, and here is what we finally came up with:

[T-shirt design]

Hope the great W. Shockley will forgive us for cannibalizing such an awesome hack :)

On our recent trip to China, we were lucky enough to get the chance to visit one of the local hackerspaces in Shanghai. The place is called Xinchejian and it looks like something straight out of a William Gibson novel. Tons of cardboard boxes, tiny robots scattered all over the place, art pieces, 3D printers, hydroponics hacks (?)... the kind of organized chaos all of us happily relate to. We met with David Li, a foreman of the hackerspace, who introduced us to the local scene and the challenges of running this kind of environment in a place where free time is not a commodity.


A factor that makes all the difference here is easy access to serious production-grade fabrication resources. The only limits seem to be the creativity and novelty of your ideas (and the ability to get some bootstrap funding). Everything else is easy. And that is exactly what this kind of place is all about: enabling creative people with tools for expressing their ideas. David had a great analogy with the BSD and Linux communities in the '90s: while the BSD community was somewhat elitist and not really welcoming to noobs, the Linux crowd always had patience for even the most boring and repetitive questions. Linux eventually enabled so much change, while BSD for the most part ended up on the shelf. It seems that the same holds nowadays for the gap between "true" Electrical Engineers and the new hacker/maker crowd.


That is why David tries to attract as many people from diverse backgrounds as he can. A beautiful example is a girl who was a professional piano player with no knowledge of electronics whatsoever. She joined the hackerspace and within a couple of months picked up the necessary skills and built a spectacular interactive "singing tree" installation art piece (for which she also composed an original score). And that is what this 'hardware revolution' is all about...


Sadly, our time was limited, but next time around we'll try to hang out more and participate in the community. If you happen to find yourself around Shanghai, by all means don't forget to visit Xinchejian!

The binomial distribution pops up in our problems daily, given that the number of occurrences k of an event with probability p in a sequence of n trials can be described as

k \sim Binomial(n,p)

The question that naturally arises in this context is: given observations of k and n, how do we estimate p? One might say that simply computing p = {k \over n} should be enough, since that is both the uniformly minimum-variance unbiased estimator and the maximum likelihood estimator. However, such an estimator can be misleading when n is small (as anyone who has tried to estimate clickthrough rates from a small number of impressions can testify).

The question is: what can we do? One thing that naturally comes to mind is incorporating any prior knowledge we might have about the distribution. A wise choice of prior for the binomial distribution is usually the Beta distribution, not just because of its convenience (it is the conjugate prior), but also because of its flexibility in capturing different distribution shapes:

[Figure: Beta densities for different shape parameters]

In this context, we can express our model as:

k_i \sim Binomial(n_i, p_i)

p_i \sim Beta(a, b), i = 1...N

where N is the total number of observations and a and b are parameters to be estimated. Such a model is called Empirical Bayes. Unlike traditional Bayes, where we pull the prior distribution and its parameters out of thin air, Empirical Bayes estimates the prior parameters from the data.

In order to estimate the parameters of the prior, we calculate the marginal distribution as

m(k|a,b) = \prod_{i=1}^{N} \int_0^1 f(k_i|p_i)\,\pi(p_i|a,b)\,dp_i = \prod_{i=1}^N {n_i \choose k_i} {{\Gamma(a+b)\Gamma(a+k_i)\Gamma(n_i-k_i+b)}\over{\Gamma(a)\Gamma(b)\Gamma(a+b+n_i)}}

where f and \pi are the density functions of the binomial and beta distributions, respectively. The parameter estimates \hat{a} and \hat{b} can be obtained by maximizing the log-likelihood of the marginal distribution.
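As a rough sketch of how that fitting step looks in R (this is not our package, just a minimal illustration; the `k` and `n` vectors are hypothetical successes and trials), we can minimize the negative log marginal likelihood with `optim()`:

```r
# Negative log marginal likelihood; the lbeta() terms reproduce the
# Gamma-function ratio in the formula above
neg_log_marginal <- function(par, k, n) {
  a <- exp(par[1]); b <- exp(par[2])  # log-parametrize so a, b stay positive
  -sum(lchoose(n, k) + lbeta(a + k, b + n - k) - lbeta(a, b))
}

k <- c(2, 0, 5, 1, 3)        # hypothetical successes (e.g. clicks)
n <- c(40, 10, 120, 25, 60)  # hypothetical trials (e.g. impressions)

fit   <- optim(c(0, 0), neg_log_marginal, k = k, n = n)
a_hat <- exp(fit$par[1])
b_hat <- exp(fit$par[2])
```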

Finally, the Empirical Bayes estimator can be constructed as the expectation of the posterior distribution:

\hat{p_i} = E(p_i|k_i, \hat{a}, \hat{b}) = {{\hat{a}+k_i}\over{\hat{a} + \hat{b} + n_i}}
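Continuing the sketch above, the posterior means shrink each raw estimate k_i / n_i toward the prior mean \hat{a} / (\hat{a} + \hat{b}), more strongly when n_i is small:

```r
p_hat <- (a_hat + k) / (a_hat + b_hat + n)  # Empirical Bayes estimates
cbind(raw = k / n, shrunk = p_hat)          # compare with the naive k/n
```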

Pretty easy. The real question is: how do we do this in practice? It turns out that there is no off-the-shelf package in R for doing this, so we have built one. It relies pretty heavily on the fitdistrplus package and there are certainly a number of things to be improved, but it's a start. You can grab it at our Github repository.