No project launch is complete without a making-of video. So here it goes ...
Our latest product, Datasheet.net, is finally available for anyone to use. After a few months in private beta, we feel ready to let the world see what we've been playing with. Here's a quote from the going-live post on the Datasheet.net blog:
We think datasheets suck. Plain and simple, this is a format that hasn't changed since the invention of the PDF. We're trying to change that, trying to make datasheets a 'live' document, something that actually helps make our lives as engineers better. In the long term this means working closely with manufacturers to help improve the datasheet format; we're not sure where this leads yet, but it's something we're working towards.
But changing the way manufacturers work is a long and slow road, so in the meantime, as a first step, we've put together some useful features on top of datasheets as they currently stand.
We'll be promoting the project more over the coming weeks, and hopefully extending it to support all manner of new features and cool things. But for now, sign up to Datasheet.net to start making datasheets a little bit better!
If you have been following the LA tech scene lately, you might have noticed a lot of activity around the Hadoop ecosystem and related projects. While it's easy to dismiss this as just a reflection of the omnipresent buzz around "Big Data" and "Data Science", one can argue that in this environment it's much more than that. The area's traditional focus on "hard science", both in industry and academia, seems to have given the community quite an edge: when talking Data Science there is a lot more appreciation for actual research topics (rather than just black-box tools), and when talking Distributed Systems, a huge passion for 'exotic' subjects such as Haskell and Clojure.
Big Data Camp LA 2013 will be the key event of the season, bringing different sides of this community together. SupplyFrame is one of the sponsors, and a bunch of our engineers will be attending. We have our fair share of challenges, experiences and ideas in this space, and if you're the same way, we would love to chat! That's what unconferences are for.
See you at Big Data Camp!
On our recent trip to China, we were lucky enough to get the chance to visit one of the local hackerspaces in Shanghai. The place is called Xinchejian, and it looks like something straight out of a William Gibson novel: tons of cardboard boxes, tiny robots scattered all over the place, art pieces, 3D printers, hydroponics hacks (?) ... the kind of organized chaos all of us happily relate to. We met with David Li, the foreman of the hackerspace, who introduced us to the local scene and the challenges of running this kind of environment in a place where pastime is not a commodity.
A factor that makes all the difference here is easy access to serious, production-grade fabrication resources. The only limits seem to be creativity and novelty of ideas (and the ability to get some bootstrap funding); everything else is easy. And that is exactly what this kind of place is all about: enabling creative people with tools for expressing their ideas. David had a great analogy with the BSD and Linux communities in the '90s: while the BSD community was somewhat elitist and not really welcoming to noobs, the Linux crowd always had patience for even the most boring and repetitive questions. Linux eventually enabled so much change, while BSD for the most part ended up on the shelf. It seems that the same holds nowadays for the gap between "true" Electrical Engineers and the new hacker/maker crowd.
That is why David tries to attract as many people from diverse areas as possible. A beautiful example is that of a girl who was a professional piano player with no knowledge of electronics whatsoever. She joined the hackerspace and within a couple of months picked up the necessary skills and built a spectacular interactive "singing tree" installation art piece (for which she also composed an original score). And that is what this 'hardware revolution' is all about ...
Sadly, our time was limited, but next time around we'll try to hang out more and participate in the community. If you happen to find yourself around Shanghai, by all means don't forget to visit Xinchejian!
The binomial distribution pops up in our problems daily, given that the number of occurrences $k$ of an event with probability $p$ in a sequence of $n$ trials can be described as

$$k \sim \mathrm{Binomial}(n, p)$$
A question that naturally arises in this context is: given observations of $k$ and $n$, how do we estimate $p$? One might say that simply computing $\hat{p} = k/n$ should be enough, since that's both the uniformly minimum-variance unbiased estimator and the maximum likelihood estimator. However, such an estimator can be misleading when $n$ is small (as anyone who has tried to estimate clickthrough rates from a small number of impressions can testify).
The question is: what can we do? One thing that naturally comes to mind is incorporating any prior knowledge we might have about the distribution. A wise choice of prior for the binomial distribution is usually the Beta distribution, not just because of its convenience (it is the conjugate prior), but also because of its flexibility in expressing different distribution shapes.
In this context, we can express our model as:

$$p_i \sim \mathrm{Beta}(\alpha, \beta), \qquad k_i \sim \mathrm{Binomial}(n_i, p_i), \qquad i = 1, \ldots, N$$

where $N$ is the total number of observations and $\alpha$ and $\beta$ are parameters to be estimated. Such a model is also called Empirical Bayes. Unlike traditional Bayes, in which we pull the prior distribution and its parameters out of thin air, Empirical Bayes estimates the prior parameters from the data.
In order to estimate the parameters of the prior, we calculate the marginal distribution as

$$m(k \mid n, \alpha, \beta) = \int_0^1 f(k \mid n, p) \, g(p \mid \alpha, \beta) \, dp$$

where $f$ and $g$ are the density functions of the binomial and beta distributions, respectively (the resulting marginal is the beta-binomial distribution). Parameter estimates $\hat{\alpha}$ and $\hat{\beta}$ can be obtained by maximizing the log likelihood of the marginal distribution.
Finally, the Empirical Bayes estimator can be constructed as the expectation of the posterior distribution:

$$\hat{p}_i = \mathrm{E}[p_i \mid k_i, n_i] = \frac{k_i + \hat{\alpha}}{n_i + \hat{\alpha} + \hat{\beta}}$$
Pretty easy. The real question is: how do we do this in practice? It turns out that there is no off-the-shelf package in R for doing this, so we have built one. It relies pretty heavily on the fitdistrplus package and there are certainly a number of things to be improved, but it's a start. You can grab it at our GitHub repository.
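For illustration, the same procedure can also be sketched in a few lines of Python. This is not our R package, just a sketch: it assumes `scipy >= 1.4` (whose `scipy.stats.betabinom` gives the marginal likelihood directly), and the data is made up:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import betabinom

# Observed (successes, trials) pairs, e.g. (clicks, impressions).
k = np.array([2, 0, 5, 1, 3, 0, 4, 2, 1, 0])
n = np.array([40, 10, 60, 25, 50, 5, 55, 30, 20, 8])

def neg_log_lik(params):
    # Optimize on the log scale so alpha, beta stay positive.
    a, b = np.exp(params)
    # Marginal likelihood of the data is beta-binomial.
    return -betabinom.logpmf(k, n, a, b).sum()

res = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)

# Empirical Bayes estimate: the posterior mean shrinks k/n
# toward the prior mean alpha/(alpha + beta).
p_hat = (k + a_hat) / (n + a_hat + b_hat)
```

The shrinkage is strongest exactly where $k/n$ is least trustworthy: observations with small $n$ get pulled toward the prior mean $\hat{\alpha}/(\hat{\alpha}+\hat{\beta})$.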
At SupplyFrame we have started to use Riemann as our stream processing framework for system and application monitoring. Riemann is a lightweight, Clojure-based DSL that operates on event streams. The expressiveness of Clojure gives it the ability to encode the whole event handling logic in one config file. Compared to generic stream frameworks like Esper or Storm, it is cleaner and simpler, but still highly programmable.
For example, if we want to send the mean of the last 10 minutes of data to Graphite and alert sysops when the median of the metric exceeds 1000, we can express the idea in Riemann like this:
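A sketch of such a config (the hostname and addresses are made up; `graphite` and `mailer` are Riemann's built-in forwarders):

```clojure
(def graph (graphite {:host "graphite.example.com"}))
(def email (mailer {:from "riemann@example.com"}))

(streams
  ; buffer events into 10-minute windows
  (fixed-time-window 600
    ; fold each window down to its mean and forward it to graphite
    (smap folds/mean graph)
    ; fold the same window to its median, alert when it exceeds 1000
    (smap folds/median
      (where (> metric 1000)
        (email "sysops@example.com")))))
```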
Pretty clean, right?
How to install & deploy Riemann
On the homepage of the Riemann website there are some prebuilt packages available. You can simply download the deb package and run `sudo dpkg -i riemann.0.2.x.deb` to install it. The package installs the configuration file at `/etc/riemann/riemann.config`, and you can use `sudo service riemann start|stop|restart|reload` to run or stop it. If you prefer to use it as an instance instead of a system service, you can clone the project and use Leiningen to install the dependencies and compile the project. You can then either run Riemann in a screen session or use nohup to run it as a daemon. If you run multiple Riemann instances, remember to assign each one different ports to listen on.
Riemann DSL structure
Riemann is designed to operate on flows of events sent over Protocol Buffers. An event is basically a map of the following fields (copied from the Riemann documentation):
| Field | Description |
|-------|-------------|
| host | A hostname, e.g. "api1", "foo.com" |
| service | e.g. "API port 8000 reqs/sec" |
| state | Any string less than 255 bytes, e.g. "ok", "warning", "critical" |
| time | The time of the event, in unix epoch seconds |
| tags | Freeform list of strings, e.g. ["rate", "fooproduct", "transient"] |
| metric | A number associated with this event, e.g. the number of reqs/sec. |
| ttl | A floating-point time, in seconds, that this event is considered valid for. Expired states may be removed from the index. |
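Concretely, a single event is just a Clojure map of these fields; for example (all values here are made up for illustration):

```clojure
{:host    "api1"
 :service "API port 8000 reqs/sec"
 :state   "ok"
 :time    1370725000          ; unix epoch seconds
 :tags    ["rate" "fooproduct"]
 :metric  1042.0
 :ttl     60.0}
```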
You can think of the events in Riemann as nouns, and we're going to use verbs (streams and stream functions) to process them.
Streams and stream functions
`(streams ...)` is the magic macro that forms Riemann's unique event-processing DSL. The top-level expressions inside `(streams ...)` are treated as rules that process the same event simultaneously, while nested s-expressions pipe the results from a parent s-expression down to its child s-expressions.
You can tell from the previous example that `(folds/mean ...)` and `(folds/median ...)` are at the same level, which means they process the same events, while the event pipeline handling logic is expressed through the nested s-expressions.
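To make the nesting rule concrete, here is a tiny made-up config. The two `where` forms are siblings, so each sees every event, while the inner `prn` only sees what its parent passes down:

```clojure
(streams
  ; siblings: both where-streams receive every incoming event
  (where (service "api") prn)
  (where (state "critical")
    ; nested child: this prn only sees events matching the parent filter
    prn))
```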
How to handle an event manually 1
Every event handler is a function that takes an event or a list of events as input. If you want to deal with an event at the most deeply nested s-expression, it's easy: an anonymous function will do the job.
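For instance (the threshold and message here are made up), this prints every event whose metric crosses 100:

```clojure
(streams
  ; a plain Clojure function is a valid stream: it is called
  ; once for every event that reaches it
  (fn [event]
    (when-let [m (:metric event)]
      (when (> m 100)
        (prn "High metric from" (:host event) "-" m)))))
```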
For more examples of the basic usage of Riemann and the standard stream processing functions, see Riemann's HOWTO page.
How to handle an event manually 2
In most use cases, Riemann's built-in stream aggregation functions solve the problem. However, in some rare cases you might also want to write a nestable stream processing function similar to the built-in ones.
Here's a use case: we want to detect errors and events which exceed a defined threshold. Most alerting systems either notify sysops on every single peak event or smooth the data, which is even more error-prone. We want something in between: within a time period, if the ratio of overshoot or error events is more than x, pipe those events to the handler functions. We implemented this in Riemann as a small reusable stream function and then plugged it into our event logic.
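A sketch of how such a function could be built from Riemann's primitives; the name `ratio-over`, the thresholds, and the handler are all made up for illustration rather than being our actual implementation:

```clojure
; Sketch: every dt seconds, inspect the window of buffered events.
; If the fraction of bad events (state "error", or metric above
; threshold) exceeds x, forward those bad events to the children.
(defn ratio-over [dt threshold x & children]
  (fixed-time-window dt
    (smap (fn [events]
            (let [bad (filter #(or (= "error" (:state %))
                                   (> (or (:metric %) 0) threshold))
                              events)]
              ; smap drops nil results, so quiet windows emit nothing
              (when (> (/ (count bad) (max 1 (count events))) x)
                bad)))
          (apply sdo children))))

; Usage: over 5-minute windows, alert when more than 10% of events
; error out or overshoot a metric of 1000.
(streams
  (ratio-over 300 1000 0.1
    (fn [events]
      (prn "Too many bad events:" (count events)))))
```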
For common Riemann tasks and functions, the official Riemann HOWTO page is a great reference. For Clojure's syntax and semantics, there are a lot of resources and tutorials on the web; the Full Disclojure video series is a good place to start.
This year, a few of us visited the Open Hardware Summit at MIT and got the chance to show some of our efforts in making datasheet content more accessible and fun to use. The project is currently hosted at beta.datasheet.net and is written by Ben Delarre, a "man who hates datasheets" (and is committed to making them better). We got great feedback from a lot of participants and are heading back with a plate full of bugs and features to fix and implement over the next month. Soon we'll start handing out invites and sending out a call for early testers. Stay tuned!
For a great overview of some of the OHS demos, check out our friend Eric Evenchick's post on Hackaday.
We're proud to announce that this year we're sponsoring the Open Source Hardware Summit! This is a new move for us, as we haven't sponsored an event like this before, but since we're getting more involved in the Open Hardware scene, we thought it would be a great place to meet more members of the community and find out what's going on. A couple of others and I will be attending the event, which starts on September 6th in Boston, MA.
I'm really looking forward to getting my grubby paws on one of these awesome conference badges and hacking up some fun projects while we're there.
Because of some miscommunication with our suppliers in China, we didn't get the LEDs made up the way we needed them to be. So we now need to make up lots of small wires and connect everything with screw terminals. This is obviously a major undertaking, and I really needed some help. So on Friday we decided to run a little build afternoon in our Hacklab. We got some food in and began the long and arduous process of cutting, stripping and tinning hundreds of wires. Nearly everyone pitched in at some point throughout the afternoon, and we managed to knock out almost a third of the parts we needed.
I have to say a big thanks to everyone who pitched in, amazing stuff!