Impressions from Berlin Buzzwords 2015

May 31 – June 3 2015

Stream processing, Internet of things, Real time analytics, Big data, Recommendations, Machine learning. Berlin Buzzwords undoubtedly lives up to its name by presenting the frontlines of data technology trends.

The conference focuses on three core concepts: search, data and scale. It brings together a diverse range of people, with presentations reaching all the way to the edges of the buzzword range.
Berlin Buzzwords kicked off on Sunday evening with a Barcamp. Monday and Tuesday were full conference days, while Wednesday was filled with hackathons and workshops.

Comperio

Comperio was one of the many companies sponsoring the conference, and brought two speakers to Berlin. André Lynum gave the talk “Beyond Significant terms”, a deep dive into how to use Elasticsearch’s built-in indexes and APIs for improved lexical analysis, topic management and trend information. André’s talk went far beyond what the well-known Elasticsearch significant terms aggregation provides. Christoffer Vig captured a spot on the informal Open Stage, giving a funny and off-kilter presentation and demo of the analytics and visualization capabilities of Kibana 4, based on a beer product catalogue.

The talks

Many people attended the comparison of Solr and Elasticsearch Performance & Scalability with Radu Gheorghe & Rafał Kuć from Sematext. This was a fast-paced run-through of how they were able to create tests reproducing the same conditions on both search engines. Elasticsearch outperformed Solr on text search using Wikipedia data, while, surprisingly, Solr outperformed Elasticsearch on aggregations. Solr has recently started catching up with Elasticsearch in providing nested aggregations, and perhaps the improved performance comes as a result of a slimmed-down implementation? It will be very interesting to follow the development of both platforms, and as consumers of the products we see competition as a good thing that drives innovation and performance.

Two other interesting technical talks were Adrien Grand’s explanation of some of the algorithms behind Elasticsearch’s aggregations and Ted Dunning’s presentation of the t-digest algorithm. Both were a window into how approximations can yield fast algorithms for complex statistics with provable bounds, and both speakers managed to keep the material approachable to the casual listener.
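
To make the idea of trading exactness for speed and memory a bit more concrete, here is a minimal Python sketch. It is neither the t-digest nor anything taken from Elasticsearch; it is a much simpler fixed-width histogram, and the class name HistogramQuantiles and its parameters are invented for this illustration. It assumes all values fall in a range known in advance, which the real algorithms do not require.

```python
import math
import random


class HistogramQuantiles:
    """Hypothetical fixed-memory quantile estimator over a known range [lo, hi)."""

    def __init__(self, lo, hi, buckets=1000):
        self.lo, self.hi = lo, hi
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets
        self.total = 0

    def add(self, x):
        # Only a counter is updated; the value itself is never stored.
        i = int((x - self.lo) / self.width)
        i = min(max(i, 0), len(self.counts) - 1)  # clamp out-of-range values
        self.counts[i] += 1
        self.total += 1

    def quantile(self, q):
        # Walk the buckets until the cumulative count reaches the target rank,
        # then answer with the midpoint of that bucket.
        target = q * self.total
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return self.lo + (i + 0.5) * self.width
        return self.hi


random.seed(42)
values = [random.gauss(50, 10) for _ in range(100_000)]

sketch = HistogramQuantiles(0, 100, buckets=1000)
for v in values:
    sketch.add(v)

# Exact answer needs all values sorted; the sketch needs only 1000 counters.
exact = sorted(values)[math.ceil(0.99 * len(values)) - 1]
approx = sketch.quantile(0.99)
print(f"exact p99 = {exact:.3f}, approximate p99 = {approx:.3f}")
```

For in-range values the estimate should be off by roughly half a bucket width at most, so the accuracy is set by how many counters you are willing to keep. The t-digest gets a similar kind of guarantee without a fixed range and with more precision in the tails.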

SQL?

Another theme threatening to return from the basement was how to properly support SQL-style joins in search engines. Real-life use cases sometimes demand objects with relations. The stock answer from the NoSQL world is to denormalize your data before inserting it, but Lucene/Elasticsearch/Solr did get limited join support a while ago. Taking this further, Mikhail Khludnev showed how the new Global Ordinal Join aims to provide a join with improved performance.
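
The "denormalize before inserting" answer can be sketched in a few lines of Python. This is only an illustration of the idea: the record layout, the field names and the denormalize function are hypothetical, and no search engine API is involved.

```python
def denormalize(products, breweries):
    """Embed each product's brewery into the product document itself."""
    breweries_by_id = {b["id"]: b for b in breweries}
    documents = []
    for product in products:
        brewery = breweries_by_id.get(product["brewery_id"], {})
        # Copy the product without the foreign key ...
        doc = {k: v for k, v in product.items() if k != "brewery_id"}
        # ... and inline the related record instead of referencing it.
        doc["brewery"] = {k: v for k, v in brewery.items() if k != "id"}
        documents.append(doc)
    return documents


# Hypothetical relational-style input: two "tables" linked by brewery_id.
breweries = [{"id": 1, "name": "Example Brewery", "country": "Norway"}]
products = [
    {"id": 10, "name": "Example Pale Ale", "brewery_id": 1},
    {"id": 11, "name": "Example Stout", "brewery_id": 1},
]

documents = denormalize(products, breweries)
print(documents[0])
```

Each resulting document can be searched on brewery fields without any join at query time. The price is duplication: if a brewery changes, every document that embeds it has to be reindexed, which is one reason query-time joins keep coming back as a topic.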

Talking the talk

As search consultants, one of our main challenges at Comperio is communicating about technical topics with customers who need to connect those topics to their own competence and background. Ellen Friedman from MapR explained how such communication can be beneficial to almost any team or team member, and shared some experiences and ideas on how you can try this at home. At its core it boils down to understanding and describing your technical work across several layers, and showing respect for the perspective and background of your conversation partner.
She also shared a very funny parrot joke. We are not going to reveal that one here; watch the video if you’d like a good laugh.

Hackathon

Comperio also attended the Apache Flink workshop hosted at Google’s offices in Berlin by the talented developers at data Artisans. Apache Flink is in some ways similar to Apache Spark and other recent distributed computing frameworks, and is an alternative to Hadoop’s MapReduce component. It represents a novel approach to data processing, modelling all data as streams and exposing both batch and stream APIs. Apache Flink has a built-in optimizer that optimizes memory, network traffic and processing power. This leaves the developer free to concentrate on implementing core functionality in Java, Scala or Python.
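
The "everything is a stream" idea can be illustrated without any framework. The following is plain Python, not Flink’s API, and the function name running_word_count is made up for the example. The point is only that one piece of logic can serve both a bounded input (a batch) and an unbounded one (a stream).

```python
from collections import Counter
from itertools import islice


def running_word_count(lines):
    """Consume lines one at a time and emit the word counts seen so far."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
        yield dict(counts)  # snapshot after each incoming line


# Batch: a bounded input is just a stream that ends; keep the last snapshot.
batch = ["to be or not to be", "that is the question"]
*_, batch_result = running_word_count(batch)
print(batch_result)


# Stream: an unbounded source; results are available while data keeps coming.
def endless_source():
    while True:
        yield "ping pong ping"


first_three = list(islice(running_word_count(endless_source()), 3))
print(first_three[-1])
```

A real engine adds what this sketch leaves out entirely: distribution across machines, fault tolerance, windowing and the optimizer mentioned above.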

The buzz

Berlin Buzzwords is a great opportunity to surf the crest of the big data wave with the most interesting people in the field. The city of Berlin, with its sense of being on the edge of new developments, provides the perfect backdrop for a conference on the latest “Buzzwords”. Comperio will certainly be back next year.