Kibana 4 is a great tool for analyzing data. Vinmonopolet, the Norwegian government-owned alcoholic beverage retail monopoly, makes its list of products available online in an easily digestible CSV format. So, what beer should I buy next? Kibana will soon tell me.
The CSV file import facility in Neo4j is interesting in that it allows you to run Cypher queries iteratively over your dataset. This gives us a lot of flexibility and relieves us of the need to transform our data to a Neo4j-specific format. We can export tables with, for example, foreign keys to other [...]
Recently, on a project, I got an interesting request. Content owners wanted to enrich new documents submitted to the search index with content from documents already present in the index. We use Solr as the search backend for this particular customer, so I started thinking about how to achieve this with Solr. A bit of [...]
Creating JSON-like structures in Python (or any other programming language) can be a cumbersome experience. Consider this snippet from the elasticsearch-py library, taken from the example/query.py file: I would argue that 33 lines for creating the facets above is too much. To save you from the dreaded hassle of writing JSON in your programs, I [...]
“Ability to search source code? (BB-39)” is an issue created in July 2011 on Bitbucket and its status is still new. If you have used Bitbucket before, you would have certainly noticed that there is no way to search in a repository’s source code. Now what if you had more than 200 repositories (as is [...]
Update: Part two is now available here! At the beginning of this year Christopher Vig wrote a great post about indexing an SQL database to the internet’s search engine du jour, Elasticsearch. This first post in a two-part series will show that Apache Solr is a robust and versatile alternative that makes indexing [...]
In this blog post we introduce a Vagrant box to easily create configurable and reproducible development environments for ELK (Elasticsearch, Logstash and Kibana). At Comperio, we mainly use this box for query log analysis using the ELK stack. In case you don’t know, Vagrant is free and open-source software that combines VirtualBox (a virtualization [...]
E is for Elasticsearch: Elasticsearch is an open source search and analytics engine that extends the limits of full-text search through a robust set of APIs and DSLs, to deliver a flexible and almost limitless search experience. L is for Logstash: One of the most popular open source log parser solutions on the market, Logstash can read any data source [...]
Once the design for the seasonal recipes app started falling into place, we soon saw there was something fishy about the results. Elasticsearch and custom relevancy to the rescue!
This series of blog posts covers how to set up FAST ESP special tokenization, character normalization, phonetic normalization and lemmatization.