Whether you want to do market research, gather financial risk information or just get news about your favorite footballer from various news sites, web scraping has many uses. In my quest to learn more about web crawling and scraping, I decided to test a couple of Open Source Web Crawlers which were not [...]
If you have worked with search solutions before, you will know that data very often needs to be processed before it can be displayed in search results. This processing may be needed to address common issues such as (but not limited to) missing metadata, inconsistent metadata, cleansing of content, integration of semantic [...]
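To make those issues concrete, here is a minimal sketch of a pre-indexing enrichment step in Python. The `Document` shape, the `doc_type` field, the `CANONICAL_DOC_TYPES` mapping and the `enrich` function are all hypothetical illustrations, not part of any particular search product; the sketch cleanses HTML remnants, fills a missing title and normalises inconsistent metadata values.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    # Hypothetical document shape, for illustration only
    title: Optional[str]
    body: str
    metadata: dict = field(default_factory=dict)


# Hypothetical mapping of inconsistent source values to one vocabulary
CANONICAL_DOC_TYPES = {
    "pdf": "PDF",
    "application/pdf": "PDF",
    "word": "Word",
    "docx": "Word",
}


def enrich(doc: Document) -> Document:
    # Cleansing: replace leftover HTML tags with spaces, then collapse whitespace
    text = re.sub(r"<[^>]+>", " ", doc.body)
    doc.body = re.sub(r"\s+", " ", text).strip()

    # Missing metadata: fall back to the first few words of the cleansed body
    if not doc.title:
        doc.title = " ".join(doc.body.split()[:8]) or "Untitled"

    # Inconsistent metadata: map known variants to a canonical value
    raw_type = str(doc.metadata.get("doc_type", "")).strip().lower()
    if raw_type:
        doc.metadata["doc_type"] = CANONICAL_DOC_TYPES.get(raw_type, raw_type)
    return doc


if __name__ == "__main__":
    d = enrich(Document(None, "<p>Quarterly   report</p>\n<b>Q3</b>", {"doc_type": "application/pdf"}))
    print(d.title, d.metadata["doc_type"])
```

In a real pipeline each of these steps would usually be a separate, configurable stage; they are combined here only to keep the example short.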
As human beings, we like to believe that each and every one of us is a special individual, and not easily replaceable. That may be fine, but please, don’t fall into the habit of treating your computer the same way.
Elasticsearch easily stores terabytes of data, but how can you make sure users only see the data they should? This post will explore how to use Shield, a plugin for Elasticsearch, to authenticate users with Active Directory.
Many of you who use Elasticsearch may have used the significant terms aggregation and been intrigued by this example of fast and simple word analysis. The details and mechanism behind this aggregation tend to be kept rather vague, however, and couched in terms like “magic” and the commonly uncommon. This is unfortunate since developing informative [...]
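As a rough illustration of what “commonly uncommon” means, the following Python sketch computes a JLH-style score, which Elasticsearch documents as the default significance heuristic for this aggregation. It multiplies the absolute change in a term's frequency (foreground versus background) by the relative change. The function name and arguments are our own, and this is a simplified sketch, not Elasticsearch's implementation or necessarily the scoring the full post discusses.

```python
def jlh_score(subset_freq: int, subset_size: int,
              superset_freq: int, superset_size: int) -> float:
    """JLH-style significance: absolute change times relative change."""
    # Guard against empty sets or unseen terms
    if 0 in (subset_freq, subset_size, superset_freq, superset_size):
        return 0.0
    fg = subset_freq / subset_size        # share of foreground docs with the term
    bg = superset_freq / superset_size    # share of background docs with the term
    absolute_change = fg - bg
    if absolute_change <= 0:              # not more common in the foreground
        return 0.0
    relative_change = fg / bg
    return absolute_change * relative_change


# A term in 50 of 100 foreground docs but only 100 of 10,000 background docs
print(jlh_score(50, 100, 100, 10_000))
# A term that is equally common everywhere scores zero
print(jlh_score(90, 100, 9_000, 10_000))
```

The product rewards terms that are both noticeably more frequent in the result set and much rarer in the background. A term that is common everywhere gets no absolute change, while a term that is rare in both sets gets only a small absolute change.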
May 31 – June 3, 2015. Stream processing, Internet of Things, real-time analytics, big data, recommendations, machine learning: Berlin Buzzwords undoubtedly lives up to its name by presenting the front lines of data technology trends.
Using Logstash and Kibana on Found by Elastic, Part 1
This is part one of a two-post blog series that aims to demonstrate how to feed logs from IIS into Elasticsearch and Kibana via Logstash, using the hosted services provided by Found by Elastic. This post will deal with setting up the basic functionality and [...]
Web analytics is the process of measuring and analyzing web data to assess and improve the effectiveness of a website. Tracking and improving search (search analytics) is an important part of web analytics that many site owners overlook. Website search analytics should not be underestimated, as it can provide valuable insights into what [...]
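As a small illustration of the kind of numbers search analytics starts from, this Python sketch summarises a search log into the most frequent queries, the most frequent zero-result queries and the overall zero-result rate. The `(query, hit_count)` log format and the `summarize_searches` function are hypothetical; real logs and metrics will vary by site and tool.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_searches(log: Iterable[Tuple[str, int]], top_n: int = 3) -> dict:
    """Summarise a hypothetical log of (query, hit_count) pairs."""
    queries = Counter()
    zero_hits = Counter()
    total = 0
    for query, hits in log:
        q = query.strip().lower()   # treat "Pricing" and "pricing " as one query
        queries[q] += 1
        total += 1
        if hits == 0:
            zero_hits[q] += 1       # searches that returned nothing
    return {
        "top_queries": queries.most_common(top_n),
        "top_zero_result_queries": zero_hits.most_common(top_n),
        "zero_result_rate": sum(zero_hits.values()) / total if total else 0.0,
    }


if __name__ == "__main__":
    sample = [("Pricing", 12), ("pricing ", 10), ("refund policy", 0),
              ("contact", 3), ("Refund Policy", 0)]
    print(summarize_searches(sample))
```

Frequent zero-result queries are often the quickest win: they point at content that is missing, or at vocabulary that users use and the site does not.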
This will do two things for the user: it’ll be easier to see the search box. [...]
IPython notebooks have become an indispensable tool for many Python developers. They are a reasonably good environment for interactive computing, can contain inline data visualisations and can be hosted remotely for sharing results or working together with other developers. In many academic environments, and increasingly in industry, IPython notebooks are used for data visualisation work [...]