Big Data and Enterprise Search
There have been a number of reports and papers issued recently on Big Data including:
- Forrester reviewing the Big Data solutions of 2013
- The Economist discusses what Big Data is and how it can be used
- The Wall Street Journal discusses what is next for Big Data
- The Sunday Times reviewing how Big Data has helped various companies
- Martin White discusses whether there is still a need for enterprise search now that Big Data is here
- Stephen Arnold discusses his Big Data trends for 2013
- Mike Walsh reviews Big Data strategies for 2013
Software vendors are also pushing their appetite for Big Data, either via their websites (e.g. Microsoft and its Big Data Week) or by taking out adverts in the UK national press (e.g. IBM ran a number of full-page adverts in The Times in the week commencing Monday 18th February 2013).
Having read these reports and papers, what actually is Big Data? Is Big Data just hype? Is it only relevant to a small number of very large organisations, where the volume, rate of change, variety and worth of the data really matter? These questions are very hard to answer – what may be Big Data to you may not be Big Data to me.
Big Data to me is being able to capture data, whether it is structured, semi-structured or totally unstructured, store it, interpret it, and leverage it to provide insights in order to help the business.
So how do Big Data and enterprise search co-exist? Can traditional search tools work as the “gateway” to explore Big Data by, for instance, preparing the data to help in creating the predictive model?
Based upon the Forrester report mentioned above, SAS and IBM are the leaders in the Big Data space, with a large number of tools available to process and analyse the data. As the first part of any analysis is preparing the data, and with a large proportion of the data being unstructured, could enterprise search use its distinctive capability of pre-processing large amounts of both structured and unstructured content? I think not. Currently, enterprise search tools cannot traverse such a large amount of data in a timely manner to produce a smaller, relevant content set without using a very large amount of hardware. The structured data that exists may already be well organised, but unstructured content is another matter, and trying to interpret meaning across structured and unstructured content can be very complex.
So, if enterprise search cannot help directly with preparing the data for analysis, where can search help? Frost and Sullivan forecast the global enterprise search market to be US$4.68bn by 2019. Search has the capability to bring a wide variety of different content sources together and produce meaning. Search queries, from simple to complex, can then be run against the search index, returning (hopefully) the most relevant content based upon the search query terms entered. But, for the most part, this won’t give the answers to the Big Data questions, e.g. how to make the best use of the data available.
However, Search Based Applications (SBAs), which, as defined by Wikipedia, “use semantic technologies to aggregate, normalize and classify unstructured, semi-structured and/or structured content across multiple repositories, and employ natural language technologies for accessing the aggregated information”, can be built to slice and dice the information in the search index on-the-fly. Isn’t this close to what the Big Data engines are trying to achieve? There are a number of search-related companies building SBAs which look to draw insight from the realms of data that organisations amass.
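To make "slice and dice on-the-fly" a little more concrete, here is a minimal sketch in Python of what an SBA does at its simplest: content from more than one repository is normalised into a common record, indexed, searched, and then broken down by facet. Everything in it (the Doc record, the TinyIndex class, the "crm" and "sharepoint" repository names and the sample text) is invented for illustration and is not taken from any vendor's product; a real SBA would sit on a proper search engine and add semantic and natural language processing on top.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """A normalised record; the field names are hypothetical."""
    doc_id: str
    source: str    # repository the content came from
    doc_type: str  # e.g. email, ticket, report
    text: str      # the unstructured part


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A toy in-memory inverted index with filters and facet counts."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)  # term -> ids of docs containing it

    def add(self, doc):
        self.docs[doc.doc_id] = doc
        for term in tokenize(doc.text):
            self.postings[term].add(doc.doc_id)

    def search(self, query, **filters):
        """Return docs containing every query term and matching every filter."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*[self.postings.get(t, set()) for t in terms])
        hits = [self.docs[i] for i in sorted(ids)]
        return [d for d in hits
                if all(getattr(d, k) == v for k, v in filters.items())]

    def facets(self, hits, field):
        """Count the hits by one structured field (the 'slice and dice')."""
        return Counter(getattr(d, field) for d in hits)


# Made-up content from two hypothetical repositories.
index = TinyIndex()
for doc in [
    Doc("1", "crm", "email", "Customer complaint about late delivery"),
    Doc("2", "crm", "ticket", "Delivery delayed, customer requests refund"),
    Doc("3", "sharepoint", "report", "Quarterly delivery performance report"),
    Doc("4", "sharepoint", "report", "Marketing budget review"),
]:
    index.add(doc)

hits = index.search("delivery")
print(index.facets(hits, "source"))
print(index.facets(hits, "doc_type"))
print([d.doc_id for d in index.search("delivery", source="crm")])
```

The point of the sketch is that the facet counts are computed at query time from whatever the search returns, so the same index can be cut by source, by document type or by any other structured field without preparing a separate data set for each question. It also shows the limitation discussed above: everything here lives in memory, which is exactly what stops working as the data gets big.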
There are obviously limitations to what search engines can do in terms of the size of the data sets – that’s why it’s called Big Data. However, there must be a reason why IBM purchased Vivisimo in 2012 and Oracle bought Endeca in 2011, with both buyers looking to capitalize on the capabilities Vivisimo and Endeca offered in terms of unlocking the structured and unstructured content within organisations.
Oracle said of the Endeca acquisition: “Oracle with Endeca plans to create a comprehensive technology platform to process, store, manage, search and analyze structured and unstructured information together enabling businesses to make stronger and more profitable decisions”. Search and Big Data complementing each other? I think so.
Finally, the latest information from Gartner indicates “Big Data is forecast to drive $34 billion of IT spending in 2013 and create 4.4 million IT jobs by 2015, but it is currently still a solution looking for a problem”.