Search Nuggets » search engine

Idea: Your life searchable through Norch – NOde seaRCH, IFTTT and Google Drive

Espen Klem — Wed, 26 Nov 2014 14:33:08 +0000

First some disclaimers:

This has been posted earlier on lab.klemespen.com.
Even though some of these ideas are not what you’d normally implement in a business environment, some of the concepts can obviously be transferred to businesses trying to provide an efficient workplace for their employees.
Norch is developed by Fergus McDowall, an employee of Comerio.

What if you could index your whole life and make this lifeindex available through search? What would that look like, and how could it help you? Refinding information is obviously one of the use cases for this type of search. I’m guessing there are a lot more, and I’m curious to figure them out.

Actions and reactions instead of web pages

I’ve had the lifeindex idea for a little while now. Originally the idea was to index everything I browsed. From what I know, and given where Norch is today, it would take a while before I was anywhere close to achieving that goal. Then I thought of IFTTT, and saw it as a ‘next best thing’. But then it hit me that I’m now indexing actions, and that’s way better than pages. What I’m still missing from most sources are the reactions to my actions. If I have a question, I also want to crawl and index the answer. If I have a statement, I want to get the critique indexed.

IFTTT and similar services (like Zapier) are quite limited in their choice of triggers. I’m not sure if this is because of choices made by those services or limitations of the sites they crawl/pull information from.

A quick fix for this, and a generally good idea for search engines, would be to switch from a preview of your content to the actual content in the form of an embed-view. Here exemplified:

Will embed-view of your content replace the preview-pane in modern #search #engine solutions? Why preview when you can have the real deal?

— Espen Klem (@eklem) November 24, 2014

Technology: Hello IFTTT, Google SpreadSheet and Norch

IFTTT is triggered by my actions, and stores some data to a series of spreadsheets on Google Drive. These spreadsheets can deliver JSON. After a little document processing these JSON-files can be fed to the Norch-indexer.

Why hasn’t this idea popped up earlier?

Search engines used to be hardware-guzzling technology. With Norch, the “NOde seaRCH” engine, that has changed. Elasticsearch and Solr are easy and small compared to e.g. SharePoint Search, but they still need a lot of hardware. Norch can run on a Raspberry Pi, and soon it will be able to run in your browser. Maybe data sets closer to small data are more interesting than big data?

Norch running on a Raspberry Pi

Why use a search engine?

It’s cheap and quick. I’m not a developer, and I’ll still be able to glue all these sources together. Search engines are often a good choice when you have multiple sources. IFTTT and Google SpreadSheet make it even easier, normalising the input and delivering it as JSON.

How far in the process have I come?

So far, I’ve set up a lot of triggers/sources at IFTTT.com:

Instagram: When posting or liking both photos and videos.
Flickr: When posting an image, creating a set or linking a photo.
Google Calendar: When adding something to one of my calendars.
Facebook: When I post a link, am tagged or post a status message.
Twitter: When I tweet, retweet, reply or if somebody mentions me.
YouTube: When I post or like a video.
GitHub: When I create an issue, get assigned to an issue, or an issue I take part in is closed.
WordPress: When there are new posts or comments on posts.
Android location tracking: When I enter and exit certain areas.
Android phone log: Placed, received and missed calls.
Gmail: Starred emails.

This has given me a good chunk of data. Indexing my SMS messages felt a bit creepy, so I stopped doing that. And storing all email just sounded excessive, but I think starred emails would suit the purpose of the project.

Those Google Drive documents are giving me JSON. It’s not JSON that I can feed directly to the Norch-indexer, though; it needs a little trimming first.
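A rough sketch of what that trimming could look like, in TypeScript. It assumes the spreadsheet feed wraps every column value as `{ "$t": "value" }` under a key prefixed with `gsx$`, the way Google’s older spreadsheet list feeds did. The `FlatDoc` shape, the `trimFeed` name and the `source` field are made up for illustration; the exact document format the Norch-indexer expects is not covered here.

```typescript
// Assumed shape of the spreadsheet JSON feed: one entry per row, with
// column values stored as { $t: "value" } under keys prefixed "gsx$".
type FeedCell = { $t: string };
type FeedEntry = Record<string, unknown>;
interface SheetFeed {
  feed: { entry?: FeedEntry[] };
}

// Hypothetical flat document: one string field per spreadsheet column.
type FlatDoc = Record<string, string>;

const COLUMN_PREFIX = "gsx$";

// Type guard: true when a feed value looks like { $t: "some string" }.
function isCell(value: unknown): value is FeedCell {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { $t?: unknown }).$t === "string"
  );
}

// Keep only the column fields of each row, strip the prefix and the
// { $t: ... } wrapper, and tag each document with the trigger it came from.
function trimFeed(sheet: SheetFeed, source: string): FlatDoc[] {
  const entries = sheet.feed.entry ?? [];
  return entries.map((entry) => {
    const doc: FlatDoc = { source };
    for (const [key, value] of Object.entries(entry)) {
      if (key.startsWith(COLUMN_PREFIX) && isCell(value)) {
        doc[key.slice(COLUMN_PREFIX.length)] = value.$t.trim();
      }
    }
    return doc;
  });
}
```

Feed metadata that is not a column (row ids, timestamps added by the feed itself and so on) is simply dropped, so each row ends up as one small, flat document per action.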

Issues discovered so far

Manual work

This search solution needs a lot of manual setup. Every trigger needs to be set up manually. Every time a new trigger fires for the first time, I get a new spreadsheet that needs a title row added. Otherwise the JSON variable names will look funny, since the first row is used for variable names.
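To illustrate why the title row matters, here is a small TypeScript sketch of turning spreadsheet rows into named fields. The function names and the way titles are normalised (lower-cased, non-alphanumerics removed) are assumptions for the example, not how Google Drive or IFTTT actually do it.

```typescript
type Row = string[];
type RecordDoc = Record<string, string>;

// Assumed normalisation: lower-case the title and drop anything that is
// not a letter or a digit.
function toFieldName(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Without `titles`, the first row is consumed as variable names.
// With `titles`, every row is treated as data.
function rowsToRecords(rows: Row[], titles?: string[]): RecordDoc[] {
  const [first, ...rest] = rows;
  if (first === undefined) return [];
  const header = titles ?? first;
  const data = titles ? rows : rest;
  // Fall back to a positional name when a title normalises to nothing.
  const fields = header.map((title, i) => toFieldName(title) || `column${i + 1}`);
  return data.map((row) => {
    const doc: RecordDoc = {};
    fields.forEach((field, i) => {
      doc[field] = row[i] ?? "";
    });
    return doc;
  });
}

// A fresh spreadsheet with no title row: the first action becomes the header.
const rows: Row[] = [
  ["2014-11-24", "Hello world"],
  ["2014-11-25", "Second tweet"],
];
const withoutTitles = rowsToRecords(rows);
// The single remaining record has the "funny" field names
// "20141124" and "helloworld", and the first action is lost as data.
const withTitles = rowsToRecords(rows, ["Created At", "Text"]);
// Both rows survive, with fields "createdat" and "text".
```

Keeping the titles per trigger type in code, as in the second call, would be one way to avoid adding the title row by hand.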

The spreadsheets only accept 2000 rows. After that a new file is created. Either I need to delete content, rename the file or reconfigure some stuff.

Level of maturity

IFTTT is a really nice service, and they treat their users well. But, for now, it’s not something you can trust fully.

Cleaning up duplicates and obsolete stuff

I have no way of removing stuff from the index automatically at this point. If I delete something I’ve added/written/created, it will not be reflected in the index.
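One possible way to approach this, sketched in TypeScript under the assumption that every document carries a stable `id` (for instance the URL of the post): collapse duplicates before indexing, and compare the ids already in the index with the ids still present at the source to find what should be removed. Both function names are hypothetical, and the sketch says nothing about how Norch itself deletes documents.

```typescript
// Collapse documents that share an id, keeping the most recent version.
function dedupeById<T extends { id: string }>(docs: T[]): T[] {
  const latest = new Map<string, T>();
  for (const doc of docs) {
    latest.set(doc.id, doc); // a later doc overwrites an earlier one
  }
  return Array.from(latest.values());
}

// Ids that are in the index but no longer exist at the source.
function findStaleIds(
  indexedIds: Iterable<string>,
  liveIds: Iterable<string>
): string[] {
  const live = new Set(liveIds);
  return Array.from(indexedIds).filter((id) => !live.has(id));
}
```

The hard part is getting the list of live ids in the first place, since the triggers only report new actions and never deletions.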

Missing sources

Books I buy, music I listen to, movies and TV-series I watch. In other words: Amazon, Spotify, Netflix and HBO. Apart from that, there are no Norwegian services available through IFTTT.

History

The crawling is triggered by my actions. That leaves me without history. So, for example, indexing new contacts on LinkedIn is meaningless when I don’t get to index the existing ones.

Next steps

JSON clean-up

I need to make a document processing step. It would be nice if Norch-document-processor handled JSON in addition to HTML. Not yet, but maybe in the future? Anyway, there’s just a small amount of JSON clean-up left before I get my data into an index.

When this step is done, a first version can be demoed.

UX and front-end code

To show the full potential, I need some interaction design of the idea. For now the sketches are all in my head, and they need to be converted to HTML, CSS and an Angular view.

Embed codes

Figure out how to embed Instagram, Flickr, Facebook and LinkedIn-posts, Google Maps, federated phonebook search etc.

OAuth configuration

Set up an OAuth npm package to access non-public spreadsheets on Google Drive. Then I can add some of the less open information I have stored.

5 reasons Lebron is the future, or why the Forage search engine will rock

Espen Klem — Wed, 28 May 2014 08:51:51 +0000

The Lebron stack

Last week, I saw the future. Wohaa, that’s always a great feeling. I’ve seen it in earlier weeks also, but now it was even brighter than before. For me, it’s still called the Lebron Stack, as Max Ogden explains it, and it consists of LevelDB, Browserify and npm. All this is mostly happening in JavaScript. Before I’m knocked to the ground: I wasn’t the first to either make the prediction or say it out loud. I’m way behind, and it’s not a very novel or extreme idea, just a really good one. But when something is predicted, it may take a long time before it happens, if it happens at all. I think it’s happening now-ish.

So this blog post is about why I think that time is now.

Disclaimer for the .Net and Java heads

All you .Net- and Java-heads will surely find some stuff that is done better within your part of the world, but hear me out! I know the list of “This already exists in OS [W] or [X]” or “You can do that with software [X], [Y] or [Z]”. I have these thoughts myself, and I’ve been wondering why I still think that Lebron and JavaScript will be so much more important. I’m not saying that .Net and Java stuff will go away, it will just be less important (it already is), and most of the cool stuff, and the stuff closest to the user, will happen in the JavaScript world.

The future is bright at the Webrebels conference in Oslo, May – 2014.

Here are the reasons I found so far

Most stuff happens in the browser
Selling anything, you want to be where the people are. For regular people that’s on their smartphone using a web app or just a native app, which in most cases is a web app wrapped as a native app. Emerging markets make this shift towards the browser happen even faster. The Firefox OS may fail as an OS, but still succeed in creating a standard smartphone API for web applications, the WebAPI. This will make it even easier to create web apps for all of the world’s smartphones, which leads me on to my next point.
Easier for startups and developers
Competing with the big ones is never easy. Amazon Web Services, AWS, and similar services made it a little easier to scale hardware use dynamically, and with it, the cost of hardware. With the browser as a VM and single page applications, a lot of the web application rendering and logic is moved from the servers to the clients. So for a small company the choice is easy. Why do all the heavy lifting on your own servers when the users can do most of the application rendering and logic on their smartphones? The irony in the old “thin vs. thick client” debate is that the clients actually got a lot thinner, and at the same time started doing more of the heavy lifting. While a Google data center is impressive, I also get a feeling it’s a sign of something gone terribly wrong.
Collaboration, modularity and minimum effort
npm is great stuff. It takes away a lot of dependency pain in the JavaScript world. Combined with people that are very good at writing small modular programs and lots of stuff under the MIT license, we have a winner. We now have tools for collaboration that actually work. People build their killer apps with very little effort on top of others’ greatness. No more reinventing the text editor.
Cheaper hardware for regular users
Okay, most people access the Internet through their phone, but the Chromebook explains this point very well. Why have a full OS, with all the hardware costs to run it fairly fast, when all you do is fire up a browser? The browser is the OS more and more each day. Last time my desktop at home broke down, I bought a new one. The new one was state of the art, and buying it was a miscalculation. Almost every time I boot it (running Ubuntu), I’m asked to upgrade to the newest version. That means every half year or so. The laptop I have, I actually use a little, but much less than my pad/tablet and phone.
Everything fun is online
Not a real argument, but hey… Isn’t it true?

But what about the Forage search engine you say?

So, what do these reasons for Lebron/JavaScript’s future success have to do with the Forage search engine? First of all, it’s written in JavaScript and needs very little hardware to run properly. You install it with npm, and that takes care of all the dependencies, like LevelDB, where the data is actually stored. Hopefully it will run in the browser in the near future using Browserify, and make testing, installing and maintaining search software so much easier and more accessible. It also opens up a lot of new interesting use cases for search. My guess is that it won’t compete with the bigger search engines, but that it will open up the possibility for better and cheaper search functionality for small scale solutions.

Anything you want to add to or subtract from the list?

Idea: search server running inside browser

Espen Klem — Tue, 29 Apr 2014 18:15:36 +0000

Got an idea to use the browser as a virtual machine for Forage

Forage is Fergus McDowall’s pet project: a search server written in JavaScript and based on Node.js and LevelDB. Since it’s JavaScript, and HTML5 local storage has the same key/value storage model as LevelDB (HTML5 local storage for Chrome actually is LevelDB), it has the possibility to run inside any modern browser. This would mean that the user could get a search server running inside the browser.
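To make the key/value point a bit more concrete, here is a minimal TypeScript sketch of a search index that only needs `get` and `put` on string keys. This is not how Forage is implemented; the `KeyValueStore` interface, the `term:` key prefix and the function names are all assumptions for illustration. The in-memory `MemoryStore` is a stand-in for anything with the same shape, such as a thin wrapper around `localStorage.getItem`/`setItem` in the browser or LevelDB on a server.

```typescript
// The only thing the index needs from its storage layer.
interface KeyValueStore {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

// In-memory stand-in for localStorage or LevelDB.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  put(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Very naive tokenizer: lower-case, split on anything non-alphanumeric.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Read the list of document ids stored for one term.
function readPostings(store: KeyValueStore, token: string): string[] {
  return JSON.parse(store.get("term:" + token) ?? "[]") as string[];
}

// Inverted index: for every term, store the ids of documents containing it.
function indexDocument(store: KeyValueStore, id: string, text: string): void {
  const uniqueTokens = Array.from(new Set(tokenize(text)));
  for (const token of uniqueTokens) {
    const ids = readPostings(store, token);
    if (ids.indexOf(id) === -1) {
      ids.push(id);
      store.put("term:" + token, JSON.stringify(ids));
    }
  }
}

// Return ids of documents that contain every term in the query.
function search(store: KeyValueStore, query: string): string[] {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  const postings = tokens.map((token) => readPostings(store, token));
  return postings.reduce((acc, list) =>
    acc.filter((id) => list.indexOf(id) !== -1)
  );
}

const store = new MemoryStore();
indexDocument(store, "1", "Norch runs on a Raspberry Pi");
indexDocument(store, "2", "Forage runs in the browser");
const hits = search(store, "runs browser"); // only document "2" has both terms
```

Because the index never touches anything but `get` and `put`, swapping the storage layer is the only change needed to move it between a server and a browser.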

Forage could then be added with a bookmarklet to any page (a bookmark that adds JavaScript to the page you’re on). With some simple UI-stuff you could define the Forage Document Processor Adapter, set up rules for Forage Crawler, crawl, process, index and then search within your indexed documents. All without using any servers, on premises or in the cloud. When the user is satisfied, she or he could download the JSON-file with processed documents, plus scripts for adding a search box, search result and navigators to a page.

Possible use cases for search server running inside browser:

Easy site search setup
One real benefit, and the initial idea, would be that the user would not need any server to test Forage and actually crawl a site. When the page is crawled, the user can download JSON ready to be indexed, plus setup-files for a search box, navigators and search result. Or add it to a cloud service and continue there with the work you started in your browser.
An easy and modern search solution behind the firewall
Behind the firewall, almost all software looks a bit duller, more beige and basically not modern. But through the browser you could easily combine the strength of Forage and all the hidden gems behind a firewall. There would be some big issues with security, but for intranet and people search it could be a great solution.
Ad hoc search on a site that is not yours
Say you’re looking for something on a site. How about indexing it ad hoc and then searching it? Yes, it’s a bandwidth abuse waiting to happen, but it could make a good tool for a lot of situations.
Your life, searchable
This may need a browser add-on, but then again, maybe not. Anyway: How about your whole online life, searchable? Today you have your browser history. It shows you page title and page link. What if all the text and images were searchable?

Some UX sketches of the idea:

The user finds a page to crawl …

… clicks the bookmarklet …

… that adds Forage JavaScript-stuff to the page …

… much like a browser plugin or add-on …

… tests a jQuery selector statement …

… and adds the field to the item when satisfied. Repeated until a full item is defined.

Here’s the feature suggestion at the Forage GitHub page. Ideas or comments are more than welcome! Want to know more about Forage? Check out the Forage GitHub-pages or stuff we’ve written about Forage.

EDIT: I’ve drawn some new mock-ups of the crawler part, Forage Fetch, and written about the killer combo Lebron and what it will mean for search.