<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Nuggets &#187; search</title>
	<atom:link href="http://blog.comperiosearch.com/blog/tag/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.comperiosearch.com</link>
	<description>A blog about Search as THE solution</description>
	<lastBuildDate>Mon, 13 Jun 2016 08:59:45 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Experimenting with Open Source Web Crawlers</title>
		<link>http://blog.comperiosearch.com/blog/2016/04/29/experimenting-with-open-source-web-crawlers/</link>
		<comments>http://blog.comperiosearch.com/blog/2016/04/29/experimenting-with-open-source-web-crawlers/#comments</comments>
		<pubDate>Fri, 29 Apr 2016 11:03:42 +0000</pubDate>
		<dc:creator><![CDATA[Mridu Agarwal]]></dc:creator>
				<category><![CDATA[Elasticsearch]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OpenWebSpider]]></category>
		<category><![CDATA[Scrapy]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Web Crawling]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=4080</guid>
		<description><![CDATA[Whether you want to do market research, gather financial risk information or just get news about your favorite footballer from various news sites, web scraping has many uses. In my quest to learn more about web crawling and scraping, I decided to test a couple of open source web crawlers which were not [...]]]></description>
				<content:encoded><![CDATA[<p lang="en-US">Whether you want to do market research, gather financial risk information or just get news about your favorite footballer from various news sites, web scraping has many uses.</p>
<p lang="en-US">In my quest to learn more about web crawling and scraping, I decided to test a couple of open source web crawlers that are not only easily available but also quite powerful. In this article I will mostly cover their basic features and how easy they are to get started with.</p>
<p lang="en-US">If you are the kind of person who likes to get started quickly when learning something new, I would suggest that you try <a href="http://www.openwebspider.org/">OpenWebSpider</a> first.</p>
<p lang="en-US">It is a simple, browser-based open source crawler and search engine that is easy to install and use, and very good for anyone getting acquainted with web crawling. It stores web pages in MySQL or MongoDB; I used MySQL for my tests. You can follow the steps <a href="http://www.openwebspider.org/documentation/openwebspider-js/">here</a> to install it. It&#8217;s pretty simple and basic.</p>
<p lang="en-US">So, once you have installed everything, you just need to open a web browser at <a href="http://127.0.0.1:9999/">http://127.0.0.1:9999/</a> and you are ready to crawl and search. Just check your database settings, type in the URL of the site you want to crawl, and within a couple of minutes you have all the data you need. You can even search it by going to the search tab and typing in your query. Whoa! That was quick and compact, and needless to say you don&#8217;t need any programming skills to do it.</p>
<p lang="en-US">If you are trying to create an offline copy of your data, or your very own mini Wikipedia, I would go for this one, as it&#8217;s the easiest way to do it.</p>
<p lang="en-US">Here are some screenshots:</p>
<p lang="en-US"><a href="http://blog.comperiosearch.com/wp-content/uploads/2016/04/OS1.png"><img class="alignleft wp-image-4083 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2016/04/OS1.png" alt="OpenWebSpider" width="613" height="438" /></a></p>
<p lang="en-US"><a href="http://blog.comperiosearch.com/wp-content/uploads/2016/04/OS2.png"><img class="alignleft wp-image-4086 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2016/04/OS2.png" alt="OpenSearchWeb" width="611" height="441" /></a></p>
<p lang="en-US" style="text-align: left"><a href="http://blog.comperiosearch.com/wp-content/uploads/2016/04/OS3.png"><img class="alignleft size-full wp-image-4087" src="http://blog.comperiosearch.com/wp-content/uploads/2016/04/OS3.png" alt="OpenSearchWeb" width="611" height="441" /></a></p>
<p lang="en-US" style="text-align: left">You can also see this search engine demo <a href="http://lab.openwebspider.org/search_engine/">here</a> before actually getting started.</p>
<p lang="en-US" style="text-align: left">Ok, after getting my hands dirty with web crawling, I was curious to try more sophisticated stuff, like extracting topics from a website that has no RSS feed or API. Extracting this kind of structured data can be quite important in many business scenarios, for example when you are trying to follow a competitor&#8217;s product news or gather data for business intelligence. I decided to use <a href="http://scrapy.org/">Scrapy</a> for this experiment.</p>
<p lang="en-US" style="text-align: left">The good thing about Scrapy is that it is not only fast and simple, but very extensible as well. While installing it on my Windows environment I had a few hiccups, mainly because of incompatible Python versions, but in the end it&#8217;s very simple once you get it working (isn&#8217;t that how it always feels once things work? Anyways, forget it! :D). Follow these links if, like me, you have trouble installing Scrapy:</p>
<p lang="en-US" style="text-align: left"><a href="https://github.com/scrapy/scrapy/wiki/How-to-Install-Scrapy-0.14-in-a-64-bit-Windows-7-Environment">https://github.com/scrapy/scrapy/wiki/How-to-Install-Scrapy-0.14-in-a-64-bit-Windows-7-Environment</a></p>
<p lang="en-US" style="text-align: left"><a href="http://doc.scrapy.org/en/latest/intro/install.html#intro-install">http://doc.scrapy.org/en/latest/intro/install.html#intro-install</a></p>
<p lang="en-US" style="text-align: left">After installing, you need to create a Scrapy project. Since we are doing more customized work than just crawling an entire website, this requires more effort, some programming skills, and sometimes browser tools to understand the HTML DOM. You can follow <a href="http://doc.scrapy.org/en/latest/intro/overview.html">this</a> link to get started with your first Scrapy project. Once you have crawled the data you need, it is interesting to feed it into a search engine. I had also been looking for open source web crawlers for Elasticsearch, and this looked like the perfect opportunity. Scrapy integrates with Elasticsearch out of the box, which is awesome. You just need to install the Elasticsearch module for Scrapy (of course, Elasticsearch should be running somewhere) and configure the item pipeline for Scrapy. Follow <a href="http://blog.florian-hopf.de/2014/07/scrapy-and-elasticsearch.html">this</a> link for the step-by-step guide. Once done, you have a fully integrated crawler and search system!</p>
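<p lang="en-US" style="text-align: left">As a rough sketch of what that configuration involves, here is the kind of settings block the step-by-step guide walks you through. The pipeline path and setting names follow the scrapy-elasticsearch module, but treat the exact names and values as assumptions for your version:</p>

```python
# Sketch of Scrapy settings that wire a crawl into Elasticsearch via an
# item pipeline. Setting names follow the scrapy-elasticsearch module
# described in the linked guide; treat them as assumptions for your version.
settings = {
    "BOT_NAME": "healthbot",  # hypothetical project name
    "ITEM_PIPELINES": {
        "scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline": 500,
    },
    "ELASTICSEARCH_SERVER": "localhost",  # where Elasticsearch is running
    "ELASTICSEARCH_PORT": 9200,
    "ELASTICSEARCH_INDEX": "scrapy",      # index used later in this post
    "ELASTICSEARCH_TYPE": "healthitems",  # type used later in this post
    "ELASTICSEARCH_UNIQ_KEY": "url",      # de-duplicate items on page URL
}
```

<p lang="en-US" style="text-align: left">In a real project these entries live in the project&#8217;s settings.py rather than a dict; the dict form here just makes the shape easy to see.</p>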
<p lang="en-US" style="text-align: left">I crawled <a href="http://primehealthchannel.com">http://primehealthchannel.com</a> and created an index named &#8220;healthitems&#8221; in Scrapy.</p>
<p lang="en-US" style="text-align: left">To search the Elasticsearch index, I am using the Chrome extension <span style="font-weight: bold">Sense</span> to send queries to Elasticsearch, and this is how it looks:</p>
<p lang="en-US" style="text-align: left">GET /scrapy/healthitems/_search</p>
<p style="text-align: left"><a href="http://blog.comperiosearch.com/wp-content/uploads/2016/04/ES1.png"><img class="alignleft wp-image-4082 size-large" src="http://blog.comperiosearch.com/wp-content/uploads/2016/04/ES1-1024x597.png" alt="Elastic Search" width="1024" height="597" /></a></p>
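<p lang="en-US" style="text-align: left">Instead of the bare match-all search above, you can also send a query body with the request. A minimal sketch, assuming the crawled items were mapped with a &#8220;title&#8221; field (that field name is a guess, not taken from the guide):</p>

```python
import json

# A minimal Elasticsearch full-text query body. The "title" field is an
# assumed field name for the crawled items; adjust it to your own mapping.
query_body = {
    "query": {"match": {"title": "diabetes"}},  # free-text match on one field
    "size": 5,                                  # return only the top 5 hits
}

# Sense (or curl) would send this JSON as the body of the request:
print(json.dumps(query_body))
```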
<p lang="en-US" style="text-align: left">I hope you had fun reading this and now want to try some cool ideas of your own. Do let us know how you use it and which crawler you like the most!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2016/04/29/experimenting-with-open-source-web-crawlers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search: better user experience with one line of JavaScript</title>
		<link>http://blog.comperiosearch.com/blog/2015/05/18/search-better-user-experience-with-one-line-of-javascript/</link>
		<comments>http://blog.comperiosearch.com/blog/2015/05/18/search-better-user-experience-with-one-line-of-javascript/#comments</comments>
		<pubDate>Mon, 18 May 2015 13:59:01 +0000</pubDate>
		<dc:creator><![CDATA[Espen Klem]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[User Experience]]></category>
		<category><![CDATA[enterprise search]]></category>
		<category><![CDATA[focus]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search box]]></category>
		<category><![CDATA[user experience]]></category>
		<category><![CDATA[ux]]></category>
		<category><![CDATA[website search]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=3680</guid>
		<description><![CDATA[What&#8217;s the cheapest trick you can do to get a better user experience on your search solution, and make your users do better search queries? Add a small line of JavaScript in your template&#8217;s document ready function: $("#MySearchBox").focus(); This will do two things for the user: It&#8217;ll be easier to see the search box. [...]]]></description>
				<content:encoded><![CDATA[<p>What&#8217;s the cheapest trick you can do to get a better user experience on your search solution, and make your users do better search queries?</p>
<p><img class="alignnone wp-image-3683 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2015/05/search-box.png" alt="Illustration: A standard search box" width="598" height="66" /></p>
<p>Add a small line of JavaScript in your template&#8217;s document ready function:</p><pre class="crayon-plain-tag">$("#MySearchBox").focus();</pre><p>This will do two things for the user:</p>
<ol>
<li>It&#8217;ll be easier to see the search box.</li>
<li>The user can start typing without having to click inside the search box.</li>
</ol>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2015/05/anim-just-cursor.gif"><img class="alignnone wp-image-3682 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2015/05/anim-just-cursor.gif" alt="Illustration: Better user experience by setting focus on the search box" width="598" height="66" /></a></p>
<p>The next issue is that most intranets and websites are more than just a search solution. Maybe you don&#8217;t want that much attention on the search box on your homepage. The solution, then, is to do this on your search result page instead.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2015/05/anim-one-word.gif"><img class="alignnone wp-image-3685 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2015/05/anim-one-word.gif" alt="Illustration: Better user experience by setting focus on the search box" width="598" height="66" /></a></p>
<p>This makes it easier for your users to refine their search query when they&#8217;re not happy with the search results at hand.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2015/05/anim-more-words.gif"><img class="alignnone wp-image-3684 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2015/05/anim-more-words.gif" alt="Illustration: Better user experience by setting focus on the search box" width="598" height="66" /></a></p>
<p>Do you have examples of other quick fixes that could make for an even better user experience in your search solution?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2015/05/18/search-better-user-experience-with-one-line-of-javascript/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ny versjon av Comperio FRONT.NET</title>
		<link>http://blog.comperiosearch.com/blog/2015/05/13/ny-versjon-av-comperio-front-net/</link>
		<comments>http://blog.comperiosearch.com/blog/2015/05/13/ny-versjon-av-comperio-front-net/#comments</comments>
		<pubDate>Wed, 13 May 2015 10:24:54 +0000</pubDate>
		<dc:creator><![CDATA[Christoffer Vig]]></dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Comperio Front]]></category>
		<category><![CDATA[enterprise search]]></category>
		<category><![CDATA[logstash]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=3661</guid>
		<description><![CDATA[Over the years, Comperio has delivered more than 100 search projects. Ideas, sweat and experience from this work have crystallized into our in-house software for search applications: FRONT. Earlier this spring we launched version 5 of the Java FRONT; this time it is its somewhat younger cousin, Comperio FRONT.NET, that has been on the operating table. The main features of [...]]]></description>
				<content:encoded><![CDATA[<p>Over the years, Comperio has delivered more than 100 search projects. Ideas, sweat and experience from this work have crystallized into our in-house software for search applications: FRONT. Earlier this spring we launched version 5 of the Java FRONT; this time it is its somewhat younger cousin, Comperio FRONT.NET, that has been on the operating table. The main features of the new version are new search adapters, improved stability and performance, and improved logging.<br />
<span id="more-3661"></span></p>
<h4>Middleware for search</h4>
<p>FRONT.NET operates as middleware, and lets you configure business logic for search independently of both the search engine and the presentation layer. FRONT.NET is built to fetch and combine information from different sources, and might well be called a search orchestrator.</p>
<p>FRONT.NET lets you separate business logic from application logic. Applications that need search functionality do not have to deal with complicated search expressions; they simply send query terms to FRONT.NET. If you need to narrow the search, you can pass along filters such as user information, location, department and the like. FRONT takes care of the complex queries.</p>
<h4>Search engine independence</h4>
<p>FRONT.NET offers a generic format for queries and search results. The data format from FRONT is the same whether the engine behind it is SharePoint, ESP or Solr. FRONT.NET currently has adapters for Fast ESP, SharePoint 2010 and 2013, Elasticsearch, Solr and Google Search Appliance. This makes it easy to combine results from different search engines. If you want to swap out the search engine, no changes to your application are needed; it is only a matter of swapping the search adapter in FRONT.NET. New adapters are developed as soon as we see the need arise.</p>
<h4>Elasticsearch adapter</h4>
<p>Elasticsearch is a fast-growing search engine. In developing the Elasticsearch adapter we were able to take advantage of NEST, the official .NET client for Elasticsearch. Elasticsearch is enormously flexible in how queries can be expressed, with support for nested boolean expressions and dynamic ranking functions. In developing the adapter we chose to minimize complexity in FRONT by delegating these capabilities to Elasticsearch via search templates. This preserves the flexibility while keeping the APIs and programming interfaces unchanged.</p>
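<p>As a rough illustration of the search-template idea (the template body, field names and parameters below are hypothetical, not taken from FRONT.NET): the query logic is stored once as a template with Elasticsearch, and the client only sends parameters per search.</p>

```python
import json

# Hypothetical Elasticsearch search template: the boolean query logic is
# expressed once, with mustache-style {{placeholders}} for the parameters
# the middleware passes along with each search.
search_template = {
    "query": {
        "bool": {
            "must": {"match": {"{{field}}": "{{query_string}}"}},
            "filter": {"term": {"department": "{{department}}"}},
        }
    }
}

# Parameters sent per search; Elasticsearch renders the template
# server-side via the /_search/template endpoint.
params = {"field": "title", "query_string": "logstash", "department": "sales"}

def render(template, params):
    """Simple local stand-in for the server-side mustache substitution."""
    text = json.dumps(template)
    for key, value in params.items():
        text = text.replace("{{%s}}" % key, value)
    return json.loads(text)

print(render(search_template, params))
```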
<h4>Google Search Appliance Adapter</h4>
<p>Last year Comperio became a Google partner, and we have now developed a FRONT.NET adapter for Google&#8217;s intranet search engine, the Google Search Appliance, or GSA for short. GSA offers simple integration with a wide range of sources, its search interface is easy to work with, and the adapter supports all the common search operations.</p>
<h4>Logging</h4>
<p>To build a good search solution, it is essential to have access to good search logs that reveal how the search application is used.<br />
FRONT.NET has recently gained functionality for logging directly to Logstash. Logstash combined with Elasticsearch and Kibana gives you a powerful tool for data analysis.</p>
<h4>FRONTD</h4>
<p>Version 5 of FRONT.NET runs as a standalone service on Windows.<br />
Earlier versions ran as a web application under IIS (Internet Information Services), but we have found that running standalone gives simpler administration as well as improved stability and performance.</p>
<h4>Microsoft, .NET and the road ahead</h4>
<p>The Microsoft and .NET world is developing rapidly these days, not least through Microsoft&#8217;s new and warmly welcomed embrace of open source. We very much like the idea of cross-platform .NET, and the next version of FRONT.NET will hopefully run just as well on OS X and Linux as on Microsoft platforms.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2015/05/13/ny-versjon-av-comperio-front-net/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search without search box: Recipe App  &#8211; Alpha version</title>
		<link>http://blog.comperiosearch.com/blog/2014/12/02/search-without-search-box-recipe-app-alpha/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/12/02/search-without-search-box-recipe-app-alpha/#comments</comments>
		<pubDate>Tue, 02 Dec 2014 14:44:48 +0000</pubDate>
		<dc:creator><![CDATA[Espen Klem]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[User Experience]]></category>
		<category><![CDATA[recipe app]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search box]]></category>
		<category><![CDATA[search interface]]></category>
		<category><![CDATA[search ux]]></category>
		<category><![CDATA[search without search box]]></category>
		<category><![CDATA[user experience]]></category>
		<category><![CDATA[ux]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=3078</guid>
		<description><![CDATA[First things first: I really like search as a technology. Not so much because of how it helps us today, but because of how it can help us tomorrow. Especially on the UX front, things move slowly. One of the biggest issues, in my mind, is the empty search box. That&#8217;s why I tend to look for solutions [...]]]></description>
				<content:encoded><![CDATA[<p>First things first: I really like search as a technology. Not so much because of how it helps us today, but because of how it can help us tomorrow. Especially on the UX front, things move slowly. One of the biggest issues, in my mind, is the empty search box. That&#8217;s why I tend to look for solutions where you have search without a search box, or functions and tools that extend the search box.</p>
<div id="attachment_3102" style="width: 609px" class="wp-caption alignnone"><img class="wp-image-3102 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2014/12/Screen-Shot-2014-12-02-at-13.34.13.png" alt="Illustration: The empty search box by Google" width="599" height="95" /><p class="wp-caption-text">The empty search box by Google</p></div>
<h2>The problem with an empty search box</h2>
<p>So, what&#8217;s the biggest problem with the empty search box? In my mind, it gives no hint about the possible outcome of asking the search engine a question. Think of these five questions as different search boxes:</p>
<ol>
<li>Soooo&#8230;?</li>
<li>What would you like?</li>
<li>What would you like for dessert?</li>
<li>Do you like ice cream for dessert?</li>
<li>Do you like pistachio ice cream for dessert?</li>
</ol>
<p>The first question is the most open-ended, and is the equivalent of the empty search box in a general search engine: the one with all the world&#8217;s knowledge at hand. The second hints that the answer to your request lies within your likings. The third gets semi-concrete, and in the fourth and fifth you&#8217;re asked a yes/no question.</p>
<h2>A search without search box working quite well</h2>
<p>The recipe app asks you if you like any of these recipes based on two variables (time of year + the place you harvest the food). We know the date, make an assumption about the place, and tell you that we&#8217;ve chosen these variables. Then we give you a couple of extra variables to play with to refine the search a bit more.</p>
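<p>As a toy sketch of that idea (all names here are hypothetical, not from the actual app): derive a season from today&#8217;s date, assume a default place, and use the pair as the starting search variables.</p>

```python
import datetime

# Map month number to season (northern hemisphere); a deliberately crude
# stand-in for whatever the real app uses for "time of year".
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def default_search_variables(today=None, place="coast"):
    """Return the two starting variables: season from the date, assumed place."""
    today = today or datetime.date.today()
    return {"season": SEASONS[today.month], "place": place}

print(default_search_variables(datetime.date(2014, 12, 2)))
# -> {'season': 'winter', 'place': 'coast'}
```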
<p>So far, the content seems to trigger people&#8217;s imagination, and the swipe interaction is easy enough to repeat many times, although it is not very well communicated yet. When I normally give people a working search prototype, they do 2&#8211;5 search queries. Now I see between 5 and 15. That&#8217;s great stuff, and maybe search without a search box is actually a good idea?</p>
<div></div>
<div id="attachment_3088" style="width: 610px" class="wp-caption alignnone"><a href="http://recipe.comperiosearch.com/"><img class="wp-image-3088" src="http://blog.comperiosearch.com/wp-content/uploads/2014/12/Screen-Shot-2014-12-01-at-15.51.27.png" alt="Illustration: Screenshot of Recipe App - Search without search box." width="600" height="411" /></a><p class="wp-caption-text">A screenshot of the Recipe App search solution without a search box</p></div>
<h2></h2>
<h2>Known bugs and weaknesses</h2>
<ul>
<li><strong>URL stays the same</strong><br />
No way of sharing a specific search (month+place+filters)</li>
<li><strong>Buggy visual relevance</strong><br />
For desktop and pad, all recipes with match on three ingredients or more should have full width result view.</li>
<li><strong>Not all recipes indexed</strong><br />
The HTML for the recipes has changed. We didn&#8217;t have time to figure out all the characteristics of the new HTML, so a lot of recipes were not indexed.</li>
<li><strong>Visual snag on time navigator</strong><br />
When selecting &#8220;short&#8221;, &#8220;medium&#8221; or &#8220;long&#8221; time to prepare recipe it should collapse as with the type navigator.</li>
<li><strong>Swipe hangs every now and then</strong><br />
The swipe library is either not tuned perfectly or a little fragile. Easy to get into a state where it stops working.</li>
<li><strong>All ingredients equally important<br />
</strong>&#8220;Oregano&#8221; and &#8220;Chicken wings&#8221; are treated as equally important. That results in some not-so-desired search results.</li>
<li><strong>Google Analytics and single page app not fixed</strong><br />
We log one pageview per user since it&#8217;s a single-page app. With URL rewriting this can easily be fixed.</li>
<li><strong>Design and UX</strong><br />
It&#8217;s just a makeshift design to communicate the idea. It should better explain interaction and season+place.</li>
</ul>
<p>Any comments? We&#8217;d love your input! Check out <a href="http://recipe.comperiosearch.com/">the actual recipe app</a>, or the other blog posts about the <a href="http://blog.comperiosearch.com/blog/tag/recipe-app/">Recipe App</a>. There&#8217;s a lot of search domain, tech and UX stuff there.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/12/02/search-without-search-box-recipe-app-alpha/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Idea: Your life searchable through Norch &#8211; NOde seaRCH, IFTTT and Google Drive</title>
		<link>http://blog.comperiosearch.com/blog/2014/11/26/idea-your-life-searchable-norch-node-search-ifttt-google-drive/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/11/26/idea-your-life-searchable-norch-node-search-ifttt-google-drive/#comments</comments>
		<pubDate>Wed, 26 Nov 2014 14:33:08 +0000</pubDate>
		<dc:creator><![CDATA[Espen Klem]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[User Experience]]></category>
		<category><![CDATA[crawl]]></category>
		<category><![CDATA[Document Processing]]></category>
		<category><![CDATA[Elasticsearch]]></category>
		<category><![CDATA[Google Drive]]></category>
		<category><![CDATA[IFTTT]]></category>
		<category><![CDATA[Index]]></category>
		<category><![CDATA[Json]]></category>
		<category><![CDATA[Life Index]]></category>
		<category><![CDATA[Lifeindex]]></category>
		<category><![CDATA[node]]></category>
		<category><![CDATA[Node Search]]></category>
		<category><![CDATA[node.js]]></category>
		<category><![CDATA[nodejs]]></category>
		<category><![CDATA[norch]]></category>
		<category><![CDATA[Personal Search Engine]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search-index]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[Small Data]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=3069</guid>
		<description><![CDATA[First some disclaimers: This has been posted earlier on lab.klemespen.com. Even though some of these ideas are not what you&#8217;d normally implement in a business environment, some of the concepts can obviously be transferred to businesses trying to provide an efficient workplace for their employees. Norch is developed by Fergus McDowall, an employee of [...]]]></description>
				<content:encoded><![CDATA[<p><strong>First some disclaimers</strong>:</p>
<ul>
<li>This has been posted earlier on <a href="http://lab.klemespen.com/2014/11/25/idea-your-life-searchable-with-norch-node-search-ifttt-and-google-drive-spreadsheets/">lab.klemespen.com</a>.</li>
<li>Even though some of these ideas are not what you&#8217;d normally implement in a business environment, some of the concepts can obviously be transferred to businesses trying to provide an efficient workplace for their employees.</li>
<li><a href="https://github.com/fergiemcdowall/norch">Norch</a> is developed by <a href="http://blog.comperiosearch.com/blog/author/fmcdowall/">Fergus McDowall</a>, an employee of Comperio.</li>
</ul>
<p>What if you could index your whole life and make this lifeindex available through search? What would that look like, and how could it help you? Refinding information is obviously one of the use cases for this type of search. I&#8217;m guessing there are a lot more, and I&#8217;m curious to figure them out.</p>
<h2>Actions and reactions instead of web pages</h2>
<p>I&#8217;ve had the lifeindex idea for a little while now. Originally the idea was to index everything I browsed. From what I know, and where <a href="https://github.com/fergiemcdowall/norch">Norch</a> is today, it would take a while before I was anywhere close to achieving that goal. <a href="http://codepen.io/nickmoreton/blog/using-ifttt-and-google-drive-to-create-a-json-api">Then I thought of IFTTT</a>, and saw it as the &#8216;next best thing&#8217;. But then it hit me that now I&#8217;m indexing actions, and that&#8217;s way better than pages. What I&#8217;m missing from most sources now, though, are the reactions to my actions. If I ask a question, I also want to crawl and index the answer. If I make a statement, I want the critique indexed.<span id="more-3069"></span></p>
<p>IFTTT and similar services (like Zapier) are quite limited in their choice of triggers. I&#8217;m not sure whether this is because of choices made by those services or limitations in the sites they crawl/pull information from.</p>
<p>A quick fix for this, and a generally good idea for search engines, would be to switch from a preview of your content to the actual content in the form of an embed-view. An example:</p>
<blockquote class="twitter-tweet" data-width="500"><p lang="en" dir="ltr">Will embed-view of your content replace the preview-pane in modern <a href="https://twitter.com/hashtag/search?src=hash&amp;ref_src=twsrc%5Etfw">#search</a>  <a href="https://twitter.com/hashtag/engine?src=hash&amp;ref_src=twsrc%5Etfw">#engine</a> solutions? Why preview when you can have the real deal?</p>
<p>&mdash; Espen Klem (@eklem) <a href="https://twitter.com/eklem/status/536866049078333440?ref_src=twsrc%5Etfw">November 24, 2014</a></p></blockquote>
<p><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<h2>Technology: Hello IFTTT, Google SpreadSheet and Norch</h2>
<p>IFTTT is triggered by my actions and stores some data in a series of spreadsheets on Google Drive. <a href="http://jsonformatter.curiousconcept.com/#https://spreadsheets.google.com/feeds/list/1B-OFzKIMVNk_3xMX_jBToGGyxSKv6FoyFYTHpGEy5O0/od6/public/values?alt=json">These spreadsheets can deliver JSON</a>. After a little document processing, these JSON files can be fed to the <a href="https://github.com/fergiemcdowall/norch#norch-indexer">Norch-indexer</a>.</p>
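<p>That &#8220;little document processing&#8221; step could look roughly like this. The gsx$-prefixed keys are how the old Google Spreadsheets JSON feed exposed column names; the flat output shape for the Norch-indexer is an assumption:</p>

```python
# Trim the Google Spreadsheets JSON feed into flat documents for indexing.
# "gsx$" is the column-name prefix the old spreadsheet feed used; the flat
# output shape expected by the Norch-indexer is an assumption here.
def trim_feed(feed):
    docs = []
    for entry in feed["feed"]["entry"]:
        doc = {key[len("gsx$"):]: value["$t"]
               for key, value in entry.items()
               if key.startswith("gsx$")}
        docs.append(doc)
    return docs

# A tiny made-up sample in the shape the spreadsheet feed delivers:
sample = {"feed": {"entry": [
    {"gsx$date": {"$t": "November 24, 2014"},
     "gsx$text": {"$t": "Tweeted about embed-view in search"}},
]}}

print(trim_feed(sample))
```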
<h2>Why hasn&#8217;t this idea popped up earlier?</h2>
<p>Search engines used to be hardware-guzzling technology. With Norch, the &#8220;NOde seaRCH&#8221; engine, that has changed. Elasticsearch and Solr are easy and small compared to, say, SharePoint Search, but they still need a lot of hardware. Norch can run on a Raspberry Pi, and soon it will be able to run in your browser. Maybe data sets closer to <a href="http://en.wikipedia.org/wiki/Small_data">small data</a> are more interesting than <a href="http://en.wikipedia.org/wiki/Big_data">big data</a>?</p>
<p><a href="http://youtu.be/ijLtk5TgvZg"><img src="http://blog.comperiosearch.com/wp-content/uploads/2014/11/Screen-Shot-2014-11-26-at-16.42.27-300x180.png" alt="Video: Norch running on a Raspberry Pi" width="300" height="180" class="alignnone size-medium wp-image-3075" />Norch running on a Raspberry Pi</a></p>
<h2>Why use a search engine?</h2>
<p>It&#8217;s cheap and quick. I&#8217;m not a developer, and I&#8217;m still able to glue all these sources together. Search engines are often a good choice when you have multiple sources. IFTTT and Google SpreadSheet make it even easier, normalising the input and delivering it as JSON.</p>
<h2>How far in the process have I come?</h2>
<p><a href="https://testlab3.files.wordpress.com/2014/11/15140752323_1f69685449_o.png"><img class="alignnone size-full wp-image-118" src="https://testlab3.files.wordpress.com/2014/11/15140752323_1f69685449_o.png" alt="Illustration: Setting up sources in IFTTT." width="660" height="469" /></a></p>
<p>So far, I&#8217;ve set up a lot of triggers/sources at IFTTT.com:</p>
<ul>
<li>Instagram: When posting or liking both photos and videos.</li>
<li>Flickr: When posting an image, creating a set or linking a photo.</li>
<li>Google Calendar: When adding something to one of my calendars.</li>
<li>Facebook: When I post a link, am tagged, or post a status message.</li>
<li>Twitter: When I tweet, retweet, reply or if somebody mentions me.</li>
<li>Youtube: When I post or like a video.</li>
<li>GitHub: When I create an issue, get assigned to an issue, or an issue I take part in is closed.</li>
<li>WordPress: When there are new posts or comments on posts.</li>
<li>Android location tracking: When I enter and exit certain areas.</li>
<li>Android phone log: Placed, received and missed calls.</li>
<li>Gmail: Starred emails.</li>
</ul>
<p><a href="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-27-57.png"><img class="alignnone size-full wp-image-127" src="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-27-57.png" alt="Screen Shot 2014-11-24 at 13.27.57" width="660" height="572" /></a></p>
<p><a href="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-31-46.png"><img class="alignnone size-full wp-image-128" src="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-31-46.png" alt="Screen Shot 2014-11-24 at 13.31.46" width="660" height="194" /></a></p>
<p>And I&#8217;ve gotten a good chunk of data. Indexing my SMSes felt a bit creepy, so I stopped doing that. And storing email just sounded too excessive, but I think starred emails suit the purpose of the project.</p>
<p>Those Google Drive documents are giving me JSON &#8211; though not JSON that I can feed directly to the Norch indexer; it needs a little trimming.</p>
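<p>As a sketch of that trimming step, something like the following could map a spreadsheet row to the kind of flat document an indexer accepts. The field names (<em>created</em>, <em>text</em>) and the id scheme are assumptions for illustration &#8211; the actual column names depend on how each IFTTT recipe is set up.</p>

```javascript
// Hypothetical sketch: flatten one spreadsheet row from an IFTTT recipe into
// a flat document for indexing. Field names here are made up for illustration.
function toIndexableDoc(row, source) {
  return {
    id: source + '-' + row.created,       // a unique key per entry
    source: source,                        // e.g. 'twitter', 'instagram'
    created: row.created,
    title: (row.text || '').slice(0, 80),  // short title taken from the text
    body: row.text || ''
  };
}
```

<p>Calling it per row, per source, would yield a uniform set of documents regardless of which service the data originally came from.</p>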
<h2>Issues discovered so far</h2>
<h3>Manual work</h3>
<p>This search solution needs a lot of manual setup. Every trigger needs to be set up manually, and every time a new trigger fires, I get a new spreadsheet that needs a title row added. Otherwise, the JSON variables will look funny, since the first row is used for variable names.</p>
<p>The spreadsheets only accept 2000 rows; after that, a new file is created. Then I either need to delete content, rename the file, or reconfigure some things.</p>
<h3>Level of maturity</h3>
<p><a href="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-41-34.png"><img class="alignnone size-full wp-image-129" src="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-41-34.png" alt="Screen Shot 2014-11-24 at 13.41.34" width="660" height="664" /></a></p>
<p>IFTTT is a really nice service, and they treat their users well. But, for now, it&#8217;s not something you can trust fully.</p>
<h3>Cleaning up duplicates and obsolete stuff</h3>
<p>I have no way of removing stuff from the index automatically at this point. If I delete something I&#8217;ve added/written/created, it will not be reflected in the index.</p>
<h3>Missing sources</h3>
<p>Books I buy, music I listen to, movies and TV series I watch &#8211; in other words Amazon, Spotify, Netflix and HBO. Apart from that, there are no Norwegian services available through IFTTT.</p>
<h3>History</h3>
<p>The crawling is triggered by my actions, which leaves me without history. For example, indexing new contacts on LinkedIn is meaningless when I don&#8217;t get to index the existing ones.</p>
<h2>Next steps</h2>
<h3>JSON clean-up</h3>
<p>I need to make a document processing step. <a href="https://github.com/fergiemcdowall/norch-document-processor">Norch-document-processor</a> would be nice if it handled JSON in addition to HTML. <a href="https://github.com/fergiemcdowall/norch-document-processor/issues/6">Not yet, but maybe in the future</a>? Anyway, there&#8217;s just a small amount of JSON clean-up to do before I get my data in and indexed.</p>
<p>When this step is done, a first version can be demoed.</p>
<h3>UX and front-end code</h3>
<p>To show the full potential, I need some interaction design sketches of the idea. For now, they&#8217;re all in my head, and the sketches need to be converted to HTML, CSS and an Angular view.</p>
<h3>Embed codes</h3>
<p>Figure out how to embed Instagram, Flickr, Facebook and LinkedIn-posts, Google Maps, federated phonebook search etc.</p>
<h3>OAUTH configuration</h3>
<p>Set up <a href="https://github.com/ciaranj/node-oauth">OAUTH NPM package</a> to access non-public spreadsheets on Google Drive. Then I can add some of the less open information I have stored.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/11/26/idea-your-life-searchable-norch-node-search-ifttt-google-drive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bitbucket to Elasticsearch Connector</title>
		<link>http://blog.comperiosearch.com/blog/2014/09/18/bitbucket-elasticsearch-connector/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/09/18/bitbucket-elasticsearch-connector/#comments</comments>
		<pubDate>Thu, 18 Sep 2014 11:46:16 +0000</pubDate>
		<dc:creator><![CDATA[Murhaf Fares]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[bitbucket]]></category>
		<category><![CDATA[Elasticsearch]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2989</guid>
		<description><![CDATA[&#8220;Ability to search source code? (BB-39)&#8221; is an issue created in July 2011 on Bitbucket and its status is still new. If you have used Bitbucket before, you would have certainly noticed that there is no way to search in a repository&#8217;s source code. Now what if you had more than 200 repositories (as is [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright wp-image-3002 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2014/09/bitbucket-logo-a3719e03.png" alt="bitbucket-logo-a3719e03" width="248" height="248" /><br />
<em>&#8220;Ability to search source code? (BB-39)&#8221;</em> is an <a href="https://bitbucket.org/site/master/issue/2874/ability-to-search-source-code-bb-39" target="_blank">issue created in July 2011</a> on Bitbucket, and its status is still new. If you have used Bitbucket before, you have certainly noticed that there is no way to search a repository&#8217;s source code. Now what if you had more than 200 repositories (as is the case for Comperio) and you wanted to search for examples of how to use a certain function? There are two options: either clone all the repos to your local machine and do some &#8216;grep&#8217; magic, or use our connector to index Bitbucket content in elasticsearch and then search happily ever after.</p>
<p>In this blog post, we introduce an <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector" target="_blank">open-source and free connector</a> that indexes content from Bitbucket in elasticsearch. The connector is written in Python and has two main modes: <em>index</em>, which indexes everything from your Bitbucket account in elasticsearch, and <em>update</em>, which updates your elasticsearch index based on the commits since the last time you ran the connector (there are three types of git update: add, change and delete).<br />
The connector creates an elasticsearch index (based on the configurations provided in <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector/blob/master/elasticsearch.conf" target="_blank">elasticsearch.conf</a>) which in turn has two types of documents, namely &#8216;file&#8217; and &#8216;repo&#8217;. We only provide a <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector/blob/master/file_mapping.json" target="_blank">mapping file</a> for the &#8216;file&#8217;-typed documents; you can create one for repos as well. For information on the connector and how to use it, please see the <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector" target="_blank">project&#8217;s page</a> on GitHub.</p>
<p><strong>Bitbucket REST APIs</strong><br />
If you check the source code of the connector, you will see that we are using two versions of Bitbucket REST APIs (<a href="https://confluence.atlassian.com/display/BITBUCKET/Version+1" target="_blank">version 1.0</a> and <a href="https://confluence.atlassian.com/display/BITBUCKET/Version+2" target="_blank">version 2.0</a>). We are doing so because not everything supported by version 1.0 is supported by version 2.0 and vice versa, e.g. branches are retrievable in API V 1.0 but not 2.0.</p>
<p><strong>Field collapsing for duplicates from different branches</strong><br />
If a repo has more than one branch, the connector would index the files in all branches as separate documents. This means that whenever you are searching for something, you will see the same matching file from the different branches as separate hits as well. In order to avoid this, we created an ID called <em>collapse_id</em> which allows us to collapse hits of the same file, but from different branches, using queries similar to the following:<br />
<script src="https://gist.github.com/d26cc5a1c0b570de85b8.js?file=collapsing-with-top-hits-agg.json"></script><br />
See another example of field collapsing using the top hits aggregation <a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_field_collapse_example" target="_blank">on elasticsearch.org</a>.</p>
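<p>In case the embedded gist does not load, the general shape of such a query &#8211; a terms aggregation on <em>collapse_id</em> with a top_hits sub-aggregation, so each file appears once no matter how many branches contain it &#8211; looks roughly like this (as a plain JavaScript object; the match on a <em>content</em> field and the size values are illustrative, not taken from the connector):</p>

```javascript
// Sketch of a field-collapsing query body. The 'content' field name and the
// query text are assumptions for illustration; 'collapse_id' is the field the
// connector adds to identify the same file across branches.
var collapseQuery = {
  query: { match: { content: 'parse_config' } },
  size: 0,                                  // hits come from the aggregation, not the top level
  aggs: {
    files: {
      terms: { field: 'collapse_id', size: 10 },   // one bucket per distinct file
      aggs: {
        top_branch_hit: { top_hits: { size: 1 } }  // one representative branch per file
      }
    }
  }
};
```

<p>The object would be sent as the request body of a search against the connector&#8217;s index; each <em>files</em> bucket then carries a single best-matching hit.</p>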
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/09/18/bitbucket-elasticsearch-connector/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SharePoint search display templates made easy</title>
		<link>http://blog.comperiosearch.com/blog/2014/06/23/sharepoint-search-display-templates/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/06/23/sharepoint-search-display-templates/#comments</comments>
		<pubDate>Mon, 23 Jun 2014 10:49:47 +0000</pubDate>
		<dc:creator><![CDATA[Madalina Rogoz]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[display templates]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[ui]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2625</guid>
		<description><![CDATA[One of the most interesting features in SharePoint 2013 are the display templates. This blog post will describe how to customize a search display template in order to show the item created date. The display templates are located in the site collection root site, under the _catalogs/Display Templates folder. Each template has two files: the html [...]]]></description>
<content:encoded><![CDATA[<p>One of the most interesting features in SharePoint 2013 is the display templates. This blog post will describe how to customize a search display template in order to show the item created date.</p>
<p>The display templates are located in the site collection root site, under the _catalogs/Display Templates folder. Each template has two files: the html file and a javascript file. The javascript file is the one that SharePoint uses, while the html exists in order to make it easier for us developers to create and customize the display templates. Once an html file has been modified, SharePoint generates the corresponding updated javascript file, so the process is fully automated.</p>
<p>So what are the steps involved in creating a new display template?</p>
<p>In SharePoint Designer, after opening the root website, navigate to the Display Templates folder. Here, in the Search subfolder, you will find the display templates that SharePoint uses for many types of results – documents, web sites, people and others. Choose a display template as a starting point for your own, depending on the result type that you are targeting; copy it and then rename it. Once saved, you will see that SharePoint has already generated the corresponding javascript file. Modify the html file as you wish. What is left to do now is to link the template to an actual result type, so that SharePoint knows which result type to associate your display template with. This can be done through the interface, by navigating to Site Settings – Search Result Types.</p>
<p>Let’s take it step by step.</p>
<h3>Duplicating an existing display template</h3>
<p>Open the site collection with SharePoint designer. Navigate to the Display Templates folder under <em>_catalogs\masterpage\display templates</em>. Here you will find some Folders that SharePoint uses to group Display Templates together.</p>
<p>The Word item template is in the Search folder. Make a copy of <em>Item_Word.html</em> and rename it to <em>ComperioItem_Word.html</em>.</p>
<p><img class="alignnone wp-image-2626 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2014/06/spdisptemp_01.png" alt="spdisptemp_01" width="509" height="221" /></p>
<p>Edit the file and inside you will see some html and javascript code.</p>
<p>Edit the <strong>title field</strong> so that your custom template has a different display name than the original one.</p>
<p>The <em>&lt;mso:CustomDocumentProperties&gt;</em> tag contains some properties of the template. Some important properties are:</p>
<ul>
<li><em>TargetControlType</em> – tells SharePoint what the template is used for (search results, web parts, filters)</li>
<li><em>ManagedPropertyMapping</em> &#8211; contains a list of the managed properties that are or can be used by the current template</li>
</ul>
<p>The <em>&lt;body&gt;</em> tag contains the rendering logic for the display template.</p>
<p>The template can only recognize the managed properties that are listed in the header. To use another managed property, add it here. We are using the <em>Created</em> property that displays the document created date.</p><pre class="crayon-plain-tag">&lt;mso:ManagedPropertyMapping msdt:dt="string"&gt; 
'Title':'Title','Path':'Path', [………..] , 'Created':'Created' 
&lt;/mso:ManagedPropertyMapping&gt;</pre><p>Now that the template is aware of the property, you can use it by adding it to the body. Add the line below in the Body of the rendering template:</p><pre class="crayon-plain-tag">&lt;div class="ms-descriptiontext"&gt; _#= ctx.CurrentItem.Created =#_ &lt;/div&gt;</pre><p><img class="alignnone wp-image-2627 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2014/06/spdisptemp_02.png" alt="spdisptemp_02" width="649" height="99" /></p>
<h3>Mapping the display template to a result type</h3>
<p>Now we are going to manually link the display template to a result type, so that we can see that it works.</p>
<p>In order for SharePoint to use your custom template, it needs to know of it. So navigate to Site Actions – Site Settings and under Site Collection Administration choose Search Result Types. Create a New Result Type for your template like below:</p>
<ul>
<li>Choose a Name</li>
<li>Choose Microsoft Word as content to match</li>
<li>Choose your custom display template</li>
</ul>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2014/06/spdisptemp_03.png"><img class="alignnone wp-image-2628 size-medium" src="http://blog.comperiosearch.com/wp-content/uploads/2014/06/spdisptemp_03-300x189.png" alt="spdisptemp_03" width="300" height="189" /></a></p>
<p>Now go to the results page and search for a word document. The results should display a date looking like this:</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2014/06/spdisptemp_04.png"><img class="alignnone size-medium wp-image-2629" src="http://blog.comperiosearch.com/wp-content/uploads/2014/06/spdisptemp_04-300x94.png" alt="spdisptemp_04" width="300" height="94" /></a></p>
<p>We have only been scratching the surface, and a lot more customization can be done here &#8211; for example, formatting the date value in javascript by using a function like <em>format(&#8220;yyyy-MM-dd&#8221;)</em> on a Date object. Combined, SharePoint and javascript have (almost) endless possibilities.</p>
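<p>The <em>format()</em> call above is SharePoint&#8217;s extension on Date objects; for reference, a plain-JavaScript equivalent (an illustrative helper, not part of the display template API) could look like this:</p>

```javascript
// Plain-JavaScript sketch of yyyy-MM-dd formatting, for environments where
// SharePoint's Date.format() extension is not available.
function formatDate(date) {
  var pad = function (n) { return n < 10 ? '0' + n : String(n); };
  // getMonth() is zero-based, so add 1 before padding
  return date.getFullYear() + '-' + pad(date.getMonth() + 1) + '-' + pad(date.getDate());
}
```

<p>For example, <em>formatDate(new Date(2014, 5, 23))</em> yields &#8220;2014-06-23&#8221;.</p>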
<p>Happy coding!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/06/23/sharepoint-search-display-templates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>5 reasons Lebron is the future, or why the Forage search engine will rock</title>
		<link>http://blog.comperiosearch.com/blog/2014/05/28/5-reasons-lebron-future-forage-search-engine-will-rock/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/05/28/5-reasons-lebron-future-forage-search-engine-will-rock/#comments</comments>
		<pubDate>Wed, 28 May 2014 08:51:51 +0000</pubDate>
		<dc:creator><![CDATA[Espen Klem]]></dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[English]]></category>
		<category><![CDATA[Browserify]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[forage]]></category>
		<category><![CDATA[Forage Search Engine]]></category>
		<category><![CDATA[future]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Lebron]]></category>
		<category><![CDATA[leveldb]]></category>
		<category><![CDATA[nodejs]]></category>
		<category><![CDATA[npm]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[Webrebels]]></category>
		<category><![CDATA[Webrebels conference]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2388</guid>
		<description><![CDATA[The Lebron stack Last week, I saw the future. Wohaa, that&#8217;s always a great feeling.  I&#8217;ve seen it in earlier weeks also, but now it was even brighter than before. For me, it&#8217;s still called the Lebron Stack as Max Ogden explains it and consists of LevelDB, Browserify and npm. All this is mostly happening [...]]]></description>
				<content:encoded><![CDATA[<h2>The Lebron stack</h2>
<p>Last week, <a href="https://www.webrebels.org/">I saw the future</a>. Wohaa, that&#8217;s always a great feeling.  I&#8217;ve seen it in earlier weeks also, but now it was even brighter than before. For me, it&#8217;s still called <a href="http://lebron.technology/">the Lebron Stack as Max Ogden explains it</a> and consists of <a href="http://leveldb.org/"><strong>Le</strong>velDB</a>, <a href="http://browserify.org/"><strong>Bro</strong>wserify</a> and <a href="http://npmjs.org/"><strong>n</strong>pm</a>. All this is mostly happening in JavaScript. Before I&#8217;m knocked to the ground: <a href="https://www.google.no/?gfe_rd=cr&amp;ei=p22EU7WML8yS_Ab17AE&amp;gws_rd=cr#q=&quot;why+javascript+is+the+future&quot;&amp;safe=off">I wasn&#8217;t the first to either make the prediction or say it out loud</a>. I&#8217;m way behind, and it&#8217;s not a very novel or extreme idea, just a really good one. But when something is predicted, it may take a long time before it happens, if it happens at all. I think it&#8217;s happening now-ish.</p>
<p>So this blog post is about why I think that time is now. <strong>Disclaimer for the .Net- and Java-heads:</strong> you will surely find some stuff that is done better within your part of the world, but hear me out! I know the lists of &#8220;This already exists in OS [W] or [X]&#8221; and &#8220;You can do that with software [X], [Y] or [Z]&#8220;. I have these thoughts myself, and I&#8217;ve been wondering why I still think that Lebron and JavaScript will be so much more important. I&#8217;m not saying that .Net and Java stuff will go away; it will just be less important (it already is), and most of the cool stuff closer to the user will happen in the JavaScript world.</p>
<div style="width: 650px" class="wp-caption alignnone"><a href="https://www.webrebels.org/"><img class=" " src="http://photos-g.ak.instagram.com/hphotos-ak-prn/10349587_639374052804126_792484736_n.jpg" alt="" width="640" height="640" /></a><p class="wp-caption-text">The future is bright at the Webrebels conference in Oslo, May &#8211; 2014.</p></div>
<h2>Here are the reasons I found so far</h2>
<ul>
<li><strong>Most stuff happens in the browser</strong> Selling anything, you want to be where the people are. For regular people that&#8217;s on their smartphone, using a web app or a native app, which in most cases is a web app wrapped as a native app. Emerging markets make this shift towards the browser happen even faster. The <a href="http://www.mozilla.org/en-US/firefox/os/">Firefox OS</a> may fail as an OS, but still succeed in creating a standard smartphone API for web applications, the <a href="https://developer.mozilla.org/en-US/docs/WebAPI">WebAPI</a>. This will make it even easier to create web apps for all of the world&#8217;s smartphones, which leads me on to my next point.</li>
<li><strong>Easier for startups and developers</strong> Competing with the big ones is never easy. <a href="http://aws.amazon.com/">Amazon Web Services</a> (AWS) and similar services made it a little easier to scale hardware use dynamically, and with that, the cost of hardware. With the browser as a VM and <a href="http://en.wikipedia.org/wiki/Single-page_application">single page applications</a>, a lot of the web application rendering and logic is moved from the servers to the clients. So for a small company the choice is easy: why do all the heavy lifting on your own servers when the users can do most of the application rendering and logic on their smartphones? The irony in the old &#8220;thin vs. thick client&#8221; debate is that <a href="https://www.google.no/search?q=world%27s+thinnest+smartphone&amp;safe=off&amp;source=lnms&amp;tbm=isch&amp;sa=X&amp;ei=g4GEU6SaMoeK4gTz2oDYDQ&amp;ved=0CAgQ_AUoAQ&amp;biw=1919&amp;bih=1072">the clients actually got a lot thinner</a>, and at the same time started doing more of the heavy lifting. While a <a href="http://www.dailymail.co.uk/sciencetech/article-2219188/Inside-Google-pictures-gives-look-8-vast-data-centres.html">Google data center</a> is impressive, I also get the feeling it&#8217;s a sign of something having gone terribly wrong.</li>
<li><strong>Collaboration, modularity and minimum effort</strong> npm is great stuff. It takes away a lot of dependency pain in the JavaScript world. Combined with people who are very good at writing small modular programs, and lots of stuff under the <a href="http://en.wikipedia.org/wiki/MIT_License">MIT license</a>, we have a winner. We now have tools for collaboration that actually work. People build their <a href="https://github.com/mafintosh/torrent-mount">killer</a> <a href="https://github.com/mafintosh/torrent-stream">apps</a> with very little effort on top of <a href="https://github.com/mafintosh/torrent-stream/blob/master/package.json">others&#8217; greatness</a>. No more <a href="http://en.wikipedia.org/wiki/Comparison_of_text_editors">reinventing the text editor</a>.</li>
<li><strong>Cheaper hardware for regular users</strong> Okay, most people access the Internet through their phone, but the Chromebook explains this point very well. Why have a full OS, with all the hardware costs to run it fairly fast, when all you do is fire up a browser? <a href="https://www.google.no/?gfe_rd=cr&amp;ei=hISEU_SVNYaX_Aa984HoDg#q=%22the+browser+is+the+os%22&amp;safe=off">The browser is the OS</a> more and more each day. Last time my desktop at home broke down, <a href="https://www.flickr.com/photos/eklem/5040220136/in/set-72157626346440700">I bought a new one</a>. The new one was state of the art, and buying it was a miscalculation. Almost every time I boot it (running Ubuntu), I&#8217;m asked to upgrade to the newest version. That means every half year or so. The laptop I have I actually use a little, but much less than my tablet and phone.</li>
<li><strong>Everything fun is online</strong> Not a real argument, but hey&#8230; Isn&#8217;t it true?</li>
</ul>
<h2>But what about the Forage search engine you say?</h2>
<p>So, what do these reasons for Lebron/JavaScript&#8217;s future success have to do with the Forage search engine? First of all, it&#8217;s written in JavaScript and <a href="http://youtu.be/ijLtk5TgvZg">needs very little hardware to run properly</a>. You <a href="https://github.com/fergiemcdowall/forage/#installation">install it with npm</a>, and that takes care of all the dependencies, like <a href="https://code.google.com/p/leveldb/">LevelDB</a>, where the data is actually stored. <a href="http://blog.comperiosearch.com/blog/2014/04/29/idea-search-server-running-inside-your-browser/">Hopefully it will run in the browser in the near future</a> using Browserify, and make testing, installing and maintaining search software so much easier and more accessible. It also opens up a lot of new interesting use cases for search. My guess is that it won&#8217;t compete with the bigger search engines, but that it will open up the possibility for better and cheaper search functionality for small scale solutions. <a href="https://github.com/fergiemcdowall/forage/"><img class="alignnone" src="https://farm6.staticflickr.com/5192/14141658313_ebf053d53d_m.jpg" alt="" width="240" height="150" /></a></p>
<p>Anything you want to add to or subtract from the list?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/05/28/5-reasons-lebron-future-forage-search-engine-will-rock/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Instant Search in SharePoint 2013</title>
		<link>http://blog.comperiosearch.com/blog/2014/05/26/instant-search-in-sharepoint-2013/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/05/26/instant-search-in-sharepoint-2013/#comments</comments>
		<pubDate>Mon, 26 May 2014 07:27:23 +0000</pubDate>
		<dc:creator><![CDATA[Erik Andreassen Perez]]></dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[display templates]]></category>
		<category><![CDATA[instant search]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[SharePoint 2013 Search]]></category>
		<category><![CDATA[ssa]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2297</guid>
		<description><![CDATA[Have you been thinking about implementing instant search to your SharePoint 2013 project, but not quite sure where to start? In this blog post I will try to explain how you can easily enhance the search experience in SharePoint 2013 in a few simple steps. Instant search is widely known as «the way Google do [...]]]></description>
				<content:encoded><![CDATA[<p>Have you been thinking about implementing instant search to your SharePoint 2013 project, but not quite sure where to start? In this blog post I will try to explain how you can easily enhance the search experience in SharePoint 2013 in a few simple steps.</p>
<p>Instant search is widely known as «the way Google does it» &#8211; in fact they were the ones who started this trend, and now everyone is used to it. What if you could give your SharePoint users the same experience they are already familiar with?</p>
<p>To begin with, you must bear in mind that instant search <strong>will</strong> produce a lot more queries and that your search performance <strong>will</strong> get worse if you have too many users hammering your SSA (Search Service Application). So, yes &#8211; there is a risk. The payoff is high if you can improve the search experience for your users, but there&#8217;s also a risk of ruining the search experience entirely if things start to go slow and people end up getting zero hits.</p>
<p>Anyway, for a proof of concept you don&#8217;t need to think about this now :-)</p>
<p>Before moving on please check that you can meet these prerequisites:</p>
<p><strong>Prerequisites</strong></p>
<ul>
<li>Administrator Access to a SharePoint 2013 site</li>
<li>SharePoint Designer 2013 or a way of mapping your site&#8217;s directory</li>
<li>A text editor such as Sublime Text or Notepad++ (doesn&#8217;t really matter, but I recommend an editor with some code highlighting / intellisense)</li>
<li>Basic understanding of the concept regarding DisplayTemplate in SharePoint 2013 (You can read more about it <a href="http://borderingdotnet.blogspot.fi/2013/03/the-anatomy-of-sharepoint-2013-display.html">here</a> and <a href="http://msdn.microsoft.com/en-us/library/office/jj945138(v=office.15).aspx">here</a> :-) )</li>
</ul>
<p>What we are going to do is simply create a new DisplayTemplate for the SearchBox control, based on the original DisplayTemplate, and edit the OnKeyUpEvent.</p>
<p><strong>Changing the DisplayTemplate</strong></p>
<ol>
<li>Open SharePoint Designer 2013 and open your site</li>
<li>Browse to http://SPSite/_catalogs/masterpage/Display Template/Search</li>
<li>Create a copy of the DisplayTemplate named Control_SearchBox.html, name it Control_InstantSearchBox.html, then open it and edit the &lt;title&gt;-tag to something else (you’ll need it to identify the template later)</li>
<li>Go to the javascript section in the template and create a function called doInstantSearch(clientControl, value, event) that only executes the normal query request.</li>
</ol>
<p>It should look something like this:<script src="https://gist.github.com/38a2fd89c8d10e15ea70.js?file=InstantSearch_Example_1.js"></script></p>
<ol>
<li>Edit the searchbox-control’s ‘onkeyup’-event by replacing the attribute’s value with: ‘doInstantSearch($getClientControl(this), this.value, event);’ and save your template. Now you can go to your SearchBox web part and change the template to your new one (look for the title you set in step 3.) and test it yourself.</li>
<li>Now you have a one-to-one relationship between each key you press and each query that you send to your SSA. However, as you will probably notice, the whole thing feels a little bit slow and not so smooth. This is because you are hammering your SSA and re-rendering your template each time you press a key &#8211; even if it’s not a real character. Not so fun, right? Let&#8217;s fix it.</li>
<li>In order to fix this, we need to add a character filter and a timeout to our function. We will get the character from the keyCode and try to match it using a regular expression. In this example we will use a RegEx matching all alphanumeric characters, including special Norwegian characters like «æ ø å». If the character matches, we reset the timeout function and start over with a new 300 ms delay. When we hit a non-alphanumeric character or the delay times out, we execute the search request.</li>
</ol>
<p>This is what we end up with:<br />
<script src="https://gist.github.com/38a2fd89c8d10e15ea70.js?file=InstantSearch_Example_2.js"></script></p>
<p>And now you have an instant search with a 300 ms delay for each legal character that the user enters in the search box.</p>
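<p>Stripped of the SharePoint-specific plumbing, the debounce logic above can be sketched like this. This is an illustrative stand-in, not the template code from the gist: <em>issueQuery</em> is a placeholder for the template&#8217;s query call, and the keystroke handling is simplified to a character-plus-value pair.</p>

```javascript
// Sketch of the described debounce: a legal (alphanumeric, incl. Norwegian
// æ ø å) character restarts a delay timer; a non-alphanumeric character, or
// the timer firing, executes the query once.
function makeInstantSearch(issueQuery, delayMs) {
  var timer = null;
  var legal = /^[0-9a-zA-ZæøåÆØÅ]$/;

  return function onKeyUp(character, value) {
    clearTimeout(timer); // restart the delay on every keystroke
    if (legal.test(character)) {
      timer = setTimeout(function () { issueQuery(value); }, delayMs);
    } else {
      issueQuery(value); // non-alphanumeric character: search right away
    }
  };
}
```

<p>Wiring the returned handler into the onkeyup event then gives one query per pause in typing instead of one per keystroke.</p>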
<p>Try it yourself :)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/05/26/instant-search-in-sharepoint-2013/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crawl interfaces for Forage running inside your browser</title>
		<link>http://blog.comperiosearch.com/blog/2014/05/21/crawl-interfaces-for-forage-running-inside-your-browser/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/05/21/crawl-interfaces-for-forage-running-inside-your-browser/#comments</comments>
		<pubDate>Wed, 21 May 2014 14:55:25 +0000</pubDate>
		<dc:creator><![CDATA[Espen Klem]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[User Experience]]></category>
		<category><![CDATA[crawl]]></category>
		<category><![CDATA[crawler]]></category>
		<category><![CDATA[forage]]></category>
		<category><![CDATA[Forage Fetch]]></category>
		<category><![CDATA[Forage Search Engine]]></category>
		<category><![CDATA[nodejs]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[user interface]]></category>
		<category><![CDATA[user interfaces]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2337</guid>
		<description><![CDATA[Got an idea a while back on how we could use the JavaScript/Nodejs Search Engine Forage so that the users would have their own search server inside the browser. The main takeaway from this would be that you don&#8217;t need to install anything to test the search engine. Since last time, I&#8217;ve made a quick [...]]]></description>
<content:encoded><![CDATA[<p>I got an idea a while back on how we could use the JavaScript/Nodejs search engine <a href="https://github.com/fergiemcdowall/forage/">Forage</a> so that users would <a href="http://blog.comperiosearch.com/blog/2014/04/29/idea-search-server-running-inside-your-browser/">have their own search server inside the browser</a>. The main takeaway is that you don&#8217;t need to install anything to test the search engine. Since last time, I&#8217;ve made a quick logo for Forage and drawn some more user interfaces. The mockups are mainly crawl interfaces for setting up the crawler, which in Forage terms is called Forage Fetch.</p>
<h2>Crawl interfaces, suggested</h2>
<p><a href="https://www.flickr.com/photos/eklem/14257669113/in/photostream/">Initial Crawl-window</a><br />
<a href="https://www.flickr.com/photos/eklem/14257669113/in/photostream/"><img class="alignnone" style="border: 1px solid black" src="https://farm3.staticflickr.com/2906/14257669113_822d5b524b.jpg" alt="javascript crawl interfaces" width="500" height="313" /></a></p>
<p>To crawl most pages elegantly and easily, you need five pieces of information:</p>
<ol>
<li>Somewhere to start. The page where you want your crawler to begin. You don&#8217;t have to specify the domain; we pick the domain name from the page you&#8217;re visiting.</li>
<li>Which links to follow. These are not necessarily the pages you want to crawl; typically they are list pages that link to the pages you do want.</li>
<li>Which links not to follow. To keep the crawler from going wild, you set some boundaries. Often the same page can be reached through several URLs.</li>
<li>Which links to crawl. These are the actual pages you&#8217;re looking for.</li>
<li>Which links not to crawl. Pages you want to keep out of the index.</li>
</ol>
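<p>As a sketch, the five rules above could be expressed as a small configuration object plus a matcher. This is purely illustrative; the option names here are made up, and Forage Fetch&#8217;s actual API may differ:</p>

```javascript
// Hypothetical sketch of the five crawl rules as a config object.
// The property names are assumptions, not Forage Fetch's real options.
const crawlRules = {
  startUrl: 'http://example.com/news/',          // 1. somewhere to start
  follow:   [/^http:\/\/example\.com\/news\//],  // 2. links to follow
  noFollow: [/\?sort=/],                         // 3. links not to follow
  crawl:    [/\/news\/\d{4}\//],                 // 4. links to crawl
  noCrawl:  [/\/print\//]                        // 5. links not to crawl
};

// Decide what to do with a discovered URL:
// "don't" rules always win over their "do" counterparts.
function classify(url, rules) {
  const matches = (list) => list.some((re) => re.test(url));
  return {
    follow: matches(rules.follow) && !matches(rules.noFollow),
    crawl:  matches(rules.crawl)  && !matches(rules.noCrawl)
  };
}

console.log(classify('http://example.com/news/2014/interfaces', crawlRules));
// { follow: true, crawl: true }
```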
<p>A simple illustration of the rules above. Forage Fetch doesn&#8217;t have all these features yet, but they&#8217;re <a href="https://github.com/fergiemcdowall/forage-fetch/issues/6">suggested as enhancements</a>.<br />
<a href="https://github.com/fergiemcdowall/forage-fetch/issues/6"><img class="alignnone" src="https://farm4.staticflickr.com/3731/12933582163_509b0e56ed.jpg" alt="" width="500" height="325" /></a></p>
<p><a href="https://www.flickr.com/photos/eklem/14257669223/in/set-72157643790505944">Selecting which rule type to add<br />
</a><a href="https://www.flickr.com/photos/eklem/14257669223/in/set-72157643790505944"><img class="alignnone" style="border: 1px solid black" src="https://farm3.staticflickr.com/2912/14257669223_bf7c7f179e.jpg" alt="javascript crawl interfaces" width="500" height="313" /></a></p>
<p>To ensure you&#8217;re adding valid rules, <a href="https://www.flickr.com/photos/eklem/14050882680/in/set-72157643790505944/">it&#8217;s a good thing to test them first.<br />
</a><a href="https://www.flickr.com/photos/eklem/14050882680/in/set-72157643790505944/"><img class="alignnone" style="border: 1px solid black" src="https://farm6.staticflickr.com/5594/14050882680_4f6168e1c0.jpg" alt="javascript crawl interfaces" width="500" height="313" /></a></p>
<p><a href="https://www.flickr.com/photos/eklem/14050906527/in/set-72157643790505944/">Start URL added<br />
</a><a href="https://www.flickr.com/photos/eklem/14050906527/in/set-72157643790505944/"><img class="alignnone" style="border: 1px solid black" src="https://farm3.staticflickr.com/2897/14050906527_6b77d3977b.jpg" alt="javascript crawl interfaces" width="500" height="313" /></a></p>
<p><a href="https://www.flickr.com/photos/eklem/14235224692/in/set-72157643790505944/">The minimum set of rules needed to start the crawler<br />
</a><a href="https://www.flickr.com/photos/eklem/14235224692/in/set-72157643790505944/"><img class="alignnone" style="border: 1px solid black" src="https://farm3.staticflickr.com/2898/14235224692_f79d48b310.jpg" alt="javascript crawl interfaces" width="500" height="313" /></a></p>
<p>The next tasks are to make a clickable prototype in HTML/CSS and to read up on HTML5 local storage/web storage.</p>
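<p>For the web storage part, a minimal sketch of saving and loading a crawl config might look like this. The key prefix and fallback store are my own assumptions, not Forage code; the in-memory fallback just lets the same functions run outside a browser:</p>

```javascript
// Hypothetical sketch: persisting a crawl config with HTML5 Web Storage.
// In a browser this uses localStorage; elsewhere (e.g. plain Node.js,
// which has no localStorage by default) it falls back to an in-memory Map.
const store = (typeof localStorage !== 'undefined')
  ? localStorage
  : (() => {
      const m = new Map();
      return {
        setItem: (k, v) => m.set(k, String(v)),
        getItem: (k) => (m.has(k) ? m.get(k) : null)
      };
    })();

// Store the config as JSON under a namespaced key.
function saveCrawlConfig(name, config) {
  store.setItem('forage-fetch:' + name, JSON.stringify(config));
}

// Returns the parsed config, or null if nothing was saved under that name.
function loadCrawlConfig(name) {
  const raw = store.getItem('forage-fetch:' + name);
  return raw === null ? null : JSON.parse(raw);
}

saveCrawlConfig('blog', { startUrl: 'http://example.com/' });
console.log(loadCrawlConfig('blog').startUrl); // http://example.com/
```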
<p><strong>All comments on the idea are welcome! </strong>Here&#8217;s <a href="http://blog.comperiosearch.com/blog/tag/forage/">what we&#8217;ve blogged about Forage</a> so far.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/05/21/crawl-interfaces-for-forage-running-inside-your-browser/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
