Adding search to your website:
There are a plethora of search options available for your web site.
- Outsourced Search: Use a search service to spider and index your site, provide the search box
on your web pages, run the search engine and return search results which
point back to your site.
The actual search index and spidering of the web site is handled by the
search service. Google and others can provide this service.
It is the equivalent of a Google search with:
search-word site:your-domain.com
The HTML form which calls the Google search engine can also embed the domain to achieve the same effect. Google and other search firms provide free and for-fee services.
- Your own Search: Index your site and provide your own search capabilities.
Commercial and open source solutions exist. This can be a CGI program
which performs a grep/search on the site contents when called, or it can
use a previously generated index of the contents of the web site for faster results.
- Search Appliance: A separate search "appliance" or search server can spider your web site and provide the search facility for your site. This works best for sites with multiple web servers or for intranets with multiple file and web servers.
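
The site-restricted query described under Outsourced Search can be sketched in a few lines. This is only an illustration of the `site:` operator: the function name `site_search_url` is hypothetical, and a hosted search box from Google or another provider may use a different form target and parameters.

```python
from urllib.parse import urlencode


def site_search_url(query: str, domain: str) -> str:
    """Build a Google search URL restricted to one domain with the site: operator."""
    # Equivalent to typing "query site:domain" into the Google search box.
    params = {"q": f"{query} site:{domain}"}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(site_search_url("search-word", "your-domain.com"))
```

An HTML form achieves the same thing by sending the visitor's search words together with a fixed domain restriction, so the results only point back to your site.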
Product | Vendor | Web Site |
---|---|---|
Google site search | Google | http://www.google.com/help/features.html#sitesearch or Google APIs or AdSense for search |
Yahoo Searchbox | Yahoo | Yahoo small business searchbox |
Bloodhound | Bloodhound | http://www.bloodhound.com/ |
Vendor | Product |
---|---|
Focuseek | Searchbox2: Index HTML. PDF, MS/Word, RTF and plain text documents |
Folio | Folio Site Director |
Google | Google enterprise solutions |
SLI Systems | Learning search (for eCommerce sites) |
Lycos | Inmagic |
Maxum Development Corp. | Phantom |
Netscape | Compass Server |
Quadralay Corp | Web Works Search |
HotBot | www.hotbot.com |
Opentext | Livelink |
Multi-Media
Product | Vendor | Web Site | Use |
---|---|---|---|
UKMax | - | - | - |
ICQ | - | - | - |
Copernic Technologies Inc. (Quebec City) | - | www.copernic.com | Queries multiple search engines. |
Clever | IBM | www.almaden.ibm.com/cs/k53/clever.html | Ranks search results. Most authoritative first. |
Thunderstone | - | www.thunderstone.com | - |
Direct Hit | - | www.directhit.com | Uses personal info to modify search. Incorporates relevance ranking. |
Islip | - | www.islip.com | Indexes video closed captioned text. |
Network Wizards | - | www.nw.com | - |
Search Engine | Description |
---|---|
perl_site_search | Simplest search to implement |
SWISH | Version 1.1: Use on low number of local pages only. |
SWISH++ | The fastest SWISH. Written in C++. |
Lucene | From the Apache group. Written in Java and runs on Tomcat. |
WebGlimpse/Glimpse | Original U of Arizona and commercial versions. Written in Perl and C. HTML, PDF, Word and other formats. |
freeWais | Can perform "And", "Or" and "Not" type searches. |
freeWais-sf | One of the first available content indexing/search engines. The SF is for "Structured Fields". These fields are used for information types such as author, title, date... Can perform "And", "Or" and "Not" type searches. |
DataParkSearch | HTML, plain text, audio MP3 and GIF images. Supports synonyms, and fuzzy search. Multi-character support. Index and CGI. GPL |
Spider/Robot Index and Engine | |
ht/Dig | Searches/indexes a single site resident on the server or spiders remote WWW servers. Supports robots.txt exclusions. HTML and plain text documents. GPL. (San Diego State U.) See the YoLinux htDig Web Site Search installation and configuration tutorial (default Red Hat/Fedora/CentOS web site search) |
Harvest (Robot Indexer) | Supports HTML and other formats including TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources and PDF (using Xpdf). Modular. Written in Perl. |
Solr and Lucene | From the Apache group. Software works together to provide an enterprise search solution: Solr: full-text search, HTML administration interface, distributed search, hit highlighting, ... Lucene: Available in Java, C++, PHP, Python, ... Will index text from PDFs, HTML, Microsoft Word, and OpenDocument documents, ... Indexing and search. |
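
ht/Dig is listed above as supporting robots.txt exclusions. As a rough illustration of what that means for a spider (this is not ht/Dig's own implementation), the sketch below uses Python's standard `urllib.robotparser` to filter a list of candidate URLs. The robots.txt content, the agent name `example-spider` and the function name `allowed_urls` are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real spider would download it from
# http://your-domain.com/robots.txt before crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def allowed_urls(urls, robots_txt=ROBOTS_TXT, agent="example-spider"):
    """Return only the URLs that the robots.txt rules permit the agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://your-domain.com/index.html",
        "http://your-domain.com/private/notes.html",
        "http://your-domain.com/cgi-bin/search.cgi",
    ]
    for url in allowed_urls(candidates):
        print(url)
```

A spider that applies this check before each request never indexes the excluded directories, so they never appear in the search results.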
Search Recommendations for your web site:
- The simplest solution is to use outsourced search.
Google and others can provide a search box for your web page and the service to index your site and provide this search capability. Let them handle it.
- The next simplest solution is only for small, simple web sites with static web pages: perl_site_search. It can index the local pages on your hard drive and provide a simple search CGI. It cannot be used for dynamic content or with server side includes. The entire web page must reside in a single HTML file.
- If your site is more complex and produces dynamic content, a spider must make HTTP requests to your web server to gather and index the content.
I have found ht/Dig to be easy to employ as it is provided with most Linux distributions and just requires configuration.
See the YoLinux htDig Web Site Search installation and configuration tutorial (default Red Hat/Fedora/CentOS web site search)
- For a fully featured, high performance and very sophisticated enterprise search, look at Solr/Lucene. This will require the installation of a Java application server such as Tomcat and a fair bit of configuration and system administration.
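
To make the static-page option above more concrete, here is a minimal sketch of a pre-generated word index with an "And" style search. It is not how perl_site_search or any other listed tool is implemented. The page contents and all names (`TextExtractor`, `build_index`, `search`, `PAGES`) are hypothetical, and a real indexer would read the HTML files from disk rather than from a dictionary.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_words(html):
    """Return the set of lowercase words found in the page text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(extractor.chunks).lower()))


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in page_words(html):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word of the query (an "And" search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical in-memory pages standing in for static HTML files.
PAGES = {
    "index.html": "<html><body><h1>Linux tutorials</h1><p>Search the site.</p></body></html>",
    "search.html": "<html><body><p>Configuring site search on Linux.</p></body></html>",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(sorted(search(index, "linux search")))
```

Because the index is built once ahead of time, each query is a few set lookups rather than a scan of every file, which is why an indexed search returns results faster than a grep over the site contents.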
YoLinux.com Site Search Setup Tutorials:
- ht://Dig - Comes with most Linux distributions
- WAIS - One of the originals - Wide Area Information Server
- American Society of Indexers - http://ASIndexing.org
On line reviews.
Comprehensive list of search sites. See:
- SearchEngineWatch.com
- SearchTools.com: Tools for web sites and intranets