
The WebGlimpse Solution

Webglimpse site search software includes a web administration interface, remote link spider, and the powerful Glimpse file indexing and query system. Add sophisticated search capability to your site.

Webglimpse is scalable: index one small local site, hundreds of remote sites, or gigabytes of compressed documents. The code is open, mature, widely used, and actively supported.


Requires a Unix server to run, but can index documents on any server as long as they are accessible via the Web or a networked drive.

Aside from creating a searchable website, Webglimpse can be used for data mining applications, as part of a document management solution, as Glimpse for LXR and other integrated solutions.

Check out the extremely cool search at bTucson.com for Tucson, Arizona. Webglimpse combines SQL item and category matches with full text search, without doing any SQL queries. How? We use a specially formatted dump of the tables to search, find matching records with glimpse and rapidly parse out the fields. The links from the search go to SQL-generated pages and print specific database fields.
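
The dump-and-parse approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the tab-delimited layout, the field names, the sample rows and the URL pattern are all hypothetical, and a plain Python regex scan stands in for glimpse.

```python
import re

# Hypothetical dump format: one record per line, tab-separated fields.
FIELDS = ["id", "category", "name", "description"]

# Made-up sample rows standing in for a dump of the database tables.
DUMP = (
    "101\tRestaurants\tDesert Cafe\tMexican food downtown\n"
    "102\tBookstores\tCactus Books\tUsed books and music\n"
    "103\tRestaurants\tSaguaro Grill\tHistoric Mexican restaurant\n"
)

def search_dump(dump, pattern):
    """Scan the dump line by line (standing in for glimpse) and parse hits into fields."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for line in dump.splitlines():
        if regex.search(line):
            hits.append(dict(zip(FIELDS, line.split("\t"))))
    return hits

def hit_url(record):
    # Hypothetical link to the database-generated page for this record.
    return "/item?id=" + record["id"]

hits = search_dump(DUMP, "mexican")
for hit in hits:
    print(hit["name"], hit_url(hit))
```

No SQL query runs at search time; the database is only consulted when a user follows a hit link.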

bTucson.com also uses our new project, Abra cataloging software. Ajax-y drop-down menus let the user quickly access 28000+ categories from a single small-footprint page. The trick here is we use compacted category codes to speed access to subcategory trees.
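
The exact encoding Abra uses is not described here. One common way to get this effect, shown purely as an assumption, is to give each category a code that starts with its parent's code, so that a whole subtree sits in one contiguous range of a sorted list. The codes and category names below are hypothetical.

```python
from bisect import bisect_left

# Hypothetical codes: two characters per tree level, so a child's code
# always starts with its parent's code.
CATEGORIES = sorted([
    ("01", "Arts"),
    ("0101", "Music"),
    ("010101", "Jazz"),
    ("0102", "Theater"),
    ("02", "Business"),
    ("0201", "Banking"),
])
CODES = [code for code, _ in CATEGORIES]

def children(parent_code):
    """Return the direct children of parent_code: one binary search plus a short scan."""
    start = bisect_left(CODES, parent_code)
    result = []
    for code, name in CATEGORIES[start:]:
        if not code.startswith(parent_code):
            break  # sorted order means we have left the subtree
        if len(code) == len(parent_code) + 2:
            result.append((code, name))
    return result

print(children("01"))
```

With this layout a drop-down menu only needs the short code of the selected category to fetch the next level.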

Webglimpse and Abra are projects of Internet WorkShop. We also provide search-enabled Web Hosting.


Download latest versions as of 3/22/07:
Webglimpse  2.17.3   Glimpse 4.18.5
   Deutsch   Espaol   Français   Hebrew   Italiano   Nederlands   Polsku   Româneşte   Suomi   Eesti keeles   Bulgarian   translate
Some recent history: (see Developer's area for the full story, Downloads for the code)

3/22/07: One thing leads to another - Webglimpse 2.17.3 has a fix for dealing with badly formed URLs. Basically we eval() the URI call and fall thru to our own methods if the URI module fails. Also some minor tweaks for dealing with multiple archives, and a new option inside the makenh script for keeping large hashes out of memory by using dbmopen.
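
The try-the-library-then-fall-back pattern looks roughly like this in Python. It is a sketch of the general idea, not the Perl code in Webglimpse: `urlsplit` stands in for the URI module, and the fallback regex is a simplified version of the generic pattern from RFC 3986, Appendix B.

```python
import re
from urllib.parse import urlsplit

# Simplified hand-rolled fallback: optional scheme, optional authority, then path.
FALLBACK = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)")

def parse_url(url):
    """Return (scheme, host, path); use the regex if the library parser raises."""
    try:
        parts = urlsplit(url)
        return parts.scheme, parts.netloc, parts.path
    except ValueError:
        # The library rejected a badly formed URL, so fall through to our own method.
        m = FALLBACK.match(url)
        return m.group(1) or "", m.group(2) or "", m.group(3) or ""

print(parse_url("http://example.com/a/b?x=1"))
print(parse_url("http://[bad/path"))  # unbalanced '[' makes urlsplit raise ValueError
```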

3/15/07: Webglimpse 2.17.2 uses the URI CPAN module, if available, for URL parsing instead of our own recipe from back when.

1/25/07: New German (Deutsch) translations available of several pages of this site.

1/23/07: Webglimpse 2.17.1 has a minor tweak to avoid errors when LWP is not available.

12/06/06: Webglimpse 2.17.0 contains a new nifty add-on called BibGlimpse that lets you use Webglimpse as a scientific reprints repository. Thanks to David Kreil and Tom Tuechler at Boku Bioinformatics, Vienna, Austria. Take a look at the BibGlimpse online documentation that also links to detailed installation instructions.

11/23/06: There now is a new application available to aid distributed literature research that builds on WebGlimpse and which comes in the latest WebGlimpse distribution. Features of the light-weight PDF reprint manager, BibGlimpse, include addition of reprints without forms, automated bibliographic record retrieval for PubMed listed papers using machine learning techniques to match a record to the PDF with a success rate of over 95%, management of user annotation of papers, and structured full-text queries using the WebGlimpse engine.

9/30/06: RECOMMENDED: Webglimpse 2.16.4 adds one last bit of sanitization by eliminating some deprecated variables. This is the currently recommended release.

9/17/06: Webglimpse 2.16.3 adds further sanitization for output variables to prevent XSS/HTML injection. We had not fully sanitized the query string, because unusual chars are required for regexp queries, but those chars need to be eliminated before the query string is displayed on a page. Also minor fixes to the "Within X words" feature and to allow spaces inside archive configuration variables.
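
To illustrate the distinction between the query used for searching and the query shown on the page, here is a small Python sketch. It escapes the special characters rather than eliminating them, which is a swapped-in technique and not necessarily what Webglimpse does; the function name and the HTML snippet are hypothetical.

```python
import html

def render_query(query):
    """Escape a raw query string before echoing it into an HTML results page."""
    # The raw query (with its regexp characters) is still what gets searched;
    # only the copy placed in the page is escaped.
    return "<p>Results for: <b>" + html.escape(query, quote=True) + "</b></p>"

print(render_query('<script>alert("x")</script>'))
```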

8/16/06: Webglimpse 2.16.2 has some further tweaks to handling of BASE HREF tags and also eliminates mailto: and javascript: links at an earlier step in the spidering process.

8/10/06: The Searchable Site article is now available for all to view (without a subscription to Linux Journal). Has a nice introduction to Webglimpse for newcomers, with a focus on making your cool website generate some revenue...

8/01/06: Webglimpse 2.15.5 installs the htuml2txt.pl filter automatically and removes the choice, which was confusing some users. With prefiltering on, htuml2txt.pl should always be the best option. Alternate filters can still be specified by manually editing the .glimpse_filters file in the archive directory.

7/23/06: Webglimpse 2.15.3 removes <SCRIPT ..> ... </SCRIPT> sections by default (if you choose the default htuml2txt.pl as the filter). Eliminates ugly hit results matching javascript code.
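
The effect of such a filter can be approximated with a short regex pass. This is a simplified Python sketch, not the code in htuml2txt.pl, and a regex will not cope with every piece of malformed HTML.

```python
import re

# Match an opening <script ...> tag, its contents, and the closing tag,
# in any letter case and across line breaks.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)

def strip_scripts(html_text):
    """Replace each script section with a space so its code is never indexed."""
    return SCRIPT_RE.sub(" ", html_text)

print(strip_scripts('<p>a</p><SCRIPT type="x">var hit = 1;</SCRIPT><p>b</p>'))
```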

7/19/06: Webglimpse 2.15.0 uses WWW::Mechanize, if it's available, to parse the links out of each page. If it is not available we fall back to our original code, but WWW::Mechanize does a better job of recognizing BASE HREF tags and generally has more modern HTML parsing code.
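
For readers unfamiliar with why BASE HREF matters to a spider, here is a minimal Python sketch using only the standard library. It is not WWW::Mechanize or the Webglimpse fallback code; the class and function names are hypothetical. It resolves relative links against the BASE HREF when one is present and drops mailto: and javascript: links up front.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute link URLs, honoring a <base href> tag if one appears."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url  # default base is the page's own URL
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base":
            self.base = urljoin(self.base, href)
        elif tag == "a":
            # Skip links a spider can never fetch.
            if href.lower().startswith(("mailto:", "javascript:")):
                return
            self.links.append(urljoin(self.base, href))

def extract_links(page_url, html_text):
    parser = LinkExtractor(page_url)
    parser.feed(html_text)
    return parser.links

PAGE = ('<html><head><BASE HREF="http://example.com/docs/"></head><body>'
        '<a href="intro.html">Intro</a>'
        '<a href="mailto:a@example.com">Mail</a>'
        '<a href="javascript:void(0)">Menu</a></body></html>')
print(extract_links("http://example.com/index.html", PAGE))
```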

5/30/06: Webglimpse 2.14.9 has a fix to the NextHits toolbar (a problem was caused by the input sanitization introduced in 2.14.5)

5/30/06: Check out The Searchable Site - our article in the July issue of Linux Journal! (you must be a subscriber - we'll post the content here on August 1)

5/24/06: Webglimpse 2.14.8 has bugfixes and improvements fixing warnings and speeding up searches on large archives. Tests not required in the ranking formula are avoided. Fix to CenterOutput routine avoids warnings and also speeds up code if centering is not necessary.

5/05/06: Webglimpse 2.14.7 has small fix for handling literal '[' and ']' chars in URLs and other places. Thanks to Robert Pelcher for the report!

5/01/06: The cPanel installer now automatically adds an alias of www.[domain name] during installation. cPanel does this automatically for domains so we do too...

4/23/06: cPanel installer now uses an automatic installation script. Single command and you are done, if you are the administrator of a cPanel server. Thanks to contract programmer Julian Lishev for getting this done and making Webglimpse accessible to the world of cPanel users.

4/19/06: The Webglimpse Manual. Finally, an organized layout of all the Webglimpse documentation plus several brand new docs. Thanks to contract docwriter Edis Feldhouse who did a really heroic job (especially given what she had to work with!). The manual is released in beta and any feedback is welcome!

4/17/06: Webglimpse 2.14.6 has several fixes for jump-to-line function; also allows use of custom filters that require the original filename prior to filtering. Thanks to Dr D. P. Kreil and to alert user Steve Cochran for the fixes & reports!

4/08/06: Webglimpse 2.14.5 adds additional input sanitization to fix reported XSS vulnerability

4/08/06: cPanel installer released. 3.X FTP-only install is pulled for now due to difficulties with permissions and security issues. During beta testing we found that most users with only FTP access are either under cPanel or hSphere anyway, so using the built-in install mechanisms of those platforms will give us better usability and security.

4/03/06: Webglimpse 3.0.11b has a fix for the installer to display the path to a detailed log file in case of error. FTP-only install has several potential permissions issues, but in some cases it's the only option users have...

4/01/06: Glimpse 4.18.5 has several compile-time fixes and a new make check target, thanks to Nelson Beebe, who not only patiently took us through several rounds of fixes but contributed binaries for 19 different platforms! Thanks, Nelson!

(binaries now available for several flavors of Linux, SunOS, FreeBSD, Darwin, IRIX, NetBSD, OSF1 and OpenBSD)

3/18/06: Not strictly a Webglimpse thing, but since I'm the primary maintainer...I've finally got a home page (Golda's). It does use the 'next generation' Abra software that we're working on...

3/6/06: Webglimpse 2.14.3 adds three hidden tags that allow you to modify the URL of hit results. This is useful for shopping carts such as SoftCart or Minivend that need to keep a session id in the URL, and for use with Google Analytics. Thanks to Tom Monroe of Infinity Imaging for this detailed Analytics & Softcart HowTo!

2/14/06: Webglimpse 2.14.2 now supports LimitPrefix for TREE type roots. That means you can traverse external links on only a limited portion of a starting site without hitting unwanted pages.
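
One reading of this option is a crawl that only follows links whose URL begins with a given prefix. The Python sketch below shows that idea under that assumption; the function name, the in-memory link graph and the example.com URLs are hypothetical, and no pages are actually fetched.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages: page URL -> links found on it.
LINKS = {
    "http://example.com/docs/index.html": [
        "http://example.com/docs/a.html",
        "http://example.com/shop/cart.html",
        "http://other.example.org/page.html",
    ],
    "http://example.com/docs/a.html": [
        "http://example.com/docs/b.html",
        "http://example.com/docs/index.html",
    ],
}

def crawl(start, limit_prefix, links):
    """Breadth-first traversal that only follows URLs starting with limit_prefix."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt.startswith(limit_prefix) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

print(crawl("http://example.com/docs/index.html", "http://example.com/docs/", LINKS))
```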

2/14/06: A user reports that Glimpse 4.18.2 "compiled and seems to be working find under MS services for unix" - so Windows users may be able to now use Glimpse without worrying about Cygwin. Windows Services for Unix appears to be a free download from Microsoft.

2/6/06: Webglimpse 3.00.01b is now available for beta test. Web based install wizard eliminates need for shell access (or at most a few commands will be needed to set permissions). Beta testers welcome!

2/3/06: Glimpse 4.18.2 ends backwards compatibility with varargs.h, as some systems now don't have STRICT_ANSI defined to let us know they have stdarg.h. That's ok ... stdarg.h is there on just about all *nixes since circa 1995.

9/5/05: Note added to docs - delete /tmp/xpdf* files if using xpdf to filter your PDFs to text. Also we are in development on Webglimpse 3.0, which will feature an FTP-only install.

8/09/05: Webglimpse 2.14.1 fixes a bug that prevented the administrative interface cookies from working with Internet Explorer. Also some fine-tuning to the keyword highlighting & centering code.

4/29/05: Webglimpse 2.14.0 now centers output around the keywords and makes sure to always display the matched keywords even when the matching line has to be trimmed for size.
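
A rough Python sketch of centering a trimmed line around a single keyword follows. It is not the Webglimpse CenterOutput routine; the function name and the default width are hypothetical, and it assumes the keyword is shorter than the window.

```python
def center_excerpt(line, keyword, width=60):
    """Trim line to about width chars, keeping the first match of keyword in view."""
    if len(line) <= width:
        return line  # nothing to trim
    pos = line.lower().find(keyword.lower())
    if pos < 0:
        return line[:width] + "..."
    # Put the middle of the keyword at the middle of the window, then clamp to the line.
    start = max(0, pos + len(keyword) // 2 - width // 2)
    end = min(len(line), start + width)
    start = max(0, end - width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(line) else ""
    return prefix + line[start:end] + suffix

long_line = "x" * 100 + " needle " + "y" * 100
print(center_excerpt(long_line, "needle", 40))
```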

3/22/05: Webglimpse 2.13.2 fixes a bug which on some systems prevented indexing of PDF files. Also adds a new optional filter script, and improves the German language output results template.

12/28/04: Webglimpse 2.13.1 has support for Dutch (Nederlandse) thanks to Rev David Morris of GentleWare Studios!

11/27/04: Webglimpse 2.13.0 now supports the option to find keywords WITHIN X WORDS of each other (wordspan). Also some minor fixes regarding highlighting keywords and elimination of redundant links when spidering.
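
The wordspan test itself is simple to state. The Python sketch below shows one way to check it for two keywords; the function name and the tokenization rule are assumptions, not the Webglimpse implementation.

```python
import re

def within_words(text, first, second, span):
    """True if some occurrence of first is at most span words away from second."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= span for a in pos_a for b in pos_b)

TEXT = "full text search for large document archives"
print(within_words(TEXT, "search", "archives", 4))
```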

10/7/04: Webglimpse 2.12.2 has some minor fixes and tweaks dealing with special characters in searches, caching and structured queries.

8/02/04: Webglimpse 2.12.0 detects and uses LWP and HTTP modules if available. This enables us to traverse sites requiring cookies and cookie-based login.

6/10/04: Glimpse 4.18.0 has new configure script generated by autoconf 2.57 - may fix compilation problems on FreeBSD

5/25/04: Webglimpse 2.11.0 has support for Romanian and updated French text, thanks to Marian-Nicolae V. Ion!

4/27/04: Webglimpse 2.10.4 has a fix to make jump-to-line work on subsequent 'Next Hits' pages

4/16/04: wgusers mailing list is re-enabled using Mailman. (Had been taken down as spammers were abusing the list thru majordomo)

3/11/04: Webglimpse 2.10.2 has several fixes to the Customized Output module, specifically to the INCLUDE FILE feature and affecting behaviour of cached output pages.

12/12/03: Webglimpse 2.10.1 adds optional , cleans up Next Hits toolbar and offers 100% uptime

11/16/03: Webglimpse 2.8.1 is a maintenance release with several small fixes and additional tests. Recommended to install.

9/02/03: Webglimpse 2.8.0 has much cleaner, more modern results output (commercial version) using stylesheets. Also added support for Bulgarian, several minor bugfixes.

6/19/03: Webglimpse 2.7.8 handles Russian month names correctly (thanks Adeena Ascher at the JDC!), and fixes a problem with cachefile links containing the '#' wildcard.

6/16/03: Webglimpse 2.7.7 now defaults to DD/MM/YY numeric dates for non-English languages.

5/15/03: Webglimpse 2.7.6 has support for Polish (Polsku), thanks to Wojciech Dorosz! Also some minor fixes involving queries with quote characters.

4/5/03: Webglimpse 2.7.4 can prefilter files for greater speed; plus several other major fixes and features.

2/02/03: Webglimpse 2.6.7 has improved handling of PHP files and more powerful options for Customizing results output.

12/25/02: White paper for using Agrep and MySQL for powerful full-text searches of database entries, by Kevin McGrail

11/29/02: Glimpse 4.17.2 has fixes for compiling on FreeBSD, binaries now available for MacOS and Linux.

11/22/02: PPL: Pay-per-line licensing model to be tried for next-generation search project.

11/18/02: Webglimpse 2.6.2 can search multiple archives from a single search screen. Also has an option to traverse but not index starting 'trunk' pages in a link tree.

10/01/02: Webglimpse 2.5.4 has a new option for greatly improved performance when searching for common words in large archives; an option to return full sentences instead of fragments; several minor fixes; and output templates in Estonian!

8/01/02: Webglimpse 2.5.1 allows you to highlight query words in color, or use your own custom tags. Plus, smarter and easier install for non-root users.

6/17/02: Webglimpse 2.4.6 provides several options for search interfaces, including use of ANY or ALL keywords instead of making the user create their own boolean expression by hand. Plus several minor fixes and a somewhat significant one to handling of documents with no titles.
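
Behind an ANY/ALL selector, the form handler only has to join the keywords with the right operator. The Python sketch below assumes Glimpse's pattern syntax, in which ';' means AND and ',' means OR; the function name is hypothetical and real input would need further cleaning.

```python
def build_query(keywords, mode="ALL"):
    """Turn space-separated keywords into a boolean pattern: ';' for ALL, ',' for ANY."""
    separator = {"ALL": ";", "ANY": ","}[mode]
    terms = keywords.split()  # split on any run of whitespace
    return separator.join(terms)

print(build_query("tucson hotels", "ALL"))
print(build_query("tucson hotels", "ANY"))
```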

5/5/02: Webglimpse 2.4.0 has the ability to search only the links on any particular page; if you want you can add search boxes to all the pages in your site so your users can combine browsing with searching. Plus lots of other fixes and tweaks, including the ability to re-sort hits after a search, handle files with really long lines, and optionally register your archives with us!

3/21/02: Webglimpse 2.3.3 is a maintenance release with several minor bugfixes, also some more flexible rules for defining sites and a new statistics module for logging searches. Webglimpse 2.3.1 added support for multiple ranking formulas: users can sort hits by date, title, meta tags, or (also new) link popularity. Plus, new templates for French and Norwegian languages.

11/04/01: Webglimpse 2.2.0 has a nifty command-line interface for managing your archives through a telnet session. ===> Versions 2.2.1 and higher have an important security fix for link-based archives. And 2.2.2 can auto-generate its own search form.

All 2.X versions have some cool web-based administration tools for multiple archives, the ability to combine local directories, remote sites and links into a single archive, and the beginnings of category support. See the Live demo (read-only) of the management interface.


Glimpse was originally developed by Udi Manber, Sun Wu, and Burra Gopal.

Glimpse, Webglimpse and this site are now maintained by Internet WorkShop and the Webglimpse developer community.




Copyright © Internet WorkShop, 2002. All Rights Reserved. Legal Disclaimer and Privacy Policy.