
Snowball


Links to resources

Quick Introduction
An account of Snowball
How You Can Help

Snowball
the manual
how to run it

Tar gzipped files of Snowball sources

stemmers
English (porter)
English (porter2)
A note on early English
Romance stemmers
French
Spanish
Portuguese
Italian
Romanian
Germanic stemmers
German
(German variant)
Dutch
Scandinavian stemmers
Swedish
Norwegian
Danish
Russian
Finnish
Character codes

Contributed stemmers in other programming languages

Wrappers

External Contributions
Irish and Czech
Object Pascal code generator for Snowball
Two stemmers for Romanian
Hungarian
Turkish
Armenian
Basque (Euskera)
Catalan

Other work
The Schinke Latin stemmer
The Lovins English stemmer
The Kraaij/Pohlmann Dutch stemmer


Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball, and presents several useful stemmers which have been implemented using it.



(Since it effectively provides a ‘suffix STRIPPER GRAMmar’, I had toyed with the idea of calling it ‘strippergram’, but good sense has prevailed, and so it is ‘Snowball’, named as a tribute to SNOBOL, the excellent string handling language of Messrs Farber, Griswold, Poage and Polonsky from the 1960s.

- Martin Porter)
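
For readers new to stemming, the idea of suffix stripping can be sketched in a few lines of Python. This is only a toy illustration: it is not written in Snowball, it is not the Porter algorithm, and the suffix list, the minimum stem length and the name toy_stem are all assumptions made up for this example. The real stemmers listed on this site apply far more careful rules.

```python
# Toy suffix stripper for illustration only.
# NOT the Porter algorithm and NOT Snowball-generated code.
# The suffix list and the minimum stem length are hypothetical choices.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, provided enough of the word remains."""
    for suffix in SUFFIXES:
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= min_stem:
            return word[:remaining]
    # No rule applied: return the word unchanged.
    return word


if __name__ == "__main__":
    for w in ("connecting", "connected", "connects", "sing"):
        print(w, "->", toy_stem(w))
```

With these rules, "connecting", "connected" and "connects" all reduce to "connect", while a short word such as "sing" is left alone because stripping "ing" would leave too little behind. Mapping related word forms to a common stem in this way is what makes stemming useful in Information Retrieval.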


Please address all Snowball-related mail to snowball-discuss@lists.tartarus.org. Any such mail sent directly to Martin Porter or Richard Boulton may be answered less speedily, and in any case they reserve the right to post their answers on snowball-discuss.

Major events

May 2012 - Contributed stemmers for Irish and Czech

Jul 2010 - Contributed stemmers for Armenian, Basque, Catalan

Mar 2007 - Romanian stemmer

Jan 2007 - Turkish stemmer, contributed by Evren (Kapusuz) Cilden

Sep 2006 - Hungarian stemmer, contributed by Anna Tordai

Jun 2006 - Supported and updated Python bindings

May 2005 - UTF-8 Unicode support

Sep 2002 - Finnish stemmer

Jul 2002 - ISO Latin I as default
The use of MS DOS Latin I is now history, but the old versions of the Snowball stemmers are still accessible on the site.

May 2002 - Unicode support

Feb 2002 - Java support
Richard has modified the Snowball code generator to produce Java output as well as ANSI C output. This means that pure Java systems can now use the Snowball stemmers.

Write to our mailing list if you have comments or questions about the project. Note that the list rejects postings from non-subscribers, because it would otherwise receive an immense amount of spam. The list is fairly low-traffic, but if you want to be able to post without receiving messages, you can turn off message delivery in the mailing list options after subscribing.