KinoSearch::InvIndexer.3pm

Language: en

Version: 2010-05-02 (fedora - 01/12/10)

Section: 3 (Library functions)

NAME

KinoSearch::InvIndexer - build inverted indexes

WARNING

KinoSearch is alpha test software. The API and the file format are subject to change.

SYNOPSIS

     use KinoSearch::InvIndexer;
     use KinoSearch::Analysis::PolyAnalyzer;
 
     my $analyzer
         = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
 
     my $invindexer = KinoSearch::InvIndexer->new(
         invindex => '/path/to/invindex',
         create   => 1,
         analyzer => $analyzer,
     );
 
     $invindexer->spec_field( 
         name  => 'title',
         boost => 3,
     );
     $invindexer->spec_field( name => 'bodytext' );
 
     while ( my ( $title, $bodytext ) = each %source_documents ) {
         my $doc = $invindexer->new_doc;
 
         $doc->set_value( title    => $title );
         $doc->set_value( bodytext => $bodytext );
 
         $invindexer->add_doc($doc);
     }
 
     $invindexer->finish;
 
 

DESCRIPTION

The InvIndexer class is KinoSearch's primary tool for creating and modifying inverted indexes, which may be searched using KinoSearch::Searcher.

METHODS

new

     my $invindexer = KinoSearch::InvIndexer->new(
         invindex => '/path/to/invindex',  # required
         create   => 1,                    # default: 0
         analyzer => $analyzer,            # default: no-op Analyzer
     );
 
 

Create an InvIndexer object.

* invindex - can be either a filepath or an InvIndex subclass such as KinoSearch::Store::FSInvIndex or KinoSearch::Store::RAMInvIndex.
* create - create a new invindex, clobbering an existing one if necessary.
* analyzer - an object which subclasses KinoSearch::Analysis::Analyzer, such as a PolyAnalyzer.
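Because create clobbers an existing invindex, a caller may want to set it only when nothing exists at the target path yet. The following is a minimal sketch of that idea, not part of KinoSearch: create_flag_for and invindexer_args are hypothetical helper names, and treating "the path exists" as "an invindex exists" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 (create) only when nothing exists at $path,
# so that an existing invindex is not clobbered by accident.
sub create_flag_for {
    my ($path) = @_;
    return -e $path ? 0 : 1;
}

# Hypothetical helper: assemble the hash-style arguments for new().
sub invindexer_args {
    my ( $path, $analyzer ) = @_;
    return (
        invindex => $path,
        create   => create_flag_for($path),
        analyzer => $analyzer,
    );
}

# Usage, assuming KinoSearch is installed and $analyzer was built as in
# the SYNOPSIS:
#
#   my $invindexer = KinoSearch::InvIndexer->new(
#       invindexer_args( '/path/to/invindex', $analyzer ) );
```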

spec_field

     $invindexer->spec_field(
         name       => 'url',      # required
         boost      => 1,          # default: 1
         analyzer   => undef,      # default: analyzer spec'd in new()
         indexed    => 0,          # default: 1
         analyzed   => 0,          # default: 1
         stored     => 1,          # default: 1
         compressed => 0,          # default: 0
         vectorized => 0,          # default: 1
     );
 
 

Define a field.

* name - the field's name.
* boost - a multiplier which determines how much a field contributes to a document's score.
* analyzer - by default, all indexed fields are analyzed using the analyzer that was supplied to new(). Supplying an alternate for a given field overrides the primary analyzer.
* indexed - index the field, so that it can be searched later.
* analyzed - analyze the field, using the relevant Analyzer. Fields such as "category" or "product_number" might be indexed but not analyzed.
* stored - store the field, so that it can be retrieved when the document turns up in a search.
* compressed - compress the stored field, using the zlib compression algorithm.
* vectorized - store the field's "term vectors", which are required by KinoSearch::Highlight::Highlighter for excerpt selection and search term highlighting.
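The options above combine in different ways depending on what a field is for. As a rough sketch, the field definitions can be kept in a table and registered in a loop. The field names and the spec_fields helper are illustrative assumptions; only the spec_field parameters come from the list above.

```perl
use strict;
use warnings;

# Illustrative field definitions (names are assumptions):
#   title    - searchable, weighted more heavily in scoring
#   bodytext - searchable, stored compressed
#   category - indexed as-is, not analyzed, no term vectors
#   url      - stored for retrieval only, not searchable
my @field_specs = (
    { name => 'title',    boost      => 3 },
    { name => 'bodytext', compressed => 1 },
    { name => 'category', analyzed   => 0, vectorized => 0 },
    { name => 'url',      indexed    => 0, analyzed => 0, vectorized => 0 },
);

# Hypothetical helper: call spec_field() once per definition on any
# object that provides that method, and return the number of fields.
sub spec_fields {
    my ( $invindexer, @specs ) = @_;
    $invindexer->spec_field( %$_ ) for @specs;
    return scalar @specs;
}

# Usage, given an InvIndexer created with new():
#   spec_fields( $invindexer, @field_specs );
```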

new_doc

     my $doc = $invindexer->new_doc;
 
 

Spawn an empty KinoSearch::Document::Doc object, primed to accept values for the fields spec'd by spec_field.

add_doc

     $invindexer->add_doc($doc);
 
 

Add a document to the invindex.

add_invindexes

     my $invindexer = KinoSearch::InvIndexer->new( 
         invindex => $invindex,
         analyzer => $analyzer,
     );
     $invindexer->add_invindexes( $another_invindex, $yet_another_invindex );
     $invindexer->finish;
 
 

Absorb existing invindexes into this one. May only be called once per InvIndexer. add_invindexes() and add_doc() cannot be called on the same InvIndexer.

delete_docs_by_term

     my $term = KinoSearch::Index::Term->new( 'id', $unique_id );
     $invindexer->delete_docs_by_term($term);
 
 

Mark any document which contains the supplied term as deleted, so that it will be excluded from search results. For more info, see Deletions in KinoSearch::Docs::FileFormat.
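A common use of term-based deletion is replacing a document that carries a unique identifier. The sketch below uses only methods described on this page. replace_doc is a hypothetical helper, and it assumes that deletions and additions may be mixed in one InvIndexer session, which the text above does not confirm. The field holding the identifier would presumably be indexed but not analyzed, so that the term matches the stored value exactly.

```perl
use strict;
use warnings;

# Hypothetical helper: mark documents matching $term as deleted, then add
# a replacement document built from the name => value pairs in $fields.
sub replace_doc {
    my ( $invindexer, $term, $fields ) = @_;

    # Exclude the old version(s) from future search results.
    $invindexer->delete_docs_by_term($term);

    # Build and add the new version.
    my $doc = $invindexer->new_doc;
    $doc->set_value( $_ => $fields->{$_} ) for sort keys %$fields;
    $invindexer->add_doc($doc);

    return $doc;
}

# Usage, assuming an 'id' field was defined with spec_field():
#   my $term = KinoSearch::Index::Term->new( 'id', $unique_id );
#   replace_doc( $invindexer, $term,
#       { id => $unique_id, title => $title, bodytext => $bodytext } );
#   $invindexer->finish;
```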

finish

     $invindexer->finish( 
         optimize => 1, # default: 0
     );
 
 

Finish the invindex. Invalidates the InvIndexer. Takes one hash-style parameter.

* optimize - If optimize is set to 1, the invindex will be collapsed to its most compact form, which will yield the fastest queries.

COPYRIGHT

Copyright 2005-2009 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.165.