volkan/lucene-solr-filter-eliminateduplicate

Components for eliminating duplicate words in Apache Lucene/Solr

Use one of the following field type definitions in your Solr schema.

Remove duplicate words

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="org.apache.lucene.EliminateDuplicateFilterFactory"/>
  </analyzer>
</fieldType>

Result

| Input | Output |
| --- | --- |
| text word word text word word | text word |
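
To make the expected result concrete, here is a minimal Java sketch of the same transformation on a plain string. It is not the code of `EliminateDuplicateFilterFactory`; the class name `EliminateDuplicateSketch` and the helper `eliminateDuplicates` are hypothetical, and the sketch assumes that the filter keeps the first occurrence of each word in input order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class EliminateDuplicateSketch {

    // Hypothetical helper (not part of this repository):
    // keeps the first occurrence of each whitespace-separated word.
    static String eliminateDuplicates(String input) {
        // LinkedHashSet drops repeats while preserving insertion order.
        Set<String> unique =
                new LinkedHashSet<>(Arrays.asList(input.trim().split("\\s+")));
        return String.join(" ", unique);
    }

    public static void main(String[] args) {
        // prints "text word"
        System.out.println(eliminateDuplicates("text word word text word word"));
    }
}
```

In Solr the equivalent work happens on the token stream produced by the tokenizer, so no application code like this is needed once the field type is configured.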

Custom PositionFilterFactory

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="org.apache.lucene.PositionFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Result

| Input | Output |
| --- | --- |
| text word word text word word | text word |
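
The two-filter chain can be understood through a small simulation. Solr's `RemoveDuplicatesTokenFilter` only drops a token when the same term text has already been seen at the same position, so it needs an earlier filter to place the tokens at one position. The sketch below assumes that the custom `PositionFilterFactory` does this by giving every token after the first a position increment of 0; that behavior is an assumption, not something confirmed by this page. The names `PositionDedupSketch`, `Token`, `stackPositions`, and `removeDuplicates` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PositionDedupSketch {

    // Hypothetical token model: term text plus its absolute position.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Step 1 (assumed PositionFilter behavior): the first token advances
    // the position by 1, every later token uses increment 0.
    static List<Token> stackPositions(String[] terms) {
        List<Token> out = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < terms.length; i++) {
            position += (i == 0) ? 1 : 0;
            out.add(new Token(terms[i], position));
        }
        return out;
    }

    // Step 2: drop a token when the same term was already seen at the
    // current position; reset the seen-set whenever the position changes.
    static List<String> removeDuplicates(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int current = -1;
        for (Token t : tokens) {
            if (t.position != current) {
                seen.clear();
                current = t.position;
            }
            if (seen.add(t.term)) {
                out.add(t.term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] terms = "text word word text word word".split(" ");
        // prints "text word"
        System.out.println(String.join(" ", removeDuplicates(stackPositions(terms))));
    }
}
```

Without step 1, each token would sit at its own position and step 2 would remove nothing, which is why the order of the two filters in the analyzer matters.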
