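Learning Lucene 5: Sorting
This time we look at sorting in Lucene. The sharp-eyed among you will already have noticed that the search method of the IndexSearcher class has quite a few overloads: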
/** Finds the top <code>n</code>
 * hits for <code>query</code>.
 *
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
public TopDocs search(Query query, int n) throws IOException {
  return search(query, null, n);
}

/** Finds the top <code>n</code>
 * hits for <code>query</code>, applying <code>filter</code> if non-null.
 *
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
public TopDocs search(Query query, Filter filter, int n) throws IOException {
  return search(createNormalizedWeight(wrapFilter(query, filter)), null, n);
}

/** Lower-level search API.
 *
 * <p>{@link LeafCollector#collect(int)} is called for every matching
 * document.
 *
 * @param query to match documents
 * @param filter if non-null, used to permit documents to be collected.
 * @param results to receive hits
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
public void search(Query query, Filter filter, Collector results) throws IOException {
  search(leafContexts, createNormalizedWeight(wrapFilter(query, filter)), results);
}

/** Lower-level search API.
 *
 * <p>{@link LeafCollector#collect(int)} is called for every matching document.
 *
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
public void search(Query query, Collector results) throws IOException {
  search(leafContexts, createNormalizedWeight(query), results);
}

/** Search implementation with arbitrary sorting. Finds
 * the top <code>n</code> hits for <code>query</code>, applying
 * <code>filter</code> if non-null, and sorting the hits by the criteria in
 * <code>sort</code>.
 *
 * <p>NOTE: this does not compute scores by default; use
 * {@link IndexSearcher#search(Query,Filter,int,Sort,boolean,boolean)} to
 * control scoring.
 *
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
public TopFieldDocs search(Query query, Filter filter, int n, Sort sort) throws IOException {
  return search(createNormalizedWeight(wrapFilter(query, filter)), n, sort, false, false);
}

/** Search implementation with arbitrary sorting, plus
 * control over whether hit scores and max score
 * should be computed. Finds
 * the top <code>n</code> hits for <code>query</code>, applying
 * <code>filter</code> if non-null, and sorting the hits by the criteria in
 * <code>sort</code>. If <code>doDocScores</code> is <code>true</code>
 * then the score of each hit will be computed and
 * returned. If <code>doMaxScore</code> is
 * <code>true</code> then the maximum score over all
 * collected hits will be computed.
 *
 * @throws BooleanQuery.TooManyClauses If a query would exceed
 *         {@link BooleanQuery#getMaxClauseCount()} clauses.
 */
public TopFieldDocs search(Query query, Filter filter, int n,
                           Sort sort, boolean doDocScores, boolean doMaxScore) throws IOException {
  return search(createNormalizedWeight(wrapFilter(query, filter)), n, sort, doDocScores, doMaxScore);
}
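The query parameter needs no explanation. filter applies a further filter to the matches, int n means only the top N hits are returned, and sort is the object that describes the ordering. doDocScores is the key one: it says whether a relevance score is computed for each hit. If you set it to false, the score of every returned hit is NaN. As for doMaxScore, the Javadoc above states it directly: if it is true, the maximum score over all collected hits is computed; if it is false, that maximum is not tracked. It does not change how the scores of the clauses in a BooleanQuery are combined. Say QueryA and QueryB are joined by a BooleanQuery and both match the same document: the document's score is still built from both clause scores whatever doMaxScore is (taking the maximum over sub-queries instead of adding them up is the job of DisjunctionMaxQuery). Both flags are false when you call the four-argument search(query, filter, n, sort). Note: in older Lucene releases (the 3.x line, as far as I can tell) these two flags could be set on the IndexSearcher itself, like this: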
searcher.setDefaultFieldSortScoring(true, false);
In Lucene 5.x you can only pass these two flags as arguments when you call the search method, like this:
searcher.search(query, filter, n, sort, doDocScores, doMaxScore);
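To make the two flags concrete, here is a minimal plain-Java sketch. It is a toy model, not Lucene code: the class ScoreFlagsSketch and its Hit and Result types are made up for illustration, and the only behavior it imitates is the one described in the Javadoc quoted above (per-hit scores are reported only when doDocScores is true, and the maximum over the collected hits is reported only when doMaxScore is true).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene code) of the two scoring flags on a sorted search. */
final class ScoreFlagsSketch {

    /** A collected hit: a document number and the score reported for it. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hits plus a maximum score, loosely mirroring the shape of a TopDocs result. */
    static final class Result {
        final List<Hit> hits = new ArrayList<>();
        float maxScore = Float.NaN; // stays NaN unless doMaxScore is set
    }

    /**
     * rawScores[i] is the relevance score of document i. The order of the hits is
     * decided elsewhere (by the sort); the flags only control what gets reported.
     */
    static Result collect(float[] rawScores, boolean doDocScores, boolean doMaxScore) {
        Result result = new Result();
        float max = Float.NEGATIVE_INFINITY;
        for (int doc = 0; doc < rawScores.length; doc++) {
            // Without doDocScores the hit carries NaN instead of its real score.
            result.hits.add(new Hit(doc, doDocScores ? rawScores[doc] : Float.NaN));
            max = Math.max(max, rawScores[doc]);
        }
        if (doMaxScore && rawScores.length > 0) {
            result.maxScore = max;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.151398f, 1.052735f, 0.447534f};
        Result r = collect(scores, false, true);
        System.out.println(r.hits.get(1).score); // NaN: per-hit scores were not requested
        System.out.println(r.maxScore);          // the largest of the raw scores
    }
}
```

Note that neither flag touches the order of the hits or the way a score is calculated; they only decide how much scoring information is carried back with a field-sorted result.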
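From the method declarations we can see that to change the default behavior of sorting by score in descending order, we have to pass in a Sort object. So let us take a look at the source of the Sort class: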
package org.apache.lucene.search;

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.Arrays;
/**
 * Encapsulates sort criteria for returned hits.
 *
 * <p>The fields used to determine sort order must be carefully chosen.
 * Documents must contain a single term in such a field,
 * and the value of the term should indicate the document's relative position in
 * a given sort order.  The field must be indexed, but should not be tokenized,
 * and does not need to be stored (unless you happen to want it back with the
 * rest of your document data).  In other words:
 *
 * <p><code>document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));</code></p>
 *
 * <p><h3>Valid Types of Values</h3>
 *
 * <p>There are four possible kinds of term values which may be put into
 * sorting fields: Integers, Longs, Floats, or Strings.  Unless
 * {@link SortField SortField} objects are specified, the type of value
 * in the field is determined by parsing the first term in the field.
 *
 * <p>Integer term values should contain only digits and an optional
 * preceding negative sign.  Values must be base 10 and in the range
 * <code>Integer.MIN_VALUE</code> and <code>Integer.MAX_VALUE</code> inclusive.
 * Documents which should appear first in the sort
 * should have low value integers, later documents high values
 * (i.e. the documents should be numbered <code>1..n</code> where
 * <code>1</code> is the first and <code>n</code> the last).
 *
 * <p>Long term values should contain only digits and an optional
 * preceding negative sign.  Values must be base 10 and in the range
 * <code>Long.MIN_VALUE</code> and <code>Long.MAX_VALUE</code> inclusive.
 * Documents which should appear first in the sort
 * should have low value integers, later documents high values.
 *
 * <p>Float term values should conform to values accepted by
 * {@link Float Float.valueOf(String)} (except that <code>NaN</code>
 * and <code>Infinity</code> are not supported).
 * Documents which should appear first in the sort
 * should have low values, later documents high values.
 *
 * <p>String term values can contain any valid String, but should
 * not be tokenized.  The values are sorted according to their
 * {@link Comparable natural order}.  Note that using this type
 * of term value has higher memory requirements than the other
 * two types.
 *
 * <p><h3>Object Reuse</h3>
 *
 * <p>One of these objects can be
 * used multiple times and the sort order changed between usages.
 *
 * <p>This class is thread safe.
 *
 * <p><h3>Memory Usage</h3>
 *
 * <p>Sorting uses of caches of term values maintained by the
 * internal HitQueue(s).  The cache is static and contains an integer
 * or float array of length <code>IndexReader.maxDoc()</code> for each field
 * name for which a sort is performed.  In other words, the size of the
 * cache in bytes is:
 *
 * <p><code>4 * IndexReader.maxDoc() * (# of different fields actually used to sort)</code>
 *
 * <p>For String fields, the cache is larger: in addition to the
 * above array, the value of every term in the field is kept in memory.
 * If there are many unique terms in the field, this could
 * be quite large.
 *
 * <p>Note that the size of the cache is not affected by how many
 * fields are in the index and <i>might</i> be used to sort - only by
 * the ones actually used to sort a result set.
 *
 * <p>Created: Feb 12, 2004 10:53:57 AM
 *
 * @since   lucene 1.4
 */
public class Sort {

  /**
   * Represents sorting by computed relevance. Using this sort criteria returns
   * the same results as calling
   * {@link IndexSearcher#search(Query,int) IndexSearcher#search()}without a sort criteria,
   * only with slightly more overhead.
   */
  public static final Sort RELEVANCE = new Sort();

  /** Represents sorting by index order. */
  public static final Sort INDEXORDER = new Sort(SortField.FIELD_DOC);

  // internal representation of the sort criteria
  SortField[] fields;

  /**
   * Sorts by computed relevance. This is the same sort criteria as calling
   * {@link IndexSearcher#search(Query,int) IndexSearcher#search()}without a sort criteria,
   * only with slightly more overhead.
   */
  public Sort() {
    this(SortField.FIELD_SCORE);
  }

  /** Sorts by the criteria in the given SortField. */
  public Sort(SortField field) {
    setSort(field);
  }

  /** Sets the sort to the given criteria in succession: the
   *  first SortField is checked first, but if it produces a
   *  tie, then the second SortField is used to break the tie,
   *  etc.  Finally, if there is still a tie after all SortFields
   *  are checked, the internal Lucene docid is used to break it. */
  public Sort(SortField... fields) {
    setSort(fields);
  }

  /** Sets the sort to the given criteria. */
  public void setSort(SortField field) {
    this.fields = new SortField[] { field };
  }

  /** Sets the sort to the given criteria in succession: the
   *  first SortField is checked first, but if it produces a
   *  tie, then the second SortField is used to break the tie,
   *  etc.  Finally, if there is still a tie after all SortFields
   *  are checked, the internal Lucene docid is used to break it. */
  public void setSort(SortField... fields) {
    this.fields = fields;
  }

  /**
   * Representation of the sort criteria.
   * @return Array of SortField objects used in this sort criteria
   */
  public SortField[] getSort() {
    return fields;
  }

  /**
   * Rewrites the SortFields in this Sort, returning a new Sort if any of the fields
   * changes during their rewriting.
   *
   * @param searcher IndexSearcher to use in the rewriting
   * @return {@code this} if the Sort/Fields have not changed, or a new Sort if there
   *        is a change
   * @throws IOException Can be thrown by the rewriting
   * @lucene.experimental
   */
  public Sort rewrite(IndexSearcher searcher) throws IOException {
    boolean changed = false;

    SortField[] rewrittenSortFields = new SortField[fields.length];
    for (int i = 0; i < fields.length; i++) {
      rewrittenSortFields[i] = fields[i].rewrite(searcher);
      if (fields[i] != rewrittenSortFields[i]) {
        changed = true;
      }
    }

    return (changed) ? new Sort(rewrittenSortFields) : this;
  }

  @Override
  public String toString() {
    StringBuilder buffer = new StringBuilder();

    for (int i = 0; i < fields.length; i++) {
      buffer.append(fields[i].toString());
      if ((i+1) < fields.length)
        buffer.append(',');
    }

    return buffer.toString();
  }

  /** Returns true if <code>o</code> is equal to this. */
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Sort)) return false;
    final Sort other = (Sort)o;
    return Arrays.equals(this.fields, other.fields);
  }

  /** Returns a hash code value for this object. */
  @Override
  public int hashCode() {
    return 0x45aaf665 + Arrays.hashCode(fields);
  }

  /** Returns true if the relevance score is needed to sort documents. */
  public boolean needsScores() {
    for (SortField sortField : fields) {
      if (sortField.needsScores()) {
        return true;
      }
    }
    return false;
  }
}
First, two static constants are defined:
public static final Sort RELEVANCE = new Sort();
public static final Sort INDEXORDER = new Sort(SortField.FIELD_DOC);
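RELEVANCE means sort by score.
INDEXORDER means sort by index order. What does that mean? It means sorting by the docID of each indexed document. The docID is the internal document number that Lucene assigns to a document when it is added to the index; it is not a Field that you add yourself. If you do not change the default sort behavior, hits are ordered by relevance score, descending (scores are always computed on that default path), and when two documents have the same score they are ordered by docID, ascending.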
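The default ordering just described (score descending, ties broken by ascending docID) can be sketched in plain Java. This is only a model of the rule, not Lucene's collector code; the class DefaultOrderSketch and its order method are hypothetical names, and the scores are the ones printed for docIDs 0 to 12 in the run output further down in this post.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Plain-Java model (not Lucene code) of the default hit ordering. */
final class DefaultOrderSketch {

    /** Returns document numbers ordered by score descending, then docID ascending. */
    static int[] order(float[] scoreByDoc) {
        Integer[] docs = new Integer[scoreByDoc.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Comparator<Integer> relevance = (a, b) -> {
            int byScore = Float.compare(scoreByDoc[b], scoreByDoc[a]); // higher score first
            return byScore != 0 ? byScore : Integer.compare(a, b);     // tie: smaller docID first
        };
        Arrays.sort(docs, relevance);
        int[] result = new int[docs.length];
        for (int i = 0; i < docs.length; i++) {
            result[i] = docs[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Scores indexed by docID 0..12, copied from the run output in this post.
        float[] scores = {
            0.151398f, 0.151398f, 0.151398f, 0.151398f, 0.151398f, 0.151398f,
            1.052735f, 0.151398f, 0.429442f, 1.052735f, 0.151398f, 0.447534f, 0.151398f
        };
        // prints [6, 9, 11, 8, 0, 1, 2, 3, 4, 5, 7, 10, 12]
        System.out.println(Arrays.toString(order(scores)));
    }
}
```

The printed order is the same as the id column of the first listing (sorted by score) in the run output.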
Next come the constructors of Sort, which take SortField objects. One of them deserves your attention:
public Sort(SortField... fields) { setSort(fields); }
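SortField... fields is the varargs syntax, which was introduced in Java 5 (JDK 1.5), not JDK 7. It is similar to the older SortField[] fields form, with one difference: the caller can pass field1, field2, field3, ..., fieldN directly as separate arguments, and passing an array still works too. What I really want to say is that Sort accepts several SortFields, which means Sort can order by several fields, just like order by age, id in SQL. Oops, off on a tangent again. Let us move on to the source of SortField and see what is inside: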
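For readers who have not used varargs much, here is a tiny standalone illustration. The class VarargsSketch and its describe method are made up for this example; they only show that one varargs method accepts separate arguments, an explicit array, or nothing at all.

```java
/** Small illustration of Java varargs; the names here are made up for the example. */
final class VarargsSketch {

    /** Joins field names with commas; inside the method, fields is a plain String[]. */
    static String describe(String... fields) {
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Separate arguments: the compiler builds the array for us.
        System.out.println(describe("category", "score", "pubmonth"));
        // An explicit array is accepted by the same method.
        System.out.println(describe(new String[] {"score", "category"}));
        // No arguments gives an empty array, so this prints an empty line.
        System.out.println(describe());
    }
}
```

This is why both new Sort(a, b, c) and new Sort(new SortField[] { a, b }) compile against the single Sort(SortField... fields) constructor.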
public class SortField {

  /**
   * Specifies the type of the terms to be sorted, or special types such as CUSTOM
   */
  public static enum Type {

    /** Sort by document score (relevance).  Sort values are Float and higher
     * values are at the front. */
    SCORE,

    /** Sort by document number (index order).  Sort values are Integer and lower
     * values are at the front. */
    DOC,

    /** Sort using term values as Strings.  Sort values are String and lower
     * values are at the front. */
    STRING,

    /** Sort using term values as encoded Integers.  Sort values are Integer and
     * lower values are at the front. */
    INT,

    /** Sort using term values as encoded Floats.  Sort values are Float and
     * lower values are at the front. */
    FLOAT,

    /** Sort using term values as encoded Longs.  Sort values are Long and
     * lower values are at the front. */
    LONG,

    /** Sort using term values as encoded Doubles.  Sort values are Double and
     * lower values are at the front. */
    DOUBLE,

    /** Sort using a custom Comparator.  Sort values are any Comparable and
     * sorting is done according to natural order. */
    CUSTOM,

    /** Sort using term values as Strings, but comparing by
     * value (using String.compareTo) for all comparisons.
     * This is typically slower than {@link #STRING}, which
     * uses ordinals to do the sorting. */
    STRING_VAL,

    /** Sort use byte[] index values. */
    BYTES,

    /** Force rewriting of SortField using {@link SortField#rewrite(IndexSearcher)}
     * before it can be used for sorting */
    REWRITEABLE
  }

  /** Represents sorting by document score (relevance). */
  public static final SortField FIELD_SCORE = new SortField(null, Type.SCORE);

  /** Represents sorting by document number (index order). */
  public static final SortField FIELD_DOC = new SortField(null, Type.DOC);

  private String field;
  private Type type;  // defaults to determining type dynamically
  boolean reverse = false;  // defaults to natural order

  // Used for CUSTOM sort
  private FieldComparatorSource comparatorSource;
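The first thing you see is the Type enum, which defines the kinds of sort:
SCORE: sort by score; descending by default.
DOC: sort by document ID. Score is the only type that defaults to descending; all the others default to ascending.
STRING: sort on the field's term values as strings.
STRING_VAL: also sorts on the field's term values as strings, but every comparison is made by value, using String.compareTo.
STRING_VAL is typically slower than STRING. How does STRING compare? The Javadoc says it uses ordinals to do the sorting.
Likewise there are INT, FLOAT, DOUBLE and LONG, which need no further comment.
CUSTOM: custom sorting. It is used together with the member variable
private FieldComparatorSource comparatorSource; that is, you supply a comparator of your own, and that comparator decides the sort order.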
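The difference between STRING (ordinals) and STRING_VAL (comparing values) is easier to see in a small model. The sketch below is plain Java and not Lucene's implementation: OrdinalSortSketch and its methods are hypothetical, and the "ordinal" here is simply the rank of a value among the sorted distinct values of the field. The point it illustrates is that once ordinals exist, each comparison is a cheap integer comparison, while the value-based variant calls String.compareTo every time, yet both give the same order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;
import java.util.stream.IntStream;

/** Toy model (not Lucene code) of ordinal-based versus value-based string sorting. */
final class OrdinalSortSketch {

    /** Maps each document's value to its rank among the sorted distinct values. */
    static int[] toOrdinals(String[] valueByDoc) {
        String[] dictionary = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        int[] ordByDoc = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            ordByDoc[doc] = Arrays.binarySearch(dictionary, valueByDoc[doc]);
        }
        return ordByDoc;
    }

    /** STRING-like: compare precomputed integer ordinals. */
    static int[] sortByOrdinal(String[] valueByDoc) {
        int[] ord = toOrdinals(valueByDoc);
        return sortDocs(valueByDoc.length, (a, b) -> Integer.compare(ord[a], ord[b]));
    }

    /** STRING_VAL-like: call String.compareTo on the values for every comparison. */
    static int[] sortByValue(String[] valueByDoc) {
        return sortDocs(valueByDoc.length, (a, b) -> valueByDoc[a].compareTo(valueByDoc[b]));
    }

    /** Sorts docIDs 0..docCount-1 with the given order; the sort is stable. */
    private static int[] sortDocs(int docCount, Comparator<Integer> order) {
        return IntStream.range(0, docCount).boxed().sorted(order)
                .mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        String[] category = {"/health", "/education/pedagogy", "/philosophy/eastern", "/health"};
        System.out.println(Arrays.toString(toOrdinals(category)));    // [1, 0, 2, 1]
        System.out.println(Arrays.toString(sortByOrdinal(category))); // [1, 0, 3, 2]
        System.out.println(Arrays.toString(sortByValue(category)));   // [1, 0, 3, 2]
    }
}
```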
Besides the custom comparator just mentioned, SortField has three other important member variables:
private String field;
private Type type;  // defaults to determining type dynamically
boolean reverse = false;  // defaults to natural order
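Without question, field is the field you want to sort on, that is, the name of the sort field.
type is the kind of sort explained above: by score, by docID, and so on.
reverse says whether to flip the default order, turning ascending into descending and descending into ascending. For example, score is descending by default, so with reverse set to true it is sorted ascending, while the other types, which are ascending by default, are sorted descending. reverse defaults to false.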
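The effect of reverse can be modeled with ordinary comparators. This is a sketch under the assumption stated above (score sorts descending by default, every other type ascending); ReverseSketch and its methods are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model (not Lucene code) of what the reverse flag does. */
final class ReverseSketch {

    /** A field sort is ascending by default; reverse=true makes it descending. */
    static Comparator<Integer> fieldOrder(boolean reverse) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        return reverse ? natural.reversed() : natural;
    }

    /** A score sort is descending by default; reverse=true makes it ascending. */
    static Comparator<Float> scoreOrder(boolean reverse) {
        Comparator<Float> natural = Comparator.reverseOrder();
        return reverse ? natural.reversed() : natural;
    }

    /** Returns a sorted copy so the input list is left untouched. */
    static <T> List<T> sorted(List<T> values, Comparator<T> order) {
        List<T> copy = new ArrayList<>(values);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> pubmonths = Arrays.asList(200403, 201005, 199910);
        System.out.println(sorted(pubmonths, fieldOrder(false))); // [199910, 200403, 201005]
        System.out.println(sorted(pubmonths, fieldOrder(true)));  // [201005, 200403, 199910]
    }
}
```

In the demo below, new SortField("pubmonth", Type.INT, true) corresponds to the second call: newest pubmonth first.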
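OK, with all of the above understood, I expect you already have a clear picture of how to apply your own custom ordering to indexed documents. Below is the test demo code I wrote, for your reference.
First, create the index documents used for the test: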
package com.yida.framework.lucene5.sort;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Properties;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

/**
 * Creates the test index.
 * @author Lanxiaowei
 *
 */
public class CreateTestIndex {
  public static void main(String[] args) throws IOException {
    String dataDir = "C:/data";
    String indexDir = "C:/lucenedir";

    Directory dir = FSDirectory.open(Paths.get(indexDir));
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
    indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
    IndexWriter writer = new IndexWriter(dir, indexWriterConfig);

    List<File> results = new ArrayList<File>();
    findFiles(results, new File(dataDir));
    System.out.println(results.size() + " books to index");

    for (File file : results) {
      Document doc = getDocument(dataDir, file);
      writer.addDocument(doc);
    }
    writer.close();
    dir.close();
  }

  /**
   * Finds all properties files under the given directory, recursively.
   *
   * @param result list that receives the files found
   * @param dir directory to search
   */
  private static void findFiles(List<File> result, File dir) {
    for (File file : dir.listFiles()) {
      if (file.getName().endsWith(".properties")) {
        result.add(file);
      } else if (file.isDirectory()) {
        findFiles(result, file);
      }
    }
  }

  /**
   * Reads a properties file and builds a Document from it.
   *
   * @param rootDir root data directory, used to derive the category
   * @param file properties file describing one book
   * @return the Document to index
   * @throws IOException if the file cannot be read
   */
  public static Document getDocument(String rootDir, File file)
      throws IOException {
    Properties props = new Properties();
    // Close the stream once the properties have been loaded.
    try (FileInputStream in = new FileInputStream(file)) {
      props.load(in);
    }

    Document doc = new Document();

    // The category is the file's directory relative to rootDir, e.g. /health
    String category = file.getParent().substring(rootDir.length());
    category = category.replace(File.separatorChar, '/');

    String isbn = props.getProperty("isbn");
    String title = props.getProperty("title");
    String author = props.getProperty("author");
    String url = props.getProperty("url");
    String subject = props.getProperty("subject");
    String pubmonth = props.getProperty("pubmonth");

    System.out.println("title:" + title + "\n" + "author:" + author + "\n"
        + "subject:" + subject + "\n" + "pubmonth:" + pubmonth + "\n"
        + "category:" + category + "\n---------");

    doc.add(new StringField("isbn", isbn, Field.Store.YES));
    doc.add(new StringField("category", category, Field.Store.YES));
    // Doc values field with the same name, used when sorting by category
    doc.add(new SortedDocValuesField("category", new BytesRef(category)));
    doc.add(new TextField("title", title, Field.Store.YES));
    doc.add(new Field("title2", title.toLowerCase(), Field.Store.YES,
        Field.Index.NOT_ANALYZED_NO_NORMS,
        Field.TermVector.WITH_POSITIONS_OFFSETS));

    String[] authors = author.split(",");
    for (String a : authors) {
      doc.add(new Field("author", a, Field.Store.YES,
          Field.Index.NOT_ANALYZED,
          Field.TermVector.WITH_POSITIONS_OFFSETS));
    }

    doc.add(new Field("url", url, Field.Store.YES,
        Field.Index.NOT_ANALYZED_NO_NORMS));
    doc.add(new Field("subject", subject, Field.Store.YES,
        Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

    doc.add(new IntField("pubmonth", Integer.parseInt(pubmonth),
        Field.Store.YES));
    // Doc values field with the same name, used when sorting by pubmonth
    doc.add(new NumericDocValuesField("pubmonth", Integer.parseInt(pubmonth)));

    Date d = null;
    try {
      d = DateTools.stringToDate(pubmonth);
    } catch (ParseException pe) {
      throw new RuntimeException(pe);
    }
    doc.add(new IntField("pubmonthAsDay",
        (int) (d.getTime() / (1000 * 3600 * 24)), Field.Store.YES));

    for (String text : new String[] { title, subject, author, category }) {
      doc.add(new Field("contents", text, Field.Store.NO,
          Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    }
    return doc;
  }
}
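Don't ask me why the index-building code above still uses the Field class constructors that are already flagged as about to be deprecated. My answer: because I felt like it! Never mind those details; I just wanted to try a few different ways of doing things. All the code does is read every properties file under the data folder and write the data in those files into the index. I will upload the properties data files used for the test in the attachment below.
Next, write the test class and run the tests: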
package com.yida.framework.lucene5.sort;

import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Paths;
import java.text.DecimalFormat;

import org.apache.commons.lang.StringUtils;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortField.Type;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SortingExample {
  private Directory directory;

  public SortingExample(Directory directory) {
    this.directory = directory;
  }

  public void displayResults(Query query, Sort sort) throws IOException {
    IndexReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);

    // searcher.setDefaultFieldSortScoring(true, false);

    // In Lucene 5.x the two scoring flags are passed as method arguments:
    // searcher.search(query, filter, n, sort, doDocScores, doMaxScore);
    TopDocs results = searcher.search(query, null, 20, sort, true, false);

    System.out.println("\nResults for: " + query.toString()
        + " sorted by " + sort);

    System.out.println(StringUtils.rightPad("Title", 30)
        + StringUtils.rightPad("pubmonth", 10)
        + StringUtils.center("id", 4)
        + StringUtils.center("score", 15));

    PrintStream out = new PrintStream(System.out, true, "UTF-8");

    DecimalFormat scoreFormatter = new DecimalFormat("0.######");
    for (ScoreDoc sd : results.scoreDocs) {
      int docID = sd.doc;
      float score = sd.score;
      Document doc = searcher.doc(docID);
      // One line per hit: title, pubmonth, docID and score, then the category below it
      out.println(StringUtils.rightPad(
          StringUtils.abbreviate(doc.get("title"), 29), 30)
          + StringUtils.rightPad(doc.get("pubmonth"), 10)
          + StringUtils.center("" + docID, 4)
          + StringUtils.leftPad(scoreFormatter.format(score), 12));
      out.println(" " + doc.get("category"));
      // out.println(searcher.explain(query, docID));
    }
    System.out.println("\n**************************************\n");
    reader.close();
  }

  public static void main(String[] args) throws Exception {
    String indexdir = "C:/lucenedir";
    Query allBooks = new MatchAllDocsQuery();

    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    BooleanQuery query = new BooleanQuery();
    query.add(allBooks, BooleanClause.Occur.SHOULD);
    query.add(parser.parse("java OR action"), BooleanClause.Occur.SHOULD);

    Directory directory = FSDirectory.open(Paths.get(indexdir));
    SortingExample example = new SortingExample(directory);

    example.displayResults(query, Sort.RELEVANCE);

    example.displayResults(query, Sort.INDEXORDER);

    example.displayResults(query, new Sort(new SortField("category",
        Type.STRING)));

    example.displayResults(query, new Sort(new SortField("pubmonth",
        Type.INT, true)));

    example.displayResults(query, new Sort(new SortField("category",
        Type.STRING), SortField.FIELD_SCORE, new SortField(
        "pubmonth", Type.INT, true)));

    example.displayResults(query, new Sort(new SortField[] {
        SortField.FIELD_SCORE,
        new SortField("category", Type.STRING) }));
    directory.close();
  }
}
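If you have understood the points I covered above, you should be able to follow this test code. Still, one reminder: when you create a Sort object, you can pass in several SortFields to sort on multiple fields, for example: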
new Sort(new SortField("category", Type.STRING), SortField.FIELD_SCORE, new SortField( "pubmonth", Type.INT, true))
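This sorts by the category field as a string, ascending, first; then by score, descending; then by the pubmonth field compared as a number, descending. In a word, the fields are applied in the same order in which you list the SortFields. Keep Sort's default sort behavior in mind.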
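The same three-level ordering can be written as a plain comparator chain, which may make the tie-breaking easier to follow. This is a sketch, not Lucene code: MultiFieldSortSketch and Book are made-up names, the final docID tie-break follows the Sort Javadoc quoted earlier, and the sample rows are the /health and /technology/computers/programming hits from the run output in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model (not Lucene code) of sorting by category, score, then pubmonth. */
final class MultiFieldSortSketch {

    /** The four values of a hit that take part in the ordering. */
    static final class Book {
        final int doc;
        final String category;
        final float score;
        final int pubmonth;

        Book(int doc, String category, float score, int pubmonth) {
            this.doc = doc;
            this.category = category;
            this.score = score;
            this.pubmonth = pubmonth;
        }
    }

    /** Each later criterion is consulted only when all earlier ones tie. */
    static final Comparator<Book> CATEGORY_SCORE_PUBMONTH = (a, b) -> {
        int c = a.category.compareTo(b.category);                 // category ascending
        if (c == 0) c = Float.compare(b.score, a.score);          // score descending
        if (c == 0) c = Integer.compare(b.pubmonth, a.pubmonth);  // pubmonth descending (reverse)
        if (c == 0) c = Integer.compare(a.doc, b.doc);            // last resort: docID ascending
        return c;
    };

    /** Returns the docIDs of the given books in sorted order. */
    static List<Integer> orderedDocs(List<Book> books) {
        List<Book> copy = new ArrayList<>(books);
        copy.sort(CATEGORY_SCORE_PUBMONTH);
        List<Integer> docs = new ArrayList<>();
        for (Book book : copy) {
            docs.add(book.doc);
        }
        return docs;
    }

    /** Rows copied from the run output in this post. */
    static List<Book> sample() {
        String programming = "/technology/computers/programming";
        return Arrays.asList(
            new Book(2, "/health", 0.151398f, 200611),
            new Book(3, "/health", 0.151398f, 200804),
            new Book(6, programming, 1.052735f, 200707),
            new Book(8, programming, 0.429442f, 201005),
            new Book(9, programming, 1.052735f, 201005),
            new Book(11, programming, 0.447534f, 200403),
            new Book(12, programming, 0.151398f, 199910));
    }

    public static void main(String[] args) {
        System.out.println(orderedDocs(sample())); // [3, 2, 9, 6, 11, 8, 12]
    }
}
```

Docs 9 and 6 tie on category and score, so pubmonth decides (201005 before 200707); docs 3 and 2 are ordered the same way. That matches the relative order of these rows in the fifth listing of the run output.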
Below is what a run prints. Compare the output with the code and take some time to digest it:
Results for: *:* (contents:java contents:action) sorted by <score>
Title                         pubmonth   id      score
Ant in Action                 200707     6      1.052735
 /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
 /technology/computers/programming
Tapestry in Action            200403     11     0.447534
 /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
 /technology/computers/programming
A Modern Art of Education     200403     0      0.151398
 /education/pedagogy
Imperial Secrets of Health... 199903     1      0.151398
 /health/alternative/chinese
Lipitor, Thief of Memory      200611     2      0.151398
 /health
Nudge: Improving Decisions... 200804     3      0.151398
 /health
Tao Te Ching 道德經              200609     4      0.151398
 /philosophy/eastern
G?del, Escher, Bach: an Et... 199905     5      0.151398
 /technology/computers/ai
Mindstorms: Children, Comp... 199307     7      0.151398
 /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
 /technology/computers/programming/methodology
The Pragmatic Programmer      199910     12     0.151398
 /technology/computers/programming

**************************************

Results for: *:* (contents:java contents:action) sorted by <doc>
Title                         pubmonth   id      score
A Modern Art of Education     200403     0      0.151398
 /education/pedagogy
Imperial Secrets of Health... 199903     1      0.151398
 /health/alternative/chinese
Lipitor, Thief of Memory      200611     2      0.151398
 /health
Nudge: Improving Decisions... 200804     3      0.151398
 /health
Tao Te Ching 道德經              200609     4      0.151398
 /philosophy/eastern
G?del, Escher, Bach: an Et... 199905     5      0.151398
 /technology/computers/ai
Ant in Action                 200707     6      1.052735
 /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
 /technology/computers/programming/education
JUnit in Action, Second Ed... 201005     8      0.429442
 /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
 /technology/computers/programming
Extreme Programming Explained 200411     10     0.151398
 /technology/computers/programming/methodology
Tapestry in Action            200403     11     0.447534
 /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
 /technology/computers/programming

**************************************

Results for: *:* (contents:java contents:action) sorted by <string: "category">
Title                         pubmonth   id      score
A Modern Art of Education     200403     0      0.151398
 /education/pedagogy
Lipitor, Thief of Memory      200611     2      0.151398
 /health
Nudge: Improving Decisions... 200804     3      0.151398
 /health
Imperial Secrets of Health... 199903     1      0.151398
 /health/alternative/chinese
Tao Te Ching 道德經              200609     4      0.151398
 /philosophy/eastern
G?del, Escher, Bach: an Et... 199905     5      0.151398
 /technology/computers/ai
Ant in Action                 200707     6      1.052735
 /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
 /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
 /technology/computers/programming
Tapestry in Action            200403     11     0.447534
 /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
 /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
 /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
 /technology/computers/programming/methodology

**************************************

Results for: *:* (contents:java contents:action) sorted by <int: "pubmonth">!
Title                         pubmonth   id      score
JUnit in Action, Second Ed... 201005     8      0.429442
 /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
 /technology/computers/programming
Nudge: Improving Decisions... 200804     3      0.151398
 /health
Ant in Action                 200707     6      1.052735
 /technology/computers/programming
Lipitor, Thief of Memory      200611     2      0.151398
 /health
Tao Te Ching 道德經              200609     4      0.151398
 /philosophy/eastern
Extreme Programming Explained 200411     10     0.151398
 /technology/computers/programming/methodology
A Modern Art of Education     200403     0      0.151398
 /education/pedagogy
Tapestry in Action            200403     11     0.447534
 /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
 /technology/computers/programming
G?del, Escher, Bach: an Et... 199905     5      0.151398
 /technology/computers/ai
Imperial Secrets of Health... 199903     1      0.151398
 /health/alternative/chinese
Mindstorms: Children, Comp... 199307     7      0.151398
 /technology/computers/programming/education

**************************************

Results for: *:* (contents:java contents:action) sorted by <string: "category">,<score>,<int: "pubmonth">!
Title                         pubmonth   id      score
A Modern Art of Education     200403     0      0.151398
 /education/pedagogy
Nudge: Improving Decisions... 200804     3      0.151398
 /health
Lipitor, Thief of Memory      200611     2      0.151398
 /health
Imperial Secrets of Health... 199903     1      0.151398
 /health/alternative/chinese
Tao Te Ching 道德經              200609     4      0.151398
 /philosophy/eastern
G?del, Escher, Bach: an Et... 199905     5      0.151398
 /technology/computers/ai
Lucene in Action, Second E... 201005     9      1.052735
 /technology/computers/programming
Ant in Action                 200707     6      1.052735
 /technology/computers/programming
Tapestry in Action            200403     11     0.447534
 /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
 /technology/computers/programming
The Pragmatic Programmer      199910     12     0.151398
 /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
 /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
 /technology/computers/programming/methodology

**************************************

Results for: *:* (contents:java contents:action) sorted by <score>,<string: "category">
Title                         pubmonth   id      score
Ant in Action                 200707     6      1.052735
 /technology/computers/programming
Lucene in Action, Second E... 201005     9      1.052735
 /technology/computers/programming
Tapestry in Action            200403     11     0.447534
 /technology/computers/programming
JUnit in Action, Second Ed... 201005     8      0.429442
 /technology/computers/programming
A Modern Art of Education     200403     0      0.151398
 /education/pedagogy
Lipitor, Thief of Memory      200611     2      0.151398
 /health
Nudge: Improving Decisions... 200804     3      0.151398
 /health
Imperial Secrets of Health... 199903     1      0.151398
 /health/alternative/chinese
Tao Te Ching 道德經              200609     4      0.151398
 /philosophy/eastern
G?del, Escher, Bach: an Et... 199905     5      0.151398
 /technology/computers/ai
The Pragmatic Programmer      199910     12     0.151398
 /technology/computers/programming
Mindstorms: Children, Comp... 199307     7      0.151398
 /technology/computers/programming/education
Extreme Programming Explained 200411     10     0.151398
 /technology/computers/programming/methodology

**************************************
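I wrote this in a bit of a hurry. If anything is unclear or wrong, feel free to flame me. Thanks!
I will also upload the demo source code in the attachment below. When you run the test class, remember to copy the test data files to the C: drive (the demo reads them from C:/data), as shown in the figure: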