实践(1) 关于solr-core pom和schema部分特性解读
本部分是入口,对pom依赖是工程的启动点,对schema的认识是应用的起点。
注意: pom 中配置信息要完全正确,另外,依赖的工程要求一一正确,后面才可以执行
build.bat > mvnlog.txt //mvn输出的控制台结果直接输出到 mvnlog.txt文件中
子工程pom中依赖的jar 版本会从父pom中寻找,同时除了版本不一样外,其他的信息要保持一致
(1)solr-core
pom 文件参考:http://repo1.maven.org/maven2/org/apache/solr/solr-core/4.0.0/solr-core-4.0.0.pom
版本信息实例参考如下,
注意:
--》版本号有比较强的依赖,例如jetty server的8的才可以,6、7、9的都不行。
--》schema的结构与性能的配合
<dependencies>
<dependency>
<groupId>com.taobao.terminator</groupId>
<artifactId>terminator-solrj</artifactId>
<version>4.0.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.10</version>
<type>jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>easymock</groupId>
<artifactId>easymock</artifactId>
<version>2.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-test-framework</artifactId>
<version>4.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-test-framework</artifactId>
<version>4.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analyzers-common</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-highlighter</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-queryparser</artifactId>
<version>4.0.0</version>
</dependency>
<!--new addby -->
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-codecs</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-memory</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-misc</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-queries</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-spatial</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-suggest</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-grouping</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.6</version>
</dependency>
<!--solr clould zkcli need this jar-->
<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>1.2</version>
</dependency>
<dependency>
<groupId>commons-fileupload</groupId>
<artifactId>commons-fileupload</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jcl-over-slf4j</artifactId>
<version>1.6.4</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-jdk14</artifactId>
<version>1.6.4</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.3</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.4</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>r05</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
<optional>true</optional><!-- Only used for tests and one command-line utility: JettySolrRunner -->
<!-- <version>9.0.0.M4</version> -->
<!-- <version>7.0.0.RC4</version> -->
<version>8.1.2.v20120308</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util</artifactId>
<optional>true</optional><!-- Only used for tests and one command-line utility: JettySolrRunner -->
<!-- <version>9.0.0.M4</version> -->
<!-- <version>7.0.0.RC4</version> -->
<version>8.1.2.v20120308</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-webapp</artifactId>
<optional>true</optional><!-- Only used for tests and one command-line utility: JettySolrRunner -->
<!-- <version>9.0.0.M4</version> -->
<!-- <version>7.0.0.RC4</version> -->
<version>8.1.2.v20120308</version>
</dependency>
<dependency>
<groupId>org.codehaus.woodstox</groupId>
<artifactId>wstx-asl</artifactId>
<scope>runtime</scope>
<version>4.0.6</version>
<exclusions>
<exclusion><groupId>stax</groupId>
<artifactId>stax-api</artifactId>
</exclusion></exclusions>
</dependency>
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
<version>2.5</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.2.1</version>
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpmime</artifactId>
<version>4.2.1</version>
</dependency>
</dependencies>
(2)solr-schema
注意:
schema version信息与老版本的升级、bytes类型的发挥、boolean类型的发挥、int pint tint和区间和排序的问题、地理数据结构、全语言分词的支持、随机域的使用
this schema includes many optional features and should not be used for benchmarking.
To improve performance one could
- set stored="false" for all fields possible (esp large fields) when you only need to search on the field
but don't need to return the original value.
- set indexed="false" if you don't need to search on the field, but only
return the field as a result of searching on other indexed fields.
- remove all unneeded copyField statements
- for best index size and searching performance, set "index" to false for all general text fields,
use copyField to copy them to the catchall "text" field, and use that for searching.
- For maximum indexing performance, use the StreamingUpdateSolrServer java client.
- Remember to run the JVM in server mode, and use a higher logging level
that avoids logging every request
attribute "name" is the name of this schema and is only used for display purposes.
version="x.y" is Solr's version number for the schema syntax and
semantics. It should not normally be changed by applications.
1.0: multiValued attribute did not exist, all fields are multiValued
by nature
1.1: multiValued attribute introduced, false by default
1.2: omitTermFreqAndPositions attribute introduced, true by default
except for text fields.
1.3: removed optional field compress feature
1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
behavior when a single string produces multiple tokens. Defaults
to off for version >= 1.4
1.5: omitNorms defaults to true for primitive field types
(int, float, boolean, string...)
2. sortMissingLast 和 sortMissingFirst对的属性之一。只在对该进行排序时,才起作用。
sortMissingLast = true时,那些在该上没有值的documents将被排在那些在该上有值的documents之后。
sortMissingFirst = true时的情况正好相反。
如果两者都设为false,则使用Lucene的排序。
3. positionIncrementGap=100 只对multiValue = true 的fieldType有意义。
<!--
Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
-->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<!--
Numeric field types that index each value at various levels of precision
to accelerate range queries when the number of values between the range
endpoints is large. See the javadoc for NumericRangeQuery for internal
implementation details.
Smaller precisionStep values (specified in bits) will lead to more tokens
indexed per value, slightly larger index size, and faster range queries.
A precisionStep of 0 disables indexing at different precision levels.
-->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
<fieldtype name="binary" class="solr.BinaryField"/>
<!--
Note:
These should only be used for compatibility with existing indexes (created with lucene or older Solr versions).
Use Trie based fields instead. As of Solr 3.5 and 4.x, Trie based fields support sortMissingFirst/Last
Plain numeric field types that store and index the text
value verbatim (and hence don't correctly support range queries, since the
lexicographic ordering isn't equal to the numeric ordering)
-->
<fieldType name="pint" class="solr.IntField"/>
<fieldType name="plong" class="solr.LongField"/>
<fieldType name="pfloat" class="solr.FloatField"/>
<fieldType name="pdouble" class="solr.DoubleField"/>
<fieldType name="pdate" class="solr.DateField" sortMissingLast="true"/>
<!-- The "RandomSortField" is not used to store or search any
data. You can declare fields of this type it in your schema
to generate pseudo-random orderings of your docs for sorting
or function purposes. The ordering is generated based on the field
name and the version of the index. As long as the index version
remains unchanged, and the same field name is reused,
the ordering of the docs will be consistent.
If you want different psuedo-random orderings of documents,
for the same version of the index, use a dynamicField and
change the field name in the request.
-->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<!-- Arabic -->
<!-- Bulgarian -->
<!-- Catalan -->
<CJK bigram >
<!-- Czech -->
<!-- Danish -->
<!-- German -->
<!-- Greek -->
<!-- Spanish -->
<!-- Basque -->
<!-- Persian -->
<!-- Finnish -->
<!-- French -->
<!-- Irish -->
<!-- Galician -->
<!-- Hindi -->
<!-- Hungarian -->
<!-- Armenian -->
<!-- Indonesian -->
<!-- Italian -->
<!-- Latvian -->
<!-- Dutch -->
<!-- Norwegian -->
<!-- Portuguese -->
<!-- Romanian -->
<!-- Russian -->
<!-- Swedish -->
<!-- Thai -->
<!-- Turkish -->