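(7) The scoring factors can be tuned, but adding new factors or extending the scoring formula cannot be done directly through Solr configuration. However, you can extend the Lucene code or parameters (e.g. SpanQuery), write a new Query, and plug it into Solr; this takes somewhat more work. In addition, the community provides ranking patches such as BM25 and PageRank, which can be used directly once you have some understanding of Lucene.
(16) For ranking, there is no ready-made support yet for de-duplication or for time-based dynamics. De-duplication here refers to the case where the top few results have exactly the same value in one field, or in several fields, so that the top results appear "clustered" around related fields; for some applications this is not the best outcome. Time-based dynamics is not directly supported either and can only be achieved indirectly by sorting by time.
This is arguably not something Lucene or Solr should be concerned with; it stems from the particular needs of the application.
Configuration
Global configuration: schema.xml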
Similarity
A (global) <similarity> declaration can be used to specify a custom Similarity implementation that you want Solr to use when dealing with your index. A Similarity can be specified either by referring directly to the name of a class with a no-arg constructor...
<similarity class="org.apache.lucene.search.similarities.DefaultSimilarity"/>
...or by referencing a SimilarityFactory implementation, which may take optional init params....
<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>
Beginning with Solr 4.0, Similarity factories such as SchemaSimilarityFactory can also support specifying specific Similarity implementations on individual field types...
<types>
  <fieldType name="text_dfr" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.DFRSimilarityFactory">
      <str name="basicModel">I(F)</str>
      <str name="afterEffect">B</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  <fieldType name="text_ib" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.IBSimilarityFactory">
      <str name="distribution">SPL</str>
      <str name="lambda">DF</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  ...
</types>
<similarity class="solr.SchemaSimilarityFactory"/>
If no (global) <similarity> is configured in the schema.xml file, an implicit instance of DefaultSimilarityFactory is used.
Problems and requirements
By DefaultComputerValue
By CustomScore, By DefaultComputerValue
CustomScore*fa + DefaultComputerValue*fb
Doc     CustomScore\DefaultComputerValue    CustomScore*0.8 + DefaultComputerValue*0.2
Doc1    10\100                              10*0.8 + 100*0.2 = 28
Doc2    1\99                                1*0.8 + 99*0.2 = 20.6
Doc3    3\98                                3*0.8 + 98*0.2 = 22
Doc4    20\50                               20*0.8 + 50*0.2 = 26
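The weighted formula above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not Solr code: the class name BlendedScore and its methods are hypothetical, and fa = 0.8, fb = 0.2 are taken from the worked rows. With these weights Doc4 evaluates to 20*0.8 + 50*0.2 = 26.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: blends a per-document custom score with the default computed value.
final class BlendedScore {
    // fa and fb are the weights from CustomScore*fa + DefaultComputerValue*fb.
    static double blend(double customScore, double defaultValue, double fa, double fb) {
        return customScore * fa + defaultValue * fb;
    }

    // Returns the document names ordered by blended score, highest first.
    static List<String> rank(String[] names, double[] custom, double[] defaults,
                             double fa, double fb) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            order.add(i);
        }
        // Negate the score so that the natural ascending sort yields descending scores.
        order.sort(Comparator.comparingDouble(
                (Integer i) -> -blend(custom[i], defaults[i], fa, fb)));
        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(names[i]);
        }
        return ranked;
    }

    public static void main(String[] args) {
        String[] names = {"Doc1", "Doc2", "Doc3", "Doc4"};
        double[] custom = {10, 1, 3, 20};
        double[] defaults = {100, 99, 98, 50};
        System.out.println(rank(names, custom, defaults, 0.8, 0.2));
    }
}
```

Changing fa and fb shifts the ordering between the custom score and the default value, which is the tuning knob the requirement asks for.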
Solr 3.4.0 scoring code analysis
abstract class SimilarityFactory
Member method: public abstract Similarity getSimilarity();
Payload issues
http://wiki.apache.org/lucene-java/Payloads
Scoring payloads involves overriding the Similarity.scorePayload() method. For example, if one has implemented storing a Float payload, it could be used for scoring in the following way:
public float scorePayload(byte[] payload, int offset, int length) {
    // The payload is expected to hold a single 4-byte float, least significant byte first.
    assert length == 4;
    int accum = ((payload[0 + offset] & 0xff)) |
                ((payload[1 + offset] & 0xff) << 8) |
                ((payload[2 + offset] & 0xff) << 16) |
                ((payload[3 + offset] & 0xff) << 24);
    return Float.intBitsToFloat(accum);
}
Don't forget to activate your Similarity implementation using IndexSearcher.setSimilarity(). Also, note that even then not all queries will actually make use of your method. For example, you will need to use BoostingTermQuery instead of TermQuery. QueryParser currently (Lucene 2.3.2) always uses TermQuery, so you will need to extend QueryParser and override getFieldQuery().
Note that this is just one possible way of scoring a payload. Payloads are application specific. For example payload token filters, see the payload package in the contrib/Analysis module.
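For the decoding shown above to work, the float has to be written at index time with the same byte layout. A minimal sketch of a matching encoder and decoder follows; FloatPayloadCodec is a hypothetical helper, not a Lucene class, and the byte order simply mirrors the example above. If a library helper writes the payload instead, its byte order has to be checked against the decoder.

```java
// Hypothetical helper mirroring the byte layout that the scorePayload() example decodes.
final class FloatPayloadCodec {
    // Encodes a float as 4 bytes, least significant byte first.
    static byte[] encode(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) bits,
            (byte) (bits >>> 8),
            (byte) (bits >>> 16),
            (byte) (bits >>> 24)
        };
    }

    // Decodes 4 bytes starting at offset, using the same layout as encode().
    static float decode(byte[] payload, int offset) {
        int accum = (payload[offset] & 0xff)
                | ((payload[offset + 1] & 0xff) << 8)
                | ((payload[offset + 2] & 0xff) << 16)
                | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        byte[] payload = encode(2.5f);
        System.out.println(decode(payload, 0));
    }
}
```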
Custom sort (score + custom value)
http://grokbase.com/t/lucene/solr-user/08b25j6ked/custom-sort-score-custom-value
Hi,
I want to implement a custom sort in Solr based on a combination of relevance (which Solr already gives me as the score) and a custom value I've calculated previously for each document. I see two options:
1. Use a function query (I'm using a DisMaxRequestHandler).
2. Create a component that sets SortSpec with a sort that has a custom ComparatorSource (similar to QueryElevationComponent).
The first option has a problem: while the relevance value changes for every query, my custom value is constant for each doc. This means that queries whose documents have high relevance are less affected by my custom value, while queries with low relevance are affected a lot by it. Can the effect be made proportional with a function query (i.e. docs with low relevance are less affected by my custom value)?
The second option has a problem: the Solr score isn't normalized. I need it normalized in order to apply my custom value in the sortValue function in ScoreDocComparator. What do you think? What's the best option in that case? Another option?
Thank you in advance,
George
BoostQParserPlugin
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/search/BoostQParserPlugin.html
org.apache.solr.search
Class BoostQParserPlugin
http://stackoverflow.com/questions/3035831/solr-lucene-scorer
Scorers are part of Lucene queries, obtained via the query's 'weight' method.
In short, the framework calls Query.weight(..).scorer(..). Have a look at
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Query.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Weight.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Scorer.html
To use your own Query class in Solr, you'll need to implement your own Solr QParserPlugin that uses your own QParser, which generates your previously implemented Lucene Query. You can then use it in Solr as specified here:
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
The implementation should stay simple, as this is just some glue code.
Enjoy hacking Solr!
answered Jun 14 '10 at 10:33
You can override the logic solr scorer uses. Solr uses DefaultSimilarity class for scoring. 1) make a class extending DefaultSimilarity. 2) override the functions tf(), idf() etc according to your need.
public class CustomSimilarity extends DefaultSimilarity {
    public CustomSimilarity() {
        super();
    }

    public float tf(int freq) {
        //your code
        return (float) 1.0;
    }

    public float idf(int docFreq, int numDocs) {
        //your code
        return (float) 1.0;
    }
}
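3) After creating the class, compile it and make a jar. 4) Put the jar in the lib folder of the corresponding index or core. 5) Change the schema.xml of the corresponding index so that its <similarity> element references the class by its fully qualified name: <similarity class="….CustomSimilarity"/>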
You can check out various factors affecting score here
For your requirement you can create buckets if your score is in specific range. Also read about field boosting, document boosting etc. That might be helpful in your case.
http://stackoverflow.com/questions/11748487/how-can-i-filter-solr-results-by-custom-score
How can I filter SOLR results by custom score
I'm using solr function queries to generate my own custom score. I achieve this using something along these lines:
q=_val_:"my_custom_function()"
This populates the score field as expected, but it also includes documents that score 0. I need a way to filter the results so that scores below zero are not included.
I realize that I'm using score in a non-standard way and that normally the score that lucene/solr produce is not absolute. However, producing my own score works really well for my needs.
I've tried using {!frange l=0} but this causes the score for all documents to be "1.0".
I suspect pseudo-fields could be used, but since solr 4 is still alpha, I'm looking for a way to do it using Solr 3.1.
how can I limit by score before sorting in a solr query
I am searching "product documents". In other words, my solr documents are product records. I want to get say the top 50 matching products for a query. Then I want to be able to sort the top 50 scoring documents by name or price. I'm not seeing much on how to do this, since sorting by score, then by name or price won't really help, since scores are floats.
I wouldn't mind if I could do something like map the scores to ranges (like a score of 8.0-8.99 would go in the 8 bucket score), then sort by range, then by names, but since there is basically no normalization to scoring, this would still make things a bit harder.
Tl;dr How do I exclude low scoring documents from the solr result set before sorting?
asked Dec 7 '10 at 22:21
You can use frange to achieve this, as long as you don't want to sort on score (in which case I guess you could just do the filtering on the client side). Your query would be something along the lines of:
q={!frange l=5}query($qq)&qq=[awesome product]&sort=price asc
Set the l argument in the q-frange-parameter to the lower bound you want to filter score on, and replace the qq parameter with your user query.
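As a rough illustration of the answer above, the request parameters can be assembled and URL-encoded in Java. FrangeQuery is a hypothetical helper name; only the parameter shapes (q with {!frange l=...}query($qq), qq, sort) come from the answer, and sending the request to a Solr endpoint is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the query string for a score-filtered, field-sorted search.
final class FrangeQuery {
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // lowerBound is the minimum score to keep; userQuery goes into the qq parameter.
    static String build(String userQuery, double lowerBound, String sort) {
        String q = "{!frange l=" + lowerBound + "}query($qq)";
        return "q=" + enc(q) + "&qq=" + enc(userQuery) + "&sort=" + enc(sort);
    }

    public static void main(String[] args) {
        System.out.println(build("awesome product", 5, "price asc"));
    }
}
```

As the later replies in these notes point out, a fixed lower bound only makes sense for a specific query against a specific index, so the bound would normally be derived from a first, score-sorted request.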
answered Dec 8 '10 at 10:23
Karl Johansson
thanks, since I can get a reasonable frange from the first time the results are displayed sorted by score alone, this works great! – Zak Dec 9 '10 at 18:40
I don't think you can simply exclude low scoring documents from the solr result set before sorting, because the relevance score is only meaningful for a given combination of search query and resulting document list. I.e. scores are only meaningful within a given search and you cannot set some threshold for all searches.
If you were using Java (or PHP) you could get the top 50 documents and then re-sort this list in your programming language but I don't think you can do it with just SOLR.
Anyway, I would recommend you don't go down this route of re-sorting the results from SOLR, as it will simply confuse the user. People expect search results to be like Google (and most other search engines), where results come back in some form of TFIDF ranking.
Having said that, you could use some other criteria to separate documents with the same relevance scores by adding an index-time boost factor based on a price range scale.
I'd suggest you use SOLR to its strengths and use facets. Provide a price range facet on the left (like Ebay, Amazon, et al.) and/or a product category facet, etc. Also provide a "sort" widget to allow the results to be sorted by product name, if the user wants it.
[EDIT] this question might also be useful:
Digg-like search result ranking with Lucene / Solr ?
As observed by Karl Johansson, you could do the filtering on the client side: load the first 50 rows of the response (sorted by score desc) and then manipulate them in JS for example.
The jQuery DataTables plugin works fantastically for that kind of thing: sorting, sorting on multiple columns, dynamic filtering, etc. -- and with only 50 rows it would be very fast too, so that users can "play" with the sorting and filtering until they find what they want.
Score filter
http://lucene.472066.n3.nabble.com/score-filter-td493438.html
Hello, is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did not work.
What's the motivation for wanting to do this? The reason I ask is that score is a relative thing determined by Lucene based on your index statistics. It is only meaningful for comparing the results of a specific query with a specific instance of the index. In other words, it isn't useful to filter on because there is no way of knowing what a good cutoff value would be. So, you won't be able to do score:[1.2 TO *] because score is not an actual Field.
That being said, you probably could implement a HitCollector at the Lucene level and somehow hook it into Solr to do what you want. Or, of course, just stop processing the results in your app after you see a score below a certain value. Naturally, this still means you have to retrieve the results.
Re: score filter
In my case, for example searching a book. Some of the returned documents are with high relevance (score > 3), but some of document with low score (<0.01) are useless.
Without a "score filter", I have to go through each document to find out the number of documents I'm interested (score > nnn). This causes some problem for pagination. For example if I only need to display the first 10 records I need to retrieve all 1000 documents to figure out the number of meaningful documents which have score > nnn.
Thx,
Kevin
Re: score filter
At what point do you draw the line? 0.01 is too low, but what about 0.5 or 0.3? In fact, there may be queries where 0.01 is relevant.
Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a good thing. An alternative might be to instead look at the difference between scores and see if the gap is larger than some delta, but even that is subject to the vagaries of scoring.
What kind of relevance testing have you done so far to come up with those values? See also http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/
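The "gap between scores" alternative mentioned above can be sketched as follows. This is an illustration only: ScoreGapCutoff is a hypothetical name, the scores are assumed to be already sorted from highest to lowest, and the choice of delta is exactly as subject to the vagaries of scoring as the reply warns.

```java
// Hypothetical helper: finds where to cut a result list based on the drop between scores.
final class ScoreGapCutoff {
    // scores must be sorted in descending order.
    // Returns how many leading results to keep: the index of the first result whose
    // drop from the previous score exceeds delta, or scores.length if there is none.
    static int keepCount(float[] scores, float delta) {
        for (int i = 1; i < scores.length; i++) {
            if (scores[i - 1] - scores[i] > delta) {
                return i;
            }
        }
        return scores.length;
    }

    public static void main(String[] args) {
        float[] scores = {3.2f, 3.0f, 2.9f, 0.01f};
        System.out.println(keepCount(scores, 1.0f));
    }
}
```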
Re: score filter
Just did some research. It seems that it's doable with additional code added to Solr but not out of box. Thank you, Grant.
Re: score filter
Don't bother doing this. It doesn't work.
This seems like a good idea, something that would be useful for almost every Lucene installation, but it isn't in Lucene because it does not work in the real world.
A few problems:
* Some users want every match and don't care how many pages of results they look at.
* Some users are very bad at creating queries that match their information needs. Others are merely bad, not very bad. The good matches for their query are on top, but the good matches for their information need are on the third page.
* Misspellings can put the right match (partial match) at the bottom. I did this yesterday at my library site, typing "Katherine Kerr" instead of the correct "Katharine Kerr". Their search engine showed no matches (grrr), so I had to search again with "Kerr".
* Most users do not know how to repair their queries, like I did with "Katherine Kerr", changing it to "Kerr". Even if they do, you shouldn't make them. Just show the weakly relevant results.
* Documents have errors, just like queries. I find bad data on our site about once a month, and we have professional editors. We still haven't fixed our entry for "Betty Page" to read "Bettie Page".
* People may use non-title words in the query, like searching for "batman" when they want "The Dark Knight".
So, don't do this. If you are forced to do it, make sure that you measure your search quality before and after it is implemented, because it will get worse. Then you can stop doing it.
wunder
Re: score filter
+1. Of course it is doable, but that doesn't mean you should, which is what I was trying to say before, (but was typing on my iPod so it wasn't fast) and which Walter has done so. It is entirely conceivable to me that someone could search for a very common word such that the score of all relevant (and thus, "good") documents are below your predefined threshold.
At any rate, proceed at your own peril. To implement it, look into the SearchComponent functionality.
Re: score filter
Hello Grant,
I need to frame a query that is a combination of two query parts and I use a 'function' query to prepare the same. Something like:
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
where $uq and $cq are two queries.
Now, I want a search result returned only if I get a hit on $uq. So, I specify default value of $uq query as 0.0 in order for the final score to be zero in cases where $uq doesn't record a hit. Even though, the scoring works as expected (i.e, document that don't match $uq have a score of zero), all the documents are returned as search results. Is there a way to filter search results that have a score of zero?
Thanks for your help,
Debdoot
Re: score filter
a) you could wrap your query in {!frange} .. but that will make everything that does have a value > 0.0 get the same final score
b) you could use an fq={!frange} that refers back to your original $q
c) you could just use an fq that refers directly to your $uq, since that's what you say you actually want to filter on in the first place..
uq=...
cq=...
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
fq={!v=uq}
Boost score for early matches
Solr - How to boost score for early matches?
How can I boost the score for documents in which my query matches a particular field earlier. For example, searching for "super man" should give "super man returns" a higher score than "there is my super man". Is this possible?
Uh, store the first few words explicitly in another field, and boost matches on this field. – aitchnyu Aug 22 at 9:45
The problem there is that the size of the query can vary from say 3 characters to say 100 characters, and so determining how many words/chars to index separately can be difficult. – techfoobar Aug 22 at 9:49
Secondly, suppose i index the first 25 characters, and one record has "my super man blah.." and another record has "super man returns blah.." - both will match the query "super man" and both will be boosted when i boost this secondary field. – techfoobar Aug 22 at 9:50
2 Answers
Thank you for the answer. But i solved it today by using the approach i've outlined in my answer. – techfoobar Aug 22 at 18:33
But this is not going to work if the words do not occur at the very start. May want to check out payloads as well where u can add index time suggestions as laid down in the second option. – Jayendra Aug 22 at 18:35
Will check that out as well. However, the current solution can be made to work to a large extent by fine tuning the ps parameter to make it more lenient. I currently use 2 (dist between 2 terms in the pf) and it seems to be working quite well for my medium sized data set (1000s of records, greatly varying in content). Will check out your point and let you know if it helped. – techfoobar Aug 22 at 18:38
Solved it myself after reading a LOT about this online. What specifically helped me was a reply on Nabble, which goes like this (I used dismax, so explaining that here):
• Create a separate field named, say, 'nameString' which stores the value as "_START_ <name>"
• Change the search query to "_START_ <query>"
• Add the new field nameString as one of the fields to look in in the query fields param (qf)
• While searching use the parameter pf (phrase field) as the new field nameString with a phrase slop of 1 or 2 (lower values would mean stricter searching)
Your final query params will be something like:
q=_START_ <query>
defType=dismax
qf=name nameString
pf=nameString
ps=2
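A small Java sketch of the recipe above, assuming the marker is a literal _START_ token that is prepended to both the indexed field value and the user's query (the placeholder text in the original answer was lost, so this reading is an assumption). StartMarker is a hypothetical helper; the parameter names and values are the ones listed in the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for the "_START_ marker" approach to boosting early matches.
final class StartMarker {
    static final String MARKER = "_START_";

    // Value to index in the extra field (nameString in the answer above).
    static String indexedValue(String name) {
        return MARKER + " " + name;
    }

    // dismax parameters for a user query, following the answer's parameter list.
    static Map<String, String> queryParams(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", MARKER + " " + userQuery);
        params.put("defType", "dismax");
        params.put("qf", "name nameString");
        params.put("pf", "nameString");
        params.put("ps", "2");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(indexedValue("super man returns"));
        System.out.println(queryParams("super man"));
    }
}
```

The idea is that the phrase "_START_ super man" with a small slop only matches nameString values where "super man" sits near the marker, so "super man returns" gets the phrase boost while "there is my super man" does not.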
Solr: How can I get all documents ordered by score with a list of keywords?
I have a Solr 3.1 database containing Emails with two fields:
• datetime
• text
For the query I have two parameters:
• date of today
• keyword array("important thing", "important too", "not so important, but more than average")
Is it possible to create a query to
1. get ALL documents of this day AND
2. sort them by relevancy by ordering them so that the email with contains most of my keywords(important things) scores best?
The part with the date is not very complicated:
fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
I know that you can boost the keywords this way:
q=text:"first keyword"^5 OR text:"second one"^2 OR text:"minus scoring"^0.5 OR text:"*"
But how do I use the keywords only to sort this list and get ALL entries, instead of doing a real query and getting only a few entries back?
Thanks for help!
2 Answers
You need to specify your terms in the main query and then change your date query to be a filter query on these results by adding the following.
fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
So you should have something like this:
q=&fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
Edit: A little more about filter queries (as suggested by rfreak).
From Solr Wiki - FilterQuery Guidance - "Now, what is a filter query? It is simply a part of a query that is factored out for special treatment. This is achieved in Solr by specifying it using the fq (filter query) parameter instead of the q (main query) parameter. The same result could be achieved leaving that query part in the main query. The difference will be in query efficiency. That's because the result of a filter query is cached and then used to filter a primary query result using set intersection."
These should be sorted by relevancy score already, that is just the default behavior of Solr. You can see the score by adding that field.
fl=*,score
If you use the Full Interface for Make A Query on the Admin Interface on your Solr installation at http:////admin/form.jsp you will see where you can specify the filter query, fields, and other options. You can check out the Solr Wiki for more details on the options and how they are used.
I hope that this helps you.
+! The filter query is an excellent suggestion. You may consider adding a bit about the advantage of using the filter query there. – rfeak May 27 '11 at 14:55
Thank you! The filter query is working as expected. But unfortunately I still dont know how to handle the keywords because they filter the emails instead of only sort them. – Daniel May 27 '11 at 16:06
Sorting by relevance is the default behavior in Solr/Lucene. If the results are unsatisfactory, try putting the keywords in quotes.
//Edit: Following the answer from Paige Cook, use something like this:
q="important thing"&fq=datetime:[YY-MM-DDT00:00:00.000Z TO YY-MM-DDT23:59:59.999Z]
//Second update, after thinking about this answer: quotes are not a good idea, because in this case you will only receive "important thing" mails, but no "important too" mails.
The point is which keywords you are using. Searching for -- important thing -- gives the highest scores to "important thing" mails, but Lucene does not know how to score "important too" or "not so important, but more than average" in relation to your keywords. Another idea would be to search only for "important". But the field values "important thing" and "important too" get nearly the same score, because the searched keyword (in this case "important") makes up 50% of the field value. So you probably have to change your keywords. It could work after changing "important too" into "also an important mail", so that the ratio of the search word "important" to the field value gives the shortest mail description the highest score.
Thanks for your answer! You point exactly to my problem because the keywords filter the documents instead of only sorting them all an influencing the relevancy score. I do not know how to handle this. – Daniel May 27 '11 at 16:13
Solr changes document's score when its random field value altered
I need to navigate forth and back in Solr results set ordered by score viewing documents one by one. To visualise that, first a list of document titles is presented to user, then he or she can click one of the title to see more details and then needs to have an opportunity to move to the next document in the original list without getting back and clicking another title.
During viewing, documents get changed: their dynamic field is modified (or created if it does not exist yet) to mark that the document has already been viewed (this is used in another search).
The problem I face is that when the document is altered and re-indexed to keep those changes, sometimes (and not always, which is very disturbing) its place in the result set for the same query changes (in other words, its score changes, as that doesn't happen when browsing results sorted by one of the documents' fields). So, "Previous" / "Next" navigation doesn't work properly.
I'm not using any custom weighting or boosters on fields for score calculation. Also, that dynamic field changed during browsing doesn't participate in the query used to get the record set browsed.
So, the questions are: can the modification of the document's field not included in the query change its relevance score? And if it can, then how can I control that?
UPDATE
I did some tests and can add the following:
1. Document changes its place in the result set even if no field is amended - just requesting the document and re-indexing it without any changes to its fields makes it take another place next time the same query over the same index is executed.
2. That happens even if the result set is sorted explicitly ("first_name DESC"), so score (which depends on the update date) is not involved. The document stays the same, its field result set is sorted by is the same, yet its position changes.
Still have no idea how to avoid that.
2 Answers
In Solr, if your field is "indexed", it will have an effect on the relevancy ranking ("stored" fields show up in search results but are not necessarily searchable). If the fields in question aren't marked as indexed then you are good to go. Note that "indexed" and "stored" are not necessarily the same, hence your confusion about result lists changing even though not all fields are shown (a field can be "indexed" and not "stored" as well).
In this case I think you want your "viewed" field to be "stored" but not "indexed". If you really want to control the query, you can use copyField to copy the relevant results into a single searchable field. You can also boost terms or documents so that certain fields are "less important" to the search query.
If you want to see how the relevancy rankings are calculated, you can add "debugQuery=on" to the end of your Solr Query (see the Relevancy FAQ for more info).
However, all that being said, I would recommend you cache your search result query (at least for the first page for your results), since you will always have results changing (documents added, removed by other users, etc). Your best bet is to design a UI that anticipates this, or at least batches a user's query.
Thanks, for some reason I was sure changes to fields not participating in the query don't affect the calculated score. In my case it is necessary to have this field indexed, as there is another query where I need to filter documents, searching only those viewed or only those not viewed before. Caching is also not suitable, as users are supposed to navigate through the whole result set, not only through the page (well, caching is still possible and, to be honest, bearable in terms of resources, but just not elegant). I'll try to boost the field being searched and tell if that works. – Yuriy Jun 7 '11 at 7:45
Just noticed that it also happens when the results are sorted by other field than score. How that's possible? I thought if ordering is specified and score is not in the clause explicitly (say, ordering is like "first_name DESC"), it doesn't influence the ordering. However, it seems it does. How can I get rid of that? – Yuriy Jun 8 '11 at 14:11
Okay, looks like boosting works, but has no effect. If I boost the field I am searching in, all the matches are boosted equally and still the recently re-indexed documents get some delta in their relevance which makes difference. There should be a way to exclude the date of last update from the ordering completely but I can't find it yet... – Yuriy Jun 8 '11 at 14:50
I've found the solution which doesn't eliminate the problem completely but makes it much less likely to happen.
So the problem happens when the documents are sorted by some field and there is a number of them with the same value in this field (e.g. result set is sorted by first name, and there are 100 entries for "John").
This is when the indexed time gets involved - apparently Solr uses it to sort the documents when their main sorting fields are identical. To make this case much less probable, you need to add more sorting fields, e.g. "first_name desc" should become "first_name desc, last_name desc, register_date asc".
Also, adding document's unique id as the last sorting field should remove the problem completely (the set of sorting fields will never be identical for any two documents in the index).
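The tie-breaking idea above can be illustrated outside Solr with a plain Java comparator. This is a sketch of the principle only: StableOrdering and Person are hypothetical names, and the comparator corresponds to a sort such as "first_name desc" followed by the unique id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: a unique id as the last sort key makes the order deterministic.
final class StableOrdering {
    static final class Person {
        final String id;
        final String firstName;

        Person(String id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }
    }

    // First name descending, then unique id ascending as the final tie-breaker.
    static final Comparator<Person> BY_FIRST_NAME_DESC_THEN_ID =
            Comparator.comparing((Person p) -> p.firstName)
                    .reversed()
                    .thenComparing((Person p) -> p.id);

    // Returns the ids in sorted order without modifying the input list.
    static List<String> sortedIds(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(BY_FIRST_NAME_DESC_THEN_ID);
        List<String> ids = new ArrayList<>();
        for (Person p : copy) {
            ids.add(p.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("2", "John"));
        people.add(new Person("1", "John"));
        people.add(new Person("3", "Zed"));
        System.out.println(sortedIds(people));
    }
}
```

Because no two documents share an id, the result no longer depends on the order in which the documents were stored or re-indexed.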
Relevance Customization
http://lucene.472066.n3.nabble.com/Relevance-Customization-td501310.html
Hi all.
I want to know if it's possible to customize the Solr relevance, something like this:
1 - I create a static score for each document and index it.
2 - I change the relevance to Score(Solr) + Score(Static), where the Solr score is equal to 30% of the total score, mixing the two scores into a single one.
This is different from sorting by my static score and then by Solr score, because I don't want to kill the Solr score, just give it a little less importance.
Is there a way to do this?
Thanks
Re: Relevance Customization
It can be done with something like q=yourQuery _val_:yourStaticScoreField
http://wiki.apache.org/solr/FunctionQuery#fieldvalue
But this adds solr score with static score. I am not sure how to get 30% of solr score. May be something like?
q=yourQuery^0.3 _val_:yourStaticScoreField^0.7
Modify SOLR scoring
Hi everybody,
I'm using SOLR with a schema (for example) like this:
parutiondate: date, indexed, not stored
fulltext: stemmed, indexed, not stored
I know it's possible to order by one or more fields, but I want to order by score and modify the score formula. I want to keep the SOLR score but add a new parameter to the formula to boost the score of the most recent documents.
What is the best way to do this ?
Thanks.
Excuse for my english.
RE: modify SOLR scoring
I believe you can use a function query to do this:
http://wiki.apache.org/solr/FunctionQuery
if you embed the following in your query, you should get a boost for more recent date values:
_val_:"ord(dateField)"
Where "dateField" is the field name of the date you want to use.
Re: modify SOLR scoring
http://lucene.472066.n3.nabble.com/modify-SOLR-scoring-td497348.html
I am interested in a topic very similar to yours. I want to modify the field named "score" and the document boost without reindexing all the fields, since that would take too much power.
Please let me know if you find a solution to this.
Kindly
Change order before returning data
http://stackoverflow.com/questions/4965172/change-order-before-returning-data
Is there any way to change the order of results in SOLR? E.g. when I query SOLR I get the 1000 records with the highest scores, then I use my own function to reorder those 1000 records and take just 10 of them. I can get the 1000 records and process them in PHP or Java, but then I have to transfer 1000 records from the SOLR server to the webserver, and I don't want that; I just want to get 10 records after changing the order, and use paging. Does SOLR support this kind of custom function?
Answers
If your function can be applied when the records are initially indexed, you can do it there and add the result as a value on the record. Then sort the result set by the precalculated value. If not, I haven't worked with it directly, but this thread seems to have the answer you're looking for.
Hi. My case is very special: I already have a pre-indexed score in the database. Let me give one example. I have a shopping site; when I search for TV LCD 32 inch, I get many results from different brands like LG, Toshiba ... and many results for LG appear consecutively. I want to separate them, e.g. I don't want 3 results for LG sitting next to each other. Currently I get the 1000 best records (based on score) and change the order again using PHP. Now I want to move this job to SOLR (I don't want to transfer too much data between SOLR and the webserver; I just need 10 records to display). – user612433 Feb 11 '11 at 3:45
Yes you can create a column with the info you want to be taken into account into the score.
For ex, for a "popularity" column, your query would be:
your query && _val_:"popularity"^0.7
0.7 being the boost factor into the final score. you can also filter the result set to get less results:
your query && fq=popularity:[10 TO *]
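The brand-separation requirement from the comment above is not covered by the popularity boost. One possible reordering, sketched here as plain Java, is a round-robin over brands that keeps the score order within each brand. BrandSpreader is a hypothetical name, the input is assumed to be sorted by score already, and where such logic would run (client side, or inside Solr via custom code) is left open.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: spreads brands across the top results by round-robin.
final class BrandSpreader {
    // Each entry is {productName, brand}; the list is assumed sorted by score, best first.
    static List<String> spread(List<String[]> ranked, int limit) {
        // Group products by brand, keeping brands in order of their best product.
        Map<String, Deque<String>> byBrand = new LinkedHashMap<>();
        for (String[] item : ranked) {
            byBrand.computeIfAbsent(item[1], b -> new ArrayDeque<>()).add(item[0]);
        }
        List<String> out = new ArrayList<>();
        while (out.size() < limit && !byBrand.isEmpty()) {
            // Drop brands that have run out, then take one product from each remaining brand.
            byBrand.values().removeIf(Deque::isEmpty);
            for (Deque<String> queue : byBrand.values()) {
                if (out.size() == limit) {
                    break;
                }
                out.add(queue.poll());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> ranked = new ArrayList<>();
        ranked.add(new String[] {"LG-1", "LG"});
        ranked.add(new String[] {"LG-2", "LG"});
        ranked.add(new String[] {"LG-3", "LG"});
        ranked.add(new String[] {"Toshiba-1", "Toshiba"});
        System.out.println(spread(ranked, 3));
    }
}
```

Round-robin does not guarantee that two results of the same brand never sit next to each other: once only one brand has products left, its remaining products follow one another.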
limiting the total number of documents matched
http://search-lucene.com/m/4AHNF17wIJW1/
Re: limiting the total number of documents matched
Yonik Seeley 2010-07-17, 00:55
On Wed, Jul 14, 2010 at 5:46 PM, Paul <[EMAIL PROTECTED]> wrote:
I thought of another way to do it, but I still have one thing I don't know how to do. I could do the search without sorting for the 50th page, then look at the relevancy score on the first item on that page, then repeat the search, but add score > that relevancy as a parameter. Is it possible to do a search with "score:[5 to *]"? It didn't work in my first attempt.
frange could possibly help (range query on an arbitrary function).
http://www.lucidimagination.com/blog/tag/frange/
So perhaps something like
q={!frange l=0.85}query($qq)
qq=
where 0.85 is the lower bound you want for scores and qq is the normal relevancy query
-Yonik
http://www.lucidimagination.com
On Wed, Jul 14, 2010 at 5:34 PM, Paul <[EMAIL PROTECTED]> wrote:
I was hoping for a way to do this purely by configuration and making the correct GET requests, but if there is a way to do it by creating a custom Request Handler, I suppose I could plunge into that. Would that yield the best results, and would that be particularly difficult?
>> On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against the IDs and sort by your other field. 1000 seems like a lot for that approach, but who knows until you try it on your data.
>>> -Kallin Nagelberg
>>> Subject: limiting the total number of documents matched
I'd like to limit the total number of documents that are returned for a search, particularly when the sort order is not based on relevancy. In other words, if the user searches for a very common term, they might get tens of thousands of hits, and if they sort by "title", then very high relevancy documents will be interspersed with very low relevancy documents. I'd like to set a limit to the 1000 most relevant documents, then sort those by title. Is there a way to do this?
I guess I could always retrieve the top 1000 documents and sort them in the client, but that seems particularly inefficient. I can't find any other way to do this, though.
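For reference, the client-side fallback described above (take the N most relevant hits, then sort those by title) is short in Java. TopThenSort and Hit are hypothetical names used only for this sketch; fetching the hits from Solr is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: keeps the n best-scoring hits, then orders them by title.
final class TopThenSort {
    static final class Hit {
        final String title;
        final float score;

        Hit(String title, float score) {
            this.title = title;
            this.score = score;
        }
    }

    static List<String> topByScoreThenTitle(List<Hit> hits, int n) {
        return hits.stream()
                // Highest score first, then cut off after n hits.
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(n)
                // Re-sort only the surviving hits by title.
                .sorted(Comparator.comparing((Hit h) -> h.title))
                .map(h -> h.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("b", 3.0f));
        hits.add(new Hit("a", 1.0f));
        hits.add(new Hit("c", 2.0f));
        System.out.println(topByScoreThenTitle(hits, 2));
    }
}
```

The two-query approach suggested earlier in the thread (first fetch only the ids by score, then query those ids sorted by the other field) moves the same two steps to the server at the cost of a second request.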