mysolr was born to be a fast and easy-to-use client for Apache Solr’s API, because existing Python clients didn’t fulfill these conditions.
Since version 0.5, mysolr supports Python 3, except for the concurrent search feature.
from mysolr import Solr
# Default connection to localhost:8080
solr = Solr()
# All solr params are supported!
query = {'q' : '*:*', 'facet' : 'true', 'facet.field' : 'foo'}
response = solr.search(**query)
# do stuff with documents
for document in response.documents:
    # modify field 'foo'
    document['foo'] = 'bar'
# update index with modified documents
solr.update(response.documents, commit=True)
To install mysolr from PyPI:
pip install mysolr
From source code:
python setup.py install
The concurrent search feature is only available for Python 2.X because it depends on gevent and grequests. To use it, install mysolr with the async extra:
pip install "mysolr[async]"
Use the mysolr.Solr object to connect to a Solr instance.
from mysolr import Solr
# Default connection. Connecting to http://localhost:8080/solr/
solr = Solr()
# Custom connection
solr = Solr('http://foo.bar:9090/solr/')
New in version 0.9.
You can reuse the HTTP connection by using a requests.Session object:
from mysolr import Solr
import requests
session = requests.Session()
solr = Solr('http://localhost:8983/solr/collection1', make_request=session)
New in version 0.9.
Using a requests.Session object allows you to connect to servers secured with HTTP basic authentication as follows:
from mysolr import Solr
import requests
session = requests.Session()
session.auth = ('admin', 'admin')
solr = Solr('http://localhost:8983/solr/collection1', make_request=session)
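For reference, HTTP basic authentication sends the credentials base64-encoded in an Authorization header on every request. The sketch below rebuilds that header value with the standard library only. The helper name basic_auth_header is hypothetical and is not part of mysolr or requests.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP basic auth."""
    # Credentials are joined with a colon and base64-encoded.
    credentials = '{0}:{1}'.format(username, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    return 'Basic ' + token


header = basic_auth_header('admin', 'admin')
```

Base64 is an encoding, not encryption, so this scheme should only be used over HTTPS.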
New in version 0.8.
Solr 4.0 changed the API slightly, so the Solr object guesses the Solr server version by making a request. You can set the Solr version manually with the version parameter:
from mysolr import Solr
# Default connection. Connecting to a solr 4.X server
solr = Solr(version=4)
Making a query to Solr is very easy: just call the search method with your query.
from mysolr import Solr
solr = Solr()
# Search for all documents
response = solr.search(q='*:*')
# Get documents
documents = response.documents
All available Solr query parameters are supported, so a paginated query is as simple as:
from mysolr import Solr
solr = Solr()
# Get 10 documents
response = solr.search(q='*:*', rows=10, start=0)
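The start parameter is the offset of the first document to return and rows is the page size, so page n (counting from zero) starts at n * rows. A small sketch of that arithmetic follows. The helper page_params is hypothetical and not part of mysolr.

```python
def page_params(page, page_size=10, q='*:*'):
    """Return search keyword arguments for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError('page must be >= 0 and page_size must be > 0')
    return {'q': q, 'rows': page_size, 'start': page * page_size}


# Third page of 10 documents: skips the first 20 results.
params = page_params(2)
# With a live server this could be passed as: solr.search(**params)
```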
Some parameters contain a period. In those cases you have to use a dictionary to build the query:
from mysolr import Solr
solr = Solr()
query = {'q' : '*:*', 'facet' : 'true', 'facet.field' : 'foo'}
response = solr.search(**query)
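The dictionary is needed because a name such as facet.field is not a valid Python identifier, so it cannot be written as a keyword argument. Unpacking a dictionary with ** avoids that restriction. A minimal illustration, where fake_search is a hypothetical stand-in that only echoes what it receives:

```python
def fake_search(**params):
    """Stand-in for a search call: returns the parameters it was given."""
    return params


# fake_search(facet.field='foo') would be a SyntaxError,
# but unpacking a dictionary accepts any string key.
received = fake_search(**{'q': '*:*', 'facet.field': 'foo'})
```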
Sometimes an HTTP parameter must be specified multiple times, for instance when faceting by several fields. Use a list in that case:
from mysolr import Solr
solr = Solr()
query = {'q' : '*:*', 'facet' : 'true', 'facet.field' : ['foo', 'bar']}
response = solr.search(**query)
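On the wire, a list value becomes the same parameter repeated once per element. The sketch below shows that expansion with the standard library on Python 3. It is assumed, not confirmed, that mysolr and requests produce an equivalent query string.

```python
from urllib.parse import urlencode, parse_qs

query = {'q': '*:*', 'facet': 'true', 'facet.field': ['foo', 'bar']}

# doseq=True emits one key=value pair per list element.
encoded = urlencode(query, doseq=True)

# Decoding again shows that facet.field was sent twice.
decoded = parse_qs(encoded)
```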
The concept of a cursor, familiar from relational databases, is also implemented in mysolr.
from mysolr import Solr
solr = Solr()
cursor = solr.search_cursor(q='*:*')
# Get all the documents
for response in cursor.fetch(100):
    # Do stuff with the current 100 documents
    pass
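Conceptually, such a cursor is a generator that repeats the same query while advancing the start offset. The sketch below shows the idea against an in-memory list. It is not mysolr's actual implementation, and fetch_all and fake_search are hypothetical names.

```python
def fetch_all(search, rows, **params):
    """Yield successive pages of documents until the results run out.

    `search` is any callable taking q/start/rows keyword arguments and
    returning a list of documents (an assumption made for this sketch).
    """
    start = 0
    while True:
        page = search(start=start, rows=rows, **params)
        if not page:
            break
        yield page
        start += rows


# In-memory stand-in for a Solr index holding 250 documents.
INDEX = [{'id': i} for i in range(250)]


def fake_search(q, start, rows):
    """Ignore the query and return one slice of the fake index."""
    return INDEX[start:start + rows]


page_sizes = [len(page) for page in fetch_all(fake_search, 100, q='*:*')]
```

Because the pages are produced lazily, only one page of documents is held in memory at a time.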
This is a query example using facets with mysolr.
from mysolr import Solr
solr = Solr()
# Search for all documents, faceting by field foo
query = {'q' : '*:*', 'facet' : 'true', 'facet.field' : 'foo'}
response = solr.search(**query)
# Get documents
documents = response.documents
# Get facets
facets = response.facets
Facets are parsed and can be accessed through the facets attribute of the SolrResponse object. Facets look like this:
{
    'facet_dates': {},
    'facet_fields': {'foo': OrderedDict([('value1', 2), ('value2', 2)])},
    'facet_queries': {},
    'facet_ranges': {}
}
Ordered dicts are used to store the facets because order matters.
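By default, Solr's JSON response writer returns each field's facet counts as a flat list that alternates values and counts. The following sketch shows one way such a list can be turned into the ordered mapping shown above. It is not mysolr's actual parsing code, and pair_up is a hypothetical name.

```python
from collections import OrderedDict


def pair_up(flat):
    """Convert ['value1', 2, 'value2', 2] into an OrderedDict."""
    values = flat[0::2]   # even positions hold the facet values
    counts = flat[1::2]   # odd positions hold the counts
    return OrderedDict(zip(values, counts))


raw_facet_fields = {'foo': ['value1', 2, 'value2', 2]}
facet_fields = {name: pair_up(flat)
                for name, flat in raw_facet_fields.items()}
```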
In any case, if you don’t like how facets are parsed, you can use the raw_content attribute, which contains the raw response from Solr.
This is an example of a query that uses the spellcheck component.
from mysolr import Solr
solr = Solr()
# Spell check query
query = {
    'q': 'helo wold',
    'spellcheck': 'true',
    'spellcheck.collate': 'true',
    'spellcheck.build': 'true'
}
response = solr.search(**query)
Spellchecker results are parsed and can be accessed through the spellcheck attribute of the SolrResponse object:
{'collation': 'Hello world',
 'correctlySpelled': False,
 'suggestions': {
     'helo': {'endOffset': 4,
              'numFound': 1,
              'origFreq': 0,
              'startOffset': 0,
              'suggestion': [{'freq': 14, 'word': 'hello'}]},
     'wold': {'endOffset': 9,
              'numFound': 1,
              'origFreq': 0,
              'startOffset': 5,
              'suggestion': [{'freq': 14, 'word': 'world'}]}}}
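The offsets in each suggestion make it possible to rewrite the original query by hand when no collation is available. The sketch below applies the first suggestion for each misspelled word, using a dictionary shaped like the suggestions entry shown above. The function apply_suggestions is hypothetical and not part of mysolr.

```python
def apply_suggestions(text, suggestions):
    """Replace each misspelled word with its first suggestion."""
    # Work from the end of the string so earlier offsets stay valid.
    ordered = sorted(suggestions.values(),
                     key=lambda entry: entry['startOffset'], reverse=True)
    for entry in ordered:
        if not entry['suggestion']:
            continue
        best = entry['suggestion'][0]['word']
        text = text[:entry['startOffset']] + best + text[entry['endOffset']:]
    return text


suggestions = {
    'helo': {'startOffset': 0, 'endOffset': 4,
             'suggestion': [{'freq': 14, 'word': 'hello'}]},
    'wold': {'startOffset': 5, 'endOffset': 9,
             'suggestion': [{'freq': 14, 'word': 'world'}]},
}
corrected = apply_suggestions('helo wold', suggestions)
```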
The stats attribute is just a shortcut to the stats result. It is not parsed and has the format sent by Solr.
Like stats, highlighting is just a shortcut.
Because mysolr uses requests, it is possible to make concurrent queries thanks to grequests:
from mysolr import Solr
solr = Solr()
# queries
queries = [
    {
        'q': '*:*'
    },
    {
        'q': 'foo:bar'
    }
]
# run up to 10 requests concurrently
responses = solr.async_search(queries, size=10)
See the installation section for further information about how to install this feature.
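Where gevent and grequests are not available (for example on Python 3), a similar fan-out can be approximated with the standard library thread pool. This is a different technique from the one async_search uses, shown only as a sketch. The function fake_search is a hypothetical stand-in for a blocking search call.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_search(**query):
    """Stand-in for a blocking search call; echoes the query string."""
    return 'results for ' + query['q']


queries = [{'q': '*:*'}, {'q': 'foo:bar'}]

# At most 10 queries are in flight at once; map() keeps the input order.
with ThreadPoolExecutor(max_workers=10) as pool:
    responses = list(pool.map(lambda query: fake_search(**query), queries))
```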
from mysolr import Solr
solr = Solr()
# Create documents
documents = [
    {'id': 1,
     'field1': 'foo'
    },
    {'id': 2,
     'field2': 'bar'
    }
]
# Indexing with JSON is faster!
solr.update(documents, 'json', commit=False)
# Manual commit
solr.commit()
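Deferring the commit pays off most when many documents are indexed in batches, since a single commit at the end is usually cheaper than one per batch. The sketch below records the calls instead of contacting a server. RecordingSolr and index_in_batches are hypothetical, and the parameter name input_type is an assumption about the update signature.

```python
class RecordingSolr(object):
    """Hypothetical stand-in that records calls instead of contacting Solr."""

    def __init__(self):
        self.calls = []

    def update(self, documents, input_type='json', commit=True):
        # Record the batch size and whether a commit was requested.
        self.calls.append(('update', len(documents), commit))

    def commit(self):
        self.calls.append(('commit',))


def index_in_batches(solr, documents, batch_size):
    """Send documents in batches and commit once at the end."""
    for i in range(0, len(documents), batch_size):
        solr.update(documents[i:i + batch_size], 'json', commit=False)
    solr.commit()


solr = RecordingSolr()
index_in_batches(solr, [{'id': n} for n in range(5)], batch_size=2)
```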
How to copy all documents from one Solr server to another:
from mysolr import Solr
PACKET_SIZE = 5000
solr_source = Solr('http://server1:8080/solr/')
solr_target = Solr('http://server2:8080/solr/')
cursor = solr_source.search_cursor(q='*:*')
for resp in cursor.fetch(PACKET_SIZE):
    source_docs = resp.documents
    solr_target.update(source_docs)
Acts as an easy-to-use interface to Solr.
Asynchronous search using the async module from requests.
Sends a commit message to Solr.
Sends an ID delete message to Solr.
Parameters: commit – If True, sends a commit message after the operation is executed.
Sends a query delete message to Solr.
Parameters: commit – If True, sends a commit message after the operation is executed.
Gets solr system status.
Checks whether a Solr server is up using a ping call.
Implements convenient access to Solr MoreLikeThis functionality
Please visit http://wiki.apache.org/solr/MoreLikeThis to learn more about MLT configuration and common parameters.
There are two ways of using MLT in Solr:
Sends an optimize message to Solr.
Ping call to the Solr server.
Sends a rollback message to Solr server.
Queries Solr with the given kwargs and returns a SolrResponse object.
Sends an update/add message to add the array of hashes (documents) to Solr.
Parses the Solr response and makes it accessible.
Tries to extract an error message from a SolrResponse body content.
Useful for error identification (e.g.: indexation errors)
Tries to parse the raw content to determine whether it is a structured results response or an unstructured HTML page (usually resulting from an error).
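One way to make that distinction is to attempt to decode the body and fall back when decoding fails. The sketch below assumes the response body is JSON. That format, along with the name parse_body, is an assumption for illustration and may not match mysolr's internals.

```python
import json


def parse_body(raw_content):
    """Return the decoded JSON object, or None for unstructured content."""
    try:
        parsed = json.loads(raw_content)
    except ValueError:
        # Not JSON at all, e.g. an HTML error page from the server.
        return None
    # A structured Solr response is a JSON object, not a bare value.
    return parsed if isinstance(parsed, dict) else None


ok = parse_body('{"responseHeader": {"status": 0}}')
error_page = parse_body('<html><body>HTTP 500</body></html>')
```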
Parse facets.
Parse spellcheck result into a more readable format.
Implements the concept of cursor in relational databases
Generator method that grabs all the documents in bulk sets of ‘rows’ documents
Parameters: rows – number of rows for each request
We would like to thank the following developers for their work and inspiration: