score.es

This module provides automated management of the contents of an elasticsearch index, along with convenient access that integrates seamlessly with the score.db module.

Quickstart

The module automatically detects classes with the special member __score_es__. This dictionary must map real member names to elasticsearch mapping definitions:

class User(Base):
    __score_es__ = {
        'name': {'type': 'string', 'index': 'not_analyzed'},
    }
    name = Column(String)

class Text(Base):
    __score_es__ = {
        'title': {'type': 'string'},
        'body': {'type': 'string', 'term_vector': 'with_offsets'},
    }
    title = Column(String(200))
    body = Column(String)

class SillyText(Text):
    pass

Create your elasticsearch index automatically after updating all models:

>>> score.es.create()

You can now use the context member es to query your index:

>>> for text in ctx.es.query(Text, 'title:dead AND title:parrot'):
...     print('{text.id}: {text.title}'.format(text=text))

Configuration

score.es.init(confdict, db, ctx=None)

Initializes this module according to our module initialization guidelines with the following configuration keys:

args.hosts
A list of hosts (as read by score.init.parse_list()) to pass to the Elasticsearch constructor.
args.*
Any other arguments to be passed to the Elasticsearch constructor.
index (default: score)
The index to use in all operations.
keep_source (default: False)
Whether the _source field should be enabled. The default is False, since the canonical representation of every object is stored in the score.db database and should be retrieved from there.
ctx.member (default: es)
The name of the context member that should be registered with the configured score.ctx module (if there is one). The default value allows one to conveniently query the index:

>>> for knight in ctx.es.query(User, 'name:sir*'):
...     print(knight.name)
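
To make the key layout above more concrete, here is a minimal sketch of how such a confdict could be split into constructor arguments and module options. The helper split_confdict, the sample host value and the args.timeout key are hypothetical illustrations and not part of score.es; only the key names and defaults are taken from the list above.

```python
# Hypothetical helper, not part of score.es: it only illustrates the key
# layout and the defaults documented above.
DEFAULTS = {
    'index': 'score',
    'keep_source': False,
    'ctx.member': 'es',
}


def split_confdict(confdict):
    """Separate ``args.*`` keys from the remaining module options."""
    conf = dict(DEFAULTS)
    conf.update(confdict)
    # Every key starting with "args." is meant for the Elasticsearch
    # constructor; the prefix is stripped to obtain the keyword name.
    es_kwargs = {key[len('args.'):]: value
                 for key, value in conf.items()
                 if key.startswith('args.')}
    options = {key: value
               for key, value in conf.items()
               if not key.startswith('args.')}
    return es_kwargs, options


es_kwargs, options = split_confdict({
    'args.hosts': 'localhost:9200',  # sample value, an assumption
    'args.timeout': '30',            # example of an arbitrary args.* key
    'index': 'knights',
})
```

In this sketch, keys that are not given fall back to the documented defaults, so options still contains keep_source and ctx.member.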

Details

Automatic Fields

Whenever objects of managed classes are stored in the configured database, they are also automatically added to the configured elasticsearch index. The following document properties will be added automatically:

  • _id: This is equal to the id of the object in the database
  • _type: Equal to the name of the top-most es class, e.g. text. This is the name defined in the database as __score_db__['type_name'].
  • class: A multi-valued field with one value for each class in the hierarchy, from the class of the object up to the top-most es class, e.g. silly_text and text. The names are those described for _type above.
  • concrete_class: A single value corresponding to the class of the object itself, e.g. silly_text only. The names are those described for _type above.
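
The properties listed above can be sketched as follows for a SillyText object. This is an illustration under stated assumptions, not the module's actual implementation: the classes are plain stand-ins without a database, the __score_db__ declarations are written by hand, and automatic_fields is a hypothetical helper.

```python
# Plain stand-ins for the database classes from the Quickstart. The
# __score_db__ declarations are written by hand here; in a real
# application they are assumed to come from score.db.
class Base:
    pass


class Text(Base):
    __score_db__ = {'type_name': 'text'}
    __score_es__ = {'title': {'type': 'string'}}

    def __init__(self, id, title):
        self.id = id
        self.title = title


class SillyText(Text):
    __score_db__ = {'type_name': 'silly_text'}


def automatic_fields(obj):
    """Hypothetical helper computing the automatically added properties."""
    # Walk the MRO from the concrete class upwards and keep every class
    # that has an (inherited or own) __score_es__ declaration.
    hierarchy = [cls for cls in type(obj).__mro__
                 if hasattr(cls, '__score_es__')]
    names = [cls.__score_db__['type_name'] for cls in hierarchy]
    return {
        '_id': obj.id,
        '_type': names[-1],          # top-most es class
        'class': names,              # multi-valued field
        'concrete_class': names[0],  # class of the object itself
    }


fields = automatic_fields(SillyText(42, 'Dead Parrot'))
```

Here fields['class'] holds both silly_text and text, while _type and concrete_class each hold a single name.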

Apart from the additional members, the following deviations from elasticsearch’s default behaviour automatically apply to all mappings:

  • _source will be disabled, since all member values will be retrieved from the database. If necessary, single members can be stored using the field property store. It is also possible to fall back to the default behaviour of elasticsearch and store the _source field by passing the appropriate init configuration (keep_source).

Conversion

It is possible to provide a conversion function for member values via the __convert__ key. The function may accept either one or two parameters. The first parameter is always the value to convert, whereas the optional second parameter is the object instance:

class L33tText(SillyText):
    __score_es__ = {
        'body': {'type': 'string', 'term_vector': 'with_offsets',
                 '__convert__': lambda b: b.replace('e', '3')},
    }

class VeryShortText(SillyText):
    __score_es__ = {
        'body': {'type': 'string', 'term_vector': 'with_offsets',
                 '__convert__': lambda b, text: text.title},
    }
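
How a converter with either one or two parameters might be invoked can be sketched with inspect.signature. The helper apply_convert and the FakeText stand-in are hypothetical; the module may decide between the two call forms differently.

```python
import inspect


def apply_convert(convert, value, obj):
    """Call convert with one or two arguments, depending on its signature."""
    num_params = len(inspect.signature(convert).parameters)
    if num_params == 1:
        return convert(value)
    return convert(value, obj)


class FakeText:
    """Stand-in for a database object with only the attributes used below."""
    title = 'Dead Parrot'
    body = 'He is resting'


text = FakeText()
# One-parameter converter, as in L33tText above.
leet = apply_convert(lambda b: b.replace('e', '3'), text.body, text)
# Two-parameter converter, as in VeryShortText above.
short = apply_convert(lambda b, text: text.title, text.body, text)
```

With these stand-in values, leet is the body with every 'e' replaced and short is simply the title of the object.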

API

class score.es.ConfiguredEsModule(db, es, index, keep_source)

This module’s configuration class.

index

The index to operate on. This will be passed as the index keyword argument to almost every Elasticsearch function.

es

The configured elasticsearch.Elasticsearch instance. Do not forget to use the configured index value when operating on this directly.

destroy()

Completely deletes the whole index.

create(destroy=True)

Creates the elasticsearch index and registers all mappings. If the parameter destroy is left at its default value, the index will be destroyed first.

If the index is not deleted first, this function will raise an exception if the new mapping contradicts an existing mapping in the index.

refresh(ctx)

Re-inserts every object into the Lucene index. Note that this operation might take a very long time, depending on the number of objects.

insert(object_)

Inserts an object_ into the index.

delete(object_)

Removes an object_ from the index.

query(ctx, class_, query, *, analyze_wildcard=False, offset=0, limit=10)

Executes a Lucene query on the index and yields a list of objects of the given class_, retrieved from the database. It is also possible to provide multiple classes, in which case the same query will be performed on multiple types at once.

The query can be provided as a string, or as a query DSL. The parameter analyze_wildcard is passed to elasticsearch.Elasticsearch.search(), whereas offset and limit are mapped to from_ and size respectively.
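
The parameter mapping described above can be sketched as a function that only builds the keyword arguments for search(). This is a hedged illustration: build_search_kwargs is a hypothetical helper, it assumes an elasticsearch-py release in which search() accepts doc_type, q and body, and wrapping a DSL query as {'query': ...} is an assumption about how the module passes it on.

```python
def build_search_kwargs(index, type_names, query, *,
                        analyze_wildcard=False, offset=0, limit=10):
    """Hypothetical helper translating query() arguments to search() ones."""
    kwargs = {
        'index': index,
        # Several types are queried at once by joining their names
        # (assumption about how multiple classes are handled).
        'doc_type': ','.join(type_names),
        'analyze_wildcard': analyze_wildcard,
        'from_': offset,  # offset maps to from_
        'size': limit,    # limit maps to size
    }
    if isinstance(query, str):
        kwargs['q'] = query                # query string syntax
    else:
        kwargs['body'] = {'query': query}  # query DSL (assumed wrapping)
    return kwargs


# Third page of five results each for the Quickstart query.
page = build_search_kwargs('score', ['text'],
                           'title:dead AND title:parrot',
                           offset=10, limit=5)
# The same helper with a query DSL dictionary and two types.
dsl = build_search_kwargs('score', ['text', 'user'],
                          {'match': {'title': 'parrot'}})
```

Keeping the translation in one place makes it easy to see that offset and limit are plain paging parameters and never alter the query itself.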

classes()

Provides a list of top-most classes with a __score_es__ declaration.

get_es_class(object_)

Returns the top-most es class of an object_, which must either be a database class, or an object thereof.
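
The notion of a top-most es class used by classes() and get_es_class() can be sketched with plain Python classes. The two functions below are stand-alone approximations written for this example, not the module's code: they take no database into account, the classes variant receives the base class as an explicit argument, and returning None for classes without a declaration is an assumption.

```python
import inspect


class Base:
    pass


class User(Base):
    __score_es__ = {'name': {'type': 'string', 'index': 'not_analyzed'}}


class Text(Base):
    __score_es__ = {'title': {'type': 'string'}}


class SillyText(Text):
    pass


def get_es_class(object_):
    """Return the top-most class declaring __score_es__, or None."""
    cls = object_ if inspect.isclass(object_) else type(object_)
    # reversed() walks the MRO from the root towards the concrete class,
    # so the first match is the top-most es class.
    for candidate in reversed(cls.__mro__):
        if '__score_es__' in vars(candidate):
            return candidate
    return None


def classes(base):
    """Collect the top-most es classes below base."""
    found = []
    for sub in base.__subclasses__():
        if '__score_es__' in vars(sub):
            found.append(sub)  # top-most reached, do not descend further
        else:
            found.extend(classes(sub))
    return found
```

For the Quickstart models, get_es_class(SillyText) and get_es_class(SillyText()) both give Text, while classes(Base) contains User and Text but not SillyText.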