Diffstat (limited to 'doc/build/orm/session_state_management.rst')
-rw-r--r-- | doc/build/orm/session_state_management.rst | 560 |
1 files changed, 560 insertions, 0 deletions
diff --git a/doc/build/orm/session_state_management.rst b/doc/build/orm/session_state_management.rst
new file mode 100644
index 000000000..1ca7ca2e4
--- /dev/null
+++ b/doc/build/orm/session_state_management.rst
@@ -0,0 +1,560 @@
+State Management
+================
+
+.. _session_object_states:
+
+Quickie Intro to Object States
+------------------------------
+
+It's helpful to know the states which an instance can have within a session:
+
+* **Transient** - an instance that's not in a session, and is not saved to the
+  database; i.e. it has no database identity. The only relationship such an
+  object has to the ORM is that its class has a ``mapper()`` associated with
+  it.
+
+* **Pending** - when you :meth:`~.Session.add` a transient
+  instance, it becomes pending. It has not actually been flushed to the
+  database yet, but it will be when the next flush occurs.
+
+* **Persistent** - An instance which is present in the session and has a record
+  in the database. You get persistent instances by either flushing so that the
+  pending instances become persistent, or by querying the database for
+  existing instances (or moving persistent instances from other sessions into
+  your local session).
+
+* **Detached** - an instance which has a record in the database, but is not in
+  any session. There's nothing wrong with this, and you can use objects
+  normally when they're detached, **except** they will not be able to issue
+  any SQL in order to load collections or attributes which are not yet loaded,
+  or were marked as "expired".
+
+Knowing these states is important, since the
+:class:`.Session` tries to be strict about ambiguous
+operations (such as trying to save the same object to two different sessions
+at the same time).
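The four states can be observed on a single object as it moves through a session. The following is a minimal sketch rather than part of the original text: it assumes SQLAlchemy 1.4 or later and an in-memory SQLite database, and the ``User`` class, the ``user_account`` table name and the ``state_name()`` helper are hypothetical names introduced only for illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    # hypothetical mapped class, used only for this illustration
    __tablename__ = "user_account"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))


def state_name(obj):
    """Return the name of the first InstanceState flag that is set."""
    insp = inspect(obj)
    for name in ("transient", "pending", "persistent", "detached"):
        if getattr(insp, name):
            return name
    return "unknown"


engine = create_engine("sqlite://")  # in-memory database (assumption)
Base.metadata.create_all(engine)
session = Session(engine)

states = []

user = User(name="ed")
states.append(state_name(user))  # not in a session, no database identity

session.add(user)
states.append(state_name(user))  # in the session, not yet flushed

session.flush()
states.append(state_name(user))  # INSERT emitted, identity assigned

session.expunge(user)
states.append(state_name(user))  # has an identity, but no session

print(states)
session.close()
```

The helper only reads the boolean accessors on ``InstanceState``, so it can be pointed at any mapped object while debugging.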
+
+Getting the Current State of an Object
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The actual state of any mapped object can be viewed at any time using
+the :func:`.inspect` system::
+
+    >>> from sqlalchemy import inspect
+    >>> insp = inspect(my_object)
+    >>> insp.persistent
+    True
+
+.. seealso::
+
+    :attr:`.InstanceState.transient`
+
+    :attr:`.InstanceState.pending`
+
+    :attr:`.InstanceState.persistent`
+
+    :attr:`.InstanceState.detached`
+
+
+Session Attributes
+------------------
+
+The :class:`~sqlalchemy.orm.session.Session` itself acts somewhat like a
+set-like collection. All items present may be accessed using the iterator
+interface::
+
+    for obj in session:
+        print(obj)
+
+And presence may be tested for using regular "contains" semantics::
+
+    if obj in session:
+        print("Object is present")
+
+The session also keeps track of all newly created (i.e. pending) objects,
+all objects which have had changes since they were last loaded or saved (i.e.
+"dirty"), and everything that's been marked as deleted::
+
+    # pending objects recently added to the Session
+    session.new
+
+    # persistent objects which currently have changes detected
+    # (this collection is created on the fly each time the property is accessed)
+    session.dirty
+
+    # persistent objects that have been marked as deleted via session.delete(obj)
+    session.deleted
+
+    # dictionary of all persistent objects, keyed on their
+    # identity key
+    session.identity_map
+
+(Documentation: :attr:`.Session.new`, :attr:`.Session.dirty`,
+:attr:`.Session.deleted`, :attr:`.Session.identity_map`).
+
+Note that objects within the session are by default *weakly referenced*. This
+means that when they are dereferenced in the outside application, they fall
+out of scope from within the :class:`~sqlalchemy.orm.session.Session` as well
+and are subject to garbage collection by the Python interpreter.
+The exceptions to this include objects which are pending, objects which are marked
+as deleted, or persistent objects which have pending changes on them. After a
+full flush, these collections are all empty, and all objects are again weakly
+referenced. To disable the weak referencing behavior and force all objects
+within the session to remain until explicitly expunged, configure
+:class:`.sessionmaker` with the ``weak_identity_map=False``
+setting.
+
+.. _unitofwork_merging:
+
+Merging
+-------
+
+:meth:`~.Session.merge` transfers state from an
+outside object into a new or already existing instance within a session. It
+also reconciles the incoming data against the state of the
+database, producing a history stream which will be applied towards the next
+flush, or alternatively can be made to produce a simple "transfer" of
+state without producing change history or accessing the database. Usage is as follows::
+
+    merged_object = session.merge(existing_object)
+
+When given an instance, it follows these steps:
+
+* It examines the primary key of the instance. If it's present, it attempts
+  to locate that instance in the local identity map. If the ``load=True``
+  flag is left at its default, it also checks the database for this primary
+  key if not located locally.
+* If the given instance has no primary key, or if no instance can be found
+  with the primary key given, a new instance is created.
+* The state of the given instance is then copied onto the located/newly
+  created instance. For attributes which are present on the source
+  instance, the value is transferred to the target instance. For mapped
+  attributes which aren't present on the source, the attribute is
+  expired on the target instance, discarding its existing value.
+
+  If the ``load=True`` flag is left at its default,
+  this copy process emits events and will load the target object's
+  unloaded collections for each attribute present on the source object,
+  so that the incoming state can be reconciled against what's
+  present in the database. If ``load``
+  is passed as ``False``, the incoming data is "stamped" directly without
+  producing any history.
+* The operation is cascaded to related objects and collections, as
+  indicated by the ``merge`` cascade (see :ref:`unitofwork_cascades`).
+* The located or newly created instance is returned.
+
+With :meth:`~.Session.merge`, the given "source"
+instance is not modified nor is it associated with the target :class:`.Session`,
+and remains available to be merged with any number of other :class:`.Session`
+objects. :meth:`~.Session.merge` is useful for
+taking the state of any kind of object structure without regard for its
+origins or current session associations and copying its state into a
+new session. Here are some examples:
+
+* An application which reads an object structure from a file and wishes to
+  save it to the database might parse the file, build up the
+  structure, and then use
+  :meth:`~.Session.merge` to save it
+  to the database, ensuring that the data within the file is
+  used to formulate the primary key of each element of the
+  structure. Later, when the file has changed, the same
+  process can be re-run, producing a slightly different
+  object structure, which can then be ``merged`` in again,
+  and the :class:`~sqlalchemy.orm.session.Session` will
+  automatically update the database to reflect those
+  changes, loading each object from the database by primary key and
+  then updating its state with the new state given.
+
+* An application is storing objects in an in-memory cache, shared by
+  many :class:`.Session` objects simultaneously.
+  :meth:`~.Session.merge`
+  is used each time an object is retrieved from the cache to create
+  a local copy of it in each :class:`.Session` which requests it.
+  The cached object remains detached; only its state is copied into
+  copies of itself that are local to individual :class:`~.Session`
+  objects.
+
+  In the caching use case, it's common to use the ``load=False``
+  flag to remove the overhead of reconciling the object's state
+  with the database. There's also a "bulk" version of
+  :meth:`~.Session.merge` called :meth:`~.Query.merge_result`
+  that was designed to work with cache-extended :class:`.Query`
+  objects - see the section :ref:`examples_caching`.
+
+* An application wants to transfer the state of a series of objects
+  into a :class:`.Session` maintained by a worker thread or other
+  concurrent system. :meth:`~.Session.merge` makes a copy of each object
+  to be placed into this new :class:`.Session`. At the end of the operation,
+  the parent thread/process maintains the objects it started with,
+  and the thread/worker can proceed with local copies of those objects.
+
+  In the "transfer between threads/processes" use case, the application
+  may want to use the ``load=False`` flag as well to avoid overhead and
+  redundant SQL queries as the data is transferred.
+
+Merge Tips
+~~~~~~~~~~
+
+:meth:`~.Session.merge` is an extremely useful method for many purposes. However,
+it deals with the intricate border between objects that are transient/detached and
+those that are persistent, as well as the automated transference of state.
+The wide variety of scenarios that can present themselves here often requires a
+more careful approach to the state of objects. Common problems with merge usually involve
+some unexpected state regarding the object being passed to :meth:`~.Session.merge`.
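Before looking at the problem cases, it may help to see the basic merge steps in isolation. This is a minimal sketch under stated assumptions, not the document's own example: it assumes SQLAlchemy 1.4 or later and an in-memory SQLite database, and the ``Setting`` class and its columns are hypothetical names chosen for illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Setting(Base):
    # hypothetical mapped class, used only for this illustration
    __tablename__ = "setting"

    id = Column(Integer, primary_key=True)
    value = Column(String(50))


engine = create_engine("sqlite://")  # in-memory database (assumption)
Base.metadata.create_all(engine)

# store one row, then discard the session that created it
with Session(engine) as setup:
    setup.add(Setting(id=1, value="old"))
    setup.commit()

session = Session(engine)

# a transient object that carries the primary key of the existing row
incoming = Setting(id=1, value="new")
merged = session.merge(incoming)

# the source object is left alone; the returned object is the session's copy
source_in_session = incoming in session
merged_in_session = merged in session
merged_is_source = merged is incoming

# a primary key with no matching row yields a new, pending instance
created = session.merge(Setting(id=2, value="fresh"))
created_is_pending = created in session.new

session.commit()
stored = {s.id: s.value for s in session.query(Setting).order_by(Setting.id)}
print(stored)
session.close()
```

The flags computed along the way show that the source instance stays outside the session, while the object returned by ``merge()`` is the one to keep working with.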
+
+Let's use the canonical example of the User and Address objects::
+
+    class User(Base):
+        __tablename__ = 'user'
+
+        id = Column(Integer, primary_key=True)
+        name = Column(String(50), nullable=False)
+        addresses = relationship("Address", backref="user")
+
+    class Address(Base):
+        __tablename__ = 'address'
+
+        id = Column(Integer, primary_key=True)
+        email_address = Column(String(50), nullable=False)
+        user_id = Column(Integer, ForeignKey('user.id'), nullable=False)
+
+Assume a ``User`` object with one ``Address``, already persistent::
+
+    >>> u1 = User(name='ed', addresses=[Address(email_address='ed@ed.com')])
+    >>> session.add(u1)
+    >>> session.commit()
+
+We now create ``a1``, an object outside the session, which we'd like
+to merge on top of the existing ``Address``::
+
+    >>> existing_a1 = u1.addresses[0]
+    >>> a1 = Address(id=existing_a1.id)
+
+A surprise would occur if we said this::
+
+    >>> a1.user = u1
+    >>> a1 = session.merge(a1)
+    >>> session.commit()
+    sqlalchemy.orm.exc.FlushError: New instance <Address at 0x1298f50>
+    with identity key (<class '__main__.Address'>, (1,)) conflicts with
+    persistent instance <Address at 0x12a25d0>
+
+Why is that? We weren't careful with our cascades. The assignment
+of ``a1.user`` to a persistent object cascaded to the backref of ``User.addresses``
+and made our ``a1`` object pending, as though we had added it. Now we have
+*two* ``Address`` objects in the session::
+
+    >>> a1 = Address()
+    >>> a1.user = u1
+    >>> a1 in session
+    True
+    >>> existing_a1 in session
+    True
+    >>> a1 is existing_a1
+    False
+
+Above, our ``a1`` is already pending in the session. The
+subsequent :meth:`~.Session.merge` operation essentially
+does nothing. Cascade can be configured via the :paramref:`~.relationship.cascade`
+option on :func:`.relationship`, although in this case it
+would mean removing the ``save-update`` cascade from the
+``User.addresses`` relationship - and usually, that behavior
+is extremely convenient.
+The solution here would usually be to not assign
+``a1.user`` to an object already persistent in the target
+session.
+
+The ``cascade_backrefs=False`` option of :func:`.relationship`
+will also prevent the ``Address`` from
+being added to the session via the ``a1.user = u1`` assignment.
+
+Further detail on cascade operation is at :ref:`unitofwork_cascades`.
+
+Another example of unexpected state::
+
+    >>> a1 = Address(id=existing_a1.id, user_id=u1.id)
+    >>> a1.user is None
+    True
+    >>> a1 = session.merge(a1)
+    >>> session.commit()
+    sqlalchemy.exc.IntegrityError: (IntegrityError) address.user_id
+    may not be NULL
+
+Here, we accessed ``a1.user``, which returned its default value
+of ``None``, which as a result of this access, has been placed in the ``__dict__`` of
+our object ``a1``. Normally, this operation creates no change event,
+so the ``user_id`` attribute takes precedence during a
+flush. But when we merge the ``Address`` object into the session, the operation
+is equivalent to::
+
+    >>> existing_a1.id = existing_a1.id
+    >>> existing_a1.user_id = u1.id
+    >>> existing_a1.user = None
+
+Where above, both ``user_id`` and ``user`` are assigned to, and change events
+are emitted for both. The ``user`` association
+takes precedence, and ``None`` is applied to ``user_id``, causing a failure.
+
+Most :meth:`~.Session.merge` issues can be examined by first checking -
+is the object prematurely in the session?
+
+.. sourcecode:: python+sql
+
+    >>> a1 = Address(id=existing_a1.id, user_id=u1.id)
+    >>> assert a1 not in session
+    >>> a1 = session.merge(a1)
+
+Or is there state on the object that we don't want?
+Examining ``__dict__``
+is a quick way to check::
+
+    >>> a1 = Address(id=existing_a1.id, user_id=u1.id)
+    >>> a1.user
+    >>> a1.__dict__
+    {'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x1298d10>,
+        'user_id': 1,
+        'id': 1,
+        'user': None}
+    >>> # we don't want user=None merged, remove it
+    >>> del a1.user
+    >>> a1 = session.merge(a1)
+    >>> # success
+    >>> session.commit()
+
+Expunging
+---------
+
+Expunge removes an object from the Session, sending persistent instances to
+the detached state, and pending instances to the transient state:
+
+.. sourcecode:: python+sql
+
+    session.expunge(obj1)
+
+To remove all items, call :meth:`~.Session.expunge_all`
+(this method was formerly known as ``clear()``).
+
+.. _session_expire:
+
+Refreshing / Expiring
+---------------------
+
+:term:`Expiring` means that the database-persisted data held inside a series
+of object attributes is erased, in such a way that when those attributes
+are next accessed, a SQL query is emitted which will refresh that data from
+the database.
+
+When we talk about expiration of data we are usually talking about an object
+that is in the :term:`persistent` state. For example, if we load an object
+as follows::
+
+    user = session.query(User).filter_by(name='user1').first()
+
+The above ``User`` object is persistent, and has a series of attributes
+present; if we were to look inside its ``__dict__``, we'd see that state
+loaded::
+
+    >>> user.__dict__
+    {
+      'id': 1, 'name': u'user1',
+      '_sa_instance_state': <...>,
+    }
+
+where ``id`` and ``name`` refer to those columns in the database.
+``_sa_instance_state`` is a non-database-persisted value used by SQLAlchemy
+internally (it refers to the :class:`.InstanceState` for the instance.
+While not directly relevant to this section, if we want to get at it,
+we should use the :func:`.inspect` function to access it).
+
+At this point, the state in our ``User`` object matches that of the loaded
+database row.
+But upon expiring the object using a method such as
+:meth:`.Session.expire`, we see that the state is removed::
+
+    >>> session.expire(user)
+    >>> user.__dict__
+    {'_sa_instance_state': <...>}
+
+We see that while the internal "state" still hangs around, the values which
+correspond to the ``id`` and ``name`` columns are gone. If we were to access
+one of these columns and are watching SQL, we'd see this:
+
+.. sourcecode:: python+sql
+
+    >>> print(user.name)
+    {opensql}SELECT user.id AS user_id, user.name AS user_name
+    FROM user
+    WHERE user.id = ?
+    (1,)
+    {stop}user1
+
+Above, upon accessing the expired attribute ``user.name``, the ORM initiated
+a :term:`lazy load` to retrieve the most recent state from the database,
+by emitting a SELECT for the user row to which this user refers. Afterwards,
+the ``__dict__`` is again populated::
+
+    >>> user.__dict__
+    {
+      'id': 1, 'name': u'user1',
+      '_sa_instance_state': <...>,
+    }
+
+.. note:: While we are peeking inside of ``__dict__`` in order to see a bit
+   of what SQLAlchemy does with object attributes, we **should not modify**
+   the contents of ``__dict__`` directly, at least as far as those attributes
+   which the SQLAlchemy ORM is maintaining (other attributes outside of SQLA's
+   realm are fine). This is because SQLAlchemy uses :term:`descriptors` in
+   order to track the changes we make to an object, and when we modify ``__dict__``
+   directly, the ORM won't be able to track that we changed something.
+
+Another key behavior of both :meth:`~.Session.expire` and :meth:`~.Session.refresh`
+is that all un-flushed changes on an object are discarded.
+That is,
+if we were to modify an attribute on our ``User``::
+
+    >>> user.name = 'user2'
+
+but then we call :meth:`~.Session.expire` without first calling :meth:`~.Session.flush`,
+our pending value of ``'user2'`` is discarded::
+
+    >>> session.expire(user)
+    >>> user.name
+    'user1'
+
+The :meth:`~.Session.expire` method can be used to mark as "expired" all ORM-mapped
+attributes for an instance::
+
+    # expire all ORM-mapped attributes on obj1
+    session.expire(obj1)
+
+It can also be passed a list of string attribute names, referring to specific
+attributes to be marked as expired::
+
+    # expire only attributes obj1.attr1, obj1.attr2
+    session.expire(obj1, ['attr1', 'attr2'])
+
+The :meth:`~.Session.refresh` method has a similar interface, but instead
+of expiring, it emits a SELECT for the object's row immediately::
+
+    # reload all attributes on obj1
+    session.refresh(obj1)
+
+:meth:`~.Session.refresh` also accepts a list of string attribute names,
+but unlike :meth:`~.Session.expire`, expects at least one name to
+be that of a column-mapped attribute::
+
+    # reload obj1.attr1, obj1.attr2
+    session.refresh(obj1, ['attr1', 'attr2'])
+
+The :meth:`.Session.expire_all` method allows us to essentially call
+:meth:`.Session.expire` on all objects contained within the :class:`.Session`
+at once::
+
+    session.expire_all()
+
+What Actually Loads
+~~~~~~~~~~~~~~~~~~~
+
+The SELECT statement that's emitted for an object marked with :meth:`~.Session.expire`
+or loaded with :meth:`~.Session.refresh` varies based on several factors, including:
+
+* The load of expired attributes is triggered from **column-mapped attributes only**.
+  While any kind of attribute can be marked as expired, including a
+  :func:`.relationship` - mapped attribute, accessing an expired :func:`.relationship`
+  attribute will emit a load only for that attribute, using standard
+  relationship-oriented lazy loading.
+  Column-oriented attributes, even if
+  expired, will not load as part of this operation, and instead will load when
+  any column-oriented attribute is accessed.
+
+* :func:`.relationship`- mapped attributes will not load in response to
+  expired column-based attributes being accessed.
+
+* Regarding relationships, :meth:`~.Session.refresh` is more restrictive than
+  :meth:`~.Session.expire` with regard to attributes that aren't column-mapped.
+  Calling :meth:`~.Session.refresh` and passing a list of names that only includes
+  relationship-mapped attributes will actually raise an error.
+  In any case, non-eager-loading :func:`.relationship` attributes will not be
+  included in any refresh operation.
+
+* :func:`.relationship` attributes configured as "eager loading" via the
+  :paramref:`~.relationship.lazy` parameter will load in the case of
+  :meth:`~.Session.refresh`, if either no attribute names are specified, or
+  if their names are included in the list of attributes to be
+  refreshed.
+
+* Attributes that are configured as :func:`.deferred` will not normally load,
+  during either the expired-attribute load or during a refresh.
+  An unloaded attribute that's :func:`.deferred` instead loads on its own when directly
+  accessed, or if part of a "group" of deferred attributes where an unloaded
+  attribute in that group is accessed.
+
+* For expired attributes that are loaded on access, a joined-inheritance table
+  mapping will emit a SELECT that typically only includes those tables for which
+  unloaded attributes are present. The action here is sophisticated enough
+  to load only the parent or child table, for example, if the subset of columns
+  that were originally expired encompasses only one or the other of those tables.
+
+* When :meth:`~.Session.refresh` is used on a joined-inheritance table mapping,
+  the SELECT emitted will resemble that of when :meth:`.Session.query` is
+  used on the target object's class.
+  This is typically all those tables that
+  are set up as part of the mapping.
+
+
+When to Expire or Refresh
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :class:`.Session` uses the expiration feature automatically whenever
+the transaction referred to by the session ends. That is, whenever :meth:`.Session.commit`
+or :meth:`.Session.rollback` is called, all objects within the :class:`.Session`
+are expired, using a feature equivalent to that of the :meth:`.Session.expire_all`
+method. The rationale is that the end of a transaction is a
+demarcating point at which there is no more context available in order to know
+what the current state of the database is, as any number of other transactions
+may be affecting it. Only when a new transaction starts can we again have access
+to the current state of the database, at which point any number of changes
+may have occurred.
+
+.. sidebar:: Transaction Isolation
+
+   Of course, most databases are capable of handling
+   multiple transactions at once, even involving the same rows of data. When
+   a relational database handles multiple transactions involving the same
+   tables or rows, this is when the :term:`isolation` aspect of the database comes
+   into play. The isolation behavior of different databases varies considerably
+   and even on a single database can be configured to behave in different ways
+   (via the so-called :term:`isolation level` setting). In that sense, the :class:`.Session`
+   can't fully predict when the same SELECT statement, emitted a second time,
+   will definitely return the data we already have, or will return new data.
+   So as a best guess, it assumes that within the scope of a transaction, unless
+   it is known that a SQL expression has been emitted to modify a particular row,
+   there's no need to refresh a row unless explicitly told to do so.
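The automatic expiration at transaction boundaries can be observed by peeking at ``__dict__``, in the same way as earlier in this section. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database and the default ``expire_on_commit=True`` setting; the ``User`` class and the ``user_account`` table name are hypothetical and used only for illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    # hypothetical mapped class, used only for this illustration
    __tablename__ = "user_account"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))


engine = create_engine("sqlite://")  # in-memory database (assumption)
Base.metadata.create_all(engine)
session = Session(engine)  # expire_on_commit is left at its default

user = User(name="user1")
session.add(user)
session.flush()
loaded_after_flush = "name" in user.__dict__

# ending the transaction expires every object in the session
session.commit()
loaded_after_commit = "name" in user.__dict__

# attribute access begins a new transaction and reloads the row
reloaded_name = user.name
loaded_after_access = "name" in user.__dict__

# a rollback ends the transaction as well, so the state is expired again
session.rollback()
loaded_after_rollback = "name" in user.__dict__

print(
    loaded_after_flush,
    loaded_after_commit,
    loaded_after_access,
    loaded_after_rollback,
)
session.close()
```

As noted earlier, ``__dict__`` is only read here for inspection and is never modified directly.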
+
+The :meth:`.Session.expire` and :meth:`.Session.refresh` methods are used
+when one wants to force an object to reload its data from the
+database, in cases where the current state of data is known to be
+possibly stale. Reasons for this might include:
+
+* some SQL has been emitted within the transaction outside of the
+  scope of the ORM's object handling, such as if a :meth:`.Table.update` construct
+  were emitted using the :meth:`.Session.execute` method;
+
+* if the application
+  is attempting to acquire data that is known to have been modified in a
+  concurrent transaction, and it is also known that the isolation rules in effect
+  allow this data to be visible.
+
+The second bullet has the important caveat that "it is also known that the isolation rules in effect
+allow this data to be visible." This means that it cannot be assumed that an
+UPDATE that happened on another database connection will yet be visible here
+locally; in many cases, it will not. This is why if one wishes to use
+:meth:`~.Session.expire` or :meth:`~.Session.refresh` in order to view data between ongoing
+transactions, an understanding of the isolation behavior in effect is essential.
+
+.. seealso::
+
+    :meth:`.Session.expire`
+
+    :meth:`.Session.expire_all`
+
+    :meth:`.Session.refresh`
+
+    :term:`isolation` - glossary explanation of isolation which includes links
+    to Wikipedia.
+
+    `The SQLAlchemy Session In-Depth <http://techspot.zzzeek.org/2012/11/14/pycon-canada-the-sqlalchemy-session-in-depth/>`_ - a video
+    and slides with an in-depth discussion of the object
+    lifecycle including the role of data expiration.
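The first reason listed above, SQL emitted outside of the ORM's object handling, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive recipe: it assumes SQLAlchemy 1.4 or later and an in-memory SQLite database, and the ``User`` class and the ``user_account`` table name are hypothetical.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    # hypothetical mapped class, used only for this illustration
    __tablename__ = "user_account"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))


engine = create_engine("sqlite://")  # in-memory database (assumption)
Base.metadata.create_all(engine)
session = Session(engine)

user = User(name="user1")
session.add(user)
session.flush()

# an UPDATE against the Table itself, sent through Session.execute();
# the ORM does not track this change on the loaded object
user_table = User.__table__
session.execute(
    user_table.update().where(user_table.c.id == user.id).values(name="user2")
)
stale_name = user.name  # still the value held in memory

# refresh() emits a SELECT right away and overwrites the in-memory state
session.refresh(user)
fresh_name = user.name

print(stale_name, fresh_name)
session.close()
```

Calling ``session.expire(user)`` in place of ``session.refresh(user)`` would defer the SELECT until ``user.name`` is next accessed.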