User Guide
==========
This section covers the main features of requests-cache.

.. contents::
    :local:
    :depth: 2

Installation
------------
Install with pip:

    $ pip install requests-cache

Requirements
~~~~~~~~~~~~
* Requires Python 3.6+.
* You may need additional dependencies depending on which backend you want to use. To install with
  extra dependencies for all supported :ref:`user_guide:cache backends`:

    $ pip install requests-cache[backends]

Optional Setup Steps
~~~~~~~~~~~~~~~~~~~~
* See :ref:`security` for recommended setup steps for more secure cache serialization.
* See :ref:`Contributing Guide <contributing:dev installation>` for setup steps for local development.

General Usage
-------------
There are two main ways of using requests-cache:

* **Sessions:** (recommended) Use :py:class:`.CachedSession` to send your requests
* **Patching:** Globally patch ``requests`` using :py:func:`.install_cache()`

Sessions
~~~~~~~~
:py:class:`.CachedSession` can be used as a drop-in replacement for :py:class:`requests.Session`.
Basic usage looks like this:

    >>> from requests_cache import CachedSession
    >>>
    >>> session = CachedSession()
    >>> session.get('http://httpbin.org/get')

Any :py:class:`requests.Session` method can be used (but see the :ref:`user_guide:http methods`
section below for details on which methods are cached):

    >>> session.request('GET', 'http://httpbin.org/get')
    >>> session.head('http://httpbin.org/get')

Caching can be temporarily disabled with :py:meth:`.CachedSession.cache_disabled`:

    >>> with session.cache_disabled():
    ...     session.get('http://httpbin.org/get')

The best way to clean up your cache is through :ref:`user_guide:cache expiration`, but you can also
clear out everything at once with :py:meth:`.BaseCache.clear`:

    >>> session.cache.clear()

Patching
~~~~~~~~
In some situations, it may not be possible or convenient to manage your own session object. In those
cases, you can use :py:func:`.install_cache` to add caching to all ``requests`` functions:

    >>> import requests
    >>> import requests_cache
    >>>
    >>> requests_cache.install_cache()
    >>> requests.get('http://httpbin.org/get')

Session methods are patched as well:

    >>> session = requests.Session()
    >>> session.get('http://httpbin.org/get')

:py:func:`.install_cache` accepts all the same parameters as :py:class:`.CachedSession`:

    >>> requests_cache.install_cache(expire_after=360, allowable_methods=('GET', 'POST'))

Caching can be temporarily enabled with :py:func:`.enabled`:

    >>> with requests_cache.enabled():
    ...     requests.get('http://httpbin.org/get')  # Will be cached

Or temporarily disabled with :py:func:`.disabled`:

    >>> requests_cache.install_cache()
    >>> with requests_cache.disabled():
    ...     requests.get('http://httpbin.org/get')  # Will not be cached

Or completely removed with :py:func:`.uninstall_cache`:

    >>> requests_cache.uninstall_cache()
    >>> requests.get('http://httpbin.org/get')

You can also clear out all responses in the cache with :py:func:`.clear`, and check if
requests-cache is currently installed with :py:func:`.is_installed`.

Limitations
^^^^^^^^^^^
Like any other utility that uses global patching, there are some scenarios where you won't want to
use :py:func:`.install_cache`:

* In a multi-threaded or multiprocess application
* In an application that uses other packages that extend or modify :py:class:`requests.Session`
* In a package that will be used by other packages or applications
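
These limitations follow from how global patching works in general: a shared attribute is swapped
out, so every caller in the process sees the replacement. The sketch below illustrates the general
technique only and is not how requests-cache is implemented. ``fake_requests``, ``CachingSession``
and ``patched`` are hypothetical names, and a plain namespace stands in for a real module:

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for a third-party module that exposes a Session class (hypothetical)
fake_requests = SimpleNamespace(Session=dict)


class CachingSession(dict):
    """Hypothetical replacement class; a real one would add caching behavior."""


@contextmanager
def patched(module, name, replacement):
    """Temporarily replace an attribute on a shared object, then restore it."""
    original = getattr(module, name)
    setattr(module, name, replacement)
    try:
        yield
    finally:
        # Always restore the original, even if the body raised an exception
        setattr(module, name, original)


with patched(fake_requests, 'Session', CachingSession):
    inside = fake_requests.Session  # every caller sees the replacement here
outside = fake_requests.Session  # the original is back after the block

print(inside.__name__, outside.__name__)
```

Because the swapped attribute is global state, another thread or another package that touches the
same attribute during the ``with`` block would see (or overwrite) the replacement.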

Cache Backends
--------------
Several cache backends are included, which can be selected with
the ``backend`` parameter for either :py:class:`.CachedSession` or :py:func:`.install_cache`:

* ``'sqlite'``: `SQLite <https://www.sqlite.org>`_ database (**default**)
* ``'redis'``: `Redis <https://redis.io>`_ cache (requires ``redis``)
* ``'mongodb'``: `MongoDB <https://www.mongodb.com>`_ database (requires ``pymongo``)
* ``'gridfs'``: `GridFS <https://docs.mongodb.com/manual/core/gridfs/>`_ collections on a MongoDB database (requires ``pymongo``)
* ``'dynamodb'``: `Amazon DynamoDB <https://aws.amazon.com/dynamodb>`_ database (requires ``boto3``)
* ``'filesystem'``: Stores responses as files on the local filesystem
* ``'memory'``: A non-persistent cache that just stores responses in memory

A backend can be specified either by name, class or instance:

    >>> from requests_cache.backends import RedisCache
    >>> from requests_cache import CachedSession
    >>>
    >>> # Backend name
    >>> session = CachedSession(backend='redis', namespace='my-cache')

    >>> # Backend class
    >>> session = CachedSession(backend=RedisCache, namespace='my-cache')

    >>> # Backend instance
    >>> session = CachedSession(backend=RedisCache(namespace='my-cache'))

See :py:mod:`requests_cache.backends` for more backend-specific usage details, and see
:ref:`advanced_usage:custom backends` for details on creating your own implementation.

Cache Name
~~~~~~~~~~
The ``cache_name`` parameter will be used as follows depending on the backend:

* ``sqlite``: Database path, e.g., ``~/.cache/my_cache.sqlite``
* ``dynamodb``: Table name
* ``mongodb`` and ``gridfs``: Database name
* ``redis``: Namespace, meaning all keys will be prefixed with ``'<cache_name>:'``
* ``filesystem``: Cache directory

Cache Options
-------------
A number of options are available to modify which responses are cached and how they are cached.

HTTP Methods
~~~~~~~~~~~~
By default, only GET and HEAD requests are cached. To cache additional HTTP methods, specify them
with ``allowable_methods``. For example, caching POST requests can be used to ensure you don't send
the same data multiple times:

    >>> session = CachedSession(allowable_methods=('GET', 'POST'))
    >>> session.post('http://httpbin.org/post', json={'param': 'value'})

Status Codes
~~~~~~~~~~~~
By default, only responses with a 200 status code are cached. To cache additional status codes,
specify them with ``allowable_codes``:

    >>> session = CachedSession(allowable_codes=(200, 418))
    >>> session.get('http://httpbin.org/teapot')
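
Taken together, ``allowable_methods`` and ``allowable_codes`` act as two filters that a response
must pass before it is stored. The following is a minimal sketch of that decision, assuming the
defaults described above. ``is_cacheable`` is a hypothetical helper, not part of the requests-cache
API:

```python
from typing import Iterable


def is_cacheable(
    method: str,
    status_code: int,
    allowable_methods: Iterable[str] = ('GET', 'HEAD'),
    allowable_codes: Iterable[int] = (200,),
) -> bool:
    """Return True only if both the HTTP method and the status code are allowed."""
    return method.upper() in set(allowable_methods) and status_code in set(allowable_codes)


print(is_cacheable('GET', 200))
print(is_cacheable('POST', 200))
print(is_cacheable('POST', 200, allowable_methods=('GET', 'POST')))
print(is_cacheable('GET', 418, allowable_codes=(200, 418)))
```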

Request Parameters
~~~~~~~~~~~~~~~~~~
By default, all request parameters are taken into account when caching responses. In some cases,
there may be request parameters that don't affect the response data, for example authentication tokens
or credentials. If you want to ignore specific parameters, specify them with ``ignored_parameters``:

    >>> session = CachedSession(ignored_parameters=['auth-token'])
    >>> # Only the first request will be sent
    >>> session.get('http://httpbin.org/get', params={'auth-token': '2F63E5DF4F44'})
    >>> session.get('http://httpbin.org/get', params={'auth-token': 'D9FAEB3449D3'})
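
To see why only the first request is sent, it helps to think of each cached response as stored
under a key derived from the request. The sketch below shows one way such a key could be built so
that ignored parameters do not affect it. It uses only the standard library, ``make_cache_key`` is a
hypothetical function, and the actual key format used by requests-cache may differ:

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def make_cache_key(method: str, url: str, ignored_parameters=()) -> str:
    """Hash the method and URL after dropping ignored query parameters."""
    parts = urlsplit(url)
    ignored = set(ignored_parameters)
    # Keep only parameters that should distinguish requests, sorted so that
    # parameter order does not change the key
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    )
    normalized = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ''))
    return hashlib.sha256(f'{method.upper()} {normalized}'.encode()).hexdigest()


url_1 = 'http://httpbin.org/get?auth-token=2F63E5DF4F44'
url_2 = 'http://httpbin.org/get?auth-token=D9FAEB3449D3'

# With the token ignored, both requests map to the same key
print(make_cache_key('GET', url_1, ['auth-token']) == make_cache_key('GET', url_2, ['auth-token']))
# Without ignoring it, the keys differ
print(make_cache_key('GET', url_1) == make_cache_key('GET', url_2))
```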

Request Headers
~~~~~~~~~~~~~~~
By default, request headers are not taken into account when caching responses. In some cases,
different headers may result in different response data, so you may want to cache them separately.
To enable this, use ``include_get_headers``:

    >>> session = CachedSession(include_get_headers=True)
    >>> # Both of these requests will be sent and cached separately
    >>> session.get('http://httpbin.org/headers', headers={'Accept': 'text/plain'})
    >>> session.get('http://httpbin.org/headers', headers={'Accept': 'application/json'})
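
Conceptually, this option controls whether headers take part in identifying a request. A rough
sketch of the idea, assuming a hash-based request key, is shown below. ``key_with_headers`` is a
hypothetical function used for illustration and does not reflect the library's internal key format:

```python
import hashlib
import json


def key_with_headers(method, url, headers=None, include_headers=False):
    """Build a request key that optionally takes request headers into account."""
    payload = {'method': method.upper(), 'url': url}
    if include_headers:
        # Header names are case-insensitive, so normalize them before hashing
        payload['headers'] = sorted((name.lower(), value) for name, value in (headers or {}).items())
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


url = 'http://httpbin.org/headers'
plain = {'Accept': 'text/plain'}
as_json = {'Accept': 'application/json'}

# Headers ignored: both requests share one key, so they share one cached response
print(key_with_headers('GET', url, plain) == key_with_headers('GET', url, as_json))
# Headers included: the keys differ, so the responses are cached separately
print(key_with_headers('GET', url, plain, True) == key_with_headers('GET', url, as_json, True))
```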

Cache Expiration
----------------
By default, cached responses will be stored indefinitely. You can initialize the cache with an
``expire_after`` value to specify how long responses will be cached.

Expiration Types
~~~~~~~~~~~~~~~~
``expire_after`` can be any of the following:

* ``-1`` (to never expire)
* A positive number (in seconds)
* A :py:class:`~datetime.timedelta`
* A :py:class:`~datetime.datetime`

Examples:

    >>> # Set expiration for the session using a value in seconds
    >>> session = CachedSession(expire_after=360)

    >>> # To specify a different unit of time, use a timedelta
    >>> from datetime import timedelta
    >>> session = CachedSession(expire_after=timedelta(days=30))

    >>> # Update an existing session to disable expiration (i.e., store indefinitely)
    >>> session.expire_after = -1
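
All four of these types describe the same thing: a point in time after which a response is
considered stale. The sketch below shows one way they could be normalized to an absolute
``datetime``. ``get_expiration_datetime`` and its ``now`` parameter are hypothetical and exist only
for this illustration:

```python
from datetime import datetime, timedelta
from typing import Optional, Union

ExpirationTime = Union[int, float, timedelta, datetime]


def get_expiration_datetime(
    expire_after: ExpirationTime, now: Optional[datetime] = None
) -> Optional[datetime]:
    """Convert any supported expire_after value to an absolute datetime.

    Returns None when the value means "never expire".
    """
    now = now or datetime.now()
    if expire_after == -1:
        return None
    if isinstance(expire_after, datetime):
        return expire_after  # already an absolute time
    if not isinstance(expire_after, timedelta):
        expire_after = timedelta(seconds=expire_after)  # a number of seconds
    return now + expire_after


start = datetime(2021, 1, 1)
print(get_expiration_datetime(360, now=start))
print(get_expiration_datetime(timedelta(days=30), now=start))
print(get_expiration_datetime(-1, now=start))
```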

Expiration Scopes
~~~~~~~~~~~~~~~~~
Passing ``expire_after`` to :py:class:`.CachedSession` will set the expiration for the duration of that session.
Expiration can also be set on a per-URL or per-request basis. The following order of precedence
is used:

1. Per-request expiration (``expire_after`` argument for :py:meth:`.CachedSession.request`)
2. Per-URL expiration (``urls_expire_after`` argument for :py:class:`.CachedSession`)
3. Per-session expiration (``expire_after`` argument for :py:class:`.CachedSession`)

To set expiration for a single request:

    >>> session.get('http://httpbin.org/get', expire_after=360)
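
The order of precedence above amounts to "use the most specific value that was provided". A minimal
sketch follows, assuming ``None`` means that no value was set at a given scope.
``resolve_expiration`` is a hypothetical helper, not a library function:

```python
def resolve_expiration(per_request=None, per_url=None, per_session=-1):
    """Return the first expiration value that was set, from most to least specific."""
    for value in (per_request, per_url, per_session):
        if value is not None:
            return value
    return -1  # nothing set at any scope: never expire


print(resolve_expiration(per_request=360, per_url=60, per_session=30))
print(resolve_expiration(per_url=60, per_session=30))
print(resolve_expiration(per_session=30))
```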

URL Patterns
~~~~~~~~~~~~
You can use ``urls_expire_after`` to set different expiration values for different requests, based on
URL glob patterns. This allows you to customize caching based on what you know about the resources
you're requesting. For example, you might request one resource that gets updated frequently, another
that changes infrequently, and another that never changes. Example:

    >>> urls_expire_after = {
    ...     '*.site_1.com': 30,
    ...     'site_2.com/resource_1': 60 * 2,
    ...     'site_2.com/resource_2': 60 * 60 * 24,
    ...     'site_2.com/static': -1,
    ... }
    >>> session = CachedSession(urls_expire_after=urls_expire_after)

**Notes:**

* ``urls_expire_after`` should be a dict in the format ``{'pattern': expire_after}``
* ``expire_after`` accepts the same types as ``CachedSession.expire_after``
* Patterns will match request **base URLs**, so the pattern ``site.com/resource/`` is equivalent to
  ``http*://site.com/resource/**``
* If there is more than one match, the first match will be used in the order they are defined
* If no patterns match a request, ``CachedSession.expire_after`` will be used as a default.
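
The matching rules in these notes can be approximated with the standard library's ``fnmatch``
module. This is only a sketch of the behavior described above, not the library's matching code.
``match_url_expiration`` is a hypothetical function, and it assumes patterns are written without a
URL scheme:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def match_url_expiration(url, urls_expire_after, default=-1):
    """Return the expiration for the first pattern matching the request's base URL."""
    parts = urlsplit(url)
    base_url = parts.netloc + parts.path  # drop the scheme and query string
    for pattern, expire_after in urls_expire_after.items():
        # Treat each pattern as a prefix, so 'site.com/resource/' also matches
        # everything underneath it
        if fnmatchcase(base_url, pattern.rstrip('*') + '*'):
            return expire_after
    return default  # no pattern matched: fall back to the session-level value


urls_expire_after = {
    '*.site_1.com': 30,
    'site_2.com/resource_1': 60 * 2,
    'site_2.com/resource_2': 60 * 60 * 24,
    'site_2.com/static': -1,
}

print(match_url_expiration('https://api.site_1.com/data', urls_expire_after))
print(match_url_expiration('http://site_2.com/resource_1/items?page=2', urls_expire_after))
print(match_url_expiration('http://example.com/', urls_expire_after, default=3600))
```

Dicts preserve insertion order in Python 3.7+, which is what makes "first match in the order they
are defined" well-defined in this sketch.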

Removing Expired Responses
~~~~~~~~~~~~~~~~~~~~~~~~~~
For better performance, expired responses won't be removed immediately, but will be removed
(or replaced) the next time they are requested. To manually clear all expired responses, use
:py:meth:`.CachedSession.remove_expired_responses`:

    >>> session.remove_expired_responses()

Or, when using patching:

    >>> requests_cache.remove_expired_responses()

You can also apply a different ``expire_after`` to previously cached responses, which will
revalidate the cache with the new expiration time:

    >>> session.remove_expired_responses(expire_after=timedelta(days=30))
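
The difference between lazy removal (on the next request) and manual removal (all at once) can be
illustrated with a small in-memory store. This is a sketch of the general pattern under simplified
assumptions. ``ExpiringStore`` and its methods are hypothetical and are not the requests-cache
backend API:

```python
from datetime import datetime, timedelta


class ExpiringStore:
    """Minimal in-memory store with lazy and manual removal of expired items."""

    def __init__(self):
        self._items = {}  # key -> (value, expiration datetime or None)

    def set(self, key, value, expires_at=None):
        self._items[key] = (value, expires_at)

    def get(self, key, now=None):
        """Return a value, lazily deleting it first if it has expired."""
        now = now or datetime.now()
        if key not in self._items:
            return None
        value, expires_at = self._items[key]
        if expires_at is not None and expires_at <= now:
            del self._items[key]
            return None
        return value

    def remove_expired(self, now=None):
        """Delete every expired item in one pass and return how many were removed."""
        now = now or datetime.now()
        expired = [k for k, (_, exp) in self._items.items() if exp is not None and exp <= now]
        for key in expired:
            del self._items[key]
        return len(expired)

    def __len__(self):
        return len(self._items)


now = datetime(2021, 1, 1)
store = ExpiringStore()
store.set('fresh', 'a', expires_at=now + timedelta(hours=1))
store.set('stale_1', 'b', expires_at=now - timedelta(hours=1))
store.set('stale_2', 'c', expires_at=now - timedelta(days=1))
store.set('forever', 'd')  # no expiration

lazy_result = store.get('stale_1', now=now)  # expired, so removed on access
size_after_lazy = len(store)  # 'stale_2' is still stored until it is requested or purged
removed = store.remove_expired(now=now)  # purge whatever is left
print(lazy_result, size_after_lazy, removed, len(store))
```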

Potential Issues
----------------
* Version updates of ``requests``, ``urllib3`` or ``requests-cache`` itself may not be compatible with
  previously cached data (see issues `#56 <https://github.com/reclosedev/requests-cache/issues/56>`_
  and `#102 <https://github.com/reclosedev/requests-cache/issues/102>`_).
  The best way to prevent this is to use a virtualenv and pin your dependency versions.
* See :ref:`security` for notes on serialization security.