.. _intro_to_parsing:

======================
Loading and saving RDF
======================

Reading RDF files
-----------------

RDF data can be represented in various syntaxes (Turtle, RDF/XML, N3, N-Triples, TriX, JSON-LD, etc.).
The simplest is N-Triples (``ntriples``), a format with exactly one triple per line.

Create the file :file:`demo.nt` in the current directory with these two lines in it:

.. code-block:: Turtle

    <http://example.com/drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.com/drewp> <http://example.com/says> "Hello World" .

On line 1 this file says "drewp is a FOAF Person". On line 2 it says "drewp says 'Hello World'".

RDFLib can guess the format from the file ending (".nt" is commonly used for N-Triples), so you can just call
:meth:`~rdflib.graph.Graph.parse` to read in the file. If the file had a non-standard RDF file ending, you could set the
keyword parameter ``format`` to either an Internet Media Type or a format name (see the :doc:`list of available
parsers <plugin_parsers>`).

In an interactive Python interpreter, try this:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("demo.nt")

    print(len(g))
    # prints: 2

    import pprint
    for stmt in g:
        pprint.pprint(stmt)
    # prints:
    # (rdflib.term.URIRef('http://example.com/drewp'),
    #  rdflib.term.URIRef('http://example.com/says'),
    #  rdflib.term.Literal('Hello World'))
    # (rdflib.term.URIRef('http://example.com/drewp'),
    #  rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
    #  rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))

The final lines show how RDFLib represents the two statements in the
file: the statements themselves are just length-3 tuples ("triples") and the
subjects, predicates, and objects of the triples are all rdflib types.
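
To make this concrete, the following minimal sketch loads the same two statements from a string rather than from
:file:`demo.nt`, so it runs without any file on disk, and then inspects the triples. The variable names ``nt_data`` and
``hello`` are made up for this illustration.

.. code-block:: python

    from rdflib import Graph, Literal, URIRef

    # The two statements from demo.nt, inlined so the sketch is self-contained
    nt_data = (
        "<http://example.com/drewp> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://xmlns.com/foaf/0.1/Person> .\n"
        '<http://example.com/drewp> <http://example.com/says> "Hello World" .\n'
    )

    g = Graph()
    g.parse(data=nt_data, format="nt")

    # Each statement unpacks like any other 3-tuple
    for s, p, o in g:
        print(type(s).__name__, type(p).__name__, type(o).__name__)

    # Membership tests take a triple of rdflib terms
    hello = (
        URIRef("http://example.com/drewp"),
        URIRef("http://example.com/says"),
        Literal("Hello World"),
    )
    print(hello in g)
    # prints: True

Because the terms are ordinary Python objects, they can be compared, hashed, and stored in sets or dictionaries.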

Reading remote RDF
------------------

Reading graphs from the Internet is easy:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("http://www.w3.org/People/Berners-Lee/card")
    print(len(g))
    # prints the number of triples, e.g. 86 (varies as the remote document changes)

:func:`rdflib.Graph.parse` can process local files, remote data via a URL, as in this example, or RDF data in a string
(using the ``data`` parameter).
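
As a small sketch of the string case: when parsing from ``data`` there is no file ending to guess from, so the example
below passes ``format`` explicitly, once as a format name and once as an Internet Media Type. The Turtle snippet and the
variable names are invented for this illustration.

.. code-block:: python

    from rdflib import Graph

    # Hypothetical Turtle document held in a string
    turtle_data = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    <http://example.com/drewp> a foaf:Person ;
        <http://example.com/says> "Hello World" .
    """

    # Select the parser by format name...
    by_name = Graph()
    by_name.parse(data=turtle_data, format="turtle")

    # ...or by Internet Media Type
    by_media_type = Graph()
    by_media_type.parse(data=turtle_data, format="text/turtle")

    print(len(by_name), len(by_media_type))
    # prints: 2 2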


Saving RDF
----------

To store a graph in a file, use the :func:`rdflib.Graph.serialize` function:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("http://www.w3.org/People/Berners-Lee/card")
    g.serialize(destination="tbl.ttl")

This parses data from http://www.w3.org/People/Berners-Lee/card and stores it in a file ``tbl.ttl`` in this directory
using the turtle format, which is the default RDF serialization (as of rdflib 6.0.0).

To read the same data and save it as an RDF/XML string in the variable ``v``, do this:

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse("http://www.w3.org/People/Berners-Lee/card")
    v = g.serialize(format="xml")
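
The sketch below exercises ``serialize()`` without network access: it builds a one-triple graph from a string, serializes
it with several ``format`` keywords, and parses each result back to check that the triples survive the round trip. It
assumes rdflib 6.0.0 or later, where ``serialize()`` returns a ``str`` when no ``destination`` is given and JSON-LD support
is built in. The ``results`` dictionary is only a helper for this illustration.

.. code-block:: python

    from rdflib import Graph

    g = Graph()
    g.parse(
        data='<http://example.com/drewp> <http://example.com/says> "Hello World" .\n',
        format="nt",
    )

    results = {}
    for keyword in ("turtle", "xml", "json-ld", "nt", "n3"):
        # With no destination, serialize() returns the document as a str
        text = g.serialize(format=keyword)

        # Parse the text back and compare the triples with the original graph
        round_trip = Graph()
        round_trip.parse(data=text, format=keyword)
        results[keyword] = set(round_trip) == set(g)

    for keyword, unchanged in results.items():
        print(keyword, unchanged)

Each line should report ``True``. Comparing plain sets of triples is enough here only because the data contains no blank
nodes.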


The following table lists the RDF formats you can serialize data to with rdflib, out of the box, and the ``format=KEYWORD`` keyword used to reference them within ``serialize()``:

.. csv-table::
   :header: "RDF Format", "Keyword", "Notes"

   "Turtle",    "turtle, ttl or turtle2",     "turtle2 is just turtle with more spacing & linebreaks"
   "RDF/XML",   "xml or pretty-xml",     "Was the default format, rdflib < 6.0.0"
   "JSON-LD",   "json-ld",     "There are further options for compact syntax and other JSON-LD variants"
   "N-Triples", "ntriples, nt or nt11",     "nt11 is exactly like nt, only utf8 encoded"
   "Notation-3","n3",     "N3 is a superset of Turtle that also caters for rules and a few other things"

   "Trig",     "trig",     "Turtle-like format for RDF triples + context (RDF quads) and thus multiple graphs"
   "Trix",     "trix",     "RDF/XML-like format for RDF quads"
   "N-Quads",   "nquads",     "N-Triples-like format for RDF quads"

Working with multi-graphs
-------------------------

To read and query multi-graphs, that is, context-aware RDF data, you need to use rdflib's
:class:`rdflib.ConjunctiveGraph` or :class:`rdflib.Dataset` class. These are extensions to :class:`rdflib.Graph` that
work with quads (triples + graph IDs).

If you had this multi-graph data file, :file:`demo.trig` (in the ``trig`` format, using the new-style ``PREFIX``
statement rather than the older ``@prefix``):

.. code-block:: Turtle

    PREFIX eg: <http://example.com/person/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    eg:graph-1 {
        eg:drewp a foaf:Person .
        eg:drewp eg:says "Hello World" .
    }

    eg:graph-2 {
        eg:nick a foaf:Person .
        eg:nick eg:says "Hi World" .
    }

You could parse the file and query it like this:

.. code-block:: python

    from rdflib import Dataset
    from rdflib.namespace import RDF

    ds = Dataset()
    ds.parse("demo.trig")

    # Use a separate name for the graph ID so the dataset variable is not shadowed
    for s, p, o, graph_id in ds.quads((None, RDF.type, None, None)):
        print(s, graph_id)

This will print out:

.. code-block::

    http://example.com/person/drewp http://example.com/person/graph-1
    http://example.com/person/nick http://example.com/person/graph-2
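
For a version that runs without :file:`demo.trig` on disk, the following sketch parses the same TriG data from a string
and then fetches one named graph by its identifier. The variable names are made up for this illustration, and the results
are sorted only to make the printed order predictable.

.. code-block:: python

    from rdflib import Dataset, URIRef
    from rdflib.namespace import RDF

    trig_data = """
    PREFIX eg: <http://example.com/person/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    eg:graph-1 {
        eg:drewp a foaf:Person .
        eg:drewp eg:says "Hello World" .
    }

    eg:graph-2 {
        eg:nick a foaf:Person .
        eg:nick eg:says "Hi World" .
    }
    """

    ds = Dataset()
    ds.parse(data=trig_data, format="trig")

    # Collect (person, graph ID) pairs as plain strings and sort them
    people = sorted(
        (str(s), str(graph_id))
        for s, p, o, graph_id in ds.quads((None, RDF.type, None, None))
    )
    for person, graph_id in people:
        print(person, graph_id)
    # prints:
    # http://example.com/person/drewp http://example.com/person/graph-1
    # http://example.com/person/nick http://example.com/person/graph-2

    # A single named graph can be fetched by its identifier
    graph_1 = ds.graph(URIRef("http://example.com/person/graph-1"))
    print(len(graph_1))
    # prints: 2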