.. _import:

Importing an existing site
##########################

Description
===========

``pelican-import`` is a command-line tool for converting articles from other
software to reStructuredText or Markdown. The supported import formats are:

- Blogger XML export
- Dotclear export
- Posterous API
- Tumblr API
- WordPress XML export
- RSS/Atom feed

The conversion from HTML to reStructuredText or Markdown relies on `Pandoc`_.
For Dotclear, if the source posts are written with Markdown syntax, they will
not be converted (as Pelican also supports Markdown).

.. note::

   Unlike Pelican, WordPress supports multiple categories per article. These
   are imported as a comma-separated string. You have to resolve these
   manually, or use a plugin such as `More Categories`_ that enables multiple
   categories per article.
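
If you resolve the categories by hand, a small helper can make the choice
repeatable. The following is a minimal sketch, not part of ``pelican-import``:
the function names ``split_categories`` and ``resolve_category`` are
hypothetical, and it assumes that the imported value is a plain
comma-separated string and that no category name itself contains a comma.

.. code-block:: python

    def split_categories(raw):
        """Split a comma-separated category string into a clean list."""
        return [part.strip() for part in raw.split(",") if part.strip()]


    def resolve_category(raw, preferred=()):
        """Pick one category: the first preferred match, else the first listed."""
        categories = split_categories(raw)
        for name in preferred:
            if name in categories:
                return name
        # Fall back to the first category, or an empty string if none exist.
        return categories[0] if categories else ""

For example, ``resolve_category("News, Python", preferred=["Python"])`` picks
``Python``, while ``resolve_category("News, Python")`` falls back to ``News``.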

Dependencies
============

``pelican-import`` has some dependencies not required by the rest of Pelican:

- *BeautifulSoup4* and *lxml*, for WordPress and Dotclear import. Both can be
  installed like any other Python package (``pip install BeautifulSoup4
  lxml``).
- *Feedparser*, for feed import (``pip install feedparser``).
- *Pandoc*, see the `Pandoc site`_ for installation instructions on your
  operating system.

.. _Pandoc: https://pandoc.org/
.. _Pandoc site: https://pandoc.org/installing.html
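
Before running an import, it can help to confirm that these dependencies are
present. The sketch below is one possible check and is not something
``pelican-import`` provides: the function name ``missing_dependencies`` is
hypothetical. It assumes the usual import names ``bs4``, ``lxml`` and
``feedparser``, and that Pandoc is installed as an executable named ``pandoc``
on the ``PATH``.

.. code-block:: python

    import importlib.util
    import shutil


    def missing_dependencies(modules=("bs4", "lxml", "feedparser"),
                             executables=("pandoc",)):
        """Return the names of modules and executables that cannot be found."""
        # find_spec() returns None when a top-level module is not installed.
        missing = [name for name in modules
                   if importlib.util.find_spec(name) is None]
        # which() returns None when the executable is not on the PATH.
        missing += [exe for exe in executables if shutil.which(exe) is None]
        return missing

An empty list means everything was found. Otherwise, the list names what still
needs to be installed.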


Usage
=====

::

    pelican-import [-h] [--blogger] [--dotclear] [--posterous] [--tumblr] [--wpfile] [--feed]
                   [-o OUTPUT] [-m MARKUP] [--dir-cat] [--dir-page] [--strip-raw] [--wp-custpost]
                   [--wp-attach] [--disable-slugs] [-e EMAIL] [-p PASSWORD] [-b BLOGNAME]
                   input|api_token|api_key

Positional arguments
--------------------
  =============         ============================================================================
  ``input``             The input file to read
  ``api_token``         (Posterous only) api_token can be obtained from http://posterous.com/api/
  ``api_key``           (Tumblr only) api_key can be obtained from https://www.tumblr.com/oauth/apps
  =============         ============================================================================

Optional arguments
------------------

  -h, --help            Show this help message and exit
  --blogger             Blogger XML export (default: False)
  --dotclear            Dotclear export (default: False)
  --posterous           Posterous API (default: False)
  --tumblr              Tumblr API (default: False)
  --wpfile              WordPress XML export (default: False)
  --feed                Feed to parse (default: False)
  -o OUTPUT, --output OUTPUT
                        Output path (default: content)
  -m MARKUP, --markup MARKUP
                        Output markup format: ``rst``, ``markdown``, or ``asciidoc``
                        (default: ``rst``)
  --dir-cat             Put files in directories named after their categories
                        (default: False)
  --dir-page            Put files recognised as pages in "pages/" sub-directory
                        (blogger and wordpress import only) (default: False)
  --filter-author       Import only posts from the specified author
  --strip-raw           Strip raw HTML code that can't be converted to markup
                        such as flash embeds or iframes (wordpress import
                        only) (default: False)
  --wp-custpost         Put wordpress custom post types in directories. If
                        used with --dir-cat option directories will be created
                        as "/post_type/category/" (wordpress import only)
  --wp-attach           Download files uploaded to wordpress as attachments.
                        Files will be added to posts as a list in the post
                        header and links to the files within the post will be
                        updated. All files will be downloaded, even if they
                        aren't associated with a post. Files will be downloaded
                        with their original path inside the output directory,
                        e.g. "output/wp-uploads/date/postname/file.jpg".
                        (wordpress import only) (requires an internet
                        connection)
  --disable-slugs       Disable storing slugs from imported posts within
                        output. With this disabled, your Pelican URLs may not
                        be consistent with your original posts. (default:
                        False)
  -e EMAIL, --email=EMAIL
                        Email used to authenticate Posterous API
  -p PASSWORD, --password=PASSWORD
                        Password used to authenticate Posterous API
  -b BLOGNAME, --blogname=BLOGNAME
                        Blog name used in Tumblr API


Examples
========

For Blogger::

    $ pelican-import --blogger -o ~/output ~/posts.xml

For Dotclear::

    $ pelican-import --dotclear -o ~/output ~/backup.txt

For Posterous::

    $ pelican-import --posterous -o ~/output --email=<email_address> --password=<password> <api_token>

For Tumblr::

    $ pelican-import --tumblr -o ~/output --blogname=<blogname> <api_key>

For WordPress::

    $ pelican-import --wpfile -o ~/output ~/posts.xml
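
To script a repeatable import, the same commands can be assembled from Python.
This is a minimal sketch under stated assumptions: the helper names
``build_import_command`` and ``run_import`` are hypothetical, only options
documented above are used, and ``pelican-import`` is assumed to be on the
``PATH``.

.. code-block:: python

    import subprocess


    def build_import_command(source_flag, input_path, output="content",
                             markup="rst", dir_cat=False):
        """Build the argument list for one pelican-import invocation."""
        command = ["pelican-import", source_flag, "-o", output, "-m", markup]
        if dir_cat:
            command.append("--dir-cat")
        # The positional argument (input file or API key) goes last.
        command.append(input_path)
        return command


    def run_import(*args, **kwargs):
        """Run pelican-import and raise CalledProcessError if it fails."""
        subprocess.run(build_import_command(*args, **kwargs), check=True)

Calling ``run_import("--wpfile", "posts.xml", output="output")`` would then
correspond to the WordPress example above, with the markup format passed
explicitly.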

Tests
=====

To test the module, you can use these sample files:

- for WordPress: https://www.wpbeginner.com/wp-themes/how-to-add-dummy-content-for-theme-development-in-wordpress/
- for Dotclear: http://media.dotaddict.org/tda/downloads/lorem-backup.txt

.. _More Categories: https://github.com/pelican-plugins/more-categories