diff options
author | Matth?us G. Chajdas <dev@anteru.net> | 2019-11-10 13:56:53 +0100 |
---|---|---|
committer | Matth?us G. Chajdas <dev@anteru.net> | 2019-11-10 13:56:53 +0100 |
commit | 1dd3124a9770e11b6684e5dd1e6bc15a0aa3bc67 (patch) | |
tree | 87a171383266dd1f64196589af081bc2f8e497c3 /doc/docs/tokens.rst | |
parent | f1c080e184dc1bbc36eaa7cd729ff3a499de568a (diff) | |
download | pygments-master.tar.gz |
Diffstat (limited to 'doc/docs/tokens.rst')
-rw-r--r-- | doc/docs/tokens.rst | 372 |
1 files changed, 0 insertions, 372 deletions
diff --git a/doc/docs/tokens.rst b/doc/docs/tokens.rst deleted file mode 100644 index 801fc638..00000000 --- a/doc/docs/tokens.rst +++ /dev/null @@ -1,372 +0,0 @@ -.. -*- mode: rst -*- - -============== -Builtin Tokens -============== - -.. module:: pygments.token - -In the :mod:`pygments.token` module, there is a special object called `Token` -that is used to create token types. - -You can create a new token type by accessing an attribute of `Token`: - -.. sourcecode:: pycon - - >>> from pygments.token import Token - >>> Token.String - Token.String - >>> Token.String is Token.String - True - -Note that tokens are singletons so you can use the ``is`` operator for comparing -token types. - -As of Pygments 0.7 you can also use the ``in`` operator to perform set tests: - -.. sourcecode:: pycon - - >>> from pygments.token import Comment - >>> Comment.Single in Comment - True - >>> Comment in Comment.Multi - False - -This can be useful in :doc:`filters <filters>` and if you write lexers on your -own without using the base lexers. - -You can also split a token type into a hierarchy, and get the parent of it: - -.. sourcecode:: pycon - - >>> String.split() - [Token, Token.Literal, Token.Literal.String] - >>> String.parent - Token.Literal - -In principle, you can create an unlimited number of token types but nobody can -guarantee that a style would define style rules for a token type. Because of -that, Pygments proposes some global token types defined in the -`pygments.token.STANDARD_TYPES` dict. - -For some tokens aliases are already defined: - -.. sourcecode:: pycon - - >>> from pygments.token import String - >>> String - Token.Literal.String - -Inside the :mod:`pygments.token` module the following aliases are defined: - -============= ============================ ==================================== -`Text` `Token.Text` for any type of text data -`Whitespace` `Token.Text.Whitespace` for specially highlighted whitespace -`Error` `Token.Error` represents lexer errors -`Other` `Token.Other` special token for data not - matched by a parser (e.g. HTML - markup in PHP code) -`Keyword` `Token.Keyword` any kind of keywords -`Name` `Token.Name` variable/function names -`Literal` `Token.Literal` Any literals -`String` `Token.Literal.String` string literals -`Number` `Token.Literal.Number` number literals -`Operator` `Token.Operator` operators (``+``, ``not``...) -`Punctuation` `Token.Punctuation` punctuation (``[``, ``(``...) -`Comment` `Token.Comment` any kind of comments -`Generic` `Token.Generic` generic tokens (have a look at - the explanation below) -============= ============================ ==================================== - -The `Whitespace` token type is new in Pygments 0.8. It is used only by the -`VisibleWhitespaceFilter` currently. - -Normally you just create token types using the already defined aliases. For each -of those token aliases, a number of subtypes exists (excluding the special tokens -`Token.Text`, `Token.Error` and `Token.Other`) - -The `is_token_subtype()` function in the `pygments.token` module can be used to -test if a token type is a subtype of another (such as `Name.Tag` and `Name`). -(This is the same as ``Name.Tag in Name``. The overloaded `in` operator was newly -introduced in Pygments 0.7, the function still exists for backwards -compatibility.) - -With Pygments 0.7, it's also possible to convert strings to token types (for example -if you want to supply a token from the command line): - -.. sourcecode:: pycon - - >>> from pygments.token import String, string_to_tokentype - >>> string_to_tokentype("String") - Token.Literal.String - >>> string_to_tokentype("Token.Literal.String") - Token.Literal.String - >>> string_to_tokentype(String) - Token.Literal.String - - -Keyword Tokens -============== - -`Keyword` - For any kind of keyword (especially if it doesn't match any of the - subtypes of course). - -`Keyword.Constant` - For keywords that are constants (e.g. ``None`` in future Python versions). - -`Keyword.Declaration` - For keywords used for variable declaration (e.g. ``var`` in some programming - languages like JavaScript). - -`Keyword.Namespace` - For keywords used for namespace declarations (e.g. ``import`` in Python and - Java and ``package`` in Java). - -`Keyword.Pseudo` - For keywords that aren't really keywords (e.g. ``None`` in old Python - versions). - -`Keyword.Reserved` - For reserved keywords. - -`Keyword.Type` - For builtin types that can't be used as identifiers (e.g. ``int``, - ``char`` etc. in C). - - -Name Tokens -=========== - -`Name` - For any name (variable names, function names, classes). - -`Name.Attribute` - For all attributes (e.g. in HTML tags). - -`Name.Builtin` - Builtin names; names that are available in the global namespace. - -`Name.Builtin.Pseudo` - Builtin names that are implicit (e.g. ``self`` in Ruby, ``this`` in Java). - -`Name.Class` - Class names. Because no lexer can know if a name is a class or a function - or something else this token is meant for class declarations. - -`Name.Constant` - Token type for constants. In some languages you can recognise a token by the - way it's defined (the value after a ``const`` keyword for example). In - other languages constants are uppercase by definition (Ruby). - -`Name.Decorator` - Token type for decorators. Decorators are syntactic elements in the Python - language. Similar syntax elements exist in C# and Java. - -`Name.Entity` - Token type for special entities. (e.g. `` `` in HTML). - -`Name.Exception` - Token type for exception names (e.g. ``RuntimeError`` in Python). Some languages - define exceptions in the function signature (Java). You can highlight - the name of that exception using this token then. - -`Name.Function` - Token type for function names. - -`Name.Function.Magic` - same as `Name.Function` but for special function names that have an implicit use - in a language (e.g. ``__init__`` method in Python). - -`Name.Label` - Token type for label names (e.g. in languages that support ``goto``). - -`Name.Namespace` - Token type for namespaces. (e.g. import paths in Java/Python), names following - the ``module``/``namespace`` keyword in other languages. - -`Name.Other` - Other names. Normally unused. - -`Name.Tag` - Tag names (in HTML/XML markup or configuration files). - -`Name.Variable` - Token type for variables. Some languages have prefixes for variable names - (PHP, Ruby, Perl). You can highlight them using this token. - -`Name.Variable.Class` - same as `Name.Variable` but for class variables (also static variables). - -`Name.Variable.Global` - same as `Name.Variable` but for global variables (used in Ruby, for - example). - -`Name.Variable.Instance` - same as `Name.Variable` but for instance variables. - -`Name.Variable.Magic` - same as `Name.Variable` but for special variable names that have an implicit use - in a language (e.g. ``__doc__`` in Python). - - -Literals -======== - -`Literal` - For any literal (if not further defined). - -`Literal.Date` - for date literals (e.g. ``42d`` in Boo). - - -`String` - For any string literal. - -`String.Affix` - Token type for affixes that further specify the type of the string they're - attached to (e.g. the prefixes ``r`` and ``u8`` in ``r"foo"`` and ``u8"foo"``). - -`String.Backtick` - Token type for strings enclosed in backticks. - -`String.Char` - Token type for single characters (e.g. Java, C). - -`String.Delimiter` - Token type for delimiting identifiers in "heredoc", raw and other similar - strings (e.g. the word ``END`` in Perl code ``print <<'END';``). - -`String.Doc` - Token type for documentation strings (for example Python). - -`String.Double` - Double quoted strings. - -`String.Escape` - Token type for escape sequences in strings. - -`String.Heredoc` - Token type for "heredoc" strings (e.g. in Ruby or Perl). - -`String.Interpol` - Token type for interpolated parts in strings (e.g. ``#{foo}`` in Ruby). - -`String.Other` - Token type for any other strings (for example ``%q{foo}`` string constructs - in Ruby). - -`String.Regex` - Token type for regular expression literals (e.g. ``/foo/`` in JavaScript). - -`String.Single` - Token type for single quoted strings. - -`String.Symbol` - Token type for symbols (e.g. ``:foo`` in LISP or Ruby). - - -`Number` - Token type for any number literal. - -`Number.Bin` - Token type for binary literals (e.g. ``0b101010``). - -`Number.Float` - Token type for float literals (e.g. ``42.0``). - -`Number.Hex` - Token type for hexadecimal number literals (e.g. ``0xdeadbeef``). - -`Number.Integer` - Token type for integer literals (e.g. ``42``). - -`Number.Integer.Long` - Token type for long integer literals (e.g. ``42L`` in Python). - -`Number.Oct` - Token type for octal literals. - - -Operators -========= - -`Operator` - For any punctuation operator (e.g. ``+``, ``-``). - -`Operator.Word` - For any operator that is a word (e.g. ``not``). - - -Punctuation -=========== - -.. versionadded:: 0.7 - -`Punctuation` - For any punctuation which is not an operator (e.g. ``[``, ``(``...) - - -Comments -======== - -`Comment` - Token type for any comment. - -`Comment.Hashbang` - Token type for hashbang comments (i.e. first lines of files that start with - ``#!``). - -`Comment.Multiline` - Token type for multiline comments. - -`Comment.Preproc` - Token type for preprocessor comments (also ``<?php``/``<%`` constructs). - -`Comment.Single` - Token type for comments that end at the end of a line (e.g. ``# foo``). - -`Comment.Special` - Special data in comments. For example code tags, author and license - information, etc. - - -Generic Tokens -============== - -Generic tokens are for special lexers like the `DiffLexer` that doesn't really -highlight a programming language but a patch file. - - -`Generic` - A generic, unstyled token. Normally you don't use this token type. - -`Generic.Deleted` - Marks the token value as deleted. - -`Generic.Emph` - Marks the token value as emphasized. - -`Generic.Error` - Marks the token value as an error message. - -`Generic.Heading` - Marks the token value as headline. - -`Generic.Inserted` - Marks the token value as inserted. - -`Generic.Output` - Marks the token value as program output (e.g. for python cli lexer). - -`Generic.Prompt` - Marks the token value as command prompt (e.g. bash lexer). - -`Generic.Strong` - Marks the token value as bold (e.g. for rst lexer). - -`Generic.Subheading` - Marks the token value as subheadline. - -`Generic.Traceback` - Marks the token value as a part of an error traceback. |