diff options
author | no author <noone@nowhere> | 2005-09-26 02:58:54 +0000 |
---|---|---|
committer | no author <noone@nowhere> | 2005-09-26 02:58:54 +0000 |
commit | 84b8431608174e74a4c0d2394eb330a6621bc74b (patch) | |
tree | ffc2bd7ce21708a9147247c80b0e7fc7728ea063 | |
download | coderay-84b8431608174e74a4c0d2394eb330a6621bc74b.tar.gz |
New Repository, initial import
47 files changed, 4857 insertions, 0 deletions
@@ -0,0 +1,340 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Library General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) year name of author + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + <signature of Ty Coon>, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Library General +Public License instead of this License. @@ -0,0 +1,64 @@ += CodeRay + +== About +CodeRay is a Ruby library for syntax highlighting. + +Syntax highlighting means: You put your code in, and you get it back colored; +Keywords, Strings, Floats, Comments - all in different colors. +And with line numbers. + +*Syntax* *Highlighting*... +* makes code easier to read +* lets you detect errors faster +* helps you to understand the syntax of a language +* looks nice +* is what everybody should have on their website +* solves all your problems and makes the girls run after you + +Version: 0.4.1 (2005.june.1) +Author:: murphy +Idea:: licenser +Website:: rd.cYcnus.de/coderay[http://rd.cYcnus.de/coderay] +Copyright:: (c) 2005 by cYcnus +License:: Not yet decided + +----- + +== Installation + + % gem install coderay + + +=== Dependencies + +CodeRay needs Ruby 1.8 and the strscan[http://www.ruby-doc.org/stdlib/libdoc/strscan/rdoc/index.htm] library included. + + +== Example Usage +(Forgive me, but this is not highlighted.) + + require 'coderay' + + hl = CodeRay.html :line_numbers => :column + puts hl.highlight_page "puts 'Hello, world!'", :ruby + + +== Documentation + +See CodeRay. + + +----- + +== Credits + +=== Special Thanks to +* licenser (Heinz N. Gies) for ending my QBasic career, inventing the Coder project and the input/output plugin system. + CodeRay would not exist without him. + +=== Thanks to +* Caleb Clausen for writing RubyLexer (see http://rubyforge.org/projects/rubylexer) and lots of mails +* Jamis Buck for writing Syntax (see http://rubyforge.org/projects/syntax) +* everyone who used CodeRay on http://www.rubyforen.de and http://www.infhu.de/mx +* iGEL, magichisoka, manveru and everyone I forgot from rubyforen.de +* Dookie (who is no longer with us...) and Leonidas from http://www.python-forum.de diff --git a/bin/coderay b/bin/coderay new file mode 100644 index 0000000..d4239fd --- /dev/null +++ b/bin/coderay @@ -0,0 +1,62 @@ +#!C:/ruby/bin/ruby
+
+# CodeRay Executable
+#
+# Version: 0.1
+# Author: murphy
+
+require 'optparse'
+
+def err msg
+ $stderr.puts msg
+end
+
+begin
+ require 'coderay'
+
+ if ARGV.empty?
+ puts <<-USAGE
+Usage:
+ coderay lang [format] < file > output
+ coderay file [format]
+ USAGE
+ end
+
+ unless format = ARGV[1]
+ $stderr.puts 'No format given; setting to default (HTML)'
+ format = :html
+ end
+
+ lang = ARGV[0] or raise 'No lang/file given.'
+ if lang[/\A:(\w+)\z/]
+ lang = $1.to_sym
+ input = $stdin.read
+ tokens = CodeRay.scan input, lang
+ else
+ file = lang
+ tokens = CodeRay.scan_file file
+ output_filename = file[0...-File.extname(file).size]
+ end
+
+ output = tokens.encode format
+ out = $stdout
+ if output_filename
+ output_filename << '.' << CodeRay::Encoders[format]::FILE_EXTENSION
+ if File.exist? output_filename
+ err 'File %s already exists.' % output_filename
+ exit
+ else
+ out = File.open output_filename, 'w'
+ end
+ else
+
+ end
+ out.print output
+
+rescue => boom
+ err "Error: #{boom.message}\n"
+ err boom.backtrace
+ err '-' * 50
+ err ARGV.options
+ exit 1
+end
diff --git a/demo/demo_count.rb b/demo/demo_count.rb new file mode 100644 index 0000000..bcb7c2d --- /dev/null +++ b/demo/demo_count.rb @@ -0,0 +1,10 @@ +require 'coderay'
+
+stats = CodeRay.encoder(:statistic)
+stats.encode("puts 17 + 4\n", :ruby)
+
+puts '%d out of %d tokens have the kind :integer.' % [
+ stats.type_stats[:integer].count,
+ stats.real_token_count
+]
+#-> 2 out of 4 tokens have the kind :integer.
diff --git a/demo/demo_css.rb b/demo/demo_css.rb new file mode 100644 index 0000000..972bbfa --- /dev/null +++ b/demo/demo_css.rb @@ -0,0 +1,6 @@ +require 'coderay'
+
+data = File.read 'L:\bench\strange.ruby'
+page = CodeRay.scan(data, :ruby).optimize.html(:css => :style, :debug => $DEBUG).page
+
+puts page
diff --git a/demo/demo_div.rb b/demo/demo_div.rb new file mode 100644 index 0000000..27b6f32 --- /dev/null +++ b/demo/demo_div.rb @@ -0,0 +1,19 @@ +require 'coderay' + +puts CodeRay.scan(DATA.read, :ruby).div + +__END__ +for a in 0..255 + a = a.chr + begin + x = eval("?\\#{a}") + if x == a[0] + next + else + print "#{a}: #{x}" + end + rescue SyntaxError => boom + print "#{a}: error" + end + puts +end diff --git a/demo/demo_dump.rb b/demo/demo_dump.rb new file mode 100644 index 0000000..b848dcd --- /dev/null +++ b/demo/demo_dump.rb @@ -0,0 +1,8 @@ +require 'coderay'
+
+puts CodeRay.
+ scan("puts 'Hello, world!'", :ruby).
+ compact.
+ dump.
+ undump.
+ html(:wrap => :div)
diff --git a/demo/demo_encoder.rb b/demo/demo_encoder.rb new file mode 100644 index 0000000..267676b --- /dev/null +++ b/demo/demo_encoder.rb @@ -0,0 +1,39 @@ +require 'coderay'
+
+SAMPLE = "puts 17 + 4\n"
+puts 'Encoders Demo: ' + SAMPLE
+scanner = CodeRay::Scanners[:ruby].new SAMPLE
+encoder = CodeRay::Encoders[:statistic].new
+
+tokens = scanner.tokenize
+stats = encoder.encode_tokens tokens
+
+puts
+puts 'Statistic:'
+puts stats
+
+# alternative 1
+tokens = CodeRay.scan SAMPLE, :ruby
+encoder = CodeRay.encoder(:tokens)
+textual = encoder.encode_tokens tokens
+puts
+puts 'Original text:'
+puts textual
+
+# alternative 2
+yaml = CodeRay.encoder(:yaml).encode SAMPLE, :ruby
+puts
+puts 'YAML:'
+puts yaml
+
+# alternative 3
+BIGSAMPLE = SAMPLE * 100
+dump = CodeRay.scan(BIGSAMPLE, :ruby).dump
+puts
+puts 'Dump:'
+p dump
+puts 'compressed: %d byte < %d byte' % [dump.size, BIGSAMPLE.size]
+
+puts
+puts 'Undump:'
+puts dump.undump.statistic
diff --git a/demo/demo_global_vars.rb b/demo/demo_global_vars.rb new file mode 100644 index 0000000..2bacfe5 --- /dev/null +++ b/demo/demo_global_vars.rb @@ -0,0 +1,15 @@ +code = <<'CODE'
+$ie.text_field(:name, "pAnfrage ohne $gV und mit #{$gv}").set artikel
+oder
+text = $bla.test(...)
+CODE
+
+require 'coderay'
+require 'erb'
+include ERB::Util
+
+tokens = CodeRay.scan code, :ruby
+tokens.each_text_token { |text, kind| text.replace h(text) }
+tokens.each(:global_variable) { |text, kind| text.replace '<span class="glob-var">%s</span>' % text }
+
+puts tokens.text
diff --git a/demo/demo_global_vars2.rb b/demo/demo_global_vars2.rb new file mode 100644 index 0000000..7646890 --- /dev/null +++ b/demo/demo_global_vars2.rb @@ -0,0 +1,28 @@ +require 'coderay'
+require 'erb'
+include ERB::Util
+
+code = <<'CODE'
+$ie.text_field(:name, "pAnfrage ohne $gV und mit #{$gv}").set artikel
+oder
+text = $bla.test(...)
+CODE
+puts <<HTML
+<html>
+<head>
+<style>span.glob-var { color: green; font-weight: bold; }</style>
+</head>
+<body>
+HTML
+
+CodeRay.scan_stream code, :ruby do |text, kind|
+ next if text.is_a? Symbol
+ text = h(text)
+ text = '<span class="glob-var">%s</span>' % text if kind == :global_variable
+ print text
+end
+
+puts <<HTML
+</body>
+</html>
+HTML
diff --git a/demo/demo_html.rb b/demo/demo_html.rb new file mode 100644 index 0000000..d2d25a1 --- /dev/null +++ b/demo/demo_html.rb @@ -0,0 +1,395 @@ +$: << '..' +require 'coderay' + +tokens = CodeRay.scan DATA.read, :ruby +html = tokens.html(:tab_width => 2, :line_numbers => :table) + +puts html.page + +__END__ +require 'scanner' + +module CodeRay + + class RubyScanner < Scanner + + RESERVED_WORDS = [ + 'and', 'def', 'end', 'in', 'or', 'unless', 'begin', + 'defined?', 'ensure', 'module', 'redo', 'super', 'until', + 'BEGIN', 'break', 'do', 'next', 'rescue', 'then', + 'when', 'END', 'case', 'else', 'for', 'retry', + 'while', 'alias', 'class', 'elsif', 'if', 'not', 'return', + 'undef', 'yield', + ] + + DEF_KEYWORDS = ['def'] + MODULE_KEYWORDS = ['class', 'module'] + DEF_NEW_STATE = WordList.new(:initial). + add(DEF_KEYWORDS, :def_expected). + add(MODULE_KEYWORDS, :module_expected) + + WORDS_ALLOWING_REGEXP = [ + 'and', 'or', 'not', 'while', 'until', 'unless', 'if', 'elsif', 'when' + ] + REGEXP_ALLOWED = WordList.new(false). + add(WORDS_ALLOWING_REGEXP, :set) + + PREDEFINED_CONSTANTS = [ + 'nil', 'true', 'false', 'self', + 'DATA', 'ARGV', 'ARGF', '__FILE__', '__LINE__', + ] + + IDENT_KIND = WordList.new(:ident). + add(RESERVED_WORDS, :reserved). + add(PREDEFINED_CONSTANTS, :pre_constant) + + METHOD_NAME = / #{IDENT} [?!]? /xo + METHOD_NAME_EX = / + #{METHOD_NAME} # common methods: split, foo=, empty?, gsub! + | \*\*? # multiplication and power + | [-+~]@? # plus, minus + | [\/%&|^`] # division, modulo or format strings, &and, |or, ^xor, `system` + | \[\]=? # array getter and setter + | <=?>? | >=? # comparison, rocket operator + | << | >> # append or shift left, shift right + | ===? # simple equality and case equality + /ox + GLOBAL_VARIABLE = / \$ (?: #{IDENT} | \d+ | [~&+`'=\/,;_.<>!@0$?*":F\\] | -[a-zA-Z_0-9] ) /ox + + DOUBLEQ = / " [^"\#\\]* (?: (?: \#\{.*?\} | \#(?:$")? | \\. ) [^"\#\\]* )* "? /ox + SINGLEQ = / ' [^'\\]* (?: \\. [^'\\]* )* '? /ox + STRING = / #{SINGLEQ} | #{DOUBLEQ} /ox + SHELL = / ` [^`\#\\]* (?: (?: \#\{.*?\} | \#(?:$`)? | \\. ) [^`\#\\]* )* `? /ox + REGEXP = / \/ [^\/\#\\]* (?: (?: \#\{.*?\} | \#(?:$\/)? | \\. ) [^\/\#\\]* )* \/? /ox + + DECIMAL = /\d+(?:_\d+)*/ # doesn't recognize 09 as octal error + OCTAL = /0_?[0-7]+(?:_[0-7]+)*/ + HEXADECIMAL = /0x[0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*/ + BINARY = /0b[01]+(?:_[01]+)*/ + + EXPONENT = / [eE] [+-]? #{DECIMAL} /ox + FLOAT = / #{DECIMAL} (?: #{EXPONENT} | \. #{DECIMAL} #{EXPONENT}? ) / + INTEGER = /#{OCTAL}|#{HEXADECIMAL}|#{BINARY}|#{DECIMAL}/ + + def reset + super + @regexp_allowed = false + end + + def next_token + return if @scanner.eos? + + kind = :error + if @scanner.scan(/\s+/) # in every state + kind = :space + @regexp_allowed = :set if @regexp_allowed or @scanner.matched.index(?\n) # delayed flag setting + + elsif @state == :def_expected + if @scanner.scan(/ (?: (?:#{IDENT}(?:\.|::))* | (?:@@?|$)? #{IDENT}(?:\.|::) ) #{METHOD_NAME_EX} /ox) + kind = :method + @state = :initial + else + @scanner.scan(/./) + kind = :error + end + @state = :initial + + elsif @state == :module_expected + if @scanner.scan(/<</) + kind = :operator + else + if @scanner.scan(/ (?: #{IDENT} (?:\.|::))* #{IDENT} /ox) + kind = :method + else + @scanner.scan(/./) + kind = :error + end + @state = :initial + end + + elsif # state == :initial + # IDENTIFIERS, KEYWORDS + if @scanner.scan(GLOBAL_VARIABLE) + kind = :global_variable + elsif @scanner.scan(/ @@ #{IDENT} /ox) + kind = :class_variable + elsif @scanner.scan(/ @ #{IDENT} /ox) + kind = :instance_variable + elsif @scanner.scan(/ __END__\n ( (?!\#CODE\#) .* )? | \#[^\n]* | =begin(?=\s).*? \n=end(?=\s|\z)(?:[^\n]*)? /x) + kind = :comment + elsif @scanner.scan(METHOD_NAME) + if @last_token_dot + kind = :ident + else + matched = @scanner.matched + kind = IDENT_KIND[matched] + if kind == :ident and matched =~ /^[A-Z]/ + kind = :constant + elsif kind == :reserved + @state = DEF_NEW_STATE[matched] + @regexp_allowed = REGEXP_ALLOWED[matched] + end + end + + elsif @scanner.scan(STRING) + kind = :string + elsif @scanner.scan(SHELL) + kind = :shell + ## HEREDOCS + elsif @scanner.scan(/\//) and @regexp_allowed + @scanner.unscan + @scanner.scan(REGEXP) + kind = :regexp + ## %strings + elsif @scanner.scan(/:(?:#{GLOBAL_VARIABLE}|#{METHOD_NAME_EX}|#{STRING})/ox) + kind = :global_variable + elsif @scanner.scan(/ + \? (?: + [^\s\\] + | + \\ (?:M-\\C-|C-\\M-|M-\\c|c\\M-|c|C-|M-))? (?: \\ (?: . | [0-7]{3} | x[0-9A-Fa-f][0-9A-Fa-f] ) + ) + /ox) + kind = :integer + + elsif @scanner.scan(/ [-+*\/%=<>;,|&!()\[\]{}~?] | \.\.?\.? | ::? /x) + kind = :operator + @regexp_allowed = :set if @scanner.matched[-1,1] =~ /[~=!<>|&^,\(\[+\-\/\*%]\z/ + elsif @scanner.scan(FLOAT) + kind = :float + elsif @scanner.scan(INTEGER) + kind = :integer + elsif @scanner.scan(/:(?:#{GLOBAL_VARIABLE}|#{METHOD_NAME_EX}|#{STRING})/ox) + kind = :global_variable + else + @scanner.scan(/./m) + end + end + + token = Token.new @scanner.matched, kind + + if kind == :regexp + token.text << @scanner.scan(/[eimnosux]*/) + end + + @regexp_allowed = (@regexp_allowed == :set) # delayed flag setting + + token + end + end + + ScannerList.register RubyScanner, 'ruby' + +end + +module CodeRay + require 'scanner' + + class Highlighter + + def initialize lang + @scanner = Scanner[lang].new + end + + def highlight code + @scanner.feed code + @scanner.all_tokens.map { |t| t.inspect }.join "\n" + end + + end + + class HTMLHighlighter < Highlighter + + ClassOfKind = { + :attribute_name => 'an', + :attribute_name_fat => 'af', + :attribute_value => 'av', + :attribute_value_fat => 'aw', + :bin => 'bi', + :char => 'ch', + :class => 'cl', + :class_variable => 'cv', + :color => 'cr', + :comment => 'c', + :constant => 'co', + :definition => 'df', + :directive => 'di', + :doc => 'do', + :doc_string => 'ds', + :exception => 'ex', + :error => 'er', + :float => 'fl', + :function => 'fu', + :global_variable => 'gv', + :hex => 'hx', + :include => 'ic', + :instance_variable => 'iv', + :integer => 'i', + :interpreted => 'in', + :label => 'la', + :local_variable => 'lv', + :oct => 'oc', + :operator_name => 'on', + :pre_constant => 'pc', + :pre_type => 'pt', + :predefined => 'pd', + :preprocessor => 'pp', + :regexp => 'rx', + :reserved => 'r', + :shell => 'sh', + :string => 's', + :symbol => 'sy', + :tag => 'ta', + :tag_fat => 'tf', + :tag_special => 'ts', + :type => 'ty', + :variable => 'v', + :xml_text => 'xt', + + :ident => :NO_HIGHLIGHT, + :operator => :NO_HIGHLIGHT, + :space => :NO_HIGHLIGHT, + } + ClassOfKind[:procedure] = ClassOfKind[:method] = ClassOfKind[:function] + ClassOfKind.default = ClassOfKind[:error] or raise 'no class found for :error!' + + def initialize lang, options = {} + super lang + + @HTML_TAB = ' ' * options.fetch(:tabs2space, 8) + case level = options.fetch(:level, 'xhtml') + when 'html' + @HTML_BR = "<BR>\n" + when 'xhtml' + @HTML_BR = "<br />\n" + else + raise "Unknown HTML level: #{level}" + end + end + + def highlight code + @scanner.feed code + + out = '' + while t = @scanner.next_token + warn t.inspect if t.text.nil? + out << to_html(t) + end + TEMPLATE =~ /<%CONTENT%>/ + $` + out + $' + end + + private + def to_html token + css_class = ClassOfKind[token.kind] + if defined? ::DEBUG and not ClassOfKind.has_key? token.kind + warn "no token class found for :#{token.kind}" + end + + text = text_to_html token.text + if css_class == :NO_HIGHLIGHT + text + else + "<span class=\"#{css_class}\">#{text}</span>" + end + end + + def text_to_html text + return '' if text.empty? + text = text.dup # important + if text.index(/["><&]/) + text.gsub!('&', '&') + text.gsub!('"', '"') + text.gsub!('>', '>') + text.gsub!('<', '<') + end + if text.index(/\s/) + text.gsub!("\n", @HTML_BR) + text.gsub!("\t", @HTML_TAB) + text.gsub!(/^ /, ' ') + text.gsub!(' ', ' ') + end + text + end + + TEMPLATE = <<-'TEMPLATE' +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> +<html dir="ltr"> +<head> +<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> +<meta http-equiv="Content-Style-Type" content="text/css"> + +<title>RubyBB BBCode</title> +<style type="text/css"> +.code { + width: 100%; + background-color: #FAFAFA; + border: 1px solid #D1D7DC; + font-family: 'Courier New', 'Terminal', monospace; + font-size: 10pt; + color: black; + vertical-align: top; + text-align: left; +} +.code .af { color:#00C; } +.code .an { color:#007; } +.code .av { color:#700; } +.code .aw { color:#C00; } +.code .bi { color:#509; font-weight:bold; } +.code .c { color:#888; } +.code .ch { color:#C28; font-weight:bold; } +.code .cl { color:#B06; font-weight:bold; } +.code .co { color:#036; font-weight:bold; } +.code .cr { color:#0A0; } +.code .cv { color:#369; } +.code .df { color:#099; font-weight:bold; } +.code .di { color:#088; font-weight:bold; } +.code .do { color:#970; } +.code .ds { color:#D42; font-weight:bold; } +.code .er { color:#F00; background-color:#FAA; } +.code .ex { color:#F00; font-weight:bold; } +.code .fl { color:#60E; font-weight:bold; } +.code .fu { color:#06B; font-weight:bold; } +.code .gv { color:#800; font-weight:bold; } +.code .hx { color:#058; font-weight:bold; } +.code .i { color:#00D; font-weight:bold; } +.code .ic { color:#B44; font-weight:bold; } +.code .in { color:#B2B; font-weight:bold; } +.code .iv { color:#33B; } +.code .la { color:#970; font-weight:bold; } +.code .lv { color:#963; } +.code .oc { color:#40E; font-weight:bold; } +.code .on { color:#000; font-weight:bold; } +.code .pc { color:#038; font-weight:bold; } +.code .pd { color:#369; font-weight:bold; } +.code .pp { color:#579; } +.code .pt { color:#339; font-weight:bold; } +.code .r { color:#080; font-weight:bold; } +.code .rx { color:#927; font-weight:bold; } +.code .s { color:#D42; font-weight:bold; } +.code .sh { color:#B2B; font-weight:bold; } +.code .sy { color:#A60; } +.code .ta { color:#070; } +.code .tf { color:#070; font-weight:bold; } +.code .ts { color:#D70; font-weight:bold; } +.code .ty { color:#339; font-weight:bold; } +.code .v { color:#036; } +.code .xt { color:#444; } +</style> +</head> +<body> +<div class="code"> +<%CONTENT%> +</div> +<div class="validators"> +<a href="http://validator.w3.org/check?uri=referer"><img src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" height="31" width="88" style="border:none;"></a> +<img style="border:0" src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!" > +</div> +</body> +</html> + TEMPLATE + + end + +end + diff --git a/demo/demo_html2.rb b/demo/demo_html2.rb new file mode 100644 index 0000000..0982ad8 --- /dev/null +++ b/demo/demo_html2.rb @@ -0,0 +1,4 @@ +require 'coderay' +require 'coderay/encoders/html' + +puts CodeRay["puts CodeRay['...', :ruby]", :ruby].div diff --git a/demo/demo_load_encoder.rb b/demo/demo_load_encoder.rb new file mode 100644 index 0000000..3c85463 --- /dev/null +++ b/demo/demo_load_encoder.rb @@ -0,0 +1,13 @@ +require 'coderay'
+
+begin
+ CodeRay::Encoders::YAML
+rescue
+ puts 'CodeRay::Encoders::YAML is not defined; you must load it first.'
+end
+
+yaml_encoder = CodeRay::Encoders[:yaml]
+puts 'Now it is loaded.'
+
+p yaml_encoder == CodeRay::Encoders::YAML #-> true
+puts 'See?'
diff --git a/demo/demo_more.rb b/demo/demo_more.rb new file mode 100644 index 0000000..7ebf5c3 --- /dev/null +++ b/demo/demo_more.rb @@ -0,0 +1,206 @@ +require 'rubygems' +$: << '..' +require 'coderay' +require 'benchmark' + +c, ruby = DATA.read.split(/^---$/) +DATA.rewind +me = DATA.read[/.*^__END__$/m] +$input = c + ruby + me + +time = Benchmark.realtime do + + # here CodeRay comes to play + hl = CodeRay.encoder(:html, :tab_width => 2, :line_numbers => :table, :wrap => :div) + c = hl.highlight c, :c + ruby = hl.highlight ruby, :ruby + me = hl.highlight me, :ruby + + body = %w[C Ruby Genereated\ by].zip([c, ruby, me]).map do |title, code| + "<h1>#{title}</h1>\n#{code}" + end.join + body = hl.class::Output.new(body, :div).page! + + # CodeRay also provides a simple page generator + $output = body #hl.class.wrap_in_page body +end + +File.open('test.html', 'w') do |f| + f.write $output +end +puts 'Input: %dB, Output: %dB' % [$input.size, $output.size] +puts 'Created "test.html" in %0.3f seconds (%d KB/s). Take a look with your browser.' % [time, $input.size / 1024.0 / time] + +__END__ +/********************************************************************** + + version.c - + + $Author: nobu $ + $Date: 2004/03/25 12:01:40 $ + created at: Thu Sep 30 20:08:01 JST 1993 + + Copyright (C) 1993-2003 Yukihiro Matsumoto + +**********************************************************************/ + +#include "ruby.h" +#include "version.h" +#include <stdio.h> + +const char ruby_version[] = RUBY_VERSION; +const char ruby_release_date[] = RUBY_RELEASE_DATE; +const char ruby_platform[] = RUBY_PLATFORM; + +void +Init_version() +{ + VALUE v = rb_obj_freeze(rb_str_new2(ruby_version)); + VALUE d = rb_obj_freeze(rb_str_new2(ruby_release_date)); + VALUE p = rb_obj_freeze(rb_str_new2(ruby_platform)); + + rb_define_global_const("RUBY_VERSION", v); + rb_define_global_const("RUBY_RELEASE_DATE", d); + rb_define_global_const("RUBY_PLATFORM", p); +} + +void +ruby_show_version() +{ + printf("ruby %s (%s) [%s]\n", RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_PLATFORM); +} + +void +ruby_show_copyright() +{ + printf("ruby - Copyright (C) 1993-%d Yukihiro Matsumoto\n", RUBY_RELEASE_YEAR); + exit(0); +} +--- +# +# = ostruct.rb: OpenStruct implementation +# +# Author:: Yukihiro Matsumoto +# Documentation:: Gavin Sinclair +# +# OpenStruct allows the creation of data objects with arbitrary attributes. +# See OpenStruct for an example. +# + +# +# OpenStruct allows you to create data objects and set arbitrary attributes. +# For example: +# +# require 'ostruct' +# +# record = OpenStruct.new +# record.name = "John Smith" +# record.age = 70 +# record.pension = 300 +# +# puts record.name # -> "John Smith" +# puts record.address # -> nil +# +# It is like a hash with a different way to access the data. In fact, it is +# implemented with a hash, and you can initialize it with one. +# +# hash = { "country" => "Australia", :population => 20_000_000 } +# data = OpenStruct.new(hash) +# +# p data # -> <OpenStruct country="Australia" population=20000000> +# +class OpenStruct + # + # Create a new OpenStruct object. The optional +hash+, if given, will + # generate attributes and values. For example. + # + # require 'ostruct' + # hash = { "country" => "Australia", :population => 20_000_000 } + # data = OpenStruct.new(hash) + # + # p data # -> <OpenStruct country="Australia" population=20000000> + # + # By default, the resulting OpenStruct object will have no attributes. + # + def initialize(hash=nil) + @table = {} + if hash + for k,v in hash + @table[k.to_sym] = v + new_ostruct_member(k) + end + end + end + + # Duplicate an OpenStruct object members. + def initialize_copy(orig) + super + @table = @table.dup + end + + def marshal_dump + @table + end + def marshal_load(x) + @table = x + @table.each_key{|key| new_ostruct_member(key)} + end + + def new_ostruct_member(name) + unless self.respond_to?(name) + self.instance_eval %{ + def #{name}; @table[:#{name}]; end + def #{name}=(x); @table[:#{name}] = x; end + } + end + end + + def method_missing(mid, *args) # :nodoc: + mname = mid.id2name + len = args.length + if mname =~ /=$/ + if len != 1 + raise ArgumentError, "wrong number of arguments (#{len} for 1)", caller(1) + end + if self.frozen? + raise TypeError, "can't modify frozen #{self.class}", caller(1) + end + mname.chop! + @table[mname.intern] = args[0] + self.new_ostruct_member(mname) + elsif len == 0 + @table[mid] + else + raise NoMethodError, "undefined method `#{mname}' for #{self}", caller(1) + end + end + + # + # Remove the named field from the object. + # + def delete_field(name) + @table.delete name.to_sym + end + + # + # Returns a string containing a detailed summary of the keys and values. + # + def inspect + str = "<#{self.class}" + for k,v in @table + str << " #{k}=#{v.inspect}" + end + str << ">" + end + + attr_reader :table # :nodoc: + protected :table + + # + # Compare this object and +other+ for equality. + # + def ==(other) + return false unless(other.kind_of?(OpenStruct)) + return @table == other.table + end +end diff --git a/demo/demo_scanner.rb b/demo/demo_scanner.rb new file mode 100644 index 0000000..a250f91 --- /dev/null +++ b/demo/demo_scanner.rb @@ -0,0 +1,12 @@ +require 'coderay'
+c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
+for text, kind in c_scanner
+ print text if kind == :operator
+end
+puts
+
+ruby_scanner = CodeRay::Scanners[:ruby].new %q<c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;">
+
+puts ruby_scanner.any? { |text, kind| kind == :string and text == :open}
+puts ruby_scanner.find { |text, kind| kind == :regexp }
+puts ruby_scanner.map { |text, kind| text if kind != :space }.compact.join(' ')
diff --git a/demo/demo_server.rb b/demo/demo_server.rb new file mode 100644 index 0000000..44485f0 --- /dev/null +++ b/demo/demo_server.rb @@ -0,0 +1,92 @@ +# CodeRay dynamic highlighter
+#
+# Usage: start this and your browser.
+#
+# Go to http://localhost:49374/?<path to the file>
+# (mnemonic: 49374 = Four-Nine-Three-Seven-Four = For No Token Shall Fall)
+# and you should get the highlighted version.
+
+require 'webrick'
+require 'pathname'
+
+class << File
+ alias dir? directory?
+end
+
+require 'erb'
+include ERB::Util
+def url_decode s
+ s.to_s.gsub(/%([0-9a-f]{2})/i) { [$1.hex].pack 'C' }
+end
+
+class String
+ def to_link name = File.basename(self)
+ "<a href=\"?path=#{url_encode self}\">#{name}</a>"
+ end
+end
+
+require 'coderay'
+class CodeRayServlet < WEBrick::HTTPServlet::AbstractServlet
+
+ STYLE = 'style="font-family: sans-serif; color: navy;"'
+ BANNER = '<p><img src="http://rd.cYcnus.de/coderay/coderay-banner" style="border: 0" alt="HIghlighted by CodeRay"/></p>'
+
+ def do_GET req, res
+ q = req.query_string || ''
+ args = Hash[*q.scan(/(.*?)=(.*?)(?:&|$)/).flatten].each_value { |v| v.replace url_decode(v) }
+ path = args.fetch 'path', '.'
+
+ backlinks = '<p>current path: %s<br />' % html_escape(path) +
+ (Pathname.new(path) + '..').cleanpath.to_s.to_link('up') + ' - ' +
+ '.'.to_link('current') + '</p>'
+
+ res.body =
+ if File.dir? path
+ path = Pathname.new(path).cleanpath.to_s
+ dirs, files = Dir[File.join(path, '*')].sort.partition { |p| File.dir? p }
+
+ page = "<html><head></head><body #{STYLE}>"
+ page << backlinks
+
+ page << '<dl>'
+ page << "<dt>Directories</dt>\n" + dirs.map do |p|
+ "<dd>#{p.to_link}</dd>\n"
+ end.join << "\n"
+ page << "<dt>Files</dt>\n" + files.map do |p|
+ "<dd>#{p.to_link}</dd>\n"
+ end.join << "\n"
+ page << "</dl>\n"
+ page << "#{BANNER}</body></html>"
+
+ elsif File.exist? path
+ div = CodeRay.scan_file(path).html :tab_width => 8, :wrap => :div
+ div.replace <<-DIV
+ <div #{STYLE}>
+ #{backlinks}
+#{div}
+ </div>
+ #{BANNER}
+ DIV
+ div.page
+ end
+
+ res['Content-Type'] = 'text/html'
+ end
+end
+
+# 0xCODE = 49374
+module CodeRay
+ PORT = 0xC0DE
+end
+
+server = WEBrick::HTTPServer.new :Port => CodeRay::PORT
+
+server.mount '/', CodeRayServlet
+
+server.mount_proc '/version' do |req, res|
+ res.body = 'CodeRay::Version = ' + CodeRay::Version
+ res['Content-Type'] = "text/plain"
+end
+
+trap("INT") { server.shutdown }
+server.start
diff --git a/demo/demo_simple.rb b/demo/demo_simple.rb new file mode 100644 index 0000000..a3129b0 --- /dev/null +++ b/demo/demo_simple.rb @@ -0,0 +1,10 @@ +
+# Load CodeRay
+# If this doesn't work, try ruby -rubygems.
+require 'coderay'
+
+# Generate HTML page for Ruby code.
+page = CodeRay.scan("puts 'Hello, world!'", :ruby).span
+
+# Print it
+puts page
diff --git a/demo/demo_stream.rb b/demo/demo_stream.rb new file mode 100644 index 0000000..b1d8560 --- /dev/null +++ b/demo/demo_stream.rb @@ -0,0 +1,8 @@ +$: << '..'
+require 'coderay'
+
+e = CodeRay.encoder(:html)
+t = e.encode_stream('a LOT of :code', :ruby)
+
+puts t
+p t.class
diff --git a/demo/demo_stream2.rb b/demo/demo_stream2.rb new file mode 100644 index 0000000..8a6bec7 --- /dev/null +++ b/demo/demo_stream2.rb @@ -0,0 +1,8 @@ +require 'coderay'
+
+token_stream = CodeRay::TokenStream.new do |kind, text|
+ puts 'kind: %s, text size: %d.' % [kind, text.size]
+end
+
+token_stream << [:regexp, '/\d+/']
+#-> kind: rexpexp, text size: 5.
diff --git a/demo/demo_tokens.rb b/demo/demo_tokens.rb new file mode 100644 index 0000000..eb8d448 --- /dev/null +++ b/demo/demo_tokens.rb @@ -0,0 +1,3 @@ +require 'coderay'
+
+puts CodeRay.scan("puts 3 + 4, '3 + 4'", :ruby).tokens
diff --git a/lib/coderay.rb b/lib/coderay.rb new file mode 100644 index 0000000..17c315d --- /dev/null +++ b/lib/coderay.rb @@ -0,0 +1,169 @@ +# = CodeRay +# +# CodeRay is a Ruby library for syntax highlighting. +# +# I try to make CodeRay easy to use and intuitive, but at the same time fully featured, complete, +# fast and efficient. +# +# See README. +# +# It consists mainly of +# * the main engine: CodeRay, CodeRay::Scanner, CodeRay::Tokens, CodeRay::TokenStream, CodeRay::Encoder +# * the scanners in CodeRay::Scanners +# * the encoders in CodeRay::Encoders +# +# Here's a fancy graphic to light up this gray docu: +# +# http://rd.cYcnus.de/coderay/scheme.png +# +# == Documentation +# +# See CodeRay, Encoders, Scanners, Tokens. +# +# == Usage +# +# Remember you need RubyGems to use CodeRay. Run Ruby with -rubygems option +# if required. +# +# === Highlight Ruby code in a string as html +# +# require 'coderay' +# print CodeRay.scan('puts "Hello, world!"', :ruby).compact.html.page +# +# # prints something like this: +# puts <span class="s">"Hello, world!"</span> +# +# +# === Highlight C code from a file in a html div +# +# require 'coderay' +# print CodeRay.scan(File.read('ruby.h'), :c).html.div +# # print CodeRay.scan_file('ruby.h').html.div ## not working yet +# +# You can include this div in your page. The used CSS styles can be printed with +# +# % ruby -rcoderay -e "print CodeRay::Encoders[:html]::CSS" +# +# === Highlight without typing too much +# +# If you are one of the hasty (or lazy, or extremely curious) people, just run this file: +# +# % ruby -rubygems coderay.rb +# +# If the output was to fast for you, try +# +# % ruby -rubygems coderay.rb > example.html +# +# and look at the file it created. +# +module CodeRay + + Version = '0.4.2' + + require 'coderay/tokens' + require 'coderay/scanner' + require 'coderay/encoder' + + + class << self + + # Scans the given +code+ (a String) with the Scanner for +lang+. + # + # This is a simple way to use CodeRay. Example: + # require 'coderay' + # page = CodeRay.scan("puts 'Hello, world!'", :ruby).html + # + # See also demo/demo_simple. + def scan code, lang, options = {}, &block + scanner = Scanners[lang].new code, options, &block + scanner.tokenize + end + + # Scans +filename+ (a path to a code file) with the Scanner for +lang+. + # + # If +lang+ is :auto or omitted, the CodeRay::FileType module is used to + # determine it. If it cannot find out what type it is, it uses CodeRay::Scanners::Plaintext. + # + # Calls CodeRay.scan. + # + # Example: + # require 'coderay' + # page = CodeRay.scan_file('some_c_code.c').html + def scan_file filename, lang = :auto, options = {}, &block + file = IO.read filename + if lang == :auto + require 'coderay/helpers/filetype' + lang = FileType.fetch filename, :plaintext, true + end + scan file, lang, options = {}, &block + end + + # Scan the +code+ (a string) with the scanner for +lang+. + # + # Calls scan. + # + # See CodeRay.scan. + def scan_stream code, lang, options = {}, &block + options[:stream] = true + scan code, lang, options, &block + end + + # Encode +code+ with the Encoder for +format+ and the Scanner for +lang+. + # +options+ will be passed to the Encoder. + # + # See CodeRay::Encoder.encode_stream + def encode_stream code, lang, format, options = {} + encoder(format, options).encode_stream code, lang, options + end + + def encode code, lang, format, options = {} + encoder(format, options).encode code, lang, options + end + + # Finds the Encoder class for +format+ and creates an instance, passing + # +options+ to it. + # + # Example: + # require 'coderay' + # token_count = CodeRay.encoder(:count).encodea("puts 17 + 4\n", :ruby).to_i #-> 8 + # require 'coderay' + # + # stats = CodeRay.encoder(:statistic) + # stats.encode("puts 17 + 4\n", :ruby) + # + # puts '%d out of %d tokens have the kind :integer.' % [ + # stats.type_stats[:integer].count, + # stats.real_token_count + # ] + # #-> 2 out of 4 tokens have the kind :integer. + def encoder format, options = {} + Encoders[format].new options + end + + end + + # This Exception is raised when you try to stream with something that is not + # capable of streaming. + class NotStreamableError < Exception + def initialize obj + @obj = obj + end + + def to_s + '%s is not Streamable!' % @obj.class + end + end + + # A dummy module that is included by subclasses of CodeRay::Scanner an CodeRay::Encoder + # to show that they are able to handle streams. + module Streamable + end + +end + +# Run a test script. +if $0 == __FILE__ + $stderr.print 'Press key to print demo.'; gets + code = File.read($0)[/module CodeRay.*/m] + print CodeRay.scan(code, :ruby).html +end diff --git a/lib/coderay/encoder.rb b/lib/coderay/encoder.rb new file mode 100644 index 0000000..5f6d511 --- /dev/null +++ b/lib/coderay/encoder.rb @@ -0,0 +1,210 @@ +module CodeRay
+
+ # This module holds class Encoder and its subclasses.
+ # For example, the HTML encoder is named CodeRay::Encoders::HTML
+ # can be found in coderay/encoders/html.
+ #
+ # Encoders also provides methods and constants for the register mechanism
+ # and the [] method that returns the Encoder class belonging to the
+ # given format.
+ module Encoders
+
+ # Raised if Encoders[] fails because:
+ # * an file could not be found
+ # * the requested Encoder is not registered
+ EncoderNotFound = Class.new Exception
+
+ # Loaded Encoders are saved here.
+ ENCODERS = Hash.new do |h, lang|
+ path = Encoders.path_to lang
+ lang = lang.to_sym
+ begin
+ require path
+ rescue LoadError
+ raise EncoderNotFound, "#{path} not found."
+ else
+ # Encoder should have registered by now
+ unless h[lang]
+ raise EncoderNotFound, "No Encoder for #{lang} found in #{path}."
+ end
+ end
+ h[lang]
+ end
+
+ class << self
+
+ # Every Encoder class must register itself for one or more +formats+
+ # by calling register_for, which calls this method.
+ #
+ # See CodeRay::Encoder.register_for.
+ def register encoder_class, *formats
+ for format in formats
+ ENCODERS[format.to_sym] = encoder_class
+ end
+ end
+
+ # Returns the Encoder for +lang+.
+ #
+ # Example:
+ # require 'coderay'
+ # yaml_encoder = CodeRay::Encoders[:yaml]
+ def [] lang
+ ENCODERS[lang]
+ end
+
+ # Alias for +[]+.
+ alias load []
+
+ # Returns the path to the encoder for format.
+ def path_to plugin
+ File.join 'coderay', 'encoders', "#{plugin}.rb"
+ end
+
+ end
+
+
+ # The Encoder base class. Together with CodeRay::Scanner and
+ # CodeRay::Tokens, it forms the highlighting triad.
+ #
+ # Encoder instances take a Tokens object and do something with it.
+ #
+ # The most common Encoder is surely the HTML encoder
+ # (CodeRay::Encoders::HTML). It highlights the code in a colorful
+ # html page.
+ # If you want the highlighted code in a div or a span instead,
+ # use its subclasses Div and Span.
+ class Encoder
+
+ attr_reader :token_stream
+
+ class << self
+
+ # Register this class for the given langs.
+ #
+ # Example:
+ # class MyEncoder < CodeRay::Encoders:Encoder
+ # register_for :myenc
+ # ...
+ # end
+ #
+ # See Encoder.register.
+ def register_for *args
+ Encoders.register self, *args
+ end
+
+ # Returns if the Encoder can be used in streaming mode.
+ def streamable?
+ is_a? Streamable
+ end
+
+ # If FILE_EXTENSION isn't defined, this method returns the downcase
+ # class name instead.
+ def const_missing sym
+ if sym == :FILE_EXTENSION
+ sym.to_s.downcase
+ else
+ super
+ end
+ end
+
+ end
+
+ # Subclasses are to store their default options in this constant.
+ DEFAULT_OPTIONS = { :stream => false }
+
+ # The options you gave the Encoder at creating.
+ attr_accessor :options
+
+ # Creates a new Encoder.
+ # +options+ is saved and used for all encode operations, as long as you
+ # don't overwrite it there by passing additional options.
+ #
+ # Encoder objects provide three encode methods:
+ # - encode simply takes a +code+ string and a +lang+
+ # - encode_tokens expects a +tokens+ object instead
+ # - encode_stream is like encode, but uses streaming mode.
+ #
+ # Each method has an optional +options+ parameter. These are added to
+ # the options you passed at creation.
+ def initialize options = {}
+ @options = self.class::DEFAULT_OPTIONS.merge options
+ raise "I am only the basic Encoder class. I can't encode anything. :(\n" +
+ "Use my subclasses." if self.class == Encoder
+ end
+
+ # Encode a Tokens object.
+ def encode_tokens tokens, options = {}
+ options = @options.merge options
+ setup options
+ compile tokens, options
+ finish options
+ end
+
+ # Encode the given +code+ after tokenizing it using the Scanner for
+ # +lang+.
+ def encode code, lang, options = {}
+ options = @options.merge options
+ scanner_options = options.fetch(:scanner_options, {})
+ tokens = CodeRay.scan code, lang, scanner_options
+ encode_tokens tokens, options
+ end
+
+ # You can use highlight instead of encode, if that seems
+ # more clear to you.
+ alias highlight encode
+
+ # Encode the given +code+ using the Scanner for +lang+ in streaming
+ # mode.
+ def encode_stream code, lang, options = {}
+ raise NotStreamableError, self unless kind_of? Streamable
+ options = @options.merge options
+ setup options
+ scanner_options = options.fetch :scanner_options, {}
+ @token_stream = CodeRay.scan_stream code, lang, scanner_options, &self
+ finish options
+ end
+
+ # Behave like a proc. The tokens method is converted to a proc.
+ def to_proc
+ method(:token).to_proc
+ end
+
+ protected
+
+ # Called with merged options before encoding starts.
+ # Sets @out to an empty string.
+ #
+ # See the HTML Encoder for an example of option caching.
+ def setup options
+ @out = ''
+ end
+
+ # Called with +text+ and +kind+ of the currently scanned token.
+ # For simple scanners, it's enougth to implement this method.
+ #
+ # Raises a NotImplementedError exception if it is not overwritten in
+ # subclass.
+ def token text, kind
+ raise NotImplementedError, "#{self.class}#token not implemented."
+ end
+
+ # Called with merged options after encoding starts.
+ # The return value is the result of encoding, typically @out.
+ def finish options
+ @out
+ end
+
+ # Do the encoding.
+ #
+ # The already created +tokens+ object must be used; it can be a
+ # TokenStream or a Tokens object.
+ def compile tokens, options
+ tokens.each(&self)
+ end
+
+ end
+
+ end
+end
+
+# vim:sw=2:ts=2:et:tw=78
diff --git a/lib/coderay/encoders/count.rb b/lib/coderay/encoders/count.rb new file mode 100644 index 0000000..80aec57 --- /dev/null +++ b/lib/coderay/encoders/count.rb @@ -0,0 +1,20 @@ +module CodeRay
+module Encoders
+
+ class Count < Encoder
+
+ register_for :count
+
+ protected
+
+ def setup options
+ @out = 0
+ end
+
+ def token text, kind
+ @out += 1
+ end
+ end
+
+end
+end
diff --git a/lib/coderay/encoders/div.rb b/lib/coderay/encoders/div.rb new file mode 100644 index 0000000..640df0e --- /dev/null +++ b/lib/coderay/encoders/div.rb @@ -0,0 +1,16 @@ +module CodeRay module Encoders
+
+ require 'coderay/encoders/html'
+ class Div < HTML
+
+ FILE_EXTENSION = 'div.html'
+
+ register_for :div
+
+ DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge({
+ :css => :style,
+ :wrap => :div,
+ })
+ end
+
+end end
diff --git a/lib/coderay/encoders/helpers/html_css.rb b/lib/coderay/encoders/helpers/html_css.rb new file mode 100644 index 0000000..f9cadf7 --- /dev/null +++ b/lib/coderay/encoders/helpers/html_css.rb @@ -0,0 +1,168 @@ +module CodeRay module Encoders
+
+ class HTML
+ class CSS
+
+ def initialize stylesheet = TOKENS
+ @classes = Hash.new
+ parse stylesheet
+ end
+
+ def [] *styles
+ cl = @classes[styles.first]
+ return '' unless cl
+ style = false
+ 1.upto(cl.size + 1) do |offset|
+ break if style = cl[styles[offset .. -1]]
+ end
+ return style
+ end
+
+ private
+
+ CSS_CLASS = /
+ ( (?: # $1 = classes
+ \s* \. [-\w]+
+ )+ )
+ \s* \{
+ ( [^\}]* ) # $2 = style
+ \} \s*
+ |
+ ( . ) # $3 = error
+ /mx
+ def parse stylesheet
+ stylesheet.scan CSS_CLASS do |classes, style, error|
+ raise "CSS parse error: '#{error}' not recognized" if error
+ styles = classes.scan(/[-\w]+/)
+ cl = styles.pop
+ @classes[cl] ||= Hash.new
+ @classes[cl][styles] = style.strip
+ end
+ end
+
+ MAIN = <<-'MAIN'
+.code {
+ background-color: #FAFAFA;
+ border: 1px solid #D1D7DC;
+ font-family: 'Courier New', 'Terminal', monospace;
+ font-size: 10pt;
+ color: black;
+ vertical-align: top;
+ text-align: left;
+ padding: 0px;
+}
+span.code { white-space: pre; }
+.code tt { font-weight: bold; }
+.code pre {
+ font-size: 10pt;
+ margin: 0px 5px;
+}
+.code .code_table {
+ margin: 0px;
+}
+.code .line_numbers {
+ margin: 0px;
+ background-color:#DEF; color: #777;
+ vertical-align: top;
+ text-align: right;
+}
+.code .code_cell {
+ width: 100%;
+ background-color:#FAFAFA;
+ color: black;
+ vertical-align: top;
+ text-align: left;
+}
+.code .no {
+ background-color:#DEF;
+ color: #777;
+ padding: 0px 5px;
+ font-weight: normal;
+ font-style: normal;
+}
+
+.code tt { display: hidden; }
+
+ MAIN
+
+ TOKENS = <<-'TOKENS'
+.af { color:#00C; }
+.an { color:#007; }
+.av { color:#700; }
+.aw { color:#C00; }
+.bi { color:#509; font-weight:bold; }
+.c { color:#888; }
+
+.ch { color:#04D; /* background-color:#f0f0ff; */ }
+.ch .k { color:#04D; }
+.ch .dl { color:#039; }
+
+.cl { color:#B06; font-weight:bold; }
+.co { color:#036; font-weight:bold; }
+.cr { color:#0A0; }
+.cv { color:#369; }
+.df { color:#099; font-weight:bold; }
+.di { color:#088; font-weight:bold; }
+.dl { color:black; }
+.do { color:#970; }
+.ds { color:#D42; font-weight:bold; }
+.e { color:#666; font-weight:bold; }
+.er { color:#F00; background-color:#FAA; }
+.ex { color:#F00; font-weight:bold; }
+.fl { color:#60E; font-weight:bold; }
+.fu { color:#06B; font-weight:bold; }
+.gv { color:#d70; font-weight:bold; }
+.hx { color:#058; font-weight:bold; }
+.i { color:#00D; font-weight:bold; }
+.ic { color:#B44; font-weight:bold; }
+.in { color:#B2B; font-weight:bold; }
+.iv { color:#33B; }
+.la { color:#970; font-weight:bold; }
+.lv { color:#963; }
+.oc { color:#40E; font-weight:bold; }
+.on { color:#000; font-weight:bold; }
+.pc { color:#038; font-weight:bold; }
+.pd { color:#369; font-weight:bold; }
+.pp { color:#579; }
+.pt { color:#339; font-weight:bold; }
+.r { color:#080; font-weight:bold; }
+
+.rx { background-color:#fff0ff; }
+.rx .k { color:#808; }
+.rx .dl { color:#404; }
+.rx .mod { color:#C2C; }
+.rx .fu { color:#404; font-weight: bold; }
+
+.s { background-color:#fff0f0; }
+.s .s { background-color:#ffe0e0; }
+.s .s .s { background-color:#ffd0d0; }
+.s .k { color:#D20; }
+.s .dl { color:#710; }
+
+.sh { background-color:#f0fff0; }
+.sh .k { color:#2B2; }
+.sh .dl { color:#161; }
+
+.sy { color:#A60; }
+.sy .k { color:#A60; }
+.sy .dl { color:#630; }
+
+.ta { color:#070; }
+.tf { color:#070; font-weight:bold; }
+.ts { color:#D70; font-weight:bold; }
+.ty { color:#339; font-weight:bold; }
+.v { color:#036; }
+.xt { color:#444; }
+ TOKENS
+
+ DEFAULT_STYLESHEET = MAIN + TOKENS
+
+ end
+ end
+
+end end
+
+if $0 == __FILE__
+ require 'pp'
+ pp CodeRay::Encoders::HTML::CSS.new
+end
diff --git a/lib/coderay/encoders/helpers/html_helper.rb b/lib/coderay/encoders/helpers/html_helper.rb new file mode 100644 index 0000000..03ea0a2 --- /dev/null +++ b/lib/coderay/encoders/helpers/html_helper.rb @@ -0,0 +1,68 @@ +module CodeRay module Encoders
+
+ class HTML
+
+ ClassOfKind = {
+ :attribute_name => 'an',
+ :attribute_name_fat => 'af',
+ :attribute_value => 'av',
+ :attribute_value_fat => 'aw',
+ :bin => 'bi',
+ :char => 'ch',
+ :class => 'cl',
+ :class_variable => 'cv',
+ :color => 'cr',
+ :comment => 'c',
+ :constant => 'co',
+ :content => 'k',
+ :definition => 'df',
+ :delimiter => 'dl',
+ :directive => 'di',
+ :doc => 'do',
+ :doc_string => 'ds',
+ :error => 'er',
+ :escape => 'e',
+ :exception => 'ex',
+ :float => 'fl',
+ :function => 'fu',
+ :global_variable => 'gv',
+ :hex => 'hx',
+ :include => 'ic',
+ :instance_variable => 'iv',
+ :integer => 'i',
+ :interpreted => 'in',
+ :label => 'la',
+ :local_variable => 'lv',
+ :modifier => 'mod',
+ :oct => 'oc',
+ :operator_name => 'on',
+ :pre_constant => 'pc',
+ :pre_type => 'pt',
+ :predefined => 'pd',
+ :preprocessor => 'pp',
+ :regexp => 'rx',
+ :reserved => 'r',
+ :shell => 'sh',
+ :string => 's',
+ :symbol => 'sy',
+ :tag => 'ta',
+ :tag_fat => 'tf',
+ :tag_special => 'ts',
+ :type => 'ty',
+ :variable => 'v',
+ :xml_text => 'xt',
+
+ :ident => :NO_HIGHLIGHT, # 'id'
+ :operator => :NO_HIGHLIGHT, # 'op'
+ :space => :NO_HIGHLIGHT, # 'sp'
+ :plain => :NO_HIGHLIGHT,
+ }
+ ClassOfKind[:procedure] = ClassOfKind[:method] = ClassOfKind[:function]
+ ClassOfKind[:open] = ClassOfKind[:close] = ClassOfKind[:delimiter]
+ ClassOfKind[:nesting_delimiter] = ClassOfKind[:delimiter]
+ ClassOfKind[:escape] = ClassOfKind[:delimiter]
+ ClassOfKind.default = ClassOfKind[:error] or raise 'no class found for :error!'
+
+ end
+
+end end
diff --git a/lib/coderay/encoders/helpers/html_output.rb b/lib/coderay/encoders/helpers/html_output.rb new file mode 100644 index 0000000..e2b26e7 --- /dev/null +++ b/lib/coderay/encoders/helpers/html_output.rb @@ -0,0 +1,240 @@ +module CodeRay
+ module Encoders
+
+ class HTML
+
+ # This module is included in the output String from thew HTML Encoder.
+ #
+ # It provides methods like wrap, div, page etc.
+ #
+ # Remember to use #clone instead of #dup to keep the modules the object was
+ # extended with.
+ #
+ # TODO: more doc.
+ module Output
+
+ class << self
+
+ # This makes Output look like a class.
+ #
+ # Example:
+ #
+ # a = Output.new '<span class="co">Code</span>'
+ # a.wrap! :page
+ def new string, element = nil
+ output = string.clone.extend self
+ output.wrapped_in = element
+ output
+ end
+
+ # Raises an exception if an object that doesn't respond to to_str is extended by Output,
+ # to prevent users from misuse. Use Module#remove_method to disable.
+ def extended o
+ warn "The Output module is intended to extend instances of String, not #{o.class}." unless o.respond_to? :to_str
+ end
+
+ def page_template_for_css css = :default
+ css = CSS::DEFAULT_STYLESHEET if css == :default
+ PAGE.apply 'CSS', css
+ end
+
+ # Define a new wrapper. This is meta programming.
+ def wrapper *wrappers
+ wrappers.each do |wrapper|
+ define_method wrapper do |*args|
+ wrap wrapper, *args
+ end
+ define_method(:"#{wrapper}!") do |*args|
+ wrap! wrapper, *args
+ end
+ end
+ end
+ end
+
+ wrapper :div, :span, :page
+
+ def wrapped_in
+ @wrapped_in || nil
+ end
+ attr_writer :wrapped_in
+
+ def wrapped_in? element
+ wrapped_in == element
+ end
+
+ def wrap_in template
+ clone.wrap_in! template
+ end
+
+ def wrap_in! template
+ Template.wrap! self, template, 'CONTENT'
+ self
+ end
+
+ def wrap! element, *args
+ return self if not element or element == wrapped_in
+ case element
+ when :div
+ raise "Can't wrap %p in %p" % [wrapped_in, element] unless wrapped_in? nil
+ wrap_in! DIV
+ when :span
+ raise "Can't wrap %p in %p" % [wrapped_in, element] unless wrapped_in? nil
+ wrap_in! SPAN
+ when :page
+ wrap! :div if wrapped_in? nil
+ raise "Can't wrap %p in %p" % [wrapped_in, element] unless wrapped_in? :div
+ wrap_in! Output.page_template_for_css
+ when nil
+ return self
+ else
+ raise "Unknown value %p for :wrap" % element
+ end
+ @wrapped_in = element
+ self
+ end
+
+ def wrap *args
+ clone.wrap!(*args)
+ end
+
+ def numerize! mode = :table, options = {}
+ return self unless mode
+
+ offset = options.fetch :line_numbers_offset, DEFAULT_OPTIONS[:line_numbers_offset]
+ unless offset.is_a? Integer
+ raise ArgumentError, "Invalid value %p for :offset; Integer expected." % offset
+ end
+
+ unless NUMERIZABLE_WRAPPINGS.include? options[:wrap]
+ raise ArgumentError, "Can't numerize, :wrap must be in %p, but is %p" % [NUMERIZABLE_WRAPPINGS, options[:wrap]]
+ end
+
+ bold_every = options.fetch :bold_every, DEFAULT_OPTIONS[:bold_every]
+ bolding =
+ if bold_every == :no_bolding or bold_every == 0
+ proc { |line| line.to_s }
+ elsif bold_every.is_a? Integer
+ proc do |line|
+ if line % bold_every == 0
+ "<strong>#{line}</strong>" # every bold_every-th number in bold
+ else
+ line.to_s
+ end
+ end
+ else
+ raise ArgumentError, "Invalid value %p for :bolding; :no_bolding or Integer expected." % bolding
+ end
+
+ line_count = count("\n")
+ line_count += 1 if self[-1] != ?\n
+
+ case mode
+ when :inline
+ max_width = line_count.to_s.size
+ line = offset - 1
+ gsub!(/^/) do
+ line += 1
+ line_number = bolding.call line
+ "<span class=\"no\">#{ line_number.rjust(max_width) }</span> "
+ end
+ wrap! :div
+
+ when :table
+ # This is really ugly.
+ # Because even monospace fonts seem to have different heights when bold,
+ # I make the newline bold, both in the code and the line numbers.
+ # FIXME Still not working perfect for Mr. Internet Exploder
+ line_numbers = (offset ... offset + line_count).to_a.map(&bolding).join("\n")
+ line_numbers << "\n" # also for Mr. MS Internet Exploder :-/
+ line_numbers.gsub!(/\n/) { "<tt>\n</tt>" }
+
+ line_numbers_tpl = DIV_TABLE.apply('LINE_NUMBERS', line_numbers)
+ gsub!(/\n/) { "<tt>\n</tt>" }
+ wrap_in! line_numbers_tpl
+ @wrapped_in = :div
+
+ else
+ raise ArgumentError, "Unknown value %p for mode: :inline or :table expected" % mode
+ end
+
+ self
+ end
+
+ def numerize *args
+ clone.numerize!(*args)
+ end
+
+ class Template < String
+
+ def self.wrap! str, template, target
+ target = Regexp.new(Regexp.escape("<%#{target}%>"))
+ if template =~ target
+ str[0,0] = $`
+ str << $'
+ else
+ raise "Template target <%%%p%%> not found" % target
+ end
+ end
+
+ def apply target, replacement
+ target = Regexp.new(Regexp.escape("<%#{target}%>"))
+ if self =~ target
+ Template.new($` + replacement + $')
+ else
+ raise "Template target <%%%p%%> not found" % target
+ end
+ end
+
+ module Simple
+ def ` str #`
+ Template.new str
+ end
+ end
+ end
+
+ extend Template::Simple
+
+#-- don't include the templates in docu
+
+ SPAN = `<span class="code"><%CONTENT%></span>`
+
+ DIV, DIV_TABLE, PAGE =
+ <<-`DIV`, <<-`DIV_TABLE`, <<-`PAGE`
+
+<div class="code">
+<pre><%CONTENT%></pre>
+</div>
+ DIV
+
+<div class="code">
+ <table class="code_table">
+ <tr>
+ <td class="line_numbers"><pre><%LINE_NUMBERS%></pre></td>
+ <td class="code_cell"><div class="nowrap"><pre><%CONTENT%></pre></div></td>
+ </tr>
+ </table>
+</div>
+ DIV_TABLE
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="de">
+<head>
+ <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
+ <title>CodeRay HTML Encoder Example</title>
+ <style type="text/css">
+<%CSS%>
+ </style>
+</head>
+<body style="background-color: white;">
+<%CONTENT%>
+</body>
+</html>
+ PAGE
+
+ end
+
+ end
+
+end
+end
diff --git a/lib/coderay/encoders/html.rb b/lib/coderay/encoders/html.rb new file mode 100644 index 0000000..69b6e22 --- /dev/null +++ b/lib/coderay/encoders/html.rb @@ -0,0 +1,167 @@ +module CodeRay
+module Encoders
+
+ class HTML < Encoder
+
+ include Streamable
+ register_for :html
+
+ FILE_EXTENSION = 'html'
+
+ DEFAULT_OPTIONS = {
+ :tab_width => 8,
+
+ :level => :xhtml,
+ :css => :class,
+
+ :wrap => :page,
+ :line_numbers => :table,
+ :line_numbers_offset => 1,
+ :bold_every => 10,
+ }
+ NUMERIZABLE_WRAPPINGS = [:div, :page]
+
+ require 'coderay/encoders/helpers/html_helper'
+ require 'coderay/encoders/helpers/html_output'
+ require 'coderay/encoders/helpers/html_css'
+
+ def initialize(*)
+ super
+ @last_options = nil
+ end
+
+ protected
+
+ HTML_ESCAPE = { #:nodoc:
+ '&' => '&',
+ '"' => '"',
+ '>' => '>',
+ '<' => '<',
+ }
+
+ # This is to prevent illegal HTML.
+ # Strange chars should still be avoided in codes.
+ evil_chars = Array(0x00...0x20) - [?n, ?t]
+ evil_chars.each { |i| HTML_ESCAPE[i.chr] = ' ' }
+ ansi_chars = Array(0x7f..0xff)
+ ansi_chars.each { |i| HTML_ESCAPE[i.chr] = '&#%d;' % i }
+ # \x9 (\t) and \xA (\n) not included
+ HTML_ESCAPE_PATTERN = /[&"><\0-\x8\xB-\x1f\x7f-\xff]/
+
+ def setup options
+ if options[:line_numbers] and not NUMERIZABLE_WRAPPINGS.include? options[:wrap]
+ warn ':line_numbers wanted, but :wrap is %p' % options[:wrap]
+ end
+ super
+ return if options == @last_options
+ @last_options = options
+
+ @HTML_ESCAPE = HTML_ESCAPE.dup
+ @HTML_ESCAPE["\t"] = ' ' * options[:tab_width]
+
+ @opened = [nil]
+ @css = CSS.new
+
+ case options[:css]
+
+ when :class
+ @css_style = Hash.new do |h, k|
+ if k.is_a? Array
+ type = k.first
+ else
+ type = k
+ end
+ c = ClassOfKind[type]
+ if c == :NO_HIGHLIGHT
+ h[k] = false
+ else
+ if options[:debug]
+ debug_info = ' title="%p"' % [ k ]
+ else
+ debug_info = ''
+ end
+ h[k] = '<span%s class="%s">' % [debug_info, c]
+ end
+ end
+
+ when :style
+ @css_style = Hash.new do |h, k|
+ if k.is_a? Array
+ styles = k.dup
+ else
+ styles = [k]
+ end
+ styles.map! { |c| ClassOfKind[c] }
+ if styles.first == :NO_HIGHLIGHT
+ h[k] = false
+ else
+ if options[:debug]
+ debug_info = ' title="%s"' % [ styles.inspect.gsub(/#{HTML_ESCAPE_PATTERN}/o) { |m| @HTML_ESCAPE[m] } ]
+ else
+ debug_info = ''
+ end
+ style = @css[*styles]
+ h[k] =
+ if style
+ '<span%s style="%s">' % [debug_info, style]
+ else
+ false
+ end
+ end
+ end
+
+ else
+ raise "Unknown value %p for :css." % options[:css]
+
+ end
+ end
+
+ def finish options
+ not_needed = @opened.shift
+ @out << '</span>' * @opened.size
+
+ @out.extend Output
+ @out.numerize! options[:line_numbers], options # if options[:line_numbers]
+ @out.wrap! options[:wrap] # if options[:wrap]
+
+ #require 'pp'
+ #pp @css_style, @css_style.size
+
+ super
+ end
+
+ def token text, type
+ if text.is_a? String
+ # be careful when streaming: text is changed!
+ text.gsub!(/#{HTML_ESCAPE_PATTERN}/o) { |m| @HTML_ESCAPE[m] }
+ @opened[0] = type
+ style = @css_style[@opened]
+ if style
+ @out << style << text << '</span>'
+ else
+ @out << text
+ end
+ else
+ case text
+ when :open
+ @opened[0] = type
+ @out << @css_style[@opened]
+ @opened << type
+ when :close
+ unless @opened.empty?
+ raise 'Not Token to be closed.' unless @opened.size > 1
+ @out << '</span>'
+ @opened.pop
+ end
+ when nil
+ raise 'Token with nil as text was given: %p' % [[text, type]]
+ else
+ raise 'unknown token kind: %p' % text
+ end
+ end
+ end
+
+ end
+
+end
+end
diff --git a/lib/coderay/encoders/null.rb b/lib/coderay/encoders/null.rb new file mode 100644 index 0000000..67c4987 --- /dev/null +++ b/lib/coderay/encoders/null.rb @@ -0,0 +1,20 @@ +module CodeRay
+ module Encoders
+
+ class Null < Encoder
+
+ include Streamable
+ register_for :null
+
+ protected
+
+ def token(*)
+ # do nothing
+ end
+
+ end
+
+ end
+end
+
+
diff --git a/lib/coderay/encoders/span.rb b/lib/coderay/encoders/span.rb new file mode 100644 index 0000000..a7715f4 --- /dev/null +++ b/lib/coderay/encoders/span.rb @@ -0,0 +1,17 @@ +module CodeRay module Encoders
+
+ require 'coderay/encoders/html'
+ class Span < HTML
+
+ FILE_EXTENSION = 'span.html'
+
+ register_for :span
+
+ DEFAULT_OPTIONS = HTML::DEFAULT_OPTIONS.merge({
+ :css => :style,
+ :wrap => :span,
+ :line_numbers => nil,
+ })
+ end
+
+end end
diff --git a/lib/coderay/encoders/statistic.rb b/lib/coderay/encoders/statistic.rb new file mode 100644 index 0000000..0685c03 --- /dev/null +++ b/lib/coderay/encoders/statistic.rb @@ -0,0 +1,74 @@ +module CodeRay module Encoders
+
+ # Makes a statistic for the given tokens.
+ class Statistic < Encoder
+
+ include Streamable
+ register_for :stats, :statistic
+
+ attr_reader :type_stats, :real_token_count
+
+ protected
+
+ TypeStats = Struct.new :count, :size
+
+ def setup options
+ @type_stats = Hash.new { |h, k| h[k] = TypeStats.new 0, 0 }
+ @real_token_count = 0
+ end
+
+ def generate tokens, options
+ @tokens = tokens
+ super
+ end
+
+ def token text, type
+ @type_stats['TOTAL'].count += 1
+ if text.is_a? String
+ @real_token_count += 1 unless type == :space
+ @type_stats[type].count += 1
+ @type_stats[type].size += text.size
+ @type_stats['TOTAL'].size += text.size
+ else
+ @content_type = type
+ @type_stats['open/close'].count += 1
+ end
+ end
+
+ STATS = <<-STATS
+
+Code Statistics
+
+Tokens %8d
+ Non-Whitespace %8d
+Bytes Total %8d
+
+Token Types (%d):
+ type count ratio size (average)
+-------------------------------------------------------------
+%s
+ STATS
+# space 12007 33.81 % 1.7
+ TOKEN_TYPES_ROW = <<-TKR
+ %-20s %8d %6.2f %% %5.1f
+ TKR
+
+ def finish options
+ all = @type_stats['TOTAL']
+ all_count, all_size = all.count, all.size
+ @type_stats.each do |type, stat|
+ stat.size /= stat.count.to_f
+ end
+ types_stats = @type_stats.sort_by { |k, v| -v.count }.map do |k, v|
+ TOKEN_TYPES_ROW % [k, v.count, 100.0 * v.count / all_count, v.size]
+ end.join
+ STATS % [
+ all_count, @real_token_count, all_size,
+ @type_stats.delete_if { |k, v| k.is_a? String }.size,
+ types_stats
+ ]
+ end
+
+ end
+
+end end
diff --git a/lib/coderay/encoders/text.rb b/lib/coderay/encoders/text.rb new file mode 100644 index 0000000..4f0a754 --- /dev/null +++ b/lib/coderay/encoders/text.rb @@ -0,0 +1,33 @@ +module CodeRay
+ module Encoders
+
+ class Text < Encoder
+
+ include Streamable
+ register_for :text
+
+ FILE_EXTENSION = 'txt'
+
+ DEFAULT_OPTIONS = {
+ :separator => ''
+ }
+
+ protected
+ def setup options
+ super
+ @sep = options[:separator]
+ end
+
+ def token text, kind
+ return unless text.respond_to :to_str
+ @out << text + @sep
+ end
+
+ def finish options
+ @out.chomp @sep
+ end
+
+ end
+
+ end
+end
diff --git a/lib/coderay/encoders/tokens.rb b/lib/coderay/encoders/tokens.rb new file mode 100644 index 0000000..4573307 --- /dev/null +++ b/lib/coderay/encoders/tokens.rb @@ -0,0 +1,44 @@ +module CodeRay
+ module Encoders
+
+ # The Tokens encoder converts the tokens to a simple
+ # readable format. It doesn't use colors and is mainly
+ # intended for console output.
+ #
+ # The tokens are converted with Tokens.write_token.
+ #
+ # The format is:
+ #
+ # <token-kind> \t <escaped token-text> \n
+ #
+ # Example:
+ #
+ # require 'coderay'
+ # puts CodeRay.scan("puts 3 + 4", :ruby).tokens
+ #
+ # prints:
+ #
+ # ident puts
+ # space
+ # integer 3
+ # space
+ # operator +
+ # space
+ # integer 4
+ #
+ class Tokens < Encoder
+
+ include Streamable
+ register_for :tokens
+
+ FILE_EXTENSION = 'tok'
+
+ protected
+ def token *args
+ @out << CodeRay::Tokens.write_token(*args)
+ end
+
+ end
+
+ end
+end
diff --git a/lib/coderay/encoders/yaml.rb b/lib/coderay/encoders/yaml.rb new file mode 100644 index 0000000..4e2b7a1 --- /dev/null +++ b/lib/coderay/encoders/yaml.rb @@ -0,0 +1,19 @@ +module CodeRay
+ module Encoders
+
+ class YAML < Encoder
+
+ register_for :yaml
+
+ FILE_EXTENSION = 'yaml'
+
+ protected
+ def compile tokens, options
+ require 'yaml'
+ @out = tokens.to_a.to_yaml
+ end
+
+ end
+
+ end
+end
diff --git a/lib/coderay/helpers/filetype.rb b/lib/coderay/helpers/filetype.rb new file mode 100644 index 0000000..7f34c35 --- /dev/null +++ b/lib/coderay/helpers/filetype.rb @@ -0,0 +1,145 @@ +# =FileType
+#
+# A simple filetype recognizer
+#
+# Author: murphy (mail to murphy cYcnus de)
+#
+# Version: 0.1 (2005.september.1)
+#
+# ==Documentation
+#
+# TODO
+#
+module FileType
+
+ UnknownFileType = Class.new Exception
+
+ class << self
+
+ def [] filename, read_shebang = false
+ name = File.basename filename
+ ext = File.extname name
+ ext.sub!(/^\./, '') # delete the leading dot
+
+ type =
+ TypeFromExt[ext] ||
+ TypeFromExt[ext.downcase] ||
+ TypeFromName[name] ||
+ TypeFromName[name.downcase]
+ type ||= shebang(filename) if read_shebang
+
+ type
+ end
+
+ def shebang filename
+ begin
+ File.open filename, 'r' do |f|
+ first_line = f.gets
+ first_line[TypeFromShebang]
+ end
+ rescue IOError
+ nil
+ end
+ end
+
+ # This works like Hash#fetch.
+ def fetch filename, default = nil, read_shebang = false
+ if default and block_given?
+ warn 'block supersedes default value argument'
+ end
+
+ unless type = self[filename, read_shebang]
+ return yield if block_given?
+ return default if default
+ raise UnknownFileType, 'Could not determine type of %p.' % filename
+ end
+ type
+ end
+
+ end
+
+ TypeFromExt = {
+ 'rb' => :ruby,
+ 'rbw' => :ruby,
+ 'cpp' => :cpp,
+ 'c' => :c,
+ 'h' => :c,
+ 'xml' => :xml,
+ 'htm' => :html,
+ 'html' => :html,
+ }
+
+ TypeFromShebang = /\b(?:ruby|perl|python|sh)\b/
+
+ TypeFromName = {
+ 'Rakefile' => :ruby,
+ 'Rantfile' => :ruby,
+ }
+
+end
+
+if $0 == __FILE__
+ $VERBOSE = true
+ eval DATA.read, nil, $0, __LINE__+4
+end
+
+__END__
+
+require 'test/unit'
+
+class TC_FileType < Test::Unit::TestCase
+
+ def test_fetch
+ assert_raise FileType::UnknownFileType do
+ FileType.fetch ''
+ end
+
+ assert_throws :not_found do
+ FileType.fetch '.' do
+ throw :not_found
+ end
+ end
+
+ assert_equal :default, FileType.fetch('c', :default)
+
+ stderr, fake_stderr = $stderr, Object.new
+ $err = ''
+ def fake_stderr.write x
+ $err << x
+ end
+ $stderr = fake_stderr
+ FileType.fetch('c', :default) { }
+ assert_equal "block supersedes default value argument\n", $err
+ $stderr = stderr
+ end
+
+ def test_ruby
+ assert_equal :ruby, FileType['test.rb']
+ assert_equal :ruby, FileType['C:\\Program Files\\x\\y\\c\\test.rbw']
+ assert_equal :ruby, FileType['/usr/bin/something/Rakefile']
+ assert_equal :ruby, FileType['~/myapp/gem/Rantfile']
+ assert_not_equal :ruby, FileType['test_rb']
+ assert_not_equal :ruby, FileType['Makefile']
+ assert_not_equal :ruby, FileType['set.rb/set']
+ assert_not_equal :ruby, FileType['~/projects/blabla/rb']
+ end
+
+ def test_c
+ assert_equal :c, FileType['test.c']
+ assert_equal :c, FileType['C:\\Program Files\\x\\y\\c\\test.h']
+ assert_not_equal :c, FileType['test_c']
+ assert_not_equal :c, FileType['Makefile']
+ assert_not_equal :c, FileType['set.h/set']
+ assert_not_equal :c, FileType['~/projects/blabla/c']
+ end
+
+ def test_shebang
+ dir = './test'
+ if File.directory? dir
+ Dir.chdir dir do
+ assert_equal :c, FileType['test.c']
+ end
+ end
+ end
+
+end
diff --git a/lib/coderay/helpers/gzip_simple.rb b/lib/coderay/helpers/gzip_simple.rb new file mode 100644 index 0000000..02d1ffd --- /dev/null +++ b/lib/coderay/helpers/gzip_simple.rb @@ -0,0 +1,123 @@ +# =GZip Simple
+#
+# A simplified interface to the gzip library +zlib+ (from the Ruby Standard Library.)
+#
+# Author: murphy (mail to murphy cYcnus de)
+#
+# Version: 0.2 (2005.may.28)
+#
+# ==Documentation
+#
+# See +GZip+ module and the +String+ extensions.
+#
+module GZip
+
+ require 'zlib'
+
+ # The default zipping level. 7 zips good and fast.
+ DEFAULT_GZIP_LEVEL = 7
+
+ # Unzips the given string +s+.
+ #
+ # Example:
+ # require 'gzip_simple'
+ # print GZip.gunzip(File.read('adresses.gz'))
+ #
+ def GZip.gunzip s
+ Zlib::Inflate.inflate s
+ end
+
+ # Zips the given string +s+.
+ #
+ # Example:
+ # require 'gzip_simple'
+ # File.open('adresses.gz', 'w') do |file
+ # file.write GZip.gzip('Mum: 0123 456 789', 9)
+ # end
+ #
+ # If you provide a +level+, you can control how strong
+ # the string is compressed:
+ # - 0: no compression, only convert to gzip format
+ # - 1: compress fast
+ # - 7: compress more, but still fast (default)
+ # - 8: compress more, slower
+ # - 9: compress best, very slow
+ def GZip.gzip s, level = DEFAULT_GZIP_LEVEL
+ Zlib::Deflate.new(level).deflate s, Zlib::FINISH
+ end
+end
+
+# String extensions to use the GZip module.
+#
+# The methods gzip and gunzip provide an even more simple
+# interface to the ZLib:
+#
+# # create a big string
+# x = 'a' * 1000
+#
+# # zip it
+# x_gz = x.gzip
+#
+# # test the result
+# puts 'Zipped %d bytes to %d bytes.' % [x.size, x_gz.size]
+# #-> Zipped 1000 bytes to 19 bytes.
+#
+# # unzipping works
+# p x_gz.gunzip == x #-> true
+class String
+ # Returns the string, unzipped.
+ # See GZip.gunzip
+ def gunzip
+ GZip.gunzip self
+ end
+ # Replaces the string with its unzipped value.
+ # See GZip.gunzip
+ def gunzip!
+ replace gunzip
+ end
+
+ # Returns the string, zipped.
+ # +level+ is the gzip compression level, see GZip.gzip.
+ def gzip level = GZip::DEFAULT_GZIP_LEVEL
+ GZip.gzip self, level
+ end
+ # Replaces the string with its zipped value.
+ # See GZip.gzip.
+ def gzip!(*args)
+ replace gzip(*args)
+ end
+end
+
+if $0 == __FILE__
+ eval DATA.read, nil, $0, __LINE__+4
+end
+
+__END__
+#CODE
+
+# Testing / Benchmark
+x = 'a' * 1000
+x_gz = x.gzip
+puts 'Zipped %d bytes to %d bytes.' % [x.size, x_gz.size] #-> Zipped 1000 bytes to 19 bytes.
+p x_gz.gunzip == x #-> true
+
+require 'benchmark'
+
+INFO = 'packed to %0.3f%%' # :nodoc:
+
+x = Array.new(100000) { rand(255).chr + 'aaaaaaaaa' + rand(255).chr }.join
+Benchmark.bm(10) do |bm|
+ for level in 0..9
+ bm.report "zip #{level}" do
+ $x = x.gzip level
+ end
+ puts INFO % [100.0 * $x.size / x.size]
+ end
+ bm.report 'zip' do
+ $x = x.gzip
+ end
+ puts INFO % [100.0 * $x.size / x.size]
+ bm.report 'unzip' do
+ $x.gunzip
+ end
+end
diff --git a/lib/coderay/helpers/scanner_helper.rb b/lib/coderay/helpers/scanner_helper.rb new file mode 100644 index 0000000..a2e14bb --- /dev/null +++ b/lib/coderay/helpers/scanner_helper.rb @@ -0,0 +1,63 @@ +module CodeRay
+module Scanners
+
+ class Scanner
+
+ # A WordList is a Hash with some additional features.
+ # It is intended to be used for keyword recognition.
+ class WordList < Hash
+
+ def initialize default = false, case_mode = :case_match
+ @case_ignore =
+ case case_mode
+ when :case_match then false
+ when :case_ignore then true
+ else
+ raise ArgumentError,
+ "#{self.class.name}.new: second argument must be :case_ignore or :case_match, but #{case_mode} was given."
+ end
+
+ if @case_ignore
+ super() do |h, k|
+ h[k] = h.fetch k.downcase, default
+ end
+ else
+ super default
+ end
+ end
+
+ def include? word
+ self[word] if @case_ignore
+ has_key? word
+ end
+
+ def add words, kind = true
+ words.each do |word|
+ self[mind_case(word)] = kind
+ end
+ self
+ end
+
+ alias words keys
+
+ def case_ignore?
+ @case_mode
+ end
+
+ private
+ def mind_case word
+ if @case_ignore
+ word.downcase
+ else
+ word.dup
+ end
+ end
+
+ end
+
+ end
+
+end
+end
+
+# vim:sw=2:ts=2:et:tw=78
diff --git a/lib/coderay/scanner.rb b/lib/coderay/scanner.rb new file mode 100644 index 0000000..1cca607 --- /dev/null +++ b/lib/coderay/scanner.rb @@ -0,0 +1,298 @@ +module CodeRay
+
+ # This module holds class Scanner and its subclasses.
+ # For example, the Ruby scanner is named CodeRay::Scanners::Ruby
+ # can be found in coderay/scanners/ruby.
+ #
+ # Scanner also provides methods and constants for the register mechanism
+ # and the [] method that returns the Scanner class belonging to the
+ # given lang.
+ module Scanners
+
+ # Raised if Scanners[] fails because:
+ # * a file could not be found
+ # * the requested Scanner is not registered
+ ScannerNotFound = Class.new(Exception)
+
+ # Loaded Scanners are saved here.
+ SCANNERS = Hash.new { |h, lang|
+ raise ScannerNotFound, "No scanner for #{lang} found."
+ }
+
+ class << self
+
+ # Registers a scanner class by setting SCANNERS[lang].
+ #
+ # Typically used in Scanners, for example in the Ruby scanner:
+ #
+ # register_for :ruby
+ def register scanner_class, *langs
+ for lang in langs
+ raise ArgumentError, 'lang must be a Symbol, but it was a %s' % lang.class unless lang.is_a? Symbol
+ SCANNERS[lang] = scanner_class
+ end
+ end
+
+ # Loads the scanner class for +lang+ and returns it.
+ #
+ # Example:
+ #
+ # Scanners[:xml].new
+ #
+ # +lang+ is converted using +normalize+ and must be
+ # * a String containing only alphanumeric characters (\w+)
+ # * a Symbol
+ #
+ # Strings are converted to lowercase symbols (so +'C'+ and +'c'+ load the
+ # same scanner, namely the one registered for +:c+.)
+ #
+ # If the scanner isn't registered yet, it is searched.
+ # CodeRay expects that the scanner class is defined in
+ #
+ # <install-dir>/coderay/scanners/<lang>.rb
+ #
+ # (See path_to.)
+ #
+ # If the file isn't found, a ScannerNotFound exception is raised
+ #
+ # The scanner should register itself using +register+. If the scanner is
+ # still not found (because has not registered or registered under another lang),
+ # a ScannerNotFound exception is raised.
+ def [] lang
+ lang = normalize lang
+
+ SCANNERS.fetch lang do
+ scanner_file = path_to lang
+
+ begin
+ require scanner_file
+ rescue LoadError
+ raise ScannerNotFound, "File #{scanner_file} not found."
+ end
+
+ SCANNERS.fetch lang do
+ raise ScannerNotFound, <<-ERR
+No scanner for #{lang} found in #{scanner_file}.
+Known scanners: #{SCANNERS}
+ ERR
+ end
+ end
+ end
+
+ # Alias for +[]+.
+ alias load []
+
+ # Calculates the path where a scanner for +lang+
+ # is expected to be. This is:
+ #
+ # <install-dir>/coderay/scanners/<lang>.rb
+ def path_to lang
+ File.join 'coderay', 'scanners', "#{lang}.rb"
+ end
+
+ # Returns an array of all filenames in the scanners/ folder.
+ # The extension +.rb+ is not included.
+ def languages
+ scanners = File.join File.dirname(__FILE__), 'scanners', '*.rb'
+ Dir[scanners].map do |file|
+ File.basename file, '.rb'
+ end
+ end
+
+ # Loads all scanners that +languages+ finds using +load+.
+ def load_all
+ for lang in languages
+ load lang
+ end
+ end
+
+ # Converts +lang+ to a downcase Symbol if it is a String,
+ # or returns +lang+ if it already is a Symbol.
+ #
+ # Raises +ArgumentError+ for all other objects, or if the
+ # given String includes non-alphanumeric characters (\W).
+ def normalize lang
+ if lang.is_a? Symbol
+ lang
+ elsif lang.is_a? String
+ if lang[/\w+/] == lang
+ lang[/\w+/].downcase.to_sym
+ else
+ raise ArgumentError, "Invalid lang: '#{lang}' given."
+ end
+ elsif lang.nil?
+ :plaintext
+ else
+ raise ArgumentError, "String or Symbol expected, but #{lang.class} given."
+ end
+ end
+
+ end
+
+
+ require 'strscan'
+ # The base class for all Scanners.
+ #
+ # It is a subclass of Ruby's great +StringScanner+, which
+ # makes it easy to access the scanning methods inside.
+ #
+ # It is also +Enumerable+, so you can do this:
+ #
+ # require 'coderay'
+ #
+ # c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
+ #
+ # for text, kind in c_scanner
+ # puts text if kind == :operator
+ # end
+ #
+ # # prints: (*==)++;
+ #
+ # OK, this is not a very good example :)
+ # You can also use map, any?, find and even sort_by.
+ class Scanner < StringScanner
+
+ # Raised if a Scanner fails while scanning
+ ScanError = Class.new(Exception)
+
+ require 'coderay/helpers/scanner_helper'
+
+ # The default options for all scanner classes.
+ #
+ # Define @default_options for subclasses.
+ DEFAULT_OPTIONS = { :stream => false }
+
+ class << self
+ # Register the scanner class for all
+ # +langs+.
+ #
+ # See Scanners.register.
+ def register_for *langs
+ Scanners.register self, *langs
+ end
+
+ # Returns if the Scanner can be used in streaming mode.
+ def streamable?
+ is_a? Streamable
+ end
+
+ end
+
+=begin
+ ## Excluded for speed reasons - protected seems to make methods slow.
+
+ # Save the StringScanner methods from being called.
+ # This would not be useful for highlighting.
+ strscan_public_methods = StringScanner.instance_methods - StringScanner.ancestors[1].instance_methods
+ protected(*strscan_public_methods)
+=end
+ # Creates a new Scanner.
+ #
+ # * +code+ is the input String and is handled by the superclass StringScanner.
+ # * +options+ is a Hash with Symbols as keys.
+ # It is merged with the default options of the class (you can overwrite
+ # default options here.)
+ # * +block+ is the callback for streamed highlighting.
+ #
+ # If you set :stream to +true+ in the options, the Scanner uses a
+ # TokenStream with the +block+ as callback to handle the tokens.
+ #
+ # Else, a Tokens object is used.
+ def initialize code, options = {}, &block
+ @options = self.class::DEFAULT_OPTIONS.merge options
+ raise "I am only the basic Scanner class. I can't scan anything. :(\n" +
+ "Use my subclasses." if self.class == Scanner
+
+ # I love this hack. It seems to silence all dos/unix/mac newline problems.
+ super code.gsub(/\r\n?/, "\n")
+
+ if @options[:stream]
+ warn "warning in CodeRay::Scanner.new: :stream is set, but no block was given" unless block_given?
+ raise NotStreamableError, self unless kind_of? Streamable
+ @tokens = TokenStream.new(&block)
+ else
+ warn "warning in CodeRay::Scanner.new: Block given, but :stream is #{@options[:stream]}" if block_given?
+ @tokens = Tokens.new
+ end
+ end
+
+ # More mnemonic accessor name for the input string.
+ alias code string
+
+ # Scans the code and returns all tokens in a Tokens object.
+ def tokenize options = {}
+ options = @options.merge({}) #options
+ if @options[:stream] # :stream must have been set already
+ reset ## what is this for?
+ scan_tokens @tokens, options
+ @tokens
+ else
+ @cached_tokens ||= scan_tokens @tokens, options
+ end
+ end
+
+ # you can also see this as a read-only attribute
+ alias tokens tokenize
+
+ # Traverses the tokens.
+ def each &block
+ raise ArgumentError, 'Cannot traverse TokenStream.' if @options[:stream]
+ tokens.each(&block)
+ end
+ include Enumerable
+
+ # The current line position of the scanner.
+ #
+ # Beware, this is implemented inefficiently. It should be used
+ # for debugging only.
+ def line
+ string[0..pos].count("\n") + 1
+ end
+
+ protected
+
+ # This is the central method, and often the only one a subclass implements.
+ #
+ # Subclasses must implement this method; it must return +tokens+ and must only
+ # use Tokens#<< for storing scanned tokens.
+ def scan_tokens tokens, options
+ raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
+ end
+
+ # Scanner error with additional status information
+ def raise_inspect msg, tokens, ambit = 30
+ raise ScanError, <<-EOE % [
+
+
+***ERROR in %s: %s
+
+tokens:
+%s
+
+current line: %d pos = %d
+matched: %p
+bol? = %p, eos? = %p
+
+surrounding code:
+%p ~~ %p
+
+
+***ERROR***
+
+ EOE
+ File.basename(caller[0]),
+ msg,
+ tokens.last(10).map { |t| t.inspect }.join("\n"),
+ line, pos,
+ matched, bol?, eos?,
+ string[pos-ambit,ambit],
+ string[pos,ambit],
+ ]
+ end
+
+ end
+
+ end
+end
+
+# vim:sw=2:ts=2:et:tw=78
diff --git a/lib/coderay/scanners/c.rb b/lib/coderay/scanners/c.rb new file mode 100644 index 0000000..3420822 --- /dev/null +++ b/lib/coderay/scanners/c.rb @@ -0,0 +1,147 @@ +module CodeRay module Scanners
+
+ class C < Scanner
+
+ register_for :c
+
+ RESERVED_WORDS = [
+ 'asm', 'break', 'case', 'continue', 'default', 'do', 'else',
+ 'for', 'goto', 'if', 'return', 'switch', 'while',
+ 'struct', 'union', 'enum', 'typedef',
+ 'static', 'register', 'auto', 'extern',
+ 'sizeof',
+ 'volatile', 'const', # C89
+ 'inline', 'restrict', # C99
+ ]
+
+ PREDEFINED_TYPES = [
+ 'int', 'long', 'short', 'char', 'void',
+ 'signed', 'unsigned', 'float', 'double',
+ 'bool', 'complex', # C99
+ ]
+
+ PREDEFINED_CONSTANTS = [
+ 'EOF', 'NULL',
+ 'true', 'false', # C99
+ ]
+
+ IDENT_KIND = Scanner::WordList.new(:ident).
+ add(RESERVED_WORDS, :reserved).
+ add(PREDEFINED_TYPES, :pre_type).
+ add(PREDEFINED_CONSTANTS, :pre_constant)
+
+ ESCAPE = / [rbfnrtv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x
+ UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x
+
+ def scan_tokens tokens, options
+
+ state = :initial
+
+ until eos?
+
+ kind = :error
+ match = nil
+
+ if state == :initial
+
+ if scan(/ \s+ | \\\n /x)
+ kind = :space
+
+ elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx)
+ kind = :comment
+
+ elsif match = scan(/ \# \s* if \s* 0 /x)
+ match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos?
+ kind = :comment
+
+ elsif scan(/ [-+*\/=<>?:;,!&^|()\[\]{}~%]+ | \.(?!\d) /x)
+ kind = :operator
+
+ elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x)
+ kind = IDENT_KIND[match]
+ if kind == :ident and check(/:(?!:)/)
+ match << scan(/:/)
+ kind = :label
+ end
+
+ elsif match = scan(/L?"/)
+ tokens << [:open, :string]
+ if match[0] == ?L
+ tokens << ['L', :modifier]
+ match = '"'
+ end
+ state = :string
+ kind = :delimiter
+
+ elsif scan(/#\s*(\w*)/)
+ kind = :preprocessor # FIXME multiline preprocs
+ state = :include_expected if self[1] == 'include'
+
+ elsif scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox)
+ kind = :char
+
+ elsif scan(/0[xX][0-9A-Fa-f]+/)
+ kind = :hex
+
+ elsif scan(/(?:0[0-7]+)(?![89.eEfF])/)
+ kind = :oct
+
+ elsif scan(/(?:\d+)(?![.eEfF])/)
+ kind = :integer
+
+ elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/)
+ kind = :float
+
+ else
+ getch
+ end
+
+ elsif state == :string
+ if scan(/[^\\"]+/)
+ kind = :content
+ elsif scan(/"/)
+ tokens << ['"', :delimiter]
+ tokens << [:close, :string]
+ state = :initial
+ next
+ elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
+ kind = :char
+ elsif scan(/ \\ | $ /x)
+ kind = :error
+ state = :initial
+ else
+ raise "else case \" reached; %p not handled." % peek(1), tokens
+ end
+
+ elsif state == :include_expected
+ if scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/)
+ kind = :include
+ state = :initial
+
+ elsif match = scan(/\s+/)
+ kind = :space
+ state = :initial if match.index ?\n
+
+ else
+ getch
+
+ end
+
+ else
+ raise 'else-case reached', tokens
+
+ end
+
+ match ||= matched
+ raise [match, kind], tokens if kind == :error
+
+ tokens << [match, kind]
+
+ end
+
+ tokens
+ end
+
+ end
+
+end end
diff --git a/lib/coderay/scanners/delphi.rb b/lib/coderay/scanners/delphi.rb new file mode 100644 index 0000000..4c03147 --- /dev/null +++ b/lib/coderay/scanners/delphi.rb @@ -0,0 +1,123 @@ +module CodeRay module Scanners
+
+ class Delphi < Scanner
+
+ register_for :delphi
+
+ RESERVED_WORDS = [
+ 'and', 'array', 'as', 'at', 'asm', 'at', 'begin', 'case', 'class',
+ 'const', 'constructor', 'destructor', 'dispinterface', 'div', 'do',
+ 'downto', 'else', 'end', 'except', 'exports', 'file', 'finalization',
+ 'finally', 'for', 'function', 'goto', 'if', 'implementation', 'in',
+ 'inherited', 'initialization', 'inline', 'interface', 'is', 'label',
+ 'library', 'mod', 'nil', 'not', 'object', 'of', 'or', 'out', 'packed',
+ 'procedure', 'program', 'property', 'raise', 'record', 'repeat',
+ 'resourcestring', 'set', 'shl', 'shr', 'string', 'then', 'threadvar',
+ 'to', 'try', 'type', 'unit', 'until', 'uses', 'var', 'while', 'with',
+ 'xor', 'on'
+ ]
+
+ DIRECTIVES = [
+ 'absolute', 'abstract', 'assembler', 'at', 'automated', 'cdecl',
+ 'contains', 'deprecated', 'dispid', 'dynamic', 'export',
+ 'external', 'far', 'forward', 'implements', 'local',
+ 'near', 'nodefault', 'on', 'overload', 'override',
+ 'package', 'pascal', 'platform', 'private', 'protected', 'public',
+ 'published', 'read', 'readonly', 'register', 'reintroduce',
+ 'requires', 'resident', 'safecall', 'stdcall', 'stored', 'varargs',
+ 'virtual', 'write', 'writeonly'
+ ]
+
+ IDENT_KIND = Scanner::WordList.new(:ident, :case_ignore).
+ add(RESERVED_WORDS, :reserved).
+ add(DIRECTIVES, :directive)
+
+ def scan_tokens tokens, options
+
+ state = :initial
+
+ until eos?
+
+ kind = :error
+ match = nil
+
+ if state == :initial
+
+ if scan(/ \s+ /x)
+ kind = :space
+
+ elsif scan(%r! \{ \$ [^}]* \}? | \(\* \$ (?: .*? \*\) | .* ) !mx)
+ kind = :preprocessor
+
+ elsif scan(%r! // [^\n]* | \{ [^}]* \}? | \(\* (?: .*? \*\) | .* ) !mx)
+ kind = :comment
+
+ elsif scan(/ [-+*\/=<>:;,.@\^|\(\)\[\]]+ /x)
+ kind = :operator
+
+ elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x)
+ kind = IDENT_KIND[match]
+
+ elsif match = scan(/ ' ( [^\n']|'' ) (?:'|$) /x)
+ tokens << [:open, :char]
+ tokens << ["'", :delimiter]
+ tokens << [self[1], :content]
+ tokens << ["'", :delimiter]
+ tokens << [:close, :char]
+ next
+
+ elsif match = scan(/ ' /x)
+ tokens << [:open, :string]
+ state = :string
+ kind = :delimiter
+
+ elsif scan(/ \# (?: \d+ | \$[0-9A-Fa-f]+ ) /x)
+ kind = :char
+
+ elsif scan(/ \$ [0-9A-Fa-f]+ /x)
+ kind = :hex
+
+ elsif scan(/ (?: \d+ ) (?![eE]|\.[^.]) /x)
+ kind = :integer
+
+ elsif scan(/ \d+ (?: \.\d+ (?: [eE][+-]? \d+ )? | [eE][+-]? \d+ ) /x)
+ kind = :float
+
+ else
+ getch
+ end
+
+ elsif state == :string
+ if scan(/[^\n']+/)
+ kind = :content
+ elsif scan(/''/)
+ kind = :char
+ elsif scan(/'/)
+ tokens << ["'", :delimiter]
+ tokens << [:close, :string]
+ state = :initial
+ next
+ elsif scan(/\n/)
+ state = :initial
+ else
+ raise "else case \' reached; %p not handled." % peek(1), tokens
+ end
+
+ else
+ raise 'else-case reached', tokens
+
+ end
+
+ match ||= matched
+ raise [match, kind], tokens if kind == :error
+
+ tokens << [match, kind]
+
+ end
+
+ tokens
+ end
+
+ end
+
+end end
diff --git a/lib/coderay/scanners/helpers/ruby_helper.rb b/lib/coderay/scanners/helpers/ruby_helper.rb new file mode 100644 index 0000000..241b392 --- /dev/null +++ b/lib/coderay/scanners/helpers/ruby_helper.rb @@ -0,0 +1,212 @@ +module CodeRay module Scanners
+
+ class Ruby
+
+ RESERVED_WORDS = %w[
+ and def end in or unless begin
+ defined? ensure module redo super until
+ BEGIN break do next rescue then
+ when END case else for retry
+ while alias class elsif if not return
+ undef yield
+ ]
+
+ DEF_KEYWORDS = %w[ def ]
+ MODULE_KEYWORDS = %w[class module]
+ DEF_NEW_STATE = WordList.new(:initial).
+ add(DEF_KEYWORDS, :def_expected).
+ add(MODULE_KEYWORDS, :module_expected)
+
+ IDENTS_ALLOWING_REGEXP = %w[
+ and or not while until unless if then elsif when sub sub! gsub gsub! scan slice slice! split
+ ]
+ REGEXP_ALLOWED = WordList.new(false).
+ add(IDENTS_ALLOWING_REGEXP, :set)
+
+ PREDEFINED_CONSTANTS = %w[
+ nil true false self
+ DATA ARGV ARGF __FILE__ __LINE__
+ ]
+
+ IDENT_KIND = WordList.new(:ident).
+ add(RESERVED_WORDS, :reserved).
+ add(PREDEFINED_CONSTANTS, :pre_constant)
+
+# IDENT = /[a-zA-Z_][a-zA-Z_0-9]*/
+ IDENT = /[a-z_][\w_]*/i
+
+ METHOD_NAME = / #{IDENT} [?!]? /ox
+ METHOD_NAME_EX = /
+ #{IDENT}[?!=]? # common methods: split, foo=, empty?, gsub!
+ | \*\*? # multiplication and power
+ | [-+]@? # plus, minus
+ | [\/%&|^`~] # division, modulo or format strings, &and, |or, ^xor, `system`, tilde
+ | \[\]=? # array getter and setter
+ | << | >> # append or shift left, shift right
+ | <=?>? | >=? # comparison, rocket operator
+ | ===? # simple equality and case equality
+ /ox
+ INSTANCE_VARIABLE = / @ #{IDENT} /ox
+ CLASS_VARIABLE = / @@ #{IDENT} /ox
+ OBJECT_VARIABLE = / @@? #{IDENT} /ox
+ GLOBAL_VARIABLE = / \$ (?: #{IDENT} | [1-9] | 0[a-zA-Z_0-9]* | [~&+`'=\/,;_.<>!@$?*":\\] | -[a-zA-Z_0-9] ) /ox
+ PREFIX_VARIABLE = / #{GLOBAL_VARIABLE} |#{OBJECT_VARIABLE} /ox
+ VARIABLE = / @?@? #{IDENT} | #{GLOBAL_VARIABLE} /ox
+
+ QUOTE_TO_TYPE = {
+ '`' => :shell,
+ '/'=> :regexp,
+ }
+ QUOTE_TO_TYPE.default = :string
+
+ REGEXP_MODIFIERS = /[mixounse]*/
+ REGEXP_SYMBOLS = /
+ [|?*+?(){}\[\].^$]
+ /x
+
+ DECIMAL = /\d+(?:_\d+)*/ # doesn't recognize 09 as octal error
+ OCTAL = /0_?[0-7]+(?:_[0-7]+)*/
+ HEXADECIMAL = /0x[0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*/
+ BINARY = /0b[01]+(?:_[01]+)*/
+
+ EXPONENT = / [eE] [+-]? #{DECIMAL} /ox
+ FLOAT_OR_INT = / #{DECIMAL} (?: #{EXPONENT} | \. #{DECIMAL} #{EXPONENT}? )? /ox
+ FLOAT = / #{DECIMAL} (?: #{EXPONENT} | \. #{DECIMAL} #{EXPONENT}? ) /ox
+ NUMERIC = / #{OCTAL} | #{HEXADECIMAL} | #{BINARY} | #{FLOAT_OR_INT} /ox
+
+ SYMBOL = /
+ :
+ (?:
+ #{METHOD_NAME_EX}
+ | #{PREFIX_VARIABLE}
+ | ['"]
+ )
+ /ox
+
+ # TODO investigste \M, \c and \C escape sequences
+ # (?: M-\\C-|C-\\M-|M-\\c|c\\M-|c|C-|M-)? (?: \\ (?: [0-7]{3} | x[0-9A-Fa-f]{2} | . ) )
+ # assert_equal(225, ?\M-a)
+ # assert_equal(129, ?\M-\C-a)
+ ESCAPE = /
+ [abefnrstv]
+ | M-\\C-|C-\\M-|M-\\c|c\\M-|c|C-|M-
+ | [0-7]{1,3}
+ | x[0-9A-Fa-f]{1,2}
+ | .
+ /mx
+
+ CHARACTER = /
+ \?
+ (?:
+ [^\s\\]
+ | \\ #{ESCAPE}
+ )
+ /mx
+
+ # NOTE: This is not completel correct, but
+ # nobody needs heredoc delimiters ending with \n.
+ HEREDOC_OPEN = /
+ << (-)? # $1 = float
+ (?:
+ ( [A-Za-z_0-9]+ ) # $2 = delim
+ |
+ ( ["'`] ) # $3 = quote, type
+ ( [^\n]*? ) \3 # $4 = delim
+ )
+ /mx
+
+ RDOC = /
+ =begin (?!\S)
+ .*?
+ (?: \Z | ^=end (?!\S) [^\n]* )
+ /mx
+
+ DATA = /
+ __END__$
+ .*?
+ (?: \Z | (?=^\#CODE) )
+ /mx
+
+ RDOC_DATA_START = / ^=begin (?!\S) | ^__END__$ /x
+
+ FANCY_START = / % ( [qQwWxsr] | (?![\w\s=]) ) (.) /mox
+
+ FancyStringType = {
+ 'q' => [:string, false],
+ 'Q' => [:string, true],
+ 'r' => [:regexp, true],
+ 's' => [:symbol, false],
+ 'x' => [:shell, true],
+ 'w' => [:string, :word],
+ 'W' => [:string, :word],
+ }
+ FancyStringType['w'] = FancyStringType['q']
+ FancyStringType['W'] = FancyStringType[''] = FancyStringType['Q']
+
+ class StringState < Struct.new :type, :interpreted, :delim, :heredoc,
+ :paren, :paren_depth, :pattern
+
+ CLOSING_PAREN = Hash[ *%w[
+ ( )
+ [ ]
+ < >
+ { }
+ ] ]
+
+ CLOSING_PAREN.values.each { |o| o.freeze } # debug, if I try to change it with <<
+ OPENING_PAREN = CLOSING_PAREN.invert
+
+ STRING_PATTERN = Hash.new { |h, k|
+ delim, interpreted = *k
+ delim_pattern = Regexp.escape(delim.dup)
+ if starter = OPENING_PAREN[delim]
+ delim_pattern << Regexp.escape(starter)
+ end
+
+
+ special_escapes =
+ case interpreted
+ when :regexp_symbols
+ '| ' + REGEXP_SYMBOLS.source
+ when :words
+ '| \s'
+ end
+
+ h[k] =
+ if interpreted and not delim == '#'
+ / (?= [#{delim_pattern}\\] | \# [{$@] #{special_escapes} ) /mx
+ else
+ / (?= [#{delim_pattern}\\] #{special_escapes} ) /mx
+ end
+ }
+
+ HEREDOC_PATTERN = Hash.new { |h, k|
+ delim, interpreted, indented = *k
+ delim_pattern = Regexp.escape(delim.dup)
+ delim_pattern = / \n #{ '(?>[\ \t]*)' if indented } #{ Regexp.new delim_pattern } $ /x
+ h[k] =
+ if interpreted
+ / (?= #{delim_pattern}() | \\ | \# [{$@] ) /mx
+ else
+ / (?= #{delim_pattern}() | \\ ) /mx
+ end
+ }
+
+ def initialize kind, interpreted, delim, heredoc = false
+ if paren = CLOSING_PAREN[delim]
+ delim, paren = paren, delim
+ paren_depth = 1
+ end
+ if heredoc
+ pattern = HEREDOC_PATTERN[ [delim, interpreted, heredoc == :indented] ]
+ delim = nil
+ else
+ pattern = STRING_PATTERN[ [delim, interpreted] ]
+ end
+ super kind, interpreted, delim, heredoc, paren, paren_depth, pattern
+ end
+ end unless defined? StringState
+
+ end
+
+end end
diff --git a/lib/coderay/scanners/mush.rb b/lib/coderay/scanners/mush.rb new file mode 100644 index 0000000..5217ed9 --- /dev/null +++ b/lib/coderay/scanners/mush.rb @@ -0,0 +1,102 @@ +module CodeRay module Scanners
+
+ class Mush < Scanner
+
+ register_for :mush
+
+ RESERVED_WORDS = [
+ ]
+
+ IDENT_KIND = Scanner::WordList.new(:ident, :case_ignore).
+ add(RESERVED_WORDS, :reserved).
+ add(DIRECTIVES, :directive)
+
+ def scan_tokens tokens, options
+
+ state = :initial
+
+ until eos?
+
+ kind = :error
+ match = nil
+
+ if state == :initial
+
+ if scan(/ \s+ /x)
+ kind = :space
+
+ elsif scan(%r! \{ \$ [^}]* \}? | \(\* \$ (?: .*? \*\) | .* ) !mx)
+ kind = :preprocessor
+
+ elsif scan(%r! // [^\n]* | \{ [^}]* \}? | \(\* (?: .*? \*\) | .* ) !mx)
+ kind = :comment
+
+ elsif scan(/ [-+*\/=<>:;,.@\^|\(\)\[\]]+ /x)
+ kind = :operator
+
+ elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x)
+ kind = IDENT_KIND[match]
+
+ elsif match = scan(/ ' ( [^\n']|'' ) (?:'|$) /x)
+ tokens << [:open, :char]
+ tokens << ["'", :delimiter]
+ tokens << [self[1], :content]
+ tokens << ["'", :delimiter]
+ tokens << [:close, :char]
+ next
+
+ elsif match = scan(/ ' /x)
+ tokens << [:open, :string]
+ state = :string
+ kind = :delimiter
+
+ elsif scan(/ \# (?: \d+ | \$[0-9A-Fa-f]+ ) /x)
+ kind = :char
+
+ elsif scan(/ \$ [0-9A-Fa-f]+ /x)
+ kind = :hex
+
+ elsif scan(/ (?: \d+ ) (?![eE]|\.[^.]) /x)
+ kind = :integer
+
+ elsif scan(/ \d+ (?: \.\d+ (?: [eE][+-]? \d+ )? | [eE][+-]? \d+ ) /x)
+ kind = :float
+
+ else
+ getch
+ end
+
+ elsif state == :string
+ if scan(/[^\n']+/)
+ kind = :content
+ elsif scan(/''/)
+ kind = :char
+ elsif scan(/'/)
+ tokens << ["'", :delimiter]
+ tokens << [:close, :string]
+ state = :initial
+ next
+ elsif scan(/\n/)
+ state = :initial
+ else
+ raise "else case \' reached; %p not handled." % peek(1), tokens
+ end
+
+ else
+ raise 'else-case reached', tokens
+
+ end
+
+ match ||= matched
+ raise [match, kind], tokens if kind == :error
+
+ tokens << [match, kind]
+
+ end
+
+ tokens
+ end
+
+ end
+
+end end
diff --git a/lib/coderay/scanners/plaintext.rb b/lib/coderay/scanners/plaintext.rb new file mode 100644 index 0000000..0aebf35 --- /dev/null +++ b/lib/coderay/scanners/plaintext.rb @@ -0,0 +1,13 @@ +module CodeRay module Scanners
+
+ class Plaintext < Scanner
+
+ register_for :plaintext, :plain
+
+ def scan_tokens tokens, options
+ tokens << [scan_until(/\z/), :plain]
+ end
+
+ end
+
+end end
diff --git a/lib/coderay/scanners/ruby.rb b/lib/coderay/scanners/ruby.rb new file mode 100644 index 0000000..433726b --- /dev/null +++ b/lib/coderay/scanners/ruby.rb @@ -0,0 +1,333 @@ +module CodeRay module Scanners
+
+ # This scanner is really complex, since Ruby _is_ a complex language!
+ #
+ # It tries to highlight 100% of all common code,
+ # and 90% of strange codes.
+ #
+ # It is optimized for HTML highlighting, and is not very useful for
+ # parsing or pretty printing.
+ #
+ # For now, I think it's better than the scanners in VIM or Syntax, or
+ # any highlighter I was able to find, except Caleb's RubyLexer.
+ #
+ # I hope it's also better than the rdoc/irb lexer.
+ class Ruby < Scanner
+
+ include Streamable
+
+ register_for :ruby
+
+ require 'coderay/scanners/helpers/ruby_helper'
+
+ DEFAULT_OPTIONS = {
+ :parse_regexps => true,
+ }
+
+ private
+ def scan_tokens tokens, options
+ parse_regexp = false # options[:parse_regexps]
+ first_bake = saved_tokens = nil
+ last_token_dot = false
+ fancy_allowed = regexp_allowed = true
+ heredocs = nil
+ last_state = nil
+ state = :initial
+ depth = nil
+ states = []
+
+ until eos?
+ type = :error
+ match = nil
+ kind = nil
+
+ if state.instance_of? StringState
+# {{{
+
+ match = scan_until(state.pattern) || scan_until(/\z/)
+ tokens << [match, :content] unless match.empty?
+ break if eos?
+
+ if state.heredoc and self[1]
+ match = getch + scan_until(/$/)
+ tokens << [match, :delimiter]
+ tokens << [:close, state.type]
+ state = :initial
+ next
+ end
+
+ case match = getch
+
+ when state.delim
+ if state.paren
+ state.paren_depth -= 1
+ if state.paren_depth > 0
+ tokens << [match, :nesting_delimiter]
+ next
+ end
+ end
+ tokens << [match, :delimiter]
+ if state.type == :regexp and not eos?
+ modifiers = scan(/#{REGEXP_MODIFIERS}/ox)
+ tokens << [modifiers, :modifier] unless modifiers.empty?
+ if parse_regexp
+ extended = modifiers.index ?x
+ tokens, regexp = saved_tokens, tokens
+ for text, type in regexp
+ if text.is_a? String
+ case type
+ when :content
+ text.scan(/([^#]+)|(#.*)/) do |plain, comment|
+ if plain
+ tokens << [plain, :content]
+ else
+ tokens << [comment, :comment]
+ end
+ end
+ when :character
+ if text[/\\(?:[swdSWDAzZbB]|\d+)/]
+ tokens << [text, :modifier]
+ else
+ tokens << [text, type]
+ end
+ else
+ tokens << [text, type]
+ end
+ else
+ tokens << [text, type]
+ end
+ end
+ first_bake = saved_tokens = nil
+ end
+ end
+ tokens << [:close, state.type]
+ fancy_allowed = regexp_allowed = false
+ state = :initial
+
+ when '\\'
+ if state.interpreted
+ if esc = scan(/ #{ESCAPE} /ox)
+ tokens << [match + esc, :char]
+ else
+ tokens << [match, :error]
+ end
+ else
+ case m = getch
+ when state.delim, '\\'
+ tokens << [match + m, :char]
+ else
+ tokens << [match + m, :content]
+ end
+ end
+
+ when '#'
+ case peek(1)[0]
+ when ?{
+ states.push [state, depth, heredocs]
+ fancy_allowed = regexp_allowed = true
+ state, depth = :initial, 1
+ tokens << [match + getch, :escape]
+ when ?$, ?@
+ tokens << [match, :escape]
+ last_state = state # scan one token as normal code, then return here
+ state = :initial
+ else
+ raise "else-case # reached; #%p not handled" % peek(1), tokens
+ end
+
+ when state.paren
+ state.paren_depth += 1
+ tokens << [match, :nesting_delimiter]
+
+ when REGEXP_SYMBOLS
+ tokens << [match, :function]
+
+ else
+ raise "else-case \" reached; %p not handled, state = %p" % [match, state], tokens
+
+ end
+ next
+# }}}
+ else
+# {{{
+ if match = scan(/ [ \t\f]+ | \\? \n | \# .* /x) or
+ ( bol? and match = scan(/ #{DATA} | #{RDOC} /ox) )
+ fancy_allowed = true
+ case m = match[0]
+ when ?\s, ?\t, ?\f
+ match << scan(/\s*/) unless eos? or heredocs
+ type = :space
+ when ?\n, ?\\
+ type = :space
+ regexp_allowed = m == ?\n
+ if heredocs
+ unscan # heredoc scanning needs \n at start
+ state = heredocs.shift
+ tokens << [:open, state.type]
+ heredocs = nil if heredocs.empty?
+ next
+ else
+ match << scan(/\s*/) unless eos?
+ end
+ when ?#, ?=, ?_
+ type = :comment
+ regexp_allowed = true
+ else
+ raise "else-case _ reached, because case %p was not handled" % [matched[0].chr], tokens
+ end
+ tokens << [match, type]
+ next
+
+ elsif state == :initial
+ if match = scan(/ \.\.?\.? | [-+*=>;,|&!\(\)\[\]~^]+ | [\{\}] | :: /x)
+ if match !~ / [.\)\]\}] \z/x or match =~ /\.\.\.?/
+ regexp_allowed = fancy_allowed = :set
+ end
+ last_token_dot = :set if match == '.' or match == '::'
+ type = :operator
+ unless states.empty?
+ case match
+ when '{'
+ depth += 1
+ when '}'
+ depth -= 1
+ if depth == 0
+ state, depth, heredocs = *states.pop
+ type = :escape
+ end
+ end
+ end
+
+ elsif match = scan(/#{METHOD_NAME}/o)
+ if last_token_dot
+ type = if match[/^[A-Z]/] then :constant else :ident end
+ else
+ type = IDENT_KIND[match]
+ if type == :ident and match[/^[A-Z]/]
+ type = :constant
+ elsif type == :reserved
+ state = DEF_NEW_STATE[match]
+ end
+ end
+ fancy_allowed = regexp_allowed = REGEXP_ALLOWED[match]
+
+ elsif match = scan(/ ['"] /mx)
+ tokens << [:open, :string]
+ type = :delimiter
+ state = StringState.new :string, match != '\'', match.dup # important for streaming
+
+ elsif match = scan(/#{INSTANCE_VARIABLE}/o)
+ type = :instance_variable
+
+ elsif regexp_allowed and match = scan(/ \/ /mx)
+ tokens << [:open, :regexp]
+ type = :delimiter
+ interpreted = true
+ state = StringState.new :regexp, interpreted, match.dup
+ if parse_regexp
+ tokens, saved_tokens = [], tokens
+ end
+
+ elsif match = scan(/#{NUMERIC}/o)
+ type = if match[/#{FLOAT}/o] then :float else :integer end
+
+ elsif fancy_allowed and match = scan(/#{SYMBOL}/o)
+ case match[1]
+ when ?', ?"
+ tokens << [:open, :symbol]
+ state = StringState.new :symbol, match[1] == ?", match[1,1]
+ end
+ type = :symbol
+
+ elsif fancy_allowed and match = scan(/#{HEREDOC_OPEN}/o)
+ indented, quote = self[1] == '-', self[3]
+ delim = self[quote ? 4 : 2]
+ type = QUOTE_TO_TYPE[quote]
+ tokens << [:open, type]
+ tokens << [match, :delimiter]
+ match = :close
+ heredoc = StringState.new type, quote != '\'', delim, (indented ? :indented : :linestart )
+ heredocs ||= [] # create heredocs if empty
+ heredocs << heredoc
+
+ elsif fancy_allowed and match = scan(/#{FANCY_START}/o)
+ type, interpreted = *FancyStringType.fetch(self[1]) do
+ raise 'Unknown fancy string: %%%p' % k, tokens
+ end
+ tokens << [:open, type]
+ state = StringState.new type, interpreted, self[2]
+ type = :delimiter
+
+ elsif fancy_allowed and match = scan(/#{CHARACTER}/o)
+ type = :integer
+
+ elsif match = scan(/ [\/%<?:] /x)
+ regexp_allowed = fancy_allowed = :set
+ type = :operator
+
+ elsif match = scan(/`/)
+ if last_token_dot
+ type = :operator
+ else
+ tokens << [:open, :shell]
+ type = :delimiter
+ state = StringState.new :shell, true, '`'
+ end
+
+ elsif match = scan(/#{GLOBAL_VARIABLE}/o)
+ type = :global_variable
+
+ elsif match = scan(/#{CLASS_VARIABLE}/o)
+ type = :class_variable
+
+ else
+ match = getch
+
+ end
+
+ elsif state == :def_expected
+ if match = scan(/ (?: #{VARIABLE} (?: ::#{IDENT} )* \. )? #{METHOD_NAME_EX} /ox)
+ type = :method
+ else
+ match = getch
+ end
+ state = :initial
+
+ elsif state == :module_expected
+ if match = scan(/<</)
+ type = :operator
+ else
+ if match = scan(/ (?:#{IDENT}::)* #{IDENT} /ox)
+ type = :class
+ else
+ match = getch
+ end
+ end
+ state = :initial
+
+ end
+
+ regexp_allowed = regexp_allowed == :set
+ fancy_allowed = fancy_allowed == :set
+ last_token_dot = last_token_dot == :set
+
+ if $DEBUG
+ raise_inspect 'error token %p in line %d' % [tokens.last, line], tokens if not type or type == :error
+ end
+
+ tokens << [match, type]
+
+ if last_state
+ state = last_state
+ last_state = nil
+ end
+# }}}
+ end
+ end
+
+ tokens
+ end
+ end
+
+end end
+# vim:fdm=marker
diff --git a/lib/coderay/scanners/rubyfast.rb b/lib/coderay/scanners/rubyfast.rb new file mode 100644 index 0000000..baff382 --- /dev/null +++ b/lib/coderay/scanners/rubyfast.rb @@ -0,0 +1,287 @@ +module CodeRay module Scanners
+
+ class Ruby < Scanner
+
+ register_for :rubyfast
+
+ RESERVED_WORDS = [
+ 'and', 'def', 'end', 'in', 'or', 'unless', 'begin',
+ 'defined?', 'ensure', 'module', 'redo', 'super', 'until',
+ 'BEGIN', 'break', 'do', 'next', 'rescue', 'then',
+ 'when', 'END', 'case', 'else', 'for', 'retry',
+ 'while', 'alias', 'class', 'elsif', 'if', 'not', 'return',
+ 'undef', 'yield',
+ ]
+
+ DEF_KEYWORDS = ['def']
+ MODULE_KEYWORDS = ['class', 'module']
+ DEF_NEW_STATE = WordList.new(:initial).
+ add(DEF_KEYWORDS, :def_expected).
+ add(MODULE_KEYWORDS, :module_expected)
+
+ WORDS_ALLOWING_REGEXP = [
+ 'and', 'or', 'not', 'while', 'until', 'unless', 'if', 'elsif', 'when'
+ ]
+ REGEXP_ALLOWED = WordList.new(false).
+ add(WORDS_ALLOWING_REGEXP, :set)
+
+ PREDEFINED_CONSTANTS = [
+ 'nil', 'true', 'false', 'self',
+ 'DATA', 'ARGV', 'ARGF', '__FILE__', '__LINE__',
+ ]
+
+ IDENT_KIND = WordList.new(:ident).
+ add(RESERVED_WORDS, :reserved).
+ add(PREDEFINED_CONSTANTS, :pre_constant)
+
+ IDENT = /[a-zA-Z_][a-zA-Z_0-9]*/
+
+ METHOD_NAME = / #{IDENT} [?!]? /xo
+ METHOD_NAME_EX = /
+ #{IDENT}[?!=]? # common methods: split, foo=, empty?, gsub!
+ | \*\*? # multiplication and power
+ | [-+~]@? # plus, minus
+ | [\/%&|^`] # division, modulo or format strings, &and, |or, ^xor, `system`
+ | \[\]=? # array getter and setter
+ | <=?>? | >=? # comparison, rocket operator
+ | << | >> # append or shift left, shift right
+ | ===? # simple equality and case equality
+ /ox
+ GLOBAL_VARIABLE = / \$ (?: #{IDENT} | [1-9] | 0[a-zA-Z_0-9]* | [~&+`'=\/,;_.<>!@$?*":\\] | -[a-zA-Z_0-9] ) /ox
+
+ DOUBLEQ = / " [^"\#\\]* (?: (?: \#\{.*?\} | \#(?:$")? | \\. ) [^"\#\\]* )* "? /mox
+ SINGLEQ = / ' [^'\\]* (?: \\. [^'\\]* )* '? /mox
+ STRING = / #{SINGLEQ} | #{DOUBLEQ} /ox
+
+ SHELL = / ` [^`\#\\]* (?: (?: \#\{.*?\} | \#(?:$`)? | \\. ) [^`\#\\]* )* `? /mox
+ REGEXP =%r! / [^/\#\\]* (?: (?: \#\{.*?\} | \#(?:$/)? | \\. ) [^/\#\\]* )* /? !mox
+
+ DECIMAL = /\d+(?:_\d+)*/ # doesn't recognize 09 as octal error
+ OCTAL = /0_?[0-7]+(?:_[0-7]+)*/
+ HEXADECIMAL = /0x[0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*/
+ BINARY = /0b[01]+(?:_[01]+)*/
+
+ EXPONENT = / [eE] [+-]? #{DECIMAL} /ox
+ FLOAT = / #{DECIMAL} (?: #{EXPONENT} | \. #{DECIMAL} #{EXPONENT}? ) /
+ INTEGER = /#{OCTAL}|#{HEXADECIMAL}|#{BINARY}|#{DECIMAL}/
+
+ ESCAPE_STRING = /
+ % (?!\s)
+ (?:
+ [qsw]
+ (?:
+ \( [^\)\\]* (?: \\. [^\)\\]* )* \)?
+ |
+ \[ [^\]\\]* (?: \\. [^\]\\]* )* \]?
+ |
+ \{ [^\}\\]* (?: \\. [^\}\\]* )* \}?
+ |
+ \< [^\>\\]* (?: \\. [^\>\\]* )* \>?
+ |
+ \\ [^\\ ]* \\?
+ |
+ ( [^a-zA-Z0-9] ) # $1
+ (?:(?!\1)[^\\])* (?: \\. (?:(?!\1)[^\#\\])* )* \1?
+ )
+ |
+ [QrxWr]?
+ (?:
+ \( [^\)\#\\]* (?: (?:\#\{.*?\}|\#|\\.) [^\)\#\\]* )* \)?
+ |
+ \[ [^\]\#\\]* (?: (?:\#\{.*?\}|\#|\\.) [^\]\#\\]* )* \]?
+ |
+ \{ [^\}\#\\]* (?: (?:\#\{.*?\}|\#|\\.) [^\}\#\\]* )* \}?
+ |
+ \< [^\>\#\\]* (?: (?:\#\{.*?\}|\#|\\.) [^\>\#\\]* )* \>?
+ |
+ \# [^\# \\]* (?: \\. [^\# \\]* )* \#?
+ |
+ \\ [^\\\# ]* (?: (?:\#\{.*?\}|\# ) [^\\\# ]* )* \\?
+ |
+ ( [^a-zA-Z0-9] ) # $2
+ (?:(?!\2)[^\#\\])* (?: (?:\#\{.*?\}|\#|\\.) (?:(?!\2)[^\#\\])* )* \2?
+ )
+ )
+ /mox
+
+ SYMBOL = /
+ :
+ (?:
+ #{GLOBAL_VARIABLE}
+ | @@?#{IDENT}
+ | #{METHOD_NAME_EX}
+ | #{STRING}
+ )/ox
+
+ HEREDOC = /
+ << (?! [\dc] )
+ (?: [^\n]*? << )?
+ (?:
+ ([a-zA-Z_0-9]+)
+ (?: .*? ^\1$ | .* )
+ |
+ -([a-zA-Z_0-9]+)
+ (?: .*? ^\s*\2$ | .* )
+ |
+ (["\'`]) (.*?) \3
+ (?: .*? ^\4$ | .* )
+ |
+ - (["\'`]) (.*?) \5
+ (?: .*? ^\s*\6$ | .* )
+ )
+ /mx
+
+ RDOC = /
+ =begin (?!\S) [^\n]* \n?
+ (?:
+ (?! =end (?!\S) )
+ [^\n]* \n?
+ )*
+ (?:
+ =end (?!\S) [^\n]*
+ )?
+ /mx
+
+ DATA = /
+ __END__\n
+ (?:
+ (?=\#CODE)
+ |
+ .*
+ )
+ /
+
+ private
+ def scan_tokens tokens, options
+
+ state = :initial
+ regexp_allowed = true
+ last_token_dot = false
+
+ until eos?
+ match = nil
+ kind = :error
+
+ if scan(/\s+/) # in every state
+ kind = :space
+ regexp_allowed = :set if regexp_allowed or matched.index(?\n) # delayed flag setting
+
+ elsif scan(/ \#[^\n]* /x) # in every state
+ kind = :comment
+ regexp_allowed = :set if regexp_allowed
+
+ elsif state == :initial
+ # IDENTIFIERS, KEYWORDS
+ if scan(GLOBAL_VARIABLE)
+ kind = :global_variable
+ elsif scan(/ @@ #{IDENT} /ox)
+ kind = :class_variable
+ elsif scan(/ @ #{IDENT} /ox)
+ kind = :instance_variable
+ elsif scan(/ #{DATA} | #{RDOC} /ox)
+ kind = :comment
+ elsif scan(METHOD_NAME)
+ match = matched
+ if last_token_dot
+ kind =
+ if match[/^[A-Z]/]
+ :constant
+ else
+ :ident
+ end
+ else
+ kind = IDENT_KIND[match]
+ if kind == :ident and match[/^[A-Z]/]
+ kind = :constant
+ elsif kind == :reserved
+ state = DEF_NEW_STATE[match]
+ regexp_allowed = REGEXP_ALLOWED[match]
+ end
+ end
+
+ elsif scan(STRING)
+ kind = :string
+ elsif scan(SHELL)
+ kind = :shell
+ elsif scan(HEREDOC)
+ kind = :string
+ elsif check(/\//) and regexp_allowed
+ scan(REGEXP)
+ kind = :regexp
+ elsif scan(ESCAPE_STRING)
+ match = matched
+ kind =
+ case match[0]
+ when ?s
+ :symbol
+ when ?r
+ :regexp
+ when ?x
+ :shell
+ else
+ :string
+ end
+
+ elsif scan(/:(?:#{GLOBAL_VARIABLE}|#{METHOD_NAME_EX}|#{STRING})/ox)
+ kind = :symbol
+ elsif scan(/
+ \? (?:
+ [^\s\\]
+ |
+ \\ (?:M-\\C-|C-\\M-|M-\\c|c\\M-|c|C-|M-))? (?: \\ (?: . | [0-7]{3} | x[0-9A-Fa-f][0-9A-Fa-f] )
+ )
+ /mx)
+ kind = :integer
+
+ elsif scan(/ [-+*\/%=<>;,|&!()\[\]{}~?] | \.\.?\.? | ::? /x)
+ kind = :operator
+ match = matched
+ regexp_allowed = :set if match[-1,1] =~ /[~=!<>|&^,\(\[+\-\/\*%]\z/
+ last_token_dot = :set if match == '.' or match == '::'
+ elsif scan(FLOAT)
+ kind = :float
+ elsif scan(INTEGER)
+ kind = :integer
+ else
+ getch
+ end
+
+ elsif state == :def_expected
+ if scan(/ (?:#{IDENT}::)* (?:#{IDENT}\.)? #{METHOD_NAME_EX} /ox)
+ kind = :method
+ else
+ getch
+ end
+ state = :initial
+
+ elsif state == :module_expected
+ if scan(/<</)
+ kind = :operator
+ else
+ if scan(/ (?:#{IDENT}::)* #{IDENT} /ox)
+ kind = :method
+ else
+ getch
+ end
+ state = :initial
+ end
+
+ end
+
+ text = match || matched
+
+ if kind == :regexp and not eos?
+ text << scan(/[eimnosux]*/)
+ end
+
+ regexp_allowed = (regexp_allowed == :set) # delayed flag setting
+ last_token_dot = last_token_dot == :set
+
+ tokens << [text, kind]
+ end
+
+ tokens
+ end
+ end
+
+end end
diff --git a/lib/coderay/scanners/rubylex.rb b/lib/coderay/scanners/rubylex.rb new file mode 100644 index 0000000..2e69d39 --- /dev/null +++ b/lib/coderay/scanners/rubylex.rb @@ -0,0 +1,102 @@ +require 'rubygems'
+require_gem 'rubylexer'
+require 'rubylexer.rb'
+
+module CodeRay module Scanners
+
+ class RubyLex < Scanner
+
+ register_for :rubylex
+
+ class FakeFile < String
+
+ def initialize(*)
+ super
+ @pos = 0
+ end
+
+ attr_accessor :pos
+
+ def read x
+ pos = @pos
+ @pos += x
+ self[pos ... @pos]
+ end
+
+ def getc
+ pos = @pos
+ @pos += 1
+ self[pos]||-1
+ end
+
+ def eof?
+ @pos == size
+ end
+
+ def each_byte
+ until eof?
+ yield getc
+ end
+ end
+
+ def method_missing meth, *args
+ raise NoMethodError, '%s%s' % [meth, args]
+ end
+
+ end
+
+ private
+ Translate = {
+ :ignore => :comment,
+ :varname => :ident,
+ :number => :integer,
+ :ws => :space,
+ :escnl => :space,
+ :keyword => :reserved,
+ :methname => :method,
+ :renderexactlystring => :regexp,
+ :string => :string,
+ }
+
+ def scan_tokens tokens, options
+ require 'tempfile'
+ Tempfile.open('~coderay_tempfile') do |file|
+ file.binmode
+ file.write code
+ file.rewind
+ lexer = RubyLexer.new 'code', file
+ loop do
+ begin
+ tok = lexer.get1token
+ rescue => kaboom
+ err = <<-EOE
+ ERROR!!!
+#{kaboom.inspect}
+#{kaboom.backtrace.join("\n")}
+ EOE
+ tokens << [err, :error]
+ Kernel.raise
+ end
+ break if tok.is_a? EoiToken
+ next if tok.is_a? FileAndLineToken
+ kind = tok.class.name[/(.*?)Token$/,1].downcase.to_sym
+ kind = Translate.fetch kind, kind
+ text = tok.ident
+ case kind
+ when :hereplaceholder
+ text = tok.ender
+ kind = :string
+ when :herebody, :outlinedherebody
+ text = tok.ident.ident
+ kind = :string
+ end
+ text = text.inspect unless text.is_a? String
+ p token if kind == :error
+ tokens << [text.dup, kind]
+ end
+ end
+ tokens
+ end
+ end
+
+end end
diff --git a/lib/coderay/tokens.rb b/lib/coderay/tokens.rb new file mode 100644 index 0000000..71ad33a --- /dev/null +++ b/lib/coderay/tokens.rb @@ -0,0 +1,302 @@ +module CodeRay
+
+ # The Tokens class represents a list of tokens returnd from
+ # a Scanner.
+ #
+ # A token is not a special object, just a two-element Array
+ # consisting of
+ # * the _token_ _kind_ (a Symbol representing the type of the token)
+ # * the _token_ _text_ (the original source of the token in a String)
+ #
+ # A token looks like this:
+ #
+ # [:comment, '# It looks like this']
+ # [:float, '3.1415926']
+ # [:error, 'äöü']
+ #
+ # Some scanners also yield some kind of sub-tokens, represented by special
+ # token texts, namely :open and :close .
+ #
+ # The Ruby scanner, for example, splits "a string" into:
+ #
+ # [
+ # [:open, :string],
+ # [:delimiter, '"'],
+ # [:content, 'a string'],
+ # [:delimiter, '"'],
+ # [:close, :string]
+ # ]
+ #
+ # Tokens is also the interface between Scanners and Encoders:
+ # The input is split and saved into a Tokens object. The Encoder
+ # then builds the output from this object.
+ #
+ # Thus, the syntax below becomes clear:
+ #
+ # CodeRay.scan('price = 2.59', :ruby).html
+ # # the Tokens object is here -------^
+ #
+ # See how small it is? ;)
+ #
+ # Tokens gives you the power to handle pre-scanned code very easily:
+ # You can convert it to a webpage, a YAML file, or dump it into a gzip'ed string
+ # that you put in your DB.
+ #
+ # Tokens' subclass TokenStream allows streaming to save memory.
+ class Tokens < Array
+
+ class << self
+
+ # Convert the token to a string.
+ #
+ # This format is used by Encoders.Tokens.
+ # It can be reverted using read_token.
+ def write_token text, type
+ if text.is_a? String
+ "#{type}\t#{escape(text)}\n"
+ else
+ ":#{text}\t#{type}\t\n"
+ end
+ end
+
+ # Read a token from the string.
+ #
+ # Inversion of write_token.
+ #
+ # TODO Test this!
+ def read_token token
+ type, text = token.split("\t", 2)
+ if type[0] == ?:
+ [text.to_sym, type[1..-1].to_sym]
+ else
+ [type.to_sym, unescape(text)]
+ end
+ end
+
+ # Escapes a string for use in write_token.
+ def escape text
+ text.gsub(/[\n\\]/, '\\\\\&')
+ end
+
+ # Unescapes a string created by escape.
+ def unescape text
+ text.gsub(/\\[\n\\]/) { |m| m[1,1] }
+ end
+
+ end
+
+ # Whether the object is a TokenStream.
+ #
+ # Returns false.
+ def stream?
+ false
+ end
+
+ alias :orig_each :each
+ # Iterates over all tokens.
+ #
+ # If a filter is given, only tokens of that kind are yielded.
+ def each kind_filter = nil, &block
+ unless kind_filter
+ orig_each(&block)
+ else
+ orig_each do |text, kind|
+ next unless kind == kind_filter
+ yield text, kind
+ end
+ end
+ end
+
+ # Iterates over all text tokens.
+ # Range tokens like [:open, :string] are left out.
+ #
+ # Example:
+ # tokens.each_text_token { |text, kind| text.replace html_escape(text) }
+ def each_text_token
+ orig_each do |text, kind|
+ next unless text.respond_to? :to_str
+ yield text, kind
+ end
+ end
+
+ # Encode the tokens using encoder.
+ #
+ # encoder can be
+ # * a symbol like :html oder :statistic
+ # * an Encoder class
+ # * an Encoder object
+ #
+ # options are passed to the encoder.
+ def encode encoder, options = {}
+ unless encoder.is_a? Encoders::Encoder
+ unless encoder.is_a? Class
+ encoder_class = Encoders[encoder]
+ end
+ encoder = encoder_class.new options
+ end
+ encoder.encode_tokens self, options
+ end
+
+ # Redirects unknown methods to encoder calls.
+ #
+ # For example, if you call +tokens.html+, the HTML encoder
+ # is used to highlight the tokens.
+ def method_missing meth, options = {}
+ Encoders[meth].new(options).encode_tokens self
+ end
+
+ # Returns the tokens compressed by joining consecutive
+ # tokens of the same kind.
+ #
+ # This can not be undone, but should yield the same output
+ # in most Encoders. It basically makes the output smaller.
+ #
+ # Combined with dump, it saves database space.
+ def optimize
+ last_kind, last_text = nil, nil
+ new = self.class.new
+ each do |text, kind|
+ if text.is_a? String
+ if kind == last_kind
+ last_text << text
+ else
+ new << [last_text, last_kind] if last_kind
+ last_text = text
+ last_kind = kind
+ end
+ else
+ new << [last_text, last_kind] if last_kind
+ last_kind, last_text = nil, nil
+ new << [text, kind]
+ end
+ end
+ new << [last_text, last_kind] if last_kind
+ new
+ end
+
+ # Compact the object itself; see compact.
+ def optimize!
+ replace optimize
+ end
+
+ # Dumps the object into a String that can be saved
+ # in files or databases.
+ #
+ # The dump is created with Marshal.dump;
+ # In addition, it is gzipped using GZip.gzip.
+ #
+ # The returned String object includes Undumping
+ # so it has an #undump method. See Tokens.load.
+ #
+ # You can configure the level of compression,
+ # but the default value 7 should be what you want
+ # in most cases as it is a good comprimise between
+ # speed and compression rate.
+ #
+ # See GZip module.
+ def dump gzip_level = 7
+ require 'coderay/helpers/gzip_simple'
+ dump = Marshal.dump self
+ dump = dump.gzip gzip_level
+ dump.extend Undumping
+ end
+
+ # The total size of the tokens;
+ # Should be equal to the input size before
+ # scanning.
+ def text_size
+ map { |t, k| t }.join.size
+ end
+
+ # Include this module to give an object an #undump
+ # method.
+ #
+ # The string returned by Tokens.dump includes Undumping.
+ module Undumping
+ # Calls Tokens.load with itself.
+ def undump
+ Tokens.load self
+ end
+ end
+
+ # Undump the object using Marshal.load, then
+ # unzip it using GZip.gunzip.
+ #
+ # The result is commonly a Tokens object, but
+ # this is not guaranteed.
+ def Tokens.load dump
+ require 'coderay/helpers/gzip_simple'
+ dump = dump.gunzip
+ @dump = Marshal.load dump
+ end
+
+ end
+
+
+ # The TokenStream class is a fake Array without elements.
+ #
+ # It redirects the method << to a block given at creation.
+ #
+ # This allows scanners and Encoders to use streaming (no
+ # tokens are saved, the input is highlighted the same time it
+ # is scanned) with the same code.
+ #
+ # See CodeRay.encode_stream and CodeRay.scan_stream
+ class TokenStream < Tokens
+
+ # Whether the object is a TokenStream.
+ #
+ # Returns true.
+ def stream?
+ true
+ end
+
+ # The Array is empty, but size counts the tokens given by <<.
+ attr_reader :size
+
+ # Creates a new TokenStream that calls +block+ whenever
+ # its << method is called.
+ #
+ # Example:
+ #
+ # require 'coderay'
+ #
+ # token_stream = CodeRay::TokenStream.new do |kind, text|
+ # puts 'kind: %s, text size: %d.' % [kind, text.size]
+ # end
+ #
+ # token_stream << [:regexp, '/\d+/']
+ # #-> kind: rexpexp, text size: 5.
+ #
+ def initialize &block
+ raise ArgumentError, 'Block expected for streaming.' unless block
+ @callback = block
+ @size = 0
+ end
+
+ # Calls +block+ with +token+ and increments size.
+ def << token
+ @callback.call token
+ @size += 1
+ end
+
+ # This method is not implemented due to speed reasons. Use Tokens.
+ def text_size
+ raise NotImplementedError, 'This method is not implemented due to speed reasons.'
+ end
+
+ # A TokenStream cannot be dumped. Use Tokens.
+ def dump
+ raise NotImplementedError, 'A TokenStream cannot be dumped.'
+ end
+
+ # A TokenStream cannot be compacted. Use Tokens.
+ def compact
+ raise NotImplementedError, 'A TokenStream cannot be compacted.'
+ end
+
+ end
+
+end
+
+# vim:sw=2:ts=2:et:tw=78
|