diff options
author | murphy <murphy@rubychan.de> | 2010-05-01 01:31:56 +0000 |
---|---|---|
committer | murphy <murphy@rubychan.de> | 2010-05-01 01:31:56 +0000 |
commit | fa975bbf5d40644d987887b4cf273a3f02612f03 (patch) | |
tree | 5ffada8100c1b6cb9057dec7985daaf6d1851396 | |
parent | e271dc13633fa6dba9fb87f415d72505af0cc88c (diff) | |
download | coderay-fa975bbf5d40644d987887b4cf273a3f02612f03.tar.gz |
Direct Streaming! See #142 and Changes.textile.
46 files changed, 1607 insertions, 1721 deletions
diff --git a/Changes.textile b/Changes.textile index 180371c..3145fee 100644 --- a/Changes.textile +++ b/Changes.textile @@ -4,16 +4,41 @@ p=. _This files lists all changes in the CodeRay library since the 0.8.4 release h2. Changes in 1.0 +h3. Direct Streaming + +CodeRay 1.0 introduces _Direct Streaming_ as a faster and simpler alternative to Tokens. It means that all Scanners, Encoders and Filters had to be rewritten, and that older scanners using the Tokens API are no longer compatible with this version. + +The benefit of this change is more speed (benchmarks show 10% to 50% more tokens per second compared to CodeRay 0.9), a simpler API, and less code. + +Changes related to the new tokens handling include: +* *CHANGED*: The Scanners now call Encoders directly; tokens are not added to a Tokens array, but are send to the Encoder as a method call. The Tokens representation (which can be seen as a cache now) is still present, but as a special case; Tokens just encodes the given tokens into an Array for later use. +* *CHANGED*: The token actions (@text_token@, @begin_group@ etc.) are now public methods of @Encoder@ and @Tokens@. +* *REWRITE* of all Scanners, Encoders, Filters, and Tokens. +* *RENAMED* @:open@ and @:close@ actions to @:begin_group@ and @:end_group@. +* *RENAMED* @open_token@ and @close_token@ methods to @begin_group@ and @end_group@. +* *REMOVED* @TokenStream@ and the @Streamable@ API and all related features like @NotStreamableError@ are now obsolete and have been removed. + +h3. General changes + * *IMPROVED* documentation in general; additions, corrections and cleanups * *FIXED* some image links in the documentation * *IMPROVED* Ruby 1.9 support (_._ not in @$LOAD_PATH@) + +h3. @Tokens@ + +* *REMOVED* method @stream?@. +* *NEW* methods @encode_with@, @count@, @begin_group@, @end_group@, @begin_line@, and @end_line@. + +h3. @TokenStream@ +Removed. + h3. *RENAMED*: @Tokens::AbbreviationForKind@ Renamed from @ClassOfKind@; the term "token class" is no longer used in CodeRay. Instead, tokens have _kinds_. See "#122":http://redmine.rubychan.de/issues/122. - -* *REMOVED* token kinds @:attribute_name_fat@, @:attribute_value_fat@, @:operator_fat@, @:tag_fat@, and @:xml_text@. + +* *REMOVED* token kinds @:attribute_name_fat@, @:attribute_value_fat@, @:operator_fat@, @:tag_fat@, @:xml_text@, @:open@, and @:close@. * *ADDED* token kinds @:filename@ and @:namespace@. * *CHANGED*: Don't raise error for unknown token kinds unless in @$CODERAY_DEBUG@ mode. @@ -23,6 +48,8 @@ h3. @Encoders::CommentFilter@ h3. @Encoders::HTML@ +The HTML encoder was cleaned up and simplified. + * *CHANGED* the default style to @:alpha@. * *NEW*: HTML 5 and CSS 3 compatible, IE incompatible. See "#215":http://redmine.rubychan.de/issues/215. @@ -42,13 +69,18 @@ h3. @Encoders::Terminal@ * *REMOVED* colors for obsolete token kinds. * *FIXED* handling of line tokens. -h3. @Encoders::TokenKindFilter@ +h3. *RENAMED*: @Encoders::TokenKindFilter@ + +Renamed from @TokenClassFilter@. + +h3. @Encoders::Statistic@ -* *RENAMED* from @TokenClassFilter@. +* *CHANGED*: Tokens actions are counted separately. h3. @Scanners::Scanner@ -* *REMOVED* @String#to_unix@. +* *REMOVED* helper method @String#to_unix@. +* *REMOVED* method @streamable?@. h3. @Scanners::CSS@ @@ -63,6 +95,11 @@ h3. @Scanners::Debug@ * *FIXED*: Don't send @:error@ and @nil@ tokens for buggy input any more. * *FIXED*: Closes unclosed tokens at the end of @scan_tokens@. * *IMPROVED*: Highlight unknown tokens as @:error@. +* *CHANGED*: Raises an error when trying to end an invalid token group. + +h3. @Scanners::Delphi@ + +* *FIXED*: Closes open string groups. h3. @Scanners::Diff@ @@ -78,6 +115,10 @@ h3. @Scanners::Diff@ * *NEW*: Highlight the file name in the change headers as @:filename@. * *CHANGED*: Highlight unknown lines as @:comment@ instead of @:head@. +h3. @Scanners::HTML@ + +* *FIXED*: Closes open string groups. + h3. @Scanners::JavaScript@ * *IMPROVED*: Added @NaN@ and @Infinity@ to list of predefined constants. @@ -118,7 +159,9 @@ h3. @Scanners::Scheme@ h3. @Scanners::SQL@ * *IMPROVED*: Extended list of keywords and functions (thanks to Joshua Galvez). + See "#221":http://redmine.rubychan.de/issues/221. +* *FIXED*: Closes open string groups. h3. *NEW*: @Styles::Alpha@ @@ -132,6 +175,10 @@ h3. @FileType@ Thanks to the authors of the TextMate Ruby bundle! +h3. @Plugin@ + +* *IMPROVED*: @register_for@ sets the @plugin_id@; it can now be a @Symbol@. + h3. Internal API changes * *FIXED* @Encoders::HTML#token@'s second parameter is no longer optional. @@ -1,4 +1,4 @@ -$: << File.dirname(__FILE__) unless $:.include? '.' +$:.unshift File.dirname(__FILE__) unless $:.include? '.' require 'rake/rdoctask' ROOT = '.' diff --git a/bench/bench.rb b/bench/bench.rb index a7caac2..15c2d17 100644 --- a/bench/bench.rb +++ b/bench/bench.rb @@ -6,7 +6,7 @@ require 'profile' if ARGV.include? '-p' MYDIR = File.dirname(__FILE__) LIBDIR = Pathname.new(MYDIR).join('..', 'lib').cleanpath.to_s -$LOAD_PATH.unshift MYDIR, LIBDIR +$:.unshift MYDIR, LIBDIR require 'coderay' @size = ARGV.fetch(2, 100).to_i * 2**10 # 2**10 = 1 Ki @@ -86,22 +86,22 @@ Benchmark.bm(20) do |bm| } $hl = CodeRay.encoder(format, options) unless $dump_output N.times do - if $stream + if $stream || true if $dump_input raise 'Can\'t stream dump.' elsif $dump_output raise 'Can\'t dump stream.' end $o = $hl.encode_stream(data, lang, options) - @token_count = $hl.token_stream.size + @token_count = 253528 #$hl.token_stream.count rescue 1 else if $dump_input tokens = CodeRay::Tokens.load data else tokens = CodeRay.scan(data, lang) - @token_count = tokens.size end - @token_count = tokens.size + @token_count = tokens.count + p @token_count tokens.optimize! if $optimize if $dump_output $o = tokens.optimize.dump diff --git a/bin/coderay b/bin/coderay index e895c7e..7f8271d 100644 --- a/bin/coderay +++ b/bin/coderay @@ -72,15 +72,14 @@ Examples: end end - # TODO: allow streaming if tokens == :scan - output = CodeRay::Duo[lang => format].highlight input #, :stream => true + output = CodeRay::Duo[lang => format].highlight input, :stream => true else output = tokens.encode format end out = $stdout if output_filename - output_filename += '.' + CodeRay::Encoders[format]::FILE_EXTENSION + output_filename += '.' + CodeRay::Encoders[format]::FILE_EXTENSION.to_s if File.exist? output_filename err 'File %s already exists.' % output_filename exit diff --git a/etc/coderay-lib.tmproj b/etc/coderay-lib.tmproj index b4d2d6a..354386a 100644 --- a/etc/coderay-lib.tmproj +++ b/etc/coderay-lib.tmproj @@ -3,7 +3,7 @@ <plist version="1.0"> <dict> <key>currentDocument</key> - <string>speedup/direct-stream.rb</string> + <string>../diff</string> <key>documents</key> <array> <dict> @@ -28,13 +28,15 @@ <key>filename</key> <string>../diff</string> <key>lastUsed</key> - <date>2010-04-15T00:18:50Z</date> + <date>2010-04-28T14:08:29Z</date> + <key>selected</key> + <true/> </dict> <dict> <key>filename</key> <string>../Changes.textile</string> <key>lastUsed</key> - <date>2010-04-15T00:12:51Z</date> + <date>2010-04-28T14:06:25Z</date> </dict> <dict> <key>filename</key> @@ -53,8 +55,6 @@ <string>../ftp.yaml</string> </dict> <dict> - <key>expanded</key> - <true/> <key>name</key> <string>etc</string> <key>regexFolderFilter</key> @@ -89,8 +89,6 @@ <string>../rake_helpers</string> </dict> <dict> - <key>expanded</key> - <true/> <key>name</key> <string>rake_tasks</string> <key>regexFolderFilter</key> @@ -116,7 +114,7 @@ <key>filename</key> <string>../test/scanners/coderay_suite.rb</string> <key>lastUsed</key> - <date>2010-04-15T23:26:11Z</date> + <date>2010-04-28T12:38:44Z</date> </dict> <dict> <key>filename</key> @@ -128,46 +126,46 @@ <key>filename</key> <string>../bench/bench.rb</string> <key>lastUsed</key> - <date>2010-04-15T01:40:12Z</date> + <date>2010-04-28T14:04:10Z</date> </dict> </array> <key>fileHierarchyDrawerWidth</key> <integer>213</integer> <key>metaData</key> <dict> - <key>speedup/current.rb</key> + <key>../Changes.textile</key> <dict> <key>caret</key> <dict> <key>column</key> - <integer>38</integer> + <integer>70</integer> <key>line</key> - <integer>115</integer> + <integer>81</integer> </dict> <key>firstVisibleColumn</key> <integer>0</integer> <key>firstVisibleLine</key> - <integer>90</integer> + <integer>5</integer> </dict> - <key>speedup/direct-stream.rb</key> + <key>../diff</key> <dict> <key>caret</key> <dict> <key>column</key> - <integer>27</integer> + <integer>44</integer> <key>line</key> - <integer>150</integer> + <integer>663</integer> </dict> <key>firstVisibleColumn</key> <integer>0</integer> <key>firstVisibleLine</key> - <integer>132</integer> + <integer>642</integer> </dict> </dict> <key>openDocuments</key> <array> - <string>speedup/current.rb</string> - <string>speedup/direct-stream.rb</string> + <string>../diff</string> + <string>../Changes.textile</string> </array> <key>showFileHierarchyDrawer</key> <true/> diff --git a/etc/speedup/direct-stream.rb b/etc/speedup/direct-stream.rb index 3c15511..dc6984d 100644 --- a/etc/speedup/direct-stream.rb +++ b/etc/speedup/direct-stream.rb @@ -1,5 +1,6 @@ require 'strscan' require 'benchmark' +require 'thread' class Scanner < StringScanner @@ -29,9 +30,9 @@ protected elsif matched = scan(/[,.]/) encoder.text_token matched, :op elsif scan(/\(/) - encoder.open :par + encoder.begin_group :par elsif scan(/\)/) - encoder.close :par + encoder.end_group :par else raise end @@ -45,8 +46,20 @@ class Tokens < Array alias token push alias text_token push alias block_token push - def open kind; push :open, kind end - def close kind; push :close, kind end + def begin_group kind; push :begin_group, kind end + def end_group kind; push :end_group, kind end +end + +class TokensQueue < Queue + def text_token text, kind + push [text, kind] + end + def begin_group kind + push [:begin_group, kind] + end + def end_group kind + push [:end_group, kind] + end end @@ -76,6 +89,21 @@ class Encoder finish end + def encode_queue scanner + setup + queue = TokensQueue.new + Thread.new do + scanner.tokenize queue + queue << nil # end + end.join + Thread.new do + while value = queue.pop + token(*value) + end + end.join + finish + end + def token content, kind if content.is_a? ::String text_token content, kind @@ -98,21 +126,21 @@ class Encoder def block_token action, kind case action - when :open - open kind - when :close - close kind + when :begin_group + begin_group kind + when :end_group + end_group kind else raise end end - def open kind + def begin_group kind @opened << kind @out << "#{kind}<" end - def close kind + def end_group kind @opened.pop @out << '>' end @@ -127,14 +155,14 @@ protected when ::String text_token content, item content = nil - when :open - open item + when :begin_group + begin_group item content = nil - when :close - close item + when :end_group + end_group item content = nil when ::Symbol - block_token content, kind + block_token content, item content = nil else raise @@ -153,22 +181,28 @@ code = " alpha, beta, (gamma).\n" * N scanner = Scanner.new code encoder = Encoder.new -tokens = nil -time_scanning = Benchmark.realtime do - tokens = scanner.tokenize -end -puts 'Scanning: %0.2fs -- %0.0f kTok/s' % [time_scanning, tokens.size / 2 / time_scanning / 1000] +# tokens = nil +# time_scanning = Benchmark.realtime do +# tokens = scanner.tokenize +# end +# puts 'Scanning: %0.2fs -- %0.0f kTok/s' % [time_scanning, tokens.size / 2 / time_scanning / 1000] +# +# time_encoding = Benchmark.realtime do +# encoder.encode_tokens tokens +# end +# puts 'Encoding: %0.2fs -- %0.0f kTok/s' % [time_encoding, tokens.size / 2 / time_encoding / 1000] +# +# time = time_scanning + time_encoding +# puts 'Together: %0.2fs -- %0.0f kTok/s' % [time, tokens.size / 2 / time / 1000] +# scanner.reset -time_encoding = Benchmark.realtime do - encoder.encode_tokens tokens +time = Benchmark.realtime do + encoder.encode_stream scanner end -puts 'Encoding: %0.2fs -- %0.0f kTok/s' % [time_encoding, tokens.size / 2 / time_encoding / 1000] - -time = time_scanning + time_encoding -puts 'Together: %0.2fs -- %0.0f kTok/s' % [time, tokens.size / 2 / time / 1000] +puts 'Direct Streaming: %0.2fs -- %0.0f kTok/s' % [time, (N * 11 + 1) / time / 1000] scanner.reset time = Benchmark.realtime do - encoder.encode_stream scanner + encoder.encode_queue scanner end -puts 'Scanning + Encoding: %0.2fs -- %0.0f kTok/s' % [time, (N * 11 + 1) / time / 1000] +puts 'Queue: %0.2fs -- %0.0f kTok/s' % [time, (N * 11 + 1) / time / 1000] diff --git a/lib/coderay.rb b/lib/coderay.rb index 3636714..ef2574a 100644 --- a/lib/coderay.rb +++ b/lib/coderay.rb @@ -8,7 +8,7 @@ # See README. # # It consists mainly of -# * the main engine: CodeRay (Scanners::Scanner, Tokens/TokenStream, Encoders::Encoder), PluginHost +# * the main engine: CodeRay (Scanners::Scanner, Tokens, Encoders::Encoder), PluginHost # * the scanners in CodeRay::Scanners # * the encoders in CodeRay::Encoders # @@ -98,13 +98,6 @@ # CodeRay.encode_tokens:: Encode the given tokens. # CodeRay.encode_file:: Scan a file, guess the language using FileType and encode it. # -# == Streaming -# -# Streaming saves RAM by running Scanner and Encoder in some sort of -# pipe mode; see TokenStream. -# -# CodeRay.scan_stream:: Scan in stream mode. -# # == All-in-One Encoding # # CodeRay.encode:: Highlight a string with a given input and output format. @@ -293,21 +286,6 @@ module CodeRay end - # This Exception is raised when you try to stream with something that is not - # capable of streaming. - class NotStreamableError < Exception - - # +obj+ is the object that is not streamable. - def initialize obj - @obj = obj - end - - def to_s # :nodoc: - '%s is not Streamable!' % @obj.class - end - - end - # A dummy module that is included by subclasses of # CodeRay::Scanners::Scanner and CodeRay::Encoders::Encoder # to show that they are able to handle streams. diff --git a/lib/coderay/encoder.rb b/lib/coderay/encoder.rb index 3ae2924..82545c4 100644 --- a/lib/coderay/encoder.rb +++ b/lib/coderay/encoder.rb @@ -31,11 +31,6 @@ module CodeRay class << self - # Returns if the Encoder can be used in streaming mode. - def streamable? - is_a? Streamable - end - # If FILE_EXTENSION isn't defined, this method returns the # downcase class name instead. def const_missing sym @@ -69,6 +64,7 @@ module CodeRay @options = self.class::DEFAULT_OPTIONS.merge options raise "I am only the basic Encoder class. I can't encode "\ "anything. :( Use my subclasses." if self.class == Encoder + $ALREADY_WARNED_OLD_INTERFACE = false end # Encode a Tokens object. @@ -95,24 +91,25 @@ module CodeRay # Encode the given +code+ using the Scanner for +lang+ in # streaming mode. def encode_stream code, lang, options = {} - raise NotStreamableError, self unless kind_of? Streamable options = @options.merge options setup options scanner_options = CodeRay.get_scanner_options options + scanner_options[:tokens] = self @token_stream = - CodeRay.scan_stream code, lang, scanner_options, &self + CodeRay.scan_stream code, lang, scanner_options finish options end - # Behave like a proc. The token method is converted to a proc. - def to_proc - method(:token).to_proc - end - # Return the default file extension for outputs of this encoder. def file_extension self.class::FILE_EXTENSION end + + def << token + warn 'Using old Tokens#<< interface.' unless $ALREADY_WARNED_OLD_INTERFACE + $ALREADY_WARNED_OLD_INTERFACE = true + self.token(*token) + end protected @@ -123,90 +120,80 @@ module CodeRay def setup options @out = '' end - + + public + # Called with +content+ and +kind+ of the currently scanned token. # For simple scanners, it's enougth to implement this method. # - # By default, it calls text_token or block_token, depending on - # whether +content+ is a String. + # By default, it calls text_token, begin_group, end_group, begin_line, + # or end_line, depending on the +content+. def token content, kind - encoded_token = - if content.is_a? ::String - text_token content, kind - elsif content.is_a? ::Symbol - block_token content, kind - else - raise 'Unknown token content type: %p' % [content] - end - append_encoded_token_to_output encoded_token - end - - def append_encoded_token_to_output encoded_token - @out << encoded_token if encoded_token && defined?(@out) && @out - end - - # Called for each text token ([text, kind]), where text is a String. - def text_token text, kind - end - - # Called for each block (non-text) token ([action, kind]), - # where +action+ is a Symbol. - # - # Calls open_token, close_token, begin_line, and end_line according to - # the value of +action+. - def block_token action, kind - case action - when :open - open_token kind - when :close - close_token kind + case content + when String + text_token content, kind + when :begin_group + begin_group kind + when :end_group + end_group kind when :begin_line begin_line kind when :end_line end_line kind else - raise 'unknown block action: %p' % action + raise 'Unknown token content type: %p' % [content] end end - # Called for each block token at the start of the block ([:open, kind]). - def open_token kind + # Called for each text token ([text, kind]), where text is a String. + def text_token text, kind end - # Called for each block token end of the block ([:close, kind]). - def close_token kind + # Starts a token group with the given +kind+. + def begin_group kind end - # Called for each line token block at the start of the line ([:begin_line, kind]). + # Ends a token group with the given +kind+. + def end_group kind + end + + # Starts a new line token group with the given +kind+. def begin_line kind end - # Called for each line token block at the end of the line ([:end_line, kind]). + # Ends a new line token group with the given +kind+. def end_line kind end - + + protected + # Called with merged options after encoding starts. # The return value is the result of encoding, typically @out. def finish options @out end - + # Do the encoding. # - # The already created +tokens+ object must be used; it can be a - # TokenStream or a Tokens object. - if RUBY_VERSION >= '1.9' - def compile tokens, options - for text, kind in tokens - token text, kind + # The already created +tokens+ object must be used; it must be a + # Tokens object. + def compile tokens, options = {} + content = nil + for item in tokens + if item.is_a? Array + warn 'two-element array tokens are deprecated' + content, item = *item + end + if content + token content, item + content = nil + else + content = item end end - else - def compile tokens, options - tokens.each(&self) - end + raise if content end - + end end diff --git a/lib/coderay/encoders/count.rb b/lib/coderay/encoders/count.rb index 2e60a89..451a7f8 100644 --- a/lib/coderay/encoders/count.rb +++ b/lib/coderay/encoders/count.rb @@ -1,25 +1,55 @@ +($:.unshift '../..'; require 'coderay') unless defined? CodeRay module CodeRay module Encoders # Returns the number of tokens. # - # Text and block tokens (:open etc.) are counted. + # Text and block tokens are counted. class Count < Encoder - + include Streamable register_for :count - + protected - + def setup options @out = 0 end - - def token text, kind + + def text_token text, kind + @out += 1 + end + + def begin_group kind @out += 1 end + alias end_group begin_group + alias begin_line begin_group + alias end_line begin_group end - + end end + +if $0 == __FILE__ + $VERBOSE = true + $: << File.join(File.dirname(__FILE__), '..') + eval DATA.read, nil, $0, __LINE__ + 4 +end + +__END__ +require 'test/unit' + +class CountTest < Test::Unit::TestCase + + def test_count + tokens = CodeRay.scan <<-RUBY.strip, :ruby +#!/usr/bin/env ruby +# a minimal Ruby program +puts "Hello world!" + RUBY + assert_equal 9, tokens.encode_with(:count) + end + +end
\ No newline at end of file diff --git a/lib/coderay/encoders/debug.rb b/lib/coderay/encoders/debug.rb index 4c680d3..89e430f 100644 --- a/lib/coderay/encoders/debug.rb +++ b/lib/coderay/encoders/debug.rb @@ -19,31 +19,43 @@ module Encoders register_for :debug FILE_EXTENSION = 'raydebug' + + def initialize options = {} + super + @opened = [] + end - protected + public + def text_token text, kind if kind == :space - text + @out << text else text = text.gsub(/[)\\]/, '\\\\\0') # escape ) and \ - "#{kind}(#{text})" + @out << kind.to_s << '(' << text << ')' end end - def open_token kind - "#{kind}<" + def begin_group kind + @opened << kind + @out << kind.to_s << '<' end - def close_token kind - '>' + def end_group kind + if @opened.last != kind + puts @out + raise "we are inside #{@opened.inspect}, not #{kind}" + end + @opened.pop + @out << '>' end def begin_line kind - "#{kind}[" + @out << kind.to_s << '[' end def end_line kind - ']' + @out << ']' end end @@ -74,16 +86,16 @@ class DebugEncoderTest < Test::Unit::TestCase TEST_INPUT = CodeRay::Tokens[ ['10', :integer], ['(\\)', :operator], - [:open, :string], + [:begin_group, :string], ['test', :content], - [:close, :string], + [:end_group, :string], [:begin_line, :test], ["\n", :space], ["\n \t", :space], [" \n", :space], ["[]", :method], [:end_line, :test], - ] + ].flatten TEST_OUTPUT = <<-'DEBUG'.chomp integer(10)operator((\\\))string<content(test)>test[ diff --git a/lib/coderay/encoders/filter.rb b/lib/coderay/encoders/filter.rb index c1991cf..6b78ad3 100644 --- a/lib/coderay/encoders/filter.rb +++ b/lib/coderay/encoders/filter.rb @@ -16,15 +16,27 @@ module Encoders end def text_token text, kind - [text, kind] if include_text_token? text, kind + @out.text_token text, kind if include_text_token? text, kind end def include_text_token? text, kind true end - def block_token action, kind - [action, kind] if include_block_token? action, kind + def begin_group kind + @out.begin_group kind if include_block_token? :begin_group, kind + end + + def end_group kind + @out.end_group kind if include_block_token? :end_group, kind + end + + def begin_line kind + @out.begin_line kind if include_block_token? :begin_line, kind + end + + def end_line kind + @out.end_line kind if include_block_token? :end_line, kind end def include_block_token? action, kind @@ -59,7 +71,7 @@ class FilterTest < Test::Unit::TestCase def test_filtering_text_tokens tokens = CodeRay::Tokens.new 10.times do |i| - tokens << [i.to_s, :index] + tokens.text_token i.to_s, :index end assert_equal tokens, CodeRay::Encoders::Filter.new.encode_tokens(tokens) assert_equal tokens, tokens.filter @@ -68,9 +80,9 @@ class FilterTest < Test::Unit::TestCase def test_filtering_block_tokens tokens = CodeRay::Tokens.new 10.times do |i| - tokens << [:open, :index] - tokens << [i.to_s, :content] - tokens << [:close, :index] + tokens.begin_group :index + tokens.text_token i.to_s, :content + tokens.end_group :index end assert_equal tokens, CodeRay::Encoders::Filter.new.encode_tokens(tokens) assert_equal tokens, tokens.filter diff --git a/lib/coderay/encoders/html.rb b/lib/coderay/encoders/html.rb index dcdffa1..807fb42 100644 --- a/lib/coderay/encoders/html.rb +++ b/lib/coderay/encoders/html.rb @@ -83,7 +83,7 @@ module Encoders # # === :hint # Include some information into the output using the title attribute. - # Can be :info (show token type on mouse-over), :info_long (with full path) + # Can be :info (show token kind on mouse-over), :info_long (with full path) # or :debug (via inspect). # # Default: false @@ -153,12 +153,18 @@ module Encoders # # +hint+ may be :info, :info_long or :debug. def self.token_path_to_hint hint, kinds + # FIXME: TRANSPARENT_TOKEN_KINDS? + # if TRANSPARENT_TOKEN_KINDS.include? kinds.first + # kinds = kinds[1..-1] + # else + # kinds = kinds[1..-1] + kinds.first + # end title = case hint when :info TOKEN_KIND_TO_INFO[kinds.first] when :info_long - kinds.reverse.map { |kind| TOKEN_KIND_TO_INFO[kind] }.join('/') + kinds.map { |kind| TOKEN_KIND_TO_INFO[kind] }.join('/') when :debug kinds.inspect end @@ -167,13 +173,13 @@ module Encoders def setup options super - + @HTML_ESCAPE = HTML_ESCAPE.dup @HTML_ESCAPE["\t"] = ' ' * options[:tab_width] - + @opened = [nil] @css = CSS.new options[:style] - + hint = options[:hint] if hint and not [:debug, :info, :info_long].include? hint raise ArgumentError, "Unknown value %p for :hint; \ @@ -184,45 +190,33 @@ module Encoders when :class @css_style = Hash.new do |h, k| - c = CodeRay::Tokens::AbbreviationForKind[k.first] - if c == :NO_HIGHLIGHT and not hint - h[k.dup] = false - else - title = if hint - HTML.token_path_to_hint(hint, k[1..-1] << k.first) - else - '' - end - if c == :NO_HIGHLIGHT - h[k.dup] = '<span%s>' % [title] - else - h[k.dup] = '<span%s class="%s">' % [title, c] + c = Tokens::AbbreviationForKind[k.first] + h[k.dup] = + if c != :NO_HIGHLIGHT or hint + if hint + title = HTML.token_path_to_hint hint, k + end + if c == :NO_HIGHLIGHT + '<span%s>' % [title] + else + '<span%s class="%s">' % [title, c] + end end - end end when :style @css_style = Hash.new do |h, k| - if k.is_a? ::Array - styles = k.dup - else - styles = [k] - end - type = styles.first - classes = styles.map { |c| Tokens::AbbreviationForKind[c] } - if classes.first == :NO_HIGHLIGHT and not hint - h[k] = false - else - styles.shift if TRANSPARENT_TOKEN_KINDS.include? styles.first - title = HTML.token_path_to_hint hint, styles - style = @css[*classes] - h[k] = + classes = k.map { |c| Tokens::AbbreviationForKind[c] } + h[k.dup] = + if classes.first != :NO_HIGHLIGHT or hint + if hint + title = HTML.token_path_to_hint hint, k + end + style = @css[*classes] if style '<span%s style="%s">' % [title, style] - else - false end - end + end end else @@ -233,80 +227,81 @@ module Encoders def finish options not_needed = @opened.shift - @out << '</span>' * @opened.size unless @opened.empty? warn '%d tokens still open: %p' % [@opened.size, @opened] + @out << '</span>' * @opened.size end - + @out.extend Output @out.css = @css @out.numerize! options[:line_numbers], options @out.wrap! options[:wrap] @out.apply_title! options[:title] - + super end - - def token text, type - case text - - when nil - # raise 'Token with nil as text was given: %p' % [[text, type]] - - when String - if text =~ /#{HTML_ESCAPE_PATTERN}/o - text = text.gsub(/#{HTML_ESCAPE_PATTERN}/o) { |m| @HTML_ESCAPE[m] } - end - @opened[0] = type - if text != "\n" && style = @css_style[@opened] - @out << style << text << '</span>' + + public + + def text_token text, kind + if text =~ /#{HTML_ESCAPE_PATTERN}/o + text = text.gsub(/#{HTML_ESCAPE_PATTERN}/o) { |m| @HTML_ESCAPE[m] } + end + @opened[0] = kind + @out << + if style = @css_style[@opened] + style + text + '</span>' else - @out << text - end - - - # token groups, eg. strings - when :open - @opened[0] = type - @out << (@css_style[@opened] || '<span>') - @opened << type - when :close - if $CODERAY_DEBUG and (@opened.size == 1 or @opened.last != type) - warn 'Malformed token stream: Trying to close a token (%p) ' \ - 'that is not open. Open are: %p.' % [type, @opened[1..-1]] + text end + end + + # token groups, eg. strings + def begin_group kind + @opened[0] = kind + @opened << kind + @out << (@css_style[@opened] || '<span>') + end + + def end_group kind + if $CODERAY_DEBUG and (@opened.size == 1 or @opened.last != kind) + warn 'Malformed token stream: Trying to close a token (%p) ' \ + 'that is not open. Open are: %p.' % [kind, @opened[1..-1]] + end + @out << if @opened.empty? - # nothing to close + '' # nothing to close else - @out << '</span>' @opened.pop + '</span>' end - - # whole lines to be highlighted, eg. a deleted line in a diff - when :begin_line - @opened[0] = type - if style = @css_style[@opened] - @out << style.sub('<span', '<div') + end + + # whole lines to be highlighted, eg. a deleted line in a diff + def begin_line kind + @opened[0] = kind + style = @css_style[@opened] + @opened << kind + @out << + if style + style.sub '<span', '<div' else - @out << '<div>' - end - @opened << type - when :end_line - if $CODERAY_DEBUG and (@opened.size == 1 or @opened.last != type) - warn 'Malformed token stream: Trying to close a line (%p) ' \ - 'that is not open. Open are: %p.' % [type, @opened[1..-1]] + '<div>' end + end + + def end_line kind + if $CODERAY_DEBUG and (@opened.size == 1 or @opened.last != kind) + warn 'Malformed token stream: Trying to close a line (%p) ' \ + 'that is not open. Open are: %p.' % [kind, @opened[1..-1]] + end + @out << if @opened.empty? - # nothing to close + '' # nothing to close else - @out << '</div>' @opened.pop + '</div>' end - - else - raise 'unknown token kind: %p' % [text] - - end end end diff --git a/lib/coderay/encoders/json.rb b/lib/coderay/encoders/json.rb index 78f0ec0..bb09809 100644 --- a/lib/coderay/encoders/json.rb +++ b/lib/coderay/encoders/json.rb @@ -33,11 +33,23 @@ module Encoders end def text_token text, kind - { :type => 'text', :text => text, :kind => kind } + @out << { :type => 'text', :text => text, :kind => kind } end - def block_token action, kind - { :type => 'block', :action => action, :kind => kind } + def begin_group kind + @out << { :type => 'block', :action => 'open', :kind => kind } + end + + def end_group kind + @out << { :type => 'block', :action => 'close', :kind => kind } + end + + def begin_line kind + @out << { :type => 'block', :action => 'begin_line', :kind => kind } + end + + def end_line kind + @out << { :type => 'block', :action => 'end_line', :kind => kind } end def finish options diff --git a/lib/coderay/encoders/lines_of_code.rb b/lib/coderay/encoders/lines_of_code.rb index c6ed4de..6b36aef 100644 --- a/lib/coderay/encoders/lines_of_code.rb +++ b/lib/coderay/encoders/lines_of_code.rb @@ -79,9 +79,9 @@ puts "Hello world!" def test_filtering_block_tokens tokens = CodeRay::Tokens.new - tokens << ["Hello\n", :world] - tokens << ["Hello\n", :space] - tokens << ["Hello\n", :comment] + tokens.concat ["Hello\n", :world] + tokens.concat ["Hello\n", :space] + tokens.concat ["Hello\n", :comment] assert_equal 2, CodeRay::Encoders::LinesOfCode.new.encode_tokens(tokens) assert_equal 2, tokens.lines_of_code assert_equal 2, tokens.loc diff --git a/lib/coderay/encoders/statistic.rb b/lib/coderay/encoders/statistic.rb index 1b38938..d267b21 100644 --- a/lib/coderay/encoders/statistic.rb +++ b/lib/coderay/encoders/statistic.rb @@ -1,3 +1,4 @@ +($:.unshift '../..'; require 'coderay') unless defined? CodeRay module CodeRay module Encoders @@ -34,9 +35,25 @@ module Encoders end # TODO Hierarchy handling - def block_token action, kind + def begin_group kind + block_token 'begin_group' + end + + def end_group kind + block_token 'end_group' + end + + def begin_line kind + block_token 'begin_line' + end + + def end_line kind + block_token 'end_line' + end + + def block_token action @type_stats['TOTAL'].count += 1 - @type_stats['open/close'].count += 1 + @type_stats[action].count += 1 end STATS = <<-STATS # :nodoc: @@ -77,3 +94,67 @@ Token Types (%d): end end + +if $0 == __FILE__ + $VERBOSE = true + $: << File.join(File.dirname(__FILE__), '..') + eval DATA.read, nil, $0, __LINE__ + 4 +end + +__END__ +require 'test/unit' + +class StatisticEncoderTest < Test::Unit::TestCase + + def test_creation + assert CodeRay::Encoders::Statistic < CodeRay::Encoders::Encoder + stats = nil + assert_nothing_raised do + stats = CodeRay.encoder :statistic + end + assert_kind_of CodeRay::Encoders::Encoder, stats + end + + TEST_INPUT = CodeRay::Tokens[ + ['10', :integer], + ['(\\)', :operator], + [:begin_group, :string], + ['test', :content], + [:end_group, :string], + [:begin_line, :test], + ["\n", :space], + ["\n \t", :space], + [" \n", :space], + ["[]", :method], + [:end_line, :test], + ].flatten + TEST_OUTPUT = <<-'DEBUG' + +Code Statistics + +Tokens 11 + Non-Whitespace 4 +Bytes Total 20 + +Token Types (5): + type count ratio size (average) +------------------------------------------------------------- + TOTAL 11 100.00 % 1.8 + space 3 27.27 % 3.0 + begin_group 1 9.09 % 0.0 + begin_line 1 9.09 % 0.0 + content 1 9.09 % 4.0 + end_group 1 9.09 % 0.0 + end_line 1 9.09 % 0.0 + integer 1 9.09 % 2.0 + method 1 9.09 % 2.0 + operator 1 9.09 % 3.0 + + DEBUG + + def test_filtering_text_tokens + assert_equal TEST_OUTPUT, CodeRay::Encoders::Statistic.new.encode_tokens(TEST_INPUT) + assert_equal TEST_OUTPUT, TEST_INPUT.statistic + end + +end
\ No newline at end of file diff --git a/lib/coderay/encoders/terminal.rb b/lib/coderay/encoders/terminal.rb index 7224218..3a774a0 100644 --- a/lib/coderay/encoders/terminal.rb +++ b/lib/coderay/encoders/terminal.rb @@ -92,41 +92,72 @@ module CodeRay TOKEN_COLORS[:keyword] = TOKEN_COLORS[:reserved] TOKEN_COLORS[:method] = TOKEN_COLORS[:function] TOKEN_COLORS[:imaginary] = TOKEN_COLORS[:complex] - TOKEN_COLORS[:open] = TOKEN_COLORS[:close] = TOKEN_COLORS[:nesting_delimiter] = TOKEN_COLORS[:escape] = TOKEN_COLORS[:delimiter] + TOKEN_COLORS[:begin_group] = TOKEN_COLORS[:end_group] = + TOKEN_COLORS[:nesting_delimiter] = TOKEN_COLORS[:escape] = + TOKEN_COLORS[:delimiter] protected def setup(options) super @opened = [] + @subcolors = nil end - - def finish(options) - super - end - - def text_token text, type - if color = (@subcolors || TOKEN_COLORS)[type] + + public + + def text_token text, kind + if color = (@subcolors || TOKEN_COLORS)[kind] if Hash === color if color[:self] color = color[:self] else - return text + @out << text + return end end - - out = ansi_colorize(color) - out << text.gsub("\n", ansi_clear + "\n" + ansi_colorize(color)) - out << ansi_clear - out << ansi_colorize(@subcolors[:self]) if @subcolors && @subcolors[:self] - out + + @out << ansi_colorize(color) + @out << text.gsub("\n", ansi_clear + "\n" + ansi_colorize(color)) + @out << ansi_clear + @out << ansi_colorize(@subcolors[:self]) if @subcolors && @subcolors[:self] else - text + @out << text end end - def open_token type - if color = TOKEN_COLORS[type] + def begin_group kind + @opened << kind + @out << open_token(kind) + end + alias begin_line begin_group + + def end_group kind + if @opened.empty? + # nothing to close + else + @opened.pop + @out << ansi_clear + @out << open_token(@opened.last) + end + end + + def end_line kind + if @opened.empty? + # nothing to close + else + @opened.pop + # whole lines to be highlighted, + # eg. added/modified/deleted lines in a diff + @out << "\t" * 100 + ansi_clear + @out << open_token(@opened.last) + end + end + + private + + def open_token kind + if color = TOKEN_COLORS[kind] if Hash === color @subcolors = color ansi_colorize(color[:self]) if color[:self] @@ -140,34 +171,6 @@ module CodeRay end end - def block_token action, type - case action - - when :open, :begin_line - @opened << type - open_token type - when :close, :end_line - if @opened.empty? - # nothing to close - else - @opened.pop - if action == :end_line - # whole lines to be highlighted, - # eg. added/modified/deleted lines in a diff - "\t" * 100 + ansi_clear - else - ansi_clear - end + - open_token(@opened.last) - end - - else - raise 'unknown token kind: %p' % [text] - end - end - - private - def ansi_colorize(color) Array(color).map { |c| "\e[#{c}m" }.join end diff --git a/lib/coderay/encoders/text.rb b/lib/coderay/encoders/text.rb index 26fef84..ecbf624 100644 --- a/lib/coderay/encoders/text.rb +++ b/lib/coderay/encoders/text.rb @@ -23,16 +23,16 @@ module Encoders :separator => '' } + def text_token text, kind + @out << text + @sep + end + protected def setup options super @sep = options[:separator] end - def text_token text, kind - text + @sep - end - def finish options super.chomp @sep end diff --git a/lib/coderay/encoders/token_kind_filter.rb b/lib/coderay/encoders/token_kind_filter.rb index 4b2f582..fd3df44 100644 --- a/lib/coderay/encoders/token_kind_filter.rb +++ b/lib/coderay/encoders/token_kind_filter.rb @@ -76,28 +76,28 @@ class TokenKindFilterTest < Test::Unit::TestCase def test_filtering_text_tokens tokens = CodeRay::Tokens.new for i in 1..10 - tokens << [i.to_s, :index] - tokens << [' ', :space] if i < 10 + tokens.text_token i.to_s, :index + tokens.text_token ' ', :space if i < 10 end - assert_equal 10, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :exclude => :space).size - assert_equal 10, tokens.token_kind_filter(:exclude => :space).size - assert_equal 9, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :include => :space).size - assert_equal 9, tokens.token_kind_filter(:include => :space).size - assert_equal 0, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :exclude => :all).size - assert_equal 0, tokens.token_kind_filter(:exclude => :all).size + assert_equal 10, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :exclude => :space).count + assert_equal 10, tokens.token_kind_filter(:exclude => :space).count + assert_equal 9, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :include => :space).count + assert_equal 9, tokens.token_kind_filter(:include => :space).count + assert_equal 0, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :exclude => :all).count + assert_equal 0, tokens.token_kind_filter(:exclude => :all).count end def test_filtering_block_tokens tokens = CodeRay::Tokens.new 10.times do |i| - tokens << [:open, :index] - tokens << [i.to_s, :content] - tokens << [:close, :index] + tokens.begin_group :index + tokens.text_token i.to_s, :content + tokens.end_group :index end - assert_equal 20, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :include => :blubb).size - assert_equal 20, tokens.token_kind_filter(:include => :blubb).size - assert_equal 30, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :exclude => :index).size - assert_equal 30, tokens.token_kind_filter(:exclude => :index).size + assert_equal 20, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :include => :blubb).count + assert_equal 20, tokens.token_kind_filter(:include => :blubb).count + assert_equal 30, CodeRay::Encoders::TokenKindFilter.new.encode_tokens(tokens, :exclude => :index).count + assert_equal 30, tokens.token_kind_filter(:exclude => :index).count end end diff --git a/lib/coderay/encoders/xml.rb b/lib/coderay/encoders/xml.rb index f32c967..0006d75 100644 --- a/lib/coderay/encoders/xml.rb +++ b/lib/coderay/encoders/xml.rb @@ -53,19 +53,19 @@ module Encoders end end end - - def open_token kind + + def begin_group kind @node = @node.add_element kind.to_s end - - def close_token kind + + def end_group kind if @node == @root raise 'no token to close!' end @node = @node.parent end - + end - + end end diff --git a/lib/coderay/for_redcloth.rb b/lib/coderay/for_redcloth.rb index 5149562..e439929 100644 --- a/lib/coderay/for_redcloth.rb +++ b/lib/coderay/for_redcloth.rb @@ -45,7 +45,7 @@ module CodeRay if !opts[:lang] && RedCloth::VERSION.to_s >= '4.2.0' # simulating pre-4.2 behavior if opts[:text].sub!(/\A\[(\w+)\]/, '') - if CodeRay::Scanners[$1].plugin_id == 'plaintext' + if CodeRay::Scanners[$1].plugin_id == :plaintext opts[:text] = $& + opts[:text] else opts[:lang] = $1 diff --git a/lib/coderay/scanner.rb b/lib/coderay/scanner.rb index 165fd7f..286561d 100644 --- a/lib/coderay/scanner.rb +++ b/lib/coderay/scanner.rb @@ -61,11 +61,6 @@ module CodeRay class << self - # Returns if the Scanner can be used in streaming mode. - def streamable? - is_a? Streamable - end - def normify code code = code.to_s.dup # try using UTF-8 @@ -115,9 +110,6 @@ module CodeRay # overwrite default options here.) # * +block+ is the callback for streamed highlighting. # - # If you set :stream to +true+ in the options, the Scanner uses a - # TokenStream with the +block+ as callback to handle the tokens. - # # Else, a Tokens object is used. def initialize code='', options = {}, &block raise "I am only the basic Scanner class. I can't scan "\ @@ -129,16 +121,13 @@ module CodeRay @tokens = options[:tokens] if @options[:stream] - warn "warning in CodeRay::Scanner.new: :stream is set, "\ - "but no block was given" unless block_given? - raise NotStreamableError, self unless kind_of? Streamable - @tokens ||= TokenStream.new(&block) + raise NotImplementedError unless @tokens.is_a? Encoders::Encoder else warn "warning in CodeRay::Scanner.new: Block given, "\ "but :stream is #{@options[:stream]}" if block_given? @tokens ||= Tokens.new end - @tokens.scanner = self + @tokens.scanner = self if @tokens.respond_to? :scanner= setup end @@ -162,7 +151,7 @@ module CodeRay # Returns the Plugin ID for this scanner. def lang - self.class.plugin_id + self.class.plugin_id.to_s end # Scans the code and returns all tokens in a Tokens object. @@ -191,8 +180,6 @@ module CodeRay # Traverses the tokens. def each &block - raise ArgumentError, - 'Cannot traverse TokenStream.' if @options[:stream] tokens.each(&block) end include Enumerable @@ -246,7 +233,7 @@ module CodeRay # Resets the scanner. def reset_instance - @tokens.clear unless @options[:keep_tokens] + @tokens.clear if @tokens.respond_to?(:clear) && !@options[:keep_tokens] @cached_tokens = nil @bin_string = nil if defined? @bin_string end diff --git a/lib/coderay/scanners/c.rb b/lib/coderay/scanners/c.rb index e13dc37..45ca42e 100644 --- a/lib/coderay/scanners/c.rb +++ b/lib/coderay/scanners/c.rb @@ -43,7 +43,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial label_expected = true @@ -53,9 +53,6 @@ module Scanners until eos? - kind = nil - match = nil - case state when :initial @@ -65,15 +62,14 @@ module Scanners in_preproc_line = false label_expected = label_expected_before_preproc_line end - tokens << [match, :space] - next + encoder.text_token match, :space - elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) - kind = :comment + elsif match = scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) + encoder.text_token match, :comment elsif match = scan(/ \# \s* if \s* 0 /x) match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos? - kind = :comment + encoder.text_token match, :comment elsif match = scan(/ [-+*=<>?:;,!&^|()\[\]{}~%]+ | \/=? | \.(?!\d) /x) label_expected = match =~ /[;\{\}]/ @@ -81,7 +77,7 @@ module Scanners label_expected = true if match == ':' case_expected = false end - kind = :operator + encoder.text_token match, :operator elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x) kind = IDENT_KIND[match] @@ -97,107 +93,96 @@ module Scanners end end end + encoder.text_token match, kind - elsif scan(/\$/) - kind = :ident + elsif match = scan(/\$/) + encoder.text_token match, :ident elsif match = scan(/L?"/) - tokens << [:open, :string] + encoder.begin_group :string if match[0] == ?L - tokens << ['L', :modifier] + encoder.text_token 'L', :modifier match = '"' end + encoder.text_token match, :delimiter state = :string - kind = :delimiter - elsif scan(/#[ \t]*(\w*)/) - kind = :preprocessor + elsif match = scan(/#[ \t]*(\w*)/) + encoder.text_token match, :preprocessor in_preproc_line = true label_expected_before_preproc_line = label_expected state = :include_expected if self[1] == 'include' - elsif scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox) + elsif match = scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox) label_expected = false - kind = :char + encoder.text_token match, :char - elsif scan(/0[xX][0-9A-Fa-f]+/) + elsif match = scan(/0[xX][0-9A-Fa-f]+/) label_expected = false - kind = :hex + encoder.text_token match, :hex - elsif scan(/(?:0[0-7]+)(?![89.eEfF])/) + elsif match = scan(/(?:0[0-7]+)(?![89.eEfF])/) label_expected = false - kind = :oct + encoder.text_token match, :oct - elsif scan(/(?:\d+)(?![.eEfF])L?L?/) + elsif match = scan(/(?:\d+)(?![.eEfF])L?L?/) label_expected = false - kind = :integer + encoder.text_token match, :integer - elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/) + elsif match = scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/) label_expected = false - kind = :float + encoder.text_token match, :float else - getch - kind = :error + encoder.text_token getch, :error end when :string - if scan(/[^\\\n"]+/) - kind = :content - elsif scan(/"/) - tokens << ['"', :delimiter] - tokens << [:close, :string] + if match = scan(/[^\\\n"]+/) + encoder.text_token match, :content + elsif match = scan(/"/) + encoder.text_token match, :delimiter + encoder.end_group :string state = :initial label_expected = false - next - elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) - kind = :char - elsif scan(/ \\ | $ /x) - tokens << [:close, :string] - kind = :error + elsif match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) + encoder.text_token match, :char + elsif match = scan(/ \\ | $ /x) + encoder.end_group :string + encoder.text_token match, :error state = :initial label_expected = false else - raise_inspect "else case \" reached; %p not handled." % peek(1), tokens + raise_inspect "else case \" reached; %p not handled." % peek(1), encoder end when :include_expected - if scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/) - kind = :include + if match = scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/) + encoder.text_token match, :include state = :initial elsif match = scan(/\s+/) - kind = :space + encoder.text_token match, :space state = :initial if match.index ?\n else state = :initial - next end else - raise_inspect 'Unknown state', tokens + raise_inspect 'Unknown state', encoder end - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens - end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] - end if state == :string - tokens << [:close, :string] + encoder.end_group :string end - tokens + encoder end end diff --git a/lib/coderay/scanners/cpp.rb b/lib/coderay/scanners/cpp.rb index eba1bd2..7531892 100644 --- a/lib/coderay/scanners/cpp.rb +++ b/lib/coderay/scanners/cpp.rb @@ -53,7 +53,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial label_expected = true @@ -63,9 +63,6 @@ module Scanners until eos? - kind = nil - match = nil - case state when :initial @@ -75,15 +72,14 @@ module Scanners in_preproc_line = false label_expected = label_expected_before_preproc_line end - tokens << [match, :space] - next + encoder.text_token match, :space - elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) - kind = :comment + elsif match = scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) + encoder.text_token match, :comment elsif match = scan(/ \# \s* if \s* 0 /x) match << scan_until(/ ^\# (?:elif|else|endif) .*? $ | \z /xm) unless eos? - kind = :comment + encoder.text_token match, :comment elsif match = scan(/ [-+*=<>?:;,!&^|()\[\]{}~%]+ | \/=? | \.(?!\d) /x) label_expected = match =~ /[;\{\}]/ @@ -91,7 +87,7 @@ module Scanners label_expected = true if match == ':' case_expected = false end - kind = :operator + encoder.text_token match, :operator elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x) kind = IDENT_KIND[match] @@ -109,122 +105,110 @@ module Scanners end end end + encoder.text_token match, kind - elsif scan(/\$/) - kind = :ident + elsif match = scan(/\$/) + encoder.text_token match, :ident elsif match = scan(/L?"/) - tokens << [:open, :string] + encoder.begin_group :string if match[0] == ?L - tokens << ['L', :modifier] + encoder.text_token match, 'L', :modifier match = '"' end state = :string - kind = :delimiter + encoder.text_token match, :delimiter - elsif scan(/#[ \t]*(\w*)/) - kind = :preprocessor + elsif match = scan(/#[ \t]*(\w*)/) + encoder.text_token match, :preprocessor in_preproc_line = true label_expected_before_preproc_line = label_expected state = :include_expected if self[1] == 'include' - elsif scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox) + elsif match = scan(/ L?' (?: [^\'\n\\] | \\ #{ESCAPE} )? '? /ox) label_expected = false - kind = :char + encoder.text_token match, :char - elsif scan(/0[xX][0-9A-Fa-f]+/) + elsif match = scan(/0[xX][0-9A-Fa-f]+/) label_expected = false - kind = :hex + encoder.text_token match, :hex - elsif scan(/(?:0[0-7]+)(?![89.eEfF])/) + elsif match = scan(/(?:0[0-7]+)(?![89.eEfF])/) label_expected = false - kind = :oct + encoder.text_token match, :oct - elsif scan(/(?:\d+)(?![.eEfF])L?L?/) + elsif match = scan(/(?:\d+)(?![.eEfF])L?L?/) label_expected = false - kind = :integer + encoder.text_token match, :integer - elsif scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/) + elsif match = scan(/\d[fF]?|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/) label_expected = false - kind = :float + encoder.text_token match, :float else - getch - kind = :error + encoder.text_token getch, :error end when :string - if scan(/[^\\"]+/) - kind = :content - elsif scan(/"/) - tokens << ['"', :delimiter] - tokens << [:close, :string] + if match = scan(/[^\\"]+/) + encoder.text_token match, :content + elsif match = scan(/"/) + encoder.text_token match, :delimiter + encoder.end_group :string state = :initial label_expected = false - next - elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) - kind = :char - elsif scan(/ \\ | $ /x) - tokens << [:close, :string] - kind = :error + elsif match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) + encoder.text_token match, :char + elsif match = scan(/ \\ | $ /x) + encoder.end_group :string + encoder.text_token match, :error state = :initial label_expected = false else - raise_inspect "else case \" reached; %p not handled." % peek(1), tokens + raise_inspect "else case \" reached; %p not handled." % peek(1), encoder end when :include_expected - if scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/) - kind = :include + if match = scan(/<[^>\n]+>?|"[^"\n\\]*(?:\\.[^"\n\\]*)*"?/) + encoder.text_token match, :include state = :initial elsif match = scan(/\s+/) - kind = :space + encoder.text_token match, :space state = :initial if match.index ?\n else state = :initial - next end when :class_name_expected - if scan(/ [A-Za-z_][A-Za-z_0-9]* /x) - kind = :class + if match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x) + encoder.text_token match, :class state = :initial elsif match = scan(/\s+/) - kind = :space + encoder.text_token match, :space else - getch - kind = :error + encoder.text_token getch, :error state = :initial end else - raise_inspect 'Unknown state', tokens - - end + raise_inspect 'Unknown state', encoder - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] end if state == :string - tokens << [:close, :string] + encoder.end_group :string end - tokens + encoder end end diff --git a/lib/coderay/scanners/css.rb b/lib/coderay/scanners/css.rb index 75cd056..b3f116e 100644 --- a/lib/coderay/scanners/css.rb +++ b/lib/coderay/scanners/css.rb @@ -51,129 +51,123 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options value_expected = nil states = [:initial] until eos? - kind = nil - match = nil - - if scan(/\s+/) - kind = :space + if match = scan(/\s+/) + encoder.text_token match, :space elsif case states.last when :initial, :media - if scan(/(?>#{RE::Ident})(?!\()|\*/ox) - kind = :type - elsif scan RE::Class - kind = :class - elsif scan RE::Id - kind = :constant - elsif scan RE::PseudoClass - kind = :pseudo_class + if match = scan(/(?>#{RE::Ident})(?!\()|\*/ox) + encoder.text_token match, :type + elsif match = scan(RE::Class) + encoder.text_token match, :class + elsif match = scan(RE::Id) + encoder.text_token match, :constant + elsif match = scan(RE::PseudoClass) + encoder.text_token match, :pseudo_class elsif match = scan(RE::AttributeSelector) # TODO: Improve highlighting inside of attribute selectors. - tokens << [match[0,1], :operator] - tokens << [match[1..-2], :attribute_name] if match.size > 2 - tokens << [match[-1,1], :operator] if match[-1] == ?] - next + encoder.text_token match[0,1], :operator + encoder.text_token match[1..-2], :attribute_name if match.size > 2 + encoder.text_token match[-1,1], :operator if match[-1] == ?] elsif match = scan(/@media/) - kind = :directive + encoder.text_token match, :directive states.push :media_before_name end when :block - if scan(/(?>#{RE::Ident})(?!\()/ox) + if match = scan(/(?>#{RE::Ident})(?!\()/ox) if value_expected - kind = :value + encoder.text_token match, :value else - kind = :key + encoder.text_token match, :key end end when :media_before_name - if scan RE::Ident - kind = :type + if match = scan(RE::Ident) + encoder.text_token match, :type states[-1] = :media_after_name end when :media_after_name - if scan(/\{/) - kind = :operator + if match = scan(/\{/) + encoder.text_token match, :operator states[-1] = :media end when :comment - if scan(/(?:[^*\s]|\*(?!\/))+/) - kind = :comment - elsif scan(/\*\//) - kind = :comment + if match = scan(/(?:[^*\s]|\*(?!\/))+/) + encoder.text_token match, :comment + elsif match = scan(/\*\//) + encoder.text_token match, :comment states.pop - elsif scan(/\s+/) - kind = :space + elsif match = scan(/\s+/) + encoder.text_token match, :space end else - raise_inspect 'Unknown state', tokens + raise_inspect 'Unknown state', encoder end - elsif scan(/\/\*/) - kind = :comment + elsif match = scan(/\/\*/) + encoder.text_token match, :comment states.push :comment - elsif scan(/\{/) + elsif match = scan(/\{/) value_expected = false - kind = :operator + encoder.text_token match, :operator states.push :block - elsif scan(/\}/) + elsif match = scan(/\}/) value_expected = false if states.last == :block || states.last == :media - kind = :operator + encoder.text_token match, :operator states.pop else - kind = :error + encoder.text_token match, :error end elsif match = scan(/#{RE::String}/o) - tokens << [:open, :string] - tokens << [match[0, 1], :delimiter] - tokens << [match[1..-2], :content] if match.size > 2 - tokens << [match[-1, 1], :delimiter] if match.size >= 2 - tokens << [:close, :string] - next + encoder.begin_group :string + encoder.text_token match[0, 1], :delimiter + encoder.text_token match[1..-2], :content if match.size > 2 + encoder.text_token match[-1, 1], :delimiter if match.size >= 2 + encoder.end_group :string elsif match = scan(/#{RE::Function}/o) - tokens << [:open, :string] + encoder.begin_group :string start = match[/^\w+\(/] - tokens << [start, :delimiter] + encoder.text_token start, :delimiter if match[-1] == ?) - tokens << [match[start.size..-2], :content] - tokens << [')', :delimiter] + encoder.text_token match[start.size..-2], :content + encoder.text_token ')', :delimiter else - tokens << [match[start.size..-1], :content] + encoder.text_token match[start.size..-1], :content end - tokens << [:close, :string] - next + encoder.end_group :string - elsif scan(/(?: #{RE::Dimension} | #{RE::Percentage} | #{RE::Num} )/ox) - kind = :float + elsif match = scan(/(?: #{RE::Dimension} | #{RE::Percentage} | #{RE::Num} )/ox) + encoder.text_token match, :float - elsif scan(/#{RE::Color}/o) - kind = :color + elsif match = scan(/#{RE::Color}/o) + encoder.text_token match, :color - elsif scan(/! *important/) - kind = :important + elsif match = scan(/! *important/) + encoder.text_token match, :important - elsif scan(/(?:rgb|hsl)a?\([^()\n]*\)?/) - kind = :color + elsif match = scan(/(?:rgb|hsl)a?\([^()\n]*\)?/) + encoder.text_token match, :color - elsif scan(/#{RE::AtKeyword}/o) - kind = :directive + elsif match = scan(RE::AtKeyword) + encoder.text_token match, :directive elsif match = scan(/ [+>:;,.=()\/] /x) if match == ':' @@ -181,26 +175,16 @@ module Scanners elsif match == ';' value_expected = false end - kind = :operator + encoder.text_token match, :operator else - getch - kind = :error - - end + encoder.text_token getch, :error - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] end - tokens + encoder end end diff --git a/lib/coderay/scanners/debug.rb b/lib/coderay/scanners/debug.rb index e33bff2..0f2b89f 100644 --- a/lib/coderay/scanners/debug.rb +++ b/lib/coderay/scanners/debug.rb @@ -14,67 +14,52 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options opened_tokens = [] until eos? - kind = nil - match = nil - - if scan(/\s+/) - tokens << [matched, :space] - next - - elsif scan(/ (\w+) \( ( [^\)\\]* ( \\. [^\)\\]* )* ) \)? /x) - kind = self[1].to_sym - match = self[2].gsub(/\\(.)/, '\1') - unless Tokens::AbbreviationForKind.has_key? kind - kind = :error - match = matched - end - - elsif scan(/ (\w+) ([<\[]) /x) - kind = self[1].to_sym - opened_tokens << kind - case self[2] - when '<' - match = :open - when '[' - match = :begin_line - else - raise - end - - elsif !opened_tokens.empty? && scan(/ > /x) - kind = opened_tokens.pop - match = :close - - elsif !opened_tokens.empty? && scan(/ \] /x) - kind = opened_tokens.pop - match = :end_line - + if match = scan(/\s+/) + encoder.text_token match, :space + + elsif match = scan(/ (\w+) \( ( [^\)\\]* ( \\. [^\)\\]* )* ) \)? /x) + kind = self[1].to_sym + match = self[2].gsub(/\\(.)/, '\1') + unless Tokens::AbbreviationForKind.has_key? kind + kind = :error + match = matched + end + encoder.text_token match, kind + + elsif match = scan(/ (\w+) ([<\[]) /x) + kind = self[1].to_sym + opened_tokens << kind + case self[2] + when '<' + encoder.begin_group kind + when '[' + encoder.begin_line kind else - kind = :space - getch - + raise 'CodeRay bug: This case should not be reached.' end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens + + elsif !opened_tokens.empty? && match = scan(/ > /x) + encoder.end_group opened_tokens.pop + + elsif !opened_tokens.empty? && match = scan(/ \] /x) + encoder.end_line opened_tokens.pop + + else + encoder.text_token getch, :space + end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] end - tokens << [:close, opened_tokens.pop] until opened_tokens.empty? + encoder.end_group opened_tokens.pop until opened_tokens.empty? - tokens + encoder end end @@ -111,14 +96,14 @@ method([])] TEST_OUTPUT = CodeRay::Tokens[ ['10', :integer], ['(\\)', :operator], - [:open, :string], + [:begin_group, :string], ['test', :content], - [:close, :string], + [:end_group, :string], [:begin_line, :test], ["\n\n \t \n", :space], ["[]", :method], [:end_line, :test], - ] + ].flatten def test_filtering_text_tokens assert_equal TEST_OUTPUT, CodeRay::Scanners::Debug.new.tokenize(TEST_INPUT) diff --git a/lib/coderay/scanners/delphi.rb b/lib/coderay/scanners/delphi.rb index 170f250..e0f4ea1 100644 --- a/lib/coderay/scanners/delphi.rb +++ b/lib/coderay/scanners/delphi.rb @@ -42,110 +42,100 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial last_token = '' - + until eos? - - kind = nil - match = nil - + if state == :initial - if scan(/ \s+ /x) - tokens << [matched, :space] + if match = scan(/ \s+ /x) + encoder.text_token match, :space next - elsif scan(%r! \{ \$ [^}]* \}? | \(\* \$ (?: .*? \*\) | .* ) !mx) - tokens << [matched, :preprocessor] + elsif match = scan(%r! \{ \$ [^}]* \}? | \(\* \$ (?: .*? \*\) | .* ) !mx) + encoder.text_token match, :preprocessor next - elsif scan(%r! // [^\n]* | \{ [^}]* \}? | \(\* (?: .*? \*\) | .* ) !mx) - tokens << [matched, :comment] + elsif match = scan(%r! // [^\n]* | \{ [^}]* \}? | \(\* (?: .*? \*\) | .* ) !mx) + encoder.text_token match, :comment next elsif match = scan(/ <[>=]? | >=? | :=? | [-+=*\/;,@\^|\(\)\[\]] | \.\. /x) - kind = :operator + encoder.text_token match, :operator elsif match = scan(/\./) - kind = :operator - if last_token == 'end' - tokens << [match, kind] - next - end + encoder.text_token match, :operator + next if last_token == 'end' elsif match = scan(/ [A-Za-z_][A-Za-z_0-9]* /x) - kind = NAME_FOLLOWS[last_token] ? :ident : IDENT_KIND[match] + encoder.text_token match, NAME_FOLLOWS[last_token] ? :ident : IDENT_KIND[match] - elsif match = scan(/ ' ( [^\n']|'' ) (?:'|$) /x) - tokens << [:open, :char] - tokens << ["'", :delimiter] - tokens << [self[1], :content] - tokens << ["'", :delimiter] - tokens << [:close, :char] + elsif match = skip(/ ' ( [^\n']|'' ) (?:'|$) /x) + encoder.begin_group :char + encoder.text_token "'", :delimiter + encoder.text_token self[1], :content + encoder.text_token "'", :delimiter + encoder.end_group :char next elsif match = scan(/ ' /x) - tokens << [:open, :string] + encoder.begin_group :string + encoder.text_token match, :delimiter state = :string - kind = :delimiter - elsif scan(/ \# (?: \d+ | \$[0-9A-Fa-f]+ ) /x) - kind = :char + elsif match = scan(/ \# (?: \d+ | \$[0-9A-Fa-f]+ ) /x) + encoder.text_token match, :char - elsif scan(/ \$ [0-9A-Fa-f]+ /x) - kind = :hex + elsif match = scan(/ \$ [0-9A-Fa-f]+ /x) + encoder.text_token match, :hex - elsif scan(/ (?: \d+ ) (?![eE]|\.[^.]) /x) - kind = :integer + elsif match = scan(/ (?: \d+ ) (?![eE]|\.[^.]) /x) + encoder.text_token match, :integer + + elsif match = scan(/ \d+ (?: \.\d+ (?: [eE][+-]? \d+ )? | [eE][+-]? \d+ ) /x) + encoder.text_token match, :float - elsif scan(/ \d+ (?: \.\d+ (?: [eE][+-]? \d+ )? | [eE][+-]? \d+ ) /x) - kind = :float - else - kind = :error - getch - + encoder.text_token getch, :error + next + end elsif state == :string - if scan(/[^\n']+/) - kind = :content - elsif scan(/''/) - kind = :char - elsif scan(/'/) - tokens << ["'", :delimiter] - tokens << [:close, :string] + if match = scan(/[^\n']+/) + encoder.text_token match, :content + elsif match = scan(/''/) + encoder.text_token match, :char + elsif match = scan(/'/) + encoder.text_token match, :delimiter + encoder.end_group :string state = :initial next - elsif scan(/\n/) - tokens << [:close, :string] - kind = :error + elsif match = scan(/\n/) + encoder.end_group :string + encoder.text_token match, :space state = :initial else - raise "else case \' reached; %p not handled." % peek(1), tokens + raise "else case \' reached; %p not handled." % peek(1), encoder end else - raise 'else-case reached', tokens + raise 'else-case reached', encoder end - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens, state - end - raise_inspect 'Empty token', tokens unless match - last_token = match - tokens << [match, kind] end - tokens + if state == :string + encoder.end_group state + end + + encoder end end diff --git a/lib/coderay/scanners/diff.rb b/lib/coderay/scanners/diff.rb index 4f3ff2e..417985a 100644 --- a/lib/coderay/scanners/diff.rb +++ b/lib/coderay/scanners/diff.rb @@ -13,7 +13,7 @@ module Scanners require 'coderay/helpers/file_type' - def scan_tokens tokens, options + def scan_tokens encoder, options line_kind = nil state = :initial @@ -21,14 +21,13 @@ module Scanners content_lang = nil until eos? - kind = match = nil if match = scan(/\n/) if line_kind - tokens << [:end_line, line_kind] + encoder.end_line line_kind line_kind = nil end - tokens << [match, :space] + encoder.text_token match, :space next end @@ -36,89 +35,82 @@ module Scanners when :initial if match = scan(/--- |\+\+\+ |=+|_+/) - tokens << [:begin_line, line_kind = :head] - tokens << [match, :head] - if filename = scan(/.*?(?=$|[\t\n\x00]| \(revision)/) - tokens << [filename, :filename] - content_lang = FileType.fetch filename, :plaintext + encoder.begin_line line_kind = :head + encoder.text_token match, :head + if match = scan(/.*?(?=$|[\t\n\x00]| \(revision)/) + encoder.text_token match, :filename + content_lang = FileType.fetch match, :plaintext end next unless match = scan(/.+/) - kind = :plain + encoder.text_token match, :plain elsif match = scan(/Index: |Property changes on: /) - tokens << [:begin_line, line_kind = :head] - tokens << [match, :head] + encoder.begin_line line_kind = :head + encoder.text_token match, :head next unless match = scan(/.+/) - kind = :plain + encoder.text_token match, :plain elsif match = scan(/Added: /) - tokens << [:begin_line, line_kind = :head] - tokens << [match, :head] + encoder.begin_line line_kind = :head + encoder.text_token match, :head next unless match = scan(/.+/) - kind = :plain + encoder.text_token match, :plain state = :added elsif match = scan(/\\ /) - tokens << [:begin_line, line_kind = :change] - tokens << [match, :change] + encoder.begin_line line_kind = :change + encoder.text_token match, :change next unless match = scan(/.+/) - kind = :plain + encoder.text_token match, :plain elsif match = scan(/@@(?>[^@\n]*)@@/) if check(/\n|$/) - tokens << [:begin_line, line_kind = :change] + encoder.begin_line line_kind = :change else - tokens << [:open, :change] + encoder.begin_group :change end - tokens << [match[0,2], :change] - tokens << [match[2...-2], :plain] if match.size > 4 - tokens << [match[-2,2], :change] - tokens << [:close, :change] unless line_kind - next unless code = scan(/.+/) - CodeRay.scan code, content_lang, :tokens => tokens + encoder.text_token match[0,2], :change + encoder.text_token match[2...-2], :plain if match.size > 4 + encoder.text_token match[-2,2], :change + encoder.end_group :change unless line_kind + next unless match = scan(/.+/) + CodeRay.scan match, content_lang, :tokens => encoder next elsif match = scan(/\+/) - tokens << [:begin_line, line_kind = :insert] - tokens << [match, :insert] + encoder.begin_line line_kind = :insert + encoder.text_token match, :insert next unless match = scan(/.+/) - CodeRay.scan match, content_lang, :tokens => tokens + CodeRay.scan match, content_lang, :tokens => encoder next elsif match = scan(/-/) - tokens << [:begin_line, line_kind = :delete] - tokens << [match, :delete] - next unless code = scan(/.+/) - CodeRay.scan code, content_lang, :tokens => tokens + encoder.begin_line line_kind = :delete + encoder.text_token match, :delete + next unless match = scan(/.+/) + CodeRay.scan match, content_lang, :tokens => encoder next - elsif code = scan(/ .*/) - CodeRay.scan code, content_lang, :tokens => tokens + elsif match = scan(/ .*/) + CodeRay.scan match, content_lang, :tokens => encoder next - elsif scan(/.+/) - tokens << [:begin_line, line_kind = :comment] - kind = :plain + elsif match = scan(/.+/) + encoder.begin_line line_kind = :comment + encoder.text_token match, :plain else raise_inspect 'else case rached' end when :added if match = scan(/ \+/) - tokens << [:begin_line, line_kind = :insert] - tokens << [match, :insert] + encoder.begin_line line_kind = :insert + encoder.text_token match, :insert next unless match = scan(/.+/) - kind = :plain + encoder.text_token match, :plain else state = :initial next end end - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens - end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] end - tokens << [:end_line, line_kind] if line_kind - tokens + encoder.end_line line_kind if line_kind + + encoder end end diff --git a/lib/coderay/scanners/groovy.rb b/lib/coderay/scanners/groovy.rb index fd7fbd9..fdbbbc7 100644 --- a/lib/coderay/scanners/groovy.rb +++ b/lib/coderay/scanners/groovy.rb @@ -1,11 +1,11 @@ module CodeRay module Scanners - + load :java # Scanner for Groovy. class Groovy < Java - + include Streamable register_for :groovy @@ -37,7 +37,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial inline_block_stack = [] @@ -45,35 +45,32 @@ module Scanners string_delimiter = nil import_clause = class_name_follows = last_token = after_def = false value_expected = true - + until eos? - - kind = nil - match = nil case state - + when :initial - + if match = scan(/ \s+ | \\\n /x) - tokens << [match, :space] + encoder.text_token match, :space if match.index ?\n import_clause = after_def = false value_expected = true unless value_expected end next - elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) + elsif match = scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) value_expected = true after_def = false - kind = :comment + encoder.text_token match, :comment - elsif bol? && scan(/ \#!.* /x) - kind = :doctype + elsif bol? && match = scan(/ \#!.* /x) + encoder.text_token match, :doctype - elsif import_clause && scan(/ (?!as) #{IDENT} (?: \. #{IDENT} )* (?: \.\* )? /ox) + elsif import_clause && match = scan(/ (?!as) #{IDENT} (?: \. #{IDENT} )* (?: \.\* )? /ox) after_def = value_expected = false - kind = :include + encoder.text_token match, :include elsif match = scan(/ #{IDENT} | \[\] /ox) kind = IDENT_KIND[match] @@ -93,16 +90,17 @@ module Scanners import_clause = match == 'import' after_def = true if match == 'def' end + encoder.text_token match, kind - elsif scan(/;/) + elsif match = scan(/;/) import_clause = after_def = false value_expected = true - kind = :operator + encoder.text_token match, :operator - elsif scan(/\{/) + elsif match = scan(/\{/) class_name_follows = after_def = false value_expected = true - kind = :operator + encoder.text_token match, :operator if !inline_block_stack.empty? inline_block_paren_depth += 1 end @@ -113,155 +111,146 @@ module Scanners value_expected = true value_expected = :regexp if match == '~' after_def = false - kind = :operator + encoder.text_token match, :operator elsif match = scan(/ [)\]}] /x) value_expected = after_def = false if !inline_block_stack.empty? && match == '}' inline_block_paren_depth -= 1 if inline_block_paren_depth == 0 # closing brace of inline block reached - tokens << [match, :inline_delimiter] - tokens << [:close, :inline] + encoder.text_token match, :inline_delimiter + encoder.end_group :inline state, string_delimiter, inline_block_paren_depth = inline_block_stack.pop next end end - kind = :operator + encoder.text_token match, :operator elsif check(/[\d.]/) after_def = value_expected = false - if scan(/0[xX][0-9A-Fa-f]+/) - kind = :hex - elsif scan(/(?>0[0-7]+)(?![89.eEfF])/) - kind = :oct - elsif scan(/\d+[fFdD]|\d*\.\d+(?:[eE][+-]?\d+)?[fFdD]?|\d+[eE][+-]?\d+[fFdD]?/) - kind = :float - elsif scan(/\d+[lLgG]?/) - kind = :integer + if match = scan(/0[xX][0-9A-Fa-f]+/) + encoder.text_token match, :hex + elsif match = scan(/(?>0[0-7]+)(?![89.eEfF])/) + encoder.text_token match, :oct + elsif match = scan(/\d+[fFdD]|\d*\.\d+(?:[eE][+-]?\d+)?[fFdD]?|\d+[eE][+-]?\d+[fFdD]?/) + encoder.text_token match, :float + elsif match = scan(/\d+[lLgG]?/) + encoder.text_token match, :integer end - + elsif match = scan(/'''|"""/) after_def = value_expected = false state = :multiline_string - tokens << [:open, :string] + encoder.begin_group :string string_delimiter = match - kind = :delimiter - + encoder.text_token match, :delimiter + # TODO: record.'name' syntax elsif match = scan(/["']/) after_def = value_expected = false state = match == '/' ? :regexp : :string - tokens << [:open, state] + encoder.begin_group state string_delimiter = match - kind = :delimiter - - elsif value_expected && (match = scan(/\//)) + encoder.text_token match, :delimiter + + elsif value_expected && match = scan(/\//) after_def = value_expected = false - tokens << [:open, :regexp] + encoder.begin_group :regexp state = :regexp string_delimiter = '/' - kind = :delimiter - - elsif scan(/ @ #{IDENT} /ox) + encoder.text_token match, :delimiter + + elsif match = scan(/ @ #{IDENT} /ox) after_def = value_expected = false - kind = :annotation - - elsif scan(/\//) + encoder.text_token match, :annotation + + elsif match = scan(/\//) after_def = false value_expected = true - kind = :operator - + encoder.text_token match, :operator + else - getch - kind = :error - + encoder.text_token getch, :error + end - + when :string, :regexp, :multiline_string - if scan(STRING_CONTENT_PATTERN[string_delimiter]) - kind = :content + if match = scan(STRING_CONTENT_PATTERN[string_delimiter]) + encoder.text_token match, :content elsif match = scan(state == :multiline_string ? /'''|"""/ : /["'\/]/) - tokens << [match, :delimiter] + encoder.text_token match, :delimiter if state == :regexp # TODO: regexp modifiers? s, m, x, i? modifiers = scan(/[ix]+/) - tokens << [modifiers, :modifier] if modifiers && !modifiers.empty? + encoder.text_token modifiers, :modifier if modifiers && !modifiers.empty? end state = :string if state == :multiline_string - tokens << [:close, state] + encoder.end_group state string_delimiter = nil after_def = value_expected = false state = :initial next - + elsif (state == :string || state == :multiline_string) && (match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)) if string_delimiter[0] == ?' && !(match == "\\\\" || match == "\\'") - kind = :content + encoder.text_token match, :content else - kind = :char + encoder.text_token match, :char end - elsif state == :regexp && scan(/ \\ (?: #{REGEXP_ESCAPE} | #{UNICODE_ESCAPE} ) /mox) - kind = :char - + elsif state == :regexp && match = scan(/ \\ (?: #{REGEXP_ESCAPE} | #{UNICODE_ESCAPE} ) /mox) + encoder.text_token match, :char + elsif match = scan(/ \$ #{IDENT} /mox) - tokens << [:open, :inline] - tokens << ['$', :inline_delimiter] + encoder.begin_group :inline + encoder.text_token '$', :inline_delimiter match = match[1..-1] - tokens << [match, IDENT_KIND[match]] - tokens << [:close, :inline] + encoder.text_token match, IDENT_KIND[match] + encoder.end_group :inline next elsif match = scan(/ \$ \{ /x) - tokens << [:open, :inline] - tokens << ['${', :inline_delimiter] + encoder.begin_group :inline + encoder.text_token match, :inline_delimiter inline_block_stack << [state, string_delimiter, inline_block_paren_depth] inline_block_paren_depth = 1 state = :initial next - - elsif scan(/ \$ /mx) - kind = :content - - elsif scan(/ \\. /mx) - kind = :content - - elsif scan(/ \\ | \n /x) - tokens << [:close, state] - kind = :error + + elsif match = scan(/ \$ /mx) + encoder.text_token match, :content + + elsif match = scan(/ \\. /mx) + encoder.text_token match, :content # FIXME: Shouldn't this be :error? + + elsif match = scan(/ \\ | \n /x) + encoder.end_group state + encoder.text_token match, :error after_def = value_expected = false state = :initial - + else - raise_inspect "else case \" reached; %p not handled." % peek(1), tokens + raise_inspect "else case \" reached; %p not handled." % peek(1), encoder + end - + else - raise_inspect 'Unknown state', tokens - - end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens + raise_inspect 'Unknown state', encoder + end - raise_inspect 'Empty token', tokens unless match last_token = match unless [:space, :comment, :doctype].include? kind - tokens << [match, kind] - end - + if [:multiline_string, :string, :regexp].include? state - tokens << [:close, state] + encoder.end_group state end - - tokens + + encoder end - + end - + end end diff --git a/lib/coderay/scanners/html.rb b/lib/coderay/scanners/html.rb index 52c7520..8f71e0e 100644 --- a/lib/coderay/scanners/html.rb +++ b/lib/coderay/scanners/html.rb @@ -53,135 +53,125 @@ module Scanners @state = :initial @plain_string_content = nil end - - def scan_tokens tokens, options - + + def scan_tokens encoder, options + state = @state plain_string_content = @plain_string_content - + until eos? - - kind = nil - match = nil - - if scan(/\s+/m) - kind = :space - + + if match = scan(/\s+/m) + encoder.text_token match, :space + else - + case state - + when :initial - if scan(/<!--.*?-->/m) - kind = :comment - elsif scan(/<!DOCTYPE.*?>/m) - kind = :doctype - elsif scan(/<\?xml.*?\?>/m) - kind = :preprocessor - elsif scan(/<\?.*?\?>|<%.*?%>/m) - kind = :comment - elsif scan(/<\/[-\w.:]*>/m) - kind = :tag + if match = scan(/<!--.*?-->/m) + encoder.text_token match, :comment + elsif match = scan(/<!DOCTYPE.*?>/m) + encoder.text_token match, :doctype + elsif match = scan(/<\?xml.*?\?>/m) + encoder.text_token match, :preprocessor + elsif match = scan(/<\?.*?\?>|<%.*?%>/m) + encoder.text_token match, :comment + elsif match = scan(/<\/[-\w.:]*>/m) + encoder.text_token match, :tag elsif match = scan(/<[-\w.:]+>?/m) - kind = :tag + encoder.text_token match, :tag state = :attribute unless match[-1] == ?> - elsif scan(/[^<>&]+/) - kind = :plain - elsif scan(/#{ENTITY}/ox) - kind = :entity - elsif scan(/[<>&]/) - kind = :error + elsif match = scan(/[^<>&]+/) + encoder.text_token match, :plain + elsif match = scan(/#{ENTITY}/ox) + encoder.text_token match, :entity + elsif match = scan(/[<>&]/) + encoder.text_token match, :error else - raise_inspect '[BUG] else-case reached with state %p' % [state], tokens + raise_inspect '[BUG] else-case reached with state %p' % [state], encoder end - + when :attribute - if scan(/#{TAG_END}/) - kind = :tag + if match = scan(/#{TAG_END}/) + encoder.text_token match, :tag state = :initial - elsif scan(/#{ATTR_NAME}/o) - kind = :attribute_name + elsif match = scan(/#{ATTR_NAME}/o) + encoder.text_token match, :attribute_name state = :attribute_equal else - kind = :error - getch + encoder.text_token getch, :error end - + when :attribute_equal - if scan(/=/) - kind = :operator + if match = scan(/=/) + encoder.text_token match, :operator state = :attribute_value - elsif scan(/#{ATTR_NAME}/o) - kind = :attribute_name - elsif scan(/#{TAG_END}/o) - kind = :tag + elsif match = scan(/#{ATTR_NAME}/o) + encoder.text_token match, :attribute_name + elsif match = scan(/#{TAG_END}/o) + encoder.text_token match, :tag state = :initial - elsif scan(/./) - kind = :error + else + encoder.text_token getch, :error state = :attribute end - + when :attribute_value - if scan(/#{ATTR_NAME}/o) - kind = :attribute_value + if match = scan(/#{ATTR_NAME}/o) + encoder.text_token match, :attribute_value state = :attribute elsif match = scan(/["']/) - tokens << [:open, :string] + encoder.begin_group :string state = :attribute_value_string plain_string_content = PLAIN_STRING_CONTENT[match] - kind = :delimiter + encoder.text_token match, :delimiter elsif scan(/#{TAG_END}/o) - kind = :tag + encoder.text_token match, :tag state = :initial else - kind = :error - getch + encoder.text_token getch, :error end - + when :attribute_value_string - if scan(plain_string_content) - kind = :content - elsif scan(/['"]/) - tokens << [matched, :delimiter] - tokens << [:close, :string] + if match = scan(plain_string_content) + encoder.text_token match, :content + elsif match = scan(/['"]/) + encoder.text_token match, :delimiter + encoder.end_group :string state = :attribute - next - elsif scan(/#{ENTITY}/ox) - kind = :entity - elsif scan(/&/) - kind = :content - elsif scan(/[\n>]/) - tokens << [:close, :string] - kind = :error + elsif match = scan(/#{ENTITY}/ox) + encoder.text_token match, :entity + elsif match = scan(/&/) + encoder.text_token match, :content + elsif match = scan(/[\n>]/) + encoder.end_group :string state = :initial + encoder.text_token match, :error end - + else - raise_inspect 'Unknown state: %p' % [state], tokens - + raise_inspect 'Unknown state: %p' % [state], encoder + end - + end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens, state - end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] + end - + if options[:keep_state] @state = state @plain_string_content = plain_string_content + else + if state == :attribute_value_string + encoder.end_group :string + end end - - tokens + + encoder end - + end - + end end diff --git a/lib/coderay/scanners/java.rb b/lib/coderay/scanners/java.rb index e4a7421..e7becda 100644 --- a/lib/coderay/scanners/java.rb +++ b/lib/coderay/scanners/java.rb @@ -48,7 +48,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial string_delimiter = nil @@ -58,23 +58,20 @@ module Scanners until eos? - kind = nil - match = nil - case state when :initial if match = scan(/ \s+ | \\\n /x) - tokens << [match, :space] + encoder.text_token match, :space next elsif match = scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) - tokens << [match, :comment] + encoder.text_token match, :comment next - elsif package_name_expected && scan(/ #{IDENT} (?: \. #{IDENT} )* /ox) - kind = package_name_expected + elsif package_name_expected && match = scan(/ #{IDENT} (?: \. #{IDENT} )* /ox) + encoder.text_token match, package_name_expected elsif match = scan(/ #{IDENT} | \[\] /ox) kind = IDENT_KIND[match] @@ -93,92 +90,82 @@ module Scanners class_name_follows = true end end + encoder.text_token match, kind - elsif scan(/ \.(?!\d) | [,?:()\[\]}] | -- | \+\+ | && | \|\| | \*\*=? | [-+*\/%^~&|<>=!]=? | <<<?=? | >>>?=? /x) - kind = :operator + elsif match = scan(/ \.(?!\d) | [,?:()\[\]}] | -- | \+\+ | && | \|\| | \*\*=? | [-+*\/%^~&|<>=!]=? | <<<?=? | >>>?=? /x) + encoder.text_token match, :operator - elsif scan(/;/) + elsif match = scan(/;/) package_name_expected = false - kind = :operator + encoder.text_token match, :operator - elsif scan(/\{/) + elsif match = scan(/\{/) class_name_follows = false - kind = :operator + encoder.text_token match, :operator elsif check(/[\d.]/) - if scan(/0[xX][0-9A-Fa-f]+/) - kind = :hex - elsif scan(/(?>0[0-7]+)(?![89.eEfF])/) - kind = :oct - elsif scan(/\d+[fFdD]|\d*\.\d+(?:[eE][+-]?\d+)?[fFdD]?|\d+[eE][+-]?\d+[fFdD]?/) - kind = :float - elsif scan(/\d+[lL]?/) - kind = :integer + if match = scan(/0[xX][0-9A-Fa-f]+/) + encoder.text_token match, :hex + elsif match = scan(/(?>0[0-7]+)(?![89.eEfF])/) + encoder.text_token match, :oct + elsif match = scan(/\d+[fFdD]|\d*\.\d+(?:[eE][+-]?\d+)?[fFdD]?|\d+[eE][+-]?\d+[fFdD]?/) + encoder.text_token match, :float + elsif match = scan(/\d+[lL]?/) + encoder.text_token match, :integer end elsif match = scan(/["']/) - tokens << [:open, :string] state = :string + encoder.begin_group state string_delimiter = match - kind = :delimiter + encoder.text_token match, :delimiter - elsif scan(/ @ #{IDENT} /ox) - kind = :annotation + elsif match = scan(/ @ #{IDENT} /ox) + encoder.text_token match, :annotation else - getch - kind = :error + encoder.text_token getch, :error end when :string - if scan(STRING_CONTENT_PATTERN[string_delimiter]) - kind = :content + if match = scan(STRING_CONTENT_PATTERN[string_delimiter]) + encoder.text_token match, :content elsif match = scan(/["'\/]/) - tokens << [match, :delimiter] - tokens << [:close, state] - string_delimiter = nil + encoder.text_token match, :delimiter + encoder.end_group state state = :initial - next + string_delimiter = nil elsif state == :string && (match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)) if string_delimiter == "'" && !(match == "\\\\" || match == "\\'") - kind = :content + encoder.text_token match, :content else - kind = :char + encoder.text_token match, :char end - elsif scan(/\\./m) - kind = :content - elsif scan(/ \\ | $ /x) - tokens << [:close, state] - kind = :error + elsif match = scan(/\\./m) + encoder.text_token match, :content + elsif match = scan(/ \\ | $ /x) + encoder.end_group state state = :initial + encoder.text_token match, :error else - raise_inspect "else case \" reached; %p not handled." % peek(1), tokens + raise_inspect "else case \" reached; %p not handled." % peek(1), encoder end else - raise_inspect 'Unknown state', tokens + raise_inspect 'Unknown state', encoder end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens - end - raise_inspect 'Empty token', tokens unless match last_token_dot = match == '.' - tokens << [match, kind] - end if state == :string - tokens << [:close, state] + encoder.end_group state end - tokens + encoder end end diff --git a/lib/coderay/scanners/java_script.rb b/lib/coderay/scanners/java_script.rb index 92ac005..3ae8d80 100644 --- a/lib/coderay/scanners/java_script.rb +++ b/lib/coderay/scanners/java_script.rb @@ -5,12 +5,12 @@ module Scanners # # Aliases: +ecmascript+, +ecma_script+, +javascript+ class JavaScript < Scanner - + include Streamable - + register_for :java_script file_extension 'js' - + # The actual JavaScript keywords. KEYWORDS = %w[ break case catch continue default delete do else @@ -40,7 +40,7 @@ module Scanners add(PREDEFINED_CONSTANTS, :pre_constant). add(MAGIC_VARIABLES, :local_variable). add(KEYWORDS, :keyword) # :nodoc: - + ESCAPE = / [bfnrtv\n\\'"] | x[a-fA-F0-9]{1,2} | [0-7]{1,3} /x # :nodoc: UNICODE_ESCAPE = / u[a-fA-F0-9]{4} | U[a-fA-F0-9]{8} /x # :nodoc: REGEXP_ESCAPE = / [bBdDsSwW] /x # :nodoc: @@ -56,47 +56,43 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial string_delimiter = nil value_expected = true key_expected = false function_expected = false - + until eos? - - kind = nil - match = nil case state - + when :initial - + if match = scan(/ \s+ | \\\n /x) value_expected = true if !value_expected && match.index(?\n) - tokens << [match, :space] - next - - elsif scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) + encoder.text_token match, :space + + elsif match = scan(%r! // [^\n\\]* (?: \\. [^\n\\]* )* | /\* (?: .*? \*/ | .* ) !mx) value_expected = true - kind = :comment - + encoder.text_token match, :comment + elsif check(/\.?\d/) key_expected = value_expected = false - if scan(/0[xX][0-9A-Fa-f]+/) - kind = :hex - elsif scan(/(?>0[0-7]+)(?![89.eEfF])/) - kind = :oct - elsif scan(/\d+[fF]|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/) - kind = :float - elsif scan(/\d+/) - kind = :integer + if match = scan(/0[xX][0-9A-Fa-f]+/) + encoder.text_token match, :hex + elsif match = scan(/(?>0[0-7]+)(?![89.eEfF])/) + encoder.text_token match, :oct + elsif match = scan(/\d+[fF]|\d*\.\d+(?:[eE][+-]?\d+)?[fF]?|\d+[eE][+-]?\d+[fF]?/) + encoder.text_token match, :float + elsif match = scan(/\d+/) + encoder.text_token match, :integer end - + elsif value_expected && match = scan(/<([[:alpha:]]\w*) (?: [^\/>]*\/> | .*?<\/\1>)/xim) # FIXME: scan over nested tags - xml_scanner.tokenize match + xml_scanner.tokenize match, :tokens => encoder value_expected = false next @@ -105,12 +101,12 @@ module Scanners last_operator = match[-1] key_expected = (last_operator == ?{) || (last_operator == ?,) function_expected = false - kind = :operator - - elsif scan(/ [)\]}]+ /x) + encoder.text_token match, :operator + + elsif match = scan(/ [)\]}]+ /x) function_expected = key_expected = value_expected = false - kind = :operator - + encoder.text_token match, :operator + elsif match = scan(/ [$a-zA-Z_][A-Za-z_0-9$]* /x) kind = IDENT_KIND[match] value_expected = (kind == :keyword) && KEYWORDS_EXPECTING_VALUE[match] @@ -128,101 +124,91 @@ module Scanners end function_expected = (kind == :keyword) && (match == 'function') key_expected = false - + encoder.text_token match, kind + elsif match = scan(/["']/) if key_expected && check(KEY_CHECK_PATTERN[match]) state = :key else state = :string end - tokens << [:open, state] + encoder.begin_group state string_delimiter = match - kind = :delimiter - + encoder.text_token match, :delimiter + elsif value_expected && (match = scan(/\/(?=\S)/)) - tokens << [:open, :regexp] + encoder.begin_group :regexp state = :regexp string_delimiter = '/' - kind = :delimiter - - elsif scan(/ \/ /x) + encoder.text_token match, :delimiter + + elsif match = scan(/ \/ /x) value_expected = true key_expected = false - kind = :operator - + encoder.text_token match, :operator + else - getch - kind = :error - + encoder.text_token getch, :error + end - + when :string, :regexp, :key - if scan(STRING_CONTENT_PATTERN[string_delimiter]) - kind = :content + if match = scan(STRING_CONTENT_PATTERN[string_delimiter]) + encoder.text_token match, :content elsif match = scan(/["'\/]/) - tokens << [match, :delimiter] + encoder.text_token match, :delimiter if state == :regexp modifiers = scan(/[gim]+/) - tokens << [modifiers, :modifier] if modifiers && !modifiers.empty? + encoder.text_token modifiers, :modifier if modifiers && !modifiers.empty? end - tokens << [:close, state] + encoder.end_group state string_delimiter = nil key_expected = value_expected = false state = :initial - next elsif state != :regexp && (match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)) if string_delimiter == "'" && !(match == "\\\\" || match == "\\'") - kind = :content + encoder.text_token match, :content else - kind = :char + encoder.text_token match, :char end - elsif state == :regexp && scan(/ \\ (?: #{ESCAPE} | #{REGEXP_ESCAPE} | #{UNICODE_ESCAPE} ) /mox) - kind = :char - elsif scan(/\\./m) - kind = :content - elsif scan(/ \\ | $ /x) - tokens << [:close, state] - kind = :error + elsif state == :regexp && match = scan(/ \\ (?: #{ESCAPE} | #{REGEXP_ESCAPE} | #{UNICODE_ESCAPE} ) /mox) + encoder.text_token match, :char + elsif match = scan(/\\./m) + encoder.text_token match, :content + elsif match = scan(/ \\ | $ /x) + encoder.end_group state + encoder.text_token match, :error key_expected = value_expected = false state = :initial else - raise_inspect "else case \" reached; %p not handled." % peek(1), tokens + raise_inspect "else case \" reached; %p not handled." % peek(1), encoder end - + else - raise_inspect 'Unknown state', tokens - - end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens + raise_inspect 'Unknown state', encoder + end - raise_inspect 'Empty token', tokens unless match - tokens << [match, kind] - end - + if [:string, :regexp].include? state - tokens << [:close, state] + encoder.end_group state end - - tokens + + encoder end - + protected - + def reset_instance super @xml_scanner.reset if defined? @xml_scanner end - + def xml_scanner @xml_scanner ||= CodeRay.scanner :xml, :tokens => @tokens, :keep_tokens => true, :keep_state => false end - + end end diff --git a/lib/coderay/scanners/json.rb b/lib/coderay/scanners/json.rb index ca74ff3..668fd82 100644 --- a/lib/coderay/scanners/json.rb +++ b/lib/coderay/scanners/json.rb @@ -19,7 +19,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial stack = [] @@ -27,82 +27,67 @@ module Scanners until eos? - kind = nil - match = nil - case state when :initial - if match = scan(/ \s+ | \\\n /x) - tokens << [match, :space] - next + if match = scan(/ \s+ /x) + encoder.text_token match, :space + elsif match = scan(/"/) + state = key_expected ? :key : :string + encoder.begin_group state + encoder.text_token match, :delimiter elsif match = scan(/ [:,\[{\]}] /x) - kind = :operator + encoder.text_token match, :operator case match - when '{' then stack << :object; key_expected = true - when '[' then stack << :array when ':' then key_expected = false when ',' then key_expected = true if stack.last == :object + when '{' then stack << :object; key_expected = true + when '[' then stack << :array when '}', ']' then stack.pop # no error recovery, but works for valid JSON end elsif match = scan(/ true | false | null /x) - kind = :value - elsif match = scan(/-?(?:0|[1-9]\d*)/) + encoder.text_token match, :value + elsif match = scan(/ -? (?: 0 | [1-9]\d* ) /x) kind = :integer - if scan(/\.\d+(?:[eE][-+]?\d+)?|[eE][-+]?\d+/) + if scan(/ \.\d+ (?:[eE][-+]?\d+)? | [eE][-+]? \d+ /x) match << matched kind = :float end - elsif match = scan(/"/) - state = key_expected ? :key : :string - tokens << [:open, state] - kind = :delimiter + encoder.text_token match, kind else - getch - kind = :error + encoder.text_token getch, :error end when :string, :key - if scan(/[^\\"]+/) - kind = :content - elsif scan(/"/) - tokens << ['"', :delimiter] - tokens << [:close, state] + if match = scan(/[^\\"]+/) + encoder.text_token match, :content + elsif match = scan(/"/) + encoder.text_token match, :delimiter + encoder.end_group state state = :initial - next - elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) - kind = :char - elsif scan(/\\./m) - kind = :content - elsif scan(/ \\ | $ /x) - tokens << [:close, state] - kind = :error + elsif match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) + encoder.text_token match, :char + elsif match = scan(/\\./m) + encoder.text_token match, :content + elsif match = scan(/ \\ | $ /x) + encoder.end_group state + encoder.text_token match, :error state = :initial else - raise_inspect "else case \" reached; %p not handled." % peek(1), tokens + raise_inspect "else case \" reached; %p not handled." % peek(1), encoder end else - raise_inspect 'Unknown state', tokens + raise_inspect 'Unknown state', encoder end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens - end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] - end if [:string, :key].include? state - tokens << [:close, state] + encoder.end_group state end - tokens + encoder end end diff --git a/lib/coderay/scanners/nitro_xhtml.rb b/lib/coderay/scanners/nitro_xhtml.rb index fe6b303..ba8ee71 100644 --- a/lib/coderay/scanners/nitro_xhtml.rb +++ b/lib/coderay/scanners/nitro_xhtml.rb @@ -1,14 +1,14 @@ module CodeRay module Scanners - + load :html load :ruby - + # Nitro XHTML Scanner # # Alias: +nitro+ class NitroXHTML < Scanner - + include Streamable register_for :nitro_xhtml file_extension :xhtml @@ -38,7 +38,7 @@ module Scanners ) (?: %> )? /mx # :nodoc: - + NITRO_VALUE_BLOCK = / \# (?: @@ -55,83 +55,83 @@ module Scanners | \\ [^\\]* \\? ) /x # :nodoc: - + NITRO_ENTITY = / % (?: \#\d+ | \w+ ) ; / # :nodoc: - + START_OF_RUBY = / (?=[<\#%]) < (?: \?r | % | ruby> ) | \# [{(|] | % (?: \#\d+ | \w+ ) ; /x # :nodoc: - + CLOSING_PAREN = Hash.new { |h, p| h[p] = p } # :nodoc: CLOSING_PAREN.update( { '(' => ')', '[' => ']', '{' => '}', } ) - + protected - + def setup @ruby_scanner = CodeRay.scanner :ruby, :tokens => @tokens, :keep_tokens => true @html_scanner = CodeRay.scanner :html, :tokens => @tokens, :keep_tokens => true, :keep_state => true end - + def reset_instance super @html_scanner.reset end - - def scan_tokens tokens, options - + + def scan_tokens encoder, options + until eos? - - if (match = scan_until(/(?=#{START_OF_RUBY})/o) || scan_until(/\z/)) and not match.empty? + + if (match = scan_until(/(?=#{START_OF_RUBY})/o) || match = scan_until(/\z/)) and not match.empty? @html_scanner.tokenize match - + elsif match = scan(/#{NITRO_VALUE_BLOCK}/o) start_tag = match[0,2] delimiter = CLOSING_PAREN[start_tag[1,1]] end_tag = match[-1,1] == delimiter ? delimiter : '' - tokens << [:open, :inline] - tokens << [start_tag, :inline_delimiter] + encoder.begin_group :inline + encoder.text_token start_tag, :inline_delimiter code = match[start_tag.size .. -1 - end_tag.size] - @ruby_scanner.tokenize code - tokens << [end_tag, :inline_delimiter] unless end_tag.empty? - tokens << [:close, :inline] - + @ruby_scanner.tokenize code, :tokens => encoder + encoder.text_token end_tag, :inline_delimiter unless end_tag.empty? + encoder.end_group :inline + elsif match = scan(/#{NITRO_RUBY_BLOCK}/o) start_tag = '<?r' end_tag = match[-2,2] == '?>' ? '?>' : '' - tokens << [:open, :inline] - tokens << [start_tag, :inline_delimiter] + encoder.begin_group :inline + encoder.text_token start_tag, :inline_delimiter code = match[start_tag.size .. -(end_tag.size)-1] - @ruby_scanner.tokenize code - tokens << [end_tag, :inline_delimiter] unless end_tag.empty? - tokens << [:close, :inline] - + @ruby_scanner.tokenize code, :tokens => encoder + encoder.text_token end_tag, :inline_delimiter unless end_tag.empty? + encoder.end_group :inline + elsif entity = scan(/#{NITRO_ENTITY}/o) - tokens << [entity, :entity] - + encoder.text_token entity, :entity + elsif scan(/%/) - tokens << [matched, :error] - + encoder.text_token matched, :error + else - raise_inspect 'else-case reached!', tokens + raise_inspect 'else-case reached!', encoder end - + end - - tokens - + + encoder + end - + end - + end end diff --git a/lib/coderay/scanners/php.rb b/lib/coderay/scanners/php.rb index 289e795..67bb233 100644 --- a/lib/coderay/scanners/php.rb +++ b/lib/coderay/scanners/php.rb @@ -230,7 +230,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options if check(RE::PHP_START) || # starts with <? (match?(/\s*<\S/) && exist?(RE::PHP_START)) || # starts with tag and contains <? @@ -252,29 +252,24 @@ module Scanners until eos? - match = nil - kind = nil - case states.last when :initial # HTML - if scan RE::PHP_START - kind = :inline_delimiter + if match = scan(RE::PHP_START) + encoder.text_token match, :inline_delimiter label_expected = true states << :php else match = scan_until(/(?=#{RE::PHP_START})/o) || scan_until(/\z/) @html_scanner.tokenize match unless match.empty? - next end when :php if match = scan(/\s+/) - tokens << [match, :space] - next + encoder.text_token match, :space - elsif scan(%r! (?m: \/\* (?: .*? \*\/ | .* ) ) | (?://|\#) .*? (?=#{RE::PHP_END}|$) !xo) - kind = :comment + elsif match = scan(%r! (?m: \/\* (?: .*? \*\/ | .* ) ) | (?://|\#) .*? (?=#{RE::PHP_END}|$) !xo) + encoder.text_token match, :comment elsif match = scan(RE::IDENTIFIER) kind = Words::IDENT_KIND[match] @@ -299,77 +294,68 @@ module Scanners next end end + encoder.text_token match, kind - elsif scan(/(?:\d+\.\d*|\d*\.\d+)(?:e[-+]?\d+)?|\d+e[-+]?\d+/i) + elsif match = scan(/(?:\d+\.\d*|\d*\.\d+)(?:e[-+]?\d+)?|\d+e[-+]?\d+/i) label_expected = false - kind = :float + encoder.text_token match, :float - elsif scan(/0x[0-9a-fA-F]+/) + elsif match = scan(/0x[0-9a-fA-F]+/) label_expected = false - kind = :hex + encoder.text_token match, :hex - elsif scan(/\d+/) + elsif match = scan(/\d+/) label_expected = false - kind = :integer - - elsif scan(/'/) - tokens << [:open, :string] - if modifier - tokens << [modifier, :modifier] - modifier = nil - end - kind = :delimiter - states.push :sqstring + encoder.text_token match, :integer - elsif match = scan(/["`]/) - tokens << [:open, :string] + elsif match = scan(/['"`]/) + encoder.begin_group :string if modifier - tokens << [modifier, :modifier] + encoder.text_token modifier, :modifier modifier = nil end delimiter = match - kind = :delimiter - states.push :dqstring + encoder.text_token match, :delimiter + states.push match == "'" ? :sqstring : :dqstring elsif match = scan(RE::VARIABLE) label_expected = false - kind = Words::VARIABLE_KIND[match] + encoder.text_token match, Words::VARIABLE_KIND[match] - elsif scan(/\{/) - kind = :operator + elsif match = scan(/\{/) + encoder.text_token match, :operator label_expected = true states.push :php - elsif scan(/\}/) + elsif match = scan(/\}/) if states.size == 1 - kind = :error + encoder.text_token match, :error else states.pop if states.last.is_a?(::Array) delimiter = states.last[1] states[-1] = states.last[0] - tokens << [matched, :delimiter] - tokens << [:close, :inline] - next + encoder.text_token match, :delimiter + encoder.end_group :inline else - kind = :operator + encoder.text_token match, :operator label_expected = true end end - elsif scan(/@/) + elsif match = scan(/@/) label_expected = false - kind = :exception + encoder.text_token match, :exception - elsif scan RE::PHP_END - kind = :inline_delimiter + elsif match = scan(RE::PHP_END) + encoder.text_token match, :inline_delimiter states = [:initial] elsif match = scan(/<<<(?:(#{RE::IDENTIFIER})|"(#{RE::IDENTIFIER})"|'(#{RE::IDENTIFIER})')/o) - tokens << [:open, :string] + encoder.begin_group :string warn 'heredoc in heredoc?' if heredoc_delimiter heredoc_delimiter = Regexp.escape(self[1] || self[2] || self[3]) - kind = :delimiter + encoder.text_token match, :delimiter states.push self[3] ? :sqstring : :dqstring heredoc_delimiter = /#{heredoc_delimiter}(?=;?$)/ @@ -379,152 +365,141 @@ module Scanners label_expected = true if match == ':' case_expected = false end - kind = :operator + encoder.text_token match, :operator else - getch - kind = :error + encoder.text_token getch, :error end when :sqstring - if scan(heredoc_delimiter ? /[^\\\n]+/ : /[^'\\]+/) - kind = :content - elsif !heredoc_delimiter && scan(/'/) - tokens << [matched, :delimiter] - tokens << [:close, :string] + if match = scan(heredoc_delimiter ? /[^\\\n]+/ : /[^'\\]+/) + encoder.text_token match, :content + elsif !heredoc_delimiter && match = scan(/'/) + encoder.text_token match, :delimiter + encoder.end_group :string delimiter = nil label_expected = false states.pop - next elsif heredoc_delimiter && match = scan(/\n/) - kind = :content if scan heredoc_delimiter - tokens << ["\n", :content] - tokens << [matched, :delimiter] - tokens << [:close, :string] + encoder.text_token "\n", :content + encoder.text_token matched, :delimiter + encoder.end_group :string heredoc_delimiter = nil label_expected = false states.pop - next + else + encoder.text_token match, :content end - elsif scan(heredoc_delimiter ? /\\\\/ : /\\[\\'\n]/) - kind = :char - elsif scan(/\\./m) - kind = :content - elsif scan(/\\/) - kind = :error + elsif match = scan(heredoc_delimiter ? /\\\\/ : /\\[\\'\n]/) + encoder.text_token match, :char + elsif match = scan(/\\./m) + encoder.text_token match, :content + elsif match = scan(/\\/) + encoder.text_token match, :error + else + states.pop end when :dqstring - if scan(heredoc_delimiter ? /[^${\\\n]+/ : (delimiter == '"' ? /[^"${\\]+/ : /[^`${\\]+/)) - kind = :content - elsif !heredoc_delimiter && scan(delimiter == '"' ? /"/ : /`/) - tokens << [matched, :delimiter] - tokens << [:close, :string] + if match = scan(heredoc_delimiter ? /[^${\\\n]+/ : (delimiter == '"' ? /[^"${\\]+/ : /[^`${\\]+/)) + encoder.text_token match, :content + elsif !heredoc_delimiter && match = scan(delimiter == '"' ? /"/ : /`/) + encoder.text_token match, :delimiter + encoder.end_group :string delimiter = nil label_expected = false states.pop - next elsif heredoc_delimiter && match = scan(/\n/) - kind = :content if scan heredoc_delimiter - tokens << ["\n", :content] - tokens << [matched, :delimiter] - tokens << [:close, :string] + encoder.text_token "\n", :content + encoder.text_token matched, :delimiter + encoder.end_group :string heredoc_delimiter = nil label_expected = false states.pop - next + else + encoder.text_token match, :content end - elsif scan(/\\(?:x[0-9A-Fa-f]{1,2}|[0-7]{1,3})/) - kind = :char - elsif scan(heredoc_delimiter ? /\\[nrtvf\\$]/ : (delimiter == '"' ? /\\[nrtvf\\$"]/ : /\\[nrtvf\\$`]/)) - kind = :char - elsif scan(/\\./m) - kind = :content - elsif scan(/\\/) - kind = :error + elsif match = scan(/\\(?:x[0-9A-Fa-f]{1,2}|[0-7]{1,3})/) + encoder.text_token match, :char + elsif match = scan(heredoc_delimiter ? /\\[nrtvf\\$]/ : (delimiter == '"' ? /\\[nrtvf\\$"]/ : /\\[nrtvf\\$`]/)) + encoder.text_token match, :char + elsif match = scan(/\\./m) + encoder.text_token match, :content + elsif match = scan(/\\/) + encoder.text_token match, :error elsif match = scan(/#{RE::VARIABLE}/o) - kind = :local_variable if check(/\[#{RE::IDENTIFIER}\]/o) - tokens << [:open, :inline] - tokens << [match, :local_variable] - tokens << [scan(/\[/), :operator] - tokens << [scan(/#{RE::IDENTIFIER}/o), :ident] - tokens << [scan(/\]/), :operator] - tokens << [:close, :inline] - next + encoder.begin_group :inline + encoder.text_token match, :local_variable + encoder.text_token scan(/\[/), :operator + encoder.text_token scan(/#{RE::IDENTIFIER}/o), :ident + encoder.text_token scan(/\]/), :operator + encoder.end_group :inline elsif check(/\[/) match << scan(/\[['"]?#{RE::IDENTIFIER}?['"]?\]?/o) - kind = :error + encoder.text_token match, :error elsif check(/->#{RE::IDENTIFIER}/o) - tokens << [:open, :inline] - tokens << [match, :local_variable] - tokens << [scan(/->/), :operator] - tokens << [scan(/#{RE::IDENTIFIER}/o), :ident] - tokens << [:close, :inline] - next + encoder.begin_group :inline + encoder.text_token match, :local_variable + encoder.text_token scan(/->/), :operator + encoder.text_token scan(/#{RE::IDENTIFIER}/o), :ident + encoder.end_group :inline elsif check(/->/) match << scan(/->/) - kind = :error + encoder.text_token match, :error + else + encoder.text_token match, :local_variable end elsif match = scan(/\{/) if check(/\$/) - kind = :delimiter + encoder.begin_group :inline states[-1] = [states.last, delimiter] delimiter = nil states.push :php - tokens << [:open, :inline] + encoder.text_token match, :delimiter else - kind = :string + encoder.text_token match, :string end - elsif scan(/\$\{#{RE::IDENTIFIER}\}/o) - kind = :local_variable - elsif scan(/\$/) - kind = :content + elsif match = scan(/\$\{#{RE::IDENTIFIER}\}/o) + encoder.text_token match, :local_variable + elsif match = scan(/\$/) + encoder.text_token match, :content + else + states.pop end when :class_expected - if scan(/\s+/) - kind = :space + if match = scan(/\s+/) + encoder.text_token match, :space elsif match = scan(/#{RE::IDENTIFIER}/o) - kind = :class + encoder.text_token match, :class states.pop else states.pop - next end when :function_expected - if scan(/\s+/) - kind = :space - elsif scan(/&/) - kind = :operator + if match = scan(/\s+/) + encoder.text_token match, :space + elsif match = scan(/&/) + encoder.text_token match, :operator elsif match = scan(/#{RE::IDENTIFIER}/o) - kind = :function + encoder.text_token match, :function states.pop else states.pop - next end else - raise_inspect 'Unknown state!', tokens, states - end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens, states + raise_inspect 'Unknown state!', encoder, states end - raise_inspect 'Empty token', tokens, states unless match - - tokens << [match, kind] end - tokens + encoder end end diff --git a/lib/coderay/scanners/plaintext.rb b/lib/coderay/scanners/plaintext.rb index b8db721..e176403 100644 --- a/lib/coderay/scanners/plaintext.rb +++ b/lib/coderay/scanners/plaintext.rb @@ -17,8 +17,9 @@ module Scanners protected - def scan_tokens tokens, options - tokens << [string, :plain] + def scan_tokens encoder, options + encoder.text_token string, :plain + encoder end end diff --git a/lib/coderay/scanners/python.rb b/lib/coderay/scanners/python.rb index be5205e..568ed57 100644 --- a/lib/coderay/scanners/python.rb +++ b/lib/coderay/scanners/python.rb @@ -98,7 +98,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial string_delimiter = nil @@ -111,37 +111,34 @@ module Scanners until eos? - kind = nil - match = nil - if state == :string - if scan(STRING_DELIMITER_REGEXP[string_delimiter]) - tokens << [matched, :delimiter] - tokens << [:close, string_type] + if match = scan(STRING_DELIMITER_REGEXP[string_delimiter]) + encoder.text_token match, :delimiter + encoder.end_group string_type string_type = nil state = :initial next - elsif string_delimiter.size == 3 && scan(/\n/) - kind = :content - elsif scan(STRING_CONTENT_REGEXP[string_delimiter]) - kind = :content - elsif !string_raw && scan(/ \\ #{ESCAPE} /ox) - kind = :char - elsif scan(/ \\ #{UNICODE_ESCAPE} /ox) - kind = :char - elsif scan(/ \\ . /x) - kind = :content - elsif scan(/ \\ | $ /x) - tokens << [:close, string_type] + elsif string_delimiter.size == 3 && match = scan(/\n/) + encoder.text_token match, :content + elsif match = scan(STRING_CONTENT_REGEXP[string_delimiter]) + encoder.text_token match, :content + elsif !string_raw && match = scan(/ \\ #{ESCAPE} /ox) + encoder.text_token match, :char + elsif match = scan(/ \\ #{UNICODE_ESCAPE} /ox) + encoder.text_token match, :char + elsif match = scan(/ \\ . /x) + encoder.text_token match, :content + elsif match = scan(/ \\ | $ /x) + encoder.end_group string_type string_type = nil - kind = :error + encoder.text_token match, :error state = :initial else - raise_inspect "else case \" reached; %p not handled." % peek(1), tokens, state + raise_inspect "else case \" reached; %p not handled." % peek(1), encoder, state end elsif match = scan(/ [ \t]+ | \\?\n /x) - tokens << [match, :space] + encoder.text_token match, :space if match == "\n" state = :initial if state == :include_expected docstring_coming = true if match?(/[ \t]*u?r?"""/) @@ -149,28 +146,28 @@ module Scanners next elsif match = scan(/ \# [^\n]* /mx) - tokens << [match, :comment] + encoder.text_token match, :comment next elsif state == :initial - if scan(/#{OPERATOR}/o) - kind = :operator + if match = scan(/#{OPERATOR}/o) + encoder.text_token match, :operator elsif match = scan(/(u?r?|b)?("""|"|'''|')/i) string_delimiter = self[2] string_type = docstring_coming ? :docstring : :string docstring_coming = false if docstring_coming - tokens << [:open, string_type] + encoder.begin_group string_type string_raw = false modifiers = self[1] unless modifiers.empty? string_raw = !!modifiers.index(?r) - tokens << [modifiers, :modifier] + encoder.text_token modifiers, :modifier match = string_delimiter end state = :string - kind = :delimiter + encoder.text_token match, :delimiter # TODO: backticks @@ -186,43 +183,45 @@ module Scanners state = DEF_NEW_STATE[match] from_import_state << match.to_sym if state == :include_expected end + encoder.text_token match, kind - elsif scan(/@[a-zA-Z0-9_.]+[lL]?/) - kind = :decorator + elsif match = scan(/@[a-zA-Z0-9_.]+[lL]?/) + encoder.text_token match, :decorator - elsif scan(/0[xX][0-9A-Fa-f]+[lL]?/) - kind = :hex + elsif match = scan(/0[xX][0-9A-Fa-f]+[lL]?/) + encoder.text_token match, :hex - elsif scan(/0[bB][01]+[lL]?/) - kind = :bin + elsif match = scan(/0[bB][01]+[lL]?/) + encoder.text_token match, :bin elsif match = scan(/(?:\d*\.\d+|\d+\.\d*)(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+/) - kind = :float if scan(/[jJ]/) match << matched - kind = :imaginary + encoder.text_token match, :imaginary + else + encoder.text_token match, :float end - elsif scan(/0[oO][0-7]+|0[0-7]+(?![89.eE])[lL]?/) - kind = :oct + elsif match = scan(/0[oO][0-7]+|0[0-7]+(?![89.eE])[lL]?/) + encoder.text_token match, :oct elsif match = scan(/\d+([lL])?/) - kind = :integer if self[1] == nil && scan(/[jJ]/) match << matched - kind = :imaginary + encoder.text_token match, :imaginary + else + encoder.text_token match, :integer end else - getch - kind = :error + encoder.text_token getch, :error end elsif state == :def_expected state = :initial if match = scan(unicode ? /#{NAME}/uo : /#{NAME}/o) - kind = :method + encoder.text_token match, :method else next end @@ -230,33 +229,34 @@ module Scanners elsif state == :class_expected state = :initial if match = scan(unicode ? /#{NAME}/uo : /#{NAME}/o) - kind = :class + encoder.text_token match, :class else next end elsif state == :include_expected if match = scan(unicode ? /#{DESCRIPTOR}/uo : /#{DESCRIPTOR}/o) - kind = :include if match == 'as' - kind = :keyword + encoder.text_token match, :keyword from_import_state << :as elsif from_import_state.first == :from && match == 'import' - kind = :keyword + encoder.text_token match, :keyword from_import_state << :import elsif from_import_state.last == :as - # kind = match[0,1][unicode ? /[[:upper:]]/u : /[[:upper:]]/] ? :class : :method - kind = :ident + # encoder.text_token match, match[0,1][unicode ? /[[:upper:]]/u : /[[:upper:]]/] ? :class : :method + encoder.text_token match, :ident from_import_state.pop elsif IDENT_KIND[match] == :keyword unscan match = nil state = :initial next + else + encoder.text_token match, :include end elsif match = scan(/,/) from_import_state.pop if from_import_state.last == :as - kind = :operator + encoder.text_token match, :operator else from_import_state = [] state = :initial @@ -264,28 +264,19 @@ module Scanners end else - raise_inspect 'Unknown state', tokens, state + raise_inspect 'Unknown state', encoder, state end - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens, state - end - raise_inspect 'Empty token', tokens, state unless match - last_token_dot = match == '.' - tokens << [match, kind] - end if state == :string - tokens << [:close, string_type] + encoder.end_group string_type end - tokens + encoder end end diff --git a/lib/coderay/scanners/rhtml.rb b/lib/coderay/scanners/rhtml.rb index 01fda8e..064a92c 100644 --- a/lib/coderay/scanners/rhtml.rb +++ b/lib/coderay/scanners/rhtml.rb @@ -1,18 +1,18 @@ module CodeRay module Scanners - + load :html load :ruby - + # Scanner for HTML ERB templates. class RHTML < Scanner - + include Streamable register_for :rhtml title 'HTML ERB Template' KINDS_NOT_LOC = HTML::KINDS_NOT_LOC - + ERB_RUBY_BLOCK = / <%(?!%)[=-]? (?> @@ -24,51 +24,51 @@ module Scanners ) (?: -?%> )? /x # :nodoc: - + START_OF_ERB = / <%(?!%) /x # :nodoc: - + protected - + def setup @ruby_scanner = CodeRay.scanner :ruby, :tokens => @tokens, :keep_tokens => true @html_scanner = CodeRay.scanner :html, :tokens => @tokens, :keep_tokens => true, :keep_state => true end - + def reset_instance super @html_scanner.reset end - - def scan_tokens tokens, options - + + def scan_tokens encoder, options + until eos? - + if (match = scan_until(/(?=#{START_OF_ERB})/o) || scan_until(/\z/)) and not match.empty? - @html_scanner.tokenize match - + @html_scanner.tokenize match, :tokens => encoder + elsif match = scan(/#{ERB_RUBY_BLOCK}/o) start_tag = match[/\A<%[-=]?/] end_tag = match[/-?%?>?\z/] - tokens << [:open, :inline] - tokens << [start_tag, :inline_delimiter] + encoder.begin_group :inline + encoder.text_token start_tag, :inline_delimiter code = match[start_tag.size .. -1 - end_tag.size] @ruby_scanner.tokenize code - tokens << [end_tag, :inline_delimiter] unless end_tag.empty? - tokens << [:close, :inline] - + encoder.text_token end_tag, :inline_delimiter unless end_tag.empty? + encoder.end_group :inline + else - raise_inspect 'else-case reached!', tokens + raise_inspect 'else-case reached!', encoder end - + end - - tokens - + + encoder + end - + end - + end end diff --git a/lib/coderay/scanners/ruby.rb b/lib/coderay/scanners/ruby.rb index 0e8e802..dcbfce0 100644 --- a/lib/coderay/scanners/ruby.rb +++ b/lib/coderay/scanners/ruby.rb @@ -30,7 +30,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options patterns = Patterns # avoid constant lookup @@ -50,20 +50,18 @@ module Scanners unicode = string.respond_to?(:encoding) && string.encoding.name == 'UTF-8' until eos? - match = nil - kind = nil if state.instance_of? patterns::StringState match = scan_until(state.pattern) || scan_until(/\z/) - tokens << [match, :content] unless match.empty? + encoder.text_token match, :content unless match.empty? break if eos? if state.heredoc and self[1] # end of heredoc match = getch.to_s match << scan_until(/$/) unless eos? - tokens << [match, :delimiter] - tokens << [:close, state.type] + encoder.text_token match, :delimiter + encoder.end_group state.type state = state.next_state next end @@ -74,34 +72,34 @@ module Scanners if state.paren_depth state.paren_depth -= 1 if state.paren_depth > 0 - tokens << [match, :nesting_delimiter] + encoder.text_token match, :nesting_delimiter next end end - tokens << [match, :delimiter] + encoder.text_token match, :delimiter if state.type == :regexp and not eos? modifiers = scan(/#{patterns::REGEXP_MODIFIERS}/ox) - tokens << [modifiers, :modifier] unless modifiers.empty? + encoder.text_token modifiers, :modifier unless modifiers.empty? end - tokens << [:close, state.type] + encoder.end_group state.type value_expected = false state = state.next_state when '\\' if state.interpreted if esc = scan(/ #{patterns::ESCAPE} /ox) - tokens << [match + esc, :char] + encoder.text_token match + esc, :char else - tokens << [match, :error] + encoder.text_token match, :error end else case m = getch when state.delim, '\\' - tokens << [match + m, :char] + encoder.text_token match + m, :char when nil - tokens << [match, :error] + encoder.text_token match, :error else - tokens << [match + m, :content] + encoder.text_token match + m, :content end end @@ -113,42 +111,38 @@ module Scanners value_expected = true state = :initial inline_block_curly_depth = 1 - tokens << [:open, :inline] - tokens << [match + getch, :inline_delimiter] + encoder.begin_group :inline + encoder.text_token match + getch, :inline_delimiter when '$', '@' - tokens << [match, :escape] + encoder.text_token match, :escape last_state = state state = :initial else raise_inspect 'else-case # reached; #%p not handled' % - [peek(1)], tokens + [peek(1)], encoder end when state.opening_paren state.paren_depth += 1 - tokens << [match, :nesting_delimiter] + encoder.text_token match, :nesting_delimiter when /#{patterns::REGEXP_SYMBOLS}/ox - tokens << [match, :function] + encoder.text_token match, :function else raise_inspect 'else-case " reached; %p not handled, state = %p' % - [match, state], tokens + [match, state], encoder end - next else if match = scan(/[ \t\f]+/) - kind = :space match << scan(/\s*/) unless eos? || heredocs value_expected = true if match.index(?\n) - tokens << [match, kind] - next + encoder.text_token match, :space elsif match = scan(/\\?\n/) - kind = :space if match == "\n" value_expected = true state = :initial if state == :undef_comma_expected @@ -156,24 +150,20 @@ module Scanners if heredocs unscan # heredoc scanning needs \n at start state = heredocs.shift - tokens << [:open, state.type] + encoder.begin_group state.type heredocs = nil if heredocs.empty? next else match << scan(/\s*/) unless eos? end - tokens << [match, kind] - next + encoder.text_token match, :space elsif bol? && match = scan(/\#!.*/) - tokens << [match, :doctype] - next + encoder.text_token match, :doctype elsif match = scan(/\#.*/) or (bol? and match = scan(/#{patterns::RUBYDOC_OR_DATA}/o)) - kind = :comment - tokens << [match, kind] - next + encoder.text_token match, :comment elsif state == :initial @@ -192,16 +182,16 @@ module Scanners value_expected = true if patterns::KEYWORDS_EXPECTING_VALUE[match] end value_expected = true if !value_expected && check(/#{patterns::VALUE_FOLLOWS}/o) + encoder.text_token match, kind elsif method_call_expected and match = scan(unicode ? /#{patterns::METHOD_AFTER_DOT}/uo : /#{patterns::METHOD_AFTER_DOT}/o) - kind = - if method_call_expected == '::' && match[/^[A-Z]/] && !match?(/\(/) - :constant - else - :ident - end + if method_call_expected == '::' && match[/^[A-Z]/] && !match?(/\(/) + encoder.text_token match, :constant + else + encoder.text_token match, :ident + end method_call_expected = false value_expected = check(/#{patterns::VALUE_FOLLOWS}/o) @@ -209,7 +199,6 @@ module Scanners elsif not method_call_expected and match = scan(/ \.\.\.? | (\.|::) | [,\(\)\[\]\{\}] | ==?=? /x) value_expected = match !~ / [.\)\]\}] /x || match =~ /\A\.\./ method_call_expected = self[1] - kind = :operator if inline_block_stack case match when '{' @@ -220,35 +209,36 @@ module Scanners state, inline_block_curly_depth, heredocs = inline_block_stack.pop inline_block_stack = nil if inline_block_stack.empty? heredocs = nil if heredocs && heredocs.empty? - tokens << [match, :inline_delimiter] - kind = :inline - match = :close + encoder.text_token match, :inline_delimiter + encoder.end_group :inline + next end end end + encoder.text_token match, :operator elsif match = scan(/ ['"] /mx) - tokens << [:open, :string] - kind = :delimiter + encoder.begin_group :string + encoder.text_token match, :delimiter state = patterns::StringState.new :string, match == '"', match # important for streaming elsif match = scan(unicode ? /#{patterns::INSTANCE_VARIABLE}/uo : /#{patterns::INSTANCE_VARIABLE}/o) value_expected = false - kind = :instance_variable + encoder.text_token match, :instance_variable elsif value_expected and match = scan(/\//) - tokens << [:open, :regexp] - kind = :delimiter + encoder.begin_group :regexp + encoder.text_token match, :delimiter interpreted = true state = patterns::StringState.new :regexp, interpreted, match - elsif match = value_expected ? scan(/[-+]?#{patterns::NUMERIC}/o) : scan(/#{patterns::NUMERIC}/o) + elsif match = scan(value_expected ? /[-+]?#{patterns::NUMERIC}/o : /#{patterns::NUMERIC}/o) if method_call_expected - kind = :error + encoder.text_token match, :error method_call_expected = false else - kind = self[1] ? :float : :integer + encoder.text_token match, self[1] ? :float : :integer end value_expected = false @@ -256,28 +246,28 @@ module Scanners /#{patterns::SYMBOL}/o) case delim = match[1] when ?', ?" - tokens << [:open, :symbol] - tokens << [':', :symbol] + encoder.begin_group :symbol + encoder.text_token ':', :symbol match = delim.chr - kind = :delimiter + encoder.text_token match, :delimiter state = patterns::StringState.new :symbol, delim == ?", match else - kind = :symbol + encoder.text_token match, :symbol value_expected = false end elsif match = scan(/ [-+!~^]=? | [*|&]{1,2}=? | >>? /x) value_expected = true - kind = :operator + encoder.text_token match, :operator elsif value_expected and match = scan(/#{patterns::HEREDOC_OPEN}/o) indented = self[1] == '-' quote = self[3] delim = self[quote ? 4 : 2] kind = patterns::QUOTE_TO_TYPE[quote] - tokens << [:open, kind] - tokens << [match, :delimiter] - match = :close + encoder.begin_group kind + encoder.text_token match, :delimiter + encoder.end_group kind heredoc = patterns::StringState.new kind, quote != '\'', delim, (indented ? :indented : :linestart ) heredocs ||= [] # create heredocs if empty @@ -286,38 +276,38 @@ module Scanners elsif value_expected and match = scan(/#{patterns::FANCY_START}/o) kind, interpreted = *patterns::FancyStringType.fetch(self[1]) do - raise_inspect 'Unknown fancy string: %%%p' % k, tokens + raise_inspect 'Unknown fancy string: %%%p' % k, encoder end - tokens << [:open, kind] + encoder.begin_group kind state = patterns::StringState.new kind, interpreted, self[2] - kind = :delimiter + encoder.text_token match, :delimiter elsif value_expected and match = scan(/#{patterns::CHARACTER}/o) value_expected = false - kind = :integer + encoder.text_token match, :integer elsif match = scan(/ [\/%]=? | <(?:<|=>?)? | [?:;] /x) value_expected = true - kind = :operator + encoder.text_token match, :operator elsif match = scan(/`/) if method_call_expected - kind = :operator + encoder.text_token match, :operator value_expected = true else - tokens << [:open, :shell] - kind = :delimiter + encoder.begin_group :shell + encoder.text_token match, :delimiter state = patterns::StringState.new :shell, true, match end elsif match = scan(unicode ? /#{patterns::GLOBAL_VARIABLE}/uo : /#{patterns::GLOBAL_VARIABLE}/o) - kind = :global_variable + encoder.text_token match, :global_variable value_expected = false elsif match = scan(unicode ? /#{patterns::CLASS_VARIABLE}/uo : /#{patterns::CLASS_VARIABLE}/o) - kind = :class_variable + encoder.text_token match, :class_variable value_expected = false else @@ -340,9 +330,9 @@ module Scanners end next if unicode end - kind = :error - match = getch - + + encoder.text_token getch, :error + end if last_state @@ -353,34 +343,30 @@ module Scanners elsif state == :def_expected if match = scan(unicode ? /(?>#{patterns::METHOD_NAME_EX})(?!\.|::)/uo : /(?>#{patterns::METHOD_NAME_EX})(?!\.|::)/o) - kind = :method + encoder.text_token match, :method state = :initial else last_state = :dot_expected state = :initial - next end elsif state == :dot_expected if match = scan(/\.|::/) # invalid definition state = :def_expected - kind = :operator + encoder.text_token match, :operator else state = :initial - next end elsif state == :module_expected if match = scan(/<</) - kind = :operator + encoder.text_token match, :operator else state = :initial if match = scan(unicode ? / (?:#{patterns::IDENT}::)* #{patterns::IDENT} /oux : / (?:#{patterns::IDENT}::)* #{patterns::IDENT} /ox) - kind = :class - else - next + encoder.text_token match, :class end end @@ -388,31 +374,29 @@ module Scanners state = :undef_comma_expected if match = scan(unicode ? /(?>#{patterns::METHOD_NAME_EX})(?!\.|::)/uo : /(?>#{patterns::METHOD_NAME_EX})(?!\.|::)/o) - kind = :method + encoder.text_token match, :method elsif match = scan(/#{patterns::SYMBOL}/o) case delim = match[1] when ?', ?" - tokens << [:open, :symbol] - tokens << [':', :symbol] + encoder.begin_group :symbol + encoder.text_token ':', :symbol match = delim.chr - kind = :delimiter + encoder.text_token match, :delimiter state = patterns::StringState.new :symbol, delim == ?", match state.next_state = :undef_comma_expected else - kind = :symbol + encoder.text_token match, :symbol end else state = :initial - next end elsif state == :undef_comma_expected if match = scan(/,/) - kind = :operator + encoder.text_token match, :operator state = :undef_expected else state = :initial - next end elsif state == :alias_expected @@ -420,38 +404,30 @@ module Scanners /(#{patterns::METHOD_NAME_OR_SYMBOL})([ \t]+)(#{patterns::METHOD_NAME_OR_SYMBOL})/o) if match - tokens << [self[1], (self[1][0] == ?: ? :symbol : :method)] - tokens << [self[2], :space] - tokens << [self[3], (self[3][0] == ?: ? :symbol : :method)] + encoder.text_token self[1], (self[1][0] == ?: ? :symbol : :method) + encoder.text_token self[2], :space + encoder.text_token self[3], (self[3][0] == ?: ? :symbol : :method) end state = :initial - next end - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens, state - end - raise_inspect 'Empty token', tokens, state unless match - - tokens << [match, kind] end end # cleaning up if state.is_a? patterns::StringState - tokens << [:close, state.type] + encoder.end_group state.type end if inline_block_stack until inline_block_stack.empty? state, *more = inline_block_stack.pop - tokens << [:close, :inline] if more - tokens << [:close, state.type] + encoder.end_group :inline if more + encoder.end_group state.type end end - tokens + encoder end end diff --git a/lib/coderay/scanners/scheme.rb b/lib/coderay/scanners/scheme.rb index cbd9729..c29641e 100644 --- a/lib/coderay/scanners/scheme.rb +++ b/lib/coderay/scanners/scheme.rb @@ -72,74 +72,63 @@ module CodeRay protected - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial ident_kind = IDENT_KIND until eos? - kind = match = nil case state when :initial - if scan(/ \s+ | \\\n /x) - kind = :space - elsif scan(/['\(\[\)\]]|#\(/) - kind = :operator # FIXME: was :operator_fat - elsif scan(/;.*/) - kind = :comment - elsif scan(/#\\(?:newline|space|.?)/) - kind = :char - elsif scan(/#[ft]/) - kind = :pre_constant - elsif scan(/#{IDENTIFIER}/o) - kind = ident_kind[matched] - elsif scan(/\./) - kind = :operator - elsif scan(/"/) - tokens << [:open, :string] + if match = scan(/ \s+ | \\\n /x) + encoder.text_token match, :space + elsif match = scan(/['\(\[\)\]]|#\(/) + encoder.text_token match, :operator # FIXME: was :operator_fat + elsif match = scan(/;.*/) + encoder.text_token match, :comment + elsif match = scan(/#\\(?:newline|space|.?)/) + encoder.text_token match, :char + elsif match = scan(/#[ft]/) + encoder.text_token match, :pre_constant + elsif match = scan(/#{IDENTIFIER}/o) + encoder.text_token match, ident_kind[matched] + elsif match = scan(/\./) + encoder.text_token match, :operator + elsif match = scan(/"/) + encoder.begin_group :string + encoder.text_token match, :delimiter state = :string - tokens << ['"', :delimiter] - next - elsif scan(/#{NUM}/o) and not matched.empty? - kind = :integer - elsif getch - kind = :error + elsif match = scan(/#{NUM}/o) and not matched.empty? + encoder.text_token match, :integer + else + encoder.text_token getch, :error end when :string - if scan(/[^"\\]+/) or scan(/\\.?/) - kind = :content - elsif scan(/"/) - tokens << ['"', :delimiter] - tokens << [:close, :string] + if match = scan(/[^"\\]+|\\.?/) + encoder.text_token match, :content + elsif match = scan(/"/) + encoder.text_token match, :delimiter + encoder.end_group :string state = :initial - next else raise_inspect "else case \" reached; %p not handled." % peek(1), - tokens, state + encoder, state end else - raise "else case reached" - end - - match ||= matched - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens + raise 'else case reached' + end - raise_inspect 'Empty token', tokens, state unless match - - tokens << [match, kind] end if state == :string - tokens << [:close, :string] + encoder.end_group state end - tokens + encoder end end diff --git a/lib/coderay/scanners/sql.rb b/lib/coderay/scanners/sql.rb index 3aeea77..d62a2c3 100644 --- a/lib/coderay/scanners/sql.rb +++ b/lib/coderay/scanners/sql.rb @@ -51,7 +51,7 @@ module CodeRay module Scanners STRING_PREFIXES = /[xnb]|_\w+/i - def scan_tokens tokens, options + def scan_tokens encoder, options state = :initial string_type = nil @@ -59,54 +59,50 @@ module CodeRay module Scanners until eos? - kind = nil - match = nil - if state == :initial - if scan(/ \s+ | \\\n /x) - kind = :space + if match = scan(/ \s+ | \\\n /x) + encoder.text_token match, :space - elsif scan(/^(?:--\s?|#).*/) - kind = :comment + elsif match = scan(/^(?:--\s?|#).*/) + encoder.text_token match, :comment - elsif scan(%r( /\* (!)? (?: .*? \*/ | .* ) )mx) - kind = self[1] ? :directive : :comment + elsif match = scan(%r( /\* (!)? (?: .*? \*/ | .* ) )mx) + encoder.text_token match, self[1] ? :directive : :comment - elsif scan(/ [-+*\/=<>;,!&^|()\[\]{}~%] | \.(?!\d) /x) - kind = :operator + elsif match = scan(/ [-+*\/=<>;,!&^|()\[\]{}~%] | \.(?!\d) /x) + encoder.text_token match, :operator - elsif scan(/(#{STRING_PREFIXES})?([`"'])/o) + elsif match = scan(/(#{STRING_PREFIXES})?([`"'])/o) prefix = self[1] string_type = self[2] - tokens << [:open, :string] - tokens << [prefix, :modifier] if prefix + encoder.begin_group :string + encoder.text_token prefix, :modifier if prefix match = string_type state = :string - kind = :delimiter + encoder.text_token match, :delimiter elsif match = scan(/ @? [A-Za-z_][A-Za-z_0-9]* /x) # FIXME: Don't match keywords after "." - kind = match[0] == ?@ ? :variable : IDENT_KIND[match.downcase] + encoder.text_token match, match[0] == ?@ ? :variable : IDENT_KIND[match.downcase] - elsif scan(/0[xX][0-9A-Fa-f]+/) - kind = :hex + elsif match = scan(/0[xX][0-9A-Fa-f]+/) + encoder.text_token match, :hex - elsif scan(/0[0-7]+(?![89.eEfF])/) - kind = :oct + elsif match = scan(/0[0-7]+(?![89.eEfF])/) + encoder.text_token match, :oct - elsif scan(/(?>\d+)(?![.eEfF])/) - kind = :integer + elsif match = scan(/(?>\d+)(?![.eEfF])/) + encoder.text_token match, :integer - elsif scan(/\d[fF]|\d*\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+/) - kind = :float + elsif match = scan(/\d[fF]|\d*\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+/) + encoder.text_token match, :float - elsif scan(/\\N/) - kind = :pre_constant + elsif match = scan(/\\N/) + encoder.text_token match, :pre_constant else - getch - kind = :error + encoder.text_token getch, :error end @@ -121,54 +117,48 @@ module CodeRay module Scanners next end unless string_content.empty? - tokens << [string_content, :content] + encoder.text_token string_content, :content string_content = '' end - tokens << [matched, :delimiter] - tokens << [:close, :string] + encoder.text_token match, :delimiter + encoder.end_group :string state = :initial string_type = nil - next else string_content << match end - next - elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) + elsif match = scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) unless string_content.empty? - tokens << [string_content, :content] + encoder.text_token string_content, :content string_content = '' end - kind = :char + encoder.text_token match, :char elsif match = scan(/ \\ . /mox) string_content << match next - elsif scan(/ \\ | $ /x) + elsif match = scan(/ \\ | $ /x) unless string_content.empty? - tokens << [string_content, :content] + encoder.text_token string_content, :content string_content = '' end - kind = :error + encoder.text_token match, :error state = :initial else - raise "else case \" reached; %p not handled." % peek(1), tokens + raise "else case \" reached; %p not handled." % peek(1), encoder end else - raise 'else-case reached', tokens + raise 'else-case reached', encoder end - match ||= matched - unless kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens, state - end - raise_inspect 'Empty token', tokens unless match - - tokens << [match, kind] - end - tokens + + if state == :string + encoder.end_group state + end + + encoder end diff --git a/lib/coderay/scanners/yaml.rb b/lib/coderay/scanners/yaml.rb index 62a6aba..3c3928f 100644 --- a/lib/coderay/scanners/yaml.rb +++ b/lib/coderay/scanners/yaml.rb @@ -13,7 +13,7 @@ module Scanners protected - def scan_tokens tokens, options + def scan_tokens encoder, options value_expected = nil state = :initial @@ -21,50 +21,48 @@ module Scanners until eos? - kind = nil - match = nil key_indent = nil if bol? if match = scan(/ +[\t ]*/) - kind = :space + encoder.text_token match, :space elsif match = scan(/\n+/) - kind = :space + encoder.text_token match, :space state = :initial if match.index(?\n) elsif match = scan(/#.*/) - kind = :comment + encoder.text_token match, :comment elsif bol? and case when match = scan(/---|\.\.\./) - tokens << [:open, :head] - tokens << [match, :head] - tokens << [:close, :head] + encoder.begin_group :head + encoder.text_token match, :head + encoder.end_group :head next when match = scan(/%.*/) - tokens << [match, :doctype] + encoder.text_token match, :doctype next end elsif state == :value and case - when !check(/(?:"[^"]*")(?=: |:$)/) && scan(/"/) - tokens << [:open, :string] - tokens << [matched, :delimiter] - tokens << [matched, :content] if scan(/ [^"\\]* (?: \\. [^"\\]* )* /mx) - tokens << [matched, :delimiter] if scan(/"/) - tokens << [:close, :string] + when !check(/(?:"[^"]*")(?=: |:$)/) && match = scan(/"/) + encoder.begin_group :string + encoder.text_token match, :delimiter + encoder.text_token match, :content if match = scan(/ [^"\\]* (?: \\. [^"\\]* )* /mx) + encoder.text_token match, :delimiter if match = scan(/"/) + encoder.end_group :string next when match = scan(/[|>][-+]?/) - tokens << [:open, :string] - tokens << [match, :delimiter] + encoder.begin_group :string + encoder.text_token match, :delimiter string_indent = key_indent || column(pos - match.size - 1) - tokens << [matched, :content] if scan(/(?:\n+ {#{string_indent + 1}}.*)+/) - tokens << [:close, :string] + encoder.text_token matched, :content if scan(/(?:\n+ {#{string_indent + 1}}.*)+/) + encoder.end_group :string next when match = scan(/(?![!"*&]).+?(?=$|\s+#)/) - tokens << [match, :string] + encoder.text_token match, :string string_indent = key_indent || column(pos - match.size - 1) - tokens << [matched, :string] if scan(/(?:\n+ {#{string_indent + 1}}.*)+/) + encoder.text_token matched, :string if scan(/(?:\n+ {#{string_indent + 1}}.*)+/) next end @@ -72,68 +70,69 @@ module Scanners when match = scan(/[-:](?= |$)/) state = :value if state == :colon && (match == ':' || match == '-') state = :value if state == :initial && match == '-' - kind = :operator + encoder.text_token match, :operator + next when match = scan(/[,{}\[\]]/) - kind = :operator + encoder.text_token match, :operator + next when state == :initial && match = scan(/[\w.() ]*\S(?=: |:$)/) - kind = :key + encoder.text_token match, :key key_indent = column(pos - match.size - 1) - # tokens << [key_indent.inspect, :debug] + # encoder.text_token key_indent.inspect, :debug state = :colon + next when match = scan(/(?:"[^"\n]*"|'[^'\n]*')(?=: |:$)/) - tokens << [:open, :key] - tokens << [match[0,1], :delimiter] - tokens << [match[1..-2], :content] - tokens << [match[-1,1], :delimiter] - tokens << [:close, :key] + encoder.begin_group :key + encoder.text_token match[0,1], :delimiter + encoder.text_token match[1..-2], :content + encoder.text_token match[-1,1], :delimiter + encoder.end_group :key key_indent = column(pos - match.size - 1) - # tokens << [key_indent.inspect, :debug] + # encoder.text_token key_indent.inspect, :debug state = :colon next - when scan(/(![\w\/]+)(:([\w:]+))?/) - tokens << [self[1], :type] + when match = scan(/(![\w\/]+)(:([\w:]+))?/) + encoder.text_token self[1], :type if self[2] - tokens << [':', :operator] - tokens << [self[3], :class] + encoder.text_token ':', :operator + encoder.text_token self[3], :class end next - when scan(/&\S+/) - kind = :variable - when scan(/\*\w+/) - kind = :global_variable - when scan(/<</) - kind = :class_variable - when scan(/\d\d:\d\d:\d\d/) - kind = :oct - when scan(/\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d(\.\d+)? [-+]\d\d:\d\d/) - kind = :oct - when scan(/:\w+/) - kind = :symbol - when scan(/[^:\s]+(:(?! |$)[^:\s]*)* .*/) - kind = :error - when scan(/[^:\s]+(:(?! |$)[^:\s]*)*/) - kind = :error + when match = scan(/&\S+/) + encoder.text_token match, :variable + next + when match = scan(/\*\w+/) + encoder.text_token match, :global_variable + next + when match = scan(/<</) + encoder.text_token match, :class_variable + next + when match = scan(/\d\d:\d\d:\d\d/) + encoder.text_token match, :oct + next + when match = scan(/\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d(\.\d+)? [-+]\d\d:\d\d/) + encoder.text_token match, :oct + next + when match = scan(/:\w+/) + encoder.text_token match, :symbol + next + when match = scan(/[^:\s]+(:(?! |$)[^:\s]*)* .*/) + encoder.text_token match, :error + next + when match = scan(/[^:\s]+(:(?! |$)[^:\s]*)*/) + encoder.text_token match, :error + next end else - getch - kind = :error + raise if eos? + encoder.text_token getch, :error end - match ||= matched - - if $CODERAY_DEBUG and not kind - raise_inspect 'Error token %p in line %d' % - [[match, kind], line], tokens, state - end - raise_inspect 'Empty token', tokens, state unless match - - tokens << [match, kind] - end - tokens + encoder end end diff --git a/lib/coderay/token_kinds.rb b/lib/coderay/token_kinds.rb index 3e63372..9904d50 100755 --- a/lib/coderay/token_kinds.rb +++ b/lib/coderay/token_kinds.rb @@ -79,7 +79,6 @@ module CodeRay :plain => :NO_HIGHLIGHT, } AbbreviationForKind[:method] = AbbreviationForKind[:function] - AbbreviationForKind[:open] = AbbreviationForKind[:close] = AbbreviationForKind[:delimiter] AbbreviationForKind[:nesting_delimiter] = AbbreviationForKind[:delimiter] AbbreviationForKind[:escape] = AbbreviationForKind[:delimiter] AbbreviationForKind[:docstring] = AbbreviationForKind[:comment] diff --git a/lib/coderay/tokens.rb b/lib/coderay/tokens.rb index 2a0dc15..c85c2f1 100644 --- a/lib/coderay/tokens.rb +++ b/lib/coderay/tokens.rb @@ -1,6 +1,6 @@ module CodeRay - # = Tokens + # = Tokens TODO: Rewrite! # # The Tokens class represents a list of tokens returnd from # a Scanner. @@ -8,7 +8,7 @@ module CodeRay # A token is not a special object, just a two-element Array # consisting of # * the _token_ _text_ (the original source of the token in a String) or - # a _token_ _action_ (:open, :close, :begin_line, :end_line) + # a _token_ _action_ (begin_group, end_group, begin_line, end_line) # * the _token_ _kind_ (a Symbol representing the type of the token) # # A token looks like this: @@ -18,16 +18,16 @@ module CodeRay # ['$^', :error] # # Some scanners also yield sub-tokens, represented by special - # token actions, namely :open and :close. + # token actions, namely begin_group and end_group. # # The Ruby scanner, for example, splits "a string" into: # # [ - # [:open, :string], + # [:begin_group, :string], # ['"', :delimiter], # ['a string', :content], # ['"', :delimiter], - # [:close, :string] + # [:end_group, :string] # ] # # Tokens is the interface between Scanners and Encoders: @@ -47,20 +47,11 @@ module CodeRay # # It also allows you to generate tokens directly (without using a scanner), # to load them from a file, and still use any Encoder that CodeRay provides. - # - # Tokens' subclass TokenStream allows streaming to save memory. class Tokens < Array # The Scanner instance that created the tokens. attr_accessor :scanner - # Whether the object is a TokenStream. - # - # Returns false. - def stream? - false - end - # Iterates over all tokens. # # If a filter is given, only tokens of that kind are yielded. @@ -76,7 +67,7 @@ module CodeRay end # Iterates over all text tokens. - # Range tokens like [:open, :string] are left out. + # Token actions are left out. # # Example: # tokens.each_text_token { |text, kind| text.replace html_escape(text) } @@ -117,9 +108,13 @@ module CodeRay # For example, if you call +tokens.html+, the HTML encoder # is used to highlight the tokens. def method_missing meth, options = {} - Encoders[meth].new(options).encode_tokens self + encode_with meth, options end - + + def encode_with encoder, options = {} + Encoders[encoder].new(options).encode_tokens self + end + # Returns the tokens compressed by joining consecutive # tokens of the same kind. # @@ -158,7 +153,7 @@ module CodeRay replace optimize end - # Ensure that all :open tokens have a correspondent :close one. + # Ensure that all begin_group tokens have a correspondent end_group. # # TODO: Test this! def fix @@ -167,15 +162,15 @@ module CodeRay opened = [] for type, kind in self case type - when :open - opened.push [:close, kind] + when :begin_group + opened.push [:begin_group, kind] when :begin_line opened.push [:end_line, kind] - when :close, :end_line + when :end_group, :end_line expected = opened.pop if [type, kind] != expected - # Unexpected :close; decide what to do based on the kind: - # - token was never opened: delete the :close (just skip it) + # Unexpected end; decide what to do based on the kind: + # - token was never opened: delete the end (just skip it) next unless opened.rindex expected # - token was opened earlier: also close tokens in between tokens << token until (token = opened.pop) == expected @@ -230,6 +225,11 @@ module CodeRay dump = dump.gzip gzip_level dump.extend Undumping end + + # Return the actual number of tokens. + def count + size / 2 + end # The total size of the tokens. # Should be equal to the input size before @@ -242,9 +242,7 @@ module CodeRay size end - # The total size of the tokens. - # Should be equal to the input size before - # scanning. + # Return all text tokens joined into a single string. def text map { |t, k| t if t.is_a? ::String }.join end @@ -271,77 +269,12 @@ module CodeRay @dump = Marshal.load dump end - end - - - # = TokenStream - # - # The TokenStream class is a fake Array without elements. - # - # It redirects the method << to a block given at creation. - # - # This allows scanners and Encoders to use streaming (no - # tokens are saved, the input is highlighted the same time it - # is scanned) with the same code. - # - # See CodeRay.encode_stream and CodeRay.scan_stream - class TokenStream < Tokens - - # Whether the object is a TokenStream. - # - # Returns true. - def stream? - true - end - - # The Array is empty, but size counts the tokens given by <<. - attr_reader :size - - # Creates a new TokenStream that calls +block+ whenever - # its << method is called. - # - # Example: - # - # require 'coderay' - # - # token_stream = CodeRay::TokenStream.new do |text, kind| - # puts 'kind: %s, text size: %d.' % [kind, text.size] - # end - # - # token_stream << ['/\d+/', :regexp] - # #-> kind: rexpexp, text size: 5. - # - def initialize &block - raise ArgumentError, 'Block expected for streaming.' unless block - @callback = block - @size = 0 - end - - # Calls +block+ with +token+ and increments size. - # - # Returns self. - def << token - @callback.call(*token) - @size += 1 - self - end - - # This method is not implemented due to speed reasons. Use Tokens. - def text_size - raise NotImplementedError, - 'This method is not implemented due to speed reasons.' - end - - # A TokenStream cannot be dumped. Use Tokens. - def dump - raise NotImplementedError, 'A TokenStream cannot be dumped.' - end - - # A TokenStream cannot be optimized. Use Tokens. - def optimize - raise NotImplementedError, 'A TokenStream cannot be optimized.' - end - + alias text_token push + def begin_group kind; push :begin_group, kind end + def end_group kind; push :end_group, kind end + def begin_line kind; push :begin_line, kind end + def end_line kind; push :end_line, kind end + end end @@ -369,17 +302,18 @@ class TokensTest < Test::Unit::TestCase def test_adding_tokens tokens = CodeRay::Tokens.new assert_nothing_raised do - tokens << ['string', :type] - tokens << ['()', :operator] + tokens.text_token 'string', :type + tokens.text_token '()', :operator end - assert_equal tokens.size, 2 + assert_equal tokens.size, 4 + assert_equal tokens.count, 2 end def test_dump_undump tokens = CodeRay::Tokens.new assert_nothing_raised do - tokens << ['string', :type] - tokens << ['()', :operator] + tokens.text_token 'string', :type + tokens.text_token '()', :operator end tokens2 = nil assert_nothing_raised do diff --git a/test/functional/basic.rb b/test/functional/basic.rb index 150089e..013c0c4 100755 --- a/test/functional/basic.rb +++ b/test/functional/basic.rb @@ -14,12 +14,12 @@ class BasicTest < Test::Unit::TestCase RUBY_TEST_TOKENS = [ ['puts', :ident], [' ', :space], - [:open, :string], + [:begin_group, :string], ['"', :delimiter], ['Hello, World!', :content], ['"', :delimiter], - [:close, :string] - ] + [:end_group, :string] + ].flatten def test_simple_scan assert_nothing_raised do assert_equal RUBY_TEST_TOKENS, CodeRay.scan(RUBY_TEST_CODE, :ruby).to_ary diff --git a/test/functional/for_redcloth.rb b/test/functional/for_redcloth.rb index a1c3100..a8a737a 100644 --- a/test/functional/for_redcloth.rb +++ b/test/functional/for_redcloth.rb @@ -1,5 +1,5 @@ require 'test/unit' -$: << 'lib' +$:.unshift 'lib' require 'coderay' begin @@ -14,11 +14,11 @@ class BasicTest < Test::Unit::TestCase def test_for_redcloth require 'coderay/for_redcloth' - assert_equal "<p><span lang=\"ruby\" class=\"CodeRay\">puts <span style=\"background-color:#fff0f0;color:#D20\"><span style=\"color:#710\">"</span><span style=\"\">Hello, World!</span><span style=\"color:#710\">"</span></span></span></p>", + assert_equal "<p><span lang=\"ruby\" class=\"CodeRay\">puts <span style=\"background-color:hsla(0,100%,50%,0.1);color:#D20\"><span style=\"color:#710\">"</span><span style=\"\">Hello, World!</span><span style=\"color:#710\">"</span></span></span></p>", RedCloth.new('@[ruby]puts "Hello, World!"@').to_html assert_equal <<-BLOCKCODE.chomp, <div lang="ruby" class="CodeRay"> - <div class="code"><pre>puts <span style="background-color:#fff0f0;color:#D20"><span style="color:#710">"</span><span style="">Hello, World!</span><span style="color:#710">"</span></span></pre></div> + <div class="code"><pre>puts <span style="background-color:hsla(0,100%,50%,0.1);color:#D20"><span style="color:#710">"</span><span style="">Hello, World!</span><span style="color:#710">"</span></span></pre></div> </div> BLOCKCODE RedCloth.new('bc[ruby]. puts "Hello, World!"').to_html diff --git a/test/functional/suite.rb b/test/functional/suite.rb index 039ab47..97dd330 100755 --- a/test/functional/suite.rb +++ b/test/functional/suite.rb @@ -2,7 +2,7 @@ require 'test/unit' MYDIR = File.dirname(__FILE__) -$: << 'lib' +$:.unshift 'lib' require 'coderay' puts "Running basic CodeRay #{CodeRay::VERSION} tests..." |