Apr 16

Download Source Code: CSharpSyntaxHighlighter.zip - 11.54KB

Generic SyntaxHighlighter Base Class

We'll start with a generic SyntaxHighlighter base class, which provides common control properties and simple buffer-based functions for any language-specific highlighter.

On each Process method call, our parser loads your source code into a _buffer string and uses two internal indexes, _start and _end, to delimit the current token. Parsed words are appended to a _result StringBuilder object, wrapped in SPAN HTML tags whose class names select different colors based on each word's semantics.
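To make the flow above concrete, here is a minimal sketch of that template-method structure. Only Process, ProcessThis, _buffer, _start, _end and _result come from the article; the class names, the AppendSpan helper and the "sh_keyword" CSS class are hypothetical, and HTML encoding is omitted for brevity.

```csharp
using System.Text;

// Hypothetical sketch of the base class flow described in the article.
// AppendSpan and the "sh_keyword" class name are assumptions.
public abstract class SyntaxHighlighterSketch
{
    protected string _buffer = "";
    protected int _start;        // first index of the current token
    protected int _end;          // index just past the current token
    protected StringBuilder _result = new StringBuilder();

    // Entry point: load the source into the buffer, reset the state,
    // let the language-specific override parse it, then return the HTML.
    public string Process(string sourceCode)
    {
        _buffer = sourceCode ?? "";
        _start = _end = 0;
        _result = new StringBuilder();
        ProcessThis();
        return _result.ToString();
    }

    // Implemented by each language-specific highlighter.
    protected abstract void ProcessThis();

    // Wrap a token in a SPAN whose class name selects its color.
    protected void AppendSpan(string cssClass, string text)
    {
        _result.Append("<span class=\"").Append(cssClass).Append("\">")
               .Append(text).Append("</span>");
    }
}

// Toy highlighter: marks the word "class" as a keyword and copies
// everything else unchanged (no HTML encoding, for brevity).
public class ToyHighlighter : SyntaxHighlighterSketch
{
    protected override void ProcessThis()
    {
        while (_start < _buffer.Length)
        {
            // Extend _end over a run of letters: that run is the token.
            _end = _start;
            while (_end < _buffer.Length && char.IsLetter(_buffer[_end]))
                _end++;
            if (_end == _start)
            {
                _result.Append(_buffer[_start]);   // not a letter: copy as-is
                _start++;
                continue;
            }
            string word = _buffer.Substring(_start, _end - _start);
            if (word == "class")
                AppendSpan("sh_keyword", word);
            else
                _result.Append(word);
            _start = _end;
        }
    }
}
```

For example, `new ToyHighlighter().Process("public class Foo")` returns `public <span class="sh_keyword">class</span> Foo`.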

To make implementing a ProcessThis method override easier, we provide some simple buffer functions, such as FindNext, Current, CharAt and Substring.
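The article names these helpers but does not show them. The sketch below is a guess at plausible semantics: Current returns the text between _start and _end, CharAt returns '\0' when out of range, Substring clamps to the buffer, and FindNext moves _end past the next match. The real signatures in the downloadable source may differ, and SetToken is a hypothetical test helper.

```csharp
using System;

// Hypothetical versions of the buffer helpers named in the article.
// Signatures and semantics are assumptions, not the downloadable source.
public class BufferHelpersSketch
{
    protected string _buffer;
    protected int _start;
    protected int _end;

    public BufferHelpersSketch(string buffer)
    {
        _buffer = buffer ?? "";
    }

    // Character at an index, or '\0' when out of range, so a parser
    // can look ahead without explicit bounds checks.
    public char CharAt(int index)
    {
        return index >= 0 && index < _buffer.Length ? _buffer[index] : '\0';
    }

    // The current token: _start inclusive, _end exclusive.
    public string Current
    {
        get { return _buffer.Substring(_start, _end - _start); }
    }

    // Text between two indexes (end exclusive), clamped to the buffer.
    public string Substring(int start, int end)
    {
        start = Math.Max(0, start);
        end = Math.Min(_buffer.Length, end);
        return end > start ? _buffer.Substring(start, end - start) : "";
    }

    // Moves _end just past the next occurrence of value, searching from _end.
    // When value is not found, moves _end to the end of the buffer and returns false.
    public bool FindNext(string value)
    {
        int index = _buffer.IndexOf(value, _end, StringComparison.Ordinal);
        if (index < 0)
        {
            _end = _buffer.Length;
            return false;
        }
        _end = index + value.Length;
        return true;
    }

    // Hypothetical helper for demos: position the current token.
    public void SetToken(int start, int end)
    {
        _start = start;
        _end = end;
    }
}
```

With the buffer `x = /* note */ y;` and the token set to the opening `/*`, a call to `FindNext("*/")` extends Current to the whole comment, `/* note */`.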

Among the common control properties, you can choose to show your source code with or without line numbers, comments and collapsible blocks. Line numbering applies to any kind of listing: it simply adds a short header, the line number, at the start of each line. Call the AddLineNumbers method AFTER the code has already been parsed and converted for your specific programming language:

/// <summary>
/// Generic method, to be called at the end of ProcessThis,
/// after regular parsing and before returning the results
/// </summary>
protected void AddLineNumbers()
{
    if (_showLineNumbers)
    {
        // Simple Split&Merge algorithm:
        // Split _result string in lines and append a header
        // with the line number for each line
        // The new _result contains all merged lines, with headers
        string[] lines = _result.ToString().Split('\n');
        _result = new StringBuilder();
        for (int i = 0; i < lines.Length; i++)
        {
            _result.Append("<span class=\"sh_line\">");
            _result.Append((i + 1).ToString().PadLeft(
                _lineNumberSpaces));
            _result.Append("</span>");
            _result.Append(lines[i]);
            _result.Append('\n');
        }
    }
}
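To see what AddLineNumbers produces, here is a standalone copy of its split-and-merge logic. It takes the HTML and the padding width as parameters instead of the _result and _lineNumberSpaces fields. The LineNumberDemo class name is hypothetical.

```csharp
using System.Text;

// Standalone copy of the AddLineNumbers split & merge logic, with the
// HTML and padding width passed in instead of read from fields.
public static class LineNumberDemo
{
    public static string AddLineNumbers(string html, int lineNumberSpaces)
    {
        string[] lines = html.Split('\n');
        var result = new StringBuilder();
        for (int i = 0; i < lines.Length; i++)
        {
            // Header: the 1-based line number, right-aligned.
            result.Append("<span class=\"sh_line\">");
            result.Append((i + 1).ToString().PadLeft(lineNumberSpaces));
            result.Append("</span>");
            result.Append(lines[i]);
            result.Append('\n');
        }
        return result.ToString();
    }
}
```

With the input `int a;\nint b;` and a width of 3, this returns `<span class="sh_line">  1</span>int a;\n<span class="sh_line">  2</span>int b;\n`. Two details follow from splitting on '\n'. CRLF input keeps its '\r' at the end of each line, so line endings survive. If the input ends with '\n', the trailing empty string gets a number of its own, so you may want to trim one final newline before numbering.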

The best-known collapsible blocks in .NET are the C# #region and VB.NET #Region expandable sections. In our layout they may look a bit different, but they provide the same dynamic behavior: clicking their header line collapses or expands their inner content. When collapsible blocks are enabled, your whole highlighted result can also be hidden through a Show/Hide link displayed at the top.

To add a collapsible region, call AddCollapsibleBlock from your own ProcessThis override, and close it with EndCollapsibleBlock. This transparently adds some JavaScript-based code that calls the shToggle function from our .js file. AddCollapsibleBlock also assigns a unique HTML identifier to each region and restarts the numbering from zero when it reaches MAX_BLOCK_ID. To avoid collisions, your HTML page should not use, anywhere else, IDs made of the blk prefix followed by a number.

// To add collapsible blocks, call AddCollapsibleBlock for #region
// and EndCollapsibleBlock() after #endregion
// For empty text, adds Show/Hide top content block
protected string AddCollapsibleBlock(string text)
{
    if (!_showCollapsibleBlocks)
        return text;
    if (text.Length == 0)
        text = "Show/Hide\r\n";
    if (_lastBlockId == MAX_BLOCK_ID)
        _lastBlockId = 0;

    // collapsible blocks are dynamically processed
    // by JavaScript's shToggle function, from SyntaxHighlighters.js
    return "<span class=\"sh_expanded\" onclick=\"shToggle(this,"
        + (++_lastBlockId) + ")\">" + text
        + "</span><span id=\"blk" + _lastBlockId + "\">";
}
protected string EndCollapsibleBlock()
{
    return (_showCollapsibleBlocks ? "</span>" : "");
}
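Here is a minimal sketch of how a ProcessThis override might use these two methods. Each #region line becomes the clickable header. Each #endregion line stays inside the block, which is closed right after it, as the code comment above suggests. The line-by-line scanning, the RegionDemo class and the MaxBlockId value of 1000 are assumptions for illustration, and HTML encoding is omitted.

```csharp
using System;
using System.Text;

// Hypothetical sketch: driving AddCollapsibleBlock / EndCollapsibleBlock
// from a simple line scanner. MaxBlockId's value is an assumption.
public class RegionDemo
{
    private const int MaxBlockId = 1000;
    private int _lastBlockId;
    private readonly bool _showCollapsibleBlocks;

    public RegionDemo(bool showCollapsibleBlocks)
    {
        _showCollapsibleBlocks = showCollapsibleBlocks;
    }

    // Same logic as AddCollapsibleBlock in the article.
    private string AddCollapsibleBlock(string text)
    {
        if (!_showCollapsibleBlocks) return text;
        if (text.Length == 0) text = "Show/Hide\r\n";
        if (_lastBlockId == MaxBlockId) _lastBlockId = 0;
        return "<span class=\"sh_expanded\" onclick=\"shToggle(this,"
            + (++_lastBlockId) + ")\">" + text
            + "</span><span id=\"blk" + _lastBlockId + "\">";
    }

    private string EndCollapsibleBlock()
    {
        return _showCollapsibleBlocks ? "</span>" : "";
    }

    // #region opens a block (its line is the header); #endregion stays
    // inside the block, which is closed right after it. Nested regions
    // work because the SPANs nest.
    public string Process(string code)
    {
        var result = new StringBuilder();
        foreach (string line in code.Split('\n'))
        {
            string trimmed = line.TrimStart();
            if (trimmed.StartsWith("#region", StringComparison.Ordinal))
                result.Append(AddCollapsibleBlock(line + "\n"));
            else if (trimmed.StartsWith("#endregion", StringComparison.Ordinal))
                result.Append(line).Append('\n').Append(EndCollapsibleBlock());
            else
                result.Append(line).Append('\n');
        }
        return result.ToString();
    }
}
```

With collapsible blocks turned off, the same input passes through unchanged apart from the newline appended to each line.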

IDEs such as Visual Studio also recognize type names and other element types and show them in different colors. You can customize these fonts and colors in the Tools > Options dialog, on the Environment > Fonts and Colors page. Parsing full metadata to determine whether an identifier is a type (a class name, enum type name, interface name and so on) would be too much for our highlighters. However, we provide a simple feature in case you want to manually pass the few type names used in the code you want highlighted: the Types property.
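The article names the Types property but does not show how it is used. One plausible approach is sketched below. The names are stored in a case-sensitive set, and every non-keyword identifier is looked up in it. The delimited-string format, the HighlightIdentifier method and the "sh_type" CSS class are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a Types property. The string format (names
// separated by spaces, commas or semicolons) and "sh_type" are assumptions.
public class TypeNameSketch
{
    private HashSet<string> _types = new HashSet<string>(StringComparer.Ordinal);

    // e.g. "StringBuilder, Regex"
    public string Types
    {
        get { return string.Join(" ", _types); }
        set
        {
            _types = new HashSet<string>(
                (value ?? "").Split(new[] { ' ', ',', ';' },
                    StringSplitOptions.RemoveEmptyEntries),
                StringComparer.Ordinal);
        }
    }

    // Called for each identifier that is not a keyword: known type names
    // get their own SPAN, everything else is returned unchanged.
    public string HighlightIdentifier(string identifier)
    {
        return _types.Contains(identifier)
            ? "<span class=\"sh_type\">" + identifier + "</span>"
            : identifier;
    }
}
```

Ordinal comparison keeps the lookup case-sensitive, matching C# identifiers. A highlighter for a case-insensitive language such as VB.NET would use StringComparer.OrdinalIgnoreCase instead.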

Visual Studio transforms text that starts with one of http://, https://, ftp://, telnet:/ or gopher:// into hyperlinks that you can follow in your browser with CTRL+click. We'll implement a similar generic feature. If ShowHyperlinks is set, the AddHyperlinks method parses the text passed as its argument, identifies potential hyperlinks and transforms them into clickable HTML A links. It also HTML-encodes all other text:

/// <summary>
/// Parse and convert possible hyperlinks in a plain text
/// Hyperlink must start with one of http://, https://,
/// ftp://, telnet:/, gopher://
/// If no _showHyperlinks, just return whole text HTML-encoded
/// </summary>
/// <param name="text">Plain text, check for hyperlinks</param>
/// <returns>HTML-encoded text, with eventual hyperlinks</returns>
protected string AddHyperlinks(string text)
{
    int endEncoded = 0; // text HTML-encoded until this last position
    if (_showHyperlinks)
        for (int position = 0; position < text.Length; )
        {
            // skip characters between identifiers
            while (position < text.Length 
                && !IsHyperlinkChar(text[position]))
                position++;
            if (position >= text.Length)
                break;

            // collects next identifier
            int start = position;
            while (position < text.Length 
                && IsHyperlinkChar(text[position]))
                position++;
            if (position > start)
            {
                string token = text.Substring(
                    start, position - start);
                if (token.StartsWith("http://",
                        StringComparison.InvariantCultureIgnoreCase)
                    || token.StartsWith("https://",
                        StringComparison.InvariantCultureIgnoreCase)
                    || token.StartsWith("ftp://",
                        StringComparison.InvariantCultureIgnoreCase)
                    || token.StartsWith("telnet:/",
                        StringComparison.InvariantCultureIgnoreCase)
                    || token.StartsWith("gopher://",
                        StringComparison.InvariantCultureIgnoreCase))
                {
                    // yep, that's a possible hyperlink
                    token = Encode(token);
                    token = "<a rel=\"nofollow\""
                        + " target=\"_blank\" href=\""
                        + token + "\">" + token + "</a>";
                    token = text.Substring(0, endEncoded)
                        + Encode(text.Substring(
                        endEncoded, start - endEncoded))
                        + token;
                    text = token + text.Substring(position);
                    endEncoded = position = token.Length;
                }
            }
        }

    // return starting encoded portion with
    // HTML-encoded now trailing portion
    return text.Substring(0, endEncoded)
        + Encode(text.Substring(endEncoded));
}
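AddHyperlinks depends on two helpers that the article does not show: IsHyperlinkChar and Encode. The sketch below assumes that a URL token consists of letters, digits and common URL punctuation, so white space, quotes, parentheses and angle brackets end a token. It also assumes that Encode can delegate to the framework's System.Net.WebUtility.HtmlEncode. The accepted character set is a guess.

```csharp
using System.Net;

// Hypothetical versions of the two helpers AddHyperlinks relies on.
// The accepted character set is an assumption, not the article's code.
public static class HyperlinkHelpersSketch
{
    // Letters, digits and URL punctuation belong to a token; white space,
    // quotes, parentheses and angle brackets end it.
    public static bool IsHyperlinkChar(char c)
    {
        return char.IsLetterOrDigit(c) || "-._~:/?#@!$&*+,;=%".IndexOf(c) >= 0;
    }

    // HTML-encode plain text (&, <, >, quotes) with the framework encoder.
    public static string Encode(string text)
    {
        return WebUtility.HtmlEncode(text);
    }
}
```

With this character set, a trailing period or comma is still captured as part of the link (for example, in "see http://example.com."). That matches the article's remark that the conversion is not always reliable.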

It usually makes sense to call AddHyperlinks only on text collected from comments or string values, not on other identifiers. Note that this hyperlink conversion is not always reliable, but Visual Studio and other Microsoft IDEs behave the same way. Any token delimited by white space or other characters that starts with one of the prefixes is treated as a link, so text such as http:// alone or http://.> will also be mistaken for one.
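If false positives such as a bare http:// are a concern, one optional refinement is a second check after the prefix test. This is not part of the article's code. The check asks System.Uri whether the token parses as an absolute URI with a non-empty host. Uri.TryCreate is a standard .NET API. The wrapper name LooksLikeLink is hypothetical.

```csharp
using System;

// Optional refinement, not the article's code: after the prefix test
// succeeds, accept the token only if System.Uri parses it as an
// absolute URI with a host, so a bare "http://" is rejected.
public static class StricterLinkCheck
{
    public static bool LooksLikeLink(string token)
    {
        Uri uri;
        return Uri.TryCreate(token, UriKind.Absolute, out uri)
            && uri.Host.Length > 0;
    }
}
```

Inside AddHyperlinks, this check could be combined with the StartsWith tests using `&&`, so that only tokens passing both are converted. Everything else is HTML-encoded as before.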

The syntax highlighter assumes your code's syntax is already correct. It does not check for errors and tells you nothing about the validity of your code. If you forget to close a multi-line comment or a string value, it may assume, like Visual Studio and other code editors, that all the rest of your code is part of that comment or string value.
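The following sketch shows why an unclosed comment swallows the rest of the code. A typical scanner looks for the closing `*/`. When it is missing, the scanner has no error to report, so the comment simply runs to the end of the buffer. The CommentScanSketch class is hypothetical, not the article's implementation.

```csharp
using System;

// Hypothetical sketch of multi-line comment scanning. A missing "*/" is
// not reported as an error; the comment just extends to the buffer's end.
public static class CommentScanSketch
{
    // start must point at the opening "/*". Returns the index just past
    // the comment. Searching from start + 2 keeps "/*/" from being
    // treated as a closed comment.
    public static int FindCommentEnd(string buffer, int start)
    {
        int close = buffer.IndexOf("*/", start + 2, StringComparison.Ordinal);
        return close < 0 ? buffer.Length : close + 2;
    }
}
```

In `a /* b */ c`, the comment starting at index 2 ends at index 9, giving `/* b */`. In `a /* b c`, the result is the buffer length, so everything after `/*` is colored as a comment.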

Nor can it tell which language your code is actually written in. For instance, with some limitations, you can use a C# highlighter quite successfully for C, C++, Java and JavaScript, because all these languages share C as the root of their keywords and basic syntax. Most keywords are common, character and string values are written in a similar way, and they all use similar single-line and multi-line comment syntax.
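The "with some limitations" part can be illustrated by running Java keywords through a C# keyword table. Shared keywords are highlighted, and Java-only ones such as final or extends are not. The keyword set below is a small subset of C# keywords chosen for illustration. A real highlighter would use the full list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: a small subset of C# keywords (not the full list)
// used to show which Java keywords a C# highlighter would miss.
public static class KeywordOverlapDemo
{
    public static readonly HashSet<string> CSharpKeywords = new HashSet<string>
    {
        "public", "private", "static", "void", "int", "class", "return",
        "new", "if", "else", "for", "while", "this", "null"
    };

    // Returns the tokens a C# keyword table would leave unhighlighted.
    public static List<string> MissedJavaKeywords(IEnumerable<string> javaKeywords)
    {
        return javaKeywords.Where(k => !CSharpKeywords.Contains(k)).ToList();
    }
}
```

For the Java keywords public, final, class, extends and int, only final and extends are missed.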
