Source code to formatted text converter
*** Since Github is now part of Mordor Corp, the highlight Git repo moved to https://gitlab.com/saalen/highlight ***
OSI Certified Open Source Software
German manual: README_DE
1. OVERVIEW
   1.1 INTENDED PURPOSE
   1.2 FEATURE LIST
   1.3 SUPPORTED PROGRAMMING AND MARKUP LANGUAGES
2. USAGE AND OPTIONS
   2.1 QUICK INTRODUCTION
   2.2 CLI OPTIONS
   2.3 GUI OPTIONS
   2.4 INPUT AND OUTPUT
   2.5 GNU SOURCE-HIGHLIGHT COMPATIBILITY
   2.6 ADVANCED OPTIONS
   2.7 ENVIRONMENT VARIABLES
3. CONFIGURATION
   3.1 FILE FORMAT
   3.2 LANGUAGE DEFINITIONS
   3.3 REGULAR EXPRESSIONS
   3.4 THEME DEFINITIONS
   3.5 KEYWORD GROUPS
   3.6 PLUG-INS
   3.7 FILE MAPPING
   3.8 CONFIG FILE SEARCH
4. EMBEDDING HIGHLIGHT
   4.1 SAMPLE SCRIPTS
   4.2 PANDOC
   4.3 SWIG
   4.4 TCL
   4.5 THIRD PARTY SCRIPTS AND PLUG-INS
5. BUILDING AND INSTALLING
   5.1 PRECOMPILED PACKAGES
   5.2 BUILDING DEPENDENCIES
6. DEVELOPER CONTACT
OVERVIEW
Highlight converts source code to HTML, XHTML, RTF, ODT, LaTeX, TeX, SVG, BBCode, Pango markup and terminal escape sequences with coloured syntax highlighting. Language definitions and colour themes are customizable.
Highlight was designed to offer a flexible but easy to use syntax highlighter for several output formats. No syntax or colouring information is hardcoded; instead, all relevant data is stored in configuration scripts. These Lua scripts may be altered and enhanced with plug-in scripts.
Please see README_LANGLIST for the current set of supported languages. You may also run "highlight --list-scripts=langs" to get a list and associated file extensions.
The following examples show how to produce a highlighted C++ file, using main.cpp as input file:
Generate HTML:
  highlight -i main.cpp -o main.cpp.html
  highlight < main.cpp > main.cpp.html --syntax cpp
You will find the HTML file and highlight.css in the working directory. If you use IO redirection (2nd example), you must define the programming language with --syntax.
Generate HTML with embedded CSS definitions and line numbers:
  highlight -i main.cpp -o main.cpp.html --include-style --line-numbers

Generate HTML with inline CSS definitions:
  highlight -i main.cpp -o main.cpp.html --inline-css

Generate LaTeX using "horstmann" source formatting style and "neon" colour theme:
  highlight -O latex -i main.cpp -o main.cpp.tex --reformat horstmann --style neon
The following output formats may be defined with --out-format:
  html:      HTML5 (default)
  xhtml:     XHTML 1.1
  tex:       Plain TeX
  latex:     LaTeX
  rtf:       RTF
  odt:       OpenDocument Text (Flat XML)
  svg:       SVG
  bbcode:    BBCode
  pango:     Pango markup
  ansi:      Terminal 16 color escape codes
  xterm256:  Terminal 256 color escape codes
  truecolor: Terminal 16m color escape codes

Customize font settings:
  highlight --syntax ada --font-size 12 --font "'Courier New',monospace"
  highlight --syntax ada --out-format=latex --font-size tiny --font sffamily

Define an output directory:
  highlight -d some/target/dir/ *.cpp *.h
See "highlight --help" or "man highlight" for more details.
The command line version of highlight offers the following options:
USAGE: highlight [OPTIONS]... [FILES]...
General options:
-B, --batch-recursive= ignore listed unknown file types
(Example: --skip='bak;c~;h~')
--start-nested=
Output formatting options:
-O, --out-format=
(X)HTML output options:
-a, --anchors attach anchor to line numbers
-y, --anchor-prefix=
LaTeX output options:
 -b, --babel            disable Babel package shorthands
 -r, --replace-quotes   replace double quotes by \dq{}
     --beamer           adapt output for the Beamer package
     --pretty-symbols   improve appearance of brackets and other symbols
RTF output options:
--page-color include page color attributes
-x, --page-size=
SVG output options:
--height set image height (units allowed)
--width set image width (see --height)
GNU source-highlight compatibility options:
--doc create stand alone document
--no-doc cancel the --doc option
--css=filename the external style sheet filename
--src-lang=STRING source language
 -t, --tab=INT              specify tab length
 -n, --line-number[=0]      number all output lines, optional padding
     --line-number-ref[=p]  number all output lines and generate an anchor,
                            made of the specified prefix p + the line
                            number (default='line')
     --output-dir=path      output directory
     --failsafe             if no language definition is found for the
                            input, it is simply copied to the output
The Graphical User Interface offers a subset of the CLI's features. It includes a dynamic preview of the output file's appearance. Please see screenshots and screencasts on the project website. Invoke highlight-gui with the --portable option to let it save its settings in the binary's current directory (instead of using the registry).
If no input or output file name is defined by --input and --output options, highlight will use stdin and stdout for file processing.
If no input filename is defined by --input or given at the prompt, highlight cannot determine the language from the file extension (except for some scripting languages, which are identified by the shebang in the first input line). In this case you have to specify the language with --syntax (in most cases this is the file suffix of the source file). Example: If you want to convert a Python file, highlight needs to load the py.lang language definition. The correct argument of --syntax would be "py".
  highlight test.py
  highlight < test.py --syntax py       # --syntax option necessary
  cat test.py | highlight --syntax py
If there are multiple suffixes (like C, cc, cpp, h for C++ files), they are mapped to a language definition in $CONF_DIR/filetypes.conf.
Highlight enters the batch processing mode if multiple input files are defined or if --batch-recursive is set. In batch mode, highlight will save the generated files using the original filename, appending the extension of the chosen output type. If files in the input directories happen to share the same name, the output files will be prefixed with their source path name. The --outdir option is recommended in batch mode. Use --quiet to improve performance (recommended for usage in shell scripts).
The HTML, TeX, LaTeX and SVG output formats can reference a stylesheet file which contains the formatting information.
In HTML and SVG output, this file contains CSS definitions and is saved as 'highlight.css'. In LaTeX and TeX, it contains macro definitions, and is saved as 'highlight.sty'.
Name and path of the stylesheet may be modified with --style-outfile. If the --outdir option is given, all generated output, including stylesheets, is stored in this directory.
Use --include-style to embed the style information in the output documents without referencing a stylesheet.
Referenced stylesheets have the advantage of keeping all formatting information in a single file, so a change there affects all referencing documents.
With --style-infile you define a file to be included in the final formatting information of the document. This way you enhance or redefine the default highlight style definitions without editing generated code. Note: Using a plug-in script is the preferred way to enhance styling.
Since only a limited number of colours is defined for ANSI terminal output, there is only one hard-coded colour theme for --out-format=ansi. You should therefore use --out-format=xterm256 to enable output in 256 colours. The 256 colour mode is supported by recent releases of xterm, rxvt and PuTTY (among others). Recent terminal emulators also support 16m colours; this mode is enabled with --out-format=truecolor.
highlight --out-format=ansi
If the language definition is specified as "txt", no highlighting takes place.
highlight -S txt --out-format=latex README > README.tex
  highlight --out-format=xhtml --batch-recursive '*.cpp' --outdir ~/html_code/
This command converts all *.cpp files in the current directory and its subdirectories to XHTML files, and stores the output in /home/you/html_code.

  highlight --out-format=latex * --outdir /home/you/latex_code/
This command converts all files to LaTeX, stored in /home/you/latex_code/.

  highlight --stdout -s seashell --print-style
This command prints only the CSS information to stdout (theme: Seashell).
The command line interface is extensively harmonised with source-highlight (http://www.gnu.org/software/src-highlite/).
The following highlight options have the same meaning as in source-highlight: --input, --output, --help, --version, --out-format, --title, --data-dir, --verbose, --quiet
These options were added to enhance compatibility: --css, --doc, --failsafe, --line-number, --line-number-ref, --no-doc, --tab, --output-dir, --src-lang
These switches provide a common highlighter interface for scripts, plugins etc.
If highlight may be invoked with arbitrary kinds of input, you can disable parsing of binary files using --validate-input. This flag causes highlight to match the input file header with a list of magic numbers. If a binary file type is detected, highlight quits with an error message. This switch also removes a UTF-8 BOM in the output.
If a file starts with an embedded code section which lacks an appropriate opening delimiter, the --start-nested option will switch to the nested language mode. This can be useful with LuaTeX files:
  highlight luatex.tex --latex --start-nested=inc_luatex
inc_luatex is a Lua language definition with TeX line comments. Note that the nested code section has to end with the ending delimiter defined in the host language definition.
The option --config-file helps to test new config files before installing them. The given file must be a lang or theme file.
highlight --config-file xxx.lang --config-file yyy.theme -I
Use --verbose to display the Lua and syntax data.
Use --validate-input to get rid of UTF8 byte order marks.
Use --stdout to write output files in batch mode to stdout.
Invoke highlight-gui.exe with the --portable switch to save its configuration in text files instead of the registry.
The command line version recognizes these variables:
  HIGHLIGHT_DATADIR: sets the path to highlight's configuration scripts
  HIGHLIGHT_OPTIONS: may contain command line options, but no input file paths.
Configuration files are Lua scripts. Please refer to http://www.lua.org/manual/5.1/manual.html for more details about the Lua syntax.
These constructs are sufficient to edit the scripts:
Variable assignment:
  name = value
  (variables have no type, only values have)

Strings:
  string1="string literal with escape: \n"
  string2=[[raw string without escape sequence]]

If raw string content starts with "[" or ends with "]", pad the bracket with a space to avoid a syntax error. Highlight will strip the string. If the string is a regular expression containing a set with a character class like [[:space:]], use string delimiters with a "filler": [=[ regex string ]=]

Comments:
  -- line comment
  --[[ block comment ]]

Arrays:
  array = { first=1, second="2", 3, { 4,5 } }
  Arrays may contain variables and can be nested.
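The constructs above can be combined as in the following sketch of a configuration script fragment. All variable names (theme_name, tab_width, word_regex, settings and so on) are made up for illustration; highlight does not read them.

```lua
-- Hypothetical names for illustration only; highlight does not read them.
theme_name = "demo"                   -- variable assignment, string value
tab_width  = 4                        -- same kind of variable, number value

escaped = "first line\nsecond line"   -- the escape sequence \n is interpreted
raw     = [[C:\temp\new]]             -- raw string: backslashes stay literal

-- The regex contains "]]", so a delimiter with a "filler" (=) is used.
-- The content is padded with spaces because it starts with "[".
word_regex = [=[ [[:alpha:]]+ ]=]

--[[ Tables mix named and positional items:
     the positional items below get the indexes 1 and 2. ]]
settings = {
  first  = 1,        -- named field, read as settings.first
  second = "2",      -- named field, read as settings.second
  3,                 -- positional item, read as settings[1]
  { 4, 5 },          -- nested table, read as settings[2]
}
```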
A language definition describes all elements of a programming language which
will be highlighted by different colours and font types.
Save the new file in langDefs/, using the following name convention:
Examples: PHP -> php.lang, Java -> java.lang
If there are multiple suffixes, list them in filetypes.conf.
Syntax elements:
Keywords = { Id, List|Regex, Group? }
  Id:    Integer, keyword group id (values 1-4, can be reused for several keyword groups)
  List:  List, list of keywords
  Regex: String, regular expression
  Group: Integer, capturing group id of regular expression, defines part of regex which should be returned as keyword (optional; if not set, the match with the highest group number is returned (counts from left to right))
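A minimal sketch of a Keywords table that uses both a word list and a regular expression with a capturing group. The word lists and the regex are illustrative assumptions and do not come from a packaged language definition; see README_REGEX for the supported regex constructs.

```lua
Keywords = {
  -- group 1: plain list of reserved words
  { Id=1, List={"if", "else", "while", "return"} },
  -- group 2: plain list of type names
  { Id=2, List={"int", "char", "float"} },
  -- group 4: regex match; only capturing group 1 (the name in front of
  -- the opening parenthesis) is returned as keyword
  { Id=4, Regex=[[(\w+)\s*\(]], Group=1 },
}
```

Without Group=1 the match with the highest group number would be returned, which in this regex is the same group, so the explicit value only documents the intent.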
Comments = { {Block, Nested?, Delimiter={Open, Close?} } }
  Block:     Boolean, true if comment is a block comment
  Nested:    Boolean, true if block comments can be nested (optional)
  Delimiter: List, contains open delimiter regex (line comment) or open and close delimiter regexes (block comment)
Strings = { Delimiter|DelimiterPairs={Open, Close, Raw?}, Escape?, Interpolation?, RawPrefix?, AssertEqualLength? }
  Delimiter:         String, regular expression which describes string delimiters
  DelimiterPairs:    List, includes open and close delimiter expressions if not equal, includes optional Raw flag as boolean which marks delimiter pair to contain a raw string
  Escape:            String, regex of escape sequences (optional)
  Interpolation:     String, regex of interpolation sequences (optional)
  RawPrefix:         String, defines raw string indicator (optional)
  AssertEqualLength: Boolean, set true if delimiters must have the same length
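The DelimiterPairs variant might look like the following sketch for Lua-style long bracket strings. It assumes that the pair items are named Open, Close and Raw as in the description above; the regexes are illustrative. The raw strings are padded with spaces because their content starts with "[" or ends with "]".

```lua
Strings = {
  DelimiterPairs={
    -- opening [[ or [=[ ... and closing ]] or ]=] ...; content is raw
    { Open=[[ \[=*\[ ]], Close=[[ \]=*\] ]], Raw=true },
  },
  -- an opening [==[ may only be closed by ]==], not by ]] or ]=]
  AssertEqualLength=true,
}
```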
PreProcessor = { Prefix, Continuation? }
  Prefix:       String, regular expression which describes open delimiter
  Continuation: String, contains line continuation character (optional).
NestedSections = {Lang, Delimiter= {} }
  Lang:      String, name of nested language
  Delimiter: List, contains open and close delimiters of the code section
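A sketch of a nested section for a markup language that embeds style sheets. It assumes that NestedSections holds a list of entries (like Keywords) and that a language definition named css is available; the delimiter regexes are illustrative.

```lua
NestedSections = {
  {
    -- name of the language definition used inside the section (assumed)
    Lang="css",
    -- open and close delimiter of the embedded code section
    Delimiter={ [[<style[^>]*>]], [[<\/style>]] },
  },
}
```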
Description: String, Defines syntax description
Digits: String, Regular expression which defines digits (optional)
Identifiers: String, Regular expression which defines identifiers (optional)
Operators: String, Regular expression which defines operators
EnableIndentation: Boolean, set true if syntax may be reformatted and indented
IgnoreCase: Boolean, set true if keyword case should be ignored
Global variables:
The following variables are available within a language definition:
HL_LANG_DIR: path of language definition directory (use with Lua dofile function)
Identifiers: Default regex for identifiers
Digits:      Default regex for numbers
The following integer variables represent the internal highlighting states:
  HL_STANDARD
  HL_STRING
  HL_NUMBER
  HL_LINE_COMMENT
  HL_BLOCK_COMMENT
  HL_ESC_SEQ
  HL_PREPROC
  HL_PREPROC_STRING
  HL_OPERATOR
  HL_INTERPOLATION
  HL_LINENUMBER
  HL_KEYWORD
  HL_STRING_END
  HL_LINE_COMMENT_END
  HL_BLOCK_COMMENT_END
  HL_ESC_SEQ_END
  HL_PREPROC_END
  HL_OPERATOR_END
  HL_INTERPOLATION_END
  HL_KEYWORD_END
  HL_EMBEDDED_CODE_BEGIN
  HL_EMBEDDED_CODE_END
  HL_IDENTIFIER_BEGIN
  HL_IDENTIFIER_END
  HL_UNKNOWN
  HL_REJECT
The function OnStateChange:
This function is a hook which is called if an internal state changes (e.g. from HL_STANDARD to HL_KEYWORD if a keyword is found). It can be used to alter the new state or to manipulate syntax elements like keyword lists.
OnStateChange(oldState, newState, token, kwGroupID)
  Hook Event: Highlighting parser state change
  Parameters: oldState:  old state
              newState:  intended new state
              token:     the current token which triggered the new state
              kwGroupID: if newState is HL_KEYWORD, the parameter contains the keyword group ID
  Returns:    Correct state to continue OR HL_REJECT

Return HL_REJECT if the recognized token and state should be discarded; the first character of token is then output and highlighted as "oldState".
The highlighting is not working as expected? See src/include/codegenerator.hpp for details about the parser.
Examples:
function OnStateChange(oldState, newState, token)
   if token=="]]" and oldState==HL_STRING and newState==HL_BLOCK_COMMENT_END then
      return HL_STRING_END
   end
   return newState
end
This function resolves a Lua parsing issue with the "]]" close delimiter which ends both comments and strings.
function OnStateChange(oldState, newState, token)
   if token=="'" and oldState==HL_NUMBER and newState==HL_STRING then
      return HL_NUMBER
   end
   return newState
end
This hook enables correct parsing of C++14 number separator syntax.
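The fourth parameter and HL_REJECT can be combined as in this sketch. The rule itself (discarding one-character matches of keyword group 4) is an invented example and is not taken from a packaged language definition.

```lua
function OnStateChange(oldState, newState, token, kwGroupID)
   -- kwGroupID is only meaningful if the new state is HL_KEYWORD
   if newState==HL_KEYWORD and kwGroupID==4 and #token==1 then
      -- discard the match; the token is output in the old state
      return HL_REJECT
   end
   return newState
end
```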
See README_PLUGINS for other available functions.
Description="C and C++"

Keywords={
  { Id=1,
    List={"goto", "break", "return", "continue", "asm", "case", "default",
          -- [..]
         }
  },
  -- [..]
}

Strings = {
  Delimiter=[["|']],
  RawPrefix="R",
}

Comments = {
  { Block=true,
    Nested=false,
    Delimiter = { [[\/\*]], [[\*\/]] }
  },
  { Block=false,
    Delimiter = { [[//]] }
  }
}

IgnoreCase=false

PreProcessor = {
  Prefix=[[#]],
  Continuation="\\",
}

Operators=[[\(|\)|\[|\]|\{|\}|\,|\;|\.|\:|\&|<|>|\!|\=|\/|\*|\%|\+|\-|\~]]

EnableIndentation=true
Please see README_REGEX for the supported regex constructs.
Colour themes contain the formatting information of the language elements which are described in language definitions.
The files have to be stored as *.theme in themes/. Apply a theme with the --style option.
Format attributes:
Attributes = {Colour, Bold?, Italic?, Underline? }
  Colour:    String, defines colour in HTML hex notation ("#rrggbb")
  Bold:      Boolean, true if font should be bold (optional)
  Italic:    Boolean, true if font should be italic (optional)
  Underline: Boolean, true if font should be underlined (optional)
Theme elements:
Description: String, Defines theme description
Default = Attributes (Colour of unspecified text)
Canvas = Attributes (Background colour )
Number = Attributes (Formatting of numbers)
Escape = Attributes (Formatting of escape sequences)
String = Attributes (Formatting of strings)
Interpolation = Attributes (Formatting of interpolation sequences)
PreProcessor = Attributes (Formatting of preprocessor directives)
StringPreProc = Attributes (Formatting of strings within preprocessor directives)
BlockComment = Attributes (Formatting of block comments)
LineComment = Attributes (Formatting of line comments)
LineNum = Attributes (Formatting of line numbers)
Operator = Attributes (Formatting of operators)
Keywords= { Attributes1, Attributes2, Attributes3, Attributes4, }
AttributesN: Formatting of keyword group N. There should be at least four items to match the number of keyword groups defined in the language definitions
Example:
Default        = { Colour="#000000" }
Canvas         = { Colour="#ffffff" }
Number         = { Colour="#000000" }
Escape         = { Colour="#bd8d8b" }
String         = { Colour="#bd8d8b" }
StringPreProc  = { Colour="#bd8d8b" }
BlockComment   = { Colour="#ac2020", Italic=true }
PreProcessor   = { Colour="#000000" }
LineNum        = { Colour="#555555" }
Operator       = { Colour="#000000" }
LineComment    = BlockComment

Keywords = {
  { Colour= "#9c20ee", Bold=true },
  { Colour= "#208920" },
  { Colour= "#0000ff" },
  { Colour= "#000000" },
}
You may define custom keyword groups and corresponding highlighting styles. This is useful if you want to highlight functions of a third party library, macros, constants etc.
You define a new group in two steps:
Define a new group in your language definition:
Keywords = {
  -- add your keyword description:
  {Id=5, List = {"ERROR", "DEBUG", "WARN"} }
}
Add a corresponding highlighting style in your colour theme:
Keywords= {
  -- add your keyword style as 5th item in the list:
  { Colour= "#ff0000", Bold=true },
}
It is recommended to define keyword groups in user-defined plugin scripts to avoid editing of original highlight files. See the cpp_qt.lua sample plug-in script and README_PLUGINS for details.
The --plug-in option receives the path of a Lua script which overrides or enhances the settings of theme and language definition files. Plug-ins make it possible to apply custom settings without the need to edit installed configuration files. You can apply multiple plug-ins by using the --plug-in option more than once.
See README_PLUGINS for a detailed description and examples of packaged plugins.
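As a rough sketch, a plug-in that adds the keyword group from the previous section might look as follows. The layout (a Plugins table with Type and Chunk items, and update functions that receive the description of the loaded script) is an assumption modelled on the cpp_qt.lua sample; the function names are arbitrary. README_PLUGINS remains the authoritative reference.

```lua
-- Sketch of a plug-in script; structure assumed, see README_PLUGINS.
Description="Add logging macros as keyword group 5"

-- assumed to be called with the description of the language definition
function syntaxUpdate(desc)
  if desc=="C and C++" then
    table.insert(Keywords, { Id=5, List={"ERROR", "DEBUG", "WARN"} })
  end
end

-- assumed to be called with the description of the colour theme
function themeUpdate(desc)
  -- append a fifth style only if the theme defines exactly four groups
  if #Keywords==4 then
    table.insert(Keywords, { Colour="#ff0000", Bold=true })
  end
end

Plugins={
  { Type="lang",  Chunk=syntaxUpdate },
  { Type="theme", Chunk=themeUpdate },
}
```

Saved under a hypothetical name such as my_keywords.lua, the script would be passed to highlight with the --plug-in option.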
The script filetypes.conf assigns file extensions and shebang descriptions to language definitions. A configuration is mandatory only if multiple file extensions are linked to one syntax or if an extension is ambiguous. Otherwise the syntax definition whose name corresponds to the input file extension will be applied.
Format:
FileMapping={ { Lang, Filenames|Extensions|Shebang }, }
  Lang:       String, name of language definition
  Filenames:  list of strings, contains filenames referring to "Lang"
  Extensions: list of strings, contains file extensions referring to "Lang"
  Shebang:    String, Regular expression which matches the first line of the input file
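A sketch of a few mapping entries, one for each kind of association. The language definition names (c, makefile, py) and the shebang regex are assumptions for illustration; check the installed filetypes.conf for the actual entries.

```lua
FileMapping={
  -- several extensions linked to one language definition
  { Lang="c", Extensions={"cc", "cpp", "h"} },
  -- files which are recognized by their name
  { Lang="makefile", Filenames={"makefile", "Makefile"} },
  -- input which is recognized by its first line
  { Lang="py", Shebang=[[^#!\s*(/usr)?(/local)?/bin/(env\s+)?python]] },
}
```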
Behaviour upon ambiguous file extensions:
Edit the file gui_files/ext/fileopenfilter.conf to add new syntax types to the GUI's file open filter.
Configuration scripts are searched in the following directories:
These subdirectories are expected to contain the corresponding scripts:
  - langDefs: *.lang
  - themes: *.theme
  - plugins: *.lua
A custom filetypes.conf may be placed directly in ~/.highlight/. This search order enables you to enhance the installed scripts without the need to copy preinstalled files somewhere else.
See the extras/ subdirectory in the highlight package for some scripts in PHP, Perl and Python which invoke highlight and retrieve its output as string. These scripts may be used as reference to develop plug-ins for other apps.
The PP macros file and a tutorial are located in extras/pandoc. See README.html for usage instructions and the example files as reference.
A SWIG interface file is located in extras/swig. See README_SWIG for installation instructions and the example scripts in Perl, PHP and Python as programming reference.
A TCL extension is located in extras/tcl. See README_TCL for installation instructions.
See the extras/web_plugins subdirectory in the highlight package for some plugins which integrate highlight in Wiki and Blogging software:
  - DokuWiki
  - MovableType
  - Wordpress
  - Serendipity
Other uses of highlight can be found on andre-simon.de. This site shows several use cases of highlight in projects like Webgit, Evolution, Inkscape, Ranger and more.
The file INSTALL describes the installation from source and includes links to precompiled packages.
Highlight is known to compile with gcc and clang. It depends on Boost headers and Lua 5.x/LuaJit developer packages. The optional GUI depends on Qt5 developer packages. Please see the makefile for further options.
Andre Simon
[email protected]
http://www.andre-simon.de/
Github project with Git repository, bug tracker: https://github.com/andre-simon/highlight
sf.net project with SVN repository, download mirror, bug tracker, help forum: http://sourceforge.net/projects/syntaxhighlight/