CommonMark parsing and rendering library and program in C
Removed pre from blocktags scanner. pre is handled separately in rule 1 and needn't be handled in rule 6.
Added iframe to list of blocktags, as per spec change.
Fixed bug with HRULE after blank line. This previously caused cmark to break out of a list, thinking it had two consecutive blanks.
S_process_line ends with \n (#72). So S_process_line sees only Unix-style line endings. Ultimately we probably want a better solution, allowing the line ending style of the input file to be preserved. This solution forces output with newlines.
Fixed cmark_strbuf_normalize_whitespace (#73). Now all characters that satisfy cmark_isspace are recognized as whitespace. Previously \r and \t (and others) weren't included.
Fixed --hardbreaks with \r\n line breaks (#68).
#s in ATX header
Removed cmark_strbuf_printf and cmark_strbuf_vprintf. These are no longer needed, and cause complications for MSVC.
Also removed HAVE_VA_COPY and HAVE_C99_SNPRINTF feature tests.
CMARK_INLINE macro.
Fixed FileNotFoundError errors on tests when cmark is built from another project via add_subdirectory() (Kevin Wojniak).
utf8proc functions to avoid conflict with existing library (Kevin Wojniak).
.pdb files (Nick Wellnhofer).
smart_punct.txt (see jgm/commonmark.js#61).
Set POSITION_INDEPENDENT_CODE ON for static library (see #39).
make bench: allow overriding BENCHFILE. Previously if you did this, it would clobber BENCHFILE with the default bench file.
make bench: Use -10 priority with renice.
make_autolink. Ensures that title is a chunk with empty string rather than NULL, as with other links.
clang-check target.
roundtrip_test and leakcheck (OGINO Masanori).
Added format target to Makefile. Removed astyle target.
Updated .editorconfig.
Added cmark_render_latex. New source file: src/latex.c.
Removed html_block_tag scanner.
Added new html_block_start and html_block_start_7, as well as html_block_end_n for n = 1-5. Rewrote block parser for new HTML block spec.
Added CMARK_OPT_VALIDATE_UTF8 option and command-line option --validate-utf8. This option causes cmark to check for valid UTF-8, replacing invalid sequences with the replacement character, U+FFFD. Previously this was done by default in connection with tab expansion, but we no longer do it by default with the new tab treatment. (Many applications will know that the input is valid UTF-8, so validation will not be necessary.)
Added CMARK_OPT_SAFE option and --safe command-line flag.
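The rule that this changelog gives for CMARK_OPT_SAFE (dangerous URLs begin with javascript:, vbscript:, file:, or data:, except data: URLs with an image/png, image/gif, image/jpeg, or image/webp mime type) can be sketched as a plain prefix check. This is an illustration only, not cmark's scan_dangerous_url scanner: the names is_dangerous_url and starts_with_ci are hypothetical, and case-insensitive matching is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive test that `s` starts with `prefix`. */
static int starts_with_ci(const char *s, const char *prefix) {
  for (; *prefix; s++, prefix++) {
    if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
      return 0;
  }
  return 1;
}

/* Hypothetical stand-in for the rule stated in the changelog entry.
   Returns 1 if the URL should be treated as dangerous, 0 otherwise.
   Case-insensitive matching is an assumption of this sketch. */
int is_dangerous_url(const char *url) {
  static const char *const schemes[] = {"javascript:", "vbscript:", "file:"};
  static const char *const safe_data[] = {"image/png", "image/gif",
                                          "image/jpeg", "image/webp"};
  size_t i;

  for (i = 0; i < sizeof(schemes) / sizeof(schemes[0]); i++) {
    if (starts_with_ci(url, schemes[i]))
      return 1;
  }
  if (starts_with_ci(url, "data:")) {
    const char *mime = url + strlen("data:");
    for (i = 0; i < sizeof(safe_data) / sizeof(safe_data[0]); i++) {
      if (starts_with_ci(mime, safe_data[i]))
        return 0; /* data: URL with an allowed image type */
    }
    return 1; /* any other data: URL */
  }
  return 0;
}
```

Under these assumptions, is_dangerous_url("data:image/png;base64,AAAA") is 0, while is_dangerous_url("data:text/html;base64,AAAA") and is_dangerous_url("javascript:alert(1)") are both 1.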
Added CMARK_OPT_SAFE. This option disables rendering of raw HTML and potentially dangerous links.
Added --safe option in command-line program.
cmark.3 man page.
Added scan_dangerous_url to scanners.
CMARK_OPT_SAFE. Dangerous URLs are those that begin with javascript:, vbscript:, file:, or data: (except for image/png, image/gif, image/jpeg, or image/webp mime types).
Added api_test for CMARK_OPT_SAFE.
README.md on security.
render_man (API change).
renderer.[ch] (#63). To write a renderer now, you only need to write a character escaping function and a node rendering function. You pass these to cmark_render and it handles all the plumbing (including line wrapping) for you. So far this is an internal module, but we might consider adding it to the API in the future.
!.
[link](foo\(and\(bar\)\)) which it would parse as containing a bare \ followed by an in-parens chunk ending with the final paren.
test/smart_punct.txt.
--smart, converting sequences of hyphens to sequences of em and en dashes that contain no hyphens.
process_emphasis: Fixed setting lower bound to potential openers. Renamed potential_openers -> openers_bottom.
Renamed start_delim -> stack_bottom.
pathological_test.py.
subject_from_buffer. This gives bad results in parsing reference links, where we might have trailing blanks (finalize removes the bytes parsed as a reference definition; before this change, some blank bytes might remain on the line).
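The --smart entry above says that a run of hyphens becomes a sequence of em and en dashes with no hyphens left over. One way to satisfy that constraint is sketched below: use only em dashes when the count is divisible by 3, only en dashes when it is otherwise even, and a mix in the remaining cases. The function name split_dashes is hypothetical, and the exact split is an assumption of this sketch rather than a statement of what cmark's source does.

```c
/* Hypothetical helper (not part of cmark's API): split a run of `n`
   hyphens, n >= 2, into em dashes (3 hyphens each) and en dashes
   (2 hyphens each) so that 3 * em + 2 * en == n. */
void split_dashes(int n, int *em, int *en) {
  if (n % 3 == 0) {
    /* divisible by 3: all em dashes */
    *em = n / 3;
    *en = 0;
  } else if (n % 2 == 0) {
    /* even, not divisible by 3: all en dashes */
    *em = 0;
    *en = n / 2;
  } else if (n % 3 == 2) {
    /* one en dash uses 2 hyphens, the rest are em dashes */
    *em = (n - 2) / 3;
    *en = 1;
  } else {
    /* two en dashes use 4 hyphens, the rest are em dashes */
    *em = (n - 4) / 3;
    *en = 2;
  }
}
```

For example, under this split a run of 5 hyphens gives one em dash and one en dash, and a run of 7 gives one em dash and two en dashes. A single hyphen is outside the helper's domain and would be left as a literal hyphen.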
Added column and first_nonspace_column fields to parser.
houdini_html_u.c. An example of the kind of error that was fixed: &ngE; should be rendered as "≧̸" (U+02267 U+00338), but it was being rendered as "≧" (which is the same as &gE;).
src/html_unescape.gperf and src/html_unescape.h.
src/entities.h (generated by tools/make_entities_h.py).
houdini_html_u.c, and use the data in src/entities.h.
Renamed entities.h -> entities.inc, and tools/make_entities_h.py -> tools/make_entities_inc.py.
[ref]: url "title" ok
Here we should parse the first line as a reference.
inlines.c: Added utility functions to skip spaces and line endings.
process_line: Removed "add newline if line doesn't have one." This isn't actually needed.
process_emphasis.
"*a_ " * 20000.
process_emphasis to handle new pathological cases.
Now we have an array of pointers (potential_openers), keyed to the delim char. When we've failed to match a potential opener prior to point X in the delimiter stack, we reset potential_openers for that opener type to X, and thus avoid having to look again through all the openers we've already rejected.
process_inlines: remove closers from delim stack when possible. When they have no matching openers and cannot be openers themselves, we can safely remove them. This helps with a performance case: "a_ " * 20000 (jgm/commonmark.js#43).
spec_tests.py: allow → for tab in HTML examples.
normalize.py: don't collapse whitespace in pre contexts.
-m none, added CMARK_OPTS.
make afl instructions.
cmark.3 to 72-character line width.
Removed debug.h. (It uses GNU extensions, and we don't need it anyway.)
api_test.
cmark_chunk rather than char *.
cmark_consolidate_text_nodes (#32).
is_autolink in the CommonMark renderer (#50). Previously any link with an absolute URL was treated as an autolink.
snprintf on Windows (Nick Wellnhofer). On Windows, snprintf returns -1 if the output was truncated. Fall back to Windows-specific _scprintf.
cmark_markdown_to_html, cmark_parser_feed, and cmark_parse_document from int to size_t (#53, Nick Wellnhofer).
bufsize_t for all string sizes and indices. This makes it possible to switch to 64-bit string buffers by changing a single typedef and a macro definition (Nick Wellnhofer).
strbuf code, checking for integer overflows and adding range checks (Nick Wellnhofer).
cmark_strbuf_attach (Nick Wellnhofer).
-Wshorten-64-to-32 warns about (Nick Wellnhofer).
cmark_strbuf_safe_strlen that converts from size_t to bufsize_t and throws an error in case of an overflow (Nick Wellnhofer).
strbuf out of memory errors (Nick Wellnhofer).
Previously such errors were not being trapped. This involves some internal changes to the buffer library that do not affect the API.
S_find_first_nonspace in S_process_line.
Added fields offset, first_nonspace, indent, and blank to cmark_parser struct. This just removes some repetition.
Removed -pg from Debug build flags (#47).
make leakcheck. We now return an error status if anything in the loop fails. We now check --smart and --normalize options.
wrapper3.py, made wrapper.py work with Python 2 and 3.
Also improved the wrapper to work with Windows, and to use smart punctuation (as an example).
wrapper.rb, added argument for options.
_ emphasis parsing to conform to spec (jgm/CommonMark#317).
spec.txt.
-DCMARK_STATIC_DEFINE (Nick Wellnhofer).
CMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_NO_WARNINGS to suppress warnings.
appveyor.yml).
os.path.join in test/cmark.py for proper cross-platform paths.
Makefile.nmake.
make afl: added test/afl_dictionary, increased timeout for hangs.
utf8proc_iterate (#27).
strbuf_printf instead of snprintf. snprintf is not available on some platforms (Visual Studio 2013 and earlier).
[link](<>).
cmark_render_commonmark. In addition to options, this takes a width parameter. A value of 0 disables wrapping; a positive value wraps the document to the specified width. Note that width is automatically set to 0 if the CMARK_OPT_HARDBREAKS option is set.
The cmark executable now allows -t commonmark for output as CommonMark. A --width option has been added to specify wrapping width.
roundtrip_test Makefile target. This runs all the spec through the commonmark renderer, and then through the commonmark parser, and compares normalized HTML to the test. All tests pass with the current parser and renderer, giving us some confidence that the commonmark renderer is sufficiently robust. Eventually this should be pythonized and put in the cmake test routine.
blocks.c. By the time we check for a list start, we've already checked for a horizontal rule, so we don't need to repeat that check here. Thanks to Robin Stocker for pointing out a similar redundancy in commonmark.js.
cmark_strbuf_unescape (buffer.c). The old function gave incorrect results on input like \\*, since the next backslash would be treated as escaping the * instead of being escaped itself.
scanners.re: added _scan_scheme, scan_scheme, used in the commonmark renderer.
CMAKE_C_COMPILER (not CC_COMPILER) when setting C flags.
CMARK_OPT_DEFAULT (Nick Wellnhofer).
cmark_markdown_to_html.
CMARK_NODE_LINK_LABEL.
make leakcheck now checks all output formats.
test/cmark.py: set default options for markdown_to_html.
cmark.h: Add missing argument to cmark_parser_new (#12).
make prof target.
CMARK_OPT_SMART and --smart option, smart.c, smart.h.
--smart option.
cmark_parser_finish.
cmark_parse_document, cmark_parse_file.
cmark.3 man page.
options an int rather than a long, for consistency.
cmark_node struct to fit into 128 bytes.
This gives a small performance boost and lowers memory usage.
delimiter struct to avoid hole.
parser->current in the loop that creates new blocks, since finalize in add_child may have removed the current block (if it contains only reference definitions).
This isn't a great solution; in the long run we need to rewrite to make the logic clearer and to make it harder to make mistakes like this one.
make asan will link against ASan; the resulting executable will do checks for memory access issues.
Thanks @JordanMilne for the suggestion.
$AFL_PATH must point to the directory containing the AFL binaries. It can be set as an environment variable or passed to make on the command line.
cmake is run again only when necessary (Nick Wellnhofer).
Added INSTALL_PREFIX to the Makefile, allowing installation to a location other than /usr/local without invoking cmake manually (Nick Wellnhofer).
make test now guarantees that the project will be rebuilt before tests are run (Nick Wellnhofer).
CMARK_VERSION, CMARK_VERSION_STRING) and as symbols (cmark_version, cmark_version_string) (Nick Wellnhofer). All of these come from cmark_version.h, which is constructed from a template cmark_version.h.in and data in CMakeLists.txt.
free on null pointer.
cmark_iter_get_root).
_.
ends_with_blank_line (#286).
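The cmark_strbuf_unescape fix recorded in this changelog concerns input like \\*, where the second backslash must be treated as escaped rather than as an escape for the *. A minimal in-place sketch of that behavior follows. It is not cmark's implementation: the name unescape_in_place is hypothetical, and the sketch assumes that a backslash escapes any following ASCII punctuation character.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical in-place unescape (not cmark's implementation).
   A backslash followed by ASCII punctuation is dropped, and the
   punctuation character is copied verbatim in the same step, so it
   can never act as an escape for the character after it.
   Returns the length of the resulting string. */
size_t unescape_in_place(char *s) {
  size_t r = 0; /* read position */
  size_t w = 0; /* write position */

  while (s[r] != '\0') {
    if (s[r] == '\\' && ispunct((unsigned char)s[r + 1]))
      r++; /* skip the backslash; the escaped character is copied below */
    s[w++] = s[r++];
  }
  s[w] = '\0';
  return w;
}
```

With this approach the three-character input \\* becomes the two characters \*, because the first backslash consumes the second one as its escaped character and the * is then copied unchanged. A backslash before a non-punctuation character, or at the end of the string, is kept as is.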