CommonMark parsing and rendering library and program in C
Update to 0.31.2 spec.txt.
Treat unicode Symbols like Punctuation, as per the 0.31 spec.
Add a new function to utf8.h: int cmark_utf8proc_is_punctuation_or_symbol(int32_t uc). The old cmark_utf8proc_is_punctuation has been kept for now, but it is no longer used.
Add new exported function cmark_parser_new_with_mem_into_root (API change) (John Ericson).
Avoid repeated language- in info string (commonmark/commonmark.js#277).
Fix quadratic behavior in S_insert_emph (Nick Wellnhofer). Fixes part of GHSA-66g8-4hjf-77xh.
Fix quadratic behavior in check_open_blocks (Nick Wellnhofer). Fixes part of GHSA-66g8-4hjf-77xh.
Track underscore bottom separately mod 3, like asterisk (Michael Howell). This was already implemented correctly for asterisks, but not for underscore.
Use fwrite instead of printf to print results in main (#523). This avoids a massive slowdown in MSYS2.
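A minimal sketch of the fwrite approach, assuming the rendered output sits in a NUL-terminated buffer. The helper name write_result is hypothetical and is not taken from cmark's main.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write an already-rendered document to a stream.
 * fwrite hands the bytes over in a single call, whereas
 * printf("%s", ...) first has to interpret a format string. */
static int write_result(FILE *out, const char *result) {
  size_t len = strlen(result);
  /* With an item size of 1, fwrite returns the number of bytes written,
   * so a short count signals an error. */
  return fwrite(result, 1, len, out) == len ? 0 : -1;
}
```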
commonmark writer: less aggressive escaping for ! (#131).
Update libFuzzer build (Nick Wellnhofer):
-fsanitize=fuzzer(-no-link) without requiring LIB_FUZZER_PATH.
libFuzzer rule in Makefile and the README.md.
CMake build changes (Saleem Abdulrasool).
CMARK_STATIC and CMARK_SHARED options as one of the two must be enabled always as the cmark executable depends on the library. Instead of having a custom flag to discern between the library type, use the native CMake option BUILD_SHARED_LIBS, allowing the user to control which library to build. This matches CMake recommendations to only build a single copy of the library.
CMARK_SHARED and CMARK_STATIC to redirect the author of the dependent package to BUILD_SHARED_LIBS.
/OPT:REF and /OPT:ICF will trigger full links, which is the default in release mode.
cmake/modules directory that is the common layout for CMake based projects.
__builtin_expect. Use __has_builtin to check at compile time if the feature is supported. This macro is supported by both clang and GCC (as of 10). In the case that the compiler in use is not new enough, we still provide the fallback so that the code will compile but without the additional hints for the branch probability. config.h has been removed from the code base as it is no longer needed.
/TP usage on MSVC and replace CMARK_INLINE with inline. These were workarounds for pre-VS2015 compilers, which are no longer supported.
/W3 from the language flags under MSVC with CMP0092. Set the policy to new to avoid the D9025 warning.
CMARK_STATIC_DEFINE.
project functions. They should be set immediately after cmake_minimum_required (which implicitly sets policies). Use the POLICY check to see if a policy is defined rather than using a version check.
CMARK_TESTS with CMake sanctioned BUILD_TESTING.
FindPythonInterp, and with CMake 3.27, were obsoleted with CMP0148. Add a version check and switch to the new behaviour to allow building with newer releases.
Fix regex syntax warnings in pathological_tests.py (Nick Wellnhofer).
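The __has_builtin check for __builtin_expect mentioned in the CMake build changes above can be sketched roughly as follows. The macro names likely and unlikely and the function checked_div are assumptions made for this illustration, not names confirmed by the changelog.

```c
/* Older compilers do not define __has_builtin at all, so give it a
 * fallback that always answers "no". */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical macro names; with no builtin available they expand to
 * the bare condition, so the code still compiles, just without hints. */
#if __has_builtin(__builtin_expect)
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x) (x)
#define unlikely(x) (x)
#endif

/* Example use: the error branch is marked as the unexpected one. */
static int checked_div(int a, int b, int *out) {
  if (unlikely(b == 0))
    return -1;
  *out = a / b;
  return 0;
}
```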
test/cmark.py: avoid star imports (Jakub Wilk).
spec_tests.py: Add option to generate fuzz corpus (Nick Wellnhofer). Add an option --fuzz-corpus that writes the test cases to separate files including the options header, so they can be used as seed corpus for fuzz testing.
Fix some cmark.3 man rendering issues so we can do a clean regen (John Ericson).
Update Windows compilation instructions (Pomax, #525).
Fix quadratic complexity bug with repeated ![[](). Resolves CVE-2023-22486. Add new pathological test. (John MacFarlane)
Allow declarations with no space, as per spec (#456, John MacFarlane).
Set enumi* counter correctly in LaTeX output (#451, John MacFarlane).
Allow <!DOCTYPE to be case-insensitive. (This conforms to the existing spec.) (John MacFarlane)
Fixed HTML comment scanning. Need to handle this case: <!--> and -->. Since the scanner finds the longest match, we had to move some of the logic outside of the scanner. (John MacFarlane)
Fix quadratic parsing issue with repeated <!-- (this was not introduced by the previous fix, and not in a released version of cmark). Resolves CVE-2023-22484. Add new pathological test. (John MacFarlane)
Update HTML comment scanner to accord with commonmark/commonmark-spec#713 (John MacFarlane).
Pathological tests: halve the number of repetitions and the timeout. This reduces the time needed for the pathological tests. (John MacFarlane)
Shrink struct cmark_node (#446). The internal_offset member is only used for headings and can be moved to struct cmark_heading. This reduces the size of struct cmark_node from 112 to 104 bytes on 64-bit systems. (Nick Wellnhofer)
Add -Wstrict-prototypes and fix offending functions. (Nick Wellnhofer, Dan Cîrnaț)
Fix quadratic behavior involving get_containing_block (#431). Instead of searching for the containing block, update the tight list status when entering a child of a list item or exiting a list. (Nick Wellnhofer)
Fix pathological_tests.py (Nick Wellnhofer):
allowed_failures test.
Fix source position bug with backticks (kyle).
Fix parsing of emphasis before links (#424, Nick Wellnhofer). Fixes a regression introduced with commit ed0a4bf.
Update to Unicode 14.0 (data-man).
Add ~ to safe href character set (#394, frogtile).
Update CMakeLists.txt (Saleem Abdulrasool). Bump the minimum required CMake to 3.7. Imperatively define output name for static library.
Fix install paths in libcmark.pc (Sebastián Mancilla). CMAKE_INSTALL_<dir> can be relative or absolute path, so it is wrong to prefix CMAKE_INSTALL_PREFIX because if CMAKE_INSTALL_<dir> is set to an absolute path it will result in a malformed path with two absolute paths joined together. Instead, use CMAKE_INSTALL_FULL_<dir> from GNUInstallDirs.
openers_bottom. Besides causing undefined behavior when reading a dangling pointer, this could also result in quadratic behavior when parsing emphasis.
openers_bottom optimization from kicking in and leading to quadratic behavior when processing lots of quotes.
cmark_add_compile_options when a new compilation target is added.
cmark_get_default_mem_allocator() (#330). API change: this adds a new exported function in cmark.h.
openers_bottom table not just by the type of delimiter and the length of the closing delimiter mod 3, but by whether the closing delimiter can also be an opener. (The algorithm for determining emphasis matching depends on all these factors.) Add regression test.
<?, <!DECL or <![CDATA[ could lead to quadratic behavior if no matching ending sequence was found. Separate the inline HTML scanners. Remember if scanning the whole input for a specific ending sequence failed and skip subsequent scans.
cmark_node_append_child to append nodes. This public function has a sanity check which is linear in the depth of the tree. Repeated calls could show quadratic behavior in degenerate trees. Use a special function to append nodes without this check. (Issue found by OSS-Fuzz.)
textarea like script, style, pre (type 1 HTML block), in accordance with spec change.
MAX_INDENT for xml (#355). Otherwise we can get quadratic increase in size with deeply nested structures.
houdini_html_u.c rather than the entity handling in inlines.c. Unlike the other, this approach works also in e.g. link titles.
print_usage(): Minor grammar fix, swap two words (#305, Øyvind A. Holm).
memcpy with NULL as first parameter. This is illegal according to the C standard, sec. 7.1.4. See https://www.imperialviolet.org/2016/06/26/nonnull.html.
blocks.c.
is_autolink (Nick Wellnhofer). In a recent commit, the check was changed to strcmp, but we really have to use strncmp.
is_autolink (Nick Wellnhofer). Introduced by a recent commit. Found by OSS-Fuzz.
cmark_node to cmark_parser. When finalizing nodes that allow inlines (paragraphs and headings), detach the strbuf and store the block content in the node's data/len members. Free the block content after processing inlines. Reduces size of struct cmark_node by 8 bytes.
struct cmark_list (Nick Wellnhofer).
struct cmark_node.
CMARK_OPT_SMART is enabled, we escape literal -, ., and quote characters when needed to avoid their being "smartified."
cmark_renderer.
size_t instead of int.
string.h in cmark-fuzz.c.
cmark_reference_lookup (when it is expected that no further references will be added to the reference map), the linked list of references is written into a fixed-size array.
cmark_reference_lookup query in O(log n). Any further lookup calls will also be O(log n), since the sorted references table only needs to be generated once.
The resulting implementation is notably simple (as it uses standard library builtins qsort and bsearch), whilst performing better than the fixed size hash table in documents that have a high number of references and never becoming pathological regardless of the input.
cmark_strbuf_cstr in buffer.h.
--safe command-line option as a no-op (#344), for backwards compatibility.
add_subdirectory or be installed into the system (or some other directory) and be found with find_package(cmark). In both cases the cmake target cmark::cmark and/or cmark::cmark_static is all that is needed to be linked. Previously the cmarkConfig.cmake file was generated, but not installed. As additional bonus of generation by cmake we get a generated cmake-config-version.cmake file for find_package() to search for the same major version.
The generated config file is position independent, allowing the installed directory to be copied or moved and still work.
The following four files are generated and installed: lib/cmake/cmark/cmark-config.cmake, lib/cmake/cmark/cmark-config-version.cmake, lib/cmake/cmark/cmark-targets.cmake, lib/cmake/cmark/cmark-targets-release.cmake.
-Wconst-qual warning (Saleem Abdulrasool). This enables building with /Zc:strictString with MSVC as well.
CMAKE_INCLUDE_CURRENT_DIRECTORY.
GNUInstallDirs once.
add_compile_definitions with add_compile_options since the former was introduced in 3.12 (#321).
LINKER_LANGUAGE property for C++ runtime.
add_compile_options rather than modify CMAKE_C_FLAGS.
-fvisibility flags to top-level.
CMARK_STATIC is on (default), link the executable with the static library. This produces exactly the same result as compiling the library sources again and linking with the object files.
If CMARK_STATIC is off, link the executable with the shared library. This wasn't supported before and should be the preferred way to package cmark on Linux distros.
Building only a shared library and a statically linked executable isn't supported anymore but this doesn't seem useful.
html.escape instead of cgi.escape (#313).
__name__ check.
__name__ check in test scripts (Nick Wellnhofer).
--normalize which no longer exists (#332).
[0.29.0]
CMARK_OPT_UNSAFE and make CMARK_OPT_SAFE a no-op (for API compatibility). The new default behavior is to suppress raw HTML and potentially dangerous links. The CMARK_OPT_UNSAFE option has to be set explicitly to prevent this. NOTE: This change will require modifications in bindings for cmark and in most libraries and programs that use cmark. Borrows heavily from @kivikakk's patch in github/cmark-gfm#123.
<> link destination in reference link.
memory.h (#290).
< unless it is an angle-bracket link that also ends with > (#289). (If your URL really starts with <, URL-escape it.)
references.h in parser.h (#287).
[link](<foo\>).
ends_with_blank_line with S_ prefix.
CMARK_NODE__LAST_LINE_CHECKED flag (#284). Use this to avoid unnecessary recursion in ends_with_blank_line.
ends_with_blank_line, call S_set_last_line_blank to avoid unnecessary repetition (#284). Once we settle whether a list item ends in a blank line, we don't need to revisit this in considering parent list items.
( in parenthesized link title.
subj since it's been modified while parsing the subject and could represent line info from a future line. This is simple and works.
render.c: reset last_breakable after cr. Fixes jgm/pandoc#5033.
houdini_href_e.c (Felix Yan).
~~~ fences if info string contains backtick. This is needed for round-trip tests.
cmark -t xml back to Commonmark.
add_compiler_export_flags() (Jonathan Müller). It is deprecated in CMake 3.0, the replacement is to set the CXX_VISIBILITY_PRESET (or in our case C_VISIBILITY_PRESET) and VISIBILITY_INLINES_HIDDEN properties of the target. We're already setting them by setting the CMake variables anyway, so the call can be removed.
cmake_host_system_information does not recognize <key> VS_15_DIR. CMake will not find these system libraries on non-Windows hosts anyways, and we were silencing the warnings, so simply omit the installation when cross-compiling to Windows.
xml:space="preserve" in XML output when appropriate (Nguyễn Thái Ngọc Duy). (For text, code, code_block, html_inline and html_block tags.)
entity_tests.py - omit noisy success output.
pathological_tests.py: make tests run faster. Commented out the (already ignored) "many references" test, which times out. Reduced the iterations for a couple other tests.
pathological_tests.py: added test for deeply nested lists.
S_find_first_nonspace. We were needlessly redoing things we'd already done. Now we skip the work if the first nonspace is greater than the current offset. This fixes pathological slowdown with deeply nested lists (#255). For N = 3000, the time goes from over 17s to about 0.7s. Thanks to Martin Mitas for diagnosing the problem.
->size is zero so renderer.buffer->ptr[renderer.buffer->size - 1] will cause an out-of-bounds read. Empty buffers always point to the global cmark_strbuf__initbuf buffer so we read cmark_strbuf__initbuf[-1].
CMARK_SHARED=OFF (Nick Wellnhofer).
width parameter to be generated too so we get better fuzz-coverage.
pathological_tests.py. This allows us to include tests that we don't yet know how to pass.
pathological_tests.py. Tests must complete in 8 seconds or are errors.
--smart: open quote can never occur right after ] or ) (#227).
finalize (Vicent Marti).
CMAKE_INSTALL_LIBDIR to create libcmark.pc (#236). This wasn't getting set in processing libcmark.pc.in, and we were getting the wrong entry in libcmark.pc. The new approach sets an internal libdir variable to lib${LIB_SUFFIX}. This variable is used both to set the install destination and in the libcmark.pc.in template.
make astyle with make format (Nguyễn Thái Ngọc Duy).
Update spec.
Use unsigned integer when shifting (Phil Turnbull). Avoids a UBSAN warning which can be triggered when handling a long sequence of backticks.
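A small illustration of why the unsigned type matters, assuming a 32-bit mask. The helper bit_mask is hypothetical and is not the code that was changed in cmark.

```c
#include <stdint.h>

/* Hypothetical helper: return a mask with bit n set, for 0 <= n < 32.
 * With a signed 32-bit int, 1 << 31 overflows, which is undefined
 * behaviour; shifting an unsigned value is well defined for every n
 * below the width of the type. */
static uint32_t bit_mask(unsigned n) {
  return (uint32_t)1 << n;
}
```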
Avoid memcpy'ing NULL pointers (Phil Turnbull). Avoids a UBSAN warning when link title is empty string. The length of the memcpy is zero so the NULL pointer is not dereferenced but it is still undefined behaviour.
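One way to avoid the undefined call is to skip memcpy when the length is zero. This is a sketch only: copy_bytes is a hypothetical wrapper, and the actual fix in cmark may be structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: return early when there is nothing to copy, so
 * a NULL source or destination is never handed to memcpy. */
static void *copy_bytes(void *dst, const void *src, size_t len) {
  if (len == 0)
    return dst;
  return memcpy(dst, src, len);
}
```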
DeMorgan simplification of some tests in emphasis parser. This also brings the code into closer alignment with the wording of the spec (see jgm/CommonMark#467).
Fixed undefined shift in commonmark writer (#211). Found by google/oss-fuzz: https://oss-fuzz.com/v2/testcase-detail/4686992824598528.
latex writer: fix memory overflow (#210). We got an array overflow in enumerated lists nested more than 10 deep with start number =/= 1. This commit also ensures that we don't try to set enum_ counters that aren't defined by LaTeX (generally up to enumv). Found by google/oss-fuzz: https://oss-fuzz.com/v2/testcase-detail/5546760854306816.
Check for NULL pointer in get_link_type (Phil Turnbull). echo '[](xx:)' | ./build/src/cmark -t latex gave a segfault.
Move fuzzing dictionary into single file (Phil Turnbull). This allows AFL and libFuzzer to use the same dictionary
Reset bytes after UTF8 proc (Yuki Izumi, #206).
Don't scan past an EOL (Yuki Izumi).
The existing negated character classes ([^…]) are careful to always include \x00 in the characters excluded, but these . catch-alls can scan right past the terminating NUL placed at the end of the buffer by _scan_at. As such, buffer overruns can occur. Also, don't scan past a newline in HTML block end scanners.
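The idea can be shown with a hand-written loop. This is an analogue for illustration, not the generated re2c scanner; scan_to_close is a hypothetical name, and returning the matched length with 0 meaning "no match" is a convention chosen for this sketch.

```c
#include <stddef.h>

/* Hand-written analogue of a scanner rule: return the length of the
 * match up to and including the closing byte, or 0 if the line or the
 * buffer ends first. The explicit '\0' and '\n' checks play the role of
 * excluding those bytes from a character class; without them the loop
 * would run past the terminator. */
static size_t scan_to_close(const unsigned char *p, unsigned char close) {
  size_t i = 0;
  while (p[i] != '\0' && p[i] != '\n' && p[i] != close)
    i++;
  return p[i] == close ? i + 1 : 0;
}
```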
Document cases where get_ functions return NULL (#155). E.g. cmark_node_get_url on a non-link or image.
Properly handle backslashes in link destinations (#192). Only ascii punctuation characters are escapable, per the spec.
Fixed cmark_node_get_list_start to return 0 for bullet lists, as documented (#202).
Use CMARK_NO_DELIM for bullet lists (#201).
Fixed code for freeing delimiter stack (#189).
Removed abort outside of conditional (typo).
Removed coercion in error message when aborting from buffer.
Print message to stderr when we abort due to memory demands (#188).
libcmark.pc: use CMAKE_INSTALL_LIBDIR (#185, Jens Petersen). Needed for multilib distros like Fedora.
Fixed buffer overflow error in S_parser_feed (#184). The overflow could occur in the following condition: the buffer ends with \r and the next memory address contains \n.
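A sketch of the kind of bounds check that prevents this class of bug. It assumes the buffer is described by a pointer and a length; skip_line_end is a hypothetical helper, not the code in S_parser_feed.

```c
#include <stddef.h>

/* Hypothetical helper: given a buffer of `len` bytes and the index of a
 * line-ending byte, return the index just past the line ending. */
static size_t skip_line_end(const unsigned char *buf, size_t len, size_t pos) {
  if (pos < len && buf[pos] == '\r') {
    pos++;
    /* Only peek at the next byte if it is still inside the buffer. */
    if (pos < len && buf[pos] == '\n')
      pos++;
  } else if (pos < len && buf[pos] == '\n') {
    pos++;
  }
  return pos;
}
```

With the length check in place, a \n that happens to sit just past the end of the buffer is never read or consumed.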
Update emphasis parsing for spec change. Strong now goes inside Emph rather than the reverse, when both scopes are possible. The code is much simpler. This also avoids a spec inconsistency that cmark had previously: ***hi*** became Strong (Emph "hi") but ***hi**** became Emph (Strong "hi") "*".
Fixes for the LaTeX renderer (#182, Doeme)
\- is a hyphenation, so it doesn't get displayed at all.
Added a test for NULL when freeing subj->last_delim.
Cleaned up setting of lower bounds for openers. We now use a much smaller array.
Fix #178, quadratic parsing bug. Add pathological test.
Slight improvement of clarity of logic in emph matching.
Fix "multiple of 3" determination in emph/strong parsing.
We need to store the length of the original delimiter run, instead of using the length of the remaining delimiters after some have been subtracted. Test case: a***b* c*. Thanks to Raph Levin for reporting.
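The check can be sketched as a standalone predicate. It follows the wording of the current CommonMark spec, which may differ from the rule in force when this entry was written; the function name and parameters are hypothetical.

```c
#include <stdbool.h>

/* Sketch of the "multiple of 3" check. Both lengths are the ORIGINAL
 * lengths of the delimiter runs, recorded before any delimiters were
 * consumed by earlier matches. */
static bool blocked_by_rule_of_three(int opener_len, int closer_len,
                                     bool opener_can_close,
                                     bool closer_can_open) {
  if (!opener_can_close && !closer_can_open)
    return false; /* the rule only applies when a run is two-sided */
  if (opener_len % 3 == 0 && closer_len % 3 == 0)
    return false; /* exception: both lengths are multiples of 3 */
  return (opener_len + closer_len) % 3 == 0;
}
```

In the test case, the opening run has three asterisks and two remain after the first match. Checking the remaining length (2 + 1) would trip the rule, whereas the original length (3 + 1) does not, which is presumably why the original length has to be stored.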
Correctly initialize chunk in S_process_line (Nick Wellnhofer, #170). The alloc member wasn't initialized. This also makes it possible to add an assertion in chunk_rtrim, which doesn't work for alloced chunks.
Added 'make newbench'.
scanners.c generated with re2c 0.16 (68K smaller!).
scanners.re - fixed warnings; use * for fallback.
Fixed some warnings in scanners.re.
Update CaseFolding to latest (Kevin Wojniak, #168).
Allow balanced nested parens in link destinations (Yuki Izumi, #166)
Allocate enough bytes for backticks array.
Inlines: Ensure that the delimiter stack is freed in subject.
Fixed pathological cases with backtick code spans:
Removed recursion in scan_to_closing_backticks
Added an array of pointers to potential backtick closers to subject
This array is used to avoid traversing the subject again when we've already seen all the potential backtick closers.
Added a max bound of 1000 for backtick code span delimiters.
This helps with pathological cases like:
x
x `
x ``
x ```
x ````
...
Added pathological test case.
Thanks to Martin Mitáš for identifying the problem and for discussion of solutions.
Remove redundant cmake_minimum_required (#163, @kainjow).
Make shared and static libraries optional (Azamat H. Hackimov). Now you can enable/disable compilation and installation targets for shared and static libraries via -DCMARK_SHARED=ON/OFF and -DCMARK_STATIC=ON/OFF.
Added support for built-in ${LIB_SUFFIX} feature (Azamat H. Hackimov). Replaced ${LIB_INSTALL_DIR} option with built-in ${LIB_SUFFIX} for installing for 32/64-bit systems. Normally, CMake will set ${LIB_SUFFIX} automatically for the required environment. If you have any issues with it, you can override this option with -DLIB_SUFFIX=64 or -DLIB_SUFFIX="" during configuration.
Add Makefile target and harness to fuzz with libFuzzer (Phil Turnbull). This can be run locally with make libFuzzer but the harness will be integrated into oss-fuzz for large-scale fuzzing.
Advertise --validate-utf8 in usage information (Nguyễn Thái Ngọc Duy).
Makefile: use warnings with re2c.
README: Add link to Python wrapper, prettify languages list (Pavlo Kapyshin).
README: Add link to cmark-scala (Tim Nieradzik, #196)