Manticoresearch Versions

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon

6.2.12

Manticore Search 6.2.12

Released: Aug 23rd 2023

Version 6.2.12 continues the 6.2 series and addresses issues discovered after the release of 6.2.0.

Bugfixes

  • Issue #1351 "Manticore 6.2.0 doesn't start via systemctl on Centos 7": Modified TimeoutStartSec from infinity to 0 for better compatibility with CentOS 7.
  • Issue #1364 "Crash after upgrading from 6.0.4 to 6.2.0": Added replay functionality for empty binlog files from older binlog versions.
  • PR #1334 "fix typo in searchdreplication.cpp": Corrected a typo in searchdreplication.cpp: beggining -> beginning.
  • Issue #1337 "Manticore 6.2.0 WARNING: conn (local)(12), sock=8088: bailing on failed MySQL header, AsyncNetInputBuffer_c::AppendData: error 11 (Resource temporarily unavailable) return -1": Lowered the verbosity level of the MySQL interface warning about the header to logdebugv.
  • Issue #1355 "join cluster hangs when node_address can't be resolved": Improved replication retry when certain nodes are unreachable and their name resolution fails. This should resolve replication issues on Kubernetes and Docker nodes. Enhanced the error message for replication start failures and updated test model 376. Additionally, provided a clear error message for name resolution failures.
  • Issue #1361 "No lower case mapping for "Ø" in charset non_cjk": Adjusted the mapping for the 'Ø' character.
  • Issue #1365 "searchd leaves binlog.meta and binlog.001 after clean stop": Ensured that the last empty binlog file is removed properly.
  • Commit 0871: Fixed the Thd_t build issue on Windows related to atomic copy restrictions.
  • Commit 1cc0: Addressed an issue with FT CBO vs ColumnarScan.
  • Commit c6bf: Made corrections to test 376 and added a substitution for the AF_INET error in the test.
  • Commit cbc3: Resolved a deadlock issue during replication when updating blob attributes versus replacing documents. Also removed the rlock of the index during commit because it's already locked at a more basic level.

Minor changes

  • Commit 4f91 Updated info on /bulk endpoints in the manual.

MCL

6.2.0

Manticore Search 6.2.0

Released: Aug 4th 2023

Major changes

  • The query optimizer has been enhanced to support full-text queries, significantly improving search efficiency and performance.
  • Integrations with:
  • We've started using GitHub workflows, making it simpler for contributors to utilize the same Continuous Integration (CI) process that the core team applies when preparing packages. All jobs can be run on GitHub-hosted runners, which facilitates seamless testing of changes in your fork of Manticore Search.
  • We've started using CLT to test complex scenarios. For example, we're now able to ensure that a package built after a commit can be properly installed across all supported Linux operating systems. The Command Line Tester (CLT) provides a user-friendly way to record tests in an interactive mode and to effortlessly replay them.
  • Significant performance improvement in count distinct operation by employing a combination of hash tables and HyperLogLog.
  • Enabled multithreaded execution of queries containing secondary indexes, with the number of threads limited to the count of physical CPU cores. This should considerably improve the query execution speed.
  • pseudo_sharding has been adjusted to be limited to the number of free threads. This update considerably enhances the throughput performance.
  • Users now have the option to specify the default attribute storage engine via the configuration settings, providing better customization to match specific workload requirements.
  • Support for Manticore Columnar Library 2.2.0 with numerous bug fixes and improvements in Secondary indexes.
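
The count distinct item above mentions combining hash tables with HyperLogLog. The Python sketch below illustrates one way such a combination can work in general: values are counted exactly in a hash set while the set is small, and the counter switches to a HyperLogLog estimate once a threshold is passed. This is not Manticore's implementation; the class name, the threshold, the precision, and the choice of hash function are all assumptions made for illustration.

```python
import hashlib
import math


class DistinctCounter:
    """Hypothetical counter: exact below a threshold, HyperLogLog above it."""

    def __init__(self, exact_limit=1000, precision=12):
        self.exact_limit = exact_limit   # assumed switch-over point
        self.p = precision               # number of bits used to pick a register
        self.m = 1 << precision          # number of HyperLogLog registers
        self.exact = set()               # stores 64-bit hashes while small
        self.registers = None            # allocated only after the switch

    @staticmethod
    def _hash64(value):
        # Any well-mixed 64-bit hash works; blake2b is used here for convenience.
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _hll_add(self, h):
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank is the 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def add(self, value):
        h = self._hash64(value)
        if self.registers is None:
            self.exact.add(h)
            if len(self.exact) > self.exact_limit:
                # Too many distinct values: replay the stored hashes into HLL.
                self.registers = [0] * self.m
                for stored in self.exact:
                    self._hll_add(stored)
                self.exact = None
        else:
            self._hll_add(h)

    def count(self):
        if self.registers is None:
            return len(self.exact)                 # exact (up to hash collisions)
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)
```

With 4096 registers the expected relative error of the estimate is roughly 1.04 / sqrt(4096), or about 1.6%, while small result sets stay exact.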

Minor changes

  • Buddy #153: The /pq HTTP endpoint now serves as an alias for the /json/pq HTTP endpoint.
  • Commit 0bf1: We've ensured multi-byte compatibility for upper() and lower().
  • Commit 2bb9: Instead of scanning the index for count(*) queries, a precalculated value is now returned.
  • Commit 3c84: It's now possible to use SELECT for making arbitrary calculations and displaying @@sysvars. Unlike before, you are no longer limited to just one calculation. Therefore, queries like select user(), database(), @@version_comment, version(), 1+1 as a limit 10 will return all the columns. Note that the optional 'limit' will always be ignored.
  • Commit 6aca: Implemented the CREATE DATABASE stub query.
  • Commit 9dc1: When executing ALTER TABLE table REBUILD SECONDARY, secondary indexes are now always rebuilt, even if attributes weren't updated.
  • Commit 46ed: Sorters utilizing precalculated data are now identified before using CBO to avoid unnecessary CBO calculations.
  • Commit 102a: Implemented mocking and use of the full-text expression stack to prevent daemon crashes.
  • Commit 979f: A speedy code path has been added for match cloning code for matches that don't use string/mvas/json attributes.
  • Commit a073: Added support for the SELECT DATABASE() command. However, it will always return Manticore. This addition is crucial for integrations with various MySQL tools.
  • Commit bc04: Modified the response format of the /cli endpoint, and added the /cli_json endpoint to function as the previous /cli.
  • Commit d70b: The thread_stack can now be altered during runtime using the SET statement. Both session-local and daemon-wide variants are available. Current values can be accessed in the show variables output.
  • Commit d96e: Code has been integrated into CBO to more accurately estimate the complexity of executing filters over string attributes.
  • Commit e77d: The DocidIndex cost calculation has been improved, enhancing overall performance.
  • Commit f3ae: Load metrics, similar to 'uptime' on Linux, are now visible in the SHOW STATUS command.
  • Commit f3cc: The field and attribute order for DESC and SHOW CREATE TABLE now match that of SELECT * FROM.
  • Commit f3d2: Different internal parsers now provide their internal mnemonic code (e.g., P01) during various errors. This enhancement aids in identifying which parser caused an error and also obscures non-essential internal details.
  • Issue #271 "Sometimes CALL SUGGEST does not suggest a correction of a single letter typo": Improved SUGGEST/QSUGGEST behaviour for short words and added the option sentence to show the entire sentence.
  • Issue #696 "Percolate index does not search properly by exact phrase query when stemming enabled": The percolate query has been modified to handle an exact term modifier, improving search functionality.
  • Issue #829 "DATE FORMATTING methods": added the date_format() select list expression, which exposes the strftime() function.
  • Issue #961 "Sorting buckets via HTTP JSON API": introduced an optional sort property for each bucket of aggregates in the HTTP interface.
  • Issue #1062 "Improve error logging of JSON insert api failure - "unsupported value type"": The /bulk endpoint reports information regarding the number of processed and non-processed strings (documents) in case of an error.
  • Issue #1070 "CBO hints don't support multiple attributes": Enabled index hints to handle multiple attributes.
  • Issue #1106 "Add tags to http search query": Tags have been added to HTTP PQ responses.
  • Issue #1301 "buddy should not create table in parallel": Resolved an issue that was causing failures from parallel CREATE TABLE operations. Now, only one CREATE TABLE operation can run at a time.
  • Issue #1303 "add support of @ to column names".
  • Issue #1316 "Queries on taxi dataset are slow with ps=1": The CBO logic has been refined, and the default histogram resolution has been set to 8k for better accuracy on attributes with randomly distributed values.
  • Issue #1317 "Fix CBO vs fulltext on hn dataset": Enhanced logic has been implemented for determining when to use bitmap iterator intersection and when to use a priority queue.
  • Issue #1318 "columnar: change iterator interface to single-call" : Columnar iterators now use a single Get call, replacing the previous two-step AdvanceTo + Get calls to retrieve a value.
  • Issue #1319 "Aggregate calc speedup (remove CheckReplaceEntry?)": The CheckReplaceEntry call was removed from the group sorter to expedite the calculation of aggregate functions.
  • Issue #1320 "create table read_buffer_docs/hits doesn't understand k/m/g syntax": The CREATE TABLE options read_buffer_docs and read_buffer_hits now support k/m/g syntax.
  • Language packs for English, German and Russian can now be effortlessly installed on Linux by executing the command apt/yum install manticore-language-packs. On macOS, use the command brew install manticoresoftware/tap/manticore-language-packs.
  • Field and attribute order is now consistent between SHOW CREATE TABLE and DESC operations.
  • If disk space is insufficient when executing INSERT queries, new INSERT queries will fail until enough disk space becomes available.
  • The UINT64() type conversion function has been added.
  • The /bulk endpoint now processes empty lines as a commit command.
  • Warnings have been implemented for invalid index hints, providing more transparency and allowing for error mitigation.
  • When count(*) is used with a single filter, queries now leverage precalculated data from secondary indexes when available, substantially speeding up query times.
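
The date_format() item above says the new expression exposes strftime(). As a rough illustration of what strftime-style format specifiers produce, here is a Python sketch using the standard library. The helper name format_timestamp is hypothetical, and formatting in UTC is an assumption; the argument order, supported specifiers, and time zone handling of Manticore's date_format() are not described in these notes and should be checked in the manual.

```python
import time


def format_timestamp(timestamp, fmt):
    """Format a Unix timestamp (in seconds) with strftime specifiers, in UTC."""
    return time.strftime(fmt, time.gmtime(timestamp))


# 1691107200 is 2023-08-04 00:00:00 UTC, the 6.2.0 release date.
print(format_timestamp(1691107200, "%Y-%m-%d"))        # 2023-08-04
print(format_timestamp(1691107200, "%Y/%m/%d %H:%M"))  # 2023/08/04 00:00
```

Only numeric specifiers are used here because names of days and months depend on the locale.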

⚠️ Breaking changes

  • ⚠️ Document IDs are now handled as unsigned 64-bit integers during indexing and INSERT operations.
  • ⚠️ The syntax for query optimizer hints has been updated. The new format is /*+ SecondaryIndex(uid) */. Please note that the old syntax is no longer supported.
  • ⚠️ Issue #1160: The usage of @ in table names has been disallowed to prevent syntax conflicts.
  • ⚠️ String fields/attributes marked as indexed and attribute are now regarded as a single field during INSERT, DESC, and ALTER operations.
  • ⚠️ Issue #1057: MCL libraries will no longer load on systems that don't support SSE 4.2.
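
To make the first breaking change above more concrete: the same 64 bits can be read as either a signed or an unsigned integer. The Python sketch below shows the standard two's-complement reinterpretation between the two views. Whether Manticore maps previously stored negative IDs exactly this way is an assumption, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_unsigned64(doc_id):
    """Reinterpret a signed 64-bit value as unsigned (two's complement)."""
    if not -(1 << 63) <= doc_id <= MASK64:
        raise ValueError("value does not fit in 64 bits")
    return doc_id & MASK64


def to_signed64(doc_id):
    """Reinterpret an unsigned 64-bit value as signed (two's complement)."""
    if not 0 <= doc_id <= MASK64:
        raise ValueError("value is not an unsigned 64-bit integer")
    return doc_id - (1 << 64) if doc_id >= (1 << 63) else doc_id


print(to_unsigned64(-1))                    # 18446744073709551615
print(to_signed64(18446744073709551615))    # -1
print(to_unsigned64(42))                    # 42
```

Applications that previously relied on negative IDs may want to check how their client library represents values above 2^63 - 1 before upgrading.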

Bugfixes

  • Commit 2a6e "Crash on DROP TABLE": resolved a problem causing extended wait times to finish write operations (optimize, disk chunk save) on an RT table when executing a DROP TABLE statement. Added a warning to notify when a table directory is not empty after executing a DROP TABLE command.
  • Commit 2ebd: Support for columnar attributes, which was missing in the code used for grouping by multiple attributes, has been added.
  • Commit 3be4 Resolved a crash issue potentially caused by disk space running out by properly handling write errors in binlog.
  • Commit 6adb: A crash that occasionally occurred when using multiple columnar scan iterators (or secondary index iterators) in a query has been fixed.
  • Commit 6bd9: Filters were not being removed when using sorters that use precalculated data. This issue has been fixed.
  • Commit 6d03: The CBO code has been updated to provide better estimates for queries using filters over row-wise attributes executed in multiple threads.
  • Commit 6dd3, Helm #56 "fatal crash dump in Kubernetes cluster": Fixed a defective bloom filter for the JSON root object; fixed daemon crash due to filtering by a JSON field.
  • Commit 6e1b Rectified daemon crash caused by invalid manticore.json config.
  • Commit 6fbc Fixed the json range filter to support int64 values.
  • Commit 9c67: .sph files could be corrupted by ALTER. Fixed.
  • Commit 77cc: A shared key has been added for the replication of the replace statement to resolve a pre_commit error occurring when replace is replicated from multiple master nodes.
  • Commit 2884: Resolved issues with bigint checks over functions like date_format().
  • Commit 9513: Iterators are no longer displayed in SHOW META when sorters utilize precalculated data.
  • Commit a2a7: The fulltext node stack size has been updated to prevent crashes on complex fulltext queries.
  • Commit a062: A bug causing a crash during the replication of updates with JSON and string attributes has been resolved.
  • Commit b3e6: The string builder has been updated to use 64-bit integers to avoid crashes when dealing with large data sets.
  • Commit c472: Addressed a crash that was occurring with count distinct across multiple indexes.
  • Commit d073: Fixed an issue where queries over disk chunks of RT indexes could be executed in multiple threads even if pseudo_sharding was disabled.
  • Commit d205 The set of values returned by the show index status command has been modified and now varies depending on the type of index in use.
  • Commit e9bc Fixed an HTTP error when processing bulk requests and an issue where the error wasn't being returned to the client from the net loop.
  • Commit f77c use of an extended stack for PQ.
  • Commit fac2 Updated the export ranker output to align with packedfactors().
  • Commit ff87: Fixed an issue with the string list in the filter of the SphinxQL query log.
  • Issue #589 "The charset definition seems to depend on the ordering of codes": Fixed incorrect charset mapping for duplicates.
  • Issue #811 "Mapping multiple words in word forms interferes phrase search with CJK punctuations between keywords": Fixed ngram token position within phrase query with wordforms.
  • Issue #834 "Equals sign in search query breaks request": Ensured the exact symbol can be escaped and fixed double exact expansion by the expand_keywords option.
  • Issue #864 "exceptions/stopwords conflict"
  • Issue #910 "Manticore crash when calling call snippets() with libstemmer_fr and index_exact_words": Resolved internal conflicts causing crashes when SNIPPETS() was called.
  • Issue #946 "Duplicate records during SELECT": Fixed the issue of duplicate documents in the result set for a query with not_terms_only_allowed option to RT index with killed documents.
  • Issue #967 "Using JSON arguments in UDF functions leads to a crash": Fixed a daemon crash when processing a search with pseudo-sharding enabled and UDF with JSON argument.
  • Issue #1050 "count(*) in FEDERATED": Fixed a daemon crash occurring with a query through a FEDERATED engine with aggregate.
  • Issue #1052 Fixed an issue where rt_attr_json column was incompatible with columnar storage.
  • Issue #1072 "* is removed from search query by ignore_chars": Fixed this issue so wildcards in a query aren't impacted by ignore_chars.
  • Issue #1075 "indextool --check fails if there's a distributed table": indextool is now compatible with instances having 'distributed' and 'template' indexes in the json config.
  • Issue #1081 "particular select on particular RT dataset leads to crash of searchd": Resolved daemon crash on a query with packedfactors and large internal buffer.
  • Issue #1095 "With not_terms_only_allowed deleted documents are ignored"
  • Issue #1099 "indextool --dumpdocids is not working": Restored functionality of the --dumpdocids command.
  • Issue #1100 "indextool --buildidf is not working": indextool now closes the file after finishing globalidf.
  • Issue #1104 "Count(*) is trying to be treated as schema set in remote tables": Resolved an issue where an error message was being sent by the daemon for queries into the distributed index when the agent returned an empty result set.
  • Issue #1109 "FLUSH ATTRIBUTES hangs with threads=1".
  • Issue #1126 "Lost connection to MySQL server during query - manticore 6.0.5": Crashes that were happening when using multiple filters over columnar attributes have been addressed.
  • Issue #1135 "JSON string filtering case sensitivity": Corrected the collation to function correctly for filters used in HTTP search requests.
  • Issue #1140 "Match in a wrong field": Fixed the damage related to morphology_skip_fields.
  • Issue #1155 "system remote commands via API should pass g_iMaxPacketSize": Made updates to bypass the max_packet_size check for replication commands between nodes. Additionally, the latest cluster error has been added to the status display.
  • Issue #1302 "tmp files left on failed optimize": Corrected an issue where temporary files were left behind after an error occurred during a merge or optimize process.
  • Issue #1304 "add env var for buddy start timeout": Added environment variable MANTICORE_BUDDY_TIMEOUT (default 3 seconds) to control the daemon's wait duration for a buddy message at startup.
  • Issue #1305 "Int overflow when saving PQ meta": Mitigated excessive memory consumption by daemon on saving large PQ index.
  • Issue #1306 "Can't recreate RT table after altering its external file": Rectified an error of alter with empty string for external files; fixed RT index external files left after altering external files.
  • Issue #1307 "SELECT statement sum(value) as value doesn't work properly": Fixed issue where select list expression with alias could hide index attribute; also fixed sum to count in int64 for integer.
  • Issue #1308 "Avoid binding to localhost in replication": Ensured replication doesn't bind to localhost for host names with multiple IPs.
  • Issue #1309 "reply to mysql client failed for data larger 16Mb": Fixed the issue of returning a SphinxQL packet larger than 16Mb to the client.
  • Issue #1310 "wrong reference in "paths to external files should be absolute": Corrected the display of the full path to external files in SHOW CREATE TABLE.
  • Issue #1311 "debug build crashes on long strings in snippets": Now, long strings (>255 characters) are permitted in the text targeted by the SNIPPET() function.
  • Issue #1312 "spurious crash on use-after-delete in kqueue polling (master-agent)": Fixed crashes when the master cannot connect to the agent on kqueue-driven systems (FreeBSD, MacOS, etc.).
  • Issue #1313 "too long connect to itself": When connecting from the master to agents on MacOS/BSD, a unified connect+query timeout is now used instead of just connect.
  • Issue #1314 "pq (json meta) with unreached embedded synonyms fails to load": Fixed the embedded synonyms flag in pq.
  • Issue #1315 "Allow some functions (sint, fibonacci, second, minute, hour, day, month, year, yearmonth, yearmonthday) to use implicitly promoted argument values".
  • Issue #1321 "Enable multithreaded SI in fullscan, but limit threads": Code has been implemented into CBO to better predict multithreaded performance of secondary indexes when they're utilized in a full-text query.
  • Issue #1322 "count(*) queries still slow after using precalc sorters": Iterators are no longer initiated when employing sorters that use precalculated data, circumventing detrimental performance effects.
  • Issue #1411 "query log in sphinxql does not preserve original queries for MVA's": Now, all()/any() is logged.

6.0.4

Manticore Search 6.0.4

Released: Mar 15 2023

New features

  • Improved integration with Logstash, Beats etc. including:
    • Support for Logstash versions >= 7.13.
    • Auto-schema support.
    • Added handling of bulk requests in Elasticsearch-like format.
  • Buddy commit ce90 Log Buddy version on Manticore start.

Bugfixes

  • Issue #588, Issue #942 Fixed bad character at the search meta and call keywords for bigram index.
  • Issue #1027 Lowercase HTTP headers are rejected.
  • Issue #1039 Fixed memory leak at daemon on reading output of the Buddy console.
  • Issue #1056 Fixed unexpected behavior of question mark.
  • Issue #1064 - Fixed race condition in tokenizer lowercase tables causing a crash.
  • Commit 59bb Fixed bulk writes processing in the JSON interface for documents with id explicitly set to null.
  • Commit 7b6b Fixed term statistics in CALL KEYWORDS for multiple same terms.
  • Commit f381 Default config is now created by Windows installer; paths are no longer substituted in runtime.
  • Commit 6940, Commit cc5a Fixed replication issues for cluster with nodes in multiple networks.
  • Commit 4972 Fixed /pq HTTP endpoint to be an alias of the /json/pq HTTP endpoint.
  • Commit 3b53 Fixed daemon crash on Buddy restart.
  • Buddy commit fba9 Display original error on invalid request received.
  • Buddy commit db95 Allow spaces in backup path and add some magic to regexp to support single quotes also.

6.0.2

Manticore Search 6.0.2

Released: Feb 10 2023

Bugfixes

  • Issue #1024 crash 2 Crash / Segmentation Fault on Facet search with larger number of results
  • Issue #1029 - WARNING: Compiled-in value KNOWN_CREATE_SIZE (16) is less than measured (208). Consider to fix the value!
  • Issue #1032 - Manticore 6.0.0 plain index crashes
  • Issue #1033 - multiple distributed lost on daemon restart

6.0.0

Manticore Search 6.0.0

Released: Feb 7 2023

Starting with this release, Manticore Search comes with Manticore Buddy, a sidecar daemon written in PHP that handles high-level functionality that does not require super low latency or high throughput. Manticore Buddy operates behind the scenes, and you may not even realize it is running. Although it is invisible to the end user, it was a significant challenge to make Manticore Buddy easily installable and compatible with the main C++-based daemon. This major change will allow the team to develop a wide range of new high-level features, such as shards orchestration, access control and authentication, and various integrations like mysqldump, DBeaver, Grafana mysql connector. For now it already handles SHOW QUERIES, BACKUP and Auto schema.

This release also includes more than 130 bug fixes and numerous features, many of which can be considered major.

Major Changes

  • 🔬 Experimental: you can now execute Elasticsearch-compatible insert and replace JSON queries which enables using Manticore with tools like Logstash (version < 7.13), Filebeat and other tools from the Beats family. Enabled by default. You can disable it using SET GLOBAL ES_COMPAT=off.
  • Support for Manticore Columnar Library 2.0.0 with numerous fixes and improvements in Secondary indexes. ⚠️ BREAKING CHANGE: Secondary indexes are ON by default as of this release. Make sure you do ALTER TABLE table_name REBUILD SECONDARY if you are upgrading from Manticore 5. See below for more details.
  • Commit c436 Auto-schema: you can now skip creating a table, just insert the first document and Manticore will create the table automatically based on its fields. You can turn it on/off using searchd.auto_schema.
  • Vast revamp of cost-based optimizer which lowers query response time in many cases.
    • Issue #1008 Parallelization performance estimate in CBO.
    • Issue #1014 CBO is now aware of secondary indexes and can act smarter.
    • Commit cef9 Encoding stats of columnar tables/fields are now stored in the meta data to help CBO make smarter decisions.
    • Commit 2b95 Added CBO hints for fine-tuning its behaviour.
  • Telemetry: we are excited to announce the addition of telemetry in this release. This feature allows us to collect anonymous and depersonalized metrics that will help us improve the performance and user experience of our product. Rest assured, all data collected is completely anonymous and will not be linked to any personal information. This feature can be easily turned off in the settings if desired.
  • Commit 5aaf ALTER TABLE table_name REBUILD SECONDARY rebuilds secondary indexes whenever you want.
  • Issue #821 New tool manticore-backup for backing up and restoring Manticore instance
  • SQL command BACKUP to do backups from inside Manticore.
  • SQL command SHOW QUERIES as an easy way to see running queries rather than threads.
  • Issue #551 SQL command KILL to kill a long-running SELECT.
  • Dynamic max_matches for aggregation queries to increase accuracy and lower response time.
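
Since upgrading from Manticore 5 calls for ALTER TABLE table_name REBUILD SECONDARY on each table, a small script can generate the statements for a list of tables. The Python sketch below only builds the SQL text; it does not connect to Manticore. The helper name and the conservative identifier check are assumptions for illustration, and the table names in the usage lines are placeholders.

```python
import re

# Conservative identifier pattern (an assumption, not Manticore's actual rule).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rebuild_secondary_statements(table_names):
    """Return one ALTER TABLE ... REBUILD SECONDARY statement per table."""
    statements = []
    for name in table_names:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError("unexpected table name: %r" % name)
        statements.append("ALTER TABLE %s REBUILD SECONDARY" % name)
    return statements


for statement in rebuild_secondary_statements(["products", "logs"]):
    print(statement)
# ALTER TABLE products REBUILD SECONDARY
# ALTER TABLE logs REBUILD SECONDARY
```

The resulting statements can then be run through any MySQL-protocol client connected to the instance.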

Minor changes

  • Issue #822 SQL commands FREEZE/UNFREEZE to prepare a real-time/plain table for a backup.

  • Commit c470 New settings accurate_aggregation and max_matches_increase_threshold for controlled aggregation accuracy.

  • Issue #718 Support for signed negative 64-bit IDs. Note that you still can't use IDs > 2^63, but you can now use IDs in the range from -2^63 to 0.

  • As we recently added support for secondary indexes, things became confusing as "index" could refer to a secondary index, a full-text index, or a plain/real-time index. To reduce confusion, we are renaming the latter to "table". The following SQL/command line commands are affected by this change. Their old versions are deprecated, but still functional:

    • index <table name> => table <table name>,
    • searchd -i / --index => searchd -t / --table,
    • SHOW INDEX STATUS => SHOW TABLE STATUS,
    • SHOW INDEX SETTINGS => SHOW TABLE SETTINGS,
    • FLUSH RTINDEX => FLUSH TABLE,
    • OPTIMIZE INDEX => OPTIMIZE TABLE,
    • ATTACH TABLE plain TO RTINDEX rt => ATTACH TABLE plain TO TABLE rt,
    • RELOAD INDEX => RELOAD TABLE,
    • RELOAD INDEXES => RELOAD TABLES.

    We are not planning to make the old forms obsolete, but to ensure compatibility with the documentation, we recommend changing the names in your application. What will be changed in a future release is the "index" to "table" rename in the output of various SQL and JSON commands.

  • Queries with stateful UDFs are now forced to be executed in a single thread.

  • Issue #1011 Refactoring of all related to time scheduling as a prerequisite for parallel chunks merging.

  • ⚠️ BREAKING CHANGE: Columnar storage format has been changed. You need to rebuild those tables that have columnar attributes.

  • ⚠️ BREAKING CHANGE: Secondary indexes file format has been changed, so if you are using secondary indexes for searching and have searchd.secondary_indexes = 1 in your configuration file, be aware that the new Manticore version will skip loading the tables that have secondary indexes. It's recommended to:

    • Before you upgrade change searchd.secondary_indexes to 0 in the configuration file.
    • Run the instance. Manticore will load up the tables with a warning.
    • Run ALTER TABLE <table name> REBUILD SECONDARY for each index to rebuild secondary indexes.

    If you are running a replication cluster, you'll need to run ALTER TABLE <table name> REBUILD SECONDARY on all the nodes or follow this instruction with just one change: run ALTER .. REBUILD SECONDARY instead of OPTIMIZE.

  • ⚠️ BREAKING CHANGE: The binlog version has been updated, so any binlogs from previous versions will not be replayed. It is important to ensure that Manticore Search is stopped cleanly during the upgrade process. This means that there should be no binlog files in /var/lib/manticore/binlog/ except for binlog.meta after stopping the previous instance.

  • Issue #849 SHOW SETTINGS: helper command for manticore-backup.

  • Issue #1007 SET GLOBAL CPUSTATS=1/0 turns on/off cpu time tracking; SHOW THREADS now doesn't show CPU statistics when the cpu time tracking is off.

  • Issue #1009 RT table RAM chunk segments can now be merged while the RAM chunk is being flushed.

  • Issue #1012 Added secondary index progress to the output of indexer.

  • Issue #1013 Previously a table record could be removed by Manticore from the index list if it couldn't start serving it on start. The new behaviour is to keep it in the list to try to load it on the next start.

  • indextool --docextract returns all the words and hits belonging to requested document.

  • Commit 2b29 Environment variable dump_corrupt_meta enables dumping a corrupted table meta data to log in case searchd can't load the index.

  • Commit c7a3 DEBUG META can show max_matches and pseudo sharding statistics.

  • Commit 6bca A better error instead of the confusing "Index header format is not json, will try it as binary...".

  • Commit bef3 Ukrainian lemmatizer path has been changed.

  • Commit 4ae7 Secondary index statistics have been added to SHOW META.

  • Commit 2e7c JSON interface can now be easily visualized using Swagger Editor https://manual.manticoresearch.com/dev/Openapi#OpenAPI-specification.

  • Refactoring of Secondary indexes integration with Columnar storage.
  • Commit efe2 Manticore Columnar Library optimization which can lower response time by partial preliminary min/max evaluation.
  • Commit 2757 If a disk chunk merge is interrupted, the daemon now cleans up the MCL-related tmp files.
  • Commit e9c6 Columnar and secondary libraries versions are dumped to log on crash.
  • Commit f5e8 Added support for quick doclist rewinding to secondary indexes.
  • Commit 06df Queries like select attr, count(*) from plain_index (w/o filtering) are now faster in case you are using MCL.
  • Commit 0a76 @@autocommit in HandleMysqlSelectSysvar for compatibility with .net connector for mysql greater than 8.25
  • Commit 4d19 ⚠️ BREAKING CHANGE: Support for Debian Stretch and Ubuntu Xenial has been discontinued.
  • RHEL 9 support including Centos 9, Alma Linux 9 and Oracle Linux 9.
  • Issue #924 Debian Bookworm support.
  • Issue #636 Packaging: arm64 builds for Linuxes and MacOS.
  • PR #26 Multi-architecture (x86_64 / arm64) docker image.
  • Simplified package building for contributors.
  • It's now possible to install a specific version using APT.
  • Commit a6b8 Windows installer (previously we provided just an archive).
  • Switched to compiling using Clang 15.
  • ⚠️ BREAKING CHANGE: Custom Homebrew formulas including the formula for Manticore Columnar Library. To install Manticore, MCL and any other necessary components, use the following command brew install manticoresoftware/manticore/manticoresearch manticoresoftware/manticore/manticore-extra.

Bugfixes

  • Issue #479 Field with name text
  • Issue #501 id can't be non bigint
  • Issue #646 ALTER vs field with name "text"
  • Issue #652 Possible BUG: HTTP (JSON) offset and limit affects facet results
  • Issue #827 Searchd hangs/crashes under intensive loading
  • Issue #996 PQ index out of memory
  • Commit 1041 binlog_flush = 1 has been broken all the time since Sphinx. Fixed.
  • MCL Issue #14 MCL: crash on select when too many ft fields
  • MCL Issue #17 MCL: add SSE code to columnar scan
  • Issue #470 sql_joined_field can't be stored
  • Issue #713 Crash when using LEVENSHTEIN()
  • Issue #743 Manticore crashes unexpected and cant to normal restart
  • Issue #788 CALL KEYWORDS through /sql returns control char which breaks json
  • Issue #789 mariadb can't create table FEDERATED
  • Issue #796 WARNING: dlopen() failed: /usr/bin/lib_manticore_columnar.so: cannot open shared object file: No such file or directory
  • Issue #797 Manticore crashes when search with ZONESPAN is done through api
  • Issue #799 wrong weight when using multiple indexes and facet distinct
  • Issue #801 SphinxQL group query hangs after SQL index reprocessing
  • Issue #802 MCL: Indexer crashes in 5.0.2 and manticore-columnar-lib 1.15.4
  • Issue #813 Manticore 5.0.2 FEDERATED returns empty set (MySQL 8.0.28)
  • Issue #824 select COUNT DISTINCT on 2 indices when result is zero throws internal error
  • Issue #826 CRASH on delete query
  • Issue #843 MCL: Bug with long text field
  • Issue #856 5.0.2 rtindex: Aggregate search limit behavior is not as expected
  • Issue #863 Hits returned is Nonetype object even for searches that should return multiple results
  • Issue #870 Crash with using Attribute and Stored Field in SELECT expression
  • Issue #872 table gets invisible after crash
  • Issue #877 Two negative terms in search query gives error: query is non-computable
  • Issue #878 a -b -c is not working via json query_string
  • Issue #886 pseudo_sharding with query match
  • Issue #893 Manticore 5.0.2 min/max function doesn't work as expecting ...
  • Issue #896 Field "weight" is not parsed correctly
  • Issue #897 Manticore service crash upon start and keep restarting
  • Issue #900 group by j.a, smth works wrong
  • Issue #913 Searchd crash when expr used in ranker, but only for queries with two proximities
  • Issue #916 net_throttle_action is broken
  • Issue #919 MCL: Manticore crashes on query execution and other crashed during cluster recovery.
  • Issue #925 SHOW CREATE TABLE outputs w/o backticks
  • Issue #930 It's now possible to query Manticore from Java via JDBC connector
  • Issue #933 bm25f ranking problems
  • Issue #934 configless indexes frozen in watchdog on the first-load state
  • Issue #937 Segfault when sorting facet data
  • Issue #940 crash on CONCAT(TO_STRING)
  • Issue #947 In some cases a single simple select could cause the whole instance stall, so you couldn't log in to it or run any other query until the running select is done.
  • Issue #948 Indexer crash
  • Issue #950 wrong count from facet distinct
  • Issue #953 LCS is calculating incorrectly in built-in sph04 ranker
  • Issue #955 5.0.3 dev crashing
  • Issue #963 FACET with json on engine columnar crash
  • Issue #982 MCL: 5.0.3 crash from secondary index
  • PR #984 @@autocommit in HandleMysqlSelectSysvar
  • PR #985 Fix thread-chunk distribution in RT indexes
  • Issue #985 Fix thread-chunk distribution in RT indexes
  • Issue #986 wrong default max_query_time
  • Issue #987 Crash on when using regex expression in multithreaded execution
  • Issue #988 Broken backward index compatibility
  • Issue #989 indextool reports error checking columnar attributes
  • Issue #990 memleak of json grouper clones
  • Issue #991 Memleak of levenshtein func cloning
  • Issue #992 Error message lost when loading meta
  • Issue #993 Propagate errors from dynamic indexes/subkeys and sysvars
  • Issue #994 Crash on count distinct over a columnar string in columnar storage
  • Issue #995 MCL: min(pickup_datetime) from taxi1 crashes
  • Issue #997 empty excludes JSON query removes columns from select list
  • Issue #998 Secondary tasks run on current scheduler sometimes cause abnormal side effects
  • Issue #999 crash with facet distinct and different schemas
  • Issue #1000 MCL: Columnar rt index became damaged after run without columnar library
  • Issue #1001 implicit cutoff is not working in json
  • Issue #1002 Columnar grouper issue
  • Issue #1003 Unable to delete last field from the index
  • Issue #1004 wrong behaviour after --new-cluster
  • Issue #1005 "columnar library not loaded", but it's not required
  • Issue #1006 no error for delete query
  • Issue #1010 Fixed ICU data file location in Windows builds
  • PR #1018 Handshake send problem
  • Issue #1020 Display id in show create table
  • Issue #1024 crash 1 Crash / Segmentation Fault on Facet search with larger number of results.
  • Commit 4739 Thread gets stuck on shutdown while replication is busy between nodes
  • Commit ab87 Mixing floats and ints in a JSON range filter could make Manticore ignore the filter
  • Commit d001 Float filters in JSON were inaccurate
  • Commit 4092 Discard uncommitted txns if index altered (or it can crash)
  • Commit 9692 Query syntax error when using backslash
  • Commit 0c19 workers_clients could be wrong in SHOW STATUS
  • Commit 1772 fixed a crash on merging ram segments w/o docstores
  • Commit f45b Fixed missed ALL/ANY condition for equals JSON filter
  • Commit 3e83 Replication could fail with got exception while reading ist stream: mkstemp(./gmb_pF6TJi) failed: 13 (Permission denied) if the searchd was started from a directory it can't write to.
  • Commit 92e5 Since 4.0.2 crash log included only offsets. This commit fixes that.

5.0.2

1 year ago

Manticore Search 5.0.2, May 30th 2022

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

  • Issue #791 - wrong stack size could cause a crash.

5.0.0

2 years ago

Manticore Search 5.0.0, May 18th 2022

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Release blogpost https://manticoresearch.com/blog/manticore-search-5-0-0/

Major new features

  • 🔬 Support for Manticore Columnar Library 1.15.2, which enables Secondary indexes beta version. Building secondary indexes is on by default for plain and real-time columnar and row-wise indexes (if Manticore Columnar Library is in use), but to enable it for searching you need to set secondary_indexes = 1 either in your configuration file or using SET GLOBAL. The new functionality is supported in all operating systems except old Debian Stretch and Ubuntu Xenial.
  • Read-only mode: you can now specify listeners that process only read queries discarding any writes.
  • New /cli endpoint that makes running SQL queries over HTTP even easier.
  • Faster bulk INSERT/REPLACE/DELETE via JSON over HTTP: previously you could provide multiple write commands via HTTP JSON protocol, but they were processed one by one, now they are handled as a single transaction.
  • #720 Nested filters support in JSON protocol. Previously you couldn't code things like a=1 and (b=2 or c=3) in JSON: must (AND), should (OR) and must_not (NOT) worked only on the highest level. Now they can be nested.
  • Support for Chunked transfer encoding in HTTP protocol. You can now use chunked transfer in your application to transfer large batches with lower resource consumption (since you don't need to calculate Content-Length). On the server's side Manticore now always processes incoming HTTP data in streaming fashion without waiting for the whole batch to be transferred as previously, which:
    • decreases peak RAM consumption, which lowers a chance of OOM
    • decreases response time (our tests showed 11% decrease for processing a 100MB batch)
    • lets you overcome max_packet_size and transfer batches much larger than the largest allowed value of max_packet_size (128MB), e.g. 1GB at once.
  • #719 HTTP interface support of 100 Continue: now you can transfer large batches from curl (including curl libraries used by various programming languages) which by default does Expect: 100-continue and waits some time before actually sending the batch. Previously you had to add Expect: header, now it's not needed.
  • Having at least one full-text field in a real-time/plain index is not mandatory anymore. You can now use Manticore even in cases not having anything to do with full-text search.
  • Fast fetching for attributes backed by Manticore Columnar Library: queries like select * from <columnar table> are now much faster than previously, especially if there are many fields in the schema.
  • ⚠️ BREAKING CHANGE: Implicit cutoff. Manticore now doesn't spend time and resources processing data you don't need in the result set which will be returned. The downside is that it affects total_found in SHOW META and hits.total in JSON output. It is now only accurate in case you see total_relation: eq while total_relation: gte means the actual number of matching documents is greater than the total_found value you've got. To retain the previous behaviour you can use search option cutoff=0, which makes total_relation always eq.
  • ⚠️ BREAKING CHANGE: All full-text fields are now stored by default in plain indexes. You need to use stored_fields = (empty value) to make all fields non-stored (i.e. revert to the previous behaviour).
  • #715 HTTP JSON supports search options.
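
As an illustration of the nested filters from #720, the condition a=1 and (b=2 or c=3) might be expressed with the request body below. This is only a sketch: the index name test, the attribute names, the use of equals clauses and the HTTP port 9308 in the commented curl call are assumptions, not something these notes prescribe.

```shell
# Sketch: JSON body for "a=1 AND (b=2 OR c=3)" using a nested bool clause.
# The index name "test" and the attributes a, b, c are hypothetical.
payload=$(cat <<'EOF'
{
  "index": "test",
  "query": {
    "bool": {
      "must": [
        { "equals": { "a": 1 } },
        { "bool": {
            "should": [
              { "equals": { "b": 2 } },
              { "equals": { "c": 3 } }
            ]
        } }
      ]
    }
  }
}
EOF
)

# Print the body. To send it, pipe it to curl (host and port are assumptions):
#   printf '%s\n' "$payload" | curl -s -X POST http://127.0.0.1:9308/search -d @-
printf '%s\n' "$payload"
```

The outer must carries the AND, and the inner bool with should carries the OR, which is the nesting that was not possible before this release.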

Minor changes

  • ⚠️ BREAKING CHANGE: Index meta file format change. Previously meta files (.meta, .sph) were in binary format, now it's just json. The new Manticore version will convert older indexes automatically, but:
    • you may get a warning like WARNING: ... syntax error, unexpected TOK_IDENT
    • you won't be able to run the index with previous Manticore versions, so make sure you have a backup
  • ⚠️ BREAKING CHANGE: Session state support with help of HTTP keep-alive. This makes HTTP stateful when the client supports it too. For example, using the new /cli endpoint and HTTP keep-alive (which is on by default in all browsers) you can call SHOW META after SELECT and it will work the same way it works via mysql. Note, previously Connection: keep-alive HTTP header was supported too, but it only caused reusing the same connection. Since this version it also makes the session stateful.
  • You can now specify columnar_attrs = * to define all your attributes as columnar in the plain mode which is useful in case the list is long.
  • Faster replication SST
  • ⚠️ BREAKING CHANGE: Replication protocol has been changed. If you are running a replication cluster, then when upgrading to Manticore 5 you need to:
    • stop all your nodes first cleanly
    • and then start the node which was stopped last with --new-cluster (run tool manticore_new_cluster in Linux).
    • read about restarting a cluster for more details.
  • Replication improvements:
    • Faster SST
    • Noise resistance which can help in case of unstable network between replication nodes
    • Improved logging
  • Security improvement: Manticore now listens on 127.0.0.1 instead of 0.0.0.0 in case no listen at all is specified in config. Even though in the default configuration which is shipped with Manticore Search the listen setting is specified and it's not typical to have a configuration with no listen at all, it's still possible. Previously Manticore would listen on 0.0.0.0 which is not secure, now it listens on 127.0.0.1 which is usually not exposed to the Internet.
  • Faster aggregation over columnar attributes.
  • Increased AVG() accuracy: previously Manticore used float internally for aggregations, now it uses double which increases the accuracy significantly.
  • Improved support for JDBC MySQL driver.
  • DEBUG malloc_stats support for jemalloc.
  • optimize_cutoff is now available as a per-table setting which can be set when you CREATE or ALTER a table.
  • ⚠️ BREAKING CHANGE: query_log_format is now sphinxql by default. If you are used to plain format you need to add query_log_format = plain to your configuration file.
  • Significant memory consumption improvements: Manticore consumes significantly less RAM now in case of long and intensive insert/replace/optimize workload in case stored fields are used.
  • shutdown_timeout default value was increased from 3 seconds to 60 seconds.
  • Commit ffd0499d Support for Java mysql connector >= 6.0.3: version 6.0.3 of the Java mysql connector changed the way it connects to mysql, which broke compatibility with Manticore. The new behaviour is now supported.
  • Commit 1da6dbec disabled saving a new disk chunk on loading an index (e.g. on searchd startup).
  • Issue #746 Support for glibc >= 2.34.
  • Issue #784 count 'VIP' connections separately from usual (non-VIP). Previously VIP connections were counted towards the max_connections limit, which could cause "maxed out" error for non-VIP connections. Now VIP connections are not counted towards the limit. Current number of VIP connections can be also seen in SHOW STATUS and status.
  • ID can now be specified explicitly.

⚠️ Other minor breaking changes

  • ⚠️ BM25F formula has been slightly updated to improve search relevance. This only affects search results in case you use function BM25F(), it doesn't change behaviour of the default ranking formula.
  • ⚠️ Changed behaviour of REST /sql endpoint: /sql?mode=raw now requires escaping and returns an array.
  • ⚠️ Format change of the response of /bulk INSERT/REPLACE/DELETE requests:
    • previously each sub-query constituted a separate transaction and resulted in a separate response
    • now the whole batch is considered a single transaction, which returns a single response
  • ⚠️ Search options low_priority and boolean_simplify now require a value (0/1): previously you could do SELECT ... OPTION low_priority, boolean_simplify, now you need to do SELECT ... OPTION low_priority=1, boolean_simplify=1.
  • ⚠️ If you are using old php, python or java clients please follow the corresponding link and find an updated version. The old versions are not fully compatible with Manticore 5.
  • ⚠️ HTTP JSON requests are now logged in different format in mode query_log_format=sphinxql. Previously only full-text part was logged, now it's logged as is.
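
To make the /bulk change concrete, here is a minimal sketch that builds an NDJSON batch, one command per line. The index name products, the title field and port 9308 are assumptions made for illustration; only the one-JSON-object-per-line shape and the application/x-ndjson content type come from these notes.

```shell
# Sketch: build an NDJSON batch for /bulk, one JSON command per line.
# The index name "products" and the "title" field are hypothetical.
batch=$(
  for id in 1 2 3; do
    printf '{"insert":{"index":"products","id":%d,"doc":{"title":"doc %d"}}}\n' "$id" "$id"
  done
)

# The whole batch goes out in one request, e.g. (host and port are assumptions):
#   printf '%s\n' "$batch" | curl -s -X POST -H "Content-Type: application/x-ndjson" \
#     --data-binary @- http://127.0.0.1:9308/bulk
printf '%s\n' "$batch"
```

The commented call uses --data-binary rather than -d because -d strips newlines, which would merge the NDJSON lines into one. With Manticore 5 such a batch is treated as a single transaction and answered with a single response.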

New packages

  • ⚠️ BREAKING CHANGE: because of the new structure when you upgrade to Manticore 5 it's recommended to remove old packages before you install the new ones:

    • RPM-based: yum remove manticore*
    • Debian and Ubuntu: apt remove manticore*
  • New deb/rpm packages structure. Previous versions provided:

    • manticore-server with searchd (main search daemon) and all needed for it
    • manticore-tools with indexer and indextool
    • manticore including everything
    • manticore-all RPM as a meta package referring to manticore-server and manticore-tools

    The new structure is:

    • manticore - deb/rpm meta package which installs all the above as dependencies
    • manticore-server-core - searchd and everything to run it alone
    • manticore-server - systemd files and other supplementary scripts
    • manticore-tools - indexer, indextool and other tools
    • manticore-common - default configuration file, default data directory, default stopwords
    • manticore-icudata, manticore-dev, manticore-converter didn't change much
    • .tgz bundle which includes all the packages
  • Support for Ubuntu Jammy

  • Support for Amazon Linux 2 via YUM repo

Bugfixes

  • Issue #287 out of memory while indexing RT index
  • Issue #604 Breaking change 3.6.0, 4.2.0 sphinxql-parser
  • Issue #667 FATAL: out of memory (unable to allocate 9007199254740992 bytes)
  • Issue #676 Strings not passed correctly to UDFs
  • Issue #698 Searchd crashes after trying to add a text column to a rt index
  • Issue #705 Indexer couldn't find all columns
  • Issue #709 Grouping by json.boolean works wrong
  • Issue #716 indextool commands related to index (eg. --dumpdict) failure
  • Issue #724 Fields disappear from the selection
  • Issue #727 .NET HttpClient Content-Type incompatibility when using application/x-ndjson
  • Issue #729 Field length calculation
  • Issue #730 create/insert into/drop columnar table has a memleak
  • Issue #731 Empty column in results under certain conditions
  • Issue #749 Crash of daemon on start
  • Issue #750 Daemon hangs on start
  • Issue #751 Crash at SST
  • Issue #752 Json attribute marked as columnar when engine='columnar'
  • Issue #753 Replication listens on 0
  • Issue #754 columnar_attrs = * is not working with csvpipe
  • Issue #755 Crash on select float in columnar in rt
  • Issue #756 Indextool changes rt index during check
  • Issue #757 Need a check for listeners port range intersections
  • Issue #758 Log original error in case RT index failed to save disk chunk
  • Issue #759 Only one error reported for RE2 config
  • Issue #760 RAM consumption changes in commit 5463778558586d2508697fa82e71d657ac36510f
  • Issue #761 3rd node doesn't make a non-primary cluster after dirty restart
  • Issue #762 Update counter gets increased by 2
  • Issue #763 New version 4.2.1 corrupt index created with 4.2.0 with morphology using
  • Issue #764 No escaping in json keys /sql?mode=raw
  • Issue #765 Using function hides other values
  • Issue #766 Memleak triggered by a line in FixupAttrForNetwork
  • Issue #767 Memleak in 4.2.0 and 4.2.1 related with docstore cache
  • Issue #768 Strange ping-pong with stored fields over network
  • Issue #769 lemmatizer_base reset to empty if not mentioned in 'common' section
  • Issue #770 pseudo_sharding makes SELECT by id slower
  • Issue #771 DEBUG malloc_stats output zeros when using jemalloc
  • Issue #772 Drop/add column makes value invisible
  • Issue #773 Can't add column bit(N) to columnar table
  • Issue #774 "cluster" gets empty on start in manticore.json
  • Commit 1da4ce89 HTTP actions are not tracked in SHOW STATUS
  • Commit 381000ab disable pseudo_sharding for low frequency single keyword queries
  • Commit 800325cc fixed stored attributes vs index merge
  • Commit cddfeed6 generalized distinct value fetchers; added specialized distinct fetchers for columnar strings
  • Commit fba4bb4f fixed fetching null integer attributes from docstore
  • Commit f3009a92 ranker could be specified twice in query log

4.2.0

2 years ago

Manticore Search 4.2.0, Dec 23rd 2021

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Release blogpost

Major new features

  • Pseudo-sharding support for real-time indexes and full-text queries. In the previous release we added limited pseudo sharding support. Starting from this version you can get all the benefits of pseudo sharding and your multi-core processor by just enabling searchd.pseudo_sharding. The coolest thing is that you don't need to do anything with your indexes or queries for that: just enable it, and if you have free CPU it will be used to lower your response time. It supports plain and real-time indexes for full-text, filtering and analytical queries. For example, here is how enabling pseudo sharding can make most queries' response time about 10x lower on average on the Hacker news curated comments dataset multiplied 100 times (116 million docs in a plain index).
Figure: 4.2.0 pseudo sharding on vs off
  • PQ transactions are now atomic and isolated. Previously PQ transactions support was limited. It enables much faster REPLACE into PQ, especially when you need to replace a lot of rules at once. Performance details:

Previous version 4.0.2

It takes 48 seconds to insert 1M PQ rules and 406 seconds to REPLACE just 40K in 10K batches.

root@perf3 ~ # mysql -P9306 -h0 -e "drop table if exists pq; create table pq (f text, f2 text, j json, s string) type='percolate';"; date; for m in `seq 1 1000`; do (echo -n "insert into pq (id,query,filters,tags) values "; for n in `seq 1 1000`; do echo -n "(0,'@f (cat | ( angry dog ) | (cute mouse)) @f2 def', 'j.json.language=\"en\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; [ $n != 1000 ] && echo -n ","; done; echo ";")|mysql -P9306 -h0; done; date; mysql -P9306 -h0 -e "select count(*) from pq"

Wed Dec 22 10:24:30 AM CET 2021
Wed Dec 22 10:25:18 AM CET 2021
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

root@perf3 ~ # date; (echo "begin;"; for offset in `seq 0 10000 30000`; do n=0; echo "replace into pq (id,query,filters,tags) values "; for id in `mysql -P9306 -h0 -NB -e "select id from pq limit $offset, 10000 option max_matches=1000000"`; do echo "($id,'@f (tiger | ( angry bear ) | (cute panda)) @f2 def', 'j.json.language=\"de\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; n=$((n+1)); [ $n != 10000 ] && echo -n ","; done; echo ";"; done; echo "commit;") > /tmp/replace.sql; date
Wed Dec 22 10:26:23 AM CET 2021
Wed Dec 22 10:26:27 AM CET 2021
root@perf3 ~ # time mysql -P9306 -h0 < /tmp/replace.sql

real	6m46.195s
user	0m0.035s
sys	0m0.008s

New version 4.2.0

It takes 34 seconds to insert 1M PQ rules and 23 seconds to REPLACE them in 10K batches.

root@perf3 ~ # mysql -P9306 -h0 -e "drop table if exists pq; create table pq (f text, f2 text, j json, s string) type='percolate';"; date; for m in `seq 1 1000`; do (echo -n "insert into pq (id,query,filters,tags) values "; for n in `seq 1 1000`; do echo -n "(0,'@f (cat | ( angry dog ) | (cute mouse)) @f2 def', 'j.json.language=\"en\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; [ $n != 1000 ] && echo -n ","; done; echo ";")|mysql -P9306 -h0; done; date; mysql -P9306 -h0 -e "select count(*) from pq"

Wed Dec 22 10:06:38 AM CET 2021
Wed Dec 22 10:07:12 AM CET 2021
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

root@perf3 ~ # date; (echo "begin;"; for offset in `seq 0 10000 990000`; do n=0; echo "replace into pq (id,query,filters,tags) values "; for id in `mysql -P9306 -h0 -NB -e "select id from pq limit $offset, 10000 option max_matches=1000000"`; do echo "($id,'@f (tiger | ( angry bear ) | (cute panda)) @f2 def', 'j.json.language=\"de\"', '{\"tag1\":\"tag1\",\"tag2\":\"tag2\"}')"; n=$((n+1)); [ $n != 10000 ] && echo -n ","; done; echo ";"; done; echo "commit;") > /tmp/replace.sql; date
Wed Dec 22 10:12:31 AM CET 2021
Wed Dec 22 10:14:00 AM CET 2021
root@perf3 ~ # time mysql -P9306 -h0 < /tmp/replace.sql

real	0m23.248s
user	0m0.891s
sys	0m0.047s

Minor changes

  • optimize_cutoff is now available as a configuration option in section searchd. It's useful when you want to limit the RT chunks count in all your indexes to a particular number globally.
  • Commit 00874743 accurate count(distinct ...) and FACET ... distinct over several local physical indexes (real-time/plain) with identical fields set/order.
  • PR #598 bigint support for YEAR() and other timestamp functions.
  • Commit 8e85d4bc Adaptive rt_mem_limit. Previously Manticore Search was collecting exactly up to rt_mem_limit of data before saving a new disk chunk to disk, and while saving was still collecting up to 10% more (aka double-buffer) to minimize possible insert suspension. If that limit was also exhausted, adding new documents was blocked until the disk chunk was fully saved to disk. The new adaptive limit is built on the fact that we have auto-optimize now, so it's not a big deal if disk chunks do not fully respect rt_mem_limit and start flushing a disk chunk earlier. So, now we collect up to 50% of rt_mem_limit and save that as a disk chunk. Upon saving we look at the statistics (how much we've saved, how many new documents have arrived while saving) and recalculate the initial rate which will be used next time. For example, if we saved 90 million documents, and another 10 million docs arrived while saving, the rate is 90%, so we know that next time we can collect up to 90% of rt_mem_limit before starting flushing another disk chunk. The rate value is calculated automatically from 33.3% to 95%.
  • Issue #628 unpack_zlib for PostgreSQL source. Thank you, Dmitry Voronin for the contribution.
  • Commit 6d54cf2b indexer -v and --version. Previously you could still see indexer's version, but -v/--version were not supported.
  • Issue #662 infinite mlock limit by default when Manticore is started via systemd.
  • Commit 63c8cd05 spinlock -> op queue for coro rwlock.
  • Commit 41130ce3 environment variable MANTICORE_TRACK_RT_ERRORS useful for debugging RT segments corruption.
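
The adaptive rt_mem_limit arithmetic above can be sketched as follows. The formula is an assumption chosen because it reproduces the 90% example (rate = saved / (saved + arrived while saving), clamped to the stated 33.3% to 95% range); the actual implementation may compute the rate differently, and adaptive_rate is a hypothetical helper name.

```shell
# Sketch of the adaptive flush rate described for rt_mem_limit.
# Assumption: rate = saved / (saved + arrived_while_saving), clamped to
# the 33.3%..95% range; the real implementation may differ.
adaptive_rate() {
  # $1 = documents saved, $2 = documents that arrived while saving
  awk -v saved="$1" -v arrived="$2" 'BEGIN {
    rate = 100 * saved / (saved + arrived)
    if (rate < 33.3) rate = 33.3
    if (rate > 95)   rate = 95
    printf "%.1f\n", rate
  }'
}

adaptive_rate 90000000 10000000   # the example from the text, prints 90.0
```

A slow save with many documents arriving in the meantime pushes the rate down, so the next flush starts earlier; a fast save pushes it up towards the 95% cap.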

Breaking changes

  • Binlog version was increased, binlog from previous version won't be replayed, so make sure you stop Manticore Search cleanly during upgrade: no binlog files should be in /var/lib/manticore/binlog/ except binlog.meta after stopping the previous instance.
  • Commit 3f659f36 new column "chain" in show threads option format=all. It shows stack of some task info tickets, most useful for profiling needs, so if you are parsing show threads output be aware of the new column.
  • searchd.workers has been obsolete since 3.5.0 and is now deprecated. If you still have it in your configuration file, Manticore Search will start, but with a warning.
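
Before upgrading, the clean-stop condition for the binlog directory could be checked with a small helper like the one below. It is a sketch only: binlog_is_clean is a hypothetical function name, and the check simply treats any file other than binlog.meta as a leftover.

```shell
# Sketch: verify that a binlog directory holds nothing but binlog.meta.
# Returns 0 when the directory looks clean, 1 otherwise.
binlog_is_clean() {
  dir=$1
  for f in "$dir"/*; do
    [ -e "$f" ] || continue                      # empty directory: glob did not match
    [ "${f##*/}" = "binlog.meta" ] && continue   # the only file allowed to remain
    echo "leftover binlog file: $f" >&2
    return 1
  done
  return 0
}

# Usage (the path is the default one mentioned above):
#   binlog_is_clean /var/lib/manticore/binlog && echo "safe to upgrade"
```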

Bugfixes

  • Issue #650 Manticore 4.0.2 slower than Manticore 3.6.3. 4.0.2 was faster than previous versions in terms of bulk inserts, but significantly slower for single document inserts. It's been fixed in 4.2.0.
  • Commit 22f4141b RT index could get corrupted under intensive REPLACE load, or it could crash
  • Commit 03be91e4 fixed average at merging groupers and group N sorter; fixed merge of aggregates
  • Commit 2ea575d3 indextool --check could crash
  • Commit 7ec76d4a RAM exhaustion issue caused by UPDATEs
  • Commit 658a727e daemon could hang on INSERT
  • Commit 46e42b9b daemon could hang on shutdown
  • Commit f8d7d517 daemon could crash on shutdown
  • Commit 733accf1 daemon could hang on crash
  • Commit f7f8bd8c daemon could crash on startup trying to rejoin cluster with invalid nodes list
  • Commit 14015561 distributed index could get completely forgotten in RT mode in case it couldn't resolve one of its agents on start
  • Issue #683 attr bit(N) engine='columnar' fails
  • Issue #682 create table fails, but leaves dir
  • Issue #663 Config fails with: unknown key name 'attr_update_reserve'
  • Issue #632 Manticore crash on batch queries
  • Issue #679 Batch queries causing crashes again with v4.0.3
  • Issue #643 Manticore 4.0.2 does not accept connections after batch of inserts
  • Issue #635 FACET query with ORDER BY JSON.field or string attribute could crash
  • Issue #634 Crash SIGSEGV on query with packedfactors

4.0.2

2 years ago

Version 4.0.2, Sep 21st 2021

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Major new features

  • Full support of Manticore Columnar Library. Previously Manticore Columnar Library was supported only for plain indexes. Now it's supported:

    • in real-time indexes for INSERT, REPLACE, DELETE, OPTIMIZE
    • in replication
    • in ALTER
    • in indextool --check
  • Automatic indexes compaction (#478). Finally you don't have to call OPTIMIZE manually or via a cron task or other kind of automation. Manticore now does it on its own. You can set the default compaction threshold via optimize_cutoff.

  • Chunk snapshots and locks system revamp. These changes may be invisible from outside at first glance, but they improve the behaviour of many things happening in real-time indexes significantly. In a nutshell, previously most Manticore data manipulation operations relied on locks heavily, now we use disk chunk snapshots instead.

    • read operations (e.g. SELECTs, replication) are performed with snapshots
    • operations that just change internal index structure without modifying schema/documents (e.g. merging RAM segments, saving disk chunks, merging disk chunks) are performed with read-only snapshots and replace the existing chunks in the end
    • UPDATEs and DELETEs are performed against existing chunks, but for the case of merging that may be happening the writes are collected and are then applied against the new chunks
    • UPDATEs acquire an exclusive lock sequentially for every chunk. Merges acquire a shared lock when entering the stage of collecting attributes from the chunk. So at the same time only one (merge or update) operation has access to attributes of the chunk.
    • when merging gets to the phase it needs attributes it sets a special flag. When UPDATE finishes it checks the flag and if it's set, the whole update is stored in a special collection. Finally when the merge finishes, it applies the updates set to the newborn disk chunk
    • ALTER runs via an exclusive lock
    • replication runs as a usual read operation, but in addition saves the attributes before SST and forbids updates during the SST
  • ALTER can add/remove a full-text field. Previously it could only add/remove an attribute.

  • 🔬 Experimental: pseudo sharding for full-scan queries - allows parallelizing any non-full-text search query. Instead of preparing shards manually you can now just enable the new option searchd.pseudo_sharding and expect response time for non-full-text search queries to be lower by up to a factor equal to the number of CPU cores. Note it can easily occupy all existing CPU cores, so if you care not only about latency, but throughput too - use it with caution.

Minor changes

  • Linux Mint and Ubuntu Hirsute Hippo are supported via APT repository
  • faster update by id via HTTP in big indexes in some cases (depends on the ids distribution)

In 3.6.0:

time curl -X POST -d '{"update":{"index":"idx","id":4611686018427387905,"doc":{"mode":0}}}' -H "Content-Type: application/x-ndjson" http://127.0.0.1:6358/json/bulk

real    0m43.783s
user    0m0.008s
sys     0m0.007s

In 4.0.2:

time curl -X POST -d '{"update":{"index":"idx","id":4611686018427387905,"doc":{"mode":0}}}' -H "Content-Type: application/x-ndjson" http://127.0.0.1:6358/json/bulk

real    0m0.006s
user    0m0.004s
sys     0m0.001s

Breaking changes

  • the new version can read older indexes, but the older versions can't read Manticore 4's indexes
  • removed implicit sorting by id. Sort explicitly if required
  • charset_table's default value changes from 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451 to non_cjk
  • OPTIMIZE happens automatically. If you don't need it make sure to set auto_optimize=0 in section searchd in the configuration file
  • #616 ondisk_attrs_default were deprecated, now they are removed
  • for contributors: we now use Clang compiler for Linux builds as according to our tests it can build a faster Manticore Search and Manticore Columnar Library
  • if max_matches is not specified in a search query it gets updated implicitly with the lowest needed value for the sake of performance of the new columnar storage. It can affect metric total in SHOW META, but not total_found which is the actual number of found documents.
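
If you want to keep two of the pre-4 defaults mentioned above, a configuration fragment along these lines might do it. This is a sketch under assumptions: the output file name and the index name my_index are hypothetical, and a real index section also needs its usual settings (type, path, fields and so on).

```shell
# Sketch: write a config fragment that restores two pre-4 defaults.
# The file name and the index name "my_index" are hypothetical.
out=${TMPDIR:-/tmp}/manticore-pre4-defaults.conf
cat > "$out" <<'EOF'
searchd {
    # turn off automatic OPTIMIZE
    auto_optimize = 0
}

index my_index {
    # the old default charset_table, spelled out explicitly
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451
}
EOF
echo "wrote $out"
```

The fragment is meant to be merged into an existing configuration file by hand rather than used on its own.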

Migration from Manticore 3

  • make sure you stop Manticore 3 cleanly:
    • no binlog files should be in /var/lib/manticore/binlog/ (only binlog.meta should be in the directory)
    • otherwise Manticore 4 won't run the indexes whose binlogs it can't replay
  • the new version can read older indexes, but the older versions can't read Manticore 4's indexes, so make sure you make a backup if you want to be able to rollback the new version easily
  • if you run a replication cluster make sure you:
    • stop all your nodes first cleanly
    • and then start the node which was stopped last with --new-cluster (run tool manticore_new_cluster in Linux).
    • read about restarting a cluster for more details

Bugfixes

  • Lots of replication issues have been fixed:
    • 696f8649 - fixed crash during SST on joiner with active index; added sha1 verify at joiner node at writing file chunks to speed up index loading; added rotation of changed index files at joiner node on index load; added removal of index files at joiner node when active index gets replaced by a new index from donor node; added replication log points at donor node for sending files and chunks
    • b296c55a - crash on JOIN CLUSTER in case the address is incorrect
    • 418bf880 - during initial replication of a large index the joining node could fail with ERROR 1064 (42000): invalid GTID, (null), and the donor could become unresponsive while another node was joining
    • 6fd350d2 - hash could be calculated wrong for a big index which could result in replication failure
    • #615 - replication failed on cluster restart
  • #574 - indextool --help doesn't display parameter --rotate
  • #578 - searchd high CPU usage while idle after ca. a day
  • #587 - flush .meta immediately
  • #617 - manticore.json gets emptied
  • #618 - searchd --stopwait fails under root. It also fixes systemctl behaviour (previously it was showing failure for ExecStop and didn't wait long enough for searchd to stop properly)
  • #619 - INSERT/REPLACE/DELETE vs SHOW STATUS. command_insert, command_replace and others were showing wrong metrics
  • #620 - charset_table for a plain index had a wrong default value
  • 8f753688 - new disk chunks don't get mlocked
  • #607 - Manticore cluster node crashes when unable to resolve a node by name
  • #623 - replication of updated index can lead to undefined state
  • ca03d228 - indexer could hang on indexing a plain index source with a json attribute
  • 53c75305 - fixed not equal expression filter at PQ index
  • ccf94e02 - fixed select windows at list queries above 1000 matches. SELECT * FROM pq ORDER BY id desc LIMIT 1000 , 100 OPTION max_matches=1100 was not working previously
  • a0483fe9 - HTTPS request to Manticore could cause warning like "max packet size(8388608) exceeded"

3.6.0

3 years ago

Version 3.6.0, May 3rd 2021

Maintenance release before Manticore 4

➡️➡️➡️ DOWNLOAD HERE ⬅️⬅️⬅️

Major new features

  • Support for Manticore Columnar Library for plain indexes. New setting columnar_attrs for plain indexes
  • Support for Ukrainian Lemmatizer
  • Fully revised histograms. When building an index Manticore also builds histograms for each field in it, which it then uses for faster filtering. In 3.6.0 the algorithm was fully revised and you can get a higher performance if you have a lot of data and do a lot of filtering.

Minor changes

Optimizations

  • faster JSON parsing, our tests show 3-4% lower latency on queries like WHERE json.a = 1
  • non-documented command DEBUG SPLIT as a prerequisite for automatic sharding/rebalancing

Bugfixes

  • #584 - inaccurate and unstable FACET results
  • #506 - Strange behavior when using MATCH: those affected by this issue need to rebuild the index, as the problem occurred at the index-building stage
  • #387 - intermittent core dump when running query with SNIPPET() function
  • Stack optimizations useful for processing complex queries:
    • #469 - SELECT results in CRASH DUMP
    • e8420cc7 - stack size detection for filter trees
  • #461 - Update using the IN condition does not take effect correctly
  • #464 - SHOW STATUS immediately after CALL PQ returns status of the query, not the server status
  • #481 - Fixed static binary build
  • #502 - bug in multi-queries
  • #514 - Unable to use unusual names for columns when using 'create table'
  • d1dbe771 - daemon crash on replay binlog with update of string attribute; set binlog version to 10
  • 775d0555 - fixed expression stack frame detection runtime (test 207)
  • 4795dc49 - percolate index filter and tags were empty for empty stored query (test 369)
  • c3f0bf4d - breaks in the replication SST flow on networks with long latency and a high error rate (replication between different data centers); updated replication command version to 1.03
  • ba2d6619 - joiner locked the cluster on write operations after joining the cluster (test 385)
  • de4dcb9f - wildcards matching with exact modifier (test 321)
  • 6524fc6a - docid checkpoints vs docstore
  • f4ab83c2 - Inconsistent indexer behavior when parsing invalid xml
  • 7b727e22 - Stored percolate query with NOTNEAR runs forever (test 349)
  • 812dab74 - wrong weight for phrase starting with wildcard
  • 1771afc6 - a percolate query with wildcards generated terms without payload on matching, which caused interleaved hits and broke matching (test 417)
  • aa0d8c2b - fixed calculation of 'total' in case of parallelized query
  • 18d81b3c - crash in Windows with multiple concurrent sessions at daemon
  • 84432f23 - some index settings could not be replicated
  • 93411fe6 - At a high rate of adding new events, netloop sometimes froze because the atomic 'kick' event was processed once for several events at a time, losing the actual actions from them
  • d805fc12 - New flushed disk chunk might be lost on commit
  • 63cbf008 - inaccurate 'net_read' in profiler
  • f5379bb2 - Percolate issue with arabic (right to left texts)
  • 49eeb420 - id not picked correctly on duplicate column name
  • refactoring of network events to fix a crash in rare cases
  • e8420cc7 - fix in indextool --dumpheader
  • ff716353 - TRUNCATE WITH RECONFIGURE worked incorrectly with stored fields

Breaking changes:

  • New binlog format: you need to make a clean stop of Manticore before upgrading
  • The index format has changed slightly: the new version can read your existing indexes fine, but if you decide to downgrade from 3.6.0 to an older version, the newer indexes will be unreadable
  • Replication format change: don't replicate from an older version to 3.6.0 or vice versa; switch to the new version on all your nodes at once
  • reverse_scan is deprecated. Make sure you don't use this option in your queries as of 3.6.0, since they will fail otherwise
  • As of this release we don't provide builds for RHEL6, Debian Jessie and Ubuntu Trusty any more. If it's mission critical for you to have them supported, contact us

Deprecations

  • No more implicit sorting by id. If you rely on it, make sure to update your queries accordingly
  • Search option reverse_scan has been deprecated
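
When preparing an upgrade, it may help to scan existing query strings for the two deprecations above. The following is a naive text-level sketch, not an official migration tool; the function audit_query is hypothetical, and the regular expressions only catch straightforward cases (they do not parse SQL).

```python
import re


def audit_query(sql: str) -> list:
    """Return warnings for a query that may be affected by the 3.6.0 deprecations."""
    warnings = []
    # The reverse_scan search option is deprecated.
    if re.search(r"\breverse_scan\b", sql, re.IGNORECASE):
        warnings.append("uses deprecated option reverse_scan")
    # Without an explicit ORDER BY, results are no longer implicitly sorted by id.
    is_select = re.match(r"\s*select\b", sql, re.IGNORECASE)
    has_order = re.search(r"\border\s+by\b", sql, re.IGNORECASE)
    if is_select and not has_order:
        warnings.append("no ORDER BY: implicit sorting by id is gone")
    return warnings


for query in (
    "SELECT * FROM idx OPTION reverse_scan=1",
    "SELECT * FROM idx ORDER BY id ASC",
):
    print(query, "->", audit_query(query))
```

A query that relied on the old implicit order can be updated by adding an explicit ORDER BY id ASC, as in the second example.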