An embeddable fulltext search engine. Groonga is the successor project to Senna.
We made the following performance optimizations:
We optimized the performance of OR and AND search when there are many hits.
We optimized the performance of prefix search (@^).
We optimized the performance of A AND B search when A has more records than B.
We optimized the performance of search when many dynamic columns are used.
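For reference, prefix search in script syntax uses the @^ operator against a TABLE_PAT_KEY key. A minimal sketch (the Users table here is our own illustration, not from the release notes):

```
table_create Users TABLE_PAT_KEY ShortText
load --table Users
[
{"_key": "alan"},
{"_key": "alice"},
{"_key": "bob"}
]
select Users --filter '_key @^ "al"'
```

This returns the records whose key starts with "al".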
[token_ngram] Added a new option ignore_blank.
We can replace TokenBigramIgnoreBlank with TokenNgram("ignore_blank", true) as below.
Here is an example of using TokenBigram.
tokenize TokenBigram "! ! !" NormalizerAuto
[
[
0,
1715155644.64263,
0.001013517379760742
],
[
{
"value": "!",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
},
{
"value": "!",
"position": 1,
"force_prefix": false,
"force_prefix_search": false
},
{
"value": "!",
"position": 2,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Here is an example of using TokenBigramIgnoreBlank.
tokenize TokenBigramIgnoreBlank "! ! !" NormalizerAuto
[
[
0,
1715155680.323451,
0.0009913444519042969
],
[
{
"value": "!!!",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
}
]
]
Here is an example of using TokenNgram("ignore_blank", true).
tokenize 'TokenNgram("ignore_blank", true)' "! ! !" NormalizerAuto
[
[
0,
1715155762.340685,
0.001041412353515625
],
[
{
"value": "!!!",
"position": 0,
"force_prefix": false,
"force_prefix_search": false
}
]
]
[ubuntu] Add support for Ubuntu 24.04 LTS (Noble Numbat).
[request_cancel] Fixed a bug that Groonga may crash when we execute the request_cancel command while another query is running.
Fixed an unexpected error when using --post_filter with an --offset greater than the number of post-filtered records.
In the same situation, using --filter with --offset doesn't raise an error. This inconsistency in behavior between --filter and --post_filter has now been resolved.
table_create Users TABLE_PAT_KEY ShortText
column_create Users age COLUMN_SCALAR UInt32
load --table Users
[
["_key", "age"],
["Alice", 21],
["Bob", 22],
["Chris", 23],
["Diana", 24],
["Emily", 25]
]
select Users \
--filter 'age >= 22' \
--post_filter 'age <= 24' \
--offset 3 \
--sort_keys -age --output_pretty yes
[
[
-68,
1715224057.317582,
0.001833438873291016,
"[table][sort] grn_output_range_normalize failed",
[
[
"grn_table_sort",
"/home/horimoto/Work/free-software/groonga.tag/lib/sort.c",
1052
]
]
]
]
Fixed a bug where incorrect search results could be returned when not all phrases within (...) matched in a near phrase product search.
For example, no record matches the (2) condition in --query '*NPP1"(a) (2)"'.
In this case, the expected behavior is to return no records. However, the actual behavior was equal to the query --query '*NPP1"(a)"' as below.
This means that despite no records matching (2), records like ax1 and axx1 were incorrectly returned.
table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenNgram
column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content
load --table Entries
[
{"content": "ax1"},
{"content": "axx1"}
]
select Entries \
--match_columns content \
--query '*NPP1"(a) (2)"' \
--output_columns 'content'
[
[
0,
1715224211.050228,
0.001366376876831055
],
[
[
[
2
],
[
[
"content",
"Text"
]
],
[
"ax1"
],
[
"axx1"
]
]
]
]
Fixed a bug that rehash failed or table data could be corrupted when a rehash occurred on a TABLE_HASH_KEY table with 2^28 or more records.
Fixed a bug that the highlight position slipped out of place in the following cases.
If a full-width space existed before the highlight target characters as below.
We expected Groonga to return "Groonga <span class=\"keyword\">高</span>速!". However, Groonga returned "Groonga <span class=\"keyword\">高速</span>!" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "Groonga 高速!"}
]
select Entries \
--match_columns body \
--query '高' \
--output_columns 'highlight_html(body, Terms)'
[
[
0,
1715215640.979517,
0.001608610153198242
],
[
[
[
1
],
[
[
"highlight_html",
null
]
],
[
"Groonga <span class=\"keyword\">高速</span>!"
]
]
]
]
If we used TokenNgram("loose_blank", true) and the highlight target characters included a full-width space as below.
We expected Groonga to return "<span class=\"keyword\">山田 太郎</span>". However, Groonga returned "<span class=\"keyword\">山田 太</span>" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("loose_blank", true, "report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "山田 太郎"}
]
select Entries \
--match_columns body --query '山田太郎' \
--output_columns 'highlight_html(body, Terms)' --output_pretty yes
[
[
0,
1715220409.096246,
0.0004854202270507812
],
[
[
[
1
],
[
[
"highlight_html",
null
]
],
[
"<span class=\"keyword\">山田 太</span>"
]
]
]
]
If white space existed in front of the highlight target characters as below.
We expected Groonga to return " <span class=\"keyword\">山</span>田太郎". However, Groonga returned " <span class=\"keyword\">山</span>" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": " 山田太郎"}
]
select Entries \
--match_columns body \
--query '山' \
--output_columns 'highlight_html(body, Terms)' --output_pretty yes
[
[
0,
1715221627.002193,
0.001977920532226562
],
[
[
[
1
],
[
[
"highlight_html",
null
]
],
[
" <span class=\"keyword\">山</span>"
]
]
]
]
If the second character of the highlight target was a full-width space as below.
We expected Groonga to return "<span class=\"keyword\">山 田</span>太郎". However, Groonga returned "<span class=\"keyword\">山 田太</span>郎" as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("report_source_location", true)' \
--normalizer 'NormalizerNFKC150("report_source_offset", true)'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
{"body": "山 田太郎"}
]
select Entries \
--match_columns body \
--query '山 田' \
--output_columns 'highlight_html(body, Terms)'
[
[
0,
1715222501.496007,
0.0005536079406738281
],
[
[
[
0
],
[
[
"highlight_html",
"<span class=\"keyword\">山 田太</span>郎"
]
]
]
]
]
Reduced the log level of the log that is output when Groonga sets normalizers/tokenizer/token_filters against a temporary table.
For example, the target log of this modification is the following log.
DDL:1234567890:set_normalizers NormalizerAuto
PGroonga sets normalizers against temporary tables on start, so this log becomes noise. Because PGroonga's default log level is notice, this log was output every time PGroonga started.
Therefore, we reduced the log level of this log to debug since this release. Thus, this log is not output when PGroonga starts with the default log level.
[load] Stopped reporting an error when we load a key that becomes an empty key by normalization.
"-" becomes "" with NormalizerNFKC150("remove_symbol", true). So the following case reported an "empty key" error.
table_create Values TABLE_HASH_KEY ShortText \
--normalizers 'NormalizerNFKC150("remove_symbol", true)'
table_create Data TABLE_NO_KEY
column_create Data value COLUMN_SCALAR Values
load --table Data
[
{"value": "-"}
]
However, if we load a lot of such data, many error logs are generated, because Groonga outputs an "empty key" error for each key since it can't register an empty string in the index.
There is no problem even if an empty string can't be registered in the index, because a search for an empty string never matches anything. So we stopped reporting the "empty key" error in this case.
Fixed a crash bug when a request is canceled during a between() or range search.
This bug doesn't always occur. It occurs when we cancel a request at a specific timing. It occurs more easily when the search takes a long time, such as in a sequential search.
Fixed a bug that highlight_html() may return an invalid result when multiple normalizers such as NormalizerTable and NormalizerNFKC150 are used together.
For example, this bug occurs in a case such as the following.
table_create NormalizationsIndex TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
table_create Normalizations TABLE_HASH_KEY UInt64
column_create Normalizations normalized COLUMN_SCALAR LongText
column_create Normalizations target COLUMN_SCALAR NormalizationsIndex
column_create NormalizationsIndex index COLUMN_INDEX Normalizations target
table_create Lexicon TABLE_PAT_KEY ShortText \
--normalizers 'NormalizerTable("normalized", \
"Normalizations.normalized", \
"target", \
"target"), NormalizerNFKC150'
table_create Names TABLE_HASH_KEY UInt64
column_create Names name COLUMN_SCALAR Lexicon
load --table Names
[
["_key","name"],
[1,"Sato Toshio"]
]
select Names \
--query '_key:1 OR name._key:@"Toshio"' \
--output_columns 'highlight_html(name._key, Lexicon)'
[
[
0,
1710401574.332274,
0.001911401748657227
],
[
[
[
1
],
[
[
"highlight_html",
null
]
],
[
"sato <span class=\"keyword\">toshi</span>o"
]
]
]
]
[ubuntu] We can provide packages for Ubuntu again.
We didn't provide packages for Ubuntu in Groonga 14.0.0 because we failed to build the Groonga package for Ubuntu due to a problem in the build environment for Ubuntu packages.
We fixed the problem in the build environment in 14.0.1. So, we can provide packages for Ubuntu again since this release.
Fixed a build error when we build from source by using clang. [GitHub#1738][Reported by windymelt]
This is a major version up! But it keeps backward compatibility. We can upgrade to 14.0.0 without rebuilding the database.
Added a new tokenizer TokenH3Index (experimental).
TokenH3Index tokenizes WGS84GeoPoint to UInt64 (H3 index).
Added support for offline and online index construction with non-text based tokenizers (experimental).
TokenH3Index is one of the non-text based tokenizers.
[select] Added support for searching by index with a non-text based tokenizer (experimental).
TokenH3Index is one of the non-text based tokenizers.
Added new functions distance_cosine(), distance_inner_product(), distance_l2_norm_squared(), and distance_l1_norm().
With these functions and limit N, we can get only the records whose vectors have a small distance.
These functions calculate the distance in the output stage. However, we haven't optimized these functions yet.
distance_cosine(): Calculates cosine similarity.
distance_inner_product(): Calculates inner product.
distance_l2_norm_squared(): Calculates squared Euclidean distance.
distance_l1_norm(): Calculates Manhattan distance.
Added a new function number_round().
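The intended usage can be sketched as below (the Items table, embedding column, and query vector are hypothetical; the distance is computed in the output stage):

```
table_create Items TABLE_NO_KEY
column_create Items embedding COLUMN_VECTOR Float32
load --table Items
[
{"embedding": [1.0, 0.0, 0.0]},
{"embedding": [0.0, 1.0, 0.0]}
]
select Items \
--output_columns 'embedding, distance_cosine(embedding, [1.0, 0.0, 0.0])' \
--limit 10
```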
[load] Added support for parallel load.
This feature is only enabled when the input_type of load is apache-arrow.
This feature uses one thread per column. If there are many target columns, it will reduce load time.
[select] We now use uvector as much as possible for array literals in --filter.
uvector is a vector of elements with a fixed size. If all elements have the same type, we use uvector instead of vector.
[status] Added n_workers to the output of status.
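For example (a sketch; the other fields of the status response are omitted):

```
# The status response now contains an "n_workers" entry, e.g. "n_workers": 0.
status
```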
Optimized dynamic column creation.
[WAL] Added support for rebuilding broken indexes in parallel.
[select] Added support for Int64 in output_type=apache-arrow for columns that reference another table.
[Windows] Fixed the path for documents of groonga-normalizer-mysql in the package for Windows.
Documents of groonga-normalizer-mysql are put under share/ in this release.
[select] Fixed a bug that Groonga may crash when we use bitwise operations.
Dropped support for mingw32. [GitHub#1654]
Added support for index search of "vector_column[N] OPERATOR literal" with --match_columns and --query.
[windows] Bundled groonga-normalizer-mysql again. [GitHub#1655]
Groonga 13.1.0 for Windows didn't include groonga-normalizer-mysql. This problem only occurred in Groonga 13.1.0.
[select] Groonga now also caches trace logs.
Added support for outputting dict<string> in a response of Apache Arrow format.
[groonga-server-http] Added support for the new content type application/vnd.apache.arrow.stream.
[query] Added support for empty input as below.
table_create Users TABLE_NO_KEY
column_create Users name COLUMN_SCALAR ShortText
table_create Lexicon TABLE_HASH_KEY ShortText --default_tokenizer TokenBigramSplitSymbolAlphaDigit --normalizer NormalizerAuto
column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name
load --table Users
[
{"name": "Alice"},
{"name": "Alisa"},
{"name": "Bob"}
]
select Users --output_columns name,_score --filter 'query("name", " ")'
[
[
0,
0.0,
0.0
],
[
[
[
0
],
[
[
"name",
"ShortText"
],
[
"_score",
"Int32"
]
]
]
]
]
Added support for BFloat16 (experimental).
We can just load and select BFloat16 values. We can't use arithmetic operations such as bfloat16_value - 1.2.
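A minimal sketch, assuming the type name BFloat16 can be used directly in column_create (the Data table is hypothetical):

```
table_create Data TABLE_NO_KEY
column_create Data value COLUMN_SCALAR BFloat16
load --table Data
[
{"value": 1.5}
]
# We can load and select the value, but expressions like value - 1.2 are not supported.
select Data --output_columns value
```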
[column_create] Added a new flag WEIGHT_BFLOAT16.
[select] Fixed a bug that when Groonga cached an output_pretty=yes result, Groonga returned the pretty-printed result even for a query sent without output_pretty.
Fixed a bug that wrong data could be created.
In general, users can't trigger this explicitly because the command API doesn't accept GRN_OBJ_{APPEND,PREPEND}.
This may be triggered internally when a dynamic numeric vector column is created and a temporary result set is created (OR is used).
For example, the following query may create wrong data:
select TABLE \
--match_columns TEXT_COLUMN \
--query 'A B OR C' \
--columns[NUMERIC_DYNAMIC_COLUMN].stage result_set \
--columns[NUMERIC_DYNAMIC_COLUMN].type Float32 \
--columns[NUMERIC_DYNAMIC_COLUMN].flags COLUMN_VECTOR
If this happens, NUMERIC_DYNAMIC_COLUMN contains many garbage elements. It also causes too much memory consumption.
Note that this is caused by an uninitialized variable on the stack. So this may or may not happen.
Fixed a bug that valid normalizers/token_filters may fail to be set.
[fuzzy_search] Fixed a crash bug when the following conditions are met: ${ASCII}${ASCII}${MULTIBYTE}* characters exist in a patricia trie table and WITH_TRANSPOSITION is enabled.
For example, the key "aaあ" in a patricia trie table paired with the query "あああ" has this problem as below.
table_create Users TABLE_NO_KEY
column_create Users name COLUMN_SCALAR ShortText
table_create Names TABLE_PAT_KEY ShortText
column_create Names user COLUMN_INDEX Users name
load --table Users
[
{"name": "aaあ"},
{"name": "あうi"},
{"name": "あう"},
{"name": "あi"},
{"name": "iう"}
]
select Users \
--filter 'fuzzy_search(name, "あiう", "with_transposition", true)' \
--output_columns 'name, _score' \
--match_escalation_threshold -1
[select] Changed the default value of --fuzzy_max_expansions from 0 to 10.
--fuzzy_max_expansions can limit the number of words with a close edit distance that are used in the search process. This argument helps balance the number of hits and search performance.
When --fuzzy_max_expansions is 0, the search uses all words in the vocabulary whose edit distance is under --fuzzy_max_distance.
--fuzzy_max_expansions being 0 (unlimited) may slow down a search. Therefore, the default value of --fuzzy_max_expansions is 10 since this release.
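A sketch of how the limit is specified (the Memos schema mirrors the --output_trace_log example later in these notes; the values are illustrative):

```
select Memos \
--match_columns content \
--query 'Thas' \
--fuzzy_max_distance 1 \
--fuzzy_max_expansions 10
```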
[select] Improved select arguments with the addition of a new argument --fuzzy_with_transposition (experimental).
Normally, the transposition case has an edit distance of 1. However, we can choose edit distance 1 or 2 for the transposition case by using this argument.
If this parameter is yes, the edit distance of this case is 1. It's 2 otherwise.
[select] Improved select arguments with the addition of a new argument --fuzzy_tokenize.
When --fuzzy_tokenize is true, Groonga uses the tokenizer specified in --default_tokenizer in a typo tolerance search.
The default value of --fuzzy_tokenize is false. A useful case of --fuzzy_tokenize is when we use TokenMecab in --default_tokenizer.
[load] Added support for --ifexists even if we specify apache-arrow as input_type.
[normalizers] Improved NormalizerNFKC* options with the addition of a new option remove_blank_force.
When remove_blank_force is false, the normalizer doesn't ignore spaces as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
load --table Entries
[
{"body": "Groonga はとても速い"},
{"body": "Groongaはとても速い"}
]
select Entries --output_columns \
'highlight(body, \
"gaはとても", "<keyword>", "</keyword>", \
"normalizers", "NormalizerNFKC150(\"remove_blank_force\", false)" \
)'
[
[
0,
0.0,
0.0
],
[
[
[
2
],
[
[
"highlight",
null
]
],
[
"Groonga はとても速い"
],
[
"Groon<keyword>gaはとても</keyword>速い"
]
]
]
]
[select] Improved select arguments with the addition of a new argument --output_trace_log (experimental).
If we specify yes in --output_trace_log and use --command_version 3, Groonga outputs an additional new log as below.
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenNgram --normalizer NormalizerNFKC150
column_create Lexicon memos_content COLUMN_INDEX|WITH_POSITION Memos content
load --table Memos
[
{"content": "This is a pen"},
{"content": "That is a pen"},
{"content": "They are pens"}
]
select Memos \
--match_columns content \
--query "Thas OR ere" \
--fuzzy_max_distance 1 \
--output_columns *,_score \
--command_version 3 \
--output_trace_log yes \
--output_type apache-arrow
return_code: int32
start_time: timestamp[ns]
elapsed_time: double
error_message: string
error_file: string
error_line: uint32
error_function: string
error_input_file: string
error_input_line: int32
error_input_command: string
-- metadata --
GROONGA:data_type: metadata
return_code start_time elapsed_time error_message error_file error_line error_function error_input_file error_input_line error_input_command
0 0 1970-01-01T09:00:00+09:00 0.000000 (null) (null) (null) (null) (null) (null) (null)
========================================
depth: uint16
sequence: uint16
name: string
value: dense_union<0: uint32=0, 1: string=1>
elapsed_time: uint64
-- metadata --
GROONGA:data_type: trace_log
depth sequence name value elapsed_time
0 1 0 ii.select.input Thas 0
1 2 0 ii.select.exact.n_hits 0 1
2 2 0 ii.select.fuzzy.input Thas 2
3 2 1 ii.select.fuzzy.input.actual that 3
4 2 2 ii.select.fuzzy.input.actual this 4
5 2 3 ii.select.fuzzy.n_hits 2 5
6 1 1 ii.select.n_hits 2 6
7 1 0 ii.select.input ere 7
8 2 0 ii.select.exact.n_hits 2 8
9 1 1 ii.select.n_hits 2 9
========================================
content: string
_score: double
-- metadata --
GROONGA:n_hits: 2
content _score
0 This is a pen 1.000000
1 That is a pen 1.000000
--output_trace_log is valid only in command version 3.
This will be useful, for example, for investigating which keywords were actually used in a search and how many records each search stage matched.
[query] Added support for object literal.
[query_expand] Added support for NPP and ONPP (experimental).
[snippet] Added support for the normalizers option.
We can use a normalizer with options. For example, when we don't want to ignore spaces in the snippet() function, we use this option as below.
table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR ShortText
load --table Entries
[
{"content": "Groonga and MySQL"},
{"content": "Groonga and My SQL"}
]
select Entries \
--output_columns \
'snippet(content, "MySQL", "<keyword>", "</keyword>", \
"normalizers", "NormalizerNFKC150(\"remove_blank\", false)" \
)'
[
[
0,
0.0,
0.0
],
[
[
[
2
],
[
[
"snippet",
null
]
],
[
[
"Groonga and <keyword>MySQL</keyword>"
]
],
[
null
]
]
]
]
Fixed a bug in Time OPERATOR Float{,32} comparison. [GitHub#1624][Reported by yssrku]
The microsecond part (values smaller than a second) in Float{,32} wasn't used.
This happens only with Time OPERATOR Float{,32}.
This happens in load --ifexists 'A OP B || C OP D' as below.
table_create Reports TABLE_HASH_KEY ShortText
column_create Reports content COLUMN_SCALAR Text
column_create Reports modified_at COLUMN_SCALAR Time
load --table Reports
[
{"_key": "a", "content": "", "modified_at": 1663989875.438}
]
load \
--table Reports \
--ifexists 'content == "" && modified_at <= 1663989875.437'
However, this doesn't happen in select --filter.
Fixed a bug that alnum(a-zA-Z0-9) + blank may be wrongly detected as a highlight target.
If the input has 2 characters such as ab and text with some blanks such as a b is matched, a b is detected. However, it should not be detected in this case.
For example, a i is detected when this bug occurs as below.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
load --table Entries
[
{"body": "Groonga is fast"}
]
select Entries \
--output_columns 'highlight(body, "ai", "<keyword>", "</keyword>")'
[
[
0,0.0,0.0
],
[
[
[
1
],
[
[
"highlight",
null
]
],
[
"Groong<keyword>a i</keyword>s fast"
]
]
]
]
However, the above result is unexpected. We don't want to detect a i in the above case.
[column_create] Improved column_create flags with the addition of new flags COMPRESS_FILTER_SHUFFLE, COMPRESS_FILTER_BYTE_DELTA, COMPRESS_FILTER_TRUNCATE_PRECISION_1BYTE, and COMPRESS_FILTER_TRUNCATE_PRECISION_2BYTES.
Added a new bundled library: Blosc.
The COMPRESS_FILTER_SHUFFLE, COMPRESS_FILTER_BYTE_DELTA, COMPRESS_FILTER_TRUNCATE_PRECISION_1BYTE, and COMPRESS_FILTER_TRUNCATE_PRECISION_2BYTES flags require Blosc.
[status] Improved status output with the addition of the new feature entry "blosc".
[groonga] Improved groonga --version output with the addition of the new value blosc.
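A sketch of how we assume these filter flags are combined with an existing COMPRESS_* flag in column_create (the Sensors table is hypothetical):

```
table_create Sensors TABLE_NO_KEY
column_create Sensors values \
COLUMN_VECTOR|COMPRESS_ZSTD|COMPRESS_FILTER_SHUFFLE Float32
```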
[select] Improved select arguments with the addition of a new argument --fuzzy_max_distance (experimental).
[select] Improved select arguments with the addition of a new argument --fuzzy_max_expansions (experimental).
[select] Improved select arguments with the addition of a new argument --fuzzy_max_distance_ratio (experimental).
[select] Improved select arguments with the addition of a new argument --fuzzy_prefix_length (experimental).
[cast] Added support for casting "[0.0, 1.0, 1.1, ...]" to a Float/Float32 vector.
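A sketch of the new cast (hypothetical schema): a string that holds a JSON-style array of numbers can now be loaded into a Float32 vector column:

```
table_create Data TABLE_NO_KEY
column_create Data values COLUMN_VECTOR Float32
load --table Data
[
{"values": "[0.0, 1.0, 1.1]"}
]
```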
[fuzzy_search] Renamed the max_expansion option to max_expansions.
The max_expansion option is deprecated since this release. However, max_expansion can still be used in the future for backward compatibility.
Rename master branch to main branch.
[RPM] Use CMake for building.
[Debian] Added support for Debian trixie.
[fuzzy_search] Fixed a bug that Groonga may get records that should not match.
[query-syntax-near-phrase-search-condition][script-syntax-near-phrase-search-operator] Fixed a bug that Groonga crashed when the first phrase group didn't match anything as below.
table_create Entries TABLE_NO_KEY
column_create Entries content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenNgram \
--normalizer NormalizerNFKC121
column_create Terms entries_content COLUMN_INDEX|WITH_POSITION \
Entries content
load --table Entries
[
{"content": "x y z"}
]
select Entries \
--match_columns Terms.entries_content.content \
--query '*NPP1"(NONEXISTENT) (z)"' \
--output_columns '_score, content'
[normalize] Fixed a bug that the normalize command doesn't output the last offset and type.
The normalize command can output the offset and type of the normalized string as below, but it didn't output the last offset and type because of this bug.
table_create Normalizations TABLE_PAT_KEY ShortText
column_create Normalizations normalized COLUMN_SCALAR ShortText
load --table Normalizations
[
]
normalize 'NormalizerNFKC130("unify_kana", true, "report_source_offset", true), NormalizerTable("normalized", "Normalizations.normalized", "report_source_offset", true)' "お あ a ア i ア オ" REMOVE_BLANK|WITH_TYPES|WITH_CHECKS
[
[
0,
0.0,
0.0
],
{
"normalized": "お<あ>a<あ>i<あ>お",
"types": [
"hiragana",
"symbol",
"hiragana",
"symbol",
"alpha",
"symbol",
"hiragana",
"symbol",
"alpha",
"symbol",
"hiragana",
"symbol",
"hiragana"
],
"checks": [
3,
0,
0,
4,
-1,
0,
0,
-1,
4,
4,
-1,
0,
0,
-1,
4,
4,
-1,
0,
0,
-1,
4,
0,
0
],
"offsets": [
0,
4,
4,
4,
8,
12,
12,
12,
16,
20,
20,
20,
24
]
}
]
[normalizers] Fixed a bug that the last offset value may be invalid when we use multiple normalizers.
In the following example, the last offset value should be 27, but it is 17 because of this bug.
table_create Normalizations TABLE_PAT_KEY ShortText
column_create Normalizations normalized COLUMN_SCALAR ShortText
load --table Normalizations
[
]
normalize 'NormalizerNFKC130("unify_kana", true, "report_source_offset", true), NormalizerTable("normalized", "Normalizations.normalized", "report_source_offset", true)' "お あ a ア i ア オ" REMOVE_BLANK|WITH_TYPES|WITH_CHECKS
[
[
0,
0.0,
0.0
],
{
"normalized": "お<あ>a<あ>i<あ>お",
"types": [
"hiragana",
"symbol",
"hiragana",
"symbol",
"alpha",
"symbol",
"hiragana",
"symbol",
"alpha",
"symbol",
"hiragana",
"symbol",
"hiragana",
"null"
],
"checks": [
3,
0,
0,
4,
-1,
0,
0,
-1,
4,
4,
-1,
0,
0,
-1,
4,
4,
-1,
0,
0,
-1,
4,
0,
0
],
"offsets": [
0,
4,
4,
4,
8,
12,
12,
12,
16,
20,
20,
20,
24,
17
]
}
]
[highlight_html] Don't report an error when we specify an empty string to highlight_html() as below.
highlight_html() just returns an empty text.
table_create Entries TABLE_NO_KEY
column_create Entries body COLUMN_SCALAR ShortText
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer 'TokenNgram("report_source_location", true)' \
--normalizer 'NormalizerNFKC150'
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
load --table Entries
[
]
select Entries \
--match_columns body \
--query 'ab' \
--output_columns 'highlight_html("", Terms)'
[
[
0,
0.0,
0.0
],
[
[
[
1
],
[
[
"highlight_html",null
]
],
[
""
]
]
]
]
Added support for aggregator_* for dynamic columns and pseudo columns.
A pseudo column is a column whose name starts with _ (e.g. _id, _nsubrecs, ...).
[CMake] Fixed a build error with CMake when both msgpack and msgpackc-cxx are installed.
Please refer to the comment of https://github.com/groonga/groonga/pull/1601 for details.
Fixed a parse bug when we use x OR <0.0y with QUERY_NO_SYNTAX_ERROR.
Records that should match may not be matched.
For example, if we execute the following query, {"_key": "name yyy"} should match but is not matched.
table_create Names TABLE_PAT_KEY ShortText
table_create Tokens TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
column_create Tokens names_key COLUMN_INDEX|WITH_POSITION Names _key
load --table Names
[
]
select Names \
--match_columns "_key" \
--query "xxx OR <0.0yyy" \
--query_flags ALLOW_PRAGMA|ALLOW_COLUMN|QUERY_NO_SYNTAX_ERROR
[
[
0,
0.0,
0.0
],
[
[
[
0
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
]
]
]
]
]
[highlight_html] Fixed a bug that the highlight position may be incorrect.
For example, this bug occurs when we specify both a one-character keyword and a two-character keyword as highlight targets.