cloud-native file store
Full Changelog: https://github.com/cubeFS/cubefs/compare/v2.4.1...v2.4.3
If your CubeFS (ChubaoFS) version is v2.3.x or earlier, please refer to the UPGRADE NOTICE in v2.4.0 for upgrade steps. Also make sure that your fuse client and objectnode versions are equal to or older than the servers, i.e. master, metanode and datanode. In other words, a newer-versioned client cannot be used in a cluster with older-versioned servers.
This is a very important release: a volume can now be created as either the standard type (3 replicas) or the low-frequency-access type (erasure coding). We introduce a new BlobStore subsystem (https://github.com/cubeFS/cubefs-blobstore), which supports erasure-coded storage, to back low-frequency volumes and reduce cost.

Because 3.0-beta is based on release-v2.4, which lags behind master and is not easy to merge, the core code is temporarily merged into a single commit. If you have any questions, you can ask in the community group. All related code and detailed commits will be merged into master and are expected to ship as release v3.0.0 within the next two months.
Please refer to the documentation for details.
English version: https://cubefs.readthedocs.io/en/latest/overview.html
Chinese version: https://cubefs.readthedocs.io/zh_CN/latest/
- fuse client: read/write BlobStore (Erasure-Code) #212
- master: multi volume type #1291
- master: low-frequency access type volume support #1291
- master: volume high- and low-frequency datapartition offline #1291
- master: scale out of low-frequency volume #1291
- master: recycle datapartition #1291
- master: support different types of dp in a volume #1291
- meta: support index of BlobStore #212
- meta: obsolete index of eliminated data #1291
- datanode: support two different types of datapartition #1291
- datanode: obsolete and clear cached datapartition #1291
- datanode: offline datapartition #1291
- project name upgrade: chubaofs to cubefs
If your ChubaoFS version is v2.3.x or earlier, please refer to the UPGRADE NOTICE in v2.4.0 for upgrade steps. Also make sure that your fuse client and objectnode versions are equal to or older than the servers, i.e. master, metanode and datanode. In other words, a newer-versioned client cannot be used in a cluster with older-versioned servers.

- datanode: feat: introduce metrics degrade level to datanode [#1234]
- datanode: remove the 5 GB free-space limit for a disk to create a datapartition [#1345]
- fuse client: follow recover handlers when closing an open handler [#1344]
- datanode: improve write performance; allocated data was not put back into the buffer pool [#1346]
- datanode: remove unnecessary JSON unmarshal when getting local extents [#1346]
- fuse client: support auto push data to push gateway #1175
- master: reduce traffic on the "client/partition" interface; followers now serve it without redirecting to the leader #1452
- raft: make the receive buffer channel size of raft configurable #1171
- metanode: implement the content summary #1161
- master: domain for cross zone #1169
- metanode: when a metanode is killed (with the mp stopped first) while an apply-snapshot is in progress, the snapshot blocks and the metanode cannot be killed #1307
- metanode: metanode migration was recorded in badMetaPartitions concurrently without a lock, which could lose an mp id #1259
- metanode: make AppendExtentKeyWithCheck request idempotent #1460
- fuse client: push addr shadowed if export port is not set #1461
- objectnode: readdir gets insufficient dentries if prefix is larger than marker #1462
- raft: remove redundant memcopy when reading raft snapshot #1257
- raft: start tcp listen before starting raft #1463
- ltp test: broken test case ftest01; update and unlock ltp test cases
- ltp test: fix ltptest ci bug
- docker: install the killall command in the docker image
- cli: fix cli tool typo
- master: support disk, datanode and metanode decommission with an assigned target node #1237
- master: return the response cache directly instead of copying it again #1464
- fuse client: increase client retry times to avoid mp raft election timeout
- grafana: change the grafana volume used size rate from rate to deriv
- grafana: support pushing monitor data to the gateway
- objectnode: add subdir authorization check for objectnode
- raft: check for nil when getting leader term info #1465
- raft: check size when reading data from the heartbeat port #1465
- raft: add getRaftStatus for metanode #1465
- style: rename cfs to cbfs & optimize the libsdk module
- docker: use ghcr instead of dockerhub
- build: add version information and rules for building fsck
- build: update the build status badge

1. Purpose

In the cross-zone scenario, reliability needs to be improved. Compared with versions before 2.5, the number of copysets can be reduced with high probability. The key point is to use fault domains to group nodesets across multiple zones.
Related paper: https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon
Chinese readers can refer to https://zhuanlan.zhihu.com/p/28417779
2. Configuration

1) Master

Config file: master.json
- Enable the fault domain: set "faultDomain": true
- Zone count used to build a domain: faultDomainGrpBatchCnt, default 3; 2 or 1 can also be set
- If a zone is unavailable due to a network partition, build the nodeset group from the zones that are usable: set "faultDomainBuildAsPossible" to true (default false)

Distribution of nodesets under different faultDomainGrpBatchCnt values:
- 3 zones: 1 nodeset per zone
- 2 zones: 2 nodesets in one zone and 1 in the other; the remaining space is used as the weight, and the 2 nodesets are built in the zone with more space left
- 1 zone: 3 nodesets in the single zone
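Put together, the master-side settings above might look like the fragment of master.json below. This is a sketch: only the fault-domain keys are shown, and all other master options (listen port, peers, etc.) are assumed to come from your existing config.

```json
{
  "faultDomain": true,
  "faultDomainGrpBatchCnt": 3,
  "faultDomainBuildAsPossible": false
}
```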
Ratios:
1) Usage threshold of the non-fault-domain (origin zone) space. After the upgrade, the zones used by existing volumes can still be expanded and operated as before, or the fault-domain space can be used; but the original space usage must first reach a threshold, i.e. the proportion of the overall space used by meta or data, given by this configuration item. Default: 0.90. Update interface: AdminUpdateZoneExcludeRatio = "/admin/updateZoneExcludeRatio"
2) Usage threshold of a nodeset group inside the domain; above it, the nodeset group is no longer used for dp or mp allocation. Default: 0.75. Update interface: AdminUpdateDataUseRatio = "/admin/updateDomainDataRatio"
2) Datanode && metanode

After the fault domain is enabled, the minimum fault-domain deployment under the default configuration is: each zone contains 1 datanode and 1 metanode, and the zone name must be specified in each node's config file; that is, 3 datanodes and 3 metanodes across 3 zones.

For example, the three datanodes (and metanodes) are configured with "zoneName": "z1", "zoneName": "z2" and "zoneName": "z3" respectively. After they start, the master builds a nodeset for each of z1, z2 and z3 and composes them into a nodeset group.
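For instance, the first datanode's config file would carry its zone roughly as below; only the zone key is shown, and the rest of the node's configuration is assumed unchanged:

```json
{
  "zoneName": "z1"
}
```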
3. Note

1) After the fault domain is enabled, all devices in newly added zones join the fault domain.
2) Newly created volumes preferentially use the resources of the original zones.
3) To use domain resources when creating a new volume, add configuration items according to the table below. By default, the original zone resources are used first while they are available.
| Cluster: faultDomain | Vol: crossZone | Vol: defaultPriority | Rules for volume to use domain |
|---|---|---|---|
| N | N/A | N/A | Domain not supported |
| Y | N | N/A | Write origin resources first; use the fault domain once origin reaches the threshold |
| Y | Y | N | Write fault domain only |
| Y | Y | Y | Write origin resources first; use the fault domain once origin reaches the threshold |
Note: the fault domain is designed for cross-zone use by default. A single-zone fault domain is treated as a special case of cross zone, and the options are consistent.

Example:
curl "http://10.177.200.119:17010/admin/createVol?name=vol_cross5&capacity=1000&owner=cfs&crossZone=true&defaultPriority=true" | jq .
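The selection rules in the table above can be sketched as a small function. This is only an illustration of the table, not CubeFS's actual code, and the function and return strings are hypothetical:

```go
package main

import "fmt"

// domainRule mirrors the table above: given the cluster's faultDomain
// switch and the volume's crossZone/defaultPriority flags, it returns
// where new data/meta partitions are placed.
func domainRule(faultDomain, crossZone, defaultPriority bool) string {
	if !faultDomain {
		return "domain not supported"
	}
	if crossZone && !defaultPriority {
		return "write fault domain only"
	}
	// crossZone=N (any priority), or crossZone=Y with defaultPriority=Y:
	// origin-zone resources are used first until they reach the threshold.
	return "write origin resources first, then fault domain"
}

func main() {
	// A volume created with crossZone=true&defaultPriority=true, as in
	// the curl example above, falls into the last rule.
	fmt.Println(domainRule(true, true, true))
}
```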
1. Purpose

To query the content summary of a directory efficiently, e.g. total file size, total file count and total directory count, v2.5 stores such information in the parent directory's xattr. Each directory stores the file count, directory count and total file size of its immediate children; querying a directory's content summary then only requires recursing into its subdirectories and accumulating the values they store.
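The accumulation described above can be sketched as follows. This is a toy model of the idea only (each directory holds per-level counters, and a query sums them over the subtree), not the actual metanode implementation; all names are illustrative:

```go
package main

import "fmt"

// dirStat is the per-directory counter a directory stores for its
// immediate children, analogous to the xattr described above.
type dirStat struct {
	files     int
	dirs      int
	fileBytes int64
	children  []*dirStat // subdirectories
}

// summary walks the subtree and accumulates every directory's
// locally stored counters into one content summary.
func summary(d *dirStat) (files, dirs int, bytes int64) {
	files, dirs, bytes = d.files, d.dirs, d.fileBytes
	for _, c := range d.children {
		f, dd, b := summary(c)
		files, dirs, bytes = files+f, dirs+dd, bytes+b
	}
	return
}

func main() {
	sub := &dirStat{files: 2, fileBytes: 200}
	root := &dirStat{files: 1, dirs: 1, fileBytes: 100, children: []*dirStat{sub}}
	f, d, b := summary(root)
	fmt.Println(f, d, b) // total files, dirs and bytes over the subtree
}
```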
2. Configuration

Client config file: fuse.json
1) Enable xattr: "enableXattr": "true"
2) Enable summary: "enableSummary": "true"
Both xattr and summary have to be enabled if you want to mount the volume to a local disk. Enabling summary alone is enough if you access the volume via libsdk.so.
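In fuse.json this might look like the fragment below. Only the two summary-related keys are shown; the remaining mount options are assumed to come from your existing client config:

```json
{
  "enableXattr": "true",
  "enableSummary": "true"
}
```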
3. How to use

There are two ways to get the content summary of a directory:
1) Fuse mount: getfattr -n DirStat yourDirPath (getfattr can be installed via yum install attr or apt install attr)
2) libsdk.so: cfs_getsummary (libsdk/libsdk.go)
4. Note

1) Summary information for newly created files is maintained by their parent directories, but files that existed before the feature was enabled are not counted; use the cfs_refreshsummary interface (libsdk/libsdk.go) to rebuild the content summary.
2) The file count, directory count and total file size are updated asynchronously in the background. Users are not aware of these operations, but they do increase the number of requests to the meta servers (usually doubled). You are recommended to evaluate the impact on your cluster before enabling this feature.
If your ChubaoFS version is v2.3.x or earlier, please refer to the UPGRADE NOTICE in v2.4.0 for upgrade steps. Also make sure that your fuse client and objectnode versions are equal to or older than the servers, i.e. master, metanode and datanode. In other words, a newer-versioned client cannot be used in a cluster with older-versioned servers.
- meta & object: introduce the ReadDirLimit interface to retrieve partial results #1234
- fuse client: use ReadDirLimit in the fuse client #1244
- sdk: make AppendExtentKeyWithCheck request idempotent #1224
- meta: start tcp listen before starting raft #1256
- raft: remove redundant memcopy when reading raft snapshot to avoid the snapshot hanging #1264
- object: handle range read requests in a behavior compatible with S3 #1286 #1298
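ReadDirLimit-style interfaces follow the usual marker/limit paging pattern: each call returns at most `limit` dentries starting from `marker`, and the caller resumes from the last name returned. The sketch below illustrates that pattern over an in-memory sorted list; the names are illustrative and not CubeFS's actual API:

```go
package main

import (
	"fmt"
	"sort"
)

// readDirLimit returns up to limit names that are >= marker,
// mimicking a paginated directory listing over a sorted name list.
func readDirLimit(names []string, marker string, limit int) []string {
	i := sort.SearchStrings(names, marker) // first index with names[i] >= marker
	end := i + limit
	if end > len(names) {
		end = len(names)
	}
	return names[i:end]
}

func main() {
	dentries := []string{"a", "b", "c", "d", "e"}
	marker := ""
	for {
		page := readDirLimit(dentries, marker, 2)
		if len(page) == 0 {
			break
		}
		fmt.Println(page)
		// Resume just after the last returned name on the next call.
		marker = page[len(page)-1] + "\x00"
	}
}
```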
- fuse client: support auto push data to push gateway
- master: reduce traffic on the "client/partition" interface; followers now serve it without redirecting to the leader
- raft: make the receive buffer channel size of raft configurable
- metanode: implement the content summary
- master: domain for cross zone
- fuse client: fix: currAddr of stream conn is empty
- fuse client: fix: update extent cache no matter whether eh is dirty or not
- metanode: when a metanode is killed (with the mp stopped first) while an apply-snapshot is in progress, the snapshot blocks and the metanode cannot be killed
- metanode: metanode migration was recorded in badMetaPartitions concurrently without a lock, which could lose an mp id
- metanode: return file nodes for statfs
- metanode: do not warn when consulMeta is not set
- metanode: make AppendExtentKeyWithCheck request idempotent
- fuse client: push addr shadowed if export port is not set
- objectnode: readdir gets insufficient dentries if prefix is larger than marker
- raft: remove redundant memcopy when reading raft snapshot
- raft: start tcp listen before starting raft
- ltp test: broken test case ftest01; update and unlock ltp test cases
- ltp test: fix ltptest ci bug
- docker: install the killall command in the docker image
- cli: fix cli tool typo
- master: support disk, datanode and metanode decommission with an assigned target node
- master: return the response cache directly instead of copying it again
- master: add a vol usedRatio warn log when usage > 90%
- master: add master metrics vol_meta_count to export vol dp/mp/inode/dentry
- master: delete a replica with force if decommission hangs
- metanode: add log for loading meta partitions
- fuse client: increase client retry times to avoid mp raft election timeout
- grafana: change the grafana volume used size rate from rate to deriv
- grafana: support pushing monitor data to the gateway
- objectnode: add subdir authorization check for objectnode
- objectnode: track objectnode modification for cfs-server
- raft: check for nil when getting leader term info
- raft: check size when reading data from the heartbeat port
- raft: add getRaftStatus for metanode
- cli: update gitlab-ci
- cli: add ci check for ltptest result
- cli: upload docker_data when ci tests finish
- cli: create and update ci.yml
- style: rename cfs to cbfs & optimize the libsdk module
- docker: use ghcr instead of dockerhub
- build: add version information and rules for building fsck
- build: update the build status badge

Please refer to the notice of release v2.4.0 for how to upgrade from previous versions.
- objectnode: introduce ReadDirLimit, which returns part of all the dentries (#1243)
- fuse client: fix occasional IO errors (#1179 #1205 #1215)
- fuse client: statfs system call returns the correct inode count (#1192)
- fuse client: make AppendExtentKeyWithCheck idempotent
- ci: unlock some useful LTP test cases; make ci result checking more strict; upload logs after ci finishes for debugging convenience
- objectnode: support subdir checks (#1208)
- ci: add gofmt check (#1233)
- raft: check size when reading data from the heartbeat port
- monitor: add master metrics vol_meta_count to export vol dp/mp/inode/dentry count info