AIStore: scalable storage for AI applications

3.10

2 years ago

Highlights

  • Global (cluster) config meta-version 2
  • Bucket metadata (BMD) meta-version 2
  • Python SDK
  • Build: Go 1.18; upgrade all packages
  • MessagePack archiving

Global Config v2 & Bucket Metadata v2

  • bump up cluster config's meta-version; handle backward compatibility !5122, !5123, !5124
  • redefine and extend feature flags !5132
  • capacity and space watermarks (cleanup, low/high, OOS) !5143, !5144
  • BMD meta-v2 with backward-compatible logic to load v1 !5145, !5150
  • BMD meta-v2: simplify mirroring configuration !5151
  • add transport section !5153
  • support loading previous meta-version !5154
  • readonly knobs; backward compatibility !5155
  • introduce cos.size type and "kb", "mb", etc. suffixes !5156
  • cos.size vs backward compatibility !5157
  • add memsys section and tunables !5158
  • log flush time interval !5159
  • join-cluster-at-startup timeout !5161
  • add log-stats time, add validators !5171
  • continued refactoring !5172
  • add tcb (transform-and-copy-bucket) section; stream bundle multiplier; compression !5173
  • revise docs/configuration.md !5180
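
The `cos.size` item above adds human-readable size suffixes ("kb", "mb", etc.) to configuration values. The following is a minimal sketch of how such suffix parsing might work, not AIStore's actual implementation: the `parseSize` function, the suffix table, and the binary (1024-based) multipliers are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// suffixes is a hypothetical table; this sketch ASSUMES binary (1024-based) units.
// Longer suffixes come first so that "kb" is matched before the bare "b".
var suffixes = []struct {
	name string
	mult int64
}{
	{"kb", 1 << 10},
	{"mb", 1 << 20},
	{"gb", 1 << 30},
	{"tb", 1 << 40},
	{"b", 1},
}

// parseSize (hypothetical helper) converts strings such as "10mb" or "512"
// into a number of bytes. A value without a suffix is taken as bytes.
func parseSize(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	mult := int64(1)
	for _, sf := range suffixes {
		if strings.HasSuffix(s, sf.name) {
			s = strings.TrimSpace(strings.TrimSuffix(s, sf.name))
			mult = sf.mult
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("invalid size: negative value %d", n)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"4kb", "2MB", "512", "oops"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```

Accepting a plain integer alongside suffixed strings is one way to keep older, numeric-only configuration values loadable, which is the kind of backward compatibility the neighboring items refer to.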

msgpack archiving; list, create, and read msgpack formatted shards

  • msgp (generic) shard !5100, !5101
  • fix preambles in the msgp generated files !5086
  • alternative msgp; sgl as io.ByteScanner; Python unpack to test !5103
  • assorted text fixes; more msgpack !5104
  • archiving formats: add msgpack (in addition to tar, tgz, zip) !5107, !5108
  • add integration tests !5100, !5101

Python SDK

  • basic bucket and object operations !5109
  • introduce pydantic; refactor list-buckets !5120
  • CI: add Python SDK test !5112
  • improve list-objects API !5163
  • refactor API calls for usability !5128
  • move direct requests call out of list objects, create and delete bucket, cluster map !5130
  • put-object & head-bucket API !5141
  • fix return type in _request_raw method !5164
  • add head-object method !5166
  • add setUp and tearDown in tests !5167
  • improve creating temporary file !5168
  • add evict-bucket and delete-object; add props argument for list-objects !5175
  • get-object returns a stream instead of the entire content !5184
  • update README, docs !5178, !5190
  • fix naming; add read-all method for streams !5196
  • list_object to support pagination !5198
  • improve object streaming; rename msg.py -> types.py !5203
  • Python SDK tutorial !5218

CLI

  • show cluster config feature flags in human-readable format !5133
  • fix setting backend bucket props !5152
  • add ais object rm --all to remove all in one shot !5195
  • verbose and non-verbose multi-object operations !5197

Tests

  • re-enable smoke-test for Cloud buckets !5098
  • msgp (generic) shard !5100, !5101
  • add missing wait for put-copies; BMD string !5204
  • failback; devtools: wait-cluster; transport !5209
  • dummy keepalive tracker !5211

Docs

  • website: tech blog on promoting local and shared files !5082
  • explain msgp usage; add msgp-update to Makefile !5085
  • update msgp generator usage !5094
  • msgpack !5110
  • revise and amend configuration.md !5180
  • performance tuning !5115, !5116
  • development, first-time usage, production deployments (summary) !5127
  • getting started (recommendations) !5129
  • local playground, tracking access time (more edits) !5139
  • fix references, add a reference !5140
  • erasure coding (space utilization ratio, IO performance) !5142

Build

  • upgrade fasthttp package !5092
  • upgrade direct dependencies !5096, !5097
  • transition to Go 1.18 !5106
  • upgrade all dependencies except 1.18 !5113
  • upgrade all dependencies except 1.18; v3.10-rc2 !5181
  • upgrade all dependencies including 1.18; v3.10-rc3 !5186
  • upgrade all minors !5200

Bug fixes and performance improvements; continuous refactoring

  • skip bucket name, etc. validation when initializing lom !5083
  • follow-up !5084
  • unsafe string: inline and use it consistently !5087
  • write-msg-pack (ref) !5088
  • control-plane broadcast & unicast: result value factories and decoders !5089, !5090, !5091, !5093, !5095
  • global rebalance: dynamic reg/unreg streams; housekeeper name suffix !5099
  • fix sending http request body on redirect: recompute content-length !5102
  • not restoring atime when !5105
  • multi-object operations !5111
  • local from-scratch deployment: add user-friendly tips !5114
  • follow-up !5117, !5118
  • write-policy: both metadata and data; bump up config's meta-version; backward compatibility (major) !5121
  • deploy: fix Dockerfiles and related scripts !5126
  • assorted usability tweaks !5134
  • local playground: print ports AIS listens on !5135
  • remove bench/soaktest; rewrite devtools/readme !5136
  • config: write policy for data must be 'immediate' !5137
  • docker + LVM deployment: no disks fix !5146
  • move mock-bmd => cluster/mock (refactoring) !5147
  • docker + LVM deployment: no disks !5148
  • local playground: pgrep to check; quiet run-cmd !5160, !5162
  • fix deploy.sh script for older bash versions !5165
  • target startup: skip docker union mounts when enumerating drives !5169, !5170
  • dedup fix HEAD(obj) system attributes !5174
  • get-object lock/unlock (refactoring) !5176
  • URI parsing, config validations, and assorted ref !5177, !5179
  • making "rlock" exception to forcefully remove corrupted !5182
  • dont-lookup-remote-bucket = true: proceed silently, log-wise !5183, !5185
  • remove +build directive (obsolete) !5188, !5189
  • get random proxy (ref) !5191
  • cluster map (ref); node equality !5192
  • PUT object vs remote backend returning status=503 !5193
  • multi-object operation, range template (ref) !5194
  • aisloader works !5199
  • when shutting down warn with details !5201
  • api: xaction wait-IC min waiting time !5202
  • compute-checksum and clone-md (ref) !5205
  • memsys: mem-free to include buff/cache !5206
  • memsys: revise sys/mem (major) !5207
  • lif to lom conversion: free on error; EC error handling !5208
  • revise xs/rename-bucket !5210
  • deferred unlock via lightweight lif (ref) !5212
  • lom as a reader, lif as unlocker !5213
  • get-from-neighbor (gfn): reg hk housekeeper under lock; abort to self-expire !5215
  • lom copy: allocate with name !5216
  • lom xattr (ref) !5217

3.9

2 years ago

AIS v3.9 is a substantial productization and performance-improving upgrade. Much of the codebase has been refactored for consistency, with numerous micro-optimizations and stabilization fixes across the board.

Highlights

  • promote: redefine to handle remote file shares; collaborate when promoting via entire cluster; add usability options; productize;
  • xmeta: extend to also dump in a human-readable format: a) erasure-coded metadata and b) object metadata;
  • memory usage and fragmentation: consistently use mem-pooling (via sync.Pool) for all control structures in the datapath;
  • optimistic concurrency when running batch prefetch jobs; refactor and productize;
  • optimize PUT datapath;
  • core logic to deconflict running concurrent xactions (asynchronous jobs): bucket rename vs bucket copy, put a node into maintenance mode vs offline ETL, and similar;
  • extend and reinforce resilvering logic to withstand simultaneous disk losses/attachments - at runtime and with no downtime;
  • stabilize global rebalance to successfully pass multiple hours of random node "kills" and restarts - node-left and node-joined events - in presence of stressful data traffic;
  • self-healing: object metadata cache to support recovery upon mountpath events (e.g., drive failures);
  • error handling: phase out generic fmt.Errorf and consistently use assorted error types instead;
  • additional options to speedup listing of very large buckets (list-objects);
  • numerous micro-optimizing improvements: fast datapath query (DPQ) and many more.
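
The memory-pooling highlight above refers to reusing datapath control structures via `sync.Pool` rather than allocating them per request. A minimal sketch of that pattern follows; the `putArgs` structure and the `allocPutArgs`/`freePutArgs` helpers are hypothetical names chosen for illustration and are not taken from the AIStore codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// putArgs is a hypothetical per-request control structure.
type putArgs struct {
	bucket string
	object string
	size   int64
}

// putArgsPool hands out reusable *putArgs values; New is called only
// when the pool has nothing to reuse.
var putArgsPool = sync.Pool{
	New: func() interface{} { return new(putArgs) },
}

// allocPutArgs takes a structure from the pool.
func allocPutArgs() *putArgs {
	return putArgsPool.Get().(*putArgs)
}

// freePutArgs zeroes the structure and returns it to the pool, so that
// no state leaks from one request into the next.
func freePutArgs(a *putArgs) {
	*a = putArgs{}
	putArgsPool.Put(a)
}

func main() {
	a := allocPutArgs()
	a.bucket, a.object, a.size = "data", "shard-0001.tar", 1024
	fmt.Printf("handling PUT %s/%s (%d bytes)\n", a.bucket, a.object, a.size)
	freePutArgs(a)
}
```

Zeroing on free (rather than on alloc) is a design choice in this sketch: it guarantees that whatever comes out of the pool is clean regardless of whether it was reused or freshly allocated.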

Promote files and directories

  • refactor as a 2-phase transaction and auto-detect file share (initial) !4929
  • auto-detect file share and distribute the work between target nodes !4945
  • add test; add target node => IC notifications !4975
  • extend test coverage; reinforce global UUID when promoting via entire cluster !4976
  • rename api.Promote; add test permutations and checks !4985
  • remove redundant control structures; cleanup !4987
  • add API options delete-src (delete source) and overwrite-dst (overwrite destination) !4988
  • fix extra-copy optimization with full refactoring !4989
  • revise/optimize HEAD(object) implementation and utilize it when promoting with overwrite-dst=false (major) !4991
  • extend object write transaction (OWT) to support the flow !4992
  • support in (i.e., transmitted), out (i.e., received) and locally-promoted stats counters - files/objects and bytes !4993
  • introduce confirmed file share; add user option not to auto-detect file share !5019
  • CLI: add overwrite-dst and delete-src E2E tests !5024
  • consolidate control, eliminate ambiguity !5045
  • increase test coverage !5047, !5063
  • add all test permutations to cover (ais | cloud | remote-ais) bucket vs. (non-redundant | EC | n-way mirror) !5068

ETL

  • add CLI to view stored ETL code and specification !4925
  • handle target-down; test !4933
  • redefine and improve ETL API (!4947, !4966, !5022), including:
    • manage (CRUD wise) persistently stored ETL definitions
    • eliminate redundant URL path parameters
    • enforce uniqueness of the user-provided ETL name
  • remove (obsolete) embedded ETL-specific annotations from the init spec (pod template)
  • support stopping and restarting ETLs !5005, !5056
  • update ETL docs and fix minor bugs !4984, !5022

Global Rebalance

  • global rebalance status: always respond with total (cumulative) stats counters !4905
  • generic fs.Walk for global rebalance (refactoring) !4889, !4930
  • get-status & health !4934
  • global rebalance status: reimplement to optimize !4936
  • devtools: merge WaitForRebalanceToStart and WaitForAllResilver !4937
  • tweak/optimize receive logic !4994
  • abort via stage notifications from other target nodes (major) !5015
  • transport streams vs receive errors; assorted fixes !5040
  • tweak preemption logic (when rebalance triggering events arrive back to back) !5057
  • assorted fixes: global rebalance vs n-way mirroring & resilvering !5071, !5072
  • consistent renames and continuous refactoring !5075

Resilvering (in presence of drive failures and attachments)

  • tools and stats: wait-for-all-resilvers, multi-snap API !4888
  • resilvering vs copy management (major) !4865, !4866, !4867
  • resilvering: tweak is-active time interval !4882
  • support losing multiple disks (mountpaths) simultaneously !4884
  • multiple overlapping add/remove disk operations: fixes !4894
  • resilvering as scrubbing: recover objects to their expected (default) locations !4900
  • resilvering: interval-of-inactivity multiplier !4974
  • resilvering under stress in presence of lost mountpath(s) !5058

Asynchronous Jobs (aka xactions)

  • when aborting and propagating abort to the control-plane caller, make sure not to lose the original cause for the abort !4886
  • fix put-xaction finishing logic !4887
  • aborting jobs: propagate the original cause through channels and APIs !4890, !4891
  • revise lookup by only-running and/or by UUID !4897
  • move xaction and xreg packages with refactoring !4898
  • clarify running vs not finished xaction !4908
  • registry: fix matching logic, remove redundant code !4911, !4912
  • registry: amend housekeeping !4913
  • registry: continued refactoring and cleanup !4914
  • "limited coexistence" between running and about-to-run services (new) !4915, !4916, !4917
  • xact package: revise and optimize abort-checking concurrency !4923
  • registry: continued simplification and cleanup !4940
  • reinforce global UUID for all cluster-wide xactions !4978
  • IC notifications vs transactional xactions: same rules for all !4982
  • more stateful info: propagate xaction reference all the way into local PUT flow (major) !4995
  • copy-object -- xaction -- promote: continued refactoring !5002
  • registry: micro-optimizations and cleanup !5028

CLI

  • PUT: add an option not to load (skip loading) object metadata; amend docs; refactor and cleanup !4859
  • add a command to view ETL code/spec !4925
  • fix: do not add --help flag to the subcommands of subcommands !4926
  • amend 'show config' to include CLI config (in addition to cluster and local configs); fix cluster-unreachable error !4983
  • revise 'flag-is-set' for Boolean flags !5021, !5023, !5030
  • copy bucket: add --force option !5042
  • add start etl command !5056
  • ais show cluster: add support for refresh=<time-interval> and count options !5076
  • update CLI docs !5078
  • enable 'ais show storage' and friends to run continuously and refresh periodically !5079

Testing

  • test fixes to align with changes in the core !4861
  • add ensure-num-mountpaths helper, and reinforce !4892
  • use api.WaitForXaction instead of tutils.WaitForXactionByID !4893
  • re-enable one fs-checker test, allow more time for mountpaths !4903
  • add more checks when downloading object !4910
  • extend CLI e2e promote test !4932
  • WaitFor follow-up !4943
  • fix e2e AuthN messages !4952
  • retry upon failure to recover a damaged erasure-coded object !4986
  • amend and extend EC tests !4990
  • revise and enable bucket-rename-and-copy test !5060

CI/CD (continuous integration)

  • add CI job that runs on multiple cloud buckets !5027
  • add 1.18-rc1 version to build check !5044
  • add test-short-minimal to test a single-node cluster !5046
  • make AWS_REGION global env variable !5050
  • update test-long stage !5059

Bug fixes, performance improvements; continuous refactoring

  • LOM load to return distinct types: syscall-error and corrupted-error !4849
  • LOM vs n-way mirroring: fix and revise caching of the metadata !4850
  • list-objects: add fast mode --only-names !4851
  • introduce permission to overwrite disconnected backend !4852
  • api: refactor PUT API; fix devtools !4853
  • optimize PUT latency by allowing not to load object metadata !4854
  • aisloader bench: do not run goroutine per each PUT request !4855
  • reinforce access time atime (major) !4856
  • reintroduce no-metadata error; fix n-way stress; refactor !4857
  • build: fix deprecation warning on MacOS !4858
  • when copying objects differentiate between copying == mirroring and all other scenarios !4860
  • simplify LOM from-fs logic !4862
  • general: don't use regex to validate names and UUIDs !4863, !4864
  • assorted fixes !4868
  • preserve atime across LOM caches !4869
  • storage cleanup: leftover copies, corrupted and missing metadata !4870
  • refactor cmn and api packages !4871
  • list-objects: use-cache option !4872
  • consistently use HTTP status 507 throughout; assorted fixes !4875
  • eliminate redundant mirroring !4876
  • get-cold (aka cold-GET) follow-up !4877
  • object write transaction (OWT) fusion !4878
  • prefetch: support optimistic concurrency (major) !4879
  • name locker: fit two structures into 24 bytes !4880, !4881
  • disable/detach mountpath: graceful (admin request) and immediate (FSHC) !4883
  • move health package under fs/health !4885
  • bucket summary and obj-list query: move, refactor, and simplify !4895, !4896
  • control-plane: always free call-results back to pool !4899
  • api: eliminate code duplication !4901
  • general: deprecate and remove query objects !4902
  • refactor ais package !4904
  • control-plane transactions: refactor, reduce code !4906
  • initial ETL get API implementation !4907
  • control-plane transactions: follow-up !4909
  • introduce read-only (but still configurable) timeouts: cplane-operation and max-keepalive !4918
  • intra-cluster transport streams: tweak termination logic !4919
  • slab allocator: amend pooling of the SGL control structures !4920
  • transport streams: tweak termination logic !4921
  • transport streams: revise transmit <=> terminate concurrency, optimize !4922
  • refactor transaction types (minor) !4927
  • list-objects: name-only option (follow-up) !4928
  • fs: non-recursive walk in lexical order !4931
  • fs: refactor bucket-traversing logic, eliminate nested closures !4935
  • list-objects and bucket-summary: refactor target side !4938
  • LRU & storage cleanup: clarify when-previous-is-running !4939
  • transport: redefine client callback to return error !4941
  • transport: consistent drain-and-free cleanup on the receive side !4942
  • lint: gofumpt & gocritic !4944
  • list-objects buffering, caching: refactor, optimize !4946
  • etl: intuitive RESTful URL paths - Part 1 !4947
  • feature flags: refactor, enforce intra-cluster requests via API endpoints !4948
  • API-level JSON messaging: uniformity & consistency !4949
  • core: fast URL query parsing (major) !4950, !4953, !4954, !4955, !4956, !4957, !4958
  • tools: add support for EC metafile to xmeta !4959
  • bench: revamp lstat !4960
  • api package: memory pooling !4961
  • follow-up: DPQ; t.Helper !4962
  • api package: memory pooling !4963
  • api package: memory pooling !4964
  • fs and cos packages: alternative slightly faster fstat to check existence !4965
  • etl: change init-code and init-spec; intuitive RESTful APIs - part 2 !4966
  • fix the logic to attach remote cluster during early startup !4967
  • api package: memory pooling !4968
  • api/object, aisloader: continued refactoring !4969
  • core/cluster: rewrite max-version decision logic !4970
  • core: call-args control structure !4971, !4972, !4973
  • tools: xmeta support for LOM !4977
  • IC notifications: eliminate redundant aborts !4980
  • etl: implement ETL Stop & Delete APIs; intuitive RESTful API - part 3 !4981
  • docs: updates to reflect all ETL API changes !4984
  • alloc/free put-obj; mem-pool !4994
  • cos package: inline usage of one-time constants !4996
  • ais target: refactor copy-object, copy-reader logic !4997
  • double transactions begin timeout; send-remote and OWT !4998
  • target-to-target copy object: remove local-only option !4999
  • URL query parameter (ref) !5000
  • go-vet xmeta !5003
  • intra-cluster PUT vs user PUT: further differentiate and account for !5004
  • etl: add API to start ETL !5005
  • error processing: wrap errors to retain the types (major) !5006, !5007
  • tools: fix linting when golangci-lint is installed differently !5008
  • cmn package: remove LogLevel and Vmodule fields from config !5009
  • simplify error reporting and attribution (major ref) !5010
  • lint-1.44.2 !5011
  • error processing and attribution !5012
  • keep-alive cluster nodes: major refactoring, cleanup, code reduction !5013
  • transport: transmit/receive unsized objects resulting from streaming transformation !5017
  • etl: delete operation must be controlled by primary gateway !5022
  • apc: new package for API constants (major ref) !5026
  • copy/transform bucket: add 'force' option !5029
  • apc package: move metadata-write and refactor bucket-props validators !5032
  • apc package: move access control (ref) !5033
  • apc package: move action message (ref) !5034
  • apc package: move list-objects control message !5035
  • apc package: move copy/transform bucket message (ref) !5036
  • common bucket structure (major) !5037, !5038
  • fix: do not use jogger-group bucket inside individual jogger's visit-object callbacks !5048
  • core: two distinct methods to initialize LOM (major) !5049
  • backend bucket: fix returned AWS error to include details !5051
  • assorted fixes !5052, !5053
  • cloning LOM (metadata): follow-up !5065
  • self-healing: object recovery in the GET path !5069
  • assorted fixes: object recovery vs runtime metadata extension !5070
  • avoid duplicate FQN parsing when traversing buckets and visiting content types !5073
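
Several items in this release (phasing out generic `fmt.Errorf` in favor of error types, and wrapping errors to retain the types) rely on a standard Go idiom: define a typed error, wrap it with `%w` when adding context, and recover the type with `errors.As`. The sketch below illustrates that idiom only; `ErrObjNotFound`, `lookup`, `getObject`, and `isNotFound` are hypothetical names, not AIStore's actual error types or functions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrObjNotFound is a hypothetical typed error carrying structured details.
type ErrObjNotFound struct {
	Bucket string
	Object string
}

func (e *ErrObjNotFound) Error() string {
	return fmt.Sprintf("object %s/%s not found", e.Bucket, e.Object)
}

// stored is a stand-in for real storage, for illustration only.
var stored = map[string]bool{"data/a.tar": true}

// lookup returns a typed error instead of a generic fmt.Errorf string.
func lookup(bucket, object string) error {
	if !stored[bucket+"/"+object] {
		return &ErrObjNotFound{Bucket: bucket, Object: object}
	}
	return nil
}

// getObject adds context while wrapping with %w, so the type is retained.
func getObject(bucket, object string) error {
	if err := lookup(bucket, object); err != nil {
		return fmt.Errorf("GET failed: %w", err)
	}
	return nil
}

// isNotFound reports whether a not-found error is anywhere in the chain.
func isNotFound(err error) bool {
	var nf *ErrObjNotFound
	return errors.As(err, &nf)
}

func main() {
	err := getObject("data", "missing.tar")
	fmt.Println(err, "| not found:", isNotFound(err))
}
```

With this approach a caller several layers up can still branch on the error's type, for example to map it to an HTTP status, without parsing message strings.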

3.8

2 years ago

Highlights

  • ETL: !4621, !4624, !4633, !4649, !4681, !4702, !4780, !4790, !4802, !4831, !4833
  • storage cleanup: !4632, !4741, !4748, !4753, !4754
  • custom user-defined object metadata, system Cloud-specific metadata: !4655, !4657, !4659, !4661, !4663, !4667, !4668, !4670, !4673
  • reinforce and protect ais volume and associated metadata: !4683, !4684, !4698, !4699, !4701, !4703, !4813, !4814, !4834, !4837
  • safely add and remove mountpaths (disks) at runtime !4721, !4722, !4728, !4729, !4735, !4736, !4740, !4744, !4789, !4818
  • "easy URL": support gs/bucket/object, s3/bucket/object, and similar (easy) URLs across all backends !4711
  • introduce node standby mode !4688, !4689, !4691
  • performance monitoring with a scope node | batch job | cluster !4792, !4793, !4794, !4798, !4800, !4810, !4812
  • support ais targets with no disks (feature) !4825
  • Kubernetes Operator v0.9
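
The "easy URL" highlight above describes addressing objects with paths of the form gs/bucket/object or s3/bucket/object. A minimal sketch of splitting such a path into its parts is shown below; the `parseEasyURL` function and the set of recognized prefixes are assumptions for illustration (the notes name only `gs` and `s3` explicitly), and the sketch does not reflect how AIStore implements this internally.

```go
package main

import (
	"fmt"
	"strings"
)

// knownPrefixes is an assumed set; the release notes mention gs and s3
// "and similar" without enumerating the rest.
var knownPrefixes = map[string]bool{"gs": true, "s3": true}

// parseEasyURL (hypothetical helper) splits "/<prefix>/<bucket>[/<object>]"
// into its components. The object name may itself contain slashes.
func parseEasyURL(path string) (prefix, bucket, object string, err error) {
	parts := strings.SplitN(strings.TrimPrefix(path, "/"), "/", 3)
	if len(parts) < 2 || parts[1] == "" {
		return "", "", "", fmt.Errorf("easy URL %q: expected <prefix>/<bucket>[/<object>]", path)
	}
	if !knownPrefixes[parts[0]] {
		return "", "", "", fmt.Errorf("easy URL %q: unknown prefix %q", path, parts[0])
	}
	prefix, bucket = parts[0], parts[1]
	if len(parts) == 3 {
		object = parts[2]
	}
	return prefix, bucket, object, nil
}

func main() {
	for _, p := range []string{"/gs/imagenet/train/0001.tar", "/s3/logs", "/ftp/x/y"} {
		prefix, bucket, object, err := parseEasyURL(p)
		fmt.Printf("%q -> prefix=%q bucket=%q object=%q err=%v\n", p, prefix, bucket, object, err)
	}
}
```

Limiting the split to three parts keeps nested object names such as train/0001.tar intact, and a path with no object component can then be treated as a bucket-level request.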

CLI

  • fix file name and message when extracting a file from an archive !4602
  • templates: circumvent index wrap around for aliases !4603
  • multi-object transform and copy: continue-on-err; add similar keywords !4605
  • show object (and its variations): improve usability and fix minors !4674
  • show config improvements; follow-up !4676
  • add command to immediately unregister node from cluster map !4693
  • new command ais storage cleanup !4704
  • attach mountpath and detach mountpath (ref) !4705
  • show storage subcommands !4713
  • --wait option for storage cleanup command !4773
  • report human-readable stats in show job xaction !4788
  • add commands to manage AIS CLI configuration !4791
  • improve alias command and docs !4795
  • follow-up: introduce xaction snapshot (aka "snap") into CLI templates !4806
  • add ais cluster show stats subcommand !4807
  • refresh for stats !4811
  • improve job xaction show command; minor refactoring !4815, !4816
  • node IDs are very special completions (provide visual cue) !4823
  • fix archive ls; set and show config usability; log_level !4824
  • remove glog's vmodule (obsolete) !4826
  • improve ais cluster rebalance command, allow ais archive ls to list entire bucket (of tarballs, etc.) !4828
  • do not print hint if JSON format is on !4839
  • introduce unknown-value and not-set-value, and use consistently; fix stop-maintenance logic !4842
  • downloader: fix error handling, improve usability !4844

Integration and unit testing; CI/CD

  • add Go 1.17 shuffling for short tests !4615
  • restore original primary in TestForwardCP !4619
  • use new t.Setenv instead of os.Setenv and defer os.Unsetenv !4620
  • CI: fix lint errors on MacOS !4622
  • wait for xaction to finish when enabling mirroring !4628
  • tests/cleanup: ignore "server closed idle connection" !4631
  • fix failing TestGetAfterReregisterWithMissedBucketUpdate !4640
  • dev-tools/tassert: timestamp failure message, customize print-stack !4647
  • dev-tools and tests: continued refactoring and cleanup !4654, !4656
  • CI: add minimal cluster job that runs with 1 proxy and 1 target !4643
  • save/check cluster state; log: memory reporting !4677
  • CI: skipping tests in short mode !4776
  • test cluster config persistence and mountpath add/remove across restarts !4821
  • massive read operation vs detach/attach mountpath (non-redundant bucket) !4822
  • wait for mirroring, re-enable mountpaths !4835

Documentation & blog; website aiatscale.org

  • transition to Go 1.17 (add a note) !4623
  • remove empty reference to Python client !4636
  • remote AIS cluster: examples and references !4665
  • remote buckets (naming notation, CLI examples, cross-refs) !4669
  • bucket property inheritance, LRU note; E2E tests: set-custom !4673
  • configuration.md; transport/Rx: PDU header validation !4679
  • website: AIStore & ETL: Using AIS/PyTorch connector to transform ImageNet (post #2) !4694
  • remote ais cluster, global namespace !4697
  • cleanup http_api docs and add missing json for list buckets operation !4715
  • website: AIStore & ETL: Introduction (post #1) !4716
  • update command-line usage !4727
  • ais cluster show stats !4809
  • update ais show job CLI !4817
  • website: copying existing file datasets in two easy steps !4829
  • update main v3.8 README !4830
  • touch cluster.md; bump version to 3.8 !4838
  • What's new in AIS v3.8

Build and toolchain; dependencies

  • transition to Go 1.17 and use new Go features !4627
  • make: add run and restart make targets !4707
  • reinitialize go-modules and upgrade all minor versions !4724, !4726

Bug fixes and improvements

  • multi-object copy/transform: handle errors; ref common (part two) !4601
  • multi-object archive/copy/transform: aborting logic and cleanup !4606
  • general: use Go 1.17 enhancements !4607
  • health-checker: more efficient method to read directory entries !4608
  • bucket copy/transform: aborting logic, cleanup, and test !4609
  • transport/streams: fix broadcast in a single-target cluster !4610
  • revise err-aborted, refactor common errors, downloader !4611
  • cluster config vs bucket props vs LRU !4612
  • erasure-coding: fix bucket-encoding xaction hang on errors and upon abort !4613
  • deployment: add registry URL in docker build and docker push commands !4614
  • add atomic counted err value !4616, !4617
  • on-demand xaction: mutex to protect compound state !4618
  • etl: add more details when K8s pod times out on being ready; add debug log !4621, !4624
  • intra-cluster notifications: minor refactoring !4625
  • xaction: add immutable (original) bucket !4626
  • revise xaction aborting logic !4629
  • lru pkg: trash non-existing buckets; health check: docs and follow-up ref-s !4630
  • storage cleanup: remove artifacts of erasure-coding (unfinished slices, redundant replicas and metafiles) !4632
  • etl: fix waiting for a ready condition when starting K8s pod !4633
  • fix global-rebalance can-start & must-run helpers !4634, !4635
  • log: add backend module; AWS: ignore "unknown region" most of the time !4637
  • single-target cluster: support bucket renaming !4638
  • node name vs 1) nodes joining the cluster and 2) early startup !4639
  • xaction pkg: consistent naming and refactoring !4641
  • refine health check: add 'ready to rebalance'; fix test to wait for cluster state !4642
  • default bucket props (ref) !4644
  • glog: set-node, reduce header; transport streams: add destination ID !4645
  • copy configuration values for default bucket props !4646
  • retriable connection errors; transport: do retry !4648
  • etl: remove single-pod limitation (i.e., the capability to run multiple ETLs) !4649
  • retriable connection errors: use the same condition consistently throughout !4650
  • single-target cluster shutdown and assorted fixes !4651
  • global rebalance: refine preemption logic, abort associated streams (major ref) !4652, !4653
  • custom object metadata: set/replace vs add/update !4655
  • consistent custom versioning, checksumming, ETag (major refactoring) !4657
  • default buffer sizes (TCP, HTTP) !4658
  • custom metadata: unify downloader; unify version comparison !4659, !4661
  • downloader: make object comparison much more rigorous !4662
  • object metadata: to/from header converters !4663
  • unify the code that checks local/remote equality, change backend.HeadObj() API, simplify cold-GET !4664
  • erasure-coding config and bucket props: add batch size back for compatibility !4666
  • object metadata: system attributes (as opposed to user-defined) !4667
  • object metadata: unset custom keys on PUT, resolve bucket copy vs inc-version dichotomy !4668
  • unify object props and system object attributes (major revision) !4670
  • get-bucket-props: remove redundant code, simplify !4671, !4672
  • configuration: add transport-idle-teardown !4675
  • memsys: tracking memory stats and responding to OOM (major rev) !4678
  • xmeta (tool): add VMD, refactor !4680
  • etl: add new (non-HTTP) communicator and stop relying on the K8s API Server !4681
  • SizeBytes() and AtimeUnix() methods for CT !4682
  • volume: revise, refactor, and reinforce operations on metadata !4683, !4684
  • downloader: refactoring !4685
  • cluster membership: shutdown, decommission, maintenance !4686
  • target standby mode !4688, !4689, !4691
  • dev scripts: deploy remote ais cluster with the same build tags !4690
  • refactor and simplify admin-join/auto-join logic !4692
  • follow-up: self-join, admin-join, immediate removal from the cluster map !4695
  • phase out "unregister" & further consistent renames !4696
  • add volume package (ref) !4698
  • volume: consolidate loading and initialization; bootstrap in two passes !4699
  • cmn/context => cmn/cos/context; rename "attach/detach remote ais" (minor ref) !4700
  • volume: keep it sync with local config !4701
  • etl: add more Python runtimes !4702
  • volume: keep it sync with local configuration !4703
  • api to attach/detach/enable/disable mountpaths: pass node ref consistently !4706
  • make: add run and restart targets !4707
  • error formatting (ref) !4708
  • api package (major ref) !4709, !4710
  • "easy URL": accommodate list-objects with no json msg !4711
  • deploy script: fix getting the last parameter !4712
  • intra-cluster reverse proxying vs content-length !4714
  • improve error formatting and content !4717, !4718
  • cluster/hrw and cmn/err: continued refactoring !4719
  • graceful mountpath (disk) removal - major revision !4721, !4722
  • fs: more validation and naming consistency !4725
  • api to attach mountpath with force !4728
  • transactional (begin -- commit) to remove mountpath, with committing after resilvering !4729
  • lint and refactoring, housekeeping !4730
  • follow-up: add wait-for-resilver, sort mountpaths; increase CI timeout !4732
  • add/remove disks: make it transactional with resilvering in-between !4735, !4736, !4740, !4744
  • refactor and split reb pkg, add res (resilver) pkg !4737
  • revise GFN (state): simplify global, move/rewrite local !4738
  • follow-up: move global GFN state to reb pkg !4739
  • separate storage cleanup xaction from LRU one !4741
  • xaction registry vs housekeeper: tweak initializations !4742
  • housekeeper: run first & reinforce; x-factories: ordered initialization !4743
  • cos pkg: make ExitLogf print file and line of the caller !4745
  • housekeeper: add wait for housekeeper readiness when registering !4746
  • fs: get and get-available; test-init; refactoring !4747
  • storage cleanup (feature, major upgrade) !4748
  • rename API action constants for consistency; tweak OOS logic !4749
  • dsort: fix panic when creating shard !4750
  • new pkg to combine both cleanup and eviction logic !4751
  • memsys: reduce free-mem requirement for tests & minor fixes !4752
  • storage cleanup: repackage, rewrite, and refactor !4753, !4754
  • memsys for tests (follow-up) !4755
  • parse request query only once (core) !4757
  • create and lookup buckets on the fly (major rev) !4758
  • memsys and housekeeper: further clarification !4759
  • follow-up: do not lookup remote bucket; stopping hk !4760
  • revise and refactor logic to handle deleted (formerly trash) !4761
  • memsys vs mock target: more orderly initialization !4762
  • move mock target out of production code !4763
  • memsys: tune-up low-watermark !4764
  • delete/undelete: retain bucket names when deleting !4766
  • memsys: two distinct pairs of slab allocators !4767
  • general: rename mountpoint to mountpath for consistency !4768
  • memsys: tune-up for OOM !4769
  • memsys: reintroduce max-depth !4770
  • reduce glog and tweak panic-OOM condition !4771
  • memsys: fix swapping state !4772, !4774
  • fs: make mountpath flags atomic !4775
  • xreg pkg: remove obsolete RebalanceArgs struct !4777
  • ais pkg: split etl.go into tgtetl.go and prxetl.go for consistency !4778
  • cmn pkg: sort actions alphabetically !4779
  • etl: integrate etl communicator with xaction !4780
  • set-name when joining nodes; keepalive logging; test tweaks !4781
  • not aborting xactions anymore when adding/removing mountpaths !4782
  • fs: refactor fqn parsing !4783
  • fs: refactor fqn parsing, update unit tests and docs !4784, !4785
  • log rotation: uniformity of the headers !4786
  • fs: minor fixes; tests !4787
  • add an option to remove mountpaths without resilvering !4789
  • etl: manage etl metadata !4790
  • refactor xaction stats: remove redundant interfaces, rename !4792
  • xactions: introduce snapshot, add in and out counters !4793, !4794
  • integrate: streams and target stats, rebalance and xaction snapshots !4796
  • dsort: wait for all in-flight stream requests !4797
  • data mover to support xaction stats; in, out, and locally processed !4798
  • transport: complete transition to using opcodes !4799
  • stats pkg: add get-daemon-stats API, remove redundant code and refactor !4800
  • etl: fix panic in unit test !4801
  • etl: change signature of InitSpec function to require options !4802
  • memsys: tweak init-once logic yet again !4803
  • consolidate all mock objects in cluster/mock !4804
  • xactions: complete transition to snaps (ref) !4805
  • dsort: fix placement of aborted check !4808
  • fix streams received-bytes count; bump CLI and aisloader versions !4810
  • remove logic to compute min/max k-alive latency; simplify & refactor !4812
  • capability to recover lost or corrupted node ID at startup !4813, !4814
  • detach/disable mountpath: actually wait for resilvering !4818
  • uuid/gid: consolidate and refactor generation and validation !4819, !4820
  • aistore with no disks (feature) !4825
  • http provider: ETag check and prefetch prefix !4827
  • etl metadata: minor fixes and ref !4831
  • rebalance: simplify-out Tx semaphore !4832
  • always re-encode etl metadata; improvements and ref !4833
  • volume: when creating new volume (and associated VMD), make sure to have a record of configured fspaths !4834
  • cmn pkg: add is-err-not-found !4836
  • option to force starting up with a lost or missing mountpath !4837
  • change one api constant to support rolling upgrade !4840
  • fix ais advanced remove-from-smap command !4841
  • minikube: RBAC v1 !4843

3.7.1

2 years ago

  • AIStore K8s Operator v0.8
  • review TODOs, fix comments, remove unused bits !4600
  • multi-object streaming xactions: ref common, eliminate duplication !4599
  • K8s dev scripts: fix checking minikube version !4598
  • multi-proxy stress test: randomize sleep, minor cleanups !4597

3.7

2 years ago

Highlights

  • Reading, writing, and listing archives
  • Multi-object transactions; multi-object ETL (in addition to previously supported full bucket-to-bucket transforms)
  • "Easy URL"
  • New and improved CLI
  • New documentation website
  • Bug fixes and performance improvements

TAR, TGZ, and ZIP archives

  • unify/reuse list-range and archiving logic !4483
  • revise xaction renewal process; propagate archival UUID back to client !4488
  • unify archiving with existing multi-object operations !4484
  • CLI: extend PUT to support archive creation !4485
  • archive: support multiple mime types; add zip test !4491
  • naming inside archives: by default, do not include source bucket name !4495
  • archive: ref count targets on a per archival transaction basis !4497
  • archive: error processing; end-of-iteration !4500
  • list objects API: new option to list archived content !4536
  • archive: support TGZ !4541
  • support full path via archpath query parameter !4543
  • archive: nest (tar, tgz) writers !4544
  • archive: add checksumming for all supported types !4547
  • add checksum-sizer; archive: add basic stats; eliminate fseek !4548
  • multi-obj archiving: transactions vs reusable xaction !4550
  • archive: add a file to a TAR archive !4553
  • archive: API naming, message flags !4556
  • archive CLI: make it work; docs: add multi-object operations !4561
  • CLI: add top level archive command !4570
  • Blog: an article on how to properly append to an existing TAR !4571
  • archive: finalize and cleanup logic; consistent renames !4575
  • archive: transactions, control flags, and error handling !4576
  • archive: allow multiobject operation to append to existing archive !4580

Multi-object Transform and Copy

  • multi-object transform & copy and archive: ref-count targets, unify quiescence logic !4511
  • multi-object transform (initial); consistent renames !4555
  • add multi-object (list | range) transformation !4565
  • multi-object copy & transform: multiple transactions, single xaction !4569
  • transform & copy, bucket & multi-obj: consistent naming !4572
  • transform & copy, bucket & multi-obj: continued ref !4573
  • transform & copy multi-obj: ref-count transactions !4574
  • multi-object copy & transform: refactor API, add stress !4581
  • multi-object copy & transform: add utils !4583
  • ETL: forward Init call to the primary !4591

Multi-object operations (common)

  • move/rename list and range xops (ref) !4459
  • list and range xops (refactoring part two) !4461
  • revise list and range operations !4464
  • list and range ops: continued refactoring !4469
  • more rigorous idleness check; unique transport endpoint (major) !4595

eXtended Actions (xactions)

  • xaction registry: remove finished (ref) !4442
  • enforce unique xaction ID !4449
  • consolidate xactions (ref) !4462
  • consolidate xactions (part two) !4463
  • remove/simplify cluster.XactID (ref) !4474
  • remove xaction.Args - simplify (ref) !4475
  • xactions: continued refactoring !4476
  • renewable xaction (w/ code reduction) !4515
  • renewable xaction (part two) !4516
  • renewable xaction (part three: WPR) !4517
  • rename on-demand xaction !4518
  • renewable xaction (part four) !4520
  • renewable xaction (part five) !4521
  • renewable xaction (part six) !4522
  • renewable xaction (part seven) !4523
  • renewable xaction (part eight) !4524
  • renewable xaction (part nine) !4526
  • rename/clarify xaction scope !4527
  • renewable xaction (part ten) !4528
  • simplify initialization of on-demand xactions !4529
  • EC: xaction renewals !4533
  • renewable xaction (part eleven) !4534
  • improve synchronicity when starting to run (distributed) xactions !4535
  • wait for rebalance to start (in re: improve synchronicity) !4538
  • xactions: improve synchronicity (LRU, resilver) !4546

Intra-cluster Transport

  • data mover stages; reduce unreg-recv delay !4545
  • consistency in copying obj attrs to/from http and transport headers !4452
  • transport streams: send completion vs freeing obj headers !4496
  • transport/streams: reduce/optimize header size, add SID/opcode; archive: send-done !4505
  • transport: refactor receive logic; use slab for headers; monotime !4507
  • transport Rx: simplify eof-ok !4508

Documentation

  • docs: add a section on increasing disk priority for aisnode !4470
  • docs: setting CPU governor to performance value !4471
  • CLI docs: update bucket props command !4506
  • docs: major restructuring to place all documentation in docs directory !4514
  • docs: bucket properties and property inheritance !4551
  • docs: make documentation a website !4554
  • website: replace images links with embedded youtube videos !4557
  • docs: move 'info for developers' out from user cli documentation !4558
  • website: various visual improvements !4559
  • website: remove posts header !4560
  • website: refactor and fix redirections and images !4563
  • website: add author field to blog post !4564
  • website: fix tables and enumerations !4568
  • docs: revamp RESTful API doc !4590

Tests

  • tests: allow larger number of idle connections so we can reuse them !4445
  • make archive test more stressful !4562
  • tests: fix PrefetchList and PrefetchRange !4466
  • add multi-objects test !4567
  • update archiveListRange test, make generated obj names more predictable !4468
  • fix object props test for aws and a bucket with disabled versioning !4478
  • test: remove hidden object path augmentation !4486
  • archive stress-test improvements !4499
  • add functional test for transforming object with GET request !4531
  • fix LRU unit test; check capacity consistently with existing code !4539

Bug fixes and improvements

  • revise/refactor LRU; xaction-being-renewed; misc renames !4537
  • revise housekeeper !4443
  • GFN: remove locks, revise, simplify !4444
  • unify object metadata, transport and ETL headers (part one) !4446
  • unify object metadata, transport and ETL headers (part two) !4447
  • checksum utils: add accessors, optimize !4451
  • object custom MD: send/receive and store upon migration (part one) !4453
  • not-found error: add source node for context !4454
  • revise head-object: target (incl. S3) and api package !4455
  • cleanup metafile if a bucket is deleted while EC is saving metadata !4456
  • s3 compatibility: reuse target.getObject, eliminate copy-paste !4457
  • set custom object properties (part one) !4458
  • move EC file type constants to fs package !4460
  • add lom.HrwTarget helper (ref) !4465
  • handle bucket-already-exists when creating one !4467
  • consistency in handling unmarshal errors !4473
  • gcp client singleton !4479
  • aws client per region !4480
  • backend API changes: context (part one) !4481
  • backend API changes: context (part two) !4482
  • CLI: new options for PUT object operation !4487
  • assorted TODO fixes !4490
  • xreg args by value !4492
  • lint/revive: enable unused-parameter !4493
  • "easy URL" - aka alternative AIS API mapping (feature) !4494
  • tweak initializing remote backend (ref, race) !4498
  • ETL executor implementation (initial) !4501
  • ETL: revise bucket copy&transform (major) !4502
  • CLI: add subcommand 'ais bucket prop' !4503
  • follow-up: aborting prior to starting to run !4504
  • slab-allocator: add AllocSize !4509
  • CLI: TAB-TAB to list only active cloud providers !4510
  • EC: use Hdr.SID instead of internal EC.intraReq.sender !4512
  • EC: use transport.Hdr.Opcode instead of intraReq.action !4519
  • dSort: remove redundant field in remoteRequest msg !4525
  • CVE fix: replace JWT library !4530
  • LRU: remove on-demand logic; archive: no need to quiesce data mover !4540
  • notifications: getting status and err vs rlocks !4542
  • CLI: minor refactoring and cleanup; error handling !4549
  • CLI: add bucket lru subcommand; show bucket props by default !4552
  • EC recv meta: fix error handling !4566
  • debug modules; logs; persistent markers !4577
  • node name !4578
  • glog not to flush in a separate goroutine; minor ref !4579
  • CI: add newest Go version to be used in GitHub actions !4586
  • revise quiescence logic; on-demand stats !4587
  • build: upgrade fuse pkg !4588
  • CLI ais bucket cp: support --list and --template options !4589
  • CLI ETL: support --list and --template options !4593

3.6

2 years ago

Highlights

  • observability: revise StatsD integration, add disk metrics, support Prometheus
  • EC version 2: introduce new (packed) metadata format; speed-up erasure-coding and optimize rebalancing of erasure-coded buckets
  • API for reading and writing (tar, tgz, zip) archives - initial, experimental

Observability

  • revise StatsD integration (part one: major update) !4340
  • revise StatsD integration (part two: cleanup, refactoring) !4342
  • stats/ref: move core-stats to common, close StatsD !4345
  • revise StatsD integration (part three) !4346
  • prepare metric names at init time !4355
  • implement Prometheus exporter !4356
  • make StatsD naming and units consistent with Prometheus !4357
  • preallocate disk stats !4358
  • add disk metrics, publish disk metrics !4372
  • tests: fix disk utilization computation !4374
  • track average read/write sizes; log improvements; refactoring !4376
  • read-lock when collecting Prometheus stats !4378

AIS CLI

  • CLI: do not validate config received from a daemon !4339
  • CLI: add an option --validate for bucket summary to show the number of 'not-ok' objects !4341
  • CLI: introduce new top-level command: storage !4350
  • CLI: fix counting missing copies in 'ais storage' output !4353
  • CLI: docs for ais storage command !4362
  • CLI: add 'ais storage disk' sub-command !4375

EC

  • EC: extend metafile with version and a list of slices !4373
  • EC: use packed binary format for metadata !4382
  • EC: flatten EC props for object HEAD response !4386
  • EC: save slice locations in metafile !4387
  • EC: protect metadata with checksum !4395
  • EC: rebalance v2 (major upd) !4426
  • EC: optimize network usage when deleting and restoring an object !4419

Archive

  • archive: GET from tar !4388
  • archive: CLI --extract option; refactoring !4390
  • archive: archpath keyword; add test; refactoring !4391
  • archive: devtools (minor ref) !4392
  • archive: GET from tar, tgz, and zip !4394
  • archive: add extension and provider tests !4396
  • archive bucket to bucket (part one) !4418
  • archive bucket to bucket (part two) !4425
  • archive bucket to bucket (part three) !4427
  • archive bucket to bucket (part four) !4429
  • archive bucket to bucket (part five) !4434
  • archive: auto-detect archive (mime) type !4402
  • archive: add test variations; absolute path !4404
  • archive: add mime type query and more tests !4405
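To illustrate the "auto-detect archive (mime) type" item above, here is a minimal sketch of detection by leading magic bytes. It is not the AIStore implementation: the function name `detectArchive` and its return values are hypothetical, and treating any gzip stream as "tgz" is an assumption made for brevity. The byte signatures themselves (gzip `1f 8b`, zip `PK\x03\x04`, and `ustar` at offset 257 of a tar header) are standard.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectArchive is a hypothetical helper, not part of AIStore.
// It inspects the leading bytes of buf and returns "tgz", "zip", "tar",
// or "" when no known signature matches.
func detectArchive(buf []byte) string {
	switch {
	case len(buf) >= 2 && buf[0] == 0x1f && buf[1] == 0x8b:
		// gzip magic; assumed here to wrap a tar stream
		return "tgz"
	case len(buf) >= 4 && bytes.Equal(buf[:4], []byte("PK\x03\x04")):
		// zip local file header signature
		return "zip"
	case len(buf) >= 262 && bytes.Equal(buf[257:262], []byte("ustar")):
		// POSIX ustar magic sits at offset 257 of the first header block
		return "tar"
	}
	return ""
}

func main() {
	// Build a fake 512-byte tar header that carries only the magic.
	tarHdr := make([]byte, 512)
	copy(tarHdr[257:], "ustar")

	fmt.Println(detectArchive([]byte{0x1f, 0x8b, 0x08}))
	fmt.Println(detectArchive([]byte("PK\x03\x04rest")))
	fmt.Println(detectArchive(tarHdr))
}
```

A content-based check like this would typically serve as a fallback when the object name carries no recognizable extension.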

CI

  • CI: fix linter on Darwin platform !4337
  • CI, Jenkins: run cloud ETL tests !4370, !4384
  • CI, Jenkins: correctly read and apply AWS environment for minikube !4397
  • CI: fix dev/k8s template !4403
  • CI: push credentials in secret !4408

Bug fixes and improvements

  • aisloader: require -cleanup option !4335
  • general: use Go 1.16 embed package !4336
  • test: add reader test for SGL !4338
  • config ports, json: remove duplication and parsing !4343
  • config, etc. !4347
  • config: add custom duration type and parsing (part one) !4348
  • revisit json duration (type) !4349
  • config: eliminate string durations, use custom type (part two) !4351
  • docs: "Monitoring AIStore with Prometheus" !4352
  • comments !4354
  • network call with retry: client and server flavors !4344
  • cmn: rename BucketNames struct to Bcks to better describe content !4359
  • fs, ios, ais: minor refactoring & cleanup !4360
  • mountpath, fs disks (ref) !4361
  • fs: revise init and validation logic !4363
  • fs/vmd minor ref !4364
  • transport, streams: tweak global defaults !4365
  • metasync: not using nil SGL for payload (fix) !4366
  • metasync: delayed cleanup (fix) !4367
  • deployment: remove unused aisloader container !4368
  • minor changes in documentation and code !4369
  • ETL: collect stats for communicator !4371
  • metasync: remove CoW debug validation; fixes !4377
  • cos/utils: create-file now always creates write-only !4379
  • dsort: replace section reader !4380
  • ETL: pull docker images only if not present !4381
  • ais: minor ref (get, extract) !4383
  • metasync, BMD, SGL !4385
  • reuse SGL when range-reading w/checksum; DEADBEEF; miscellaneous !4389
  • compute redirect latency every so often (perf) !4393
  • GET and PUT: remove dry-run option !4399
  • Jenkins: upgrade minikube !4400
  • override backends at cluster startup !4401
  • golint is deprecated: using revive !4406
  • memsys: tune-up and refactoring !4409
  • lint: revive fixes (part two) !4410
  • transport/streams: Rx callbacks do not need (or depend on) http !4411
  • comments fixing and minor ref ATB !4412
  • list/range API; CLI comments/examples !4413
  • tests: limit the number of cleanup-generated errors !4414
  • ref: move "connection-refused" and similar exceptions !4415
  • PUT into mirrored bucket !4416
  • more lint !4417
  • rename/refactor xaction providers !4420
  • ref: xaction args !4421
  • xaction factory (ref, consistency) !4422
  • read-json: prevent superfluous response !4423
  • revise xaction renewal (internal) API !4424
  • lint/revive: enable unused-receiver rule !4428
  • revise xaction registry (part one) !4430
  • proxy: mem-pool API args; original bucket url !4431
  • run-provider tests must cleanup after themselves !4432
  • bucket lookups vs a missing backend provider !4433
  • introduce fullname() shortcut for: lom, transport header, archive !4436
  • api/object: close response body, close source reader !4438
  • list-buckets corner cases with no authentication and/or compiled-in backend !4441

3.5

3 years ago

Bug fixes and improvements

  • metasync: dynamic SGL sizing !4282
  • http-common: serialize shutdown and initialization !4283
  • streams: intra-data endpoint !4284
  • ETL/target: intra-control endpoint !4285
  • s3 compatibility: stubs for unsupported features; s3cmd !4286
  • CI: add missing build configurations !4287
  • upgrade Go1.15 to Go1.16 !4288
  • enable intra-cluster networks for functional testing !4289
  • GODEBUG=madvdontneed=1 vs Go 1.16 !4290
  • deploy: remove obsolete charts and scripts !4293
  • tests: re-enable conditions for dSort tests with memory and disk !4294
  • CI: add more linters + chores around existing linters !4295
  • metasync: reuse jsp, optimize serialization (part one: cluster map) !4296
  • metasync: just-in-time revs !4297
  • CLI: use default config instead of empty one if the configuration file is missing !4298
  • fix minikube based deployment for tests !4299
  • target: serialize receiving BMD vs graceful termination !4300
  • extend is-unreachable helper with more !4301
  • updating nil BMD !4302
  • CLI: sort disk output by target_ID and disk_name !4303
  • fflush upon close prior to rename !4304, !4305, !4307
  • no metasync when shutting down !4306
  • tests: wait cluster state - use reverse proxy !4308
  • metasync: reuse jsp, optimize serialization (part two: BMD) !4309
  • metasync: avoid re-marshaling received cluster map (part four) !4312
  • metasync: optimize-out remarshaling (part five: Rx BMD) !4313
  • metasync: encode first, persist and distribute second !4315
  • metasync: reuse jsp, optimize serialization (part six: config) !4316
  • use direct I/O and sync for FSHC file read and write !4317
  • debug panic must leave a trace !4318
  • build: fix MacOS build !4319
  • rejoin cluster during startup if primary gets re-elected (major) !4320
  • metasync: optimize-out remarshaling (part seven: Rx Config) !4321
  • bootstrap: use joining targets info to discover BMD; strict UUID checking !4322
  • bootstrap, joining targets and RMD; fixes !4323
  • tweak primary election !4325
  • bootstrap: refactor reg-pool processing; handle global config !4326
  • metasync: revise receive logic; structured error type (part eight) !4327
  • metasync: handle cluster-info scenarios (part nine) !4328
  • starting up with outdated or missing cluster maps !4329
  • build/deps: upgrade all minors !4330
  • CLI: make order of columns stable !4331
  • target: refactor BMD Rx logic !4332
  • aisloader: support bucket URI in the command line !4333

3.4.1

3 years ago

Kubernetes Operator

Bug fixes and improvements

  • bench: add FIO benchmark configuration for repeatable FS benchmarking !4228
  • refactor EC rebalance test !4255
  • improve deploy script !4256
  • docs: Use absolute links !4260
  • lint update v1.32 to v1.39 !4258, !4261, !4262, !4263
  • better checking for rebalance abort !4264
  • revise vmd loading and error handling !4265
  • rlock ios !4266
  • fix keepalive reconfig !4267
  • ios: minor optimizations; vmd: refactor init/load !4268
  • assign Snode flags only when changed !4269
  • fix build for net/http !4270
  • rebalance: mem-pool and fix transport/header !4271, !4272
  • build: 1) upgrade fasthttp and 2) replace x/crypto (deprecated) !4273
  • memsys: refactor sgl allocation !4274
  • ais/s3 API: respond via sgl !4275
  • mem-pool SGL (list, not buffers) !4276

CLI

  • allow user to set aliases !4244
  • move cluster configure to top-level commands !4248
  • show extended EC stats for 'ais show job xaction ecput/ecget' !4249
  • replace 'daemon id' with 'node' in titles !4251
  • don't show all configs by default !4252
  • don't use append() to initialize commands !4253
  • show daemon local config separately !4254
  • revise alias; add list, reset !4257
  • add completion for 'cluster detach' !4259
  • add link to autocompletion scripts to app description !4279
  • add basic k8s support as top level command 'kubectl' !4280
  • ais show log [node]: API and CLI !4277, !4278, !4281

3.4

3 years ago

Highlights

  • Kubernetes Operator: a separate repository;
  • Cluster life-cycle management: maintenance (node), decommission and shutdown (node and cluster);
  • CLI: implement (category, verb, subject) auto-completions; numerous improvements;
  • Global Cluster Configuration - versioned, replicated, protected; per-node capability to override global defaults;
  • System Metadata (5 types): unified CCS formatting, instance and meta-versioning, backward compatibility;
  • Authentication and ACLs: add users and roles, docs, tests, bucket and cluster management, CLI;
  • ETL: AIS transport streams, GCP and minikube, CI and Jenkins, stress tests (*);
  • PDU-based intra-cluster transport, streaming objects of unknown size;
  • Performance: memory pooling, in-memory metadata;
  • HDFS backend – in total, supporting 6 different backends;
  • DNS names - supporting hostnames and/or IPv4 in public and intra-cluster networks;
  • Resilver: resume upon reboot, run on a selected node;
  • Cloud: improve AWS versioning, improve error handling;
  • Erasure Coding: improved stability for Cloud buckets; handle low memory and OOM;
  • Distributed shuffle (dSort): improve stability, optimize memory and CPU usage;
  • All subsystems and core: productization, stabilization, bug fixing, refactoring across the board

Core

  • unify bucket(name, provider, namespace) transfer - part one - !3340
  • general: refactor and simplify cold Get - !3345
  • keepalive: do not decrease timeout on connection error - !3444
  • (new) metadata write policy - !3513, !3596
  • LOM minification - !3521
  • fs/cluster: generalize fs.GetContentFQN with interface - !3548
  • cluster: remove some redundancy in CT and LOM structures - !3549
  • cmn: return ReadOpenCloser on ReadOpenCloser.Open - !3562
  • memsys: remove unused SliceReader - !3563
  • LOM: remove one field - !3570
  • LOM: enforce rlock - !3615
  • support (DNS) hostnames for the public and intra-cluster networks - !3616
  • fix passing origURLBck in bucket init - !3720
  • LOM In Flight (LIF) - !3765
  • mem-pool LOM - !3722
  • mem-pool GET, PUT, and COPY runtime - !3825
  • mem-pool LOM (part two) - !3839
  • mem-pool LOM (part three - rebalance) - !3878
  • copy & transform flows: more mem-pooling and refactoring - !3879
  • extend core structs to support possible within-meta-version extensions - !4239
  • on-disk meta-versioning and backward compat (part one) - !4023
  • meta-versioning and backward compatibility (part two) - !4037
  • meta-versioning and backward compatibility (part three) - !4061
  • meta-versioning and backward compatibility (part four) - !4075
  • refactor http-common - !3639, !3641, !3649, !3656, !3660, !3788, !3793, !3816
  • transport: remove roff - !3661
  • transport: pool obj-reader struct - !3662
  • transport: always execute send-completion logic - !3936
  • transport: use interface{} instead of unsafe.Pointer as callback argument - !3953
  • minor refactor of proxy election method - !3582
  • ignore TLS handshake errors - !3586
  • refactor bucket initialization and permission check - !3590
  • add missing unlock in GET object - !3681
  • fix rename to accept bucket with backend buckets - !3684
  • send props on create bucket in message instead of query - !3692
  • pre-decide which type of lock should be taken on CopyObject - !3738
  • ensure that remote object was correctly fetched after UpgradeLock - !3761
  • generalize creating default bucket props - !3775
  • fail early when trying to explicitly create a cloud or HTTP bucket - !3778
  • correctly handle creation of a bucket with backend bucket - !3782
  • fix sending error when part of the object was already sent - !3799
  • determine internally if we should skip validate in defaultBckProps - !3807
  • correctly skip bmd modify when terminate flag is set - !3889
  • revise/unify error handling - !3929
  • fix unlock panic when copying an object - !3935
  • unmarshal HTTPError on call - !3608
  • do not append redundant call frame to error - !3821
  • correctly send error on bucket summary - !4100
  • add new function for extracting QueryBcks from request - !4105
  • unify listing and getting summary - !4106
  • remove err from bckInitArgs struct and rename queryBck - !4123
  • add stronger validation for bucket in bckInitArgs - !4124
  • return error on missing required query parameter - !4126
  • ensure latest bmd when doing list and summary - !4135
  • fix returning error when begin phase in bucket creation fails - !4138
  • correctly handle init remote ais cluster bucket - !4139
  • remove vmd creation on user register - !4152
  • pass error when querying daemon info - !4154
  • initialize bucket on bucket summary - !4164
  • start resilver if the new mountpath was added - !4180
  • revise startup sequence - !4191
  • earlystart: fix possible disconnect between rmd and smap - !3984
  • earlystart: resume global rebalance (part two) - !4000
  • synchronize remote AIS attachments when target joins (part two) - !4202
  • general: fix head object on remote AIS bucket - !4204
  • revise metasync receive - !4206
  • Sync RMD when a node joins - !4193
  • wait for metasync when decommissioning a target - !4087
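The "mem-pool" items in the Core list above refer to reusing frequently allocated per-request structures instead of allocating them anew each time. The changelog does not say which mechanism is used, so the following is only a generic sketch of the idea built on Go's `sync.Pool`; `objMeta`, `allocMeta`, and `freeMeta` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// objMeta is a hypothetical stand-in for a per-request metadata struct.
type objMeta struct {
	Bucket, Name string
	Size         int64
}

// metaPool recycles objMeta values to reduce allocations on hot paths.
var metaPool = sync.Pool{
	New: func() interface{} { return new(objMeta) },
}

// allocMeta takes a struct from the pool and initializes it.
func allocMeta(bucket, name string) *objMeta {
	m := metaPool.Get().(*objMeta)
	m.Bucket, m.Name = bucket, name
	return m
}

// freeMeta zeroes the struct before returning it to the pool,
// so that stale fields never leak into the next user.
func freeMeta(m *objMeta) {
	*m = objMeta{}
	metaPool.Put(m)
}

func main() {
	m := allocMeta("bck", "obj")
	m.Size = 1024
	fmt.Println(m.Bucket, m.Name, m.Size)
	freeMeta(m)
}
```

The main discipline such a scheme requires is that callers never touch a struct after freeing it, which is why the reset happens inside `freeMeta`.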

CLI

  • revamp 'show rebalance' - !3551
  • reorganize proxy and targets templates - !3552
  • do not show nodes status when all online - !3554
  • fix put progress bar report for large uploads - !3556
  • show deployment type of nodes in ais show cluster - !3557
  • ais rename issues - !3560
  • introduce templates table - !3566
  • use colors in messages - !3578
  • add wait flag for copy and etl bucket - !3583
  • correctly handle errors from HTTP server - !3584
  • fix error messages - !3658
  • change the way ais displays daemon configs - !3666
  • fix AuthN errors and a little refactoring - !3677
  • Change the rename command to mv (part two) - !3675
  • panic when setting an invalid bucket property - !3695
  • Improve 'ais show config nonexistent' error - !3711
  • fix matching fail checks in tests - !3715
  • completion for bucket permissions - !3733
  • Update 'show object' to display properties vertically - !3743
  • fix makePairs to accept values with = characters - !3747
  • Remove cp objects from help message - !3764
  • Rebalances cannot be complete if none exist - !3766
  • Add a wait option for mv bucket - !3767
  • fix printing download ID on the start - !3599
  • small ETL improvements - !3603
  • Imply obj name as out_file in get command - !3640
  • change the rename command to mv - !3668
  • highlight bucket properties that differ from default ones - !3686
  • complete option values - !3699
  • Add support for graceful shutdown - !3805
  • improve AuthN UX - !3842
  • revoke token command - !3880
  • AuthN user permissions - !3887
  • TAB-TAB completion and parsing for user-friendly permission when adding role - !3888
  • Introduce object and bucket top level commands - !4029
  • Move rm download/dsort inside job - !4043
  • improve gen-shards input and allow specifying provider - !4048
  • Add ais show auth and ais show bucket - !4062
  • use standard name for no-color flag - !4068
  • Various grammar/wording changes - !3998
  • Introduce cluster and disk top level command - !4006
  • force flag for node maintenance - !4007
  • Introduce job top level command - !4017
  • Introduce advanced top level command - !4018
  • fix setting cluster/daemon config - !4076
  • Fix panic when aliases have subcommands - !4077
  • fix duplicated ais auth show command - !4082
  • refactor parsing bucket or bucket + objName URIs - !4085
  • Get daemon ID from daemon itself instead of API call - !4091
  • add parse functions tests - !4095
  • require bucket only for bucket xactions - !4104
  • show targets in maintenance - !4133
  • allow viewing cluster config - !4147
  • scope based props validation in configure command - !4148
  • Make verbose more consistent for show cluster - !4156
  • Fix ordering in show cluster - !4182
  • add config option to set default provider - !4189
  • Update version number - !4192
  • make object name optional for evict command - !4203
  • local and cluster config displaying - !4230
  • docs improvements and more polish - !4240
  • add 'ais cluster show bmd' command - !4241
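One CLI fix above makes key=value parsing accept values that themselves contain `=` characters. A minimal sketch of that behavior is to split each argument on the first `=` only. The helper name `parsePairs` and the sample arguments are hypothetical; this is not the CLI's actual `makePairs` code.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePairs is a hypothetical helper: it turns "key=value" arguments
// into a map, splitting on the first '=' only so that values may
// themselves contain '=' characters.
func parsePairs(args []string) (map[string]string, error) {
	pairs := make(map[string]string, len(args))
	for _, arg := range args {
		kv := strings.SplitN(arg, "=", 2)
		if len(kv) != 2 || kv[0] == "" {
			return nil, fmt.Errorf("invalid key=value pair: %q", arg)
		}
		pairs[kv[0]] = kv[1]
	}
	return pairs, nil
}

func main() {
	pairs, err := parsePairs([]string{"name=value", "filter=a=b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(pairs["name"], pairs["filter"])
}
```

Splitting with a limit of two parts keeps everything after the first `=` intact, whereas an unbounded split would break such values into extra fields.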

Distributed Shuffle (dSort)

  • general: use new optimized msgp method - !3553
  • Use job instead of manager in dSort errors - !3667
  • access lom size only when put has been successful - !3843
  • re-load lom under lock when sending to another target - !3844
  • use cmn.Bck instead of bare name and provider - !4066
  • improve and add more logging - !4093
  • correctly cleanup resources when ignoring duplicated records - !4109
  • ignore targets in maintenance - !4166

Authentication & User management

  • Allow changing bucket permissions for read-only buckets - !3561
  • fix panic when checking restricted user's permissions - !3841
  • correct checking user's role permissions - !3857
  • do not cache generated tokens - !3858
  • AuthN docs - !3881
  • AuthN docs and code review - !3892
  • API to change configuration on the fly (1 of 2) - !3947
  • fix returning info for a single cluster, update tests - !3987
  • AuthN refactoring: move to its own package - !4171
  • revise access permissions (ACL) - !3744
  • Improve bucket ACL check - !3757

ETL

  • use user defined ID as name - !3571
  • fix minikube playground etl + cloud providers - !3565
  • minor improvements and refactoring - !3671
  • add tests for target down - !3674
  • use a separate GCP bucket for cloud tests - !3701
  • add better cloud warm/cold ETL object tests - !3703
  • add large bucket test - !3679
  • health observability (part one) - !3683
  • run health tests - !3763
  • wait longer for pods metrics - !3770
  • add tests for occasionally failing transformation - !3785
  • broadcast list request to all targets - !3787
  • make failing ETL test less flaky - !3796
  • fixes around k8s.Detect() usage - !3801
  • better observability of ETLBigBucketTest - !3818
  • add objects and bytes count check - !3823
  • don't run big bucket test on single target deployment - !3836
  • test abort in the middle of running - !3852
  • tests improvements, better cleanups, fail faster - !3884
  • remove already fixed TODO comment - !3896
  • new StopCh for each etl big bucket test case - !3899
  • introduce single request timeout for ETL Bucket - !3900
  • set python runtime image pull policy to Always - !3909
  • set request timeout for TestETLBigBucket - !3910
  • minor tests improvements - !3913
  • more verbose Pod ready failure message - !3980
  • ETL: failfast if we know starting an ETL will fail - !3990
  • fix health status code regression - !4003
  • don't skip TestETLObjectCloud - !4004
  • more precise health error message - !4111
  • propagate whole bucket info to etl requests - !4140
  • wait for rebalance to complete after target down test - !4153

K8s

  • dev: support redeploy k8s; and datascience container - !3577
  • k8s: add fallback env variable when HOSTNAME is not reliable - !3567
  • k8s: decrease default curl timeouts on bootstrap - !3594
  • k8s: simplify and rewrite building and pushing aistore/aisnode image - !3610
  • k8s: prod docker image build node with all providers - !3659
  • fs: run mountpaths integration tests on minikube - !4012
  • deploy: increase default memory limit on minikube - !4051
  • deploy: add metrics-server to minikube deployment - !3689
  • minikube: update docker image tag - !3726
  • minikube: update image to aisnode:minikube - !3746
  • minikube: optionally deploy metrics collection - !3749
  • general: rename minikube testing build tag - !3786
  • general: waiting for primary to become ready (minikube + tests) - !3808
  • minikube: minor fixes in deploy scripts - !3812
  • minikube: allow multitarget deployment and use it on jenkins - !3835
  • minikube: build with debug on jenkins - !3938
  • minikube: disable cgo for node build - !3942
  • minikube: fix fspaths config - !4177
  • k8s: remove wait for service logic from container start-up script - !3815
  • k8s: update start up script - !3959
  • k8s: AIS_HELM_DEPLOYMENT variable as string - !3985
  • jenkins: update RE variable for K8s tests - !4040
  • skip public net hostname validation on K8S environment - !3637
  • k8s: add Dockerfile for initContainer - !3651

EC

  • allows setting Data:Parity for a bucket if the number of nodes is between... - !3708
  • use LIF where possible - !3789
  • load recovered LOM - !3806
  • configuration option 'disk_only' - !3827
  • correct LOM loading when PUTting an object - !3838
  • remove unused cksumType parameter - !3847
  • do not call lom.Load while deleting an encoded object - !3856
  • mem-pool cluster nodes - !3973
  • TestDestoryBucket - !4057
  • add slice locking when receiving, sending, and deleting - !4079
  • encode and decode xactions refactoring - !4089
  • EC xactions - refactoring on-demand - !3607
  • EC cloud test fix - !3966
  • EC putjogger refactoring - !3906
  • EC getjogger refactoring - !3919

Downloader

  • fix bucket backend provider check - !3810
  • refactor of CompareObjects test - !3845
  • remove special parameter for lom.Size - !3851
  • synchronize requests on dispatcher startup - !4019
  • correctly calculate download latency - !4070
  • listen on abort when waiting on throttler - !4071
  • improve checking if the task still exists - !4094
  • improve context cancel logic + minor refactor - !4121
  • Downloader: fix uploading to Cloud bucket - !3598
  • allow downloading to cloud bucket - !3587
  • return first error on download start request - !3592

API

  • add http error message when doing HEAD request on bucket - !3600
  • cloud: add API to create buckets for cloud providers - !3670
  • return bucket props in single header - !3682
  • move HEAD header error check to checkResp func - !3688
  • cmn: remove unused API constants - !3690
  • return 404 on wrong node id for config request - !3712
  • API change: bucket props-to-update - !3755
  • Add support for graceful node shutdown and related CLI - !3828
  • CLI and Single node resilver - !3893
  • set cluster config using cmn.ConfigToUpdate struct - !3894
  • fix closing reader in DoReqWithRetry function - !3926
  • add an option to evict only data for remote buckets - !3958
  • pass node as parameter to GetDaemonConfig function - !4036
  • fix 'get remote cluster list' API - !4059
  • cmn: cleanup api constants - !4083
  • Reset cluster or daemon to cluster configs - !4150
  • Workaround for remote ais bucket create - !4208
  • Refactor httpcluget GetWhatRemoteAIS - !4210

General

  • xact: remove unused return error parameter from Run methods - !3574
  • xact: always call Finish on the end of xaction - !3576
  • cloud providers: validations, docs (updates) - !3579
  • dl: auto create destination bucket if not exists - !3580
  • lru: minor refactor of main Run function - !3581
  • xact: print list objects error when not aborted - !3585
  • Allow cloud provider aliases (as in CLI) - !3588
  • metadata write policy (part two) - !3589
  • general: remove unnecessary query append - !3593
  • Use correct RecvType for each DataMover - !3597
  • LOM: assert locking - !3604
  • control plane: update forwardCP to utilize original req body - !3606
  • transfer bucket: correctly set objects versions - !3609
  • debug: remove Enabled, add Func() - !3611
  • fix issue with cached backend bucket props; tests evict backend bucket - !3612
  • bckInitArgs remove origURLBck field - !3613
  • fix rename bucket with backend and add test - !3614
  • Idle property for XactDemand - !3617
  • test fixes: cloud-mirror & rename-with-backend - !3618
  • bucket uname query (rename & doc) - !3619
  • fix bucket props when copying and creating on the fly - !3620
  • do not clone config; refactoring - !3621
  • fix intra control/data ports - !3622
  • Improving checkRESTItems - !3623
  • make target health handler public - !3624
  • Replace variadic path item list in checkREST calls with "constant" slices - !3625
  • optimize CoW config saving when updating cluster maps - !3626
  • handlers use flags for network access - !3627
  • fix smap-modifier - !3628
  • keepalive: revise cluster map update - !3629
  • sys: refactor & simplify - !3630
  • limit intra-cluster broadcast - !3631
  • health: refactor proxy/target health handlers - !3632
  • Remove ActRecoverBck - !3633
  • fix predeclared identifier usage - !3635
  • Reuse apiRequest.parse in proxy - !3636
  • fix setting intra-call headers - !3638
  • config: rename ipv4 to hostname - !3642
  • general: correctly register health handler on proxy - !3644
  • general: add missing space in log message - !3645
  • general: refactor join words with URLPath - !3647
  • health: external health check types - !3648
  • cloud: add support for HDFS provider - !3653
  • general: change format of stack printout in logs - !3654
  • fs: move markers when mountpath is removed or disabled - !3663
  • general: update copyright statement - !3664
  • txn: simplify alloc and free of bcast args - !3669
  • cloud: change the order of the methods to match interface - !3672
  • general: add better validation for bucket rename - !3673
  • config: Put and Begin/Commit/Discard - !3676
  • cmn: split fields in Extra by provider - !3680
  • Put Error to Header response if a request is a HEAD one - !3685
  • config: add daemon role field; fix fspath config validation - !3687
  • cmn: move code to match order of providers - !3691
  • cmn: validate HDFS config addresses - !3693
  • general: add HDFS build tag and allow selecting it during make deploy - !3694
  • scripts: ipv4 => hostname - !3696
  • add text: data redundancy - multiple options - !3697
  • metadata write policy default (minor) - !3700
  • deploy: easier configuration for cloud providers - !3702
  • Security: Reduce file permissions to user and group only - !3705
  • remove OpenAPI-based client generator (swagger-codegen) - !3706
  • revise lom create-file - !3709
  • cache bucket dirs with mountpaths - !3710
  • deploy: fix deployment of next tier cluster in cleanup script - !3713
  • general: fix grammar mistakes - !3717
  • general: remove implicit cloud provider (cloud://) leftovers - !3718
  • general: remove node name prefix from bucket errors - !3719
  • change in terminology: backend provider - !3730
  • general: add update config option - !3732
  • general: correctly handle lookup of HDFS bucket - !3737
  • general: fixes around HDFS support - !3739
  • change in terminology: backend provider (part two) - !3741
  • Using *ToUpdate for updating config - !3742
  • general: fix implementation of UpgradeLock - !3745
  • general: fixes around HDFS provider - !3751
  • general: rename cloud package to backend - !3752
  • cluster: fix and improve implementation of UpgradeLock - !3754
  • extend providers tests; more test-skipping options - !3756
  • general: rename CloudVersion to RemoteVersion - !3762
  • Use correct permissions for every initAndTry - !3768
  • backend: remove only extra info from AWS error - !3769
  • general: rename cloud_mock to backend_mock - !3771
  • general: fix resetting bucket props for HDFS bucket - !3772
  • simplify lom.bucket (ref) - !3779
  • PUT(mirror): optimize-out one channel from the datapath - !3780
  • reduce wack-target memory footprint - !3792
  • backend: relax address connectivity checking for HDFS backend provider - !3795
  • backend: correctly set ContinuationToken for HDFS's list objects - !3797
  • cmn: remove | character from HTTP error string - !3798
  • makefile: better AIStore running check - !3802
  • general: make skip_verify_crt option configurable in aisfs and cli - !3803
  • backend: correctly handle prefix in HDFS's list objects - !3809
  • general: rename cloud to broader terms remote or backend - !3811
  • backend: correctly skip HDFS files which do not match prefix - !3817
  • reinforce LOM loading - !3824
  • fix/revise delete-obj; evict to return status - !3829
  • unloaded deleted object - !3830
  • general: remove special marker for cases when the LOM is loaded - !3831
  • general: use consistent names and order in DeleteObject method - !3832
  • tutils: cleanup temporary directory created in PrepareObjects - !3846
  • transfer bucket: remove unused function - !3848
  • backend: fix PutObj implementation for remote ais - !3849
  • general: return copied bytes count in targets copy* methods - !3850
  • revise PUT versioning logic - !3859
  • backend: remove retry in the AIS remote cluster - !3862
  • obj: remove unused uncache field in coi structure - !3863
  • backend: simplify PutObj implementation for AIS remote cluster - !3864
  • copy-object-info: remove unnecessary lom.Loaded() asserts - !3865
  • copy&transform flow: revise and simplify put-remote - !3869
  • xaction must call Finish as soon as everything is cleaned up - !3870
  • revise copy-object: remote and local parts - !3871
  • extract cluster-wide operation into a separate source - !3891
  • review maintenance, shutdown, and decommission oper-s - !3895
  • transactions: simplify handler and fix double write bug - !3897
  • make kill timeout - !3901
  • node shutdown vs self-initiated unregister - !3904
  • revise node self-initiated removal (part two) - !3912
  • makefile: option to run tests only from specified directory - !3914
  • unify target and proxy termination - !3916
  • tutils: refactor cluster init - !3917
  • refactor invalmsghdlr and friends - !3918
  • general: login Jenkins to Docker Hub - !3920
  • general: rename cloud field and struct to backend - !3921
  • general: embed cleanup callback in the reader Close method - !3923
  • backend: refactor PutObj usage and delegate closing reader to implementation - !3924
  • Add configuration for automatic resilvering - !3928
  • revise/unify error handling (part two) - !3933
  • cmn: better OOS error message - !3934
  • revise/unify error handling (part three) - !3937
  • cmn: correctly set custom HTTP status code - !3939
  • cmn: use DurationJSON type in Bck2BckMsg struct - !3941
  • aisfs: build aisfs in test:long - !3943
  • err handler for "405 method not allowed" - !3945
  • Make complex tree of calls to IncPending and decPending a simpler one - !3948
  • jenkins: clean docker images cache - !3949
  • cmn: add comment about closing BodyR - !3951
  • general: use unlock lom on close whenever possible - !3952
  • cmn: fix parsing 'Range' HTTP header - !3955
  • LOM: strict caching and locking policy - !3956
  • revise the logic to resume rebalance - !3960
  • tutils: simplify DeployNode and CleanupNode logic - !3962
  • tutils: add CheckResp for checking response status code - !3963
  • assorted fixes - !3964
  • general: separate global and daemon config - !3965
  • graceful shutdown; can start rebalance; remove dont_run_time config - !3967
  • simplify fs.CreateBuckets - !3968
  • tutils: allow custom intervals for WaitNodeReady - !3970
  • reinforce naming convention for custom errors and error types - !3971
  • general: update build scripts to use local config - !3972
  • Use constant format strings to return common errors - !3975
  • universal quiescence - !3977
  • general: remove stray debugging message - !3979
  • general: use ErrNotFound across the codebase - !3981
  • build: replace non-existing /dev/nil with /dev/null - !3982
  • docker: revert startup wait removal for helm charts (!3815) - !3983
  • cmn: add Caller field to HTTPErr - !3986
  • xaction: eliminate copy-paste construction - !3988
  • general: refactor and simplification around http errors - !3994
  • on-demand stopping logic (minor) - !3997
  • Use GET for ListObjects instead of POST - !3999
  • Do not remove an object after it is rebalanced - !4001
  • general: move fields from global to local config - !4002
  • create config owner and maintain versioning - !4008
  • support software version and build time; show via API and CLI - !4009
  • HTTP header and query - !4010
  • config: fix config Validation returned from API - !4011
  • metasync global config - !4015
  • cluster: fix debug message which caused infinite recursion - !4020
  • general: use modifier to update config - !4022
  • Remove force option from maintenance - !4024
  • general: update attach/detach remote AIS to use config modifier - !4025
  • fix mutex assertion race - !4026
  • Auto rebalance settings - !4027
  • general: allow override global config - !4028
  • general: sync config on node join - !4030
  • general: fix setting global config from non-primary - !4031
  • cmn: correctly validate config confdir value - !4041
  • cmn: set ais as default provider when parsing bucket - !4042
  • general: using tags to restrict config updates - !4047
  • cmn: rename StringSet.Keys to ToSlice - !4053
  • devtools: fix wait-start-re - !4056
  • general: separate global and local config - !4063
  • Use cmn.Bck to initialize CT instead of strings - !4064
  • devtools: return pid as int instead of string - !4072
  • devtools: fix assert in PutRandObjs - !4073
  • action-message (minor ref) - !4074
  • general: refactor config management implementation - !4078
  • jenkins: force docker image prune - !4081
  • cmn: return invalid provider error from validate function - !4084
  • flatten devtools directory structure - !4088
  • redistribute VMD under lock - !4090
  • new common (part one) - !4092
  • cluster: remove unused argument - !4097
  • cluster: refactor around bck provider validation - !4098
  • Quick-fix EC test for new LOM pool - !3781
  • EC tests decrease the number of goroutines - !4102
  • Keep node in smap during shutdown - !4103
  • new common (part two) - !4107
  • config: tweak validator - !4108
  • debug: add http debug handlers (pprof and vars) - !4110
  • debug: fix pprof handler path - !4112
  • jsp authn (config and tokens) - !4114
  • general: allow setting transient cluster config - !4115
  • general: fix config change notifier - !4116
  • Get snode flags from smap - !4117
  • general: discover latest config - !4118
  • sign and version config; xmeta - !4119
  • Stop aisnode when decommissioning - !4120
  • cmn: improve QueryBcks validation methods - !4125
  • cmn: cleanup around provider validation - !4127
  • general: move metadata filenames to cmn package - !4128
  • general: store cluster config only on disk - !4129
  • general: *ToUpdate omitempty fields in json - !4130
  • general: remove json local extension - !4131
  • fs: support decommissioning, rewrite moving-markers logic - !4132
  • general: move log dir to local config - !4134
  • general: refactoring around bucket parsing and validation - !4136
  • general: explicitly require bucket provider - !4137
  • general: make fspath config representation - !4144
  • revise GET and mirror-copy load balancing - !4145
  • Improve CLI UX and documentation for configuration - !4146
  • make runtime stats expvar-observable in debug mode - !4149
  • fs: refactor around vmd - !4151
  • general: fix attach detach remote AIS - !4157
  • RmNode optional data cleanup - !4158
  • general: include config in nodeRegMeta - !4160
  • global config: add uuid - !4161
  • remove node from smap in removal-by-test case - !4162
  • unify target/proxy receive Smap logic - !4169
  • join-cluster: metasync vs node-reg-meta - !4170
  • fs: extend vmd structure with more filesystem information - !4172
  • stats: simplify-out listening for config updates and other refactoring - !4173
  • keepalive: refactor & optimize - !4174
  • support selecting remote backends at deployment time - !4175
  • cmn: add writeup about bucket validation - !4178
  • fs: add versioning to vmd - !4179
  • global config: provide for within meta-version extensions - !4181
  • Remove hardcoding in clean_deploy.sh - !4183
  • AWS: fix reading object versions for not-cached objects - !4184
  • target startup sequence: init fs first, snode second - !4187
  • cmn: move test fspaths init to config validation - !4188
  • general: remove unused code and code cleanup - !4190
  • backend: fix initialization of objMeta map - !4197
  • unify cluster-wide meta exchange (part one) - !4198
  • keep remote AIS attachments in sync when new targets join - !4199
  • general: implement decommission cluster - !4200
  • general: use proxy/target runner Stop on decommission - !4207
  • general: ensure intra(Primary) call - !4209
  • Step 2: cluMeta usage - !4211
  • Add more options for clean_deploy.sh - !4212
  • general: correct evict/destroy of remote AIS buckets - !4213
  • general: better error logging for hostname selection - !4214
  • set-bucket-props: backend - !4215
  • general: remove go routine shutdown/decommission request - !4218
  • rename http header constants - !4219
  • intra-cluster control: micro-optimizations and refactoring - !4220
  • refactor Smap out of ais msg - !4221
  • general: fix node shutdown - !4225
  • build: change default compiled providers for aisnode docker image - !4229
  • general: add shutdown marker on daemon shutdown/decommission - !4233
  • Do not panic on unexpected EOF when draining a reader - !4234
  • Add a build tag to use rproxy for all s3 requests - !4235
  • Allow disabling xattrs feature - !4237
  • Add alias ais show config cluster for consistency - !4242
  • Relax checking for Smap reliability when private network is used - !4245
  • reinforce cluster UUID when joining and starting up - !4246

Tests and CI

  • skip flaky TestDistributedSortKillTargetDuringPhases test - !3550
  • ci: bring back skip-ci feature - !3569
  • add bytes limit to TestDownloadJobLimitConnections - !3602
  • use t.Cleanup to destroy and evict test buckets - !3605
  • fix s3 tests to work with HTTPS - !3643
  • do not require AIS_ENDPOINT when testing on K8s - !3646
  • ci: fix tags on k8s jobs - !3655
  • ci: update Docker image - !3725
  • test: correct check for target count in EC rebalance test - !3728
  • ci: add HDFS provider job - !3734
  • extend the scope of tests from cloud to remote buckets - !3736
  • better ETL health check - !3748
  • detect cluster deployment type, not test suite deployment type (k8s) - !3758
  • run ETL tests on jenkins - !3750
  • ci: mark all the jobs with aistore specific tags - !3652
  • ci: add timeouts for all jobs - !3716
  • ci: use native solution for downloading dependencies - !3773
  • ci: fix permissions in HDFS job - !3794
  • ci: increase mountpath count and target count in k8s tests - !3822
  • ci: ignore lint for blocks which do not contain new variables - !3833
  • ci: allow running manual long tests on master branch - !3872
  • ci: re-enable running jobs in scheduled pipeline - !3925
  • ci: fix setup in HDFS job - !3927
  • ci: allow non-escaped pipe characters in RE variable - !4034
  • ci: add simple github workflow - !4035
  • ci: deploy metrics server on kube pipelines - !4052
  • ci: fix branch name in github workflow - !4065
  • ci: show correct path to the jenkins logs - !4069
  • ci: remove test:soak and test:bench - !4099
  • ci: rename and standardize jobs names - !4165
  • ci: use clean_deploy script to deploy cluster - !4194
  • add bytes limit to TestDownloadJobConcurrency - !3591
  • Tests: fix TestObjectPrefix filter check - !3698
  • wait for resilver after manipulating mountpaths - !3774
  • increase kube long tests timeout - !3777
  • extend ETL big bucket tests with new cases - !3783
  • add Python2 runtime for build test - !3784
  • Tests: improve EC tests' slice lookup procedure - !3791
  • skip failing EC tests for bad checksum - !3800
  • FailNow vs goroutines - !3813
  • Tests: replace full /tmp/ais traversing with fs.WalkBck - !3814
  • add new, faster and better implementation of PutRandObjs - !3819
  • do not stop ETL when it wasn't correctly started - !3820
  • skip tests with HDFS as backend bucket - !3834
  • make copy bucket tests long - !3853
  • use new version of PutRandObjs - !3866
  • remove PutObjsFromList function - !3867
  • make tutils init explicit - !3868
  • simplify TestSmoke implementation - !3873
  • show AIS logs for minikube tests on jenkins - !3874
  • increase allowed log streams for jenkins minikube - !3875
  • fix passing cksum type - !3876
  • Tests: fix ListCloudGetTargetURL test - !3882
  • fix map version sync - don't update higher version smap - !3885
  • show ETL logs when a test has failed - !3898
  • add flag to skip local tests - !3902
  • retry failed ETL YAML get from github - !3915
  • remove destination bucket after each ETL test - !3922
  • Tests: fix fshc test - !3930
  • fix evicting bucket/objects for HDFS backend provider - !3940
  • correctly set SkipArgs for local tests - !3950
  • fix bucket props for HDFS bucket - !3954
  • Tests: fix waiting for node is gone in shutdown test - !3969
  • fix restoring with diff IP - !3976
  • fix expected message error in AuthN tests - !3978
  • fix cleanup node error - !3989
  • minor refactoring around EC tests - !3995
  • verify that cliBck exists - !4005
  • Tests: correct initialization of local config in MultiProxy - !4014
  • Tests: add waiting for node is shut down in TestNodeShutdown - !4016
  • CI - collect failed Kube tests artifacts - !4021
  • Tests: TestECDisableEnableDuringLoad fix - !4033
  • fix rename-bucket typo - !4038
  • revert rename-bucket wait change - !4039
  • make sure e2e fails gracefully when there are no mountpaths - !4049
  • add test to check backend providers in list buckets - !4050
  • fix fs package tests on deployments with loopback devices - !4055
  • Tests: Add a more rigorous shutdown test - !4058
  • refactor devtools; clarify decommission - !4086
  • add config management tests - !4101
  • config tests wait for node to be ready - !4113
  • fix displayed TestETLObjectCloud name - !4141
  • remove unused test config; update log dir path - !4142
  • FSHC use number of object based on the number of mountpaths - !4143
  • Tests: use StartMaintenance instead of Decommission - !4155
  • Tests: Add tests for configuration resets - !4159
  • Tests: skip MOCK nodes when saving node command-lines for restore - !4163
  • re-enable AddNodeDuplicateIP tests - !4167
  • fix invalid url in TestRProxyGCS - !4176
  • retry starting aisfs binary - !4195
  • skip TestRWStress on HDFS backend - !4196
  • Tests: stress rebalance - !4224

Documentation

  • close tar in imagenet ETL example - !3555
  • fix etl commands in k8s dev docs - !3558
  • update video thumbnail images and add favicon for future web use - !3559
  • revisit and update bucket documentation - !3704
  • add HDFS provider documentation - !3723
  • describe --force option for set-props - !3729
  • add HDFS tutorial - !3759
  • reword and add link to HDFS provider section - !3760
  • Remove inconsistencies in console examples - !3932
  • update options in clean_deploy script - !4054
  • Clean up dead links - !4060
  • rework bucket provider document - !4067
  • fix cli output - !4096
  • Minor polish for CLI reference - !4185
  • Update CLI demo gifs - !4222
  • minor cleanup - !4231
  • clean up for ais show and ais job - !4243
  • readme-3.4: new pics - !4238
  • Update node decommission docs - !4186

3.3

3 years ago

Highlights

  • ETL - inline and offline dataset transformations, custom user-defined transformations via both user-provided containers and Python scripts, simplified ETL initialization, ETL directly to and from Cloud buckets;
  • Multi-Cloud capability supporting co-existence and management of datasets originating from (or hosted by) different Cloud storages - !2736, !2737, !2748, !2792, !2793;
  • Maintenance and decommission - the capability to put a clustered node in maintenance mode and/or safely and permanently remove it from the cluster - #947, !2935, !2957, !2983, !2990, !3094;
  • Volume metadata (VMD) - persistent information that describes each clustered node's storage configuration (including data drives, local filesystems, mountpaths) further used to reinforce data integrity and protection - #939, #941, !3118, !3198;
  • New protocol prefix ht:// - uniform access to "vanilla" HTTP(S) based datasets - #882, #889;
  • Terraform integration - easy and automated deployment via Terraform - there's a separate repository (of scripts, charts, and documentation) that we use for production deployments;
  • Intra-cluster communications - the transport we use to rebalance user data, transfer erasure-coded slices, copy and transform datasets - a major upgrade - !2860, !2895, !2984, !3053, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3286, !3303, !3356, !3357, !3396, !3403, !3409, !3415, !3417.

And also:

  • performance optimizations, CLI usability improvements, refactoring, cleanup, and stability fixes across the board.

Multi-Cloud

A new protocol prefix ht:// (in addition to s3://, gs://, and azure://) for seamless integration and uniform access to "vanilla" HTTP(S) based datasets.

Multi-Cloud via a single deployed runtime. Improved access to public Cloud buckets (from different Cloud providers). Bucket copying and transformations (see ETL below) extended to support Cloud buckets.

  • New HTTP provider (ht://) - #882, #889
  • Multi-Cloud - added runtime support for bucket management of multiple Cloud providers - !2736, !2737
  • Support multiple regions for AWS buckets - #778, !2804
  • Improve Google provider error handling - !2792, !2793
  • Public GCP buckets can be used without setting PROJECT_ID - !2723
  • Remove default Cloud provider option (provider now must be set explicitly) - !2748
  • Support Cloud-based source/destination in a bucket copy operation - !2975
  • Prefetch performance improvement: keep cached object properties longer - #969
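
The provider prefixes listed above can be illustrated with a small parsing sketch. This is not AIStore's actual parser: `parse_bucket_uri`, `BucketRef`, and `KNOWN_PROVIDERS` are hypothetical names, and the inclusion of `ais` for native buckets is an assumption. It only shows the rule stated in these notes that the provider must be given explicitly.

```python
from typing import NamedTuple

# Prefixes named in these notes; "ais" (native buckets) is an assumption of this sketch.
KNOWN_PROVIDERS = {"ais", "s3", "gs", "azure", "ht"}


class BucketRef(NamedTuple):
    provider: str
    bucket: str
    object_name: str


def parse_bucket_uri(uri: str) -> BucketRef:
    """Split '<provider>://<bucket>[/<object>]' into its parts (hypothetical helper)."""
    provider, sep, rest = uri.partition("://")
    if not sep:
        # No default provider: the prefix must be spelled out.
        raise ValueError(f"missing provider prefix in {uri!r}")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    bucket, _, object_name = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return BucketRef(provider, bucket, object_name)
```

For example, `parse_bucket_uri("s3://data/train/a.tar")` yields provider `s3`, bucket `data`, and object name `train/a.tar`, while a bare `data/train/a.tar` is rejected because no provider is given.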

Core

Improve cluster stability in the presence of exceptional events, optimize cluster operation under heavy workloads, introduce maintenance mode, support permanent decommissioning of nodes from the cluster, improve the reliability of bucket destroy operation, optimize and further stabilize cluster rebalancing logic.

  • Node maintenance feature - #947, !2935, !2990, !3094
  • Improved out-of-space (out of capacity) handling - #822
  • Backend buckets vs bucket initialization - !2841
  • Improve cluster stability while it is in transition (when the primary changes) - #945, #968, #960
  • If the cluster restarts during rebalancing, the rebalance now resumes - #913
  • Optimize copy-bucket and other bucket-traversing workloads - #917
  • Make promote consistent with other object operations - !2763, !2765
  • Add transfer statistics for resilvering - !2926
  • Configuration option Rebalance.Enabled now affects only automatic rebalance (a manual one can always be started) - !2915
  • Reduce resource usage by StatsD (Grafana, Graphite) client - !3240
  • New CLI option --daemon-id to join a node with user-predefined ID - !3255
  • Fix object rename operation to work across different mountpaths - !3329
  • Make destroy bucket operation transactional - !3315
  • Volume meta data (VMD) - persistent information about a node and its storage configuration, used on startup when running node integrity checks - #939, #941, !3118, !3198
  • No metasync when shutting down - !2844
  • Not ignoring errors when listing multiple Cloud providers - !2845
  • Refactor reb (rebalance) package - !2857
  • Refactor target handlers and fix transactions' housekeeping logic - !2869
  • Refactor copy-object interface - !2879
  • Revise and refactor PROMOTE (command and API) - !2880
  • Refactor target copy-object and put-remote interfaces - !2881
  • Use data mover to copy buckets - !2893
  • LOM: fix CopyObject - !2908
  • cmn.JoinWords and friends - !2913
  • Always allow manual rebalance (even if automatic one is disabled) - !2915
  • Mountpath resilvering now counts moved objects and their total size - !2926
  • Copy buckets to return correct total size of copied content - !2919
  • Revise and optimize intra-cluster broadcasting - !2943
  • Improve HrwTargetList performance - !2945
  • Fix zero-size objects scenario - !3531

ETL

Multiple improvements and enhancements to the capability (introduced first with v3.2) to easily run user-defined custom dataset transformations - and scale the performance linearly with each added storage server. This release adds offline (dataset-to-dataset) transformation.

For ETL documentation (that now also includes animated presentations), please refer to docs/etl.md and etl/README.md

  • Add offline, local and cloud, bucket transformation - !2827, !2854, !2898, !3445
  • ETL for objects in the Cloud - !3399
  • ETL build operation - easy initialization based on the function definition - !2873, !2884, !2918, !3369
  • Remove kubectl (shell) calls, use K8s client-go instead - !2896, !2907
  • Support retrieving ETL logs - !2947
  • Stability and performance improvements, bug fixes - !2955, !2977, !3330, !3369, !3374, !3411
  • Add and improve labels in Pods and Services - !3445
  • Improve waiting for the Pod/Service to be ready - !3332, !3397
  • Add extension, prefix, and suffix flags for offline ETL - !2846
  • Support aborting offline ETL - !2850
  • Add dry run option for offline ETL - !2854
  • Simplify flow to initialize ETL - !2853
  • Consistent naming of API constants - !2861
  • ETL build: remove unnecessary annotations - !2871
  • Update skeleton docker images used to run custom Python-based transforms - !2870
  • Install dependencies in initContainer - !2873
  • POD spec: add volume mount - !2883
  • Unify offline ETL with copy-bucket - !2898, !2933
  • Improve waiting for POD-ready - !2912
  • Add dry-run capability - !2939
  • K8s client: pod namespace & refactoring - !2948
  • The capability to throttle ETL (transforms) depending on disk utilizations - !2998

Terraform integration

Dramatically simplified deployment of AIStore cluster on the Cloud via Terraform. This release delivers GKE but can be easily extended to support any Cloud that provides Kubernetes (service). It is now possible to start a fully functional AIStore cluster with a single command - for details, please refer to AIStore Kubernetes repository.

  • Add scripts for easy deployment and shutdown of the AIStore cluster on the cloud - !16, !56-!68, #14, #17
  • Add admin container image - !3079, !3195, !3359
  • Remove requirement for K8S_HOST_NAME environment variable - !3451

Information Center (IC)

More reliable extended action (xaction) status management and reporting, automatic cluster-wide xaction abort, xaction progress notifications (new). In AIS, xaction is a long-lived asynchronous operation, a job.

  • Notify all participating nodes when any one of them aborts xaction - !2928
  • Improve IC status reporting by polling xaction status from targets that have not reported xaction status yet - !2953
  • Fix xaction registration for newly added targets - !2924
  • Support both transactional and non-transactional xactions - !2734
  • Replace target polling with notifications when waiting for xaction to complete - !2868
  • xactions to return user-friendly status - !2865
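
The status-tracking behavior described in this list (pushed notifications, polling only the targets that have not reported, and propagating an abort) can be sketched roughly as follows. The class and method names are hypothetical and do not reflect the actual IC implementation; the sketch only models the bookkeeping.

```python
class XactionStatus:
    """Tracks one xaction across targets (hypothetical, for illustration only)."""

    def __init__(self, xaction_id, targets):
        self.xaction_id = xaction_id
        self.pending = set(targets)  # targets that have not reported yet
        self.aborted = False

    def on_notification(self, target, aborted=False):
        """Record a 'finished' or 'aborted' notification pushed by a target.

        Returns the targets that still need to be told to abort (empty
        unless this notification is an abort).
        """
        self.pending.discard(target)
        if aborted:
            self.aborted = True
            return sorted(self.pending)
        return []

    def targets_to_poll(self):
        """Targets whose status must still be pulled because nothing was pushed."""
        return sorted(self.pending)

    def finished(self):
        """Done when every target has reported, or as soon as any target aborts."""
        return self.aborted or not self.pending
```

With three targets, a normal notification from one leaves the other two in `targets_to_poll()`; an abort from a second target marks the xaction finished and returns the one remaining target to notify.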

Downloader

Integration with IC, more robust downloader job handling.

  • Downloader naming; fix mountpath register/unregister - !2842
  • Better job aborting; improved completion mechanisms - #902, !2960
  • Progress Bar: report periodic status and stats to IC (see above) - !2911

Distributed Shuffle (dSort)

Performance improvements, resource usage optimizations.

  • Performance: decrease resource usage - #938
  • Better data transport streams handling - #936, !3307

Erasure Coding (EC)

Resource usage optimizations, better slice checksum handling.

  • Fix checksum when sending constructed slices to other targets - !3073, !3132
  • Improve operation over data transport streams - #916, !3311
  • Fix receiving object slices when the bucket is being destroyed - #887
  • Add support for nodes in maintenance mode - !3404

Intra-cluster communications

The transport that we use to rebalance user data (e.g., when adding/removing nodes), transfer erasure-coded slices, copy and transform datasets has undergone a major upgrade:

  • Add data mover layer - !2860, !2895, !2899
  • Support for short messages and message streams - !2984, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3303
  • Revise and optimize transport stream multiplexing - !3141
  • When done transmitting, wait for data mover quiescence - !2903
  • Support streaming unsized objects (objects of unknown size), which is particularly useful when ETL-transforming objects on the fly (that is, inline) - !3356, !3357, !3396, !3403, !3409, !3415, !3417
  • Optimize memory management and debug unlikely races - !3053, !3189, !3286, !3298, !3309, !3314, !3319
  • Data mover: is-open vs quiescent - !2941
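
To make the idea of streaming an object of unknown size concrete, here is a minimal sketch that uses length-prefixed frames and a zero-length terminator. The framing is a generic technique chosen for illustration and is an assumption, not the AIStore transport wire format; `send_unsized` and `recv_unsized` are hypothetical names.

```python
import io
import struct


def send_unsized(src, dst, chunk_size=64 * 1024):
    """Copy src to dst without knowing the total size in advance.

    Each chunk is preceded by a 4-byte big-endian length; a zero length
    marks the end of the object. Returns the number of payload bytes sent.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(struct.pack(">I", len(chunk)))
        dst.write(chunk)
        total += len(chunk)
    dst.write(struct.pack(">I", 0))  # terminator frame
    return total


def recv_unsized(src):
    """Read frames until the terminator and return the reassembled payload."""
    out = bytearray()
    while True:
        header = src.read(4)
        if len(header) != 4:
            raise EOFError("stream ended before the terminator frame")
        (n,) = struct.unpack(">I", header)
        if n == 0:
            return bytes(out)
        chunk = src.read(n)
        if len(chunk) != n:
            raise EOFError("truncated frame")
        out += chunk
```

The receiver learns the object size only once the terminator arrives, which matches the inline-transform case where the output size is not known until the transform finishes.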

CLI (tool)

New command ais show mountpath, new option --keep for the PROMOTE operation, the ability to run certain commands without accessing a cluster, a redesigned ais rm node command, an automatic progress indicator for long ais ls <bucket> operations, and many fixes for various show commands.

  • Display EC xaction extra information for ais show xaction command - #823
  • Improve user experience: commands that do not need a cluster no longer require a running cluster - #878, !2914
  • Listing bucket objects with the flag --all displays all objects (including temporarily misplaced) - #964
  • Command ais cat now prints only object content; the trailing object size information line is removed - !2729
  • Cloud bucket can be downloaded without setting backend bucket - !2803
  • Added progress indicator when listing a huge bucket - #884, !2786
  • Unify --all sub-option for all commands - !2843, !3264
  • New option for PROMOTE command: --keep original files after promoting them to objects - !2880
  • New command ais show mountpath to display target mountpath info - !2900, !3387
  • Fix displaying rebalance statistics - !3264
  • Fix ais show xaction rebalance to show the last xaction - !3250
  • Fix ais show cluster smap - !3243
  • Revise ais rm node command: add mandatory option --mode (to choose between node decommission and putting node in maintenance), and optional --no-rebalance (to skip rebalance and execute removal immediately) - !2965
  • An option to remove all finished download jobs - !2849
  • Wait option (flag) - !2876
  • Fix 'show rebalance' showing rebalance stats - !2954
  • Refactor CLI cat/get top-level commands - !2972

Other

  • aisloader (benchmark): add progress indicator when listing very large buckets - !2821
  • aisfs: APPEND operation is now checksum-protected - #780
  • build: use custom image for faster CI, enable more linters, switch to Go 1.15, add memory and CPU profiling options via make, upgrade third-party packages - !3235, !3121, !2949, !2916, !2993, !3050
  • CI/CD: fix k8s development scripts, run many more tests in minikube CI, add terraform GCP playground - !2851, !2858, !2980
  • S3 compatibility: support AIS buckets with Cloud backend - !3532, #67, #68