Bert As Service Versions

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

v0.8.3

Release Note (0.8.3)

Release time: 2023-12-20 04:13:18

🙇 We'd like to thank all contributors for this new release! In particular, Zihao Jing, Han Xiao, Nick de Silva, Ziniu Yu, Jina Dev Bot, 🙇

🐞 Bug fixes

  • [280b925e] - fix docarray at v1 (#911) (Ziniu Yu)

📗 Documentation

  • [ca2b25b7] - remove jina self-hosted parts (#942) (Zihao Jing)
  • [6e418fe6] - replace free service docs with inference docs (#918) (Ziniu Yu)

🍹 Other Improvements

  • [d4e7a30b] - Update README.md (Han Xiao)
  • [679de4e3] - change slack link to discord (Han Xiao)
  • [02abdc7b] - version: the next version will be 0.8.3 (Jina Dev Bot)

v0.8.2

Release Note (0.8.2)

Release time: 2023-04-19 08:23:45

🙇 We'd like to thank all contributors for this new release! In particular, Ziniu Yu, Yang Ruiyi, YangXiuyu, Jie Fu, zawabest, Girish Chandrashekar, Jina Dev Bot, 🙇

🆕 New Features

  • [cce3b05a] - set prefetch in client for traffic control (#897) (Ziniu Yu)
  • [dabbe8bc] - add cn clip model (#888) (Yang Ruiyi)
  • [1fe3a5a0] - add fp16 inference support (torch/onnx) (#871) (YangXiuyu)
  • [1eebdd7f] - add custom tracing spans with jina>=3.12.0 (#861) (Girish Chandrashekar)
  • [f2515394] - add three new open clip roberta base models (#860) (YangXiuyu)
  • [e4717a35] - Integrate flash attention (#853) (YangXiuyu)

🐞 Bug fixes

  • [280b925e] - fix docarray at v1 (#911) (Ziniu Yu)
  • [35733a0b] - replace transform ndarray with transform blob (#910) (Ziniu Yu)
  • [d70f2382] - onnx package conflict during setup (#894) (Ziniu Yu)
  • [8a576c58] - install pytorch cu116 for server docker image (#882) (Ziniu Yu)
  • [0b293ec8] - dynamic convert onnx model to fp16 during start session (#876) (YangXiuyu)
  • [fd16e5ab] - check dtype when loading models (#872) (Ziniu Yu)
  • [67f551ca] - torchvision version to avoid compatibility issue (#866) (Jie Fu)
  • [0223e6fa] - add pip installable flash attention (#863) (YangXiuyu)

📗 Documentation

  • [1888ef65] - fix broken link in client doc (#909) (Ziniu Yu)
  • [f4eed3bc] - add link and intro to inference api (#900) (Ziniu Yu)
  • [702fff88] - default model suggestion (#874) (Jie Fu)

🍹 Other Improvements

  • [19b4fa51] - remove docsqa html (#899) (Ziniu Yu)
  • [aa07d257] - remove docsqa (#898) (Ziniu Yu)
  • [f3421f7c] - bump open-clip-torch to v2.8.0 (#883) (Ziniu Yu)
  • [c7af9f71] - fix configuration file for the search flow doc (#869) (zawabest)
  • [53cd0630] - hide changelog in docs (#864) (Ziniu Yu)
  • [9bb7d1f4] - version: the next version will be 0.8.2 (Jina Dev Bot)

v0.8.1

Release Note (0.8.1)

Release time: 2022-11-15 11:15:48

This release contains 1 new feature, 1 performance improvement, 2 bug fixes and 4 documentation improvements.

🆕 Features

Allow custom callback in clip_client (#849)

This feature allows clip-client users to send a request to a server and then process the response with custom callback functions. Three callbacks can be supplied as user-defined functions: on_done, on_error and on_always.

The following code snippet shows how to send a request to a server and save the response to a database.

from clip_client import Client

db = {}

def my_on_done(resp):
    for doc in resp.docs:
        db[doc.id] = doc


def my_on_error(resp):
    # the response is not a plain string, so convert it before writing to the log
    with open('error.log', 'a') as f:
        f.write(str(resp) + '\n')


def my_on_always(resp):
    print(f'{len(resp.docs)} docs processed')


c = Client('grpc://0.0.0.0:12345')
c.encode(
    ['hello', 'world'], on_done=my_on_done, on_error=my_on_error, on_always=my_on_always
)

For more details, please refer to the CLIP client documentation.

🚀 Performance

Integrate flash attention (#853)

We have integrated the flash attention module as a faster drop-in replacement for torch.nn.MultiheadAttention. To take advantage of this feature, you will need to install the flash attention module manually:

pip install git+https://github.com/HazyResearch/flash-attention.git

If flash attention is present, clip_server will automatically try to use it.
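
To double-check that the optional dependency is visible to Python before launching the server, a quick import test is enough. This is a minimal sketch only; it assumes the package installed above is importable as flash_attn, which may differ between builds.

# Sanity-check sketch: confirm the flash attention package is importable before
# starting clip_server. The import name `flash_attn` is an assumption based on
# the upstream package; adjust it if your build exposes a different module.
try:
    import flash_attn  # noqa: F401

    print('flash attention found: clip_server will try to use it')
except ImportError:
    print('flash attention not found: clip_server falls back to nn.MultiheadAttention')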

The table below compares CLIP performance with and without the flash attention module. We conducted all tests on a Tesla T4 GPU and timed how long it took to encode a batch of documents 100 times.

Model     Input data  Input shape        w/o flash attention  w/ flash attention  Speedup
ViT-B-32  text        (1, 77)            0.42692              0.37867             1.1274
ViT-B-32  text        (8, 77)            0.48738              0.45324             1.0753
ViT-B-32  text        (16, 77)           0.4764               0.44315             1.07502
ViT-B-32  image       (1, 3, 224, 224)   0.4349               0.40392             1.0767
ViT-B-32  image       (8, 3, 224, 224)   0.47367              0.45316             1.04527
ViT-B-32  image       (16, 3, 224, 224)  0.51586              0.50555             1.0204

Based on our experiments, performance improvements vary depending on the model and GPU, but in general, the flash attention module improves performance.

🐞 Bug Fixes

Increase timeout at startup for Executor docker images (#854)

During Executor initialization, downloading model parameters can take a long time. If a model is very large or the download is slow, the Executor could previously time out before it even started serving. We have increased the startup timeout to 3,000,000 ms.
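
If you host your own Flow and a very large model still exceeds the startup window, you can raise the limit in your own deployment. The sketch below is not the official configuration: it assumes Jina's timeout_ready parameter (in milliseconds) and uses an illustrative Hub executor reference.

# Sketch only: raise the executor startup timeout in a self-hosted Flow.
# `timeout_ready` is Jina's per-executor startup timeout in milliseconds;
# the `uses` reference below is illustrative and may not match your setup.
from jina import Flow

f = Flow(port=51000).add(
    name='clip_t',
    uses='jinahub://CLIPTorchEncoder',  # illustrative executor reference
    timeout_ready=3_000_000,  # 3,000,000 ms, matching the new default
)

with f:
    f.block()  # serve until interrupted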

Install transformers for Executor docker images (#851)

We have added the transformers package to Executor docker images, in order to support the multilingual CLIP model.

📗 Documentation Improvements

  • Update Finetuner docs (#843)
  • Add tips for client parallelism usage (#846)
  • Move benchmark conclusion to beginning (#847)
  • Add instructions for using clip server hosted by Jina (#848)

🤟 Contributors

We would like to thank all contributors to this release:

  • Ziniu Yu (@ZiniuYu)
  • Jie Fu (@jemmyshin)
  • felix-wang (@numb3r3)
  • YangXiuyu (@OrangeSodaHub)

v0.8.0

Release Note (0.8.0)

Release time: 2022-10-12 08:11:40

This release contains 3 new features, 1 performance improvement, and 1 documentation improvement.

🆕 Features

Support large ONNX model files (#828)

Before this release, ONNX model files were limited to 2 GB. We now support large ONNX models archived as ZIP files, in which several smaller ONNX files store the model's subgraphs. As a result, we can now serve all of the CLIP models via onnxruntime.

Support ViT-B-32, ViT-L-14, ViT-H-14 and ViT-g-14 trained on laion-2b (#825)

Users can now serve four new CLIP models from OpenCLIP trained on the Laion-2B dataset:

  • ViT-B-32::laion2b-s34b-b79k
  • ViT-L-14::laion2b-s32b-b82k
  • ViT-H-14::laion2b-s32b-b79k
  • ViT-g-14::laion2b-s12b-b42k

The ViT-H-14 model achieves 78.0% zero-shot top-1 accuracy on ImageNet and 73.4% zero-shot image retrieval (Recall@5) on MS COCO, making it the best-performing open-source CLIP model. To use one of the new models, simply specify its name, e.g. ViT-H-14::laion2b-s32b-b79k, in the Flow YAML. For example:

jtype: Flow
version: '1'
with:
  port: 51000
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      with:
        name: ViT-H-14::laion2b-s32b-b79k
      metas:
        py_modules:
          - clip_server.executors.clip_torch

Please refer to model support to see the full list of supported models.

In-place result in clip_client; preserve output order by uuid (#815)

The clip_client module now supports in-place embedding. This means the result of a call to the CLIP server to get embeddings is stored in the input DocumentArray, instead of creating a new DocumentArray. Consequently, the DocumentArray returned by a call to Client.encode now has the same order as the input DocumentArray.

Note that this is a breaking change if your code depends on Client.encode returning a new DocumentArray instance.

If you run the following code, you can verify that the input DocumentArray now contains the embeddings and that the order is unchanged.

from docarray import DocumentArray, Document
from clip_client import Client

c = Client('grpc://0.0.0.0:51000')

da = DocumentArray([
    Document(text='she smiled, with pain'),
    Document(uri='apple.png'),
    Document(uri='apple.png').load_uri_to_image_tensor(),
    Document(blob=open('apple.png', 'rb').read()),
    Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
    Document(
        uri='data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7'
    ),
])

c.encode(da)
print(da.embeddings)

🚀 Performance

Drop image content to boost latency (#824)

Responses to Client.encode no longer include the input image content along with the embedding. Since embeddings are now inserted into the original DocumentArray instance on the client side, returning the image content is unnecessary network traffic. As a result, the system is faster and more responsive. The performance improvement depends on image size and network bandwidth.

📗 Documentation Improvements

CLIP benchmark on zero-shot classification and retrieval tasks (#832)

We now provide benchmark information for CLIP models on zero-shot classification and retrieval tasks. This information should help users to choose the best CLIP model for their specific use-cases. For more details, please read the Benchmark page in the CLIP-as-Service User Guide.

🤟 Contributors

We would like to thank all contributors to this release:

  • felix-wang (@numb3r3)
  • Ziniu Yu (@ZiniuYu)
  • Jie Fu (@jemmyshin)

v0.7.0

Release Note (0.7.0)

Release time: 2022-09-13 13:47:54

🙇 We'd like to thank all contributors for this new release! In particular, numb3r3, felix-wang, Jie Fu, Ziniu Yu, Jina Dev Bot, 🙇

🆕 New Features

  • [a07a5218] - support clip retrieval (#816) (felix-wang)

🐞 Bug fixes

  • [213ecc28] - always return docarray as search result (#821) (felix-wang)
  • [eca57745] - readme: use new demo server (#819) (felix-wang)

📗 Documentation

  • [8d9725fb] - update clip search (#820) (felix-wang)
  • [fa7e5776] - docs for retrieval (#808) (Jie Fu)
  • [47144c23] - enable horizontal scrolling in wide tables (#818) (Ziniu Yu)

🍹 Other Improvements

  • [53636cea] - bump version to 0.7.0 (numb3r3)
  • [eda4aa8e] - version: the next version will be 0.6.3 (Jina Dev Bot)
  • [f7ee26a1] - improve model not found error msg (#812) (Ziniu Yu)

v0.6.2

Release Note (0.6.2)

Release time: 2022-09-01 04:16:27

🙇 We'd like to thank all contributors for this new release! In particular, Ziniu Yu, Jina Dev Bot, felix-wang, 🙇

🐞 Bug fixes

  • [ea239685] - grpc meta auth (#811) (felix-wang)

📗 Documentation

  • [4461d2e9] - update model support table (#813) (Ziniu Yu)

🍹 Other Improvements

  • [f7ee26a1] - improve model not found error msg (#812) (Ziniu Yu)
  • [f1c0057d] - version: the next version will be 0.6.2 (Jina Dev Bot)

v0.6.1

Release Note (0.6.1)

Release time: 2022-08-30 13:57:32

🙇 We'd like to thank all contributors for this new release! In particular, felix-wang, Jina Dev Bot, numb3r3, 🙇

🐞 Bug fixes

  • [ea239685] - grpc meta auth (#811) (felix-wang)

🍹 Other Improvements

  • [83a8120c] - version: the next version will be 0.6.1 (Jina Dev Bot)
  • [2a80235c] - bump version to 0.6.0 (numb3r3)

v0.6.0

Release Note (0.6.0)

Release time: 2022-08-30 04:19:21

🙇 We'd like to thank all contributors for this new release! In particular, numb3r3, Ziniu Yu, felix-wang, Jina Dev Bot, 🙇

🆕 New Features

  • [3c43eed3] - do not send blob from server when it is loaded in client (#804) (Ziniu Yu)
  • [f852dfc8] - add warning if input is too large (#796) (Ziniu Yu)
  • [65032f02] - encode text first when both text and uri are presented (#795) (Ziniu Yu)

🐞 Bug fixes

  • [bb2c142b] - cast dtype for fp16 (#801) (felix-wang)

📗 Documentation

  • [a5893c70] - update jcloud gpu usage (#809) (Ziniu Yu)
  • [b4fb0dd2] - fix hub table typo (#803) (Ziniu Yu)

🍹 Other Improvements

  • [2a80235c] - bump version to 0.6.0 (numb3r3)
  • [59b9f771] - update protobuf version (#810) (Ziniu Yu)
  • [89205f06] - update executor docstring (#806) (Ziniu Yu)
  • [25c91e21] - version: the next version will be 0.5.2 (Jina Dev Bot)

v0.5.1

Release Note (0.5.1)

Release time: 2022-08-08 05:11:18

🙇 We'd like to thank all contributors for this new release! In particular, Ziniu Yu, Jina Dev Bot, numb3r3, 🙇

🆕 New Features

  • [65032f02] - encode text first when both text and uri are presented (#795) (Ziniu Yu)

📗 Documentation

  • [7c6708fa] - update hub readme (#794) (Ziniu Yu)

🍹 Other Improvements

  • [a7c4f490] - version: the next version will be 0.5.1 (Jina Dev Bot)
  • [b00963c4] - bump version to 0.5.0 (numb3r3)

v0.5.0

Release Note (0.5.0)

Release time: 2022-08-03 05:13:06

🙇 We'd like to thank all contributors for this new release! In particular, numb3r3, Ziniu Yu, Alex Shan, felix-wang, Sha Zhou, Jina Dev Bot, Han Xiao, 🙇

🆕 New Features

  • [3402b1d1] - replace traversal_paths with access_paths (#791) (Ziniu Yu)
  • [87928a7b] - update onnx models and md5 (#785) (Ziniu Yu)
  • [8bd83896] - support onnx backend for openclip (#781) (felix-wang)
  • [f043b4d9] - update openclip loader (#782) (Alex Shan)
  • [fa62d8e9] - support openclip&mclip models + refactor model loader (#774) (Alex Shan)
  • [32b11cd6] - allow model selection in client (#775) (Ziniu Yu)
  • [0ff4e252] - allow credential in client (#765) (Ziniu Yu)
  • [ee7da10d] - support custom onnx file and update model signatures (#761) (Ziniu Yu)
  • [ed1b92d1] - docs: add qabot (#759) (Sha Zhou)

🐞 Bug fixes

  • [e48a7a38] - change onnx and trt default model name to ViT-B-32::openai (#793) (Ziniu Yu)
  • [8b8082a9] - mclip cuda device (#792) (felix-wang)
  • [8681b88e] - fp16 inference (#790) (felix-wang)
  • [ab00c2ae] - upgrade jina (#788) (felix-wang)
  • [1db43b48] - no allow client to change server batch size (#787) (Ziniu Yu)
  • [58772079] - add models and md5 (#783) (Ziniu Yu)
  • [7c8285bb] - async progress bar does not display (#779) (Ziniu Yu)
  • [79e85eed] - miscalling clip_server in clip_client (Han Xiao)

📗 Documentation

  • [c67a7f59] - add model support (#784) (Alex Shan)
  • [bc6b72e6] - add finetuner docs (#771) (Ziniu Yu)
  • [2b78b12e] - improve model support (#768) (Ziniu Yu)

🍹 Other Improvements

  • [b00963c4] - bump version to 0.5.0 (numb3r3)
  • [c458dd65] - remove clip_hg (#786) (Ziniu Yu)
  • [ca03dca3] - fix markdown-table extention (#772) (felix-wang)
  • [7b19bffe] - version: the next version will be 0.4.21 (Jina Dev Bot)