- Support P-Tuning v2 finetuned models for the ChatGLM family
- Fix convert.py for LoRA models & chatglm3-6b-128k
- Fix RoPE theta config for 32k/128k sequence lengths
- Better CUDA CMake script that respects the nvcc version

v0.3.1 (4 months ago)
- Support function calling in the OpenAI API server (see the sketch after this list)
- Faster repetition penalty sampling
- Support the max_new_tokens generation option
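
For illustration, a function-calling request against the OpenAI-compatible endpoint might look like the sketch below. The host, port, model name, the get_weather tool, and the exact tools schema the server expects are all assumptions, since these notes only state that function calling is supported; max_tokens is the standard OpenAI field, and how it maps onto the new max_new_tokens option is not spelled out here.

```python
import requests

# Hypothetical local deployment of the OpenAI-compatible API server;
# host, port, and the "tools" field shape are assumptions.
payload = {
    "model": "chatglm3-6b",
    "messages": [{"role": "user", "content": "What is the weather in Beijing today?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, for illustration only
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "max_tokens": 256,
}
response = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=payload)
# An OpenAI-style response carries the reply (or tool call) in choices[0].message.
print(response.json()["choices"][0]["message"])
```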

v0.3.0 (6 months ago)
- Full functionality of ChatGLM3, including system prompt, function call, and code interpreter
- Brand-new OpenAI-style chat API (example after this list)
- Add token usage information to the OpenAI API server for compatibility with the LangChain frontend
- Fix conversion error for chatglm3-6b-32k
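
The reworked chat API takes a list of role-tagged messages instead of raw prompt strings. A minimal sketch with the Python binding, assuming a locally converted GGML model file at a placeholder path; only the basic role/content message fields are shown:

```python
import chatglm_cpp

# Placeholder path to a converted ChatGLM3 model file (assumption).
pipeline = chatglm_cpp.Pipeline("./models/chatglm3-ggml.bin")

messages = [
    chatglm_cpp.ChatMessage(role="system", content="You are a helpful assistant."),
    chatglm_cpp.ChatMessage(role="user", content="What is the capital of France?"),
]

# chat() returns the assistant's reply as a message object.
reply = pipeline.chat(messages)
print(reply.content)
```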

v0.2.10 (6 months ago)
- Support ChatGLM3 in conversation mode
- Coming soon: a new prompt format for system messages and function calls

v0.2.9 (7 months ago)
- Support InternLM 7B & 20B model architectures

v0.2.8 (7 months ago)
- Metal backend support for all models (ChatGLM, ChatGLM2, Baichuan-7B & Baichuan-13B)
- Fix GLM generation on CUDA for long contexts

v0.2.7 (8 months ago)
- Support the Baichuan-7B model architecture (works for both Baichuan v1 & v2)
- Minor bug fixes and enhancements

v0.2.6 (8 months ago)
- Support Baichuan-13B on CPU & CUDA backends
- Bug fixes for Windows and Metal

v0.2.5 (9 months ago)
- Optimize context computation (GEMM) for the Metal backend
- Support a repetition penalty option for generation (see the sketch after this list)
- Update the Dockerfile for CPU & CUDA backends with full functionality; images are hosted on GHCR
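
The penalty option here, and the faster sampling in v0.3.1 above, refer to the CTRL-style rule common to ggml-based projects: logits of tokens that have already appeared are scaled down before the next token is sampled. A sketch of that conventional rule, not necessarily chatglm.cpp's exact kernel:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Scale down logits of previously generated tokens (CTRL-style rule).

    Dividing a positive logit (or multiplying a negative one) by the
    penalty makes repeated tokens less likely at the next sampling step.
    """
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty
        else:
            logits[token_id] *= penalty
    return logits

# Example: token 2 was already generated, so its logit shrinks.
print(apply_repetition_penalty([1.5, -0.3, 2.0], generated_ids=[2]))
# [1.5, -0.3, 1.8181...]
```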

v0.2.4 (9 months ago)
- Python binding enhancement: support loading and converting directly from original Hugging Face models, so intermediate GGML model files are no longer necessary (sketch below)
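
Per the note above, the binding can now ingest an original Hugging Face checkpoint and convert it on the fly. A minimal sketch, assuming the Pipeline constructor accepts a Hugging Face repo id or local model directory; the repo id shown is illustrative:

```python
import chatglm_cpp

# Previously a GGML file had to be produced first with convert.py;
# loading the original Hugging Face model directly is the new path.
# The repo id is an assumption for illustration.
pipeline = chatglm_cpp.Pipeline("THUDM/chatglm2-6b")

# The chat API of this era took a list of prompt strings; the
# ChatMessage-based API arrived later with v0.3.0.
print(pipeline.chat(["你好"]))
```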