chatglm.cpp Versions

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & more LLMs

v0.3.2

1 month ago
  • Support P-Tuning v2 finetuned models for the ChatGLM family
  • Fix convert.py for LoRA models & chatglm3-6b-128k
  • Fix RoPE theta config for 32k/128k sequence lengths (see the sketch after this list)
  • Better CUDA CMake script that respects the nvcc version
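
For context on the RoPE theta fix: rotary position embeddings derive their per-dimension rotation frequencies from a base theta, and long-context (32k/128k) model variants raise that base so positions rotate more slowly. Below is a minimal sketch of the standard frequency computation, not the project's exact code, and the long-context theta value is a hypothetical example:

    # Standard RoPE inverse frequencies (illustrative, not chatglm.cpp's
    # exact implementation). Long-context variants use a larger theta so
    # positions far beyond the original training length stay distinguishable.
    def rope_inv_freq(head_dim: int, theta: float = 10000.0) -> list[float]:
        # inv_freq[i] = theta^(-2i / head_dim) for each dimension pair
        return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

    base_freqs = rope_inv_freq(128)                  # default base
    long_freqs = rope_inv_freq(128, theta=500000.0)  # hypothetical long-context base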

v0.3.1

4 months ago
  • Support function calling in the OpenAI API server
  • Faster repetition penalty sampling
  • Support the max_new_tokens generation option (example after this list)
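
As a usage illustration for the API items above, here is a hedged sketch of querying the OpenAI-compatible server with the official openai Python client. The base URL, model name, and API key are placeholder assumptions, and max_tokens is the standard OpenAI field that caps newly generated tokens, mirroring the max_new_tokens option:

    # Hedged sketch: querying a locally running chatglm.cpp OpenAI API server.
    # Assumes the server listens on http://127.0.0.1:8000/v1 and ignores the
    # model name and API key; all three are placeholders, not confirmed defaults.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")
    response = client.chat.completions.create(
        model="default",  # placeholder model name
        messages=[{"role": "user", "content": "Hi, who are you?"}],
        max_tokens=64,    # caps new tokens, mirroring max_new_tokens
    )
    print(response.choices[0].message.content)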

v0.3.0

6 months ago
  • Full ChatGLM3 functionality, including system prompts, function calls, and the code interpreter
  • Brand-new OpenAI-style chat API (see the sketch after this list)
  • Add token usage information to the OpenAI API server for compatibility with LangChain frontends
  • Fix conversion error for chatglm3-6b-32k
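
A minimal sketch of the new chat API through the Python binding, assuming a ChatGLM3 model already converted to GGML format; the model path is a placeholder:

    # Minimal sketch of the OpenAI-style chat API via the Python binding.
    # The model path is a placeholder for a converted ChatGLM3 GGML file.
    import chatglm_cpp

    pipeline = chatglm_cpp.Pipeline("./models/chatglm3-ggml.bin")
    messages = [
        chatglm_cpp.ChatMessage(role="system", content="You are a helpful assistant."),
        chatglm_cpp.ChatMessage(role="user", content="Hello!"),
    ]
    reply = pipeline.chat(messages)  # returns a ChatMessage
    print(reply.content)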

v0.2.10

6 months ago
  • Support ChatGLM3 in conversation mode.
  • Coming soon: a new prompt format for system messages and function calls.

v0.2.9

7 months ago
  • Support InternLM 7B & 20B model architectures

v0.2.8

7 months ago
  • Metal backend support for all models (ChatGLM & ChatGLM2 & Baichuan-7B & Baichuan-13B)
  • Fix GLM generation on CUDA for long context

v0.2.7

8 months ago
  • Support Baichuan-7B model architecture (works for both Baichuan v1 & v2).
  • Minor bug fixes and enhancements.

v0.2.6

8 months ago
  • Support Baichuan-13B on CPU & CUDA backends
  • Bug fixes for Windows and Metal

v0.2.5

9 months ago
  • Optimize context computation (GEMM) for the Metal backend
  • Support a repetition penalty option for generation (example after this list)
  • Update the Dockerfile for CPU & CUDA backends with full functionality; images hosted on GHCR
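
To illustrate the repetition penalty option, a hedged sketch through the Python binding, shown with the message-based API from later releases; the keyword name is assumed from this release note:

    # Sketch: passing a repetition penalty through the Python binding.
    # The keyword name is assumed; values > 1.0 down-weight tokens that
    # have already appeared, which reduces repetitive loops.
    import chatglm_cpp

    pipeline = chatglm_cpp.Pipeline("./models/chatglm2-ggml.bin")  # placeholder path
    reply = pipeline.chat(
        [chatglm_cpp.ChatMessage(role="user", content="Tell me a short story.")],
        repetition_penalty=1.1,
    )
    print(reply.content)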

v0.2.4

9 months ago
  • Python binding enhancement: support loading and converting directly from original Hugging Face models, so intermediate GGML model files are no longer necessary (see the sketch after this list).
  • Small fix for the CLI demo on Windows.
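
A sketch of the load-and-convert path, shown with the current message-based API: pass a Hugging Face model id (or a local checkpoint directory) straight to the pipeline and conversion happens on the fly. The repo id and dtype here are illustrative assumptions; dtype is presumed to accept the converter's quantization types such as "q4_0":

    # Sketch of load-and-convert directly from Hugging Face, per this
    # release note. Repo id and dtype are illustrative; dtype is assumed
    # to take the converter's quantization types (e.g. "q4_0", "f16").
    import chatglm_cpp

    pipeline = chatglm_cpp.Pipeline("THUDM/chatglm2-6b", dtype="q4_0")
    reply = pipeline.chat([chatglm_cpp.ChatMessage(role="user", content="Hello")])
    print(reply.content)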