Google JetStream Versions

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPU support is planned for the future; PRs welcome).

v0.2.1

1 month ago

Key Changes

  • Support Llama3 tokenizer
  • JetStream Tokenizer refactor
  • Disaggregation preparation work

Full Changelog: https://github.com/google/JetStream/compare/v0.2.0...v0.2.1

v0.2.0

2 months ago

Major Changes

  • Support JetStream MaxText inference on Cloud TPU VM
  • Support JetStream PyTorch inference on Cloud TPU VM
  • Support Continuous Batching with interleaved mode in JetStream
  • Support online serving benchmarking

Full Changelog: https://github.com/google/JetStream/commits/v0.2.0