A simple Python sandbox for helpful LLM data agents
Terrarium is a relatively low-latency, easy-to-use, and economical Python sandbox for executing untrusted user- or LLM-generated Python code. It is designed to be deployed as a Docker container, for example on GCP Cloud Run.
Calling the deployed Cloud Run service is straightforward: POST the code to run, along with an authorization bearer token (if so configured), as follows:
curl -X POST --url <name of your deployed gcp cloud run> \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
--no-buffer \
--data-raw '{"code": "1 + 1"}'
which returns:
{"output_files":[],"final_expression":2,"success":true,"std_out":"","std_err":"","code_runtime":16}
The identity token from gcloud auth print-identity-token expires after one hour and needs to be renewed.
See terrarium_client.py for an easy-to-use Python function that calls the service, including file input & output via base64-encoded files.
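If you just want a minimal call without the client, the curl request above translates directly to Python. The sketch below assumes the third-party requests package and fetches the identity token with the same gcloud command (so it inherits the one-hour expiry); the URL is a placeholder for your deployment:

import json
import subprocess
import requests

# Placeholder: substitute the URL of your deployed Cloud Run service.
TERRARIUM_URL = "https://<name of your deployed gcp cloud run>"

def run_code(code: str) -> dict:
    # Same token the curl example uses; it expires after one hour.
    token = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    response = requests.post(
        TERRARIUM_URL,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        data=json.dumps({"code": code}),
    )
    response.raise_for_status()
    return response.json()

print(run_code("1 + 1"))  # {'output_files': [], 'final_expression': 2, ...}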
The sandbox is composed of multiple layers:
1. Parse, compile, & execute the Python code inside a node.js process - via CPython compiled to WebAssembly, not running natively - with https://pyodide.org/en/stable/index.html. This approach restricts what the untrusted code can reach on the host.
2. Deploy the node.js host into a GCP Cloud Run container, which adds a further layer of isolation at the container and infrastructure level.
All packages bundled with Pyodide are supported out of the box - see https://pyodide.org/en/stable/usage/packages-in-pyodide.html for the full list - including, but not limited to, numpy, pandas, scipy, matplotlib, and scikit-learn.
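For illustration, the snippet below exercises one of the bundled packages (numpy); it is the Python source you would send in the "code" field of the request, with no installation step needed:

import numpy as np

# numpy ships with Pyodide, so the import needs no setup.
values = np.arange(10)
int(values.sum())  # last expression, returned as "final_expression": 45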
You need Node.js installed on your system. To install dependencies, run:
npm install
mkdir pyodide_cache
Run the server & function locally:
npm run dev
Execute code in the terrarium:
curl -X POST -H "Content-Type: application/json" \
--url http://localhost:8080 \
--data-raw '{"code": "1 + 1"}' \
--no-buffer
Run a set of test files (all .py files in /test) through the endpoint with:
python terrarium_client.py http://localhost:8080
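Anything a test file prints ends up in std_out, and its last expression is returned as final_expression, mirroring the response shown earlier. A minimal, hypothetical example you could drop into /test:

# test/smoke_test.py - hypothetical example test file
import math

print("computing pi squared...")  # captured in std_out
round(math.pi ** 2, 4)            # returned as final_expression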
To run in Docker:
Build:
docker build -t terrarium .
Run:
docker run -p 8080:8080 terrarium
Stop:
docker ps
to get the container ID, and then:
docker stop {container_id}
To speed up run time, allocate more resources and limit concurrency when deploying to Cloud Run:
gcloud run deploy <insert name of your deployment here> \
--region=us-central1 \
--source . \
--concurrency=1 \
--min-instances=3 \
--max-instances=100 \
--cpu=2 \
--memory=4Gi \
--no-cpu-throttling \
--cpu-boost \
--timeout=100
Pyodide currently runs on the node.js main process and can block node.js from responding. Pyodide recommends using a Worker if execution needs to be interruptible; however, the interface to Pyodide would then be through message passing, and that setup does not support matplotlib, among other libraries.
Example code that would trigger a timeout:
curl -m 110 -X POST <insert name of your deployment here> \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{
"code": "import time\ntime.sleep(200)"
}'
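It's worth bounding the wait on the client side as well, as the -m 110 flag does above. A Python equivalent sketch, with placeholder URL and token:

import requests

try:
    response = requests.post(
        "https://<insert name of your deployment here>",
        headers={"Authorization": "bearer <identity-token>"},
        json={"code": "import time\ntime.sleep(200)"},
        timeout=110,  # client-side cap, slightly above the 100s server timeout
    )
    print(response.status_code, response.text)
except requests.Timeout:
    print("request timed out on the client side")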
Cloud Run doesn't support the Dockerfile HEALTHCHECK instruction. Once the service is deployed for the first time, you need to export its service.yaml and add a liveness probe:
gcloud run services describe <insert name of your deployment here> --format export > service.yaml
Add livenessProbe after the image definition:
livenessProbe:
  failureThreshold: 1
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 100
  timeoutSeconds: 1
Then apply the change with:
gcloud run services replace service.yaml
This is only needed once per new Cloud Run service deployed.
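To sanity-check the probe target, you can hit the same /health route on a locally running server; this assumes the node.js server exposes the route referenced by the probe above:

import requests

# Assumes the server started with `npm run dev` is listening on localhost:8080.
response = requests.get("http://localhost:8080/health", timeout=5)
print(response.status_code)  # expect 200 when the service is healthy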
Docker itself doesn't appear to support auto-restarts based on HEALTHCHECK. The process with PID 1 seems protected and can't be killed from inside the container, so you would need to run a separate service such as https://github.com/willfarrell/docker-autoheal.
For large & complex computations we sometimes observe untraceable "RangeError: Maximum call stack size exceeded" exceptions in Pyodide.
See also: https://blog.pyodide.org/posts/function-pointer-cast-handling/