
Ethereum Co-founder Vitalik Buterin Details Ultra-Secure, Local AI System Amid Security Warnings

Buterin Rejects Cloud AI Services, Prioritizing Privacy and Self-Sovereignty

Ethereum co-founder Vitalik Buterin has fully transitioned away from cloud-based artificial intelligence services. In a recent technical blog post detailing his private setup, he outlined a fully local, sandboxed architecture for running large language models (LLMs). His decision stems from deep concerns about widespread security and privacy shortcomings in the current AI agent ecosystem.

Buterin noted that research indicates approximately 15% of available AI agent skills or plug-in tools contain potentially malicious instructions. He also cited findings from the security firm HiddenLayer demonstrating that processing a single compromised webpage could fully compromise an OpenClaw instance, enabling it to download and execute shell scripts without the user’s knowledge.

The Architecture of a Local AI System

Buterin described the resulting system as “self-sovereign, local, private, and secure.” For hardware, he chose a laptop equipped with an Nvidia 5090 GPU with 24 GB of video memory. Running the open-weights Qwen3.5:35B model from Alibaba via llama-server, the setup achieves a throughput of 90 tokens per second, a rate he deems suitable for comfortable daily use.

He compared this performance against other hardware options: an AMD Ryzen AI Max Pro unit with 128 GB of unified memory recorded 51 tokens per second, while the DGX Spark desktop supercomputer reached 60 tokens per second. Given its high cost and lower throughput relative to a capable laptop GPU, Buterin found the DGX Spark underwhelming.

For his operating system, he moved from Arch Linux to NixOS, which lets users define their entire computing environment in one centralized, declarative configuration file. To keep connectivity free of external cloud dependencies, he runs llama-server as a background daemon that exposes a local port for any connected application.
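An application talks to such a daemon over plain HTTP, since llama-server exposes an OpenAI-compatible chat-completions endpoint. The sketch below shows what a minimal local client might look like; the port (8080) and model name are assumptions, not details from Buterin’s post.

```python
import json
import urllib.request

# Assumed local endpoint; llama-server listens on the port passed
# via its --port flag (8080 here is a guess, not Buterin's setting).
LLAMA_SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_request(prompt: str, model: str = "local") -> bytes:
    """Build the JSON body for an OpenAI-style chat-completion call."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")

def ask_local_llm(prompt: str) -> str:
    """Send a prompt to the local daemon and return the reply text."""
    req = urllib.request.Request(
        LLAMA_SERVER_URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because everything stays on localhost, no prompt or reply ever leaves the machine.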

Implementing Security Measures and Controls

Central to Buterin’s security model is the practice of sandboxing. He utilizes bubblewrap to establish isolated environments from any given directory using a single command, ensuring that any processes running within these secure boundaries can only access explicitly permitted files or controlled network ports.
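The effect of such a wrapper can be sketched as a function that assembles a bubblewrap invocation: the system is mounted read-only, only the chosen directory is writable, and all namespaces (including network) are dropped unless explicitly re-shared. The exact flags Buterin uses are not published; this is an illustrative selection of standard bwrap options.

```python
import subprocess

def bwrap_command(workdir: str, allow_net: bool = False) -> list:
    """Build a bubblewrap invocation exposing only `workdir` read-write."""
    cmd = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",    # system files, read-only
        "--symlink", "usr/bin", "/bin",
        "--proc", "/proc",
        "--dev", "/dev",
        "--bind", workdir, workdir,     # the only writable path
        "--unshare-all",                # drop every namespace...
    ]
    if allow_net:
        cmd.append("--share-net")       # ...optionally re-sharing network
    cmd += ["--chdir", workdir]
    return cmd

def run_sandboxed(workdir: str, argv: list) -> int:
    """Run `argv` inside the sandbox (requires bubblewrap installed)."""
    return subprocess.run(bwrap_command(workdir) + argv).returncode
```

A process launched this way cannot read files outside the sandbox or open sockets unless the caller opted in.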

To improve messaging safety, he open-sourced a messaging daemon that wraps both signal-cli and email. The tool can read messages and send messages to himself without requiring confirmation, but any outgoing communication directed to an external third party requires explicit human authorization. Buterin termed this mechanism the “human + LLM 2-of-2” model and advised that similarly strict logic be applied when developing Ethereum wallet tools.
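The 2-of-2 rule reduces to a small decision function: messages to the owner’s own addresses pass automatically, while anything bound for a third party needs the human half of the signature. The address set and function name below are hypothetical, not taken from Buterin’s daemon.

```python
# Hypothetical set of the owner's own inboxes (not from the actual tool).
SELF_ADDRESSES = {"me@example.org"}

def may_send(recipient: str, human_approved: bool) -> bool:
    """'Human + LLM 2-of-2': the LLM may draft anything, but a message
    leaves the machine only if it goes to the owner themselves or a
    human has explicitly approved it."""
    if recipient in SELF_ADDRESSES:
        return True           # notes-to-self need no sign-off
    return human_approved     # any third party requires the human half
```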

On financial security, he advised development teams building AI-connected wallet utilities to set a daily autonomous transaction limit of $100. Any transaction exceeding that amount, or involving call data capable of exfiltrating information, must require human confirmation.
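That policy can be expressed as a simple pre-flight check the wallet tool runs before letting an agent act alone. Only the $100 figure comes from the post; the function signature and the running-total bookkeeping are an illustrative sketch.

```python
DAILY_AUTONOMOUS_LIMIT_USD = 100.0  # the cap Buterin recommends

def needs_human_confirmation(amount_usd: float,
                             spent_today_usd: float,
                             calldata_exfil_risk: bool) -> bool:
    """Return True when the agent must not transact on its own:
    either the call data could exfiltrate information, or this
    transaction would push today's autonomous spend past the cap."""
    if calldata_exfil_risk:
        return True
    return spent_today_usd + amount_usd > DAILY_AUTONOMOUS_LIMIT_USD
```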

Research and Remote Inference Protocols

For research applications, Buterin compared the Local Deep Research tool against his own setup, which combines the pi agent framework with SearXNG, a privacy-focused, self-hosted meta-search engine. He found that the pi-plus-SearXNG combination yielded superior-quality results.
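A self-hosted SearXNG instance can be queried programmatically via its JSON API (enabled through the `formats` setting in `settings.yml`), so an agent framework can search without any query leaving the local network. The base URL and port below are assumptions for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed address of a self-hosted SearXNG instance (port is a guess).
SEARXNG_URL = "http://127.0.0.1:8888/search"

def search_url(query: str) -> str:
    """Build a SearXNG JSON-API query URL."""
    return f"{SEARXNG_URL}?{urlencode({'q': query, 'format': 'json'})}"

def search(query: str) -> list:
    """Return SearXNG result dicts (title, url, content) for `query`."""
    with urllib.request.urlopen(search_url(query)) as resp:
        return json.load(resp)["results"]
```

An agent loop can then feed the returned titles and snippets straight into the local LLM.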

To reduce reliance on external searches, which he views as potential privacy leaks, he maintains a local Wikipedia dump of roughly 1 terabyte alongside technical documentation. He also published a local audio transcription daemon that can handle basic speech processing without a GPU, feeding the output to an LLM for refinement and summarization.

For cases where local models are insufficient, Buterin outlined privacy-preserving methods for remote inference, including his ZK-API proposal with researcher Davide, the Openanonymity project, and mixnets that prevent servers from linking IP addresses across successive requests. While he acknowledged trusted execution environments as a near-term way to reduce data leakage, he noted that fully homomorphic encryption remains too slow for practical cloud inference today.

Buterin concluded his post by cautioning readers that the article describes merely a beginning stage, not a final product, and warned against replicating his exact tools while assuming they are inherently secure.

