The defining infrastructure question of the next decade isn’t how to train larger AI models. It’s how to serve them. Today, we’re announcing our investment in VSORA. VSORA is building Europe’s only frontier-class AI inference chip — and we believe it’s one of the most promising deeptech opportunities on the continent.
The Rise of AI Inference
For much of the last three years, the AI story has been told through training: the multi-billion-dollar compute runs that teach models to write, reason, and create. A more economically significant shift is now underway. Inference — the act of running a trained model to serve a user — is rapidly overtaking training as the dominant workload in AI computing.
By 2026, inference is projected to account for roughly two-thirds of all AI compute spend, up from about a third in 2023. Unlike training, which is a one-time cost per model generation, inference scales continuously with adoption. OpenAI reportedly spent on the order of $5B on inference in 2025, with inference costs rising several-fold over the year. This is not an OpenAI-specific problem. It is a structural feature of the AI economy: usage growth compounds inference spend faster than efficiency gains can offset it.
The consequence is that cost-per-token has become the central procurement criterion for AI infrastructure, and it has exposed a fundamental limitation in the hardware that has powered the AI revolution to date.
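The compounding argument comes down to simple arithmetic. The sketch below illustrates it under hypothetical assumptions; the growth rates are invented and are not figures from this article or from OpenAI. If usage multiplies by a factor g each year while cost per token falls by a factor e, annual inference spend scales as (g/e)^n, so spend keeps rising whenever g > e.

```python
def project_spend(spend0: float, usage_growth: float, cost_decline: float, years: int) -> float:
    """Project annual inference spend under a toy compounding model.

    spend0: spend in the starting year (any currency unit)
    usage_growth: factor by which token volume multiplies each year (hypothetical)
    cost_decline: factor by which cost per token falls each year (hypothetical)
    """
    return spend0 * (usage_growth / cost_decline) ** years


# Hypothetical scenario: usage grows 4x per year, cost per token halves each year.
for year in range(4):
    print(year, project_spend(1.0, usage_growth=4.0, cost_decline=2.0, years=year))
```

In this toy scenario spend doubles every year despite a 2x annual efficiency gain. Spend only falls when `cost_decline` exceeds `usage_growth`.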
The Problem with GPUs for Inference
To understand why VSORA exists, it helps to understand why GPUs are poorly suited to inference workloads.
A GPU has two main components: a processor that handles computation, and memory that feeds the processor with data. GPUs were originally designed for graphics and later adapted for AI training, where they remain extraordinarily efficient. Training runs on large, regular batches of data at high throughput — exactly what GPUs were built to handle.
Inference is structurally different. When a GPU processes a query, data must travel through multiple layers of storage before reaching the compute units: from HBM global memory to the L2 cache (≈50MB on an H100), then to the L1 cache and shared memory, and finally into the registers used by the compute units themselves. Each transition introduces idle time. Industry analyses indicate that on inference workloads GPUs often realise only a fraction of their theoretical peak, frequently around 20%, with most of the rest lost to data movement.
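A common way to reason about this effect is the roofline model. In that model, attainable throughput is the lesser of two values: peak compute, or memory bandwidth multiplied by arithmetic intensity (FLOPs performed per byte moved). The model is a simplification that ignores the individual cache levels described above. The accelerator figures below are hypothetical round numbers, not measured H100 or VSORA values. The sketch shows why low-intensity workloads leave most of the compute idle. Token-by-token decoding at small batch sizes is one such workload.

```python
def roofline_utilization(peak_flops: float, mem_bandwidth: float, arithmetic_intensity: float) -> float:
    """Fraction of peak compute attainable under the roofline model.

    peak_flops: peak compute in FLOP/s
    mem_bandwidth: memory bandwidth in bytes/s
    arithmetic_intensity: FLOPs performed per byte moved from memory
    """
    attainable = min(peak_flops, mem_bandwidth * arithmetic_intensity)
    return attainable / peak_flops


# Hypothetical accelerator: 1,000 TFLOP/s peak compute, 3.35 TB/s memory bandwidth.
peak = 1000e12
bandwidth = 3.35e12
ridge_point = peak / bandwidth  # intensity at which the chip stops being bandwidth-bound

for intensity in (1, 64, 512):
    share = roofline_utilization(peak, bandwidth, intensity)
    print(f"intensity {intensity:>3} FLOP/byte -> {share:.1%} of peak")
```

Below the ridge point (`peak_flops / mem_bandwidth`), utilization is set by bandwidth and intensity rather than by peak compute. Adding more arithmetic units does not help there.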
The second constraint is memory capacity. Large language models hold their parameters in chip memory during operation. A model like Llama 405B requires approximately 810GB just to load its weights at 16-bit precision (two bytes per parameter). Today's leading accelerators typically offer 192GB per unit, meaning a single large model must be split across five or more chips, each one adding communication overhead, integration complexity, and cost.
This is what the semiconductor industry calls the memory wall. It is not a bug that can be patched. It follows directly from architectures that were never designed for this workload.
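The capacity arithmetic behind the memory wall can be checked directly. This is a minimal sketch that assumes 16-bit weights (two bytes per parameter) and counts weights only. A real deployment also needs memory for the KV cache, activations, and runtime buffers, so these chip counts are lower bounds.

```python
import math

GB = 1e9  # decimal gigabytes, matching the 810GB / 192GB figures in the text


def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights."""
    return num_params * bytes_per_param


def chips_needed(model_bytes: float, chip_bytes: float) -> int:
    """Minimum number of accelerators whose combined memory fits the weights."""
    return math.ceil(model_bytes / chip_bytes)


llama_405b = weight_bytes(405e9, bytes_per_param=2)  # 16-bit weights
print(f"{llama_405b / GB:.0f} GB of weights")
print(chips_needed(llama_405b, 192 * GB), "chips at 192 GB each")
```

Passing `288 * GB` instead of `192 * GB` returns 3, which is the chip count the article cites for a 288GB part.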
VSORA’s Architecture: From Automotive to Datacenters
VSORA was founded in 2015 by Khaled Maalej and the engineering team behind DiBcom, the French chip company that became a global leader in digital TV and mobile silicon before its acquisition by Parrot in 2011. The core team has worked together for over 25 years and completed 14 chip tape-outs across their careers — a genuinely unusual execution record at this stage.
VSORA’s early years were spent on chips for the automotive sector, where every wasted clock cycle has a real cost in latency and power. The architectural foundations the team refined in that domain turned out to be precisely what datacenter inference would soon require. Three years ago, the company reoriented entirely toward AI infrastructure, and the fit was strong.
The result is the Matrix Processing Unit (MPU). Instead of the GPU’s multi-level cache hierarchy, the MPU gives every arithmetic unit direct access to a single large block of Tightly Coupled Memory (TCM) that functions as a massive register bank. Data moves from external HBM into the TCM, and from there each processing unit can fetch it in a single clock cycle. The chain of transitions that throttles GPUs on inference simply isn’t there.
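The structural difference can be sketched as a count of hand-offs between storage levels. This is a deliberately simplified toy model, not a latency model of either chip. The level names follow the description above, and the per-transition stall cost is a hypothetical parameter, not a measured figure.

```python
from dataclasses import dataclass


@dataclass
class DataPath:
    """Ordered storage levels an operand passes through before reaching compute."""
    name: str
    levels: list[str]

    def transitions(self) -> int:
        # Each hand-off between adjacent levels is a point where compute can stall.
        return len(self.levels) - 1

    def stall_cycles(self, cycles_per_transition: int) -> int:
        # Toy model: assumes a fixed, hypothetical stall cost per hand-off.
        return self.transitions() * cycles_per_transition


gpu = DataPath("GPU", ["HBM", "L2 cache", "L1 / shared memory", "registers"])
# In the MPU, the TCM acts as the register bank, so compute reads it directly.
mpu = DataPath("MPU", ["HBM", "TCM (register bank)"])

for path in (gpu, mpu):
    print(path.name, path.transitions(), "transitions")
```

Real stall behaviour depends on bandwidth, access patterns, and how well transfers overlap with compute. The sketch only shows that the MPU path removes the intermediate cache hand-offs.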
Key specifications of the first chip, Jotunn8, include:
- Implementation on TSMC’s 5nm process node
- Advanced chiplet architecture design and integration
- 2.5D die-to-die connectivity using GUC’s 17.2 Gbps GLink-2.5D interconnect
- Advanced packaging implementation leveraging TSMC CoWoS-S technology (3x reticle size)
- Full-system co-optimization of signal, power, and thermal integrity (SI/PI/TI)
- Power and IR optimization to enhance overall system efficiency
- 3,200 TFLOPS of compute power with 288 GB of HBM3e memory
That last point matters more than it might appear. A model like Llama 405B fits on three Jotunn8 chips, versus five for a 192GB competitor. That means less inter-chip communication, fewer components, simpler datacenter integration, and a materially lower total cost of ownership. Memory capacity is not just a performance metric: it is a threshold condition for which models a chip can actually run.
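The communication point can be made concrete by counting chip boundaries. This minimal sketch uses the chip counts quoted above (five versus three) and tests two simplifying assumptions. In the first, every chip exchanges data with every other chip. In the second, the model's layers are split into a linear pipeline. Actual parallelism strategies and interconnect topologies vary.

```python
def all_to_all_links(num_chips: int) -> int:
    """Distinct chip pairs if every chip may exchange data with every other chip."""
    return num_chips * (num_chips - 1) // 2


def pipeline_boundaries(num_chips: int) -> int:
    """Chip-to-chip hand-offs per forward pass if layers form a linear pipeline."""
    return num_chips - 1


for label, chips in (("192GB competitor", 5), ("Jotunn8", 3)):
    print(f"{label}: {chips} chips, "
          f"{all_to_all_links(chips)} pairwise links, "
          f"{pipeline_boundaries(chips)} pipeline boundaries")
```

Under these assumptions, going from five chips to three cuts pairwise links from 10 to 3 and pipeline hand-offs from 4 to 2.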


The TSMC Partnership
Building a chip at this specification requires access to things that are not freely available. Leading-edge manufacturing at the 5nm and 2nm nodes is controlled by a handful of foundries. The advanced packaging technology needed to connect multiple HBM stacks to the processor die on a single substrate (CoWoS) is effectively a TSMC monopoly at datacenter scale. Without it, building a 288GB chip is simply not possible.
VSORA works closely with TSMC and its design ecosystem (including GUC, its silicon-implementation and advanced-packaging partner) and presented Jotunn8 at the 2026 Europe TSMC Symposium. In practice, this translates into privileged access to manufacturing capacity, process nodes, and packaging technology that a new entrant would need two to three years to qualify for, at best. It is one of the most meaningful structural advantages a chip company at this stage can have.
Jotunn8, VSORA’s first commercial chip, was taped out on TSMC’s 5nm node in October 2025, with initial silicon confirmed electrically alive in early 2026. A next-generation chip is already in development using state-of-the-art semiconductor technology.
A Non-US Alternative at Scale
The market for high-performance AI inference chips is, today, effectively a US-controlled duopoly. NVIDIA dominates; AMD is the only credible Western alternative. Both are subject to US export controls that restrict the sale of high-performance chips to a growing list of countries and regions.
For non-US cloud providers, national AI programmes, and enterprises operating in restricted jurisdictions, this creates a genuine supply risk. The European Union, several Gulf states, Japan, and others have articulated explicit goals around AI infrastructure independence, and the investment plans backing those goals are not symbolic.
On the current evidence, VSORA is the only European company operating in the same performance tier, and one of a small handful worldwide outside the US and China. Its software stack supports standard AI frameworks natively without requiring CUDA. Its TSMC partnership gives it access to the same manufacturing capabilities as the incumbents.
The Team
What ultimately convinced us is the team — and it is unlike most of what we see in European deep tech.
Chip companies are not built on good ideas. They are built on decades of accumulated knowledge, hard-won manufacturing experience, and teams that have spent years solving hard problems together. At VSORA, that foundation runs unusually deep.
Khaled Maalej, VSORA’s CEO, is an École Polytechnique and Télécom Paris engineer who spent most of his career as CTO of DiBcom, where he built chips that ended up in hundreds of millions of devices worldwide. He is not a first-time founder learning chip design on the job. He has done this before, repeatedly, with the same people, and it has worked: the team has 14 successful tape-outs to its name, a record that reflects a level of engineering discipline and institutional knowledge that cannot be assembled quickly.
Alongside him, Trung Dung Nguyen, Julien Schmitt and Pierre-Emmanuel Bernard lead the hardware and software organizations. They have worked with Khaled for 18 years, through DiBcom, Parrot, and now VSORA. They are not a team that recently decided to build a chip company. They are a group that has spent nearly two decades refining a shared technical intuition, and it shows in the architecture they have built.
On the board, VSORA recently recruited Sandra Rivera as Chair. Sandra was EVP and General Manager of Intel's Data Center and AI Group before becoming CEO of Altera. She is actively helping shape the commercial build-out, drawing on deep relationships across the major datacenter buyers, built over two decades at Intel. She brings the kind of direct access that a European hardware startup could not otherwise credibly claim.
Why Now
What drew us to VSORA is a combination of factors that rarely align: a team with a track record of delivering complex chips, successful tape-out of a frontier-class inference processor, a structural market need compounding in real time, and a manufacturing moat that takes years to replicate.
The AI infrastructure layer of the next decade is being defined now. Inference economics will determine who can build, deploy, and scale AI products — and where that compute lives will determine which regions retain technological sovereignty. We believe VSORA has the team, the architecture, and the partnerships to be one of the small handful of companies that defines this layer.
We are proud to back them.
Article written by Guilhem de Vregille, Partner, & Julie Forel, former Senior Associate at XAnge.


