From Kernel WASM to User-Space Policy Evaluation: Lessons Learned at Riptides

Written by

Nandor Kracser

Published on

October 6, 2025

From Kernel WASM to User-Space Policy Evaluation: Lessons Learned at Riptides

Introduction

At Riptides, we're building a platform that provides seamless kernel-based non-human identity (NHI) with SPIFFE and kTLS, delivering deep socket-level security and real-time policy enforcement inside the Linux kernel. One of our core challenges has been determining the optimal architecture for policy evaluation - specifically, where and how to run Open Policy Agent (OPA) policies that govern socket connections in real-time.

This is the story of our journey from an ambitious kernel-space WASM implementation to a pragmatic user-space solution, and the hard-earned lessons we learned along the way.

The Initial Vision: WASM in the Kernel

When we first architected the Riptides platform, we had a bold vision: evaluate OPA policies directly in kernel space using a WebAssembly runtime. We thought this could work as an eBPF alternative of sorts, providing a more flexible and powerful way to run policy logic in kernel space. The reasoning seemed sound:

Ultra-low latency: Policy evaluation would happen at the kernel level without context switches
Isolation: WASM provides sandboxing for untrusted policy code
Flexibility: Policies could be updated without kernel module changes
Performance: No serialization overhead between kernel and userspace
eBPF alternative: More expressive than eBPF for complex policy logic while maintaining kernel-space execution

To make this vision reality, we forked the wasm3 WebAssembly runtime and began the challenging work of porting it to run in Linux kernel space. For those interested in the technical details, our kernel port is available at github.com/riptideslabs/wasm3-kernel. Also, this wasn't the first Wasm kernel journey in history, see: https://github.com/wasmerio/kernel-wasm

The Technical Journey

Porting wasm3 to Kernel Space

The wasm3 runtime was an attractive choice for kernel porting because:

It's designed to be lightweight and embeddable
Uses an interpreter rather than JIT compilation (safer and simpler in kernel context)
Has minimal external dependencies
Relatively small codebase

However, porting any userspace runtime to kernel space presents significant challenges, memory management seemed like an easy one:

// Example of kernel-space WASM memory management
static void* wasm_kernel_malloc(size_t size) {
    return kmalloc(size, GFP_KERNEL);
}

static void wasm_kernel_free(void* ptr) {
    kfree(ptr);
}

// Replaced standard library functions with kernel equivalents
#define malloc(size) wasm_kernel_malloc(size)
#define free(ptr) wasm_kernel_free(ptr)

‍

We had to:

Replace all standard library calls with kernel equivalents
Implement custom memory management for WASM linear memory
Handle stack overflow protection in kernel context
Ensure atomic operations were kernel-safe
Remove floating-point operations (not allowed in some kernel contexts)

The floating-point removal was particularly challenging. We had to modify wasm3 to handle the absence of floating-point operations, but many OPA-compiled WASM modules contained floating-point operations for numeric computations. While OPA's WASM compilation generally produces efficient code, certain Rego built-in functions like mathematical operations, JSON number parsing, or comparison functions could generate floating-point instructions. This incompatibility between our floating-point-free kernel environment and the OPA WASM compilation output became a significant source of policy evaluation failures, especially for policies that performed numeric operations or worked with JSON data containing decimal numbers.

Integrating OPA Policies

Once we had wasm3 running in the kernel, we integrated it with OPA policy evaluation. OPA provides the ability to compile Rego policies into WASM modules using opa build -t wasm, which creates standalone WASM executables that can be loaded into any WASM runtime:

// Simplified policy evaluation in kernel module
static int evaluate_socket_policy(riptides_socket *s, const char *policy_wasm_data, size_t wasm_size) {
    wasm3_runtime *runtime = get_wasm_runtime();
    
    // Load OPA-compiled policy WASM module
    wasm3_module *policy_module = wasm3_load_module(runtime, policy_wasm_data, wasm_size);
    
    // Prepare socket context as JSON input
    char input_json[512];
    snprintf(input_json, sizeof(input_json), 
        "{"
        "\"source_ip\":\"%s\","
        "\"destination_ip\":\"%s\","  
        "\"destination_port\":%d,"
        "\"process_name\":\"%s\""
        "}", 
        s->src_addr, s->dst_addr, s->dst_port, current->comm);
    
    // Allocate memory in WASM module and copy input
    uint32_t input_ptr = wasm3_call_function(policy_module, "opa_malloc", strlen(input_json) + 1);
    memcpy(wasm3_get_memory(runtime, input_ptr), input_json, strlen(input_json) + 1);
    
    // Parse JSON input in WASM module  
    uint32_t parsed_input = wasm3_call_function(policy_module, "opa_json_parse", input_ptr, strlen(input_json));
    
    // Create evaluation context and set input
    uint32_t ctx = wasm3_call_function(policy_module, "opa_eval_ctx_new");
    wasm3_call_function(policy_module, "opa_eval_ctx_set_input", ctx, parsed_input);
    
    // Evaluate policy
    wasm3_call_function(policy_module, "eval", ctx);
    
    // Get result
    uint32_t result_addr = wasm3_call_function(policy_module, "opa_eval_ctx_get_result", ctx);
    uint32_t result_json = wasm3_call_function(policy_module, "opa_json_dump", result_addr);
    
    // Parse result - OPA returns [{"result": true/false}] for allow/deny policies
    const char *result_str = wasm3_get_memory(runtime, result_json);
    int decision = (strstr(result_str, "\"result\":true") != NULL) ? POLICY_ALLOW : POLICY_DENY;
    
    // Cleanup
    wasm3_call_function(policy_module, "opa_free", input_ptr);
    
    return decision;
}

Early Success

Initially, the system worked beautifully! The OPA WASM compilation workflow was straightforward:

# Compile Rego policy to WASM
opa build -t wasm -e riptides/socket/allow policy.rego

# Load the resulting bundle into our kernel module
echo "load $(base64 policy.wasm)" > /dev/riptides

‍

We could:

Compile standard Rego policies to WASM using opa build -t wasm
Load the compiled WASM modules into the kernel via our device driver
Evaluate policies for socket connections with microsecond latency
Update policies by simply recompiling and reloading WASM modules
Maintain strong isolation between policy code and kernel code

The performance numbers were impressive - policy evaluation took less than 10 microseconds in most cases, and the WASM modules were compact (typically 50-200KB for complex policies).

Initial architecture with WASM running OPA in the Linux kernel

The Reality Check: When WASM in Kernel Goes Wrong

However, as we moved from proof-of-concept to production-ready code, we encountered a series of challenges that ultimately led us to reconsider our approach.

Memory Management Nightmares

WASM linear memory management in kernel space proved problematic:

// This became a source of kernel panics
static int wasm_grow_memory(wasm3_runtime *runtime, uint32_t pages) {
    size_t new_size = pages * WASM_PAGE_SIZE;
    
    // vmalloc for large allocations, but this can fail under memory pressure
    void *new_memory = wasm_kernel_malloc(new_size);
    if (!new_memory) {
        return -ENOMEM; // This could panic the kernel
    }
    
    // Copy existing memory... potential for corruption here
    memcpy(new_memory, runtime->memory, runtime->memory_size);
    
    wasm_kernel_free(runtime->memory);
    runtime->memory = new_memory;
    runtime->memory_size = new_size;
    
    return 0;
}

Issues we encountered:

Memory fragmentation: Large WASM linear memory allocations could fragment kernel memory
OOM conditions: Policy evaluation could trigger out-of-memory conditions that panic the kernel
Memory leaks: Complex WASM module lifecycle management led to subtle memory leaks
Stack overflow: Recursive policy evaluation could overflow kernel stacks

Debugging Complexity

Debugging WASM execution in kernel space was extremely challenging:

No standard debugging tools worked
Kernel panics provided minimal context about WASM execution state
Policy bugs could crash the entire system
Difficult to distinguish between wasm3 bugs, our porting bugs, and policy bugs

Security Concerns

While WASM provides isolation, running it in kernel space introduced new attack vectors:

Bugs in the WASM runtime could compromise kernel security
Poorly written policies could potentially exploit kernel interfaces
The attack surface of the kernel was significantly increased
Fuzzing and security testing became much more complex

Maintenance Burden

Keeping our wasm3 fork in sync with upstream while maintaining kernel compatibility proved unsustainable:

Every wasm3 update required careful porting work
Kernel API changes affected our runtime integration
Supporting multiple kernel versions became a nightmare
The codebase became increasingly divergent from upstream wasm3
Upstream wasm3 became basically unmaintained during the time

The Pivot: Moving to User-Space

After months of fighting these issues, we made the difficult decision to move policy evaluation out of the kernel and into our Go-based agent process.

New Architecture

The new architecture looks like this:

Using native OPA Go library in user-space via Protobuf

Protocol Buffer Communication

We use protocol buffers with nanopb for efficient kernel-userspace communication. The idea of using protobuf for kernel-userspace communication isn't entirely new - NetBSD explored this concept and demonstrated its feasibility in their 2015 EuroBSDCon presentation, showing that structured serialization protocols can work well in kernel contexts.

// policy_request.proto
message PolicyEvaluationRequest {
  string source_ip = 1;
  string destination_ip = 2;
  uint32 destination_port = 3;
  string process_name = 4;
  string hostname = 5;
  map<string, string> labels = 6;
}

message PolicyEvaluationResponse {
  enum Decision {
    DENY = 0;
    ALLOW = 1;
  }
  Decision decision = 1;
  string reason = 2;
  uint32 cache_ttl = 3;
}

Kernel-Side Implementation

// Kernel module policy evaluation
static int evaluate_connection_policy(riptides_socket *s) {
    // Check cache first
    cached_decision *cached = lookup_policy_cache(s);
    if (cached && !cache_expired(cached)) {
        return cached->decision;
    }
    
    // Prepare protobuf request
    PolicyEvaluationRequest request = PolicyEvaluationRequest_init_zero;
    encode_socket_context(s, &request);
    
    // Send to user-space agent via device file
    PolicyEvaluationResponse response;
    int ret = send_policy_request(&request, &response);
    
    if (ret == 0) {
        // Cache the result
        cache_policy_decision(s, &response);
        return response.decision;
    }
    
    // Default to deny on communication failure
    return POLICY_DENY;
}

User-Space Agent Implementation

// Go agent policy evaluation
func (a *Agent) EvaluatePolicyRequest(req *PolicyEvaluationRequest) *PolicyEvaluationResponse {
    // Prepare OPA input
    input := map[string]interface{}{
        "source_ip":        req.SourceIp,
        "destination_ip":   req.DestinationIp, 
        "destination_port": req.DestinationPort,
        "process_name":     req.ProcessName,
        "hostname":         req.Hostname,
        "labels":           req.Labels,
    }
    
    // Evaluate against OPA policies
    results, err := a.opa.Query(context.Background(), rego.Query{
        Query: "data.riptides.socket.allow",
        Input: input,
    })
    
    if err != nil {
        log.Errorf("Policy evaluation failed: %v", err)
        return &PolicyEvaluationResponse{
            Decision: PolicyEvaluationResponse_DENY,
            Reason:   "evaluation_error",
        }
    }
    
    // Process results
    if len(results) > 0 && results[0].Expressions[0].Value == true {
        return &PolicyEvaluationResponse{
            Decision: PolicyEvaluationResponse_ALLOW,
            CacheTtl: 300, // 5 minutes
        }
    }
    
    return &PolicyEvaluationResponse{
        Decision: PolicyEvaluationResponse_DENY,
        Reason:   "policy_denied",
    }
}

The Benefits of the New Architecture

Reliability and Stability

Moving policy evaluation to user-space immediately improved system stability:

No more kernel panics: Policy bugs can't crash the kernel
Better error handling: Graceful degradation when policies fail
Easier debugging: Standard Go debugging tools work perfectly
Improved testing: Unit tests, integration tests, and fuzzing all become straightforward

Maintainability

The new architecture is much more maintainable:

Standard OPA: No custom WASM runtime to maintain
Pure Go: Leverages Go's excellent ecosystem and tooling
Simpler deployment: Policy updates don't require kernel module changes
Better observability: Rich metrics and logging capabilities
Reduced complexity: Previously, we needed to augment processes in kernel space, which required additional message passing to user-space for policy evaluation. Now all process context gathering and policy evaluation happens in the agent, eliminating this complexity

Performance Characteristics

While we lost the ultra-low latency of kernel-space evaluation, the performance is still excellent:

Sub-millisecond evaluation: Most policies evaluate in 200-500 microseconds
Efficient caching: Results are cached in kernel space for subsequent connections
Batching optimization: Multiple policy requests can be batched together
Async evaluation: Non-blocking policy evaluation for better throughput

Security Improvements

The security posture actually improved:

Reduced attack surface: Kernel module is much simpler and focused
Principle of least privilege: Policy evaluation runs in user-space with limited privileges
Better isolation: Policies are isolated from kernel and from each other
Easier auditing: Policy changes are easier to review and audit

Performance Characteristics

The performance comparison revealed interesting insights:

Kernel WASM (wasm3 interpreter):

Lower latency due to no context switching
But interpreted execution was slower than expected
Memory overhead from WASM linear memory allocation
Complex garbage collection in kernel space

User-space OPA (compiled Go binary):

Higher latency due to kernel-userspace communication
But much faster policy execution due to compiled code
More efficient memory usage
Better CPU utilization

Surprisingly, the compiled OPA evaluation in Go was significantly faster than the interpreted WASM execution, even accounting for the overhead of kernel-userspace communication. The user-space approach provides more than adequate performance for real-world workloads, especially with effective caching strategies.

Lessons Learned

1. Complexity Has a Cost

Running WASM in kernel space was technically impressive, but the complexity cost was enormous. The maintenance burden, debugging difficulty, and stability issues far outweighed the performance benefits.

2. The Kernel Should Stay Simple

Kernels should focus on what they do best: resource management, scheduling, and hardware abstraction. Complex business logic like policy evaluation is better suited for user-space.

3. Performance Isn't Everything

While the kernel WASM approach was faster, the user-space approach provides better overall system characteristics: reliability, maintainability, debuggability, and security.

4. Caching Changes Everything

Effective caching in the kernel module means that the higher latency of user-space evaluation only affects the first request. Subsequent requests to the same destination are served from cache with microsecond latency.

5. Protobuf + nanopb Works Great

The combination of protobuf for schema definition and nanopb for efficient kernel-space encoding/decoding provides an excellent balance of performance and maintainability.

Living with the New Architecture

Six months into production with our user-space architecture, we're genuinely happy with the decision. The system feels solid in ways that our kernel WASM implementation never quite achieved.

From an operational perspective, policy deployments have become trivial. When we need to update a policy, we simply push new Rego files to our policy repository, and OPA picks them up automatically. No kernel module recompilation, no system restarts, no careful coordination between kernel and user-space components. Our security team can iterate on policies independently, testing them in development environments before rolling them out to production.

The observability story has also dramatically improved. We now have rich metrics showing policy evaluation times, decision distributions, and error rates. When a policy behaves unexpectedly, we can trace through the decision logic with standard debugging tools rather than trying to decipher kernel logs. Our compliance team particularly appreciates the detailed audit trails - every policy decision is logged with full context, making security reviews straightforward.

Perhaps most importantly, the system degrades gracefully under load or when things go wrong. If our agent process crashes or becomes unresponsive, the kernel module falls back to cached decisions or configurable default behaviors. With the WASM approach, a runtime error could potentially take down the entire kernel module.

The development velocity gains have been substantial as well. New team members can contribute to policy logic using familiar Go tooling and testing frameworks. We can run comprehensive test suites against our policy logic, something that was nearly impossible with the kernel WASM approach. Policy authors and kernel developers can work independently, which has eliminated many coordination bottlenecks.

Looking ahead, we're exploring some interesting optimizations enabled by the user-space approach. We're experimenting with machine learning models to predict policy decisions before they're requested, pre-warming caches for better performance. We're also investigating batch policy evaluation for workloads with predictable connection patterns. These kinds of sophisticated optimizations would have been extremely difficult to implement safely in kernel space.

Conclusion

The journey from kernel-space WASM to user-space OPA evaluation taught us that impressive technical achievements don't always make the best engineering decisions. While running WASM in the kernel was a fascinating technical challenge, the user-space approach provides better overall system characteristics for a production security platform.

Key takeaways for anyone considering similar architectural decisions:

Measure total cost of ownership, not just raw performance
Prioritize reliability and maintainability over theoretical performance gains
Use the right tool for the job - kernels for kernel tasks, user-space for complex logic
Design for operations - consider debugging, monitoring, and updates from day one
Caching can bridge performance gaps between different architectural approaches

The Riptides platform is now more reliable, maintainable, and secure than ever before. Sometimes the best technical decision is to choose the boring, well-understood solution over the exciting, cutting-edge one.

Interested in how our Linux kernel journey unfolded and what we learned along the way?

If you enjoyed this post, follow us on LinkedIn and X for more updates. If you’d like to see Riptides in action, get in touch with us for a demo.

‍

Share this post

Ready to replace secrets
with trusted identities?

Build with trust at the core.

Get Early Access

Join a Research Call

From Kernel WASM to User-Space Policy Evaluation: Lessons Learned at Riptides

From Kernel WASM to User-Space Policy Evaluation: Lessons Learned at Riptides

Introduction

The Initial Vision: WASM in the Kernel

The Technical Journey

Porting wasm3 to Kernel Space

Integrating OPA Policies

Early Success

The Reality Check: When WASM in Kernel Goes Wrong

Memory Management Nightmares

Debugging Complexity

Security Concerns

Maintenance Burden

The Pivot: Moving to User-Space

New Architecture

Protocol Buffer Communication

Kernel-Side Implementation

User-Space Agent Implementation

The Benefits of the New Architecture

Reliability and Stability

Maintainability

Performance Characteristics

Security Improvements

Performance Characteristics

Lessons Learned

1. Complexity Has a Cost

2. The Kernel Should Stay Simple

3. Performance Isn't Everything

4. Caching Changes Everything

5. Protobuf + nanopb Works Great

Living with the New Architecture

Conclusion

Ready to replace secrets with trusted identities?

Ready to replace secrets
with trusted identities?