RuView: How WiFi Signals Can See Through Walls — A Deep Dive Into the Edge AI System That Turns Radio Waves Into Human Pose
An open-source Rust project reconstructs human body pose, breathing rate, and heart rate from ordinary WiFi signals — no cameras, no cloud, and far fewer privacy concerns than video. We performed a deep code audit to understand how it actually works.
Key Takeaways
- RuView implements WiFi DensePose — a technique from Carnegie Mellon University research — entirely in Rust with zero ML framework dependencies
- The system uses a Graph Transformer architecture that combines cross-attention with Graph Convolutional Networks on the COCO skeleton to estimate 17 body keypoints from WiFi Channel State Information
- A self-supervised contrastive learning pipeline (SimCLR-style) enables the model to learn meaningful signal representations without labeled data
- The SONA module combines LoRA adapters with Elastic Weight Consolidation to adapt to new rooms without catastrophic forgetting
- Designed for ESP32 microcontrollers, making it deployable on hardware costing under $10
Imagine knowing exactly where people are standing in your home, how they are breathing, and whether they have fallen — without a single camera, microphone, or wearable device. Just your ordinary WiFi router emitting signals it was already transmitting.
This is not science fiction. A 2022 Carnegie Mellon University paper titled <em>DensePose From WiFi</em> [1] demonstrated that WiFi signals reflecting off the human body carry enough information to reconstruct a full body pose — with accuracy comparable to camera-based systems. Now, an open-source project called <strong>RuView</strong> [6] has taken that research and turned it into a complete, deployable edge AI system written in Rust, designed to run on microcontrollers costing less than $10.
We performed a comprehensive code audit of RuView using Code Indexer, a semantic code search engine, analyzing all 1,139 source files — 45,017 code chunks, 18,078 symbols, and 78,311 cross-references — to understand how this system actually works under the hood. Here is what we found.
The Physics: Why WiFi Can "See" You
Every WiFi router continuously transmits radio waves at 2.4 GHz or 5 GHz. When these waves encounter a human body, they scatter, reflect, and attenuate in predictable ways depending on body position. Modern WiFi chipsets report this interaction through a measurement called <strong>Channel State Information (CSI)</strong> — a matrix of complex numbers describing the amplitude and phase of signals across multiple subcarrier frequencies and antenna pairs.
A person raising their arm creates a different CSI pattern than someone sitting down. Chest expansion during breathing creates subtle, rhythmic variations. Even heartbeats produce detectable micro-movements. The challenge is extracting meaningful information from this noisy, high-dimensional signal — and that is where deep learning enters the picture.
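To make the CSI measurement concrete, here is a minimal sketch of how one CSI frame might be represented. The type and field names (`CsiSample`, `CsiFrame`) are illustrative assumptions, not RuView's actual API; what matters is that each subcarrier/antenna-pair cell is a complex number carrying amplitude and phase.

```rust
/// One complex CSI sample: the channel response of a single subcarrier.
#[derive(Clone, Copy, Debug)]
pub struct CsiSample {
    pub re: f32,
    pub im: f32,
}

impl CsiSample {
    /// Signal amplitude |h| for this subcarrier.
    pub fn amplitude(&self) -> f32 {
        (self.re * self.re + self.im * self.im).sqrt()
    }
    /// Signal phase arg(h) in radians.
    pub fn phase(&self) -> f32 {
        self.im.atan2(self.re)
    }
}

/// One CSI frame: a matrix of samples over antenna pairs x subcarriers.
pub struct CsiFrame {
    pub antenna_pairs: usize,
    pub subcarriers: usize,
    /// Row-major layout: samples[pair * subcarriers + subcarrier].
    pub samples: Vec<CsiSample>,
}
```

Body movement perturbs both the amplitude and the phase of these samples, which is the raw material the rest of the pipeline works with.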
From Signal to Skeleton: The AI Pipeline
RuView's architecture follows a four-stage pipeline that transforms raw WiFi measurements into 3D body keypoints. What makes it remarkable is that the entire machine learning stack — including linear algebra, neural network layers, optimizers, and loss functions — is implemented in pure Rust with zero external ML dependencies. No PyTorch, no TensorFlow, no ONNX Runtime.
Stage 1: Signal Preprocessing
Raw CSI data is inherently noisy. RuView's <code>CsiProcessor</code> module applies three cleaning steps: <strong>noise removal</strong> by filtering amplitude values below a configurable decibel threshold, <strong>Hamming windowing</strong> to reduce spectral leakage at frame boundaries, and <strong>unit-variance normalization</strong> so that signals from different hardware are comparable. The processor maintains a sliding history window for temporal smoothing using exponential moving averages.
Stage 2: The Graph Transformer
This is the architectural centerpiece. RuView implements a hybrid model called <code>CsiToPoseTransformer</code> that combines two powerful ideas from deep learning research: <strong>Transformers</strong> (the attention mechanism behind GPT and similar models) and <strong>Graph Neural Networks</strong> (networks that operate on structured graph data).
The process works as follows. First, CSI features from each antenna pair are projected into a high-dimensional embedding space through a learned linear transformation. Next, 17 <em>learnable query vectors</em> — one for each body keypoint in the COCO format (nose, eyes, shoulders, elbows, wrists, hips, knees, ankles) — attend to the CSI embeddings through <strong>cross-attention</strong>. This mechanism allows each keypoint to dynamically focus on the WiFi signals most relevant to its position, similar to the learnable object queries in the DETR object detector.
The attended features then pass through a <strong>Graph Convolutional Network (GCN)</strong> that operates on the human skeleton topology — a graph where joints are nodes and bones are edges. This step enforces anatomical constraints: an elbow must be connected to a shoulder, a knee to a hip. The GCN uses normalized adjacency matrices with self-loops, applying the classic message-passing formula where each node aggregates features from its neighbors.
Finally, separate regression heads predict 3D coordinates (x, y, z) and a confidence score for each keypoint. The result is a full 17-point skeleton reconstructed entirely from WiFi signals.
Stage 3: DensePose — Beyond Keypoints
While keypoints give you a skeleton, <strong>DensePose</strong> goes further — it maps every point on the body surface to a UV coordinate system across 24 anatomical regions (head, torso, upper arms, forearms, hands, thighs, calves, feet). RuView includes a dedicated <code>DensePoseHead</code> module with two branches: a <strong>segmentation branch</strong> that classifies which body part each spatial location belongs to, and a <strong>UV regression branch</strong> that predicts continuous surface coordinates. This enables applications like detailed gesture recognition and body surface tracking.
Stage 4: Vital Sign Extraction
Beyond pose estimation, RuView extracts physiological signals from the same WiFi data. Breathing creates periodic chest expansion (0.2–0.5 Hz), and heartbeats produce micro-vibrations (0.8–2.0 Hz). The system uses bandpass filtering and spectral analysis on CSI time series to isolate these rhythmic components. A dedicated vital signs classifier then validates whether the detected frequencies correspond to genuine physiological signals or environmental artifacts like a ceiling fan.
Training Without Labels: Self-Supervised Learning
One of the most sophisticated aspects of RuView is its ability to learn meaningful signal representations <em>without any labeled training data</em>. Setting up ground truth for WiFi pose estimation is expensive — it requires synchronized camera systems capturing poses while WiFi records CSI. RuView addresses this through a <strong>SimCLR-style contrastive learning</strong> pipeline [2].
The approach works by creating two different augmented views of the same CSI window — for example, one with added Gaussian noise and temporal jitter, another with subcarrier masking and phase rotation. Both views pass through the transformer backbone and a projection head (a two-layer neural network that maps features to a 128-dimensional embedding space). The <strong>InfoNCE loss</strong> then trains the model to recognize that these two views came from the same original signal while distinguishing them from views of different signals.
This process creates a rich, general-purpose representation of WiFi signals that captures meaningful structure — which rooms look similar, which movements create similar patterns — before any supervised fine-tuning with actual pose labels. The concept is analogous to how GPT learns language structure during pretraining before being fine-tuned for specific tasks.
SONA: Adapting to New Environments Without Forgetting
WiFi signals vary dramatically between environments. A model trained in a concrete office will perform poorly in a carpeted apartment because walls, furniture, and room geometry all affect signal propagation. Retraining from scratch for every new room is impractical. RuView solves this with <strong>SONA (Self-Organizing Neural Architecture)</strong> — an online adaptation framework that combines two techniques from recent ML research.
The first technique is <strong>LoRA (Low-Rank Adaptation)</strong> [3] — originally developed for fine-tuning large language models. Instead of modifying all model parameters, LoRA learns a small, factorized delta: a product of two low-rank matrices (typically rank 4) scaled by a constant. This means adapting to a new room requires updating only a few thousand parameters instead of the full model, making it feasible on microcontrollers.
The second technique is <strong>Elastic Weight Consolidation (EWC)</strong> [5] — a regularization method from DeepMind that prevents <em>catastrophic forgetting</em>. When the model adapts to a new environment, EWC adds a penalty term that discourages large changes to parameters that were important for previous environments. The penalty is proportional to the Fisher Information Matrix, which measures how sensitive the model's output is to each parameter.
Together, LoRA + EWC enable RuView to quickly adapt to a new room (within minutes of operation) while retaining what it learned about previous environments — a capability that Person-in-WiFi-3D [4] and other recent WiFi sensing systems are only beginning to explore.
The Hardware Story: $10 Perception
RuView targets ESP32-S3 microcontrollers — chips that cost under $5, consume milliwatts of power, and support CSI data extraction. A mesh of three or more ESP32 nodes covers a typical room. The firmware, written in C, streams CSI frames over QUIC transport to a local hub or processes them directly on-device through WASM (WebAssembly) modules.
The project also includes a Tauri-based desktop application with a React frontend for visualization, and a complete REST API server for integration with smart home systems. Everything runs locally — no data leaves the premises.
| Component | Technology | Purpose |
|---|---|---|
| Signal Processing | Rust (CsiProcessor) | Noise removal, windowing, normalization |
| AI Core | Rust (CsiToPoseTransformer) | Cross-attention + GCN pose estimation |
| DensePose | Rust (DensePoseHead) | Body part segmentation and UV mapping |
| Self-Supervised Learning | Rust (SimCLR + InfoNCE) | Representation learning without labels |
| Online Adaptation | Rust (SONA: LoRA + EWC) | Environment-specific fine-tuning |
| Hardware | ESP32-S3 (C firmware) | CSI data collection at 20 Hz |
| Desktop App | Tauri + React | Real-time pose visualization |
| Edge Modules | WebAssembly | On-device gesture/gait/seizure detection |
Code Quality: What the Audit Revealed
Our automated audit scored the project 44/100 — a grade that is somewhat misleading. The low score is primarily driven by bundled vendor files (React DOM alone has functions with cyclomatic complexity exceeding 17,000) and generated JSON metadata from Claude Flow orchestration. The actual Rust source code is well-structured with clean module boundaries.
Notable findings include: test coverage at 11% (adequate for a research project but insufficient for production), 30 magic numbers in signal processing code that would benefit from named constants, and a 3,800-line <code>main.rs</code> that serves as the monolithic server entry point — a classic coupling antipattern that the crate system partially mitigates. Documentation is sparse in formal docstrings but compensated by 20+ Architecture Decision Records (ADRs) that explain design rationale.
Why This Matters
WiFi-based sensing is not merely an academic curiosity. The IEEE 802.11bf task group is actively standardizing WLAN sensing capabilities, signaling industry-wide recognition that WiFi infrastructure can serve dual purposes — connectivity and perception. RuView demonstrates what a complete, from-signal-to-skeleton implementation looks like in practice.
The privacy implications are significant. Unlike cameras, WiFi sensing produces abstract signal data that cannot be trivially converted to identifiable images. It works through walls and in complete darkness. It costs orders of magnitude less than LiDAR or radar. For elderly care, fall detection, and smart home automation, this technology offers a compelling balance between capability and privacy.
The full source code is available at <a href="https://github.com/ruvnet/RuView">github.com/ruvnet/RuView</a>. This analysis was conducted using Code Indexer, a semantic code search and audit engine, which indexed the entire 1,139-file codebase for automated quality assessment and architectural exploration.
📚 Sources & References
| # | Source |
|---|---|
| [1] | DensePose From WiFi |
| [2] | A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) |
| [3] | LoRA: Low-Rank Adaptation of Large Language Models |
| [4] | Person-in-WiFi-3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi |
| [5] | Overcoming Catastrophic Forgetting in Neural Networks (EWC) |
| [6] | RuView — WiFi DensePose Edge AI System (GitHub Repository) |
| [7] | Code Indexer — AI-Powered Semantic Code Search Engine |