Memory- or Storage-Centric Processing

As compute throughput continues to scale faster than data movement, many real-world workloads are increasingly limited by memory and storage bandwidth rather than by arithmetic throughput. Modern applications, ranging from large-scale analytics and bioinformatics pipelines to embedding retrieval and data-intensive machine learning, often spend a disproportionate amount of time and energy moving data across the memory hierarchy and through I/O stacks. This “data movement wall” becomes even more severe when datasets grow to hundreds of gigabytes or terabytes, where repeatedly shuttling data between storage, host memory, and accelerators leads to high latency, poor efficiency, and wasted bandwidth. Processing-in-Memory (PIM) and Processing-in-Storage (PIS) are compelling because they attack this inefficiency at its source: they bring computation to where the data already resides.
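A simple back-of-the-envelope calculation illustrates the point. The sketch below compares the bytes that must cross the storage-to-host interface for a selective scan executed on the host versus near the data; the dataset size, selectivity, and bandwidth figures are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope illustration of the "data movement wall".
# Every number below is an assumption chosen for illustration, not a measurement.

DATASET_BYTES = 1e12            # 1 TB table to scan
SELECTIVITY = 0.01              # 1% of rows survive the filter
STORAGE_TO_HOST_BPS = 16e9      # assumed effective storage-to-host bandwidth (bytes/s)

# Conventional path: ship the whole dataset to the host, then filter there.
host_transfer_s = DATASET_BYTES / STORAGE_TO_HOST_BPS

# Near-data path: filter inside or next to the storage device, ship only survivors.
near_data_transfer_s = (DATASET_BYTES * SELECTIVITY) / STORAGE_TO_HOST_BPS

print(f"host-side filter:  {host_transfer_s:7.1f} s of interface transfer")
print(f"near-data filter:  {near_data_transfer_s:7.1f} s of interface transfer")
print(f"bytes crossing the interface shrink by {1 / SELECTIVITY:.0f}x")
```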

Our research on Processing in Memory/Storage is driven by a system-first, bottleneck-elimination philosophy. Rather than treating near-data computation as an isolated microarchitectural trick, we design end-to-end near-data acceleration paths that integrate compute placement, data layout, and execution orchestration across the full pipeline. We explore architectures that offload carefully selected primitives, such as filtering, aggregation, lightweight feature extraction, and pre- and post-processing, into memory- or storage-adjacent execution, thereby reducing redundant transfers and freeing the host and accelerator for higher-value computation. Importantly, our designs prioritize deployability: we consider interface constraints, consistency semantics, and practical integration with heterogeneous hosts (CPU–FPGA/accelerator environments), aiming for near-data solutions that deliver measurable system-level gains under realistic workloads and datasets.
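To make this concrete, the sketch below shows one way such an offload path could be structured: a kernel descriptor for a filter-plus-aggregation primitive, a device-side executor that runs the filter next to the data, and a host fallback when the device cannot run the primitive. The `OffloadKernel` and `NearDataDevice` names and their interfaces are hypothetical placeholders used for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical descriptor for a primitive pushed toward the data: a row-level
# predicate plus an optional host-side aggregation over the surviving rows.
@dataclass
class OffloadKernel:
    predicate: Callable[[dict], bool]
    aggregate: Optional[Callable[[Iterable[dict]], float]] = None

class NearDataDevice:
    """Stand-in for a memory/storage-side executor (purely illustrative)."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def read_all(self) -> list[dict]:
        # Conventional path: every row crosses the device-to-host interface.
        return list(self._rows)

    def supports(self, kernel: OffloadKernel) -> bool:
        # A real device would advertise a narrow capability set here; this toy
        # executor accepts any kernel that carries a row-level predicate.
        return kernel.predicate is not None

    def execute_filter(self, kernel: OffloadKernel) -> list[dict]:
        # Near-data path: only rows that survive the predicate cross the interface.
        return [r for r in self._rows if kernel.predicate(r)]

def run(kernel: OffloadKernel, device: NearDataDevice):
    """Push the filter next to the data when supported; otherwise fall back to
    host-side execution, then apply any aggregation on the host."""
    if device.supports(kernel):
        survivors = device.execute_filter(kernel)
    else:
        survivors = [r for r in device.read_all() if kernel.predicate(r)]
    return kernel.aggregate(survivors) if kernel.aggregate else survivors

# Toy usage: a cheap-items filter followed by a host-side count.
rows = [{"id": i, "price": i % 500} for i in range(10_000)]
device = NearDataDevice(rows)
cheap_count = run(
    OffloadKernel(predicate=lambda r: r["price"] < 10,
                  aggregate=lambda rs: float(len(list(rs)))),
    device,
)
print(cheap_count, "matching rows, counted on the host")
```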

Going forward, we plan to push beyond static offload toward adaptive near-data computing that is workload-aware and infrastructure-aware. Key directions include: (i) co-designing data placement and near-data kernels, so that layouts, compression, and indexing are optimized for both performance and bandwidth efficiency; (ii) dynamic partitioning of computation between host and near-data units based on contention, queueing, and observed bottlenecks; (iii) exploring secure and multi-tenant PIM/PIS, where isolation and access control are first-class concerns rather than afterthoughts; and (iv) building a unified framework that can select the best execution location—CPU, accelerator, memory-side, or storage-side—based on cost models and runtime signals. Our long-term vision is to make near-data processing a practical and principled component of modern system architecture, enabling scalable performance and energy efficiency for the next generation of data-centric computing.
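As one illustration of directions (ii) and (iv), the sketch below structures the placement decision as a minimal cost model that weighs data movement, engine throughput, and a runtime contention signal for each candidate location. All location names, bandwidths, compute rates, and load values are illustrative assumptions, not measured parameters from our systems.

```python
from dataclasses import dataclass

GB = 1e9  # bytes

# Illustrative-only cost model: every bandwidth, compute rate, and load value
# below is an assumption standing in for profiled or runtime-observed numbers.
@dataclass
class Location:
    name: str
    data_to_engine_bps: float   # bandwidth from where the data lives to this engine
    engine_to_host_bps: float   # bandwidth for returning results to the host
    compute_bps: float          # bytes this engine can process per second
    load: float                 # runtime contention signal in [0, 1); 0 = idle

    def estimated_cost_s(self, bytes_in: float, bytes_out: float) -> float:
        move_in = bytes_in / self.data_to_engine_bps
        compute = bytes_in / (self.compute_bps * (1.0 - self.load))
        move_out = bytes_out / self.engine_to_host_bps
        return move_in + compute + move_out

def place(bytes_in: float, selectivity: float, candidates: list[Location]) -> Location:
    """Pick the lowest-cost site for a filter-style kernel that reads bytes_in
    and returns roughly bytes_in * selectivity to the host."""
    bytes_out = bytes_in * selectivity
    return min(candidates, key=lambda loc: loc.estimated_cost_s(bytes_in, bytes_out))

CO_LOCATED = float("inf")  # no transfer needed when the data (or result) is already there

candidates = [
    Location("host CPU",     data_to_engine_bps=8 * GB,     engine_to_host_bps=CO_LOCATED, compute_bps=50 * GB,  load=0.30),
    Location("accelerator",  data_to_engine_bps=8 * GB,     engine_to_host_bps=16 * GB,    compute_bps=400 * GB, load=0.10),
    Location("memory-side",  data_to_engine_bps=8 * GB,     engine_to_host_bps=CO_LOCATED, compute_bps=100 * GB, load=0.05),
    Location("storage-side", data_to_engine_bps=CO_LOCATED, engine_to_host_bps=8 * GB,     compute_bps=20 * GB,  load=0.00),
]

choice = place(bytes_in=500 * GB, selectivity=0.02, candidates=candidates)
print("run the kernel on:", choice.name)
```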