Research Contributions at UCSD

My first hands-on research project was with Prof. Amy Ousterhout. It focused on improving CPU efficiency by offloading the "datacenter tax" (encryption, compression, and copying) to SmartNIC accelerators.

SmartNICs are highly specialized hardware accelerators that run certain jobs more efficiently than CPUs while offering substantial cost savings compared to adding servers. Here is some more information on SmartNICs and types of SmartNICs.

The primary motivation for this project was to free up precious CPU cycles for the actual "business logic" rather than datacenter tax. For instance, if I upload a picture to Instagram (not that I use the platform), the server performs multiple tasks, including encrypting and compressing the image and then copying it to other servers in the distributed system for availability and reliability. According to Facebook's Accelerometer paper, microservices spend as few as 18% of CPU cycles executing core application logic, while the remaining cycles go to common operations that are not core to the application logic (e.g., I/O processing, logging, and compression). This work was also motivated by the steadily growing demand for servers in today's datacenters, made worse by the slowing of Moore's Law. Offloading datacenter tax to hardware accelerators has significant potential to improve CPU efficiency and cut the cost of scaling computer systems.
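To make the 18% figure concrete, here is a rough back-of-the-envelope sketch in Python. It is an Amdahl's-law-style upper bound, not a measurement from this project: the function name and the `offloaded_share` parameter are hypothetical, and the model assumes that offloaded work costs the CPU nothing (it ignores the overhead of handing work to the accelerator).

```python
def cpu_capacity_gain(core_fraction: float, offloaded_share: float) -> float:
    """Upper bound on how much more work the same CPUs could do.

    core_fraction:   fraction of CPU cycles spent on core application logic
                     (0.18 in the Accelerometer figure quoted above).
    offloaded_share: fraction of the remaining "tax" cycles moved off the CPU
                     (hypothetical parameter, chosen for illustration).
    """
    if not 0.0 < core_fraction <= 1.0:
        raise ValueError("core_fraction must be in (0, 1]")
    if not 0.0 <= offloaded_share <= 1.0:
        raise ValueError("offloaded_share must be in [0, 1]")

    tax_fraction = 1.0 - core_fraction
    # CPU cycles still needed per unit of work after offloading.
    remaining_cpu = core_fraction + tax_fraction * (1.0 - offloaded_share)
    return 1.0 / remaining_cpu


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5, 1.0):
        gain = cpu_capacity_gain(0.18, share)
        print(f"offloaded share {share:.2f}: up to {gain:.2f}x CPU capacity")
```

Under these assumptions, even moving every non-core cycle off the CPU caps the gain at roughly 5.6x when core logic accounts for 18% of cycles. Real gains are smaller, since only part of the tax can be offloaded and each offload has its own cost.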

I identified applications that spent substantial CPU cycles on datacenter tax and benchmarked them with and without offloading the tax to a Mellanox ConnectX-5 SmartNIC. I started with the Synthetic Web Service application from AIFM (Application-Integrated Far Memory). Since performance was to be benchmarked on a single machine, I decoupled the application's data structures from far memory and then migrated the Shenango-based application to Caladan, a newer CPU scheduler that supplanted Shenango. I ported the application to Caladan because the accelerator APIs used Caladan too, and running two different schedulers in a single experiment would not have been ideal.
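The with-and-without comparison can be sketched as a toy harness in Python. This is not the actual benchmark, which ran on Caladan against a real SmartNIC. Here `run_benchmark` is a hypothetical helper, the payload and the checksum "business logic" are made up, wall-clock time from `time.perf_counter` stands in for CPU cycles, and a no-op function stands in for compression that has left the CPU.

```python
import time
import zlib
from typing import Callable, Dict


def run_benchmark(compress: Callable[[bytes], bytes],
                  requests: int = 200,
                  size: int = 64 * 1024) -> Dict[str, float]:
    """Time a synthetic request loop and split it into core vs. tax time."""
    # Synthetic, mildly compressible payload (hypothetical workload).
    payload = bytes(i % 251 for i in range(size))
    core_seconds = 0.0
    tax_seconds = 0.0

    for _ in range(requests):
        t0 = time.perf_counter()
        _ = sum(payload) % 65521   # stand-in for the "business logic"
        t1 = time.perf_counter()
        compress(payload)          # the "datacenter tax" step
        t2 = time.perf_counter()
        core_seconds += t1 - t0
        tax_seconds += t2 - t1

    total = core_seconds + tax_seconds
    return {
        "throughput_rps": requests / total,   # requests per second
        "tax_fraction": tax_seconds / total,  # share of time spent on tax
    }


if __name__ == "__main__":
    on_cpu = run_benchmark(zlib.compress)
    # A no-op stands in for compression done elsewhere; it is NOT a real offload.
    offloaded = run_benchmark(lambda data: data)
    print("on CPU:   ", on_cpu)
    print("offloaded:", offloaded)
```

The point of the split is to first find out how much of each request is tax at all. Only applications with a large `tax_fraction` are worth porting to an accelerator.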

After offloading compression tasks to a Mellanox ConnectX-5 SmartNIC, we saw significant performance improvements: a boost in throughput of more than 3.5x and a reduction in latency of more than 20%.

This work was used as a small part of Abhishek Vijeev's thesis, titled Operating System Scheduling for Emerging Hardware Accelerators.

Another interesting application I found was Apache Arrow, which spends a significant number of CPU cycles compressing data while writing out a .parquet file. However, Arrow's thread creation and per-thread measurement capabilities were abstracted away, so porting it to Caladan seemed unlikely to work.

---