{{{#!comment
See (https://www.orbit-lab.org/wiki/WikiFormatting) for wiki formatting
}}}

[[TOC(Other/Summer/2025/R3/*, depth=1, heading=R3)]]

= Real-time, robust, and reliable (R^3^) machine learning over wireless networks =
**WINLAB Summer Internship 2025**

**Group Members:** Ayaan Qayyum (GR), Joshua Menezes (GR), Nihal Abdul Muneer (GR), Hasan Ali (UG), Madhav Subramaniyam (UG)

== Project Objective ==

=== Introduction: ===

Edge computing is an emerging paradigm for enabling ML/AI applications in mobile networks. This setting presents many challenges in terms of latency, accuracy, security/privacy, and adaptivity to changing environments. The goal of this project is to implement and evaluate algorithms and approaches that work “on paper” to see how well they work in practice. In particular, students work on methods for mobile devices to strategically “offload” complex computing tasks to the cloud, approaches for fully decentralized model updating and adaptation that are robust against malicious attacks, and strategies that enable real-time tracking and control.

=== Sub-Groups: ===

||= Learning to Help (L2H) =||= Feature Extraction for Distributed Systems (PCA) =||
|| The recently proposed Learning to Help (L2H) framework trains a server model given a fixed local (client) model. This differs from the Learning to Defer (L2D) framework, which trains the client for a fixed (expert) server. L2H applies to a number of practical scenarios in which access to the server may be limited by cost, availability, or policy. || Implement a distributed feature extraction method, specifically Principal Component Analysis (PCA), on the ORBIT testbed. Multiple nodes in the ORBIT network collaboratively learn the eigenvectors used to reduce the dimension of new data samples. The compressed data is then fed into a pre-trained machine learning model for inference. The central idea is that collaboration among nodes can speed up learning these eigenvectors, improving the efficiency of learning and inference tasks. ||
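
The L2H setup described in the table can be illustrated with a small routing sketch. It assumes three components (a fixed client model, a server model, and a rejector that scores how much a sample needs help) and a hypothetical `threshold` parameter; the function and argument names are illustrative, not taken from the project's code.

```python
from typing import Any, Callable, Tuple


def route(sample: Any,
          client: Callable[[Any], int],
          server: Callable[[Any], int],
          rejector: Callable[[Any], float],
          threshold: float = 0.5) -> Tuple[int, str]:
    """Return (prediction, source) for one sample.

    The rejector score is assumed to be higher when the client is less
    likely to be right; above the (hypothetical) threshold the sample is
    sent to the server, otherwise the client answers locally.
    """
    if rejector(sample) > threshold:
        return server(sample), "server"
    return client(sample), "client"


# Toy stand-ins for trained models, for illustration only.
toy_client = lambda x: 0           # always predicts class 0
toy_server = lambda x: 1           # always predicts class 1
toy_rejector = lambda x: abs(x)    # larger inputs count as "harder"
```

For example, `route(0.9, toy_client, toy_server, toy_rejector)` consults the server, while `route(0.1, toy_client, toy_server, toy_rejector)` stays on the client. Raising `threshold` reduces server usage, which is one way to model limits on cost or availability.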

== Progress ==

=== Week 1 ===
[https://docs.google.com/presentation/d/1vpTfC3btD_fo2PTWTjl_F7GIQ_yFmAVL1vh2xrC8-Us/edit?usp=drive_link Week 1 Slides]

- Read the L2H paper [https://docs.google.com/document/d/1SbiWl773-a59fv6qETlRt_3xXUfRgAaZzjJVOTREiqU/edit?tab=t.8x9jjtvl79to info]
- Read the PCA paper [https://docs.google.com/document/d/1SbiWl773-a59fv6qETlRt_3xXUfRgAaZzjJVOTREiqU/edit?tab=t.gd7hstpnqblj#heading=h.qmb8rahmqgyk info]

=== Week 2 ===
[https://docs.google.com/presentation/d/1HQLzoaZy9GAZdUuAKzWyq4eZfQxWejc9XjZ0i7Q0ZDQ/edit?usp=sharing Week 2 Slides]

- Finished all given papers 
- Reviewed ML Concepts
- Practiced working on COSMOS (Linux, Vim, etc.)
- Set up IDE
- Set up project GitLab

=== Week 3 ===
[https://docs.google.com/presentation/d/1CoeVqW_KkwaUJu5cP5dy_3ICegnhyhhBIlqUHjwtDFQ/edit?usp=sharing Week 3 Slides]

[https://docs.google.com/presentation/d/1pnY_DhRFG13IRgjqnUoJyrN_iZFM_V6ZZzkz9gru-JI/edit?usp=sharing Week 3 Content Slides]

- Officially split into sub-groups \\ - L2H (Joshua & Madhav) \\ - PCA (Ayaan, Nihal, & Hasan)

||= L2H=||= PCA=||
|| - Designed Model Architecture based on the paper \\ - Trained all the models on MNIST (expert, rejector, client)\\ - Ran L2H system on one node \\  - Ran multiclass L2H across nodes || - Created an example distributed system framework \\ - Implemented some fault handling features for reliability/robustness \\ - Created a number guessing game with a resilient central node and subnode architecture \\ - Wrote simulation of ORBIT distributed environment using Docker ||

=== Week 4 ===
[https://docs.google.com/presentation/d/1m7BtaOoItuCyNeE5cEjFde8bFQstZb9ObNr1WhVC1B0/edit?usp=sharing Week 4 Slides]

[https://docs.google.com/presentation/d/17FawluRSIRMfZgn4Bd5y3A1g9PfSynj2MYy2c-dy5uQ/edit?usp=sharing Week 4 Content Slides]

Created the Project Wiki you are currently viewing.

||= L2H=||= PCA=||
|| - Tested PTP on one node and measured negative latencies \\ - Switched to a monotonic clock \\ - Adjusted cost \\ - Ran L2H across 2 nodes \\ - Graphed L2H results || - Wrote distributed code \\ - Implemented PCA on one node \\ - Implemented kPCA on one node ||
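
The switch to a monotonic clock can be sketched as below. Wall-clock timestamps can be adjusted while a measurement is in flight, which is one way a computed latency can come out negative; a monotonic clock never goes backwards, but it is only meaningful within one machine. The sketch therefore assumes the latency is measured as elapsed time around a call on a single node. The helper name `timed_call` is hypothetical and not from the project's code.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds).

    time.monotonic_ns() cannot go backwards, so the elapsed time is never
    negative, unlike a difference of wall-clock timestamps.
    """
    start_ns = time.monotonic_ns()
    result = fn(*args, **kwargs)
    elapsed_ns = time.monotonic_ns() - start_ns
    return result, elapsed_ns / 1e9
```

Usage: `result, seconds = timed_call(sum, [1, 2, 3])`. In an L2H experiment, `fn` would be whatever function sends a sample to the expert and waits for the reply, so `seconds` is a round-trip time.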

=== Week 5 ===
[https://docs.google.com/presentation/d/10QSHvUPM6LBKI4O6dKvgsKKWQAxWdv9wpkmh8T4fspk/edit?usp=sharing Week 5 Slides]

[https://docs.google.com/presentation/d/1mmgcWLy9FspA5I_6iUq3jtDNgBcmaQbgoOkUZw0eLD0/edit?usp=sharing Week 5 Content Slides]

||= L2H=||= PCA=||
|| - Started stratification of the dataset \\ - Added more timestamps for better insight into timing and latency \\ - Investigated how to fix the negative PTP values \\ - Drafted new bandwidth implementation || - Corrected math errors and fixed tuning parameters in the single-file PCA implementation \\ - Implemented D-Krasulina logic in the central node + subnode topology \\ - Mesh topology framework ||
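
The D-Krasulina logic mentioned above can be sketched as a single-process simulation. It assumes each node draws one sample per round and that an exact average of the nodes' update directions is available (in the central node + subnode topology, the central node would compute that average). The function names, step-size schedule, and synthetic data are illustrative assumptions, not the project's code.

```python
import numpy as np


def krasulina_direction(w, x):
    """Krasulina update direction for one sample x at estimate w.

    xi = (x^T w) x - ((x^T w)^2 / ||w||^2) w, which is orthogonal to w.
    """
    xw = x @ w
    return xw * x - (xw ** 2 / (w @ w)) * w


def d_krasulina(samples, step_size, seed=0):
    """Estimate the top eigenvector from samples of shape (rounds, nodes, dim).

    Each round, every simulated node computes a local direction from its own
    sample; the directions are averaged (standing in for the network
    aggregation step) and applied to the shared estimate.
    """
    rounds, nodes, dim = samples.shape
    w = np.random.default_rng(seed).standard_normal(dim)
    for t in range(rounds):
        local = [krasulina_direction(w, samples[t, i]) for i in range(nodes)]
        w = w + step_size(t) * np.mean(local, axis=0)
    return w / np.linalg.norm(w)


def demo_alignment(rounds=2000, nodes=4, seed=1):
    """Return |cos| between the estimate and the true top eigenvector."""
    eigvals = np.array([5.0, 1.0, 0.5, 0.5, 0.5])  # hypothetical spectrum
    rng = np.random.default_rng(seed)
    # Zero-mean Gaussian data whose covariance is diag(eigvals).
    samples = rng.standard_normal((rounds, nodes, eigvals.size)) * np.sqrt(eigvals)
    w = d_krasulina(samples, step_size=lambda t: 1.0 / (t + 10))
    return abs(w[0])  # the true top eigenvector is the first basis vector
```

Calling `demo_alignment()` returns a value close to 1 when the estimate lines up with the dominant direction. Averaging over more nodes reduces the noise in each update, which is the intuition behind collaboration speeding up learning.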

=== Week 6 ===
[https://docs.google.com/presentation/d/1hTjLyMhSqAuADzIuhZ2mc7YNELAGtxetqsXjGiWenGI/edit?usp=sharing Week 6 Slides]

[https://docs.google.com/presentation/d/1zpPk6s-5hbjz65HfCvDY4ocHUTxbPiwtKYAXjn1-LWs/edit?usp=sharing Week 6 Content Slides]


||= L2H=||= PCA=||
|| - Fixed stratification so all classes are equally represented \\ - Started sending more information between Client and Expert \\ - Started Multiple Clients per Expert || - Fixed bugs in D-Krasulina algorithm \\ - Working on ORBIT with Central + Subnode and Complete graph topologies! \\ - Verified the correctness of the algorithm ||
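
Stratification with equal class representation can be sketched as follows. This is a minimal example assuming integer class labels and a fixed number of samples drawn per class; `balanced_indices` and its parameters are hypothetical names, not the project's code.

```python
import numpy as np


def balanced_indices(labels, per_class, seed=0):
    """Pick `per_class` sample indices from every class, without replacement."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    picked = []
    for cls in np.unique(labels):
        candidates = np.flatnonzero(labels == cls)
        if candidates.size < per_class:
            raise ValueError(f"class {cls} has only {candidates.size} samples")
        picked.append(rng.choice(candidates, size=per_class, replace=False))
    # Sorted indices keep the subset in the original dataset order.
    return np.sort(np.concatenate(picked))
```

Usage: `idx = balanced_indices(train_labels, per_class=100)` gives indices into the dataset so that each class contributes exactly 100 samples, where `train_labels` is whatever label array the dataset provides.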

=== Week 7 ===
[https://docs.google.com/presentation/d/1X2dxMSalhxrYwW9NN6YcUCrEPy8c7ITPifjv7lbXcGs/edit?usp=sharing Week 7 Slides]

[https://docs.google.com/presentation/d/1C4xk22gjOWsRLig9NPOsHPs_73ewUKgyeZlp3VoZqkk/edit?usp=sharing Week 7 Content Slides]


||= L2H=||= PCA=||
|| - Implemented multithreading using a Go script \\ - Graphed # of Clients vs Calls per Second \\ - Graphed # of Clients vs Bandwidth \\ - 3D Graphs using noise || - MNIST D-Krasulina \\ - D-Krasulina Optimization Testing (wall clock time vs packet size) \\ - Tested D-Krasulina and DM-Krasulina algorithms with complete graph topology \\ - Nearly finished C-DIEGO ||

=== Week 8 ===
[https://docs.google.com/presentation/d/1k3dVeMn2Tn9pEnqT85z6iBM2OiSdKpcCwpxJLD_3mkk/edit?usp=sharing Week 8 Slides]

[https://docs.google.com/presentation/d/1k8O2uCnY2wf3aSk5NDD3JYQpNytElA7IQiXf5a2M6aU/edit?usp=sharing Week 8 Content Slides]


||= L2H=||= PCA=||
|| - Changed dataset to Tiny-ImageNet-200 \\ - Changed to Expert ViT \\ - Trained models on new problem/system \\ - Explored new research questions || - Implemented DiSK for k-PCA using synthetic data \\ - Created first draft of research paper ||

=== Week 9 ===
[https://docs.google.com/presentation/d/10mlG6dVZOPQDW9bZol2yV8UjWmCn3b9yrh2pjsHuTis/edit?usp=sharing Week 9 Slides]

[https://docs.google.com/presentation/d/1vzx6HvCjiVBoxRoR_1-HcWXwCJAgfjQpckO8E4kDOsE/edit?usp=sharing Week 9 Content Slides]


||= L2H=||= PCA=||
|| - Changed dataset to CIFAR-10 \\ - Redefined research question for noisy data \\ - Wrote Autoencoder L2H training function \\ - Trained models for new system || - Finished DiSK algorithm implementation \\ - Continued working on paper ||

=== Week 10 ===
[https://docs.google.com/presentation/d/1wyIgK1fu-v2klhz9nXLeyvYdaLU6XcFZulNUbotbwCA/edit?usp=sharing Week 10 Slides]

[https://docs.google.com/presentation/d/1HyO3ou_aRUj24WvUH_Gzz_Hb6dG0Cb4aNp70LDNBAP0/edit?usp=sharing Week 10 Content Slides]


||= L2H=||= PCA=||
|| - Set up Ansible and prepared for large scale experiment \\ - Set up bash script for large scale experiment \\ - Updated the graphing code || - Studied Rabbat's Work \\ - Implemented fully async, non-blocking gossip version of DiSK with partial graph topology support ||

=== Final Week ===
[https://docs.google.com/presentation/d/1N9hEsJtJ_Xnc-E6t-KALSeByvXgtUIW8sagMHkL3GdM/edit?usp=sharing Final Presentation]


||= L2H=||= PCA=||
|| - Continued research for paper || - Continued research for paper ||

== Useful Links: ==

**For Reserving Nodes (Usually Bed or Weeks):** https://www.orbit-lab.org/cPanel/controlPanel/start

**Documentation/Notes:** https://drive.google.com/drive/folders/15HJmhyTUdzQafzMtNqIhy9jgc2sllF35?usp=sharing

**GitLab:** https://gitlab.orbit-lab.org/r3-25

**Presentations:**
https://drive.google.com/drive/folders/1aThnKsOUpBFiJtRgjq9AjoppeDdHGuQy?usp=sharing