Resilient Edge-Cloud Autonomous Learning with Timely Inferences

Project Advisor and Mentors: Professor Anand Sarwate, Professor Waheed Bajwa, Nitya Sathyavageeswaran, Vishakha Ramani, Yu Wu, Aliasghar Mohammadsalehi, Muhammad Zulqarnain

Team: Shreya Venugopaal, Haider Abdelrahman, Tanushree Mehta, Lakshya Gour, Yunhyuk Chang

Objective: This project uses ORBIT to design experiments that analyze the training and prediction of various ML models across multiple edge devices. Students also develop a latency profiling framework for MEC-assisted machine learning using techniques such as split computing and early exit. They then analyze the latency and accuracy tradeoffs of these models and measure network delays.

Final Poster with Results

Networking Setup for Our Experiments

We have developed an experiment environment that explores the nature of mobile edge computing. Our solution uses a client-server architecture in which devices exchange data for predictive analysis. This section gives an overview of the technologies and processes in our networking setup.

Socket Programming: Establishing Connectivity

Our networking framework relies on socket programming to connect client and server devices. This connectivity underpins the real-time exchange of data, letting us experiment and analyze performance in a dynamic environment. We used Python's socket library to establish these connections.
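The project's own socket code is not reproduced on this page. As a minimal sketch of the client-server pattern described above, the following runs a one-shot server and client in a single process over the loopback interface. The function names, the "ack:" reply, and the use of port 0 are illustrative assumptions, not the project's actual protocol.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)        # demo payloads fit in one recv call
        conn.sendall(b"ack:" + request)


def request_once(host: str, port: int, payload: bytes) -> bytes:
    """Client side: connect, send one payload, and return the reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)
        return conn.recv(1024)


def run_demo(payload: bytes = b"hello") -> bytes:
    """Run server and client in one process over the loopback interface."""
    # Port 0 asks the OS for any free port; on two nodes the server would
    # bind to an agreed port and the client would use the server's address.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        reply = request_once("127.0.0.1", port, payload)
        worker.join()
    return reply
```

Calling run_demo(b"ping") returns the server's reply to that payload. Larger payloads such as images need explicit message framing, because a single recv call may return only part of what was sent.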

Data Serialization and Deserialization: Ensuring Accuracy

To ensure precise data transmission, we serialize data before sending it and deserialize it on receipt. Python's struct library packs values into byte strings and unpacks them on the other side, so that information arrives exactly as it was sent.
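One common way to use struct for this purpose is a fixed-size header that tells the receiver how many payload bytes follow. The sketch below assumes a hypothetical header holding the payload length and a send timestamp; the actual fields used in the project are not stated on this page.

```python
import struct

# "!" selects network byte order with no padding:
# I = unsigned 32-bit payload length, d = 64-bit float send timestamp.
HEADER = struct.Struct("!Id")


def pack_message(payload: bytes, sent_at: float) -> bytes:
    """Prefix the payload with its length and the time it was sent."""
    return HEADER.pack(len(payload), sent_at) + payload


def unpack_message(frame: bytes):
    """Recover (payload, sent_at) from a frame built by pack_message."""
    length, sent_at = HEADER.unpack_from(frame, 0)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("incomplete frame")
    return payload, sent_at
```

Because the header has a known size (HEADER.size bytes), a receiver can first read exactly that many bytes, learn the payload length, and then keep reading until the whole payload has arrived.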

Tracking Metrics with Pandas

Efficient data management forms the foundation of our experiments. Using the pandas library, we organize and log data in CSV format. This record-keeping is essential for tracking timing metrics and prediction results, and it supports in-depth analysis and iterative improvement.
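A minimal sketch of this kind of logging is shown below. The column names and the sample values are made up for illustration; they are not measurements from the project.

```python
import io

import pandas as pd


def log_results(records, destination):
    """Write one row per inference request to CSV and return the table."""
    frame = pd.DataFrame(records)
    # Derived columns: end-to-end time and whether the prediction was right.
    frame["total_ms"] = frame["transfer_ms"] + frame["inference_ms"]
    frame["correct"] = frame["predicted"] == frame["label"]
    frame.to_csv(destination, index=False)
    return frame


# Illustrative records only (hypothetical values).
records = [
    {"image_id": 0, "transfer_ms": 12.0, "inference_ms": 30.0,
     "predicted": 3, "label": 3},
    {"image_id": 1, "transfer_ms": 15.0, "inference_ms": 28.0,
     "predicted": 5, "label": 2},
]
buffer = io.StringIO()               # a file path works the same way
table = log_results(records, buffer)
accuracy = table["correct"].mean()   # fraction of correct predictions
```

Keeping one row per request makes it straightforward to group by model or threshold later and to plot latency against accuracy.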

Unraveling the Networking Workflow

Image Transmission: Images are prepared and transmitted from the client to the server, initiating the analysis process.

Server Processing: Upon receipt, the server unpacks the data, runs the MobileNetV2 model for inference, and times each step from image input to prediction output.

Response to Client: The server sends back the prediction results alongside a wealth of timing data, enabling comprehensive analysis.

Client's Analytical Role: Armed with the prediction outcomes and timing insights, the client takes the reins. It logs data, evaluates accuracy, and extracts valuable insights from the entire process.
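The workflow above can be sketched in a few lines if the network and the model are replaced by stand-ins. In this sketch, `model` and `send` are hypothetical callables (assumptions for illustration): `model` stands in for MobileNetV2 inference and `send` stands in for the socket round trip to the server.

```python
import time


def handle_request(image, model):
    """Server side: run inference and measure how long it takes."""
    start = time.perf_counter()
    prediction = model(image)
    inference_s = time.perf_counter() - start
    return {"prediction": prediction, "inference_s": inference_s}


def timed_round_trip(image, send):
    """Client side: `send` delivers the image and returns the server reply."""
    start = time.perf_counter()
    reply = send(image)
    round_trip_s = time.perf_counter() - start
    return {
        **reply,
        "round_trip_s": round_trip_s,
        # Time not spent on inference is attributed to transfer and overhead.
        "network_s": round_trip_s - reply["inference_s"],
    }


def fake_model(image):
    """Stand-in for a neural network: returns a fixed class index."""
    return 7


result = timed_round_trip([0.1, 0.2, 0.3],
                          lambda img: handle_request(img, fake_model))
```

Each side measures only durations on its own clock, so this breakdown does not require the client and server clocks to be synchronized. Measuring one-way delays would require clock synchronization.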

Our Pursuit's Significance

Our networking ecosystem holds profound significance for several reasons:

Learning Through Experimentation: Our framework serves as a playground for mobile edge computing experimentation. We actively investigate optimization strategies and thresholds, expanding our knowledge base.
Resource Optimization: A confidence-threshold mechanism decides, image by image, whether the edge device answers on its own or asks the server for help. We look for the tipping point where relying on the edge device's capabilities yields the greatest advantage.
Informed Decision-Making: By logging timing metrics and prediction results, we gain a panoramic view. This informs ongoing optimizations and strategies for achieving higher accuracy.

Unveiling the Power of Training and Optimization

We not only learned how to set up the networking for our experiment; we were equally dedicated to training, optimization, and making the most of the resources at hand. This section describes our training methodology and the tools and techniques we used to refine our models.

Foundational Training

Our training journey commenced by delving into the fundamentals of PyTorch and other essential libraries. We took inspiration from tutorials and insights gathered from previous classes to build neural networks from the ground up. Our exploration led us to harness the capabilities of diverse libraries, including TensorFlow, Keras, Optuna, torchvision, torch, pandas, numpy, matplotlib, and time.

Strategic Techniques for Optimization

To tackle challenges of overfitting and ensure robustness, we embraced a plethora of techniques that transformed our models into precision instruments. Here are the key strategies we employed:

Image Cutouts: We utilized image cutouts, a sophisticated augmentation technique that randomly masks out portions of input images during training. This approach fortifies the model's resilience and acts as a buffer against overfitting.
Learning Rate Scheduling: We adopted the step decay learning rate scheduling method, progressively reducing the learning rate as epochs advanced. This dynamic strategy enhances convergence and accuracy.
Batch Normalization: Our models reaped the benefits of batch normalization, a technique that accelerates convergence speed and enhances training stability. Its primary role is to mitigate the internal covariate shift problem.
Early Stopping: To shorten training, we implemented an early stopping mechanism that halts training once the loss has failed to decrease for a predefined number of epochs (the patience). This reduced training time, and we experimented with patience levels from 5 to 15.
Random Rotations: Training images were randomly rotated, an augmentation designed to make the model robust to changes in orientation.
Random Erasing: This augmentation technique selects a random rectangular region of a training image and overwrites its pixels, preventing over-reliance on any single image region and fostering balanced learning.
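Two of the strategies above, step decay and early stopping, reduce to a little arithmetic and bookkeeping. The following framework-free sketch shows the idea; the step size, decay factor, and class name are illustrative assumptions rather than the project's settings. In PyTorch, step decay is available as torch.optim.lr_scheduler.StepLR.

```python
def step_decay(initial_lr, epoch, step_size=10, gamma=0.1):
    """Multiply the learning rate by `gamma` once every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)


class EarlyStopping:
    """Signal a stop after `patience` epochs in a row without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta      # smallest decrease that counts
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, loss):
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss       # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, should_stop would be called once per epoch with the monitored loss (typically a validation loss), and the loop would break when it returns True.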

Hyperparameter Tuning

To unlock the full potential of our models, we turned to hyperparameter tuning. Optuna, a hyperparameter optimization library, became our ally. We designed experiments with 20 training trials, tuning parameters such as normalization transforms, patience levels, and starting learning rates. This approach let us select the best-performing configuration among those tried.

Navigating Model Limitations

Working against the clock, we confronted the limitations posed by our chosen models: MobileNetV2, DenseNet121, and ResNet18. These models were originally tailored for ImageNet, a rich dataset of high-quality images across a large number of classes. Within our 10-week timeline, however, training and testing on ImageNet was impractical because of the time required. Instead, we trained only the last layers of these models for CIFAR10. This adaptation let us obtain meaningful results in a constrained timeframe.

The Significance of Training and Optimization

Our training and optimization endeavors hold exceptional significance:

Holistic Skill Development: Through our journey, we've mastered PyTorch and a host of libraries, honing skills that transcend individual projects.
Robust Model Foundation: The diverse strategies we employed fortified our models against overfitting and fostered resilience in the face of challenging scenarios.
Efficient Resource Utilization: Techniques like hyperparameter tuning and early stopping streamline the training process, making the most of limited resources.

Weekly Updates

Week 1

Summary

Understood the goal of the project and broke down its objectives
Got familiar with Linux by practicing some simple Linux commands.
Reviewed basic Python and other coding materials
Studied the basics of Machine Learning models and algorithms

Next Steps

Create and train a “small” and “large” Neural Network
Attempt to simulate the difference between their performances at inference

Week 2

Summary

Practiced the basics of pattern recognition and Machine Learning (PPA - Patterns, Predictions, Actions) using PyTorch
Created a node image with PyTorch
Created small Machine Learning models
Loaded the Modified National Institute of Standards and Technology (MNIST) database onto the ORBIT node

Next Steps

Create and train a “small” and “large” Neural Network
Attempt to simulate the difference between their performances at inference

Week 3

Summary

Debugged issues with work we had done in previous weeks
Connected 2 nodes with client-server architecture
Extracted data for time and accuracy measurements
Added logging to code and stored logs in a table format (CSV)
Ready to start testing!

Next Steps

Compute the delay time for each step to a higher precision
Plot and identify trends in data (accuracy & latency)
Begin to implement Split Computing - 10 layer MLP, ResNet
Think about other implementations - Early Exiting, model compression, data compression, … Mixture?

Week 4

Summary

Compared performances of both neural networks on the CIFAR10 dataset
Established connection between two nodes
Communicated test data between nodes to compare accuracy and delay between our NN models
- Need to serialize by bytes instead of transferring as strings
Read and discussed various research papers related to our project - created a brief presentation on each paper

Next Steps

Divide data into “chunks” for faster and more efficient transmission
Design experiments to test different architectures and implementations of Early Exiting and Split Computing
Track and add Age of Information Metrics

Week 5

We learned how to use the Network Time Protocol (NTP) while we waited for the more accurate Precision Time Protocol (PTP) to be implemented in the technology we were using.
Implemented NTP in our code files so that we can measure time and evaluate the tradeoffs we identified.
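NTP matters here because a one-way delay is computed from timestamps taken on two different clocks. The standard NTP calculation estimates the clock offset and the round-trip delay from four timestamps; the sketch below shows that arithmetic. The function names are illustrative, and the estimate assumes the outbound and return paths take about the same time.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from four timestamps.

    t0: request sent (client clock)     t1: request received (server clock)
    t2: reply sent (server clock)       t3: reply received (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)          # time spent on the network
    return offset, delay


def one_way_latency(sent_client, received_server, offset):
    """Client-to-server latency after correcting for the clock offset."""
    return received_server - (sent_client + offset)


# Example: server clock runs 5 s ahead, each direction takes 0.125 s,
# and the server spends 0.125 s between receiving and replying.
offset, delay = ntp_offset_and_delay(100.0, 105.125, 105.25, 100.375)
latency = one_way_latency(100.0, 105.125, offset)
```

In this example the estimated offset is 5 s and the round-trip network delay is 0.25 s, so the corrected one-way latency is 0.125 s. Without the correction, the raw difference between the two timestamps would be dominated by the 5 s clock offset.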

Week 6

Summary

Figured out how to properly split a NN using split computing
First experimented with image inference on a single device
Split the Neural Network (ResNet 18) onto two different devices
Ran an inference with the split NN across a network
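The idea behind split computing can be shown without any deep learning framework: a model is treated as an ordered list of layers, the first part runs on one device, and the intermediate result is sent to a second device that runs the rest. In the sketch below, plain Python functions stand in for network layers (an assumption for illustration). Splitting a real ResNet 18 in PyTorch involves grouping its modules into two halves, and the project's exact split point is not stated here.

```python
def split_layers(layers, split_index):
    """Divide an ordered list of layers into a head and a tail."""
    return layers[:split_index], layers[split_index:]


def run(layers, x):
    """Apply each layer in order to the input."""
    for layer in layers:
        x = layer(x)
    return x


# Toy "layers": simple functions standing in for neural network layers.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

head, tail = split_layers(layers, 1)
intermediate = run(head, 5)           # computed on the first device
# ... `intermediate` would be serialized and sent over the network here ...
split_output = run(tail, intermediate)  # computed on the second device
full_output = run(layers, 5)            # same model run on one device
```

A correct split produces the same output as the unsplit model. What changes is where the computation happens and how much data crosses the network, since the size of the intermediate result depends on the split point.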

Next Steps

Take a step back
Ask: what questions do we want to answer?
As you vary the threshold for asking for help, how does the average latency change (over the dataset)?

Open ended

Transfer to Precision Time Protocol (PTP)

Week 7

Summary

Explored different research questions with the data collected
Limited CPU power in terminal to imitate mobile devices
Implemented different threshold values based on confidence for sending the data to the edge and server for inference
Generated graphs for threshold vs latency
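A confidence threshold of this kind can be sketched as follows. The sketch assumes that confidence is the largest softmax probability of the local model and that an offloaded image pays for the local attempt plus the offload. Both are assumptions for illustration, and the latency values in the usage note are placeholders rather than measurements.

```python
import math


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    peak = max(logits)                       # subtract max for stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold):
    """Return where to answer ('local' or 'offload'), the label, and confidence."""
    probs = softmax(logits)
    confidence = max(probs)
    label = probs.index(confidence)
    where = "local" if confidence >= threshold else "offload"
    return where, label, confidence


def average_latency(all_logits, threshold, local_s, offload_s):
    """Mean latency over a dataset for a given confidence threshold."""
    total = 0.0
    for logits in all_logits:
        where, _label, _conf = decide(logits, threshold)
        # Assumption: an offloaded sample has already paid the local cost.
        total += local_s if where == "local" else local_s + offload_s
    return total / len(all_logits)
```

Sweeping the threshold from 0 to 1 and calling average_latency at each value, for example with local_s=0.01 and offload_s=0.05, gives the data for a threshold vs latency curve. A higher threshold sends more images for help and so raises the average latency.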

Next Steps

Use PTP timing to get more precise time measurements
After analyzing latency, move on to accuracy tradeoffs using different thresholds

Links to Presentations

Week 1 Week 2 Week 3 Week 4 Week 5 - Cumulative Week 6 Week 7 Final Presentation

As machine learning models become increasingly advanced and complex, running these models on less powerful devices is becoming increasingly difficult, especially when accounting for latency. However, it is also not efficient to run everything using only the cloud as it creates too much traffic. A viable solution to this problem is edge computing, where we use the edge (the networks in between user devices and the cloud) for computation.

Trained a small and a large neural network (MobileNetV2 and DenseNet) on the CIFAR10 dataset
Performed PCA and SVM on NNs to familiarize ourselves with PyTorch
Loaded the MNIST image database onto an ORBIT node
Connected 2 nodes with client-server architecture and extracted data for time and accuracy measurements
Compared performances of both neural networks on the CIFAR10 dataset
- Established connection between two nodes
- Communicated test data between nodes to compare accuracy and delay between our NN models
Worked with professors/mentors and read papers to understand the concepts of early exit, split computing, accuracy/latency tradeoff, and distributed DNNs over the edge cloud
Split the NN ResNet 18 using split computing onto two different devices and ran an inference across a network
Used Network Time Protocol (NTP) and sent data in "packages" (chunks) to collect latency and delay data
Explored different research questions with the data collected:
Limited CPU power in terminal to imitate mobile devices
Implemented different threshold values based on confidence for sending the data to the edge and server for inference
- Generated graphs for threshold vs latency, accuracy vs latency, etc.
Retrained neural network to achieve 88% accuracy and collected new graphs
Introduced a delay in the inference as well as data transfer to simulate a queue

References

Abbas, Y. Zhang, A. Taherkordi and T. Skeie, "Mobile Edge Computing: A Survey," in IEEE Internet of Things Journal.

Teerapittayanon, S., McDanel, B., & Kung, H. T. (2017, June). Distributed deep neural networks over the cloud, the edge and end devices.

Last modified on Mar 15, 2024, 6:56:17 PM
