Changes between Version 40 and Version 41 of Other/Summer/2023/Distributed


Timestamp: Jul 10, 2023, 4:58:14 PM
Author: krs227
Comment:

  • Other/Summer/2023/Distributed

== **Distributed Data Infrastructure**
----
     
**Summary**
   - Installed Debian Linux on hard drives
   - Solidified our understanding of networking as well as the Network File System (NFS).
   - Worked with starting and stopping services as well as job schedulers (a short example follows this list).

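As a minimal sketch of what starting and stopping a service through systemd can look like from Python, assuming a systemd-based system: the service name nfs-server below is purely an example, not necessarily one of the services we worked with.

{{{#!python
#!/usr/bin/env python3
"""Sketch: start, check, and stop a systemd service from Python.
The service name below (nfs-server) is only an example."""
import subprocess

SERVICE = "nfs-server"  # assumed service name, purely illustrative


def systemctl(action: str, service: str) -> None:
    """Run `systemctl <action> <service>` and stop on any error."""
    subprocess.run(["systemctl", action, service], check=True)


if __name__ == "__main__":
    systemctl("start", SERVICE)    # start the unit
    systemctl("status", SERVICE)   # print its current status
    systemctl("stop", SERVICE)     # stop it again
}}}

The same three actions correspond directly to `systemctl start|status|stop <service>` on the command line.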
**Idea Of Our Progress This Week**

This week, we spent time installing more drives to work with throughout the project. To understand how these drives would interact, we studied one of the most fundamental file systems for this project, the Network File System (NFS), which is essentially a method for sharing files in which "files reside on one or more remote servers accessible on a local client system", according to the course that we followed for this project. We mounted an NFS share and also booted into the BIOS, both of which would prove useful for this project.
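As a rough illustration of the NFS mounting described above, here is a small Python sketch that mounts a remote export on a client; the server name node01, the export /scratch, and the mount point /mnt/scratch are assumptions for illustration rather than the actual values from our cluster.

{{{#!python
#!/usr/bin/env python3
"""Sketch: mount an NFS export on a Linux client.
Server, export path, and mount point are placeholders, not the
actual values used in our cluster."""
import subprocess
from pathlib import Path

NFS_SERVER = "node01"         # assumed file server hostname
NFS_EXPORT = "/scratch"       # assumed exported directory on the server
MOUNT_POINT = "/mnt/scratch"  # assumed local mount point


def mount_nfs(server: str, export: str, mount_point: str) -> None:
    """Create the mount point if needed and mount the NFS export."""
    Path(mount_point).mkdir(parents=True, exist_ok=True)
    # Equivalent to: mount -t nfs node01:/scratch /mnt/scratch
    subprocess.run(
        ["mount", "-t", "nfs", f"{server}:{export}", mount_point],
        check=True,
    )


if __name__ == "__main__":
    mount_nfs(NFS_SERVER, NFS_EXPORT, MOUNT_POINT)
    print(f"Mounted {NFS_SERVER}:{NFS_EXPORT} at {MOUNT_POINT}")
}}}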

**Week Resources**

Start And Stop Services: https://linuxcourse.rutgers.edu/Winlab_Internship_2023/html/systemd.html

Article On Network File Systems (NFS): https://www.weka.io/learn/file-storage/what-is-network-file-system/

== **Week 3**
**Summary**

**Idea Of Our Progress This Week**

This week we were introduced to shell scripting as well as Python scripting and worked through some exercises. We also did the manual work of replacing the old network switch in the cluster with the new 1 Gbit, 48-port switch and configured 2 VLANs on it. After installing the Samsung drives, we noticed that new devices appeared alongside /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd. After that, we spent some time reading more articles about HPC storage and how it generally works. We then learnt about some of the storage technologies most relevant to this project, including ZFS, MDADM, and RAID.
RAID (redundant array of independent disks) is a way of storing the same data in different places on multiple hard disks, while mdadm is a Linux utility used to create, manage, and monitor software-based RAID devices. We used the concepts we had learnt to test MDRAID as well as ZFS for disk failure. We exported the /scratch file system (RAID 5), which is built on 3 drives, from the nodes to our LXC containers and mounted it there. We then checked the I/O performance of /scratch on the file server and on the node, and finally checked the I/O performance of the NFS-mounted file system. Here are some diagrams with statistics comparing local versus remote performance as well as normal versus degraded performance.
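For reference, here is a minimal sketch of how a 3-drive software RAID 5 array like the one behind /scratch could be assembled with mdadm and mounted; the member drive names, the array device /dev/md0, and the ext4 filesystem choice are assumptions for illustration rather than a record of our exact setup.

{{{#!python
#!/usr/bin/env python3
"""Sketch: assemble a 3-drive software RAID 5 array with mdadm,
format it, and mount it at /scratch. Device names are illustrative."""
import subprocess

DRIVES = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]  # assumed member drives
ARRAY = "/dev/md0"                             # assumed array device


def run(cmd: list[str]) -> None:
    """Echo a command, run it, and stop on any error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Create the RAID 5 array across the three member drives.
    run(["mdadm", "--create", ARRAY, "--level=5",
         f"--raid-devices={len(DRIVES)}", *DRIVES])
    # Put a filesystem on the array and mount it.
    run(["mkfs.ext4", ARRAY])
    run(["mkdir", "-p", "/scratch"])
    run(["mount", ARRAY, "/scratch"])
}}}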

We used the dd command for the I/O scripting. After that, we captured an installation image with FOG once Debian 12 had been installed on node03, and we learnt about the GRUB bootloader, which is useful for recovery purposes.
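Below is a rough sketch of the kind of dd-based sequential write check we used when comparing I/O performance; the file paths and sizes are assumptions, and the throughput it reports is only an approximation of what dd itself prints.

{{{#!python
#!/usr/bin/env python3
"""Sketch: time a large sequential write with dd to compare local
vs NFS-mounted throughput. Paths and sizes are illustrative only."""
import subprocess
import time


def dd_write_test(path: str, count: int = 1024) -> float:
    """Write `count` 1 MiB blocks to `path` and return approx. MiB/s."""
    start = time.monotonic()
    subprocess.run(
        ["dd", "if=/dev/zero", f"of={path}",
         "bs=1M", f"count={count}", "oflag=direct"],
        check=True,
    )
    return count / (time.monotonic() - start)


if __name__ == "__main__":
    # One assumed local path and one assumed NFS-mounted path.
    for target in ("/scratch/testfile", "/mnt/scratch/testfile"):
        print(f"{target}: {dd_write_test(target):.1f} MiB/s (approx.)")
}}}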

**Week Resources**

MDADM/ZFS RAID Configurations: https://linuxcourse.rutgers.edu/Winlab_Internship_2023/html/Raid_configurations.html

== **Week 4 + Week 5**
**Summary**
   - Focused on Ceph installation
   - Did some troubleshooting relating to OSDs
   - Ran I/O dd and fio scripts using Ceph
   - Installed Debian 11.7

**Idea Of Our Progress This Week**

This was probably the week where we did the largest chunk of work relevant to the project: getting into the process of installing and working with Ceph. We watched three videos covering how Ceph works, its approach to data security, and its self-balancing and self-healing processes. While we were trying to mount Ceph, there was a lack of OSDs, and we spent Week 4 trying to troubleshoot the problem.
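As a sketch of the kind of check that helps when OSDs are missing, the snippet below queries cluster health and the OSD tree through the standard ceph CLI; it assumes the ceph client is installed and configured on the host running it, which may not match our exact setup.

{{{#!python
#!/usr/bin/env python3
"""Sketch: quick Ceph health and OSD check of the kind used while
troubleshooting missing OSDs. Assumes a configured ceph CLI."""
import subprocess


def ceph(*args: str) -> str:
    """Run a ceph subcommand and return its text output."""
    result = subprocess.run(["ceph", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Overall cluster health: HEALTH_OK / HEALTH_WARN / HEALTH_ERR.
    print(ceph("health"))
    # Per-OSD view: an empty or all-down tree points at missing OSDs.
    print(ceph("osd", "tree"))
}}}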
**Week Resources**
