Changes between Version 42 and Version 43 of Other/Summer/2023/Distributed


Ignore:
Timestamp:
Jul 10, 2023, 6:01:58 PM (12 months ago)
Author:
krs227
Comment:

Legend:

Unmodified
Added
Removed
Modified
  • Other/Summer/2023/Distributed

    v42 v43  
    5959**Idea Of Our Progress This Week**
    6060
    61    This week we were introduced to the idea of shell scripting as well as Python scripting and worked on some exercises in general. We also did the manual work of replacing the old network switch in the cluster with the new, 1 Gbit 48 port switch and also configured 2 vlans on the switch. After installing the Samsung drives, we noticed that new devices were added along with /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd. After that, we spent some time reading more articles about HPC Storage and about how that generally works. We then learnt about some of the most fundamental file systems that are useful for this project including ZFS, MDADM, as well as RAID.
    62   The RAID (redundant array of independent disks) is a way of storing the same data in different places on multiple hard disks, while the MDADM is a Linux utility used to create, manage, as well as monitor software-based RAID devices. We used the concepts that we then learnt in order to test MDRAID as well as ZFS for disk failure. We did the following: exported /scratch file system (RAID 5) which is built on 3 drives from the nodes to our LXC containers, mounting it there in general. We then checked the I/O performance on /scratch on the file server, and the node, and we finally checked the I/O performance of the NFS mounted file system. Here are some diagrams with some of the statistics of the comparison between local vs remote performance as well as normal versus degraded performance.
     61This week we were introduced to the idea of shell scripting as well as Python scripting and worked on some exercises in general. We also did the manual work of replacing the old network switch in the cluster with the new, 1 Gbit 48 port switch and also configured 2 vlans on the switch. After installing the Samsung drives, we noticed that new devices were added along with /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd. After that, we spent some time reading more articles about HPC Storage and about how that generally works. We then learnt about some of the most fundamental file systems that are useful for this project including ZFS, MDADM, as well as RAID.
     62
     63The RAID (redundant array of independent disks) is a way of storing the same data in different places on multiple hard disks, while the MDADM is a Linux utility used to create, manage, as well as monitor software-based RAID devices. We used the concepts that we then learnt in order to test MDRAID as well as ZFS for disk failure. We did the following: exported /scratch file system (RAID 5) which is built on 3 drives from the nodes to our LXC containers, mounting it there in general. We then checked the I/O performance on /scratch on the file server, and the node, and we finally checked the I/O performance of the NFS mounted file system. Here are some diagrams with some of the statistics of the comparison between local vs remote performance as well as normal versus degraded performance.
    6364
    6465We used the DD command to do the I/O scripting. After that, we captured an installation image on FOG after Debian 12 was installed in node03, and we learnt about the GRUB software for recovery purposes.