Changes between Version 54 and Version 55 of Other/Summer/2023/Distributed


Timestamp:
Aug 7, 2023, 6:31:18 PM (16 months ago)
Author:
krs227
  • Other/Summer/2023/Distributed

    v54 v55  
    7272This week we were introduced to shell scripting and Python scripting and worked through some exercises. We also physically replaced the old network switch in the cluster with the new 48-port 1 Gbit switch and configured 2 VLANs on it. After installing the Samsung drives, we saw three new devices, /dev/sdb, /dev/sdc, and /dev/sdd, appear alongside the existing /dev/sda. We then spent some time reading articles about HPC storage and how it works, and learned about the storage technologies most relevant to this project: ZFS, MDADM, and RAID.
    7373
     74Before installation output:
     75hostadm@node04:~$ lsblk
     76NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
     77sda     8:0     0 447.1G  0 disk
     78├─sda1   8:1    0   512M  0 part /boot/efi
     79├─sda2   8:2    0 445.7G  0 part /
     80└─sda3   8:3    0   977M  0 part [SWAP]
     81sr0     11:0    1   738M  0 rom
     82
     83After installation output:
     84hostadm@node05:~$ lsblk
     85NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
     86sda     8:0     0 447.1G  0 disk
     87├─sda1   8:1    0   512M  0 part /boot/efi
     88├─sda2   8:2    0 445.7G  0 part /
     89└─sda3   8:3    0   977M  0 part [SWAP]
     90sdb     8:16   0 465.8G  0 disk
     91sdc     8:32   0 465.8G  0 disk
     92sdd     8:48   0 465.8G  0 disk
     93sr0     11:0    1   738M  0 rom
     94
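The comparison of the two `lsblk` listings can also be scripted. The following is a minimal sketch in Python, assuming the two listings above are pasted in as strings (in the log they come from node04 and node05 respectively); the helper names `disks` and `new_disks` are hypothetical and not part of the project.

```python
# Illustrative sketch: compare two `lsblk` listings and report new whole disks.
# The listings are copied from the log above; the helper names are hypothetical.

BEFORE = """\
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda     8:0     0 447.1G  0 disk
├─sda1   8:1    0   512M  0 part /boot/efi
├─sda2   8:2    0 445.7G  0 part /
└─sda3   8:3    0   977M  0 part [SWAP]
sr0     11:0    1   738M  0 rom
"""

AFTER = """\
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda     8:0     0 447.1G  0 disk
├─sda1   8:1    0   512M  0 part /boot/efi
├─sda2   8:2    0 445.7G  0 part /
└─sda3   8:3    0   977M  0 part [SWAP]
sdb     8:16   0 465.8G  0 disk
sdc     8:32   0 465.8G  0 disk
sdd     8:48   0 465.8G  0 disk
sr0     11:0    1   738M  0 rom
"""


def disks(lsblk_text):
    """Return {name: size} for every row whose TYPE column is 'disk'."""
    found = {}
    for line in lsblk_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINTS]
        if len(fields) >= 6 and fields[5] == "disk":
            found[fields[0]] = fields[3]
    return found


def new_disks(before, after):
    """Return the disks that appear in `after` but not in `before`."""
    old = disks(before)
    return {name: size for name, size in disks(after).items() if name not in old}


if __name__ == "__main__":
    for name, size in sorted(new_disks(BEFORE, AFTER).items()):
        print(f"/dev/{name}  {size}")
```

On a live node, the same text could be captured by running `lsblk` before and after the drives are installed.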
    7495RAID (redundant array of independent disks) is a way of storing the same data in different places on multiple hard disks, while MDADM is a Linux utility used to create, manage, and monitor software-based RAID devices. We applied these concepts to test MDRAID and ZFS under disk failure. We exported the /scratch file system (RAID 5, built on 3 drives) from the nodes to our LXC containers and mounted it there. We then measured I/O performance on /scratch on the file server and on the node, and finally on the NFS-mounted file system. Here are some diagrams comparing local versus remote performance and normal versus degraded performance.
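To illustrate why a 3-drive RAID 5 array keeps working in degraded mode, here is a minimal Python sketch of the XOR parity idea. It is a simplification, not how MDRAID is implemented: it models a single stripe with two data blocks and one parity block, ignores the way real RAID 5 rotates parity across drives, and the function names are hypothetical.

```python
# Illustrative sketch of RAID 5 style parity on one stripe across 3 drives.
# Simplified: real RAID 5 rotates the parity block across the drives.
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(d0, d1):
    """Lay out two data blocks plus their parity block, one per drive."""
    return [d0, d1, xor_blocks(d0, d1)]


def rebuild(stripe, failed):
    """Recompute the block on the failed drive from the two survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(*survivors)


if __name__ == "__main__":
    stripe = make_stripe(b"HPC ", b"data")
    for failed in range(3):
        # Whichever single drive is lost, XOR of the others restores it.
        assert rebuild(stripe, failed) == stripe[failed]
    print("any single block in the stripe can be rebuilt from the other two")
```

Because every read of a lost block needs this extra XOR over the surviving drives, a degraded array is generally expected to perform worse than a healthy one, which is the effect the normal versus degraded comparison measures.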
    7596
     
    127148   - Ansible Management LXC Setup: https://linuxcourse.rutgers.edu/Winlab_Internship_2023/html/Ansible_installation.html
    128149
    129 == **Week 6**
     150== **Week 7**
    130151**Summary**
    131    - Ceph one node manual installation
    132    - Created lxc Ansible containers
    133    - Worked with Ansible playbooks
     152   - Added cluster networks to, and removed them from, our Ceph nodes
     153   - Enabled the second network interface in our node
     154   - Added the cluster network in the configuration
     155   - Manually added the new node to the Ceph cluster
     156   - Developed Ansible playbooks to automate the process
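
As a rough illustration of the cluster network step, the sketch below renders a `[global]` fragment using Ceph's `public_network` and `cluster_network` options. The log does not say how the setting was applied or which subnets were used, so the subnets shown and the helper name `render_ceph_networks` are assumptions made only for this example.

```python
# Illustrative sketch: render a ceph.conf style [global] fragment that
# separates the public and cluster networks. Subnets are hypothetical.
import configparser
import io


def render_ceph_networks(public_network, cluster_network):
    """Return an INI fragment with the two Ceph network options set."""
    cfg = configparser.ConfigParser()
    cfg["global"] = {
        "public_network": public_network,    # client and monitor traffic
        "cluster_network": cluster_network,  # OSD replication traffic
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical subnets, one per network interface on the node.
    print(render_ceph_networks("192.168.1.0/24", "10.0.0.0/24"))
```

A fragment like this could be templated out by an Ansible playbook so that every node receives the same network settings.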
    134157
    135158**Idea Of Our Progress This Week**
    136159This week, we spent most of our time on the manual installation of Ceph. To do this, we set up the manager and monitor services, followed by the dashboard, a storage pool, and the CephFS file system. After that, we went over how to write Ansible tasks in YAML scripts, also known as Ansible playbooks, and created playbooks to fix hostnames. Finally, we set up the LXC Ansible container so that we could test further configurations. We used passwordless sudo, deployed Ceph on one node, and created an Ansible playbook to mount CephFS.
     160
     161[[Image(Ceph diagram.png)]]
    137166
    138167== **Week 10**
     
    154183- Week 5 (July 3 - July 7): https://docs.google.com/presentation/d/1-GlYdTUVjstotg2bEqquZz5WvsGXw3R0301GuNqOkq0/edit#slide=id.p
    155184- Week 6 (July 10 - July 14): https://docs.google.com/presentation/d/1-GlYdTUVjstotg2bEqquZz5WvsGXw3R0301GuNqOkq0/edit#slide=id.p
    156 - Week 7 (July 17 - July 21):
    157 https://docs.google.com/presentation/d/11OnqS8HUQT1ipjDuqICwsVQrfv0_AZFqHEjgwRpKNZw/edit#slide=id.g25ab189147e_3_5
     185- Week 7 (July 17 - July 21): https://docs.google.com/presentation/d/11OnqS8HUQT1ipjDuqICwsVQrfv0_AZFqHEjgwRpKNZw/edit#slide=id.g25ab189147e_3_5
    158186- Week 8 (July 24 - July 28): [Slides]
    159187- Week 9 (July 31 - August 4): https://docs.google.com/presentation/d/1-mlnMaOQyuYR4uzl357ZnpLlohKaLmN6NtvcZnTCSzU/edit#slide=id.p