James is working on a second generation inventory script.

Currently on: reorganising code

Versions: Gather: 0.87, Writer: 0.9

Its plan is to be simpler and less ambitious than its predecessor, but still respect the sql table structure ("as much as possible"). As I see it there should be 3 parts to this script:

 1. executer: runs either wget or apt-get install, copies the latest version of the other parts of the script, then executes them
 1. gatherer: collects information using only operating system based facilities (dmesg, lsusb, lspci, ifconfig, /sys).
 1. writer: checks the mysql repository for changes from the current state. If different, changes them.

'''NOTE''' Gatherer and Writer are being merged into a single file which I will call inventory2_client.rb

The sql structure is a bit of a mess; the major tables of interest are:

 1. motherboards - list of things that can be connected to; has its own id used to tie other tables to it
 1. devices - list of devices "connected" to motherboards
 1. device_kinds - type identifier for connected devices (an attribute of a device).
 1. locations - converts x,y coordinates to a single integer that maps directly to a motherboard.
 1. inventories - records the start and stop time of the inventory pass.
 1. testbeds - gives a testbed id for the specific domain, thus disambiguating node1-1

A lot of the tables are full of unused columns. I guess we'll just ignore them for now.

The basic crux of an update should be the following:

 1. Examine our IP to determine our current location.
 1. Gather information about the motherboard.
 1. '''Gatherer:'''
   1. Disk size (dmesg)
   1. Memory size (dmesg)
   1. Cpu number (dmesg)
   1. Gather information about attached devices:
     * 2 wired Ethernet addresses (ifconfig, /sys)
     * 2 wireless Ethernet addresses (ifconfig, /sys)
     * any usb devices (lsusb, /sys)
 1. '''Writer:'''
   1. get the motherboard id from the location table
   1. update motherboard information if different, and stamp with the current inventory number
   1. add kinds if they don't exist already
   1. update devices if different and stamp with the inventory number
   1. profit.

----
== Required Tools / Libraries ==

 1. lsusb (usbutils.deb)
 1. lspci (native)
 1. dmesg (native)
 1. ifconfig (native)
 1. libxml-simple-ruby.deb
 1. libmysql-ruby.deb
 1. lshw (lshw.deb)

----
Gatherer:

The disk size and memory size are a quick scan from dmesg. The disk size matches, but the memory size is a little off. It probably has to do with the way dmesg reports memory vs the way /sys reports memory. It would be nice to find the /sys entry for consistency.

In /sys/devices/pci0000:00 are the subdirectories correlated with the specific Ethernet hardware. In each directory that correlates to an Ethernet device there will be a symbolic link with the operating system name of the device. This allows us to match up the pci address (the name of the subdirectory of /sys/devices/pci0000:00) to the mac address (from ifconfig). lspci can tell us the associated pci address and a hardware identifier string.

lsusb, on the other hand, offers a direct correlation to the device_kinds table: the ordered pair of numbers xxxx:yyyy correlates directly to the table's vendor and device ids, and the Bus xxx Device yyy number fits into the address category of the devices table.

=== 9/29/09 ===

I may have discovered the cause of the device / vendor discrepancy. Joe seems to be looking at /sys/class/net/devicename/device... perhaps this points to a different device id. I'll have to check it out.
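The lsusb correlation described above can be sketched in Ruby. This is a minimal sketch, assuming lsusb's standard one-line-per-device format; the function name and hash keys are illustrative, chosen to mirror the table columns:

```ruby
# Parse one line of `lsusb` output into the fields of interest.
# Expected input shape (an assumption):
#   Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
def parse_lsusb_line(line)
  m = /^Bus (\d+) Device (\d+): ID (\h{4}):(\h{4})\s*(.*)$/.match(line)
  return nil unless m
  bus, dev, vendor, device, desc = m.captures
  {
    'bus_add' => "#{bus}:#{dev}",  # Bus xxx Device yyy -> address column
    'vendor'  => vendor,           # xxxx half of xxxx:yyyy -> vendor id
    'device'  => device,           # yyyy half -> device id
    'str'     => desc.strip       # human-readable identifier string
  }
end
```

Running this over `lsusb` output line by line would yield one hash per attached device, ready for comparison against the devices / device_kinds tables.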
That being said, I have a working Gatherer prototype:

{{{
ssugrim@external2:~/scripts$ ruby gatherer.rb
ssugrim@external2:~/scripts$ more /tmp/external2.xml
<10.50.0.12 iface='eth1' host='external2.orbit-lab.org'/>
<127.0.0.1 iface='' host=''/>
<0 device='0001' bus_add='001:001' str='Linux Foundation 1.1 root hub' vendor='1d6b'/>
ssugrim@external2:~/scripts$
}}}

----
=== 10/2/09 ===

Minus error checking for failed commands, the gatherer is complete. I'm now moving on to the writer. I'm going to keep them in the same script for now, so I don't have to deal with re-importing the data and extracting it from xml; at some point that'll be a todo, so that we can call just the gatherer if we want to.

For now, I need to determine what node I am based on the resolved host name. The scheme is nodex-y.testbedname. I can extract the x and y coordinates from the node part, and then the testbed name will have to be a lookup (this should probably be in gatherer as parameters). Once I have that I can look up my unique mysql id from the mysql database. This id will then allow me to correlate devices with the ones I have.

----
Following the instructions on http://support.tigertech.net/mysql-duplicate I copied the mysql database from inventory1 to inventory2. One caveat is noted on http://forums.digitalpoint.com/showthread.php?t=259486

{{{
In the top of the database file you are trying to dump you will see that :

CREATE DATABASE `gunit_pimpjojo` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

Just remove this from the dump ( Notepad or wherever you have the dump)
Then re paste the file
You just need to remove that line....and you will be good to go
}}}

=== 10/5/09 ===

I've revamped the internal data types to facilitate the way xml-simple outputs and re-imports. Any multi-argument results (eth, usb, ip) return an array of hashes. This creates clean xml. I also unfolded the cords hash into single instance variables; they all get wrapped up into a single attribute.
The new xml format looks like so.

{{{
}}}

I've also gone back to the original two script model. Gatherer is "feature complete".

----
Working on the writer, I've created an internal data type called Xmldata; it's got exactly the same fields as Info, but populates them from the generated xml file. For the mysql part I have to examine the code that lives in

{{{
ssugrim@internal1:/opt/gridservices2-2.2.0/lib/ogs_inventory$
}}}

NOTE: mysql query strings should be crafted prior to the actual execution of the query, since they don't always form the way you think they do. Also, the %&string& formulation of strings is very helpful in getting the quotes correct.

=== 10/08/09 ===

The writer script is now equipped with two classes, Xmldata and Identify. Both can only be instantiated by the create command, making them singletons (create will only run new if an instance does not already exist). Identify instantiates an Xmldata class, and then uses the x and y coordinates and the domain to determine a location id (the globally unique identifier that ties all the tables together). I also get the max id from the inventory ids, assuming that the highest number is the latest.

=== 10/12/09 ===

Quick edit to gatherer to convert the device and vendor tags to decimal instead of hex. The reason they didn't match before was because in the sql database they are stored as decimal (I guess because you can't store hex in mysql).

=== 10/18/09 ===

Writer is "feature complete". The main (non-data) class is Check_sql. Besides new, its main methods are check and update. They respectively compare the xmldata against sql and update the database if the data doesn't match. I'd like to be more "independent" of the form of the xmldata, but that would involve a lot more dummy variables and searching of hashes.

Big TODO is mostly rescuing errors. First on the list is connect retries. Class interface descriptions to follow soon.

=== 10/20/09 ===

Modified both gatherer and writer to take parameters.
The parameters are as follows:

{{{
Writer:
--server = #server hostname (default: internal1.orbit-lab.org)
--user = #username for mysql
--pass = #password
--db = #database name (default: inventory2)
--input = #input file name (default: foo.xml)

Gatherer:
--output = #name of output file (default: stdout)
}}}

Also, writer now only checks vendor and device id. If no match is found it will add it with the description string.

----
=== 10/26/09 ===

Modifying gatherer to use lshw to collect the uuid (motherboard serial number), and also changing the internal data types to more closely match the table contents, e.g. devices and motherboards. Originally I thought to just use lshw to gather most of the information, but this doesn't really gain us anything, since I would have to resort to the other tools (lsusb and lspci) to find the relevant entries in the lshw output (and it would require a huge rewrite of the gatherer).

Lshw can output in a few different ways. I'm currently using the direct line by line approach to search for the uuid. I did however experiment with the -xml output. When imported with !XmlSimple.xml_in(), we get a massive hash/array structure that contains all the data elements as either a value in a hash or an item in an array. To find what we're looking for we need a recursive tool to extract the relevant data structures. An example was written up in ''lshw_recursion_example.rb''. The main recursive tool keeps calling the each method of every sub-element (hashes and arrays both have an each method, but they behave differently, thus a check for class is needed first).

One snag that was "hacked" around: if we find the keyword in an array and push the containing data type, all we get is an array with that keyword. I resolved this by passing the previous data structure as a parameter. If the keyword is found in a hash I store the hash; if it's found in an array, I store the previous hash.

I opted to hunt for a list of words instead of a single one. It's more efficient than iterating across the entire structure multiple times for each word. We don't iterate through the words for every element, just the ones that are the termination of a branch. This saves a lot of computation and prevents a few type errors. It's assumed that the word list is significantly smaller than the size of the hash.

Some example code:

{{{
found = Hash.new()

def hunt (dat, words, found, prev = nil)
  #check the type
  if dat.kind_of?(Array)
    dat.each do |v|
      #iterate over the current type and check for an instance of the words
      words.each {|w| found.store(w, prev) if /#{w}/.match(v)} if v.kind_of?(String)
      #recursively call the function on the children of this data structure
      #note the parent is passed as a parameter, as the array branch needs to store the container
      hunt(v, words, found, dat)
    end
  elsif dat.kind_of?(Hash)
    dat.each do |k, v|
      #same deal as the array, except we have a key,value combo and can store the current
      #data structure. We still need to pass the parent as a parameter since we don't know
      #what type the child is
      words.each {|w| found.store(w, dat) if /#{w}/.match(v)} if v.kind_of?(String)
      hunt(v, words, found, dat)
    end
  end
end
}}}

----
=== 11/4/09 ===

I'll need to revisit the use of recursion for lshw. I have some working ideas on how to do it. Ivan suggests multi-tier iteration where I hunt for keywords following some kind of "path of keywords". Using the "hunt" multiple times with a sequence of keywords (examining keys as well as values), we should be able to iteratively extract smaller and smaller data structures that contain more relevant information.
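A minimal, self-contained sketch of that "path of keywords" idea, assuming XmlSimple-style nested hashes and arrays. The names (`hunt_one`, `hunt_path`) and the sample structure are illustrative, not the actual lshw output:

```ruby
# Hunt for a single keyword; return the hash that contains it.
# As described above: an array match stores the *previous* container (prev),
# while a hash match stores the hash itself.
def hunt_one(dat, word, prev = nil)
  if dat.kind_of?(Array)
    return prev if dat.any? { |v| v.kind_of?(String) && /#{word}/.match(v) }
    dat.each do |v|
      r = hunt_one(v, word, dat)
      return r if r
    end
  elsif dat.kind_of?(Hash)
    dat.each do |k, v|
      return dat if v.kind_of?(String) && /#{word}/.match(v)
      r = hunt_one(v, word, dat)
      return r if r
    end
  end
  nil
end

# Follow a sequence of keywords, narrowing the scope at each step so
# later searches only touch smaller and smaller sub-structures.
def hunt_path(dat, words)
  words.inject(dat) { |scope, w| scope && hunt_one(scope, w) }
end
```

Each keyword in the sequence shrinks the search space before the next one runs, which is the multi-tier iteration Ivan suggested.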
More immediately, here are the changes that need to be made to writer to reflect the table structure in mysql:

 * Need to get the motherboard id from the table, matching against the serial number
 * Update node to correlate motherboard to location (when they move)
 * motherboard updates should only modify disk and memory (the motherboard id should not change)
 * If a motherboard is not found then we insert it.
 * should get the node id from the sql table, matching against location

----
=== 11/17/09 ===

Modifications on writer have been completed (preliminary checks worked).

 * Reverted Db.hupdate to only update. The calling functions should decide whether to insert or update.
 * Mb_id now checks against serial instead of location in the Identify class.
 * update_mb now checks for mb_id. If the id is present it will update the record; otherwise it will insert WITHOUT specifying an id, since SQL should autoincrement the ids.
 * Nodes are uniquely identified by a triple of (node_id, location_id, motherboard_id). It's assumed that the (node_id, location_id) portion is constant. Thus the only change/update we should check for and perform is to ensure that the motherboard id matches for a given (node_id, location_id) pair. The update_node function only modifies motherboard_id's.

Things that need to be done:

 * Move all the "checks" into the get methods (deprecate the get methods). check() should simply call the sub_check methods and retain a hash of matches for specific data structures (table entries).
 * update can then iterate that hash and call the respective update functions for each given table.
 * To that end, the update_device method needs to be split in two to reflect the data structures.
 * The data structure design paradigm should be to have one data structure for each table that needs to be checked / updated. It's pretty close to this, but needs a little work.
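That check/update paradigm could be sketched like this. This is a hypothetical skeleton, not the actual Check_sql code; the table names and lambdas are stand-ins for the real per-table sub_check and update methods:

```ruby
# One check per table; check() collects mismatches into a hash keyed by
# table name, and update() dispatches each mismatch to its updater.
class CheckSql
  def initialize(checks)
    # checks: { 'motherboards' => lambda returning a diff hash or nil, ... }
    @checks = checks
  end

  # Run every per-table check, keeping only the tables that differ.
  def check
    @checks.inject({}) do |diffs, (table, chk)|
      d = chk.call
      diffs[table] = d if d
      diffs
    end
  end

  # Iterate the mismatch hash and apply the matching update function.
  def update(diffs, updaters)
    diffs.each { |table, d| updaters[table].call(d) }
  end
end
```

With this shape, splitting update_device in two just means registering two entries (one per table) in the checks and updaters hashes.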