Linux Cluster Builder - Bare Metal Red Hat & SUSE
teach probepal how to recognize SLES 15 Package media.
teach probepal how to recognize the SLES 15 SP1 installer ISO, because SLES puts spaces in version numbers
add probe name to the list of attributes in a palletinfo object
make pallets.cgi return os and arch information for pallets. Refactor pallets.cgi to use probepal
Fix frontend-install.py to detect package existence accurately with zypper
Create blank versions of some apache conf files so package updates won't add them back
Fix regression in load storage partition
Support for MTU in SLES
Change pallet fingerprinting for sles11/12 to match the rest of stacki
The version field is now the major version number concat'd with the service pack.
The release field is now the distro concat'd with the major version number.
NAME   VERSION              RELEASE ARCH   OS   BOXES
stacki 5.4_20191118_ed3441d sles12  x86_64 sles default frontend
SLES   12sp3                sles12  x86_64 sles -----
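The fingerprinting rule above can be sketched in Python. This is a minimal illustration, not probepal's actual code: the helper name `sles_fingerprint` is hypothetical, and it assumes the media reports its version as a string like "12 SP3" (with the space SLES uses). Treating a release with no service pack as just the major version is also an assumption.

```python
import re
from typing import Tuple

# Hypothetical helper: derive (version, release) for a SLES pallet.
# Accepts "12 SP3", "12SP3", "15 sp1", or a bare "15".
_VERSION_RE = re.compile(r"\s*(\d+)(?:\s*SP\s*(\d+))?\s*", re.IGNORECASE)


def sles_fingerprint(distro: str, version_string: str) -> Tuple[str, str]:
    match = _VERSION_RE.fullmatch(version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    major, service_pack = match.group(1), match.group(2)
    # version: major version concat'd with the service pack, e.g. "12sp3"
    # (assumption: no service pack means just the major version)
    version = f"{major}sp{service_pack}" if service_pack else major
    # release: distro concat'd with the major version, e.g. "sles12"
    release = f"{distro.lower()}{major}"
    return version, release


if __name__ == "__main__":
    print(sles_fingerprint("SLES", "12 SP3"))  # ('12sp3', 'sles12')
```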
Use the same starting node for stackios as frontend-install.py
use get() instead of indexing into the attributes dictionary, so a stackios frontend install doesn't crash
add a stack sync config. This is needed in stackios (at least to fix the hosts file). TODO: better place for this line...
Explicitly set the permissions in 'add pallet', to make them compatible with apache - this was broken in stackios
stackios needs to set the hostname attr explicitly in site.attrs
Fix logrotate config not being laid down during install
add pallets to the frontend box in StackiOS
remove the very old create keys command. Use ssh-keygen instead.
don't apply pallet patches when creating jumbo pallets
Remove/replace unnecessary calls into stack.file
Add pallet by network path should normalize the path ('//' -> '/') (and then actually use the result)
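One way to do the '//' -> '/' normalization, sketched in Python: collapse repeated slashes in the URL path only, so the `//` after the scheme survives. `normalize_pallet_url` is a hypothetical name, not the function 'add pallet' uses.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_pallet_url(url: str) -> str:
    """Collapse runs of '/' in the path, leaving 'scheme://host' untouched."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(normalize_pallet_url("http://10.1.1.1//install//pallets/"))
    # http://10.1.1.1/install/pallets/
```

The function returns a new string; the caller still has to use that return value, which is the second half of the fix.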
Don't throw away wget output in 'add pallet', in case we have an error
On CentOS paths for mariadb RPM are borked
pip2src sometimes lost bootstrap for deps
If multiple packages shared a dependency but only one package was set for bootstrap, sometimes the dependencies would not bootstrap.
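A sketch of one way to avoid the shared-dependency problem: compute the bootstrap set as the transitive closure of the packages marked for bootstrap, independently of the order in which packages are processed. The dependency map and the function name `bootstrap_closure` are hypothetical; this is not pip2src's actual code.

```python
from typing import Dict, Iterable, List, Set


def bootstrap_closure(deps: Dict[str, List[str]], bootstrap: Iterable[str]) -> Set[str]:
    """Return the marked packages plus all of their transitive dependencies."""
    needed: Set[str] = set()
    pending = list(bootstrap)
    while pending:
        pkg = pending.pop()
        if pkg in needed:
            continue  # already handled; also guards against dependency cycles
        needed.add(pkg)
        pending.extend(deps.get(pkg, []))
    return needed


if __name__ == "__main__":
    deps = {"flask": ["jinja2", "click"], "cli-tool": ["click"], "jinja2": ["markupsafe"]}
    # "click" is shared, but only "cli-tool" is marked for bootstrap.
    print(sorted(bootstrap_closure(deps, ["cli-tool"])))  # ['cli-tool', 'click']
```

Because the result depends only on the marked set, a shared dependency first reached through a non-bootstrap package cannot be skipped.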
Keep Docker on CentOS7
The 'latest' image tag is now CentOS8 and we are not ready for that.
add host sets the boot action to os
remove test for state that doesn't happen
Make pallet patching work for any pallet, more extendable, and available outside of 'add pallet'
Add example partition spreadsheets for uefi based hosts
Rework report system partition test to work on RedHat and SLES
Added TAB completion to stack command.
Add port forwarding to cluster-up
You can now pass a new --forward-ports=SRC:DST[,...] flag to the cluster-up.sh command to forward ports from the host (SRC) to the guest VM (DST). Multiple comma-separated SRC:DST pairs can be passed in the flag.
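The flag's value format can be illustrated with a small Python parser. cluster-up.sh is a shell script, so this only sketches the SRC:DST[,...] syntax and is not its implementation; `parse_forward_ports` is hypothetical and the 1-65535 range check is an assumption.

```python
from typing import List, Tuple


def parse_forward_ports(value: str) -> List[Tuple[int, int]]:
    """Parse 'SRC:DST[,SRC:DST...]' into (host_port, guest_port) pairs."""
    pairs = []
    for item in value.split(","):
        src, sep, dst = item.strip().partition(":")
        if not sep or not src.isdigit() or not dst.isdigit():
            raise ValueError(f"expected SRC:DST, got {item!r}")
        host_port, guest_port = int(src), int(dst)
        if not (1 <= host_port <= 65535 and 1 <= guest_port <= 65535):
            raise ValueError(f"port out of range in {item!r}")
        pairs.append((host_port, guest_port))
    return pairs


if __name__ == "__main__":
    print(parse_forward_ports("8080:80,2222:22"))  # [(8080, 80), (2222, 22)]
```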
Allow json data to be posted directly via the webservice
Allow non-frontend parents to be time servers
Also allow for non management of time
Remove the sync.hosts attribute
Must use manage.hostfile and sync.hostsfile from now on.
Add parameter to stack load to have it run the commands
Fine-grained management of /etc/hosts
Attributes
manage.hostfile - the file is created during installation (default is False/None)
sync.hostsfile - the file is created during sync host network if manage.hostsfile is also true (default is False/None)
Transitioning away from the sync.hosts attribute, which controlled both of these as all or nothing.
For Teradata the default will be managed.hostsfile=True and sync.hostsfile=False, hence the above change.
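The interaction between the two attributes can be sketched as a small decision function. It uses the attribute names as spelled in the list above (manage.hostfile, sync.hostsfile), assumes values arrive as strings or None, and the set of accepted "true" spellings is an assumption. `hosts_file_actions` and `_truthy` are hypothetical helpers, not stacki's code.

```python
from typing import Dict, Optional


def _truthy(value: Optional[str]) -> bool:
    # Unset (None) counts as False, matching the documented defaults.
    return value is not None and str(value).strip().lower() in ("true", "yes", "on", "1")


def hosts_file_actions(attrs: Dict[str, Optional[str]]) -> Dict[str, bool]:
    manage = _truthy(attrs.get("manage.hostfile"))
    sync = _truthy(attrs.get("sync.hostsfile"))
    return {
        "write_during_install": manage,
        # sync.hostsfile only has an effect when the file is managed at all
        "write_during_sync_host_network": manage and sync,
    }
```

With manage True and sync False, the file is written at install time and left alone by sync host network.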
Stop managing the ssh host keys
This means a re-install WILL change the ssh host keys.
'stack load' includes pallet tags (if pallet exists)
Create a stack-templates package for jinja templates. Templatize 'report.named'.
Add os and environment support to stack load
add 'prefix' attribute to 'stack:report' SUX tags. This allows 'report' commands outside of 'stack report more stuff'
during sync host network, also sync /etc/hosts, but only if the attr sync.hosts=True
Also sync /etc/resolv.conf during sync host network. Cleanup seemingly superfluous 'report' calls
Add a new "external" appliance for unmanaged hosts
Relocate the host interface alias commands
The [add|list|remove] host alias commands are now located at [add|list|remove] host interface alias to better reflect that the command actually attaches the alias to a host interface.
Port to Red Hat 7.6.1810
Halt the install (sles) with an error message on the console and in the message queue if we are unable to create a RAID. To override, set the attribute 'halt_install_on_error=False'.
Add firmware management framework to stacki
Adds a set of stacki commands to manage firmware for devices. The initial set of supported hardware includes Dell x1052 switches and Mellanox m7800 and m6036 infiniband switches.
report system checks if the backend installs succeeded
Improved cluster data output in report system
The stack report system command now outputs a section at the end of the output containing all the list data collected during the commands. I also added data for list cart because it seemed to be missing.
To use this in other tests, simply include the new report_output fixture and call it inside your test function, passing it a "title" for the output and the output itself, both as strings.
Refactor storage partition commands to use scope
There are now scope-level commands for storage partition:
stack add [appliance, environment, host, os] storage partition
stack list [appliance, environment, host, os] storage partition
stack remove [appliance, environment, host, os] storage partition
These versions of the commands operate on the global scope:
stack add storage partition
stack list storage partition
stack remove storage partition
/etc/resolv.conf improved handling
The resolv.search attribute can override the search line.
Only the Frontend has access to the Kickstart_PublicDNSServers. For all other nodes the Frontend will be the first nameserver IFF it is serving DNS for any network (this was previously the case as well), but the subsequent nameserver lines will come only from the Kickstart_PrivateDNSServers.
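The nameserver ordering described above, sketched in Python. Assumptions: the Kickstart_*DNSServers values are comma-separated lists, and the Frontend's own file lists only its public servers. `render_resolv_conf` is hypothetical, and stacki's real generator may differ. Blank entries are dropped, so a value of ' ' contributes no nameserver lines.

```python
from typing import List, Optional


def _servers(value: Optional[str]) -> List[str]:
    # Split a comma-separated server list, dropping blanks such as ' '.
    return [s.strip() for s in (value or "").split(",") if s.strip()]


def render_resolv_conf(
    is_frontend: bool,
    frontend_ip: str,
    frontend_serves_dns: bool,
    public_dns: Optional[str],
    private_dns: Optional[str],
    search: Optional[str] = None,
) -> str:
    lines = []
    if search:  # resolv.search overrides the search line
        lines.append(f"search {search}")
    if is_frontend:
        servers = _servers(public_dns)  # only the Frontend sees the public servers
    else:
        # Frontend first IFF it serves DNS for some network, then private servers
        servers = [frontend_ip] if frontend_serves_dns else []
        servers += _servers(private_dns)
    lines += [f"nameserver {s}" for s in servers]
    return "".join(f"{line}\n" for line in lines)
```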
Added checklist service that gives information about backend installation.
This can be invoked via systemd ('systemctl start|stop|status checklist') or as a standalone script via 'export STACKDEBUG=y;/opt/stack/bin/checklist.py'. Installation status is logged to /var/log/checklist.log and is also written to the console in debug mode.
Added support for target,*,1,*, in storage CSV.
This will pair up drives and put them in RAID 1 configurations.
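Pairing drives for that row can be sketched as below, assuming the CSV columns are NAME, SLOT, RAID LEVEL, ARRAY ID and that `*` in SLOT means every discovered drive. Consecutive slots are paired; returning an odd drive out unpaired is an assumption, as is the hypothetical function name `pair_for_raid1`.

```python
from typing import List, Sequence, Tuple


def pair_for_raid1(slots: Sequence[int]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Pair consecutive slots into RAID 1 mirrors; return (pairs, leftovers)."""
    ordered = sorted(slots)
    # zip stops at the shorter slice, so an odd final slot is left out of pairs
    pairs = list(zip(ordered[0::2], ordered[1::2]))
    leftovers = ordered[len(pairs) * 2:]
    return pairs, leftovers
```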
Setting Kickstart_PrivateDNSServers to ' ' removes the nameserver line from /etc/resolv.conf. Also, if no Public and Private DNSServers are defined, don't set up stacki DNS to forward.
Vagrant based magic to have Stacki build itself
NTP Fixes
Support for NTP service across the cluster.
Correct redhat os name for sync host firewall
This fixes a bug in sync host firewall on redhat based hosts. Previously, the condition for calling iptables expected the string rhel instead of redhat, but redhat is the os name set on redhat hosts. As a result, a SLES-only service (stacki-iptables) was called and failed when trying to sync the firewall.
Update 'load networkfile' documentation
print a cleaner error message instead of a stacktrace on XML parsing errors while running 'help'.
Also fix an XML parsing error in 'sync host firmware's docstring
Add/correct more docstrings in dump and load
Add docstrings to each dump command.
Fix permissions of uefi files for the tftp server
When setting up a frontend to serve uefi based hosts on CentOS, the previous behavior was to just copy shim.efi and grubx64.efi from /boot/EFI/centos without changing their file permissions. These files had permissions of 0700, resulting in tftp permission denied errors when hosts tried to retrieve them, causing boot errors. This commit makes these files readable by anyone (0644) after they are copied, so they can be served to hosts via tftp.
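A minimal Python sketch of the fix. `shutil.copy` copies the permission bits along with the data, which is how a 0700 mode carries over; an explicit `chmod` afterwards sets 0644. The function name `install_tftp_file` and the directory layout in the demo are hypothetical.

```python
import os
import shutil
import stat
import tempfile


def install_tftp_file(src: str, dest_dir: str) -> str:
    """Copy a boot file into the tftp tree and make it world-readable."""
    dest = shutil.copy(src, dest_dir)  # also copies src's mode (e.g. 0700)
    os.chmod(dest, 0o644)  # rw-r--r-- so the tftp server can read it
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "shim.efi")
        with open(src, "wb") as f:
            f.write(b"\0")
        os.chmod(src, 0o700)
        dest_dir = os.path.join(tmp, "tftpboot")
        os.mkdir(dest_dir)
        dest = install_tftp_file(src, dest_dir)
        print(oct(stat.S_IMODE(os.stat(dest).st_mode)))  # 0o644
```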
fix bug where we were unable to remove a pallet if the directory contents were already removed.
Only allow valid hostname labels
Validity is defined by RFCs 952 and 1123.
Hostnames in add/set host are validated as one label.
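One reading of those RFCs for a single label, sketched in Python: 1 to 63 characters of letters, digits, and hyphens, starting and ending with a letter or digit (RFC 1123 relaxed RFC 952 to allow a leading digit). This is an illustration and not necessarily the exact regex stacki uses; `is_valid_label` is hypothetical.

```python
import re

_LABEL = re.compile(
    r"""
    [a-z0-9]             # first character: letter or digit
    (?:
        [a-z0-9-]{0,61}  # interior: letters, digits, hyphens
        [a-z0-9]         # last character: letter or digit
    )?                   # optional, so a single character is valid
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_valid_label(name: str) -> bool:
    # fullmatch anchors both ends, so total length is 1 to 63 characters
    return _LABEL.fullmatch(name) is not None
```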
smq-publish was accessing an attributes variable that didn't exist in the exception code path
add cart shouldn't fail if it can't import requests (which is not available early in the barnacle process)
Allow cluster-up to use the local frontend-install.py
If it can't find a local copy, then try to pull it from Github using a branch which matches the version embedded in the ISO name.
Fix conditionally installing RPMs in frontend-install.py
The old code was still including stack-templates in the package list, which needed to be conditional to support older versions of Stacki. I also found a cleaner way to probe if a package is available to install.
Give self-signed certificates a unique serial
Browsers don't like it when your SSL cert has the same serial number (in this case 0) as a previous SSL cert for the same domain. This bites you when you are using localhost as your domain and you reinstall the frontend.
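A sketch of giving each self-signed certificate a random serial. RFC 5280 requires a positive serial of at most 20 octets; a random value below 2**63 satisfies that and makes a repeat across reinstalls very unlikely. The function names, paths, key type, and lifetime are hypothetical; the `openssl req` options used (-x509, -newkey, -nodes, -keyout, -out, -days, -subj, -set_serial) are standard, but the command is only built here, not run.

```python
import secrets
from typing import List


def random_serial() -> int:
    # Uniform in [1, 2**63 - 1]: positive and well under the 20-octet limit.
    return secrets.randbelow(2**63 - 1) + 1


def self_signed_cert_cmd(key_path: str, cert_path: str, common_name: str) -> List[str]:
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048", "-nodes",
        "-keyout", key_path, "-out", cert_path,
        "-days", "365",
        "-subj", f"/CN={common_name}",
        "-set_serial", str(random_serial()),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.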
Added checking to skip repoquery command if the os is sles
Fix report system tests incorrectly using stack.api.Call
print error messages if they come up in 'list help'
Import errors can trip up the overall stack command if they happen in a part of the overall stack command package.
Fix minimal PDperarray value for RAID 50 and RAID 60.
Add stack-templates to server installs (even non-barnacled ones)
Fail if the command being evaled fails
Frontend goes into the frontend box
add check to install script to not break older stacki releases with newer install scripts
Fix a handful of print syntax errors
Load storage controllers into the correct scope
Fix not loading the firewall rule names
const_overwrite attribute processing fix
In sync host boot, a notify=true parameter was being appended to the report host bootfile call, but in that code it was treated as a host.
Always sync /etc/hosts on the Frontend
Don't dump attributes marked 'shadow'
only restart named if a network in the database has dns=true
when running against a host with an interface with no zone, default to using just the hostname
Also fix various places where we assume a host, including the FE, will have a 'domainname' attribute.
Load storage partitions under the correct scope
Properly set the file system type when loading partitions
Fail to get scope mappings if arguments are provided at the global scope
Frontend goes into frontend box
Environment commands fail when no environments are defined
Change default stacki frontend time to be UTC.
Apache appears to need a restart after the barnacle during the Redhat bootable ISO build
Fix SQL error in list attr
In the addGlobalAttrs function, used to generate Kickstart_PrivateHostname and Kickstart_PublicHostname, an SQL statement tested whether the network interface name was null using <>. A comparison with NULL via <> is never true, no matter what the column holds, so it should have been IS NOT NULL instead.
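The difference is easy to demonstrate. Comparing a value with NULL through `<>` evaluates to NULL rather than true, so the WHERE clause filters out every row. This sketch uses Python's built-in sqlite3, which follows the same SQL NULL semantics as MySQL here; the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE networks (node TEXT, device TEXT)")
conn.executemany(
    "INSERT INTO networks VALUES (?, ?)",
    [("frontend-0-0", "eth0"), ("frontend-0-0", None)],
)


def count(where: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM networks WHERE {where}").fetchone()[0]


buggy = count("device <> NULL")      # NULL for every row, so nothing matches
fixed = count("device IS NOT NULL")  # matches the row that has a device name
print(buggy, fixed)  # 0 1
```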
Import sys for the rare error path
Also cleanup some dangling whitespace.
add os to supported parameters
Add support for specifying the os (sles, redhat) when creating a mirror. Defaults to the current os if not specified
fix the infiniband table definitions to actually clean up the database if foreign keys are removed
missed an argument for the error handling code in controller config
jsoncomment needs to be part of the bootstrap in order for the command line to function.
topo.py defaults to localhost
Add command output to web server when command error is raised
This fixes a bug that cropped up when running report system through the web service: the client only received a message that a test failed. It turned out the web service returned only stderr, and only when a CommandError was raised. Now both stderr and stdout are returned. In addition, if a non-sudo command does not raise an exception, its output is checked for valid JSON, and if it is not JSON it is wrapped as JSON instead of the previous behavior of giving a stacktrace.
For the first fix, if wsclient report system was run, the output would be:
{"API Error": "error - One or more tests failed"}
Now with the fix:
{"API Error": "error - One or more tests failed", "Output":"$TESTOUTPUT"}
where $TESTOUTPUT is the results from report system
For the second fix, assume a command doesn't raise an exception and does not output valid JSON.
Before:
"$TESTOUTPUT"
Now:
{"Output": "$TESTOUTPUT"}
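The response shapes above can be sketched as a small wrapper. `wrap_response` is hypothetical and not the web service's actual function; it only reproduces the documented before/after formats.

```python
import json
from typing import Optional


def wrap_response(output: str, error: Optional[str] = None) -> str:
    if error is not None:
        # First fix: include the command output alongside the error message.
        return json.dumps({"API Error": error, "Output": output})
    try:
        json.loads(output)
        return output  # already valid JSON, pass through unchanged
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        # Second fix: wrap plain text instead of returning it raw.
        return json.dumps({"Output": output})
```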
Display an error message on the console and the message queue if we are unable to find any disks on the host.
attempt to umount all partitions on a disk before nuking.
If we are unable to unmount the partitions, raise an error unless attribute 'halt_install_on_error=False'.
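The error-handling policy can be sketched as follows, assuming attribute values are strings and that anything other than 'false' (case-insensitive), including an unset attribute, means halt. `report_install_error`, `InstallHalted`, and the `notify` callback are hypothetical stand-ins for the console and message-queue reporting.

```python
from typing import Callable, Dict, Optional


class InstallHalted(RuntimeError):
    """Raised to stop the install after reporting an error."""


def report_install_error(
    attrs: Dict[str, Optional[str]], message: str, notify: Callable[[str], None]
) -> None:
    notify(message)  # always report, whether or not we halt
    setting = attrs.get("halt_install_on_error") or "true"
    if setting.strip().lower() != "false":
        raise InstallHalted(message)
```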
Correctly identify if a block device is a disk or partition on sles11
Previous code assumed the output of hwinfo would be more consistent, but this does not appear to be the case and could cause a stacktrace that led to storage not being initialized.
Handle nukedisks before we start autoyast. This prevents the situation where the installer hangs because it attempted to mount a disk and could not later nuke it.
Start Chronyd on frontend
Fixed a bug where install messages were not appearing. Also added checklist.log to the logrotate.conf file to rotate at 100MB.
Do interface validation of set interface commands in a case-insensitive manner, to match the DB.
The command is systemctl, not systemd
If an interface 'name' exists, then add it to /etc/hosts
Remove myhostname from nsswitch.conf on Redhat.
It is unneeded because we always add localhost and localhost.localdomain to the /etc/hosts file we generate. Removing it brings Redhat in line with SLES, and prevents Redhat from resolving non-existent .localdomain subdomains to localhost.
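A sketch of the edit itself: drop the myhostname source from the hosts line of nsswitch.conf and leave every other line alone. `drop_myhostname` is hypothetical; stacki may generate the file rather than edit it in place.

```python
def drop_myhostname(text: str) -> str:
    out = []
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        # Only touch the active "hosts:" line; commented lines have key "#hosts".
        if sep and key.strip() == "hosts":
            sources = [s for s in rest.split() if s != "myhostname"]
            line = f"{key}: {' '.join(sources)}"
        out.append(line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```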
Cleanup how hostnames for "run" commands are calculated
Fix permission error when installing host with new interface
When a backend is reinstalled and there is an interface not in the database, the apache user would attempt to do a sync config and cause a permission error to appear in the install logs. This isn't necessary during this stage (it's worked fine until now with that error message) so pass in a flag to skip it.
Tell logrotate it's ok to skip rotating missing logs instead of erroring. Also include local0 in rotation, and make the log rotation size 100M for all logs in stack.conf
The logrotate file is laid down via SUX, so existing frontends can get an approximation of this fix with:
sed -i 's/local\[1/local\[0/' /etc/logrotate.d/stack
sed -i '1s;^;size=100M\nmissingok\n;' /etc/logrotate.d/stack
logrotate /etc/logrotate.conf -d 2>/dev/null && echo "success"
Fix NTP code for frontends.
Allow external users to pxe-boot backends
Also, I made the auto root login for the frontend check that the shell is interactive.
Downgrade flask in the SLES 11 install environment to 0.12.4, before ssl was a requirement
Lots of database schema changes.
There is a new database schema for the storage_partition table. This SQL will update an existing DB, but you will lose your existing partition configurations in the process:
DROP TABLE IF EXISTS storage_partition;
CREATE TABLE storage_partition (
id INT AUTO_INCREMENT PRIMARY KEY,
scope_map_id INT NOT NULL,
device VARCHAR(128) NOT NULL,
mountpoint VARCHAR(128) DEFAULT NULL,
size INT NOT NULL,
fstype VARCHAR(128) DEFAULT NULL,
partid INT NOT NULL,
options VARCHAR(512) NOT NULL,
INDEX (device),
INDEX (mountpoint),
INDEX (device, mountpoint),
FOREIGN KEY (scope_map_id) REFERENCES scope_map(id) ON DELETE CASCADE
);
Added the following tables for firmware management:
DROP TABLE IF EXISTS firmware_mapping;
DROP TABLE IF EXISTS firmware;
DROP TABLE IF EXISTS firmware_model;
DROP TABLE IF EXISTS firmware_make;
DROP TABLE IF EXISTS firmware_imp;
DROP TABLE IF EXISTS firmware_version_regex;
CREATE TABLE firmware_version_regex ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, regex VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, description VARCHAR(2048) NOT NULL, INDEX (name), CONSTRAINT unique_name UNIQUE (name) );
CREATE TABLE firmware_imp ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, INDEX (name), CONSTRAINT unique_name UNIQUE (name) );
CREATE TABLE firmware_make ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, version_regex_id INT DEFAULT NULL, FOREIGN KEY (version_regex_id) REFERENCES firmware_version_regex(id) ON DELETE SET NULL, INDEX (name), CONSTRAINT unique_name UNIQUE (name) );
CREATE TABLE firmware_model ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, make_id INT NOT NULL, imp_id INT NOT NULL, version_regex_id INT DEFAULT NULL, FOREIGN KEY (make_id) REFERENCES firmware_make(id), FOREIGN KEY (imp_id) REFERENCES firmware_imp(id), FOREIGN KEY (version_regex_id) REFERENCES firmware_version_regex(id) ON DELETE SET NULL, INDEX (name), CONSTRAINT unique_make_model UNIQUE (make_id, name) );
CREATE TABLE firmware ( id INT AUTO_INCREMENT PRIMARY KEY, model_id INT NOT NULL, source VARCHAR(2048) NOT NULL, version VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, hash_alg VARCHAR(255) NOT NULL default 'md5', hash VARCHAR(2048) NOT NULL, file VARCHAR(2048) NOT NULL, FOREIGN KEY (model_id) REFERENCES firmware_model(id), INDEX (version), CONSTRAINT unique_model_version UNIQUE (model_id, version) );
CREATE TABLE firmware_mapping ( id INT AUTO_INCREMENT PRIMARY KEY, node_id INT NOT NULL, firmware_id INT NOT NULL, FOREIGN KEY (node_id) REFERENCES nodes(ID) ON DELETE CASCADE, FOREIGN KEY (firmware_id) REFERENCES firmware(id) ON DELETE CASCADE, CONSTRAINT unique_node_firmware UNIQUE (node_id, firmware_id) );
stacki 5.4
Add more tests for hostname validation
Test lengths of 1, 0, max, max+1
Improve regex and documentation of hostname validation
Remove extra case from regex and use case-insensitive flag. Fully describe our definition of a valid hostname. Describe regex with ABNF (inline comments with verbose flag).
Rename hostname validation function for clarity
Revert "NOMERGE: Don't pipe to devnull so I can see the error"
This reverts commit d951ab92fc2b0510459bc5c9dfa2188ec5f99934.
Revert "INTERNAL: For a test-framework run, pull the version of frontend-install.py from the same branch instead of develop"
This reverts commit e0a117e0d947866d347c0828f318d0189aa0a387.
Revert "BUGFIX: Frontend goes into frontend box"
This reverts commit 581c02692c0476befa8659096f36c153c7a1439f.
Merge branch 'release/05.03.00.00' into develop
Merge branch 'release/05.03.00.00'
Allow non-frontend parents to be time servers
Also allow for non management of time
Allow json data to be posted directly via the webservice
Add parameter to stack load to have it run the commands
Remove the sync.hosts attribute
Must use manage.hostfile and sync.hostsfile from now on.
Fine-grained management of /etc/hosts
Attributes
manage.hostfile - the file is created during installation (default is False/None)
sync.hostsfile - the file is created during sync host network if manage.hostsfile is also true (default is False/None)
Transitioning away from the sync.hosts attribute, which controlled both of these as all or nothing.
For Teradata the default will be managed.hostsfile=True and sync.hostsfile=False, hence the above change.
Added support for target,*,1,*, in storage CSV.
This will pair up drives and put them in RAID 1 configurations.
Setting Kickstart_PrivateDNSServers to ' ' removes the nameserver line from /etc/resolv.conf.
Also, if no Public and Private DNSServers are defined, don't set up stacki DNS to forward.
Cleanup how hostnames for "run" commands are calculated
Tell logrotate it's ok to skip rotating missing logs instead of erroring. Also include local0 in rotation, and make the log rotation size 100M for all logs in stack.conf
The logrotate file is laid down via SUX, so existing frontends can get an approximation of this fix with:
sed -i 's/local\[1/local\[0/' /etc/logrotate.d/stack
sed -i '1s;^;size=100M\nmissingok\n;' /etc/logrotate.d/stack
logrotate /etc/logrotate.conf -d 2>/dev/null && echo "success"
Merge branch 'support/05.02.06.x' of github.com:/Teradata/stacki into support/05.02.06.x
Fix test to match new code.
/etc/resolv.conf improved handling
The resolv.search attribute can override the search line.
Only the Frontend has access to the Kickstart_PublicDNSServers. For all other nodes the Frontend will be the first nameserver IFF it is serving DNS for any network (this was previously the case as well), but the subsequent nameserver lines will come only from the Kickstart_PrivateDNSServers.
Do interface validation of set interface commands in a case-insensitive manner, to match the DB.
The command is systemctl, not systemd