Pablo Iranzo Gómez's blog

jul 31, 2017

Magui for analysis of issues across several hosts.

Background

Citellus allows checking a sosreport against known problems identified by the provided tests.

This approach is easy to implement and easy to test, but it has limitations when a problem spans several hosts and only reveals itself when a more general analysis is performed.

Magui tries to solve that by running the analysis functions inside citellus across a set of sosreports, unifying the data obtained per citellus plugin.

At the moment, Magui just groups the data and visualizes it. For example, give it a try with Citellus' seqno plugin to report the sequence number of the Galera database:

[user@host folder]$ magui.py * -f seqno # (filtering for 'seqno' plugins).
{'/home/remote/piranzo/citellus/citellus/plugins/openstack/mysql/seqno.sh': {'ctrl0.localdomain': {'err': '08a94e67-bae0-11e6-8239-9a6188749d23:36117633\n',
                                                                                                   'out': '',
                                                                                                   'rc': 0},
                                                                             'ctrl1.localdomain': {'err': '08a94e67-bae0-11e6-8239-9a6188749d23:36117633\n',
                                                                                                   'out': '',
                                                                                                   'rc': 0},
                                                                             'ctrl2.localdomain': {'err': '08a94e67-bae0-11e6-8239-9a6188749d23:36117633\n',
                                                                                                   'out': '',
                                                                                                   'rc': 0}}}

Here, we can see that the sequence number reported in the logs is the same across the hosts.

The goal, once this has been discussed and determined, is to write plugins that take the raw data gathered by the increasing number of Citellus plugins, apply logic on top of it, and detect issues such as (a rough sketch for the Galera case follows the list):

  • galera seqno
  • cluster status
  • ntp synchronization across nodes
  • etc
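
As a rough, standalone sketch of the kind of cross-host logic such a plugin could apply (this is not Magui's plugin interface, which is still to be defined; it simply reuses Citellus' seqno.sh plugin and the CITELLUS_ROOT/CITELLUS_LIVE variables described in the Citellus post below, and the script name and paths are made up):

#!/bin/bash
# check-galera-seqno.sh (hypothetical): compare the galera seqno reported by
# Citellus' seqno.sh plugin for every sosreport passed as argument.
PLUGIN=~/citellus/citellus/plugins/openstack/mysql/seqno.sh

# Run the plugin once per sosreport; the value is reported on stderr, so keep
# stderr and discard stdout.
seqnos=$(for sosreport in "$@"; do
    CITELLUS_LIVE=0 CITELLUS_ROOT="$sosreport" "$PLUGIN" 2>&1 >/dev/null
done)

# More than one distinct value means the hosts disagree on the sequence number.
if [ "$(echo "$seqnos" | sort -u | grep -c .)" -gt 1 ]; then
    echo "galera seqno differs across hosts" >&2
    exit 1
fi
exit 0

The idea for Magui is to ship this kind of logic as plugins operating on the already unified data, rather than as ad-hoc scripts like this one.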

Hope it's helpful for you! Pablo

PS: We've proposed a talk for the upcoming OpenStack Summit 2017 in Sydney, so if you want to see us there, don't forget to vote at https://www.openstack.org/summit/sydney-2017/vote-for-speakers#/19095


jul 26, 2017

Citellus: framework for detecting known issues in systems.

Background

Since I became a Technical Account Manager for Cloud, and later a Software Maintenance Engineer for OpenStack, I've officially been part of Red Hat Support.

We usually diagnose issues based on data from the affected systems: sometimes from one system and, most of the time, from several at once.

These might be controller nodes for OpenStack, computes running instances, IdM, etc.

In order to make it easier to grab the required information, we rely on sosreport.

Sosreport has a set of plugins for grabbing the required information from the system, ranging from networking configuration, installed packages, running services and processes to, for some components, API checks, database queries, etc.

But that's all it does: data gathering and packaging into a tarball, nothing else.

In OpenStack we've already identified common issues, so we create kbases (knowledge base articles) for them, ranging from covering some documentation gaps to specific use cases or configuration options.

Many times, a missing (although documented) configuration option causes headaches and can be caught with a simple check, like the TTL in Ceilometer or the stonith configuration in Pacemaker.
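
To give an idea of how simple such a check can be: on a live system, something like this (a hedged example using standard Ceilometer option names and config path, not an actual Citellus plugin) is enough to see whether any TTL has been configured at all:

grep -E '^(metering|event)_time_to_live' /etc/ceilometer/ceilometer.conf

If nothing is printed, the options are still at their (commented out) defaults.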

Here is where Citellus comes to play.

Citellus

The Citellus project (https://github.com/zerodayz/citellus/), created by my colleague Robin, aims at creating a set of tests that can be executed against a live system or an uncompressed sosreport tarball (whether a test applies to one, the other or both depends on the test itself).

The philosophy behind it is very simple:

  • There's a wrapper, citellus.py, which allows selecting the plugins to use (or a folder containing plugins), the verbosity, etc., and a sosreport folder to act against.
  • The wrapper checks the available plugins (a plugin can be anything executable on Linux, so bash, python, etc. can be used).
  • It then sets up some environment variables, like the path where the data can be found, and proceeds to execute the plugins, recording their output.
  • The plugins, on their side, determine whether:
    • they should be run or skipped depending on being on a live system or a sosreport
    • they should be skipped because a required file or package is missing
  • Each plugin provides a return code: 0 for success, 1 for failure, 2 for skip, and anything else for an undetermined error.
  • Each plugin provides 'stderr' with relevant messages: the reason to be skipped, the reason for failure, etc.
  • The wrapper then sorts the output and prints it based on the settings (grouping skipped and ok by default) and detailing failures.

You can check the provided plugins on the GitHub repo (and hopefully also collaborate by sending yours).

Our target is to keep plugins easy to write, so we can extend the plugin set as much as possible, highlighting where focus should be put first; once the typical issues are ruled out, deeper analysis can start.

Even if we've started with OpenStack plugins (that's what we do for a living), the software is open to checking whatever is there, and we've reached out to other colleagues in different specialty areas to provide more feedback or contributions to make it even more useful.

As Citellus works with sosreports, it is easy to install it locally and try out new tests.
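
For example, something along these lines (a minimal sketch; the sosreport path is just a placeholder) gets a local copy and runs the shipped plugins against an already extracted sosreport:

git clone https://github.com/zerodayz/citellus/ ~/citellus
~/citellus/citellus.py /path/to/extracted/sosreport/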

Writing a new test

Leading by example is probably easier, so let's illustrate how to create a basic plugin that checks whether a system is a RHV hosted engine:

#!/bin/bash

if [ "$CITELLUS_LIVE" = "0" ];  ## 0 means we're running against a sosreport, not a live system
then
        grep -q ovirt-hosted-engine-ha "$CITELLUS_ROOT/installed-rpms" ## checks if the package is listed
        returncode=$?  # stores the return code (0 if found, 1 if not)
        echo "ovirt-hosted-engine is not installed" >&2 # outputs info for the wrapper
        exit $returncode # returns the code to the wrapper
else
        echo "Running on a live system, skipping" >&2
        exit 2
fi

The above example is a bit 'hacky': we rely on the wrapper not printing the stderr message when the return code is 0, so it should really have another conditional to decide whether to write the output or not.
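
A slightly cleaner variant of the same plugin (a sketch following the conventions above: stderr only carries a message when there is actually something to report) could look like this:

#!/bin/bash

if [ "$CITELLUS_LIVE" = "0" ];
then
        if grep -q ovirt-hosted-engine-ha "$CITELLUS_ROOT/installed-rpms"; then
                exit 0  # package found, nothing to report
        else
                echo "ovirt-hosted-engine is not installed" >&2  # reason for the failure
                exit 1
        fi
else
        echo "Running on a live system, skipping" >&2  # reason for the skip
        exit 2
fi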

How to debug?

The easiest way to do trial and error is to create a new folder for the plugins you want to test and use something like this:

[user@host mytests]$ ~/citellus/citellus.py /cases/01884438/sosreport-20170724-175510/ycrta02.rd1.rf1 ~/mytests/  [-d debug]


DEBUG:__main__:Additional parameters: ['/cases/sosreport-20170724-175510/hostname', '/home/remote/piranzo/mytests/']
DEBUG:__main__:Found plugins: ['/home/remote/piranzo/mytests/ovirt-engine.sh']
_________ .__  __         .__  .__                
\_   ___ \|__|/  |_  ____ |  | |  |  __ __  ______
/    \  \/|  \   __\/ __ \|  | |  | |  |  \/  ___/
\     \___|  ||  | \  ___/|  |_|  |_|  |  /\___ \
 \______  /__||__|  \___  >____/____/____//____  >
        \/              \/                     \/
found #1 tests at /home/remote/piranzo/mytests/
mode: fs snapshot /cases/sosreport-20170724-175510/hostname
DEBUG:__main__:Running plugin: /home/remote/piranzo/mytests/ovirt-engine.sh
# /home/remote/piranzo/mytests/ovirt-engine.sh: failed
    "ovirt-hosted-engine is not installed"

DEBUG:__main__:Plugin: /home/remote/piranzo/mytests/ovirt-engine.sh, output: {'text': u'\x1b[31mfailed\x1b[0m', 'rc': 1, 'err': 'ovirt-hosted-engine is not installed\n', 'out': ''}

That debug information comes from the Python wrapper. If you need more detail inside your test, you can use set -x to have bash show more information about its progress.
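
For example, while developing, near the top of the plugin:

#!/bin/bash
set -x  # trace every command to stderr; remove it once the plugin behaves as expected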

Always keep in mind that, to keep things simple, all the functionality is based on return codes and the stderr messages.

Hope it's helpful for you! Pablo

PS: We've proposed a talk for the upcoming OpenStack Summit 2017 in Sydney, so if you want to see us there, don't forget to vote at https://www.openstack.org/summit/sydney-2017/vote-for-speakers#/19095


feb 23, 2017

InfraRed for deploying OpenStack

InfraRed is a tool that allows installing/provisioning OpenStack. You can find the documentation for the project at http://infrared.readthedocs.io.

Also, developers and users are online on FreeNode in the #infrared channel.

Why InfraRed?

Deploying OSP with OSP-d (TripleO) requires several setup steps for preparation, deployment, etc. InfraRed simplifies them by automating most of those steps and their configuration with Ansible.

  • It allows deploying several OSP versions
  • It eases connecting to the installed VM roles (Ceph, Computes, Controllers, Undercloud)
  • It allows defining working environments, so one InfraRed-running host can be used to manage different environments
  • and much more...

Setup of InfraRed-running host

Setting up InfraRed is quite easy; at the moment version 2 (a branch on GitHub) is working pretty well.

We'll start with the following steps, consolidated into a single copy-and-paste block right after the list:

  • Clone the Git repo: git clone https://github.com/redhat-openstack/infrared.git
  • Create a virtualenv so we can proceed with the installation; later we'll need to source it before each use: cd infrared ; virtualenv .venv && source .venv/bin/activate
  • Proceed with the upgrade of pip and setuptools (required) and the installation of InfraRed:
    • pip install --upgrade pip
    • pip install --upgrade setuptools
    • pip install .
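
Putting the steps above together (run from wherever you want the clone to live):

git clone https://github.com/redhat-openstack/infrared.git
cd infrared
virtualenv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install --upgrade setuptools
pip install .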

Remote host setup

Once done, we need to set up the requirements on the host we'll use for the virtualization; this includes having the system registered against a repository providing the required packages.

  • Register RHEL7 and update:
    • subscription-manager register (provide your credentials)
    • subscription-manager attach --pool= (check pool number first)
    • subscription-manager repos --disable="*"
    • for canal in rhel-7-server-extras-rpms rhel-7-server-fastrack-rpms rhel-7-server-optional-fastrack-rpms rhel-7-server-optional-rpms rhel-7-server-rh-common-rpms rhel-7-server-rhn-tools-rpms rhel-7-server-rpms rhel-7-server-supplementary-rpms rhel-ha-for-rhel-7-server-rpms;do subscription-manager repos --enable=$canal; done

NOTES

  • OSP7 did not provide an RPM-packaged version of the images, so a repository with the images needs to be defined, like:
    • time infrared tripleo-undercloud --version $VERSION --images-task import --images-url $REPO_URL
    • NOTE: the relevant options are --images-task import and --images-url
  • Ceph failed to install unless --storage-backend ceph was provided (there's an open bug for that)

Error reporting

  • IRC or github

RFE/BUGS

Some bugs/RFEs that may get implemented some day:

  • Allow the use of localhost to launch the installation against the local host
  • Multi-environment creation, so several OSP-d versions can be deployed on the same hypervisor with only one of them launched
  • Automatically add --storage-backend ceph when Ceph nodes are defined

Using Ansible to deploy InfraRed

This is something that I began testing to automate the basic setup; it still needs decisions on the version to use and the deployment of the infrastructure VMs, but it does automate part of the hypervisor setup.

---
- hosts: all
  user: root

  tasks:
    - name: Install git
      yum:
        name:
          - "git"
          - "python-virtualenv"
          - "openssl-devel"
        state: latest

    - name: "Checkout InfraRed to /root/infrared folder"
      git:
        repo: https://github.com/redhat-openstack/infrared.git
        dest: /root/infrared

    - name: Initialize virtualenv
      pip:
        virtualenv: "/root/infrared/.venv"
        name:
          - setuptools
          - pip

    - name: Upgrade virtualenv pip
      pip:
        virtualenv: "/root/infrared/.venv"
        name: pip
        extra_args: --upgrade

    - name: Upgrade virtualenv setuptools
      pip:
        virtualenv: "/root/infrared/.venv"
        name: setuptools
        extra_args: --upgrade

    - name: Install InfraRed
      pip:
        virtualenv: "/root/infrared/.venv"
        name: file:///root/infrared/.

This playbook will check out the Git repo, run the extra pip tasks to upgrade the virtualenv's pip and setuptools, and finally install InfraRed itself.
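
To run it against the hypervisor, something like the following should be enough (the playbook filename is just an assumption, and the trailing comma turns the hostname into an inline inventory):

ansible-playbook -i myserver.com, infrared-setup.yaml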

Deploy environment examples

This shows the commands that might be used to deploy some environments, plus some sample timings on a 64 GB RAM host.

Common requirements

export HOST=myserver.com
export HOST_KEY=~/.ssh/id_rsa
export ANSIBLE_LOG_PATH=deploy.log

Cleanup

time infrared virsh --cleanup True --host-address $HOST --host-key $HOST_KEY

OSP 9 (3 + 2)

Define version to use

export VERSION=9

time infrared virsh --host-address $HOST --host-key $HOST_KEY --topology-nodes "undercloud:1,controller:3,compute:2"

real    11m19.665s
user    3m7.013s
sys     1m27.941s

time infrared tripleo-undercloud --version $VERSION --images-task rpm

real    48m8.742s
user    10m35.800s
sys     5m23.126s

time infrared tripleo-overcloud --deployment-files virt --version 9 --introspect yes --tagging yes --post yes

real    43m44.424s
user    9m36.592s
sys     4m39.188s

OSP 8 (3+2)

export VERSION=8

time infrared virsh --host-address $HOST --host-key $HOST_KEY --topology-nodes "undercloud:1,controller:3,compute:2"

real    11m29.478s
user    3m10.174s
sys     1m28.276s

time infrared tripleo-undercloud --version $VERSION --images-task rpm

real    40m47.387s
user    9m14.151s
sys     4m24.820s

time infrared tripleo-overcloud --deployment-files virt --version $VERSION --introspect yes --tagging yes --post yes

real    42m57.315s
user    9m2.412s
sys     4m25.840s

OSP 10 (3+2)

export VERSION=10

time infrared virsh --host-address $HOST --host-key $HOST_KEY --topology-nodes "undercloud:1,controller:3,compute:2"

real    10m54.710s
user    2m42.761s
sys     1m12.844s

time infrared tripleo-undercloud --version $VERSION --images-task rpm

real    43m10.474s
user    8m34.905s
sys     4m3.732s

time infrared tripleo-overcloud --deployment-files virt --version $VERSION --introspect yes --tagging yes --post yes

real    54m1.111s
user    11m55.808s
sys     6m1.023s

OSP 7 (3+2+3)

export VERSION=7

time infrared virsh --host-address $HOST --host-key $HOST_KEY --topology-nodes "undercloud:1,controller:3,compute:2,ceph:3"

real    13m46.205s
user    3m46.753s
sys     1m47.422s

time infrared tripleo-undercloud --version $VERSION --images-task import    --images-url $URLTOIMAGES

real    43m14.471s
user    9m45.479s
sys     4m53.126s

time infrared tripleo-overcloud --deployment-files virt --version $VERSION --introspect yes --tagging yes --post yes     --storage-backend ceph

real    86m47.471s
user    20m2.582s
sys     9m42.577s

Wrapping-up

Please do refer to the InfraRed documentation to dig deeper into its possibilities and, if interested, consider contributing!


feb 20, 2017

Getting started with Ansible

I've started to get familiar with Ansible because, apart from it getting more and more accepted for OSP-related tasks and installation, I wanted to automate some tasks needed to set up some servers for the OpenStack group I work for.

First of all, it's recommended to get the latest version of Ansible (tested on RHEL7 and Fedora), but in order not to mess with the system Python libraries, it's convenient to use Python virtual environments.

A virtual environment allows creating a 'chroot'-like environment that can contain library versions different from the ones installed on the system (but be careful: if it's not tracked as part of the usual system patching process, it might become a security concern).

virtualenvs

To create a virtualenv, we need the python-virtualenv package installed on our system, and then we execute virtualenv with a target folder, for example:

[iranzo@iranzo ~]$ virtualenv .venv
New python executable in /home/iranzo/.venv/bin/python2
Also creating executable in /home/iranzo/.venv/bin/python
Installing setuptools, pip, wheel...done.

From this point, we have a base virtualenv installed, but as we would like to install more packages inside it, we'll first need to 'enter' it:

. .venv/bin/activate

And from there, we can list the available/installed packages:

[iranzo@iranzo ~]$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
appdirs (1.4.0)
packaging (16.8)
pip (9.0.1)
pyparsing (2.1.10)
setuptools (34.2.0)
six (1.10.0)
wheel (0.30.0a0)

Now, all packages we install using pip will get installed to this folder, leaving system libraries intact.

Once we've finished, to return to the system's environment, we'll execute deactivate.

Pipsi

In order to simplify the management, we can make use of pipsi, which not only allows installing Python packages as we'd normally do with pip, but also takes care of creating the proper symlinks so the installed commands are directly available for execution.

If our distribution provides it, we can install pipsi on our system:

dnf -y install pipsi

But if not, we can use this workaround (for example, on RHEL7):

# Use pip to install pipsi on the system (should be a minor change not affecting other installed software)
pip install pipsi

From this point, we can use pipsi to take care of the installation and maintenance (upgrades, removal, etc.) of our Python packages.

For example, we can install ansible by executing:

pipsi install ansible

This might fail, as ansible does some compiling and, to do so, it might require some development libraries on your system; take care of installing them to satisfy the requirements for the packages.
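
If that happens, installing the usual build dependencies first normally unblocks it; a hedged example for RHEL7 (the exact list depends on the error pip reports):

yum -y install gcc python-devel openssl-devel libffi-devel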

Prepare for ansible utilization

At this point we have the ansible binary available for execution, as pipsi took care of setting up the required symlinks, etc.

Ansible uses an inventory file (can be provided on command line) so it can connect to the hosts listed there and apply playbooks which define the different actions to perform.

This file, for example, can consist of just a simple list of hosts to connect to like:

192.168.1.1
192.168.1.2
myhostname.net

And to get started, we create a simple playbook, for example a HW asset inventory:

---
- hosts: all
  user: root

  tasks:
    - name: Display inventory of host
      debug:
        msg: "{{ inventory_hostname }} | {{ ansible_default_ipv4.address }} | | | {{ ansible_memtotal_mb }} | | | {{ ansible_bios_date }}"

This will act on all hosts as user root and will run a task which prints a debug message crafted from some of the facts that Ansible gathers on each managed host.

To run it is quite easy:

[iranzo@iranzo labs]$ ansible-playbook -i myhost.net, inventory.yaml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [myhost.net]

TASK [Display inventory of host] ***********************************************
ok: [myhost.net] => {
    "msg": "myhost.net | 192.168.1.1 | | | 14032 | | | 01/01/2011"
}

PLAY RECAP *********************************************************************
myhost.net             : ok=2    changed=0    unreachable=0    failed=0

This has connected to the target host, and returned a message with hostname, ip address, some empty fields, total memory and bios date.

This is quite a simple playbook but, for example, we can use Ansible to deploy the ansible binary on our target host using other available modules; in this case, for simplicity, we'll not be using pipsi for the Ansible installation.

---
- hosts: all
  user: root

  tasks:
    - name: Install git
      yum:
        name:
          - "git"
          - "python-virtualenv"
          - "openssl-devel"
        state: latest

    - name: Install virtualenv
      pip:
        virtualenv: "/root/infrared/.venv"
        name: pipsi

    - name: Upgrade virtualenv pip
      pip:
        virtualenv: "/root/infrared/.venv"
        name: pip
        extra_args: --upgrade

    - name: Upgrade virtualenv setuptools
      pip:
        virtualenv: "/root/infrared/.venv"
        name: setuptools
        extra_args: --upgrade

    - name: Install Ansible
      pip:
        virtualenv: "/root/infrared/.venv"
        name: ansible

At this point, the system should have Ansible available from within the virtualenv we've created, and it should be usable by executing:

# Activate python virtualenv
. .venv/bin/activate
# execute ansible
ansible-playbook -i hosts ansible.yaml

Have fun!
