Category: linux

  • Encrypt an existing Linux installation with zero downtime (LUKS on LVM)

During the bi-yearly review of my setup, I realized I was running a Linux machine without full disk encryption. The disk needed to be encrypted as soon as possible, but I was not willing to reinstall the whole operating system to achieve that.

Solution? I came up with an interesting way to encrypt my existing Linux installation without reinstalling it, and with zero downtime too: while I was moving my data and encrypting it, I was still able to use my computer productively. In other words, the process works on the fly!

    Requirements

    There are three requirements for this guide:

    1. The Linux installation already lives in an unencrypted LVM setup
2. Some space to store your data (in another partition or on an external disk) with capacity equal to or greater than the LVM partition you are trying to encrypt
3. A backup of the hard drive, stored somewhere else (another disk, NFS, S3… I suggest using Clonezilla for this purpose). And don’t forget to test your backup.

    Initial situation

    As a starting point, let’s visualize the partitions of my hard disk:

The interesting part is the root volume of the Linux operating system; Windows is already encrypted with BitLocker, so the two Windows partitions should not be touched.

    /boot will remain a separate partition for the time being (we will discuss it later).

Once we are finished, the resulting hard disk layout is what is commonly referred to as LVM on top of a LUKS-encrypted partition.

    Install the required tools

    Since I am already using LVM, the only package I am missing is cryptsetup: find it in your distribution repositories and install it.
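
For example, depending on your distribution:

# zypper install cryptsetup   (openSUSE)
# apt install cryptsetup      (Debian/Ubuntu)
# dnf install cryptsetup      (Fedora/CentOS)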

    Encryption of existing Linux LVM partition

    In a nutshell, what we are going to do in LVM terms:

    1. Add the external disk (/dev/sdb1 in my case) to the VG
    2. Move the PE from the internal disk PV to the external disk PV
    3. Remove the internal disk PV from the VG
    4. Create a LUKS encrypted container in the internal disk, a PV in it, and add the created PV to the VG
    5. Move the PE from the external disk PV to the internal disk PV
    6. Remove the external disk PV from the VG
    7. Configure the bootloader to access the encrypted PV

    In the following sections, we are going to describe every step in detail.

    1. Add the external disk (/dev/sdb1 in my case) to the volume group

    Let’s create a physical volume in the external disk and add it to the volume group (in my case this is called ‘system’):

    # pvcreate /dev/sdb1
    # vgextend system /dev/sdb1
    
We did not lvresize the two LVs, so they kept the same size. Our data is still on /dev/sda5.
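
At any point during the process, you can double-check where the extents are allocated with:

# pvs -o pv_name,vg_name,pv_size,pv_free
# lvs -o lv_name,vg_name,devices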

    2. Move the physical extents from the internal disk physical volume to the external physical volume

    This is a time-consuming operation: we will transfer all the physical extents from the internal disk physical volume to the external disk physical volume:

    # pvmove /dev/sda5 /dev/sdb1
    

The command will periodically output the percentage of completion.
The speed of the process depends on multiple factors, above all the hard disk transfer throughput and the amount of data to move.

    3. Remove the internal disk physical volume from the volume group

Now the physical volume on the internal disk is empty: we can remove it from the volume group and then delete the physical volume itself:

    # vgreduce system /dev/sda5
    # pvremove /dev/sda5
    

Now all our data is on the /dev/sdb1 (external disk) PV.

    4. Create a LUKS encrypted container in the internal disk, a physical volume in it, and add the created physical volume to the volume group

    Our data is completely stored on the physical volume that is in the external disk: we are halfway through.

    Let’s wipe the internal disk partition that was holding our unencrypted data:

    # cryptsetup open --type plain /dev/sda5 container --key-file /dev/urandom
    
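Note that the command above only sets up a transparent plain-mode mapping named container, keyed from /dev/urandom. To actually overwrite the old data, the usual approach (a sketch, not part of the original command list) is to fill that mapping with zeros, which land on disk as random-looking ciphertext:

# dd if=/dev/zero of=/dev/mapper/container bs=1M status=progress
# cryptsetup close container
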

    Now we need to create an encrypted LUKS container to hold the new internal disk PV.
    Different options can be selected, depending on the distribution you are running, the bootloader, and the version of cryptsetup you are using (e.g. LUKS2 works only with cryptsetup ≥ 2.1.0).

I chose:

    • XTS cipher
    • 512 bits key size
    • LUKS1 (so I can remove the separate /boot partition later)

You will be asked for a passphrase (do not lose it):

    # cryptsetup -v --verify-passphrase -s 512 luksFormat /dev/sda5
    

This passphrase will be used for every subsequent opening of the root volume (e.g. on boot), so choose it carefully.
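
For reference, here are the same choices spelled out explicitly (this should be equivalent on the cryptsetup version I used, where aes-xts-plain64 and LUKS1 were the defaults):

# cryptsetup -v --verify-passphrase --type luks1 --cipher aes-xts-plain64 --key-size 512 luksFormat /dev/sda5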

Let’s now create a physical volume inside the container and then add it to the volume group:

    # cryptsetup luksOpen /dev/sda5 dm_crypt-1
    # pvcreate /dev/mapper/dm_crypt-1
    # vgextend system /dev/mapper/dm_crypt-1
    

    5. Move the physical extents from the external disk physical volume to the internal disk physical volume

    We are now going to reverse the direction of the data flow: the physical volume in the internal disk is now ready to hold our data again.
    Let’s move the physical extents from the external disk PV to the internal disk physical volume. Again, this is a time-consuming operation that depends on the same factors outlined above:

    # pvmove /dev/sdb1 /dev/mapper/dm_crypt-1
    

    As stated before, the command will periodically output the percentage of completion.

    6. Remove the external disk physical volume from the volume group

Our data is now entirely on the internal disk physical volume (this time encrypted, though). We need to remove the external disk physical volume from the volume group and then delete the physical volume itself:

    # vgreduce system /dev/sdb1
    # pvremove /dev/sdb1
    

Success! Our data is now safely encrypted in the LUKS container on /dev/sda5 (the internal disk).

It is considered good practice to completely wipe /dev/sdb1 now, as it contained our unencrypted data.
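
A quick way to do that (a sketch; a tool like shred, or the plain-mode cryptsetup trick from step 4, works as well):

# dd if=/dev/urandom of=/dev/sdb1 bs=1M status=progress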

    7. Configure the bootloader to access the encrypted physical volume

The final step is to inform the bootloader that the root filesystem now lives on an encrypted partition.
Depending on your distribution, there are different ways to do this.

    My distribution of choice (openSUSE) features GNU GRUB and initrd. In this case, the specific instructions are:

• Create /etc/crypttab and insert the name of the encrypted LUKS container together with the UUID of the partition on the disk (check which one with ls /dev/disk/by-uuid):
    # ls -l /dev/disk/by-uuid/ | grep sda5
    lrwxrwxrwx 1 root root 10 Nov 13 21:27 45a4cbf0-da55-443f-9f2d-70752b16de8d -> ../../sda5
    # echo "dm_crypt-1 UUID=45a4cbf0-da55-443f-9f2d-70752b16de8d" > /etc/crypttab
    
    • Regenerate initrd with:
    # mkinitrd
    • Reinstall GRUB with:
# grub2-mkconfig -o /boot/grub2/grub.cfg && grub2-install /dev/sda
    

    Success! /boot is still unencrypted, though.

At every boot, initrd will now ask for the same password you used to create the LUKS container.

    Right now our root volume is encrypted, except for /boot which is left unencrypted. Leaving /boot unencrypted brings some benefits:

    • Unattended LUKS unlock via keyfile (stored, for example, in a USB key)
    • LUKS unlock via the network (authenticate via SSH to provide the LUKS password as implemented in dropbear-initramfs)

One big drawback: an unencrypted /boot is vulnerable to the evil maid attack. But a simple remediation can be put in place: let’s discover it in the next section.

    Optional: remove the separate /boot partition and achieve full disk encryption (FDE)

Depending on your security model, the bootloader you are using, and the LUKS version of your container, it might be more secure to make /boot part of the encrypted volume.
In my case, I decided I wanted full disk encryption, so I moved /boot into the encrypted volume.

    The idea here is to:

• Create a copy of /boot inside the LVM volume:
# cp -rav /boot /boot-new
# umount /boot
# mv /boot /boot.old
# mv /boot-new /boot
    
    • Remove the /boot partition from /etc/fstab:
    # grep -v /boot /etc/fstab > /etc/fstab.new && mv /etc/fstab.new /etc/fstab
    • Modify GRUB to load the boot loader from an encrypted partition:
    # echo "GRUB_ENABLE_CRYPTODISK=y" >>/etc/default/grub
      • Provision a keyfile to avoid typing the unlocking password twice.
We are now in a particular situation: GRUB needs a password to unlock the second stage of the bootloader (we just enabled that), and after the initrd has loaded, it needs the same password again to mount the root device.

        To avoid typing the password twice, there is a handy explanation in the openSUSE Wiki: avoid typing the passphrase twice with full disk encryption.
        Be sure to follow all the steps.

      • Install the new bootloader:
    # grub2-mkconfig -o /boot/grub2/grub.cfg && grub2-install /dev/sda
    Full disk encryption: mission accomplished!

Everything is now in place: all the data is encrypted at rest.
Only one password will be asked for: the one you used to create the LUKS container. GRUB will ask for it every time you boot the system, while initrd will use the keyfile and will not ask again.
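
For reference, the keyfile provisioning mentioned above usually boils down to something like the following sketch (the keyfile path is hypothetical, and the step that embeds the keyfile into the initrd is distribution-specific, so do follow the linked openSUSE wiki page):

# dd if=/dev/urandom of=/.root.key bs=1024 count=1
# chmod 600 /.root.key
# cryptsetup luksAddKey /dev/sda5 /.root.key
# echo "dm_crypt-1 UUID=45a4cbf0-da55-443f-9f2d-70752b16de8d /.root.key" > /etc/crypttab
# mkinitrd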

  • How a Terraform + Salt + Kubernetes GitOps infrastructure enabled a zero downtime hosting provider switch

    The switch

    It has been a busy weekend: I switched the hosting provider of my whole cloud infrastructure from DigitalOcean to Hetzner.
    If you are reading this it means that the switch is completed and you are being served by the Hetzner cloud.

    The interesting fact about the switch is that I managed to complete the transition from one hosting provider to another with zero downtime.

    The Phoenix Server

    One of the underlying pillars that contributed to this success story is the concept of Phoenix Server. In other words, at any moment in time, I could recreate the whole infrastructure in an automated way.

    How?

• The infrastructure resources are declared using Terraform. By harnessing the Terraform Hetzner provider, I could simply terraform apply my infrastructure into existence (see the sketch after this list).
• The configuration is defined with Salt and versioned in Git.

At some point, I made the effort of translating all the configurations, tweaks, and personalizations I had applied to every part of the infrastructure into a repository of Salt states that I kept updated.

    Two notable examples: I am picky about fail2ban and ssh.

The result is that, after provisioning the infrastructure, I could configure every server exactly how I wanted it by simply applying the Salt highstate.

• The application stack relies on containers: every application runs in its own container to be portable and scalable. The orchestration is delegated to Kubernetes.
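
As an illustration of the first point, a minimal Terraform definition for the Hetzner provider might look like the following sketch (the server name, image, and type are hypothetical, not my actual configuration):

terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

provider "hcloud" {
  token = var.hcloud_token
}

variable "hcloud_token" {}

# A hypothetical reverse proxy instance
resource "hcloud_server" "reverse_proxy" {
  name        = "reverse-proxy-1"
  image       = "opensuse-15"
  server_type = "cx21"
}

With this in place, terraform apply provisions the server and terraform destroy tears it down again: the Phoenix can always be reborn.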

After all the steps above were applied, I had an identical infrastructure running on Hetzner, while the old infrastructure was still working and serving users.

    DNS switching

At this point, I had just prepared a mirror environment running in the Hetzner cloud. But this environment was not serving any clients yet.

    Why?
    Let’s consider an example to explain the next step.

This website, www.michelebologna.net, is one of the services run by the infrastructure.
Each user was still resolving www.michelebologna.net to the old address: the old infrastructure was still serving it.

To test the new infrastructure, I fiddled with my /etc/hosts and pointed www.michelebologna.net to the new reverse proxy IP (note: this is required to bypass the load balancers). I verified it was working, which meant I was ready for the switch.

The switch happened at the DNS level: I simply changed the CNAME for the www record from the old reverse proxy to the new one. Thanks to the proper server naming scheme I have been using, the switch was effortless.
After the switch, I started tailing the logs of the new reverse proxy: as soon as the upstream DNS servers picked up the updated record, users were accessing the website via Hetzner. Success!

Trivia: after 5.5 years, the old reverse proxy was shut down. In its memory, here are its uptime records, with an astonishing availability of 99.954%!

         #               Uptime | System                
    ----------------------------+-----------------------
         1   112 days, 18:33:34 | Linux 4.4.0-tuned
         2   104 days, 21:00:22 | Linux 4.15.0-generic
         3    85 days, 19:08:32 | Linux 3.13.0-generic
         4    78 days, 19:04:49 | Linux 4.4.0-tuned
         5    71 days, 13:01:09 | Linux 4.13.0-lowlaten
         6    66 days, 04:42:44 | Linux 4.15.0-generic
         7    62 days, 15:49:14 | Linux 3.19.0-generic
         8    62 days, 00:52:09 | Linux 4.15.0-generic
         9    56 days, 22:21:20 | Linux 3.19.0-generic
        10    53 days, 16:34:11 | Linux 4.2.0-highmem
    ----------------------------+-----------------------
        up  1989 days, 03:46:34 | since Tue Oct 28 14:28:05 2014
      down     0 days, 22:00:33 | since Tue Oct 28 14:28:05 2014
       %up               99.954 | since Tue Oct 28 14:28:05 2014
    

After updating the DNS records for all the other services, I kept checking whether any service was still being accessed through the old infrastructure. After some days of minimal activity, I decided to destroy it.

    Caveats with DNS

There are some things that I learned while doing this kind of transition. Or maybe things I learned the last time but did not write down, so I am using this space as a reminder for the next time.

• A DNS wildcard record (*.michelebologna.net) that gets resolved to a hostname (a catch-all record) can generate weird results if you are running a machine that has search michelebologna.net in its resolv.conf
• Good hosting providers offer the ability to set a reverse DNS entry for every floating or static IP address of every cloud instance. The reverse DNS must match the mail server hostname (myhostname in Postfix)
• With email hosting, set up DKIM and publish SPF, DKIM, and DMARC records in the DNS
• The root record (@) must not be a CNAME record: it must be an A/AAAA record
  • TLS-terminated Bitlbee with custom protocols

Five years ago I started a small GitHub project aimed at running Bitlbee seamlessly in a container.

    Why Bitlbee?

Back in the day, I relied heavily on IRC for my daily communications, and the plethora of other protocols that were starting to gain traction was too much: I wanted a bridge between my IRC client and the other protocols, so I could communicate using only my IRC client without installing any resource-consuming monster (enough said).

    Bitlbee was and still is the perfect tool to implement that bridge: every protocol is consumable via IRC, provided that a Bitlbee server has been set up and a bridge between Bitlbee and the protocol is available and installed into the Bitlbee server.

I decided to roll my own Bitlbee server in a Docker container, and to integrate into the build a list of custom protocols available as Bitlbee plugins. By packaging everything into a container, running a ready-to-use Bitlbee server with custom protocols was only a docker pull away.

The container, called docker-bitlbee and published on Docker Hub, started to gain traction (who wants to compile all the plugins nowadays?), and in 2018 it reached 100k downloads on Docker Hub.
It is also the first result for the query “docker bitlbee” on DuckDuckGo and Google.

With time, contributors started to submit pull requests to enable new custom protocols, report problems, and ask for new features.

    Now the container has been downloaded more than 500k times on Docker Hub and I am still using it in my infrastructure to access some protocols over IRC (a notable example: Gitter).

The latest feature, just added based on a user request, is TLS termination for Bitlbee via stunnel. There has been some constructive discussion, and I am glad that the community is supportive and open to honest debate.
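
A minimal stunnel configuration for this kind of TLS termination might look like the following sketch (assuming Bitlbee listens on the plain-text port 6667 and TLS clients connect on 6697; the certificate path is illustrative):

; /etc/stunnel/stunnel.conf
cert = /etc/stunnel/bitlbee.pem

[bitlbee]
accept = 6697
connect = 127.0.0.1:6667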

So far, I am very proud of the work that has gone into this side project.

  • Startup order in Docker containers

    Motivation

I recently dealt with an application composed of multiple services running in containers. Even though every part of this application is correctly split into separate microservices, the independence of each service is not enforced.
This lack of independence has several drawbacks, one of which is that the containers must be started in a pre-defined order; otherwise, some containers might terminate due to an application error (the application breaks when an unexpected error occurs, e.g. when it relies on a linked service that is not yet ready to accept connections).

Not all applications suffer from this kind of problem: the application I was dealing with was not born with microservices in mind, but was rather split and converted into separate containers over its lifetime. It is surely not the only application with this particular limitation; other applications out there have been converted into a Franken-microservice-stein “monster”.

    Workarounds

I am going to explore the possible workarounds to define and follow a startup order when launching containerized applications that span multiple containers.

Depending on the scenario, we may not want (or may not be able) to change the containers and the application itself, for multiple reasons:

    • the complexity of the application
    • whether the sources are available
    • if changes to the Dockerfiles are possible (especially ENTRYPOINTs)
    • the time required to change the architecture of the application

    docker-compose and healthcheck

    Using docker-compose, we can specify:

• a healthcheck: it specifies the test (command) used to check whether the container is working. The test is executed periodically (interval) and retried retries times:
    db:
      image: my-db-image
      container_name: db-management
      ports:
        - 31337:31337
      healthcheck:
        test: ["CMD", "curl", "-fk", "https://localhost:31337"]
        interval: 300s
        timeout: 400s
        retries: 10
    
• a depends_on field, to start the container only after its dependency has been started, and a restart: on-failure policy:
    web:
      image: my-web-image
      restart: on-failure
      depends_on:
        - db
      links:
        - db
    

    What is happening here?

• docker-compose starts the service, launching the db container first (the web one depends on it)
• the web container is started shortly after (it does not wait for db to be ready, because docker-compose does not know what “ready” means for us). Until the db container is ready to accept connections, the web container will keep being restarted (restart: on-failure).
• the db service is marked as healthy as soon as curl -fk https://localhost:31337 returns 0 (the db-management image ships with an HTTP controller that returns 0 only when the database is ready to accept connections). Marking the service as healthy means the service is working as expected (because the test returns what we expect). When the service is no longer healthy, the container must be restarted, and other policies and actions might be introduced.

NOTE: in docker-compose file format < 3, depends_on could also wait for the health check (via condition: service_healthy), but starting from version 3 of the compose file format, depends_on only accepts a list of other services.
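
For reference, the older file-format 2.1 form that actually waits for the health check looks like this sketch:

version: "2.1"
services:
  web:
    image: my-web-image
    depends_on:
      db:
        condition: service_healthy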

This solution is not ideal, as the web container is restarted until the dependency is satisfied: that can be a huge problem if we are using the container for running tests, as a container exiting because of failure can be mistaken for failed tests.

    wait-for-it wrapper script

This approach is slightly better than the previous one, but it is still a workaround. We are going to use docker-compose and the wait-for-it script.
In the docker-compose.yml file we add a depends_on (as described in the previous section) and a command:

db:
  image: my-db-image
  container_name: db-management
      ports:
        - 31337:31337
      healthcheck:
        test: ["CMD", "curl", "-fk", "https://localhost:31337"]
        interval: 300s
        timeout: 400s
        retries: 10
    
    web:
      image: my-web-image
      depends_on:
        - db
      links:
        - db
      command: ["./wait-for-it.sh", "db:31337", "--", "./webapp"]
    

The wait-for-it script waits for host:port to be open (TCP only). Again, this does not guarantee that the application is ready to serve, but, compared to the previous workaround, we are not restarting the web container until its dependency is ready.
One drawback of this workaround is that it is invasive: it requires the container image to be rebuilt to add the wait-for-it script (you can use a multi-stage build to do so).

    Re-architect the application

This is not a workaround but rather the solution, and the best one we can achieve. It takes effort and might cost a lot: the application architecture needs to be modified to make it resilient against failures. There are no general guidelines on how to successfully re-architect an application to be failure-proof and microservice-ready, even though I strongly suggest following the twelve guidelines on the Twelve-Factor App website.

  • On servers timezone and tmux

A while ago I was fighting with the timezone set on a server because daylight saving time kicked in: during the “ghost hour” I had trouble tracking down automated jobs. Moreover, the server was located overseas, and depending on when I was checking the remote date and time, I could get a different time delta.

Then the quasi-philosophical question, “which timezone should be set on a remote server: my timezone or the server’s local timezone?”, began rolling in my mind.

    After some research, I found a piece of technical advice from Yeller. In short, their advice can be summarized with:

    Use UTC
    Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC. Use UTC.

(no, really, check the linked post: it is full of good and agreeable technical points for using UTC).

    After setting the default timezone for all my servers to UTC, there are some tweaks to live happily ever after with UTC.

    First, add your timezone and export it into the TZ variable:

    echo 'export TZ=/usr/share/zoneinfo/Europe/Rome' >> ~/.zshrc

This brings a notable advantage:

    • without TZ set:

    % date
    Mon Mar 25 20:56:43 2019
% journalctl -f
    [...]
    Mar 25 20:57:51 golf systemd-logind[1154]: Session 980 logged out. Waiting for processes to exit.

    • with TZ set you get every message* localized in the selected timezone:

    % date
    Mon Mar 25 21:57:53 CET 2019
% journalctl -f
    [...]
    Mar 25 21:57:51 golf systemd-logind[1154]: Session 980 logged out. Waiting for processes to exit.

    * = every message from a sane and decently written program that knows about timezones and honors the TZ variable.

Secondly, I usually have everything running in a tmux session with the time shown in the status bar. After changing the server timezone to UTC, tmux was displaying the time in UTC: I wanted to show my local time as well. In order to show localized time, you have to change some parameters:

• Output the time and the timezone in the status bar:

    In ~/.tmux.conf:
    set -g status-right '%a %b %d %H:%M %Z'

    • Make sure to send your TZ variable whenever you are using SSH:

    In ~/.ssh/config:
    Host *
    [...]
    SendEnv TZ

    • Make sure that your SSH server automatically accepts the TZ variable:

    In /etc/ssh/sshd_config
    AcceptEnv [...] TZ

Restart your sshd service and try to log in to the remote server. Your tmux status bar should show the time in your local timezone, while the server still uses UTC as its global timezone.

  • Automatic (or unattended) upgrades in openSUSE, CentOS and Fedora, Debian and Ubuntu

Each one of us is a system administrator: you decide when and how to administer at least your own workstation (or notebook). In the special case in which you are elected to administer servers too, the matter becomes thorny: what is the workflow in terms of patching, reaction time to security issues and, in general, when and how to install updates?

Some distributions offer the concept of automatic (or unattended) upgrades: automatically installing a subset (or all) of the available updates via the package manager. The subset can be specified by the system administrator; a notable example is the subset of security updates.

    The approach is, of course, debatable: should you use it for a critical server? What happens if the upgrade goes south? Would this approach scale?

    The answer is, nevertheless, debatable: it depends. You are not required to use automatic updates, but installing security patches automatically might make sense in some non-mission-critical situations. You can read an opinionated list of reasons to use automatic updates, as well as an equally opinionated list of reasons NOT to use automatic updates.

    In this post, I am going to present the three approaches for automatic updates offered in:

    • openSUSE
    • CentOS and Fedora
    • Debian and Ubuntu

and how I set them up for my own “very special, do not try this at home” situation, which means that servers always install only security updates automatically.

    openSUSE

    openSUSE can schedule automatic updates via Automatic Online Update.

Take a look at the documentation: everything is already well documented; you just need to install the package with:

    # zypper install yast2-online-update-configuration

    and then, to configure it:

    # yast2 online_update_configuration

I want the servers to check weekly and automatically install only security updates (category “Security”), except the ones declared as “Interactive”. From the documentation:

    Sometimes patches may require the attention of the administrator, for example when restarting critical services. For example, this might be an update for Docker Open Source Engine that requires all containers to be restarted. Before these patches are installed, the user is informed about the consequences and is asked to confirm the installation of the patch. Such patches are called “Interactive Patches”.
    When installing patches automatically, it is assumed that you have accepted the installation of interactive patches. If you rather prefer to review these patches before they get installed, select Skip Interactive Patches. In this case, interactive patches will be skipped during automated patching. Make sure to periodically run a manual online update, to check whether interactive patches are waiting to be installed.

Skipping interactive patches absolutely makes sense to me, as does using delta RPMs (to save bandwidth), auto-agreeing with licenses, and including recommended packages.

Update: Richard reminded me that if you are running Leap or Tumbleweed with transactional updates, you can take advantage of automatic transactional updates; rebootmgr will take care of automatically rebooting the machine in case any transactional updates were installed.

    CentOS version <= 7

    The package that enables automatic updates is called yum-cron. To install it:

    # yum -y install yum-cron

The configuration file (/etc/yum/yum-cron.conf) is self-documenting: just open it in an editor and begin tweaking. In my case, to check for and install only security updates, I changed the following two lines:

    update_cmd = security
    apply_updates = yes

Finally, make sure that the corresponding service is enabled and started:

# systemctl enable --now yum-cron.service

    Fedora and CentOS version >= 8

    Fedora automatic updates are enabled by installing the dnf-automatic package:

    # dnf install -y dnf-automatic

    As with CentOS, I just changed the configuration file (/etc/dnf/automatic.conf) to install security updates only:

    upgrade_type = security

After the configuration, enable and start the timer:

    # systemctl enable --now dnf-automatic.timer

    Debian and Ubuntu

    Debian and Ubuntu make use of the unattended-upgrades package in order to enable automatic updates. Let’s begin with installing it:

    # apt install unattended-upgrades

    It is configuration time: make sure to enable the update of package lists and perform the upgrade in /etc/apt/apt.conf.d/20auto-upgrades:

    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";

    Now enable the repository from which updates can be installed in /etc/apt/apt.conf.d/50unattended-upgrades; in our case, only the security repository:

    Unattended-Upgrade::Origins-Pattern {
            "origin=Debian,codename=${distro_codename},label=Debian-Security";
    };
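
Before trusting the configuration, you can ask unattended-upgrades what it would do without actually installing anything:

# unattended-upgrade --dry-run --debug

(note that the binary is called unattended-upgrade, without the trailing s).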

    Conclusions

Every distribution then offers its own tweaks (like email notifications when updates are ready and when they are installed, package exclusions based on package names, installing updates at shutdown time, and whatnot): be sure to read the documentation! The examples above are just a starting point.

    Happy automatic patching!

  • Send an email from a Docker container through an external MTA with ssmtp

I packaged a standard application (think of it as a standard PHP app, or <insert your preferred framework here>) into a Docker container. So far, it was working flawlessly, but then a problem arose: sending an email from the Docker container (the event is triggered within the container).

As you may know, a good Docker container is a container with only one process running: the naive solution for our case would be to have, in addition to our PHP process, another process to manage the email interchange (an MTA, e.g. Postfix). As we are following the best practices for Docker containers, this path is discouraged.

    There are many solutions to this problem.

The common ground for all of the solutions is to rely on ssmtp when sending emails from the container. ssmtp is a simple relayer that delivers local emails to a remote mailhub, which takes care of the actual delivery.

Provided that the container distribution ships ssmtp, the installation is straightforward: just add the package during the install phase of the Dockerfile. ssmtp must then be configured to relay every email to an SMTP host, e.g.:

# cat /etc/ssmtp/ssmtp.conf

# The user that gets all the mails (UID < 1000, usually the admin)
root=postmaster

# The place where the mail goes. The actual machine name is required;
# no MX records are consulted. Commonly mailhosts are named mail.domain.com.
mailhub=mail.mycompany.biz

# Use SSL/TLS before starting negotiation
UseTLS=Yes
UseSTARTTLS=Yes

# Fill the following with your credentials (if requested)
AuthUser=postmaster@mycompany.biz
AuthPass=supersecretpassword

# Change or uncomment the following only if you know what you are doing

# Where will the mail seem to come from?
# rewriteDomain=localhost
# The full hostname
# hostname="localhost"
# Are users allowed to override the default From: domain?
# FromLineOverride=yes
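
Once configured, a quick way to test the relay from inside the container is to pipe a message to ssmtp’s sendmail-compatible interface (the addresses are illustrative):

# printf "To: you@example.com\nSubject: test\n\nHello from the container\n" | ssmtp you@example.com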

All three solutions that I am going to illustrate rely on having a custom mailhub that must be configured accordingly.

    Let’s review each solution.

    An external SMTP relay host

If an external SMTP relay host is available, the solution is to point the mailhub option of ssmtp to the external SMTP host.

    Another container running the MTA

The proper way to solve this problem is to run a Docker container just for the MTA itself (personal preference: Postfix). One caveat of this solution: some Linux distributions come with an MTA running out of the box. If the container host is already running an MTA, the Postfix container cannot publish port 25/tcp (the address is already in use by the MTA running on the host).

Searching on GitHub, a promising and up-to-date container is eea.docker.postfix. After you deploy the Postfix container, link every container that needs an MTA to it, e.g.:

    # docker run --link=postfix-container my-awesome-app-that-needs-an-mta 

The container must configure ssmtp to use postfix-container (or the name defined in the link) as the mailhub option in ssmtp.conf.

    Relying on the host MTA

Premise: the Docker daemon creates a bridge interface shared with all the containers running on the same host. This bridge is usually named docker0:

    # ip a show docker0
    5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
        link/ether 11:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff
        inet 172.17.42.1/16 brd 172.17.255.255 scope global docker0
           valid_lft forever preferred_lft forever
        inet6 fe80::11:22ff:fff0:3344/64 scope link 
           valid_lft forever preferred_lft forever

If the host MTA is listening on the docker0 interface, the containers can relay email to the host MTA. No extra configuration is needed in the container itself: just configure ssmtp to use the docker0 IP as the mailhub.

Extra: how to configure Postfix to listen on a Docker interface (like docker0) as well

To use the solution described above, the MTA on the host must be configured to listen on the docker0 interface as well. In case the MTA in question is Postfix, the configuration is straightforward:

On the host, open /etc/postfix/main.cf, add the docker0 IP to the inet_interfaces option, and add the subnetwork range of the containers that need to use the host MTA to the mynetworks option:

    # cat /etc/postfix/main.cf 
    
    [...]
    inet_interfaces = 127.0.0.1, 172.17.42.1 
    mynetworks = 127.0.0.0/8 172.17.42.0/24
    [...]

If Postfix is started at boot by systemd, we need to take care of the dependency: the Docker daemon must be started before the Postfix daemon, as Postfix needs to bind to the docker0 IP address.

Luckily for us, to express this dependency systemd already provides a device unit that appears when the interface is up:

    # systemctl | grep docker0
      sys-devices-virtual-net-docker0.device                                                                      loaded active plugged   /sys/devices/virtual/net/docker0

Postfix must be started after the docker0 interface has been brought up; to express the dependency we must override Postfix’s service units (this may vary based on the host distribution):

    # systemctl | grep postfix
      postfix.service                                                                                             loaded active exited    Postfix Mail Transport Agent                                                                          
      postfix@-.service                                                                                           loaded active running   Postfix Mail Transport Agent (instance -)    

In this case it is enough to override only the Postfix instance service with:

    # systemctl edit postfix@-.service

Override the unit service file by declaring the dependency explicitly:

    [Unit]
    Requires=sys-devices-virtual-net-docker0.device
    After=sys-devices-virtual-net-docker0.device

    Reload systemd with systemctl daemon-reload and restart Postfix with systemctl restart postfix.

    Relying on the host MTA by using host network driver on Docker

When a container is set to use the host network driver, the container can access the host networking and thus its services. If the container host already has an MTA configured, the containers can use it by just pointing to localhost. The syntax to use host networking for the application that needs the host MTA is:

    # docker run --net=host my-awesome-app-that-needs-an-mta

    To configure ssmtp, just point the mailhub to localhost.

NOTE: using the host network driver has obvious security drawbacks: the container’s networking is no longer isolated by Docker, so the container has access to the host’s whole networking stack and can open low-numbered ports like any other root process. Use this networking option only after carefully weighing the pros and cons.

• Linux: using bind mounts to move a subset of root subdirectories to another partition or disk

I found myself dealing with a Linux box with two hard disks:

    • /dev/sda: fast hard drive (SSD), small size (~200 GB)
    • /dev/sdb: very big hard drive (HDD), large size (~4 TB)

The operating system was installed on /dev/sda, so /dev/sdb was empty. I knew I could create a mount point (e.g. /storage) and mount /dev/sdb on it, but after reading Intelligent partitioning and the recommended Debian partitioning scheme I thought about moving:

    • /var
    • /home
    • /tmp

    to the big hard drive /dev/sdb

The process described here is completely different from simply adding a mount point for a partition in /etc/fstab: in our solution, we will use one disk (or one partition) to store multiple root subdirectories (/var, /home, /tmp). With the “usual” fstab method, you put one subdirectory on one disk, partition, or volume.

The solution to this problem is a bind mount: the three original directories will still exist on the root disk (/dev/sda) but they will be empty. The contents will live on the second disk (/dev/sdb) and, upon mounting, a bind will be created between the root filesystem and the directories on the second disk.

    The process is easy:

    1. Backup your data
    2. Boot from a live distribution (e.g. KNOPPIX)
3. Mount your hard drives:

  mkdir /mnt/sd{a,b}1
  mount /dev/sda1 /mnt/sda1
  mount /dev/sdb1 /mnt/sdb1

4. Copy the directories from sda to sdb:

  cp -ax /mnt/sda1/{home,tmp,var} /mnt/sdb1/

5. Rename the old directories you just copied and create the new mount points:

  mv /mnt/sda1/home /mnt/sda1/home.old
  mv /mnt/sda1/tmp /mnt/sda1/tmp.old
  mv /mnt/sda1/var /mnt/sda1/var.old
  mkdir /mnt/sda1/{home,tmp,var}

6. Update your /etc/fstab with the new locations. Mount the second hard drive:

  /dev/sdb1 /mnt/sdb1 ext4 defaults 0 2

  Then create the bind mounts for the three subdirectories you moved:

  /mnt/sdb1/home /home none defaults,bind 0 0
  /mnt/sdb1/tmp /tmp none defaults,bind 0 0
  /mnt/sdb1/var /var none defaults,bind 0 0

    7. umount your hard drives and reboot
    8. Check that everything under /home, /var and /tmp is working as expected. You may also want to clean up and delete /home.old, /var.old, and /tmp.old.
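
After the reboot, findmnt is a handy way to verify the binds; the output should look along these lines (illustrative):

# findmnt /home
TARGET SOURCE           FSTYPE OPTIONS
/home  /dev/sdb1[/home] ext4   rw,relatime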

    This process can be repeated for any subdirectory you want to move (except, obviously, /boot).

Closing notes, if you are brave enough:

• you are not required to boot from a live distribution; just boot into single-user mode (adapt the paths of the guide above, though!)
• you can even skip booting into single-user mode if you are using LVM: just create a new logical volume and copy the subdirectories into it
  • Automatically add SSH keys to SSH agent with GNOME and macOS

    I am using passwordless login via SSH on every box that I administer.
    Of course, my private SSH key is protected with a password that must be provided when accessing the key.
Modern operating systems incorporate the usage of ssh-agent to “link” the user account to the SSH key(s), unlocking the SSH key as soon as the user logs in. In this way, they avoid nagging the user for the SSH key password every time the key needs to be used.
    In my case, I am running GNU/Linux with GNOME and macOS:

• GNOME, via its Keyring, supports the automatic unlocking of SSH keys upon user login. Starting from GNOME 3.28, ed25519 keys are supported as well as RSA keys (I do not use any other type of SSH key). To add your keys, just invoke ssh-add and supply your key path:
    ssh-add ~/.ssh/[your-private-key]
    

you will be asked for your SSH key password, which will be stored in the GNOME Keyring (remember this if you ever change your SSH key password!).

• macOS supports storing your SSH key password in the Keychain. You can add your key(s) with:
    ssh-add -K ~/.ssh/[your-private-key]
    

    Starting from Sierra, though, you need to change your ~/.ssh/config to persist the key between reboots and add:

    Host *
      UseKeychain yes
      AddKeysToAgent yes
      IdentityFile ~/.ssh/[your-private-key-rsa]
      IdentityFile ~/.ssh/[your-private-key-ed25519]
    

Now, if you share the same ~/.ssh/config file between GNU/Linux and macOS, you will encounter an error: how is ssh on Linux supposed to know about the UseKeychain option (which is compiled only into macOS’s ssh)?
A special directive, IgnoreUnknown, comes to the rescue:

    IgnoreUnknown UseKeychain
    UseKeychain yes
    

    Eventually, my ~/.ssh/config looks like:

    Host *
      IgnoreUnknown UseKeychain
      UseKeychain yes
      AddKeysToAgent yes
      IdentityFile ~/.ssh/id_rsa
      IdentityFile ~/.ssh/id_ed25519
      Compression yes
      ControlMaster auto
    [...]
    
  • Accessing remote libvirt on a non-standard SSH port via virt-manager

    Scenario: you are using a remote host as a virtualization host with libvirt and you want to manage it via ”Virtual machine manager” (virt-manager) over SSH.

But SSH is listening on a non-standard port, and virt-manager does not offer a way to connect to a remote libvirt instance on a non-standard port.

Fear not, the option to connect to your remote libvirt instance is just this command away:

virt-manager -c 'qemu+ssh://root@<host>:<port>/system?socket=/var/run/libvirt/libvirt-sock'
    

(make sure you already have passwordless login to the remote host set up, for example with SSH keys).
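
Alternatively (a sketch, with a hypothetical host alias), you can declare the port in ~/.ssh/config and keep the URI short, since the qemu+ssh transport honors your SSH client configuration:

Host vmhost
    HostName server.example.com
    Port 2222
    User root

and then connect with:

virt-manager -c 'qemu+ssh://vmhost/system'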

Bonus note: you can install virt-manager even on macOS (obviously with remote support only) with homebrew-virt-manager