Commit 914889d0 authored by Karthik Basavaraj

Fixes an issue with watchdog where it sometimes fails to restart stunnels in efs-csi-driver container
* Fixes an issue where fs cannot be mounted with tls using systemd.automount-units due to mountpoint check
parent c1496234
@@ -11,7 +11,7 @@ set -ex
 BASE_DIR=$(pwd)
 BUILD_ROOT=${BASE_DIR}/build/debbuild
-VERSION=1.25-3
+VERSION=1.26.2
 DEB_SYSTEM_RELEASE_PATH=/etc/os-release
 UBUNTU18_REGEX="Ubuntu 18"
 DEBIAN11_REGEX="Debian GNU/Linux bullseye"
......
@@ -7,5 +7,5 @@
 #
 [global]
-version=1.25
-release=3
+version=1.26
+release=2
 Package: amazon-efs-utils
 Architecture: all
-Version: 1.25-3
+Version: 1.26.2
 Section: utils
 Depends: python|python2, nfs-common, stunnel4 (>= 4.56), openssl (>= 1.0.2), util-linux
 Priority: optional
......
@@ -26,8 +26,8 @@
 %endif
 Name : amazon-efs-utils
-Version : 1.25
-Release : 3%{?dist}
+Version : 1.26
+Release : 2%{?dist}
 Summary : This package provides utilities for simplifying the use of EFS file systems
 Group : Amazon/Tools
@@ -126,33 +126,37 @@ fi
 %clean
 %changelog
-* Tue May 26 2020 Yuan Gao <ygaochn@amazon.com> - 1.25-3
+* Tue Jun 16 2020 Karthik Basavaraj <kbbasav@amazon.com> - 1.26.2
+- Clean up stunnel PIDs in state files persisted by previous efs-csi-driver to ensure watchdog spawns a new stunnel after driver restarts.
+- Fix an issue where fs cannot be mounted with tls using systemd.automount-units due to mountpoint check
+* Tue May 26 2020 Yuan Gao <ygaochn@amazon.com> - 1.25.3
 - Fix an issue where subprocess was not killed successfully
 - Stop emitting unrecognized init system supervisord if the watchdog daemon has already been launched by supervisor
 - Support Fedora
 - Check if mountpoint is already mounted beforehand for tls mount
-* Tue May 05 2020 Yuan Gao <ygaochn@amazon.com> - 1.25-2
+* Tue May 05 2020 Yuan Gao <ygaochn@amazon.com> - 1.25.2
 - Fix the issue that IAM role name format is not correctly encoded in python3
 - Add optional override for stunnel debug log output location
-* Mon Apr 20 2020 Yuan Gao <ygaochn@amazon.com> - 1.25-1
+* Mon Apr 20 2020 Yuan Gao <ygaochn@amazon.com> - 1.25.1
 - Create self-signed certificate for tls-only mount
-* Tue Apr 7 2020 Yuan Gao <ygaochn@amazon.com> - 1.24-4
+* Tue Apr 7 2020 Yuan Gao <ygaochn@amazon.com> - 1.24.4
 - Fix the malformed certificate info
-* Fri Mar 27 2020 Yuan Gao <ygaochn@amazon.com> - 1.24-3
+* Fri Mar 27 2020 Yuan Gao <ygaochn@amazon.com> - 1.24.3
 - Use IMDSv1 by default, and use IMDSv2 where required
-* Tue Mar 10 2020 Yuan Gao <ygaochn@amazon.com> - 1.24-2
+* Tue Mar 10 2020 Yuan Gao <ygaochn@amazon.com> - 1.24.2
 - List which as dependency
-* Tue Mar 10 2020 Yuan Gao <ygaochn@amazon.com> - 1.24-1
+* Tue Mar 10 2020 Yuan Gao <ygaochn@amazon.com> - 1.24.1
 - Enable efs-utils to source region from config file for sigv4 auth
 - Fix the issue that stunnel bin exec cannot be found in certain linux distributions
-* Tue Mar 03 2020 Yuan Gao <ygaochn@amazon.com> - 1.23-2
+* Tue Mar 03 2020 Yuan Gao <ygaochn@amazon.com> - 1.23.2
 - Support new option: netns, enable file system to mount in given network namespace
 - Support new option: awscredsuri, enable sourcing iam authorization from aws credentials relative uri
 - List openssl and util-linux as package dependency for IAM/AP authorization and command nsenter to mount file system to given network namespace
@@ -68,7 +68,7 @@ except ImportError:
     from urllib.error import URLError, HTTPError
-VERSION = '1.25-3'
+VERSION = '1.26.2'
 SERVICE = 'elasticfilesystem'
 CONFIG_FILE = '/etc/amazon/efs/efs-utils.conf'
@@ -1489,8 +1489,15 @@ def match_device(config, device):
                 % (remote, 'https://docs.aws.amazon.com/efs/latest/ug/mounting-fs-mount-cmd-dns-name.html'))
+def is_nfs_mount(mountpoint):
+    cmd = ['stat', '-f', '-L', '-c', '%T', mountpoint]
+    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
+    output, _ = p.communicate()
+    return output and 'nfs' in str(output)
 def mount_tls(config, init_system, dns_name, path, fs_id, mountpoint, options):
-    if os.path.ismount(mountpoint):
+    if os.path.ismount(mountpoint) and is_nfs_mount(mountpoint):
         sys.stdout.write("%s is already mounted, please run 'mount' command to verify\n" % mountpoint)
         logging.warn("%s is already mounted, mount aborted" % mountpoint)
         return
......
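Reviewer note on the `is_nfs_mount` hunk: the new check runs `stat -f -L -c %T <mountpoint>` and looks for `nfs` in the reported file system type. The changelog ties this to systemd automount units, presumably because such a mount point already counts as mounted (with a non-NFS type such as `autofs`) before EFS is attached. Below is a minimal Python 3 sketch of the same idea with the string check split out so it can be tested without a real mount. The helper name `fs_type_is_nfs` is hypothetical and not part of the commit.

```python
import subprocess


def fs_type_is_nfs(stat_output):
    """Return True if the output of `stat -f -c %T` names an NFS file system.

    Hypothetical helper for illustration; accepts bytes or str.
    """
    if isinstance(stat_output, bytes):
        stat_output = stat_output.decode('utf-8', errors='replace')
    return 'nfs' in stat_output.strip().lower()


def is_nfs_mount(mountpoint):
    """Same command as in the commit, but always returns a bool."""
    cmd = ['stat', '-f', '-L', '-c', '%T', mountpoint]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    output, _ = p.communicate()
    return fs_type_is_nfs(output)
```

Unlike the committed version, which returns the empty output itself when `stat` prints nothing, this variant always returns `True` or `False`.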
@@ -45,7 +45,7 @@ except ImportError:
     from urllib.error import URLError
     from urllib.request import urlopen
-VERSION = '1.25-3'
+VERSION = '1.26.2'
 SERVICE = 'elasticfilesystem'
 CONFIG_FILE = '/etc/amazon/efs/efs-utils.conf'
@@ -454,7 +454,12 @@ def check_efs_mounts(config, child_procs, unmount_grace_period_sec, state_file_d
                 logging.exception('Unable to parse json in %s', state_file_path)
                 continue
-        is_running = is_pid_running(state['pid'])
+        try:
+            pid = state['pid']
+            is_running = is_pid_running(pid)
+        except KeyError:
+            logging.debug('Did not find PID in state file. Assuming stunnel is not running')
+            is_running = False
         current_time = time.time()
         if 'unmount_time' in state:
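Reviewer note on the `check_efs_mounts` hunk: a state file without a `pid` key is now treated as "stunnel not running", which is what lets the watchdog restart the tunnel after the clean-up step has removed a stale PID. A minimal sketch of the same decision using `dict.get` follows. The function name `stunnel_is_running` and the injected `is_pid_running` callable are illustrative assumptions, not code from the commit.

```python
def stunnel_is_running(state, is_pid_running):
    """Decide whether the stunnel recorded in a state dict is alive.

    `state` is the parsed JSON state file; `is_pid_running` is any callable
    that takes a PID and returns a bool (hypothetical injection point).
    """
    pid = state.get('pid')
    if pid is None:
        # No PID recorded: assume the tunnel has to be started again.
        return False
    return is_pid_running(pid)
```

One difference from the committed `try`/`except KeyError`: here a `KeyError` raised inside `is_pid_running` itself would propagate instead of being read as a missing PID.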
@@ -969,6 +974,49 @@ def get_utc_now():
     return datetime.utcnow()
+def check_process_name(pid):
+    cmd = ['cat', '/proc/{pid}/cmdline'.format(pid=pid)]
+    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
+    return p.communicate()[0]
+def clean_up_previous_stunnel_pids(state_file_dir=STATE_FILE_DIR):
+    """
+    Cleans up stunnel pids created by mount watchdog spawned by a previous efs-csi-driver after driver restart, upgrade
+    or crash. This method attempts to clean PIDs from persisted state files after efs-csi-driver restart to
+    ensure watchdog creates a new stunnel.
+    """
+    state_files = get_state_files(state_file_dir)
+    logging.debug('Persisted state files in "%s": %s', state_file_dir, list(state_files.values()))
+    for state_file in state_files.values():
+        state_file_path = os.path.join(state_file_dir, state_file)
+        with open(state_file_path) as f:
+            try:
+                state = json.load(f)
+            except ValueError:
+                logging.exception('Unable to parse json in %s', state_file_path)
+                continue
+            try:
+                pid = state['pid']
+            except KeyError:
+                logging.debug('No PID found in state file %s', state_file)
+                continue
+            out = check_process_name(pid)
+            if out and 'stunnel' in str(out):
+                logging.debug('PID %s in state file %s is active. Skipping clean up', pid, state_file)
+                continue
+            state.pop('pid')
+            logging.debug('Cleaning up pid %s in state file %s', pid, state_file)
+            rewrite_state_file(state, state_file_dir, state_file)
 def main():
     parse_arguments()
     assert_root()
@@ -983,6 +1031,8 @@ def main():
     poll_interval_sec = config.getint(CONFIG_SECTION, 'poll_interval_sec')
     unmount_grace_period_sec = config.getint(CONFIG_SECTION, 'unmount_grace_period_sec')
+    clean_up_previous_stunnel_pids()
     while True:
         config = read_config()
         check_efs_mounts(config, child_procs, unmount_grace_period_sec)
......
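Reviewer note on `check_process_name`: the commit shells out to `cat /proc/<pid>/cmdline` and searches the raw output for `stunnel`. On Linux that file holds the arguments separated by NUL bytes, so it can also be read directly. The sketch below is one possible alternative, not the committed code: `read_cmdline`, `is_stunnel`, and the `proc_root` parameter are hypothetical names, and it is stricter than the commit because it only inspects the executable name rather than the whole command line.

```python
import os


def read_cmdline(pid, proc_root='/proc'):
    """Return the argument list of a process, or [] if it cannot be read."""
    path = os.path.join(proc_root, str(pid), 'cmdline')
    try:
        with open(path, 'rb') as f:
            raw = f.read()
    except (IOError, OSError):
        # Process is gone, or /proc is not available on this system.
        return []
    # Arguments are NUL-separated; drop empty trailing pieces.
    return [arg.decode('utf-8', errors='replace') for arg in raw.split(b'\0') if arg]


def is_stunnel(pid, proc_root='/proc'):
    """True only if the executable name of the process contains 'stunnel'."""
    args = read_cmdline(pid, proc_root)
    return bool(args) and 'stunnel' in os.path.basename(args[0])
```

Checking the process name matters because a PID persisted in a state file can refer to an unrelated process after the container restarts, which is the situation the commit's clean-up step is meant to handle.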
@@ -52,7 +52,7 @@ def test_changelog_version_match():
 def get_expected_version_release():
     global_version = get_global_value('version')
     global_release = get_global_value('release')
-    return global_version + '-' + global_release
+    return global_version + '.' + global_release
 def get_version_for_changelog(file_path):
......
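Reviewer note on the version format: this commit switches the combined version string from `<version>-<release>` to `<version>.<release>`, so `version=1.26` and `release=2` in the `[global]` section become `1.26.2`. A small Python 3 sketch of that composition, assuming the values are read with the standard `configparser` module (the test suite's own `get_global_value` helper is not shown in this diff, and `expected_version_release` is a hypothetical name):

```python
import configparser


def expected_version_release(ini_text):
    """Join version and release from a [global] section with a dot."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    version = parser.get('global', 'version')
    release = parser.get('global', 'release')
    return version + '.' + release


if __name__ == '__main__':
    print(expected_version_release('[global]\nversion=1.26\nrelease=2\n'))
```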
@@ -185,3 +185,9 @@ def test_main_tlsport_is_integer(mocker):
 def test_main_tlsport_is_not_integer(mocker, capsys):
     expected_err = 'is not an integer'
     _test_main_assert_error(mocker, capsys, expected_err, tls=True, tlsport=TLSPORT_INCORRECT)
+def test_main_tls_mount_point_mounted_with_non_nfs(mocker):
+    mocker.patch('os.path.ismount', return_value=True)
+    mocker.patch('mount_efs.is_nfs_mount', return_value=False)
+    _test_main(mocker, tls=True)
@@ -102,13 +102,14 @@ def test_mount_nfs_tls_netns(mocker):
     assert '/mnt' in args[NFS_MOUNT_POINT_IDX + NETNS_NFS_OFFSET]
-def test_mount_tls_mountpoint_mounted(mocker, capsys):
+def test_mount_tls_mountpoint_mounted_with_nfs(mocker, capsys):
     options = dict(DEFAULT_OPTIONS)
     options['tls'] = None
     bootstrap_tls_mock = mocker.patch('mount_efs.bootstrap_tls')
     mocker.patch('os.path.ismount', return_value=True)
+    mocker.patch('mount_efs.is_nfs_mount', return_value=True)
     mount_efs.mount_tls(CONFIG, INIT_SYSTEM, DNS_NAME, PATH, FS_ID, MOUNT_POINT, options)
     out, err = capsys.readouterr()
     assert 'is already mounted' in out
     utils.assert_not_called(bootstrap_tls_mock)
\ No newline at end of file
@@ -144,6 +144,22 @@ def test_tls_not_running(mocker, tmpdir):
     utils.assert_called_once(restart_tls_mock)
+def test_tls_not_running_due_to_pid_clean_up(mocker, tmpdir):
+    state = dict(STATE)
+    state.pop('pid')
+    state_file_dir, state_file = create_state_file(tmpdir, content=json.dumps(state))
+    clean_up_mock, restart_tls_mock, _ = setup_mocks(mocker,
+                                                     mounts={'mnt': watchdog.Mount('127.0.0.1', '/mnt', 'nfs4', '', '0', '0')},
+                                                     state_files={'mnt': state_file}, is_pid_running=True)
+    watchdog.check_efs_mounts(_get_config(), [], GRACE_PERIOD, state_file_dir)
+    utils.assert_not_called(clean_up_mock)
+    utils.assert_called_once(restart_tls_mock)
 def test_ap_mount_with_extra_mount(mocker, tmpdir):
     state_file_dir, state_file = create_state_file(tmpdir)
......
+#
+# Copyright 2017-2018 Amazon.com, Inc. and its affiliates. All Rights Reserved.
+#
+# Licensed under the MIT License. See the LICENSE accompanying this file
+# for the specific language governing permissions and limitations under
+# the License.
+#
+import watchdog
+import json
+import tempfile
+from .. import utils
+from datetime import datetime
+PID = 1234
+STATE = {
+    'pid': PID,
+    'commonName': 'deadbeef.com',
+    'certificate': '/tmp/foobar',
+    'certificateCreationTime': datetime.utcnow().strftime(watchdog.CERT_DATETIME_FORMAT),
+    'mountStateDir': 'fs-deadbeef.mount.dir.12345',
+    'privateKey': '/tmp/foobarbaz',
+    'accessPoint': 'fsap-fedcba9876543210'
+}
+PROCESS_NAME_OUTPUT = 'stunnel/var/run/efs/stunnel-config/fs-deadbeef.mount.dir.12345'
+PROCESS_NAME_OUTPUT_LWP = '/foo/bar/baz'
+PROCESS_NAME_OUTPUT_ERR = ''
+def setup_mocks(mocker, state_files, process_name_output):
+    mocker.patch('watchdog.get_state_files', return_value=state_files)
+    mocker.patch('watchdog.check_process_name', return_value=process_name_output)
+    return mocker.patch('watchdog.rewrite_state_file')
+def create_state_file(tmpdir, content=json.dumps(STATE)):
+    state_file = tmpdir.join(tempfile.mktemp())
+    state_file.write(content, ensure=True)
+    return state_file.dirname, state_file.basename
+def test_malformed_state_file(mocker, tmpdir):
+    state_file_dir, state_file = create_state_file(tmpdir, 'not-json')
+    rewrite_state_file_mock = setup_mocks(mocker, state_files={'mnt' : state_file},
+                                          process_name_output=PROCESS_NAME_OUTPUT)
+    watchdog.clean_up_previous_stunnel_pids(state_file_dir)
+    utils.assert_not_called(rewrite_state_file_mock)
+def test_clean_up_active_stunnel_from_previous_watchdog(mocker, tmpdir):
+    state_file_dir, state_file = create_state_file(tmpdir)
+    rewrite_state_file_mock = setup_mocks(mocker, state_files={'mnt': state_file},
+                                          process_name_output=PROCESS_NAME_OUTPUT)
+    watchdog.clean_up_previous_stunnel_pids(state_file_dir)
+    utils.assert_not_called(rewrite_state_file_mock)
+def test_clean_up_active_LWP_from_driver(mocker, tmpdir):
+    state_file_dir, state_file = create_state_file(tmpdir)
+    rewrite_state_file_mock = setup_mocks(mocker, state_files={'mnt': state_file},
+                                          process_name_output=PROCESS_NAME_OUTPUT_LWP)
+    watchdog.clean_up_previous_stunnel_pids(state_file_dir)
+    utils.assert_called_once(rewrite_state_file_mock)
+def test_clean_up_stunnel_pid_from_previous_driver(mocker, tmpdir):
+    state_file_dir, state_file = create_state_file(tmpdir)
+    rewrite_state_file_mock = setup_mocks(mocker, state_files={'mnt': state_file},
+                                          process_name_output=PROCESS_NAME_OUTPUT_ERR)
+    watchdog.clean_up_previous_stunnel_pids(state_file_dir)
+    utils.assert_called_once(rewrite_state_file_mock)
+def test_no_state_files_from_previous_driver(mocker, tmpdir):
+    rewrite_state_file_mock = setup_mocks(mocker, state_files={}, process_name_output=PROCESS_NAME_OUTPUT)
+    watchdog.clean_up_previous_stunnel_pids(tmpdir)
+    utils.assert_not_called(rewrite_state_file_mock)
+def test_clean_up_multiple_stunnel_pids(mocker, tmpdir):
+    state_file_dir, state_file_1 = create_state_file(tmpdir)
+    state = dict(STATE)
+    state['pid'] = 5678
+    state_file_dir, state_file_2 = create_state_file(tmpdir, content=json.dumps(state))
+    rewrite_state_file_mock = setup_mocks(mocker, state_files={'mnt/a1': state_file_1, 'mnt/a2': state_file_2},
+                                          process_name_output=PROCESS_NAME_OUTPUT_ERR)
+    watchdog.clean_up_previous_stunnel_pids(state_file_dir)
+    utils.assert_called(rewrite_state_file_mock)
+def test_clean_up_stunnel_no_pid(mocker, tmpdir):
+    state = dict(STATE)
+    state.pop('pid')
+    state_file_dir, state_file = create_state_file(tmpdir, content=json.dumps(state))
+    rewrite_state_file_mock = setup_mocks(mocker, state_files={'mnt': state_file},
+                                          process_name_output=PROCESS_NAME_OUTPUT_LWP)
+    watchdog.clean_up_previous_stunnel_pids(state_file_dir)
+    utils.assert_not_called(rewrite_state_file_mock)
\ No newline at end of file