Commit bb6b811a authored by Yuan Gao's avatar Yuan Gao

Support publishing mount success/failure notification via CloudWatch log

parent d2a1c58c
......@@ -41,6 +41,9 @@ tarball: clean
shebang-support:
./mangle-shebangs.sh
.PHONY: sources
sources: tarball
.PHONY: rpm-only
rpm-only:
mkdir -p $(BUILD_DIR)/{SPECS,COORD_SOURCES,DATA_SOURCES,BUILD,RPMS,SOURCES,SRPMS}
......@@ -50,7 +53,7 @@ rpm-only:
cp $(BUILD_DIR)/RPMS/*/*rpm build
.PHONY: rpm
rpm: shebang-support tarball rpm-only
rpm: shebang-support sources rpm-only
.PHONY: deb
deb:
......
......@@ -6,23 +6,24 @@ Utilities for Amazon Elastic File System (EFS)
The `efs-utils` package has been verified against the following Linux distributions:
| Distribution | Package Type | `init` System |
| ------------ | ------------ | ------------- |
| Amazon Linux 2017.09 | `rpm` | `upstart` |
| Amazon Linux 2 | `rpm` | `systemd` |
| CentOS 7 | `rpm` | `systemd` |
| RHEL 7 | `rpm`| `systemd` |
| RHEL 8 | `rpm`| `systemd` |
| Fedora 28 | `rpm` | `systemd` |
| Fedora 29 | `rpm` | `systemd` |
| Fedora 30 | `rpm` | `systemd` |
| Fedora 31 | `rpm` | `systemd` |
| Fedora 32 | `rpm` | `systemd` |
| Debian 9 | `deb` | `systemd` |
| Debian 10 | `deb` | `systemd` |
| Ubuntu 16.04 | `deb` | `systemd` |
| Ubuntu 18.04 | `deb` | `systemd` |
| Ubuntu 20.04 | `deb` | `systemd` |
| Distribution | Package Type | `init` System | Python Env |
| ------------ | ------------ | ------------- | --------- |
| Amazon Linux 2017.09 | `rpm` | `upstart` | Python2 |
| Amazon Linux 2 | `rpm` | `systemd` | Python2 |
| CentOS 7 | `rpm` | `systemd` | Python2 |
| CentOS 8 | `rpm` | `systemd` | Python3 |
| RHEL 7 | `rpm` | `systemd` | Python2 |
| RHEL 8 | `rpm` | `systemd` | Python3 |
| Fedora 28 | `rpm` | `systemd` | Python3 |
| Fedora 29 | `rpm` | `systemd` | Python3 |
| Fedora 30 | `rpm` | `systemd` | Python3 |
| Fedora 31 | `rpm` | `systemd` | Python3 |
| Fedora 32 | `rpm` | `systemd` | Python3 |
| Debian 9 | `deb` | `systemd` | Python2 |
| Debian 10 | `deb` | `systemd` | Python2 |
| Ubuntu 16.04 | `deb` | `systemd` | Python2 |
| Ubuntu 18.04 | `deb` | `systemd` | Python3 |
| Ubuntu 20.04 | `deb` | `systemd` | Python3 |
## Prerequisites
......@@ -41,6 +42,19 @@ For those using Amazon Linux or Amazon Linux 2, the easiest way to install `efs-
$ sudo yum -y install amazon-efs-utils
```
### Install via AWS Systems Manager Distributor
You can now use AWS Systems Manager Distributor to automatically install or update `amazon-efs-utils`.
Please refer to [Using AWS Systems Manager to automatically install or update Amazon EFS clients](https://docs.aws.amazon.com/efs/latest/ug/manage-efs-utils-with-aws-sys-manager.html) for more guidance.
The following are prerequisites for using AWS Systems Manager Distributor to install or update `amazon-efs-utils`:
1. The AWS Systems Manager agent is installed on the instance. (For `Amazon Linux` and `Ubuntu`, the AWS Systems Manager agent
is pre-installed; for other distributions, please refer to [install AWS Systems Manager agent on Linux EC2 instance](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-install-ssm-agent.html)
for more guidance.)
2. The instance has an IAM role attached with the AWS managed policy `AmazonElasticFileSystemsUtils`. This policy enables your instance to be managed by
the AWS Systems Manager agent, and it also contains the permissions needed to support specific features.
### On other Linux distributions
Other distributions require building the package from source and installing it.
......@@ -143,6 +157,58 @@ By default, when using the EFS mount helper with TLS, it enforces certificate ho
Once you’ve installed the `amazon-efs-utils` package, to upgrade your system’s version of `stunnel`, see [Upgrading Stunnel](https://docs.aws.amazon.com/efs/latest/ug/using-amazon-efs-utils.html#upgrading-stunnel).
## Enable mount success/failure notification via CloudWatch log
`efs-utils` now supports publishing mount success/failure logs to CloudWatch Logs. By default, this feature is disabled. There are three
steps you must follow to enable and use this feature:
### Step 1. Install botocore
`efs-utils` uses botocore to interact with the CloudWatch Logs service. Note the package type and
Python environment for your distribution in the table above.
- To install botocore on RPM
```bash
sudo yum -y install wget
wget https://bootstrap.pypa.io/get-pip.py -O /tmp/get-pip.py

# Python2
sudo python /tmp/get-pip.py
sudo pip install botocore || sudo /usr/local/bin/pip install botocore

# Python3
sudo python3 /tmp/get-pip.py
sudo pip3 install botocore || sudo /usr/local/bin/pip3 install botocore
```
- To install botocore on DEB
```bash
sudo apt-get update
sudo apt-get -y install wget
wget https://bootstrap.pypa.io/get-pip.py -O /tmp/get-pip.py
# Python2
sudo python /tmp/get-pip.py
sudo pip install botocore || sudo /usr/local/bin/pip install botocore
# On Debian 10, botocore needs to be installed in a specific target folder
sudo python /tmp/get-pip.py
sudo pip install --target /usr/lib/python2.7/dist-packages botocore || sudo /usr/local/bin/pip install --target /usr/lib/python2.7/dist-packages botocore
# Python3
sudo python3 /tmp/get-pip.py
sudo pip3 install botocore || sudo /usr/local/bin/pip3 install botocore
# On Ubuntu 20, botocore needs to be installed in a specific target folder
sudo python3 /tmp/get-pip.py
sudo pip3 install --target /usr/lib/python3/dist-packages botocore || sudo /usr/local/bin/pip3 install --target /usr/lib/python3/dist-packages botocore
```
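Once botocore is installed, you can verify the import the same way the mount helper does. The snippet below mirrors the `BOTOCORE_PRESENT` guard in `mount_efs` and is purely an illustrative check, not part of efs-utils:

```python
# efs-utils treats botocore as optional: if the import fails, the
# CloudWatch log feature is skipped instead of failing the mount.
try:
    import botocore.session  # noqa: F401
    BOTOCORE_PRESENT = True
except ImportError:
    BOTOCORE_PRESENT = False

print('botocore available: %s' % BOTOCORE_PRESENT)
```

If this prints `False` after Step 1, check that `pip` installed botocore for the same Python interpreter (`python` vs `python3`) that your distribution's `mount.efs` uses.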
### Step 2. Enable CloudWatch log feature in efs-utils config file `/etc/amazon/efs/efs-utils.conf`
```bash
sudo sed -i -e '/\[cloudwatch-log\]/{N;s/# enabled = true/enabled = true/}' /etc/amazon/efs/efs-utils.conf
```
You can also configure the CloudWatch log group name and the log retention period (in days) in the config file.
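These option names map directly onto how the mount helper reads its configuration. Below is a minimal sketch of that parsing, using an inline sample in place of the real `/etc/amazon/efs/efs-utils.conf` and assuming Python 3's `configparser`:

```python
from configparser import ConfigParser

# Inline stand-in for the [cloudwatch-log] section of
# /etc/amazon/efs/efs-utils.conf after the feature is enabled.
SAMPLE_CONF = """
[cloudwatch-log]
enabled = true
log_group_name = /aws/efs/utils
retention_in_days = 14
"""

config = ConfigParser()
config.read_string(SAMPLE_CONF)

# getboolean() accepts 'true'/'false' strings; in the real helper,
# missing options fall back to built-in defaults.
enabled = config.getboolean('cloudwatch-log', 'enabled')
log_group = config.get('cloudwatch-log', 'log_group_name')
retention = config.getint('cloudwatch-log', 'retention_in_days')
print(enabled, log_group, retention)  # True /aws/efs/utils 14
```

Commenting out `retention_in_days` in the real config file prevents log deletion, as the file's own comment notes.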
### Step 3. Attach the CloudWatch logs policy to the IAM role attached to instance.
Attach the AWS managed policy `AmazonElasticFileSystemsUtils` to the IAM role attached to the instance, or to the AWS credentials
configured on your instance.
After completing these three steps, you will be able to see mount status notifications in CloudWatch Logs.
## License Summary
This code is made available under the MIT license.
......
......@@ -32,7 +32,7 @@
%endif
Name : amazon-efs-utils
Version : 1.27.1
Version : 1.28.1
Release : 1%{platform}
Summary : This package provides utilities for simplifying the use of EFS file systems
......@@ -132,6 +132,10 @@ fi
%clean
%changelog
* Fri Sep 18 2020 Yuan Gao <ygaochn@amazon.com> - 1.28.1
- Introduce botocore to publish mount success/failure notification to cloudwatch log
- Revert stop emitting unrecognized init system supervisord if the watchdog daemon has already been launched by supervisor check
* Tue Aug 4 2020 Karthik Basavaraj <kbbasav@amazon.com> - 1.27.1
- Merge PR #60 on GitHub. Adds support for AssumeRoleWithWebIdentity
......
......@@ -11,10 +11,11 @@ set -ex
BASE_DIR=$(pwd)
BUILD_ROOT=${BASE_DIR}/build/debbuild
VERSION=1.27.1
VERSION=1.28.1
RELEASE=1
DEB_SYSTEM_RELEASE_PATH=/etc/os-release
UBUNTU18_REGEX="Ubuntu 18"
UBUNTU20_REGEX="Ubuntu 20"
DEBIAN11_REGEX="Debian GNU/Linux bullseye"
echo 'Cleaning deb build workspace'
......@@ -31,7 +32,7 @@ mkdir -p ${BUILD_ROOT}/var/log/amazon/efs
mkdir -p ${BUILD_ROOT}/usr/share/man/man8
if [ -f $DEB_SYSTEM_RELEASE_PATH ] && echo "$(grep PRETTY_NAME $DEB_SYSTEM_RELEASE_PATH)" \
| grep -e "$UBUNTU18_REGEX" -e "$DEBIAN11_REGEX"; then
| grep -e "$UBUNTU18_REGEX" -e "$DEBIAN11_REGEX" -e "$UBUNTU20_REGEX"; then
echo 'Correcting python executable'
sed -i -e 's/python|python2/python3/' dist/amazon-efs-utils.control
# Replace the first line in .py to "#!/usr/bin/env python3" no matter what it was before
......
......@@ -7,5 +7,5 @@
#
[global]
version=1.27.1
version=1.28.1
release=1
Package: amazon-efs-utils
Architecture: all
Version: 1.27.1
Version: 1.28.1
Section: utils
Depends: python|python2, nfs-common, stunnel4 (>= 4.56), openssl (>= 1.0.2), util-linux
Priority: optional
......
......@@ -51,4 +51,12 @@ poll_interval_sec = 1
unmount_grace_period_sec = 30
# Set client auth/access point certificate renewal rate. Minimum value is 1 minute.
tls_cert_renewal_interval_min = 60
\ No newline at end of file
tls_cert_renewal_interval_min = 60
[cloudwatch-log]
# enabled = true
log_group_name = /aws/efs/utils
# Possible values are: 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653
# Comment this config to prevent log deletion
retention_in_days = 14
\ No newline at end of file
......@@ -69,8 +69,15 @@ except ImportError:
from urllib.error import URLError, HTTPError
from urllib.parse import urlencode
try:
import botocore.session
from botocore.exceptions import ClientError, NoCredentialsError, EndpointConnectionError
BOTOCORE_PRESENT = True
except ImportError:
BOTOCORE_PRESENT = False
VERSION = '1.27.1'
VERSION = '1.28.1'
SERVICE = 'elasticfilesystem'
CONFIG_FILE = '/etc/amazon/efs/efs-utils.conf'
......@@ -78,6 +85,12 @@ CONFIG_SECTION = 'mount'
CLIENT_INFO_SECTION = 'client-info'
CLIENT_SOURCE_STR_LEN_LIMIT = 100
CLOUDWATCH_LOG_SECTION = 'cloudwatch-log'
DEFAULT_CLOUDWATCH_LOG_GROUP = '/aws/efs/utils'
DEFAULT_RETENTION_DAYS = 14
# The CloudWatch log agent dict holds the CloudWatch Logs botocore client, the log group name, and the log stream name
CLOUDWATCHLOG_AGENT = None
LOG_DIR = '/var/log/amazon/efs'
LOG_FILE = 'mount.log'
......@@ -216,6 +229,7 @@ def fatal_error(user_message, log_message=None, exit_code=1):
sys.stderr.write('%s\n' % user_message)
logging.error(log_message)
publish_cloudwatch_log(CLOUDWATCHLOG_AGENT, 'Mount failed, %s' % log_message)
sys.exit(exit_code)
......@@ -248,18 +262,36 @@ def get_target_region(config):
def get_region_from_instance_metadata():
instance_identity, err_msg = get_instance_identity_info_from_instance_metadata('region')
if err_msg:
raise Exception(err_msg)
return instance_identity
def get_instance_id_from_instance_metadata():
instance_id, err_msg = get_instance_identity_info_from_instance_metadata('instanceId')
if err_msg:
logging.warning('Cannot get instance id from instance metadata, %s' % err_msg)
return instance_id
def get_instance_identity_info_from_instance_metadata(property):
err_msg = None
try:
headers = {}
instance_identity = get_aws_ec2_metadata(headers)
return instance_identity['region']
return instance_identity[property], err_msg
except HTTPError as e:
# 401:Unauthorized, the GET request uses an invalid token, so generate a new one
if e.code == 401:
token = get_aws_ec2_metadata_token()
headers = {'X-aws-ec2-metadata-token': token}
instance_identity = get_aws_ec2_metadata(headers)
return instance_identity['region']
return instance_identity[property], err_msg
err_msg = 'Unable to reach instance metadata service at %s: status=%d, reason is %s' \
% (INSTANCE_METADATA_SERVICE_URL, e.code, e.reason)
except URLError as e:
......@@ -267,10 +299,9 @@ def get_region_from_instance_metadata():
except ValueError as e:
err_msg = 'Error parsing json: %s' % (e,)
except KeyError as e:
err_msg = 'Region not present in %s: %s' % (instance_identity, e)
err_msg = '%s not present in %s: %s' % (property, instance_identity, e)
if err_msg:
raise Exception(err_msg)
return None, err_msg
def get_region_from_legacy_dns_format(config):
......@@ -847,31 +878,6 @@ def start_watchdog(init_system):
else:
logging.debug('%s is already running', WATCHDOG_SERVICE)
elif init_system == 'supervisord':
error_message = None
proc = subprocess.Popen(
['supervisorctl', 'status', WATCHDOG_SERVICE], stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
stdout, stderr = proc.communicate()
rc = proc.returncode
if rc != 0:
if rc == 4: # No such process
error_message = \
'%s process is not started, please use supervisor to launch the watchdog daemon' % WATCHDOG_SERVICE
elif rc == 2: # Supervisorctl is not properly setup
error_message = 'Cannot invoke supervisorctl to check status of %s' % WATCHDOG_SERVICE
else:
error_message = 'Unknown error %s' % stderr
else:
if 'RUNNING' in stdout:
logging.debug('%s is already running', WATCHDOG_SERVICE)
else:
logging.debug('%s is not running', WATCHDOG_SERVICE)
if error_message:
sys.stderr.write('%s\n' % error_message)
logging.warning(error_message)
else:
error_message = 'Could not start %s, unrecognized init system "%s"' % (WATCHDOG_SERVICE, init_system)
sys.stderr.write('%s\n' % error_message)
......@@ -1014,7 +1020,9 @@ def mount_nfs(dns_name, path, mountpoint, options):
out, err = proc.communicate()
if proc.returncode == 0:
logging.info('Successfully mounted %s at %s', dns_name, mountpoint)
message = 'Successfully mounted %s at %s' % (dns_name, mountpoint)
logging.info(message)
publish_cloudwatch_log(CLOUDWATCHLOG_AGENT, message)
else:
message = 'Failed to mount %s at %s: returncode=%d, stderr="%s"' % (dns_name, mountpoint, proc.returncode, err.strip())
fatal_error(err.strip(), message, proc.returncode)
......@@ -1534,6 +1542,7 @@ def match_device(config, device):
primary, secondaries, _ = socket.gethostbyname_ex(remote)
hostnames = list(filter(lambda e: e is not None, [primary] + secondaries))
except socket.gaierror:
create_default_cloudwatchlog_agent_if_not_exist(config)
fatal_error(
'Failed to resolve "%s" - check that the specified DNS name is a CNAME record resolving to a valid EFS DNS '
'name' % remote,
......@@ -1541,6 +1550,7 @@ def match_device(config, device):
)
if not hostnames:
create_default_cloudwatchlog_agent_if_not_exist(config)
fatal_error(
'The specified domain name "%s" did not resolve to an EFS mount target' % remote
)
......@@ -1556,6 +1566,7 @@ def match_device(config, device):
if hostname == expected_dns_name:
return fs_id, path
else:
create_default_cloudwatchlog_agent_if_not_exist(config)
fatal_error('The specified CNAME "%s" did not resolve to a valid DNS name for an EFS mount target. '
'Please refer to the EFS documentation for mounting with DNS names for examples: %s'
% (remote, 'https://docs.aws.amazon.com/efs/latest/ug/mounting-fs-mount-cmd-dns-name.html'))
......@@ -1627,6 +1638,334 @@ def check_options_validity(options):
fatal_error('The "awscredsuri" and "awsprofile" options are mutually exclusive')
def bootstrap_cloudwatch_logging(config, fs_id=None):
if not check_if_cloudwatch_log_enabled(config):
return None
cloudwatchlog_client = get_botocore_client(config, 'logs')
if not cloudwatchlog_client:
return None
cloudwatchlog_config = get_cloudwatchlog_config(config, fs_id)
log_group_name = cloudwatchlog_config.get('log_group_name')
log_stream_name = cloudwatchlog_config.get('log_stream_name')
retention_days = cloudwatchlog_config.get('retention_days')
group_creation_completed = create_cloudwatch_log_group(cloudwatchlog_client, log_group_name)
if not group_creation_completed:
return None
put_retention_policy_completed = put_cloudwatch_log_retention_policy(cloudwatchlog_client, log_group_name, retention_days)
if not put_retention_policy_completed:
return None
stream_creation_completed = create_cloudwatch_log_stream(cloudwatchlog_client, log_group_name, log_stream_name)
if not stream_creation_completed:
return None
return {
'client': cloudwatchlog_client,
'log_group_name': log_group_name,
'log_stream_name': log_stream_name
}
def create_default_cloudwatchlog_agent_if_not_exist(config):
if not check_if_cloudwatch_log_enabled(config):
return None
global CLOUDWATCHLOG_AGENT
if not CLOUDWATCHLOG_AGENT:
CLOUDWATCHLOG_AGENT = bootstrap_cloudwatch_logging(config)
def get_botocore_client(config, service):
if not BOTOCORE_PRESENT:
logging.error('Failed to import botocore, please install botocore first.')
return None
session = botocore.session.get_session()
region = get_target_region(config)
iam_role_name = get_iam_role_name()
if iam_role_name:
credentials, _ = get_aws_security_credentials_from_instance_metadata(iam_role_name)
if credentials:
return session.create_client(service, aws_access_key_id=credentials['AccessKeyId'],
aws_secret_access_key=credentials['SecretAccessKey'],
aws_session_token=credentials['Token'], region_name=region)
return session.create_client(service, region_name=region)
def get_cloudwatchlog_config(config, fs_id=None):
log_group_name = DEFAULT_CLOUDWATCH_LOG_GROUP
if config.has_option(CLOUDWATCH_LOG_SECTION, 'log_group_name'):
log_group_name = config.get(CLOUDWATCH_LOG_SECTION, 'log_group_name')
retention_days = DEFAULT_RETENTION_DAYS
if config.has_option(CLOUDWATCH_LOG_SECTION, 'retention_in_days'):
retention_days = config.get(CLOUDWATCH_LOG_SECTION, 'retention_in_days')
log_stream_name = get_cloudwatch_log_stream_name(fs_id)
return {
'log_group_name': log_group_name,
'retention_days': int(retention_days),
'log_stream_name': log_stream_name
}
def get_cloudwatch_log_stream_name(fs_id=None):
instance_id = get_instance_id_from_instance_metadata()
if instance_id and fs_id:
log_stream_name = '%s - %s - mount.log' % (fs_id, instance_id)
elif instance_id:
log_stream_name = '%s - mount.log' % (instance_id)
elif fs_id:
log_stream_name = '%s - mount.log' % (fs_id)
else:
log_stream_name = 'default - mount.log'
return log_stream_name
def check_if_cloudwatch_log_enabled(config):
if config.has_option(CLOUDWATCH_LOG_SECTION, 'enabled'):
return config.getboolean(CLOUDWATCH_LOG_SECTION, 'enabled')
return False
def cloudwatch_create_log_group_helper(cloudwatchlog_client, log_group_name):
cloudwatchlog_client.create_log_group(
logGroupName=log_group_name
)
logging.info('Created cloudwatch log group %s' % log_group_name)
def create_cloudwatch_log_group(cloudwatchlog_client, log_group_name):
try:
cloudwatch_create_log_group_helper(cloudwatchlog_client, log_group_name)
except ClientError as e:
exception = e.response['Error']['Code']
if exception == 'ResourceAlreadyExistsException':
logging.debug('Log group %s already exists, %s' % (log_group_name, e.response))
return True
elif exception == 'LimitExceededException':
logging.error('Reached the maximum number of log groups that can be created, %s' % e.response)
return False
elif exception == 'OperationAbortedException':
logging.debug('Multiple requests to update the same log group %s were in conflict, %s' % (log_group_name, e.response))
return False
elif exception == 'InvalidParameterException':
logging.error('Log group name %s is specified incorrectly, %s' % (log_group_name, e.response))
return False
else:
handle_general_botocore_exceptions(e)
return False
except NoCredentialsError as e:
logging.warning('Credentials are not properly configured, %s' % e)
return False
except EndpointConnectionError as e:
logging.warning('Could not connect to the endpoint, %s' % e)
return False
except Exception as e:
logging.warning('Unknown error, %s.' % e)
return False
return True
def cloudwatch_put_retention_policy_helper(cloudwatchlog_client, log_group_name, retention_days):
cloudwatchlog_client.put_retention_policy(
logGroupName=log_group_name,
retentionInDays=retention_days
)
logging.debug('Set cloudwatch log group retention days to %s' % retention_days)
def put_cloudwatch_log_retention_policy(cloudwatchlog_client, log_group_name, retention_days):
try:
cloudwatch_put_retention_policy_helper(cloudwatchlog_client, log_group_name, retention_days)
except ClientError as e:
exception = e.response['Error']['Code']
if exception == 'ResourceNotFoundException':
logging.error('Log group %s does not exist, %s' % (log_group_name, e.response))
return False
elif exception == 'OperationAbortedException':
logging.debug('Multiple requests to update the same log group %s were in conflict, %s' % (log_group_name, e.response))
return False
elif exception == 'InvalidParameterException':
logging.error('Either parameter log group name %s or retention in days %s is specified incorrectly, %s'
% (log_group_name, retention_days, e.response))
return False
else:
handle_general_botocore_exceptions(e)
return False
except NoCredentialsError as e:
logging.warning('Credentials are not properly configured, %s' % e)
return False
except EndpointConnectionError as e:
logging.warning('Could not connect to the endpoint, %s' % e)
return False
except Exception as e:
logging.warning('Unknown error, %s.' % e)
return False
return True
def cloudwatch_create_log_stream_helper(cloudwatchlog_client, log_group_name, log_stream_name):
cloudwatchlog_client.create_log_stream(
logGroupName=log_group_name,
logStreamName=log_stream_name
)
logging.info('Created cloudwatch log stream %s in log group %s' % (log_stream_name, log_group_name))
def create_cloudwatch_log_stream(cloudwatchlog_client, log_group_name, log_stream_name):
try:
cloudwatch_create_log_stream_helper(cloudwatchlog_client, log_group_name, log_stream_name)
except ClientError as e:
exception = e.response['Error']['Code']
if exception == 'ResourceAlreadyExistsException':
logging.debug('Log stream %s already exists in log group %s, %s' % (log_stream_name, log_group_name, e.response))
return True
elif exception == 'InvalidParameterException':
logging.error('Either parameter log group name %s or log stream name %s is specified incorrectly, %s'
% (log_group_name, log_stream_name, e.response))
return False
elif exception == 'ResourceNotFoundException':
logging.error('Log group %s does not exist, %s' % (log_group_name, e.response))
return False
else:
handle_general_botocore_exceptions(e)
return False
except NoCredentialsError as e:
logging.warning('Credentials are not properly configured, %s' % e)
return False
except EndpointConnectionError as e:
logging.warning('Could not connect to the endpoint, %s' % e)
return False
except Exception as e:
logging.warning('Unknown error, %s.' % e)
return False
return True
def cloudwatch_put_log_events_helper(cloudwatchlog_agent, message, token=None):
kwargs = {
'logGroupName': cloudwatchlog_agent.get('log_group_name'),
'logStreamName': cloudwatchlog_agent.get('log_stream_name'),
'logEvents': [
{
'timestamp': int(round(time.time() * 1000)),
'message': message
}
]
}
if token:
kwargs['sequenceToken'] = token
cloudwatchlog_agent.get('client').put_log_events(**kwargs)
def publish_cloudwatch_log(cloudwatchlog_agent, message):
if not cloudwatchlog_agent or not cloudwatchlog_agent.get('client'):
return False
token = get_log_stream_next_token(cloudwatchlog_agent)
try:
cloudwatch_put_log_events_helper(cloudwatchlog_agent, message, token)
except ClientError as e:
exception = e.response['Error']['Code']
if exception == 'InvalidSequenceTokenException':
logging.debug('The sequence token is not valid, %s' % e.response)
return False