5

3 ways SSSD logging improvements make sysadmins' lives easier

 1 year ago
source link: https://www.redhat.com/sysadmin/sssd-logging-improvements-linux
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

System Security Services Daemon (SSSD) is the primary client software component supporting identity management in many Linux distributions, including Red Hat Enterprise Linux (RHEL), Fedora, and Debian. RHEL 8.6 and 9.0 contain new SSSD logging improvements, which we'll discuss in this article.

[ Improve your skills managing and using SELinux with this helpful guide. ] 

SSSD troubleshooting challenges

SSSD is complex software composed of multiple components (processes and libraries) that talk to each other through various interprocess communication techniques.

Image
SSSD architecture

(Image courtesy of the SSSD Project)

When troubleshooting issues related to SSSD, you must analyze or review SSSD debug logs. This can be daunting, especially when SSSD is configured to run with a high debug level. Tracking identity or authentication requests across different log files often requires deep knowledge of internal components. These requests are handled asynchronously, making linear log-file analysis unhelpful or time consuming. There may be tens of thousands of lines of log messages on high-traffic systems to review, with thousands of requests happening.

SSSD improvements in RHEL

The following SSSD improvements in RHEL aim to solve these challenges.

1. Unique request identifier

SSSD heavily utilizes the Samba project's tevent library, which runs an event loop to handle events asynchronously. SSSD now takes advantage of tevent's new unique identifier (Chain ID) support.

This unique request identifier, which bears no other meaning than being unique, is now printed in SSSD responder logs (such as sssd_nss.log or sssd_pam.log) to "tag" a log message and associate it with a specific request. The identifier [CID#1] is in the first line of this output:

[accept_fd_handler] (0x0400): [CID#1] Client [cmd id][uid 1000][0xb7b480][27] connected!
[sss_cmd_get_version] (0x0200): [CID#1] Received client version [1].
[sss_cmd_get_version] (0x0200): [CID#1] Offered version [1].
[sss_nss_getby_name] (0x0400): [CID#1] Input name: [email protected]
[cache_req_set_plugin] (0x2000): [CID#1] CR #0: Setting "User by name" plugin
[cache_req_send] (0x0400): [CID#1] CR #0: REQ_TRACE: New request [CID #1] 'User by name'
[cache_req_process_input] (0x0400): [CID#1] CR #0: Parsing input name [[email protected]]
[sss_domain_get_state] (0x1000): [CID#1] Domain implicit_files is Active
[sss_domain_get_state] (0x1000): [CID#1] Domain example.com is Active
[sss_parse_name_for_domains] (0x0200): [CID#1] name '[email protected]' matched expression for domain example.com', user is jstephen

SSSD responders forward this unique request identifier to back-end processes. The data provider's requests generate a unique ID, and the back-end log (for example, sssd_$domain.log) messages include the tag RID#X.

This approach allows administrators to correlate requests between the SSSD responder's (front-end) and the SSSD data provider's (back-end) log files. The first line of the log message (below), which is printed for every new request, shows RID#9, then the same log message prints CID#1, effectively linking the SSSD front-end request (CID#1) to the SSSD back-end (RID#9) request:

 [be[ad.vm]] [dp_attach_req] (0x0400): [RID#9] DP Request [Account #9]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001].
 [be[ad.vm]] [dp_attach_req] (0x0400): [RID#9] Number of active DP request: 1
 [be[ad.vm]] [sdap_get_groups_next_base] (0x0400): [RID#9] Searching for groups with base [DC=ad,DC=vm]
 [be[ad.vm]] [sdap_print_server] (0x2000): [RID#9] Searching 192.168.121.150:389
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x0400): [RID#9] calling ldap_search_ext with [(&(objectSID=S-1-5-21-433998187-2822908608-1404606238-518)(objectClass=group)(sAMAccountName=*))][DC=ad,DC=vm].
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [objectClass]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [sAMAccountName]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [gidNumber]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [member]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [objectGUID]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [objectSID]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [whenChanged]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [uSNChanged]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x1000): [RID#9] Requesting attrs: [groupType]
 [be[ad.vm]] [sdap_get_generic_ext_step] (0x2000): [RID#9] ldap_search_ext called, msgid = 16
 [be[ad.vm]] [sdap_op_add] (0x2000): [RID#9] New operation 16 timeout 6
 [be[ad.vm]] [sdap_parse_entry] (0x1000): [RID#9] OriginalDN: [CN=Schema Admins,CN=Users,DC=ad,DC=vm]
 [be[ad.vm]] [sdap_op_destructor] (0x2000): [RID#9] Operation 16 finished
 [be[ad.vm]] [sdap_get_groups_process] (0x0400): [RID#9] Search for groups, returned 1 results.

[ Get a list of Linux utilities and commands for managing servers and networks. ]

2. Debug backtrace

The default log level (debug_level config option) of SSSD components is 2 (0x0040: Serious failures). This captures important errors, but it doesn't explain the reason for those errors.

To facilitate investigations, SSSD introduced the "debug backtrace" feature starting with SSSD 2.5.x (available in RHEL 8.5 and newer). When a significant SSSD operational failure occurs, debug backtrace triggers a dump of the entire log (all log levels) from a specified date forward to help uncover the sequence of events that resulted in the failure. A dump is clearly highlighted in the log file by the begin and end markers:

********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
...
********************** BACKTRACE DUMP ENDS HERE ********************************

This is especially useful with one-off, intermittent, and difficult-to-produce issues. It also makes sos report logs more useful without any additional actions.

The feature is enabled by default. See the debug_backtrace_enabled option in man sssd.conf(5) for more information.

3. Log analyzer

SSSD includes a new log parsing tool in the sssd-tools package for debug-log analysis. This analyzer tool filters log messages pertaining to a specific client request you're investigating. This allows you to follow or track relevant logs for a specific request across the SSSD front-end, back-end, and child processes.

In this example, the Active Directory user testuser cannot log in to the RHEL system. Here's how to identify the problem with the log analyzer tool.

First, use the list argument to find the unique identifier associated with the client command you're tracking. NSS (identity) lookups are shown by default; for authentication requests, you must also pass the --pam argument:

$ sudo sssctl analyze request list --pam
2022-08-23  9:30:02: [uid 0] CID #1: su
2022-08-23  9:32:05: [uid 0] CID #2: sshd: [email protected]

You can add the -v option for more detailed information about each request. This option lists related subrequests that are executed inside SSSD and would normally be generated by a single client command.

[ Learn How to integrate Active Directory Federation Services (ADFS) authentication with Red Hat SSO using SAML. ]

The second request is an SSH login attempt for testuser. The number 2 (as in [CID#2]) is the unique ID you provide as an argument to the analyzer show command to track the request across SSSD components and print all relevant logs. In this snippet, RID#43 was generated by a subrequest of CID#2. The --pam argument is also needed here:

$ sudo sssctl analyze request show 2 --pam
[pam] [accept_fd_handler] (0x0400): [CID#2] Client [cmd sshd: [email protected] [priv]][uid 0][0x55bd0e395ae0][24] connected to privileged pipe!
[pam] [sss_cmd_get_version] (0x0200): [CID#2] Received client version [3].
[pam] [sss_cmd_get_version] (0x0200): [CID#2] Offered version [3].
[pam] [pam_cmd_authenticate] (0x0100): [CID#2] entering pam_cmd_authenticate
[pam] [sss_domain_get_state] (0x1000): [CID#2] Domain implicit_files is Active
[pam] [sss_domain_get_state] (0x1000): [CID#2] Domain ad.domain is Active
[pam] [sss_domain_get_state] (0x1000): [CID#2] Domain ad.domain is Active
[pam] [sss_parse_name_for_domains] (0x0200): [CID#2] name '[email protected] matched expression for domain 'ad.domain', user is testuser
[pam] [pam_print_data] (0x0100): [CID#2] command: SSS_PAM_AUTHENTICATE
[pam] [pam_print_data] (0x0100): [CID#2] domain: ad.domain
[pam] [pam_print_data] (0x0100): [CID#2] user: testuser
[pam] [pam_print_data] (0x0100): [CID#2] service: sshd
<snip>
…
<snip>
[be[ad.domain]] [simple_access_check_send] (0x0200): [RID#43] Simple access check for [email protected]
[be[ad.domain]] [simple_check_users] (0x4000): [RID#43] Checking against allow list username [[email protected]].
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_check_users] (0x4000): [RID#43] Checking against allow list username [[email protected]].
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_check_users] (0x4000): [RID#43] Checking against allow list username [[email protected]].
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_check_users] (0x4000): [RID#43] Checking against allow list username [[email protected]].
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_check_users] (0x4000): [RID#43] Checking against allow list username [[email protected]].
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_check_users] (0x4000): [RID#43] Checking against allow list username [[email protected]].
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_check_get_groups_send] (0x1000): [RID#43] Looking up groups for user [email protected]
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_check_get_groups_send] (0x0400): [RID#43] User [email protected] is a member of 16 supplemental groups
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group Reports - [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group Public - Domain [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group Public - Domain [email protected]
[be[ad.domain]] [simple_check_process_group] (0x2000): [RID#43] Adding group Domain [email protected]
[be[ad.domain]] [sysdb_search_group_by_id] (0x0400): [RID#43] No such entry
[be[ad.domain]] [simple_check_get_groups_primary] (0x0040): [RID#43] Could not look up primary group [650835861]: [2][No such file or directory]
[be[ad.domain]] [simple_check_get_groups_send] (0x0400): [RID#43] All groups had name attribute
[be[ad.domain]] [simple_check_groups] (0x4000): [RID#43] Checking against allow list group name [[email protected]].
[be[ad.domain]] [sss_domain_get_state] (0x1000): [RID#43] Domain ad.domain is Active
[be[ad.domain]] [simple_access_check_done] (0x2000): [RID#43] Group check done
[be[ad.domain]] [simple_access_check_recv] (0x1000): [RID#43] Access not granted
[be[ad.domain]] [dp_req_done] (0x0400): [RID#43] DP Request [PAM Account #43]: Request handler finished [0]: Success
[be[ad.domain]] [_dp_req_recv] (0x0400): [RID#43] DP Request [PAM Account #43]: Receiving request data.
[be[ad.domain]] [dp_req_destructor] (0x0400): [RID#43] DP Request [PAM Account #43]: Request removed.
[be[ad.domain]] [dp_req_destructor] (0x0400): [RID#43] Number of active DP request: 0
[be[ad.domain]] [dp_method_enabled] (0x0400): [RID#43] Target selinux is not configured

The logs printed by the analyzer are specifically associated with this authentication attempt. The tail end of this output shows that the simple access check failed. It also indicates the specific group that was missing and is needed to access the system:

[be[ad.domain]] [simple_check_groups] (0x4000): [RID#43] Checking against allow list group name [[email protected]].
[be[ad.domain]] [simple_access_check_recv] (0x1000): [RID#43] Access not granted

These log messages show that this Active Directory user is not a member of the necessary group on the Active Directory side. They are a member of [email protected] but not [email protected]. Once you add this user, they can successfully log in. Using the SSSD log analyzer makes finding the root cause of this failed authentication easier and quicker.

As a side note, you can use the --child option to include logs for SSSD child processes such as p11_child, gpo_child, or krb5_child.

By default, the analyze command parses SSSD logs from the system where you run the command. You can use the --logdir option to analyze SSSD logs from any system. This requires providing a path to a directory containing SSSD debug logs. It could be useful for help-desk or support team members who often receive and analyze different sets of logs.

Wrap up

These logging improvements hopefully make system administrators' jobs easier. Please try out these new SSSD features and provide feedback about what works and doesn't work for you, what further improvements would make sense, or other suggestions. You can find the SSSD team's contacts on the project website.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK