14

Don’t Use MongoDB Profiling Level 1

 4 years ago
source link: https://www.percona.com/blog/2019/11/15/dont-use-mongodb-profiling-level-1/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

FfUBriv.png!web TLDR: It is not profile level 1 that is the problem; it’s a gotcha with the optional ‘slowms’ argument that causes users to accidentally set verbose logging and fill their disk with log files.

In MongoDB, there are two ways to see, with individual detail, which operations were executed and how long they took.

  • Profiling . Saves the operation details to a capped collection system.profile . You access this information through a mongodb connection.
  • Log lines of “COMMAND” component type in the mongod log files . (Also mongos v4.0+ log files). You have to access the files as Unix or Windows user and work with them as text.

Profiling is low-cost, but it is limited to keeping only a small snapshot. It has levels : 0 (off), 1 (slowOp(s only)), 2 (all).

The (log file) logger also has levels of its own, but there is no ‘off’. Even at level 0, it prints any slow operation in a “COMMAND” log line. ‘Slow’ is defined by the configured slowOpThresholdMs option (originally “slowMS”). That is 100ms by default, which is a good default i.m.o.

Usability problem

The log-or-not code and profile-or-not are unrelated systems to the user, but they share the same global variable for ‘slow’: serverGlobalParams.slowMS

The basic post-mortem will go:

  • Someone used db.setProfilingLevel(1/*slowOp*/, 0/*ms*/) to start profiling all operations (DON’T DO THIS – use db.setProfilingLevel(2/*all*/) instead.)
  • The logger starts writing every command to the log files.
  • They executed db.setProfilingLevel(0/*off*/) to turn the profiler off.
  • The logger continues writing every command to the log files because slowMS is still 0ms.
  • The DBA gets paged after hours because the disk filled up with log file and thought ‘oh, I should have set up log rotation on that server; I’ll do it in the morning’.
  • The DBA gets woken up in the small hours of the morning because the new primary node has also crashed due to a full disk.

So that’s the advice: Until MongoDB is enhanced to have separate slow-op threshold options for the profiler, never use db.setProfilingLevel(1, …). Even if you know the gotcha, someone learning over your shoulder won’t see it.

What to do instead:

  • Use only db.setProfilingLevel(0 /*off*/) <-> db.setProfilingLevel(2 /*all*/) and don’t touch slowms
    • There is still a valid use case for using profiler at level 1, but if you are not taking on the responsibility of looking after the log file’s code interdependence on the slowMS value, don’t go there.
  • If you want the logger to print “COMMAND” lines for every command, including fast ones, use db.setLogLevel(1, "command") . And run db.setLogLevel(-1, "command") to stop it again. ‘Use log level 1, not profile level 1’ could almost be the catchphrase of this article.

    (db.setLogLevel(1) + db.setLogLevel(0) is an alternative to the above, but is an older, blunter method.)

  • If you want to set the sampleRate you can do that without changing level (or ‘slowms’) with the following command: db.setProfilingLevel(db.getProfilingStatus().was, {“sampleRate”: <new float value>})
  • If you want a different threshold for slowMS permanently, use the slowOpThresholdMs option in the config file, but you can also do it dynamically as for the sampleRate instructions above.

Some other gotchas to do with profiling and logging:

  • The logger level is global; the slowOpThresholdMs a.k.a. slowMS value is global, but profiling level is per db namespace.
  • All of the above are local only to the mongod (or mongos) the commands are run on / the config file is set for. If you run it on a primary it does not change the secondaries, and if you run it on a mongos it does not change the shard or the configsvr replicaset nodes.
  • mongos nodes only provide a subset of these diagnostic features. They have no collections of their own, so for starters, they cannot make a system.profile collection.
bool shouldDBProfile(bool shouldSample = true) {
    // Profile level 2 should override any sample rate or slowms settings.
    if (_dbprofile >= 2)
        return true;
 
    if (!shouldSample || _dbprofile <= 0)
        return false;
 
      /* Blog: and by elimination if _dbprofile == 1: */
    return elapsedTimeExcludingPauses() >= Milliseconds{serverGlobalParams.slowMS};
}
 
bool CurOp::completeAndLogOperation(OperationContext* opCtx,
                                    logger::LogComponent component,
                                    boost::optional<size_t> responseLength,
                                    boost::optional<long long> slowMsOverride,
                                    bool forceLog) {
    // Log the operation if it is eligible according to the current slowMS and sampleRate settings.
    const bool shouldLogOp = (forceLog || shouldLog(component, logger::LogSeverity::Debug(1)));
    const long long slowMs = slowMsOverride.value_or(serverGlobalParams.slowMS);
    ...
    ...
    const bool shouldSample =
        client->getPrng().nextCanonicalDouble() < serverGlobalParams.sampleRate;
 
    if (shouldLogOp || (shouldSample && _debug.executionTimeMicros > slowMs * 1000LL)) {
        auto lockerInfo = opCtx->lockState()->getLockerInfo(_lockStatsBase);
 
    ...
    // Return 'true' if this operation should also be added to the profiler.
    return shouldDBProfile(shouldSample);
}

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK