Don’t Use MongoDB Profiling Level 1

FfUBriv.png!web TLDR: It is not profile level 1 that is the problem; it’s a gotcha with the optional ‘slowms’ argument that causes users to accidentally set verbose logging and fill their disk with log files.

In MongoDB, there are two ways to see, with individual detail, which operations were executed and how long they took.

Profiling . Saves the operation details to a capped collection system.profile . You access this information through a mongodb connection.
Log lines of “COMMAND” component type in the mongod log files . (Also mongos v4.0+ log files). You have to access the files as Unix or Windows user and work with them as text.

Profiling is low-cost, but it is limited to keeping only a small snapshot. It has levels : 0 (off), 1 (slowOp(s only)), 2 (all).

The (log file) logger also has levels of its own, but there is no ‘off’. Even at level 0, it prints any slow operation in a “COMMAND” log line. ‘Slow’ is defined by the configured slowOpThresholdMs option (originally “slowMS”). That is 100ms by default, which is a good default i.m.o.

Usability problem

The log-or-not code and profile-or-not are unrelated systems to the user, but they share the same global variable for ‘slow’: serverGlobalParams.slowMS

The basic post-mortem will go:

Someone used db.setProfilingLevel(1/*slowOp*/, 0/*ms*/) to start profiling all operations (DON’T DO THIS – use db.setProfilingLevel(2/*all*/) instead.)
The logger starts writing every command to the log files.
They executed db.setProfilingLevel(0/*off*/) to turn the profiler off.
The logger continues writing every command to the log files because slowMS is still 0ms.
The DBA gets paged after hours because the disk filled up with log file and thought ‘oh, I should have set up log rotation on that server; I’ll do it in the morning’.
The DBA gets woken up in the small hours of the morning because the new primary node has also crashed due to a full disk.

So that’s the advice: Until MongoDB is enhanced to have separate slow-op threshold options for the profiler, never use db.setProfilingLevel(1, …). Even if you know the gotcha, someone learning over your shoulder won’t see it.

What to do instead:

Use only db.setProfilingLevel(0 /*off*/) <-> db.setProfilingLevel(2 /*all*/) and don’t touch slowms
- There is still a valid use case for using profiler at level 1, but if you are not taking on the responsibility of looking after the log file’s code interdependence on the slowMS value, don’t go there.
If you want the logger to print “COMMAND” lines for every command, including fast ones, use db.setLogLevel(1, "command") . And run db.setLogLevel(-1, "command") to stop it again. ‘Use log level 1, not profile level 1’ could almost be the catchphrase of this article.

(db.setLogLevel(1) + db.setLogLevel(0) is an alternative to the above, but is an older, blunter method.)
If you want to set the sampleRate you can do that without changing level (or ‘slowms’) with the following command: db.setProfilingLevel(db.getProfilingStatus().was, {“sampleRate”: <new float value>})
If you want a different threshold for slowMS permanently, use the slowOpThresholdMs option in the config file, but you can also do it dynamically as for the sampleRate instructions above.

Some other gotchas to do with profiling and logging:

The logger level is global; the slowOpThresholdMs a.k.a. slowMS value is global, but profiling level is per db namespace.
All of the above are local only to the mongod (or mongos) the commands are run on / the config file is set for. If you run it on a primary it does not change the secondaries, and if you run it on a mongos it does not change the shard or the configsvr replicaset nodes.
mongos nodes only provide a subset of these diagnostic features. They have no collections of their own, so for starters, they cannot make a system.profile collection.

bool shouldDBProfile(bool shouldSample = true) {
    // Profile level 2 should override any sample rate or slowms settings.
    if (_dbprofile >= 2)
        return true;
 
    if (!shouldSample || _dbprofile <= 0)
        return false;
 
      /* Blog: and by elimination if _dbprofile == 1: */
    return elapsedTimeExcludingPauses() >= Milliseconds{serverGlobalParams.slowMS};
}
 
bool CurOp::completeAndLogOperation(OperationContext* opCtx,
                                    logger::LogComponent component,
                                    boost::optional<size_t> responseLength,
                                    boost::optional<long long> slowMsOverride,
                                    bool forceLog) {
    // Log the operation if it is eligible according to the current slowMS and sampleRate settings.
    const bool shouldLogOp = (forceLog || shouldLog(component, logger::LogSeverity::Debug(1)));
    const long long slowMs = slowMsOverride.value_or(serverGlobalParams.slowMS);
    ...
    ...
    const bool shouldSample =
        client->getPrng().nextCanonicalDouble() < serverGlobalParams.sampleRate;
 
    if (shouldLogOp || (shouldSample && _debug.executionTimeMicros > slowMs * 1000LL)) {
        auto lockerInfo = opCtx->lockState()->getLockerInfo(_lockStatsBase);
 
    ...
    // Return 'true' if this operation should also be added to the profiler.
    return shouldDBProfile(shouldSample);
}

Usability problem

What to do instead:

Recommend

Benchmarking deep learning activation functions on MNIST

精讲 JavaScript 的 "switch" 语句

携手更远冠军路：联想成为中国国家女子排球队主赞助商、官方合作伙伴

瑞幸咖啡Q3产品净营收大增558% 门店运营盈利1.86亿元

2019联想创新科技大会：发布多款高新技术产品打造智慧生态

沙子掺不进了，下沉市场得加水

雄安要搞监管沙盒，需补哪些课？

UI2Code（三）imgcook

Guaranteed communication despite DDoS attacks – Scion

Sourcehut's Year in Alpha

About Joyk