Skip to content

Commit da6d109

Browse files
jimwwalkerchiyoung
authored andcommitted
Fix ThreadSanitiser issue found with TapProducer
The problem occurs because we're reading expiry time unlocked but set it with the queueLock held. It seems that the disconnect logic should really look at the 3 "elements" it cares about in a consistent way, so the new disconnect method locks the producer and evaluates all 3 parts. TSAN output (shortened): WARNING: ThreadSanitizer: data race (pid=139855) Read of size 4 at 0x7d600000f068 by thread T18 (mutexes: write M2532): #0 ConnHandler::getExpiryTime() /ep-engine/src/tapconnection.h:350 couchbase#1 ConnManager::run() /ep-engine/src/connmap.cc:150 couchbase#2 ExecutorThread::run() /ep-engine/src/executorthread.cc:115 couchbase#3 launch_executor_thread(void*) /ep-engine/src/executorthread.cc:33 couchbase#4 platform_thread_wrap(void*) /platform/src/cb_pthreads.cc:54 Previous write of size 4 at 0x7d600000f068 by main thread (mutexes: write M22798, write M22800): #0 TapProducer::processAck(...) /ep-engine/src/tapconnection.h:346 couchbase#1 EventuallyPersistentEngine::processTapAck(v..) /ep-engine/src/ep_engine.cc:3038 couchbase#2 EventuallyPersistentEngine::tapNotify(...) /ep-engine/src/ep_engine.cc:2761 couchbase#3 EvpTapNotify(...) /ep-engine/src/ep_engine.cc:1446 couchbase#4 mock_tap_notify(...) /memcached/programs/engine_testapp/engine_testapp.cc:442 couchbase#5 test_tap_ack_stream(...) /ep-engine/tests/ep_testsuite.cc:7851 #6 execute_test(...) /memcached/programs/engine_testapp/engine_testapp.cc:1091 Change-Id: Ibf13c2958aac3da745506e8ac6995ce76bd03223 Reviewed-on: http://review.couchbase.org/57234 Reviewed-by: Dave Rigby <daver@couchbase.com> Tested-by: buildbot <build@couchbase.com> Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
1 parent 65f9687 commit da6d109

2 files changed

Lines changed: 6 additions & 1 deletion

File tree

src/connmap.cc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -489,7 +489,7 @@ void TapConnMap::manageConnections() {
489489
for (iter = map_.begin(); iter != map_.end(); ++iter) {
490490
TapProducer *tp = dynamic_cast<TapProducer*>(iter->second.get());
491491
if (tp != NULL) {
492-
if (tp->supportsAck() && (tp->getExpiryTime() < now) && tp->windowIsFull()) {
492+
if (tp->shouldDisconnect(now)) {
493493
LOG(EXTENSION_LOG_WARNING,
494494
"%s Expired and ack windows is full. Disconnecting...",
495495
tp->logHeader());

src/tapconnection.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -940,6 +940,11 @@ class TapProducer : public Producer {
940940
clearQueues_UNLOCKED();
941941
}
942942

943+
bool shouldDisconnect(rel_time_t now) {
944+
LockHolder lh(queueLock);
945+
return supportsAck() && (getExpiryTime() < now) && windowIsFull();
946+
}
947+
943948
static const char* opaqueCmdToString(uint32_t opaque_code);
944949

945950
protected:

0 commit comments

Comments
 (0)