This repository was archived by the owner on Apr 24, 2023. It is now read-only.

Scheduler silently fails on malformed ZK URLs #950

@PerilousApricot

Describe the bug
In a couple of places [1] [2], the user is instructed to suffix the ZK connection string with a directory (ZK node?), /cook. If the user does this, the scheduler for some reason never connects to the Mesos master.

[1] https://github.com/twosigma/Cook/blob/master/scheduler/docs/configuration.adoc
[2] https://github.com/twosigma/Cook/blob/master/scheduler/example-prod-config.edn#L15

To Reproduce
Download the latest Cook, build it, and manually set the :zookeeper {:connection} config option to have a trailing /cook. The scheduler begins some preparatory work, then seemingly hangs, only periodically writing heartbeat messages to the log. I can turn this failure mode on and off by adding or removing that suffix.
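
For illustration, the two variants of the setting might look like the fragment below. This is only a sketch: the host names and port are placeholder assumptions, not values from a real deployment, and the rest of the scheduler config is omitted.

```edn
;; Variant reported to hang: connection string ends in /cook
;; (zk1/zk2.example.com:2181 are hypothetical hosts)
{:zookeeper {:connection "zk1.example.com:2181,zk2.example.com:2181/cook"}}

;; Variant reported to work: same hypothetical hosts, no trailing path
{:zookeeper {:connection "zk1.example.com:2181,zk2.example.com:2181"}}
```

Toggling between the two strings is the only change needed to switch the failure on and off.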

Expected behavior
I'd expect an explicit crash in this case. I presume that the scheduler attempts to perform master election and fails because of the invalid ZK hostname. Since I never saw an error, and one of the final lines in the log was from Cook trying to find the Mesos scheduler, I spent time debugging that interaction when the true failure was elsewhere.
