I ran into this problem installing Ceph using Rook (v1.5.7) on a single data-bearing host with multiple OSDs. The install shipped with the default CRUSH rule replicated_rule, which uses host as the failure domain:
$ ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
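This placement fails because the chooseleaf_firstn step wants each replica on a separate host, and with only one host in the CRUSH tree only one replica can ever be placed, so the PG stays undersized. You can confirm the topology with ceph osd tree; the output below is a hypothetical sketch of a single-host layout with three OSDs, not taken from my cluster:
$ ceph osd tree
# (illustrative output for a single data-bearing host)
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         0.29306  root default
-3         0.29306      host single-node
 0    hdd  0.09769          osd.0             up   1.00000  1.00000
 1    hdd  0.09769          osd.1             up   1.00000  1.00000
 2    hdd  0.09769          osd.2             up   1.00000  1.00000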
Next I had to find the pool associated with PG 1.0, which was stuck "undersized". Luckily, in a default rook-ceph install there is only one pool:
$ ceph osd pool ls
device_health_metrics
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+undersized+remapped
And to confirm the PG's pool is using the default rule:
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule
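In my case the PG-to-pool mapping was obvious, but in a cluster with several pools you can work it out directly: the pool ID is the numeric prefix of the PG ID (the 1 in 1.0), and ceph osd pool ls detail prints each pool's ID next to its name. A quick sketch for the general case (commands only, output omitted):
$ ceph pg dump_stuck undersized   # list PGs stuck in the undersized state
$ ceph osd pool ls detail         # "pool 1 'device_health_metrics' ..." maps 1.x PGs to their pool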
Instead of modifying the default CRUSH rule, I opted to create a new replicated rule, this time using the osd (aka device) bucket type (docs: CRUSH Map Types and Buckets), and keeping the default CRUSH root of default:
# osd crush rule create-replicated <name> <root> <type> [<class>]
$ ceph osd crush rule create-replicated replicated_rule_osd default osd
$ ceph osd crush rule dump replicated_rule_osd
{
    "rule_id": 1,
    "rule_name": "replicated_rule_osd",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_firstn",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
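The only functional change from the default rule is the choose step: choose_firstn over type osd picks N distinct OSDs directly instead of one OSD per distinct host. If you want to sanity-check a rule before assigning it to a pool, crushtool can test the compiled CRUSH map; this is a sketch assuming rule ID 1 and a replica count of 3, matching the dump above:
$ ceph osd getcrushmap -o /tmp/crushmap
$ crushtool -i /tmp/crushmap --test --rule 1 --num-rep 3 --show-mappings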
Then I assigned the new rule to the existing pool:
$ ceph osd pool set device_health_metrics crush_rule replicated_rule_osd
set pool 1 crush_rule to replicated_rule_osd
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule_osd
Finally, confirming the PG state:
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+clean
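Any other pool (or a pool Rook creates later) that still references replicated_rule will hit the same problem. If you are sure every pool on the cluster should place replicas at the OSD level, the same pool set command can be applied across the board; a one-liner sketch, assuming that is really what you want:
$ for pool in $(ceph osd pool ls); do ceph osd pool set "$pool" crush_rule replicated_rule_osd; done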