1 vote

I'm deploying rook-ceph into a minikube cluster. Everything seems to be working. I added 3 unformatted disks to the VM and they're connected. The problem I'm having is that when I run ceph status, I get a health warning that says "1 pg undersized". How exactly do I fix this?

The documentation (https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/) states: "If you are trying to create a cluster on a single node, you must change the default of the osd crush chooseleaf type setting from 1 (meaning host or node) to 0 (meaning osd) in your Ceph configuration file before you create your monitors and OSDs." I don't know where to make this configuration change, but if there's any other way to fix this that I should know of, please let me know. Thanks!

I recommend you have a look at the placement groups calculator: ceph.io/pgcalc. Basically, your warning is saying that the placement group has fewer copies than the configured pool replication level. We would need more information about your setup in order to give a proper answer. – Iris G.
Hey, thanks for the response. I should mention that I'm new to this. Since I'm not working with OpenStack, I selected the AIO option. I have 3 OSDs, which are the virtual hard disks I attached to the VM. I left %Data and OSD at the default value, which is 100, so the total PG count is 128. – Xcer
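
For context on where the 128 comes from, pgcalc's suggestion works out roughly as follows (assuming the pgcalc defaults of Target PGs per OSD = 100 and Size = 3, which are not stated explicitly in the comment):

(Target PGs per OSD × OSD# × %Data) / Size
= (100 × 3 × 100%) / 3
= 100, rounded up to the next power of two = 128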

2 Answers

4 votes

As you mentioned in your question, you should change your CRUSH failure-domain type to osd, which means Ceph will replicate your data between OSDs rather than between hosts. By default the failure domain is host, and when you have only one host there are no other hosts to replicate your data to, so your PG will always be undersized.

You should set osd crush chooseleaf type = 0 in your ceph.conf before you create your monitors and OSDs.

This will replicate your data between OSDs rather than hosts.
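
For reference, the ceph.conf snippet would look like this (a minimal sketch; it has to be in place before the monitors and OSDs are created):

[global]
osd crush chooseleaf type = 0

Since you're on Rook, you normally don't edit ceph.conf by hand; Rook provides a config override ConfigMap (typically named rook-config-override, with a config key) whose contents get merged into the generated ceph.conf, so that is the usual place for this kind of setting. Check the Rook docs for your version for the exact details.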

0 votes

I came across this problem when installing Ceph using Rook (v1.5.7) with a single data-bearing host that has multiple OSDs.

The install shipped with a default CRUSH rule, replicated_rule, which had host as the failure domain:

$ ceph osd crush rule dump replicated_rule    
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

I had to find out which pool the "undersized" PG 1 belonged to. Luckily, in a default rook-ceph install there's only one pool:

$ ceph osd pool ls
device_health_metrics

$ ceph pg ls-by-pool device_health_metrics
PG   OBJECTS  DEGRADED  ...  STATE
1.0        0         0  ...  active+undersized+remapped

And to confirm that the pool is using the default rule:

$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule

Instead of modifying the default CRUSH rule, I opted to create a new replicated rule, but this time specifying the osd (aka device) type (docs: CRUSH map Types and Buckets), also assuming the default CRUSH root of default:

# osd crush rule create-replicated <name> <root> <type> [<class>]
$ ceph osd crush rule create-replicated replicated_rule_osd default osd

$ ceph osd crush rule dump replicated_rule_osd
{
    "rule_id": 1,
    "rule_name": "replicated_rule_osd",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_firstn",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}

And then assigning the new rule to the existing pool:

$ ceph osd pool set device_health_metrics crush_rule replicated_rule_osd
set pool 1 crush_rule to replicated_rule_osd

$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule_osd

Finally, confirming the PG state:

$ ceph pg ls-by-pool device_health_metrics
PG   OBJECTS  DEGRADED  ...  STATE
1.0        0         0  ...  active+clean
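
With the PG now active+clean, the "1 pg undersized" warning from the question should clear. As a final sanity check you can re-run the standard health commands (output omitted here):

$ ceph health detail
$ ceph status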