0 votes

The problem (if you want to see the 'status' first, please see the output at the end of the post)

In my production cluster with slowly recovering erasure-coded pools,

new erasure pools are not usable because all their pgs remain "creating+incomplete" seemingly forever.

Recovery of my existing pools will take a very long time (several months), but I expect to be able to create new 'clean' pools immediately to receive new data in a resilient manner.

What I tried:

ceph osd pool create newpool 128 128 erasure myprofile

rados --pool newpool put anobject afile ==> This blocks

ceph pg ls-by-pool newpool incomplete ==> all my pgs are listed

ceph pg 15.1 query ==> state: "creating+incomplete"; "up" and "acting" contain only osd '1' as first element, and 'null' (2147483647) at all other positions.
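For anyone reproducing this, the new pool's parameters and CRUSH rule can be inspected along these lines (pool name as above; the rule name is whatever the pool creation generated; I have not pasted the output here):

ceph osd pool get newpool min_size      # current min_size of the new EC pool
ceph osd pool get newpool crush_rule    # which CRUSH rule the pool maps through
ceph osd crush rule dump                # inspect the rule CRUSH uses to place the PGs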

Please note that osd '1' on my platform is the most loaded one (it has almost twice as many PGs as the other OSDs).
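That imbalance can be seen with a generic check like this (the PGS column is the per-OSD placement-group count; output not pasted here):

ceph osd df tree    # per-OSD usage and PG count (PGS column), laid out by CRUSH hierarchy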

Full context

  • Luminous 12.2.0 (Ubuntu 16.04), migrated from Jewel
  • existing erasure pools inherited a very degraded situation (all OSDs are now up, but 83% of objects are misplaced and 13% are degraded)
  • recovery time is estimated at about a year
  • my erasure profile is 12 + 3

==> I would like to start writing in a new "clean" pool, while the existing pools will slowly recover.
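For reference, the 12 + 3 profile itself can be checked like this (output will of course depend on how myprofile was originally defined):

ceph osd erasure-code-profile ls               # list defined profiles
ceph osd erasure-code-profile get myprofile    # should report k=12, m=3 and the failure domain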

===================================== ceph status

cluster:
  id:     b5ee2a02-b92c-4829-8d43-0eb17314c0f6
  health: HEALTH_WARN
          1314118857/1577308005 objects misplaced (83.314%)
          Reduced data availability: 10 pgs inactive, 10 pgs incomplete
          Degraded data redundancy: 203997294/1577308005 objects degraded (12.933%), 492 pgs unclean, 342 pgs degraded, 279 pgs undersized

services:
  mon: 28 daemons, quorum 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28
  mgr: 37(active), standbys: 38, 12, 45, 33, 29, 10, 11, 22, 31, 47, 15, 36, 40, 32, 41, 24, 44, 34, 27, 28, 43, 35, 39, 16, 25, 26, 2, 9, 3, 4, 8, 23, 19, 17, 5, 42, 6, 21, 7, 20, 30, 13, 18, 14, 46, 1
  osd: 47 osds: 47 up, 47 in; 512 remapped pgs

data:
  pools:   3 pools, 522 pgs
  objects: 100M objects, 165 TB
  usage:   191 TB used, 466 TB / 657 TB avail
  pgs:     1.916% pgs not active
           203997294/1577308005 objects degraded (12.933%)
           1314118857/1577308005 objects misplaced (83.314%)
           155 active+undersized+degraded+remapped+backfill_wait
           140 active+remapped+backfill_wait
           114 active+undersized+degraded+remapped
           63  active+recovery_wait+degraded+remapped
           30  active+clean+remapped
           10  creating+incomplete
           9   active+recovery_wait+undersized+degraded+remapped
           1   active+undersized+degraded+remapped+backfilling

io:
  client:   291 kB/s rd, 0 B/s wr, 14 op/s rd, 16 op/s wr
  recovery: 6309 kB/s, 2 objects/s



ceph health detail

HEALTH_WARN 1314114780/1577303100 objects misplaced (83.314%); Reduced data availability: 10 pgs inactive, 10 pgs incomplete; Degraded data redundancy: 203992956/1577303100 objects degraded (12.933%), 492 pgs unclean, 342 pgs degraded, 279 pgs undersized
OBJECT_MISPLACED 1314114780/1577303100 objects misplaced (83.314%)
PG_AVAILABILITY Reduced data availability: 10 pgs inactive, 10 pgs incomplete
pg 15.0 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.1 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.2 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.3 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.4 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.5 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.6 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.7 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.8 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
pg 15.9 is creating+incomplete, acting [1,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647] (reducing pool tester min_size from 12 may help; search ceph.com/docs for 'incomplete')
PG_DEGRADED Degraded data redundancy: 203992956/1577303100 objects degraded (12.933%), 492 pgs unclean, 342 pgs degraded, 279 pgs undersized
pg 1.e2 is stuck unclean for 9096318.249662, current state active+undersized+degraded+remapped, last acting [1,2147483647,2147483647,5,40,8,28,47,13,12,29,10,23,2147483647,35]
pg 1.e3 is stuck undersized for 11585.111340, current state active+undersized+degraded+remapped, last acting [1,3,2147483647,47,11,21,32,46,28,23,2147483647,13,2147483647,19,26]
pg 1.e4 is stuck undersized for 11588.194871, current state active+undersized+degraded+remapped+backfill_wait, last acting [26,6,23,46,18,30,2147483647,25,38,29,13,45,9,35,20]
pg 1.e5 is stuck undersized for 11588.374341, current state active+undersized+degraded+remapped+backfill_wait, last acting [14,40,2147483647,22,18,17,29,31,28,43,34,19,33,15,32]
pg 1.e6 is stuck undersized for 11584.602668, current state active+undersized+degraded+remapped, last acting [1,38,40,2147483647,46,14,2147483647,23,7,44,15,39,8,21,28]
pg 1.e7 is stuck undersized for 11578.574380, current state active+undersized+degraded+remapped, last acting [1,13,2147483647,37,29,33,18,2147483647,9,38,23,16,42,2147483647,3]
pg 1.e8 is stuck undersized for 11571.385848, current state active+undersized+degraded+remapped, last acting [1,23,2147483647,7,36,26,6,39,38,2147483647,29,11,15,2147483647,19]
pg 1.e9 is stuck undersized for 11588.254477, current state active+undersized+degraded+remapped+backfill_wait, last acting [13,44,16,11,9,2147483647,32,37,45,17,20,21,40,46,2147483647]
pg 1.ea is stuck undersized for 11588.242417, current state active+undersized+degraded+remapped+backfill_wait, last acting [25,19,30,33,2147483647,44,20,39,17,45,43,24,2147483647,10,21]
pg 1.eb is stuck undersized for 11588.329063, current state active+undersized+degraded+remapped+backfill_wait, last acting [29,39,2147483647,2147483647,40,18,4,33,24,38,32,36,15,47,12]
pg 1.ec is stuck undersized for 11587.781353, current state active+undersized+degraded+remapped+backfill_wait, last acting [35,37,42,11,2147483647,2147483647,30,15,39,44,43,46,17,4,7]

[...]

pg 3.fa is stuck unclean for 11649.009911, current state active+remapped+backfill_wait, last acting [36,15,42,4,21,14,34,16,17,8,39,3,2,7,19]
pg 3.fb is stuck undersized for 11580.419670, current state active+undersized+degraded+remapped, last acting [1,7,16,9,19,39,2147483647,33,26,23,20,8,35,40,29]
pg 3.fd is stuck unclean for 11649.040651, current state active+remapped+backfill_wait, last acting [17,21,8,26,15,42,46,27,7,39,14,35,4,29,25]
pg 3.fe is active+recovery_wait+degraded+remapped, acting [22,8,45,18,10,46,33,36,16,7,17,34,43,1,23]
pg 3.ff is stuck unclean for 11649.056722, current state active+remapped+backfill_wait, last acting [33,46,47,17,37,4,40,34,28,43,3,44,13,2,11]
I have noticed that in the output of "ceph osd tree", all my "weights" are 0 (but my reweights are 1.0). - Keith Ben
Please go through this before posting a question. - soufrk

1 Answer

0 votes

I had the same problem after increasing the size of a cluster that had been migrated from Jewel to Luminous. Somewhere in the process the CRUSH map weights were damaged (not the values set through the "reweight" command) and ended up zeroed. This led my cluster to strange placement decisions (overflowing one of the nodes with too many PGs) and to an inability to allocate new PGs (at least when that node was not the one responsible for them).
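You can see the damage directly in the CRUSH WEIGHT column of "ceph osd tree" (the REWEIGHT column stays at 1.0). If only a few devices are affected, reweighting them one by one is a lighter-weight alternative (the osd id below is only an example):

ceph osd tree                        # CRUSH WEIGHT column shows 0 for the affected devices
ceph osd crush reweight osd.1 1.0    # per-device fix, instead of editing the whole map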

The problem was solved when we edited the crushmap (extracting, decompiling, editing by hand, recompiling, injecting it back) and restarted our cluster. We simply reset every device weight to 1.0 manually (because all our disks are identical).
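Roughly, the usual extract/decompile/edit/recompile/inject cycle looks like this (file names are only examples):

ceph osd getcrushmap -o crushmap.bin            # extract the compiled map
crushtool -d crushmap.bin -o crushmap.txt       # decompile it to text
# edit crushmap.txt: set each device/item weight back to 1.000
crushtool -c crushmap.txt -o crushmap.new.bin   # recompile
ceph osd setcrushmap -i crushmap.new.bin        # inject the fixed map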

After the restart, new PGs/pools could be allocated, and allocation behaviour seemed back to normal (things rebalanced nicely).
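To confirm, something along these lines should show the new pool's PGs leaving creating+incomplete (pool name as in the question):

ceph pg ls-by-pool newpool    # PGs should now move to active+clean
ceph -s                       # overall placement and recovery picture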

Hope that helps,

Soltiz.