1, Let me first summarize the actual problem.
(1) The real problem is
- Jobs are being dispatched much more slowly than before.
- No real pending reason is given:
e.g. until a job is dispatched, the only pending reason shown is "New job is waiting for scheduling;", even for a user who has no other pending or running jobs.
- This issue has been occurring since 9pm on May 15. It is probably related to a user who started submitting many special jobs at 8:53pm on May 15; those jobs are still pending in the queue below:
2, Suggested workaround:
- If possible, monitor the cluster for the next 2 to 3 hours to see whether the cluster status recovers somewhat as the number of jobs in queue "gr10261b" decreases.
- Inactivate the queue "gr10261b", or switch all of that user's jobs from queue "gr10261b" to another queue and inactivate the destination queue.
- At the very least, we think inactivating the queue GRQUE can be tested for around 20 to 30 minutes to see whether there is any improvement. If there is none, just reactivate (restore) the queue.
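For reference, the inactivate-and-observe test above could be scripted roughly as follows. This is only a sketch, not a verified procedure for your site: it assumes the standard LSF commands bqueues, badmin qinact, badmin qact and bswitch are available to an LSF administrator, and the names "some_user" and "hold_queue" are hypothetical placeholders. By default it only prints the commands (dry run) so they can be reviewed first.

```python
import shlex
import subprocess


def inactivate_commands(queue):
    """LSF commands to inactivate one queue and confirm its status."""
    return [
        ["bqueues", queue],           # record the queue's job counts beforehand
        ["badmin", "qinact", queue],  # stop dispatching jobs from this queue
        ["bqueues", queue],           # confirm the queue is now inactive
    ]


def restore_commands(queue):
    """LSF commands to reactivate the queue after the observation period."""
    return [
        ["badmin", "qact", queue],    # resume dispatching from this queue
        ["bqueues", queue],           # confirm the queue is active again
    ]


def switch_jobs_command(user, src_queue, dest_queue):
    """bswitch command moving all of one user's jobs between queues.

    The trailing job ID 0 makes bswitch act on every job that matches
    the -q and -u filters.
    """
    return ["bswitch", "-q", src_queue, "-u", user, dest_queue, "0"]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for cmd in commands:
        line = shlex.join(cmd)
        lines.append(line)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    # Dry run: shows what would be executed, in order.
    run(inactivate_commands("gr10261b"))
    # ...observe dispatch speed for 20 to 30 minutes, then restore...
    run(restore_commands("gr10261b"))
    # Alternative: move one user's jobs first ("some_user" and
    # "hold_queue" are hypothetical placeholders).
    run([switch_jobs_command("some_user", "gr10261b", "hold_queue")])
```

The inactivate and restore steps are kept in separate functions on purpose, so that the queue is not reactivated immediately and the 20 to 30 minute observation can happen in between.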
3, We already have a lot of information, thank you.
However, for further investigation, please gather the following information for our development team so that they can analyze the issue in more detail.