This proposal is an interim amendment, focused on the possibility of
backporting, for the problem that a Linux system can lock up forever due to
the behavior of the memory allocator.

About the current behavior of the memory allocator:

The memory allocator keeps looping rather than failing an allocation request
unless "the requested page's order is larger than PAGE_ALLOC_COSTLY_ORDER"
or "GFP_NORETRY flag is passed to the allocation request" or "TIF_MEMDIE
flag was set on the current thread by the OOM killer". As a result, the
system can stall forever without printing any kernel messages, which shows
up as an unexplained system hang.
( https://lwn.net/Articles/627419/ )
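
The three exit conditions above can be summarized as a small decision
function. This is a toy user-space model in Python, not kernel code: the
AllocRequest class, its field names and keeps_looping() are hypothetical
names introduced only for illustration, and the real retry logic contains
more detail than is shown here.

```python
from dataclasses import dataclass

# Value of PAGE_ALLOC_COSTLY_ORDER in the kernel headers.
PAGE_ALLOC_COSTLY_ORDER = 3


@dataclass
class AllocRequest:
    """Hypothetical stand-in for one page allocation request."""
    order: int                       # the request covers 2**order pages
    noretry: bool = False            # models the GFP_NORETRY flag
    caller_has_memdie: bool = False  # models TIF_MEMDIE on the current thread


def keeps_looping(req: AllocRequest) -> bool:
    """True if, per the three conditions above, the request is retried
    rather than failed."""
    if req.order > PAGE_ALLOC_COSTLY_ORDER:
        return False  # costly orders are allowed to fail
    if req.noretry:
        return False  # the caller asked not to retry
    if req.caller_has_memdie:
        return False  # the caller was already chosen as an OOM victim
    return True
```

In this model an ordinary order-0 request such as AllocRequest(order=0)
is never allowed to fail, which is the starting point for the cases below.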

There are at least three cases where a thread falls into an infinite loop
inside the memory allocator.

The first case is the too_many_isolated() throttling loop inside
shrink_inactive_list(). This throttling is intended to avoid invoking the
OOM killer unnecessarily, but a certain type of memory pressure can make
too_many_isolated() return true forever, so that nobody can escape from
shrink_inactive_list(). If all threads trying to allocate memory are caught
in the too_many_isolated() loop, nobody can proceed.
( http://marc.info/?l=linux-kernel&m=140051046730378 and
http://marc.info/?l=linux-mm&m=141671817211121 ; the reproducer program for
this case is shared only with security@xxxxxxxxxx members and some
individuals. )
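
The shape of this throttling loop can be sketched as follows. This is a toy
Python model under strong assumptions: the check is reduced to a single
comparison "isolated > inactive" (the real too_many_isolated() has further
conditions), and throttle_rounds() with its parameters is a hypothetical
helper, not a kernel function.

```python
def throttle_rounds(isolated, inactive, putback_per_round, max_rounds):
    """Count how many rounds a thread stays throttled.

    isolated / inactive: page counts compared by the simplified check.
    putback_per_round:   isolated pages returned to the LRU per round by
                         whoever is still able to make progress.
    Returns the number of rounds waited, or None if the thread is still
    throttled after max_rounds (the model's stand-in for "forever").
    """
    rounds = 0
    while isolated > inactive:  # simplified too_many_isolated() check
        if rounds == max_rounds:
            return None
        isolated = max(0, isolated - putback_per_round)
        rounds += 1
    return rounds
```

With putback_per_round=3 the call throttle_rounds(10, 4, 3, 100) returns
after a couple of rounds, while with putback_per_round=0 (nobody is able to
put isolated pages back) it never leaves the loop and the model returns
None.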

The second case is allocation requests without the __GFP_FS flag. Such
requests do not invoke the OOM killer; this is intended to avoid invoking it
unnecessarily, because there might be memory reclaimable by allocation
requests with the __GFP_FS flag. But it is possible that all threads doing
__GFP_FS allocation requests (including kswapd, which is capable of
reclaiming memory with the __GFP_FS flag) are blocked and nobody can perform
memory reclaim operations. As a result, the memory allocator gives nobody a
chance to invoke the OOM killer, and falls into an infinite loop.
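
The condition described above can be written down as a one-line predicate.
This is a toy Python model, not kernel code; ThreadState and
someone_can_make_progress() are hypothetical names, and the model assumes
that only unblocked threads with __GFP_FS can reclaim memory or invoke the
OOM killer.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Hypothetical snapshot of one thread involved in allocation or reclaim."""
    name: str
    gfp_fs: bool   # True if the thread's requests carry __GFP_FS
    blocked: bool  # True if the thread is stuck (e.g. waiting for a lock)


def someone_can_make_progress(threads) -> bool:
    """True if at least one thread can still reclaim with __GFP_FS or
    invoke the OOM killer; False means every !__GFP_FS looper waits forever."""
    return any(t.gfp_fs and not t.blocked for t in threads)


# All __GFP_FS threads (including kswapd) are blocked; only a !__GFP_FS
# thread is still running, and it is not allowed to invoke the OOM killer.
stuck = [
    ThreadState("kswapd", gfp_fs=True, blocked=True),
    ThreadState("writer", gfp_fs=True, blocked=True),
    ThreadState("fs_worker", gfp_fs=False, blocked=False),
]
```

For the list above, someone_can_make_progress(stuck) is False: the only
runnable thread keeps retrying, and nothing in the system will ever free
memory for it.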

The third case is that the OOM victim is unable to release memory because it
is blocked by an invisible dependency after a __GFP_FS allocation request
invoked the OOM killer. This case can occur when the OOM victim is blocked
waiting for a lock while a thread doing an allocation request with that lock
held is waiting for the OOM victim to release its mm struct. For example, we
can reproduce this case on an XFS filesystem by doing !__GFP_FS allocation
requests with an inode's mutex held. We can't expect that there is memory
reclaimable by __GFP_FS allocations because the OOM killer has already been
invoked. And since there is already an OOM victim, the OOM killer is not
invoked again even if threads doing __GFP_FS allocations are running. As a
result, allocation requests by a thread which is blocking an OOM victim can
fall into an infinite loop regardless of whether the allocation request is
__GFP_FS or not. We call this state an OOM deadlock.
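
The circular wait in this third case can be illustrated with a small
wait-for graph. This is a toy Python sketch: the thread names and the
find_cycle() helper are hypothetical, and the memory allocator has no such
graph available, because the lock dependency is not visible to it. The
sketch only shows why neither side can proceed.

```python
def find_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start.

    waits_for maps a thread name to the single thread it is waiting for.
    Returns the list of threads forming a cycle, or None if the chain ends.
    """
    seen = []
    node = start
    while node is not None and node not in seen:
        seen.append(node)
        node = waits_for.get(node)
    if node is None:
        return None
    return seen[seen.index(node):]


# The OOM victim waits for the lock holder to drop the lock, while the lock
# holder loops in the allocator until the OOM victim releases its memory.
waits_for = {
    "oom_victim": "lock_holder",
    "lock_holder": "oom_victim",
}
```

Here find_cycle(waits_for, "oom_victim") reports both threads as one
cycle, which is the OOM deadlock state described above.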

There are programs which are protected from the OOM killer by setting
/proc/$pid/oom_score_adj to -1000. /usr/sbin/sshd (an example of such
programs) is helpful for restarting programs killed by the OOM killer
because /usr/sbin/sshd offers a means to log in to the system. However,
under the OOM deadlock state, /usr/sbin/sshd cannot offer a means to log in
because /usr/sbin/sshd will stall forever inside allocation requests
(e.g. page faults).

Those who set /proc/sys/vm/panic_on_oom to 0 do not expect the system to
fall into a forever-inoperable state when the OOM killer is invoked.
Instead, they expect the OOM killer to keep the system operable. But the
current behavior makes it impossible to log in to the system, and impossible
to trigger SysRq-f (manually kill a process) due to "the workqueue having
fallen into an infinite loop inside the memory allocator" or "SysRq-f
choosing an OOM victim which already got the TIF_MEMDIE flag and got stuck
due to an invisible dependency". As a result, they need to choose from
SysRq-i (manually kill all processes), SysRq-c (manually trigger a kernel
panic) or SysRq-b (manually reset the system). Asking them to choose one of
these SysRq commands is an unnecessarily large sacrifice. They also pay the
penalty of having to go to the console in order to issue a SysRq command,
because the infinite loop inside the memory allocator prevents them from
logging in to the system via /usr/sbin/sshd . And since administrators set
/proc/sys/vm/panic_on_oom to 0 without understanding that there is such
sacrifice and penalty, they rush to the support center reporting that their
systems had an unexplained hang. I do want to solve this situation.

The above description is about the third case. But people also pay a penalty
for the first case and the second case: their systems fall into a
forever-inoperable state until they go to the console and trigger SysRq-f
manually. The first case and the second case can happen regardless of the
/proc/sys/vm/panic_on_oom setting because the OOM killer is not involved, but
administrators are using it without understanding that there are such cases.
And, even if they rush to the support center with a vmcore captured via
SysRq-c, we cannot analyze how long the threads spent looping inside the
memory allocator because the current implementation gives no hint.

About proposals for mitigating this problem:

There have been several proposals which try to reduce the possibility of
OOM deadlock without the use of a timeout. Two of them are explained here.

One proposal is to allow small allocation requests to fail in order to avoid
lockups caused by looping forever inside the memory allocator.
( https://lwn.net/Articles/636017/ and https://lwn.net/Articles/636797/ )
But if such allocation requests start failing under memory pressure, a lot of
memory allocation failure paths which have almost never been tested will be
exercised, and various obscure bugs (e.g.
http://marc.info/?l=dri-devel&m=142189369426813 ) will show up. Thus, it is
too risky to backport. Also, as long as there are GFP_NOFAIL allocations
(either explicit or open-coded retry loops), this approach cannot completely
avoid OOM deadlock.

If we allow small memory allocations to fail rather than loop inside the
memory allocator, allocation requests caused by page faults start failing.
As a side effect, when an allocation request caused by a page fault fails,
either "the OOM killer is invoked and some mm struct is chosen by the OOM
killer" or "that thread is killed by a SIGBUS signal sent from the kernel".

If important processes which are protected from the OOM killer by setting
/proc/$pid/oom_score_adj to -1000 are killed by a SIGBUS signal instead of
OOM-killable processes being killed via the OOM killer,
/proc/$pid/oom_score_adj becomes useless. Also, we can observe a kernel panic
triggered by the global init process being killed by a SIGBUS signal.
( http://marc.info/?l=linux-kernel&m=142676304911566 )

Regarding !__GFP_FS allocation requests caused by page faults, there will be
no difference (except for the SIGBUS case explained above) between "directly
invoking the OOM killer while looping inside the memory allocator" and
"indirectly invoking the OOM killer after failing the allocation request".

However, the penalty of failing !__GFP_FS allocation requests not caused by
page faults is large. For example, we experienced in Linux 3.19 that the ext4
filesystem started to trigger filesystem error actions (remounting read-only,
which prevents programs from working correctly, or a kernel panic, which stops
the whole system) when memory was extremely tight, because we unexpectedly
allowed !__GFP_FS allocations to fail without retrying.
( http://marc.info/?l=linux-ext4&m=142443125221571 ) The original behavior
has been restored for now.

It is observed that this proposal (which allows memory allocations to fail)
is likely to carry a larger penalty than trying to keep the system operable
by invoking the OOM killer. Allowing small allocations to fail is not as easy
as people think.

Another proposal is to reserve some amount of memory for use by allocation
requests which can invoke the OOM killer, by manipulating the zone
watermark. ( https://lwn.net/Articles/642057/ ) But this proposal will not
help if threads which are blocking the OOM victim are doing allocation
requests which cannot invoke the OOM killer, or if threads which are not
blocking the OOM victim consume the reserve by doing allocation requests
which can invoke the OOM killer. Also, manipulating the zone watermark could
have a performance impact because direct reclaim is more likely to be
invoked.

Since the dependency needed for avoiding OOM deadlock is not visible to the
memory allocator, we cannot avoid using heuristic approaches for detecting
the OOM deadlock state. What has already been proposed many times, and is
proposed here again, is to invoke the OOM killer based on a timeout.
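
The general idea of a timeout-based heuristic can be sketched as below. This
is a toy Python model and only an assumption about what such a heuristic
might look like; it is not the patch itself. The function
allocate_with_timeout(), the callbacks try_alloc and invoke_oom_killer, and
the tick-based clock are all hypothetical.

```python
def allocate_with_timeout(try_alloc, invoke_oom_killer, timeout_ticks, max_ticks):
    """Retry try_alloc() once per tick.

    Whenever timeout_ticks consecutive ticks pass without success, assume an
    OOM deadlock and call invoke_oom_killer(), then keep retrying.
    Returns the tick at which the allocation succeeded, or None if it did
    not succeed within max_ticks.
    """
    stalled = 0
    for tick in range(max_ticks):
        if try_alloc():
            return tick
        stalled += 1
        if stalled >= timeout_ticks:
            invoke_oom_killer()  # heuristic: stalled too long, kill something
            stalled = 0
    return None


# Tiny scenario: no free memory until the (modelled) OOM killer frees some.
free_pages = [0]
kills = []


def try_alloc():
    return free_pages[0] > 0


def invoke_oom_killer():
    kills.append("victim")
    free_pages[0] = 1


result = allocate_with_timeout(try_alloc, invoke_oom_killer,
                               timeout_ticks=3, max_ticks=10)
```

In this scenario the allocation stalls for three ticks, the modelled OOM
killer runs once, and the next retry succeeds. Without the timeout the loop
would never leave the stalled state, since nothing else frees memory.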

