Vulnerability analysis: CVE-2022-32250
Haonan

Introduction

Netfilter is a complex subsystem in Linux. As its name indicates, it filters network packets according to a set of rules. To handle different kinds of filter rules, netfilter provides multiple data structures (a form of polymorphism) and abstractions. Essentially, netfilter places hooks throughout the regular networking code, and other modules can register handlers for those hooks. This design makes the whole subsystem complex and difficult to analyze; on the other hand, because the rules come from user input, bugs in it can be exploited.
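
The hook-and-handler idea can be illustrated with a small userspace sketch. It is not the kernel API: the hook points, verdicts, and function names below are hypothetical and only show how registered handlers get a say when a packet reaches a hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts and hook points; the names are illustrative only. */
enum verdict { VERDICT_ACCEPT, VERDICT_DROP };
enum hook_point { HOOK_PRE_ROUTING, HOOK_LOCAL_IN, HOOK_MAX };

struct packet { unsigned short dst_port; };

typedef enum verdict (*hook_fn)(const struct packet *pkt);

#define MAX_HANDLERS 4

/* One array of handlers per hook point. */
static hook_fn handlers[HOOK_MAX][MAX_HANDLERS];
static size_t handler_count[HOOK_MAX];

/* A module calls this to attach a handler to a hook point. */
int register_hook(enum hook_point hp, hook_fn fn)
{
    assert(fn != NULL);
    if (hp >= HOOK_MAX || handler_count[hp] == MAX_HANDLERS)
        return -1;
    handlers[hp][handler_count[hp]++] = fn;
    return 0;
}

/* The networking path calls this when a packet reaches a hook point. */
enum verdict run_hooks(enum hook_point hp, const struct packet *pkt)
{
    for (size_t i = 0; i < handler_count[hp]; i++)
        if (handlers[hp][i](pkt) == VERDICT_DROP)
            return VERDICT_DROP; /* the first DROP ends the traversal */
    return VERDICT_ACCEPT;
}

/* Example handler: drop everything addressed to port 23. */
enum verdict drop_telnet(const struct packet *pkt)
{
    return pkt->dst_port == 23 ? VERDICT_DROP : VERDICT_ACCEPT;
}
```

Calling register_hook(HOOK_LOCAL_IN, drop_telnet) makes run_hooks drop port-23 packets on that hook only; other hook points are unaffected.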

In netfilter, nf_tables processes packets based on user-defined rules. In nf_tables, a table (struct nft_table) is a container associated with a specific protocol family (e.g., ip, ip6, arp). When a packet hits a hook on its route, it is handed to the corresponding handlers.

Netfilter has had many other CVEs, such as CVE-2022-2586, which involves an expression referring to a deleted set and can cause a use-after-free (UAF).

In this post, I analyze CVE-2022-32250, which is also a UAF.

General nf_tables architecture

nf_tables analogy

Since a filter can have many rules, a table (struct nft_table) houses a set of chains (struct nft_chain). A chain defines which type of network traffic it is concerned with, and it contains an ordered list of rules (struct nft_rule). Each rule in turn houses expressions (struct nft_expr). All these structures are “base classes”; in practice, each structure contains a pointer to the level above it (for example, a chain holds a struct nft_table pointer), and the code relies on many explicit type casts.

For example, here is struct nft_expr:

/**
 * struct nft_expr - nf_tables expression
 *
 * @ops: expression ops
 * @data: expression private data
 */
struct nft_expr {
    const struct nft_expr_ops *ops;
    unsigned char data[]
        __attribute__((aligned(__alignof__(u64))));
};

static inline void *nft_expr_priv(const struct nft_expr *expr)
{
    return (void *)expr->data;
}

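nft_expr is an abstract class: it holds only the ops pointer, followed by private data whose layout depends on the concrete expression. nft_expr_priv returns that private data as a void *, which the caller then casts to the concrete type. nft_expr_ops is another base class that mainly defines functions (such as init or destroy) for a concrete expression (very object-oriented, but in C).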
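
To make the pattern concrete, here is a minimal userspace sketch of a generic header with an ops table and trailing private data. All names (expr_ops, cmp_priv, expr_new, and so on) are hypothetical and simplified; the real kernel structures carry many more fields and force u64 alignment on the data array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct expr;

/* "Virtual table": every concrete expression supplies its own callbacks. */
struct expr_ops {
    size_t size; /* generic header plus private data */
    int (*init)(struct expr *e, int arg);
    int (*eval)(const struct expr *e, int input);
};

/* Generic header followed by type-specific private data. The array
 * follows a pointer, which is aligned well enough for the int used here. */
struct expr {
    const struct expr_ops *ops;
    unsigned char data[];
};

/* Counterpart of nft_expr_priv(): hand out the private area untyped. */
void *expr_priv(const struct expr *e)
{
    return (void *)e->data;
}

/* One concrete "subclass": compare the input against a constant. */
struct cmp_priv { int constant; };

int cmp_init(struct expr *e, int arg)
{
    struct cmp_priv *priv = expr_priv(e);

    priv->constant = arg;
    return 0;
}

int cmp_eval(const struct expr *e, int input)
{
    const struct cmp_priv *priv = expr_priv(e);

    return input == priv->constant;
}

const struct expr_ops cmp_ops = {
    .size = sizeof(struct expr) + sizeof(struct cmp_priv),
    .init = cmp_init,
    .eval = cmp_eval,
};

/* Allocate ops->size bytes and let the concrete type initialise itself. */
struct expr *expr_new(const struct expr_ops *ops, int arg)
{
    struct expr *e;

    assert(ops != NULL && ops->size >= sizeof(struct expr));
    e = calloc(1, ops->size);
    if (e == NULL)
        return NULL;
    e->ops = ops;
    if (ops->init && ops->init(e, arg) < 0) {
        free(e);
        return NULL;
    }
    return e;
}
```

A caller only ever sees struct expr and dispatches through e->ops, for example e->ops->eval(e, 42); the concrete layout stays hidden behind expr_priv.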

Base and inherited class (part)

Base (abstract)    Instance
---------------    ---------------------
nft_set            nft_set_hash
                   nft_set_rbtree
nft_chain          nft_chain_filter_ipv4
                   nft_chain_filter_arp
nft_expr           nft_cmp_expr
                   nft_immediate_expr
                   nft_lookup
                   nft_dynset

Bug analysis

CVE-2022-32250 is essentially a use-after-free bug. It occurs when nft_lookup and nft_dynset expressions are processed along an error path: the expression is freed, but it remains in the set->bindings linked list. The bug unfolds in the following steps:

  1. create the expression (lookup or dynset)
  2. add it to the linked list
  3. free the expression
  4. remove it from the linked list (this step is missing or never reached)
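
The four steps can be modelled in a few lines of userspace C. This is only a toy under stated assumptions: the toy_* names are hypothetical, the bindings list is a plain array, and an alive flag stands in for the allocator so that the model never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BINDINGS 8

/* Toy expression: "alive" replaces the real allocation state. */
struct toy_expr {
    bool alive;
};

/* Toy set: remembers which expressions are bound to it. */
struct toy_set {
    struct toy_expr *bindings[MAX_BINDINGS];
    size_t nbindings;
};

/* Steps 1 and 2: create the expression and bind it to the set. */
void toy_expr_init(struct toy_expr *e, struct toy_set *s)
{
    assert(s->nbindings < MAX_BINDINGS);
    e->alive = true;
    s->bindings[s->nbindings++] = e;
}

/* Step 3: "free" the expression. Step 4, removing it from
 * s->bindings, is deliberately absent, as in the bug. */
void toy_expr_destroy(struct toy_expr *e)
{
    e->alive = false;
}

/* Count bindings that still refer to a dead expression. */
size_t toy_set_dangling(const struct toy_set *s)
{
    size_t n = 0;

    for (size_t i = 0; i < s->nbindings; i++)
        if (!s->bindings[i]->alive)
            n++;
    return n;
}
```

After toy_expr_init followed by toy_expr_destroy, the set still holds one binding and toy_set_dangling reports it; in the kernel, that stale entry points into freed memory.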

Create expression

The story starts in nft_expr_init, where expr is allocated at (1).

static struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
                                      const struct nlattr *nla)
{
    struct nft_expr_info expr_info;
    struct nft_expr *expr;
    struct module *owner;
    int err;

    err = nf_tables_expr_parse(ctx, nla, &expr_info);
    if (err < 0)
        goto err1;

    err = -ENOMEM;
    expr = kzalloc(expr_info.ops->size, GFP_KERNEL); // (1)
    if (expr == NULL)
        goto err2;

    err = nf_tables_newexpr(ctx, &expr_info, expr);
    if (err < 0)
        goto err3;

    return expr;
err3:
    kfree(expr);
err2:
    owner = expr_info.ops->type->owner;
    if (expr_info.ops->type->release_ops)
        expr_info.ops->type->release_ops(expr_info.ops);

    module_put(owner);
err1:
    return ERR_PTR(err);
}

Then it will be initialized in nf_tables_newexpr:

static int nf_tables_newexpr(const struct nft_ctx *ctx,
                             const struct nft_expr_info *info,
                             struct nft_expr *expr)
{
    const struct nft_expr_ops *ops = info->ops;
    int err;

    expr->ops = ops;
    if (ops->init) {
        err = ops->init(ctx, expr, (const struct nlattr **)info->tb);
        if (err < 0)
            goto err1;
    }

    return 0;
err1:
    expr->ops = NULL;
    return err;
}

Context (caller)     Function name    Possible instances
-----------------    -------------    -------------------
nf_tables_newexpr    ops->init        nft_lookup_init
                                      nft_dynset_init
                                      nft_objref_map_init

Add to linked list

When ops->init resolves to nft_lookup_init, nft_dynset_init, or nft_objref_map_init, it calls nf_tables_bind_set, which adds the expression's binding to the list of the set it refers to.

binding is a field in some expressions (e.g., nft_lookup). It is added to the set's bindings list at (2).

int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set,
                       struct nft_set_binding *binding)
{
    struct nft_set_binding *i;
    struct nft_set_iter iter;

    if (set->use == UINT_MAX)
        return -EOVERFLOW;

    if (!list_empty(&set->bindings) && nft_set_is_anonymous(set))
        return -EBUSY;

    ...

bind:
    binding->chain = ctx->chain;
    list_add_tail_rcu(&binding->list, &set->bindings); // (2) bind expr to set->bindings
    nft_set_trans_bind(ctx, set);
    set->use++;

    return 0;
}
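
Why does a list entry keep the expression reachable? The binding is embedded in the expression, so the set's list points into the expression's own memory. The following userspace sketch shows this with a minimal intrusive list and a container_of macro; the struct layouts and function names are simplified assumptions, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

void list_init(struct list_head *h) { h->next = h->prev = h; }

int list_is_empty(const struct list_head *h) { return h->next == h; }

void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct binding { struct list_head list; int chain_id; };

/* The binding lives inside the expression itself. */
struct lookup_expr { int id; struct binding binding; };

struct set { struct list_head bindings; unsigned int use; };

/* Simplified stand-in for the "bind:" part of nf_tables_bind_set(). */
void bind_set(struct set *s, struct binding *b, int chain_id)
{
    b->chain_id = chain_id;
    list_add_tail(&b->list, &s->bindings);
    s->use++;
}

/* Walk from the set back to the expression that owns the first binding. */
struct lookup_expr *first_bound_expr(struct set *s)
{
    struct binding *b;

    assert(!list_is_empty(&s->bindings));
    b = container_of(s->bindings.next, struct binding, list);
    return container_of(b, struct lookup_expr, binding);
}
```

Once bind_set has run, first_bound_expr returns the address of the expression. If that expression is freed without unlinking its binding, the same walk yields a dangling pointer.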

Free the expression

After nft_expr_init returns, there is a check at (3). When it fails, execution jumps to (4) and destroys the expression.

struct nft_expr *nft_set_elem_expr_alloc(const struct nft_ctx *ctx,
                                         const struct nft_set *set,
                                         const struct nlattr *attr)
{
    struct nft_expr *expr;
    int err;

    expr = nft_expr_init(ctx, attr);
    if (IS_ERR(expr))
        return expr;

    err = -EOPNOTSUPP;
    if (!(expr->ops->type->flags & NFT_EXPR_STATEFUL)) // (3) check
        goto err_set_elem_expr;

    if (expr->ops->type->flags & NFT_EXPR_GC) {
        if (set->flags & NFT_SET_TIMEOUT)
            goto err_set_elem_expr;
        if (!set->ops->gc_init)
            goto err_set_elem_expr;
        set->ops->gc_init(set);
    }

    return expr;

err_set_elem_expr:
    nft_expr_destroy(ctx, expr); // (4) destroy the expr
    return ERR_PTR(err);
}

In nft_expr_destroy, we can see expr being freed at (5):

void nft_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr)
{
    nf_tables_expr_destroy(ctx, expr);
    kfree(expr); // (5) free
}

List delete?

nf_tables_expr_destroy is the key function to analyze: we have to determine whether it removes the relevant pointers before kfree is called.

static void nf_tables_expr_destroy(const struct nft_ctx *ctx,
                                   struct nft_expr *expr)
{
    const struct nft_expr_type *type = expr->ops->type;

    if (expr->ops->destroy)
        expr->ops->destroy(ctx, expr);
    module_put(type->owner);
}

It calls the expression's own destroy function. Take nft_lookup as an example:

static void nft_lookup_destroy(const struct nft_ctx *ctx,
                               const struct nft_expr *expr)
{
    struct nft_lookup *priv = nft_expr_priv(expr);

    nf_tables_destroy_set(ctx, priv->set);
}

It then calls the shared function nf_tables_destroy_set:

void nf_tables_destroy_set(const struct nft_ctx *ctx, struct nft_set *set)
{
    if (list_empty(&set->bindings) && nft_set_is_anonymous(set))
        nft_set_destroy(ctx, set);
}

Here, list_empty(&set->bindings) cannot be true, because the expression's binding was just added to set->bindings at (2). Hence nf_tables_destroy_set does nothing and returns to its caller. The expression is then freed at (5) while its binding is still linked into set->bindings, which causes the UAF.

A path-insensitive analysis would not stop here, however; it has to follow the other branch as well:

More findings

nft_set_destroy calls nft_expr_destroy and set->ops->destroy:

static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set)
{
    if (WARN_ON(set->use > 0))
        return;

    if (set->expr)
        nft_expr_destroy(ctx, set->expr);

    set->ops->destroy(set);
    kfree(set->name);
    kvfree(set);
}

nft_expr_destroy leads to (5) again, so let’s focus on set->ops->destroy. This indirect call can resolve to multiple implementations. Take nft_set_hash and nft_set_rbtree as two cases:

static void nft_hash_destroy(const struct nft_set *set)
{
    struct nft_hash *priv = nft_set_priv(set);
    struct nft_hash_elem *he;
    struct hlist_node *next;
    int i;

    for (i = 0; i < priv->buckets; i++) {
        hlist_for_each_entry_safe(he, next, &priv->table[i], node) {
            hlist_del_rcu(&he->node);
            nft_set_elem_destroy(set, he, true); // (6)
        }
    }
}

static void nft_rbtree_destroy(const struct nft_set *set)
{
    struct nft_rbtree *priv = nft_set_priv(set);
    struct nft_rbtree_elem *rbe;
    struct rb_node *node;

    cancel_delayed_work_sync(&priv->gc_work);
    rcu_barrier();
    while ((node = priv->root.rb_node) != NULL) {
        rb_erase(node, &priv->root);
        rbe = rb_entry(node, struct nft_rbtree_elem, node);
        nft_set_elem_destroy(set, rbe, true); // (6)
    }
}

Basically, these destruction functions iterate over each of these elements, remove them from the set, and then call the public function nft_set_elem_destroy at (6).
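
Both functions follow the same "unlink first, then destroy" loop. A minimal userspace sketch of that loop on a singly linked list is shown below; elem_set and its helpers are hypothetical names, and the destroyed counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct elem { int key; struct elem *next; };

struct elem_set { struct elem *head; unsigned int destroyed; };

/* Push a new element at the front of the set. */
int elem_set_add(struct elem_set *s, int key)
{
    struct elem *e = malloc(sizeof(*e));

    if (e == NULL)
        return -1;
    e->key = key;
    e->next = s->head;
    s->head = e;
    return 0;
}

/* Shared per-element destructor, the analogue of call site (6). */
void elem_destroy(struct elem_set *s, struct elem *e)
{
    s->destroyed++;
    free(e);
}

/* Tear the whole set down without ever reading freed memory. */
void elem_set_destroy(struct elem_set *s)
{
    struct elem *e, *next;

    assert(s != NULL);
    for (e = s->head; e != NULL; e = next) {
        next = e->next;     /* read the link before e is freed */
        s->head = next;     /* unlink the element from the set */
        elem_destroy(s, e); /* only then release it */
    }
}
```

The order matters: each element leaves the container before its memory is released, which is exactly the step that never happens for the binding in set->bindings.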

void nft_set_elem_destroy(const struct nft_set *set, void *elem,
                          bool destroy_expr)
{
    struct nft_set_ext *ext = nft_set_elem_ext(set, elem);
    struct nft_ctx ctx = {
        .net = read_pnet(&set->net),
        .family = set->table->family,
    };

    nft_data_release(nft_set_ext_key(ext), NFT_DATA_VALUE);
    if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA))
        nft_data_release(nft_set_ext_data(ext), set->dtype);
    if (destroy_expr && nft_set_ext_exists(ext, NFT_SET_EXT_EXPR))
        nft_set_elem_expr_destroy(&ctx, nft_set_ext_expr(ext));

    if (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF))
        (*nft_set_ext_obj(ext))->use--;
    kfree(elem);
}

static void nft_set_elem_expr_destroy(const struct nft_ctx *ctx,
                                      struct nft_expr *expr)
{
    if (expr->ops->destroy_clone) {
        expr->ops->destroy_clone(ctx, expr);
        module_put(expr->ops->type->owner);
    } else {
        nf_tables_expr_destroy(ctx, expr); // we see this again ...
    }
}

This is the end of the analysis. nft_lookup does not define destroy_clone, so nf_tables_expr_destroy (that function again) is called once more. Nowhere in the whole procedure is an entry removed from set->bindings.

In conclusion, destruction should clean up both the set and the expr, but it is guarded by the condition list_empty(&set->bindings). As a result, the set remains while the expr is freed, which makes a later UAF possible (though not explicit).

Discussion

Patch

The patch is very simple: the faulty state is introduced by the failing check at (3). Since expr_info.ops->type->flags is already known before the expression is allocated, the check can be moved ahead of the creation of expr.

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 12fc9cda4a2cf..f296dfe86b622 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2873,27 +2873,31 @@ static struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,

     err = nf_tables_expr_parse(ctx, nla, &expr_info);
     if (err < 0)
-        goto err1;
+        goto err_expr_parse;
+
+    err = -EOPNOTSUPP;
+    if (!(expr_info.ops->type->flags & NFT_EXPR_STATEFUL))
+        goto err_expr_stateful;

     err = -ENOMEM;
     expr = kzalloc(expr_info.ops->size, GFP_KERNEL_ACCOUNT); // the commit targets a newer kernel, hence GFP_KERNEL_ACCOUNT
     if (expr == NULL)
-        goto err2;
+        goto err_expr_stateful;

     err = nf_tables_newexpr(ctx, &expr_info, expr);
     if (err < 0)
-        goto err3;
+        goto err_expr_new;

     return expr;
-err3:
+err_expr_new:
     kfree(expr);
-err2:
+err_expr_stateful:
     owner = expr_info.ops->type->owner;
     if (expr_info.ops->type->release_ops)
         expr_info.ops->type->release_ops(expr_info.ops);

     module_put(owner);
-err1:
+err_expr_parse:
     return ERR_PTR(err);
 }

@@ -5413,9 +5417,6 @@ struct nft_expr *nft_set_elem_expr_alloc(const struct nft_ctx *ctx,
         return expr;

     err = -EOPNOTSUPP;
-    if (!(expr->ops->type->flags & NFT_EXPR_STATEFUL))
-        goto err_set_elem_expr;
-
     if (expr->ops->type->flags & NFT_EXPR_GC) {
         if (set->flags & NFT_SET_TIMEOUT)
             goto err_set_elem_expr;
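
The idea behind the patch, validating before any side effect takes place, can be shown with a small userspace model. The model_* names and error codes below are hypothetical; a counter stands in for the set->bindings list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical error codes standing in for -ENOMEM and -EOPNOTSUPP. */
enum { MODEL_OK = 0, MODEL_ENOMEM = -1, MODEL_EOPNOTSUPP = -2 };

struct model_type { bool stateful; };
struct model_set  { size_t nbindings; };
struct model_expr { const struct model_type *type; };

/* Stand-in for nft_expr_init(): allocates and, as a side effect,
 * registers a binding on the set. */
static struct model_expr *model_init(const struct model_type *t,
                                     struct model_set *s)
{
    struct model_expr *e = calloc(1, sizeof(*e));

    if (e == NULL)
        return NULL;
    e->type = t;
    s->nbindings++;
    return e;
}

/* Vulnerable ordering: initialise first, check afterwards. */
struct model_expr *alloc_check_late(const struct model_type *t,
                                    struct model_set *s, int *err)
{
    struct model_expr *e;

    assert(t != NULL && s != NULL && err != NULL);
    e = model_init(t, s);
    if (e == NULL) {
        *err = MODEL_ENOMEM;
        return NULL;
    }
    if (!t->stateful) {
        free(e); /* the binding recorded on the set is not undone */
        *err = MODEL_EOPNOTSUPP;
        return NULL;
    }
    *err = MODEL_OK;
    return e;
}

/* Patched ordering: reject unsupported types before touching the set. */
struct model_expr *alloc_check_early(const struct model_type *t,
                                     struct model_set *s, int *err)
{
    struct model_expr *e;

    assert(t != NULL && s != NULL && err != NULL);
    if (!t->stateful) {
        *err = MODEL_EOPNOTSUPP;
        return NULL;
    }
    e = model_init(t, s);
    if (e == NULL) {
        *err = MODEL_ENOMEM;
        return NULL;
    }
    *err = MODEL_OK;
    return e;
}
```

With a non-stateful type, alloc_check_late fails but leaves one stale binding on the set, whereas alloc_check_early fails with the set untouched. This is the same design choice as in the patch: the error path no longer has anything to undo.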

Some thoughts

  1. Knowledge of the data structures and their functions.
    1. Loop analysis: element traversal (list, tree) under a condition.
    2. Combine it with function names, or with access patterns (a summary for each function)?
  2. Challenges:
    1. Detecting this kind of bug: freed pointers remain in some list and can later be used without any check (an implicit UAF?).
    2. How to avoid raising a false alarm on the patched code?
  3. Polymorphism.
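
On the detection challenge, one conceivable run-time aid is to refuse to free an object whose embedded list node is still linked. The sketch below is only an illustration under that assumption: the tracked_* and node_* names are hypothetical, a freed flag replaces the real allocator, and this is neither what the patch does nor a static analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Circular doubly linked node; a node that points to itself is unlinked. */
struct node { struct node *next, *prev; };

void node_init(struct node *n) { n->next = n->prev = n; }

bool node_is_linked(const struct node *n) { return n->next != n; }

void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the node and reset it so that node_is_linked() turns false. */
void node_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Object with an embedded node; "freed" stands in for the allocator. */
struct tracked_obj { struct node link; bool freed; };

/* Refuse to free while the object is still reachable through a list. */
bool tracked_free(struct tracked_obj *o)
{
    assert(!o->freed);
    if (node_is_linked(&o->link))
        return false; /* would leave a dangling list entry */
    o->freed = true;
    return true;
}
```

A check like this would flag the error path of this bug at the moment of the free, rather than at the later, harder-to-attribute use.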

Reference

  1. David Bouman, How The Tables Have Turned: An analysis of two new Linux vulnerabilities in nf_tables
  2. Theori, Linux Kernel Exploit (CVE-2022-32250) with mqueue