mm/filemap: make buffered writes work with RWF_UNCACHED
If RWF_UNCACHED is set for a write, mark the folios being written with
drop_writeback. Then writeback completion will drop the pages. The
write_iter handler simply kicks off writeback for the pages, and
writeback completion will take care of the rest.
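For illustration, a minimal userspace sketch of issuing such a write via
pwritev2(2). The write_uncached() helper is hypothetical, and the fallback
RWF_UNCACHED value below is an assumed placeholder, not taken from this
patch; use the definition from the uapi headers of a kernel that carries
this series. On kernels or filesystems without support, the sketch falls
back to a normal buffered pwrite(2):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * ASSUMPTION: placeholder value, only used if the installed headers do
 * not define the flag. Check it against your kernel's uapi headers.
 */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080
#endif

/*
 * Hypothetical helper: write len bytes from buf to fd at offset off,
 * asking for an uncached buffered write. If the flag is rejected,
 * retry as a regular buffered write. Returns the number of bytes
 * written (which may be short) or -1 with errno set.
 */
static ssize_t write_uncached(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL);

	ret = pwritev2(fd, &iov, 1, off, RWF_UNCACHED);
	if (ret < 0 && (errno == EOPNOTSUPP || errno == EINVAL ||
			errno == ENOSYS))
		ret = pwrite(fd, buf, len, off);

	return ret;
}
```

The caller sees ordinary buffered write semantics either way; the only
intended difference is that the written pages do not linger in the page
cache once writeback has completed.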
This provides similar benefits to using RWF_UNCACHED with reads. Testing
buffered writes on 32 files:
writing bs 65536, uncached 0
1s: 196035MB/sec, MB=196035
2s: 132308MB/sec, MB=328147
3s: 132438MB/sec, MB=460586
4s: 116528MB/sec, MB=577115
5s: 103898MB/sec, MB=681014
6s: 108893MB/sec, MB=789907
7s: 99678MB/sec, MB=889586
8s: 106545MB/sec, MB=996132
9s: 106826MB/sec, MB=1102958
10s: 101544MB/sec, MB=1204503
11s: 111044MB/sec, MB=1315548
12s: 124257MB/sec, MB=1441121
13s: 116031MB/sec, MB=1557153
14s: 114540MB/sec, MB=1671694
15s: 115011MB/sec, MB=1786705
16s: 115260MB/sec, MB=1901966
17s: 116068MB/sec, MB=2018034
18s: 116096MB/sec, MB=2134131
where it's quite obvious where the page cache filled, and performance
dropped to about half of where it started, settling in at around
115GB/sec. Meanwhile, 32 kswapds were running full steam trying to
reclaim pages.
Running the same test with uncached buffered writes:
writing bs 65536, uncached 1
1s: 198974MB/sec
2s: 189618MB/sec
3s: 193601MB/sec
4s: 188582MB/sec
5s: 193487MB/sec
6s: 188341MB/sec
7s: 194325MB/sec
8s: 188114MB/sec
9s: 192740MB/sec
10s: 189206MB/sec
11s: 193442MB/sec
12s: 189659MB/sec
13s: 191732MB/sec
14s: 190701MB/sec
15s: 191789MB/sec
16s: 191259MB/sec
17s: 190613MB/sec
18s: 191951MB/sec
and the behavior is fully predictable, performing the same throughout
even after the page cache would otherwise have fully filled with dirty
data. It's also about 65% faster, and uses half the system CPU compared
to the normal buffered write.
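As a rough cross-check of the "about 65% faster" figure, the sketch below
averages the per-second rates for seconds 13 through 18 of each run and
compares them. Treating that window as the steady state is an assumption
made for illustration, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Per-second rates in MB/sec for seconds 13..18, copied from the runs above. */
static const double cached_mbs[] = {
	116031, 114540, 115011, 115260, 116068, 116096,
};
static const double uncached_mbs[] = {
	191732, 190701, 191789, 191259, 190613, 191951,
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* How much faster the uncached run is over the assumed window, in percent. */
static double speedup_percent(void)
{
	double cached = mean(cached_mbs, sizeof(cached_mbs) / sizeof(cached_mbs[0]));
	double uncached = mean(uncached_mbs, sizeof(uncached_mbs) / sizeof(uncached_mbs[0]));

	return (uncached / cached - 1.0) * 100.0;
}
```

Over that window the means are roughly 115.5GB/sec and 191.3GB/sec, so
speedup_percent() comes out a little under 66%, in line with the figure
quoted above.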
Signed-off-by: Jens Axboe <axboe@kernel.dk>