Commit | Line | Data |
---|---|---|
1da177e4 LT |
1 | +---------------------------------------------------------------------------+ |
2 | | wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors. | | |
3 | | | | |
4 | | Copyright (C) 1992,1993,1994,1995,1996,1997,1999 | | |
5 | | W. Metzenthen, 22 Parker St, Ormond, Vic 3163, | | |
6 | | Australia. E-mail billm@melbpc.org.au | | |
7 | | | | |
8 | | This program is free software; you can redistribute it and/or modify | | |
9 | | it under the terms of the GNU General Public License version 2 as | | |
10 | | published by the Free Software Foundation. | | |
11 | | | | |
12 | | This program is distributed in the hope that it will be useful, | | |
13 | | but WITHOUT ANY WARRANTY; without even the implied warranty of | | |
14 | | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | | |
15 | | GNU General Public License for more details. | | |
16 | | | | |
17 | | You should have received a copy of the GNU General Public License | | |
18 | | along with this program; if not, write to the Free Software | | |
19 | | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | | |
20 | | | | |
21 | +---------------------------------------------------------------------------+ | |
22 | ||
23 | ||
24 | ||
25 | wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387 | |
26 | which was my 80387 emulator for early versions of djgpp (gcc under | |
27 | msdos); wm-emu387 was in turn based upon emu387 which was written by | |
28 | DJ Delorie for djgpp. The interface to the Linux kernel is based upon | |
29 | the original Linux math emulator by Linus Torvalds. | |
30 | ||
31 | My target FPU for wm-FPU-emu is that described in the Intel486 | |
32 | Programmer's Reference Manual (1992 edition). Unfortunately, numerous | |
33 | facets of the functioning of the FPU are not well covered in the | |
34 | Reference Manual. The information in the manual has been supplemented | |
35 | with measurements on real 80486's. Unfortunately, it is simply not | |
36 | possible to be sure that all of the peculiarities of the 80486 have | |
37 | been discovered, so there are always likely to be obscure differences | |
38 | in the detailed behaviour of the emulator and a real 80486. | |
39 | ||
40 | wm-FPU-emu does not implement all of the behaviour of the 80486 FPU, | |
41 | but is very close. See "Limitations" later in this file for a list of | |
42 | some differences. | |
43 | ||
44 | Please report bugs, etc to me at: | |
45 | billm@melbpc.org.au | |
46 | or b.metzenthen@medoto.unimelb.edu.au | |
47 | ||
48 | For more information on the emulator and on floating point topics, see | |
49 | my web pages, currently at http://www.suburbia.net/~billm/ | |
50 | ||
51 | ||
52 | --Bill Metzenthen | |
53 | December 1999 | |
54 | ||
55 | ||
56 | ----------------------- Internals of wm-FPU-emu ----------------------- | |
57 | ||
58 | Numeric algorithms: | |
59 | (1) Add, subtract, and multiply. Nothing remarkable in these. | |
60 | (2) Divide has been tuned to get reasonable performance. The algorithm | |
61 | is not the obvious one which most people seem to use, but is designed | |
62 | to take advantage of the characteristics of the 80386. I expect that | |
63 | it has been invented many times before I discovered it, but I have not | |
64 | seen it. It is based upon one of those ideas which one carries around | |
65 | for years without ever bothering to check it out. | |
66 | (3) The sqrt function has been tuned to get good performance. It is based | |
67 | upon Newton's classic method. Performance was improved by capitalizing | |
68 | upon the properties of Newton's method, and the code is once again | |
69 | structured taking account of the 80386 characteristics. | |
70 | (4) The trig, log, and exp functions are based in each case upon quasi- | |
71 | "optimal" polynomial approximations. My definition of "optimal" was | |
72 | based upon getting good accuracy with reasonable speed. | |
73 | (5) The argument reducing code for the trig functions effectively uses | |
74 | a value of pi which is accurate to more than 128 bits. As a consequence, | |
75 | the reduced argument is accurate to more than 64 bits for arguments up | |
76 | to a few pi, and accurate to more than 64 bits for most arguments, | |
77 | even for arguments approaching 2^63. This is far superior to an | |
78 | 80486, which uses a value of pi which is accurate to 66 bits. | |
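Item (3) names Newton's method without showing the iteration. The following is a minimal sketch of the classic integer form, written in Python for brevity. It is not the emulator's code (which is tuned for the 80386), and the name isqrt_newton is made up for this illustration.

```python
def isqrt_newton(n):
    """Return floor(sqrt(n)) for a non-negative integer n using Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is guaranteed to be >= sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        # Newton step for f(x) = x*x - n, in integer arithmetic.
        y = (x + n // x) // 2
        if y >= x:
            # The sequence has stopped decreasing, so x is floor(sqrt(n)).
            return x
        x = y


print(isqrt_newton(10**40) == 10**20)
```

Each step roughly doubles the number of correct bits, so a good initial estimate sharply reduces the number of iterations needed.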
79 | ||
80 | The code of the emulator is complicated slightly by the need to | |
81 | account for a limited form of re-entrancy. Normally, the emulator will | |
82 | emulate each FPU instruction to completion without interruption. | |
83 | However, it may happen that when the emulator is accessing the user | |
84 | memory space, swapping may be needed. In this case the emulator may be | |
85 | temporarily suspended while disk i/o takes place. During this time | |
86 | another process may use the emulator, thereby perhaps changing static | |
87 | variables. The code which accesses user memory is confined to five | |
88 | files: | |
89 | fpu_entry.c | |
90 | reg_ld_str.c | |
91 | load_store.c | |
92 | get_address.c | |
93 | errors.c | |
94 | As of version 1.12 of the emulator, no static variables are used | |
95 | (apart from those in the kernel's per-process tables). The emulator is | |
96 | therefore now fully re-entrant, rather than having just the restricted | |
97 | form of re-entrancy which is required by the Linux kernel. | |
98 | ||
99 | ----------------------- Limitations of wm-FPU-emu ----------------------- | |
100 | ||
101 | There are a number of differences between the current wm-FPU-emu | |
102 | (version 2.01) and the 80486 FPU (apart from bugs). The differences | |
103 | are fewer than those which applied to the 1.xx series of the emulator. | |
104 | Some of the more important differences are listed below: | |
105 | ||
106 | The Roundup flag does not have much meaning for the transcendental | |
107 | functions and its 80486 value with these functions is likely to differ | |
108 | from its emulator value. | |
109 | ||
110 | In a few rare cases the Underflow flag obtained with the emulator will | |
111 | be different from that obtained with an 80486. This occurs when the | |
112 | following conditions apply simultaneously: | |
113 | (a) the operands have a higher precision than the current setting of the | |
114 | precision control (PC) flags. | |
115 | (b) the underflow exception is masked. | |
116 | (c) the magnitude of the exact result (before rounding) is less than 2^-16382. | |
117 | (d) the magnitude of the final result (after rounding) is exactly 2^-16382. | |
118 | (e) the magnitude of the exact result would be exactly 2^-16382 if the | |
119 | operands were rounded to the current precision before the arithmetic | |
120 | operation was performed. | |
121 | If all of these apply, the emulator will set the Underflow flag but a real | |
122 | 80486 will not. | |
123 | ||
124 | NOTE: Certain formats of Extended Real are UNSUPPORTED. They are | |
125 | unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities, | |
126 | and Unnormals. None of these will be generated by an 80486 or by the | |
127 | emulator. Do not use them. The emulator treats them differently in | |
128 | detail from the way an 80486 does. | |
129 | ||
130 | Self modifying code can cause the emulator to fail. An example of such | |
131 | code is: | |
132 | movl %esp,(%ebx) | |
133 | fld1 | |
134 | The FPU instruction may be (usually will be) loaded into the pre-fetch | |
135 | queue of the CPU before the mov instruction is executed. If the | |
136 | destination of the 'movl' overlaps the FPU instruction then the bytes | |
137 | in the prefetch queue and memory will be inconsistent when the FPU | |
138 | instruction is executed. The emulator will be invoked but will not be | |
139 | able to find the instruction which caused the device-not-present | |
140 | exception. For this case, the emulator cannot emulate the behaviour of | |
141 | an 80486DX. | |
142 | ||
143 | Handling of the address size override prefix byte (0x67) has not been | |
144 | extensively tested yet. A major problem exists because using it in | |
145 | vm86 mode can cause a general protection fault. Address offsets | |
146 | greater than 0xffff appear to be illegal in vm86 mode but are quite | |
147 | acceptable (and work) in real mode. A small test program developed to | |
148 | check the addressing, and which runs successfully in real mode, | |
149 | crashes dosemu under Linux and also brings Windows down with a general | |
150 | protection fault message when run under the MS-DOS prompt of Windows | |
151 | 3.1. (The program simply reads data from a valid address). | |
152 | ||
153 | The emulator supports 16-bit protected mode, with one difference from | |
154 | an 80486DX. An 80486DX will allow some floating point instructions to | |
155 | write a few bytes below the lowest address of the stack. The emulator | |
156 | will not allow this in 16-bit protected mode: no instructions are | |
157 | allowed to write outside the bounds set by the protection. | |
158 | ||
159 | ----------------------- Performance of wm-FPU-emu ----------------------- | |
160 | ||
161 | Speed. | |
162 | ----- | |
163 | ||
164 | The speed of floating point computation with the emulator will depend | |
165 | upon instruction mix. Relative performance is best for the instructions | |
166 | which require most computation. The simple instructions are adversely | |
167 | affected by the FPU instruction trap overhead. | |
168 | ||
169 | ||
170 | Timing: Some simple timing tests have been made on the emulator functions. | |
171 | The times include load/store instructions. All times are in microseconds | |
172 | measured on a 33MHz 386 with 64k cache. The Turbo C tests were run under | |
173 | ms-dos; the next two columns are for emulators running with the djgpp | |
174 | ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97, | |
175 | using libm4.0 (hard). | |
176 | ||
177 | function  Turbo C        djgpp 1.06        WM-emu387      wm-FPU-emu | |
178 | ||
179 | +         60.5           154.8             76.5           139.4 | |
180 | -         61.1-65.5      157.3-160.8       76.2-79.5      142.9-144.7 | |
181 | *         71.0           190.8             79.6           146.6 | |
182 | /         61.2-75.0      261.4-266.9       75.3-91.6      142.2-158.1 | |
183 | ||
184 | sin()     310.8          4692.0            319.0          398.5 | |
185 | cos()     284.4          4855.2            308.0          388.7 | |
186 | tan()     495.0          8807.1            394.9          504.7 | |
187 | atan()    328.9          4866.4            601.1          419.5-491.9 | |
188 | ||
189 | sqrt()    128.7          crashed           145.2          227.0 | |
190 | log()     413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1 | |
191 | exp()     479.1          6619.2            469.1          850.8 | |
192 | ||
193 | ||
194 | The performance under Linux is improved by the use of look-ahead code. | |
195 | The following results show the improvement which is obtained under | |
196 | Linux due to the look-ahead code. Also given are the times for the | |
197 | original Linux emulator with the 4.1 'soft' lib. | |
198 | ||
199 | [ Linus' note: I changed look-ahead to be the default under linux, as | |
200 | there was no reason not to use it after I had edited it to be | |
201 | disabled during tracing ] | |
202 | ||
203 |           wm-FPU-emu w     original w | |
204 |           look-ahead       'soft' lib | |
205 | +         106.4            190.2 | |
206 | -         108.6-111.6      192.4-216.2 | |
207 | *         113.4            193.1 | |
208 | /         108.8-124.4      700.1-706.2 | |
209 | ||
210 | sin()     390.5            2642.0 | |
211 | cos()     381.5            2767.4 | |
212 | tan()     496.5            3153.3 | |
213 | atan()    367.2-435.5      2439.4-3396.8 | |
214 | ||
215 | sqrt()    195.1            4732.5 | |
216 | log()     358.0-387.5      3359.2-3390.3 | |
217 | exp()     619.3            4046.4 | |
218 | ||
219 | ||
220 | These figures are now somewhat out-of-date. The emulator has become | |
221 | progressively slower for most functions as more of the 80486 features | |
222 | have been implemented. | |
223 | ||
224 | ||
225 | ----------------------- Accuracy of wm-FPU-emu ----------------------- | |
226 | ||
227 | ||
228 | The accuracy of the emulator is in almost all cases equal to or better | |
229 | than that of an Intel 80486 FPU. | |
230 | ||
231 | The results of the basic arithmetic functions (+,-,*,/), and fsqrt | |
232 | match those of an 80486 FPU. They are the best possible; the error for | |
233 | these never exceeds half an lsb. The fprem and fprem1 instructions | |
234 | return exact results; they have no error. | |
235 | ||
236 | ||
237 | The following table compares the emulator accuracy for the sqrt(), | |
238 | trig and log functions against the Turbo C "emulator". For this table, | |
239 | each function was tested at about 400 points. Ideal worst-case results | |
240 | would be 64 bits. The reduced Turbo C accuracy of cos() and tan() for | |
241 | arguments greater than pi/4 can be thought of as being related to the | |
242 | precision of the argument x; e.g. an argument of pi/2-(1e-10) which is | |
243 | accurate to 64 bits can result in a relative accuracy in cos() of | |
244 | about 64 + log2(cos(x)) = 31 bits. | |
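The figure of 31 bits can be checked with a few lines of arithmetic. The sketch below assumes only the relation quoted above, 64 + log2(cos(x)), and uses the identity cos(pi/2 - e) = sin(e) so that the small cosine is not lost to cancellation in double precision. The function name is illustrative.

```python
import math


def relative_bits_near_half_pi(eps, argument_bits=64):
    """Bits of relative accuracy left in cos(x) for x = pi/2 - eps,
    when x itself is only accurate to argument_bits bits."""
    cos_x = math.sin(eps)  # cos(pi/2 - eps) == sin(eps)
    return argument_bits + math.log2(cos_x)


# For eps = 1e-10 this comes out a little under 31 bits.
print(round(relative_bits_near_half_pi(1e-10), 1))
```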
245 | ||
246 | ||
247 | Function Tested x range Worst result Turbo C | |
248 | (relative bits) | |
249 | ||
250 | sqrt(x) 1 .. 2 64.1 63.2 | |
251 | atan(x) 1e-10 .. 200 64.2 62.8 | |
252 | cos(x) 0 .. pi/2-(1e-10) 64.4 (x <= pi/4) 62.4 | |
253 | 64.1 (x = pi/2-(1e-10)) 31.9 | |
254 | sin(x) 1e-10 .. pi/2 64.0 62.8 | |
255 | tan(x) 1e-10 .. pi/2-(1e-10) 64.0 (x <= pi/4) 62.1 | |
256 | 64.1 (x = pi/2-(1e-10)) 31.9 | |
257 | exp(x) 0 .. 1 63.1 ** 62.9 | |
258 | log(x) 1+1e-6 .. 2 63.8 ** 62.1 | |
259 | ||
260 | ** The accuracy for exp() and log() is low because the FPU (emulator) | |
261 | does not compute them directly; two operations are required. | |
262 | ||
263 | ||
264 | The emulator passes the "paranoia" tests (compiled with gcc 2.3.3 or | |
265 | later) for 'float' variables (24 bit precision numbers) when precision | |
266 | control is set to 24, 53 or 64 bits, and for 'double' variables (53 | |
267 | bit precision numbers) when precision control is set to 53 bits (a | |
268 | properly performing FPU cannot pass the 'paranoia' tests for 'double' | |
269 | variables when precision control is set to 64 bits). | |
270 | ||
271 | The code for reducing the argument for the trig functions (fsin, fcos, | |
272 | fptan and fsincos) has been improved and now effectively uses a value | |
273 | for pi which is accurate to more than 128 bits precision. As a | |
274 | consequence, the accuracy of these functions for large arguments has | |
275 | been dramatically improved (and is now very much better than an 80486 | |
276 | FPU). There is also now no degradation of accuracy for fcos and fptan | |
277 | for operands close to pi/2. Measured results are (note that the | |
278 | definition of accuracy has changed slightly from that used for the | |
279 | above table): | |
280 | ||
281 | Function Tested x range Worst result | |
282 | (absolute bits) | |
283 | ||
284 | cos(x) 0 .. 9.22e+18 62.0 | |
285 | sin(x) 1e-16 .. 9.22e+18 62.1 | |
286 | tan(x) 1e-16 .. 9.22e+18 61.8 | |
287 | ||
288 | It is possible with some effort to find very large arguments which | |
289 | give much degraded precision. For example, the integer number | |
290 | 8227740058411162616.0 | |
291 | is within about 1e-7 of a multiple of pi. To find the tan (for | |
292 | example) of this number to 64 bits precision it would be necessary to | |
293 | have a value of pi which had about 150 bits precision. The FPU | |
294 | emulator computes the result to about 42.6 bits precision (the correct | |
295 | result is about -9.739715e-8). On the other hand, an 80486 FPU returns | |
296 | 0.01059, which in relative terms is hopelessly inaccurate. | |
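The effect of the precision of pi on this argument can be reproduced with exact rational arithmetic. The sketch below is an illustration only. It models "pi accurate to 66 bits" as pi truncated to 2 integer bits plus 64 fraction bits, which is an assumption about the 80486 rather than a documented fact, and the names PI_HI, PI_66 and reduce_mod_pi are invented for the example. For a reduced argument this small, the tangent is very nearly the reduced argument itself.

```python
from fractions import Fraction

# pi to 60 decimal places (roughly 200 bits), more than the ~150 bits needed.
PI_HI = Fraction("3.141592653589793238462643383279502884197169399375105820974944")
# Assumed model of a 66-bit pi: truncate to 64 fraction bits (2 integer bits).
PI_66 = Fraction(int(PI_HI * 2**64), 2**64)


def reduce_mod_pi(x, pi):
    """Return x - k*pi for the integer k nearest x/pi, computed exactly."""
    k = round(Fraction(x) / pi)
    return float(Fraction(x) - k * pi)


x = 8227740058411162616
# Compare with the correct tan of about -9.739715e-8 quoted in the text.
print(reduce_mod_pi(x, PI_HI))
# Compare with the 0.01059 quoted in the text for an 80486.
print(reduce_mod_pi(x, PI_66))
```

The multiplier k is about 2.6e18, so an error of a few parts in 1e21 in pi is magnified into an error of about 0.01 in the reduced argument.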
297 | ||
298 | For arguments close to critical angles (which occur at multiples of | |
299 | pi/2) the emulator is more accurate than an 80486 FPU. For very large | |
300 | arguments, the emulator is far more accurate. | |
301 | ||
302 | ||
303 | Prior to version 1.20 of the emulator, the accuracy of the results for | |
304 | the transcendental functions (in their principal range) was not as | |
305 | good as the results from an 80486 FPU. From version 1.20, the accuracy | |
306 | has been considerably improved and these functions now give measured | |
307 | worst-case results which are better than the worst-case results given | |
308 | by an 80486 FPU. | |
309 | ||
310 | The following table gives the measured results for the emulator. The | |
311 | number of randomly selected arguments in each case is about half a | |
312 | million. The group of three columns gives the frequency of the given | |
313 | accuracy in number of times per million, thus the second of these | |
314 | columns shows that an accuracy of between 63.80 and 63.89 bits was | |
315 | found at a rate of 133 times per one million measurements for fsin. | |
316 | The results show that the fsin, fcos and fptan instructions return | |
317 | results which are in error (i.e. less accurate than the best possible | |
318 | result (which is 64 bits)) for about one per cent of all arguments | |
319 | between -pi/2 and +pi/2. The other instructions have a lower | |
320 | frequency of results which are in error. The last two columns give | |
321 | the worst accuracy which was found (in bits) and the approximate value | |
322 | of the argument which produced it. | |
323 | ||
324 | frequency (per M) | |
325 | ------------------- --------------- | |
326 | instr arg range # tests 63.7 63.8 63.9 worst at arg | |
327 | bits bits bits bits | |
328 | ----- ------------ ------- ---- ---- ----- ----- -------- | |
329 | fsin (0,pi/2) 547756 0 133 10673 63.89 0.451317 | |
330 | fcos (0,pi/2) 547563 0 126 10532 63.85 0.700801 | |
331 | fptan (0,pi/2) 536274 11 267 10059 63.74 0.784876 | |
332 | fpatan 4 quadrants 517087 0 8 1855 63.88 0.435121 (4q) | |
333 | fyl2x (0,20) 541861 0 0 1323 63.94 1.40923 (x) | |
334 | fyl2xp1 (-.293,.414) 520256 0 0 5678 63.93 0.408542 (x) | |
335 | f2xm1 (-1,1) 538847 4 481 6488 63.79 0.167709 | |
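The "about one per cent" figure for fsin, fcos and fptan follows from adding the three frequency columns. The sketch below copies the per-million counts from the table and assumes that the three listed buckets account for every result below 64 bits, which is consistent with the worst-case column. The helper name is illustrative.

```python
# Frequencies per million from the table: (63.7, 63.8, 63.9) bit buckets.
PER_MILLION = {
    "fsin": (0, 133, 10673),
    "fcos": (0, 126, 10532),
    "fptan": (11, 267, 10059),
}


def percent_in_error(buckets):
    """Share of results below 64 bits, in per cent, assuming the listed
    buckets cover every inexact result."""
    return sum(buckets) / 1_000_000 * 100


for name, buckets in PER_MILLION.items():
    print(f"{name}: {percent_in_error(buckets):.2f}%")
```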
336 | ||
337 | ||
338 | Tests performed on an 80486 FPU showed results of lower accuracy. The | |
339 | following table gives the results which were obtained with an AMD | |
340 | 486DX2/66 (other tests indicate that an Intel 486DX produces | |
341 | identical results). The tests were basically the same as those used | |
342 | to measure the emulator (the values, being random, were in general not | |
343 | the same). The total number of tests for each instruction is given | |
344 | at the end of the table; in each case about 100k tests were performed. | |
345 | Another line of figures at the end of the table shows that most of the | |
346 | instructions return results which are in error for more than 10 | |
347 | percent of the arguments tested. | |
348 | ||
349 | The numbers in the body of the table give the approx number of times a | |
350 | result of the given accuracy in bits (given in the left-most column) | |
351 | was obtained per one million arguments. For three of the instructions, | |
352 | two columns of results are given: * The second column for f2xm1 gives | |
353 | the number of cases where the results of the first column were for a | |
354 | positive argument; this shows that this instruction gives better | |
355 | results for positive arguments than it does for negative. * In the | |
356 | cases of fcos and fptan, the first column gives the results when all | |
357 | cases where arguments greater than 1.5 were removed from the results | |
358 | given in the second column. Unlike the emulator, an 80486 FPU returns | |
359 | results of relatively poor accuracy for these instructions when the | |
360 | argument approaches pi/2. The table does not show those cases when the | |
361 | accuracy of the results was less than 62 bits, which occurs quite | |
362 | often for fsin and fptan when the argument approaches pi/2. This poor | |
363 | accuracy is discussed above in relation to the Turbo C "emulator", and | |
364 | the accuracy of the value of pi. | |
365 | ||
366 | ||
367 | bits f2xm1 f2xm1 fpatan fcos fcos fyl2x fyl2xp1 fsin fptan fptan | |
368 | 62.0 0 0 0 0 437 0 0 0 0 925 | |
369 | 62.1 0 0 10 0 894 0 0 0 0 1023 | |
370 | 62.2 14 0 0 0 1033 0 0 0 0 945 | |
371 | 62.3 57 0 0 0 1202 0 0 0 0 1023 | |
372 | 62.4 385 0 0 10 1292 0 23 0 0 1178 | |
373 | 62.5 1140 0 0 119 1649 0 39 0 0 1149 | |
374 | 62.6 2037 0 0 189 1620 0 16 0 0 1169 | |
375 | 62.7 5086 14 0 646 2315 10 101 35 39 1402 | |
376 | 62.8 8818 86 0 984 3050 59 287 131 224 2036 | |
377 | 62.9 11340 1355 0 2126 4153 79 605 357 321 1948 | |
378 | 63.0 15557 4750 0 3319 5376 246 1281 862 808 2688 | |
379 | 63.1 20016 8288 0 4620 6628 511 2569 1723 1510 3302 | |
380 | 63.2 24945 11127 10 6588 8098 1120 4470 2968 2990 4724 | |
381 | 63.3 25686 12382 69 8774 10682 1906 6775 4482 5474 7236 | |
382 | 63.4 29219 14722 79 11109 12311 3094 9414 7259 8912 10587 | |
383 | 63.5 30458 14936 393 13802 15014 5874 12666 9609 13762 15262 | |
384 | 63.6 32439 16448 1277 17945 19028 10226 15537 14657 19158 20346 | |
385 | 63.7 35031 16805 4067 23003 23947 18910 20116 21333 25001 26209 | |
386 | 63.8 33251 15820 7673 24781 25675 24617 25354 24440 29433 30329 | |
387 | 63.9 33293 16833 18529 28318 29233 31267 31470 27748 29676 30601 | |
388 | ||
389 | Per cent with error: | |
390 | 30.9 3.2 18.5 9.8 13.1 11.6 17.4 | |
391 | Total arguments tested: | |
392 | 70194 70099 101784 100641 100641 101799 128853 114893 102675 102675 | |
393 | ||
394 | ||
395 | ------------------------- Contributors ------------------------------- | |
396 | ||
397 | A number of people have contributed to the development of the | |
398 | emulator, often by just reporting bugs, sometimes with suggested | |
399 | fixes, and a few kind people have provided me with access in one way | |
400 | or another to an 80486 machine. Contributors include (to those people | |
401 | who I may have forgotten, please forgive me): | |
402 | ||
403 | Linus Torvalds | |
404 | Tommy.Thorn@daimi.aau.dk | |
405 | Andrew.Tridgell@anu.edu.au | |
406 | Nick Holloway, alfie@dcs.warwick.ac.uk | |
407 | Hermano Moura, moura@dcs.gla.ac.uk | |
408 | Jon Jagger, J.Jagger@scp.ac.uk | |
409 | Lennart Benschop | |
410 | Brian Gallew, geek+@CMU.EDU | |
411 | Thomas Staniszewski, ts3v+@andrew.cmu.edu | |
412 | Martin Howell, mph@plasma.apana.org.au | |
413 | M Saggaf, alsaggaf@athena.mit.edu | |
414 | Peter Barker, PETER@socpsy.sci.fau.edu | |
415 | tom@vlsivie.tuwien.ac.at | |
416 | Dan Russel, russed@rpi.edu | |
417 | Daniel Carosone, danielce@ee.mu.oz.au | |
418 | cae@jpmorgan.com | |
419 | Hamish Coleman, t933093@minyos.xx.rmit.oz.au | |
420 | Bruce Evans, bde@kralizec.zeta.org.au | |
421 | Timo Korvola, Timo.Korvola@hut.fi | |
422 | Rick Lyons, rick@razorback.brisnet.org.au | |
423 | Rick, jrs@world.std.com | |
424 | ||
425 | ...and numerous others who responded to my request for help with | |
426 | a real 80486. | |
427 |