docs: sparc: convert to ReST

author Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

Mon, 22 Apr 2019 13:28:02 +0000 (10:28 -0300)

committer David S. Miller <davem@davemloft.net>

Thu, 9 May 2019 00:13:35 +0000 (17:13 -0700)
author Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Mon, 22 Apr 2019 13:28:02 +0000 (10:28 -0300)
committer David S. Miller <davem@davemloft.net>
Thu, 9 May 2019 00:13:35 +0000 (17:13 -0700)
diff --git a/Documentation/sparc/adi.rst b/Documentation/sparc/adi.rst

new file mode 100644 (file)

index 0000000..857ad30
--- /dev/null
+++ b/Documentation/sparc/adi.rst
@@ -0,0 +1,286 @@
+================================
+Application Data Integrity (ADI)
+================================
+
+SPARC M7 processor adds the Application Data Integrity (ADI) feature.
+ADI allows a task to set version tags on any subset of its address
+space. Once ADI is enabled and version tags are set for ranges of
+address space of a task, the processor will compare the tag in pointers
+to memory in these ranges to the version set by the application
+previously. Access to memory is granted only if the tag in given pointer
+matches the tag set by the application. In case of mismatch, processor
+raises an exception.
+
+Following steps must be taken by a task to enable ADI fully:
+
+1. Set the user mode PSTATE.mcde bit. This acts as master switch for
+   the task's entire address space to enable/disable ADI for the task.
+
+2. Set TTE.mcd bit on any TLB entries that correspond to the range of
+   addresses ADI is being enabled on. MMU checks the version tag only
+   on the pages that have TTE.mcd bit set.
+
+3. Set the version tag for virtual addresses using stxa instruction
+   and one of the MCD specific ASIs. Each stxa instruction sets the
+   given tag for one ADI block size number of bytes. This step must
+   be repeated for entire page to set tags for entire page.
+
+ADI block size for the platform is provided by the hypervisor to kernel
+in machine description tables. Hypervisor also provides the number of
+top bits in the virtual address that specify the version tag.  Once
+version tag has been set for a memory location, the tag is stored in the
+physical memory and the same tag must be present in the ADI version tag
+bits of the virtual address being presented to the MMU. For example on
+SPARC M7 processor, MMU uses bits 63-60 for version tags and ADI block
+size is same as cacheline size which is 64 bytes. A task that sets ADI
+version to, say 10, on a range of memory, must access that memory using
+virtual addresses that contain 0xa in bits 63-60.
+
+ADI is enabled on a set of pages using mprotect() with PROT_ADI flag.
+When ADI is enabled on a set of pages by a task for the first time,
+kernel sets the PSTATE.mcde bit fot the task. Version tags for memory
+addresses are set with an stxa instruction on the addresses using
+ASI_MCD_PRIMARY or ASI_MCD_ST_BLKINIT_PRIMARY. ADI block size is
+provided by the hypervisor to the kernel.  Kernel returns the value of
+ADI block size to userspace using auxiliary vector along with other ADI
+info. Following auxiliary vectors are provided by the kernel:
+
+       ============    ===========================================
+       AT_ADI_BLKSZ    ADI block size. This is the granularity and
+                       alignment, in bytes, of ADI versioning.
+       AT_ADI_NBITS    Number of ADI version bits in the VA
+       ============    ===========================================
+
+
+IMPORTANT NOTES
+===============
+
+- Version tag values of 0x0 and 0xf are reserved. These values match any
+  tag in virtual address and never generate a mismatch exception.
+
+- Version tags are set on virtual addresses from userspace even though
+  tags are stored in physical memory. Tags are set on a physical page
+  after it has been allocated to a task and a pte has been created for
+  it.
+
+- When a task frees a memory page it had set version tags on, the page
+  goes back to free page pool. When this page is re-allocated to a task,
+  kernel clears the page using block initialization ASI which clears the
+  version tags as well for the page. If a page allocated to a task is
+  freed and allocated back to the same task, old version tags set by the
+  task on that page will no longer be present.
+
+- ADI tag mismatches are not detected for non-faulting loads.
+
+- Kernel does not set any tags for user pages and it is entirely a
+  task's responsibility to set any version tags. Kernel does ensure the
+  version tags are preserved if a page is swapped out to the disk and
+  swapped back in. It also preserves that version tags if a page is
+  migrated.
+
+- ADI works for any size pages. A userspace task need not be aware of
+  page size when using ADI. It can simply select a virtual address
+  range, enable ADI on the range using mprotect() and set version tags
+  for the entire range. mprotect() ensures range is aligned to page size
+  and is a multiple of page size.
+
+- ADI tags can only be set on writable memory. For example, ADI tags can
+  not be set on read-only mappings.
+
+
+
+ADI related traps
+=================
+
+With ADI enabled, following new traps may occur:
+
+Disrupting memory corruption
+----------------------------
+
+       When a store accesses a memory localtion that has TTE.mcd=1,
+       the task is running with ADI enabled (PSTATE.mcde=1), and the ADI
+       tag in the address used (bits 63:60) does not match the tag set on
+       the corresponding cacheline, a memory corruption trap occurs. By
+       default, it is a disrupting trap and is sent to the hypervisor
+       first. Hypervisor creates a sun4v error report and sends a
+       resumable error (TT=0x7e) trap to the kernel. The kernel sends
+       a SIGSEGV to the task that resulted in this trap with the following
+       info::
+
+               siginfo.si_signo = SIGSEGV;
+               siginfo.errno = 0;
+               siginfo.si_code = SEGV_ADIDERR;
+               siginfo.si_addr = addr; /* PC where first mismatch occurred */
+               siginfo.si_trapno = 0;
+
+
+Precise memory corruption
+-------------------------
+
+       When a store accesses a memory location that has TTE.mcd=1,
+       the task is running with ADI enabled (PSTATE.mcde=1), and the ADI
+       tag in the address used (bits 63:60) does not match the tag set on
+       the corresponding cacheline, a memory corruption trap occurs. If
+       MCD precise exception is enabled (MCDPERR=1), a precise
+       exception is sent to the kernel with TT=0x1a. The kernel sends
+       a SIGSEGV to the task that resulted in this trap with the following
+       info::
+
+               siginfo.si_signo = SIGSEGV;
+               siginfo.errno = 0;
+               siginfo.si_code = SEGV_ADIPERR;
+               siginfo.si_addr = addr; /* address that caused trap */
+               siginfo.si_trapno = 0;
+
+       NOTE:
+               ADI tag mismatch on a load always results in precise trap.
+
+
+MCD disabled
+------------
+
+       When a task has not enabled ADI and attempts to set ADI version
+       on a memory address, processor sends an MCD disabled trap. This
+       trap is handled by hypervisor first and the hypervisor vectors this
+       trap through to the kernel as Data Access Exception trap with
+       fault type set to 0xa (invalid ASI). When this occurs, the kernel
+       sends the task SIGSEGV signal with following info::
+
+               siginfo.si_signo = SIGSEGV;
+               siginfo.errno = 0;
+               siginfo.si_code = SEGV_ACCADI;
+               siginfo.si_addr = addr; /* address that caused trap */
+               siginfo.si_trapno = 0;
+
+
+Sample program to use ADI
+-------------------------
+
+Following sample program is meant to illustrate how to use the ADI
+functionality::
+
+  #include <unistd.h>
+  #include <stdio.h>
+  #include <stdlib.h>
+  #include <elf.h>
+  #include <sys/ipc.h>
+  #include <sys/shm.h>
+  #include <sys/mman.h>
+  #include <asm/asi.h>
+
+  #ifndef AT_ADI_BLKSZ
+  #define AT_ADI_BLKSZ 48
+  #endif
+  #ifndef AT_ADI_NBITS
+  #define AT_ADI_NBITS 49
+  #endif
+
+  #ifndef PROT_ADI
+  #define PROT_ADI     0x10
+  #endif
+
+  #define BUFFER_SIZE     32*1024*1024UL
+
+  main(int argc, char* argv[], char* envp[])
+  {
+          unsigned long i, mcde, adi_blksz, adi_nbits;
+          char *shmaddr, *tmp_addr, *end, *veraddr, *clraddr;
+          int shmid, version;
+       Elf64_auxv_t *auxv;
+
+       adi_blksz = 0;
+
+       while(*envp++ != NULL);
+       for (auxv = (Elf64_auxv_t *)envp; auxv->a_type != AT_NULL; auxv++) {
+               switch (auxv->a_type) {
+               case AT_ADI_BLKSZ:
+                       adi_blksz = auxv->a_un.a_val;
+                       break;
+               case AT_ADI_NBITS:
+                       adi_nbits = auxv->a_un.a_val;
+                       break;
+               }
+       }
+       if (adi_blksz == 0) {
+               fprintf(stderr, "Oops! ADI is not supported\n");
+               exit(1);
+       }
+
+       printf("ADI capabilities:\n");
+       printf("\tBlock size = %ld\n", adi_blksz);
+       printf("\tNumber of bits = %ld\n", adi_nbits);
+
+          if ((shmid = shmget(2, BUFFER_SIZE,
+                                  IPC_CREAT | SHM_R | SHM_W)) < 0) {
+                  perror("shmget failed");
+                  exit(1);
+          }
+
+          shmaddr = shmat(shmid, NULL, 0);
+          if (shmaddr == (char *)-1) {
+                  perror("shm attach failed");
+                  shmctl(shmid, IPC_RMID, NULL);
+                  exit(1);
+          }
+
+       if (mprotect(shmaddr, BUFFER_SIZE, PROT_READ|PROT_WRITE|PROT_ADI)) {
+               perror("mprotect failed");
+               goto err_out;
+       }
+
+          /* Set the ADI version tag on the shm segment
+           */
+          version = 10;
+          tmp_addr = shmaddr;
+          end = shmaddr + BUFFER_SIZE;
+          while (tmp_addr < end) {
+                  asm volatile(
+                          "stxa %1, [%0]0x90\n\t"
+                          :
+                          : "r" (tmp_addr), "r" (version));
+                  tmp_addr += adi_blksz;
+          }
+       asm volatile("membar #Sync\n\t");
+
+          /* Create a versioned address from the normal address by placing
+        * version tag in the upper adi_nbits bits
+           */
+          tmp_addr = (void *) ((unsigned long)shmaddr << adi_nbits);
+          tmp_addr = (void *) ((unsigned long)tmp_addr >> adi_nbits);
+          veraddr = (void *) (((unsigned long)version << (64-adi_nbits))
+                          | (unsigned long)tmp_addr);
+
+          printf("Starting the writes:\n");
+          for (i = 0; i < BUFFER_SIZE; i++) {
+                  veraddr[i] = (char)(i);
+                  if (!(i % (1024 * 1024)))
+                          printf(".");
+          }
+          printf("\n");
+
+          printf("Verifying data...");
+       fflush(stdout);
+          for (i = 0; i < BUFFER_SIZE; i++)
+                  if (veraddr[i] != (char)i)
+                          printf("\nIndex %lu mismatched\n", i);
+          printf("Done.\n");
+
+          /* Disable ADI and clean up
+           */
+       if (mprotect(shmaddr, BUFFER_SIZE, PROT_READ|PROT_WRITE)) {
+               perror("mprotect failed");
+               goto err_out;
+       }
+
+          if (shmdt((const void *)shmaddr) != 0)
+                  perror("Detach failure");
+          shmctl(shmid, IPC_RMID, NULL);
+
+          exit(0);
+
+  err_out:
+          if (shmdt((const void *)shmaddr) != 0)
+                  perror("Detach failure");
+          shmctl(shmid, IPC_RMID, NULL);
+          exit(1);
+  }
diff --git a/Documentation/sparc/adi.txt b/Documentation/sparc/adi.txt

deleted file mode 100644 (file)

index e1aed15..0000000
--- a/Documentation/sparc/adi.txt
+++ /dev/null
@@ -1,278 +0,0 @@
-Application Data Integrity (ADI)
-================================
-
-SPARC M7 processor adds the Application Data Integrity (ADI) feature.
-ADI allows a task to set version tags on any subset of its address
-space. Once ADI is enabled and version tags are set for ranges of
-address space of a task, the processor will compare the tag in pointers
-to memory in these ranges to the version set by the application
-previously. Access to memory is granted only if the tag in given pointer
-matches the tag set by the application. In case of mismatch, processor
-raises an exception.
-
-Following steps must be taken by a task to enable ADI fully:
-
-1. Set the user mode PSTATE.mcde bit. This acts as master switch for
-   the task's entire address space to enable/disable ADI for the task.
-
-2. Set TTE.mcd bit on any TLB entries that correspond to the range of
-   addresses ADI is being enabled on. MMU checks the version tag only
-   on the pages that have TTE.mcd bit set.
-
-3. Set the version tag for virtual addresses using stxa instruction
-   and one of the MCD specific ASIs. Each stxa instruction sets the
-   given tag for one ADI block size number of bytes. This step must
-   be repeated for entire page to set tags for entire page.
-
-ADI block size for the platform is provided by the hypervisor to kernel
-in machine description tables. Hypervisor also provides the number of
-top bits in the virtual address that specify the version tag.  Once
-version tag has been set for a memory location, the tag is stored in the
-physical memory and the same tag must be present in the ADI version tag
-bits of the virtual address being presented to the MMU. For example on
-SPARC M7 processor, MMU uses bits 63-60 for version tags and ADI block
-size is same as cacheline size which is 64 bytes. A task that sets ADI
-version to, say 10, on a range of memory, must access that memory using
-virtual addresses that contain 0xa in bits 63-60.
-
-ADI is enabled on a set of pages using mprotect() with PROT_ADI flag.
-When ADI is enabled on a set of pages by a task for the first time,
-kernel sets the PSTATE.mcde bit fot the task. Version tags for memory
-addresses are set with an stxa instruction on the addresses using
-ASI_MCD_PRIMARY or ASI_MCD_ST_BLKINIT_PRIMARY. ADI block size is
-provided by the hypervisor to the kernel.  Kernel returns the value of
-ADI block size to userspace using auxiliary vector along with other ADI
-info. Following auxiliary vectors are provided by the kernel:
-
-       AT_ADI_BLKSZ    ADI block size. This is the granularity and
-                       alignment, in bytes, of ADI versioning.
-       AT_ADI_NBITS    Number of ADI version bits in the VA
-
-
-IMPORTANT NOTES:
-
-- Version tag values of 0x0 and 0xf are reserved. These values match any
-  tag in virtual address and never generate a mismatch exception.
-
-- Version tags are set on virtual addresses from userspace even though
-  tags are stored in physical memory. Tags are set on a physical page
-  after it has been allocated to a task and a pte has been created for
-  it.
-
-- When a task frees a memory page it had set version tags on, the page
-  goes back to free page pool. When this page is re-allocated to a task,
-  kernel clears the page using block initialization ASI which clears the
-  version tags as well for the page. If a page allocated to a task is
-  freed and allocated back to the same task, old version tags set by the
-  task on that page will no longer be present.
-
-- ADI tag mismatches are not detected for non-faulting loads.
-
-- Kernel does not set any tags for user pages and it is entirely a
-  task's responsibility to set any version tags. Kernel does ensure the
-  version tags are preserved if a page is swapped out to the disk and
-  swapped back in. It also preserves that version tags if a page is
-  migrated.
-
-- ADI works for any size pages. A userspace task need not be aware of
-  page size when using ADI. It can simply select a virtual address
-  range, enable ADI on the range using mprotect() and set version tags
-  for the entire range. mprotect() ensures range is aligned to page size
-  and is a multiple of page size.
-
-- ADI tags can only be set on writable memory. For example, ADI tags can
-  not be set on read-only mappings.
-
-
-
-ADI related traps
------------------
-
-With ADI enabled, following new traps may occur:
-
-Disrupting memory corruption
-
-       When a store accesses a memory localtion that has TTE.mcd=1,
-       the task is running with ADI enabled (PSTATE.mcde=1), and the ADI
-       tag in the address used (bits 63:60) does not match the tag set on
-       the corresponding cacheline, a memory corruption trap occurs. By
-       default, it is a disrupting trap and is sent to the hypervisor
-       first. Hypervisor creates a sun4v error report and sends a
-       resumable error (TT=0x7e) trap to the kernel. The kernel sends
-       a SIGSEGV to the task that resulted in this trap with the following
-       info:
-
-               siginfo.si_signo = SIGSEGV;
-               siginfo.errno = 0;
-               siginfo.si_code = SEGV_ADIDERR;
-               siginfo.si_addr = addr; /* PC where first mismatch occurred */
-               siginfo.si_trapno = 0;
-
-
-Precise memory corruption
-
-       When a store accesses a memory location that has TTE.mcd=1,
-       the task is running with ADI enabled (PSTATE.mcde=1), and the ADI
-       tag in the address used (bits 63:60) does not match the tag set on
-       the corresponding cacheline, a memory corruption trap occurs. If
-       MCD precise exception is enabled (MCDPERR=1), a precise
-       exception is sent to the kernel with TT=0x1a. The kernel sends
-       a SIGSEGV to the task that resulted in this trap with the following
-       info:
-
-               siginfo.si_signo = SIGSEGV;
-               siginfo.errno = 0;
-               siginfo.si_code = SEGV_ADIPERR;
-               siginfo.si_addr = addr; /* address that caused trap */
-               siginfo.si_trapno = 0;
-
-       NOTE: ADI tag mismatch on a load always results in precise trap.
-
-
-MCD disabled
-
-       When a task has not enabled ADI and attempts to set ADI version
-       on a memory address, processor sends an MCD disabled trap. This
-       trap is handled by hypervisor first and the hypervisor vectors this
-       trap through to the kernel as Data Access Exception trap with
-       fault type set to 0xa (invalid ASI). When this occurs, the kernel
-       sends the task SIGSEGV signal with following info:
-
-               siginfo.si_signo = SIGSEGV;
-               siginfo.errno = 0;
-               siginfo.si_code = SEGV_ACCADI;
-               siginfo.si_addr = addr; /* address that caused trap */
-               siginfo.si_trapno = 0;
-
-
-Sample program to use ADI
--------------------------
-
-Following sample program is meant to illustrate how to use the ADI
-functionality.
-
-#include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <elf.h>
-#include <sys/ipc.h>
-#include <sys/shm.h>
-#include <sys/mman.h>
-#include <asm/asi.h>
-
-#ifndef AT_ADI_BLKSZ
-#define AT_ADI_BLKSZ   48
-#endif
-#ifndef AT_ADI_NBITS
-#define AT_ADI_NBITS   49
-#endif
-
-#ifndef PROT_ADI
-#define PROT_ADI       0x10
-#endif
-
-#define BUFFER_SIZE     32*1024*1024UL
-
-main(int argc, char* argv[], char* envp[])
-{
-        unsigned long i, mcde, adi_blksz, adi_nbits;
-        char *shmaddr, *tmp_addr, *end, *veraddr, *clraddr;
-        int shmid, version;
-       Elf64_auxv_t *auxv;
-
-       adi_blksz = 0;
-
-       while(*envp++ != NULL);
-       for (auxv = (Elf64_auxv_t *)envp; auxv->a_type != AT_NULL; auxv++) {
-               switch (auxv->a_type) {
-               case AT_ADI_BLKSZ:
-                       adi_blksz = auxv->a_un.a_val;
-                       break;
-               case AT_ADI_NBITS:
-                       adi_nbits = auxv->a_un.a_val;
-                       break;
-               }
-       }
-       if (adi_blksz == 0) {
-               fprintf(stderr, "Oops! ADI is not supported\n");
-               exit(1);
-       }
-
-       printf("ADI capabilities:\n");
-       printf("\tBlock size = %ld\n", adi_blksz);
-       printf("\tNumber of bits = %ld\n", adi_nbits);
-
-        if ((shmid = shmget(2, BUFFER_SIZE,
-                                IPC_CREAT | SHM_R | SHM_W)) < 0) {
-                perror("shmget failed");
-                exit(1);
-        }
-
-        shmaddr = shmat(shmid, NULL, 0);
-        if (shmaddr == (char *)-1) {
-                perror("shm attach failed");
-                shmctl(shmid, IPC_RMID, NULL);
-                exit(1);
-        }
-
-       if (mprotect(shmaddr, BUFFER_SIZE, PROT_READ|PROT_WRITE|PROT_ADI)) {
-               perror("mprotect failed");
-               goto err_out;
-       }
-
-        /* Set the ADI version tag on the shm segment
-         */
-        version = 10;
-        tmp_addr = shmaddr;
-        end = shmaddr + BUFFER_SIZE;
-        while (tmp_addr < end) {
-                asm volatile(
-                        "stxa %1, [%0]0x90\n\t"
-                        :
-                        : "r" (tmp_addr), "r" (version));
-                tmp_addr += adi_blksz;
-        }
-       asm volatile("membar #Sync\n\t");
-
-        /* Create a versioned address from the normal address by placing
-        * version tag in the upper adi_nbits bits
-         */
-        tmp_addr = (void *) ((unsigned long)shmaddr << adi_nbits);
-        tmp_addr = (void *) ((unsigned long)tmp_addr >> adi_nbits);
-        veraddr = (void *) (((unsigned long)version << (64-adi_nbits))
-                        | (unsigned long)tmp_addr);
-
-        printf("Starting the writes:\n");
-        for (i = 0; i < BUFFER_SIZE; i++) {
-                veraddr[i] = (char)(i);
-                if (!(i % (1024 * 1024)))
-                        printf(".");
-        }
-        printf("\n");
-
-        printf("Verifying data...");
-       fflush(stdout);
-        for (i = 0; i < BUFFER_SIZE; i++)
-                if (veraddr[i] != (char)i)
-                        printf("\nIndex %lu mismatched\n", i);
-        printf("Done.\n");
-
-        /* Disable ADI and clean up
-         */
-       if (mprotect(shmaddr, BUFFER_SIZE, PROT_READ|PROT_WRITE)) {
-               perror("mprotect failed");
-               goto err_out;
-       }
-
-        if (shmdt((const void *)shmaddr) != 0)
-                perror("Detach failure");
-        shmctl(shmid, IPC_RMID, NULL);
-
-        exit(0);
-
-err_out:
-        if (shmdt((const void *)shmaddr) != 0)
-                perror("Detach failure");
-        shmctl(shmid, IPC_RMID, NULL);
-        exit(1);
-}
diff --git a/Documentation/sparc/console.rst b/Documentation/sparc/console.rst

new file mode 100644 (file)

index 0000000..73132db
--- /dev/null
+++ b/Documentation/sparc/console.rst
@@ -0,0 +1,9 @@
+Steps for sending 'break' on sunhv console
+==========================================
+
+On Baremetal:
+   1. press   Esc + 'B'
+
+On LDOM:
+   1. press    Ctrl + ']'
+   2. telnet> send  break
diff --git a/Documentation/sparc/console.txt b/Documentation/sparc/console.txt

deleted file mode 100644 (file)

index 5aa735a..0000000
--- a/Documentation/sparc/console.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-Steps for sending 'break' on sunhv console:
-===========================================
-
-On Baremetal:
-   1. press   Esc + 'B'
-
-On LDOM:
-   1. press    Ctrl + ']'
-   2. telnet> send  break
diff --git a/Documentation/sparc/index.rst b/Documentation/sparc/index.rst

new file mode 100644 (file)

index 0000000..91f7d66
--- /dev/null
+++ b/Documentation/sparc/index.rst
@@ -0,0 +1,13 @@
+:orphan:
+
+==================
+Sparc Architecture
+==================
+
+.. toctree::
+   :maxdepth: 1
+
+   console
+   adi
+
+   oradax/oracle-dax
diff --git a/Documentation/sparc/oradax/oracle-dax.rst b/Documentation/sparc/oradax/oracle-dax.rst

new file mode 100644 (file)

index 0000000..d1e14d5
--- /dev/null
+++ b/Documentation/sparc/oradax/oracle-dax.rst
@@ -0,0 +1,445 @@
+=======================================
+Oracle Data Analytics Accelerator (DAX)
+=======================================
+
+DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
+(DAX2) processor chips, and has direct access to the CPU's L3 caches
+as well as physical memory. It can perform several operations on data
+streams with various input and output formats.  A driver provides a
+transport mechanism and has limited knowledge of the various opcodes
+and data formats. A user space library provides high level services
+and translates these into low level commands which are then passed
+into the driver and subsequently the Hypervisor and the coprocessor.
+The library is the recommended way for applications to use the
+coprocessor, and the driver interface is not intended for general use.
+This document describes the general flow of the driver, its
+structures, and its programmatic interface. It also provides example
+code sufficient to write user or kernel applications that use DAX
+functionality.
+
+The user library is open source and available at:
+
+    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git
+
+The Hypervisor interface to the coprocessor is described in detail in
+the accompanying document, dax-hv-api.txt, which is a plain text
+excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
+Specification" version 3.0.20+15, dated 2017-09-25.
+
+
+High Level Overview
+===================
+
+A coprocessor request is described by a Command Control Block
+(CCB). The CCB contains an opcode and various parameters. The opcode
+specifies what operation is to be done, and the parameters specify
+options, flags, sizes, and addresses.  The CCB (or an array of CCBs)
+is passed to the Hypervisor, which handles queueing and scheduling of
+requests to the available coprocessor execution units. A status code
+returned indicates if the request was submitted successfully or if
+there was an error.  One of the addresses given in each CCB is a
+pointer to a "completion area", which is a 128 byte memory block that
+is written by the coprocessor to provide execution status. No
+interrupt is generated upon completion; the completion area must be
+polled by software to find out when a transaction has finished, but
+the M7 and later processors provide a mechanism to pause the virtual
+processor until the completion status has been updated by the
+coprocessor. This is done using the monitored load and mwait
+instructions, which are described in more detail later.  The DAX
+coprocessor was designed so that after a request is submitted, the
+kernel is no longer involved in the processing of it.  The polling is
+done at the user level, which results in almost zero latency between
+completion of a request and resumption of execution of the requesting
+thread.
+
+
+Addressing Memory
+=================
+
+The kernel does not have access to physical memory in the Sun4v
+architecture, as there is an additional level of memory virtualization
+present. This intermediate level is called "real" memory, and the
+kernel treats this as if it were physical.  The Hypervisor handles the
+translations between real memory and physical so that each logical
+domain (LDOM) can have a partition of physical memory that is isolated
+from that of other LDOMs.  When the kernel sets up a virtual mapping,
+it specifies a virtual address and the real address to which it should
+be mapped.
+
+The DAX coprocessor can only operate on physical memory, so before a
+request can be fed to the coprocessor, all the addresses in a CCB must
+be converted into physical addresses. The kernel cannot do this since
+it has no visibility into physical addresses. So a CCB may contain
+either the virtual or real addresses of the buffers or a combination
+of them. An "address type" field is available for each address that
+may be given in the CCB. In all cases, the Hypervisor will translate
+all the addresses to physical before dispatching to hardware. Address
+translations are performed using the context of the process initiating
+the request.
+
+
+The Driver API
+==============
+
+An application makes requests to the driver via the write() system
+call, and gets results (if any) via read(). The completion areas are
+made accessible via mmap(), and are read-only for the application.
+
+The request may either be an immediate command or an array of CCBs to
+be submitted to the hardware.
+
+Each open instance of the device is exclusive to the thread that
+opened it, and must be used by that thread for all subsequent
+operations. The driver open function creates a new context for the
+thread and initializes it for use.  This context contains pointers and
+values used internally by the driver to keep track of submitted
+requests. The completion area buffer is also allocated, and this is
+large enough to contain the completion areas for many concurrent
+requests.  When the device is closed, any outstanding transactions are
+flushed and the context is cleaned up.
+
+On a DAX1 system (M7), the device will be called "oradax1", while on a
+DAX2 system (M8) it will be "oradax2". If an application requires one
+or the other, it should simply attempt to open the appropriate
+device. Only one of the devices will exist on any given system, so the
+name can be used to determine what the platform supports.
+
+The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
+all of these, success is indicated by a return value from write()
+equal to the number of bytes given in the call. Otherwise -1 is
+returned and errno is set.
+
+CCB_DEQUEUE
+-----------
+
+Tells the driver to clean up resources associated with past
+requests. Since no interrupt is generated upon the completion of a
+request, the driver must be told when it may reclaim resources.  No
+further status information is returned, so the user should not
+subsequently call read().
+
+CCB_KILL
+--------
+
+Kills a CCB during execution. The CCB is guaranteed to not continue
+executing once this call returns successfully. On success, read() must
+be called to retrieve the result of the action.
+
+CCB_INFO
+--------
+
+Retrieves information about a currently executing CCB. Note that some
+Hypervisors might return 'notfound' when the CCB is in 'inprogress'
+state. To ensure a CCB in the 'notfound' state will never be executed,
+CCB_KILL must be invoked on that CCB. Upon success, read() must be
+called to retrieve the details of the action.
+
+Submission of an array of CCBs for execution
+---------------------------------------------
+
+A write() whose length is a multiple of the CCB size is treated as a
+submit operation. The file offset is treated as the index of the
+completion area to use, and may be set via lseek() or using the
+pwrite() system call. If -1 is returned then errno is set to indicate
+the error. Otherwise, the return value is the length of the array that
+was actually accepted by the coprocessor. If the accepted length is
+equal to the requested length, then the submission was completely
+successful and there is no further status needed; hence, the user
+should not subsequently call read(). Partial acceptance of the CCB
+array is indicated by a return value less than the requested length,
+and read() must be called to retrieve further status information.  The
+status will reflect the error caused by the first CCB that was not
+accepted, and status_data will provide additional data in some cases.
+
+MMAP
+----
+
+The mmap() function provides access to the completion area allocated
+in the driver.  Note that the completion area is not writeable by the
+user process, and the mmap call must not specify PROT_WRITE.
+
+
+Completion of a Request
+=======================
+
+The first byte in each completion area is the command status which is
+updated by the coprocessor hardware. Software may take advantage of
+new M7/M8 processor capabilities to efficiently poll this status byte.
+First, a "monitored load" is achieved via a Load from Alternate Space
+(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY).  Second, a
+"monitored wait" is achieved via the mwait instruction (a write to
+%asr28). This instruction is like pause in that it suspends execution
+of the virtual processor for the given number of nanoseconds, but in
+addition will terminate early when one of several events occur. If the
+block of data containing the monitored location is modified, then the
+mwait terminates. This causes software to resume execution immediately
+(without a context switch or kernel to user transition) after a
+transaction completes. Thus the latency between transaction completion
+and resumption of execution may be just a few nanoseconds.
+
+
+Application Life Cycle of a DAX Submission
+==========================================
+
+ - open dax device
+ - call mmap() to get the completion area address
+ - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
+ - submit CCB via write() or pwrite()
+ - go into a loop executing monitored load + monitored wait and
+   terminate when the command status indicates the request is complete
+   (CCB_KILL or CCB_INFO may be used any time as necessary)
+ - perform a CCB_DEQUEUE
+ - call munmap() for completion area
+ - close the dax device
+
+
+Memory Constraints
+==================
+
+The DAX hardware operates only on physical addresses. Therefore, it is
+not aware of virtual memory mappings and the discontiguities that may
+exist in the physical memory that a virtual buffer maps to. There is
+no I/O TLB or any scatter/gather mechanism. All buffers, whether input
+or output, must reside in a physically contiguous region of memory.
+
+The Hypervisor translates all addresses within a CCB to physical
+before handing off the CCB to DAX. The Hypervisor determines the
+virtual page size for each virtual address given, and uses this to
+program a size limit for each address. This prevents the coprocessor
+from reading or writing beyond the bound of the virtual page, even
+though it is accessing physical memory directly. A simpler way of
+saying this is that a DAX operation will never "cross" a virtual page
+boundary. If an 8k virtual page is used, then the data is strictly
+limited to 8k. If a user's buffer is larger than 8k, then a larger
+page size must be used, or the transaction size will be truncated to
+8k.
+
+Huge pages. A user may allocate huge pages using standard interfaces.
+Memory buffers residing on huge pages may be used to achieve much
+larger DAX transaction sizes, but the rules must still be followed,
+and no transaction will cross a page boundary, even a huge page.  A
+major caveat is that Linux on Sparc presents 8Mb as one of the huge
+page sizes. Sparc does not actually provide a 8Mb hardware page size,
+and this size is synthesized by pasting together two 4Mb pages. The
+reasons for this are historical, and it creates an issue because only
+half of this 8Mb page can actually be used for any given buffer in a
+DAX request, and it must be either the first half or the second half;
+it cannot be a 4Mb chunk in the middle, since that crosses a
+(hardware) page boundary. Note that this entire issue may be hidden by
+higher level libraries.
+
+
+CCB Structure
+-------------
+A CCB is an array of 8 64-bit words. Several of these words provide
+command opcodes, parameters, flags, etc., and the rest are addresses
+for the completion area, output buffer, and various inputs::
+
+   struct ccb {
+       u64   control;
+       u64   completion;
+       u64   input0;
+       u64   access;
+       u64   input1;
+       u64   op_data;
+       u64   output;
+       u64   table;
+   };
+
+See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
+each of these fields, and see dax-hv-api.txt for a complete description
+of the Hypervisor API available to the guest OS (ie, Linux kernel).
+
+The first word (control) is examined by the driver for the following:
+ - CCB version, which must be consistent with hardware version
+ - Opcode, which must be one of the documented allowable commands
+ - Address types, which must be set to "virtual" for all the addresses
+   given by the user, thereby ensuring that the application can
+   only access memory that it owns
+
+
+Example Code
+============
+
+The DAX is accessible to both user and kernel code.  The kernel code
+can make hypercalls directly while the user code must use wrappers
+provided by the driver. The setup of the CCB is nearly identical for
+both; the only difference is in preparation of the completion area. An
+example of user code is given now, with kernel code afterwards.
+
+In order to program using the driver API, the file
+arch/sparc/include/uapi/asm/oradax.h must be included.
+
+First, the proper device must be opened. For M7 it will be
+/dev/oradax1 and for M8 it will be /dev/oradax2. The simplest
+procedure is to attempt to open both, as only one will succeed::
+
+       fd = open("/dev/oradax1", O_RDWR);
+       if (fd < 0)
+               fd = open("/dev/oradax2", O_RDWR);
+       if (fd < 0)
+              /* No DAX found */
+
+Next, the completion area must be mapped::
+
+      completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);
+
+All input and output buffers must be fully contained in one hardware
+page, since as explained above, the DAX is strictly constrained by
+virtual page boundaries.  In addition, the output buffer must be
+64-byte aligned and its size must be a multiple of 64 bytes because
+the coprocessor writes in units of cache lines.
+
+This example demonstrates the DAX Scan command, which takes as input a
+vector and a match value, and produces a bitmap as the output. For
+each input element that matches the value, the corresponding bit is
+set in the output.
+
+In this example, the input vector consists of a series of single bits,
+and the match value is 0. So each 0 bit in the input will produce a 1
+in the output, and vice versa, which produces an output bitmap which
+is the input bitmap inverted.
+
+For details of all the parameters and bits used in this CCB, please
+refer to section 36.2.1.3 of the DAX Hypervisor API document, which
+describes the Scan command in detail::
+
+       ccb->control =       /* Table 36.1, CCB Header Format */
+                 (2L << 48)     /* command = Scan Value */
+               | (3L << 40)     /* output address type = primary virtual */
+               | (3L << 34)     /* primary input address type = primary virtual */
+                            /* Section 36.2.1, Query CCB Command Formats */
+               | (1 << 28)     /* 36.2.1.1.1 primary input format = fixed width bit packed */
+               | (0 << 23)     /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
+               | (8 << 10)     /* 36.2.1.1.6 output format = bit vector */
+               | (0 <<  5)     /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
+               | (31 << 0);    /* 36.2.1.3 Disable second scan criteria */
+
+       ccb->completion = 0;    /* Completion area address, to be filled in by driver */
+
+       ccb->input0 = (unsigned long) input; /* primary input address */
+
+       ccb->access =       /* Section 36.2.1.2, Data Access Control */
+                 (2 << 24)    /* Primary input length format = bits */
+               | (nbits - 1); /* number of bits in primary input stream, minus 1 */
+
+       ccb->input1 = 0;       /* secondary input address, unused */
+
+       ccb->op_data = 0;      /* scan criteria (value to be matched) */
+
+       ccb->output = (unsigned long) output;   /* output address */
+
+       ccb->table = 0;        /* table address, unused */
+
+The CCB submission is a write() or pwrite() system call to the
+driver. If the call fails, then a read() must be used to retrieve the
+status::
+
+       if (pwrite(fd, ccb, 64, 0) != 64) {
+               struct ccb_exec_result status;
+               read(fd, &status, sizeof(status));
+               /* bail out */
+       }
+
+After a successful submission of the CCB, the completion area may be
+polled to determine when the DAX is finished. Detailed information on
+the contents of the completion area can be found in section 36.2.2 of
+the DAX HV API document::
+
+       while (1) {
+               /* Monitored Load */
+               __asm__ __volatile__("lduba [%1] 0x84, %0\n"
+                                    : "=r" (status)
+                                    : "r"  (completion_area));
+
+               if (status)          /* 0 indicates command in progress */
+                       break;
+
+               /* MWAIT */
+               __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
+       }
+
+A completion area status of 1 indicates successful completion of the
+CCB and validity of the output bitmap, which may be used immediately.
+All other non-zero values indicate error conditions which are
+described in section 36.2.2::
+
+       if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
+               /* completion_area[0] contains the completion status */
+               /* completion_area[1] contains an error code, see 36.2.2 */
+       }
+
+After the completion area has been processed, the driver must be
+notified that it can release any resources associated with the
+request. This is done via the dequeue operation::
+
+       struct dax_command cmd;
+       cmd.command = CCB_DEQUEUE;
+       if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
+               /* bail out */
+       }
+
+Finally, normal program cleanup should be done, i.e., unmapping
+completion area, closing the dax device, freeing memory etc.
+
+Kernel example
+--------------
+
+The only difference in using the DAX in kernel code is the treatment
+of the completion area. Unlike user applications which mmap the
+completion area allocated by the driver, kernel code must allocate its
+own memory to use for the completion area, and this address and its
+type must be given in the CCB::
+
+       ccb->control |=      /* Table 36.1, CCB Header Format */
+               (3L << 32);     /* completion area address type = primary virtual */
+
+       ccb->completion = (unsigned long) completion_area;   /* Completion area address */
+
+The dax submit hypercall is made directly. The flags used in the
+ccb_submit call are documented in the DAX HV API in section 36.3.1/
+
+::
+
+  #include <asm/hypervisor.h>
+
+       hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
+                                HV_CCB_QUERY_CMD |
+                                HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
+                                HV_CCB_VA_PRIVILEGED,
+                                0, &bytes_accepted, &status_data);
+
+       if (hv_rv != HV_EOK) {
+               /* hv_rv is an error code, status_data contains */
+               /* potential additional status, see 36.3.1.1 */
+       }
+
+After the submission, the completion area polling code is identical to
+that in user land::
+
+       while (1) {
+               /* Monitored Load */
+               __asm__ __volatile__("lduba [%1] 0x84, %0\n"
+                                    : "=r" (status)
+                                    : "r"  (completion_area));
+
+               if (status)          /* 0 indicates command in progress */
+                       break;
+
+               /* MWAIT */
+               __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
+       }
+
+       if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
+               /* completion_area[0] contains the completion status */
+               /* completion_area[1] contains an error code, see 36.2.2 */
+       }
+
+The output bitmap is ready for consumption immediately after the
+completion status indicates success.
+
+Excer[t from UltraSPARC Virtual Machine Specification
+=====================================================
+
+ .. include:: dax-hv-api.txt
+    :literal:
diff --git a/Documentation/sparc/oradax/oracle-dax.txt b/Documentation/sparc/oradax/oracle-dax.txt

deleted file mode 100644 (file)

index 9d53ac9..0000000
--- a/Documentation/sparc/oradax/oracle-dax.txt
+++ /dev/null
@@ -1,429 +0,0 @@
-Oracle Data Analytics Accelerator (DAX)
----------------------------------------
-
-DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
-(DAX2) processor chips, and has direct access to the CPU's L3 caches
-as well as physical memory. It can perform several operations on data
-streams with various input and output formats.  A driver provides a
-transport mechanism and has limited knowledge of the various opcodes
-and data formats. A user space library provides high level services
-and translates these into low level commands which are then passed
-into the driver and subsequently the Hypervisor and the coprocessor.
-The library is the recommended way for applications to use the
-coprocessor, and the driver interface is not intended for general use.
-This document describes the general flow of the driver, its
-structures, and its programmatic interface. It also provides example
-code sufficient to write user or kernel applications that use DAX
-functionality.
-
-The user library is open source and available at:
-    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git
-
-The Hypervisor interface to the coprocessor is described in detail in
-the accompanying document, dax-hv-api.txt, which is a plain text
-excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
-Specification" version 3.0.20+15, dated 2017-09-25.
-
-
-High Level Overview
--------------------
-
-A coprocessor request is described by a Command Control Block
-(CCB). The CCB contains an opcode and various parameters. The opcode
-specifies what operation is to be done, and the parameters specify
-options, flags, sizes, and addresses.  The CCB (or an array of CCBs)
-is passed to the Hypervisor, which handles queueing and scheduling of
-requests to the available coprocessor execution units. A status code
-returned indicates if the request was submitted successfully or if
-there was an error.  One of the addresses given in each CCB is a
-pointer to a "completion area", which is a 128 byte memory block that
-is written by the coprocessor to provide execution status. No
-interrupt is generated upon completion; the completion area must be
-polled by software to find out when a transaction has finished, but
-the M7 and later processors provide a mechanism to pause the virtual
-processor until the completion status has been updated by the
-coprocessor. This is done using the monitored load and mwait
-instructions, which are described in more detail later.  The DAX
-coprocessor was designed so that after a request is submitted, the
-kernel is no longer involved in the processing of it.  The polling is
-done at the user level, which results in almost zero latency between
-completion of a request and resumption of execution of the requesting
-thread.
-
-
-Addressing Memory
------------------
-
-The kernel does not have access to physical memory in the Sun4v
-architecture, as there is an additional level of memory virtualization
-present. This intermediate level is called "real" memory, and the
-kernel treats this as if it were physical.  The Hypervisor handles the
-translations between real memory and physical so that each logical
-domain (LDOM) can have a partition of physical memory that is isolated
-from that of other LDOMs.  When the kernel sets up a virtual mapping,
-it specifies a virtual address and the real address to which it should
-be mapped.
-
-The DAX coprocessor can only operate on physical memory, so before a
-request can be fed to the coprocessor, all the addresses in a CCB must
-be converted into physical addresses. The kernel cannot do this since
-it has no visibility into physical addresses. So a CCB may contain
-either the virtual or real addresses of the buffers or a combination
-of them. An "address type" field is available for each address that
-may be given in the CCB. In all cases, the Hypervisor will translate
-all the addresses to physical before dispatching to hardware. Address
-translations are performed using the context of the process initiating
-the request.
-
-
-The Driver API
---------------
-
-An application makes requests to the driver via the write() system
-call, and gets results (if any) via read(). The completion areas are
-made accessible via mmap(), and are read-only for the application.
-
-The request may either be an immediate command or an array of CCBs to
-be submitted to the hardware.
-
-Each open instance of the device is exclusive to the thread that
-opened it, and must be used by that thread for all subsequent
-operations. The driver open function creates a new context for the
-thread and initializes it for use.  This context contains pointers and
-values used internally by the driver to keep track of submitted
-requests. The completion area buffer is also allocated, and this is
-large enough to contain the completion areas for many concurrent
-requests.  When the device is closed, any outstanding transactions are
-flushed and the context is cleaned up.
-
-On a DAX1 system (M7), the device will be called "oradax1", while on a
-DAX2 system (M8) it will be "oradax2". If an application requires one
-or the other, it should simply attempt to open the appropriate
-device. Only one of the devices will exist on any given system, so the
-name can be used to determine what the platform supports.
-
-The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
-all of these, success is indicated by a return value from write()
-equal to the number of bytes given in the call. Otherwise -1 is
-returned and errno is set.
-
-CCB_DEQUEUE
-
-Tells the driver to clean up resources associated with past
-requests. Since no interrupt is generated upon the completion of a
-request, the driver must be told when it may reclaim resources.  No
-further status information is returned, so the user should not
-subsequently call read().
-
-CCB_KILL
-
-Kills a CCB during execution. The CCB is guaranteed to not continue
-executing once this call returns successfully. On success, read() must
-be called to retrieve the result of the action.
-
-CCB_INFO
-
-Retrieves information about a currently executing CCB. Note that some
-Hypervisors might return 'notfound' when the CCB is in 'inprogress'
-state. To ensure a CCB in the 'notfound' state will never be executed,
-CCB_KILL must be invoked on that CCB. Upon success, read() must be
-called to retrieve the details of the action.
-
-Submission of an array of CCBs for execution
-
-A write() whose length is a multiple of the CCB size is treated as a
-submit operation. The file offset is treated as the index of the
-completion area to use, and may be set via lseek() or using the
-pwrite() system call. If -1 is returned then errno is set to indicate
-the error. Otherwise, the return value is the length of the array that
-was actually accepted by the coprocessor. If the accepted length is
-equal to the requested length, then the submission was completely
-successful and there is no further status needed; hence, the user
-should not subsequently call read(). Partial acceptance of the CCB
-array is indicated by a return value less than the requested length,
-and read() must be called to retrieve further status information.  The
-status will reflect the error caused by the first CCB that was not
-accepted, and status_data will provide additional data in some cases.
-
-MMAP
-
-The mmap() function provides access to the completion area allocated
-in the driver.  Note that the completion area is not writeable by the
-user process, and the mmap call must not specify PROT_WRITE.
-
-
-Completion of a Request
------------------------
-
-The first byte in each completion area is the command status which is
-updated by the coprocessor hardware. Software may take advantage of
-new M7/M8 processor capabilities to efficiently poll this status byte.
-First, a "monitored load" is achieved via a Load from Alternate Space
-(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY).  Second, a
-"monitored wait" is achieved via the mwait instruction (a write to
-%asr28). This instruction is like pause in that it suspends execution
-of the virtual processor for the given number of nanoseconds, but in
-addition will terminate early when one of several events occur. If the
-block of data containing the monitored location is modified, then the
-mwait terminates. This causes software to resume execution immediately
-(without a context switch or kernel to user transition) after a
-transaction completes. Thus the latency between transaction completion
-and resumption of execution may be just a few nanoseconds.
-
-
-Application Life Cycle of a DAX Submission
-------------------------------------------
-
- - open dax device
- - call mmap() to get the completion area address
- - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
- - submit CCB via write() or pwrite()
- - go into a loop executing monitored load + monitored wait and
-   terminate when the command status indicates the request is complete
-   (CCB_KILL or CCB_INFO may be used any time as necessary)
- - perform a CCB_DEQUEUE
- - call munmap() for completion area
- - close the dax device
-
-
-Memory Constraints
-------------------
-
-The DAX hardware operates only on physical addresses. Therefore, it is
-not aware of virtual memory mappings and the discontiguities that may
-exist in the physical memory that a virtual buffer maps to. There is
-no I/O TLB or any scatter/gather mechanism. All buffers, whether input
-or output, must reside in a physically contiguous region of memory.
-
-The Hypervisor translates all addresses within a CCB to physical
-before handing off the CCB to DAX. The Hypervisor determines the
-virtual page size for each virtual address given, and uses this to
-program a size limit for each address. This prevents the coprocessor
-from reading or writing beyond the bound of the virtual page, even
-though it is accessing physical memory directly. A simpler way of
-saying this is that a DAX operation will never "cross" a virtual page
-boundary. If an 8k virtual page is used, then the data is strictly
-limited to 8k. If a user's buffer is larger than 8k, then a larger
-page size must be used, or the transaction size will be truncated to
-8k.
-
-Huge pages. A user may allocate huge pages using standard interfaces.
-Memory buffers residing on huge pages may be used to achieve much
-larger DAX transaction sizes, but the rules must still be followed,
-and no transaction will cross a page boundary, even a huge page.  A
-major caveat is that Linux on Sparc presents 8Mb as one of the huge
-page sizes. Sparc does not actually provide a 8Mb hardware page size,
-and this size is synthesized by pasting together two 4Mb pages. The
-reasons for this are historical, and it creates an issue because only
-half of this 8Mb page can actually be used for any given buffer in a
-DAX request, and it must be either the first half or the second half;
-it cannot be a 4Mb chunk in the middle, since that crosses a
-(hardware) page boundary. Note that this entire issue may be hidden by
-higher level libraries.
-
-
-CCB Structure
--------------
-A CCB is an array of 8 64-bit words. Several of these words provide
-command opcodes, parameters, flags, etc., and the rest are addresses
-for the completion area, output buffer, and various inputs:
-
-   struct ccb {
-       u64   control;
-       u64   completion;
-       u64   input0;
-       u64   access;
-       u64   input1;
-       u64   op_data;
-       u64   output;
-       u64   table;
-   };
-
-See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
-each of these fields, and see dax-hv-api.txt for a complete description
-of the Hypervisor API available to the guest OS (ie, Linux kernel).
-
-The first word (control) is examined by the driver for the following:
- - CCB version, which must be consistent with hardware version
- - Opcode, which must be one of the documented allowable commands
- - Address types, which must be set to "virtual" for all the addresses
-   given by the user, thereby ensuring that the application can
-   only access memory that it owns
-
-
-Example Code
-------------
-
-The DAX is accessible to both user and kernel code.  The kernel code
-can make hypercalls directly while the user code must use wrappers
-provided by the driver. The setup of the CCB is nearly identical for
-both; the only difference is in preparation of the completion area. An
-example of user code is given now, with kernel code afterwards.
-
-In order to program using the driver API, the file
-arch/sparc/include/uapi/asm/oradax.h must be included.
-
-First, the proper device must be opened. For M7 it will be
-/dev/oradax1 and for M8 it will be /dev/oradax2. The simplest
-procedure is to attempt to open both, as only one will succeed:
-
-       fd = open("/dev/oradax1", O_RDWR);
-       if (fd < 0)
-               fd = open("/dev/oradax2", O_RDWR);
-       if (fd < 0)
-              /* No DAX found */
-
-Next, the completion area must be mapped:
-
-      completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);
-
-All input and output buffers must be fully contained in one hardware
-page, since as explained above, the DAX is strictly constrained by
-virtual page boundaries.  In addition, the output buffer must be
-64-byte aligned and its size must be a multiple of 64 bytes because
-the coprocessor writes in units of cache lines.
-
-This example demonstrates the DAX Scan command, which takes as input a
-vector and a match value, and produces a bitmap as the output. For
-each input element that matches the value, the corresponding bit is
-set in the output.
-
-In this example, the input vector consists of a series of single bits,
-and the match value is 0. So each 0 bit in the input will produce a 1
-in the output, and vice versa, which produces an output bitmap which
-is the input bitmap inverted.
-
-For details of all the parameters and bits used in this CCB, please
-refer to section 36.2.1.3 of the DAX Hypervisor API document, which
-describes the Scan command in detail.
-
-       ccb->control =       /* Table 36.1, CCB Header Format */
-                 (2L << 48)     /* command = Scan Value */
-               | (3L << 40)     /* output address type = primary virtual */
-               | (3L << 34)     /* primary input address type = primary virtual */
-                            /* Section 36.2.1, Query CCB Command Formats */
-               | (1 << 28)     /* 36.2.1.1.1 primary input format = fixed width bit packed */
-               | (0 << 23)     /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
-               | (8 << 10)     /* 36.2.1.1.6 output format = bit vector */
-               | (0 <<  5)     /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
-               | (31 << 0);    /* 36.2.1.3 Disable second scan criteria */
-
-       ccb->completion = 0;    /* Completion area address, to be filled in by driver */
-
-       ccb->input0 = (unsigned long) input; /* primary input address */
-
-       ccb->access =       /* Section 36.2.1.2, Data Access Control */
-                 (2 << 24)    /* Primary input length format = bits */
-               | (nbits - 1); /* number of bits in primary input stream, minus 1 */
-
-       ccb->input1 = 0;       /* secondary input address, unused */
-
-       ccb->op_data = 0;      /* scan criteria (value to be matched) */
-
-       ccb->output = (unsigned long) output;   /* output address */
-
-       ccb->table = 0;        /* table address, unused */
-
-The CCB submission is a write() or pwrite() system call to the
-driver. If the call fails, then a read() must be used to retrieve the
-status:
-
-       if (pwrite(fd, ccb, 64, 0) != 64) {
-               struct ccb_exec_result status;
-               read(fd, &status, sizeof(status));
-               /* bail out */
-       }
-
-After a successful submission of the CCB, the completion area may be
-polled to determine when the DAX is finished. Detailed information on
-the contents of the completion area can be found in section 36.2.2 of
-the DAX HV API document.
-
-       while (1) {
-               /* Monitored Load */
-               __asm__ __volatile__("lduba [%1] 0x84, %0\n"
-                                    : "=r" (status)
-                                    : "r"  (completion_area));
-
-               if (status)          /* 0 indicates command in progress */
-                       break;
-
-               /* MWAIT */
-               __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
-       }
-
-A completion area status of 1 indicates successful completion of the
-CCB and validity of the output bitmap, which may be used immediately.
-All other non-zero values indicate error conditions which are
-described in section 36.2.2.
-
-       if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
-               /* completion_area[0] contains the completion status */
-               /* completion_area[1] contains an error code, see 36.2.2 */
-       }
-
-After the completion area has been processed, the driver must be
-notified that it can release any resources associated with the
-request. This is done via the dequeue operation:
-
-       struct dax_command cmd;
-       cmd.command = CCB_DEQUEUE;
-       if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
-               /* bail out */
-       }
-
-Finally, normal program cleanup should be done, i.e., unmapping
-completion area, closing the dax device, freeing memory etc.
-
-[Kernel example]
-
-The only difference in using the DAX in kernel code is the treatment
-of the completion area. Unlike user applications which mmap the
-completion area allocated by the driver, kernel code must allocate its
-own memory to use for the completion area, and this address and its
-type must be given in the CCB:
-
-       ccb->control |=      /* Table 36.1, CCB Header Format */
-               (3L << 32);     /* completion area address type = primary virtual */
-
-       ccb->completion = (unsigned long) completion_area;   /* Completion area address */
-
-The dax submit hypercall is made directly. The flags used in the
-ccb_submit call are documented in the DAX HV API in section 36.3.1.
-
-#include <asm/hypervisor.h>
-
-       hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
-                                HV_CCB_QUERY_CMD |
-                                HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
-                                HV_CCB_VA_PRIVILEGED,
-                                0, &bytes_accepted, &status_data);
-
-       if (hv_rv != HV_EOK) {
-               /* hv_rv is an error code, status_data contains */
-               /* potential additional status, see 36.3.1.1 */
-       }
-
-After the submission, the completion area polling code is identical to
-that in user land:
-
-       while (1) {
-               /* Monitored Load */
-               __asm__ __volatile__("lduba [%1] 0x84, %0\n"
-                                    : "=r" (status)
-                                    : "r"  (completion_area));
-
-               if (status)          /* 0 indicates command in progress */
-                       break;
-
-               /* MWAIT */
-               __asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
-       }
-
-       if (completion_area[0] != 1) {  /* section 36.2.2, 1 = command ran and succeeded */
-               /* completion_area[0] contains the completion status */
-               /* completion_area[1] contains an error code, see 36.2.2 */
-       }
-
-The output bitmap is ready for consumption immediately after the
-completion status indicates success.
diff --git a/drivers/sbus/char/oradax.c b/drivers/sbus/char/oradax.c

index 6516bc3cb58b0704b37a57d1cff3d3297feccb71..acd9ba40eabe04abe5a99d3dc5839721f7d7140c 100644 (file)
--- a/drivers/sbus/char/oradax.c
+++ b/drivers/sbus/char/oradax.c
@@ -30,7 +30,7 @@
   * the recommended way for applications to use the coprocessor, and
   * the driver interface is not intended for general use.
   *
- * See Documentation/sparc/oradax/oracle-dax.txt for more details.
+ * See Documentation/sparc/oradax/oracle-dax.rst for more details.
   */
  
  #include <linux/uaccess.h>
author	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
	Mon, 22 Apr 2019 13:28:02 +0000 (10:28 -0300)
committer	David S. Miller <davem@davemloft.net>
	Thu, 9 May 2019 00:13:35 +0000 (17:13 -0700)
Documentation/sparc/adi.rst	[new file with mode: 0644]	patch \| blob
Documentation/sparc/adi.txt	[deleted file]	patch \| blob \| blame \| history
Documentation/sparc/console.rst	[new file with mode: 0644]	patch \| blob
Documentation/sparc/console.txt	[deleted file]	patch \| blob \| blame \| history
Documentation/sparc/index.rst	[new file with mode: 0644]	patch \| blob
Documentation/sparc/oradax/oracle-dax.rst	[new file with mode: 0644]	patch \| blob
Documentation/sparc/oradax/oracle-dax.txt	[deleted file]	patch \| blob \| blame \| history
drivers/sbus/char/oradax.c		patch \| blob \| blame \| history