* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: cpu_debug remove execute permission
x86: smarten /proc/interrupts output for new counters
x86: DMI match for the Dell DXP061 as it needs BIOS reboot
x86: make 64 bit to use default_inquire_remote_apic
x86, setup: un-resequence mode setting for VGA 80x34 and 80x60 modes
x86, intel-iommu: fix X2APIC && !ACPI build failure
process (bash) into it. CPU time consumed by this bash and its children
can be obtained from g1/cpuacct.usage and the same is accumulated in
/cgroups/cpuacct.usage also.
+
+cpuacct.stat file lists a few statistics which further divide the
+CPU time obtained by the cgroup into user and system times. Currently
+the following statistics are supported:
+
+user: Time spent by tasks of the cgroup in user mode.
+system: Time spent by tasks of the cgroup in kernel mode.
+
+user and system are in USER_HZ unit.
+
+cpuacct controller uses percpu_counter interface to collect user and
+system times. This has two side effects:
+
+- It is theoretically possible to see wrong values for user and system times.
+ This is because percpu_counter_read() on 32bit systems isn't safe
+ against concurrent writes.
+- It is possible to see slightly outdated values for user and system times
+ due to the batch processing nature of percpu_counter.
+++ /dev/null
- ftrace - Function Tracer
- ========================
-
-Copyright 2008 Red Hat Inc.
- Author: Steven Rostedt <srostedt@redhat.com>
- License: The GNU Free Documentation License, Version 1.2
- (dual licensed under the GPL v2)
-Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
- John Kacur, and David Teigland.
-
-Written for: 2.6.28-rc2
-
-Introduction
-------------
-
-Ftrace is an internal tracer designed to help out developers and
-designers of systems to find what is going on inside the kernel.
-It can be used for debugging or analyzing latencies and
-performance issues that take place outside of user-space.
-
-Although ftrace is the function tracer, it also includes an
-infrastructure that allows for other types of tracing. Some of
-the tracers that are currently in ftrace include a tracer to
-trace context switches, the time it takes for a high priority
-task to run after it was woken up, the time interrupts are
-disabled, and more (ftrace allows for tracer plugins, which
-means that the list of tracers can always grow).
-
-
-The File System
----------------
-
-Ftrace uses the debugfs file system to hold the control files as
-well as the files to display output.
-
-To mount the debugfs system:
-
- # mkdir /debug
- # mount -t debugfs nodev /debug
-
-( Note: it is more common to mount at /sys/kernel/debug, but for
- simplicity this document will use /debug)
-
-That's it! (assuming that you have ftrace configured into your kernel)
-
-After mounting the debugfs, you can see a directory called
-"tracing". This directory contains the control and output files
-of ftrace. Here is a list of some of the key files:
-
-
- Note: all time values are in microseconds.
-
- current_tracer:
-
- This is used to set or display the current tracer
- that is configured.
-
- available_tracers:
-
- This holds the different types of tracers that
- have been compiled into the kernel. The
- tracers listed here can be configured by
- echoing their name into current_tracer.
-
- tracing_enabled:
-
- This sets or displays whether the current_tracer
- is activated and tracing or not. Echo 0 into this
- file to disable the tracer or 1 to enable it.
-
- trace:
-
- This file holds the output of the trace in a human
- readable format (described below).
-
- latency_trace:
-
- This file shows the same trace but the information
- is organized more to display possible latencies
- in the system (described below).
-
- trace_pipe:
-
- The output is the same as the "trace" file but this
- file is meant to be streamed with live tracing.
- Reads from this file will block until new data
- is retrieved. Unlike the "trace" and "latency_trace"
- files, this file is a consumer. This means reading
- from this file causes sequential reads to display
- more current data. Once data is read from this
- file, it is consumed, and will not be read
- again with a sequential read. The "trace" and
- "latency_trace" files are static, and if the
- tracer is not adding more data, they will display
- the same information every time they are read.
-
- trace_options:
-
- This file lets the user control the amount of data
- that is displayed in one of the above output
- files.
-
- tracing_max_latency:
-
- Some of the tracers record the max latency.
- For example, the time interrupts are disabled.
- This time is saved in this file. The max trace
- will also be stored, and displayed by either
- "trace" or "latency_trace". A new max trace will
- only be recorded if the latency is greater than
- the value in this file. (in microseconds)
-
- buffer_size_kb:
-
- This sets or displays the number of kilobytes each CPU
- buffer can hold. The tracer buffers are the same size
- for each CPU. The displayed number is the size of the
- CPU buffer and not total size of all buffers. The
- trace buffers are allocated in pages (blocks of memory
- that the kernel uses for allocation, usually 4 KB in size).
- If the last page allocated has room for more bytes
- than requested, the rest of the page will be used,
- making the actual allocation bigger than requested.
- ( Note, the size may not be a multiple of the page size
- due to buffer managment overhead. )
-
- This can only be updated when the current_tracer
- is set to "nop".
-
- tracing_cpumask:
-
- This is a mask that lets the user only trace
- on specified CPUS. The format is a hex string
- representing the CPUS.
-
- set_ftrace_filter:
-
- When dynamic ftrace is configured in (see the
- section below "dynamic ftrace"), the code is dynamically
- modified (code text rewrite) to disable calling of the
- function profiler (mcount). This lets tracing be configured
- in with practically no overhead in performance. This also
- has a side effect of enabling or disabling specific functions
- to be traced. Echoing names of functions into this file
- will limit the trace to only those functions.
-
- set_ftrace_notrace:
-
- This has an effect opposite to that of
- set_ftrace_filter. Any function that is added here will not
- be traced. If a function exists in both set_ftrace_filter
- and set_ftrace_notrace, the function will _not_ be traced.
-
- set_ftrace_pid:
-
- Have the function tracer only trace a single thread.
-
- set_graph_function:
-
- Set a "trigger" function where tracing should start
- with the function graph tracer (See the section
- "dynamic ftrace" for more details).
-
- available_filter_functions:
-
- This lists the functions that ftrace
- has processed and can trace. These are the function
- names that you can pass to "set_ftrace_filter" or
- "set_ftrace_notrace". (See the section "dynamic ftrace"
- below for more details.)
-
-
-The Tracers
------------
-
-Here is the list of current tracers that may be configured.
-
- "function"
-
- Function call tracer to trace all kernel functions.
-
- "function_graph_tracer"
-
- Similar to the function tracer except that the
- function tracer probes the functions on their entry
- whereas the function graph tracer traces on both entry
- and exit of the functions. It then provides the ability
- to draw a graph of function calls similar to C code
- source.
-
- "sched_switch"
-
- Traces the context switches and wakeups between tasks.
-
- "irqsoff"
-
- Traces the areas that disable interrupts and saves
- the trace with the longest max latency.
- See tracing_max_latency. When a new max is recorded,
- it replaces the old trace. It is best to view this
- trace via the latency_trace file.
-
- "preemptoff"
-
- Similar to irqsoff but traces and records the amount of
- time for which preemption is disabled.
-
- "preemptirqsoff"
-
- Similar to irqsoff and preemptoff, but traces and
- records the largest time for which irqs and/or preemption
- is disabled.
-
- "wakeup"
-
- Traces and records the max latency that it takes for
- the highest priority task to get scheduled after
- it has been woken up.
-
- "hw-branch-tracer"
-
- Uses the BTS CPU feature on x86 CPUs to traces all
- branches executed.
-
- "nop"
-
- This is the "trace nothing" tracer. To remove all
- tracers from tracing simply echo "nop" into
- current_tracer.
-
-
-Examples of using the tracer
-----------------------------
-
-Here are typical examples of using the tracers when controlling
-them only with the debugfs interface (without using any
-user-land utilities).
-
-Output format:
---------------
-
-Here is an example of the output format of the file "trace"
-
- --------
-# tracer: function
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4251 [01] 10152.583854: path_put <-path_walk
- bash-4251 [01] 10152.583855: dput <-path_put
- bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput
- --------
-
-A header is printed with the tracer name that is represented by
-the trace. In this case the tracer is "function". Then a header
-showing the format. Task name "bash", the task PID "4251", the
-CPU that it was running on "01", the timestamp in <secs>.<usecs>
-format, the function name that was traced "path_put" and the
-parent function that called this function "path_walk". The
-timestamp is the time at which the function was entered.
-
-The sched_switch tracer also includes tracing of task wakeups
-and context switches.
-
- ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
- ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
- ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
- events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
- kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
- ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
-
-Wake ups are represented by a "+" and the context switches are
-shown as "==>". The format is:
-
- Context switches:
-
- Previous task Next Task
-
- <pid>:<prio>:<state> ==> <pid>:<prio>:<state>
-
- Wake ups:
-
- Current task Task waking up
-
- <pid>:<prio>:<state> + <pid>:<prio>:<state>
-
-The prio is the internal kernel priority, which is the inverse
-of the priority that is usually displayed by user-space tools.
-Zero represents the highest priority (99). Prio 100 starts the
-"nice" priorities with 100 being equal to nice -20 and 139 being
-nice 19. The prio "140" is reserved for the idle task which is
-the lowest priority thread (pid 0).
-
-
-Latency trace format
---------------------
-
-For traces that display latency times, the latency_trace file
-gives somewhat more information to see why a latency happened.
-Here is a typical trace.
-
-# tracer: irqsoff
-#
-irqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: apic_timer_interrupt
- => ended at: do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- <idle>-0 0d..1 0us+: trace_hardirqs_off_thunk (apic_timer_interrupt)
- <idle>-0 0d.s. 97us : __do_softirq (do_softirq)
- <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq)
-
-
-This shows that the current tracer is "irqsoff" tracing the time
-for which interrupts were disabled. It gives the trace version
-and the version of the kernel upon which this was executed on
-(2.6.26-rc8). Then it displays the max latency in microsecs (97
-us). The number of trace entries displayed and the total number
-recorded (both are three: #3/3). The type of preemption that was
-used (PREEMPT). VP, KP, SP, and HP are always zero and are
-reserved for later use. #P is the number of online CPUS (#P:2).
-
-The task is the process that was running when the latency
-occurred. (swapper pid: 0).
-
-The start and stop (the functions in which the interrupts were
-disabled and enabled respectively) that caused the latencies:
-
- apic_timer_interrupt is where the interrupts were disabled.
- do_softirq is where they were enabled again.
-
-The next lines after the header are the trace itself. The header
-explains which is which.
-
- cmd: The name of the process in the trace.
-
- pid: The PID of that process.
-
- CPU#: The CPU which the process was running on.
-
- irqs-off: 'd' interrupts are disabled. '.' otherwise.
- Note: If the architecture does not support a way to
- read the irq flags variable, an 'X' will always
- be printed here.
-
- need-resched: 'N' task need_resched is set, '.' otherwise.
-
- hardirq/softirq:
- 'H' - hard irq occurred inside a softirq.
- 'h' - hard irq is running
- 's' - soft irq is running
- '.' - normal context.
-
- preempt-depth: The level of preempt_disabled
-
-The above is mostly meaningful for kernel developers.
-
- time: This differs from the trace file output. The trace file output
- includes an absolute timestamp. The timestamp used by the
- latency_trace file is relative to the start of the trace.
-
- delay: This is just to help catch your eye a bit better. And
- needs to be fixed to be only relative to the same CPU.
- The marks are determined by the difference between this
- current trace and the next trace.
- '!' - greater than preempt_mark_thresh (default 100)
- '+' - greater than 1 microsecond
- ' ' - less than or equal to 1 microsecond.
-
- The rest is the same as the 'trace' file.
-
-
-trace_options
--------------
-
-The trace_options file is used to control what gets printed in
-the trace output. To see what is available, simply cat the file:
-
- cat /debug/tracing/trace_options
- print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
- noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
-
-To disable one of the options, echo in the option prepended with
-"no".
-
- echo noprint-parent > /debug/tracing/trace_options
-
-To enable an option, leave off the "no".
-
- echo sym-offset > /debug/tracing/trace_options
-
-Here are the available options:
-
- print-parent - On function traces, display the calling (parent)
- function as well as the function being traced.
-
- print-parent:
- bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul
-
- noprint-parent:
- bash-4000 [01] 1477.606694: simple_strtoul
-
-
- sym-offset - Display not only the function name, but also the
- offset in the function. For example, instead of
- seeing just "ktime_get", you will see
- "ktime_get+0xb/0x20".
-
- sym-offset:
- bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
-
- sym-addr - this will also display the function address as well
- as the function name.
-
- sym-addr:
- bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
-
- verbose - This deals with the latency_trace file.
-
- bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
- (+0.000ms): simple_strtoul (strict_strtoul)
-
- raw - This will display raw numbers. This option is best for
- use with user applications that can translate the raw
- numbers better than having it done in the kernel.
-
- hex - Similar to raw, but the numbers will be in a hexadecimal
- format.
-
- bin - This will print out the formats in raw binary.
-
- block - TBD (needs update)
-
- stacktrace - This is one of the options that changes the trace
- itself. When a trace is recorded, so is the stack
- of functions. This allows for back traces of
- trace sites.
-
- userstacktrace - This option changes the trace. It records a
- stacktrace of the current userspace thread.
-
- sym-userobj - when user stacktrace are enabled, look up which
- object the address belongs to, and print a
- relative address. This is especially useful when
- ASLR is on, otherwise you don't get a chance to
- resolve the address to object/file/line after
- the app is no longer running
-
- The lookup is performed when you read
- trace,trace_pipe,latency_trace. Example:
-
- a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
-x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
-
- sched-tree - trace all tasks that are on the runqueue, at
- every scheduling event. Will add overhead if
- there's a lot of tasks running at once.
-
-
-sched_switch
-------------
-
-This tracer simply records schedule switches. Here is an example
-of how to use it.
-
- # echo sched_switch > /debug/tracing/current_tracer
- # echo 1 > /debug/tracing/tracing_enabled
- # sleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
-
-# tracer: sched_switch
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
- bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
- sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
- bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
- bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
- sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
- bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
- bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
- <idle>-0 [00] 240.132589: 0:140:R + 4:115:S
- <idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
- ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
- <idle>-0 [00] 240.132598: 0:140:R + 4:115:S
- <idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
- ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
- sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
- [...]
-
-
-As we have discussed previously about this format, the header
-shows the name of the trace and points to the options. The
-"FUNCTION" is a misnomer since here it represents the wake ups
-and context switches.
-
-The sched_switch file only lists the wake ups (represented with
-'+') and context switches ('==>') with the previous task or
-current task first followed by the next task or task waking up.
-The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
-Remember that the KERNEL-PRIO is the inverse of the actual
-priority with zero (0) being the highest priority and the nice
-values starting at 100 (nice -20). Below is a quick chart to map
-the kernel priority to user land priorities.
-
- Kernel priority: 0 to 99 ==> user RT priority 99 to 0
- Kernel priority: 100 to 139 ==> user nice -20 to 19
- Kernel priority: 140 ==> idle task priority
-
-The task states are:
-
- R - running : wants to run, may not actually be running
- S - sleep : process is waiting to be woken up (handles signals)
- D - disk sleep (uninterruptible sleep) : process must be woken up
- (ignores signals)
- T - stopped : process suspended
- t - traced : process is being traced (with something like gdb)
- Z - zombie : process waiting to be cleaned up
- X - unknown
-
-
-ftrace_enabled
---------------
-
-The following tracers (listed below) give different output
-depending on whether or not the sysctl ftrace_enabled is set. To
-set ftrace_enabled, one can either use the sysctl function or
-set it via the proc file system interface.
-
- sysctl kernel.ftrace_enabled=1
-
- or
-
- echo 1 > /proc/sys/kernel/ftrace_enabled
-
-To disable ftrace_enabled simply replace the '1' with '0' in the
-above commands.
-
-When ftrace_enabled is set the tracers will also record the
-functions that are within the trace. The descriptions of the
-tracers will also show an example with ftrace enabled.
-
-
-irqsoff
--------
-
-When interrupts are disabled, the CPU can not react to any other
-external event (besides NMIs and SMIs). This prevents the timer
-interrupt from triggering or the mouse interrupt from letting
-the kernel know of a new mouse event. The result is a latency
-with the reaction time.
-
-The irqsoff tracer tracks the time for which interrupts are
-disabled. When a new maximum latency is hit, the tracer saves
-the trace leading up to that latency point so that every time a
-new maximum is reached, the old saved trace is discarded and the
-new trace is saved.
-
-To reset the maximum, echo 0 into tracing_max_latency. Here is
-an example:
-
- # echo irqsoff > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
- # ls -ltr
- [...]
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
-# tracer: irqsoff
-#
-irqsoff latency trace v1.1.5 on 2.6.26
---------------------------------------------------------------------
- latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: sys_setpgid
- => ended at: sys_setpgid
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- bash-3730 1d... 0us : _write_lock_irq (sys_setpgid)
- bash-3730 1d..1 1us+: _write_unlock_irq (sys_setpgid)
- bash-3730 1d..2 14us : trace_hardirqs_on (sys_setpgid)
-
-
-Here we see that that we had a latency of 12 microsecs (which is
-very good). The _write_lock_irq in sys_setpgid disabled
-interrupts. The difference between the 12 and the displayed
-timestamp 14us occurred because the clock was incremented
-between the time of recording the max latency and the time of
-recording the function that had that latency.
-
-Note the above example had ftrace_enabled not set. If we set the
-ftrace_enabled, we get a much larger output:
-
-# tracer: irqsoff
-#
-irqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 50 us, #101/101, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: ls-4339 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: __alloc_pages_internal
- => ended at: __alloc_pages_internal
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- ls-4339 0...1 0us+: get_page_from_freelist (__alloc_pages_internal)
- ls-4339 0d..1 3us : rmqueue_bulk (get_page_from_freelist)
- ls-4339 0d..1 3us : _spin_lock (rmqueue_bulk)
- ls-4339 0d..1 4us : add_preempt_count (_spin_lock)
- ls-4339 0d..2 4us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 5us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 5us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 6us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 6us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 7us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 7us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 8us : __rmqueue_smallest (__rmqueue)
-[...]
- ls-4339 0d..2 46us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 47us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 47us : __rmqueue (rmqueue_bulk)
- ls-4339 0d..2 48us : __rmqueue_smallest (__rmqueue)
- ls-4339 0d..2 48us : __mod_zone_page_state (__rmqueue_smallest)
- ls-4339 0d..2 49us : _spin_unlock (rmqueue_bulk)
- ls-4339 0d..2 49us : sub_preempt_count (_spin_unlock)
- ls-4339 0d..1 50us : get_page_from_freelist (__alloc_pages_internal)
- ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal)
-
-
-
-Here we traced a 50 microsecond latency. But we also see all the
-functions that were called during that time. Note that by
-enabling function tracing, we incur an added overhead. This
-overhead may extend the latency times. But nevertheless, this
-trace has provided some very helpful debugging information.
-
-
-preemptoff
-----------
-
-When preemption is disabled, we may be able to receive
-interrupts but the task cannot be preempted and a higher
-priority task must wait for preemption to be enabled again
-before it can preempt a lower priority task.
-
-The preemptoff tracer traces the places that disable preemption.
-Like the irqsoff tracer, it records the maximum latency for
-which preemption was disabled. The control of preemptoff tracer
-is much like the irqsoff tracer.
-
- # echo preemptoff > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
- # ls -ltr
- [...]
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
-# tracer: preemptoff
-#
-preemptoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 29 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: do_IRQ
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- sshd-4261 0d.h. 0us+: irq_enter (do_IRQ)
- sshd-4261 0d.s. 29us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq)
-
-
-This has some more changes. Preemption was disabled when an
-interrupt came in (notice the 'h'), and was enabled while doing
-a softirq. (notice the 's'). But we also see that interrupts
-have been disabled when entering the preempt off section and
-leaving it (the 'd'). We do not know if interrupts were enabled
-in the mean time.
-
-# tracer: preemptoff
-#
-preemptoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 63 us, #87/87, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: remove_wait_queue
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- sshd-4261 0d..1 0us : _spin_lock_irqsave (remove_wait_queue)
- sshd-4261 0d..1 1us : _spin_unlock_irqrestore (remove_wait_queue)
- sshd-4261 0d..1 2us : do_IRQ (common_interrupt)
- sshd-4261 0d..1 2us : irq_enter (do_IRQ)
- sshd-4261 0d..1 2us : idle_cpu (irq_enter)
- sshd-4261 0d..1 3us : add_preempt_count (irq_enter)
- sshd-4261 0d.h1 3us : idle_cpu (irq_enter)
- sshd-4261 0d.h. 4us : handle_fasteoi_irq (do_IRQ)
-[...]
- sshd-4261 0d.h. 12us : add_preempt_count (_spin_lock)
- sshd-4261 0d.h1 12us : ack_ioapic_quirk_irq (handle_fasteoi_irq)
- sshd-4261 0d.h1 13us : move_native_irq (ack_ioapic_quirk_irq)
- sshd-4261 0d.h1 13us : _spin_unlock (handle_fasteoi_irq)
- sshd-4261 0d.h1 14us : sub_preempt_count (_spin_unlock)
- sshd-4261 0d.h1 14us : irq_exit (do_IRQ)
- sshd-4261 0d.h1 15us : sub_preempt_count (irq_exit)
- sshd-4261 0d..2 15us : do_softirq (irq_exit)
- sshd-4261 0d... 15us : __do_softirq (do_softirq)
- sshd-4261 0d... 16us : __local_bh_disable (__do_softirq)
- sshd-4261 0d... 16us+: add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s4 20us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s4 21us : sub_preempt_count (local_bh_enable)
- sshd-4261 0d.s5 21us : sub_preempt_count (local_bh_enable)
-[...]
- sshd-4261 0d.s6 41us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s6 42us : sub_preempt_count (local_bh_enable)
- sshd-4261 0d.s7 42us : sub_preempt_count (local_bh_enable)
- sshd-4261 0d.s5 43us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s5 43us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s6 44us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s5 44us : add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s5 45us : sub_preempt_count (local_bh_enable)
-[...]
- sshd-4261 0d.s. 63us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq)
-
-
-The above is an example of the preemptoff trace with
-ftrace_enabled set. Here we see that interrupts were disabled
-the entire time. The irq_enter code lets us know that we entered
-an interrupt 'h'. Before that, the functions being traced still
-show that it is not in an interrupt, but we can see from the
-functions themselves that this is not the case.
-
-Notice that __do_softirq when called does not have a
-preempt_count. It may seem that we missed a preempt enabling.
-What really happened is that the preempt count is held on the
-thread's stack and we switched to the softirq stack (4K stacks
-in effect). The code does not copy the preempt count, but
-because interrupts are disabled, we do not need to worry about
-it. Having a tracer like this is good for letting people know
-what really happens inside the kernel.
-
-
-preemptirqsoff
---------------
-
-Knowing the locations that have interrupts disabled or
-preemption disabled for the longest times is helpful. But
-sometimes we would like to know when either preemption and/or
-interrupts are disabled.
-
-Consider the following code:
-
- local_irq_disable();
- call_function_with_irqs_off();
- preempt_disable();
- call_function_with_irqs_and_preemption_off();
- local_irq_enable();
- call_function_with_preemption_off();
- preempt_enable();
-
-The irqsoff tracer will record the total length of
-call_function_with_irqs_off() and
-call_function_with_irqs_and_preemption_off().
-
-The preemptoff tracer will record the total length of
-call_function_with_irqs_and_preemption_off() and
-call_function_with_preemption_off().
-
-But neither will trace the time that interrupts and/or
-preemption is disabled. This total time is the time that we can
-not schedule. To record this time, use the preemptirqsoff
-tracer.
-
-Again, using this trace is much like the irqsoff and preemptoff
-tracers.
-
- # echo preemptirqsoff > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
- # ls -ltr
- [...]
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
-# tracer: preemptirqsoff
-#
-preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 293 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: ls-4860 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: apic_timer_interrupt
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- ls-4860 0d... 0us!: trace_hardirqs_off_thunk (apic_timer_interrupt)
- ls-4860 0d.s. 294us : _local_bh_enable (__do_softirq)
- ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq)
-
-
-
-The trace_hardirqs_off_thunk is called from assembly on x86 when
-interrupts are disabled in the assembly code. Without the
-function tracing, we do not know if interrupts were enabled
-within the preemption points. We do see that it started with
-preemption enabled.
-
-Here is a trace with ftrace_enabled set:
-
-
-# tracer: preemptirqsoff
-#
-preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 105 us, #183/183, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
- -----------------
- => started at: write_chan
- => ended at: __do_softirq
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- ls-4473 0.N.. 0us : preempt_schedule (write_chan)
- ls-4473 0dN.1 1us : _spin_lock (schedule)
- ls-4473 0dN.1 2us : add_preempt_count (_spin_lock)
- ls-4473 0d..2 2us : put_prev_task_fair (schedule)
-[...]
- ls-4473 0d..2 13us : set_normalized_timespec (ktime_get_ts)
- ls-4473 0d..2 13us : __switch_to (schedule)
- sshd-4261 0d..2 14us : finish_task_switch (schedule)
- sshd-4261 0d..2 14us : _spin_unlock_irq (finish_task_switch)
- sshd-4261 0d..1 15us : add_preempt_count (_spin_lock_irqsave)
- sshd-4261 0d..2 16us : _spin_unlock_irqrestore (hrtick_set)
- sshd-4261 0d..2 16us : do_IRQ (common_interrupt)
- sshd-4261 0d..2 17us : irq_enter (do_IRQ)
- sshd-4261 0d..2 17us : idle_cpu (irq_enter)
- sshd-4261 0d..2 18us : add_preempt_count (irq_enter)
- sshd-4261 0d.h2 18us : idle_cpu (irq_enter)
- sshd-4261 0d.h. 18us : handle_fasteoi_irq (do_IRQ)
- sshd-4261 0d.h. 19us : _spin_lock (handle_fasteoi_irq)
- sshd-4261 0d.h. 19us : add_preempt_count (_spin_lock)
- sshd-4261 0d.h1 20us : _spin_unlock (handle_fasteoi_irq)
- sshd-4261 0d.h1 20us : sub_preempt_count (_spin_unlock)
-[...]
- sshd-4261 0d.h1 28us : _spin_unlock (handle_fasteoi_irq)
- sshd-4261 0d.h1 29us : sub_preempt_count (_spin_unlock)
- sshd-4261 0d.h2 29us : irq_exit (do_IRQ)
- sshd-4261 0d.h2 29us : sub_preempt_count (irq_exit)
- sshd-4261 0d..3 30us : do_softirq (irq_exit)
- sshd-4261 0d... 30us : __do_softirq (do_softirq)
- sshd-4261 0d... 31us : __local_bh_disable (__do_softirq)
- sshd-4261 0d... 31us+: add_preempt_count (__local_bh_disable)
- sshd-4261 0d.s4 34us : add_preempt_count (__local_bh_disable)
-[...]
- sshd-4261 0d.s3 43us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s4 44us : sub_preempt_count (local_bh_enable_ip)
- sshd-4261 0d.s3 44us : smp_apic_timer_interrupt (apic_timer_interrupt)
- sshd-4261 0d.s3 45us : irq_enter (smp_apic_timer_interrupt)
- sshd-4261 0d.s3 45us : idle_cpu (irq_enter)
- sshd-4261 0d.s3 46us : add_preempt_count (irq_enter)
- sshd-4261 0d.H3 46us : idle_cpu (irq_enter)
- sshd-4261 0d.H3 47us : hrtimer_interrupt (smp_apic_timer_interrupt)
- sshd-4261 0d.H3 47us : ktime_get (hrtimer_interrupt)
-[...]
- sshd-4261 0d.H3 81us : tick_program_event (hrtimer_interrupt)
- sshd-4261 0d.H3 82us : ktime_get (tick_program_event)
- sshd-4261 0d.H3 82us : ktime_get_ts (ktime_get)
- sshd-4261 0d.H3 83us : getnstimeofday (ktime_get_ts)
- sshd-4261 0d.H3 83us : set_normalized_timespec (ktime_get_ts)
- sshd-4261 0d.H3 84us : clockevents_program_event (tick_program_event)
- sshd-4261 0d.H3 84us : lapic_next_event (clockevents_program_event)
- sshd-4261 0d.H3 85us : irq_exit (smp_apic_timer_interrupt)
- sshd-4261 0d.H3 85us : sub_preempt_count (irq_exit)
- sshd-4261 0d.s4 86us : sub_preempt_count (irq_exit)
- sshd-4261 0d.s3 86us : add_preempt_count (__local_bh_disable)
-[...]
- sshd-4261 0d.s1 98us : sub_preempt_count (net_rx_action)
- sshd-4261 0d.s. 99us : add_preempt_count (_spin_lock_irq)
- sshd-4261 0d.s1 99us+: _spin_unlock_irq (run_timer_softirq)
- sshd-4261 0d.s. 104us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s. 104us : sub_preempt_count (_local_bh_enable)
- sshd-4261 0d.s. 105us : _local_bh_enable (__do_softirq)
- sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq)
-
-
-This is a very interesting trace. It started with the preemption
-of the ls task. We see that the task had the "need_resched" bit
-set via the 'N' in the trace. Interrupts were disabled before
-the spin_lock at the beginning of the trace. We see that a
-schedule took place to run sshd. When the interrupts were
-enabled, we took an interrupt. On return from the interrupt
-handler, the softirq ran. We took another interrupt while
-running the softirq as we see from the capital 'H'.
-
-
-wakeup
-------
-
-In a Real-Time environment it is very important to know the
-wakeup time it takes for the highest priority task that is woken
-up to the time that it executes. This is also known as "schedule
-latency". I stress the point that this is about RT tasks. It is
-also important to know the scheduling latency of non-RT tasks,
-but the average schedule latency is better for non-RT tasks.
-Tools like LatencyTop are more appropriate for such
-measurements.
-
-Real-Time environments are interested in the worst case latency.
-That is the longest latency it takes for something to happen,
-and not the average. We can have a very fast scheduler that may
-only have a large latency once in a while, but that would not
-work well with Real-Time tasks. The wakeup tracer was designed
-to record the worst case wakeups of RT tasks. Non-RT tasks are
-not recorded because the tracer only records one worst case and
-tracing non-RT tasks that are unpredictable will overwrite the
-worst case latency of RT tasks.
-
-Since this tracer only deals with RT tasks, we will run this
-slightly differently than we did with the previous tracers.
-Instead of performing an 'ls', we will run 'sleep 1' under
-'chrt' which changes the priority of the task.
-
- # echo wakeup > /debug/tracing/current_tracer
- # echo 0 > /debug/tracing/tracing_max_latency
- # echo 1 > /debug/tracing/tracing_enabled
- # chrt -f 5 sleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/latency_trace
-# tracer: wakeup
-#
-wakeup latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 4 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sleep-4901 (uid:0 nice:0 policy:1 rt_prio:5)
- -----------------
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
- <idle>-0 1d.h4 0us+: try_to_wake_up (wake_up_process)
- <idle>-0 1d..4 4us : schedule (cpu_idle)
-
-
-Running this on an idle system, we see that it only took 4
-microseconds to perform the task switch. Note, since the trace
-marker in the schedule is before the actual "switch", we stop
-the tracing when the recorded task is about to schedule in. This
-may change if we add a new marker at the end of the scheduler.
-
-Notice that the recorded task is 'sleep' with the PID of 4901
-and it has an rt_prio of 5. This priority is user-space priority
-and not the internal kernel priority. The policy is 1 for
-SCHED_FIFO and 2 for SCHED_RR.
-
-Doing the same with chrt -r 5 and ftrace_enabled set.
-
-# tracer: wakeup
-#
-wakeup latency trace v1.1.5 on 2.6.26-rc8
---------------------------------------------------------------------
- latency: 50 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
- -----------------
- | task: sleep-4068 (uid:0 nice:0 policy:2 rt_prio:5)
- -----------------
-
-# _------=> CPU#
-# / _-----=> irqs-off
-# | / _----=> need-resched
-# || / _---=> hardirq/softirq
-# ||| / _--=> preempt-depth
-# |||| /
-# ||||| delay
-# cmd pid ||||| time | caller
-# \ / ||||| \ | /
-ksoftirq-7 1d.H3 0us : try_to_wake_up (wake_up_process)
-ksoftirq-7 1d.H4 1us : sub_preempt_count (marker_probe_cb)
-ksoftirq-7 1d.H3 2us : check_preempt_wakeup (try_to_wake_up)
-ksoftirq-7 1d.H3 3us : update_curr (check_preempt_wakeup)
-ksoftirq-7 1d.H3 4us : calc_delta_mine (update_curr)
-ksoftirq-7 1d.H3 5us : __resched_task (check_preempt_wakeup)
-ksoftirq-7 1d.H3 6us : task_wake_up_rt (try_to_wake_up)
-ksoftirq-7 1d.H3 7us : _spin_unlock_irqrestore (try_to_wake_up)
-[...]
-ksoftirq-7 1d.H2 17us : irq_exit (smp_apic_timer_interrupt)
-ksoftirq-7 1d.H2 18us : sub_preempt_count (irq_exit)
-ksoftirq-7 1d.s3 19us : sub_preempt_count (irq_exit)
-ksoftirq-7 1..s2 20us : rcu_process_callbacks (__do_softirq)
-[...]
-ksoftirq-7 1..s2 26us : __rcu_process_callbacks (rcu_process_callbacks)
-ksoftirq-7 1d.s2 27us : _local_bh_enable (__do_softirq)
-ksoftirq-7 1d.s2 28us : sub_preempt_count (_local_bh_enable)
-ksoftirq-7 1.N.3 29us : sub_preempt_count (ksoftirqd)
-ksoftirq-7 1.N.2 30us : _cond_resched (ksoftirqd)
-ksoftirq-7 1.N.2 31us : __cond_resched (_cond_resched)
-ksoftirq-7 1.N.2 32us : add_preempt_count (__cond_resched)
-ksoftirq-7 1.N.2 33us : schedule (__cond_resched)
-ksoftirq-7 1.N.2 33us : add_preempt_count (schedule)
-ksoftirq-7 1.N.3 34us : hrtick_clear (schedule)
-ksoftirq-7 1dN.3 35us : _spin_lock (schedule)
-ksoftirq-7 1dN.3 36us : add_preempt_count (_spin_lock)
-ksoftirq-7 1d..4 37us : put_prev_task_fair (schedule)
-ksoftirq-7 1d..4 38us : update_curr (put_prev_task_fair)
-[...]
-ksoftirq-7 1d..5 47us : _spin_trylock (tracing_record_cmdline)
-ksoftirq-7 1d..5 48us : add_preempt_count (_spin_trylock)
-ksoftirq-7 1d..6 49us : _spin_unlock (tracing_record_cmdline)
-ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock)
-ksoftirq-7 1d..4 50us : schedule (__cond_resched)
-
-The interrupt went off while running ksoftirqd. This task runs
-at SCHED_OTHER. Why did not we see the 'N' set early? This may
-be a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K
-stacks configured, the interrupt and softirq run with their own
-stack. Some information is held on the top of the task's stack
-(need_resched and preempt_count are both stored there). The
-setting of the NEED_RESCHED bit is done directly to the task's
-stack, but the reading of the NEED_RESCHED is done by looking at
-the current stack, which in this case is the stack for the hard
-interrupt. This hides the fact that NEED_RESCHED has been set.
-We do not see the 'N' until we switch back to the task's
-assigned stack.
-
-function
---------
-
-This tracer is the function tracer. Enabling the function tracer
-can be done from the debug file system. Make sure the
-ftrace_enabled is set; otherwise this tracer is a nop.
-
- # sysctl kernel.ftrace_enabled=1
- # echo function > /debug/tracing/current_tracer
- # echo 1 > /debug/tracing/tracing_enabled
- # usleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
-# tracer: function
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4003 [00] 123.638713: finish_task_switch <-schedule
- bash-4003 [00] 123.638714: _spin_unlock_irq <-finish_task_switch
- bash-4003 [00] 123.638714: sub_preempt_count <-_spin_unlock_irq
- bash-4003 [00] 123.638715: hrtick_set <-schedule
- bash-4003 [00] 123.638715: _spin_lock_irqsave <-hrtick_set
- bash-4003 [00] 123.638716: add_preempt_count <-_spin_lock_irqsave
- bash-4003 [00] 123.638716: _spin_unlock_irqrestore <-hrtick_set
- bash-4003 [00] 123.638717: sub_preempt_count <-_spin_unlock_irqrestore
- bash-4003 [00] 123.638717: hrtick_clear <-hrtick_set
- bash-4003 [00] 123.638718: sub_preempt_count <-schedule
- bash-4003 [00] 123.638718: sub_preempt_count <-preempt_schedule
- bash-4003 [00] 123.638719: wait_for_completion <-__stop_machine_run
- bash-4003 [00] 123.638719: wait_for_common <-wait_for_completion
- bash-4003 [00] 123.638720: _spin_lock_irq <-wait_for_common
- bash-4003 [00] 123.638720: add_preempt_count <-_spin_lock_irq
-[...]
-
-
-Note: function tracer uses ring buffers to store the above
-entries. The newest data may overwrite the oldest data.
-Sometimes using echo to stop the trace is not sufficient because
-the tracing could have overwritten the data that you wanted to
-record. For this reason, it is sometimes better to disable
-tracing directly from a program. This allows you to stop the
-tracing at the point that you hit the part that you are
-interested in. To disable the tracing directly from a C program,
-something like following code snippet can be used:
-
-int trace_fd;
-[...]
-int main(int argc, char *argv[]) {
- [...]
- trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
- [...]
- if (condition_hit()) {
- write(trace_fd, "0", 1);
- }
- [...]
-}
-
-Note: Here we hard coded the path name. The debugfs mount is not
-guaranteed to be at /debug (and is more commonly at
-/sys/kernel/debug). For simple one time traces, the above is
-sufficent. For anything else, a search through /proc/mounts may
-be needed to find where the debugfs file-system is mounted.
-
-
-Single thread tracing
----------------------
-
-By writing into /debug/tracing/set_ftrace_pid you can trace a
-single thread. For example:
-
-# cat /debug/tracing/set_ftrace_pid
-no pid
-# echo 3111 > /debug/tracing/set_ftrace_pid
-# cat /debug/tracing/set_ftrace_pid
-3111
-# echo function > /debug/tracing/current_tracer
-# cat /debug/tracing/trace | head
- # tracer: function
- #
- # TASK-PID CPU# TIMESTAMP FUNCTION
- # | | | | |
- yum-updatesd-3111 [003] 1637.254676: finish_task_switch <-thread_return
- yum-updatesd-3111 [003] 1637.254681: hrtimer_cancel <-schedule_hrtimeout_range
- yum-updatesd-3111 [003] 1637.254682: hrtimer_try_to_cancel <-hrtimer_cancel
- yum-updatesd-3111 [003] 1637.254683: lock_hrtimer_base <-hrtimer_try_to_cancel
- yum-updatesd-3111 [003] 1637.254685: fget_light <-do_sys_poll
- yum-updatesd-3111 [003] 1637.254686: pipe_poll <-do_sys_poll
-# echo -1 > /debug/tracing/set_ftrace_pid
-# cat /debug/tracing/trace |head
- # tracer: function
- #
- # TASK-PID CPU# TIMESTAMP FUNCTION
- # | | | | |
- ##### CPU 3 buffer started ####
- yum-updatesd-3111 [003] 1701.957688: free_poll_entry <-poll_freewait
- yum-updatesd-3111 [003] 1701.957689: remove_wait_queue <-free_poll_entry
- yum-updatesd-3111 [003] 1701.957691: fput <-free_poll_entry
- yum-updatesd-3111 [003] 1701.957692: audit_syscall_exit <-sysret_audit
- yum-updatesd-3111 [003] 1701.957693: path_put <-audit_syscall_exit
-
-If you want to trace a function when executing, you could use
-something like this simple program:
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <unistd.h>
-
-int main (int argc, char **argv)
-{
- if (argc < 1)
- exit(-1);
-
- if (fork() > 0) {
- int fd, ffd;
- char line[64];
- int s;
-
- ffd = open("/debug/tracing/current_tracer", O_WRONLY);
- if (ffd < 0)
- exit(-1);
- write(ffd, "nop", 3);
-
- fd = open("/debug/tracing/set_ftrace_pid", O_WRONLY);
- s = sprintf(line, "%d\n", getpid());
- write(fd, line, s);
-
- write(ffd, "function", 8);
-
- close(fd);
- close(ffd);
-
- execvp(argv[1], argv+1);
- }
-
- return 0;
-}
-
-
-hw-branch-tracer (x86 only)
----------------------------
-
-This tracer uses the x86 last branch tracing hardware feature to
-collect a branch trace on all cpus with relatively low overhead.
-
-The tracer uses a fixed-size circular buffer per cpu and only
-traces ring 0 branches. The trace file dumps that buffer in the
-following format:
-
-# tracer: hw-branch-tracer
-#
-# CPU# TO <- FROM
- 0 scheduler_tick+0xb5/0x1bf <- task_tick_idle+0x5/0x6
- 2 run_posix_cpu_timers+0x2b/0x72a <- run_posix_cpu_timers+0x25/0x72a
- 0 scheduler_tick+0x139/0x1bf <- scheduler_tick+0xed/0x1bf
- 0 scheduler_tick+0x17c/0x1bf <- scheduler_tick+0x148/0x1bf
- 2 run_posix_cpu_timers+0x9e/0x72a <- run_posix_cpu_timers+0x5e/0x72a
- 0 scheduler_tick+0x1b6/0x1bf <- scheduler_tick+0x1aa/0x1bf
-
-
-The tracer may be used to dump the trace for the oops'ing cpu on
-a kernel oops into the system log. To enable this,
-ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one
-can either use the sysctl function or set it via the proc system
-interface.
-
- sysctl kernel.ftrace_dump_on_oops=1
-
-or
-
- echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
-
-
-Here's an example of such a dump after a null pointer
-dereference in a kernel module:
-
-[57848.105921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
-[57848.106019] IP: [<ffffffffa0000006>] open+0x6/0x14 [oops]
-[57848.106019] PGD 2354e9067 PUD 2375e7067 PMD 0
-[57848.106019] Oops: 0002 [#1] SMP
-[57848.106019] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:20:05.0/local_cpus
-[57848.106019] Dumping ftrace buffer:
-[57848.106019] ---------------------------------
-[...]
-[57848.106019] 0 chrdev_open+0xe6/0x165 <- cdev_put+0x23/0x24
-[57848.106019] 0 chrdev_open+0x117/0x165 <- chrdev_open+0xfa/0x165
-[57848.106019] 0 chrdev_open+0x120/0x165 <- chrdev_open+0x11c/0x165
-[57848.106019] 0 chrdev_open+0x134/0x165 <- chrdev_open+0x12b/0x165
-[57848.106019] 0 open+0x0/0x14 [oops] <- chrdev_open+0x144/0x165
-[57848.106019] 0 page_fault+0x0/0x30 <- open+0x6/0x14 [oops]
-[57848.106019] 0 error_entry+0x0/0x5b <- page_fault+0x4/0x30
-[57848.106019] 0 error_kernelspace+0x0/0x31 <- error_entry+0x59/0x5b
-[57848.106019] 0 error_sti+0x0/0x1 <- error_kernelspace+0x2d/0x31
-[57848.106019] 0 page_fault+0x9/0x30 <- error_sti+0x0/0x1
-[57848.106019] 0 do_page_fault+0x0/0x881 <- page_fault+0x1a/0x30
-[...]
-[57848.106019] 0 do_page_fault+0x66b/0x881 <- is_prefetch+0x1ee/0x1f2
-[57848.106019] 0 do_page_fault+0x6e0/0x881 <- do_page_fault+0x67a/0x881
-[57848.106019] 0 oops_begin+0x0/0x96 <- do_page_fault+0x6e0/0x881
-[57848.106019] 0 trace_hw_branch_oops+0x0/0x2d <- oops_begin+0x9/0x96
-[...]
-[57848.106019] 0 ds_suspend_bts+0x2a/0xe3 <- ds_suspend_bts+0x1a/0xe3
-[57848.106019] ---------------------------------
-[57848.106019] CPU 0
-[57848.106019] Modules linked in: oops
-[57848.106019] Pid: 5542, comm: cat Tainted: G W 2.6.28 #23
-[57848.106019] RIP: 0010:[<ffffffffa0000006>] [<ffffffffa0000006>] open+0x6/0x14 [oops]
-[57848.106019] RSP: 0018:ffff880235457d48 EFLAGS: 00010246
-[...]
-
-
-function graph tracer
----------------------------
-
-This tracer is similar to the function tracer except that it
-probes a function on its entry and its exit. This is done by
-using a dynamically allocated stack of return addresses in each
-task_struct. On function entry the tracer overwrites the return
-address of each function traced to set a custom probe. Thus the
-original return address is stored on the stack of return address
-in the task_struct.
-
-Probing on both ends of a function leads to special features
-such as:
-
-- measure of a function's time execution
-- having a reliable call stack to draw function calls graph
-
-This tracer is useful in several situations:
-
-- you want to find the reason of a strange kernel behavior and
- need to see what happens in detail on any areas (or specific
- ones).
-
-- you are experiencing weird latencies but it's difficult to
- find its origin.
-
-- you want to find quickly which path is taken by a specific
- function
-
-- you just want to peek inside a working kernel and want to see
- what happens there.
-
-# tracer: function_graph
-#
-# CPU DURATION FUNCTION CALLS
-# | | | | | | |
-
- 0) | sys_open() {
- 0) | do_sys_open() {
- 0) | getname() {
- 0) | kmem_cache_alloc() {
- 0) 1.382 us | __might_sleep();
- 0) 2.478 us | }
- 0) | strncpy_from_user() {
- 0) | might_fault() {
- 0) 1.389 us | __might_sleep();
- 0) 2.553 us | }
- 0) 3.807 us | }
- 0) 7.876 us | }
- 0) | alloc_fd() {
- 0) 0.668 us | _spin_lock();
- 0) 0.570 us | expand_files();
- 0) 0.586 us | _spin_unlock();
-
-
-There are several columns that can be dynamically
-enabled/disabled. You can use every combination of options you
-want, depending on your needs.
-
-- The cpu number on which the function executed is default
- enabled. It is sometimes better to only trace one cpu (see
- tracing_cpu_mask file) or you might sometimes see unordered
- function calls while cpu tracing switch.
-
- hide: echo nofuncgraph-cpu > /debug/tracing/trace_options
- show: echo funcgraph-cpu > /debug/tracing/trace_options
-
-- The duration (function's time of execution) is displayed on
- the closing bracket line of a function or on the same line
- than the current function in case of a leaf one. It is default
- enabled.
-
- hide: echo nofuncgraph-duration > /debug/tracing/trace_options
- show: echo funcgraph-duration > /debug/tracing/trace_options
-
-- The overhead field precedes the duration field in case of
- reached duration thresholds.
-
- hide: echo nofuncgraph-overhead > /debug/tracing/trace_options
- show: echo funcgraph-overhead > /debug/tracing/trace_options
- depends on: funcgraph-duration
-
- ie:
-
- 0) | up_write() {
- 0) 0.646 us | _spin_lock_irqsave();
- 0) 0.684 us | _spin_unlock_irqrestore();
- 0) 3.123 us | }
- 0) 0.548 us | fput();
- 0) + 58.628 us | }
-
- [...]
-
- 0) | putname() {
- 0) | kmem_cache_free() {
- 0) 0.518 us | __phys_addr();
- 0) 1.757 us | }
- 0) 2.861 us | }
- 0) ! 115.305 us | }
- 0) ! 116.402 us | }
-
- + means that the function exceeded 10 usecs.
- ! means that the function exceeded 100 usecs.
-
-
-- The task/pid field displays the thread cmdline and pid which
- executed the function. It is default disabled.
-
- hide: echo nofuncgraph-proc > /debug/tracing/trace_options
- show: echo funcgraph-proc > /debug/tracing/trace_options
-
- ie:
-
- # tracer: function_graph
- #
- # CPU TASK/PID DURATION FUNCTION CALLS
- # | | | | | | | | |
- 0) sh-4802 | | d_free() {
- 0) sh-4802 | | call_rcu() {
- 0) sh-4802 | | __call_rcu() {
- 0) sh-4802 | 0.616 us | rcu_process_gp_end();
- 0) sh-4802 | 0.586 us | check_for_new_grace_period();
- 0) sh-4802 | 2.899 us | }
- 0) sh-4802 | 4.040 us | }
- 0) sh-4802 | 5.151 us | }
- 0) sh-4802 | + 49.370 us | }
-
-
-- The absolute time field is an absolute timestamp given by the
- system clock since it started. A snapshot of this time is
- given on each entry/exit of functions
-
- hide: echo nofuncgraph-abstime > /debug/tracing/trace_options
- show: echo funcgraph-abstime > /debug/tracing/trace_options
-
- ie:
-
- #
- # TIME CPU DURATION FUNCTION CALLS
- # | | | | | | | |
- 360.774522 | 1) 0.541 us | }
- 360.774522 | 1) 4.663 us | }
- 360.774523 | 1) 0.541 us | __wake_up_bit();
- 360.774524 | 1) 6.796 us | }
- 360.774524 | 1) 7.952 us | }
- 360.774525 | 1) 9.063 us | }
- 360.774525 | 1) 0.615 us | journal_mark_dirty();
- 360.774527 | 1) 0.578 us | __brelse();
- 360.774528 | 1) | reiserfs_prepare_for_journal() {
- 360.774528 | 1) | unlock_buffer() {
- 360.774529 | 1) | wake_up_bit() {
- 360.774529 | 1) | bit_waitqueue() {
- 360.774530 | 1) 0.594 us | __phys_addr();
-
-
-You can put some comments on specific functions by using
-trace_printk() For example, if you want to put a comment inside
-the __might_sleep() function, you just have to include
-<linux/ftrace.h> and call trace_printk() inside __might_sleep()
-
-trace_printk("I'm a comment!\n")
-
-will produce:
-
- 1) | __might_sleep() {
- 1) | /* I'm a comment! */
- 1) 1.449 us | }
-
-
-You might find other useful features for this tracer in the
-following "dynamic ftrace" section such as tracing only specific
-functions or tasks.
-
-dynamic ftrace
---------------
-
-If CONFIG_DYNAMIC_FTRACE is set, the system will run with
-virtually no overhead when function tracing is disabled. The way
-this works is the mcount function call (placed at the start of
-every kernel function, produced by the -pg switch in gcc),
-starts of pointing to a simple return. (Enabling FTRACE will
-include the -pg switch in the compiling of the kernel.)
-
-At compile time every C file object is run through the
-recordmcount.pl script (located in the scripts directory). This
-script will process the C object using objdump to find all the
-locations in the .text section that call mcount. (Note, only the
-.text section is processed, since processing other sections like
-.init.text may cause races due to those sections being freed).
-
-A new section called "__mcount_loc" is created that holds
-references to all the mcount call sites in the .text section.
-This section is compiled back into the original object. The
-final linker will add all these references into a single table.
-
-On boot up, before SMP is initialized, the dynamic ftrace code
-scans this table and updates all the locations into nops. It
-also records the locations, which are added to the
-available_filter_functions list. Modules are processed as they
-are loaded and before they are executed. When a module is
-unloaded, it also removes its functions from the ftrace function
-list. This is automatic in the module unload code, and the
-module author does not need to worry about it.
-
-When tracing is enabled, kstop_machine is called to prevent
-races with the CPUS executing code being modified (which can
-cause the CPU to do undesireable things), and the nops are
-patched back to calls. But this time, they do not call mcount
-(which is just a function stub). They now call into the ftrace
-infrastructure.
-
-One special side-effect to the recording of the functions being
-traced is that we can now selectively choose which functions we
-wish to trace and which ones we want the mcount calls to remain
-as nops.
-
-Two files are used, one for enabling and one for disabling the
-tracing of specified functions. They are:
-
- set_ftrace_filter
-
-and
-
- set_ftrace_notrace
-
-A list of available functions that you can add to these files is
-listed in:
-
- available_filter_functions
-
- # cat /debug/tracing/available_filter_functions
-put_prev_task_idle
-kmem_cache_create
-pick_next_task_rt
-get_online_cpus
-pick_next_task_fair
-mutex_lock
-[...]
-
-If I am only interested in sys_nanosleep and hrtimer_interrupt:
-
- # echo sys_nanosleep hrtimer_interrupt \
- > /debug/tracing/set_ftrace_filter
- # echo ftrace > /debug/tracing/current_tracer
- # echo 1 > /debug/tracing/tracing_enabled
- # usleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
-# tracer: ftrace
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- usleep-4134 [00] 1317.070017: hrtimer_interrupt <-smp_apic_timer_interrupt
- usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
- <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
-
-To see which functions are being traced, you can cat the file:
-
- # cat /debug/tracing/set_ftrace_filter
-hrtimer_interrupt
-sys_nanosleep
-
-
-Perhaps this is not enough. The filters also allow simple wild
-cards. Only the following are currently available
-
- <match>* - will match functions that begin with <match>
- *<match> - will match functions that end with <match>
- *<match>* - will match functions that have <match> in it
-
-These are the only wild cards which are supported.
-
- <match>*<match> will not work.
-
-Note: It is better to use quotes to enclose the wild cards,
- otherwise the shell may expand the parameters into names
- of files in the local directory.
-
- # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter
-
-Produces:
-
-# tracer: ftrace
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4003 [00] 1480.611794: hrtimer_init <-copy_process
- bash-4003 [00] 1480.611941: hrtimer_start <-hrtick_set
- bash-4003 [00] 1480.611956: hrtimer_cancel <-hrtick_clear
- bash-4003 [00] 1480.611956: hrtimer_try_to_cancel <-hrtimer_cancel
- <idle>-0 [00] 1480.612019: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612025: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612032: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612037: hrtimer_get_next_event <-get_next_timer_interrupt
- <idle>-0 [00] 1480.612382: hrtimer_get_next_event <-get_next_timer_interrupt
-
-
-Notice that we lost the sys_nanosleep.
-
- # cat /debug/tracing/set_ftrace_filter
-hrtimer_run_queues
-hrtimer_run_pending
-hrtimer_init
-hrtimer_cancel
-hrtimer_try_to_cancel
-hrtimer_forward
-hrtimer_start
-hrtimer_reprogram
-hrtimer_force_reprogram
-hrtimer_get_next_event
-hrtimer_interrupt
-hrtimer_nanosleep
-hrtimer_wakeup
-hrtimer_get_remaining
-hrtimer_get_res
-hrtimer_init_sleeper
-
-
-This is because the '>' and '>>' act just like they do in bash.
-To rewrite the filters, use '>'
-To append to the filters, use '>>'
-
-To clear out a filter so that all functions will be recorded
-again:
-
- # echo > /debug/tracing/set_ftrace_filter
- # cat /debug/tracing/set_ftrace_filter
- #
-
-Again, now we want to append.
-
- # echo sys_nanosleep > /debug/tracing/set_ftrace_filter
- # cat /debug/tracing/set_ftrace_filter
-sys_nanosleep
- # echo 'hrtimer_*' >> /debug/tracing/set_ftrace_filter
- # cat /debug/tracing/set_ftrace_filter
-hrtimer_run_queues
-hrtimer_run_pending
-hrtimer_init
-hrtimer_cancel
-hrtimer_try_to_cancel
-hrtimer_forward
-hrtimer_start
-hrtimer_reprogram
-hrtimer_force_reprogram
-hrtimer_get_next_event
-hrtimer_interrupt
-sys_nanosleep
-hrtimer_nanosleep
-hrtimer_wakeup
-hrtimer_get_remaining
-hrtimer_get_res
-hrtimer_init_sleeper
-
-
-The set_ftrace_notrace prevents those functions from being
-traced.
-
- # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace
-
-Produces:
-
-# tracer: ftrace
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
- bash-4043 [01] 115.281644: finish_task_switch <-schedule
- bash-4043 [01] 115.281645: hrtick_set <-schedule
- bash-4043 [01] 115.281645: hrtick_clear <-hrtick_set
- bash-4043 [01] 115.281646: wait_for_completion <-__stop_machine_run
- bash-4043 [01] 115.281647: wait_for_common <-wait_for_completion
- bash-4043 [01] 115.281647: kthread_stop <-stop_machine_run
- bash-4043 [01] 115.281648: init_waitqueue_head <-kthread_stop
- bash-4043 [01] 115.281648: wake_up_process <-kthread_stop
- bash-4043 [01] 115.281649: try_to_wake_up <-wake_up_process
-
-We can see that there's no more lock or preempt tracing.
-
-
-Dynamic ftrace with the function graph tracer
----------------------------------------------
-
-Although what has been explained above concerns both the
-function tracer and the function-graph-tracer, there are some
-special features only available in the function-graph tracer.
-
-If you want to trace only one function and all of its children,
-you just have to echo its name into set_graph_function:
-
- echo __do_fault > set_graph_function
-
-will produce the following "expanded" trace of the __do_fault()
-function:
-
- 0) | __do_fault() {
- 0) | filemap_fault() {
- 0) | find_lock_page() {
- 0) 0.804 us | find_get_page();
- 0) | __might_sleep() {
- 0) 1.329 us | }
- 0) 3.904 us | }
- 0) 4.979 us | }
- 0) 0.653 us | _spin_lock();
- 0) 0.578 us | page_add_file_rmap();
- 0) 0.525 us | native_set_pte_at();
- 0) 0.585 us | _spin_unlock();
- 0) | unlock_page() {
- 0) 0.541 us | page_waitqueue();
- 0) 0.639 us | __wake_up_bit();
- 0) 2.786 us | }
- 0) + 14.237 us | }
- 0) | __do_fault() {
- 0) | filemap_fault() {
- 0) | find_lock_page() {
- 0) 0.698 us | find_get_page();
- 0) | __might_sleep() {
- 0) 1.412 us | }
- 0) 3.950 us | }
- 0) 5.098 us | }
- 0) 0.631 us | _spin_lock();
- 0) 0.571 us | page_add_file_rmap();
- 0) 0.526 us | native_set_pte_at();
- 0) 0.586 us | _spin_unlock();
- 0) | unlock_page() {
- 0) 0.533 us | page_waitqueue();
- 0) 0.638 us | __wake_up_bit();
- 0) 2.793 us | }
- 0) + 14.012 us | }
-
-You can also expand several functions at once:
-
- echo sys_open > set_graph_function
- echo sys_close >> set_graph_function
-
-Now if you want to go back to trace all functions you can clear
-this special filter via:
-
- echo > set_graph_function
-
-
-trace_pipe
-----------
-
-The trace_pipe outputs the same content as the trace file, but
-the effect on the tracing is different. Every read from
-trace_pipe is consumed. This means that subsequent reads will be
-different. The trace is live.
-
- # echo function > /debug/tracing/current_tracer
- # cat /debug/tracing/trace_pipe > /tmp/trace.out &
-[1] 4153
- # echo 1 > /debug/tracing/tracing_enabled
- # usleep 1
- # echo 0 > /debug/tracing/tracing_enabled
- # cat /debug/tracing/trace
-# tracer: function
-#
-# TASK-PID CPU# TIMESTAMP FUNCTION
-# | | | | |
-
- #
- # cat /tmp/trace.out
- bash-4043 [00] 41.267106: finish_task_switch <-schedule
- bash-4043 [00] 41.267106: hrtick_set <-schedule
- bash-4043 [00] 41.267107: hrtick_clear <-hrtick_set
- bash-4043 [00] 41.267108: wait_for_completion <-__stop_machine_run
- bash-4043 [00] 41.267108: wait_for_common <-wait_for_completion
- bash-4043 [00] 41.267109: kthread_stop <-stop_machine_run
- bash-4043 [00] 41.267109: init_waitqueue_head <-kthread_stop
- bash-4043 [00] 41.267110: wake_up_process <-kthread_stop
- bash-4043 [00] 41.267110: try_to_wake_up <-wake_up_process
- bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up
-
-
-Note, reading the trace_pipe file will block until more input is
-added. By changing the tracer, trace_pipe will issue an EOF. We
-needed to set the function tracer _before_ we "cat" the
-trace_pipe file.
-
-
-trace entries
--------------
-
-Having too much or not enough data can be troublesome in
-diagnosing an issue in the kernel. The file buffer_size_kb is
-used to modify the size of the internal trace buffers. The
-number listed is the number of entries that can be recorded per
-CPU. To know the full size, multiply the number of possible CPUS
-with the number of entries.
-
- # cat /debug/tracing/buffer_size_kb
-1408 (units kilobytes)
-
-Note, to modify this, you must have tracing completely disabled.
-To do that, echo "nop" into the current_tracer. If the
-current_tracer is not set to "nop", an EINVAL error will be
-returned.
-
- # echo nop > /debug/tracing/current_tracer
- # echo 10000 > /debug/tracing/buffer_size_kb
- # cat /debug/tracing/buffer_size_kb
-10000 (units kilobytes)
-
-The number of pages which will be allocated is limited to a
-percentage of available memory. Allocating too much will produce
-an error.
-
- # echo 1000000000000 > /debug/tracing/buffer_size_kb
--bash: echo: write error: Cannot allocate memory
- # cat /debug/tracing/buffer_size_kb
-85
-
------------
-
-More details can be found in the source code, in the
-kernel/tracing/*.c files.
--- /dev/null
+ ftrace - Function Tracer
+ ========================
+
+Copyright 2008 Red Hat Inc.
+ Author: Steven Rostedt <srostedt@redhat.com>
+ License: The GNU Free Documentation License, Version 1.2
+ (dual licensed under the GPL v2)
+Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
+ John Kacur, and David Teigland.
+
+Written for: 2.6.28-rc2
+
+Introduction
+------------
+
+Ftrace is an internal tracer designed to help out developers and
+designers of systems to find what is going on inside the kernel.
+It can be used for debugging or analyzing latencies and
+performance issues that take place outside of user-space.
+
+Although ftrace is the function tracer, it also includes an
+infrastructure that allows for other types of tracing. Some of
+the tracers that are currently in ftrace include a tracer to
+trace context switches, the time it takes for a high priority
+task to run after it was woken up, the time interrupts are
+disabled, and more (ftrace allows for tracer plugins, which
+means that the list of tracers can always grow).
+
+
+The File System
+---------------
+
+Ftrace uses the debugfs file system to hold the control files as
+well as the files to display output.
+
+To mount the debugfs system:
+
+ # mkdir /debug
+ # mount -t debugfs nodev /debug
+
+( Note: it is more common to mount at /sys/kernel/debug, but for
+ simplicity this document will use /debug)
+
+That's it! (assuming that you have ftrace configured into your kernel)
+
+After mounting the debugfs, you can see a directory called
+"tracing". This directory contains the control and output files
+of ftrace. Here is a list of some of the key files:
+
+
+ Note: all time values are in microseconds.
+
+ current_tracer:
+
+ This is used to set or display the current tracer
+ that is configured.
+
+ available_tracers:
+
+ This holds the different types of tracers that
+ have been compiled into the kernel. The
+ tracers listed here can be configured by
+ echoing their name into current_tracer.
+
+ tracing_enabled:
+
+ This sets or displays whether the current_tracer
+ is activated and tracing or not. Echo 0 into this
+ file to disable the tracer or 1 to enable it.
+
+ trace:
+
+ This file holds the output of the trace in a human
+ readable format (described below).
+
+ latency_trace:
+
+ This file shows the same trace but the information
+ is organized more to display possible latencies
+ in the system (described below).
+
+ trace_pipe:
+
+ The output is the same as the "trace" file but this
+ file is meant to be streamed with live tracing.
+ Reads from this file will block until new data
+ is retrieved. Unlike the "trace" and "latency_trace"
+ files, this file is a consumer. This means reading
+ from this file causes sequential reads to display
+ more current data. Once data is read from this
+ file, it is consumed, and will not be read
+ again with a sequential read. The "trace" and
+ "latency_trace" files are static, and if the
+ tracer is not adding more data, they will display
+ the same information every time they are read.
+
+ trace_options:
+
+ This file lets the user control the amount of data
+ that is displayed in one of the above output
+ files.
+
+ tracing_max_latency:
+
+ Some of the tracers record the max latency.
+ For example, the time interrupts are disabled.
+ This time is saved in this file. The max trace
+ will also be stored, and displayed by either
+ "trace" or "latency_trace". A new max trace will
+ only be recorded if the latency is greater than
+ the value in this file. (in microseconds)
+
+ buffer_size_kb:
+
+ This sets or displays the number of kilobytes each CPU
+ buffer can hold. The tracer buffers are the same size
+ for each CPU. The displayed number is the size of the
+ CPU buffer and not total size of all buffers. The
+ trace buffers are allocated in pages (blocks of memory
+ that the kernel uses for allocation, usually 4 KB in size).
+ If the last page allocated has room for more bytes
+ than requested, the rest of the page will be used,
+ making the actual allocation bigger than requested.
+ ( Note, the size may not be a multiple of the page size
+ due to buffer managment overhead. )
+
+ This can only be updated when the current_tracer
+ is set to "nop".
+
+ tracing_cpumask:
+
+ This is a mask that lets the user only trace
+ on specified CPUS. The format is a hex string
+ representing the CPUS.
+
+ set_ftrace_filter:
+
+ When dynamic ftrace is configured in (see the
+ section below "dynamic ftrace"), the code is dynamically
+ modified (code text rewrite) to disable calling of the
+ function profiler (mcount). This lets tracing be configured
+ in with practically no overhead in performance. This also
+ has a side effect of enabling or disabling specific functions
+ to be traced. Echoing names of functions into this file
+ will limit the trace to only those functions.
+
+ set_ftrace_notrace:
+
+ This has an effect opposite to that of
+ set_ftrace_filter. Any function that is added here will not
+ be traced. If a function exists in both set_ftrace_filter
+ and set_ftrace_notrace, the function will _not_ be traced.
+
+ set_ftrace_pid:
+
+ Have the function tracer only trace a single thread.
+
+ set_graph_function:
+
+ Set a "trigger" function where tracing should start
+ with the function graph tracer (See the section
+ "dynamic ftrace" for more details).
+
+ available_filter_functions:
+
+ This lists the functions that ftrace
+ has processed and can trace. These are the function
+ names that you can pass to "set_ftrace_filter" or
+ "set_ftrace_notrace". (See the section "dynamic ftrace"
+ below for more details.)
+
+
+The Tracers
+-----------
+
+Here is the list of current tracers that may be configured.
+
+ "function"
+
+ Function call tracer to trace all kernel functions.
+
+ "function_graph_tracer"
+
+ Similar to the function tracer except that the
+ function tracer probes the functions on their entry
+ whereas the function graph tracer traces on both entry
+ and exit of the functions. It then provides the ability
+ to draw a graph of function calls similar to C code
+ source.
+
+ "sched_switch"
+
+ Traces the context switches and wakeups between tasks.
+
+ "irqsoff"
+
+ Traces the areas that disable interrupts and saves
+ the trace with the longest max latency.
+ See tracing_max_latency. When a new max is recorded,
+ it replaces the old trace. It is best to view this
+ trace via the latency_trace file.
+
+ "preemptoff"
+
+ Similar to irqsoff but traces and records the amount of
+ time for which preemption is disabled.
+
+ "preemptirqsoff"
+
+ Similar to irqsoff and preemptoff, but traces and
+ records the largest time for which irqs and/or preemption
+ is disabled.
+
+ "wakeup"
+
+ Traces and records the max latency that it takes for
+ the highest priority task to get scheduled after
+ it has been woken up.
+
+ "hw-branch-tracer"
+
+ Uses the BTS CPU feature on x86 CPUs to traces all
+ branches executed.
+
+ "nop"
+
+ This is the "trace nothing" tracer. To remove all
+ tracers from tracing simply echo "nop" into
+ current_tracer.
+
+
+Examples of using the tracer
+----------------------------
+
+Here are typical examples of using the tracers when controlling
+them only with the debugfs interface (without using any
+user-land utilities).
+
+Output format:
+--------------
+
+Here is an example of the output format of the file "trace"
+
+ --------
+# tracer: function
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ bash-4251 [01] 10152.583854: path_put <-path_walk
+ bash-4251 [01] 10152.583855: dput <-path_put
+ bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput
+ --------
+
+A header is printed with the tracer name that is represented by
+the trace. In this case the tracer is "function". Then a header
+showing the format. Task name "bash", the task PID "4251", the
+CPU that it was running on "01", the timestamp in <secs>.<usecs>
+format, the function name that was traced "path_put" and the
+parent function that called this function "path_walk". The
+timestamp is the time at which the function was entered.
+
+The sched_switch tracer also includes tracing of task wakeups
+and context switches.
+
+ ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
+ ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
+ ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
+ events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
+ kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
+ ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
+
+Wake ups are represented by a "+" and the context switches are
+shown as "==>". The format is:
+
+ Context switches:
+
+ Previous task Next Task
+
+ <pid>:<prio>:<state> ==> <pid>:<prio>:<state>
+
+ Wake ups:
+
+ Current task Task waking up
+
+ <pid>:<prio>:<state> + <pid>:<prio>:<state>
+
+The prio is the internal kernel priority, which is the inverse
+of the priority that is usually displayed by user-space tools.
+Zero represents the highest priority (99). Prio 100 starts the
+"nice" priorities with 100 being equal to nice -20 and 139 being
+nice 19. The prio "140" is reserved for the idle task which is
+the lowest priority thread (pid 0).
+
+
+Latency trace format
+--------------------
+
+For traces that display latency times, the latency_trace file
+gives somewhat more information to see why a latency happened.
+Here is a typical trace.
+
+# tracer: irqsoff
+#
+irqsoff latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
+ -----------------
+ => started at: apic_timer_interrupt
+ => ended at: do_softirq
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ <idle>-0 0d..1 0us+: trace_hardirqs_off_thunk (apic_timer_interrupt)
+ <idle>-0 0d.s. 97us : __do_softirq (do_softirq)
+ <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq)
+
+
+This shows that the current tracer is "irqsoff" tracing the time
+for which interrupts were disabled. It gives the trace version
+and the version of the kernel upon which this was executed on
+(2.6.26-rc8). Then it displays the max latency in microsecs (97
+us). The number of trace entries displayed and the total number
+recorded (both are three: #3/3). The type of preemption that was
+used (PREEMPT). VP, KP, SP, and HP are always zero and are
+reserved for later use. #P is the number of online CPUS (#P:2).
+
+The task is the process that was running when the latency
+occurred. (swapper pid: 0).
+
+The start and stop (the functions in which the interrupts were
+disabled and enabled respectively) that caused the latencies:
+
+ apic_timer_interrupt is where the interrupts were disabled.
+ do_softirq is where they were enabled again.
+
+The next lines after the header are the trace itself. The header
+explains which is which.
+
+ cmd: The name of the process in the trace.
+
+ pid: The PID of that process.
+
+ CPU#: The CPU which the process was running on.
+
+ irqs-off: 'd' interrupts are disabled. '.' otherwise.
+ Note: If the architecture does not support a way to
+ read the irq flags variable, an 'X' will always
+ be printed here.
+
+ need-resched: 'N' task need_resched is set, '.' otherwise.
+
+ hardirq/softirq:
+ 'H' - hard irq occurred inside a softirq.
+ 'h' - hard irq is running
+ 's' - soft irq is running
+ '.' - normal context.
+
+ preempt-depth: The level of preempt_disabled
+
+The above is mostly meaningful for kernel developers.
+
+ time: This differs from the trace file output. The trace file output
+ includes an absolute timestamp. The timestamp used by the
+ latency_trace file is relative to the start of the trace.
+
+ delay: This is just to help catch your eye a bit better. And
+ needs to be fixed to be only relative to the same CPU.
+ The marks are determined by the difference between this
+ current trace and the next trace.
+ '!' - greater than preempt_mark_thresh (default 100)
+ '+' - greater than 1 microsecond
+ ' ' - less than or equal to 1 microsecond.
+
+ The rest is the same as the 'trace' file.
+
+
+trace_options
+-------------
+
+The trace_options file is used to control what gets printed in
+the trace output. To see what is available, simply cat the file:
+
+ cat /debug/tracing/trace_options
+ print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
+ noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
+
+To disable one of the options, echo in the option prepended with
+"no".
+
+ echo noprint-parent > /debug/tracing/trace_options
+
+To enable an option, leave off the "no".
+
+ echo sym-offset > /debug/tracing/trace_options
+
+Here are the available options:
+
+ print-parent - On function traces, display the calling (parent)
+ function as well as the function being traced.
+
+ print-parent:
+ bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul
+
+ noprint-parent:
+ bash-4000 [01] 1477.606694: simple_strtoul
+
+
+ sym-offset - Display not only the function name, but also the
+ offset in the function. For example, instead of
+ seeing just "ktime_get", you will see
+ "ktime_get+0xb/0x20".
+
+ sym-offset:
+ bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
+
+ sym-addr - this will also display the function address as well
+ as the function name.
+
+ sym-addr:
+ bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
+
+ verbose - This deals with the latency_trace file.
+
+ bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
+ (+0.000ms): simple_strtoul (strict_strtoul)
+
+ raw - This will display raw numbers. This option is best for
+ use with user applications that can translate the raw
+ numbers better than having it done in the kernel.
+
+ hex - Similar to raw, but the numbers will be in a hexadecimal
+ format.
+
+ bin - This will print out the formats in raw binary.
+
+ block - TBD (needs update)
+
+ stacktrace - This is one of the options that changes the trace
+ itself. When a trace is recorded, so is the stack
+ of functions. This allows for back traces of
+ trace sites.
+
+ userstacktrace - This option changes the trace. It records a
+ stacktrace of the current userspace thread.
+
+ sym-userobj - when user stacktrace are enabled, look up which
+ object the address belongs to, and print a
+ relative address. This is especially useful when
+ ASLR is on, otherwise you don't get a chance to
+ resolve the address to object/file/line after
+ the app is no longer running
+
+ The lookup is performed when you read
+ trace,trace_pipe,latency_trace. Example:
+
+ a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
+x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
+
+ sched-tree - trace all tasks that are on the runqueue, at
+ every scheduling event. Will add overhead if
+ there's a lot of tasks running at once.
+
+
+sched_switch
+------------
+
+This tracer simply records schedule switches. Here is an example
+of how to use it.
+
+ # echo sched_switch > /debug/tracing/current_tracer
+ # echo 1 > /debug/tracing/tracing_enabled
+ # sleep 1
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/trace
+
+# tracer: sched_switch
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
+ bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
+ sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
+ bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
+ bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
+ sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
+ bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
+ bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
+ <idle>-0 [00] 240.132589: 0:140:R + 4:115:S
+ <idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
+ ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
+ <idle>-0 [00] 240.132598: 0:140:R + 4:115:S
+ <idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
+ ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
+ sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
+ [...]
+
+
+As we have discussed previously about this format, the header
+shows the name of the trace and points to the options. The
+"FUNCTION" is a misnomer since here it represents the wake ups
+and context switches.
+
+The sched_switch file only lists the wake ups (represented with
+'+') and context switches ('==>') with the previous task or
+current task first followed by the next task or task waking up.
+The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
+Remember that the KERNEL-PRIO is the inverse of the actual
+priority with zero (0) being the highest priority and the nice
+values starting at 100 (nice -20). Below is a quick chart to map
+the kernel priority to user land priorities.
+
+ Kernel priority: 0 to 99 ==> user RT priority 99 to 0
+ Kernel priority: 100 to 139 ==> user nice -20 to 19
+ Kernel priority: 140 ==> idle task priority
+
+The task states are:
+
+ R - running : wants to run, may not actually be running
+ S - sleep : process is waiting to be woken up (handles signals)
+ D - disk sleep (uninterruptible sleep) : process must be woken up
+ (ignores signals)
+ T - stopped : process suspended
+ t - traced : process is being traced (with something like gdb)
+ Z - zombie : process waiting to be cleaned up
+ X - unknown
+
+
+ftrace_enabled
+--------------
+
+The following tracers (listed below) give different output
+depending on whether or not the sysctl ftrace_enabled is set. To
+set ftrace_enabled, one can either use the sysctl function or
+set it via the proc file system interface.
+
+ sysctl kernel.ftrace_enabled=1
+
+ or
+
+ echo 1 > /proc/sys/kernel/ftrace_enabled
+
+To disable ftrace_enabled simply replace the '1' with '0' in the
+above commands.
+
+When ftrace_enabled is set the tracers will also record the
+functions that are within the trace. The descriptions of the
+tracers will also show an example with ftrace enabled.
+
+
+irqsoff
+-------
+
+When interrupts are disabled, the CPU can not react to any other
+external event (besides NMIs and SMIs). This prevents the timer
+interrupt from triggering or the mouse interrupt from letting
+the kernel know of a new mouse event. The result is a latency
+with the reaction time.
+
+The irqsoff tracer tracks the time for which interrupts are
+disabled. When a new maximum latency is hit, the tracer saves
+the trace leading up to that latency point so that every time a
+new maximum is reached, the old saved trace is discarded and the
+new trace is saved.
+
+To reset the maximum, echo 0 into tracing_max_latency. Here is
+an example:
+
+ # echo irqsoff > /debug/tracing/current_tracer
+ # echo 0 > /debug/tracing/tracing_max_latency
+ # echo 1 > /debug/tracing/tracing_enabled
+ # ls -ltr
+ [...]
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/latency_trace
+# tracer: irqsoff
+#
+irqsoff latency trace v1.1.5 on 2.6.26
+--------------------------------------------------------------------
+ latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0)
+ -----------------
+ => started at: sys_setpgid
+ => ended at: sys_setpgid
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ bash-3730 1d... 0us : _write_lock_irq (sys_setpgid)
+ bash-3730 1d..1 1us+: _write_unlock_irq (sys_setpgid)
+ bash-3730 1d..2 14us : trace_hardirqs_on (sys_setpgid)
+
+
+Here we see that that we had a latency of 12 microsecs (which is
+very good). The _write_lock_irq in sys_setpgid disabled
+interrupts. The difference between the 12 and the displayed
+timestamp 14us occurred because the clock was incremented
+between the time of recording the max latency and the time of
+recording the function that had that latency.
+
+Note the above example had ftrace_enabled not set. If we set the
+ftrace_enabled, we get a much larger output:
+
+# tracer: irqsoff
+#
+irqsoff latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 50 us, #101/101, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: ls-4339 (uid:0 nice:0 policy:0 rt_prio:0)
+ -----------------
+ => started at: __alloc_pages_internal
+ => ended at: __alloc_pages_internal
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ ls-4339 0...1 0us+: get_page_from_freelist (__alloc_pages_internal)
+ ls-4339 0d..1 3us : rmqueue_bulk (get_page_from_freelist)
+ ls-4339 0d..1 3us : _spin_lock (rmqueue_bulk)
+ ls-4339 0d..1 4us : add_preempt_count (_spin_lock)
+ ls-4339 0d..2 4us : __rmqueue (rmqueue_bulk)
+ ls-4339 0d..2 5us : __rmqueue_smallest (__rmqueue)
+ ls-4339 0d..2 5us : __mod_zone_page_state (__rmqueue_smallest)
+ ls-4339 0d..2 6us : __rmqueue (rmqueue_bulk)
+ ls-4339 0d..2 6us : __rmqueue_smallest (__rmqueue)
+ ls-4339 0d..2 7us : __mod_zone_page_state (__rmqueue_smallest)
+ ls-4339 0d..2 7us : __rmqueue (rmqueue_bulk)
+ ls-4339 0d..2 8us : __rmqueue_smallest (__rmqueue)
+[...]
+ ls-4339 0d..2 46us : __rmqueue_smallest (__rmqueue)
+ ls-4339 0d..2 47us : __mod_zone_page_state (__rmqueue_smallest)
+ ls-4339 0d..2 47us : __rmqueue (rmqueue_bulk)
+ ls-4339 0d..2 48us : __rmqueue_smallest (__rmqueue)
+ ls-4339 0d..2 48us : __mod_zone_page_state (__rmqueue_smallest)
+ ls-4339 0d..2 49us : _spin_unlock (rmqueue_bulk)
+ ls-4339 0d..2 49us : sub_preempt_count (_spin_unlock)
+ ls-4339 0d..1 50us : get_page_from_freelist (__alloc_pages_internal)
+ ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal)
+
+
+
+Here we traced a 50 microsecond latency. But we also see all the
+functions that were called during that time. Note that by
+enabling function tracing, we incur an added overhead. This
+overhead may extend the latency times. But nevertheless, this
+trace has provided some very helpful debugging information.
+
+
+preemptoff
+----------
+
+When preemption is disabled, we may be able to receive
+interrupts but the task cannot be preempted and a higher
+priority task must wait for preemption to be enabled again
+before it can preempt a lower priority task.
+
+The preemptoff tracer traces the places that disable preemption.
+Like the irqsoff tracer, it records the maximum latency for
+which preemption was disabled. The control of preemptoff tracer
+is much like the irqsoff tracer.
+
+ # echo preemptoff > /debug/tracing/current_tracer
+ # echo 0 > /debug/tracing/tracing_max_latency
+ # echo 1 > /debug/tracing/tracing_enabled
+ # ls -ltr
+ [...]
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/latency_trace
+# tracer: preemptoff
+#
+preemptoff latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 29 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
+ -----------------
+ => started at: do_IRQ
+ => ended at: __do_softirq
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ sshd-4261 0d.h. 0us+: irq_enter (do_IRQ)
+ sshd-4261 0d.s. 29us : _local_bh_enable (__do_softirq)
+ sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq)
+
+
+This has some more changes. Preemption was disabled when an
+interrupt came in (notice the 'h'), and was enabled while doing
+a softirq. (notice the 's'). But we also see that interrupts
+have been disabled when entering the preempt off section and
+leaving it (the 'd'). We do not know if interrupts were enabled
+in the mean time.
+
+# tracer: preemptoff
+#
+preemptoff latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 63 us, #87/87, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
+ -----------------
+ => started at: remove_wait_queue
+ => ended at: __do_softirq
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ sshd-4261 0d..1 0us : _spin_lock_irqsave (remove_wait_queue)
+ sshd-4261 0d..1 1us : _spin_unlock_irqrestore (remove_wait_queue)
+ sshd-4261 0d..1 2us : do_IRQ (common_interrupt)
+ sshd-4261 0d..1 2us : irq_enter (do_IRQ)
+ sshd-4261 0d..1 2us : idle_cpu (irq_enter)
+ sshd-4261 0d..1 3us : add_preempt_count (irq_enter)
+ sshd-4261 0d.h1 3us : idle_cpu (irq_enter)
+ sshd-4261 0d.h. 4us : handle_fasteoi_irq (do_IRQ)
+[...]
+ sshd-4261 0d.h. 12us : add_preempt_count (_spin_lock)
+ sshd-4261 0d.h1 12us : ack_ioapic_quirk_irq (handle_fasteoi_irq)
+ sshd-4261 0d.h1 13us : move_native_irq (ack_ioapic_quirk_irq)
+ sshd-4261 0d.h1 13us : _spin_unlock (handle_fasteoi_irq)
+ sshd-4261 0d.h1 14us : sub_preempt_count (_spin_unlock)
+ sshd-4261 0d.h1 14us : irq_exit (do_IRQ)
+ sshd-4261 0d.h1 15us : sub_preempt_count (irq_exit)
+ sshd-4261 0d..2 15us : do_softirq (irq_exit)
+ sshd-4261 0d... 15us : __do_softirq (do_softirq)
+ sshd-4261 0d... 16us : __local_bh_disable (__do_softirq)
+ sshd-4261 0d... 16us+: add_preempt_count (__local_bh_disable)
+ sshd-4261 0d.s4 20us : add_preempt_count (__local_bh_disable)
+ sshd-4261 0d.s4 21us : sub_preempt_count (local_bh_enable)
+ sshd-4261 0d.s5 21us : sub_preempt_count (local_bh_enable)
+[...]
+ sshd-4261 0d.s6 41us : add_preempt_count (__local_bh_disable)
+ sshd-4261 0d.s6 42us : sub_preempt_count (local_bh_enable)
+ sshd-4261 0d.s7 42us : sub_preempt_count (local_bh_enable)
+ sshd-4261 0d.s5 43us : add_preempt_count (__local_bh_disable)
+ sshd-4261 0d.s5 43us : sub_preempt_count (local_bh_enable_ip)
+ sshd-4261 0d.s6 44us : sub_preempt_count (local_bh_enable_ip)
+ sshd-4261 0d.s5 44us : add_preempt_count (__local_bh_disable)
+ sshd-4261 0d.s5 45us : sub_preempt_count (local_bh_enable)
+[...]
+ sshd-4261 0d.s. 63us : _local_bh_enable (__do_softirq)
+ sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq)
+
+
+The above is an example of the preemptoff trace with
+ftrace_enabled set. Here we see that interrupts were disabled
+the entire time. The irq_enter code lets us know that we entered
+an interrupt 'h'. Before that, the functions being traced still
+show that it is not in an interrupt, but we can see from the
+functions themselves that this is not the case.
+
+Notice that __do_softirq when called does not have a
+preempt_count. It may seem that we missed a preempt enabling.
+What really happened is that the preempt count is held on the
+thread's stack and we switched to the softirq stack (4K stacks
+in effect). The code does not copy the preempt count, but
+because interrupts are disabled, we do not need to worry about
+it. Having a tracer like this is good for letting people know
+what really happens inside the kernel.
+
+
+preemptirqsoff
+--------------
+
+Knowing the locations that have interrupts disabled or
+preemption disabled for the longest times is helpful. But
+sometimes we would like to know when either preemption and/or
+interrupts are disabled.
+
+Consider the following code:
+
+ local_irq_disable();
+ call_function_with_irqs_off();
+ preempt_disable();
+ call_function_with_irqs_and_preemption_off();
+ local_irq_enable();
+ call_function_with_preemption_off();
+ preempt_enable();
+
+The irqsoff tracer will record the total length of
+call_function_with_irqs_off() and
+call_function_with_irqs_and_preemption_off().
+
+The preemptoff tracer will record the total length of
+call_function_with_irqs_and_preemption_off() and
+call_function_with_preemption_off().
+
+But neither will trace the time that interrupts and/or
+preemption is disabled. This total time is the time that we can
+not schedule. To record this time, use the preemptirqsoff
+tracer.
+
+Again, using this trace is much like the irqsoff and preemptoff
+tracers.
+
+ # echo preemptirqsoff > /debug/tracing/current_tracer
+ # echo 0 > /debug/tracing/tracing_max_latency
+ # echo 1 > /debug/tracing/tracing_enabled
+ # ls -ltr
+ [...]
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/latency_trace
+# tracer: preemptirqsoff
+#
+preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 293 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: ls-4860 (uid:0 nice:0 policy:0 rt_prio:0)
+ -----------------
+ => started at: apic_timer_interrupt
+ => ended at: __do_softirq
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ ls-4860 0d... 0us!: trace_hardirqs_off_thunk (apic_timer_interrupt)
+ ls-4860 0d.s. 294us : _local_bh_enable (__do_softirq)
+ ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq)
+
+
+
+The trace_hardirqs_off_thunk is called from assembly on x86 when
+interrupts are disabled in the assembly code. Without the
+function tracing, we do not know if interrupts were enabled
+within the preemption points. We do see that it started with
+preemption enabled.
+
+Here is a trace with ftrace_enabled set:
+
+
+# tracer: preemptirqsoff
+#
+preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 105 us, #183/183, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
+ -----------------
+ => started at: write_chan
+ => ended at: __do_softirq
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ ls-4473 0.N.. 0us : preempt_schedule (write_chan)
+ ls-4473 0dN.1 1us : _spin_lock (schedule)
+ ls-4473 0dN.1 2us : add_preempt_count (_spin_lock)
+ ls-4473 0d..2 2us : put_prev_task_fair (schedule)
+[...]
+ ls-4473 0d..2 13us : set_normalized_timespec (ktime_get_ts)
+ ls-4473 0d..2 13us : __switch_to (schedule)
+ sshd-4261 0d..2 14us : finish_task_switch (schedule)
+ sshd-4261 0d..2 14us : _spin_unlock_irq (finish_task_switch)
+ sshd-4261 0d..1 15us : add_preempt_count (_spin_lock_irqsave)
+ sshd-4261 0d..2 16us : _spin_unlock_irqrestore (hrtick_set)
+ sshd-4261 0d..2 16us : do_IRQ (common_interrupt)
+ sshd-4261 0d..2 17us : irq_enter (do_IRQ)
+ sshd-4261 0d..2 17us : idle_cpu (irq_enter)
+ sshd-4261 0d..2 18us : add_preempt_count (irq_enter)
+ sshd-4261 0d.h2 18us : idle_cpu (irq_enter)
+ sshd-4261 0d.h. 18us : handle_fasteoi_irq (do_IRQ)
+ sshd-4261 0d.h. 19us : _spin_lock (handle_fasteoi_irq)
+ sshd-4261 0d.h. 19us : add_preempt_count (_spin_lock)
+ sshd-4261 0d.h1 20us : _spin_unlock (handle_fasteoi_irq)
+ sshd-4261 0d.h1 20us : sub_preempt_count (_spin_unlock)
+[...]
+ sshd-4261 0d.h1 28us : _spin_unlock (handle_fasteoi_irq)
+ sshd-4261 0d.h1 29us : sub_preempt_count (_spin_unlock)
+ sshd-4261 0d.h2 29us : irq_exit (do_IRQ)
+ sshd-4261 0d.h2 29us : sub_preempt_count (irq_exit)
+ sshd-4261 0d..3 30us : do_softirq (irq_exit)
+ sshd-4261 0d... 30us : __do_softirq (do_softirq)
+ sshd-4261 0d... 31us : __local_bh_disable (__do_softirq)
+ sshd-4261 0d... 31us+: add_preempt_count (__local_bh_disable)
+ sshd-4261 0d.s4 34us : add_preempt_count (__local_bh_disable)
+[...]
+ sshd-4261 0d.s3 43us : sub_preempt_count (local_bh_enable_ip)
+ sshd-4261 0d.s4 44us : sub_preempt_count (local_bh_enable_ip)
+ sshd-4261 0d.s3 44us : smp_apic_timer_interrupt (apic_timer_interrupt)
+ sshd-4261 0d.s3 45us : irq_enter (smp_apic_timer_interrupt)
+ sshd-4261 0d.s3 45us : idle_cpu (irq_enter)
+ sshd-4261 0d.s3 46us : add_preempt_count (irq_enter)
+ sshd-4261 0d.H3 46us : idle_cpu (irq_enter)
+ sshd-4261 0d.H3 47us : hrtimer_interrupt (smp_apic_timer_interrupt)
+ sshd-4261 0d.H3 47us : ktime_get (hrtimer_interrupt)
+[...]
+ sshd-4261 0d.H3 81us : tick_program_event (hrtimer_interrupt)
+ sshd-4261 0d.H3 82us : ktime_get (tick_program_event)
+ sshd-4261 0d.H3 82us : ktime_get_ts (ktime_get)
+ sshd-4261 0d.H3 83us : getnstimeofday (ktime_get_ts)
+ sshd-4261 0d.H3 83us : set_normalized_timespec (ktime_get_ts)
+ sshd-4261 0d.H3 84us : clockevents_program_event (tick_program_event)
+ sshd-4261 0d.H3 84us : lapic_next_event (clockevents_program_event)
+ sshd-4261 0d.H3 85us : irq_exit (smp_apic_timer_interrupt)
+ sshd-4261 0d.H3 85us : sub_preempt_count (irq_exit)
+ sshd-4261 0d.s4 86us : sub_preempt_count (irq_exit)
+ sshd-4261 0d.s3 86us : add_preempt_count (__local_bh_disable)
+[...]
+ sshd-4261 0d.s1 98us : sub_preempt_count (net_rx_action)
+ sshd-4261 0d.s. 99us : add_preempt_count (_spin_lock_irq)
+ sshd-4261 0d.s1 99us+: _spin_unlock_irq (run_timer_softirq)
+ sshd-4261 0d.s. 104us : _local_bh_enable (__do_softirq)
+ sshd-4261 0d.s. 104us : sub_preempt_count (_local_bh_enable)
+ sshd-4261 0d.s. 105us : _local_bh_enable (__do_softirq)
+ sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq)
+
+
+This is a very interesting trace. It started with the preemption
+of the ls task. We see that the task had the "need_resched" bit
+set via the 'N' in the trace. Interrupts were disabled before
+the spin_lock at the beginning of the trace. We see that a
+schedule took place to run sshd. When the interrupts were
+enabled, we took an interrupt. On return from the interrupt
+handler, the softirq ran. We took another interrupt while
+running the softirq as we see from the capital 'H'.
+
+
+wakeup
+------
+
+In a Real-Time environment it is very important to know the
+wakeup time it takes for the highest priority task that is woken
+up to the time that it executes. This is also known as "schedule
+latency". I stress the point that this is about RT tasks. It is
+also important to know the scheduling latency of non-RT tasks,
+but the average schedule latency is better for non-RT tasks.
+Tools like LatencyTop are more appropriate for such
+measurements.
+
+Real-Time environments are interested in the worst case latency.
+That is the longest latency it takes for something to happen,
+and not the average. We can have a very fast scheduler that may
+only have a large latency once in a while, but that would not
+work well with Real-Time tasks. The wakeup tracer was designed
+to record the worst case wakeups of RT tasks. Non-RT tasks are
+not recorded because the tracer only records one worst case and
+tracing non-RT tasks that are unpredictable will overwrite the
+worst case latency of RT tasks.
+
+Since this tracer only deals with RT tasks, we will run this
+slightly differently than we did with the previous tracers.
+Instead of performing an 'ls', we will run 'sleep 1' under
+'chrt' which changes the priority of the task.
+
+ # echo wakeup > /debug/tracing/current_tracer
+ # echo 0 > /debug/tracing/tracing_max_latency
+ # echo 1 > /debug/tracing/tracing_enabled
+ # chrt -f 5 sleep 1
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/latency_trace
+# tracer: wakeup
+#
+wakeup latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 4 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: sleep-4901 (uid:0 nice:0 policy:1 rt_prio:5)
+ -----------------
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ <idle>-0 1d.h4 0us+: try_to_wake_up (wake_up_process)
+ <idle>-0 1d..4 4us : schedule (cpu_idle)
+
+
+Running this on an idle system, we see that it only took 4
+microseconds to perform the task switch. Note, since the trace
+marker in the schedule is before the actual "switch", we stop
+the tracing when the recorded task is about to schedule in. This
+may change if we add a new marker at the end of the scheduler.
+
+Notice that the recorded task is 'sleep' with the PID of 4901
+and it has an rt_prio of 5. This priority is user-space priority
+and not the internal kernel priority. The policy is 1 for
+SCHED_FIFO and 2 for SCHED_RR.
+
+Doing the same with chrt -r 5 and ftrace_enabled set.
+
+# tracer: wakeup
+#
+wakeup latency trace v1.1.5 on 2.6.26-rc8
+--------------------------------------------------------------------
+ latency: 50 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
+ -----------------
+ | task: sleep-4068 (uid:0 nice:0 policy:2 rt_prio:5)
+ -----------------
+
+# _------=> CPU#
+# / _-----=> irqs-off
+# | / _----=> need-resched
+# || / _---=> hardirq/softirq
+# ||| / _--=> preempt-depth
+# |||| /
+# ||||| delay
+# cmd pid ||||| time | caller
+# \ / ||||| \ | /
+ksoftirq-7 1d.H3 0us : try_to_wake_up (wake_up_process)
+ksoftirq-7 1d.H4 1us : sub_preempt_count (marker_probe_cb)
+ksoftirq-7 1d.H3 2us : check_preempt_wakeup (try_to_wake_up)
+ksoftirq-7 1d.H3 3us : update_curr (check_preempt_wakeup)
+ksoftirq-7 1d.H3 4us : calc_delta_mine (update_curr)
+ksoftirq-7 1d.H3 5us : __resched_task (check_preempt_wakeup)
+ksoftirq-7 1d.H3 6us : task_wake_up_rt (try_to_wake_up)
+ksoftirq-7 1d.H3 7us : _spin_unlock_irqrestore (try_to_wake_up)
+[...]
+ksoftirq-7 1d.H2 17us : irq_exit (smp_apic_timer_interrupt)
+ksoftirq-7 1d.H2 18us : sub_preempt_count (irq_exit)
+ksoftirq-7 1d.s3 19us : sub_preempt_count (irq_exit)
+ksoftirq-7 1..s2 20us : rcu_process_callbacks (__do_softirq)
+[...]
+ksoftirq-7 1..s2 26us : __rcu_process_callbacks (rcu_process_callbacks)
+ksoftirq-7 1d.s2 27us : _local_bh_enable (__do_softirq)
+ksoftirq-7 1d.s2 28us : sub_preempt_count (_local_bh_enable)
+ksoftirq-7 1.N.3 29us : sub_preempt_count (ksoftirqd)
+ksoftirq-7 1.N.2 30us : _cond_resched (ksoftirqd)
+ksoftirq-7 1.N.2 31us : __cond_resched (_cond_resched)
+ksoftirq-7 1.N.2 32us : add_preempt_count (__cond_resched)
+ksoftirq-7 1.N.2 33us : schedule (__cond_resched)
+ksoftirq-7 1.N.2 33us : add_preempt_count (schedule)
+ksoftirq-7 1.N.3 34us : hrtick_clear (schedule)
+ksoftirq-7 1dN.3 35us : _spin_lock (schedule)
+ksoftirq-7 1dN.3 36us : add_preempt_count (_spin_lock)
+ksoftirq-7 1d..4 37us : put_prev_task_fair (schedule)
+ksoftirq-7 1d..4 38us : update_curr (put_prev_task_fair)
+[...]
+ksoftirq-7 1d..5 47us : _spin_trylock (tracing_record_cmdline)
+ksoftirq-7 1d..5 48us : add_preempt_count (_spin_trylock)
+ksoftirq-7 1d..6 49us : _spin_unlock (tracing_record_cmdline)
+ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock)
+ksoftirq-7 1d..4 50us : schedule (__cond_resched)
+
+The interrupt went off while running ksoftirqd. This task runs
+at SCHED_OTHER. Why did not we see the 'N' set early? This may
+be a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K
+stacks configured, the interrupt and softirq run with their own
+stack. Some information is held on the top of the task's stack
+(need_resched and preempt_count are both stored there). The
+setting of the NEED_RESCHED bit is done directly to the task's
+stack, but the reading of the NEED_RESCHED is done by looking at
+the current stack, which in this case is the stack for the hard
+interrupt. This hides the fact that NEED_RESCHED has been set.
+We do not see the 'N' until we switch back to the task's
+assigned stack.
+
+function
+--------
+
+This tracer is the function tracer. Enabling the function tracer
+can be done from the debug file system. Make sure the
+ftrace_enabled is set; otherwise this tracer is a nop.
+
+ # sysctl kernel.ftrace_enabled=1
+ # echo function > /debug/tracing/current_tracer
+ # echo 1 > /debug/tracing/tracing_enabled
+ # usleep 1
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/trace
+# tracer: function
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ bash-4003 [00] 123.638713: finish_task_switch <-schedule
+ bash-4003 [00] 123.638714: _spin_unlock_irq <-finish_task_switch
+ bash-4003 [00] 123.638714: sub_preempt_count <-_spin_unlock_irq
+ bash-4003 [00] 123.638715: hrtick_set <-schedule
+ bash-4003 [00] 123.638715: _spin_lock_irqsave <-hrtick_set
+ bash-4003 [00] 123.638716: add_preempt_count <-_spin_lock_irqsave
+ bash-4003 [00] 123.638716: _spin_unlock_irqrestore <-hrtick_set
+ bash-4003 [00] 123.638717: sub_preempt_count <-_spin_unlock_irqrestore
+ bash-4003 [00] 123.638717: hrtick_clear <-hrtick_set
+ bash-4003 [00] 123.638718: sub_preempt_count <-schedule
+ bash-4003 [00] 123.638718: sub_preempt_count <-preempt_schedule
+ bash-4003 [00] 123.638719: wait_for_completion <-__stop_machine_run
+ bash-4003 [00] 123.638719: wait_for_common <-wait_for_completion
+ bash-4003 [00] 123.638720: _spin_lock_irq <-wait_for_common
+ bash-4003 [00] 123.638720: add_preempt_count <-_spin_lock_irq
+[...]
+
+
+Note: function tracer uses ring buffers to store the above
+entries. The newest data may overwrite the oldest data.
+Sometimes using echo to stop the trace is not sufficient because
+the tracing could have overwritten the data that you wanted to
+record. For this reason, it is sometimes better to disable
+tracing directly from a program. This allows you to stop the
+tracing at the point that you hit the part that you are
+interested in. To disable the tracing directly from a C program,
+something like following code snippet can be used:
+
+int trace_fd;
+[...]
+int main(int argc, char *argv[]) {
+ [...]
+ trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
+ [...]
+ if (condition_hit()) {
+ write(trace_fd, "0", 1);
+ }
+ [...]
+}
+
+Note: Here we hard coded the path name. The debugfs mount is not
+guaranteed to be at /debug (and is more commonly at
+/sys/kernel/debug). For simple one time traces, the above is
+sufficent. For anything else, a search through /proc/mounts may
+be needed to find where the debugfs file-system is mounted.
+
+
+Single thread tracing
+---------------------
+
+By writing into /debug/tracing/set_ftrace_pid you can trace a
+single thread. For example:
+
+# cat /debug/tracing/set_ftrace_pid
+no pid
+# echo 3111 > /debug/tracing/set_ftrace_pid
+# cat /debug/tracing/set_ftrace_pid
+3111
+# echo function > /debug/tracing/current_tracer
+# cat /debug/tracing/trace | head
+ # tracer: function
+ #
+ # TASK-PID CPU# TIMESTAMP FUNCTION
+ # | | | | |
+ yum-updatesd-3111 [003] 1637.254676: finish_task_switch <-thread_return
+ yum-updatesd-3111 [003] 1637.254681: hrtimer_cancel <-schedule_hrtimeout_range
+ yum-updatesd-3111 [003] 1637.254682: hrtimer_try_to_cancel <-hrtimer_cancel
+ yum-updatesd-3111 [003] 1637.254683: lock_hrtimer_base <-hrtimer_try_to_cancel
+ yum-updatesd-3111 [003] 1637.254685: fget_light <-do_sys_poll
+ yum-updatesd-3111 [003] 1637.254686: pipe_poll <-do_sys_poll
+# echo -1 > /debug/tracing/set_ftrace_pid
+# cat /debug/tracing/trace |head
+ # tracer: function
+ #
+ # TASK-PID CPU# TIMESTAMP FUNCTION
+ # | | | | |
+ ##### CPU 3 buffer started ####
+ yum-updatesd-3111 [003] 1701.957688: free_poll_entry <-poll_freewait
+ yum-updatesd-3111 [003] 1701.957689: remove_wait_queue <-free_poll_entry
+ yum-updatesd-3111 [003] 1701.957691: fput <-free_poll_entry
+ yum-updatesd-3111 [003] 1701.957692: audit_syscall_exit <-sysret_audit
+ yum-updatesd-3111 [003] 1701.957693: path_put <-audit_syscall_exit
+
+If you want to trace a function when executing, you could use
+something like this simple program:
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+int main (int argc, char **argv)
+{
+ if (argc < 1)
+ exit(-1);
+
+ if (fork() > 0) {
+ int fd, ffd;
+ char line[64];
+ int s;
+
+ ffd = open("/debug/tracing/current_tracer", O_WRONLY);
+ if (ffd < 0)
+ exit(-1);
+ write(ffd, "nop", 3);
+
+ fd = open("/debug/tracing/set_ftrace_pid", O_WRONLY);
+ s = sprintf(line, "%d\n", getpid());
+ write(fd, line, s);
+
+ write(ffd, "function", 8);
+
+ close(fd);
+ close(ffd);
+
+ execvp(argv[1], argv+1);
+ }
+
+ return 0;
+}
+
+
+hw-branch-tracer (x86 only)
+---------------------------
+
+This tracer uses the x86 last branch tracing hardware feature to
+collect a branch trace on all cpus with relatively low overhead.
+
+The tracer uses a fixed-size circular buffer per cpu and only
+traces ring 0 branches. The trace file dumps that buffer in the
+following format:
+
+# tracer: hw-branch-tracer
+#
+# CPU# TO <- FROM
+ 0 scheduler_tick+0xb5/0x1bf <- task_tick_idle+0x5/0x6
+ 2 run_posix_cpu_timers+0x2b/0x72a <- run_posix_cpu_timers+0x25/0x72a
+ 0 scheduler_tick+0x139/0x1bf <- scheduler_tick+0xed/0x1bf
+ 0 scheduler_tick+0x17c/0x1bf <- scheduler_tick+0x148/0x1bf
+ 2 run_posix_cpu_timers+0x9e/0x72a <- run_posix_cpu_timers+0x5e/0x72a
+ 0 scheduler_tick+0x1b6/0x1bf <- scheduler_tick+0x1aa/0x1bf
+
+
+The tracer may be used to dump the trace for the oops'ing cpu on
+a kernel oops into the system log. To enable this,
+ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one
+can either use the sysctl function or set it via the proc system
+interface.
+
+ sysctl kernel.ftrace_dump_on_oops=1
+
+or
+
+ echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
+
+
+Here's an example of such a dump after a null pointer
+dereference in a kernel module:
+
+[57848.105921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
+[57848.106019] IP: [<ffffffffa0000006>] open+0x6/0x14 [oops]
+[57848.106019] PGD 2354e9067 PUD 2375e7067 PMD 0
+[57848.106019] Oops: 0002 [#1] SMP
+[57848.106019] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:20:05.0/local_cpus
+[57848.106019] Dumping ftrace buffer:
+[57848.106019] ---------------------------------
+[...]
+[57848.106019] 0 chrdev_open+0xe6/0x165 <- cdev_put+0x23/0x24
+[57848.106019] 0 chrdev_open+0x117/0x165 <- chrdev_open+0xfa/0x165
+[57848.106019] 0 chrdev_open+0x120/0x165 <- chrdev_open+0x11c/0x165
+[57848.106019] 0 chrdev_open+0x134/0x165 <- chrdev_open+0x12b/0x165
+[57848.106019] 0 open+0x0/0x14 [oops] <- chrdev_open+0x144/0x165
+[57848.106019] 0 page_fault+0x0/0x30 <- open+0x6/0x14 [oops]
+[57848.106019] 0 error_entry+0x0/0x5b <- page_fault+0x4/0x30
+[57848.106019] 0 error_kernelspace+0x0/0x31 <- error_entry+0x59/0x5b
+[57848.106019] 0 error_sti+0x0/0x1 <- error_kernelspace+0x2d/0x31
+[57848.106019] 0 page_fault+0x9/0x30 <- error_sti+0x0/0x1
+[57848.106019] 0 do_page_fault+0x0/0x881 <- page_fault+0x1a/0x30
+[...]
+[57848.106019] 0 do_page_fault+0x66b/0x881 <- is_prefetch+0x1ee/0x1f2
+[57848.106019] 0 do_page_fault+0x6e0/0x881 <- do_page_fault+0x67a/0x881
+[57848.106019] 0 oops_begin+0x0/0x96 <- do_page_fault+0x6e0/0x881
+[57848.106019] 0 trace_hw_branch_oops+0x0/0x2d <- oops_begin+0x9/0x96
+[...]
+[57848.106019] 0 ds_suspend_bts+0x2a/0xe3 <- ds_suspend_bts+0x1a/0xe3
+[57848.106019] ---------------------------------
+[57848.106019] CPU 0
+[57848.106019] Modules linked in: oops
+[57848.106019] Pid: 5542, comm: cat Tainted: G W 2.6.28 #23
+[57848.106019] RIP: 0010:[<ffffffffa0000006>] [<ffffffffa0000006>] open+0x6/0x14 [oops]
+[57848.106019] RSP: 0018:ffff880235457d48 EFLAGS: 00010246
+[...]
+
+
+function graph tracer
+---------------------------
+
+This tracer is similar to the function tracer except that it
+probes a function on its entry and its exit. This is done by
+using a dynamically allocated stack of return addresses in each
+task_struct. On function entry the tracer overwrites the return
+address of each function traced to set a custom probe. Thus the
+original return address is stored on the stack of return address
+in the task_struct.
+
+Probing on both ends of a function leads to special features
+such as:
+
+- measure of a function's time execution
+- having a reliable call stack to draw function calls graph
+
+This tracer is useful in several situations:
+
+- you want to find the reason of a strange kernel behavior and
+ need to see what happens in detail on any areas (or specific
+ ones).
+
+- you are experiencing weird latencies but it's difficult to
+ find its origin.
+
+- you want to find quickly which path is taken by a specific
+ function
+
+- you just want to peek inside a working kernel and want to see
+ what happens there.
+
+# tracer: function_graph
+#
+# CPU DURATION FUNCTION CALLS
+# | | | | | | |
+
+ 0) | sys_open() {
+ 0) | do_sys_open() {
+ 0) | getname() {
+ 0) | kmem_cache_alloc() {
+ 0) 1.382 us | __might_sleep();
+ 0) 2.478 us | }
+ 0) | strncpy_from_user() {
+ 0) | might_fault() {
+ 0) 1.389 us | __might_sleep();
+ 0) 2.553 us | }
+ 0) 3.807 us | }
+ 0) 7.876 us | }
+ 0) | alloc_fd() {
+ 0) 0.668 us | _spin_lock();
+ 0) 0.570 us | expand_files();
+ 0) 0.586 us | _spin_unlock();
+
+
+There are several columns that can be dynamically
+enabled/disabled. You can use every combination of options you
+want, depending on your needs.
+
+- The cpu number on which the function executed is default
+ enabled. It is sometimes better to only trace one cpu (see
+ tracing_cpu_mask file) or you might sometimes see unordered
+ function calls while cpu tracing switch.
+
+ hide: echo nofuncgraph-cpu > /debug/tracing/trace_options
+ show: echo funcgraph-cpu > /debug/tracing/trace_options
+
+- The duration (function's time of execution) is displayed on
+ the closing bracket line of a function or on the same line
+ than the current function in case of a leaf one. It is default
+ enabled.
+
+ hide: echo nofuncgraph-duration > /debug/tracing/trace_options
+ show: echo funcgraph-duration > /debug/tracing/trace_options
+
+- The overhead field precedes the duration field in case of
+ reached duration thresholds.
+
+ hide: echo nofuncgraph-overhead > /debug/tracing/trace_options
+ show: echo funcgraph-overhead > /debug/tracing/trace_options
+ depends on: funcgraph-duration
+
+ ie:
+
+ 0) | up_write() {
+ 0) 0.646 us | _spin_lock_irqsave();
+ 0) 0.684 us | _spin_unlock_irqrestore();
+ 0) 3.123 us | }
+ 0) 0.548 us | fput();
+ 0) + 58.628 us | }
+
+ [...]
+
+ 0) | putname() {
+ 0) | kmem_cache_free() {
+ 0) 0.518 us | __phys_addr();
+ 0) 1.757 us | }
+ 0) 2.861 us | }
+ 0) ! 115.305 us | }
+ 0) ! 116.402 us | }
+
+ + means that the function exceeded 10 usecs.
+ ! means that the function exceeded 100 usecs.
+
+
+- The task/pid field displays the thread cmdline and pid which
+ executed the function. It is default disabled.
+
+ hide: echo nofuncgraph-proc > /debug/tracing/trace_options
+ show: echo funcgraph-proc > /debug/tracing/trace_options
+
+ ie:
+
+ # tracer: function_graph
+ #
+ # CPU TASK/PID DURATION FUNCTION CALLS
+ # | | | | | | | | |
+ 0) sh-4802 | | d_free() {
+ 0) sh-4802 | | call_rcu() {
+ 0) sh-4802 | | __call_rcu() {
+ 0) sh-4802 | 0.616 us | rcu_process_gp_end();
+ 0) sh-4802 | 0.586 us | check_for_new_grace_period();
+ 0) sh-4802 | 2.899 us | }
+ 0) sh-4802 | 4.040 us | }
+ 0) sh-4802 | 5.151 us | }
+ 0) sh-4802 | + 49.370 us | }
+
+
+- The absolute time field is an absolute timestamp given by the
+ system clock since it started. A snapshot of this time is
+ given on each entry/exit of functions
+
+ hide: echo nofuncgraph-abstime > /debug/tracing/trace_options
+ show: echo funcgraph-abstime > /debug/tracing/trace_options
+
+ ie:
+
+ #
+ # TIME CPU DURATION FUNCTION CALLS
+ # | | | | | | | |
+ 360.774522 | 1) 0.541 us | }
+ 360.774522 | 1) 4.663 us | }
+ 360.774523 | 1) 0.541 us | __wake_up_bit();
+ 360.774524 | 1) 6.796 us | }
+ 360.774524 | 1) 7.952 us | }
+ 360.774525 | 1) 9.063 us | }
+ 360.774525 | 1) 0.615 us | journal_mark_dirty();
+ 360.774527 | 1) 0.578 us | __brelse();
+ 360.774528 | 1) | reiserfs_prepare_for_journal() {
+ 360.774528 | 1) | unlock_buffer() {
+ 360.774529 | 1) | wake_up_bit() {
+ 360.774529 | 1) | bit_waitqueue() {
+ 360.774530 | 1) 0.594 us | __phys_addr();
+
+
+You can put some comments on specific functions by using
+trace_printk() For example, if you want to put a comment inside
+the __might_sleep() function, you just have to include
+<linux/ftrace.h> and call trace_printk() inside __might_sleep()
+
+trace_printk("I'm a comment!\n")
+
+will produce:
+
+ 1) | __might_sleep() {
+ 1) | /* I'm a comment! */
+ 1) 1.449 us | }
+
+
+You might find other useful features for this tracer in the
+following "dynamic ftrace" section such as tracing only specific
+functions or tasks.
+
+dynamic ftrace
+--------------
+
+If CONFIG_DYNAMIC_FTRACE is set, the system will run with
+virtually no overhead when function tracing is disabled. The way
+this works is the mcount function call (placed at the start of
+every kernel function, produced by the -pg switch in gcc),
+starts of pointing to a simple return. (Enabling FTRACE will
+include the -pg switch in the compiling of the kernel.)
+
+At compile time every C file object is run through the
+recordmcount.pl script (located in the scripts directory). This
+script will process the C object using objdump to find all the
+locations in the .text section that call mcount. (Note, only the
+.text section is processed, since processing other sections like
+.init.text may cause races due to those sections being freed).
+
+A new section called "__mcount_loc" is created that holds
+references to all the mcount call sites in the .text section.
+This section is compiled back into the original object. The
+final linker will add all these references into a single table.
+
+On boot up, before SMP is initialized, the dynamic ftrace code
+scans this table and updates all the locations into nops. It
+also records the locations, which are added to the
+available_filter_functions list. Modules are processed as they
+are loaded and before they are executed. When a module is
+unloaded, it also removes its functions from the ftrace function
+list. This is automatic in the module unload code, and the
+module author does not need to worry about it.
+
+When tracing is enabled, kstop_machine is called to prevent
+races with the CPUS executing code being modified (which can
+cause the CPU to do undesireable things), and the nops are
+patched back to calls. But this time, they do not call mcount
+(which is just a function stub). They now call into the ftrace
+infrastructure.
+
+One special side-effect to the recording of the functions being
+traced is that we can now selectively choose which functions we
+wish to trace and which ones we want the mcount calls to remain
+as nops.
+
+Two files are used, one for enabling and one for disabling the
+tracing of specified functions. They are:
+
+ set_ftrace_filter
+
+and
+
+ set_ftrace_notrace
+
+A list of available functions that you can add to these files is
+listed in:
+
+ available_filter_functions
+
+ # cat /debug/tracing/available_filter_functions
+put_prev_task_idle
+kmem_cache_create
+pick_next_task_rt
+get_online_cpus
+pick_next_task_fair
+mutex_lock
+[...]
+
+If I am only interested in sys_nanosleep and hrtimer_interrupt:
+
+ # echo sys_nanosleep hrtimer_interrupt \
+ > /debug/tracing/set_ftrace_filter
+ # echo ftrace > /debug/tracing/current_tracer
+ # echo 1 > /debug/tracing/tracing_enabled
+ # usleep 1
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/trace
+# tracer: ftrace
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ usleep-4134 [00] 1317.070017: hrtimer_interrupt <-smp_apic_timer_interrupt
+ usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
+ <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
+
+To see which functions are being traced, you can cat the file:
+
+ # cat /debug/tracing/set_ftrace_filter
+hrtimer_interrupt
+sys_nanosleep
+
+
+Perhaps this is not enough. The filters also allow simple wild
+cards. Only the following are currently available
+
+ <match>* - will match functions that begin with <match>
+ *<match> - will match functions that end with <match>
+ *<match>* - will match functions that have <match> in it
+
+These are the only wild cards which are supported.
+
+ <match>*<match> will not work.
+
+Note: It is better to use quotes to enclose the wild cards,
+ otherwise the shell may expand the parameters into names
+ of files in the local directory.
+
+ # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter
+
+Produces:
+
+# tracer: ftrace
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ bash-4003 [00] 1480.611794: hrtimer_init <-copy_process
+ bash-4003 [00] 1480.611941: hrtimer_start <-hrtick_set
+ bash-4003 [00] 1480.611956: hrtimer_cancel <-hrtick_clear
+ bash-4003 [00] 1480.611956: hrtimer_try_to_cancel <-hrtimer_cancel
+ <idle>-0 [00] 1480.612019: hrtimer_get_next_event <-get_next_timer_interrupt
+ <idle>-0 [00] 1480.612025: hrtimer_get_next_event <-get_next_timer_interrupt
+ <idle>-0 [00] 1480.612032: hrtimer_get_next_event <-get_next_timer_interrupt
+ <idle>-0 [00] 1480.612037: hrtimer_get_next_event <-get_next_timer_interrupt
+ <idle>-0 [00] 1480.612382: hrtimer_get_next_event <-get_next_timer_interrupt
+
+
+Notice that we lost the sys_nanosleep.
+
+ # cat /debug/tracing/set_ftrace_filter
+hrtimer_run_queues
+hrtimer_run_pending
+hrtimer_init
+hrtimer_cancel
+hrtimer_try_to_cancel
+hrtimer_forward
+hrtimer_start
+hrtimer_reprogram
+hrtimer_force_reprogram
+hrtimer_get_next_event
+hrtimer_interrupt
+hrtimer_nanosleep
+hrtimer_wakeup
+hrtimer_get_remaining
+hrtimer_get_res
+hrtimer_init_sleeper
+
+
+This is because the '>' and '>>' act just like they do in bash.
+To rewrite the filters, use '>'
+To append to the filters, use '>>'
+
+To clear out a filter so that all functions will be recorded
+again:
+
+ # echo > /debug/tracing/set_ftrace_filter
+ # cat /debug/tracing/set_ftrace_filter
+ #
+
+Again, now we want to append.
+
+ # echo sys_nanosleep > /debug/tracing/set_ftrace_filter
+ # cat /debug/tracing/set_ftrace_filter
+sys_nanosleep
+ # echo 'hrtimer_*' >> /debug/tracing/set_ftrace_filter
+ # cat /debug/tracing/set_ftrace_filter
+hrtimer_run_queues
+hrtimer_run_pending
+hrtimer_init
+hrtimer_cancel
+hrtimer_try_to_cancel
+hrtimer_forward
+hrtimer_start
+hrtimer_reprogram
+hrtimer_force_reprogram
+hrtimer_get_next_event
+hrtimer_interrupt
+sys_nanosleep
+hrtimer_nanosleep
+hrtimer_wakeup
+hrtimer_get_remaining
+hrtimer_get_res
+hrtimer_init_sleeper
+
+
+The set_ftrace_notrace prevents those functions from being
+traced.
+
+ # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace
+
+Produces:
+
+# tracer: ftrace
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ bash-4043 [01] 115.281644: finish_task_switch <-schedule
+ bash-4043 [01] 115.281645: hrtick_set <-schedule
+ bash-4043 [01] 115.281645: hrtick_clear <-hrtick_set
+ bash-4043 [01] 115.281646: wait_for_completion <-__stop_machine_run
+ bash-4043 [01] 115.281647: wait_for_common <-wait_for_completion
+ bash-4043 [01] 115.281647: kthread_stop <-stop_machine_run
+ bash-4043 [01] 115.281648: init_waitqueue_head <-kthread_stop
+ bash-4043 [01] 115.281648: wake_up_process <-kthread_stop
+ bash-4043 [01] 115.281649: try_to_wake_up <-wake_up_process
+
+We can see that there's no more lock or preempt tracing.
+
+
+Dynamic ftrace with the function graph tracer
+---------------------------------------------
+
+Although what has been explained above concerns both the
+function tracer and the function-graph-tracer, there are some
+special features only available in the function-graph tracer.
+
+If you want to trace only one function and all of its children,
+you just have to echo its name into set_graph_function:
+
+ echo __do_fault > set_graph_function
+
+will produce the following "expanded" trace of the __do_fault()
+function:
+
+ 0) | __do_fault() {
+ 0) | filemap_fault() {
+ 0) | find_lock_page() {
+ 0) 0.804 us | find_get_page();
+ 0) | __might_sleep() {
+ 0) 1.329 us | }
+ 0) 3.904 us | }
+ 0) 4.979 us | }
+ 0) 0.653 us | _spin_lock();
+ 0) 0.578 us | page_add_file_rmap();
+ 0) 0.525 us | native_set_pte_at();
+ 0) 0.585 us | _spin_unlock();
+ 0) | unlock_page() {
+ 0) 0.541 us | page_waitqueue();
+ 0) 0.639 us | __wake_up_bit();
+ 0) 2.786 us | }
+ 0) + 14.237 us | }
+ 0) | __do_fault() {
+ 0) | filemap_fault() {
+ 0) | find_lock_page() {
+ 0) 0.698 us | find_get_page();
+ 0) | __might_sleep() {
+ 0) 1.412 us | }
+ 0) 3.950 us | }
+ 0) 5.098 us | }
+ 0) 0.631 us | _spin_lock();
+ 0) 0.571 us | page_add_file_rmap();
+ 0) 0.526 us | native_set_pte_at();
+ 0) 0.586 us | _spin_unlock();
+ 0) | unlock_page() {
+ 0) 0.533 us | page_waitqueue();
+ 0) 0.638 us | __wake_up_bit();
+ 0) 2.793 us | }
+ 0) + 14.012 us | }
+
+You can also expand several functions at once:
+
+ echo sys_open > set_graph_function
+ echo sys_close >> set_graph_function
+
+Now if you want to go back to trace all functions you can clear
+this special filter via:
+
+ echo > set_graph_function
+
+
+trace_pipe
+----------
+
+The trace_pipe outputs the same content as the trace file, but
+the effect on the tracing is different. Every read from
+trace_pipe is consumed. This means that subsequent reads will be
+different. The trace is live.
+
+ # echo function > /debug/tracing/current_tracer
+ # cat /debug/tracing/trace_pipe > /tmp/trace.out &
+[1] 4153
+ # echo 1 > /debug/tracing/tracing_enabled
+ # usleep 1
+ # echo 0 > /debug/tracing/tracing_enabled
+ # cat /debug/tracing/trace
+# tracer: function
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+
+ #
+ # cat /tmp/trace.out
+ bash-4043 [00] 41.267106: finish_task_switch <-schedule
+ bash-4043 [00] 41.267106: hrtick_set <-schedule
+ bash-4043 [00] 41.267107: hrtick_clear <-hrtick_set
+ bash-4043 [00] 41.267108: wait_for_completion <-__stop_machine_run
+ bash-4043 [00] 41.267108: wait_for_common <-wait_for_completion
+ bash-4043 [00] 41.267109: kthread_stop <-stop_machine_run
+ bash-4043 [00] 41.267109: init_waitqueue_head <-kthread_stop
+ bash-4043 [00] 41.267110: wake_up_process <-kthread_stop
+ bash-4043 [00] 41.267110: try_to_wake_up <-wake_up_process
+ bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up
+
+
+Note, reading the trace_pipe file will block until more input is
+added. By changing the tracer, trace_pipe will issue an EOF. We
+needed to set the function tracer _before_ we "cat" the
+trace_pipe file.
+
+
+trace entries
+-------------
+
+Having too much or not enough data can be troublesome in
+diagnosing an issue in the kernel. The file buffer_size_kb is
+used to modify the size of the internal trace buffers. The
+number listed is the number of entries that can be recorded per
+CPU. To know the full size, multiply the number of possible CPUS
+with the number of entries.
+
+ # cat /debug/tracing/buffer_size_kb
+1408 (units kilobytes)
+
+Note, to modify this, you must have tracing completely disabled.
+To do that, echo "nop" into the current_tracer. If the
+current_tracer is not set to "nop", an EINVAL error will be
+returned.
+
+ # echo nop > /debug/tracing/current_tracer
+ # echo 10000 > /debug/tracing/buffer_size_kb
+ # cat /debug/tracing/buffer_size_kb
+10000 (units kilobytes)
+
+The number of pages which will be allocated is limited to a
+percentage of available memory. Allocating too much will produce
+an error.
+
+ # echo 1000000000000 > /debug/tracing/buffer_size_kb
+-bash: echo: write error: Cannot allocate memory
+ # cat /debug/tracing/buffer_size_kb
+85
+
+-----------
+
+More details can be found in the source code, in the
+kernel/tracing/*.c files.
--- /dev/null
+ kmemtrace - Kernel Memory Tracer
+
+ by Eduard - Gabriel Munteanu
+ <eduard.munteanu@linux360.ro>
+
+I. Introduction
+===============
+
+kmemtrace helps kernel developers figure out two things:
+1) how different allocators (SLAB, SLUB etc.) perform
+2) how kernel code allocates memory and how much
+
+To do this, we trace every allocation and export information to the userspace
+through the relay interface. We export things such as the number of requested
+bytes, the number of bytes actually allocated (i.e. including internal
+fragmentation), whether this is a slab allocation or a plain kmalloc() and so
+on.
+
+The actual analysis is performed by a userspace tool (see section III for
+details on where to get it from). It logs the data exported by the kernel,
+processes it and (as of writing this) can provide the following information:
+- the total amount of memory allocated and fragmentation per call-site
+- the amount of memory allocated and fragmentation per allocation
+- total memory allocated and fragmentation in the collected dataset
+- number of cross-CPU allocation and frees (makes sense in NUMA environments)
+
+Moreover, it can potentially find inconsistent and erroneous behavior in
+kernel code, such as using slab free functions on kmalloc'ed memory or
+allocating less memory than requested (but not truly failed allocations).
+
+kmemtrace also makes provisions for tracing on some arch and analysing the
+data on another.
+
+II. Design and goals
+====================
+
+kmemtrace was designed to handle rather large amounts of data. Thus, it uses
+the relay interface to export whatever is logged to userspace, which then
+stores it. Analysis and reporting is done asynchronously, that is, after the
+data is collected and stored. By design, it allows one to log and analyse
+on different machines and different arches.
+
+As of writing this, the ABI is not considered stable, though it might not
+change much. However, no guarantees are made about compatibility yet. When
+deemed stable, the ABI should still allow easy extension while maintaining
+backward compatibility. This is described further in Documentation/ABI.
+
+Summary of design goals:
+ - allow logging and analysis to be done across different machines
+ - be fast and anticipate usage in high-load environments (*)
+ - be reasonably extensible
+ - make it possible for GNU/Linux distributions to have kmemtrace
+ included in their repositories
+
+(*) - one of the reasons Pekka Enberg's original userspace data analysis
+ tool's code was rewritten from Perl to C (although this is more than a
+ simple conversion)
+
+
+III. Quick usage guide
+======================
+
+1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
+CONFIG_KMEMTRACE).
+
+2) Get the userspace tool and build it:
+$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository
+$ cd kmemtrace-user/
+$ ./autogen.sh
+$ ./configure
+$ make
+
+3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
+'single' runlevel (so that relay buffers don't fill up easily), and run
+kmemtrace:
+# '$' does not mean user, but root here.
+$ mount -t debugfs none /sys/kernel/debug
+$ mount -t proc none /proc
+$ cd path/to/kmemtrace-user/
+$ ./kmemtraced
+Wait a bit, then stop it with CTRL+C.
+$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't
+ # overrun, should
+ # be zero.
+$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
+ check its correctness]
+$ ./kmemtrace-report
+
+Now you should have a nice and short summary of how the allocator performs.
+
+IV. FAQ and known issues
+========================
+
+Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
+this? Should I worry?
+A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
+large the number is. You can fix it by supplying a higher
+'kmemtrace.subbufs=N' kernel parameter.
+---
+
+Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
+A: This is a bug and should be reported. It can occur for a variety of
+reasons:
+ - possible bugs in relay code
+ - possible misuse of relay by kmemtrace
+ - timestamps being collected unorderly
+Or you may fix it yourself and send us a patch.
+---
+
+Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
+A: This is a known issue and I'm working on it. These might be true errors
+in kernel code, which may have inconsistent behavior (e.g. allocating memory
+with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
+out this behavior may work with SLAB, but may fail with other allocators.
+
+It may also be due to lack of tracing in some unusual allocator functions.
+
+We don't want bug reports regarding this issue yet.
+---
+
+V. See also
+===========
+
+Documentation/kernel-parameters.txt
+Documentation/ABI/testing/debugfs-kmemtrace
+
--- /dev/null
+ In-kernel memory-mapped I/O tracing
+
+
+Home page and links to optional user space tools:
+
+ http://nouveau.freedesktop.org/wiki/MmioTrace
+
+MMIO tracing was originally developed by Intel around 2003 for their Fault
+Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
+Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
+project in mind. Since then many people have contributed.
+
+Mmiotrace was built for reverse engineering any memory-mapped IO device with
+the Nouveau project as the first real user. Only x86 and x86_64 architectures
+are supported.
+
+Out-of-tree mmiotrace was originally modified for mainline inclusion and
+ftrace framework by Pekka Paalanen <pq@iki.fi>.
+
+
+Preparation
+-----------
+
+Mmiotrace feature is compiled in by the CONFIG_MMIOTRACE option. Tracing is
+disabled by default, so it is safe to have this set to yes. SMP systems are
+supported, but tracing is unreliable and may miss events if more than one CPU
+is on-line, therefore mmiotrace takes all but one CPU off-line during run-time
+activation. You can re-enable CPUs by hand, but you have been warned, there
+is no way to automatically detect if you are losing events due to CPUs racing.
+
+
+Usage Quick Reference
+---------------------
+
+$ mount -t debugfs debugfs /debug
+$ echo mmiotrace > /debug/tracing/current_tracer
+$ cat /debug/tracing/trace_pipe > mydump.txt &
+Start X or whatever.
+$ echo "X is up" > /debug/tracing/trace_marker
+$ echo nop > /debug/tracing/current_tracer
+Check for lost events.
+
+
+Usage
+-----
+
+Make sure debugfs is mounted to /debug. If not, (requires root privileges)
+$ mount -t debugfs debugfs /debug
+
+Check that the driver you are about to trace is not loaded.
+
+Activate mmiotrace (requires root privileges):
+$ echo mmiotrace > /debug/tracing/current_tracer
+
+Start storing the trace:
+$ cat /debug/tracing/trace_pipe > mydump.txt &
+The 'cat' process should stay running (sleeping) in the background.
+
+Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
+accesses to areas that are ioremapped while mmiotrace is active.
+
+During tracing you can place comments (markers) into the trace by
+$ echo "X is up" > /debug/tracing/trace_marker
+This makes it easier to see which part of the (huge) trace corresponds to
+which action. It is recommended to place descriptive markers about what you
+do.
+
+Shut down mmiotrace (requires root privileges):
+$ echo nop > /debug/tracing/current_tracer
+The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
+pressing ctrl+c.
+
+Check that mmiotrace did not lose events due to a buffer filling up. Either
+$ grep -i lost mydump.txt
+which tells you exactly how many events were lost, or use
+$ dmesg
+to view your kernel log and look for "mmiotrace has lost events" warning. If
+events were lost, the trace is incomplete. You should enlarge the buffers and
+try again. Buffers are enlarged by first seeing how large the current buffers
+are:
+$ cat /debug/tracing/buffer_size_kb
+gives you a number. Approximately double this number and write it back, for
+instance:
+$ echo 128000 > /debug/tracing/buffer_size_kb
+Then start again from the top.
+
+If you are doing a trace for a driver project, e.g. Nouveau, you should also
+do the following before sending your results:
+$ lspci -vvv > lspci.txt
+$ dmesg > dmesg.txt
+$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
+and then send the .tar.gz file. The trace compresses considerably. Replace
+"pciid" and "nick" with the PCI ID or model name of your piece of hardware
+under investigation and your nick name.
+
+
+How Mmiotrace Works
+-------------------
+
+Access to hardware IO-memory is gained by mapping addresses from PCI bus by
+calling one of the ioremap_*() functions. Mmiotrace is hooked into the
+__ioremap() function and gets called whenever a mapping is created. Mapping is
+an event that is recorded into the trace log. Note, that ISA range mappings
+are not caught, since the mapping always exists and is returned directly.
+
+MMIO accesses are recorded via page faults. Just before __ioremap() returns,
+the mapped pages are marked as not present. Any access to the pages causes a
+fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
+marks the page present, sets TF flag to achieve single stepping and exits the
+fault handler. The instruction that faulted is executed and debug trap is
+entered. Here mmiotrace again marks the page as not present. The instruction
+is decoded to get the type of operation (read/write), data width and the value
+read or written. These are stored to the trace log.
+
+Setting the page present in the page fault handler has a race condition on SMP
+machines. During the single stepping other CPUs may run freely on that page
+and events can be missed without a notice. Re-enabling other CPUs during
+tracing is discouraged.
+
+
+Trace Log Format
+----------------
+
+The raw log is text and easily filtered with e.g. grep and awk. One record is
+one line in the log. A record starts with a keyword, followed by keyword
+dependant arguments. Arguments are separated by a space, or continue until the
+end of line. The format for version 20070824 is as follows:
+
+Explanation Keyword Space separated arguments
+---------------------------------------------------------------------------
+
+read event R width, timestamp, map id, physical, value, PC, PID
+write event W width, timestamp, map id, physical, value, PC, PID
+ioremap event MAP timestamp, map id, physical, virtual, length, PC, PID
+iounmap event UNMAP timestamp, map id, PC, PID
+marker MARK timestamp, text
+version VERSION the string "20070824"
+info for reader LSPCI one line from lspci -v
+PCI address map PCIDEV space separated /proc/bus/pci/devices data
+unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID
+
+Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
+is a kernel virtual address. Width is the data width in bytes and value is the
+data value. Map id is an arbitrary id number identifying the mapping that was
+used in an operation. PC is the program counter and PID is process id. PC is
+zero if it is not recorded. PID is always zero as tracing MMIO accesses
+originating in user space memory is not yet supported.
+
+For instance, the following awk filter will pass all 32-bit writes that target
+physical addresses in the range [0xfb73ce40, 0xfb800000[
+
+$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
+adr < 0xfb800000) print; }'
+
+
+Tools for Developers
+--------------------
+
+The user space tools include utilities for:
+- replacing numeric addresses and values with hardware register names
+- replaying MMIO logs, i.e., re-executing the recorded writes
+
+
--- /dev/null
+ Using the Linux Kernel Tracepoints
+
+ Mathieu Desnoyers
+
+
+This document introduces Linux Kernel Tracepoints and their use. It
+provides examples of how to insert tracepoints in the kernel and
+connect probe functions to them and provides some examples of probe
+functions.
+
+
+* Purpose of tracepoints
+
+A tracepoint placed in code provides a hook to call a function (probe)
+that you can provide at runtime. A tracepoint can be "on" (a probe is
+connected to it) or "off" (no probe is attached). When a tracepoint is
+"off" it has no effect, except for adding a tiny time penalty
+(checking a condition for a branch) and space penalty (adding a few
+bytes for the function call at the end of the instrumented function
+and adds a data structure in a separate section). When a tracepoint
+is "on", the function you provide is called each time the tracepoint
+is executed, in the execution context of the caller. When the function
+provided ends its execution, it returns to the caller (continuing from
+the tracepoint site).
+
+You can put tracepoints at important locations in the code. They are
+lightweight hooks that can pass an arbitrary number of parameters,
+which prototypes are described in a tracepoint declaration placed in a
+header file.
+
+They can be used for tracing and performance accounting.
+
+
+* Usage
+
+Two elements are required for tracepoints :
+
+- A tracepoint definition, placed in a header file.
+- The tracepoint statement, in C code.
+
+In order to use tracepoints, you should include linux/tracepoint.h.
+
+In include/trace/subsys.h :
+
+#include <linux/tracepoint.h>
+
+DECLARE_TRACE(subsys_eventname,
+ TP_PROTO(int firstarg, struct task_struct *p),
+ TP_ARGS(firstarg, p));
+
+In subsys/file.c (where the tracing statement must be added) :
+
+#include <trace/subsys.h>
+
+DEFINE_TRACE(subsys_eventname);
+
+void somefct(void)
+{
+ ...
+ trace_subsys_eventname(arg, task);
+ ...
+}
+
+Where :
+- subsys_eventname is an identifier unique to your event
+ - subsys is the name of your subsystem.
+ - eventname is the name of the event to trace.
+
+- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the
+ function called by this tracepoint.
+
+- TP_ARGS(firstarg, p) are the parameters names, same as found in the
+ prototype.
+
+Connecting a function (probe) to a tracepoint is done by providing a
+probe (function to call) for the specific tracepoint through
+register_trace_subsys_eventname(). Removing a probe is done through
+unregister_trace_subsys_eventname(); it will remove the probe.
+
+tracepoint_synchronize_unregister() must be called before the end of
+the module exit function to make sure there is no caller left using
+the probe. This, and the fact that preemption is disabled around the
+probe call, make sure that probe removal and module unload are safe.
+See the "Probe example" section below for a sample probe module.
+
+The tracepoint mechanism supports inserting multiple instances of the
+same tracepoint, but a single definition must be made of a given
+tracepoint name over all the kernel to make sure no type conflict will
+occur. Name mangling of the tracepoints is done using the prototypes
+to make sure typing is correct. Verification of probe type correctness
+is done at the registration site by the compiler. Tracepoints can be
+put in inline functions, inlined static functions, and unrolled loops
+as well as regular functions.
+
+The naming scheme "subsys_event" is suggested here as a convention
+intended to limit collisions. Tracepoint names are global to the
+kernel: they are considered as being the same whether they are in the
+core kernel image or in modules.
+
+If the tracepoint has to be used in kernel modules, an
+EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
+used to export the defined tracepoints.
+
+* Probe / tracepoint example
+
+See the example provided in samples/tracepoints
+
+Compile them with your kernel. They are built during 'make' (not
+'make modules') when CONFIG_SAMPLE_TRACEPOINTS=m.
+
+Run, as root :
+modprobe tracepoint-sample (insmod order is not important)
+modprobe tracepoint-probe-sample
+cat /proc/tracepoint-sample (returns an expected error)
+rmmod tracepoint-sample tracepoint-probe-sample
+dmesg
+++ /dev/null
- Using the Linux Kernel Tracepoints
-
- Mathieu Desnoyers
-
-
-This document introduces Linux Kernel Tracepoints and their use. It
-provides examples of how to insert tracepoints in the kernel and
-connect probe functions to them and provides some examples of probe
-functions.
-
-
-* Purpose of tracepoints
-
-A tracepoint placed in code provides a hook to call a function (probe)
-that you can provide at runtime. A tracepoint can be "on" (a probe is
-connected to it) or "off" (no probe is attached). When a tracepoint is
-"off" it has no effect, except for adding a tiny time penalty
-(checking a condition for a branch) and space penalty (adding a few
-bytes for the function call at the end of the instrumented function
-and adds a data structure in a separate section). When a tracepoint
-is "on", the function you provide is called each time the tracepoint
-is executed, in the execution context of the caller. When the function
-provided ends its execution, it returns to the caller (continuing from
-the tracepoint site).
-
-You can put tracepoints at important locations in the code. They are
-lightweight hooks that can pass an arbitrary number of parameters,
-which prototypes are described in a tracepoint declaration placed in a
-header file.
-
-They can be used for tracing and performance accounting.
-
-
-* Usage
-
-Two elements are required for tracepoints :
-
-- A tracepoint definition, placed in a header file.
-- The tracepoint statement, in C code.
-
-In order to use tracepoints, you should include linux/tracepoint.h.
-
-In include/trace/subsys.h :
-
-#include <linux/tracepoint.h>
-
-DECLARE_TRACE(subsys_eventname,
- TP_PROTO(int firstarg, struct task_struct *p),
- TP_ARGS(firstarg, p));
-
-In subsys/file.c (where the tracing statement must be added) :
-
-#include <trace/subsys.h>
-
-DEFINE_TRACE(subsys_eventname);
-
-void somefct(void)
-{
- ...
- trace_subsys_eventname(arg, task);
- ...
-}
-
-Where :
-- subsys_eventname is an identifier unique to your event
- - subsys is the name of your subsystem.
- - eventname is the name of the event to trace.
-
-- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the
- function called by this tracepoint.
-
-- TP_ARGS(firstarg, p) are the parameters names, same as found in the
- prototype.
-
-Connecting a function (probe) to a tracepoint is done by providing a
-probe (function to call) for the specific tracepoint through
-register_trace_subsys_eventname(). Removing a probe is done through
-unregister_trace_subsys_eventname(); it will remove the probe.
-
-tracepoint_synchronize_unregister() must be called before the end of
-the module exit function to make sure there is no caller left using
-the probe. This, and the fact that preemption is disabled around the
-probe call, make sure that probe removal and module unload are safe.
-See the "Probe example" section below for a sample probe module.
-
-The tracepoint mechanism supports inserting multiple instances of the
-same tracepoint, but a single definition must be made of a given
-tracepoint name over all the kernel to make sure no type conflict will
-occur. Name mangling of the tracepoints is done using the prototypes
-to make sure typing is correct. Verification of probe type correctness
-is done at the registration site by the compiler. Tracepoints can be
-put in inline functions, inlined static functions, and unrolled loops
-as well as regular functions.
-
-The naming scheme "subsys_event" is suggested here as a convention
-intended to limit collisions. Tracepoint names are global to the
-kernel: they are considered as being the same whether they are in the
-core kernel image or in modules.
-
-If the tracepoint has to be used in kernel modules, an
-EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
-used to export the defined tracepoints.
-
-* Probe / tracepoint example
-
-See the example provided in samples/tracepoints
-
-Compile them with your kernel. They are built during 'make' (not
-'make modules') when CONFIG_SAMPLE_TRACEPOINTS=m.
-
-Run, as root :
-modprobe tracepoint-sample (insmod order is not important)
-modprobe tracepoint-probe-sample
-cat /proc/tracepoint-sample (returns an expected error)
-rmmod tracepoint-sample tracepoint-probe-sample
-dmesg
+++ /dev/null
- In-kernel memory-mapped I/O tracing
-
-
-Home page and links to optional user space tools:
-
- http://nouveau.freedesktop.org/wiki/MmioTrace
-
-MMIO tracing was originally developed by Intel around 2003 for their Fault
-Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
-Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
-project in mind. Since then many people have contributed.
-
-Mmiotrace was built for reverse engineering any memory-mapped IO device with
-the Nouveau project as the first real user. Only x86 and x86_64 architectures
-are supported.
-
-Out-of-tree mmiotrace was originally modified for mainline inclusion and
-ftrace framework by Pekka Paalanen <pq@iki.fi>.
-
-
-Preparation
------------
-
-Mmiotrace feature is compiled in by the CONFIG_MMIOTRACE option. Tracing is
-disabled by default, so it is safe to have this set to yes. SMP systems are
-supported, but tracing is unreliable and may miss events if more than one CPU
-is on-line, therefore mmiotrace takes all but one CPU off-line during run-time
-activation. You can re-enable CPUs by hand, but you have been warned, there
-is no way to automatically detect if you are losing events due to CPUs racing.
-
-
-Usage Quick Reference
----------------------
-
-$ mount -t debugfs debugfs /debug
-$ echo mmiotrace > /debug/tracing/current_tracer
-$ cat /debug/tracing/trace_pipe > mydump.txt &
-Start X or whatever.
-$ echo "X is up" > /debug/tracing/trace_marker
-$ echo nop > /debug/tracing/current_tracer
-Check for lost events.
-
-
-Usage
------
-
-Make sure debugfs is mounted to /debug. If not, (requires root privileges)
-$ mount -t debugfs debugfs /debug
-
-Check that the driver you are about to trace is not loaded.
-
-Activate mmiotrace (requires root privileges):
-$ echo mmiotrace > /debug/tracing/current_tracer
-
-Start storing the trace:
-$ cat /debug/tracing/trace_pipe > mydump.txt &
-The 'cat' process should stay running (sleeping) in the background.
-
-Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
-accesses to areas that are ioremapped while mmiotrace is active.
-
-During tracing you can place comments (markers) into the trace by
-$ echo "X is up" > /debug/tracing/trace_marker
-This makes it easier to see which part of the (huge) trace corresponds to
-which action. It is recommended to place descriptive markers about what you
-do.
-
-Shut down mmiotrace (requires root privileges):
-$ echo nop > /debug/tracing/current_tracer
-The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
-pressing ctrl+c.
-
-Check that mmiotrace did not lose events due to a buffer filling up. Either
-$ grep -i lost mydump.txt
-which tells you exactly how many events were lost, or use
-$ dmesg
-to view your kernel log and look for "mmiotrace has lost events" warning. If
-events were lost, the trace is incomplete. You should enlarge the buffers and
-try again. Buffers are enlarged by first seeing how large the current buffers
-are:
-$ cat /debug/tracing/buffer_size_kb
-gives you a number. Approximately double this number and write it back, for
-instance:
-$ echo 128000 > /debug/tracing/buffer_size_kb
-Then start again from the top.
-
-If you are doing a trace for a driver project, e.g. Nouveau, you should also
-do the following before sending your results:
-$ lspci -vvv > lspci.txt
-$ dmesg > dmesg.txt
-$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
-and then send the .tar.gz file. The trace compresses considerably. Replace
-"pciid" and "nick" with the PCI ID or model name of your piece of hardware
-under investigation and your nick name.
-
-
-How Mmiotrace Works
--------------------
-
-Access to hardware IO-memory is gained by mapping addresses from PCI bus by
-calling one of the ioremap_*() functions. Mmiotrace is hooked into the
-__ioremap() function and gets called whenever a mapping is created. Mapping is
-an event that is recorded into the trace log. Note, that ISA range mappings
-are not caught, since the mapping always exists and is returned directly.
-
-MMIO accesses are recorded via page faults. Just before __ioremap() returns,
-the mapped pages are marked as not present. Any access to the pages causes a
-fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
-marks the page present, sets TF flag to achieve single stepping and exits the
-fault handler. The instruction that faulted is executed and debug trap is
-entered. Here mmiotrace again marks the page as not present. The instruction
-is decoded to get the type of operation (read/write), data width and the value
-read or written. These are stored to the trace log.
-
-Setting the page present in the page fault handler has a race condition on SMP
-machines. During the single stepping other CPUs may run freely on that page
-and events can be missed without a notice. Re-enabling other CPUs during
-tracing is discouraged.
-
-
-Trace Log Format
-----------------
-
-The raw log is text and easily filtered with e.g. grep and awk. One record is
-one line in the log. A record starts with a keyword, followed by keyword
-dependant arguments. Arguments are separated by a space, or continue until the
-end of line. The format for version 20070824 is as follows:
-
-Explanation Keyword Space separated arguments
----------------------------------------------------------------------------
-
-read event R width, timestamp, map id, physical, value, PC, PID
-write event W width, timestamp, map id, physical, value, PC, PID
-ioremap event MAP timestamp, map id, physical, virtual, length, PC, PID
-iounmap event UNMAP timestamp, map id, PC, PID
-marker MARK timestamp, text
-version VERSION the string "20070824"
-info for reader LSPCI one line from lspci -v
-PCI address map PCIDEV space separated /proc/bus/pci/devices data
-unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID
-
-Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
-is a kernel virtual address. Width is the data width in bytes and value is the
-data value. Map id is an arbitrary id number identifying the mapping that was
-used in an operation. PC is the program counter and PID is process id. PC is
-zero if it is not recorded. PID is always zero as tracing MMIO accesses
-originating in user space memory is not yet supported.
-
-For instance, the following awk filter will pass all 32-bit writes that target
-physical addresses in the range [0xfb73ce40, 0xfb800000[
-
-$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
-adr < 0xfb800000) print; }'
-
-
-Tools for Developers
---------------------
-
-The user space tools include utilities for:
-- replacing numeric addresses and values with hardware register names
-- replaying MMIO logs, i.e., re-executing the recorded writes
-
-
+++ /dev/null
- kmemtrace - Kernel Memory Tracer
-
- by Eduard - Gabriel Munteanu
- <eduard.munteanu@linux360.ro>
-
-I. Introduction
-===============
-
-kmemtrace helps kernel developers figure out two things:
-1) how different allocators (SLAB, SLUB etc.) perform
-2) how kernel code allocates memory and how much
-
-To do this, we trace every allocation and export information to the userspace
-through the relay interface. We export things such as the number of requested
-bytes, the number of bytes actually allocated (i.e. including internal
-fragmentation), whether this is a slab allocation or a plain kmalloc() and so
-on.
-
-The actual analysis is performed by a userspace tool (see section III for
-details on where to get it from). It logs the data exported by the kernel,
-processes it and (as of writing this) can provide the following information:
-- the total amount of memory allocated and fragmentation per call-site
-- the amount of memory allocated and fragmentation per allocation
-- total memory allocated and fragmentation in the collected dataset
-- number of cross-CPU allocation and frees (makes sense in NUMA environments)
-
-Moreover, it can potentially find inconsistent and erroneous behavior in
-kernel code, such as using slab free functions on kmalloc'ed memory or
-allocating less memory than requested (but not truly failed allocations).
-
-kmemtrace also makes provisions for tracing on some arch and analysing the
-data on another.
-
-II. Design and goals
-====================
-
-kmemtrace was designed to handle rather large amounts of data. Thus, it uses
-the relay interface to export whatever is logged to userspace, which then
-stores it. Analysis and reporting is done asynchronously, that is, after the
-data is collected and stored. By design, it allows one to log and analyse
-on different machines and different arches.
-
-As of writing this, the ABI is not considered stable, though it might not
-change much. However, no guarantees are made about compatibility yet. When
-deemed stable, the ABI should still allow easy extension while maintaining
-backward compatibility. This is described further in Documentation/ABI.
-
-Summary of design goals:
- - allow logging and analysis to be done across different machines
- - be fast and anticipate usage in high-load environments (*)
- - be reasonably extensible
- - make it possible for GNU/Linux distributions to have kmemtrace
- included in their repositories
-
-(*) - one of the reasons Pekka Enberg's original userspace data analysis
- tool's code was rewritten from Perl to C (although this is more than a
- simple conversion)
-
-
-III. Quick usage guide
-======================
-
-1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
-CONFIG_KMEMTRACE).
-
-2) Get the userspace tool and build it:
-$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository
-$ cd kmemtrace-user/
-$ ./autogen.sh
-$ ./configure
-$ make
-
-3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
-'single' runlevel (so that relay buffers don't fill up easily), and run
-kmemtrace:
-# '$' does not mean user, but root here.
-$ mount -t debugfs none /sys/kernel/debug
-$ mount -t proc none /proc
-$ cd path/to/kmemtrace-user/
-$ ./kmemtraced
-Wait a bit, then stop it with CTRL+C.
-$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't
- # overrun, should
- # be zero.
-$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
- check its correctness]
-$ ./kmemtrace-report
-
-Now you should have a nice and short summary of how the allocator performs.
-
-IV. FAQ and known issues
-========================
-
-Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
-this? Should I worry?
-A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
-large the number is. You can fix it by supplying a higher
-'kmemtrace.subbufs=N' kernel parameter.
----
-
-Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
-A: This is a bug and should be reported. It can occur for a variety of
-reasons:
- - possible bugs in relay code
- - possible misuse of relay by kmemtrace
- - timestamps being collected unorderly
-Or you may fix it yourself and send us a patch.
----
-
-Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
-A: This is a known issue and I'm working on it. These might be true errors
-in kernel code, which may have inconsistent behavior (e.g. allocating memory
-with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
-out this behavior may work with SLAB, but may fail with other allocators.
-
-It may also be due to lack of tracing in some unusual allocator functions.
-
-We don't want bug reports regarding this issue yet.
----
-
-V. See also
-===========
-
-Documentation/kernel-parameters.txt
-Documentation/ABI/testing/debugfs-kmemtrace
-
M: dirk@opfer-online.de
S: Maintained
-ARM/PALMTX,PALMT5,PALMLD SUPPORT
+ARM/PALMTX,PALMT5,PALMLD,PALMTE2 SUPPORT
P: Marek Vasut
M: marek.vasut@gmail.com
W: http://hackndev.com
MULTIMEDIA CARD (MMC), SECURE DIGITAL (SD) AND SDIO SUBSYSTEM
P: Pierre Ossman
-M: drzeus-mmc@drzeus.cx
+M: pierre@ossman.eu
L: linux-kernel@vger.kernel.org
S: Maintained
SCHEDULER
P: Ingo Molnar
M: mingo@elte.hu
-P: Robert Love [the preemptible kernel bits]
-M: rml@tech9.net
+P: Peter Zijlstra
+M: peterz@infradead.org
L: linux-kernel@vger.kernel.org
S: Maintained
SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) DRIVER
P: Pierre Ossman
-M: drzeus-sdhci@drzeus.cx
+M: pierre@ossman.eu
L: sdhci-devel@lists.ossman.eu
S: Maintained
W83L51xD SD/MMC CARD INTERFACE DRIVER
P: Pierre Ossman
-M: drzeus-wbsd@drzeus.cx
+M: pierre@ossman.eu
L: linux-kernel@vger.kernel.org
S: Maintained
CONFIG_RTC_DRV_SA1100=y
# CONFIG_RTC_DRV_PXA is not set
# CONFIG_DMADEVICES is not set
-# CONFIG_REGULATOR is not set
+CONFIG_REGULATOR=y
+# CONFIG_REGULATOR_DEBUG is not set
+# CONFIG_REGULATOR_FIXED_VOLTAGE is not set
+# CONFIG_REGULATOR_VIRTUAL_CONSUMER is not set
+CONFIG_REGULATOR_BQ24022=y
# CONFIG_UIO is not set
# CONFIG_STAGING is not set
#define SZ_4K 0x00001000
#define SZ_8K 0x00002000
#define SZ_16K 0x00004000
+#define SZ_32K 0x00008000
#define SZ_64K 0x00010000
#define SZ_128K 0x00020000
#define SZ_256K 0x00040000
/* USB Host */
struct at91_usbh_data {
u8 ports; /* number of ports on root hub */
- u8 vbus_pin[]; /* port power-control pin */
+ u8 vbus_pin[2]; /* port power-control pin */
};
extern void __init at91_add_device_usbh(struct at91_usbh_data *data);
static int omap1_clk_enable(struct clk *clk)
{
int ret = 0;
+
if (clk->usecount++ == 0) {
- if (likely(clk->parent)) {
+ if (clk->parent) {
ret = omap1_clk_enable(clk->parent);
-
- if (unlikely(ret != 0)) {
- clk->usecount--;
- return ret;
- }
+ if (ret)
+ goto err;
if (clk->flags & CLOCK_NO_IDLE_PARENT)
omap1_clk_deny_idle(clk->parent);
}
ret = clk->ops->enable(clk);
-
- if (unlikely(ret != 0) && clk->parent) {
- omap1_clk_disable(clk->parent);
- clk->usecount--;
+ if (ret) {
+ if (clk->parent)
+ omap1_clk_disable(clk->parent);
+ goto err;
}
}
+ return ret;
+err:
+ clk->usecount--;
return ret;
}
bool "PXA based Palm PDAs"
select HAVE_PWM
+config MACH_PALMTE2
+ bool "Palm Tungsten|E2"
+ default y
+ depends on ARCH_PXA_PALM
+ select PXA25x
+ help
+ Say Y here if you intend to run this kernel on a Palm Tungsten|E2
+ handheld computer.
+
config MACH_PALMT5
bool "Palm Tungsten|T5"
default y
obj-$(CONFIG_MACH_E750) += e750.o
obj-$(CONFIG_MACH_E400) += e400.o
obj-$(CONFIG_MACH_E800) += e800.o
+obj-$(CONFIG_MACH_PALMTE2) += palmte2.o
obj-$(CONFIG_MACH_PALMT5) += palmt5.o
obj-$(CONFIG_MACH_PALMTX) += palmtx.o
obj-$(CONFIG_MACH_PALMLD) += palmld.o
/* UCB1400 touchscreen controller */
#if defined(CONFIG_TOUCHSCREEN_UCB1400) || defined(CONFIG_TOUCHSCREEN_UCB1400_MODULE)
static struct platform_device cmx2xx_ts_device = {
- .name = "ucb1400_ts",
+ .name = "ucb1400_core",
.id = -1,
};
#include <linux/kernel.h>
#include <linux/platform_device.h>
#include <linux/gpio.h>
-#include <net/ax88796.h>
+#include <linux/interrupt.h>
#include <asm/mach-types.h>
#include <asm/sizes.h>
#if defined(CONFIG_AX88796)
#define COLIBRI_ETH_IRQ_GPIO mfp_to_gpio(GPIO26_GPIO)
+
/*
* Asix AX88796 Ethernet
*/
static struct ax_plat_data colibri_asix_platdata = {
- .flags = AXFLG_MAC_FROMDEV,
- .wordlength = 2
+ .flags = 0, /* defined later */
+ .wordlength = 2,
};
static struct resource colibri_asix_resource[] = {
[1] = {
.start = gpio_to_irq(COLIBRI_ETH_IRQ_GPIO),
.end = gpio_to_irq(COLIBRI_ETH_IRQ_GPIO),
- .flags = IORESOURCE_IRQ
+ .flags = IORESOURCE_IRQ | IRQF_TRIGGER_FALLING,
}
};
static void __init colibri_pxa300_init_eth(void)
{
+ colibri_pxa3xx_init_eth(&colibri_asix_platdata);
pxa3xx_mfp_config(ARRAY_AND_SIZE(colibri_pxa300_eth_pin_config));
- set_irq_type(gpio_to_irq(COLIBRI_ETH_IRQ_GPIO), IRQ_TYPE_EDGE_FALLING);
platform_device_register(&asix_device);
}
#else
#include <linux/kernel.h>
#include <linux/platform_device.h>
#include <linux/gpio.h>
-#include <net/ax88796.h>
+#include <linux/interrupt.h>
#include <asm/mach-types.h>
#include <asm/sizes.h>
* Asix AX88796 Ethernet
*/
static struct ax_plat_data colibri_asix_platdata = {
- .flags = AXFLG_MAC_FROMDEV,
- .wordlength = 2
+ .flags = 0, /* defined later */
+ .wordlength = 2,
};
static struct resource colibri_asix_resource[] = {
[1] = {
.start = gpio_to_irq(COLIBRI_ETH_IRQ_GPIO),
.end = gpio_to_irq(COLIBRI_ETH_IRQ_GPIO),
- .flags = IORESOURCE_IRQ
+ .flags = IORESOURCE_IRQ | IRQF_TRIGGER_FALLING,
}
};
static void __init colibri_pxa320_init_eth(void)
{
+ colibri_pxa3xx_init_eth(&colibri_asix_platdata);
pxa3xx_mfp_config(ARRAY_AND_SIZE(colibri_pxa320_eth_pin_config));
- set_irq_type(gpio_to_irq(COLIBRI_ETH_IRQ_GPIO), IRQ_TYPE_EDGE_FALLING);
platform_device_register(&asix_device);
}
#else
#include <linux/kernel.h>
#include <linux/platform_device.h>
#include <linux/gpio.h>
+#include <linux/etherdevice.h>
#include <asm/mach-types.h>
#include <mach/hardware.h>
#include <asm/sizes.h>
#include "generic.h"
#include "devices.h"
+#if defined(CONFIG_AX88796)
+#define ETHER_ADDR_LEN 6
+static u8 ether_mac_addr[ETHER_ADDR_LEN];
+
+void __init colibri_pxa3xx_init_eth(struct ax_plat_data *plat_data)
+{
+ int i;
+ u64 serial = ((u64) system_serial_high << 32) | system_serial_low;
+
+ /*
+ * If the bootloader passed in a serial boot tag, which contains a
+ * valid ethernet MAC, pass it to the interface. Toradex ships the
+ * modules with their own bootloader which provides a valid MAC
+ * this way.
+ */
+
+ for (i = 0; i < ETHER_ADDR_LEN; i++) {
+ ether_mac_addr[i] = serial & 0xff;
+ serial >>= 8;
+ }
+
+ if (is_valid_ether_addr(ether_mac_addr)) {
+ plat_data->flags |= AXFLG_MAC_FROMPLATFORM;
+ plat_data->mac_addr = ether_mac_addr;
+ printk(KERN_INFO "%s(): taking MAC from serial boot tag\n",
+ __func__);
+ } else {
+ plat_data->flags |= AXFLG_MAC_FROMDEV;
+ printk(KERN_INFO "%s(): no valid serial boot tag found, "
+ "taking MAC from device\n", __func__);
+ }
+}
+#endif
+
#if defined(CONFIG_MMC_PXA) || defined(CONFIG_MMC_PXA_MODULE)
static int mmc_detect_pin;
#include <linux/input.h>
#include <linux/leds.h>
+#include <asm/mach-types.h>
+
static struct gpio_keys_button csb701_buttons[] = {
{
.code = 0x7,
static int __init csb701_init(void)
{
+ if (!machine_is_csb726())
+ return -ENODEV;
+
return platform_add_devices(devices, ARRAY_SIZE(devices));
}
#include <mach/udc.h>
#include <mach/irda.h>
#include <mach/irqs.h>
+#include <mach/audio.h>
#include "generic.h"
#include "eseries.h"
eseries_get_tmio_gpios();
platform_add_devices(devices, ARRAY_SIZE(devices));
pxa_set_udc_info(&e7xx_udc_mach_info);
+ pxa_set_ac97_info(NULL);
e7xx_irda_init();
pxa_set_ficp_info(&e7xx_ficp_platform_data);
}
#include <mach/udc.h>
#include <mach/irda.h>
#include <mach/irqs.h>
+#include <mach/audio.h>
#include "generic.h"
#include "eseries.h"
eseries_get_tmio_gpios();
platform_add_devices(devices, ARRAY_SIZE(devices));
pxa_set_udc_info(&e7xx_udc_mach_info);
+ pxa_set_ac97_info(NULL);
e7xx_irda_init();
pxa_set_ficp_info(&e7xx_ficp_platform_data);
}
#include <mach/eseries-gpio.h>
#include <mach/udc.h>
#include <mach/irqs.h>
+#include <mach/audio.h>
#include "generic.h"
#include "eseries.h"
eseries_get_tmio_gpios();
platform_add_devices(devices, ARRAY_SIZE(devices));
pxa_set_udc_info(&e800_udc_mach_info);
+ pxa_set_ac97_info(NULL);
}
MACHINE_START(E800, "Toshiba e800")
#include <linux/regulator/machine.h>
#include <linux/spi/spi.h>
#include <linux/spi/tdo24m.h>
+#include <linux/spi/libertas_spi.h>
#include <linux/power_supply.h>
#include <linux/apm-emulation.h>
+#include <linux/delay.h>
#include <media/soc_camera.h>
#define GPIO93_CAM_RESET (93)
#define GPIO41_ETHIRQ (41)
#define EM_X270_ETHIRQ IRQ_GPIO(GPIO41_ETHIRQ)
+#define GPIO115_WLAN_PWEN (115)
+#define GPIO19_WLAN_STRAP (19)
static int mmc_cd;
static int nand_rb;
GPIO57_SSP1_TXD,
/* SSP2 */
- GPIO19_SSP2_SCLK,
- GPIO14_SSP2_SFRM,
+ GPIO19_GPIO, /* SSP2 clock is used as GPIO for Libertas pin-strap */
+ GPIO14_GPIO,
GPIO89_SSP2_TXD,
GPIO88_SSP2_RXD,
.model = TDO35S,
};
+static struct pxa2xx_spi_master em_x270_spi_2_info = {
+ .num_chipselect = 1,
+ .enable_dma = 1,
+};
+
+static struct pxa2xx_spi_chip em_x270_libertas_chip = {
+ .rx_threshold = 1,
+ .tx_threshold = 1,
+ .timeout = 1000,
+};
+
+static unsigned long em_x270_libertas_pin_config[] = {
+ /* SSP2 */
+ GPIO19_SSP2_SCLK,
+ GPIO14_GPIO,
+ GPIO89_SSP2_TXD,
+ GPIO88_SSP2_RXD,
+};
+
+static int em_x270_libertas_setup(struct spi_device *spi)
+{
+ int err = gpio_request(GPIO115_WLAN_PWEN, "WLAN PWEN");
+ if (err)
+ return err;
+
+ gpio_direction_output(GPIO19_WLAN_STRAP, 1);
+ mdelay(100);
+
+ pxa2xx_mfp_config(ARRAY_AND_SIZE(em_x270_libertas_pin_config));
+
+ gpio_direction_output(GPIO115_WLAN_PWEN, 0);
+ mdelay(100);
+ gpio_set_value(GPIO115_WLAN_PWEN, 1);
+ mdelay(100);
+
+ spi->bits_per_word = 16;
+ spi_setup(spi);
+
+ return 0;
+}
+
+static int em_x270_libertas_teardown(struct spi_device *spi)
+{
+ gpio_set_value(GPIO115_WLAN_PWEN, 0);
+ gpio_free(GPIO115_WLAN_PWEN);
+
+ return 0;
+}
+
+struct libertas_spi_platform_data em_x270_libertas_pdata = {
+ .use_dummy_writes = 1,
+ .gpio_cs = 14,
+ .setup = em_x270_libertas_setup,
+ .teardown = em_x270_libertas_teardown,
+};
+
static struct spi_board_info em_x270_spi_devices[] __initdata = {
{
- .modalias = "tdo24m",
- .max_speed_hz = 1000000,
- .bus_num = 1,
- .chip_select = 0,
- .controller_data = &em_x270_tdo24m_chip,
- .platform_data = &em_x270_tdo24m_pdata,
+ .modalias = "tdo24m",
+ .max_speed_hz = 1000000,
+ .bus_num = 1,
+ .chip_select = 0,
+ .controller_data = &em_x270_tdo24m_chip,
+ .platform_data = &em_x270_tdo24m_pdata,
+ },
+ {
+ .modalias = "libertas_spi",
+ .max_speed_hz = 13000000,
+ .bus_num = 2,
+ .irq = IRQ_GPIO(116),
+ .chip_select = 0,
+ .controller_data = &em_x270_libertas_chip,
+ .platform_data = &em_x270_libertas_pdata,
},
};
static void __init em_x270_init_spi(void)
{
pxa2xx_set_spi_info(1, &em_x270_spi_info);
+ pxa2xx_set_spi_info(2, &em_x270_spi_2_info);
spi_register_board_info(ARRAY_AND_SIZE(em_x270_spi_devices));
}
#else
#ifndef _COLIBRI_H_
#define _COLIBRI_H_
+
+#include <net/ax88796.h>
+
/*
* common settings for all modules
*/
static inline void colibri_pxa3xx_init_lcd(int) {}
#endif
+#if defined(CONFIG_AX88796)
+extern void colibri_pxa3xx_init_eth(struct ax_plat_data *plat_data);
+#endif
+
/* physical memory regions */
#define COLIBRI_SDRAM_BASE 0xa0000000 /* SDRAM region */
#define GPIO22_MAGICIAN_VIBRA_EN 22
#define GPIO26_MAGICIAN_GSM_POWER 26
#define GPIO27_MAGICIAN_USBC_PUEN 27
-#define GPIO30_MAGICIAN_nCHARGE_EN 30
+#define GPIO30_MAGICIAN_BQ24022_nCHARGE_EN 30
#define GPIO37_MAGICIAN_KEY_HANGUP 37
#define GPIO38_MAGICIAN_KEY_CONTACTS 38
#define GPIO40_MAGICIAN_GSM_OUT2 40
#define EGPIO_MAGICIAN_UNKNOWN_WAVEDEV_DLL MAGICIAN_EGPIO(2, 2)
#define EGPIO_MAGICIAN_FLASH_VPP MAGICIAN_EGPIO(2, 3)
#define EGPIO_MAGICIAN_BL_POWER2 MAGICIAN_EGPIO(2, 4)
-#define EGPIO_MAGICIAN_CHARGE_EN MAGICIAN_EGPIO(2, 5)
+#define EGPIO_MAGICIAN_BQ24022_ISET2 MAGICIAN_EGPIO(2, 5)
#define EGPIO_MAGICIAN_GSM_POWER MAGICIAN_EGPIO(2, 7)
/* input */
#define PALMLD_IDE_SIZE 0x00100000
#define PALMLD_PHYS_IO_START 0x40000000
+#define PALMLD_STR_BASE 0xa0200000
/* BATTERY */
#define PALMLD_BAT_MAX_VOLTAGE 4000 /* 4.00V maximum voltage */
/* Various addresses */
#define PALMT5_PHYS_RAM_START 0xa0000000
#define PALMT5_PHYS_IO_START 0x40000000
+#define PALMT5_STR_BASE 0xa0200000
/* TOUCHSCREEN */
#define AC97_LINK_FRAME 21
--- /dev/null
+/*
+ * GPIOs and interrupts for Palm Tungsten|E2 Handheld Computer
+ *
+ * Author:
+ * Carlos Eduardo Medaglia Dyonisio <cadu@nerdfeliz.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#ifndef _INCLUDE_PALMTE2_H_
+#define _INCLUDE_PALMTE2_H_
+
+/** HERE ARE GPIOs **/
+
+/* GPIOs */
+#define GPIO_NR_PALMTE2_POWER_DETECT 9
+#define GPIO_NR_PALMTE2_HOTSYNC_BUTTON_N 4
+#define GPIO_NR_PALMTE2_EARPHONE_DETECT 15
+
+/* SD/MMC */
+#define GPIO_NR_PALMTE2_SD_DETECT_N 10
+#define GPIO_NR_PALMTE2_SD_POWER 55
+#define GPIO_NR_PALMTE2_SD_READONLY 51
+
+/* IRDA - disable GPIO connected to SD pin of tranceiver (TFBS4710?) ? */
+#define GPIO_NR_PALMTE2_IR_DISABLE 48
+
+/* USB */
+#define GPIO_NR_PALMTE2_USB_DETECT_N 35
+#define GPIO_NR_PALMTE2_USB_PULLUP 53
+
+/* LCD/BACKLIGHT */
+#define GPIO_NR_PALMTE2_BL_POWER 56
+#define GPIO_NR_PALMTE2_LCD_POWER 37
+
+/* KEYS */
+#define GPIO_NR_PALMTE2_KEY_NOTES 5
+#define GPIO_NR_PALMTE2_KEY_TASKS 7
+#define GPIO_NR_PALMTE2_KEY_CALENDAR 11
+#define GPIO_NR_PALMTE2_KEY_CONTACTS 13
+#define GPIO_NR_PALMTE2_KEY_CENTER 14
+#define GPIO_NR_PALMTE2_KEY_LEFT 19
+#define GPIO_NR_PALMTE2_KEY_RIGHT 20
+#define GPIO_NR_PALMTE2_KEY_DOWN 21
+#define GPIO_NR_PALMTE2_KEY_UP 22
+
+/** HERE ARE INIT VALUES **/
+
+/* BACKLIGHT */
+#define PALMTE2_MAX_INTENSITY 0xFE
+#define PALMTE2_DEFAULT_INTENSITY 0x7E
+#define PALMTE2_LIMIT_MASK 0x7F
+#define PALMTE2_PRESCALER 0x3F
+#define PALMTE2_PERIOD_NS 3500
+
+/* BATTERY */
+#define PALMTE2_BAT_MAX_VOLTAGE 4000 /* 4.00v current voltage */
+#define PALMTE2_BAT_MIN_VOLTAGE 3550 /* 3.55v critical voltage */
+#define PALMTE2_BAT_MAX_CURRENT 0 /* unknokn */
+#define PALMTE2_BAT_MIN_CURRENT 0 /* unknown */
+#define PALMTE2_BAT_MAX_CHARGE 1 /* unknown */
+#define PALMTE2_BAT_MIN_CHARGE 1 /* unknown */
+#define PALMTE2_MAX_LIFE_MINS 360 /* on-life in minutes */
+
+#endif
#define PALMTX_PHYS_RAM_START 0xa0000000
#define PALMTX_PHYS_IO_START 0x40000000
+#define PALMTX_STR_BASE 0xa0200000
+
#define PALMTX_PHYS_FLASH_START PXA_CS0_PHYS /* ChipSelect 0 */
#define PALMTX_PHYS_NAND_START PXA_CS1_PHYS /* ChipSelect 1 */
#include <linux/mtd/physmap.h>
#include <linux/pda_power.h>
#include <linux/pwm_backlight.h>
+#include <linux/regulator/bq24022.h>
+#include <linux/regulator/machine.h>
#include <linux/usb/gpio_vbus.h>
#include <mach/hardware.h>
static int power_supply_init(struct device *dev)
{
- int ret;
-
- ret = gpio_request(EGPIO_MAGICIAN_CABLE_STATE_AC, "CABLE_STATE_AC");
- if (ret)
- goto err_cs_ac;
- ret = gpio_request(EGPIO_MAGICIAN_CABLE_STATE_USB, "CABLE_STATE_USB");
- if (ret)
- goto err_cs_usb;
- ret = gpio_request(EGPIO_MAGICIAN_CHARGE_EN, "CHARGE_EN");
- if (ret)
- goto err_chg_en;
- ret = gpio_request(GPIO30_MAGICIAN_nCHARGE_EN, "nCHARGE_EN");
- if (!ret)
- ret = gpio_direction_output(GPIO30_MAGICIAN_nCHARGE_EN, 0);
- if (ret)
- goto err_nchg_en;
-
- return 0;
-
-err_nchg_en:
- gpio_free(EGPIO_MAGICIAN_CHARGE_EN);
-err_chg_en:
- gpio_free(EGPIO_MAGICIAN_CABLE_STATE_USB);
-err_cs_usb:
- gpio_free(EGPIO_MAGICIAN_CABLE_STATE_AC);
-err_cs_ac:
- return ret;
+ return gpio_request(EGPIO_MAGICIAN_CABLE_STATE_AC, "CABLE_STATE_AC");
}
static int magician_is_ac_online(void)
return gpio_get_value(EGPIO_MAGICIAN_CABLE_STATE_AC);
}
-static int magician_is_usb_online(void)
-{
- return gpio_get_value(EGPIO_MAGICIAN_CABLE_STATE_USB);
-}
-
-static void magician_set_charge(int flags)
-{
- gpio_set_value(GPIO30_MAGICIAN_nCHARGE_EN, !flags);
- gpio_set_value(EGPIO_MAGICIAN_CHARGE_EN, flags);
-}
-
static void power_supply_exit(struct device *dev)
{
- gpio_free(GPIO30_MAGICIAN_nCHARGE_EN);
- gpio_free(EGPIO_MAGICIAN_CHARGE_EN);
- gpio_free(EGPIO_MAGICIAN_CABLE_STATE_USB);
gpio_free(EGPIO_MAGICIAN_CABLE_STATE_AC);
}
static struct pda_power_pdata power_supply_info = {
.init = power_supply_init,
.is_ac_online = magician_is_ac_online,
- .is_usb_online = magician_is_usb_online,
- .set_charge = magician_set_charge,
.exit = power_supply_exit,
.supplied_to = magician_supplicants,
.num_supplicants = ARRAY_SIZE(magician_supplicants),
.num_resources = ARRAY_SIZE(power_supply_resources),
};
+/*
+ * Battery charger
+ */
+
+static struct regulator_consumer_supply bq24022_consumers[] = {
+ {
+ .dev = &gpio_vbus.dev,
+ .supply = "vbus_draw",
+ },
+ {
+ .dev = &power_supply.dev,
+ .supply = "ac_draw",
+ },
+};
+
+static struct regulator_init_data bq24022_init_data = {
+ .constraints = {
+ .max_uA = 500000,
+ .valid_ops_mask = REGULATOR_CHANGE_CURRENT,
+ },
+ .num_consumer_supplies = ARRAY_SIZE(bq24022_consumers),
+ .consumer_supplies = bq24022_consumers,
+};
+
+static struct bq24022_mach_info bq24022_info = {
+ .gpio_nce = GPIO30_MAGICIAN_BQ24022_nCHARGE_EN,
+ .gpio_iset2 = EGPIO_MAGICIAN_BQ24022_ISET2,
+ .init_data = &bq24022_init_data,
+};
+
+static struct platform_device bq24022 = {
+ .name = "bq24022",
+ .id = -1,
+ .dev = {
+ .platform_data = &bq24022_info,
+ },
+};
/*
* MMC/SD
&egpio,
&backlight,
&pasic3,
+ &bq24022,
&gpio_vbus,
&power_supply,
&strataflash,
#include <mach/pxa27x-udc.h>
#include <mach/i2c.h>
#include <mach/camera.h>
+#include <mach/audio.h>
#include <media/soc_camera.h>
#include <mach/mioa701.h>
&mioa701_backlight_data);
MIO_SIMPLE_DEV(mioa701_led, "leds-gpio", &gpio_led_info)
MIO_SIMPLE_DEV(pxa2xx_pcm, "pxa2xx-pcm", NULL)
-MIO_SIMPLE_DEV(pxa2xx_ac97, "pxa2xx-ac97", NULL)
-MIO_PARENT_DEV(mio_wm9713_codec, "wm9713-codec", &pxa2xx_ac97.dev, NULL)
MIO_SIMPLE_DEV(mioa701_sound, "mioa701-wm9713", NULL)
MIO_SIMPLE_DEV(mioa701_board, "mioa701-board", NULL)
MIO_SIMPLE_DEV(gpio_vbus, "gpio-vbus", &gpio_vbus_data);
&mioa701_backlight,
&mioa701_led,
&pxa2xx_pcm,
- &pxa2xx_ac97,
- &mio_wm9713_codec,
&mioa701_sound,
&power_dev,
&strataflash,
pxa_set_keypad_info(&mioa701_keypad_info);
wm97xx_bat_set_pdata(&mioa701_battery_data);
pxa_set_udc_info(&mioa701_udc_info);
+ pxa_set_ac97_info(NULL);
pm_power_off = mioa701_poweroff;
arm_pm_restart = mioa701_restart;
platform_add_devices(devices, ARRAY_SIZE(devices));
#include <linux/gpio.h>
#include <linux/wm97xx_batt.h>
#include <linux/power_supply.h>
+#include <linux/sysdev.h>
#include <asm/mach-types.h>
#include <asm/mach/arch.h>
GPIO47_FICP_TXD,
/* MATRIX KEYPAD */
- GPIO100_KP_MKIN_0,
- GPIO101_KP_MKIN_1,
- GPIO102_KP_MKIN_2,
- GPIO97_KP_MKIN_3,
+ GPIO100_KP_MKIN_0 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO101_KP_MKIN_1 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO102_KP_MKIN_2 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO97_KP_MKIN_3 | WAKEUP_ON_LEVEL_HIGH,
GPIO103_KP_MKOUT_0,
GPIO104_KP_MKOUT_1,
GPIO105_KP_MKOUT_2,
.lcd_conn = LCD_COLOR_TFT_16BPP | LCD_PCLK_EDGE_FALL,
};
+/******************************************************************************
+ * Power management - standby
+ ******************************************************************************/
+#ifdef CONFIG_PM
+static u32 *addr __initdata;
+static u32 resume[3] __initdata = {
+ 0xe3a00101, /* mov r0, #0x40000000 */
+ 0xe380060f, /* orr r0, r0, #0x00f00000 */
+ 0xe590f008, /* ldr pc, [r0, #0x08] */
+};
+
+static int __init palmld_pm_init(void)
+{
+ int i;
+
+ /* this is where the bootloader jumps */
+ addr = phys_to_virt(PALMLD_STR_BASE);
+
+ for (i = 0; i < 3; i++)
+ addr[i] = resume[i];
+
+ return 0;
+}
+
+device_initcall(palmld_pm_init);
+#endif
+
/******************************************************************************
* Machine init
******************************************************************************/
GPIO95_GPIO, /* usb power */
/* MATRIX KEYPAD */
- GPIO100_KP_MKIN_0,
- GPIO101_KP_MKIN_1,
- GPIO102_KP_MKIN_2,
- GPIO97_KP_MKIN_3,
+ GPIO100_KP_MKIN_0 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO101_KP_MKIN_1 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO102_KP_MKIN_2 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO97_KP_MKIN_3 | WAKEUP_ON_LEVEL_HIGH,
GPIO103_KP_MKOUT_0,
GPIO104_KP_MKOUT_1,
GPIO105_KP_MKOUT_2,
.lcd_conn = LCD_COLOR_TFT_16BPP | LCD_PCLK_EDGE_FALL,
};
+/******************************************************************************
+ * Power management - standby
+ ******************************************************************************/
+#ifdef CONFIG_PM
+static u32 *addr __initdata;
+static u32 resume[3] __initdata = {
+ 0xe3a00101, /* mov r0, #0x40000000 */
+ 0xe380060f, /* orr r0, r0, #0x00f00000 */
+ 0xe590f008, /* ldr pc, [r0, #0x08] */
+};
+
+static int __init palmt5_pm_init(void)
+{
+ int i;
+
+ /* this is where the bootloader jumps */
+ addr = phys_to_virt(PALMT5_STR_BASE);
+
+ for (i = 0; i < 3; i++)
+ addr[i] = resume[i];
+
+ return 0;
+}
+
+device_initcall(palmt5_pm_init);
+#endif
+
/******************************************************************************
* Machine init
******************************************************************************/
--- /dev/null
+/*
+ * Hardware definitions for Palm Tungsten|E2
+ *
+ * Author:
+ * Carlos Eduardo Medaglia Dyonisio <cadu@nerdfeliz.com>
+ *
+ * Rewrite for mainline:
+ * Marek Vasut <marek.vasut@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * (find more info at www.hackndev.com)
+ *
+ */
+
+#include <linux/platform_device.h>
+#include <linux/delay.h>
+#include <linux/irq.h>
+#include <linux/gpio_keys.h>
+#include <linux/input.h>
+#include <linux/pda_power.h>
+#include <linux/pwm_backlight.h>
+#include <linux/gpio.h>
+#include <linux/wm97xx_batt.h>
+#include <linux/power_supply.h>
+
+#include <asm/mach-types.h>
+#include <asm/mach/arch.h>
+#include <asm/mach/map.h>
+
+#include <mach/audio.h>
+#include <mach/palmte2.h>
+#include <mach/mmc.h>
+#include <mach/pxafb.h>
+#include <mach/mfp-pxa25x.h>
+#include <mach/irda.h>
+#include <mach/udc.h>
+
+#include "generic.h"
+#include "devices.h"
+
+/******************************************************************************
+ * Pin configuration
+ ******************************************************************************/
+static unsigned long palmte2_pin_config[] __initdata = {
+ /* MMC */
+ GPIO6_MMC_CLK,
+ GPIO8_MMC_CS0,
+ GPIO10_GPIO, /* SD detect */
+ GPIO55_GPIO, /* SD power */
+ GPIO51_GPIO, /* SD r/o switch */
+
+ /* AC97 */
+ GPIO28_AC97_BITCLK,
+ GPIO29_AC97_SDATA_IN_0,
+ GPIO30_AC97_SDATA_OUT,
+ GPIO31_AC97_SYNC,
+
+ /* PWM */
+ GPIO16_PWM0_OUT,
+
+ /* USB */
+ GPIO15_GPIO, /* usb detect */
+ GPIO53_GPIO, /* usb power */
+
+ /* IrDA */
+ GPIO48_GPIO, /* ir disable */
+ GPIO46_FICP_RXD,
+ GPIO47_FICP_TXD,
+
+ /* LCD */
+ GPIO58_LCD_LDD_0,
+ GPIO59_LCD_LDD_1,
+ GPIO60_LCD_LDD_2,
+ GPIO61_LCD_LDD_3,
+ GPIO62_LCD_LDD_4,
+ GPIO63_LCD_LDD_5,
+ GPIO64_LCD_LDD_6,
+ GPIO65_LCD_LDD_7,
+ GPIO66_LCD_LDD_8,
+ GPIO67_LCD_LDD_9,
+ GPIO68_LCD_LDD_10,
+ GPIO69_LCD_LDD_11,
+ GPIO70_LCD_LDD_12,
+ GPIO71_LCD_LDD_13,
+ GPIO72_LCD_LDD_14,
+ GPIO73_LCD_LDD_15,
+ GPIO74_LCD_FCLK,
+ GPIO75_LCD_LCLK,
+ GPIO76_LCD_PCLK,
+ GPIO77_LCD_BIAS,
+
+ /* GPIO KEYS */
+ GPIO5_GPIO, /* notes */
+ GPIO7_GPIO, /* tasks */
+ GPIO11_GPIO, /* calendar */
+ GPIO13_GPIO, /* contacts */
+ GPIO14_GPIO, /* center */
+ GPIO19_GPIO, /* left */
+ GPIO20_GPIO, /* right */
+ GPIO21_GPIO, /* down */
+ GPIO22_GPIO, /* up */
+
+ /* MISC */
+ GPIO1_RST, /* reset */
+ GPIO4_GPIO, /* Hotsync button */
+ GPIO9_GPIO, /* power detect */
+ GPIO37_GPIO, /* LCD power */
+ GPIO56_GPIO, /* Backlight power */
+};
+
+/******************************************************************************
+ * SD/MMC card controller
+ ******************************************************************************/
+static int palmte2_mci_init(struct device *dev,
+ irq_handler_t palmte2_detect_int, void *data)
+{
+ int err = 0;
+
+ /* Setup an interrupt for detecting card insert/remove events */
+ err = gpio_request(GPIO_NR_PALMTE2_SD_DETECT_N, "SD IRQ");
+ if (err)
+ goto err;
+ err = gpio_direction_input(GPIO_NR_PALMTE2_SD_DETECT_N);
+ if (err)
+ goto err2;
+ err = request_irq(gpio_to_irq(GPIO_NR_PALMTE2_SD_DETECT_N),
+ palmte2_detect_int, IRQF_DISABLED | IRQF_SAMPLE_RANDOM |
+ IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING,
+ "SD/MMC card detect", data);
+ if (err) {
+ printk(KERN_ERR "%s: cannot request SD/MMC card detect IRQ\n",
+ __func__);
+ goto err2;
+ }
+
+ err = gpio_request(GPIO_NR_PALMTE2_SD_POWER, "SD_POWER");
+ if (err)
+ goto err3;
+ err = gpio_direction_output(GPIO_NR_PALMTE2_SD_POWER, 0);
+ if (err)
+ goto err4;
+
+ err = gpio_request(GPIO_NR_PALMTE2_SD_READONLY, "SD_READONLY");
+ if (err)
+ goto err4;
+ err = gpio_direction_input(GPIO_NR_PALMTE2_SD_READONLY);
+ if (err)
+ goto err5;
+
+ printk(KERN_DEBUG "%s: irq registered\n", __func__);
+
+ return 0;
+
+err5:
+ gpio_free(GPIO_NR_PALMTE2_SD_READONLY);
+err4:
+ gpio_free(GPIO_NR_PALMTE2_SD_POWER);
+err3:
+ free_irq(gpio_to_irq(GPIO_NR_PALMTE2_SD_DETECT_N), data);
+err2:
+ gpio_free(GPIO_NR_PALMTE2_SD_DETECT_N);
+err:
+ return err;
+}
+
+static void palmte2_mci_exit(struct device *dev, void *data)
+{
+ gpio_free(GPIO_NR_PALMTE2_SD_READONLY);
+ gpio_free(GPIO_NR_PALMTE2_SD_POWER);
+ free_irq(gpio_to_irq(GPIO_NR_PALMTE2_SD_DETECT_N), data);
+ gpio_free(GPIO_NR_PALMTE2_SD_DETECT_N);
+}
+
+static void palmte2_mci_power(struct device *dev, unsigned int vdd)
+{
+ struct pxamci_platform_data *p_d = dev->platform_data;
+ gpio_set_value(GPIO_NR_PALMTE2_SD_POWER, p_d->ocr_mask & (1 << vdd));
+}
+
+static int palmte2_mci_get_ro(struct device *dev)
+{
+ return gpio_get_value(GPIO_NR_PALMTE2_SD_READONLY);
+}
+
+static struct pxamci_platform_data palmte2_mci_platform_data = {
+ .ocr_mask = MMC_VDD_32_33 | MMC_VDD_33_34,
+ .setpower = palmte2_mci_power,
+ .get_ro = palmte2_mci_get_ro,
+ .init = palmte2_mci_init,
+ .exit = palmte2_mci_exit,
+};
+
+/******************************************************************************
+ * GPIO keys
+ ******************************************************************************/
+static struct gpio_keys_button palmte2_pxa_buttons[] = {
+ {KEY_F1, GPIO_NR_PALMTE2_KEY_CONTACTS, 1, "Contacts" },
+ {KEY_F2, GPIO_NR_PALMTE2_KEY_CALENDAR, 1, "Calendar" },
+ {KEY_F3, GPIO_NR_PALMTE2_KEY_TASKS, 1, "Tasks" },
+ {KEY_F4, GPIO_NR_PALMTE2_KEY_NOTES, 1, "Notes" },
+ {KEY_ENTER, GPIO_NR_PALMTE2_KEY_CENTER, 1, "Center" },
+ {KEY_LEFT, GPIO_NR_PALMTE2_KEY_LEFT, 1, "Left" },
+ {KEY_RIGHT, GPIO_NR_PALMTE2_KEY_RIGHT, 1, "Right" },
+ {KEY_DOWN, GPIO_NR_PALMTE2_KEY_DOWN, 1, "Down" },
+ {KEY_UP, GPIO_NR_PALMTE2_KEY_UP, 1, "Up" },
+};
+
+static struct gpio_keys_platform_data palmte2_pxa_keys_data = {
+ .buttons = palmte2_pxa_buttons,
+ .nbuttons = ARRAY_SIZE(palmte2_pxa_buttons),
+};
+
+static struct platform_device palmte2_pxa_keys = {
+ .name = "gpio-keys",
+ .id = -1,
+ .dev = {
+ .platform_data = &palmte2_pxa_keys_data,
+ },
+};
+
+/******************************************************************************
+ * Backlight
+ ******************************************************************************/
+static int palmte2_backlight_init(struct device *dev)
+{
+ int ret;
+
+ ret = gpio_request(GPIO_NR_PALMTE2_BL_POWER, "BL POWER");
+ if (ret)
+ goto err;
+ ret = gpio_direction_output(GPIO_NR_PALMTE2_BL_POWER, 0);
+ if (ret)
+ goto err2;
+ ret = gpio_request(GPIO_NR_PALMTE2_LCD_POWER, "LCD POWER");
+ if (ret)
+ goto err2;
+ ret = gpio_direction_output(GPIO_NR_PALMTE2_LCD_POWER, 0);
+ if (ret)
+ goto err3;
+
+ return 0;
+err3:
+ gpio_free(GPIO_NR_PALMTE2_LCD_POWER);
+err2:
+ gpio_free(GPIO_NR_PALMTE2_BL_POWER);
+err:
+ return ret;
+}
+
+static int palmte2_backlight_notify(int brightness)
+{
+ gpio_set_value(GPIO_NR_PALMTE2_BL_POWER, brightness);
+ gpio_set_value(GPIO_NR_PALMTE2_LCD_POWER, brightness);
+ return brightness;
+}
+
+static void palmte2_backlight_exit(struct device *dev)
+{
+ gpio_free(GPIO_NR_PALMTE2_BL_POWER);
+ gpio_free(GPIO_NR_PALMTE2_LCD_POWER);
+}
+
+static struct platform_pwm_backlight_data palmte2_backlight_data = {
+ .pwm_id = 0,
+ .max_brightness = PALMTE2_MAX_INTENSITY,
+ .dft_brightness = PALMTE2_MAX_INTENSITY,
+ .pwm_period_ns = PALMTE2_PERIOD_NS,
+ .init = palmte2_backlight_init,
+ .notify = palmte2_backlight_notify,
+ .exit = palmte2_backlight_exit,
+};
+
+static struct platform_device palmte2_backlight = {
+ .name = "pwm-backlight",
+ .dev = {
+ .parent = &pxa25x_device_pwm0.dev,
+ .platform_data = &palmte2_backlight_data,
+ },
+};
+
+/******************************************************************************
+ * IrDA
+ ******************************************************************************/
+static int palmte2_irda_startup(struct device *dev)
+{
+ int err;
+ err = gpio_request(GPIO_NR_PALMTE2_IR_DISABLE, "IR DISABLE");
+ if (err)
+ goto err;
+ err = gpio_direction_output(GPIO_NR_PALMTE2_IR_DISABLE, 1);
+ if (err)
+ gpio_free(GPIO_NR_PALMTE2_IR_DISABLE);
+err:
+ return err;
+}
+
+static void palmte2_irda_shutdown(struct device *dev)
+{
+ gpio_free(GPIO_NR_PALMTE2_IR_DISABLE);
+}
+
+static void palmte2_irda_transceiver_mode(struct device *dev, int mode)
+{
+ gpio_set_value(GPIO_NR_PALMTE2_IR_DISABLE, mode & IR_OFF);
+ pxa2xx_transceiver_mode(dev, mode);
+}
+
+static struct pxaficp_platform_data palmte2_ficp_platform_data = {
+ .startup = palmte2_irda_startup,
+ .shutdown = palmte2_irda_shutdown,
+ .transceiver_cap = IR_SIRMODE | IR_FIRMODE | IR_OFF,
+ .transceiver_mode = palmte2_irda_transceiver_mode,
+};
+
+/******************************************************************************
+ * UDC
+ ******************************************************************************/
+static struct pxa2xx_udc_mach_info palmte2_udc_info __initdata = {
+ .gpio_vbus = GPIO_NR_PALMTE2_USB_DETECT_N,
+ .gpio_vbus_inverted = 1,
+ .gpio_pullup = GPIO_NR_PALMTE2_USB_PULLUP,
+ .gpio_pullup_inverted = 0,
+};
+
+/******************************************************************************
+ * Power supply
+ ******************************************************************************/
+static int power_supply_init(struct device *dev)
+{
+ int ret;
+
+ ret = gpio_request(GPIO_NR_PALMTE2_POWER_DETECT, "CABLE_STATE_AC");
+ if (ret)
+ goto err1;
+ ret = gpio_direction_input(GPIO_NR_PALMTE2_POWER_DETECT);
+ if (ret)
+ goto err2;
+
+ return 0;
+
+err2:
+ gpio_free(GPIO_NR_PALMTE2_POWER_DETECT);
+err1:
+ return ret;
+}
+
+static int palmte2_is_ac_online(void)
+{
+ return gpio_get_value(GPIO_NR_PALMTE2_POWER_DETECT);
+}
+
+static void power_supply_exit(struct device *dev)
+{
+ gpio_free(GPIO_NR_PALMTE2_POWER_DETECT);
+}
+
+static char *palmte2_supplicants[] = {
+ "main-battery",
+};
+
+static struct pda_power_pdata power_supply_info = {
+ .init = power_supply_init,
+ .is_ac_online = palmte2_is_ac_online,
+ .exit = power_supply_exit,
+ .supplied_to = palmte2_supplicants,
+ .num_supplicants = ARRAY_SIZE(palmte2_supplicants),
+};
+
+static struct platform_device power_supply = {
+ .name = "pda-power",
+ .id = -1,
+ .dev = {
+ .platform_data = &power_supply_info,
+ },
+};
+
+/******************************************************************************
+ * WM97xx battery
+ ******************************************************************************/
+static struct wm97xx_batt_info wm97xx_batt_pdata = {
+ .batt_aux = WM97XX_AUX_ID3,
+ .temp_aux = WM97XX_AUX_ID2,
+ .charge_gpio = -1,
+ .max_voltage = PALMTE2_BAT_MAX_VOLTAGE,
+ .min_voltage = PALMTE2_BAT_MIN_VOLTAGE,
+ .batt_mult = 1000,
+ .batt_div = 414,
+ .temp_mult = 1,
+ .temp_div = 1,
+ .batt_tech = POWER_SUPPLY_TECHNOLOGY_LIPO,
+ .batt_name = "main-batt",
+};
+
+/******************************************************************************
+ * Framebuffer
+ ******************************************************************************/
+static struct pxafb_mode_info palmte2_lcd_modes[] = {
+{
+ .pixclock = 77757,
+ .xres = 320,
+ .yres = 320,
+ .bpp = 16,
+
+ .left_margin = 28,
+ .right_margin = 7,
+ .upper_margin = 7,
+ .lower_margin = 5,
+
+ .hsync_len = 4,
+ .vsync_len = 1,
+},
+};
+
+static struct pxafb_mach_info palmte2_lcd_screen = {
+ .modes = palmte2_lcd_modes,
+ .num_modes = ARRAY_SIZE(palmte2_lcd_modes),
+ .lcd_conn = LCD_COLOR_TFT_16BPP | LCD_PCLK_EDGE_FALL,
+};
+
+/******************************************************************************
+ * Machine init
+ ******************************************************************************/
+static struct platform_device *devices[] __initdata = {
+#if defined(CONFIG_KEYBOARD_GPIO) || defined(CONFIG_KEYBOARD_GPIO_MODULE)
+ &palmte2_pxa_keys,
+#endif
+ &palmte2_backlight,
+ &power_supply,
+};
+
+/* setup udc GPIOs initial state */
+static void __init palmte2_udc_init(void)
+{
+ if (!gpio_request(GPIO_NR_PALMTE2_USB_PULLUP, "UDC Vbus")) {
+ gpio_direction_output(GPIO_NR_PALMTE2_USB_PULLUP, 1);
+ gpio_free(GPIO_NR_PALMTE2_USB_PULLUP);
+ }
+}
+
+static void __init palmte2_init(void)
+{
+ pxa2xx_mfp_config(ARRAY_AND_SIZE(palmte2_pin_config));
+
+ set_pxa_fb_info(&palmte2_lcd_screen);
+ pxa_set_mci_info(&palmte2_mci_platform_data);
+ palmte2_udc_init();
+ pxa_set_udc_info(&palmte2_udc_info);
+ pxa_set_ac97_info(NULL);
+ pxa_set_ficp_info(&palmte2_ficp_platform_data);
+ wm97xx_bat_set_pdata(&wm97xx_batt_pdata);
+
+ platform_add_devices(devices, ARRAY_SIZE(devices));
+}
+
+MACHINE_START(PALMTE2, "Palm Tungsten|E2")
+ .phys_io = 0x40000000,
+ .io_pg_offst = (io_p2v(0x40000000) >> 18) & 0xfffc,
+ .boot_params = 0xa0000100,
+ .map_io = pxa_map_io,
+ .init_irq = pxa25x_init_irq,
+ .timer = &pxa_timer,
+ .init_machine = palmte2_init
+MACHINE_END
GPIO116_GPIO, /* wifi ready */
/* MATRIX KEYPAD */
- GPIO100_KP_MKIN_0,
- GPIO101_KP_MKIN_1,
- GPIO102_KP_MKIN_2,
- GPIO97_KP_MKIN_3,
+ GPIO100_KP_MKIN_0 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO101_KP_MKIN_1 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO102_KP_MKIN_2 | WAKEUP_ON_LEVEL_HIGH,
+ GPIO97_KP_MKIN_3 | WAKEUP_ON_LEVEL_HIGH,
GPIO103_KP_MKOUT_0,
GPIO104_KP_MKOUT_1,
GPIO105_KP_MKOUT_2,
.lcd_conn = LCD_COLOR_TFT_16BPP | LCD_PCLK_EDGE_FALL,
};
+/******************************************************************************
+ * Power management - standby
+ ******************************************************************************/
+#ifdef CONFIG_PM
+static u32 *addr __initdata;
+static u32 resume[3] __initdata = {
+ 0xe3a00101, /* mov r0, #0x40000000 */
+ 0xe380060f, /* orr r0, r0, #0x00f00000 */
+ 0xe590f008, /* ldr pc, [r0, #0x08] */
+};
+
+static int __init palmtx_pm_init(void)
+{
+ int i;
+
+ /* this is where the bootloader jumps */
+ addr = phys_to_virt(PALMTX_STR_BASE);
+
+ for (i = 0; i < 3; i++)
+ addr[i] = resume[i];
+
+ return 0;
+}
+
+device_initcall(palmtx_pm_init);
+#endif
+
/******************************************************************************
* Machine init
******************************************************************************/
#include <mach/udc.h>
#include <mach/tosa_bt.h>
#include <mach/pxa2xx_spi.h>
+#include <mach/audio.h>
#include <asm/mach/arch.h>
#include <mach/tosa.h>
pxa_set_udc_info(&udc_info);
pxa_set_ficp_info(&tosa_ficp_platform_data);
pxa_set_i2c_info(NULL);
+ pxa_set_ac97_info(NULL);
platform_scoop_config = &tosa_pcmcia_config;
pxa2xx_set_spi_info(2, &pxa_ssp_master_info);
BOOTMEM_DEFAULT);
}
+ if (machine_is_palmld() || machine_is_palmtx()) {
+ reserve_bootmem_node(pgdat, 0xa0000000, 0x1000,
+ BOOTMEM_EXCLUSIVE);
+ reserve_bootmem_node(pgdat, 0xa0200000, 0x1000,
+ BOOTMEM_EXCLUSIVE);
+ }
+
+ if (machine_is_palmt5())
+ reserve_bootmem_node(pgdat, 0xa0200000, 0x1000,
+ BOOTMEM_EXCLUSIVE);
+
#ifdef CONFIG_SA1111
/*
* Because of the SA1111 DMA bug, we want to preserve our
#define __NR_dup3 1316
#define __NR_pipe2 1317
#define __NR_inotify_init1 1318
+#define __NR_preadv 1319
+#define __NR_pwritev 1320
#ifdef __KERNEL__
-#define NR_syscalls 295 /* length of syscall table */
+#define NR_syscalls 297 /* length of syscall table */
/*
* The following defines stop scripts/checksyscalls.sh from complaining about
data8 sys_dup3
data8 sys_pipe2
data8 sys_inotify_init1
+ data8 sys_preadv
+ data8 sys_pwritev // 1320
.org sys_call_table + 8*NR_syscalls // guard against failures to increase NR_syscalls
#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
int irq;
for (irq = 0; irq < NR_IRQS; irq++)
- if (irq_desc[irq].chip == &no_irq_type)
+ if (irq_desc[irq].chip == &no_irq_chip)
/* due to the PIC latching interrupt requests, even
* when the IRQ is disabled, IRQ_PENDING is superfluous
* and we can use handle_level_irq() for edge-triggered
#define __NR_pipe2 321
#define __NR_inotify_init1 322
#define __NR_accept4 323
+#define __NR_preadv 324
+#define __NR_pwritev 325
-#define NR_SYSCALLS 324
+#define NR_SYSCALLS 326
#ifdef __32bit_syscall_numbers__
/* Sparc 32-bit only has the "setresuid32", "getresuid32" variants,
return r;
}
-static void __init get_cells(struct device_node *dp,
- int *addrc, int *sizec)
+static void get_cells(struct device_node *dp, int *addrc, int *sizec)
{
if (addrc)
*addrc = of_n_addr_cells(dp);
upa_writeq(~(u64)0, pbm->pbm_regs + FIRE_PEC_IENAB);
}
-static int __init pci_fire_pbm_init(struct pci_pbm_info *pbm,
- struct of_device *op, u32 portid)
+static int __devinit pci_fire_pbm_init(struct pci_pbm_info *pbm,
+ struct of_device *op, u32 portid)
{
const struct linux_prom64_registers *regs;
struct device_node *dp = op->node;
pci_config_write8(addr, 64);
}
-static void __init psycho_scan_bus(struct pci_pbm_info *pbm,
- struct device *parent)
+static void __devinit psycho_scan_bus(struct pci_pbm_info *pbm,
+ struct device *parent)
{
pbm_config_busmastering(pbm);
pbm->is_66mhz_capable = 0;
#define PSYCHO_MEMSPACE_B 0x180000000UL
#define PSYCHO_MEMSPACE_SIZE 0x07fffffffUL
-static void __init psycho_pbm_init(struct pci_pbm_info *pbm,
- struct of_device *op, int is_pbm_a)
+static void __devinit psycho_pbm_init(struct pci_pbm_info *pbm,
+ struct of_device *op, int is_pbm_a)
{
psycho_pbm_init_common(pbm, op, "PSYCHO", PBM_CHIP_TYPE_PSYCHO);
psycho_pbm_strbuf_init(pbm, is_pbm_a);
}
}
-static void __init sabre_scan_bus(struct pci_pbm_info *pbm,
- struct device *parent)
+static void __devinit sabre_scan_bus(struct pci_pbm_info *pbm,
+ struct device *parent)
{
static int once;
sabre_register_error_handlers(pbm);
}
-static void __init sabre_pbm_init(struct pci_pbm_info *pbm,
- struct of_device *op)
+static void __devinit sabre_pbm_init(struct pci_pbm_info *pbm,
+ struct of_device *op)
{
psycho_pbm_init_common(pbm, op, "SABRE", PBM_CHIP_TYPE_SABRE);
pbm->pci_afsr = pbm->controller_regs + SABRE_PIOAFSR;
.sync_sg_for_cpu = dma_4v_sync_sg_for_cpu,
};
-static void __init pci_sun4v_scan_bus(struct pci_pbm_info *pbm,
- struct device *parent)
+static void __devinit pci_sun4v_scan_bus(struct pci_pbm_info *pbm,
+ struct device *parent)
{
struct property *prop;
struct device_node *dp;
/* XXX register error interrupt handlers XXX */
}
-static unsigned long __init probe_existing_entries(struct pci_pbm_info *pbm,
- struct iommu *iommu)
+static unsigned long __devinit probe_existing_entries(struct pci_pbm_info *pbm,
+ struct iommu *iommu)
{
struct iommu_arena *arena = &iommu->arena;
unsigned long i, cnt = 0;
return cnt;
}
-static int __init pci_sun4v_iommu_init(struct pci_pbm_info *pbm)
+static int __devinit pci_sun4v_iommu_init(struct pci_pbm_info *pbm)
{
static const u32 vdma_default[] = { 0x80000000, 0x80000000 };
struct iommu *iommu = pbm->iommu;
}
#endif /* !(CONFIG_PCI_MSI) */
-static int __init pci_sun4v_pbm_init(struct pci_pbm_info *pbm,
- struct of_device *op, u32 devhandle)
+static int __devinit pci_sun4v_pbm_init(struct pci_pbm_info *pbm,
+ struct of_device *op, u32 devhandle)
{
struct device_node *dp = op->node;
int err;
return IRQ_HANDLED;
}
-static int __init has_button_interrupt(unsigned int irq, struct device_node *dp)
+static int __devinit has_button_interrupt(unsigned int irq, struct device_node *dp)
{
if (irq == 0xffffffff)
return 0;
/*305*/ .long sys_set_mempolicy, sys_kexec_load, sys_move_pages, sys_getcpu, sys_epoll_pwait
/*310*/ .long sys_utimensat, sys_signalfd, sys_timerfd_create, sys_eventfd, sys_fallocate
/*315*/ .long sys_timerfd_settime, sys_timerfd_gettime, sys_signalfd4, sys_eventfd2, sys_epoll_create1
-/*320*/ .long sys_dup3, sys_pipe2, sys_inotify_init1, sys_accept4
+/*320*/ .long sys_dup3, sys_pipe2, sys_inotify_init1, sys_accept4, sys_preadv, sys_pwritev
.word compat_sys_set_mempolicy, compat_sys_kexec_load, compat_sys_move_pages, sys_getcpu, compat_sys_epoll_pwait
/*310*/ .word compat_sys_utimensat, compat_sys_signalfd, sys_timerfd_create, sys_eventfd, compat_sys_fallocate
.word compat_sys_timerfd_settime, compat_sys_timerfd_gettime, compat_sys_signalfd4, sys_eventfd2, sys_epoll_create1
-/*320*/ .word sys_dup3, sys_pipe2, sys_inotify_init1, sys_accept4
+/*320*/ .word sys_dup3, sys_pipe2, sys_inotify_init1, sys_accept4, compat_sys_preadv, compat_sys_pwritev
#endif /* CONFIG_COMPAT */
.word sys_set_mempolicy, sys_kexec_load, sys_move_pages, sys_getcpu, sys_epoll_pwait
/*310*/ .word sys_utimensat, sys_signalfd, sys_timerfd_create, sys_eventfd, sys_fallocate
.word sys_timerfd_settime, sys_timerfd_gettime, sys_signalfd4, sys_eventfd2, sys_epoll_create1
-/*320*/ .word sys_dup3, sys_pipe2, sys_inotify_init1, sys_accept4
+/*320*/ .word sys_dup3, sys_pipe2, sys_inotify_init1, sys_accept4, sys_preadv, sys_pwritev
#define MAX_BANKS 32
-static struct linux_prom64_registers pavail[MAX_BANKS] __initdata;
-static int pavail_ents __initdata;
+static struct linux_prom64_registers pavail[MAX_BANKS] __devinitdata;
+static int pavail_ents __devinitdata;
static int cmp_p64(const void *a, const void *b)
{
return nid;
}
-static void add_node_ranges(void)
+static void __init add_node_ranges(void)
{
int i;
printk("Booting Linux...\n");
}
-int __init page_in_phys_avail(unsigned long paddr)
+int __devinit page_in_phys_avail(unsigned long paddr)
{
int i;
* CPUID levels like 0x6, 0xA etc
*/
#define X86_FEATURE_IDA (7*32+ 0) /* Intel Dynamic Acceleration */
+#define X86_FEATURE_ARAT (7*32+ 1) /* Always Running APIC Timer */
/* Virtualization flags: Linux defined */
#define X86_FEATURE_TPR_SHADOW (8*32+ 0) /* Intel TPR Shadow */
{
struct clock_event_device *levt = &__get_cpu_var(lapic_events);
+ if (cpu_has(¤t_cpu_data, X86_FEATURE_ARAT)) {
+ lapic_clockevent.features &= ~CLOCK_EVT_FEAT_C3STOP;
+ /* Make LAPIC timer preferrable over percpu HPET */
+ lapic_clockevent.rating = 150;
+ }
+
memcpy(levt, &lapic_clockevent, sizeof(*levt));
levt->cpumask = cpumask_of(smp_processor_id());
static const struct cpuid_bit __cpuinitconst cpuid_bits[] = {
{ X86_FEATURE_IDA, CR_EAX, 1, 0x00000006 },
+ { X86_FEATURE_ARAT, CR_EAX, 2, 0x00000006 },
{ 0, 0, 0, 0 }
};
unsigned int max_freq;
unsigned int resume;
unsigned int cpu_feature;
+ u64 saved_aperf, saved_mperf;
};
static DEFINE_PER_CPU(struct acpi_cpufreq_data *, drv_data);
return cmd.val;
}
-struct perf_cur {
+struct perf_pair {
union {
struct {
u32 lo;
u32 hi;
} split;
u64 whole;
- } aperf_cur, mperf_cur;
+ } aperf, mperf;
};
static long read_measured_perf_ctrs(void *_cur)
{
- struct perf_cur *cur = _cur;
+ struct perf_pair *cur = _cur;
- rdmsr(MSR_IA32_APERF, cur->aperf_cur.split.lo, cur->aperf_cur.split.hi);
- rdmsr(MSR_IA32_MPERF, cur->mperf_cur.split.lo, cur->mperf_cur.split.hi);
-
- wrmsr(MSR_IA32_APERF, 0, 0);
- wrmsr(MSR_IA32_MPERF, 0, 0);
+ rdmsr(MSR_IA32_APERF, cur->aperf.split.lo, cur->aperf.split.hi);
+ rdmsr(MSR_IA32_MPERF, cur->mperf.split.lo, cur->mperf.split.hi);
return 0;
}
static unsigned int get_measured_perf(struct cpufreq_policy *policy,
unsigned int cpu)
{
- struct perf_cur cur;
+ struct perf_pair readin, cur;
unsigned int perf_percent;
unsigned int retval;
- if (!work_on_cpu(cpu, read_measured_perf_ctrs, &cur))
+ if (!work_on_cpu(cpu, read_measured_perf_ctrs, &readin))
return 0;
+ cur.aperf.whole = readin.aperf.whole -
+ per_cpu(drv_data, cpu)->saved_aperf;
+ cur.mperf.whole = readin.mperf.whole -
+ per_cpu(drv_data, cpu)->saved_mperf;
+ per_cpu(drv_data, cpu)->saved_aperf = readin.aperf.whole;
+ per_cpu(drv_data, cpu)->saved_mperf = readin.mperf.whole;
+
#ifdef __i386__
/*
* We dont want to do 64 bit divide with 32 bit kernel
* Get an approximate value. Return failure in case we cannot get
* an approximate value.
*/
- if (unlikely(cur.aperf_cur.split.hi || cur.mperf_cur.split.hi)) {
+ if (unlikely(cur.aperf.split.hi || cur.mperf.split.hi)) {
int shift_count;
u32 h;
- h = max_t(u32, cur.aperf_cur.split.hi, cur.mperf_cur.split.hi);
+ h = max_t(u32, cur.aperf.split.hi, cur.mperf.split.hi);
shift_count = fls(h);
- cur.aperf_cur.whole >>= shift_count;
- cur.mperf_cur.whole >>= shift_count;
+ cur.aperf.whole >>= shift_count;
+ cur.mperf.whole >>= shift_count;
}
- if (((unsigned long)(-1) / 100) < cur.aperf_cur.split.lo) {
+ if (((unsigned long)(-1) / 100) < cur.aperf.split.lo) {
int shift_count = 7;
- cur.aperf_cur.split.lo >>= shift_count;
- cur.mperf_cur.split.lo >>= shift_count;
+ cur.aperf.split.lo >>= shift_count;
+ cur.mperf.split.lo >>= shift_count;
}
- if (cur.aperf_cur.split.lo && cur.mperf_cur.split.lo)
- perf_percent = (cur.aperf_cur.split.lo * 100) /
- cur.mperf_cur.split.lo;
+ if (cur.aperf.split.lo && cur.mperf.split.lo)
+ perf_percent = (cur.aperf.split.lo * 100) / cur.mperf.split.lo;
else
perf_percent = 0;
#else
- if (unlikely(((unsigned long)(-1) / 100) < cur.aperf_cur.whole)) {
+ if (unlikely(((unsigned long)(-1) / 100) < cur.aperf.whole)) {
int shift_count = 7;
- cur.aperf_cur.whole >>= shift_count;
- cur.mperf_cur.whole >>= shift_count;
+ cur.aperf.whole >>= shift_count;
+ cur.mperf.whole >>= shift_count;
}
- if (cur.aperf_cur.whole && cur.mperf_cur.whole)
- perf_percent = (cur.aperf_cur.whole * 100) /
- cur.mperf_cur.whole;
+ if (cur.aperf.whole && cur.mperf.whole)
+ perf_percent = (cur.aperf.whole * 100) / cur.mperf.whole;
else
perf_percent = 0;
#include <linux/timex.h>
#include <linux/io.h>
#include <linux/acpi.h>
-#include <linux/kernel.h>
#include <asm/msr.h>
#include <acpi/processor.h>
#include <linux/init.h>
#include <linux/list.h>
+#include <trace/syscall.h>
+
#include <asm/cacheflush.h>
#include <asm/ftrace.h>
#include <asm/nops.h>
#include <linux/audit.h>
#include <linux/seccomp.h>
#include <linux/signal.h>
-#include <linux/ftrace.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
#include <asm/proto.h>
#include <asm/ds.h>
+#include <trace/syscall.h>
+
#include "tls.h"
enum x86_regset {
{"PIT2", 0x0048, 0x004B, ACPI_OSI_WIN_XP},
{"RTC", 0x0070, 0x0071, ACPI_OSI_WIN_XP},
{"CMOS", 0x0074, 0x0076, ACPI_OSI_WIN_XP},
- {"DMA1", 0x0081, 0x0083, ACPI_OSI_WIN_XP},
{"DMA1L", 0x0087, 0x0087, ACPI_OSI_WIN_XP},
{"DMA2", 0x0089, 0x008B, ACPI_OSI_WIN_XP},
{"DMA2L", 0x008F, 0x008F, ACPI_OSI_WIN_XP},
},
};
-static void __init acpi_battery_init_async(void *unused, async_cookie_t cookie)
+static void acpi_battery_init_async(void *unused, async_cookie_t cookie)
{
if (acpi_disabled)
return;
}
#endif /* HAVE_ACPI_LEGACY_ALARM */
-extern struct list_head acpi_wakeup_device_list;
-extern spinlock_t acpi_device_lock;
-
static int
acpi_system_wakeup_device_seq_show(struct seq_file *seq, void *offset)
{
seq_printf(seq, "Device\tS-state\t Status Sysfs node\n");
- spin_lock(&acpi_device_lock);
+ mutex_lock(&acpi_device_lock);
list_for_each_safe(node, next, &acpi_wakeup_device_list) {
struct acpi_device *dev =
container_of(node, struct acpi_device, wakeup_list);
if (!dev->wakeup.flags.valid)
continue;
- spin_unlock(&acpi_device_lock);
ldev = acpi_get_physical_device(dev->handle);
seq_printf(seq, "%s\t S%d\t%c%-8s ",
seq_printf(seq, "\n");
put_device(ldev);
- spin_lock(&acpi_device_lock);
}
- spin_unlock(&acpi_device_lock);
+ mutex_unlock(&acpi_device_lock);
return 0;
}
strbuf[len] = '\0';
sscanf(strbuf, "%s", str);
- spin_lock(&acpi_device_lock);
+ mutex_lock(&acpi_device_lock);
list_for_each_safe(node, next, &acpi_wakeup_device_list) {
struct acpi_device *dev =
container_of(node, struct acpi_device, wakeup_list);
}
}
}
- spin_unlock(&acpi_device_lock);
+ mutex_unlock(&acpi_device_lock);
return count;
}
struct acpi_processor_power *pwr = &pr->power;
u8 type = local_apic_timer_c2_ok ? ACPI_STATE_C3 : ACPI_STATE_C2;
+ if (cpu_has(&cpu_data(pr->id), X86_FEATURE_ARAT))
+ return;
+
/*
* Check, if one of the previous states already marked the lapic
* unstable
static LIST_HEAD(acpi_device_list);
static LIST_HEAD(acpi_bus_id_list);
-DEFINE_SPINLOCK(acpi_device_lock);
+DEFINE_MUTEX(acpi_device_lock);
LIST_HEAD(acpi_wakeup_device_list);
struct acpi_device_bus_id{
*/
INIT_LIST_HEAD(&device->children);
INIT_LIST_HEAD(&device->node);
- INIT_LIST_HEAD(&device->g_list);
INIT_LIST_HEAD(&device->wakeup_list);
new_bus_id = kzalloc(sizeof(struct acpi_device_bus_id), GFP_KERNEL);
return -ENOMEM;
}
- spin_lock(&acpi_device_lock);
+ mutex_lock(&acpi_device_lock);
/*
* Find suitable bus_id and instance number in acpi_bus_id_list
* If failed, create one and link it into acpi_bus_id_list
}
dev_set_name(&device->dev, "%s:%02x", acpi_device_bus_id->bus_id, acpi_device_bus_id->instance_no);
- if (device->parent) {
+ if (device->parent)
list_add_tail(&device->node, &device->parent->children);
- list_add_tail(&device->g_list, &device->parent->g_list);
- } else
- list_add_tail(&device->g_list, &acpi_device_list);
+
if (device->wakeup.flags.valid)
list_add_tail(&device->wakeup_list, &acpi_wakeup_device_list);
- spin_unlock(&acpi_device_lock);
+ mutex_unlock(&acpi_device_lock);
if (device->parent)
device->dev.parent = &parent->dev;
device->removal_type = ACPI_BUS_REMOVAL_NORMAL;
return 0;
end:
- spin_lock(&acpi_device_lock);
- if (device->parent) {
+ mutex_lock(&acpi_device_lock);
+ if (device->parent)
list_del(&device->node);
- list_del(&device->g_list);
- } else
- list_del(&device->g_list);
list_del(&device->wakeup_list);
- spin_unlock(&acpi_device_lock);
+ mutex_unlock(&acpi_device_lock);
return result;
}
static void acpi_device_unregister(struct acpi_device *device, int type)
{
- spin_lock(&acpi_device_lock);
- if (device->parent) {
+ mutex_lock(&acpi_device_lock);
+ if (device->parent)
list_del(&device->node);
- list_del(&device->g_list);
- } else
- list_del(&device->g_list);
list_del(&device->wakeup_list);
- spin_unlock(&acpi_device_lock);
+ mutex_unlock(&acpi_device_lock);
acpi_detach_data(device->handle, acpi_bus_data_handler);
extern void acpi_enable_wakeup_device_prep(u8 sleep_state);
extern void acpi_enable_wakeup_device(u8 sleep_state);
extern void acpi_disable_wakeup_device(u8 sleep_state);
+
+extern struct list_head acpi_wakeup_device_list;
+extern struct mutex acpi_device_lock;
static int acpi_thermal_add(struct acpi_device *device);
static int acpi_thermal_remove(struct acpi_device *device, int type);
static int acpi_thermal_resume(struct acpi_device *device);
+static void acpi_thermal_notify(struct acpi_device *device, u32 event);
static int acpi_thermal_state_open_fs(struct inode *inode, struct file *file);
static int acpi_thermal_temp_open_fs(struct inode *inode, struct file *file);
static int acpi_thermal_trip_open_fs(struct inode *inode, struct file *file);
.add = acpi_thermal_add,
.remove = acpi_thermal_remove,
.resume = acpi_thermal_resume,
+ .notify = acpi_thermal_notify,
},
};
struct acpi_handle_list devices;
struct thermal_zone_device *thermal_zone;
int tz_enabled;
+ int kelvin_offset;
struct mutex lock;
};
}
/* sys I/F for generic thermal sysfs support */
-#define KELVIN_TO_MILLICELSIUS(t) (t * 100 - 273200)
+#define KELVIN_TO_MILLICELSIUS(t, off) (((t) - (off)) * 100)
static int thermal_get_temp(struct thermal_zone_device *thermal,
unsigned long *temp)
if (result)
return result;
- *temp = KELVIN_TO_MILLICELSIUS(tz->temperature);
+ *temp = KELVIN_TO_MILLICELSIUS(tz->temperature, tz->kelvin_offset);
return 0;
}
if (tz->trips.critical.flags.valid) {
if (!trip) {
*temp = KELVIN_TO_MILLICELSIUS(
- tz->trips.critical.temperature);
+ tz->trips.critical.temperature,
+ tz->kelvin_offset);
return 0;
}
trip--;
if (tz->trips.hot.flags.valid) {
if (!trip) {
*temp = KELVIN_TO_MILLICELSIUS(
- tz->trips.hot.temperature);
+ tz->trips.hot.temperature,
+ tz->kelvin_offset);
return 0;
}
trip--;
if (tz->trips.passive.flags.valid) {
if (!trip) {
*temp = KELVIN_TO_MILLICELSIUS(
- tz->trips.passive.temperature);
+ tz->trips.passive.temperature,
+ tz->kelvin_offset);
return 0;
}
trip--;
tz->trips.active[i].flags.valid; i++) {
if (!trip) {
*temp = KELVIN_TO_MILLICELSIUS(
- tz->trips.active[i].temperature);
+ tz->trips.active[i].temperature,
+ tz->kelvin_offset);
return 0;
}
trip--;
if (tz->trips.critical.flags.valid) {
*temperature = KELVIN_TO_MILLICELSIUS(
- tz->trips.critical.temperature);
+ tz->trips.critical.temperature,
+ tz->kelvin_offset);
return 0;
} else
return -EINVAL;
Driver Interface
-------------------------------------------------------------------------- */
-static void acpi_thermal_notify(acpi_handle handle, u32 event, void *data)
+static void acpi_thermal_notify(struct acpi_device *device, u32 event)
{
- struct acpi_thermal *tz = data;
- struct acpi_device *device = NULL;
+ struct acpi_thermal *tz = acpi_driver_data(device);
if (!tz)
return;
- device = tz->device;
-
switch (event) {
case ACPI_THERMAL_NOTIFY_TEMPERATURE:
acpi_thermal_check(tz);
"Unsupported event [0x%x]\n", event));
break;
}
-
- return;
}
static int acpi_thermal_get_info(struct acpi_thermal *tz)
return 0;
}
+/*
+ * The exact offset between Kelvin and degree Celsius is 273.15. However ACPI
+ * handles temperature values with a single decimal place. As a consequence,
+ * some implementations use an offset of 273.1 and others use an offset of
+ * 273.2. Try to find out which one is being used, to present the most
+ * accurate and visually appealing number.
+ *
+ * The heuristic below should work for all ACPI thermal zones which have a
+ * critical trip point with a value being a multiple of 0.5 degree Celsius.
+ */
+static void acpi_thermal_guess_offset(struct acpi_thermal *tz)
+{
+ if (tz->trips.critical.flags.valid &&
+ (tz->trips.critical.temperature % 5) == 1)
+ tz->kelvin_offset = 2731;
+ else
+ tz->kelvin_offset = 2732;
+}
+
static int acpi_thermal_add(struct acpi_device *device)
{
int result = 0;
- acpi_status status = AE_OK;
struct acpi_thermal *tz = NULL;
if (result)
goto free_memory;
+ acpi_thermal_guess_offset(tz);
+
result = acpi_thermal_register_thermal_zone(tz);
if (result)
goto free_memory;
if (result)
goto unregister_thermal_zone;
- status = acpi_install_notify_handler(device->handle,
- ACPI_DEVICE_NOTIFY,
- acpi_thermal_notify, tz);
- if (ACPI_FAILURE(status)) {
- result = -ENODEV;
- goto remove_fs;
- }
-
printk(KERN_INFO PREFIX "%s [%s] (%ld C)\n",
acpi_device_name(device), acpi_device_bid(device),
KELVIN_TO_CELSIUS(tz->temperature));
goto end;
-remove_fs:
- acpi_thermal_remove_fs(device);
unregister_thermal_zone:
thermal_zone_device_unregister(tz->thermal_zone);
free_memory:
static int acpi_thermal_remove(struct acpi_device *device, int type)
{
- acpi_status status = AE_OK;
struct acpi_thermal *tz = NULL;
if (!device || !acpi_driver_data(device))
tz = acpi_driver_data(device);
- status = acpi_remove_notify_handler(device->handle,
- ACPI_DEVICE_NOTIFY,
- acpi_thermal_notify);
-
acpi_thermal_remove_fs(device);
acpi_thermal_unregister_thermal_zone(tz);
mutex_destroy(&tz->lock);
static int acpi_video_bus_add(struct acpi_device *device);
static int acpi_video_bus_remove(struct acpi_device *device, int type);
static int acpi_video_resume(struct acpi_device *device);
+static void acpi_video_bus_notify(struct acpi_device *device, u32 event);
static const struct acpi_device_id video_device_ids[] = {
{ACPI_VIDEO_HID, 0},
.add = acpi_video_bus_add,
.remove = acpi_video_bus_remove,
.resume = acpi_video_resume,
+ .notify = acpi_video_bus_notify,
},
};
return acpi_video_bus_DOS(video, 0, 1);
}
-static void acpi_video_bus_notify(acpi_handle handle, u32 event, void *data)
+static void acpi_video_bus_notify(struct acpi_device *device, u32 event)
{
- struct acpi_video_bus *video = data;
- struct acpi_device *device = NULL;
+ struct acpi_video_bus *video = acpi_driver_data(device);
struct input_dev *input;
int keycode;
if (!video)
return;
- device = video->device;
input = video->input;
switch (event) {
static int acpi_video_bus_add(struct acpi_device *device)
{
- acpi_status status;
struct acpi_video_bus *video;
struct input_dev *input;
int error;
acpi_video_bus_get_devices(video, device);
acpi_video_bus_start_devices(video);
- status = acpi_install_notify_handler(device->handle,
- ACPI_DEVICE_NOTIFY,
- acpi_video_bus_notify, video);
- if (ACPI_FAILURE(status)) {
- printk(KERN_ERR PREFIX
- "Error installing notify handler\n");
- error = -ENODEV;
- goto err_stop_video;
- }
-
video->input = input = input_allocate_device();
if (!input) {
error = -ENOMEM;
- goto err_uninstall_notify;
+ goto err_stop_video;
}
snprintf(video->phys, sizeof(video->phys),
err_free_input_dev:
input_free_device(input);
- err_uninstall_notify:
- acpi_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
- acpi_video_bus_notify);
err_stop_video:
acpi_video_bus_stop_devices(video);
acpi_video_bus_put_devices(video);
static int acpi_video_bus_remove(struct acpi_device *device, int type)
{
- acpi_status status = 0;
struct acpi_video_bus *video = NULL;
video = acpi_driver_data(device);
acpi_video_bus_stop_devices(video);
-
- status = acpi_remove_notify_handler(video->device->handle,
- ACPI_DEVICE_NOTIFY,
- acpi_video_bus_notify);
-
acpi_video_bus_put_devices(video);
acpi_video_bus_remove_fs(device);
#include "internal.h"
#include "sleep.h"
+/*
+ * We didn't lock acpi_device_lock in the file, because it invokes oops in
+ * suspend/resume and isn't really required as this is called in S-state. At
+ * that time, there is no device hotplug
+ **/
#define _COMPONENT ACPI_SYSTEM_COMPONENT
ACPI_MODULE_NAME("wakeup_devices")
-extern struct list_head acpi_wakeup_device_list;
-extern spinlock_t acpi_device_lock;
-
/**
* acpi_enable_wakeup_device_prep - prepare wakeup devices
* @sleep_state: ACPI state
{
struct list_head *node, *next;
- spin_lock(&acpi_device_lock);
list_for_each_safe(node, next, &acpi_wakeup_device_list) {
struct acpi_device *dev = container_of(node,
struct acpi_device,
(sleep_state > (u32) dev->wakeup.sleep_state))
continue;
- spin_unlock(&acpi_device_lock);
acpi_enable_wakeup_device_power(dev, sleep_state);
- spin_lock(&acpi_device_lock);
}
- spin_unlock(&acpi_device_lock);
}
/**
* Caution: this routine must be invoked when interrupt is disabled
* Refer ACPI2.0: P212
*/
- spin_lock(&acpi_device_lock);
list_for_each_safe(node, next, &acpi_wakeup_device_list) {
struct acpi_device *dev =
container_of(node, struct acpi_device, wakeup_list);
if ((!dev->wakeup.state.enabled && !dev->wakeup.flags.prepared)
|| sleep_state > (u32) dev->wakeup.sleep_state) {
if (dev->wakeup.flags.run_wake) {
- spin_unlock(&acpi_device_lock);
/* set_gpe_type will disable GPE, leave it like that */
acpi_set_gpe_type(dev->wakeup.gpe_device,
dev->wakeup.gpe_number,
ACPI_GPE_TYPE_RUNTIME);
- spin_lock(&acpi_device_lock);
}
continue;
}
- spin_unlock(&acpi_device_lock);
if (!dev->wakeup.flags.run_wake)
acpi_enable_gpe(dev->wakeup.gpe_device,
dev->wakeup.gpe_number);
- spin_lock(&acpi_device_lock);
}
- spin_unlock(&acpi_device_lock);
}
/**
{
struct list_head *node, *next;
- spin_lock(&acpi_device_lock);
list_for_each_safe(node, next, &acpi_wakeup_device_list) {
struct acpi_device *dev =
container_of(node, struct acpi_device, wakeup_list);
if ((!dev->wakeup.state.enabled && !dev->wakeup.flags.prepared)
|| sleep_state > (u32) dev->wakeup.sleep_state) {
if (dev->wakeup.flags.run_wake) {
- spin_unlock(&acpi_device_lock);
acpi_set_gpe_type(dev->wakeup.gpe_device,
dev->wakeup.gpe_number,
ACPI_GPE_TYPE_WAKE_RUN);
/* Re-enable it, since set_gpe_type will disable it */
acpi_enable_gpe(dev->wakeup.gpe_device,
dev->wakeup.gpe_number);
- spin_lock(&acpi_device_lock);
}
continue;
}
- spin_unlock(&acpi_device_lock);
acpi_disable_wakeup_device_power(dev);
/* Never disable run-wake GPE */
if (!dev->wakeup.flags.run_wake) {
acpi_clear_gpe(dev->wakeup.gpe_device,
dev->wakeup.gpe_number, ACPI_NOT_ISR);
}
- spin_lock(&acpi_device_lock);
}
- spin_unlock(&acpi_device_lock);
}
int __init acpi_wakeup_device_init(void)
{
struct list_head *node, *next;
- spin_lock(&acpi_device_lock);
+ mutex_lock(&acpi_device_lock);
list_for_each_safe(node, next, &acpi_wakeup_device_list) {
struct acpi_device *dev = container_of(node,
struct acpi_device,
/* In case user doesn't load button driver */
if (!dev->wakeup.flags.run_wake || dev->wakeup.state.enabled)
continue;
- spin_unlock(&acpi_device_lock);
acpi_set_gpe_type(dev->wakeup.gpe_device,
dev->wakeup.gpe_number,
ACPI_GPE_TYPE_WAKE_RUN);
acpi_enable_gpe(dev->wakeup.gpe_device,
dev->wakeup.gpe_number);
dev->wakeup.state.enabled = 1;
- spin_lock(&acpi_device_lock);
}
- spin_unlock(&acpi_device_lock);
+ mutex_unlock(&acpi_device_lock);
return 0;
}
return dm_table_complete(table);
}
+static int table_prealloc_integrity(struct dm_table *t,
+ struct mapped_device *md)
+{
+ struct list_head *devices = dm_table_get_devices(t);
+ struct dm_dev_internal *dd;
+
+ list_for_each_entry(dd, devices, list)
+ if (bdev_get_integrity(dd->dm_dev.bdev))
+ return blk_integrity_register(dm_disk(md), NULL);
+
+ return 0;
+}
+
static int table_load(struct dm_ioctl *param, size_t param_size)
{
int r;
goto out;
}
+ r = table_prealloc_integrity(t, md);
+ if (r) {
+ DMERR("%s: could not register integrity profile.",
+ dm_device_name(md));
+ dm_table_destroy(t);
+ goto out;
+ }
+
down_write(&_hash_lock);
hc = dm_get_mdptr(md);
if (!hc || hc->md != md) {
dm_kcopyd_notify_fn fn = job->fn;
struct dm_kcopyd_client *kc = job->kc;
- kcopyd_put_pages(kc, job->pages);
+ if (job->pages)
+ kcopyd_put_pages(kc, job->pages);
mempool_free(job, kc->job_pool);
fn(read_err, write_err, context);
sector_t progress = 0;
sector_t count = 0;
struct kcopyd_job *job = (struct kcopyd_job *) context;
+ struct dm_kcopyd_client *kc = job->kc;
mutex_lock(&job->lock);
if (count) {
int i;
- struct kcopyd_job *sub_job = mempool_alloc(job->kc->job_pool,
+ struct kcopyd_job *sub_job = mempool_alloc(kc->job_pool,
GFP_NOIO);
*sub_job = *job;
} else if (atomic_dec_and_test(&job->sub_jobs)) {
/*
- * To avoid a race we must keep the job around
- * until after the notify function has completed.
- * Otherwise the client may try and stop the job
- * after we've completed.
+ * Queue the completion callback to the kcopyd thread.
+ *
+ * Some callers assume that all the completions are called
+ * from a single thread and don't race with each other.
+ *
+ * We must not call the callback directly here because this
+ * code may not be executing in the thread.
*/
- job->fn(read_err, write_err, job->context);
- mempool_free(job, job->kc->job_pool);
+ push(&kc->complete_jobs, job);
+ wake(kc);
}
}
{
int i;
+ atomic_inc(&job->kc->nr_jobs);
+
atomic_set(&job->sub_jobs, SPLIT_COUNT);
for (i = 0; i < SPLIT_COUNT; i++)
segment_complete(0, 0u, job);
.status = linear_status,
.ioctl = linear_ioctl,
.merge = linear_merge,
- .features = DM_TARGET_SUPPORTS_BARRIERS,
};
int __init dm_linear_init(void)
sector_t *highs;
struct dm_target *targets;
- unsigned barriers_supported:1;
-
/*
* Indicates the rw permissions for the new logical
* device. This should be a combination of FMODE_READ
INIT_LIST_HEAD(&t->devices);
atomic_set(&t->holders, 0);
- t->barriers_supported = 1;
if (!num_targets)
num_targets = KEYS_PER_NODE;
/* FIXME: the plan is to combine high here and then have
* the merge fn apply the target level restrictions. */
combine_restrictions_low(&t->limits, &tgt->limits);
-
- if (!(tgt->type->features & DM_TARGET_SUPPORTS_BARRIERS))
- t->barriers_supported = 0;
-
return 0;
bad:
check_for_valid_limits(&t->limits);
- /*
- * We only support barriers if there is exactly one underlying device.
- */
- if (!list_is_singular(&t->devices))
- t->barriers_supported = 0;
-
/* how many indexes will the btree have ? */
leaf_nodes = dm_div_up(t->num_targets, KEYS_PER_NODE);
t->depth = 1 + int_log(leaf_nodes, CHILDREN_PER_NODE);
return &t->targets[(KEYS_PER_NODE * n) + k];
}
+/*
+ * Set the integrity profile for this device if all devices used have
+ * matching profiles.
+ */
+static void dm_table_set_integrity(struct dm_table *t)
+{
+ struct list_head *devices = dm_table_get_devices(t);
+ struct dm_dev_internal *prev = NULL, *dd = NULL;
+
+ if (!blk_get_integrity(dm_disk(t->md)))
+ return;
+
+ list_for_each_entry(dd, devices, list) {
+ if (prev &&
+ blk_integrity_compare(prev->dm_dev.bdev->bd_disk,
+ dd->dm_dev.bdev->bd_disk) < 0) {
+ DMWARN("%s: integrity not set: %s and %s mismatch",
+ dm_device_name(t->md),
+ prev->dm_dev.bdev->bd_disk->disk_name,
+ dd->dm_dev.bdev->bd_disk->disk_name);
+ goto no_integrity;
+ }
+ prev = dd;
+ }
+
+ if (!prev || !bdev_get_integrity(prev->dm_dev.bdev))
+ goto no_integrity;
+
+ blk_integrity_register(dm_disk(t->md),
+ bdev_get_integrity(prev->dm_dev.bdev));
+
+ return;
+
+no_integrity:
+ blk_integrity_register(dm_disk(t->md), NULL);
+
+ return;
+}
+
void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q)
{
/*
else
queue_flag_set_unlocked(QUEUE_FLAG_CLUSTER, q);
+ dm_table_set_integrity(t);
}
unsigned int dm_table_get_num_targets(struct dm_table *t)
return t->md;
}
-int dm_table_barrier_ok(struct dm_table *t)
-{
- return t->barriers_supported;
-}
-EXPORT_SYMBOL(dm_table_barrier_ok);
-
EXPORT_SYMBOL(dm_vcalloc);
EXPORT_SYMBOL(dm_get_device);
EXPORT_SYMBOL(dm_put_device);
/*
* Bits for the md->flags field.
*/
-#define DMF_BLOCK_IO 0
+#define DMF_BLOCK_IO_FOR_SUSPEND 0
#define DMF_SUSPENDED 1
#define DMF_FROZEN 2
#define DMF_FREEING 3
#define DMF_DELETING 4
#define DMF_NOFLUSH_SUSPENDING 5
+#define DMF_QUEUE_IO_TO_THREAD 6
/*
* Work processed by per-device workqueue.
struct bio_list deferred;
spinlock_t deferred_lock;
+ /*
+ * An error from the barrier request currently being processed.
+ */
+ int barrier_error;
+
/*
* Processing queue (flush/barriers)
*/
part_stat_add(cpu, &dm_disk(md)->part0, ticks[rw], duration);
part_stat_unlock();
+ /*
+ * After this is decremented the bio must not be touched if it is
+ * a barrier.
+ */
dm_disk(md)->part0.in_flight = pending =
atomic_dec_return(&md->pending);
/*
* Add the bio to the list of deferred io.
*/
-static int queue_io(struct mapped_device *md, struct bio *bio)
+static void queue_io(struct mapped_device *md, struct bio *bio)
{
down_write(&md->io_lock);
- if (!test_bit(DMF_BLOCK_IO, &md->flags)) {
- up_write(&md->io_lock);
- return 1;
- }
-
spin_lock_irq(&md->deferred_lock);
bio_list_add(&md->deferred, bio);
spin_unlock_irq(&md->deferred_lock);
+ if (!test_and_set_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags))
+ queue_work(md->wq, &md->work);
+
up_write(&md->io_lock);
- return 0; /* deferred successfully */
}
/*
*/
spin_lock_irqsave(&md->deferred_lock, flags);
if (__noflush_suspending(md))
- bio_list_add(&md->deferred, io->bio);
+ bio_list_add_head(&md->deferred, io->bio);
else
/* noflush suspend was interrupted. */
io->error = -EIO;
spin_unlock_irqrestore(&md->deferred_lock, flags);
}
- end_io_acct(io);
-
io_error = io->error;
bio = io->bio;
- free_io(md, io);
+ if (bio_barrier(bio)) {
+ /*
+ * There can be just one barrier request so we use
+ * a per-device variable for error reporting.
+ * Note that you can't touch the bio after end_io_acct
+ */
+ md->barrier_error = io_error;
+ end_io_acct(io);
+ } else {
+ end_io_acct(io);
- if (io_error != DM_ENDIO_REQUEUE) {
- trace_block_bio_complete(md->queue, bio);
+ if (io_error != DM_ENDIO_REQUEUE) {
+ trace_block_bio_complete(md->queue, bio);
- bio_endio(bio, io_error);
+ bio_endio(bio, io_error);
+ }
}
+
+ free_io(md, io);
}
}
clone->bi_sector = sector;
clone->bi_bdev = bio->bi_bdev;
- clone->bi_rw = bio->bi_rw;
+ clone->bi_rw = bio->bi_rw & ~(1 << BIO_RW_BARRIER);
clone->bi_vcnt = 1;
clone->bi_size = to_bytes(len);
clone->bi_io_vec->bv_offset = offset;
clone->bi_io_vec->bv_len = clone->bi_size;
clone->bi_flags |= 1 << BIO_CLONED;
+ if (bio_integrity(bio)) {
+ bio_integrity_clone(clone, bio, GFP_NOIO);
+ bio_integrity_trim(clone,
+ bio_sector_offset(bio, idx, offset), len);
+ }
+
return clone;
}
clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
__bio_clone(clone, bio);
+ clone->bi_rw &= ~(1 << BIO_RW_BARRIER);
clone->bi_destructor = dm_bio_destructor;
clone->bi_sector = sector;
clone->bi_idx = idx;
clone->bi_size = to_bytes(len);
clone->bi_flags &= ~(1 << BIO_SEG_VALID);
+ if (bio_integrity(bio)) {
+ bio_integrity_clone(clone, bio, GFP_NOIO);
+
+ if (idx != bio->bi_idx || clone->bi_size < bio->bi_size)
+ bio_integrity_trim(clone,
+ bio_sector_offset(bio, idx, 0), len);
+ }
+
return clone;
}
ci.map = dm_get_table(md);
if (unlikely(!ci.map)) {
- bio_io_error(bio);
- return;
- }
- if (unlikely(bio_barrier(bio) && !dm_table_barrier_ok(ci.map))) {
- dm_table_put(ci.map);
- bio_endio(bio, -EOPNOTSUPP);
+ if (!bio_barrier(bio))
+ bio_io_error(bio);
+ else
+ md->barrier_error = -EIO;
return;
}
+
ci.md = md;
ci.bio = bio;
ci.io = alloc_io(md);
*/
static int dm_request(struct request_queue *q, struct bio *bio)
{
- int r = -EIO;
int rw = bio_data_dir(bio);
struct mapped_device *md = q->queuedata;
int cpu;
part_stat_unlock();
/*
- * If we're suspended we have to queue
- * this io for later.
+ * If we're suspended or the thread is processing barriers
+ * we have to queue this io for later.
*/
- while (test_bit(DMF_BLOCK_IO, &md->flags)) {
+ if (unlikely(test_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags)) ||
+ unlikely(bio_barrier(bio))) {
up_read(&md->io_lock);
- if (bio_rw(bio) != READA)
- r = queue_io(md, bio);
+ if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) &&
+ bio_rw(bio) == READA) {
+ bio_io_error(bio);
+ return 0;
+ }
- if (r <= 0)
- goto out_req;
+ queue_io(md, bio);
- /*
- * We're in a while loop, because someone could suspend
- * before we get to the following read lock.
- */
- down_read(&md->io_lock);
+ return 0;
}
__split_and_process_bio(md, bio);
up_read(&md->io_lock);
return 0;
-
-out_req:
- if (r < 0)
- bio_io_error(bio);
-
- return 0;
}
static void dm_unplug_all(struct request_queue *q)
struct mapped_device *md = congested_data;
struct dm_table *map;
- if (!test_bit(DMF_BLOCK_IO, &md->flags)) {
+ if (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
map = dm_get_table(md);
if (map) {
r = dm_table_any_congested(map, bdi_bits);
mempool_destroy(md->tio_pool);
mempool_destroy(md->io_pool);
bioset_free(md->bs);
+ blk_integrity_unregister(md->disk);
del_gendisk(md->disk);
free_minor(minor);
return r;
}
+static int dm_flush(struct mapped_device *md)
+{
+ dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
+ return 0;
+}
+
+static void process_barrier(struct mapped_device *md, struct bio *bio)
+{
+ int error = dm_flush(md);
+
+ if (unlikely(error)) {
+ bio_endio(bio, error);
+ return;
+ }
+ if (bio_empty_barrier(bio)) {
+ bio_endio(bio, 0);
+ return;
+ }
+
+ __split_and_process_bio(md, bio);
+
+ error = dm_flush(md);
+
+ if (!error && md->barrier_error)
+ error = md->barrier_error;
+
+ if (md->barrier_error != DM_ENDIO_REQUEUE)
+ bio_endio(bio, error);
+}
+
/*
* Process the deferred bios
*/
down_write(&md->io_lock);
-next_bio:
- spin_lock_irq(&md->deferred_lock);
- c = bio_list_pop(&md->deferred);
- spin_unlock_irq(&md->deferred_lock);
+ while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
+ spin_lock_irq(&md->deferred_lock);
+ c = bio_list_pop(&md->deferred);
+ spin_unlock_irq(&md->deferred_lock);
- if (c) {
- __split_and_process_bio(md, c);
- goto next_bio;
- }
+ if (!c) {
+ clear_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags);
+ break;
+ }
- clear_bit(DMF_BLOCK_IO, &md->flags);
+ up_write(&md->io_lock);
+
+ if (bio_barrier(c))
+ process_barrier(md, c);
+ else
+ __split_and_process_bio(md, c);
+
+ down_write(&md->io_lock);
+ }
up_write(&md->io_lock);
}
static void dm_queue_flush(struct mapped_device *md)
{
+ clear_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
+ smp_mb__after_clear_bit();
queue_work(md->wq, &md->work);
- flush_workqueue(md->wq);
}
/*
}
/*
- * First we set the BLOCK_IO flag so no more ios will be mapped.
+ * Here we must make sure that no processes are submitting requests
+ * to target drivers i.e. no one may be executing
+ * __split_and_process_bio. This is called from dm_request and
+ * dm_wq_work.
+ *
+ * To get all processes out of __split_and_process_bio in dm_request,
+ * we take the write lock. To prevent any process from reentering
+ * __split_and_process_bio from dm_request, we set
+ * DMF_QUEUE_IO_TO_THREAD.
+ *
+ * To quiesce the thread (dm_wq_work), we set DMF_BLOCK_IO_FOR_SUSPEND
+ * and call flush_workqueue(md->wq). flush_workqueue will wait until
+ * dm_wq_work exits and DMF_BLOCK_IO_FOR_SUSPEND will prevent any
+ * further calls to __split_and_process_bio from dm_wq_work.
*/
down_write(&md->io_lock);
- set_bit(DMF_BLOCK_IO, &md->flags);
-
+ set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
+ set_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags);
up_write(&md->io_lock);
+ flush_workqueue(md->wq);
+
/*
- * Wait for the already-mapped ios to complete.
+ * At this point no more requests are entering target request routines.
+ * We call dm_wait_for_completion to wait for all existing requests
+ * to finish.
*/
r = dm_wait_for_completion(md, TASK_INTERRUPTIBLE);
down_write(&md->io_lock);
-
if (noflush)
clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
up_write(&md->io_lock);
goto out; /* pushback list is already flushed, so skip flush */
}
+ /*
+ * If dm_wait_for_completion returned 0, the device is completely
+ * quiescent now. There is no request-processing activity. All new
+ * requests are being added to md->deferred list.
+ */
+
dm_table_postsuspend_targets(map);
set_bit(DMF_SUSPENDED, &md->flags);
* To check the return value from dm_table_find_target().
*/
#define dm_target_is_valid(t) ((t)->table)
-int dm_table_barrier_ok(struct dm_table *t);
/*-----------------------------------------------------------------
* A registry of target types.
}
ext_csd_struct = ext_csd[EXT_CSD_REV];
- if (ext_csd_struct > 2) {
+ if (ext_csd_struct > 3) {
printk(KERN_ERR "%s: unrecognised EXT_CSD structure "
"version %d\n", mmc_hostname(card->host),
ext_csd_struct);
if (err)
goto err;
- /*
- * For SPI, enable CRC as appropriate.
- */
- if (mmc_host_is_spi(host)) {
- err = mmc_spi_set_crc(host, use_spi_crc);
- if (err)
- goto err;
- }
-
/*
* Fetch CID from card.
*/
goto free_card;
}
+ /*
+ * For SPI, enable CRC as appropriate.
+ * This CRC enable is located AFTER the reading of the
+ * card registers because some SDHC cards are not able
+ * to provide valid CRCs for non-512-byte blocks.
+ */
+ if (mmc_host_is_spi(host)) {
+ err = mmc_spi_set_crc(host, use_spi_crc);
+ if (err)
+ goto free_card;
+ }
+
/*
* Attempt to change to high-speed (if supported)
*/
wmb();
- if (host->actual_bus_width == MMC_BUS_WIDTH_4)
- BLR(host->dma) = 0; /* burst 64 byte read / 64 bytes write */
- else
- BLR(host->dma) = 16; /* burst 16 byte read / 16 bytes write */
-
- RSSR(host->dma) = DMA_REQ_SDHC;
-
set_bit(IMXMCI_PEND_DMA_DATA_b, &host->pending_events);
clear_bit(IMXMCI_PEND_CPU_DATA_b, &host->pending_events);
if (ios->bus_width == MMC_BUS_WIDTH_4) {
host->actual_bus_width = MMC_BUS_WIDTH_4;
imx_gpio_mode(PB11_PF_SD_DAT3);
+ BLR(host->dma) = 0; /* burst 64 byte read/write */
} else {
host->actual_bus_width = MMC_BUS_WIDTH_1;
imx_gpio_mode(GPIO_PORTB | GPIO_IN | GPIO_PUEN | 11);
+ BLR(host->dma) = 16; /* burst 16 byte read/write */
}
if (host->power_mode != ios->power_mode) {
mod_timer(&host->timer, jiffies + (HZ>>1));
}
-static int imxmci_probe(struct platform_device *pdev)
+static int __init imxmci_probe(struct platform_device *pdev)
{
struct mmc_host *mmc;
struct imxmci_host *host = NULL;
}
host->dma_allocated = 1;
imx_dma_setup_handlers(host->dma, imxmci_dma_irq, NULL, host);
+ RSSR(host->dma) = DMA_REQ_SDHC;
tasklet_init(&host->tasklet, imxmci_tasklet_fnc, (unsigned long)host);
host->status_reg=0;
return ret;
}
-static int imxmci_remove(struct platform_device *pdev)
+static int __exit imxmci_remove(struct platform_device *pdev)
{
struct mmc_host *mmc = platform_get_drvdata(pdev);
#endif /* CONFIG_PM */
static struct platform_driver imxmci_driver = {
- .probe = imxmci_probe,
- .remove = imxmci_remove,
+ .remove = __exit_p(imxmci_remove),
.suspend = imxmci_suspend,
.resume = imxmci_resume,
.driver = {
static int __init imxmci_init(void)
{
- return platform_driver_register(&imxmci_driver);
+ return platform_driver_probe(&imxmci_driver, imxmci_probe);
}
static void __exit imxmci_exit(void)
* along with this program; if not, write to the Free Software
* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
-#include <linux/hrtimer.h>
+#include <linux/sched.h>
#include <linux/delay.h>
#include <linux/bio.h>
#include <linux/dma-mapping.h>
* reads which takes nowhere near that long. Older cards may be able to use
* shorter timeouts ... but why bother?
*/
-#define r1b_timeout ktime_set(3, 0)
+#define r1b_timeout (HZ * 3)
/****************************************************************************/
return status;
}
-static int
-mmc_spi_skip(struct mmc_spi_host *host, ktime_t timeout, unsigned n, u8 byte)
+static int mmc_spi_skip(struct mmc_spi_host *host, unsigned long timeout,
+ unsigned n, u8 byte)
{
u8 *cp = host->data->status;
-
- timeout = ktime_add(timeout, ktime_get());
+ unsigned long start = jiffies;
while (1) {
int status;
return cp[i];
}
- /* REVISIT investigate msleep() to avoid busy-wait I/O
- * in at least some cases.
- */
- if (ktime_to_ns(ktime_sub(ktime_get(), timeout)) > 0)
+ if (time_is_before_jiffies(start + timeout))
break;
+
+ /* If we need long timeouts, we may release the CPU.
+ * We use jiffies here because we want to have a relation
+ * between elapsed time and the blocking of the scheduler.
+ */
+ if (time_is_before_jiffies(start+1))
+ schedule();
}
return -ETIMEDOUT;
}
static inline int
-mmc_spi_wait_unbusy(struct mmc_spi_host *host, ktime_t timeout)
+mmc_spi_wait_unbusy(struct mmc_spi_host *host, unsigned long timeout)
{
return mmc_spi_skip(host, timeout, sizeof(host->data->status), 0);
}
-static int mmc_spi_readtoken(struct mmc_spi_host *host, ktime_t timeout)
+static int mmc_spi_readtoken(struct mmc_spi_host *host, unsigned long timeout)
{
return mmc_spi_skip(host, timeout, 1, 0xff);
}
u8 *cp = host->data->status;
u8 *end = cp + host->t.len;
int value = 0;
+ int bitshift;
+ u8 leftover = 0;
+ unsigned short rotator;
+ int i;
char tag[32];
snprintf(tag, sizeof(tag), " ... CMD%d response SPI_%s",
/* Data block reads (R1 response types) may need more data... */
if (cp == end) {
- unsigned i;
-
cp = host->data->status;
+ end = cp+1;
/* Card sends N(CR) (== 1..8) bytes of all-ones then one
* status byte ... and we already scanned 2 bytes.
}
checkstatus:
- if (*cp & 0x80) {
- dev_dbg(&host->spi->dev, "%s: INVALID RESPONSE, %02x\n",
- tag, *cp);
- value = -EBADR;
- goto done;
+ bitshift = 0;
+ if (*cp & 0x80) {
+ /* Houston, we have an ugly card with a bit-shifted response */
+ rotator = *cp++ << 8;
+ /* read the next byte */
+ if (cp == end) {
+ value = mmc_spi_readbytes(host, 1);
+ if (value < 0)
+ goto done;
+ cp = host->data->status;
+ end = cp+1;
+ }
+ rotator |= *cp++;
+ while (rotator & 0x8000) {
+ bitshift++;
+ rotator <<= 1;
+ }
+ cmd->resp[0] = rotator >> 8;
+ leftover = rotator;
+ } else {
+ cmd->resp[0] = *cp++;
}
-
- cmd->resp[0] = *cp++;
cmd->error = 0;
/* Status byte: the entire seven-bit R1 response. */
if (cmd->resp[0] != 0) {
if ((R1_SPI_PARAMETER | R1_SPI_ADDRESS
- | R1_SPI_ILLEGAL_COMMAND)
+ | R1_SPI_ILLEGAL_COMMAND)
& cmd->resp[0])
value = -EINVAL;
else if (R1_SPI_COM_CRC & cmd->resp[0])
* SPI R5 == R1 + data byte; IO_RW_DIRECT
*/
case MMC_RSP_SPI_R2:
- cmd->resp[0] |= *cp << 8;
+ /* read the next byte */
+ if (cp == end) {
+ value = mmc_spi_readbytes(host, 1);
+ if (value < 0)
+ goto done;
+ cp = host->data->status;
+ end = cp+1;
+ }
+ if (bitshift) {
+ rotator = leftover << 8;
+ rotator |= *cp << bitshift;
+ cmd->resp[0] |= (rotator & 0xFF00);
+ } else {
+ cmd->resp[0] |= *cp << 8;
+ }
break;
/* SPI R3, R4, or R7 == R1 + 4 bytes */
case MMC_RSP_SPI_R3:
- cmd->resp[1] = get_unaligned_be32(cp);
+ rotator = leftover << 8;
+ cmd->resp[1] = 0;
+ for (i = 0; i < 4; i++) {
+ cmd->resp[1] <<= 8;
+ /* read the next byte */
+ if (cp == end) {
+ value = mmc_spi_readbytes(host, 1);
+ if (value < 0)
+ goto done;
+ cp = host->data->status;
+ end = cp+1;
+ }
+ if (bitshift) {
+ rotator |= *cp++ << bitshift;
+ cmd->resp[1] |= (rotator >> 8);
+ rotator <<= 8;
+ } else {
+ cmd->resp[1] |= *cp++;
+ }
+ }
break;
/* SPI R1 == just one status byte */
*/
static int
mmc_spi_writeblock(struct mmc_spi_host *host, struct spi_transfer *t,
- ktime_t timeout)
+ unsigned long timeout)
{
struct spi_device *spi = host->spi;
int status, i;
*/
static int
mmc_spi_readblock(struct mmc_spi_host *host, struct spi_transfer *t,
- ktime_t timeout)
+ unsigned long timeout)
{
struct spi_device *spi = host->spi;
int status;
struct scratch *scratch = host->data;
+ unsigned int bitshift;
+ u8 leftover;
/* At least one SD card sends an all-zeroes byte when N(CX)
* applies, before the all-ones bytes ... just cope with that.
if (status == 0xff || status == 0)
status = mmc_spi_readtoken(host, timeout);
- if (status == SPI_TOKEN_SINGLE) {
- if (host->dma_dev) {
- dma_sync_single_for_device(host->dma_dev,
- host->data_dma, sizeof(*scratch),
- DMA_BIDIRECTIONAL);
- dma_sync_single_for_device(host->dma_dev,
- t->rx_dma, t->len,
- DMA_FROM_DEVICE);
- }
+ if (status < 0) {
+ dev_dbg(&spi->dev, "read error %02x (%d)\n", status, status);
+ return status;
+ }
- status = spi_sync(spi, &host->m);
+ /* The token may be bit-shifted...
+ * the first 0-bit precedes the data stream.
+ */
+ bitshift = 7;
+ while (status & 0x80) {
+ status <<= 1;
+ bitshift--;
+ }
+ leftover = status << 1;
- if (host->dma_dev) {
- dma_sync_single_for_cpu(host->dma_dev,
- host->data_dma, sizeof(*scratch),
- DMA_BIDIRECTIONAL);
- dma_sync_single_for_cpu(host->dma_dev,
- t->rx_dma, t->len,
- DMA_FROM_DEVICE);
- }
+ if (host->dma_dev) {
+ dma_sync_single_for_device(host->dma_dev,
+ host->data_dma, sizeof(*scratch),
+ DMA_BIDIRECTIONAL);
+ dma_sync_single_for_device(host->dma_dev,
+ t->rx_dma, t->len,
+ DMA_FROM_DEVICE);
+ }
- } else {
- dev_dbg(&spi->dev, "read error %02x (%d)\n", status, status);
+ status = spi_sync(spi, &host->m);
- /* we've read extra garbage, timed out, etc */
- if (status < 0)
- return status;
+ if (host->dma_dev) {
+ dma_sync_single_for_cpu(host->dma_dev,
+ host->data_dma, sizeof(*scratch),
+ DMA_BIDIRECTIONAL);
+ dma_sync_single_for_cpu(host->dma_dev,
+ t->rx_dma, t->len,
+ DMA_FROM_DEVICE);
+ }
- /* low four bits are an R2 subset, fifth seems to be
- * vendor specific ... map them all to generic error..
+ if (bitshift) {
+ /* Walk through the data and the crc and do
+ * all the magic to get byte-aligned data.
*/
- return -EIO;
+ u8 *cp = t->rx_buf;
+ unsigned int len;
+ unsigned int bitright = 8 - bitshift;
+ u8 temp;
+ for (len = t->len; len; len--) {
+ temp = *cp;
+ *cp++ = leftover | (temp >> bitshift);
+ leftover = temp << bitright;
+ }
+ cp = (u8 *) &scratch->crc_val;
+ temp = *cp;
+ *cp++ = leftover | (temp >> bitshift);
+ leftover = temp << bitright;
+ temp = *cp;
+ *cp = leftover | (temp >> bitshift);
}
if (host->mmc->use_spi_crc) {
unsigned n_sg;
int multiple = (data->blocks > 1);
u32 clock_rate;
- ktime_t timeout;
+ unsigned long timeout;
if (data->flags & MMC_DATA_READ)
direction = DMA_FROM_DEVICE;
else
clock_rate = spi->max_speed_hz;
- timeout = ktime_add_ns(ktime_set(0, 0), data->timeout_ns +
- data->timeout_clks * 1000000 / clock_rate);
+ timeout = data->timeout_ns +
+ data->timeout_clks * 1000000 / clock_rate;
+ timeout = usecs_to_jiffies((unsigned int)(timeout / 1000)) + 1;
/* Handle scatterlist segments one at a time, with synch for
* each 512-byte block
struct mmc_request *mrq = host->mrq;
host->mrq = NULL;
- mmc_omap_fclk_lazy_disable(host);
mmc_request_done(host->mmc, mrq);
return;
}
if (host->mrq == NULL) {
OMAP_HSMMC_WRITE(host->base, STAT,
OMAP_HSMMC_READ(host->base, STAT));
+ /* Flush posted write */
+ OMAP_HSMMC_READ(host->base, STAT);
return IRQ_HANDLED;
}
}
OMAP_HSMMC_WRITE(host->base, STAT, status);
+ /* Flush posted write */
+ OMAP_HSMMC_READ(host->base, STAT);
- if (end_cmd || (status & CC))
+ if (end_cmd || ((status & CC) && host->cmd))
mmc_omap_cmd_done(host, host->cmd);
if (end_trans || (status & TC))
mmc_omap_xfer_done(host, data);
module_init(sdhci_drv_init);
module_exit(sdhci_drv_exit);
-MODULE_AUTHOR("Pierre Ossman <drzeus@drzeus.cx>");
+MODULE_AUTHOR("Pierre Ossman <pierre@ossman.eu>");
MODULE_DESCRIPTION("Secure Digital Host Controller Interface PCI driver");
MODULE_LICENSE("GPL");
module_param(debug_quirks, uint, 0444);
-MODULE_AUTHOR("Pierre Ossman <drzeus@drzeus.cx>");
+MODULE_AUTHOR("Pierre Ossman <pierre@ossman.eu>");
MODULE_DESCRIPTION("Secure Digital Host Controller Interface core driver");
MODULE_LICENSE("GPL");
module_param_named(dma, param_dma, int, 0444);
MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Pierre Ossman <drzeus@drzeus.cx>");
+MODULE_AUTHOR("Pierre Ossman <pierre@ossman.eu>");
MODULE_DESCRIPTION("Winbond W83L51xD SD/MMC card interface driver");
#ifdef CONFIG_PNP
depends on NET_ETHERNET && HAS_IOMEM
select MII
select PHYLIB
+ select CRC32
+ select BITREVERSE
help
Say Y here if you want to use the OpenCores 10/100 Mbps Ethernet MAC.
driver. DCA is a method for warming the CPU cache before data
is used, with the intent of lessening the impact of cache misses.
+config IGBVF
+ tristate "Intel(R) 82576 Virtual Function Ethernet support"
+ depends on PCI
+ ---help---
+ This driver supports Intel(R) 82576 virtual functions. For more
+ information on how to identify your adapter, go to the Adapter &
+ Driver ID Guide at:
+
+ <http://support.intel.com/support/network/adapter/pro100/21397.htm>
+
+ For general information and support, go to the Intel support
+ website at:
+
+ <http://support.intel.com>
+
+ More specific information on configuring the driver is in
+ <file:Documentation/networking/e1000.txt>.
+
+ To compile this driver as a module, choose M here. The module
+ will be called igbvf.
+
source "drivers/net/ixp2000/Kconfig"
config MYRI_SBUS
obj-$(CONFIG_E1000E) += e1000e/
obj-$(CONFIG_IBM_NEW_EMAC) += ibm_newemac/
obj-$(CONFIG_IGB) += igb/
+obj-$(CONFIG_IGBVF) += igbvf/
obj-$(CONFIG_IXGBE) += ixgbe/
obj-$(CONFIG_IXGB) += ixgb/
obj-$(CONFIG_IP1000) += ipg.o
bnx2_request_firmware(struct bnx2 *bp)
{
const char *mips_fw_file, *rv2p_fw_file;
- const struct bnx2_mips_fw_file *mips;
- const struct bnx2_rv2p_fw_file *rv2p;
+ const struct bnx2_mips_fw_file *mips_fw;
+ const struct bnx2_rv2p_fw_file *rv2p_fw;
int rc;
if (CHIP_NUM(bp) == CHIP_NUM_5709) {
rv2p_fw_file);
return rc;
}
- mips = (const struct bnx2_mips_fw_file *) bp->mips_firmware->data;
- rv2p = (const struct bnx2_rv2p_fw_file *) bp->rv2p_firmware->data;
- if (bp->mips_firmware->size < sizeof(*mips) ||
- check_mips_fw_entry(bp->mips_firmware, &mips->com) ||
- check_mips_fw_entry(bp->mips_firmware, &mips->cp) ||
- check_mips_fw_entry(bp->mips_firmware, &mips->rxp) ||
- check_mips_fw_entry(bp->mips_firmware, &mips->tpat) ||
- check_mips_fw_entry(bp->mips_firmware, &mips->txp)) {
+ mips_fw = (const struct bnx2_mips_fw_file *) bp->mips_firmware->data;
+ rv2p_fw = (const struct bnx2_rv2p_fw_file *) bp->rv2p_firmware->data;
+ if (bp->mips_firmware->size < sizeof(*mips_fw) ||
+ check_mips_fw_entry(bp->mips_firmware, &mips_fw->com) ||
+ check_mips_fw_entry(bp->mips_firmware, &mips_fw->cp) ||
+ check_mips_fw_entry(bp->mips_firmware, &mips_fw->rxp) ||
+ check_mips_fw_entry(bp->mips_firmware, &mips_fw->tpat) ||
+ check_mips_fw_entry(bp->mips_firmware, &mips_fw->txp)) {
printk(KERN_ERR PFX "Firmware file \"%s\" is invalid\n",
mips_fw_file);
return -EINVAL;
}
- if (bp->rv2p_firmware->size < sizeof(*rv2p) ||
- check_fw_section(bp->rv2p_firmware, &rv2p->proc1.rv2p, 8, true) ||
- check_fw_section(bp->rv2p_firmware, &rv2p->proc2.rv2p, 8, true)) {
+ if (bp->rv2p_firmware->size < sizeof(*rv2p_fw) ||
+ check_fw_section(bp->rv2p_firmware, &rv2p_fw->proc1.rv2p, 8, true) ||
+ check_fw_section(bp->rv2p_firmware, &rv2p_fw->proc2.rv2p, 8, true)) {
printk(KERN_ERR PFX "Firmware file \"%s\" is invalid\n",
rv2p_fw_file);
return -EINVAL;
}
spin_unlock_bh(&eql->queue.lock);
+ dev_put(slave_dev);
+
return ret;
}
icrp = (volatile unsigned long *) (MCF_MBAR + MCFSIM_ICR1);
*icrp = 0x0d000000;
}
+#endif
#ifdef CONFIG_M5272
static void __inline__ fec_get_mac(struct net_device *dev)
/* for netdump / net console */
static void igb_netpoll(struct net_device *);
#endif
-
#ifdef CONFIG_PCI_IOV
-static ssize_t igb_set_num_vfs(struct device *, struct device_attribute *,
- const char *, size_t);
-static ssize_t igb_show_num_vfs(struct device *, struct device_attribute *,
- char *);
-DEVICE_ATTR(num_vfs, S_IRUGO | S_IWUSR, igb_show_num_vfs, igb_set_num_vfs);
-#endif
+static unsigned int max_vfs = 0;
+module_param(max_vfs, uint, 0);
+MODULE_PARM_DESC(max_vfs, "Maximum number of virtual functions to allocate "
+ "per physical function");
+#endif /* CONFIG_PCI_IOV */
+
static pci_ers_result_t igb_io_error_detected(struct pci_dev *,
pci_channel_state_t);
static pci_ers_result_t igb_io_slot_reset(struct pci_dev *);
/* If we can't do MSI-X, try MSI */
msi_only:
+#ifdef CONFIG_PCI_IOV
+ /* disable SR-IOV for non MSI-X configurations */
+ if (adapter->vf_data) {
+ struct e1000_hw *hw = &adapter->hw;
+ /* disable iov and allow time for transactions to clear */
+ pci_disable_sriov(adapter->pdev);
+ msleep(500);
+
+ kfree(adapter->vf_data);
+ adapter->vf_data = NULL;
+ wr32(E1000_IOVCTL, E1000_IOVCTL_REUSE_VFQ);
+ msleep(100);
+ dev_info(&adapter->pdev->dev, "IOV Disabled\n");
+ }
+#endif
adapter->num_rx_queues = 1;
adapter->num_tx_queues = 1;
if (!pci_enable_msi(adapter->pdev))
if (err)
goto err_sw_init;
+#ifdef CONFIG_PCI_IOV
+ /* since iov functionality isn't critical to base device function we
+ * can accept failure. If it fails we don't allow iov to be enabled */
+ if (hw->mac.type == e1000_82576) {
+ /* 82576 supports a maximum of 7 VFs in addition to the PF */
+ unsigned int num_vfs = (max_vfs > 7) ? 7 : max_vfs;
+ int i;
+ unsigned char mac_addr[ETH_ALEN];
+
+ if (num_vfs)
+ adapter->vf_data = kcalloc(num_vfs,
+ sizeof(struct vf_data_storage),
+ GFP_KERNEL);
+ if (!adapter->vf_data) {
+ dev_err(&pdev->dev, "Could not allocate VF private "
+ "data - IOV enable failed\n");
+ } else {
+ err = pci_enable_sriov(pdev, num_vfs);
+ if (!err) {
+ adapter->vfs_allocated_count = num_vfs;
+ dev_info(&pdev->dev, "%d vfs allocated\n", num_vfs);
+ for (i = 0; i < adapter->vfs_allocated_count; i++) {
+ random_ether_addr(mac_addr);
+ igb_set_vf_mac(adapter, i, mac_addr);
+ }
+ } else {
+ kfree(adapter->vf_data);
+ adapter->vf_data = NULL;
+ }
+ }
+ }
+
+#endif
/* setup the private structure */
err = igb_sw_init(adapter);
if (err)
if (err)
goto err_register;
-#ifdef CONFIG_PCI_IOV
- /* since iov functionality isn't critical to base device function we
- * can accept failure. If it fails we don't allow iov to be enabled */
- if (hw->mac.type == e1000_82576) {
- err = pci_enable_sriov(pdev, 0);
- if (!err)
- err = device_create_file(&netdev->dev,
- &dev_attr_num_vfs);
- if (err)
- dev_err(&pdev->dev, "Failed to initialize IOV\n");
- }
-
-#endif
#ifdef CONFIG_IGB_DCA
if (dca_add_requester(&pdev->dev) == 0) {
adapter->flags |= IGB_FLAG_DCA_ENABLED;
igb_vmdq_set_replication_pf(hw, true);
}
-#ifdef CONFIG_PCI_IOV
-static ssize_t igb_show_num_vfs(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct igb_adapter *adapter = netdev_priv(to_net_dev(dev));
-
- return sprintf(buf, "%d\n", adapter->vfs_allocated_count);
-}
-
-static ssize_t igb_set_num_vfs(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct net_device *netdev = to_net_dev(dev);
- struct igb_adapter *adapter = netdev_priv(netdev);
- struct e1000_hw *hw = &adapter->hw;
- struct pci_dev *pdev = adapter->pdev;
- unsigned int num_vfs, i;
- unsigned char mac_addr[ETH_ALEN];
- int err;
-
- sscanf(buf, "%u", &num_vfs);
-
- if (num_vfs > 7)
- num_vfs = 7;
-
- /* value unchanged do nothing */
- if (num_vfs == adapter->vfs_allocated_count)
- return count;
-
- if (netdev->flags & IFF_UP)
- igb_close(netdev);
-
- igb_reset_interrupt_capability(adapter);
- igb_free_queues(adapter);
- adapter->tx_ring = NULL;
- adapter->rx_ring = NULL;
- adapter->vfs_allocated_count = 0;
-
- /* reclaim resources allocated to VFs since we are changing count */
- if (adapter->vf_data) {
- /* disable iov and allow time for transactions to clear */
- pci_disable_sriov(pdev);
- msleep(500);
-
- kfree(adapter->vf_data);
- adapter->vf_data = NULL;
- wr32(E1000_IOVCTL, E1000_IOVCTL_REUSE_VFQ);
- msleep(100);
- dev_info(&pdev->dev, "IOV Disabled\n");
- }
-
- if (num_vfs) {
- adapter->vf_data = kcalloc(num_vfs,
- sizeof(struct vf_data_storage),
- GFP_KERNEL);
- if (!adapter->vf_data) {
- dev_err(&pdev->dev, "Could not allocate VF private "
- "data - IOV enable failed\n");
- } else {
- err = pci_enable_sriov(pdev, num_vfs);
- if (!err) {
- adapter->vfs_allocated_count = num_vfs;
- dev_info(&pdev->dev, "%d vfs allocated\n", num_vfs);
- for (i = 0; i < adapter->vfs_allocated_count; i++) {
- random_ether_addr(mac_addr);
- igb_set_vf_mac(adapter, i, mac_addr);
- }
- } else {
- kfree(adapter->vf_data);
- adapter->vf_data = NULL;
- }
- }
- }
-
- igb_set_interrupt_capability(adapter);
- igb_alloc_queues(adapter);
- igb_reset(adapter);
-
- if (netdev->flags & IFF_UP)
- igb_open(netdev);
-
- return count;
-}
-#endif /* CONFIG_PCI_IOV */
/* igb_main.c */
--- /dev/null
+################################################################################
+#
+# Intel(R) 82576 Virtual Function Linux driver
+# Copyright(c) 2009 Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+# more details.
+#
+# You should have received a copy of the GNU General Public License along with
+# this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+#
+# The full GNU General Public License is included in this distribution in
+# the file called "COPYING".
+#
+# Contact Information:
+# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+#
+################################################################################
+
+#
+# Makefile for the Intel(R) 82576 VF ethernet driver
+#
+
+obj-$(CONFIG_IGBVF) += igbvf.o
+
+igbvf-objs := vf.o \
+ mbx.o \
+ ethtool.o \
+ netdev.o
+
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 1999 - 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#ifndef _E1000_DEFINES_H_
+#define _E1000_DEFINES_H_
+
+/* Number of Transmit and Receive Descriptors must be a multiple of 8 */
+#define REQ_TX_DESCRIPTOR_MULTIPLE 8
+#define REQ_RX_DESCRIPTOR_MULTIPLE 8
+
+/* IVAR valid bit */
+#define E1000_IVAR_VALID 0x80
+
+/* Receive Descriptor bit definitions */
+#define E1000_RXD_STAT_DD 0x01 /* Descriptor Done */
+#define E1000_RXD_STAT_EOP 0x02 /* End of Packet */
+#define E1000_RXD_STAT_IXSM 0x04 /* Ignore checksum */
+#define E1000_RXD_STAT_VP 0x08 /* IEEE VLAN Packet */
+#define E1000_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */
+#define E1000_RXD_STAT_TCPCS 0x20 /* TCP xsum calculated */
+#define E1000_RXD_STAT_IPCS 0x40 /* IP xsum calculated */
+#define E1000_RXD_ERR_SE 0x02 /* Symbol Error */
+#define E1000_RXD_SPC_VLAN_MASK 0x0FFF /* VLAN ID is in lower 12 bits */
+
+#define E1000_RXDEXT_STATERR_CE 0x01000000
+#define E1000_RXDEXT_STATERR_SE 0x02000000
+#define E1000_RXDEXT_STATERR_SEQ 0x04000000
+#define E1000_RXDEXT_STATERR_CXE 0x10000000
+#define E1000_RXDEXT_STATERR_TCPE 0x20000000
+#define E1000_RXDEXT_STATERR_IPE 0x40000000
+#define E1000_RXDEXT_STATERR_RXE 0x80000000
+
+
+/* Same mask, but for extended and packet split descriptors */
+#define E1000_RXDEXT_ERR_FRAME_ERR_MASK ( \
+ E1000_RXDEXT_STATERR_CE | \
+ E1000_RXDEXT_STATERR_SE | \
+ E1000_RXDEXT_STATERR_SEQ | \
+ E1000_RXDEXT_STATERR_CXE | \
+ E1000_RXDEXT_STATERR_RXE)
+
+/* Device Control */
+#define E1000_CTRL_RST 0x04000000 /* Global reset */
+
+/* Device Status */
+#define E1000_STATUS_FD 0x00000001 /* Full duplex.0=half,1=full */
+#define E1000_STATUS_LU 0x00000002 /* Link up.0=no,1=link */
+#define E1000_STATUS_TXOFF 0x00000010 /* transmission paused */
+#define E1000_STATUS_SPEED_10 0x00000000 /* Speed 10Mb/s */
+#define E1000_STATUS_SPEED_100 0x00000040 /* Speed 100Mb/s */
+#define E1000_STATUS_SPEED_1000 0x00000080 /* Speed 1000Mb/s */
+
+#define SPEED_10 10
+#define SPEED_100 100
+#define SPEED_1000 1000
+#define HALF_DUPLEX 1
+#define FULL_DUPLEX 2
+
+/* Transmit Descriptor bit definitions */
+#define E1000_TXD_POPTS_IXSM 0x01 /* Insert IP checksum */
+#define E1000_TXD_POPTS_TXSM 0x02 /* Insert TCP/UDP checksum */
+#define E1000_TXD_CMD_DEXT 0x20000000 /* Descriptor extension (0 = legacy) */
+#define E1000_TXD_STAT_DD 0x00000001 /* Descriptor Done */
+
+#define MAX_JUMBO_FRAME_SIZE 0x3F00
+
+/* 802.1q VLAN Packet Size */
+#define VLAN_TAG_SIZE 4 /* 802.3ac tag (not DMA'd) */
+
+/* Error Codes */
+#define E1000_SUCCESS 0
+#define E1000_ERR_CONFIG 3
+#define E1000_ERR_MAC_INIT 5
+#define E1000_ERR_MBX 15
+
+#ifndef ETH_ADDR_LEN
+#define ETH_ADDR_LEN 6
+#endif
+
+/* SRRCTL bit definitions */
+#define E1000_SRRCTL_BSIZEPKT_SHIFT 10 /* Shift _right_ */
+#define E1000_SRRCTL_BSIZEHDRSIZE_MASK 0x00000F00
+#define E1000_SRRCTL_BSIZEHDRSIZE_SHIFT 2 /* Shift _left_ */
+#define E1000_SRRCTL_DESCTYPE_ADV_ONEBUF 0x02000000
+#define E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS 0x0A000000
+#define E1000_SRRCTL_DESCTYPE_MASK 0x0E000000
+#define E1000_SRRCTL_DROP_EN 0x80000000
+
+#define E1000_SRRCTL_BSIZEPKT_MASK 0x0000007F
+#define E1000_SRRCTL_BSIZEHDR_MASK 0x00003F00
+
+/* Additional Descriptor Control definitions */
+#define E1000_TXDCTL_QUEUE_ENABLE 0x02000000 /* Enable specific Tx Queue */
+#define E1000_RXDCTL_QUEUE_ENABLE 0x02000000 /* Enable specific Rx Queue */
+
+/* Direct Cache Access (DCA) definitions */
+#define E1000_DCA_TXCTRL_TX_WB_RO_EN (1 << 11) /* Tx Desc writeback RO bit */
+
+#define E1000_VF_INIT_TIMEOUT 200 /* Number of retries to clear RSTI */
+
+#endif /* _E1000_DEFINES_H_ */
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+/* ethtool support for igbvf */
+
+#include <linux/netdevice.h>
+#include <linux/ethtool.h>
+#include <linux/pci.h>
+#include <linux/vmalloc.h>
+#include <linux/delay.h>
+
+#include "igbvf.h"
+#include <linux/if_vlan.h>
+
+
+struct igbvf_stats {
+ char stat_string[ETH_GSTRING_LEN];
+ int sizeof_stat;
+ int stat_offset;
+ int base_stat_offset;
+};
+
+#define IGBVF_STAT(current, base) \
+ sizeof(((struct igbvf_adapter *)0)->current), \
+ offsetof(struct igbvf_adapter, current), \
+ offsetof(struct igbvf_adapter, base)
+
+static const struct igbvf_stats igbvf_gstrings_stats[] = {
+ { "rx_packets", IGBVF_STAT(stats.gprc, stats.base_gprc) },
+ { "tx_packets", IGBVF_STAT(stats.gptc, stats.base_gptc) },
+ { "rx_bytes", IGBVF_STAT(stats.gorc, stats.base_gorc) },
+ { "tx_bytes", IGBVF_STAT(stats.gotc, stats.base_gotc) },
+ { "multicast", IGBVF_STAT(stats.mprc, stats.base_mprc) },
+ { "lbrx_bytes", IGBVF_STAT(stats.gorlbc, stats.base_gorlbc) },
+ { "lbrx_packets", IGBVF_STAT(stats.gprlbc, stats.base_gprlbc) },
+ { "tx_restart_queue", IGBVF_STAT(restart_queue, zero_base) },
+ { "rx_long_byte_count", IGBVF_STAT(stats.gorc, stats.base_gorc) },
+ { "rx_csum_offload_good", IGBVF_STAT(hw_csum_good, zero_base) },
+ { "rx_csum_offload_errors", IGBVF_STAT(hw_csum_err, zero_base) },
+ { "rx_header_split", IGBVF_STAT(rx_hdr_split, zero_base) },
+ { "alloc_rx_buff_failed", IGBVF_STAT(alloc_rx_buff_failed, zero_base) },
+};
+
+#define IGBVF_GLOBAL_STATS_LEN ARRAY_SIZE(igbvf_gstrings_stats)
+
+static const char igbvf_gstrings_test[][ETH_GSTRING_LEN] = {
+ "Link test (on/offline)"
+};
+
+#define IGBVF_TEST_LEN ARRAY_SIZE(igbvf_gstrings_test)
+
+static int igbvf_get_settings(struct net_device *netdev,
+ struct ethtool_cmd *ecmd)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+ u32 status;
+
+ ecmd->supported = SUPPORTED_1000baseT_Full;
+
+ ecmd->advertising = ADVERTISED_1000baseT_Full;
+
+ ecmd->port = -1;
+ ecmd->transceiver = XCVR_DUMMY1;
+
+ status = er32(STATUS);
+ if (status & E1000_STATUS_LU) {
+ if (status & E1000_STATUS_SPEED_1000)
+ ecmd->speed = 1000;
+ else if (status & E1000_STATUS_SPEED_100)
+ ecmd->speed = 100;
+ else
+ ecmd->speed = 10;
+
+ if (status & E1000_STATUS_FD)
+ ecmd->duplex = DUPLEX_FULL;
+ else
+ ecmd->duplex = DUPLEX_HALF;
+ } else {
+ ecmd->speed = -1;
+ ecmd->duplex = -1;
+ }
+
+ ecmd->autoneg = AUTONEG_DISABLE;
+
+ return 0;
+}
+
+static u32 igbvf_get_link(struct net_device *netdev)
+{
+ return netif_carrier_ok(netdev);
+}
+
+static int igbvf_set_settings(struct net_device *netdev,
+ struct ethtool_cmd *ecmd)
+{
+ return -EOPNOTSUPP;
+}
+
+static void igbvf_get_pauseparam(struct net_device *netdev,
+ struct ethtool_pauseparam *pause)
+{
+ return;
+}
+
+static int igbvf_set_pauseparam(struct net_device *netdev,
+ struct ethtool_pauseparam *pause)
+{
+ return -EOPNOTSUPP;
+}
+
+static u32 igbvf_get_tx_csum(struct net_device *netdev)
+{
+ return ((netdev->features & NETIF_F_IP_CSUM) != 0);
+}
+
+static int igbvf_set_tx_csum(struct net_device *netdev, u32 data)
+{
+ if (data)
+ netdev->features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
+ else
+ netdev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
+ return 0;
+}
+
+static int igbvf_set_tso(struct net_device *netdev, u32 data)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ int i;
+ struct net_device *v_netdev;
+
+ if (data) {
+ netdev->features |= NETIF_F_TSO;
+ netdev->features |= NETIF_F_TSO6;
+ } else {
+ netdev->features &= ~NETIF_F_TSO;
+ netdev->features &= ~NETIF_F_TSO6;
+ /* disable TSO on all VLANs if they're present */
+ if (!adapter->vlgrp)
+ goto tso_out;
+ for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++) {
+ v_netdev = vlan_group_get_device(adapter->vlgrp, i);
+ if (!v_netdev)
+ continue;
+
+ v_netdev->features &= ~NETIF_F_TSO;
+ v_netdev->features &= ~NETIF_F_TSO6;
+ vlan_group_set_device(adapter->vlgrp, i, v_netdev);
+ }
+ }
+
+tso_out:
+ dev_info(&adapter->pdev->dev, "TSO is %s\n",
+ data ? "Enabled" : "Disabled");
+ adapter->flags |= FLAG_TSO_FORCE;
+ return 0;
+}
+
+static u32 igbvf_get_msglevel(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ return adapter->msg_enable;
+}
+
+static void igbvf_set_msglevel(struct net_device *netdev, u32 data)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ adapter->msg_enable = data;
+}
+
+static int igbvf_get_regs_len(struct net_device *netdev)
+{
+#define IGBVF_REGS_LEN 8
+ return IGBVF_REGS_LEN * sizeof(u32);
+}
+
+static void igbvf_get_regs(struct net_device *netdev,
+ struct ethtool_regs *regs, void *p)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+ u32 *regs_buff = p;
+ u8 revision_id;
+
+ memset(p, 0, IGBVF_REGS_LEN * sizeof(u32));
+
+ pci_read_config_byte(adapter->pdev, PCI_REVISION_ID, &revision_id);
+
+ regs->version = (1 << 24) | (revision_id << 16) | adapter->pdev->device;
+
+ regs_buff[0] = er32(CTRL);
+ regs_buff[1] = er32(STATUS);
+
+ regs_buff[2] = er32(RDLEN(0));
+ regs_buff[3] = er32(RDH(0));
+ regs_buff[4] = er32(RDT(0));
+
+ regs_buff[5] = er32(TDLEN(0));
+ regs_buff[6] = er32(TDH(0));
+ regs_buff[7] = er32(TDT(0));
+}
+
+static int igbvf_get_eeprom_len(struct net_device *netdev)
+{
+ return 0;
+}
+
+static int igbvf_get_eeprom(struct net_device *netdev,
+ struct ethtool_eeprom *eeprom, u8 *bytes)
+{
+ return -EOPNOTSUPP;
+}
+
+static int igbvf_set_eeprom(struct net_device *netdev,
+ struct ethtool_eeprom *eeprom, u8 *bytes)
+{
+ return -EOPNOTSUPP;
+}
+
+static void igbvf_get_drvinfo(struct net_device *netdev,
+ struct ethtool_drvinfo *drvinfo)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ char firmware_version[32] = "N/A";
+
+ strncpy(drvinfo->driver, igbvf_driver_name, 32);
+ strncpy(drvinfo->version, igbvf_driver_version, 32);
+ strncpy(drvinfo->fw_version, firmware_version, 32);
+ strncpy(drvinfo->bus_info, pci_name(adapter->pdev), 32);
+ drvinfo->regdump_len = igbvf_get_regs_len(netdev);
+ drvinfo->eedump_len = igbvf_get_eeprom_len(netdev);
+}
+
+static void igbvf_get_ringparam(struct net_device *netdev,
+ struct ethtool_ringparam *ring)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct igbvf_ring *tx_ring = adapter->tx_ring;
+ struct igbvf_ring *rx_ring = adapter->rx_ring;
+
+ ring->rx_max_pending = IGBVF_MAX_RXD;
+ ring->tx_max_pending = IGBVF_MAX_TXD;
+ ring->rx_mini_max_pending = 0;
+ ring->rx_jumbo_max_pending = 0;
+ ring->rx_pending = rx_ring->count;
+ ring->tx_pending = tx_ring->count;
+ ring->rx_mini_pending = 0;
+ ring->rx_jumbo_pending = 0;
+}
+
+static int igbvf_set_ringparam(struct net_device *netdev,
+ struct ethtool_ringparam *ring)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct igbvf_ring *temp_ring;
+ int err;
+ u32 new_rx_count, new_tx_count;
+
+ if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
+ return -EINVAL;
+
+ new_rx_count = max(ring->rx_pending, (u32)IGBVF_MIN_RXD);
+ new_rx_count = min(new_rx_count, (u32)IGBVF_MAX_RXD);
+ new_rx_count = ALIGN(new_rx_count, REQ_RX_DESCRIPTOR_MULTIPLE);
+
+ new_tx_count = max(ring->tx_pending, (u32)IGBVF_MIN_TXD);
+ new_tx_count = min(new_tx_count, (u32)IGBVF_MAX_TXD);
+ new_tx_count = ALIGN(new_tx_count, REQ_TX_DESCRIPTOR_MULTIPLE);
+
+ if ((new_tx_count == adapter->tx_ring->count) &&
+ (new_rx_count == adapter->rx_ring->count)) {
+ /* nothing to do */
+ return 0;
+ }
+
+ temp_ring = vmalloc(sizeof(struct igbvf_ring));
+ if (!temp_ring)
+ return -ENOMEM;
+
+ while (test_and_set_bit(__IGBVF_RESETTING, &adapter->state))
+ msleep(1);
+
+ if (netif_running(adapter->netdev))
+ igbvf_down(adapter);
+
+ /*
+ * We can't just free everything and then setup again,
+ * because the ISRs in MSI-X mode get passed pointers
+ * to the tx and rx ring structs.
+ */
+ if (new_tx_count != adapter->tx_ring->count) {
+ memcpy(temp_ring, adapter->tx_ring, sizeof(struct igbvf_ring));
+
+ temp_ring->count = new_tx_count;
+ err = igbvf_setup_tx_resources(adapter, temp_ring);
+ if (err)
+ goto err_setup;
+
+ igbvf_free_tx_resources(adapter->tx_ring);
+
+ memcpy(adapter->tx_ring, temp_ring, sizeof(struct igbvf_ring));
+ }
+
+ if (new_rx_count != adapter->rx_ring->count) {
+ memcpy(temp_ring, adapter->rx_ring, sizeof(struct igbvf_ring));
+
+ temp_ring->count = new_rx_count;
+ err = igbvf_setup_rx_resources(adapter, temp_ring);
+ if (err)
+ goto err_setup;
+
+ igbvf_free_rx_resources(adapter->rx_ring);
+
+ memcpy(adapter->rx_ring, temp_ring,sizeof(struct igbvf_ring));
+ }
+
+ err = 0;
+err_setup:
+ if (netif_running(adapter->netdev))
+ igbvf_up(adapter);
+
+ clear_bit(__IGBVF_RESETTING, &adapter->state);
+ vfree(temp_ring);
+ return err;
+}
+
+static int igbvf_link_test(struct igbvf_adapter *adapter, u64 *data)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ *data = 0;
+
+ hw->mac.ops.check_for_link(hw);
+
+ if (!(er32(STATUS) & E1000_STATUS_LU))
+ *data = 1;
+
+ return *data;
+}
+
+static int igbvf_get_self_test_count(struct net_device *netdev)
+{
+ return IGBVF_TEST_LEN;
+}
+
+static int igbvf_get_stats_count(struct net_device *netdev)
+{
+ return IGBVF_GLOBAL_STATS_LEN;
+}
+
+static void igbvf_diag_test(struct net_device *netdev,
+ struct ethtool_test *eth_test, u64 *data)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ set_bit(__IGBVF_TESTING, &adapter->state);
+
+ /*
+ * Link test performed before hardware reset so autoneg doesn't
+ * interfere with test result
+ */
+ if (igbvf_link_test(adapter, &data[0]))
+ eth_test->flags |= ETH_TEST_FL_FAILED;
+
+ clear_bit(__IGBVF_TESTING, &adapter->state);
+ msleep_interruptible(4 * 1000);
+}
+
+static void igbvf_get_wol(struct net_device *netdev,
+ struct ethtool_wolinfo *wol)
+{
+ wol->supported = 0;
+ wol->wolopts = 0;
+
+ return;
+}
+
+static int igbvf_set_wol(struct net_device *netdev,
+ struct ethtool_wolinfo *wol)
+{
+ return -EOPNOTSUPP;
+}
+
+static int igbvf_phys_id(struct net_device *netdev, u32 data)
+{
+ return 0;
+}
+
+static int igbvf_get_coalesce(struct net_device *netdev,
+ struct ethtool_coalesce *ec)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ if (adapter->itr_setting <= 3)
+ ec->rx_coalesce_usecs = adapter->itr_setting;
+ else
+ ec->rx_coalesce_usecs = adapter->itr_setting >> 2;
+
+ return 0;
+}
+
+static int igbvf_set_coalesce(struct net_device *netdev,
+ struct ethtool_coalesce *ec)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+
+ if ((ec->rx_coalesce_usecs > IGBVF_MAX_ITR_USECS) ||
+ ((ec->rx_coalesce_usecs > 3) &&
+ (ec->rx_coalesce_usecs < IGBVF_MIN_ITR_USECS)) ||
+ (ec->rx_coalesce_usecs == 2))
+ return -EINVAL;
+
+ /* convert to rate of irq's per second */
+ if (ec->rx_coalesce_usecs && ec->rx_coalesce_usecs <= 3) {
+ adapter->itr = IGBVF_START_ITR;
+ adapter->itr_setting = ec->rx_coalesce_usecs;
+ } else {
+ adapter->itr = ec->rx_coalesce_usecs << 2;
+ adapter->itr_setting = adapter->itr;
+ }
+
+ writel(adapter->itr,
+ hw->hw_addr + adapter->rx_ring[0].itr_register);
+
+ return 0;
+}
+
+static int igbvf_nway_reset(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ if (netif_running(netdev))
+ igbvf_reinit_locked(adapter);
+ return 0;
+}
+
+
+static void igbvf_get_ethtool_stats(struct net_device *netdev,
+ struct ethtool_stats *stats,
+ u64 *data)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ int i;
+
+ igbvf_update_stats(adapter);
+ for (i = 0; i < IGBVF_GLOBAL_STATS_LEN; i++) {
+ char *p = (char *)adapter +
+ igbvf_gstrings_stats[i].stat_offset;
+ char *b = (char *)adapter +
+ igbvf_gstrings_stats[i].base_stat_offset;
+ data[i] = ((igbvf_gstrings_stats[i].sizeof_stat ==
+ sizeof(u64)) ? (*(u64 *)p - *(u64 *)b) :
+ (*(u32 *)p - *(u32 *)b));
+ }
+
+}
+
+static void igbvf_get_strings(struct net_device *netdev, u32 stringset,
+ u8 *data)
+{
+ u8 *p = data;
+ int i;
+
+ switch (stringset) {
+ case ETH_SS_TEST:
+ memcpy(data, *igbvf_gstrings_test, sizeof(igbvf_gstrings_test));
+ break;
+ case ETH_SS_STATS:
+ for (i = 0; i < IGBVF_GLOBAL_STATS_LEN; i++) {
+ memcpy(p, igbvf_gstrings_stats[i].stat_string,
+ ETH_GSTRING_LEN);
+ p += ETH_GSTRING_LEN;
+ }
+ break;
+ }
+}
+
+static const struct ethtool_ops igbvf_ethtool_ops = {
+ .get_settings = igbvf_get_settings,
+ .set_settings = igbvf_set_settings,
+ .get_drvinfo = igbvf_get_drvinfo,
+ .get_regs_len = igbvf_get_regs_len,
+ .get_regs = igbvf_get_regs,
+ .get_wol = igbvf_get_wol,
+ .set_wol = igbvf_set_wol,
+ .get_msglevel = igbvf_get_msglevel,
+ .set_msglevel = igbvf_set_msglevel,
+ .nway_reset = igbvf_nway_reset,
+ .get_link = igbvf_get_link,
+ .get_eeprom_len = igbvf_get_eeprom_len,
+ .get_eeprom = igbvf_get_eeprom,
+ .set_eeprom = igbvf_set_eeprom,
+ .get_ringparam = igbvf_get_ringparam,
+ .set_ringparam = igbvf_set_ringparam,
+ .get_pauseparam = igbvf_get_pauseparam,
+ .set_pauseparam = igbvf_set_pauseparam,
+ .get_tx_csum = igbvf_get_tx_csum,
+ .set_tx_csum = igbvf_set_tx_csum,
+ .get_sg = ethtool_op_get_sg,
+ .set_sg = ethtool_op_set_sg,
+ .get_tso = ethtool_op_get_tso,
+ .set_tso = igbvf_set_tso,
+ .self_test = igbvf_diag_test,
+ .get_strings = igbvf_get_strings,
+ .phys_id = igbvf_phys_id,
+ .get_ethtool_stats = igbvf_get_ethtool_stats,
+ .self_test_count = igbvf_get_self_test_count,
+ .get_stats_count = igbvf_get_stats_count,
+ .get_coalesce = igbvf_get_coalesce,
+ .set_coalesce = igbvf_set_coalesce,
+};
+
+void igbvf_set_ethtool_ops(struct net_device *netdev)
+{
+ /* have to "undeclare" const on this struct to remove warnings */
+ SET_ETHTOOL_OPS(netdev, (struct ethtool_ops *)&igbvf_ethtool_ops);
+}
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+/* Linux PRO/1000 Ethernet Driver main header file */
+
+#ifndef _IGBVF_H_
+#define _IGBVF_H_
+
+#include <linux/types.h>
+#include <linux/timer.h>
+#include <linux/io.h>
+#include <linux/netdevice.h>
+
+
+#include "vf.h"
+
+/* Forward declarations */
+struct igbvf_info;
+struct igbvf_adapter;
+
+/* Interrupt defines */
+#define IGBVF_START_ITR 648 /* ~6000 ints/sec */
+
+/* Interrupt modes, as used by the IntMode paramter */
+#define IGBVF_INT_MODE_LEGACY 0
+#define IGBVF_INT_MODE_MSI 1
+#define IGBVF_INT_MODE_MSIX 2
+
+/* Tx/Rx descriptor defines */
+#define IGBVF_DEFAULT_TXD 256
+#define IGBVF_MAX_TXD 4096
+#define IGBVF_MIN_TXD 80
+
+#define IGBVF_DEFAULT_RXD 256
+#define IGBVF_MAX_RXD 4096
+#define IGBVF_MIN_RXD 80
+
+#define IGBVF_MIN_ITR_USECS 10 /* 100000 irq/sec */
+#define IGBVF_MAX_ITR_USECS 10000 /* 100 irq/sec */
+
+/* RX descriptor control thresholds.
+ * PTHRESH - MAC will consider prefetch if it has fewer than this number of
+ * descriptors available in its onboard memory.
+ * Setting this to 0 disables RX descriptor prefetch.
+ * HTHRESH - MAC will only prefetch if there are at least this many descriptors
+ * available in host memory.
+ * If PTHRESH is 0, this should also be 0.
+ * WTHRESH - RX descriptor writeback threshold - MAC will delay writing back
+ * descriptors until either it has this many to write back, or the
+ * ITR timer expires.
+ */
+#define IGBVF_RX_PTHRESH 16
+#define IGBVF_RX_HTHRESH 8
+#define IGBVF_RX_WTHRESH 1
+
+/* this is the size past which hardware will drop packets when setting LPE=0 */
+#define MAXIMUM_ETHERNET_VLAN_SIZE 1522
+
+#define IGBVF_FC_PAUSE_TIME 0x0680 /* 858 usec */
+
+/* How many Tx Descriptors do we need to call netif_wake_queue ? */
+#define IGBVF_TX_QUEUE_WAKE 32
+/* How many Rx Buffers do we bundle into one write to the hardware ? */
+#define IGBVF_RX_BUFFER_WRITE 16 /* Must be power of 2 */
+
+#define AUTO_ALL_MODES 0
+#define IGBVF_EEPROM_APME 0x0400
+
+#define IGBVF_MNG_VLAN_NONE (-1)
+
+/* Number of packet split data buffers (not including the header buffer) */
+#define PS_PAGE_BUFFERS (MAX_PS_BUFFERS - 1)
+
+enum igbvf_boards {
+ board_vf,
+};
+
+struct igbvf_queue_stats {
+ u64 packets;
+ u64 bytes;
+};
+
+/*
+ * wrappers around a pointer to a socket buffer,
+ * so a DMA handle can be stored along with the buffer
+ */
+struct igbvf_buffer {
+ dma_addr_t dma;
+ struct sk_buff *skb;
+ union {
+ /* Tx */
+ struct {
+ unsigned long time_stamp;
+ u16 length;
+ u16 next_to_watch;
+ };
+ /* Rx */
+ struct {
+ struct page *page;
+ u64 page_dma;
+ unsigned int page_offset;
+ };
+ };
+ struct page *page;
+};
+
+union igbvf_desc {
+ union e1000_adv_rx_desc rx_desc;
+ union e1000_adv_tx_desc tx_desc;
+ struct e1000_adv_tx_context_desc tx_context_desc;
+};
+
+struct igbvf_ring {
+ struct igbvf_adapter *adapter; /* backlink */
+ union igbvf_desc *desc; /* pointer to ring memory */
+ dma_addr_t dma; /* phys address of ring */
+ unsigned int size; /* length of ring in bytes */
+ unsigned int count; /* number of desc. in ring */
+
+ u16 next_to_use;
+ u16 next_to_clean;
+
+ u16 head;
+ u16 tail;
+
+ /* array of buffer information structs */
+ struct igbvf_buffer *buffer_info;
+ struct napi_struct napi;
+
+ char name[IFNAMSIZ + 5];
+ u32 eims_value;
+ u32 itr_val;
+ u16 itr_register;
+ int set_itr;
+
+ struct sk_buff *rx_skb_top;
+
+ struct igbvf_queue_stats stats;
+};
+
+/* board specific private data structure */
+struct igbvf_adapter {
+ struct timer_list watchdog_timer;
+ struct timer_list blink_timer;
+
+ struct work_struct reset_task;
+ struct work_struct watchdog_task;
+
+ const struct igbvf_info *ei;
+
+ struct vlan_group *vlgrp;
+ u32 bd_number;
+ u32 rx_buffer_len;
+ u32 polling_interval;
+ u16 mng_vlan_id;
+ u16 link_speed;
+ u16 link_duplex;
+
+ spinlock_t tx_queue_lock; /* prevent concurrent tail updates */
+
+ /* track device up/down/testing state */
+ unsigned long state;
+
+ /* Interrupt Throttle Rate */
+ u32 itr;
+ u32 itr_setting;
+ u16 tx_itr;
+ u16 rx_itr;
+
+ /*
+ * Tx
+ */
+ struct igbvf_ring *tx_ring /* One per active queue */
+ ____cacheline_aligned_in_smp;
+
+ unsigned long tx_queue_len;
+ unsigned int restart_queue;
+ u32 txd_cmd;
+
+ bool detect_tx_hung;
+ u8 tx_timeout_factor;
+
+ u32 tx_int_delay;
+ u32 tx_abs_int_delay;
+
+ unsigned int total_tx_bytes;
+ unsigned int total_tx_packets;
+ unsigned int total_rx_bytes;
+ unsigned int total_rx_packets;
+
+ /* Tx stats */
+ u32 tx_timeout_count;
+ u32 tx_fifo_head;
+ u32 tx_head_addr;
+ u32 tx_fifo_size;
+ u32 tx_dma_failed;
+
+ /*
+ * Rx
+ */
+ struct igbvf_ring *rx_ring;
+
+ u32 rx_int_delay;
+ u32 rx_abs_int_delay;
+
+ /* Rx stats */
+ u64 hw_csum_err;
+ u64 hw_csum_good;
+ u64 rx_hdr_split;
+ u32 alloc_rx_buff_failed;
+ u32 rx_dma_failed;
+
+ unsigned int rx_ps_hdr_size;
+ u32 max_frame_size;
+ u32 min_frame_size;
+
+ /* OS defined structs */
+ struct net_device *netdev;
+ struct pci_dev *pdev;
+ struct net_device_stats net_stats;
+ spinlock_t stats_lock; /* prevent concurrent stats updates */
+
+ /* structs defined in e1000_hw.h */
+ struct e1000_hw hw;
+
+ /* The VF counters don't clear on read so we have to get a base
+ * count on driver start up and always subtract that base on
+ * on the first update, thus the flag..
+ */
+ struct e1000_vf_stats stats;
+ u64 zero_base;
+
+ struct igbvf_ring test_tx_ring;
+ struct igbvf_ring test_rx_ring;
+ u32 test_icr;
+
+ u32 msg_enable;
+ struct msix_entry *msix_entries;
+ int int_mode;
+ u32 eims_enable_mask;
+ u32 eims_other;
+ u32 int_counter0;
+ u32 int_counter1;
+
+ u32 eeprom_wol;
+ u32 wol;
+ u32 pba;
+
+ bool fc_autoneg;
+
+ unsigned long led_status;
+
+ unsigned int flags;
+};
+
+struct igbvf_info {
+ enum e1000_mac_type mac;
+ unsigned int flags;
+ u32 pba;
+ void (*init_ops)(struct e1000_hw *);
+ s32 (*get_variants)(struct igbvf_adapter *);
+};
+
+/* hardware capability, feature, and workaround flags */
+#define FLAG_HAS_HW_VLAN_FILTER (1 << 0)
+#define FLAG_HAS_JUMBO_FRAMES (1 << 1)
+#define FLAG_MSI_ENABLED (1 << 2)
+#define FLAG_RX_CSUM_ENABLED (1 << 3)
+#define FLAG_TSO_FORCE (1 << 4)
+
+#define IGBVF_RX_DESC_ADV(R, i) \
+ (&((((R).desc))[i].rx_desc))
+#define IGBVF_TX_DESC_ADV(R, i) \
+ (&((((R).desc))[i].tx_desc))
+#define IGBVF_TX_CTXTDESC_ADV(R, i) \
+ (&((((R).desc))[i].tx_context_desc))
+
+enum igbvf_state_t {
+ __IGBVF_TESTING,
+ __IGBVF_RESETTING,
+ __IGBVF_DOWN
+};
+
+enum latency_range {
+ lowest_latency = 0,
+ low_latency = 1,
+ bulk_latency = 2,
+ latency_invalid = 255
+};
+
+extern char igbvf_driver_name[];
+extern const char igbvf_driver_version[];
+
+extern void igbvf_check_options(struct igbvf_adapter *);
+extern void igbvf_set_ethtool_ops(struct net_device *);
+
+extern int igbvf_up(struct igbvf_adapter *);
+extern void igbvf_down(struct igbvf_adapter *);
+extern void igbvf_reinit_locked(struct igbvf_adapter *);
+extern void igbvf_reset(struct igbvf_adapter *);
+extern int igbvf_setup_rx_resources(struct igbvf_adapter *, struct igbvf_ring *);
+extern int igbvf_setup_tx_resources(struct igbvf_adapter *, struct igbvf_ring *);
+extern void igbvf_free_rx_resources(struct igbvf_ring *);
+extern void igbvf_free_tx_resources(struct igbvf_ring *);
+extern void igbvf_update_stats(struct igbvf_adapter *);
+extern void igbvf_set_interrupt_capability(struct igbvf_adapter *);
+extern void igbvf_reset_interrupt_capability(struct igbvf_adapter *);
+
+extern unsigned int copybreak;
+
+#endif /* _IGBVF_H_ */
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#include "mbx.h"
+
+/**
+ * e1000_poll_for_msg - Wait for message notification
+ * @hw: pointer to the HW structure
+ *
+ * returns SUCCESS if it successfully received a message notification
+ **/
+static s32 e1000_poll_for_msg(struct e1000_hw *hw)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ int countdown = mbx->timeout;
+
+ if (!mbx->ops.check_for_msg)
+ goto out;
+
+ while (countdown && mbx->ops.check_for_msg(hw)) {
+ countdown--;
+ udelay(mbx->usec_delay);
+ }
+
+ /* if we failed, all future posted messages fail until reset */
+ if (!countdown)
+ mbx->timeout = 0;
+out:
+ return countdown ? E1000_SUCCESS : -E1000_ERR_MBX;
+}
+
+/**
+ * e1000_poll_for_ack - Wait for message acknowledgement
+ * @hw: pointer to the HW structure
+ *
+ * returns SUCCESS if it successfully received a message acknowledgement
+ **/
+static s32 e1000_poll_for_ack(struct e1000_hw *hw)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ int countdown = mbx->timeout;
+
+ if (!mbx->ops.check_for_ack)
+ goto out;
+
+ while (countdown && mbx->ops.check_for_ack(hw)) {
+ countdown--;
+ udelay(mbx->usec_delay);
+ }
+
+ /* if we failed, all future posted messages fail until reset */
+ if (!countdown)
+ mbx->timeout = 0;
+out:
+ return countdown ? E1000_SUCCESS : -E1000_ERR_MBX;
+}
+
+/**
+ * e1000_read_posted_mbx - Wait for message notification and receive message
+ * @hw: pointer to the HW structure
+ * @msg: The message buffer
+ * @size: Length of buffer
+ *
+ * returns SUCCESS if it successfully received a message notification and
+ * copied it into the receive buffer.
+ **/
+static s32 e1000_read_posted_mbx(struct e1000_hw *hw, u32 *msg, u16 size)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ s32 ret_val = -E1000_ERR_MBX;
+
+ if (!mbx->ops.read)
+ goto out;
+
+ ret_val = e1000_poll_for_msg(hw);
+
+ /* if ack received read message, otherwise we timed out */
+ if (!ret_val)
+ ret_val = mbx->ops.read(hw, msg, size);
+out:
+ return ret_val;
+}
+
+/**
+ * e1000_write_posted_mbx - Write a message to the mailbox, wait for ack
+ * @hw: pointer to the HW structure
+ * @msg: The message buffer
+ * @size: Length of buffer
+ *
+ * returns SUCCESS if it successfully copied message into the buffer and
+ * received an ack to that message within delay * timeout period
+ **/
+static s32 e1000_write_posted_mbx(struct e1000_hw *hw, u32 *msg, u16 size)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ s32 ret_val = -E1000_ERR_MBX;
+
+ /* exit if we either can't write or there isn't a defined timeout */
+ if (!mbx->ops.write || !mbx->timeout)
+ goto out;
+
+ /* send msg*/
+ ret_val = mbx->ops.write(hw, msg, size);
+
+ /* if msg sent wait until we receive an ack */
+ if (!ret_val)
+ ret_val = e1000_poll_for_ack(hw);
+out:
+ return ret_val;
+}
+
+/**
+ * e1000_read_v2p_mailbox - read v2p mailbox
+ * @hw: pointer to the HW structure
+ *
+ * This function is used to read the v2p mailbox without losing the read to
+ * clear status bits.
+ **/
+static u32 e1000_read_v2p_mailbox(struct e1000_hw *hw)
+{
+ u32 v2p_mailbox = er32(V2PMAILBOX(0));
+
+ v2p_mailbox |= hw->dev_spec.vf.v2p_mailbox;
+ hw->dev_spec.vf.v2p_mailbox |= v2p_mailbox & E1000_V2PMAILBOX_R2C_BITS;
+
+ return v2p_mailbox;
+}
+
+/**
+ * e1000_check_for_bit_vf - Determine if a status bit was set
+ * @hw: pointer to the HW structure
+ * @mask: bitmask for bits to be tested and cleared
+ *
+ * This function is used to check for the read to clear bits within
+ * the V2P mailbox.
+ **/
+static s32 e1000_check_for_bit_vf(struct e1000_hw *hw, u32 mask)
+{
+ u32 v2p_mailbox = e1000_read_v2p_mailbox(hw);
+ s32 ret_val = -E1000_ERR_MBX;
+
+ if (v2p_mailbox & mask)
+ ret_val = E1000_SUCCESS;
+
+ hw->dev_spec.vf.v2p_mailbox &= ~mask;
+
+ return ret_val;
+}
+
+/**
+ * e1000_check_for_msg_vf - checks to see if the PF has sent mail
+ * @hw: pointer to the HW structure
+ *
+ * returns SUCCESS if the PF has set the Status bit or else ERR_MBX
+ **/
+static s32 e1000_check_for_msg_vf(struct e1000_hw *hw)
+{
+ s32 ret_val = -E1000_ERR_MBX;
+
+ if (!e1000_check_for_bit_vf(hw, E1000_V2PMAILBOX_PFSTS)) {
+ ret_val = E1000_SUCCESS;
+ hw->mbx.stats.reqs++;
+ }
+
+ return ret_val;
+}
+
+/**
+ * e1000_check_for_ack_vf - checks to see if the PF has ACK'd
+ * @hw: pointer to the HW structure
+ *
+ * returns SUCCESS if the PF has set the ACK bit or else ERR_MBX
+ **/
+static s32 e1000_check_for_ack_vf(struct e1000_hw *hw)
+{
+ s32 ret_val = -E1000_ERR_MBX;
+
+ if (!e1000_check_for_bit_vf(hw, E1000_V2PMAILBOX_PFACK)) {
+ ret_val = E1000_SUCCESS;
+ hw->mbx.stats.acks++;
+ }
+
+ return ret_val;
+}
+
+/**
+ * e1000_check_for_rst_vf - checks to see if the PF has reset
+ * @hw: pointer to the HW structure
+ *
+ * returns true if the PF has set the reset done bit or else false
+ **/
+static s32 e1000_check_for_rst_vf(struct e1000_hw *hw)
+{
+ s32 ret_val = -E1000_ERR_MBX;
+
+ if (!e1000_check_for_bit_vf(hw, (E1000_V2PMAILBOX_RSTD |
+ E1000_V2PMAILBOX_RSTI))) {
+ ret_val = E1000_SUCCESS;
+ hw->mbx.stats.rsts++;
+ }
+
+ return ret_val;
+}
+
+/**
+ * e1000_obtain_mbx_lock_vf - obtain mailbox lock
+ * @hw: pointer to the HW structure
+ *
+ * return SUCCESS if we obtained the mailbox lock
+ **/
+static s32 e1000_obtain_mbx_lock_vf(struct e1000_hw *hw)
+{
+ s32 ret_val = -E1000_ERR_MBX;
+
+ /* Take ownership of the buffer */
+ ew32(V2PMAILBOX(0), E1000_V2PMAILBOX_VFU);
+
+ /* reserve mailbox for vf use */
+ if (e1000_read_v2p_mailbox(hw) & E1000_V2PMAILBOX_VFU)
+ ret_val = E1000_SUCCESS;
+
+ return ret_val;
+}
+
+/**
+ * e1000_write_mbx_vf - Write a message to the mailbox
+ * @hw: pointer to the HW structure
+ * @msg: The message buffer
+ * @size: Length of buffer
+ *
+ * returns SUCCESS if it successfully copied message into the buffer
+ **/
+static s32 e1000_write_mbx_vf(struct e1000_hw *hw, u32 *msg, u16 size)
+{
+ s32 err;
+ u16 i;
+
+ /* lock the mailbox to prevent pf/vf race condition */
+ err = e1000_obtain_mbx_lock_vf(hw);
+ if (err)
+ goto out_no_write;
+
+ /* flush any ack or msg as we are going to overwrite mailbox */
+ e1000_check_for_ack_vf(hw);
+ e1000_check_for_msg_vf(hw);
+
+ /* copy the caller specified message to the mailbox memory buffer */
+ for (i = 0; i < size; i++)
+ array_ew32(VMBMEM(0), i, msg[i]);
+
+ /* update stats */
+ hw->mbx.stats.msgs_tx++;
+
+ /* Drop VFU and interrupt the PF to tell it a message has been sent */
+ ew32(V2PMAILBOX(0), E1000_V2PMAILBOX_REQ);
+
+out_no_write:
+ return err;
+}
+
+/**
+ * e1000_read_mbx_vf - Reads a message from the inbox intended for vf
+ * @hw: pointer to the HW structure
+ * @msg: The message buffer
+ * @size: Length of buffer
+ *
+ * returns SUCCESS if it successfuly read message from buffer
+ **/
+static s32 e1000_read_mbx_vf(struct e1000_hw *hw, u32 *msg, u16 size)
+{
+ s32 err;
+ u16 i;
+
+ /* lock the mailbox to prevent pf/vf race condition */
+ err = e1000_obtain_mbx_lock_vf(hw);
+ if (err)
+ goto out_no_read;
+
+ /* copy the message from the mailbox memory buffer */
+ for (i = 0; i < size; i++)
+ msg[i] = array_er32(VMBMEM(0), i);
+
+ /* Acknowledge receipt and release mailbox, then we're done */
+ ew32(V2PMAILBOX(0), E1000_V2PMAILBOX_ACK);
+
+ /* update stats */
+ hw->mbx.stats.msgs_rx++;
+
+out_no_read:
+ return err;
+}
+
+/**
+ * e1000_init_mbx_params_vf - set initial values for vf mailbox
+ * @hw: pointer to the HW structure
+ *
+ * Initializes the hw->mbx struct to correct values for vf mailbox
+ */
+s32 e1000_init_mbx_params_vf(struct e1000_hw *hw)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+
+ /* start mailbox as timed out and let the reset_hw call set the timeout
+ * value to being communications */
+ mbx->timeout = 0;
+ mbx->usec_delay = E1000_VF_MBX_INIT_DELAY;
+
+ mbx->size = E1000_VFMAILBOX_SIZE;
+
+ mbx->ops.read = e1000_read_mbx_vf;
+ mbx->ops.write = e1000_write_mbx_vf;
+ mbx->ops.read_posted = e1000_read_posted_mbx;
+ mbx->ops.write_posted = e1000_write_posted_mbx;
+ mbx->ops.check_for_msg = e1000_check_for_msg_vf;
+ mbx->ops.check_for_ack = e1000_check_for_ack_vf;
+ mbx->ops.check_for_rst = e1000_check_for_rst_vf;
+
+ mbx->stats.msgs_tx = 0;
+ mbx->stats.msgs_rx = 0;
+ mbx->stats.reqs = 0;
+ mbx->stats.acks = 0;
+ mbx->stats.rsts = 0;
+
+ return E1000_SUCCESS;
+}
+
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 1999 - 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#ifndef _E1000_MBX_H_
+#define _E1000_MBX_H_
+
+#include "vf.h"
+
+#define E1000_V2PMAILBOX_REQ 0x00000001 /* Request for PF Ready bit */
+#define E1000_V2PMAILBOX_ACK 0x00000002 /* Ack PF message received */
+#define E1000_V2PMAILBOX_VFU 0x00000004 /* VF owns the mailbox buffer */
+#define E1000_V2PMAILBOX_PFU 0x00000008 /* PF owns the mailbox buffer */
+#define E1000_V2PMAILBOX_PFSTS 0x00000010 /* PF wrote a message in the MB */
+#define E1000_V2PMAILBOX_PFACK 0x00000020 /* PF ack the previous VF msg */
+#define E1000_V2PMAILBOX_RSTI 0x00000040 /* PF has reset indication */
+#define E1000_V2PMAILBOX_RSTD 0x00000080 /* PF has indicated reset done */
+#define E1000_V2PMAILBOX_R2C_BITS 0x000000B0 /* All read to clear bits */
+
+#define E1000_VFMAILBOX_SIZE 16 /* 16 32 bit words - 64 bytes */
+
+/* If it's a E1000_VF_* msg then it originates in the VF and is sent to the
+ * PF. The reverse is true if it is E1000_PF_*.
+ * Message ACK's are the value or'd with 0xF0000000
+ */
+#define E1000_VT_MSGTYPE_ACK 0x80000000 /* Messages below or'd with
+ * this are the ACK */
+#define E1000_VT_MSGTYPE_NACK 0x40000000 /* Messages below or'd with
+ * this are the NACK */
+#define E1000_VT_MSGTYPE_CTS 0x20000000 /* Indicates that VF is still
+ clear to send requests */
+
+/* We have a total wait time of 1s for vf mailbox posted messages */
+#define E1000_VF_MBX_INIT_TIMEOUT 2000 /* retry count for mailbox timeout */
+#define E1000_VF_MBX_INIT_DELAY 500 /* usec delay between retries */
+
+#define E1000_VT_MSGINFO_SHIFT 16
+/* bits 23:16 are used for exra info for certain messages */
+#define E1000_VT_MSGINFO_MASK (0xFF << E1000_VT_MSGINFO_SHIFT)
+
+#define E1000_VF_RESET 0x01 /* VF requests reset */
+#define E1000_VF_SET_MAC_ADDR 0x02 /* VF requests PF to set MAC addr */
+#define E1000_VF_SET_MULTICAST 0x03 /* VF requests PF to set MC addr */
+#define E1000_VF_SET_VLAN 0x04 /* VF requests PF to set VLAN */
+#define E1000_VF_SET_LPE 0x05 /* VF requests PF to set VMOLR.LPE */
+
+#define E1000_PF_CONTROL_MSG 0x0100 /* PF control message */
+
+void e1000_init_mbx_ops_generic(struct e1000_hw *hw);
+s32 e1000_init_mbx_params_vf(struct e1000_hw *);
+
+#endif /* _E1000_MBX_H_ */
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include <linux/vmalloc.h>
+#include <linux/pagemap.h>
+#include <linux/delay.h>
+#include <linux/netdevice.h>
+#include <linux/tcp.h>
+#include <linux/ipv6.h>
+#include <net/checksum.h>
+#include <net/ip6_checksum.h>
+#include <linux/mii.h>
+#include <linux/ethtool.h>
+#include <linux/if_vlan.h>
+#include <linux/pm_qos_params.h>
+
+#include "igbvf.h"
+
+#define DRV_VERSION "1.0.0-k0"
+char igbvf_driver_name[] = "igbvf";
+const char igbvf_driver_version[] = DRV_VERSION;
+static const char igbvf_driver_string[] =
+ "Intel(R) Virtual Function Network Driver";
+static const char igbvf_copyright[] = "Copyright (c) 2009 Intel Corporation.";
+
+static int igbvf_poll(struct napi_struct *napi, int budget);
+
+static struct igbvf_info igbvf_vf_info = {
+ .mac = e1000_vfadapt,
+ .flags = FLAG_HAS_JUMBO_FRAMES
+ | FLAG_RX_CSUM_ENABLED,
+ .pba = 10,
+ .init_ops = e1000_init_function_pointers_vf,
+};
+
+static const struct igbvf_info *igbvf_info_tbl[] = {
+ [board_vf] = &igbvf_vf_info,
+};
+
+/**
+ * igbvf_desc_unused - calculate if we have unused descriptors
+ **/
+static int igbvf_desc_unused(struct igbvf_ring *ring)
+{
+ if (ring->next_to_clean > ring->next_to_use)
+ return ring->next_to_clean - ring->next_to_use - 1;
+
+ return ring->count + ring->next_to_clean - ring->next_to_use - 1;
+}
+
+/**
+ * igbvf_receive_skb - helper function to handle Rx indications
+ * @adapter: board private structure
+ * @status: descriptor status field as written by hardware
+ * @vlan: descriptor vlan field as written by hardware (no le/be conversion)
+ * @skb: pointer to sk_buff to be indicated to stack
+ **/
+static void igbvf_receive_skb(struct igbvf_adapter *adapter,
+ struct net_device *netdev,
+ struct sk_buff *skb,
+ u32 status, u16 vlan)
+{
+ if (adapter->vlgrp && (status & E1000_RXD_STAT_VP))
+ vlan_hwaccel_receive_skb(skb, adapter->vlgrp,
+ le16_to_cpu(vlan) &
+ E1000_RXD_SPC_VLAN_MASK);
+ else
+ netif_receive_skb(skb);
+
+ netdev->last_rx = jiffies;
+}
+
+static inline void igbvf_rx_checksum_adv(struct igbvf_adapter *adapter,
+ u32 status_err, struct sk_buff *skb)
+{
+ skb->ip_summed = CHECKSUM_NONE;
+
+ /* Ignore Checksum bit is set or checksum is disabled through ethtool */
+ if ((status_err & E1000_RXD_STAT_IXSM))
+ return;
+ /* TCP/UDP checksum error bit is set */
+ if (status_err &
+ (E1000_RXDEXT_STATERR_TCPE | E1000_RXDEXT_STATERR_IPE)) {
+ /* let the stack verify checksum errors */
+ adapter->hw_csum_err++;
+ return;
+ }
+ /* It must be a TCP or UDP packet with a valid checksum */
+ if (status_err & (E1000_RXD_STAT_TCPCS | E1000_RXD_STAT_UDPCS))
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ adapter->hw_csum_good++;
+}
+
+/**
+ * igbvf_alloc_rx_buffers - Replace used receive buffers; packet split
+ * @rx_ring: address of ring structure to repopulate
+ * @cleaned_count: number of buffers to repopulate
+ **/
+static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
+ int cleaned_count)
+{
+ struct igbvf_adapter *adapter = rx_ring->adapter;
+ struct net_device *netdev = adapter->netdev;
+ struct pci_dev *pdev = adapter->pdev;
+ union e1000_adv_rx_desc *rx_desc;
+ struct igbvf_buffer *buffer_info;
+ struct sk_buff *skb;
+ unsigned int i;
+ int bufsz;
+
+ i = rx_ring->next_to_use;
+ buffer_info = &rx_ring->buffer_info[i];
+
+ if (adapter->rx_ps_hdr_size)
+ bufsz = adapter->rx_ps_hdr_size;
+ else
+ bufsz = adapter->rx_buffer_len;
+ bufsz += NET_IP_ALIGN;
+
+ while (cleaned_count--) {
+ rx_desc = IGBVF_RX_DESC_ADV(*rx_ring, i);
+
+ if (adapter->rx_ps_hdr_size && !buffer_info->page_dma) {
+ if (!buffer_info->page) {
+ buffer_info->page = alloc_page(GFP_ATOMIC);
+ if (!buffer_info->page) {
+ adapter->alloc_rx_buff_failed++;
+ goto no_buffers;
+ }
+ buffer_info->page_offset = 0;
+ } else {
+ buffer_info->page_offset ^= PAGE_SIZE / 2;
+ }
+ buffer_info->page_dma =
+ pci_map_page(pdev, buffer_info->page,
+ buffer_info->page_offset,
+ PAGE_SIZE / 2,
+ PCI_DMA_FROMDEVICE);
+ }
+
+ if (!buffer_info->skb) {
+ skb = netdev_alloc_skb(netdev, bufsz);
+ if (!skb) {
+ adapter->alloc_rx_buff_failed++;
+ goto no_buffers;
+ }
+
+ /* Make buffer alignment 2 beyond a 16 byte boundary
+ * this will result in a 16 byte aligned IP header after
+ * the 14 byte MAC header is removed
+ */
+ skb_reserve(skb, NET_IP_ALIGN);
+
+ buffer_info->skb = skb;
+ buffer_info->dma = pci_map_single(pdev, skb->data,
+ bufsz,
+ PCI_DMA_FROMDEVICE);
+ }
+ /* Refresh the desc even if buffer_addrs didn't change because
+ * each write-back erases this info. */
+ if (adapter->rx_ps_hdr_size) {
+ rx_desc->read.pkt_addr =
+ cpu_to_le64(buffer_info->page_dma);
+ rx_desc->read.hdr_addr = cpu_to_le64(buffer_info->dma);
+ } else {
+ rx_desc->read.pkt_addr =
+ cpu_to_le64(buffer_info->dma);
+ rx_desc->read.hdr_addr = 0;
+ }
+
+ i++;
+ if (i == rx_ring->count)
+ i = 0;
+ buffer_info = &rx_ring->buffer_info[i];
+ }
+
+no_buffers:
+ if (rx_ring->next_to_use != i) {
+ rx_ring->next_to_use = i;
+ if (i == 0)
+ i = (rx_ring->count - 1);
+ else
+ i--;
+
+ /* Force memory writes to complete before letting h/w
+ * know there are new descriptors to fetch. (Only
+ * applicable for weak-ordered memory model archs,
+ * such as IA-64). */
+ wmb();
+ writel(i, adapter->hw.hw_addr + rx_ring->tail);
+ }
+}
+
+/**
+ * igbvf_clean_rx_irq - Send received data up the network stack; legacy
+ * @adapter: board private structure
+ *
+ * the return value indicates whether actual cleaning was done, there
+ * is no guarantee that everything was cleaned
+ **/
+static bool igbvf_clean_rx_irq(struct igbvf_adapter *adapter,
+ int *work_done, int work_to_do)
+{
+ struct igbvf_ring *rx_ring = adapter->rx_ring;
+ struct net_device *netdev = adapter->netdev;
+ struct pci_dev *pdev = adapter->pdev;
+ union e1000_adv_rx_desc *rx_desc, *next_rxd;
+ struct igbvf_buffer *buffer_info, *next_buffer;
+ struct sk_buff *skb;
+ bool cleaned = false;
+ int cleaned_count = 0;
+ unsigned int total_bytes = 0, total_packets = 0;
+ unsigned int i;
+ u32 length, hlen, staterr;
+
+ i = rx_ring->next_to_clean;
+ rx_desc = IGBVF_RX_DESC_ADV(*rx_ring, i);
+ staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
+
+ while (staterr & E1000_RXD_STAT_DD) {
+ if (*work_done >= work_to_do)
+ break;
+ (*work_done)++;
+
+ buffer_info = &rx_ring->buffer_info[i];
+
+ /* HW will not DMA in data larger than the given buffer, even
+ * if it parses the (NFS, of course) header to be larger. In
+ * that case, it fills the header buffer and spills the rest
+ * into the page.
+ */
+ hlen = (le16_to_cpu(rx_desc->wb.lower.lo_dword.hs_rss.hdr_info) &
+ E1000_RXDADV_HDRBUFLEN_MASK) >> E1000_RXDADV_HDRBUFLEN_SHIFT;
+ if (hlen > adapter->rx_ps_hdr_size)
+ hlen = adapter->rx_ps_hdr_size;
+
+ length = le16_to_cpu(rx_desc->wb.upper.length);
+ cleaned = true;
+ cleaned_count++;
+
+ skb = buffer_info->skb;
+ prefetch(skb->data - NET_IP_ALIGN);
+ buffer_info->skb = NULL;
+ if (!adapter->rx_ps_hdr_size) {
+ pci_unmap_single(pdev, buffer_info->dma,
+ adapter->rx_buffer_len,
+ PCI_DMA_FROMDEVICE);
+ buffer_info->dma = 0;
+ skb_put(skb, length);
+ goto send_up;
+ }
+
+ if (!skb_shinfo(skb)->nr_frags) {
+ pci_unmap_single(pdev, buffer_info->dma,
+ adapter->rx_ps_hdr_size + NET_IP_ALIGN,
+ PCI_DMA_FROMDEVICE);
+ skb_put(skb, hlen);
+ }
+
+ if (length) {
+ pci_unmap_page(pdev, buffer_info->page_dma,
+ PAGE_SIZE / 2,
+ PCI_DMA_FROMDEVICE);
+ buffer_info->page_dma = 0;
+
+ skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
+ buffer_info->page,
+ buffer_info->page_offset,
+ length);
+
+ if ((adapter->rx_buffer_len > (PAGE_SIZE / 2)) ||
+ (page_count(buffer_info->page) != 1))
+ buffer_info->page = NULL;
+ else
+ get_page(buffer_info->page);
+
+ skb->len += length;
+ skb->data_len += length;
+ skb->truesize += length;
+ }
+send_up:
+ i++;
+ if (i == rx_ring->count)
+ i = 0;
+ next_rxd = IGBVF_RX_DESC_ADV(*rx_ring, i);
+ prefetch(next_rxd);
+ next_buffer = &rx_ring->buffer_info[i];
+
+ if (!(staterr & E1000_RXD_STAT_EOP)) {
+ buffer_info->skb = next_buffer->skb;
+ buffer_info->dma = next_buffer->dma;
+ next_buffer->skb = skb;
+ next_buffer->dma = 0;
+ goto next_desc;
+ }
+
+ if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
+ dev_kfree_skb_irq(skb);
+ goto next_desc;
+ }
+
+ total_bytes += skb->len;
+ total_packets++;
+
+ igbvf_rx_checksum_adv(adapter, staterr, skb);
+
+ skb->protocol = eth_type_trans(skb, netdev);
+
+ igbvf_receive_skb(adapter, netdev, skb, staterr,
+ rx_desc->wb.upper.vlan);
+
+ netdev->last_rx = jiffies;
+
+next_desc:
+ rx_desc->wb.upper.status_error = 0;
+
+ /* return some buffers to hardware, one at a time is too slow */
+ if (cleaned_count >= IGBVF_RX_BUFFER_WRITE) {
+ igbvf_alloc_rx_buffers(rx_ring, cleaned_count);
+ cleaned_count = 0;
+ }
+
+ /* use prefetched values */
+ rx_desc = next_rxd;
+ buffer_info = next_buffer;
+
+ staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
+ }
+
+ rx_ring->next_to_clean = i;
+ cleaned_count = igbvf_desc_unused(rx_ring);
+
+ if (cleaned_count)
+ igbvf_alloc_rx_buffers(rx_ring, cleaned_count);
+
+ adapter->total_rx_packets += total_packets;
+ adapter->total_rx_bytes += total_bytes;
+ adapter->net_stats.rx_bytes += total_bytes;
+ adapter->net_stats.rx_packets += total_packets;
+ return cleaned;
+}
+
+static void igbvf_put_txbuf(struct igbvf_adapter *adapter,
+ struct igbvf_buffer *buffer_info)
+{
+ buffer_info->dma = 0;
+ if (buffer_info->skb) {
+ skb_dma_unmap(&adapter->pdev->dev, buffer_info->skb,
+ DMA_TO_DEVICE);
+ dev_kfree_skb_any(buffer_info->skb);
+ buffer_info->skb = NULL;
+ }
+ buffer_info->time_stamp = 0;
+}
+
+static void igbvf_print_tx_hang(struct igbvf_adapter *adapter)
+{
+ struct igbvf_ring *tx_ring = adapter->tx_ring;
+ unsigned int i = tx_ring->next_to_clean;
+ unsigned int eop = tx_ring->buffer_info[i].next_to_watch;
+ union e1000_adv_tx_desc *eop_desc = IGBVF_TX_DESC_ADV(*tx_ring, eop);
+
+ /* detected Tx unit hang */
+ dev_err(&adapter->pdev->dev,
+ "Detected Tx Unit Hang:\n"
+ " TDH <%x>\n"
+ " TDT <%x>\n"
+ " next_to_use <%x>\n"
+ " next_to_clean <%x>\n"
+ "buffer_info[next_to_clean]:\n"
+ " time_stamp <%lx>\n"
+ " next_to_watch <%x>\n"
+ " jiffies <%lx>\n"
+ " next_to_watch.status <%x>\n",
+ readl(adapter->hw.hw_addr + tx_ring->head),
+ readl(adapter->hw.hw_addr + tx_ring->tail),
+ tx_ring->next_to_use,
+ tx_ring->next_to_clean,
+ tx_ring->buffer_info[eop].time_stamp,
+ eop,
+ jiffies,
+ eop_desc->wb.status);
+}
+
+/**
+ * igbvf_setup_tx_resources - allocate Tx resources (Descriptors)
+ * @adapter: board private structure
+ *
+ * Return 0 on success, negative on failure
+ **/
+int igbvf_setup_tx_resources(struct igbvf_adapter *adapter,
+ struct igbvf_ring *tx_ring)
+{
+ struct pci_dev *pdev = adapter->pdev;
+ int size;
+
+ size = sizeof(struct igbvf_buffer) * tx_ring->count;
+ tx_ring->buffer_info = vmalloc(size);
+ if (!tx_ring->buffer_info)
+ goto err;
+ memset(tx_ring->buffer_info, 0, size);
+
+ /* round up to nearest 4K */
+ tx_ring->size = tx_ring->count * sizeof(union e1000_adv_tx_desc);
+ tx_ring->size = ALIGN(tx_ring->size, 4096);
+
+ tx_ring->desc = pci_alloc_consistent(pdev, tx_ring->size,
+ &tx_ring->dma);
+
+ if (!tx_ring->desc)
+ goto err;
+
+ tx_ring->adapter = adapter;
+ tx_ring->next_to_use = 0;
+ tx_ring->next_to_clean = 0;
+
+ return 0;
+err:
+ vfree(tx_ring->buffer_info);
+ dev_err(&adapter->pdev->dev,
+ "Unable to allocate memory for the transmit descriptor ring\n");
+ return -ENOMEM;
+}
+
+/**
+ * igbvf_setup_rx_resources - allocate Rx resources (Descriptors)
+ * @adapter: board private structure
+ *
+ * Returns 0 on success, negative on failure
+ **/
+int igbvf_setup_rx_resources(struct igbvf_adapter *adapter,
+ struct igbvf_ring *rx_ring)
+{
+ struct pci_dev *pdev = adapter->pdev;
+ int size, desc_len;
+
+ size = sizeof(struct igbvf_buffer) * rx_ring->count;
+ rx_ring->buffer_info = vmalloc(size);
+ if (!rx_ring->buffer_info)
+ goto err;
+ memset(rx_ring->buffer_info, 0, size);
+
+ desc_len = sizeof(union e1000_adv_rx_desc);
+
+ /* Round up to nearest 4K */
+ rx_ring->size = rx_ring->count * desc_len;
+ rx_ring->size = ALIGN(rx_ring->size, 4096);
+
+ rx_ring->desc = pci_alloc_consistent(pdev, rx_ring->size,
+ &rx_ring->dma);
+
+ if (!rx_ring->desc)
+ goto err;
+
+ rx_ring->next_to_clean = 0;
+ rx_ring->next_to_use = 0;
+
+ rx_ring->adapter = adapter;
+
+ return 0;
+
+err:
+ vfree(rx_ring->buffer_info);
+ rx_ring->buffer_info = NULL;
+ dev_err(&adapter->pdev->dev,
+ "Unable to allocate memory for the receive descriptor ring\n");
+ return -ENOMEM;
+}
+
+/**
+ * igbvf_clean_tx_ring - Free Tx Buffers
+ * @tx_ring: ring to be cleaned
+ **/
+static void igbvf_clean_tx_ring(struct igbvf_ring *tx_ring)
+{
+ struct igbvf_adapter *adapter = tx_ring->adapter;
+ struct igbvf_buffer *buffer_info;
+ unsigned long size;
+ unsigned int i;
+
+ if (!tx_ring->buffer_info)
+ return;
+
+ /* Free all the Tx ring sk_buffs */
+ for (i = 0; i < tx_ring->count; i++) {
+ buffer_info = &tx_ring->buffer_info[i];
+ igbvf_put_txbuf(adapter, buffer_info);
+ }
+
+ size = sizeof(struct igbvf_buffer) * tx_ring->count;
+ memset(tx_ring->buffer_info, 0, size);
+
+ /* Zero out the descriptor ring */
+ memset(tx_ring->desc, 0, tx_ring->size);
+
+ tx_ring->next_to_use = 0;
+ tx_ring->next_to_clean = 0;
+
+ writel(0, adapter->hw.hw_addr + tx_ring->head);
+ writel(0, adapter->hw.hw_addr + tx_ring->tail);
+}
+
+/**
+ * igbvf_free_tx_resources - Free Tx Resources per Queue
+ * @tx_ring: ring to free resources from
+ *
+ * Free all transmit software resources
+ **/
+void igbvf_free_tx_resources(struct igbvf_ring *tx_ring)
+{
+ struct pci_dev *pdev = tx_ring->adapter->pdev;
+
+ igbvf_clean_tx_ring(tx_ring);
+
+ vfree(tx_ring->buffer_info);
+ tx_ring->buffer_info = NULL;
+
+ pci_free_consistent(pdev, tx_ring->size, tx_ring->desc, tx_ring->dma);
+
+ tx_ring->desc = NULL;
+}
+
+/**
+ * igbvf_clean_rx_ring - Free Rx Buffers per Queue
+ * @adapter: board private structure
+ **/
+static void igbvf_clean_rx_ring(struct igbvf_ring *rx_ring)
+{
+ struct igbvf_adapter *adapter = rx_ring->adapter;
+ struct igbvf_buffer *buffer_info;
+ struct pci_dev *pdev = adapter->pdev;
+ unsigned long size;
+ unsigned int i;
+
+ if (!rx_ring->buffer_info)
+ return;
+
+ /* Free all the Rx ring sk_buffs */
+ for (i = 0; i < rx_ring->count; i++) {
+ buffer_info = &rx_ring->buffer_info[i];
+ if (buffer_info->dma) {
+ if (adapter->rx_ps_hdr_size){
+ pci_unmap_single(pdev, buffer_info->dma,
+ adapter->rx_ps_hdr_size,
+ PCI_DMA_FROMDEVICE);
+ } else {
+ pci_unmap_single(pdev, buffer_info->dma,
+ adapter->rx_buffer_len,
+ PCI_DMA_FROMDEVICE);
+ }
+ buffer_info->dma = 0;
+ }
+
+ if (buffer_info->skb) {
+ dev_kfree_skb(buffer_info->skb);
+ buffer_info->skb = NULL;
+ }
+
+ if (buffer_info->page) {
+ if (buffer_info->page_dma)
+ pci_unmap_page(pdev, buffer_info->page_dma,
+ PAGE_SIZE / 2,
+ PCI_DMA_FROMDEVICE);
+ put_page(buffer_info->page);
+ buffer_info->page = NULL;
+ buffer_info->page_dma = 0;
+ buffer_info->page_offset = 0;
+ }
+ }
+
+ size = sizeof(struct igbvf_buffer) * rx_ring->count;
+ memset(rx_ring->buffer_info, 0, size);
+
+ /* Zero out the descriptor ring */
+ memset(rx_ring->desc, 0, rx_ring->size);
+
+ rx_ring->next_to_clean = 0;
+ rx_ring->next_to_use = 0;
+
+ writel(0, adapter->hw.hw_addr + rx_ring->head);
+ writel(0, adapter->hw.hw_addr + rx_ring->tail);
+}
+
+/**
+ * igbvf_free_rx_resources - Free Rx Resources
+ * @rx_ring: ring to clean the resources from
+ *
+ * Free all receive software resources
+ **/
+
+void igbvf_free_rx_resources(struct igbvf_ring *rx_ring)
+{
+ struct pci_dev *pdev = rx_ring->adapter->pdev;
+
+ igbvf_clean_rx_ring(rx_ring);
+
+ vfree(rx_ring->buffer_info);
+ rx_ring->buffer_info = NULL;
+
+ dma_free_coherent(&pdev->dev, rx_ring->size, rx_ring->desc,
+ rx_ring->dma);
+ rx_ring->desc = NULL;
+}
+
+/**
+ * igbvf_update_itr - update the dynamic ITR value based on statistics
+ * @adapter: pointer to adapter
+ * @itr_setting: current adapter->itr
+ * @packets: the number of packets during this measurement interval
+ * @bytes: the number of bytes during this measurement interval
+ *
+ * Stores a new ITR value based on packets and byte
+ * counts during the last interrupt. The advantage of per interrupt
+ * computation is faster updates and more accurate ITR for the current
+ * traffic pattern. Constants in this function were computed
+ * based on theoretical maximum wire speed and thresholds were set based
+ * on testing data as well as attempting to minimize response time
+ * while increasing bulk throughput. This functionality is controlled
+ * by the InterruptThrottleRate module parameter.
+ **/
+static unsigned int igbvf_update_itr(struct igbvf_adapter *adapter,
+ u16 itr_setting, int packets,
+ int bytes)
+{
+ unsigned int retval = itr_setting;
+
+ if (packets == 0)
+ goto update_itr_done;
+
+ switch (itr_setting) {
+ case lowest_latency:
+ /* handle TSO and jumbo frames */
+ if (bytes/packets > 8000)
+ retval = bulk_latency;
+ else if ((packets < 5) && (bytes > 512))
+ retval = low_latency;
+ break;
+ case low_latency: /* 50 usec aka 20000 ints/s */
+ if (bytes > 10000) {
+ /* this if handles the TSO accounting */
+ if (bytes/packets > 8000)
+ retval = bulk_latency;
+ else if ((packets < 10) || ((bytes/packets) > 1200))
+ retval = bulk_latency;
+ else if ((packets > 35))
+ retval = lowest_latency;
+ } else if (bytes/packets > 2000) {
+ retval = bulk_latency;
+ } else if (packets <= 2 && bytes < 512) {
+ retval = lowest_latency;
+ }
+ break;
+ case bulk_latency: /* 250 usec aka 4000 ints/s */
+ if (bytes > 25000) {
+ if (packets > 35)
+ retval = low_latency;
+ } else if (bytes < 6000) {
+ retval = low_latency;
+ }
+ break;
+ }
+
+update_itr_done:
+ return retval;
+}
+
+static void igbvf_set_itr(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ u16 current_itr;
+ u32 new_itr = adapter->itr;
+
+ adapter->tx_itr = igbvf_update_itr(adapter, adapter->tx_itr,
+ adapter->total_tx_packets,
+ adapter->total_tx_bytes);
+ /* conservative mode (itr 3) eliminates the lowest_latency setting */
+ if (adapter->itr_setting == 3 && adapter->tx_itr == lowest_latency)
+ adapter->tx_itr = low_latency;
+
+ adapter->rx_itr = igbvf_update_itr(adapter, adapter->rx_itr,
+ adapter->total_rx_packets,
+ adapter->total_rx_bytes);
+ /* conservative mode (itr 3) eliminates the lowest_latency setting */
+ if (adapter->itr_setting == 3 && adapter->rx_itr == lowest_latency)
+ adapter->rx_itr = low_latency;
+
+ current_itr = max(adapter->rx_itr, adapter->tx_itr);
+
+ switch (current_itr) {
+ /* counts and packets in update_itr are dependent on these numbers */
+ case lowest_latency:
+ new_itr = 70000;
+ break;
+ case low_latency:
+ new_itr = 20000; /* aka hwitr = ~200 */
+ break;
+ case bulk_latency:
+ new_itr = 4000;
+ break;
+ default:
+ break;
+ }
+
+ if (new_itr != adapter->itr) {
+ /*
+ * this attempts to bias the interrupt rate towards Bulk
+ * by adding intermediate steps when interrupt rate is
+ * increasing
+ */
+ new_itr = new_itr > adapter->itr ?
+ min(adapter->itr + (new_itr >> 2), new_itr) :
+ new_itr;
+ adapter->itr = new_itr;
+ adapter->rx_ring->itr_val = 1952;
+
+ if (adapter->msix_entries)
+ adapter->rx_ring->set_itr = 1;
+ else
+ ew32(ITR, 1952);
+ }
+}
+
+/**
+ * igbvf_clean_tx_irq - Reclaim resources after transmit completes
+ * @adapter: board private structure
+ * returns true if ring is completely cleaned
+ **/
+static bool igbvf_clean_tx_irq(struct igbvf_ring *tx_ring)
+{
+ struct igbvf_adapter *adapter = tx_ring->adapter;
+ struct e1000_hw *hw = &adapter->hw;
+ struct net_device *netdev = adapter->netdev;
+ struct igbvf_buffer *buffer_info;
+ struct sk_buff *skb;
+ union e1000_adv_tx_desc *tx_desc, *eop_desc;
+ unsigned int total_bytes = 0, total_packets = 0;
+ unsigned int i, eop, count = 0;
+ bool cleaned = false;
+
+ i = tx_ring->next_to_clean;
+ eop = tx_ring->buffer_info[i].next_to_watch;
+ eop_desc = IGBVF_TX_DESC_ADV(*tx_ring, eop);
+
+ while ((eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)) &&
+ (count < tx_ring->count)) {
+ for (cleaned = false; !cleaned; count++) {
+ tx_desc = IGBVF_TX_DESC_ADV(*tx_ring, i);
+ buffer_info = &tx_ring->buffer_info[i];
+ cleaned = (i == eop);
+ skb = buffer_info->skb;
+
+ if (skb) {
+ unsigned int segs, bytecount;
+
+ /* gso_segs is currently only valid for tcp */
+ segs = skb_shinfo(skb)->gso_segs ?: 1;
+ /* multiply data chunks by size of headers */
+ bytecount = ((segs - 1) * skb_headlen(skb)) +
+ skb->len;
+ total_packets += segs;
+ total_bytes += bytecount;
+ }
+
+ igbvf_put_txbuf(adapter, buffer_info);
+ tx_desc->wb.status = 0;
+
+ i++;
+ if (i == tx_ring->count)
+ i = 0;
+ }
+ eop = tx_ring->buffer_info[i].next_to_watch;
+ eop_desc = IGBVF_TX_DESC_ADV(*tx_ring, eop);
+ }
+
+ tx_ring->next_to_clean = i;
+
+ if (unlikely(count &&
+ netif_carrier_ok(netdev) &&
+ igbvf_desc_unused(tx_ring) >= IGBVF_TX_QUEUE_WAKE)) {
+ /* Make sure that anybody stopping the queue after this
+ * sees the new next_to_clean.
+ */
+ smp_mb();
+ if (netif_queue_stopped(netdev) &&
+ !(test_bit(__IGBVF_DOWN, &adapter->state))) {
+ netif_wake_queue(netdev);
+ ++adapter->restart_queue;
+ }
+ }
+
+ if (adapter->detect_tx_hung) {
+ /* Detect a transmit hang in hardware, this serializes the
+ * check with the clearing of time_stamp and movement of i */
+ adapter->detect_tx_hung = false;
+ if (tx_ring->buffer_info[i].time_stamp &&
+ time_after(jiffies, tx_ring->buffer_info[i].time_stamp +
+ (adapter->tx_timeout_factor * HZ))
+ && !(er32(STATUS) & E1000_STATUS_TXOFF)) {
+
+ tx_desc = IGBVF_TX_DESC_ADV(*tx_ring, i);
+ /* detected Tx unit hang */
+ igbvf_print_tx_hang(adapter);
+
+ netif_stop_queue(netdev);
+ }
+ }
+ adapter->net_stats.tx_bytes += total_bytes;
+ adapter->net_stats.tx_packets += total_packets;
+ return (count < tx_ring->count);
+}
+
+static irqreturn_t igbvf_msix_other(int irq, void *data)
+{
+ struct net_device *netdev = data;
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+
+ adapter->int_counter1++;
+
+ netif_carrier_off(netdev);
+ hw->mac.get_link_status = 1;
+ if (!test_bit(__IGBVF_DOWN, &adapter->state))
+ mod_timer(&adapter->watchdog_timer, jiffies + 1);
+
+ ew32(EIMS, adapter->eims_other);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t igbvf_intr_msix_tx(int irq, void *data)
+{
+ struct net_device *netdev = data;
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+ struct igbvf_ring *tx_ring = adapter->tx_ring;
+
+
+ adapter->total_tx_bytes = 0;
+ adapter->total_tx_packets = 0;
+
+ /* auto mask will automatically reenable the interrupt when we write
+ * EICS */
+ if (!igbvf_clean_tx_irq(tx_ring))
+ /* Ring was not completely cleaned, so fire another interrupt */
+ ew32(EICS, tx_ring->eims_value);
+ else
+ ew32(EIMS, tx_ring->eims_value);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t igbvf_intr_msix_rx(int irq, void *data)
+{
+ struct net_device *netdev = data;
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ adapter->int_counter0++;
+
+ /* Write the ITR value calculated at the end of the
+ * previous interrupt.
+ */
+ if (adapter->rx_ring->set_itr) {
+ writel(adapter->rx_ring->itr_val,
+ adapter->hw.hw_addr + adapter->rx_ring->itr_register);
+ adapter->rx_ring->set_itr = 0;
+ }
+
+ if (napi_schedule_prep(&adapter->rx_ring->napi)) {
+ adapter->total_rx_bytes = 0;
+ adapter->total_rx_packets = 0;
+ __napi_schedule(&adapter->rx_ring->napi);
+ }
+
+ return IRQ_HANDLED;
+}
+
+#define IGBVF_NO_QUEUE -1
+
+static void igbvf_assign_vector(struct igbvf_adapter *adapter, int rx_queue,
+ int tx_queue, int msix_vector)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ u32 ivar, index;
+
+ /* 82576 uses a table-based method for assigning vectors.
+ Each queue has a single entry in the table to which we write
+ a vector number along with a "valid" bit. Sadly, the layout
+ of the table is somewhat counterintuitive. */
+ if (rx_queue > IGBVF_NO_QUEUE) {
+ index = (rx_queue >> 1);
+ ivar = array_er32(IVAR0, index);
+ if (rx_queue & 0x1) {
+ /* vector goes into third byte of register */
+ ivar = ivar & 0xFF00FFFF;
+ ivar |= (msix_vector | E1000_IVAR_VALID) << 16;
+ } else {
+ /* vector goes into low byte of register */
+ ivar = ivar & 0xFFFFFF00;
+ ivar |= msix_vector | E1000_IVAR_VALID;
+ }
+ adapter->rx_ring[rx_queue].eims_value = 1 << msix_vector;
+ array_ew32(IVAR0, index, ivar);
+ }
+ if (tx_queue > IGBVF_NO_QUEUE) {
+ index = (tx_queue >> 1);
+ ivar = array_er32(IVAR0, index);
+ if (tx_queue & 0x1) {
+ /* vector goes into high byte of register */
+ ivar = ivar & 0x00FFFFFF;
+ ivar |= (msix_vector | E1000_IVAR_VALID) << 24;
+ } else {
+ /* vector goes into second byte of register */
+ ivar = ivar & 0xFFFF00FF;
+ ivar |= (msix_vector | E1000_IVAR_VALID) << 8;
+ }
+ adapter->tx_ring[tx_queue].eims_value = 1 << msix_vector;
+ array_ew32(IVAR0, index, ivar);
+ }
+}
+
+/**
+ * igbvf_configure_msix - Configure MSI-X hardware
+ *
+ * igbvf_configure_msix sets up the hardware to properly
+ * generate MSI-X interrupts.
+ **/
+static void igbvf_configure_msix(struct igbvf_adapter *adapter)
+{
+ u32 tmp;
+ struct e1000_hw *hw = &adapter->hw;
+ struct igbvf_ring *tx_ring = adapter->tx_ring;
+ struct igbvf_ring *rx_ring = adapter->rx_ring;
+ int vector = 0;
+
+ adapter->eims_enable_mask = 0;
+
+ igbvf_assign_vector(adapter, IGBVF_NO_QUEUE, 0, vector++);
+ adapter->eims_enable_mask |= tx_ring->eims_value;
+ if (tx_ring->itr_val)
+ writel(tx_ring->itr_val,
+ hw->hw_addr + tx_ring->itr_register);
+ else
+ writel(1952, hw->hw_addr + tx_ring->itr_register);
+
+ igbvf_assign_vector(adapter, 0, IGBVF_NO_QUEUE, vector++);
+ adapter->eims_enable_mask |= rx_ring->eims_value;
+ if (rx_ring->itr_val)
+ writel(rx_ring->itr_val,
+ hw->hw_addr + rx_ring->itr_register);
+ else
+ writel(1952, hw->hw_addr + rx_ring->itr_register);
+
+ /* set vector for other causes, i.e. link changes */
+
+ tmp = (vector++ | E1000_IVAR_VALID);
+
+ ew32(IVAR_MISC, tmp);
+
+ adapter->eims_enable_mask = (1 << (vector)) - 1;
+ adapter->eims_other = 1 << (vector - 1);
+ e1e_flush();
+}
+
+void igbvf_reset_interrupt_capability(struct igbvf_adapter *adapter)
+{
+ if (adapter->msix_entries) {
+ pci_disable_msix(adapter->pdev);
+ kfree(adapter->msix_entries);
+ adapter->msix_entries = NULL;
+ }
+}
+
+/**
+ * igbvf_set_interrupt_capability - set MSI or MSI-X if supported
+ *
+ * Attempt to configure interrupts using the best available
+ * capabilities of the hardware and kernel.
+ **/
+void igbvf_set_interrupt_capability(struct igbvf_adapter *adapter)
+{
+ int err = -ENOMEM;
+ int i;
+
+ /* we allocate 3 vectors, 1 for tx, 1 for rx, one for pf messages */
+ adapter->msix_entries = kcalloc(3, sizeof(struct msix_entry),
+ GFP_KERNEL);
+ if (adapter->msix_entries) {
+ for (i = 0; i < 3; i++)
+ adapter->msix_entries[i].entry = i;
+
+ err = pci_enable_msix(adapter->pdev,
+ adapter->msix_entries, 3);
+ }
+
+ if (err) {
+ /* MSI-X failed */
+ dev_err(&adapter->pdev->dev,
+ "Failed to initialize MSI-X interrupts.\n");
+ igbvf_reset_interrupt_capability(adapter);
+ }
+}
+
+/**
+ * igbvf_request_msix - Initialize MSI-X interrupts
+ *
+ * igbvf_request_msix allocates MSI-X vectors and requests interrupts from the
+ * kernel.
+ **/
+static int igbvf_request_msix(struct igbvf_adapter *adapter)
+{
+ struct net_device *netdev = adapter->netdev;
+ int err = 0, vector = 0;
+
+ if (strlen(netdev->name) < (IFNAMSIZ - 5)) {
+ sprintf(adapter->tx_ring->name, "%s-tx-0", netdev->name);
+ sprintf(adapter->rx_ring->name, "%s-rx-0", netdev->name);
+ } else {
+ memcpy(adapter->tx_ring->name, netdev->name, IFNAMSIZ);
+ memcpy(adapter->rx_ring->name, netdev->name, IFNAMSIZ);
+ }
+
+ err = request_irq(adapter->msix_entries[vector].vector,
+ &igbvf_intr_msix_tx, 0, adapter->tx_ring->name,
+ netdev);
+ if (err)
+ goto out;
+
+ adapter->tx_ring->itr_register = E1000_EITR(vector);
+ adapter->tx_ring->itr_val = 1952;
+ vector++;
+
+ err = request_irq(adapter->msix_entries[vector].vector,
+ &igbvf_intr_msix_rx, 0, adapter->rx_ring->name,
+ netdev);
+ if (err)
+ goto out;
+
+ adapter->rx_ring->itr_register = E1000_EITR(vector);
+ adapter->rx_ring->itr_val = 1952;
+ vector++;
+
+ err = request_irq(adapter->msix_entries[vector].vector,
+ &igbvf_msix_other, 0, netdev->name, netdev);
+ if (err)
+ goto out;
+
+ igbvf_configure_msix(adapter);
+ return 0;
+out:
+ return err;
+}
+
+/**
+ * igbvf_alloc_queues - Allocate memory for all rings
+ * @adapter: board private structure to initialize
+ **/
+static int __devinit igbvf_alloc_queues(struct igbvf_adapter *adapter)
+{
+ struct net_device *netdev = adapter->netdev;
+
+ adapter->tx_ring = kzalloc(sizeof(struct igbvf_ring), GFP_KERNEL);
+ if (!adapter->tx_ring)
+ return -ENOMEM;
+
+ adapter->rx_ring = kzalloc(sizeof(struct igbvf_ring), GFP_KERNEL);
+ if (!adapter->rx_ring) {
+ kfree(adapter->tx_ring);
+ return -ENOMEM;
+ }
+
+ netif_napi_add(netdev, &adapter->rx_ring->napi, igbvf_poll, 64);
+
+ return 0;
+}
+
+/**
+ * igbvf_request_irq - initialize interrupts
+ *
+ * Attempts to configure interrupts using the best available
+ * capabilities of the hardware and kernel.
+ **/
+static int igbvf_request_irq(struct igbvf_adapter *adapter)
+{
+ int err = -1;
+
+ /* igbvf supports msi-x only */
+ if (adapter->msix_entries)
+ err = igbvf_request_msix(adapter);
+
+ if (!err)
+ return err;
+
+ dev_err(&adapter->pdev->dev,
+ "Unable to allocate interrupt, Error: %d\n", err);
+
+ return err;
+}
+
+static void igbvf_free_irq(struct igbvf_adapter *adapter)
+{
+ struct net_device *netdev = adapter->netdev;
+ int vector;
+
+ if (adapter->msix_entries) {
+ for (vector = 0; vector < 3; vector++)
+ free_irq(adapter->msix_entries[vector].vector, netdev);
+ }
+}
+
+/**
+ * igbvf_irq_disable - Mask off interrupt generation on the NIC
+ **/
+static void igbvf_irq_disable(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+
+ ew32(EIMC, ~0);
+
+ if (adapter->msix_entries)
+ ew32(EIAC, 0);
+}
+
+/**
+ * igbvf_irq_enable - Enable default interrupt generation settings
+ **/
+static void igbvf_irq_enable(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+
+ ew32(EIAC, adapter->eims_enable_mask);
+ ew32(EIAM, adapter->eims_enable_mask);
+ ew32(EIMS, adapter->eims_enable_mask);
+}
+
+/**
+ * igbvf_poll - NAPI Rx polling callback
+ * @napi: struct associated with this polling callback
+ * @budget: amount of packets driver is allowed to process this poll
+ **/
+static int igbvf_poll(struct napi_struct *napi, int budget)
+{
+ struct igbvf_ring *rx_ring = container_of(napi, struct igbvf_ring, napi);
+ struct igbvf_adapter *adapter = rx_ring->adapter;
+ struct e1000_hw *hw = &adapter->hw;
+ int work_done = 0;
+
+ igbvf_clean_rx_irq(adapter, &work_done, budget);
+
+ /* If not enough Rx work done, exit the polling mode */
+ if (work_done < budget) {
+ napi_complete(napi);
+
+ if (adapter->itr_setting & 3)
+ igbvf_set_itr(adapter);
+
+ if (!test_bit(__IGBVF_DOWN, &adapter->state))
+ ew32(EIMS, adapter->rx_ring->eims_value);
+ }
+
+ return work_done;
+}
+
+/**
+ * igbvf_set_rlpml - set receive large packet maximum length
+ * @adapter: board private structure
+ *
+ * Configure the maximum size of packets that will be received
+ */
+static void igbvf_set_rlpml(struct igbvf_adapter *adapter)
+{
+ int max_frame_size = adapter->max_frame_size;
+ struct e1000_hw *hw = &adapter->hw;
+
+ if (adapter->vlgrp)
+ max_frame_size += VLAN_TAG_SIZE;
+
+ e1000_rlpml_set_vf(hw, max_frame_size);
+}
+
+static void igbvf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+
+ if (hw->mac.ops.set_vfta(hw, vid, true))
+ dev_err(&adapter->pdev->dev, "Failed to add vlan id %d\n", vid);
+}
+
+static void igbvf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+
+ igbvf_irq_disable(adapter);
+ vlan_group_set_device(adapter->vlgrp, vid, NULL);
+
+ if (!test_bit(__IGBVF_DOWN, &adapter->state))
+ igbvf_irq_enable(adapter);
+
+ if (hw->mac.ops.set_vfta(hw, vid, false))
+ dev_err(&adapter->pdev->dev,
+ "Failed to remove vlan id %d\n", vid);
+}
+
+static void igbvf_vlan_rx_register(struct net_device *netdev,
+ struct vlan_group *grp)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ adapter->vlgrp = grp;
+}
+
+static void igbvf_restore_vlan(struct igbvf_adapter *adapter)
+{
+ u16 vid;
+
+ if (!adapter->vlgrp)
+ return;
+
+ for (vid = 0; vid < VLAN_GROUP_ARRAY_LEN; vid++) {
+ if (!vlan_group_get_device(adapter->vlgrp, vid))
+ continue;
+ igbvf_vlan_rx_add_vid(adapter->netdev, vid);
+ }
+
+ igbvf_set_rlpml(adapter);
+}
+
+/**
+ * igbvf_configure_tx - Configure Transmit Unit after Reset
+ * @adapter: board private structure
+ *
+ * Configure the Tx unit of the MAC after a reset.
+ **/
+static void igbvf_configure_tx(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ struct igbvf_ring *tx_ring = adapter->tx_ring;
+ u64 tdba;
+ u32 txdctl, dca_txctrl;
+
+ /* disable transmits */
+ txdctl = er32(TXDCTL(0));
+ ew32(TXDCTL(0), txdctl & ~E1000_TXDCTL_QUEUE_ENABLE);
+ msleep(10);
+
+ /* Setup the HW Tx Head and Tail descriptor pointers */
+ ew32(TDLEN(0), tx_ring->count * sizeof(union e1000_adv_tx_desc));
+ tdba = tx_ring->dma;
+ ew32(TDBAL(0), (tdba & DMA_32BIT_MASK));
+ ew32(TDBAH(0), (tdba >> 32));
+ ew32(TDH(0), 0);
+ ew32(TDT(0), 0);
+ tx_ring->head = E1000_TDH(0);
+ tx_ring->tail = E1000_TDT(0);
+
+ /* Turn off Relaxed Ordering on head write-backs. The writebacks
+ * MUST be delivered in order or it will completely screw up
+ * our bookeeping.
+ */
+ dca_txctrl = er32(DCA_TXCTRL(0));
+ dca_txctrl &= ~E1000_DCA_TXCTRL_TX_WB_RO_EN;
+ ew32(DCA_TXCTRL(0), dca_txctrl);
+
+ /* enable transmits */
+ txdctl |= E1000_TXDCTL_QUEUE_ENABLE;
+ ew32(TXDCTL(0), txdctl);
+
+ /* Setup Transmit Descriptor Settings for eop descriptor */
+ adapter->txd_cmd = E1000_ADVTXD_DCMD_EOP | E1000_ADVTXD_DCMD_IFCS;
+
+ /* enable Report Status bit */
+ adapter->txd_cmd |= E1000_ADVTXD_DCMD_RS;
+
+ adapter->tx_queue_len = adapter->netdev->tx_queue_len;
+}
+
+/**
+ * igbvf_setup_srrctl - configure the receive control registers
+ * @adapter: Board private structure
+ **/
+static void igbvf_setup_srrctl(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ u32 srrctl = 0;
+
+ srrctl &= ~(E1000_SRRCTL_DESCTYPE_MASK |
+ E1000_SRRCTL_BSIZEHDR_MASK |
+ E1000_SRRCTL_BSIZEPKT_MASK);
+
+ /* Enable queue drop to avoid head of line blocking */
+ srrctl |= E1000_SRRCTL_DROP_EN;
+
+ /* Setup buffer sizes */
+ srrctl |= ALIGN(adapter->rx_buffer_len, 1024) >>
+ E1000_SRRCTL_BSIZEPKT_SHIFT;
+
+ if (adapter->rx_buffer_len < 2048) {
+ adapter->rx_ps_hdr_size = 0;
+ srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
+ } else {
+ adapter->rx_ps_hdr_size = 128;
+ srrctl |= adapter->rx_ps_hdr_size <<
+ E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
+ srrctl |= E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
+ }
+
+ ew32(SRRCTL(0), srrctl);
+}
+
+/**
+ * igbvf_configure_rx - Configure Receive Unit after Reset
+ * @adapter: board private structure
+ *
+ * Configure the Rx unit of the MAC after a reset.
+ **/
+static void igbvf_configure_rx(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ struct igbvf_ring *rx_ring = adapter->rx_ring;
+ u64 rdba;
+ u32 rdlen, rxdctl;
+
+ /* disable receives */
+ rxdctl = er32(RXDCTL(0));
+ ew32(RXDCTL(0), rxdctl & ~E1000_RXDCTL_QUEUE_ENABLE);
+ msleep(10);
+
+ rdlen = rx_ring->count * sizeof(union e1000_adv_rx_desc);
+
+ /*
+ * Setup the HW Rx Head and Tail Descriptor Pointers and
+ * the Base and Length of the Rx Descriptor Ring
+ */
+ rdba = rx_ring->dma;
+ ew32(RDBAL(0), (rdba & DMA_32BIT_MASK));
+ ew32(RDBAH(0), (rdba >> 32));
+ ew32(RDLEN(0), rx_ring->count * sizeof(union e1000_adv_rx_desc));
+ rx_ring->head = E1000_RDH(0);
+ rx_ring->tail = E1000_RDT(0);
+ ew32(RDH(0), 0);
+ ew32(RDT(0), 0);
+
+ rxdctl |= E1000_RXDCTL_QUEUE_ENABLE;
+ rxdctl &= 0xFFF00000;
+ rxdctl |= IGBVF_RX_PTHRESH;
+ rxdctl |= IGBVF_RX_HTHRESH << 8;
+ rxdctl |= IGBVF_RX_WTHRESH << 16;
+
+ igbvf_set_rlpml(adapter);
+
+ /* enable receives */
+ ew32(RXDCTL(0), rxdctl);
+}
+
+/**
+ * igbvf_set_multi - Multicast and Promiscuous mode set
+ * @netdev: network interface device structure
+ *
+ * The set_multi entry point is called whenever the multicast address
+ * list or the network interface flags are updated. This routine is
+ * responsible for configuring the hardware for proper multicast,
+ * promiscuous mode, and all-multi behavior.
+ **/
+static void igbvf_set_multi(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+ struct dev_mc_list *mc_ptr;
+ u8 *mta_list = NULL;
+ int i;
+
+ if (netdev->mc_count) {
+ mta_list = kmalloc(netdev->mc_count * 6, GFP_ATOMIC);
+ if (!mta_list) {
+ dev_err(&adapter->pdev->dev,
+ "failed to allocate multicast filter list\n");
+ return;
+ }
+ }
+
+ /* prepare a packed array of only addresses. */
+ mc_ptr = netdev->mc_list;
+
+ for (i = 0; i < netdev->mc_count; i++) {
+ if (!mc_ptr)
+ break;
+ memcpy(mta_list + (i*ETH_ALEN), mc_ptr->dmi_addr,
+ ETH_ALEN);
+ mc_ptr = mc_ptr->next;
+ }
+
+ hw->mac.ops.update_mc_addr_list(hw, mta_list, i, 0, 0);
+ kfree(mta_list);
+}
+
+/**
+ * igbvf_configure - configure the hardware for Rx and Tx
+ * @adapter: private board structure
+ **/
+static void igbvf_configure(struct igbvf_adapter *adapter)
+{
+ igbvf_set_multi(adapter->netdev);
+
+ igbvf_restore_vlan(adapter);
+
+ igbvf_configure_tx(adapter);
+ igbvf_setup_srrctl(adapter);
+ igbvf_configure_rx(adapter);
+ igbvf_alloc_rx_buffers(adapter->rx_ring,
+ igbvf_desc_unused(adapter->rx_ring));
+}
+
+/* igbvf_reset - bring the hardware into a known good state
+ *
+ * This function boots the hardware and enables some settings that
+ * require a configuration cycle of the hardware - those cannot be
+ * set/changed during runtime. After reset the device needs to be
+ * properly configured for Rx, Tx etc.
+ */
+void igbvf_reset(struct igbvf_adapter *adapter)
+{
+ struct e1000_mac_info *mac = &adapter->hw.mac;
+ struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
+
+ /* Allow time for pending master requests to run */
+ if (mac->ops.reset_hw(hw))
+ dev_err(&adapter->pdev->dev, "PF still resetting\n");
+
+ mac->ops.init_hw(hw);
+
+ if (is_valid_ether_addr(adapter->hw.mac.addr)) {
+ memcpy(netdev->dev_addr, adapter->hw.mac.addr,
+ netdev->addr_len);
+ memcpy(netdev->perm_addr, adapter->hw.mac.addr,
+ netdev->addr_len);
+ }
+}
+
+int igbvf_up(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+
+ /* hardware has been reset, we need to reload some things */
+ igbvf_configure(adapter);
+
+ clear_bit(__IGBVF_DOWN, &adapter->state);
+
+ napi_enable(&adapter->rx_ring->napi);
+ if (adapter->msix_entries)
+ igbvf_configure_msix(adapter);
+
+ /* Clear any pending interrupts. */
+ er32(EICR);
+ igbvf_irq_enable(adapter);
+
+ /* start the watchdog */
+ hw->mac.get_link_status = 1;
+ mod_timer(&adapter->watchdog_timer, jiffies + 1);
+
+
+ return 0;
+}
+
+void igbvf_down(struct igbvf_adapter *adapter)
+{
+ struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
+ u32 rxdctl, txdctl;
+
+ /*
+ * signal that we're down so the interrupt handler does not
+ * reschedule our watchdog timer
+ */
+ set_bit(__IGBVF_DOWN, &adapter->state);
+
+ /* disable receives in the hardware */
+ rxdctl = er32(RXDCTL(0));
+ ew32(RXDCTL(0), rxdctl & ~E1000_RXDCTL_QUEUE_ENABLE);
+
+ netif_stop_queue(netdev);
+
+ /* disable transmits in the hardware */
+ txdctl = er32(TXDCTL(0));
+ ew32(TXDCTL(0), txdctl & ~E1000_TXDCTL_QUEUE_ENABLE);
+
+ /* flush both disables and wait for them to finish */
+ e1e_flush();
+ msleep(10);
+
+ napi_disable(&adapter->rx_ring->napi);
+
+ igbvf_irq_disable(adapter);
+
+ del_timer_sync(&adapter->watchdog_timer);
+
+ netdev->tx_queue_len = adapter->tx_queue_len;
+ netif_carrier_off(netdev);
+
+ /* record the stats before reset*/
+ igbvf_update_stats(adapter);
+
+ adapter->link_speed = 0;
+ adapter->link_duplex = 0;
+
+ igbvf_reset(adapter);
+ igbvf_clean_tx_ring(adapter->tx_ring);
+ igbvf_clean_rx_ring(adapter->rx_ring);
+}
+
+void igbvf_reinit_locked(struct igbvf_adapter *adapter)
+{
+ might_sleep();
+ while (test_and_set_bit(__IGBVF_RESETTING, &adapter->state))
+ msleep(1);
+ igbvf_down(adapter);
+ igbvf_up(adapter);
+ clear_bit(__IGBVF_RESETTING, &adapter->state);
+}
+
+/**
+ * igbvf_sw_init - Initialize general software structures (struct igbvf_adapter)
+ * @adapter: board private structure to initialize
+ *
+ * igbvf_sw_init initializes the Adapter private data structure.
+ * Fields are initialized based on PCI device information and
+ * OS network device settings (MTU size).
+ **/
+static int __devinit igbvf_sw_init(struct igbvf_adapter *adapter)
+{
+ struct net_device *netdev = adapter->netdev;
+ s32 rc;
+
+ adapter->rx_buffer_len = ETH_FRAME_LEN + VLAN_HLEN + ETH_FCS_LEN;
+ adapter->rx_ps_hdr_size = 0;
+ adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
+ adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
+
+ adapter->tx_int_delay = 8;
+ adapter->tx_abs_int_delay = 32;
+ adapter->rx_int_delay = 0;
+ adapter->rx_abs_int_delay = 8;
+ adapter->itr_setting = 3;
+ adapter->itr = 20000;
+
+ /* Set various function pointers */
+ adapter->ei->init_ops(&adapter->hw);
+
+ rc = adapter->hw.mac.ops.init_params(&adapter->hw);
+ if (rc)
+ return rc;
+
+ rc = adapter->hw.mbx.ops.init_params(&adapter->hw);
+ if (rc)
+ return rc;
+
+ igbvf_set_interrupt_capability(adapter);
+
+ if (igbvf_alloc_queues(adapter))
+ return -ENOMEM;
+
+ spin_lock_init(&adapter->tx_queue_lock);
+
+ /* Explicitly disable IRQ since the NIC can be in any state. */
+ igbvf_irq_disable(adapter);
+
+ spin_lock_init(&adapter->stats_lock);
+
+ set_bit(__IGBVF_DOWN, &adapter->state);
+ return 0;
+}
+
+static void igbvf_initialize_last_counter_stats(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+
+ adapter->stats.last_gprc = er32(VFGPRC);
+ adapter->stats.last_gorc = er32(VFGORC);
+ adapter->stats.last_gptc = er32(VFGPTC);
+ adapter->stats.last_gotc = er32(VFGOTC);
+ adapter->stats.last_mprc = er32(VFMPRC);
+ adapter->stats.last_gotlbc = er32(VFGOTLBC);
+ adapter->stats.last_gptlbc = er32(VFGPTLBC);
+ adapter->stats.last_gorlbc = er32(VFGORLBC);
+ adapter->stats.last_gprlbc = er32(VFGPRLBC);
+
+ adapter->stats.base_gprc = er32(VFGPRC);
+ adapter->stats.base_gorc = er32(VFGORC);
+ adapter->stats.base_gptc = er32(VFGPTC);
+ adapter->stats.base_gotc = er32(VFGOTC);
+ adapter->stats.base_mprc = er32(VFMPRC);
+ adapter->stats.base_gotlbc = er32(VFGOTLBC);
+ adapter->stats.base_gptlbc = er32(VFGPTLBC);
+ adapter->stats.base_gorlbc = er32(VFGORLBC);
+ adapter->stats.base_gprlbc = er32(VFGPRLBC);
+}
+
+/**
+ * igbvf_open - Called when a network interface is made active
+ * @netdev: network interface device structure
+ *
+ * Returns 0 on success, negative value on failure
+ *
+ * The open entry point is called when a network interface is made
+ * active by the system (IFF_UP). At this point all resources needed
+ * for transmit and receive operations are allocated, the interrupt
+ * handler is registered with the OS, the watchdog timer is started,
+ * and the stack is notified that the interface is ready.
+ **/
+static int igbvf_open(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+ int err;
+
+ /* disallow open during test */
+ if (test_bit(__IGBVF_TESTING, &adapter->state))
+ return -EBUSY;
+
+ /* allocate transmit descriptors */
+ err = igbvf_setup_tx_resources(adapter, adapter->tx_ring);
+ if (err)
+ goto err_setup_tx;
+
+ /* allocate receive descriptors */
+ err = igbvf_setup_rx_resources(adapter, adapter->rx_ring);
+ if (err)
+ goto err_setup_rx;
+
+ /*
+ * before we allocate an interrupt, we must be ready to handle it.
+ * Setting DEBUG_SHIRQ in the kernel makes it fire an interrupt
+ * as soon as we call pci_request_irq, so we have to setup our
+ * clean_rx handler before we do so.
+ */
+ igbvf_configure(adapter);
+
+ err = igbvf_request_irq(adapter);
+ if (err)
+ goto err_req_irq;
+
+ /* From here on the code is the same as igbvf_up() */
+ clear_bit(__IGBVF_DOWN, &adapter->state);
+
+ napi_enable(&adapter->rx_ring->napi);
+
+ /* clear any pending interrupts */
+ er32(EICR);
+
+ igbvf_irq_enable(adapter);
+
+ /* start the watchdog */
+ hw->mac.get_link_status = 1;
+ mod_timer(&adapter->watchdog_timer, jiffies + 1);
+
+ return 0;
+
+err_req_irq:
+ igbvf_free_rx_resources(adapter->rx_ring);
+err_setup_rx:
+ igbvf_free_tx_resources(adapter->tx_ring);
+err_setup_tx:
+ igbvf_reset(adapter);
+
+ return err;
+}
+
+/**
+ * igbvf_close - Disables a network interface
+ * @netdev: network interface device structure
+ *
+ * Returns 0, this is not allowed to fail
+ *
+ * The close entry point is called when an interface is de-activated
+ * by the OS. The hardware is still under the drivers control, but
+ * needs to be disabled. A global MAC reset is issued to stop the
+ * hardware, and all transmit and receive resources are freed.
+ **/
+static int igbvf_close(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ WARN_ON(test_bit(__IGBVF_RESETTING, &adapter->state));
+ igbvf_down(adapter);
+
+ igbvf_free_irq(adapter);
+
+ igbvf_free_tx_resources(adapter->tx_ring);
+ igbvf_free_rx_resources(adapter->rx_ring);
+
+ return 0;
+}
+/**
+ * igbvf_set_mac - Change the Ethernet Address of the NIC
+ * @netdev: network interface device structure
+ * @p: pointer to an address structure
+ *
+ * Returns 0 on success, negative on failure
+ **/
+static int igbvf_set_mac(struct net_device *netdev, void *p)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+ struct sockaddr *addr = p;
+
+ if (!is_valid_ether_addr(addr->sa_data))
+ return -EADDRNOTAVAIL;
+
+ memcpy(hw->mac.addr, addr->sa_data, netdev->addr_len);
+
+ hw->mac.ops.rar_set(hw, hw->mac.addr, 0);
+
+ if (memcmp(addr->sa_data, hw->mac.addr, 6))
+ return -EADDRNOTAVAIL;
+
+ memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
+
+ return 0;
+}
+
+#define UPDATE_VF_COUNTER(reg, name) \
+ { \
+ u32 current_counter = er32(reg); \
+ if (current_counter < adapter->stats.last_##name) \
+ adapter->stats.name += 0x100000000LL; \
+ adapter->stats.last_##name = current_counter; \
+ adapter->stats.name &= 0xFFFFFFFF00000000LL; \
+ adapter->stats.name |= current_counter; \
+ }
+
+/**
+ * igbvf_update_stats - Update the board statistics counters
+ * @adapter: board private structure
+**/
+void igbvf_update_stats(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ struct pci_dev *pdev = adapter->pdev;
+
+ /*
+ * Prevent stats update while adapter is being reset, link is down
+ * or if the pci connection is down.
+ */
+ if (adapter->link_speed == 0)
+ return;
+
+ if (test_bit(__IGBVF_RESETTING, &adapter->state))
+ return;
+
+ if (pci_channel_offline(pdev))
+ return;
+
+ UPDATE_VF_COUNTER(VFGPRC, gprc);
+ UPDATE_VF_COUNTER(VFGORC, gorc);
+ UPDATE_VF_COUNTER(VFGPTC, gptc);
+ UPDATE_VF_COUNTER(VFGOTC, gotc);
+ UPDATE_VF_COUNTER(VFMPRC, mprc);
+ UPDATE_VF_COUNTER(VFGOTLBC, gotlbc);
+ UPDATE_VF_COUNTER(VFGPTLBC, gptlbc);
+ UPDATE_VF_COUNTER(VFGORLBC, gorlbc);
+ UPDATE_VF_COUNTER(VFGPRLBC, gprlbc);
+
+ /* Fill out the OS statistics structure */
+ adapter->net_stats.multicast = adapter->stats.mprc;
+}
+
+static void igbvf_print_link_info(struct igbvf_adapter *adapter)
+{
+ dev_info(&adapter->pdev->dev, "Link is Up %d Mbps %s\n",
+ adapter->link_speed,
+ ((adapter->link_duplex == FULL_DUPLEX) ?
+ "Full Duplex" : "Half Duplex"));
+}
+
+static bool igbvf_has_link(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ s32 ret_val = E1000_SUCCESS;
+ bool link_active;
+
+ ret_val = hw->mac.ops.check_for_link(hw);
+ link_active = !hw->mac.get_link_status;
+
+ /* if check for link returns error we will need to reset */
+ if (ret_val)
+ schedule_work(&adapter->reset_task);
+
+ return link_active;
+}
+
+/**
+ * igbvf_watchdog - Timer Call-back
+ * @data: pointer to adapter cast into an unsigned long
+ **/
+static void igbvf_watchdog(unsigned long data)
+{
+ struct igbvf_adapter *adapter = (struct igbvf_adapter *) data;
+
+ /* Do the rest outside of interrupt context */
+ schedule_work(&adapter->watchdog_task);
+}
+
+static void igbvf_watchdog_task(struct work_struct *work)
+{
+ struct igbvf_adapter *adapter = container_of(work,
+ struct igbvf_adapter,
+ watchdog_task);
+ struct net_device *netdev = adapter->netdev;
+ struct e1000_mac_info *mac = &adapter->hw.mac;
+ struct igbvf_ring *tx_ring = adapter->tx_ring;
+ struct e1000_hw *hw = &adapter->hw;
+ u32 link;
+ int tx_pending = 0;
+
+ link = igbvf_has_link(adapter);
+
+ if (link) {
+ if (!netif_carrier_ok(netdev)) {
+ bool txb2b = 1;
+
+ mac->ops.get_link_up_info(&adapter->hw,
+ &adapter->link_speed,
+ &adapter->link_duplex);
+ igbvf_print_link_info(adapter);
+
+ /*
+ * tweak tx_queue_len according to speed/duplex
+ * and adjust the timeout factor
+ */
+ netdev->tx_queue_len = adapter->tx_queue_len;
+ adapter->tx_timeout_factor = 1;
+ switch (adapter->link_speed) {
+ case SPEED_10:
+ txb2b = 0;
+ netdev->tx_queue_len = 10;
+ adapter->tx_timeout_factor = 16;
+ break;
+ case SPEED_100:
+ txb2b = 0;
+ netdev->tx_queue_len = 100;
+ /* maybe add some timeout factor ? */
+ break;
+ }
+
+ netif_carrier_on(netdev);
+ netif_wake_queue(netdev);
+ }
+ } else {
+ if (netif_carrier_ok(netdev)) {
+ adapter->link_speed = 0;
+ adapter->link_duplex = 0;
+ dev_info(&adapter->pdev->dev, "Link is Down\n");
+ netif_carrier_off(netdev);
+ netif_stop_queue(netdev);
+ }
+ }
+
+ if (netif_carrier_ok(netdev)) {
+ igbvf_update_stats(adapter);
+ } else {
+ tx_pending = (igbvf_desc_unused(tx_ring) + 1 <
+ tx_ring->count);
+ if (tx_pending) {
+ /*
+ * We've lost link, so the controller stops DMA,
+ * but we've got queued Tx work that's never going
+ * to get done, so reset controller to flush Tx.
+ * (Do the reset outside of interrupt context).
+ */
+ adapter->tx_timeout_count++;
+ schedule_work(&adapter->reset_task);
+ }
+ }
+
+ /* Cause software interrupt to ensure Rx ring is cleaned */
+ ew32(EICS, adapter->rx_ring->eims_value);
+
+ /* Force detection of hung controller every watchdog period */
+ adapter->detect_tx_hung = 1;
+
+ /* Reset the timer */
+ if (!test_bit(__IGBVF_DOWN, &adapter->state))
+ mod_timer(&adapter->watchdog_timer,
+ round_jiffies(jiffies + (2 * HZ)));
+}
+
+#define IGBVF_TX_FLAGS_CSUM 0x00000001
+#define IGBVF_TX_FLAGS_VLAN 0x00000002
+#define IGBVF_TX_FLAGS_TSO 0x00000004
+#define IGBVF_TX_FLAGS_IPV4 0x00000008
+#define IGBVF_TX_FLAGS_VLAN_MASK 0xffff0000
+#define IGBVF_TX_FLAGS_VLAN_SHIFT 16
+
+static int igbvf_tso(struct igbvf_adapter *adapter,
+ struct igbvf_ring *tx_ring,
+ struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
+{
+ struct e1000_adv_tx_context_desc *context_desc;
+ unsigned int i;
+ int err;
+ struct igbvf_buffer *buffer_info;
+ u32 info = 0, tu_cmd = 0;
+ u32 mss_l4len_idx, l4len;
+ *hdr_len = 0;
+
+ if (skb_header_cloned(skb)) {
+ err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+ if (err) {
+ dev_err(&adapter->pdev->dev,
+ "igbvf_tso returning an error\n");
+ return err;
+ }
+ }
+
+ l4len = tcp_hdrlen(skb);
+ *hdr_len += l4len;
+
+ if (skb->protocol == htons(ETH_P_IP)) {
+ struct iphdr *iph = ip_hdr(skb);
+ iph->tot_len = 0;
+ iph->check = 0;
+ tcp_hdr(skb)->check = ~csum_tcpudp_magic(iph->saddr,
+ iph->daddr, 0,
+ IPPROTO_TCP,
+ 0);
+ } else if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6) {
+ ipv6_hdr(skb)->payload_len = 0;
+ tcp_hdr(skb)->check = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+ &ipv6_hdr(skb)->daddr,
+ 0, IPPROTO_TCP, 0);
+ }
+
+ i = tx_ring->next_to_use;
+
+ buffer_info = &tx_ring->buffer_info[i];
+ context_desc = IGBVF_TX_CTXTDESC_ADV(*tx_ring, i);
+ /* VLAN MACLEN IPLEN */
+ if (tx_flags & IGBVF_TX_FLAGS_VLAN)
+ info |= (tx_flags & IGBVF_TX_FLAGS_VLAN_MASK);
+ info |= (skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT);
+ *hdr_len += skb_network_offset(skb);
+ info |= (skb_transport_header(skb) - skb_network_header(skb));
+ *hdr_len += (skb_transport_header(skb) - skb_network_header(skb));
+ context_desc->vlan_macip_lens = cpu_to_le32(info);
+
+ /* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
+ tu_cmd |= (E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT);
+
+ if (skb->protocol == htons(ETH_P_IP))
+ tu_cmd |= E1000_ADVTXD_TUCMD_IPV4;
+ tu_cmd |= E1000_ADVTXD_TUCMD_L4T_TCP;
+
+ context_desc->type_tucmd_mlhl = cpu_to_le32(tu_cmd);
+
+ /* MSS L4LEN IDX */
+ mss_l4len_idx = (skb_shinfo(skb)->gso_size << E1000_ADVTXD_MSS_SHIFT);
+ mss_l4len_idx |= (l4len << E1000_ADVTXD_L4LEN_SHIFT);
+
+ context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
+ context_desc->seqnum_seed = 0;
+
+ buffer_info->time_stamp = jiffies;
+ buffer_info->next_to_watch = i;
+ buffer_info->dma = 0;
+ i++;
+ if (i == tx_ring->count)
+ i = 0;
+
+ tx_ring->next_to_use = i;
+
+ return true;
+}
+
+static inline bool igbvf_tx_csum(struct igbvf_adapter *adapter,
+ struct igbvf_ring *tx_ring,
+ struct sk_buff *skb, u32 tx_flags)
+{
+ struct e1000_adv_tx_context_desc *context_desc;
+ unsigned int i;
+ struct igbvf_buffer *buffer_info;
+ u32 info = 0, tu_cmd = 0;
+
+ if ((skb->ip_summed == CHECKSUM_PARTIAL) ||
+ (tx_flags & IGBVF_TX_FLAGS_VLAN)) {
+ i = tx_ring->next_to_use;
+ buffer_info = &tx_ring->buffer_info[i];
+ context_desc = IGBVF_TX_CTXTDESC_ADV(*tx_ring, i);
+
+ if (tx_flags & IGBVF_TX_FLAGS_VLAN)
+ info |= (tx_flags & IGBVF_TX_FLAGS_VLAN_MASK);
+
+ info |= (skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT);
+ if (skb->ip_summed == CHECKSUM_PARTIAL)
+ info |= (skb_transport_header(skb) -
+ skb_network_header(skb));
+
+
+ context_desc->vlan_macip_lens = cpu_to_le32(info);
+
+ tu_cmd |= (E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT);
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ switch (skb->protocol) {
+ case __constant_htons(ETH_P_IP):
+ tu_cmd |= E1000_ADVTXD_TUCMD_IPV4;
+ if (ip_hdr(skb)->protocol == IPPROTO_TCP)
+ tu_cmd |= E1000_ADVTXD_TUCMD_L4T_TCP;
+ break;
+ case __constant_htons(ETH_P_IPV6):
+ if (ipv6_hdr(skb)->nexthdr == IPPROTO_TCP)
+ tu_cmd |= E1000_ADVTXD_TUCMD_L4T_TCP;
+ break;
+ default:
+ break;
+ }
+ }
+
+ context_desc->type_tucmd_mlhl = cpu_to_le32(tu_cmd);
+ context_desc->seqnum_seed = 0;
+ context_desc->mss_l4len_idx = 0;
+
+ buffer_info->time_stamp = jiffies;
+ buffer_info->next_to_watch = i;
+ buffer_info->dma = 0;
+ i++;
+ if (i == tx_ring->count)
+ i = 0;
+ tx_ring->next_to_use = i;
+
+ return true;
+ }
+
+ return false;
+}
+
+static int igbvf_maybe_stop_tx(struct net_device *netdev, int size)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ /* there is enough descriptors then we don't need to worry */
+ if (igbvf_desc_unused(adapter->tx_ring) >= size)
+ return 0;
+
+ netif_stop_queue(netdev);
+
+ smp_mb();
+
+ /* We need to check again just in case room has been made available */
+ if (igbvf_desc_unused(adapter->tx_ring) < size)
+ return -EBUSY;
+
+ netif_wake_queue(netdev);
+
+ ++adapter->restart_queue;
+ return 0;
+}
+
+#define IGBVF_MAX_TXD_PWR 16
+#define IGBVF_MAX_DATA_PER_TXD (1 << IGBVF_MAX_TXD_PWR)
+
+static inline int igbvf_tx_map_adv(struct igbvf_adapter *adapter,
+ struct igbvf_ring *tx_ring,
+ struct sk_buff *skb,
+ unsigned int first)
+{
+ struct igbvf_buffer *buffer_info;
+ unsigned int len = skb_headlen(skb);
+ unsigned int count = 0, i;
+ unsigned int f;
+ dma_addr_t *map;
+
+ i = tx_ring->next_to_use;
+
+ if (skb_dma_map(&adapter->pdev->dev, skb, DMA_TO_DEVICE)) {
+ dev_err(&adapter->pdev->dev, "TX DMA map failed\n");
+ return 0;
+ }
+
+ map = skb_shinfo(skb)->dma_maps;
+
+ buffer_info = &tx_ring->buffer_info[i];
+ BUG_ON(len >= IGBVF_MAX_DATA_PER_TXD);
+ buffer_info->length = len;
+ /* set time_stamp *before* dma to help avoid a possible race */
+ buffer_info->time_stamp = jiffies;
+ buffer_info->next_to_watch = i;
+ buffer_info->dma = map[count];
+ count++;
+
+ for (f = 0; f < skb_shinfo(skb)->nr_frags; f++) {
+ struct skb_frag_struct *frag;
+
+ i++;
+ if (i == tx_ring->count)
+ i = 0;
+
+ frag = &skb_shinfo(skb)->frags[f];
+ len = frag->size;
+
+ buffer_info = &tx_ring->buffer_info[i];
+ BUG_ON(len >= IGBVF_MAX_DATA_PER_TXD);
+ buffer_info->length = len;
+ buffer_info->time_stamp = jiffies;
+ buffer_info->next_to_watch = i;
+ buffer_info->dma = map[count];
+ count++;
+ }
+
+ tx_ring->buffer_info[i].skb = skb;
+ tx_ring->buffer_info[first].next_to_watch = i;
+
+ return count;
+}
+
+static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
+ struct igbvf_ring *tx_ring,
+ int tx_flags, int count, u32 paylen,
+ u8 hdr_len)
+{
+ union e1000_adv_tx_desc *tx_desc = NULL;
+ struct igbvf_buffer *buffer_info;
+ u32 olinfo_status = 0, cmd_type_len;
+ unsigned int i;
+
+ cmd_type_len = (E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_IFCS |
+ E1000_ADVTXD_DCMD_DEXT);
+
+ if (tx_flags & IGBVF_TX_FLAGS_VLAN)
+ cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
+
+ if (tx_flags & IGBVF_TX_FLAGS_TSO) {
+ cmd_type_len |= E1000_ADVTXD_DCMD_TSE;
+
+ /* insert tcp checksum */
+ olinfo_status |= E1000_TXD_POPTS_TXSM << 8;
+
+ /* insert ip checksum */
+ if (tx_flags & IGBVF_TX_FLAGS_IPV4)
+ olinfo_status |= E1000_TXD_POPTS_IXSM << 8;
+
+ } else if (tx_flags & IGBVF_TX_FLAGS_CSUM) {
+ olinfo_status |= E1000_TXD_POPTS_TXSM << 8;
+ }
+
+ olinfo_status |= ((paylen - hdr_len) << E1000_ADVTXD_PAYLEN_SHIFT);
+
+ i = tx_ring->next_to_use;
+ while (count--) {
+ buffer_info = &tx_ring->buffer_info[i];
+ tx_desc = IGBVF_TX_DESC_ADV(*tx_ring, i);
+ tx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma);
+ tx_desc->read.cmd_type_len =
+ cpu_to_le32(cmd_type_len | buffer_info->length);
+ tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+ i++;
+ if (i == tx_ring->count)
+ i = 0;
+ }
+
+ tx_desc->read.cmd_type_len |= cpu_to_le32(adapter->txd_cmd);
+ /* Force memory writes to complete before letting h/w
+ * know there are new descriptors to fetch. (Only
+ * applicable for weak-ordered memory model archs,
+ * such as IA-64). */
+ wmb();
+
+ tx_ring->next_to_use = i;
+ writel(i, adapter->hw.hw_addr + tx_ring->tail);
+ /* we need this if more than one processor can write to our tail
+ * at a time, it syncronizes IO on IA64/Altix systems */
+ mmiowb();
+}
+
+static int igbvf_xmit_frame_ring_adv(struct sk_buff *skb,
+ struct net_device *netdev,
+ struct igbvf_ring *tx_ring)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ unsigned int first, tx_flags = 0;
+ u8 hdr_len = 0;
+ int count = 0;
+ int tso = 0;
+
+ if (test_bit(__IGBVF_DOWN, &adapter->state)) {
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;
+ }
+
+ if (skb->len <= 0) {
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;
+ }
+
+ /*
+ * need: count + 4 desc gap to keep tail from touching
+ * + 2 desc gap to keep tail from touching head,
+ * + 1 desc for skb->data,
+ * + 1 desc for context descriptor,
+ * head, otherwise try next time
+ */
+ if (igbvf_maybe_stop_tx(netdev, skb_shinfo(skb)->nr_frags + 4)) {
+ /* this is a hard error */
+ return NETDEV_TX_BUSY;
+ }
+
+ if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
+ tx_flags |= IGBVF_TX_FLAGS_VLAN;
+ tx_flags |= (vlan_tx_tag_get(skb) << IGBVF_TX_FLAGS_VLAN_SHIFT);
+ }
+
+ if (skb->protocol == htons(ETH_P_IP))
+ tx_flags |= IGBVF_TX_FLAGS_IPV4;
+
+ first = tx_ring->next_to_use;
+
+ tso = skb_is_gso(skb) ?
+ igbvf_tso(adapter, tx_ring, skb, tx_flags, &hdr_len) : 0;
+ if (unlikely(tso < 0)) {
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;
+ }
+
+ if (tso)
+ tx_flags |= IGBVF_TX_FLAGS_TSO;
+ else if (igbvf_tx_csum(adapter, tx_ring, skb, tx_flags) &&
+ (skb->ip_summed == CHECKSUM_PARTIAL))
+ tx_flags |= IGBVF_TX_FLAGS_CSUM;
+
+ /*
+ * count reflects descriptors mapped, if 0 then mapping error
+ * has occured and we need to rewind the descriptor queue
+ */
+ count = igbvf_tx_map_adv(adapter, tx_ring, skb, first);
+
+ if (count) {
+ igbvf_tx_queue_adv(adapter, tx_ring, tx_flags, count,
+ skb->len, hdr_len);
+ netdev->trans_start = jiffies;
+ /* Make sure there is space in the ring for the next send. */
+ igbvf_maybe_stop_tx(netdev, MAX_SKB_FRAGS + 4);
+ } else {
+ dev_kfree_skb_any(skb);
+ tx_ring->buffer_info[first].time_stamp = 0;
+ tx_ring->next_to_use = first;
+ }
+
+ return NETDEV_TX_OK;
+}
+
+static int igbvf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct igbvf_ring *tx_ring;
+ int retval;
+
+ if (test_bit(__IGBVF_DOWN, &adapter->state)) {
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;
+ }
+
+ tx_ring = &adapter->tx_ring[0];
+
+ retval = igbvf_xmit_frame_ring_adv(skb, netdev, tx_ring);
+
+ return retval;
+}
+
+/**
+ * igbvf_tx_timeout - Respond to a Tx Hang
+ * @netdev: network interface device structure
+ **/
+static void igbvf_tx_timeout(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ /* Do the reset outside of interrupt context */
+ adapter->tx_timeout_count++;
+ schedule_work(&adapter->reset_task);
+}
+
+static void igbvf_reset_task(struct work_struct *work)
+{
+ struct igbvf_adapter *adapter;
+ adapter = container_of(work, struct igbvf_adapter, reset_task);
+
+ igbvf_reinit_locked(adapter);
+}
+
+/**
+ * igbvf_get_stats - Get System Network Statistics
+ * @netdev: network interface device structure
+ *
+ * Returns the address of the device statistics structure.
+ * The statistics are actually updated from the timer callback.
+ **/
+static struct net_device_stats *igbvf_get_stats(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ /* only return the current stats */
+ return &adapter->net_stats;
+}
+
+/**
+ * igbvf_change_mtu - Change the Maximum Transfer Unit
+ * @netdev: network interface device structure
+ * @new_mtu: new value for maximum frame size
+ *
+ * Returns 0 on success, negative on failure
+ **/
+static int igbvf_change_mtu(struct net_device *netdev, int new_mtu)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
+
+ if ((new_mtu < 68) || (max_frame > MAX_JUMBO_FRAME_SIZE)) {
+ dev_err(&adapter->pdev->dev, "Invalid MTU setting\n");
+ return -EINVAL;
+ }
+
+ /* Jumbo frame size limits */
+ if (max_frame > ETH_FRAME_LEN + ETH_FCS_LEN) {
+ if (!(adapter->flags & FLAG_HAS_JUMBO_FRAMES)) {
+ dev_err(&adapter->pdev->dev,
+ "Jumbo Frames not supported.\n");
+ return -EINVAL;
+ }
+ }
+
+#define MAX_STD_JUMBO_FRAME_SIZE 9234
+ if (max_frame > MAX_STD_JUMBO_FRAME_SIZE) {
+ dev_err(&adapter->pdev->dev, "MTU > 9216 not supported.\n");
+ return -EINVAL;
+ }
+
+ while (test_and_set_bit(__IGBVF_RESETTING, &adapter->state))
+ msleep(1);
+ /* igbvf_down has a dependency on max_frame_size */
+ adapter->max_frame_size = max_frame;
+ if (netif_running(netdev))
+ igbvf_down(adapter);
+
+ /*
+ * NOTE: netdev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
+ * means we reserve 2 more, this pushes us to allocate from the next
+ * larger slab size.
+ * i.e. RXBUFFER_2048 --> size-4096 slab
+ * However with the new *_jumbo_rx* routines, jumbo receives will use
+ * fragmented skbs
+ */
+
+ if (max_frame <= 1024)
+ adapter->rx_buffer_len = 1024;
+ else if (max_frame <= 2048)
+ adapter->rx_buffer_len = 2048;
+ else
+#if (PAGE_SIZE / 2) > 16384
+ adapter->rx_buffer_len = 16384;
+#else
+ adapter->rx_buffer_len = PAGE_SIZE / 2;
+#endif
+
+
+ /* adjust allocation if LPE protects us, and we aren't using SBP */
+ if ((max_frame == ETH_FRAME_LEN + ETH_FCS_LEN) ||
+ (max_frame == ETH_FRAME_LEN + VLAN_HLEN + ETH_FCS_LEN))
+ adapter->rx_buffer_len = ETH_FRAME_LEN + VLAN_HLEN +
+ ETH_FCS_LEN;
+
+ dev_info(&adapter->pdev->dev, "changing MTU from %d to %d\n",
+ netdev->mtu, new_mtu);
+ netdev->mtu = new_mtu;
+
+ if (netif_running(netdev))
+ igbvf_up(adapter);
+ else
+ igbvf_reset(adapter);
+
+ clear_bit(__IGBVF_RESETTING, &adapter->state);
+
+ return 0;
+}
+
+static int igbvf_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+ switch (cmd) {
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int igbvf_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+#ifdef CONFIG_PM
+ int retval = 0;
+#endif
+
+ netif_device_detach(netdev);
+
+ if (netif_running(netdev)) {
+ WARN_ON(test_bit(__IGBVF_RESETTING, &adapter->state));
+ igbvf_down(adapter);
+ igbvf_free_irq(adapter);
+ }
+
+#ifdef CONFIG_PM
+ retval = pci_save_state(pdev);
+ if (retval)
+ return retval;
+#endif
+
+ pci_disable_device(pdev);
+
+ return 0;
+}
+
+#ifdef CONFIG_PM
+static int igbvf_resume(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ u32 err;
+
+ pci_restore_state(pdev);
+ err = pci_enable_device_mem(pdev);
+ if (err) {
+ dev_err(&pdev->dev, "Cannot enable PCI device from suspend\n");
+ return err;
+ }
+
+ pci_set_master(pdev);
+
+ if (netif_running(netdev)) {
+ err = igbvf_request_irq(adapter);
+ if (err)
+ return err;
+ }
+
+ igbvf_reset(adapter);
+
+ if (netif_running(netdev))
+ igbvf_up(adapter);
+
+ netif_device_attach(netdev);
+
+ return 0;
+}
+#endif
+
+static void igbvf_shutdown(struct pci_dev *pdev)
+{
+ igbvf_suspend(pdev, PMSG_SUSPEND);
+}
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * Polling 'interrupt' - used by things like netconsole to send skbs
+ * without having to re-enable interrupts. It's not called while
+ * the interrupt routine is executing.
+ */
+static void igbvf_netpoll(struct net_device *netdev)
+{
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ disable_irq(adapter->pdev->irq);
+
+ igbvf_clean_tx_irq(adapter->tx_ring);
+
+ enable_irq(adapter->pdev->irq);
+}
+#endif
+
+/**
+ * igbvf_io_error_detected - called when PCI error is detected
+ * @pdev: Pointer to PCI device
+ * @state: The current pci connection state
+ *
+ * This function is called after a PCI bus error affecting
+ * this device has been detected.
+ */
+static pci_ers_result_t igbvf_io_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ netif_device_detach(netdev);
+
+ if (netif_running(netdev))
+ igbvf_down(adapter);
+ pci_disable_device(pdev);
+
+ /* Request a slot slot reset. */
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * igbvf_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Restart the card from scratch, as if from a cold-boot. Implementation
+ * resembles the first-half of the igbvf_resume routine.
+ */
+static pci_ers_result_t igbvf_io_slot_reset(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ if (pci_enable_device_mem(pdev)) {
+ dev_err(&pdev->dev,
+ "Cannot re-enable PCI device after reset.\n");
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+ pci_set_master(pdev);
+
+ igbvf_reset(adapter);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * igbvf_io_resume - called when traffic can start flowing again.
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the error recovery driver tells us that
+ * its OK to resume normal operation. Implementation resembles the
+ * second-half of the igbvf_resume routine.
+ */
+static void igbvf_io_resume(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+ if (netif_running(netdev)) {
+ if (igbvf_up(adapter)) {
+ dev_err(&pdev->dev,
+ "can't bring device back up after reset\n");
+ return;
+ }
+ }
+
+ netif_device_attach(netdev);
+}
+
+static void igbvf_print_device_info(struct igbvf_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ struct net_device *netdev = adapter->netdev;
+ struct pci_dev *pdev = adapter->pdev;
+
+ dev_info(&pdev->dev, "Intel(R) 82576 Virtual Function\n");
+ dev_info(&pdev->dev, "Address: %02x:%02x:%02x:%02x:%02x:%02x\n",
+ /* MAC address */
+ netdev->dev_addr[0], netdev->dev_addr[1],
+ netdev->dev_addr[2], netdev->dev_addr[3],
+ netdev->dev_addr[4], netdev->dev_addr[5]);
+ dev_info(&pdev->dev, "MAC: %d\n", hw->mac.type);
+}
+
+static const struct net_device_ops igbvf_netdev_ops = {
+ .ndo_open = igbvf_open,
+ .ndo_stop = igbvf_close,
+ .ndo_start_xmit = igbvf_xmit_frame,
+ .ndo_get_stats = igbvf_get_stats,
+ .ndo_set_multicast_list = igbvf_set_multi,
+ .ndo_set_mac_address = igbvf_set_mac,
+ .ndo_change_mtu = igbvf_change_mtu,
+ .ndo_do_ioctl = igbvf_ioctl,
+ .ndo_tx_timeout = igbvf_tx_timeout,
+ .ndo_vlan_rx_register = igbvf_vlan_rx_register,
+ .ndo_vlan_rx_add_vid = igbvf_vlan_rx_add_vid,
+ .ndo_vlan_rx_kill_vid = igbvf_vlan_rx_kill_vid,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ .ndo_poll_controller = igbvf_netpoll,
+#endif
+};
+
+/**
+ * igbvf_probe - Device Initialization Routine
+ * @pdev: PCI device information struct
+ * @ent: entry in igbvf_pci_tbl
+ *
+ * Returns 0 on success, negative on failure
+ *
+ * igbvf_probe initializes an adapter identified by a pci_dev structure.
+ * The OS initialization, configuring of the adapter private structure,
+ * and a hardware reset occur.
+ **/
+static int __devinit igbvf_probe(struct pci_dev *pdev,
+ const struct pci_device_id *ent)
+{
+ struct net_device *netdev;
+ struct igbvf_adapter *adapter;
+ struct e1000_hw *hw;
+ const struct igbvf_info *ei = igbvf_info_tbl[ent->driver_data];
+
+ static int cards_found;
+ int err, pci_using_dac;
+
+ err = pci_enable_device_mem(pdev);
+ if (err)
+ return err;
+
+ pci_using_dac = 0;
+ err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
+ if (!err) {
+ err = pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
+ if (!err)
+ pci_using_dac = 1;
+ } else {
+ err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
+ if (err) {
+ err = pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK);
+ if (err) {
+ dev_err(&pdev->dev, "No usable DMA "
+ "configuration, aborting\n");
+ goto err_dma;
+ }
+ }
+ }
+
+ err = pci_request_regions(pdev, igbvf_driver_name);
+ if (err)
+ goto err_pci_reg;
+
+ pci_set_master(pdev);
+
+ err = -ENOMEM;
+ netdev = alloc_etherdev(sizeof(struct igbvf_adapter));
+ if (!netdev)
+ goto err_alloc_etherdev;
+
+ SET_NETDEV_DEV(netdev, &pdev->dev);
+
+ pci_set_drvdata(pdev, netdev);
+ adapter = netdev_priv(netdev);
+ hw = &adapter->hw;
+ adapter->netdev = netdev;
+ adapter->pdev = pdev;
+ adapter->ei = ei;
+ adapter->pba = ei->pba;
+ adapter->flags = ei->flags;
+ adapter->hw.back = adapter;
+ adapter->hw.mac.type = ei->mac;
+ adapter->msg_enable = (1 << NETIF_MSG_DRV | NETIF_MSG_PROBE) - 1;
+
+ /* PCI config space info */
+
+ hw->vendor_id = pdev->vendor;
+ hw->device_id = pdev->device;
+ hw->subsystem_vendor_id = pdev->subsystem_vendor;
+ hw->subsystem_device_id = pdev->subsystem_device;
+
+ pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id);
+
+ err = -EIO;
+ adapter->hw.hw_addr = ioremap(pci_resource_start(pdev, 0),
+ pci_resource_len(pdev, 0));
+
+ if (!adapter->hw.hw_addr)
+ goto err_ioremap;
+
+ if (ei->get_variants) {
+ err = ei->get_variants(adapter);
+ if (err)
+ goto err_ioremap;
+ }
+
+ /* setup adapter struct */
+ err = igbvf_sw_init(adapter);
+ if (err)
+ goto err_sw_init;
+
+ /* construct the net_device struct */
+ netdev->netdev_ops = &igbvf_netdev_ops;
+
+ igbvf_set_ethtool_ops(netdev);
+ netdev->watchdog_timeo = 5 * HZ;
+ strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
+
+ adapter->bd_number = cards_found++;
+
+ netdev->features = NETIF_F_SG |
+ NETIF_F_IP_CSUM |
+ NETIF_F_HW_VLAN_TX |
+ NETIF_F_HW_VLAN_RX |
+ NETIF_F_HW_VLAN_FILTER;
+
+ netdev->features |= NETIF_F_IPV6_CSUM;
+ netdev->features |= NETIF_F_TSO;
+ netdev->features |= NETIF_F_TSO6;
+
+ if (pci_using_dac)
+ netdev->features |= NETIF_F_HIGHDMA;
+
+ netdev->vlan_features |= NETIF_F_TSO;
+ netdev->vlan_features |= NETIF_F_TSO6;
+ netdev->vlan_features |= NETIF_F_IP_CSUM;
+ netdev->vlan_features |= NETIF_F_IPV6_CSUM;
+ netdev->vlan_features |= NETIF_F_SG;
+
+ /*reset the controller to put the device in a known good state */
+ err = hw->mac.ops.reset_hw(hw);
+ if (err) {
+ dev_info(&pdev->dev,
+ "PF still in reset state, assigning new address\n");
+ random_ether_addr(hw->mac.addr);
+ } else {
+ err = hw->mac.ops.read_mac_addr(hw);
+ if (err) {
+ dev_err(&pdev->dev, "Error reading MAC address\n");
+ goto err_hw_init;
+ }
+ }
+
+ memcpy(netdev->dev_addr, adapter->hw.mac.addr, netdev->addr_len);
+ memcpy(netdev->perm_addr, adapter->hw.mac.addr, netdev->addr_len);
+
+ if (!is_valid_ether_addr(netdev->perm_addr)) {
+ dev_err(&pdev->dev, "Invalid MAC Address: "
+ "%02x:%02x:%02x:%02x:%02x:%02x\n",
+ netdev->dev_addr[0], netdev->dev_addr[1],
+ netdev->dev_addr[2], netdev->dev_addr[3],
+ netdev->dev_addr[4], netdev->dev_addr[5]);
+ err = -EIO;
+ goto err_hw_init;
+ }
+
+ setup_timer(&adapter->watchdog_timer, &igbvf_watchdog,
+ (unsigned long) adapter);
+
+ INIT_WORK(&adapter->reset_task, igbvf_reset_task);
+ INIT_WORK(&adapter->watchdog_task, igbvf_watchdog_task);
+
+ /* ring size defaults */
+ adapter->rx_ring->count = 1024;
+ adapter->tx_ring->count = 1024;
+
+ /* reset the hardware with the new settings */
+ igbvf_reset(adapter);
+
+ /* tell the stack to leave us alone until igbvf_open() is called */
+ netif_carrier_off(netdev);
+ netif_stop_queue(netdev);
+
+ strcpy(netdev->name, "eth%d");
+ err = register_netdev(netdev);
+ if (err)
+ goto err_hw_init;
+
+ igbvf_print_device_info(adapter);
+
+ igbvf_initialize_last_counter_stats(adapter);
+
+ return 0;
+
+err_hw_init:
+ kfree(adapter->tx_ring);
+ kfree(adapter->rx_ring);
+err_sw_init:
+ igbvf_reset_interrupt_capability(adapter);
+ iounmap(adapter->hw.hw_addr);
+err_ioremap:
+ free_netdev(netdev);
+err_alloc_etherdev:
+ pci_release_regions(pdev);
+err_pci_reg:
+err_dma:
+ pci_disable_device(pdev);
+ return err;
+}
+
+/**
+ * igbvf_remove - Device Removal Routine
+ * @pdev: PCI device information struct
+ *
+ * igbvf_remove is called by the PCI subsystem to alert the driver
+ * that it should release a PCI device. The could be caused by a
+ * Hot-Plug event, or because the driver is going to be removed from
+ * memory.
+ **/
+static void __devexit igbvf_remove(struct pci_dev *pdev)
+{
+ struct net_device *netdev = pci_get_drvdata(pdev);
+ struct igbvf_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
+
+ /*
+ * flush_scheduled work may reschedule our watchdog task, so
+ * explicitly disable watchdog tasks from being rescheduled
+ */
+ set_bit(__IGBVF_DOWN, &adapter->state);
+ del_timer_sync(&adapter->watchdog_timer);
+
+ flush_scheduled_work();
+
+ unregister_netdev(netdev);
+
+ igbvf_reset_interrupt_capability(adapter);
+
+ /*
+ * it is important to delete the napi struct prior to freeing the
+ * rx ring so that you do not end up with null pointer refs
+ */
+ netif_napi_del(&adapter->rx_ring->napi);
+ kfree(adapter->tx_ring);
+ kfree(adapter->rx_ring);
+
+ iounmap(hw->hw_addr);
+ if (hw->flash_address)
+ iounmap(hw->flash_address);
+ pci_release_regions(pdev);
+
+ free_netdev(netdev);
+
+ pci_disable_device(pdev);
+}
+
+/* PCI Error Recovery (ERS) */
+static struct pci_error_handlers igbvf_err_handler = {
+ .error_detected = igbvf_io_error_detected,
+ .slot_reset = igbvf_io_slot_reset,
+ .resume = igbvf_io_resume,
+};
+
+static struct pci_device_id igbvf_pci_tbl[] = {
+ { PCI_VDEVICE(INTEL, E1000_DEV_ID_82576_VF), board_vf },
+ { } /* terminate list */
+};
+MODULE_DEVICE_TABLE(pci, igbvf_pci_tbl);
+
+/* PCI Device API Driver */
+static struct pci_driver igbvf_driver = {
+ .name = igbvf_driver_name,
+ .id_table = igbvf_pci_tbl,
+ .probe = igbvf_probe,
+ .remove = __devexit_p(igbvf_remove),
+#ifdef CONFIG_PM
+ /* Power Management Hooks */
+ .suspend = igbvf_suspend,
+ .resume = igbvf_resume,
+#endif
+ .shutdown = igbvf_shutdown,
+ .err_handler = &igbvf_err_handler
+};
+
+/**
+ * igbvf_init_module - Driver Registration Routine
+ *
+ * igbvf_init_module is the first routine called when the driver is
+ * loaded. All it does is register with the PCI subsystem.
+ **/
+static int __init igbvf_init_module(void)
+{
+ int ret;
+ printk(KERN_INFO "%s - version %s\n",
+ igbvf_driver_string, igbvf_driver_version);
+ printk(KERN_INFO "%s\n", igbvf_copyright);
+
+ ret = pci_register_driver(&igbvf_driver);
+ pm_qos_add_requirement(PM_QOS_CPU_DMA_LATENCY, igbvf_driver_name,
+ PM_QOS_DEFAULT_VALUE);
+
+ return ret;
+}
+module_init(igbvf_init_module);
+
+/**
+ * igbvf_exit_module - Driver Exit Cleanup Routine
+ *
+ * igbvf_exit_module is called just before the driver is removed
+ * from memory.
+ **/
+static void __exit igbvf_exit_module(void)
+{
+ pci_unregister_driver(&igbvf_driver);
+ pm_qos_remove_requirement(PM_QOS_CPU_DMA_LATENCY, igbvf_driver_name);
+}
+module_exit(igbvf_exit_module);
+
+
+MODULE_AUTHOR("Intel Corporation, <e1000-devel@lists.sourceforge.net>");
+MODULE_DESCRIPTION("Intel(R) 82576 Virtual Function Network Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(DRV_VERSION);
+
+/* netdev.c */
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#ifndef _E1000_REGS_H_
+#define _E1000_REGS_H_
+
+#define E1000_CTRL 0x00000 /* Device Control - RW */
+#define E1000_STATUS 0x00008 /* Device Status - RO */
+#define E1000_ITR 0x000C4 /* Interrupt Throttling Rate - RW */
+#define E1000_EICR 0x01580 /* Ext. Interrupt Cause Read - R/clr */
+#define E1000_EITR(_n) (0x01680 + (0x4 * (_n)))
+#define E1000_EICS 0x01520 /* Ext. Interrupt Cause Set - W0 */
+#define E1000_EIMS 0x01524 /* Ext. Interrupt Mask Set/Read - RW */
+#define E1000_EIMC 0x01528 /* Ext. Interrupt Mask Clear - WO */
+#define E1000_EIAC 0x0152C /* Ext. Interrupt Auto Clear - RW */
+#define E1000_EIAM 0x01530 /* Ext. Interrupt Ack Auto Clear Mask - RW */
+#define E1000_IVAR0 0x01700 /* Interrupt Vector Allocation (array) - RW */
+#define E1000_IVAR_MISC 0x01740 /* IVAR for "other" causes - RW */
+/*
+ * Convenience macros
+ *
+ * Note: "_n" is the queue number of the register to be written to.
+ *
+ * Example usage:
+ * E1000_RDBAL_REG(current_rx_queue)
+ */
+#define E1000_RDBAL(_n) ((_n) < 4 ? (0x02800 + ((_n) * 0x100)) : \
+ (0x0C000 + ((_n) * 0x40)))
+#define E1000_RDBAH(_n) ((_n) < 4 ? (0x02804 + ((_n) * 0x100)) : \
+ (0x0C004 + ((_n) * 0x40)))
+#define E1000_RDLEN(_n) ((_n) < 4 ? (0x02808 + ((_n) * 0x100)) : \
+ (0x0C008 + ((_n) * 0x40)))
+#define E1000_SRRCTL(_n) ((_n) < 4 ? (0x0280C + ((_n) * 0x100)) : \
+ (0x0C00C + ((_n) * 0x40)))
+#define E1000_RDH(_n) ((_n) < 4 ? (0x02810 + ((_n) * 0x100)) : \
+ (0x0C010 + ((_n) * 0x40)))
+#define E1000_RDT(_n) ((_n) < 4 ? (0x02818 + ((_n) * 0x100)) : \
+ (0x0C018 + ((_n) * 0x40)))
+#define E1000_RXDCTL(_n) ((_n) < 4 ? (0x02828 + ((_n) * 0x100)) : \
+ (0x0C028 + ((_n) * 0x40)))
+#define E1000_TDBAL(_n) ((_n) < 4 ? (0x03800 + ((_n) * 0x100)) : \
+ (0x0E000 + ((_n) * 0x40)))
+#define E1000_TDBAH(_n) ((_n) < 4 ? (0x03804 + ((_n) * 0x100)) : \
+ (0x0E004 + ((_n) * 0x40)))
+#define E1000_TDLEN(_n) ((_n) < 4 ? (0x03808 + ((_n) * 0x100)) : \
+ (0x0E008 + ((_n) * 0x40)))
+#define E1000_TDH(_n) ((_n) < 4 ? (0x03810 + ((_n) * 0x100)) : \
+ (0x0E010 + ((_n) * 0x40)))
+#define E1000_TDT(_n) ((_n) < 4 ? (0x03818 + ((_n) * 0x100)) : \
+ (0x0E018 + ((_n) * 0x40)))
+#define E1000_TXDCTL(_n) ((_n) < 4 ? (0x03828 + ((_n) * 0x100)) : \
+ (0x0E028 + ((_n) * 0x40)))
+#define E1000_DCA_TXCTRL(_n) (0x03814 + (_n << 8))
+#define E1000_DCA_RXCTRL(_n) (0x02814 + (_n << 8))
+#define E1000_RAL(_i) (((_i) <= 15) ? (0x05400 + ((_i) * 8)) : \
+ (0x054E0 + ((_i - 16) * 8)))
+#define E1000_RAH(_i) (((_i) <= 15) ? (0x05404 + ((_i) * 8)) : \
+ (0x054E4 + ((_i - 16) * 8)))
+
+/* Statistics registers */
+#define E1000_VFGPRC 0x00F10
+#define E1000_VFGORC 0x00F18
+#define E1000_VFMPRC 0x00F3C
+#define E1000_VFGPTC 0x00F14
+#define E1000_VFGOTC 0x00F34
+#define E1000_VFGOTLBC 0x00F50
+#define E1000_VFGPTLBC 0x00F44
+#define E1000_VFGORLBC 0x00F48
+#define E1000_VFGPRLBC 0x00F40
+
+/* These act per VF so an array friendly macro is used */
+#define E1000_V2PMAILBOX(_n) (0x00C40 + (4 * (_n)))
+#define E1000_VMBMEM(_n) (0x00800 + (64 * (_n)))
+
+/* Define macros for handling registers */
+#define er32(reg) readl(hw->hw_addr + E1000_##reg)
+#define ew32(reg, val) writel((val), hw->hw_addr + E1000_##reg)
+#define array_er32(reg, offset) \
+ readl(hw->hw_addr + E1000_##reg + (offset << 2))
+#define array_ew32(reg, offset, val) \
+ writel((val), hw->hw_addr + E1000_##reg + (offset << 2))
+#define e1e_flush() er32(STATUS)
+
+#endif
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+
+#include "vf.h"
+
+static s32 e1000_check_for_link_vf(struct e1000_hw *hw);
+static s32 e1000_get_link_up_info_vf(struct e1000_hw *hw, u16 *speed,
+ u16 *duplex);
+static s32 e1000_init_hw_vf(struct e1000_hw *hw);
+static s32 e1000_reset_hw_vf(struct e1000_hw *hw);
+
+static void e1000_update_mc_addr_list_vf(struct e1000_hw *hw, u8 *,
+ u32, u32, u32);
+static void e1000_rar_set_vf(struct e1000_hw *, u8 *, u32);
+static s32 e1000_read_mac_addr_vf(struct e1000_hw *);
+static s32 e1000_set_vfta_vf(struct e1000_hw *, u16, bool);
+
+/**
+ * e1000_init_mac_params_vf - Inits MAC params
+ * @hw: pointer to the HW structure
+ **/
+s32 e1000_init_mac_params_vf(struct e1000_hw *hw)
+{
+ struct e1000_mac_info *mac = &hw->mac;
+
+ /* VF's have no MTA Registers - PF feature only */
+ mac->mta_reg_count = 128;
+ /* VF's have no access to RAR entries */
+ mac->rar_entry_count = 1;
+
+ /* Function pointers */
+ /* reset */
+ mac->ops.reset_hw = e1000_reset_hw_vf;
+ /* hw initialization */
+ mac->ops.init_hw = e1000_init_hw_vf;
+ /* check for link */
+ mac->ops.check_for_link = e1000_check_for_link_vf;
+ /* link info */
+ mac->ops.get_link_up_info = e1000_get_link_up_info_vf;
+ /* multicast address update */
+ mac->ops.update_mc_addr_list = e1000_update_mc_addr_list_vf;
+ /* set mac address */
+ mac->ops.rar_set = e1000_rar_set_vf;
+ /* read mac address */
+ mac->ops.read_mac_addr = e1000_read_mac_addr_vf;
+ /* set vlan filter table array */
+ mac->ops.set_vfta = e1000_set_vfta_vf;
+
+ return E1000_SUCCESS;
+}
+
+/**
+ * e1000_init_function_pointers_vf - Inits function pointers
+ * @hw: pointer to the HW structure
+ **/
+void e1000_init_function_pointers_vf(struct e1000_hw *hw)
+{
+ hw->mac.ops.init_params = e1000_init_mac_params_vf;
+ hw->mbx.ops.init_params = e1000_init_mbx_params_vf;
+}
+
+/**
+ * e1000_get_link_up_info_vf - Gets link info.
+ * @hw: pointer to the HW structure
+ * @speed: pointer to 16 bit value to store link speed.
+ * @duplex: pointer to 16 bit value to store duplex.
+ *
+ * Since we cannot read the PHY and get accurate link info, we must rely upon
+ * the status register's data which is often stale and inaccurate.
+ **/
+static s32 e1000_get_link_up_info_vf(struct e1000_hw *hw, u16 *speed,
+ u16 *duplex)
+{
+ s32 status;
+
+ status = er32(STATUS);
+ if (status & E1000_STATUS_SPEED_1000)
+ *speed = SPEED_1000;
+ else if (status & E1000_STATUS_SPEED_100)
+ *speed = SPEED_100;
+ else
+ *speed = SPEED_10;
+
+ if (status & E1000_STATUS_FD)
+ *duplex = FULL_DUPLEX;
+ else
+ *duplex = HALF_DUPLEX;
+
+ return E1000_SUCCESS;
+}
+
+/**
+ * e1000_reset_hw_vf - Resets the HW
+ * @hw: pointer to the HW structure
+ *
+ * VF's provide a function level reset. This is done using bit 26 of ctrl_reg.
+ * This is all the reset we can perform on a VF.
+ **/
+static s32 e1000_reset_hw_vf(struct e1000_hw *hw)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ u32 timeout = E1000_VF_INIT_TIMEOUT;
+ u32 ret_val = -E1000_ERR_MAC_INIT;
+ u32 msgbuf[3];
+ u8 *addr = (u8 *)(&msgbuf[1]);
+ u32 ctrl;
+
+ /* assert vf queue/interrupt reset */
+ ctrl = er32(CTRL);
+ ew32(CTRL, ctrl | E1000_CTRL_RST);
+
+ /* we cannot initialize while the RSTI / RSTD bits are asserted */
+ while (!mbx->ops.check_for_rst(hw) && timeout) {
+ timeout--;
+ udelay(5);
+ }
+
+ if (timeout) {
+ /* mailbox timeout can now become active */
+ mbx->timeout = E1000_VF_MBX_INIT_TIMEOUT;
+
+ /* notify pf of vf reset completion */
+ msgbuf[0] = E1000_VF_RESET;
+ mbx->ops.write_posted(hw, msgbuf, 1);
+
+ msleep(10);
+
+ /* set our "perm_addr" based on info provided by PF */
+ ret_val = mbx->ops.read_posted(hw, msgbuf, 3);
+ if (!ret_val) {
+ if (msgbuf[0] == (E1000_VF_RESET | E1000_VT_MSGTYPE_ACK))
+ memcpy(hw->mac.perm_addr, addr, 6);
+ else
+ ret_val = -E1000_ERR_MAC_INIT;
+ }
+ }
+
+ return ret_val;
+}
+
+/**
+ * e1000_init_hw_vf - Inits the HW
+ * @hw: pointer to the HW structure
+ *
+ * Not much to do here except clear the PF Reset indication if there is one.
+ **/
+static s32 e1000_init_hw_vf(struct e1000_hw *hw)
+{
+ /* attempt to set and restore our mac address */
+ e1000_rar_set_vf(hw, hw->mac.addr, 0);
+
+ return E1000_SUCCESS;
+}
+
+/**
+ * e1000_hash_mc_addr_vf - Generate a multicast hash value
+ * @hw: pointer to the HW structure
+ * @mc_addr: pointer to a multicast address
+ *
+ * Generates a multicast address hash value which is used to determine
+ * the multicast filter table array address and new table value. See
+ * e1000_mta_set_generic()
+ **/
+static u32 e1000_hash_mc_addr_vf(struct e1000_hw *hw, u8 *mc_addr)
+{
+ u32 hash_value, hash_mask;
+ u8 bit_shift = 0;
+
+ /* Register count multiplied by bits per register */
+ hash_mask = (hw->mac.mta_reg_count * 32) - 1;
+
+ /*
+ * The bit_shift is the number of left-shifts
+ * where 0xFF would still fall within the hash mask.
+ */
+ while (hash_mask >> bit_shift != 0xFF)
+ bit_shift++;
+
+ hash_value = hash_mask & (((mc_addr[4] >> (8 - bit_shift)) |
+ (((u16) mc_addr[5]) << bit_shift)));
+
+ return hash_value;
+}
+
+/**
+ * e1000_update_mc_addr_list_vf - Update Multicast addresses
+ * @hw: pointer to the HW structure
+ * @mc_addr_list: array of multicast addresses to program
+ * @mc_addr_count: number of multicast addresses to program
+ * @rar_used_count: the first RAR register free to program
+ * @rar_count: total number of supported Receive Address Registers
+ *
+ * Updates the Receive Address Registers and Multicast Table Array.
+ * The caller must have a packed mc_addr_list of multicast addresses.
+ * The parameter rar_count will usually be hw->mac.rar_entry_count
+ * unless there are workarounds that change this.
+ **/
+void e1000_update_mc_addr_list_vf(struct e1000_hw *hw,
+ u8 *mc_addr_list, u32 mc_addr_count,
+ u32 rar_used_count, u32 rar_count)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ u32 msgbuf[E1000_VFMAILBOX_SIZE];
+ u16 *hash_list = (u16 *)&msgbuf[1];
+ u32 hash_value;
+ u32 cnt, i;
+
+ /* Each entry in the list uses 1 16 bit word. We have 30
+ * 16 bit words available in our HW msg buffer (minus 1 for the
+ * msg type). That's 30 hash values if we pack 'em right. If
+ * there are more than 30 MC addresses to add then punt the
+ * extras for now and then add code to handle more than 30 later.
+ * It would be unusual for a server to request that many multi-cast
+ * addresses except for in large enterprise network environments.
+ */
+
+ cnt = (mc_addr_count > 30) ? 30 : mc_addr_count;
+ msgbuf[0] = E1000_VF_SET_MULTICAST;
+ msgbuf[0] |= cnt << E1000_VT_MSGINFO_SHIFT;
+
+ for (i = 0; i < cnt; i++) {
+ hash_value = e1000_hash_mc_addr_vf(hw, mc_addr_list);
+ hash_list[i] = hash_value & 0x0FFFF;
+ mc_addr_list += ETH_ADDR_LEN;
+ }
+
+ mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE);
+}
+
+/**
+ * e1000_set_vfta_vf - Set/Unset vlan filter table address
+ * @hw: pointer to the HW structure
+ * @vid: determines the vfta register and bit to set/unset
+ * @set: if true then set bit, else clear bit
+ **/
+static s32 e1000_set_vfta_vf(struct e1000_hw *hw, u16 vid, bool set)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ u32 msgbuf[2];
+ s32 err;
+
+ msgbuf[0] = E1000_VF_SET_VLAN;
+ msgbuf[1] = vid;
+ /* Setting the 8 bit field MSG INFO to true indicates "add" */
+ if (set)
+ msgbuf[0] |= 1 << E1000_VT_MSGINFO_SHIFT;
+
+ mbx->ops.write_posted(hw, msgbuf, 2);
+
+ err = mbx->ops.read_posted(hw, msgbuf, 2);
+
+ /* if nacked the vlan was rejected */
+ if (!err && (msgbuf[0] == (E1000_VF_SET_VLAN | E1000_VT_MSGTYPE_NACK)))
+ err = -E1000_ERR_MAC_INIT;
+
+ return err;
+}
+
+/** e1000_rlpml_set_vf - Set the maximum receive packet length
+ * @hw: pointer to the HW structure
+ * @max_size: value to assign to max frame size
+ **/
+void e1000_rlpml_set_vf(struct e1000_hw *hw, u16 max_size)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ u32 msgbuf[2];
+
+ msgbuf[0] = E1000_VF_SET_LPE;
+ msgbuf[1] = max_size;
+
+ mbx->ops.write_posted(hw, msgbuf, 2);
+}
+
+/**
+ * e1000_rar_set_vf - set device MAC address
+ * @hw: pointer to the HW structure
+ * @addr: pointer to the receive address
+ * @index receive address array register
+ **/
+static void e1000_rar_set_vf(struct e1000_hw *hw, u8 * addr, u32 index)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ u32 msgbuf[3];
+ u8 *msg_addr = (u8 *)(&msgbuf[1]);
+ s32 ret_val;
+
+ memset(msgbuf, 0, 12);
+ msgbuf[0] = E1000_VF_SET_MAC_ADDR;
+ memcpy(msg_addr, addr, 6);
+ ret_val = mbx->ops.write_posted(hw, msgbuf, 3);
+
+ if (!ret_val)
+ ret_val = mbx->ops.read_posted(hw, msgbuf, 3);
+
+ /* if nacked the address was rejected, use "perm_addr" */
+ if (!ret_val &&
+ (msgbuf[0] == (E1000_VF_SET_MAC_ADDR | E1000_VT_MSGTYPE_NACK)))
+ e1000_read_mac_addr_vf(hw);
+}
+
+/**
+ * e1000_read_mac_addr_vf - Read device MAC address
+ * @hw: pointer to the HW structure
+ **/
+static s32 e1000_read_mac_addr_vf(struct e1000_hw *hw)
+{
+ int i;
+
+ for (i = 0; i < ETH_ADDR_LEN; i++)
+ hw->mac.addr[i] = hw->mac.perm_addr[i];
+
+ return E1000_SUCCESS;
+}
+
+/**
+ * e1000_check_for_link_vf - Check for link for a virtual interface
+ * @hw: pointer to the HW structure
+ *
+ * Checks to see if the underlying PF is still talking to the VF and
+ * if it is then it reports the link state to the hardware, otherwise
+ * it reports link down and returns an error.
+ **/
+static s32 e1000_check_for_link_vf(struct e1000_hw *hw)
+{
+ struct e1000_mbx_info *mbx = &hw->mbx;
+ struct e1000_mac_info *mac = &hw->mac;
+ s32 ret_val = E1000_SUCCESS;
+ u32 in_msg = 0;
+
+ /*
+ * We only want to run this if there has been a rst asserted.
+ * in this case that could mean a link change, device reset,
+ * or a virtual function reset
+ */
+
+ /* If we were hit with a reset drop the link */
+ if (!mbx->ops.check_for_rst(hw))
+ mac->get_link_status = true;
+
+ if (!mac->get_link_status)
+ goto out;
+
+ /* if link status is down no point in checking to see if pf is up */
+ if (!(er32(STATUS) & E1000_STATUS_LU))
+ goto out;
+
+ /* if the read failed it could just be a mailbox collision, best wait
+ * until we are called again and don't report an error */
+ if (mbx->ops.read(hw, &in_msg, 1))
+ goto out;
+
+ /* if incoming message isn't clear to send we are waiting on response */
+ if (!(in_msg & E1000_VT_MSGTYPE_CTS)) {
+ /* message is not CTS and is NACK we must have lost CTS status */
+ if (in_msg & E1000_VT_MSGTYPE_NACK)
+ ret_val = -E1000_ERR_MAC_INIT;
+ goto out;
+ }
+
+ /* the pf is talking, if we timed out in the past we reinit */
+ if (!mbx->timeout) {
+ ret_val = -E1000_ERR_MAC_INIT;
+ goto out;
+ }
+
+ /* if we passed all the tests above then the link is up and we no
+ * longer need to check for link */
+ mac->get_link_status = false;
+
+out:
+ return ret_val;
+}
+
--- /dev/null
+/*******************************************************************************
+
+ Intel(R) 82576 Virtual Function Linux driver
+ Copyright(c) 2009 Intel Corporation.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms and conditions of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ This program is distributed in the hope it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ You should have received a copy of the GNU General Public License along with
+ this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+ The full GNU General Public License is included in this distribution in
+ the file called "COPYING".
+
+ Contact Information:
+ e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#ifndef _E1000_VF_H_
+#define _E1000_VF_H_
+
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/if_ether.h>
+
+#include "regs.h"
+#include "defines.h"
+
+struct e1000_hw;
+
+#define E1000_DEV_ID_82576_VF 0x10CA
+#define E1000_REVISION_0 0
+#define E1000_REVISION_1 1
+#define E1000_REVISION_2 2
+#define E1000_REVISION_3 3
+#define E1000_REVISION_4 4
+
+#define E1000_FUNC_0 0
+#define E1000_FUNC_1 1
+
+/*
+ * Receive Address Register Count
+ * Number of high/low register pairs in the RAR. The RAR (Receive Address
+ * Registers) holds the directed and multicast addresses that we monitor.
+ * These entries are also used for MAC-based filtering.
+ */
+#define E1000_RAR_ENTRIES_VF 1
+
+/* Receive Descriptor - Advanced */
+union e1000_adv_rx_desc {
+ struct {
+ u64 pkt_addr; /* Packet buffer address */
+ u64 hdr_addr; /* Header buffer address */
+ } read;
+ struct {
+ struct {
+ union {
+ u32 data;
+ struct {
+ u16 pkt_info; /* RSS/Packet type */
+ u16 hdr_info; /* Split Header,
+ * hdr buffer length */
+ } hs_rss;
+ } lo_dword;
+ union {
+ u32 rss; /* RSS Hash */
+ struct {
+ u16 ip_id; /* IP id */
+ u16 csum; /* Packet Checksum */
+ } csum_ip;
+ } hi_dword;
+ } lower;
+ struct {
+ u32 status_error; /* ext status/error */
+ u16 length; /* Packet length */
+ u16 vlan; /* VLAN tag */
+ } upper;
+ } wb; /* writeback */
+};
+
+#define E1000_RXDADV_HDRBUFLEN_MASK 0x7FE0
+#define E1000_RXDADV_HDRBUFLEN_SHIFT 5
+
+/* Transmit Descriptor - Advanced */
+union e1000_adv_tx_desc {
+ struct {
+ u64 buffer_addr; /* Address of descriptor's data buf */
+ u32 cmd_type_len;
+ u32 olinfo_status;
+ } read;
+ struct {
+ u64 rsvd; /* Reserved */
+ u32 nxtseq_seed;
+ u32 status;
+ } wb;
+};
+
+/* Adv Transmit Descriptor Config Masks */
+#define E1000_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Descriptor */
+#define E1000_ADVTXD_DTYP_DATA 0x00300000 /* Advanced Data Descriptor */
+#define E1000_ADVTXD_DCMD_EOP 0x01000000 /* End of Packet */
+#define E1000_ADVTXD_DCMD_IFCS 0x02000000 /* Insert FCS (Ethernet CRC) */
+#define E1000_ADVTXD_DCMD_RS 0x08000000 /* Report Status */
+#define E1000_ADVTXD_DCMD_DEXT 0x20000000 /* Descriptor extension (1=Adv) */
+#define E1000_ADVTXD_DCMD_VLE 0x40000000 /* VLAN pkt enable */
+#define E1000_ADVTXD_DCMD_TSE 0x80000000 /* TCP Seg enable */
+#define E1000_ADVTXD_PAYLEN_SHIFT 14 /* Adv desc PAYLEN shift */
+
+/* Context descriptors */
+struct e1000_adv_tx_context_desc {
+ u32 vlan_macip_lens;
+ u32 seqnum_seed;
+ u32 type_tucmd_mlhl;
+ u32 mss_l4len_idx;
+};
+
+#define E1000_ADVTXD_MACLEN_SHIFT 9 /* Adv ctxt desc mac len shift */
+#define E1000_ADVTXD_TUCMD_IPV4 0x00000400 /* IP Packet Type: 1=IPv4 */
+#define E1000_ADVTXD_TUCMD_L4T_TCP 0x00000800 /* L4 Packet TYPE of TCP */
+#define E1000_ADVTXD_L4LEN_SHIFT 8 /* Adv ctxt L4LEN shift */
+#define E1000_ADVTXD_MSS_SHIFT 16 /* Adv ctxt MSS shift */
+
+enum e1000_mac_type {
+ e1000_undefined = 0,
+ e1000_vfadapt,
+ e1000_num_macs /* List is 1-based, so subtract 1 for true count. */
+};
+
+struct e1000_vf_stats {
+ u64 base_gprc;
+ u64 base_gptc;
+ u64 base_gorc;
+ u64 base_gotc;
+ u64 base_mprc;
+ u64 base_gotlbc;
+ u64 base_gptlbc;
+ u64 base_gorlbc;
+ u64 base_gprlbc;
+
+ u32 last_gprc;
+ u32 last_gptc;
+ u32 last_gorc;
+ u32 last_gotc;
+ u32 last_mprc;
+ u32 last_gotlbc;
+ u32 last_gptlbc;
+ u32 last_gorlbc;
+ u32 last_gprlbc;
+
+ u64 gprc;
+ u64 gptc;
+ u64 gorc;
+ u64 gotc;
+ u64 mprc;
+ u64 gotlbc;
+ u64 gptlbc;
+ u64 gorlbc;
+ u64 gprlbc;
+};
+
+#include "mbx.h"
+
+struct e1000_mac_operations {
+ /* Function pointers for the MAC. */
+ s32 (*init_params)(struct e1000_hw *);
+ s32 (*check_for_link)(struct e1000_hw *);
+ void (*clear_vfta)(struct e1000_hw *);
+ s32 (*get_bus_info)(struct e1000_hw *);
+ s32 (*get_link_up_info)(struct e1000_hw *, u16 *, u16 *);
+ void (*update_mc_addr_list)(struct e1000_hw *, u8 *, u32, u32, u32);
+ s32 (*reset_hw)(struct e1000_hw *);
+ s32 (*init_hw)(struct e1000_hw *);
+ s32 (*setup_link)(struct e1000_hw *);
+ void (*write_vfta)(struct e1000_hw *, u32, u32);
+ void (*mta_set)(struct e1000_hw *, u32);
+ void (*rar_set)(struct e1000_hw *, u8*, u32);
+ s32 (*read_mac_addr)(struct e1000_hw *);
+ s32 (*set_vfta)(struct e1000_hw *, u16, bool);
+};
+
+struct e1000_mac_info {
+ struct e1000_mac_operations ops;
+ u8 addr[6];
+ u8 perm_addr[6];
+
+ enum e1000_mac_type type;
+
+ u16 mta_reg_count;
+ u16 rar_entry_count;
+
+ bool get_link_status;
+};
+
+struct e1000_mbx_operations {
+ s32 (*init_params)(struct e1000_hw *hw);
+ s32 (*read)(struct e1000_hw *, u32 *, u16);
+ s32 (*write)(struct e1000_hw *, u32 *, u16);
+ s32 (*read_posted)(struct e1000_hw *, u32 *, u16);
+ s32 (*write_posted)(struct e1000_hw *, u32 *, u16);
+ s32 (*check_for_msg)(struct e1000_hw *);
+ s32 (*check_for_ack)(struct e1000_hw *);
+ s32 (*check_for_rst)(struct e1000_hw *);
+};
+
+struct e1000_mbx_stats {
+ u32 msgs_tx;
+ u32 msgs_rx;
+
+ u32 acks;
+ u32 reqs;
+ u32 rsts;
+};
+
+struct e1000_mbx_info {
+ struct e1000_mbx_operations ops;
+ struct e1000_mbx_stats stats;
+ u32 timeout;
+ u32 usec_delay;
+ u16 size;
+};
+
+struct e1000_dev_spec_vf {
+ u32 vf_number;
+ u32 v2p_mailbox;
+};
+
+struct e1000_hw {
+ void *back;
+
+ u8 __iomem *hw_addr;
+ u8 __iomem *flash_address;
+ unsigned long io_base;
+
+ struct e1000_mac_info mac;
+ struct e1000_mbx_info mbx;
+
+ union {
+ struct e1000_dev_spec_vf vf;
+ } dev_spec;
+
+ u16 device_id;
+ u16 subsystem_vendor_id;
+ u16 subsystem_device_id;
+ u16 vendor_id;
+
+ u8 revision_id;
+};
+
+/* These functions must be implemented by drivers */
+void e1000_rlpml_set_vf(struct e1000_hw *, u16);
+void e1000_init_function_pointers_vf(struct e1000_hw *hw);
+s32 e1000_init_mac_params_vf(struct e1000_hw *hw);
+
+
+#endif /* _E1000_VF_H_ */
pscr |= FORCE_LINK_PASS;
wrlp(mp, PORT_SERIAL_CONTROL, pscr);
- wrlp(mp, SDMA_CONFIG, PORT_SDMA_CONFIG_DEFAULT_VALUE);
-
/*
* Configure TX path and queues.
*/
netif_carrier_off(dev);
+ wrlp(mp, SDMA_CONFIG, PORT_SDMA_CONFIG_DEFAULT_VALUE);
+
set_rx_coal(mp, 250);
set_tx_coal(mp, 0);
{
u64 val = 0;
+ *ret = 0;
switch (rp->rbr_block_size) {
case 4 * 1024:
val |= (RBR_BLKSIZE_4K << RBR_CFIG_B_BLKSIZE_SHIFT);
plat_dev = platform_device_register_simple("niu", niu_parent_index,
NULL, 0);
- if (!plat_dev)
+ if (IS_ERR(plat_dev))
return NULL;
for (i = 0; attr_name(niu_parent_attributes[i]); i++) {
"Florian Fainelli <florian@openwrt.org>");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("RDC R6040 NAPI PCI FastEthernet driver");
+MODULE_VERSION(DRV_VERSION " " DRV_RELDATE);
/* RX and TX interrupts that we handle */
#define RX_INTS (RX_FIFO_FULL | RX_NO_DESC | RX_FINISH)
goto out;
}
- SMSC_WARNING(HW, "Timed out waiting for MII write to finish");
+ SMSC_WARNING(HW, "Timed out waiting for MII read to finish");
reg = -EIO;
out:
static struct fujitsu_hotkey_t *fujitsu_hotkey;
-static void acpi_fujitsu_hotkey_notify(acpi_handle handle, u32 event,
- void *data);
+static void acpi_fujitsu_hotkey_notify(struct acpi_device *device, u32 event);
#ifdef CONFIG_LEDS_CLASS
static enum led_brightness logolamp_get(struct led_classdev *cdev);
static u32 dbg_level = 0x03;
#endif
-static void acpi_fujitsu_notify(acpi_handle handle, u32 event, void *data);
+static void acpi_fujitsu_notify(struct acpi_device *device, u32 event);
/* Fujitsu ACPI interface function */
static int acpi_fujitsu_add(struct acpi_device *device)
{
- acpi_status status;
acpi_handle handle;
int result = 0;
int state = 0;
sprintf(acpi_device_class(device), "%s", ACPI_FUJITSU_CLASS);
device->driver_data = fujitsu;
- status = acpi_install_notify_handler(device->handle,
- ACPI_DEVICE_NOTIFY,
- acpi_fujitsu_notify, fujitsu);
-
- if (ACPI_FAILURE(status)) {
- printk(KERN_ERR "Error installing notify handler\n");
- error = -ENODEV;
- goto err_stop;
- }
-
fujitsu->input = input = input_allocate_device();
if (!input) {
error = -ENOMEM;
- goto err_uninstall_notify;
+ goto err_stop;
}
snprintf(fujitsu->phys, sizeof(fujitsu->phys),
end:
err_free_input_dev:
input_free_device(input);
-err_uninstall_notify:
- acpi_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
- acpi_fujitsu_notify);
err_stop:
return result;
static int acpi_fujitsu_remove(struct acpi_device *device, int type)
{
- acpi_status status;
struct fujitsu_t *fujitsu = NULL;
if (!device || !acpi_driver_data(device))
fujitsu = acpi_driver_data(device);
- status = acpi_remove_notify_handler(fujitsu->acpi_handle,
- ACPI_DEVICE_NOTIFY,
- acpi_fujitsu_notify);
-
if (!device || !acpi_driver_data(device))
return -EINVAL;
/* Brightness notify */
-static void acpi_fujitsu_notify(acpi_handle handle, u32 event, void *data)
+static void acpi_fujitsu_notify(struct acpi_device *device, u32 event)
{
struct input_dev *input;
int keycode;
input_report_key(input, keycode, 0);
input_sync(input);
}
-
- return;
}
/* ACPI device for hotkey handling */
static int acpi_fujitsu_hotkey_add(struct acpi_device *device)
{
- acpi_status status;
acpi_handle handle;
int result = 0;
int state = 0;
sprintf(acpi_device_class(device), "%s", ACPI_FUJITSU_CLASS);
device->driver_data = fujitsu_hotkey;
- status = acpi_install_notify_handler(device->handle,
- ACPI_DEVICE_NOTIFY,
- acpi_fujitsu_hotkey_notify,
- fujitsu_hotkey);
-
- if (ACPI_FAILURE(status)) {
- printk(KERN_ERR "Error installing notify handler\n");
- error = -ENODEV;
- goto err_stop;
- }
-
/* kfifo */
spin_lock_init(&fujitsu_hotkey->fifo_lock);
fujitsu_hotkey->fifo =
fujitsu_hotkey->input = input = input_allocate_device();
if (!input) {
error = -ENOMEM;
- goto err_uninstall_notify;
+ goto err_free_fifo;
}
snprintf(fujitsu_hotkey->phys, sizeof(fujitsu_hotkey->phys),
end:
err_free_input_dev:
input_free_device(input);
-err_uninstall_notify:
- acpi_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
- acpi_fujitsu_hotkey_notify);
+err_free_fifo:
kfifo_free(fujitsu_hotkey->fifo);
err_stop:
static int acpi_fujitsu_hotkey_remove(struct acpi_device *device, int type)
{
- acpi_status status;
struct fujitsu_hotkey_t *fujitsu_hotkey = NULL;
if (!device || !acpi_driver_data(device))
fujitsu_hotkey = acpi_driver_data(device);
- status = acpi_remove_notify_handler(fujitsu_hotkey->acpi_handle,
- ACPI_DEVICE_NOTIFY,
- acpi_fujitsu_hotkey_notify);
-
fujitsu_hotkey->acpi_handle = NULL;
kfifo_free(fujitsu_hotkey->fifo);
return 0;
}
-static void acpi_fujitsu_hotkey_notify(acpi_handle handle, u32 event,
- void *data)
+static void acpi_fujitsu_hotkey_notify(struct acpi_device *device, u32 event)
{
struct input_dev *input;
int keycode, keycode_r;
input_sync(input);
break;
}
-
- return;
}
/* Initialization */
.ops = {
.add = acpi_fujitsu_add,
.remove = acpi_fujitsu_remove,
+ .notify = acpi_fujitsu_notify,
},
};
.ops = {
.add = acpi_fujitsu_hotkey_add,
.remove = acpi_fujitsu_hotkey_remove,
+ .notify = acpi_fujitsu_hotkey_notify,
},
};
static int acpi_pcc_hotkey_add(struct acpi_device *device);
static int acpi_pcc_hotkey_remove(struct acpi_device *device, int type);
static int acpi_pcc_hotkey_resume(struct acpi_device *device);
+static void acpi_pcc_hotkey_notify(struct acpi_device *device, u32 event);
static const struct acpi_device_id pcc_device_ids[] = {
{ "MAT0012", 0},
.add = acpi_pcc_hotkey_add,
.remove = acpi_pcc_hotkey_remove,
.resume = acpi_pcc_hotkey_resume,
+ .notify = acpi_pcc_hotkey_notify,
},
};
union acpi_object *hkey = NULL;
int i;
- status = acpi_evaluate_object(pcc->handle, METHOD_HKEY_SINF, 0,
+ status = acpi_evaluate_object(pcc->handle, METHOD_HKEY_SINF, NULL,
&buffer);
if (ACPI_FAILURE(status)) {
ACPI_DEBUG_PRINT((ACPI_DB_ERROR,
return;
}
-static void acpi_pcc_hotkey_notify(acpi_handle handle, u32 event, void *data)
+static void acpi_pcc_hotkey_notify(struct acpi_device *device, u32 event)
{
- struct pcc_acpi *pcc = (struct pcc_acpi *) data;
+ struct pcc_acpi *pcc = acpi_driver_data(device);
switch (event) {
case HKEY_NOTIFY:
static int acpi_pcc_hotkey_add(struct acpi_device *device)
{
- acpi_status status;
struct pcc_acpi *pcc;
int num_sifr, result;
goto out_sinf;
}
- /* initialize hotkey input device */
- status = acpi_install_notify_handler(pcc->handle, ACPI_DEVICE_NOTIFY,
- acpi_pcc_hotkey_notify, pcc);
-
- if (ACPI_FAILURE(status)) {
- ACPI_DEBUG_PRINT((ACPI_DB_ERROR,
- "Error installing notify handler\n"));
- result = -ENODEV;
- goto out_input;
- }
-
/* initialize backlight */
pcc->backlight = backlight_device_register("panasonic", NULL, pcc,
&pcc_backlight_ops);
if (IS_ERR(pcc->backlight))
- goto out_notify;
+ goto out_input;
if (!acpi_pcc_retrieve_biosdata(pcc, pcc->sinf)) {
ACPI_DEBUG_PRINT((ACPI_DB_ERROR,
out_backlight:
backlight_device_unregister(pcc->backlight);
-out_notify:
- acpi_remove_notify_handler(pcc->handle, ACPI_DEVICE_NOTIFY,
- acpi_pcc_hotkey_notify);
out_input:
input_unregister_device(pcc->input_dev);
/* no need to input_free_device() since core input API refcount and
backlight_device_unregister(pcc->backlight);
- acpi_remove_notify_handler(pcc->handle, ACPI_DEVICE_NOTIFY,
- acpi_pcc_hotkey_notify);
-
input_unregister_device(pcc->input_dev);
/* no need to input_free_device() since core input API refcount and
* free()s the device */
/*
* ACPI callbacks
*/
-static void sony_acpi_notify(acpi_handle handle, u32 event, void *data)
+static void sony_nc_notify(struct acpi_device *device, u32 event)
{
u32 ev = event;
struct sony_nc_event *key_event;
if (sony_call_snc_handle(key_handle, 0x200, &result)) {
- dprintk("sony_acpi_notify, unable to decode"
+ dprintk("sony_nc_notify, unable to decode"
" event 0x%.2x 0x%.2x\n", key_handle,
ev);
/* restore the original event */
} else
sony_laptop_report_input_event(ev);
- dprintk("sony_acpi_notify, event: 0x%.2x\n", ev);
+ dprintk("sony_nc_notify, event: 0x%.2x\n", ev);
acpi_bus_generate_proc_event(sony_nc_acpi_device, 1, ev);
}
goto outwalk;
}
- status = acpi_install_notify_handler(sony_nc_acpi_handle,
- ACPI_DEVICE_NOTIFY,
- sony_acpi_notify, NULL);
- if (ACPI_FAILURE(status)) {
- printk(KERN_WARNING DRV_PFX "unable to install notify handler (%u)\n", status);
- result = -ENODEV;
- goto outinput;
- }
-
if (acpi_video_backlight_support()) {
printk(KERN_INFO DRV_PFX "brightness ignored, must be "
"controlled by ACPI video driver\n");
if (sony_backlight_device)
backlight_device_unregister(sony_backlight_device);
- status = acpi_remove_notify_handler(sony_nc_acpi_handle,
- ACPI_DEVICE_NOTIFY,
- sony_acpi_notify);
- if (ACPI_FAILURE(status))
- printk(KERN_WARNING DRV_PFX "unable to remove notify handler\n");
-
- outinput:
sony_laptop_remove_input();
outwalk:
static int sony_nc_remove(struct acpi_device *device, int type)
{
- acpi_status status;
struct sony_nc_value *item;
if (sony_backlight_device)
sony_nc_acpi_device = NULL;
- status = acpi_remove_notify_handler(sony_nc_acpi_handle,
- ACPI_DEVICE_NOTIFY,
- sony_acpi_notify);
- if (ACPI_FAILURE(status))
- printk(KERN_WARNING DRV_PFX "unable to remove notify handler\n");
-
for (item = sony_nc_values; item->name; ++item) {
device_remove_file(&sony_pf_device->dev, &item->devattr);
}
.add = sony_nc_add,
.remove = sony_nc_remove,
.resume = sony_nc_resume,
+ .notify = sony_nc_notify,
},
};
static int acpi_wmi_remove(struct acpi_device *device, int type);
static int acpi_wmi_add(struct acpi_device *device);
+static void acpi_wmi_notify(struct acpi_device *device, u32 event);
static const struct acpi_device_id wmi_device_ids[] = {
{"PNP0C14", 0},
.ops = {
.add = acpi_wmi_add,
.remove = acpi_wmi_remove,
+ .notify = acpi_wmi_notify,
},
};
}
}
-static void acpi_wmi_notify(acpi_handle handle, u32 event, void *data)
+static void acpi_wmi_notify(struct acpi_device *device, u32 event)
{
struct guid_block *block;
struct wmi_block *wblock;
struct list_head *p;
- struct acpi_device *device = data;
list_for_each(p, &wmi_blocks.list) {
wblock = list_entry(p, struct wmi_block, list);
static int acpi_wmi_remove(struct acpi_device *device, int type)
{
- acpi_remove_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
- acpi_wmi_notify);
-
acpi_remove_address_space_handler(device->handle,
ACPI_ADR_SPACE_EC, &acpi_wmi_ec_space_handler);
acpi_status status;
int result = 0;
- status = acpi_install_notify_handler(device->handle, ACPI_DEVICE_NOTIFY,
- acpi_wmi_notify, device);
- if (ACPI_FAILURE(status)) {
- printk(KERN_ERR PREFIX "Error installing notify handler\n");
- return -ENODEV;
- }
-
status = acpi_install_address_space_handler(device->handle,
ACPI_ADR_SPACE_EC,
&acpi_wmi_ec_space_handler,
struct power_supply usb;
struct power_supply adapter;
+
+ struct delayed_work charging_restart_work;
};
int pcf50633_mbc_usb_curlim_set(struct pcf50633 *pcf, int ma)
struct pcf50633_mbc *mbc = platform_get_drvdata(pcf->mbc_pdev);
int ret = 0;
u8 bits;
+ int charging_start = 1;
+ u8 mbcs2, chgmod;
if (ma >= 1000)
bits = PCF50633_MBCC7_USB_1000mA;
bits = PCF50633_MBCC7_USB_500mA;
else if (ma >= 100)
bits = PCF50633_MBCC7_USB_100mA;
- else
+ else {
bits = PCF50633_MBCC7_USB_SUSPEND;
+ charging_start = 0;
+ }
ret = pcf50633_reg_set_bit_mask(pcf, PCF50633_REG_MBCC7,
PCF50633_MBCC7_USB_MASK, bits);
else
dev_info(pcf->dev, "usb curlim to %d mA\n", ma);
+ /* Manual charging start */
+ mbcs2 = pcf50633_reg_read(pcf, PCF50633_REG_MBCS2);
+ chgmod = (mbcs2 & PCF50633_MBCS2_MBC_MASK);
+
+ /* If chgmod == BATFULL, setting chgena has no effect.
+ * We need to set resume instead.
+ */
+ if (chgmod != PCF50633_MBCS2_MBC_BAT_FULL)
+ pcf50633_reg_set_bit_mask(pcf, PCF50633_REG_MBCC1,
+ PCF50633_MBCC1_CHGENA, PCF50633_MBCC1_CHGENA);
+ else
+ pcf50633_reg_set_bit_mask(pcf, PCF50633_REG_MBCC1,
+ PCF50633_MBCC1_RESUME, PCF50633_MBCC1_RESUME);
+
+ mbc->usb_active = charging_start;
+
power_supply_changed(&mbc->usb);
return ret;
}
EXPORT_SYMBOL_GPL(pcf50633_mbc_get_status);
-void pcf50633_mbc_set_status(struct pcf50633 *pcf, int what, int status)
-{
- struct pcf50633_mbc *mbc = platform_get_drvdata(pcf->mbc_pdev);
-
- if (what & PCF50633_MBC_USB_ONLINE)
- mbc->usb_online = !!status;
- if (what & PCF50633_MBC_USB_ACTIVE)
- mbc->usb_active = !!status;
- if (what & PCF50633_MBC_ADAPTER_ONLINE)
- mbc->adapter_online = !!status;
- if (what & PCF50633_MBC_ADAPTER_ACTIVE)
- mbc->adapter_active = !!status;
-}
-EXPORT_SYMBOL_GPL(pcf50633_mbc_set_status);
-
static ssize_t
show_chgmode(struct device *dev, struct device_attribute *attr, char *buf)
{
.attrs = pcf50633_mbc_sysfs_entries,
};
+/* MBC state machine switches into charging mode when the battery voltage
+ * falls below 96% of a battery float voltage. But the voltage drop in Li-ion
+ * batteries is marginal(1~2 %) till about 80% of its capacity - which means,
+ * after a BATFULL, charging won't be restarted until 80%.
+ *
+ * This work_struct function restarts charging at regular intervals to make
+ * sure we don't discharge too much
+ */
+
+static void pcf50633_mbc_charging_restart(struct work_struct *work)
+{
+ struct pcf50633_mbc *mbc;
+ u8 mbcs2, chgmod;
+
+ mbc = container_of(work, struct pcf50633_mbc,
+ charging_restart_work.work);
+
+ mbcs2 = pcf50633_reg_read(mbc->pcf, PCF50633_REG_MBCS2);
+ chgmod = (mbcs2 & PCF50633_MBCS2_MBC_MASK);
+
+ if (chgmod != PCF50633_MBCS2_MBC_BAT_FULL)
+ return;
+
+ /* Restart charging */
+ pcf50633_reg_set_bit_mask(mbc->pcf, PCF50633_REG_MBCC1,
+ PCF50633_MBCC1_RESUME, PCF50633_MBCC1_RESUME);
+ mbc->usb_active = 1;
+ power_supply_changed(&mbc->usb);
+
+ dev_info(mbc->pcf->dev, "Charging restarted\n");
+}
+
static void
pcf50633_mbc_irq_handler(int irq, void *data)
{
struct pcf50633_mbc *mbc = data;
+ int chg_restart_interval =
+ mbc->pcf->pdata->charging_restart_interval;
/* USB */
if (irq == PCF50633_IRQ_USBINS) {
mbc->usb_online = 0;
mbc->usb_active = 0;
pcf50633_mbc_usb_curlim_set(mbc->pcf, 0);
+ cancel_delayed_work_sync(&mbc->charging_restart_work);
}
/* Adapter */
if (irq == PCF50633_IRQ_BATFULL) {
mbc->usb_active = 0;
mbc->adapter_active = 0;
- }
+
+ if (chg_restart_interval > 0)
+ schedule_delayed_work(&mbc->charging_restart_work,
+ chg_restart_interval);
+ } else if (irq == PCF50633_IRQ_USBLIMON)
+ mbc->usb_active = 0;
+ else if (irq == PCF50633_IRQ_USBLIMOFF)
+ mbc->usb_active = 1;
power_supply_changed(&mbc->usb);
power_supply_changed(&mbc->adapter);
return ret;
}
+ INIT_DELAYED_WORK(&mbc->charging_restart_work,
+ pcf50633_mbc_charging_restart);
+
ret = sysfs_create_group(&pdev->dev.kobj, &mbc_attr_group);
if (ret)
dev_err(mbc->pcf->dev, "failed to create sysfs entries\n");
power_supply_unregister(&mbc->usb);
power_supply_unregister(&mbc->adapter);
+ cancel_delayed_work_sync(&mbc->charging_restart_work);
+
kfree(mbc);
return 0;
#include <linux/module.h>
#include <linux/platform_device.h>
+#include <linux/err.h>
#include <linux/interrupt.h>
#include <linux/power_supply.h>
#include <linux/pda_power.h>
+#include <linux/regulator/consumer.h>
#include <linux/timer.h>
#include <linux/jiffies.h>
+#include <linux/usb/otg.h>
static inline unsigned int get_irq_flags(struct resource *res)
{
static struct timer_list polling_timer;
static int polling;
+#ifdef CONFIG_USB_OTG_UTILS
+static struct otg_transceiver *transceiver;
+#endif
+static struct regulator *ac_draw;
+
enum {
PDA_PSY_OFFLINE = 0,
PDA_PSY_ONLINE = 1,
static void update_charger(void)
{
- if (!pdata->set_charge)
- return;
-
- if (new_ac_status > 0) {
- dev_dbg(dev, "charger on (AC)\n");
- pdata->set_charge(PDA_POWER_CHARGE_AC);
- } else if (new_usb_status > 0) {
- dev_dbg(dev, "charger on (USB)\n");
- pdata->set_charge(PDA_POWER_CHARGE_USB);
- } else {
- dev_dbg(dev, "charger off\n");
- pdata->set_charge(0);
+ static int regulator_enabled;
+ int max_uA = pdata->ac_max_uA;
+
+ if (pdata->set_charge) {
+ if (new_ac_status > 0) {
+ dev_dbg(dev, "charger on (AC)\n");
+ pdata->set_charge(PDA_POWER_CHARGE_AC);
+ } else if (new_usb_status > 0) {
+ dev_dbg(dev, "charger on (USB)\n");
+ pdata->set_charge(PDA_POWER_CHARGE_USB);
+ } else {
+ dev_dbg(dev, "charger off\n");
+ pdata->set_charge(0);
+ }
+ } else if (ac_draw) {
+ if (new_ac_status > 0) {
+ regulator_set_current_limit(ac_draw, max_uA, max_uA);
+ if (!regulator_enabled) {
+ dev_dbg(dev, "charger on (AC)\n");
+ regulator_enable(ac_draw);
+ regulator_enabled = 1;
+ }
+ } else {
+ if (regulator_enabled) {
+ dev_dbg(dev, "charger off\n");
+ regulator_disable(ac_draw);
+ regulator_enabled = 0;
+ }
+ }
}
}
jiffies + msecs_to_jiffies(pdata->polling_interval));
}
+#ifdef CONFIG_USB_OTG_UTILS
+static int otg_is_usb_online(void)
+{
+ return (transceiver->state == OTG_STATE_B_PERIPHERAL);
+}
+#endif
+
static int pda_power_probe(struct platform_device *pdev)
{
int ret = 0;
if (!pdata->polling_interval)
pdata->polling_interval = 2000;
+ if (!pdata->ac_max_uA)
+ pdata->ac_max_uA = 500000;
+
setup_timer(&charger_timer, charger_timer_func, 0);
setup_timer(&supply_timer, supply_timer_func, 0);
pda_psy_usb.num_supplicants = pdata->num_supplicants;
}
+ ac_draw = regulator_get(dev, "ac_draw");
+ if (IS_ERR(ac_draw)) {
+ dev_dbg(dev, "couldn't get ac_draw regulator\n");
+ ac_draw = NULL;
+ ret = PTR_ERR(ac_draw);
+ }
+
if (pdata->is_ac_online) {
ret = power_supply_register(&pdev->dev, &pda_psy_ac);
if (ret) {
}
}
+#ifdef CONFIG_USB_OTG_UTILS
+ transceiver = otg_get_transceiver();
+ if (transceiver && !pdata->is_usb_online) {
+ pdata->is_usb_online = otg_is_usb_online;
+ }
+#endif
+
if (pdata->is_usb_online) {
ret = power_supply_register(&pdev->dev, &pda_psy_usb);
if (ret) {
usb_supply_failed:
if (pdata->is_ac_online && ac_irq)
free_irq(ac_irq->start, &pda_psy_ac);
+#ifdef CONFIG_USB_OTG_UTILS
+ if (transceiver)
+ otg_put_transceiver(transceiver);
+#endif
ac_irq_failed:
if (pdata->is_ac_online)
power_supply_unregister(&pda_psy_ac);
ac_supply_failed:
+ if (ac_draw) {
+ regulator_put(ac_draw);
+ ac_draw = NULL;
+ }
if (pdata->exit)
pdata->exit(dev);
init_failed:
power_supply_unregister(&pda_psy_usb);
if (pdata->is_ac_online)
power_supply_unregister(&pda_psy_ac);
+#ifdef CONFIG_USB_OTG_UTILS
+ if (transceiver)
+ otg_put_transceiver(transceiver);
+#endif
+ if (ac_draw) {
+ regulator_put(ac_draw);
+ ac_draw = NULL;
+ }
if (pdata->exit)
pdata->exit(dev);
--- /dev/null
+/*
+ *
+ * Copyright (C) 2008 Christian Pellegrin <chripell@evolware.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ *
+ * Notes: the MAX3100 doesn't provide an interrupt on CTS so we have
+ * to use polling for flow control. TX empty IRQ is unusable, since
+ * writing conf clears FIFO buffer and we cannot have this interrupt
+ * always asking us for attention.
+ *
+ * Example platform data:
+
+ static struct plat_max3100 max3100_plat_data = {
+ .loopback = 0,
+ .crystal = 0,
+ .poll_time = 100,
+ };
+
+ static struct spi_board_info spi_board_info[] = {
+ {
+ .modalias = "max3100",
+ .platform_data = &max3100_plat_data,
+ .irq = IRQ_EINT12,
+ .max_speed_hz = 5*1000*1000,
+ .chip_select = 0,
+ },
+ };
+
+ * The initial minor number is 209 in the low-density serial port:
+ * mknod /dev/ttyMAX0 c 204 209
+ */
+
+#define MAX3100_MAJOR 204
+#define MAX3100_MINOR 209
+/* 4 MAX3100s should be enough for everyone */
+#define MAX_MAX3100 4
+
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/serial_core.h>
+#include <linux/serial.h>
+#include <linux/spi/spi.h>
+#include <linux/freezer.h>
+
+#include <linux/serial_max3100.h>
+
+#define MAX3100_C (1<<14)
+#define MAX3100_D (0<<14)
+#define MAX3100_W (1<<15)
+#define MAX3100_RX (0<<15)
+
+#define MAX3100_WC (MAX3100_W | MAX3100_C)
+#define MAX3100_RC (MAX3100_RX | MAX3100_C)
+#define MAX3100_WD (MAX3100_W | MAX3100_D)
+#define MAX3100_RD (MAX3100_RX | MAX3100_D)
+#define MAX3100_CMD (3 << 14)
+
+#define MAX3100_T (1<<14)
+#define MAX3100_R (1<<15)
+
+#define MAX3100_FEN (1<<13)
+#define MAX3100_SHDN (1<<12)
+#define MAX3100_TM (1<<11)
+#define MAX3100_RM (1<<10)
+#define MAX3100_PM (1<<9)
+#define MAX3100_RAM (1<<8)
+#define MAX3100_IR (1<<7)
+#define MAX3100_ST (1<<6)
+#define MAX3100_PE (1<<5)
+#define MAX3100_L (1<<4)
+#define MAX3100_BAUD (0xf)
+
+#define MAX3100_TE (1<<10)
+#define MAX3100_RAFE (1<<10)
+#define MAX3100_RTS (1<<9)
+#define MAX3100_CTS (1<<9)
+#define MAX3100_PT (1<<8)
+#define MAX3100_DATA (0xff)
+
+#define MAX3100_RT (MAX3100_R | MAX3100_T)
+#define MAX3100_RTC (MAX3100_RT | MAX3100_CTS | MAX3100_RAFE)
+
+/* the following simulate a status reg for ignore_status_mask */
+#define MAX3100_STATUS_PE 1
+#define MAX3100_STATUS_FE 2
+#define MAX3100_STATUS_OE 4
+
+struct max3100_port {
+ struct uart_port port;
+ struct spi_device *spi;
+
+ int cts; /* last CTS received for flow ctrl */
+ int tx_empty; /* last TX empty bit */
+
+ spinlock_t conf_lock; /* shared data */
+ int conf_commit; /* need to make changes */
+ int conf; /* configuration for the MAX31000
+ * (bits 0-7, bits 8-11 are irqs) */
+ int rts_commit; /* need to change rts */
+ int rts; /* rts status */
+ int baud; /* current baud rate */
+
+ int parity; /* keeps track if we should send parity */
+#define MAX3100_PARITY_ON 1
+#define MAX3100_PARITY_ODD 2
+#define MAX3100_7BIT 4
+ int rx_enabled; /* if we should rx chars */
+
+ int irq; /* irq assigned to the max3100 */
+
+ int minor; /* minor number */
+ int crystal; /* 1 if 3.6864Mhz crystal 0 for 1.8432 */
+ int loopback; /* 1 if we are in loopback mode */
+
+ /* for handling irqs: need workqueue since we do spi_sync */
+ struct workqueue_struct *workqueue;
+ struct work_struct work;
+ /* set to 1 to make the workhandler exit as soon as possible */
+ int force_end_work;
+ /* need to know we are suspending to avoid deadlock on workqueue */
+ int suspending;
+
+ /* hook for suspending MAX3100 via dedicated pin */
+ void (*max3100_hw_suspend) (int suspend);
+
+ /* poll time (in ms) for ctrl lines */
+ int poll_time;
+ /* and its timer */
+ struct timer_list timer;
+};
+
+static struct max3100_port *max3100s[MAX_MAX3100]; /* the chips */
+static DEFINE_MUTEX(max3100s_lock); /* race on probe */
+
+static int max3100_do_parity(struct max3100_port *s, u16 c)
+{
+ int parity;
+
+ if (s->parity & MAX3100_PARITY_ODD)
+ parity = 1;
+ else
+ parity = 0;
+
+ if (s->parity & MAX3100_7BIT)
+ c &= 0x7f;
+ else
+ c &= 0xff;
+
+ parity = parity ^ (hweight8(c) & 1);
+ return parity;
+}
+
+static int max3100_check_parity(struct max3100_port *s, u16 c)
+{
+ return max3100_do_parity(s, c) == ((c >> 8) & 1);
+}
+
+static void max3100_calc_parity(struct max3100_port *s, u16 *c)
+{
+ if (s->parity & MAX3100_7BIT)
+ *c &= 0x7f;
+ else
+ *c &= 0xff;
+
+ if (s->parity & MAX3100_PARITY_ON)
+ *c |= max3100_do_parity(s, *c) << 8;
+}
+
+static void max3100_work(struct work_struct *w);
+
+static void max3100_dowork(struct max3100_port *s)
+{
+ if (!s->force_end_work && !work_pending(&s->work) &&
+ !freezing(current) && !s->suspending)
+ queue_work(s->workqueue, &s->work);
+}
+
+static void max3100_timeout(unsigned long data)
+{
+ struct max3100_port *s = (struct max3100_port *)data;
+
+ if (s->port.info) {
+ max3100_dowork(s);
+ mod_timer(&s->timer, jiffies + s->poll_time);
+ }
+}
+
+static int max3100_sr(struct max3100_port *s, u16 tx, u16 *rx)
+{
+ struct spi_message message;
+ u16 etx, erx;
+ int status;
+ struct spi_transfer tran = {
+ .tx_buf = &etx,
+ .rx_buf = &erx,
+ .len = 2,
+ };
+
+ etx = cpu_to_be16(tx);
+ spi_message_init(&message);
+ spi_message_add_tail(&tran, &message);
+ status = spi_sync(s->spi, &message);
+ if (status) {
+ dev_warn(&s->spi->dev, "error while calling spi_sync\n");
+ return -EIO;
+ }
+ *rx = be16_to_cpu(erx);
+ s->tx_empty = (*rx & MAX3100_T) > 0;
+ dev_dbg(&s->spi->dev, "%04x - %04x\n", tx, *rx);
+ return 0;
+}
+
+static int max3100_handlerx(struct max3100_port *s, u16 rx)
+{
+ unsigned int ch, flg, status = 0;
+ int ret = 0, cts;
+
+ if (rx & MAX3100_R && s->rx_enabled) {
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+ ch = rx & (s->parity & MAX3100_7BIT ? 0x7f : 0xff);
+ if (rx & MAX3100_RAFE) {
+ s->port.icount.frame++;
+ flg = TTY_FRAME;
+ status |= MAX3100_STATUS_FE;
+ } else {
+ if (s->parity & MAX3100_PARITY_ON) {
+ if (max3100_check_parity(s, rx)) {
+ s->port.icount.rx++;
+ flg = TTY_NORMAL;
+ } else {
+ s->port.icount.parity++;
+ flg = TTY_PARITY;
+ status |= MAX3100_STATUS_PE;
+ }
+ } else {
+ s->port.icount.rx++;
+ flg = TTY_NORMAL;
+ }
+ }
+ uart_insert_char(&s->port, status, MAX3100_STATUS_OE, ch, flg);
+ ret = 1;
+ }
+
+ cts = (rx & MAX3100_CTS) > 0;
+ if (s->cts != cts) {
+ s->cts = cts;
+ uart_handle_cts_change(&s->port, cts ? TIOCM_CTS : 0);
+ }
+
+ return ret;
+}
+
+static void max3100_work(struct work_struct *w)
+{
+ struct max3100_port *s = container_of(w, struct max3100_port, work);
+ int rxchars;
+ u16 tx, rx;
+ int conf, cconf, rts, crts;
+ struct circ_buf *xmit = &s->port.info->xmit;
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ rxchars = 0;
+ do {
+ spin_lock(&s->conf_lock);
+ conf = s->conf;
+ cconf = s->conf_commit;
+ s->conf_commit = 0;
+ rts = s->rts;
+ crts = s->rts_commit;
+ s->rts_commit = 0;
+ spin_unlock(&s->conf_lock);
+ if (cconf)
+ max3100_sr(s, MAX3100_WC | conf, &rx);
+ if (crts) {
+ max3100_sr(s, MAX3100_WD | MAX3100_TE |
+ (s->rts ? MAX3100_RTS : 0), &rx);
+ rxchars += max3100_handlerx(s, rx);
+ }
+
+ max3100_sr(s, MAX3100_RD, &rx);
+ rxchars += max3100_handlerx(s, rx);
+
+ if (rx & MAX3100_T) {
+ tx = 0xffff;
+ if (s->port.x_char) {
+ tx = s->port.x_char;
+ s->port.icount.tx++;
+ s->port.x_char = 0;
+ } else if (!uart_circ_empty(xmit) &&
+ !uart_tx_stopped(&s->port)) {
+ tx = xmit->buf[xmit->tail];
+ xmit->tail = (xmit->tail + 1) &
+ (UART_XMIT_SIZE - 1);
+ s->port.icount.tx++;
+ }
+ if (tx != 0xffff) {
+ max3100_calc_parity(s, &tx);
+ tx |= MAX3100_WD | (s->rts ? MAX3100_RTS : 0);
+ max3100_sr(s, tx, &rx);
+ rxchars += max3100_handlerx(s, rx);
+ }
+ }
+
+ if (rxchars > 16 && s->port.info->port.tty != NULL) {
+ tty_flip_buffer_push(s->port.info->port.tty);
+ rxchars = 0;
+ }
+ if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS)
+ uart_write_wakeup(&s->port);
+
+ } while (!s->force_end_work &&
+ !freezing(current) &&
+ ((rx & MAX3100_R) ||
+ (!uart_circ_empty(xmit) &&
+ !uart_tx_stopped(&s->port))));
+
+ if (rxchars > 0 && s->port.info->port.tty != NULL)
+ tty_flip_buffer_push(s->port.info->port.tty);
+}
+
+static irqreturn_t max3100_irq(int irqno, void *dev_id)
+{
+ struct max3100_port *s = dev_id;
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ max3100_dowork(s);
+ return IRQ_HANDLED;
+}
+
+static void max3100_enable_ms(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ if (s->poll_time > 0)
+ mod_timer(&s->timer, jiffies);
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+}
+
+static void max3100_start_tx(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ max3100_dowork(s);
+}
+
+static void max3100_stop_rx(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ s->rx_enabled = 0;
+ spin_lock(&s->conf_lock);
+ s->conf &= ~MAX3100_RM;
+ s->conf_commit = 1;
+ spin_unlock(&s->conf_lock);
+ max3100_dowork(s);
+}
+
+static unsigned int max3100_tx_empty(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ /* may not be truly up-to-date */
+ max3100_dowork(s);
+ return s->tx_empty;
+}
+
+static unsigned int max3100_get_mctrl(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ /* may not be truly up-to-date */
+ max3100_dowork(s);
+ /* always assert DCD and DSR since these lines are not wired */
+ return (s->cts ? TIOCM_CTS : 0) | TIOCM_DSR | TIOCM_CAR;
+}
+
+static void max3100_set_mctrl(struct uart_port *port, unsigned int mctrl)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+ int rts;
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ rts = (mctrl & TIOCM_RTS) > 0;
+
+ spin_lock(&s->conf_lock);
+ if (s->rts != rts) {
+ s->rts = rts;
+ s->rts_commit = 1;
+ max3100_dowork(s);
+ }
+ spin_unlock(&s->conf_lock);
+}
+
+static void
+max3100_set_termios(struct uart_port *port, struct ktermios *termios,
+ struct ktermios *old)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+ int baud = 0;
+ unsigned cflag;
+ u32 param_new, param_mask, parity = 0;
+ struct tty_struct *tty = s->port.info->port.tty;
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+ if (!tty)
+ return;
+
+ cflag = termios->c_cflag;
+ param_new = 0;
+ param_mask = 0;
+
+ baud = tty_get_baud_rate(tty);
+ param_new = s->conf & MAX3100_BAUD;
+ switch (baud) {
+ case 300:
+ if (s->crystal)
+ baud = s->baud;
+ else
+ param_new = 15;
+ break;
+ case 600:
+ param_new = 14 + s->crystal;
+ break;
+ case 1200:
+ param_new = 13 + s->crystal;
+ break;
+ case 2400:
+ param_new = 12 + s->crystal;
+ break;
+ case 4800:
+ param_new = 11 + s->crystal;
+ break;
+ case 9600:
+ param_new = 10 + s->crystal;
+ break;
+ case 19200:
+ param_new = 9 + s->crystal;
+ break;
+ case 38400:
+ param_new = 8 + s->crystal;
+ break;
+ case 57600:
+ param_new = 1 + s->crystal;
+ break;
+ case 115200:
+ param_new = 0 + s->crystal;
+ break;
+ case 230400:
+ if (s->crystal)
+ param_new = 0;
+ else
+ baud = s->baud;
+ break;
+ default:
+ baud = s->baud;
+ }
+ tty_encode_baud_rate(tty, baud, baud);
+ s->baud = baud;
+ param_mask |= MAX3100_BAUD;
+
+ if ((cflag & CSIZE) == CS8) {
+ param_new &= ~MAX3100_L;
+ parity &= ~MAX3100_7BIT;
+ } else {
+ param_new |= MAX3100_L;
+ parity |= MAX3100_7BIT;
+ cflag = (cflag & ~CSIZE) | CS7;
+ }
+ param_mask |= MAX3100_L;
+
+ if (cflag & CSTOPB)
+ param_new |= MAX3100_ST;
+ else
+ param_new &= ~MAX3100_ST;
+ param_mask |= MAX3100_ST;
+
+ if (cflag & PARENB) {
+ param_new |= MAX3100_PE;
+ parity |= MAX3100_PARITY_ON;
+ } else {
+ param_new &= ~MAX3100_PE;
+ parity &= ~MAX3100_PARITY_ON;
+ }
+ param_mask |= MAX3100_PE;
+
+ if (cflag & PARODD)
+ parity |= MAX3100_PARITY_ODD;
+ else
+ parity &= ~MAX3100_PARITY_ODD;
+
+ /* mask termios capabilities we don't support */
+ cflag &= ~CMSPAR;
+ termios->c_cflag = cflag;
+
+ s->port.ignore_status_mask = 0;
+ if (termios->c_iflag & IGNPAR)
+ s->port.ignore_status_mask |=
+ MAX3100_STATUS_PE | MAX3100_STATUS_FE |
+ MAX3100_STATUS_OE;
+
+ /* we are sending char from a workqueue so enable */
+ s->port.info->port.tty->low_latency = 1;
+
+ if (s->poll_time > 0)
+ del_timer_sync(&s->timer);
+
+ uart_update_timeout(port, termios->c_cflag, baud);
+
+ spin_lock(&s->conf_lock);
+ s->conf = (s->conf & ~param_mask) | (param_new & param_mask);
+ s->conf_commit = 1;
+ s->parity = parity;
+ spin_unlock(&s->conf_lock);
+ max3100_dowork(s);
+
+ if (UART_ENABLE_MS(&s->port, termios->c_cflag))
+ max3100_enable_ms(&s->port);
+}
+
+static void max3100_shutdown(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ if (s->suspending)
+ return;
+
+ s->force_end_work = 1;
+
+ if (s->poll_time > 0)
+ del_timer_sync(&s->timer);
+
+ if (s->workqueue) {
+ flush_workqueue(s->workqueue);
+ destroy_workqueue(s->workqueue);
+ s->workqueue = NULL;
+ }
+ if (s->irq)
+ free_irq(s->irq, s);
+
+ /* set shutdown mode to save power */
+ if (s->max3100_hw_suspend)
+ s->max3100_hw_suspend(1);
+ else {
+ u16 tx, rx;
+
+ tx = MAX3100_WC | MAX3100_SHDN;
+ max3100_sr(s, tx, &rx);
+ }
+}
+
+static int max3100_startup(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+ char b[12];
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ s->conf = MAX3100_RM;
+ s->baud = s->crystal ? 230400 : 115200;
+ s->rx_enabled = 1;
+
+ if (s->suspending)
+ return 0;
+
+ s->force_end_work = 0;
+ s->parity = 0;
+ s->rts = 0;
+
+ sprintf(b, "max3100-%d", s->minor);
+ s->workqueue = create_freezeable_workqueue(b);
+ if (!s->workqueue) {
+ dev_warn(&s->spi->dev, "cannot create workqueue\n");
+ return -EBUSY;
+ }
+ INIT_WORK(&s->work, max3100_work);
+
+ if (request_irq(s->irq, max3100_irq,
+ IRQF_TRIGGER_FALLING, "max3100", s) < 0) {
+ dev_warn(&s->spi->dev, "cannot allocate irq %d\n", s->irq);
+ s->irq = 0;
+ destroy_workqueue(s->workqueue);
+ s->workqueue = NULL;
+ return -EBUSY;
+ }
+
+ if (s->loopback) {
+ u16 tx, rx;
+ tx = 0x4001;
+ max3100_sr(s, tx, &rx);
+ }
+
+ if (s->max3100_hw_suspend)
+ s->max3100_hw_suspend(0);
+ s->conf_commit = 1;
+ max3100_dowork(s);
+ /* wait for clock to settle */
+ msleep(50);
+
+ max3100_enable_ms(&s->port);
+
+ return 0;
+}
+
+static const char *max3100_type(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ return s->port.type == PORT_MAX3100 ? "MAX3100" : NULL;
+}
+
+static void max3100_release_port(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+}
+
+static void max3100_config_port(struct uart_port *port, int flags)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ if (flags & UART_CONFIG_TYPE)
+ s->port.type = PORT_MAX3100;
+}
+
+static int max3100_verify_port(struct uart_port *port,
+ struct serial_struct *ser)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+ int ret = -EINVAL;
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ if (ser->type == PORT_UNKNOWN || ser->type == PORT_MAX3100)
+ ret = 0;
+ return ret;
+}
+
+static void max3100_stop_tx(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+}
+
+static int max3100_request_port(struct uart_port *port)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+ return 0;
+}
+
+static void max3100_break_ctl(struct uart_port *port, int break_state)
+{
+ struct max3100_port *s = container_of(port,
+ struct max3100_port,
+ port);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+}
+
+static struct uart_ops max3100_ops = {
+ .tx_empty = max3100_tx_empty,
+ .set_mctrl = max3100_set_mctrl,
+ .get_mctrl = max3100_get_mctrl,
+ .stop_tx = max3100_stop_tx,
+ .start_tx = max3100_start_tx,
+ .stop_rx = max3100_stop_rx,
+ .enable_ms = max3100_enable_ms,
+ .break_ctl = max3100_break_ctl,
+ .startup = max3100_startup,
+ .shutdown = max3100_shutdown,
+ .set_termios = max3100_set_termios,
+ .type = max3100_type,
+ .release_port = max3100_release_port,
+ .request_port = max3100_request_port,
+ .config_port = max3100_config_port,
+ .verify_port = max3100_verify_port,
+};
+
+static struct uart_driver max3100_uart_driver = {
+ .owner = THIS_MODULE,
+ .driver_name = "ttyMAX",
+ .dev_name = "ttyMAX",
+ .major = MAX3100_MAJOR,
+ .minor = MAX3100_MINOR,
+ .nr = MAX_MAX3100,
+};
+static int uart_driver_registered;
+
+static int __devinit max3100_probe(struct spi_device *spi)
+{
+ int i, retval;
+ struct plat_max3100 *pdata;
+ u16 tx, rx;
+
+ mutex_lock(&max3100s_lock);
+
+ if (!uart_driver_registered) {
+ uart_driver_registered = 1;
+ retval = uart_register_driver(&max3100_uart_driver);
+ if (retval) {
+ printk(KERN_ERR "Couldn't register max3100 uart driver\n");
+ mutex_unlock(&max3100s_lock);
+ return retval;
+ }
+ }
+
+ for (i = 0; i < MAX_MAX3100; i++)
+ if (!max3100s[i])
+ break;
+ if (i == MAX_MAX3100) {
+ dev_warn(&spi->dev, "too many MAX3100 chips\n");
+ mutex_unlock(&max3100s_lock);
+ return -ENOMEM;
+ }
+
+ max3100s[i] = kzalloc(sizeof(struct max3100_port), GFP_KERNEL);
+ if (!max3100s[i]) {
+ dev_warn(&spi->dev,
+ "kmalloc for max3100 structure %d failed!\n", i);
+ mutex_unlock(&max3100s_lock);
+ return -ENOMEM;
+ }
+ max3100s[i]->spi = spi;
+ max3100s[i]->irq = spi->irq;
+ spin_lock_init(&max3100s[i]->conf_lock);
+ dev_set_drvdata(&spi->dev, max3100s[i]);
+ pdata = spi->dev.platform_data;
+ max3100s[i]->crystal = pdata->crystal;
+ max3100s[i]->loopback = pdata->loopback;
+ max3100s[i]->poll_time = pdata->poll_time * HZ / 1000;
+ if (pdata->poll_time > 0 && max3100s[i]->poll_time == 0)
+ max3100s[i]->poll_time = 1;
+ max3100s[i]->max3100_hw_suspend = pdata->max3100_hw_suspend;
+ max3100s[i]->minor = i;
+ init_timer(&max3100s[i]->timer);
+ max3100s[i]->timer.function = max3100_timeout;
+ max3100s[i]->timer.data = (unsigned long) max3100s[i];
+
+ dev_dbg(&spi->dev, "%s: adding port %d\n", __func__, i);
+ max3100s[i]->port.irq = max3100s[i]->irq;
+ max3100s[i]->port.uartclk = max3100s[i]->crystal ? 3686400 : 1843200;
+ max3100s[i]->port.fifosize = 16;
+ max3100s[i]->port.ops = &max3100_ops;
+ max3100s[i]->port.flags = UPF_SKIP_TEST | UPF_BOOT_AUTOCONF;
+ max3100s[i]->port.line = i;
+ max3100s[i]->port.type = PORT_MAX3100;
+ max3100s[i]->port.dev = &spi->dev;
+ retval = uart_add_one_port(&max3100_uart_driver, &max3100s[i]->port);
+ if (retval < 0)
+ dev_warn(&spi->dev,
+ "uart_add_one_port failed for line %d with error %d\n",
+ i, retval);
+
+ /* set shutdown mode to save power. Will be woken-up on open */
+ if (max3100s[i]->max3100_hw_suspend)
+ max3100s[i]->max3100_hw_suspend(1);
+ else {
+ tx = MAX3100_WC | MAX3100_SHDN;
+ max3100_sr(max3100s[i], tx, &rx);
+ }
+ mutex_unlock(&max3100s_lock);
+ return 0;
+}
+
+static int __devexit max3100_remove(struct spi_device *spi)
+{
+ struct max3100_port *s = dev_get_drvdata(&spi->dev);
+ int i;
+
+ mutex_lock(&max3100s_lock);
+
+ /* find out the index for the chip we are removing */
+ for (i = 0; i < MAX_MAX3100; i++)
+ if (max3100s[i] == s)
+ break;
+
+ dev_dbg(&spi->dev, "%s: removing port %d\n", __func__, i);
+ uart_remove_one_port(&max3100_uart_driver, &max3100s[i]->port);
+ kfree(max3100s[i]);
+ max3100s[i] = NULL;
+
+ /* check if this is the last chip we have */
+ for (i = 0; i < MAX_MAX3100; i++)
+ if (max3100s[i]) {
+ mutex_unlock(&max3100s_lock);
+ return 0;
+ }
+ pr_debug("removing max3100 driver\n");
+ uart_unregister_driver(&max3100_uart_driver);
+
+ mutex_unlock(&max3100s_lock);
+ return 0;
+}
+
+#ifdef CONFIG_PM
+
+static int max3100_suspend(struct spi_device *spi, pm_message_t state)
+{
+ struct max3100_port *s = dev_get_drvdata(&spi->dev);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ disable_irq(s->irq);
+
+ s->suspending = 1;
+ uart_suspend_port(&max3100_uart_driver, &s->port);
+
+ if (s->max3100_hw_suspend)
+ s->max3100_hw_suspend(1);
+ else {
+ /* no HW suspend, so do SW one */
+ u16 tx, rx;
+
+ tx = MAX3100_WC | MAX3100_SHDN;
+ max3100_sr(s, tx, &rx);
+ }
+ return 0;
+}
+
+static int max3100_resume(struct spi_device *spi)
+{
+ struct max3100_port *s = dev_get_drvdata(&spi->dev);
+
+ dev_dbg(&s->spi->dev, "%s\n", __func__);
+
+ if (s->max3100_hw_suspend)
+ s->max3100_hw_suspend(0);
+ uart_resume_port(&max3100_uart_driver, &s->port);
+ s->suspending = 0;
+
+ enable_irq(s->irq);
+
+ s->conf_commit = 1;
+ if (s->workqueue)
+ max3100_dowork(s);
+
+ return 0;
+}
+
+#else
+#define max3100_suspend NULL
+#define max3100_resume NULL
+#endif
+
+static struct spi_driver max3100_driver = {
+ .driver = {
+ .name = "max3100",
+ .bus = &spi_bus_type,
+ .owner = THIS_MODULE,
+ },
+
+ .probe = max3100_probe,
+ .remove = __devexit_p(max3100_remove),
+ .suspend = max3100_suspend,
+ .resume = max3100_resume,
+};
+
+static int __init max3100_init(void)
+{
+ return spi_register_driver(&max3100_driver);
+}
+module_init(max3100_init);
+
+static void __exit max3100_exit(void)
+{
+ spi_unregister_driver(&max3100_driver);
+}
+module_exit(max3100_exit);
+
+MODULE_DESCRIPTION("MAX3100 driver");
+MODULE_AUTHOR("Christian Pellegrin <chripell@evolware.org>");
+MODULE_LICENSE("GPL");
.major = TTY_MAJOR,
};
-static int __init sunsu_kbd_ms_init(struct uart_sunsu_port *up)
+static int __devinit sunsu_kbd_ms_init(struct uart_sunsu_port *up)
{
int quot, baud;
#ifdef CONFIG_SERIO
* are always powered while this driver is active, and use
* active-low power switches.
*/
- for (i = 0; i < pdata->ports; i++) {
+ for (i = 0; i < ARRAY_SIZE(pdata->vbus_pin); i++) {
if (pdata->vbus_pin[i] <= 0)
continue;
gpio_request(pdata->vbus_pin[i], "ohci_vbus");
int i;
if (pdata) {
- for (i = 0; i < pdata->ports; i++) {
+ for (i = 0; i < ARRAY_SIZE(pdata->vbus_pin); i++) {
if (pdata->vbus_pin[i] <= 0)
continue;
gpio_direction_output(pdata->vbus_pin[i], 1);
*/
#include <linux/fs.h>
+#include <asm/page.h> /* for PAGE_SIZE */
#include "befs.h"
#include "super.h"
* locked buffer. This only can happen if someone has written the buffer
* directly, with submit_bh(). At the address_space level PageWriteback
* prevents this contention from occurring.
+ *
+ * If block_write_full_page() is called with wbc->sync_mode ==
+ * WB_SYNC_ALL, the writes are posted using WRITE_SYNC_PLUG; this
+ * causes the writes to be flagged as synchronous writes, but the
+ * block device queue will NOT be unplugged, since usually many pages
+ * will be pushed to the out before the higher-level caller actually
+ * waits for the writes to be completed. The various wait functions,
+ * such as wait_on_writeback_range() will ultimately call sync_page()
+ * which will ultimately call blk_run_backing_dev(), which will end up
+ * unplugging the device queue.
*/
static int __block_write_full_page(struct inode *inode, struct page *page,
get_block_t *get_block, struct writeback_control *wbc)
struct buffer_head *bh, *head;
const unsigned blocksize = 1 << inode->i_blkbits;
int nr_underway = 0;
- int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
+ int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
+ WRITE_SYNC_PLUG : WRITE);
BUG_ON(!PageLocked(page));
if (!page_has_buffers(page)) {
create_empty_buffers(page, inode->i_sb->s_blocksize,
(1 << BH_Dirty)|(1 << BH_Uptodate));
- } else if (!walk_page_buffers(NULL, page_buffers(page), 0, PAGE_CACHE_SIZE, NULL, buffer_unmapped)) {
- /* Provide NULL instead of get_block so that we catch bugs if buffers weren't really mapped */
- return block_write_full_page(page, NULL, wbc);
+ page_bufs = page_buffers(page);
+ } else {
+ page_bufs = page_buffers(page);
+ if (!walk_page_buffers(NULL, page_bufs, 0, PAGE_CACHE_SIZE,
+ NULL, buffer_unmapped)) {
+ /* Provide NULL get_block() to catch bugs if buffers
+ * weren't really mapped */
+ return block_write_full_page(page, NULL, wbc);
+ }
}
- page_bufs = page_buffers(page);
-
handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
if (IS_ERR(handle)) {
if (ext3_journal_current_handle())
goto out_fail;
+ if (page_has_buffers(page)) {
+ if (!walk_page_buffers(NULL, page_buffers(page), 0,
+ PAGE_CACHE_SIZE, NULL, buffer_unmapped)) {
+ /* Provide NULL get_block() to catch bugs if buffers
+ * weren't really mapped */
+ return block_write_full_page(page, NULL, wbc);
+ }
+ }
+
handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
dev = inode->i_sb->s_dev;
ino = inode->i_ino;
- pgoff = (loff_t)vma->pg_off << PAGE_SHIFT;
+ pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
}
seq_printf(m,
struct list_head children;
struct list_head node;
struct list_head wakeup_list;
- struct list_head g_list;
struct acpi_device_status status;
struct acpi_device_flags flags;
struct acpi_device_pnp pnp;
/*
* Target features
*/
-#define DM_TARGET_SUPPORTS_BARRIERS 0x00000001
struct target_type {
uint64_t features;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
/* for init task */
-#define INIT_FTRACE_GRAPH .ret_stack = NULL
+#define INIT_FTRACE_GRAPH .ret_stack = NULL,
/*
* Stack of return addresses for functions
#endif /* CONFIG_HW_BRANCH_TRACER */
-/*
- * A syscall entry in the ftrace syscalls array.
- *
- * @name: name of the syscall
- * @nb_args: number of parameters it takes
- * @types: list of types as strings
- * @args: list of args as strings (args[i] matches types[i])
- */
-struct syscall_metadata {
- const char *name;
- int nb_args;
- const char **types;
- const char **args;
-};
-
-#ifdef CONFIG_FTRACE_SYSCALLS
-extern void arch_init_ftrace_syscalls(void);
-extern struct syscall_metadata *syscall_nr_to_meta(int nr);
-extern void start_ftrace_syscalls(void);
-extern void stop_ftrace_syscalls(void);
-extern void ftrace_syscall_enter(struct pt_regs *regs);
-extern void ftrace_syscall_exit(struct pt_regs *regs);
-#else
-static inline void start_ftrace_syscalls(void) { }
-static inline void stop_ftrace_syscalls(void) { }
-static inline void ftrace_syscall_enter(struct pt_regs *regs) { }
-static inline void ftrace_syscall_exit(struct pt_regs *regs) { }
-#endif
-
#endif /* _LINUX_FTRACE_H */
#endif
}
+static inline void free_desc_masks(struct irq_desc *old_desc,
+ struct irq_desc *new_desc)
+{
+ free_cpumask_var(old_desc->affinity);
+
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+ free_cpumask_var(old_desc->pending_mask);
+#endif
+}
+
#else /* !CONFIG_SMP */
static inline bool init_alloc_desc_masks(struct irq_desc *desc, int cpu,
{
}
+static inline void free_desc_masks(struct irq_desc *old_desc,
+ struct irq_desc *new_desc)
+{
+}
#endif /* CONFIG_SMP */
#endif /* _LINUX_IRQ_H */
#define request_module(mod...) __request_module(true, mod)
#define request_module_nowait(mod...) __request_module(false, mod)
#define try_then_request_module(x, mod...) \
- ((x) ?: (__request_module(false, mod), (x)))
+ ((x) ?: (__request_module(true, mod), (x)))
#else
static inline int request_module(const char *name, ...) { return -ENOSYS; }
static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
char **batteries;
int num_batteries;
+ int charging_restart_interval;
+
/* Callbacks */
void (*probe_done)(struct pcf50633 *);
void (*mbc_event_callback)(struct pcf50633 *, int);
int pcf50633_mbc_usb_curlim_set(struct pcf50633 *pcf, int ma);
int pcf50633_mbc_get_status(struct pcf50633 *);
-void pcf50633_mbc_set_status(struct pcf50633 *, int what, int status);
#endif
unsigned int wait_for_status; /* msecs, default is 500 */
unsigned int wait_for_charger; /* msecs, default is 500 */
unsigned int polling_interval; /* msecs, default is 2000 */
+
+ unsigned long ac_max_uA; /* current to draw when on AC */
};
#endif /* __PDA_POWER_H__ */
#define task_is_stopped_or_traced(task) \
((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
#define task_contributes_to_load(task) \
- ((task->state & TASK_UNINTERRUPTIBLE) != 0)
+ ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
+ (task->flags & PF_FROZEN) == 0)
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
--- /dev/null
+/*
+ *
+ * Copyright (C) 2007 Christian Pellegrin
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+
+#ifndef _LINUX_SERIAL_MAX3100_H
+#define _LINUX_SERIAL_MAX3100_H 1
+
+
+/**
+ * struct plat_max3100 - MAX3100 SPI UART platform data
+ * @loopback: force MAX3100 in loopback
+ * @crystal: 1 for 3.6864 Mhz, 0 for 1.8432
+ * @max3100_hw_suspend: MAX3100 has a shutdown pin. This is a hook
+ * called on suspend and resume to activate it.
+ * @poll_time: poll time for CTS signal in ms, 0 disables (so no hw
+ * flow ctrl is possible but you have less CPU usage)
+ *
+ * You should use this structure in your machine description to specify
+ * how the MAX3100 is connected. Example:
+ *
+ * static struct plat_max3100 max3100_plat_data = {
+ * .loopback = 0,
+ * .crystal = 0,
+ * .poll_time = 100,
+ * };
+ *
+ * static struct spi_board_info spi_board_info[] = {
+ * {
+ * .modalias = "max3100",
+ * .platform_data = &max3100_plat_data,
+ * .irq = IRQ_EINT12,
+ * .max_speed_hz = 5*1000*1000,
+ * .chip_select = 0,
+ * },
+ * };
+ *
+ **/
+struct plat_max3100 {
+ int loopback;
+ int crystal;
+ void (*max3100_hw_suspend) (int suspend);
+ int poll_time;
+};
+
+#endif
#include <asm/signal.h>
#include <linux/quota.h>
#include <linux/key.h>
-#include <linux/ftrace.h>
+#include <trace/syscall.h>
#define __SC_DECL1(t1, a1) t1 a1
#define __SC_DECL2(t2, a2, ...) t2 a2, __SC_DECL1(__VA_ARGS__)
const union nf_inet_addr *,
u_int8_t, const __be16 *, const __be16 *);
void nf_ct_expect_put(struct nf_conntrack_expect *exp);
-int nf_ct_expect_related(struct nf_conntrack_expect *expect);
int nf_ct_expect_related_report(struct nf_conntrack_expect *expect,
u32 pid, int report);
+static inline int nf_ct_expect_related(struct nf_conntrack_expect *expect)
+{
+ return nf_ct_expect_related_report(expect, 0, 0);
+}
#endif /*_NF_CONNTRACK_EXPECT_H*/
--- /dev/null
+#ifndef _TRACE_SYSCALL_H
+#define _TRACE_SYSCALL_H
+
+#include <asm/ptrace.h>
+
+/*
+ * A syscall entry in the ftrace syscalls array.
+ *
+ * @name: name of the syscall
+ * @nb_args: number of parameters it takes
+ * @types: list of types as strings
+ * @args: list of args as strings (args[i] matches types[i])
+ */
+struct syscall_metadata {
+ const char *name;
+ int nb_args;
+ const char **types;
+ const char **args;
+};
+
+#ifdef CONFIG_FTRACE_SYSCALLS
+extern void arch_init_ftrace_syscalls(void);
+extern struct syscall_metadata *syscall_nr_to_meta(int nr);
+extern void start_ftrace_syscalls(void);
+extern void stop_ftrace_syscalls(void);
+extern void ftrace_syscall_enter(struct pt_regs *regs);
+extern void ftrace_syscall_exit(struct pt_regs *regs);
+#else
+static inline void start_ftrace_syscalls(void) { }
+static inline void stop_ftrace_syscalls(void) { }
+static inline void ftrace_syscall_enter(struct pt_regs *regs) { }
+static inline void ftrace_syscall_exit(struct pt_regs *regs) { }
+#endif
+
+#endif /* _TRACE_SYSCALL_H */
sig->cputime_expires.virt_exp = cputime_zero;
sig->cputime_expires.sched_exp = 0;
+ if (sig->rlim[RLIMIT_CPU].rlim_cur != RLIM_INFINITY) {
+ sig->cputime_expires.prof_exp =
+ secs_to_cputime(sig->rlim[RLIMIT_CPU].rlim_cur);
+ sig->cputimer.running = 1;
+ }
+
/* The timer lists. */
INIT_LIST_HEAD(&sig->cpu_timers[0]);
INIT_LIST_HEAD(&sig->cpu_timers[1]);
atomic_inc(¤t->signal->live);
return 0;
}
- sig = kmem_cache_alloc(signal_cachep, GFP_KERNEL);
-
- if (sig)
- posix_cpu_timers_init_group(sig);
+ sig = kmem_cache_alloc(signal_cachep, GFP_KERNEL);
tsk->signal = sig;
if (!sig)
return -ENOMEM;
memcpy(sig->rlim, current->signal->rlim, sizeof sig->rlim);
task_unlock(current->group_leader);
+ posix_cpu_timers_init_group(sig);
+
acct_init_pacct(&sig->pacct);
tty_audit_fork(sig);
out_unlock:
double_unlock_hb(hb1, hb2);
- /* drop_futex_key_refs() must be called outside the spinlocks. */
+ /*
+ * drop_futex_key_refs() must be called outside the spinlocks. During
+ * the requeue we moved futex_q's from the hash bucket at key1 to the
+ * one at key2 and updated their key pointer. We no longer need to
+ * hold the references to key1.
+ */
while (--drop_count >= 0)
drop_futex_key_refs(&key1);
static void free_one_irq_desc(struct irq_desc *old_desc, struct irq_desc *desc)
{
free_kstat_irqs(old_desc, desc);
+ free_desc_masks(old_desc, desc);
arch_free_chip_data(old_desc, desc);
}
/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_UNINTERRUPTIBLE);
+ create->result = current;
complete(&create->started);
schedule();
/* We want our own signal handler (we take no signals by default). */
pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
- if (pid < 0) {
+ if (pid < 0)
create->result = ERR_PTR(pid);
- } else {
- struct sched_param param = { .sched_priority = 0 };
+ else
wait_for_completion(&create->started);
- read_lock(&tasklist_lock);
- create->result = find_task_by_pid_ns(pid, &init_pid_ns);
- read_unlock(&tasklist_lock);
- /*
- * root may have changed our (kthreadd's) priority or CPU mask.
- * The kernel thread should not inherit these properties.
- */
- sched_setscheduler(create->result, SCHED_NORMAL, ¶m);
- set_user_nice(create->result, KTHREAD_NICE_LEVEL);
- set_cpus_allowed_ptr(create->result, cpu_all_mask);
- }
complete(&create->done);
}
wait_for_completion(&create.done);
if (!IS_ERR(create.result)) {
+ struct sched_param param = { .sched_priority = 0 };
va_list args;
+
va_start(args, namefmt);
vsnprintf(create.result->comm, sizeof(create.result->comm),
namefmt, args);
va_end(args);
+ /*
+ * root may have changed our (kthreadd's) priority or CPU mask.
+ * The kernel thread should not inherit these properties.
+ */
+ sched_setscheduler_nocheck(create.result, SCHED_NORMAL, ¶m);
+ set_user_nice(create.result, KTHREAD_NICE_LEVEL);
+ set_cpus_allowed_ptr(create.result, cpu_all_mask);
}
return create.result;
}
cputime = secs_to_cputime(rlim_new);
if (cputime_eq(current->signal->it_prof_expires, cputime_zero) ||
- cputime_lt(current->signal->it_prof_expires, cputime)) {
+ cputime_gt(current->signal->it_prof_expires, cputime)) {
spin_lock_irq(¤t->sighand->siglock);
set_process_cpu_timer(current, CPUCLOCK_PROF, &cputime, NULL);
spin_unlock_irq(¤t->sighand->siglock);
cpu->cpu = virt_ticks(p);
break;
case CPUCLOCK_SCHED:
- cpu->sched = p->se.sum_exec_runtime + task_delta_exec(p);
+ cpu->sched = task_sched_runtime(p);
break;
}
return 0;
{
struct task_cputime cputime;
- thread_group_cputime(p, &cputime);
switch (CPUCLOCK_WHICH(which_clock)) {
default:
return -EINVAL;
case CPUCLOCK_PROF:
+ thread_group_cputime(p, &cputime);
cpu->cpu = cputime_add(cputime.utime, cputime.stime);
break;
case CPUCLOCK_VIRT:
+ thread_group_cputime(p, &cputime);
cpu->cpu = cputime.utime;
break;
case CPUCLOCK_SCHED:
- cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
+ cpu->sched = thread_group_sched_runtime(p);
break;
}
return 0;
#include <linux/audit.h>
#include <linux/pid_namespace.h>
#include <linux/syscalls.h>
-
-#include <asm/pgtable.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
/*
list_add(&child->ptrace_entry, &new_parent->ptraced);
child->parent = new_parent;
}
-
+
/*
* Turn a tracing stop into a normal stop now, since with no tracer there
* would be no way to wake it up with SIGCONT or SIGKILL. If there was a
task_lock(task);
err = __ptrace_may_access(task, mode);
task_unlock(task);
- return (!err ? true : false);
+ return !err;
}
int ptrace_attach(struct task_struct *task)
copied += retval;
src += retval;
dst += retval;
- len -= retval;
+ len -= retval;
}
return copied;
}
copied += retval;
src += retval;
dst += retval;
- len -= retval;
+ len -= retval;
}
return copied;
}
if (unlikely(!arch_has_single_step()))
return -EIO;
user_enable_single_step(child);
- }
- else
+ } else {
user_disable_single_step(child);
+ }
child->exit_code = data;
wake_up_process(child);
struct rq_iterator *iterator);
#endif
+/* Time spent by the tasks of the cpu accounting group executing in ... */
+enum cpuacct_stat_index {
+ CPUACCT_STAT_USER, /* ... user mode */
+ CPUACCT_STAT_SYSTEM, /* ... kernel mode */
+
+ CPUACCT_STAT_NSTATS,
+};
+
#ifdef CONFIG_CGROUP_CPUACCT
static void cpuacct_charge(struct task_struct *tsk, u64 cputime);
+static void cpuacct_update_stats(struct task_struct *tsk,
+ enum cpuacct_stat_index idx, cputime_t val);
#else
static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) {}
+static inline void cpuacct_update_stats(struct task_struct *tsk,
+ enum cpuacct_stat_index idx, cputime_t val) {}
#endif
static inline void inc_cpu_load(struct rq *rq, unsigned long load)
EXPORT_PER_CPU_SYMBOL(kstat);
/*
- * Return any ns on the sched_clock that have not yet been banked in
+ * Return any ns on the sched_clock that have not yet been accounted in
* @p in case that task is currently running.
+ *
+ * Called with task_rq_lock() held on @rq.
*/
+static u64 do_task_delta_exec(struct task_struct *p, struct rq *rq)
+{
+ u64 ns = 0;
+
+ if (task_current(rq, p)) {
+ update_rq_clock(rq);
+ ns = rq->clock - p->se.exec_start;
+ if ((s64)ns < 0)
+ ns = 0;
+ }
+
+ return ns;
+}
+
unsigned long long task_delta_exec(struct task_struct *p)
{
unsigned long flags;
u64 ns = 0;
rq = task_rq_lock(p, &flags);
+ ns = do_task_delta_exec(p, rq);
+ task_rq_unlock(rq, &flags);
- if (task_current(rq, p)) {
- u64 delta_exec;
+ return ns;
+}
- update_rq_clock(rq);
- delta_exec = rq->clock - p->se.exec_start;
- if ((s64)delta_exec > 0)
- ns = delta_exec;
- }
+/*
+ * Return accounted runtime for the task.
+ * In case the task is currently running, return the runtime plus current's
+ * pending runtime that have not been accounted yet.
+ */
+unsigned long long task_sched_runtime(struct task_struct *p)
+{
+ unsigned long flags;
+ struct rq *rq;
+ u64 ns = 0;
+
+ rq = task_rq_lock(p, &flags);
+ ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
+ task_rq_unlock(rq, &flags);
+
+ return ns;
+}
+
+/*
+ * Return sum_exec_runtime for the thread group.
+ * In case the task is currently running, return the sum plus current's
+ * pending runtime that have not been accounted yet.
+ *
+ * Note that the thread group might have other running tasks as well,
+ * so the return value not includes other pending runtime that other
+ * running tasks might have.
+ */
+unsigned long long thread_group_sched_runtime(struct task_struct *p)
+{
+ struct task_cputime totals;
+ unsigned long flags;
+ struct rq *rq;
+ u64 ns;
+ rq = task_rq_lock(p, &flags);
+ thread_group_cputime(p, &totals);
+ ns = totals.sum_exec_runtime + do_task_delta_exec(p, rq);
task_rq_unlock(rq, &flags);
return ns;
cpustat->nice = cputime64_add(cpustat->nice, tmp);
else
cpustat->user = cputime64_add(cpustat->user, tmp);
+
+ cpuacct_update_stats(p, CPUACCT_STAT_USER, cputime);
/* Account for user time used */
acct_update_integrals(p);
}
else
cpustat->system = cputime64_add(cpustat->system, tmp);
+ cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
+
/* Account for system time used */
acct_update_integrals(p);
}
cpumask_or(groupmask, groupmask, sched_group_cpus(group));
cpulist_scnprintf(str, sizeof(str), sched_group_cpus(group));
- printk(KERN_CONT " %s", str);
+ printk(KERN_CONT " %s (__cpu_power = %d)", str,
+ group->__cpu_power);
group = group->next;
} while (group != sd->groups);
struct cgroup_subsys_state css;
/* cpuusage holds pointer to a u64-type object on every cpu */
u64 *cpuusage;
+ struct percpu_counter cpustat[CPUACCT_STAT_NSTATS];
struct cpuacct *parent;
};
struct cgroup_subsys *ss, struct cgroup *cgrp)
{
struct cpuacct *ca = kzalloc(sizeof(*ca), GFP_KERNEL);
+ int i;
if (!ca)
- return ERR_PTR(-ENOMEM);
+ goto out;
ca->cpuusage = alloc_percpu(u64);
- if (!ca->cpuusage) {
- kfree(ca);
- return ERR_PTR(-ENOMEM);
- }
+ if (!ca->cpuusage)
+ goto out_free_ca;
+
+ for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
+ if (percpu_counter_init(&ca->cpustat[i], 0))
+ goto out_free_counters;
if (cgrp->parent)
ca->parent = cgroup_ca(cgrp->parent);
return &ca->css;
+
+out_free_counters:
+ while (--i >= 0)
+ percpu_counter_destroy(&ca->cpustat[i]);
+ free_percpu(ca->cpuusage);
+out_free_ca:
+ kfree(ca);
+out:
+ return ERR_PTR(-ENOMEM);
}
/* destroy an existing cpu accounting group */
cpuacct_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
{
struct cpuacct *ca = cgroup_ca(cgrp);
+ int i;
+ for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
+ percpu_counter_destroy(&ca->cpustat[i]);
free_percpu(ca->cpuusage);
kfree(ca);
}
return 0;
}
+static const char *cpuacct_stat_desc[] = {
+ [CPUACCT_STAT_USER] = "user",
+ [CPUACCT_STAT_SYSTEM] = "system",
+};
+
+static int cpuacct_stats_show(struct cgroup *cgrp, struct cftype *cft,
+ struct cgroup_map_cb *cb)
+{
+ struct cpuacct *ca = cgroup_ca(cgrp);
+ int i;
+
+ for (i = 0; i < CPUACCT_STAT_NSTATS; i++) {
+ s64 val = percpu_counter_read(&ca->cpustat[i]);
+ val = cputime64_to_clock_t(val);
+ cb->fill(cb, cpuacct_stat_desc[i], val);
+ }
+ return 0;
+}
+
static struct cftype files[] = {
{
.name = "usage",
.name = "usage_percpu",
.read_seq_string = cpuacct_percpu_seq_read,
},
-
+ {
+ .name = "stat",
+ .read_map = cpuacct_stats_show,
+ },
};
static int cpuacct_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
return;
cpu = task_cpu(tsk);
+
+ rcu_read_lock();
+
ca = task_ca(tsk);
for (; ca; ca = ca->parent) {
u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
*cpuusage += cputime;
}
+
+ rcu_read_unlock();
+}
+
+/*
+ * Charge the system/user time to the task's accounting group.
+ */
+static void cpuacct_update_stats(struct task_struct *tsk,
+ enum cpuacct_stat_index idx, cputime_t val)
+{
+ struct cpuacct *ca;
+
+ if (unlikely(!cpuacct_subsys.active))
+ return;
+
+ rcu_read_lock();
+ ca = task_ca(tsk);
+
+ do {
+ percpu_counter_add(&ca->cpustat[idx], val);
+ ca = ca->parent;
+ } while (ca);
+ rcu_read_unlock();
}
struct cgroup_subsys cpuacct_subsys = {
* cpupri_find - find the best (lowest-pri) CPU in the system
* @cp: The cpupri context
* @p: The task
- * @lowest_mask: A mask to fill in with selected CPUs
+ * @lowest_mask: A mask to fill in with selected CPUs (or NULL)
*
* Note: This function returns the recommended CPUs as calculated during the
* current invokation. By the time the call returns, the CPUs may have in
if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
continue;
- cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
+ if (lowest_mask)
+ cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
return 1;
}
static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
{
- cpumask_var_t mask;
-
if (rq->curr->rt.nr_cpus_allowed == 1)
return;
- if (!alloc_cpumask_var(&mask, GFP_ATOMIC))
- return;
-
if (p->rt.nr_cpus_allowed != 1
- && cpupri_find(&rq->rd->cpupri, p, mask))
- goto free;
+ && cpupri_find(&rq->rd->cpupri, p, NULL))
+ return;
- if (!cpupri_find(&rq->rd->cpupri, rq->curr, mask))
- goto free;
+ if (!cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
+ return;
/*
* There appears to be other cpus that can accept
*/
requeue_task_rt(rq, p, 1);
resched_task(rq->curr);
-free:
- free_cpumask_var(mask);
}
#endif /* CONFIG_SMP */
}
/**
- * init_timer - initialize a timer.
+ * init_timer_key - initialize a timer
* @timer: the timer to be initialized
+ * @name: name of the timer
+ * @key: lockdep class key of the fake lock used for tracking timer
+ * sync lock dependencies
*
- * init_timer() must be done to a timer prior calling *any* of the
+ * init_timer_key() must be done to a timer prior calling *any* of the
* other timer functions.
*/
void init_timer_key(struct timer_list *timer,
{
int i;
int mask = 0;
- char *s, *token;
+ char *buf, *s, *token;
- s = kstrdup(str, GFP_KERNEL);
- if (s == NULL)
+ buf = kstrdup(str, GFP_KERNEL);
+ if (buf == NULL)
return -ENOMEM;
- s = strstrip(s);
+ s = strstrip(buf);
while (1) {
token = strsep(&s, ",");
break;
}
}
- kfree(s);
+ kfree(buf);
return mask;
}
+#include <trace/syscall.h>
#include <linux/kernel.h>
-#include <linux/ftrace.h>
#include <asm/syscall.h>
#include "trace_output.h"
}
#ifdef CONFIG_SMP
-static struct workqueue_struct *work_on_cpu_wq __read_mostly;
struct work_for_cpu {
- struct work_struct work;
+ struct completion completion;
long (*fn)(void *);
void *arg;
long ret;
};
-static void do_work_for_cpu(struct work_struct *w)
+static int do_work_for_cpu(void *_wfc)
{
- struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
-
+ struct work_for_cpu *wfc = _wfc;
wfc->ret = wfc->fn(wfc->arg);
+ complete(&wfc->completion);
+ return 0;
}
/**
*
* This will return the value @fn returns.
* It is up to the caller to ensure that the cpu doesn't go offline.
+ * The caller must not hold any locks which would prevent @fn from completing.
*/
long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
{
- struct work_for_cpu wfc;
-
- INIT_WORK(&wfc.work, do_work_for_cpu);
- wfc.fn = fn;
- wfc.arg = arg;
- queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
- flush_work(&wfc.work);
-
+ struct task_struct *sub_thread;
+ struct work_for_cpu wfc = {
+ .completion = COMPLETION_INITIALIZER_ONSTACK(wfc.completion),
+ .fn = fn,
+ .arg = arg,
+ };
+
+ sub_thread = kthread_create(do_work_for_cpu, &wfc, "work_for_cpu");
+ if (IS_ERR(sub_thread))
+ return PTR_ERR(sub_thread);
+ kthread_bind(sub_thread, cpu);
+ wake_up_process(sub_thread);
+ wait_for_completion(&wfc.completion);
return wfc.ret;
}
EXPORT_SYMBOL_GPL(work_on_cpu);
hotcpu_notifier(workqueue_cpu_callback, 0);
keventd_wq = create_workqueue("events");
BUG_ON(!keventd_wq);
-#ifdef CONFIG_SMP
- work_on_cpu_wq = create_workqueue("work_on_cpu");
- BUG_ON(!work_on_cpu_wq);
-#endif
}
if (str < end)
*str = '%';
++str;
- if (*fmt) {
- if (str < end)
- *str = *fmt;
- ++str;
- } else {
- --fmt;
- }
break;
case FORMAT_TYPE_NRCHARS: {
break;
case FORMAT_TYPE_INVALID:
- if (!*fmt)
- --fmt;
break;
case FORMAT_TYPE_NRCHARS: {
if (str < end)
*str = '%';
++str;
- if (*fmt) {
- if (str < end)
- *str = *fmt;
- ++str;
- } else {
- --fmt;
- }
break;
case FORMAT_TYPE_NRCHARS:
xt_free_table_info(info);
+ return counters;
+
free_counters:
vfree(counters);
nomem:
config NETFILTER_XT_TARGET_LED
tristate '"LED" target support'
- depends on LEDS_CLASS && LED_TRIGGERS
+ depends on LEDS_CLASS && LEDS_TRIGGERS
depends on NETFILTER_ADVANCED
help
This option adds a `LED' target, which allows you to blink LEDs in
struct net *net = nf_ct_exp_net(expect);
struct hlist_node *n;
unsigned int h;
- int ret = 0;
+ int ret = 1;
if (!master_help->helper) {
ret = -ESHUTDOWN;
return ret;
}
-int nf_ct_expect_related(struct nf_conntrack_expect *expect)
+int nf_ct_expect_related_report(struct nf_conntrack_expect *expect,
+ u32 pid, int report)
{
int ret;
spin_lock_bh(&nf_conntrack_lock);
ret = __nf_ct_expect_check(expect);
- if (ret < 0)
+ if (ret <= 0)
goto out;
+ ret = 0;
nf_ct_expect_insert(expect);
- atomic_inc(&expect->use);
- spin_unlock_bh(&nf_conntrack_lock);
- nf_ct_expect_event(IPEXP_NEW, expect);
- nf_ct_expect_put(expect);
- return ret;
-out:
spin_unlock_bh(&nf_conntrack_lock);
+ nf_ct_expect_event_report(IPEXP_NEW, expect, pid, report);
return ret;
-}
-EXPORT_SYMBOL_GPL(nf_ct_expect_related);
-
-int nf_ct_expect_related_report(struct nf_conntrack_expect *expect,
- u32 pid, int report)
-{
- int ret;
-
- spin_lock_bh(&nf_conntrack_lock);
- ret = __nf_ct_expect_check(expect);
- if (ret < 0)
- goto out;
- nf_ct_expect_insert(expect);
out:
spin_unlock_bh(&nf_conntrack_lock);
- if (ret == 0)
- nf_ct_expect_event_report(IPEXP_NEW, expect, pid, report);
return ret;
}
EXPORT_SYMBOL_GPL(nf_ct_expect_related_report);
return commit_creds(new);
no_change:
- error = 0;
error:
abort_creds(new);
return error;
return 0;
}
-static void __init snd_cs4231_init(struct snd_cs4231 *chip)
+static void __devinit snd_cs4231_init(struct snd_cs4231 *chip)
{
unsigned long flags;
return bytes_to_frames(substream->runtime, ptr);
}
-static int __init snd_cs4231_probe(struct snd_cs4231 *chip)
+static int __devinit snd_cs4231_probe(struct snd_cs4231 *chip)
{
unsigned long flags;
int i;
.pointer = snd_cs4231_capture_pointer,
};
-static int __init snd_cs4231_pcm(struct snd_card *card)
+static int __devinit snd_cs4231_pcm(struct snd_card *card)
{
struct snd_cs4231 *chip = card->private_data;
struct snd_pcm *pcm;
return 0;
}
-static int __init snd_cs4231_timer(struct snd_card *card)
+static int __devinit snd_cs4231_timer(struct snd_card *card)
{
struct snd_cs4231 *chip = card->private_data;
struct snd_timer *timer;
.private_value = (left_reg) | ((right_reg) << 8) | ((shift_left) << 16) | \
((shift_right) << 19) | ((mask) << 24) | ((invert) << 22) }
-static struct snd_kcontrol_new snd_cs4231_controls[] __initdata = {
+static struct snd_kcontrol_new snd_cs4231_controls[] __devinitdata = {
CS4231_DOUBLE("PCM Playback Switch", 0, CS4231_LEFT_OUTPUT,
CS4231_RIGHT_OUTPUT, 7, 7, 1, 1),
CS4231_DOUBLE("PCM Playback Volume", 0, CS4231_LEFT_OUTPUT,
CS4231_SINGLE("Headphone Out Switch", 0, CS4231_PIN_CTRL, 7, 1, 1)
};
-static int __init snd_cs4231_mixer(struct snd_card *card)
+static int __devinit snd_cs4231_mixer(struct snd_card *card)
{
struct snd_cs4231 *chip = card->private_data;
int err, idx;
static int dev;
-static int __init cs4231_attach_begin(struct snd_card **rcard)
+static int __devinit cs4231_attach_begin(struct snd_card **rcard)
{
struct snd_card *card;
struct snd_cs4231 *chip;
return 0;
}
-static int __init cs4231_attach_finish(struct snd_card *card)
+static int __devinit cs4231_attach_finish(struct snd_card *card)
{
struct snd_cs4231 *chip = card->private_data;
int err;
.dev_free = snd_cs4231_sbus_dev_free,
};
-static int __init snd_cs4231_sbus_create(struct snd_card *card,
- struct of_device *op,
- int dev)
+static int __devinit snd_cs4231_sbus_create(struct snd_card *card,
+ struct of_device *op,
+ int dev)
{
struct snd_cs4231 *chip = card->private_data;
int err;
.dev_free = snd_cs4231_ebus_dev_free,
};
-static int __init snd_cs4231_ebus_create(struct snd_card *card,
- struct of_device *op,
- int dev)
+static int __devinit snd_cs4231_ebus_create(struct snd_card *card,
+ struct of_device *op,
+ int dev)
{
struct snd_cs4231 *chip = card->private_data;
int err;