changeset 14116:951eacaf9381 stable

Initial documentation for broadcasting and general vectorization guidelines * vectorize.txi: New file. * NEWS: Update with location of broadcasting documentation. * Makefile.am: Add vectorize.texi * arith.txi: Move accumarray and accumdim docstring to vectorize.txi * container.txi: Move structfun docstring to vectorize.txi * expr.txi: Mention broadcasting where relevant. * func.txi: Move vectorize docstring to vectorize.txi * matrix.txi: Move function application section to vectorize.txi * octave.texi: Add vectorize.txi and its menu options * sparse.txi: Move spfun to vectorize.txi * tips.txi: Move and rewrite coding tips section in vectorize.txi * bsxfun.h (is_valid_bsxfun, is_valid_inplace_bsxfun): Rename warning to "Octave:broadcast" * accumdim.m: Reformat to use @example in lieu of @smallexample * warning_ids.m: Add Octave:broadcast * bsxfun.cc: Reword docstring to mention broadcasting * cellfun.cc: Move comment about efficiency from tips.txi * version.h.in: Add a big startup warning about broadcasting
author Jordi Gutiérrez Hermoso <jordigh@octave.org>
date Tue, 27 Dec 2011 15:15:41 -0500
parents 0b3518c1228a
children acdc4520286a
files NEWS doc/interpreter/Makefile.am doc/interpreter/arith.txi doc/interpreter/container.txi doc/interpreter/expr.txi doc/interpreter/func.txi doc/interpreter/matrix.txi doc/interpreter/octave.texi doc/interpreter/sparse.txi doc/interpreter/tips.txi doc/interpreter/vectorize.txi liboctave/bsxfun.h scripts/general/accumdim.m scripts/miscellaneous/warning_ids.m src/DLD-FUNCTIONS/bsxfun.cc src/DLD-FUNCTIONS/cellfun.cc src/version.h.in
diffstat 17 files changed, 685 insertions(+), 244 deletions(-) [+]
line wrap: on
line diff
--- a/NEWS
+++ b/NEWS
@@ -42,8 +42,8 @@
 
       +=  -=  .+=  .-=  .*=  ./=  .\=  .^=  .**=  &=  |=
 
-    See the new "Broadcasting Operations" chapter in the manual for more
-    details.
+    See the "Broadcasting" section in the new "Vectorization and Faster
+    Code Execution" chapter of the manual for more details.
 
  ** Octave now features a profiler, thanks to the work of Daniel Kraft
     under the Google Summer of Code mentorship program.  The manual has
--- a/doc/interpreter/Makefile.am
+++ b/doc/interpreter/Makefile.am
@@ -148,7 +148,8 @@
   system.texi \
   testfun.texi \
   tips.texi \
-  var.texi
+  var.texi \
+  vectorize.texi
 
 TXI_SRC = $(MUNGED_TEXI_SRC:.texi=.txi)
 
--- a/doc/interpreter/arith.txi
+++ b/doc/interpreter/arith.txi
@@ -189,10 +189,6 @@
 
 @DOCSTRING(sumsq)
 
-@DOCSTRING(accumarray)
-
-@DOCSTRING(accumdim)
-
 @node Utility Functions
 @section Utility Functions
 
--- a/doc/interpreter/container.txi
+++ b/doc/interpreter/container.txi
@@ -507,9 +507,7 @@
 The simplest way to process data in a structure is within a @code{for}
 loop (@pxref{Looping Over Structure Elements}).  A similar effect can be
 achieved with the @code{structfun} function, where a user defined
-function is applied to each field of the structure.
-
-@DOCSTRING(structfun)
+function is applied to each field of the structure. @xref{doc-structfun}.
 
 Alternatively, to process the data in a structure, the structure might
 be converted to another type of container before being treated.
@@ -887,9 +885,7 @@
 is to iterate through it using one or more @code{for} loops.  The same
 idea can be implemented more easily through the use of the @code{cellfun}
 function that calls a user-specified function on all elements of a cell
-array.
-
-@DOCSTRING(cellfun)
+array. @xref{doc-cellfun}.
 
 An alternative is to convert the data to a different container, such as
 a matrix or a data structure.  Depending on the data this is possible
--- a/doc/interpreter/expr.txi
+++ b/doc/interpreter/expr.txi
@@ -512,7 +512,8 @@
 @cindex complex-conjugate transpose
 
 The following arithmetic operators are available, and work on scalars
-and matrices.
+and matrices. They element-by-element operators and functions broadcast
+(@pxref{Broadcasting}).
 
 @table @asis
 @item @var{x} + @var{y}
@@ -715,7 +716,8 @@
 
 All of Octave's comparison operators return a value of 1 if the
 comparison is true, or 0 if it is false.  For matrix values, they all
-work on an element-by-element basis.  For example:
+work on an element-by-element basis.  Broadcasting rules apply.
+@xref{Broadcasting}. For example:
 
 @example
 @group
@@ -725,9 +727,9 @@
 @end group
 @end example
 
-If one operand is a scalar and the other is a matrix, the scalar is
-compared to each element of the matrix in turn, and the result is the
-same size as the matrix.
+According to broadcasting rules, if one operand is a scalar and the
+other is a matrix, the scalar is compared to each element of the matrix
+in turn, and the result is the same size as the matrix.
 
 @table @code
 @item @var{x} < @var{y}
@@ -862,8 +864,8 @@
 @var{boolean} is false.
 @end table
 
-For matrix operands, these operators work on an element-by-element
-basis.  For example, the expression
+These operators work on an element-by-element basis. For example, the
+expression
 
 @example
 [1, 0; 0, 1] & [1, 0; 2, 3]
@@ -872,8 +874,8 @@
 @noindent
 returns a two by two identity matrix.
 
-For the binary operators, the dimensions of the operands must conform if
-both are matrices.  If one of the operands is a scalar and the other a
+For the binary operators, broadcasting rules apply. @xref{Broadcasting}.
+In particular, if one of the operands is a scalar and the other a
 matrix, the operator is applied to the scalar and each element of the
 matrix.
 
--- a/doc/interpreter/func.txi
+++ b/doc/interpreter/func.txi
@@ -1225,8 +1225,6 @@
 
 @DOCSTRING(formula)
 
-@DOCSTRING(vectorize)
-
 @DOCSTRING(symvar)
 
 @node Commands
--- a/doc/interpreter/matrix.txi
+++ b/doc/interpreter/matrix.txi
@@ -29,7 +29,6 @@
 @menu
 * Finding Elements and Checking Conditions::  
 * Rearranging Matrices::        
-* Applying a Function to an Array::
 * Special Utility Matrices::    
 * Famous Matrices::             
 @end menu
@@ -140,13 +139,6 @@
 
 @DOCSTRING(blkdiag)
 
-@node Applying a Function to an Array
-@section Applying a Function to an Array
-
-@DOCSTRING(arrayfun)
-
-@DOCSTRING(bsxfun)
-
 @node Special Utility Matrices
 @section Special Utility Matrices
 
--- a/doc/interpreter/octave.texi
+++ b/doc/interpreter/octave.texi
@@ -197,8 +197,9 @@
 * Plotting::                    
 * Matrix Manipulation::         
 * Arithmetic::                  
-* Linear Algebra::              
-* Nonlinear Equations::         
+* Linear Algebra::
+* Vectorization and Faster Code Execution::
+* Nonlinear Equations::
 * Diagonal and Permutation Matrices::
 * Sparse Matrices::
 * Numerical Integration::                  
@@ -600,7 +601,6 @@
 
 * Finding Elements and Checking Conditions::  
 * Rearranging Matrices::        
-* Applying a Function to an Array::
 * Special Utility Matrices::    
 * Famous Matrices::             
 
@@ -624,6 +624,15 @@
 * Functions of a Matrix::       
 * Specialized Solvers::
 
+Vectorization and Faster Code Execution
+
+* Basic Vectorization::
+* Broadcasting::
+* Function Application::
+* Accumulation::
+* Miscellaneous Techniques::
+* Examples::
+
 Nonlinear Equations
 
 * Solvers::
@@ -844,7 +853,6 @@
 Tips and Standards
 
 * Style Tips::                  Writing clean and robust programs.
-* Coding Tips::                 Making code run faster.
 * Comment Tips::                Conventions for writing comments.
 * Function Headers::            Standard headers for functions.
 * Documentation Tips::          Writing readable documentation strings.
@@ -910,6 +918,7 @@
 @include matrix.texi
 @include arith.texi
 @include linalg.texi
+@include vectorize.texi
 @include nonlin.texi
 @include diagperm.texi
 @include sparse.texi
--- a/doc/interpreter/sparse.txi
+++ b/doc/interpreter/sparse.txi
@@ -223,8 +223,6 @@
 
 @DOCSTRING(speye)
 
-@DOCSTRING(spfun)
-
 @DOCSTRING(spones)
 
 @DOCSTRING(sprand)
--- a/doc/interpreter/tips.txi
+++ b/doc/interpreter/tips.txi
@@ -28,7 +28,6 @@
 
 @menu
 * Style Tips::                  Writing clean and robust programs.
-* Coding Tips::                 Making code run faster.
 * Comment Tips::                Conventions for writing comments.
 * Function Headers::            Standard headers for functions.
 * Documentation Tips::          Writing readable documentation strings.
@@ -75,203 +74,6 @@
 copyright to anyone else, then place your name in the copyright notice.
 @end itemize
 
-@node Coding Tips
-@section Tips for Making Code Run Faster.
-@cindex execution speed
-@cindex speedups
-
-Here are some ways of improving the execution speed of Octave programs.
-
-@itemize @bullet
-@item
-Vectorize loops.  For instance, rather than
-
-@example
-@group
-for i = 1:n-1
-  a(i) = b(i+1) - b(i);
-endfor
-@end group
-@end example
-
-@noindent
-write
-
-@example
-a = b(2:n) - b(1:n-1);
-@end example
-
-This is especially important for loops with "cheap" bodies.  Often it suffices
-to vectorize just the innermost loop to get acceptable performance.  A general
-rule of thumb is that the "order" of the vectorized body should be greater or
-equal to the "order" of the enclosing loop.
-
-@item
-Use built-in and library functions if possible.  Built-in and compiled functions
-are very fast.  Even with a m-file library function, chances are good that it is
-already optimized, or will be optimized more in a future release.
-
-For instance, even better than
-
-@example
-a = b(2:n) - b(1:n-1);
-@end example
-
-@noindent
-is
-
-@example
-a = diff (b);
-@end example
-
-
-@item
-Avoid computing costly intermediate results multiple times.  Octave currently
-does not eliminate common subexpressions.
-Also, certain internal computation results are cached for variables.
-For instance, if a matrix variable is used multiple times as an index,
-checking the indices (and internal conversion to integers) is only done once.
-
-@item
-Be aware of lazy copies (copy-on-write).  When a copy of an object
-is created, the data is not immediately copied, but rather shared.  The actual
-copying is postponed until the copied data needs to be modified.  For example:
-
-@example
-@group
-a = zeros (1000); # create a 1000x1000 matrix
-b = a; # no copying done here
-b(1) = 1; # copying done here
-@end group
-@end example
-
-Lazy copying applies to whole Octave objects such as matrices, cells, struct,
-and also individual cell or struct elements (not array elements).
-
-Additionally, index expressions also use lazy copying when Octave can determine
-that the indexed portion is contiguous in memory.  For example:
-
-@example
-@group
-a = zeros (1000); # create a 1000x1000 matrix
-b = a(:,10:100); # no copying done here
-b = a(10:100,:); # copying done here
-@end group
-@end example
-
-This applies to arrays (matrices), cell arrays, and structs indexed using ().
-Index expressions generating cs-lists can also benefit of shallow copying
-in some cases.  In particular, when @var{a} is a struct array, expressions like
-@code{@{a.x@}, @{a(:,2).x@}} will use lazy copying, so that data can be shared
-between a struct array and a cell array.
-
-Most indexing expressions do not live longer than their `parent' objects.
-In rare cases, however, a lazily copied slice outlasts its parent, in which
-case it becomes orphaned, still occupying unnecessarily more memory than needed.
-To provide a remedy working in most real cases,
-Octave checks for orphaned lazy slices at certain situations, when a value
-is stored into a "permanent" location, such as a named variable or cell or
-struct element, and possibly economizes them.  For example:
-
-@example
-@group
-a = zeros (1000); # create a 1000x1000 matrix
-b = a(:,10:100);  # lazy slice
-a = []; # the original a array is still allocated
-c@{1@} = b; # b is reallocated at this point
-@end group
-@end example
-
-@item
-Avoid deep recursion.  Function calls to m-file functions carry a relatively
-significant overhead, so rewriting a recursion as a loop often helps.  Also,
-note that the maximum level of recursion is limited.
-
-@item
-Avoid resizing matrices unnecessarily.  When building a single result
-matrix from a series of calculations, set the size of the result matrix
-first, then insert values into it.  Write
-
-@example
-@group
-result = zeros (big_n, big_m)
-for i = over:and_over
-  r1 = @dots{}
-  r2 = @dots{}
-  result (r1, r2) = new_value ();
-endfor
-@end group
-@end example
-
-@noindent
-instead of
-
-@example
-@group
-result = [];
-for i = ever:and_ever
-  result = [ result, new_value() ];
-endfor
-@end group
-@end example
-
-Sometimes the number of items can't be computed in advance, and stack-like
-operations are needed.  When elements are being repeatedly inserted at/removed
-from the end of an array, Octave detects it as stack usage and attempts to use a
-smarter memory management strategy pre-allocating the array in bigger chunks. 
-Likewise works for cell and struct arrays.
-
-@example
-@group
-a = [];
-while (condition)
-  @dots{}
-  a(end+1) = value; # "push" operation
-  @dots{}
-  a(end) = []; # "pop" operation
-  @dots{}
-endwhile
-@end group
-@end example
-
-@item
-Use @code{cellfun} intelligently.  The @code{cellfun} function is a useful tool
-for avoiding loops.  @xref{Processing Data in Cell Arrays}.
-@code{cellfun} is often used with anonymous function handles; however, calling
-an anonymous function involves an overhead quite comparable to the overhead
-of an m-file function.  Passing a handle to a built-in function is faster,
-because the interpreter is not involved in the internal loop.  For example:
-
-@example
-@group
-a = @{@dots{}@}
-v = cellfun (@@(x) det(x), a); # compute determinants
-v = cellfun (@@det, a); # faster
-@end group
-@end example
-
-@item
-Octave includes a number of other functions that can replace common types of
-loops, including @code{bsxfun}, @code{arrayfun}, @code{structfun},
-@code{accumarray}.  These functions can take an arbitrary function as a handle.
-Be sure to get familiar with them if you want to become an Octave expert.
-
-@item
-Avoid calling @code{eval} or @code{feval} excessively, because
-they require Octave to parse input or look up the name of a function in
-the symbol table.
-
-If you are using @code{eval} as an exception handling mechanism and not
-because you need to execute some arbitrary text, use the @code{try}
-statement instead.  @xref{The @code{try} Statement}.
-
-@item
-If you are calling lots of functions but none of them will need to
-change during your run, set the variable
-@code{ignore_function_time_stamp} to @code{"all"} so that Octave doesn't
-waste a lot of time checking to see if you have updated your function
-files.
-@end itemize
 
 @node Comment Tips
 @section Tips on Writing Comments
new file mode 100644
--- /dev/null
+++ b/doc/interpreter/vectorize.txi
@@ -0,0 +1,625 @@
+@c Copyright (C) 2011 Jordi GutiƩrrez Hermoso
+@c
+@c This file is part of Octave.
+@c
+@c Octave is free software; you can redistribute it and/or modify it
+@c under the terms of the GNU General Public License as published by the
+@c Free Software Foundation; either version 3 of the License, or (at
+@c your option) any later version.
+@c
+@c Octave is distributed in the hope that it will be useful, but WITHOUT
+@c ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+@c FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+@c for more details.
+@c
+@c You should have received a copy of the GNU General Public License
+@c along with Octave; see the file COPYING. If not, see
+@c <http://www.gnu.org/licenses/>.
+
+@node Vectorization and Faster Code Execution
+@chapter Vectorization and Faster Code Execution
+@cindex vectorization
+@cindex vectorize
+
+Vectorization is a programming technique that uses vector operations
+instead of element-by-element loop-based operations. Besides frequently
+writing more succinct code, vectorization also allows for better
+optimization of the code. The optimizations may occur either in Octave's
+own Fortran, C, or C++ internal implementation, or even at a lower level
+depending on the compiler and external numerical libraries used to
+build Octave. The ultimate goal is to make use of your hardware's vector
+instructions if possible or to perform other optimizations in software.
+
+Vectorization is not a concept unique to Octave, but is particularly
+important because Octave is a matrix-oriented language. Vectorized
+Octave code will see a dramatic speed up in most cases.
+
+This chapter discusses vectorization and other techniques for faster
+code execution.
+
+@menu
+* Basic Vectorization::        Basic techniques for code optimization
+* Broadcasting::               Broadcasting operations
+* Function Application::       Applying functions to arrays, cells, and structs
+* Accumulation::               Accumulation functions
+* Miscellaneous Techniques::   Other techniques for speeding up code
+* Examples::
+@end menu
+
+@node Basic Vectorization
+@section Basic Vectorization
+
+To a very good first approximation, the goal in vectorization is to
+write code that avoids loops and uses whole-array operations. As a
+trivial example, consider
+
+@example
+@group
+for i = 1:n
+  for j = 1:m
+    c(i,j) = a(i,j) + b(i,j);
+  endfor
+endfor
+@end group
+@end example
+
+@noindent
+compared to the much simpler
+
+@example
+c = a + b;
+@end example
+
+@noindent
+This isn't merely easier to write; it is also internally much easier to
+optimize. Octave delegates this operation to an underlying
+implementation which among other optimizations may use special vector
+hardware instructions or could conceivably even perform the additions in
+parallel. In general, if the code is vectorized, the underlying
+implementation has much more freedom about the assumptions it can make
+in order to achieve faster execution.
+
+This is especially important for loops with "cheap" bodies. Often it
+suffices to vectorize just the innermost loop to get acceptable
+performance. A general rule of thumb is that the "order" of the
+vectorized body should be greater or equal to the "order" of the
+enclosing loop.
+
+As a less trivial example, rather than
+
+@example
+@group
+for i = 1:n-1
+  a(i) = b(i+1) - b(i);
+endfor
+@end group
+@end example
+
+@noindent
+write
+
+@example
+a = b(2:n) - b(1:n-1);
+@end example
+
+This shows an important general concept about using arrays for indexing
+instead of looping over an index variable. @xref{Index Expressions}.
+Also use boolean indexing generously. If a condition needs to be tested,
+this condition can also be written as a boolean index. For instance,
+instead of
+
+@example
+@group
+for i = 1:n
+  if a(i) > 5
+    a(i) -= 20
+  endif
+endfor
+@end group
+@end example
+
+@noindent
+write
+
+@example
+a(a>5) -= 20;
+@end example
+
+@noindent
+which exploits the fact that @code{a > 5} produces a boolean index.
+
+Use elementwise vector operators whenever possible to avoid looping
+(operators like @code{.*} and @code{.^}). @xref{Arithmetic Ops}. For
+simple in-line functions, the @code{vectorize} function can do this
+automatically.
+
+@DOCSTRING(vectorize)
+
+Also exploit broadcasting in these elementise operators both to avoid
+looping and unnecessary intermediate memory allocations.
+@xref{Broadcasting}.
+
+Use built-in and library functions if possible. Built-in and compiled
+functions are very fast. Even with a m-file library function, chances
+are good that it is already optimized, or will be optimized more in a
+future release.
+
+For instance, even better than
+
+@example
+a = b(2:n) - b(1:n-1);
+@end example
+
+@noindent
+is
+
+@example
+a = diff (b);
+@end example
+
+Most Octave functions are written with vector and array arguments in
+mind. If you find yourself writing a loop with a very simple operation,
+chances are that such a function already exists. The following functions
+occur frequently in vectorized code:
+
+@itemize @bullet
+@item
+Index manipulation
+
+@itemize
+@item
+find
+@item
+sub2ind
+@item
+ind2sub
+@item
+sort
+@item
+unique
+@item
+lookup
+@item
+ifelse / merge
+@end itemize
+
+@item
+Repetition
+@itemize
+@item
+repmat
+@item
+repelems
+@end itemize
+
+@item
+Vectorized arithmetic
+@itemize
+@item
+sum
+@item
+prod
+@item
+cumsum
+@item
+cumprod
+@item
+sumsq
+@item
+diff
+@item
+dot
+@item
+cummax
+@item
+cummin
+@end itemize
+
+@item
+Shape of higher dimensional arrays
+@itemize
+@item
+reshape
+@item
+resize
+@item
+permute
+@item
+squeeze
+@item
+deal
+@end itemize
+
+@end itemize
+
+@node Broadcasting
+@section Broadcasting
+@cindex broadcast
+@cindex broadcasting
+@cindex BSX
+@cindex recycling
+@cindex SIMD
+
+Broadcasting refers to how Octave binary operators and functions behave
+when their matrix or array operands or arguments differ in size. Since
+version 3.6.0, Octave now automatically broadcasts vectors, matrices,
+and arrays when using elementwise binary operators and functions.
+Broadly speaking, smaller arrays are ``broadcast'' across the larger
+one, until they have a compatible shape. The rule is that corresponding
+array dimensions must either
+
+@enumerate
+@item
+be equal or,
+@item
+one of them must be 1.
+@end enumerate
+
+@noindent
+In case all dimensions are equal, no broadcasting occurs and ordinary
+element-by-element arithmetic takes place. For arrays of higher
+dimensions, if the number of dimensions isn't the same, then missing
+trailing dimensions are treated as 1. When one of the dimensions is 1,
+the array with that singleton dimension gets copied along that dimension
+until it matches the dimension of the other array. For example, consider
+
+@example
+@group
+x = [1 2 3;
+     4 5 6;
+     7 8 9];
+
+y = [10 20 30];
+
+x + y
+@end group
+@end example
+
+@noindent
+Without broadcasting, @code{x + y} would be an error because dimensions
+do not agree. However, with broadcasting it is as if the following
+operation were performed:
+
+@example
+@group
+x = [1 2 3
+     4 5 6
+     7 8 9];
+
+y = [10 20 30
+     10 20 30
+     10 20 30];
+
+x + y
+@result{}    11   22   33
+      14   25   36
+      17   28   39
+@end group
+@end example
+
+@noindent
+That is, the smaller array of size @code{[1 3]} gets copied along the
+singleton dimension (the number of rows) until it is @code{[3 3]}. No
+actual copying takes place, however. The internal implementation reuses
+elements along the necessary dimension in order to achieve the desired
+effect without copying in memory.
+
+Both arrays can be broadcast across each other, for example, all
+pairwise differences of the elements of a vector with itself:
+
+@example
+@group
+y - y'
+@result{}    0   10   20
+    -10    0   10
+    -20  -10    0
+@end group
+@end example
+
+@noindent
+Here the vectors of size @code{[1 3]} and @code{[3 1]} both get
+broadcast into matrices of size @code{[3 3]} before ordinary matrix
+subtraction takes place.
+
+For a higher-dimensional example, suppose @code{img} is an RGB image of
+size @code{[m n 3]} and we wish to multiply each colour by a different
+scalar. The following code accomplishes this with broadcasting,
+
+@example
+img .*= permute ([0.8, 0.9, 1.2], [1, 3, 2]);
+@end example
+
+@noindent
+Note the usage of permute to match the dimensions of the @code{[0.8,
+0.9, 1.2]} vector with @code{img}.
+
+For functions that are not written with broadcasting semantics,
+@code{bsxfun} can be useful for coercing them to broadcast.
+
+@DOCSTRING(bsxfun)
+
+Broadcasting is only applied if either of the two broadcasting
+conditions hold. As usual, however, broadcasting does not apply when two
+dimensions differ and neither is 1:
+
+@example
+@group
+x = [1 2 3
+     4 5 6];
+y = [10 20
+     30 40];
+x + y
+@end group
+@end example
+
+@noindent
+This will produce an error about nonconformant arguments.
+
+Besides common arithmetic operations, several functions of two arguments
+also broadcast. The full list of functions and operators that broadcast
+is
+
+@example
+@group
+      plus      +  .+
+      minus     -  .-
+      times     .*
+      rdivide   ./
+      ldivide   .\
+      power     .^  .**
+      lt        <
+      le        <=
+      eq        ==
+      gt        >
+      ge        >=
+      ne        !=  ~=
+      and       &
+      or        |
+      atan2
+      hypot
+      max
+      min
+      mod
+      rem
+      xor
+
+      +=  -=  .+=  .-=  .*=  ./=  .\=  .^=  .**=  &=  |=
+@end group
+@end example
+
+Beware of resorting to broadcasting if a simpler operation will suffice.
+For matrices @var{a} and @var{b}, consider the following:
+
+@example
+c = sum (permute (a, [1, 3, 2]) .* permute (b, [3, 2, 1]), 3);
+@end example
+
+@noindent
+This operation broadcasts the two matrices with permuted dimensions
+across each other during elementwise multiplication in order to obtain a
+larger 3d array, and this array is the summed along the third dimension.
+A moment of thought will prove that this operation is simply the much
+faster ordinary matrix multiplication, @code{c = a*b;}.
+
+A note on terminology: ``broadcasting'' is the term popularized by the
+Numpy numerical environment in the Python programming language. In other
+programming languages and environments, broadcasting may also be known
+as @emph{binary singleton expansion} (BSX, in @sc{Matlab}, and the
+origin of the name of the @code{bsxfun} function), @emph{recycling} (R
+programming language), @emph{single-instruction multiple data} (SIMD),
+or @emph{replication}.
+
+@subsection Broadcasting and Legacy Code
+
+The new broadcasting semantics do not affect almost any code that worked
+in previous versions of Octave without error. Thus for example all code
+inherited from @sc{Matlab} that worked in previous versions of Octave
+should still work without change in Octave. The only exception is code
+such as
+
+@example
+@group
+try
+  c = a.*b;
+catch
+  c = a.*a;
+end_try_catch
+@end group
+@end example
+
+@noindent
+that may have relied on matrices of different size producing an error.
+Due to how broadcasting changes semantics with older versions of Octave,
+by default Octave warns if a broadcasting operation is performed. To
+disable this warning, refer to its ID (@pxref{doc-warning_ids}):
+
+@example
+warning ("off", "Octave:broadcast");
+@end example
+
+@noindent
+If you want to recover the old behaviour and produce an error, turn this
+warning into an error:
+
+@example
+warning ("error", "Octave:broadcast");
+@end example
+
+@node Function Application
+@section Function Application
+@cindex map
+@cindex apply
+@cindex function application
+
+As a general rule, functions should already be written with matrix
+arguments in mind and should consider whole matrix operations in a
+vectorized manner. Sometimes, writing functions in this way appears
+difficult or impossible for various reasons. For those situations,
+Octave provides facilities for applying a function to each element of an
+array, cell, or struct.
+
+@DOCSTRING(arrayfun)
+@DOCSTRING(spfun)
+@DOCSTRING(cellfun)
+@DOCSTRING(structfun)
+
+@node Accumulation
+@section Accumulation
+
+Whenever it's possible to categorize according to indices the elements
+of an array when performing a computation, accumulation functions can be
+useful.
+
+@DOCSTRING(accumarray)
+
+@DOCSTRING(accumdim)
+
+@node Miscellaneous Techniques
+@section Miscellaneous Techniques
+@cindex execution speed
+@cindex speedups
+@cindex optimization
+
+Here are some other ways of improving the execution speed of Octave
+programs.
+
+@itemize @bullet
+
+@item
+Avoid computing costly intermediate results multiple times. Octave
+currently does not eliminate common subexpressions. Also, certain
+internal computation results are cached for variables. For instance, if
+a matrix variable is used multiple times as an index, checking the
+indices (and internal conversion to integers) is only done once.
+
+@item
+@cindex copy-on-write
+@cindex COW
+@cindex memory management
+Be aware of lazy copies (copy-on-write). When a copy of an object is
+created, the data is not immediately copied, but rather shared. The
+actual copying is postponed until the copied data needs to be modified.
+For example:
+
+@example
+@group
+a = zeros (1000); # create a 1000x1000 matrix
+b = a; # no copying done here
+b(1) = 1; # copying done here
+@end group
+@end example
+
+Lazy copying applies to whole Octave objects such as matrices, cells,
+struct, and also individual cell or struct elements (not array
+elements).
+
+Additionally, index expressions also use lazy copying when Octave can
+determine that the indexed portion is contiguous in memory. For example:
+
+@example
+@group
+a = zeros (1000); # create a 1000x1000 matrix
+b = a(:,10:100); # no copying done here
+b = a(10:100,:); # copying done here
+@end group
+@end example
+
+This applies to arrays (matrices), cell arrays, and structs indexed
+using (). Index expressions generating cs-lists can also benefit of
+shallow copying in some cases. In particular, when @var{a} is a struct
+array, expressions like @code{@{a.x@}, @{a(:,2).x@}} will use lazy
+copying, so that data can be shared between a struct array and a cell
+array.
+
+Most indexing expressions do not live longer than their `parent'
+objects. In rare cases, however, a lazily copied slice outlasts its
+parent, in which case it becomes orphaned, still occupying unnecessarily
+more memory than needed. To provide a remedy working in most real cases,
+Octave checks for orphaned lazy slices at certain situations, when a
+value is stored into a "permanent" location, such as a named variable or
+cell or struct element, and possibly economizes them. For example:
+
+@example
+@group
+a = zeros (1000); # create a 1000x1000 matrix
+b = a(:,10:100);  # lazy slice
+a = []; # the original a array is still allocated
+c@{1@} = b; # b is reallocated at this point
+@end group
+@end example
+
+@item
+Avoid deep recursion. Function calls to m-file functions carry a
+relatively significant overhead, so rewriting a recursion as a loop
+often helps. Also, note that the maximum level of recursion is limited.
+
+@item
+Avoid resizing matrices unnecessarily.  When building a single result
+matrix from a series of calculations, set the size of the result matrix
+first, then insert values into it.  Write
+
+@example
+@group
+result = zeros (big_n, big_m)
+for i = over:and_over
+  r1 = @dots{}
+  r2 = @dots{}
+  result (r1, r2) = new_value ();
+endfor
+@end group
+@end example
+
+@noindent
+instead of
+
+@example
+@group
+result = [];
+for i = ever:and_ever
+  result = [ result, new_value() ];
+endfor
+@end group
+@end example
+
+Sometimes the number of items can't be computed in advance, and
+stack-like operations are needed. When elements are being repeatedly
+inserted at/removed from the end of an array, Octave detects it as stack
+usage and attempts to use a smarter memory management strategy
+pre-allocating the array in bigger chunks. Likewise works for cell and
+struct arrays.
+
+@example
+@group
+a = [];
+while (condition)
+  @dots{}
+  a(end+1) = value; # "push" operation
+  @dots{}
+  a(end) = []; # "pop" operation
+  @dots{}
+endwhile
+@end group
+@end example
+
+@item
+Avoid calling @code{eval} or @code{feval} excessively, because
+they require Octave to parse input or look up the name of a function in
+the symbol table.
+
+If you are using @code{eval} as an exception handling mechanism and not
+because you need to execute some arbitrary text, use the @code{try}
+statement instead.  @xref{The @code{try} Statement}.
+
+@item
+If you are calling lots of functions but none of them will need to
+change during your run, set the variable
+@code{ignore_function_time_stamp} to @code{"all"} so that Octave doesn't
+waste a lot of time checking to see if you have updated your function
+files.
+@end itemize
+
+@node Examples
+@section Examples
+
+Here goes a gallery of vectorization puzzles with solutions culled from
+the help mailing list and the machine learning class...
--- a/liboctave/bsxfun.h
+++ b/liboctave/bsxfun.h
@@ -42,7 +42,7 @@
     }
 
   (*current_liboctave_warning_with_id_handler)
-    ("Octave:auto-bsxfun", "%s: automatic broadcasting operation applied", name.c_str ());
+    ("Octave:broadcast", "%s: automatic broadcasting operation applied", name.c_str ());
 
   return true;
 }
@@ -69,7 +69,7 @@
     }
 
   (*current_liboctave_warning_with_id_handler)
-    ("Octave:auto-bsxfun", "%s: automatic broadcasting operation applied", name.c_str ());
+    ("Octave:broadcast", "%s: automatic broadcasting operation applied", name.c_str ());
 
   return true;
 }
--- a/scripts/general/accumdim.m
+++ b/scripts/general/accumdim.m
@@ -43,7 +43,7 @@
 ##
 ## An example of the use of @code{accumdim} is:
 ##
-## @smallexample
+## @example
 ## @group
 ## accumdim ([1, 2, 1, 2, 1], [ 7, -10,   4;
 ##                             -5, -12,   8;
@@ -52,7 +52,7 @@
 ##                             -5,  -3, -13])
 ## @result{} ans = [-10,-11,-1;-15,-3,5]
 ## @end group
-## @end smallexample
+## @end example
 ##
 ## @seealso{accumarray}
 ## @end deftypefn
--- a/scripts/miscellaneous/warning_ids.m
+++ b/scripts/miscellaneous/warning_ids.m
@@ -120,6 +120,11 @@
 ## directory as the .oct file referred to by the autoload() command.
 ## By default, the @code{Octave:autoload-relative-file-name} warning is enabled.
 ##
+## @item Octave:broadcast
+## Warn when performing broadcasting operations. By default, this is
+## enabled. See the Broadcasting section in the Vectorization and Faster
+## Code Execution chapter in the manual.
+##
 ## @item Octave:built-in-variable-assignment
 ## By default, the @code{Octave:built-in-variable-assignment} warning is
 ## enabled.
--- a/src/DLD-FUNCTIONS/bsxfun.cc
+++ b/src/DLD-FUNCTIONS/bsxfun.cc
@@ -312,8 +312,8 @@
 DEFUN_DLD (bsxfun, args, ,
   "-*- texinfo -*-\n\
 @deftypefn {Loadable Function} {} bsxfun (@var{f}, @var{A}, @var{B})\n\
-The binary singleton expansion function applier does what its name\n\
-suggests: applies a binary function @var{f} element-by-element to two\n\
+The binary singleton expansion function applier performs broadcasting,\n\
+that is, applies a binary function @var{f} element-by-element to two\n\
 array arguments @var{A} and @var{B}, and expands as necessary\n\
 singleton dimensions in either input argument.  @var{f} is a function\n\
 handle, inline function, or string containing the name of the function\n\
--- a/src/DLD-FUNCTIONS/cellfun.cc
+++ b/src/DLD-FUNCTIONS/cellfun.cc
@@ -365,6 +365,21 @@
 @end group\n\
 @end example\n\
 \n\
+Use @code{cellfun} intelligently. The @code{cellfun} function is a\n\
+useful tool for avoiding loops. It is often used with anonymous\n\
+function handles; however, calling an anonymous function involves an\n\
+overhead quite comparable to the overhead of an m-file function.\n\
+Passing a handle to a built-in function is faster, because the\n\
+interpreter is not involved in the internal loop. For example:\n\
+\n\
+@example\n\
+@group\n\
+a = @{@dots{}@}\n\
+v = cellfun (@@(x) det(x), a); # compute determinants\n\
+v = cellfun (@@det, a); # faster\n\
+@end group\n\
+@end example\n\
+\n\
 @seealso{arrayfun, structfun, spfun}\n\
 @end deftypefn")
 {
--- a/src/version.h.in
+++ b/src/version.h.in
@@ -87,6 +87,8 @@
 #define OCTAVE_STARTUP_MESSAGE \
   X_OCTAVE_NAME_VERSION_COPYRIGHT_COPYING_WARRANTY_AND_BUGS \
     ("  For details, type `warranty'.") "\n\n" \
-  "For information about changes from previous versions, type `news'."
-
+  "For information about changes from previous versions, type `news'.\n\n\
+** WARNING ** Semantics for arithmetic operators have changed and now\n\
+              broadcast. Type `doc broadcast' to get a description of\n\
+              the changes."
 #endif