Mercurial repository: octave-lyh
changeset 14116:951eacaf9381 stable
Initial documentation for broadcasting and general vectorization guidelines
* vectorize.txi: New file.
* NEWS: Update with location of broadcasting documentation.
* Makefile.am: Add vectorize.texi
* arith.txi: Move accumarray and accumdim docstring to vectorize.txi
* container.txi: Move structfun docstring to vectorize.txi
* expr.txi: Mention broadcasting where relevant.
* func.txi: Move vectorize docstring to vectorize.txi
* matrix.txi: Move function application section to vectorize.txi
* octave.texi: Add vectorize.txi and its menu options
* sparse.txi: Move spfun to vectorize.txi
* tips.txi: Move and rewrite coding tips section in vectorize.txi
* bsxfun.h (is_valid_bsxfun, is_valid_inplace_bsxfun): Rename warning
to "Octave:broadcast"
* accumdim.m: Reformat to use @example in lieu of @smallexample
* warning_ids.m: Add Octave:broadcast
* bsxfun.cc: Reword docstring to mention broadcasting
* cellfun.cc: Move comment about efficiency from tips.txi
* version.h.in: Add a big startup warning about broadcasting
author      Jordi Gutiérrez Hermoso <jordigh@octave.org>
date        Tue, 27 Dec 2011 15:15:41 -0500
parents     0b3518c1228a
children    acdc4520286a
files       NEWS doc/interpreter/Makefile.am doc/interpreter/arith.txi doc/interpreter/container.txi doc/interpreter/expr.txi doc/interpreter/func.txi doc/interpreter/matrix.txi doc/interpreter/octave.texi doc/interpreter/sparse.txi doc/interpreter/tips.txi doc/interpreter/vectorize.txi liboctave/bsxfun.h scripts/general/accumdim.m scripts/miscellaneous/warning_ids.m src/DLD-FUNCTIONS/bsxfun.cc src/DLD-FUNCTIONS/cellfun.cc src/version.h.in
diffstat    17 files changed, 685 insertions(+), 244 deletions(-)
--- a/NEWS
+++ b/NEWS
@@ -42,8 +42,8 @@
       += -= .+= .-= .*= ./= .\= .^= .**= &= |=

-    See the new "Broadcasting Operations" chapter in the manual for more
-    details.
+    See the "Broadcasting" section in the new "Vectorization and Faster
+    Code Execution" chapter of the manual for more details.

  ** Octave now features a profiler, thanks to the work of Daniel Kraft
     under the Google Summer of Code mentorship program.  The manual has
--- a/doc/interpreter/Makefile.am
+++ b/doc/interpreter/Makefile.am
@@ -148,7 +148,8 @@
   system.texi \
   testfun.texi \
   tips.texi \
-  var.texi
+  var.texi \
+  vectorize.texi

 TXI_SRC = $(MUNGED_TEXI_SRC:.texi=.txi)
--- a/doc/interpreter/arith.txi
+++ b/doc/interpreter/arith.txi
@@ -189,10 +189,6 @@

 @DOCSTRING(sumsq)

-@DOCSTRING(accumarray)
-
-@DOCSTRING(accumdim)
-
 @node Utility Functions
 @section Utility Functions
--- a/doc/interpreter/container.txi
+++ b/doc/interpreter/container.txi
@@ -507,9 +507,7 @@
 The simplest way to process data in a structure is within a @code{for}
 loop (@pxref{Looping Over Structure Elements}).  A similar effect can be
 achieved with the @code{structfun} function, where a user defined
-function is applied to each field of the structure.
-
-@DOCSTRING(structfun)
+function is applied to each field of the structure.  @xref{doc-structfun}.

 Alternatively, to process the data in a structure, the structure might
 be converted to another type of container before being treated.
@@ -887,9 +885,7 @@
 is to iterate through it using one or more @code{for} loops.  The same
 idea can be implemented more easily through the use of the @code{cellfun}
 function that calls a user-specified function on all elements of a cell
-array.
-
-@DOCSTRING(cellfun)
+array.  @xref{doc-cellfun}.

 An alternative is to convert the data to a different container, such as
 a matrix or a data structure.  Depending on the data this is possible
--- a/doc/interpreter/expr.txi
+++ b/doc/interpreter/expr.txi
@@ -512,7 +512,8 @@
 @cindex complex-conjugate transpose

 The following arithmetic operators are available, and work on scalars
-and matrices.
+and matrices.  The element-by-element operators and functions broadcast
+(@pxref{Broadcasting}).

 @table @asis
 @item @var{x} + @var{y}
@@ -715,7 +716,8 @@
 All of Octave's comparison operators return a value of 1 if the
 comparison is true, or 0 if it is false.  For matrix values, they all
-work on an element-by-element basis.  For example:
+work on an element-by-element basis.  Broadcasting rules apply.
+@xref{Broadcasting}.  For example:

 @example
 @group
@@ -725,9 +727,9 @@
 @end group
 @end example

-If one operand is a scalar and the other is a matrix, the scalar is
-compared to each element of the matrix in turn, and the result is the
-same size as the matrix.
+According to broadcasting rules, if one operand is a scalar and the
+other is a matrix, the scalar is compared to each element of the matrix
+in turn, and the result is the same size as the matrix.

 @table @code
 @item @var{x} < @var{y}
@@ -862,8 +864,8 @@
 @var{boolean} is false.
 @end table

-For matrix operands, these operators work on an element-by-element
-basis.  For example, the expression
+These operators work on an element-by-element basis.  For example, the
+expression

 @example
 [1, 0; 0, 1] & [1, 0; 2, 3]
@@ -872,8 +874,8 @@
 @noindent
 returns a two by two identity matrix.

-For the binary operators, the dimensions of the operands must conform if
-both are matrices.  If one of the operands is a scalar and the other a
+For the binary operators, broadcasting rules apply.  @xref{Broadcasting}.
+In particular, if one of the operands is a scalar and the other a
 matrix, the operator is applied to the scalar and each element of the
 matrix.
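The hunks above describe comparison operators as element-by-element with scalar extension: a scalar operand is compared against each matrix element, and the result has the matrix's size. A rough pure-Python sketch of that rule for 2-D lists (illustrative only, not Octave's actual implementation):

```python
def elementwise_lt(x, y):
    """Element-by-element `<` with scalar extension for 2-D lists.

    Models the rule the expr.txi hunk states: a scalar operand is
    compared to each element of the matrix operand in turn, and the
    result is the same size as the matrix (1 for true, 0 for false).
    """
    if not isinstance(x, list):           # scalar < matrix
        x = [[x] * len(row) for row in y]
    if not isinstance(y, list):           # matrix < scalar
        y = [[y] * len(row) for row in x]
    return [[int(a < b) for a, b in zip(rx, ry)]
            for rx, ry in zip(x, y)]

print(elementwise_lt(5, [[3, 7], [5, 9]]))   # [[0, 1], [0, 1]]
```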
--- a/doc/interpreter/func.txi
+++ b/doc/interpreter/func.txi
@@ -1225,8 +1225,6 @@

 @DOCSTRING(formula)

-@DOCSTRING(vectorize)
-
 @DOCSTRING(symvar)

 @node Commands
--- a/doc/interpreter/matrix.txi
+++ b/doc/interpreter/matrix.txi
@@ -29,7 +29,6 @@
 @menu
 * Finding Elements and Checking Conditions::
 * Rearranging Matrices::
-* Applying a Function to an Array::
 * Special Utility Matrices::
 * Famous Matrices::
 @end menu
@@ -140,13 +139,6 @@

 @DOCSTRING(blkdiag)

-@node Applying a Function to an Array
-@section Applying a Function to an Array
-
-@DOCSTRING(arrayfun)
-
-@DOCSTRING(bsxfun)
-
 @node Special Utility Matrices
 @section Special Utility Matrices
--- a/doc/interpreter/octave.texi
+++ b/doc/interpreter/octave.texi
@@ -197,8 +197,9 @@
 * Plotting::
 * Matrix Manipulation::
 * Arithmetic::
-* Linear Algebra::
-* Nonlinear Equations::
+* Linear Algebra::
+* Vectorization and Faster Code Execution::
+* Nonlinear Equations::
 * Diagonal and Permutation Matrices::
 * Sparse Matrices::
 * Numerical Integration::
@@ -600,7 +601,6 @@

 * Finding Elements and Checking Conditions::
 * Rearranging Matrices::
-* Applying a Function to an Array::
 * Special Utility Matrices::
 * Famous Matrices::

@@ -624,6 +624,15 @@
 * Functions of a Matrix::
 * Specialized Solvers::

+Vectorization and Faster Code Execution
+
+* Basic Vectorization::
+* Broadcasting::
+* Function Application::
+* Accumulation::
+* Miscellaneous Techniques::
+* Examples::
+
 Nonlinear Equations

 * Solvers::
@@ -844,7 +853,6 @@
 Tips and Standards

 * Style Tips::            Writing clean and robust programs.
-* Coding Tips::           Making code run faster.
 * Comment Tips::          Conventions for writing comments.
 * Function Headers::      Standard headers for functions.
 * Documentation Tips::    Writing readable documentation strings.
@@ -910,6 +918,7 @@
 @include matrix.texi
 @include arith.texi
 @include linalg.texi
+@include vectorize.texi
 @include nonlin.texi
 @include diagperm.texi
 @include sparse.texi
--- a/doc/interpreter/sparse.txi
+++ b/doc/interpreter/sparse.txi
@@ -223,8 +223,6 @@

 @DOCSTRING(speye)

-@DOCSTRING(spfun)
-
 @DOCSTRING(spones)

 @DOCSTRING(sprand)
--- a/doc/interpreter/tips.txi +++ b/doc/interpreter/tips.txi @@ -28,7 +28,6 @@ @menu * Style Tips:: Writing clean and robust programs. -* Coding Tips:: Making code run faster. * Comment Tips:: Conventions for writing comments. * Function Headers:: Standard headers for functions. * Documentation Tips:: Writing readable documentation strings. @@ -75,203 +74,6 @@ copyright to anyone else, then place your name in the copyright notice. @end itemize -@node Coding Tips -@section Tips for Making Code Run Faster. -@cindex execution speed -@cindex speedups - -Here are some ways of improving the execution speed of Octave programs. - -@itemize @bullet -@item -Vectorize loops. For instance, rather than - -@example -@group -for i = 1:n-1 - a(i) = b(i+1) - b(i); -endfor -@end group -@end example - -@noindent -write - -@example -a = b(2:n) - b(1:n-1); -@end example - -This is especially important for loops with "cheap" bodies. Often it suffices -to vectorize just the innermost loop to get acceptable performance. A general -rule of thumb is that the "order" of the vectorized body should be greater or -equal to the "order" of the enclosing loop. - -@item -Use built-in and library functions if possible. Built-in and compiled functions -are very fast. Even with a m-file library function, chances are good that it is -already optimized, or will be optimized more in a future release. - -For instance, even better than - -@example -a = b(2:n) - b(1:n-1); -@end example - -@noindent -is - -@example -a = diff (b); -@end example - - -@item -Avoid computing costly intermediate results multiple times. Octave currently -does not eliminate common subexpressions. -Also, certain internal computation results are cached for variables. -For instance, if a matrix variable is used multiple times as an index, -checking the indices (and internal conversion to integers) is only done once. - -@item -Be aware of lazy copies (copy-on-write). 
When a copy of an object -is created, the data is not immediately copied, but rather shared. The actual -copying is postponed until the copied data needs to be modified. For example: - -@example -@group -a = zeros (1000); # create a 1000x1000 matrix -b = a; # no copying done here -b(1) = 1; # copying done here -@end group -@end example - -Lazy copying applies to whole Octave objects such as matrices, cells, struct, -and also individual cell or struct elements (not array elements). - -Additionally, index expressions also use lazy copying when Octave can determine -that the indexed portion is contiguous in memory. For example: - -@example -@group -a = zeros (1000); # create a 1000x1000 matrix -b = a(:,10:100); # no copying done here -b = a(10:100,:); # copying done here -@end group -@end example - -This applies to arrays (matrices), cell arrays, and structs indexed using (). -Index expressions generating cs-lists can also benefit of shallow copying -in some cases. In particular, when @var{a} is a struct array, expressions like -@code{@{a.x@}, @{a(:,2).x@}} will use lazy copying, so that data can be shared -between a struct array and a cell array. - -Most indexing expressions do not live longer than their `parent' objects. -In rare cases, however, a lazily copied slice outlasts its parent, in which -case it becomes orphaned, still occupying unnecessarily more memory than needed. -To provide a remedy working in most real cases, -Octave checks for orphaned lazy slices at certain situations, when a value -is stored into a "permanent" location, such as a named variable or cell or -struct element, and possibly economizes them. For example: - -@example -@group -a = zeros (1000); # create a 1000x1000 matrix -b = a(:,10:100); # lazy slice -a = []; # the original a array is still allocated -c@{1@} = b; # b is reallocated at this point -@end group -@end example - -@item -Avoid deep recursion. 
Function calls to m-file functions carry a relatively -significant overhead, so rewriting a recursion as a loop often helps. Also, -note that the maximum level of recursion is limited. - -@item -Avoid resizing matrices unnecessarily. When building a single result -matrix from a series of calculations, set the size of the result matrix -first, then insert values into it. Write - -@example -@group -result = zeros (big_n, big_m) -for i = over:and_over - r1 = @dots{} - r2 = @dots{} - result (r1, r2) = new_value (); -endfor -@end group -@end example - -@noindent -instead of - -@example -@group -result = []; -for i = ever:and_ever - result = [ result, new_value() ]; -endfor -@end group -@end example - -Sometimes the number of items can't be computed in advance, and stack-like -operations are needed. When elements are being repeatedly inserted at/removed -from the end of an array, Octave detects it as stack usage and attempts to use a -smarter memory management strategy pre-allocating the array in bigger chunks. -Likewise works for cell and struct arrays. - -@example -@group -a = []; -while (condition) - @dots{} - a(end+1) = value; # "push" operation - @dots{} - a(end) = []; # "pop" operation - @dots{} -endwhile -@end group -@end example - -@item -Use @code{cellfun} intelligently. The @code{cellfun} function is a useful tool -for avoiding loops. @xref{Processing Data in Cell Arrays}. -@code{cellfun} is often used with anonymous function handles; however, calling -an anonymous function involves an overhead quite comparable to the overhead -of an m-file function. Passing a handle to a built-in function is faster, -because the interpreter is not involved in the internal loop. 
For example: - -@example -@group -a = @{@dots{}@} -v = cellfun (@@(x) det(x), a); # compute determinants -v = cellfun (@@det, a); # faster -@end group -@end example - -@item -Octave includes a number of other functions that can replace common types of -loops, including @code{bsxfun}, @code{arrayfun}, @code{structfun}, -@code{accumarray}. These functions can take an arbitrary function as a handle. -Be sure to get familiar with them if you want to become an Octave expert. - -@item -Avoid calling @code{eval} or @code{feval} excessively, because -they require Octave to parse input or look up the name of a function in -the symbol table. - -If you are using @code{eval} as an exception handling mechanism and not -because you need to execute some arbitrary text, use the @code{try} -statement instead. @xref{The @code{try} Statement}. - -@item -If you are calling lots of functions but none of them will need to -change during your run, set the variable -@code{ignore_function_time_stamp} to @code{"all"} so that Octave doesn't -waste a lot of time checking to see if you have updated your function -files. -@end itemize @node Comment Tips @section Tips on Writing Comments
new file mode 100644
--- /dev/null
+++ b/doc/interpreter/vectorize.txi
@@ -0,0 +1,625 @@
+@c Copyright (C) 2011 Jordi Gutiérrez Hermoso
+@c
+@c This file is part of Octave.
+@c
+@c Octave is free software; you can redistribute it and/or modify it
+@c under the terms of the GNU General Public License as published by the
+@c Free Software Foundation; either version 3 of the License, or (at
+@c your option) any later version.
+@c
+@c Octave is distributed in the hope that it will be useful, but WITHOUT
+@c ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+@c FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+@c for more details.
+@c
+@c You should have received a copy of the GNU General Public License
+@c along with Octave; see the file COPYING.  If not, see
+@c <http://www.gnu.org/licenses/>.
+
+@node Vectorization and Faster Code Execution
+@chapter Vectorization and Faster Code Execution
+@cindex vectorization
+@cindex vectorize
+
+Vectorization is a programming technique that uses vector operations
+instead of element-by-element loop-based operations.  Besides frequently
+writing more succinct code, vectorization also allows for better
+optimization of the code.  The optimizations may occur either in Octave's
+own Fortran, C, or C++ internal implementation, or even at a lower level
+depending on the compiler and external numerical libraries used to
+build Octave.  The ultimate goal is to make use of your hardware's vector
+instructions if possible or to perform other optimizations in software.
+
+Vectorization is not a concept unique to Octave, but is particularly
+important because Octave is a matrix-oriented language.  Vectorized
+Octave code will see a dramatic speed up in most cases.
+
+This chapter discusses vectorization and other techniques for faster
+code execution.
+ +@menu +* Basic Vectorization:: Basic techniques for code optimization +* Broadcasting:: Broadcasting operations +* Function Application:: Applying functions to arrays, cells, and structs +* Accumulation:: Accumulation functions +* Miscellaneous Techniques:: Other techniques for speeding up code +* Examples:: +@end menu + +@node Basic Vectorization +@section Basic Vectorization + +To a very good first approximation, the goal in vectorization is to +write code that avoids loops and uses whole-array operations. As a +trivial example, consider + +@example +@group +for i = 1:n + for j = 1:m + c(i,j) = a(i,j) + b(i,j); + endfor +endfor +@end group +@end example + +@noindent +compared to the much simpler + +@example +c = a + b; +@end example + +@noindent +This isn't merely easier to write; it is also internally much easier to +optimize. Octave delegates this operation to an underlying +implementation which among other optimizations may use special vector +hardware instructions or could conceivably even perform the additions in +parallel. In general, if the code is vectorized, the underlying +implementation has much more freedom about the assumptions it can make +in order to achieve faster execution. + +This is especially important for loops with "cheap" bodies. Often it +suffices to vectorize just the innermost loop to get acceptable +performance. A general rule of thumb is that the "order" of the +vectorized body should be greater or equal to the "order" of the +enclosing loop. + +As a less trivial example, rather than + +@example +@group +for i = 1:n-1 + a(i) = b(i+1) - b(i); +endfor +@end group +@end example + +@noindent +write + +@example +a = b(2:n) - b(1:n-1); +@end example + +This shows an important general concept about using arrays for indexing +instead of looping over an index variable. @xref{Index Expressions}. +Also use boolean indexing generously. If a condition needs to be tested, +this condition can also be written as a boolean index. 
For instance,
+instead of
+
+@example
+@group
+for i = 1:n
+  if a(i) > 5
+    a(i) -= 20
+  endif
+endfor
+@end group
+@end example
+
+@noindent
+write
+
+@example
+a(a>5) -= 20;
+@end example
+
+@noindent
+which exploits the fact that @code{a > 5} produces a boolean index.
+
+Use elementwise vector operators whenever possible to avoid looping
+(operators like @code{.*} and @code{.^}).  @xref{Arithmetic Ops}.  For
+simple in-line functions, the @code{vectorize} function can do this
+automatically.
+
+@DOCSTRING(vectorize)
+
+Also exploit broadcasting in these elementwise operators both to avoid
+looping and unnecessary intermediate memory allocations.
+@xref{Broadcasting}.
+
+Use built-in and library functions if possible.  Built-in and compiled
+functions are very fast.  Even with an m-file library function, chances
+are good that it is already optimized, or will be optimized more in a
+future release.
+
+For instance, even better than
+
+@example
+a = b(2:n) - b(1:n-1);
+@end example
+
+@noindent
+is
+
+@example
+a = diff (b);
+@end example
+
+Most Octave functions are written with vector and array arguments in
+mind.  If you find yourself writing a loop with a very simple operation,
+chances are that such a function already exists.
The following functions +occur frequently in vectorized code: + +@itemize @bullet +@item +Index manipulation + +@itemize +@item +find +@item +sub2ind +@item +ind2sub +@item +sort +@item +unique +@item +lookup +@item +ifelse / merge +@end itemize + +@item +Repetition +@itemize +@item +repmat +@item +repelems +@end itemize + +@item +Vectorized arithmetic +@itemize +@item +sum +@item +prod +@item +cumsum +@item +cumprod +@item +sumsq +@item +diff +@item +dot +@item +cummax +@item +cummin +@end itemize + +@item +Shape of higher dimensional arrays +@itemize +@item +reshape +@item +resize +@item +permute +@item +squeeze +@item +deal +@end itemize + +@end itemize + +@node Broadcasting +@section Broadcasting +@cindex broadcast +@cindex broadcasting +@cindex BSX +@cindex recycling +@cindex SIMD + +Broadcasting refers to how Octave binary operators and functions behave +when their matrix or array operands or arguments differ in size. Since +version 3.6.0, Octave now automatically broadcasts vectors, matrices, +and arrays when using elementwise binary operators and functions. +Broadly speaking, smaller arrays are ``broadcast'' across the larger +one, until they have a compatible shape. The rule is that corresponding +array dimensions must either + +@enumerate +@item +be equal or, +@item +one of them must be 1. +@end enumerate + +@noindent +In case all dimensions are equal, no broadcasting occurs and ordinary +element-by-element arithmetic takes place. For arrays of higher +dimensions, if the number of dimensions isn't the same, then missing +trailing dimensions are treated as 1. When one of the dimensions is 1, +the array with that singleton dimension gets copied along that dimension +until it matches the dimension of the other array. For example, consider + +@example +@group +x = [1 2 3; + 4 5 6; + 7 8 9]; + +y = [10 20 30]; + +x + y +@end group +@end example + +@noindent +Without broadcasting, @code{x + y} would be an error because dimensions +do not agree. 
However, with broadcasting it is as if the following +operation were performed: + +@example +@group +x = [1 2 3 + 4 5 6 + 7 8 9]; + +y = [10 20 30 + 10 20 30 + 10 20 30]; + +x + y +@result{} 11 22 33 + 14 25 36 + 17 28 39 +@end group +@end example + +@noindent +That is, the smaller array of size @code{[1 3]} gets copied along the +singleton dimension (the number of rows) until it is @code{[3 3]}. No +actual copying takes place, however. The internal implementation reuses +elements along the necessary dimension in order to achieve the desired +effect without copying in memory. + +Both arrays can be broadcast across each other, for example, all +pairwise differences of the elements of a vector with itself: + +@example +@group +y - y' +@result{} 0 10 20 + -10 0 10 + -20 -10 0 +@end group +@end example + +@noindent +Here the vectors of size @code{[1 3]} and @code{[3 1]} both get +broadcast into matrices of size @code{[3 3]} before ordinary matrix +subtraction takes place. + +For a higher-dimensional example, suppose @code{img} is an RGB image of +size @code{[m n 3]} and we wish to multiply each colour by a different +scalar. The following code accomplishes this with broadcasting, + +@example +img .*= permute ([0.8, 0.9, 1.2], [1, 3, 2]); +@end example + +@noindent +Note the usage of permute to match the dimensions of the @code{[0.8, +0.9, 1.2]} vector with @code{img}. + +For functions that are not written with broadcasting semantics, +@code{bsxfun} can be useful for coercing them to broadcast. + +@DOCSTRING(bsxfun) + +Broadcasting is only applied if either of the two broadcasting +conditions hold. As usual, however, broadcasting does not apply when two +dimensions differ and neither is 1: + +@example +@group +x = [1 2 3 + 4 5 6]; +y = [10 20 + 30 40]; +x + y +@end group +@end example + +@noindent +This will produce an error about nonconformant arguments. + +Besides common arithmetic operations, several functions of two arguments +also broadcast. 
The full list of functions and operators that broadcast
+is
+
+@example
+@group
+       plus      +  .+
+       minus     -  .-
+       times     .*
+       rdivide   ./
+       ldivide   .\
+       power     .^  .**
+       lt        <
+       le        <=
+       eq        ==
+       gt        >
+       ge        >=
+       ne        !=  ~=
+       and       &
+       or        |
+       atan2
+       hypot
+       max
+       min
+       mod
+       rem
+       xor
+
+       +=  -=  .+=  .-=  .*=  ./=  .\=  .^=  .**=  &=  |=
+@end group
+@end example
+
+Beware of resorting to broadcasting if a simpler operation will suffice.
+For matrices @var{a} and @var{b}, consider the following:
+
+@example
+c = sum (permute (a, [1, 3, 2]) .* permute (b, [3, 2, 1]), 3);
+@end example
+
+@noindent
+This operation broadcasts the two matrices with permuted dimensions
+across each other during elementwise multiplication in order to obtain a
+larger 3d array, and this array is then summed along the third dimension.
+A moment of thought will prove that this operation is simply the much
+faster ordinary matrix multiplication, @code{c = a*b;}.
+
+A note on terminology: ``broadcasting'' is the term popularized by the
+Numpy numerical environment in the Python programming language.  In other
+programming languages and environments, broadcasting may also be known
+as @emph{binary singleton expansion} (BSX, in @sc{Matlab}, and the
+origin of the name of the @code{bsxfun} function), @emph{recycling} (R
+programming language), @emph{single-instruction multiple data} (SIMD),
+or @emph{replication}.
+
+@subsection Broadcasting and Legacy Code
+
+The new broadcasting semantics affect almost no code that worked without
+error in previous versions of Octave.  Thus, for example, all code
+inherited from @sc{Matlab} that worked in previous versions of Octave
+should still work without change in Octave.  The only exception is code
+such as
+
+@example
+@group
+try
+  c = a.*b;
+catch
+  c = a.*a;
+end_try_catch
+@end group
+@end example
+
+@noindent
+that may have relied on matrices of different size producing an error.
+Because broadcasting changes the semantics relative to older versions of
+Octave, by default Octave warns if a broadcasting operation is performed.
+To disable this warning, refer to its ID (@pxref{doc-warning_ids}):
+
+@example
+warning ("off", "Octave:broadcast");
+@end example
+
+@noindent
+If you want to recover the old behaviour and produce an error, turn this
+warning into an error:
+
+@example
+warning ("error", "Octave:broadcast");
+@end example
+
+@node Function Application
+@section Function Application
+@cindex map
+@cindex apply
+@cindex function application
+
+As a general rule, functions should already be written with matrix
+arguments in mind and should consider whole matrix operations in a
+vectorized manner.  Sometimes, writing functions in this way appears
+difficult or impossible for various reasons.  For those situations,
+Octave provides facilities for applying a function to each element of an
+array, cell, or struct.
+
+@DOCSTRING(arrayfun)
+@DOCSTRING(spfun)
+@DOCSTRING(cellfun)
+@DOCSTRING(structfun)
+
+@node Accumulation
+@section Accumulation
+
+Whenever the elements of an array can be categorized by their indices
+during a computation, accumulation functions can be useful.
+
+@DOCSTRING(accumarray)
+
+@DOCSTRING(accumdim)
+
+@node Miscellaneous Techniques
+@section Miscellaneous Techniques
+@cindex execution speed
+@cindex speedups
+@cindex optimization
+
+Here are some other ways of improving the execution speed of Octave
+programs.
+
+@itemize @bullet
+
+@item
+Avoid computing costly intermediate results multiple times.  Octave
+currently does not eliminate common subexpressions.  Also, certain
+internal computation results are cached for variables.  For instance, if
+a matrix variable is used multiple times as an index, checking the
+indices (and internal conversion to integers) is only done once.
+
+@item
+@cindex copy-on-write
+@cindex COW
+@cindex memory management
+Be aware of lazy copies (copy-on-write).
When a copy of an object is +created, the data is not immediately copied, but rather shared. The +actual copying is postponed until the copied data needs to be modified. +For example: + +@example +@group +a = zeros (1000); # create a 1000x1000 matrix +b = a; # no copying done here +b(1) = 1; # copying done here +@end group +@end example + +Lazy copying applies to whole Octave objects such as matrices, cells, +struct, and also individual cell or struct elements (not array +elements). + +Additionally, index expressions also use lazy copying when Octave can +determine that the indexed portion is contiguous in memory. For example: + +@example +@group +a = zeros (1000); # create a 1000x1000 matrix +b = a(:,10:100); # no copying done here +b = a(10:100,:); # copying done here +@end group +@end example + +This applies to arrays (matrices), cell arrays, and structs indexed +using (). Index expressions generating cs-lists can also benefit of +shallow copying in some cases. In particular, when @var{a} is a struct +array, expressions like @code{@{a.x@}, @{a(:,2).x@}} will use lazy +copying, so that data can be shared between a struct array and a cell +array. + +Most indexing expressions do not live longer than their `parent' +objects. In rare cases, however, a lazily copied slice outlasts its +parent, in which case it becomes orphaned, still occupying unnecessarily +more memory than needed. To provide a remedy working in most real cases, +Octave checks for orphaned lazy slices at certain situations, when a +value is stored into a "permanent" location, such as a named variable or +cell or struct element, and possibly economizes them. For example: + +@example +@group +a = zeros (1000); # create a 1000x1000 matrix +b = a(:,10:100); # lazy slice +a = []; # the original a array is still allocated +c@{1@} = b; # b is reallocated at this point +@end group +@end example + +@item +Avoid deep recursion. 
Function calls to m-file functions carry a +relatively significant overhead, so rewriting a recursion as a loop +often helps. Also, note that the maximum level of recursion is limited. + +@item +Avoid resizing matrices unnecessarily. When building a single result +matrix from a series of calculations, set the size of the result matrix +first, then insert values into it. Write + +@example +@group +result = zeros (big_n, big_m) +for i = over:and_over + r1 = @dots{} + r2 = @dots{} + result (r1, r2) = new_value (); +endfor +@end group +@end example + +@noindent +instead of + +@example +@group +result = []; +for i = ever:and_ever + result = [ result, new_value() ]; +endfor +@end group +@end example + +Sometimes the number of items can't be computed in advance, and +stack-like operations are needed. When elements are being repeatedly +inserted at/removed from the end of an array, Octave detects it as stack +usage and attempts to use a smarter memory management strategy +pre-allocating the array in bigger chunks. Likewise works for cell and +struct arrays. + +@example +@group +a = []; +while (condition) + @dots{} + a(end+1) = value; # "push" operation + @dots{} + a(end) = []; # "pop" operation + @dots{} +endwhile +@end group +@end example + +@item +Avoid calling @code{eval} or @code{feval} excessively, because +they require Octave to parse input or look up the name of a function in +the symbol table. + +If you are using @code{eval} as an exception handling mechanism and not +because you need to execute some arbitrary text, use the @code{try} +statement instead. @xref{The @code{try} Statement}. + +@item +If you are calling lots of functions but none of them will need to +change during your run, set the variable +@code{ignore_function_time_stamp} to @code{"all"} so that Octave doesn't +waste a lot of time checking to see if you have updated your function +files. 
+@end itemize
+
+@node Examples
+@section Examples
+
+Here is a gallery of vectorization puzzles with solutions culled from
+the help mailing list and the machine learning class...
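The dimension-compatibility rule stated in the new chapter's Broadcasting section (corresponding dimensions must be equal or one of them must be 1, with missing trailing dimensions treated as 1) can be modeled in a few lines. A minimal pure-Python sketch of the shape rule only; the function name is illustrative, not part of Octave:

```python
def octave_broadcast_shape(shape_a, shape_b):
    """Result shape under Octave's broadcasting rule, or None if the
    operands are nonconformant.  Octave treats missing *trailing*
    dimensions as 1, so shorter shape tuples are padded on the right.
    """
    n = max(len(shape_a), len(shape_b))
    a = tuple(shape_a) + (1,) * (n - len(shape_a))
    b = tuple(shape_b) + (1,) * (n - len(shape_b))
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            return None  # dimensions differ and neither is 1
    return tuple(out)

# x + y from the chapter: a 3x3 matrix plus a 1x3 row vector
print(octave_broadcast_shape((3, 3), (1, 3)))   # (3, 3)
# y - y': a 1x3 row against a 3x1 column, both broadcast to 3x3
print(octave_broadcast_shape((1, 3), (3, 1)))   # (3, 3)
# the chapter's nonconformant 2x3 vs 2x2 case is an error (None here)
print(octave_broadcast_shape((2, 3), (2, 2)))   # None
```

The RGB-image example follows the same rule: an `[m n 3]` array against the permuted `[1 1 3]` vector yields `[m n 3]`.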
--- a/liboctave/bsxfun.h
+++ b/liboctave/bsxfun.h
@@ -42,7 +42,7 @@
     }

   (*current_liboctave_warning_with_id_handler)
-    ("Octave:auto-bsxfun", "%s: automatic broadcasting operation applied", name.c_str ());
+    ("Octave:broadcast", "%s: automatic broadcasting operation applied", name.c_str ());

   return true;
 }
@@ -69,7 +69,7 @@
     }

   (*current_liboctave_warning_with_id_handler)
-    ("Octave:auto-bsxfun", "%s: automatic broadcasting operation applied", name.c_str ());
+    ("Octave:broadcast", "%s: automatic broadcasting operation applied", name.c_str ());

   return true;
 }
--- a/scripts/general/accumdim.m
+++ b/scripts/general/accumdim.m
@@ -43,7 +43,7 @@
 ##
 ## An example of the use of @code{accumdim} is:
 ##
-## @smallexample
+## @example
 ## @group
 ## accumdim ([1, 2, 1, 2, 1], [ 7, -10,   4;
 ##                              -5, -12,   8;
@@ -52,7 +52,7 @@
 ##                              -5,  -3, -13])
 ## @result{} ans = [-10,-11,-1;-15,-3,5]
 ## @end group
-## @end smallexample
+## @end example
 ##
 ## @seealso{accumarray}
 ## @end deftypefn
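The accumdim example in this hunk is partially elided by the diff context, but the operation it illustrates — summing the rows of a matrix that share the same subscript — is easy to model. A hypothetical pure-Python sketch with made-up data (`accum_rows` is not an Octave function):

```python
def accum_rows(subs, rows):
    """Sum the rows of `rows` whose corresponding 1-based subscript in
    `subs` is the same, like accumdim accumulating along dimension 1.
    Pure-Python sketch; the real accumdim handles N-D arrays and lets
    you pick the dimension.
    """
    acc = {}
    for s, row in zip(subs, rows):
        total = acc.setdefault(s, [0] * len(row))
        for j, v in enumerate(row):
            total[j] += v
    return [acc[k] for k in sorted(acc)]

# rows 1 and 3 share subscript 1 and are summed; row 2 stands alone
print(accum_rows([1, 2, 1], [[1, 2], [10, 20], [3, 4]]))
# -> [[4, 6], [10, 20]]
```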
--- a/scripts/miscellaneous/warning_ids.m
+++ b/scripts/miscellaneous/warning_ids.m
@@ -120,6 +120,11 @@
 ## directory as the .oct file referred to by the autoload() command.
 ## By default, the @code{Octave:autoload-relative-file-name} warning is enabled.
 ##
+## @item Octave:broadcast
+## Warn when performing broadcasting operations.  By default, this is
+## enabled.  See the Broadcasting section in the Vectorization and Faster
+## Code Execution chapter in the manual.
+##
 ## @item Octave:built-in-variable-assignment
 ## By default, the @code{Octave:built-in-variable-assignment} warning is
 ## enabled.
--- a/src/DLD-FUNCTIONS/bsxfun.cc
+++ b/src/DLD-FUNCTIONS/bsxfun.cc
@@ -312,8 +312,8 @@
 DEFUN_DLD (bsxfun, args, ,
  "-*- texinfo -*-\n\
 @deftypefn {Loadable Function} {} bsxfun (@var{f}, @var{A}, @var{B})\n\
-The binary singleton expansion function applier does what its name\n\
-suggests: applies a binary function @var{f} element-by-element to two\n\
+The binary singleton expansion function applier performs broadcasting,\n\
+that is, applies a binary function @var{f} element-by-element to two\n\
 array arguments @var{A} and @var{B}, and expands as necessary\n\
 singleton dimensions in either input argument.  @var{f} is a function\n\
 handle, inline function, or string containing the name of the function\n\
--- a/src/DLD-FUNCTIONS/cellfun.cc
+++ b/src/DLD-FUNCTIONS/cellfun.cc
@@ -365,6 +365,21 @@
 @end group\n\
 @end example\n\
 \n\
+Use @code{cellfun} intelligently.  The @code{cellfun} function is a\n\
+useful tool for avoiding loops.  It is often used with anonymous\n\
+function handles; however, calling an anonymous function involves an\n\
+overhead quite comparable to the overhead of an m-file function.\n\
+Passing a handle to a built-in function is faster, because the\n\
+interpreter is not involved in the internal loop.  For example:\n\
+\n\
+@example\n\
+@group\n\
+a = @{@dots{}@}\n\
+v = cellfun (@@(x) det(x), a); # compute determinants\n\
+v = cellfun (@@det, a); # faster\n\
+@end group\n\
+@end example\n\
+\n\
 @seealso{arrayfun, structfun, spfun}\n\
 @end deftypefn")
 {
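The tip this hunk adds to the cellfun docstring — an anonymous wrapper around a built-in costs an extra interpreter call per element, while passing the built-in handle directly does not — has a direct analogue in other interpreted languages. A small Python sketch of the same idea (an analogy, not Octave code):

```python
data = [-3.0, 1.5, -2.25]

# like cellfun (@(x) det(x), a): an anonymous wrapper adds an extra
# call frame per element without changing the result
via_wrapper = list(map(lambda x: abs(x), data))

# like cellfun (@det, a): pass the built-in handle directly, so the
# interpreter is not re-entered for every element
via_builtin = list(map(abs, data))

assert via_wrapper == via_builtin == [3.0, 1.5, 2.25]
```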
--- a/src/version.h.in
+++ b/src/version.h.in
@@ -87,6 +87,8 @@
 #define OCTAVE_STARTUP_MESSAGE \
   X_OCTAVE_NAME_VERSION_COPYRIGHT_COPYING_WARRANTY_AND_BUGS \
     ("  For details, type `warranty'.") "\n\n" \
-  "For information about changes from previous versions, type `news'."
-
+  "For information about changes from previous versions, type `news'.\n\n\
+** WARNING ** Semantics for arithmetic operators have changed and now\n\
+              broadcast.  Type `doc broadcast' to get a description of\n\
+              the changes."
 #endif