Theano: A graph computation framework - Notes and Tips

Although Theano is no longer under active development, it is still worth learning because many prominent papers use it. Besides, there is a lot to learn from its implementation.

Theano is basically a framework for building symbolic computation graphs. On top of that, it can compute gradients automatically.

Learning Materials


sudo pip install theano

To fix the inconsistency between float32 and float64, set this in .theanorc:

[global]
floatX = float32

Getting Started

There are several conventions:

import theano.tensor as T
from theano import function
  • T stands for tensor; the module contains the tensor operations.
  • function compiles a callable from the given inputs, outputs and other properties.
    • function([input], output): the inputs are always passed as a list, even when there is only one argument.



x = T.dscalar('x')
y = T.dscalar('y')
z = x + y
f = function([x, y], z)

To create a scalar variable, use T.<type>scalar('name_variable'), where <type> is a one-letter prefix for the type of the variable. The prefixes b, i, f, d, c are used for byte, integer, float, double and complex, respectively. By the way, there are 7 primitive types in Theano: byte (b), 16-bit integer (w), 32-bit integer (i), 64-bit integer (l), float (f), double (d) and complex (c).

Use pp to pretty-print a Theano symbolic variable.


The function construct seems very important, so let's break it down and see how to manipulate it. Beyond the basic arguments, inputs and outputs, there are two fancier ones:

  • givens: pairs (Var1, Var2); the compiled function substitutes Var2 for Var1 in the graph.
  • updates: update rules, given as pairs (shared variable, new expression) that are applied after each call.

To break the function down or debug it, pydotprint renders its computation graph as an image. This is by far the most intuitive way to examine the function.

Data Management

Must-read material: Understanding Memory Aliasing for Speed and Correctness. Some takeaways:

  • Use borrow=True when creating new shared variables.
  • Use borrow=False when retrieving the value of a shared variable; this is also the default value of borrow in get_value.

Tensor Operations

Nondifferentiable functions

First, let's take a look at the sgn function. Its gradient is zero. This means that if we put the sign function into a network, which is especially common in hashing problems, none of the components before it can update their weights. Therefore, if the loss function is that of an autoencoder, namely:

$$ L = \left\lVert f(sgn(g(X))) - X \right\rVert $$

where $g$ and $f$ are the encoder and decoder, respectively, we could not learn the encoder at all.

