Although Theano is no longer under active development, it is still worth learning because many prominent papers use it. Besides, there is a lot to learn from its implementation.
Theano is basically a library for building symbolic computation graphs. On top of that, it can compute gradients automatically.
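To make the idea of a symbolic graph with automatic gradients concrete, here is a minimal sketch in plain Python. It is a toy model written for these notes, not Theano's implementation: the names Node, var, const, evaluate and grad are all hypothetical, and only addition and multiplication are supported.

```python
class Node:
    """One node of a tiny symbolic expression graph (toy model, not a Theano class)."""

    def __init__(self, op, inputs=(), name=None, value=None):
        self.op = op          # "var", "const", "add" or "mul"
        self.inputs = inputs  # child nodes for "add" and "mul"
        self.name = name      # used by "var" nodes
        self.value = value    # used by "const" nodes

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def var(name):
    return Node("var", name=name)


def const(value):
    return Node("const", value=value)


def evaluate(node, env):
    """Compute the numeric value of a graph given variable values in env."""
    if node.op == "var":
        return env[node.name]
    if node.op == "const":
        return node.value
    left, right = (evaluate(n, env) for n in node.inputs)
    return left + right if node.op == "add" else left * right


def grad(node, wrt):
    """Build a new graph for d(node)/d(wrt) with the sum and product rules."""
    if node.op == "var":
        return const(1.0 if node is wrt else 0.0)
    if node.op == "const":
        return const(0.0)
    left, right = node.inputs
    if node.op == "add":
        return grad(left, wrt) + grad(right, wrt)
    return grad(left, wrt) * right + left * grad(right, wrt)


x = var("x")
y = var("y")
z = x * x + x * y        # nothing is computed here; z is only a graph
dz_dx = grad(z, x)       # another graph, equivalent to 2x + y

value = evaluate(z, {"x": 3.0, "y": 2.0})      # 3*3 + 3*2 = 15.0
slope = evaluate(dz_dx, {"x": 3.0, "y": 2.0})  # 2*3 + 2 = 8.0
```

The point of the sketch is the separation between building the graph and evaluating it, which is the same split Theano makes between defining symbolic variables and calling a compiled function.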
sudo pip install theano
To fix the inconsistency between float types, set floatX in the Theano configuration:
[global]
floatX = float32
There are several conventions:
import theano.tensor as T
from theano import function
T stands for tensor; it contains the tensor operations.
function constructs a callable function from the given inputs, outputs, and other properties.
- function([input], output): the input is always a list, even when there is only one argument.
x = T.dscalar('x')
y = T.dscalar('y')
z = x + y
f = function([x, y], z)
To create a scalar variable, use T.<type>scalar, where <type> stands for the type of the variable. The prefixes b, i, f, d, c are used for byte, integer, float, double, and complex, respectively. By the way, there are 7 primitive types in Theano: byte (b), 16-bit integer (w), 32-bit integer (i), 64-bit integer (l), float (f), double (d), and complex (c).
Use pp to pretty-print a Theano symbolic variable.
This damn thing (function) seems very important, so let's break it down and see how to manipulate it. Beyond the basic arguments (inputs and outputs), there are two fancier ones:
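givens: pairs of (Var1, Var2); when the function is compiled, Var1 is replaced by Var2 in the graph.
updates: update rules, given as (shared variable, new expression) pairs that are applied after each call.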
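The semantics of these two arguments can be sketched in plain Python. This is a toy model, not Theano's API: make_function is a hypothetical helper, and expressions are modeled as callables that read values from a dict. It only illustrates the order of operations: givens substitutions happen first, the output is computed from the old shared values, and updates are applied afterwards.

```python
def make_function(inputs, output, shared, givens=None, updates=None):
    """Toy stand-in for a compiled function (hypothetical helper, not Theano's API).

    Expressions are plain callables that read variable values from a dict.
    """
    givens = givens or {}
    updates = updates or {}

    def call(*args):
        env = dict(shared)                 # current values of the shared state
        env.update(zip(inputs, args))      # bind the positional inputs
        for name, expr in givens.items():  # givens: replace Var1 (name) by Var2 (expr)
            env[name] = expr(env)
        result = output(env)               # the output sees the OLD shared values
        new_values = {name: expr(env) for name, expr in updates.items()}
        shared.update(new_values)          # updates: applied after the output is computed
        return result

    return call


# updates: an accumulator whose state grows by inc on every call
state = {"total": 0}
accumulate = make_function(
    inputs=["inc"],
    output=lambda env: env["total"],
    shared=state,
    updates={"total": lambda env: env["total"] + env["inc"]},
)
first = accumulate(1)    # returns the old total, 0; total becomes 1
second = accumulate(10)  # returns 1; total becomes 11

# givens: the output is written in terms of y, but y is substituted by 2 * x
doubled_plus_one = make_function(
    inputs=["x"],
    output=lambda env: env["y"] + 1,
    shared={},
    givens={"y": lambda env: 2 * env["x"]},
)
result = doubled_plus_one(3)  # 2 * 3 + 1 = 7
```

In real Theano code the same pattern would use shared variables and symbolic expressions instead of dicts and lambdas.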
To break the function down or debug it, use pydotprint, which visualizes the function as a graph. By far, this is the most intuitive way to examine the function.
Must-read material: Understanding Memory Aliasing for Speed and Correctness. Here are some takeaway notes:
Use borrow=True when creating new shared variables.
Use borrow=False when retrieving the values of a TensorVariable; this is also the default.
First, let's take a look at the sgn function. Its gradient is zero everywhere it is defined. This means that if we put the sign function, which is especially common in hashing problems, into a network, none of the components before it can update their weights. Therefore, if the loss is an autoencoder reconstruction loss, namely:
$$ L = \left\lVert f(sgn(g(X))) - X \right\rVert $$
where $g$ and $f$ are the encoder and decoder, respectively, the encoder cannot be learned at all.
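The zero-gradient argument can be checked numerically with a minimal sketch in plain Python (not Theano, so it runs without the library). Everything here is an assumption made for illustration: a 1-D encoder g(x) = w * x, a decoder f(h) = 0.5 * h, and a gradient estimated by central differences.

```python
def sgn(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def loss_with_sgn(w, x=0.7):
    # hypothetical encoder g(x) = w * x, binarized by sgn, then decoder f(h) = 0.5 * h
    return abs(0.5 * sgn(w * x) - x)


def loss_without_sgn(w, x=0.7):
    # the same toy autoencoder with the sgn removed, for comparison
    return abs(0.5 * (w * x) - x)


def numeric_grad(fn, w, eps=1e-4):
    """Central-difference estimate of d fn / d w."""
    return (fn(w + eps) - fn(w - eps)) / (2 * eps)


# With sgn, a small change in w does not change the code, so the loss is flat.
grad_with_sgn = numeric_grad(loss_with_sgn, 1.3)        # 0.0
# Without sgn, the encoder weight receives a non-zero signal (about -0.35).
grad_without_sgn = numeric_grad(loss_without_sgn, 1.3)
print(grad_with_sgn, grad_without_sgn)
```

The encoder weight w only gets a learning signal in the second case, which is exactly the problem described above.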