Linear Algebra - d2l.ai Exercises - Part 3
The third notebook in a series aiming to solve and understand the exercises from the d2l.ai deep learning curriculum.
- Exercise setup
- Problems and answers
- Prove that the transpose of a matrix $A$'s transpose is $A$: $(A^T)^T = A$.
- Show that the sum of transposes is equal to the transpose of a sum: $A^T+B^T=(A+B)^T$.
- Given any square matrix $A$ , is $A+A^T$ always symmetric? Why?
- What is the output of len(X) for a tensor $X$ of shape (2, 3, 4)? Does len(X) always correspond to the length of a certain axis of $X$? What is that axis?
- What happens when we divide $A$ by its sum along the second axis, i.e., A / A.sum(axis=1)?
- When traveling between two points in Manhattan, what is the distance that you need to cover in terms of the coordinates, i.e., in terms of avenues and streets? Can you travel diagonally?
- The summation outputs for tensor with shape (2, 3, 4) along axis 0, 1, and 2.
- Feed a tensor with 3 or more axes to linalg.norm.
- What does this function compute for tensors of arbitrary shape?
import tensorflow as tf
A = tf.reshape(tf.range(20, dtype=tf.float32), (5,4))
A
This is straightforward: transposing basically converts rows to columns and vice versa, so when done twice we end up with what we started with.
At = tf.transpose(A)
At
At_t = tf.transpose(At)
At_t
At_t == A
Let's consider a second matrix, $B$
The transpose would be
$B^T$
The sum of the transposed matrices would be
$A^T+B^T$
The transposed sum would be
$(A+B)^T$
Is $A^T+B^T == (A+B)^T$ ?
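A minimal sketch verifying this numerically; the matrix B below is a hypothetical example, and any matrix with the same shape as A would work:

```python
import tensorflow as tf

# A as in the notebook; B is an arbitrary example matrix of the same shape.
A = tf.reshape(tf.range(20, dtype=tf.float32), (5, 4))
B = tf.ones((5, 4), dtype=tf.float32)

# Sum of the transposes vs. transpose of the sum.
lhs = tf.transpose(A) + tf.transpose(B)
rhs = tf.transpose(A + B)

print(bool(tf.reduce_all(lhs == rhs)))  # True
```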
All the entries are equal, as we can see by comparing the results.
Let's define a square matrix
The transpose of the same would be
At = tf.transpose(A)
At
The sum of the tensors
Let's see if the condition stands
Yes, $A+A^T$ is always symmetric. Taking the transpose of the sum gives $(A+A^T)^T = A^T + (A^T)^T = A^T + A$, and since addition is commutative (i.e., $A+B = B+A$), this equals $A+A^T$ itself, so the matrix is unchanged by transposition.
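This can also be checked numerically; the square matrix below is a hypothetical example:

```python
import tensorflow as tf

# A hypothetical square matrix (any square matrix works).
A = tf.constant([[1., 2., 3.],
                 [4., 5., 6.],
                 [7., 8., 9.]])
S = A + tf.transpose(A)

# S is symmetric iff it equals its own transpose.
print(bool(tf.reduce_all(S == tf.transpose(S))))  # True
```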
Let's consider the following as $X$
len(X)
We can see that len returns the size of the first axis; let us see if it does the same for other arbitrary tensors.
len(X)
len(X)
No matter what the shape of the tensor, len always picks the first/outermost axis.
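A quick sketch over a few hypothetical shapes, confirming that len(X) always equals the size of axis 0:

```python
import tensorflow as tf

# len() on tensors of different ranks always reports the first axis.
for shape in [(2, 3, 4), (7,), (3, 5)]:
    X = tf.zeros(shape)
    print(len(X) == shape[0])  # True for every shape
```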
Let's define $A$
A / tf.reduce_sum(A, axis=1)
# This produces an error
Ok, there seems to be a shape inconsistency with the resulting sum tensor. Let's look at the sum outputs along each axis of the tensor.
tf.reduce_sum(A, axis=1)
tf.reduce_sum(A, axis=0)
When we sum a tensor along a particular axis, that axis is removed and the resulting tensor takes the shape of the remaining axes. For example, a tensor with shape (5, 4, 3), when summed along the third axis (axis=2), yields a tensor of shape (5, 4). The shapes for sums along the other axes follow the same rule.
(5, 4, 3)
axis = 0 : (4, 3)
axis = 1 : (5, 3)
axis = 2 : (5, 4)
When we instead divide by the tensor summed along axis 0, the division works because the shapes follow the broadcasting rules:
A - 5 × 4
summed_A - 4
result - 5 × 4
The following is the result of the division
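As a sketch, assuming the same 5 × 4 matrix A from earlier, the broadcasted division and its shape:

```python
import tensorflow as tf

A = tf.reshape(tf.range(20, dtype=tf.float32), (5, 4))

# (5, 4) / (4,): the row vector of column sums broadcasts across rows,
# so each entry is divided by its column's sum.
result = A / tf.reduce_sum(A, axis=0)
print(result.shape)  # (5, 4)

# As a consequence, each column of the result sums to 1.
print(tf.reduce_sum(result, axis=0))
```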
A fellow learner suggested reframing the question here, saying that the following code is what the question's author probably expected:
A / tf.reshape(tf.reduce_sum(A,axis=1),(-1,1))
Let's see the shapes and values of the numerator and denominator of the above
tf.reshape(tf.reduce_sum(A,axis=1),(-1,1)), A
So if we compare the values, each value in A has been divided by the sum of its row.
A user suggested looking at the streets of Manhattan in Google Maps to understand the question better. As suggested, the map looks like a big piece of land cut into many small rectangular blocks, much like a 2-D coordinate space, where each street intersection can be treated as a point in that space.
Let's look at a sample screenshot from Google Maps of Manhattan's streets.
The question asks us to find the distance that we need to cover if we need to go from one point to another in terms of streets and avenues.
Looks like there is a formula called Manhattan distance for a reason!
The Manhattan distance is the distance between two points measured along axes at right angles: in a plane with $p_1$ at $(x_1, y_1)$ and $p_2$ at $(x_2, y_2)$, it is $|x_1 - x_2| + |y_1 - y_2|$ (referred from here). There is also a special geometry called taxicab geometry, with the Manhattan distance as its metric; there are some good images and content to understand it better here.
If we need to measure the distance in terms of streets and avenues, we can treat them as dimensions (x = streets, y = avenues). Let us say we are at the metro stop on 72nd Street and need to get to the Jones Wood Foundry on 76th Street.
Let's define them as coordinates,
72nd street, 2nd Avenue -> $(72, 2)$
76th street, 1st Avenue -> $(76, 1)$
So according to the formula, the distance would be $|76-72| + |1-2| = 4 + 1 = 5$ blocks, i.e., 4 streets and 1 avenue. And no, we can't travel diagonally unless we are flying.
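The arithmetic as a tiny sketch, using the coordinates assumed above:

```python
# (street, avenue) coordinates assumed above for the two points.
p1 = (72, 2)  # metro stop: 72nd Street, 2nd Avenue
p2 = (76, 1)  # Jones Wood Foundry: 76th Street, 1st Avenue

# Manhattan (taxicab) distance: |x1 - x2| + |y1 - y2|.
distance = abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])
print(distance)  # 5 blocks: 4 streets + 1 avenue
```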
I think I have already answered this in the question about A / A.sum(axis=1); the same rule applies to any arbitrary shape. For this tensor it would turn out to be the following:
(2, 3, 4)
axis = 0 : (3, 4)
axis = 1 : (2, 4)
axis = 2 : (2, 3)
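A quick sketch verifying the shapes above (the tensor contents don't matter here, so zeros are used):

```python
import tensorflow as tf

# Hypothetical tensor with the (2, 3, 4) shape from the exercise.
X = tf.zeros((2, 3, 4))
print(tf.reduce_sum(X, axis=0).shape)  # (3, 4)
print(tf.reduce_sum(X, axis=1).shape)  # (2, 4)
print(tf.reduce_sum(X, axis=2).shape)  # (2, 3)
```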
Let's take a 3-d tensor
tf.norm(X)
Let's take an arbitrary shaped tensor
tf.norm(X)
tf.norm
still calculates the square root of the sum of squares of all the numbers in the tensor, equivalent to the following
tf.sqrt(
float(sum(
[x*x for x in range(80)]
))
)
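A small check of that claim, assuming a hypothetical (2, 3, 4) tensor: with default arguments, tf.norm flattens the input and returns the l2 norm of all its entries.

```python
import tensorflow as tf

# Hypothetical 3-axis tensor.
X = tf.reshape(tf.range(24, dtype=tf.float32), (2, 3, 4))

# Manual l2 norm over the flattened entries.
flat = tf.reshape(X, [-1])
manual = tf.sqrt(tf.reduce_sum(flat * flat))

print(bool(abs(tf.norm(X) - manual) < 1e-3))  # True
```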
I am not sure if any change in behaviour was expected here, so I should try mxnet to see if its calculation of the l2 norm differs.