Exercise setup

import tensorflow as tf

A = tf.reshape(tf.range(20, dtype=tf.float32), (5,4))
A
<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [16., 17., 18., 19.]], dtype=float32)>

Problems and answers

Prove that the transpose of a matrix $A$'s transpose is $A$: $(A^T)^T = A$.

This is straightforward: transposing converts rows to columns and vice versa, i.e. $(A^T)_{ij} = A_{ji}$. Doing it twice gives $((A^T)^T)_{ij} = (A^T)_{ji} = A_{ij}$, so we end up with what we started with.

At = tf.transpose(A)
At
<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 0.,  4.,  8., 12., 16.],
       [ 1.,  5.,  9., 13., 17.],
       [ 2.,  6., 10., 14., 18.],
       [ 3.,  7., 11., 15., 19.]], dtype=float32)>
At_t = tf.transpose(At)
At_t
<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [16., 17., 18., 19.]], dtype=float32)>
At_t == A
<tf.Tensor: shape=(5, 4), dtype=bool, numpy=
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])>

Show that the sum of transposes is equal to the transpose of a sum: $A^T+B^T=(A+B)^T$.

Let's consider a second matrix, $B$

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[ 7.975751 ,  7.70928  ,  8.388272 , 11.001523 ],
       [ 7.6716766, 11.476339 ,  6.2204466,  5.6182394],
       [ 9.765643 ,  6.7869806,  8.873018 ,  5.6852665],
       [ 8.200825 ,  4.9842663, 11.172729 , 11.063158 ],
       [ 8.75681  ,  8.760315 ,  4.151512 ,  5.0749035]], dtype=float32)>

The transpose would be

<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 7.975751 ,  7.6716766,  9.765643 ,  8.200825 ,  8.75681  ],
       [ 7.70928  , 11.476339 ,  6.7869806,  4.9842663,  8.760315 ],
       [ 8.388272 ,  6.2204466,  8.873018 , 11.172729 ,  4.151512 ],
       [11.001523 ,  5.6182394,  5.6852665, 11.063158 ,  5.0749035]],
      dtype=float32)>

$A^T+B^T$

The sum of the transposed matrices would be

<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 7.975751, 11.671677, 17.765644, 20.200825, 24.75681 ],
       [ 8.70928 , 16.47634 , 15.786981, 17.984266, 25.760315],
       [10.388272, 12.220447, 18.873018, 25.17273 , 22.151512],
       [14.001523, 12.618239, 16.685266, 26.063158, 24.074903]],
      dtype=float32)>

$(A+B)^T$

The transposed sum would be

<tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 7.975751, 11.671677, 17.765644, 20.200825, 24.75681 ],
       [ 8.70928 , 16.47634 , 15.786981, 17.984266, 25.760315],
       [10.388272, 12.220447, 18.873018, 25.17273 , 22.151512],
       [14.001523, 12.618239, 16.685266, 26.063158, 24.074903]],
      dtype=float32)>

$A^T+B^T == (A+B)^T$ ?

<tf.Tensor: shape=(4, 5), dtype=bool, numpy=
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])>

All the entries are equal, as we can see by comparing the results element-wise.
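The code cells that generated $B$ and the comparisons above are collapsed, so here is a minimal sketch of the whole check; the call to tf.random.uniform is an assumption, since the original cell that created $B$ is not shown.

B = tf.random.uniform((5, 4), minval=4., maxval=12.)  # assumed stand-in for the hidden cell that created B
tf.reduce_all(tf.transpose(A) + tf.transpose(B) == tf.transpose(A + B))  # expected: True for any A and B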

Given any square matrix $A$ , is $A+A^T$ always symmetric? Why?

Let's define a square matrix

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]], dtype=float32)>

Its transpose would be

At = tf.transpose(A)
At
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.,  4.,  8., 12.],
       [ 1.,  5.,  9., 13.],
       [ 2.,  6., 10., 14.],
       [ 3.,  7., 11., 15.]], dtype=float32)>

The sum $A + A^T$ would be

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.,  5., 10., 15.],
       [ 5., 10., 15., 20.],
       [10., 15., 20., 25.],
       [15., 20., 25., 30.]], dtype=float32)>

Let's check whether $A + A^T$ equals its own transpose

<tf.Tensor: shape=(4, 4), dtype=bool, numpy=
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])>

Yes, $A + A^T$ is always symmetric. The $(i, j)$ entry of $A + A^T$ is $a_{ij} + a_{ji}$, and since addition is commutative, this equals $a_{ji} + a_{ij}$, which is the $(j, i)$ entry. So the matrix is unchanged by transposition; equivalently, $(A + A^T)^T = (A^T)^T + A^T = A + A^T$.
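To check that this is not an artifact of the particular $A$ used above, here is a minimal sketch with a randomly generated square matrix (the matrix is just an illustrative stand-in):

M = tf.random.uniform((4, 4))        # any square matrix
S = M + tf.transpose(M)
tf.reduce_all(S == tf.transpose(S))  # expected: True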

What is the output of len(X) for a tensor $X$ of shape (2, 3, 4)? Does len(X) always correspond to the length of a certain axis of $X$? If so, which axis?

Let's consider the following as $X$

<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]], dtype=int32)>
len(X)
2

We can see that len returns the size of the first axis; let's see whether it does the same for other, arbitrarily shaped tensors.

<tf.Tensor: shape=(8, 1, 3), dtype=float32, numpy=
array([[[11.439855 ,  5.880226 ,  5.9797716]],

       [[11.37106  ,  5.619686 ,  5.9706793]],

       [[ 6.4085245, 11.867535 ,  4.3086786]],

       [[ 7.1461754,  8.795105 ,  8.864346 ]],

       [[ 9.952526 ,  7.6806755,  7.7797728]],

       [[10.933958 , 11.748696 ,  6.464444 ]],

       [[ 5.296891 ,  6.7806816,  4.316203 ]],

       [[ 8.316187 ,  7.272793 ,  9.020613 ]]], dtype=float32)>
len(X)
8
<tf.Tensor: shape=(1, 2, 3, 9), dtype=float32, numpy=
array([[[[ 5.9411087,  6.242239 ,  4.4269447,  7.913884 ,  7.8960876,
           7.511854 ,  6.3407526, 11.290615 ,  4.5310717],
         [ 7.182088 ,  5.086608 ,  4.0900164,  4.7155457,  8.863187 ,
           4.1158237, 10.514992 ,  9.662274 ,  8.8960705],
         [11.142818 ,  6.125886 ,  9.6489105,  7.8091097,  9.66531  ,
           9.282991 ,  8.218669 , 11.877634 ,  8.727693 ]],

        [[ 4.3840303,  8.792656 ,  9.48595  ,  9.231619 ,  5.972165 ,
          11.478173 , 10.220118 , 10.394747 ,  4.430291 ],
         [ 7.198678 ,  7.2096577,  5.8975067,  4.6933975,  6.6245346,
          11.958464 , 10.320432 , 11.609855 ,  7.1605587],
         [ 6.389407 ,  5.9069185,  7.974592 ,  5.289855 ,  5.713969 ,
           6.6944523,  4.1094055,  4.077242 ,  8.026564 ]]]],
      dtype=float32)>
len(X)
1

No matter what the shape of the tensor is, len always returns the length of the first (outermost) axis, i.e. X.shape[0].
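A quick sketch to confirm this across a few shapes (tf.zeros is just a placeholder used to build tensors of the given shapes):

for shape in [(2, 3, 4), (8, 1, 3), (1, 2, 3, 9)]:
    T = tf.zeros(shape)
    assert len(T) == T.shape[0]  # len picks the first axis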

What happens when we divide $A$ by the sum along its second axis, i.e. A / A.sum(axis=1)?

Let's define $A$

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [16., 17., 18., 19.]], dtype=float32)>
A / tf.reduce_sum(A, axis=1)
# This produces an error

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-4-0f11f2009278> in <module>
      1 #collapse
----> 2 A / tf.reduce_sum(A, axis=1)
      3 # This produces an error

~\miniconda3\envs\tensor_env\lib\site-packages\tensorflow\python\ops\math_ops.py in binary_op_wrapper(x, y)
   1123     with ops.name_scope(None, op_name, [x, y]) as name:
   1124       try:
-> 1125         return func(x, y, name=name)
   1126       except (TypeError, ValueError) as e:
   1127         # Even if dispatching the op failed, the RHS may be a tensor aware

~\miniconda3\envs\tensor_env\lib\site-packages\tensorflow\python\util\dispatch.py in wrapper(*args, **kwargs)
    199     """Call target, and fall back on dispatchers if there is a TypeError."""
    200     try:
--> 201       return target(*args, **kwargs)
    202     except (TypeError, ValueError):
    203       # Note: convert_to_eager_tensor currently raises a ValueError, not a

~\miniconda3\envs\tensor_env\lib\site-packages\tensorflow\python\ops\math_ops.py in truediv(x, y, name)
   1295     TypeError: If `x` and `y` have different dtypes.
   1296   """
-> 1297   return _truediv_python3(x, y, name)
   1298 
   1299 

~\miniconda3\envs\tensor_env\lib\site-packages\tensorflow\python\ops\math_ops.py in _truediv_python3(x, y, name)
   1234       x = cast(x, dtype)
   1235       y = cast(y, dtype)
-> 1236     return gen_math_ops.real_div(x, y, name=name)
   1237 
   1238 

~\miniconda3\envs\tensor_env\lib\site-packages\tensorflow\python\ops\gen_math_ops.py in real_div(x, y, name)
   7440       return _result
   7441     except _core._NotOkStatusException as e:
-> 7442       _ops.raise_from_not_ok_status(e, name)
   7443     except _core._FallbackException:
   7444       pass

~\miniconda3\envs\tensor_env\lib\site-packages\tensorflow\python\framework\ops.py in raise_from_not_ok_status(e, name)
   6841   message = e.message + (" name: " + name if name is not None else "")
   6842   # pylint: disable=protected-access
-> 6843   six.raise_from(core._status_to_exception(e.code, message), None)
   6844   # pylint: enable=protected-access
   6845 

~\miniconda3\envs\tensor_env\lib\site-packages\six.py in raise_from(value, from_value)

InvalidArgumentError: Incompatible shapes: [5,4] vs. [5] [Op:RealDiv]

OK, the shapes are incompatible: $A$ has shape (5, 4) while the sum tensor has shape (5,). Let's look at the sums along each axis.

tf.reduce_sum(A, axis=1)
<tf.Tensor: shape=(5,), dtype=float32, numpy=array([ 6., 22., 38., 54., 70.], dtype=float32)>
tf.reduce_sum(A, axis=0)
<tf.Tensor: shape=(4,), dtype=float32, numpy=array([40., 45., 50., 55.], dtype=float32)>

So, when we sum a tensor along a particular axis, that axis is removed and the resulting tensor keeps the remaining axes. For example, a tensor of shape (5, 4, 3) summed along the third axis (axis=2) yields a tensor of shape (5, 4).

The shapes obtained by summing along the other axes follow the same pattern (a quick check follows the list below).

(5, 4, 3)

axis = 0 : (4, 3)
axis = 1 : (5, 3)
axis = 2 : (5, 4)
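Here is a minimal sketch confirming those shapes (tf.zeros is just a placeholder used to build a (5, 4, 3) tensor):

T = tf.zeros((5, 4, 3))
[tf.reduce_sum(T, axis=i).shape for i in range(3)]
# shapes: (4, 3), (5, 3), (5, 4)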

If we instead divide by the sum along axis 0, the division works, since the shapes satisfy the broadcasting rules:

A           - 5 x 4
sum(axis=0) -     4
result      - 5 x 4

The following is the result of that division, A / tf.reduce_sum(A, axis=0):

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[0.        , 0.02222222, 0.04      , 0.05454545],
       [0.1       , 0.11111111, 0.12      , 0.12727273],
       [0.2       , 0.2       , 0.2       , 0.2       ],
       [0.3       , 0.2888889 , 0.28      , 0.27272728],
       [0.4       , 0.37777779, 0.36      , 0.34545454]], dtype=float32)>

A fellow learner suggested reframing the question, and said that the following code is probably what the question's author intended:

A / tf.reshape(tf.reduce_sum(A,axis=1),(-1,1))
<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[0.        , 0.16666667, 0.33333334, 0.5       ],
       [0.18181819, 0.22727273, 0.27272728, 0.3181818 ],
       [0.21052632, 0.23684211, 0.2631579 , 0.28947368],
       [0.22222222, 0.24074075, 0.25925925, 0.2777778 ],
       [0.22857143, 0.24285714, 0.25714287, 0.27142859]], dtype=float32)>

Let's see the shapes and values of the numerator and denominator of the above

Numerator:  (5, 4)
Denominator: (5, 1)
tf.reshape(tf.reduce_sum(A,axis=1),(-1,1)), A
(<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
 array([[ 6.],
        [22.],
        [38.],
        [54.],
        [70.]], dtype=float32)>,
 <tf.Tensor: shape=(5, 4), dtype=float32, numpy=
 array([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]], dtype=float32)>)

So, comparing the values, each entry of A has been divided by the sum of its own row.
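A slightly more direct way to get the same row-wise normalization, if you'd rather avoid the explicit reshape, is the keepdims argument of tf.reduce_sum, which keeps the reduced axis with size 1 so broadcasting works:

A / tf.reduce_sum(A, axis=1, keepdims=True)  # same result as the reshape version above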

When traveling between two points in Manhattan, what is the distance that you need to cover in terms of the coordinates, i.e., in terms of avenues and streets? Can you travel diagonally?

A user suggested looking at the streets of Manhattan in Google Maps to understand the question better. As suggested, the map looks like a big piece of land cut into many small rectangular blocks, very much like a 2D coordinate space, where each street intersection can be treated as a point in that space.

Let's look at a sample screenshot of Manhattan's streets from Google Maps.

The question asks us to find the distance we need to cover to go from one point to another, measured in streets and avenues.
It looks like the Manhattan distance formula got its name for a reason!

The Manhattan distance is the distance between two points measured along axes at right angles: in a plane with $p_1$ at $(x_1, y_1)$ and $p_2$ at $(x_2, y_2)$, it is $|x_1 - x_2| + |y_1 - y_2|$ (referred from here). There is also a special geometry, called taxicab geometry, that uses the Manhattan distance as its metric; there are some good images and content to understand it better here.

If we need to measure the distance in terms of streets and avenues, we treat them as the two dimensions, (x = street, y = avenue). Say we are at the metro stop on 72nd Street and need to go to the Jones Wood Foundry on 76th Street.

Let's define them as coordinates,
72nd street, 2nd Avenue -> $(72, 2)$
76th street, 1st Avenue -> $(76, 1)$

So according to the formula, the distance is $|72 - 76| + |2 - 1| = 4 + 1 = 5$ blocks, i.e. 4 streets and 1 avenue, and we can't travel diagonally unless we are flying.
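As a small illustration, a hypothetical helper (the function name is my own) that computes the Manhattan distance between two such points:

def manhattan_distance(p1, p2):
    # sum of absolute coordinate differences: |x1 - x2| + |y1 - y2|
    return sum(abs(a - b) for a, b in zip(p1, p2))

manhattan_distance((72, 2), (76, 1))  # -> 5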

What are the shapes of the summation outputs for a tensor with shape (2, 3, 4) along axes 0, 1, and 2?

I think I already answered this in the question about A / A.sum(axis=1); the same rule applies to any arbitrary shape. For this one it turns out to be the following (verified in the sketch after the list):

(2, 3, 4)

axis = 0 : (3, 4)
axis = 1 : (2, 4)
axis = 2 : (2, 3)
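A minimal sketch that computes the actual summation outputs, using tf.range(24) as a convenient stand-in for a (2, 3, 4) tensor:

X = tf.reshape(tf.range(24), (2, 3, 4))
tf.reduce_sum(X, axis=0)  # shape (3, 4)
tf.reduce_sum(X, axis=1)  # shape (2, 4)
tf.reduce_sum(X, axis=2)  # shape (2, 3)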

Feed a tensor with 3 or more axes to the linalg.norm.

What does this function compute for tensors of arbitrary shape?

Let's take a 3-d tensor

<tf.Tensor: shape=(2, 2, 5), dtype=float32, numpy=
array([[[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.]],

       [[10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19.]]], dtype=float32)>
tf.norm(X)
<tf.Tensor: shape=(), dtype=float32, numpy=49.699093>

Let's take an arbitrarily shaped tensor

<tf.Tensor: shape=(8, 1, 2, 5), dtype=float32, numpy=
array([[[[ 0.,  1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.,  9.]]],


       [[[10., 11., 12., 13., 14.],
         [15., 16., 17., 18., 19.]]],


       [[[20., 21., 22., 23., 24.],
         [25., 26., 27., 28., 29.]]],


       [[[30., 31., 32., 33., 34.],
         [35., 36., 37., 38., 39.]]],


       [[[40., 41., 42., 43., 44.],
         [45., 46., 47., 48., 49.]]],


       [[[50., 51., 52., 53., 54.],
         [55., 56., 57., 58., 59.]]],


       [[[60., 61., 62., 63., 64.],
         [65., 66., 67., 68., 69.]]],


       [[[70., 71., 72., 73., 74.],
         [75., 76., 77., 78., 79.]]]], dtype=float32)>
tf.norm(X)
<tf.Tensor: shape=(), dtype=float32, numpy=409.2432>

tf.norm still calculates the square root of the sum of squares of all the numbers in the tensor (i.e. the L2 norm of the flattened tensor), equivalent to the following

tf.sqrt(
    float(sum(
        [x*x for x in range(80)]
        ))
    )
<tf.Tensor: shape=(), dtype=float32, numpy=409.2432>
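For completeness, the same calculation expressed with TensorFlow ops instead of a Python loop; a minimal sketch using the last X from above:

tf.sqrt(tf.reduce_sum(X * X))  # same value as tf.norm(X) above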

I am not sure whether any change of behaviour was expected here, so I should try mxnet to see whether its calculation of the L2 norm differs from the above.