1. PyTorch find minimum of a custom function with an optimiser (Adam)

March 2022

2. Binary Cross-Entropy vs Mean Squared Error

March 2022

3. Binary heap tree proof: index of left child is doubled index of parent + 1

January 2022

4. Regex-based tokenizer

January 2022

# Content

## PyTorch find minimum of a custom function with an optimiser (Adam)

I caught myself thinking that most PyTorch tutorials are about neural networks, while PyTorch is actually a fairly general optimisation framework. There’s a tutorial on how to use autograd, but using autograd is still not the same as using an existing high-quality optimiser such as Adam, Adagrad, etc.

So I decided to start with a minimal example and find the minimum of x^2 + 1. Strangely, I did not find many tutorials and got stuck on that simple problem. Conor Mc wrote an article, but it uses a custom class based on nn.Module. There was also an article by Bijay Kumar, yet it used an nn.Linear layer! 🙂 So it took me some time to figure out a working solution, and here it is:

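```python
from matplotlib.pyplot import plot, show
from torch import Tensor
from torch.nn import Parameter
from torch.optim import Adam

# The value being optimised, starting far from the minimum at x = 0
X = Parameter(Tensor([10]))
# The optimiser updates X in place; lr=1 is an assumed learning rate
opt = Adam([X], lr=1)

losses = []
for i_step in range(10):
    opt.zero_grad()  # clear the gradient left over from the previous step
    y = X ** 2 + 1
    y.backward()     # compute dy/dX
    opt.step()       # move X against the gradient
    losses.append(y.item())

plot(losses)
show()
```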

## Binary Cross-Entropy vs Mean Squared Error

In this post I’m trying to better understand Cross-Entropy loss and why it is better than Mean Squared Error.

On the plot below you can see that Mean Squared Error may produce inadequate and sometimes unoptimisable values on a small amount of noisy data.

TODO: noise-free data, a large amount of data, non-linearly separable data.
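One commonly cited reason for the difference can be sketched numerically. The snippet below assumes a single sigmoid output with logit `z` and a true label of 1. The helper names (`sigmoid`, `bce_and_grad`, `mse_and_grad`) are made up for this illustration, and it is not necessarily the effect shown on the plot. It compares the loss and its gradient with respect to `z` for both losses.

```python
import math


def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_and_grad(z, y):
    """Binary cross-entropy on sigmoid(z) and its gradient w.r.t. z."""
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss, p - y


def mse_and_grad(z, y):
    """Squared error on sigmoid(z) and its gradient w.r.t. z."""
    p = sigmoid(z)
    loss = (p - y) ** 2
    return loss, 2 * (p - y) * p * (1 - p)


# The true label is 1; a very negative logit is a confident wrong answer.
for z in (-8.0, -2.0, 0.0, 2.0):
    bce, bce_g = bce_and_grad(z, 1)
    mse, mse_g = mse_and_grad(z, 1)
    print(f"z={z:+.1f}  BCE={bce:.4f} (grad {bce_g:+.4f})  "
          f"MSE={mse:.4f} (grad {mse_g:+.4f})")
```

For a confidently wrong prediction, the cross-entropy gradient stays close to -1, while the squared-error loss is capped below 1 and its gradient shrinks towards 0 because of the extra `p * (1 - p)` factor. A gradient-based optimiser therefore gets very little signal from MSE exactly where the prediction is worst.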

## Binary heap tree proof: index of left child is doubled index of parent + 1

We often see that i_left_child = i_parent * 2 + 1. Even though this formula is easy to check, it’s quite hard to believe it without a formal proof. I made an image that proves this fact, which is used quite commonly in computer science.
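A counting argument gives the same result, assuming 0-based indices and nodes stored in level order. It may differ from the proof in the image. Children are handed out to parents in index order, two per parent, starting from slot 1, because slot 0 is the root. The i nodes before parent i have already taken 2 * i slots, so the left child of parent i lands at 2 * i + 1. The sketch below assigns slots by counting only and then compares against the formula; `left_children_by_counting` is a hypothetical helper name.

```python
def left_children_by_counting(n):
    """Return {parent: left_child} for a complete binary tree with n nodes.

    Indices are 0-based and nodes are numbered in level order.
    The 2 * i + 1 formula is deliberately not used here.
    """
    left = {}
    next_free = 1  # slot 0 is the root, so the first free slot is 1
    for parent in range(n):
        if next_free >= n:
            break  # no slots left: the remaining nodes are leaves
        left[parent] = next_free  # the left child takes the next free slot
        next_free += 2            # the right child takes the slot after it
    return left


left = left_children_by_counting(1000)
formula_holds = all(child == 2 * parent + 1 for parent, child in left.items())
print(formula_holds)
```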


## Regex-based tokenizer

This post is not mine, but I found it so useful that, when I lost the URL, I finally decided to save the content to my blog. There’s also a guide in the official Python documentation, but it looks a bit more complicated to me.

```python
import re

SCANNER = re.compile(r'''
  (\s+) |                      # whitespace
  (//)[^\n]* |                 # comments
  0[xX]([0-9A-Fa-f]+) |        # hexadecimal integer literals
  (\d+) |                      # integer literals
  (<<|>>) |                    # multi-char punctuation
  ([][(){}<>=,;:*+-/]) |       # punctuation
  ([A-Za-z_][A-Za-z0-9_]*) |   # identifiers
  """(.*?)""" |                # multi-line string literal
  "((?:[^"\n\\]|\\.)*)" |      # regular string literal
  (.)                          # an error!
''', re.DOTALL | re.VERBOSE)
```

You can combine this with a re.finditer() call on your source string like this:

```python
for match in re.finditer(SCANNER, data):
    # One name per capturing group, in the order the groups appear in SCANNER
    space, comment, hexint, integer, mpunct, \
        punct, word, mstringlit, stringlit, badchar = match.groups()
    if space: ...
    if comment: ...
    # ...
```

Source: https://deplinenoise.wordpress.com/2012/01/04/python-tip-regex-based-tokenizer/
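For a version that can be run end to end, here is a small variant sketch. It swaps the positional groups for named groups, so that `match.lastgroup` reports which alternative matched. The group names follow the variable names used in the loop. The `//` line-comment rule, the `tokenize` helper and the sample input are assumptions made for this illustration.

```python
import re

# Same alternatives as the scanner above, written with named groups.
# The "comment" alternative (// to end of line) is an assumption.
SCANNER = re.compile(r'''
    (?P<space>\s+) |
    (?P<comment>//[^\n]*) |
    0[xX](?P<hexint>[0-9A-Fa-f]+) |
    (?P<integer>\d+) |
    (?P<mpunct><<|>>) |
    (?P<punct>[][(){}<>=,;:*+-/]) |
    (?P<word>[A-Za-z_][A-Za-z0-9_]*) |
    """(?P<mstringlit>.*?)""" |
    "(?P<stringlit>(?:[^"\n\\]|\\.)*)" |
    (?P<badchar>.)
''', re.DOTALL | re.VERBOSE)


def tokenize(data):
    """Yield (kind, text) pairs, skipping whitespace and comments."""
    for match in SCANNER.finditer(data):
        kind = match.lastgroup  # name of the group that matched
        if kind in ("space", "comment"):
            continue
        if kind == "badchar":
            raise SyntaxError(
                f"unexpected character {match.group(kind)!r} "
                f"at offset {match.start()}"
            )
        yield kind, match.group(kind)


tokens = list(tokenize('x = 0x1F << 2; // shift\nname = "hi"'))
for kind, text in tokens:
    print(kind, repr(text))
```

One reason to dispatch on `match.lastgroup` rather than on truthiness: an empty string literal `""` captures an empty string, which a check like `if stringlit:` would silently skip.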