The Concept of State in Blockchains

While working on the code that makes up an actual cryptocoin, I’d always come across a class or something that would refer to a ‘State’. And I never really knew what that was.

Was that the state of the coin’s network, on the latest block?
The state of that node perhaps – but how could the node have a valid state when it’s still busy downloading the blockchain?

It wasn’t until I read Ethereum’s whitepaper that I saw the concept of state formalized: it’s the state of the coin as your node sees it, based on the blockchain data it has on your computer.

Should your node not have the latest blocks, well then your node’s state is out of date, and it needs to download more blocks and recalculate the new state from that.

Bitcoin’s state is very simple. It is the set of Unspent Transaction Outputs (UTXOs), which boils down to which addresses have how many coins. The blocks don’t have any information about the state; the Bitcoin client must calculate the state from what it has downloaded of the blockchain, and then work out your own balance using the keys it has in its wallet.
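To make "calculate its own state" concrete, here is a minimal sketch of replaying blocks into a UTXO set. Everything in it (the Tx and TxOut classes, the toy addresses, apply_block) is made up for illustration and has nothing to do with Bitcoin's real data structures or validation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOut:
    address: str
    amount: int


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple   # (txid, output_index) pairs that this transaction spends
    outputs: tuple  # TxOut objects that this transaction creates


def apply_block(utxos, block):
    """Return a new UTXO set after applying every transaction in a block."""
    utxos = dict(utxos)
    for tx in block:
        for outpoint in tx.inputs:
            if outpoint not in utxos:
                raise ValueError(f"{tx.txid} spends a missing or already spent output {outpoint}")
            del utxos[outpoint]  # a spent output leaves the state
        for index, out in enumerate(tx.outputs):
            utxos[(tx.txid, index)] = out  # a new output enters the state
    return utxos


def balance(utxos, address):
    """Sum the unspent outputs that belong to one address."""
    return sum(out.amount for out in utxos.values() if out.address == address)


# Toy chain: alice gets 50 coins, then pays 20 to bob and 30 back to herself.
coinbase = Tx("tx0", (), (TxOut("alice", 50),))
payment = Tx("tx1", (("tx0", 0),), (TxOut("bob", 20), TxOut("alice", 30)))
chain = [[coinbase], [payment]]

state = {}
for block in chain:
    state = apply_block(state, block)

print(balance(state, "alice"), balance(state, "bob"))
```

A node that has only downloaded the first block would compute a different (older) state, which is exactly the "out of date" situation described above.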

Ethereum blocks commit to the state, in that each block header holds a hash of the state tree’s root, as well as hashes of the transaction and receipt trees’ roots. (You wouldn’t want blocks to always have a full copy of the state. That would be a waste of space. So the state is kept in a tree structure, where the branches that didn’t change can be reused from the state tree of earlier blocks.)

What’s in an Ethereum state? For each account:
1. The account’s nonce (each time you send a transaction from this account, the nonce increases by 1. This stops the same transaction from being replayed, which would be a double spend)
2. The account’s balance
3. storageRoot (the root hash of the tree holding the account’s storage, which is where a contract’s data goes)
4. codeHash (the hash of the contract’s compiled code; for an ordinary account without code it is just the hash of empty data)
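The four account fields and the idea of a block header committing to the state can be sketched roughly like this. It is a toy under loud assumptions: hashlib.sha3_256 stands in for Ethereum's Keccak-256 (they are not the same function), and state_root here is a flat hash over sorted accounts rather than a real Merkle Patricia tree. Account, state_root and transfer are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

# Stand-in for "hash of nothing". Real Ethereum uses Keccak-256 and a
# different constant for an empty storage tree; this is a simplification.
EMPTY_HASH = hashlib.sha3_256(b"").hexdigest()


@dataclass
class Account:
    nonce: int = 0
    balance: int = 0
    storage_root: str = EMPTY_HASH
    code_hash: str = EMPTY_HASH


def state_root(state):
    """Toy commitment to the whole state: one hash over all accounts."""
    h = hashlib.sha3_256()
    for address in sorted(state):
        a = state[address]
        h.update(f"{address}:{a.nonce}:{a.balance}:{a.storage_root}:{a.code_hash};".encode())
    return h.hexdigest()


def transfer(state, sender, recipient, amount, nonce):
    """Apply a simple value transfer, enforcing the sender's nonce."""
    src = state[sender]
    if nonce != src.nonce:
        raise ValueError("bad nonce: transaction is stale or a replay")
    if amount > src.balance:
        raise ValueError("insufficient balance")
    src.nonce += 1
    src.balance -= amount
    state.setdefault(recipient, Account()).balance += amount


state = {"alice": Account(balance=100)}
root_before = state_root(state)
transfer(state, "alice", "bob", 40, nonce=0)
root_after = state_root(state)

# The header only needs the short root, not the accounts themselves.
print(root_before != root_after)
```

Sending the same transaction again with nonce=0 raises ValueError, which is the replay protection the nonce is there for.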

Remember when you had to download that huge Bitcoin blockchain and wait for it to be synced up to the latest block just to know how much Bitcoin you have? In Ethereum, because every block header commits to the state in this manner, you can just get the latest block, follow the state tree from its root down to your account (fetching only the tree nodes along that path), and get your balance much faster.

It seems many people like the idea of storing the state in blocks, since QRL also stores the state. The State in QRL consists of (for each account):
1. The account’s nonce (this is everywhere, except Bitcoin, which has no accounts: there each UTXO can simply be spent only once)
2. The account’s balance
3. a list of public keys that this account used before, to prevent them from being used again (One Time Signatures are what protects us from quantum computers)
4. a list of staking accounts (this is a PoS coin)

Generating Jules Verne novels with Torch-RNN

Torch-RNN is a rewrite of Andrej Karpathy’s char-rnn. You train it on some text, and then it can generate ‘similar’ text. I loved reading Jules Verne novels, so being able to just crank out some new novels whenever I feel like it sounds like a great idea right?

The hardest part was setting the environment up.

The text preprocessor was written in Python 2, and you’d think “hey, it’s just wrangling some text, what requirements does it need?”. Well, it pulled in numpy, which in turn needed Cython. Compiling Cython is always a bitch. I hate wading through compile scripts that I didn’t write myself that break.

The NN model itself is written in Lua and needs something called LuaJIT, which sounds like a faster variant of a Lua interpreter. Whatever, not interested in learning the language. Setting that up required a lot of compiling too.

In the end I managed to get everything set up, and realized that I couldn’t run the neural network on my GPU because everybody only writes for CUDA (thanks guys) and I have a Radeon HD 6870 (note: OpenCL won’t work out of the box with the Radeon Crimson 16.x beta drivers, the last ones to be released for Barts. You need Catalyst 15.7.1 WHQL for proper OpenCL 1.2 support).

Anyway. I took From the Earth to the Moon, Eight Hundred Leagues on the Amazon (I wanna read that!), and The Secret of the Island (I just found out that this is really the third part of The Mysterious Island, in someone else’s English translation) and put them all in one huge text file.

Training the neural network took all of my CPU. Since I was running in the Bash shell for Windows 10 there was no way I could’ve gotten it to run on the GPU anyway.

After a day or two of training (and about 20K iterations) the virtual Jules Verne spat this out:

& thon tole, by seemed profufess and metal stations requirements of Judge Ribeiro, which degrees.
They have been descended some caber.

“Imlet in “the struggled a pressure, we must attracted by Recthman. That is more principants of the amazing,
nothing the course were so
craw of the same peculiable perpetibilitions of the
life of Judge Jarriquez frave to given
arrive
to violence. On the forests, with the _Tapperto_”_

The companion, and
it like a close stretter had the traveler considerable than to the earth where the
quality.

On thick the Gun Club; what they disarsed of the idea of twenty-keeping them. But to this day as to me, fix _“jussiba!”

“Aboats of do.
During the darkment is recovertaken a despaited that public Street!”

The long poblen the: dembnoit at langual paralle _ther of the Amazon?”

“What a compressed to Project Gutenberg-textected to this apparent for hourbascure
was no doubt, when he was with in its finished, “there would return
with all, or rather journey.

The step of the document seen to ask supering and have been liness; it would true. Not
the diamobas writable intomarier of
great a previous; after less mean them.

And which simple with more than the villal
work of the projectile would have indeed the
topped
the moon?
Work of the pounds
to proceed, at
them in at over Joam Dacosta dadled for
the refund of Sateltences comply up the certain soon in the right of nigmon.
The
Sound the projectile, and without retrew of the best
conquernts misceldt.

He as reply, the loud, the two gran! Unifer–“it is inches that the mass approach
it, and we branches–that I cabinged that a
comes ank, low.
They reprisonation of the metal plantant mashed
which his destity proper profession of a large feeting his none-wall of the gas, free cupiness, through
frittle for the
jokes Donselver, putting scars of carrying an edgars a fear one of the
Rodroats
as soverlocks at the Chaboy, you, she not have the syriy the coupo

Clearly it seems I should’ve just removed the Project Gutenberg prefaces from the training text.
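For next time, something like the following could strip the Project Gutenberg wrapper before training. It is a sketch that assumes the files carry the usual "*** START OF THE/THIS PROJECT GUTENBERG EBOOK ..." and "*** END OF ..." marker lines; strip_gutenberg is a made-up helper name and the sample text is a placeholder, not a real book.

```python
import re

# Marker lines as they commonly appear in Project Gutenberg plain-text files.
START = re.compile(r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)
END = re.compile(r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)


def strip_gutenberg(text):
    """Keep only the text between the START and END marker lines."""
    start = START.search(text)
    end = END.search(text)
    begin = start.end() if start else 0        # no marker: keep from the top
    finish = end.start() if end else len(text)  # no marker: keep to the bottom
    return text[begin:finish].strip()


sample = """Preface and licence blurb goes here.

*** START OF THIS PROJECT GUTENBERG EBOOK PLACEHOLDER TITLE ***

Chapter text goes here.

*** END OF THIS PROJECT GUTENBERG EBOOK PLACEHOLDER TITLE ***

More licence text goes here.
"""

cleaned = strip_gutenberg(sample)
print(cleaned)
```

Run this over each book before concatenating them, so the network never sees the licence boilerplate in the first place.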

Anyway it’s doing kinda well for a neural network that doesn’t understand English, and besides, doesn’t even understand the concepts behind the words.

Apple’s custom GPU, Imagination, and why no one should’ve been surprised

Apparently Imagination lost 60% of its stock value after it was revealed that Apple would stop using its PowerVR GPUs within two years.

60%. Do you know what that means? It means many people were surprised. But why? If people had kept their ears to the ground they’d have noticed long ago.

Apple was always about vertical integration. The custom CPUs were already a big sign. And back in October 2016, the incredible David Kanter had already published an article, A Look Inside Apple’s Custom GPU for the iPhone.

You know what this also means?

Someone had shorted their Imagination stock.

WordPress on Low Memory Servers

This site runs on a 512MB DigitalOcean droplet. Every week or two the Linux kernel would kill MySQL for using too much RAM… and restarting it got tiring.

After the tweaks

When I first start nginx, php-fpm and MySQL, the memory usage starts at 370MB and only goes upwards from there. htop (and journalctl) tells me that MySQL is the biggest offender, so let’s start there.

MySQL

After some googling one parameter always popped up in every guide: innodb_buffer_pool_size. MySQL docs state that it’s 128MB by default. So I reduced it to 8MB to see what would happen. After all, I doubt the entirety of my posts on this site plus the WordPress settings could ever reach 8MB, and even if it did, it’s backed by an SSD, so who cares.

It made a huge difference. MySQL went from using 18% of my RAM (~92MB) to just 8% (43MB).
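For reference, the change amounts to one line in the MySQL config file. Where that file lives depends on your distro, and whether you already have a [mysqld] section is an assumption on my part, so treat this as a sketch:

```ini
[mysqld]
# The default is 128M; 8M is the value tried above.
innodb_buffer_pool_size = 8M
```

Restart MySQL afterwards for the new value to take effect.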

PHP

Some Googling led to A better way to run PHP, which advocates using pm = ondemand instead of pm = dynamic.

Before, php-fpm had 3 child processes taking up 18% of my RAM. Now, it has none, and it just spins them up on demand which makes a lot of sense.
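In config terms this is roughly the following, in whichever file holds your php-fpm process manager settings. Only pm = ondemand comes from the article; the two numbers below are assumed values for illustration, not recommendations:

```ini
; spawn workers only when a request comes in
pm = ondemand
; assumed cap on simultaneous workers, tune to your RAM
pm.max_children = 5
; assumed idle time after which a worker is killed
pm.process_idle_timeout = 10s
```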

I don’t use php-fpm pools so I ignored that part.

390MB memory on fresh start -> 140MB!!

The system now uses ~140MB of RAM and I’m pretty proud of myself – after just two tweaks! Nobody cares about this site right now but if traffic picks up I’ll just enable nginx microcaching.

From Python to Google Go and Life

Now that I’m an adult, I find that doing things on the side is nigh unsustainable when one has to spend most of the day making a living. After working out almost every day and reading articles on entrepreneurship like I used to devour articles on dating, there’s no time left for anything but a good 8 hours of sleep.

But recently I got a chance to study Golang. To an enthusiastic Python developer, Go comes across as a language that has the same philosophy, but just happens to be compiled, statically typed, and better at concurrency.

Documentation

The documentation is incredible. You can even do the Tour of Go on localhost by simply installing the gotour package.

go get golang.org/x/tour/gotour

But I never learned anything from that because the Go Tour is just a museum of code snippets that show you Go’s features.

As usual, the best way to learn is to implement some utility that you want for your own in Go. For this, the go doc command is incredible. For example, this is the output of ‘go doc json’, straight from the terminal:

shinichi@ayanami ~/go/src/github.com/randomshinichi/goutil $ go doc json
package json // import "encoding/json"

Package json implements encoding and decoding of JSON as defined in RFC
4627. The mapping between JSON and Go values is described in the
documentation for the Marshal and Unmarshal functions.

See "JSON and Go" for an introduction to this package:
https://golang.org/doc/articles/json_and_go.html

func Compact(dst *bytes.Buffer, src []byte) error
func HTMLEscape(dst *bytes.Buffer, src []byte)
func Indent(dst *bytes.Buffer, src []byte, prefix, indent string) error
func Marshal(v interface{}) ([]byte, error)
func MarshalIndent(v interface{}, prefix, indent string) ([]byte, error)
func Unmarshal(data []byte, v interface{}) error
type Decoder struct{ ... }
func NewDecoder(r io.Reader) *Decoder
type Delim rune
type Encoder struct{ ... }
func NewEncoder(w io.Writer) *Encoder
type InvalidUTF8Error struct{ ... }
type InvalidUnmarshalError struct{ ... }
type Marshaler interface{ ... }
type MarshalerError struct{ ... }
type Number string
type RawMessage []byte
type SyntaxError struct{ ... }
type Token interface{}
type UnmarshalFieldError struct{ ... }
type UnmarshalTypeError struct{ ... }
type Unmarshaler interface{ ... }
type UnsupportedTypeError struct{ ... }
type UnsupportedValueError struct{ ... }

And you can go deeper and ask for documentation on the functions and structs too:

shinichi@ayanami ~/go/src/github.com/randomshinichi/goutil $ go doc json.Token
package json // import "encoding/json"

type Token interface{}
A Token holds a value of one of these types:

Delim, for the four JSON delimiters [ ] { }
bool, for JSON booleans
float64, for JSON numbers
Number, for JSON numbers
string, for JSON string literals
nil, for JSON null

I only needed the internet to figure out how people usually did things in Go. For the specifics, this go doc command was incredible – never even had to leave my terminal.

The Result

A few hours took me from Hello World to a little utility that mirrors a directory structure with empty files. Why? My photos have important information in their filenames, and I want to write scripts that mess around with said filenames. Not going to do that on my real photo collection.

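package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// srcRoot is the directory that the mirrored paths are made relative to.
const srcRoot = "/Volumes/Toshiba 2TB/pictures/"

// mirror recreates each walked path under the current working directory:
// directories become directories, files become empty files.
func mirror(path string, info os.FileInfo, err error) error {
    if err != nil {
        // Walk couldn't read this entry, and info may be nil, so stop here.
        return err
    }
    relpath, err := filepath.Rel(srcRoot, path)
    if err != nil {
        return err
    }
    fmt.Println(relpath)
    if info.IsDir() {
        // MkdirAll doesn't fail if the directory already exists.
        return os.MkdirAll(relpath, 0755)
    }
    file, err := os.Create(relpath)
    if err != nil {
        fmt.Println(err)
        return nil // report the failure but keep walking
    }
    // Close right away so we don't hit the max open files limit.
    return file.Close()
}

func main() {
    fmt.Println("goutil starting")
    if err := filepath.Walk(filepath.Join(srcRoot, "Photos"), mirror); err != nil {
        fmt.Println(err)
    }
}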

Things I noticed

Functions in Go usually return two values, the result and an error object. To declare variables and receive both at once you use := instead of =. I ran os.Create(), and some directories would have files in them but others wouldn’t, so I wanted to print the error object that os.Create() returns. However, it also returns a file object, and you can’t leave that in an unused variable because the Go compiler complains. That was seriously frustrating, but it turns out that I did need the returned file handle: I had to close it because I was hitting the max open files limit. Good language design I suppose.

I have to say, error checking in returned values clutters up the code. Just look at the if statements above. This would be much more poetic in Python because of exceptions, which are implicit and bubble upwards from code below. Still, it probably doesn’t get much better than this in the compiled language world. Also, this can still be properly mitigated by keeping functions single purpose.

To ignore a return value, use _

Programming Languages and Social Issues

Afterwards I read Rob Pike’s blogpost on why people weren’t moving from C++ to Go as he had originally thought. It wasn’t about “the better tool for the job”, or productivity, or ease of maintenance. Simply put, C++ let you have control over everything, absolutely everything, and people who program in C++ like it that way, while Go has a garbage collector.

I get it, having control over absolutely everything, if only you knew enough about the language, is empowering.

However, I found that I really appreciate it when computers help me accomplish something and then get out of the way, like a tool. That’s why I use a Mac.

The choice of programming languages is now an ideology, a philosophy of life. Which brings us to the next question:

Does the inability of Lisp to gain popularity say something about the people who use it, and their life strategy?

Apparently it does, and I quickly found some articles about it.
Rudolf Winestock’s The Lisp Curse is the most plausible and well explained.
Mark Tarver’s The Bipolar Lisp Programmer is just pithy and poetic, and it shows you what 56 years of living can do for your experience and knowledge.

The Lisp Curse also linked to Stanislav Datskovskiy, whose very writing radiates hatred, “I’m better than you-ness”, and a sense that he really is incredibly brilliant, which does no favours for his ego. I’ve been there, come back to earth, and I have just this one thing to say: he probably doesn’t get to fuck much.

And that was my day spent learning Google Go. In the end I guess I learned more about people from different walks of life than anything else.

Don’t add Django migrations to version control

Django docs recommend a lot of things which don’t really work out in reality.

For instance, having a single settings.py file. The only reason I can think of having it this way is so as not to overwhelm newcomers, because you need different settings for dev, production and staging.
A single tests.py for an app. Seriously, guys? You need test_views.py, test_models.py, test_manager.py, at least.

And here’s the one thing that might be useful, might not: committing your migrations to version control.

Here’s the situation: you’re working on the models for a Django app.

class Dog(models.Model):
    owner = models.ForeignKey(Person, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    age = models.IntegerField()

Hmm, perhaps age should be date-of-birth. After all, you don’t want to have to write some script to be updating the age value for every dog every year. No, perhaps it should be named dob after all.

So you make the change in models.py and add the migration for it as well, which gets saved as 0002_renamed_age_to_dob.py or something.

Problem is, your colleague has been working on the very same model, and he had added some other fields too:

class Dog(models.Model):
    owner = models.ForeignKey(Person, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    dob = models.DateField()
    fav_foods = models.CharField(max_length=255, blank=True)
    potty_trained = models.BooleanField(default=False)

The migration that changes the first code snippet to the second in PostgreSQL gets saved as 0002_added_fields.py, and now your python manage.py migrate fails after a git pull, because 0002_added_fields.py assumes that age is still there and that it’s a models.IntegerField, but the database on your machine looks different because you already renamed age to dob. Git can resolve conflicts between your models.py, but not between the Django migrations.
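The shape of the problem is easier to see as a graph. Below is a toy model in plain Python (not Django's own code) where each migration names the one it depends on. As far as I understand it, Django complains when an app's migration graph ends in more than one leaf, and that is the situation here. The dict and leaf_migrations are made up for illustration.

```python
# Each migration points at the migration it depends on, like Django's
# "dependencies" list does. None marks the first migration.
migrations = {
    "0001_initial": None,
    "0002_renamed_age_to_dob": "0001_initial",  # yours
    "0002_added_fields": "0001_initial",        # your colleague's
}


def leaf_migrations(graph):
    """Return the migrations that no other migration depends on."""
    parents = set(graph.values())
    return sorted(name for name in graph if name not in parents)


leaves = leaf_migrations(migrations)
conflict = len(leaves) > 1  # two competing "latest" migrations

print(leaves, conflict)
```

Both 0002 files claim to be the step after 0001_initial, and nothing in git tells them apart, since they are two different files rather than two edits of the same one.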

So it’s best to just not add those migrations to the code repository unless you’re really sure that only one guy is working on the models. Because if more than one guy is working on the models, then everybody’s databases are different and the migrations can’t resolve that. You might as well run python manage.py makemigrations from scratch.

Unless, of course, you already have data in there.

Easily Understandable Machine Learning Tutorials

Machine Learning

  1. Dr Jason Brownlee’s Machine Learning Mastery is for everybody, even total beginners, who’ve heard the hype about ML and want to get in on the action. It’s step by step, simple and very easy to follow.
  2. Once you’ve worked through quite a few chapters of Machine Learning Mastery it’s good to check out ujjwalkarn’s GitHub list of Machine Learning and Deep Learning Tutorials.

Neural Networks

  1. If you’re totally new to neural networks, Andrej Karpathy’s Hacker’s Guide to Neural Networks is hands down the best. No complex formulas, no jargon, just plain and simple concepts demonstrated with code (which you should definitely rewrite to fully understand).
  2. Genetic Algorithms for tuning the weights of an existing neural network: pretty cool, although this is just one way to train a neural network. I’d imagine one would use the gradients in Karpathy’s tutorial to get to a pretty well trained neural network, and then use the genetic algorithm to make small improvement tweaks from there.
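To give a feel for the "small improvement tweaks" idea in the last item, here is a tiny sketch. It assumes a one-neuron "network" (y = w*x + b) and a starting point that pretends gradient descent already got close. It only uses mutation and selection, with no crossover, so strictly speaking it is closer to a simple evolution strategy than a full genetic algorithm. All names and numbers are made up.

```python
import random

# Toy target the "network" should learn: y = 2x + 1.
DATA = [(x, 2.0 * x + 1.0) for x in range(-3, 4)]


def loss(weights):
    """Mean squared error of y = w*x + b over the toy data."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in DATA) / len(DATA)


def mutate(weights, rng, scale=0.05):
    """Return a copy of the weights with small Gaussian noise added."""
    return [v + rng.gauss(0.0, scale) for v in weights]


def evolve(weights, generations=200, population=20, seed=0):
    """Keep the best of (current best + mutated copies), generation by generation."""
    rng = random.Random(seed)
    best = list(weights)
    for _ in range(generations):
        candidates = [best] + [mutate(best, rng) for _ in range(population)]
        best = min(candidates, key=loss)  # the current best always survives
    return best


start = [1.8, 0.9]  # pretend gradient descent already got us this close
tuned = evolve(start)

print(loss(start), loss(tuned))
```

Because the current best is always kept among the candidates, the loss can never get worse from one generation to the next; how much it improves depends on the mutation scale and the number of generations.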