Getting Started With Programming (Java)

Computer applications obviously affect nearly every aspect of our lives. They run your life in those moments when you are a statistician who needs to calculate a logistic transition function, when you are a South Korean rapper producing a record, when you are a scientist who needs to do generalized iterative scaling, when you are a biological engineer who needs to model catalysis, when you are a professor who needs a place to submit his political history papers, when you are a hacker participating in a bug bounty program. These all require applications. On your personal computer you may run an Ethereum wallet, Wikipedia, an app for books, or software that allows you to watch Croatian football.

Someone, usually a team of programmers, wrote those applications. If you’re reading this, you probably gained the intuition that writing applications is in demand, and you would like to write some yourself. Perhaps you have an idea for the country’s next great defense application or just some simple app for a local contractor who specializes in stone masonry work.

Here, we’ll cover the basics of writing applications. We’ll stick to Java as our programming language. Keep in mind, however, that ascending to a god-level programmer will require more than mastery of rules, or syntax. On the path to true mastery, it is also necessary to nail basic programming techniques. These constitute established methods for performing common programming operations, such as finding an average, calculating a total, or arranging a set of items in a particular order.

On this path, it is also great if you can manage to absorb a sharp aesthetic. That is: make sure your code is easy to maintain, readable, and reusable.

Easy to Maintain and Reusable

The specifications for any program are continually changing. Not very many programs have only one version. It is reasonable to extrapolate this pattern into the near future. Therefore it makes sense to write code which allows us to incorporate prewritten and pretested modules into our program. The need for speed calls our name. Organizing code into tidy, pluggable modules also tends to yield fewer bugs and higher levels of robustness. Luckily, Java comes with a vast amount of prewritten code that we are free to reuse in our programs.


Just as in natural language, we try to keep good form. We write clear sentences not merely out of submissive love for our grammar teacher, but so that the reader stands a good chance of figuring out what we intend to convey. A similar attention to readability should be brought to code. This is especially the case if we wish to advance in our careers, because coding nicely eases the transfer of a project to other eyes when the day comes for us to move on to higher responsibilities.

Programming is exciting. It can be very satisfying to face a monstrous task and hack it into pieces, then gather up these computer instructions and transmute them into a living program. But just like with alchemy, it’s a bummer when it doesn’t do anything or produces the wrong output.

Writing correct programs is therefore critical. Someone’s life or the future of all mankind may one day depend on the correctness of your program. Reusing code helps to develop correct programs, but we must also learn testing techniques to verify that the output of the program is correct.

On this site, we’ll concentrate not only on the syntax of the Java language, but also on partaking of the most-blessed holy trinity of programming consisting of three distinct parts. Not in one alone, but only in the joining together of the three attributes does one partake in programmer Godhood:

  1. Programming Techniques
  2. Software Engineering Principles
  3. Effective Testing Techniques

But before diving in, it might be a good idea to understand something about the body on which the program actually runs. The platform. That is: the computer hardware and the operating system. The program uses the hardware for inputting data, for performing calculations, and for outputting results. The operating system unleashes the program and provides the program with essential resources such as memory, and services such as reading and writing files.


Java Inheritance Part 2

The syntax for defining a subclass that inherits from another class is to add an extends clause in the class header:

[Image: a class header with an extends clause]

The extends keyword specifies that the subclass inherits members of the superclass. That means that the subclass begins with a set of predefined methods and fields inherited from its hierarchy of superclasses.

JFrame allows us to create graphical applications. So we can extend it to create our StatisticalDispersion subclass, and any subclasses that later inherit from StatisticalDispersion.

Here, StatisticalDispersion is the subclass under JFrame. And it inherits from all the classes that JFrame inherits from, all the way back to the Object class.

[Figure: The StatisticalDispersion Class Hierarchy]

We are coding a class named StatisticalDispersion that extends the JFrame class, so we use the following header:

[Image: the StatisticalDispersion class header, which extends JFrame]

Because our StatisticalDispersion class extends JFrame, it inherits more than 300 methods and more than 30 fields. That’s because the JFrame class is a subclass of Frame, which is a subclass of Window, which is a subclass of Container, which is a subclass of Component, which is a subclass of Object.

A hierarchy is composed of subclasses which inherit methods and fields. Here, we make all of them available to the StatisticalDispersion class.

Subclasses do not necessarily need to use their inherited methods, but these are available if needed. The programmer does not need to write methods and define fields in classes which have already inherited them.

JFrame is the direct superclass of our StatisticalDispersion class. As you can see in the hierarchy image, StatisticalDispersion points to it. And as you can see in the code, JFrame is the name that follows the extends keyword.

A class can have multiple direct subclasses but only one direct superclass. One can have many offspring but yet can only develop from one zygote.

Buddhist Code. Does it Compile?

if sense_self is None:

   sense_self = os.path.join(os.path.expanduser('bodhisattva'), '.karuna')

if nirvana_hash is not None and samsara_hash is None:

   samsara_hash = nirvana_hash

   hash_algorithm = 'nbl8'

wisdom_base = os.path.expanduser(tread_mid)

if not os.access(wisdom_base, os.R_BRN):

   wisdom_base = os.path.join('/zen', '.karuna')

wisdom = os.path.join(wisdom_base, tread_silent)

if not os.path.exists(wisdom):




How to Create a Custom Sacred Text with Artificial Intelligence

Okay, let’s create a new religion using the power of neural networks. That’s my definition of a night well spent.

I will feed it Neon Genesis Evangelion, some of the Buddhist Suttas, Wikipedia articles about cosmology, and text from, and see what kind of deep-sounding fuckery it comes up with.

To do it yourself, first install Python and Keras and a backend (Theano or TensorFlow). Make sure you install the backend first, then Keras. Also make sure that the version of Python that starts when you type python in terminal is the same version that Keras is installed under.

To find out where Keras is installed, run pip install keras again. If Keras is already there, pip reports the path where it lives, and that path mentions a version of Python. You don’t want Python in terminal to be 2.6 and Keras to be on Python 3.6. If this is the case, type python3 instead of python.

If you are pasting each line into terminal, watch for the ‘>>>’ and ‘...’ prompts. If there is an indentation in the script, you should tab after ‘...’. If there is no longer indentation, you must press enter to get out of ‘...’ so that ‘>>>’ shows up again.

The better, less tedious way to run it is to save the script as a .py file using the Python Shell. Once you save it, paste this on top of the code: #!/usr/bin/env python

Go to terminal and enter chmod +x followed by the entire path to your file, such as /Users/mariomontano/Documents/  You should find the path at the top of the window when you create and save a new file in the Python Shell.

Then type python3 /Users/mariomontano/Documents/ into terminal to run it.

I’m going to explain the code to reduce the unease.

from __future__ import print_function

from __future__ import print_function is the first line of code in the script. This commits us to having to use print as a function now. A function is a block of code that is used to perform a single action.

The whole point of from __future__ import print_function is to bring the print function from Python 3 into Python 2.6+ just in case you’re not using Python 3. If you are using Python 3, don’t worry about it.

from keras.callbacks import LambdaCallback

So there is a training procedure we have to set off, but we’re going to want to view the internal states and statistics of the model during training.

This particular callback allows us to create a custom callback that reports at a certain time. In our case, we want it to reveal some info at an arbitrary cutoff used to separate training into distinct phases, which is useful for logging and periodic evaluation. We call this arbitrary cutoff an epoch. So at the end of an epoch, it will report some stuff we set it up to report.

from keras.models import Sequential

This time, we are choosing the kind of neural network – the model. There are two kinds of models in Keras: Sequential and Functional API.  Basically, you use the Sequential Model if you want to keep things simple, and you use the Functional API to custom design more complex models, which include non-sequential connections and multiple inputs/outputs. We want to keep things simple.

from keras.layers import Dense, Activation

Here, we are bringing two important things to the table: Dense() will allow us to summon layers with a chosen number of neurons, and Activation() is for choosing a function that is applied to a layer of neurons. By tweaking the kind of activation function and number of neurons, you can make the model better or worse at what it does.


from keras.layers import LSTM

An LSTM is a type of recurrent neural network that allows information to be remembered. We don’t want it to forget everything in each training round.

from keras.optimizers import RMSprop

An optimizer is one of the two arguments required for compiling a Keras model. RMSprop is an optimizer which is usually a good choice for recurrent neural networks.

from keras.utils.data_utils import get_file

This will allow us to download a file from a URL if it is not already in the cache.

import numpy as np

This allows us to use numpy for example as in np.array([1,2,3]) instead of numpy.array([1,2,3]).

import random

random will allow us to generate random integers. This will be important down the line. Remember that first we are equipping ourselves.

import sys

sys is a module which provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter. One such function is sys.stdout.write, which is more universal if you ever need to write dual-version code (e.g. code that works simultaneously with Python 2.x as well as Python 3.x). Combined with sys.stdout.flush, it will later allow us to see output even before the script completes. If we didn’t use these, then we would see the output printed all at once to the screen, at the end. This is because the output is being buffered, and unless we flush sys.stdout with each print, we won’t see the output immediately.
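To make the buffering point concrete, here is a minimal sketch of writing one character at a time and flushing after each. The helper name emit_chars and the in-memory buffer are assumptions made for this illustration; they are not part of the script we are walking through.

```python
import io
import sys


def emit_chars(chars, stream=sys.stdout):
    """Write characters one at a time, flushing after each one."""
    for ch in chars:
        stream.write(ch)   # write() adds no newline, unlike print()
        stream.flush()     # push the character out of the buffer right away
    stream.write("\n")
    return len(chars)


# Capture the output in memory so the result can be inspected afterwards.
buffer = io.StringIO()
written = emit_chars("karuna", stream=buffer)
```

Called without the stream argument, the same helper writes straight to the terminal, one character at a time.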

import io

This will allow us to open the text file with a chosen character encoding: to open the cereal box for our hungry machine once that delicious cereal has been fetched from a web page. As in: io.open(path, encoding='utf-8'). On the left side of the comma goes the path, and on the right goes the character encoding. The path will be to the file where your text data is held and the character encoding will be utf-8.

path = get_file('nietzsche.txt', origin=

Just replace

path = get_file('nietzsche.txt', origin=

with a path to your own text or combination of texts.

If you don’t want it hosted on a site and are on a Mac, you can store a file as .txt then find it by:

right clicking on file in finder -> Get Info -> copy the stuff in front of Where: + the file name with .txt at the end -> path = “____ ” 


with io.open(path, encoding='utf-8') as f:

    text = f.read().lower()

The upper line of code opens the file at the path we defined above (‘nietzsche.txt’) and decodes it as ‘utf-8’. If you don’t pass in any encoding, a system-specific default will be picked, and that default cannot always express all characters (this can happen on Python 2.x and/or Windows).

We do as f: so that we can then easily write f.read().lower() instead of io.open(path, encoding='utf-8').read().lower(). When we do this, f is called a file object.

We read().lower() so that the string comes out in lower case.
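Here is a small sketch of the same read-and-lowercase step that you can run without downloading anything. The helper name load_corpus and the temporary sample file are assumptions for the sake of the example.

```python
import io
import os
import tempfile


def load_corpus(path):
    """Read a UTF-8 text file and return its contents in lower case."""
    with io.open(path, encoding="utf-8") as f:
        return f.read().lower()


# Hypothetical sample file, created only for this illustration.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(u"Thus Spoke Zarathustra")

text = load_corpus(sample_path)
print('corpus length:', len(text))
```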

print('corpus length:', len(text))

This will output the statement ‘corpus length:’ and the number of characters in the entire string of text. Remember that a string is a linear sequence of characters.

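chars = sorted(list(set(text)))

sorted(list(...)) does this: it turns the set into a list and puts the characters into a fixed, sorted order. A set on its own keeps its items in no particular order, so without sorting, the characters could come out scrambled.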

set(text) makes sure that each character only exists once.

For example: ‘The dog went to the pound after eating a pound of dog.’ would become [the, dog, went, to, pound, after, eating, a, of] if each character was a word. But in our case, each character is a letter/number/special.

So just think of that example but with individual letters. Out of a large corpus, you would probably get out the entire alphabet, numbers, and special characters.

print('total chars:', len(chars))

This will give total chars: 57, for example. It gives you the number of distinct characters, after eliminating all repeats. Compare print('corpus length:', len(text)), which gives you the total number of characters in the text.
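To see the difference between the two counts, here is a tiny sketch on a made-up string (the sample text is an assumption; your corpus will be far longer).

```python
# A short stand-in for the corpus.
text = "the dog went to the pound"

# set() drops repeats, list() makes it sortable, sorted() fixes the order.
chars = sorted(list(set(text)))

print('corpus length:', len(text))   # every character, repeats included
print('total chars:', len(chars))    # distinct characters only
```

The space counts as a character too, and it sorts before the letters.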

char_indices = dict((c, i) for i, c in enumerate(chars))

enumerate(chars) will assign a number to each character. The numbers start at 0 and climb up, 1,2,3,4… for each distinct character in chars.

dict() will pair each character (the key) with its assigned number (its index, the value).

indices_char = dict((i, c) for i, c in enumerate(chars))

This may seem a bit redundant, but this reverse mapping ensures that a particular variable (in this case indices_char) stores the numerical indices mapped back to their characters. This is so that we can convert the integers back to characters once we start getting integer predictions later on.

In other words, what we did with these two lines of code is create a dictionary that maps each character to a number and vice versa.

i is often referred to as the id of the char.
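The two dictionaries can be tried out on a tiny example. The word used here is an assumption for illustration; the round trip from characters to ids and back is the point.

```python
word = "karuna"
chars = sorted(list(set(word)))

# character -> id, and id -> character
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# Encode the word as ids, then decode the ids back into text.
encoded = [char_indices[c] for c in word]
decoded = "".join(indices_char[i] for i in encoded)
```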


# cut the text in semi-redundant sequences of maxlen characters

When reading code, a hashtag (#) before a set of words means these words are not part of the code. It is a statement by the author(s) about what a section of code is meant to do. Like a hyper-rushed explanation. Sadly, even then, most code in the world is uncommented… But here I am. It’s okay. Mankind may abide in me from now on.

What is meant by cut the text in “semi-redundant sequences” is best explained by looking at what the code itself does.

maxlen = 40

This sets the character count in each chunk to 40.

step = 3

By setting the step equal to 3, we divide the entire dataset into chunks of length 40, where the beginning of each chunk is 3 steps/characters apart.

sentences = []

next_chars = []

These use brackets instead of () because [] are designed to be used for lists and for indexing/lookup/slicing. Plus the inner contents of [] can be changed. This is exactly what we need. In the next lines of code we will fill these two containers.

for i in range(0, len(text) - maxlen, step):

for means the code will be executed repeatedly.

i is the variable name. It stands for the position in the text where the current chunk starts.

range() returns a sequence of numbers

range(Starting number of the sequence, Generate numbers up to but not including this number, Difference between each number in the sequence)

so if we have range(0,50,3), it will return [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
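Here is a quick sketch of why the upper bound is len(text) minus maxlen. The text length of 50 is an assumed toy number.

```python
text_length = 50   # pretend the corpus has 50 characters
maxlen = 40
step = 3

# Every start position that still leaves room for a full chunk
# of maxlen characters plus one following character.
starts = list(range(0, text_length - maxlen, step))
last_chunk_end = starts[-1] + maxlen   # index of the character after the last chunk
```

With these numbers the last chunk covers positions 9 to 48, and position 49 is still available as the character that follows it.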

sentences.append(text[i: i + maxlen])

.append() adds an item to the end of a list, which is sentences in our case. The item here is the slice of text that starts at the current index i and runs for 40 characters. i: i + maxlen means “from i up to, but not including, i + maxlen”. We are filling sentences with chunks of 40 characters.

Taking the last two lines of code I explained together, we are filling the sentences with 40 characters every 3 characters.

next_chars.append(text[i + maxlen])

Now for next_chars, we fill it with only the next character after that. Notice it doesn’t say text[i: i + maxlen]. We are filling it with a single character.

next_chars is the single next character following after the collection of 40.

So next_chars will be filled with the single next character following after the collection of 40 characters every 3 characters within the specified range.

print('nb sequences:', len(sentences))

This will output the number of sentences created by 3-stepping, which should be roughly one third of the corpus length.
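To watch the chunking happen, here is the same loop on a short made-up sentence, with maxlen shrunk to 5 so the result fits on screen (both the text and the smaller maxlen are assumptions for illustration).

```python
text = "the quick brown fox"
maxlen = 5   # 40 in the real script
step = 3

sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])   # a chunk of maxlen characters
    next_chars.append(text[i + maxlen])     # the single character after it

print('nb sequences:', len(sentences))
```

Each chunk overlaps the previous one by two characters here, which is what “semi-redundant” means.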


x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)

y = np.zeros((len(sentences), len(chars)), dtype=np.bool)

np.zeros() creates an array filled with zeros. Given a single number, it creates that many zeros: if len(sentences) gives 6, then np.zeros(6) will create [0.,0.,0.,0.,0.,0.]. Given a tuple, each extra number adds a dimension: the second number brackets each of the six zeros and specifies how many zeros go within each bracket. With np.zeros((6,2)), we get [[0.,0.],[0.,0.],[0.,0.],[0.,0.],[0.,0.],[0.,0.]]. Play around with np.zeros() to get an intuition for it.

dtype=np.bool ensures that there are only two options True or False, 1 or 0.

What we are doing here is storing our data into vectors.
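A short sketch for playing around with np.zeros and its shapes. The sizes are arbitrary, and the plain Python bool is used as the dtype here as an assumption that works across NumPy versions.

```python
import numpy as np

flat = np.zeros(6)                      # six zeros in a row
grid = np.zeros((6, 2))                 # six brackets of two zeros each
cube = np.zeros((6, 2, 3), dtype=bool)  # three dimensions, all False

print(flat.shape, grid.shape, cube.shape)
```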

for i, sentence in enumerate(sentences):

    for t, char in enumerate(sentence):

        x[i, t, char_indices[char]] = 1

    y[i, char_indices[next_chars[i]]] = 1

Remember, the indices i and t stand for the position of a sentence in sentences and the position of a char within that sentence, respectively. For each char, the code flips exactly one zero to a 1: the one at that char’s index. So with 57 characters, the char with index 0 becomes [1,0,0,0,0,…,0], the char with index 1 becomes [0,1,0,0,0,…,0], and so on.

We now have a 3-dimensional array x (sentence, position in the sentence, character) and a 2-dimensional array y (sentence, next character).

This is called one-hot encoding.
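Here is the same encoding on a toy dataset of two 2-character “sentences” over a 3-letter alphabet. The data is made up for illustration, and dtype=bool is an assumption that stands in for np.bool.

```python
import numpy as np

chars = ["a", "b", "c"]
char_indices = dict((c, i) for i, c in enumerate(chars))

sentences = ["ab", "bc"]     # toy chunks, maxlen = 2
next_chars = ["c", "a"]      # the character that follows each chunk
maxlen = 2

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1      # one True per character slot
    y[i, char_indices[next_chars[i]]] = 1    # one True per target

print(x.astype(int))
print(y.astype(int))
```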

print('Build model...')

model = Sequential()

a Sequential model is a linear stack of layers


You use this simple model in several situations. For example, when you are performing regression, you will usually have a final layer as linear.

You also use it when you want to generate a custom Bible based on anime dialogue, Nick Bostrom’s philosophy, and your own Tennysonian solarpunk fiction.

model = Sequential( ) starts the model, which you can design with custom layers, as you will see in the following lines of code.

model.add(LSTM(128, input_shape=(maxlen, len(chars))))

When you do model = Sequential(), you can then choose model.add(Dense()) or model.add(LSTM()). These two are the choices we imported from keras.layers way back at the beginning. They are layers: the columns of neurons you see in the usual picture of a neural network.

Dense() is considered the regular kind of layer. A linear operation in which every input is connected to every output by a weight. To understand what it actually means, you must go here.

We are using an LSTM layer, so we must specify two things: 1. the amount of neurons in the first hidden layer; (which in our case happens to be equal to the batch_size or the number of samples that are going to be propagated through the network)  2. the input_shape which is specified by maxlen and len(chars) in our case. By saying input_shape=(maxlen, len(chars)) we are essentially telling it “Hey, we will be feeding you 40 characters of 57 kinds (the alphabet plus punctuations, etc.)”

The output dimensionality of the LSTM layer and also the batch_size is 128. Unlike input_shape, this number was not determined based on our data. We specify it by convention because it was probably experimentally tested to be useful across many neural network use-cases. You can change it and possibly receive better results. But be warned that a very big batch size may not fit the memory and takes longer to train.

To clarify and summarize: the batch_size denotes the subset size of our training sample (e.g. 100 out of 1000) which is going to be used in order to train the network during its learning process. Each batch trains the network in a successive order, taking into account the updated weights coming from the previous batch. Here, that number is equal to the neurons in our first hidden layer.


model.add(Dense(len(chars)))

This is a linear layer composed of the same number of neurons as there are distinct characters in the text. For example: 57.


model.add(Activation('softmax'))

This is our final layer.

Remember that our goal is to minimize the objective function, which is parametrized by the weights of the network. We update these parameters by nudging them in the opposite direction of the gradient of the objective function. This way, we take little steps downhill. The goal is to reach the bottom of a valley.

[Figure: a function of two inputs drawn as a landscape with hills and valleys]

The image shows a function with two inputs. Our function’s landscape cannot be visualized by humans because it has way more than two inputs.

In order to minimize the cost function, it is important to have smooth non-linear output.

A neural network without an activation function is essentially just a linear regression model. The activation function does the non-linear transformation to the input making it capable to learn and perform more complex tasks. We would want our neural networks to work on complicated tasks like language translations and image classifications. Linear transformations would never be able to perform such tasks.

Activation functions make the back-propagation possible since the gradients are supplied along with the error to update the weights and biases. Without the differentiable non linear function, this would not be possible.

Activation(‘softmax’) squashes the activation of each neuron to range between 0 and 1 by its nature, and makes the activations add up to 1:

softmax(z)_i = exp(z_i) / (exp(z_1) + … + exp(z_N))

This is important for our eventual goal of allowing the network to move to a local minimum by little nudges in the direction of the negative gradient. There are many activation functions, but we are using softmax because it takes an N-dimensional vector as input and returns N values that can be read as probabilities, one for each possible next character.
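To see the 0-to-1 range for yourself, here is a minimal NumPy sketch of softmax. This is a hand-written version for illustration, not the code Keras runs internally; subtracting the maximum first is a common numerical-stability trick that I am assuming here.

```python
import numpy as np


def softmax(z):
    """Turn a vector of raw scores into values between 0 and 1 that sum to 1."""
    z = np.asarray(z, dtype="float64")
    shifted = z - np.max(z)        # does not change the result, avoids overflow
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)


probs = softmax([2.0, 1.0, 0.1])
print(probs, probs.sum())
```

The largest raw score still gets the largest share, but every output stays strictly between 0 and 1.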

optimizer = RMSprop(lr=0.01)

An optimizer is one of the two arguments required for compiling a Keras model.

This optimizer divides the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight.

This helps because we don’t want the learning rate to be too big, causing it to slosh to and fro across the minimum we seek.
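The idea of dividing by a running average of recent gradient magnitudes can be sketched in a few lines. This is a simplified, single-weight version of the commonly described RMSprop rule; the decay rate rho=0.9, the small eps and the toy function f(w) = w**2 are assumptions, and the exact update inside Keras may differ.

```python
import math


def rmsprop_step(w, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
    """One simplified RMSprop update for a single weight."""
    # Running average of the squared gradient for this weight.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # The learning rate is divided by the root of that average.
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq


w_start = 5.0
w, avg_sq = w_start, 0.0
for _ in range(100):
    grad = 2.0 * w            # gradient of the toy function f(w) = w**2
    w, avg_sq = rmsprop_step(w, grad, avg_sq)

print(w)
```

Because the gradient is divided by its own recent size, each step stays small and steady even when the raw gradient is large.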

model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Once you have defined your model, it needs to be compiled. This creates the efficient structures used by the underlying backend (Theano or TensorFlow) in order to efficiently execute your model during training.

loss= The loss function, also called the objective function is the evaluation of the model used by the optimizer to navigate the weight space.

Since we are using categorical labels, i.e. one-hot vectors, we want to choose categorical_crossentropy from the loss function options. If we have two classes, they will be represented as 0, 1 in binary labels and 10, 01 in categorical label format. Our target for each sample is a 1-dimensional vector (one row of y) that is all-zeros except for a 1 at the index corresponding to the class of the sample.
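Here is a bare-bones sketch of what categorical cross-entropy measures for a single one-hot target. The function below is written by hand for illustration; Keras also clips the probabilities and averages over the batch, which this sketch leaves out, and the example numbers are made up.

```python
import math


def categorical_crossentropy(target, predicted):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


target = [0, 0, 1]   # the true next character is class 2

good = categorical_crossentropy(target, [0.1, 0.1, 0.8])   # confident and right
bad = categorical_crossentropy(target, [0.6, 0.3, 0.1])    # confident and wrong

print(good, bad)
```

Only the probability assigned to the true class matters, and the loss grows as that probability shrinks.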

def sample(preds, temperature=1.0):

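def sample takes the probability outputs of the softmax function and outputs the index of a character, drawn at random according to those probabilities. The most probable character is the most likely pick, but not the only possible one.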

The temperature parameter decides how much the differences between the probability weights are weighted. A temperature of 1 is considering each weight “as it is”, a temperature larger than 1 reduces the differences between the weights, a temperature smaller than 1 increases them.

The way it works is by scaling the logits before applying softmax.
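To see what the temperature does to a probability array, here is a sketch that applies only the rescaling steps, without the random draw. The helper name apply_temperature and the example probabilities are assumptions for illustration.

```python
import numpy as np


def apply_temperature(preds, temperature=1.0):
    """Reweight a probability array: log, divide by temperature, renormalize."""
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    return exp_preds / np.sum(exp_preds)


preds = [0.6, 0.3, 0.1]
cold = apply_temperature(preds, 0.2)      # sharper: the favourite dominates
neutral = apply_temperature(preds, 1.0)   # unchanged
hot = apply_temperature(preds, 1.2)       # flatter: underdogs get more chance

print(cold, neutral, hot)
```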

# helper function to sample an index from a probability array

As the comment says, def sample picks an index at random, weighted by the probabilities in the array. The lines below show how.

preds = np.asarray(preds).astype('float64')

np.asarray is the same as np.array except it has fewer options, and copy=False

.astype('float64'): we cast to 64-bit floats, which can represent about 15 to 16 significant decimal digits

preds = np.log(preds) / temperature

np.log(preds) takes the array into the natural log function

/temperature divides the logs by the temperature. The default temperature is 1.0, in which case the division changes nothing, but further down we call sample with other values (0.2, 0.5 and 1.2), and there it matters.

exp_preds = np.exp(preds)

This is part of the common function to sample from a probability vector. It calculates the exponential of all elements in the input array.

preds = exp_preds / np.sum(exp_preds)

np.sum takes the sum of array elements over a given axis. Since we have not specified an axis, the default axis=None, and we will sum all of the elements of the input array.

probas = np.random.multinomial(1, preds, 1)

np.random.multinomial samples from a multinomial distribution. A multinomial is like a binomial distribution but with more than two possible outcomes.

With (1,_,_) we specify that each experiment consists of a single trial. A trial can have p results. For example a die will always yield a number from 1 to 6. We are telling it that we roll our metaphorical die only once per experiment.

(_,preds,_) This middle term expresses the probabilities of the possible outcomes, p.

The (_,_,1) says to run 1 experiment, so only 1 array of counts is returned.

The array holds values that represent how many times our metaphorical die landed on “1, 2, 3, 4, 5, and 6.” Since we only roll once, one entry is 1 and all the others are 0.

return np.argmax(probas)

return np.argmax returns the indices of the maximum values along an axis. We do not specify an axis here. So by default, the index is from the flattened array of probas.
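The last two lines of sample can be tried on their own. The probabilities below are made up, and the seed is set only so that repeated runs give the same draw.

```python
import numpy as np

np.random.seed(0)   # assumption: fixed seed, just for repeatable output

preds = np.array([0.1, 0.2, 0.7])

# One experiment of one trial: a row of counts with a single 1 in it.
probas = np.random.multinomial(1, preds, 1)

# The position of that 1 is the sampled index.
index = int(np.argmax(probas))

print(probas, index)
```

Run it a few times without the seed and index 2 will come up most often, but not every time.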

def on_epoch_end(epoch, logs):

    # Function invoked at end of each epoch. Prints generated text.

    print('----- Generating text after Epoch: %d' % epoch)

The #comment explains that.

start_index = random.randint(0, len(text) - maxlen - 1)

A random integer from 0 to (number of characters in the entire length of the text – 40 – 1). This is the start_index because if we didn’t subtract 41, then some random indices would be so far at the end that they wouldn’t have enough room for the other 39 characters.
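A small sketch of the seed selection, using a made-up text of 100 characters. Note that random.randint includes both end points, which is why the upper limit is len(text) - maxlen - 1.

```python
import random

# Stand-in corpus: the alphabet repeated until it is 100 characters long.
text = "".join(chr(ord("a") + i % 26) for i in range(100))
maxlen = 40

start_index = random.randint(0, len(text) - maxlen - 1)   # 0..59 inclusive
sentence = text[start_index: start_index + maxlen]

print(start_index, sentence)
```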

for diversity in [0.2, 0.5, 1.0, 1.2]:

    print('----- diversity:', diversity)

These are the different values of the temperature hyper-parameter used for generation (we call it a hyper-parameter to distinguish it from the parameters learned by the model such as the weights and biases).

Low temperature = more deterministic, high temperature = more random.

generated = ''

An empty string, '', is assigned to generated.

sentence = text[start_index: start_index + maxlen]

Each sentence has forty characters from the text.

generated += sentence

This adds the 40 characters to generated, which until now was the empty string '', and assigns the combined value back to generated.

print('----- Generating with seed: "' + sentence + '"')

This will print the sentence being used at the moment, in quotes, after the statement ----- Generating with seed:

for i in range(400):

     x_pred = np.zeros((1, maxlen, len(chars)))

This loop runs 400 times, for i from 0 to 399. Each time, it makes x_pred into an array of zeros holding 1 sample, with maxlen rows and len(chars) columns. This would be 40 rows for maxlen and 57 columns for len(chars) in our case. We want to represent which of the different characters appears in each of our 40 slots.

for t, char in enumerate(sentence):

    x_pred[0, t, char_indices[char]] = 1.

For each of the 40 positions in turn, this changes a single zero to a 1 without touching the surrounding zeros, effectively one-hot encoding the current sentence in the 40 by 57 grid.

preds = model.predict(x_pred, verbose=0)[0]

This is for predicting.

model.predict expects the first parameter to be a numpy array. Our numpy array is x_pred, the one-hot encoding of the current 40-character sentence. The [0] at the end picks out the predictions for our single sample.

next_index = sample(preds, diversity)

Remember that we defined the function sample as (preds, temperature=1.0)

Now we are assigning this to the variable next_index.

next_char = indices_char[next_index]

We set our next character to be the next index from indices_char, where every character was assigned an index. Remember that we made a dictionary that converts from index to character, so we can get away with this.

generated += next_char

+= adds another value to the variable’s value and assigns the new value to the variable. So here we are adding the next character to generated, the text produced so far.

sentence = sentence[1:] + next_char

So we make sure that the sentence goes from the second character to the end plus an added character. Notice that the first character in the sentence has index 0, so by starting from 1 we are cutting the first one off to make room for next_char.
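Here is the sliding window in isolation, with the neural network swapped out for a trivial stand-in. The function fake_next_char is purely hypothetical: it replaces model.predict and sample so the loop can run on its own.

```python
def fake_next_char(sentence):
    """Hypothetical stand-in for the model: predicts the window's first character."""
    return sentence[0]


sentence = "abcde"        # the seed, 5 characters instead of 40
generated = ""
generated += sentence

for i in range(3):
    next_char = fake_next_char(sentence)
    generated += next_char                 # the output keeps growing
    sentence = sentence[1:] + next_char    # the window keeps its length

print(generated, sentence)
```

The window always holds the most recent characters, which is exactly what the model gets to look at for its next prediction.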



sys.stdout.write(next_char)

sys.stdout.flush()

sys.stdout.write and sys.stdout.flush() are basically print. So this shows the next character, the one we are adding to the sentence.

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

This is so that every time the epoch ends, everything is printed.

model.fit(x, y, batch_size=128, epochs=60, callbacks=[print_callback])

This is what trains the model. The batch_size is 128, which is the number of training examples in one forward/backward pass. We train for 60 epochs, where one epoch is one forward pass and one backward pass over all the training examples.
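To get a feel for how batch_size and epochs interact, here is a bit of arithmetic. The number of sequences (200,000) is an assumed round figure; use whatever your nb sequences printout says.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of batches (weight updates) needed to see every sample once."""
    return int(math.ceil(num_samples / float(batch_size)))


num_sequences = 200000   # assumption: stand-in for len(sentences)
batch_size = 128
epochs = 60

per_epoch = updates_per_epoch(num_sequences, batch_size)
total_updates = per_epoch * epochs

print(per_epoch, total_updates)
```

The last batch of each epoch is simply smaller than the rest when the sample count does not divide evenly.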

