I remember reading an article… or maybe it was a book about Google where it mentioned that Sergey Brin and Larry Page were researchers during their time at Stanford University and the code that they wrote for their search engine (then called Backrub) wasn’t anything groundbreaking. The algorithm implemented was the real backbone but the program itself wasn’t really optimised. Although the interface was simple — a plain input box on a blank white page, so much so that testers waited for some time before typing in the text box because they were “waiting for the page to load” — it wasn’t professional level code. Search engines back then were cluttered with other useless junk that made it harder for people to actually get to what they needed. Google was different. It was new.
What am I trying to say here? Well, the fact that the first coders at Google weren’t big optimisation fanatics when they were creating their soon-to-be multi-million dollar project and then recently the development of Go, an entire language created for optimisation, says a lot.
I was wondering about this today when I was writing a simple blog today. It was made using Django as an API and ReactJS as a frontend that feeds off the API. I use PyCharm for most of my Python work and since I’ve a Jetbrains Student license, I have access to other IDEs by them. But I realised that I had grown too comfortable to all those auto complete features so I decided to write this project using 100 % open source tools (with Github being the sole exception). So I fixed at the gnome-terminal, Atom, create-react-app for bootstrapping my React app and finally, Git for my version control. With all the boiler place code set up, I went to work. I always start with creating my models. Since this was supposed to be a blog, I created a post
app and opened the models.py
file. With that, I wrote the following code —
from django.db import models
from django.utils.text import slugify
from django.db.models.signals import pre_save
class Post(models.Model):
title = models.CharField(max_length=50)
content = models.TextField()
description = models.TextField(blank=True, null=True)
thumbnail = models.ImageField(upload_to='post_thumbnails')
draft = models.BooleanField(default=False)
slug = models.SlugField()
author = models.CharField(max_length=30, default='mentix02', blank=True)
timestamp = models.DateTimeField(auto_now_add=True, auto_now=False)
def create_slug(instance, new_slug=None):
slug = slugify(instance.title)
if new_slug is not None:
slug = new_slug
qs = Post.objects.filter(slug=slug).order_by('-pk')
exists = qs.exists()
if exists:
new_slug = '{}-{}'.format(slug, qs.first().pk)
return create_slug(instance, new_slug=new_slug)
return slug
def pre_save_post_receiver(sender, instance, *args, **kwargs):
if not instance.slug:
instance.slug = create_slug(instance)
pre_save.connect(pre_save_post_receiver, sender=Post)
I really hate how Medium displays code. There isn’t any syntax highlighting or auto indenting — another clue for how dependent I’ve become on expensive code editors. Regardless, the fact that the model fields weren’t arranged in ascending order according to their length was really ticking me off. As in, the code seemed really chaotic and it’s just a personal taste of my to have the shortest lines of code at the top and descend down in order. I know that it’s inevitable sometimes because of control flow but in places where I just have to set attributes and where execution order doesn’t matter, I format it my way.
In PyCharm, I used the line moving shortcut to move lines upwards or downwards — ctrl+shift+<up>or<down>
. It was a great shortcut and I loved using it.
However, I didn’t know how to do that in Atom. So… instead of cutting and pasting things, I opened up a Python REPL shell. And got to work.
My first approach to the problem was really bad.
I decided to split the lines and store the code lines in a list. Then I’ll create a dictionary with the keys as the length of the code string and the actual value as the code itself. Sorting of the dictionary according to the keys will follow and then finally, a for
loop would print the formatted lines.
That, I now realise, is just horrible.
Instead a simpler way to do that would be —
Sorting the list of strings according to their lengths.
To be honest and in my defence, that is what I was trying to achieve but I figured I had to relate the lengths of the strings to themselves and a dictionary seemed like the way to go.
Anyway, I spent around ten minutes coding a function and the final result was this —
That sucks. I didn’t time them but it clearly looks worse than what I did later
Just. One. Line.
print("\n".join(sorted(fields, key=len)))
Regardless, it probed me to think more about optimisation. That took me back to my Computer Science classes. We were just recently studying sorting… and it was really boring. Not because it wasn’t interesting but because I had already studied about it. We started with the simplest of them — insertion sort. Now if you’re a Computer Science major or have studied it in some depth, you’d know that time complexities are a big mathematical part when dealing with things like sorting, searching, factorising, etc, etc but get this, our syllabus doesn’t even have the term “time complexity” mentioned anywhere. It’s true. Now why that is, I’ll get to in another post but long story short — our syllabus is really outdated and poorly designed.
And then we did bubble sort and after that… nothing. Our book mentioned Quicksort, Heapsort, Selectionsort and I think that’s it but it said that it’s not a part of our syllabus. I really don’t want to get into the state of CS at my school but long story short, we ended there. But I went home and looked up other sorting algorithms because I was dusty and wanted to revise everything. So I came across something called Sleepsort. It was a joke sort that was posted one 4chan by some user and basically, it was an incredibly inefficient sort that used the concept of printing numbers after the_value_of_number
milliseconds. Didn’t understand? Well, here’s the pseudo code for it —
function print_number(num) {
sleep num milliseconds
print num
}for num in numbers {
run in background print_number(num)
}
That’s it. Still don’t get it? Alright, take an example of the following numbers — [9, 3, 14, 4]. Clearly, they aren’t arranged in any order. What the sorting algorithm tries to do is loop through all the numbers and then print out the number n
after n
milliseconds. So it’ll first encounter the number 9 and so it’ll start a process in the background that will print the number 9 after 9 milliseconds. Without waiting for that process to finish, it’ll print out 3 after 3 second and so on and so forth.
I took inspiration from this answer on Quora and the author was kind enough to provide the time complexities for the following too — O(max(args)) in a perfect world where there’s no lag and computers are incredibly fast whereas O(n² + max(args)) is the actual complexity since, in the answerer’s own words —
The complexity of this algorithm in a perfect world is
O(max(args))
, as it will takemax(args)
seconds to print out the biggestarg
. In reality, the complexity isO(n^2 + max(args))
, because maintaining multiple background processes relies on the operating system to manage the context switching and prioritization of the processes, and so the algorithm basically outsources the actual sorting to the kernel.
Overall, I find topics like these really interesting and so I found a new fire within me to go on ProjectEuler once again and solve all the problems. I’m not there yet… but I’ve solved around fifty of them. A lot of them involved prime numbers of huge magnitude and the basic algorithms I learnt at school failed me —
def is_prime(n):
for i in range(2, n):
if n % i == 0:
return True
else:
return Falseprimes = [num for num in range(2, 100000) if is_prime(num)]
It seems really logical but division for huge numbers over a huge list is actually a pretty expensive operation. And so I Googled and learnt about Sieve Theory. This lead to Prime Number Generation. And finally, this gold article.
Long story short, I finally found out how effective methods (pun intended) and algorithms for finding out prime numbers for a huge number. A post regarding that might follow but until then, I’ve been Manan.
Well, I will be Manan for quite some time but sure. Have a good one.