rand9

Binary Search in Ruby, or, “Picking the Right Number, as Quickly as Possible”

So you’re writing a parser in C that parses the lines of a file. The line you’re parsing is made up of a 40 character key and any number of ip addresses after, space-separated. You need to know a max line length to read (because C is mean like that), but you’re not sure how many ip’s you can fit on a line for a given key.

Such was my case yesterday and decided to write a mini script in ruby to figure it out. My first stab was to iterate from 1 to 100 and checking the line lengths by literally building a line with x number of ip elements on the line. While the code was correct and produced the necessary information for the given inputs, it was horribly inefficient and so I decided to rewrite it to be smarter. Enter the Binary search algorithm.

Using the binary search algorithm, we take a lower and an upper bound of possible elements and try to quickly guess which number is the highest possible without exceeding the line limit. So here’s the concrete data we know. The line format (as described above) will look something like this:

1
86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 174.18.0.1 174.18.0.2 174.18.0.3 174.18.0.4 174.18.0.5 174.18.0.6

… with theoretically unlimited ips per line. The first value is a key we’ll use to store the ips against in a lookup table, but don’t worry about that right now. The key is generated using sha1 digesting, so we know it will always be 40 characters. The max length for any given ip address is 15 assuming all 4 blocks are at least valued at 100 (e.g. 100.100.100.100). Space-separating the key and x number of ips and your line length calculation is f(x) = kl + (el*x) + x where x is line length, kl is key length, and el is element length (ip address length). In other words, if we’re testing 50 elements on the line, the line length would be 40 + (15*50) + 50 which equals 840.

Now that we can arbitrarily calculate the length of a line based on the number of ip elements we want to test, we can start “guessing”. This isn’t guessing at all, we just split our possible range in half and use the middle ground to test the possible length. In other words, if my initial range is 1..100 (read as “anywhere from 1 ip element to 100 ip elements”), then our first test value for x above would be 50, which if you remember produces a line length of 840. I assumed that I’d be okay with a max line length of 1000 characters, and so we assert that if len is less than the max, then we can use the upper half of the range boundary, or 50..100. If len was more than our max of 1000, we’d take the bottom half, or 1..50.

Using this technique recursively we can whittle down to the exact number of ip elements that can be inserted on a line before we go over the limit of 1000 characters on the line, which happens to be 60. You know you’re done checking when your range is only one element apart, in this case 60..61. With my first solution to iterate up from 1 to 100, this meant we had to check 61 times before we knew we were over the limit. With this new range, we actually only needed 8 iterations! Very cool how “guessing” can solve the problem quite nicely.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
require 'digest/sha1'
@k_len = Digest::SHA1.hexdigest('a').size # 40
@ip_len = "255.255.255.255".size # 15
@range = 1..100 # starting range
@max_line_len = 1000 # length to check against
@count = 0 # iteration counter
# Given a upper and lower boundary, determine if its
# middle value is over or under the given line length
# If over, use the lower boundary (lower..mid) for a recursive check,
# otherwise use upper boundary (mid..upper)
def check_boundary(lower, upper)
# determine middle value
mid = lower + ((upper-lower)/2)
# Exit recursion if we've found the value
throw(:found_value, mid) if (upper-lower) == 1
# only increment iter count if we're checking the length
@count += 1
# Get the line length for the variable number of elements
len = @k_len + (@ip_len*mid) + mid
# Perform the test
if len > @max_line_len
puts_stats lower, mid, upper, len, :over
# use the lower boundary
check_boundary(lower, mid)
else
puts_stats lower, mid, upper, len, :under
# use the upper boundary
check_boundary(mid, upper)
end
end
# Log method for values in a given test
def puts_stats lower, mid, upper, len, over_under
puts '%10d | %10d | %10d | %10d | %10s' % [lower, mid, upper, len, over_under]
end
# Specify some information for readability
puts 'Determining how many ip elements can sit on a line with a max length of %d' % @max_line_len
puts
legend = '%10s | %10s | %10s | %10s | %10s' % %w(lower mid/test upper len over/under)
puts legend
puts '-'*legend.size
# Run the recursive boundary checking
golden_ticket = catch(:found_value) do
check_boundary(@range.first, @range.last)
end
# Output results
puts
puts 'Golden Ticket (under) = %s' % golden_ticket.to_s
possible_iterations = @range.last-@range.first
efficiency = @count.to_f / possible_iterations.to_f
puts '%d iterations for %d possible iterations (%f efficiency)' % [@count, possible_iterations, efficiency]

Running the above script will produce the following output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Determining how many ip elements can sit on a line with a max length of 1000
lower | mid/test | upper | len | over/under
--------------------------------------------------------------
1 | 50 | 100 | 840 | under
50 | 75 | 100 | 1240 | over
50 | 62 | 75 | 1032 | over
50 | 56 | 62 | 936 | under
56 | 59 | 62 | 984 | under
59 | 60 | 62 | 1000 | under
60 | 61 | 62 | 1016 | over
Golden Ticket (under) = 60
8 iterations for 99 possible iterations (0.080808 efficiency)

I’m not really sure if the efficiency part makes sense, but you get a sense that it’s a LOT faster, not only because we’re calculating the line length per test, but also because we’re recursing a fraction of calls that the brute force method performs. It’s also fun to inflate/deflate the max line len or the starting range values to see how it affects the number of recursions needed to find the number. For instance, set the max line len to 100000 and see how many extra calls have to be made. Also, what happens if your range isn’t big enough? What if the range is off (e.g. 75..100)?

Algorithms are nifty.

Read on →

Git Pre-receive Hook for Rejecting a Bad Gemfile

Bundler has a cool facility with Gemfiles that allow you to specify some fine-grained options for a given gem beyond specifying a version. Things like :path, :branch, :git, and :tag. All of those things are neat for development, but horrible for production. I wanted a way to reject pushes to a repo if the Gemfile was changed to include any one of those options, and a git pre-receive hook was just the tonic.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
#!/usr/bin/env ruby
BRANCHES = %w( master stable )
REJECT_OPTIONS = %w( git tag branch path )
old_sha, new_sha, ref = STDIN.read.split(' ')
exit 0 unless BRANCHES.include?(ref.split('/').last)
diff = %x{ git diff-index --cached --name-only #{old_sha} 2> /dev/null }
if diff.is_a?(String)
diff = diff.split("\n")
end
if diff.detect{|file| file =~ /^Gemfile$/}
tree = %x{ git ls-tree --full-name #{new_sha} Gemfile 2> /dev/null }.split(" ")
contents = %x{ git cat-file blob #{tree[2]} 2> /dev/null }
invalid_lines = contents.each_line.select do |line|
line =~ /\b(#{REJECT_OPTIONS.join('|')})\b/
end
unless invalid_lines.empty?
puts
puts '> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
puts '> ---- PUSH REJECTED by origin ----'
puts '>'
puts "> You've specified an invalid option for #{invalid_lines.size} gem definitions in the Gemfile"
puts "> Invalid options are: #{REJECT_OPTIONS.join(', ')}"
puts '>'
puts "> The offending gems:"
puts ">\t" + invalid_lines.join(">\t")
puts '>'
puts '> To fix:'
puts ">\t* Remove the offending options"
puts ">\t* bundle install"
puts ">\t* Run tests"
puts ">\t* Ammend previous commit (git add . && git commit --amend)"
puts ">\t* git push origin #{ref.split('/').last}"
puts '>'
puts '> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
puts
exit 1
end
end

The script above monitors pushes to the “master” and “stable” branches (our development and production lines, respectively). It checks to see if the Gemfile was listed in the new commit file list, then parses the blob of the Gemfile for any of the offending options. Each offending line is then output back to the pushing developer with instructions on how to fix his/her Gemfile and how to amend the commit. Here’s what the output looks like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
$ git push origin master
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 362 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
remote:
remote: > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
remote: > ---- PUSH REJECTED by origin ----
remote: >
remote: > You've specified an invalid option for 2 gem definitions in the Gemfile
remote: > Invalid options are: git, tag, branch, path
remote: >
remote: > The offending gems:
remote: > gem 'utilio', :git => 'git@github.com:localshred/utilio.git'
remote: > gem 'rails', :git => 'git@github.com:rails/rails.git'
remote: >
remote: > To fix:
remote: > * Remove the offending options
remote: > * bundle install
remote: > * Run tests
remote: > * Ammend previous commit (git add . && git commit --amend)
remote: > * git push origin master
remote: >
remote: > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
remote:
To git@git.mycompany.com:repo1.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@git.mycompany.com:repo1.git'

It’s also worth noting that since this is a pre-receive hook, when returning an exit status of anything but 0, git will reject merging the commits. This is good because we don’t want “bad code” in our repo. You could also use this to do other checking measures, such as running a CI build or syntax checks.

To use the above hook, simply copy the script above into the ./hooks/pre-receive file in your origin repo. Be sure to chmod +x ./hooks/pre-receive otherwise git won’t be able to invoke the script when a new push occurs. We have ~15 repos that I manage at work that I want to use the hook on, so I just kept the file out on the git user’s home directory and symlinked it back to each repos hooks directory. Same results, just easier to manage if I need to make a quick change to the hook.

Happy coding.

Read on →

Thor Script for Managing a Unicorn-driven App

Today I deployed a mini sinatra app on one of our test servers to manage some internal QA. I’ve put out quite a few apps backed by Unicorn in QA recently and finally wrote a little script to handle stopping, starting, and reloading of the unicorn processes. Nothing super special here, just thought I’d share a useful script. Drop the following code into your application’s tasks directory, or place it on the app root and call it Thorfile.

tasks/unicorn.thor (or Thorfile)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# put me in /path/to/app/tasks/unicorn.thor
require 'thor'
class UnicornRunner < Thor
include Thor::Group
namespace :unicorn
UNICORN_CONFIG = "/path/to/app/config/unicorn.rb"
RACKUP_FILE = "/path/to/app/config.ru"
PID_FILE = "/path/to/app/tmp/application.pid"
desc 'start', 'Start the application'
def start
say 'Starting the application...', :yellow
`bundle exec unicorn -c #{UNICORN_CONFIG} -E production -D #{RACKUP_FILE}`
say 'Done', :green
end
desc 'stop', 'Stop the application'
def stop
say 'Stopping the application...', :yellow
`kill -QUIT $(cat #{PID_FILE})`
say 'Done', :green
end
desc 'reload', 'Reload the application'
def reload
say 'Reloading the application...', :yellow
`kill -USR2 $(cat #{PID_FILE})`
say 'done', :green
end
end

Usage

From your application root directory, run any of the three commands. Keep in mind you’ll need a unicorn config file that actually dictates how Unicorn should behave (like number of workers, where your logs go, etc). You’ll also need a Rackup file (config.ru) which tells unicorn how to run your app.

1
2
3
4
5
6
7
8
$ thor -T
# thor unicorn:start Start the application
# thor unicorn:stop Stop the application
# thor unicorn:reload Reload the application
$ thor unicorn:start # starts the unicorn master
$ thor unicorn:stop # sends the QUIT signal to master (graceful shutdown)
$ thor unicorn:reload # sends the USR2 signal to master (graceful reload of child workers)

Plop this puppy behind nginx and you’re golden. Thor has a lot more things you could do with this (like overriding which config file to use) by providing method-level options, but this is a great starting point for most people. Leave a comment if you have any improvements or other ways you handle this.

gipoco.com is neither affiliated with the authors of this page nor responsible for its contents. This is a safe-cache copy of the original web site.