
iHeartRadio Tech Blog

Jan 13, 2015

Open M3U8

We are happy to announce another open source project in audio streaming. Open M3U8 is an easy-to-use, easy-to-extend parser for M3U8 files.

https://github.com/iheartradio/open-m3u8

Why Open M3U8?

The License

Before Open M3U8, the only open source option available was LGPL-licensed, which introduces artificial challenges when including it in Android apps. Open M3U8 uses the MIT license, which puts far fewer restrictions and requirements on developers, so they can feel free to use and modify the library to fit their needs.

The Architecture

Open M3U8 is designed to make it very easy to adjust to any changes in the M3U8 specification, including additions, changes, and removals. Considering the specification is not yet an RFC, this is very important to the vitality of the project. The project is not yet complete and is still in its infancy. If there are tags that are not implemented yet, it is easy for any developer to add support for them and contribute that work back to the main project.

The Language

Although developed for use in an Android app, Open M3U8 is not restricted to the Android platform. Because it is written in pure Java, it can be used just about anywhere.

Aug 23, 2014

QuickIO

A little over a year ago, iHeartRadio started open sourcing projects it uses internally for everyone to consume. Our first open source project was a modest graph wall, and we’ve been slowly releasing more and more since then. Today sees the release of our biggest project yet to the world of open source under a free MIT license: QuickIO.

QuickIO is the software that powers the real-time stack behind iHeartRadio. Built to handle a combination of WebSockets and HTTP long polling, this little server has proven to be incredibly powerful and scalable when dealing with millions of simultaneous users. While still very much in its infancy, QuickIO promises to develop into a very strong, very scalable platform.


Jul 17, 2014

Introducing ShinyBuilder - an open source, point-and-click dashboard builder GUI


We are proud to introduce ShinyBuilder, an open source, point-and-click dashboard platform based on R/Shiny which makes it simple to create and share live, connected dashboards. 

To try it out, please see the ShinyBuilder Live Demo.

ShinyBuilder was created with the following goals:

To learn more, including how to install and configure ShinyBuilder, please visit our GitHub page.
 
We’ve had a great time using ShinyBuilder at iHeartRadio, and hope you do as well!  Comments, suggestions, bug reports/fixes etc. are warmly welcomed!
Also blogged on www.r-bloggers.com

Jul 02, 2014

Android APK Metrics

APK Information Script

Intro

This post will show how to extract key metrics from an Android app binary (.apk), report them to TeamCity as custom metrics using system messages, and plot the metrics over time on your build overview.

Why?

We wanted to extract interesting information from our APKs, both manually and as part of an automated build process, so we can track changes over time and link APK metric changes back to specific source changes and commits.

How?

Run this script in any folder containing one or more .apk files.

Result (normal mode):

user$ apk-stats.sh
classes methods dex-bytes   apk-bytes   apk
790 6451    9027848 1080992 SomeApp-production.apk
760 6461    9027860 1081028 SomeApp-debug.apk
700 6661    9027848 1081010 SomeApp-stage.apk

TeamCity Support

To collect this information from a TeamCity job, the script supports TeamCity's system messages, which you can in turn use to create nice custom charts for your project/job configuration. Simply run the script with the parameter teamcity and the system messages will be echoed to the terminal.

The custom metrics are reported like this:

Result (Team City mode):

apk-stats.sh teamcity
Team City system messages enabled!
classes methods dex-bytes   apk-bytes   apk
790 6451    3027848 5800992 SomeApp-production.apk
##teamcity[buildStatisticValue key='classes-SomeApp-production.apk' value='790']
##teamcity[buildStatisticValue key='methods-SomeApp-production.apk' value='6451']
##teamcity[buildStatisticValue key='dex-size-SomeApp-production.apk' value='3027848']
##teamcity[buildStatisticValue key='apk-size-SomeApp-production.apk' value='5800992']

TeamCity will automatically pick up these values and you'll see them under the job's Parameters tab and under Reported statistic values; click a graph there to see the trend.

Custom chart setup

For a nice graph of these metrics on your project's Statistics tab, change your TeamCity master's .BuildServer/config/projects/:yourProject/pluginData/plugin-settings.xml to something like this:

<settings>
        <custom-graphs>
                <graph title=".dex Information">
                    <properties>
                        <property name="axis.y.max" value="65536"/>
                        <property name="axis.y.min" value="0"/>
                        <property name="height" value="300"/>
                    </properties>
                    <valueType key="methods-SomeApp-production.apk" title="Methods (Production)" buildTypeId="project_SomeJob"/>
                    <valueType key="classes-SomeApp-production.apk" title="Classes (Production)" buildTypeId="project_SomeJob"/>
                </graph>
                <graph title="APP Size" format="size">
                    <properties>
                        <property name="height" value="300"/>
                    </properties>
                    <valueType key="dex-size-SomeApp-production.apk" title=".dex (Production)" buildTypeId="project_SomeJob"/>
                    <valueType key="apk-size-SomeApp-production.apk" title=".apk (Production)" buildTypeId="project_SomeJob"/>
                </graph>
        </custom-graphs>
</settings>

This will give you custom graphs for the two groups of metrics reported: app size (split into .apk size and actual code size) and dex information (method and class counts).

apk-stats.sh (chmod +x)

#!/bin/bash

DEXDUMP=$ANDROID_HOME/build-tools/19.1.0/dexdump
DEX_FILE="classes.dex"
FORMAT_TERMINAL="%s\t%s\t%s\t%s\t%s\n"
TEAMCITY=0

die()
{
    echo "Processing failed: ${1}"
    exit 1
}

#apk classes methods dex-bytes apk-bytes
print_result_terminal()
{
    printf ${FORMAT_TERMINAL} $2 $3 $4 $5 $1
}

print_result_teamcity()
{
    printf "##teamcity[buildStatisticValue key='%s-%s' value='%s']\n" "classes" $1 $2
    printf "##teamcity[buildStatisticValue key='%s-%s' value='%s']\n" "methods" $1 $3
    printf "##teamcity[buildStatisticValue key='%s-%s' value='%s']\n" "dex-size" $1 $4
    printf "##teamcity[buildStatisticValue key='%s-%s' value='%s']\n" "apk-size" $1 $5
}

process_apk()
{
    rm -f $DEX_FILE
    unzip -q -j $1 $DEX_FILE -d . || die "Could not unzip ${1}, does the file exist?"
    CLASS_COUNT=$($DEXDUMP $DEX_FILE | grep 'Class descriptor' | wc -l)
    # method_ids_size is a 32-bit little-endian value at offset 0x58 (bytes 88-91) of the dex header;
    # see the note after this script.
    METHOD_COUNT=$(cat $DEX_FILE | head -c 92 | tail -c 4 | hexdump -e '1/4 "%d"')
    # stat -f%z is the BSD/macOS form; on GNU/Linux use stat -c%s instead.
    DEX_SIZE=$(stat -f%z $DEX_FILE)
    APK_SIZE=$(stat -f%z $1)
    print_result_terminal $1 $CLASS_COUNT $METHOD_COUNT $DEX_SIZE $APK_SIZE
    if [ $TEAMCITY == 1 ]; then
        print_result_teamcity $1 $CLASS_COUNT $METHOD_COUNT $DEX_SIZE $APK_SIZE
    fi
    rm -f $DEX_FILE
}

process_all_apks()
{
    for f in *.apk
    do
        process_apk $f
    done
}

if [[ "$1" == "teamcity" ]]; then
    echo "Team City system messages enabled!"
    TEAMCITY=1
fi

echo "classes\tmethods\\tdex-bytes\tapk-bytes\tfile"
process_all_apks
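
For reference, the method count in the script above is read straight out of the dex header: method_ids_size is a 32-bit little-endian integer at byte offset 0x58 (88), which is why the script pipes the first 92 bytes of classes.dex through tail -c 4. Here is a minimal Python sketch of the same reads (a hypothetical illustration, not part of apk-stats.sh, assuming classes.dex has already been extracted to the current directory):

#!/usr/bin/python
# Sketch: read class and method counts directly from the dex header.
import struct

with open("classes.dex", "rb") as f:
    header = f.read(0x70)  # the dex header is 0x70 bytes long

# All dex header fields are unsigned 32-bit little-endian integers.
method_ids_size = struct.unpack_from("<I", header, 0x58)[0]  # number of method ids
class_defs_size = struct.unpack_from("<I", header, 0x60)[0]  # number of class definitions

print "classes: %d, methods: %d" % (class_defs_size, method_ids_size)

The class_defs_size value should line up with the count the script gets from grepping dexdump's "Class descriptor" lines.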

Oct 22, 2013

Converting GHUnit to XCTests

Motivation

With the release of Xcode 5 and Xcode Server, we wanted to convert our old GHUnit test suite (some 250 files, 400 unit tests) to Apple's new testing framework so we could utilize the new features for automated testing on Xcode Server.

GHUnit has served us well, but we want bots!

Out with our old trusty Jenkins CI instance, in comes Xcode Bots!

How?

Our test suite was fairly large, so we definitely didn’t want to visit every file and do this by hand.


GHUnit is very similar to Apple's SenTestingKit and the new XCTest, so we figured some clever regexp'ing and bash wizardry would do the job for us.

Here’s how you can convert your suite easily with a minimum of manual labour.

  1. Create a new XCTest unit test bundle for your target under test.
  2. Drop in a replacement for GHUnit's GHAsyncTestCase: say hello to XCTAsyncTestCase. Put this file in your project as part of your test target.
  3. Make all of your old GHUnit test case classes members of your new target. Usually the quickest way to do this is to find them in Finder and drag them onto your target. If you like manual labour, you can go through all of the files in Xcode and tick the box in the right panel so they become part of your test target.
  4. Run the following script to replace all imports, subclassing, and assertions with XCTest's: GHTestCase will become XCTestCase, GHAssertNil will become XCTAssertNil, and so on. Download this script, make it executable using chmod +x xctestify.sh, and run it with ./xctestify.sh in the folder(s) containing your test files. A minimal sketch of the kind of substitutions it performs is shown right after this list.
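
For illustration, here is a minimal sketch (written in Python rather than the original bash/sed script) of the kind of blanket substitutions xctestify.sh performs. The exact replacement list is an assumption based on the GHUnit-to-XCTest names mentioned above; extend it to cover whichever GHAssert macros your suite uses:

#!/usr/bin/python
# Hypothetical sketch of the GHUnit -> XCTest renames; not the actual xctestify.sh.
import os
import re

REPLACEMENTS = [
    (r'#import\s+<GHUnitIOS/GHUnit\.h>', '#import <XCTest/XCTest.h>'),
    (r'#import\s+"GHUnit\.h"', '#import <XCTest/XCTest.h>'),
    (r'\bGHTestCase\b', 'XCTestCase'),
    (r'\bGHAsyncTestCase\b', 'XCTAsyncTestCase'),
    (r'\bGHAssertNil\b', 'XCTAssertNil'),
    (r'\bGHAssertNotNil\b', 'XCTAssertNotNil'),
    (r'\bGHAssertTrue\b', 'XCTAssertTrue'),
    (r'\bGHAssertFalse\b', 'XCTAssertFalse'),
    (r'\bGHAssertEqualObjects\b', 'XCTAssertEqualObjects'),
]

# Walk the current folder and rewrite every Objective-C header/implementation in place.
for root, dirs, files in os.walk('.'):
    for name in files:
        if not name.endswith(('.h', '.m')):
            continue
        path = os.path.join(root, name)
        with open(path) as f:
            source = f.read()
        for pattern, replacement in REPLACEMENTS:
            source = re.sub(pattern, replacement, source)
        with open(path, 'w') as f:
            f.write(source)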

Result

Our test suite is now runnable as a proper iOS test target with a dependency on the target project; no more making sure that your classes are members of both the actual target and the test target. Win!


Apr 09, 2013

StatsPi: our first entry into the world of open source

I’m just going to let the picture of what we did speak for itself:

[Screenshot: StatsPi dashboards showing graphs of our infrastructure]

Yes, you want that.  Graphs of your infrastructure all over the place, all the time.

Featuring:

Get it now: https://github.com/iheartradio/statspi

We’ve got a few other projects in the works; keep your eyes peeled.

Jan 23, 2013

A Custom Icon Font of Our Own

We have dozens of icons on our site and a handful of intricate sprites to serve them. Even though our trusty sprites have never let us down, the idea of creating a custom icon web font to replace them was appealing. Styling elements with CSS eliminates the multiple image states necessary when using sprites; icons could be colored, shadowed, rotated, transitioned, and resized without opening Photoshop. The eleven PSDs I’ve maintained (including a total of 40 icon and 14 logo treatments) could be tossed. And to top it off, webfonts are resolution independent, rendering beautifully at all pixel densities–good news while adapting our platforms to higher resolutions.

Collecting our assets

I work solely on the web platform, but it made sense to include the icons we use on mobile as well. This proved worthwhile as it shed light on inconsistencies we had across platforms and gave us a chance to rectify them. After deciding on one official set of icons to be used globally, I set out to turn my vectors into glyphs. I opened all of our icons in Illustrator, did any path cleanup necessary (a good rule of thumb is the fewer points the better), resized each to the same dimensions, and saved them out as SVGs.

Finding the right tools

After some light googling for font applications, I decided to try out FontForge. I should note that I designed a custom font years ago in design school, and when I opened FontForge I was instantly reminded of the tediousness. Foreshadowed by a circuitous installation, working with FontForge was very, very frustrating. While it does have the virtue of being free, I proved too impatient to decode and fix my font’s mysterious errors.


Annoyed, yet undaunted, I went in search of another method and found the lovely (and also free) Hieroglyph. I was thrilled when thirty minutes later I had my icon font, but soon realized there was a serious deal breaker. Hieroglyph will only create glyphs from vectors that are a single continuous path. Half of our icons are simple enough to be one small shape, but the others have necessary additional components (e.g. radio waves, talk bubbles) which Hieroglyph would simply omit. Back to square one.

From Vectors to Glyphs

After striking out twice, my determination faltering, it was high time I entertained the idea of purchasing software. Along came Glyphs, and with its handy 30-day demo, I was finally able to create our font in all its multi-pathed glory. Glyphs is a pleasure to work with, well documented, and intuitive: definitely worth the price tag.


Implementation

After exporting from Glyphs and running the FontSquirrel @font-face generator, the iHeartIcons webfont was ready for action. Now to swap out the sprites! I would have preferred to use the :before selector to avoid additional markup, but, unfortunately, we still support IE7 (honestly, I’m just thrilled it supports webfonts at all). To appease IE7, I had to include the actual letter of the icon in the html. Initially, each icon was mapped to a corresponding letter for easy recall–e.g. P for Play–but now those letters would be read by screen readers. I subsequently remapped all of the icons to either existing symbols or private use characters (where they really should have been all along), making them both semantic and screen reader-friendly.

This technique works so magically well! It’s really delightful every time you get to toss out a now useless sprite. It’s supported by all modern browsers (and even the aforementioned broken one), and simple to implement. I did need to bump up the text sizes a hair on Windows because their font rendering made the icons appear jagged, but this was a compromise that only improves usability for our users.

Update!

As much as I liked using Glyphs, I’m now using the service Icomoon instead. Icomoon streamlines the process even further by generating the webfont automatically and offers hundreds of free icons that you can pick and choose to include in your pack. Their web app is very easy to use, free, and by far the quickest solution for making your icon font–just upload your SVGs and hit the ground running.


Jan 16, 2013

Hadoop Streaming logfiles into HDFS/HIVE

Tutorial on how to process web log files and stream data into HDFS to be manipulated by HIVE (Hadoop)

This post will show how you can process the log files from your web/API servers. It will show how you can clean up the log file format to fit better with HIVE table formats, how you can use Python scripts to retrieve the city from the IP address in the access logs, and how to store the data as an external table in HIVE. We will then run some simple HIVE queries on the data to show everything is working as it should.

Step 1 - Get the log files into the right format and ingested into HDFS

We will process a very small segment of a log file. Essentially, the format of the log files is as follows:

Fields: cs(X-Cluster-Client-IP) date time cs-method cs-uri sc-status cs(Session-Id) s-dns cs(Device-Name) cs(Client-Id) cs(Profile-Id) x-P(contentType) x-P(contentId) x-P(host) x-P(sessionId) x-P(profileId) x-P(userName) time-taken c-ip

A typical entry will be as follows:

'64.134.68.183' 2012-11-27 18:00:00 POST /api/v1/sevicecall/endpoint 200 'gqYdeilrVlRVY5SyKB6nPA==' apiserver1 'SGH-T989' '359626040714460' '12349' - '3801325' 'mobile.app.thumbplay.com' 'gqYdeilrVlRVY5SyKB6nPA%3D%3D' '12349' - 0.002 10.90.50.2

We want it to be stripped of all unnecessary characters so the HIVE ingestion is as simple as possible. The format I would like is the following:

64.134.68.183 2012-11-27 18:00:00 POST /api/v1/sevicecall/endpoint 200 gqYdeilrVlRVY5SyKB6nPA== apiserver1 SGH-T989 359626040714460 12349 - 3801325 mobile.app.thumbplay.com gqYdeilrVlRVY5SyKB6nPA%3D%3D 12349 - 0.002 10.90.50.2

A simple way of doing this is to run a Hadoop streaming job that cleans the data and then adds the output to an HDFS folder.

The first thing we do is upload the raw access log file into HDFS. You can use the command below to simply put the file into a folder on HDFS.

hadoop fs -put access.log /yourfolder/access.log

You now have the raw log file in HDFS, and we can run a Hadoop streaming job to transform the data into the right format for loading into HIVE.

hadoop jar $STREAMJAR -Dmapred.reduce.tasks=0 -input /yourfolder/access.log -output /yourfolder/processed/access_log -mapper createAccessLogInHDFS.py -file createAccessLogInHDFS.py

This job uses a Python script as the mapper to stream the data line by line into a new file on HDFS. The Python script does the following:

#!/usr/bin/python
import datetime
import re
import sys

for line in sys.stdin:
    line = line.strip().strip('\n').replace('\'','')
    print line

You now have a folder on HDFS that's ready to be ingested into HIVE. To load the data into HIVE, you need to do a couple of things. First of all, you need a regex that matches the lines in the log file. Loading into HIVE is very picky, so the regex needs to match each line or HIVE will just insert NULL for all values. To make this easier to test, I wrote the regex in a Python script and then, once I had confirmed I could match all the lines, added it to the HIVE script.

Here is the regex code I used to validate the process:

#!/usr/bin/python

import datetime
import re
import sys

# For extracting request info from the request.

p = "^([\\'+\\d.\\S+]+) (.{19}) (\\w+) ([\\/+\\S+]+) (\\d+) (\\S+) ([\\w+\\d+\\.+]+) ([\\'+\\d.\\S+\\s+]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \\S+ ([\\d.]+) ([\\d.]+).*"

pattern = re.compile(p)
file = open("/home/mydir/access.log", "rb")

for line in file:
    res = pattern.match(line)
    if res:
        print 'Match found: ', res.group()
    else:
        print line
        print "no match"

print "done"

I am not going into detail on how the regex works, but the expression above does the job. Please note that you need the .* at the end of the pattern, and that the backslash escaping must be carried over exactly as written when the pattern is embedded in the HIVE script. You will now create the HIVE script that loads the data into HIVE.

-- Load the access.log file as an external table.
-- Note that all types must be strings for the regex serde
DROP TABLE ACCESS_LOGS;

CREATE EXTERNAL TABLE ACCESS_LOGS (
ip_address string,
date_time string,
method string, 
uri string,
status string,
session_id string,
api_server string,
n1 string, 
n2 string, 
cid string,
n3 string,
pid string,
sid string,
pid2 string,
n4 string,
ms string,
server string
) 

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "^([\\'+\\d.\\S+]+) (.{19}) (\\w+) ([\\/+\\S+]+) (\\d+) (\\S+) ([\\w+\\d+\\.+]+) ([\\'+\\d.\\S+\\s+]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \\S+ ([\\d.]+) ([\\d.]+).*",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s %12$s %13$s %14$s %15$s %16$s %17$s"
)
LOCATION '/yourfolder/processed/access_log';

Command to run script:

hive -f create_access_logs.hql

After running the script, you should now be able to validate that the data is in the HIVE table. You can run the following command to just quickly test that data is there.

Start HIVE by simply typing hive at the command line.

Then type a simple query like:

select * from access_logs limit 10;

Check to see if you have any rows with NULL values; this means your regex did not match those rows, and you need to run the Python validation script again to make sure all lines are matched by the regex:

     select count(*) from access_logs where ip_address IS NULL;

Step 2 - Add GEO lookup for IP in access log files to get the city

The next thing we want to do is add some more logic when we stream the log file into HDFS. We have the client IP address, and we want to get the city/ZIP for each request so we can build some cool UI graphs of where our users come from. The best way of doing this is to have Python do the lookup against a MaxMind dataset. I initially looked at loading the MaxMind data into HIVE and doing a join, but the problem is that the MaxMind DB stores each IP range as two columns per row, so doing this in HIVE would be very complicated and would most likely involve functions you could just as well apply at ingestion time anyway. So I opted to have the streaming job do the Python lookup of IP addresses and append the data while inserting it into HDFS. The cool thing is that you can upload the files into HDFS and have each Hadoop task cache them locally, so it's very fast.

The main difference from the previous approach is that I now upload the MaxMind DB file (GeoLiteCity.dat) and the Python library for doing MaxMind lookups (code.google.com/p/pygeoip/) into HDFS so our streaming job can access them. Please note that I tar'ed these files together so I could use -cacheArchive to unpack them, create the symlinked folder, and thus give the Hadoop job direct access to the files. Essentially I just ran the command:

$ tar czvf geoip.tgz *

I then upload this to HDFS:

hadoop fs -put geoip.tgz /yourfolder/geoip.tgz

I now have an archive on HDFS that contains the .dat file with all the IP address data and the Python library to process it. I will now modify my Python streaming script to use it. Here is the code; please note that we use the same regex as in the HIVE table definition, since I want to parse out the IP to pass into the Python MaxMind lookup.

#!/usr/bin/python

import sys
import re
sys.path.append('./geoip')
import pygeoip

# create the match criteria
p = "^([\\'+\\d.\\S+]+) (.{19}) (\\w+) ([\\/+\\S+]+) (\\d+) (\\S+) ([\\w+\\d+\\.+]+) ([\\'+\\d.\\S+\\s+]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \\S+ ([\\d.]+) ([\\d.]+).*"

pattern = re.compile(p)

# Load the database once and store it globally in interpreter memory.
GEOIP = pygeoip.Database('geoip/GeoLiteCity.dat')

for line in sys.stdin:
    line = line.strip().strip('\n').replace('\'','')
    res = pattern.match(line)

    if res:
        ip = res.group(1)
        if ip != "-":
            info = GEOIP.lookup(ip)
            newline = res.group() + ' ' + str(info.city)
            print newline
        else:
            print line

I now modify my Hadoop streaming command to look like the following:

hadoop jar $STREAMJAR -Dmapred.reduce.tasks=0 -input /yourfolder/access.log -output /yourfolder/processed/access_ip_log -mapper createHDFS_GEO.py -file createHDFS_GEO.py -cacheArchive 'hdfs://localhost:8020/yourfolder/geoip.tgz#geoip' -verbose

You should now have the data in HDFS, with the extra city data appended to each line. You can check this easily by running the command:

hadoop fs -tail /yourfolder/processed/access_ip_log/part-00000

You would now modify the HIVE script that creates the HIVE table so it extracts the new data.

-- Load the access.log file as an external table.
-- Note that all types must be strings for the regex serde
DROP TABLE ACCESS_LOGS;

CREATE EXTERNAL TABLE ACCESS_LOGS (
    ip_address string,
    date_time string,
    method string,  
    uri string,
    status string,
    session_id string,
    api_server string,
    n1 string, 
    n2 string, 
    cid string,
    n3 string,
    pid string,
    sid string,
    pid2 string,
    n4 string,
    ms string,
    server string,
    city string
) 

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "^([\\'+\\d.\\S+]+) (.{19}) (\\w+) ([\\/+\\S+]+) (\\d+) (\\S+) ([\\w+\\d+\\.+]+) ([\\'+\\d.\\S+\\s+]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \\S+ ([\\d.]+) ([\\d.]+) ([\\'+\\d.\\S+\\s+]+).*",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s %12$s %13$s %14$s %15$s %16$s %17$s %18$s"
)
LOCATION '/yourfolder/processed/access_ip_log';

Command to run script:

hive -f create_access_logs.hql

Start HIVE by simply typing hive at the command line.

Then type a simple query like:

select * from access_logs limit 10;

Again, check to see if you have any rows with NULL values; this means your regex did not match those rows, and you need to run the Python validation script again to make sure all lines are matched by the regex:

     select count(*) from access_logs where ip_address IS NULL;