Google Book Downloader Update: Download JPEGs
Posted on December 2, 2010 by hac
Google Book Downloader is my app that downloads Google Books and Book Previews as PDFs. Its main problem so far has been that converting the JPEGs from Google’s servers into a PDF loses image quality, and I have not yet found a way to build a PDF from JPEGs without that loss.
Today I’m releasing version 1.2 of Google Book Downloader, which lets you save a folder of JPEGs instead of a PDF. The folder also contains an index.html file, so you can conveniently read the JPEGs in the right order.
Download it here.
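For the curious, here is one way such an index.html could be generated, sketched in C. It is not the app's actual code: the page file names (page0001.jpg, page0002.jpg and so on) and the function name write_index_html are assumptions made for illustration. Zero-padding the page numbers means a plain alphabetical sort of the folder also matches reading order.

```c
#include <stdio.h>

// Hypothetical naming scheme: page0001.jpg, page0002.jpg, ...
// Writes an HTML page that shows every page image in reading order.
// The title is written as-is (not HTML-escaped), so it should not contain markup.
// Returns 0 on success, -1 if any write failed.
int write_index_html(FILE *out, const char *title, int page_count)
{
    if (fprintf(out, "<!DOCTYPE html>\n<html>\n<head><title>%s</title></head>\n<body>\n", title) < 0)
        return -1;

    for (int page = 1; page <= page_count; page++) {
        // %04d zero-pads the number so "page0010" sorts after "page0009".
        if (fprintf(out, "<p><img src=\"page%04d.jpg\" alt=\"Page %d\"></p>\n", page, page) < 0)
            return -1;
    }

    if (fprintf(out, "</body>\n</html>\n") < 0)
        return -1;
    return 0;
}
```

A caller would open index.html inside the book's folder with fopen(..., "w"), pass the number of pages it saved, and close the file afterwards.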
Posted in Uncategorized | Tagged Google Book Downloader, OS X
Replacing C Functions with Dynamic Linking in OS X
Posted on October 24, 2010 by hac
Say we wanted to make a command-line program like date use a fake time instead of the current one. We could do this by supplying our own time() function to replace the one in libSystem.
How do we know that date uses time()? We use nm, which lists the symbols in a binary, including the undefined ones it imports from libraries. On OS X, C function names carry a leading underscore, so time() shows up as _time:
$ nm -m /bin/date | grep _time
(undefined [lazy bound]) external _time (from libSystem)
Once we know what function to replace, we can write a replacement function:
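// time.c
#include <stddef.h>
#include <time.h>

// This function will override the one in /usr/lib/libSystem.dylib.
time_t time(time_t *tloc)
{
    // January 1st, 2000, midnight local time.
    struct tm timeStruct = {0};
    timeStruct.tm_year = 2000 - 1900;
    timeStruct.tm_mon = 0;
    timeStruct.tm_mday = 1;
    timeStruct.tm_hour = 0;
    timeStruct.tm_min = 0;
    timeStruct.tm_sec = 0;
    timeStruct.tm_isdst = -1; // Let mktime() decide whether DST applies.

    time_t fakeTime = mktime(&timeStruct);

    // Callers may pass NULL (as in time(NULL)), so only store through tloc if it is set.
    if (tloc != NULL)
        *tloc = fakeTime;
    return fakeTime;
}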
Then we compile the code as a dynamic library:
gcc -c time.c
gcc -flat_namespace -dynamiclib -current_version 1.0 time.o -o libTime.dylib
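To tell OS X’s dynamic linker (dyld) to load our library, we set DYLD_INSERT_LIBRARIES to its path. We also need to set DYLD_FORCE_FLAT_NAMESPACE, or our function will not override the original: as the nm output shows, date was linked with a two-level namespace, so it looks for _time specifically in libSystem. A flat namespace makes dyld search all loaded images instead, and inserted libraries are loaded before the program's own libraries. These settings and more are documented in the dyld man page.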
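If the override does not seem to take effect, it helps to confirm that dyld loaded the library at all. One way to check (an addition, not part of the original example) is a constructor function in time.c: GCC and Clang run functions marked `constructor` when the image is loaded, before the host program's main(). The names libTimeLoaded and libTimeDidLoad are made up for this sketch. With it in place, each run prints one extra line to stderr ahead of date's own output.

```c
#include <stdio.h>

// Set by the constructor so a debugger (or test code) can see the library was loaded.
int libTimeLoaded = 0;

// Runs automatically when dyld loads the image, before main() of the host program.
__attribute__((constructor))
static void libTimeDidLoad(void)
{
    libTimeLoaded = 1;
    fprintf(stderr, "libTime: loaded, time() is now overridden\n");
}
```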
The result:
$ date
Sun Oct 24 13:21:12 EST 2010
$ DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=./libTime.dylib date
Sat Jan 1 00:00:00 EST 2000
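Hard-coding the date means recompiling for every new date. A variation, sketched here under assumptions, reads the fake time from an environment variable instead. FAKE_TIME is a hypothetical name chosen for this example; it holds seconds since the Unix epoch. When the variable is unset or not a plain integer, the function falls back to January 1st, 2000 as in time.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

// Hypothetical variable name: FAKE_TIME holds seconds since the Unix epoch,
// e.g. FAKE_TIME=946684800.
#define FAKE_TIME_VARIABLE "FAKE_TIME"

// Overrides time() like time.c, but lets the caller pick the date at run time.
time_t time(time_t *tloc)
{
    time_t fakeTime;
    const char *value = getenv(FAKE_TIME_VARIABLE);
    char *end = NULL;
    long long seconds = (value != NULL) ? strtoll(value, &end, 10) : 0;

    if (value != NULL && end != value && *end == '\0') {
        // The whole string parsed as an integer: use it directly.
        fakeTime = (time_t)seconds;
    } else {
        // Unset, empty or not a number: fall back to January 1st, 2000 (local time).
        struct tm timeStruct = {0};
        timeStruct.tm_year = 2000 - 1900;
        timeStruct.tm_mday = 1;
        timeStruct.tm_isdst = -1;
        fakeTime = mktime(&timeStruct);
    }

    // time(NULL) is a common call, so only store through tloc if it is set.
    if (tloc != NULL)
        *tloc = fakeTime;
    return fakeTime;
}
```

Built into libTime.dylib with the same gcc commands, it would be run as `DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=./libTime.dylib FAKE_TIME=1234567890 date`.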
Posted in Uncategorized | Tagged C, GCC
Upcoming OS X App: Music Player/Downloader
Posted on October 24, 2010 by hac
Here is a screenshot of an app I plan to release soon. As you can see, it has an iTunes-like interface with some extra features for downloading music: you can search YouTube for songs and download them directly into your music library.
Leave a comment or contact me if you would like to beta test.
Posted in Uncategorized | Tagged iTunes, music, OS X, YouTube
[PHP] Get Google’s Cache of a URL
Posted on August 30, 2010 by hac
This PHP function fetches the contents of a URL as it exists in Google’s cache:
function cachedHTMLForURL($url)
{
    // Request the cache from Google. file_get_contents() needs an explicit
    // scheme; without one it treats the string as a local file path.
    $googleRequestURL = "http://webcache.googleusercontent.com/search?q=" . urlencode("cache:" . $url);
    $googleResponse = file_get_contents($googleRequestURL);
    if ($googleResponse === false)
        return false;

    // Return false if Google did not have it.
    if (preg_match("/^.*<title>cache:/", $googleResponse))
        return false;

    // Remove the first 3 lines of the response, which are inserted by Google.
    $importantHTML = preg_replace("/^(.*\n){3}/", "", $googleResponse);

    // Keep one inserted line: the <base> tag, which corrects the base path of the site.
    $base = "";
    if (preg_match("/<base class=\"[^\"]*\">/", $googleResponse, $match))
        $base = $match[0] . "\n";
    return $base . $importantHTML;
}
Use like so:
echo cachedHTMLForURL("news.google.com/");
Posted in Uncategorized | Tagged cache, Google, PHP
[PHP] Retrieve iTunes Store HTML/XML
Posted on August 29, 2010 by hac
In iTunes 9, almost all of the pages in the iTunes Store are rendered with HTML. This PHP function retrieves the raw iTunes Store XHTML for a URL:
// Download and return the HTML for an iTunes Store page at the given URL.
function htmlForiTunesStoreURL($path)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    // The following header is what causes the server to think we are iTunes.app.
    // This header in particular is for the U.S. Store.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Apple-Store-Front: 143441-1,5'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false if the request failed.
    return $html === false ? false : trim($html);
}
For example, this is the music store home page:
echo htmlForiTunesStoreURL(
"ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewGrouping?id=38"
);
Some special pages return property list XML instead of XHTML, for example the advanced search page:
echo htmlForiTunesStoreURL(
"ax.search.itunes.apple.com/WebObjects/MZSearch.woa/wa/advancedSearch"
);
In older versions of iTunes (4 through 8), all pages were rendered from property list XML. This modified version of the function returns every page as XML:
// Download and return the property list XML for an iTunes Store page at the given URL.
function xmlForiTunesStoreURL($path)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    // The iTunes user agent without a special header causes the server to give us XML for a page.
    curl_setopt($ch, CURLOPT_USERAGENT, 'iTunes/9.0.2 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/531.21.8');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $xml = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false if the request failed.
    return $xml === false ? false : trim($xml);
}
However, I wouldn’t rely on the XML from the second function for anything important: as far as I know, iTunes no longer uses it, so it may stop working in the future.
Posted in Uncategorized | Tagged iTunes, Scraping