Friendster

URL www.friendster.com/ (archived at www.archive.org/details/archive-team-friendster)
Project status Offline
Archiving status Saved!
Project source Unknown
Project tracker Unknown
IRC channel #archiveteam

Friendster is an early social networking site, estimated to have over 115 million registered users. Founded in 2002, Friendster allowed the posting of blogs, photos, shoutouts/comments, and "widgets" of varying quality (not dissimilar to Facebook applications). It is considered one of the earlier social media networks (although it has numerous predecessors dating back years) and distinguished itself by allowing such "rich media" additions to a user's account.

After an initially high ranking and rating in the charts, Friendster's slow decline in hotness ensured an ever-growing chance of deletion, and on April 25, 2011, Friendster announced that most of the user-generated content on the site would be removed on May 31, 2011. Terabytes of user-generated content were in danger of being wiped out, and Archive Team made it a priority to grab as much of Friendster as possible. A unix-based script (called BFF, or Best Friends Forever) was created, and Archive Team is asking anyone with unix and 100 GB of disk space to get involved in the project.

Jonathan Abrams, the original co-founder of Friendster, has washed his hands of the whole situation, and is mostly frustrated with Friendster's past. [1]

Because Friendster is based on numeric IDs (as opposed to usernames), it is possible to assign "chunks" of the ID space to Archive Team volunteers. Please read about the tools below, and if you have an interest in helping, join us at #foreveralone on EFnet and help us save Friendster.

There's a side project downloading a Friendster dataset.

Contents

  • 1 DNS change
    • 1.1 dnscache
    • 1.2 dnsmasq
    • 1.3 bind
    • 1.4 hacky simple way
  • 2 Tools
    • 2.1 friendster-scrape-profile
    • 2.2 Automating the process
    • 2.3 Advanced: multiple instances
      • 2.3.1 Requirements
      • 2.3.2 Manually
      • 2.3.3 chunky.sh
        • 2.3.3.1 Multiple Instances of chunky.sh
      • 2.3.4 snook.sh
      • 2.3.5 invoker.pl and summary.pl
    • 2.4 XML friend lists
    • 2.5 Troubleshooting
  • 3 Site Organization
    • 3.1 Profiles
    • 3.2 Photo Albums
    • 3.3 Blogs
    • 3.4 Groups
    • 3.5 Forums
  • 4 Range Signup Sheet
    • 4.1 Proposal: sampling
    • 4.2 Proposal: download some groups
  • 5 Known issues
    • 5.1 Running on Mac OS X
    • 5.2 More on the missing image problem
    • 5.3 More on the shoutout page problem
    • 5.4 Blogs with bad links
    • 5.5 Blogs with corrupt images
  • 6 CEO's "Friendster re-launching" message
    • 6.1 New friendster answer about old data

DNS change

On Monday, June 27, 2011, Friendster switched DNS servers, pointing them at its new site. However, the old site and data remain available on the old servers, if you know where to look.

NOTE: It is strongly recommended that you use a local caching DNS server, such as dnscache, dnsmasq, or bind. This reduces the DNS load on your internet connection, allows DNS lookups to resolve faster, and reduces load on the remote server.

dnscache

If you're using the dnscache server from the djbdns package, you can do the following to forward your friendster-related requests (assuming your dnscache configuration is in /etc/dnscache):

  1. echo 50.17.127.246 > /etc/dnscache/root/servers/friendster.com
  2. echo 50.17.127.246 > /etc/dnscache/root/servers/friendster.com.cdngc.net
  3. svc -t /etc/dnscache

Do a lookup of friendster.com. You should get 209.11.168.113.
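For example, with dig (from the dnsutils/bind-utils package), assuming dnscache is listening on localhost:

$ dig +short friendster.com @127.0.0.1
209.11.168.113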

dnsmasq

You can tell dnsmasq to forward requests for the friendster domains to a different server. dnsmasq will also cache results for a time; the default cache size is 150 names.

  1. Find your dnsmasq configuration.
  2. Add the following options:
    server=/friendster.com/50.17.127.246
    server=/friendster.com.cdngc.net/50.17.127.246
  3. Restart dnsmasq.
  4. Do a lookup of friendster.com. You should get 209.11.168.113.
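Putting it all together, a minimal sketch (assuming your configuration lives at /etc/dnsmasq.conf and your system uses the init script):

$ echo 'server=/friendster.com/50.17.127.246' | sudo tee -a /etc/dnsmasq.conf
$ echo 'server=/friendster.com.cdngc.net/50.17.127.246' | sudo tee -a /etc/dnsmasq.conf
$ sudo /etc/init.d/dnsmasq restart
$ dig +short friendster.com @127.0.0.1
209.11.168.113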

bind

If you're using bind for your DNS needs, you can add the following to your options in order to forward your friendster-related requests to a server that is still serving the old data:

zone "friendster.com" {
   type forward;
   forwarders { 50.17.127.246; };
};

zone "friendster.com.cdngc.net" {
   type forward;
   forwarders { 50.17.127.246; };
};

Then reload/restart bind. Do a lookup of friendster.com and you should get 209.11.168.113.
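For example, if rndc is set up (otherwise restart via your init script):

$ sudo rndc reload
$ dig +short friendster.com
209.11.168.113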

hacky simple way

NOTE: This is NOT recommended. It will forward ALL of your DNS lookups to this server, for EVERY request. (Linux does not cache DNS results on the local machine by default.)

Add "nameserver 50.17.127.246" to the top of your /etc/resolv.conf file. This will send all lookups to that server first. This server does so recursive requests as well, so you could use it directly if you wanted. (it would potentially slow down all name lookups, however). The better way is to do one of the above. (by default, linux does not cache dns results on the local machine. you may want to install dnscache, change the root/servers/@ file to list your ISP dns servers (or your own local server), and point resolv.conf at 127.0.0.1).

Be aware that with this hacky method, the change could be overwritten the next time your DHCP lease renews. (You might be able to add the line to a new file named "/etc/resolv.conf.head" to get around this. You might also be able to configure your DHCP client to ignore the servers it got from DHCP, or to place another server before or after them. Another option, on Linuxes that support it, is to "sudo chattr +i /etc/resolv.conf".)

Do a lookup of friendster.com. You should get 209.11.168.113.

Tools

friendster-scrape-profile

Script to download a Friendster profile: download it, or clone the git repository.

You need a Friendster account to use this script. (Note: if you are creating an account, Mailinator email addresses are blocked.) Put your login details in files named username.txt and password.txt, saved in the same directory as the download script.

Run with a numeric profile id of a Friendster user: ./friendster-scrape-profile PROFILE_ID
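For example (the e-mail address, password, and profile id here are placeholders):

$ echo 'you@example.com' > username.txt
$ echo 'yourpassword' > password.txt
$ ./friendster-scrape-profile 1234567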

Currently downloads:

  • the main profile page (profiles.friendster.com/$PROFILE_ID)
  • the user's profile image from that page
  • the list of public albums (www.friendster.com/viewalbums.php?uid=$PROFILE_ID)
  • each of the album pages (www.friendster.com/viewphotos.php?a=$id&uid=$PROFILE_ID)
  • the original photos from each album
  • the list of friends (www.friendster.com/fans.php?uid=$PROFILE_ID)
  • the shoutoutstream (www.friendster.com/shoutoutstream.php?uid=$PROFILE_ID) and the associated comments
  • the Friendster blog, if any

It does not download any of the widgets.

Downloading one profile takes between 6 and 10 seconds and generates 200-400 kB of data (for normal profiles).

Automating the process

(This is all unix-only; it won't work in Windows.)
1. Create a Friendster account.
2. Download the script; name it 'bff.sh'.
3. In the directory where you put bff.sh, create a username.txt file containing your Friendster e-mail address.
4. In the same directory, create a password.txt file containing your Friendster password.
5. Choose your profile range from the Range Signup Sheet below.
6. Edit that table to claim your range.
7. On the command line, type (with your range replacing the '#'s):

$ for i in {#..#}; do bash bff.sh $i; done

or even better

$ ./bff-thread.sh # #

which will allow you to stop at any time by touching the STOP file.
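For example, to run the range 3,000-3,999 in the background and stop it cleanly later (a placeholder range; claim a real one in the signup sheet below):

$ ./bff-thread.sh 3000 3999 &
$ touch STOP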

Advanced: multiple instances

Requirements

Now you might notice it's relatively slow: my average is 115 profiles per hour. The bottleneck is mainly network requests, so running multiple instances can increase your download speed nearly linearly. BUT we're not sure whether it's safe to use the same cookies.txt file for all the instances (which they will do by default). Luckily, you can easily avoid this using an extra optional parameter of bff.sh: just add the name of the cookie file you want it to create and use right after the profile ID, for instance "bff.sh 4012089 cookie3.txt". Use a different cookie file for each instance.

Manually

The full, modified command would then be (replacing the #'s with your range or the cookie number, where applicable):

$ for i in {#..#}; do bash bff.sh $i cookie#.txt; done
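For instance, three instances splitting up a range, each with its own cookie file (placeholder ranges):

$ for i in {100000..104999}; do bash bff.sh $i cookie1.txt; done &
$ for i in {105000..109999}; do bash bff.sh $i cookie2.txt; done &
$ for i in {110000..114999}; do bash bff.sh $i cookie3.txt; done &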

chunky.sh

The latest and most sophisticated way to automate this is to run chunky.sh. It breaks the range up into chunks of a thousand profiles and runs as many of these chunks concurrently as you request. This means that if some chunks contain smaller profiles and therefore download more quickly, you don't end up with fewer concurrent downloads than you wanted.

$ ./chunky.sh <start> <end> <threads>
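For example, to work a 100,000-profile range four chunks at a time (a placeholder range):

$ ./chunky.sh 5000000 5099999 4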

Multiple Instances of chunky.sh

In order to always be downloading at maximum capacity, we're experimenting with an updated chunky.sh that is aware of all BFF download processes on the machine, not just its own. That means that you can start a new range of profiles and the new chunky.sh will patiently wait until it sees an open download slot to take. It hasn't seen a whole lot of testing yet, so use it at your own risk and report any problems or possible improvements in #foreveralone. Syntax is the same as the original chunky.sh. View it here or download it here.

snook.sh

The original automated solution was snook.sh. This script takes the start and end of a range and a number of download threads to run, and launches that many instances of bff.sh at once. It automatically logs the output to individual log files and creates separate cookie files for them. This script was originally written by underscore; you may have seen his pastebin link in the IRC channel. I've fixed several bugs, including one very serious one. If you used the version from pastebin, you'll need to start over, because it downloaded the wrong profiles (keep what you downloaded; it'll merely overlap with someone else's range). If you need to stop the downloads cleanly, simply $ touch STOP.
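Assuming the argument order described above (start, end, threads; placeholder values):

$ ./snook.sh 2400001 2500000 5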

invoker.pl and summary.pl

Another option is this perl script, which does a similar job. It's not thoroughly tested yet, but it's pretty simple. It takes the starting ID, the number of IDs per process, and the number of processes, then creates a shell script which launches them. It has the bonus of being able to be stopped with $ touch STOP, and it logs every finished ID from every instance to one file for monitoring. summary.pl gives a quick summary of that file to monitor the processes' progress. (And with touch STOP and the summary file, that means easy management over SSH! Woo!)
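A sketch of the workflow, assuming the argument order described above (all values are placeholders; summary.pl may need the log file as an argument, so check the script):

$ perl invoker.pl 5000000 1000 5
$ perl summary.pl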

XML friend lists

Also on the wiki: a script that uses the Friendster API to download friend lists. This has the advantage that you can get the IDs of all friends of a user as one XML file, which is a lot faster than the bff method. See getfriends.sh on GitHub.


Troubleshooting

If you get an error like bff.sh: line 26: $'\r': command not found, you will need to convert the script to use UNIX-style line endings:

$ dos2unix bff.sh

or if you somehow find yourself without the dos2unix command, do this:

$ sed "s/\r//" bff.sh > bff-fixed.sh
$ mv bff-fixed.sh bff.sh

Site Organization

Content on Friendster seems to be primarily organized by the id number of the users, which were sequentially assigned starting at 1. This makes it fairly easy for wget to scrape the site and for us to break it up into convenient work units. The main components we need to scrape are the profile pages, photo albums and blogs, but there may be others. More research is needed.

Profiles

URLs of the form 'profiles.friendster.com/<userid>'. Many pictures on these pages are hosted at URLs that look like 'photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>.jpg', where 'nnnnnijkl' is the user id and the two directory levels are its last four digits in reversed pairs (e.g. user 123456789's photos live under photos/98/76/123456789/). These folders aren't browsable directly, so profiles will not be easy to scrape with wget.

Photo Albums

A user's photo albums are at URLs that look like 'www.friendster.com/viewalbums.php?uid=<userid>', with individual albums at 'www.friendster.com/viewphotos.php?a=<album id>&uid=<userid>'. It appears that the individual photo pages use JavaScript to load the images, so they will be very hard to scrape.

On the individual album pages, the photo thumbnails are stored under similar paths as the main images. That is, if the album thumbnail is at photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>m.jpg, just drop the final 'm' to get the main photo (or replace it with a 't' to get an even tinier version).
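For example, rewriting a made-up thumbnail URL to its full-size and tiny variants:

$ echo 'photos-p.friendster.com/photos/98/76/123456789/1234567890m.jpg' | sed 's/m\.jpg$/.jpg/'
photos-p.friendster.com/photos/98/76/123456789/1234567890.jpg
$ echo 'photos-p.friendster.com/photos/98/76/123456789/1234567890m.jpg' | sed 's/m\.jpg$/t.jpg/'
photos-p.friendster.com/photos/98/76/123456789/1234567890t.jpg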

Blogs

Blogs are hosted by a WordPress install, typically at (somename).blog.friendster.com for the actual blog pages, with images hosted on (somename).blogs.friendster.com, where the name is the same and is picked by the user.

Groups

Friendster groups (only visible when logged in) have a profile picture, a list of members, photos, discussions (a forum) and announcements.

The group ids range from 1 to 3253050.

Forums

There are general Friendster forums on www.friendster.com/forums/. It is not clear whether they will remain open or disappear as well. (Note: I downloaded them, Alard.)

Range Signup Sheet

We're going to break up the user ids into ranges and let individuals claim a range to download. Use this table to mark your territory:

Start End Status Size (Uncompressed) Claimant
1 999 Uploaded 55MB closure
1,000 1,999 Uploaded 283MB alard
2,000 2,999 Uploaded 473MB DoubleJ
3,000 3,999 Downloaded 234MB Teaspoon
4,000 4,999 Uploaded 183MB Paradoks
5,000 5,999 Uploaded 202MB robbiet48/Robbie Trencheny (Amsterdam)
6,000 9,999 Uploaded 1.1GB Sketchcow/Jason Scott
10,000 29,999 Uploaded 5.1GB Sketchcow/Jason Scott
30,000 31,999 Uploaded 485MB Sketchcow/Jason Scott
32,000 32,999 Uploaded 201MB Paradoks
33,000 33,999 Uploaded 241MB closure
34,000 100,000 Uploaded unknown (20+ GB?) closure
100,000 101,000 Downloaded 205.6 MB xlene
101,001 102,000 Uploaded 232MB robbiet48/Robbie Trencheny (Florida)
102,001 103,000 Uploaded 241MB robbiet48/Robbie Trencheny (Amsterdam)
103,001 104,000 Uploaded yipdw
104,001 105,000 Downloaded 252MB Coderjoe
105,001 114,999 Uploaded 2.1GB Paradoks
115,000 116,999 Uploaded yipdw
117,000 119,999 Downloaded 815MB Coderjoe
120,000 130,000 Uploaded 2.3GB robbiet48/Robbie Trencheny (Florida)
130,000 140,000 Uploaded ia700601.us.archive.org/5/incoming/gv/friendster.130000-140000.tar robbiet48/Robbie Trencheny (Florida) (Reclaimed by Underscor 15:24, 19 June 2011 (UTC))
140,001 160,000 Uploaded yipdw
160,001 180,000 Downloaded 2.4GB jch
180,001 200,000 Uploaded yipdw
200,001 220,000 Downloaded 8.4GB Coderjoe
220,001 230,000 Uploaded xlene (Reclaimed by alard, 19 June 2011)
230,001 240,000 Uploaded 4.4GB alard
240,001 250,000 Downloaded Teaspoon
250,001 260,000 Uploaded ia700601.us.archive.org/5/incoming/gv/friendster.250001-260000.tar robbiet48/Robbie Trencheny (Newark) (Reclaimed by Underscor 21:35, 19 June 2011 (UTC))
260,001 270,000 Uploaded 4.0GB robbiet48/Robbie Trencheny (Fremont 1)
270,001 280,000 Uploaded 3.2GB robbiet48/Robbie Trencheny (Fremont 2)
280,001 290,000 Uploaded 3.8GB DoubleJ
290,001 300,000 Uploaded 3.9GB dnova
310,001 320,000 Downloaded 5.1GB Coderjoe
320,001 330,000 Uploaded ia700601.us.archive.org/5/incoming/gv/friendster.320001-330000.tar robbiet48/Robbie Trencheny (Oakland) (Reclaimed by Underscor 23:20, 19 June 2011 (UTC))
330,000 340,000 Uploaded closure
340,000 400,000 Uploaded 25GB Sketchcow/Jason Scott
400,001 500,000 Uploaded 40 GB DoubleJ
500,000 600,000 Downloaded 37 GB closure (penguin)
600,001 700,000 Uploaded ia700601.us.archive.org/5/incoming/gv/friendster.600001-700000.tar no2pencil (Reclaimed by Underscor 12:50, 20 June 2011 (UTC))
700,001 800,000 Uploaded 36GB proub/Paul Roub
800,001 900,000 Uploaded 39GB proub/Paul Roub
900,001 1,000,000 Uploaded (gv7@blindtiger) 36GB Soult
1,000,001 1,100,000 Downloaded by DoubleJ 32 GB Avram (reclaimed by DoubleJ 6/21 3PM EDT)
1,100,001 1,200,000 Uploaded 33GB Paradoks
1,200,001 1,300,000 Uploaded 36 GB db48x
1,300,000 1,400,000 Downloaded 36 GB closure (penguin) (reclaimed by db48x, just in case)
1,400,001 1,500,000 Uploaded alard
1,500,001 1,600,000 Downloaded ksh/omglolbah
1,600,001 1,700,000 Downloaded ksh/omglolbah
1,700,001 1,800,000 Downloaded ksh/omglolbah
1,800,001 1,900,000 Downloaded ksh/omglolbah
1,900,001 2,000,000 Downloaded ksh/omglolbah
2,000,001 2,100,000 Downloaded ksh/omglolbah
2,100,001 2,200,000 Downloaded 65 GB Teaspoon
2,200,001 2,300,000 Uploaded 50GB compressed Darkstar
2,300,001 2,400,000 Uploaded 70GB compressed Darkstar
2,400,001 2,500,000 Downloaded underscor (snookie)
2,500,001 2,600,000 Downloaded by underscor Bardicer (Reclaimed by Underscor 04:02, 22 June 2011 (UTC))
2,600,001 2,700,000 Downloaded by underscor Robbie Trencheny (Amsterdam) (Reclaimed by Underscor 18:44, 24 June 2011 (UTC))
2,700,001 2,800,000 Downloaded by underscor Robbie Trencheny (Fremont 2) (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
2,800,001 2,900,000 Downloaded 139GB Coderjoe (system1)
2,900,001 3,000,000 Downloaded 154GB Coderjoe (system2)
3,000,001 3,100,000 Uploaded 78GB Qwerty0
3,100,001 3,600,000 Claimed Jason Scott/Sketchcow
3,600,001 3,700,000 Downloaded 202 GB DoubleJ
3,700,001 3,800,000 Uploaded yipdw
3,800,001 3,900,000 Uploaded oli
3,900,001 4,000,000 Claimed Jason Scott/Sketchcow
3,985,001 4,000,000 Downloaded 32GB Coderjoe (per Sketchcow's request)
4,000,001 4,100,000 Downloaded by DoubleJ primus102 (reclaimed by DoubleJ 6/22 3:15PM EDT)
4,100,001 4,200,000 Downloaded Zebranky
4,200,001 4,300,000 Claimed Zebranky (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
4,300,001 4,399,999 Uploaded 255GB (196GB compressed) db48x
4,400,000 4,599,999 Downloaded 364GB (480 uncompressed) Jade Falcon
4,600,000 4,799,999 Uploaded (gv7@blindtiger) Soult
4,800,000 4,809,999 Uploaded alard
4,810,000 4,899,999 Uploaded oli
4,900,000 4,999,999 Uploaded 216GB (160GB compressed) db48x
5,000,000 5,099,999 Downloaded by underscor jch (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
5,100,000 5,199,999 Downloading (20%) hydruh
5,200,000 5,299,999 Uploaded chris_k
5,300,000 5,349,000 Uploaded 177~GB ersi
5,349,001 5,359,000 Uploaded ia700601.us.archive.org/5/incoming/gv/data_5349001_5359000.tar.gz 13GB Underscor 03:25, 22 May 2011 (UTC)
5,359,001 5,360,000 Uploaded ia700601.us.archive.org/5/incoming/gv/data_5359000_5360000.tar.bz2 Underscor 03:25, 22 May 2011 (UTC)
5,360,001 5,370,000 Uploaded ia700601.us.archive.org/5/incoming/gv/data_5360001_5370000.tar 11GB Underscor 03:25, 22 May 2011 (UTC)
5,370,001 5,470,000 Downloaded Underscor 03:25, 22 May 2011 (UTC)
5,470,001 5,570,000 Downloaded Underscor 03:25, 22 May 2011 (UTC)
5,570,001 5,670,000 Downloaded Underscor 03:25, 22 May 2011 (UTC)
5,670,001 6,349,999 Downloading jeremydouglass
6,350,000 6,449,999 Uploaded 212~GB Paradoks
6,450,000 6,550,000 Uploaded yipdw
6,550,001 6,700,000 Uploaded oli
6,700,000 6,800,000 Claimed closure (penguin)
6,800,001 6,900,000 Uploaded alard
6,900,001 7,000,000 Uploaded oli
7,000,001 7,100,000 Uploaded seanp2k (likwid/@ip2k on twitter)
7,100,001 7,150,000 Downloaded oli
7,150,001 7,250,001 Downloaded 204 GB (160G compressed) dashcloud
7,250,002 7,299,999 Downloaded db48x
7,300,000 7,399,999 Uploaded 171 GB DoubleJ
7,400,000 7,499,999 Downloaded 138GB compressed dsquared
7,500,000 7,599,999 Uploaded oli
7,600,000 7,699,999 Uploaded oli
7,700,000 7,799,999 Downloaded seanp2k (likwid/@ip2k on twitter)
7,800,000 7,899,999 Uploaded seanp2k (likwid/@ip2k on twitter)
7,900,000 7,999,999 Compressing seanp2k (likwid/@ip2k on twitter)
8,000,000 8,099,999 Downloaded by underscor primus102 (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
8,100,000 8,199,999 Uploaded alard
8,200,000 8,299,999 Downloaded 190GB / 145GB tar.gz jeremydouglass
8,300,000 8,399,999 Downloaded 192GB Beardicus
8,400,000 8,449,999 Compressed 100GB Shadyman (Yes, 50k IDs)
8,450,000 8,599,999 Uploading aristotle
8,600,000 8,699,999 Uploaded 131GB chris_k
8,700,000 8,715,999 Uploaded alard (redownloading vertevero's range)
8,716,000 8,999,999 Uploaded (possible errors, email aggroskater AT gmail DOT com if reup needed) aggroskater
9,000,000 9,035,999 Uploaded ia700601.us.archive.org/5/incoming/gv/friendster.9000000-9035999.tar chris_k (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
9,036,000 9,099,999 Downloaded 95 GB db48x
9,100,000 9,199,999 Downloaded 139 GB DoubleJ
9,200,000 9,299,999 Uploaded chris_k
9,300,000 9,399,999 Downloaded db48x
9,400,000 9,499,999 Downloaded 161 GB db48x
9,500,000 9,529,999 Uploaded alard
9,530,000 9,599,999 Downloaded 115G Coderjoe (realigning)
9,600,000 9,699,999 Uploaded aristotle
9,700,000 9,799,999 Downloaded DoubleJ
9,800,000 9,899,999 Uploaded 144G aristotle
9,900,000 9,999,999 Uploaded 149G aristotle
10,000,000 10,050,000 Uploaded yipdw (50k intentional)
10,050,001 10,100,000 Uploaded 96G dinomite
10,100,001 10,199,999 Uploaded chris_k
10,200,000 10,300,000 Downloaded 204GB Coderjoe (yes, 100k+1)
10,300,001 10,399,999 Uploaded 179 G (141 G compressed) dashcloud
10,400,001 10,499,999 Claimed Lambda_Driver
10,500,000 10,599,999 Uploaded 199G (155G compressed) dinomite
10,600,000 10,699,999 Uploaded dinomite
10,700,000 10,799,999 Downloaded 196 GB DoubleJ
10,800,000 10,849,999 Downloaded Shadyman
10,850,000 10,899,999 Downloaded Underscor 18:56, 24 May 2011 (UTC)
10,900,000 10,999,999 Uploaded chris_k