Tags in Space

A lot of you enjoyed our post (“Found in Space”) on the amazing astrometry.net project, and there have been some interesting follow-ups.

A mysterious figure known only as “jim” paired up astronomy photos from Flickr with Google Sky. (You’re going to need the Google Earth plug-in for your browser — just follow the instructions on that page if you don’t have it.) In his technical writeup, “jim” explains how he used the Yahoo Query Language (YQL) to fetch the data. YQL is similar to the existing Flickr APIs, but it’s a query language like SQL rather than a set of REST-ish APIs. And both of those are really just ways to get data out of Flickr’s machine tag system, specifically the astro:* namespace. It’s turtles all the way down.
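
For the curious, the YQL flavour of “fetch astrotagged photos” looks something like this (an illustrative query; YQL’s community flickr.photos.search table mirrors the parameters of the Flickr API, including machine_tags):

select * from flickr.photos.search where machine_tags = "astro:"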

Who else is using astrotags? The Royal Observatory, Greenwich is sponsoring a contest to determine the Astronomy Photographer of the Year, and the whole thing is based on a Flickr group and extensive use of Flickr’s APIs. The integration is so seamless (galleries of photos and discussions are surfaced on their site as well as ours) that you might as well consider Flickr to be their “backend” server. But they’ve also added a lot of their own, such as great documentation about how to astrotag your photos, as well as a concise explanation of how Astrometry.net identifies your photo, even among millions of known stars. (The sci-fi website io9 interviewed Fiona Romeo of the Royal Observatory about the contest; check it out.)

It’s dizzying how many services have been combined here — Astrometry.net grew out of research at the University of Toronto, web mashups use Google Sky for visualization in context, Yahoo infrastructure delivers and transforms data, the Royal Observatory at Greenwich provides leadership and expertise, and then little old Flickr acts as a data repository and social hub. And let’s not forget you, the Flickr community, and your inexhaustible creativity — which is the reason why all this can even come together.

All this was done with pretty light coordination, and few people at Flickr were even aware of what was going on until recently. I have no idea what the future is for APIs and a web of services loosely joined, but I hope we get to see more and more of this sort of thing.

Building Fast Client-side Searches

Yesterday we released a new people selector widget (which we’ve been calling Bo Selecta internally). This widget downloads a list of all of your contacts, in JavaScript, in under 200ms (this is true even for members with 10,000+ contacts). In order to get this level of performance, we had to completely rethink how we send data from the server to the client.

Server Side: Cache Everything

To make this data available quickly from the server, we maintain and update a per-member cache in our database, where we store each member’s contact list in a text blob — this way it’s a single quick DB query to retrieve it. We can format this blob in any way we want: XML, JSON, etc. Whenever a member updates their information, we update the cache for all of their contacts. Since a single member who changes their contact information can require updating the contacts cache for hundreds or even thousands of other members, we rely upon prioritized tasks in our offline queue system.
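
As a rough sketch, the fan-out looks something like this (JavaScript-flavoured pseudocode; the queue and lookup helpers are hypothetical, not our actual backend):

// When a member changes their contact information, queue an offline
// task instead of rebuilding caches inline with the web request.
function onMemberInfoChange(memberId) {
    enqueueTask("rebuild_contact_caches", { memberId: memberId }); // hypothetical queue API
}

// Offline worker: rebuild the cached contact-list blob for every
// member who has this member as a contact.
function rebuildContactCaches(task) {
    var followers = getReverseContacts(task.memberId); // hypothetical lookup
    for (var i = 0; i < followers.length; i++) {
        var blob = serializeContactList(getContacts(followers[i])); // hypothetical
        saveContactBlob(followers[i], blob); // one row per member, one query to read
    }
}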

Testing the Performance of Different Data Formats

Despite the fact that our backend system can deliver the contact list data very quickly, we still don’t want to unnecessarily fetch it for each page load. This means that we need to defer loading until it’s needed, and that we have to be able to request, download, and parse the contact list in the amount of time it takes a member to go from hovering over a text field to typing a name.
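
In practice that means kicking off the download at the first sign of intent rather than on page load. A minimal sketch (the element ID and fetch function are illustrative):

var contactsRequested = false;

// Start the fetch on hover/focus so the data is usually downloaded and
// parsed by the time the member types the first character of a name.
function requestContactsOnce() {
    if (contactsRequested) { return; }
    contactsRequested = true;
    fetchContactList(); // illustrative: issues the actual request
}

var input = document.getElementById("person-input");
input.onmouseover = requestContactsOnce;
input.onfocus = requestContactsOnce;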

With this goal in mind, we started testing various data formats, and recording the average amount of time it took to download and parse each one. We started with Ajax and XML; this proved to be the slowest by far, so much so that the larger test cases wouldn’t even run to completion (the tags used to create the XML structure also added a lot of weight to the filesize). It appeared that using XML was out of the question.

BoSelectaJsonGoodFunTimes: eval() is Slow

DJ Bo Selecta on the decks

Next we tried using Ajax to fetch the list in the JSON format (and having eval() parse it). This was a major improvement, both in terms of filesize across the wire and parse time.
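
A minimal sketch of that approach (the endpoint is illustrative); the wrapping parentheses make eval() treat the response as an expression rather than a statement block:

var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        // Parse the JSON payload by evaluating it as JavaScript.
        var contacts = eval("(" + xhr.responseText + ")");
    }
};
xhr.open("GET", "/contacts.json", true);
xhr.send(null);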

While all of our tests ran to completion (even the 10,000-contact case), parse time per contact was not the same for each case; it increased geometrically with the number of contacts, to the point where the 10,000-contact case took over 80 seconds to parse, 400 times slower than our goal of 200ms. It seemed that JavaScript had a problem manipulating and eval()ing very large strings, so this approach wasn’t going to work either.

Contacts   File Size (KB)   Parse Time (ms)   File Size per Contact (KB)   Parse Time per Contact (ms)
  10,617             1536             81312                         0.14                          7.66
   4,878              681             18842                         0.14                          3.86
   2,979              393              6987                         0.13                          2.35
   1,914              263              3381                         0.14                          1.77
   1,363              177              1837                         0.13                          1.35
     798              109               852                         0.14                          1.07
     644               86               611                         0.13                          0.95
     325               44               252                         0.14                          0.78
     260               36               205                         0.14                          0.79
     165               24               111                         0.15                          0.67

JSON and Dynamic Script Tags: Fast but Insecure

Working with the theory that large string manipulation was the problem with the last approach, we switched from using Ajax to instead fetching the data with a dynamically generated script tag. This means that the contact data was never treated as a string, and was instead executed as soon as it was downloaded, just like any other JavaScript file. The difference in performance was shocking: 89ms to parse 10,000 contacts (a reduction of three orders of magnitude), while the smallest case of 172 contacts took only 6ms. The parse time per contact actually decreased the larger the list became. This approach looked perfect, except for one thing: in order for this JSON to be executed, we had to wrap it in a callback method. Since it’s executable code, any website in the world could use the same approach to download a Flickr member’s contact list. This was a deal breaker.
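
A minimal sketch of the technique, and of why it leaks (all names are illustrative):

// The response is executable JavaScript rather than data, e.g.:
//   parseContacts([{"n":"Alice", ...}, ...]);
function parseContacts(contacts) {
    // store the list; no string parsing ever happens
}

var script = document.createElement("script");
script.src = "/contacts.js?callback=parseContacts";
document.getElementsByTagName("head")[0].appendChild(script);

// The catch: script tags are exempt from the same-origin policy, so any
// page on any domain can include the same tag, define its own
// parseContacts(), and receive the logged-in member's contact list.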

Contacts   File Size (KB)   Parse Time (ms)   File Size per Contact (KB)   Parse Time per Contact (ms)
  10,709             1105                89                         0.10                          0.01
   4,877              508                41                         0.10                          0.01
   2,979              308                26                         0.10                          0.01
   1,915              197                19                         0.10                          0.01
   1,363              140                15                         0.10                          0.01
     800               83                11                         0.10                          0.01
     644               67                 9                         0.10                          0.01
     325               35                 8                         0.11                          0.02
     260               27                 7                         0.10                          0.03
     172               18                 6                         0.10                          0.03

Going Custom

Custom Ride

Having set the performance bar pretty high with the last approach, we dove into custom data formats. The challenge would be to create a format that we could parse ourselves, using JavaScript’s String and RegExp methods, that would also match the speed of JSON executed natively. This would allow us to use Ajax again, but keep the data restricted to our domain.

Since we had already discovered that some methods of string manipulation didn’t perform well on large strings, we restricted ourselves to a method that we knew to be fast: split(). We used control characters to delimit each contact, and a different control character to delimit the fields within each contact. This allowed us to parse the string into contact objects with one split, then loop through that array and split again on each string.

// o is the completed XMLHttpRequest. The response uses one control
// character between contacts and another between fields; note that
// "\c" and "\a" are not valid JavaScript escapes (they evaluate to
// plain "c" and "a"), so explicit \uXXXX escapes are used here. The
// actual delimiter values are illustrative.
that.contacts = o.responseText.split("\u0002"); // one string per contact

for (var n = 0, len = that.contacts.length, contactSplit; n < len; n++) {

	contactSplit = that.contacts[n].split("\u0001"); // fields within a contact

	// Single-letter keys keep both the objects and the wire format small.
	that.contacts[n] = {
		n: contactSplit[0],
		e: contactSplit[1],
		u: contactSplit[2],
		r: contactSplit[3],
		s: contactSplit[4],
		f: contactSplit[5],
		a: contactSplit[6],
		d: contactSplit[7],
		y: contactSplit[8]
	};
}

Though this technique sounds like it would be slow, it actually performed on par with native JSON parsing (a little faster for cases with fewer than 1,000 contacts, and a little slower for those over 1,000). It also had the smallest filesize: 80% of the size of the JSON data for the same number of contacts. This is the format that we ended up using.

Contacts   File Size (KB)   Parse Time (ms)   File Size per Contact (KB)   Parse Time per Contact (ms)
  10,741              818               173                         0.08                          0.02
   4,877              375                50                         0.08                          0.01
   2,979              208                34                         0.07                          0.01
   1,916              144                21                         0.08                          0.01
   1,363               93                16                         0.07                          0.01
     800               58                10                         0.07                          0.01
     644               46                 8                         0.07                          0.01
     325               24                 4                         0.07                          0.01
     260               14                 3                         0.05                          0.01
     160               13                 3                         0.08                          0.02

Searching

Ben to the Rescue

Now that we had a giant array of contacts in JavaScript, we needed a way to search through them and select one. For this, we used YUI’s excellent AutoComplete widget. To get the data into the widget, we created a DataSource object that would execute a function to get results. This function simply looped through our contact array and matched the given query against four different properties of each contact, using a regular expression (RegExp objects turned out to be extremely well-suited for this, with the average search time for the 10,000-contact case coming in under 38ms). After the results were collected, the AutoComplete widget took care of everything else, including caching the results.
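
The search function itself is only a few lines. A minimal sketch (exactly which four properties we match is an assumption here, using the single-letter keys from the parsing code above; the metacharacter escaping is illustrative):

function searchContacts(query) {
    // Escape regex metacharacters so the query is matched literally.
    var safe = query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"),
        re = new RegExp(safe, "i"),
        results = [];

    for (var i = 0, len = that.contacts.length; i < len; i++) {
        var c = that.contacts[i];
        // Test the query against several fields of each contact.
        if (re.test(c.n) || re.test(c.u) || re.test(c.e) || re.test(c.a)) {
            results.push(c);
        }
    }
    return results;
}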

There was one optimization we made to our AutoComplete configuration that was particularly effective. Regardless of how much we optimized our search method, we could never get results to return in less than 200ms (even for trivially small numbers of contacts). After a lot of profiling and hair pulling, we found the queryDelay setting. This is set to 200ms by default, and artificially delays every search in order to reduce UI flicker for quick typists. After setting that to 0, we found our search times improved dramatically.
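
Wiring it together looks roughly like this (a sketch assuming YUI 2’s FunctionDataSource; note that queryDelay is specified in seconds):

var ds = new YAHOO.util.FunctionDataSource(searchContacts);
var ac = new YAHOO.widget.AutoComplete("person-input", "person-results", ds);
ac.queryDelay = 0; // the default of 0.2 (200ms) swamped our actual search time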

The End Result

Head over to your Contact List page and give it a whirl. We are also using Bo Selecta with FlickrMail and the Share This widget on each photo page.

[changelog] Revision of the Places page, also Neighborhoods

A slightly overdue (and longer than it was supposed to be) post, considering this happened a while ago, but I thought I’d mention a few subtle updates to the Places page.

Even before that, though, we added neighborhood links to the Photo pages. Before, we just listed the neighborhood …

Neighborhood Link

… now it’s a link through to the Places page itself, which looks rather like this …

Neighborhoods (South Bank)

Meaning that from a photo that’s been geotagged you’ll be able to get more of a feel for the local area. Obviously this works better in large cities, where the neighborhoods themselves can be as big as towns, while in smaller towns you’re more likely to find the one or two photographers who count each neighborhood as their stomping ground.

I guess that’s the big addition, but we also tweaked a few other things at the same time. Here’s a before-and-after shot; you’ll probably need to click through to the larger size if you want to see the details.

Old and Updated Places Pages

On the left side of each Places page we’ve moved different elements around, pushing the search further up, the date/time down and scrapping the weather altogether now that we’ve established that it rains in London.

The functional changes over on the right involved moving the title, attribution, and Next & Previous buttons off the photo. When we launched Places we didn’t have videos; now we do, and the old position clashed with the video controls.

The other benefit of moving the Next/Prev buttons above the photo is that they don’t jump around as the photos resize. We also added keyboard controls: you can now just press the forward and back cursor keys to keep going through photos. Power user tip!

The “paging” buttons no longer hover over the top of the thumbnails, as they were …

  1. Annoying.
  2. Not always obvious.


On a more technical level, only geotagged photos now appear on the Places page, and where possible the location is shown on the map (though with neighborhoods, due to the nature of the beast, it can sometimes be just off the edge of the map).

Big obvious arrow demonstrating the feature :)

Geotagged

When we first launched the Places page we wanted to make sure that each location had plenty of photos, so we used a combination of geotagging and tags/descriptions to automate the selection. This led to interesting results, such as the city of Reading in England featuring a lot of photos about books (tagged reading, natch). Now that we have over 100 million geotagged photos, we’ve switched to “just” those.

We also factor in the season a photo was taken in when selecting the interesting ones, to give a bit more change in the first photos you see and to keep them relatively, well, seasonal. We’ll probably tweak this again soon to get them to rotate even more often, but we’re still working through that one.

City Colours and Endless Photos

As mentioned above, we moved the time down and, partly for whimsy, partly because they’re really useful, used the Dopplr colour to display it and link to their pages. Here’s our Los Angeles page and Dopplr’s Los Angeles page. Dopplr decided to use Flickr to select photos for each city they know about, so we thought we’d borrow their colour in return :)

You can read more about how Dopplr (and therefore we) calculate the colour for a place over on their blog: In rainbows and Darker city colours.

Dopplr And Single Row

Finally (because I’ve gone on enough already): the old design had two rows of thumbnails under the main photo and a total of 72 photos, meaning there were 6 “pages” of thumbnails. When you got to the last page, that was kinda it; you couldn’t go any further.

Instead there’s now just one row, but I bolted on the API so it keeps loading more and more photos as you get close to the end of the current lot. Instead of just 72 photos for Los Angeles, there are now all (currently) 444,594 of them, nearly half a million.
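
The gist of the lazy loading, as a sketch (the helpers and the page size are illustrative, not the actual Places code):

var page = 1, loading = false;

// Called as the member pages through the single row of thumbnails.
function maybeLoadMore(currentIndex, totalLoaded) {
    // Fetch the next page once we're within one "screenful" of the end.
    if (!loading && totalLoaded - currentIndex < 24) {
        loading = true;
        fetchPlacePhotos(++page, function (photos) { // hypothetical API wrapper
            appendThumbnails(photos); // hypothetical DOM helper
            loading = false;
        });
    }
}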

Which reminds me, we should probably add a Slideshow to that page :)

And that’s the revised Places page.

[Edit: Oh and I know we've just launched Stats (again), but it's nice to give the dev a couple of days to recover before forcing them to write a changelog about it ;) ]

Moar Panda: Is a Firehose of Snowflakes a Nor’easter?

Kellan, in his blog post over here, Is a Firehose of Snowflakes a Nor’easter?, correctly points out that I may have been slightly cagey about the awesomeness of what we actually put out yesterday, by saying …

“But because the documentation is quirky, I think people missed the significance. These are Flickr real time data APIs.”

My excuse is that it’s my natural instinct to sneak new features past the lawyers by dressing them up in a non-serious fashion :)

But in reality, this is just an excuse to post the following photo …

Magic Panda Bus

… see, there I go again, offsetting the important with light humor. Go read Kellan’s post now, for he is smart and knows what he’s talking about.

Now stop trying to get on Explore & enjoy yourself

Photo by Las Tonterias

Panda Tuesday; The History of the Panda, New APIs, Explore and You

The New APIs

I’ll cut straight to the chase on this one: we’ve just launched two new API methods:

flickr.panda.getPhotos
flickr.panda.getList

The first call (flickr.panda.getPhotos) returns you a list of photos that the mystical Flickr Pandas are currently interested in, in the following format …

<photos interval="60000" lastupdate="1235765058272" total="120" panda="ling ling">
    <photo title="Shorebirds at Pillar Point" id="3313428913" secret="2cd3cb44cb"
        server="3609" farm="4" owner="72442527@N00" ownername="Pat Ulrich"/>
    <photo title="Battle of the sky" id="3313713993" secret="3f7f51500f"
        server="3382" farm="4" owner="10459691@N05" ownername="Sven Ericsson"/>
    <!-- and so on -->
</photos>

When calling this API method, please ensure that your code uses the lastupdate and interval attributes to determine when to request new photos. lastupdate is a timestamp (in milliseconds) indicating when the list of photos was generated, and interval is the number of milliseconds to wait before polling the Flickr API again (60000, or one minute, in the example above).
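
For example, a polite polling loop might look something like this (callFlickrApi and showPhotos are hypothetical helpers; panda_name is the method’s parameter for choosing a panda):

function pollPanda(pandaName) {
    // callFlickrApi performs the REST request and hands back the
    // parsed <photos> element from the response.
    callFlickrApi("flickr.panda.getPhotos", { panda_name: pandaName }, function (photos) {
        showPhotos(photos);
        // Respect the interval attribute rather than hammering the API.
        var interval = parseInt(photos.getAttribute("interval"), 10);
        setTimeout(function () { pollPanda(pandaName); }, interval);
    });
}

pollPanda("ling ling");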

The second call (flickr.panda.getList) tells you which of the Mystical Flickr Pandas are around to request photos from.

Ling Ling and Hsing Hsing both return photos they are currently interested in, though each has slightly different tastes depending on its mood. The third (current) Panda, Wang Wang, returns photos that have recently been geotagged; not quite realtime, but close.

Important! No one fully understands the whims and fancies of the Pandas at any moment in time; these APIs give you a direct dump of what they are looking at (filtered for safe and public photos only) at this very moment. We cannot promise that this will not include ladies (or men) in bikinis, or possibly even ladies not in bikinis. You should keep this in mind while planning for your target audience. I’ll touch on this a little bit more in the Implementation Hints section.

You Wanted the Best, You Got the Best

Philosophy

We’re always toying around with different ways to find and surface “interesting” photos. One method that people are generally familiar with is the Explore page, of which there’s always a lot of discussion. A good starting place for more Explore information is here with more general discussion in the Secrets Of Explore group.

But behind Explore is a fair amount of “Secret Sauce”, much of which focuses on activity around a media object. The Explore page is generally a daily experience, although things often pop in and out during the day.

What we are doing with the Pandas is exposing a more real-time experience, letting each of them focus on a different aspect of that activity. The first two, Ling Ling and Hsing Hsing, are fairly introspective and have different views of Flickr, but both focus on what they’ve noticed going on recently. The third, Wang Wang, is more a panda of the world and concentrates solely on geotagging. If you want to show photos being geotagged nearly as they happen, Wang Wang is the way to go.

From a numbers point of view, Explore shows 500 photos from the last 7 days … a Panda can get through around 150,000 photos per day.

A couple of caveats: just because a Panda notices a photo doesn’t mean that photo will make it into Explore, or even be destined to be considered by the Magic Donkey for Explore. Conversely, just because a photo has made it into Explore doesn’t necessarily mean that a Panda will have noticed it. Also, because of the complexity and amount of stuff going on, no one really knows exactly what Ling Ling and Hsing Hsing will find interesting, but that won’t stop us from tweaking the underlying code now and then.

Also, you may have noticed that this is one of the less serious APIs; it’s both experimental and there to hopefully encourage fun and play. Please respect the Pandas and don’t abuse them; help us protect this endangered species and keep them alive.

Implementation Hints

So, why are we doing all this messing around with “Pandas”?

Well, mainly because we want to give developers another stream of photos to play with. For example, you could build your own Flickr Zeitgeist Badge, or a slideshow, or a tool that just shows what’s going on over at Flickr.

We built this …

Flickr Panda!

… (many people wished we hadn’t) … the Rainbow Vomiting Panda of Awesomeness, as an experiment (which used Ling Ling, fwiw). Since then we’ve been tweaking the backend, and we’ve now finally released the API so you can all have a go.

It’s a stream of, on average, more interesting photos than you’d generally get from polling Everyone’s photos. The quality is pretty good. The best thing to do is watch the Panda for a while and figure out a) whether you want to build something with a live stream of photos, and b) whether you can build something better than a vomiting panda (which, let’s face it, is pretty hard to top!).

This is what I learnt while building it; it may help you too …

When you get a packet of photos back from a Panda, it’s around 60 seconds’ worth of stuff that the Panda found interesting, sorted from what the Panda found most interesting in that timeframe to least. Because people can often find ladies in bikinis interesting (for some reason), there can sometimes be some activity around ladies in bikinis that the Pandas may pick up on.

However, Pandas tend not to find ladies in bikinis that interesting (which may have something to do with why they’re in such peril), and when those photos do appear they tend to land in the lower half of the packet.

So in the case of the Vomiting Panda above, I just threw away the second half of each packet rather than go through all of the photos before asking for the next one. I found this to be a good thing to do anyway; even though there’s always a lot of amazing stuff going on on Flickr, the tail end of some packets wasn’t always quite as amazing as I wanted. You may want to do things differently, but that’s just what I found while working with the APIs.
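
In code, that filtering is a one-liner (a sketch, assuming the packet has already been parsed into an array):

// Keep only the first (most interesting) half of each packet.
var keepers = photos.slice(0, Math.ceil(photos.length / 2));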

Send us stuff

Anyway, on that note, at some point I’ll try to do a round-up of stuff people have built on these APIs. If you post what you build to the Flickr Hacks group and/or FlickrMail me, I’ll start to build up a list and take it from there.

Photos by ohad* and psd.

Videos in the Flickr API: Part Deux

We hope it’s obvious that long photos are taken extremely seriously here at FlickrHQ. To that end, it’s important that videos be first-class citizens in our API. And with today’s launch of video for free users and the HD output, we have some important new API changes to share with you. So if you’re an uploader developer, a mobile application developer, or you make those things that are changing the way people consume their TV and other media (nudge-nudge, Boxee), take a look at these new additions to the API and make our long photos feel right at home in your apps.

With the addition of our HD output, all videos are now transcoded into two or three MP4s. Since these are in a widely supported format, we thought it would be interesting to make them available to third-party developers. flickr.photos.getSizes will give you semi-permanent URLs to the 700k site output, the mobile-optimized output, the 2mbps HD output (if available), and the original video (if the call is authenticated by the video owner and the owner is a pro member). For example:

<rsp stat="ok">
<sizes canblog="1" canprint="1" candownload="1">
<size label="Square" width="75" height="75" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26_s.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/sq/" media="photo"/>
<size label="Thumbnail" width="100" height="56" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26_t.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/t/" media="photo"/>
<size label="Small" width="240" height="135" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26_m.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/s/" media="photo"/>
<size label="Medium" width="500" height="281" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/m/" media="photo"/>
<size label="Original" width="1280" height="720" source="http://farm4.static.flickr.com/3436/3232057393_204af3bcff_o.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/o/" media="photo"/>
<size label="Video Player" width="640" height="360" source="http://www.flickr.com/apps/video/stewart.swf?v=1233362721&amp;photo_id=3232057393&amp;photo_secret=815b1c5d26" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="Site MP4" width="640" height="360" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/site/815b1c5d26/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="Mobile MP4" width="480" height="360" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/mobile/815b1c5d26/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="HD MP4" width="1280" height="720" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/hd/815b1c5d26/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="Video Original" width="1280" height="720" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/orig/123fakest/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
</sizes>
</rsp>

Use this if you’re lazy. If you’re not lazy and want to be one of the cool kids, you can construct the URLs to these outputs yourself, assuming you have the NSID or custom URL of the owner, the video ID, and the secret (or original secret). Requesting the HD output for a video that doesn’t have one (because it’s less than 720 pixels high, or the owner is a free member) will deliver a 404. An example URL:

http://www.flickr.com/photos/{user-id|custom-url}/{photo-id}/play/{site|mobile|hd|orig}/{secret|originalsecret}/
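
A minimal helper for building those URLs (a sketch; the argument names are illustrative and no validation is done):

// size is one of "site", "mobile", "hd" or "orig"; pass the original
// secret (and use an authenticated call) for "orig".
function videoUrl(userId, photoId, size, secret) {
    return "http://www.flickr.com/photos/" + userId + "/" + photoId +
           "/play/" + size + "/" + secret + "/";
}

// e.g. videoUrl("mylesdgrant", "3232057393", "hd", "815b1c5d26")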

If you’re developing upload tools for free users, flickr.people.getUploadStatus has new attributes to determine whether a user is allowed to upload any more videos. If the user is a free user, the output of this method will now look something like this:

<rsp stat="ok">
<user id="88251462@N00" ispro="0">
<username>Dev Myles</username>
<bandwidth max="107373568" used="0" maxbytes="104857600" usedbytes="0" remainingbytes="104857600" maxkb="104857" usedkb="0" remainingkb="104857" unlimited="0"/>
<filesize max="10485760" maxbytes="10485760" maxkb="10240" maxmb="10"/>
<sets created="7" remaining="lots"/>
<videosize maxbytes="157286400" maxkb="153600" maxmb="150"/>
<videos uploaded="0" remaining="2"/>
</user>
</rsp>

Note the new <videos> element in the response, which will let you know how many more videos this user can upload this month.
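
For example, you might gate your upload UI on it like this (a sketch; rsp is the parsed response document and disableVideoUpload is a hypothetical UI hook):

var videosEls = rsp.getElementsByTagName("videos");
if (videosEls.length > 0) {
    // Free accounts report a numeric remaining count for the month.
    var remaining = parseInt(videosEls[0].getAttribute("remaining"), 10);
    if (remaining < 1) {
        disableVideoUpload(); // hypothetical UI hook
    }
}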

And as always, if you feel like we’re missing something, please let us know.