Video thumbnails are often the first things viewers see when they look for something interesting to watch. A strong, vibrant, and relevant thumbnail draws attention, gives viewers a quick preview of the video's content, and helps them find content more easily. Better thumbnails lead to more clicks and views for video creators.
Inspired by the recent remarkable advances of deep neural networks (DNNs) in computer vision, such as image and video classification, our team has recently launched an improved automatic YouTube "thumbnailer" in order to help creators showcase their video content. Here is how it works.
The Thumbnailer Pipeline
While a video is being uploaded to YouTube, we first sample frames from the video at one frame per second. Each sampled frame is evaluated by a quality model and assigned a single quality score. The frames with the highest scores are selected, enhanced and rendered as thumbnails with different sizes and aspect ratios. Among all the components, the quality model is the most critical and turned out to be the most challenging to develop. In the latest version of the thumbnailer algorithm, we used a DNN for the quality model. So, what is the quality model measuring, and how is the score calculated?
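The sample, score, and select steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `sample_timestamps`, `select_top_frames`, and the `score_frame` callback are hypothetical names, the callback stands in for the DNN quality model, and the enhancement and rendering steps are left out.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_timestamps(duration_seconds: float,
                      frames_per_second: float = 1.0) -> List[float]:
    """Return the timestamps (in seconds) at which frames would be sampled."""
    step = 1.0 / frames_per_second
    count = int(duration_seconds * frames_per_second)
    return [i * step for i in range(count)]


def select_top_frames(frames: Sequence[Frame],
                      score_frame: Callable[[Frame], float],
                      top_k: int = 3) -> List[Tuple[int, float]]:
    """Score every frame and return (frame_index, score) for the top_k frames.

    score_frame is a hypothetical stand-in for the quality model: it maps
    one frame to a single quality score. Ties keep the earlier frame first.
    """
    scored = [(score_frame(frame), index) for index, frame in enumerate(frames)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(index, score) for score, index in scored[:top_k]]


if __name__ == "__main__":
    # Toy input: each "frame" is already a number, so the scorer is identity.
    toy_frames = [0.2, 0.9, 0.5, 0.9]
    print(sample_timestamps(3.5))
    print(select_top_frames(toy_frames, lambda frame: frame, top_k=2))
```

Passing the scorer in as a callback keeps the selection logic independent of whatever model actually produces the scores.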
(Training) The Quality Model
Unlike the task of identifying whether a video contains your favorite animal, judging the visual quality of a video frame can be very subjective: people often have very different opinions and preferences when selecting frames as video thumbnails. One of the main challenges we faced was how to collect a large set of well-annotated training examples to feed into our neural network. Fortunately, in addition to algorithmically generated thumbnails, many YouTube videos also come with carefully designed custom thumbnails uploaded by creators. Those thumbnails are typically well framed, in focus, and centered on a specific subject (e.g., the main character in the video). We consider these custom thumbnails from popular videos as positive (high-quality) examples, and randomly selected video frames as negative (low-quality) examples. Some examples of the training images are shown below.
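The labeling scheme described above (custom thumbnails as positives, randomly selected frames as negatives) might be assembled along these lines. This is a minimal sketch, not the actual training pipeline: `build_training_set`, the `negatives_per_positive` ratio, and the fixed seed are all assumptions made for illustration, and the items are treated as opaque identifiers rather than real images.

```python
import random
from typing import List, Sequence, Tuple

POSITIVE = 1  # custom thumbnail from a popular video (high quality)
NEGATIVE = 0  # randomly selected video frame (low quality)


def build_training_set(custom_thumbnails: Sequence[str],
                       video_frames: Sequence[str],
                       negatives_per_positive: int = 1,
                       seed: int = 0) -> List[Tuple[str, int]]:
    """Return shuffled (item, label) pairs for a binary quality classifier.

    The ratio of negatives to positives and the seed are hypothetical
    knobs; the source does not say how the classes were balanced.
    """
    rng = random.Random(seed)
    wanted = len(custom_thumbnails) * negatives_per_positive
    negatives = rng.sample(list(video_frames), min(wanted, len(video_frames)))
    examples = [(item, POSITIVE) for item in custom_thumbnails]
    examples += [(item, NEGATIVE) for item in negatives]
    rng.shuffle(examples)
    return examples


if __name__ == "__main__":
    for item, label in build_training_set(["thumb_a", "thumb_b"],
                                          ["f1", "f2", "f3", "f4"]):
        print(label, item)
```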
Results
Compared to the previous automatically generated thumbnails, the DNN-powered model is able to select frames with much better quality. In a human evaluation, the thumbnails produced by our new models are preferred to those from the previous thumbnailer in more than 65% of side-by-side ratings. Here are some examples of how the new quality model performs on YouTube videos:
We recently launched this new thumbnailer across YouTube, which means creators can now choose from the higher-quality thumbnails it generates. Next time you see an awesome YouTube thumbnail, don’t hesitate to give it a thumbs up. ;)
Weilong Yang, software engineer, recently watched “Contact Juggling - His Skills are Totally Hypnotizing”
Min-hsuan Tsai, software engineer, recently watched “People Are Awesome 2015”
Thanks to the Video Content Analysis and YouTube Creator teams
Me at the Zoo is the first video uploaded to YouTube.
One year after YouTube launches, videos play in the FLV container with the H.263 codec at a maximum resolution of 240p. We scale videos up to 640x360, but you can still click a button to play at original size.
YouTube is one of the original applications on the iPhone. Because the iPhone doesn't support Flash, we re-encode every single YouTube video into H.264 with the MP4 container. YouTube videos get a resolution bump to 360p.
With upload sizes and download speeds growing, videos jump in size up to 720p HD. Lower resolution files get higher quality by squeezing Main Profile H.264 into FLVs.
YouTube supports 3D videos, 1080p and live streaming.
The biggest screen in your house now gets YouTube courtesy of Flash Lite and ActionScript 2. 2010 also sees the first playbacks with HTML5 <video> thanks to VP8, an open source video codec. We bump up the maximum resolution to 4K, known as "Original" at the time.
We launch Sliced Bread, codename for a project that enables adaptive bitrate in the Flash player by requesting videos a little piece at a time. Users see higher quality videos more often and buffering less often.
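Fetching a video a little piece at a time lets the player reconsider quality before each request. The sketch below shows one simple, generic way to make that choice; it is not Sliced Bread's actual logic. The function name `pick_bitrate`, the rendition ladder, and the `safety_factor` headroom are all assumptions for illustration.

```python
from typing import Sequence


def pick_bitrate(available_kbps: Sequence[int],
                 measured_throughput_kbps: float,
                 safety_factor: float = 0.8) -> int:
    """Pick the highest rendition that fits within the measured throughput.

    safety_factor is a hypothetical headroom margin: only a fraction of the
    measured throughput is budgeted, to reduce the risk of rebuffering.
    If nothing fits, fall back to the lowest rendition.
    """
    budget = measured_throughput_kbps * safety_factor
    affordable = [rate for rate in available_kbps if rate <= budget]
    return max(affordable) if affordable else min(available_kbps)


if __name__ == "__main__":
    ladder = [400, 1000, 2500, 5000]  # hypothetical renditions, in kbps
    # Throughput measured (in kbps) while downloading each previous piece.
    for throughput in [3500, 7000, 900, 100]:
        print(throughput, "->", pick_bitrate(ladder, throughput))
```

Because the decision is repeated for every piece, quality can step up when the connection improves and step down before the buffer runs dry.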
We scale up our live streaming infrastructure to support the 2012 Summer Olympics, with over 1,200 events. In October, over 8 million people watch live as Felix Baumgartner jumps from the stratosphere.
We start our first experiments with VP9 in Chrome, which brings higher quality video at less bandwidth. Adaptive bitrate streaming in the HTML5 and Flash players moves to the DASH standard using both FMP4 and MKV video containers.
High frame rate isn't just for games anymore: YouTube now supports videos that play in up to 60fps. Gangnam Style becomes the first YouTube video to break the MAX_INT barrier with more than 2^32 / 2 - 1 views.
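The MAX_INT figure works out to 2^32 / 2 - 1 = 2,147,483,647, the largest value a signed 32-bit integer can hold. The short sketch below checks the arithmetic and shows what happens one view later, assuming the counter was stored as a signed 32-bit integer (which the MAX_INT reference implies but the text does not spell out). `to_int32` is a hypothetical helper that mimics two's-complement wraparound, since Python integers do not overflow on their own.

```python
MAX_INT32 = 2**32 // 2 - 1  # same value as 2**31 - 1


def to_int32(value: int) -> int:
    """Wrap an arbitrary integer into the signed 32-bit range."""
    return (value + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    print(MAX_INT32)                # 2147483647
    print(to_int32(MAX_INT32))      # 2147483647 (still fits)
    print(to_int32(MAX_INT32 + 1))  # -2147483648 (wraps around)
```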
You can now upload videos that wrap 360 degrees around the viewer. Even 4K videos can play up to 60fps. HTML5 becomes the default YouTube web player.
Richard Leider, Engineering Manager, recently watched David Bowie - Oh You Pretty Things
Jonathan Levine, Product Manager, recently watched Candide Thovex - One of those days 2