Thursday, September 16, 2010

tvnz-grab in factor

The previous post in factor. I wrote it so my Windows using friends could have a single-binary solution. Download the zip archive and unzip somewhere in the windows path; C:\windows\system32 will do.

Usage: tvnz-grab <episode-page-url>. It will download all parts of the episode into the current directory.

Unfortunately, this only works for NZ content. Foreign content uses Adobe’s encrypted RTMP protocol. To get at episodes using that, check out rtmpsuck.

Imports first.

! Copyright (C) 2010 Jeremy Hughes.
! See for BSD license.
USING: accessors assocs combinators.short-circuit fry
http.client io.streams.byte-array kernel namespaces make
regexp sequences xml locals splitting strings
io.encodings.binary io io.files command-line http system
math.parser destructors math math.functions io.pathnames
continuations xml.traversal ;
IN: tvnz-grab

Because I intend to extend this program into a small Qt demo, it is necessary that any words used to display UI information, dispatch on the type of UI.


And the display methods themselves…

HOOK: show-progress ui ( chunk full -- )
HOOK: show-begin-fetch ui ( url -- )
HOOK: show-end-fetch ui ( -- )
HOOK: show-page-fetch ui ( -- )
HOOK: show-playlist ui ( seq -- )
HOOK: show-fatal-error ui ( error -- )

M\ text show-progress uses the symbols bytes and count to keep track of the number of bytes downloaded and the proportion of progress-bar fill respectively.

: print-bar ( full chunk -- )
    count [
        [ swap / 50 * round ] dip [
            - CHAR: =
             >string write
        ] [ drop ] 2bi
    ] change ;

M: text show-progress
    swap bytes [ + [ print-bar ] keep ] change flush ;

M: text show-begin-fetch
    "Fetching " write print "[" write flush ;

M: text show-end-fetch
    "]" print flush ;

M: text show-page-fetch
    "Fetching TVNZ page..." print flush ;

M: text show-playlist
    length "Found " write number>string write " parts." print
    flush ;

M: text show-fatal-error
    dup string? [ print ]
    [ drop "Oops! Something went wrong." print ] if 1 exit ;

Failed HTTP request errors need to be wrapped in a user friendly explanation.

: wrap-failed-request ( err -- * )
        "HTTP request failed: " % [ message>> % ]
        [ " (" % code>> number>string % ")" % ] bi
    ] "" make throw ;

The playlist parameter in each episode’s web page is in a section of code looking like this.

var videoVars = {
    playlist: '/content/3685181/ta_ent_smil_skin.smil',
    config: '/fmsconfig.xml',
    ord: ord

Given this code could change unpredictably, we’ll use nothing more robust than a regular expression to get at the playlist path.

: get-playlist ( url -- data )
    http-get [ check-response drop ]
    [ R/ (?<=playlist: ').*(?=')/ first-match ] bi* [
        "" prepend http-get [
            [ check-response drop ]
            [ wrap-failed-request ] recover
        ] dip
    ] [ "Could not find playlist at address." throw ] if* ;

The playlist is an XML file of which only the video elements concern us.

<video src="path-to-segment.flv" systemBitrate="700000"/>

700000 appears to be the highest bit rate so that is what we’ll go for.

parse-playlist takes the output of get-playlist and returns a list of URLs to episode segments.

: parse-playlist ( data -- urls )
    bytes>xml body>> "video" "700000" "systemBitrate"
    [ [ drop "src" ] [ attrs>> ] bi at ] map [ ] filter ;

Each segment is downloaded in chunks.

: call-progress ( data -- )
    length response get check-response
    "content-length" header string>number show-progress ;

: process-chunk ( data stream -- )
    [ stream-write ] [ drop call-progress ] 2bi ;

: get-video-segment ( url -- )
    [ show-begin-fetch ] [ ]
    [ part-name binary  ] tri
    [ '[ _ process-chunk ] with-http-get drop flush ]
    with-disposal show-end-fetch ;

: get-video-segments ( urls -- )
    [ get-video-segment ] each ;

grab-episode is where the action starts.

: (grab-episode) ( url -- )
    show-page-fetch get-playlist parse-playlist dup
    show-playlist [
        0 bytes count [ set ] bi-curry@ bi get-video-segments
    ] with-scope ;

: grab-episode ( url -- )
    [ (grab-episode) ] [ nip show-fatal-error ] recover ;

For the complete program see my github.

Saving TVNZ “ondemand” episodes

The code in my previous post will work for episodes from TVNZ’s ondemand service; but there is a more direct approach.

Each episode page stores a URL to the episode’s playlist in a javascript property "playlist". The URL points to an XML file containing the URLs of four or more video parts and no ads. The following script downloads the episode page, grabs the playlist URL, downloads the playlist, and parses it. It returns a newline separated list of video URLs.

You will need to have curl and xmlstarlet installed.


if [ "$1" == "" ]; then
    echo "usage: $0 episode-url"
    exit 1


PLAYLIST=`curl "$EPISODE" | grep -Po "(?<=playlist: ').*?(?=')"`

curl "$PLAYLIST" | \
    xml sel -N smil= \
        -t -m "//smil:video[@systemBitrate='700000']" -v '@src' -n -

I have saved it as tvnz-grab and call it like so:

tvnz-grab <episode-page-url> | xargs curl -O

The four or five parts can be played as a single file using mplayer’s "-fixed-vo" option.

Saving flash video on linux

On linux, the flash-player caches video in files under /tmp. Each video gets a file called /tmp/Flash[random-string]. These are easy enough to find and rename to [meaningful-string].flv. However, some players delete these files on completion, which is a pain given they are the files containing the video we want to save.

Hard linking these files solves the problem. The Flashxxx files will be deleted by the player, but their contents will sill be available at whatever location we made the hard link.

Some players split their content into multiple files, so we want this hard linking process to be automatic.

I have the following script saved as lnflv in my ~/bin.


cd /tmp
for x in Flash*; do
    count=`find . -xdev -samefile "$x" | wc -l`
    if [ $count -gt 1 ]; then
        echo "Not linking $x. Already linked."
        echo "Linking $x."
        ln "$x" "dl-$x.flv"

Run it every thirty seconds like this: watch -n30 lnflv. Each flash file not already hard linked will be linked to /tmp/dl-Flashxxx.flv.

Order by time and rename video parts with something like this:

for f in `find . -size +10M -name dl-Flash\* | xargs ls -tr`; do
    cp "$f" [name]$alpha.flv
    alpha=`echo -n $alpha | tr 'a-y' 'b-z'`

Change [name] to the name of whatever you are watching. "-size +10M" filters out files smaller than 10MB; this screens out short ads and is only required for video streams that embed them.