Category Archives: Software stuff

Posts about mostly software related things

Inserting python test calls with odd data types

Kind of a small “Oh wait, that’s what I should do”, but.. You know how sometimes you’re processing something and end up doing a zillion slow http requests/file loads/calculations/etc to get the data you’re test processing? So you write it to a file and then load it, but by now you’ve written a bunch of code that already expects it in an object? Yes, you’re right, you should restructure the whole thing so it doesn’t, and end up with neat data objects, but ain’t nobody got time for that. So instead of redoing the code, you can just make a dummy class and pound the data into that. Ex:
Original:

import requests

s = requests.Session()
r = s.get("http://www.example.com")
a = do_huge_thing_that_isnt_a_neat_function_like_this(r.headers,r.cookies)
b = do_other_thing(r.text,a)

Debug mode:

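import json
import sys

import requests

# Pick one mode:
# DEBUG = 'normal'  # do the request and carry on as usual
DEBUG = 'save'      # do the request, save the result, and stop
# DEBUG = 'load'    # skip the request and load the saved sample

if DEBUG != 'load':
    s = requests.Session()
    r = s.get("http://www.example.com")
    if DEBUG == 'save':
        # headers and cookies are dict-like objects; json wants plain dicts
        with open('testdata', 'w') as f:
            json.dump([r.text, dict(r.headers), r.cookies.get_dict()], f)
        sys.exit()  # a bare "return" only works inside a function
else:
    class r: pass  # dummy stand-in for the response object
    with open('testdata', 'r') as f:
        r.text, r.headers, r.cookies = json.load(f)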

a = do_huge_thing_that_isnt_a_neat_function_like_this(r.headers,r.cookies)
b = do_other_thing(r.text,a)

Bam! You can keep calling them r._stuff_ instead of redoing every place you called them that, and you can switch between the DEBUG = statements to do the request and save it, request it and keep going as usual, or skip the request and just load a saved sample. The variables in your class are dynamic, so as long as you’re not expecting there to be another class in the class you can just put them there. Otherwise you’d have to do something like


r.class1 = r()
r.class1.class2 = "data"

so that you create a new instance (implicit in the ()) to assign to. That, too, is possible and you can keep stacking them if you need to.
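Here is a small self-contained sketch of the whole trick, so it can be run without any network access. The names slow_fetch, Dummy, save_sample and load_sample are made up for this example, and slow_fetch just returns canned data in place of the real slow step.

```python
import json
import os
import tempfile


def slow_fetch():
    """Hypothetical stand-in for the slow request/file load/calculation."""
    return {"text": "<html>hello</html>", "headers": {"Content-Type": "text/html"}}


class Dummy:
    """Empty class; attributes get attached to instances after the fact."""
    pass


def save_sample(path):
    # Do the slow thing once and keep only the parts the later code uses.
    data = slow_fetch()
    with open(path, "w") as f:
        json.dump([data["text"], data["headers"]], f)


def load_sample(path):
    # Rebuild something that looks enough like the original object.
    r = Dummy()
    with open(path) as f:
        r.text, r.headers = json.load(f)
    # Nested access (r.meta.source) needs another instance to hang things on.
    r.meta = Dummy()
    r.meta.source = path
    return r


path = os.path.join(tempfile.mkdtemp(), "testdata")
save_sample(path)
r = load_sample(path)
print(r.text, r.headers["Content-Type"], r.meta.source)
```

The code further down the line keeps reading r.text and r.headers without knowing that r is now a stand-in. Only what survives a trip through json comes back though, so anything that was a special object turns into a plain dict, list or string.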

And then I guess later be a good boy and abstract your stuff into actual classes or functions or dicts so that your code isn’t super dependent on a data source and can be reused later (HAHAHahaha.. Yeah I know, but perhaps some day you’ll have it so sparklingly clean that reuse is an actual option).

Embedding html5 video (bonus: or flv flash video)

In the past, we’ve had to use animated gifs or flash to embed video in a web page. Well, we still do kinda – there’s still no way to include a really fully functional file that plays pretty much everywhere like gifs do, and flash/java support is pretty clunky and increasingly rare. But html5 does include video, so if you’re OK with functioning on Chrome/Firefox/IE in any moderately recent version and leaving behind the people running Internet Explorer 6 under Windows XP, you can use it. It’s considerably smaller than gif (which is a *terrible* format for video), supports audio and provides better quality and, in most cases, a lighter load on the machine playing it.

It’s pretty simple. If your video is in mp4, just upload the file somewhere and put

<video width="788" height="720" controls>
 <source src="sol50.mp4" type="video/mp4">
No html5 video tag support in your browser, sorry.
</video>

in a page somewhere. Change width and height to the appropriate size of the video and sol50.mp4 to the name of the video you uploaded (or full http://…./something.mp4 if it’s on another server). Bam..

This is a screencap of what the google doodle from the other day does if you solve the cube (captured with ezcap, chosen fairly randomly since the capture tools I already had installed worked poorly after several windows updates; ezcap worked great for me). Someone asking about it prompted me to look this up. If you’re in a blog type format, the tag may get changed (this incarnation of wordpress keeps changing it to alternately include a flash version) – I’m tricking it a bit to keep it here, supposing that works, but it does function otherwise too, like on this page.

For more compatibility, you should include an Ogg Theora version (Vorbis is the audio half of Ogg, Theora the video half), as some browsers needing to follow strict licencing standards do not support mp4 (a patent-encumbered format you can’t support in software any way you’d like) but do support Ogg (a free format anyone can implement). Also, if your video isn’t in mp4, you need to convert it to that. In my particular case I needed to do both – ezcap just shoves the captured file in a wmv file in Documents\ezcap\projects\(a number)\media\(a uuid).WMV. You can convert it with anything that supports it, but I used ffmpeg, as it’s free and open source and very good (supports lots of formats).

It is a command line tool though, so nothing to click on and stuff. To use it, after you’ve installed it somewhere, go to a command prompt (or shell prompt in linux/mac) wherever your video is and do

ffmpeg -an -i input_video.wmv -b:v 150k sol50.mp4

The “-an” here means “Do not include audio”; if the video does have audio and you want to include it, skip that. “-i” marks the input video, and sol50.mp4 at the end is the output video. Since it’s named .mp4, ffmpeg will assume you want it converted to an mp4. The -b:v 150k means I’m asking for a video bitrate of 150 kbit/s, which is a fairly low rate. The file size will here be around 400 kB, but quality will be somewhat questionable. You can set this to something higher, or leave it out to get some default (depends on the source file I think, it seems to shoot for ~400k in my particular case). To convert to ogg, do the same thing again but with .ogg instead of .mp4. The ogg encoder doesn’t seem to pay attention to -b:v; I didn’t really dig deeper into it since it’s kind of a fallback.
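If you end up converting more than one file, the same command can be put together from a script. This is only a sketch: ffmpeg_command is a helper name I made up, it uses nothing beyond the -i, -an and -b:v options described above, and it places -an after the input (as an output option), which is my choice here rather than a requirement.

```python
import shlex


def ffmpeg_command(source, target, video_bitrate=None, keep_audio=True):
    """Build an ffmpeg argument list; the output format follows target's extension."""
    cmd = ["ffmpeg", "-i", source]
    if not keep_audio:
        cmd.append("-an")               # drop the audio track
    if video_bitrate:
        cmd += ["-b:v", video_bitrate]  # e.g. "150k" for 150 kbit/s
    cmd.append(target)
    return cmd


# One command per output format, same settings for both.
commands = [
    ffmpeg_command("input_video.wmv", "sol50" + ext, "150k", keep_audio=False)
    for ext in (".mp4", ".ogg")
]

for cmd in commands:
    # Print what would be run; to actually run it, pass cmd to
    # subprocess.run(cmd, check=True) with ffmpeg installed and on the PATH.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list (rather than one long string) means file names with spaces in them do not need any extra quoting when handed to subprocess.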

Include it in your tag as well, like

<video width="788" height="720" controls>
 <source src="sol50.ogg" type="video/ogg">
 <source src="sol50.mp4" type="video/mp4">
No video tag support.
</video>
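If you have a pile of these to embed, the tag can be generated rather than typed out each time. A rough sketch, where video_tag and the MIME_TYPES table are names I invented for the example and only cover the two formats used here:

```python
from html import escape

# Assumed mapping from file extension to MIME type, matching the tags above.
MIME_TYPES = {".mp4": "video/mp4", ".ogg": "video/ogg"}


def video_tag(sources, width, height, fallback="No video tag support."):
    """Return an html5 video tag with one <source> line per file, in order."""
    lines = ['<video width="{}" height="{}" controls>'.format(width, height)]
    for src in sources:
        ext = src[src.rfind("."):].lower()
        lines.append(' <source src="{}" type="{}">'.format(
            escape(src, quote=True), MIME_TYPES[ext]))
    lines.append(escape(fallback))
    lines.append("</video>")
    return "\n".join(lines)


snippet = video_tag(["sol50.ogg", "sol50.mp4"], 788, 720)
print(snippet)
```

The browser tries the sources in the order listed and plays the first one it supports, so the order of the list is the order of preference.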

..and there you go. You can run “ffmpeg -h” for a list of other commands, including “ffmpeg -encoders” / “ffmpeg -decoders” to list what it can encode/decode (it’ll vary depending on your build and version, there are several in rotation since it’s on many platforms and it’s always pretty contentious legally which formats you’re allowed to support in a piece of software without paying licencing fees to someone). It’s a very powerful little program and includes everything it does in itself (no external codecs and such), but can be a little unwieldy.

VLC Media Player has a graphical user interface (you get to click on stuff), is based on many of the same libraries and can do many of the same things. It’s another option that can be a little simpler. However, it’s not *that* much simpler since it includes so many different options (despite its GUI, it’s quite a power tool in its own right). Adobe Premiere can do some of it (if you were editing it heavily anyway) but then you probably know your software pretty well if you’ve taken it to those extremes. I’m sure there are lots of more friendly options (that video editor that comes with windows? Whatever pops up when you google “free video converter”?) but I don’t have any serious recommendations. If you have a codec stack installed already, such as Shark007, CCCP or K-Lite (do NOT install two of them or all three – chaos will ensue. Pick one; if you need to add stuff it doesn’t support, uninstall it and install another or install individual extra codecs) you may be able to use virtualdub, though most builds are for avi only unfortunately, and even though it’s an old favorite of many of us for quick converts/light edits it’s becoming a bit obsolete with all the new formats flying around.

If you want to include a flash player version, you’ll need to convert it to that and link up a player somehow, might write up a little more specifically how to do that if I get to it. The installation of wordpress I’m using seems to auto-do that with tinyMCE and Moxieplayer by Moxiecode. But then you’re a ways out of just tinkering with html. There are some easier (and still free and open source) alternatives, like flv-player. Actually, you know what, let’s go over that too instead of me forgetting to do another post later.

Do the video conversion as above, but instead of .mp4 and/or .ogg, use .flv. This will get you a third (or.. a first, if you didn’t do the other two) version in .flv, adobe flash video. Upload it to wherever you’ll be hosting it however you usually do that (add media, sftp to your site, some web interface, etc). Figure out where it got hosted (i.e. obtain the URL where people could download it, like ‘http://www.nothisispatrik.com/videoexample/sol50.flv’ here). Then go to flv-player.net and pick one of the versions you think looks good (I used the mini this time). Click on generator, enter the URL and other parameters. It’ll give you a block of code to include in your page (or to replace the “You do not have html5 video” line with), looking something like this

<object type="application/x-shockwave-flash" data="http://flv-player.net/medias/player_flv_mini.swf" width="788" height="720">
<param name="movie" value="http://flv-player.net/medias/player_flv_mini.swf" />
<param name="allowFullScreen" value="true" />
<param name="FlashVars" value="flv=http%3A//www.nothisispatrik.com/videoexample/sol50.flv&amp;width=788&amp;height=720" />
</object>
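The odd looking FlashVars value is just the video URL percent-encoded (the “:” becomes %3A) with the “&” separators written as html entities. If you’d rather not go through the generator for every video, something like this sketch builds the same string; flashvars is a name I picked, and the flv/width/height parameter names are simply the ones visible in the generated block above.

```python
from html import escape
from urllib.parse import quote


def flashvars(flv_url, width, height):
    """Build the value for the FlashVars param as it appears in the html."""
    # quote() percent-encodes ':' but leaves '/' alone by default.
    raw = "flv={}&width={}&height={}".format(quote(flv_url), width, height)
    # Inside an html attribute, '&' has to be written as '&amp;'.
    return escape(raw)


value = flashvars("http://www.nothisispatrik.com/videoexample/sol50.flv", 788, 720)
print('<param name="FlashVars" value="{}" />'.format(value))
```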

Jam that on a page and you’re done. Example page. As you may notice, you’re hotlinking to flv-player.net for the actual player (i.e. the player itself is being downloaded from them, not your site). If you’d like to host the player yourself too (so that it’ll be there even if flv-player.net is down, inaccessible or has moved – not especially likely, but hey, you never know), click Download in the same sidebar menu as Generator, right click the player and select Save link as.. Upload that to wherever you want on your site, then rewrite the block from the generator to replace the two (Yes, there are two. Replace both.) “http://flv-player.net/medias/player_flv_mini.swf” with the URL of the copy on your site. Now it’ll load the player itself (a little 4.5 kB .swf) from your site. Example of this. flv-player.net is likely more stable than you are though, so it’s probably not needed. However, if you want it to work offline or you’re planning on having this site up for decades (possibly moving/archiving it now and again, having it faithfully show something like your daughter’s first bike ride every other year), go ahead and throw in the player with it – it’s tiny anyway and you won’t have to worry until flash becomes obsolete and unavailable. Which, I guess, is next Thursday unless it was last year, but that’s another can of worms – data archiving is harder than it seems.

If you’d like to triple platinum safeguard that *something* shows up, you could also make a gif and include that (example gif) as an <img src="somewhere.gif">. I did so with Photoshop. I opened the video in it, trimmed start/end a bit and did Save for web, changing type to .gif. Since I wanted it to be on imgur (didn’t feel like hosting it in case it got passed around too much. In hindsight I’m not sure I care, it was more of a quick thing that then became a not-so-quick thing getting back up to date) I had to go another step and reduce the number of colors pretty massively (to 8 colors), which there’s no super straightforward way to do in photoshop (that I know of). Might write another post about that after this too (again, now and forever, if I get to it). Even in a good situation, such as this one with lots of whole single color fields, it became 1.2 MB, and more like 5 MB with no palette tweaks (four to thirteen times the mp4).

You could also use an online service like gfycat.com, which does a pretty decent job and provides html5 + gif versions, though with some limitations. You know, if you wanna be a big sissy about it :-). It came in at around 5 MB for gif, with defaults, though. They do host it, which is nice if you’re hurting for bandwidth. Example (html5 video), Example (gif). Odds are you’ll perpetually hunt for a free service that matches your current case and hasn’t changed since last time you were there though (started adding logos/watermarks, changed acceptable file sizes or lengths, folded altogether, etc) so it kind of makes sense to learn how to do it “the real way”.

Some notes learning to format books into mobi / azw

A few days ago, my wife asked if I knew how to edit mobi files. A friend of hers was having some issues getting her latest book to come out right and while she had someone on it, perhaps I could do something. I gave a resounding “maybe”. Hey, Calibre has an edit button, right? Surely this stuff is documented online? I didn’t really expect to do anything that useful right out of the gate, since even though I’ve converted stuff for it at times, what I’d learned so far was approximately “It’s that format Amazon uses, you can make it with Calibre”. So I asked if I could look. I could and did. So I’m going to write a little about what I’ve learned so far.

Firstly, they’re often called azw files. Mobi is the older name: it’s Mobipocket’s format, which Amazon bought for use with the Kindle. It’s a very ancient format, built on the Palm database format, which places it way early in the eBook game, but it’s still used since Kindle happened to be what flipped the script and started convincing non-techies that reading eBooks is pretty cool.

Secondly, it’s actually a collection of various possible formats. This makes a lot of sense if you think about it: Kindles have been around for six-seven years now and the current ones make laptops from that era look slow. The newer stuff couldn’t (at least entirely) just have been planned for, and it wasn’t – the format has grown. It’s still backwards compatible though, you actually can dig out your Kindle 1 and order some fresh bestsellers (or my wife’s friend’s book when it’s done) and read them just fine. That’s pretty cool. Some of the other formats are just there for specific stuff that requires very different formatting like manga, photo books, maps.. so forth. Unless you’re publishing/making stuff like that, you don’t have to worry about it because one format is pretty much *the* format. Even if you’ve been on board from the start it’s fairly likely it’s all you’ve seen. That’s also good – the less to worry about the better.

But one thing will still catch you. The “it” format is actually two formats. The old mobi is usually called mobi 7, though there’s nothing about the file itself at the consumer level that makes it look different from the newer ones. When the Kindle Fire launched, a new format called KF8 (or sometimes mobi 8) came on the scene. It’s expanded in quite a few ways, capable of embedded fonts, complex tables, accurate control over the layout, fixed layout and a ton of other stuff. It, too, lives in an azw or mobi file, and that file contains a copy of both: the old format comes first, and a copy in the newer format is attached after it. Older devices and software don’t mind, they just read the file as though it was nothing but the original format. That’s why an ancient Kindle can still read your books.

Just to top it off, it also contains a copy of whatever you generated those two copies from, which is almost always an ePub or something very similar to an ePub. That’s not always in it; some delete it since it takes up like a third of the file and no consumer devices actually use it for anything. I imagine it’s quite handy for people dealing with these files though – having the source files in there makes it possible to unpack the book, change/fix/update it and pack it back up good as new. When packing it back up you still have to generate a copy in the older (rather bare bones formatting wise) format too, so that you end up with a properly formatted file that can be read on both kinds of systems or unpacked again.

The making is done with one of a few different tools. Mobipocket puts out Mobipocket creator, though I think it may end up using kindlegen in the end (didn’t go all the way through with building one with it). InDesign by Adobe also has a plugin by Amazon, which looks kinda neat but I don’t have InDesign so I haven’t seen it. Amazon puts out the official tool, kindlegen, which converts an ePub or a directory containing what would be in an ePub (ePubs are actually zip files – you can unpack them with a normal unzipper like 7zip or winrar to see their insides) into the two formats and combines the whole shebang into a working mobi eBook. They also put out Kindle Previewer, which does the same thing (using the same stuff kindlegen uses I presume) but can also show the book on screen as it would show up on different devices. Those and mobipocket are free, available from these places:

KindleGen

Kindle Previewer

Mobipocket Creator

The preview thing is important! As I mentioned, there are two formats in the file you make (or get from whomever) so even if you put it on your kindle and go over it, you’re only looking at half the file – the other format in it could still be a complete disaster. The previewer isn’t perfect, but if you open something in it and select “Kindle Paperwhite” from the device menu, you’ll be looking at the old mobi 7 (in full low-resolution 16 grayscale glory). Choose Kindle DX to see the KF8 and/or Kindle Fire to see it in color. That way, you can actually see misses in the other format. If you only check it on one or the other, you can’t tell what it’ll look like for the other side. It’s also wise to give it a go on Kindle for PC/Android/Mac/iOS along with any hardware lying around. Their current versions use KF8, but they’re still not functionally identical and the resolution differences are even more extreme. My smallest screen is my somewhat aging Samsung admire at 320 x 480, my biggest is the PC I’m typing on now at 1080p – that’s quite a gap in terms of making content work on both in one file.

There’s no official tool for unpacking them unfortunately, but there’s a reverse engineered one by adamselene (possibly et al.). It’s called KindleUnpack, originally mobiunpack, which was the same thing without a GUI. Both are written in python, so if you want to use it you’ll need to install that too if you don’t already have it. You can get the files here:

KindleUnpack

Python 2.7.6

You need to install Python first, then unzip KindleUnpack and run KindleUnpack.pyw (open it in python or double click it). You need to be in the same directory you unpacked it in or it won’t find the files it brought with it in the “lib” directory (it doesn’t come with an install as such). Doing that and picking the input and output directory, I unpacked the book as well as a few others for comparison. It’s a pretty straightforward interface once it’s running; the only options are for whether you want it to in turn unpack the two extracted .mobi and .azw3 as well so you can see their parts. It won’t work on all books, since the exact format is proprietary and known fully only to Amazon so some situations are unhandled, but most books I tested worked fine or could at least cough up the source files (meaning kindlegen or kindle previewer can reassemble those into a fresh one with all the formats).

At the time I didn’t know any of this and misunderstood quite a bit of what was going on, but I was presented with these files:

  • kindlegenbuild.log, 4 kb
  • kindlegensrc.zip, 1.8 Mb
  • mobi7-formated MOBI Angel An Exceptional Twist Kimi Flores 2-26-14.mobi, 1.3 Mb
  • mobi8-formated MOBI Angel An Exceptional Twist Kimi Flores 2-26-14.azw3, 1.7 Mb

.. and two directories, mobi7 and mobi8, containing:

  • ./mobi7/formated MOBI Angel An Exceptional Twist Kimi Flores 2-26-14.html
  • ./mobi7/formated MOBI Angel An Exceptional Twist Kimi Flores 2-26-14.nc
  • ./mobi7/formated MOBI Angel An Exceptional Twist Kimi Flores 2-26-14.opf
  • ./mobi7/formated MOBI Angel An Exceptional Twist Kimi Flores 2-26-14.rawml
  • ./mobi7/Images
  • ./mobi7/Images/cover00305.jpeg
  • ./mobi7/Images/image00226.jpeg
  • <pile more of image files>

and

  • ./mobi8/formated MOBI Angel An Exceptional Twist Kimi Flores 2-26-14.rawml
  • ./mobi8/META-INF
  • ./mobi8/OEBPS
  • ./mobi8/OEBPS/Fonts
  • ./mobi8/OEBPS/Images
  • ./mobi8/OEBPS/Styles
  • ./mobi8/OEBPS/Styles/style0001.css
  • ./mobi8/OEBPS/Text
  • ./mobi8/OEBPS/Text/part0000.xhtml
  • ./mobi8/OEBPS/Text/part0001.xhtml
  • ./mobi8/OEBPS/Text/part0002.xhtml
  • ./mobi8/OEBPS/toc.ncx

As you may notice, the mobi8 files for images failed to extract correctly from the KF8 portion. I’m not sure why, though kindlegensrc.zip indeed contains the source files (and will open in an ePub reader if renamed; it appears to be a compliant ePub). So, let’s go over what’s here.

The .log file is the output from kindlegen (in this particular case) when building it, basically a brief listing of what it put in and a “successfully encoded” message. The .mobi and .azw3 are (obviously?) the old and new standalone versions for their respective sides of devices/software. The jpgs are jpeg images. There were a few gifs thrown in too; those two are the only officially supported end types I believe, at least in mobi 7. Kindlegen will happily convert tons of other formats, but if you want to have final say in the file that actually winds up with the customer you’ll need to submit those two formats specifically and also make sure they’re within the sizes that do not get reformatted for being too large. What constitutes too large varies. Amazon recommends less than 450×550, except for the cover, which is recommended at 800×600 for the older devices and 1562×2500 on newer ones. File size in bytes can trigger it too. Some files larger than 450×550 do not seem to be converted. Amazon seems to mostly give guidelines rather than stating point-blank what they will actually not tolerate, mostly with the justification that you should follow those guidelines anyway, and if you accidentally don’t and it happens to be something that can work under the specific circumstances, their end won’t intentionally prevent it.

As you can see, many of the files are of the same format in both (although they do not contain exactly the same thing, kindlegen made different changes to the original files in each). They are also pretty similar to what is in kindlegensrc.zip, so let’s go ahead and look in that instead since it’s where they came from and what will be directly edited if changes are made. It contains:

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2013-01-01 00:00:00 .....           20           20  mimetype
2013-01-01 00:00:00 .....          233          152  META-INF\container.xml
2013-01-01 00:00:00 .....         9069         1843  content.opf
2013-01-01 00:00:00 .....          776          331  ncx.ncx
2013-01-01 00:00:00 .....          317          211  c001.xhtml
2013-01-01 00:00:00 .....          288          185  toc.xhtml
2013-01-01 00:00:00 .....       876813       284210  c002.xhtml
2013-01-01 00:00:00 .....         2195          508  base.css
2013-01-01 00:00:00 .....        25560        24725  images\image003.png
    < lots of more image files removed >
------------------- ----- ------------ ------------  ------------------------
                               2502514      1864119  89 files, 0 folders

“mimetype” is specific to ePub. It has to be in the zip (ePub) file as the first file and without any compression, containing the mime type of the whole file. It does (“application/epub+zip”). Another requirement in ePubs is that they have to have a directory named META-INF containing an xml file named container.xml. XML is a common cross platform/application data format; html is a close relative (xhtml is html written as XML), so it looks similar to that. container.xml needs to contain a path (from the base of the zip) to the package file, here content.opf, and specify that its type in turn is “application/oebps-package+xml”. That is indeed what is in this one. OEBPS stands for Open eBook Publication Structure, the standard for how ePub files are put together. The content file is an OPF file, Open Packaging Format, which further defines what is in the publication and in which other files (in this case prior to rearranging into the other formats).
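Those two requirements are easy to check from a script, since an ePub is just a zip. The sketch below builds a tiny, deliberately incomplete ePub-shaped zip in memory so there is something to run it against; make_tiny_epub and check_epub are names I made up, and a real check would open the actual file instead.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by META-INF/container.xml
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def make_tiny_epub():
    """Build a minimal zip with only the two files discussed (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        # mimetype goes in first and must be stored, not compressed.
        z.writestr("mimetype", "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
    buf.seek(0)
    return buf


def check_epub(fileobj):
    """Return (mimetype_ok, path_to_package_file, its_media_type)."""
    with zipfile.ZipFile(fileobj) as z:
        first = z.infolist()[0]
        mimetype_ok = (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and z.read("mimetype") == b"application/epub+zip"
        )
        root = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = root.find(".//" + CONTAINER_NS + "rootfile")
        return mimetype_ok, rootfile.get("full-path"), rootfile.get("media-type")


result = check_epub(make_tiny_epub())
print(result)
```

Inside the zip the path is written with a forward slash (META-INF/container.xml); the backslash in the listing above is just how the Windows tool displays it.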

Next in the list is the just mentioned OPF. We’re now crossing over into the meatier part of the deal and specifying what is actually in this particular book, most of the others are fairly similar in most books. Inside, we find:

<package xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata>
    <dc:title>An Exceptional Twist</dc:title>
    <dc:publisher>Kimi Flores</dc:publisher>
    <dc:creator opf:role="aut">Kimi Flores</dc:creator>
    <dc:description>&lt;p&gt;&#13;
    What&amp;rsquo;s a girl to do when the one person she&amp;rsquo;s been forewarned about is the only one that her heart desires?&lt;/p&gt;&#13;
&lt;p&gt;&#13;
    Leah Valdez is a sassy, intelligent, hard-working woman whose beauty shines from both inside and out. Friends and family have always come first, but it&amp;#39;s time for her to start thinking about her own future.&lt;/p&gt;&#13;
&lt;p&gt;&#13;
    Stefen Hunter is a rich, charming, sexy playboy. With seemingly no effort on his part, countless women flock to him. That is, until he meets Leah. He can&amp;rsquo;t understand why it&amp;rsquo;s so difficult to win her over or what it is about her that intrigues him so much.&lt;/p&gt;&#13;
&lt;p&gt;&#13;

.. continues

As you can (likely?) see, it states its version and where to find official specs (a common “Hi, I’m a file of type….” among xml style files) and then launches into various specifics about the book. These do not all show up on all devices, but must be there so that the book itself can easily be identified with some base info (who wrote it, what it’s about, who published it, etc). It proceeds with date, language and ePub version, among a few others (there are slews of further optional tags available), and finishes off with:

    <meta content="cover" name="cover"/>
    <meta content="cover" name="cover"/>
    <dc:identifier id="BookId" opf:scheme="uuid">4bf86602-cee6-436b-aeb2-86444522cd6a</dc:identifier>

.. before moving on past metadata. The first of these defines the cover image: later, in the manifest, all files to be included in the final book are listed and given an ID, and that ID is what is referred to here instead of the file name. I’m not sure why there are two, but having more than one of a tag is allowed for most tags if there is more than one of whatever the tag specifies (author, editor, cover artist..). It doesn’t seem to bother KindleGen any. The uuid is a little unusual; each book must have a unique one. They aren’t kept track of though (as far as I know, but I’m almost positive). One way to get a unique one is to go to http://www.famkruithof.net/uuid/uuidgen. It will give you one on the spot. Those are generated based on the current time, made in such a way that the one generated at that moment couldn’t have been generated previously and won’t be later (thus unique). Randomly generated ones are just as acceptable; the 4 at the start of the third group in this book’s uuid marks it as that random kind (version 4). I get the feeling there is, was or was meant to be more to it, but at this point everyone just gets them by calculating them as that place does. Then comes, proceeding further into the file, the manifest:
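You don’t actually need a website for the uuid, since python’s standard uuid module makes them too. A quick sketch that looks at the book’s existing identifier and generates fresh ones of both the random and the time-based kind (the variable names are mine):

```python
import uuid

# The identifier from the OPF above.
book_id = uuid.UUID("4bf86602-cee6-436b-aeb2-86444522cd6a")
print(book_id.version)  # which kind of uuid this is

# Two ways to make a new one.
random_id = uuid.uuid4()      # random (version 4)
time_based_id = uuid.uuid1()  # built from the current time and a node id (version 1)

# Drop either into the identifier tag.
print('<dc:identifier id="BookId" opf:scheme="uuid">{}</dc:identifier>'.format(random_id))
```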

  <manifest>
    <item href="c001.xhtml" id="c001" media-type="application/xhtml+xml"/>
    <item href="toc.xhtml" id="toc" media-type="application/xhtml+xml"/>
    <item href="c002.xhtml" id="c002" media-type="application/xhtml+xml"/>
    <item href="base.css" id="base" media-type="text/css"/>
    <item href="images/image003.png" id="image003" media-type="image/png"/>
    <item href="images/image045.png" id="image045" media-type="image/png"/>
    <item href="images/image068.jpg" id="image068" media-type="image/jpeg"/>
    <item href="images/image027.png" id="image027" media-type="image/png"/>

 … continued slew of image files…

    <item href="images/image062.jpg" id="image062" media-type="image/jpeg"/>
    <item href="ncx.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
    <item href="images/cover.jpg" id="cover" media-type="image/jpeg"/>
    <item href="endmatter.css" id="endmatter" media-type="text/css"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="c001"/>
    <itemref idref="toc"/>
    <itemref idref="c002"/>
  </spine>
  <guide>
    <reference href="c001.xhtml" type="title-page" title="Title Page"/>
    <reference href="toc.xhtml" type="toc" title="Table of Contents"/>
    <reference href="c002.xhtml" type="text" title="An Exceptional Twist"/>
  </guide>
</package>

And the OPF file is done. Here, it lists all files to be stowed in the main book file (possibly after conversion), starting with three xhtml files. xhtml is a lot like plain html, but has slightly stricter requirements such as closing all tags and not abbreviating any of the names and attributes in them. This makes it a bit easier to make sure it’s rigorously formatted and nothing got forgotten. These particular ones support pretty much all of html 5 (meaning, yes, KF8 can do pretty much everything html 5 can with the important exception of javascript and, except in some rare cases, audio and video content).

The three files here are all required – a title page (c001), a table of contents (toc) and one or more content file(s) (c002, in this case the rest of the book beyond the cover and toc). They do not have to be named this in particular and it’s ok to split the content up further (with further entries pointing to them so they’re found). Here they opted not to, which is fine – neither style is discouraged. “base.css” is a cascading style sheet; those are referenced in html files to define a lot of the formatting for them. It’s common to have only one or a few css files which define styles that are then used over many different html files, so that they don’t have to be repeated in each and can be modified without digging into every html file by itself. We finish up with that cover we promised to give an ID higher up and then another stylesheet called endmatter.css. This css isn’t actually referenced by any other files and I don’t think it’s a name anything else grabs, so I’m not sure what it’s doing there really – leftover from an earlier structure perhaps?
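To see how the IDs tie things together, here is a sketch that reads a cut-down copy of the OPF above and resolves the spine and the cover back to file names. SAMPLE_OPF is trimmed by me to the parts that matter (most images and the descriptive metadata are left out), and read_opf is a made-up helper name.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Trimmed-down version of the OPF shown above.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata>
    <meta content="cover" name="cover"/>
  </metadata>
  <manifest>
    <item href="c001.xhtml" id="c001" media-type="application/xhtml+xml"/>
    <item href="toc.xhtml" id="toc" media-type="application/xhtml+xml"/>
    <item href="c002.xhtml" id="c002" media-type="application/xhtml+xml"/>
    <item href="base.css" id="base" media-type="text/css"/>
    <item href="ncx.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
    <item href="images/cover.jpg" id="cover" media-type="image/jpeg"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="c001"/>
    <itemref idref="toc"/>
    <itemref idref="c002"/>
  </spine>
</package>"""


def read_opf(opf_text):
    """Return (files in reading order, cover file, ncx file)."""
    root = ET.fromstring(opf_text)
    # The manifest maps each ID to a file name.
    manifest = {item.get("id"): item.get("href")
                for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    # The spine lists IDs in reading order.
    reading_order = [manifest[ref.get("idref")]
                     for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]
    # The cover meta tag names an ID, not a file.
    cover_meta = root.find("opf:metadata/opf:meta[@name='cover']", OPF_NS)
    cover = manifest.get(cover_meta.get("content")) if cover_meta is not None else None
    ncx = manifest.get(root.find("opf:spine", OPF_NS).get("toc"))
    return reading_order, cover, ncx


reading_order, cover, ncx = read_opf(SAMPLE_OPF)
print(reading_order, cover, ncx)
```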

The ncx.ncx is a longer story; it too is required. It contains another table of contents, this one not actually shown (usually, it’s up to the device), in about the same format as the last. In this case it only lists the minimum required – the title page, a required text only TOC and the book itself. toc.xhtml in turn also only lists the minimum required – the title page and the rest of the book. This is technically ok and most of the KF8 devices don’t even read the ncx, but it was (and I’d assume is) a bit more useful on the old devices using mobi 7. Most of them have no touchscreen and no keyboard, just four-five shortcut buttons for menu, main, back, etc, arrow keys and two buttons on each side for next/previous page and next/previous section or bookmark. The second pair is where the ncx comes (or came) in – even on a new book without bookmarks you can skip through the sections (chapters or subheadings) with the buttons, since the ncx defines a set of important places in the text to jump through. Not sure why exactly the chapters were left out here, but it is allowed and there’s another toc with the actual chapters right at the start of the book, so it may have been a conscious choice.

And so that’s that – the ePub is fully defined. The html/css files link and include each other to whatever degree they want and point out images or other resources to display, but the rest is the content formatted in xhtml/css. c001.xhtml just contains a header stating its version, a tag to embed the cover and what size it should be. c002.xhtml is, as stated before, the entire book, with a link to base.css where it defines margins, text sizes, indent, alignment, etc for various specific groups of text – can be stuff like headers, subheaders, asides, addresses, names, quotations and so on. I say “can” because it’s pretty skimpy in this one; a lot of the formatting is done on a case by case basis (specifically stating “this text here should be font X, size Y and indented 15 pixels” rather than the css saying “text marked as ‘body-text-listing-people’ should be <some format>” and then tags in the actual book saying “this paragraph is ‘whichever-type-it-happens-to-be’” and thus getting their format as defined in the css). That’s ok, but it’d be better to define *what* they are first (called markup) and then specify in the css how each of those things should be formatted. That way A) it’s easy to change your mind – you can decide a certain type of thing in the book should look different than it does and change all of them in one fell swoop by editing the style definition in the css instead of each place it happens in the book, and B) whoever wrote the text got to state what it is rather than how it looks – the formatter doesn’t have to guess or assume.
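One rough way to see how much of a book is formatted case by case is to count style attributes against class attributes in the xhtml. This is only a sketch on a made-up three paragraph sample (SAMPLE and style_usage are my own names, not something from the book or any tool mentioned here):

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample: two paragraphs marked up by what they are, one styled inline.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p class="body-text">Styled by what it is.</p>
<p class="body-text">Same class, same look.</p>
<p style="font-size: 12px; text-indent: 15px">Styled case by case.</p>
</body></html>"""


def style_usage(xhtml_text):
    """Return (number of elements with inline style, Counter of class names)."""
    root = ET.fromstring(xhtml_text)
    inline = 0
    classes = Counter()
    for element in root.iter():
        if "style" in element.attrib:
            inline += 1
        for name in element.attrib.get("class", "").split():
            classes[name] += 1
    return inline, classes


inline_count, class_counts = style_usage(SAMPLE)
print(inline_count, dict(class_counts))
```

A high inline count next to a short list of classes suggests that changing the look of one kind of text will mean touching every place it occurs rather than one rule in the css.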

The way it is here is pretty common, unfortunately. It’s ok and can be made to work, but it’s much easier if the style is defined much earlier in the process. It doesn’t need to be in html; Word supports styles too, as do most word processors, and those styles can usually at least be salvaged when converting to html. That can be pretty important since a lot (if not most) of the rest of the formatting is usually trashed when moving it to xhtml and needs to be redone or manually excavated from the (usually abysmally formatted) exported file. It’s a lot easier and less error prone to just look at the originally written document (or ask the author/layouter/whomever) about each class of text and how it should look. As it stands now, you can only ask about or look up “the part that says <something> in chapter 3 a bit down”, not “all text that was marked as type ‘normal-body-text-emphases’”. Since a lot of the formatting requested in a word processing file isn’t possible (and even less is possible in the old version it’ll pump out) and has to be replaced, starting with a well-structured document where the structure is machine readable helps considerably.

In this particular book the chapter (and other) titles are done in a decorative font which is what all those images that keep getting mentioned are. Those could have been done by including the font in the KF8 format (i.e. new ereaders/software could render it themselves which would save some space and avoid a few other issues, such as if the reader has a different background or font selected to override the default or the one specified in the book) but the old format needs them as images or to be allowed to fall back on some other font. Because of this the rest is mostly body text. The fancy TOC does look pretty neat though, except some of the titles had been reformatted a little much and ended up pretty low resolution (hence showing up as very small on high-res devices) and a little washed out. This was the main thing she wanted to change when this came up (afaik, I wasn’t there). I think it’s done, some other changes came up so I haven’t seen the sources. It’ll be interesting to see how things ended up once it’s put together again and I can revisit it.

The original TOC (left) and a version with tweaked pictures. Both are unscaled screenshots from Kindle Previewer, set to Kindle DX.


There isn’t a single reason something like that happens, but a lot of manual reformatting ups the odds. Any tinkering does, but the overall amount grows when changing lots of individual places and things instead of the overall structure, and judging by the tags in the main html and the definitions in the css (many seemingly abandoned) this has gone through a lot of piecemeal changes to individual spots. Some of the other images, like flourishes for jumps in the text that aren’t also chapters, are included multiple times too, sometimes in different formats. This would be easier to dodge if they had been specified and referenced a little more cohesively to begin with.
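Spotting the abandoned css definitions by eye gets old fast, so here is a quick and dirty way to list class selectors that no xhtml file uses. Heavy caveats: it's regex based, so it will trip on unusual css and only sees classes written as `class="..."` with double quotes; the sample stylesheet and markup are invented; and `defined_classes` and `used_classes` are just names for this sketch.

```python
import re


def defined_classes(css_text):
    """Class names that appear in selectors of a stylesheet (naive)."""
    # Drop comments, then drop innermost { ... } declaration blocks so that
    # things like "1.5em" or "title.png" are not mistaken for selectors.
    no_comments = re.sub(r"/\*.*?\*/", " ", css_text, flags=re.S)
    selectors_only = re.sub(r"\{[^{}]*\}", " ", no_comments)
    return set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors_only))


def used_classes(xhtml_text):
    """Class names referenced from class="..." attributes (naive)."""
    used = set()
    for value in re.findall(r'class\s*=\s*"([^"]*)"', xhtml_text):
        used.update(value.split())
    return used


# Invented samples, not the book's base.css or xhtml.
SAMPLE_CSS = """
/* made-up stylesheet */
.body-text { text-indent: 1.5em; }
.chapter-title { background: url(images/title.png); }
.old-quote, .older-quote { margin: 0 2em; }
"""
SAMPLE_XHTML = '<p class="body-text">Text</p><h1 class="chapter-title">One</h1>'

abandoned = sorted(defined_classes(SAMPLE_CSS) - used_classes(SAMPLE_XHTML))
print(abandoned)
```

For a real book you'd read the css and every xhtml file from disk and feed them in the same way; anything left in `abandoned` is worth a second look before deleting it.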

IDs that are human readable (“chapter-4-toc-thumbnail” instead of “image0034.img”) would also have been handy, since it’s easier to spot errors early or in passing just by looking that way, but I’m not sure how to do that in Word or any specific word processor really; I don’t work with them much.

As an aside here, the two-format deal and its limitations are something that would have been good to keep in mind when writing the original document. The final version will, when it’s sold on Amazon, be formatted into one fairly expansive format with high resolution and a slightly lighter version without the bells and whistles for older devices. Quite likely also an ePub (roughly comparable to KF8, equal but different), although here the ePub is the basis for the other two. Both kind of default to mostly living with the functions of the older format except for including higher resolution images for the modern screens (nicer pictures and more size appropriate on screen), but it’s possible to segment off sections of the css during formatting, essentially saying “text marked ‘something’ should look like <complex format> if this is a KF8 reader, otherwise it should instead look like <simpler or other format>”. The whole process is a little like if you were giving a presentation on TV in the late 70s or early 80s – if you needed to decide what colors to use in the presentation, you’d need to know that not everyone has a color TV. A majority does, so you still have to (or at least should) give some thought to what looks good, but you can’t hinge the whole thing on it by color coding vital information – it has to be a working (if not as nice) product even in black and white. The techies can and will try to put something together of course, but it can be kind of hamfisted sometimes and it’s a fine line between artistic choice in terms of how much “extra” to put in the more expansive format version and how far to go to squeeze as much as possible out of an old format that’s intentionally pretty limited due to the smaller, older devices.

Whew, that’s a lot of text. I think I’ll leave it at that; I may write some more details some other time and perhaps do a more hands-on writeup of doing these things from start to end. Hope you enjoyed your read (for the few who actually made it this far). There is a bit more to say about the formatting inside the actual xhtml, but I think I’ll do that another time as well, since I could stand to know a little more detail about it myself.