omniverse theirix's Thoughts About Research and Development

The tale of automating BibDesk

bibtex

To organise scientific publications I use BibTeX, the standard LaTeX bibliography tool. It would be wise to switch to biblatex at some point, since it handles UTF-8 better, but that is a different story. BibTeX defines a file format for publications with many standard and custom fields, where each field is a plain-text key-value pair. For example (from Wikipedia):

@Book{hicks2001,
 author    = "von Hicks, III, Michael",
 title     = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
              Spacecraft Using a Novel Manufacturing Technique",
 publisher = "Stanford Press",
 year      =  2001,
 address   = "Palo Alto",
 edition   = "1st",
}
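To make the key-value structure concrete, here is a minimal Ruby sketch that reads one simple entry like the one above into a hash. It is a toy under stated assumptions: values are either double-quoted strings or bare words/numbers, and there are no {braced} values, @string macros or # concatenation. The method name parse_bibtex_entry is hypothetical; a real tool should use a proper BibTeX parser.

```ruby
# Minimal sketch: parse ONE simple BibTeX entry into a Hash.
# Assumes values are "double-quoted" or bare words/numbers; it does not
# handle {braced} values, @string macros or # concatenation.
def parse_bibtex_entry(text)
  header = text.match(/@(\w+)\s*\{\s*([^,\s]+)\s*,/)
  raise ArgumentError, "not a BibTeX entry" unless header

  entry = { "type" => header[1].downcase, "key" => header[2] }
  # Each field is: name = "quoted value"  or  name = bare_value
  header.post_match.scan(/(\w+)\s*=\s*(?:"([^"]*)"|(\w+))/) do |name, quoted, bare|
    # Collapse the line breaks and indentation inside long quoted values
    entry[name.downcase] = quoted ? quoted.gsub(/\s+/, " ").strip : bare
  end
  entry
end

entry = parse_bibtex_entry(<<~BIB)
  @Book{hicks2001,
   author    = "von Hicks, III, Michael",
   title     = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
                Spacecraft Using a Novel Manufacturing Technique",
   publisher = "Stanford Press",
   year      =  2001,
   address   = "Palo Alto",
   edition   = "1st",
  }
BIB

puts entry["key"]   # hicks2001
puts entry["year"]  # 2001
puts entry["title"]
```

Note that commas inside quoted values (as in "von Hicks, III, Michael") are safe here because the whole quoted string is consumed as one value.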

You can edit a BibTeX file by hand, but there are some good programs for browsing and editing publications, such as JabRef, Mendeley and BibDesk. I use BibDesk because of its good user interface and its integration with OS X.

My research involves many specific BibTeX tasks. For example, I often need to find a publication's BibTeX id (cite key) or title by filename and vice versa, grep citations, set the PDF title and author fields from a publication, and publish missing publications to my Kindle via Calibre: a lot of small tasks that each take a fair amount of time and need to be automated. I prefer a command-line utility for these tasks. It accepts a command verb and an optional argument and prints a list of strings as output. The Unix way rocks.
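The interface described above (a verb, an optional argument, a list of strings as output) can be sketched in Ruby as a table of lambdas. This is a hypothetical illustration: the verbs, the in-memory PUBLICATIONS list standing in for BibDesk, and its fields are made up for the example; the real utility queries BibDesk instead.

```ruby
# Hypothetical sketch of a verb-dispatching CLI: `tool VERB [ARG]`
# prints one string per line. PUBLICATIONS stands in for BibDesk data.
Publication = Struct.new(:cite_key, :title, :files)

PUBLICATIONS = [
  Publication.new("hicks2001", "Design of a Carbon Fiber Composite Grid Structure",
                  ["/papers/hicks2001.pdf"]),
  Publication.new("doe2015", "A Hypothetical Paper", ["/papers/doe2015.pdf"])
].freeze

COMMANDS = {
  # cite key -> paths of linked files
  "files" => ->(cite) { PUBLICATIONS.select { |p| p.cite_key == cite }.flat_map(&:files) },
  # file name -> cite key (the "vice versa" direction)
  "cite"  => lambda { |name|
    PUBLICATIONS.select { |p| p.files.any? { |f| File.basename(f) == name } }
                .map(&:cite_key)
  },
  # no argument needed: list all cite keys
  "keys"  => ->(_arg) { PUBLICATIONS.map(&:cite_key) }
}.freeze

# Returns the output lines for a verb and an optional argument.
def run(argv)
  verb, arg = argv
  command = COMMANDS.fetch(verb) do
    abort "usage: tool {#{COMMANDS.keys.join('|')}} [arg]"
  end
  command.call(arg)
end

puts run(ARGV) if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
```

Saved as a hypothetical pubs.rb, `ruby pubs.rb files hicks2001` would print /papers/hicks2001.pdf. Because every command returns an array of strings, the output composes with grep, xargs and friends.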

Scripting BibDesk

A few years ago I found that BibDesk has exhaustive AppleScript support, so I could write a script for each of these tasks. Anyone who has ever written an AppleScript understands how hard it is to write a complex data-processing script in it. I wanted to write the script in a friendlier language, effectively any other language, preferably Ruby, Python, plain C or Java.

Scripting Bridge version

The first version of the script used Scripting Bridge. It stopped working when MacRuby, which it required, died. MacRuby was needed because Scripting Bridge is a Cocoa framework, and the only good way to use Cocoa from Ruby was MacRuby. MacRuby development halted in 2012.

Appscript version

The second version was rewritten in Ruby with rb-appscript (http://appscript.sourceforge.net/rb-appscript/). The script was nice, except that each domain object had to be extracted manually from its scripting reference with a .get call. Here is the task that returns the files linked to a given cite key:

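require 'appscript'
include Appscript

# Return the URLs of the files linked to the publication with cite key cite_str.
# Raises NoMethodError if no publication has that cite key (find returns nil).
def files_for_cite cite_str
    app("BibDesk").documents.first.publications.get
        .find { |pub| pub.cite_key.get == cite_str }.linked_files.get
        .map { |f| f.url.get }.compact
end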

It uses a typical pseudo-functional Ruby call chain that can filter, map and zip things. Then in 2014 I realised that the script does not work on Mavericks and Yosemite: appscript simply does not build there because of missing symbols. The official page says it is dead too.
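Stripped of the appscript .get calls, the same call chain works on plain Ruby objects. The sketch below uses hypothetical Struct stand-ins for BibDesk publications and linked files (the nil URL models a linked file without a URL) to show how find, map and compact combine; it illustrates the pattern and is not the appscript code itself.

```ruby
# Plain-Ruby stand-ins for BibDesk scripting objects (hypothetical).
LinkedFile = Struct.new(:url)
Pub        = Struct.new(:cite_key, :linked_files)

PUBS = [
  Pub.new("hicks2001", [LinkedFile.new("file:///papers/hicks2001.pdf"),
                        LinkedFile.new(nil)]),
  Pub.new("doe2015", [])
].freeze

# Same chain as files_for_cite, minus the .get calls:
# find the publication, take its files, map them to URLs, drop nils.
def files_for_cite(pubs, cite_str)
  pub = pubs.find { |p| p.cite_key == cite_str }
  return [] unless pub # the appscript chain would raise NoMethodError here
  pub.linked_files.map(&:url).compact
end

p files_for_cite(PUBS, "hicks2001") # ["file:///papers/hicks2001.pdf"]
p files_for_cite(PUBS, "missing")   # []
```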

Swift version

A week ago I made a third version of the script. Currently, besides AppleScript itself, scriptable applications can be driven from Objective-C via Scripting Bridge and from JavaScript. I do not know JavaScript well enough to write such a script, and I do not like it at all. Writing the script in Objective-C is possible but very verbose. So I picked up Apple's new language, Swift. Apple positions Swift as a replacement for Objective-C that works with existing codebases and greatly improves on Objective-C's syntax and safety. That suits my purpose well!

First, you need to generate Swift bindings for the application's scripting interface with the experimental SwiftScripting project (Objective-C bindings are supported out of the box). Then you need to fix the bindings by hand, because the project is experimental. After that, just write the Swift script.

The same task in Swift looks like this:

func files_for_cite(cite: String) -> [BibDeskLinkedFile] {
    let app: BibDeskApplication =
        SBApplication(bundleIdentifier: "edu.ucsd.cs.mmccrack.bibdesk")
    let pubs = (app.documents!().get()[0] as! BibDeskDocument).publications!()
        .get() as! [BibDeskPublication]
    // Like the Ruby version, this fails if no publication has the cite key
    let pub = pubs.filter({ p in p.citeKey! == cite }).first!
    // as? yields optionals; the filter/unwrap pair mimics Ruby's compact.
    // Turning each file into a path string depends on the generated binding.
    return (pub.linkedFiles!() as [AnyObject])
        .map({ (f: AnyObject) in f as? BibDeskLinkedFile })
        .filter { x in x != nil }.map { x in x! }
}

Huh! I cannot even make a single chained call, because the number of parentheses and braces grows astronomically. In principle Swift encourages chained calls, but here they feel very cumbersome, because Swift's static type system forces us to cast every proxy object in the chain to a specific type. Swift's functional capabilities are limited by the rather weak Foundation and Cocoa library support, while Ruby has many useful methods in Enumerable and Array. Sometimes I simply wrote a replacement for a Ruby method to keep the port more direct.

Swift impressions

It took an hour or two to read the Swift manual and Stack Overflow questions, and a few more hours to port and debug a dozen tasks in Swift. The major problems in porting were unwrapping values and type casting.

Optional types are pretty good and protect you from raw pointer misuse and null pointer exceptions. They are a bit like Rust's Option enum, with added syntactic sugar (question and exclamation marks).

Swift can infer a type from the right-hand expression, so a variable declaration does not need an explicit type. When inference is not possible, you have to cast manually with the as operator. Casting became a nightmare because Scripting Bridge provides only untyped objects that have to be cast from and to AnyObject. It feels like sad programming in Java 1.4 with non-generic containers.

So Swift is a pretty language that is objectively better than Objective-C :) It has nice features that simplify code and improve its safety and readability. Programming Scripting Bridge in Swift is not very comfortable, but it is entirely possible. It seems to be the only sane way to script OS X applications without dealing with AppleScript or Objective-C syntax.

omnifiles

Recently I wrote a small web application for storing temporary files and providing short links to them. It is like a URL shortener integrated with file storage. I had found a few simple services for screenshots, but sometimes I need to store PDFs and archives too. Another requirement was easy access with curl, both for downloading and for uploading. And of course I wanted to keep sensitive files on my own server.

The application, called omnifiles, is open source and can be found at https://github.com/theirix/omnifiles. It is a hobby project and a playground for sharpening my web skills. I use it for my own needs, but you are welcome to send feedback or patches!

Let’s talk a little about its architecture. It is a simple web app built with the Ruby Sinatra framework, HAML for markup, MongoDB for metadata storage and the filesystem for file storage.

Storage

Files are stored in the filesystem under their unique shortened names. MongoDB holds one document per shortened link, containing the shortened link itself (which acts as the key), the original filename, the MIME type and access statistics. Initially the metadata was stored in SQLite. Certainly there is no need for scaling, sharding and other CAP stuff, but hey, it is 2015! It is reasonable to use NoSQL when there is no strict need for structured data.
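The storage layout described above can be sketched with the standard library alone. In this hypothetical sketch a Hash stands in for the MongoDB collection, each record carries the fields listed above (short link as key, original filename, MIME type, an access counter), and the short name is six random lowercase base-36 characters like sge36a. The real omnifiles naming scheme and document schema may differ.

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical sketch: files on disk under short names, metadata in a
# Hash standing in for a MongoDB collection keyed by the short link.
class ShortStore
  ALPHABET = [*"0".."9", *"a".."z"].freeze

  def initialize(dir)
    @dir = dir
    @meta = {} # short name => metadata document
  end

  # Pick an unused 6-character base-36 name.
  def new_short_name
    loop do
      name = Array.new(6) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
      return name unless @meta.key?(name)
    end
  end

  # Writes the bytes under a fresh short name and records the metadata.
  def store(data, original_filename, mime_type)
    short = new_short_name
    File.binwrite(File.join(@dir, short), data)
    @meta[short] = { "shortened" => short, "original_filename" => original_filename,
                     "mime" => mime_type, "accessed" => 0 }
    short
  end

  # Returns [metadata, bytes] and bumps the access counter, or nil if unknown.
  def fetch(short)
    doc = @meta[short] or return nil
    doc["accessed"] += 1
    [doc, File.binread(File.join(@dir, short))]
  end
end

Dir.mktmpdir do |dir|
  store = ShortStore.new(dir)
  short = store.store("%PDF-1.4 ...", "paper.pdf", "application/pdf")
  doc, bytes = store.fetch(short)
  puts "#{short} -> #{doc['original_filename']} (#{doc['mime']}, #{bytes.bytesize} bytes)"
end
```

Keeping the original filename only in the metadata means the on-disk name never contains user input, which sidesteps path and encoding problems.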

I used MongoDB 2.6 and the stable Ruby driver. The Ruby driver was recently rewritten as version 2.0 to support MongoDB 3.0, but for now the stable version is sufficient and easy to use.

API

omnifiles should be simple and easily accessible. Files are stored with POST requests, and the response body of a POST request contains the shortened URL. I use curl for command-line posting.

There are two ways to store a file (auth and URL omitted):

  1. Send a POST form with a single file field:

     % curl -F "file=@file.jpg" ...
    
  2. Send a file using POST binary stream:

     % curl -H "Content-Type: application/octet-stream" --data-binary "@file.jpg" ...
    

Neither variant is very concise. I do not like the artificial form field in the first variant. The second variant just streams the file as the request body. Another argument against the first variant is that the Rack middleware saves the uploaded form file to a temporary directory as an intermediate step. A stream can be handled more efficiently.
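For clients other than curl, the stream variant maps directly onto Ruby's standard Net::HTTP. This sketch only builds the request without sending it; the URL http://localhost/upload is a hypothetical placeholder, and authentication is omitted as in the curl examples.

```ruby
require "net/http"
require "uri"
require "stringio"

# Builds (but does not send) the stream-style upload, the Net::HTTP
# counterpart of: curl -H "Content-Type: application/octet-stream" --data-binary "@file.jpg"
def build_stream_upload(url, io, size)
  req = Net::HTTP::Post.new(URI(url))
  req["Content-Type"] = "application/octet-stream"
  req["Content-Length"] = size.to_s # Net::HTTP needs a length (or chunked) for streams
  req.body_stream = io              # body is read from io, not built as a form
  req
end

data = "fake jpeg bytes" # in practice: File.open("file.jpg", "rb") and File.size
req = build_stream_upload("http://localhost/upload", StringIO.new(data), data.bytesize)
puts req.method          # POST
puts req.path            # /upload
puts req["Content-Type"] # application/octet-stream
```

Sending it would be `Net::HTTP.start("localhost", 80) { |http| http.request(req) }`, and the response body would then carry the shortened URL. Because the body is a stream, a large file is copied to the socket in chunks instead of being loaded into memory.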

A file is downloaded with a GET request to its shortened link:

    % curl http://localhost/sge36a

Omnifiles sends the original filename in an additional response header, in case the client wants to rename the file after downloading. Another nice feature is that the saved MIME type is returned in the response, so a browser can display images or PDFs directly in the browser window.
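The download response can be sketched as a Rack-style [status, headers, body] triple built from the stored metadata. Which header carries the original filename is not specified here; this sketch assumes the standard Content-Disposition header with the inline disposition, so browsers still render images and PDFs in the window. The metadata field names are hypothetical.

```ruby
# Build a Rack-style [status, headers, body] for a stored file (hypothetical).
def download_response(doc, bytes)
  # Escape quotes/backslashes so the filename stays a valid quoted-string
  safe_name = doc["original_filename"].gsub(/["\\]/) { |c| "\\#{c}" }
  headers = {
    # Saved MIME type lets the browser display images or PDFs inline
    "Content-Type" => doc["mime"],
    "Content-Length" => bytes.bytesize.to_s,
    # Original name, for clients that want to rename the download
    "Content-Disposition" => %(inline; filename="#{safe_name}")
  }
  [200, headers, [bytes]]
end

status, headers, _body = download_response(
  { "original_filename" => "paper.pdf", "mime" => "application/pdf" }, "%PDF-1.4")
puts status                          # 200
puts headers["Content-Disposition"]  # inline; filename="paper.pdf"
```

Non-ASCII filenames would additionally need the filename* parameter; this sketch covers only plain ASCII names.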

I added a control panel to omnifiles to view statistics for a single file (http://localhost/stat/sge36a) or for the whole storage (http://localhost/stat), and to delete unneeded files.
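The whole-storage statistics page boils down to an aggregation over the metadata documents. Here is a hypothetical sketch over an array of metadata hashes; the field names ("shortened", "mime", "accessed") and the particular figures computed are assumptions, not taken from omnifiles.

```ruby
# Aggregate storage-wide statistics from metadata documents (hypothetical fields).
def storage_stats(docs)
  {
    files: docs.size,
    total_accesses: docs.sum { |d| d["accessed"] },
    by_mime: docs.group_by { |d| d["mime"] }.transform_values(&:size),
    # nil when the storage is empty
    most_accessed: docs.max_by { |d| d["accessed"] }&.fetch("shortened")
  }
end

docs = [
  { "shortened" => "sge36a", "mime" => "application/pdf", "accessed" => 5 },
  { "shortened" => "k2p9zq", "mime" => "image/png",       "accessed" => 12 },
  { "shortened" => "a0b1c2", "mime" => "application/pdf", "accessed" => 0 }
]
stats = storage_stats(docs)
puts stats[:total_accesses] # 17
puts stats[:most_accessed]  # k2p9zq
```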

Web

Both the web frontend and the backend are written in Sinatra. For the sake of simplicity, the API and presentation are not clearly separated in omnifiles. For example, the routes are not classic REST, because I needed to minimize the number of possible URLs and to group certain URLs for authentication in the web server. So this is not the proper way to build a REST API service.

I consider Sinatra a good solution for simple REST services and simple apps that do not need Rails' complexity. There are a lot of API, auth and model plugins and Rack middleware for Sinatra, so you can build your app from the ground up. Sinatra is built on Rack for handling requests and responses, so with some Rack magic we can distinguish a stream from a form upload (the two POST approaches above) and store the file in MongoDB and the filesystem.
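The Rack part of telling the two POST styles apart can be illustrated without any gems, since a Rack app is just an object whose call(env) returns [status, headers, body]. In this hypothetical sketch the upload app inspects CONTENT_TYPE: multipart/form-data means the form variant (which real code would reach through Rack::Request#POST, after Rack has spooled the file to a tempfile), and anything else is treated as a raw stream in rack.input. The save block and the returned URL are placeholders.

```ruby
require "stringio"

# Hypothetical Rack-style upload app: a Rack app is any object whose
# call(env) returns [status, headers, body], so no gems are needed here.
class UploadApp
  def initialize(&save)
    @save = save # receives the body IO, returns a short name
  end

  # multipart/form-data => form variant, anything else => raw stream
  def self.upload_kind(env)
    env["CONTENT_TYPE"].to_s.start_with?("multipart/form-data") ? :form : :stream
  end

  def call(env)
    case self.class.upload_kind(env)
    when :stream
      # The request body itself is the file; no form parser spooled it.
      short = @save.call(env["rack.input"])
      [200, { "Content-Type" => "text/plain" }, ["http://localhost/#{short}\n"]]
    when :form
      # Real code would use Rack::Request#POST to reach the spooled tempfile.
      [415, { "Content-Type" => "text/plain" }, ["form uploads are not handled in this sketch\n"]]
    end
  end
end

received = nil
# A real store would IO.copy_stream(io, file) instead of reading it all.
app = UploadApp.new { |io| received = io.read; "sge36a" } # fixed name for the demo
env = { "REQUEST_METHOD" => "POST", "CONTENT_TYPE" => "application/octet-stream",
        "rack.input" => StringIO.new("fake jpeg bytes") }
status, _headers, body = app.call(env)
puts status    # 200
puts body.join # http://localhost/sge36a
```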

Omnifiles splits its logic into two apps: a public one (for GET requests) and a protected one (POST and the control panel). The protected app uses digest authentication from Rack::Auth. Routing between the two apps is done by Rack::URLMap, which is definitely less flexible than Rails routes and unfortunately cannot route by HTTP method.
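What Rack::URLMap does can be illustrated with a dependency-free toy router. This is a simplified, hypothetical stand-in, not Rack's code: it picks the longest matching mount point, moves that prefix into SCRIPT_NAME and passes the rest on as PATH_INFO. Like URLMap, it looks only at the path and never at REQUEST_METHOD, which is why routing by HTTP method is impossible at this level. The mount points / and /stat are assumptions based on the URLs mentioned above.

```ruby
# Toy stand-in for Rack::URLMap (hypothetical, simplified): dispatch by
# longest matching path prefix, ignoring the HTTP method entirely.
class PrefixMap
  def initialize(map)
    # Longest prefixes first so "/stat" wins over "/".
    @mapping = map.sort_by { |prefix, _| -prefix.size }
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    @mapping.each do |prefix, app|
      base = prefix.chomp("/") # "/" becomes ""
      # Match "/stat" and "/stat/..." but not "/statistics"
      next unless path == base || path.start_with?("#{base}/")
      return app.call(env.merge("SCRIPT_NAME" => env["SCRIPT_NAME"].to_s + base,
                                "PATH_INFO" => path[base.size..-1]))
    end
    [404, { "Content-Type" => "text/plain" }, ["Not Found\n"]]
  end
end

public_app    = ->(env) { [200, {}, ["public #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"]] }
protected_app = ->(env) { [200, {}, ["protected #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"]] }
router = PrefixMap.new("/" => public_app, "/stat" => protected_app)

puts router.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/sge36a")[2].first
# public GET /sge36a
puts router.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/stat/sge36a")[2].first
# protected GET /sge36a
```

With Rack itself the equivalent would be `Rack::URLMap.new("/" => public_app, "/stat" => protected_app)`, with the protected app wrapped in the digest auth middleware.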

For presentation I use HAML. HAML is yet another markup language on top of HTML, and it is a lot more concise than ERB. It takes a while to learn, because it seems strange and awkward at first. A common problem with HAML is its strict rules for indentation and strings. The resulting markup is compact and beautiful, and I like it.

The app can be served with thin (there is a launcher script, bin/omnifiles) or as a Rack app (using config.ru). Command-line usage of thin or the launcher can be cumbersome, as you can see in README.md. Sometimes it is simpler to run the Rack app inside Passenger.