PDF: I have simple bash scripts to convert PDFs of journal articles in
the primary Economics literature (e.g. AER, as downloaded from JSTOR)
into compact forms for printing. For instance, for AER, the script strips
the JSTOR header page, expands each page (reduces margins), compresses the
document into two-per-side format, and shifts even and odd doubled
pages to optimise for double-sided printing. The result is four article
pages per sheet (two per side), with a font still large enough to be very
readable. Ask me if you want them; they
work on Unix, Linux, and probably OS X.
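The page arithmetic behind "four article pages per sheet (two per side)" can be sketched in a few lines of Python. This is only an illustration of the layout, not the bash scripts themselves; the function name `duplex_two_up_layout` is made up here, and it assumes pages are simply placed in reading order.

```python
def duplex_two_up_layout(n_pages):
    """Assign article pages to sheets: two pages per side, two sides per sheet.

    Returns a list of sheets; each sheet is (front, back), and each side is a
    (left, right) pair of 1-based page numbers, with None for a blank slot.
    """
    def page_or_blank(p):
        return p if p <= n_pages else None

    sheets = []
    for first in range(1, n_pages + 1, 4):
        front = (page_or_blank(first), page_or_blank(first + 1))
        back = (page_or_blank(first + 2), page_or_blank(first + 3))
        sheets.append((front, back))
    return sheets


if __name__ == "__main__":
    # A 10-page article needs three sheets; the last back side stays blank.
    for number, (front, back) in enumerate(duplex_two_up_layout(10), start=1):
        print(f"sheet {number}: front={front} back={back}")
```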
2006: I have now made a more useful tool that assesses where the content
is in an arbitrary PDF document and, also taking into account where the
bounding box is, arranges it for optimal two-per-side, two-sided printing.
This is scripted in Matlab, but the algorithm is clear in the code. I
am also compiling a list of magic pstops incantations produced by this
code for different journals.
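As a rough illustration of the geometry involved (not the Matlab tool itself), here is a sketch in Python that takes the box containing a page's content and computes one scale factor plus the two offsets that centre it in each half of a landscape sheet. The function name `fit_two_up` and the default US Letter sheet size are assumptions made for the example, and it ignores printer margins.

```python
def fit_two_up(content_box, sheet_size=(792.0, 612.0)):
    """Scale and place a page's content box into each half of a landscape sheet.

    content_box: (x0, y0, x1, y1) of the printed content, in points.
    sheet_size:  (width, height) of the landscape sheet, in points
                 (default: US Letter turned sideways, an assumption).
    Returns (scale, offsets), where offsets holds one (dx, dy) per half and a
    source point (x, y) lands at (scale * x + dx, scale * y + dy).
    """
    x0, y0, x1, y1 = content_box
    width, height = x1 - x0, y1 - y0
    if width <= 0 or height <= 0:
        raise ValueError("content box must have positive width and height")
    half_w, sheet_h = sheet_size[0] / 2.0, sheet_size[1]
    # The largest uniform scale at which the content still fits in one half.
    scale = min(half_w / width, sheet_h / height)
    offsets = []
    for slot in (0, 1):  # 0 = left half, 1 = right half
        # Centre the scaled content inside its half of the sheet.
        dx = slot * half_w + (half_w - scale * width) / 2.0 - scale * x0
        dy = (sheet_h - scale * height) / 2.0 - scale * y0
        offsets.append((dx, dy))
    return scale, offsets
```

The scale and offsets are the kind of numbers that end up inside a pstops page specification, but translating them into pstops syntax is left out here.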
I have worked on
software in Matlab for projecting star
fields (keywords: Matlab, astronomy, pointing) for determining the
precise pointing of
digitized video frames. This toolbox includes a celestial database and
projects the astral component of the night sky for a given location and
time. It then uses least squares fitting to determine where a video
Do you wish your colleagues were using
Matlab instead of IDL? (Or
vice versa!?) I believe that IDL has astonishingly primitive plotting
and variable types, and a nearly inconceivable lack of built-in utilities.
Matlab 5 comes with code to write MAT file format
files from C. I have written a tool to write Matlab-format files from
within IDL (keywords: converting from IDL to Matlab MAT file convert
from IDL to Matlab writing MAT file format from IDL). This may be more
convenient than using the "idl2matlab" software, which requires
starting up a Matlab kernel each time it's used. My software is
juvenile -- please let me know if you use it, and tell me if you
extend its functionality -- but it works perfectly and I would like to
help others not waste their time starting from scratch! There are just
latter is ugly (presumably there is a proper way?!), and a lovely
example of why I sometimes loathe IDL. If you use these files, also ask
me for the most up-to-date versions.
I have worked on developing a
sensible set of tools for writing my PhD dissertation, with the goal of
having others in our field use a common LaTeX bibliography database (BibTeX)
of all the sprites papers and presentations, and in improving some
default behaviours of LaTeX in the thesis.sty style. Please see my LaTeX page for advice on theses and BibTeX (and
I have spent much time dealing with tables and making
scripts to format regression table output into LaTeX using various
features. A bunch of what I've learned about tables is in cpblTables.sty.
Some other customisations for thesis, beamer, etc. are in cpblRef.sty.
- How can I centre an over-wide table? When it breaks margin rules, I still want it centred, rather than aligned with the left margin.
I use a lot of open source (freedom, not
iClicker (i>clicker) with Linux
See my struggles with this.
I write horrible Perl; you can ask me for:
-  I have started a database/processor for digital images, in my
case mostly 35 mm slides. The idea is to store images in their high
resolution, original scanned form but efficiently produce chosen sets
of ordered images appropriately modified and of reasonable resolution
whenever needed for shows (projection) or web display (e.g. in a few
years, when a higher resolution is appropriate, one command would
recreate a given set). It makes use of a simple spreadsheet (the
database) and the ImageMagick routines.
- [2002+] I have written a somewhat elaborate routine to make
interactive (HTML) "concept
maps", with automatic (wiki-like) referencing and other features.
- [2004+] I have also administered a couple of GNU/Linux servers and unix
accounts. I have a set of Perl and PHP tools for managing
accounts (solicit account requests; grant them; create
accounts / initial passwords / default web pages / default unix
email forwarding; automatically require users to
set a new (secure) unix password within two days for a new account;
monitor disk usage; etc.) with really minimal support /
administration. It is aimed at providing web, email, and file
service accounts for those uninitiated with UNIX (i.e. Windows / Mac
users) securely from a standard GNU/Linux box. It does the tasks
above just by running a cron job every night to process requests
etc. Please ask me for the scripts!
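For the slide database in the first item of this list, the "spreadsheet plus ImageMagick" idea can be sketched as follows. This is Python rather than the Perl mentioned above, and the spreadsheet columns (`file`, `show`, `order`, `rotate`) and the function name are hypothetical. It only builds the command lines; `-rotate` and `-resize` are standard ImageMagick options (newer ImageMagick versions prefer `magick` over `convert`).

```python
import csv
import io


def build_convert_commands(sheet_text, show, max_size, out_dir):
    """Return ImageMagick command lines for the slides selected for one show.

    sheet_text: tab-separated spreadsheet export with (hypothetical) columns
                'file', 'show', 'order' and 'rotate'.
    max_size:   bounding box such as '1024x768'; -resize keeps aspect ratio.
    """
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter="\t")
    rows = [row for row in reader if row["show"] == show]
    rows.sort(key=lambda row: int(row["order"]))
    commands = []
    for position, row in enumerate(rows, start=1):
        # Output files are numbered in show order: 001.jpg, 002.jpg, ...
        target = f"{out_dir}/{position:03d}.jpg"
        cmd = ["convert", row["file"]]
        if row["rotate"] not in ("", "0"):
            cmd += ["-rotate", row["rotate"]]
        cmd += ["-resize", max_size, target]
        commands.append(cmd)
    return commands
```

Each returned list could be passed to `subprocess.run`; re-running with a larger `max_size` regenerates the same ordered set at a higher resolution from the untouched originals.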
[2004+] I wrote a simple community
photo / contact info / interests etc directory for Green College at
UBC. It allows the individuals to update the information and photos
themselves, so requires no maintenance. It seems well
used. It also fully controls a majordomo email list, so it
acts as a self-serve email list for which you can look up info/photos
of any subscriber. It has a number of other features, such as
showing a randomly-ordered facebook with people's names visible by
mouse rollover: nice for learning faces in your community.
The interface is passworded, but just ask if you want to see it or have
the code. It requires PHP and MySQL.
An interface to the statistical software, Stata, and between it
and LaTeX. I do almost all my Stata analysis and programming in
Python. This is motivated partly by some custom repetitive statistical
work I needed, partly to save the hassle of compiling multiple regression
outputs into one table, and partly because I could not stand to start
programming in Stata's own language. Much of what I can now do is captured in my pystata [zip] package. As an example,
running the demo regressions-demo.py a few in a
You can see the sample output, also in the git repository, though it doesn't demo many available options.
- Generate a .do file
- Execute Stata
- LaTeX code with the results
- Compile the LaTeX
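The first of these steps, writing a .do file from Python, might look roughly like the sketch below. This is not the pystata API: the function name `make_do_file` and the variable names in the usage note are invented for illustration. The Stata commands it emits (`use`, `log using`, `regress`, `log close`) are standard.

```python
def make_do_file(data_path, log_path, depvar, specifications):
    """Return the text of a Stata .do file that runs one regression per spec.

    specifications: list of lists of regressor names, one list per column
                    of the eventual table.
    """
    lines = [
        "clear all",
        f'use "{data_path}", clear',
        f'log using "{log_path}", replace text',
    ]
    for number, regressors in enumerate(specifications, start=1):
        lines.append(f"* model {number}")
        lines.append(f"regress {depvar} {' '.join(regressors)}, robust")
    lines.append("log close")
    return "\n".join(lines) + "\n"
```

For example, `make_do_file("survey.dta", "out.log", "lifesat", [["income"], ["income", "age"]])` yields a two-regression script. On Unix, Stata's batch mode (`stata -b do file.do`) is one way to execute the result.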
The form of the actual LaTeX table is also quite
customisable without actually changing the underlying .tex table file that
corresponds to each table of results. The
preview tex file (corresponding to the pdf, above) that incorporates all the tables gives examples and a
wide range of choices for formats in case you want to tweak how things
look in your final document. (See the cpbl-tables repository.)
This code is set up to run under GNU/Linux or UNIX with Stata
installed, though for the most part it can also detect Windows or Cygwin.
This package has evolved towards reading Stata log files directly and extracting regression results, covariance matrices, etc into numpy and pandas forms in Python, and various tools to generate LaTeX from those. It also generates the Stata code that produces those log files.
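To give a flavour of what reading a log file involves, here is a small standalone sketch that pulls coefficients out of a coefficient table laid out like the output of Stata's `regress`. It is not taken from pystata, it uses plain dicts instead of numpy or pandas, and the function name and the numbers in `SAMPLE` are only illustrative.

```python
import re

# A table row looks like "   name |  numbers ...": a name, a bar, then values.
ROW = re.compile(r"^\s*(\S+)\s*\|\s*(.+)$")


def parse_coefficient_table(log_text):
    """Extract {variable: {'coef', 'se', 't', 'p'}} from a Stata log excerpt."""
    results = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # separator lines and ordinary log text
        name, rest = match.groups()
        try:
            values = [float(token) for token in rest.split()]
        except ValueError:
            continue  # header row ("Coef.  Std. Err. ...") or other text
        if len(values) >= 4:
            coef, se, t, p = values[:4]
            results[name] = {"coef": coef, "se": se, "t": t, "p": p}
    return results


SAMPLE = """\
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------
"""

if __name__ == "__main__":
    for name, row in parse_coefficient_table(SAMPLE).items():
        print(name, row["coef"], row["se"])
```

A real parser also has to cope with factor variables, omitted regressors, wrapped variable names and several tables per log, none of which is handled here.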
[2006+] I now write most things in Python. Great for manipulating
Some python scripts are online:
- dictTrees.py: a favourite. A class which manages nested
"dict"s (dicts are a Python data type) in an elegant way.
tree: a nested set of dicts with some kind of list at the bottom
level of each dict. Typically, the terminal lists are lists of dicts
which have discrete-valued properties used to separate them in the
tree hierarchy. For instance, a set of dicts describing properties of
fruit could be turned into a tree based on the sequence of properties:
'colour','shape','sweetness'. The resulting tree might have top-level
subtree: a tree
address: a list of keys that specifies the hierarchy of dict entries leading to a particular subtree or a leaf.
branch: an address to a subtree or to a leaf
keynames: the sequence of properties (of a set of dicts distributed within lists at leaves) used to define the tree structure.
leaf: a list or other object at a bottom-level of a tree.
regular tree: a tree with the property that at a particular level (depth), dicts are similar, ie have the same set of keys...... By convention, we ensure that non-regular trees have the property of no empty leaves.
- cpblUtilities.py: various utilities. Includes a good text file
(tsv,csv) reader that deals with column names, numeric columns, etc.
- circuit.py: timer for circuit training!?
- emailMyIP.py: useful for DHCP servers?
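The dictTrees vocabulary above (tree, keynames, address, leaf) can be illustrated with a short sketch. This is not the dictTrees.py class or its interface, just two standalone functions with made-up names and made-up fruit records that show the same idea.

```python
def build_tree(records, keynames):
    """Nest a list of dicts into a tree of dicts, one level per key name.

    Leaves are lists of the original records. Only branches that contain at
    least one record are created, so there are no empty leaves.
    """
    if not keynames:
        return list(records)
    first, rest = keynames[0], keynames[1:]
    groups = {}
    for record in records:
        groups.setdefault(record[first], []).append(record)
    return {value: build_tree(group, rest) for value, group in groups.items()}


def get_branch(tree, address):
    """Follow an address (a list of keys) down to a subtree or a leaf."""
    node = tree
    for key in address:
        node = node[key]
    return node


fruit = [
    {"name": "cherry", "colour": "red", "shape": "round", "sweetness": "high"},
    {"name": "lemon", "colour": "yellow", "shape": "oval", "sweetness": "low"},
    {"name": "banana", "colour": "yellow", "shape": "long", "sweetness": "high"},
]
tree = build_tree(fruit, ["colour", "shape", "sweetness"])
leaf = get_branch(tree, ["yellow", "long", "high"])
print([item["name"] for item in leaf])  # prints ['banana']
```

Here `["colour", "shape", "sweetness"]` plays the role of the keynames, and `["yellow", "long", "high"]` is an address leading to a leaf; a shorter address such as `["yellow"]` leads to a subtree.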
Some other stuff, not yet documented or put online:
- I do almost all my Stata programming in Python. The Python code
generates Stata .do files. Loops, string manipulation, coordination
with LaTeX output, etc. look much better in Python than in Stata
(actually, maybe I just don't know Stata). I use EST2TEX and CORRTEX
in Stata... but I've now replaced all the formatting functions with Python.
- est2tex needs lots of modifications and extensions. I've used a
modified version (by Ben Sand). It does not use math type minuses for
- Python wrapper for Stata regressions and for est2tex interface:
coordinates creation of an entire table (several regressions) with one
command; insertion of indicator lines giving comments or checkmarks
by column for given included properties / dummies / controls, etc., some
- Now, fairly general routines to turn Stata output into fancy LaTeX tables, transposed or not, multi-page or not, made for Beamer or articles, ...
- A number of web spidering (I guess it's called database scraping)
codes, some rather elaborate. This is not very useful, generally,
since the data cannot be saved due to its proprietary
nature. Parkers.co.uk; whatcar.co.uk; maps.google.ca; etc. I have the
locations of all Wal-Marts in Canada, all public libraries in Canada,
- Python interface to geocoder.ca.
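The table-building items in this list (one command for a whole table, checkmark rows for included controls, math-mode minus signs) can be sketched as follows. This is an illustration with hypothetical function names and data layout, not the actual wrapper; `\checkmark` needs the amssymb package, and variable names containing LaTeX special characters such as `_` would need escaping, which is omitted.

```python
def latex_number(value, digits=3):
    """Format a number for LaTeX, in math mode so negatives get a true minus."""
    return f"${value:.{digits}f}$"


def regression_table(models, variables, indicators):
    """Build a LaTeX tabular with one column per model.

    models:     list of dicts with 'coefs' ({variable: estimate}) and
                'controls' (a set of indicator names included in that model).
    variables:  row order for the coefficient rows.
    indicators: row order for the checkmark rows.
    """
    lines = [r"\begin{tabular}{l" + "c" * len(models) + "}", r"\hline"]
    header = [""] + [f"({i})" for i in range(1, len(models) + 1)]
    lines.append(" & ".join(header) + r" \\")
    lines.append(r"\hline")
    for var in variables:
        # Leave the cell empty when a model does not include this variable.
        cells = [latex_number(m["coefs"][var]) if var in m["coefs"] else ""
                 for m in models]
        lines.append(" & ".join([var] + cells) + r" \\")
    lines.append(r"\hline")
    for name in indicators:
        # One checkmark per model that includes this set of controls.
        cells = [r"\checkmark" if name in m["controls"] else "" for m in models]
        lines.append(" & ".join([name] + cells) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)
```

Standard errors, significance stars, transposition and multi-page tables are all left out of this sketch.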
For my dissertation I worked on some code for simulation of
electromagnetic fields' interaction with the lower ionosphere and
resultant optical emissions. Some literate-programming source code
(using noweb) is online, and you can ask me (or the VLF group at Stanford) for more details if you like.
There are three parts to the codes:
- the FWEM calculation (my latest incarnation is version 4.0),
- the optical calculation (optCrossSection), and
- the geometric/optical integration (which I have called optVideo)
If I were a C.S. prof, all my courses would be design courses of real,
open source projects. One could make a major contribution to the world
every term! Here are some of the many things I have wished for as a
unix user: (Possibly out of date, thankfully!)
- Here's a quickie: a somewhat clever (or not) converter from
article class into beamer class in LaTeX.
- An editor (word processor) which has a gmail-like ability to
attach "tags" or labels to different paragraphs or bullet points,
etc. This is key for a process I usually go through in organising
notes for a paper: notes/thoughts may come with a certain organisation
(by source, or etc), but then get reorganised according to some new
logic of the paper to be written. If portions of text were objects
with possibly multiple simultaneous properties, they could be listed
/sorted different ways within the same document, much facilitating the
organisation of thoughts.
- A really smart GUI to edit LaTeX tables. Possibly an interface
for LyX to import spreadsheet tables, etc.
- A plotting package for Python / Pylab that is more like
Matlab's (more dynamically updatable).
- Similarly, a plotting system for Octave that is more like
Matlab's: object oriented, dynamically updating, interactive.
- A quickie: a tool to
strip out the uninteresting mail headers from text files (ie
text-based email files consisting simply of concatenated messages,
e.g. in order to archive email from my Pine folders). Maybe it could
re-sort the messages in the file too.
- I still cannot find a vector graphics editor (e.g. scribus,
inkscape, etc?) which can open an .eps or .pdf file and edit it, just
like ancient Adobe Illustrator could. Typically, I want to be able to
open an .eps made by Matlab, get rid of the background to make the
thing transparent, and edit some text a bit. Advice?
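The mail-header "quickie" in this list is small enough to sketch with Python's standard library. The sketch assumes the file is in mbox style (each message introduced by a line starting with "From "), that messages are plain text with a parseable Date header including a time zone, and that the headers worth keeping are the four named in `KEEP`; all of these, and the function names, are assumptions.

```python
import email
from email.utils import parsedate_to_datetime

# Headers considered interesting; everything else is dropped (an assumption).
KEEP = ("From", "To", "Date", "Subject")


def split_messages(text):
    """Split mbox-style text on the 'From ' separator lines."""
    chunks, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            current = []
            chunks.append(current)
        elif current is not None:
            current.append(line)
    return ["".join(chunk) for chunk in chunks]


def slim_and_sort(text):
    """Keep only the KEEP headers of each message and sort messages by date."""
    slimmed = []
    for raw in split_messages(text):
        msg = email.message_from_string(raw)
        headers = "".join(f"{name}: {msg[name]}\n" for name in KEEP if msg[name])
        body = msg.get_payload()  # a string for plain, non-multipart mail
        when = parsedate_to_datetime(msg["Date"])
        slimmed.append((when, f"{headers}\n{body}"))
    slimmed.sort(key=lambda pair: pair[0])
    return [message for _, message in slimmed]
```

Multipart messages and attachments would need more care than this.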
G.I.S. in Matlab
I've not bothered to learn real GIS software, and am used to having
the control of a scripting language. Sometimes it's really slow or
awkward to plot with Matlab's Mapping Toolbox, but I have been
doing a lot of that.
Finding the outlines (not convex hull) of a group of contiguous polygons...
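One common way to get such an outline, sketched here in Python and not necessarily what the Matlab code does, is to count how many polygons each edge belongs to: interior borders are shared by two polygons, so the edges that occur exactly once form the outline. The sketch assumes neighbouring polygons use identical vertex coordinates along their shared borders; the function name is made up.

```python
from collections import Counter


def outline_edges(polygons):
    """Return the edges that belong to exactly one polygon.

    polygons: list of vertex lists, each a closed ring of (x, y) tuples given
              without repeating the first vertex. Shared borders must use
              identical vertex coordinates in both neighbours (an assumption).
    """
    counts = Counter()
    for ring in polygons:
        for i, start in enumerate(ring):
            end = ring[(i + 1) % len(ring)]
            # frozenset makes the edge direction-independent.
            counts[frozenset((start, end))] += 1
    # Interior edges are shared by two polygons; outline edges appear once.
    return [tuple(sorted(edge)) for edge, n in counts.items() if n == 1]


if __name__ == "__main__":
    left = [(0, 0), (1, 0), (1, 1), (0, 1)]
    right = [(1, 0), (2, 0), (2, 1), (1, 1)]
    for edge in outline_edges([left, right]):
        print(edge)
```

Chaining the surviving edges end to end then gives the outline as ordered rings; that step is omitted here.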
I used to have
mild RSI from typing too much. As everyone says, treating this in time
is invaluable to your livelihood and happiness. Here is information,
graphics, etc. on RSI.