Tuesday, February 14, 2012


There is a sort of running nerd joke about how easy the programming language Python is to use. Basically, it's that to do ANYTHING in Python, you just type "import ____" and the bulk of your programming is done.


I teach an MCAT class using PDF files, and from every file I use pages 3, 4, 7, 8, and so on. It's a pain to make new PDF files with only these pages, and I'm always afraid I'm going to skip a page. It turns out there's a library called "pyPdf," and using it I was able to hack together, in about half an hour, a program that does exactly this, but more reliably and virtually instantly.
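pyPdf counts pages from 0, so "pages 3, 4, 7, 8, and so on" means starting at index 2, keeping two pages, skipping two, and repeating. Here is a small standalone sketch of just that arithmetic, with no pyPdf required. The function pages_to_keep and its keep/period parameters are hypothetical names for illustration, not part of pyPdf.

```python
def pages_to_keep(num_pages, start=2, keep=2, period=4):
    """Return the 0-based indexes of the pages to copy.

    Keeps `keep` consecutive pages out of every `period`, beginning at
    index `start` (index 2 is page 3 when counting from 1).
    """
    pages = []
    for first in range(start, num_pages, period):
        # add the page and the page(s) right after it, without running past the end
        for offset in range(keep):
            if first + offset < num_pages:
                pages.append(first + offset)
    return pages


# 1-based page numbers for a hypothetical 10-page file
print([p + 1 for p in pages_to_keep(10)])
```

For a 10-page file this prints [3, 4, 7, 8].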

The key word is "hack." It is an ugly program that, I'm sure, does things the "long" way, and to use it you have to copy and paste a folder path into the code itself, which is against every programming style rulebook. BUT... it does its job, in spite of my programming ignorance. So, in keeping with this blog's overall tone as an accretion of my random thoughts and fun things I've done, here's the script (it needs pyPdf, from http://pybrary.net/pyPdf/, to be installed):

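import os
from pyPdf import PdfFileWriter, PdfFileReader

# folder of source PDFs, pasted straight into the code
folder = "/Users/tylerhoppenfeld/Documents/unit2"

# create extraction function: copies pages start, start+1, start+4, start+5, ...
# (counting from 0) out of source into a new file next to it
def extract(start, source):

    # output name; should be passed when the script is called, but I'm too lazy
    out = os.path.splitext(source)[0] + "clip.pdf"

    output = PdfFileWriter()

    # keep the source open until the new file is written; pyPdf reads from it as it goes
    with open(source, "rb") as infile:
        reader = PdfFileReader(infile)
        numpgs = reader.getNumPages()

        # generate list of pages to keep: every 4th page from start,
        # plus the facing page right after each one (numpgs as the range end
        # so the last page isn't dropped)
        facingstart = start + 1
        pgs = list(range(start, numpgs, 4))
        add = range(facingstart, numpgs, 4)
        for x in add:
            pgs.append(x)
        pgs.sort()

        # build output
        for x in pgs:
            output.addPage(reader.getPage(x))

        # write to new file
        with open(out, "wb") as outputStream:
            output.write(outputStream)

target = os.listdir(folder)

for x in target:
    # skip non-PDFs, and the clips made by an earlier run
    if x.endswith(".pdf") and not x.endswith("clip.pdf"):
        y = os.path.join(folder, x)
        # pages 3, 4, 7, 8... are indexes 2, 3, 6, 7... counting from 0
        extract(2, y)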
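Pasting the folder path into the code is the easiest of the script's style sins to fix. Below is a minimal sketch, assuming Python 2.7 or later for the standard library's argparse module, that takes the folder (and, optionally, the first page to keep) from the command line and lists the PDFs to process. The names parse_args, find_pdfs, and the --first-page option are hypothetical, and the actual pyPdf work is still left to extract().

```python
import argparse
import os


def parse_args(argv=None):
    # hypothetical interface: folder to scan, plus the first page to keep (counting from 1)
    parser = argparse.ArgumentParser(
        description="Clip pages 3, 4, 7, 8, ... out of every PDF in a folder.")
    parser.add_argument("folder", help="folder containing the source PDFs")
    parser.add_argument("--first-page", type=int, default=3,
                        help="first page to keep, counting from 1 (default: 3)")
    return parser.parse_args(argv)


def find_pdfs(folder):
    # full paths of the PDFs in folder, skipping clips left by an earlier run
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".pdf") and not name.endswith("clip.pdf")
    )
```

With that in place, the hard-coded folder line and the loop at the bottom of the script could become a call to parse_args() followed by extract(args.first_page - 1, path) for each path that find_pdfs(args.folder) returns.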
