IT'S TRUE.
I teach an MCAT class using PDF files, and from every file I use pages 3, 4, 7, 8, and so on. It's a pain to make new PDF files with only these pages, and I'm always afraid I'm going to skip a page. Turns out there's a library of commands called "pyPdf," and using that library I was able, in about half an hour, to hack together a program that does exactly this, but more reliably and virtually instantly.
The key word here is "hack." It is an ugly program that I'm sure does things the "long" way, and to use it you have to copy/paste a folder path into the code itself, which is totally against every programming style rulebook, BUT... it does its job, in spite of my programming ignorance. So, in light of this blog's overall tone as an accretion of my random thoughts/fun things that I've done, here's the script (it needs http://pybrary.net/pyPdf/ to be installed):
import os
from pyPdf import PdfFileWriter, PdfFileReader

#directory
folder = "/Users/tylerhoppenfeld/Documents/unit2"

# create extraction function
def extract(start, source):
    # variables that should be passed when script is called, but i'm too lazy
    out = source + "clip.pdf"
    output = PdfFileWriter()
    input = PdfFileReader(file(source, "rb"))
    endpg = input.getNumPages() - 1
    # generate list of pages to keep
    facingstart = start + 1
    pgs = range(start,endpg,4)
    add = range(facingstart, endpg, 4)
    for x in add:
        pgs.append(x)
    pgs.sort()
    # build output
    for x in pgs:
        output.addPage(input.getPage(x))
    #write to new file
    outputStream = file(out, "wb")
    output.write(outputStream)
    outputStream.close()

target = os.listdir(folder)
for x in target:
    if x.endswith(".pdf"):
        y = folder + "/" + x
        extract(2,y)
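The page-picking part of the script can be tried on its own, without pyPdf or any PDF files. Below is a minimal sketch of just that step. The function name `pages_to_keep` and its parameters are made up for illustration and are not part of the script or of pyPdf. It assumes, as the script does, that page indices start at 0 (so index 2 is page 3) and that the last page of the file is never kept.

```python
def pages_to_keep(start, num_pages, stride=4):
    """Return the 0-based page indices the script above would keep.

    Hypothetical helper: it mirrors the two range() calls in extract(),
    taking each page at start, start + stride, ... plus the page right
    after it, and stopping before the final page of the file.
    """
    endpg = num_pages - 1  # same cutoff as the script: last page excluded
    first_of_pair = list(range(start, endpg, stride))
    second_of_pair = list(range(start + 1, endpg, stride))
    return sorted(first_of_pair + second_of_pair)


# For a 10-page file, starting at index 2 (that is, page 3):
print(pages_to_keep(2, 10))  # [2, 3, 6, 7], i.e. pages 3, 4, 7, 8
```

Printing the list for a file's page count before writing anything is a cheap way to check that no page gets skipped.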