Changeset 3386 for pkpgcounter/trunk/pkpgpdls/pdf.py
- Timestamp: 06/23/08 00:07:15
- Files: 1 modified
pkpgcounter/trunk/pkpgpdls/pdf.py
    --- pkpgcounter/trunk/pkpgpdls/pdf.py (r3385)
    +++ pkpgcounter/trunk/pkpgpdls/pdf.py (r3386)
    @@ -35,26 +35,6 @@
                     + chr(13) \
                     + chr(32)
    -
    -PDFDELIMITERS = r"()<>[]{}/%"
    -PDFCOMMENT = r"%" # Up to next EOL
    -
    -PDFPAGEMARKER = "<< /Type /Page " # Where spaces are any whitespace char
     PDFDELIMITERS = r"()<>[]{}/%"
     PDFMEDIASIZE = "/MediaBox [xmin ymin xmax ymax]" # an example. MUST be present in Page objects
    -PDFOBJREGEX = r"\s+(\d+)\s+(\d+)\s+(obj\s*.+\s*endobj)" # Doesn't work as expected
    -
    -class PDFObject :
    -    """A class for PDF objects."""
    -    def __init__(self, major, minor, description) :
    -        """Initialize the PDF object."""
    -        self.major = major
    -        self.minor = minor
    -        self.majori = int(major)
    -        self.minori = int(minor)
    -        self.description = description
    -        self.comments = []
    -        self.content = []
    -        self.parent = None
    -        self.kids = []
     
     class Parser(pdlparser.PDLParser) :
    @@ -82,4 +62,6 @@
                Unfortunately it doesn't take into account documents
                with redacted pages (only made with FrameMaker ?)
    +           where an existing PDF object is replaced with one
    +           with the same major number a higher minor number.
             """
             newpageregexp = re.compile(r"/Type\s*/Page[/>\s]")
    @@ -95,5 +77,7 @@
     
                At least the actual method below is accurate, even if 25%
    -           slower than the old one.
    +           slower than the old one. But we will be able to extract
    +           other informations as well when needed, like orientation
    +           and size.
             """
             # Regular expression to extract objects from a PDF document
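The docstrings in this changeset contrast a fast page count, which counts every /Type /Page dictionary with newpageregexp and therefore overcounts when a page object has been replaced, with a slower object-by-object method that can also report page size and orientation. What follows is a minimal Python 3 sketch of that idea under stated assumptions, not pkpgcounter's implementation. Only the NEWPAGE pattern is copied from the changeset; OBJECT, MEDIABOX, ROTATE, fast_count, current_pages and page_geometry are hypothetical names. The sketch assumes uncompressed objects (no object streams), no "endobj" inside stream data, and, as the docstring states, that a replacement object keeps its major number with a higher minor number.

```python
import re

# Same pattern as newpageregexp in the changeset: "/Page" must be followed by
# "/", ">" or whitespace, so the page tree node "/Type /Pages" is skipped.
NEWPAGE = re.compile(rb"/Type\s*/Page[/>\s]")

# Hypothetical helpers for this sketch (not pkpgcounter code).
OBJECT = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
NUMBER = rb"(-?\d+(?:\.\d+)?)"
MEDIABOX = re.compile(rb"/MediaBox\s*\[\s*" + rb"\s+".join([NUMBER] * 4) + rb"\s*\]")
ROTATE = re.compile(rb"/Rotate\s+(-?\d+)")


def fast_count(data):
    """Count every page dictionary, superseded copies included."""
    return len(NEWPAGE.findall(data))


def current_pages(data):
    """Keep one definition per object number and return the page bodies."""
    latest = {}
    for match in OBJECT.finditer(data):
        major, minor = int(match.group(1)), int(match.group(2))
        # Assumption from the docstring: a replacement keeps the major number
        # and has a higher minor number; on a tie, the later definition wins.
        if major not in latest or minor >= latest[major][0]:
            latest[major] = (minor, match.group(3))
    return [body for _, body in latest.values() if NEWPAGE.search(body)]


def page_geometry(body):
    """Return (width, height, orientation) from /MediaBox and /Rotate."""
    box = MEDIABOX.search(body)
    if box is None:
        return None  # e.g. the MediaBox is inherited from the page tree
    xmin, ymin, xmax, ymax = (float(value) for value in box.groups())
    width, height = abs(xmax - xmin), abs(ymax - ymin)
    rotate = ROTATE.search(body)
    if rotate is not None and int(rotate.group(1)) % 180 != 0:
        width, height = height, width  # /Rotate 90 or 270 swaps the axes
    return width, height, "landscape" if width > height else "portrait"


# Object 4 is defined twice; the "4 1 obj" copy replaces "4 0 obj".
sample = (
    b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n"
    b"2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj\n"
    b"3 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >> endobj\n"
    b"4 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >> endobj\n"
    b"4 1 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 792 612] >> endobj\n"
)

print(fast_count(sample))  # 3
pages = current_pages(sample)
print(len(pages))  # 2
print([page_geometry(page) for page in pages])
# [(612.0, 792.0, 'portrait'), (792.0, 612.0, 'landscape')]
```

On the sample, the fast count reports 3 pages because object 4 appears twice, while keeping only the latest definition per object number reports 2. Because the surviving page bodies are already isolated, reading /MediaBox (and /Rotate, which the PDF specification restricts to multiples of 90) costs little extra, which is the kind of additional information the second docstring anticipates.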