Changeset 3386

Timestamp:
06/23/08 00:07:15
Author:
jerome
Message:

Removed unneeded stuff in the new PDF parser. We will add it again
later if needed...

Files:
1 modified

  • pkpgcounter/trunk/pkpgpdls/pdf.py

--- pkpgcounter/trunk/pkpgpdls/pdf.py (r3385)
+++ pkpgcounter/trunk/pkpgpdls/pdf.py (r3386)
@@ -35,26 +35,6 @@
                 + chr(13) \
                 + chr(32)
-
-PDFDELIMITERS = r"()<>[]{}/%"
-PDFCOMMENT = r"%"        # Up to next EOL
-
-PDFPAGEMARKER = "<< /Type /Page " # Where spaces are any whitespace char
-
+PDFDELIMITERS = r"()<>[]{}/%"
 PDFMEDIASIZE = "/MediaBox [xmin ymin xmax ymax]" # an example. MUST be present in Page objects
-PDFOBJREGEX = r"\s+(\d+)\s+(\d+)\s+(obj\s*.+\s*endobj)" # Doesn't work as expected
-
-class PDFObject :
-    """A class for PDF objects."""
-    def __init__(self, major, minor, description) :
-        """Initialize the PDF object."""
-        self.major = major
-        self.minor = minor
-        self.majori = int(major)
-        self.minori = int(minor)
-        self.description = description
-        self.comments = []
-        self.content = []
-        self.parent = None
-        self.kids = []
 
 class Parser(pdlparser.PDLParser) :
@@ -82,4 +62,6 @@
            Unfortunately it doesn't take into account documents
            with redacted pages (only made with FrameMaker ?)
+           where an existing PDF object is replaced with one
+           with the same major number a higher minor number.
         """
         newpageregexp = re.compile(r"/Type\s*/Page[/>\s]")
@@ -95,5 +77,7 @@
 
            At least the actual method below is accurate, even if 25%
-           slower than the old one.
+           slower than the old one. But we will be able to extract
+           other informations as well when needed, like orientation
+           and size.
         """
         # Regular expression to extract objects from a PDF document
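
Note: the docstrings in this changeset contrast two ways of counting pages. One counts every match of newpageregexp. The other extracts the objects and keeps only the replacement when an object is redefined with the same major number and a higher minor number. The sketch below illustrates that difference under stated assumptions. It is not the code in pdf.py. OBJREGEXP, count_pages and the sample text are hypothetical names and data introduced for illustration, and the input is treated as plain text, whereas real PDF files contain binary streams.

```python
import re

# Copied from the changeset: matches "/Type /Page" but not "/Type /Pages".
NEWPAGEREGEXP = re.compile(r"/Type\s*/Page[/>\s]")

# Hypothetical pattern (an assumption, not taken from pdf.py):
# "<major> <minor> obj ... endobj", with the object body captured lazily.
OBJREGEXP = re.compile(r"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def count_pages(data):
    """Count page objects, keeping only the highest minor number per major."""
    latest = {}  # major number -> (minor number, object body)
    for match in OBJREGEXP.finditer(data):
        major, minor = int(match.group(1)), int(match.group(2))
        # Keep the first definition, then replace it only by a higher minor.
        if major not in latest or minor > latest[major][0]:
            latest[major] = (minor, match.group(3))
    return sum(1 for (_, body) in latest.values() if NEWPAGEREGEXP.search(body))


# Made-up sample: object 3 is redefined with a higher minor number.
sample = """
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj
3 0 obj << /Type /Page /MediaBox [0 0 612 792] >> endobj
4 0 obj << /Type /Page /MediaBox [0 0 612 792] >> endobj
3 1 obj << /Type /Page /MediaBox [0 0 595 842] >> endobj
"""

naive_count = len(NEWPAGEREGEXP.findall(sample))  # counts the replaced page too
object_count = count_pages(sample)                # counts each major number once
```

On this sample the plain regex count includes both versions of object 3, while the per-object count does not. Keeping the object bodies around is also what would allow reading /MediaBox later for page size, as the updated docstring suggests.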