
pdf.py

Timestamp:

06/23/08 00:07:15

Author:

jerome

Message:

Removed unneeded stuff in the new PDF parser. We will add it again
later if needed...

Files:

r3385	r3386
35	35	+ chr(13) \
36	36	+ chr(32)
37
38		PDFDELIMITERS = r"()<>[]{}/%"
39		PDFCOMMENT = r"%" # Up to next EOL
40
41		PDFPAGEMARKER = "<< /Type /Page " # Where spaces are any whitespace char
42
	37	PDFDELIMITERS = r"()<>[]{}/%"
43	38	PDFMEDIASIZE = "/MediaBox [xmin ymin xmax ymax]" # an example. MUST be present in Page objects
44		PDFOBJREGEX = r"\s+(\d+)\s+(\d+)\s+(obj\s.+\sendobj)" # Doesn't work as expected
45
46		class PDFObject :
47		    """A class for PDF objects."""
48		    def __init__(self, major, minor, description) :
49		        """Initialize the PDF object."""
50		        self.major = major
51		        self.minor = minor
52		        self.majori = int(major)
53		        self.minori = int(minor)
54		        self.description = description
55		        self.comments = []
56		        self.content = []
57		        self.parent = None
58		        self.kids = []
59	39
60	40	class Parser(pdlparser.PDLParser) :
…	…
82	62	Unfortunately it doesn't take into account documents
83	63	with redacted pages (only made with FrameMaker ?)
	64	where an existing PDF object is replaced with one
	65	with the same major number and a higher minor number.
84	66	"""
85	67	newpageregexp = re.compile(r"/Type\s*/Page[/>\s]")
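The page regular expression in this hunk can be tried in isolation. The following is a minimal sketch, assuming Python 3 and the whole file read into a `bytes` buffer (the 2008 code predates Python 3). `NEWPAGEREGEXP`, `naive_page_count` and `SAMPLE` are hypothetical names, and `SAMPLE` is a hand-written fragment, not a complete PDF. It reproduces the weakness the docstring describes: when a page object is redefined under the same object number with a higher generation number, both copies remain in the file and both are counted.

```python
import re

# Same pattern as newpageregexp in the diff: "/Type /Page" followed by "/",
# ">" or whitespace, so the page-tree node "/Type /Pages" is not counted.
NEWPAGEREGEXP = re.compile(rb"/Type\s*/Page[/>\s]")

def naive_page_count(data):
    """Count every page dictionary found anywhere in the raw file bytes."""
    return len(NEWPAGEREGEXP.findall(data))

# Hypothetical fragment: page object 3 is redefined later in the file
# with the same object number and a higher generation number.
SAMPLE = (
    b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n"
    b"2 0 obj << /Type /Pages /Count 1 >> endobj\n"
    b"3 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >> endobj\n"
    b"3 1 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] >> endobj\n"
)

print(naive_page_count(SAMPLE))  # 2, although only one page is live
```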
…	…
95	77
96	78	At least the actual method below is accurate, even if 25%
97		slower than the old one.
	79	slower than the old one. But we will be able to extract
	80	other information as well when needed, like orientation
	81	and size.
98	82	"""
99	83	# Regular expression to extract objects from a PDF document
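The object-based method this docstring describes can be sketched as follows. This is not the code from pdf.py, whose object regular expression is elided above; `OBJREGEX`, `live_objects`, `page_sizes` and `SAMPLE` are all hypothetical. The sketch assumes uncompressed objects of the form `number generation obj ... endobj` and, following the earlier docstring, that a replaced object keeps its object (major) number. For each number it keeps the definition with the highest generation (minor) number, preferring the later one on ties. Working on whole objects is also what makes the page size reachable: each page's `/MediaBox [xmin ymin xmax ymax]` is read from its own dictionary.

```python
import re

# Hypothetical pattern, not the one used in pdf.py. Indirect objects look like
# "<number> <generation> obj ... endobj"; the docstring calls these numbers
# "major" and "minor". Objects inside compressed object streams are not seen.
OBJREGEX = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)
# Same page marker as the old method, now applied to one object at a time.
PAGEREGEX = re.compile(rb"/Type\s*/Page[/>\s]")
# "/MediaBox [xmin ymin xmax ymax]", as in PDFMEDIASIZE.
MEDIABOXREGEX = re.compile(
    rb"/MediaBox\s*\[\s*([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s*\]")

def live_objects(data):
    """Map each object number to (generation, body), keeping the highest
    generation and, on ties, the definition that appears last in the file."""
    objects = {}
    for match in OBJREGEX.finditer(data):
        number, generation = int(match.group(1)), int(match.group(2))
        previous = objects.get(number)
        if previous is None or generation >= previous[0]:
            objects[number] = (generation, match.group(3))
    return objects

def page_sizes(data):
    """Return (width, height) in points for each live page object, or None
    when the page has no MediaBox of its own (it may inherit one)."""
    objects = live_objects(data)
    sizes = []
    for number in sorted(objects):
        _, body = objects[number]
        if not PAGEREGEX.search(body):
            continue
        box = MEDIABOXREGEX.search(body)
        if box is None:
            sizes.append(None)
        else:
            xmin, ymin, xmax, ymax = (float(value) for value in box.groups())
            sizes.append((xmax - xmin, ymax - ymin))
    return sizes

# Hypothetical fragment: page object 3 is redefined in generation 1.
SAMPLE = (
    b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n"
    b"2 0 obj << /Type /Pages /Count 1 >> endobj\n"
    b"3 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >> endobj\n"
    b"3 1 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] >> endobj\n"
)

print(page_sizes(SAMPLE))  # [(595.0, 842.0)]
```

On this sample the sketch reports a single A4-sized page, whereas counting `/Type /Page` markers across the raw file finds two. Comparing width and height of each entry would also give the orientation the docstring mentions. The sketch still counts a page that was dropped from the page tree but left in the file, so it illustrates the idea rather than replacing a real parser.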