Thursday, December 28, 2006

Working with PDFs

Objective: I needed to use a PDF reader inside my Windows application.

Requirements of my app:

1. Can't download the PDF files to the client computer.
2. Can't use Acrobat Reader or any other plug-in/reader (so-called helper applications) that the user would have to download. Everything must be embedded in my app, so the user needs no pre-installed plug-in or browser to view and print the PDF files.

Because the PDF file cannot be downloaded to the user's machine, the document has to be loaded into the reader by passing it a URL.

Alternatives:

1. Use the .NET WebBrowser control in my Windows Form and embed the reader or any PDF plug-in in my setup so that it installs unattended (not Acrobat! That means I cannot use the PDF control that shows up in VS when Acrobat Reader is installed). I never tried it!
2. Use a third-party PDF browser-reader control (dll or ocx). I never found one!
3. Use a third-party PDF reader control (dll or ocx) that can load PDF files from a MemoryStream or from URLs. I did it.

Solution:

I found a control, “WPViewPDF” (an ocx), that can load a PDF from a stream, and I wrote this code to get the stream from the URL:

--------------------------------------------------------------------------------
// Requires: using System.IO; using System.Net;

// Make a request to the file and get the response as a Stream
WebRequest MyReq = WebRequest.Create(textBox1.Text);
MemoryStream ms = new MemoryStream();
using (WebResponse MyResp = MyReq.GetResponse())
using (Stream data = MyResp.GetResponseStream())
{
    // Copy the response in chunks. ContentLength is -1 when the server
    // does not send it, so it cannot be trusted to size the array.
    byte[] buff = new byte[8192];
    int cnt;
    while ((cnt = data.Read(buff, 0, buff.Length)) > 0)
    {
        ms.Write(buff, 0, cnt);
    }
}

// Rewind the MemoryStream and pass it as a parameter
ms.Position = 0;
pdfViewer1.LoadFromStream(ms);

Note:
Looking at this code, I thought: what if I want to save this data to a physical file?
I researched it and found this:

1. The easiest way to download the file and save it to disk is to use a WebClient:
WebClient client = new WebClient();
client.DownloadFile(varURL, localFileName);

2. But if I already have a MemoryStream and want to save it to a file, this is the way to go:
using (FileStream stream = new FileStream("fileName", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(stream))
{
    writer.Write(ms.ToArray());
}
I used it!

-------------------------------------------------------------------------------

Conclusion so far:

It worked fine, but the control does not render the PDF file well, so I’m looking for another control that can load a document from a stream.
Finally!!!!!!!! I found a control that satisfies my application's requirements. It is an assembly that is very easy to use. It is a reader-printer, not a generator (we use PDFLib to generate the PDF documents on Linux; it also works on Windows). It is PDFReaderControls.NET from TallComponents (www.tallcomponents.com), a company from the Netherlands. The license price is $2500. UUUh!!

Before that, I tried these components:

PDF-tools: It is an ocx and gave me too many exceptions.
PDFXpress: They don’t have a release for .NET 2.0.
WPViewPDF: Does not render some PDFs correctly.
PDFNEtDemo: Does not load PDFs from URLs.

There is also an exception that happens when a PDF is loaded from the web (via URI) with .NET and the server's HTTP response does not follow the RFC 2616 specification (www.ietf.org/rfc/rfc2616.txt):

System.Net.WebException: The server committed a protocol violation. Section=ResponseStatusLine. A pain..!!
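
To tell this failure apart from ordinary network errors, the exception's Status can be checked. This is only a minimal sketch: the class name PdfDownloadErrors and the method name IsProtocolViolation are made up for illustration, while WebException and WebExceptionStatus.ServerProtocolViolation come from System.Net.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of any PDF control or of the original app
static class PdfDownloadErrors
{
    // Returns true when the exception is a WebException whose status says
    // the server's response violated the HTTP protocol (the case above)
    public static bool IsProtocolViolation(Exception ex)
    {
        WebException webEx = ex as WebException;
        return webEx != null
            && webEx.Status == WebExceptionStatus.ServerProtocolViolation;
    }
}
```

It could be called from a catch block around GetResponse(), for example to show the user a message saying that the server sent a malformed response instead of a generic download error.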

To help the community and save people from wasting time reading and reading, I’m going to give some links that tell the story of this exception and its solutions.

Important to know:

Some .NET requirements for HTTP response headers, from the useUnsafeHeaderParsing documentation (http://msdn2.microsoft.com/en-us/library/system.net.configuration.httpwebrequestelement.useunsafeheaderparsing(VS.80).aspx)


1. To help protect from "HTTP Response splitting" attacks, parsing is performed according to Request for Comments (RFC) 2616. This means that control characters are not permitted in header names or values. For example, the carriage return (CR) character and the linefeed (LF) character are not permitted. There are other characters that are not permitted in names. Additionally, every response header must have a colon.
2. Header names should not have spaces in them.
3. If multiple status lines exist, all additional status lines are treated as malformed header name/value pairs.
4. The status line must have a status description, in addition to a status code.
5. Header names cannot have non-ASCII characters in them. This validation is performed whether the useUnsafeHeaderParsing property is set to true or false.
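
For reference, the setting these rules refer to lives in the application's configuration file. The fragment below is a sketch of how it is usually switched on in app.config. It relaxes the header validation for the whole application, so it trades some protection against malformed responses for compatibility with servers you cannot fix.

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Relaxes HTTP header parsing for this application only -->
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
```

If the server is under your control, fixing its response headers is the safer route; the links in this post cover both options.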

Links to read about it:

http://blogs.msdn.com/mflasko/archive/2005/11/02/488370.aspx
http://support.microsoft.com/?kbid=888527
(search for "Undocumented bugs in the .NET Class Library (NCL)")
http://msdn2.microsoft.com/en-us/library/65ha8tzh.aspx
http://www.velocityreviews.com/forums/t302174-why-do-i-get-quotthe-server-committed-a-protocol-violationquot.html

1 comment:

Unknown said...

July 2007 -
Please check out the new WPViewPDF Version 2:
It now integrates a rendering engine for Type1 and TTF subset fonts for better rendering quality.
It now also paints monochrome images antialiased which is important for PDFs created by scanning software. It is even faster now and opens even huge PDF files instantly.
WPViewPDF PLUS can also merge, split and stamp PDF files if you need to edit PDF files. Available as VCL, .NET and OCX.

Regards,
WPCubed GmbH
www.wpcubed.com