Tuesday, 28 August 2012

Not possible to read a pdf from file system

Suppose we try to read a PDF directly from the file system, without first ingesting it into the database, using a query like this:

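(: Walks Import/<batch>/<sub-folder>/*.pdf and tries to read each PDF with xdmp:filesystem-file. :)
for $IMPORT in xdmp:filesystem-directory("C:\ABC\0000001\Import")/dir:entry[dir:type = "directory"]
let $BatchPathname := $IMPORT/dir:pathname
for $EachBatchPath in xdmp:filesystem-directory($BatchPathname)/dir:entry[dir:type = "directory"]
for $PDFDocument in xdmp:filesystem-directory($EachBatchPath/dir:pathname)/dir:entry
                      [fn:ends-with(fn:lower-case(dir:filename), ".pdf")]
let $PDFDocumentPath := $PDFDocument/dir:pathname
(: xdmp:filesystem-file returns the file as a string; this is the call that fails on a PDF :)
let $EachInputFileContent := if (xdmp:filesystem-file-exists($PDFDocumentPath)) then
                               xdmp:filesystem-file($PDFDocumentPath)
                             else ( )
return
  (: save under a file name inside D:/test, not onto the bare directory path :)
  xdmp:save(fn:concat("D:/test/", $PDFDocument/dir:filename),
            text { $EachInputFileContent },
            <options xmlns="xdmp:save">
              <output-encoding>utf-8</output-encoding>
            </options>)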


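The query throws this error:

XDMP-READFILE: $r instance of node()+ -- ReadFile File is not in UTF-8:

xdmp:filesystem-file() returns the file contents as text and requires the file to be UTF-8 encoded. A PDF is a binary file, so the read fails.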
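To see why a PDF trips this check, the small Java sketch below tests whether a byte array is valid UTF-8 using a strict CharsetDecoder that reports malformed input instead of replacing it. It only approximates the condition named in the error message; it is not MarkLogic's own check. The class name is hypothetical, and the demo bytes (a PDF header line followed by a comment line of high bytes, of the kind PDF writers commonly emit) are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    // Returns true if the bytes form valid UTF-8; malformed input is reported, not replaced.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Check a real file, e.g. a PDF on disk.
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            System.out.println(args[0] + (isValidUtf8(bytes) ? " is" : " is not") + " valid UTF-8");
            return;
        }
        // Demo: "%PDF-1.4" then a comment line with bytes 0xE2 0xE3 0xCF 0xD3.
        // 0xE2 starts a 3-byte UTF-8 sequence, but 0xE3 is not a continuation byte.
        byte[] pdfStart = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        System.out.println(isValidUtf8("<docslist/>".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(isValidUtf8(pdfStart)); // false
    }
}
```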

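One solution is to establish an XCC or MLJAM connection and call Java code that reads the PDF. The sample below uses iText (the com.lowagie.text packages) to open C:/ABC.pdf and write D:/test/satyam.pdf, placing two source pages at half size on each landscape sheet and adding a "page p of N" footer to every sheet.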



import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.Rectangle;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;

// Reads C:/ABC.pdf and writes D:/test/satyam.pdf with two source pages
// scaled to 50% side by side on each landscape sheet, plus a "page p of N" footer.
public class PDFReaderSample
{
    public static void main(String[] args) throws Exception
    {
        PdfReader reader = new PdfReader("C:/ABC.pdf");
        int n = reader.getNumberOfPages();

        // Swap the width and height of page 1 to get a landscape sheet.
        Rectangle psize = reader.getPageSize(1);
        float width = psize.getHeight();
        float height = psize.getWidth();

        Document document = new Document(new Rectangle(width, height));
        PdfWriter writer = PdfWriter.getInstance(document,
                new FileOutputStream("D:/test/satyam.pdf"));
        document.open();

        PdfContentByte cb = writer.getDirectContent();
        BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
                BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
        int totalSheets = (n / 2) + (n % 2 > 0 ? 1 : 0);

        int i = 0; // source pages placed so far (PdfReader pages are 1-based)
        int p = 0; // current output sheet
        while (i < n) {
            document.newPage();
            p++;
            i++;
            // Left half: page i at 50% scale, offset (60, 120).
            PdfImportedPage page1 = writer.getImportedPage(reader, i);
            cb.addTemplate(page1, .5f, 0, 0, .5f, 60, 120);
            if (i < n) {
                i++;
                // Right half: the next page, shifted by half the sheet width.
                PdfImportedPage page2 = writer.getImportedPage(reader, i);
                cb.addTemplate(page2, .5f, 0, 0, .5f, width / 2 + 60, 120);
            }
            // Centered footer near the bottom of the sheet.
            cb.beginText();
            cb.setFontAndSize(bf, 19);
            cb.showTextAligned(PdfContentByte.ALIGN_CENTER,
                    "page " + p + " of " + totalSheets, width / 2, 40, 0);
            cb.endText();
        }
        document.close();
    }
}
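The footer total in the sample, (n / 2) + (n % 2 > 0 ? 1 : 0), is the number of output sheets, that is n/2 rounded up. The sketch below (hypothetical class TwoUpLayout, no iText needed) isolates that arithmetic and the left/right placement used in the loop, so the layout can be checked without a PDF. The sheet width of 842 points in main is an assumed A4-landscape value, not something taken from the sample.

```java
public class TwoUpLayout {
    // Number of output sheets when two source pages go on each sheet.
    public static int sheetCount(int n) {
        return (n / 2) + (n % 2 > 0 ? 1 : 0); // same as (n + 1) / 2 for n >= 0
    }

    // 1-based sheet on which the 1-based source page lands.
    public static int sheetFor(int page) {
        return (page + 1) / 2;
    }

    // x offset of the half-size page: odd pages go left (60), even pages right (width / 2 + 60).
    public static float xOffset(int page, float width) {
        return (page % 2 == 1) ? 60f : width / 2 + 60f;
    }

    public static void main(String[] args) {
        int n = 5;
        float width = 842f; // assumed sheet width in points (A4 landscape)
        System.out.println("sheets: " + sheetCount(n));
        for (int page = 1; page <= n; page++) {
            System.out.println("page " + page + " -> sheet " + sheetFor(page)
                    + ", x = " + xOffset(page, width));
        }
    }
}
```

With five source pages this prints three sheets, and the last sheet carries only page 5 on its left half, which is exactly the case the if (i < n) guard in the loop handles.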

Sunday, 26 August 2012

xdmp:filesystem-directory


It returns a listing of the files and sub-directories in a directory, with each entry's name, path, type, size (content-length) and last-modified time.

 xdmp:filesystem-directory("C:\ABC\0000001\EXPORT")
returns the following structure (here the folder holds two files):

<dir:directory xsi:schemaLocation="http://marklogic.com/xdmp/directory directory.xsd" xmlns:dir="http://marklogic.com/xdmp/directory" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dir:entry>
    <dir:filename>0000001-0000000-0000002-ISDA-MAG-MC02_2011-06-27_04-59-01-206.xml</dir:filename>
    <dir:pathname>C:\ABC\0000001\EXPORT\0000001-0000000-0000002-ISDA-MAG-MC02_2011-06-27_04-59-01-206.xml</dir:pathname>
    <dir:type>file</dir:type>
    <dir:content-length>75087</dir:content-length>
    <dir:last-modified>2012-08-24T15:38:24+05:30</dir:last-modified>
  </dir:entry>
  <dir:entry>
    <dir:filename>docslist.xml</dir:filename>
    <dir:pathname>C:\ABC\0000001\EXPORT\docslist.xml</dir:pathname>
    <dir:type>file</dir:type>
    <dir:content-length>133</dir:content-length>
    <dir:last-modified>2012-08-24T15:38:24+05:30</dir:last-modified>
  </dir:entry>
</dir:directory>
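For comparison, a rough Java equivalent of one listing can be sketched with java.nio.file: it reads the immediate children of a folder (no recursion) and collects the same fields as each dir:entry. The class and record names are hypothetical, the type strings "file" and "directory" are chosen to match dir:type, and last-modified is printed in UTC rather than with the local offset shown above. This is an illustration only, not how MarkLogic implements the function.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class DirListing {
    // One listing entry, mirroring the dir:entry children shown above.
    public record Entry(String filename, String pathname, String type,
                        long contentLength, String lastModified) {}

    // Lists the immediate children of dir; sub-directories appear as entries but are not descended into.
    public static List<Entry> list(Path dir) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
                entries.add(new Entry(
                        p.getFileName().toString(),
                        p.toAbsolutePath().toString(),
                        a.isDirectory() ? "directory" : "file",
                        a.size(),
                        a.lastModifiedTime().toString())); // ISO-8601, UTC
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Entry e : list(dir)) {
            System.out.println(e.filename() + "\t" + e.type() + "\t"
                    + e.contentLength() + "\t" + e.lastModified());
        }
    }
}
```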


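Points:
->The function returns the dir:directory element itself, not a document node that contains it. So a step like xdmp:filesystem-directory("C:\ColossusImportExport\0000001\EXPORT")/dir:directory looks for a dir:directory child of that element and returns an empty sequence. The children of the returned element are the dir:entry elements, so step to /dir:entry (and from there to dir:filename, dir:pathname, and so on) instead.

->The path "C:\ABC\0000001\EXPORT" is case-insensitive, as Windows file paths generally are.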
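The first point is ordinary XPath child-step behaviour, and it can be reproduced outside MarkLogic. The Java sketch below (hypothetical class name, a trimmed-down two-entry listing as input) parses the XML, takes its dir:directory element as the context, just as the function hands it back, and evaluates relative paths against it with the standard javax.xml.xpath API.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildStepDemo {
    static final String DIR_NS = "http://marklogic.com/xdmp/directory";

    // Counts the nodes a relative XPath selects when the given element is the context item.
    public static int count(Element context, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "dir".equals(prefix) ? DIR_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return DIR_NS.equals(uri) ? "dir" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return Collections.singletonList("dir").iterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expr, context, XPathConstants.NODESET);
        return nodes.getLength();
    }

    // Parses a trimmed-down listing and returns its dir:directory element (not the document node).
    public static Element listing() throws Exception {
        String xml = "<dir:directory xmlns:dir=\"" + DIR_NS + "\">"
                + "<dir:entry><dir:filename>a.xml</dir:filename></dir:entry>"
                + "<dir:entry><dir:filename>docslist.xml</dir:filename></dir:entry>"
                + "</dir:directory>";
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element dir = listing();
        System.out.println(count(dir, "dir:directory"));          // 0: no dir:directory child
        System.out.println(count(dir, "dir:entry"));              // 2
        System.out.println(count(dir, "dir:entry/dir:filename")); // 2
    }
}
```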

Friday, 24 August 2012

To copy a set of files from one folder on the file system to another


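declare variable $InputFileDirectoryPath as xs:string external;
(: must end with a path separator, because it is concatenated directly with the file name :)
declare variable $OutputFileDirectoryPath as xs:string external;

(: Copies every file directly inside the input folder. xdmp:unquote() parses the text as XML,
   so this only works when every file is well-formed, UTF-8 encoded XML. :)
for $Each in xdmp:filesystem-directory($InputFileDirectoryPath)/dir:entry[dir:type = "file"]
 let $EachInputFilePath := $Each/dir:pathname
 let $EachInputFilename := $Each/dir:filename
 let $EachInputFileContent := xdmp:filesystem-file($EachInputFilePath)
 let $EachOutputFilePath := fn:concat($OutputFileDirectoryPath, $EachInputFilename)
 let $EachOutputFileContent := xdmp:unquote($EachInputFileContent)
 return
  xdmp:save($EachOutputFilePath, $EachOutputFileContent)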
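The same copy can be sketched in Java with java.nio.file (class name hypothetical). Files.copy transfers the bytes unchanged, so unlike the xdmp:unquote round-trip it does not require the files to be XML, and Path.resolve supplies the separator between folder and file name. Sub-directories are skipped and not descended into. This is only a sketch for comparison, not a replacement for the MarkLogic functions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyFolder {
    // Copies each regular file directly inside `input` into `output` and returns how many were copied.
    public static int copyFiles(Path input, Path output) throws IOException {
        Files.createDirectories(output);
        int copied = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(input)) {
            for (Path src : stream) {
                if (!Files.isRegularFile(src)) {
                    continue; // skip sub-directories and other non-file entries
                }
                // resolve() adds the separator that a plain string concat would need explicitly
                Files.copy(src, output.resolve(src.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CopyFolder <inputDir> <outputDir>");
            return;
        }
        int n = copyFiles(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " file(s) copied");
    }
}
```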