Tuesday, September 6, 2022

Creating a pdf book from images or pdfs

This is my method of creating a PDF book for uploading to archive.org. First, I scan the book into PDFs, usually about 10 scans (10 or 20 pages, depending on the size of the page). I use an HP-Laserjet-200-colorMFP-m276nw. These scripts were all run on Fedora 35. Next, I split the images out of the PDFs:
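mkdir -p out
for D in t??0; do
    cd "$D" || continue
    # Give the unnumbered scan a number so it is handled like the rest
    if test -f scan.pdf; then mv scan.pdf scan0000.pdf; fi
    for S in scan0*.pdf; do
        # Pull the scan number out of the file name, e.g. scan0003.pdf -> 0003
        I=`echo "$S" | sed 's/scan\([0-9]*\)\.pdf/\1/'`
        echo "$D" "$I"
        # Writes ../out/s<dir>_<number>-NNN.png, one file per image
        pdfimages -png "scan${I}.pdf" "../out/s${D}_${I}"
    done
    cd ..
done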
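For reference, pdfimages names its output by appending a three-digit image number to the root it is given, which is where the st*.png names used in the later loops come from. The sketch below only predicts that name. `expected_image` is a hypothetical helper, and t010 and scan 0003 are made-up example values.

```shell
# Hypothetical helper: predict the PNG name pdfimages writes for one image.
# Arguments: batch directory, scan number as it appears in the file name,
# and the image's position inside that PDF (counting from 0).
expected_image() {
    dir=$1
    scan=$2
    n=$3
    printf 's%s_%s-%03d.png\n' "$dir" "$scan" "$n"
}

expected_image t010 0003 0    # example values, not from a real scan
```

Listing the out directory after the split and comparing a few names against this pattern is a quick way to confirm that every scan was picked up.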
Next, I have three different ways of processing the book, depending on how it was scanned. For each of these, I usually open a page and figure out what numbers to use for the crop statement. The simplest is if I scanned two pages at once and am not splitting them:
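cd out
for I in st*.png
do
    # e.g. st010_0000-000.png -> rt010_0000_000.png (cleaned) and tt010_0000_000.png (small)
    R=`echo $I | sed 's/s/r/' | sed 's/-/_/'`
    T=`echo $I | sed 's/s/t/' | sed 's/-/_/'`
    # Skip images whose cleaned copy is newer than the source
    if test $I -ot $R
    then
        echo $I $R $T already done
    else
        echo $I $R $T
        # Crop to the page area, rotate 90 degrees, and remove speckle
        convert $I -crop 2550x3510+0 -rotate 90 -despeckle $R
        # Small copy: shrink to a third, adjust levels, reduce to 32 levels
        convert $R -resize 33% -level 10%,90%,0.5 -posterize 32 $T
    fi
done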
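The two sed pipelines above can also be written with plain shell parameter expansion, which avoids starting extra processes for every file. This is only a sketch of the same renaming rule (leading s swapped for another letter, first dash turned into an underscore). `derived_name` is a hypothetical helper, not part of the original script, and the file name is an example.

```shell
# Hypothetical helper: derive the cleaned (r) or small (t) file name
# from a source image name such as st010_0000-000.png.
derived_name() {
    prefix=$1
    rest=${2#s}                              # drop the leading "s"
    case $rest in
        *-*) rest=${rest%%-*}_${rest#*-} ;;  # first "-" becomes "_"
    esac
    printf '%s%s\n' "$prefix" "$rest"
}

derived_name r st010_0000-000.png
derived_name t st010_0000-000.png
```

Inside the loop this would be used as R=`derived_name r $I` and T=`derived_name t $I`.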
If I had to scan each page individually, half of them will be flipped, so I need to rotate the odd and even images differently. Note that for this to work, I have to make sure that I always flip the odd ones and the even ones correctly when scanning.
cd out
# Images ending in an even digit: crop and clean, no rotation
for I in st*[02468].png
do
    R=`echo $I | sed 's/s/r/' | sed 's/-/_/'`
    T=`echo $I | sed 's/s/t/' | sed 's/-/_/'`
    if test $I -ot $R
    then
        echo $I $R $T already done
    else
        echo $I $R $T
        convert $I -crop 1800x2700+0 -despeckle $R
        convert $R -resize 33% -level 10%,90%,0.5 -posterize 32 $T
    fi
done

# Images ending in an odd digit: same steps, plus a 180 degree rotation
for I in st*[13579].png
do
    R=`echo $I | sed 's/s/r/' | sed 's/-/_/'`
    T=`echo $I | sed 's/s/t/' | sed 's/-/_/'`
    if test $I -ot $R
    then
        echo $I $R $T already done
    else
        echo $I $R $T
        convert $I -crop 1800x2700+0 -rotate 180 -despeckle $R
        convert $R -resize 33% -level 10%,90%,0.5 -posterize 32 $T
    fi
done
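The two loops differ only in the rotation, so the even/odd decision could be folded into one helper. A minimal sketch, assuming the same rule as above (names ending in an even digit are left alone, names ending in an odd digit are turned 180 degrees). `rotation_for` is a hypothetical name and the file names are examples.

```shell
# Hypothetical helper: print the rotation for an image based on the
# last digit of its name (before the .png extension).
rotation_for() {
    name=${1%.png}
    case $name in
        *[02468]) echo 0 ;;
        *[13579]) echo 180 ;;
        *) return 1 ;;      # name does not end in a digit
    esac
}

rotation_for st010_0000-000.png
rotation_for st010_0000-001.png

# Possible use inside a single loop over st*.png:
#   convert $I -crop 1800x2700+0 -rotate `rotation_for $I` -despeckle $R
```

With this, one loop over st*.png would cover both cases, at the cost of running a no-op rotation on the even images.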
If I scanned two pages at once, and am planning to split them, I have a different script:
cd out
for I in st*.png
do
    R=`echo $I | sed 's/s/r/' | sed 's/-/_/'`
    RA=`echo $R | sed 's/\.png$/a.png/'`
    RB=`echo $R | sed 's/\.png$/b.png/'`
    T=`echo $I | sed 's/s/t/' | sed 's/-/_/'`
    TA=`echo $T | sed 's/\.png$/a.png/'`
    TB=`echo $T | sed 's/\.png$/b.png/'`
    # $R itself is never written here, so check one of the halves instead
    if test $I -ot $RA
    then
        echo $I $R $T already done
    else
        echo $I $R $T $RA $RB $TA $TB
        # Upper half of the scan becomes page b, lower half becomes page a
        convert $I -crop 1790x1350+0 -rotate 90 -despeckle $RB
        convert $I -crop 1790x1350+0+1350 -rotate 90 -despeckle $RA
        convert $RA -resize 33% -level 10%,90%,0.5 -posterize 32 $TA
        convert $RB -resize 33% -level 10%,90%,0.5 -posterize 32 $TB
    fi
done
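The two crop offsets above follow from the page size: each half is 1350 pixels tall, and the second crop starts 1350 pixels down. The sketch below derives both geometries from a width and a full height, so only one pair of numbers has to change per book. `split_geometry` is a hypothetical helper, and it writes both offsets out explicitly (+0+0) rather than the shorter +0 form used above.

```shell
# Hypothetical helper: print the crop geometry for the upper half and
# then the lower half of a scan, given the crop width and full height.
split_geometry() {
    width=$1
    height=$2
    half=$((height / 2))
    printf '%sx%s+0+0\n' "$width" "$half"
    printf '%sx%s+0+%s\n' "$width" "$half" "$half"
}

split_geometry 1790 2700
```

The first line printed would go into the crop for the b image and the second into the crop for the a image.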
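Lastly, I create the PDFs from the processed images: a small one from the reduced tt* images and a full-size one from the cleaned rt* images.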
img2pdf tt*.png --author "Fred Smith"  --title "Smithing" -o ../smithing_1925_small.pdf

img2pdf rt*.png --author "Fred Smith"  --title "Smithing" -o ../smithing_1925.pdf
