|
Post by stamps1962 on Apr 25, 2020 23:47:59 GMT
Is this possible? I have some pages for my Mexico collection with write-ups in Spanish. I can pick my way through them, but I would like to figure out a way to scan the pages and then cut and paste the write-ups into an online Spanish-English translator. I know I won't get it all to come out intelligible; that's OK, I'd probably just keep the results separate from the album to refer to as needed.
If I scan the pages and save to my computer would I get a saved image that I can copy/paste from? Would I need to convert the image into something else? Hope my question isn't too confusing.
|
|
stainlessb
Member
qaStaHvIS yIn 'ej chep
Posts: 4,639
What I collect: currently focused on most of western Europe, much of which is spent on France, Belgium, Germany and Great Britain Queen Victoria
Member is Online
|
Post by stainlessb on Apr 25, 2020 23:52:22 GMT
If you have Acrobat DC, you should be able to scan and save as a PDF, then open it in Acrobat DC and use the edit tools to copy and paste, though you may need to paste into a separate document (Word) and edit out the words not needed (or correct anything the scan or Acrobat might misinterpret).
I have scanned catalogue pages with good success.
The standard Acrobat Reader will not perform this function.
Good luck, let us (me) know if this works.
|
|
khj
Member
Posts: 1,451
Member is Online
|
Post by khj on Apr 25, 2020 23:58:36 GMT
Yes, it can be done. Scan the pages as image files, run the image files through OCR software (since it's Spanish, you may or may not need software that recognizes characters not in standard English, i.e. an OCR language option -- you can probably get away without it), then copy the text and paste it into a language translator.
Some scanners, such as my book scanner, come bundled with this software so that it is seamless to scan and convert to a text/PDF file. I don't remember if my free bundled version has a foreign language option, but I'd guess the upgraded versions would.
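The scan, OCR, translate workflow described above can be sketched in Python. This is only a minimal sketch under assumptions that are not from the thread: it assumes the free Tesseract OCR engine (with its Spanish language data) and the `pytesseract` and `Pillow` packages are installed. The function names and the 4,000-character default are illustrative; the chunking helper is there because online translators commonly limit how much text can be pasted at once.

```python
def ocr_image(path, lang="spa"):
    """Return the text Tesseract recognizes in a scanned image.

    Assumption: Tesseract with the Spanish data ("spa") is installed.
    The imports are local so the helper below works without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), lang=lang)


def chunk_for_translator(text, max_chars=4000):
    """Split text at blank lines into pieces of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather
    than cut mid-sentence.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

With a scan saved as, say, `mexico_page.png` (a hypothetical file name), `chunk_for_translator(ocr_image("mexico_page.png"))` would give a list of text pieces ready to paste into a translator one at a time.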
|
|
|
Post by khj on Apr 26, 2020 0:07:33 GMT
I only have Acrobat Reader DC. I can only select text if the scanner software can produce a searchable PDF (i.e., it has basic built-in OCR). If it can't, I'm stuck with a PDF that contains images and no selectable text unless I pay for upgraded Acrobat. I have one scanner that doesn't have bundled OCR software, and I can't select any text from those scans, but my other scanners can produce searchable PDFs. Let me know if I have missed/misunderstood something, or need a different version of Acrobat (needs to be free; I got sick of paying for Adobe licenses), because I would love to be able to easily extract text from that one scanner that doesn't do OCR automatically.
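One free route for the image-only case described above can be sketched in Python. This is a hedged sketch, not something anyone in the thread reported using: it assumes the `pypdf` package for reading PDFs, and Tesseract with `pytesseract` and `Pillow` for adding a text layer. The function names and the 20-character threshold are made up for illustration.

```python
def extract_page_texts(pdf_path):
    """Return the embedded text of each PDF page, one string per page.

    Assumption: the third-party pypdf package is installed.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based numbers of pages whose text layer is (nearly) empty.

    min_chars is an arbitrary illustrative threshold.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


def make_searchable_pdf(image_path, pdf_path, lang="eng"):
    """OCR a scanned image and save a PDF with a selectable text layer.

    Assumption: Tesseract, pytesseract and Pillow are installed.
    """
    from PIL import Image
    import pytesseract

    pdf_bytes = pytesseract.image_to_pdf_or_hocr(
        Image.open(image_path), extension="pdf", lang=lang
    )
    with open(pdf_path, "wb") as handle:
        handle.write(pdf_bytes)
```

If `pages_needing_ocr(extract_page_texts("scan.pdf"))` lists pages, that PDF is image-only on those pages and text selection in a free reader will not work until the pages are run through OCR.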
|
|
|
Post by stainlessb on Apr 26, 2020 0:17:06 GMT
As a benefit of work I have the full Adobe suite (with constant updates via cloud access), which seems to do a good job of deciding what is text and what is image. It is called simply Acrobat DC, not Reader, which will not convert (I occasionally open the wrong version).
I have been converting scanned pages from catalogues to English; sometimes it gives me some weird formatting and characters.
I have the Epson V600 Photo scanner and it may well have the OCR built in
|
|
|
Post by khj on Apr 26, 2020 0:27:17 GMT
Thanks, Stan!
Yeah, I am stuck with using Reader because I'm a cheapskate.
|
|
ajkitt
Member
Posts: 175
What I collect: Classics, Central Europe, World
|
Post by ajkitt on Apr 26, 2020 2:04:06 GMT
Just a heads up, there are at least a few websites out there now that will OCR and translate just by drag-n-dropping an image onto the webpage. Free. Tech has definitely come a very long way in the last few years, eh?
|
|
Jerry B
Departed
Rest in Peace
Marietta, Georgia USA
Posts: 1,485
|
Post by Jerry B on Apr 26, 2020 9:16:43 GMT
Hi stamps1962
If I understand correctly, you want to create a document that is editable. One way to do it is to first download PDFCreator ( www.pdfforge.org/pdfcreator ) and install it. It installs as a printer. Once it is installed, "print" your pages using PDFCreator as the printer. You now have PDF pages that you can edit. I have used PDFCreator for years, and this method works out pretty well.
Jerry B
|
|
angore
Member
Posts: 5,335
What I collect: WW, focus on British Empire
|
Post by angore on Apr 26, 2020 10:24:07 GMT
My Epson scanner came with a free version of Abbyy FineReader that can do it. The Epson software is tied to the Abbyy software, so if I scan to a searchable PDF it launches Abbyy to do the OCR. So what scanner do you have? Was there any bundled software that you have but did not install?
If you google "image to text" you will see some free software, but I did not check any of it.
|
|
|
Post by Jerry B on Apr 26, 2020 15:04:31 GMT
Hi
I have found that PDFCreator software was a little better than OCR software. It actually does a fantastic job of "printing" everything: images, some diagrams, etc. Things that OCR doesn't render nicely come out clean with PDFCreator. With OCR one may not be able to take a section of a document.
Jerry B
|
|
|
Post by khj on Apr 26, 2020 18:04:50 GMT
My book scanner did come bundled with Abbyy FineReader, and that's exactly how it functions -- a 2-step, 2-software process. I find it a little slow, but it does the job and can work independently of the scanner software. Ironically, the one scanner I have that didn't come bundled with OCR software is my Epson WF! Not sure why. My Canon imageCLASS scanning software has built-in OCR. It produces searchable PDFs without the user ever seeing a "jump" to OCR software. While the scan quality is not good in terms of color and contrast/brightness (cannot change settings), the OCR is remarkably accurate and fast.
|
|
Ryan
Member
Calgary, Alberta, Canada
Posts: 2,720
What I collect: If I have a catalogue for it, I collect it. And I have many catalogues ....
Member is Online
|
Post by Ryan on Apr 26, 2020 20:00:16 GMT
Kim, what kind of book scanner do you have? I've wanted a book-edge scanner for years, but any time I look online I get scared by the high prices! Recently I've seen a crowdfunded campaign for inexpensive "scanners" (more like batch photo editors) which might be interesting; they're certainly inexpensive enough to make it worth a stab (sort of the way I've bought digital microscopes: they're cheap enough that you don't get too upset if it turns out they aren't really of any use).
The book-edge scanners look like this - this particular one is about US $750. No need to break book spines; the scanner goes right to the edge of the platen.
And the latest version of the CZUR scanner looks like this - around US $150, apparently.
Ryan
|
|
|
Post by khj on Apr 26, 2020 20:48:37 GMT
Ryan, I decided against the CZUR scanner because I noticed, when I was taking simple camera pics, that the room lighting/glare made a huge difference. I pretty much had to go next to a large un-paned window and rely on natural sunlight to get pictures without glare. Also, with many books that have tight binding, or that are very thick with small binding margins (like Scott catalogs), you will get blurred areas which will not OCR properly. Quite frankly, if I have to get something like that, I will simply keep using my iphone camera -- it produces "decent" results, and I simply use the basic Windows Photo Viewer to flip through the jpgs, and the scroll on my mouse to quickly magnify whatever area my cursor is on. While not "searchable", it's fast and convenient for looking up a stamp price. PDFs are searchable, but really large files can be slow to scroll through.
I use my Epson WF for large-sized documents. Not sure why this model didn't come bundled with free OCR software.
The book scanner I ended up purchasing was the Plustek OpticBook scanner in your first picture. However, I purchased the cheaper 3900 model (<$300 including shipping). It is essentially the same as the 4800(?), except it only scans to within 6mm of the scanner edge instead of 2mm(?). A friend of mine had the 3900 and told me he hadn't encountered a situation where he needed that 4mm, so I elected not to spend the extra $400+ (that comes to $100/mm!!!).
The 3900 does its job. Slow like normal scanners, not like the CZUR or camera/iphone pics. But it is faster than my Canon imageCLASS, and the bundled software is also designed for book scanning (e.g., auto page-flip, immediate basic user image processing...), although the OCR is a pass-thru to Abbyy FineReader (cannot do it until you finish the scanning). It does come with a pdf "combiner" for those of us who don't have Adobe subscriptions. But know the limitations of the Plustek.
My experience:
-- scanning color is significantly off; I haven't calibrated my scanner (I never do), and I'm not sure if I can do that with the Plustek; it was designed to be a book scanner, not a picture/stamp scanner
-- it takes a long time to warm up (minutes), which to me doesn't bode well in terms of lamp life
-- in reality, it is 6mm + 2mm+, because on most books you can't really get the binding flush against the edge
-- unless it is a very thin pamphlet, you still have to press down/against the binding so the page will be reasonably flat near the binding edge; otherwise, you will get some shadowing near the binding
I discovered the hard way, when scanning prestige booklets, that if I press too hard, I will end up popping the binding or putting a noticeable "bend" in the page/binding.
In hindsight, given that I spent nearly $300 for it, I should have gone ahead and forked over another 2x+ to get the additional 4mm. But, too late. So far, the only unexpected need for that additional 4mm has been: prestige booklets, books with 2-page pictures running across the binding, and books with tight margins (if the book is a thick hardback, and you can't physically measure a 1cm margin on the binding edge, you are likely to run into some issues). So for those, I've gone back to using iphone pics or just letting the scan get chopped/blurred out.
For my thick catalogs, such as Scott, I found it is too cumbersome having to lift the thick catalog and flip a page when using the book scanner. For a dozen pages, fine, but for a large section? Ugh. I've gone back to using iphone pics, and flipping through it like the CZUR unit. I have to hold the camera in my hand, can't really be wearing a headset as I need to "hear" the iphone "click", and I look nothing like that model, but I'm just not ready to spend $170 on yet another digitizing desk unit.
Just my opinions. If you have any specific questions about the Plustek, feel free to ask.
|
|
|
Post by Ryan on Apr 26, 2020 21:23:30 GMT
An excellent review - thanks very much for taking the time!
Ryan
|
|
|
Post by khj on Apr 26, 2020 22:52:56 GMT
You're very welcome, Ryan.
I went to the Plustek website to confirm some of my numbers (the ones with all the ? next to them), because I was going by my unreliable memory. I don't remember the difference between the 3800L and 3900, but I got the 3900.
It comes bundled with 2 pieces of scanning software. The one I use will save as jpg, pdf, tiff, but not bmp. But, you can save as no compression jpg if you don't have tiff software.
There are certain colors that my particular scanner doesn't handle well, especially light pastels. For example, light beige or peach doesn't scan well at all. I had done a comparison between my OpticBook and imageCLASS that showed the results for that color. It was going to be for a review thread, but I never got around to finishing the comparison scans. The light beige/peach either doesn't show up or barely shows up at all for the OpticBook. Let me know if you want to see that comparison, and I can hunt down and post those pics.
One thing I like about the 3900 compared to the 4800: the 3900 has large color-coded buttons. While you can control it from your computer, sometimes if you are pressing against the binding, your finger can only reach "so far". Large buttons are helpful. Also, the color coding makes it easier to remember which buttons to press. It's easier for me to remember button color than button location. JMO
|
|
Philatarium
Member
Los Angeles, CA
Posts: 1,032
What I collect: Primarily focused on Japan, but lots of other material catches my eye as well ...
|
Post by Philatarium on Apr 26, 2020 23:30:34 GMT
I'm with Ryan, khj -- thanks very much for posting your experiences. It's very helpful. I've been just mentally playing with the idea of either a (not-so-expensive) book scanner or else the CZUR, which shows up in my Facebook feed with an ad several times a day. My decision strategy thus far is to wait as long as possible (it's already been years; why not a few more? lol) and then see what the available options are. Hopefully they'd be more capable and less pricey. My Samsung Android phone (a fairly recent one) has much better photo capability than my older one, and I can usually make it work for something that only needs to be functional but doesn't need to be pretty. Stealing an idea from those websites where people build their own book scanners with 2 cameras on overhead stands, I've toyed with the idea of doing that with 2 of those phones. (I can borrow the second one from a friend.) Anyway, I think I'll just continue to think about it, and revisit the idea after a couple more generations of technology are developed!
|
|
|
Post by Ryan on Apr 26, 2020 23:32:17 GMT
My own scanner, an HP, is also poor with some colours. Yellow is especially bad; I don't know how it can get that so far off. And somehow, red scans nicely, blue scans nicely, and purple comes out wrong. I can't figure that one out either. I run a batch colour level on a finished scan to make it come out more like what I see, and the scans look OK once I've done that, but some of the colour tones are wrong. I tried to fix it once by scanning a colour guide and then manipulating individual colour components in an attempt to match each colour chip and wow, what a mess I had after I was done! Those levels got thrown away and I went back to using my other method.
Ryan
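One simple form of the colour-guide idea above is a single per-channel gain taken from one neutral grey chip, rather than matching every chip separately. The sketch below is illustrative only: the function names and sample values are invented, and a real scanner may need more than a linear gain per channel. That would be consistent with some tones still coming out wrong after a global level adjustment.

```python
def channel_gains(scanned_grey, true_grey):
    """Per-channel multipliers mapping a scanned grey chip to its known value.

    Both arguments are (R, G, B) tuples in the 0-255 range; the scanned
    values must be non-zero. Names and values here are illustrative.
    """
    return tuple(true / scanned for scanned, true in zip(scanned_grey, true_grey))


def apply_gains(pixel, gains):
    """Scale one (R, G, B) pixel by the gains, clamped to 0-255."""
    return tuple(
        min(255, max(0, round(value * gain)))
        for value, gain in zip(pixel, gains)
    )
```

For example, if a chip known to be (180, 180, 180) scans as (200, 180, 160), the gains reduce red and boost blue, and applying them to the scanned chip brings it back to neutral grey.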
|
|
|
Post by PostmasterGS on Apr 27, 2020 0:45:37 GMT
I have a good bit of experience in this area, so FWIW, here's my $.02. I started with a DIY setup using a digital camera. It worked OK. Very labor intensive, especially on the back end to clean up the end product. I then tried the CZUR. In my top 3 of worst investments I've ever made in a piece of tech. I never got a single decent result out of it. If you're scanning magazines, it might be OK, but if there's any stiffness to the spine at all, you won't get a good product. The demos of their "page flattening" tech must have been under über-ideal conditions, because I could never get it to work worth a flip. I then decided to get one of the Plustek OpticBooks. I wanted the higher-end model that scanned within 2mm of the edge, but at the time, it didn't have Mac support. So I went with the Plustek OpticBook 3900 that khj mentioned above. I scanned thousands of pages with it. I can't speak to the color accuracy, as I don't use it for scanning my stamps, and I honestly never paid that much attention to the color accuracy when scanning book pages. Then, Plustek came out with a new model that has Mac support and the 2mm edge, the OpticBook 4800. I recently upgraded to it, and have had good luck with it so far, though I haven't scanned more than a few dozen pages with it.
|
|
|
Post by khj on Apr 27, 2020 1:15:51 GMT
Actually, it was after consulting with PostmasterGS that I decided on the 3900, and have been pretty happy about the unit. So thanks! Like him, I use it for scanning books, so color isn't really that important. And the files are for my personal use, so they don't have to be perfect. I use a different scanner for stamps. I only mentioned the color issue, in case someone was thinking about using it for scanning stamps as well, and the edge details in case you were preparing something for distribution.
|
|
|
Post by angore on Apr 27, 2020 11:49:24 GMT
Thanks for the review.
I was looking at a book scanner too last year to scan Scott album pages, since they are oversized. Reviews of the camera-based book scanners were generally poor, despite some dubiously glowing ones, and this thread confirms that.
I did buy the paid version of Abbyy, but it was not significantly better than the free option in terms of OCR accuracy. The usual challenges are small printing, light printing, hyphenation, multi-column text, text wrapped around images, etc. You can scan to a PDF and then export to Word or Excel, but you always have to spend time fixing errors if accuracy is important. I also have the full version of Acrobat, and sometimes you have to try several software tools to see which yields the best results.
But if you are just trying to extract text from something it works ok.
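Some of the cleanup mentioned above, hyphenated line endings in particular, can be partly automated once the text has been extracted. The following is a rough sketch using only the Python standard library; the function names are illustrative, and the hyphen rule will also strip the hyphen from a genuine compound word that happens to break at its hyphen, so the result still needs a read-through.

```python
import re


def join_hyphenated_breaks(text):
    """Rejoin words split by a hyphen at a line end ("cata-" + "logue").

    Caveat: a real compound broken at its hyphen loses the hyphen too.
    """
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)


def unwrap_lines(text):
    """Turn single line breaks into spaces; keep blank lines as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

Running `unwrap_lines(join_hyphenated_breaks(ocr_text))` on extracted text gives one line per paragraph, which tends to paste more cleanly into Word or a translator.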
|
|
brightonpete
Departed
Rest in Peace
On a hike at Goodrich-Loomis
Posts: 5,110
|
Post by brightonpete on Apr 27, 2020 13:47:24 GMT
My scanner did OK with Scott catalogues. I purchased the pages from a few countries, tore out the pages & scanned them individually, then assembled them into a pdf for each. Of course if you own the full catalogue, ripping it up isn't a very good option! But separate countries worked perfectly for me. I usually discard the loose pages when finished. Here is a jpg of page 1 of the Scott catalogue. The pdf zooms in beautifully, unlike this jpg! The text is selectable, so one can copy from it and search for words.
|
|
|
Post by khj on Apr 27, 2020 18:05:20 GMT
Yes, the reason I got the book scanner is because I was ripping up my books/catalogs to scan them page by page. Then you get nice scans. My eyesight has degraded to a point where I can't really read books/catalogs and sort stamps at the same time (I can't wear bifocals because it will trigger my migraine problems). I figured I could rip up the books/catalogs since I couldn't read them directly anyway -- but such a waste. Also, they take up more space than when still bound (kind of like all the extra soil you have left over when you re-fill a hole). That's why I ended up getting the book scanner.
|
|
|
Post by khj on Apr 27, 2020 18:09:47 GMT
I actually prefer the Windows Photo Viewer to magnify jpg images, because I can enlarge "on the fly" using the scroll on my mouse. I find the Acrobat Reader magnify function cumbersome. Of course, the jpgs are not searchable like the pdfs. Can't have everything (although we try to do that with stamps!).
|
|
Will
Member
Inactive
Posts: 84
What I collect: Venezuela: ESCUELAS 1871-1880, Locals up to 1903. Cinderellas and BOB | Colombia: Up to 1940. States!
|
Post by Will on May 14, 2020 14:34:00 GMT
This will probably be a bit off the OP's intention, but I'll throw it in here anyway.
If you only need to translate small phrases here and there, and then make your own notes, you can use your cellphone and Google Translate.
In the mobile Google Translate app there's a camera icon... if you tap it and point the camera at some text, it'll machine translate it for you.
Great for your next trip to Thailand or Greece too!
|
|
|
Post by marking on May 14, 2020 21:24:21 GMT
This will solve your book scanning problems LINK
|
|
|
Post by khj on May 15, 2020 4:39:47 GMT
But it won't solve my prison time problems from robbing the bank to pay for it.
|
|
mikeclevenger
Member
Posts: 887
What I collect: Ohio Tax Stamps, Ohio & Georgia Revenues, US Revenues, US FDC's, & Germany Classics
|
Post by mikeclevenger on May 15, 2020 12:14:16 GMT
If you have an image with words on it, go to translate.yandex.com. Click on the "Image" icon at the top of the page, then change the language to the one you want to translate to. Select the file from your computer, then wait until the picture comes up on the screen. To the right, you will see "Open In Yandex Translate"; click on it. And there is your translation. Just be warned that if there were several different columns on the page, it may spread them out and you will have to put them back in order. LOL. Works great for me every time, and it is free.
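The column mix-up warned about above happens when a tool reads straight across the page, row by row. When doing the OCR yourself with a tool that reports where each line of text sits on the page, the lines can be regrouped by column before translating. This is a minimal sketch with made-up names: it assumes each line comes as a `(left, top, text)` tuple in pixels and that columns start at least `column_gap` pixels apart, which will not hold for every page layout.

```python
def reorder_columns(lines, column_gap=200):
    """Return line texts column by column, each column read top to bottom.

    lines: iterable of (left, top, text) tuples in pixels.
    column_gap: assumed minimum horizontal distance between column starts.
    """
    columns = []  # each entry: {"left": x of first line, "items": [(top, text)]}
    for left, top, text in sorted(lines):
        if columns and left - columns[-1]["left"] < column_gap:
            columns[-1]["items"].append((top, text))
        else:
            columns.append({"left": left, "items": [(top, text)]})

    ordered = []
    for column in columns:
        ordered.extend(text for _, text in sorted(column["items"]))
    return ordered
```

For a two-column page read row by row as left 1, right 1, left 2, right 2, this returns left 1, left 2, right 1, right 2, which is the order the write-up was meant to be read in.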
|
|