How to save hi-res images from museum websites

Written by jon | Published 2017/11/09
Tech Story Tags: photography | museums | hacking | internet | museum-websites


When I worked in publishing, I used to do a lot of picture research. I loved going deep into a topic and uncovering amazing, little-known pictures that captured a special time and place.

On these trawls through the internet, however, I would frequently come across websites that did everything in their power to stop you downloading images.

Museum websites are particularly annoying about this as the images are usually public domain anyway.

Half the time, the same institution is running some sort of open access program, but hasn’t gotten round to making everything available yet.

When I came across situations where I knew the hi-res files existed, I was pretty determined to get them. I enjoy taking things apart and tinkering with them.

Here’s how you do it.

Disclaimer

Respect copyright. Downloading public domain images for inspiration/personal use is one thing. Ripping off content creators is a different thing.

These ‘hacks’ might not work forever. Sysadmins do eventually fix things. Maybe they’ll read this article. If these methods stop working, you may or may not be able to figure out a workaround.

I’m giving you a net, not a fish. The techniques are not exhaustive, but if you play around with them, you should have the tools to experiment on different sites.

So, for educational purposes only, here is how you hack into…

The Library of Congress

Probably the biggest and best collection of historic images on the internet. An incredible collection of tremendous importance.

A large number of their scans are downloadable in the form of gorgeous, enormous tifs. Thank you LOC, this is how you do it! You put other institutions to shame! Some scans are not downloadable, but they’re often still out there, hidden on the server.

Let’s look at an example.

Unlike many of their records, this example has no download options. Open the thumbnail in a new tab and you get disappointment. They claim “Full online access to this resource is only available at the Library of Congress”. Hmmm, let’s just see about that shall we?

The LOC file system is pretty easy to crack. There appear to be 3 main sizes of jpeg and 2 main sizes of tif.

The jpeg filenames end with either _150px, r or v. An example: filenamer.jpg.

The tifs end with u or a.

In the example above, let’s see what happens when we replace the thumbnail _150px with an r.

Bingo

Larger image. Just like that. Try it again with a v and you’ll get an even better image. Try it with u.tif or a.tif, and you’ll get…

Failure

The fix is easy. If you find an image that you can download as a tif, you’ll see from the download address that the tifs are located in a ‘master’ folder instead of the ‘service’ folder. Change ‘service’ to ‘master’ in that section of the url and you’re good to go.

If you’re in luck, a hi-res tif will start downloading. Some of the images only seem to be digitised up to the u level, but they should still be big enough for most purposes.
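The renaming rules above can be written down as a small helper. This is only a minimal sketch: it assumes the thumbnail address ends in _150px.jpg and contains a /service/ folder, as described, and the example path is made up for illustration rather than taken from a real record. Not every candidate will exist on the server.

```python
def loc_candidates(thumb_url: str) -> dict[str, str]:
    """Derive candidate larger-image URLs from a thumbnail URL.

    Assumes the naming pattern described above: jpegs end in _150px, r or v
    and live under /service/; tifs end in u or a and live under /master/.
    """
    suffix = "_150px.jpg"
    if not thumb_url.endswith(suffix):
        raise ValueError("expected a thumbnail URL ending in _150px.jpg")
    stem = thumb_url[: -len(suffix)]
    # The tifs sit in the 'master' folder rather than 'service'.
    master_stem = stem.replace("/service/", "/master/", 1)
    return {
        "jpeg_r": stem + "r.jpg",
        "jpeg_v": stem + "v.jpg",
        "tif_u": master_stem + "u.tif",
        "tif_a": master_stem + "a.tif",
    }


# Hypothetical path, for illustration only.
example = "https://example.org/service/pnp/abc/00001/00001_150px.jpg"
for label, url in loc_candidates(example).items():
    print(label, url)
```

Paste each candidate into the browser in turn and see which ones the server actually returns.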

Happy downloading.

University of Nevada, Las Vegas

This one’s a bit trickier, but still pretty doable.

There are actually a few ways of getting in but I’ll show you the easiest.

Here’s a photo of Howard Hughes on parade.

It can be downloaded via the download button, but we’re going to ignore it for now and download it the hacker way. That way, you can use the technique with files that don’t have a download button.

Open the image in a new tab and you’ll see the website spits out a small section of the image, which is pretty useless.

Look in the url though and there’s some useful info:

http://d.library.unlv.edu/utils/ajaxhelper/?CISOROOT=hughes&CISOPTR=1713&action=2&DMSCALE=15&DMWIDTH=512&DMHEIGHT=512&DMX=0&DMY=0&DMTEXT=&DMROTATE=0

The important bits here are DMSCALE, DMWIDTH and DMHEIGHT. Change the scale to 100 and change the width and height to the values listed on the record page (in this case 6016 x 4948), hit return and you’ll get a lovely big jpeg to download.

If you can’t find the dimensions, change scale to 100, put the dimensions to something big (5000+) and see if the image is cropped. If it is, increase the dimensions by an appropriate amount until it contains the whole image.
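If you would rather not edit the address by hand, the same change can be scripted. Here is a minimal sketch using only Python's standard library; the function name full_size_url is my own, and the URL is the one shown above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def full_size_url(tile_url: str, width: int, height: int, scale: int = 100) -> str:
    """Return the same URL with DMSCALE, DMWIDTH and DMHEIGHT replaced."""
    parts = urlsplit(tile_url)
    # keep_blank_values preserves empty parameters such as DMTEXT=
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["DMSCALE"] = str(scale)
    params["DMWIDTH"] = str(width)
    params["DMHEIGHT"] = str(height)
    return urlunsplit(parts._replace(query=urlencode(params)))


url = (
    "http://d.library.unlv.edu/utils/ajaxhelper/?CISOROOT=hughes&CISOPTR=1713"
    "&action=2&DMSCALE=15&DMWIDTH=512&DMHEIGHT=512&DMX=0&DMY=0&DMTEXT=&DMROTATE=0"
)
print(full_size_url(url, 6016, 4948))
```

If the record page doesn't list the dimensions, call it with a large guess such as 5000 for both values and increase them until the image is no longer cropped.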

Many archives use a system similar to this. Once you know how, it’s amazingly easy to get past it.

BNF

France’s national library is another treasure-trove of images. They have digitised some beautiful volumes, but they make it fairly hard to download the hi-res. Fortunately, we can use Chrome’s developer tools to peek under the hood and then use the same principles as above to get full-size jpegs.

Find an image and open the console in Chrome.

Find an item and flick through the pages. Once you find an image you love, right click and Inspect. Click Sources at the top of the Inspector and you’ll see a folder that reads something like:

http://gallica.bnf.fr/iiif/ark:/12148/btv1b8600236v/f24/0,0,2770,4093/174,/0

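Reading left to right, the parts are the volume (btv1b8600236v), the folio (f24), the section coordinates (0,0), the section width and height (2770,4093), the output size (174, the width the image is scaled to) and the rotation (0).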
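To make the breakdown concrete, here is a rough sketch that splits the address above into its parts. The function and field names are mine, and it assumes the address always has the shape shown in the example.

```python
def describe_tile_url(url: str) -> dict:
    """Split a tile address into the parts described above."""
    # The last three path segments are section, size and rotation.
    base, region, size, rotation = url.rsplit("/", 3)
    # The two segments before that are the volume and the folio.
    _prefix, volume, folio = base.rsplit("/", 2)
    x, y, width, height = (int(n) for n in region.split(","))
    return {
        "volume": volume,
        "folio": folio,
        "section_origin": (x, y),
        "section_size": (width, height),
        "output_width": int(size.rstrip(",")),
        "rotation": int(rotation),
    }


tile = "http://gallica.bnf.fr/iiif/ark:/12148/btv1b8600236v/f24/0,0,2770,4093/174,/0"
for name, value in describe_tile_url(tile).items():
    print(name, value)
```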

Open the top folder, which should contain a lo-res of the full image.

The first two numbers after the folio number will be 0, and the second two numbers will give you the true dimensions.

Right click the preview image in inspector and open in a new tab.

To generate a full image, click in the url and change everything after the folio number (f24/ in the example above) to full/full/0/native.jpg.

You could also set or keep the first two values at 0, change the second two to the full dimensions (e.g. 2770, 4093) and change the number after the slash to the full width (in this case 2770).

Boom. Massive image.
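Both variants can be built from the lo-res address with a couple of string operations. This is a sketch under the assumption that the address ends in section/size/rotation as in the example above; the helper names are hypothetical.

```python
def folio_base(tile_url: str) -> str:
    """Drop the trailing section/size/rotation parts of the address."""
    return tile_url.rsplit("/", 3)[0]


def full_url(tile_url: str) -> str:
    """First option: ask for the whole image at full size."""
    return folio_base(tile_url) + "/full/full/0/native.jpg"


def explicit_url(tile_url: str, width: int, height: int) -> str:
    """Second option: spell out the full section and the full width."""
    return f"{folio_base(tile_url)}/0,0,{width},{height}/{width},/0"


tile = "http://gallica.bnf.fr/iiif/ark:/12148/btv1b8600236v/f24/0,0,2770,4093/174,/0"
print(full_url(tile))
print(explicit_url(tile, 2770, 4093))
```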

University of Chicago

The protocol is similar to the above.

Find a zoomable image.

Inspect the image and open one of the tiles in a new tab.

Replace the last command, &jtl=x,x, with &cvt=jpeg

This should give you a fairly large version of the whole image. You can also set the width of the full image by adding the command &wid=x.

It should be possible to define wid=full but, annoyingly, the server appears to have a max limit, and this doesn’t produce a bigger file.

By looking at the source code more closely, we can find out the exact size of the source file. This is a bit more technical, but just take my word for it and look at the screengrab below:

19862! That’s enormous! I tried setting the width as that and while I didn’t get that size, the server did return a file twice the size of the “full” width image. Weird. If you want to do this yourself, drop in 5000 and see what happens.

The best option for now appears to be:

  1. Inspect the image and find a tile
  2. Open the tile in a new tab
  3. Replace the jtl bit of the url with the commands &wid=5000&cvt=jpeg

This will produce a pretty big jpeg, which should be good enough for most purposes. You could probably print it in a book for instance…But it’s not a super hi-res poster-size image.
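The three steps boil down to one substitution in the tile address. A minimal sketch follows; the tile address here is invented for illustration, since the real one depends on the image you inspect, and only the jtl, wid and cvt commands come from the steps above.

```python
import re


def whole_image_url(tile_url: str, width: int = 5000) -> str:
    """Swap the jtl tile command for wid and cvt=jpeg."""
    stripped, count = re.subn(r"&jtl=[^&]*", "", tile_url, flags=re.IGNORECASE)
    if count == 0:
        raise ValueError("no jtl command found in the URL")
    return f"{stripped}&wid={width}&cvt=jpeg"


# Hypothetical tile address, for illustration only.
tile = "https://example.org/viewer?fif=maps/sheet1.tif&jtl=3,12"
print(whole_image_url(tile))
```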

If anybody knows how to get the original tif, please let me know!

Stanford Libraries

Like the BNF, this is built on the IIIF protocol. Annoyingly, it’s much trickier to download as there are several roadblocks to get around. For a start, they’ve just completely blocked the ability to create a large image. The function just doesn’t work, so we need to use a handy tool to stitch together all the tiles instead. Nothing is that difficult, it just takes a bit more time. I’ll try to cover it as clearly as possible:

Part One

  1. Open up an image page.
  2. Open developer tools.
  3. Click the NETWORK tab. Then click XHR.
  4. Refresh the page — you should see some files load in the left-hand panel.
  5. Select info.json and then right-click and “copy link address”.

Here are some images, using Wayne Gretzky as an example:

Open Developer Tools, select Network, then click XHR. I’ve circled them in red. Then refresh the page.

Select info.json. Right-click and copy the link url.

This contains all the relevant information about the image.

Part Two

  1. Load Firefox. This will most likely only work in Firefox, not Chrome.
  2. Go to Dezoomify.
  3. Paste in the json link.
  4. Wait for the image to load.
  5. Right-click and download the image.
  6. Give it a few seconds to process. The browser won’t like it (Chrome actually prevents it), but you’ll probably be fine.

A Note on Dezoomify

Dezoomify is a great little tool, but I’ve found it often takes a bit of fiddling to make it work. It never seems to detect automatically for me. It’s a good solution for blocked images, and will probably fetch any tiled, zoomed, image you want, but it’s useful to understand the server structure first and know how these archives work.
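To give a feel for what a stitching tool has to work out, here is a rough sketch that lists the full-resolution tile addresses for an image described by an info.json. It assumes the file follows the IIIF Image API 2.x layout (an "@id", a "width", a "height" and a "tiles" list); I haven't confirmed that every archive serves exactly this shape, and the sample values below are invented. It doesn't download or stitch anything.

```python
def tile_urls(info: dict) -> list[str]:
    """List full-resolution tile addresses, row by row."""
    base = info["@id"]
    width, height = info["width"], info["height"]
    tile = info["tiles"][0]["width"]  # assumes square tiles
    urls = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # Tiles on the right and bottom edges are smaller.
            w = min(tile, width - x)
            h = min(tile, height - y)
            urls.append(f"{base}/{x},{y},{w},{h}/{w},/0/default.jpg")
    return urls


# Invented sample, standing in for a downloaded info.json.
sample_info = {
    "@id": "https://example.org/iiif/some-image",
    "width": 2500,
    "height": 1200,
    "tiles": [{"width": 1024, "scaleFactors": [1, 2, 4]}],
}
for url in tile_urls(sample_info):
    print(url)
```

A 2500 x 1200 image with 1024-pixel tiles comes out as a 3 x 2 grid, so six addresses in total.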

And that’s how you hack into museum websites and download hi-res images!

I hope you found this guide useful and entertaining.

I actually get more email about this article than almost anything else I’ve written, even though the Medium stats are quite low. I’m guessing it’s a very specific audience who read it. I used to try and answer every email and could usually help out. Unfortunately, I have far less free time these days, so apologies if you emailed and I never got back to you. I admit a few websites have me stumped. It appears as though Dezoomify isn’t as reliable as it used to be. It is encouraging that so many people are actually using these amazing archives. I hope that more and more institutions will digitize their archives and make the images freely available.

############################################

A little about me

I’m a bibliophile and writer who worked at various museums and publishers, then decided the future was digital. I learned a lot about people, design, and writing, and now use that knowledge to create great user experience.


Written by jon |
Published by HackerNoon on 2017/11/09