
initial commit of uploader script to upload pdf files and convert wor…#1

Closed
bobweston wants to merge 1 commit into MuckRock:master from bobweston:uploader



@bobweston


This script uses the python-documentcloud library to upload a directory of PDF files to DocumentCloud beta. It also converts Word documents to PDF on the fly so that they can be uploaded as well.
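For readers who want a feel for the flow, here is a rough sketch of it. It is not the script in this pull request. The function names (`plan_uploads`, `convert_to_pdf`, `upload_all`), the `.doc`/`.docx` suffix list, and the use of LibreOffice's `soffice` command for conversion are all assumptions made for illustration. The upload step is passed in as a callable so the sketch does not depend on any particular client.

```python
import subprocess
from pathlib import Path

# Assumption: these are the Word formats worth converting.
WORD_SUFFIXES = {".doc", ".docx"}


def plan_uploads(root):
    """Split the files under root into PDFs and Word files that need converting."""
    pdfs, to_convert = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".pdf":
            pdfs.append(path)
        elif suffix in WORD_SUFFIXES:
            to_convert.append(path)
    return pdfs, to_convert


def convert_to_pdf(path, out_dir):
    """Convert one Word file to PDF.

    Assumption: LibreOffice is installed and available as `soffice`.
    """
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf",
         "--outdir", str(out_dir), str(path)],
        check=True,
    )
    return Path(out_dir) / (path.stem + ".pdf")


def upload_all(root, out_dir, upload):
    """Convert Word files, then call upload(path_string) once per PDF."""
    pdfs, to_convert = plan_uploads(root)
    for path in to_convert:
        pdfs.append(convert_to_pdf(path, out_dir))
    for pdf in pdfs:
        upload(str(pdf))
    return len(pdfs)
```

With python-documentcloud, the `upload` argument would presumably be the client's document upload method; check the library's documentation for the exact call and its authentication options.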

We used this script to upload a large document dump (over 100 GB of data and more than 270,000 documents), and it worked well. For a future version, we've discussed splitting large documents into smaller ones before they're uploaded. We thought you might be interested in reviewing it and making it available to others.

If you have no interest in this, that's fine. We just thought we would pay it forward.

I've only ever used Ruby up until this point, so any complaints about code quality in this script are solely my responsibility. :)

-Bob

duckduckgrayduck added a commit that referenced this pull request Dec 2, 2023

2 participants