Package: daiR 1.2.1

daiR: Interface with Google Cloud Document AI API

R interface for the Google Cloud Services 'Document AI API' <https://cloud.google.com/document-ai> with additional tools for output file parsing and text reconstruction. 'Document AI' is a powerful server-based OCR service that extracts text and tables from images and PDF files with high accuracy. 'daiR' gives R users programmatic access to this service and additional tools to handle and visualize the output. See the package website <https://dair.info/> for more information and examples.

Authors: Thomas Hegghammer [aut, cre]

daiR_1.2.1.tar.gz
daiR_1.2.1.zip (r-4.7) | daiR_1.2.1.zip (r-4.6) | daiR_1.2.1.zip (r-4.5)
daiR_1.2.1.tgz (r-4.6-any) | daiR_1.2.1.tgz (r-4.5-any)
daiR_1.2.1.tar.gz (r-4.7-any) | daiR_1.2.1.tar.gz (r-4.6-any)
daiR_1.2.1.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
daiR/json (API)

# Install 'daiR' in R:
install.packages('daiR', repos = c('https://hegghammer.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/hegghammer/dair/issues

Pkgdown/docs site: https://dair.info

google-cloud | ocr

7.38 score | 44 stars | 52 scripts | 368 downloads | 47 exports | 62 dependencies

Last updated from: 5ba121bbc0. Checks: 7 ERROR, 2 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | ERROR | 194
source / vignettes | OK | 262
linux-release-x86_64 | ERROR | 184
macos-release-arm64 | ERROR | 174
macos-oldrel-arm64 | ERROR | 137
windows-devel | ERROR | 130
windows-release | ERROR | 142
windows-oldrel | ERROR | 157
wasm-release | OK | 154

Exports: build_block_df, build_token_df, create_processor, dai_async, dai_async_tab, dai_auth, dai_notify, dai_status, dai_sync, dai_sync_tab, dai_token, dai_user, delete_processor, disable_processor, draw_blocks, draw_entities, draw_lines, draw_paragraphs, draw_tokens, enable_processor, from_labelme, get_entities, get_ids_by_type, get_processor_info, get_processor_versions, get_processors, get_project_id, get_tables, get_text, get_versions_by_type, image_to_pdf, img_to_binbase, is_colour, is_json, is_pdf, list_processor_types, make_hocr, merge_shards, pdf_to_binbase, reassign_tokens, reassign_tokens2, redraw_blocks, split_block, tables_from_dai_file, tables_from_dai_response, text_from_dai_file, text_from_dai_response

Dependencies: antiword, askpass, assertthat, audio, base64enc, beepr, bitops, cachem, cellranger, cli, cpp11, crayon, curl, data.table, digest, fastmap, fs, gargle, glue, googleAuthR, googleCloudStorageR, hms, httr, jsonlite, lifecycle, magick, magrittr, memoise, mime, minty, ndjson, openssl, pdftools, pillar, pkgconfig, prettyunits, progress, purrr, qpdf, R6, rappdirs, Rcpp, RCurl, readODS, readtext, readxl, rematch, rjson, rlang, streamR, stringi, stringr, striprtf, sys, tibble, tzdb, utf8, vctrs, withr, xml2, yaml, zip

Basic usage
Synchronous processing | Asynchronous processing | Large batches | Merging shards

Last update: 2025-11-17
Started: 2024-02-11

Extracting tables
Activating a form parser processor | Synchronous processing with form parsers | Asynchronous processing with form parsers | How good is it?

Last update: 2024-11-13
Started: 2024-02-11

Correcting text output
The problem | Reordering blocks | Splitting blocks | Mathematical splitting | Manual splitting

Last update: 2024-11-12
Started: 2021-03-04

Configuration
Authentication | Step 1: Get a Gmail account | Step 2: Activate the Google Cloud Console | Step 3: Link your project to your billing account | Step 4: Set up a service account | Step 5: Download a json file with the service account key | Step 6: Store the path to the credentials file as an environment variable | Step 7: Activate Document AI | Step 8: Create a processor | Step 9: Store the processor id as an environment variable | Cheatsheet

Last update: 2024-02-28
Started: 2024-02-11

Quickstart
Set up a Google Cloud Services account | Process synchronously | Process asynchronously | 1. Upload files to your Google Cloud Storage bucket | 2. Tell Document AI to process them: | 3. Download the JSON output and extract the text:

Last update: 2024-02-28
Started: 2024-02-11

Working with Google Cloud Storage
Setup | Creating and inspecting buckets | Uploading files | Downloading files | Deleting | Convenience functions | Cheatsheet

Last update: 2024-02-28
Started: 2024-02-11

Complex file and folder management
Image files | Processing a folder tree

Last update: 2023-08-28
Started: 2023-08-28

Readme and manuals

Help Manual

Help page | Topics
Run when daiR is attached | .onAttach
Build block dataframe | build_block_df
Build token dataframe | build_token_df
Create processor | create_processor
OCR documents asynchronously | dai_async
Check authentication | dai_auth
Notify on job completion | dai_notify
Check job status | dai_status
OCR document synchronously | dai_sync
Produce access token | dai_token
Get user information | dai_user
Delete processor | delete_processor
Disable processor | disable_processor
Draw block bounding boxes | draw_blocks
Draw entity bounding boxes | draw_entities
Draw line bounding boxes | draw_lines
Draw paragraph bounding boxes | draw_paragraphs
Draw token bounding boxes | draw_tokens
Enable processor | enable_processor
Extract block coordinates from labelme files | from_labelme
Get entities | get_entities
List ids of available processors of a given type | get_ids_by_type
Get information about processor | get_processor_info
List available versions of processor | get_processor_versions
List created processors | get_processors
Get project id | get_project_id
Get tables | get_tables
Get text | get_text
List versions of available processors of a given type | get_versions_by_type
Convert images to PDF | image_to_pdf
Image to base64 tiff | img_to_binbase
Check that a string is a valid colour representation | is_colour
Check that a file is JSON | is_json
Check that a file is PDF | is_pdf
List available processor types | list_processor_types
Make hOCR file | make_hocr
Merge shards | merge_shards
PDF to base64 tiff | pdf_to_binbase
Assign tokens to new blocks | reassign_tokens
Assign tokens to a single new block | reassign_tokens2
Inspect revised block bounding boxes | redraw_blocks
Split a block bounding box | split_block
Get tables from output file | tables_from_dai_file
Get tables from response object | tables_from_dai_response