Package: daiR 1.2.1

daiR: Interface with Google Cloud Document AI API

R interface for the Google Cloud Services 'Document AI API' <https://cloud.google.com/document-ai> with additional tools for output file parsing and text reconstruction. 'Document AI' is a powerful server-based OCR service that extracts text and tables from images and PDF files with high accuracy. 'daiR' gives R users programmatic access to this service and additional tools to handle and visualize the output. See the package website <https://dair.info/> for more information and examples.

Authors: Thomas Hegghammer [aut, cre]

daiR_1.2.1.tar.gz
daiR_1.2.1.zip (r-4.7) | daiR_1.2.1.zip (r-4.6) | daiR_1.2.1.zip (r-4.5)
daiR_1.2.1.tgz (r-4.6-any) | daiR_1.2.1.tgz (r-4.5-any)
daiR_1.2.1.tar.gz (r-4.7-any) | daiR_1.2.1.tar.gz (r-4.6-any)
daiR_1.2.1.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
daiR/json (API)
NEWS

# Install 'daiR' in R:
install.packages('daiR', repos = c('https://hegghammer.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/hegghammer/dair/issues

Pkgdown/docs site: https://dair.info

google-cloud ocr

7.14 score | 44 stars | 45 scripts | 258 downloads | 47 exports | 62 dependencies

Last updated from: 054ca0f2ce. Checks: 7 ERROR, 2 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | ERROR | 201
source / vignettes | OK | 186
linux-release-x86_64 | ERROR | 195
macos-release-arm64 | ERROR | 132
macos-oldrel-arm64 | ERROR | 184
windows-devel | ERROR | 147
windows-release | ERROR | 129
windows-oldrel | ERROR | 145
wasm-release | OK | 131

Exports: build_block_df, build_token_df, create_processor, dai_async, dai_async_tab, dai_auth, dai_notify, dai_status, dai_sync, dai_sync_tab, dai_token, dai_user, delete_processor, disable_processor, draw_blocks, draw_entities, draw_lines, draw_paragraphs, draw_tokens, enable_processor, from_labelme, get_entities, get_ids_by_type, get_processor_info, get_processor_versions, get_processors, get_project_id, get_tables, get_text, get_versions_by_type, image_to_pdf, img_to_binbase, is_colour, is_json, is_pdf, list_processor_types, make_hocr, merge_shards, pdf_to_binbase, reassign_tokens, reassign_tokens2, redraw_blocks, split_block, tables_from_dai_file, tables_from_dai_response, text_from_dai_file, text_from_dai_response

Dependencies: antiword, askpass, assertthat, audio, base64enc, beepr, bitops, cachem, cellranger, cli, cpp11, crayon, curl, data.table, digest, fastmap, fs, gargle, glue, googleAuthR, googleCloudStorageR, hms, httr, jsonlite, lifecycle, magick, magrittr, memoise, mime, minty, ndjson, openssl, pdftools, pillar, pkgconfig, prettyunits, progress, purrr, qpdf, R6, rappdirs, Rcpp, RCurl, readODS, readtext, readxl, rematch, rjson, rlang, streamR, stringi, stringr, striprtf, sys, tibble, tzdb, utf8, vctrs, withr, xml2, yaml, zip

Basic usage

Rendered from usage.Rmd using knitr::rmarkdown on May 20 2026.

Last update: 2025-11-17
Started: 2024-02-11

Complex file and folder management

Rendered from complex_file_and_folder_management.Rmd using knitr::rmarkdown on May 20 2026.

Last update: 2023-08-28
Started: 2023-08-28

Configuration

Rendered from configuration.Rmd using knitr::rmarkdown on May 20 2026.

Last update: 2024-02-28
Started: 2024-02-11

Correcting text output

Rendered from reconstructing_text.Rmd using knitr::rmarkdown on May 20 2026.

Last update: 2024-11-12
Started: 2021-03-04

Extracting tables

Rendered from tables.Rmd using knitr::rmarkdown on May 20 2026.

Last update: 2024-11-13
Started: 2024-02-11

Quickstart

Rendered from quickstart.Rmd using knitr::rmarkdown on May 20 2026.

Last update: 2024-02-28
Started: 2024-02-11

Working with Google Cloud Storage

Rendered from gcs_storage.Rmd using knitr::rmarkdown on May 20 2026.

Last update: 2024-02-28
Started: 2024-02-11

Readme and manuals

Help Manual

Help page | Topics
Run when daiR is attached | .onAttach
Build block dataframe | build_block_df
Build token dataframe | build_token_df
Create processor | create_processor
OCR documents asynchronously | dai_async
Check authentication | dai_auth
Notify on job completion | dai_notify
Check job status | dai_status
OCR document synchronously | dai_sync
Produce access token | dai_token
Get user information | dai_user
Delete processor | delete_processor
Disable processor | disable_processor
Draw block bounding boxes | draw_blocks
Draw entity bounding boxes | draw_entities
Draw line bounding boxes | draw_lines
Draw paragraph bounding boxes | draw_paragraphs
Draw token bounding boxes | draw_tokens
Enable processor | enable_processor
Extract block coordinates from labelme files | from_labelme
Get entities | get_entities
List ids of available processors of a given type | get_ids_by_type
Get information about processor | get_processor_info
List available versions of processor | get_processor_versions
List created processors | get_processors
Get project id | get_project_id
Get tables | get_tables
Get text | get_text
List versions of available processors of a given type | get_versions_by_type
Convert images to PDF | image_to_pdf
Image to base64 tiff | img_to_binbase
Check that a string is a valid colour representation | is_colour
Check that a file is JSON | is_json
Check that a file is PDF | is_pdf
List available processor types | list_processor_types
Make hOCR file | make_hocr
Merge shards | merge_shards
PDF to base64 tiff | pdf_to_binbase
Assign tokens to new blocks | reassign_tokens
Assign tokens to a single new block | reassign_tokens2
Inspect revised block bounding boxes | redraw_blocks
Split a block bounding box | split_block
Get tables from output file | tables_from_dai_file
Get tables from response object | tables_from_dai_response