Package: daiR 1.1.0

daiR: Interface with Google Cloud Document AI API

R interface for the Google Cloud Services 'Document AI API' <https://cloud.google.com/document-ai/> with additional tools for output file parsing and text reconstruction. 'Document AI' is a powerful server-based OCR service that extracts text and tables from images and PDF files with high accuracy. 'daiR' gives R users programmatic access to this service and additional tools to handle and visualize the output. See the package website <https://dair.info/> for more information and examples.

Authors: Thomas Hegghammer [aut, cre]

daiR_1.1.0.tar.gz
daiR_1.1.0.zip (r-4.5), daiR_1.1.0.zip (r-4.4), daiR_1.1.0.zip (r-4.3)
daiR_1.1.0.tgz (r-4.4-any), daiR_1.1.0.tgz (r-4.3-any)
daiR_1.1.0.tar.gz (r-4.5-noble), daiR_1.1.0.tar.gz (r-4.4-noble)
daiR_1.1.0.tgz (r-4.4-emscripten), daiR_1.1.0.tgz (r-4.3-emscripten)
daiR.pdf | daiR.html
daiR/json (API)
NEWS

# Install 'daiR' in R:
install.packages('daiR', repos = c('https://hegghammer.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/hegghammer/dair/issues

google-cloud, ocr

7.59 score, 41 stars, 34 scripts, 246 downloads, 47 exports, 63 dependencies

Last updated 10 days ago from: 569cfbc6b0. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Nov 13 2024
R-4.5-win        NOTE    Nov 13 2024
R-4.5-linux      NOTE    Nov 13 2024
R-4.4-win        NOTE    Nov 13 2024
R-4.4-mac        NOTE    Nov 13 2024
R-4.3-win        NOTE    Nov 13 2024
R-4.3-mac        NOTE    Nov 13 2024

Exports: build_block_df, build_token_df, create_processor, dai_async, dai_async_tab, dai_auth, dai_notify, dai_status, dai_sync, dai_sync_tab, dai_token, dai_user, delete_processor, disable_processor, draw_blocks, draw_entities, draw_lines, draw_paragraphs, draw_tokens, enable_processor, from_labelme, get_entities, get_ids_by_type, get_processor_info, get_processor_versions, get_processors, get_project_id, get_tables, get_text, get_versions_by_type, image_to_pdf, img_to_binbase, is_colour, is_json, is_pdf, list_processor_types, make_hocr, merge_shards, pdf_to_binbase, reassign_tokens, reassign_tokens2, redraw_blocks, split_block, tables_from_dai_file, tables_from_dai_response, text_from_dai_file, text_from_dai_response

Dependencies: antiword, askpass, assertthat, audio, base64enc, beepr, bitops, cachem, cellranger, cli, cpp11, crayon, curl, data.table, digest, fansi, fastmap, fs, gargle, glue, googleAuthR, googleCloudStorageR, hms, httr, jsonlite, lifecycle, magick, magrittr, memoise, mime, minty, ndjson, openssl, pdftools, pillar, pkgconfig, prettyunits, progress, purrr, qpdf, R6, rappdirs, Rcpp, RCurl, readODS, readtext, readxl, rematch, rjson, rlang, streamR, stringi, stringr, striprtf, sys, tibble, tzdb, utf8, vctrs, withr, xml2, yaml, zip

Basic usage

Rendered from usage.Rmd using knitr::rmarkdown on Nov 13 2024.

Last update: 2024-02-11
Started: 2024-02-11

Complex file and folder management

Rendered from complex_file_and_folder_management.Rmd using knitr::rmarkdown on Nov 13 2024.

Last update: 2023-08-28
Started: 2023-08-28

Configuration

Rendered from configuration.Rmd using knitr::rmarkdown on Nov 13 2024.

Last update: 2024-02-28
Started: 2024-02-11

Correcting text output

Rendered from reconstructing_text.Rmd using knitr::rmarkdown on Nov 13 2024.

Last update: 2024-11-12
Started: 2021-03-04

Extracting tables

Rendered from tables.Rmd using knitr::rmarkdown on Nov 13 2024.

Last update: 2024-11-13
Started: 2024-02-11

Quickstart

Rendered from quickstart.Rmd using knitr::rmarkdown on Nov 13 2024.

Last update: 2024-02-28
Started: 2024-02-11

Working with Google Cloud Storage

Rendered from gcs_storage.Rmd using knitr::rmarkdown on Nov 13 2024.

Last update: 2024-02-28
Started: 2024-02-11

Readme and manuals

Help Manual

Help page                                              Topics
Run when daiR is attached                              .onAttach
Build block dataframe                                  build_block_df
Build token dataframe                                  build_token_df
Create processor                                       create_processor
OCR documents asynchronously                           dai_async
Check authentication                                   dai_auth
Notify on job completion                               dai_notify
Check job status                                       dai_status
OCR document synchronously                             dai_sync
Produce access token                                   dai_token
Get user information                                   dai_user
Delete processor                                       delete_processor
Disable processor                                      disable_processor
Draw block bounding boxes                              draw_blocks
Draw entity bounding boxes                             draw_entities
Draw line bounding boxes                               draw_lines
Draw paragraph bounding boxes                          draw_paragraphs
Draw token bounding boxes                              draw_tokens
Enable processor                                       enable_processor
Extract block coordinates from labelme files           from_labelme
Get entities                                           get_entities
List ids of available processors of a given type       get_ids_by_type
Get information about processor                        get_processor_info
List available versions of processor                   get_processor_versions
List created processors                                get_processors
Get project id                                         get_project_id
Get tables                                             get_tables
Get text                                               get_text
List versions of available processors of a given type  get_versions_by_type
Convert images to PDF                                  image_to_pdf
Image to base64 tiff                                   img_to_binbase
Check that a string is a valid colour representation   is_colour
Check that a file is JSON                              is_json
Check that a file is PDF                               is_pdf
List available processor types                         list_processor_types
Make hOCR file                                         make_hocr
Merge shards                                           merge_shards
PDF to base64 tiff                                     pdf_to_binbase
Assign tokens to new blocks                            reassign_tokens
Assign tokens to a single new block                    reassign_tokens2
Inspect revised block bounding boxes                   redraw_blocks
Split a block bounding box                             split_block
Get tables from output file                            tables_from_dai_file
Get tables from response object                        tables_from_dai_response