#Duckdb #htslib #Genomics #Bioinformatics #RStats
duckhts: Read HTS (VCF/BCF/BAM/CRAM/FASTA/FASTQ/GTF/GFF) files in DuckDB via htslib
Rduckhts: 'DuckDB' High Throughput Sequencing File Formats Reader Extension
genomic.social/@bioinfhotep...
Rduckhts: 'DuckDB' 'HTS' File Reader Extension for 'R'
Bundles the 'duckhts' 'DuckDB' extension for reading 'HTS' file formats (VCF/BCF, SAM/BAM/CRAM, FASTA, FASTQ, GFF, GTF, tabix) from 'R' via 'DuckDB'. The extension and its 'htslib' dependency are compiled from vendored sources during package installation.
Authors: Sounkou Mahamane Toure [aut, cre], htslib authors [ctb], DuckDB C Extension API authors [ctb]
Version: 0.1.1-0.0.1 (source package; Windows binaries for R 4.4, 4.5 and 4.6; macOS binaries for R 4.5 and 4.6)
Install 'Rduckhts' in R:
install.packages('Rduckhts', repos = c('https://rgenomicsetl.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/rgenomicsetl/duckhts/issues (0 issues)
On CRAN: no. 12 exports, 2 dependencies. Built from commit e99a28a305; checks: 7 OK, 1 NOTE, 1 FAIL.
Citation
To cite package 'Rduckhts' in publications use:
Toure S (2026). Rduckhts: 'DuckDB' 'HTS' File Reader Extension for 'R'. R package version 0.1.1-0.0.1, https://github.com/rgenomicsetl/duckhts.
Corresponding BibTeX entry:
@Manual{,
  title = {Rduckhts: 'DuckDB' 'HTS' File Reader Extension for 'R'},
  author = {Sounkou Mahamane Toure},
  year = {2026},
  note = {R package version 0.1.1-0.0.1},
  url = {https://github.com/rgenomicsetl/duckhts},
}
Rduckhts: DuckDB HTS File Reader Extension for R
Rduckhts provides an R interface to duckhts, a DuckDB HTS (High Throughput Sequencing) file reader extension. It lets you read common bioinformatics file formats such as VCF/BCF, SAM/BAM/CRAM, FASTA, FASTQ, GFF, GTF, and tabix-indexed files directly from R using SQL queries.
How it works
Following RBCFTools, the reader functions create DuckDB tables instead of returning data frames. VCF/BCF, SAM/BAM/CRAM, FASTA, FASTQ, GFF, GTF, and tabix-indexed formats can be queried, and region queries are supported for indexed files. Supported platforms are Linux, macOS, and Windows via Rtools. htslib 1.23 is bundled so that build dependencies stay minimal. The extension is built by adapting the generic extension infrastructure with Makefiles only, unlike the duckhts build submitted as a DuckDB community extension.
Installation
The package can be installed from GitHub:
remotes::install_github("RGenomicsETL/duckhts", subdir = "r/Rduckhts")
System Requirements
Installation requires the htslib dependencies zlib and libbz2; liblzma, libcurl, and OpenSSL are optional but needed for full functionality. GNU make is required. With Rtools on Windows, htslib plugins are not enabled.
Quick Start
The extension is loaded with rduckhts_load(con, extension_path = NULL). Tables can then be created with rduckhts_bcf, rduckhts_bam, rduckhts_fasta, rduckhts_fastq, rduckhts_gff, rduckhts_gtf, and rduckhts_tabix, using the parameters documented in their help pages.
library(DBI)
library(duckdb)
library(Rduckhts)
# Paths to the bundled extension and example files
ext_path <- system.file("extdata", "duckhts.duckdb_extension", package = "Rduckhts")
fasta_path <- system.file("extdata", "ce.fa", package = "Rduckhts")
fastq_r1 <- system.file("extdata", "r1.fq", package = "Rduckhts")
fastq_r2 <- system.file("extdata", "r2.fq", package = "Rduckhts")
# The bundled extension is unsigned, so DuckDB must be allowed to load unsigned extensions
con <- dbConnect(duckdb::duckdb(config = list(allow_unsigned_extensions = "true")))
rduckhts_load(con, extension_path = ext_path)
#> [1] TRUE
# Create a table from the FASTA file and one from the two FASTQ mate files
rduckhts_fasta(con, "sequences", fasta_path, overwrite = TRUE)
rduckhts_fastq(con, "reads", fastq_r1, mate_path = fastq_r2, overwrite = TRUE)
dbGetQuery(con, "SELECT COUNT(*) AS n FROM sequences")
#>   n
#> 1 7
dbGetQuery(con, "SELECT COUNT(*) AS n FROM reads")
#>    n
#> 1 10
FASTA, BAM and FASTQ readers
#RStats
Rduckhts: 'DuckDB' 'HTS' File Reader Extension for 'R'
Standing on the shoulders of the great #htslib API and the DuckDB C API
Package: rgenomicsetl.r-universe.dev/Rduckhts
Open BCF/VCF files as DuckDB tables in R using a bcf_reader DuckDB extension based on htslib
Open BCF/VCF files as DuckDB tables in R using a bcf_reader DuckDB extension based on htslib, in tidy format
Write a Parquet file converted from VCF into MinIO
DuckLake Content
Maybe the fastest way from BCF/VCF to #RStats data frames, using the #htslib and #duckdb C APIs. Easily the fastest BCF/VCF to Parquet converter in #RStats (there are no other R options :D). This was motivated, among other things, by the idea of trying out #DuckLake in a familiar field.
github.com/RGenomicsETL...
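The posts above mention converting VCF to Parquet and pushing the result to MinIO. Below is a minimal sketch of the export step from R, using only DBI, duckdb and DuckDB's `COPY ... TO` statement. The `variants` table is a hypothetical stand-in with made-up values. In a real pipeline it would be a table created by `rduckhts_bcf` on a connection where the extension is loaded. The MinIO step appears only as a comment. It assumes DuckDB's httpfs extension and an S3 secret, and the posts do not describe how the author actually does this.

```r
library(DBI)
library(duckdb)

# In-memory DuckDB database. In a real pipeline this would be the connection on
# which rduckhts_load() was called and the VCF/BCF table was created.
con <- dbConnect(duckdb::duckdb())

# Hypothetical stand-in for a VCF-derived table (illustrative values only).
dbExecute(con, "
  CREATE TABLE variants AS
  SELECT * FROM (VALUES
    ('chr1', 100, 'A', 'G'),
    ('chr1', 250, 'C', 'T'),
    ('chr2',  42, 'G', 'A')
  ) AS t(chrom, pos, ref, alt)
")

# Export the table to a local Parquet file with DuckDB's COPY ... TO.
out_file <- tempfile(fileext = ".parquet")
dbExecute(con, sprintf("COPY variants TO '%s' (FORMAT PARQUET)", out_file))

# Assumption, not run here: to target a MinIO bucket instead, DuckDB's httpfs
# extension and an S3 secret would be needed, along the lines of
#   INSTALL httpfs; LOAD httpfs;
#   CREATE SECRET minio (TYPE S3, KEY_ID '<key>', SECRET '<secret>',
#                        ENDPOINT 'localhost:9000', URL_STYLE 'path', USE_SSL false);
#   COPY variants TO 's3://<bucket>/variants.parquet' (FORMAT PARQUET);

# Read the file back and count rows per chromosome to check the round trip.
per_chrom <- dbGetQuery(con, sprintf(
  "SELECT chrom, COUNT(*) AS n FROM read_parquet('%s') GROUP BY chrom ORDER BY chrom",
  out_file
))
print(per_chrom)

dbDisconnect(con)
```

Because `COPY` runs inside DuckDB, the VCF-derived rows never have to be materialized as an R data frame. Only the small per-chromosome summary is pulled back into R.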
Release 1.22 of HTSlib, SAMtools, and BCFtools is now available from GitHub. See htslib.org/download/ for links to tarballs and release notes. 🧪
#samtools #bcftools #htslib #bioinformatics