#OpenCitations

Latest posts tagged with #OpenCitations on Bluesky


# Efficient Bulk Access to Citations in OpenCitations

OpenCitations aggregates and deduplicates bibliographic information from CrossRef, Europe PubMed Central, and other sources to construct a comprehensive, open index of citations between scientific works. This post describes the `opencitations-client` package, which wraps the OpenCitations API and implements an automated pipeline for locally downloading, caching, and accessing OpenCitations in bulk.

## Background

OpenCitations provides access both via an API and via bulk data downloads distributed across Figshare and Zenodo. Importantly, it publishes its data under the CC0 public domain license to democratize access to citations. Previously, this data was only available through paid access to commercial databases owned by publishers.

While API access can be convenient for _ad-hoc_ usage, it’s generally slow, rate-limited, susceptible to DDoS (e.g., from crawlers), and therefore difficult (if not impossible) to use in bulk. My solution is to write software that automates downloading, processing, and caching databases in bulk and provides fast, highly available, local access.

I’ve previously written about developing standalone software packages for several large databases including DrugBank, ChEMBL, UMLS, ORCID, and ClinicalTrials.gov. I also maintain several similar workflows in the PyOBO software package for converting resources into ontology-like data structures. I previously wrote about how this looks for HGNC.

## Building on an Existing Ecosystem

I’ve been developing a software ecosystem over the last decade to support common workflows in research data management and data integration. When I start a new project, I try to reuse or improve existing components from that ecosystem wherever possible. Importantly, I try to find meaningful ways of organizing code across my ecosystem to reduce duplication, separate concerns, reduce the burden of testing, and ease maintenance.
OpenCitations publishes its bulk data dumps across several records in Figshare and Zenodo. I’ve previously written `zenodo-client` to interact with Zenodo’s API and orchestrate downloading and caching. `zenodo-client` builds heavily on `pystow`, which implements I/O and filesystem operations to enable reproducible, automated downloading, caching, and opening of data.

I had not previously written software to interact with Figshare, so I followed the form of `zenodo-client` and created a new package, `figshare-client`. I’m able to quickly create new high-quality packages because I’ve encoded the wisdom and experience I’ve gained over the years in a Cookiecutter template, cookiecutter-snekpack, which I can use to set up a new project in mere minutes.

Along the way, I realized that the archives in Zenodo and Figshare were a combination of TAR and ZIP archives, each with many CSV files inside. In Python, TAR and ZIP archives have lots of weird quirks, even though they mostly do the same thing. Rather than addressing those issues in `opencitations-client`, it made more sense to add utility functions to PyStow in cthoyt/pystow#125 (tar and zip archive iteration), where I was much better able to test them.

A key functionality of OpenCitations is to implement graph-like queries to find incoming and outgoing citations. I considered several solutions for efficiently caching and querying graph-like data, including pickles and SQLite, but these were respectively slow and disk inefficient. I found better solutions based on NumPy’s memory maps and was surprised that I couldn’t find an implementation in a popular package (e.g., SciPy). So, I had to decide where to put an implementation of a disk-based cached graph. I didn’t want to put it in `opencitations-client` nor make a tiny package for just this one operation, so I decided to expand the scope of PyStow and add it there in cthoyt/pystow#121.
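To make the memory-map idea concrete, here is a minimal sketch of a disk-backed directed graph. It is not the implementation in PyStow: the compressed sparse row layout, the file names `indptr.npy` and `indices.npy`, and the function names `write_graph` and `neighbors` are all assumptions made for illustration, and nodes are assumed to be already mapped to consecutive integers.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_graph(edges: list[tuple[int, int]], n_nodes: int, directory: Path) -> None:
    """Write a directed graph to disk as two arrays (hypothetical layout)."""
    edge_array = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    # a stable sort by source keeps each node's targets contiguous and in input order
    order = np.argsort(edge_array[:, 0], kind="stable")
    sources = edge_array[order, 0]
    targets = edge_array[order, 1]
    # indptr[i]:indptr[i + 1] is the slice of `targets` that belongs to node i
    counts = np.bincount(sources, minlength=n_nodes)
    indptr = np.zeros(n_nodes + 1, dtype=np.int64)
    np.cumsum(counts, out=indptr[1:])
    np.save(directory / "indptr.npy", indptr)
    np.save(directory / "indices.npy", targets)


def neighbors(directory: Path, node: int) -> list[int]:
    """Look up one node's targets without reading the whole graph into memory."""
    # mmap_mode="r" maps the files read-only; only the touched pages are loaded
    indptr = np.load(directory / "indptr.npy", mmap_mode="r")
    indices = np.load(directory / "indices.npy", mmap_mode="r")
    start, stop = int(indptr[node]), int(indptr[node + 1])
    return indices[start:stop].tolist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_graph([(0, 1), (0, 2), (2, 1)], n_nodes=3, directory=Path(tmp))
        print(neighbors(Path(tmp), 0))
```

Incoming citations could be served the same way by writing a second pair of arrays built from the reversed edges. The appeal of this layout is that a lookup reads two small slices from disk rather than deserializing the whole graph.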
Finally, OpenCitations deals with a variety of identifier spaces including first-party OpenCitations Metadata IDs (OMIDs) and OpenCitations Citation IDs (OCIs) as well as third-party identifiers from Wikidata, OpenAlex, PubMed, DOI, and others. I’ve written the `curies` package to handle identifiers in an explicit and transparent way.

In the end, `opencitations-client` relies on several components from my ecosystem and, of course, several more generic and popular packages. Here’s how the dependencies look:

```mermaid
flowchart LR
    opencitations-client -- depends on --> figshare-client
    opencitations-client -- depends on --> zenodo-client
    opencitations-client -- depends on --> curies
    figshare-client -- depends on --> pystow
    zenodo-client -- depends on --> pystow
```

## Demo

It’s important for software packages to implement simple, top-level APIs that cover 99% of use cases with reasonable defaults. Most use cases for OpenCitations are to get incoming/outgoing citations for a DOI, PubMed identifier, or OpenCitations identifier. Here’s how this looks:

```python
from curies import Reference
from opencitations_client import get_incoming_citations, get_outgoing_citations

# a CURIE for the DOI for the Bioregistry paper
bioregistry_curie = "doi:10.1038/s41597-022-01807-3"

# who did the Bioregistry paper cite?
outgoing: list[Reference] = get_outgoing_citations(bioregistry_curie)

# who cited the Bioregistry paper?
incoming: list[Reference] = get_incoming_citations(bioregistry_curie)
```

Importantly, each of these functions has a `backend` argument that defaults to `api` and can be swapped to `local`. Because everything is built on software that is smart about caching, loading, and data workflows, the first time `backend='local'` is used, all processing happens automatically (warning: this takes a few hours on a single core).
Each of these functions also has a `return_value` argument that can be used to swap between principled `curies.Reference` data structures that explicitly encode identifiers, simple string local unique identifiers that match the input prefix, or full citation objects (only available through the OpenCitations API).

See the `opencitations-client` code on GitHub (https://github.com/cthoyt/opencitations-client) and documentation on ReadTheDocs (https://opencitations-client.readthedocs.io).

* * *

While I’ve been thinking about adding citations to the bibliographic components of knowledge graph construction workflows for several years, I was finally pushed to implement `opencitations-client` for the Catalaix project, where we’re developing new methods for recycling and reuse of (bio)plastics. I wanted to get all seventeen laboratories’ publications, who they cited, and who cited them as a seed for information extraction and curation. Here’s a small example of a citation network from those queries:

```mermaid
flowchart TD
    26802344["Mechanism-specific and whole-organism ecotoxicity of mono-rhamnolipids. Blank (2016)"]
    34492827["The Green toxicology approach: Insight towards the eco-toxicologically safe development of benign catalysts. Herres-Pawlis (2021)"]
    28779508["Highly Active N,O Zinc Guanidine Catalysts for the Ring-Opening Polymerization of Lactide. Herres-Pawlis (2017)"]
    33195133["Genetic Cell-Surface Modification for Optimized Foam Fractionation. Blank (2020)"]
    32974309["Integration of Genetic and Process Engineering for Optimized Rhamnolipid Production Using Jupke, Blank (2020)"]
    30811863["New Kids in Lactide Polymerization: Highly Active and Robust Iron Guanidine Complexes as Superior Catalysts. Pich, Herres-Pawlis (2019)"]
    30758389["Tuning a robust system: N,O zinc guanidine catalysts for the ROP of lactide. Pich, Herres-Pawlis (2019)"]
    28524364["Biofunctional Microgel-Based Fertilizers for Controlled Foliar Delivery of Nutrients to Plants. Pich, Schwaneberg (2017)"]
    34865895["A plea for the integration of Green Toxicology in sustainable bioeconomy strategies - Biosurfactants and microgel-based pesticide release systems as examples. Pich, Blank, Schwaneberg (2022)"]
    32449840["Robust Guanidine Metal Catalysts for the Ring-Opening Polymerization of Lactide under Industrially Relevant Conditions. Herres-Pawlis (2020)"]
    34492827 --> 30811863
    34492827 --> 30758389
    34492827 --> 28779508
    34492827 --> 32449840
    32974309 --> 33195133
    34865895 --> 26802344
    34865895 --> 32974309
    34865895 --> 28524364
    34865895 --> 34492827
```
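The incoming/outgoing distinction can be illustrated with the edges of the example network above. This is only a sketch using the standard library, not code from `opencitations-client`: the helper name `index_citations` is hypothetical, and the arrows are assumed to point from the citing article to the cited article (identifiers are the PubMed IDs shown in the flowchart).

```python
from collections import defaultdict

# (citing, cited) pairs copied from the flowchart; direction is an assumption
EDGES = [
    ("34492827", "30811863"),
    ("34492827", "30758389"),
    ("34492827", "28779508"),
    ("34492827", "32449840"),
    ("32974309", "33195133"),
    ("34865895", "26802344"),
    ("34865895", "32974309"),
    ("34865895", "28524364"),
    ("34865895", "34492827"),
]


def index_citations(
    edges: list[tuple[str, str]],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Build outgoing (who did X cite?) and incoming (who cited X?) lookups."""
    outgoing: defaultdict[str, list[str]] = defaultdict(list)
    incoming: defaultdict[str, list[str]] = defaultdict(list)
    for citing, cited in edges:
        outgoing[citing].append(cited)
        incoming[cited].append(citing)
    return dict(outgoing), dict(incoming)


outgoing, incoming = index_citations(EDGES)

# the 2021 Green toxicology article cites four of the catalyst articles
print(outgoing["34492827"])
# and is itself cited by the 2022 article
print(incoming["34492827"])
```

Keeping both directions as separate lookups means each query is a single dictionary access, at the cost of storing every edge twice.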

I blogged about software I wrote for efficiently downloading and querying @opencitations in bulk

📖 blog: https://cthoyt.com/2026/02/10/opencitations-client.html

💽 code: https://github.com/cthoyt/opencitations-client

#opencitations #bibliometrics #opensource

Building maintaining and listening, OpenCitations 2025

Looking back at 2025, #OpenCitations focused on strengthening the foundations: stronger workflows, resilient infrastructure, and deeper community collaboration.
Read the story behind the work in our year in review on the #OpenCitationsBlog
👉 opencitations.hypotheses.org/4106

Figure 1 from the preprint: Comparison of Web of Science (WoS), OpenAlex (OA), and OpenCitations (OC) data for the articles present in the three datasets. A. Number of authors, B. Length of the title, C. Number of pages, D. Attributed year, E. Number of references. For each plot, a dashed line represents y = x.

Figure 5 of the preprint: Biplot of the first and second principal components of a PCA computed on the means of the five bibliometric variables for each journal in the sample. The arrows represent the correlation between each original variable and the principal components. The direction and length of the arrows indicate how strongly each variable contributes to each component.

Figure 6 of the preprint: Variation in bibliometric indicators of hardness for 25 archaeological journals based on OpenAlex data. The journals are ordered for each indicator so that within each plot, the harder journals are at the top of the plot and the softer journals are at the base. Panel F shows a bar plot that is the single consensus ranking computed from all five variables, using the Borda Count ranking algorithm.

New version of my reproduction and replication attempt of @benmarwick.bsky.social's paper published a few months ago in JAS, but here with #OpenAlex and #OpenCitations data.
You can read it in an interactive HTML page here: aqueff.github.io/replication_...
🧪🏺

People around the table; food; wine

Holiday wishes from the #OpenCitations team!🎄✨

Sharing good food, great conversations, and ideas around the table.

We’ll be back in January, recharged and ready to make the new year count. Wishing everyone a bright time ahead!🌟

Day 2💫 Wednesday, 22 October | Open and Engaged 2025 British Library's Open and Engaged 2025 Conference

👩🏻‍💻 Checking the Day 2 Wed. 22 Oct programme at #OpenEngaged #OAWeek:
🌟Lightning talks: #SafeguardingResearch #Datarescue #CARE #OpenGLAM #Accessibility
🌟Technology, Power, and Equitable Design session. #LocalContexts #OpenCitations #OpenMetadata
✅ Register openscholarship.gitbook.io/open-and-eng...

Sorbonne Université: withdrawal from the THE ranking in 2026; "prioritizing open databases" Sorbonne Université is withdrawing from the university ranking produced by THE (Times Higher Education) starting in 2026, the university announced on 16/09/2025. "This choice marks a new stage in the....

[Watch] "Sorbonne Université: withdrawal from the THE ranking in 2026; 'prioritizing open databases'" => education.newstank.fr/article/view...
#ESR #openscience #bibliometrics #scientometrics #opencitations #ranking #research #universities #opendata

Boosting the visibility of Lorraine's scientific publications in OpenAlex: mission accomplished - Factuel - l'Info de l'Université de Lorraine Between February and July 2025, the Bibliometrics team at the Université de Lorraine carried out a major project to register the institution's scientific publications in the database...

[Watch]: Boosting the visibility of Lorraine's scientific publications in OpenAlex: mission accomplished => factuel.univ-lorraine.fr/article/boos...
#openscience #opencitations #opendata #bibliometrics #scientometrics

Now on stage at #csvconf: @essepuntato.bsky.social presenting "How did we get to OpenCitations: a brief history of open scholarly citations" #opencitations

Happening now: Arcangelo Massari at #csvconf presenting "Automated citation crowdsourcing: extending scholarly data coverage through community contributions" #opencitations

csv,conf,v9

📢 Double talk session for the OpenCitations team at #csvconf v9 (Data for Communities), in #Bologna!
Sept 10, 10:30 CEST 👉 Arcangelo Massari on automated citation crowdsourcing;
Sept 11, 14:40 👉 @essepuntato.bsky.social on #OpenCitations
🦙More info at csvconf.com/index.html

❗SERVICE ALERT❗
On Aug. 21, between 09:00 and 18:00 CEST, #OpenCitations services may experience temporary disruptions due to technical interventions. We apologise in advance for the inconvenience and thank you for your understanding.

New Journal Article: "Validating and Monitoring Bibliographic and Citation #Data In #OpenCitations Collections" link.springer.com/article/10.1... #libraries #openscience #datasets @opencitations.bsky.social

☀️ July may be for holidays, but it's never too late to read a good annual report!
The #OpenCitations 2024 Annual Report is now available on Zenodo, featuring our five key focus areas and full financial transparency.
📘 Read it here: doi.org/10.5281/zeno...

Seminars of the Coalition for Advancing Research Assessment: Open Science as applied to peer review and research assessment

📢 Tomorrow!
@essepuntato.bsky.social will present #OpenCitations during an online seminar organised by the Italian chapter of @coarassessment.bsky.social .
🗓️ Date: July 9
🕙 Time: 10:00–12:00 CEST
🇮🇹 Language: Italian
👉 Info & Registration: biblio.unipd.it/news/seminar...

Thanks to everyone who checked out our new website at opencitations.net
📩As a reminder, the #OpenCitations Newsletter is now live!
Don’t miss out on updates, projects & community news. Subscribe here:
👉https://opencitations.net/newsletter/

The new @opencitations.bsky.social website is now live!

Designed for a smoother, more intuitive experience, discover the updated site features today!

#OpenScience #OpenCitations #ScholarlyData #WebRedesign

🎉 Our new website is live!  

As part of our rebranding journey, it now offers easier access to our services, and a big new feature: the OpenCitations Newsletter!   

📩 opencitations.net/newsletter/  
🔗 opencitations.net  
📰 opencitations.hypotheses.org/3918

#opencitations

Exciting things are happening behind the scenes at #OpenCitations...
You may experience some temporary service disruptions, but our team is working to minimise the inconvenience.
Get ready. A big reveal is coming soon.🔎✨

❗EXTRAORDINARY MAINTENANCE❗Due to an unplanned outage, some #OpenCitations services may not be working as expected. We are currently working to resolve the issue. Thank you for your patience.

❗PLANNED MAINTENANCE ON APRIL 17 ❗
Due to ordinary maintenance planned on April 17, 2025, some #OpenCitations services may not function as expected. We apologize for any inconvenience and appreciate your patience.

GRAPHIA Project Launched in January 2025 OPERAS hosted the launch of the GRAPHIA project in Brussels earlier this year. On the 22nd and 23rd January, 2025, 21 partner institutions met in OPERAS’ office and online to celebrate the project whi...

💡 @graphiaproject.bsky.social aims to create the first comprehensive Social Science and Humanities (SSH) Knowledge Graph. #OpenCitations is one of its data sources, and our UNIBO personnel are involved in developing new tools for data extraction from PDFs: opencitations.hypotheses.org/3869

The new logo of #OpenCitations is an evolution of the elements of the old one and embodies our values of openness, curiosity, and innovation:
Letter O = “Open”;
Letter C = “Citations”;
Eye = curiosity driving research;
<> = the semantic web technologies.
👉 opencitations.hypotheses.org/3797

THE NAME OF OPENCITATIONS' NEW LOGO
Have you already become familiar with the mascot of our new logo?
We have decided to name it “QUIRIN”, a name describing the four pillars of #OpenCitations services: QUality; Integration; Reusability and INteroperability: opencitations.hypotheses.org/3797

Logo of OpenCitations, eye with citation marks

Our old OC logo has served us well for many years, but we needed a design to reflect the present and future of #OpenCitations. The new logo’s uniqueness has made it possible to register it as a trademark! And...we even gave it a name: opencitations.hypotheses.org/3797

Blue cloud and question mark

What's happening?  
Friday, 14 February 2025, 11:00 CET. Let's unveil together.
#OpenCitations  
.
.
@openaire.bsky.social @scossfunding.bsky.social @investinopen.bsky.social @barcelonadori.bsky.social @dblp.org @pkp.sfu.ca @operaseu.bsky.social

Railways and the title Wheels are in Motion

For #OpenCitations, it has become a tradition to take a moment at the end of January to reflect on the achievements of the past year and plan for the future with a clearer mind. Revisit the key steps of 2024 with us in the new post on #OpenCitationsBlog: opencitations.hypotheses.org/3774

In the last few days, many old and new friends have started following #OpenCitations, and we're happy to meet you again here!
🔎If you missed the announcement, consider joining us at #WOOC2025. We hope to welcome many of you to Bologna! The call for participation and contributions is now open 👇

People from OpenCitations Team standing for a group photo

While the pots are already on the stove for the Christmas Eve Dinner, the #OpenCitations Team wants to wish✨Happy Holidays✨to our supporters, partners, colleagues, and anyone who has accompanied us in this successful 2024! Looking forward to engaging with you in #2025 🎆

@essepuntato.bsky.social

[Note] The 18th c@fé Renatis "Feedback on the use of @OpenAlex_org at @INIST_CNRS" will take place on Dec. 10 at 13:30 ➡️ renatis.cnrs.fr/event/18e-cfe-renatis-re...
#openscience #searchengines #scientometrics #opencitations #PID #metadata #INIST #France #IST

https://openscience.unimib.it/2024/10/17/opencitations-uninfrastruttura-che-vive-grazie-alla-comunita-ed-e-per-la-comunita/

Thank you Open Science Università di Milano-Bicocca for dedicating your monthly "Speakers' Corner" to #OpenCitations, with a detailed interview with our Director @essepuntato.bsky.social
👉Read the interview (ITA only) here:
openscience.unimib.it/2024/10/17/o...
