Towards transparency and knowledge exchange in AI-assisted data analysis code generation


Generative artificial intelligence (AI), and large language models (LLMs) in particular, are changing the way we do data science. Most prominently, scientists use the technology for interacting with scientific data [1], answering data analysis questions [2,3], generating data analysis code [4,5,6], and (re-)writing scientific manuscripts [7]. Unfortunately, the prompts sent to LLMs are usually not preserved, so at the time of publication it can be hard to distinguish human-made from AI-generated parts of the scientific work. Contemporary scientific culture has no established professional peer-review system for documenting how LLM-generated code was prompted and which human reviewed it. Such systems do exist, however, for collaborative code editing involving multiple humans. For example, the source code hosting platforms GitHub and GitLab are well established in the open-source software community for discussing issues and potential solutions, building code together, and peer-reviewing content. Because LLMs have been shown to solve real-world GitHub issues [8], developing an AI-assistant that interacts with humans directly within the GitHub platform is the obvious next step. Here, I present git-bob, a GitHub/GitLab integration of an LLM-based AI-assistant that can respond to GitHub issues, discuss potential solutions with humans iteratively, write code for them, and submit it as a pull request to be reviewed by humans.
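The workflow just described (a shared issue thread, an LLM-written solution, and a pull request reviewed by a human) can be sketched in a few lines of Python. This is an illustrative sketch only and not git-bob's implementation: the classes `Comment`, `Issue` and `PullRequest`, the functions `build_prompt`, `propose_solution` and `echo_llm`, and the `LLM` callable type are all hypothetical names introduced here, and no real GitHub or LLM API is called. The point it tries to show is that the prompt, the requesting human and the reviewing human can all be stored next to the generated code.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical connector type: any function mapping a prompt to generated
# text can stand in for an LLM service. This is not git-bob's real interface.
LLM = Callable[[str], str]


@dataclass
class Comment:
    author: str
    body: str


@dataclass
class Issue:
    title: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class PullRequest:
    title: str
    code: str
    prompt: str        # the exact prompt is kept alongside the generated code
    requested_by: str  # the human who wrote the last comment in the thread
    approved_by: List[str] = field(default_factory=list)

    def approve(self, reviewer: str) -> None:
        """Record which human reviewed the generated code."""
        self.approved_by.append(reviewer)


def build_prompt(issue: Issue) -> str:
    """Flatten the whole multi-author thread into a single prompt."""
    lines = [f"Issue: {issue.title}"]
    lines += [f"{c.author}: {c.body}" for c in issue.comments]
    return "\n".join(lines)


def propose_solution(issue: Issue, llm: LLM) -> PullRequest:
    """Ask the LLM for code and wrap it in a pull request awaiting review."""
    prompt = build_prompt(issue)
    requester = issue.comments[-1].author if issue.comments else "unknown"
    return PullRequest(
        title=f"Fix: {issue.title}",
        code=llm(prompt),
        prompt=prompt,
        requested_by=requester,
    )


def echo_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM call, so the sketch runs anywhere."""
    n_lines = len(prompt.splitlines())
    return f"# generated from a prompt of {n_lines} lines\nprint('hello')"


issue = Issue(
    "Count nuclei in image",
    [
        Comment("biologist", "I need a cell count."),
        Comment("analyst", "Please use connected-component labelling."),
    ],
)
pr = propose_solution(issue, echo_llm)
pr.approve("analyst")
print(pr.prompt)
print("approved by:", pr.approved_by)
```

Swapping `echo_llm` for a function that calls an actual LLM service would leave the rest of the sketch unchanged, which is one simple way to keep the choice of provider separate from the record of who prompted and who reviewed.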

git-bob is technically similar to various online services for data analysis, such as the OpenAI ChatGPT Data Analyst or GitHub Copilot workflows, with three major differences. First, multiple humans can interact with git-bob in one communication thread. This brings domain specialists, such as life scientists, together with data analysts and the AI-assistant in one discussion, stimulating knowledge exchange on how to interact properly with the AI-assistant. Second, discussions with git-bob and the resulting code modifications are preserved on an online platform that others can read and follow, making the interaction with the AI-assistant fully transparent. Third, git-bob is completely open source and extensible. Other developers can read its built-in system prompts and modify them to their needs. Developers can also implement custom connectors to other LLM service providers and write plugins for their custom AI agents, which may deal with GitHub issues differently.

CODE AVAILABILITY

The complete source code of git-bob is available online at GitHub [11]: https://github.com/haesleinhuepf/git-bob

REFERENCES

1. Royer, L. A. _Nat. Methods_ 20, 951–952 (2023).
2. Lai, Y. et al. Preprint at https://arxiv.org/abs/2211.11501 (2022).
3. Lei, W. et al. _Nat. Methods_ 21, 1368–1370 (2024).
4. Royer, L. A. _Nat. Methods_ 21, 1371–1373 (2024).
5. Haase, R., Tischer, C., Hériché, J.-K. & Scherf, N. Preprint at _bioRxiv_ https://doi.org/10.1101/2024.04.19.590278 (2024).
6. Chen, M. et al. Preprint at https://arxiv.org/abs/2107.03374 (2021).
7. Lu, C. et al. Preprint at https://arxiv.org/abs/2408.06292 (2024).
8. Jimenez, C. E. et al. Preprint at https://arxiv.org/abs/2310.06770 (2024).
9. Yin, Z. et al. Preprint at https://arxiv.org/abs/2305.18153 (2023).
10. About GitHub-hosted runners. _GitHub_ https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners (accessed 14 October 2024).
11. Haase, R. git-bob. _GitHub_ https://github.com/haesleinhuepf/git-bob (2024).

ACKNOWLEDGEMENTS

I would like to thank E. K. Nicolay (UFZ Leipzig) and M. Lampert (TU Dresden) for testing git-bob in its early days and for providing constructive feedback on the manuscript. I would also like to thank V. Hilsenstein for pushing for GitLab interoperability. I acknowledge the financial support by the Federal Ministry of Education and Research of Germany and by Sächsische Staatsministerium für Wissenschaft, Kultur und Tourismus in the programme Center of Excellence for AI-research “Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig”, project identification number: ScaDS.AI. I also acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under the National Research Data Infrastructure – NFDI 46/1 – 501864659 - NFDI4BioImage.

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Data Science Center, Leipzig University, Leipzig, Germany: Robert Haase
* Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden / Leipzig, Leipzig, Germany: Robert Haase

CORRESPONDING AUTHOR

Correspondence to Robert Haase.

ETHICS DECLARATIONS

COMPETING INTERESTS

The author declares no competing interests.

PEER REVIEW

PEER REVIEW INFORMATION

_Nature Computational Science_ thanks Virginie Uhlmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

SUPPLEMENTARY INFORMATION

Supplementary Figures 1–6.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Haase, R. Towards transparency and knowledge exchange in AI-assisted data analysis code generation. _Nat Comput Sci_ 5, 271–272 (2025). https://doi.org/10.1038/s43588-025-00781-1

* Published: 27 March 2025
* Issue Date: April 2025
* DOI: https://doi.org/10.1038/s43588-025-00781-1
