This is the PDF of the GenABEL tutorial, a book on how to use the GenABEL package and several other tools from the GenABEL suite.
|Published (Last):||17 February 2005|
|PDF File Size:||2.17 Mb|
|ePub File Size:||11.47 Mb|
|Price:||Free* [*Free Registration Required]|
LCK drafted the initial version of the manuscript and analyzed the data. All authors contributed to the review of the manuscript and agreed to the final content.

Free and open source software is usually developed by a community of people with an interest in the tool. For scientific software, however, this is less often the case. Most scientific software is written by only a few authors, often a student working on a thesis. Once the paper describing the tool has been published, the tool is no longer developed further and is left to its own devices.
Here we describe the broad, multidisciplinary community we formed around a set of tools for statistical genomics. The GenABEL project for statistical omics actively promotes open interdisciplinary development of statistical methodology and its implementation in efficient and user-friendly software under an open source licence.
The software tools developed within the project collectively make up the GenABEL suite, which currently consists of eleven tools. The open framework of the project actively encourages involvement of the community in all stages, from formulation of methodological ideas to application of software to specific data sets. A web forum is used to channel user questions and discussions, further promoting the use of the GenABEL suite.
Developer discussions take place on a dedicated mailing list, and development is further supported by robust development practices, including the use of public version control, code review and continuous integration.
The field of statistical genomics lies at the heart of current research into the genetic aetiology of human disease and personalized or precision medicine 1. Genome-wide association studies (GWAS), genotype imputation and next-generation sequencing (NGS) are just a few of the techniques used in this field, which is driven by increasingly large data sets 2, 3. With the advent of polyphenotype analysis as is now customary in e.
In recent years, scientists and funding organizations alike have come to realize that in order to successfully tackle the challenges of the field, close collaboration between various disciplines, e.
Unfortunately, creators of scientific software are usually not funded to actively build such a community. Moreover, our experience shows that once the peer-reviewed article describing a tool has been published, funding and time to continue development and support of that tool are usually limited or non-existent, and consequently the tool often slowly fades into oblivion.
The GenABEL Project for statistical genomics
Clearly, this amounts to a waste of effort and money. The GenABEL project aims to provide a framework for collaborative, sustainable, robust, transparent, open-source-based development of statistical genomics methodology.
Within the project, statisticians devoted to method development work together with statistical geneticists and biologists to refine existing statistical methods as well as develop new ones and make them applicable to genomic analysis. With the help of computer scientists and scientific software developers these mathematical models are then implemented into efficient and user-friendly software.
This flow of work and information is not linear but circular, with information and feedback continuously transferred between the various layers, as depicted in Figure 1. In short, it is a form of agile, community-driven development 11. It enables a free flow of information between the layers of the project, resulting in rapid feedback between the various levels. Not only do we require that all tools are released under an open source or free software licence like the GNU General Public Licence (GPL), we also try to create an atmosphere of open communication using public mailing lists and web forums (see the sections Interaction with the user community and Development infrastructure below).
Moreover, because of this openness, results of the project i. Many tools are R packages; however, this is not a requirement for inclusion in the suite. Any software that is related to the field of statistical genomics is welcome (technical requirements are discussed in the section Development infrastructure). Currently, the suite consists of 11 officially released tools (cf. Table 1) and two that are in beta stage.
The collaborative nature of the project is demonstrated by the GenABEL package, which implements several statistical methods developed within the framework, including approximate mixed models 21–23 and various methods for genomic control 24. This shows that the project is really a platform for the implementation of statistical methods, one that removes the burden of dealing with data formats and similar practicalities.
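Genomic control corrects association test statistics for inflation caused by, for example, population stratification. As a rough illustration of the basic idea, and not the GenABEL package's own implementation, the sketch below estimates the inflation factor λ as the median of the observed 1-df chi-square statistics divided by the median of the χ²₁ distribution, then divides every statistic by λ. The function names, and the convention of flooring λ at 1, are assumptions made for this example.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom:
# if Z ~ N(0, 1) then Z**2 ~ chi2(1), so the median is (Phi^-1(0.75))**2 (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def gc_lambda(chi2_stats):
    """Hypothetical helper: inflation factor = observed median / expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats):
    """Hypothetical helper: divide each 1-df chi-square statistic by lambda.

    Assumption: lambda is floored at 1 so that deflated results are left unchanged.
    """
    lam = max(gc_lambda(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats], lam


if __name__ == "__main__":
    stats = [0.2, 0.9, 1.4, 3.1, 0.6, 12.0]  # made-up example statistics
    corrected, lam = gc_correct(stats)
    print(f"lambda = {lam:.3f}")
```

With genome-wide data the median is taken over hundreds of thousands of statistics, which makes λ robust to the few true associations in the tail.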
Like the GenABEL package, ProbABEL allows running linear or logistic regression as well as Cox proportional hazards models, but it is tailored to the large file sizes that are inherent to data sets with approximately 30 million imputed genotypes per individual. It is the second most-used tool from the suite, with more than citations according to Google Scholar. As indicated by its name, MixABEL is an R package for running genome-wide association analyses using mixed models for quantitative traits.
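The core computation behind these regression analyses can be illustrated for a single imputed SNP. The sketch below is a deliberately simplified example under stated assumptions, not ProbABEL's implementation or interface. It converts imputation probabilities for the genotypes AA, AB and BB into an allele dosage (the expected number of B alleles) and fits the additive linear model phenotype = α + β·dosage by ordinary least squares, with no covariates. The function names are hypothetical.

```python
import math


def dosage(p_aa, p_ab, p_bb):
    """Expected count of allele B from genotype probabilities (p_aa contributes 0)."""
    return p_ab + 2.0 * p_bb


def snp_linear_regression(dosages, phenotype):
    """OLS fit of phenotype = alpha + beta * dosage; returns (beta, se_beta)."""
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(dosages) / n
    y_mean = sum(phenotype) / n
    sxx = sum((x - x_mean) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("dosage is constant; beta is not estimable")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    sigma2 = rss / (n - 2)
    return beta, math.sqrt(sigma2 / sxx)
```

For example, `snp_linear_regression([dosage(*p) for p in probs], trait)` fits one SNP. A real analysis would add covariates, handle missing data and loop over millions of SNPs. Using the dosage instead of a hard-called genotype is a common way to retain part of the imputation uncertainty.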
GWAS usually involve meta-analysis of the regression results of various cohorts. Such heterogeneity is an indication of interaction between a genetic marker and either another marker or an unknown factor 16. OmicABEL uses a high-performance-computing-based approach that enables extremely fast mixed-model-based regression of multiple omics traits, such as metabolomics or lipidomics measurements, on imputed genotype data. It aims to increase computational throughput while reducing memory usage and energy consumption.
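Meta-analysis of per-cohort regression results is commonly done with a fixed-effect, inverse-variance-weighted model. The function below is a generic sketch, not the code of any specific GenABEL suite tool. It pools the effect estimates for one SNP and reports Cochran's Q and I², which measure how much the cohort estimates differ beyond what sampling error would explain. The function name is hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of per-cohort estimates.

    Returns (pooled_beta, pooled_se, cochran_q, i_squared).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations of cohort estimates from the pooled one.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-cohort heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2
```

Random-effects models are the usual alternative when I² is large. The fixed-effect version is shown here only because it is the simplest.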
This was achieved by using optimal, hardware-tailored algorithms built on state-of-the-art linear algebra kernels, incorporating optimizations and avoiding redundant computations.

It includes functions to compute univariate and multivariate odds ratios of the predictors, the area under the receiver operating characteristic (ROC) curve (AUC), the Hosmer-Lemeshow goodness-of-fit test, a reclassification table, the net reclassification improvement and the integrated discrimination improvement.

It is a computationally efficient solution for screening general forms of CH alleles in densely imputed microarray or whole-genome sequencing data sets.

DatABEL is an R interface to our filevector library, which provides a file format that is optimised for fast access to data in matrix form, e.
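Of the risk prediction measures listed above, the AUC has a particularly simple interpretation. It is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counting one half. The plain-Python sketch below computes it directly from that definition (the Mann-Whitney formulation). It only illustrates the statistic and does not reproduce the API of the package described here. The function name is hypothetical.

```python
def auc(risk_scores, outcomes):
    """Area under the ROC curve by pairwise comparison of cases and controls.

    outcomes: 1 for cases, 0 for controls. This is O(n_cases * n_controls),
    which is fine for a sketch; rank-based formulas scale better.
    """
    cases = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    controls = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0  # case ranked above control
            elif c == k:
                score += 0.5  # ties count one half
    return score / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation.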
The source code for the packages can be downloaded from our website at http: The GenABEL project website is the central hub that points to package descriptions, tutorials, the development website and other information for potential and existing users and developers.
Usage statistics, such as the number of visits and the country of origin of visitors, are monitored using Google Analytics http: As an example of the information that can be obtained from these data, Figure 2 shows the top cities of origin of the visitors of the GenABEL website in the period of 28 April till 28 April The website was visited times in that period, of which visits were from an unknown city.
Only visits lasting more than 60 seconds and from cities from which more than 15 visits originated were taken into account. The total number of visits in that period was, of which came from unknown cities.
Each city name is followed by the two-letter ISO code of the country in which it is located.
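The filtering applied to the visitor data can be expressed in a few lines. The sketch below makes several assumptions. It uses hypothetical per-visit records of (city, ISO country code, duration in seconds), whereas the exported Analytics data are per-city aggregates. It marks unknown cities with `None`. It applies the 15-visit threshold only after discarding visits of 60 seconds or less, which the text does not state explicitly.

```python
from collections import Counter

MIN_DURATION_S = 60  # keep only visits lasting more than 60 seconds
MIN_VISITS = 15      # keep only cities with more than 15 such visits


def visits_per_city(visits, min_duration=MIN_DURATION_S, min_visits=MIN_VISITS):
    """Count qualifying visits per (city, ISO code) and apply both thresholds.

    visits: iterable of (city, iso_code, duration_seconds) tuples (hypothetical format).
    Visits from an unknown city (city is None) are excluded.
    """
    counts = Counter(
        (city, iso)
        for city, iso, duration in visits
        if city is not None and duration > min_duration
    )
    return {key: n for key, n in counts.items() if n > min_visits}


def figure_label(city, iso):
    """City name followed by its two-letter ISO country code, as in the figure."""
    return f"{city} ({iso})"
```

Both thresholds use strict inequalities to match the wording "more than 60 seconds" and "more than 15 visits".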
The GenABEL Tutorial
Collecting visitor data like this provides insight into the institutes that use software from the GenABEL suite, which can then be used to show the impact the tools have, e. Interaction with the user community is done via social media like Twitter https: Each tool in the GenABEL suite has its own documentation, and the GenABEL Tutorial 27, with more than pages, takes the user from learning basic R to performing more complicated analyses, showing how the various packages interconnect.
Moreover, several video tutorials are available online.
Interactive user support is mostly done through our forum http: Having an open forum serves various purposes. First of all, it is a central reference that is easy to point to. Moreover, compared to individual users e-mailing a package author, who may be on holiday or otherwise unavailable, an open forum where users and developers collaborate helps shorten the time-to-answer.
Furthermore, having an active forum where users can help each other allows the developers to focus on fixing bugs and implementing new features. These users have contributed posts in topics, with an average 7. The first hurdle many users of scientific software encounter is the installation process. Other packages are planned to be added before the end of The GenABEL project welcomes contributions of all sorts, from new tools to fixing spelling errors in the documentation, to bug reports and feature requests.
To this end, all program code and documentation are stored either in a publicly readable instance of the Subversion version control system, with write access limited to a group of core contributors, or on GitHub https: These version control systems record every change to the files, so that changes can easily be reviewed and reverted if necessary 7, 29. In November a mailing list was created as a central place for development discussions.
As of April this list has 34 subscribers. Currently, a total of 94 bugs have been submitted to the bug trackers on R-Forge and GitHub since their respective openings. Of these 94, 12 were directly contributed by people outside the core team of developers.
In order to maintain the quality of both old and new software in the GenABEL suite, prospective tools go through a review process in which both the functional quality of the code is evaluated (does the tool do what it intends to do?) Therefore, the community has the option to mark a tool as obsolete, warning the user that bugs will no longer be fixed and support is limited or non-existent. We have started to use a Jenkins continuous integration server. Using Jenkins various tests e.
Consequently, changes that break existing functionality are detected at an early stage, thus leading to more stable software releases.
The original publication of the GenABEL package for statistical analysis of genotype data 10 has led to the evolution of a community which we now call the GenABEL project, which brings together scientists, software developers and end users with the central goal of making statistical genomics work by openly developing and subsequently implementing statistical models into user-friendly software.
The project has benefited from an open development model, facilitating communication and code sharing between the parties involved. The use of a free software licence for the tools in the GenABEL suite promotes quick uptake and widespread dissemination of new methodologies and tools. Moreover, public access to the source code is an important ingredient for active participation by people from outside the core development team and is paramount for reproducible research.
Feedback from end users is actively encouraged through a web forum, which is steadily growing into a knowledge base with a multitude of answered questions. Furthermore, our open development process has resulted in transparent development of methods and software, including public code review, a large fraction of bugs being submitted by members of the community, and quick incorporation of bug fixes.
Therefore, these were counted manually. The file Analytics www. The columns contain the ISO code of the country, city, number of sessions, number of new viewers, bounce percentage, pages per session and average session duration, respectively. The code contained in the Org mode file and the data in the csv files listed above are in the public domain Creative Commons CC0 license and can be used without restriction. An up-to-date list of the packages in the suite can be found on http: Archived source code at the time of publication https: The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
This is a large and impressive project that contains packages that are routinely used by researchers studying genetics. Indeed, this is reflected by more than citations according to Google Scholar for the original GenABEL paper published in However, the 11 packages presented in this paper have been published previously in some shape or form, albeit presumably not in their most recent version.
It is therefore tempting to question the novelty of the current paper, which summarizes the GenABEL project and describes its user and developer community. Nevertheless, despite the limited amount of novel scientific ideas or results in the current manuscript, the authors have clearly put a lot of work into creating a very impressive interactive user and developer community. Furthermore, the paper is well written and it will undoubtedly be highly cited by future researchers.
Lastly, to reiterate, GenABEL is a very impressive large-scale project that is heavily used by the community! I therefore think this is overall a nice publication that is certainly suitable for indexing.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. This very well written article describes the GenABEL project for statistical genomics and highlights the great success of the project, which over the years has led to the creation of an actual scientific community spread across several countries worldwide.