Abstract
Graphical abstract

Keywords
1. Introduction


Strain | Type | Sequences |
---|---|---|
129S1/SvImJ | IGKV | 91 |
129S1/SvImJ | IGLV | 3 |
A/J | IGKV | 102 |
A/J | IGLV | 3 |
AKR/J | IGKV | 85 |
AKR/J | IGLV | 3 |
BALB/c | IGHV | 164 |
BALB/c/ByJ | IGLV | 3 |
BALB/c/ByJ | IGKV | 98 |
C3H/HeJ | IGKV | 96 |
C3H/HeJ | IGLV | 3 |
C57BL/6 | IGHV | 102 |
C57BL/6J | IGKV | 91 |
C57BL/6J | IGLV | 3 |
CAST/EiJ | IGKV | 88 |
CAST/EiJ | IGLV | 9 |
CBA/J | IGKV | 82 |
CBA/J | IGLV | 3 |
DBA/1J | IGKV | 104 |
DBA/1J | IGLV | 3 |
DBA/2J | IGKV | 100 |
DBA/2J | IGLV | 3 |
LEWES/EiJ | IGKV | 87 |
LEWES/EiJ | IGLV | 4 |
MRL/MpJ | IGKV | 72 |
MRL/MpJ | IGLV | 3 |
MSM/MsJ | IGKV | 83 |
MSM/MsJ | IGLV | 5 |
NOD/ShiLtJ | IGKV | 62 |
NOD/ShiLtJ | IGLV | 3 |
NOR/LtJ | IGKV | 80 |
NOR/LtJ | IGLV | 3 |
NZB/BlNJ | IGKV | 105 |
NZB/BlNJ | IGLV | 3 |
PWD/PhJ | IGKV | 89 |
PWD/PhJ | IGLV | 3 |
SJL/J | IGKV | 67 |
SJL/J | IGLV | 3 |
2. Results
- A schema that enables germline sets to contain rich information, and, importantly, can ensure that an identified germline sequence can be tracked through time, even if its name changes in the light of new information (Section 2.1).
- Tooling that supports germline allele review, and the publication and use of germline sets that follow the schema (Section 2.2).
- A community approach that allows researchers to co-operate in the development of germline sets in their species of interest, utilising the functionality provided by the schema and tooling (Section 2.3).

2.1 A schema and terminology for gene and allele curation
- Sequence: A sequence of nucleic acids that was observed in or inferred from a single individual.
- Allele: A known – but potentially unmapped – region within the genome of at least one individual.
- Label: The name by which an allele is referred to in a germline set.
- Gene: A defined and mapped region within the genome of a species that groups alleles based on a shared, single ancestry.
- Genotype: The collection of all alleles of a locus from a single individual, which may contain partial or complete phasing information allowing the alleles to be mapped to chromosomes.
- Germline set: A curated collection of genes and alleles of a single locus of a species, which may be restricted to genes and alleles of certain populations within the species.
- Locus: Used to distinguish the chromosomal regions in which IG/TR genes are located (e.g., IGH, TRB).
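
The relationship between these terms can be sketched as a small data model. The following is a minimal illustration in Python, not the AIRR schema itself: the class names `Allele` and `GermlineSet`, the `previous_labels` field, and the example labels and sequence are all hypothetical. It shows how a lookup by an older label can still resolve to the same allele after its name changes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Allele:
    """One allele entry in a germline set (hypothetical structure)."""
    label: str                       # name used for the allele in this set
    sequence: str                    # nucleotide sequence
    gene: Optional[str] = None       # None while the allele is unmapped
    previous_labels: List[str] = field(default_factory=list)

    def rename(self, new_label: str) -> None:
        """Change the label, remembering the old one for traceability."""
        self.previous_labels.append(self.label)
        self.label = new_label


@dataclass
class GermlineSet:
    """A curated collection of alleles for one locus of one species."""
    species: str
    locus: str
    alleles: List[Allele] = field(default_factory=list)

    def find(self, name: str) -> Optional[Allele]:
        """Return the allele known by `name`, now or in the past."""
        for allele in self.alleles:
            if name == allele.label or name in allele.previous_labels:
                return allele
        return None


# Illustrative values only; these are not real allele names or sequences.
gset = GermlineSet(species="Mus musculus", locus="IGK")
allele = Allele(label="IGKV-TEMP", sequence="ACGTACGT")
gset.alleles.append(allele)
allele.rename("IGKV-NEW")
```

After the rename, both `gset.find("IGKV-TEMP")` and `gset.find("IGKV-NEW")` return the same object, which is the traceability property the definitions are meant to support.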

- Researchers can obtain current germline sets that support the best possible analysis of their data at that point in time.
- Researchers can refer to genes and alleles in publications using a defined nomenclature that minimises the possibility of ambiguity and supports traceability over time.
- Researchers can use germline sets to annotate AIRR-seq repertoires with the likely V, D and J alleles underlying each read in the repertoire, and can examine germline sets to understand the supporting evidence underlying each listed allele.
- Researchers can load germline sets into tools used to annotate AIRR-seq repertoires and other software at a keystroke, without manipulation.
- Software tools can produce haplotypes (i.e., fully phased genotypes) and personalised germline sets (sets containing just those alleles discovered in a single individual) in the same standard format.
- Repositories can publish the germline sets that were used alongside annotated repertoires, enhancing transparency and reproducibility.
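
Two of the use cases above, loading a germline set into an annotation tool and deriving a personalised germline set, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the JSON layout and the field names `allele_descriptions`, `label` and `sequence` are assumed here and should be checked against the published schema, and the function names and example records are hypothetical.

```python
import json

# Hypothetical germline set document; field names are assumptions.
EXAMPLE = json.loads("""
{
  "species": "Mus musculus",
  "locus": "IGK",
  "allele_descriptions": [
    {"label": "ALLELE-A", "sequence": "ACGTAC"},
    {"label": "ALLELE-B", "sequence": "TTGACA"}
  ]
}
""")


def germline_set_to_fasta(germline_set: dict) -> str:
    """Render every allele as a FASTA record keyed by its label."""
    records = [
        f">{allele['label']}\n{allele['sequence']}\n"
        for allele in germline_set["allele_descriptions"]
    ]
    return "".join(records)


def personalised_set(germline_set: dict, observed_labels: set) -> dict:
    """Return a copy of the set restricted to alleles seen in one individual."""
    kept = [
        allele
        for allele in germline_set["allele_descriptions"]
        if allele["label"] in observed_labels
    ]
    # Same format as the input, so downstream tools need no special handling.
    return {**germline_set, "allele_descriptions": kept}


fasta = germline_set_to_fasta(EXAMPLE)
personal = personalised_set(EXAMPLE, {"ALLELE-B"})
```

Because `personalised_set` returns a document of the same shape as its input, the personalised set can be passed straight back to `germline_set_to_fasta`, mirroring the aim that full and personalised sets share one standard format.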
2.1.1 The AIRR germline set schema

2.1.2 Temporary nomenclature
IETF: RFC 4648 (Base-N Encodings), (n.d.). https://www.ietf.org/rfc/rfc4648.txt (accessed May 1, 2022).
2.2 Supporting tools
2.2.1 IgLabel - A tool for managing the allocation of temporary labels
2.2.2 OGRDB - A system for managing and publishing germline sets
2.2.3 AIRR Data Commons
2.3 A community approach to curation
- Groups should be open to all researchers working with a particular species or locus. The AIRR-C Working Groups (which are open to non-members) provide a non-exclusive solution.
- Overlap should be discouraged, i.e., where possible, there should be just one community group working on each locus in a given species. In general, the small number of interested researchers should make it easy to avoid overlap; however, the schema provides approaches to nomenclature that can be used to coordinate parallel efforts where necessary, for example by storing the list of allocated labels in a commonly accessible and versioned repository.
- Groups should be free to determine the evidence and approach to review that best suits the overall aim of creating the best available germline set from the resources available, bearing in mind that the resources will vary considerably between species, and that the approach may vary considerably between inbred and outbred species or strains.
- Decision-making, supporting evidence, and review criteria should be documented and transparent. The schema provides versioning and links to records in primary repositories to support this.
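
The idea of a shared list of allocated labels can be illustrated with a small registry. The sketch below is not the algorithm used by IgLabel; it assumes, for illustration only, that a label is a random four-character string drawn from the Base32 alphabet of RFC 4648, and that the registry maps each sequence to exactly one label. The class name `LabelRegistry` and its methods are hypothetical.

```python
import secrets

# Base32 alphabet as defined in RFC 4648 (A-Z followed by 2-7).
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


class LabelRegistry:
    """Maps sequences to short labels; the label length is an assumption."""

    def __init__(self, allocated=None, length=4):
        # `allocated` would be loaded from the shared, versioned list.
        self.by_sequence = dict(allocated or {})
        self.length = length

    def label_for(self, sequence: str) -> str:
        """Return the existing label for a sequence, or allocate a new one."""
        sequence = sequence.upper()
        if sequence in self.by_sequence:
            return self.by_sequence[sequence]
        used = set(self.by_sequence.values())
        while True:
            label = "".join(
                secrets.choice(BASE32_ALPHABET) for _ in range(self.length)
            )
            if label not in used:  # never hand out the same label twice
                break
        self.by_sequence[sequence] = label
        return label


registry = LabelRegistry(allocated={"ACGT": "ABCD"})
existing = registry.label_for("acgt")   # already allocated
fresh = registry.label_for("TTGACA")    # newly allocated
```

Keeping the mapping in one versioned file means two groups working in parallel can merge their allocations and detect a clash before a label is published, rather than after.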
3. Discussion
Author contributions
Funding
Data availability
- Data is publicly available at the links for software provided in the manuscript.
Declaration of Competing Interest
Acknowledgments
Appendix. Supplementary materials
S2: AIRR Community member endorsing this publication
References
- IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucl Acids Res. 2015; 43: D413-D422. https://doi.org/10.1093/nar/gku1056
- VBASE2, an integrative V gene database. Nucl Acids Res. 2005; 33: D671-D674. https://doi.org/10.1093/nar/gki088
- Addressing IGHV gene structural diversity enhances immunoglobulin repertoire analysis: lessons from rhesus Macaque. Front Immunol. 2022; 13 (accessed May 1, 2022)
- Commentary on Population matched (pm) germline allelic variants of immunoglobulin (IG) loci: relevance in infectious diseases and vaccination studies in human populations. Genes Immun. 2021; 22: 335-338. https://doi.org/10.1038/s41435-021-00152-6
- A BALB/c IGHV reference set, defined by haplotype analysis of long-read VDJ-C sequences from F1 (BALB/c /C57BL/6) mice. Front Immunol. 2022; 13
- Ability to develop broadly neutralizing HIV-1 antibodies is not restricted by the germline Ig gene repertoire. J Immunol. 2015; 194: 4371-4378. https://doi.org/10.4049/jimmunol.1500118
- Segmental duplication as one of the driving forces underlying the diversity of the human immunoglobulin heavy chain variable gene region. BMC Genom. 2011; 12: 78. https://doi.org/10.1186/1471-2164-12-78
- Worldwide genetic variation of the IGHV and TRBV immune receptor gene families in humans. Life Sci Alliance. 2019; 2: e201800221. https://doi.org/10.26508/lsa.201800221
- Using de novo assembly to identify structural variation of eight complex immune system gene regions. PLoS Comput Biol. 2021; 17: e1009254. https://doi.org/10.1371/journal.pcbi.1009254
- Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation. Am J Hum Genet. 2013; 92: 530-546. https://doi.org/10.1016/j.ajhg.2013.03.004
- Polymorphism and utilization of human VH Genes. Ann N Y Acad Sci. 1995; 764: 50-61. https://doi.org/10.1111/j.1749-6632.1995.tb55806.x
- IMPre: an accurate and efficient software for prediction of T- and B-cell receptor germline genes and alleles from rearranged repertoire data. Front Immunol. 2016; 7. https://doi.org/10.3389/fimmu.2016.00457
- Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity. Nat Commun. 2016; 7: 13642. https://doi.org/10.1038/ncomms13642
- Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data. PLoS Comput Biol. 2019; 15: e1007133. https://doi.org/10.1371/journal.pcbi.1007133
- LymAnalyzer: a tool for comprehensive analysis of next generation sequencing data of T cell receptors and immunoglobulins. Nucl Acids Res. 2016; 44: e31. https://doi.org/10.1093/nar/gkv1016
- Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles. Proc Natl Acad Sci U S A. 2015; 112: E862-E870. https://doi.org/10.1073/pnas.1417683112
- Inferred allelic variants of immunoglobulin receptor genes: a system for their evaluation, documentation, and naming. Front Immunol. 2019; 10. https://doi.org/10.3389/fimmu.2019.00435
- Novel allele detection tool benchmark and application with antibody repertoire sequencing dataset. Front Immunol. 2021; 12: 739179. https://doi.org/10.3389/fimmu.2021.739179
- Critical steps for computational inference of the 3’-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7. Mol Immunol. 2018; 103: 1-6. https://doi.org/10.1016/j.molimm.2018.08.018
- Parallel antibody germline gene and haplotype analyses support the validity of immunoglobulin germline gene inference and discovery. Mol Immunol. 2017; 87: 12-22. https://doi.org/10.1016/j.molimm.2017.03.012
- High-quality library preparation for NGS-Based immunoglobulin germline gene inference and repertoire expression analysis. Front Immunol. 2019; 10: 660. https://doi.org/10.3389/fimmu.2019.00660
- Structure and diversity of the rhesus macaque immunoglobulin loci through multiple de novo genome assemblies. Front Immunol. 2017; 8: 1407. https://doi.org/10.3389/fimmu.2017.01407
- Sequence and characterization of the Ig heavy chain constant and partial variable region of the mouse strain 129S1. J Immunol. 2007; 179: 2419-2427. https://doi.org/10.4049/jimmunol.179.4.2419
- Slow delivery immunization enhances HIV neutralizing antibody and germinal center responses via modulation of immunodominance. Cell. 2019; 177 (e28): 1153-1171. https://doi.org/10.1016/j.cell.2019.04.012
- A comparison of immunoglobulin IGHV, IGHD and IGHJ genes in wild-derived and classical inbred mouse strains. Immunol Cell Biol. 2019; 97: 888-901. https://doi.org/10.1111/imcb.12288
- Immunoglobulin light chain gene rearrangements, receptor editing and the development of a self-tolerant antibody repertoire. Front Immunol. 2018; 9: 2249. https://doi.org/10.3389/fimmu.2018.02249
- Kos J.T., Safonova Y., Shields K.M., Silver C.A., Lees W.D., Collins A.M., et al. Characterization of extensive diversity in immunoglobulin light chain variable germline genes across biomedically important mouse strains. bioRxiv 2022:489089. doi:10.1101/2022.05.01.489089.
- Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat Genet. 2018; 50: 1574-1583. https://doi.org/10.1038/s41588-018-0223-8
- OGRDB: a reference database of inferred immune receptor genes. Nucl Acids Res. 2019. https://doi.org/10.1093/nar/gkz822
- Rhesus and cynomolgus macaque immunoglobulin heavy-chain genotyping yields comprehensive databases of germline VDJ alleles. Immunity. 2021. https://doi.org/10.1016/j.immuni.2020.12.018
- Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility. Science. 2020; 370: eabc6617. https://doi.org/10.1126/science.abc6617
- IMGT® biocuration and analysis of the rhesus monkey IG Loci. Vaccines. 2022; 10: 394. https://doi.org/10.3390/vaccines10030394
- A novel framework for characterizing genomic haplotype diversity in the human immunoglobulin heavy chain locus. Front Immunol. 2020; 11: 2136. https://doi.org/10.3389/fimmu.2020.02136
- Profiling genes encoding the adaptive immune receptor repertoire with gAIRR Suite. Front Immunol. 2022; 13. https://doi.org/10.3389/fimmu.2022.922513
- Characterization of the immunoglobulin lambda chain locus from diverse populations reveals extensive genetic variation. Genes Immun. 2023; 24: 21-31. https://doi.org/10.1038/s41435-022-00188-2
- Targeted long-read sequencing facilitates phased diploid assembly and genotyping of the human T cell receptor alpha, delta and beta loci. Cell Genom. 2022; 2: 100228. https://doi.org/10.1016/j.xgen.2022.100228
- The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016; 3: 160018. https://doi.org/10.1038/sdata.2016.18
- The ADC API: a web API for the programmatic query of the AIRR data commons. Front Big Data. 2020; 3 (accessed May 1, 2022)
- AIRR Community Standardized Representations for Annotated Immune Repertoires. Front Immunol. 2018; 9 (accessed May 1, 2022)
- Adaptive Immune Receptor Repertoire Community recommendations for sharing immune-repertoire sequencing data. Nat Immunol. 2017; 18: 1274-1278. https://doi.org/10.1038/ni.3873
- IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev Comp Immunol. 2003; 27: 55-77
- IETF: RFC 4648 (Base-N Encodings), (n.d.). https://www.ietf.org/rfc/rfc4648.txt (accessed May 1, 2022).
- iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. Immunol Rev. 2018; 284: 24-41. https://doi.org/10.1111/imr.12666
- VDJServer: a cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements. Front Immunol. 2018; 9: 976. https://doi.org/10.3389/fimmu.2018.00976
- The adaptive immune receptor repertoire community as a model for FAIR stewardship of big immunology data. Curr Opin Syst Biol. 2020; 24: 71-77. https://doi.org/10.1016/j.coisb.2020.10.001
- M.P. Lefranc, From IMGT-ontology classification axiom to IMGT standardized gene and allele nomenclature: for immunoglobulins (IG) and T cell receptors (TR), Cold Spring Harb Protoc. 2011 (2011) 627–632. doi:10.1101/pdb.ip84.
- Many human immunoglobulin heavy-chain IGHV gene polymorphisms have been reported in error. Immunol Cell Biol. 2008; 86: 111-115. https://doi.org/10.1038/sj.icb.7100144
- Unique features of fish immune repertoires: particularities of adaptive immunity within the largest group of vertebrates. Results Probl Cell Differ. 2015; 57: 235-264. https://doi.org/10.1007/978-3-319-20819-0_10
- Whole-genome duplication in teleost fishes and its evolutionary consequences. Mol Genet Genom. 2014; 289: 1045-1060. https://doi.org/10.1007/s00438-014-0889-2
User license: Creative Commons Attribution (CC BY 4.0)