openpml


Minutes of the 4th IBIC

JBIC 4th International Bio-data Interoperability Conference
DNA Variation and Phenotype Data based on XML

MINUTES
(Not verbatim; any missing data or misunderstandings are the fault of yours truly, DF. Feel free to add to this!)

Day 1, 2006-10-17

Hideaki Sugawara opening statement.

Goal of the meeting: to prepare the PML2 submission and its dissemination:
1) expansion of the schema, 2) conversion/wrapping/implementation, 3) poster/oral presentations and publication (journal, contents, ISO)

Heikki: Informal discussion before the meeting agreed that we should focus on describing what we are doing better, i.e. a paper and practical examples.

Greetings from Fumiko Arata & Yoshihiko Hirai

Expresses support for PML

Takashi Gojobori, keynote "DNA Variation and phenotype data in the H-invitational Human gene database"

The time for international data standards has come. We need widely agreed data standards, rather than standards enforced by the strongest contender (i.e. NCBI, EBI, Sanger). Molecular data may be easier to describe, but there is a strong need for standards for phenotype data.

Highlights the efforts of GenBank/EMBL/DDBJ for data standards.
Examples of successful collaborations in which T.G. has been involved: human genome, rice genome.

Annotation of full-length cDNAs (118 people, 40 organizations). www.h-invitational.jp
Issue: 40 organizations with different definitions meant that the annotation could not start.
Overcome through 3 days of initial adjustment and compromise until all agreed. After this, the annotation could commence and was successful, resulting in a global transcriptome database that was highly publicised (Nature, Science, PLOS Biology: the main paper was highly cited, contributing to the first impact factor noted for PLOS Biology).

Take-home messages:
Standardization is essential for wide collaboration.
It is possible for a group of invited experts. Open participation runs a risk of confusion, preventing consensus.
Large collaboration leads to wide publication.
This activity should be a very important trigger for initiating, and showing the value of, wide international standardization of phenotype bio-data.

Other databases
H-ANGEL. Relative expression patterns of genes in 64? tissues
http://www.jbirc.aist.go.jp/

We also developed a database of known disease-related genes, H-Inv Disease Edition.
Methods:
LEGENDA: text mining based on co-occurrence of keywords (from a dictionary). This depends on standardization.
Scoring system (PANDA), based on known disease genes and the region of interest.
DNAPRobeLocator (Gene expression)
SNP Viewer
*Structural Variation
*Epigenetics

    • The latter two are especially important in, e.g., cancer. Standardization is also needed for this type of data.

A genome viewer and human curation integrate the above data from public and private sources.
Outcome: candidate genes. For example, the OMIM-reported cancer-related gene PARP1 (ADP-ribosyltransferase) was also linked to diabetes by PANDA.

Microsatellite collection (34,000+ markers, the world's largest collection of verified microsatellites): http://www.jbirc.aist.go.jp/gdbs

Structural variation: the Feuk et al. review shows many different variant types. This is especially important for non-coding regions.
The pattern of methylation also appears to be very important (in cancer, for example).

Summary:
Standards help you integrate your data with known data and ensure interoperability. Unless the description is unified, your annotation is useless.
Beyond SNPs, standards are needed for:
CNVs, indels, chromosomal aberrations
proteomes
pathways/networks

Phenotype data:
Medical domains, e.g. cancer genomics
Comparative genomics (functional genomic domains, higher order phenotype similarities)
Metagenomics, high speed sequencing & functional predictions

Personal notes: we can cover structural variation and methylation (through allele definitions).

Heikki: Has the annotation pipeline been used for other species?
A: Yes, rice.
Heikki: Is it packaged nicely? Can it be shared?
- Yes, it is. But it is a huge system, and the transfer to DDBJ took 6 months.
Heikki: It predates the ontologies. Any plans to adapt it?

  • Yes. We have changed our annotation around a few times.

Discussion, Session 1

Heikki is elected chair.
Heikki:
We should focus on expanding the schema, and put a few hours into providing real world examples.
Martin will give a short presentation about where we are in the standardization process. Albert will tell us about P3G, and bring up the issue of assay sets (representing gtp arrays in the model) that he raised earlier on the mailing list.
If anyone has anything else they would like to add to the agenda, please just bring it up. The agenda is flexible.

Debasis Dash presents the Indian gtp project and HGVBASE-G2P.

See slides.
Heikki: There is a lot to discuss here. Let's do this tomorrow morning, when Tony is here.

Juha summarizes PML1 and PML2

Describes the model.
Juha: In addition to "assay set", we are missing Instrument (these genotypes have been produced using a specific instrument).
Albert: ... and operator, etc.? We should have some tracking of this.

Any questions?
Martin: Why is the arrow PML1:Population-PML2:Observable dashed?
Juha/Heikki: Because Population inherits from the abstract class Observable. Observable is never implemented directly.
Martin: This means you are changing PML1. You have added the PML2 class Observable.
Heikki: Yes, we are adding Observable. It is a more generic concept, but we are not changing the way PML1 works. We are making this extension in PML2 to be able to link to Genomic_observation. PML1 still works as is.
Martin: I am not convinced. To me it looks like you are rewriting PML1, and I don't know how to standardize this.
Juha: It's a matter of implementation. We don't have to change the model, but this looks simpler.
Martin: Let's discuss this further in the break.

In PML2, phenotypes are Measurable_features.
Heikki: For any particular instance of Measurable_feature and Observation_method, there is a hierarchical system that can be implemented as an ontology. This allows, e.g., observations to be described to various degrees of specificity. It does not limit how the ontology should be built: list of terms, tree or graph.
Martin: But should we have Ontology there? Perhaps it is better just to note that those values can come from any ontology.
Heikki: Yes, we should be agnostic about ontologies; any ontology should be possible.
Martin: But the question is, are we then still useful to the communities?
?: What happens when the ontology is not available to both systems, source and target? Should we embed the subpart of the ontology that is used (e.g. the way PDF embeds its fonts)? Debate about which is the best way to do it.
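The "embed the subpart of the ontology, like PDF embeds fonts" idea, together with Heikki's point about describing observations at different levels of specificity, can be sketched as follows. This is an illustration under stated assumptions, not a PML mechanism: the toy ontology, its terms, and the helper names `ancestors`, `embedded_subset` and `satisfies` are all hypothetical. The embedded subset keeps only the used terms plus their ancestors, which is just enough for a target system to resolve every term and its more general parents.

```python
# Hypothetical toy ontology: each term maps to its parent terms (a small DAG).
# The terms are illustrative only; they are not taken from any real ontology.
ONTOLOGY = {
    "phenotype": [],
    "cardiovascular phenotype": ["phenotype"],
    "blood pressure": ["cardiovascular phenotype"],
    "systolic blood pressure": ["blood pressure"],
    "diastolic blood pressure": ["blood pressure"],
    "metabolic phenotype": ["phenotype"],
    "fasting glucose": ["metabolic phenotype"],
}


def ancestors(term, ontology):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology[current])
    return seen


def embedded_subset(used_terms, ontology):
    """Sub-ontology to ship with a document: the used terms and their ancestors only."""
    keep = set()
    for term in used_terms:
        keep |= ancestors(term, ontology)
    # The kept set is closed under parents, so every parent link stays valid.
    return {term: list(ontology[term]) for term in sorted(keep)}


def satisfies(observed_term, query_term, ontology):
    """True if an observation recorded at observed_term answers a query at query_term."""
    return query_term in ancestors(observed_term, ontology)


subset = embedded_subset(["systolic blood pressure"], ONTOLOGY)
print(sorted(subset))
print(satisfies("systolic blood pressure", "blood pressure", ONTOLOGY))
```

The closure approach follows the PDF-font analogy: unused branches such as "fasting glucose" are left out, yet a query at a coarser level ("blood pressure") still matches a more specific observation.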
Martin: What is the main purpose of PML2? For PML1 it was clear: it was for data exchange between databases.
Juha: To describe the link between genotype and phenotype, so that we can describe an association study.
Martin: But what are the use cases? Where are we going to use it? Why?
Debasis: To be able to build (emerging) phenotype databases.

Martin: Do we know what is missing? Do we have anyone to check that what we are doing is right? Debasis says databases are emerging, but do we have any cases that can be used? Why are you here? What are you expecting from PML2?
Albert: To take a step back, can we make sure that PML1 is used?
Martin: Yes, good point. I will address this when the status of OMG standardization is brought up.
Heikki: let's do that now.
Martin: OK. The procedure, the revision task force, will end in December. It can accept changes until Nov 6th (3 more weeks), so please submit required changes together with solutions. This does not mean that things can't be changed afterwards, but then the process has to be started again. So even though this is a meeting for PML2, we can also work on PML1.
Heikki: It is also clear that PML2 is practically independent of PML1, so these things can progress in parallel.
Martin: We have to make an initial submission of PML2 by Nov 16th. But major changes can be made for the revised submission, which is due in June. It is also possible to change PML1 at that point, with the justification that we need that specification changed for PML2.
Heikki: So are you suggesting that PML1 should be a separate standard, and that PML2 is only for the phenotype/G2P part?
Martin: I am not sure I am suggesting that, but both ways are possible.
Heikki: What are the official names?
Martin: PML2 is "DNA Variation and Phenotype data model". PML1 is "??".
Heikki: We need to decide on an aim for this, and a name. We can't call it PML2.
Tony: We need to take a further step back. What is the scope of PML1 and PML2? PML1 should cover not only SNPs, but also copy number variation and methylation. There is a major problem here, as the concept of "allele" is not useful in a CNV context: there is no easily definable allele, only a genotype. PML2 might be "data model for association studies".
Tony: One reason
Martin: The data standard PML (1) describes single nucleotide polymorphisms. The CNV level can be added within "DNA Variation and Phenotype data model", so that we are not hindered by standardization. But for end users, we can define two separate models: one for genome-level information (e.g. "PML") and one for association studies (e.g. "PML2").
Heikki: So the OMG models can be viewed as tickets for our development process. If we try to keep PML1 as a separate standard, and then build the genotype/phenotype information as another, different submission, this will lead to confusion. We should "forget about" the PML1 submission as soon as possible, and our second submission should include everything we have done so far! We just have to structure this into clearly defined areas, one of which is a module describing the new/updated PML1.
Martin: Yes. The "DNA Variation & Phenotype data model" specification can be implemented
Albert: Is there a reason to keep things separate, for the sake of getting a specification out as soon as possible?
Heikki: In practice, we should forget about PML1 and PML2.
Tony: In all the discussions I've been in about PML2, we talked about genetic association studies. Is this reasonable as a boundary for the scope of what we are doing? The title as it stands now is too broad and covers all of biological research.
?: It appears that for PML we are conclusive. But for phenotype, there are a lot of other players, and here we as a group can't be sure that we are right for anyone. And then we have the thing in between.
Heikki: So can we conceptualize three models: 1) genotype, 2) a general phenotype model (e.g. clinical study) and 3) what sits in between, the association study?
Tony: We need to talk to HL7 to see how they think about the genotype end of things.
Tony's definition of a genetic association study: look at the correlation between genotype and phenotype using statistical inference from observations in a large number of individuals.
-- short break--
Heikki (shows a UML diagram): After discussion during the break, I believe we should partition our thoughts into 3 parts: 1) genotype/PML (add structural variation, and sets), 2) association (association study, Mendelian trait), 3) phenotype (SNOMED etc.).
Albert: Isn't a Mendelian trait a special case of an association study?
Heikki: Interesting point. I would like to have this discussion later on, and am curious about the result. But let's start with the genotype part and get that right.
Martin: I am worried that we are starting from scratch again. We should discuss with the models in front of us.

Albert gives a presentation on the P3G Consortium
www.p3gconsortium.org

  • Develop XML specifications for genotyping data exchange between biobanks
  • No standard yet defined by major organizations such as NCBI

For a single effort, they may only use Affy and Illumina, but this is putting their heads in the sand, as they will probably extend the genotyping later with other technologies.
Some avenues: use PML term definitions and specifications, but
PML is still lacking some key concepts,
e.g. a panel of assays such as an Affymetrix chip or an Illumina 1536 SNP panel; multiplex issues.
Start a collaboration between P3G and the PML group to extend the definitions? Advantages:
For P3G: use of an OMG-accepted specification. Maybe creation of a core to be added in the grant application for October 14th. However, the window of opportunity is small, as we want to move quickly.
For PML: access to the biobank membership of P3G, to make sure that the PML specifications will be used at some point.
How to exchange data:
PML has had little real-world use to date.
HapMap is more proven.
High-throughput and low-throughput genotyping might be treated differently for data exchange (e.g. samples vs. assays and assays vs. samples).
Should we use tab-delimited text files for high-throughput genotyping?
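The "samples vs. assays and assays vs. samples" question is about which axis of a genotype matrix forms the rows of a tab-delimited file. The sketch below assumes a rectangular matrix with IDs in the first row and first column, and uses a hypothetical helper `transpose_tsv`, to show that converting between the two orientations is mechanical. It is not a proposed PML format.

```python
import csv
import io


def transpose_tsv(text):
    """Swap rows and columns of a tab-delimited genotype matrix.

    Assumes a rectangular table whose first row holds column IDs and whose
    first column holds row IDs (samples as rows and assays as columns, or the
    reverse). zip() would silently truncate ragged rows.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(*rows))
    return out.getvalue()


# Hypothetical sample-major layout: one row per sample, one column per assay.
sample_major = (
    "sample\tassay1\tassay2\n"
    "S1\tA/A\tC/T\n"
    "S2\tA/G\tT/T\n"
)
assay_major = transpose_tsv(sample_major)
print(assay_major)
```

A high-throughput chip run tends to produce one sample with many assays at a time (sample-major), while a low-throughput lab often runs one assay across many samples (assay-major). Because either layout can be regenerated from the other, the orientation question is mainly one of convenience and streaming, not information content.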

Heikki: This reminds me of something. Users need a place to go and see how to use PML, as opposed to our old site that records our process. Is that something JBIC can do?
Answer: Yes.
Tony: We also need funding to be able to do the legwork of disseminating this, and to put the model into some high-profile efforts. My strategy is to secure the funding. Without it, I don't think this is possible.
Martin: But is this true? Perhaps some of us already have some funding.
JBIC: We will create a webpage, and we will provide some training for using PML.
Martin: We need to provide documentation, tools for converting formats into PML, etc. in a professional way. Otherwise, PML will die. It should also be presented at meetings; it is unfortunate that this is not done.
Tony: It has been done.
Heikki: Those ab
Albert: Drug trials are one area where standards are very important, for good reasons. What is being used, and what are the expectations there? How do we put this forward to the FDA?
Tony: Glaxo has a large association study linked to the druggable genome. They are searching for ways to disseminate the data. I am not sure what they plan regarding what you're asking, though, or whether even the FDA knows what it wants.
Martin: We also need an open mailing list. It should be searchable, e.g. a Google group.
Sugawara: JBIC will create a webpage mockup.

! Albert: HapMap will discuss with NCBI how to put genotypes out in a standard format. NCBI is not pushing its own standard.
Heikki: Can we give you this task: talk to Steve together with Sugawara/JBIC, and report back within one month with something positive on how PML can be at the centre of this?
Albert: Yes, although there are no guarantees. Steve may well stay agnostic about this.
?: How do we know that something is PML compliant?
Heikki: We provide an online tool for validating data. Is it up to date?
JBIC: No.
Heikki: Can we set a deadline for having it updated? One month?
JBIC: Yes.

Martin: Manuscript? What is its status?
Sugawara: I can present the current status of the manuscript tomorrow.
Tony: Maybe we should work on development of PML1/2 first, to list the problems?
Heikki: PML2 will probably take too long.
Sugawara: The timing of the manuscript is important.

Tony: Can we go back to the problems of PML1?
Heikki: Yes. Thanks to Albert for the presentation, which got us going in all these directions.
Albert: Let's list the problems we have now. I have the set problem. Juha?
Juha: Size of files, and tracking the instrument.
Tony: The relative roles of alleles and genotypes (this goes into CNV).

-- the above is discussed; Juha models --
Assay set and instrument
Not conclusive, but it seems that the problem is solved by adding an Assay_set class linked to Assay, and a Run linked to Genotype.
File size
Juha: We might run into terabytes of data with this model.
Martin: Unless we have a viable suggestion for a different implementation, I want to put this down as an urban legend. Compression will help at the transfer stage.
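Martin's point that compression helps at the transfer stage rests on XML genotype files being highly redundant. The sketch below illustrates this with the standard-library `gzip` module on a synthetic document. The element and attribute names are hypothetical, not the PML schema. Real data is less regular than this synthetic file, so the printed ratio says nothing about actual PML file sizes.

```python
import gzip


def genotype_xml(n_records):
    """Build a synthetic, repetitive XML genotype document.

    Element and attribute names are illustrative only, not the PML schema.
    """
    calls = ("A/A", "A/G", "G/G")
    rows = [
        f'  <genotype sample="S{i % 100:03d}" assay="assay_{i // 100}" call="{calls[i % 3]}"/>'
        for i in range(n_records)
    ]
    return "<genotypes>\n" + "\n".join(rows) + "\n</genotypes>\n"


raw = genotype_xml(100_000).encode("utf-8")
packed = gzip.compress(raw)  # lossless; gzip.decompress(packed) restores raw exactly
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped, ratio {len(raw) / len(packed):.1f}")
```

This supports the transfer argument only. Parsing and storing uncompressed XML at terabyte scale, which is Juha's concern, is not addressed by compression.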
Heikki: This is a real issue, but we really can't go for anything but XML at this point.

Structural variation
Tony: The Genotype of a CNV should be the Genotype of the Marker, not the Genotype of an Allele. This can be fixed by adding a "Paralogue" class parallel to Allele.
Martin: I am sceptical about touching something in such a central place.
Albert: CNV is a result of insufficient technology, which means we don't know where the copies are. I also think that we can represent this within the existing model; e.g. see the definition of consensus_genotype.
... We are left to think about how this can be solved until tomorrow.

Heikki: I remember something Juha said: we haven't defined the polarity of symmetrical SNPs in the model. We have to bring this up tomorrow.

Day 2

Heikki summarizes day 1 as follows:

Structure

DNA variation and Phenotype Data Model

  • Next submission will cover everything
  • *We will attempt to split the submission into three
    • Genotype
    • Linking
      • Genetic association studies
      • Mendelian traits
    • Phenotype (general structure that we feel is right): the work from the previous meeting is solid; we just need to make sure that what we submit does not offend those who know more about this domain.
  • RFP for PML2 December 2005
  • Initial submission on 16 November
    • No review.
    • Time from initial submission to revised submission should bring in comments from users.
    • June 2007 revised submission
      • This is the one that will be evaluated by others

Talks given

Albert V Smith

Action Points

  • examples needed
  • New web site for users
    • send all data, abstracts to the mailing list
    • schema validator
    • tools: training, using
    • funding
    • domain name
    • open mailing list (Google Groups)
    • 17 Nov for the new website developed by JBIC
  • Manual
  • JSNP conversion tool: Miss Kuroda
  • dbSNP conversion
    • package: standalone, downloadable, documented tool
    • Albert talk to Steve, together with JBIC
    • 17th Nov deadline

Presentations

Tony: HGVbase-G2P data model, copy number changes
Tony, Albert, Juha: Instrumentation and multiplexing (left to them as an exercise after discussion)
Hideaki Sugawara: manuscript
Matthew Darlison: HL7, openEHR

Discussion

Web
Martin: I would like to agree on a technology for the webpages. I propose CVS or Subversion.
Sugawara: I am familiar with a few content management systems that we could use.
Heikki: There are a lot of those: WordPress, etc.
Martin: I suggest we go for the simplest possible option. Later, we can move to something with better presentation, but I would like to start with something that keeps sharing and presentation separate.
Heikki: So, OK. How we do it is perhaps not so important; what matters is that we do it.
Martin: JBIC, please make a decision and let us know tomorrow.
Sugawara: By the way, www.pml.org was taken.
Sugawara: We should decide on the name for the website at this meeting.
It is agreed that we should base the name on PML somehow. The pml.ddbj.nig.ac.jo/index.html site provides a good starting point for raw material, but we need a site with better sharing (focus) and less information, one that is end-user oriented.

Compliancy
Debasis: There is an Asia-Pacific meeting on the 9th of November. I would like to ask them to be PML compliant; how do we measure that?
Heikki: With the tool on the webpage.
Sugawara: It won't be done by the 9th of November.
Heikki: Yesterday we agreed on the 17th of November.

CNV
We agree to make changes to the model, and later decide whether this goes into a revised "PML1" submission or into the new submission.
Tony presents the database used in his genotyping facility:

  • A Marker is a sequence, defined by its flanking sequence.
  • It is also defined by its Alleles, a version/versions of the sequence present for that marker
  • The Genotype defines what was observed, and does not need to be linked through Alleles.
  • File size: 500k-1M results per individual, with a lot of redundancy.

Albert: We can handle CNV within the current model; we just need to open up assayed_genomic_genotype to allow for:
count
signal
gain/loss
ratio
.. etc
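Albert's proposal can be illustrated with a result record whose value is not forced to be a pair of alleles. In the sketch below, the class `AssayedGenotype`, its field names, and the `gain_or_loss` helper are hypothetical. They are not the PML `assayed_genomic_genotype` definition. The diploid baseline of 2 copies is also an assumption of the sketch, not something the model defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssayedGenotype:
    """Hypothetical open-ended result for one marker in one sample.

    Mirrors the idea of allowing a count, signal, gain/loss or ratio instead of
    requiring a pair of alleles. Field names are illustrative only.
    """
    marker: str
    sample: str
    kind: str                  # "alleles", "count", "signal", "ratio" or "gain_loss"
    value: object              # e.g. ("A", "G"), 3, 1520.5, 1.48 or "gain"
    unit: Optional[str] = None


def gain_or_loss(copy_count: int, baseline: int = 2) -> str:
    """Derive a gain/loss call from an integer copy count.

    The diploid baseline of 2 is an assumption of this sketch.
    """
    if copy_count > baseline:
        return "gain"
    if copy_count < baseline:
        return "loss"
    return "normal"


snp = AssayedGenotype(marker="snp_1", sample="S1", kind="alleles", value=("A", "G"))
cnv = AssayedGenotype(marker="cnv_region_1", sample="S1", kind="count", value=3)
print(gain_or_loss(cnv.value))  # gain
```

Attaching the result to the marker rather than to individual alleles follows Tony's point that a CNV has no easily definable allele. The same record type then covers both SNP calls and copy-number results.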
David/Tony: Perhaps we need a genotype definition table?
Albert: I would like to add a new table, Measurement, recording the actual output of the instrument.
<discussion>
Resulting changes:
A specific Plate concept/class was introduced. Previously, Panel served this purpose, but not explicitly.
We have added Run (broken out from Polymorphism_assay), associated with assayed_genomic_genotype, assay_set and Plate.
We are removing the association between sample and assay (they only need to be connected once an experiment has been performed).

We have added Expected_genotype, which enumerates the possible genotyping outcomes of a Polymorphism_assay. Its data_type attribute takes one of Allele_present < Allele_ratio < Allele_count, Signal_present < Signal_ratio < Signal_value, or No_result. Precedence: if you know the count, you know the ratio; if you know the ratio, you know "present".
We removed the association between Genomic_allele and Assayed_genomic_genotype.
An association between Oligo and Assay_set was added to allow for generic primers (Illumina etc.).
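The precedence rule for Expected_genotype (count implies ratio, ratio implies presence) can be made concrete with two small conversions. The function names are hypothetical and the example region with three copies is invented. The Signal_* chain would follow the same pattern. Information is lost at each step, which is why a count ranks above a ratio and a ratio ranks above presence.

```python
from fractions import Fraction


def counts_to_ratio(counts):
    """Allele_count -> Allele_ratio: each allele's share of all copies observed."""
    total = sum(counts.values())
    if total == 0:
        # Nothing was observed; in the model this would be a No_result outcome.
        raise ValueError("no allele copies observed")
    return {allele: Fraction(n, total) for allele, n in counts.items()}


def ratio_to_presence(ratios):
    """Allele_ratio -> Allele_present: an allele is present if its share is non-zero."""
    return {allele for allele, share in ratios.items() if share > 0}


# A hypothetical three-copy region carrying two A copies and one G copy.
counts = {"A": 2, "G": 1}
ratios = counts_to_ratio(counts)
present = ratio_to_presence(ratios)
print(ratios)
print(present)
```

The reverse direction does not work. A 1:1 and a 2:1 count both reduce to "A and G present", so a result recorded only as presence cannot be upgraded to a ratio or a count.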

Procedure
We will not change the PML submission to OMG, but these changes will be propagated to the public (via website) imminently.

Tony: We need attributes for all of these boxes.
Matthew: We also need to name the associations.

Albert would like:
*A design score for the quality of the assay design [goes into Polymorphism_assay?]
*whether an individual is a twin or not
Tony: Is this really the place to model pedigrees?
Albert: There is a standard for representing pedigrees (name??); we don't need to remodel this.

One more bit
Tony: NIH has announced that it is building a genotype-phenotype database. This puts more pressure on us to publish soon, and we should also contact NIH, as they are asking for feedback from the community.
Tony: The Human Variome Project is publishing in Nature Genetics. We should target the same issue. We have a few weeks for the manuscript, and even less time for contacting N.G.
Manuscript
Sugawara presents the current state of the manuscript. It is in the same state in which Heikki left it 1 year ago.
Heikki: We realize now that readers don't care about ratification. We can add association studies, and model enough of phenotypes that people know we are not ignoring those areas, but we focus on genotypes, where we are the authorities.
We agree to write a brief communication to N.G. and a full report to PLOS/Genome Research/BMC/Human Mutation.
! Heikki is elected to be the front man for PML, or "P.I.". He will have the help of one or two other main individuals for sharing responsibilities. The two suggested names are Tony and Sugawara.
These individuals will be responsible, most importantly, for getting the paper together, but will also have the power to ask people to do tasks needed for this. This includes tools and webpages, and work on the manuscript.

Medical records
Matthew presents an update on openEHR, and promises to get more people from that community up to date.
- The richness of clinical reality may lead to enormous entries, e.g. "the test was made on this machine with this serial nr.", whereas the genomics community might stretch to "test made on old machine", but may leave gaping holes elsewhere if it's not clinically relevant. So there will be a granularity mismatch.

Tony/Martin: So how does PML fit in here?
Matthew: The 13606 format allows HL7 and openEHR to exchange data, so 13606 is probably a good starting point. We need the right interface. But we DON'T want to remodel complex phenotypes.
Tony: In summary, we are interested in "what kind of structures we want to be able to link to", and there is nothing in clinical phenotypes that looks incompatible with what we have done in the model so far. We don't have to worry now about decoders/parsers etc.; those are project specific.
Matthew: I will take back to my colleagues what we discussed here this morning, to explain what this community wants from the clinical community.

Action points/Responsibilities
UML diagram: Juha => XMI (with full documentation)
Class definitions: David
There was also a narrative for each subdiagram that explained how it works.
Convert UML diagram to XML schema: Yasumasa

Juha to export the current diagram today and provide it to Yasumasa, who will check compatibility with the conversion tool by tomorrow.

Phenotype
Tony goes through phenotype as it stands; missing items.
Martin: I would like to move Environment/Lifestyle to category.
-- wild discussion ensues -- outcome: Heikki/David say this is OK.

 

Day 3

Summary of Day2

Web server and domain name
openpml.org

Content management system
JBIC decides and will tell today
Deadline two weeks from here

Manuscript
Nat. Genet. (Brief Communication)
Heikki, Tony, Sugawara
Next two weeks
Published alongside the Human Variome Project
Full report to PLOS/Genome Research/BMC/Human Mut
Deadline?
PML Manual
Heikki decides on subchapters
How to use PML in different scenarios

Mr PML was elected
Heikki got the honor
Duties: front man for the PML, coordinates development work and work on publications/web pages
Corresponding author in the manuscript
Will get help from two other main individuals for sharing responsibilities
Suggested names are Tony and Hideaki Sugawara

OMG issues
No modifications to approved PML1 spec
We keep it as a retired specification; the main focus is on the new PML2 model.

Matthew presented openEHR and 13606
OpenEHR is specification for health information infrastructure
Detailed information on health records and the information attached to them (like fine-grained access rights etc.)
The 13606 standard is a subset of openEHR, used as a data exchange format/system between openEHR and other systems like HL7.
Comments:
In PML all that richness is not needed.
PML should be compatible with openEHR, so that data can be mapped from openEHR to PML and vice versa.
Data conversion tool?
OpenEHR people will evaluate the PML (gtp part)
Matthew will take care of that
UML Diagrams
Check compatibility of tools etc today (Yasumasa, Juha)

Data modeling (gtp part)
New class: Expected genotype
All expected genotypes of the specific assay
Alleles, signals, counts..
Plate, Run, new associations (see above)

Results for Day3 (Summary)
Full paper
A first draft should be the responsibility of JBIC, who are also making the webpage with similar content. JBIC should have main authorship, as this is their project, with a statement saying they made this happen. It is not mission critical that we have a full description.

For the manual/cookbook we need examples of how to use the model when you are dealing with:
the laboratory gtp part: this is how you should use the model.
(mutation) database construction: this is how you should organize the data.
Submission batch
how to ensure allele polarity safety

New or renamed classes/attributes/associations:
Value (from the GCP project): to replace some of our enumerations, strings etc.
Evidence (from the GCP project)
?Measurement (genotype raw data pointer..)
Quality score (possibly as an attribute) in all important places
Observation -> renamed PhenotypeObservation
FrequencyCluster: a set or aggregate of frequencies. It can have more attributes than Frequency (which ones?).
Genetic_association_study: a subclass of Genotype_phenotype_correlation_experiment. Has children Control_panel and Case_panel.
Association_study_panel: a specialized but abstract subclass of Panel. Associated (role: deals with) with Genetic_association_study.
Subclasses: TestPanel or ReferencePanel. Associated (role: defining feature) with PhenotypeObservation. Also associated with FrequencyCluster.
Genetic_association_study receives the comment: "must have feature of type result".

Plan for Initial submission:
Oct 27: Martin/Juha - Update the original umbrello UML model with the new classes, Description of things that are new, and any deletions/changes.
Fri Nov 3: Descriptions to Juha, who will pass on .xmi to Yasumasa, who works on the conversion to XML schema using his tool.
Nov 10: XML schema submitted to Martin
Nov 13: Martin makes initial submission
May: Possible next meeting for finalization of submission.

 

Discussion starts
SNP polarity
Albert: It can be difficult to pin the alleles down properly. E.g. a T/C SNP may be reported as an A/G SNP (the same SNP read from the other strand), but these can usually be straightened out. In the case of A/T and C/G SNPs, however, you may end up confusing the alleles in your results. There are a few solutions.

  • Rather than representing alleles as base calls, present them as TOP/BOT or A/B. This comes from Affymetrix and Illumina. For the Illumina rules (A/B?), the call is determined by walking through alternating bases 5' on the Watson strand and 3' on the Crick strand until you find something in alphabetical order...??

www.illumina.com/General/Support/Downloads/TOPBOT_technote27Jun06.pdf

  • I have tools that check that the primers line up with what is reported by the HapMap collaborators, to verify the strands.
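Why A/T and C/G SNPs are the troublesome ones can be shown by complementing allele pairs. The sketch below is a hypothetical helper (`compare_to_reference`), not Albert's tool, and it does not implement the Illumina TOP/BOT rules. It only classifies a reported pair against a reference pair. For a T/C SNP reported as A/G, the strand flip is detectable. For A/T or C/G, both strands read the same, so the alleles alone cannot settle the strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_alleles(alleles):
    """The same allele pair as it would read on the opposite strand."""
    return frozenset(a.translate(COMPLEMENT) for a in alleles)


def is_strand_ambiguous(alleles):
    """A/T and C/G SNPs read identically on both strands."""
    alleles = frozenset(alleles)
    return complement_alleles(alleles) == alleles


def compare_to_reference(reported, reference):
    """Classify a reported allele pair against a reference pair.

    Returns "forward", "reverse", "ambiguous" or "mismatch".
    """
    reported, reference = frozenset(reported), frozenset(reference)
    if reported == reference and is_strand_ambiguous(reference):
        return "ambiguous"  # alleles alone cannot tell us the strand
    if reported == reference:
        return "forward"
    if complement_alleles(reported) == reference:
        return "reverse"
    return "mismatch"


print(compare_to_reference({"T", "C"}, {"A", "G"}))  # reverse
print(compare_to_reference({"A", "T"}, {"A", "T"}))  # ambiguous
```

The "ambiguous" cases are the ones that need extra information, such as flanking sequence, primers, or a strand-independent coding like TOP/BOT.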

Heikki: Can this tool be provided to the community?
Albert: Well... yes. It's been on my to-do list for quite some time. We shall see.
Albert: Note that there is a higher error rate for monomorphic SNPs, in that there are discrepancies between populations: one allele is reported from one population, and the other allele from another population. But people are mining this type of data for selection, and there are rare such events, so we can't enforce that they must be compatible (shortened answer/DF).
Juha: We don't need to enforce this type of coding, but we need a flag saying which coding system has been used. I would put this in some sort of batch.
Tony: This should be at the allele level. This is a key issue that people are struggling with, AND there could be different encodings within a batch.
Heikki: We can't solve this in the model, but it should be part of the cookbook, where we make users aware of this issue and recommend the use of strand-safe methods: "The polarity of the allele is essential for...". The cookbook should be
Tony: We always represent alleles with flanking sequence. I would recommend that the flanking sequences be stored with the alleles.
Albert: We keep and use the flanking sequence to determine strand, but we also need the primers, and we put these two bits of information together to test that everything is OK.
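Using flanking sequence to determine strand, as Tony and Albert describe, amounts to asking whether a reported flank matches the reference as given or as its reverse complement. The sketch below uses hypothetical function names and an invented reference string. Its exact-substring matching is a simplification: a real check would align with tolerance for mismatches and, as Albert notes, cross-check against the primers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_from_flank(reported_flank, reference_context):
    """Guess strand by exact-matching a reported flank against a reference context."""
    forward = reported_flank in reference_context
    reverse = reverse_complement(reported_flank) in reference_context
    if forward and reverse:
        return "ambiguous"  # e.g. the flank is its own reverse complement
    if forward:
        return "+"
    if reverse:
        return "-"
    return None  # not found; a real pipeline would allow mismatches


# Invented reference context around a SNP position.
reference_context = "TTAGGCATCGAGCTTACG"
print(strand_from_flank("GCATCGA", reference_context))   # +
print(strand_from_flank("CGTAAGCT", reference_context))  # -
```

Even flanking sequence can fail when it is palindromic (for example "GAATTC"), which is one reason to store primers as well.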
Martin: Can we please move on? These are details. What about the UML model? The converter?

Yasumasa: We need to keep using Umbrello. The difference between it and the new EA program that Juha is using is too large for me to change the converter in a timely manner, especially when it comes to how inheritance is represented.
Heikki: Tony, we will now rely on you for the discussion on how to separate the association study classes from the phenotype classes.
Heikki: OK. Juha will transfer the model to Umbrello after the meeting. I will assist if necessary.

Tony: OK. We have the gtp part, and we have phenotype. Will we finish phenotype first? That would take the rest of the day.
Heikki/Martin: We need to have stubs for each of the packages we are making: genotype, phenotype, the link between them, bibliographic etc.
Martin: We are heading towards a lot of classes, so we really need to separate them into packages now, so that users can use just one of them.
Heikki: Let's hear Martin's experience with a different (partly phenotype) modelling project.

Martin presents the GCP Domain Model
("Bruskiewich et al. 2006"). It is based on the FuGE (Functional Genomics Experiment) model. See
http://pantheon.generationcp.org/demeter/Features.html
http://pantheon.generationcp.org/demeter/Values.html

Tony: We clearly need an evidence piece for genotypes in our model.
Albert: That's why I wanted to add a quality score and a Measurement class to the model.
We agree that this should be put in.
Tony: Perhaps quality scores need to be attachable all over the place.

The Value and Evidence sections appear attractive; we want to use these pretty much as-is.

Heikki: We can use this, but let's keep it hidden from the domain-specific area, as it could confuse users. Also, are you confident that this part of the model is solid and hasn't changed? Having followed this project before, I am wary; it was very cluttered.
Martin: I will help Juha put it in, and then you can view it, complain, and we will change it accordingly ;)

Tony presents the HGVbase G2P association study db
See the slide sent by Tony.
Heikki: Let's take this picture and start modelling.
Heikki: Let's start at the highest abstraction level, where we have any genotype/phenotype relationship.
?? Assay_result is a suggested (not agreed) new name for Genomic_observation.
Martin/Tony: But the previous model connects genomic_observation directly to genotype_phenotype_correlation_experiment, and we don't necessarily like that. Can't we start from the center and work our way out instead?
<discussion ensues, not possible to resolve at this point>
Martin: I don't like the split nature of the phenotype part. I would rather put Phenotype, Environment and Lifestyle in as types of Phenotype_Categories. It is possible to use both, but I don't think Phenotype/Environment/Lifestyle are necessary.
Tony: I think we will find out when we put in the association study bits.
Heikki: The subclasses allow us to link to them specifically from different places, i.e. to Phenotype from the association experiment, and to Environment from sample descriptions (something like this /DF).
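The two options can be contrasted in a short sketch. All class names below are hypothetical stand-ins for the UML classes; Option A follows Heikki's subclasses, Option B follows Martin's single class with a category.

```python
from dataclasses import dataclass
from enum import Enum


# Option A (subclasses): a link can name the exact subclass it accepts.
@dataclass
class PhenotypeBase:
    name: str


class Phenotype(PhenotypeBase):
    pass


class Environment(PhenotypeBase):
    pass


class Lifestyle(PhenotypeBase):
    pass


@dataclass
class AssociationExperiment:
    phenotype: Phenotype  # only a Phenotype may be linked here

    def __post_init__(self) -> None:
        if not isinstance(self.phenotype, Phenotype):
            raise TypeError("association experiments link to Phenotype only")


@dataclass
class SampleDescription:
    environment: Environment  # only an Environment may be linked here

    def __post_init__(self) -> None:
        if not isinstance(self.environment, Environment):
            raise TypeError("sample descriptions link to Environment only")


# Option B (one class plus a category): the same rule becomes a value check.
class PhenotypeCategory(Enum):
    PHENOTYPE = "phenotype"
    ENVIRONMENT = "environment"
    LIFESTYLE = "lifestyle"


@dataclass
class CategorizedPhenotype:
    name: str
    category: PhenotypeCategory


def can_link_to_experiment(p: CategorizedPhenotype) -> bool:
    return p.category is PhenotypeCategory.PHENOTYPE
```

With subclasses, a static type checker can reject a wrong link before any data exists; with a category attribute, the same constraint has to be checked at runtime on each value.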

There is clear confusion about what Observation is in the PML model. Martin calls for going from the experiment to the Phenotype description rather than through the value: "Without phenotype, the value is meaningless."
There is also confusion about whether Observation + Observation Method + Phenotype is "one big thing" that we link to the association experiment; if so, the question is what the way into it is.

Q: There is also a question of whether intermediate associations can be represented.
A: They can.

Tony presents a frequency-centric model designed to handle case-control studies.
The phenotypes define the sub-panels. The distributions of genotype/allele frequencies are then compared between these two sub-panels.
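The frequency-centric idea can be sketched as: phenotype splits the individuals into case and control sub-panels, then allele and genotype frequencies are computed per sub-panel. The data layout (dicts keyed by individual ID) and all values are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # two alleles at one marker, e.g. ("A", "G")


def split_by_phenotype(genotypes: Dict[str, Genotype],
                       is_case: Dict[str, bool]) -> Tuple[List[Genotype], List[Genotype]]:
    """Phenotype defines the sub-panels (cases vs controls).
    Individuals without a phenotype are left out."""
    cases = [gt for ind, gt in genotypes.items() if is_case.get(ind) is True]
    controls = [gt for ind, gt in genotypes.items() if is_case.get(ind) is False]
    return cases, controls


def allele_frequencies(panel: List[Genotype]) -> Dict[str, float]:
    counts = Counter(allele for gt in panel for allele in gt)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def genotype_frequencies(panel: List[Genotype]) -> Dict[Tuple[str, ...], float]:
    # Sort alleles so ("G", "A") and ("A", "G") count as the same genotype.
    counts = Counter(tuple(sorted(gt)) for gt in panel)
    return {gt: n / len(panel) for gt, n in counts.items()}


# Made-up data: i7 has no phenotype and is therefore excluded.
genotypes = {"i1": ("A", "G"), "i2": ("G", "G"), "i3": ("A", "A"),
             "i4": ("A", "G"), "i5": ("G", "G"), "i6": ("G", "G"), "i7": ("A", "A")}
is_case = {"i1": True, "i2": True, "i3": True, "i4": False, "i5": False, "i6": False}

cases, controls = split_by_phenotype(genotypes, is_case)
print(allele_frequencies(cases)["A"])  # 0.5
print(allele_frequencies(controls)["A"])
```

The same functions apply whether the phenotype was fixed first or the individuals were genotyped first and partitioned afterwards; only the order in which the inputs were collected differs.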

Matthew wonders how this will cater to association studies done "backwards", i.e. genotype all individuals, then partition them by phenotype.
Peter Rice: Both cases will be covered; the connections are the same.

Heikki: We have a total set of individuals = Panel. We have a subset of individuals in that Panel for whom we have Genotypes, and another subset for whom we have Phenotypes. Hopefully these sets largely overlap. The intersection of these forms the set of individuals on whom we can perform analysis. Essentially, what Frequency currently describes is the genotyped portion of the individuals.
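Heikki's panel bookkeeping maps directly onto set operations. Since an association analysis needs both a genotype and a phenotype for each individual, this sketch takes the intersection of the two subsets (restricted to the panel); the function name and IDs are hypothetical.

```python
from typing import Dict, Set


def analysis_sets(panel: Set[str], genotyped: Set[str],
                  phenotyped: Set[str]) -> Dict[str, Set[str]]:
    """Partition panel members by what data we hold for them."""
    g = genotyped & panel   # what Frequency currently describes
    p = phenotyped & panel  # ignore IDs that are not panel members
    return {
        "genotyped": g,
        "analysable": g & p,      # both genotype and phenotype
        "genotype_only": g - p,
        "phenotype_only": p - g,
    }


# Made-up IDs; "x9" is not in the panel and is dropped.
panel = {"i1", "i2", "i3", "i4", "i5", "i6"}
genotyped = {"i1", "i2", "i3", "i4", "i5"}
phenotyped = {"i2", "i3", "i4", "i5", "i6", "x9"}
sets = analysis_sets(panel, genotyped, phenotyped)
print(sorted(sets["analysable"]))  # ['i2', 'i3', 'i4', 'i5']
```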
Debasis: There are other types of association than frequency, e.g. number of repeats vs. age of onset, if you have more than a certain number of repeats.
Tony: This could possibly be represented as frequency, at least if you split them at a certain number of repeats.
There are three limitations in the model I presented:
1) dichotomized traits rather than continuous ones
2) fixing phenotype and asking for genotypes (not the other way around)
3) case-control studies supported, not the Transmission Disequilibrium Test (TDT)
So we are looking at 1 combination out of the 8 possible combinations (2 options for each of the 3 parameters).
Matthew: So we need to make this clear in our worked examples.
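To see which worked examples are still missing, the design space can be enumerated. This sketch assumes each of Tony's three limitations is a two-way choice; the labels are illustrative, not PML terms.

```python
from itertools import product

# Assumed two-way choices, one per limitation Tony listed.
TRAIT_TYPE = ("dichotomous", "continuous")
DIRECTION = ("phenotype-first", "genotype-first")
DESIGN = ("case-control", "TDT")

ALL_COMBINATIONS = list(product(TRAIT_TYPE, DIRECTION, DESIGN))
SUPPORTED = ("dichotomous", "phenotype-first", "case-control")
UNSUPPORTED = [c for c in ALL_COMBINATIONS if c != SUPPORTED]

print(len(ALL_COMBINATIONS))  # 8
for combo in UNSUPPORTED:
    # Each line is a combination a worked example would still need to cover.
    print(" / ".join(combo))
```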

Martin: What is the result?
Tony/Heikki: The frequency delta between test and reference panels.
Martin: OK, where do we put this in?
<discussion>
Martin: AND where do we put the statistical test? Can we add a Feature to hold this? An object that can hold
Heikki: This is a very generic object, too generic.
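As an illustration of what a result object might hold, this sketch pairs the frequency delta with one common choice of statistical test for case-control allele counts: a Pearson chi-square test on the 2×2 allele-count table (1 degree of freedom, p-value via `math.erfc`). The test choice, the `AssociationResult` name, and the counts are assumptions, not something the meeting agreed.

```python
import math
from typing import NamedTuple


class AssociationResult(NamedTuple):
    frequency_delta: float  # allele frequency in test minus reference panel
    chi_square: float
    p_value: float


def allelic_test(test_allele: int, test_other: int,
                 ref_allele: int, ref_other: int) -> AssociationResult:
    """Pearson chi-square on the 2x2 table of allele counts (no continuity correction)."""
    table = [[test_allele, test_other], [ref_allele, ref_other]]
    rows = [sum(r) for r in table]
    cols = [test_allele + ref_allele, test_other + ref_other]
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column of the table needs a non-zero total")
    n = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    delta = test_allele / rows[0] - ref_allele / rows[1]
    return AssociationResult(delta, chi2, p)


# Made-up counts: 30 of 100 test-panel alleles vs 15 of 100 reference alleles.
result = allelic_test(30, 70, 15, 85)
print(result)
```

Keeping the delta and the test statistic in one typed result object is more specific than a fully generic Feature, which may address Heikki's concern.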

Genetic_association_study receives comment: "must have feature of type result".
"A study is a set of experiments (e.g. association experiments) addressing the same hypothesis."
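The two comments on Genetic_association_study read like validation rules. This sketch encodes them with hypothetical classes and fields (hypotheses are compared as plain strings, which is a simplifying assumption).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    feature_type: str  # e.g. "result"
    value: str


@dataclass
class AssociationExperiment:
    name: str
    hypothesis: str


@dataclass
class GeneticAssociationStudy:
    hypothesis: str
    experiments: List[AssociationExperiment]
    features: List[Feature] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of rule violations (empty if the study is valid)."""
        problems = []
        if not any(f.feature_type == "result" for f in self.features):
            problems.append("study must have a feature of type 'result'")
        for exp in self.experiments:
            if exp.hypothesis != self.hypothesis:
                problems.append(f"experiment {exp.name!r} addresses a different hypothesis")
        return problems


# Made-up example.
study = GeneticAssociationStudy(
    "H1",
    [AssociationExperiment("exp1", "H1"), AssociationExperiment("exp2", "H1")],
    [Feature("result", "allele frequency delta reported")],
)
print(study.validate())  # []
```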

 

Heikki: we are running out of time..
Martin: We need to assign some homework to fill in the work from now on.

 

©2007 JBIC