The aim of the moderated t-test is to test whether the log-ratios are consistently greater or less than zero, not to test whether the log-ratios are equal.  The moderated t-test does not penalize a gene for having a large variance as much as the ordinary t-test does.  Compared to the ordinary t-test, the moderated t-test gives more weight to the treatment fold change, so it can be viewed as a compromise between the ordinary t-test and ranking by fold change.
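A minimal sketch of the variance shrinkage behind this, in the spirit of limma's eBayes. The prior degrees of freedom d0 and prior variance s0_sq are assumed constants here; limma actually estimates them by empirical Bayes from all genes.

```python
import math
import statistics

def moderated_t(x, d0=4.0, s0_sq=0.05):
    """One-sample t-statistics for a vector of log-ratios.

    x      : log-ratios for one gene across replicates
    d0     : prior degrees of freedom (assumed for this sketch)
    s0_sq  : prior variance (assumed for this sketch)
    Returns (ordinary t, moderated t).
    """
    n = len(x)
    d = n - 1                           # residual degrees of freedom
    s_sq = statistics.variance(x)       # this gene's own sample variance
    # Posterior (shrunken) variance: weighted average of the gene's own
    # variance and the prior variance, so one noisy gene is pulled toward
    # the ensemble and is not penalized as heavily for a large variance.
    s_post = (d0 * s0_sq + d * s_sq) / (d0 + d)
    t_ord = statistics.mean(x) / math.sqrt(s_sq / n)
    t_mod = statistics.mean(x) / math.sqrt(s_post / n)
    return t_ord, t_mod

# A gene with a big fold change but an unstable variance estimate:
t_ord, t_mod = moderated_t([1.9, 0.1, 2.0])
```

Because the shrunken variance sits between the gene's own variance and the prior, the moderated statistic behaves like the compromise described above.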
--------------------------------------------------------------------------------------------------------------
The basic difference between linear models and GLMs: with a linear model, the overall F-test and all the individual 1-df t-tests that make it up fall out of a single fit, whereas with a GLM each individual 1-df test and the overall test must be computed separately (e.g. by comparing deviances of separately fitted models).
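The linear-model side of this can be seen directly in the single-predictor case: one least-squares fit yields both the slope t-test and the 1-df F-test, and t² = F. A stdlib sketch with made-up data:

```python
import math

def slope_tests(x, y):
    """t-test for the slope and the 1-df F-test, both from ONE fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx                        # slope
    a = my - b * mx                      # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                  # residual mean square
    t = b / math.sqrt(mse / sxx)         # t for H0: slope = 0
    F = (b ** 2 * sxx) / mse             # F for the same 1-df hypothesis
    return t, F

t, F = slope_tests([1, 2, 3, 4], [2, 3, 5, 4])
# t*t equals F: both tests come from the same computation.
```

For a GLM there is no such identity; each hypothesis needs its own deviance comparison between fitted models.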
--------------------------------------------------------------------------------------------------------------
Delta beta refers to the change in beta value relative to a comparison group.
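A toy illustration, with entirely made-up beta values and hypothetical group labels: delta beta is just the difference in mean beta between the two groups.

```python
# Hypothetical beta values (proportions in [0, 1]); groups are invented.
group_a = [0.80, 0.75, 0.82]
group_b = [0.20, 0.25, 0.22]

# Delta beta = mean beta of group A minus mean beta of group B.
delta_beta = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
```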
----------------------------------------------------------------------------------------------------------------
Pearson correlations are a marginal measure of association and thus sensitive to confounding factors.
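A toy demonstration of that sensitivity, with made-up data: a batch effect (the confounder) shifts both variables upward, so the marginal Pearson correlation is strongly positive even though the association within each batch is perfectly negative.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two batches: within each, x and y are perfectly negatively correlated,
# but the second batch shifts both variables up by 10.
x = [0, 1, 2] + [10, 11, 12]
y = [2, 1, 0] + [12, 11, 10]

r_marginal = pearson(x, y)          # dominated by the batch effect
r_within = pearson(x[:3], y[:3])    # the within-batch association
```

The marginal correlation is driven almost entirely by the confounder, not by any real association between x and y.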
------------------------------------------------------------------------------------------------------
Agilent arrays - can't use robust-spline normalization, only global loess normalization is meaningful.

22/11/2012

Because I always forget:
2^number
###################
(HAO) AND "Homo sapiens"[porgn:__txid9606]

19/11/2012

scvitic
spartak 25
[Note the space!]
##############

Tuesday, 13th November 2012

Bpipe
hs_err_pid4.log - a Java crash dump (a 'HotSpot error log')
See if something is still running:
bpipe jobs
To see the log, go to the directory of the pipeline and type:
bpipe log
Run with up to 8 concurrent threads:
bpipe run -n 8
########
48 cores, 264G of memory
#####################
pathognomonic - characteristic or symptomatic of a disease or condition.
-------------------------------------------------------------------------
CNV arrays:
For any future work, ideally batches of > 50.
Less than 10 is basically ineffective.
-----------------------------------
ClinVar aggregates information about sequence variation and its relationship to human health.  Because the resource is still under active development, our preliminary website release is limited to our preview site.

Thursday, 8th November 2012

grep -E "7$" hasoverlaps.bed.complete - get all lines ending in 7.
###################################################
Making a coloured .bed?
http://cloford.com/resources/colours/namedcol.htm
#######################################

Wednesday, 7th November 2012

If I ever revisit quantitative PCR (!), good reference: (PubMed)

##################################################

Perl - removing duplicates

#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    chomp;
    my ($ID, $data) = (/^(\S+)\s+(\S+.+)/);
    my @list = split(/\s+/, $data);
    # Keep the first occurrence of each element, preserving input order
    # (a bare "sort @list;" would be a no-op: sort in void context
    # does not modify the list in place).
    my %seen = ();
    my @unique_list = grep { !$seen{$_}++ } @list;
    print "$ID\t" . join(" ", @unique_list) . "\n";
}


Input:
UGT1A3r 7364 54575 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 7363 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727

Output:
UGT1A3r 7364 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727

5th November 2012

The R function sweep
Working with a matrix, when you want to modify each row or column.  Whether you operate by row or column is set by MARGIN, as for apply (1 = rows, 2 = columns).
E.g. add 1 to the 1st column, 2 to the 2nd, etc., of a 4-column matrix:
sweep(mymatrix, 2, 1:4, "+")

#############################################

find /path/to/files/ -name "session_*" -delete
#############################################
Type:
python
from Bio import SeqIO
SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.fasta", "fasta")
SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.qual", "qual")

#################################################

tar -zcvf archive_name.tar.gz directory_to_compress

##################################################
Reading in big files
library(sqldf)
f <- file("mybigfile")
system.time(bigdf <- sqldf("select * from f", dbname = tempfile(),
    file.format = list(header = F, row.names = F)))

AWK one-liner
Splitting a file on a pattern and naming the output files:

awk '/ENDMDL/{i++}{print > ("file.pdb." i)}' file

(Parentheses around the redirection target keep the string concatenation unambiguous across awk implementations.)