Learning R

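Beginning at the beginning: The R Inferno - a nice PDF, probably most helpful if you are already a programmer.
--------------------------------------------------------------------------------------
YouTube!
apply effectively removes the need for a double for loop.
tapply expects vectors: a vector of values plus a factor (or list of factors) to group them by.
-----------------------
This is wrong!: mean(2, -100, -4, 3, -230, 5)  (returns 2: mean() only treats its first argument as the data; the remaining values are matched to trim, na.rm and ... and silently ignored)
This is correct!: mean(c(2, -100, -4, 3, -230, 5))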
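A rough Python (stdlib-only) analogue of the apply/tapply ideas and of the mean pitfall. apply_rows, apply_cols and tapply_like are hypothetical helper names for illustration, not a real library:

```python
from collections import defaultdict
from statistics import mean


def apply_rows(matrix, func):
    """Rough analogue of R's apply(m, 1, func): call func once per row."""
    return [func(row) for row in matrix]


def apply_cols(matrix, func):
    """Rough analogue of R's apply(m, 2, func): call func once per column."""
    return [func(list(col)) for col in zip(*matrix)]


def tapply_like(values, groups, func):
    """Rough analogue of R's tapply(values, groups, func) for two parallel vectors."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    # tapply returns results in factor-level order; sorting the labels mimics that
    return {group: func(buckets[group]) for group in sorted(buckets)}


m = [[1, 2, 3],
     [4, 5, 6]]
print(apply_rows(m, sum))  # [6, 15]
print(apply_cols(m, sum))  # [5, 7, 9]
print(tapply_like([1, 2, 3, 4], ["a", "b", "a", "b"], mean))  # {'a': 2, 'b': 3}

# statistics.mean, like R's mean, wants ONE collection of numbers
print(mean([2, -100, -4, 3, -230, 5]))  # -54
```

Unlike R, Python's statistics.mean(2, -100) raises a TypeError instead of quietly returning 2.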

11/12/2012

If using SQLForge, human.db0 needs to be installed (but not necessarily loaded).
---------------------------------------------------------------------------------
Colour points by a condition:
plot(..., col = ifelse(foo > 2, "red", "black"))
-------------------------------------------
The aim of the moderated t-test is to test whether the log-ratios are consistently greater or less than zero, not to test whether the log-ratios are equal.  The moderated t-test penalizes a gene with a large variance less than the ordinary t-test would, because each gene's variance is shrunk towards a common value estimated from all genes.  Compared to the ordinary t-test, the moderated t-test therefore gives more weight to the treatment fold change.  It can be viewed as a compromise between the ordinary t-test and ranking by fold change.
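A sketch of the variance-shrinkage idea behind the moderated t-statistic, using limma's pooling formula: the gene's own variance is combined with a prior variance s0_sq worth d0 degrees of freedom. This is a one-sample version (mean log-ratio against zero), not limma's full linear-model implementation, and the d0, s0_sq and log-ratio values below are hypothetical:

```python
import math
from statistics import mean, stdev


def ordinary_t(log_ratios):
    """One-sample t: is the mean log-ratio different from zero?"""
    n = len(log_ratios)
    return mean(log_ratios) / (stdev(log_ratios) / math.sqrt(n))


def moderated_t(log_ratios, d0, s0_sq):
    """One-sample moderated t with limma-style variance shrinkage.

    The gene's own variance s^2 (d = n - 1 df) is pooled with a prior
    variance s0_sq worth d0 df. d0 and s0_sq are supplied by hand here
    (hypothetical values); limma's eBayes() estimates them from all genes.
    """
    n = len(log_ratios)
    d = n - 1
    s_sq = stdev(log_ratios) ** 2
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return mean(log_ratios) / math.sqrt(s_tilde_sq / n)


noisy_gene = [1.0, 3.0, 0.5, 2.5]  # made-up log-ratios with a large spread
print(round(ordinary_t(noisy_gene), 2))                     # 2.94
print(round(moderated_t(noisy_gene, d0=4, s0_sq=0.25), 2))  # 4.04
```

With these made-up numbers the noisy gene's variance is pulled down towards s0_sq, so its moderated t is larger than its ordinary t; with d0 = 0 the two statistics coincide.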
--------------------------------------------------------------------------------------------------------------
The basic difference between linear models and GLMs is that with linear models, the overall F-test and all the individual t-tests that make it up can be computed in one go from a single fit, whereas for GLMs each individual 1-df test and the overall F-test have to be separate computations.
--------------------------------------------------------------------------------------------------------------
Delta beta is the change in methylation Beta value between groups, e.g. the mean Beta in one group minus the mean Beta in the other (reference) group.
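A minimal sketch of Beta and delta beta for a single CpG. The offset of 100 in the Beta formula is the usual Illumina convention and is an assumption here; the intensities and group values are made up:

```python
from statistics import mean


def beta_value(methylated, unmethylated, offset=100):
    """Beta = M / (M + U + offset): 0 = unmethylated, close to 1 = fully methylated.

    offset=100 follows the usual Illumina convention (an assumption here);
    it keeps Beta stable when both intensities are low.
    """
    return methylated / (methylated + unmethylated + offset)


def delta_beta(group, reference):
    """Mean Beta in `group` minus mean Beta in `reference`, for one CpG."""
    return mean(group) - mean(reference)


print(beta_value(900, 0))  # 0.9
cases = [0.80, 0.70, 0.75]      # made-up Beta values
controls = [0.30, 0.40, 0.35]
print(round(delta_beta(cases, controls), 2))  # 0.4
```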
----------------------------------------------------------------------------------------------------------------
Pearson correlations are a marginal measure of association and thus sensitive to confounding factors.
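A small invented illustration of why a marginal correlation can mislead: x and y are both driven by a confounder z, so they correlate fairly strongly, but once z is regressed out of both (a partial correlation) very little association is left. Pure Python; the data are made up for the example:

```python
import math


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def residuals(y, z):
    """Residuals of a simple least-squares regression of y on z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
             / sum((zi - mz) ** 2 for zi in z))
    return [yi - my - slope * (zi - mz) for zi, yi in zip(z, y)]


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson(residuals(x, z), residuals(y, z))


# Made-up data: z plays the confounder (e.g. batch or age) driving both x and y.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [zi + e for zi, e in zip(z, [1, -1, 1, -1, 1, -1, 1, -1])]
y = [zi + e for zi, e in zip(z, [1, 1, -1, -1, 1, 1, -1, -1])]

print(round(pearson(x, y), 2))                 # 0.79: marginal correlation
print(round(partial_correlation(x, y, z), 2))  # -0.11: after adjusting for z
```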
------------------------------------------------------------------------------------------------------
Agilent arrays: robust-spline normalization can't be used; only global loess normalization is meaningful.

22/11/2012

Because I always forget:
2^number
###################
(HAO) AND "Homo sapiens"[porgn:__txid9606]

19/11/2012

scvitic
spartak 25
[Note the space!]
##############

Tuesday, 13th November 2012

Bpipe
hs_err_pid4.log is a Java crash dump (a 'HotSpot error log').
See if something is still running:
bpipe jobs
To see the log, go to the directory of the pipeline and type:
bpipe log
To run with at most 8 threads at a time:
bpipe run -n 8
########
48 cores, 264G of memory
#####################
pathognomonic - characteristic or symptomatic of a disease or condition.
-------------------------------------------------------------------------
CNV arrays:
For any future work, ideally batches of > 50.
Fewer than 10 is basically ineffective.
-----------------------------------
ClinVar aggregates information about sequence variation and its relationship to human health.  Because the resource is still under active development, our preliminary website release is limited to our preview site.

Thursday, 8th November 2012

grep -E "7$" hasoverlaps.bed.complete - get all lines ending in 7.
###################################################
Making a coloured .bed?
http://cloford.com/resources/colours/namedcol.htm
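In a BED file the colour goes in column 9 (itemRgb) as an "R,G,B" triple, and the genome browser only uses it when the track line contains itemRgb="On". A minimal sketch that turns a hex code from a colour chart into that format; the helper names, track name, coordinates and feature names are placeholders:

```python
def hex_to_bed_rgb(hex_colour):
    """'#RRGGBB' (as listed on colour charts) -> 'R,G,B', the form BED's itemRgb column uses."""
    digits = hex_colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_colour!r}")
    return ",".join(str(int(digits[i:i + 2], 16)) for i in (0, 2, 4))


def bed_line(chrom, start, end, name, colour_hex, score=0, strand="+"):
    """One 9-column BED record; thickStart/thickEnd are set to the feature bounds."""
    fields = [chrom, start, end, name, score, strand, start, end, hex_to_bed_rgb(colour_hex)]
    return "\t".join(str(f) for f in fields)


# The browser only honours column 9 when the track line switches itemRgb on
print('track name="coloured" itemRgb="On"')
print(bed_line("chr1", 1000, 2000, "featureA", "#FF0000"))  # red
print(bed_line("chr1", 3000, 4000, "featureB", "#228B22", strand="-"))  # forestgreen
```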
#######################################

Wednesday, 7th November 2012

If I ever revisit quantitative PCR (!), a good reference: (PubMed)

##################################################

Perl - removing duplicates

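#!/usr/bin/perl
# For each line "ID value value ...", drop repeated values (first occurrence kept, order preserved).
use strict;
use warnings;

while (my $line = <>) {
    chomp $line;
    # First whitespace-delimited field is the ID; everything after it is the data.
    my ($ID, $data) = $line =~ /^(\S+)\s+(.+)$/ or next;
    my @list = split /\s+/, $data;
    my %seen;
    # Keep an element only the first time it is seen.
    my @unique_list = grep { !$seen{$_}++ } @list;
    print "$ID\t" . join(" ", @unique_list) . "\n";
}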


Input:
UGT1A3r 7364 54575 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 7363 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727

Output:
UGT1A3r 7364 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727
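The same order-preserving de-duplication in Python, relying on dict keys being unique and keeping insertion order (Python 3.7+). dedupe_line is a hypothetical helper name; it reproduces the Input/Output example above:

```python
def dedupe_line(line):
    """'ID v1 v2 ...' -> 'ID<TAB>v1 v2 ...' with repeated values dropped (first occurrence kept)."""
    fields = line.split()
    if len(fields) < 2:
        return None  # a line with no data after the ID is skipped
    record_id, values = fields[0], fields[1:]
    # dict keys are unique and remember insertion order, so this de-duplicates in order
    unique_values = list(dict.fromkeys(values))
    return record_id + "\t" + " ".join(unique_values)


for raw in ["UGT1A3r 7364 54575 54575", "UGT1A8r 7363 54575 7363 54576"]:
    print(dedupe_line(raw))
```

To use it as a filter, loop over sys.stdin and print each result that isn't None.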

5th November 2012

The R function sweep
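Working with a matrix and want to change each row or column? Whether you operate by row or column is defined by MARGIN, as for apply: with MARGIN = 1, STATS has one value per row; with MARGIN = 2, one value per column.
E.g. add 1 to the 1st column, 2 to the 2nd, etc., of a 4-column matrix:
sweep(mymatrix, 2, 1:4, "+")
(sweep(mymatrix, 1, 1:4, "+") would instead add 1 to the 1st row, 2 to the 2nd, and so on, of a 4-row matrix.)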
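A rough Python analogue of sweep's semantics on a list-of-rows matrix, mainly to pin down which MARGIN does what. Like R's sweep, the default operation is subtraction. The function is a hypothetical sketch, not a port of R's code:

```python
import operator


def sweep(matrix, margin, stats, op=operator.sub):
    """Rough analogue of R's sweep(x, MARGIN, STATS, FUN) for a list-of-rows matrix.

    margin=1: stats[i] is combined with every element of row i.
    margin=2: stats[j] is combined with every element of column j.
    """
    if margin == 1:
        if len(stats) != len(matrix):
            raise ValueError("need one statistic per row")
        return [[op(v, stats[i]) for v in row] for i, row in enumerate(matrix)]
    if margin == 2:
        if any(len(row) != len(stats) for row in matrix):
            raise ValueError("need one statistic per column")
        return [[op(v, s) for v, s in zip(row, stats)] for row in matrix]
    raise ValueError("margin must be 1 (rows) or 2 (columns)")


m = [[0, 0, 0, 0],
     [0, 0, 0, 0]]
print(sweep(m, 2, [1, 2, 3, 4], operator.add))  # [[1, 2, 3, 4], [1, 2, 3, 4]]
print(sweep(m, 1, [1, 2], operator.add))        # [[1, 1, 1, 1], [2, 2, 2, 2]]
```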

#############################################

find /path/to/files/ -name "session_*" -delete
#############################################
Type:
python
then, at the Python prompt:
from Bio import SeqIO
SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.fasta", "fasta")
# the qualities only exist in the FASTQ, so convert from it; Biopython's format name is "qual"
SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.qual", "qual")
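If Biopython isn't available, a simple FASTQ can be converted by hand. This sketch assumes exactly 4 lines per record (no wrapped sequences) and Phred+33 quality encoding (Sanger / Illumina 1.8+); both are assumptions about the input, and the function names are hypothetical:

```python
import io


def fastq_records(handle):
    """Yield (name, sequence, quality) from a FASTQ with exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting {header!r}")
        yield header[1:], seq, qual


def fastq_to_fasta_and_qual(handle, phred_offset=33):
    """Return (FASTA text, QUAL text); QUAL holds space-separated Phred scores."""
    fasta, qual = [], []
    for name, seq, quality in fastq_records(handle):
        fasta.append(f">{name}\n{seq}\n")
        scores = " ".join(str(ord(ch) - phred_offset) for ch in quality)
        qual.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qual)


fasta_text, qual_text = fastq_to_fasta_and_qual(io.StringIO("@read1\nACGT\n+\nII#!\n"))
print(fasta_text, end="")
print(qual_text, end="")
```

For big files it would be better to write each record out as it is read rather than building both strings in memory.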

#################################################

tar -zcvf archive_name.tar.gz directory_to_compress

##################################################
Reading in big files
library(sqldf)
f <- file("mybigfile")
system.time(bigdf <- sqldf("select * from f", dbname = tempfile(), file.format = list(header = F, row.names = F)))
See here
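The sqldf trick works by loading the file into a temporary on-disk SQLite database (dbname = tempfile()) rather than into memory, then pulling out only what the query asks for. A rough Python equivalent with the standard csv, sqlite3 and tempfile modules; read_csv_via_sqlite is a hypothetical name, every column is stored as text (hence the CAST in the example), and the table is called f to mirror the R version:

```python
import csv
import io
import itertools
import os
import sqlite3
import tempfile


def read_csv_via_sqlite(handle, query="SELECT * FROM f", header=True):
    """Load CSV rows into a throwaway on-disk SQLite table 'f', then run `query`."""
    reader = csv.reader(handle)
    first = next(reader, None)
    if first is None:
        return []
    if header:
        columns, rows = first, reader
    else:
        columns = [f"V{i + 1}" for i in range(len(first))]  # R-style default names
        rows = itertools.chain([first], reader)
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    conn = sqlite3.connect(path)
    try:
        col_sql = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f"CREATE TABLE f ({col_sql})")
        conn.executemany(f"INSERT INTO f VALUES ({placeholders})", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()
        os.remove(path)


sample = io.StringIO("name,score\nA,10\nB,7\n")
print(read_csv_via_sqlite(sample, "SELECT name FROM f WHERE CAST(score AS INTEGER) > 8"))
```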

awk one liner

AWK
splitting a file on a pattern and naming the output files:

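awk '/ENDMDL/{i++}{print > "file.pdb."i}' file

Note: because i is incremented before the print, each ENDMDL line ends up at the top of the next file, and everything before the first ENDMDL goes to "file.pdb." (no number). To keep each ENDMDL with its own model and number the files from 1:

awk 'BEGIN{i=1} {print > ("file.pdb." i)} /ENDMDL/{close("file.pdb." i); i++}' file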
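The same split in Python, keeping each ENDMDL line with the model it closes and numbering the output files from 1. split_models and write_models are hypothetical helpers; matching the marker at the start of the line (rather than anywhere in it, as awk's /ENDMDL/ does) assumes standard PDB records:

```python
def split_models(lines, marker="ENDMDL"):
    """Group lines into models; each group ends with (and includes) its marker line.

    Anything after the last marker (e.g. a trailing END record) forms a final group.
    """
    models, current = [], []
    for line in lines:
        current.append(line)
        if line.startswith(marker):
            models.append(current)
            current = []
    if current:
        models.append(current)
    return models


def write_models(path, prefix="file.pdb."):
    """Write each model of the file at `path` to prefix1, prefix2, ...; return how many."""
    with open(path) as handle:
        models = split_models(handle)
    for i, model in enumerate(models, start=1):
        with open(f"{prefix}{i}", "w") as out:
            out.writelines(model)
    return len(models)


example = ["MODEL 1\n", "ATOM a\n", "ENDMDL\n", "MODEL 2\n", "ATOM b\n", "ENDMDL\n", "END\n"]
print(len(split_models(example)))  # 3 (two models plus the trailing END)
```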

better or dead!

"Any time you encounter a small annoyance in your daily computer life, think about how you could write a program to help solve that problem. Any time you find something interesting that you want to experiment with, do it. Play with new concepts and tools and languages, as much as possible. Get into the mentality that a day/week/month in which you have not learned something interesting is a failure. But most of all: write code. Every day, even if it's just a regular expression to search through your email history or something. Do something programming-ish as often as you can."

Hippocrates (c. 400 BC):
"Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile"
"Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult."

back to base

transform, e.g.:
head(airquality)
head(transform(airquality, Ozone = -Ozone))  # makes Ozone negative
head(transform(airquality, new = -Ozone, Temp = (Temp - 32)/1.8))
# creates a new column called 'new' holding the negated Ozone values, and converts Temp from Fahrenheit to Celsius

# example of subset
one <- subset(bnames, sex == "boy" & year == 2008)
one$rank <- rank(-one$percent, ties.method = "first")
# or
one <- transform(one, rank = rank(-percent, ties.method = "first"))
# ranking -percent means the most popular name gets rank 1 (ranks run 1, 2, 3... from the top, not from the bottom)

# Fine for one year at a time; to do all at once:
# Split
pieces <- split(bnames, list(bnames$sex, bnames$year))
# Apply
results <- vector("list", length(pieces))
for (i in seq_along(pieces)) {
  piece <- pieces[[i]]
  piece <- transform(piece, rank = rank(-percent, ties.method = "first"))
  results[[i]] <- piece
}
# Combine
result <- do.call("rbind", results)

# Or equivalently, with plyr:
library(plyr)
bnames <- ddply(bnames, c("sex", "year"), transform, rank = rank(-percent, ties.method = "first"))
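The same split-apply-combine ranking sketched in Python for a list of dicts shaped like bnames rows (the rows below are invented). A stable sort on -percent reproduces ties.method = "first": tied rows keep their original order. rank_within_groups is a hypothetical helper:

```python
from itertools import groupby
from operator import itemgetter


def rank_within_groups(rows, group_keys=("sex", "year"), value="percent"):
    """Add a 'rank' field per group: 1 = largest value, ties ranked in order of appearance."""
    key = itemgetter(*group_keys)
    result = []
    # Split: a stable sort brings each group together without reordering rows inside it
    for _, piece in groupby(sorted(rows, key=key), key=key):
        piece = list(piece)
        # Apply: like rank(-percent, ties.method = "first"); the stable sort keeps ties in input order
        order = sorted(range(len(piece)), key=lambda i: -piece[i][value])
        ranks = [0] * len(piece)
        for rank, i in enumerate(order, start=1):
            ranks[i] = rank
        # Combine: copy each row with its new rank added
        result.extend({**row, "rank": r} for row, r in zip(piece, ranks))
    return result


# Invented rows shaped like bnames (name, sex, year, percent)
rows = [
    {"name": "a", "sex": "boy", "year": 2008, "percent": 0.5},
    {"name": "b", "sex": "boy", "year": 2008, "percent": 0.3},
    {"name": "c", "sex": "boy", "year": 2008, "percent": 0.5},
    {"name": "d", "sex": "girl", "year": 2008, "percent": 0.2},
]
for row in rank_within_groups(rows):
    print(row["name"], row["sex"], row["year"], row["rank"])
```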

Another often-forgotten, sometimes-useful thing

log2-fold-changes are contained in fit2$coefficients.

R history

So again, something I have been meaning to look into: R history. By default only 512 lines are saved. I often like to tinker with something until I have worked out exactly what I want to do, so writing a perfect script from the word go is not an option. So: Sys.setenv(R_HISTSIZE = Inf) *should* save all of my history, which I can then export, edit and save as a .R file.

Google is amazing.

Have been meaning to look this up for ages: how to set the CRAN mirror permanently. Answer: put options(repos = c(CRAN = "<mirror URL>")) in ~/.Rprofile.

Graphics

This is general, not just R-specific, and I always forget the reasons, so I'm posting them here: JPEGs and PNGs are raster graphics that become pixellated when scaled up, whereas PDFs are vector graphics that scale cleanly.

Learning R

Books to think about buying: http://www.springer.com/series/6991?detailsPage=titles

Coursera - Mathematical Biostatistics Bootcamp - HWK Quiz 1

P(A U B) is always equal to... (De Morgan's laws)
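For reference, inclusion-exclusion and De Morgan's law give two equivalent expressions for P(A U B):

```latex
\begin{align*}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) && \text{(inclusion-exclusion)}\\
(A \cup B)^c &= A^c \cap B^c && \text{(De Morgan)}\\
\Rightarrow\quad P(A \cup B) &= 1 - P\big((A \cup B)^c\big) = 1 - P(A^c \cap B^c)
\end{align*}
```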

Coursera

Fabulous, amazing, just started but has enormous promise.

Today is the first day...

So! No. 1: become a better programmer. Best to continue with what I mainly use, i.e. R, so: become a better programmer in R.  Ways to do this? Program more often!  I'll start a list of documents in one of the side bars.  As I get into work early I have the luxury of time to myself, so maybe about an hour every morning, first thing?  Key to R: functions! First up, "The Art of R Programming".
TODO: snow package and parApply