Being 33SuperWoman
Focus: if not now, when? Start broad, then narrow down (e.g. programming > R > functions!). Form manageable tasks rather than wild, unrealistic dreams.
Learning R
Beginning at the beginning:
The R Inferno: a nice PDF, probably most helpful if you are already a programmer.
--------------------------------------------------------------------------------------
YouTube!
apply effectively removes the need for a double for loop.
tapply expects a vector, plus a factor of the same length to group it by.
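A minimal sketch of both points on a small made-up matrix and vector (the names m, rs, scores and grp are illustrative only): apply() with MARGIN = 1 replaces the nested loops that visit every cell of every row, and tapply() takes a vector plus a grouping vector of the same length.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 example matrix

# Row sums with a double for loop
rs <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    rs[i] <- rs[i] + m[i, j]
  }
}

# The same with apply: MARGIN = 1 means "call sum() once per row"
rs2 <- apply(m, 1, sum)

# tapply: a vector, a grouping vector of the same length, and a function
scores <- c(10, 20, 30, 40)
grp    <- c("a", "b", "a", "b")
means  <- tapply(scores, grp, mean)   # mean of scores within each group
```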
-----------------------
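This is wrong! (mean() only averages its first argument, so this returns 2; the other numbers are silently swallowed by its remaining arguments):
mean(2, -100, -4, 3, -230, 5)
This is correct! (combine the numbers into one vector first):
mean(c(2, -100, -4, 3, -230, 5))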
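A quick check of the difference, using the same six numbers:

```r
x <- c(2, -100, -4, 3, -230, 5)

wrong <- mean(2, -100, -4, 3, -230, 5)  # only the first value, 2, is averaged
right <- mean(x)                         # (2 - 100 - 4 + 3 - 230 + 5) / 6 = -54
```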
7:48 PM
11/12/2012
If using SQLForge, the human.db0 package needs to be installed (but not necessarily loaded).
---------------------------------------------------------------------------------
plot(..., col = ifelse(foo > 2, "red", "black"))
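A small self-contained version, where foo is a made-up vector: ifelse() returns one colour per value, and plot() uses them point by point.

```r
foo  <- c(1, 3, 0.5, 4)                   # hypothetical values
cols <- ifelse(foo > 2, "red", "black")   # "red" where foo > 2, otherwise "black"
plot(foo, col = cols, pch = 19)
```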
-------------------------------------------
1:35 PM
The aim of the moderated t-test is to test whether the log-ratios are consistently greater or less than zero, not to test whether the log-ratios are equal. The moderated t-test does not penalize a gene for having a large variance as much as the ordinary t-test would. Compared to the ordinary t-test, the moderated t-test gives more weight to the treatment fold change. It can therefore be viewed as a compromise between the ordinary t-test and ranking by fold change.
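The shrinkage behind this can be sketched for a single gene, assuming limma's formulation: the gene's residual variance s2 is pulled towards a prior variance s0_2 carried by d0 prior degrees of freedom. In limma, eBayes() estimates s0_2 and d0 from all genes; here every number is made up for illustration.

```r
beta_hat <- 1      # estimated log-ratio for this gene
v        <- 0.25   # unscaled variance of the coefficient
s2       <- 4      # gene-wise residual variance (deliberately large)
d        <- 4      # residual degrees of freedom
s0_2     <- 1      # prior variance (assumed, not estimated here)
d0       <- 4      # prior degrees of freedom (assumed)

# Ordinary t-statistic uses the gene's own variance
t_ordinary <- beta_hat / sqrt(s2 * v)

# Moderated t: variance shrunk towards s0_2, and d0 extra degrees of freedom
s2_post     <- (d0 * s0_2 + d * s2) / (d0 + d)
t_moderated <- beta_hat / sqrt(s2_post * v)

p_ordinary  <- 2 * pt(-abs(t_ordinary), df = d)
p_moderated <- 2 * pt(-abs(t_moderated), df = d0 + d)
```

With a large s2, shrinkage lowers the variance (4 becomes 2.5), so the same fold change gives a bigger t and a smaller p-value than the ordinary test.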
--------------------------------------------------------------------------------------------------------------
The basic difference between linear models and GLMs is that with linear models the F-test and all the individual t-tests that make it up can be conducted as one computation, whereas for GLMs each individual 1-df test and the overall F-test have to be separate computations.
--------------------------------------------------------------------------------------------------------------
Delta beta refers to the change in beta value of one group relative to another (reference) group.
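A minimal sketch, assuming beta here means a methylation beta value for one CpG site and that delta beta is taken as a difference of group means; the values and group labels are made up.

```r
# Hypothetical beta values for one CpG site in two groups
beta  <- c(0.80, 0.70, 0.75, 0.30, 0.40, 0.35)
group <- c("case", "case", "case", "control", "control", "control")

# Delta beta: mean beta in one group minus mean beta in the reference group
delta_beta <- mean(beta[group == "case"]) - mean(beta[group == "control"])
```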
----------------------------------------------------------------------------------------------------------------
Pearson correlations are a marginal measure of association and thus sensitive to confounding factors.
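A small simulation of the point, on made-up data: x and y have no direct link, but both are driven by a confounder z. Correlating the residuals after regressing each on z (a partial correlation) is one standard way to adjust; it is used here for illustration and is not part of the original note.

```r
set.seed(1)
n <- 1000
z <- rnorm(n)                    # the confounder
x <- z + rnorm(n, sd = 0.5)      # x depends only on z
y <- z + rnorm(n, sd = 0.5)      # y depends only on z

marginal <- cor(x, y)            # large, driven entirely by z

# Partial correlation: correlate what is left after removing z's effect
partial <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
```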
------------------------------------------------------------------------------------------------------
Agilent arrays: robust-spline normalization can't be used; only global loess normalization is meaningful.
6:05 PM
22/11/2012
Because I always forget:
2^number
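A one-line reminder in R, assuming the number is a log2 value such as a log2 fold change (the values are hypothetical):

```r
log2_values <- c(3, -1, 0)     # e.g. log2 fold changes
linear      <- 2^log2_values   # back on the original scale: 8, 0.5, 1
```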
###################
NCBI search restricted to human (taxonomy ID 9606):
(HAO) AND "Homo sapiens"[porgn:__txid9606]
5:46 PM
Tuesday, 13th November 2012
Bpipe
hs_err_pid4.log is a Java crash dump (a HotSpot error log).
See if something is still running:
bpipe jobs
To see the log, go to the directory of the pipeline and type:
bpipe log
Run using at most 8 threads:
bpipe run -n 8
########
48 cores, 264G of memory
#####################
pathognomonic - characteristic or symptomatic of a disease or condition.
-------------------------------------------------------------------------
CNV arrays:
For any future work, ideally use batches of more than 50.
Fewer than 10 is basically ineffective.
-----------------------------------
ClinVar aggregates information about sequence variation and its relationship to human health. Because the resource is still under active development, our preliminary website release is limited to our preview site.
8:21 PM
Thursday, 8th November 2012
grep -E "7$" hasoverlaps.bed.complete - get all lines ending in 7.
###################################################
Making a coloured .bed?
http://cloford.com/resources/colours/namedcol.htm
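A minimal R sketch of writing one, assuming the standard UCSC BED layout: the 9th column (itemRgb) holds an "R,G,B" string, and the browser only uses it when the track line says itemRgb="On". col2rgb() converts named colours (like the ones at the link) into RGB values. The regions, track name and output file are made up.

```r
# Hypothetical regions (integer coordinates avoid scientific notation on output)
regions <- data.frame(
  chrom  = c("chr1", "chr1"),
  start  = c(1000L, 5000L),
  end    = c(2000L, 6000L),
  name   = c("regionA", "regionB"),
  colour = c("red", "blue"),   # any R colour name
  stringsAsFactors = FALSE
)

# "255,0,0"-style strings, one per region
rgb_strings <- unname(apply(col2rgb(regions$colour), 2, paste, collapse = ","))

bed <- data.frame(regions[, c("chrom", "start", "end", "name")],
                  score = 0L, strand = "+",
                  thickStart = regions$start, thickEnd = regions$end,
                  itemRgb = rgb_strings)

out <- tempfile(fileext = ".bed")
writeLines('track name="coloured" itemRgb="On"', out)
write.table(bed, out, sep = "\t", quote = FALSE, row.names = FALSE,
            col.names = FALSE, append = TRUE)
```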
#######################################
8:26 PM
Wednesday, 7th November 2012
If I ever revisit quantitative PCR (!), good reference: (PubMed)
##################################################
Perl - removing duplicates
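#!/usr/bin/perl
use strict;
use warnings;
# Each input line: an ID followed by whitespace-separated values.
# Print the ID, a tab, then the values with duplicates removed
# (first occurrence kept, original order preserved).
while (my $line = <>) {
    chomp $line;
    my ($id, @values) = split ' ', $line;   # ' ' splits on runs of whitespace, ignoring leading blanks
    next unless defined $id;                # skip blank lines
    my %seen;
    my @unique = grep { !$seen{$_}++ } @values;
    print "$id\t", join(" ", @unique), "\n";
}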
Input:
UGT1A3r 7364 54575 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 7363 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727
Output:
UGT1A3r 7364 54575
UGT1A4r 54575 54490 54576
UGT1A8r 7363 54575 54576
UGT1A9r 54490
UMPK 51727
UMPK2 51727
2:13 PM
5th November 2012
The R function sweep
Working with a matrix and want to change each row or column? Whether you operate by row (MARGIN = 1) or by column (MARGIN = 2) is set by MARGIN, as for apply.
E.g. add 1 to the 1st column, 2 to the 2nd, etc., of a matrix with 4 columns:
sweep(mymatrix, 2, 1:4, "+")
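A small check of which MARGIN works column-wise, with mymatrix as a made-up 3 x 4 matrix of zeros:

```r
mymatrix <- matrix(0, nrow = 3, ncol = 4)

by_col <- sweep(mymatrix, 2, 1:4, "+")   # MARGIN = 2: STATS[j] added to every cell of column j
by_row <- sweep(mymatrix, 1, 1:3, "+")   # MARGIN = 1: STATS[i] added to every cell of row i
```

STATS must have one value per column (MARGIN = 2) or one per row (MARGIN = 1).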
#############################################
#################################################
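Compress a directory into a gzipped tarball (-z gzip, -c create, -v verbose, -f archive file name):
tar -zcvf archive_name.tar.gz directory_to_compress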
##################################################
Reading in big files
library(sqldf)
# sqldf loads the file into a temporary on-disk SQLite database instead of parsing it with read.table()
f <- file("mybigfile")
system.time(bigdf <- sqldf("select * from f", dbname = tempfile(), file.format = list(header = FALSE, row.names = FALSE)))
See here
#############################################
Delete all files named session_* under /path/to/files/:
find /path/to/files/ -name "session_*" -delete
#############################################
Convert FASTQ to FASTA (sequences) and QUAL (quality scores) with Biopython. Start python and type:
from Bio import SeqIO
SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.fasta", "fasta")
SeqIO.convert("4788-1_S1_L001_R1_001_SISPA4_1.fastq", "fastq", "4788-1_S1_L001_R1_001_SISPA4_1.qual", "qual")
4:30 PM
awk one liner
AWK
splitting a file on a pattern and naming the output files:
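awk 'BEGIN { i = 1 } { print > ("file.pdb." i) } /ENDMDL/ { close("file.pdb." i); i++ }' file
(writes file.pdb.1, file.pdb.2, ..., one per model, each ending with its own ENDMDL line)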
1:08 PM
better or dead!
"Any time you encounter a small annoyance in your daily computer life, think about how you could write a program to help solve that problem. Any time you find something interesting that you want to experiment with, do it. Play with new concepts and tools and languages, as much as possible. Get into the mentality that a day/week/month in which you have not learned something interesting is a failure. But most of all: write code. Every day, even if it's just a regular expression to search through your email history or something. Do something programming-ish as often as you can."
Hippocrates (c. 400BC):
"Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile"
"Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult."
5:37 PM
back to base
transform
e.g.
head(airquality)
head(transform(airquality, Ozone = -Ozone))
# makes Ozone negative
head(transform(airquality, new = -Ozone, Temp = (Temp-32)/1.8))
# creates a new column called 'new' with negative Ozone values
# and converts the Temp column from Fahrenheit to Celsius
# example of subset
one <- subset(bnames, sex == "boy" & year == 2008)
one$rank <- rank(-one$percent, ties.method = "first")
# or
one <- transform(one, rank = rank(-percent, ties.method = "first"))
# the use of minus means the rank values go from 1,2,3.. instead of 1000,999,998...
# Fine for one year at a time, to do all at once:
# Split
pieces <- split(bnames,
list(bnames$sex, bnames$year))
# Apply
results <- vector("list", length(pieces))
for(i in seq_along(pieces)) {
piece <- pieces[[i]]
piece <- transform(piece,
rank = rank(-percent, ties.method = "first"))
results[[i]] <- piece
}
# Combine
result <- do.call("rbind", results)
# Or equivalently, with plyr
library(plyr)
bnames <- ddply(bnames, c("sex", "year"), transform,
rank = rank(-percent, ties.method = "first"))
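Because bnames itself isn't included here, a self-contained sketch of the same split/apply/combine steps on a tiny made-up data frame with the same columns. It uses lapply() in place of the explicit for loop and needs no extra packages.

```r
# Toy stand-in for bnames (made-up numbers)
bnames <- data.frame(
  sex     = c("boy", "boy", "girl", "girl", "boy", "boy"),
  year    = c(2007, 2007, 2007, 2007, 2008, 2008),
  name    = c("Jacob", "Michael", "Emily", "Isabella", "Jacob", "Michael"),
  percent = c(0.011, 0.012, 0.010, 0.009, 0.010, 0.011),
  stringsAsFactors = FALSE
)

# Split: one data frame per sex/year combination (drop = TRUE skips empty ones)
pieces <- split(bnames, list(bnames$sex, bnames$year), drop = TRUE)

# Apply: rank within each piece, highest percent gets rank 1
results <- lapply(pieces, function(piece)
  transform(piece, rank = rank(-percent, ties.method = "first")))

# Combine
result <- do.call(rbind, results)
```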
7:25 PM
Another thing, often forgotten but sometimes useful:
In limma, the log2 fold changes are contained in fit2$coefficients.
9:18 PM
R history
So again, something I have been meaning to look into: R history. By default only 512 lines are saved. I often like to tinker with something until I have worked out exactly what I want to do, so writing a perfect script from the word go is not an option.
So:
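R_HISTSIZE is read when R starts and must be a whole number, so rather than Sys.setenv(R_HISTSIZE = Inf) inside a session, put a line like
R_HISTSIZE=100000
in ~/.Renviron. That *should* save (nearly) all of my history, which I can then export with savehistory("myhistory.R"), edit and keep as a .R file.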
5:18 PM
Google is amazing.
Have been meaning to look this up for ages:
how to set the CRAN mirror permanently.
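One common way to do this (the original answer's link is lost, so this is an assumption about the approach): set the repos option from ~/.Rprofile, which R runs at the start of every session. The cloud.r-project.org mirror is just an example choice.

```r
# Put this in ~/.Rprofile
local({
  repos <- getOption("repos")
  repos["CRAN"] <- "https://cloud.r-project.org"   # example mirror
  options(repos = repos)
})
```

After that, install.packages() no longer asks for a mirror.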
3:14 PM
Graphics
This is general, not R-specific, and I always forget the reasons, so posting it here:
JPEGs and PNGs are raster graphics, which become pixellated when scaled up, whereas PDFs are vector graphics, which scale cleanly.
9:07 PM
Learning R
Books to think about buying: http://www.springer.com/series/6991?detailsPage=titles
9:49 PM
Coursera - Mathematical Biostatistics Bootcamp - HWK Quiz 1
P(A ∪ B) is always equal to... (De Morgan's laws)
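For reference, the identity De Morgan's law gives for the probability of a union, alongside the usual inclusion-exclusion form:

```latex
(A \cup B)^c = A^c \cap B^c
\quad\Longrightarrow\quad
P(A \cup B) = 1 - P(A^c \cap B^c)
\qquad\text{and}\qquad
P(A \cup B) = P(A) + P(B) - P(A \cap B)
```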
5:26 PM
Coursera
Fabulous, amazing, just started but has enormous promise.
10:55 PM
Today is the first day...
So! No. 1: become a better programmer. Best to continue with what I mainly use, i.e. R, so become a better programmer in R. Ways to do this? Program more often! I’ll start a list of documents in one of the sidebars. As I get into work early I have the luxury of time to myself, so maybe about an hour every morning, first thing? Key to R: functions! First up, “The Art of R Programming”.
TODO: snow package and parApply
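As a starting point for this TODO, a sketch using the parallel package that ships with R, which includes snow's cluster functions (makeCluster, parApply, stopCluster). This swaps in parallel for the snow package itself; the matrix is made up.

```r
library(parallel)   # ships with R; provides snow-style cluster functions

m <- matrix(1:20, nrow = 4)

cl <- makeCluster(2)                     # two local worker processes
row_sums_par <- parApply(cl, m, 1, sum)  # like apply(m, 1, sum), split across workers
stopCluster(cl)                          # always shut the workers down

row_sums <- apply(m, 1, sum)             # serial version for comparison
```

For a job this small the cluster overhead outweighs any gain; parApply only pays off when each row's work is expensive.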
7:54 PM