n0b3l1a: August 2014

Friday, August 29, 2014

Eclipse hotkeys

Shift+hover over a function: shows its source code
Alt+Left, Alt+Right: back / forward through previously visited views
F3: open declaration
Ctrl+F3: outline
Ctrl+hover: open declaration or implementation

Monday, August 25, 2014

RESTful web APIs for getting gene information

NCBI's E-Utils

BRCA1 (Entrez Gene ID 672)
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=672
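A minimal sketch of building that ESummary URL from code rather than by hand. The base URL and the db and id parameters come from the link above; the function names esummary_url and fetch_summary are my own, and the fetch step assumes the endpoint is still reachable at that address.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the E-Utils link above.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(db, uid):
    """Build an ESummary URL for one record (hypothetical helper)."""
    params = {"db": db, "id": str(uid)}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


def fetch_summary(db, uid, timeout=30):
    """Download the raw response body as text (needs network access)."""
    with urlopen(esummary_url(db, uid), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # 672 is the Gene ID used for BRCA1 in the link above.
    print(esummary_url("gene", 672))
```

Swapping the id is enough to look up a different gene; the response still has to be parsed separately.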

BioMart
http://central.biomart.org/martwizard/#!/Search_by_database_name?mart=Hugo+Gene+Nomenclature+(HGNC)+(EBI%2C+UK)&step=1&datasets=hgnc
<!DOCTYPE Query>
<Query client="true" processor="TSV" limit="-1" header="1">
  <Dataset name="hgnc" config="hgnc_config_1">
    <Filter name="gd_status" value="Approved" filter_list=""/>
    <Filter name="gd_app_sym" value="BRCA1" filter_list=""/>
    <Attribute name="gd_aliases"/>
  </Dataset>
</Query>
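The query XML above can also be generated for any gene symbol instead of being edited by hand. This is only a sketch: the element and attribute names are copied from the query above, while the function name build_hgnc_alias_query is my own, and how the XML is then submitted to BioMart is not covered here.

```python
import xml.etree.ElementTree as ET


def build_hgnc_alias_query(symbol):
    """Return BioMart query XML asking for the aliases of one approved symbol.

    Hypothetical helper; it mirrors the structure of the query above.
    """
    query = ET.Element(
        "Query",
        {"client": "true", "processor": "TSV", "limit": "-1", "header": "1"},
    )
    dataset = ET.SubElement(
        query, "Dataset", {"name": "hgnc", "config": "hgnc_config_1"}
    )
    # Restrict to approved entries, then to the requested symbol.
    ET.SubElement(
        dataset, "Filter",
        {"name": "gd_status", "value": "Approved", "filter_list": ""},
    )
    ET.SubElement(
        dataset, "Filter",
        {"name": "gd_app_sym", "value": symbol, "filter_list": ""},
    )
    # The only column requested is the alias list.
    ET.SubElement(dataset, "Attribute", {"name": "gd_aliases"})
    return "<!DOCTYPE Query>" + ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_hgnc_alias_query("BRCA1"))
```

Building the tree with ElementTree rather than string formatting means a symbol containing characters such as & or " is escaped correctly.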

HGNC
http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=on&col=gd_app_sym&col=gd_aliases&status=Approved&status=Entry+Withdrawn&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&submit=submit&.cgifields=&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag
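The HGNC download link is hard to read as one long string, so here is a small sketch that splits it into its query parameters with the standard library. The URL is copied verbatim from above; the function name describe_hgnc_download is my own.

```python
from urllib.parse import parse_qs, urlsplit

# Copied verbatim from the HGNC link above.
HGNC_URL = (
    "http://www.genenames.org/cgi-bin/hgnc_downloads.cgi"
    "?title=HGNC+output+data&hgnc_dbtag=on&col=gd_app_sym&col=gd_aliases"
    "&status=Approved&status=Entry+Withdrawn&status_opt=2&where="
    "&order_by=gd_app_sym_sort&format=text&limit=&submit=submit"
    "&.cgifields=&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag"
)


def describe_hgnc_download(url):
    """Return the query parameters as a dict of lists (hypothetical helper).

    parse_qs drops parameters with empty values by default, so the blank
    'where' and 'limit' fields do not appear in the result.
    """
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    for name, values in describe_hgnc_download(HGNC_URL).items():
        print(name, values)
```

Read this way, the link requests two columns (gd_app_sym and gd_aliases) for entries whose status is Approved or Entry Withdrawn, returned as plain text and ordered by gd_app_sym_sort.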

Monday, August 18, 2014

One Codex Wants To Be The Google For Genomic Data

http://techcrunch.com/2014/08/15/one-codex-wants-to-be-the-google-for-genomic-data/

As hospitals and public health organizations switch to using genomic data for testing, searching through genomic data can still take some time. Y Combinator-backed startup, One Codex, wants to help researchers, clinicians and public health officials, who have sequenced more than 100,000 genomes and created petabytes of data, to search this data.

For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights

http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0

Yet far too much handcrafted work — what data scientists call “data wrangling,” “data munging” and “data janitor work” — is still required. Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.

ClearStory Data, a start-up in Palo Alto, Calif., makes software that recognizes many data sources, pulls them together and presents the results visually as charts, graphics or data-filled maps. Its goal is to reach a wider market of business users beyond data masters.

Trifacta makes a tool for data professionals. Its software employs machine-learning technology to find, present and suggest types of data that might be useful for a data scientist to see and explore, depending on the task at hand.

Friday, August 1, 2014

Human Longevity Project

http://www.genomeweb.com/blog/deeper-bench

With the launch of his new company, Human Longevity, this year, Venter aims to not only sequence tens of thousands of people, but also collect physiological data such as how much blood their heart can pump and brain size. So far, he tells Tech Review that his company has sequenced 500 people who are now beginning to undergo those additional tests.

"Google Translate started as a slow algorithm that took hours or days to run and was not very accurate. But Franz [Och] built a machine-learning version that could go out on the Web and find every article translated from German to English or vice versa, and learn from those," Venter says. "And then it was optimized, so it works in milliseconds."