Module 2 Notes - SCIE20001

Title: Module 2 Notes - SCIE20001
Author: Janath Fernando
Course: Thinking Scientifically
Institution: University of Melbourne

Summary

Lecture content (not general content) for weeks 4 through 6....



Thinking Scientifically SCIE20001: Module 2 Notes – Bias and Self-Correction

Video 1: Introduction to Bias and Self-Correction

What makes scientific evidence trustworthy? What are the self-correction mechanisms that create what philosophers of science call 'collective objectivity'?

- Replication
- Error detection
- Peer review

Investing in these self-correction mechanisms is what makes science trustworthy. Are we investing enough in replication? On the whole, we would like the odds of successful replication to be greater than 50%.

Are we investing enough in error detection?
- E.g. rerunning a statistical analysis as an independent researcher.
- An analysis of 20,000 biomedical papers found that 4% contained inappropriately duplicated images.
  o These duplicated images appeared in published papers, so they were not picked up during peer review.
- It is common for error detection work to be done voluntarily by retired scientists.
- Those who complete replication studies are often unfairly labelled the replication 'police' or 'bullies'.

We need to provide incentives for self-correction to build trustworthy science.

“Transparency doesn’t guarantee credibility; it only guarantees you get the credibility you deserve.” Open science must be combined with inviting and rewarding inspection (criticism must be invited and rewarded for science to operate effectively).

The scientific method is often presented as a single formula for all sciences and as an impartial 'view from nowhere'. However, the ideal scientific method is actually a disputed ideal in some sciences and mythical in practice: many contemporary scientists argue that there is no one set formula that scientists should follow. Scientific communities assess the inter-subjective validity of claims before regarding them as trustworthy contributions to their objective fields of science. There are many good reasons to trust scientific knowledge, even if the mythical scientific method is not one of them.

Video 2: Cognitive Biases

Science is done by humans, and humans are prone to a range of cognitive biases. Visual perception is to illusions as thinking is to cognitive biases. There are limits on our attention and memory:
- We create shortcuts to work around those limits. Sometimes they are useful and sometimes they are not.

Confirmation bias
- A tendency to search for, interpret, focus on, and remember information in a way that confirms our preconceptions.
  o E.g. putting a lot of weight on evidence that supports your theory.
  o To accommodate evidence we encounter, we might even change our requirements and provide excuses for evidence that doesn't fit our 'definition'.
- Example (the graph is based on nesting behaviours of ants): aggressive behaviour can be difficult to catch in ants (these types of behaviours are easy to overlook, especially when we don't expect them). Hypothesis: limited aggression exists between nest mates.
  o In this review study, in some experiments the observers were 'blind' to (i.e. unaware of) the hypothesis that there is limited aggression between nest mates.
  o Aggression amongst nest mates (which goes against the theory) was three times more likely to be reported in blinded than in non-blinded experiments.
  o Observations are prone to bias when the observer has an interest in the outcome of the study (e.g. when the hypothesis is known).
- How does confirmation bias manifest in science?
  o Cherry picking outcomes and studies (this has caused a lot of problems).
  o The file drawer problem / publication bias (which cannot be fixed with blinding): driven by external pressures, since scientific publication is very competitive and journals want to publish positive, not negative, results.
  o The file drawer problem can only be addressed through second-level self-correction mechanisms.

Hindsight bias
- "I knew it all along."
- As humans we like to fit new events into existing narratives to help us remember things and make sense of the world, so we can trick ourselves into thinking we predicted something earlier than we actually did.
- HARKing (Hypothesising After the Results are Known).
- There is a fine line between fooling ourselves and actually making discoveries.

Mitigating confirmation and hindsight bias (what can we do?)
- Make sure we are transparent and practice open science (lecture 3):
  o Preregister study plans.
  o Publish registered reports.
  o Keep lab notebooks; consult the historical record of ideas.
- Do more replication studies:
  o Retest hypotheses (don't use the same data to "generate" and "test" a hypothesis; see the sketch below).
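A minimal sketch of keeping hypothesis generation and hypothesis testing on separate data, using made-up data and hypothetical variable names: the dataset is split once, the most promising predictor is chosen on the exploration half only, and that single pre-chosen hypothesis is then tested on the held-out half.

```python
# Minimal sketch: keep hypothesis *generation* and hypothesis *testing*
# on separate portions of the data (names and data here are made up).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Pretend dataset: one outcome and several candidate predictors.
n = 200
data = {f"predictor_{i}": rng.normal(size=n) for i in range(5)}
outcome = rng.normal(size=n)

# Split once, up front: half for exploration, half for confirmation.
explore_idx = rng.permutation(n)[: n // 2]
confirm_idx = np.setdiff1d(np.arange(n), explore_idx)

# Exploration: look for the predictor most correlated with the outcome.
best_name, best_r = None, 0.0
for name, values in data.items():
    r, _ = stats.pearsonr(values[explore_idx], outcome[explore_idx])
    if abs(r) > abs(best_r):
        best_name, best_r = name, r

# Confirmation: test that single, pre-chosen hypothesis on the held-out half.
r_confirm, p_confirm = stats.pearsonr(data[best_name][confirm_idx],
                                      outcome[confirm_idx])
print(f"Exploratory pick: {best_name} (r={best_r:.2f})")
print(f"Confirmatory test: r={r_confirm:.2f}, p={p_confirm:.3f}")
```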

Video 3: What is Open Science?

Open science is conducting research transparently enough for others to repeat EXACTLY what you did. It means getting closer to a world where someone else can repeat a study without needing to contact you. Sharing data and materials serves several purposes:
- Clarifies your exact processes so people can evaluate them.
- Allows people to check the analysis and data.
- Reveals mistakes in analysis.
- Makes questionable research practices more detectable.
- Makes it possible for people to replicate the work.

What is 'sharing data'?
- Data refers to information collected during your study.
- Sharing data ideally means 'FAIR' data:
  o Findable
    - Has detailed metadata describing how the study was conducted and what the variables mean.
    - Has a stable DOI or handle.
  o Accessible
    - Available to everyone without needing to ask.
    - Exceptions: where there are legitimate privacy concerns, or where ethics approvals (e.g. the consent forms used) preclude sharing of the data.
  o Interoperable
    - Community-agreed formats and language for data and metadata.
  o Reusable

    - Include all of the data collected during the study (not just the subset included in the paper).
    - Machine readable (share databases, tables, or CSVs of tables, not something like a PDF; see the sketch below).
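As a small, hypothetical illustration of machine-readable sharing (the file names and variables are invented), the sketch below writes the data table as a CSV plus a companion JSON 'data dictionary' describing how the study was run and what each column means, rather than burying the table in a PDF.

```python
# Minimal sketch of machine-readable data sharing (hypothetical file
# names and variables): a CSV of the raw table plus a JSON "data
# dictionary" describing the study and what each column means.
import csv
import json

rows = [
    {"participant_id": 1, "age": 34, "condition": "control",   "score": 12.5},
    {"participant_id": 2, "age": 29, "condition": "treatment", "score": 15.0},
]

# The data itself, as a plain CSV rather than a PDF table.
with open("study_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)

# Metadata: what the variables mean, their units, and how the data were collected.
metadata = {
    "title": "Example study (made-up)",
    "collection_method": "Online survey, 2020",
    "variables": {
        "participant_id": "Anonymised participant identifier",
        "age": "Age in whole years at time of survey",
        "condition": "Experimental condition: control or treatment",
        "score": "Task score, 0-20 scale",
    },
}
with open("study_data_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```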

(Figure: shows a substantial difference in data sharing between different fields, and a trend over time. NOTE: not all of this shared data is FAIR data.)

What is 'sharing materials'?

Materials: anything you used to 'do' the study, including:

- Data collection sheets
- Stimuli
- Protocols
- Analysis code/files

Sharing materials:
- Share all the materials you used.
- Conduct analyses in a repeatable way.
- Share analysis code that will work on someone else's computer.

Data sharing is much more likely to occur than materials sharing; there is plenty of room to grow.

Open Science Badges
- Acknowledge open science behaviours, including sharing data and sharing materials.
- Implemented by over 30 journals.
- Badges have been shown to increase the rate of positive behaviours such as data sharing (Kidwell et al. 2016).

Where to share?

- Data: Data Dryad, Dataverse, PANGAEA
- Data and materials: Figshare, GitHub, Bitbucket
- The Open Science Framework (OSF) can be used to help researchers share their materials, code, and data, with the goal of making science more reproducible. https://www.youtube.com/watch?v=2TV21gOzfhw

Video 4: Replication

- The percentage of the published literature made up of replication studies is low (only around 1%).
- More than 70% of researchers have tried and failed to reproduce another scientist's experiment, and more than half have failed to reproduce their own experiments.
- Efforts to ensure reproducibility can increase the time spent on a project by 30%.
- We are too heavily focused on breakthroughs and not focused enough on error detection and correction.

Direct replication projects
- Reproducibility Project: Psychology: a team of over 200 researchers around the world tried to replicate 100 published psychology studies. Only 39 of these 100 could be replicated successfully with the same results (e.g. the same quantitative results, or results in the same significant direction).
- Large-scale replication projects show large issues with replication (even in pre-clinical medicine and biology).

Why are successful replication rates so low?
- Non-significant results are largely absent from the published literature (think of the file drawer problem).
- Non-significant or negative results are typically missing because journals tend not to publish studies that are not breakthroughs. What if, for every significant result a researcher publishes, there is a non-significant one hidden away in the file drawer?

This model of publishing (registered reports) entirely removes the temptation to judge articles based on whether their results were statistically significant or not. It forces reviewers and editors to judge work on how interesting the question is and how detailed the method is, rather than on how the results turned out. This is how publication should work, since researchers have control over the question and method, not over the results.

Hypothetico-deductive scientific method

p-hacking refers to a range of statistical practices designed to push results under the significance threshold (p = 0.05), e.g. choosing whether or not to exclude outliers from the study. p-hacking, cherry picking, and HARKing all increase the false-positive error rate.
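As a small illustration of how flexible analysis choices inflate false positives, the simulation below (entirely made up) draws two groups from the same distribution, so there is no true effect, then lets the 'analyst' report whichever of two p-values is smaller (with or without trimming extreme values). The rate of p < 0.05 results rises above the nominal 5%.

```python
# Minimal simulation (made-up setup): with NO true effect, reporting the
# smaller of two p-values (trim "outliers" or not) pushes the
# false-positive rate above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 5000, 30
honest_fp = 0   # false positives from a single pre-specified test
hacked_fp = 0   # false positives when the "best" of two analyses is reported

def trim(x):
    """Drop the 2 smallest and 2 largest values (a flexible 'outlier' rule)."""
    return np.sort(x)[2:-2]

for _ in range(n_sims):
    # Two groups drawn from the SAME distribution: any significant
    # difference is a false positive by construction.
    a = rng.normal(size=n)
    b = rng.normal(size=n)

    p_all = stats.ttest_ind(a, b).pvalue
    p_trim = stats.ttest_ind(trim(a), trim(b)).pvalue
    p_best = min(p_all, p_trim)   # report whichever analysis "worked"

    honest_fp += p_all < 0.05
    hacked_fp += p_best < 0.05

print(f"False-positive rate, single pre-specified test: {honest_fp / n_sims:.3f}")
print(f"False-positive rate, pick-the-best analysis:    {hacked_fp / n_sims:.3f}")
```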

Publication bias has become worse over time: there are more and more positive results, whilst the rate of negative (non-significant) results is decreasing. In fact, researchers are now cherry-picking within negative results to turn them into positive results.


Researchers believe that p-hacking is wrong in a jaywalking way, even though it is wrong in a grand-theft sort of way.

Many Analysts, One Dataset (Hannah Fisher)
- Conducting multiple analyses of the same question using the same data can indicate how robust effects are to modelling choices, BUT we usually only present a single result from a single model.
- In the study asking whether referees give more red cards to darker-skin-toned players, the odds ratio across analysis teams ranged from 0.89 to 2.93 (predominantly between 0.89 and 1.71, with two potential outliers); 69% of analysts found a significant effect.
- Why do these differences occur?
  o People approach the same research question differently, even with the same data.
  o Different approaches lead to important differences in results.
  o Thus, we should place less faith in the results of a single model in a single paper, and rely more on multiple analyses of the same question within a paper, examining the robustness of the overall finding under different modelling choices (see the sketch below).
  o There should be more acceptance of multiple studies looking at exactly the same question, and evaluation of how differences between those studies reflect differences in how researchers approach the question.
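A hedged sketch of the 'many analysts, one dataset' idea, using simulated data and invented covariate names rather than the actual red-card dataset: the same binary outcome is modelled with several defensible logistic-regression specifications, and the odds ratio for the predictor of interest is compared across them.

```python
# Made-up illustration of "many analysts, one dataset": several defensible
# model specifications fit to the SAME simulated data give a spread of
# odds ratios for the predictor of interest. (Not the real red-card data.)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
skin_tone = rng.uniform(0, 1, n)      # predictor of interest (invented)
position = rng.integers(0, 4, n)      # possible covariate (invented)
league = rng.integers(0, 3, n)        # possible covariate (invented)
logit = -2.0 + 0.3 * skin_tone + 0.1 * position
red_card = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def odds_ratio(covariates):
    """Fit a logistic regression and return the odds ratio for skin_tone."""
    X = sm.add_constant(np.column_stack([skin_tone] + covariates))
    fit = sm.Logit(red_card, X).fit(disp=0)
    return np.exp(fit.params[1])      # coefficient 1 is skin_tone

specs = {
    "skin tone only":      [],
    "+ position":          [position],
    "+ league":            [league],
    "+ position + league": [position, league],
}
for name, covs in specs.items():
    print(f"{name:22s} OR = {odds_ratio(covs):.2f}")
```

In this toy example the estimates differ only modestly; in the real many-analysts project, defensible modelling choices produced odds ratios ranging from 0.89 to 2.93.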


Video 5: Error Detection

'Rewards' for error detection (the labels error detectors attract):
- Data thugs "on a witch hunt"
- Methodological terrorists
- Replication bullies / replication police
- Second stringers
- Schoolmarms

Labs and supervisors that are open to open science practices help foster error detection practices among PhD and Masters students.

Interview 1 with Michèle Nuijten (looks for inconsistent p-values)
- statcheck looks for inconsistencies in reported null hypothesis tests, using the test statistic, the degrees of freedom, and the p-value.
- statcheck recomputes the p-value and checks whether the reported p-value matches the recomputed one (see the sketch below).
- statcheck was initially run on around 30,000 psychology papers; the inconsistency rate was fairly large, at about 50% (many of these papers' p-values differed only in the third decimal place). One in eight papers was found to have a decision inconsistency (e.g. the reported p-value was below 0.05 when in fact it should have been above).
- There are so many numbers in a statistical analysis that there are numerous places where a mistake can be made.
- A lot of papers can be scanned at the same time, which is a benefit of powerful tools such as statcheck.
- Peer review can also be a very important tool for 'fraud' detection within scientific papers.
- As statcheck is an automated system, it can also make errors in recalculating the p-value.
- Whilst statcheck can detect whether or not there is a discrepancy, it cannot determine where exactly the error occurred (e.g. in the p-value or in the degrees of freedom).
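The consistency check described above can be sketched in a few lines. This is not the real statcheck package (an R tool that parses statistics straight out of papers); it is a minimal, hypothetical illustration of the core idea for a t-test result, using SciPy to recompute the p-value from the reported test statistic and degrees of freedom.

```python
# Minimal sketch of a statcheck-style consistency check (not the real
# statcheck package): recompute p from the reported test statistic and
# degrees of freedom, then compare against the reported p-value.
from scipy import stats

def check_t_result(t_value, df, reported_p, tol=0.0005):
    """Recompute a two-tailed p-value for t(df) and flag inconsistencies."""
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)
    inconsistent = abs(recomputed_p - reported_p) > tol
    # "Decision inconsistency": reported and recomputed p fall on different
    # sides of the 0.05 threshold.
    decision_error = (reported_p < 0.05) != (recomputed_p < 0.05)
    return recomputed_p, inconsistent, decision_error

# Hypothetical reported result: t(28) = 2.20, p = .03
recomputed, inconsistent, decision_error = check_t_result(2.20, 28, 0.03)
print(f"Recomputed p = {recomputed:.4f}, "
      f"inconsistent: {inconsistent}, decision error: {decision_error}")
```

For this made-up result the recomputed two-tailed p is about .036, so the reported p = .03 would be flagged as inconsistent, though not as a decision inconsistency.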

Interview 2 with Elisabeth Bik (looks for duplicated microscopy photos)
- Scans publications and figures to see whether images are duplicated within a paper, or duplicated against the same photo in other papers.
- Looks for the same microscopy photo (e.g. the same area under a microscope) reproduced in different reports for different purposes (most probably done intentionally).
- Most peer reviewers focus on different things from image investigators (they may be looking for different things).
- When errors are found, journals and publishers are contacted; sometimes nothing happens (only about one third of flagged papers are corrected).
- Error detectives are often not met with congratulations and rewards but with resistance; it is mostly volunteer work, with goodwill as the main prize.
- Error detectives don't often make a lot of friends, but over time more positive comments are coming in.
- A huge amount of work goes into error detection and finding inconsistencies.
- It is usually not a full-time job, as it is unpaid, but it still takes up a lot of time.
- Journals and publishers are probably the best place for error detectives to be involved; institutions may have a conflict of interest.
- Once a paper is published, it is much more embarrassing for publishers to retract a research report from the journal.
- Where there is a lot of pressure to publish, and time pressure, there can be a temptation to reuse images or partake in other misconduct; there is a lot of pressure to get a significant result.
- A lab where that pressure is alleviated is one that can prevent misconduct from occurring.
- If students see misconduct happen, they can blow the whistle on the misconduct of their peers.

Interview 3 with James Heathers (looks for inconsistent summary statistics)
- Looks for different kinds of errors.
- In one psychology study there was a series of summary statistics (e.g. means, medians, maxima, minima) that could not be recreated manually; he was trying to recreate the summary statistics (there was no underlying data behind the summaries).
- The headspace for doing error detection is like hacking, or trying to solve a puzzle.
- One in six papers had an inconsistency which could not be explained.
- GRIM logic: when a variable is a row of integers (e.g. age), the mean of n whole numbers must be a multiple of 1/n. For example, with a sample of ten people whose ages are all integers, the mean must land on a tenth; if a value such as 10.07 is reported, an error must have been made (see the sketch below).
- Data entry can be an issue too, e.g. missing values that were not recorded (this could explain a detected error).
- statcheck (Michèle Nuijten) is automatic, so nothing needs to be entered manually, which is an advantage over GRIM (a paper just needs to be fed into the system).
- Replication is heavily under-resourced, much like error detection; it probably won't get funding, as journals do not find it interesting.
- When error detection methods such as GRIM were published, reactions were highly variable.
- Mathematical criticism results in a lot of scrutiny of the error detectors, which is ironic.
- A lot of scientific results will not survive being poorly organised; labs which are not well organised introduce a gap you cannot get past.
- The assessment task for this module is basically an exercise in how to organise our scientific material.
- Being in an environment where individuals complete work but are left off the publication goes hand in hand with the notion of "publish by any means necessary."
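The integer-means logic James Heathers describes is the basis of the GRIM test. Below is a minimal sketch of that check, assuming the raw values are whole numbers and the mean is reported to two decimal places; it is an illustration of the idea rather than the published GRIM procedure.

```python
# Minimal sketch of a GRIM-style check (as described above): when the raw
# values are integers, the mean of n of them must be k/n for some whole k,
# so a reported mean can be tested for consistency with the sample size.
def grim_consistent(reported_mean, n, decimals=2):
    """Return True if a mean reported to `decimals` places is achievable
    as the mean of n integers."""
    total = reported_mean * n
    # The nearest achievable means bracket the reported mean; check whether
    # either of them rounds back to the reported value.
    for k in (int(total), int(total) + 1):
        if round(k / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# Example from the notes: with n = 10 integer ages, the mean must land on a
# tenth, so a reported mean of 10.07 cannot be correct.
print(grim_consistent(10.07, 10))   # False -> inconsistent
print(grim_consistent(10.10, 10))   # True  -> consistent
```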

Video 6: Practice, Culture and Peer Review

Improving scientific culture:
- Open peer review
- Reward error detection
- Be open to criticism
- Preregister, or use Registered Reports
- Share data, be open, but beware that credibility requires more than just transparency

Open science is like opening the hood; error detection is like finding what is wrong under the bonnet, inside the engine. We should invite and reward inspection.

Interview with Simine Vazire, Editor in Chief of Collabra: Psychology
- The general public's view of peer review is that it should focus on finding errors, but it instead focuses on big ideas and glamorous reports.
- Reviewers are often asked not so much whether the report is solid and free of errors, but rather "what are your thoughts on this?", so reviewers apply a much more subjective than objective lens.
- A lot of the substance of reviews is about how the reviewer would prefer something to be done, rather than about the errors, which often lie within the data and results.
- Peer review can be improved by blinding procedures in which reviewers don't know whose paper they are reviewing (e.g. whether the authors are from a fancier institution).
- Reviewers should be asked to re-analyse the data.
- Utilise the skills of reviewers: publishers love flashy, prestigious papers, which sways what reviewers really look for (peer review now often involves a number of people, so distributing tasks among them would be better).
- Peer review can be improved by linking reviews within journals, i.e. showing the content of peer reviews (if not anonymous); this provides an incentive to do a better job with the review.
- Science should put self-correction first: it assists replication, both by the original researchers and by others, and provides ammunition against critics of error detection.
- Students should work in labs with a culture of openness and of checking each other's work.
- The nitty gritty of lab practices is what matters most (e.g. is data shared between researchers, are reports registered?).
- Publishing negative results should be given more weight; it is heavily undervalued.

Video 7: Assessment Task (https://osf.io/)
- Create your own account.
- Build an Open Science Framework project page for a made-up project (or a real one, if you have a group project in another class, etc.).
- Create components within your project (e.g. one where you could store background reading and scientific literature; one for your method files; one for analysis and results files).
- Explore how to make 'private' and 'public' components. What kinds of things would you keep private? What should be public?

What you need to hand in: a document containing a screenshot of the OSF page you created, together with no more than 500 words of text reflecting on how making the scientific workflow and data accessible and transparent through platforms like OSF improves science's self-correction capacity.

In the written piece we want to see these three things, each worth a third of your mark for this assessment (absolute maximum of 500 words):
- Provide a link to the OSF project page you've created (for either a made-up or real project, it doesn't matter).

- 1-2 short paragraphs explaining the role that open science tools like OSF play in improving the transparency and openness of science.
- 1-2 short paragraphs explaining the connection between transparency and openness on the one hand, and "self-correction" on the other (e.g., are transparency and openness sufficient for self-correction? If not, what else do we need to ensure correction is 'working'?)

