Updated Post
After a lot of work, I have finally merged three different datasets. The result is a time series data frame with 43,396 observations of 7 variables. Below, I have included a few rows of what my data looks like.
Dyad  year  cyberattack  cybersev  MID  MIDsev  peace score
2360  2005  NA           NA        0    1       0
2360  2006  NA           NA        NA   NA      0
2360  2007  1            3.0       0    1       0
2360  2008  1            4.0       0    1       0
2360  2009  3            3.33      1    4       0
2360  2010  1            3.0       NA   NA      0
2360  2011  3            2.0       NA   NA      0
2360  2012  1            2.0       NA   NA      0
2360  2013  4            2.0       NA   NA      0
If I am interested in comparing how often different country pairs (dyads) launch attacks (in cyberspace, physically with MIDs, or not at all), how should I go about doing this?
Since I am working with country/year data, how can I get descriptive statistics for the different countries (Dyads) in my Dyad variable? For example, I would like to know how the behavior of Dyad 2360 (USA and Iran) compares with other countries.
I tried this code, but it just gave me a list of my unique dyad pairs:
table(final$Dyadpair)
names(sort(-table(final$Dyadpair)))
You mentioned using aggregate or dplyr, but I don't see how those will allow me to get descriptive statistics for all of my unique dyads. Would you mind elaborating on this?
Is it possible for code to return something like this: for Dyad 2360 during the years 2005-2013, 80% were NA, 10% were cyber attacks, 10% were MID attacks, etc.?
Update to clarify:
Ok, yes - the above example was just hypothetical. Based on the nine rows of data that I have provided - here is what I am hoping R can provide when it comes to descriptive statistics.
Dyad: 2360
No attacks: 22.22% (2/9), in 2005 and 2006
Cyber attacks: 77.78% (7/9), in the years 2007-2013
MID attacks: 11.11% (1/9), in 2009
Both cyber and MID: 11.11% (1/9), in 2009
Essentially, during a given time range (2005-2013 for the example I gave above), how many of those years result in NO attacks, how many of those years result in a cyber attack, how many of those years result in a MID attack, and how many of those years result in both a cyber and MID attack.
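A tabulation like this could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame is typed in by hand from the nine rows above, NA is treated as "no attack recorded" (which may not be right for the full data), and the helper columns none, cyber, mid and both are made-up names, not part of the original data.

```r
# The nine example rows for dyad 2360, typed in by hand from the post.
final <- data.frame(
  Dyad        = 2360,
  year        = 2005:2013,
  cyberattack = c(NA, NA, 1, 1, 3, 1, 3, 1, 4),
  MID         = c(0, NA, 0, 0, 1, NA, NA, NA, NA)
)

# Assumption: NA means "no attack recorded", so it counts the same as 0.
# These four indicator columns are hypothetical helpers.
final$cyber <- !is.na(final$cyberattack) & final$cyberattack > 0
final$mid   <- !is.na(final$MID) & final$MID > 0
final$both  <- final$cyber & final$mid
final$none  <- !final$cyber & !final$mid

# The mean of a logical vector is the share of TRUE values, so averaging
# within each dyad gives the percentage of dyad-years in each category.
shares <- aggregate(
  final[c("none", "cyber", "mid", "both")],
  by  = list(Dyad = final$Dyad),
  FUN = function(x) round(100 * mean(x), 2)
)
print(shares)
```

For the nine rows shown, this gives 22.22, 77.78, 11.11 and 11.11 for none, cyber, mid and both. With the full data frame, the same call would return one row per dyad. The categories overlap (a year with both kinds of attack also counts as a cyber year and as a MID year), so the percentages are not expected to sum to 100.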
I do not know if this is possible with how my data is set up, since I aggregated cyber attacks and MID attacks per year. And yes, I would also like to take into consideration the severity of the attacks (both cyber attacks and MID attacks), but I don't know how to do that.
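One possible way to bring severity in, sketched in base R, is to average the severity score per dyad over only those years in which that kind of attack occurred. The assumptions here are the same as for any hand-built example: the rows are copied from the sample above, NA is read as "no attack recorded", and the column names cybersev_att and MIDsev_att are hypothetical helpers.

```r
# The nine example rows for dyad 2360, typed in by hand from the post.
final <- data.frame(
  Dyad        = 2360,
  year        = 2005:2013,
  cyberattack = c(NA, NA, 1, 1, 3, 1, 3, 1, 4),
  cybersev    = c(NA, NA, 3, 4, 3.33, 3, 2, 2, 2),
  MID         = c(0, NA, 0, 0, 1, NA, NA, NA, NA),
  MIDsev      = c(1, NA, 1, 1, 4, NA, NA, NA, NA)
)

# Keep a severity score only for years in which that kind of attack occurred
# (assumption: NA or 0 in the count column means no attack that year).
cyber_year <- !is.na(final$cyberattack) & final$cyberattack > 0
mid_year   <- !is.na(final$MID) & final$MID > 0
final$cybersev_att <- ifelse(cyber_year, final$cybersev, NA)
final$MIDsev_att   <- ifelse(mid_year, final$MIDsev, NA)

# Mean severity per dyad across attack years; NA if the dyad had none.
severity <- aggregate(
  final[c("cybersev_att", "MIDsev_att")],
  by  = list(Dyad = final$Dyad),
  FUN = function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
)
print(severity)
```

Because cybersev already looks like a yearly average, this is an unweighted mean of yearly means. If a per-attack average is wanted instead, something like weighted.mean(cybersev, cyberattack) over the attack years may be closer, but that depends on how the yearly values were built.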
Does this help clarify what I am looking for?
merge, you can set the argument all = TRUE to keep all records. For the rest, "how to make sense of my data so that it comes across in a paper and presentation" is far too broad. Stack Overflow is for specific, answerable, programming questions; that is a general, open-ended question about data analysis and communication. – Gregor Thomas
NA, different rating scales, etc. Whether and how much those will cause problems will depend on how you analyze them, but consistency is good and will generally make things better. I would advise (a) using NA consistently for missing values, rather than for 0s, (b) using consistent scales: 1 makes sense to me as a non-severe attack, 0 as no attack, and NA as "we don't know". Transforming your data to do (a) and (b) is probably a good idea. And do so before you aggregate and take averages. – Gregor Thomas
aggregate, which you're already using, is a good tool for that. You'll have to define what you mean exactly by "the percentage of the time they launch cyber attacks" - maybe you mean the percentage of all attacks that are cyber attacks, or maybe you mean the percentage of years with attacks that include cyber attacks, or maybe you mean something else. While aggregate is good in base R, you may find dplyr more powerful, here's a nice introduction. – Gregor Thomas
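To make the difference between those two definitions concrete, here is a rough base R sketch using aggregate on the nine sample rows. It rests on assumptions that may not hold for the real data: NA in the count columns is recoded to 0 as "no attack", cyberattack and MID are treated as yearly counts of attacks, and the result names pct_attacks_cyber and has_cyber are invented for the example.

```r
# The nine example rows for dyad 2360, typed in by hand from the post.
final <- data.frame(
  Dyad        = 2360,
  year        = 2005:2013,
  cyberattack = c(NA, NA, 1, 1, 3, 1, 3, 1, 4),
  MID         = c(0, NA, 0, 0, 1, NA, NA, NA, NA)
)

# Assumption: NA here means "no attack", so recode it to 0 before aggregating.
final$cyberattack[is.na(final$cyberattack)] <- 0
final$MID[is.na(final$MID)] <- 0

# Definition 1: percentage of all attacks that are cyber attacks.
per_dyad <- aggregate(
  final[c("cyberattack", "MID")],
  by  = list(Dyad = final$Dyad),
  FUN = sum
)
per_dyad$pct_attacks_cyber <-
  100 * per_dyad$cyberattack / (per_dyad$cyberattack + per_dyad$MID)

# Definition 2: percentage of years with any attack that include a cyber attack.
attack_years <- final[final$cyberattack > 0 | final$MID > 0, ]
attack_years$has_cyber <- attack_years$cyberattack > 0
def2 <- aggregate(
  attack_years["has_cyber"],
  by  = list(Dyad = attack_years$Dyad),
  FUN = function(x) 100 * mean(x)
)

print(per_dyad)
print(def2)
```

On the sample rows the two definitions already disagree (14 of 15 attacks are cyber attacks, while all 7 attack years include a cyber attack), so it is worth settling on one before comparing dyads.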