3 posts tagged 'intelligence'

  1. 2019.04.03 Accuracy, Precision, Recall or F1, Artificial Intelligence, Statistics by CEOinIRVINE
  2. 2008.11.30 Investigation Begins as Siege in Mumbai Ends by CEOinIRVINE
  3. 2008.11.21 Commercial Vulnerability Alerts by CEOinIRVINE

Accuracy, Precision, Recall or F1?

 

When I talk to organizations that are looking to bring data science into their processes, they often ask, “How do I get the most accurate model?” I then ask, “What business challenge are you trying to solve with the model?” and get a puzzled look, because my question does not seem to answer theirs. I then need to explain why I asked it before we can explore whether Accuracy is the be-all and end-all metric for choosing our “best” model.

So in this blog post I will explain that Accuracy need not necessarily be the one-and-only model metric data scientists chase, and include simple explanations of other metrics as well.

Firstly, let us look at the following confusion matrix. What is the accuracy for the model?

Very easily, you will notice that the accuracy for this model is very, very high, at 99.9%! Wow! You have hit the jackpot and found the holy grail (*scream and run around the room, pumping your fist in the air several times*)!

But… (well, you knew this was coming, right?) what if I mentioned that the positive here is actually someone who is sick and carrying a virus that can spread very quickly? Or that the positive here represents a fraud case? Or that the positive here represents a terrorist whom the model labels a non-terrorist? Well, you get the idea. The cost of misclassifying an actual positive (a false negative) is very high in all three of these circumstances.
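To make the point concrete, here is a minimal sketch in Python. The four counts are hypothetical, chosen only so that the accuracy comes out at 99.9%; they are an assumption for illustration and not necessarily the numbers in the matrix above.

```python
# Hypothetical confusion-matrix counts (assumed for illustration only).
true_negative = 998
false_positive = 0
false_negative = 1
true_positive = 1

total = true_negative + false_positive + false_negative + true_positive

# Accuracy counts every correct prediction, positive or negative.
accuracy = (true_positive + true_negative) / total

# How many of the actual positives did the model miss?
actual_positives = true_positive + false_negative
missed_positives = false_negative

print(f"accuracy: {accuracy:.1%}")
print(f"actual positives missed: {missed_positives} of {actual_positives}")
```

With these assumed counts the model is right 999 times out of 1,000, yet it misses one of the only two actual positives. The large pile of True Negatives is what keeps the accuracy so high.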

OK, so now you realize that accuracy is not the be-all and end-all model metric to use when selecting the best model… now what?

Precision and Recall

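Let me introduce two new metrics (if you have not heard about them; if you have, perhaps just humor me a bit and continue reading? :D )

If you look at Wikipedia, you will see that the formulas for calculating Precision and Recall are as follows:

Precision = True Positive / (True Positive + False Positive)
Recall = True Positive / (True Positive + False Negative)

For reference, here are the confusion matrix and its parts:

                   Predicted Negative    Predicted Positive
Actual Negative    True Negative         False Positive
Actual Positive    False Negative        True Positive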

Precision

Great! Now let us look at Precision first.

What do you notice about the denominator? The denominator is actually the Total Predicted Positive! So the formula becomes

Precision = True Positive / (True Positive + False Positive) = True Positive / Total Predicted Positive

Immediately, you can see that Precision talks about how precise/accurate your model is: out of those predicted positive, how many of them are actually positive.

Precision is a good measure to use when the cost of a False Positive is high. Take email spam detection, for instance. There, a false positive means that an email that is non-spam (actual negative) has been identified as spam (predicted positive). The email user might lose important emails if the precision of the spam detection model is not high.
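As a rough sketch of the spam case, the calculation might look like this in Python. The `precision` helper and the email counts are hypothetical, and returning 0.0 when nothing is predicted positive is just one convention assumed here.

```python
def precision(true_positive: int, false_positive: int) -> float:
    """Share of predicted positives that are actually positive."""
    predicted_positive = true_positive + false_positive
    if predicted_positive == 0:
        # Assumed convention: with no positive predictions, report 0.0.
        return 0.0
    return true_positive / predicted_positive


# Hypothetical spam filter: 100 emails flagged as spam,
# 90 really are spam and 10 are legitimate emails (false positives).
spam_precision = precision(true_positive=90, false_positive=10)
print(spam_precision)  # 0.9
```

In this made-up example, one in ten flagged emails is a legitimate message that the user may never see.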

Recall

So let us apply the same logic to Recall. Recall how Recall is calculated.

Recall = True Positive / (True Positive + False Negative) = True Positive / Total Actual Positive

There you go! So Recall actually calculates how many of the Actual Positives our model captures by labeling them as Positive (True Positive). Applying the same understanding, we know that Recall should be the metric we use to select our best model when there is a high cost associated with False Negatives.

For instance, consider fraud detection or sick patient detection. If a fraudulent transaction (Actual Positive) is predicted as non-fraudulent (Predicted Negative), the consequence can be very bad for the bank.

Similarly, in sick patient detection, if a sick patient (Actual Positive) goes through the test and is predicted as not sick (Predicted Negative), the cost associated with the False Negative will be extremely high if the sickness is contagious.
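The same idea can be sketched for Recall. Again, the `recall` helper and the patient counts below are hypothetical, and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(true_positive: int, false_negative: int) -> float:
    """Share of actual positives that the model labeled as positive."""
    actual_positive = true_positive + false_negative
    if actual_positive == 0:
        # Assumed convention: with no actual positives, report 0.0.
        return 0.0
    return true_positive / actual_positive


# Hypothetical screening test: 50 patients are actually sick,
# the test catches 40 of them and misses 10 (false negatives).
screening_recall = recall(true_positive=40, false_negative=10)
print(screening_recall)  # 0.8
```

Here a recall of 0.8 means ten sick patients walk away with a clean result, which is exactly the costly case described above.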

F1 Score

Now if you read a lot of other literature on Precision and Recall, you cannot avoid the other measure, F1, which is a function of Precision and Recall. Looking at Wikipedia, the formula is as follows:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

F1 Score is needed when you want to seek a balance between Precision and Recall. Right… so what is the difference between F1 Score and Accuracy then? We have previously seen that accuracy can be largely driven by a large number of True Negatives, which in most business circumstances we do not focus on much, whereas False Negatives and False Positives usually have business costs (tangible and intangible). Thus F1 Score might be a better measure to use if we need to seek a balance between Precision and Recall AND there is an uneven class distribution (a large number of Actual Negatives).

I hope this explanation helps those starting out in Data Science and working on Classification problems see that Accuracy will not always be the metric to select the best model from.
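A small Python sketch can show how far apart the two numbers can be on an uneven class distribution. The counts are hypothetical (998 True Negatives, 0 False Positives, 1 False Negative, 1 True Positive), and the helper name `f1_from_counts` is made up for this illustration.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        # Assumed convention: report 0.0 when both are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical, heavily imbalanced counts (assumed for illustration).
tn, fp, fn, tp = 998, 0, 1, 1

accuracy = (tp + tn) / (tn + fp + fn + tp)
f1 = f1_from_counts(tp=tp, fp=fp, fn=fn)

print(f"accuracy: {accuracy:.3f}")
print(f"f1 score: {f1:.3f}")
```

Note that True Negatives never enter the F1 calculation, so the 998 easy negatives that lift the accuracy have no effect on it. With these assumed counts, precision is 1.0 but recall is only 0.5, and F1 lands well below the accuracy.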

 

Posted by CEOinIRVINE
Investigation Begins as Siege in Mumbai Ends
Gunmen attack popular tourist sites in Mumbai, India, killing dozens and taking hostages.
MUMBAI, Nov. 29 -- Indian officials said today that 10 gunmen, nine of whom were killed, were responsible for the three-day assault on India's financial and cultural capital. Nearly 200 people died in the violence.


Pakistani officials, responding to charges by Indian leaders that the attack was carried out by an organization with ties to Pakistan, said Friday that a senior intelligence officer would travel to India, in an apparent attempt to ease tensions between the two nuclear-armed states.

Indian officials said they believe that at least some of the gunmen reached Mumbai by sea. After an interrogation of one of the attackers, Indian intelligence officials said they suspected that a Pakistani Islamist group, Lashkar-i-Taiba, was responsible. An Indian intelligence document from 2006 obtained by The Washington Post said members of the group had been trained in maritime assault.

Authorities said that the death toll had risen to 195 as more bodies were discovered and that 295 people were wounded, in attacks on the hotels, the Jewish center and several other sites in Mumbai. Among the dead were two Americans from Virginia; the American rabbi who ran the city's Chabad-Lubavitch center and his Israeli wife; and three of their visitors, including an American man, an Israeli woman and a man with U.S. and Israeli citizenship. In all, at least 16 non-Indians have been reported killed.

Security forces killed the last gunmen holed up in the Taj Mahal Palace and Tower Hotel here early Saturday and clean-up operations around the sites that had been attacked continued through the day. 

The government used 350 security forces and 400 police officers to capture or kill the gunmen, officials announced at a news conference Saturday. "On the basis of preliminary inquiry, we know that there were a total of 10 terrorists. Nine have been eliminated, one is caught," said Vilasrao Deshmukh, the chief minister of the state of Maharashtra, of which Mumbai is the capital. "They split into teams of two for action, and there were four at the Taj."

In Washington, the White House announced that President Bush would speak about the Mumbai attacks at 12:30 p.m. Eastern time.

President-elect Barack Obama spoke Friday evening by phone with Indian Prime Minister Manmohan Singh to offer his condolences for those killed, Obama's office announced Saturday.

Secretary of State Condoleezza Rice spoke by telephone Friday with Obama for the third time since the attacks began to update him on information coming from India.

"These terrorists who targeted innocent civilians will not defeat India's great democracy, nor shake the will of a global coalition to defeat them," Obama said in a statement. "The United States must stand with India and all nations and people who are committed to destroying terrorist networks, and defeating their hate-filled ideology."

Deshmukh denied that there was any final statement to make about the nationality of the slain gunmen. But he said that the government was only certain that the one in their custody had confessed to being from Pakistan. He said Indian officials had no specific intelligence about an impending attack.

"The information that we get is always general, not specific. Mumbai is always on the target, it is a commercial city, it is an international city," he said. "It is a sensitive place, there is no denying that. But this kind of attack, not just on Mumbai but also on the nation, is something we did not anticipate."



Posted by CEOinIRVINE

eEye Preview (http://research.eeye.com/html/services/)
3Com TippingPoint DVLabs (http://dvlabs.tippingpoint.com)
VeriSign iDefense Security Intelligence Services (http://labs.idefense.com/services/)

Posted by CEOinIRVINE