
Friday, November 27, 2009

The Other Google Desktop

Two weeks ago the Economist ran an interesting article, Calling All Cars, describing how systems such as OnStar (GM) and Sync (Ford), originally conceived for roadside assistance, have expanded well beyond their initial service offerings to include remote tracking and deactivation (the car won't start), slowing a moving car to a halt (no more high-speed chases), fault diagnosis and timely servicing. Even so, 60,000 OnStar subscribers a month still use the service to unlock their cars – the auto equivalent of password resets.

I have often wondered why AV vendors have not leveraged their platforms and infrastructure well beyond their initial offerings in the same way. The larger vendors that service enterprise customers operate a sophisticated update network for clients, which feeds into corporate networks for secondary internal distribution. Desktops are equipped with software for searching against a database of signatures, accepting or initiating dynamic updates, plus monitoring and reporting. Surely this is a basis for useful enterprise applications beyond the necessary but not-so-business-friendly task of malware scanning?

It is widely reported that the traditional AV signature cycle of detect-produce-distribute is being overwhelmed, and that the effectiveness of AV solutions is decreasing. So AV companies should be on the lookout for new and perhaps non-traditional functionality. But even if this were not the case, it would be worthwhile to consider additional services bootstrapped off the installed base.

I think one generalization would be the extension of the search capability away from signature matching towards a Google desktop model - away from search-and-destroy to search-and-deliver. Imagine if Norton or Kaspersky were presented to users as document management systems permitting tagging, search of file content, indexing, and database semantics for files – that is, provide a useful service to users beyond informing them that opening this file would not be a good idea.

In the corporate setting, desktop document search and analysis could provide many useful functions. Take data classification, for example. I am not sure we can ever expect people to label documents for data sensitivity, and even if people resolved to be diligent in the new year, there would still be a large legacy problem. Imagine instead that senior management could create a list of sensitive documents and feed them into an indexer, which distributed data classification "signatures" to desktops. The local software could then scan for matching documents (exact or related), apply the correct labelling, and inform users that such documents should be handled carefully, perhaps even popping up the data classification policy as a reminder.
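
As a rough illustration, here is a minimal sketch of such a classification scanner in Python (my own construction – the manifest name, the exact-hash matching and the directory scanned are all assumptions; a real engine would match on related content, not just exact copies):

```python
import hashlib
import json
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Fingerprint a document by hashing its contents in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# "Signatures" distributed by management: fingerprint -> classification label
signatures = json.loads(Path("classification_signatures.json").read_text())

# Scan the local document store and label any matches
for doc in Path.home().glob("Documents/**/*"):
    if doc.is_file():
        label = signatures.get(fingerprint(doc))
        if label:
            print(f"{doc}: classified {label} - see the data classification policy")
```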

You could also track the number of copies and the location of a given sensitive document, such as a draft of the quarterly financial results, which must be distributed for review but only within a select group. Again, management could define which documents need to be tracked and feed them into a signature engine for propagation to desktops. If a document turned up outside the defined review group, a flag could be raised. And when a sensitive document is detected as an email attachment, the user could be reminded that the contents should be encrypted, certainly if the recipients are external to the company, or perhaps be prevented from sending the document at all.

The general innovation here is to permit client management to define search and response functions for deployment within the AV infrastructure, extending beyond the malware updates from the vendor. I think there are many possible applications for managing documents (and other information) on the basis of the present AV infrastructure for distribution, matching and scanning, especially if local management could create their own signatures.

I have to admit that I am not overly familiar with DLP solution capabilities, and perhaps some of my wish list is already here today. I would be glad to hear about it.

Tuesday, March 31, 2009

Randomness Tests for Packed Malware

In my post Risk Factors for AV Scanning, I discussed a recent patent granted to Kaspersky related to reducing the amount of malware scanning based on various file properties. One of those properties was whether the file is packed or not, since packed files are to be treated more suspiciously than plain unpacked files.

The impact of packed malware is discussed more thoroughly by Tzi-cker Chiueh of Symantec Research Labs in a 2007 presentation on the difficulties of detecting malware, where he devotes 5 slides to the "Packer Problem". Packing adds another dimension to the production of malware signatures, which must now account for potentially ten or more layers of packing that obscure the real code to be scanned. Symantec knows of over 1,000 different packing programs, all of which need to be recognised and stripped off before signature scanning can be applied.

But is it possible to pick out packed files – the prime malware suspects – from their statistical properties alone, without unpacking them?

Three authors (Tim Ebringer, Li Sun & Serdar Boztas) have done some interesting research into developing randomness tests specifically for detecting packed malware. As it turns out, packed malware does exhibit a definite randomness signal when the randomness measure accounts for local patterns and structure.

The paper begins by observing that common randomness measures, such as entropy and other statistical tests, compute a global measure of randomness over all available data, and so tend to obscure local sections of code that may be highly random or highly structured. The figure below shows the distribution of bytes in the program calc.exe before and after packing with UPX. The distribution becomes more uniform after packing, but is still far from uniform.

[Figure: byte distribution of calc.exe before and after packing with UPX]
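
For reference, the global measure being critiqued is essentially the Shannon entropy of the byte histogram – one number for the whole file, blind to where any structure sits. A minimal version:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Global Shannon entropy of the byte distribution, in bits per byte.
    8.0 means perfectly uniform; unpacked executables sit well below that."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```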

The authors propose several randomness tests that preserve locality (structure, or the lack thereof), based on constructing a Huffman tree at the byte level of the data (the packed file). The Huffman algorithm assigns a code to each observed byte, with more frequent bytes receiving shorter codes. If the data is random, so that the byte frequencies are all similar, then the Huffman codes will be of roughly similar length. Structured data, on the other hand, will produce codes of quite different lengths.


[Figure: example Huffman tree over byte frequencies]

The authors then sample the data at various byte positions, collect the corresponding Huffman codes, and normalise the code lengths to between 0 and 1 (where a higher value means more random). The byte sampling strategies proposed are (1) sample a fixed number of bytes equally spaced in the data, and (2) slide a window of fixed size across all the data. Below we see the sliding window test applied to the basename binary from UnxUtils, before and after being packed with FSG 2.0.

[Figures: sliding-window randomness of the basename binary, before and after packing with FSG 2.0]
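
A minimal sketch of the idea in Python (my own reconstruction, not the authors' exact algorithm – in particular the normalisation by maximum code length and the non-overlapping windows are simplifications):

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Build a Huffman tree over byte frequencies and return the code
    length assigned to each byte value (frequent bytes get shorter codes)."""
    freq = Counter(data)
    if len(freq) <= 1:
        return {b: 1 for b in freq}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {b: 0}) for b, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # merging two subtrees pushes every leaf in them one level deeper
        merged = {b: d + 1 for b, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

def sliding_window_randomness(data: bytes, window: int = 256) -> list[float]:
    """Score successive windows in [0, 1]; values near 1 mean the bytes are
    drawn near-uniformly (random-looking), lower values indicate structure."""
    lengths = huffman_code_lengths(data)
    max_len = max(lengths.values())
    return [
        sum(lengths[b] for b in data[i:i + window]) / (window * max_len)
        for i in range(0, len(data) - window + 1, window)
    ]
```

Run over a packed binary this profile sits uniformly high; over an unpacked binary it dips wherever structured code and data sections lie – the contrast the figures above illustrate.
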
UnxUtils has 116 binaries ranging in size from a few KB to about 200 KB, and the authors performed a collection of experiments to validate their local randomness tests using 6 well-known packers. They observe that packed files exhibit a definite characteristic signal, and go on to propose additional tests that can be used to discriminate between packers.

Returning to Tzi-cker Chiueh and his packer problem: he suggests working around packing with a just-in-time scanning approach, which tries to scan a file in its unpacked form just before it runs, and also suggests considering whitelisting. Perhaps the new local randomness tests from Ebringer, Sun & Boztas can help him out.



Sunday, March 15, 2009

The Positive Trust Model and Whitelisting

I recently came across the presentation The Positive Trust Model and Whitelists, by Wyatt Starnes of Signacert, a company that specialises in whitelisting solutions (get the video here). I thought the presentation made some good points, worth repeating here and extending with other opinions (of which there are many).

Whitelisting, like virtualisation, is a mainframe concept largely forgotten in the era of personal computing and only recently rediscovered in our modern IT environment. There are various forms of whitelisting for security purposes, for example in the context of combating email fraud and SPAM, but here we will be concerned with application whitelisting - a method to ensure that only approved applications and their associated executables are permitted to run on a given machine.

John W. Thompson, CEO of Symantec, supported whitelisting in the face of growing malware diversity in his 2008 RSA keynote (quoted by Starnes):

From where I sit, a few things are very, very, clear. If the growth of malicious software continues to outpace the growth of legitimate software, techniques like whitelisting, where we identify and allow only the good stuff to come in, will become much, much, more critical.

This is a telling statement from the CEO of a company whose cash cow is desktop AV software, the epitome of blacklisting technology. Thompson, and other vendors whose business models are firmly based on blacklisting, now agree that whitelisting as a malware defence is an idea whose time has come.

Malware is Increasing

Blacklisting is not really about maintaining a list of prohibited software, but rather maintaining a database of malware signatures to evaluate the state of software through scanning. Software is blacklisted when it is identified to have characteristics identical or similar to known malware. And this is the key point - known malware. The successful and timely identification of malware depends on the rapid identification, production and distribution of updates to signature databases.

Over the last year an inflection point was reached: malware is now being produced in greater quantities than legitimate software. We are heading for the same state of affairs as email, where SPAM dominates legitimate messages. Starnes depicted the situation as follows

[Figure: malware production volume overtaking legitimate software (Starnes)]

The slide heading for the graphic above is Chase the Infinite or Confirm the Finite? The question asks whether it is a better IT defensive strategy to attempt to screen a wide and increasing variety of malware, or focus on maintaining the current integrity of the known components of your IT system.

A presentation from Martin Fréchette of Symantec Labs, given at RAID 2007, provides more background. First he has a more detailed graph of the number of new threats, which is increasing essentially exponentially.

[Figure: growth in new malware threats (Fréchette, RAID 2007)]

By the end of 2008 there were approximately 1,000,000 known examples of malware, over 2/3 of which had been produced in 2008. That is, 2008 saw more malware produced than all previous years combined. While this sounds alarming, Fréchette notes that part of the reason known malware has been increasing rapidly is due to better detection methods, in particular honeypots and malware sensor networks.

But malware is also increasing due to a change in strategy of the malware industry. Fréchette observes a shift from a mass distribution of a small number of threats to micro distribution of millions of distinct threats, more recently referred to as targeted attacks. Symantec has observed single days where 10,000 new virus strains have been produced, mainly through a technique known as server-side polymorphism, which can automatically regenerate malware strains.

Fréchette notes that the micro distribution strategy is greatly reducing the effectiveness of classic malware signature detection. Even just a few years ago a single signature could be expected to protect 10,000 users whereas today that expectation has dropped to less than 20 users. That is, malware attacks are so specific that signatures serve only to protect small groups of users. Thus signatures must be produced in vast numbers to protect the broader user community.

The Twilight of Blacklisting

The AV blacklisting industry has reached a point of diminishing returns - the marginal value of producing additional signatures is minimal, but the underlying model can offer no more advice than to simply keep doing exactly that. The AV signature cycle of detect-produce-distribute is being overwhelmed, and the effectiveness of AV solutions (that is, the fraction of known malware that is detectable) is decreasing. Equivalently, the false negative rate is increasing, and consumers are getting less protection than they expect.

There is a significant burden on networks to distribute signatures, and also on platforms to perform scanning. Scanning each and every file is neither feasible nor effective. In October last year I posted some remarks on a new patent granted to Kaspersky for a risk-based approach to AV scanning, which described criteria (risk factors) for reducing the amount of file scanning. In July last year Robert Vamosi of CNET reported that Norton will also follow a risk-based approach in its 2009 products, creating a trust index that will be used to judge how often files are scanned. However, when I posed the question "Would you support less malware scanning to improve user performance?" over at LinkedIn, the resounding answer was No.

Blacklisting only has a future as a primary security defence if we can actually find ways to do less of it and still retain a low false negative rate. But this sounds like squaring the circle.

Charge of the White Brigade

Enter whitelisting. Rather than attempting to determine whether an arbitrary file (executable) is malicious based on signatures or other criteria, whitelisting creates approved copies of software and simply checks whether the current copy of a binary is the same as its approved copy. Software that is not on the approved list is blocked from running, period. Starnes represents the whitelist production process as follows

[Figure: the whitelist production process (Starnes)]

There is a significant reliance on hashing and signatures for trust, where signature here means a PKI signature, not a blacklist signature. An unauthorized change to a file is detected (with high probability) by a change in its associated hash, which will in turn cause the signature verification step to fail.

Notice that the hash-sign paradigm of trust here detects changes in software; it does not indicate the absence of security vulnerabilities. The point of whitelisting is not to prevent insecure software from being unintentionally loaded onto desktops through an authorized software distribution process. It strives to prevent software (whether secure or not) from being loaded onto your desktop in an unauthorized manner. Whitelisting makes sure that the assumed-good software stays good, and keeps out unknown, and potentially malicious, software. In essence whitelisting is about maintaining a known software state, and implementing authorized change from one known state to another.
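
The core check itself is simple. A minimal sketch (the manifest file name and format are my own assumptions, and real products sign the manifest itself and enforce the check at the kernel level rather than in a script):

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash the binary in chunks so large executables aren't read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_whitelisted(path: Path, whitelist: dict[str, str]) -> bool:
    """Allow execution only if the binary's current hash matches its approved hash."""
    approved = whitelist.get(str(path))
    return approved is not None and approved == file_sha256(path)

# Hypothetical manifest from the trusted repository: path -> approved SHA-256
whitelist = json.loads(Path("approved_hashes.json").read_text())
target = Path("C:/Program Files/app/app.exe")
if not is_whitelisted(target, whitelist):
    print(f"BLOCK: {target} is not on the approved list")
```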

Whitelisting therefore requires a repository of trusted software, which Starnes refers to as a collection of platinum images (you can pick your favourite precious metal or stone).

[Figure: the repository of trusted 'platinum' images (Starnes)]

So while blacklisting requires a signature database and other contextual information for assessing potential malware, whitelisting also requires a repository for proper functioning. The difference is that whitelisting mainly performs a comparison between the software to be executed and its repository image (a simple check), while the blacklisting database is used to scan and assess the security of the software in question (a more difficult operation).

The size of the whitelist repository grows as a function of the software base supported, while the blacklist database grows in proportion to the amount of known malware - which we have just seen is increasing at an unchecked rate. Chase the infinite or confirm the finite?

While the whitelist model is compelling, in practice it requires significant work to deploy. Assessing and creating the initial list of approved software is a significant task, and would be greatly assisted by existing CMDB implementation efforts in a company. Also, the success of whitelisting is wedded to its tight integration into the software patching and upgrading process. The repository will be in constant flux since software itself is in a constant state of (approved and required) change.

The Way Forward

Not unexpectedly, there is quite a bit of hype around whitelisting, mainly concerning its magical powers to cure security conundrums such as zero-day attacks and endless patching cycles. It will do neither of these things. Whitelisting does not prevent developers from coding buffer overflows into applications, nor does it prevent malicious attacks from exploiting them. Developers will still require secure coding education, and identified vulnerabilities will still require patching.

Nonetheless, the major vendors agree that we will require blacklisting in some form, but whitelisting may become the new leading actor. Bit9 (a whitelist provider) and Kaspersky (a blacklist provider) have teamed up to provide a hybrid consumer solution, where software is scanned only when it is not present on a whitelist. This is not quite whitelisting as intended but it represents the first step in a gradual integration, and more importantly, a way to preserve the blacklisting revenue model. One way or another, whitelists will be coming to a desktop near you soon.
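
Reduced to code, that hybrid arrangement is a two-step decision, sketched below (is_whitelisted is the hash check sketched earlier, and signature_scan_clean is a hypothetical stand-in for the blacklist engine):

```python
def signature_scan_clean(path: Path) -> bool:
    """Hypothetical stand-in for the blacklist engine's verdict."""
    return True  # a real engine would unpack the file and match its signature DB

def hybrid_check(path: Path, whitelist: dict[str, str]) -> str:
    """Hybrid model: trust whitelisted binaries outright, and fall back to a
    (slower) blacklist scan only for files not on the whitelist."""
    if is_whitelisted(path, whitelist):  # hash check from the sketch above
        return "allow"                   # known-good, no scan needed
    return "allow" if signature_scan_clean(path) else "block"
```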

You can find the research used to produce this post as a FreeMind mindmap rendered into Flash here.



Wednesday, October 22, 2008

Risk Factors for AV Scanning

If you work in a large company then you are probably familiar with the all-too-regular process of your desktop AV software performing a scheduled full disk scan. This may happen a few times a month, and during the scan (which may last a few hours) you typically experience quite sluggish performance. You may also get the sinking feeling that most files are being needlessly scanned (again). Kaspersky, a security software vendor whose products include AV, gets that feeling as well. Treating all files on your desktop as equally likely to contain malware is wasteful, but without criteria to discern less-likely from more-likely malware candidates, the current regime remains. Kaspersky observes that users are only willing to wait a few seconds at the outside for AV to perform its checks, so AV scans are typically limited to what can be done within these windows of user expectation.

Kaspersky has decided to make AV scanning more efficient not by making it faster but by doing less, as determined by risk-based criteria. And they were recently issued a US patent for this approach. If you have been looking for a simple example to highlight the difference between traditional security and risk-based security, then your search is over.

Be warned that the patent is repetitive and not very clearly written, in keeping with the style of such documents. Patents are the lawyers' solution to the legal optimisation problem of being both vague (to support the broadest claims) and specific (giving details in an embodiment that demonstrates the invention). Also, towards the end, beginning with the paragraph describing FIG. 2, you can read the convoluted legalese required to define a "computer" and a "network".

Trading Risk against Speed

The purpose of the patent is "balancing relatively quick (but less thorough) anti-malware checks with more thorough, but also more time-consuming, anti-malware checks". The basic approach is to employ different scanning strategies for known files that have been previously scanned, and new files whose status is unknown to the AV software. Known files will receive a quick signature-based scan, while unknown files may be subject to more detailed scans.

When an unknown executable file is first launched, a risk assessment is performed to determine the appropriate level of scanning. The risk assessment evaluates a collection of risk factors to produce a metric, which determines whether the file is rated as having a high, medium or low risk of containing malware. This rating in turn determines the thoroughness of the scan to be performed on the file. Options for a more detailed anti-malware scan include heuristic analysis, emulating file execution (in an isolated environment, stepping through particular instructions) or statistical analysis of instruction patterns. Beyond local checks, the AV software may also consult online scanning services for additional information. Employing these more sophisticated scanning methods increases the rate of detection at the cost of additional processing time.
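
As an illustration of the control flow (my own sketch – the patent names these scan options but prescribes no concrete implementation):

```python
from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def scan_plan(risk: Risk) -> list[str]:
    """Map an unknown executable's risk rating to scan thoroughness."""
    plan = ["signature_scan"]  # every file gets at least the quick check
    if risk in (Risk.MEDIUM, Risk.HIGH):
        plan.append("heuristic_analysis")
    if risk is Risk.HIGH:
        plan += ["emulated_execution", "instruction_statistics", "online_lookup"]
    return plan
```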

Risk Factors

The patent provides some example risk factors for the purpose of satisfying the embodiment requirement of the invention. While the risk factors are intended only as examples, they are interesting nonetheless.

Online Status

The patent mentions that it can take between 15 minutes and 2 hours to update local AV databases when new malware appears, so it is suggested that the AV server be contacted directly to obtain the latest information available. If the executable is sitting on a blacklist then it is likely to be malware (that's why it's on the list), and if it's on a whitelist then the likelihood of malware being present is low.

File Origin

If the origin of the file is a storage medium such as a CD or DVD then it is less likely to contain malware than if the software was distributed over the internet. Email attachments are always suspicious. For downloaded files, the URL source of the download should be considered, to determine whether the origin is a suspicious web site or a P2P network.

File Compression

Malware is now commonly compressed (packed) to thwart signature-based methods of virus detection. Packed files should be treated as more suspicious than unpacked (plain) files (see here for a general discussion of how malware authors use packing to hide their payloads).

File Location

The current location and/or path of the file can also be considered, since some executable files install themselves in particular directories, especially directories that are infrequently used. For example, the Temporary Internet Files folder is higher risk than the My Documents folder.

File Size

Relatively small executable files are more suspicious than large ones, since propagating malware does not want to draw attention to itself by transferring large files. Sending a large number of emails with a relatively small attachment is much more practical. Kaspersky states that files sent out in this manner are on the order of 50-100 kilobytes (which, if packed, reduces to something on the order of 20-50 kilobytes).

Installer File

Malware often propagates by sending out small installer files that, when executed, trigger the download of a much larger malware payload from a web server or file server on the Internet.

Digital Signature

Files that are digitally signed are less likely to contain malware than unsigned files.
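
The patent does not say how these factors are to be combined (a point returned to below), but a naive weighted score gives the flavour. The factors are the patent's; the weights and thresholds here are entirely invented, and the Risk enum comes from the sketch above:

```python
# Invented weights: positive values increase suspicion, negative values reduce it
WEIGHTS = {
    "on_blacklist":     10.0,
    "on_whitelist":    -10.0,
    "from_internet":     2.0,
    "email_attachment":  3.0,
    "is_packed":         3.0,
    "suspicious_path":   2.0,
    "small_file":        1.0,
    "installer_stub":    2.0,
    "digitally_signed": -3.0,
}

def risk_rating(observations: dict[str, bool]) -> Risk:
    """Combine boolean risk-factor observations into a high/medium/low rating."""
    score = sum(w for factor, w in WEIGHTS.items() if observations.get(factor))
    if score >= 6.0:
        return Risk.HIGH
    if score >= 2.0:
        return Risk.MEDIUM
    return Risk.LOW
```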

Surprisingly the patent also mentions the possibility of alerting the user with a popup that gives them the option to skip or minimize the scan of the unknown file. The patent states that "as yet a further option, the user can manually choose to run some of the anti-virus scans in the background after the new software has been launched, but not necessarily the entire spectrum of available technologies, which obviously increases the risk that a virus can infect the computer" (italics added).

Risk supporting Business

Kaspersky's idea is to vary the sophistication of malware scanning based on well-defined risk factors. Presumably they have a sufficiently large data set on these risk factors to facilitate a transformation into hard (numeric) decision criteria. The patent does not describe how the various risk factors will be combined to produce a risk decision, and much tuning will be required.

Note that the purpose of the patent is not to reduce the risk of being infected by malware but to provide (security) risk support for the decision to improve user response times by reducing the overall effort for malware scanning. And this is clearly the task of the IT Security Risk function - balancing business and security requirements using risk analysis.

But to be clear, we don't get something for nothing: the likelihood of being infected by malware will actually increase, simply because less scanning will be done and the risk factors will not correlate perfectly with the presence of malware. And this is the next important responsibility of the IT Security Risk function - making the business aware of the residual risk of following the Kaspersky approach and getting acceptance for that risk. Hopefully Kaspersky will provide some data to help here.

I posed the question on LinkedIn whether people would support deploying this type of AV, and the resounding answer was no.
