The SOC concept and its evolution
I. What is a SOC?
In companies, the SOC, which stands for Security Operations Center, is the service designed to defend against cyber-attacks. It brings together the company’s security experts in one place and uses a wide variety of tools and processes to detect and respond to malicious activity hiding in network traffic, data, IT equipment or even the behavior of user accounts, in order to protect the company’s assets and image.
The SOC has several key objectives to successfully carry out its mission, which can be summarized in these 4 points:
- Prevent attacks before they occur whenever it’s possible.
- Embody the last line of defense by detecting attacks as they occur through filtering, analysis and correlation.
- Investigate attacks to understand their sources and causes and to limit their damage.
- Manage the compliance of the company’s information systems.
A detailed list can be found in this McAfee article.
Security Operations Centers were born around 1975, at about the same time as the Internet we all know was created. HP described multiple generations of SOC, each adapting to the evolution of cyber threats up to the present day.
They range from the simple detection of non-repeatable attacks using basic antivirus and firewalls to the analysis of highly sophisticated threats within a large volume of data collected across the whole enterprise.
C. How does a SOC deal with security incidents?
Apart from preventing attacks proactively through threat hunting and cybersecurity watch, here is how a SOC operates. First, it needs to be supplied with feeds from the important parts of the company, such as event logs, telemetry data, asset vulnerability scans and other diverse sources. For that purpose, the SOC depends on various equipment such as firewall appliances, switches, web application firewalls, proxies, intrusion detection and prevention systems, monitoring software or audit tools.
All the information entering the SOC is filtered, inspected, and sometimes merged, mostly by a SIEM (Security Information and Event Management) system, configured with rules that discard the irrelevant parts and then perform analysis to find events and patterns corresponding to potential threats. The resulting output can then either be processed automatically or passed on to Level 1 analysts.
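The filtering and correlation described above can be illustrated with a minimal Python sketch. It is not how any particular SIEM works: the event format, the noise categories and the brute-force rule (five failed logins from one address within a minute) are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event format: dicts with "time", "type", "user" and "src_ip".
IRRELEVANT_TYPES = {"heartbeat", "debug"}  # assumed noise categories


def filter_events(events):
    """Discard events that the rules consider irrelevant."""
    return [e for e in events if e["type"] not in IRRELEVANT_TYPES]


def detect_bruteforce(events, threshold=5, window=timedelta(minutes=1)):
    """Raise one alert per source IP with `threshold` failed logins inside `window`."""
    failures = defaultdict(list)
    for e in sorted(events, key=lambda e: e["time"]):
        if e["type"] == "login_failed":
            failures[e["src_ip"]].append(e["time"])

    alerts = []
    for ip, times in failures.items():
        # Slide over the sorted timestamps and look for a dense burst.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                alerts.append({"rule": "bruteforce", "src_ip": ip, "first_seen": times[i]})
                break  # one alert per source is enough for this sketch
    return alerts


start = datetime(2021, 1, 1, 9, 0, 0)
sample_events = [
    {"time": start + timedelta(seconds=5 * i), "type": "login_failed",
     "user": "alice", "src_ip": "203.0.113.7"}
    for i in range(6)
]
sample_events.append({"time": start, "type": "heartbeat", "user": None, "src_ip": "198.51.100.2"})

sample_alerts = detect_bruteforce(filter_events(sample_events))
print(sample_alerts)
```

The heartbeat event is dropped by the filter, and the six failed logins in 25 seconds produce a single alert that could then be handed to an analyst or to an automated process.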
Level 1 analysts must determine whether the threats are real, in which case they are sent to Level 2 analysts, or just false positives that call for rule adjustments. In the event of an escalation, cases can be created in which the events are enriched and scored.
Level 2 analysts, among other tasks, investigate the detected security problems, track trends in the reported events, keep the detection rules up to date and help the L3 analysts in the event of a cybersecurity crisis.
Level 3 analysts, sometimes referred to as threat hunters or responders, must perform at the highest level of agility in order to increase the efficiency of the SOC. They must deal with crises, extend the security perimeter, and perform in-depth malware analysis and forensic analysis. They are often specialized in a particular field such as network, endpoint or cloud.
SOC Managers are in charge of the operation and have to make sure that processes and Service Level Agreements (SLAs) are respected. They report directly to the CISO.
Compliance Auditors, who may go by other titles, check that the operation complies with standards and policies.
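As a rough illustration of how an escalated case might be enriched and scored, here is a small Python sketch. The asset list, the known-bad addresses, the weights and the threshold are hypothetical, and in practice the final decision remains with the analyst.

```python
# Hypothetical enrichment data; a real SOC would pull this from an asset
# inventory and a threat intelligence feed.
CRITICAL_ASSETS = {"db-prod-01"}
KNOWN_BAD_IPS = {"203.0.113.7"}


def enrich_and_score(alert):
    """Return a copy of the alert with context fields and a 0-100 score."""
    case = dict(alert)
    case["asset_is_critical"] = alert["host"] in CRITICAL_ASSETS
    case["ip_is_known_bad"] = alert["src_ip"] in KNOWN_BAD_IPS
    score = 20  # assumed baseline for any alert that reached an analyst
    if case["asset_is_critical"]:
        score += 40
    if case["ip_is_known_bad"]:
        score += 40
    case["score"] = score
    return case


def triage(alert, escalation_threshold=60):
    """Suggest escalation to Level 2 above the threshold, otherwise rule tuning."""
    case = enrich_and_score(alert)
    case["decision"] = "escalate_to_l2" if case["score"] >= escalation_threshold else "tune_rule"
    return case


result = triage({"rule": "bruteforce", "host": "db-prod-01", "src_ip": "203.0.113.7"})
print(result["score"], result["decision"])
```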
SIEM Engineers can sometimes be found in large operations and are in charge of optimizing rules and other SIEM-related processes, mainly at the beginning of implementation.
Finally, the CISO (RSSI in French) is responsible for the company’s overall security, so the SOC falls under the CISO’s oversight, which covers the definition of policies, strategy and procedures as well as compliance and risk management.
D. CSIRT and CERT
A SOC can sometimes be called a CSIRT or a CERT, but they are not exactly the same thing.
Most of the time, a CSIRT (Computer Security Incident Response Team) encompasses Levels 2 and 3 of the SOC, but it can also work independently from it. When this is the case, the SOC generally deals with Level 1 and the CSIRT responds to confirmed incidents. The CSIRT is intended to limit the damage caused by security incidents in a company and may sometimes employ personnel from outside the technical or even the IT field.
CERT (Computer Emergency Response Team) is a registered trademark of the Software Engineering Institute at Carnegie Mellon University in Pennsylvania, so a CSIRT may apply for permission to be called a CERT. All CERTs are independent, so there is no benefit to being called a CERT other than the name, except that a CERT will sometimes be more willing to exchange knowledge with another CERT than with a CSIRT.
For instance, the French government has a CERT operated by the ANSSI named CERT-FR.
E. Common SOC Tools
SIEM: As mentioned before, the Security Information and Event Management system holds an important place in the SOC. By analyzing all kinds of logs and data, it can create dashboards, alerts and reports from every detected security problem. Good examples are SolarWinds SIEM, IBM QRadar, Splunk Enterprise Security, Elastic Stack/ELK, Micro Focus ArcSight or LogRhythm, but there are many others.
Processes: The SOC uses processes to deal with incidents as fast as possible. An interesting one is the playbook. A playbook contains one or more instructions, used either to guide a human in reacting rapidly to a known threat pattern or to be executed by automation tools such as SOARs, covered in the last part of this document, to deal with incidents automatically. It can be used to perform a wide variety of actions such as scanning files or system processes, deactivating users, shutting down access ports or simply escalating the issue to the level above.
IPS/IDS/EDR/NDR: Intrusion Detection Systems are used to detect threats on endpoints, servers, network equipment and virtual or cloud-based systems. Intrusion Prevention Systems go further by automatically performing tasks with elevated privileges, directly on the systems they are installed on, to try to stop attacks before they cause harm and spread. The terms EDR (Endpoint Detection and Response) and NDR (Network Detection and Response) are often used to refer to specialized IPS for endpoints and networks, both of which can use artificial intelligence rather than signature checks alone to detect threats in their dedicated fields.
Asset managers and vulnerability scanners: Apart from handling security events as they occur, the SOC must prevent incidents from happening in the first place. Asset management software like ServiceNow ITSM or Nifty is a great way of automatically detecting and referencing all devices or listening ports connected to the network. Vulnerability scanners are then used to find unpatched vulnerabilities and generate reports. Some examples are OpenVAS or Nessus for servers and computers, and Pacu or Prowler, which are more focused on the cloud.
Penetration testing tools: Penetration testing tools are often used outside of the SOC to pit even the best defended systems against controlled real-world threats. The SOC can rely on the tested vulnerabilities to further improve detection mechanisms and defenses. The most popular tool is Kali Linux, which takes the form of an operating system supplied with many community-based penetration testing frameworks and open-source software.
Cyber threat feeds and databases: Feeds and shared databases are some of the most important tools used by the SOC. They provide a constant source of information on emerging threats, discovered by the company itself or by partners, which is then used directly by the SIEM or IDS/IPS and can also warn the company’s employees about new breaches. Such feeds contain Cyber Threat Intelligence (CTI). CTI helps to identify malicious activities by learning where they come from, how they operate and what traces they leave. These traces are called IOCs, for Indicators of Compromise, and can take various forms such as file hashes, domain names, IP addresses, changes in the registry or even services. They are used to detect threats as they act, or afterward to facilitate remediation. Examples of feeds are Anomali for new threats or Qualys for vulnerabilities.
Investigation tools: Investigation tools are often used after a breach has been identified to try to understand how the malware operates, whether it has spread and what damage it has caused. They can be frameworks that help qualify the attack, such as MITRE ATT&CK, or tools specialized in a particular field, such as Wireshark for networks, Autopsy for systems, FTK Imager for disks or Dumpzilla for browser history.
GRC systems: More and more, the Governance, Risk and Compliance aspect of cybersecurity is becoming an integral part of the SOC. GRC tools help measure the level of compliance with every policy by coordinating controls and audits, and help obtain certifications against industry standards such as PCI-DSS. These tools can also be used outside the IT spectrum to lower all sorts of risks. Some examples are IBM OpenPages, ServiceNow GRC or BSSI MyCyberEyes.
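To make the idea of a playbook more concrete, here is a minimal Python sketch in which a playbook is simply an ordered list of steps. The step names and the incident fields are hypothetical, and the steps only record what they would do instead of calling real security tools.

```python
# Each step is a plain function that takes and returns the incident context.
# The actions only record what they would do; real integrations (EDR, directory,
# firewall) are out of scope for this sketch.

def scan_file(ctx):
    ctx["actions"].append(f"scan file {ctx['file_hash']}")
    return ctx


def deactivate_user(ctx):
    ctx["actions"].append(f"deactivate user {ctx['user']}")
    return ctx


def escalate(ctx):
    ctx["actions"].append("escalate to Level 2")
    return ctx


PHISHING_PLAYBOOK = [scan_file, deactivate_user, escalate]  # hypothetical playbook


def run_playbook(playbook, incident):
    """Apply every step of the playbook in order and return the final context."""
    ctx = dict(incident, actions=[])
    for step in playbook:
        ctx = step(ctx)
    return ctx


outcome = run_playbook(PHISHING_PLAYBOOK, {"user": "alice", "file_hash": "abc123"})
for line in outcome["actions"]:
    print(line)
```

The same list of steps could be read by a human as a checklist or run by an automation tool, which is the dual use described above.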
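As an illustration of how IOCs can be used for detection, the following Python sketch compares the fields of an event with a small set of indicators. The indicator values and the event format are made up for the example and rely on addresses and domain names reserved for documentation.

```python
# Hypothetical indicators; the values use documentation-reserved ranges and names.
IOCS = {
    "ip": {"203.0.113.7"},
    "domain": {"malicious.example"},
    "sha256": {"0" * 64},
}


def match_iocs(event, iocs=IOCS):
    """Return the list of (field, value) pairs in the event that match an IOC."""
    hits = []
    for field, bad_values in iocs.items():
        value = event.get(field)
        if value in bad_values:
            hits.append((field, value))
    return hits


proxy_event = {"ip": "198.51.100.20", "domain": "malicious.example", "user": "bob"}
print(match_iocs(proxy_event))
```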
II. Should companies have a SOC?
A. Computerization and associated threats
Today, several reasons make IT security one of the most important aspects of building an information system. Firstly, all sectors, such as communication, data storage, transport, health, education, agriculture, or industry, are almost completely computerized and connected, and are therefore targets for attacks.
Secondly, threats are becoming increasingly complex with the involvement of highly organized groups and state-sponsored entities, so that simple detection mechanisms based on signatures or behavioral analysis are no longer sufficient to contain threats.
There are many statistics on this subject, but here are some interesting ones:
- In 2019, companies took an average of 206 days to detect cyber-attacks and 73 days to contain them, which means that the attacks managed to bypass all defense mechanisms and data might have been exposed during those 279 days. (IBM)
- In 2019, 88% of organizations worldwide experienced spear phishing attempts, meaning that the phishing emails were designed specifically for each of them. (Proofpoint)
- In 2017, 68% of small companies and 72% of bigger ones suffered at least one cyber-attack in the US, which is one of the most targeted countries in the world. (Hiscox)
- In 2020, data breaches cost businesses an average of $3.86 million (IBM), while ransomware attacks cost an average of $732,520, in addition to the price of the ransom. (Sophos)
Therefore, having a team of experts ready to act with tools that analyze trends to alert or act automatically in case of anomalies has become essential today.
B. When to have a SOC?
It is estimated that only a third of all enterprises have a SOC (Kaspersky) because, for a long time, having one was considered costly and reserved for big companies, but things are changing. Today, no business can work without IT, and every business now holds valuable data from customers, partners, projects, banking establishments and many other sources.
Moreover, cyber-attacks, especially with the emergence of RaaS (Ransomware as a Service), now target small, medium and large businesses equally, so it’s no longer a question of “if” but of “when” a business will be targeted. Consequently, each one should rely on a SOC to defend against threats.
One important thing to keep in mind is that having a SOC won’t solve every security problem. Companies must first grow in maturity in terms of process organization for cybersecurity issues before tackling the question of the SOC, which should be based on well-established management. The SOC is not a support function of the information system, but its client.
C. Then where to start?
Companies generally have 2 choices to begin with, Internal or Outsourced, which can then be mixed and tweaked.
Internal: Opting for an internal SOC brings several advantages, such as storing all data internally, greater responsiveness in the event of a crisis, greater customization of tools and better communication. But there are also disadvantages, such as the cost of the operation, the recruitment of qualified staff, the time it can take to gain maturity, the need to keep the team’s skills up to date or the lack of documentation and processes.
Outsourced/External/Managed: The other possibility, which is by far the most popular, is to entrust this task to a Managed Security Service Provider (MSSP) or a Managed Detection and Response (MDR) partner, which means sending data and logs to another company so that its team can analyze, react to and advise on identified problems. The main advantage is the lower and more transparent price. The company also benefits from a highly experienced team, a high level of service, often 24/7, short SLAs for incident resolution, simple implementation, already established tools to facilitate compliance and risk management, easy access to threat intelligence and a probable image gain that can reassure partners. In this case, the disadvantages are that data leaving the company can be problematic, that the process is difficult to reverse in terms of administration, and that the provider needs time to fully understand how the company operates.
Hybrid: By mixing Internal and Outsourced resources, it is possible to own an internal SOC and fill some gaps like night shifts, maintenance, technologies, or intelligence with the help of a partner.
Hub and spoke: A popular method among big organizations is to have a large central SOC, the Hub, monitoring and helping smaller SOCs, the Spokes, distributed in the different branches of the company.
D. Financial aspect
From a financial point of view, a 2020 Ponemon Institute study surveying 637 professionals revealed that the average maintenance cost of an internal SOC for a company with between 1,000 and 5,000 employees is as high as $1.68 million. Interestingly, the ROI of an outsourced SOC decreases as the company grows, and across all the companies surveyed the average cost is higher for an outsourced SOC than for an internal one ($4.44 million versus $2.86 million).
It may therefore seem complicated for some businesses to fit a SOC into their budget, even an outsourced one, but it should not be seen as an expense but rather as a stable long-term investment, increasing in value simply by preventing other assets from losing theirs or, in extreme cases, from losing everything.
E. How to implement an internal SOC?
If, despite the many disclaimers, you still decide to implement an internal SOC, here are the 5 key elements you will need to develop to do so:
Organization: The first step is to ensure that all decision-makers understand why the SOC is being set up, decide on its size, the precise perimeter it should monitor, what access rights it should have and whether it is operated 24/7 or only during office hours. The teams responsible for each part of the perimeter to be monitored should also be notified.
The key to this part is to start small and aim for quality.
Qualified personnel: It is then necessary to find qualified staff, made up of professionals who are passionate about their job and capable of adapting to each task of the SOC and making it evolve.
The aim here is the same as before: favor quality over quantity.
Adapted tools: The next step is to choose the right tools.
The first essential tool is the SIEM. There are paid and free ones, and it should be noted that paid SIEMs benefit from better support and often from many plugins that facilitate the analysts’ work, whereas free ones are often more difficult to implement and maintain.
The second essential tool is the probe that collects logs. It is possible to start by collecting the logs of different systems from their internal probes, together with a network IPS, which is often found in firewalls. EDRs and other types of IPS can then be added, while trying to maximize the coverage of each of those probes.
The third essential tool is a governance, risk and compliance system, associated with a vulnerability scanner to have a clear vision of the company’s various assets.
Finally, a source of cyber security intelligence is needed to feed the tools and analysts. The SOC can then produce and trade Intelligence with other SOCs.
The goal here is to master a few tools and understand their limits, expand only when necessary and, most importantly, protect each one of them.
End-to-end processes and communication: A very important but often underestimated step is the creation of processes. Processes should be created and refined over time for each of the SOC’s tasks and include the associated communication.
The aim here is to leave no grey areas and to standardize and facilitate the management of notifications, alerts, incidents, and security crises.
Maintenance and evolution: The final step in the implementation of the SOC is continuous improvement. The SOC must evolve to cope with the increase in the number and complexity of threats and also the volume of logs processed. To do this, it can for example modernize its tools by implementing a SOAR, review the detection rules to reduce false positives, improve its processes by creating reflex cards or procedures, and train the members of its team.
The aim of this stage is to always gain in agility and quality over time.
III. The future of SOCs
Several factors make the creation of internal SOCs more and more difficult and therefore lead many companies to outsource theirs. Threat sources are expanding rapidly, a trend amplified by the Covid-19 crisis, which has forced many employees to work from home, as in the US for 42% of the population (Stanford News), and has increased the chances of phishing campaigns succeeding. At the same time, the number of qualified cybersecurity professionals is falling further and further short of demand.
B. Increase of Attack Surfaces & Data Volume
As seen previously, there is an increase in the attack surface, but also in the volume of data to be processed by SIEMs, due to teleworking employees using VPNs, the relocation of systems to the cloud and the increase in the number of connected devices. Logs are slowly becoming a problem, for two reasons. The first is the price of cloud-stored logs, which can be very expensive when overlooked, and the second is that SIEM vendors charge based on the amount of data analyzed. Moreover, packages are often designed for large structures, so small companies that want to build a SOC will probably continue to use free on-premises solutions such as Elastic Stack or OSSEC for a long time.
C. Automation with SOARs
SIEMs were created to improve the quality of life of cybersecurity analysts, but even when perfectly configured, pure SIEMs only detect events and do not process them automatically. Processing systematically requires human intervention, which cannot be as fast as a machine. This is where Security Orchestration, Automation and Response (SOAR) comes in. SOARs not only automate the processing of incidents but also simplify their management when they are complex and require the intervention of analysts.
Within a SOAR, it is possible to create lists of automatic instructions called playbooks, but also cases that group all the elements of an incident in one place and enrich them with more information through an associated ticketing service, making the operation much more reactive.
SOARs, previously named SIRPs for Security Incident Response Platforms, are rather costly and difficult to configure, and solutions that can do both SIEM and SOAR are not common for the time being, but this might change in the future.
In a feedback report, Orange Cyber Defense explains that the integration of a SOAR takes between 6 and 9 months on average. Playbooks are often used to sort and extract phishing emails in very large numbers, and also to automatically qualify, enrich and analyze SIEM alerts or alerts from other probes and find IOCs (Indicators of Compromise) in them. Another benefit of the SOAR is “Retro-Hunting”, which consists of searching the log history for compromises in order to detect, for example, a theft of credentials.
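Retro-hunting can be sketched in a few lines of Python: once a new indicator is learned, the log history is searched for events that happened before the indicator was known. The log archive, the indicator and the dates below are hypothetical.

```python
from datetime import date

# Hypothetical log archive: (day, user, remote_ip) tuples kept by the SIEM.
LOG_HISTORY = [
    (date(2021, 1, 4), "alice", "198.51.100.20"),
    (date(2021, 1, 9), "bob", "203.0.113.7"),
    (date(2021, 2, 2), "bob", "203.0.113.7"),
]


def retro_hunt(history, new_iocs, published_on):
    """Return past events that match indicators learned only on `published_on`."""
    return [
        entry for entry in history
        if entry[2] in new_iocs and entry[0] < published_on
    ]


hits = retro_hunt(LOG_HISTORY, {"203.0.113.7"}, published_on=date(2021, 2, 1))
print(hits)
```

Here only the connection of 9 January is returned, because it predates the publication of the indicator and therefore could not have been detected at the time it happened.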
Difficulties encountered include the struggle to find qualified people to deploy and maintain the SOAR, the fear that potentially damaging actions may be performed automatically and the time to integrate and make the solution profitable which can be extremely long.
Finally, Orange Cyber Defense recommends having each step of the deployment properly validated from a governance and project management standpoint, because many different teams are involved in the project and the SOAR needs high privileges on many parts of the infrastructure. It also recommends prioritizing a small number of high-quality playbooks that require manual validation by an analyst before fully automating them.
According to the same article, Gartner predicts that 30% of enterprises with a cybersecurity department of at least 5 people will be equipped with a SOAR by 2022.
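The idea of manual validation before full automation can be sketched as an approval gate in front of each playbook action. In this Python example the action names, the list of trusted actions and the stand-in for the analyst's decision are all assumptions.

```python
def run_step(action, target, automated, approve):
    """Run `action` directly if the step is trusted, otherwise ask an analyst first.

    `approve` is a callable standing in for the analyst's decision.
    """
    if not automated and not approve(action, target):
        return f"skipped {action} on {target} (not approved)"
    return f"executed {action} on {target}"


# Hypothetical policy: only low-impact actions run without validation at first.
AUTOMATED_ACTIONS = {"enrich_alert"}


def analyst_says_no_to_isolation(action, target):
    """Stand-in for a human decision: approve everything except host isolation."""
    return action != "isolate_host"


log = [
    run_step(action, "srv-42", action in AUTOMATED_ACTIONS, analyst_says_no_to_isolation)
    for action in ("enrich_alert", "block_ip", "isolate_host")
]
print("\n".join(log))
```

As confidence grows, actions can be moved into the trusted set one by one, which limits the risk of a damaging action being performed automatically.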
D. Artificial intelligence
One of the big challenges of SIEMs is the management of false positives and duplicate alerts. A false positive is an alert raised when there is no real security problem, yet it still requires the attention of an analyst. Duplicates are easier to understand but are also a big problem because they take time to detect. A 2014 FireEye report covering more than 500 companies indicates that 37% of them were already receiving 300 alerts per day, or 10,000 per month, and that of these alerts, 52% were false positives and 64% were redundant.
An interesting way to fight this type of alert is to use Machine Learning. This approach, based on probabilities, can learn, beyond human abilities, what is normal and what is not, such as the volume of traffic that should be observed (Network Traffic Analysis) and what users typically do (User and Entity Behavior Analytics) in the company at each period of the day, and detect anomalies. It can also use information that has nothing to do with IT and correlate it in order to understand certain rare events, such as the CEO of the company connecting from China at midnight because he is on a business trip. Such capabilities make it possible to drastically reduce the number of irrelevant alerts, freeing up time for analysts, but also to automate tasks with more confidence and incorporate many more best practices into incident management.
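Handling duplicates can be illustrated with a short Python sketch that collapses identical alerts into a single entry with a counter. The alert format and the choice of fields that define a duplicate are assumptions made for the example.

```python
from collections import Counter

# Hypothetical alerts; two alerts are treated as duplicates when rule, host and
# source address are identical.
alerts = [
    {"rule": "bruteforce", "host": "srv-1", "src_ip": "203.0.113.7"},
    {"rule": "bruteforce", "host": "srv-1", "src_ip": "203.0.113.7"},
    {"rule": "malware", "host": "pc-9", "src_ip": "198.51.100.20"},
    {"rule": "bruteforce", "host": "srv-1", "src_ip": "203.0.113.7"},
]


def deduplicate(alerts):
    """Collapse identical alerts into one entry carrying an occurrence count."""
    counts = Counter((a["rule"], a["host"], a["src_ip"]) for a in alerts)
    return [
        {"rule": rule, "host": host, "src_ip": ip, "count": n}
        for (rule, host, ip), n in counts.items()
    ]


unique = deduplicate(alerts)
print(len(alerts), "alerts reduced to", len(unique))
```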
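The principle of learning what is normal and flagging deviations can be shown with a very simple statistical baseline in Python. This is only a stand-in for a trained model: the traffic history, the threshold and the z-score test are assumptions for the example, and real NTA or UEBA products use far richer features.

```python
from statistics import mean, stdev

# Hypothetical history: megabytes transferred by one user at 02:00 on previous nights.
baseline = [12, 15, 11, 14, 13, 12, 16, 14]


def is_anomalous(value, history, z_threshold=3.0):
    """Flag `value` if it lies more than `z_threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


print(is_anomalous(15, baseline))   # a typical night
print(is_anomalous(480, baseline))  # a sudden large transfer
```

A value close to the usual volume stays below the threshold and raises no alert, while the sudden large transfer is flagged for an analyst.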
E. Smarter Detection and Response
Speaking of machine learning, one of the fast-growing technologies based on it is XDR for eXtended Detection and Response. This is an improvement of EDRs and NDRs that applies not only to endpoints and networks but to all the attack surfaces of the company such as cloud services and messaging. It enables correlation and threat-hunting between these different technologies to prevent attacks automatically before they occur, while fully integrating with a SOAR, which ultimately reduces the number of alerts to be processed in the SOC.
The term MDR for Managed Detection and Response is often used to refer to an XDR managed by an external SOC.
F. The SOAPA
SOAPA (Security Operations and Analytics Platform Architecture) is a not yet widely accepted industry standard for SOC architectures that brings together all the points discussed above to try to orchestrate and automate them as much as possible:
- Security Equipment
- Machine Learning
- Threat intelligence
- Security asset managers and vulnerability scanners
It is a model that uses standard languages such as STIX (Structured Threat Information Expression) and TAXII (Trusted Automated eXchange of Indicator Information) to combine logs from various sources more efficiently and better contextualize them. It relies more heavily on EDR/XDR to handle threats, and offers an integrated ticketing tool to track incidents throughout their lifecycle as well as APIs to connect everything.
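To give an idea of what STIX content looks like, here is a Python sketch that builds a minimal STIX 2.1 Indicator as a plain dictionary and prints it as JSON. The address and the name are hypothetical, and a real deployment would more likely rely on a dedicated STIX library and a TAXII server than on hand-built dictionaries.

```python
import json
import uuid
from datetime import datetime, timezone


def make_stix_indicator(ip_address, name):
    """Build a minimal STIX 2.1 Indicator object as a plain dictionary."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": f"[ipv4-addr:value = '{ip_address}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


indicator = make_stix_indicator("203.0.113.7", "Hypothetical brute-force source")
print(json.dumps(indicator, indent=2))
```

Because every tool reads the same fields in the same format, an indicator produced by one product can be consumed by another without custom conversion.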
SOAPA consists of 4 layers:
- Common distributed data service that creates a data pipeline to accommodate logs of all kinds and make them available for a long time.
- Software services and integration layer that links the data to the analysis engines by transforming the logs into compatible formats.
- Analytics layer that performs all analyses using artificial intelligence.
- Security operations platform layer which provides an interface to the analysts and offers programmable and automatable tools.
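The way the four layers feed one another can be pictured with a toy Python pipeline. Everything here is a simplification chosen for the example: the log format is invented, and a trivial rule stands in for the artificial intelligence of the analytics layer.

```python
def data_layer(raw_lines):
    """Common distributed data service: collect and retain raw logs."""
    return list(raw_lines)


def integration_layer(raw_lines):
    """Software services and integration: normalize logs into one format."""
    events = []
    for line in raw_lines:
        source, user, action = line.split(";")
        events.append({"source": source, "user": user, "action": action})
    return events


def analytics_layer(events):
    """Analytics: a trivial rule stands in for the analysis engines."""
    return [e for e in events if e["action"] == "login_failed"]


def operations_layer(findings):
    """Security operations platform: present findings to analysts as tickets."""
    return [f"TICKET-{i}: {f['action']} for {f['user']} ({f['source']})"
            for i, f in enumerate(findings, start=1)]


raw = ["vpn;alice;login_ok", "vpn;bob;login_failed", "mail;bob;login_failed"]
tickets = operations_layer(analytics_layer(integration_layer(data_layer(raw))))
print(tickets)
```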
IBM QRadar coupled with IBM Resilient is the only SOAPA in existence today, but other SIEM developers, especially Splunk, also tend to adopt this model.
A. Increasing complexity
As can be seen in most threat landscape reports and in the news, the complexity of malware is constantly increasing, and the number of phishing emails is growing every year. No company is safe any longer, and even the most demanding of them in terms of cybersecurity can come under attack through, for example, the emergence of supply-chain attacks. According to a Deep Instinct report, between 2019 and 2020 the number of new ransomware samples increased by 435%, and that of crypto miners by 1061%. To face those constant threats, the tools used must be ever sharper and more automated, the teams in charge of defense must be ever faster, and significant resources are consequently necessary to keep up the pace.
Building a SOC is not simple, and the amount of work involved in its preparation, implementation and maintenance is often underestimated. The project should not be too ambitious at the beginning because the team in charge can quickly find itself overwhelmed. In another Ponemon Institute study from 2019, surveying more than 550 professionals, 58% of respondents said that their SOC is not efficient enough to find evidence, investigate and discover the sources of threats, due to a lack of visibility into their infrastructure or slow response times to incidents.
Another well-known problem, often found in SOCs where there is little automation, is “alert fatigue”, caused by the increase in the number of alerts and the time required to process them, which leads to high analyst turnover and makes them even more difficult to recruit. In the same study, 73% of the respondents said that the increased workload can make their job painful.
C. Different approaches
Tackling the issue of the SOC quickly forces companies to make a choice between internal and external. Both solutions come with advantages and disadvantages. On one hand, independence, but with the difficult choice of technologies and the problem of financing. On the other hand, simplicity, but with a loss of visibility and sovereignty over the company’s data. In both cases, if the transition is properly managed, it provides much better visibility into cyber threats and unlocks access to very important industry standard certifications such as ISO 27001, PCI-DSS or SCEE GPG53, which are in most cases necessary for partner trust.
Senior Consultant, BSSI by EVA Group