Click in to Chinese tech tsunami
DeepSeek, a Chinese AI startup, has sent shockwaves through Silicon Valley and the US, where AI innovation and investment have been concentrated for years, by successfully creating a model that rivals OpenAI's.
Chinese Artificial Intelligence (AI) company DeepSeek has sent shockwaves through the tech community, with the release of extremely efficient AI models that can compete with cutting-edge products from US companies such as OpenAI and Anthropic. Founded in 2023, DeepSeek has achieved its results with a fraction of the cash and computing power of its competitors.
There are reasons to believe that DeepSeek’s creator, a Chinese company by the same name, has exaggerated its thriftiness (more on that shortly), but ignoring the global freak-out it has triggered would be a mistake. DeepSeek rocketed to number one in app stores at the weekend and wiped more than half a trillion US dollars off the market value of Nvidia, the world’s largest company and leading AI chip supplier, in the trading that followed. It was a public rout of the world’s hitherto unshakable confidence in Silicon Valley’s best labs, which have always insisted that the only path to a shining AI future is paved with as many expensive computer chips as possible.
DeepSeek now appears to have debunked one of the tech world’s holiest scriptures, achieving similar success with far fewer and far older chips. It could possibly mark the end of US dominance of the AI race, according to experts. Some (though not all) of DeepSeek’s code and technical explanation is open source, meaning it can be viewed, downloaded and used by anyone. DeepSeek’s “reasoning” R1 model, released last week, provoked excitement among researchers, shock among investors, and responses from AI heavyweights. The company followed up on January 28 with a model that can work with images as well as text.
In December, DeepSeek released its V3 model. This is a very powerful “standard” large language model that performs at a similar level to OpenAI’s GPT-4o and Anthropic’s Claude 3.5. While these models are prone to errors and sometimes make up their own facts, they can carry out tasks such as answering questions, writing essays and generating computer code. On some tests of problem-solving and mathematical reasoning, they score better than the average human.
V3 was trained at a reported cost of about USD 5.58 million. This is dramatically cheaper than GPT-4, for example, which reportedly cost more than USD 100 million to develop. DeepSeek also claims to have trained V3 using around 2,000 specialised computer chips, specifically H800 GPUs made by Nvidia. This is again far fewer than other companies, which may have used up to 16,000 of the more powerful H100 chips.
On January 20, DeepSeek released another model, called R1. This is a so-called “reasoning” model, which tries to work through complex problems step by step. These models seem to be better at many tasks that require context and have multiple interrelated parts, such as reading comprehension and strategic planning. The R1 model is a tweaked version of V3, modified with a technique called reinforcement learning. R1 appears to work at a similar level to OpenAI’s o1, released last year.
DeepSeek also used the same technique to make “reasoning” versions of small open-source models that can run on home computers. The R1 release has sparked a huge surge of interest in DeepSeek, driving up the popularity of its V3-powered chatbot app and triggering a massive price crash in tech stocks as investors re-evaluate the AI industry. At the time of writing, chipmaker Nvidia has lost around USD 600 billion in value.
Experts do not expect the shock to slow US investment in AI. If anything, they believe the US government is likely to redouble its efforts. “In geopolitical terms, the US is likely to want to keep ahead of China,” Jeannie Paterson of the University of Melbourne was quoted as saying in one report. “I don’t think it will quell AI investment,” she says, although she warns that a race will come with risks of its own. “What we should be talking more about is whether this is the end of responsible and safe AI.” Even prior to DeepSeek’s breakthrough, US President Donald Trump was cutting the brakes on development, repealing a Biden executive order on AI safety in his first week.
Meanwhile, DeepSeek has been subjected to a series of sophisticated, large-scale cyberattacks over the past month, according to XLab, a Chinese cybersecurity firm. The attacks, which began in early January, have escalated significantly in both scale and complexity, posing unprecedented challenges to DeepSeek’s operations and data security, XLab experts have been quoted as saying, and they warn that the attacks are expected to continue. The lab said HTTP proxy attacks are still targeting DeepSeek. The monitored source IPs number from hundreds to thousands, and most are located in the US, Singapore, the Netherlands, Germany and China itself, according to XLab.
DeepSeek is an open-source platform, which means software developers can adapt it to their own ends. It has sparked hopes of a new wave of innovation in AI, which had appeared to be dominated by US tech companies reliant on huge investments in microchips, data centres and new power sources.
Some people testing DeepSeek have found that it will not answer questions on sensitive topics such as the Tiananmen Square massacre. When asked about the status of Taiwan, it repeats the Chinese Communist Party line that the island is an “inalienable” part of China.
One user, Azeem Azhar, an AI expert, asked about the events in Tiananmen Square and was told that DeepSeek could not provide detailed information and that “this topic is highly sensitive and often censored in many countries, including China”.
However, the AI then did explain that the events were “widely recognised as the suppression of pro-democracy protests” and said: “The Chinese government responded with a violent crackdown, resulting in the deaths of hundreds (or possibly thousands) of people, including both protesters and soldiers.”
People use AI models such as DeepSeek and ChatGPT to help them process personal papers or work documents, such as meeting minutes, but anything uploaded can be retained by the company behind the model and used to train the AI or for other purposes.
DeepSeek is based in Hangzhou and makes clear in its privacy policy that the personal information it collects from users is held “on secure servers located in the People’s Republic of China”.
It says it uses data to “comply with our legal obligations, or as necessary to perform tasks in the public interest, or to protect the vital interests of our users and other people”.
China’s national intelligence law states that all enterprises, organisations and citizens “shall support, assist and cooperate with national intelligence efforts”.
DeepSeek’s breakthroughs have been in achieving greater efficiency: getting good results with fewer resources. In particular, DeepSeek’s developers have pioneered two techniques that may be adopted by AI researchers more broadly.
The first has to do with a mathematical idea called “sparsity”. AI models have a lot of parameters that determine their responses to inputs (V3 has around 671 billion), but only a small fraction of these parameters is used for any given input.
However, predicting which parameters will be needed isn’t easy. DeepSeek used a new technique to do this and then trained only those parameters. As a result, its models needed far less training than they would have under a conventional approach.
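To make the idea of sparsity concrete, here is a minimal sketch in Python. It is not DeepSeek’s code, and this article does not describe the company’s actual method. The sketch assumes a generic “mixture of experts” style of routing, in which a small gate scores a handful of sub-networks (“experts”) and only the top few are run for a given input. The names `top_k_route` and `sparse_forward`, the four toy experts and the gate scores are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def sparse_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts; the others are never evaluated."""
    routes = top_k_route(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


def make_expert(scale):
    """Hypothetical toy expert: just multiplies its input by a constant."""
    return lambda x: scale * x


# Four toy experts, but only two are used for this input (assumed values).
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 2.0, -1.0, 1.5]

output, used = sparse_forward(3.0, experts, gate_scores, k=2)
print("experts used:", used)
print("output:", round(output, 3))
```

In this toy setting half of the experts sit idle for the input. In a model with hundreds of billions of parameters, the same principle means that most of the network is skipped for any one piece of text.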
The other trick has to do with how V3 stores information in computer memory. DeepSeek has found a clever way to compress the relevant data, so it is easier to store and access quickly.
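As a rough illustration of what compressing stored data can look like, the sketch below keeps a short “latent” vector for each piece of text the model has seen and expands it back to full size only when needed. This is an assumption-laden toy, not DeepSeek’s implementation: the sizes `D_MODEL` and `D_LATENT`, the projection matrices and the `CompressedCache` class are hypothetical, and in a real model the projections would be learned rather than fixed. The expansion is also lossy in general.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


D_MODEL = 8   # hypothetical size of a full key or value vector
D_LATENT = 2  # hypothetical size of the compressed vector that is stored

# Hypothetical fixed projections (a real model would learn these).
down = [[0.1 * (i + j + 1) for j in range(D_MODEL)] for i in range(D_LATENT)]
up_key = [[0.05 * (i - j) for j in range(D_LATENT)] for i in range(D_MODEL)]
up_value = [[0.02 * (i + j) for j in range(D_LATENT)] for i in range(D_MODEL)]


class CompressedCache:
    """Stores one small latent per token instead of full keys and values."""

    def __init__(self):
        self.latents = []

    def append(self, hidden):
        # Compress the token's hidden vector before storing it.
        self.latents.append(matvec(down, hidden))

    def keys(self):
        # Expand latents back to full-size keys on demand.
        return [matvec(up_key, z) for z in self.latents]

    def values(self):
        # Expand latents back to full-size values on demand.
        return [matvec(up_value, z) for z in self.latents]

    def stored_numbers(self):
        return sum(len(z) for z in self.latents)


cache = CompressedCache()
tokens = [[float(t + i) for i in range(D_MODEL)] for t in range(5)]
for hidden in tokens:
    cache.append(hidden)

full_cost = len(tokens) * 2 * D_MODEL     # separate key and value per token
compressed_cost = cache.stored_numbers()  # one latent per token
print("numbers stored without compression:", full_cost)
print("numbers stored with compression:", compressed_cost)
```

With these toy sizes, five tokens take 10 stored numbers instead of 80. The trade-off is a little extra arithmetic each time the full vectors are rebuilt.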
DeepSeek has shaken up the multi-billion dollar AI industry. Its models and techniques have been released under the free MIT License, which means anyone can download and modify them.
While this may be bad news for some AI companies — whose profits might be eroded by the existence of freely available, powerful models — it is great news for the broader AI research community.
At present, a lot of AI research requires access to enormous amounts of computing resources.
For consumers, access to AI may also become cheaper. More AI models may be run on users’ own devices, such as laptops or phones, rather than running “in the cloud” for a subscription fee.
For researchers who already have a lot of resources, more efficiency may have less of an effect. It is unclear whether DeepSeek’s approach will help to make models with better performance overall, or simply models that are more efficient.
Even within the Chinese AI industry, DeepSeek is an unconventional player. It started as Fire-Flyer, a deep-learning research branch of High-Flyer, one of China’s best-performing quantitative hedge funds. Founded in 2015, the hedge fund quickly rose to prominence in China, becoming the first quant hedge fund to raise over 100 billion RMB (around USD 15 billion). Since 2021, that figure has dipped to around USD 8 billion, though High-Flyer remains one of the most important quant hedge funds in the country.
For years, High-Flyer had been stockpiling GPUs and building Fire-Flyer supercomputers to analyse financial data. Then, in 2023, its founder, Liang Wenfeng, who has a master’s degree in computer science, decided to pour the fund’s resources into a new company called DeepSeek that would build its own cutting-edge models and, he hoped, develop artificial general intelligence.
Today, DeepSeek represents a new generation of Chinese tech companies that prioritise long-term technological advancement over quick commercialisation.
DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill within the global AI research community. For many Chinese AI companies, developing open-source models is the only way to play catch-up with their Western counterparts, because it attracts more users and contributors, which in turn helps the models grow. DeepSeek has now demonstrated that cutting-edge models can be built using less money, though still a lot of it, and that the current norms of model-building leave plenty of room for optimisation. More attempts in this direction are almost certain to follow, ushering in an era in which American supremacy is challenged at every turn.
Views expressed are personal