Hi AastaLLL, thanks again! I might check those frameworks, but I’m quite happy to report that I have MLC and TVM built from source on the Jetson. Running Hermes-3-Llama-3.2-3B-q4f16_1-MLC with context size 1024 and prefill chunk size 512 gives me a very snappy result (and 4 GB RAM still available!):
Hello there!
Greetings! I am Hermes 3, an AI designed to assist and serve you in any way I can. How may I be of service today?
/stats
prefill: 39.2 tok/s, decode: 27.5 tok/s
Can you tell me what you know about quantum layers?
Quantum layers are a concept in quantum computing. They refer to the layers of qubits (quantum bits) that are used to construct quantum circuits. Each layer represents a different quantum state, and these states are manipulated through various quantum operations to perform computations.
In a quantum computer, data is stored and processed in the form of quantum bits, which are the fundamental units of quantum information. These bits can exist in a superposition of states, allowing them to be in a multitude of states simultaneously until measured. This property is what gives quantum computers their power.
Quantum circuits are composed of quantum gates, which are the building blocks for performing quantum logic operations. These operations are applied to the qubits across different layers of the quantum circuit. The layers are typically arranged in a hierarchy, with each subsequent layer being more complex and entangled than the previous one.
The number of layers in a quantum circuit is often referred to as the “depth” of the circuit. The depth is a measure of how many quantum operations are applied to a qubit before a measurement is made. As the depth increases, the quantum computer can perform more complex calculations.
Quantum layers allow quantum computers to simulate quantum systems and perform certain computations much more efficiently than classical computers. However, they also pose challenges in terms of error correction and scaling up to larger numbers of qubits.
I can provide you with more information on the specific quantum layering techniques used in various quantum computing architectures if you’re interested.
/stats
prefill: 108.2 tok/s, decode: 28.4 tok/s
I was actually pretty surprised to see the speed, and that’s without too much tuning (just enough to get it in memory comfortably)!
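For anyone wondering why a small context window helps so much with memory, here is a rough back-of-the-envelope sketch of the KV cache size. It is an estimate only, not what MLC actually allocates: it assumes 16-bit K/V values and a cache sized for the full context window, and the layer/head numbers are my assumptions for Llama 3.2 3B (28 layers, 8 KV heads, head dimension 128). `kv_cache_bytes` is a hypothetical helper, not part of MLC.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_window, bytes_per_value=2):
    """Rough KV cache size: one K and one V tensor per layer."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_window * bytes_per_value)


# Assumed architecture values for Llama 3.2 3B (not taken from the MLC config).
LLAMA_3_2_3B = dict(num_layers=28, num_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    size = kv_cache_bytes(**LLAMA_3_2_3B, context_window=1024)
    print(f"{size / 2**20:.0f} MiB")  # 112 MiB
```

The estimate scales linearly with the context window, so under the same assumptions doubling the context to 2048 would roughly double the cache.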
Mistral-7B-Instruct-v0.3-q4f16_0-MLC runs in about 6 GB of RAM and is usable, but the challenge with this model isn’t so much the RAM as keeping the GPU from throttling due to heat :-)
Hello there!
Of course! I’m here to help you, provide useful information, and create a positive environment. I’m committed to respecting and caring for all users. Let’s have a pleasant, insightful, and ethical conversation! How can I be of assistance today?
/stats
prefill: 11.1 tok/s, decode: 11.2 tok/s
Can you tell me about the quantum layers?
Absolutely! Quantum layers refer to the subatomic particle layers within an atom. Atoms have a nucleus, which is made up of protons and neutrons, surrounded by electrons that occupy energy levels or atomic orbitals.
There are three main quantum numbers associated with electrons:
- Principal quantum number (n): Determines the energy level or shell where the electron resides. The shells are numbered from 1 (K-shell) to 7 (Q-shell) in increasing order of energy level.
- Azimuthal quantum number (l): Determines the shape of the orbital and the number of sublevels within that energy level. It ranges from 0 to (n - 1), with sublevels labeled s, p, d, f, etc., in the following order: s (0), p (1), d (2), f (3), g (4), h (5), and i (6).
- Magnetic quantum number (ml): Determines the orientation of the sublevel within the specific shell. It ranges from -l to +l, and the resulting values are used to label the specific orbitals within a shell.
For example, in the 2nd energy level (n = 2), we have two sublevels labeled s and p. The s-sublevel has only one possible magnetic quantum number (ml = 0). The p-sublevel has three possible magnetic quantum numbers (ml = -1, 0, 1) and results in three p-orbitals with different orientations: dxy, dxz, and dyz.
The quantum numbers are fundamental to understanding the properties of atoms, making quantum physics a fascinating field of study! If you’d like more information on a specific aspect of quantum mechanics, I’d be happy to elaborate.
/stats
prefill: 31.1 tok/s, decode: 11.0 tok/s
[edit]
After tweaking some settings and the MLC config, and managing the KV cache via a Python 3 script, Mistral 7B Instruct 0.3 q4f16 now runs comfortably. Temps stay below 50 °C with the stock cooling on power level MAXN, and TPS is between 12 and 15 tok/s. Of the 8 GB of shared RAM, only 5 GB is in use.
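A minimal sketch of how the temperatures can be watched from Python while the model runs. It reads the standard Linux thermal sysfs entries, which report millidegrees Celsius. The zone names differ per board, so the names in the comments are only examples, and the 50 °C limit is simply the figure mentioned above, not an official throttling threshold. Both function names are hypothetical helpers.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_type: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones are unreadable or report no value
        temps[name] = millideg / 1000.0
    return temps


def zones_over_limit(temps, limit_c=50.0):
    """Return the names of zones hotter than limit_c, sorted by name."""
    return sorted(name for name, t in temps.items() if t > limit_c)


if __name__ == "__main__":
    temps = read_zone_temps()
    for name, t in temps.items():
        print(f"{name}: {t:.1f} C")  # e.g. a GPU or CPU zone, board dependent
    print("over limit:", zones_over_limit(temps))
```

Calling this in a loop between prompts is enough to see whether a longer generation pushes any zone past the limit.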
——
I’ve written up all the steps, dependencies, etc. needed to achieve this in a document that I want to refactor into a neater format and publish on GitHub for others to use and learn from. I’ll add a link here once that’s done. Thanks for the help so far!