Skip to content

Conversation

@Xarbirus
Copy link
Contributor

I would like to add several fixes for building the repo on Windows ARM, as well as a CI runner for automated build control.

  • fix error C2078: too many initializers with ggml_vld1q_u32 with a macro for MSVC ARM64
  • fix error C2065: '__fp16': undeclared identifier
  • suppress warning C4146: unary minus operator applied to unsigned type, result still unsigned

#define GGML_F16x8_ZERO vdupq_n_f16(0.0f)
#define GGML_F16x8_SET1(x) vdupq_n_f16(x)
#define GGML_F16x8_LOAD(x) vld1q_f16((const __fp16 *)(x))
#define GGML_F16x8_LOAD(x) vld1q_f16((const ggml_fp16_internal_t *)(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would call vld1q_f16 with uint16_t * argument.
Does this produce correct results?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this produce correct results?

Hm, yes - no reason not to.

Wouldn't it be better if instead of introducing ggml_fp16_internal_t, we simply change these to:

diff --git a/ggml.c b/ggml.c
index 80efa6f2..ac0d15b2 100644
--- a/ggml.c
+++ b/ggml.c
@@ -857,7 +857,7 @@ inline static float vaddvq_f32(float32x4_t v) {
     #define GGML_F16x8              float16x8_t
     #define GGML_F16x8_ZERO         vdupq_n_f16(0.0f)
     #define GGML_F16x8_SET1(x)      vdupq_n_f16(x)
-    #define GGML_F16x8_LOAD(x)      vld1q_f16((const __fp16 *)(x))
+    #define GGML_F16x8_LOAD(x)      vld1q_f16((const uint16_t *)(x))
     #define GGML_F16x8_STORE        vst1q_f16
     #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
     #define GGML_F16x8_ADD          vaddq_f16
@@ -900,7 +900,7 @@ inline static float vaddvq_f32(float32x4_t v) {
     #define GGML_F32Cx4              float32x4_t
     #define GGML_F32Cx4_ZERO         vdupq_n_f32(0.0f)
     #define GGML_F32Cx4_SET1(x)      vdupq_n_f32(x)
-    #define GGML_F32Cx4_LOAD(x)      vcvt_f32_f16(vld1_f16((const __fp16 *)(x)))
+    #define GGML_F32Cx4_LOAD(x)      vcvt_f32_f16(vld1_f16((const uint16_t *)(x)))
     #define GGML_F32Cx4_STORE(x, y)  vst1_f16(x, vcvt_f16_f32(y))
     #define GGML_F32Cx4_FMA(a, b, c) vfmaq_f32(a, b, c)
     #define GGML_F32Cx4_ADD          vaddq_f32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll get some warnings (like incompatible pointer types assigning to 'const __fp16 *' from 'const uint16_t *') from GGML_F16_VEC_LOAD in this case.

But if it's ok, I'll commit the change.

Copy link
Member

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it like proposed

@ggerganov ggerganov merged commit 3202361 into ggml-org:master Mar 11, 2024
NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp that referenced this pull request Mar 12, 2024
* windows arm ci

* fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64

* fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned`

* fix `error C2065: '__fp16': undeclared identifier`
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* windows arm ci

* fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64

* fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned`

* fix `error C2065: '__fp16': undeclared identifier`
@Xarbirus Xarbirus deleted the windows-arm-runner branch April 17, 2024 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants