-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: skl enable the core and get ROM init #120
ASoC: SOF: skl enable the core and get ROM init #120
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall needing more comments explaining what each block does.
sound/soc/sof/intel/hda-loader.c
Outdated
|
||
// skl/kbl enable core and code loader DMA has some difference with apl/cnl | ||
// add the APIs for skl/kbl | ||
static int cl_stream_prepare_skl(struct snd_sof_dev *sdev, unsigned int format, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really needed to set tag to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is for debug, this function will be implement in next patch. and if not set a value, checkpatch will give a warning.
sound/soc/sof/intel/hda-loader.c
Outdated
@@ -372,3 +372,148 @@ int hda_dsp_cl_boot_firmware(struct snd_sof_dev *sdev) | |||
dev_err(sdev->dev, "error: load fw failed err: %d\n", ret); | |||
return ret; | |||
} | |||
|
|||
// skl/kbl enable core and code loader DMA has some difference with apl/cnl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always use C style comments /* comment */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will change that.
sound/soc/sof/intel/hda-loader.c
Outdated
} | ||
|
||
/* prepare DMA for code loader stream */ | ||
tag = cl_stream_prepare_skl(sdev, 0x40, fwsize, &sdev->dmab, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just set tag = 0 at start of this function if it's always going to be 0 ???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like answered above, the next patch will get the true tag value, this is only for debug.
sound/soc/sof/intel/hda-loader.c
Outdated
int tag, ret, i; | ||
u32 hipcie; | ||
u32 reg; | ||
const struct sof_intel_dsp_desc *chip = sdev->hda->desc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
structures should always be at the top of the variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will change that.
sound/soc/sof/intel/hda-loader.c
Outdated
} | ||
|
||
static int cl_dsp_init_skl(struct snd_sof_dev *sdev, const void *fwdata, | ||
u32 fwsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, github can skew the formatting a little, so it's best to always run checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run the checkpatch, but don't know why here is still not aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just github rendering, if checkpatch is OK then you are good.
sound/soc/sof/intel/hda-loader.c
Outdated
|
||
memcpy(sdev->dmab.area, fwdata, fwsize); | ||
|
||
snd_sof_dsp_update_bits(sdev, HDA_DSP_BAR, HDA_DSP_REG_ADSPIC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls comment what we are setting and why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so every register set need to be comment? Or only the main part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally, every register if its not obvious.
sound/soc/sof/intel/hda-loader.c
Outdated
u32 reg; | ||
const struct sof_intel_dsp_desc *chip = sdev->hda->desc; | ||
|
||
if (hda_dsp_core_is_enabled(sdev, HDA_DSP_CORE_MASK(0))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to comment what this block does, and why we check core status here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will have comment on that.
sound/soc/sof/intel/hda-loader.c
Outdated
goto err; | ||
} | ||
} else { | ||
ret = hda_dsp_core_reset_power_down(sdev, HDA_DSP_CORE_MASK(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please comment this block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
sound/soc/sof/intel/hda-loader.c
Outdated
|
||
static int cl_copy_fw_skl(struct snd_sof_dev *sdev, int tag) | ||
{ | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is just for debug ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
sound/soc/sof/intel/hda-loader.c
Outdated
struct firmware stripped_firmware; | ||
int ret, tag; | ||
|
||
stripped_firmware.data = plat_data->fw->data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to temporarily add an offset here to strip the extended manifest when using the closed source firmware (as it has an extended manifest pre pended ato the start of the FW binary file) otherwise the FW will not authenticate or boot..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ask @keyonjie for extended manifest details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will ask Keyon what is the offset. We still need to check if the firmware is rightly booted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, FW will not boot without the offset change and remove of extended manifest
0930255
to
69a810a
Compare
1. enable the skl core 2. get the ROM int Signed-off-by: Zhu Yingjiang <[email protected]>
69a810a
to
2310452
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this PR is the correct one as it looks more complete than your other PR ? Can you close the old PR that way I know I'm reviewing the correct one.
unsigned int size, struct snd_dma_buffer *dmab, | ||
int direction) | ||
{ | ||
/* the skl cl dma don't use stream tag, ret is for debug */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this func if its not needed.
} | ||
|
||
/* prepare DMA for code loader stream */ | ||
ret = cl_stream_prepare_skl(sdev, 0x40, fwsize, &sdev->dmab, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic number 0x40, please use a macro
commit 5c4c450 upstream. The parameters of v4l2_ctrl_new_std_menu_items() are tricky: instead of the number of possible values, it requires the number of the maximum value. In other words, the ARRAY_SIZE() value should be decremented, otherwise it will go past the array bounds, as warned by KASAN: [ 279.839688] BUG: KASAN: global-out-of-bounds in v4l2_querymenu+0x10d/0x180 [videodev] [ 279.839709] Read of size 8 at addr ffffffffc10a4cb0 by task v4l2-compliance/16676 [ 279.839736] CPU: 1 PID: 16676 Comm: v4l2-compliance Not tainted 4.18.0-rc2+ thesofproject#120 [ 279.839741] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0364.2017.0511.0949 05/11/2017 [ 279.839743] Call Trace: [ 279.839758] dump_stack+0x71/0xab [ 279.839807] ? v4l2_querymenu+0x10d/0x180 [videodev] [ 279.839817] print_address_description+0x1c9/0x270 [ 279.839863] ? v4l2_querymenu+0x10d/0x180 [videodev] [ 279.839871] kasan_report+0x237/0x360 [ 279.839918] v4l2_querymenu+0x10d/0x180 [videodev] [ 279.839964] __video_do_ioctl+0x2c8/0x590 [videodev] [ 279.840011] ? copy_overflow+0x20/0x20 [videodev] [ 279.840020] ? avc_ss_reset+0xa0/0xa0 [ 279.840028] ? check_stack_object+0x21/0x60 [ 279.840036] ? __check_object_size+0xe7/0x240 [ 279.840080] video_usercopy+0xed/0x730 [videodev] [ 279.840123] ? copy_overflow+0x20/0x20 [videodev] [ 279.840167] ? v4l_enumstd+0x40/0x40 [videodev] [ 279.840177] ? __handle_mm_fault+0x9f9/0x1ba0 [ 279.840186] ? __pmd_alloc+0x2c0/0x2c0 [ 279.840193] ? __vfs_write+0xb6/0x350 [ 279.840200] ? kernel_read+0xa0/0xa0 [ 279.840244] ? video_usercopy+0x730/0x730 [videodev] [ 279.840284] v4l2_ioctl+0xa1/0xb0 [videodev] [ 279.840295] do_vfs_ioctl+0x117/0x8a0 [ 279.840303] ? selinux_file_ioctl+0x211/0x2f0 [ 279.840313] ? ioctl_preallocate+0x120/0x120 [ 279.840319] ? selinux_capable+0x20/0x20 [ 279.840332] ksys_ioctl+0x70/0x80 [ 279.840342] __x64_sys_ioctl+0x3d/0x50 [ 279.840351] do_syscall_64+0x6d/0x1c0 [ 279.840361] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 279.840367] RIP: 0033:0x7fdfb46275d7 [ 279.840369] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48 [ 279.840474] RSP: 002b:00007ffee1179038 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 279.840483] RAX: ffffffffffffffda RBX: 00007ffee1179180 RCX: 00007fdfb46275d7 [ 279.840488] RDX: 00007ffee11790c0 RSI: 00000000c02c5625 RDI: 0000000000000003 [ 279.840493] RBP: 0000000000000002 R08: 0000000000000020 R09: 00000000009f0902 [ 279.840497] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffee117a5a0 [ 279.840501] R13: 00007ffee11790c0 R14: 0000000000000002 R15: 0000000000000000 [ 279.840515] The buggy address belongs to the variable: [ 279.840535] tvp5150_test_patterns+0x10/0xffffffffffffe360 [tvp5150] Fixes: c43875f ("[media] tvp5150: replace MEDIA_ENT_F_CONN_TEST by a control") Cc: [email protected] Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 1413ef6 upstream. The struct cdev is embedded in the struct i2c_dev. In the current code, we would free the i2c_dev struct directly in put_i2c_dev(), but the cdev is manged by a kobject, and the release of it is not predictable. So it is very possible that the i2c_dev is freed before the cdev is entirely released. We can easily get the following call trace with CONFIG_DEBUG_KOBJECT_RELEASE and CONFIG_DEBUG_OBJECTS_TIMERS enabled. ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x38 WARNING: CPU: 19 PID: 1 at lib/debugobjects.c:325 debug_print_object+0xb0/0xf0 Modules linked in: CPU: 19 PID: 1 Comm: swapper/0 Tainted: G W 5.2.20-yocto-standard+ thesofproject#120 Hardware name: Marvell OcteonTX CN96XX board (DT) pstate: 80c00089 (Nzcv daIf +PAN +UAO) pc : debug_print_object+0xb0/0xf0 lr : debug_print_object+0xb0/0xf0 sp : ffff00001292f7d0 x29: ffff00001292f7d0 x28: ffff800b82151788 x27: 0000000000000001 x26: ffff800b892c0000 x25: ffff0000124a2558 x24: 0000000000000000 x23: ffff00001107a1d8 x22: ffff0000116b5088 x21: ffff800bdc6afca8 x20: ffff000012471ae8 x19: ffff00001168f2c8 x18: 0000000000000010 x17: 00000000fd6f304b x16: 00000000ee79de43 x15: ffff800bc0e80568 x14: 79616c6564203a74 x13: 6e6968207473696c x12: 5f72656d6974203a x11: ffff0000113f0018 x10: 0000000000000000 x9 : 000000000000001f x8 : 0000000000000000 x7 : ffff0000101294cc x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 00000000ffffffff x2 : 0000000000000000 x1 : 387fc15c8ec0f200 x0 : 0000000000000000 Call trace: debug_print_object+0xb0/0xf0 __debug_check_no_obj_freed+0x19c/0x228 debug_check_no_obj_freed+0x1c/0x28 kfree+0x250/0x440 put_i2c_dev+0x68/0x78 i2cdev_detach_adapter+0x60/0xc8 i2cdev_notifier_call+0x3c/0x70 notifier_call_chain+0x8c/0xe8 blocking_notifier_call_chain+0x64/0x88 device_del+0x74/0x380 device_unregister+0x54/0x78 i2c_del_adapter+0x278/0x2d0 unittest_i2c_bus_remove+0x3c/0x80 platform_drv_remove+0x30/0x50 device_release_driver_internal+0xf4/0x1c0 driver_detach+0x58/0xa0 bus_remove_driver+0x84/0xd8 driver_unregister+0x34/0x60 platform_driver_unregister+0x20/0x30 of_unittest_overlay+0x8d4/0xbe0 of_unittest+0xae8/0xb3c do_one_initcall+0xac/0x450 do_initcall_level+0x208/0x224 kernel_init_freeable+0x2d8/0x36c kernel_init+0x18/0x108 ret_from_fork+0x10/0x1c irq event stamp: 3934661 hardirqs last enabled at (3934661): [<ffff00001009fa04>] debug_exception_exit+0x4c/0x58 hardirqs last disabled at (3934660): [<ffff00001009fb14>] debug_exception_enter+0xa4/0xe0 softirqs last enabled at (3934654): [<ffff000010081d94>] __do_softirq+0x46c/0x628 softirqs last disabled at (3934649): [<ffff0000100b4a1c>] irq_exit+0x104/0x118 This is a common issue when using cdev embedded in a struct. Fortunately, we already have a mechanism to solve this kind of issue. Please see commit 233ed09 ("chardev: add helper function to register char devs with a struct device") for more detail. In this patch, we choose to embed the struct device into the i2c_dev, and use the API provided by the commit 233ed09 to make sure that the release of i2c_dev and cdev are in sequence. Signed-off-by: Kevin Hao <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Cc: Ben Hutchings <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
New printing support
[ Upstream commit 031af50 ] The inline assembly for arm64's cmpxchg_double*() implementations use a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems. This is similar to what we fixed back in commit: fee960b ("arm64: xchg: hazard against entire exchange variable") ... but we forgot to adjust cmpxchg_double*() similarly at the same time. The same problem applies, as demonstrated with the following test: | struct big { | u64 lo, hi; | } __aligned(128); | | unsigned long foo(struct big *b) | { | u64 hi_old, hi_new; | | hi_old = b->hi; | cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78); | hi_new = b->hi; | | return hi_old ^ hi_new; | } ... which GCC 12.1.0 compiles as: | 0000000000000000 <foo>: | 0: d503233f paciasp | 4: aa0003e4 mov x4, x0 | 8: 1400000e b 40 <foo+0x40> | c: d2800240 mov x0, #0x12 // thesofproject#18 | 10: d2800681 mov x1, #0x34 // thesofproject#52 | 14: aa0003e5 mov x5, x0 | 18: aa0103e6 mov x6, x1 | 1c: d2800ac2 mov x2, #0x56 // thesofproject#86 | 20: d2800f03 mov x3, #0x78 // thesofproject#120 | 24: 48207c82 casp x0, x1, x2, x3, [x4] | 28: ca050000 eor x0, x0, x5 | 2c: ca060021 eor x1, x1, x6 | 30: aa010000 orr x0, x0, x1 | 34: d2800000 mov x0, #0x0 // #0 <--- BANG | 38: d50323bf autiasp | 3c: d65f03c0 ret | 40: d2800240 mov x0, #0x12 // thesofproject#18 | 44: d2800681 mov x1, #0x34 // thesofproject#52 | 48: d2800ac2 mov x2, #0x56 // thesofproject#86 | 4c: d2800f03 mov x3, #0x78 // thesofproject#120 | 50: f9800091 prfm pstl1strm, [x4] | 54: c87f1885 ldxp x5, x6, [x4] | 58: ca0000a5 eor x5, x5, x0 | 5c: ca0100c6 eor x6, x6, x1 | 60: aa0600a6 orr x6, x5, x6 | 64: b5000066 cbnz x6, 70 <foo+0x70> | 68: c8250c82 stxp w5, x2, x3, [x4] | 6c: 35ffff45 cbnz w5, 54 <foo+0x54> | 70: d2800000 mov x0, #0x0 // #0 <--- BANG | 74: d50323bf autiasp | 78: d65f03c0 ret Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double(). This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified. With this change, GCC 12.1.0 compiles the above test as: | 0000000000000000 <foo>: | 0: f9400407 ldr x7, [x0, thesofproject#8] | 4: d503233f paciasp | 8: aa0003e4 mov x4, x0 | c: 1400000f b 48 <foo+0x48> | 10: d2800240 mov x0, #0x12 // thesofproject#18 | 14: d2800681 mov x1, #0x34 // thesofproject#52 | 18: aa0003e5 mov x5, x0 | 1c: aa0103e6 mov x6, x1 | 20: d2800ac2 mov x2, #0x56 // thesofproject#86 | 24: d2800f03 mov x3, #0x78 // thesofproject#120 | 28: 48207c82 casp x0, x1, x2, x3, [x4] | 2c: ca050000 eor x0, x0, x5 | 30: ca060021 eor x1, x1, x6 | 34: aa010000 orr x0, x0, x1 | 38: f9400480 ldr x0, [x4, thesofproject#8] | 3c: d50323bf autiasp | 40: ca0000e0 eor x0, x7, x0 | 44: d65f03c0 ret | 48: d2800240 mov x0, #0x12 // thesofproject#18 | 4c: d2800681 mov x1, #0x34 // thesofproject#52 | 50: d2800ac2 mov x2, #0x56 // thesofproject#86 | 54: d2800f03 mov x3, #0x78 // thesofproject#120 | 58: f9800091 prfm pstl1strm, [x4] | 5c: c87f1885 ldxp x5, x6, [x4] | 60: ca0000a5 eor x5, x5, x0 | 64: ca0100c6 eor x6, x6, x1 | 68: aa0600a6 orr x6, x5, x6 | 6c: b5000066 cbnz x6, 78 <foo+0x78> | 70: c8250c82 stxp w5, x2, x3, [x4] | 74: 35ffff45 cbnz w5, 5c <foo+0x5c> | 78: f9400480 ldr x0, [x4, thesofproject#8] | 7c: d50323bf autiasp | 80: ca0000e0 eor x0, x7, x0 | 84: d65f03c0 ret ... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect. For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is oldest currently supported stable release, and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities. I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1. Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double") Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by: Boqun Feng <[email protected]> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Mark Rutland <[email protected]> Cc: [email protected] Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Steve Capper <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
The inline assembly for arm64's cmpxchg_double*() implementations use a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems. This is similar to what we fixed back in commit: fee960b ("arm64: xchg: hazard against entire exchange variable") ... but we forgot to adjust cmpxchg_double*() similarly at the same time. The same problem applies, as demonstrated with the following test: | struct big { | u64 lo, hi; | } __aligned(128); | | unsigned long foo(struct big *b) | { | u64 hi_old, hi_new; | | hi_old = b->hi; | cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78); | hi_new = b->hi; | | return hi_old ^ hi_new; | } ... which GCC 12.1.0 compiles as: | 0000000000000000 <foo>: | 0: d503233f paciasp | 4: aa0003e4 mov x4, x0 | 8: 1400000e b 40 <foo+0x40> | c: d2800240 mov x0, #0x12 // thesofproject#18 | 10: d2800681 mov x1, #0x34 // thesofproject#52 | 14: aa0003e5 mov x5, x0 | 18: aa0103e6 mov x6, x1 | 1c: d2800ac2 mov x2, #0x56 // thesofproject#86 | 20: d2800f03 mov x3, #0x78 // thesofproject#120 | 24: 48207c82 casp x0, x1, x2, x3, [x4] | 28: ca050000 eor x0, x0, x5 | 2c: ca060021 eor x1, x1, x6 | 30: aa010000 orr x0, x0, x1 | 34: d2800000 mov x0, #0x0 // #0 <--- BANG | 38: d50323bf autiasp | 3c: d65f03c0 ret | 40: d2800240 mov x0, #0x12 // thesofproject#18 | 44: d2800681 mov x1, #0x34 // thesofproject#52 | 48: d2800ac2 mov x2, #0x56 // thesofproject#86 | 4c: d2800f03 mov x3, #0x78 // thesofproject#120 | 50: f9800091 prfm pstl1strm, [x4] | 54: c87f1885 ldxp x5, x6, [x4] | 58: ca0000a5 eor x5, x5, x0 | 5c: ca0100c6 eor x6, x6, x1 | 60: aa0600a6 orr x6, x5, x6 | 64: b5000066 cbnz x6, 70 <foo+0x70> | 68: c8250c82 stxp w5, x2, x3, [x4] | 6c: 35ffff45 cbnz w5, 54 <foo+0x54> | 70: d2800000 mov x0, #0x0 // #0 <--- BANG | 74: d50323bf autiasp | 78: d65f03c0 ret Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double(). This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified. With this change, GCC 12.1.0 compiles the above test as: | 0000000000000000 <foo>: | 0: f9400407 ldr x7, [x0, thesofproject#8] | 4: d503233f paciasp | 8: aa0003e4 mov x4, x0 | c: 1400000f b 48 <foo+0x48> | 10: d2800240 mov x0, #0x12 // thesofproject#18 | 14: d2800681 mov x1, #0x34 // thesofproject#52 | 18: aa0003e5 mov x5, x0 | 1c: aa0103e6 mov x6, x1 | 20: d2800ac2 mov x2, #0x56 // thesofproject#86 | 24: d2800f03 mov x3, #0x78 // thesofproject#120 | 28: 48207c82 casp x0, x1, x2, x3, [x4] | 2c: ca050000 eor x0, x0, x5 | 30: ca060021 eor x1, x1, x6 | 34: aa010000 orr x0, x0, x1 | 38: f9400480 ldr x0, [x4, thesofproject#8] | 3c: d50323bf autiasp | 40: ca0000e0 eor x0, x7, x0 | 44: d65f03c0 ret | 48: d2800240 mov x0, #0x12 // thesofproject#18 | 4c: d2800681 mov x1, #0x34 // thesofproject#52 | 50: d2800ac2 mov x2, #0x56 // thesofproject#86 | 54: d2800f03 mov x3, #0x78 // thesofproject#120 | 58: f9800091 prfm pstl1strm, [x4] | 5c: c87f1885 ldxp x5, x6, [x4] | 60: ca0000a5 eor x5, x5, x0 | 64: ca0100c6 eor x6, x6, x1 | 68: aa0600a6 orr x6, x5, x6 | 6c: b5000066 cbnz x6, 78 <foo+0x78> | 70: c8250c82 stxp w5, x2, x3, [x4] | 74: 35ffff45 cbnz w5, 5c <foo+0x5c> | 78: f9400480 ldr x0, [x4, thesofproject#8] | 7c: d50323bf autiasp | 80: ca0000e0 eor x0, x7, x0 | 84: d65f03c0 ret ... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect. For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is oldest currently supported stable release, and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities. I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1. Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double") Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by: Boqun Feng <[email protected]> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Mark Rutland <[email protected]> Cc: [email protected] Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Steve Capper <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs") for s390x, add support for 64-bit pointers to kfuncs for LoongArch. Since the infrastructure is already implemented in BPF core, the only thing need to be done is to override bpf_jit_supports_far_kfunc_call(). Before this change, several test_verifier tests failed: # ./test_verifier | grep # | grep FAIL thesofproject#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL thesofproject#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL thesofproject#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL thesofproject#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL thesofproject#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL thesofproject#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL thesofproject#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL thesofproject#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL thesofproject#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL thesofproject#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL thesofproject#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL thesofproject#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL thesofproject#486/p map_kptr: ref: reference state created and released on xchg FAIL This is because the kfuncs in the loaded module are far away from __bpf_call_base: ffff800002009440 t bpf_kfunc_call_test_fail1 [bpf_testmod] 9000000002e128d8 T __bpf_call_base The offset relative to __bpf_call_base does NOT fit in s32, which breaks the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts this limit. Note that to reproduce the above result, tools/testing/selftests/bpf/config should be applied, and run the test with JIT enabled, unpriv BPF enabled. With this change, the test_verifier tests now all passed: # ./test_verifier ... Summary: 777 PASSED, 0 SKIPPED, 0 FAILED Tested-by: Tiezhu Yang <[email protected]> Signed-off-by: Hengqi Chen <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
enable the kbl DSP core.
get the ROM init.